>> So, back in February, I wrote a decompiler, because everybody was talking about Ethereum. Ah, I wish I’d bought coins instead; I would have made more money out of it. So, just to give you a quick introduction, if you don’t follow me already on Twitter, this is my handle. I’m also the organizer of a conference in Dubai. I hadn’t been to Vegas since 2011; I took a Vegas break. No, it’s okay, I’m fine coming back again. And my new claim to fame is to have been called a ‘fun guy’ by the ShadowBrokers. So… [laughs] So, just so you know, I won’t be talking about what blockchain is, what Merkle trees are, all those things. We’re going to focus on Smart-Contracts and how to decompile them. Coming from Windows reverse engineering, I thought it was interesting to analyze the VM itself. I’m also using a tool which is going to be open source, so I’m going to give the link at the end, and of course the tool is not perfect, so you’re more than welcome to contribute and send pull requests. Just a short overview of what we are going to be talking about today: the EVM; its memory management; how we can do type discovery, which is important for static analyzers, but also obviously if you’re building a decompiler, and which can be used for both static and dynamic analysis; and the known classes of bugs so far, and what to expect in the future. So, how many of you are familiar with Solidity? Please raise your hand. So, basically, Solidity is the compiler for Ethereum. The way Ethereum executes Smart-Contracts is basically a software layer on top of the blockchain. So, as a user, you use the Solidity compiler, which translates code written in a JavaScript-like language into bytecode. And Porosity is the tool I’m going to be describing and releasing today.
So, if you’re familiar with chemistry and physics, porosity is the exact opposite of solidity, hence the name. So far, there are a lot of accounts on Ethereum, millions of them, and if you look at the actual number of contract accounts, it’s almost one million; it’s probably one million by now. But the actual number of verified contracts is very small. The definition of a verified contract is very obscure, but basically it comes down to whether the source code is provided. When it comes to reverse engineering, you usually don’t care whether you have the source or not, but it’s interesting that since Ethereum introduced this software layer, we see a need for reverse engineering. And especially since Ethereum introduced the concept of the ICO to leverage the use cases for Smart-Contracts, we’ve heard a lot of stories since the beginning of the month. I’m mentioning two here, but I think three different stories happened this month, including one with CoinDash, which is the first one: someone changed the address where it received the funds. And most recently Parity, which is a project started by one of the key developers of Ethereum, had a vulnerability in one of their Smart-Contracts and thirty million got lost, and a few days back something happened with, I don’t know, an ICO, and ten million vanished. So that’s basically what happens when you write software to store money but don’t have proper security checks. It’s more damaging than a blue screen of death; it’s like a wallet of death. So, the EVM, the Ethereum Virtual Machine: you have three concepts, accounts, contracts and blockchain, which are interchangeable.
Even for a person like me the difference is quite obscure, but I mainly focus on the actual bytecode, which is stored on the blockchain. A Smart-Contract is basically a synonym, a fancier word, for bytecode. It’s stored inside the blockchain, it uses 160-bit addresses, and an address corresponds to an account. One of the specificities of the Ethereum Virtual Machine is that it uses 256-bit registers. But they don’t really have registers like you would know in traditional architectures, like x86 for instance; they have this concept of a virtual stack. The more you look at it, the more you see that they were kind of trying out different things. The outcome is still pretty good, but if you’re going to be a virtual machine, you know, a lot of things are a bit shaky. So, for those who are not familiar with Solidity, this is basically what it looks like. On the left, that’s a simple coin contract. It’s very simple: usually you have a few functions and some storage memory, and even the instructions themselves are quite straightforward. You store an integer, you do a subtraction, an addition, and that’s pretty much it. So the level of complexity of a contract is very far away from the complexity of a kernel driver, per se. Then you compile it using Solidity and you get all that bytecode, which is going to end up on the actual blockchain. At the same time as you compile, it also saves the interface, which is going to be used by other Smart-Contracts to call that specific contract. So, getting to memory management: you have three different types of memory that are significant.
The first one is the stack I was mentioning before. In a traditional architecture, at least on 32-bit x86, you would use the stack to push arguments to a function. Here, you push arguments to opcodes, and there is a limit on the size of the stack, which is 1,024 elements. Then you have two types of storage: a persistent one, which is designed, as its name says, to retain data, and a more volatile one, and you can easily tell them apart from the instructions. The volatile one is interesting because it’s basically what you would use to store strings; if you look at it, the way it’s done is a little bit dirty, but it does the job. Like I was saying, Smart-Contracts are very simple by design. So, if you do static or dynamic analysis, a control flow graph, and especially if you write a decompiler, one of the most important things to understand is the control flow of your program. So the first thing we need to do is identify basic blocks. I guess in that case it’s much easier with Smart-Contracts than with a traditional architecture, because concepts like obfuscation don’t really exist yet for Smart-Contracts. For sure we expect them in the future, but they’re not present yet. In most cases the start instruction is called JUMPDEST, which indicates the beginning of a new block. In most cases, I’d say around eighty-five per cent of them, you have this instruction at the beginning of each block, and then you have a bunch of different jump instructions, like conditional jumps, so very traditional jumps.
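A minimal Python sketch of this first step, assuming the simple rule described above: a JUMPDEST opens a block, and a jump or halting instruction closes one. The only decoding subtlety is that PUSH1 through PUSH32 (opcodes 0x60 to 0x7F) carry 1 to 32 bytes of immediate data, which must be skipped so that data bytes are not mistaken for opcodes. The function names are illustrative, not Porosity’s.

```python
# Opcode values from the EVM instruction set.
STOP, JUMP, JUMPI, JUMPDEST = 0x00, 0x56, 0x57, 0x5B
RETURN, REVERT, INVALID, SELFDESTRUCT = 0xF3, 0xFD, 0xFE, 0xFF
TERMINATORS = {STOP, JUMP, JUMPI, RETURN, REVERT, INVALID, SELFDESTRUCT}


def disassemble(code: bytes):
    """Yield (offset, opcode, immediate) for each instruction.
    PUSH1..PUSH32 (0x60..0x7F) are followed by 1..32 immediate bytes."""
    pc = 0
    while pc < len(code):
        op = code[pc]
        size = op - 0x5F if 0x60 <= op <= 0x7F else 0
        yield pc, op, code[pc + 1:pc + 1 + size]
        pc += 1 + size


def basic_blocks(code: bytes):
    """Split bytecode into basic blocks: a JUMPDEST starts a new block,
    and a jump or halting instruction ends the current one."""
    blocks, current = [], []
    for pc, op, imm in disassemble(code):
        if op == JUMPDEST and current:      # a jump target begins a new block
            blocks.append(current)
            current = []
        current.append((pc, op, imm))
        if op in TERMINATORS:               # control flow leaves this block
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks
```

For the hand-written bytecode `600456005b600100` (PUSH1 0x04, JUMP, STOP, JUMPDEST, PUSH1 0x01, STOP), this yields three blocks starting at offsets 0, 3 and 4.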
The difference with a traditional architecture, like x86 for instance, is that there you have your opcode and your destination in the same instruction, right? Whereas here, they first push the destination on the stack and then execute the jump opcode. But the main difference is that sometimes the destination is pushed at the beginning of the function, a bunch of instructions earlier, so by the time you reach the jump you’ve totally lost track of what the destination address was. That’s why you have to write a pseudo-[inaudible]: you have to emulate most of the instructions to keep track of all the destinations. And basically, that’s one of the main limitations of static analysis of Smart-Contracts, even though most of it could be done statically: because of those weird scenarios where the destination is stored beforehand, you have to emulate certain basic blocks in order to keep track of the destination of each basic block. For stack manipulation there are basically a few instructions: DUP, SWAP, POP and PUSH. It’s pretty straightforward. When it comes to the actual opcodes of Ethereum, you have different categories. Obviously the main one is arithmetic, which is mainly designed to deal with money, to store and create values for transactions, which makes sense, right? Then you have the block information and the environmental information, where you find information like the sender and the person receiving it, and then you have all the memory-related operations, plus logging operations to keep track of certain events. So, like I was saying before, the main thing here, just as an example, is that opcodes are more like functions, because you need to push the arguments on the virtual stack.
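To show why some emulation is needed, here is a small Python sketch that tracks constants through the virtual stack to recover jump destinations. It is an illustration under stated assumptions, not Porosity’s implementation: it models only PUSHn, DUPn, SWAPn, POP, CALLDATALOAD, JUMPDEST, JUMP, JUMPI and STOP, treats anything read from the input as unknown, and follows only the fall-through side of a JUMPI. In the hand-written example bytecode, the destination is pushed first and brought back to the top with SWAP1, so the instruction right before the JUMP is not the PUSH that holds the target.

```python
UNKNOWN = None  # a stack value that depends on the input, not known statically


def resolve_jumps(code: bytes, max_steps: int = 10_000) -> dict:
    """Walk the code from offset 0 while tracking constant stack values, and
    return {offset of JUMP/JUMPI: destination}. Only a small opcode subset is
    modelled; JUMPI follows only the fall-through path."""
    stack, targets, pc = [], {}, 0
    for _ in range(max_steps):
        if pc >= len(code):
            break
        op = code[pc]
        if 0x60 <= op <= 0x7F:                  # PUSH1..PUSH32: immediate follows
            n = op - 0x5F
            stack.append(int.from_bytes(code[pc + 1:pc + 1 + n], "big"))
            pc += 1 + n
            continue
        if op == 0x00:                          # STOP
            break
        if 0x80 <= op <= 0x8F:                  # DUPn copies the n-th item from the top
            stack.append(stack[-(op - 0x7F)])
        elif 0x90 <= op <= 0x9F:                # SWAPn swaps the top with item n+1
            i = op - 0x8E
            stack[-1], stack[-i] = stack[-i], stack[-1]
        elif op == 0x50:                        # POP
            stack.pop()
        elif op == 0x35:                        # CALLDATALOAD: result is input-dependent
            stack.pop()
            stack.append(UNKNOWN)
        elif op == 0x56:                        # JUMP: destination is on top of the stack
            dest = stack.pop()
            targets[pc] = dest
            if dest is UNKNOWN:
                break                           # cannot follow statically
            pc = dest
            continue
        elif op == 0x57:                        # JUMPI: pops destination, then condition
            targets[pc] = stack.pop()
            stack.pop()
        elif op != 0x5B:                        # JUMPDEST is only a marker
            raise NotImplementedError(f"opcode 0x{op:02x} at offset {pc}")
        pc += 1
    return targets
```

On `6009600435905600005b00` (PUSH1 0x09, PUSH1 0x04, CALLDATALOAD, SWAP1, JUMP, STOP, STOP, JUMPDEST, STOP), the JUMP at offset 6 resolves to offset 9 even though the value on top of the stack just before the SWAP comes from the input.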
So here, for an addition, you push the two values that you want to add, and then you retrieve the result, not in a register, but in the first item of the stack. That’s how it would look if you wrote it in EVM pseudocode. Then you have EVM calls, which are pretty interesting: they allow you to call a different contract, and you have multiple types of calls. You have the regular CALL, and also DELEGATECALL, which is what was abused recently in the Parity contract. It’s also interesting from the perspective that you can basically call third-party libraries that you don’t necessarily own. So this is also an interesting context: it leaves a lot of opportunities for undefined behavior, and when it comes to static analysis, dynamic analysis, or even trying to define your scope, it creates a lot of issues. There are four exceptions, hardcoded (precompiled) contracts, including SHA-256, the identity function and RIPEMD-160; their contract addresses are 1, 2, 3 and 4, so whenever you look at the actual bytecode you will notice those static addresses. When it comes to user-defined functions, which are exported by default by each Smart-Contract, you can easily recognize how many parameters they have from the CALLDATALOAD instructions, which read the environment information block: basically a buffer containing the input parameters, including the hash of the function we want to execute. The structure of that block is pretty straightforward: the first four bytes are the method hash of the function, which we are going to describe later, and they are followed by the arguments.
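The pseudocode idea can be sketched as symbolic execution: run the stack operations, but push expression strings instead of numbers. This is a minimal Python illustration, not how Porosity itself is implemented; it handles only PUSHn, DUPn, SWAPn, CALLDATALOAD, ADD, MUL and SUB, and `arg_offset` is a hypothetical helper that encodes the input layout described above (4-byte method hash, then one 32-byte word per static argument, so arguments sit at offsets 4, 36, and so on).

```python
BINOPS = {0x01: "+", 0x02: "*", 0x03: "-"}  # ADD, MUL, SUB: result = top OP second


def arg_offset(index: int) -> int:
    """Input layout: 4-byte method hash, then one 32-byte word per static argument."""
    return 4 + 32 * index


def lift(code: bytes) -> list:
    """Symbolically execute straight-line EVM code: the virtual stack holds
    expression strings instead of numbers, which become readable pseudocode."""
    stack, pc = [], 0
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7F:                  # PUSH1..PUSH32
            n = op - 0x5F
            stack.append(str(int.from_bytes(code[pc + 1:pc + 1 + n], "big")))
            pc += 1 + n
            continue
        if op in BINOPS:
            a, b = stack.pop(), stack.pop()     # a was on top of the stack
            stack.append(f"({a} {BINOPS[op]} {b})")
        elif op == 0x35:                        # CALLDATALOAD(offset)
            stack.append(f"calldataload({stack.pop()})")
        elif 0x80 <= op <= 0x8F:                # DUPn
            stack.append(stack[-(op - 0x7F)])
        elif 0x90 <= op <= 0x9F:                # SWAPn
            i = op - 0x8E
            stack[-1], stack[-i] = stack[-i], stack[-1]
        else:
            raise NotImplementedError(f"opcode 0x{op:02x}")
        pc += 1
    return stack
```

For the hand-written sequence PUSH1 0x24, CALLDATALOAD, PUSH1 0x04, CALLDATALOAD, ADD (`60243560043501`), `lift` returns `['(calldataload(4) + calldataload(36))']`, the sum of the first two arguments.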
So, if you look at the pseudocode here, A and B are recovered with CALLDATALOAD, where the parameter is the offset inside the block, and then for the addition, that’s what you get. That function is very simple, it’s an addition, right, but that’s basically what it looks like in EVM pseudocode. When it comes to type discovery, the main type you want to recognize is addresses, whether it’s the address of the sender, the destination wallet, or another contract. An address is encoded on 160 bits, right? And every time you need something which is not 256 bits wide, you will see an AND operation. In some cases the mask is hardcoded, but in most cases you will see some EVM assembly optimization, like the following one, which computes the mask dynamically. There are a few of these patterns that we can recognize very easily. And if we do type discovery while emulating the code, we can even just check the mask associated with the instruction. So, now that we have seen how the EVM is working, let’s talk about the bytecode. There are two categories: the pre-loader code, which is in charge of copying the actual Smart-Contract, where all the interesting stuff is, into executable memory; and the runtime code of the contract. The runtime code is basically what we want to analyze. It contains everything we want to spend time on: the whole contract, so each function.
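Here is a Python sketch of recognizing a dynamically computed mask, assuming the idiom PUSH1 0x01, PUSH1 0xA0, PUSH1 0x02, EXP, SUB, one pattern seen in older Solidity output. It evaluates to 2**160 - 1: EXP takes the top item (2) as the base and the next one (160) as the exponent, then SUB subtracts the remaining 1 from that result. The tiny emulator only knows the opcodes in this idiom plus AND, and `mask_width` is a hypothetical helper that reports n when a value is a low n-bit mask; a width of 160 suggests an address.

```python
WORD = 2**256


def run(code: bytes, stack=None) -> list:
    """Tiny emulator for the mask idiom: PUSHn, EXP, SUB and AND only."""
    stack = list(stack or [])
    pc = 0
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7F:                  # PUSH1..PUSH32
            n = op - 0x5F
            stack.append(int.from_bytes(code[pc + 1:pc + 1 + n], "big"))
            pc += 1 + n
            continue
        a, b = stack.pop(), stack.pop()         # a = top of stack, b = second item
        if op == 0x0A:                          # EXP: a ** b
            stack.append(pow(a, b, WORD))
        elif op == 0x03:                        # SUB: a - b
            stack.append((a - b) % WORD)
        elif op == 0x16:                        # AND
            stack.append(a & b)
        else:
            raise NotImplementedError(f"opcode 0x{op:02x}")
        pc += 1
    return stack


def mask_width(mask: int):
    """Return n if mask == 2**n - 1 (a mask of the n low bits), else None."""
    n = mask.bit_length()
    return n if mask == (1 << n) - 1 else None
```

Running the idiom (`600160a060020a03`) leaves `2**160 - 1` on the stack, and appending AND (`16`) to it truncates whatever value was below to its low 160 bits, which is the hint a type-discovery pass can use.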
The runtime code is basically what has been produced by the Solidity compiler. So, this is what the pre-loader looks like. There is an instruction called CODECOPY which takes care of taking the bytecode of the contract and putting it inside executable memory, so we can then execute it at offset zero. Once we enter the actual Smart-Contract, there is a dispatcher, which is in charge of routing to all the different functions. The way it works is basically a giant switch statement. It first recovers the method hash with the CALLDATALOAD instruction: here you can see that, with the code optimized, it basically just reads a 256-bit word first, and then it applies a mask to extract the first four bytes. That’s basically the method hash. Then you enter a switch statement, where each case corresponds to an actual function. In some cases you also have a fallback function: if there’s an unknown method which is not recognized by the Smart-Contract, it just executes that method by default. In some cases, like in the Parity contract, which we are going to see after, it redirects the call blindly to another contract. If you saw that kind of thing in other software, people would start to freak out, to be honest, but it seems normal to some people writing Smart-Contracts. Some of those things, to be honest, seem obscure to me. Like, I don’t really understand why you would have a fallback function. I mean, I understand why they did it: they want contracts to be backward compatible and forward compatible. But by design it’s the source of so many problems, if you think of security.
It doesn’t make sense, to be honest. So, function hashes: the way they’re computed is basically to take the function name and the type of each parameter, stick them together, and compute the SHA3 of that input; the first four bytes of the result are the method hash. It’s pretty straightforward: if you have the ABI, the interface of the contract, you can easily recompute it. If you don’t, you just extract the method hashes from the switch in the runtime code and create names on the fly, like you would do with IDA when you don’t have symbols, where you just get a sub function named after its offset. So the ABI is basically the equivalent of symbols for Smart-Contracts. Here is the instruction I was mentioning, where we basically extract the four bytes, and that’s the pseudocode for it. Here is a comparison: if you do purely static control flow reconstruction versus emulation, you can see that in some cases you really need to emulate the code to keep track of all the destinations and pointers. So that’s a simple contract with two functions, here. Once you start analyzing the runtime code, like I was saying, it’s basically a giant switch, and each case of the switch is basically a function. Once we decompile it, we get something closer to what’s on the right of the screen. To go into the runtime code in detail: here, for instance, is the double function. In yellow we have the hash of that function; then it jumps to offset twenty-four, which is marked with the JUMPDEST instruction.
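To make the hash computation and the dispatcher concrete, here is a minimal Python sketch. One subtlety: the EVM’s SHA3 opcode is really Keccak-256, which differs from the standardized SHA3-256 in Python’s `hashlib` in its padding byte, so `hashlib.sha3_256` would produce the wrong method hashes. The Keccak-256 below is an unoptimized illustration rather than a vetted implementation. The dispatcher part assumes a DIV-then-AND extraction of the first four bytes (one common way older Solidity output did it), and a Python dict stands in for the compiled switch, which compilers typically emit as a chain of comparisons followed by conditional jumps. `keccak256`, `selector`, `method_hash` and `dispatch` are hypothetical helper names.

```python
MASK64 = (1 << 64) - 1
RC = [  # Keccak-f[1600] round constants (iota step)
    0x0000000000000001, 0x0000000000008082, 0x800000000000808A, 0x8000000080008000,
    0x000000000000808B, 0x0000000080000001, 0x8000000080008081, 0x8000000000008009,
    0x000000000000008A, 0x0000000000000088, 0x0000000080008009, 0x000000008000000A,
    0x000000008000808B, 0x800000000000008B, 0x8000000000008089, 0x8000000000008003,
    0x8000000000008002, 0x8000000000000080, 0x000000000000800A, 0x800000008000000A,
    0x8000000080008081, 0x8000000000008080, 0x0000000080000001, 0x8000000080008008,
]


def _rotation_offsets():
    """Rho offsets for lane x + 5*y, generated with the recurrence from the spec."""
    rot, x, y = [0] * 25, 1, 0
    for t in range(24):
        rot[x + 5 * y] = ((t + 1) * (t + 2) // 2) % 64
        x, y = y, (2 * x + 3 * y) % 5
    return rot


ROT = _rotation_offsets()


def _rol(v: int, n: int) -> int:
    return ((v << n) | (v >> (64 - n))) & MASK64


def _keccak_f(lanes):
    for rc in RC:
        c = [lanes[x] ^ lanes[x + 5] ^ lanes[x + 10] ^ lanes[x + 15] ^ lanes[x + 20] for x in range(5)]
        d = [c[(x - 1) % 5] ^ _rol(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [lanes[i] ^ d[i % 5] for i in range(25)]                      # theta
        b = [0] * 25
        for x in range(5):
            for y in range(5):                                                # rho + pi
                b[y + 5 * ((2 * x + 3 * y) % 5)] = _rol(lanes[x + 5 * y], ROT[x + 5 * y])
        lanes = [b[i] ^ (~b[(i + 1) % 5 + 5 * (i // 5)] & b[(i + 2) % 5 + 5 * (i // 5)])
                 for i in range(25)]                                          # chi
        lanes[0] ^= rc                                                        # iota
    return lanes


def keccak256(data: bytes) -> bytes:
    rate = 136                                   # 1088-bit rate for a 256-bit digest
    padded = bytearray(data) + b"\x01"           # original Keccak padding (SHA3-256 uses 0x06)
    padded += b"\x00" * (-len(padded) % rate)
    padded[-1] |= 0x80
    lanes = [0] * 25
    for off in range(0, len(padded), rate):
        for i in range(rate // 8):
            lanes[i] ^= int.from_bytes(padded[off + 8 * i:off + 8 * i + 8], "little")
        lanes = _keccak_f(lanes)
    return b"".join(lane.to_bytes(8, "little") for lane in lanes[:4])


def selector(signature: str) -> str:
    """Method hash: first four bytes of Keccak-256 over e.g. 'transfer(address,uint256)'."""
    return keccak256(signature.encode()).hex()[:8]


def method_hash(calldata: bytes) -> int:
    word = int.from_bytes(calldata[:32].ljust(32, b"\x00"), "big")   # CALLDATALOAD(0)
    return (word // 2**224) & 0xFFFFFFFF                              # DIV by 2**224, AND mask


def dispatch(calldata: bytes, table: dict, fallback):
    """The 'giant switch': run the function whose hash matches, else the fallback."""
    return table.get(method_hash(calldata), fallback)(calldata)
```

The canonical signature uses parameter types only, with no spaces or parameter names; `selector("transfer(address,uint256)")` gives `a9059cbb`. Input that matches no case, including empty input, ends up in the fallback.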
After the JUMPDEST at offset twenty-four, it pushes the argument two, and then we arrive in a new block. In that case there is a JUMPDEST, but it’s not a new basic block belonging to this function: it’s also used by another function, it’s a shared basic block. Then you just do the multiplication. For the triple function, we’re going to see it read the input parameter again and then execute the MUL instruction. If we go back to the initial source code, that’s basically what it was doing; it was pretty straightforward. Real Smart-Contracts are, I wouldn’t say way more complex than that, but more complex than this; this was basically to illustrate how easily you can decompile the code. Now, if we look at the bug I was mentioning before, this is the Parity bug that happened about a week ago. Remember when I was talking about the different calls: you have a call that allows you to call a third-party contract, and in some cases you have a fallback function that allows you to execute code if the method is unknown. So, here, the address in the constructor was hardcoded, and, you know, in green, it’s computing the method hash dynamically, and then the code is going to execute that specific function. And then, for some random reason, they added a fallback function that basically allowed you to call any function inside the wallet library and pass any parameters you want. This is why I was saying some concepts are obscure; that’s basically the reason for the vulnerability. Obviously, looking at it now, it’s obvious, but that’s a new type of bug that has been discovered with this architecture. That’s a pretty good find. Once you know the type of bug, it’s pretty obvious.
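The Parity issue can be reduced to a toy Python model. This is a sketch under loose assumptions, not the actual contract: `WalletLibrary` and `Wallet` are hypothetical classes, the dict stands for contract storage, and passing it into the library mimics DELEGATECALL running library code against the caller’s storage. The wallet’s fallback forwards any method name without checking it, so an unguarded `init_wallet` (the constructor-like function) can be called again by anyone. The `guarded` flag models adding an initialization check.

```python
class WalletLibrary:
    """Hypothetical shared library. Its methods act on the storage they are
    handed, mimicking DELEGATECALL running library code on the caller's storage."""

    def __init__(self, guarded: bool):
        self.guarded = guarded

    def init_wallet(self, storage: dict, sender: str, owner: str) -> None:
        # Without the guard, anyone can re-run initialization and take ownership.
        if self.guarded and storage.get("initialized"):
            raise PermissionError("wallet already initialized")
        storage["owner"] = owner
        storage["initialized"] = True

    def withdraw(self, storage: dict, sender: str, to: str):
        if sender != storage["owner"]:
            raise PermissionError("only the owner can withdraw")
        amount, storage["balance"] = storage["balance"], 0
        return to, amount


class Wallet:
    """Thin wallet: the constructor calls init_wallet, and the fallback
    forwards any method name blindly to the library."""

    def __init__(self, library: WalletLibrary, owner: str, balance: int):
        self.library = library
        self.storage = {"balance": balance}
        library.init_wallet(self.storage, owner, owner)

    def call(self, sender: str, method: str, *args):
        # Fallback path: no method-hash check, so every library function is reachable.
        return getattr(self.library, method)(self.storage, sender, *args)
```

With `guarded=False`, a call like `w.call("mallory", "init_wallet", "mallory")` makes the attacker the owner, and the next `withdraw` drains the balance; with `guarded=True`, the second initialization raises instead.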
So, those fallback functions: if you have a switch that executes code with no actual hash, that’s your fallback function. So it’s a design issue, right? The main reason for it, keep in mind, is that it’s adding a software layer to the blockchain, right? But that also means that if there’s a security bug in it, well, you cannot patch the blockchain, right? That’s the main thing about it: it’s retaining data and moving it around. From what I’ve read, the main reason is backward and forward compatibility, because of this lack of capability to apply patches. To be honest, it does not make sense; I think it’s stupid, but whatever. It’s not like you can design a language which is verifiable, because you have too many unknowns: if you start calling libraries, you don’t even know what is going to be called. You’d have to predict all contracts, all future contracts. Imagine if your kernel were doing the same thing; that would be nuts, you’d see people rioting in the streets, you know? Well, I hope so. So the way that bug was fixed was basically to restrict some of the functions, because an attacker was able to call the library directly, any function, including the constructor. It could even call the constructor again, because it would not even check whether the wallet was already initialized. So this is the type of bug you see in Smart-Contracts; it’s very far away from the classic bugs you would see, like buffer overflows and everything. Here is another example of a vulnerable contract, similar to what [inaudible] was using. Here the vulnerability is similar to a race condition; basically, it comes down to the same thing, a fallback function being reused.
In that contract, the external call creates a race condition, where the balance is not updated in time. For that type of vulnerability, the good thing is that, because there are not many instructions, you can tag each basic block to see what it is doing. Every time there is a call to an external contract, you can track it either as a warning or as an error. In that case we could see that the SSTORE instruction was being used after the call, so it’s easy to analyze. I’m going to show you a quick demo. So, that’s the actual Smart-Contract itself. To call the tool, you only need to provide the bytecode. If you have the symbols, the ABI file, you can just pass it too. Then you just run the tool [pause], and once you give the tool the names, you can easily reconstruct something very close to the actual source code. And if you add some features, because to build a decompiler you basically build everything you need for dynamic analysis and static analysis too, you can easily (oh, I didn’t see the title there, that’s cool) use it to track potential vulnerabilities, just like you would with most compilers. With PREfast for Visual Studio, for example, you have a lot of static analysis tools you can use whenever you are writing code, right? Now, if you look at the Smart-Contract community, they are still building all those tools. It’s something very new, so a lot of the tools we find pretty obvious with GCC or the Visual Studio compiler are not present for this type of software. The whole concept was to introduce a software layer, but it comes without a lot of the testing tools that would be required for enterprise software. So far, a few classes of bugs have been identified.
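The tagging idea can be sketched in Python as a simple pass over the blocks of one execution path, assuming each basic block has already been reduced to a label and its opcode names (the labels here are hypothetical). It flags any SSTORE reached after an external call, meaning state is written only after control was handed to another contract, which is the DAO-style ordering. This is a heuristic, not a proof: it can produce false positives, and it does not reason about which storage slot is written.

```python
EXTERNAL_CALLS = {"CALL", "CALLCODE", "DELEGATECALL"}


def reentrancy_warnings(path):
    """path: ordered list of (block_label, [opcode names]) along one execution path.
    Return a warning for every SSTORE that comes after an external call."""
    warnings, last_call_block = [], None
    for label, ops in path:
        for op in ops:
            if op in EXTERNAL_CALLS:
                last_call_block = label            # control may re-enter from here
            elif op == "SSTORE" and last_call_block is not None:
                warnings.append(f"{label}: SSTORE after external call in {last_call_block}")
    return warnings
```

A path like `bb0` (CALLER, SLOAD), `bb1` (CALL, POP), `bb2` (SSTORE, STOP) yields one warning for `bb2`, while the safer ordering, updating storage before the call, yields none.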
The first class was the race condition used in the DAO. Then there is the call stack vulnerability; there are some good papers about it. Like I was saying before, the virtual stack is limited, and when you use all of it, it doesn’t even throw an exception. For a while there were also issues with exceptions and with reverting the state of a contract, as those concepts are very new and have only been introduced recently. Then there is the time dependency vulnerability: some of the instructions give you time information, like the block timestamp, so you can easily get caught, like what happened with the [inaudible] contract. There is a fork of Ethereum called Quorum, created by J.P. Morgan, which is pretty interesting. The main reason people were a bit worried about Ethereum is that if you are an enterprise, you can’t just have everything visible transparently, so they introduced a privacy layer and permissions for Smart-Contracts, which is pretty cool. The Quorum team is here, so I don’t know if you guys want to stand up, so people can see you? But that’s a pretty cool project. We see a lot of stuff happening around Ethereum, which is pretty cool, and this week Quorum just released [inaudible] for Porosity, which checks nodes inside the actual network. So that’s pretty cool too. And like I was saying, the main thing, which is a bit worrying, is that we hear many stories about ICO hacks, and for sure we’re going to see more of those stories over the summer, mainly because there are no proper testing tools for this new software layer, which is pretty nuts, you know?
You guys have seen how long it took for traditional software to get proper security tools and to get tested, even Azure stuff, this type of high-level framework. Whereas with Smart-Contracts we’re pretty much at ground zero; in most cases nobody has even heard of such tools, which is why there’s a need for them. Here are some screenshots of the current Porosity integration. It can be integrated into the actual workflow, which is pretty good; at least people are thinking of integrating tools like that into the workflow. It was very fast for the Quorum team to do the integration; they have a pretty good framework, so they can add more and more tools very quickly. It’s pretty good, and in my opinion it’s a requirement if people are going to start really using Smart-Contracts, especially to store money. Again, if you write a Smart-Contract, most of the time you’ll use it to store money, right, not to browse YouTube and watch cat videos… I mean, yeah. You know, if you find a zero-day in a web browser, you would struggle to figure out where to send it. Whereas if you find a bug in a Smart-Contract, you just take the bank, you know? It’s like being the Lazarus Group, but for cryptocurrency. So, like I was saying, for sure we will see more and more testing tools, and we can definitely expect more issues like ICO hacks by the end of the year, since it’s a new thing and everyone wants to raise money with ICOs. We’re improving the tools too; some improvement is still required, for example for conditional statements. When it comes to Ethereum and security, like I was saying, there is a fast-growing community, and the main incentive is that either you want to steal money or you want to protect your money, right? So it’s pretty straightforward in terms of motives if you want to get into Smart-Contract security.
And like I was saying, when I initially looked at it, I was like, why is everybody talking about blockchain? It sounds really boring. Then I saw there was a virtual machine around, and I thought, ah, maybe there’s something interesting to do. For those who are familiar with virtual machine vulnerabilities: well, QEMU has a lot of them, but then you have talks like CLOUDBURST that happened many years ago, I think it was Black Hat 2010 or something, where you were basically able to do a VM escape. And now VM escapes are becoming more and more common; even Microsoft, with their Hyper-V vulnerabilities, has been raising their bug bounty. Well, you can be sure that if you have your own virtual machine, you can also expect bugs in it, right? Then the whole claim of it being a sandbox does not really apply. So the question now is: is Ethereum going to stay alive, and is its virtual machine going to be the main virtual machine, or are we going to see more providers for Smart-Contracts with their own virtual machines? I was looking at the roadmap for next year, and I saw they are planning to use WebAssembly. I had no idea what WebAssembly was, so I looked into it. It’s described as a portable, size- and load-time-efficient format, so you have your own bytecode that can be executed by most JavaScript engines, like V8 or SpiderMonkey, and they’re planning to use a WebAssembly engine, like the one in your browser, for Smart-Contracts. I don’t know if it’s good. I guess since it’s also going to be used by all the platforms, it will also benefit from the auditing of that specific format. And in terms of performance, I’ve seen some good stuff. If you look at the demos online, people are running almost full video games with V8; they can almost compile….
I don’t know if it’s a good thing or not, but you can compile code, C++ code, into WebAssembly. For an interface, it seems a bit confusing to go from a VM with a specific set of instructions to being able to compile C++, so I don’t know to what extent it’s going to be used in the Ethereum VM by next year, but for sure we will see stuff leveraging it. There are some people I wanted to thank who worked with me on the paper, including the DEF CON review board, because initially I was just sending the decompiler, and they asked, can we use it for security? I was like, well, it’s a decompiler, so you can do anything. So they kind of pushed me to do a security analysis with it. So if you want to download the slides and the white paper, which is more complete, and in case you didn’t understand my French accent, which, well, what can I say? I’m French; no, I’m not going to apologize. But you can download them along with the tool and the rest. And yeah, if you have any questions, you can either drop me an email, or we have three minutes now, so I don’t know how Q&A works here? But yeah, if you have any questions, let me know. [applause]