>> Alright, so like you said, I'm David Dorsey. I work at Click Security doing R&D. I've been on the defensive side for the last ten years: file analysis, RE, a little bit of IR. Recently I've been doing some stuff with machine learning, if you're over at BSides. But we're here to talk about ROP.

So, quick level setting: ROP is a technique used to bypass non-executable memory. You bounce around memory executing small gadgets, each typically ending in a return instruction, hence the name return-oriented programming.

I use Pin to do this. It's a dynamic binary instrumentation framework from Intel, and it's really nice. You don't need to recompile source code, and it can instrument dynamically generated code. It gives you lots of different granularity on where you can instrument: by instruction, basic block, function, DLL loads and unloads. It can do way more than what I'm using it for, but it's a really cool tool.

So, the basic idea here is, we're going to enforce some control flow integrity. We're starting with just coarse-grained controls. The idea is, we know where your calls and returns are supposed to go: the functions, the instructions after a call instruction, things like that. So we want to create a white list of these addresses, and we want to store them as offsets because DLLs are loaded with ASLR now. Then, if an indirect call or a return doesn't go to one of these addresses, we go, hey, this is ROP.

So, first we have to get those offsets; we have to build our white lists. How do we get those? The first thing I did was build a Pin tool to give me those addresses. When a DLL is loaded, it iterates through the exported functions, analyzes them, and finds the calls. If it's a direct call, we put the target of that on the white list and then the instruction afterwards. If it's an indirect call, just the instruction afterwards, because we don't know where it goes yet. Then, in addition to that, all calls and returns are instrumented at the instruction level.
This is because when you run a program, not every function is one of those exported ones. There's a whole lot more, so we need to get access to those. Then these offsets are stored: when the program ends, they're dumped to a text file. There is some post-processing that goes on afterwards to add them to the white list and make sure that you don't have any duplicates.

So, this is really good in the sense that we get real, actually used values. There are no questions; there are no heuristics we have to worry about. The bad news is that it's not the fastest thing, and we only get values for code that is actually run. So, if a DLL isn't loaded, you're not getting any code for that. And you also probably need to run it multiple times just to be safe. I would have to run three or four times to make sure every different code path is executed.

So then I decided that wasn't a good idea. I didn't want to do that, so I turned to pyew. It's much better at detecting these functions. You can extract the flow graphs and then you can bulk run them. I actually had to create a patch for pyew that I'm making available. At some point it will be put in the main trunk, but I have to fix it first. But it's much better; I get a lot more information, quicker.

So, now that we have the data, what do we do with it? I store all of the offsets per DLL, keyed by the MD5 hash. This is because DLLs can have different versions and I need to be able to account for that.

So, now that we have all of this, let's detect the ROP. We have another Pin tool. When a DLL is loaded, Pin gives you the location of that DLL on disk. So I take that location, open up the file, hash it, then take that hash and load the white list for that. Then I instrument all indirect calls and returns, and if you're not on the white list, you're ROP.

So, examples: Adobe 9.3 on Windows XP, there is the hash. This is an old exploit, obviously. So, we run it, wave our hands, and voilà: we have a detection.
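The lookup just described (hash the DLL, load its white list, compare by offset) might look roughly like the following Python sketch. The real tool is a Pin tool and this is not its code: the function names and the one-hex-offset-per-line file format are assumptions made for illustration.

```python
import hashlib


def md5_of_file(path):
    """Hash a DLL on disk so the white list can be keyed by exact version."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_offsets(lines):
    """Parse one hex offset per line (hypothetical format); duplicates collapse."""
    return {int(line, 16) for line in lines if line.strip()}


def is_whitelisted(target, load_base, offsets):
    """Compare by offset from the load base, so ASLR does not matter."""
    return (target - load_base) in offsets


def describe(source, target, load_base):
    """Format a detection with the target's offset in parentheses."""
    return "0x%08x -> 0x%08x (0x%x)" % (source, target, target - load_base)
```

Keeping the white list as a set of offsets makes each check a single subtraction and a hash lookup, which matters when the check runs on every indirect call and return.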
So, you can see the address where we are coming from and where we're going to. Then in the parentheses there's the offset. So yay, we found one. That's really good; I was really happy. But we only detected one, and that was disappointing. All you need to do is detect one, but, you know, I always want more.

So let's take a look at what happens. This is in the CoolType DLL. The attacker already has control of EAX. This is the beginning of the ROP chain. So, the call executes and then we jump here. Actually, we don't detect this one at the moment: there is a call before it, so this address is actually on the white list. That's disappointing, and that's probably why we didn't detect this one. We execute those three instructions and we get to this, so we do a stack pivot, and this is the one we detect.

OK, great, we detected this one; now let's see why. Let's look at the instruction before it, and there really isn't an instruction before it, because it's not really an instruction that is supposed to be there. We're just returning into the middle of an instruction, and we can detect that.

So, at this point this is where it dies, at this return right here. And that's because Pin is affecting the memory layout. It's messing up the heap spray, so when the return happens we're going to a random spot in memory, which is generally not useful when you are trying to execute a program or anything. So, on a side note, I don't know if we should run everything in Pin, because hey, I prevented it from happening.

So, let's make believe here. How would we have done if this had actually executed completely? There are forty-five chains in this ROP sequence, but only fourteen unique addresses. Not terribly surprising: once you have a gadget that works, there's no sense finding another to do the same thing. Two of these were indirect calls and we had 43 returns. Three of the fourteen addresses were on the white list, so we had a pretty good detection rate there.
So, overall, out of all the chains we only missed three of them. Not too shabby there.

Example two here: Adobe 9.5, also on Windows XP, a more recent vulnerability from late last year. This one, unfortunately, I didn't get anything. Pin was messing up the memory layout again, despite my best efforts. So again, unfortunately, we have to go the make-believe route. This one actually had a huge ROP sequence, it's 208, but it was dominated by a NOP sled, essentially. It was returning to itself over and over again, and once it finished that, then it went about its business. Fifteen unique addresses this time, all of them returns. Again, only three were on the white list, and of all of the chains we detected two hundred and four out of two hundred and eight. So, on the address part, we seem to have about the same detection rate.

So, let's do a little math. How well would this work if we could get it to work in Pin? You can see the math works out in your favor pretty quickly. Even at ten addresses there are a lot of nines in that detection probability. That's a good thing. So, let's say that might be a little optimistic. What if we drop the per-address detection rate to fifty percent? Even at ten addresses you still have 99.9 percent. I was happy with that.

So, let's talk about the limitations of this. Obviously breaking on the stack limit. That's a pretty big limitation, unfortunately. Since we are running this in Pin, it's also kind of slow. This is not ready for prime time. I don't want to make you think that it is. We don't handle jump-oriented programming, JOP; I would need a sample to test that out first. We only do the coarse-grained control flow integrity, not the fine-grained yet.

So, what's left to do? Obviously I still have to figure out the heap problem. There are probably much better, smarter implementations I can do. Maybe do things at the basic block level rather than every single instruction. I can push the analysis to another thread.
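The detection arithmetic from a moment ago can be sketched as follows, assuming each unique gadget address is caught independently with the same probability. That independence model is an assumption here, not something stated in the talk, and the function name is made up.

```python
def detection_probability(per_address_rate, unique_addresses):
    """Chance that at least one of the unique gadget addresses is flagged."""
    miss_all = (1.0 - per_address_rate) ** unique_addresses
    return 1.0 - miss_all


# In the first example, 11 of the 14 unique addresses were off the white list.
observed = detection_probability(11 / 14, 10)
# The pessimistic case from the talk: only half of the addresses are caught.
pessimistic = detection_probability(0.5, 10)
print("%.7f %.4f" % (observed, pessimistic))
```

Under this model a chain only gets through if every one of its addresses is missed, which is why the probability climbs so quickly with the number of unique addresses.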
The analysis doesn't take a long time, but microseconds add up when you do it billions of times. I want to add checks for JOP in there, since I think we can do that. I'd love to implement this on OS X, and Linux would be fine.

Also, on Wednesday at Black Hat there was a really good talk on bypassing all the ROP defenses, called The Beast Is in Your Memory. I want to talk about that for a little bit here. They defeated essentially all of the coarse-grained CFIs. They had a sample, a demo, where they defeated EMET, the latest one. It would defeat my current implementation. It defeated the return frequency and sequence length heuristics from kBouncer and ROPecker. So, that is the state of the art now. They raised the bar, so how can we defeat them now? That's the big question.

So we can start to implement fine-grained control. As I stated earlier, we currently wouldn't detect this particular call into this location. However, really the only place where you should be returning to this specific address, the CB38, is from that call. So, if we're not coming from the return in that call, then that's very suspicious. So we should be able to do that. When we pull all of the addresses with pyew, we can easily get all the return addresses and where they go. Since we do all of this in pre-processing, this shouldn't add a lot of analysis time, and I think, you know, we should be able to defeat what they currently have at the moment. So, it's just us raising the bar and then them trying to raise the bar.

So, there's probably a smarter way to do this. Maybe you want to do this in a debugger, or with Detours. kBouncer, in particular, I know uses the MSRs for the last branch record, and that's how they can get their performance. Pin is pretty slow, but it's nice. It's good for proofing out a concept, but it's not going to be a prime-time tool at this point.

And so that's about it. I have code, too. It's not up yet.
It should be, I'd like to say tonight, but let's be honest, it will be more like tomorrow morning. If you want to contact me, you can. There's further reading. You might be able to read this if you're in the front, but if you want the slides later, there are some good links there about ROP. And that's it. Any questions? (Applause).

>> All right. I've got tons of time, so I can go back. I know, I talk fast, I apologize. So, one thing to note here: the icucnv36, that's used a lot because it's not ASLR'd in the 9.x series, and most of your attacks go after that. They'll use this. Let's see, what else? That's about it, really. The other ones, there were some calls that I could detect, indirect calls, so it wasn't just returns.
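The fine-grained check proposed earlier in the talk, where a return site is only valid when reached from a return in the function that the preceding call targets, could be sketched like this. The table layout, the names, and every offset below are invented for illustration; they are not taken from the speaker's tool or from pyew output.

```python
def build_return_map(call_sites, function_returns):
    """Map each return site to the return instructions allowed to reach it.

    call_sites: list of (return_site_offset, callee_offset) for direct calls.
    function_returns: dict of callee_offset -> set of offsets of its ret instructions.
    """
    allowed = {}
    for return_site, callee in call_sites:
        allowed.setdefault(return_site, set()).update(function_returns.get(callee, ()))
    return allowed


def is_valid_return(source, target, allowed):
    """True only if target is a known return site AND source is one of the
    ret instructions belonging to a function called just before that site."""
    return source in allowed.get(target, ())


# Hypothetical offsets: a call whose return site is 0xCB38 targets a function
# at 0x2000, and that function's only ret instruction is at 0x2040.
allowed = build_return_map([(0xCB38, 0x2000)], {0x2000: {0x2040}})
```

Because the map is built ahead of time from static analysis, the run-time cost stays at one dictionary lookup and one set membership test per return.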