Alright, so let's get this show on the road. Please settle down, take a seat, and please give a warm welcome to a first-time DEF CON speaker, Ulf.
Thank you everyone for coming and listening to me. Today we are going to direct memory attack the kernel. My name is Ulf Frisk, and helping me with the demos today I have Martin Bergqvist. Today we are going to totally own Linux, Windows and OS X kernels by DMA code injection. We're going to dump memory at speeds in excess of 150 megabytes per second. We're going to both pull and push files from the target system. We're going to execute code and spawn a system shell. After this talk I will be open sourcing the software making all this possible. And since we are talking about a hardware-based attack you also need hardware, which is already available for purchase online for less than $100.
But first a little bit about myself. My name is Ulf Frisk. I'm working as a penetration tester, primarily with online banking security. I'm employed in the financial sector in Stockholm, Sweden. I have a Master of Science in computer science and software engineering. Most recently I've been taking a special interest in low-level Windows programming and DMA. This has been a little bit like a learning-by-doing project on my part, learning more about 64-bit assembly and operating system kernels. Actually, in order to be able to do this talk I have to put up this slide. I need to point out that this talk is given by me as an individual. My employer is not involved in any way whatsoever in what I'm doing here today.
But I'm here today to present PCILeech. PCILeech is the combination of the PLX Technologies USB3380 development board coupled with custom firmware and custom software. On the image here you see this development board in the Mini PCI Express form factor.
To your left you see the PCI Express side. It's the side that goes into the target computer, or if you want to call it that, the victim computer. The USB3380 is able to send both DMA reads and writes into the target system's main memory. To the right you see a USB 3 connector, which allows us to connect this board to a controlling computer. Once connected, this controlling computer is able to transfer memory at very high speeds over USB 3, straight into the memory of the target computer. What's very nice about this hardware is that it requires no drivers at all on the target computer. It just works. It's hardware only. And with this piece of hardware I'm able to get well over 150 megabytes per second in DMA transfer speed. I'm going to show you how that works. Unfortunately this chip is only capable of 32-bit addressing, and that means that you're only able to access the lower 4 gigabytes of memory with this card. As we'll see later on, that's not really a problem in practice.
Actually, the USB3380 has been presented here at DEF CON before. It was presented two years ago as the NSA Playset SLOTSCREAMER device by Joe Fitzpatrick and Miles Crabill. So I really want to thank Joe for bringing this really nice piece of hardware to my attention. Thank you very much, Joe. If I compare PCILeech to SLOTSCREAMER, it's obviously exactly the same hardware. It's completely different firmware and different software. This also means that if you already have a SLOTSCREAMER device you should be able to reflash it and try this software on it. It's faster. SLOTSCREAMER was able to achieve around 3 megabytes per second, something like that. The PCILeech device is able to achieve well over 150 megabytes per second in DMA transfer speed. PCILeech is also capable of kernel implants. In fact it's relying heavily on kernel implants.
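To put the figures above in perspective, here is a small back-of-the-envelope calculation in Python. It is only a sketch: it assumes decimal megabytes (10^6 bytes) and takes the approximate speeds quoted in the talk at face value, and the function name `dump_seconds` is made up for illustration.

```python
# Rough numbers only: 32-bit addressing and the transfer speeds quoted in the talk.
ADDRESS_BITS = 32
REACHABLE_BYTES = 2 ** ADDRESS_BITS  # 4 GiB reachable with 32-bit DMA addresses


def dump_seconds(size_bytes, megabytes_per_second):
    """Seconds needed to transfer size_bytes at the given speed (1 MB = 10**6 bytes)."""
    return size_bytes / (megabytes_per_second * 1_000_000)


if __name__ == "__main__":
    print("reachable:", REACHABLE_BYTES // 2 ** 30, "GiB")
    print("at 150 MB/s:", round(dump_seconds(REACHABLE_BYTES, 150)), "seconds")
    print("at   3 MB/s:", round(dump_seconds(REACHABLE_BYTES, 3) / 60), "minutes")
```

The same function can be pointed at any memory size to sanity-check dump times at a given transfer speed.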
But what makes all this possible is of course PCI Express. PCI Express is a high-speed serial expansion bus. Or, it's not really a bus since it's point-to-point communication, but anyway, it's packet based. To the upper right you see a schematic of PCI Express. You have the PCI Express root complex anchored within the CPU chip. From this root complex you have several serial lanes that you can connect PCI Express endpoints to. You can also connect things like PCI Express switches and bridges. So you can say that PCI Express forms a small device network within a computer. Depending on how much bandwidth a device needs, it can consume between 1 and 16 serial lanes. A graphics card that needs lots of bandwidth typically consumes 16 lanes. PCI Express is designed to be hot pluggable and it comes in many form factors and variations. It comes in the standard form factor that you all know, with standard graphics cards and similar things. It comes in the Mini PCI Express form factor, as you saw on the previous slide. It comes as ExpressCards, which go into laptops. And Thunderbolt also encapsulates PCI Express. What's nice about PCI Express from our point of view is that it's DMA capable, and that means that it's circumventing the CPU cores. So PCI Express endpoints can read and write memory directly.
But what is direct memory access, and how does it work? You have the CPU core. It usually executes code in something called a virtual address space. And you have a memory management unit built into the CPU, which uses page tables in order to translate these virtual addresses into physical addresses. It actually translates pages, and a page is typically four kilobytes long. It can be larger as well, but in most cases it's four kilobytes. PCI Express devices have traditionally been able to access all physical memory straight out, without any limitations whatsoever.
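As a rough illustration of the translation just described, here is a toy model in Python. It is deliberately simplified to a single-level lookup. The dict-based `page_table` and both function names are assumptions made for this sketch, not anything taken from PCILeech or a real MMU.

```python
PAGE_SIZE = 4096  # the typical 4-kilobyte page mentioned above


def split_address(address):
    """Split an address into (page number, offset within the page)."""
    return address // PAGE_SIZE, address % PAGE_SIZE


def translate(virtual_address, page_table):
    """Translate a virtual address using a hypothetical single-level page table.

    page_table maps virtual page numbers to physical page numbers.
    """
    virtual_page, offset = split_address(virtual_address)
    if virtual_page not in page_table:
        raise LookupError("page fault: no mapping for this virtual page")
    return page_table[virtual_page] * PAGE_SIZE + offset


if __name__ == "__main__":
    table = {0x70001: 0x00123}  # made-up mapping
    print(hex(translate(0x70001234, table)))
```

The point of the toy is the asymmetry: code running on the CPU goes through `translate`, while a DMA-capable device without an IOMMU in the way addresses physical memory directly.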
But CPUs nowadays do have something called an IOMMU, which works in a similar way to the memory management unit for the CPU. This allows for virtualization of device addresses as well. So in theory operating systems should be able to protect themselves fully against DMA attacks if the IOMMU is fully used. But as we'll see later on, that's not really the case.
Actually, this is the complete firmware of the PCILeech device. It's a whopping 46 bytes in total, and it's a very powerful device. The first two bytes are a header, or actually the first byte is the header, 5A. The 00 that follows just tells the chip to load configuration data into the configuration registers at power-on. Next we have the length, which is in little endian, so 2A is 42 bytes of configuration data. Then we have the USB controller register. We need to enable the USB 3 port on this board because it's disabled by default. First you have the address of the register, the 2310 here, and then you have a DWORD, that is 4 bytes or 32 bits, which is programmed into that register at power-on. This enables the USB 3 port. Then we set the PCI Express vendor ID and product ID to a Broadcom SD card, and this is pretty much just a leftover from the SLOTSCREAMER software I started to toy around with. Then, in green here, we enable the four DMA endpoints, which are capable of high-speed DMA transfers between USB 3 and PCI Express. We set the first endpoint to a write endpoint, which allows us to write memory from USB into main memory of the target computer at high speed. Then we set the following three endpoints to read endpoints. The reason why we set three endpoints to read endpoints is that reads are much more common than writes, and we can get a little bit more transfer speed out of this chip if we're doing multi-threaded access. And at last we set the USB vendor ID and product ID to Google Glass.
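The firmware layout described above (a 5A header byte, a 00 byte, a little-endian length, then register address and 32-bit value pairs) can be sketched as a small parser in Python. This is a sketch under assumptions: it treats each entry as a 2-byte little-endian register address followed by a 4-byte little-endian value, which is consistent with 42 bytes holding seven settings but is not confirmed by the talk. The sample register addresses and values in the usage part are invented and are not the real PCILeech firmware bytes.

```python
import struct

HEADER_BYTE = 0x5A  # first byte of the firmware image, as described in the talk


def parse_firmware(blob):
    """Return (flags, [(register_address, value), ...]) from a firmware image."""
    header, flags, length = struct.unpack_from("<BBH", blob, 0)
    if header != HEADER_BYTE:
        raise ValueError("unexpected header byte")
    body = blob[4:4 + length]
    # Assumed entry layout: 2-byte register address + 4-byte value = 6 bytes.
    if len(body) != length or length % 6 != 0:
        raise ValueError("configuration data is truncated or misaligned")
    entries = [struct.unpack_from("<HI", body, off) for off in range(0, length, 6)]
    return flags, entries


def build_firmware(entries, flags=0x00):
    """Inverse of parse_firmware, handy for round-trip checks."""
    body = b"".join(struct.pack("<HI", reg, val) for reg, val in entries)
    return struct.pack("<BBH", HEADER_BYTE, flags, len(body)) + body


if __name__ == "__main__":
    # Hypothetical register addresses and values, for illustration only.
    image = build_firmware([(0x0010, 0x11223344), (0x0020, 0xAABBCCDD)])
    print(image.hex(" "))
    print(parse_firmware(image))
```

Under this assumed layout, seven register entries give 42 bytes of configuration data plus the 4-byte header and length, which matches the 46 bytes quoted in the talk.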
The reason why I'm using the Google Glass IDs is that I wrote this program for Windows. Windows has a very nice user-mode USB stack called WinUSB. But in order to activate it for certain hardware you need to sign a small configuration file with a driver signing certificate, and those are kind of expensive, so I didn't want to purchase one. It was much easier to find a device out there that already uses WinUSB and lie about being that device.
But let's get into the kernels. Most computers today have more than four gigs of memory. If you're able to get a kernel module into a system, it should be able to access all memory and also be able to execute code. So what we can do is search for kernel structures, code signatures, whatever, in lower memory using DMA, and patch that code and hijack it. We can get the execution flow of the kernel code that way. And when we are doing this we need to keep in mind that PCI Express DMA works with physical addresses, while kernel code runs in a virtual address space.
I divided exploitation into three stages. First we have stage one, which is pretty much just a hook. Then we have stage two, which is the stager for the final stage three kernel module implant. We start by trying to locate the kernel, or a driver, or whatever in kernel space that we can target. Usually, at the end of the kernel itself or a driver, there is some free space in the last page because it's usually not completely filled. That page is already executable, so we put our stage two code in there, which is around 500 bytes. Then we search for a function to hook. Once we find that function we overwrite it with a call into the stage two code, which is already written into the kernel. And when a thread starts executing the hooked function it immediately jumps to the stage two code.
The very first thing the stage two code does is restore the stage one code to its original state. Then we check if we are the first thread running here. We might be running in a multi-threaded environment. If we are not the first thread running here, we immediately jump back to the now unhooked stage one function and resume the normal execution flow for that kernel thread. But if we are the first thread running here, we locate the base of the kernel. We need that in order to look up some function pointers that we are going to need later on. For example, we need those function pointers in order to allocate two pages of memory. The first page we use as a buffer that the PCILeech main control program running on the other computer can access with DMA in order to communicate with the kernel module that we are going to insert. The second page is for the kernel module, the stage three code itself. Then we write a small stage three stub into the second page, and this is pretty much just a tight loop. Then we create a new kernel thread in that loop. And at the very end of the stage two section we write the physical address of the buffer we allocated into the code where the stage two part is located. The PCILeech main control program is polling this all the time with DMA, and once it receives the physical address it writes the complete stage three contents to the address it received. Then the loop, which is already executing in the new kernel thread, senses that the complete stage three contents have been written. So it exits the loop, and it starts by setting up a DMA buffer, which is around four to sixteen megabytes big, in lower memory. And then it starts looping, waiting for commands. The commands are pretty much read memory, write memory, execute code, or exit.
But let's start by attacking Linux.
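Before going into the individual operating systems, the stage three command loop just described (read memory, write memory, execute code, exit) can be sketched as a pure simulation in Python. Nothing here touches real memory: a `bytearray` stands in for the target's memory, a plain list stands in for the DMA buffer traffic, and the `Command` values and the `serve` function are hypothetical names chosen for the sketch.

```python
from enum import Enum


class Command(Enum):
    # Hypothetical command codes; the talk only names the four operations.
    READ = 1
    WRITE = 2
    EXEC = 3
    EXIT = 4


def serve(memory, requests):
    """Process (command, args) requests against simulated memory until EXIT."""
    replies = []
    for command, args in requests:
        if command is Command.EXIT:
            break  # leave the loop; later requests are never looked at
        if command is Command.READ:
            address, size = args
            replies.append(bytes(memory[address:address + size]))
        elif command is Command.WRITE:
            address, data = args
            memory[address:address + len(data)] = data
            replies.append(len(data))
        elif command is Command.EXEC:
            function, parameter = args  # stands in for "execute code"
            replies.append(function(parameter))
    return replies


if __name__ == "__main__":
    ram = bytearray(16)
    print(serve(ram, [
        (Command.WRITE, (4, b"abc")),
        (Command.READ, (4, 3)),
        (Command.EXEC, (len, b"xy")),
        (Command.EXIT, None),
    ]))
```

In the real tool the requests would arrive through the shared DMA buffer rather than a Python list; the sketch only shows the dispatch shape.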
The Linux kernel is located in low physical memory. If kernel address space layout randomization is not enabled, it's located at sixteen megabytes in physical memory. If KASLR is enabled it slides in two-megabyte chunks. Once we find the kernel we search for a function, any function, to hook. In my code I chose vfs_read since it gets called pretty often and it works fine. Then I search for a function called kallsyms_lookup_name. This is pretty much the equivalent of GetProcAddress in Windows. It allows me to send a kernel symbol name to that function, and it will look up the function pointer or symbol for it. Then we write the stage two code and write the stage one code. Then we wait for the stage two code to return with the physical address of stage three. We write the complete stage three code, and then it's demo time.
In this demo I will show how we can use a generic kernel implant in order to both pull and push files from and to a Linux system, and we're also going to dump the memory. Hmm. And this is the demo. It's supposed to be like this. Sorry about that. Here you see a Kali Linux computer, and we will try to log on to that computer with the root account. Now I'm going to show you... Thank you. That one was not working here. We will reboot the computer afterwards and do the demo after the Windows demo, but we will start by dumping the memory on this computer anyway. So let's dump the memory. We'll use the Linux 64-bit kernel module. We're going to dump the memory and store it in c:\temp here. First we insert the kernel module into the running kernel, then we receive execution, and then memory dumping starts. Memory dumping works the following way: the kernel module first asks the running kernel about the physical memory map. In computers, physical memory is not one big chunk of memory.
You have things like memory-mapped PCI Express devices in between there, and if you read those you can crash the computer. You also have unreadable memory, such as system management mode memory, that you can't read. So the kernel module first queries the computer about the memory map and reports this back to the PCILeech main control program. Once the main control program knows about the physical memory map, it can ask the inserted kernel module to read certain memory chunks. Dumping memory is usually pretty fast. It should be well over 150 megabytes per second, but in this demo I have to use a crappy USB hub so the speed is a little bit lower. It should still be well in excess of 100 megabytes per second, as you can see here. Thank you very much. And when we have dumped the memory, let's try to run Volatility on it as well. I'm running the linux_pstree command here just to show you that it's working. At the very bottom you see the PCILeech kernel thread for the inserted kernel module, and if we scroll up we see lots of kernel threads and user-mode processes, with systemd at the very top.
So let's move on to Windows 10. In Windows 10 the kernel is located at the top of physical memory. That is kind of boring for us since we can't access it directly, and this is a problem if the computer has more than around three and a half gigs of RAM. The reason for that is that memory-mapped PCI Express devices and other things push the last bytes of memory well above 4 gigs. This means that the kernel executable is not reachable directly, and most drivers are also loaded above 4 gigs, so they are not reachable either. But if we look at the memory structures below 4 gigs, we see that the page tables for the kernel itself and for important kernel drivers are actually loaded below 4 gigs in their entirety. So let's attack the page table. Paging on a 64-bit system works this way.
First you have a virtual address, or a linear address, at the top in red here, that you wish to translate into a physical address, and this is what the memory management unit is doing. The memory management unit starts by reading the physical address in a CPU register called CR3 in order to find the physical base address of a table called page map level 4. You take the topmost bits from the virtual address to point out which entry in that table to use, and in the PML4 entry you have the physical address of the page directory pointer table. You take some more bits from the virtual address to point out the entry in that table, which contains the physical address of the page directory. Take some more bits from the virtual address to point out the entry in that table, which holds the physical base address of the page table itself. Take some more bits from the virtual address and you get the page table entry. That's the entry that we're going to target and corrupt in order to gain kernel execution. What's nice about this is that all four paging structures here are actually loaded below four gigs, so we can access them by using DMA.
Kernel address space starts at the address you see here on this slide. Windows does have kernel address space layout randomization, so there are no fixed virtual addresses between reboots. The kernel is loaded at different places, and drivers are loaded at different places as well, so we can't use that. But if you take a page table entry and have a look at the lowest three bits and the highest bit, which tell you whether the page is present, whether it's a read or write page, whether it's a user or supervisor (kernel) page, and whether it's an executable or non-executable page, those four bits together form what I call a page signature.
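The address split and the four-bit page signature described above can be made concrete with a short Python sketch. It assumes standard x86-64 4-level paging with 4-kilobyte pages (9 index bits per level, a 12-bit offset, and the present, read/write, user/supervisor and no-execute flags in bits 0, 1, 2 and 63 of the entry). The function names are made up for illustration and are not PCILeech's.

```python
INDEX_MASK = 0x1FF  # 9 bits: each paging table has 512 entries


def split_virtual(virtual_address):
    """Return the table indices and page offset encoded in a virtual address."""
    return {
        "pml4": (virtual_address >> 39) & INDEX_MASK,
        "pdpt": (virtual_address >> 30) & INDEX_MASK,
        "pd": (virtual_address >> 21) & INDEX_MASK,
        "pt": (virtual_address >> 12) & INDEX_MASK,
        "offset": virtual_address & 0xFFF,
    }


def page_signature(page_table_entry):
    """Return (present, writable, user, no_execute) from a page table entry."""
    return (
        page_table_entry & 1,
        (page_table_entry >> 1) & 1,
        (page_table_entry >> 2) & 1,
        (page_table_entry >> 63) & 1,
    )


if __name__ == "__main__":
    # Made-up address and entry, purely for illustration.
    address = (0x1F0 << 39) | (3 << 30) | (5 << 21) | (7 << 12) | 0x234
    print(split_virtual(address))
    print(page_signature(0x8000000012345003))
```

A run of such signatures over consecutive page table entries is what the talk goes on to use for recognising a particular driver.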
And if you have a look at a driver or the kernel itself, it spans a number of pages, and I call that collection of page signatures a driver signature. So what I'm doing is searching for the driver signature by walking the page table. Once I find the correct driver to target, I locate the page and rewrite the physical address in the page table entry to a place below 4 gigs which I can control over DMA.
So let's continue on to the Windows 10 demo. In this demo I will use a page table rewrite in order to implant a kernel module. I'm going to execute code, I'm going to dump memory, I'm going to spawn a system shell, and also try to unlock the computer. So let's switch over to the demo. Here we have a Windows 10 computer. We will try to log on to that computer without using a password. So let's go back to the PCILeech device. As you can see, we couldn't log on to that computer without using a password on the domain account. But what we can do is insert the PCILeech device into the computer, and then we can try to load a kernel module into the running kernel by using a page table hijack. In Windows 10, because we are looking for driver signatures, we need to target a specific driver version. So let's do that and use the page table hijack here. We search for the page table location, we hijack the page table, and we wait for a kernel thread to start executing there. We receive execution, and the kernel module is loaded at this memory address. Now we can try to remove the password requirement from that computer, which is fully BitLockered by the way, so that we can log on to it without using a password. In order to do this we need to specify that we are going to use the unlock implant.
It works in a similar way to Inception, but we need to specify that this is all done in kernel code, because we are inserting this kernel module into the target system. In order to insert it we also need to specify the memory address we just received here. So let's do that. Zero is success here, and it says zero. So let's try to log on. As you can see, it's quite easy to log on to this computer.
Let's try to dump the memory of that computer as well. We need to specify the kernel module address of the loaded kernel module here as well. Dumping memory works in a similar way to Linux. First we ask the kernel module that is already inserted to report the physical memory map back to the PCILeech main control program running on my demo computer here. Then it asks the running kernel module to read certain memory chunks that it knows are accessible and store them in the DMA buffer that was already allocated in lower memory. Memory dumping takes around a minute on an 8 gig system, and of course once you have dumped all memory you can run memory forensics tools on it, such as Volatility. You should also be able to, for example, extract credentials with Mimikatz or things like that. So if you want to do that, you can do that as well. And this works on fully BitLockered computers, by the way. So let's wait until the memory dump is complete here.
And let's try to spawn a system shell. In order to spawn a system shell you can use the pscmd kernel implant, and we also need to specify the memory address of the kernel module that is already inserted into the kernel. So let's try to run it and spawn a system shell. And it's as easy as that. So let's check who we are. Thank you very much. As you can see, we're SYSTEM here, and once we're SYSTEM of course we can do everything.
We can disable BitLocker, we can spy on other users' files, and do whatever. But let's not do this here. Because this is a Windows demo, there is one more thing missing here. We're missing a blue screen. So let's run the psblue kernel implant here, and we need to specify the kernel module address here as well. And as you can see, Windows doesn't like me.
Actually, Windows 10 does have some very nice anti-DMA features built in, in the Enterprise version, but they are not enabled by default. Windows 10 can be made rather secure against DMA attacks if the virtualization-based security features are enabled, like Credential Guard and Device Guard. But it's often quite easy for users to mess around with settings in the UEFI, for example disable VT-d or disable Secure Boot and things like that, and then these virtualization-based security features will be disabled in Windows as well. We'll come to recommendations later on.
But let's target the last missing operating system here, which is OS X. Just like Linux, the OS X kernel is located in low physical memory. Its location depends on the kernel ASLR slide, which slides in two-megabyte chunks. OS X nowadays enforces System Integrity Protection and kernel extension signing. System Integrity Protection means that users can't write to certain folders, and kernel extension signing means that you can't load unsigned drivers. Pretty much all Macs today have Thunderbolt, but Thunderbolt is actually protected with VT-d. OS X actually uses the IOMMU in order to protect itself from DMA attacks. So that's kind of boring. What can we do in order to change that? We can visit Apple's website. Thank you, Apple.
And Apple on their website tells us in plain text how to disable VT-d. So, yeah. It's just that. In OS X, by using DMA, we first search for the Mach-O kernel header. Mach-O is the format of binaries on the Mac, including the kernel. Then we search for a nice function to hook. I think I hooked memcpy in this example. Then we write the stage two code into the memory of the target computer, then we write the stage one code, we wait for the stage two code to return with the physical address of the stage three code, we write the stage three code, and then it's demo time.
In this demo I will show you how to disable VT-d in order to gain DMA access, and then we're going to dump the memory and unlock the computer. So here you have a Mac. To the right here you have an ExpressCard to Thunderbolt converter, which you don't really need for this part. All you need in order to disable VT-d is to power on the Mac, which we will do in a second. This was kind of slow here. I think the movie was very slow. Let's try to reopen it. Let's move on here. We boot into recovery mode by pressing Command-R when we are starting the computer. Then you enter recovery mode. There is no password for recovery mode. Then you start the terminal and type nvram boot-args="dart=0", just as Apple tells you on their website, and VT-d is now fully disabled. Once VT-d is fully disabled, we should be able to target the computer over Thunderbolt. So let's do that. Here you have a MacBook Air with that adapter connected to the right. And let's try to log on to that computer without using a password at all.
As you saw, we couldn't log on to that computer, which is kind of boring, so let's insert the PCILeech adapter into the converter here. Let's start by loading a kernel module into the running OS X kernel. And it's as easy as that. We say that we're going to load the kernel module and that we're going to target OS X. The kernel module is loaded at this address, and then we should be able to remove the password requirement from this Mac. So let's run the Apple 64-bit unlock implant here, and we need to specify the memory address of the already inserted kernel module as well. Zero is success here, and we have a status of zero, so we should be able to log on. Let's try to do that. And we're in. Thank you very much.
So what can we do about this in order to protect ourselves better? Of course we can purchase hardware without any DMA ports whatsoever. It's the low-tech variant, and it works perfectly fine. If we do have Windows or something like that, then we can use Windows with auto-booting BitLocker and things like that, and we should be able to disable things like ExpressCard ports in the computers. You can usually do this in the UEFI settings. But then you probably need to change the BitLocker settings so that it triggers if this port is re-enabled at a later stage. Of course, if you don't want to have your Mac's security disabled in recovery mode, you can set a firmware password on the Mac in order to protect yourself. Setting a BIOS password on the PC is also a good idea. Of course, pre-boot authentication is always nice to have. And of course the long-term solution here is for the operating system vendors to actually make full use of the IOMMU that is already in the hardware. Windows 10 has some very nice virtualization-based security features going on there, so Microsoft seems to be doing some very nice work as well.
So what can we use PCILeech for?
Of course we can use it for awareness. That's part of why I'm doing this talk. You saw today that full disk encryption is not really invincible in any way. It's excellent for everyone. Voilà. Another important thing to keep in mind is the software that you use as well. But please, if you want to take a look at this, don't do any evil with this tool.
PCILeech targets 64-bit operating systems. It runs on 64-bit Windows 7 and 10 at the moment. It's able to read up to 4 gigs natively, and if you're able to insert a kernel module it should be able to read all memory of the target system that the kernel can read. And if a kernel module is inserted you can obviously execute code on the target system as well. I have kernel modules for Linux, Windows and OS X at the moment. It's written in C and assembly in Visual Studio. It has a modular design. I tried to make it as modular as possible. You should be able to create your own signatures very easily, and also create your own kernel implants. Actually, to the right here you see a very minimal kernel implant. It's in assembly, and it reads some control registers of the CPU and prints them on screen on the computer running the PCILeech main control program.
But we are missing one thing here. We should try the Linux demo again and see if we have better luck this time. As you saw, we couldn't log on with toor as the default password. So let's pull a file from the Linux system. A nice file to pull is the shadow file. And it's as easy as that, just pulling the shadow file from a running Linux system, which uses disk encryption, by the way. Then we can open the shadow file and have a look at it. The root account here has a very long password hash. Of course we can try to crack it, but it's no fun doing that.
So let's replace it instead, with the default password hash of toor. This is the default password hash of toor. Let's write the file back, and we're going to push it back to the Linux system. We are going to use the file push kernel implant here. And now it should be on the target system. So let's try to log on here and see if it works better this time. And as you can see, we're in.
So when you leave here today I want you to remember that inexpensive universal DMA attacking is here. It's the new reality of today. Physical access is still very much an issue. You should be aware of potential evil maid attacks, for example if you bring your Mac to security conferences. And please do remember that disk encryption is not invincible. After this talk I will be making the GitHub repo public at this address here. Please give me a couple of hours in order to do that, but I will definitely do it today. And thank you very much to Joe for SLOTSCREAMER. You've been a huge source of inspiration for my work here, so thank you very much, Joe. Also thank you to Inception for being a big source of inspiration for my work, and thank you to the guys at PLX Technologies for creating this wonderful chip. So thank you, thank you very much for today.