>>My name is Morten Schenk. I come from a small consulting firm in Denmark and I do a lot of [Inaudible yelling from audience] Better? Great. I do a lot of blogging about exploitation, both user mode and kernel mode. You can read it on the blog, so look it up. This talk is primarily about how to leverage kernel vulnerabilities, all from low integrity inside a sandbox. So you'll see a lot of hex and C code here, and some 0-days. First we'll go through some brief history of kernel exploitation to get everyone up to date. Then we'll look at the core mitigations that have been put into place in the latest versions of Windows 10 and see how we can actually overcome them. So, write-what-where vulnerabilities. The best case of this vulnerability class is that you can write a controlled value at a controlled address somewhere in the kernel. A more commonly found bug will most likely write a non-controlled or semi-controlled value at a controlled address in the kernel. Once that's possible we have to leverage it, right? We have to get kernel code execution. The most important part is that we have to know where to write, so that's one of the problems. The techniques shown here are for write-what-where vulnerabilities, but they can be used for other vulnerabilities as well, like pool overflows or use-after-frees. So looking at Windows 7, it was actually quite a bit easier back then. First of all, you could just allocate the code directly in the NonPagedPool, because it was executable. Then use the built-in APIs of the operating system to return the address of the allocation. Then use the [indiscernible] to overwrite a function address, like in the HalDispatchTable, and then just call that to get your code executed in the kernel. Even easier, you could allocate user-mode memory and execute that from the kernel as well.
So it was very easy back then. Going forward, Microsoft made quite a lot of improvements security-wise. First of all, the APIs you call to get kernel addresses have been blocked from a sandbox, so it's not possible to get these allocation addresses anymore. Furthermore, most built-in APIs use NonPagedPoolNx as the standard now, which means that the allocations are not made in executable pool memory. So even if we knew where they were, we couldn't execute code from there. And finally, supervisor mode execution prevention (SMEP) has been implemented as well, which means if we try to execute code in user mode from the kernel, we get blocked too. So the techniques from Windows 7 don't really work anymore. What we need is a primitive, a read/write primitive, just like we know from browser exploits, only in the kernel. Some of the best known ones are the bitmap primitive and the window primitive, and these have existed for some years. The idea of the bitmap primitive is that you can get the address of a bitmap through a table called the GdiSharedHandleTable. Once you have the location of a bitmap, you can use the vulnerability to overwrite its size. Make sure to have two bitmaps after each other. Then use the first one to overwrite the second one, overwriting its pointer to the data area, and use the second one to read and write memory. Similarly, we can use a window primitive, which is kind of the same thing. We leak the address of the window, this time using another table called the user handle table. We overwrite the size of what's called the extra bytes. And then we use the SetWindowLongPtr API to change the extra bytes. And since we have overwritten the size of the extra bytes, we can actually change the strName pointer of the window.
And this allows us to read and write arbitrary memory again using user-mode APIs. To get kernel execution of user-mode pages, we have to bypass SMEP, or get DEP in the kernel disabled. The commonly used method is overwriting the page table entries using the write primitive. We find the address of the page table entry for the code we want to execute and simply either flip the NX bit or change the user-mode page into a kernel-mode page. Then we can execute it again. Sometimes a KASLR bypass in the kernel is needed as well for some exploits, especially when you actually want to run the shellcode. There are two known techniques. First of all, the HAL heap has been at a static address for many years, and there is a pointer there into the NT kernel at a static address. Likewise, the SIDT instruction can be used: you get an address back for the interrupt descriptor table, and there is an NT kernel pointer there. Then you use a read primitive to leak this pointer and find the base address, by simply taking the NT pointer and going backwards in the driver to find the PE header. This is kind of where kernel-mode exploitation was before the Anniversary Update last year. In the Anniversary Update, Microsoft made quite a few changes, which we'll go through here, to try to mitigate these ways of doing kernel exploitation. First of all, they randomized the page table entries, which means we can't just flip the bit because we don't know where it is anymore. They removed all the kernel addresses of the bitmaps and other objects from the GdiSharedHandleTable, which means we don't know where to write anymore. So we can't use the bitmap primitive. And also the SIDT instruction has been mitigated: if we're running with Hyper-V, it simply doesn't give back the real value, so we don't know where the NT pointer is.
So let's break some of this stuff. Additionally, there were also mitigations which weren't really public. What they did is make sure that the strName pointer of a window object has to point inside the desktop heap, because that's where the string actually gets allocated. So if we try to read or write outside the desktop heap by overwriting the pointer, we crash. So this also breaks the window primitive. Let's see if these mitigations actually work. Let's look at the bitmap first. The bitmap is stored somewhere in memory. It's actually stored in what's called the large paged pool. This takes place if the bitmap is at least thirty thousand bytes long, or more. Of course the large paged pool is randomized on reboot, so we don't know where it is beforehand. So we need some kind of kernel information leak to find it. Luckily, if you look at the TEB, we have the Win32ThreadInfo field, which actually contains a pointer into the kernel. It doesn't point into the large paged pool, but it does point close to it. So what we do is try to stabilize the pool. First we allocate a few very large bitmap objects. Once this is done, we can add a static offset to the pointer we found, and we're actually pointing inside the bitmaps. Sadly, we're only pointing inside the data of the bitmaps, so we cannot really use it for our exploit. But what we can do is delete the second of these bitmap objects and allocate around ten thousand new bitmap objects, of a page each. We'll find that this pointer now actually points into the new bitmaps. This is a way to find the bitmaps again. Since we know where the bitmaps are, we can just overwrite the size and again use two consecutive bitmaps to get a primitive back. So even though Microsoft removed the addresses from the table, they're not really needed.
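The two-consecutive-bitmaps idea can be pictured with a toy model. This is only a sketch over a Python `bytearray`: the 16-byte header layout (size, then data pointer), the `ToyMemory` class and every function name are hypothetical and do not reflect the real kernel surface object or the real GDI APIs. It shows why enlarging the first ("manager") bitmap's size lets it rewrite the second ("worker") bitmap's data pointer, after which the worker reads and writes wherever it is aimed.

```python
import struct

HEADER = 16  # hypothetical header: 8-byte size + 8-byte data pointer


class ToyMemory:
    """Flat fake address space holding toy bitmap objects."""

    def __init__(self, size):
        self.mem = bytearray(size)

    def make_bitmap(self, addr, size):
        # Header = (size, pointer to the inline data right after the header).
        struct.pack_into("<QQ", self.mem, addr, size, addr + HEADER)

    def set_bits(self, bmp, data):
        size, ptr = struct.unpack_from("<QQ", self.mem, bmp)
        data = data[:size]  # the only bounds check is the size field
        self.mem[ptr:ptr + len(data)] = data

    def get_bits(self, bmp, n):
        size, ptr = struct.unpack_from("<QQ", self.mem, bmp)
        return bytes(self.mem[ptr:ptr + min(n, size)])


def corrupt_size(mem, manager, data_size):
    # Stand-in for the write-what-where: grow the manager's size field so
    # that its data area now also covers the adjacent worker's header.
    struct.pack_into("<Q", mem.mem, manager, data_size + HEADER)


def aim_worker(mem, manager, data_size, target):
    # Keep the manager's own data, then rewrite the worker header behind it.
    keep = mem.get_bits(manager, data_size)
    mem.set_bits(manager, keep + struct.pack("<QQ", 8, target))


def read_qword(mem, manager, worker, data_size, target):
    aim_worker(mem, manager, data_size, target)
    return struct.unpack("<Q", mem.get_bits(worker, 8))[0]


def write_qword(mem, manager, worker, data_size, target, value):
    aim_worker(mem, manager, data_size, target)
    mem.set_bits(worker, struct.pack("<Q", value))
```

The design point is that only one field ever needs to be corrupted by the vulnerability itself; everything after that goes through the objects' ordinary read and write paths.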
And you see here I simulated the write-what-where just by overwriting the length, and then [indiscernible] execution with the code it's possible to read out the contents of the kernel and also write to the kernel. So looking at the window primitive: we're not really allowed to write outside the desktop heap, and this is due to a new function called DesktopVerifyHeapPointer. Every time we try to use the strName pointer, by either reading or writing through it, it has to be validated by this function. It takes the base address of the desktop heap and its size, and checks that the pointer is between the two. What we notice is that the heap pointer comes from an object, a tagDESKTOP object. There's actually no validation performed on this object pointer, and the pointer is in the header of our window object. Which means if we can find this pointer and replace it, we control the verification address. So what we do is use the leak we know from the user handle table to leak the address of the window. Then when we overwrite the strName pointer we also overwrite the pointer to the tagDESKTOP object, because it's in the header as well. Then when we try to read or write, we control where the desktop heap is, so verification succeeds everywhere. Again we can see here, simulating the write-what-where, that just like before we can read and write to the kernel. So yeah, that's what was put in the Anniversary Update, but let's talk about what's in the Creators Update. Let's see what was patched there, because they did additional mitigations. What we find, especially for the window object, is that the user handle table, which was used to disclose the addresses of the window objects, has been changed. Before, we saw a lot of kernel addresses there, which were all the objects. They're removed now.
So we don't know where the objects are, so we don't know where the windows are anymore. Additionally, the field called client delta, which is the offset from the user-mode mapped desktop heap to the actual desktop heap, has been removed as well. So there's not really any way here of finding the window objects anymore. Additionally, we used the SetWindowLongPtr API to write the extra bytes into the kernel and overwrite the strName pointer. In the new update we see that the content written by this call, which was [indiscernible] if you're interested, is no longer written to kernel mode. It's written to user mode. So even if we knew where the window was and overwrote the length of its extra bytes, when we try to actually overwrite something, it doesn't. It writes in user mode instead. This of course breaks the primitive; it doesn't work anymore. Some additional changes are that the size of the bitmap object header has increased. This doesn't break the primitive, but we need to make sure that we change the size so that the allocation alignment still works. And now the HAL heap is also randomized, so we don't know where the NT pointer is anymore. So they actually tried to do a lot of stuff to break the window primitive. Let's see if that actually works. We said the client delta is now gone. If you inspect memory, we find that it's been replaced by a user-mode pointer, and if we check what's at the user-mode pointer, we actually find the desktop heap. The kernel desktop heap is mapped to user mode to make lookups faster, but that also means we actually have the kernel addresses. The user handle table was nice because it was metadata, so it was fast to do a lookup. We don't have that anymore, but we have the actual data. So what we do instead is just manually search for it.
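The manual search amounts to scanning the user-mode copy of the desktop heap for a known value. A minimal sketch, assuming the object being looked for starts with its own 8-byte handle value and that the kernel-side base of the heap is already known; both assumptions, the 8-byte stride, and the function names are illustrative and not taken from the talk.

```python
import struct


def find_handle_offset(heap_copy, handle, stride=8):
    """Return the offset of the first aligned QWORD equal to `handle`."""
    needle = struct.pack("<Q", handle)
    for off in range(0, len(heap_copy) - 7, stride):
        if heap_copy[off:off + 8] == needle:
            return off
    return None


def kernel_address(kernel_heap_base, offset):
    # The user-mode view mirrors the kernel heap, so an offset inside the
    # copy is assumed to be the same offset inside the kernel heap.
    return kernel_heap_base + offset
```

A linear scan is slower than the old table lookup, but a desktop heap is small enough that the difference should not matter in practice.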
So we take the handle value and search through the desktop heap until we actually find it. Once we find it, we know where it's at. So we're still able to leak the address. But there's an additional problem, right? Even though we know the address and overwrite the size of the extra bytes, we can't use them anymore, because we just write into user-mode memory. Looking at how that works, we find that the size of the extra bytes is actually defined when we register the window class. When we register the window class we set different parameters as well. We also set a parameter for extra bytes on the class object, and even better, it also has an API to set these extra bytes. And it's not the same one. This one is called SetClassLongPtr. And lo and behold, even though Microsoft tried to mitigate it, they only changed one of the APIs. They didn't change the second one. So the extra bytes of the tagCLS are still placed in the kernel. Which means we can allocate a tagCLS before the window object, and use the extra bytes of the tagCLS object to overwrite the strName pointer of the window object. And this way we have a read/write primitive back again. It's not easy to mitigate them. So this is clear: even though the Creators Update made a lot of changes, we still have our primitives. But they did a lot of stuff, right, to make kernel ASLR better, and looking in memory almost all of kernel memory is randomized. The only place I know of that isn't is the user shared data structure for the kernel. But it's not executable, so there's no point, is there? So it's not really interesting. The HAL heap is randomized and SIDT is mitigated, so we need some new way to leak an NT kernel pointer. My idea was that perhaps we can find a leak that is primitive related. We need two bypasses, since we have two primitives, and I wanted it to be an NT kernel pointer that we leak. So to pursue this idea, I started with ReactOS.
ReactOS is an open source re-implementation of Windows XP, with reverse engineering of all the structures. Which means that undocumented kernel structures are found there, even though they're for Windows XP and 32-bit. That might give some hints. So looking at the data structure for the bitmap, or surface object as it's really called in the kernel, we find a field called hdev. The explanation for hdev is that it's a pointer to an object called a PDEVOBJ. Inspecting further, we find that ReactOS has documented this object as well. It actually contains a lot of function pointers, which means these function pointers point into some kind of kernel driver. So this again gives us a KASLR bypass. But if we check our bitmap object, we find the hdev field is empty. There's nothing there, which means we cannot find the NT kernel this way. Luckily, the bitmap we created using the [indiscernible] API isn't the only kind of bitmap. There are several other APIs to create bitmaps with. One of them is CreateCompatibleBitmap. Trying this API shows that it actually does populate the hdev field. And once it is populated, we can verify in the debugger that it does contain a pointer into a driver. It's not the NT kernel, but it is a driver. So how do we do this in an exploit? Well, we know where the first bitmap is at. We found the data, so we go to offset two thousand, we free this bitmap, then we re-allocate it with a compatible bitmap. We just spray a couple hundred of these to make sure that one of them is re-allocated in the same spot. Once that is done, we just read out the function pointer into the driver. The reason I took this function is that it contains a call, at two B, straight into the NT kernel. So from this it's very easy to read out an NT kernel pointer and then again use the read primitive to find the base address of the NT kernel.
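Finding a base address from a leaked pointer, by walking backwards until the PE header shows up, can be sketched as follows. The `read` callable stands in for whatever read primitive is available, and `make_reader` builds a fake one over a byte buffer so the sketch can run on its own; both names, the page-sized step and the `max_pages` limit are assumptions for illustration. The `MZ` magic, the `e_lfanew` field at offset 0x3C and the `PE\0\0` signature come from the PE file format.

```python
import struct

PAGE = 0x1000


def make_reader(image, base):
    """Fake read primitive over `image`, pretending it is mapped at `base`."""
    def read(addr, n):
        off = addr - base
        if off < 0 or off + n > len(image):
            return b"\0" * n  # unmapped memory reads as zeros in this model
        return bytes(image[off:off + n])
    return read


def find_image_base(read, leaked_ptr, max_pages=0x2000):
    """Step back one page at a time until an MZ header with a valid PE signature appears."""
    addr = leaked_ptr & ~(PAGE - 1)
    for _ in range(max_pages):
        if read(addr, 2) == b"MZ":
            e_lfanew = struct.unpack("<I", read(addr + 0x3C, 4))[0]
            if e_lfanew < PAGE - 4 and read(addr + e_lfanew, 4) == b"PE\0\0":
                return addr
        addr -= PAGE
    return None
```

Checking the `PE\0\0` signature as well as `MZ` avoids stopping early on a page that merely begins with the two bytes `MZ`.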
So this way we have a generic bypass to find the NT kernel using the bitmap primitive. Looking at the window primitive, we again turn to ReactOS. The header structure for the window object is kind of convoluted. It's a lot of structures nested in each other. But if we follow the chain, we find that a lot of the different header structures end up pointing at the ETHREAD, of the process. And this is very interesting, because it contains a pointer into the NT kernel. So again we just use the read primitive and the location of the window object to read out the NT kernel pointer, and then find the base address by looking for the PE header. So in this way we have a generic way to find the NT kernel base address no matter which primitive we use. As I said, I'm going to be talking about some extra stuff here. While I did this research I found a couple of other bypasses. One of them is primitive independent, and it comes from the TEB. As I said, we use the Win32ThreadInfo field to disclose the address of the bitmaps. But there's more to it than that, because it's actually a thread info pointer, and the thread info points to the ETHREAD. As you saw before, the ETHREAD contains the NT pointer. So this way we can leak the NT pointer without allocating any kind of objects, as long as we have a read primitive. So this would work with any other primitive as well. [indiscernible] If we need to know where the bitmaps are, instead of using the TEB, we can also use the desktop heap. It actually contains a pointer we can find there; we use the same static offset from it and get the address of the bitmaps again. This one was fixed in the latest update, after the submission of the talk, but it still works on the Anniversary Update, using the thread local storage pointer.
You can also disclose an address into the kernel, which can be used to find the address of the bitmaps. And finally, also mitigated in the latest update but still working on the Anniversary Update: instead of actually allocating the window object and using the header of that, we can use the DCE structures directly, by searching through them and finding a pointer to the NT kernel, simply by looking through structures until we land on a correct window object. So this way we're sure we have a way to bypass KASLR even if one of them is fixed. Although it's always cool to bypass KASLR, what do we need it for? Well, we need to do something with it, and what we did before is we used our write primitive to overwrite the page table entries, so the pages were either executable in the kernel, or we flipped the bit so a user-mode page would turn into a kernel page. The reason we could do that before was that the page table entries started at a static base address, so we could just calculate the address of the page table entry for any address. But now, since it's randomized, we cannot do that anymore. So we need to try to de-randomize it. And my thought was that even though it's randomized, the kernel must use the page table entries a lot. So it must have APIs for this, and these APIs must work even with randomization. So I looked and found a couple of different APIs, and the simplest one I found is called MiGetPteAddress. It's simply used to translate between an address and its PTE. Looking at it in IDA, on the left side, we find that it has a static address. But that value is not really compiled into it; it gets changed at run time, because we can see on the right side that at run time the address is different. That also means that when the driver is running in memory, we have a way of actually finding the base address of the page table entries.
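The address-to-PTE translation is plain arithmetic, shown here as a small sketch. The shift, the mask and the pre-randomization base `0xFFFFF68000000000` are the commonly documented x64 Windows values rather than figures quoted in the talk, and the helper names are hypothetical. The bit positions (bit 63 for no-execute, bit 2 for user/supervisor) are those of the x86-64 page table entry format.

```python
STATIC_PTE_BASE = 0xFFFFF68000000000  # fixed base before the randomization
NX_BIT = 1 << 63    # no-execute bit in an x86-64 page table entry
USER_BIT = 1 << 2   # user/supervisor bit: set means user-mode page


def pte_address(virtual_address, pte_base):
    """Address of the PTE that maps `virtual_address`, given the PTE base."""
    return ((virtual_address >> 9) & 0x7FFFFFFFF8) + pte_base


def clear_nx(pte_value):
    # Result: the page is no longer marked no-execute.
    return pte_value & ~NX_BIT


def make_supervisor(pte_value):
    # Result: the page is treated as a kernel-mode page.
    return pte_value & ~USER_BIT
```

Once the base is randomized, only the `pte_base` argument changes; the arithmetic stays the same, which is why recovering that one value is enough.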
So what we do is find this function and from it read the randomized base address. And how do we find this function? Well, one way is of course, since we have the base address of the NT kernel, to just add a static offset. The problem with this is that it doesn't work across patches, right? Every time there's a patch, the offset will change and we need to fix it. So a better way would be to do this dynamically. Since we have a read primitive, I would like to look it up at run time. What we do is dump the contents of the NT kernel, then search through it using a hashing function. As it turns out, a hashing function that just adds the first QWORDs of the function is collision free. So we just calculate the hash beforehand, and then during the execution of the exploit we use the read primitive to dump the contents and go through them until we find it. And doing this finds it across different versions. From that we can just read the value out, since it's at offset thirteen. We read out the base address and then we can use the old formula again with the updated address. And from this we simply find the page table entry, and we're back to where we were before. We just allocate, in this case, the shellcode directly in kernel memory, flip the NX bit in the page table entry with the write primitive, and then we can call the shellcode like before, by overwriting the HalDispatchTable and then invoking it. Similarly, we could also just allocate it in user memory, flip the bit so it becomes a kernel page instead, and execute it. That will work as well. So this is like what we did before. So even though they implemented a lot of changes, a lot of mitigations, we can bypass them all. And just to recap the steps here: we use the vulnerability to create a read/write primitive. From that we leak the base address of the NT kernel using either primitive.
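The hash-based lookup can be sketched over a byte buffer standing in for the dumped kernel image. Several things here are assumptions: the hash sums only the first two QWORDs so that the hashed window stops before the value patched at run time, the spoken "offset thirteen" is read as hexadecimal 0x13, and the function names are made up for the sketch. The talk only says that the first QWORDs are added together.

```python
import struct

MASK64 = 0xFFFFFFFFFFFFFFFF


def qword_hash(blob, offset, count=2):
    """Sum `count` little-endian QWORDs starting at `offset`, modulo 2**64."""
    words = struct.unpack_from("<%dQ" % count, blob, offset)
    return sum(words) & MASK64


def find_function(dump, wanted_hash, count=2):
    """Return the first offset in `dump` whose leading QWORDs hash to `wanted_hash`."""
    for off in range(0, len(dump) - 8 * count + 1):
        if qword_hash(dump, off, count) == wanted_hash:
            return off
    return None


def read_pte_base(dump, func_offset, imm_offset=0x13):
    # The 8-byte immediate inside the function holds the randomized base.
    return struct.unpack_from("<Q", dump, func_offset + imm_offset)[0]
```

The hash is computed once from a static copy of the function, so the same value keeps working across builds for as long as the function's opening bytes stay unchanged.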
We locate the address of this function. From that we get the randomized base address of the page tables, and we can then calculate the PTE of the shellcode address. We copy the shellcode to the right page, then overwrite the PTE of the shellcode and run it. So yeah, this is just like we did before. There's not really anything that changes how to do kernel exploitation now. It's the same thing. But I thought, I liked the old days of Windows 7, where we could allocate executable kernel-mode memory and just execute it without flipping any bits in the page table entries. Is that possible in Windows 10? That would be awesome. So I started looking into that. The first thing you come across is: how does the kernel actually allocate pool memory? Well, it uses an API called ExAllocatePoolWithTag, and this API takes some arguments. One of them is the pool type. And even though the new standard is NonPagedPoolNx, the old one is still supported; it's now called NonPagedPoolExecute. So if we invoke this API with the correct pool type, we allocate executable pool memory, and this API also returns the address of that pool memory. The problem is of course that this API is a kernel-mode API. We can't call it directly from userland. What we could do is of course overwrite the HalDispatchTable with this API and call it. The problem is that we need control of the arguments. When we invoke functions through the HalDispatchTable, we can only specify two arguments, and they have to have specific values for it to actually work. So this system call doesn't work. We need a different one. Looking around, I found the [inaudible] NtGdiDdDDICreateAllocation. Trying to follow that in the kernel, we find it jumps through the win32k kernel drivers until it ends up being called through a function table.
So it's a very thin trampoline; nothing gets touched on the way there. Looking at this function table, we find that it calls into a different driver, and best of all, the arguments are not modified between our system call in userland and the execution of the function through the function table. Additionally, the prototype of this function, the user-mode function, is that it returns a QWORD, which could be an address. So it fits all our requirements to call ExAllocatePoolWithTag. The only thing missing is that we should be able to actually write that address there. Inspecting the function table, we find that it's writable. So we can just patch it using our write primitive and overwrite an entry with the address of ExAllocatePoolWithTag. But of course we need to find the table, and since it's a function table, it contains pointers we don't know beforehand. So we cannot use a hashing function to find it directly. We need to find a function which uses this function table. One function I found which works is DrvOcclusionStateChangeNotify. It's quite a simple function, with a simple header, and it calls into the function table. The problem is that it's located in the win32kfull.sys driver, and we don't have the address of that. So we've turned one problem into another. We need to find that kernel driver. One way to do that is using the PsLoadedModuleList. It's a linked list which contains all the kernel drivers currently loaded, and looking through the structure we find that the name of a driver is located at offset sixty, and the base address of that driver is located at offset thirty. So again we can use the read primitive to walk this list, looking at all the names until we find the correct driver name, and just read out the base address. So now we have the base address. But again we've turned the problem into a different one. Now we need to find the PsLoadedModuleList.
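Walking the module list with a read primitive can be sketched against a fake address space. The `ToyKernel` class and `find_module_base` are hypothetical names, and the spoken offsets "sixty" and "thirty" are read as hexadecimal 0x60 and 0x30. The sketch additionally assumes the usual x64 entry layout, in which the forward link is the first field and the 16-bit name length sits at 0x58, just before the name buffer pointer; neither detail is stated in the talk.

```python
import struct


class ToyKernel:
    """Sparse fake address space; read() stands in for the read primitive."""

    def __init__(self):
        self.chunks = {}

    def write(self, addr, data):
        self.chunks[addr] = bytes(data)

    def read(self, addr, n):
        for base, data in self.chunks.items():
            if base <= addr and addr + n <= base + len(data):
                return data[addr - base:addr - base + n]
        raise ValueError("unmapped address 0x%x" % addr)

    def read_qword(self, addr):
        return struct.unpack("<Q", self.read(addr, 8))[0]


def find_module_base(k, list_head, wanted):
    """Follow forward links from `list_head` until the name matches `wanted`."""
    entry = k.read_qword(list_head)  # first entry in the list
    while entry != list_head:        # the list is circular
        length = struct.unpack("<H", k.read(entry + 0x58, 2))[0]
        name_ptr = k.read_qword(entry + 0x60)
        name = k.read(name_ptr, length).decode("utf-16-le")
        if name.lower() == wanted.lower():
            return k.read_qword(entry + 0x30)
        entry = k.read_qword(entry)  # forward link is the first field
    return None
```

The comparison is case-insensitive because driver names are not guaranteed to be stored in one consistent case.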
And since this is a linked list, we can't use a hashing function, because we don't know the values in it. So again we need to find a function that uses it. One such function is KeCapturePersistentThreadState, which is located in the NT kernel. And we have the NT kernel base address, right? So we can use a hashing function to find this function, from that get the PsLoadedModuleList, from that get the base address of win32kfull.sys, from that get DrvOcclusionStateChangeNotify, and from that get the function table. Luckily all of this takes less than a second when running, so it's not a problem. Once we've got that, we simply overwrite offset sixty-eight with the address of ExAllocatePoolWithTag and then call it. It allocates pool memory and returns the address of the allocated pool memory. Then we use the write primitive to copy the shellcode into it, and execute it by overwriting the function table entry with the address of the allocated pool memory. This is almost like the days of Windows 7: allocate executable pool memory and execute it. This has a few more steps, so it's not more efficient in any way, but it's a different way. So if one method gets fixed and doesn't work, we can use another one. Yeah. So let's try to see it in action. We have a Windows 10 here. I'm running at low mandatory level, so just like inside a sandbox. So we try and run it. It does the pool spray and wants us to simulate the write-what-where, so let's do that from the kernel debugger. And we see here that we get the address, and it contains the length of the bitmap, just as we want. So we just simulate the write-what-where. I increase the length here and then continue execution. We go back. We haven't had a crash, but we got a SYSTEM shell. [applause] And as you can see, from the time I ended the simulated write-what-where, the execution took, yeah, less than a second.
So all these lookups don't really take any time in reality. So in summary: even though a lot of mitigations have been added to the kernel in the latest versions of Windows 10, none of the old techniques are really broken. We can revive them, we can bring them back. So read/write primitives work. Page table entry overwrites work. We can actually leak the NT kernel in new ways, which we didn't use before, but we can leak it. And we can also now allocate executable kernel pool memory. The code for this is already on GitHub, so you can get it there if you want to play with it. I'll just say that of course I didn't find out all of this myself; there was prior research here, which I want to credit people for. Yeah. That's it. Thank you for listening to me. [applause]