>> All righty. The last talk of the day, before we go to the closing ceremonies. With that, here are Svetlana and Ivan.
>> So hi guys. I am a student at Moscow State University. My co-speaker is Ivan, a Masters student at the same university. Today we're going to present our work on shellcode detection for ARM-based platforms. After that we'll briefly discuss the differences between the two platforms, x86 and ARM. Nowadays it's well known that we have a lot of shellcodes, and a lot of shellcode detection algorithms and approaches for catching shellcodes on the x86 platform. Our research is mainly focused on whether those methods are applicable to ARM-based platforms, how the differences between the two platforms can impact the structure of shellcode and some of its other features, and what we can learn from that. So let's start. Okay, so we know that the number of ARM-based devices is increasing significantly; actually we know that it's already (inaudible) times that of the x86 platform. For example our cell phones and tablets, and network devices: a lot of routers and controllers run on ARM. There are already ARM laptops, for example, and so on. We still have a huge base of code written in C and C++. That is code we want to inherit from other systems; we want to reuse it somehow. It contains a lot of vulnerabilities, and they're still there. Every time I speak about this problem with people who are interested in virus detection and so on, I hear one simple question: is this problem decidable at all? Because such code could do a huge variety of things, so how is it possible to detect it? The answer is that we exploit certain vulnerabilities, like memory corruption vulnerabilities. We want to overwrite some pointers, and this leads to some structural limitations on shellcodes. We will focus on that later.
That's a simple example of a shellcode that targets the stack. Typically a stack shellcode has an activator, which can be some kind of code that tries to find its own location in the address space of the executing process. And if we want to evade detectors, like signature-based detectors, we want to encrypt our code somehow, so typically we also have a decryptor there. The situation maybe isn't that bad in itself, because, if I remember everything correctly, we have known about the problem for 20 years already. So we have a lot of protection techniques, like stack canaries, ASLR, DEP and so on. Probably we shouldn't have a problem but, unfortunately, it all gets bypassed. But back to our problem. As I mentioned before, we have a huge base of detectors which focus on detecting shellcodes on x86. For the ARM platform we have shellcodes; they already exist in public repositories such as Exploit Database and so on. But unfortunately we don't have a shellcode detection method that would be at least a little bit smarter than simple signature-based matching. So what are we going to do about it? First of all, as I mentioned before, we want to understand how the two platforms differ from each other. Of course they are two different platforms and the number of differences is huge, but we want to focus on those that can impact what shellcode looks like. So let's briefly discuss them. First of all, the instruction length is fixed on the ARM platform and variable on x86. On x86 it ranges from 1 byte to 16. On ARM-based platforms we have 32 bits for ARM mode and 16 bits for Thumb mode. On x86 this leads to such things as self-synchronizing disassembly: it doesn't matter from which byte of the stream we start to disassemble our code, there will certainly be some point at which the disassembly synchronizes. It helps us to detect shellcode, because we don't have to try each offset. But we can probably miss some significant instructions.
So there's the possibility that we don't understand that we are looking at shellcode or something bad. As I mentioned before, on ARM we have two different CPU modes, and there's the ability to switch between them dynamically. In the shellcodes we have already analyzed for the ARM platform, we have already seen this used for obfuscation. For example, you try to disassemble your shellcode, your piece of data, in ARM mode. You have some signatures for it, and you try to detect those signatures. If the code switches to another CPU mode dynamically, it breaks your signature. A simple obfuscation technique. Another interesting thing is conditional instruction execution. On the ARM platform we have several groups of instructions, and some of them can be executed or not executed; it strictly depends on the flag values. Think, for example, of shellcode where you construct your malicious (inaudible) from a legitimate binary, from pieces of a library. We're not speaking about that here, but it's probably the same kind of high-level idea. Imagine a totally legitimate executable. If you construct the executable from conditional instructions, you can have your malicious (inaudible) executed just by changing the flag values. To be honest, we haven't seen such shellcodes yet, but we will probably see them in the future. For me it's very interesting. On x86, to get access to the program counter we have to have some kind of GetPC code. In ARM we don't have to, because there is direct access to it. ARM is a load-store architecture, so it isn't possible to access memory directly from arithmetic instructions. And function arguments and return addresses go to registers, not to the stack. Both go to registers. So what can you do with shellcodes? How can we run one if the return address is not on the stack? Actually, in ARM functions call nested functions all the time, and then the outer function's registers go to the stack.
It's totally executable. Definitely. Here is a simple example of conditional execution. Conditional execution was used in embedded systems. It was used when compiling programs because it was critical: in embedded systems time limitations and memory limitations are very critical. But, again, as I mentioned before, it's relatively new for obfuscation. Here is an example of Thumb CPU mode and ARM CPU mode. It is the same piece of code, and we can see that we can compress our binaries and have some free space to do everything else. We also analyzed the detection techniques which are used for detection of shellcode on the x86 platform, and here's what we found. Static analysis, which we use for detection of x86 shellcode, is very difficult. Take, for example, analysis of the control flow graph (inaudible). With conditional execution it's very difficult to construct such a control flow graph. It's very time consuming, so it's not reasonable to use it here. That makes it all much more difficult, as I mentioned before. Okay. What's the next step? We already have some ongoing research on detection of shellcodes for the x86 platform. Basically, how we did it: we identified a set of features which are typical for shellcodes. Not features of the code bytes; it's not signature based. It's typically behavior based, and it tries to analyze the specific types of obfuscation which are typically used by shellcode writers. Now we want to understand whether these features are applicable to ARM-based platforms, and whether we can identify something new. So here is the list of static features.
>> Here is the list of features which could mark a shellcode. Half of the features came from x86 shellcodes, and the remaining ones are ARM shellcode features. We'll run through this list of features and then check them on examples of real shellcode. The first feature is correct disassembly of the data.
The point is that a shellcode is an executable program, so it has a continuous chain of instructions. If you disassemble simple random data into a chain of instructions, with high probability this chain will not be particularly (inaudible). So with a long chain we can be fairly sure that we see a shellcode. The second feature is one specific to shellcode for ARM. It represents a common technique of writing ARM shellcodes: more than 80 percent of existing ARM shellcodes use this technique, so it's a good feature. Another one tells us that the shellcode is trying to find its location in memory. This feature is a changed version of the x86 feature, the so-called GetPC code: the Get-UsePC code. The difference is that we check not only the access to information about the location, but also how the shellcode uses this information as it runs. Then there are features for correctly formed system calls or library calls. Every shellcode is an executable program, so it needs to make system calls with the right arguments, and we can use this as a detection characteristic. Another one tells us that the shellcode contains (inaudible). A shellcode can be encrypted, and then it requires some routine to decrypt it and to direct the control flow to the shellcode (inaudible). The next feature is (inaudible) which (inaudible). We can check the incoming flow or (inaudible). And the remaining ones are normal conditions for every shellcode. Okay, let's see a few examples. The first and simplest feature is the length of the disassembled chain. With the help of this feature we can distinguish random data from executable code. Here you see the shellcode and random data. Both were disassembled into chains of instructions. It seems that the shellcode chain is longer than the random data chain. So if we want to detect a shellcode, we should check the length of the chain against some threshold. But ARM code has a huge density, so random data could be disassembled into a long chain too. How can we find the necessary threshold?
The answer is that the shellcode itself is surrounded by random data, and that data increases the length of the disassembled chain of the shellcode. So it's not a problem to find a proper threshold. The second feature. This is the most interesting feature, I think. This feature tells us that the shellcode changes the CPU mode to Thumb. More than 80 percent of the ARM shellcodes on the web were created with this technique. What is this technique? Shellcodes are written in Thumb mode, and at the beginning of the code they place a switch to Thumb mode. To change the CPU mode we need to make a jump to the address of the Thumb code, and the least significant bit of the address should be set to one. In the ARM architecture code is aligned to four bytes, so we can use the last bit of the address for this information. Let's take a look at the example. Here's a shellcode that you can download from Ex….com. At the beginning of the shellcode we have the add instruction, which takes the program counter, writes it to the r6 register and adds one, which marks Thumb mode. The next instruction is a branch instruction, which makes the switch between CPU modes and jumps to the Thumb code. This technique is used to make the shellcode smaller, or to provide more functionality in the same buffer size, because the length of ARM instructions is four bytes and of Thumb instructions two bytes. So the code is far more compact. That is why more than 80 percent of the shellcodes have been created this way. But it's not only about the size of the shellcode; one can go further and use this technique for obfuscation. One can divide the code into small parts and write every part in a processor mode that is different from that of the preceding part. So if we try to disassemble this from every offset, every instruction chain will be small, because instructions for one mode may not disassemble as instructions for the other processor mode. But the attacker still needs to perform the switching between these modes. So every disassembled chain should contain this switch routine.
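As a rough illustration of the mode-switch feature just described, the following Python sketch scans a byte buffer for an ARM-mode `add Rd, pc, #odd` followed shortly by `bx Rd`. The function name `find_thumb_switch`, the `window` parameter and the decision to try every byte offset are assumptions made for this example; it is not the speakers' implementation, only the standard A32 encodings of the two instructions they describe.

```python
import struct

def find_thumb_switch(data: bytes, window: int = 4):
    """Return (offset, register) pairs where an ARM-mode 'add Rd, pc, #odd'
    is followed within `window` instructions by 'bx Rd'.
    Hypothetical helper: little-endian code is assumed and the condition
    field of both instructions is ignored."""
    hits = []
    for off in range(0, len(data) - 7):
        (insn,) = struct.unpack_from("<I", data, off)
        # ADD with an immediate operand, S bit clear, Rn == pc (r15)
        if (insn & 0x0FFF0000) != 0x028F0000:
            continue
        rd = (insn >> 12) & 0xF
        shift = 2 * ((insn >> 8) & 0xF)      # rotation applied to the 8-bit immediate
        imm8 = insn & 0xFF
        value = ((imm8 >> shift) | (imm8 << (32 - shift))) & 0xFFFFFFFF
        if value & 1 == 0:                   # bit 0 must be set to select Thumb
            continue
        for k in range(1, window + 1):
            nxt_off = off + 4 * k
            if nxt_off + 4 > len(data):
                break
            (nxt,) = struct.unpack_from("<I", data, nxt_off)
            # BX Rm: bits 27..4 are fixed, the low nibble names the register
            if (nxt & 0x0FFFFFF0) == 0x012FFF10 and (nxt & 0xF) == rd:
                hits.append((off, rd))
                break
    return hits

# 'add r6, pc, #1' followed by 'bx r6', as in the example from the talk
prefix = bytes.fromhex("01608fe216ff2fe1")
print(find_thumb_switch(prefix))
```

A static detector could run a check of this kind on every disassembled chain, since a shellcode that hops between modes still has to contain the switch somewhere.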
Thus we may check instruction chains for this switch. The next feature: Get-UsePC code. On x86, GetPC code was a normal feature, because there's no direct access to the program counter register, so shellcodes used different methods to get it. For example, one could call a function and then pop the return address from the stack. This method leaves a certain fingerprint in the control flow graph, which can be detected with control flow graph analysis. Unfortunately, in ARM we can directly read the program counter register without any indirection. That is to say, the existence of GetPC code cannot be a proper detection feature. So we alter the GetPC feature into the Get-and-UsePC feature. What is this feature? It tells us that there was an access to the program counter register, and that the value was then used in load, store or jump instructions. Let's take a look at the example. Here is a shellcode from the (inaudible) project. The first instruction puts the address of the last decryptor instruction into a register, which is then used to make a jump to the BL instruction. Here is the pattern we need. Then the program counter is written to the link register with the use of the BL instruction, because BL is a function call and the return address is stored in the LR register. So LR now points into the shellcode. It is used for three things: to get the next encrypted word, to put the decrypted word back, and to jump to the start of the decrypted shellcode payload. How do we detect the Get-and-UsePC code? We detect it by running an abstract execution. We follow the instructions but don't perform the operations on the registers. Instead, every register has two Boolean values. The first value means that this register has been initialized with a constant value. For example, if we put a constant in the r0 register, it becomes initialized. The second value shows that this register somehow indirectly refers to the place of the shellcode in memory. How do we define it?
If there were operations that put the program counter into the register, this register starts to point to the address of some part of the shellcode, and the second Boolean value becomes true. Then, if we perform operations on this register and all arguments of the operation are initialized registers or constants, the register remains an indirect reference to the shellcode on the stack. Otherwise this register becomes uninitialized. The next feature is the existence of correctly formed system calls or library calls. This means that the arguments of these calls should be initialized. Let's consider part of a shellcode from the framework. Here we can see some system calls. How do we make a system call in ARM? Registers r0 to r6 are used as arguments for the system call, and the r7 register keeps the system call number. To understand that we've got a correct system call, we need to check that some arguments have been initialized and that the r7 register has been initialized too. If we didn't check this and only checked for the existence of a system call, the precision of our detection would be critically small, because there is a big probability of disassembling a system call somewhere in random data. This call is only a four-byte value. So if we merely have a lot of these system call byte patterns, the data should not be interpreted as a shellcode. It seems that our feature already fires twice here: as you can see, there are the calls socket and connect, and more will follow. The next feature is the existence of a read-write cycle. It tells us that a decryptor routine should have a cycle that contains both load and store instructions, because this routine should load the encrypted payload from memory, decrypt it and put it back. Let's consider the shellcode decryptor from (inaudible). We set up the cycle counter, and with the use of r14 we find where the shellcode will be executed. In this instruction we take the value of the program counter from the register, so it refers to the shellcode. We see the cycle, and it covers two instructions, a load and a store.
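The abstract execution with two Boolean values per register, which underlies the Get-and-UsePC check and the system call check described above, might be sketched as follows. Everything concrete here is an assumption for illustration: the tuple-based instruction format `(op, dst, sources)`, the names `RegState` and `abstract_run`, the small sets of opcodes, and the choice of r0 as the one argument register that must be initialized. A real detector would decode actual ARM or Thumb instructions instead.

```python
from dataclasses import dataclass

@dataclass
class RegState:
    initialized: bool = False      # first Boolean: holds a known value
    refers_to_code: bool = False   # second Boolean: derived from the program counter

DATA_OPS = {"mov", "add", "sub", "eor"}   # hypothetical subset of data operations
PC_USERS = {"ldr", "str", "bx"}           # sources are address operands only

def abstract_run(insns):
    """Walk the instructions without computing values; only track the two
    flags per register. Returns a list of (instruction index, finding)."""
    regs = {f"r{i}": RegState() for i in range(15)}
    regs["pc"] = RegState(True, True)
    findings = []
    for index, (op, dst, srcs) in enumerate(insns):
        # integer sources are constants; string sources name registers
        src_regs = [regs[s] for s in srcs if isinstance(s, str)]
        all_known = all(r.initialized for r in src_regs)
        if op in DATA_OPS:
            if all_known:
                regs[dst] = RegState(True, any(r.refers_to_code for r in src_regs))
            else:
                regs[dst] = RegState(False, False)   # becomes uninitialized
        elif op in PC_USERS:
            if all_known and any(r.refers_to_code for r in src_regs):
                findings.append((index, "get_and_use_pc"))
            if op == "ldr":
                # assumption: a loaded value is known only if its address was
                regs[dst] = RegState(all_known, False)
        elif op == "svc":
            # r7 carries the system call number; r0 is the first argument
            if regs["r7"].initialized and regs["r0"].initialized:
                findings.append((index, "plausible_syscall"))
    return findings

switch = [("add", "r6", ["pc", 1]), ("bx", None, ["r6"])]
syscall = [("mov", "r0", [2]), ("mov", "r7", [281]), ("svc", None, [])]
print(abstract_run(switch), abstract_run(syscall))
```

The point of tracking flags instead of values is cost: the walk is linear in the number of instructions and needs no emulator, yet it is enough to reject a stray system call or load that random data happens to disassemble into.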
We put some limitations on this feature. All registers that are used in load and store instructions should be initialized, so we have to use the same abstract execution technique that we have seen previously. Without such a limitation there would be a false positive rate of more than 50 percent, because sometimes random data has cycles, and there can be a lot of data that contains load and store instructions. Only if these registers were initialized do we mark the data. Another interesting feature is the existence of a return address zone. The return address zone is a range of return addresses. These addresses should be valid, because they direct the control flow to the shellcode. So the memory this return zone points to should allow execution of instructions; otherwise we will get an access violation. In the example we see how the shellcode overflows the buffer to overwrite the return address. So if we check the data for such valid addresses, we can detect shellcode. This is a very simple technique. Almost all features that we reviewed earlier have (inaudible) to the dynamic detectors. The dynamic detectors identify features in a virtual environment: we run an emulator, then run the shellcode in the emulator to identify these features. With that emulator we can detect malicious system calls, use of the program counter register, and switches to different processor modes. It will have more accuracy, but it will also have more complexity. To arrange and combine the (inaudible) static and dynamic detectors, we come to the idea of (inaudible), which will be explained a bit later. Now let's consider the features that can be examined only in dynamic analysis. Reads and writes to memory, which mark the decryptor. We run a shellcode and see that it did some number N of reads and also N writes. We may assume this is a shellcode which has read some payload, decrypted it and put it back on the stack.
Therefore, to understand that we've got a decryptor routine, we should count the number of reads and writes and compare them with a threshold value. A very important point: we count the number of unique memory reads and writes, I mean the number of distinct memory addresses where reads and writes were done. If we didn't do that, random data with an endless cycle that writes only one byte to the same place would match. By all indications that is not a decryptor. Another decryptor feature is passing the control flow to the shellcode payload. After decryption the routine should make a jump to the start of the decrypted shellcode. Thus we need to check that there was a jump to an address where writes were done before. We also take into account that the number of these writes should exceed the write threshold. But this need not be only a jump. It could be any arithmetic instruction, because we can directly move an address into the program counter register. We run this in an emulator, so we need to check whether the value of the PC register is equal to an address where writes were done before. Here is an important example of why dynamic methods get complicated on the ARM architecture. The ARM architecture provides conditional execution. This means that instructions can be executed or ignored depending on the flags register value. This is specified by a condition suffix in the instruction. For example, we have two emulations of one program with different initial conditions. At first, there is a block of instructions with the AL suffix, which means these instructions will be executed with any flags register values. So this block will be executed in both emulations. Then we have an instruction with the not-equal suffix. That means that this instruction will be executed only when the zero flag is set to 0. So the instruction will be executed only in the second emulation. There we see the S suffix. This means this instruction can change the flags register value. In the second emulation, flags become zero and one.
Then we have the CS block; it is executed when the C flag is set to 1. After that block there is an NE block with instructions that execute only when the zero flag is set to zero. This block changes the r3 and r4 registers. So the next ADCS instruction, which executes in both emulations, gives a different result. I mean, the same instruction in the same place gives a different result in different emulations. This example means that we cannot ignore the initial values of the flags register used at the start of emulation. So we should emulate the same code 15 times, because we have 15 conditions in the ARM architecture. If we want to use only the dynamic detectors, we should emulate the same code from every offset, because the shellcode can be anywhere in the analyzed data, and that affects the performance of detection. So if we want to wisely use all the features together, we should make a hybrid classifier.
>> I talked in depth in simple (inaudible). Of course we don't want to throw away all the research approaches which were developed for shellcode detection. So basically we want to combine them in some kind of combined classifier. The characteristics that we mentioned before provide a good distinction from the legitimate flow. So what do I mean? We implemented all the features as a shellcode detection library. Each and every vertex in the graph is some classifier for a specific feature of shellcode. If a vertex says that the data is probably shellcode, we transfer it to the next one. The main idea here is to put the simplest detectors at the top of the list of detectors. That helps us to reduce the legitimate flow as fast as possible, and to apply the dynamic methods, or the other methods which are sometimes time consuming, (inaudible) only to the most probable shellcode. So this works like a decision tree. After that, all the information is collected in a decision-making module, which says that this is possibly a shellcode.
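The idea of putting the cheapest detectors first can be sketched as a simple cascade. This is a deliberately flattened version of what is described: the talk speaks of a graph of classifiers feeding a decision-making module, while the sketch below uses a single ordered list. The names `classify`, `DETECTORS`, `min_length` and `has_bx`, and the cost numbers, are hypothetical and stand in for real static and dynamic detectors.

```python
def min_length(data: bytes) -> bool:
    """Toy stand-in for a cheap static check."""
    return len(data) >= 8

def has_bx(data: bytes) -> bool:
    """Toy stand-in for a costlier check: the three fixed upper bytes of a
    little-endian ARM 'bx Rm' instruction with the AL condition."""
    return b"\xff\x2f\xe1" in data

# (name, relative cost, test); costs are illustrative assumptions
DETECTORS = [
    ("mode_switch", 5, has_bx),
    ("min_length", 1, min_length),
]

def classify(data: bytes, detectors=DETECTORS):
    """Run detectors from cheapest to most expensive and stop at the first
    one that rejects the data. Returns (is_candidate, matched_features)."""
    matched = []
    for name, _cost, test in sorted(detectors, key=lambda d: d[1]):
        if not test(data):
            return False, matched      # legitimate flow is dropped early
        matched.append(name)
    return True, matched               # survivors go on to heavier analysis

print(classify(bytes.fromhex("01608fe216ff2fe1")))
print(classify(b"A" * 16))
```

The list of matched features is what lets the final stage report why a buffer was flagged, not only that it was.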
And it's a classified shellcode. By classifying shellcode I mean that we are able to say why this is a shellcode: for example, because it has a NOP sled, or we believe that this piece of data is a shellcode because it has a certain type of obfuscation, and so on. As I mentioned before, our previous work was focused on shellcode detection for the x86 platform. We implemented it as an open source shellcode detection tool called Demorpheus. Here we extended it with a module which works with shellcodes for ARM. Our experiments were run on four data sets. There were shellcodes, which were collected from public repositories and (inaudible) as well. We were also testing on legitimate binaries, because shellcode detection methods don't work really well on legitimate binaries: they have a lot of false positives. There was also random data, like normal traffic that we collected from a normal user working on the system. And lastly, we collected multimedia, because the shellcode detection algorithms which exist right now work really badly on multimedia. Our experimental results are shown on the current slide. And this is the result on shellcodes. Again, we used a simple laptop. (Applause)
>> Hi!
>> Excuse the interruption. We have a tradition. A new tradition. (Laughter) You like that, right? No, no, you're part of the tradition. Come back over here. So this young lady's a new speaker. What do we do with new speakers? (Shouting).
>> There you go.
>> I'm sure they just heard me on the moon.
>> How about some love for the new speakers? (Applause) It's much more fun now.
>> Okay. A drink for the first time.
>> Oh yes. (Applause) Reproduced on applications for (inaudible). The version of Android is 2.3.3. The vulnerability is a simple buffer overflow in a dynamic library that is loaded through the Java interface into our application. We wrote an exploit for the vulnerability and used the shell (inaudible) as a payload. Okay.
We adjust the parameters of the exploit and... just a second. (Discussion off mic).
>> And here we made the tool to (inaudible). Okay. Here we see that the tool found the shellcode for this IP address (inaudible).
>> Again, it works surprisingly well on the (inaudible) but, again, as I mentioned before, we used sophisticated algorithms which run on obfuscated shellcodes. So thank you guys for your attention. Thanks for being here for the very last talk. We really appreciate it. (Applause) As far as I understand, the Q&A session will be in the lobby. Yeah. Okay. So we still have seven minutes? If anyone has questions. Okay. Thank you. (Applause)