Thank you for waiting with us. I'm Mudge, this is Sarah, and we're going to give you a little information about the Cyber Independent Testing Lab. You may have been reading about it; we're going to pull back the curtain a bit. That's the introduction, and I'm going to let Sarah start this off. As a heads-up, we go from a 30,000-foot view all the way down into the weeds and then back up, so it's going to be a little bumpy, but hopefully enjoyable. We know different people are interested in different technical levels, so we wanted to make sure we hit something for everyone. This is just a preliminary peek at the data, a look behind the scenes; if you visit our website it's mostly "coming soon" right now, but this is what we're up to.

Okay, so first off: what problem are we trying to address? We've been trying to get people to care about security for years, but whenever somebody says, "okay, I give, I get that it's important, what should I do?", we don't have much concrete data to give them. They don't have anything to act on. There's no Consumer Reports for software security that tells them what the safest browser is, and if I put that question to the floor here, we'd get a lot of very strong opinions, but most of them would be based on what feels true rather than on actual data. That's the problem: nobody actually has data.

So what are people trying right now, and why doesn't it work? First, there are certifications and evaluations, which are frequently focused on processes and procedures and don't actually look at code or at the final product, so they can't really speak to security. There are industry and marketing labels, but these are frequently vague or misleading, and you can't compare between them: if two products have the same sticker, you don't know which one did it better and which one did the bare minimum and stopped, so they're not useful to a consumer. Source code review definitely has its place, but it's not there to serve the consumer: it's either an internal exercise by the vendor, or the vendor is paying somebody to review their product, and the consumer doesn't see the results or whether the bugs found in review actually got fixed. The incentive structure there isn't right for consumer advocacy. And finally, legislation is frequently well-meaning, but a lot of the time it ends up trying to fix a problem by making it illegal to look at the problem, and that's a terrible way to fix anything.

Is this mic on? Can folks hear me in the back? Great. Thank you. So I wanted to dive pretty quickly into some of the data. Keep this in mind: we're measuring how difficult it is for an adversary, how much work you can impose on an adversary, to newly exploit a piece of software. The more work you can impose on your opponent, without extra work of your own, the better off you are. To date we've looked at about 100,000 binary applications, with a hundred-plus static features; we've also got a whole bunch of dynamic ones, and we'll go into the difference.
And as bug hunters and attackers and exploiters (I've been doing that for almost 30 years now; I'm old), what did we see? When you get such a large amount of data, applications and binaries and libraries across an operating system, you can build up these continuums. So here's the first continuum. This is Ubuntu Linux, and this is the first 10,000-some-odd binaries. The way you read it: easier to exploit is further to the left, going down to negative numbers, and we normalized to 100 percent on the far right. The binaries, which are executables and libraries, are bucketed into columns of five-point bins. The installation is all the base software that comes with the OS plus the most common third-party applications, and because of that you can pull the relevant metrics for any given piece of software out of the really detailed data extracts and plot them.

So if we throw the first couple on here, we'll do it on a relative hardening line underneath that's mapped to the top part, and we'll take two common browsers, Chrome and Firefox. When we do this, we see that Chrome did a little bit better; it's a little more difficult to exploit. If you look at the underground market, the cost of a zero-day on Chrome bears that out: it's slightly more expensive than a zero-day for Firefox. The yellow triangles up on top show where each falls. Next slide.

A little further down, as we approach the bottom fifth percentile, you start to see some of the office suites, the client-side applications. For a crowd like this, that isn't much of a surprise, because we know the people writing those don't view themselves as the most common attack surface; it's also why client-side attacks and attachments opened by these applications are the easiest path to compromise and exploitation. Next slide.

This is OSX. We've done this on Linux, OSX, and Windows, and we'll have some of the RTOSes and other architectures coming online. Instead of just the first 10,000, this is the full 38,000: all of OSX El Capitan plus about 20,000 third-party applications that we went out, measured, and plotted. So let's see what this looks like now that we've seen Linux. We have a much wider spread. Chrome is on the far right, pretty decent, almost approaching the 95th percentile; then Safari; and then Firefox is way down here, which was surprising to us. It was also surprising to the Firefox development team, and they've since confirmed it. I'm going to get into what makes up these numbers, to show you the depth of the data we're extracting out of the binaries and give you some confidence in how this works and how it would be useful to you.

Let's look at some other applications out of these thirty-some-odd thousand. Here are the different office suites available on OSX: Microsoft Office coming in, again, pretty close to the bottom fifth percentile, then OpenOffice, and then Apple's own office suite. Again, all on one particular platform.
I should point out that with this sort of view of the data, it's important to only do comparisons within the same platform. It's not fair to ask how Firefox on Apple did compared to Internet Explorer on Windows, because they have different attributes and traits, and the compilation settings are different; we'll go into that more later.

So let's pull out a few more examples, because it's nice to have something on the far right and something on the far left. And why not let it be the very software that installs the security patches for your product? Yeah, it's a little disappointing to see the Microsoft Office updater coming in at negative 7.5. I was talking to a team that I believe might be involved with some of the zero-days going around that are popping a lot of the OSX boxes on El Capitan, and I asked: any clues as to how you're getting in? They said, oh, you'll figure it out. I went back to them and said, all the boxes you're getting into have Microsoft Office installed, don't they? And they said, well, maybe. This is a real quick way, as a defender, to be able to choose and say: if all things are even and I can pick which office suite to use, because it doesn't matter to me, I'm going to choose the one that imposes more work on my opponent. That's the goal. The flip side, on the offensive side, is that as the adversary I want to choose the lowest-hanging targets, because I've got a finite amount of resources and time as well; why would I go after OSX's software updater if I see Microsoft's updater listening on the network and taking input?

Okay. This is the base, just the base, no other apps (we haven't finished this one yet), of Windows 10. I wanted to pull this out because it's really impressive. This is a level of consistency we didn't see in the base of Linux, and didn't see in OSX, and the development life cycle and compilation process Microsoft has imposed here is way different from what it looked like on Windows XP. You can see they're very consistent with what they put in. There are a few at the bottom; that's because there's one set of applications I did install, a big data-analytics package, that was installed on all three operating systems and was at the bottom on all three, and we're going to point out the lessons learned on that one in a moment. Rest assured, as we put in more third-party applications and start to flesh out more of the binary and dynamic feature sets, this will become more dispersed. But right now, kudos: Microsoft knows what the heck they're doing on Windows. On the previous slide, we might question whether they know what they're doing as well on OSX.

Alright. So we've shown you some numbers, but you probably want to know what kind of data we're looking at to produce those numbers. There are a lot of other industries that have trained consumers to make complex technical decisions, so in figuring out how to help consumers make software security decisions, we're drawing from those.
So, first off, we've got our static analysis features, things where we're measuring aspects of the application without running it. This is like the nutritional facts for the software: which functions were used, what the complexity values are, et cetera. Then there's the runtime testing, the dynamic stuff where we're fuzzing; that's like crash-testing cars. The Monroney sticker you see on any new car tells you how it did in crash testing and what its EPA expected miles per gallon are, things like that. And then the safety continuum, where you see where that software falls compared to its peers or to the rest of its software environment; that's like the EnergyGuide label, where you find out how much this fridge costs to run monthly versus another one.

The static part, which is a large amount of what went into creating the scores on the continuums you saw, and which is really the focus of most of this talk (although we will talk about what we're doing on the dynamic side), includes essentially the hundred-plus feature extractions in the following three categories.

Measurements of complexity turn out to be very, very important. Not only because the more complex something is, the more difficult it is for the developers or creators to have gotten it right, and for other people to ensure its correctness in operation, function, and intent, but also because it works across the board. We can do things ranging from code size, as a simple example metric, up through measures of branch-prediction complexity, through stack adjusts (and rack and stack those), up to more complex things such as cyclomatic complexity for functions. This works across any operating system; in fact, it works on any binaries, all the way down to ones extracted from the firmware of televisions or cars. It's a nice way of comparing how complex product A is versus product B.

The other two areas are a bit more specific to the types of operating systems and development environments themselves. Application armoring is the catch-all we use for all of the features that can be imposed on and reinforced into the binary from the compilation stage. If you pass /GS to Microsoft's compiler, it will try to do stack protection by putting stack guards in. If you pass -D_FORTIFY_SOURCE on OSX or Linux, it'll look at 72 common risky functions and see if it can't use heuristics to replace them with a safer version. And the more recent, modern advances: control-flow integrity, code-pointer integrity to prevent ROP (return-oriented programming), et cetera. Then there's the linker; this is where address space layout randomization comes in, and of course there are lots of measurements, not just whether it's turned on, but whether it's high entropy, how ubiquitous it is across all of the components, et cetera. And the last one is the loader, which is what ultimately runs at the very end, when the application is brought in and memory is being marked as executable or not.
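To make those compiler, linker, and loader checks concrete, here's a minimal sketch of how one might pull a few of these armoring features, plus the function-hygiene bucketing described a little later, out of an ELF binary. This is not our actual tooling; it assumes the third-party pyelftools package, and the hygiene lists are abbreviated stand-ins.

```python
# Minimal sketch (not CITL's tooling): extract a few "armoring" and "hygiene"
# features from an ELF binary using pyelftools (pip install pyelftools).
import sys
from elftools.elf.elffile import ELFFile

# Abbreviated stand-ins for the ick/bad/risky/good hygiene buckets.
HYGIENE = {
    "ick":   {"gets"},
    "bad":   {"strcpy", "strcat", "sprintf"},
    "risky": {"strncpy", "strncat", "snprintf", "memcpy"},
    "good":  {"strlcpy", "strlcat"},
}

def features(path):
    with open(path, "rb") as f:
        elf = ELFFile(f)
        feats = {}

        # Compiler stage: stack guards show up as a reference to __stack_chk_fail;
        # -D_FORTIFY_SOURCE shows up as __*_chk replacements of risky functions.
        dynsym = elf.get_section_by_name(".dynsym")
        names = {s.name for s in dynsym.iter_symbols()} if dynsym else set()
        feats["stack_guard"] = "__stack_chk_fail" in names
        feats["fortified_calls"] = sum(1 for n in names if n.endswith("_chk"))

        # Linker stage: position-independent executables (ET_DYN) get full ASLR;
        # PT_GNU_RELRO marks relocation data read-only after load.
        segs = list(elf.iter_segments())
        feats["pie"] = elf.header["e_type"] == "ET_DYN"
        feats["relro"] = any(s["p_type"] == "PT_GNU_RELRO" for s in segs)

        # Loader stage: PT_GNU_STACK without the execute bit means a non-executable stack.
        PF_X = 0x1
        feats["nx_stack"] = any(
            s["p_type"] == "PT_GNU_STACK" and not (s["p_flags"] & PF_X) for s in segs
        )

        # Developer hygiene: bucket the imported/exported function names.
        for bucket, fns in HYGIENE.items():
            feats["hygiene_" + bucket] = sorted(names & fns)
        return feats

if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(path, features(path))
```

Run something like that over a whole install tree and you get a crude per-file feature vector of the kind the continuums above are built from; the real extraction has a hundred-plus features and covers Mach-O and PE binaries as well.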
These are all very important safety features. Some of them are akin to a car's seat belts, airbags, and anti-lock brakes. If I'm buying or using a piece of software that doesn't have address space layout randomization, fortified source, and stack guards, I'm buying a car that doesn't have seat belts, airbags, or ABS, and I need to know that. These features have been around for decades, and they definitively, demonstrably, and quantifiably have made applications more difficult to exploit. In fact, when some of the capture-the-flag folks want to make an easier challenge binary, they just go get an older compiler that doesn't have those attributes and build with it, because the result is much easier to exploit.

The final part is developer hygiene. Here are about 500 common function calls across POSIX and ANSI that historically have been the root cause of memory corruption, code and data confusion, et cetera, and we break them out into the following buckets. There are "ick" functions; there are only a few of these. If you remember the poison sticker underneath your kitchen sink for the detergents and so on, that's ick. If you see an ick function like gets() in commercial code, run screaming; those people should not be writing commercial code. Then there are the classic bad functions, which are difficult to use correctly: the unbounded strcpy family, some of the memcpys, et cetera. Then the risky ones, the bounded versions. And more recently we have some good functions that are hard to use incorrectly, like strlcat and strlcpy, done very nicely. The problem is that the only people who know about the good functions are the folks who cared about security in the first place; we don't teach them in school or anywhere else. When you see them in the binary, that's a really good sign. It's not as good a sign when you don't see consistent use of them and the risky and bad ones sit right next to them. Anyway, these are the sorts of things we're pulling out of those tens and hundreds of thousands of binaries on the static side. Next slide, please.

Before I go into the dynamic side, which I'll only talk about briefly, we're going to do a few deep dives looking at one or two of those static features and then pop the stack back up; otherwise we'd get pushed way too far into the weeds.

Dynamic fuzzing: it's really nice to say "this looks like a super soft target," but it's even better to say we know we can get a crash with a SIGBUS or an illegal instruction or a SIGSEGV. Right now we're using AFL. AFL is fantastic; it gives us good-enough coverage, and we use it for three specific results. One of them is exploitability, because in our world we really care about exploitability; we're bug hunters, we like to write exploit code. But that's not always the most important thing for every consumer, which is why we call out the level of disruptability as well.
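As a sketch of the crash-triage step mentioned above: AFL saves reproducer files under a crashes/ directory in its output dir, and one can re-run the target on each of them and bucket the results by fatal signal. This is an illustration rather than our pipeline, and the target path and output directory names are hypothetical.

```python
# Minimal sketch: re-run a target on AFL's saved crash inputs and bucket the
# results by fatal signal (SIGSEGV, SIGBUS, SIGILL, ...). Paths are hypothetical.
import pathlib
import signal
import subprocess
from collections import Counter

def triage(target, afl_out_dir, timeout=5):
    counts = Counter()
    crash_dir = pathlib.Path(afl_out_dir) / "crashes"
    for case in sorted(crash_dir.glob("id:*")):
        try:
            proc = subprocess.run(
                [target, str(case)],      # assumes the target takes its input file as argv[1]
                capture_output=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            counts["HANG"] += 1           # hangs hint at disruptability / algorithmic problems
            continue
        if proc.returncode < 0:           # a negative return code means killed by a signal
            counts[signal.Signals(-proc.returncode).name] += 1
        else:
            counts["EXIT_" + str(proc.returncode)] += 1
    return counts

if __name__ == "__main__":
    print(triage("./target_app", "./afl_findings"))
```

Coarse buckets like these feed the exploitability and disruptability callouts; deciding whether a given SIGSEGV is actually exploitable still takes real triage.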
Think about a big business, one that's doing offshore oil drilling. They care much more about disruptability than exploitability. If you ask them, they'll say: if our system crashes, the drill bit stops, the molten core solidifies, and that offshore rig goes offline for more than a year, twelve months while we build a new one and push it out there. I don't want the system to be compromised and exploited, but if it is, I'd rather the attackers sit on IRC or use it as a warez distribution site than crash the system. Go talk to a bank and it's a different story. They'll say: we want our systems to crash rather than be exploited in a way where we can't trust the integrity of the underlying data, where we're propagating bad information and before too long we can't unroll it and reconcile the books.

The final one is new, and I'm calling it out here because it's two, three, four years off, hopefully longer, but it's coming: algorithmic complexity. This matters to the large distributed companies, your LinkedIns, your Facebooks, your Googles. They're pretty impervious to distributed denial of service because they've got more bandwidth than, well, they are the world's bandwidth for all intents and purposes, and they're distributed and decentralized. But when you can find a small amount of input that causes the worst case in particular types of algorithms, so a linked list devolves, or a hash table devolves into a linked list, you can start taking these guys out and they can't defend against it. The traditional DDoS defenses, de-aggregate, decentralize, increase bandwidth, don't work. Based on how we've modified AFL, we can get all three of these results.

But fuzzing's expensive, so we'd like to do as little of it as we need to. On the other hand, math is cheap. So what we're doing is fuzzing a statistically significant portion of the software and then using Bayesian math and linear regression to model how the rest of the software would do. We don't care about finding a specific exploit; we want to know what categories of vulnerabilities are present and what sorts of problems we expect to see, and we can model that from the static features. Somebody who's actually looking for an exploit still has to do all the heavy lifting, but for our risk assessments we don't need that. So some software will have a little "A" in the corner saying it's actual, we really fuzzed this, and some will have a little "E" saying it's estimated. That's the icing on the cake, the part that makes this scale really well. And mathematically we can show you to what level we're able to accurately predict it, 99.99 or whatever we have.

This isn't too unusual; you're actually used to it in a different area: cars. For the EPA miles-per-gallon numbers, they don't run every single car until it's on fumes. They do it for enough cars that they can understand and model how the rest will do. Assuming no one's trying to trick them. Yeah, Volkswagen figured out a way around that. Right.
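As a toy version of that "fuzz a sample, model the rest" idea: fit a simple linear model from static feature vectors to an observed crash metric on the fuzzed subset, then estimate the unfuzzed binaries. The real pipeline uses Bayesian methods; this sketch is plain least squares with numpy, and the feature columns and numbers are invented for illustration.

```python
# Toy sketch of "fuzz a sample, estimate the rest from static features."
# Ordinary least squares via numpy; the real pipeline is more sophisticated.
import numpy as np

# Hypothetical static feature matrix: one row per binary, columns like
# [risky_call_count, mean_cyclomatic_complexity, has_stack_guard].
X_fuzzed = np.array([
    [120.0, 14.0, 1.0],
    [ 40.0,  6.5, 1.0],
    [310.0, 22.0, 0.0],
    [ 75.0,  9.0, 1.0],
])
# Observed outcome for the binaries we actually fuzzed
# (say, unique crashes per CPU-day; illustrative numbers only).
y_fuzzed = np.array([3.0, 0.5, 11.0, 1.2])

# Fit y ~ X*w + b by least squares (append a column of ones for the intercept).
A = np.hstack([X_fuzzed, np.ones((len(X_fuzzed), 1))])
w, *_ = np.linalg.lstsq(A, y_fuzzed, rcond=None)

# Predict for binaries we did NOT fuzz; these would carry the little "E" (estimated).
X_unfuzzed = np.array([
    [200.0, 18.0, 0.0],
    [ 15.0,  4.0, 1.0],
])
A_new = np.hstack([X_unfuzzed, np.ones((len(X_unfuzzed), 1))])
print("estimated crash rates:", A_new @ w)
```

Anything carrying an "E" is only as good as the model behind it, which is why the spot-checking described next matters.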
And so we would continue to do spot checking, and if we find an anomaly, that goes back into our model; it should work really nicely for us long term, and it evens out one of the traditional asymmetries between defender and attacker, because this is something that works really well for defense and not so well for offense. One of the things I learned when I was Deputy Director of ATAP out at Google and got to see how Google did things: don't underestimate the power of Bayesian analysis and linear regression. Go ahead.

Okay. I'm just going to set this up for Sarah. As I mentioned, we want to take you on a deep dive into a few small subsets to show you the power of what you can do when you start to really tease out information on specific attributes from the binary extraction, and then we'll pop back up to a higher-level view of the world.

Okay. So this is one of our automatically generated reports; this one is for Google Chrome on OSX, and I got to spend lots of quality time with LaTeX (I can never remember how people prefer to pronounce it; I read it more than I say it). The first page is always a table that gives the rubric for how the scores were achieved, because as we tweak things we want to be able to look at old reports and remember how we got those numbers. After that we get a summary of any anomalies for the file: did it have any weird flags you don't normally see, or strange initial permissions, or what have you. Then how it did on function consistency, which we'll talk about in a little bit. And then the file, code, and data size for the main application, the average library, the total of the libraries, and all of that together. Because the Google Chrome main application is just 601 bytes of code; it's a stub, and all the action is happening in the libraries. So then we look at what's happening in the libraries, and when you follow the libraries it links to directly, and then the libraries those link to, and so on down the rabbit hole, eventually Chrome is using 176 libraries. So we put out that number, along with the average and minimum library scores.

The next page is the main report. Obviously there's more than one page of it when there are 176 libraries; this is just the first. The first line is the main application, and all of the libraries are listed after that. The columns are whether it's 32- or 64-bit, what score it got, and then the two categories of features we're highlighting here: application armoring and function hygiene. Application armoring covers the safety features that make software harder to exploit, and function hygiene is a measure of how well the application is written and how well the programmers knew what they were doing. Go ahead.

Okay, that was the more detailed view. This is a very coarse-grained view comparing three browsers on OSX. Here we're just looking at four application-armoring features: ASLR, non-executable heap, stack guards, and fortified source. If all the files for your browser had ASLR enabled, that would be 25 points; if all files had all four features, you'd get 100 points, which no one did.
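Here's a sketch of one plausible reading of that coarse rubric: up to 25 points per feature, scaled by how consistently the feature appears across the product's files, so the full 25 only if every file has it. The per-file booleans below are invented example data, not our measurements.

```python
# Sketch of the coarse 100-point rubric: four armoring features, up to 25 points
# each, scaled by the fraction of the product's files that have the feature.
# (One plausible reading of the rubric; the per-file flags are invented.)
FEATURES = ("aslr", "nx_heap", "stack_guard", "fortify_source")

def coarse_score(files):
    """files: one dict per shipped binary, mapping feature name -> bool."""
    score = 0.0
    for feat in FEATURES:
        present = sum(1 for f in files if f.get(feat, False))
        score += 25.0 * present / len(files)
    return score

example_browser = [
    {"aslr": True, "nx_heap": True,  "stack_guard": True,  "fortify_source": False},
    {"aslr": True, "nx_heap": True,  "stack_guard": False, "fortify_source": False},
    {"aslr": True, "nx_heap": False, "stack_guard": True,  "fortify_source": True},
]
print(round(coarse_score(example_browser), 1))   # 66.7 for this invented data
```

A product that ships one straggler binary without ASLR loses points even if its main executable is fine, which is exactly the consistency problem the browser comparison below turned up.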
But Google Chrome comes out ahead because they had pretty consistent application of ASLR and the non-executable heap. Safari did not do quite so well: they had all four of those things present in some cases, just not consistently. And Firefox was missing ASLR entirely, which was a shock to us; and then, if you looked at the Bugzilla comments, it was a shock to the development team too. It was quite interesting, because Kim Zetter wrote a very nice article about us (I can't remember the name of the publication), and some folks on the Firefox dev team popped up and said, that doesn't make sense, we've had ASLR since 2000-mumble. It was enjoyable, in an awkward way, to watch some of the other developers point out all the situations where they had intentionally disabled it and guess what we might have measured. Then somebody said, wait a second, they've got Safari in this comparison, and that doesn't live anywhere else, so this must be OSX; we have ASLR on OSX, don't we? And somebody goes and looks: no. Not at all. They dig into it, and sure enough, for backwards compatibility with OSX 10.6, they had dropped it. Now, the good news is that in September they're going to rectify that. It doesn't look like they're going to try to fix the non-executable heap the way Google Chrome does, which is a bummer, but the good news is a fix is coming based on this data; we'll see how well it goes across the board. The bad news is that until then, I don't know what the recommendation is. Maybe use Chrome.

So that was a view of just a few of the application-armoring callouts. Let's dive down into one specific one, since the data on the back end that we're using for all of this is actually pretty rich: fortify source. For those who aren't familiar with it, there are 72 functions (I think it's 72 presently) for which the compiler has safer replacement versions. If you specify fortify source on Linux or OSX, it goes through and checks whether any of those functions are in your code, runs a bunch of heuristics on each one to see if it can guess what you really intended, and then replaces the risky function with one that's more strongly bounded, doing some extra safety checking for you, and puts that into the resulting binary.

So we looked at the Linux applications, and across all of those there were about 2 million opportunities for a risky function to be reinforced or improved with fortify source. The way you read this graph: each dot is a file. Along the x-axis is the percentage of those risky-function opportunities in the file that it was able to replace, and on the y-axis is the number of those risky functions per file. So at the far left, a little way up, there's a file with over 10,000 risky functions where it was only able to replace about 7% of them. And for those who are Linux hackers: systemd, a rather important binary for Linux, was off the scale. How many? 43,000. 43,000 opportunities, mostly preads, and it was able to successfully reinforce them less than seven-tenths of one percent of the time.
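Here's a minimal sketch of a crude version of that per-file measurement. When fortify source replaces a call, the import typically changes from, say, memcpy to __memcpy_chk, so counting the __*_chk imports against their plain counterparts in the dynamic symbol table gives a rough coverage ratio. This is symbol-level only (the real measurement counts individual call sites, i.e. opportunities), it assumes pyelftools, and the fortifiable-function list is abbreviated.

```python
# Sketch: rough per-file fortify-source coverage from the dynamic symbol table.
# A fortified call imports __NAME_chk instead of NAME, so the ratio of _chk
# imports to (plain + _chk) imports approximates coverage. (Uses pyelftools.)
import sys
from elftools.elf.elffile import ELFFile

# Abbreviated subset of the ~72 fortifiable functions.
FORTIFIABLE = {"memcpy", "memmove", "strcpy", "strncpy", "strcat",
               "sprintf", "snprintf", "read", "pread", "recv"}

def fortify_coverage(path):
    with open(path, "rb") as f:
        dynsym = ELFFile(f).get_section_by_name(".dynsym")
        names = {s.name for s in dynsym.iter_symbols()} if dynsym else set()
    plain = sum(1 for n in FORTIFIABLE if n in names)
    chk = sum(1 for n in FORTIFIABLE if "__%s_chk" % n in names)
    total = plain + chk
    pct = 100.0 * chk / total if total else None
    return chk, total, pct

if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(path, fortify_coverage(path))
```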
The interesting part with systemd is that the developer is trying to do the right thing. The truly right thing would be not to use those risky functions in the first place, but sometimes you're kind of hosed and have to. So they turned on fortify source, but then they have no idea what efficacy or coverage it actually achieved. And the consumer needs to know that as well, because the fact that two cars both have anti-lock brakes is very different from the fact that one of them stops within 300 yards and the other stops within 10 yards. Depending on your environment, you need to be able to know which is which.

So this gave us a nice view of source code fortification across Linux. How does it look on OSX? First, a couple of other things about the Linux chart: about a third of the files end up in the 95-to-100% range, and the rest are fairly evenly distributed from 0 to 95%. The interesting thing to note is that some of those very well fortified, close-to-100% files are up around 25,000 functions, so it's not that they only had 5 functions and that's why they scored 100%. But it also means that roughly two-thirds of the time, it's not able to protect you even when it's turned on.

Okay, moving on. This is OSX, and you can see it's weighted toward the other side. In fact, there weren't any binaries with a significant number of functions that were completely fortified; in the 95-to-100% bin, the largest had about 121 functions replaced. That led us to believe that source code fortification on OSX is lagging behind where it is on Linux. So we dove into the data and decided to slice it one more way: per function. Remember, there are 72 functions. The way you read this chart: at the far left, there were about 37 functions out of the 72 that were essentially never able to be fortified and replaced, only 0 to 5% of the time. At the far right, there were 15 functions that were replaced with the safer versions almost all of the time. The far right is essentially your strcpys, the unbounded ones where the compiler can say "I know what you're trying to do." The far left is largely pointer arithmetic on memcpys. And there were 28 functions in there that never got touched at all; they're essentially academic. (Sorry, some slide issues. No worries.)

So, with our hypothesis that this is a little more mature on Linux than on OSX, what did OSX look like? Not that good. There were 50-plus functions out of the 72 that only ever occasionally got replaced, and a whole slew of them, 52 out of the 72, were at 0%. So this is a way of using the data not per binary but across an entire environment, to figure out, as a consumer, how mature the different safety mechanisms in place are. Where Linux had 15 functions that were 95-to-100% fortified, here there's only one in the 90-to-95 bin and nothing that was 100% fortified. You can see it's a much less mature feature on OSX. Go ahead.
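The per-function slicing is the same counting grouped the other way: tally plain versus __*_chk imports per function name across the whole corpus, then look at each function's replacement rate. Again a symbol-level sketch with pyelftools and an abbreviated function list.

```python
# Sketch: per-function fortification rate across a corpus of ELF files, i.e.
# "how often does function NAME get replaced by __NAME_chk, ecosystem-wide."
import sys
from collections import defaultdict
from elftools.elf.elffile import ELFFile
from elftools.common.exceptions import ELFError

FORTIFIABLE = {"memcpy", "memmove", "strcpy", "strncpy", "strcat",
               "sprintf", "snprintf", "read", "pread", "recv"}

def corpus_rates(paths):
    plain, chk = defaultdict(int), defaultdict(int)
    for path in paths:
        try:
            with open(path, "rb") as f:
                dynsym = ELFFile(f).get_section_by_name(".dynsym")
                names = {s.name for s in dynsym.iter_symbols()} if dynsym else set()
        except (ELFError, OSError):
            continue                                  # skip non-ELF / unreadable files
        for fn in FORTIFIABLE:
            plain[fn] += fn in names
            chk[fn] += ("__%s_chk" % fn) in names
    return {fn: 100.0 * chk[fn] / (plain[fn] + chk[fn])
            for fn in FORTIFIABLE if plain[fn] + chk[fn]}

if __name__ == "__main__":
    for fn, rate in sorted(corpus_rates(sys.argv[1:]).items(), key=lambda kv: kv[1]):
        print("%-10s fortified %5.1f%% of the time" % (fn, rate))
```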
Okay. A lot of the things we're looking at are not new things to look for; they're things attackers have always looked at when trying to find a weak target and figure out where to spend their effort. But usually they only look until they find something that looks juicy, and then they start working on that. As far as we know, nobody has applied these metrics across an entire ecosystem to see, on a broad scale, what it all looks like.

And I wanted to add: at the beginning we showed you an example of looking at individual files, and the last couple of examples looked at an entire operating system and its maturity level. Now what Sarah's going to walk you through is stepping back to look at the institutions that made the code, and what you can infer about them that maybe they don't even know about themselves.

So we're going to look at Google Chrome and Microsoft Excel on OSX, and at what you can learn about the OSX development practices of those two organizations. What you'll see is that each organization has something it does very well, and each has a blind spot.

First, the application-armoring features. What these really tell us about is the development and build environments of the software developers, and that's an area where Google Chrome does very well. They had the only 64-bit files on OSX with the non-executable-heap flag enabled, out of the almost 40,000 binaries we looked at, so kudos to them; they did a great job there. They figured out how to go in and set that in the binaries themselves, because they knew the operating system and the ABI let you call it out explicitly even though the compiler toolchains don't give you the option. They said: OSX tries to make the heap non-executable by default, but that's a system-wide control, and we can explicitly say it should never be executable for our app, please. A lot of you might be surprised that, for at least a period of time, the heap was executable on all of your installations; on some of them it might still be. So they went above and beyond to make sure they had the best development and build environment.

On the other hand, Microsoft Excel was still a 32-bit binary, and it was missing some of the application-armoring features that are on by default even for 32-bit binaries on OSX if you use a modern build chain. What we can infer is that either they were using a really old build system for OSX, because it's not their wheelhouse (their main focus is of course Windows development), or they were using a modern build environment and specifically disabled those features. We're giving them the benefit of the doubt and assuming it's just a really old compiler.

But then, on the other hand, look at function hygiene. Google Chrome makes more use of the good functions than Excel does, but all of the binaries that had good functions also had the risky and bad versions of the same functions, which sort of defeats the purpose. If you're going to use strlcpy, always use strlcpy; don't also have strcpy right next to it. Microsoft Excel, on the other hand, when they use the good function, they just use the good function. They don't have the strcpys in there anymore, because they got beat up over them too many times and they don't let their developers use them now. So this is an area where Microsoft has learned the hard lessons and is really doing the right thing, while at Google it's a little bit of the wild west.
Some developers know about the good functions, so they'll catch it in code review or use the right function from the beginning; others don't, so it's luck of the draw. And they have really good programmers at Google; this isn't a knock on them. But not all of them have security as their main focus, and it's a bit of luck of the draw who you get as a reviewer, too. Microsoft, by contrast, has what is essentially a dirty-words list in their precompiler checks. You've seen it: these functions are deprecated, too risky for security, and we will not let you ship if you're using them. It flags them, refuses to go to a gold build, and the developers have to go back and actually replace them. So it's an institutionally enforced practice, which is impressive. And, just to plug a different pet peeve of mine, the variance you see in developers' security knowledge is also the fault of the computer science curriculum; it's not something that's included in a general computer science education. But anyway, back on topic.

This is the sort of curated report we're generating automatically off the data sets, and it only highlights those sub-features out of application armoring and function hygiene. But as the non-profit, we're opening up and licensing the data sources so other folks can slice and dice them any way they want for their own analysis. We're modeling the curated output on Consumer Reports, to land on something in the middle that gives consumers a high-level view of what's going on. Go ahead.

Now, if we pop back to the big picture and you think back to the histograms of Linux, OSX, and the early Windows one, there's something we always want to know: what's on the far left, what's making up all that low-hanging fruit? Next. We had installed the same package on all three of them, a package with about 600 binaries. If you're a startup in Silicon Valley working with big data, you probably rely heavily on it, as do a lot of larger organizations. And it really surprised us, because this would not have been picked up by source code analysis, which is another reason we don't look at source (beyond not wanting to be under NDAs, because we want to be able to give you the output): source is the developer's intent, but the binary is the ground truth. So what is it? That's the story of something called Anaconda. Go ahead.

We were looking at address space layout randomization and measuring the efficacy of the binaries in some of the Linux areas. This is a view of the number of dynamic symbols that are fixed, on the x-axis, and the number of function pointers that are fixed, on the y-axis. You want everything to be at 0,0; that means you've got all the ASLR the kernel can give you to protect against particular types of attacks. Anything moving up toward the top right is getting worse and worse.
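A crude proxy for the kind of counting behind that chart, as a sketch: a binary that isn't built position-independent (ET_EXEC rather than ET_DYN) has its load address baked in, so every function symbol it defines in its dynamic symbol table sits at an address an attacker can count on. The real measurement digs into relocations and function pointers in more detail; this just counts fixed, defined function symbols using pyelftools.

```python
# Sketch: count "fixed" dynamic function symbols, a crude proxy for how much of
# a binary defeats ASLR. A non-PIE executable (ET_EXEC) loads at a fixed address,
# so the functions it exports live at addresses an attacker can rely on.
import sys
from elftools.elf.elffile import ELFFile

def fixed_function_symbols(path):
    with open(path, "rb") as f:
        elf = ELFFile(f)
        if elf.header["e_type"] == "ET_DYN":
            return 0                                  # PIE / shared object: relocated at load
        dynsym = elf.get_section_by_name(".dynsym")
        if dynsym is None:
            return 0
        return sum(
            1
            for sym in dynsym.iter_symbols()
            if sym["st_info"]["type"] == "STT_FUNC"   # function symbols only
            and sym["st_shndx"] != "SHN_UNDEF"        # defined here, not imported
            and sym["st_value"] != 0                  # has a concrete, fixed address
        )

if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(path, fixed_function_symbols(path))
```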
So we were like, what the heck are all of these? We looked, and a whole bunch of them were from the same package. What's going on? And that's where we decided to look into this particular package. Go ahead.

It's a DARPA-funded package, which is a little embarrassing. I had a lot of fun at DARPA and I'm a huge fan, a booster; you saw Cyber Grand Challenge, that was history in the making. And they'll be the first ones to say, look, sometimes it's about rapid prototyping. But as a consumer I need to know where I'm accepting more risk than I ever expected. It's a roll-up: a whole bunch of open source software for R and Python, with all of your numpy and pandas and SciPy, plus OpenSSL libraries, XML, and libcurl, all precompiled and packaged together. It is super convenient; it's like BackTrack or Kali for this stuff, because it's a royal pain in the butt to try to put all of that together and get it working on your own. Somebody else does it. Yay! And I roll it out.

The problem is, it's on all of our systems, and I'm not the only one who's rolled it out. Here's the customer list from their website: major Fortune 10 companies, everything from Bank of America and Siemens to DOD and what have you. So how large is the footprint when you install it? And why is it the bottom score on all of these operating systems, for 600-plus binaries? Because we had the same libraries from other kits: other things used OpenSSL and shipped those binaries, and they weren't scoring as low. So what was up with that? As we looked across them, they had the old version of the segment and section layout for Linux, which was weird, because it lets you overwrite things in strange ways. They were missing things like basic stack guards, even on OSX, where that's been the compiler default for just about everybody for eons. And on Windows: no high-entropy address space layout randomization, no safe structured exception handlers. (Oh, we did fix that on the slide; yes, we fixed it again. Microsoft calls it GS rather than SSP. That's fine.)

We finally got the ground truth as to why this was happening, because on Linux the compiler drops the compiler version and build environment into the .comment section of the binary, and their most recent batch of binaries is being recompiled from all the source code on, what, a 2008 GCC running on a 2005 installation of Linux. The defaults were different back then, and it missed all of those safety features, all of those anti-lock brakes, airbags, seat belts, side-impact protections, et cetera. I hadn't realized that they had essentially done, unintentionally, the same trick the capture-the-flag folks do to make very easy targets for exploitation. They missed almost a decade's worth of improvements in the compiler toolchains, and you saw how well staying up to date on the latest and greatest worked for Google. So that's an interesting little side story about why looking at binaries rather than source isn't the deficiency many people think it is; sometimes it's exactly what catches the problem.
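That ground-truth check is easy to reproduce: on Linux, GCC records its version string in the ELF .comment section, so you can read a binary's build toolchain straight out of the file. A minimal sketch, assuming pyelftools:

```python
# Sketch: read the compiler version strings GCC leaves in an ELF .comment section.
import sys
from elftools.elf.elffile import ELFFile

def compiler_comments(path):
    with open(path, "rb") as f:
        sec = ELFFile(f).get_section_by_name(".comment")
        if sec is None:
            return []
        # The section holds NUL-separated strings such as "GCC: (GNU) 4.1.2".
        return [s.decode("ascii", "replace") for s in sec.data().split(b"\x00") if s]

if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(path, compiler_comments(path))
```

An ancient version string in a freshly shipped package is exactly the kind of missed-decade-of-hardening red flag described above.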
Alright, just to wrap things up: there have been various misconceptions about what we are and aren't doing, so this is me making sure we've got everyone on the same page. These are preliminary results; the detailed data releases are planned for the end of this year or early next year. The goal of this talk was just to familiarize everyone with what we're doing. As we've stated, we do binary-only analysis, partly because we don't want to be beholden to vendors or have to sign NDAs, but also because in a lot of ways source code is the theory and the binary is the ground truth, and we want to see what the consumer actually gets. This is not a pass/fail, gold-star sort of thing; it's meant to be quantified and comparable between different products. And we look at overall classes of vulnerabilities and trends rather than specific instances, so we're not going to tell you there's an exploit on this particular line of code. But we will tell you that something will be exploitable with 99.9% likelihood, and the adversary still has to do all the heavy work to get there. So, finally, an asymmetric win for the consumer and the defender.

This talk focused on the static analysis because we had to pick a silo, but the dynamic analysis results are planned for release next year, and we're also going to be looking at Internet of Things devices next year. We're a 501(c)(3), which means we're a non-profit charitable organization, with non-exclusive rights to use the IP. We're going to offer the ability to license our data, and other partnerships with corporations, just not with the corporations that make the software. That was actually a really important choice, because we needed the incentive structures aligned in a way that enables us, nay, forces us, to give the data out to everybody.

There are a couple of other efforts here. I know folks are familiar with the more traditional Underwriters Laboratories. You might not be aware, and I hope they do well, though I'm not sure I'm a booster of their approach, that they're now a for-profit organization, a for-profit based on public safety, and to me those are fundamentally misaligned incentive structures. And what they're doing is a bit more of the "did you check all of the EAL / Common Criteria boxes" approach, which, as we've already discussed, really just measures your processes and not the safety of the end product.

We're partnering with Consumer Reports on some things right now; Consumer Reports is involved with us, and their business model is the one to think of for what we're going to be doing. If you're wondering how we're going to make money or operate, look at Consumer Reports: we're trying to do it the exact same way they do. We won't accept things from vendors; we'll go out and buy the software and analyze it ourselves, or get it the same way you do, from the legitimate download sites. And finally, one thing a lot of people ask us about: we're not looking at software configuration, vendor history, how quickly they put out patches, interpreted code, corporate policies, privacy, any of those things.
It's not that we think those are unimportant; it's that that data is already out there, other people are looking at it, and we're trying to focus on the blind spot, the thing no one has data on yet. We've compared what we're doing to nutritional facts: having the nutritional facts on the package is great, but you still sometimes need a doctor or a dietician. The security specialists and consultants can tell you what diet you should be on for software, bring in all of those other factors, and put it into a whole picture for you. We aren't telling you what to buy or what not to buy; we don't want to do that. We want to tell you what's inside so you can make your own informed decision, the same way I enjoy my candy bar, I want to eat my candy bar, but I do want to know what it's made of, and I actually enjoy it more when I know how many calories and how much sugar I'm cheating with.

And since we've just got one minute left, I'm flying through this. This is what you'll see at the end of the year: some of those curated data releases coming online. We're going to release our static measurement methodology in detail on the site so other folks can recreate it. We're not releasing the actual source code, because of gaming concerns, but we're releasing enough that you can recreate it on your own. In 2017: the large-scale data-analytics dumps, Internet of Things, and then the large-scale fuzzing results and the mathematical models, all released and described publicly.

Two slides left. These are our thanks: DARPA, AFL and the AFL community, Capstone. The Ford Foundation is funding us through Consumer Reports, and we've got some money from DARPA; I'm a huge fan, they took the risk on us. And this is the mandatory slide at the end: because we have DOD funding, but it's basic research, meaning we can publish without having to go ask permission (this information just needs to be out there), none of this necessarily represents the opinions of the Air Force. DARPA said, "me too, make sure: me too." And the White House said, "why do you keep name-dropping us all the time? Would you please knock it off?" But so be it. Thank you very much. And finally, if you want to get in touch with us. Thank you.