00:00:00.133,00:00:04.471 >> Thank you for waiting with us, I am Mudge this is Sarah and 00:00:04.471,00:00:07.507 we're going to give you a little uh uh information about the 00:00:07.507,00:00:10.410 Cyber Independent Testing Lab. You may have been reading about 00:00:10.410,00:00:14.381 it kinda pull back the curtains and ahh so that's the 00:00:14.381,00:00:17.417 introduction. I'm gonna let Sarah start this off as a kinda 00:00:17.417,00:00:21.455 as a heads up uh we go a lot from like a 30000 foot all the 00:00:21.455,00:00:24.424 way down into the weeds and then back up so it's gonna be a 00:00:24.424,00:00:28.829 little bumpy ride but hopefully it's enjoyable. >> We know 00:00:28.829,00:00:31.698 different people are interested in different technical levels so 00:00:31.698,00:00:35.002 we wanted to make sure we hit something for everyone ahh, but 00:00:35.002,00:00:39.006 anyways this is just a preliminary data peak behind the 00:00:39.006,00:00:43.911 scenes so if you visit our website it's mostly just coming 00:00:43.911,00:00:46.113 soon right now but uhh this is what were but uhh uhh this is 00:00:46.113,00:00:51.118 what we are up to. Just a quick peek. Ok so first off uhh what 00:00:55.022,00:01:01.528 problem are we trying to address its the fact that uhh we been 00:01:01.528,00:01:04.331 trying to get people to care about security for years but 00:01:04.331,00:01:07.868 then whenever somebody says ok I give I get that it's important 00:01:07.868,00:01:12.039 what should I do we don't have very much concrete data to give 00:01:12.039,00:01:15.909 them. 
They don't have anything to act on, there's no consumer 00:01:15.909,00:01:19.780 reports that uhh firsts software security that tells them what's 00:01:19.780,00:01:24.084 the safest browser and if I put it up to the floor what's the 00:01:24.084,00:01:27.888 safest browser we'd get a lot of very strong opinions but a lot 00:01:27.888,00:01:31.792 of those opinions would be based on what feels true rather than 00:01:31.792,00:01:36.530 actual data and that's the problem that nobody actually 00:01:36.530,00:01:41.535 really has data umm. So umm what are the things people are trying 00:01:43.570,00:01:47.107 to do right now and why don't they work, first off there's 00:01:47.107,00:01:51.345 certifications and evaluations which are frequently focussed on 00:01:51.345,00:01:55.515 processes and procedures and don't actually look at code or 00:01:55.515,00:01:59.252 at the final products so they can't really speak to security 00:01:59.252,00:02:03.090 umm. There's industry marketing labels but these are frequently 00:02:03.090,00:02:07.394 vague and misleading umm and their their not you can't 00:02:07.394,00:02:10.964 compare between them if two people have the same sticker on 00:02:10.964,00:02:13.500 their product you don't know which of them did it better or 00:02:13.500,00:02:17.337 which one did just the bare minimum and then stopped. 
So it 00:02:17.337,00:02:22.309 is not useful to consumers umm, source code review definitely 00:02:22.309,00:02:26.113 has its place but it's not there to serve the consumer it's 00:02:26.113,00:02:30.584 either an internal thing by the vendor or the vendor's paying 00:02:30.584,00:02:35.322 somebody to do this on their product and the consumer 00:02:35.322,00:02:38.658 doesn't see the results of this or whether the bugs that were 00:02:38.658,00:02:44.131 found in review actually got fixed umm and so it's just the 00:02:44.131,00:02:49.136 incentive structure there isn't right for consumer advocacy and 00:02:49.136,00:02:53.240 finally legislation is frequently well meaning but a 00:02:53.240,00:02:56.643 lot of the time ends up trying to fix a problem by making it 00:02:56.643,00:03:01.581 illegal to look at it and that's a terrible way to fix anything. 00:03:03.650,00:03:07.621 >> So is this mic on, can folks hear me in the back? Great, 00:03:07.621,00:03:11.491 thank you. So I wanted to dive pretty quickly into some of the 00:03:11.491,00:03:16.063 data umm and just keep it in your mind, we're measuring how 00:03:16.063,00:03:19.299 difficult it is for an adversary, how much work you can impose on 00:03:19.299,00:03:23.036 an adversary to newly exploit a piece of software. The more work 00:03:23.036,00:03:26.106 you can impose on your opponent and you know without any work on 00:03:26.106,00:03:29.176 your own, you know the better off you are. So today we've 00:03:29.176,00:03:32.546 looked at about a hundred thousand binary applications umm 00:03:32.546,00:03:35.649 with about a hundred plus features uhh static features 00:03:35.649,00:03:39.786 we've a whole bunch of dynamic ones we'll go into uhh some 00:03:39.786,00:03:43.423 difference there uh and uhh measure you know as bug hunters 00:03:43.423,00:03:45.792 and attackers and exploiters which I've been doing that for 00:03:45.792,00:03:52.732 20 almost 30 years now, I'm old uhh what we see. 
So when you get 00:03:52.732,00:03:56.536 such a large amount of data and applications, binaries, 00:03:56.536,00:03:58.839 libraries to look at and operating systems you can build 00:03:58.839,00:04:04.211 up these continuums. So here's the first continuum, this is 00:04:04.211,00:04:09.082 Linux Ubuntu and this is the first ten thousand some odd 00:04:09.082,00:04:13.120 binaries and the way you read this is the easier to exploit is 00:04:13.120,00:04:16.389 further to the left going down to negative numbers here and you 00:04:16.389,00:04:19.326 know we normalise things at a hundred percent on the far right 00:04:19.326,00:04:22.462 and then the number of binaries which are executables and 00:04:22.462,00:04:26.500 libraries are you know bucketed into the columns of five point 00:04:26.500,00:04:30.937 bins and with this you could kind of, let me back up. The 00:04:30.937,00:04:33.773 installation is this is all the base software that comes with 00:04:33.773,00:04:37.010 it, plus the most common third party applications 00:04:37.010,00:04:40.547 installed and because of this you can kinda pull in any of the 00:04:40.547,00:04:43.583 datasheets from the really detailed data, uhh, extract the 00:04:43.583,00:04:48.188 uhh the relevant metrics and kinda plot them so if we throw 00:04:48.188,00:04:51.424 the first couple on here uhh we're gonna do it on a relative 00:04:51.424,00:04:55.428 hardening line underneath, it's mapped to the top part and we'll take 00:04:55.428,00:04:59.599 two common browsers, Chrome and Firefox and Chrome you know did 00:04:59.599,00:05:01.234 a little bit better, it's a little bit more difficult to 00:05:01.234,00:05:04.004 exploit. If you look at the underground market, the cost of a 00:05:04.004,00:05:07.073 zero day on Chrome bears that out, it's slightly more 00:05:07.073,00:05:10.243 expensive than the cost for a zero day for Firefox uhh and 00:05:10.243,00:05:13.547 then the yellow triangle goes up on top to show where that is. 
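The bucketing described here (scores grouped into five-point bins, with easier-to-exploit binaries at lower and even negative scores) can be sketched roughly as follows. This is only an illustration: the function name `bucket_scores` and every score in `example_scores` are invented for the example and are not data or code from the talk.

```python
import math
from collections import Counter


def bucket_scores(scores, bin_width=5):
    """Count how many binaries fall into each bin of `bin_width` points.

    Each bin is labelled by its lower edge, so with bin_width=5 a score of
    -7.5 lands in the bin labelled -10 and a score of 63 in the bin labelled 60.
    """
    counts = Counter()
    for score in scores.values():
        lower_edge = math.floor(score / bin_width) * bin_width
        counts[lower_edge] += 1
    return dict(sorted(counts.items()))


# Hypothetical per-binary scores, invented purely for illustration.
example_scores = {
    "updater": -7.5,
    "office_app": 12.0,
    "libfoo": 14.9,
    "browser_a": 63.0,
    "browser_b": 61.0,
}

histogram = bucket_scores(example_scores)
for lower_edge, count in histogram.items():
    print(f"[{lower_edge}, {lower_edge + 5}): {count}")
```

Plotting the bin counts as columns would give a histogram of the same general shape as the continuum on the slide, with any single application located by the bin its score falls into.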
Next 00:05:13.547,00:05:18.919 slide. We see a little further down uhh cause we approaching 00:05:18.919,00:05:23.423 the only 5th percentile mark uhh you start to see some of the 00:05:23.423,00:05:25.992 office suites the client side applications umm for crowds like 00:05:25.992,00:05:29.095 this this isn't that much of a surprise because we know the 00:05:29.095,00:05:32.566 people writing those either it's it's they don’t view themselves 00:05:32.566,00:05:35.468 as the most common attack surface but it also why client 00:05:35.468,00:05:39.172 side attacks in attachments that get opened up by these are you 00:05:39.172,00:05:43.910 know the easiest path to compromise an exploit uhm and 00:05:43.910,00:05:50.850 next slide. This is OS/X so we’ve done this on Linux, OS/X 00:05:50.850,00:05:54.120 and Windows and we’ll have some of the arch-classes and the 00:05:54.120,00:05:56.923 other architectures coming online and instead of just the 00:05:56.923,00:05:59.726 first ten thousand this is the Thirty eight thousand so this is 00:05:59.726,00:06:04.831 all of OS/X El Capitan uhh and about twenty thousand third 00:06:04.831,00:06:07.968 party applications that we went out and measured uhh and then 00:06:07.968,00:06:10.337 plotted on here so let’s see what this looks like now that 00:06:10.337,00:06:15.175 we’ve kinda seen Linux. 
We have a much wider spread for Chrome 00:06:15.175,00:06:18.278 on the far right, uhh pretty decent almost approaching the 00:06:18.278,00:06:23.383 95th percentile Safari and then Firefox is way down here and 00:06:23.383,00:06:26.820 this was surprising to us it was also surprising to the Firefox 00:06:26.820,00:06:29.689 development team, and they have now confirmed this as well, 00:06:29.689,00:06:32.525 [cough] umm so that’s nice and I am gonna get into what makes up 00:06:36.196,00:06:38.732 these numbers to show you the depth of the data that we’re 00:06:38.732,00:06:42.435 extracting out of the binaries to give you some confidence uhh 00:06:42.435,00:06:45.238 in how this is working and how it can be useful to you. So let 00:06:45.238,00:06:48.208 look at some other application out of these thirty some odd 00:06:48.208,00:06:51.511 thousand. Well here are the different office suites that are 00:06:51.511,00:06:55.015 available on OS/X you’ve got Microsoft Office coming in 00:06:55.015,00:06:58.652 pretty again pretty close to the bottom 5th percentile uhh Open 00:06:58.652,00:07:03.390 Office and you know Mac uhh uhh Apple’s Office, again all on one 00:07:03.390,00:07:07.160 particular platform. I should point out with this sort of view 00:07:07.160,00:07:10.263 from the data it is important to only do comparisons within the 00:07:10.263,00:07:14.734 same platform, it is not fair to say, oh how did Firefox on Apple 00:07:14.734,00:07:18.004 do compared to Explorer on Windows cause they have 00:07:18.004,00:07:20.373 different attributes and different traits I mean and 00:07:20.373,00:07:22.575 compilation settings are different but we’ll go into that 00:07:22.575,00:07:25.979 more. 
So let’s pull out a few more examples because it is nice 00:07:25.979,00:07:29.115 to have something on the far right and the far left and why 00:07:29.115,00:07:31.785 not let it be the very software that installs the security 00:07:31.785,00:07:35.588 patches for your product anyway, Umm, yeah it is a little 00:07:35.588,00:07:38.458 disappointing to see the Microsoft Office updater being a 00:07:38.458,00:07:43.663 negative 7 point 5. I was talking to a team, uhh that I 00:07:43.663,00:07:47.701 believe might be involved with uhh some of the zero-days going 00:07:47.701,00:07:50.470 around that are popping a lot of the OS/X boxes on El Capitan and 00:07:50.470,00:07:51.805 I said umm any clues as to you know how you going in and they 00:07:51.805,00:07:53.139 said oh, you’ll figure it out and I went back to them and I 00:07:53.139,00:07:58.745 said all the boxes that you’re going in on have Microsoft 00:07:58.745,00:08:03.683 Office installed don’t they? And they’re like maybe umm and this 00:08:05.919,00:08:09.389 is a real quick way of looking at not only as a defender being 00:08:09.389,00:08:12.058 able to choose and say hey if all things are even and I can 00:08:12.058,00:08:14.327 kind of choose which office suite cause it doesn’t matter to 00:08:14.327,00:08:16.529 me, I’m going to choose the one that imposes more work to my 00:08:16.529,00:08:20.066 opponent, that’s kind of the goal. The flip side on the 00:08:20.066,00:08:24.003 offensive side as the adversary is I want to choose the lowest 00:08:24.003,00:08:26.439 hanging targets because I’ve got finite amount of resources and 00:08:26.439,00:08:29.876 time as well, so why am going to go after the OS/X’s software 00:08:29.876,00:08:33.179 updater if I see Microsoft’s updater listening on the network 00:08:33.179,00:08:39.219 and taking input. 
Ok, this is the base, just the base no other 00:08:39.219,00:08:42.188 apps we haven't finished this one yet of Windows 10 and I 00:08:42.188,00:08:45.225 wanted to pull this out because this is really impressive 00:08:45.225,00:08:47.994 because this is a level of consistency that we didn’t see 00:08:47.994,00:08:52.432 in the base of umm Linux uhm when we were looking at the base 00:08:52.432,00:08:54.701 uh before we added in all of the other ones we did not see this 00:08:54.701,00:08:59.272 in OS/X and the development lifecycle and the compilation 00:08:59.272,00:09:02.642 process that Microsoft has imposed this is way different 00:09:02.642,00:09:05.979 than what it looked like in Windows XP uhm cause you can see 00:09:05.979,00:09:09.549 they are very consistent with what they put in there are a few 00:09:09.549,00:09:11.651 at the bottom that’s because there’s this one set of 00:09:11.651,00:09:14.721 applications that I did install it’s a big data analytics 00:09:14.721,00:09:18.425 package it was installed on all three it was the bottom of all 00:09:18.425,00:09:20.460 three and then we’re going to point out the lessons learnt on 00:09:20.460,00:09:24.898 that one in a moment as well. Rest assured as we put in more 00:09:24.898,00:09:27.600 third party applications and as we start to flesh out more of 00:09:27.600,00:09:30.303 the binary and more of the dynamic feature sets, this is 00:09:30.303,00:09:33.706 gonna start becoming a bit more uhh dispersed but right now this 00:09:33.706,00:09:36.242 is this is kudos Microsoft knows what the heck they’re doing on 00:09:36.242,00:09:40.513 Windows, umm Previous slide we might question if they know what 00:09:40.513,00:09:46.286 they’re doing as well on OS/X >> Ok, so >> Into the Mic >> 00:09:46.286,00:09:49.622 Alright, so the we’ve shown you some numbers but you probably 00:09:49.622,00:09:52.125 want to know what kind of data we’re looking at to produce 00:09:52.125,00:09:55.995 those numbers. 
Uhh and we’re there’s a lot of other 00:09:55.995,00:09:59.199 industries that have trained consumers in how to make complex 00:09:59.199,00:10:02.235 technical decisions so in figuring out how to help 00:10:02.235,00:10:06.272 consumers make security software decisions we’re drawing from 00:10:06.272,00:10:10.610 those. So first off we’ve got our static analysis features 00:10:10.610,00:10:13.880 things where we are measuring aspects of the applications 00:10:13.880,00:10:17.517 without running it and this is like the nutritional facts for 00:10:17.517,00:10:21.154 the software which functions were used and what are the 00:10:21.154,00:10:27.994 complexity values and then uhh the runtime testing the uhh 00:10:27.994,00:10:31.931 dynamic stuff where we’re fuzzing and crash testing is 00:10:31.931,00:10:35.368 like crash testing cars so you know so the Monroney sticker 00:10:35.368,00:10:39.038 that you see in any news new car that you’re buying uhh you know 00:10:39.038,00:10:42.909 tell you how it did in crash testing and what’s it’s EPA 00:10:42.909,00:10:47.413 expected miles per gallon things like that and then the umm 00:10:47.413,00:10:50.416 safety continuum where you see where that software falls 00:10:50.416,00:10:54.521 comparing to it’s peers or to the rest of the that software 00:10:54.521,00:10:57.757 environment uhh that’s like energy guide where you find out 00:10:57.757,00:11:00.793 you know how much does this fridge to run cost to run 00:11:00.793,00:11:05.798 monthly vs another one or what have you. 
>> So, in the uhh 00:11:10.303,00:11:15.008 static part which is a large amount of what went into 00:11:15.008,00:11:18.545 creating the scores in the back end on the continuums that you 00:11:18.545,00:11:22.582 saw, uhm and is really gonna be the focus of most of this talk 00:11:22.582,00:11:26.219 although we will talk about what we’re doing on the dynamic aspect, 00:11:26.219,00:11:31.724 includes essentially the hundred plus feature uhm extractions in 00:11:31.724,00:11:35.962 the following three categories, measurement of complexity uhh 00:11:35.962,00:11:39.599 turned out to be very very important not only because uhm 00:11:39.599,00:11:41.401 you know the more complex something is the more difficult 00:11:41.401,00:11:44.204 it is for the developers or the creators to have gotten it right 00:11:44.204,00:11:47.140 and then for other people to ensure its correctness in 00:11:47.140,00:11:50.410 operation and function and intent uhm but because it works 00:11:50.410,00:11:53.613 across the board. I mean we can do things like just the code 00:11:53.613,00:11:57.617 size as a simple metric example up through measures of branch 00:11:57.617,00:12:01.354 prediction complexity, through the stack adjusts, and rack and 00:12:01.354,00:12:04.490 stack those, up through more complex uhh things such as 00:12:04.490,00:12:07.927 cyclomatic complexity for functions and this works across 00:12:07.927,00:12:10.763 any sort of operating system, in fact it works on any binaries all 00:12:10.763,00:12:13.366 the way down to extracting them off from firmware from 00:12:13.366,00:12:17.103 televisions or cars and this is a nice way of comparing you know 00:12:17.103,00:12:20.206 how complex you know product A is versus how complex product B 00:12:20.206,00:12:24.844 is. The other area is a bit more specific to the types of 00:12:24.844,00:12:28.081 operating systems and development environments themselves. 
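Of the complexity measures mentioned, cyclomatic complexity is the easiest to show concretely. For a single function's control-flow graph it is McCabe's M = E - N + 2, where E is the number of edges and N the number of nodes. The sketch below assumes the graph has already been recovered from the binary and is handed over as a plain adjacency dictionary; the graph and its node names are hypothetical.

```python
def cyclomatic_complexity(cfg):
    """McCabe's M = E - N + 2 for one function's control-flow graph.

    `cfg` maps each basic block to the list of blocks it can jump to.
    Assumes a single connected graph (one function).
    """
    nodes = set(cfg)
    for targets in cfg.values():
        nodes.update(targets)
    edge_count = sum(len(targets) for targets in cfg.values())
    return edge_count - len(nodes) + 2


# Hypothetical function: one if/else followed by one loop.
example_cfg = {
    "entry": ["then", "else"],
    "then": ["join"],
    "else": ["join"],
    "join": ["loop_head"],
    "loop_head": ["loop_body", "exit"],
    "loop_body": ["loop_head"],
    "exit": [],
}

print(cyclomatic_complexity(example_cfg))
```

Because the measure only needs basic blocks and jumps, it can be computed the same way for any architecture or operating system, which is the portability point made above.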
So 00:12:28.081,00:12:31.951 application armouring is a catch all that we say for all of the 00:12:31.951,00:12:36.723 features that can be imposed and buttressed into and uhh 00:12:36.723,00:12:40.360 reinforced in the binary from the compilation stage so if you 00:12:40.360,00:12:44.063 do slash GS in Microsoft it will try to go in and do 00:12:44.063,00:12:47.834 stack protection by putting stack guards in uhh if you do 00:12:47.834,00:12:52.005 dash D fortify source on OS/X or Linux it will go in and look at 00:12:52.005,00:12:54.907 seventy two common risky functions to see if it can’t do 00:12:54.907,00:12:59.145 heuristics to replace them with a safer version. Uhm and you 00:12:59.145,00:13:02.882 know the more recent uhm modern advances have control flow 00:13:02.882,00:13:06.519 integrity, code pointer integrity to prevent ROP etc, return 00:13:06.519,00:13:10.490 oriented programming. Then there is the linker, so this is you 00:13:10.490,00:13:13.426 know where address space layout randomization comes in uhm is 00:13:13.426,00:13:16.095 important and of course there’s lots of measurements not just is 00:13:16.095,00:13:19.866 it turned on or not but you know is it high entropy uhh how 00:13:19.866,00:13:23.202 ubiquitous is it across all of the components etc etc and then 00:13:23.202,00:13:25.872 the loader which is you know what’s ultimately run at the 00:13:25.872,00:13:30.209 very end when the uh application is put in and is being told how 00:13:30.209,00:13:33.312 to mark memory as executable or not. These are all very important 00:13:33.312,00:13:37.917 safety features, in fact some of these are akin to automobiles 00:13:37.917,00:13:41.854 with seat belts, the airbag, the anti lock brakes. If I’m buying a 00:13:41.854,00:13:44.424 piece of software or using a piece of software that doesn’t 00:13:44.424,00:13:46.893 have address space layout randomization, fortify source 00:13:46.893,00:13:49.996 and stack guards, I’m buying a car that doesn’t have you 
know, 00:13:49.996,00:13:53.466 seatbelts and airbags and ABS and I need to know that because 00:13:53.466,00:13:56.602 they’ve been around for decades and they definitively and 00:13:56.602,00:13:59.906 demonstrably and quantifiably have made applications more 00:13:59.906,00:14:03.876 difficult to exploit. In fact uhh um, some of the capture the 00:14:03.876,00:14:08.181 flag uh people when they want to make an easier challenge binary, 00:14:08.181,00:14:10.683 they get an older compiler that doesn't have those attributes 00:14:10.683,00:14:14.020 and build it cause it’s much easier to exploit and the final 00:14:14.020,00:14:19.559 part is developer hygiene, so there’s about five hundred uhh 00:14:19.559,00:14:22.261 common function calls across POSIX and ANSI that 00:14:22.261,00:14:26.065 historically have been the root or bane of memory correct uh 00:14:26.065,00:14:31.270 corruption, code and data uhh uhh confusion etc and we break those 00:14:31.270,00:14:35.308 out into the following buckets, there’s ick functions, there’s 00:14:35.308,00:14:38.177 only a few of these. If you remember the poison sticker 00:14:38.177,00:14:42.048 underneath your kitchen sink for the detergent and stuff, you 00:14:42.048,00:14:45.685 know that’s ick. If you see an ick function like gets in 00:14:45.685,00:14:50.123 commercial code, run screaming, those people should not be doing 00:14:50.123,00:14:52.859 commercial code. 
And then there’s classic bad functions 00:14:52.859,00:14:54.827 which are difficult to use correctly, the unbounded 00:14:54.827,00:14:59.632 strcpys, some memcpys etc um risky ones which are you know the 00:14:59.632,00:15:03.836 bounded versions and more recently we have some good 00:15:03.836,00:15:07.340 functions that are hard to use incorrectly like the strlcat, 00:15:07.340,00:15:10.643 strlcpy uh done very nicely. The problem is the only people who 00:15:10.643,00:15:13.079 know about those good functions are folks who cared about 00:15:13.079,00:15:15.414 security in the first place and we don't teach those in school 00:15:15.414,00:15:18.918 or anything else, when you see those in the code in the binary, 00:15:18.918,00:15:22.388 um that’s a really good sign, but it’s not as good of a sign 00:15:22.388,00:15:24.423 when you don't see a consistent use of them and you see the 00:15:24.423,00:15:27.527 risky and the bad ones next to it. But anyway these are all the 00:15:27.527,00:15:30.396 sorts of things that we’re pulling out of all of those you 00:15:30.396,00:15:33.199 know tens hundreds of thousands of binaries from the static 00:15:33.199,00:15:37.804 component. When we, next slide please, right before I go into 00:15:37.804,00:15:41.707 the dynamic one um which I’ll just talk briefly about we’re 00:15:41.707,00:15:44.477 gonna do a bunch of deep dives just showing you deep dives 00:15:44.477,00:15:47.847 looking at one or two of those static sort of features and then 00:15:47.847,00:15:50.249 kind of pop the stack back up cause it gets way too much in 00:15:50.249,00:15:53.753 the weeds otherwise. Dynamic fuzzing cause it’s really nice 00:15:53.753,00:15:56.923 to say like well this looks like a super soft target uhm but 00:15:56.923,00:16:00.259 it’s even better to say we know we can get a crash with a SIGBUS 00:16:00.259,00:16:05.531 or illegal uh instruction or you know whatever, SIGSEGV. Right 00:16:05.531,00:16:09.268 now we’re using AFL. 
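Looking back at the static function-hygiene buckets described before the move to fuzzing, a rough sketch of the classification could look like the following. The bucket contents are partly assumptions: `gets`, the unbounded `strcpy`/`memcpy` family and `strlcpy`/`strlcat` are named in the talk, while the specific "risky" entries (`strncpy`, `strncat`) and `strcat` are filled in here only for illustration. The input is assumed to be a list of imported symbol names already extracted from a binary.

```python
# Illustrative bucket contents only; the real lists are much longer
# (the talk mentions roughly five hundred functions).
BUCKETS = {
    "ick": {"gets"},
    "bad": {"strcpy", "strcat", "memcpy"},
    "risky": {"strncpy", "strncat"},   # assumed examples of "bounded versions"
    "good": {"strlcpy", "strlcat"},
}


def hygiene_profile(imported_symbols):
    """Count how many imported symbols fall into each hygiene bucket."""
    counts = {bucket: 0 for bucket in BUCKETS}
    for symbol in imported_symbols:
        for bucket, names in BUCKETS.items():
            if symbol in names:
                counts[bucket] += 1
    return counts


# Hypothetical import list for one binary.
example_imports = ["printf", "strlcpy", "strcpy", "gets", "malloc"]
print(hygiene_profile(example_imports))
```

A profile that mixes "good" with "bad" or "ick" entries corresponds to the inconsistent usage the speaker calls a weaker signal than consistently good functions.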
AFL is fantastic, it gives us good 00:16:09.268,00:16:15.775 enough coverage umm and we use it for three specific results. 00:16:15.775,00:16:18.744 One of them is the exploitability because our 00:16:18.744,00:16:21.981 environment we really care about exploitability cause you know 00:16:21.981,00:16:25.284 we’re bug hunters we like to write exploit code, but it’s not 00:16:25.284,00:16:28.221 always the most important thing for different consumers which is 00:16:28.221,00:16:31.824 why we call out the level of disruptability as well. Think 00:16:31.824,00:16:34.327 about a big business, think about one that’s doing off-shore 00:16:34.327,00:16:38.197 oil drilling uhm they care a lot more about the disruptability 00:16:38.197,00:16:42.001 than the exploitability and if you ask them they’ll say if our 00:16:42.001,00:16:47.907 system crashes, the drill bit stops the molten core solidifies 00:16:47.907,00:16:50.676 and that off-shore oil rig goes off-line for more than twelve 00:16:50.676,00:16:53.613 months as we have to build a new one push it out there. I don’t 00:16:53.613,00:16:56.916 care, I don't want the system to be compromised and exploited, 00:16:56.916,00:16:59.986 but if it is I’d rather they all on IRC or doing it as a warez 00:16:59.986,00:17:03.155 distributions site and they didn't crash the system. 
You go 00:17:03.155,00:17:05.124 talk to a bank and it’s a different story cause they’re 00:17:05.124,00:17:09.428 saying, uhm we would want our systems to crash rather than be 00:17:09.428,00:17:11.898 exploited in a way that we can’t trust the integrity of the 00:17:11.898,00:17:15.401 underlying data and we’re propagating bad information and 00:17:15.401,00:17:19.171 before too long we can’t unroll and and and uh re uh reclaim the 00:17:19.171,00:17:23.175 books and then the final one which is a new one and I’m 00:17:23.175,00:17:26.779 calling this out here because this is about two three four 00:17:26.779,00:17:29.882 years, hopefully longer, off but it’s coming and this is algorithmic 00:17:29.882,00:17:33.552 complexity and this matters to any large distributed companies 00:17:33.552,00:17:37.023 like your LinkedIns, your Facebooks, your Googles uhm 00:17:37.023,00:17:40.293 because they’re pretty impervious to distributed denial of service 00:17:40.293,00:17:44.630 cause they’ve got more bandwidth than you know, well they are the 00:17:44.630,00:17:48.267 world’s bandwidth, uh for all intents and purposes and they’re 00:17:48.267,00:17:51.537 distributed and decentralised but when you can find a small 00:17:51.537,00:17:55.207 amount of input that causes the worst case in particular types 00:17:55.207,00:17:59.578 of algorithms so a linked list devolves or uh uh a hash table 00:17:59.578,00:18:02.915 devolves to a linked list, you can start taking these guys out and 00:18:02.915,00:18:07.420 they can’t defend against it. The traditional DDoS defences, 00:18:07.420,00:18:10.323 deaggregate, decentralise, increase bandwidth don’t work 00:18:10.323,00:18:14.560 and so based upon how we modify AFL we can get all three of 00:18:14.560,00:18:19.265 these, but fuzzing’s expensive. >> Right, so fuzzing’s 00:18:19.265,00:18:24.270 expensive, so we’d like to do as little of it as we need to. 
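The algorithmic-complexity case mentioned above, a hash table devolving into a linked list, can be demonstrated with a toy example. The table below uses a deliberately weak hash (the sum of character codes), which is an assumption made so that collisions are easy to construct; it is not a claim about any real product's hash function. The sketch counts key comparisons during insertion for well-spread keys versus keys crafted to land in one bucket.

```python
class ChainedHashTable:
    """Toy separate-chaining hash table that counts key comparisons."""

    def __init__(self, num_buckets=64):
        self.buckets = [[] for _ in range(num_buckets)]
        self.comparisons = 0

    def _index(self, key):
        # Deliberately weak toy hash: sum of character codes.
        return sum(ord(ch) for ch in key) % len(self.buckets)

    def insert(self, key, value):
        chain = self.buckets[self._index(key)]
        for i, (existing, _) in enumerate(chain):
            self.comparisons += 1
            if existing == key:
                chain[i] = (key, value)
                return
        chain.append((key, value))


def comparisons_for(keys):
    table = ChainedHashTable()
    for key in keys:
        table.insert(key, None)
    return table.comparisons


# 64 keys that each land in a different bucket.
benign_keys = [chr(48 + i) for i in range(64)]
# 64 distinct keys whose character codes always sum to 159, so they all collide.
colliding_keys = [chr(48 + i) + chr(111 - i) for i in range(64)]

print(comparisons_for(benign_keys), comparisons_for(colliding_keys))
```

The same number of inputs costs no comparisons in the spread-out case and grows quadratically in the colliding case, which is why adding bandwidth or more servers does not help against this kind of input.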
Uhm 00:18:24.270,00:18:29.108 and so what but on the other hand, math is cheap uhh so what 00:18:29.108,00:18:31.877 we’re doing is fuzzing a statistically significant 00:18:31.877,00:18:37.149 portion of the software and then using Bayesian math and linear 00:18:37.149,00:18:40.686 regression so that we can model how the rest of the software 00:18:40.686,00:18:45.891 would do, uh cause we don’t care about finding a specific exploit 00:18:45.891,00:18:49.595 we wanna know what categories of functions of vulnerabilities are 00:18:49.595,00:18:52.565 present and what sorts of problems do we expect to see and 00:18:52.565,00:18:57.303 for that we can model that based off the static features uhm the 00:18:57.303,00:19:00.873 uh uh Like somebody who’s actually looking for an exploit 00:19:00.873,00:19:03.809 still has to do all that heavy lifting but for our risk 00:19:03.809,00:19:08.180 assessments we don’t need that, so uh some software will have a 00:19:08.180,00:19:10.850 little A in the corner saying actual, we have really fuzzed 00:19:10.850,00:19:13.886 this, and some things will have little E’s saying it’s estimated 00:19:13.886,00:19:18.724 uhm and that’s the uhm that’s the icing on the cake, the part 00:19:18.724,00:19:22.728 that would make this scale really well for you know. 
>> And 00:19:22.728,00:19:25.431 mathematically we can show you to what level we’re able to 00:19:25.431,00:19:27.600 accurately predict this, you know ninety nine point nine nine 00:19:27.600,00:19:31.704 what ever we have and this isn’t too unusual you’re actually used 00:19:31.704,00:19:34.106 to this in a different area explain you know the cars >> 00:19:34.106,00:19:38.010 Right uh so like for the EPA miles per gallon they don’t run 00:19:38.010,00:19:41.547 every single car until it’s on fumes, they do it for enough so 00:19:41.547,00:19:44.116 that they can understand and model how the rest of the cars 00:19:44.116,00:19:49.188 will do, assuming no one’s trying to trick them with um 00:19:49.188,00:19:51.323 [laugh] >> Volkswagen figured out a way around that right >> 00:19:51.323,00:19:54.427 and so would continue to do spot checking and if we find an 00:19:54.427,00:19:58.130 anomaly that goes back into our model but it should work really 00:19:58.130,00:20:02.068 nicely long term for us >> yeah >> and even now, one of the 00:20:02.068,00:20:06.005 traditional asymmetries of defender and attacker because it 00:20:06.005,00:20:09.141 is something that works really well for defence not so well for 00:20:09.141,00:20:12.011 offence. >> Yeah, one of the things I learned when I was uh 00:20:12.011,00:20:14.914 deputy director of ATAP out in Google and got to see how Google 00:20:14.914,00:20:17.149 did things is uhm don't underestimate the power of 00:20:17.149,00:20:22.721 bayesian analysis and linear regression testing. So go ahead, 00:20:22.721,00:20:26.392 ok so we’re going to uh I’m just going to set this up for Sarah. 
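The "fuzz a sample, model the rest" idea can be sketched with ordinary least squares on a single static feature. This is a deliberately simplified stand-in: the talk mentions Bayesian math and linear regression over many features, and none of the actual model is described, so the one-variable fit, the function names and all the numbers below are assumptions made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def label_results(fuzzed, unfuzzed_scores):
    """Return {name: (value, 'A' or 'E')}: actual for fuzzed, estimated otherwise."""
    intercept, slope = fit_line(
        [score for score, _ in fuzzed.values()],
        [crashes for _, crashes in fuzzed.values()],
    )
    results = {name: (crashes, "A") for name, (_, crashes) in fuzzed.items()}
    for name, score in unfuzzed_scores.items():
        results[name] = (intercept + slope * score, "E")
    return results


# Invented data: (static score, crash classes observed while fuzzing).
fuzzed_sample = {"app1": (10, 9), "app2": (20, 7), "app3": (30, 5), "app4": (40, 3)}
# Invented static score for a binary that was never fuzzed.
not_fuzzed = {"app5": 50}

print(label_results(fuzzed_sample, not_fuzzed))
```

The "A" and "E" tags mirror the actual and estimated markers described for the reports; spot-check fuzzing of an estimated binary would supply a new actual data point to refit the line.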
00:20:26.392,00:20:29.862 I mentioned we want to take you on a deep dive on just a small 00:20:29.862,00:20:33.499 sub sets to show you uh kind of the power of what you can do 00:20:33.499,00:20:36.402 when you start to really tease out information on specific 00:20:36.402,00:20:40.673 attributes from the binary extraction uh and uh then we’ll 00:20:40.673,00:20:46.345 pop back up to a higher level view of the world again. >> Ok, 00:20:46.345,00:20:50.382 so um this is one of our automatically generated report 00:20:50.382,00:20:54.520 on Google Chrome on OS/X and uh I got to spend lots of quality 00:20:54.520,00:20:58.090 time with I can never remember whether people prefer Latex >> 00:20:58.090,00:21:03.162 Latex yeah [laughs] >> I read it more than I say it, so um the 00:21:03.162,00:21:07.133 first page is always first a table that gives the rubric for 00:21:07.133,00:21:11.604 how scores were achieved because as we tweak things we want to be 00:21:11.604,00:21:13.806 able to look at old reports and remember how we got those 00:21:13.806,00:21:18.010 numbers and then uh after that we get a summary of any 00:21:18.010,00:21:21.213 anomalies for the files, so did it have any weird flags that you 00:21:21.213,00:21:24.717 don’t normally see or strange initial permissions or what have 00:21:24.717,00:21:29.288 you uh how did it do for function consistency which uh 00:21:29.288,00:21:34.160 we’ll talk about in a little bit and then uh the file code and 00:21:34.160,00:21:37.663 data size for the main application average library, 00:21:37.663,00:21:41.367 total libraries and then all of that together because you know 00:21:41.367,00:21:45.237 the I mean Google Chrome the main application is just six 00:21:45.237,00:21:50.009 hundred and one bytes of code it’s a stubb, all the action is 00:21:50.009,00:21:53.145 happening in the libraries and when you look at the libraries 00:21:53.145,00:21:56.015 it links to directly and then the libraries they link to and 
00:21:56.015,00:21:59.919 so on down the rabbit hole eventually Chrome is using a 00:21:59.919,00:22:03.789 hundred and seventy six libraries so we put out that 00:22:03.789,00:22:07.493 number and then uh average and minimum library scores that 00:22:07.493,00:22:11.597 occurred too and then the next page is the main report 00:22:11.597,00:22:13.766 obviously there is more than one page of this if there’s a 00:22:13.766,00:22:16.602 hundred and seventy six libraries so this is just the 00:22:16.602,00:22:20.239 first page, but uh um the first line will be the main 00:22:20.239,00:22:23.676 application and then all the libraries listed after that and 00:22:23.676,00:22:29.348 uh the columns are whether it’s thirty two or sixty 00:22:29.348,00:22:33.919 four bit, what score did it get and then two categories of 00:22:33.919,00:22:37.022 features that we’re highlighting here are application armouring 00:22:37.022,00:22:42.428 and function hygiene. Uhm and so the application armouring are the 00:22:42.428,00:22:45.998 you know safety features that make software better and then 00:22:45.998,00:22:49.969 the uh function hygiene is um a measure of how well do the 00:22:49.969,00:22:54.974 programmers know what they’re doing. Um >> Go ahead >> Ok, so 00:22:57.142,00:23:00.746 that’s the more detailed view, this is a very coarse-grained view 00:23:00.746,00:23:04.783 comparing three browsers on OS/X and uh what we have here is that 00:23:04.783,00:23:07.686 we’re just looking at four application armouring features 00:23:07.686,00:23:12.992 ASLR, non-executable heap, stack guards and fortify source and 00:23:12.992,00:23:18.597 uh if all the files for your browser had ASLR enabled that 00:23:18.597,00:23:24.336 would be twenty five points uh if you had all if all files had 00:23:24.336,00:23:26.872 all four features you’d get a hundred points, which no-one 00:23:26.872,00:23:31.844 did. 
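The coarse-grained score described here (four armouring features, twenty five points each when every file has the feature, a hundred when every file has all four) can be sketched as below. The talk only states the endpoints; scaling each feature's points by the fraction of files that have it is an assumption, as are the feature keys and the example files.

```python
FEATURES = ("aslr", "nx_heap", "stack_guards", "fortify_source")


def armoring_score(files):
    """Score 0..100 from per-file feature flags.

    `files` is a list of dicts mapping feature name -> bool. Each feature
    contributes up to 25 points, scaled by the fraction of files that have it
    (the proportional scaling is an assumption, not stated in the talk).
    """
    if not files:
        return 0.0
    points_per_feature = 100 / len(FEATURES)
    total = 0.0
    for feature in FEATURES:
        with_feature = sum(1 for f in files if f.get(feature, False))
        total += points_per_feature * with_feature / len(files)
    return total


# Hypothetical browser with four files.
example_files = [
    {"aslr": True, "nx_heap": True, "stack_guards": True, "fortify_source": False},
    {"aslr": True, "nx_heap": True, "stack_guards": True, "fortify_source": False},
    {"aslr": True, "nx_heap": True, "stack_guards": False, "fortify_source": False},
    {"aslr": True, "nx_heap": True, "stack_guards": False, "fortify_source": False},
]

print(armoring_score(example_files))
```

Under this reading, a product that has a feature in only some of its files earns partial credit for that feature, while one that lacks a feature everywhere earns nothing for it.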
But Google Chrome comes out ahead because they had 00:23:31.844,00:23:36.615 pretty consistent application of ASLR and non-executable heap. 00:23:36.615,00:23:41.253 Safari did not do quite so well; they had all four of those 00:23:41.253,00:23:45.024 things present in some cases, but just not consistently. And 00:23:45.024,00:23:49.028 then Firefox was missing ASLR entirely, which was a shock to us, 00:23:49.028,00:23:53.232 and then if you looked at the Bugzilla comments it was a shock 00:23:53.232,00:23:55.968 to the developer team. >> So it was quite interesting because 00:23:55.968,00:23:59.438 Kim Zetter did a very nice article on us in, I can’t 00:23:59.438,00:24:03.809 remember the name of the journal, and some folks from the 00:24:03.809,00:24:06.779 Firefox dev team popped up and said that doesn’t make sense, 00:24:06.779,00:24:10.582 we’ve had ASLR since two thousand mumble mumble, and it 00:24:10.582,00:24:15.988 was enjoyable in a kind of [sounds] awkward way to watch 00:24:15.988,00:24:18.157 some of the other developers point out all the situations 00:24:18.157,00:24:20.726 where they intentionally disabled it and kind of guess 00:24:20.726,00:24:23.262 what we might have measured, and somebody said wait a second, they 00:24:23.262,00:24:25.698 got Safari and that doesn’t live on other things, this must be 00:24:25.698,00:24:30.235 OS/X. We have ASLR on OS/X, don’t we? And then somebody goes and 00:24:30.235,00:24:33.972 looks and goes no, not at all, and they dig it out and 00:24:33.972,00:24:37.209 sure enough, for backwards compatibility, you know, with ten 00:24:37.209,00:24:40.312 six OS/X, they had to drop it out.
Now the good news 00:24:40.312,00:24:44.049 is in September they’re going to rectify that. 00:24:44.049,00:24:46.151 It doesn’t look like they’re gonna try and 00:24:46.151,00:24:49.254 fix the non-executable heap the way Google Chrome did, 00:24:49.254,00:24:52.324 which is a bummer, but the good news is a fix is coming 00:24:52.324,00:24:55.961 based upon this data; we’ll see how well it goes across the 00:24:55.961,00:24:58.964 board. The bad news is till then I don’t know what their 00:24:58.964,00:25:04.670 recommendation is, maybe use Chrome. So that was a view of 00:25:04.670,00:25:07.439 just a few of the application armouring callouts, so let’s 00:25:07.439,00:25:11.343 dive down into just one specific one, since the data on the 00:25:11.343,00:25:14.913 backend that we’re using for all this is actually pretty rich, and 00:25:14.913,00:25:18.150 this is Fortify Source. For those who 00:25:18.150,00:25:21.053 aren’t familiar with this, there are seventy two functions, I 00:25:21.053,00:25:24.590 think it is seventy two presently, that the compiler 00:25:24.590,00:25:29.161 has a safer replacement version for, and if you set Fortify 00:25:29.161,00:25:33.332 Source on Linux or OS/X it’ll go through and see if you have any 00:25:33.332,00:25:36.034 of those functions in your code, do a bunch of heuristics 00:25:36.034,00:25:38.170 on each one to see if it can guess what you had really 00:25:38.170,00:25:41.907 intended, and then replace that risky function with one that’s 00:25:41.907,00:25:44.476 more strongly bounded, so it’ll kind of do some extra 00:25:44.476,00:25:48.580 safety for you and put that into the resulting binary. So we 00:25:48.580,00:25:52.551 looked at the Linux applications, and 00:25:52.551,00:25:55.120 across all of those there were about two million opportunities 00:25:55.120,00:25:58.524 for a risky function to be reinforced or improved
by Fortify 00:25:58.524,00:26:03.562 source and the way you read this graph is that each one of these 00:26:03.562,00:26:05.564 dots is a file and um along the X axis is what percentage of the 00:26:05.564,00:26:06.932 opportunities of those risky functions it found was able to 00:26:06.932,00:26:11.937 replace in the file and the Y axis is the number of those 00:26:16.241,00:26:19.778 risky function per file so you the far left you go little bit 00:26:19.778,00:26:23.115 off uh is there is a file with over ten thousand risky 00:26:23.115,00:26:27.152 functions that was only able to replace about seven percent of 00:26:27.152,00:26:32.291 those Uh for those who are Linux hackers uh systemd a kind of 00:26:32.291,00:26:37.629 extremely important uh uh binary uh for Linux was off the scale 00:26:37.629,00:26:40.299 how many? >> forty three thousand >> forty three thousand 00:26:40.299,00:26:43.702 opportunities, mostly spreads and it was able to successfully 00:26:43.702,00:26:47.906 re-enforce those less than seven tenths of one percent of the 00:26:47.906,00:26:51.810 time and the interesting part here is that the developer is 00:26:51.810,00:26:53.579 trying to do the right thing I mean the right things would not 00:26:53.579,00:26:56.248 to be used the risky function in the first place but sometimes 00:26:56.248,00:26:58.750 your kind of you’re kind of hosed and have to so they told 00:26:58.750,00:27:01.453 it fortify source but then they don’t know the efficacy or the 00:27:01.453,00:27:03.755 coverage that it got and the consumer needs to know that as 00:27:03.755,00:27:07.025 well because the fact that two people have antilock brakes is a 00:27:07.025,00:27:09.361 lot different than the fact that one of them will stop within 00:27:09.361,00:27:12.331 three hundred yards and one of them will stop within ten yards 00:27:12.331,00:27:14.266 and depending on your environment you need to be able 00:27:14.266,00:27:18.403 to know each so this gave us a nice view of 
source code 00:27:18.403,00:27:21.607 fortification across Linux, how does it look across OS/X >> Um 00:27:21.607,00:27:25.611 first a couple other things for this chart, the ah a third of 00:27:25.611,00:27:28.514 the files end up being in the ninety five to hundred percent 00:27:28.514,00:27:31.450 range and the rest were really evenly distribute from zero to 00:27:31.450,00:27:34.887 ninety five percent and the interesting thing to note is 00:27:34.887,00:27:38.123 that some of these very well fortified close to a hundred 00:27:38.123,00:27:40.993 percent files are up at like twenty five thousand functions 00:27:40.993,00:27:44.429 so it it’s not that they only had five functions and that’s 00:27:44.429,00:27:47.499 why they got a hundred percent, ok now moving on >> But it al 00:27:47.499,00:27:50.002 but it also means that two thirds of the time it is not 00:27:50.002,00:27:53.472 able to uh protect you uh when it’s putting on there. So this 00:27:53.472,00:27:56.742 is OS/X and you can see it’s weighted a little towards the 00:27:56.742,00:27:59.912 other side, in fact there weren't any functions uh any 00:27:59.912,00:28:02.948 binaries that had a significant number of functions that were uh 00:28:02.948,00:28:06.351 completely fortified the ninety five the hundred percent one the 00:28:06.351,00:28:09.288 largest one had like a hundred and twenty one uhh functions 00:28:09.288,00:28:12.925 that were replaced uhh and this led us to believe that source 00:28:12.925,00:28:17.095 code fortification is lagging behind in OS/X as to what it is 00:28:17.095,00:28:21.066 on Linux so we dove into the data decided to slice and dice 00:28:21.066,00:28:24.836 this one more way and look at it per function, so remember there 00:28:24.836,00:28:27.673 are seventy two functions and the way you read this chart is 00:28:27.673,00:28:31.577 the far left one says there are about thirty seven functions out 00:28:31.577,00:28:33.912 of that seventy two that were never zero to 
five percent of 00:28:33.912,00:28:37.983 the time were they ever able to be fortified and replaced. Far 00:28:37.983,00:28:41.286 right, you see that there were fifteen functions that almost 00:28:41.286,00:28:44.523 all the time were able to be replaced successfully with the 00:28:44.523,00:28:48.427 safer versions. Far right is made up essentially of your strcpys, 00:28:48.427,00:28:50.228 your unbounded ones, where it says I know what you’re 00:28:50.228,00:28:53.198 trying to do; the far left is all of your pointer arithmetic on 00:28:53.198,00:28:57.235 memcpys, largely, but there were twenty eight functions out of 00:28:57.235,00:29:00.672 there that never got touched, they’re essentially academic. >> 00:29:00.672,00:29:05.811 Sorry >> hmm? >> I had some slide issues >> Oh, no worries. 00:29:05.811,00:29:09.014 So with our hypothesis that it’s a little more mature on Linux 00:29:09.014,00:29:14.753 than it is on OS/X, what did OS/X look like? It’s not that good, 00:29:14.753,00:29:17.723 so there were, you know, fifty plus out of the seventy two 00:29:17.723,00:29:21.159 that are only ever occasionally replaced, and then a whole 00:29:21.159,00:29:23.362 slew of them, well, fifty two were zero percent out of the 00:29:23.362,00:29:26.365 seventy two. So this is a way of using the data, rather than 00:29:26.365,00:29:29.368 looking per binary, to look at an entire environment 00:29:29.368,00:29:32.170 and figure out its maturity level, as a consumer, as to the 00:29:32.170,00:29:35.607 different safety mechanisms that are in place. >> And where 00:29:35.607,00:29:38.710 there were fifteen functions that were ninety five to a hundred 00:29:38.710,00:29:41.513 percent fortified in Linux, here there’s only one in the ninety 00:29:41.513,00:29:44.516 to ninety five bin and then nothing was a hundred percent 00:29:44.516,00:29:48.286 fortified.
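[Editor's sketch, not part of the talk.] The ratio behind these charts, the share of fortifiable call sites that actually ended up fortified, might be computed along these lines. The `__name_chk` naming is how libc exposes the checked variants (for example `__strcpy_chk`); everything else here is an assumption for illustration: the function name fortify_coverage, the input shape (call-site counts per symbol), and the idea that counts have already been extracted from a binary by some other tool.

```python
import re

# Checked variants emitted by source fortification look like __strcpy_chk.
_CHK = re.compile(r"^__(\w+)_chk$")


def fortify_coverage(call_counts, fortifiable):
    """Return the fraction of fortifiable call sites that were fortified.

    call_counts: dict mapping a symbol name to its number of call sites
                 in one binary (hypothetical input format).
    fortifiable: set of base function names that have a checked variant.
    Returns None when the binary has no fortifiable calls at all.
    """
    fortified = 0
    unfortified = 0
    for symbol, count in call_counts.items():
        match = _CHK.match(symbol)
        if match and match.group(1) in fortifiable:
            fortified += count
        elif symbol in fortifiable:
            unfortified += count
    total = fortified + unfortified
    if total == 0:
        return None
    return fortified / total
```

Grouping the same counts by base function across every binary on a system, instead of per binary, would give the per-function view used to compare Linux with OS/X.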
So you can see it is a much less matured feature on 00:29:48.286,00:29:53.291 OS/X >> Good >> Ok, um, and a lot of the things we’re looking 00:29:56.395,00:29:59.965 at are not new things to be looking for they’re things that 00:29:59.965,00:30:02.134 attackers always look at when they’re trying to find a weak 00:30:02.134,00:30:07.239 target and figure out how to target their efforts but usually 00:30:07.239,00:30:09.908 they look until they find something that looks juicy and 00:30:09.908,00:30:14.146 then they start working on that uh it’s as far as we know 00:30:14.146,00:30:17.015 nobody’s ever applied these metrics across the entire 00:30:17.015,00:30:22.020 ecosystem to see on a broad scale what it all looks like >> 00:30:22.020,00:30:24.990 And I wanted to add we’ve shown you at the beginning an example 00:30:24.990,00:30:27.793 of looking at individual files, the last couple showed you 00:30:27.793,00:30:30.562 looking at an entire operating system and kind of the maturity 00:30:30.562,00:30:33.131 level and now Sarah is going to walk you through is if you step 00:30:33.131,00:30:35.767 back and look at the institutions that made the code 00:30:35.767,00:30:38.270 and what you can infer about them that maybe they don’t even 00:30:38.270,00:30:42.708 know about themselves. >> So what we’re gonna look at is 00:30:42.708,00:30:46.445 Google Chrome and Microsoft Excel on OS/X and what you can 00:30:46.445,00:30:50.582 learn about the OS/X development uh for those two organisations 00:30:50.582,00:30:54.519 and what you can see uh as I’m gonna explain is that each 00:30:54.519,00:30:57.089 organisation has something they do very well, and each 00:30:57.089,00:31:02.461 organisation has a blind spot. 
So first off uh the application 00:31:02.461,00:31:05.330 armouring features what they really tell us about is the 00:31:05.330,00:31:09.201 development and build environments for the ben the 00:31:09.201,00:31:13.972 software developers um and that scenario where Google Chrome 00:31:13.972,00:31:18.043 does really well, they actually had the only sixty four bit 00:31:18.043,00:31:22.647 files on OS/X that had the non-executable heap flag enabled 00:31:22.647,00:31:26.952 uh out of the almost forty thousand binaries that we looked 00:31:26.952,00:31:31.823 at, so kudos to them they did a great job there. >> They figured 00:31:31.823,00:31:34.359 out how to manually go in and hack the binaries because they 00:31:34.359,00:31:37.062 knew the operating system and the ABI would allow you 00:31:37.062,00:31:40.932 explicitly call that out even though the compiler change don’t 00:31:40.932,00:31:44.803 give you the option to do that and they said hey OS/X tries to 00:31:44.803,00:31:47.472 make the heap non-executable by default but it’s system wide 00:31:47.472,00:31:50.542 control and we can explicitly say, no it should never be 00:31:50.542,00:31:53.178 executable for our app please and a lot of you might be 00:31:53.178,00:31:57.015 surprised that for at least a period of time it was executable 00:31:57.015,00:32:00.085 on all of your distributions, some of them it might still be. 
00:32:00.085,00:32:03.522 >> So they went above and beyond to make sure they had the 00:32:03.522,00:32:06.491 best development and build environment. On the other hand, 00:32:06.491,00:32:10.262 Microsoft Excel was still a thirty two bit binary and they 00:32:10.262,00:32:12.764 didn’t have some of the application armouring features 00:32:12.764,00:32:16.535 that are default for thirty two bit binaries on OS/X if 00:32:16.535,00:32:20.906 you’re using a modern build chain, and so what we can infer 00:32:20.906,00:32:26.211 from this is that either they were using a really old build 00:32:26.211,00:32:30.715 system for OS/X, because it’s not their wheelhouse, their main 00:32:30.715,00:32:33.785 focus is of course Windows development, or that they were 00:32:33.785,00:32:37.122 using a modern build environment and specifically disabled them. So 00:32:37.122,00:32:39.591 we’re giving them the benefit of the doubt in assuming that 00:32:39.591,00:32:45.330 it’s just a really old compiler. But then, on the other 00:32:45.330,00:32:49.501 hand, let’s look at function hygiene. Google Chrome has 00:32:49.501,00:32:53.205 more use of good functions than Excel does, but all those 00:32:53.205,00:32:56.741 binaries that had good functions also had the risky and bad 00:32:56.741,00:33:00.345 versions of the same functions, which sort of defeats the 00:33:00.345,00:33:04.716 purpose. You know, if you’re gonna use strlcpy, always 00:33:04.716,00:33:09.654 use strlcpy, don’t also use strcpy right next to it. On 00:33:09.654,00:33:12.390 the other hand, with Microsoft Excel, when they used the good 00:33:12.390,00:33:15.393 function, they just used the good function.
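[Editor's sketch, not part of the talk.] The function hygiene observation, a binary that uses the good function and its risky counterpart side by side, could be checked roughly like this. Only the strlcpy and strcpy pair comes from the talk; the other pairs, the table name RISKY_FOR and the function mixed_usage are assumptions added for illustration, and the list of imported symbols is presumed to come from some separate extraction step.

```python
# Safer function -> risky counterpart. Only strlcpy/strcpy is named in the
# talk; the remaining pairs are illustrative assumptions.
RISKY_FOR = {
    "strlcpy": "strcpy",
    "strlcat": "strcat",
    "snprintf": "sprintf",
}


def mixed_usage(imported_symbols):
    """Return (good, risky) pairs that both appear in one binary's imports."""
    symbols = set(imported_symbols)
    return sorted(
        (good, risky)
        for good, risky in RISKY_FOR.items()
        if good in symbols and risky in symbols
    )
```

An empty result for a binary that does import the safer functions would correspond to the Excel pattern described above, while a non-empty result corresponds to the mixed pattern seen in Chrome.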
They they don’t 00:33:15.393,00:33:19.197 have the strcopys in there anymore because they got beat up 00:33:19.197,00:33:23.335 for them too many times and they don’t let their developers use 00:33:23.335,00:33:27.739 them now, um and e so this is an area where Microsoft has learnt 00:33:27.739,00:33:32.110 the hard lessons and is really doing the right thing but uh at 00:33:32.110,00:33:35.213 Google it’s a little bit of the wild west it’s some developers 00:33:35.213,00:33:38.450 know about the good functions and so they’ll catch it in code 00:33:38.450,00:33:41.286 review or use the right function from the beginning but other 00:33:41.286,00:33:44.422 one’s don’t and so it’s sorta luck of the draw. >> Yeah, and 00:33:44.422,00:33:47.092 they have I mean they have really good programmers at 00:33:47.092,00:33:50.328 Google, this is not a knock on them but you know not all of 00:33:50.328,00:33:54.065 them are security uh as their main focus and it’s a little bit 00:33:54.065,00:33:56.868 of the luck of the draw who you get as the reviewer as well, but 00:33:56.868,00:34:00.438 Microsoft in their pre-compiler has a dirty words list 00:34:00.438,00:34:02.607 essentially and you’ve seen this on their deprecated se- you 00:34:02.607,00:34:05.510 know- se- uh these functions are too risky for security we will 00:34:05.510,00:34:07.979 not let you ship you know if you’re using that out of 00:34:07.979,00:34:11.583 Microsoft so it flags them, refuses to go to a gold build 00:34:11.583,00:34:13.585 and they have to go back and actually replace it so it’s an 00:34:13.585,00:34:17.355 institutionally enforced area which is impressive >>And uh 00:34:17.355,00:34:20.759 this is also just to plug a different pet peeve of mine the 00:34:20.759,00:34:25.196 um variance in what you see from security knowledge of developers 00:34:25.196,00:34:29.134 is also just the fault of computer science curriculum that 00:34:29.134,00:34:31.536 it’s not something that’s included for general 
computer 00:34:31.536,00:34:37.742 science uh but anyways moving on back on tar- back on topic 00:34:37.742,00:34:39.644 [incomprehensible off mic comment] okay. >>Uh this is the 00:34:39.644,00:34:43.782 the sort of curated report that we’re doing automatically off of 00:34:43.782,00:34:46.918 the data sets and it’s only highlighting those sub functions 00:34:46.918,00:34:50.588 out of app armoring and uh and the function hygiene but what 00:34:50.588,00:34:53.358 we’re doing is as the non profit, we’re opening up and 00:34:53.358,00:34:56.328 licensing the data sources so other folks can cut it and slice 00:34:56.328,00:34:59.631 it and dice it any way they want for their own analysis but we’re 00:34:59.631,00:35:02.834 modeling it off of consumer reports to figure out something 00:35:02.834,00:35:05.070 that’s in the middle that will give consumers the ability to 00:35:05.070,00:35:10.275 kind of look at a high level what’s going on go ahead and if 00:35:10.275,00:35:12.077 we pop back to the big picture and you think back to the 00:35:12.077,00:35:16.414 histograms of linux OS/X and the- and the the the early one 00:35:16.414,00:35:21.086 on Windows there was something ‘cause we always, we always 00:35:21.086,00:35:24.389 wanna know uh what the attacker had one what’s in the far left, 00:35:24.389,00:35:30.462 what’s making up all that low hanging fruit? 
Next. We had 00:35:30.462,00:35:32.931 installed the same package on all three of them, and there was 00:35:32.931,00:35:36.101 a package that had about six hundred binaries. If you are 00:35:36.101,00:35:39.004 any startup in Silicon Valley working with big data you 00:35:39.004,00:35:42.440 probably rely heavily on this, as do a lot of larger 00:35:42.440,00:35:46.845 organizations, and it really surprised us because this 00:35:46.845,00:35:48.813 would not have been picked up with source code analysis, which 00:35:48.813,00:35:51.082 is another reason, other than that we don’t want to 00:35:51.082,00:35:54.119 be under NDAs because we want to be able to give you the output, 00:35:54.119,00:35:56.721 why we don’t look at source, ‘cause source is the developer’s 00:35:56.721,00:36:00.325 intent but the binary is the actual ground truth. So what is 00:36:00.325,00:36:03.661 that? That’s the story of something called Anaconda. Go 00:36:03.661,00:36:06.998 ahead. We were looking at address space layout 00:36:06.998,00:36:11.036 randomization and measuring the efficacy in some of the 00:36:11.036,00:36:15.173 Linux areas, and this is a view of the number of 00:36:15.173,00:36:18.410 dynamic symbols that are fixed on the x axis and the 00:36:18.410,00:36:22.013 number of function pointers that are fixed on the y axis. You want 00:36:22.013,00:36:24.582 everything to be zero zero; that means that you’ve got all of the 00:36:24.582,00:36:27.719 ASLR, that the kernel can do as much as it can to try and 00:36:27.719,00:36:30.588 protect you against particular types of attacks. Anything moving 00:36:30.588,00:36:33.191 up to the top right is getting worse and worse and worse and 00:36:33.191,00:36:36.094 worse, so we were like, what the heck are all of these? And we 00:36:36.094,00:36:39.531 looked at it and we saw that, oooh, a bunch of them are all 00:36:39.531,00:36:42.300 from the same package, what’s going on?
And that’s when we 00:36:42.300,00:36:48.773 decided to look into this particular package. Go ahead. Um 00:36:48.773,00:36:51.943 it’s a DARPA funded package um [laughter] which is a little 00:36:51.943,00:36:54.412 embarrassing I mean I had a lot of fun at DARPA an- and I’m a 00:36:54.412,00:36:57.649 huge fan, I’m a booster, you saw cyber grand challenge that was- 00:36:57.649,00:37:01.853 that- that was history actually uh that’s going on and um >>You 00:37:01.853,00:37:04.856 know they’ll be the first ones to say look this is some kind 00:37:04.856,00:37:07.525 about rapid prototyping but as a consumer I need to know where 00:37:07.525,00:37:10.462 I’m accepting more risk than I ever expected, it’s a roll up, 00:37:10.462,00:37:13.398 it’s a whole bunch of open source software for R and Python 00:37:13.398,00:37:17.769 with all of your numpee and pandas and your xyplot and then 00:37:17.769,00:37:21.272 it’s got you know open SSL libraries and XML and libcurl 00:37:21.272,00:37:24.576 all precompiled and packaged together it is super convenient 00:37:24.576,00:37:27.745 I mean it’s like backtrack kali for Linux ‘cause it’s a royal 00:37:27.745,00:37:29.781 pain in the butt to try and put that stuff together and get it 00:37:29.781,00:37:32.851 working on your own somebody else does it yay! And I roll it 00:37:32.851,00:37:36.387 out the problem is on all of our systems I mean and I’m not the 00:37:36.387,00:37:39.457 only that’s roll it out, here’s the customer list from there um 00:37:39.457,00:37:43.795 from their website uh it’s major fortune ten companies you know 00:37:43.795,00:37:47.832 everything from Bank of America, Siemens, to DOD and what have 00:37:47.832,00:37:54.005 you um so how large is the footprint when you install it? 00:37:54.005,00:37:56.307 And why is it the bottom score on all of these operating 00:37:56.307,00:37:59.711 systems for six thousand plus binaries? 
Because we had these 00:37:59.711,00:38:02.947 binaries from other kits other things used open SSL and had the 00:38:02.947,00:38:06.017 binaries there and they weren’t scoring as low so what was up 00:38:06.017,00:38:09.954 with that? And as we looked across them um they had the old 00:38:09.954,00:38:13.892 version of the uh segment and section layout for Linux which 00:38:13.892,00:38:16.327 you know im- implied that was weird ‘cause you can overwrite 00:38:16.327,00:38:18.797 in strange ways they were missing things like basic stack 00:38:18.797,00:38:22.667 guards uh even on OSX almost everybody that’s been default 00:38:22.667,00:38:26.938 for the com- for the compiler settings for eons and on Windows 00:38:26.938,00:38:30.275 uh no high entropy uh um uh address based layout 00:38:30.275,00:38:33.044 randomization no safe structured exception handlers >>Oh they 00:38:33.044,00:38:36.247 didn't fix that >>Yeah, no it it refixed it again Microsoft 00:38:36.247,00:38:41.286 thinks it’s SHE rather than SCH or for that’s fine so we finally 00:38:41.286,00:38:43.354 were able to get to ground truth as to why this was happening 00:38:43.354,00:38:47.292 because on Linux uh they put in the uh the dot info section it 00:38:47.292,00:38:50.228 spits in the compiler version and the build environment and 00:38:50.228,00:38:53.832 their most recent batch of binaries is being recompiled 00:38:53.832,00:38:57.969 from all the source code on a what 2008 GCC running on a 2005 00:38:57.969,00:39:02.941 installation of Linux those were different defaults back then and 00:39:02.941,00:39:06.077 it missed all of those safety features, all of those anti lock 00:39:06.077,00:39:11.149 brakes, airbags, seatbelts, side you know side impact etcetera so 00:39:11.149,00:39:13.985 I hadn’t realized that they had essentially done the same trick 00:39:13.985,00:39:17.355 unintentionally that the capture the flags folks do to make very 00:39:17.355,00:39:21.292 easy targets for exploitation 
uh they missed almost a decade 00:39:21.292,00:39:24.128 worth of improvements and you saw how well that worked for 00:39:24.128,00:39:27.665 Google on the compiler tool change by staying up to date on 00:39:27.665,00:39:31.236 the latest and greatest um so that’s a kind of interesting 00:39:31.236,00:39:33.938 little side story we figured we’d share about why binar- 00:39:33.938,00:39:37.308 looking at binaries rather than source actually sometimes isn’t 00:39:37.308,00:39:42.280 the uh deficiency that many people think it is. >>Alright so 00:39:42.280,00:39:46.084 uh just to wrap things up there’s been various 00:39:46.084,00:39:49.487 misconceptions about what we’re doing or not so this is me 00:39:49.487,00:39:52.390 making sure that we’ve uh got everyone on the same page as us. 00:39:52.390,00:39:57.295 So these are preliminary results, the uh detailed data 00:39:57.295,00:40:00.498 releases are planned for end of this year, early next year, uh 00:40:00.498,00:40:02.900 the goal of this was just to familiarize everyone with what 00:40:02.900,00:40:06.704 we’re doing um and as we’ve stated we do binary only 00:40:06.704,00:40:09.774 analysis partially because we don’t want to be beholden to 00:40:09.774,00:40:13.177 vendors or have to have NDAs signed or anything like that but 00:40:13.177,00:40:16.748 also because in a lot of ways source code is the theory 00:40:16.748,00:40:19.817 whereas binaries are the ground truth and we want to see what 00:40:19.817,00:40:24.255 the consumer gets. 
Um this is not a pass fail you get a gold 00:40:24.255,00:40:28.726 start sort of thing, it’s meant to be quantified and comparable 00:40:28.726,00:40:33.164 between different products and uh we look at overall classes of 00:40:33.164,00:40:37.168 vulnerability and trends rather than specific instances so we’re 00:40:37.168,00:40:41.239 not going to be like you know giving you oh there’s an exploit 00:40:41.239,00:40:45.009 on this line of whatever you know um >>but we will tell you 00:40:45.009,00:40:48.313 that this will be exploitable at a 99 point 9 percent uh 00:40:48.313,00:40:50.682 percentage and that the adversary still has to do all 00:40:50.682,00:40:53.017 the heavy work in order to do that so finally an asymmetric 00:40:53.017,00:40:56.988 win for the consumer and the defender. >>This talk focused 00:40:56.988,00:40:59.991 just on the static analysis because we had to pick a silo to 00:40:59.991,00:41:04.128 focus on but uh we the dynamic analysis results are planned to 00:41:04.128,00:41:09.100 be released next year um and we are also going to be looking at 00:41:09.100,00:41:14.672 uh internet of things devices again next year um we’re a five 00:41:14.672,00:41:17.542 o one c three which means we’re a non profit charitable 00:41:17.542,00:41:23.548 organization uh the with non exclusive rights to use the IP 00:41:23.548,00:41:27.986 we are going to be offering uh the ability to license our data 00:41:27.986,00:41:33.624 or to uh um uh you know other partnerships with corporations 00:41:33.624,00:41:36.227 just not with the corporations that make the software >>This 00:41:36.227,00:41:39.364 was actually really important uh choice that we made because we 00:41:39.364,00:41:42.133 needed the incentive structures to be aligned that would enable 00:41:42.133,00:41:45.203 us nay force us to be able to give out the data uh to 00:41:45.203,00:41:48.439 everybody and there are a couple of other efforts here, I know 00:41:48.439,00:41:50.608 folks are 
familiar with the more traditional underwriters 00:41:50.608,00:41:53.177 laboratory you might not be aware and I hope that they do 00:41:53.177,00:41:57.382 well I’m not sure I’m I’m I’m uh a booster of their approach 00:41:57.382,00:41:59.917 they’re now a for profit organization and it’s a for 00:41:59.917,00:42:02.687 profit based on public safety and to me those are 00:42:02.687,00:42:05.590 fundamentally misaligned incentive structures uh and then 00:42:05.590,00:42:08.025 of course what they’re doing is a bit more of the did you do all 00:42:08.025,00:42:12.363 of the forty EAL common criteria which we’ve already demonstrated 00:42:12.363,00:42:15.366 really are just measurements of your processes and no- not what 00:42:15.366,00:42:19.771 the safety is in the end product >>And uh um we’re partnering 00:42:19.771,00:42:22.273 with uh consumer reports for some things right now but their 00:42:22.273,00:42:24.075 >>Yeah consumer reports is involved with this >>But 00:42:24.075,00:42:27.278 consumer reports is sort of business model is what we’re 00:42:27.278,00:42:29.580 think of that for what we’re going to be doing um >>Yeah if 00:42:29.580,00:42:31.482 you’re wondering how are they going to make money or how are 00:42:31.482,00:42:33.918 they going to do anything else, look at consumer reports and 00:42:33.918,00:42:36.721 we’re trying to do it the exact same way that they do. 
We won’t 00:42:36.721,00:42:39.290 accept things from vendors we will go out and buy the software 00:42:39.290,00:42:41.793 and analyze it ourselves or get it the same way you do off of 00:42:41.793,00:42:45.062 the download sites >>And finally one thing a lot of people ask us 00:42:45.062,00:42:49.734 about is we’re not looking at software configuration, vendor 00:42:49.734,00:42:53.438 history, how quickly they put out patches, interpreted code, 00:42:53.438,00:42:56.441 corporate policies, privacy, any of those things it’s not that we 00:42:56.441,00:42:59.310 think they’re unimportant it’s just that that data is already 00:42:59.310,00:43:02.213 out there other people are looking at it and we’re trying 00:43:02.213,00:43:05.583 to focus on the blind spot, the thing that no one has data on 00:43:05.583,00:43:10.121 yet and uh you know we’ve uh compared to what we’re doing to 00:43:10.121,00:43:13.991 nutritional facts so you know having the nutritional facts on 00:43:13.991,00:43:17.328 the thing is great but you still sometimes need a doctor or a 00:43:17.328,00:43:20.932 dietician so the security specialists the you know 00:43:20.932,00:43:24.068 consultants they can tell you what diet you should be on for 00:43:24.068,00:43:28.239 software and they can also bring in all of these other factors 00:43:28.239,00:43:32.043 and you know help uh put it into a whole picture for you. 
>>Yeah 00:43:32.043,00:43:35.079 we aren’t telling you what to buy or what not to buy, we don’t 00:43:35.079,00:43:37.582 want to do that, we want to tell you what's inside of it so that 00:43:37.582,00:43:40.184 you can make your own informed decision the same way that I 00:43:40.184,00:43:44.322 enjoy my candybar I want to eat my candybar um but I do want to 00:43:44.322,00:43:47.658 know you know what it’s made of and I actually enjoy it more 00:43:47.658,00:43:49.727 when I know how many calories >>And since we’ve just got one 00:43:49.727,00:43:52.129 minute left >>and I’m cheating [laugh] yeah so I’m flying 00:43:52.129,00:43:53.798 through this, this is what we’re going to see at the end of the 00:43:53.798,00:43:56.701 year some of those curated data releases coming online we’re 00:43:56.701,00:43:59.337 gonna release in detail on the site our static measurement 00:43:59.337,00:44:02.073 methodologies so other folks can recreate and do it we are not 00:44:02.073,00:44:05.776 releasing the actual source code because that’s uh a gaming issue 00:44:05.776,00:44:08.212 um but we’re releasing enough data so that you can recreate it 00:44:08.212,00:44:10.515 on your own. 
Twenty seventeen: the large scale data 00:44:10.515,00:44:13.618 analytics dumps and everything else, internet of things, and 00:44:13.618,00:44:15.486 then the large scale fuzzing results and the mathematical 00:44:15.486,00:44:18.189 models are going to be released and described more 00:44:18.189,00:44:21.259 publicly in twenty seventeen. Two slides, I gotta get to the- 00:44:21.259,00:44:25.496 these are our thanks: DARPA, AFRL, AFL community, Capstone; 00:44:25.496,00:44:28.199 the Ford Foundation is funding us through Consumer Reports; we’ve 00:44:28.199,00:44:31.369 got some money from DARPA, so huge fans, they took a risk on us. 00:44:31.369,00:44:34.372 And this is the mandatory slide at the end, because we have 00:44:34.372,00:44:37.041 DOD funding inside of it, but it’s basic research, meaning that 00:44:37.041,00:44:40.144 we can publish without having to go ask permission, because 00:44:40.144,00:44:43.214 this information just needs to be out there: this does 00:44:43.214,00:44:45.983 not necessarily represent their opinions or anything else from 00:44:45.983,00:44:49.754 the Air Force. DARPA said me too, make sure me too, and then 00:44:49.754,00:44:52.056 the White House says why do you keep name dropping us all the 00:44:52.056,00:44:57.194 time, would you please knock it off, but so be it. Thank you 00:44:57.194,00:44:59.931 very much >>And finally if you want to get in touch with us 00:44:59.931,00:45:01.866 [applause]