Get started. Everybody's like, okay, this is cool, but not that cool. Apart from John, who's totally cool. Okay, should we start? Let's start. All right, you're up. So I'm going to kick this off, because I sort of started Shellphish, but I'm sort of reaping the benefit of it without really doing anything. These guys are actually the brains behind it and the guys that stayed up all night doing all the work. I'm just looking at them thinking, oh my God, I remember when I did that. Twenty-five years ago. Giovanni did a lot of high-level planning and sushi delivery. Exactly. That's my role. Feed them. They will poop software. Okay. Cyber Grunt Challenge. If you look at our code, that's really accurate. That's the code. That's the code. That's actually true. So I'm going to be very short on this. Shellphish was born out of the SecLab, which is the security group at UC Santa Barbara. Every time you say UC, people say University of California. That's not right. That's Berkeley. UC Santa Monica. That does not exist. It's UC Santa Barbara. So get it right. SecLab is the group. That's where we come from. And the group is currently led by me and my assistant — um, colleague — Christopher Kruegel. We look very professional here, like professors, but we're actually hackers behind weird handles like everybody else. I never got the handle thing, but I needed one. And so if you look up Zanardi on the Internet, he's somebody with a gigantic nose and a ponytail, which I once had. Giovanni, would you say Chris is your life partner? I think Christopher is my academic wife. So I have to take care of all his needs. I wish he could be here. He would be super proud of everybody. But this is our university. Not bad. And that's why Shellphish is here. Our lab is exactly there where the arrow points.
We're right on the beach. We have a private beach. And that's why our tagline is Hacks on the Beach. We're lucky that way. All right. So, how did this start? It started in 2004 — I know, it's an incredibly long time ago. It was me, but then I had a bunch of grad students, including Chris, and we evolved into a community, and in 2005 we actually won DEF CON CTF. Never won since then. Those were the good old days, for the record. They say the further back it was, the more awesome it gets, so I'm milking it for whatever I can. But we grew up, you know, and then Chris moved to Vienna, became a professor there, recruited some more people; that became more people, that came back to Santa Barbara because it's awesome; became more people, more students, more students, even more students. And what happened is that some people went to Boston, so we have a substantial presence in Boston, and we evolved as a group more and more over the years, until at a certain point a lot of our graduate students became professors as well, all around the world — in London, at Arizona State University, University of Michigan, and so on, and so on. And right now, Shellphish is a very big group of academic people all around the world doing interesting stuff. So right now, our group is pretty much this: we're very inclusive, we foster research, and that's what we care about. And with this, I'll pass the presentation baton to Yan. Thank you, Giovanni.
Before we go on with the Cyber Grand Challenge itself, I'd like to give a shout-out to all the other Shellphish in the audience. So raise your hand if you're a Shellphish. Oh yeah, right there, nice, nice. Shellphish is bigger than just the CGC team — the CGC team is a strict subset — and we have a lot of people that were cheering us from the sidelines. So let's talk about the Cyber Grand Challenge. DARPA has a history of grand challenges, right? You guys are probably familiar with the self-driving car grand challenge and the robotics grand challenge, because they got a lot of press, similar to the Cyber Grand Challenge just now. And the idea behind these is: DARPA finds some fledgling technology — self-driving cars — and they fund it with a lot of money. Million-dollar prizes for self-driving cars. And this motivated a lot of people to put a lot of research into it. At the time — this was 2006, when we didn't even have smartphones — people were of course saying: do you really think that someday you'll be sitting inside a computer and it'll be driving you around? That's absurd. And now we have people driving themselves to the hospital while they're having a heart attack in their Tesla. So, you know, this technology push really pays off. And it's probably going to be the same with robotics — DARPA did the robotics grand challenge, and probably in 10 years we're all going to be dead — and it's also going to be the same with programs. So the Cyber Grand Challenge really pushed the frontier of automatic program analysis, exploitation, and defense. Right now it's in its infancy. I think you'll see how the CRSes did at DEF CON CTF — maybe they won't beat the best humans, but that's just the beginning. The chess systems didn't beat the best humans at first, and the self-driving cars aren't going to beat the best humans in races right now, but eventually they will. And eventually the Mechanical Phishes will kill us all — or hack us,
all while the actual robots kill us. So that's the Cyber Grand Challenge. Let's talk about Shellphish's involvement in it. As Giovanni said, Shellphish is a bunch of academics and hackers — so we're kind of hackademics. At one point we decided to shift our research interests at UCSB closer to binary analysis. We started looking into automated binary analysis and all of the things that go along with that — automatic vulnerability discovery and so forth — completely independent of the Cyber Grand Challenge. We started doing this sometime in 2013, and in late 2013 DARPA announces the Cyber Grand Challenge. So I have an email somewhere in my history saying: hey guys, check this out, this is this cool thing, maybe we should participate, because we're working on a lot of the same stuff. And everyone said: yeah, let's do it, let's go for it. I said great — and then promptly forgot about it for like a year. So the deadline for registration was in late 2014. I sent in the application literally 15 seconds before the deadline, because that's how we roll. And they said: great, you're in, congratulations, let's see what you've got — the first scored event is coming up in like four months. And we were like, okay, cool. Oh no — on the graph it's like in one month. So we said: cool, let's build a CRS. We're gonna rock the scored event — the first kind of practice round; "scored event" was the term DARPA used for them. So, the first practice round: we were gonna do super awesome, we were gonna kill it. And we totally forgot about it. The morning of the practice round I wake up and I'm like: shit, there's a practice round for the CGC stuff tonight. And so we started working on our CRS. The first commit to the CRS is two, maybe three hours before the practice round begins. So we start writing our CRS, the practice round begins, and we play the practice round with some
janky-ass CRS that kinda half works. Cool. So then we're like: all right, now that we've started, we're gonna get it all super put together before the second practice round. The second practice round rolls around, and we remember about it maybe three days before. So the second commit to the CRS happens three days before the second practice round. We build it up, build it up, build it up, play in the second round, and say: okay, cool, now we have a cyber reasoning system that's kind of ready to play in the CQE — if we keep working on it solidly until the qualifiers. And then of course we forget about it for another couple months. Then, two and a half weeks before the qualifiers, we remember: hey, wait a second, the qualifiers are coming up. So then we start working like crazy and not sleeping. Three weeks of complete insanity until the Cyber Grand Challenge qualifiers, and we have a cyber reasoning system that we can field — and we qualify. With three weeks of absolute insanity. And so then we figured: cool. A, we're super rich, because the qualifiers came with $750,000 of prize money. And B, we can now spend a year working solidly — solidly, with test cases, code freezes, milestones, lots of milestones, and, you know, continuous integration and test rounds and everything, for an entire freaking year. Agile development, that's the keyword here. None of that happened. So for 9 months we used our money to fly around the world giving conference talks, saying how cool we are, and how Fish is a Chinese martial arts expert — or wait, that was Kevin. Kevin is a Chinese martial arts expert, and Antonio is mysterious, and all this shit. But really what we should have been doing is working on the CRS, right? And 3 months before the finals — 3 months ago — we realized this. We're like: crap.
We should really write a CRS for real, actually. Like, we should take what we had in quals and actually extend it so it can win finals. So 3 months ago we started working like crazy. We stopped sleeping. I have a fiancée and I haven't seen her in 3 months, basically. That's the insanity. To the funding agencies that are listening: we're a lot more responsible than it looks. Yeah. This is our hacker persona. We also have an academic persona, where of course we have CI — of course, come on, who doesn't have CI? — and code freezes, and we finish all our papers 2 weeks before they're due so that our professors can go over them. Absolutely. This is the hacker Shellphish persona. All right, anyways. So we went crazy for 2 weeks — 3 months. The final commit to the CRS was 30 minutes before the air gap was established. 30 minutes. All right? And it was a commit in one of the core components, so shit could go wrong. There's a slide for that. And, all right, I'm killing us. So, we did it. We played the CGC finals. We got third. And this is the team that we've already introduced. We're from all around the world: Italy, Germany, the US, India. There was a guy who qualified with us — hopefully sitting in the audience — from Senegal. Fish is from China. We're from all over the place. And we are very rich, because we got TWO $750,000 prizes now. So that's kind of a brief intro to our involvement in the CGC. I'll pass it off to Jacopo to introduce the CGC as a platform and what it means. All right, so let's thank Yan for a very, very true and very effective introduction to the Shellphish hackers — very distinct from the Shellphish academics. All right, so, just very briefly: what does it mean to actually score well in the CGC?
You're going in blind, with binaries that you have never seen before. You have to analyze them in whatever way you want — there's no limitation on how you do it. You have to own them, either by a crash or by leaking a secret, and you also have to patch them so that the other guys cannot do the same to you. And this is a classic CTF structure, with some modifications — the DECREE operating system — to make it easier and more efficient to model, and easier to handle for a program. Okay? So one of the simplifications is that the architecture is Intel x86 and all opcodes are legal, which can lead to interesting situations that we will see in a bit. Syscalls are simplified, much easier to model: pretty much transmit and receive (like write and read), fdwait (like select), allocate and deallocate (like malloc and free), random, and obviously terminate. A lot easier to model for a program. And the actual binaries are a lot more realistic — they're not complete fake binaries. As a side note, the DEF CON CTF just finished, and it was also played on the same platform, so here's an example of how real and complex these binaries could be: one of the challenges in the DEF CON CTF was a PowerPC interpreter and JITter, which was awful. So there's a lot of room for complexity in these programs. And on the actual pwning side — I don't know if some of you guys want to barge in — basically what it means is that there is no state. Every program runs once; you either own it or it's gonna do its thing. There's no file system to modify. This is a lot easier to model. For the qualifications — and only for the qualifications — it was enough to just crash the program. Segfault, illegal instruction: you get the points. You have owned the binary.
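As a rough sketch of how small that attack surface is, the whole DECREE syscall table fits in a few lines (the syscall numbers here are an assumption from memory of the DECREE ABI, not copied from the spec — treat this as illustrative only):

```python
# Hypothetical sketch of the DECREE syscall surface.
# Numbers are assumed, not authoritative.
DECREE_SYSCALLS = {
    1: ("terminate",  "like exit()"),
    2: ("transmit",   "like write()"),
    3: ("receive",    "like read()"),
    4: ("fdwait",     "like select()"),
    5: ("allocate",   "like mmap()/malloc()"),
    6: ("deallocate", "like munmap()/free()"),
    7: ("random",     "fill a buffer with random bytes"),
}

def describe(syscall_number: int) -> str:
    """Return a human-readable description of a DECREE syscall."""
    name, analogue = DECREE_SYSCALLS[syscall_number]
    return f"{name} ({analogue})"
```

Seven calls total — compare that with the hundreds of syscalls a CRS would have to model on real Linux, which is exactly why the platform was built this way.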
For the finals, things are a lot more nuanced, and the actual exploitation, as we will see, is a lot more complicated — it's a very interesting application of symbolic execution and static analysis. So, as a basic idea, there are two ways you can do it. One is via a controlled crash, in which you show that you can not only crash the program in some place, but crash it at a place that DARPA's API specifies: please crash the program at this address with this register set to this value. If you can do that, you verify that you actually have control of the program. The alternative is that you can leak a secret flag from memory. And on the patching side, just a brief note on how the API is designed so that it does not become too easy. For instance, we can submit patches to the binary. Okay, so what is preventing us from just submitting a binary that just exits? This program obviously never crashes, but it also does not do anything useful. The way this is prevented is that there are functionality checks: the patched program must not lose functionality. If the program is a math calculator, it needs to still be able to do all the math operations it can do normally. Similarly, there is no signal handling, so there's no way to just hide away all the segfaults — if you segfault, you are crashing. And finally, what would prevent us from putting in an interpreter that runs everything, with checks before every possible instruction — am I gonna crash? am I gonna crash? — so that it will never crash? The way this is prevented by DARPA is that you can actually do it, if you want. But you're gonna pay a performance price. You're gonna lose points for performance. This is, believe me, not as easy as it sounds.
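To make the two proof-of-vulnerability types concrete, here's a toy checker. The register names, the negotiation idea, and the flag-page address `0x4347C000` are our best recollection of the final-event rules, so treat every constant here as an assumption:

```python
FLAG_PAGE_START = 0x4347C000      # assumed address of the secret flag page

def check_type1(crash_regs: dict, negotiated_ip: int,
                negotiated_reg: int, reg: str = "eax") -> bool:
    """Type 1: prove control by crashing with EIP and one general-purpose
    register set to values negotiated with the referee beforehand."""
    return (crash_regs["eip"] == negotiated_ip and
            crash_regs[reg] == negotiated_reg)

def check_type2(leaked: bytes, flag_page: bytes) -> bool:
    """Type 2: prove you can read the flag page by exhibiting at least
    4 contiguous bytes taken from it."""
    return len(leaked) >= 4 and leaked in flag_page
```

The point of the negotiation in type 1 is that crashing at an arbitrary address proves nothing; crashing at an address the referee picked *after* you claimed control proves you really steer the program.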
Understanding exactly how your patch is performing is definitely not an easy task. Many of us looked into it — I looked into it a bit, Antonio looked into it a bit. It's definitely pretty hard. And then we gave up testing performance. We just say: this is a patch, deal with it. Yes. That's very true. And, you know, informally we know other teams also had trouble. But I think no one knows better than Aravind how much of a pain it can be to actually test the performance and the functionality of a binary. So big props to Aravind for actually pushing through this task and making it work. This helped us a lot during our internal testing, even if it did not go into the live part. And I will now hand over to... somebody. Somebody. Antonio. All right. So the CQE, the qualifying event, was not the full Cyber Grand Challenge: you needed to patch binaries and you needed to crash binaries. You didn't need to exploit anything — you just needed to crash them. In the final event, you need to patch binaries, crash binaries to find where the vulnerabilities are, and then exploit those vulnerabilities. And on top of that, it wasn't just a simple program challenge where you got a binary and you crashed it. It was a game. So there was a game-theoretic aspect: you played against the binary, and then you played against the other actual competitors, right? Similar to a human CTF, but all with computers. So the competition was divided into 96 rounds — and that wasn't predetermined; it was however many rounds they got through in a day. There was a minimum time per round, and it ended up being 96. And there was a bunch of challenge binaries, as DARPA terms them, which were provided to the teams to hack.
For each round, the team would have a separate round score, and these, aggregated, would be their total score for the game. The round score was calculated as a multiplication of: the team's availability — which means how much did they fuck up the binary, how fast the patched binary still was, how much overhead the patches had (which is something Jacopo alluded to); the security score — which is whether the binaries were still exploitable; and the evaluation score — which means: did the team find an exploit for this binary. So it was very easy to screw yourself in this context, because they're all multipliers. If you completely break the binary, then even if you have perfect offense, even if you find all of the exploits for this binary, you still get zero points, because you broke the binary. In developing for this competition, we ran into a lot of organizational things, as I alluded to earlier. We started super late, so, for example, up until a depressingly short time ago, this was our database. All right. After all, this is a research group run by an Italian. Yes. Again, this is our hacker persona. So we actually had to do a join in order to get the actual information: when we got the real database up, we were joining between the paper database and the actual database. This is relevant because it's about our performance scores — this is the database of the performance scores we were trying to analyze. That's why it's relevant to the previous slides. Specifically, this database contains the feedback from some practice sessions for the final event — what DARPA called sparring partner sessions. We wrote them down, and then we had to join them with the real database to get the actual information. So this is the database. We also tried to go into code freeze several times. So, at 4 p.m. on some god-forsaken day,
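The multiplicative scoring described above can be sketched like this — the real CFE formulas for each factor were more involved, so this is just a toy illustration of why breaking a binary zeroes you out:

```python
def round_score(availability: float, security: float, evaluation: float) -> float:
    """Toy CFE-style score for one challenge binary in one round.

    availability: 0..1, how functional and fast the patched binary still is
    security:     0..1, whether the fielded patch stops other teams' exploits
    evaluation:   0..1, whether this team proved an exploit against others

    The factors multiply, so a zero anywhere zeroes the whole product.
    """
    return availability * security * evaluation
```

So `round_score(0.0, 1.0, 1.0)` is 0.0: perfect offense is worthless if your patch destroyed the binary, which is exactly the failure mode described above.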
we froze a component of our CRS called Farnsworth. And very shortly thereafter — this is the commit log. Right. So the code freeze didn't work very well. There are commits such as this gem here. So that's the commit log. That's Francesco here — you know, beautiful, beautiful code. This commit was okay, actually. He just has very high standards. Actually, it was probably crap. But you know. And then, of course, this is a long time into our code freeze: 15 hours before our nodes were shut down, a couple days ago, we were still changing very core components of the system. That's me, upside down. I was at this point no longer sane. So our CRS consisted of a lot of components. We had a central database that we called Farnsworth, for some reason, which stored all of the data that we got from the Cyber Grand Challenge API through a component that we'll talk about later. It stored network data, it stored the scheduling decisions of what jobs to run, and then it stored the results of those jobs. So now we're gonna go one by one through all of these components — probably pretty quickly; we have 15 minutes left — and we'll start with the core organization components. I'll hand over to Francesco and Kevin. So, obviously, coordination across all these nodes is very important, and since we needed to do that, we came up with using one database to simply store all the ground truth that we have. As a bunch of you probably know, this is from Futurama, so we just went with Farnsworth, because, well — good news, everyone! And it's the only component that we actually tested fairly well, at about 69% test coverage, I think. The rest probably bumped around at like 1%. Zero? Oh, perfect, even better. Who needs testing anyways, right?
I mean, I think angr has at least 15% code coverage. I think Francesco probably disagrees, but who cares. Then, on top of that, we also had Meister — the Germans, you know. Meister handles scheduling jobs: deciding what jobs we want to run, what parts of our pipeline we want to run — exploits, patching, whether we want to run AFL, these kinds of things. It scheduled them based on priority, and it is, obviously — sorry — the last component that we actually changed, with the last commit being, I guess, 2 hours and 18 minutes before the actual deadline. So yeah, this was at 12:42, and the deadline, the actual node shutdown, was at 3pm. But we made a commit — I think we rolled that commit back 30 minutes before the deadline. Yeah, there were a bunch of commits at like 2pm, but we actually reverted them and cleaned up the history, just to make sure that they're actually not there, because they caused a bunch of failures on our side. Anyways, we would also like to give a big shout-out to the open-source components that we rely on. One of them is Python; the Microsoft Research Z3 solver; all of our things run inside of Docker containers, which are running Ubuntu with PyPy. We're also using Kubernetes, QEMU, Peewee, VEX, Postgres, and obviously angr, which I'm sure a bunch of people are going to talk about now. And I think that's probably Yan, possibly Salls, Andrew, John I guess, and Pizza. Yeah, go ahead. No, I want to say something. I agree with everything you said. angr is the open-source binary analysis project that we have in the SecLab. It's really, really cool. It's been open source for like a year now — we released it at DEF CON last year, right? Yeah. It does everything. It's cool. No time. It's very cool. That's our logo — it's a Creative Commons image.
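A job scheduler in the spirit of the Meister description above might look like this — a hypothetical sketch, not the actual Meister code, and the job names are made up:

```python
import heapq

class Scheduler:
    """Toy priority scheduler: pop the highest-priority pending job first."""

    def __init__(self):
        self._queue = []
        self._counter = 0  # tie-breaker so equal priorities keep FIFO order

    def schedule(self, job: str, priority: int) -> None:
        # heapq is a min-heap, so negate the priority for "highest first"
        heapq.heappush(self._queue, (-priority, self._counter, job))
        self._counter += 1

    def next_job(self) -> str:
        return heapq.heappop(self._queue)[2]

sched = Scheduler()
sched.schedule("run-afl", priority=5)
sched.schedule("synthesize-exploit", priority=9)
sched.schedule("generate-patch", priority=7)
```

With those priorities, exploit synthesis runs first, then patching, then more fuzzing — the kind of "what do we spend CPU on next" decision Meister was making across the whole cluster.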
So, in order to do the actual exploitation and analysis pipeline, we split it up into a whole bunch of components and arranged them into these weirder things. Like, we've used concolic execution to do some basic analysis of what can go where. There's automatic exploitation and patching, which will all be talked about — I think they've all got their sections in this presentation. And there's crashes. I think you can slow down a little. It's just a cat. Fine. Who wants to? Sorry. So who's the first one? Who wants to talk about crashing? Crashing. Guys, we haven't been sleeping for three days, so check it out. I always talk this fast. I'm sorry if you're friends with me. To all the funding agencies: we're not doing drugs or alcohol. Looks like it, but we're not. I'm not even 21. All right. Crashes. Salls. Nick. Talk about it. You see how prepared we were for this huge DEF CON talk. Hello. So, crashing. Our exploitation strategy is: we find crashes, and we turn those into exploits. Pretty incredible. So actually, like a lot of teams, the thing we do the most is fuzzing, and this is what generates lots of test cases, lots of crashes — the majority of our crashes, but not all of the goodies we find. So we use AFL as our fuzzer. We'll explain how AFL works, like these slides do, I suppose. Essentially, it begins by generating lots of inputs which attempt to explore different parts of the program. The inputs are basically random — some of them are more or less educated guesses — and how well these inputs do in exploring the program is tracked by instrumentation, which is either compiled into the binary or provided by an emulator like QEMU. So we're going to go over all of these. Let's see, did I go over all these? So AFL does a great job of doing this. We've modified it slightly to work better on CGC binaries.
So we have a couple of hacks, which I think we'll be open-sourcing, which make it perfect for CGC — or at least a lot better. Okay. The uncrasher — I don't think that actually exists. There's no uncrasher, man. The PoVs and flags and all this shit — that's Rex. It's like karaoke slides, right? I already mentioned this, right? AFL. It's great. This is how fuzzing works: random stuff gets put into the binary, same input over and over again, and eventually it comes up with a random thing that works. This, though, is much harder for a fuzzer: we have to generate a very specific input. Fuzzing will have no luck with this — AFL just keeps losing, makes absolutely no progress. If you guys feel like you can't keep up with Pizza, I feel like that very frequently. Okay. So angr, on the other hand, is a symbolic execution engine. It's slower and more heavyweight, but it's great at finding very specific cases like the one we just described. The way this works is by generating states following different paths. As you can see here in the control-flow graph, we have different states which are being followed. Eventually there is a state which will satisfy the you-win expression, and we talk to Z3: we ask it to generate an input which gives us that state, and boom. So what we tried to do is combine both AFL and angr. This is called Driller. Driller begins by fuzzing. It gets basic code coverage of the program, the way you would expect AFL to; it maybe gets a couple of test cases — in this example, x and y. We get the cheap coverage. Next slide. Then we take those test cases and we trace all of them with angr. And then we take the input and make it almost completely concrete — we actually keep it symbolic, but we constrain it to be the concrete input that AFL generated.
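Here's a toy version of the "very specific input" problem described above — a check that random fuzzing essentially never satisfies, but that a constraint solver inverts instantly. All the constants are made up for illustration; the real thing uses Z3 on symbolic state, not a hand-inverted XOR:

```python
import os
import struct

MAGIC = 0xDEADBEEF
KEY = 0x1337

def target(data: bytes) -> bool:
    """Toy target: 'you win' only for one specific 4-byte input."""
    if len(data) < 4:
        return False
    (value,) = struct.unpack("<I", data[:4])
    return (value ^ KEY) == MAGIC

def fuzz(tries: int = 10_000) -> bool:
    """Random fuzzing: a 1-in-2^32 shot per try, so this ~never wins."""
    return any(target(os.urandom(4)) for _ in range(tries))

def solve() -> bytes:
    """'Symbolic' solving: invert the branch condition directly,
    the way a solver like Z3 would from the path constraint."""
    return struct.pack("<I", MAGIC ^ KEY)
```

Ten thousand random tries against a 32-bit equality check go nowhere, while inverting the constraint gives the winning input in one step — which is the whole pitch for bolting angr onto AFL.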
Then we see if, at any point in the program, we could have taken a different path which AFL failed to take. If we could have taken that path, we talk to Z3 — or angr, more specifically — and we say: give me an input which satisfies this new path. In this case, we get the CGC magic number. A new test case is generated, and now we continue the loop: we feed this back into AFL, which continues to mutate it further and fuzz, and it goes on and on as we continue to get more code coverage. And then we play video games. All right, so this next part is auto-exploitation: how we go from a crash, which is generated by AFL and Driller, to an actual exploit for the CGC which scores us a flag. So in this example there's a buffer overflow inside the heap, inside this malloced object here. And when you overflow this buffer, you actually control a function pointer. So we're inputting symbolic bytes, and eventually we control the symbolic address: we're gonna call into an address we control. And so, to exploit this, we use angr. We trace the input using angr and check first that the IP is symbolic — the PC. We ask: does the state have a symbolic PC? At that point we know it's probably exploitable: we can control where we're gonna jump to. So let's set the buffer to contain our shellcode. We ask Z3 to give us an input where the buffer contains shellcode, and then we jump to the buffer, and that gives us an exploit. And to synthesize the input, in angr that's just called state.posix.dumps(0). So in the CGC, this is done by taking a crashing input and tracing it with angr — keeping all the input that AFL created symbolic, and following the path it took until we reach the crash. That gives us the vulnerable symbolic state. So keep in mind, this is very simplified.
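The pipeline just described — detect a controlled PC, point it back at attacker-supplied shellcode — can be modeled with a toy "machine". Everything here (addresses, buffer size, payload) is made up for illustration; the real thing is angr tracing plus Z3 constraints on a symbolic state:

```python
BUF_ADDR = 0x1000        # made-up load address of the overflowed buffer
BUF_SIZE = 16
SHELLCODE = b"\xcc" * 8  # stand-in payload bytes

def vulnerable_copy(data: bytes):
    """Toy model of the bug: the 4 bytes past the buffer overwrite an
    adjacent function pointer, and the 'program' then jumps to it."""
    buf = data[:BUF_SIZE]
    fptr = data[BUF_SIZE:BUF_SIZE + 4]
    pc = int.from_bytes(fptr, "little") if len(fptr) == 4 else None
    return buf, pc

def synthesize_exploit() -> bytes:
    """Place shellcode in the buffer and aim the overwritten function
    pointer back at the buffer -- the constraint Rex asks Z3 to satisfy."""
    payload = SHELLCODE.ljust(BUF_SIZE, b"\x90")  # pad with NOP-style filler
    payload += BUF_ADDR.to_bytes(4, "little")     # overwrite the pointer
    return payload
```

Running the synthesized input through the toy bug lands the "PC" exactly on the buffer that holds the shellcode — the same two constraints (PC == buffer address, buffer bytes == shellcode) the symbolic version hands to the solver.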
We have a bunch more techniques that handle the harder cases and that can take a not-so-good crash and turn it into a better crash. You can find all of those when we do our open-source release, and when we release more details and papers later. In the open-source release, this component is called Rex — if you're interested in auto-exploitation, check that out. All right, so, the steps again: we've got the program counter; we create a vulnerable symbolic state where we control the PC; we add the constraints to set the shellcode and to make the program counter point to the shellcode; and then we synthesize the input, and that creates our exploit. Okay, so the next component we'll be talking about is auto-exploitation of flag leaks. If you didn't know, there are two types of exploits you can generate in the CGC. Type 1 is sort of classic memory corruption: show that you can control the program counter, and show that you can control a general-purpose register. However, there's another type, called type 2 — very creative — which shows that you can leak arbitrary memory from the program. So in the CGC there's actually sensitive data that's mapped at a special address in every single binary, and if you can leak content from this page in memory, you score points. Like Heartbleed, for example — there was a Heartbleed challenge in this game where the premise was leaking this sensitive data from the flag page. So the way we do this in a fast way is we actually use the Unicorn engine, which angr integrates, to make the entire input completely concrete. The only thing which is symbolic during flag-leak detection is the flag page itself. So we trace the entire program, and it executes very fast, because everything is being concretely emulated by QEMU with Unicorn.
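A toy model of that idea — keep everything concrete, treat only the flag page as "symbolic", and hook the output to recover the transformation. The XOR transform and all constants are made up; in the real system the hook collects symbolic expressions over flag-page bytes and Z3 inverts them:

```python
FLAG_PAGE = bytes(range(16))  # stand-in for the secret flag page
XOR_KEY = 0x42                # stand-in transformation constant

def challenge_binary(flag_page: bytes) -> bytes:
    """Toy challenge binary: transmits the flag page XORed with a key."""
    return bytes(b ^ XOR_KEY for b in flag_page)

def recover_flag(transmitted: bytes, transform) -> bytes:
    """Toy leak detector: since only the flag page is 'symbolic', the hook
    on transmit can invert the observed per-byte transformation. Here we
    brute-force each byte against the known transform, standing in for
    the solver inverting the symbolic expression."""
    out = bytearray()
    for observed in transmitted:
        for candidate in range(256):
            if transform(bytes([candidate]))[0] == observed:
                out.append(candidate)
                break
    return bytes(out)
```

Because the concrete parts run natively under emulation and only 4096 flag bytes stay symbolic, this kind of trace is dramatically cheaper than fully symbolic execution of the whole program.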
And the thing we detect is in transmit, because we hook it with angr: when the flag page is actually being emitted, we can see exactly which transformations were done to it. You can tell if it's been XORed, or if some complicated constraints have been applied. For example, this actually solved a DEF CON CTF challenge — I don't have enough time to talk about that, but we solved a DEF CON CTF challenge this way. We'll talk about it a little more later. You have seven minutes. So, of course, one of the challenges was to patch the binaries. We had a component called Patcherex that goes from an unpatched binary to a patched binary. The general idea is: we have patching techniques — for instance, let's encrypt the return address — and these patching techniques generate patches, such as: add this code here, add this data there. And these patches were injected into the binary in three different ways. The first one was slower but more reliable, and the last one was faster but a little bit less reliable — Fish is probably gonna talk about the reassembler. And we also had adversarial patches, designed to make our patched binary not analyzable by others. This is one of them that is pretty cool: QEMU detection. If you run this code in QEMU, it'll hang forever. Well, not really forever — as long as it takes to increment a 64-bit int 2^64 times. That's basically forever. And we actually owned the Cyber Grand Challenge visualization infrastructure with this. They're apparently using QEMU for instruction tracing, and at one point during the CGC we noticed that their instruction tracing had just stopped — and it stopped right on this code, which was designed to detect QEMU and hang forever. This is a zero-day, take a picture.
We have a lot of open source bug fixes to contribute, starting now. There were other adversarial patches too, so to speak. For instance, our binary started by transmitting the flag out, but to stderr, which could confuse an analysis system into misidentifying this as a type 2 vulnerability. We also had a back door: if some team fielded our patch in their submission, we could exploit it. I'm not sure whether the back door worked during the CGC, but it definitely worked during the CTF. How many teams fielded our back door during DEF CON? A lot of teams used our back door during DEF CON. Can you name names? No, no. It was three teams that fielded our back door at the CTF. During the CGC? The CTF. Okay, cool.

Then we also had generic patches; these are more standard academic things, such as protecting the return pointer and protecting data and code, and when we release this code you will see all of these more standard techniques. And then targeted patches. So, targeted patches: for the qualification event we just wanted to avoid crashes, because anything that crashes counts as an exploit. So, using a weird quirk of one of the syscalls, we checked whether memory was readable at a certain point, and if it wasn't, we bailed out before the crash. I would like to take specific credit (go back one slide) for our targeted patches in the final event, which were exactly nil. And it worked great, so what can I say? And one note: no functionality overhead. I thought that was a bug in the slides. No, no, that was intentional.
And one cool thing about this: we thought we were cool finding these weird syscall tricks to probe memory, but when we analyzed the qualification binaries from the other teams, after they were released, we found at least one other team using exactly the same trick. So you're saying they were both cool? Yeah, we're both cool. Okay. So, we are running out of time, so the only thing I want to say is: angr is awesome. I spent three minutes on it. I spent three days writing the reassembler and another three days writing the optimizer, so it works out. So what is a reassembler? Real quick: the reassembler is a static binary rewriter that basically, okay, we'll talk about it later. No, no. Okay. Oh, it's fine. Alright, we had a breakdown there; I think one of our slide guys... It's fine. The reassembler is awesome. Fish wrote a binary rewriter where you can inject code into binaries and it'll seamlessly reassemble the binary to include that code. Check it out in the open source release. There you go, there's not much more to say.

Basically, DARPA gave us 64 powerful servers. Wait, how many servers? 64. 64? I'm not joking. 64. Holy shit. Not 30. 64. So we tried to maximize the usage of these nodes, and yeah, we kind of did, with the CPU at least; not the memory, but that's it. Speaking of the 64 servers: we got a lot of media attention over the CGC, and what got people excited the most, strangely enough, was the fact that we had 64 servers all to ourselves. Incredible. Anyway. So, we implemented all these systems in a breakneck three months, and we pushed as hard as we could. We got it all running, we made commits until the last second, and we played the game. Or rather, our baby played the game. She walked on her own.
We walked into the room and they told us, "hey, your bot started up and it's doing a lot of disk I/O", and we freaking lost it. Because up until then we thought, you know, it's going to turn on, something will fail, and it'll all crap itself. So this was incredible. And then we got 3rd place. Top 3 is amazing for us, guys. I can't tell you how incredible it is to have been part of this competition. And we're going on.

Since we played in the CTF right after, we didn't really get much of a chance to actually look at the data, but we did look at it briefly. In total there were 82 challenge sets fielded; at least, our bot saw only 82, so if more were fielded we might have missed them. In total, Mechanical Phish generated about 2450 exploits. We generated a total of 1700 exploits for 14 out of the 82 challenge sets. All of them had 100% reliability as far as scoring goes: always leaking, or always crashing at a specific address. Did you check how many were mostly reliable? I did not. So essentially we only got 14 out of 82 challenge sets. We don't know how many GrammaTech, with TECHx and Xandra, got, or Mayhem, with ForAllSecure. The rumors are that we had the top exploitation but didn't have the best game theory. Like always, our SLA sucks. Our SLA is shit. Can you back up one slide? These are essentially the exploits that we actually generated. Actually, I should say the caveat to those rumors is that Mayhem was only up for half the game, and I think they still got almost as many exploits. And we got two of the rematch challenges, the historical challenges that DARPA introduced. One of them was SQL Slammer, which I think two other teams also got, but don't quote me on that. And there was also crackaddr, which supposedly only we got.
And then, in total, if you look at the different challenges and the vulnerabilities that were in there, this is the list of challenges that we got. And with that, from all of us, thank you for your attention.

So, real quick, let's talk about the next steps. The next step beyond automated hacking is machines augmenting human intelligence. At the DEF CON CTF we hooked up our CRS. Mayhem, as the winner of the CGC, played completely autonomously; we played with our CRS assisting us. I mentioned already that the CRS actually pwned one binary without us even realizing it. It also assisted us with five of the exploits: there were five exploits where, either after we provided the crash or just after providing an interaction, it created an exploit. And our CRS inserts back doors into every binary that it patches, so you might have heard already that a lot of teams actually used our back door. This all sounds awesome, but we didn't win; we weren't even close. We almost got close to last. So let's turn down the bragging. That's right, just a tiny bit. The CRS did amazing, but there were some issues, not with the CRS but organizational. For example, the DEF CON CTF organizers had to implement a separate API for the infrastructure than DARPA did, because the DARPA API had to be secret so that, you know, everyone was on an even playing field. So there were some API incompatibilities, and computers are very brittle, so these incompatibilities screwed us until the very last day. On the last day I feel we had a good showing; up until then the CRS kept crashing, the CRS kept getting invalid data. It was kind of touch and go.

So, as you might have heard, we're going to open source everything. Thank you. We're going to do a full open source vomit, because we believe in raising the playing field for everybody.
So the next time a CGC rolls around, we expect all of you to play as well, hopefully using our stuff. We don't have it all ready to push to GitHub right now because we're playing the CTF; we thought we had time, but we don't. But Chris, do you think we can do a symbolic open sourcing of angrop? Alright, let's do it, right on stage. I'm going to unplug the video, Kevin, while Chris is logging in. Unless, I mean, just don't type your password into the wrong field. I've seen that before at DEF CON. It was incredible, and it was someone fairly famous, too. Ah, there we go. Better safe than sorry. I think their password was "star star star star star star star star". I enabled logging before. Pssh. I'm sorry, I'm sorry. "Ciao ciao 4" is what Giovanni says. I think that's his password, though. Alright, we're going to plug it back in while we desperately find the settings of the open source project.

So, angrop is our ROP compiler. If you are tired of writing return-oriented-programming payloads by hand, you can use angrop, which uses angr, to compile ROP payloads into whatever you want. You say "actually, just read this memory" or "execute this syscall", and it figures out the ROP payload it needs to generate. Chris wrote it; he's an amazing guy and it's an amazing project. And here it is, being open sourced for the world. Boom. For the rest of the code, we just need to scrub, like, three of the private keys, because there are so depressingly many, and other depressing things, and then we will push it out. This week. Also, if you find a private key that we haven't scrubbed, can you please gently let us know, instead of destroying our infrastructure? Yes, please, we will appreciate it. We're hackers, and hackers have some of the worst security in the world.
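To give a flavor of what angrop automates, here is a toy, hand-rolled version of its "set these registers" job. The gadget addresses below are made up, and angrop's real value is that it discovers the gadgets in the binary and solves for the chain itself; here both are hard-coded.

```python
import struct

# Toy ROP "compiler": given "pop reg ; ret" gadgets found in the binary, chain
# them so chosen values end up in chosen registers when the payload runs.
GADGETS = {                      # hypothetical 32-bit gadget addresses
    "eax": 0x0804F001,           # pop eax ; ret
    "ebx": 0x0804F003,           # pop ebx ; ret
}

def set_regs(**regs: int) -> bytes:
    chain = b""
    for reg, value in regs.items():
        chain += struct.pack("<I", GADGETS[reg])   # return into the pop gadget
        chain += struct.pack("<I", value)          # value popped into the register
    return chain

payload = set_regs(eax=0xDEADBEEF, ebx=0x41414141)
```

Each gadget's trailing `ret` transfers control to the next entry on the stack, which is why simply concatenating (gadget, value) pairs yields a working chain.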
I wanna change it, exactly. My password is six characters long, just to give you an idea. Alright, Kevin, how do I get back to our, uh, thing? But I think we're done, basically. Thank you guys. So, stay in touch: hit us up on Twitter, by email, or jump on our IRC channels. You can chat with us about our CRS in #shellphish-crs on Freenode (I'm the only one there right now, so it's super exclusive), or ask angr questions in #angr on Freenode.

Are there any actual questions? Yeah, hi; congratulations on your work. Thank you. So, in your Driller paper, you said that fuzzing was mostly responsible, finding crashes in 68 of the binaries, whereas adding symbolic-execution-assisted fuzzing only let you find vulnerabilities in 11 more than that. Is that still the case, or is symbolic execution more effective than fuzzing now? You want to talk about Driller 3.0? Sure. One thing we've done to improve Driller, especially on CGC binaries, is to identify functions and install SimProcedures in their place. What this means is that a lot of basic-block transitions which are hard, or uninteresting, for the symbolic execution engine to solve become more tractable when we have a SimProcedure. We can talk about it more if you want to come up here afterwards.

Mike? Oh, last question. Okay, well, congrats guys. Thank you. That first; second, I wanted to know how compute-bound you felt. Did you get enough compute power? Too little? Too much? Would you put something else in there: backplane, RAM? What do you think? At this point we don't actually know, because we haven't had a chance to look through all of the logs. We had some problems in the very beginning, still on Wednesday actually, getting all of our Kubernetes pods scheduled, simply because Kubernetes was not keeping up.
We kind of solved that, but at this point we don't really know what the status is insofar as the utilization of all the nodes. From watching the power consumption and the way it dropped off, it seemed that there were a lot of unnecessary jobs that got descheduled later. So I think we could have used a little less, even; we could probably have used 32 nodes and done about the same. But the more the merrier, especially if we can schedule more jobs; we definitely had jobs to schedule that we couldn't, because of delays in Kubernetes. Cool, thanks. Alright, thank you. And thank you for organizing this. Please give the Shellphish team a huge round of applause; what they've accomplished is immense. Thank you guys, it was a dream come true to be here. Yes.