>> All right, hi everyone. It's really a great thrill to be here to talk to you about our research. It's basically catching malware en masse, DNS and IP style. My name is Dhia and I work at OpenDNS as a Senior Security Researcher. This is Thibault, our guru in data visualization, very smart guy, and Andree, the director of our network operations, a very smart dude in terms of internet and DNS. Let's look at the agenda today. We'll start with an introduction about OpenDNS's global presence, then dive into the main parts of the talk. The first one is about catching malware from a DNS perspective and the second is looking at the IP structure, and within the cases we discuss we will support that with a 3D visualization that we have and that we open sourced this week. So OpenDNS has 22 data centers across the globe, on 4 continents, and that allows us to see a lot of traffic, mainly DNS of two types: recursive and authoritative. That allows us to build a lot of useful models. Now the first part will be about catching malware from a DNS perspective, but before that let's look at the crimeware ecosystem of today to give some foundations. Here we see 4 components. The top part is when you visit a website that is compromised or delivering malicious ads: you get some exploit scripts running on your machine, checking whether you have a vulnerability, and if that succeeds you will have a malware payload dropped on your box and it will try to phone back to some CnCs. Usually they are domains, and in this case they are hosted on fast flux proxy networks. This is the focus of this part today. We'll be focusing on the Zbot one, and its domains have a characteristic TTL of 150 seconds. Now a little bit about our research process. We start by focusing on the Zbot proxy network, with two methods. One is using Hadoop data, most of it on HDFS. The other one is using streaming DNS traffic, because we have streaming authoritative DNS traffic coming in, so we use that.
With that we can identify the malware, and we also discovered a Pony panel hosted on one of these proxy network domains. And then at the end we will share some stats on the botnet, samples and clients. What is fast flux? I'm sure most of you know about it. It's a domain resolving to a lot of IPs scattered around the world, or to a single IP, with a very low TTL. If we are talking about a fast flux proxy network, it's a cloud in the middle with a lot of infected boxes. They act as an intermediary or a shield for communication between the backends on the left and the targets on the right. The targets are usually machines that have valuable information you want to steal. The communication happens in two directions. If the targets get infected, they will phone back to announce their existence, their status or data, and if the backends want to push an extra payload, they drop it on the box and later it will be dropped on the target machines. In this case we look at the Zbot fast flux proxy network, which has a TTL of 150 seconds. Now we mentioned Zeus. What is Zeus? It's one of the most advanced crimeware toolkits around. [Applause]. These are the Zeus guys. So they're going to load us with Zeus. So Zeus is one of the most advanced crimeware toolkits around. You feed it an input, which is usually a configuration file and web injects, and the configuration file will have the binary URLs. You give that to the builder and it will spit out the binary file. On the other side you have a control panel, and that one lets the adversary control his botnet, because he can see what kind of machines got infected and he can monitor his gains. This slide shows Zeus over the years.
>> Be quiet, we don't care about Zeus. We care about our new speakers. Cheers. [Applause].
>> All right, thank you guys. So yeah, we were talking about Zeus.
>> Cyber!
>> Move on.
>> Zeus. Okay, we are in Vegas.
So Zeus has a lot of CnCs and fast flux botnets, and that's the focus of today. Zeus serves config files, binary files and drop zones. How do we catch the CnC domains? We have two methods. For the first one, just for reference, we use a language that allows us to process the data: from a set of files within a certain time frame we extract fast flux domains, then we build a graph, then we extract the largest connected component. That will give you, empirically, the bulk of the domains and IPs. The other method is using the stream, and it looks like that. We have a lot of fields that are of interest. It's faster than doing batch processing from HDFS, but you have to have your own filters to catch what you need, because it's going fast. For that we can start with known Zbot domains and milk them for IPs, so you build a pool of IPs, and as the stream comes in you can grab the domains that have an IP or name server IP within that pool. Having that, what we can do is more investigation using data visualization, and Thibault will tell us about it.
>> Thank you. So he explained two different methods to extract some data linking Zbot domains with their IPs. So first let's look at how to visualize it. First we have the source data. Then we can convert the data using our own library and visualize it. This is a small library. As you can see, it's simple. You import the library here, create the graph, create a couple of nodes and connect them using the node keys, then you can have ties and attributes. That gives you the equivalent of the diagram you have on the right. Let's look at what it looks like once we've extracted the data. So what we're looking at here... [Applause]. Thank you. So what we're looking at here are the domains in red and the IPs in blue. Then finally we can play that over time, three months. So we can note that we have a growing field, the IPs are recycled, and the domains go from one IP to another.
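Stepping back to the two extraction methods Dhia described, here is a minimal Python sketch of both, not the pipeline actually used in the research. It assumes resolution records arrive as hypothetical `(domain, ip, ttl)` tuples, keeps only records with the 150-second TTL mentioned in the talk, builds the domain-to-IP graph, and takes its largest connected component. The stream record fields `ip` and `ns_ip`, and all function names, are assumptions made for illustration.

```python
from collections import defaultdict

ZBOT_TTL = 150  # seconds, the TTL characteristic mentioned in the talk

# Hypothetical (domain, ip, ttl) resolutions; documentation-range IPs only.
RECORDS = [
    ("a.example", "198.51.100.1", 150),
    ("a.example", "198.51.100.2", 150),
    ("b.example", "198.51.100.2", 150),
    ("b.example", "203.0.113.7", 150),
    ("c.example", "192.0.2.9", 150),
    ("slow.example", "192.0.2.50", 86400),
]

def build_graph(records, ttl=ZBOT_TTL):
    """Undirected domain<->IP graph restricted to records with the given TTL."""
    graph = defaultdict(set)
    for domain, ip, rec_ttl in records:
        if rec_ttl != ttl:
            continue
        d, i = ("domain", domain), ("ip", ip)
        graph[d].add(i)
        graph[i].add(d)
    return graph

def largest_component(graph):
    """Return the node set of the largest connected component (iterative DFS)."""
    seen, best = set(), set()
    for start in list(graph):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        if len(component) > len(best):
            best = component
    return best

def stream_matches(stream, ip_pool):
    """Yield domains from a live stream whose IP or name server IP is in the pool."""
    for rec in stream:
        if rec.get("ip") in ip_pool or rec.get("ns_ip") in ip_pool:
            yield rec["domain"]

# Method 1: batch extraction of the bulk of domains and IPs.
component = largest_component(build_graph(RECORDS))
seed_domains = {name for kind, name in component if kind == "domain"}
ip_pool = {name for kind, name in component if kind == "ip"}
```

The IP pool from the batch step is what feeds the second method: `stream_matches(live_records, ip_pool)` flags any new domain that lands on an IP, or uses a name server IP, already in the pool.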
The way we constructed this animation is we took all the domains and IPs at the first time step and we monitored that over time, and finally we can create a timeline to see all the events and play it in 3D.
>> Thank you Thibault. This investigation using visualization helped us confirm a few things. The next step is to figure out what these domains are used for. It's okay to catch these domains, that's relatively easy, but we see here that they are multipurpose. On the left they're used by Zeus, the generic one, Citadel, KINS and ICE IX, in delivering the 3 types of URLs. On the other side you have Asprox, which is a click fraud botnet, and also phishing and miscellaneous cases. You will find it in the white paper, so I have it here for reference. We saw Zeus URLs and also Citadel URLs on the fast flux domains, phishing URLs, and with Asprox a lot of click fraud traffic using the CnC domains. Then this interesting case, this is what we call a DDoS bot. Here it's announcing the operating system it's running. And these are some more cases of other malware using the domains. Now this one is an interesting case, where we discovered the Pony panel hosted on one of these fast flux domains. Pony tries to steal passwords from a variety of applications on your machine and is delivered by exploit kits or spam. This is how it's advertised on some hacking forums. Then this is how it's detected by some vendors. Now if we look further into the structure of the website we'll see some interesting things. The database here, for example, has a character set of (unknown) and the timezone is Moscow, which is an indication of the origin of the writer or user of the panel. We also see that it has the list of all of the applications it tries to steal passwords from. Now just some snapshots for your reference. You can find them in the white paper. And a few more here.
Now the next thing we did was to search for some keywords we found on the panel and try to find anything related or similar. And we found this website that is hosting another piece of malware, and this one happens to be the Andromeda botnet malware. And then we looked for more websites having the same structure as the original panel. Now let's look at the domains and what their structure is, where they come from. They are using mostly .ru and .su, so this is the origin or the geographical location of the actors, I would say. If you look at the bots and where they are mostly located, you see them mainly in Russia and Ukraine. This is an Eastern European operation to some extent. However, if you look at the clients looking up the domains over 24 hours, you can see they're in the U.S. Well, you can say that's mainly because of the volume of traffic in the U.S., but also maybe because a lot of valuable information is on a lot of North American machines, mainly banking machines. Zeus is a banking Trojan, so it would make sense. Finally, what we did with the fast flux domains is we looked at what malware is phoning back to them. We saw with no surprise that Zbot is at the top of the list, and also Upatre. Upatre is like a first-stage downloader on your machine. It brings down extra maliciousness onto your machine, and in this case it's involved with Zeus Gameover, a variant of Zeus. So in summary, on the DNS side we shared with you information and research about the Zbot fast flux proxy network. Fast flux has been around. In this case the fast flux proxy network is alive, it's very active and multipurpose, and it's used by a variety of malware. It's mainly hosted in Russia and Ukraine, targeting machines in North America, and .ru, .su and .com are the most abused TLDs. So it's kind of interesting to look at this very current threat. Now having covered the DNS style, how about the IP style, because that's the complement of the study here.
So for that we looked at two different things. Usually reputation systems assign scores to IPs, prefixes or ASNs by counting the number of malicious IPs or domains in those structures or entities. In our study we wanted to enrich that, because that's kind of weak sometimes, and we wanted to look at the granularity. In one case it's finer than the BGP prefix, which is the sub-allocated ranges. Say you have a prefix of /16, you want to look at a smaller granularity, like /28 and /29. On the other end you want the ASN graph, because it's larger than the single ASN where you're just counting numbers, and that gives us a lot of valuable intelligence and we were able to build some really cool models. So the first one is about the malicious sub-allocated ranges. The investigation process always starts with domains and IPs, and in this case we investigated the sub-allocated ranges and we also looked at what kind of servers they were running. With that we were able to follow the TTPs of the adversaries, like the way they operate, how they bring the infrastructure online, and the timing. Then after that we were able to generalize the predictions to block the IPs before they started delivering the attacks. That was really like groundbreaking. Then over time we were able to see the shift in the operations of the adversaries. So we'll start with this example that we covered over several months. This was about Nuclear exploit kit domains, and they were abusing OVH Canada for months. Certain customers, suspicious ones, were basically reserving IPs, and these were ranges like /28, /29, /30. They were brought online all at once, all together, and that lasted for several months, delivering Nuclear exploit kit domains. After a few months they shifted to a Ukrainian hosting provider, and that lasted for 7 days. In this case, however, they shifted MO, because OVH Canada is in the ARIN space and you were able to grab information about customers reserving IPs by looking at the WHOIS.
Ukraine is in the RIPE space, and that's much more difficult because that information is scarce. Also the IPs were brought online one at a time and in randomized fashion. This is a way to evade detection. Then they shifted to another provider. This one is in Russia. On the Russian one they kept the same MO, and then they went back to OVH. In this case it was the same people reserving the IPs, the ranges, but the MO changed a little bit. Now they are using recycled IPs, IPs that hosted old content or have a rather clean score. And again these are ways to evade the IP reputation systems that a lot of security vendors use. These guys are really smart. They react and adapt and try to evade you and give you a hard time. However, over this period the name servers were hosted on OVH space. So you might think name servers are much more difficult to prove to be malicious, whereas the domains themselves are easier: if you can grab the payload, it's kind of easier to prove it's related to a certain exploit kit. Now this animation shows you the (unrecognized) pattern of these domains. You have a lot of them pop up on a single IP, kind of a fireworks pattern over time, and they just die out after a few minutes or a few hours. So it's a very interesting case here: registering domains, delivering the attacks, then just dying and never coming back. Now we did a blog post back in February about the study and we also took down a lot of the domains in cooperation with the research group MalwareMustDie. Having said that, how can we push this further and kind of give a hard time back to the adversaries?
>> What's your prediction look like?
>> Prediction?
>> Yes, prediction? [Inaudible]. [Applause].
>> Okay, this one is one of my favorite ones. So Mr. T is inflicting pain on Rocky. We'll try to do the same to these adversaries. Let's look at this case here, this automatic thing... Okay, stop there.
So we looked at these reserved IP ranges over time, and we see that they reserved 28 ranges in December, and within them 1e6 IPs, 86 are used and 63 percent of them are malicious, and you see the trend growing. Like in February it was 92 percent maliciousness. The point I'm trying to make here is these guys reserve these ranges and they deliver the attacks with a high percentage of these IPs. So it's no coincidence, they're not using them for other purposes. It's specifically targeted and trying to abuse the hosting provider. Now the point here is that if you look at the BGP prefixes where these IPs are living, you will see it's a very coarse granularity. They are huge ones, /17, /18, /19. You've got a lot of IPs to look at in these big ranges, so you cannot get that nice laser-sharp view of what is going on. Whereas if you study the finer IP ranges, that gives you a lot more visibility into what's going on and how the bad actors or adversaries are thinking. Now the other interesting thing we did is try to fingerprint the IPs. When we discovered them, we knew they were reserved by the same guys, and they had the same exact server setup. The first is the Ukrainian one, then the Russian one, and it's the same with OVH Canada. So the point here is, by looking at ranges reserved by the same guys with the same fingerprints, and seeing over time that they always serve the attacks, again this is a preestablished operation being carried out over and over again, and it's no coincidence. So it's interesting to see that, knowing those facts, we're able to predictively block these IPs before any domains start resolving to them. So that's kind of the breakthrough in the study. Now over time we also saw the adversaries were shifting MO. Now they know what we know. So a trend we saw growing is that they started abusing compromised domains, domains that were benign and got injected with subdomains under them, and we did a study about that.
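To make the granularity argument about sub-allocated ranges concrete, here is a small sketch using Python's standard `ipaddress` module. The addresses are private-range placeholders rather than the ranges from the study, and `range_report` and `predictive_blocklist` are hypothetical helper names: the first computes the share of used IPs in a range that are malicious, the second expands a reserved range into the individual IPs one could block before any domain resolves to them.

```python
import ipaddress

def range_report(cidr, used_ips, malicious_ips):
    """Summarize how much of a sub-allocated range is used and how much is malicious."""
    net = ipaddress.ip_network(cidr)
    used = {ip for ip in used_ips if ipaddress.ip_address(ip) in net}
    bad = used & set(malicious_ips)
    return {
        "addresses": net.num_addresses,
        "used": len(used),
        "malicious": len(bad),
        "malicious_ratio": len(bad) / len(used) if used else 0.0,
    }

def predictive_blocklist(cidr):
    """Every address in a range already attributed to the same suspicious customer."""
    return [str(ip) for ip in ipaddress.ip_network(cidr)]

# A /17 BGP prefix versus a /28 sub-allocation inside it (placeholder addresses).
prefix = ipaddress.ip_network("10.0.0.0/17")
sub_range = ipaddress.ip_network("10.0.5.16/28")

report = range_report(
    "10.0.5.16/28",
    used_ips=["10.0.5.17", "10.0.5.18", "10.0.5.20", "10.0.9.1"],
    malicious_ips=["10.0.5.17", "10.0.5.18"],
)
blocklist = predictive_blocklist("10.0.5.16/28")
```

Here the /17 holds 32768 addresses while the /28 holds 16, which is why a score computed over the announced prefix dilutes what is going on inside the small reserved range.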
So we started detecting that, and we collected the IPs that were hosting the subdomains, and we had a list of all the most abused ASNs. We were also looking at a finer granularity than the big ASNs, so we looked at small-scale providers, trying to see if they were abused or rogue. Then over time we were able to see the evolution of the techniques and methods of these adversaries, and that was quite interesting. So in this case these malicious subdomains were mainly delivering exploit kits, and they were dropping Zeus variants if the infection chain succeeds, and GoDaddy was the most abused in this case, unfortunately for them. Now looking at the top abused ASNs, again OVH is on top. 18 percent of the malicious IPs were from OVH, and we also have ASNs scattered across Europe and North America. Now this table here summarizes the evolution of the TTPs of the adversaries. This one is really cool. I like it because it kind of gives you the whole picture. So "before" is a few months ago and "now" is the current state. They used to abuse registered domains under certain ccTLDs and exclusively use them for the attacks. Nowadays they complement that with abused subdomains. The second thing is the IPs they reserved were all exclusively for attacks. Nowadays they complement that with recycled IPs and legit stuff. And third, they used to bring the IPs up in contiguous chunks, all together. Nowadays it's randomized, one at a time, whenever they feel like it. Again, these are evasion techniques. Finally, we said they used to abuse OVH Canada, and it was possible to take advantage of the ARIN WHOIS because it tells you the customers reserving the IPs. Now they are using OVH Europe, and that's the RIPE space, which is scarce on details, and on top of that they are using different countries' IP pools. So this is clearly a very interesting way they shift operations and methods. But we were able to follow them over time.
This one here is about the fine-grained hosting perspective, so these are small-scale hosting providers mainly serving maliciousness. This guy here, another one in Russia and another in Bulgaria, and this one, Electric Kitten, unfortunately abused, in North America. Then a few more for your reference. Finally, what we did here is look at the subdomains, the labels that are injected under, let's say, GoDaddy domains. Usually it's 3 labels. You have police.homebusiness.com. The top used ones were police and alertpolice. We thought maybe this is related to the theme of the attack, and that was the case. The first two were used for browser-based ransomware, which is a web page that pops up saying you've been looking at weird stuff, porn or whatnot, and the FBI or local police will come after you if you don't pay a ransom. You can kill it by killing the scripts or the browser process. It's more of a nuisance than a threat. But still, the labels related to the attacks in this case. The rest were mainly randomized but extracted from English dictionaries. So, I know we have covered a lot. It was mostly the DNS perspective and the IP perspective by looking at sub-allocated ranges, and we mentioned earlier the view of the ASN graph topology. So, before we get into that, we want to know more details about the internet structure.
>> Oh, man, he's awake. So yeah, in the next section we'll take a closer look at how we can use ASN relationships, and the relationships on the internet between ASNs and prefixes, to detect more malware. Before we do that we need to make sure we're on the same page and understand how the internet works. When we talked to different people we got different replies. For example, if I ask my dad what the internet looks like, he will probably point to the cable modem.
If I ask the sales guy, he says something different, and in the next room is the IT guy, and I say, do you know what it looks like, or can you explain it to me, and he giggles because it reminds him of one episode of The IT Crowd where you got to borrow the internet for a day from the elders of the internet. It was this magic black box, it had blinking lights on it, as you can see, and it was wireless. But obviously the internet is no magic black box. In fact it is much more beautiful. I guess if you wanted to paint a picture of the internet, this is sort of what it would look like. We have a few more. We do a lot of visualization, so in the next few slides you will see more. But the internet, as you know, is a network of networks, and it's a graph, and we love graphs. So in this picture each dot represents an autonomous system, and an autonomous system on the internet is basically one organization. So OpenDNS is an autonomous system, Google is an autonomous system, AT&T is an autonomous system, Verizon, and so on and so forth. Each of these systems is identified by a number, which we call an AS number or sometimes an ASN. Then each ASN has been allocated, or announces, one or more IPv4 or IPv6 prefixes. It can be one or it can be several thousand. That's basically how it works. So we have all these ASNs, and each is connected to at least one other, because you need connectivity to others. Now imagine you are on the left-hand side of this graph and want to get to the right-hand side. You need to know how to get there, and that is why we need a routing protocol, which is called BGP. BGP will distribute all this information. BGP in itself is a really interesting topic, especially what you can do with it. It's a little out of scope for this talk, but basically we're building a graph using BGP data. You can find all kinds of public data, like Routeviews, and we use our own data.
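As a rough illustration of building such a graph from BGP data, the sketch below assumes each route is reduced to a hypothetical `(prefix, as_path)` pair, where the AS path is a space-separated string with the origin ASN on the right. The prefix 67.215.94.0/24 and the ASNs 36692 (OpenDNS) and 2914 (NTT) are the ones mentioned in the talk; 64500 is a documentation ASN standing in for the observer. Treating each adjacent pair as a downstream-to-upstream edge is a simplification, since adjacency in a path does not by itself tell you whether two networks are peers or customer and provider.

```python
from collections import defaultdict

def as_path_edges(as_path):
    """Turn an AS path string into (downstream, upstream) edges.

    Repeated ASNs (path prepending) are collapsed first. The origin AS is the
    rightmost hop, so each edge points from the hop closer to the origin
    toward the hop closer to the observer.
    """
    hops = []
    for asn in as_path.split():
        if not hops or hops[-1] != asn:
            hops.append(asn)
    return [(hops[i + 1], hops[i]) for i in range(len(hops) - 1)]

def build_asn_graph(routes):
    """Collect the edge set and the prefixes originated by each ASN."""
    edges = set()
    origins = defaultdict(set)
    for prefix, as_path in routes:
        hops = as_path.split()
        if not hops:
            continue
        origins[hops[-1]].add(prefix)
        edges.update(as_path_edges(as_path))
    return edges, origins

# Hypothetical routing table excerpt; note the prepended 36692 in the second path.
routes = [
    ("67.215.94.0/24", "64500 2914 36692"),
    ("67.215.94.0/24", "64500 2914 36692 36692"),
]
edges, origins = build_asn_graph(routes)
```

With real table dumps the same two structures, the edge set and the origin map, are enough to feed both the layout and the topology filters discussed later in the talk.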
Just so you know, there are about half a million prefixes out there and about 48,000 ASNs, so you can build a really large graph, which has challenges when you start to visualize it. These are just some references, but yeah, this is how you build it. Focus on the last line, which has 2 items. One is the actual prefix, which is 67.215.94.0/24, then the next part is the AS path, and that shows how ASNs are connected to each other. You can infer all kinds of interesting data from this. In this case the one on the right-hand side, 36692, is OpenDNS, and it's connected to 2914. From that I know that's NTT, so OpenDNS is connected to NTT, and so on and so forth. You can see all kinds of interesting data from that. Then basically we produce a graph, and we can use that for further analysis.
>> Thank you Andree. So he just described how to create the graph data set from the BGP routing tables. So same thing: with the same concept, the same library, we can create a graph data set. Then we need to use very, very simple particle physics. So even if you're not math nerds, don't run away, it's very simple. Essentially, you take your graph, and every pair of nodes that is connected will attract each other up to a certain point, and every pair of nodes that is not connected will repel each other. So you can see two different forces here, an attractive force and a repulsive force. They both depend on distance and they can be manipulated by a constant C, which is multiplied by the square root of the area divided by the number of nodes. A lot of complicated words just to say it factors in the density. You can control the density of the layout just with this constant C. Okay, so what does it look like? We start with all the nodes placed in a random fashion, in a cube or a sphere, it doesn't really matter, then we start running the particle physics. What you see here is the explosion, the graph expanding in space.
So all the nodes that are connected together will stick together, and all the nodes that are disconnected will repel each other. You can see the layout looks like molecules, which makes sense because the force-directed algorithm is inspired by electrical forces and also by the forces between celestial bodies. It's not exactly the same model, because that's way too complicated to implement, but still it's very close. Then finally you can see the nodes are vibrating. This means all the forces are becoming even and that usually the graph has reached an equilibrium state. [Inaudible]. This is really cool but it doesn't work well at scale, so we have to take advantage of parallelization. Essentially we run the same thing on the GPU and use a library called OpenCL. Actually it's more like a specification, because it can be implemented on GPUs, CPUs, etc. But anyway, essentially OpenCL is very simple. It exposes work groups and work items, which are pretty much cores and threads on the GPU, and different kinds of memory, like local memory and private memory. Private memory is pretty much the registers, and global memory is the memory of the GPU. Essentially it's a tradeoff between the ease of access of the memory and the performance, because the private memory is actually very fast to access. Okay, main points. Why are we doing this? We look at the pictures, okay, that's really cool, but what's the point? The main point is the layout is data driven versus user driven. As you can see, the layout depends on the forces, and the forces depend on the connections between the nodes. So the result is a layout that depends on the data structure you're looking at. It is much closer to what I like to call the natural shape of the data model that you're looking at. We take advantage of the GPU for acceleration. We know GPUs are becoming insanely fast, so it would be a shame not to take advantage of that. And the final main point: humans are so much better at processing shapes and colors than text data.
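The force model Thibault describes resembles the classic Fruchterman-Reingold layout, so the following is a small CPU-only, 2D sketch of that algorithm in plain Python, not the OpenCL engine shown in the talk. The scale `k = C * sqrt(area / number_of_nodes)` follows the description above; the force formulas `k*k/d` (repulsion) and `d*d/k` (attraction), the cooling schedule, and all names are assumptions taken from the textbook algorithm.

```python
import math
import random

def force_layout(nodes, edges, c=1.0, area=1.0, iterations=200, seed=0):
    """Return {node: [x, y]} after a simple force-directed relaxation."""
    rng = random.Random(seed)
    k = c * math.sqrt(area / len(nodes))      # density factor controlled by C
    pos = {n: [rng.random(), rng.random()] for n in nodes}
    temp = 0.1                                # maximum step size, cooled each round

    def offset(u, v):
        dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
        return dx, dy, (math.hypot(dx, dy) or 1e-9)

    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsive force between every pair of nodes.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy, d = offset(u, v)
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f
        # Attractive force along every edge.
        for u, v in edges:
            dx, dy, d = offset(u, v)
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f
        # Move each node along its net force, capped by the temperature.
        for n in nodes:
            dx, dy = disp[n]
            d = math.hypot(dx, dy) or 1e-9
            step = min(d, temp)
            pos[n][0] += dx / d * step
            pos[n][1] += dy / d * step
        temp *= 0.98
    return pos

def distance(pos, u, v):
    return math.hypot(pos[u][0] - pos[v][0], pos[u][1] - pos[v][1])

# Two connected nodes and one isolated node.
layout = force_layout(["a", "b", "c"], [("a", "b")])
```

The all-pairs repulsion loop is the quadratic part that does not scale, which is the step that benefits from running one work item per node on a GPU.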
So let's see what it looks like. We'll look at the whole process. So we take the whole ASN graph. Like I said, it's placed in a random fashion, a big mess of ASN nodes. Rotate around. On this layout you don't see the edges, because usually the process runs offline. That's just for performance, we're trying to go as fast as possible here. Then we start the force-directed process. As you can see, it's a very heavy process. I think I had like 0.3 iterations per second. So like close to 3 iterations per second. No, 0.3 iterations per second. That runs on a GTX 550, a graphics card that's like 3 or 4 years old, something like that. Then that runs for an hour. The result is something like this. Andree will give us his expertise on the graph, but you can see the network is densely connected and has a strong core, and we have kind of sparse nodes around it. Then we can extract or create pictures that look like this.
>> If you are a network geek, this is sort of what it looks like. In network engineering we use these loosely defined terms, tier 1, tier 2, tier 3. Tier 1 is the core of the internet. There's only a dozen or so of them …oops, sorry, AT&T, NTT, some of those. Then around that, the pink ones are sort of the larger regional networks, then at the edge is where you have your web hosters and people like OpenDNS, FB and people like that who don't provide connectivity to others. This is another picture where you can really clearly see that. In this one you can see the different clusters, which show sort of the regional networks, and typically these different clusters represent different countries. This is an animation of the internet in Ukraine, and it shows that Ukraine is relying on only a few upstream networks.
So if you want to do something bad, if you are able to take that autonomous system out of rotation, then you would probably be able to take down a significant portion of a country, and we've seen similar things in Egypt, with Egypt being taken offline. It was only possible because there were 3 or 4 major carriers in that country. And more recently in Syria, where there's only one national telco, if you shut that down the whole country is offline. It's ironic, because the internet was made to survive nuclear warfare and blah blah blah, with no single points of failure. The way the internet is built in a lot of countries, they're still relying on one or two networks, and if you take those out you will be able to take down a significant portion of it. So I think this is a really good picture to represent that. So... [Applause].
>> This stuff is awesome. So thank you for this perspective on the ASN network. Now the thing is, we're looking for maliciousness. That was the core of the research. Now how can we use the graph to extract maliciousness? We're introducing the concept of sibling peripheral ASNs, then we look for suspicious ones. We look at domains and IPs and study the fingerprints and the ranges that we are focused on, and that allows us to build relationships between the ASNs by considering the topology, and then we can verify the relationships between the ASNs. This is kind of a breakthrough in this case. You will see. Now the malware we're looking at is some kind of a Trojan, hosted here on /23s or /24s, and we took a sample of these IPs. This is how it's described by AV vendors. We took the sample of IPs and did the fingerprinting again, and we saw, as in the previous study, that they all have the same server setup. So we have two clusters of server setups. This is set up in bulk. Now looking at the ASN graph topology: these ASNs here are siblings, kind of brothers and sisters, and peripheral because they have no downstream providers.
They are on the outskirts or periphery of the internet. Now we look for suspicious ones in this case. So this is a subgraph we took as a snapshot from the Ukrainian and Russian IP space in January, and we see sibling peripheral ASNs hosting the malicious payload, and one of the parents is doing the same. A month and a half later one of the parents stopped advertising the kids' prefixes, and more of the children are now hosting the malicious payload. Now the thing is, we looked at the entire pool of IPs and we saw a lot of domains hosted on them, and we also saw the payload URLs were live. So the point here is that this is an operation established in bulk, in advance, to deliver this Trojan payload, and we saw this across different kinds of IP spaces. So this was the case we are talking about, but we wanted to scale it up and use it in a global fashion, and here is the introduction from Thibault's perspective.
>> Sure. I would like to give a formal description of the model so we're all on the same page and make sure we're talking about the same thing. When we study graph topology, usually we are interested in the number of connections. Then we can classify all the nodes depending on the number of connections that they have. As you can see here, the blue node is called a sink node because it only has incoming connections, and the green one is a leaf node, which only has one outgoing connection. Finally the red ones are the source nodes, and the source nodes have two or more outgoing connections. The white ones don't match any of these criteria. So when we apply the detection, this is what happens. Essentially we keep only the peripheral nodes, which are the nodes that don't have any incoming connections. Like the suburbs. Then we keep their parents, and the result is pretty much a graph with all the peripheral nodes and their siblings, including their parents as well.
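The node classification and the peripheral filter can be sketched in a few lines of Python. This is an illustrative reading of the model as described, assuming edges are directed from a downstream ASN to its upstream, so peripheral nodes are the ones with no incoming connections. The ASN labels and the function names `classify` and `sibling_groups` are hypothetical.

```python
from collections import defaultdict

def classify(edges):
    """Label each node as 'leaf', 'source', 'sink' or 'other' from its degrees."""
    indeg, outdeg = defaultdict(int), defaultdict(int)
    nodes = set()
    for child, parent in edges:
        outdeg[child] += 1
        indeg[parent] += 1
        nodes.update((child, parent))
    kinds = {}
    for n in nodes:
        if indeg[n] == 0 and outdeg[n] == 1:
            kinds[n] = "leaf"      # peripheral, a single upstream
        elif indeg[n] == 0 and outdeg[n] >= 2:
            kinds[n] = "source"    # peripheral, two or more upstreams
        elif outdeg[n] == 0:
            kinds[n] = "sink"      # only incoming connections
        else:
            kinds[n] = "other"
    return kinds

def sibling_groups(edges):
    """Map each parent to its peripheral children, keeping groups of two or more."""
    kinds = classify(edges)
    groups = defaultdict(set)
    for child, parent in edges:
        if kinds[child] in ("leaf", "source"):
            groups[parent].add(child)
    return {p: kids for p, kids in groups.items() if len(kids) >= 2}

# Hypothetical topology: three peripheral ASNs behind UP1, one of them multihomed.
EDGES = [
    ("AS1", "UP1"), ("AS2", "UP1"), ("AS3", "UP1"),
    ("AS3", "UP2"),
    ("UP1", "CORE"), ("UP2", "CORE"),
]
kinds = classify(EDGES)
groups = sibling_groups(EDGES)
```

Starting from one known malicious ASN, a lookup such as `[kids for kids in groups.values() if "AS1" in kids]` returns its sibling candidates, which is the narrower search suggested for graphs with many candidate groups.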
So we can apply this to the same data set, for example on Ukraine, which is relevant in this case. It's the same data set as before, with the same color code, so you can see the peripheral nodes are the red and green ones, and finally when we apply the filter, when we extract, we pretty much prune the core of the ASN network. This one has a couple of leaves connected to the same parents. So that would be an interesting candidate for us to look at. Now if you look at the whole ASN network here, I mean the Ukrainian one, you can see there are a lot of candidates, so another, better approach is to start from one node, a malicious ASN, and see if it has siblings.
>> Okay, cool. So I want to take one step back and sort of summarize what was just said. What we found was basically a whole bunch of prefixes that were hosted by different ASNs. The ones in red were hosting a similar type of malware, and when we scanned them they had the same exact fingerprints. At some point some were being used to host malware, and for others we knew they were coming online in the future. Why is this interesting? Because typically we look at malware on maybe a domain level or IP level, and sometimes on an AS level, but what seems to be the case here is these bad guys, or the guys hosting this, were actually hiding behind different ASNs, and typically an ASN is one administrative domain. So it seems like this group of ASNs is one administrative domain, owned and operated by one guy. So how can we prove that? It's our hypothesis. We figured maybe if one of these ASNs goes down, and they were owned and operated by the same guy running the same infrastructure, all the others must go down as well, because they're running the same infrastructure. That was the idea. How can we prove that? We looked at the data. So we used data from guys who had like 2 years' worth of all the outages on the internet.
You can look at update messages and withdrawal messages. Those are the two main BGP messages, and I will not go into detail, but if you see a withdrawal, the prefix is unreachable, and if an hour later there's an update, then you can correlate when there's an outage. So that's what we did: we looked at all the ASNs and looked for overlapping outages, and this was the result. Basically it's a table showing the ASNs and the outages overlapping at the same time, and the main takeaway is that the numbers are really high over the course of 6 months or so. So I'm zooming in to three ASNs, one of which is the upstream. If the upstream goes down then all the others must go down as well. Makes sense. But what we see in this case is that the upstream provider remained up and running for most of the time, but the two other ones went down 22 times at the exact same time over the course of 6 months. Here are two examples. This is just for the month of July. Again a bunch of text, but what it shows is that two of these ASNs went down at the exact same time: exact same start time, end time, duration. What does that mean? I think we proved that these different ASNs are actually owned and operated by the same guys. They just hide behind multiple ASNs. So we're sure they rely on the same infrastructure. It could be the same data center or router or switch. But we're definitely sure that it's the same organization, and in fact you might even go further and say it could be hosted on the same physical hardware, the same servers with multiple IP addresses. Anyway, it's a unique approach to find these sibling ASNs that could be related, and it allows us to move quickly: if we know about one ASN, we run it and it tells us which ones are related, and we can then zoom into that. So that's pretty cool actually. [Applause]. >> Thanks Thibault and Andree for the perspective on the ASN network. So in conclusion, what we shared with you was the research about catching maliciousness on the internet from DNS and IP. How can we use the ASN graph to extract the maliciousness we are after?
We are introducing the concept of suspicious, or actually sibling, peripheral ASNs, and then we look for suspicious ones. We look at IPs and domains and study the fingerprints of the ranges that we are focused on. And that allows us to infer relationships between the ASNs by considering the ASN graph topology. And then thanks to that we can build a model, and finally we are proposing a novel approach, we will talk about it later, to verify the relationships between ASNs by monitoring BGP outages. This is kind of a breakthrough in this case. Now, the malware we're looking at was some kind of Trojan, hosted here on /23s or /24s. And we took a sample of these IPs. This is how it's described on VirusTotal by some AV vendors. We took samples of the fingerprints again and we saw the same thing as in the previous case study, which is that all of them have the same server setup. There are two clusters of server setups here. So this is an operation set up in bulk, in advance. Now, looking at the ASN graph topology, we introduced the concept of SPN. SPN is sibling peripheral nodes. Sibling in the sense that, you see the ASNs at the top, they share the same upstream provider at the bottom, and they rely on it to advertise their prefixes. So they are kind of brothers and sisters in a way. And peripheral because they have no downstream customers. So they are on the periphery, or outskirts, of the internet. So we look for suspicious ones in this case. So this is a subgraph we took from the Ukraine and Russia space in January. You see a bunch of these ASNs, the ones in red, hosting the malicious payload. And one of the parents is also doing the same. A month and a half later one of the parents detached itself and stopped advertising the kids' BGP prefixes. More of the children are now hosting the malicious payload. Now the thing is, we looked at the entire pool of IPs and we saw a lot of domains hosted on them. We also saw the payloads were live before any domains started resolving to them.
So the point here is that this is an operation established in bulk, in advance, to deliver this Trojan payload and others across different kinds of IP spaces. So this is one case, but we wanted to scale it up and use it globally. And here comes the introduction of SPN from Thibault's perspective. >> I'd like to give a visual approach of the SPN model so we are all on the same page and talking about the same things. When we study graph topology, usually we are interested in different features. One of them is the degree of the nodes, the number of connections. And we can classify the nodes depending on the number of connections they have. There is a blue node, called the sink node, which has incoming connections. You have the green one, called the leaf, which has only one outgoing connection. Then finally the red ones are the source nodes. The source nodes have 2 or more outgoing connections. And the white ones are the ones that don't match any of these criteria. So when we apply the SPN detection, this is what happens. Essentially we only keep the peripheral nodes, which are the nodes that don't have any incoming connections. It's like the suburbs. And then we keep their parents, so it's pretty much the peripheral nodes and their siblings, including their parents as well. So we can apply this on the same data set, for example Ukraine, which was an (unclear) in this case. So this is the same data set, same color code, so you can see all the nodes, and the peripheral ones are the red and green ones. And finally, when we apply the filter, we extract the SPN graph. That will pretty much prune the core of the ASN network. Then we can select a couple of candidates, like this one for example. This one has a couple of leaves connected together to the same parents. So that would be a very interesting candidate for us to look at. Now, if you look at the whole ASN network, here the Ukrainian one, you can see there are a lot of candidates.
So another, better approach is to start with a known malicious or suspicious ASN and check whether it is a peripheral node and whether it has siblings. >> So, I want to take one step back and summarize what Dhia just said. What we found was a whole bunch of prefixes that were hosted by different ASNs. The ones in red were hosting similar types of malware and they had the exact same fingerprints. And at some point some were being used for hosting malware and others not yet. They were coming online in the future. What is interesting is that typically, blocking malware happens on a domain name level, an IP level or even sometimes on a complete AS level. But what appeared to be the case here was that these bad guys, the guys hosting, were actually hiding in different ASNs. And typically one ASN is one administrative domain, right. So it looks like this one group of ASNs is one domain, owned and operated by one guy. Okay, so we said, how can we prove that? So here is our assumption, our hypothesis. We figured that if one of these ASNs goes down, and they are actually owned and operated by one guy running on one infrastructure, then all the others must go down as well. That was sort of the idea. So we looked at how we can prove that. We looked at BGP outage data. We used data from bgpmon.net. They have information for the past 2 years on all the outages on the internet, which you can query. How does that work? You can look at update messages and withdrawal messages. These are the 2 main BGP messages, and I'm not going to go into detail. A withdrawal means the prefix is unreachable, and then an hour later there is an update, and you see that from hundreds of routers at the same time. Then you basically correlate when there's an outage. So, that's what we did. We checked to see if there were overlapping outages. And this is a lot of text, I know, a lot of numbers. It's basically a table of the ASNs and their overlapping outages at the same time. And the main takeaway here is that the numbers are really high.
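The overlap check described here can be sketched as a small Python function. This is a hedged illustration only, not the script used in the research: it assumes outages have already been derived from the BGP messages and reduced to (start, end) pairs per ASN, the name `co_outage_table` is hypothetical, the AS numbers come from the documentation range, and the timestamps are invented for the example.

```python
from datetime import datetime
from itertools import combinations


def co_outage_table(outages_by_asn):
    """Count, for each pair of ASNs, outages with identical start and end.

    outages_by_asn maps an ASN to a list of (start, end) datetime pairs.
    Only exact matches are counted, mirroring the "exact same start time,
    end time, duration" criterion from the talk.
    """
    table = {}
    for a, b in combinations(sorted(outages_by_asn), 2):
        shared = set(outages_by_asn[a]) & set(outages_by_asn[b])
        if shared:
            table[(a, b)] = len(shared)
    return table


def t(day, hour, minute):
    """Helper for the invented example timestamps below."""
    return datetime(2014, 7, day, hour, minute)


# Invented outage windows: two siblings fail together, the upstream does not.
EXAMPLE_OUTAGES = {
    64500: [(t(9, 1, 0), t(9, 1, 10))],                            # upstream
    64501: [(t(3, 10, 0), t(3, 10, 30)), (t(12, 4, 15), t(12, 5, 15))],
    64502: [(t(3, 10, 0), t(3, 10, 30)), (t(12, 4, 15), t(12, 5, 15)),
            (t(20, 8, 0), t(20, 8, 5))],
}
CO_OUTAGES = co_outage_table(EXAMPLE_OUTAGES)
```

A real feed would probably need some tolerance on the timestamps, since different routers report withdrawals at slightly different times. Exact matching is used here only to keep the sketch short.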
This is over the course of 6 months or so. So, I'm zooming in to 3 ASNs. One is the upstream, AS48361, because obviously if the upstream goes down then all the rest must go down. Makes total sense. But what we see here is that the upstream provider remains up and running for most of the time, while the 2 other ones went down 22 times at the exact same time over the course of 6 months. So, here are 2 examples, just for the month of July. Again a bunch of text, but what it shows is that 2 of these ASNs went down at the exact same time, some for 479 minutes, 33 minutes, 63 minutes, etc. Exact same start time, end time, duration. So what does that mean? I think it proves that these different ASNs are actually owned and operated by the same guys. They just have multiple ASNs and are hiding behind that. So we can be sure they are relying on the same infrastructure, which could be the same router, the same switch. The switch goes down, the whole thing goes down. But we are definitely sure it is the same organization, and you might go further and say it's even being hosted on the same physical hardware, the same servers having multiple IP addresses running on different ASNs. Anyway, it is a unique way to find these sibling ASNs that could be related, and it also allows us to move quickly: if we know about one ASN, we just run the script and it tells us which ASNs are related by their outages, and then we can zoom into that. So, that's pretty cool I think. [Applause]. >> So, in conclusion, what we shared with you was about capturing maliciousness on the internet from a DNS and IP perspective. For DNS we looked at the Zbot fast flux proxy network. We described how to catch the domains and how the network is heavily used by malware. And then on the IP side we looked at two granularities.
We introduced a novel approach to monitor relationships between ASNs based on their outages, and we also proposed the SPN model, which catches related ASNs that serve the same kind of suspicious purpose. And finally, we open-sourced our 3D visualization engine that Thibault described, so you should use it and contribute back. It's all for the community and we'll be very happy to see you contribute back. [Applause]. >> These are some of the references we used. You can follow us on Twitter to keep in touch. Thank you again, and if you have any questions and we have the time, feel free to ask. [Applause].