00:00:00.868,00:00:06.874 >> Karen Bensen, Take it away [applause] >> Hi everyone, thank you so much for coming today to 00:00:14.448,00:00:20.153 learn about examining the internet's pollution. As announced, I'm Karen Bensen, and 00:00:20.153,00:00:25.158 I'm really excited to be talking here today at my first defcon. Uh so... To start off, a couple 00:00:28.929,00:00:35.836 years ago on reddit, somebody asked the garbage men on there, um, about the illegal, strange 00:00:35.836,00:00:41.708 and valuable things that they had seen while examining other people's trash, and you can go 00:00:41.708,00:00:46.880 find this thread and read what they found, but the main takeaway is that they found a 00:00:46.880,00:00:53.754 number of interesting and valuable items. So today I'm going to talk about the 00:00:53.754,00:00:59.559 analogous question, but for the internet. We're going to ask "what sort of interesting and 00:00:59.559,00:01:05.299 valuable information can we find looking at some packets and traffic that you may consider 00:01:05.299,00:01:10.304 the internet trash?" And, um, I feel that I'm pretty qualified to talk to you about this, not 00:01:12.439,00:01:17.444 because I'm Oscar the Grouch, but because I just defended my PhD, in which I spent the last 4 00:01:19.780,00:01:24.785 years looking at this type of traffic. [applause] And prior to that I looked at, not so trashy 00:01:31.525,00:01:37.030 traffic, but writing intrusion detection software, so I've looked at some packets. Um, 00:01:37.030,00:01:42.169 alright. So, quick outline of the talk, basically I'm going to go a little more into depth 00:01:42.169,00:01:47.174 about what this 'trash' is, and the various ways that you can collect this. I'll talk about 00:01:49.276,00:01:53.513 the ways that we collect this, and the ways that you could possibly collect this on your 00:01:53.513,00:01:58.785 own networks, and I'll go into a little bit about the data that I used for the presentation, and 00:01:58.785,00:02:04.091 then the bulk of it is going to be about the interesting and valuable items that you can find 00:02:04.091,00:02:09.096 in trash, and then there will be a conclusion. Alright, so, what is internet trash? Or, this is 00:02:11.665,00:02:16.403 something I made up, so what am I calling this? Um, so basically, I mean any 00:02:16.403,00:02:20.707 unsolicited packet. So this means you're not going out trying to get people to send 00:02:20.707,00:02:26.980 packets to you, you're just passively capturing everything that comes to you with your own 00:02:26.980,00:02:31.985 IP addresses. And this has a name other than 'trash', it's 'internet background radiation' 00:02:34.621,00:02:39.626 or IBR. Um, and people have studied this for a long time to look at worms and stuff like 00:02:41.795,00:02:45.332 that, but I'll tell you kind of more of the things that have happened in the past couple 00:02:45.332,00:02:52.039 years. So probabaly the most obvious example of IBR is scanning. When you're searching 00:02:52.039,00:02:57.811 for hosts that run a service you're going to send packets to hosts that will respond to you 00:02:57.811,00:03:03.116 as well as hosts that are behind firewalls and they're not going to respond to you, and possibly 00:03:03.116,00:03:08.121 to people like me who are just kind of collecting the garbage of the internet. We also get, 00:03:10.857,00:03:17.297 uh, backscatter packets, which is any packet that's a response to a forged or spoof packet, and 00:03:17.297,00:03:23.336 typically you think of these in denial of service attacks. So you have a victim and the 00:03:23.336,00:03:28.508 attacker, and the attacker doesn't necessarily want everyone to know that they're 00:03:28.508,00:03:34.915 the one launching the attack, and so they may be able to forge the source address or the 'from' 00:03:34.915,00:03:40.487 field of the packet, and when they send it to the victim, um, the victim may have a hard time 00:03:40.487,00:03:45.926 differentiating between forged and non forged packets, and they may respond, but they're not 00:03:45.926,00:03:50.931 going to respond to the attacker, instead they're going to hopefully respond to us. 00:03:53.200,00:03:58.405 Next, we have misconfigurations, which is when you just erroneously believe that a 00:03:58.405,00:04:04.077 machine is hosting a service. These can be small scale, like someone typing an IP address 00:04:04.077,00:04:09.883 incorrectly, but it can also be pretty large scale, um, and affect a lot of hosts, and we 00:04:09.883,00:04:16.022 see this a lot in peer to peer networks. Similar to misconfigurations are bugs, and 00:04:16.022,00:04:21.261 this is when you have some sore of software error that causes the packets to reach and 00:04:21.261,00:04:26.500 unintented destination, such as a byte order bug. So even if you know your DNS server correctly, 00:04:26.500,00:04:31.505 you may, um, because of issue in software, send the packet to, uh, an unintended destination. 00:04:34.574,00:04:39.579 Uh, we also get a bunch of spoofed traffic, where, um, for some reason people are using the 00:04:43.150,00:04:48.155 wrong address. Um, they typically aren't trying to attack me, but, uh, we still get 00:04:50.390,00:04:56.797 some packets like this. Um, and then finally there's some traffic that we just don't know 00:04:56.797,00:05:01.735 what it is. This can be, um, TCP SYN packets to non standard ports, or UDP packets where you 00:05:05.038,00:05:10.010 don't understand what the payload is. One example of this is encrypted packets. They are 00:05:10.010,00:05:16.650 difficult to understand what the intention of that packet is. So this is kind of a summary of the 00:05:16.650,00:05:21.655 major classes of IBR. Um, so how can we collect this? You've probably heard of honeypots, 00:05:24.624,00:05:30.564 where you purposely set up machines to be infected with malware, maybe run an old 00:05:30.564,00:05:35.569 operating system or some sore of vulnerable service, and the... with this you can get really in 00:05:38.572,00:05:43.710 depth information because you're infected, and you understand the attack vectors and the 00:05:43.710,00:05:48.181 concequences of this. But if we don't want to do something so in depth, we can... um, do... we 00:05:48.181,00:05:50.183 can have some other set ups. The first example of this is just, uh, collecting one way traffic. 00:05:50.183,00:05:52.185 So, if this is your network, and these are the used machines in your network, you announce a... 00:05:52.185,00:05:57.190 some BGP prefix, and you probably have some sort of middle box keeping state of the 00:06:14.040,00:06:19.846 connections and which ones are bidirectional and which ones haven't received an 00:06:19.846,00:06:24.417 acknowledgement yet. And if they never receive an acknowledgement this is probably some sort of 00:06:24.417,00:06:31.324 unsolicited pack traffic, so you can store this as your collection of IBR. Similar to 00:06:31.324,00:06:37.564 this you can have, um, a greynet, where your state is the IP addresses that are used, and 00:06:37.564,00:06:43.570 then you just know which other ones you can reach as storage as they come into your network. Uh, 00:06:43.570,00:06:48.575 another concept related to this is if all of your addresses are in some small BGP prefix but you 00:06:50.777,00:06:57.384 have a much larger one, you can announce the whole prefix that you have and then based on the 00:06:57.384,00:07:03.857 destination decide which ones to route to the destination or route to storage, and then 00:07:03.857,00:07:10.563 finally an extreme example of this is a network telescope where you just don't use a BGP 00:07:10.563,00:07:15.568 prefix that you have and you record all the traffic that comes in. Um, and in the order 00:07:18.004,00:07:23.910 that I presented these, um, it becomes easier to scale and implement and there's normally 00:07:23.910,00:07:30.016 relatively fewer privacy concerns, um, but you lack the ability to do really in depth, 00:07:30.016,00:07:35.021 um, analysis if you're not responding and people can avoid your, um, IP addresses. Uh, for 00:07:37.657,00:07:42.662 this talk I am going to use traffic collected at a number of network telescopes. Um, so we 00:07:46.199,00:07:52.372 have multiple large academic netwrk telescopes, um, and we receive a ton of data from 00:07:52.372,00:07:58.812 these. We're currently capturing about 5 terabytes of compressed PCAP per week, and we have 00:07:58.812,00:08:04.250 traffic going all the way back to 2008 so we can do some historical studies with this. 00:08:04.250,00:08:09.255 And with this, uh, with this data we see traffic from all over the internet in terms of 00:08:12.292,00:08:17.297 the countries we see all countries except a few islands in the Pacific Ocean, and in 00:08:19.499,00:08:25.872 terms of IP addresses we are seeing about 5% of the announced IP addresses in BGP, so it's a 00:08:25.872,00:08:30.877 pretty good sampling, and I'm showing you data from July 2013, but if we look over time this 00:08:33.780,00:08:40.520 is... we're almost always seeing data... um, I didn't extend this graph, but it's just increased a 00:08:40.520,00:08:45.525 lot recently too. Uh, there can also be events such as the spam house attack, which was a really 00:08:51.431,00:08:56.436 big DNS based, uh, denial of service attack and with this attack we see... this event, we 00:08:59.339,00:09:04.277 were able to see traffic from more hosts. So, now we get to go to the exciting part of the talk 00:09:08.248,00:09:13.453 where we talk about the interesting and valuable, um, things found in the internet's 00:09:13.453,00:09:19.492 trash. Uh, so for this section I'm going to go through the major classes of traffic, 00:09:19.492,00:09:25.365 besides spoofed and I'm going to tell you about the thing that I think is the most exciting, um, 00:09:25.365,00:09:32.272 for them. So, in terms of scanning I'll talk about some trends and some relationship 00:09:32.272,00:09:39.212 vulnerability announcements. And to collect this data we used the historical data that we had 00:09:39.212,00:09:44.217 since 2008, and we just applied Bro's, um, parameters for determining if an IP address is 00:09:47.821,00:09:52.826 a scanner which is if you send packets to 25 different IP addresses on the same protocol 00:09:54.928,00:09:59.933 and port within 5 minutes, Bro would alert that you were being scanned. So this is maybe not 00:10:02.101,00:10:06.506 the best definition of a scanner, 'cause it obviously depends on how many IP addresses 00:10:06.506,00:10:11.511 you have, and it's definitely not capturing, um, slower scans, but it can give us a kind of a 00:10:13.813,00:10:18.818 first look at the macroscopic, uh, scanning that's happening on the internet, or at least of our 00:10:21.454,00:10:26.459 networks. So, I broke up the data into what was happening from 2008 and 2012 first, and 00:10:30.396,00:10:36.769 you can see that the colors correspond to ports and we see in terms of packets and IP 00:10:36.769,00:10:41.774 addresses, the purple, uh, port is very popular, and this is TCP 445 and we see that the first 00:10:47.013,00:10:52.852 increase is right when the Conficker outbreak occurred, and then we see subsequent, um, 00:10:52.852,00:10:57.857 increases, um, often corresponding to new release of Conficker. Um, but we can't say 00:10:59.959,00:11:06.666 all of this is necessarily Conficker because there's other scans of this port, though most 00:11:06.666,00:11:11.671 of it happens to be from Conficker. Uh, so we can come up with heuristics to determine 00:11:15.175,00:11:20.847 which packets originate from Conficker, and to do this, um, we can exploit a bug that 00:11:20.847,00:11:25.718 Conficker has in its psuedo random number generator, for the most part when it's randomly 00:11:25.718,00:11:32.625 scanning the internet to propogate it has a bug where it only targets IP adresses a.b.c.d 00:11:32.625,00:11:37.630 where b is less than 128 and d is less than 128, so it's only really scanning a fourth of the 00:11:41.534,00:11:46.539 internet. And so we used a heuristic based on the birthday problem, which basically says 00:11:51.945,00:11:56.449 "Given a random group of people, what is the probability that two people are going to share a 00:11:56.449,00:12:01.387 birthday?" And often this... you're, it's like surprising, it's only like 34 people, and 00:12:03.423,00:12:08.895 its [inaudible]... likely that people share a birthday. Um, so, another way of asking this 00:12:08.895,00:12:15.468 question, um, is "how many unique birthdays can we expect given n people and 365 00:12:15.468,00:12:20.473 birthdays?" So turning this into a identifying Conficker, if we have IP addresses a.b.c.d that 00:12:24.611,00:12:29.616 are being scanned, we can look at the individual bytes of the IP address, so if we look at d 00:12:32.018,00:12:37.023 and we say "how many unique d values can we expect given either targetting 128 or 256 00:12:39.826,00:12:44.764 targets, what are the possible values for d?" And you can repeat this for the other bytes 00:12:44.764,00:12:49.335 and you can then start to differentiate between randomly scanning a quarter of the 00:12:49.335,00:12:54.340 internet versus the entire internet in expectation. So, if we look at the Conficker 00:12:56.743,00:13:03.116 outbreak, um, and the amount of scanning that happened around that time period, and this is, 00:13:03.116,00:13:08.121 uh, this graph is in log scale, we do have some missing data, but we do see an increase right 00:13:11.524,00:13:16.529 when Conficker was discovered. So what we would expect here is that we wouldn't see any, um, 00:13:20.099,00:13:25.104 hosts matching our Conficker heuristic. However, when we look at the number of IP addresses 00:13:28.107,00:13:33.112 meeting the Conficker heuristic this is what we see. And so for... up until about August we 00:13:35.415,00:13:40.720 didn't see... no IP addresses met this heuristic, and then all of a sudden we started seeing 00:13:40.720,00:13:45.725 some traffic. Um so this is, and this is all before Conficker was actually discovered. So this is 00:13:49.362,00:13:56.169 evidence that someone was trying to actually, like, test out their Conficker bug prior to 00:13:56.169,00:14:01.107 this, um, and on the first day the IP addresses were all in the same province, and the first 00:14:03.209,00:14:08.614 couple days, they were all in the same province in China. Um, and, so maybe this is helpful. 00:14:08.614,00:14:13.619 Um, as far as I know, nobody has claimed the Microsoft $250K bounty to collect th, um, 00:14:17.123,00:14:22.128 Conficker worm author, so perhaps this information could be useful for that. Alright, so, 00:14:25.498,00:14:30.503 that was before 2012, so if we look at what was happening since 2012, not surprisingly Conficker 00:14:33.206,00:14:38.211 is, uh, dying out, but the most popular port has been replaced with port 23, which is Telnet, 00:14:41.914,00:14:46.586 and the best explanation I have for this is that people may be trying to scan for internet of 00:14:46.586,00:14:51.591 things, if you have a better idea, uh, let me know, and, uh, we can also see some other 00:14:53.726,00:15:00.366 interesting things happening here. So this spike that is in grey, it was a variety of ports, 00:15:00.366,00:15:07.173 and it corresponds to traffic from the kind of bot net, which was somebody decided to, um, 00:15:07.173,00:15:13.713 create a bot net, scan the whole internet, and then publish all the results anonymously. So we 00:15:13.713,00:15:20.119 see this and we can verify that traffic was actually coming from the [inaudible] botnet based on 00:15:20.119,00:15:25.124 their data. So if we look at the IP addresses we notice some period of time where there's, 00:15:30.196,00:15:35.201 um, increased activity on a port. So if we look at, um, heartbleed right around there, 00:15:38.671,00:15:43.209 and here you can see in red where the heartbleed vulnerability announcement 00:15:43.209,00:15:48.214 occurred, and then, like, a week or so later we see a lot of increased activity on the pink 00:15:50.716,00:15:55.721 port which is TCP 443, which is where heartbleed likely could be exploited. Um, similarly a 00:15:59.725,00:16:04.664 little bit later we see a lot of traffic, a lot of hosts scanning TCP port 5000, so just Google 00:16:10.603,00:16:15.608 searching TCP port 5000 during that time, we [inaudible] had a report that they were seeing 00:16:17.844,00:16:23.282 lots of universal plug and play deviced being used in denial of service attacks, and prior to 00:16:23.282,00:16:28.287 that report we see evidence of scanning on that port. So, um... so we can... we're potentially 00:16:30.857,00:16:35.862 seeing activity before it was used in an attack. Alright, so that was scanning, hopefully we 00:16:37.997,00:16:44.036 will release our scanning dataset pretty soon, but going on to backscatter, I'm going to 00:16:44.036,00:16:49.041 talk about an attack that we've been seeing on authoritative DNS servers. So just a reminded, 00:16:52.144,00:16:58.751 backscatter is a response to a spoofed packet. So let's suppose you have a web server that you 00:16:58.751,00:17:03.589 want to want to perform a denial of service attack on. You could do the denial of service attack 00:17:03.589,00:17:10.563 directly on the web server, however there is also another weak point. All legitimate hosts 00:17:10.563,00:17:17.136 who want to contact that web server need to find the IP address associated with the 00:17:17.136,00:17:23.075 name. So they have to do a number of DNS queries. So it turns out that you could also 00:17:23.075,00:17:28.080 perform a denial of service attack on the authoritative name server. So, one way that you can 00:17:30.883,00:17:37.590 do this is with an open resolver, and an open resolver typically with DNS you should 00:17:37.590,00:17:42.595 only resolve domains for machines that you administer, so you UCSD's domain server should 00:17:45.565,00:17:50.569 only resolve, um, domain names for clients in UCSD. So, um, so it's typically considered bad 00:17:54.807,00:17:59.812 because otherwise you could use them in, um, DDoS attacks. So you could use an open resolver 00:18:02.348,00:18:08.854 to pull off this attack on the authoritative name server, in particular the attacker can 00:18:08.854,00:18:14.660 spoof a packet, a DNS query, send it to the open resolver, and since the open resolver 00:18:14.660,00:18:20.900 resolves the data for everyone, it's more than happy to ask the authoritative name server and 00:18:20.900,00:18:26.872 they get a response, and since the original query was spoofed they do not respond to the 00:18:26.872,00:18:31.877 attacker but instead it's likely that they will return... respond to our network telescope, or 00:18:33.913,00:18:38.918 there's a probability that it would do that. So, um, we're seeing... a lot of traffic 00:18:42.588,00:18:47.593 recently from open resolvers, so this is 2014 data. So prior to pretty much the end of January 00:18:54.567,00:18:59.572 2014 we didn't see pretty much any traffic from open resolvers, we saw about 3000 open resolvers 00:19:03.609,00:19:08.614 per month, and then starting in February 2014 we saw 1.5 million open resolvers per month, and we 00:19:11.751,00:19:16.956 noticed that once this attack sort of took off we were seeing traffic from the same open 00:19:16.956,00:19:21.961 resolvers over and over again. Um, this is only a small fraction of the open resolvers 00:19:24.497,00:19:30.870 used on the internet. The open resolver project which is scanning, active scanning at the 00:19:30.870,00:19:35.875 same time saw about 20 times the number of open resolvers that we did. Um, but this is... this 00:19:38.711,00:19:44.583 means that this attack is only using a subset of the open resolvers, and... but we can 00:19:44.583,00:19:49.588 also look at some other data that we have from the attack, which is the status code that 00:19:53.225,00:19:59.098 comes back with your DNS response, so if it's like 'OK, everything's happy', but you can 00:19:59.098,00:20:05.471 also get a number of failures including a serve fail which indicates that there's a problem 00:20:05.471,00:20:11.544 with most likely the authoritative name server, and in the month of data we got 00:20:11.544,00:20:17.249 serve fail errors from nearly every open resolver that we saw, whereas in the open resolver 00:20:17.249,00:20:23.689 project scan they see this error very seldomly. So this is evidence that this attack is 00:20:23.689,00:20:28.694 actually overwhelming authoritative name servers. And, so uh, one interesting thing is 00:20:32.098,00:20:38.437 that we see some data on January 29th and then the attack seems to really take off in the 00:20:38.437,00:20:43.442 beginning of February, and this first day the domain that was queried was [inaudible], which 00:20:46.045,00:20:51.050 is a popular website so this reflects a testing phase here. Um, since then there's been 00:20:54.987,00:20:59.992 lots... the domains seem to be, um, just used for a very short period of time, a number... most 00:21:03.295,00:21:08.300 of them seem to have bogus, um, registration information, um, and we're still seeing this... 00:21:10.769,00:21:15.708 all this analysis was from the first month of activity and we're still observing this kind 00:21:15.708,00:21:22.414 of attack right now. Alright so that was backscatter, now I'm going to go on to 00:21:22.414,00:21:27.553 misconfigurations, which, um, in particular I'm going to talk about bittorrent 00:21:27.553,00:21:32.558 misconfigurations. So if you want to download a torrent through bittorrent, you use, you 00:21:34.793,00:21:39.732 contact the... you typically contact the bittorrent distributed hash table, and they 00:21:39.732,00:21:46.472 will tell you the location of the torrent or some other bittorrent node that is closer 00:21:46.472,00:21:52.478 to the torrent that you want. However there can be malicious nodes in the hash... in the 00:21:52.478,00:21:58.017 distributed hash table, and they can lie to you about the location of the torrent, and if 00:21:58.017,00:22:02.521 this happens repeatedly over and over again it's going to be a lot harder for you to actually 00:22:02.521,00:22:08.494 find the torrent and get the latest episode of Game of Thrones or whatever you want to 00:22:08.494,00:22:14.266 watch. Um, so this attack is called and index poisoning attack where you're purposely 00:22:14.266,00:22:19.271 inserting fake information into or about what's in the hash table. And, uh, so... what 00:22:22.908,00:22:28.147 happens after you receive this false information is you try to set up a connection. So when 00:22:28.147,00:22:34.820 people send bittorrent packets to the network telescope we get an idea of what they're... what 00:22:34.820,00:22:39.825 torrents they're trying to download. And, so this is some data from July 2012, and in 00:22:42.061,00:22:49.068 terms of the most packets associated with a torrent and you'll notice that a lot of them 00:22:49.068,00:22:54.073 happen to have the word China in their name. And a year later we see about the same thing. Um, so 00:22:57.977,00:23:02.915 this.. this attack doesn't seem to be going on right now or if it is it's a lot slower, but we 00:23:07.653,00:23:12.658 have...but, oh, I'm sorry. But typically in this China attack, um, typically the IP addresses 00:23:16.428,00:23:21.433 that are asked for the torrents, uh, satisfy this equation, or this set of IP addresses. Um, 00:23:26.238,00:23:32.011 basically they're in certain /13 blocks and so it seems that they're being generated 00:23:32.011,00:23:37.016 programatically with a buggy psuedo random number generator, um, and this, uh, attack is, 00:23:39.918,00:23:44.423 sometimes we see a lot of packets from it and sometimes we don't see any, and currently 00:23:44.423,00:23:49.428 we're not seeing very many. Um, but, um, more recently, about a year ago we saw a huge spike in 00:23:53.632,00:24:00.205 the amount of bittorrent traffic we see. Uh, we're getting traffic from, um, about 250 00:24:00.205,00:24:05.210 times more, um, IP addresses per.. per hour. And we don't really know everything that's 00:24:09.281,00:24:14.286 going on, to try to investigate this, um, we... we were... so just as a recap, when you want 00:24:18.891,00:24:24.997 the torrent you ask... someone, the bittorrent... someone the DHT node, the location of the 00:24:24.997,00:24:30.636 torrents, and they come back with the locations and then they potentially contact our network 00:24:30.636,00:24:36.842 telescope. Um, so we want to know who's spreading this false information, so this node... we 00:24:36.842,00:24:41.847 can't really learn this by looking at the IBR, instead we can.. we can set up nodes 00:24:44.550,00:24:51.357 actually interact with the distributed hash table. So we set up two torrent... two, uh, 00:24:51.357,00:24:57.496 clients and examined what happened over two months and they both contacted our network 00:24:57.496,00:25:03.802 telescope, uh, fairly frequently, and so we looked at who was telling them... who was 00:25:03.802,00:25:10.409 telling our client to contact the network telescope, um, and the most popular client string 00:25:10.409,00:25:16.682 was a libtorrent one, but this only accounted for about 70% of the clients, and it's a pretty 00:25:16.682,00:25:21.687 popular client string among legitimate hosts as well. Most of the, um, IP addresses were in 00:25:25.991,00:25:31.563 China, but were in multiple AS's, so this wasn't too successful in identifying who 00:25:31.563,00:25:38.237 was actually sending this false information. But we did notice one... one really suspicious 00:25:38.237,00:25:43.242 behaviour, um, so in the hash table all the, um, nodes have an ID so that means that they think 00:25:47.045,00:25:52.050 that the nodes in... the IP addresses in our network telescope also have ID's and so 00:25:55.320,00:26:00.259 the ID's that they request, they, uh, all have 4 as the 3rd byte, um, so that's kind of 00:26:03.862,00:26:09.301 weird, and typically when you look at the location... when you receive locations, you receive 00:26:09.301,00:26:15.407 not just one location but multiple locations at a time, and this behaviour is similar 00:26:15.407,00:26:19.244 for a lot of other IP addresses that we see. So, um, we're receiving a lot of bittorrent 00:26:19.244,00:26:22.047 traffic asa result of a bug in... or a misconfiguration in a peer to peer, uh, network. Peer 00:26:22.047,00:26:27.052 to peer networks also cause a lot of traffic, um, as a result of a bug in one of the systems. 00:26:35.894,00:26:40.899 So if we look at the number of sources sending us traffic over time we notice some interesting 00:26:45.838,00:26:51.076 things like the Conficker outbreak, um, when we started seeing a lot of bittorrent 00:26:51.076,00:26:56.081 traffic, and then all of a sudden in October 2010 there was... the shape of the graph 00:26:59.685,00:27:05.290 definitely changes, it's very diagonal and we weren't really sure what was happening here. 00:27:05.290,00:27:10.295 And we were able to identify the responsible payload, and certain bytes seemed fixed and then we 00:27:12.397,00:27:17.302 could hypothesize about what the other ones were using it for, but we still had no idea what 00:27:17.302,00:27:23.475 this was... what this was, and the popular ports, the most frequently used ports, we 00:27:23.475,00:27:29.681 weren't really sure what those were either, um, but we did notice that in terms of the 00:27:29.681,00:27:36.054 sources sending them, they were mostly located... a large number of them were located in China, 00:27:36.054,00:27:41.059 in fact we received in a month's time, uh, traffic from 30% of all BGP announced IP addresses 00:27:43.962,00:27:48.967 in China, so this is, like, huge. Um... also interestingly, um, when the USA category, four 00:27:54.072,00:27:59.077 IP addresses belong to the UCSD computer science department, where I went to school. So we 00:28:01.346,00:28:05.317 were able to coordinate with someone who could monitor the traffic going in and out of 00:28:05.317,00:28:10.322 UCSD's network to basically capture traffic from these IP addresses. Um, we made... this 00:28:12.558,00:28:18.330 ensured that this traffic wasn't spoofed and was actually happening. So, um, all of the 00:28:18.330,00:28:24.369 UCSD machines basically contacted a common IP address and in response they got a 00:28:24.369,00:28:29.374 pretty large packet. Um, and, based on this packet they sent about 40 more packets to 00:28:32.978,00:28:37.983 different machines and they were all encoded in this original big packet. Um, and it wasn't just 00:28:41.620,00:28:48.527 one packet, they were exchanging a lot of packets, and eventually the UCSD machines would receive 00:28:48.527,00:28:53.532 a packet like this, and so this packet is from 113.70.40.122, but instead they would respond 00:28:58.036,00:29:02.975 to 122.40.70.113, just immediately after receiving this packet. So, and this packet met 00:29:06.345,00:29:11.350 the, um, BPF filter that we had used to identify all of this traffic. So this is a byte order 00:29:14.720,00:29:19.725 bug, and is what we were receiving a lot of this traffic. Um, we identified that this 00:29:22.861,00:29:29.635 software bug was in Qihoo 360, um, and if you look at their license agreement, and this is 00:29:29.635,00:29:34.706 like the most popular security software in China, and if you look at their license agreement 00:29:34.706,00:29:41.046 you see that they will use peer to peer technology to update program modules, malware 00:29:41.046,00:29:45.484 definition, databases and components of the software. So basically we're getting 00:29:45.484,00:29:52.157 information about when people were updating, getting software updates. Um, we contacted, uh, 00:29:52.157,00:29:57.162 Qihoo and told them "hey, like, you have this bug", um, and so then we could see how long it 00:30:00.832,00:30:05.837 took for them to fix it. Um, the traffic had one kind of weird thing which was like every 4 to 00:30:09.007,00:30:14.012 5 weeks, um, there was a large spike, probably related to, um, big update events, but there 00:30:16.515,00:30:23.388 wasn't a big decrease following one of these, instead it decreased, like, about a month 00:30:23.388,00:30:28.894 later, and this date was about the same time a new version of Qihoo was available on their 00:30:28.894,00:30:33.899 website. So, uh, we're still getting some traffic, but in general, um, this bug has been 00:30:37.135,00:30:42.140 fixed. Alright, now on to the last part, which is in, um, looking at some unknown traffic. 00:30:45.310,00:30:51.416 So, uh, the bug was also an example of unknown traffic, but I'll go through another one. 00:30:51.416,00:30:56.421 So... basically if you investigate some of these packets a little bit more you 00:30:59.991,00:31:05.864 might be able to tell more... identify where they're from. So, in the beginning when I 00:31:05.864,00:31:10.502 explained the unknown category, I said "here's a packet, it's payload appears to be 00:31:10.502,00:31:15.507 encrypted." Um, so, well... basically this one IP address was getting a lot of traffic 00:31:19.511,00:31:25.717 sent to it and it all seemed to be encrypted based on the entropy of the bytes. Um, but we 00:31:25.717,00:31:30.288 did a byte wise analysis, of like, what is the first byte, second byte, third byte, and 00:31:30.288,00:31:36.394 stuff like that, and we found that this byte here always seemed to be somewhat related to 00:31:36.394,00:31:41.399 the whole length of the packet itself. Um, and then I read a bunch of white papers and found 00:31:45.537,00:31:50.542 that the sality botnets, their, um, encryption is such that these four bytes are an RC4 key 00:31:52.744,00:31:58.850 used to decrypt or encrypt the entire rest of the message, um, so when we decrypted all... 00:31:58.850,00:32:04.823 almost all the packets to this one IP adress, we found that they all sort of looked... 00:32:04.823,00:32:09.828 started like this. So, um, this confirmed that this is a sality commanding control packet. So 00:32:12.397,00:32:17.235 this is kind of interesting because, you're like, okay, I understand why someone would 00:32:17.235,00:32:22.207 have a bug, or someone woud purposefully put false information into a bittorrent, 00:32:22.207,00:32:27.212 uh, DHT, or I understand how a byte order bug happens, but this also happens in peer to peer 00:32:29.614,00:32:34.753 botnets as well, and that's why we received that much... we received a lot of traffic. In 00:32:34.753,00:32:39.758 fact, um, if we look at, um, how many IP addresses were sending us traffic per month, basically 00:32:42.027,00:32:48.099 to this one IP address, we see about the same number of infections as, uh, semantic was 00:32:48.099,00:32:53.104 seeing in the earlier part of this decade. So, um, so in conclusion it's pretty likely 00:32:56.241,00:33:01.313 that you are transmitting internet background radiation, um, and, if you use network 00:33:01.313,00:33:05.150 telescopes or other technologies, you can find a whole bunch of interesting 00:33:05.150,00:33:10.155 things. Um, in addition to just looking at these, kind of, security related events, we can 00:33:12.324,00:33:19.197 also learn about the networks and machines generating the traffic, for example, um, you 00:33:19.197,00:33:24.603 can do outage detection with traffic reaching network telescopes. Um, this is a graph 00:33:24.603,00:33:29.608 from a paper that analyzed, um, events during the Arab spring, and as you can see the number of 00:33:31.710,00:33:38.283 packets coming from Libia went down to 0 at certain periods of time, um, and this corresponded 00:33:38.283,00:33:43.288 to known times that the, uh, Libian government had pulled the plug on, um, their country's 00:33:46.725,00:33:51.730 internet. Um, we can also look at path changes, so, um, when you send a packet on the 00:33:53.999,00:33:59.938 internet, there's the TTL field that is decremented by every intermediate router to prevent 00:33:59.938,00:34:04.876 routig loops, but based on this you can infer how many hops away the source it, so if this 00:34:06.945,00:34:12.984 changes, then you know that a path change occurred, and this can help you analyze outages or 00:34:12.984,00:34:17.989 understand routing dynamics. Um, so looking at some of this stuff we can see like if you have 00:34:21.026,00:34:26.565 traffic like this where the TTL is about the same value over time, it's probably using the 00:34:26.565,00:34:31.569 same path, but if it looks more like this, then, um, you're... you're, you know that the path 00:34:33.939,00:34:38.944 has changed. Um, and then as a final example we can also look at DHCP lease duration, so when 00:34:41.313,00:34:46.484 you join a network using DHCP you announce that you want to join the network and you're 00:34:46.484,00:34:51.990 given an IP address to use, and typically at some point in time you are no longer... you no 00:34:51.990,00:34:57.896 longer use that IP address, which means, uh, at a future time someone else can use the 00:34:57.896,00:35:04.803 same IP address. So we can look at DHCP lease durations, um, we're using any traffic that has 00:35:04.803,00:35:10.208 some sort of ID, um, associated with the client, so if these are the packets received... you 00:35:10.208,00:35:16.781 receive over time, you know that the lease duration is at least this long and at most this long. 00:35:16.781,00:35:21.786 Um, so, as I noted before, bittorrent as ID's as well, so we can use bittorrent to 00:35:26.625,00:35:32.864 identify how long lease durations are for various autonomous systems, um, so, this 00:35:32.864,00:35:39.104 autonomous system, almost everything has a minumum lease suration of less than 7 days, 00:35:39.104,00:35:44.109 and this is very useful for understanding the effectiveness of blacklist, or how if people 00:35:47.012,00:35:52.017 are going to not be able to access the internet because you have blacklisted their IP. Um, 00:35:55.387,00:36:00.325 so, uh, so hopefully you enjoyed the talk today where we... we discussed some of the crazy 00:36:03.094,00:36:08.099 things that happen on the internet, and, uh... thank you. [applause] >> Hi, uh, very 00:36:23.114,00:36:28.019 fascinating research, and, uh, great presentation, thank you. Um, looking towards the future, 00:36:28.019,00:36:33.024 I noticed this was all IP v4, have you done any consideration of IP v6 based telescopes and do 00:36:35.393,00:36:40.398 you think it's practical with the sparseness of prefixes and v6 addresses? >> So, I haven't, 00:36:43.301,00:36:45.303 but some people wrote a research paper where they used an IP v6. They basically were able to 00:36:45.303,00:36:51.176 announce a covering prefix and basically capture everything that wasn't... other people 00:36:51.176,00:36:56.181 weren't announcing NBGP, and they didn't find as much, but I think as IP v6 evolves, I think 00:36:58.183,00:37:03.121 also this will evolve as well. >> Thank you. >> So, thank you, that's very convincing that this 00:37:08.860,00:37:15.266 is incredibly useful data. How can other security researchers get access to it? >> Um, so, I 00:37:15.266,00:37:20.271 know that the data that, uh, UCSD has, that it is available to academic researchers, you 00:37:22.674,00:37:28.079 might need to sign a bunch of things, but I don't... I don't know the whole process, um, but 00:37:28.079,00:37:33.084 you, I mean, you can start with your... if you have your own network too, so. >> Is there a 00:37:51.469,00:37:56.474 question over there? Thank you. [applause]