1 00:00:00,000 --> 00:00:03,292 TILLMANN WERNER: So welcome, everybody, to my presentation. 2 00:00:03,292 --> 00:00:04,292 I'm Tillmann. 3 00:00:04,292 --> 00:00:07,292 I work for a company that deals with targeted attacks, 4 00:00:07,292 --> 00:00:12,459 but today I'm going to be talking about something else. 5 00:00:12,999 --> 00:00:16,209 I'm going to be talking about one of my favorite topics, one 6 00:00:16,209 --> 00:00:19,000 of my hobbies, which is peer-to-peer botnets, 7 00:00:19,000 --> 00:00:22,125 and they're interesting because they're designed 8 00:00:22,125 --> 00:00:25,792 to be resilient against attacks, right? 9 00:00:26,834 --> 00:00:30,334 I'm usually trying to attack botnets and have fun with them. 10 00:00:34,626 --> 00:00:35,999 Let's see. 11 00:00:36,000 --> 00:00:38,000 So, yeah, there's an agenda. 12 00:00:39,459 --> 00:00:40,751 Okay. 13 00:00:40,999 --> 00:00:45,501 Let's start with a quick introduction to peer-to-peer botnets. 14 00:00:45,751 --> 00:00:47,292 I guess most people in the room here are familiar 15 00:00:47,292 --> 00:00:49,999 with peer-to-peer networks in general. 16 00:00:54,292 --> 00:00:57,667 I mean, they're networks like BitTorrent and others, and 17 00:00:57,667 --> 00:01:01,584 usually the purpose is to build a decentralized infrastructure 18 00:01:01,584 --> 00:01:03,999 that's self-reorganizing. 19 00:01:03,999 --> 00:01:06,250 So if parts of the infrastructure go offline, 20 00:01:06,250 --> 00:01:09,834 you know, it recovers itself and so on. 21 00:01:09,999 --> 00:01:12,999 And usually people build peer-to-peer networks in general 22 00:01:12,999 --> 00:01:16,751 because they don't -- they want to get rid of any central components so 23 00:01:16,751 --> 00:01:20,292 the infrastructure cannot be taken down so easily. 24 00:01:23,167 --> 00:01:25,999 When you analyze a peer-to-peer network of some sort, 25 00:01:25,999 --> 00:01:29,083 you want to understand the protocol first.
26 00:01:29,167 --> 00:01:33,375 That's not too much of a problem for all the popular file share networks 27 00:01:33,375 --> 00:01:36,083 because they're well documented but if you look 28 00:01:36,083 --> 00:01:39,417 at peer-to-peer botnets they usually use their proprietary 29 00:01:39,417 --> 00:01:42,209 protocols and you have to look at the samples, do 30 00:01:42,209 --> 00:01:45,167 the reverse engineering and so on. 31 00:01:46,167 --> 00:01:50,083 But if you do that for several peer-to-peer networks you 32 00:01:50,083 --> 00:01:55,083 will at some point see that there are different approaches. 33 00:01:55,083 --> 00:01:56,709 One is based on gossiping. 34 00:01:56,709 --> 00:01:58,292 If you think about that you got all these different nodes that are 35 00:01:58,292 --> 00:01:59,751 interconnected somehow and you want 36 00:01:59,751 --> 00:02:03,250 to propagate some information in this peer-to-peer network, right? 37 00:02:03,250 --> 00:02:06,250 You can either do that by what we call gossiping so each peer 38 00:02:06,250 --> 00:02:09,999 kind of gossips information to its neighbors so it forwards 39 00:02:09,999 --> 00:02:14,999 information to all its neighbors and these do the same and so on. 40 00:02:14,999 --> 00:02:19,626 But you can -- if you think about that, that's probably not very effective, right, 41 00:02:19,626 --> 00:02:25,083 because probably several peers will receive information several times. 42 00:02:25,083 --> 00:02:26,999 So you fill up the network with more information than you actually 43 00:02:26,999 --> 00:02:28,918 want to or have to. 44 00:02:28,999 --> 00:02:31,999 So more advanced peer-to-peer networks use what people call 45 00:02:31,999 --> 00:02:34,083 an overlay network. 
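A rough sketch of the gossiping cost described above, under stated assumptions: the five-peer topology and the names `gossip_flood` and `toy_network` are made up for illustration and are not taken from any real botnet. Every peer forwards the message once to all of its neighbors, and the counter shows how many messages are sent compared with how many peers actually needed to be reached.

```python
from collections import deque


def gossip_flood(neighbors, origin):
    """Flood one message from origin; each peer forwards it once to all neighbors.

    Returns (peers_reached, messages_sent).
    """
    seen = {origin}
    queue = deque([origin])
    messages_sent = 0
    while queue:
        peer = queue.popleft()
        for neighbor in neighbors[peer]:
            messages_sent += 1            # every forward costs one message
            if neighbor not in seen:      # duplicates arrive but are not re-forwarded
                seen.add(neighbor)
                queue.append(neighbor)
    return len(seen), messages_sent


# Hypothetical five-peer topology (undirected, listed in both directions).
toy_network = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["a", "b", "d", "e"],
    "d": ["b", "c", "e"],
    "e": ["c", "d"],
}

reached, sent = gossip_flood(toy_network, "a")
print(reached, sent)  # 5 peers reached, 14 messages sent
```

Only four peers had to learn the message, so ten of the fourteen transmissions in this toy run were redundant, which is the inefficiency the talk points at.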
46 00:02:34,083 --> 00:02:35,999 So you have addressing on top of, you know, 47 00:02:35,999 --> 00:02:39,459 the general addressing methods like IP, so every peer has 48 00:02:39,459 --> 00:02:41,876 an ID of some sort, and then there 49 00:02:41,876 --> 00:02:45,751 is a routing method so you can address specific peers, and if you want 50 00:02:45,751 --> 00:02:49,167 to send information to a specific peer, then, well, you can -- 51 00:02:49,167 --> 00:02:54,584 if you know its address you can route that through the peer-to-peer network. 52 00:02:54,584 --> 00:02:56,999 An example for that is eDonkey. 53 00:03:00,751 --> 00:03:04,209 Every peer has a hash which is at the same time its ID, 54 00:03:04,209 --> 00:03:07,999 and then you can look up data in the hash table. 55 00:03:07,999 --> 00:03:10,125 But I'm not going into detail about that. 56 00:03:12,626 --> 00:03:15,417 One important thing when we talk about peer-to-peer networks 57 00:03:15,417 --> 00:03:17,167 is bootstrapping. 58 00:03:17,167 --> 00:03:20,083 Bootstrapping is the process of establishing connectivity 59 00:03:20,083 --> 00:03:23,999 with the peer-to-peer network when a new peer comes online, 60 00:03:23,999 --> 00:03:27,083 and that's a very important aspect. 61 00:03:27,083 --> 00:03:28,918 It's a very important thing because when you think 62 00:03:28,918 --> 00:03:31,876 about that you want to get rid of any central entities 63 00:03:31,876 --> 00:03:34,834 in your peer-to-peer network, right? 64 00:03:34,834 --> 00:03:39,959 So it might not be a good idea to have a seed server to contact. 65 00:03:42,501 --> 00:03:45,959 That would be a central component, and you don't want to have that. 66 00:03:45,959 --> 00:03:48,918 So what people are doing -- they deliver a seed list, a seed list 67 00:03:48,918 --> 00:03:52,709 of other peers, together with the node itself. 68 00:03:52,751 --> 00:03:57,584 With the executable that's executed on the node's system.
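A minimal sketch of seed-list bootstrapping as just described, with reachability simulated rather than done over a network. The function name `bootstrap`, the `fallback` hook, the port, and the addresses (taken from documentation-only IP ranges) are all assumptions for illustration.

```python
def bootstrap(seed_list, is_reachable, fallback=None):
    """Return the first seed peer that answers, else the fallback's result."""
    for peer in seed_list:
        if is_reachable(peer):
            return peer
    if fallback is not None:
        return fallback()                 # placeholder for any fallback method
    return None


# Hypothetical seed list as it might be shipped alongside an executable.
SEED_LIST = ["192.0.2.10:4000", "198.51.100.7:4000", "203.0.113.99:4000"]
ONLINE = {"198.51.100.7:4000"}            # simulated set of peers that answer

first_contact = bootstrap(SEED_LIST, lambda peer: peer in ONLINE)
no_contact = bootstrap(SEED_LIST, lambda peer: False)
print(first_contact, no_contact)
```

When every seed is offline and no fallback is supplied, the node stays disconnected, which is why a fallback method matters.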
69 00:03:57,584 --> 00:03:59,501 But what happens if these peers go offline 70 00:03:59,501 --> 00:04:01,999 for some reason or if they are not online, 71 00:04:01,999 --> 00:04:04,834 the computers have been switched off or something? 72 00:04:04,834 --> 00:04:09,999 Then you need a fallback method, and that's where it's getting interesting. 73 00:04:09,999 --> 00:04:13,709 And if you look at the box at the right-hand side, the third entry 74 00:04:13,709 --> 00:04:17,375 is Conficker, which is a very famous or infamous piece 75 00:04:17,375 --> 00:04:22,125 of malware from 2009 that is still very active, and it used random 76 00:04:22,125 --> 00:04:26,999 scanning, so it scanned the internet for other peers and, of course, 77 00:04:26,999 --> 00:04:30,542 there's no way to block that, right? 78 00:04:30,542 --> 00:04:37,167 There's no information that the bot relies on when it's first started. 79 00:04:37,292 --> 00:04:40,999 It just starts scanning the internet until it finds other peers and then it can 80 00:04:40,999 --> 00:04:43,250 learn other peers from that one and do that 81 00:04:43,250 --> 00:04:46,876 to establish connectivity within their network. 82 00:04:47,083 --> 00:04:48,083 Speaking 83 00:04:48,083 --> 00:04:50,083 of that box, that's my own private history 84 00:04:50,083 --> 00:04:53,876 of peer-to-peer botnets I analyzed. 85 00:04:53,876 --> 00:04:59,167 So I started in 2008 with the Storm worm, which used 86 00:04:59,167 --> 00:05:06,667 the eDonkey protocol, together with some other people. 87 00:05:06,667 --> 00:05:08,250 Some of them are here in this room. 88 00:05:09,959 --> 00:05:12,876 There are other peer-to-peer botnets that are known. 89 00:05:12,876 --> 00:05:17,167 I think Nugache was active in 2007 and maybe there were some others, 90 00:05:17,167 --> 00:05:22,083 but I think 2007 is the earliest that I know of.
91 00:05:22,626 --> 00:05:26,918 Then there was Waledac, which people believe is the successor 92 00:05:26,918 --> 00:05:30,334 of the Storm worm, because Storm -- it caught a lot 93 00:05:30,334 --> 00:05:33,292 of attention by researchers. 94 00:05:33,292 --> 00:05:35,501 And a lot of -- lots of security people tried to, you know, 95 00:05:35,501 --> 00:05:38,999 investigate Storm and tried to understand the protocols. 96 00:05:38,999 --> 00:05:41,334 Some even designed attacks -- how you can attack the peer-to-peer network 97 00:05:41,334 --> 00:05:45,083 to knock it offline, take it offline, take the nodes offline. 98 00:05:46,167 --> 00:05:50,083 So apparently the people behind it decided to abandon it 99 00:05:50,083 --> 00:05:52,626 at some point and turn -- or create 100 00:05:52,626 --> 00:05:56,999 a new botnet, and that was called Waledac, and it was not relying 101 00:05:56,999 --> 00:06:01,918 on any existing peer-to-peer infrastructure, so no -- no eDonkey anymore. Instead 102 00:06:01,918 --> 00:06:05,999 they implemented their own protocol, which was very similar 103 00:06:05,999 --> 00:06:10,417 to -- maybe I shouldn't say very similar to eDonkey but, you know, 104 00:06:10,417 --> 00:06:13,959 the overall concept behind the botnet had similar 105 00:06:13,959 --> 00:06:18,999 structures and design characteristics, so that's why people said it's 106 00:06:18,999 --> 00:06:22,083 probably a successor of Storm. 107 00:06:22,501 --> 00:06:26,209 Then, I already mentioned Conficker. It started out as 108 00:06:26,209 --> 00:06:29,709 a bot that was entirely centralized with its command 109 00:06:29,709 --> 00:06:32,375 and control infrastructure. 110 00:06:32,792 --> 00:06:36,417 Many of you have heard of the DGA, the domain generation 111 00:06:36,417 --> 00:06:41,292 algorithm: it generated domain names all the time and then tried to contact -- resolve 112 00:06:41,292 --> 00:06:46,250 and contact that host and ask for, basically, updates.
113 00:06:46,999 --> 00:06:51,959 Later on these people switched to version C, the third version. 114 00:06:51,959 --> 00:06:53,501 They switched to a peer-to-peer protocol 115 00:06:53,501 --> 00:06:56,083 as a fallback command and control channel 116 00:06:56,083 --> 00:06:59,083 because there was some effort to block access 117 00:06:59,083 --> 00:07:03,292 to the generated domains, and otherwise they would lose their 8-million-node 118 00:07:03,292 --> 00:07:05,167 botnet, right? 119 00:07:06,876 --> 00:07:08,584 So that was Conficker. 120 00:07:08,959 --> 00:07:13,083 In 2010, I believe late 2010, Kelihos appeared; that's 121 00:07:13,083 --> 00:07:16,667 a bot that's also known as Hlux. 122 00:07:17,876 --> 00:07:22,834 I think that's the other most well-known name. 123 00:07:22,999 --> 00:07:26,959 And that again is a successor of Waledac, which was taken 124 00:07:26,959 --> 00:07:29,999 down by some people and myself. 125 00:07:31,751 --> 00:07:36,999 With a recipe attack and I will talk about that in a minute. 126 00:07:37,459 --> 00:07:41,999 That botnet was taken away from them, so again they created 127 00:07:41,999 --> 00:07:43,792 a new one. 128 00:07:43,876 --> 00:07:46,375 And that was called Kelihos A, and it's interesting 129 00:07:46,375 --> 00:07:49,334 because if you look at the list it was attacked 130 00:07:49,334 --> 00:07:51,751 as well, with success. 131 00:07:51,751 --> 00:07:55,209 So they created Kelihos B, a successor, and tried to fix some stuff. 132 00:07:55,209 --> 00:07:56,834 That was taken down as well. 133 00:07:57,167 --> 00:08:01,999 And again, they created Kelihos C, a third version. 134 00:08:01,999 --> 00:08:04,083 We attacked that as well. 135 00:08:04,959 --> 00:08:06,999 It wasn't too successful.
136 00:08:06,999 --> 00:08:08,999 It somewhat survived because we weren't able to own 137 00:08:08,999 --> 00:08:12,083 all the peers, and just recently they changed something 138 00:08:12,083 --> 00:08:16,876 in the protocol and added private/public key encryption to it. 139 00:08:16,876 --> 00:08:19,751 It doesn't make sense at all because, you know, you might want 140 00:08:19,751 --> 00:08:22,626 to encrypt your traffic but you can't do this 141 00:08:22,626 --> 00:08:26,999 with -- it doesn't do public/private key stuff because the peers have 142 00:08:26,999 --> 00:08:31,584 to generate their own keys and exchange keys and so on. 143 00:08:32,626 --> 00:08:34,459 Anybody can do it. 144 00:08:36,918 --> 00:08:41,959 Anyway, okay, and then in 2011 there was the Miner botnet, 145 00:08:41,959 --> 00:08:45,459 and I will show you some protocol example 146 00:08:45,459 --> 00:08:47,209 for that. 147 00:08:47,626 --> 00:08:51,083 A really stupid piece of malware that was written in .NET, 148 00:08:51,083 --> 00:08:54,584 if I'm not mistaken, and the protocol was an HTTP-based protocol, 149 00:08:54,584 --> 00:08:59,542 and they made many mistakes so it was trivial to take down. 150 00:09:01,334 --> 00:09:02,667 Okay. 151 00:09:05,999 --> 00:09:09,083 The remaining two, ZeroAccess and Zeus -- they're still 152 00:09:09,083 --> 00:09:12,209 around and they're really successful. 153 00:09:12,209 --> 00:09:15,292 They're some of the most -- the biggest and most troubling botnets 154 00:09:15,292 --> 00:09:18,334 that are around these days, and they're mostly used 155 00:09:18,334 --> 00:09:22,417 for dropping other malware on the infected system. 156 00:09:23,959 --> 00:09:28,626 It's used to deploy other malware like click bots. 157 00:09:29,792 --> 00:09:35,792 It's split into seven or eight separate botnets. 158 00:09:36,999 --> 00:09:39,417 Yeah, I don't know why; maybe they have some affiliate program 159 00:09:39,417 --> 00:09:40,999 or something.
160 00:09:41,083 --> 00:09:42,709 They distinguish 64- and 32-bit systems 161 00:09:42,709 --> 00:09:45,334 because they want to be able to, I don't know, inject 162 00:09:45,334 --> 00:09:47,834 into other processes, and that might make sense 163 00:09:47,834 --> 00:09:50,667 to maintain two separate structures. 164 00:09:51,999 --> 00:09:54,709 Going back to my slide here. 165 00:09:55,083 --> 00:09:58,334 Obviously, people build peer-to-peer botnets because they want to -- 166 00:09:58,334 --> 00:10:02,501 they have the same goals as with other peer-to-peer networks. 167 00:10:02,918 --> 00:10:06,417 They want to create an infrastructure that's resilient 168 00:10:06,417 --> 00:10:10,375 against takeover attempts or takedown attempts. 169 00:10:10,501 --> 00:10:13,375 So that's the goal and that's why, you know, 170 00:10:13,375 --> 00:10:16,999 they're getting somewhat popular. 171 00:10:17,709 --> 00:10:20,751 I'm sure there are other peer-to-peer botnets out there that I don't have 172 00:10:20,751 --> 00:10:22,083 on my list. 173 00:10:22,083 --> 00:10:24,083 I'm aware of a few but I haven't looked 174 00:10:24,083 --> 00:10:28,042 at them so I'm not going to talk about them. 175 00:10:28,042 --> 00:10:31,834 Interestingly, for I think all -- yeah, all botnets that you've seen 176 00:10:31,834 --> 00:10:36,999 on the previous list, the architecture is not pure peer-to-peer. 177 00:10:37,125 --> 00:10:38,918 It's a hybrid architecture. 178 00:10:38,959 --> 00:10:39,999 It's what you see here.
179 00:10:39,999 --> 00:10:43,042 The thing at the bottom is the actual peer-to-peer network, 180 00:10:43,042 --> 00:10:46,792 and the dashed lines represent a peer being in the peer list 181 00:10:46,792 --> 00:10:50,375 of another peer, but when they want to receive commands 182 00:10:50,375 --> 00:10:54,209 for, I don't know, sending out spam or something like that, 183 00:10:54,209 --> 00:10:57,375 they still reach out to central components, and 184 00:10:57,375 --> 00:11:01,709 the boxes you see in the middle -- you see that? 185 00:11:01,999 --> 00:11:04,959 Yeah, the boxes you see in the middle are proxy servers, so 186 00:11:04,959 --> 00:11:07,626 they have another layer in between, systems 187 00:11:07,626 --> 00:11:10,999 like burner systems, so if some of the proxy servers get taken 188 00:11:10,999 --> 00:11:14,542 down they can easily replace them without losing their command 189 00:11:14,542 --> 00:11:17,083 and control infrastructure. 190 00:11:17,083 --> 00:11:19,918 And then there's a command and control server on top that 191 00:11:19,918 --> 00:11:22,542 is the actual back end, okay. 192 00:11:22,542 --> 00:11:26,083 There might actually be multiple layers between the peer-to-peer network 193 00:11:26,083 --> 00:11:29,209 and the C2 but, well, unless you get access to one 194 00:11:29,209 --> 00:11:33,501 of the proxy servers you don't see what's behind it. 195 00:11:33,501 --> 00:11:34,999 But we're fairly certain that in these cases 196 00:11:34,999 --> 00:11:38,709 they're proxy servers because, for example, they speak HTTP and 197 00:11:38,709 --> 00:11:41,417 they respond with an nginx banner, and -- well, 198 00:11:41,417 --> 00:11:44,334 you can be fairly certain it's a proxy. 199 00:11:46,626 --> 00:11:48,459 Most likely at least. 200 00:11:48,542 --> 00:11:49,542 Okay. 201 00:11:49,834 --> 00:11:52,250 Let's take a look at some protocol examples so you get 202 00:11:52,250 --> 00:11:56,083 an idea what these people create and come up with.
203 00:11:56,083 --> 00:12:00,999 This is the already mentioned Miner bot and, as I've said, that was really 204 00:12:00,999 --> 00:12:04,792 a trivial and also stupid protocol. 205 00:12:05,125 --> 00:12:08,999 It was HTTP-based and the bot -- it wasn't 206 00:12:08,999 --> 00:12:14,999 a full HTTP server but just a very rudimentary one. 207 00:12:14,999 --> 00:12:19,459 It was backed by the file system: if you issued a GET request 208 00:12:19,459 --> 00:12:22,999 with this search parameter and the -- you know, 209 00:12:22,999 --> 00:12:26,999 the IP list two value, that file name would be looked 210 00:12:26,999 --> 00:12:30,751 up in the respective directory and then delivered 211 00:12:30,751 --> 00:12:33,459 to the requesting host. 212 00:12:33,834 --> 00:12:34,834 Okay? 213 00:12:34,999 --> 00:12:37,250 So it was really -- I mean, if there were other files 214 00:12:37,250 --> 00:12:39,667 on the file system in that directory you could request 215 00:12:39,667 --> 00:12:42,250 them as well with this method, and it was probably not intended 216 00:12:42,250 --> 00:12:43,667 by them. 217 00:12:44,209 --> 00:12:46,999 Yeah, anyhow, so you can see their response there. 218 00:12:47,334 --> 00:12:50,375 I think the nginx server is fake. 219 00:12:50,375 --> 00:12:51,999 They just copied that from somewhere and sent it 220 00:12:51,999 --> 00:12:53,918 with the responses. 221 00:12:55,834 --> 00:12:58,876 And you can see at the bottom is the actual payload, a list 222 00:12:58,876 --> 00:13:01,918 of other peers, a list of IP addresses. 223 00:13:01,999 --> 00:13:06,125 And Miner always responds with the entire peer list that it has, 224 00:13:06,125 --> 00:13:09,375 all peers that it knows about.
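A small sketch of how a response body like the one described, a plain list of IP addresses, might be parsed. The slide itself is not part of the transcript, so the one-address-per-line layout is an assumption, as are the function name `parse_peer_list` and the sample body (documentation-only addresses).

```python
import ipaddress


def parse_peer_list(body):
    """Extract syntactically valid IPv4 addresses, assuming one per line."""
    peers = []
    for line in body.splitlines():
        candidate = line.strip()
        if not candidate:
            continue                      # skip blank lines
        try:
            peers.append(str(ipaddress.IPv4Address(candidate)))
        except ValueError:
            continue                      # skip anything that is not an IPv4 address
    return peers


# Hypothetical payload, not captured traffic.
sample_body = "192.0.2.1\r\n198.51.100.23\r\n\r\ngarbage\r\n203.0.113.5\r\n"
peers = parse_peer_list(sample_body)
print(peers)
```

Because the whole list comes back in one response, a single request per peer is enough to learn every neighbor that peer knows.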
225 00:13:09,999 --> 00:13:11,999 And that's stupid because it can be huge, and also that 226 00:13:11,999 --> 00:13:14,125 makes it easy for us, you know, to enumerate 227 00:13:14,125 --> 00:13:17,584 the bots and understand how many infected machines there are and so forth 228 00:13:17,584 --> 00:13:20,999 and so on, if you want to attack it, for example. 229 00:13:21,250 --> 00:13:24,334 I mean, this is only the start, right? 230 00:13:24,334 --> 00:13:26,959 You can see it's 11K in size. 231 00:13:26,959 --> 00:13:28,209 And this is by far not the largest request 232 00:13:28,209 --> 00:13:30,584 or response we've seen. 233 00:13:40,501 --> 00:13:42,751 You can try to recreate the peer-to-peer links between 234 00:13:42,751 --> 00:13:44,667 the nodes as a graph. 235 00:13:44,667 --> 00:13:48,083 You can recreate that by crawling peers. 236 00:13:48,083 --> 00:13:49,918 We will talk more about crawling. 237 00:13:49,918 --> 00:13:51,959 I mean, that's a topic of the talk, right? 238 00:13:51,959 --> 00:13:53,876 We'll talk more about that in a minute. 239 00:13:53,999 --> 00:13:56,626 But if you request a peer list from one peer, 240 00:13:56,626 --> 00:14:00,459 you can recreate these links in the graph and then take 241 00:14:00,459 --> 00:14:04,834 the response -- the IP addresses from the response that you got 242 00:14:04,834 --> 00:14:09,999 from these and then plot pretty pictures like this one here. 243 00:14:09,999 --> 00:14:14,083 I think that's about 37K nodes, which is only a subset of the -- 244 00:14:14,083 --> 00:14:17,834 of the Miner botnet at that time. 245 00:14:17,834 --> 00:14:21,999 But it takes, like, ages to render these pictures here so we 246 00:14:21,999 --> 00:14:27,167 only did that for a subset of the nodes we found. 247 00:14:29,792 --> 00:14:33,501 You can see that other peer-to-peer protocols are also similar. 248 00:14:33,501 --> 00:14:35,083 This is ZeroAccess version 1.
249 00:14:35,083 --> 00:14:36,792 There's two versions out there. 250 00:14:36,792 --> 00:14:38,125 This is the earlier version. 251 00:14:38,125 --> 00:14:40,417 And, again, it's a proprietary protocol that 252 00:14:40,417 --> 00:14:42,334 they implemented. 253 00:14:42,334 --> 00:14:46,999 They defined 6 different message types and one is getL, which means get list, 254 00:14:46,999 --> 00:14:49,999 get peer list from the other peers, and retL 255 00:14:49,999 --> 00:14:53,083 is the return peer list message. 256 00:14:53,501 --> 00:14:58,083 This is what you get when you reverse engineer the message format 257 00:14:58,083 --> 00:15:00,999 and decode it and parse it. 258 00:15:00,999 --> 00:15:02,083 It's not plain text. 259 00:15:02,125 --> 00:15:07,709 I think the version 1 had a 4-byte key and used that hash 260 00:15:07,709 --> 00:15:11,834 to decrypt its messages, but it was always 261 00:15:11,834 --> 00:15:18,542 the same key, so it was basically an encryption with a static key, and 262 00:15:18,542 --> 00:15:25,626 the other version just used XOR with another key, version 2. 263 00:15:25,626 --> 00:15:27,083 So if -- if you undo the encryption you end 264 00:15:27,083 --> 00:15:29,375 up with something like this. 265 00:15:29,375 --> 00:15:32,542 And you can see here in the case of version 1 266 00:15:32,542 --> 00:15:38,542 a peer list has 256 entries, so it always returns up to 256 entries, 267 00:15:38,542 --> 00:15:42,667 but since the botnet is so -- is large enough, 268 00:15:42,667 --> 00:15:48,626 every peer has always more than 256 entries at any time. 269 00:15:48,667 --> 00:15:51,334 So whenever you ask a peer for its peer list, you 270 00:15:51,334 --> 00:15:54,999 will get these -- most likely these 256 entries. 271 00:15:54,999 --> 00:15:57,083 And you can see there's some order there.
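To illustrate why encryption with a static key offers little protection once the key has been recovered, here is a minimal repeating-key XOR sketch. It is a generic example, not the botnet's actual routine: the key bytes, the message, and the name `xor_with_key` are hypothetical, and the real scheme may well derive or rotate its key differently.

```python
from itertools import cycle


def xor_with_key(data: bytes, key: bytes) -> bytes:
    """XOR every byte of data with the key, repeating the key as needed."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


STATIC_KEY = b"\x13\x37\xbe\xef"          # hypothetical 4-byte key, not a real one
plaintext = b"example peer list message"

ciphertext = xor_with_key(plaintext, STATIC_KEY)
recovered = xor_with_key(ciphertext, STATIC_KEY)   # the same call undoes it
print(recovered)
```

Since the operation is its own inverse and the key never changes, anyone who extracts the key from one sample can decode every message.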
272 00:15:57,083 --> 00:16:00,792 So the first -- the first number is a time stamp, or a time delta so 273 00:16:00,792 --> 00:16:03,709 to speak, because the botnet favors peers that 274 00:16:03,709 --> 00:16:06,709 have recently been active, and that makes sense 275 00:16:06,709 --> 00:16:10,999 because you don't want to keep -- maybe if that is your strategy, 276 00:16:10,999 --> 00:16:14,834 you don't want to keep, like, peers from the stone age 277 00:16:14,834 --> 00:16:18,792 in your list that may be offline, so the entry becomes invalid, 278 00:16:18,792 --> 00:16:22,375 so you might want to favor peers that have recently 279 00:16:22,375 --> 00:16:26,250 come online, that you have recently talked to, so that's why 280 00:16:26,250 --> 00:16:30,083 they sort this peer list by the time delta and then return 281 00:16:30,083 --> 00:16:32,999 the 256 most recent ones to you. 282 00:16:33,999 --> 00:16:40,542 They changed this protocol a little bit in version 2, so this is version 2. 283 00:16:40,918 --> 00:16:44,125 You can see there are again these two message types. 284 00:16:44,209 --> 00:16:46,999 I've already mentioned the encryption is slightly different, 285 00:16:46,999 --> 00:16:50,542 but for the most part the protocol is very similar, so there 286 00:16:50,542 --> 00:16:54,334 is getL and retL and you have the time stamps and the IP address, 287 00:16:54,334 --> 00:16:58,999 but they figured they don't need to send back 256 IP addresses. 288 00:16:58,999 --> 00:17:00,584 That's way too much, you know. 289 00:17:00,584 --> 00:17:03,959 It's sufficient if you respond with only 16 IP addresses. 290 00:17:03,959 --> 00:17:05,999 That makes the messages smaller so, you know, 291 00:17:05,999 --> 00:17:09,334 less overall communication in the botnet. 292 00:17:09,417 --> 00:17:12,167 And the reason is -- I mean, the version 2 is really huge.
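The reply policy described here, sort by time delta and return only the most recent entries, can be sketched as follows. The limits 256 and 16 come from the talk; the function name `build_reply`, the dictionary layout, and the sample data are assumptions for illustration.

```python
V1_LIMIT = 256   # entries per reply in version 1, per the talk
V2_LIMIT = 16    # entries per reply in version 2, per the talk


def build_reply(known_peers, limit):
    """known_peers maps IP -> seconds since last contact (the time delta).

    Returns up to `limit` (ip, delta) pairs, most recently active first.
    """
    ranked = sorted(known_peers.items(), key=lambda entry: entry[1])
    return ranked[:limit]


# Hypothetical peer table using documentation-only addresses.
known = {
    "192.0.2.1": 840,
    "198.51.100.4": 12,
    "203.0.113.9": 95,
    "192.0.2.77": 3600,
}

reply = build_reply(known, limit=2)
print(reply)  # [('198.51.100.4', 12), ('203.0.113.9', 95)]
```

A side effect worth noting for crawling: a peer that always returns its freshest entries tends to hand out the same small set repeatedly, so one request reveals only a slice of what it knows.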
293 00:17:12,334 --> 00:17:15,999 We've crawled some of the -- some of the botnets and they count 294 00:17:15,999 --> 00:17:18,083 like 3.7 million. 295 00:17:18,083 --> 00:17:19,751 I think that was the count we got. 296 00:17:19,751 --> 00:17:22,459 3.7 million infected machines, and if you have 3.7 million machines talking 297 00:17:22,459 --> 00:17:25,667 to each other that's a lot of traffic, so you might want 298 00:17:25,667 --> 00:17:28,083 to reduce the message size. 299 00:17:28,501 --> 00:17:31,375 So that's what they did, but if you take a look 300 00:17:31,375 --> 00:17:34,083 at the IP addresses, you might notice that 301 00:17:34,083 --> 00:17:37,334 the last one looks a little strange. 302 00:17:37,334 --> 00:17:39,334 It's very high, and that is because they do some 303 00:17:39,334 --> 00:17:41,250 deduplication. 304 00:17:41,250 --> 00:17:42,667 You don't want to have multiple entries 305 00:17:42,667 --> 00:17:45,083 with the same IP address in your peer list, obviously, 306 00:17:45,083 --> 00:17:47,959 because if you allow that, it's trivial for other people 307 00:17:47,959 --> 00:17:51,375 to poison your peer list and inject one entry multiple times and overwrite 308 00:17:51,375 --> 00:17:53,999 all the legitimate ones, and then you're not connected 309 00:17:53,999 --> 00:17:56,709 to the peer-to-peer botnet anymore. 310 00:17:57,792 --> 00:18:01,459 That's why they do deduplication, and in order to do that they sort 311 00:18:01,459 --> 00:18:05,375 the IP addresses and then, you know, go over the sorted list, and 312 00:18:05,375 --> 00:18:09,417 if they have two consecutive entries that have the same IP address 313 00:18:09,417 --> 00:18:11,584 they kick one out. 314 00:18:11,709 --> 00:18:15,375 But because IP addresses are, at least on PCs, stored 315 00:18:15,375 --> 00:18:19,542 in little-endian byte order, you know, and they sort them, you have 316 00:18:19,542 --> 00:18:25,167 the result of that sorting, these IP addresses, in the response.
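A sketch of sort-based deduplication as described, under the assumption (my reading of the remark about byte order) that addresses are compared as little-endian 32-bit integers. The helper names and the sample addresses are hypothetical.

```python
import socket
import struct


def ip_to_le_int(ip):
    """Interpret the four address bytes as a little-endian 32-bit integer."""
    # inet_aton yields the bytes in network order; "<I" reads them little-endian,
    # so the LAST octet becomes the most significant part of the sort key.
    return struct.unpack("<I", socket.inet_aton(ip))[0]


def dedup_sorted(ips):
    """Sort, then drop any entry equal to its predecessor."""
    ordered = sorted(ips, key=ip_to_le_int)
    result = []
    for ip in ordered:
        if not result or result[-1] != ip:
            result.append(ip)
    return result


# Hypothetical entries, including one duplicate and one invalid address.
entries = ["10.0.0.2", "1.0.0.200", "10.0.0.2", "255.255.255.255", "200.0.0.1"]
print(dedup_sorted(entries))
# ['200.0.0.1', '10.0.0.2', '1.0.0.200', '255.255.255.255']
```

Under this ordering the address with all bits set always sorts last, regardless of what else is in the list.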
317 00:18:25,834 --> 00:18:29,209 What's interesting is they do that but they don't filter 318 00:18:29,209 --> 00:18:33,083 out invalid IP addresses, so when you crawl the botnet you come 319 00:18:33,083 --> 00:18:37,083 across an IP like 255.255.255.255, so all bits set, which obviously 320 00:18:37,083 --> 00:18:40,083 is an invalid IP address, but it regularly shows 321 00:18:40,083 --> 00:18:43,083 up in these lists because when you sort the list 322 00:18:43,083 --> 00:18:47,083 in increasing order then it's the topmost entry and it's always 323 00:18:47,083 --> 00:18:49,751 included, and they have some other garbage 324 00:18:49,751 --> 00:18:53,918 in there, but for some reason they don't filter these entries, which 325 00:18:53,918 --> 00:18:55,834 is interesting. 326 00:18:58,292 --> 00:18:59,709 Okay. 327 00:18:59,999 --> 00:19:04,999 Let's talk about crawling, so -- I mean, crawling is nothing else 328 00:19:04,999 --> 00:19:08,999 but recursively enumerating peers. 329 00:19:08,999 --> 00:19:11,834 You start with one peer, you request its peer list. 330 00:19:11,876 --> 00:19:13,292 You take a look at the response and do the same 331 00:19:13,292 --> 00:19:15,999 for all the returned addresses, right? 332 00:19:15,999 --> 00:19:21,083 And so on until you, you know, want to go offline or I don't know. 333 00:19:21,626 --> 00:19:26,209 So that's all -- all that crawling is, but you really want to think 334 00:19:26,209 --> 00:19:30,501 about crawling strategy, and one -- one important thing 335 00:19:30,501 --> 00:19:32,999 is crawling speed. 336 00:19:32,999 --> 00:19:34,999 So ideally we would be able to take a snapshot 337 00:19:34,999 --> 00:19:38,999 of the current peer-to-peer graph and then, you know, enumerate 338 00:19:38,999 --> 00:19:43,250 the peers in that snapshot, but that's not possible. 339 00:19:43,250 --> 00:19:46,167 First off because, you know, you have to do that actively.
340 00:19:46,167 --> 00:19:46,999 You have to send out requests and process the responses 341 00:19:46,999 --> 00:19:48,999 and that takes time. 342 00:19:48,999 --> 00:19:49,999 And while you're doing that, the structure 343 00:19:49,999 --> 00:19:52,792 of the graph might be changing, right? 344 00:19:52,792 --> 00:19:55,292 Peers might go offline, new peers might come online, so you 345 00:19:55,292 --> 00:19:58,999 will never be able to get that snapshot, right? 346 00:19:59,209 --> 00:20:02,876 But to come closest to that, you want to be as quick as possible. 347 00:20:04,792 --> 00:20:05,999 Yeah. 348 00:20:05,999 --> 00:20:07,334 And when you do that, you have to think about things 349 00:20:07,334 --> 00:20:09,292 like unresponsive peers. 350 00:20:09,292 --> 00:20:09,999 What if you -- if somebody sends you 351 00:20:09,999 --> 00:20:12,542 an IP address back that's offline? 352 00:20:12,542 --> 00:20:13,876 How do you deal with that? 353 00:20:13,876 --> 00:20:16,417 Do you want to keep it in the list and try again later? 354 00:20:16,417 --> 00:20:16,542 I mean, you don't know why it's 355 00:20:16,542 --> 00:20:18,250 unresponsive, right? 356 00:20:18,250 --> 00:20:19,501 You might lose packets. 357 00:20:19,501 --> 00:20:21,999 The internet might be overwhelmed with your traffic because you tried 358 00:20:21,999 --> 00:20:24,167 to be as fast as possible. 359 00:20:24,209 --> 00:20:26,501 You don't know why it's unresponsive. 360 00:20:26,751 --> 00:20:29,501 Maybe there's some hiccup on the internet and you might want 361 00:20:29,501 --> 00:20:33,375 to keep it in the list and try again later, but you can see it's getting 362 00:20:33,375 --> 00:20:35,918 a little bit more complex. 363 00:20:35,918 --> 00:20:38,542 And what you see in the top right corner is the result 364 00:20:38,542 --> 00:20:44,250 of us crawling peer-to-peer Zeus, which is also known as Gameover, by the way.
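The crawling loop and the question of unresponsive peers can be sketched against a simulated network. Nothing here talks to real hosts: `toy_graph`, `fake_request`, and the retry limit of three are assumptions chosen for illustration, and the real crawler's strategy is not described in that detail.

```python
from collections import deque


def crawl(start, request_peer_list, max_attempts=3):
    """Breadth-first enumeration of peers.

    request_peer_list(peer) returns a list of neighbors, or None if the
    peer did not answer. Silent peers are re-queued until max_attempts.
    """
    attempts = {}
    known = {start}
    responsive = set()
    queue = deque([start])
    while queue:
        peer = queue.popleft()
        attempts[peer] = attempts.get(peer, 0) + 1
        reply = request_peer_list(peer)
        if reply is None:
            if attempts[peer] < max_attempts:
                queue.append(peer)        # keep it in the list and try again later
            continue
        responsive.add(peer)
        for neighbor in reply:
            if neighbor not in known:
                known.add(neighbor)
                queue.append(neighbor)
    return known, responsive


# Simulated network: p5 is advertised by p3 but never answers.
toy_graph = {"p1": ["p2", "p3"], "p2": ["p1", "p4"], "p3": ["p2", "p5"], "p4": []}
failures_left = {"p2": 1}                 # p2 drops its first request


def fake_request(peer):
    if failures_left.get(peer, 0) > 0:
        failures_left[peer] -= 1
        return None
    return toy_graph.get(peer)            # None for peers that are not online


known, responsive = crawl("p1", fake_request)
print(sorted(known), sorted(responsive))
```

In this run p2 is recovered on the second attempt while p5 ends up known but never reachable, which mirrors the gap between the two curves discussed next.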
365 00:20:45,709 --> 00:20:48,918 The red line, the red graph, shows you the number 366 00:20:48,918 --> 00:20:52,000 of IP addresses, and we call them known peers, but most 367 00:20:52,000 --> 00:20:55,999 of them are not actually reachable, although the protocol 368 00:20:55,999 --> 00:20:59,334 is pretty robust, so they don't include any invalid IP 369 00:20:59,334 --> 00:21:01,375 addresses in it. 370 00:21:01,999 --> 00:21:04,334 But most of them are not actually reachable, so 371 00:21:04,334 --> 00:21:07,083 if you count only the peers that you can talk to, 372 00:21:07,083 --> 00:21:11,375 you end up with the green line, and you can see it's way less. 373 00:21:11,834 --> 00:21:13,709 And you see -- if you see these little dips 374 00:21:13,709 --> 00:21:17,083 in the red line, that is because for Zeus -- peer-to-peer Zeus 375 00:21:17,083 --> 00:21:20,999 we chose a strategy where we cleaned up the list of known peers from time 376 00:21:20,999 --> 00:21:23,999 to time, so we said these are unresponsive for too long 377 00:21:23,999 --> 00:21:26,167 and kick them out and keep the list small, 378 00:21:26,167 --> 00:21:30,083 because otherwise you have an endlessly growing list. 379 00:21:30,417 --> 00:21:35,417 What you can also see is that the green line converges very quickly, 380 00:21:35,417 --> 00:21:38,751 and that means you have probably reached 381 00:21:38,751 --> 00:21:42,375 the number you are able to crawl. 382 00:21:42,375 --> 00:21:45,083 And that gives you some size estimation, okay? 383 00:21:45,083 --> 00:21:46,083 Okay. 384 00:21:50,667 --> 00:21:52,999 There's some fancy animation here. 385 00:21:53,083 --> 00:22:00,834 You might wonder why anybody wants to crawl peer-to-peer botnets at all? 386 00:22:00,834 --> 00:22:03,125 I mean, it's interesting to play with that.
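The periodic cleanup of known peers mentioned for the crawl above can be sketched as a simple time-based filter. The threshold, the timestamps, and the name `prune_known_peers` are hypothetical; the talk does not say what "too long" was in practice.

```python
def prune_known_peers(last_seen, now, max_silence):
    """Keep only peers heard from within the last max_silence seconds.

    last_seen maps peer -> timestamp (seconds) of its most recent response.
    """
    return {
        peer: seen
        for peer, seen in last_seen.items()
        if now - seen <= max_silence
    }


# Hypothetical bookkeeping with documentation-only addresses.
last_seen = {"192.0.2.1": 1000, "198.51.100.4": 400, "203.0.113.9": 950}

kept = prune_known_peers(last_seen, now=1000, max_silence=300)
print(sorted(kept))  # ['192.0.2.1', '203.0.113.9']
```

Running such a cleanup at intervals would produce exactly the kind of sudden dips in the known-peer count that the talk points out, while leaving the count of reachable peers untouched.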
387 00:22:03,125 --> 00:22:05,999 It's interesting to understand the protocol and reimplement it 388 00:22:05,999 --> 00:22:09,834 and so on and then play with the botnet and maybe, you know, 389 00:22:09,834 --> 00:22:12,501 snoop on what they're doing. 390 00:22:12,999 --> 00:22:15,959 But we usually have other goals. 391 00:22:15,959 --> 00:22:19,083 I mean, reconnaissance is usually the foremost thing, right? 392 00:22:19,083 --> 00:22:20,250 You want to learn something 393 00:22:20,250 --> 00:22:23,999 about the peer-to-peer botnet and the infected machines. 394 00:22:24,250 --> 00:22:26,542 I've already mentioned size estimation. 395 00:22:26,542 --> 00:22:30,334 If you talk to the press, they really like high numbers. 396 00:22:30,334 --> 00:22:32,999 So if you tell them, you know, it's 10 million infected machines large, 397 00:22:32,999 --> 00:22:35,918 they will love that, but next time you have to tell them 398 00:22:35,918 --> 00:22:38,250 the botnet is 15 million infected machines large 399 00:22:38,250 --> 00:22:39,999 or something. 400 00:22:40,334 --> 00:22:43,459 So, yes, size estimation is one thing, but you have 401 00:22:43,459 --> 00:22:48,876 to be aware that you can only crawl a subset of the infected machines. 402 00:22:48,876 --> 00:22:51,501 Most of them, obviously, are, you know, behind NAT, behind a gateway; 403 00:22:51,501 --> 00:22:53,999 you can't directly talk to them. 404 00:22:53,999 --> 00:22:55,584 You can't directly reach them from the internet, right? They're part 405 00:22:55,584 --> 00:22:57,626 of the peer-to-peer botnet. 406 00:22:57,709 --> 00:22:59,792 They're like leaves in the graph. 407 00:23:02,584 --> 00:23:04,501 It's not trivial.
408 00:23:04,501 --> 00:23:06,542 If you do what we did for peer-to-peer Zeus 409 00:23:06,542 --> 00:23:09,999 and you get a number from the green line, the peers that we can talk 410 00:23:09,999 --> 00:23:13,999 to, you have to extrapolate from that number to get 411 00:23:13,999 --> 00:23:17,417 to a more realistic size estimation. 412 00:23:17,709 --> 00:23:20,501 Infection tracking is something that people are doing who 413 00:23:20,501 --> 00:23:23,999 want to remediate or, you know, kill these botnets. 414 00:23:23,999 --> 00:23:26,083 They want to learn about infected machines 415 00:23:26,083 --> 00:23:30,125 and then can report the IP addresses to, let's say, ISPs to pass 416 00:23:30,125 --> 00:23:33,999 the information on to their customers and hopefully they clean 417 00:23:33,999 --> 00:23:36,501 up the infection so the botnet dies 418 00:23:36,501 --> 00:23:40,834 off, but I've never really seen that be successful. 419 00:23:41,083 --> 00:23:43,209 Geographic distribution is something you can also get 420 00:23:43,209 --> 00:23:44,709 from that. 421 00:23:44,709 --> 00:23:48,250 If you have all the IP addresses you can do geolocation lookups and then, 422 00:23:48,250 --> 00:23:52,459 if you want to, plot them on the map like what we did here, and I want 423 00:23:52,459 --> 00:23:57,959 to mention Mark and some other guys who created the code we base this on. 424 00:23:57,999 --> 00:24:00,626 This is actually a live thing so we -- we send 425 00:24:00,626 --> 00:24:06,792 in a live feed of the crawling results and place these nice little red dots. 426 00:24:07,125 --> 00:24:08,125 Okay.
427 00:24:08,125 --> 00:24:09,667 But what we're usually after -- we want 428 00:24:09,667 --> 00:24:13,999 to attack peer-to-peer botnets so -- I mean, if you can, for example -- 429 00:24:13,999 --> 00:24:16,834 if you know all the nodes you might want to try 430 00:24:16,834 --> 00:24:20,375 to send them commands yourself if you understand the command 431 00:24:20,375 --> 00:24:22,709 and control protocol. 432 00:24:23,125 --> 00:24:26,709 There are sometimes interesting commands. If you can send 433 00:24:26,709 --> 00:24:30,167 an uninstall command to all the bots that we identified, 434 00:24:30,167 --> 00:24:33,999 and they're the ones you can talk to so it's the backbone 435 00:24:33,999 --> 00:24:39,417 of the whole graph, so to speak, then you can kill the botnet entirely. 436 00:24:39,501 --> 00:24:41,542 Or if you can, I don't know, send requests 437 00:24:41,542 --> 00:24:44,999 for more information about the infected machines, you can, 438 00:24:44,999 --> 00:24:48,999 for example, get information about the operating system version 439 00:24:48,999 --> 00:24:50,876 or other stuff. 440 00:24:50,918 --> 00:24:54,125 So that's usually interesting as well, but you can also probably manipulate 441 00:24:54,125 --> 00:24:56,959 the peer-to-peer infrastructure. 442 00:24:56,959 --> 00:24:58,999 So think about it. 443 00:24:58,999 --> 00:25:02,209 If you can generate your own peer lists and then propagate them 444 00:25:02,209 --> 00:25:06,375 into the peer-to-peer network you can create edges. 445 00:25:06,375 --> 00:25:08,999 You can kill other edges by replacing them and so on. 446 00:25:08,999 --> 00:25:11,792 So you can basically, you know, tamper with that infrastructure 447 00:25:11,792 --> 00:25:15,542 and we will talk more about that in a little bit.
448 00:25:16,167 --> 00:25:19,417 Ideally, I mean, you might be able to sinkhole the whole thing 449 00:25:19,417 --> 00:25:23,250 by replacing all the legitimate entries in the peer list with your own ones 450 00:25:23,250 --> 00:25:27,125 and by that have all peers talking to your own machines. 451 00:25:27,876 --> 00:25:30,751 Which means nobody else has access to them anymore. 452 00:25:34,999 --> 00:25:39,709 If you think about crawling strategies, you might ask yourself, do I want 453 00:25:39,709 --> 00:25:43,959 to implement a depth-first search or a BFS, but it doesn't really matter, 454 00:25:43,959 --> 00:25:48,999 at least that's what we think, because first off it's not a tree. 455 00:25:48,999 --> 00:25:50,125 It's a graph. 456 00:25:50,375 --> 00:25:52,292 I mean, you can't really distinguish the two strategies 457 00:25:52,292 --> 00:25:56,167 and it doesn't really matter because it's dynamic. 458 00:25:58,209 --> 00:26:00,959 It doesn't really matter which nodes you start 459 00:26:00,959 --> 00:26:04,083 with and which nodes you continue with. 460 00:26:04,250 --> 00:26:06,876 At some point if you're quick enough, fast enough you 461 00:26:06,876 --> 00:26:09,501 will hopefully be able to learn the biggest part 462 00:26:09,501 --> 00:26:11,999 of the reachable machines. 463 00:26:14,459 --> 00:26:18,501 If you track the infected machines, you need to be able to distinguish, 464 00:26:18,501 --> 00:26:22,876 have I seen that IP address, have I seen that peer before? 465 00:26:22,876 --> 00:26:24,417 Do I want to include it in my list? 466 00:26:25,959 --> 00:26:27,834 Or is it a new one?
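As a rough illustration of why the visiting order matters little, here is a minimal crawl-loop sketch in Python. The toy `PEER_LISTS` graph and the `request_peer_list` helper are hypothetical stand-ins for a real protocol exchange and are not taken from the talk; the only point is that a set of known peers decides whether an entry is new, regardless of whether the frontier is processed breadth-first or depth-first.

```python
from collections import deque

# Hypothetical toy network: each reachable peer returns the peers it knows.
PEER_LISTS = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["d"],
    "d": ["a", "e"],  # "e" shows up in a peer list but never answers
}


def request_peer_list(peer):
    """Stand-in for a real peer-list request; None means no response."""
    return PEER_LISTS.get(peer)


def crawl(seeds, depth_first=False):
    known = set(seeds)   # every peer seen in any peer list
    active = set()       # peers that actually answered
    frontier = deque(seeds)
    while frontier:
        # The only difference between the two strategies is which end we pop.
        peer = frontier.pop() if depth_first else frontier.popleft()
        reply = request_peer_list(peer)
        if reply is None:
            continue
        active.add(peer)
        for entry in reply:
            if entry not in known:  # have I seen that peer before?
                known.add(entry)
                frontier.append(entry)
    return known, active
```

On a static toy graph like this one, both orders end with the same known and active sets; on a live network the result depends mostly on how fast the crawl is relative to churn.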
467 00:26:27,918 --> 00:26:31,417 If you rely on the IP addresses only, that's a bit of a problem 468 00:26:31,417 --> 00:26:35,459 because I've already mentioned there's a lot of IP churn and, you know, 469 00:26:35,459 --> 00:26:38,751 IP addresses that change after 24 hours and if you happen 470 00:26:38,751 --> 00:26:42,375 to crawl a peer or contact a peer and then the IP address changes 471 00:26:42,375 --> 00:26:45,999 and you contact it again, you count it twice. 472 00:26:45,999 --> 00:26:49,999 You want to avoid that or you get skewed numbers. 473 00:26:50,083 --> 00:26:52,751 Some peer-to-peer protocols are nice. 474 00:26:52,751 --> 00:26:56,959 They implement unique IDs, especially the ones that implement overlay networks 475 00:26:56,959 --> 00:27:01,083 because you need them for routing and if you have that, well, 476 00:27:01,083 --> 00:27:04,999 then you can have more accurate numbers. 477 00:27:05,167 --> 00:27:08,083 Wow, you just gave it to me. 478 00:27:08,542 --> 00:27:09,999 Who knew? 479 00:27:10,167 --> 00:27:16,751 So part of the Defcon experience is the best technical talks delivered 480 00:27:16,751 --> 00:27:19,584 by the top speakers. 481 00:27:19,584 --> 00:27:22,959 It's very hard to get accepted to give a talk here. 482 00:27:22,999 --> 00:27:26,292 You all should consider what you're doing to maybe become a speaker 483 00:27:26,292 --> 00:27:27,999 at some point. 484 00:27:28,292 --> 00:27:30,999 This gentleman -- this is his first time. 485 00:27:30,999 --> 00:27:32,918 Let's give him a big round of applause. 486 00:27:32,918 --> 00:27:37,999 (Applause.) So we have another tradition at Defcon, 487 00:27:37,999 --> 00:27:45,375 typically first time speakers do a shot on stage. 488 00:27:45,959 --> 00:27:48,542 So cheers. 489 00:27:48,999 --> 00:27:52,292 All right. 490 00:27:52,876 --> 00:27:59,375 We've had to do this all day. 491 00:27:59,375 --> 00:28:01,250 (Applause.)
And now we'll see if he can pick up the talk and start 492 00:28:01,250 --> 00:28:03,083 off where he left off. 493 00:28:03,083 --> 00:28:06,999 And I know some of you have seen this many times. 494 00:28:06,999 --> 00:28:08,083 I'm not going to make him do that entire speech 495 00:28:08,083 --> 00:28:09,584 next time. 496 00:28:14,083 --> 00:28:16,667 (Laughing.) TILLMANN WERNER: Okay. 497 00:28:16,918 --> 00:28:18,250 Thank you. 498 00:28:20,083 --> 00:28:21,375 Okay. 499 00:28:21,375 --> 00:28:24,250 (Laughing.) TILLMANN WERNER: I think I need one more to nullify 500 00:28:24,250 --> 00:28:26,209 the previous one. 501 00:28:28,959 --> 00:28:37,375 (Laughing.) (Applause.) There you go. 502 00:28:38,626 --> 00:28:40,209 Now you'll be better. 503 00:28:40,209 --> 00:28:47,709 (Applause.) Good job. 504 00:28:47,709 --> 00:28:50,626 TILLMANN WERNER: Let's finish this before it kicks in. 505 00:28:52,250 --> 00:28:54,999 You're done with the crawling when this curve 506 00:28:54,999 --> 00:28:56,999 converges because you don't learn 507 00:28:56,999 --> 00:28:59,834 about any new peers anymore. 508 00:29:00,667 --> 00:29:03,999 And if there are some changes, then it's due to churn. 509 00:29:04,417 --> 00:29:08,584 So what you see here is an analysis of the convergence 510 00:29:08,584 --> 00:29:12,667 for the peer-to-peer botnets we crawled. 511 00:29:12,709 --> 00:29:13,999 I hope you can read that. 512 00:29:13,999 --> 00:29:15,501 I realize it's rather small. 513 00:29:15,959 --> 00:29:19,292 But on the left-hand side you see curves similar to the one we had 514 00:29:19,292 --> 00:29:21,459 on the previous slide. 515 00:29:21,459 --> 00:29:24,334 Like the actual number of machines that we identified, 516 00:29:24,334 --> 00:29:28,751 and you can see -- I mean, it depends on the size of the botnet, 517 00:29:28,751 --> 00:29:34,209 of course, the upper curves are ZeroAccess, which is pretty large.
518 00:29:34,209 --> 00:29:37,167 You get way more hits and the ones on the bottom are way more -- let 519 00:29:37,167 --> 00:29:38,667 me see. 520 00:29:39,542 --> 00:29:42,250 So that's a botnet called Sality that I haven't 521 00:29:42,250 --> 00:29:46,459 looked into myself but a friend of mine has and he's provided these numbers so 522 00:29:46,459 --> 00:29:50,667 you can see depending on the size of the botnet the scale is different 523 00:29:50,667 --> 00:29:54,334 but the shape is more or less the same, right? 524 00:29:54,334 --> 00:29:58,542 So you can see that all of them kind of converge to a straight line, 525 00:29:58,542 --> 00:30:02,626 and then you know you're more or less done. 526 00:30:02,626 --> 00:30:05,042 You can also take a look at the population increase or -- yeah, 527 00:30:05,042 --> 00:30:09,125 increase in percent and that's displayed on the right-hand side which basically 528 00:30:09,125 --> 00:30:12,167 correlates with the other graphs; right? 529 00:30:16,250 --> 00:30:19,334 Yes, so -- oh, by the way I mentioned I'm going 530 00:30:19,334 --> 00:30:23,042 to release some code after this presentation. 531 00:30:23,501 --> 00:30:28,792 We figured, whenever we want to crawl a peer-to-peer botnet, 532 00:30:28,792 --> 00:30:36,042 let's write some basic code that we can add the protocol implementation to, 533 00:30:36,042 --> 00:30:42,209 do it right once and add the changing stuff to that. 534 00:30:42,542 --> 00:30:46,626 And I'm going to release that as open source later on. 535 00:30:46,626 --> 00:30:51,209 (Clears throat.) TILLMANN WERNER: So, yeah. 536 00:30:51,209 --> 00:30:53,375 So how do you distinguish peers? 537 00:30:53,459 --> 00:30:57,709 You have IP addresses versus IDs.
538 00:30:57,918 --> 00:31:01,209 In the case where you don't have IDs, 539 00:31:01,209 --> 00:31:04,292 you can still derive some, you know, conclusions 540 00:31:04,292 --> 00:31:07,999 from other cases where IDs are available. 541 00:31:07,999 --> 00:31:10,083 And what you see here -- I mean, I'm cheating a little bit here 542 00:31:10,083 --> 00:31:13,334 because these graphs are not generated by crawling. 543 00:31:13,375 --> 00:31:20,292 This botnet is actually Kelihos C, which was attacked earlier this year. 544 00:31:20,709 --> 00:31:24,083 These numbers are not generated by crawling the botnet, 545 00:31:24,083 --> 00:31:28,167 but in this case we did node injection and we propagated an entry 546 00:31:28,167 --> 00:31:32,083 into the peer-to-peer network and then it became very prominent 547 00:31:32,083 --> 00:31:35,999 and then all the other peers reached out to that machine and 548 00:31:35,999 --> 00:31:39,876 by this you even get the ones that are not directly reachable 549 00:31:39,876 --> 00:31:43,167 because at some point the entries propagate through NAT 550 00:31:43,167 --> 00:31:47,584 and gateways and this gives you way more accurate numbers and it allows you 551 00:31:47,584 --> 00:31:50,459 to compare the IP count to the ID count. 552 00:31:51,459 --> 00:31:56,209 So green is the total number of bots, the total number 553 00:31:56,209 --> 00:32:02,292 of unique IDs, and you can see the IP count still goes up even though we have seen all -- 554 00:32:02,292 --> 00:32:08,292 or almost all unique IDs so the -- so the slope, or whatever it's called, 555 00:32:08,292 --> 00:32:12,125 is much lower for the green line. 556 00:32:13,083 --> 00:32:17,250 And that's actually very similar so the ratio between the two after, say, 557 00:32:17,250 --> 00:32:19,542 24 hours or 48 hours is almost the same 558 00:32:19,542 --> 00:32:22,751 for all botnets we've taken a look at.
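To make the comparison between the IP count and the ID count concrete, here is a small Python sketch. The `(unique_id, ip_address)` observation format and the sample values are assumptions made up for illustration (the addresses come from documentation ranges); no real measurement data is reproduced here.

```python
def count_bots(observations):
    """Count distinct IDs and distinct IPs in (unique_id, ip_address) pairs."""
    ids = set()
    ips = set()
    for unique_id, ip in observations:
        ids.add(unique_id)
        ips.add(ip)
    return len(ids), len(ips)


# Hypothetical observations: the bot "id-1" shows up under two addresses
# because its IP address changed between contacts.
observations = [
    ("id-1", "192.0.2.10"),
    ("id-2", "192.0.2.11"),
    ("id-1", "198.51.100.7"),
]

id_count, ip_count = count_bots(observations)
ratio = ip_count / id_count  # how much counting by IP overestimates here
```

Counting by IP alone would report three bots in this toy input, while the IDs show there are only two; a ratio measured on a botnet that has IDs is what lets you correct IP-only counts elsewhere.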
559 00:32:22,918 --> 00:32:25,542 I mean, we have a paper out on that where you can take a look 560 00:32:25,542 --> 00:32:29,542 at all the numbers, but I'm not going to cover that here. 561 00:32:29,542 --> 00:32:31,417 So you can see after 24 hours that's where 562 00:32:31,417 --> 00:32:34,792 the two lines cross so even if you don't have unique IDs, 563 00:32:34,792 --> 00:32:38,083 you can say I take a look at the IP addresses I can collect 564 00:32:38,083 --> 00:32:42,918 in 24 hours and that probably gives me pretty accurate numbers. 565 00:32:45,125 --> 00:32:46,417 Yeah. 566 00:32:46,417 --> 00:32:48,959 I already mentioned speed, speed is important. 567 00:32:48,959 --> 00:32:54,209 You want to be as quick -- as fast as possible, but being fast is not easy. 568 00:32:54,209 --> 00:32:55,459 I mean, if the protocol is UDP-based it's 569 00:32:55,459 --> 00:32:58,083 a little bit easier because you don't have to worry 570 00:32:58,083 --> 00:33:01,999 about session establishment and so on and timeouts. 571 00:33:02,167 --> 00:33:06,250 Actually, I didn't get to finish the UDP code. 572 00:33:06,584 --> 00:33:09,999 Most of these botnets use UDP for a reason. 573 00:33:10,083 --> 00:33:11,459 The overhead is less. 574 00:33:11,792 --> 00:33:14,250 But I didn't get to finish the crawl template code 575 00:33:14,250 --> 00:33:18,792 for UDP so that's left as an exercise for you or you wait until I'm done 576 00:33:18,792 --> 00:33:21,125 and check into the repo. 577 00:33:21,542 --> 00:33:23,125 But UDP. 578 00:33:23,584 --> 00:33:27,125 Usually people have two threads, one that sends out information 579 00:33:27,125 --> 00:33:30,626 and one that consumes incoming messages. 580 00:33:31,083 --> 00:33:34,459 If you do that and many bots work that way, actually, most 581 00:33:34,459 --> 00:33:37,417 of the UDP ones we have seen.
582 00:33:37,417 --> 00:33:40,626 If you do that, you have to worry about synchronization so you have to, 583 00:33:40,626 --> 00:33:43,417 you know, have a peer list that you lock when you want 584 00:33:43,417 --> 00:33:46,626 to send out stuff or select a peer that you want to send data 585 00:33:46,626 --> 00:33:49,709 to or when you receive data, you also probably want to lock 586 00:33:49,709 --> 00:33:52,125 the peer list so you have to synchronize the two, 587 00:33:52,125 --> 00:33:54,999 so we usually use a main loop and just a single thread 588 00:33:54,999 --> 00:33:56,999 because it's faster. 589 00:33:56,999 --> 00:34:00,999 The code is a little bit more complex but, yeah. 590 00:34:02,083 --> 00:34:06,999 When you're talking TCP, it's -- yeah, a little bit more difficult. 591 00:34:06,999 --> 00:34:08,584 You have to establish TCP connections 592 00:34:08,584 --> 00:34:13,083 and worry about timeouts because you don't want to get DoSed. 593 00:34:13,167 --> 00:34:15,459 If you don't worry about all these things and you crawl 594 00:34:15,459 --> 00:34:18,083 the network, they might, like, create half-open 595 00:34:18,083 --> 00:34:21,167 connections and not respond to you or keep connections open 596 00:34:21,167 --> 00:34:23,876 forever that are established and then you're running 597 00:34:23,876 --> 00:34:27,999 out of file descriptors and your crawling doesn't work anymore. 598 00:34:28,167 --> 00:34:30,667 You probably want to have a limited set of file descriptors 599 00:34:30,667 --> 00:34:33,459 or sessions that you want to handle.
600 00:34:33,459 --> 00:34:37,792 So what the code does that I'm going to share publicly is it allocates 601 00:34:37,792 --> 00:34:41,292 a fixed number of slots for sessions and -- I mean, 602 00:34:41,292 --> 00:34:43,999 that's the number of simultaneous sessions 603 00:34:43,999 --> 00:34:48,167 the code can handle and, you know, when it wants to contact a new peer, 604 00:34:48,167 --> 00:34:51,876 it takes the next free slot in that array. 605 00:34:52,334 --> 00:34:55,501 So by that you make sure your crawler doesn't get DoSed. 606 00:34:57,125 --> 00:34:59,834 Yeah, I talked about timeouts already. 607 00:35:01,959 --> 00:35:06,542 Another thing is if you talk to a peer, then you can -- I mean, 608 00:35:06,542 --> 00:35:09,375 definitely say it's live. 609 00:35:09,375 --> 00:35:10,542 It definitely exists. 610 00:35:10,542 --> 00:35:11,542 Thank you. 611 00:35:13,792 --> 00:35:16,584 The question is how long do you want to keep it 612 00:35:16,584 --> 00:35:18,751 in your peer list flagged as active 613 00:35:18,751 --> 00:35:21,999 because as I've said previously you want to distinguish 614 00:35:21,999 --> 00:35:25,083 between IP addresses or peers that you encountered and 615 00:35:25,083 --> 00:35:29,209 the ones that you actually talked to that are alive. 616 00:35:29,999 --> 00:35:33,083 If you talked to a peer and it's live, how long do you want 617 00:35:33,083 --> 00:35:36,876 to consider it live, so that's another thing. 618 00:35:36,876 --> 00:35:39,584 I mean, do you want to consider it live for 24 hours or only 3 minutes 619 00:35:39,584 --> 00:35:42,083 or do you want to periodically recontact it and 620 00:35:42,083 --> 00:35:46,626 if it doesn't respond anymore then you say it's not live anymore. 621 00:35:46,999 --> 00:35:49,999 So these are parameters that are really, really important.
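The two ideas above, a fixed number of session slots and a tunable window for how long a peer counts as live, can be sketched roughly as follows in Python. This is not the released code; the `SessionSlots` class, the `is_active` helper and the 180-second `ACTIVE_WINDOW` default are hypothetical names and values chosen only for illustration.

```python
class SessionSlots:
    """Fixed-size table that caps the number of simultaneous sessions."""

    def __init__(self, size):
        self.slots = [None] * size

    def acquire(self, peer):
        """Put peer into the next free slot; return its index or None if full."""
        for index, occupant in enumerate(self.slots):
            if occupant is None:
                self.slots[index] = peer
                return index
        return None  # all slots busy, the caller has to wait

    def release(self, index):
        """Free a slot after the session finished or timed out."""
        self.slots[index] = None


ACTIVE_WINDOW = 180.0  # seconds; hypothetical default, tune per botnet


def is_active(last_response, now, window=ACTIVE_WINDOW):
    """A peer counts as live only if it answered within the window."""
    return last_response is not None and now - last_response <= window
```

Because the table never grows, a flood of slow or half-open sessions can at worst occupy every slot, and a per-session timeout that calls `release` hands them back; the window value is the parameter that shifts the active-peer count up or down.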
622 00:35:49,999 --> 00:35:51,999 I mean, it might not sound like that but they're really important 623 00:35:51,999 --> 00:35:55,876 and you might want to -- you want to tune them for the specific botnet that 624 00:35:55,876 --> 00:35:58,876 you're crawling to get accurate numbers. 625 00:35:59,375 --> 00:36:00,626 Yeah. 626 00:36:00,626 --> 00:36:04,542 Also, especially when you're talking UDP, I mean, you can send out lots 627 00:36:04,542 --> 00:36:07,083 of UDP packets at a time. 628 00:36:07,209 --> 00:36:09,999 And if you fill up your own line, your own pipe 629 00:36:09,999 --> 00:36:13,709 with UDP packets you will have packet loss sometimes 630 00:36:13,709 --> 00:36:16,584 and you get funny results. 631 00:36:16,999 --> 00:36:20,250 You either get a bigger -- bigger line, bigger bandwidth or slow 632 00:36:20,250 --> 00:36:24,209 down a little bit so you want to have a parameter that allows you 633 00:36:24,209 --> 00:36:27,751 to slow down the whole crawling process. 634 00:36:29,375 --> 00:36:32,751 So Prowler is the name of the tool that we're going 635 00:36:32,751 --> 00:36:34,876 to release today. 636 00:36:35,209 --> 00:36:40,999 As I've said, it just implements the crawling framework, so to speak. 637 00:36:40,999 --> 00:36:44,626 And you have to add the protocol implementation yourself. 638 00:36:44,626 --> 00:36:46,417 It provides some stub functions that get 639 00:36:46,417 --> 00:36:50,584 called and that's where you have to implement the protocols. 640 00:36:50,584 --> 00:36:52,626 So if you want to check it out, please do. 641 00:36:53,792 --> 00:36:56,999 As I've said, it's only TCP for now. 642 00:36:57,375 --> 00:37:01,334 Yeah, and you can see what it looks like at the bottom of the slide. 643 00:37:01,459 --> 00:37:06,834 You can even see that it distinguishes between known peers and active peers.
644 00:37:06,999 --> 00:37:10,999 If you take a look at the last two lines, you can see 645 00:37:10,999 --> 00:37:15,083 the number of active peers goes down from 719 to 717 and that 646 00:37:15,083 --> 00:37:19,999 is because, you know, after some time, some peers don't respond anymore so 647 00:37:19,999 --> 00:37:25,459 they're not considered active anymore and get flagged as inactive. 648 00:37:25,834 --> 00:37:28,999 And in that case, we were crawling Kelihos C 649 00:37:28,999 --> 00:37:31,999 and that was in February. 650 00:37:33,667 --> 00:37:37,834 The peer list I started with only contained two entries. 651 00:37:37,834 --> 00:37:39,584 You see that on the right-hand side. 652 00:37:42,792 --> 00:37:47,999 And Kelihos always shares 250 entries and, if you take a look 653 00:37:47,999 --> 00:37:51,999 at the first line, that is why it immediately goes 654 00:37:51,999 --> 00:37:54,999 up to 250 known peers; right? 655 00:37:54,999 --> 00:37:58,999 It contacts one peer and loads 250 entries and it knows 250 peers 656 00:37:58,999 --> 00:38:03,542 immediately and then it continues from there. 657 00:38:03,542 --> 00:38:06,584 But if you take a look at the two graphs, again, the green line 658 00:38:06,584 --> 00:38:11,167 is active peers that it can talk to and the red line is peers that I have seen 659 00:38:11,167 --> 00:38:12,999 in peer lists. 660 00:38:12,999 --> 00:38:15,209 You can see that the green line gets constant 661 00:38:15,209 --> 00:38:16,999 very quickly. 662 00:38:16,999 --> 00:38:20,999 So it converges really quickly and, you know, somewhere in the range of, 663 00:38:20,999 --> 00:38:24,083 I don't know, what is that, 700? 664 00:38:24,292 --> 00:38:26,626 Yeah, that's in line with the numbers below.
665 00:38:26,834 --> 00:38:30,959 And that is because Kelihos configures with other peers, 666 00:38:30,959 --> 00:38:34,751 more recent peers so they have this backbone of what 667 00:38:34,751 --> 00:38:39,292 they call router nodes and there are never more than in the range 668 00:38:39,292 --> 00:38:45,999 of 700, that's why we'll never be able to talk to more than around 700 peers at a time. 669 00:38:45,999 --> 00:38:47,959 And you can also see these, I don't know, 670 00:38:47,959 --> 00:38:51,792 steps or whatever you want to call them in the red curve and that 671 00:38:51,792 --> 00:38:54,792 is because if new peers come online, they propagate 672 00:38:54,792 --> 00:38:58,125 in the peer-to-peer network and become active at some point 673 00:38:58,125 --> 00:39:02,083 and then you get some steps because immediately it gets propagated 674 00:39:02,083 --> 00:39:06,999 to all peers that are online and that's what causes this effect. 675 00:39:08,209 --> 00:39:09,584 Okay. 676 00:39:09,584 --> 00:39:11,918 So I'm almost done here. 677 00:39:11,959 --> 00:39:15,083 This is the Git repository where you can check out the code. 678 00:39:15,083 --> 00:39:18,959 As I said, I will hopefully add a UDP version soon. 679 00:39:22,792 --> 00:39:25,792 And I've checked in that version like one hour 680 00:39:25,792 --> 00:39:29,417 before the talk so there might be some bugs in there, but I -- 681 00:39:29,417 --> 00:39:33,999 if you tell me that there's something buggy, I will fix it or you can fix it 682 00:39:33,999 --> 00:39:36,709 yourself and send me a patch. 683 00:39:36,999 --> 00:39:39,999 But I also want to talk about the alternative that we already 684 00:39:39,999 --> 00:39:42,999 touched on briefly which is node injection.
685 00:39:43,167 --> 00:39:46,292 As I've said, by crawling you will never be able to reach 686 00:39:46,292 --> 00:39:49,459 the peers that are behind gateways and so forth 687 00:39:49,459 --> 00:39:51,999 and so on so you can actively participate 688 00:39:51,999 --> 00:39:55,999 in the peer-to-peer network as an alternative and propagate your 689 00:39:55,999 --> 00:39:59,083 own IP addresses, and then at some point depending 690 00:39:59,083 --> 00:40:02,876 on the popularity of your node, the other peers will reach 691 00:40:02,876 --> 00:40:07,918 out to you and, you know, say, take me down or sending commands. 692 00:40:07,918 --> 00:40:10,501 So, yeah. 693 00:40:11,083 --> 00:40:14,584 And that's actually a comparison here 694 00:40:14,584 --> 00:40:20,792 between tracking based on sensor injection and crawling. 695 00:40:20,792 --> 00:40:22,999 So you can see the top two lines are again -- this 696 00:40:22,999 --> 00:40:27,459 is peer-to-peer Zeus so we have IDs, unique IDs and IP addresses so we 697 00:40:27,459 --> 00:40:30,125 distinguish between the number of unique IDs 698 00:40:30,125 --> 00:40:32,918 and IP addresses, of course, the number 699 00:40:32,918 --> 00:40:37,042 of IP addresses is much higher and the top two lines are what we 700 00:40:37,042 --> 00:40:39,542 achieved through sensor injection and 701 00:40:39,542 --> 00:40:43,999 the other lines are what we achieved through crawling. 702 00:40:44,876 --> 00:40:47,417 And the bottom lines are the active IP addresses or 703 00:40:47,417 --> 00:40:50,999 the active peers that we can talk to so you can see it's much less than 704 00:40:50,999 --> 00:40:54,042 the peers that show up in the peer list. 705 00:40:55,501 --> 00:40:56,792 Okay. 706 00:40:56,792 --> 00:40:58,999 That's basically my presentation. 707 00:40:58,999 --> 00:40:59,876 I want to give shout-outs to some people here 708 00:40:59,876 --> 00:41:01,876 because they're awesome.
709 00:41:03,083 --> 00:41:08,584 And did some of the work with me here and deserve credit for it. 710 00:41:08,959 --> 00:41:09,999 And that's it. 711 00:41:10,083 --> 00:41:12,501 I think we have a few more minutes left, maybe 3 or so, so 712 00:41:12,501 --> 00:41:15,167 if you have any questions, you can ask me now or hunt me 713 00:41:15,167 --> 00:41:17,417 down at the bar later on. 714 00:41:17,834 --> 00:41:19,083 Thank you.