TOM RITTER: Hello and welcome. My name is Tom Ritter and I work for iSEC Partners. If you don't know who Zax and dizum are, you will know by the end of the talk. This is an anonymity group. This book, which many of you would call the Bible, had not even come out yet; just the first edition. And while you could export the book itself, the U.S. government had determined you could not export the floppy disk that the code had come on. In fact, the U.S. was actively investigating Phil Zimmermann for violating the Arms Export Control Act for making the first few versions of PGP available. One group went on the offensive, taking the U.S. government to court and suing over the export controls on crypto. Another group of people ultimately printed out the source code for PGP, exported the book to Europe, scanned it in and OCRed it, releasing a version that bypassed the export controls. Alt.Anonymous.Messages was forged in the heyday of the cypherpunks and has changed very little in the intervening decade since it was last shaped in any major way.

But in that decade, what we have seen is a monumental focus of the nation's spy agencies not on what was thought to be the most critical piece of information to encrypt, the content itself, but rather on metadata. The people who know won't talk, and the people who talk don't know. But leaked court orders require Verizon to turn over call records, local and abroad. Now, I'm talking here, so I don't know anything and I'm just speculating. But the most straightforward thing to do with this data is to build communication graphs: analyze the metadata, look for patterns, identify people of interest and figure out who they talk to. And the metadata around an encrypted channel tells volumes.

So SSL is the most widely used encrypted channel on the Internet today. And even ignoring the numerous attacks we've seen in the past few years, and ignoring how it breaks almost every cryptographic principle, there are numerous things you can learn. There are protocol-level leaks in the protocol itself: it says something about the type of client you're using and the version, and it includes what you think the local time is. So here's hoping your clocks are synced.

But from an information-theoretic perspective, an adversary can see that you are sending packets and communicating. That seems obvious; of course they know you are communicating. It is important to bear in mind for the future. Ideally, the adversary wouldn't even know that you are communicating.
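To make those protocol-level leaks concrete, here is a minimal sketch of pulling the advertised version and the client's clock out of a captured ClientHello. The byte offsets follow the classic TLS record layout; this is illustrative only (it is not a tool from the talk), and many modern clients now randomize the time field.

    import struct
    from datetime import datetime, timezone

    def parse_client_hello(record: bytes):
        # TLS record header: content type (1 byte), version (2), length (2).
        if record[0] != 0x16:
            raise ValueError("not a handshake record")
        # Handshake header: type (1 byte, 0x01 = ClientHello), length (3).
        if record[5] != 0x01:
            raise ValueError("not a ClientHello")
        client_version = (record[9], record[10])            # e.g. (3, 3) = TLS 1.2
        # The first 4 bytes of the Random field were traditionally gmt_unix_time.
        gmt_unix_time, = struct.unpack(">I", record[11:15])
        return client_version, datetime.fromtimestamp(gmt_unix_time, tz=timezone.utc)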
Secondly, SSL makes no attempt at hiding who you are talking to. So the fact that you are talking to Facebook is straightforward to see. Similarly, the adversary knows when you are on Facebook, when you are sending data and when you are receiving data, and the resolution on this goes down to the microsecond. So they know exactly when, but they also know exactly how much data you receive. SSL doesn't have any real padding, and I don't know of any website that adds variable-length padding to frustrate length analysis.

So how many of you stayed through Runa's talk? A few. Thank you. Let's talk about Tor. Tor is an implementation of onion routing where each node peels off a layer of encryption until an exit node talks to the destination, the destination responds, and it is routed back. Onion routing specifically aims to disguise who is talking. An adversary observing you can't see that you are talking to a website or a service. An adversary observing that website or service can't see who is talking to it. But it doesn't stop an adversary from knowing you're talking to someone, knowing when you're talking, and how much you're saying. Tor doesn't really do padding; what little it does is not intended to be a security feature. Tor explicitly leaves out length padding.

And if you stayed through Runa's talk, you know Tor cannot protect you if an adversary can see the entire path of a circuit. Let's say, hypothetically speaking, that New Zealand, Australia, the U.S., Canada and the U.K. were to, say, conspire on some sort of spy program.

(laughter)

Well, if your circuit went through these countries, Tor can't help you, at least not information-theoretically. The adversary can track the traffic and find out who you are talking to. I'm not saying this is actively happening. I'm saying we've proved in papers that it's possible and that it's explicitly outside of Tor's threat model.

A slightly more difficult version of that attack is if the adversary can see you and then see the last leg of your path later on. Say you're in China visiting a Chinese website: they can do a similar trick and track you down. It requires a little bit more math, a little bit more correlation, but again we've proved that it's possible and it is, again, outside of Tor's threat model. This is particularly disconcerting seeing as I, like probably most of you, happen to live in the U.S., and so much of what we do is hosted in Amazon EC2 in Virginia. The adversary can tell who you are talking to, so we are back at SSL.
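As a rough illustration of the traffic-confirmation idea just described (not anything from the talk's tooling), an adversary who can timestamp packets at both ends of a suspected path can bucket each side into time bins and ask how well the two series line up; numpy and a Pearson correlation are enough for a sketch.

    import numpy as np

    def confirmation_score(entry_times, exit_times, bin_seconds=1.0):
        # Bucket packet timestamps from each observation point into equal-width
        # bins; flows that are really the same traffic tend to rise and fall
        # together even after re-packetizing.
        start = min(min(entry_times), min(exit_times))
        stop = max(max(entry_times), max(exit_times)) + bin_seconds
        bins = np.arange(start, stop, bin_seconds)
        a, _ = np.histogram(entry_times, bins=bins)
        b, _ = np.histogram(exit_times, bins=bins)
        if a.std() == 0 or b.std() == 0:
            return 0.0
        return float(np.corrcoef(a, b)[0, 1])   # close to 1.0 means the flows match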
I think it is worthwhile to show a couple of attacks on metadata. There was a traffic analysis tool that looks at your SSL session with Google and figures out what part of Google Maps you are actually looking at, all based off the sizes of the tiles that you're downloading over SSL. It is worthwhile to note that this is an attack on a client, on someone browsing Google Maps at that moment.

Let me show the ultimate example. You are sitting on Facebook with Facebook chat enabled, all over SSL; heck, all over Tor. Facebook chat turns you into a server. You are able to receive messages from people, and they will be pushed down to you. The attacker, not you, determines when you will receive a message. That's a pretty powerful capability, and it can lead to time-based correlation attacks: an adversary sends you a message, looks at all the people connected to Facebook or Tor, and sees who's receiving a message right after that. And since Facebook chats tend to be huge, it can lead to size-based attacks. I send you a huge Facebook chat, and with only a couple of trials you can be pretty confident that the user whose Internet connection you are monitoring is the same anonymous Syrian dissident you are monitoring on Facebook.

A similar attack was used to deanonymize Jeremy Hammond, who is awaiting trial for dumping mail spools. The police staked out his home, watched him enter, saw some Tor traffic, and the username they thought was him popped on to IRC. Classic confirmation attack. I have gotten some comments that they cut his Internet connection and saw him drop off. I haven't been able to confirm that in the police logs; haven't had time. If that's true, that's another type of traffic confirmation attack that works on a low-latency connection.

Now, the good news is that even if the adversary can see the start and end nodes, or even the entire path, there is a way to disguise who you are talking to, and that's mix networks. Mix networks introduce a delay while they collect messages into a pool and then fire them all out. Collecting messages prevents an adversary who is observing the mix from knowing which message went where. It introduces uncertainty. I really like mix networks and I want to encourage research and adoption. I want to take a quick moment to demonstrate live on stage.

So right now I'm going to be a Tor node, or an onion routing node, or a low-latency anonymity network: I will receive a packet and then send it right out. Now, I'm going to play a mix node or remailer node, and I'm going to collect a packet, stick it in my bag, collect another packet, stick it in my bag, and collect another packet and stick it in my bag.
I will shuffle them up, peel off the outer layer of encryption, and now I will send them out all at once. So you, the global passive adversary who can observe my computer and see all the traffic I send and receive, you saw that I received three messages and you saw that I sent out three messages. But you don't know which message went where. That's the uncertainty.

So, as the mix network demonstration showed, we gain a certain amount of protection against figuring out who is communicating with whom. Given enough time, or a low enough traffic volume, an adversary can perform the same types of attacks I just described against Tor, correlating messages, but it takes a lot more observation. The easiest thing to learn, which takes no time or analysis, is the fact that I'm communicating. We don't disguise the "if." We also don't disguise the "when," and we also don't disguise how large it is.

So enter the shared inbox of Alt.Anonymous.Messages. That's a bit of a mouthful; I will abbreviate it to AAM. Imagine an email account where everyone in the room has the username and password, but it is read-only access. You can't delete messages. You can't send them. All the messages are encrypted, so what you do is download them all, as one of the people with access to this inbox, and then you try to decrypt each one of them with your private key. The ones that you can decrypt are to you, and the ones that you can't decrypt aren't, and you don't know who they're to.

Well, someone watching this encrypted connection, watching you accessing this mailbox and downloading all the messages, they can see that you are accessing the mailbox. That's certain. And they know you downloaded all the messages. But they don't know if you're able to decrypt any of them. And because of that, they don't know when you received a message, who it was from, or how large it was. All they know is that you're checking the mailbox, not that you're actually getting mail. At the cost of a lot of bandwidth, receiving messages via a shared inbox provides an awful lot of security, comparatively.

Now, shared mailboxes are an awesome anonymity tool, but the difference between an awesome anonymity tool and an anonymity tool that's actually used is the answer to the question: can I interact with the rest of the world? Tor is wildly successful compared to any other anonymity system because you can browse the actual Internet with it. It's not a closed system where you only interact with hidden services.
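Going back to the shared inbox for a moment, the receive side fits in a few lines: fetch every article in the group and keep whatever your own key happens to decrypt. This is only a sketch, not Zax's or anyone else's code; it assumes Python's nntplib and the python-gnupg wrapper, with the nym's secret key already in the local keyring and a hypothetical news server name.

    import nntplib          # removed in very recent Pythons, fine for a sketch
    import gnupg            # python-gnupg, wraps a local gpg binary

    def fetch_my_mail(server="news.example.org", group="alt.anonymous.messages"):
        gpg = gnupg.GPG()
        mine = []
        with nntplib.NNTP(server) as news:
            _, _, first, last, _ = news.group(group)
            for number in range(int(first), int(last) + 1):
                try:
                    _, info = news.article(str(number))
                except nntplib.NNTPError:
                    continue                      # expired or missing article
                body = b"\n".join(info.lines).decode(errors="replace")
                result = gpg.decrypt(body)
                if result.ok:                     # everyone else's mail simply fails
                    mine.append(str(result))
        return mine
        # An observer only ever sees you download the whole group.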
So for a shared mailbox to actually be used, it needs to interact with normal email, and that's where nymservs come in. The newest and easiest to use receives a message at a domain name and then just posts it immediately to Alt.Anonymous.Messages. This is the nymserv written by Zax, and it is on GitHub. And the much more complicated type 1, or Ghio, nymservs can forward the mail to another email address or directly to Alt.Anonymous.Messages, or they can even route it through a remailer network to eventually wind up in one of those two places. I will talk more about this type of nymserv later on.

So if we add nymservs to shared mailboxes, shared mailboxes also have anonymity for the recipient. When you send a message to a nym that uses a shared mailbox, you are ideally using an onion router or a mix network, although you don't have to, and thus you would have those security properties: an adversary can see that you are sending, when you send it, and how large it is.

Now that I have walked through the security properties of the different types of anonymity networks, let's actually dive into AAM. It should really have strong security; after all, it is the most theoretically secure.

If you have never looked at it before, this is what it looks like, at least in Google Groups. It is Usenet. How many people are old enough to have used Usenet? Good, good. This is what it looks like today: a whole bunch of hexadecimal subjects, all posted by Anonymous or Nobody. Each message is a PGP message that may or may not have a version string. Today there are about 190 messages posted per day. What's interesting is that while the average has certainly decreased over the last decade, it has held somewhat steady in the last five years. So the dataset that I worked off was about 1.1 million messages from the last ten years.

Now, we can already see some shortcomings here. Over half of the messages in my dataset go through two people. The network diversity is horrible. If you stayed through Runa's talk, you know that's important. If either one of these folks got subpoenaed, or shut down, or just retired, the whole network would be thrown into disarray. And to the person who asked about directory authorities in Tor: dizum is one of the directory authorities in Tor, and he is not affiliated with the Tor Project. He is just someone they trust.

Now, this looks pretty bad. It is way worse. That 53.5% statistic was over the entire dataset.
Today Zax and dizum make up virtually all of the messages posted to AAM. I don't mean that they are sending them all; I mean they are the exit node for all the messages posted to AAM. And that weird dip, that was 7,800 messages sent through Frow, which operates a remailer and a news gateway. It had a unique subject. It didn't have any unique headers. I couldn't get a whole lot out of it aside from correlating those 7,800 messages uniquely.

So with network diversity pretty clearly abolished, let's take a look at the data and see what type of analysis we can actually do. I don't think I could say anything as ironic as this quote. That's from 1994, so here we are just shy of 20 years later.

And the first thing to do is break it up by PGP versus not PGP. It is overwhelmingly PGP messages, but what are the non-PGP messages? Quickly: I was trying to come up with a nice way to say crackpots; I'm not sure if I succeeded. There are several people who have posted, and continue to post, just random rants about, I'm not even really sure. Some of it is definitely the lizard people. And there are actually frequently asked questions that have sprung up in response to these guys, because people are just getting flat-out confused by them.

And besides those, there are some other non-PGP messages. I think the most interesting are 10,000 messages with the subject "operation satanic." What's interesting is they are clearly ciphertext, but it is alphabetic. If you look at a single message, you might think it is a Caesar cipher or a Vigenère. If you look at them as a whole, you see it is a perfectly even distribution over a 16-letter alphabet. In other words, I think it is a substitution cipher into hexadecimal, and it is actually ciphertext. There are other clumps that are similar to this. If you are into this type of analysis, have at it.

And the next thing to look at is what percentage of messages were delivered to AAM via a nymserv or via a remailer. These numbers will be a little bit off, since some of the PGP or remailer messages are to nyms, and some are through remailers I don't know about. But it is something. We can see that a large portion of our messages are to nyms, which is important when I tell you how many nymservs are still running.

Somewhat interesting statistics aside, let's start diving into all of those hundreds of thousands of encrypted messages. OpenPGP consists of packets, and each packet type does something slightly different. There is a packet type for a message encrypted to a public key, and a packet type for a message encrypted to a password.
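Doing that kind of packet census just means walking the packet headers. A minimal tag walker, following the RFC 4880 length rules, looks roughly like this (my own sketch, not the tooling used for the talk; partial body lengths are left out for brevity):

    def packet_tags(data: bytes):
        # Yield the tag number of each OpenPGP packet in a binary message
        # (RFC 4880 section 4.2).
        i = 0
        while i < len(data):
            header = data[i]
            if header & 0x40:                           # new-format packet
                tag = header & 0x3f
                first = data[i + 1]
                if first < 192:
                    length, skip = first, 2
                elif first < 224:
                    length = ((first - 192) << 8) + data[i + 2] + 192
                    skip = 3
                elif first == 255:
                    length = int.from_bytes(data[i + 2:i + 6], "big")
                    skip = 6
                else:
                    raise NotImplementedError("partial body lengths")
            else:                                       # old-format packet
                tag = (header >> 2) & 0x0f
                ltype = header & 0x03
                if ltype == 3:                          # indeterminate: rest of message
                    length, skip = len(data) - i - 1, 1
                else:
                    nbytes = (1, 2, 4)[ltype]
                    length = int.from_bytes(data[i + 1:i + 1 + nbytes], "big")
                    skip = 1 + nbytes
            yield tag
            i += skip + length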
What are these packet types? These graphs show the popularity of each of the different packet type sequences, for example packet type 1 followed by packet type 9. And the top five, the ones on the bottom, are the ones you would expect to see. Packet type 1 is messages encrypted to a public key. Packet type 3 is messages encrypted to a passphrase. The actual ciphertext of a message is 9 or 18, for old style or new style. And I separated out the messages to a single public key versus messages to multiple public keys.

Now, there are two that are just kind of weird. These are the packet types you expect to see after you have decrypted a message; these are plaintext packets. There are actually a small number of messages that look like OpenPGP data, they have got the whole BEGIN PGP MESSAGE banner and they are base64'd, but they are actually just plaintext sitting in plain sight. If we look at packet type 8, this is what we get. It really is just compressed plaintext data. Unfortunately, it is also nonsense. I don't know if there is a code there or not. I didn't spend a whole lot of time on it after I looked at "Iran organizing bizarre sabbatical." It probably came out of some Markov generator somewhere. So I kind of moved on.

What I moved on to were messages sent to public keys. Now, it is super obvious to do analysis based on the public key that's in the message. I promise you, it gets a little bit more complicated later. But let's look at the key IDs. Obviously they are a pretty powerful segmenting tool, but I want to illustrate examples where public keys can tell us more. I have anonymized most of the specific data in this, because de-anonymizing people isn't cool. There is one key ID that messaged very reliably through a nymserv, except for two messages sent through Easynews. If you track down that very unique gateway and user agent, that person sent another message to a key ID, and we can make inferences across multiple types of metadata.

I separated out the messages sent to multiple keys. If a message was sent to a single key, we don't know too much about it, because they throw away the key ID, so it is all zeros. If a message is sent to more than one key, then we can draw communication graphs. Now, it's not a strict communication graph in the sense that it was sent to Alice and Bob; it is that they received the same message. In most situations, people will encrypt a message to themselves so they can read their own sent mail.
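Turning those co-recipient key IDs into pictures is a few lines with networkx; the input shape here (one list of recipient key IDs per message) is just an assumed layout for the dataset, not the actual format used for the talk.

    from itertools import combinations
    import networkx as nx

    def recipient_graph(messages):
        # messages: iterable of lists of recipient key IDs, e.g.
        # [["DEADBEEF00112233", "0000000000000000", ...], ...]
        G = nx.Graph()
        for key_ids in messages:
            # All-zero key IDs were deliberately thrown away by the sender.
            ids = sorted({k for k in key_ids if set(k) != {"0"}})
            for a, b in combinations(ids, 2):
                weight = G.get_edge_data(a, b, default={"weight": 0})["weight"]
                G.add_edge(a, b, weight=weight + 1)   # line width ~ co-occurrence count
        return G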
I started drawing these pictures about the same time as the PRISM scandal started breaking, and I was feeling really uncomfortable that this is probably what the NSA is doing to me and my friends. But nonetheless, quick reference: green means that I was able to get the public key off a key server. A circle means that a key received messages to it individually, as well as to it and multiple other people. And then the size of the circle and the width of the line is how many messages they received.

So there is this very nice symmetrical five-person graph. We've got these much larger communication networks here. A real big one here. A couple of interesting graphs with central communication hubs; you can infer from that what you want. And then we've got a couple more interesting networks. I think these are interesting because they imply that not everybody knows everybody else. This graph and the next one may really be a model of the actual Internet, where people email people in a complex, interconnected, but not fully connected way. This is a fairly low-volume network, and this one has quite a few higher-volume folks participating. And then there's, like, the rest: the simple two-person communications going on.

So I was working on the -- well, let's talk about brute forcing ciphertext. You saw packet type 9 was by far the most common packet type found; there are over 700,000 of them. Now, this packet type is really interesting, so let's dive a little bit into the OpenPGP spec. This packet is the actual ciphertext of the message. It is only the encrypted data. It doesn't say what algorithm it is, and it doesn't explain how to get the key. So where is the key? The key is in another packet: it is in packet type 1 for public keys, or packet type 3 for passphrases. But if you recall from that graph, there aren't any packets that precede packet type 9. We've got a disconnect between what the spec says and the data that we actually see, until we find this: the IDEA algorithm is used with the session key calculated as the MD5 hash of the passphrase. Yeah. MD5, and a password.

This is absolutely legacy, and we have had better ways of doing this in OpenPGP since the late '90s. So while in the very beginning of AAM this might have been excusable, the fact that my dataset was from 2003 onward makes this a pretty horrible situation. And we know how to do MD5s really, really fast. But that's only half of it: we also have to do an IDEA decryption, and then we have to detect whether what we decrypted was actual plaintext or just random data. While you can run randomness tests, they are slow, and we are brute forcing, so we want to go as fast as possible.
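A single brute-force trial, then, looks roughly like this. IDEA isn't in the usual Python crypto packages, so idea_cfb_decrypt below is a hypothetical stand-in for an implementation of OpenPGP's CFB mode; the quick check on the repeated prefix bytes is what makes a wrong guess cheap to reject.

    import hashlib

    def trial_decrypt(packet_body: bytes, passphrase: str, idea_cfb_decrypt):
        # Legacy behavior for a bare tag-9 packet (RFC 4880 5.7): with no
        # session-key packet in front of it, the cipher is IDEA and the
        # session key is literally MD5(passphrase).
        key = hashlib.md5(passphrase.encode()).digest()     # 16-byte IDEA key
        clear = idea_cfb_decrypt(key, packet_body)          # hypothetical helper
        # The random prefix repeats its last two bytes, so a wrong key is
        # almost always rejected by this 16-bit check before any slower
        # "does this look like plaintext?" tests.
        if clear[6:8] != clear[8:10]:
            return None
        return clear[10:]                                   # strip the 8+2 byte prefix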
This is all my way of trying to justify that I spent a lot of time writing GPU code, running it for months, and killing my home desktop. But I did get results out of all this GPU cracking. In fact, one of the first few dozen of the messages we got was this one, which did not --

(laughter)

(applause)

TOM RITTER: Which did not make me feel terribly good about myself.

(laughter)

But I kept going. And I got some HTML pages. I got some weird SMTP logs. I got a lot of partial remailer messages. But overwhelmingly, what I got after I decrypted a message was an encrypted message. Recursively encrypted PGP messages. And, in fact, here's a breakdown of how many recursions I hit. I got about 10,000 decryptions into a public key message, and another 2,200 that went into another password-protected message. So I went to crack those, and I got about 49 messages that were two layers deep, and then I had to crack some more of those, and I went four layers deep, and then there is this one bloody message that was four layers deep that I still couldn't crack. So it's pretty damn recursive.

For the number of messages I was trying to brute force, the fact that I only got about 10,000 cracked is not really great. Password crackers would consider that an epic failure. I'm not the best cracker; I'm sure people can do better. What I want to defend myself with is that I'm not trying to crack passwords, but passphrases chosen by the most paranoid people on the Internet. I think I did decently.

I haven't explained why there are so many recursively encrypted messages. What the hell? To explain that, I have to talk about remailers. How many have used a remailer? About two dozen. So the tools that you have probably used, Mixmaster and Mixminion, are dubbed type 2 and type 3 remailers. That means there must be a type 1 remailer somewhere, right? They're basically dead, but the protocol itself lives on in Mixmaster. And boy, what a protocol. This is a manual of how to use most, but not even all, of the options supported by type 1 remailers. Now, some of the directives are on the left. What's the difference between Remailer-To, Remix-To, Anon-To and Encrypt-To? I don't remember, and I studied this stuff for a while. To use type 1, you actually have to type all of these out yourself.
It is not like a GUI where you click a check box. I talked in the beginning about type 1 nymservs. Type 1 nymservs are the main recipients of these directives. You string together a bunch of these directives, encrypted to different nodes (you type that all out yourself, by the way), and that would be your reply block. And when someone emails your nym, it would execute the reply block, ultimately coming out to your real email address or to Alt.Anonymous.Messages. And we're still seeing these messages posted. But there are only two type 1 nymservs operating. One is Zax's. The other is Paranoici (phonetic), run by Italian hackers in Milan; you can think of them as an Italian version of Riseup, if you have ever heard of Riseup.

So, in conclusion, what are those nested PGP messages? They're type 1 nymserver messages, where the key ID is the ultimate nym owner's, or there is another layer of encryption I haven't cracked yet. When you, the nym owner, download type 1 nymserver messages, you know all of the passwords. You peel them off one by one, and finally you use your private key. And these are all the recipients with more than five messages. It's pretty top-heavy towards just a few nyms.

So communication graphs and brute forcing are really just the first quarter, I would say, of the analysis I did on AAM. A majority of my time was spent doing correlation. Even if I don't know who a message is to or what it says, it is valuable to know that it is to the same person as another message, or that it was sent by the same sender. And why is that valuable? Well, let's go back to the slide. You can't tell if someone has even received a message in a shared mailbox. But if I can correlate one message with another, then I can start determining that some unknown person has received a message. And once I know that two messages are related, well, then I can start paying attention to their timestamps and to their lengths. And this goes even further, because people tend to respond to messages that they receive. And since I know if someone has sent a message, it might just be that they are replying to a message that they just received.

So let's talk more about correlation and some more analysis of what's going on in AAM. First off, it's obvious that you can correlate messages that use a single constant subject. And there are a lot of messages like these: nearly half of all the messages posted to AAM have a constant, like, English subject. They don't use that hexadecimal stuff. They do tend to be the older messages, and they have tapered off recently, which makes sense.
But, you know, you can look at these numbers: 22,000 messages in a cluster, 18,000 messages in a cluster.

But let's talk about those random hexadecimal subjects. Now, there are two algorithms to generate these subjects. They're called encrypted subjects, or esubs, and hashed subjects, or hsubs. The point of these is to quickly identify which messages are for you and which messages you should ignore; for the folks who used Usenet, you can download just the headers and not the bodies. We could probably cut this stuff out, but it is still there, so let's break it.

Esubs have two secrets, a subject and a password. Hsubs have a single secret, a password. It is considerably more difficult to brute force the esubs, and I ran out of time, so I just focused on the hsubs. Hsubs were created by Zax, and as his services are used more and more, they make up an increasing percentage of the subjects. Now, hsubs have a random piece of them you can think of as an initialization vector, a salt. While I could try to shoehorn these into existing SHA-256 crackers, it would be painful; you have to truncate the output. So I wrote my own GPU cracker, and I cracked about 3,500 hsubs. Better than the percentage of messages I brute forced, but not a great percentage. Again, these are the passwords of the most paranoid people on the Internet.

"Danger Will Robinson" was used by some, but not all, of the messages that were sent to a couple of particular key IDs. I cracked all the hsubs of another key ID with the passwords "testicular" and "panties." And if you don't know what smegma is, don't Urban Dictionary it.

If hsubs and esubs are used to let a nym owner identify their own messages, can we do something similar? Let's say we want to target the nym Bob. What we can do is send a particularly large message to Bob, full of nonsense, and then we wait for a large message to pop out in AAM. Zax's nymserv is near instantaneous. Type 1 nymservs are not necessarily instantaneous; a little bit more difficult, but not too difficult, and you can do it a couple of times. And this works, and it works pretty easily and effectively. What we get is a specific message that we know is to a particular nym. At that point we can target that message for hsub cracking.
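For reference, an hsub check is tiny, which is also why it brute-forces well on a GPU. This follows my reading of Zax's hsub scheme (an 8-byte random IV, SHA-256 over the IV plus the passphrase, truncated to a 48-hex-character subject); treat the exact lengths as assumptions rather than the canonical spec.

    import hashlib, os

    def make_hsub(passphrase: str, iv: bytes = None, hex_len: int = 48) -> str:
        iv = iv or os.urandom(8)
        digest = hashlib.sha256(iv + passphrase.encode()).digest()
        # Subject = hex(IV) followed by the hash, truncated to hex_len characters.
        return (iv + digest).hex()[:hex_len]

    def hsub_matches(subject: str, passphrase: str) -> bool:
        # The IV rides along in the clear, so recompute and compare.
        iv = bytes.fromhex(subject[:16])
        return make_hsub(passphrase, iv, len(subject)) == subject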
So I'm not done. But unlike everything I presented before, what I'm going to talk about now is probability-based attacks. That is, I come up with a hypothesis that I can correlate messages with a probability better than random if I look at property X, whatever X is. Well, how many of you like the scientific method? I don't really have controls. So what I'm doing is coming up with a hypothesis and running it across the dataset, and then I'm looking at the clusters of messages that pop out, and then I'm going to see if I can figure out something else that correlates them. And if I can see something else that correlates them, I call it a success. That's how I kind of simulate controls.

So let's say I think that if a message has a header value of X, that's a unique sender: one sender is sending that value of X. I run that analysis and I get clusters of messages encrypted to a single public key. Well, if there was no correlation at all, I would probably get a distribution that looks more random; it would be encrypted to random public keys. But with such a nicely segmented public key, I kind of think that this worked. Even if I could have found that cluster by just looking at the public keys, the data implies that I could use that trick, that is, that hypothesis, to find a cluster of data when there is no other distinguishing characteristic. So that's how I try to preserve some semblance of the scientific method.

My first example is message headers. That's a big one. Let's look at these. There are a few headers that are in every message, but there's a tail of headers that are only in a few. These mostly unique message headers are not necessarily the goldmine that you might think they are. That's because headers can be added at the client, at the exit remailer, at the mail-to-news gateway, or by the Usenet peer. What we have to do is really go after the distinguishing headers, to subtract out the headers that were added by all the other parts of the path, which we can do by just clustering by the exit remailer, seeing which headers are on all of those messages, and subtracting those out.

Here are some great examples of headers that were specified by the client: User-Agent, obviously, X-Post-Type-ID, X-No-Archive. If you use Usenet, you know X-No-Archive is a client preference. These particular strange headers all formed a distinct clump of messages with the unique subject of "Weed will save the planet." And that's an easy example of how unique message headers can correlate messages.
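The subtract-out-the-path step is simple enough to sketch: learn which header names show up on essentially every message leaving a given exit, treat those as added in transit, and cluster on whatever is left. The record layout below is made up purely for illustration.

    from collections import Counter, defaultdict

    def cluster_by_client_headers(messages, threshold=0.99):
        # messages: list of {"exit": "dizum", "headers": {"User-Agent": "...", ...}}
        per_exit = defaultdict(list)
        for m in messages:
            per_exit[m["exit"]].append(m)

        # Headers present on (almost) everything from an exit were added by the
        # exit remailer, the mail-to-news gateway, or the Usenet peer.
        path_headers = {
            exit_node: {h for h, c in Counter(h for m in msgs for h in m["headers"]).items()
                        if c >= threshold * len(msgs)}
            for exit_node, msgs in per_exit.items()
        }

        # Whatever remains was supplied by the client; rare leftovers
        # (misspelled X-No-Archive, Encrypt-Subject, ...) tie messages together.
        clusters = defaultdict(list)
        for m in messages:
            leftover = frozenset(m["headers"]) - path_headers[m["exit"]]
            clusters[leftover].append(m)
        return clusters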
Now, X-No-Archive: this means "don't save it in Usenet." It is a client request that most Usenet servers will obey. It is also not the word that I have on the screen. This is a misspelling of the header. And there is one person, or at least I'm claiming one person, who has messed this up and completely distinguishes their messages from everyone else's. All 17,300 of them.

So this is what you want, right? No. Capitalization matters, and this is not the correct capitalization. What's interesting about this one is that it shows up on several long-running threads on AAM comprising nearly 28,000 messages. Initially, I thought each of these threads was relatively independent of the others, but after finding this little bit of information, I'm starting to seriously doubt that.

This one isn't right either.

(laughter)

There are 1,500 messages posted with this header, including some test messages that were posted with someone's real name. This is actually the correct version, and there are about 135,000 messages that have it, a little more than 10%, which makes it distinguishing in and of itself.

So, just out of curiosity, another show of hands: has anyone ever used a type 1 nymserv? I don't see any hands. Okay. So Encrypt-Subject is a directive for type 1 remailers that should be processed by the remailer. It should never make its way into Usenet. This is a bug; this is a client, or a user, messing up. But I can't really blame them, because type 1 is so horribly difficult. There are over 10,000 messages like this. And when you reuse subjects like these, you make the messages without the Encrypt-Subject directive stand out; that's the one on the far right. Or, even worse, mess it up once, then figure out how to do it, but keep using that same subject and password. This let me identify 52 esub messages that were otherwise secure, but they messed up once and sent it through in plain text. And then there is Encrypt-Key, another header that should never make it into Usenet but does, because type 1 remailers are so hard to use. There are over 10,000 of these messages.

And let's look at another header: Newsgroups. Just like mailing lists, you can post a message to more than one newsgroup. If you do, you're wildly in the minority, and that segments you. Like this newsgroup: there are 34 messages posted with this newsgroup, and thank you so much to Comcast for making your users extremely distinguishable. And what about this value, AAM with four commas at the end? I thought this was a correlation, but after tracking it down, it was actually a bug caused by the remailer remailer.org.uk for one week in January 2006.
Just some random trivia I pulled out. How about this one, with a duplicate value in the Newsgroups header? These were sent through a large variety of remailers and have no obvious correlation besides this value and the fact that they have English subjects. So the English subjects were another example of the controls I used to confirm that using a unique Newsgroups value is a bad idea.

And humans are creatures of habit, and as flaky as remailers have been, a lot of people find a configuration that works for them and then they stick with it. Well, if I partition people by the remailer and the news gateway that they use, that's what the colored squares are, what was previously an anonymous discussion thread suddenly makes it very easy to pick out who is saying what and who is agreeing with themselves. And it's even easier if I add in the header signature on the far right.

And then here's a really interesting pattern that I observed. There are a host of messages that have subjects with a 1 or a 2 in them, like "soggy," "soggy 2." Well, I looked at these and found they are being posted together, really close together. And then I realized one of the options in type 1 remailers is to duplicate a message for redundancy: send the message down two different remailer chains, just in case one becomes unavailable. And while that gains you some measure of availability and redundancy, it is distinguishing. You can target a nym with huge messages. If you see two huge messages appear, well, you know that nym's reply block duplicates the messages. Then look for all the possible duplicate candidates and you have a candidate list of messages to that nym, even if you are unsuccessful doing an esub or hsub attack.

A similar pattern is these. Look at each pair of messages that are in the slightly different backgrounds: the second message comes out of dizum five or six hours later than its pair out of the other remailer. It is distinguishing. The subject for all of these was, again, "Weed will save the planet." Also, messages from Frow were mixed in, with no obvious correlation to other messages.

So there were a number of hypotheses I tried that did not turn up interesting data, and there are more queries that can be run across this dataset, but I need to start wrapping up. It all comes down to metadata. What we saw in AAM is the obvious mistakes we kind of expected. It suffers a bit because we haven't taken into account the lessons that we've learned in the 10 to 15 years since it was developed; that's a lifetime in anonymity technology. But I do think there are some traffic analysis lessons that we haven't codified as best practice, and we should.
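One of those lessons, the duplicate-reply-block pattern from a moment ago, reduces to a very small query over the dataset: look for near-identical sizes posted close together in time through different exits. The tuple layout is invented for the example, and the thresholds are guesses.

    def duplicate_candidates(messages, window_hours=12, size_tolerance=0.02):
        # messages: iterable of (posted_at_epoch, size_bytes, exit_node) tuples.
        msgs = sorted(messages)
        pairs = []
        for i, (t1, s1, e1) in enumerate(msgs):
            for t2, s2, e2 in msgs[i + 1:]:
                if t2 - t1 > window_hours * 3600:
                    break          # sorted by time, nothing later can match
                # Same payload sent down two chains: near-equal size, different
                # exit, posted within a few hours of each other.
                if e1 != e2 and abs(s1 - s2) <= size_tolerance * max(s1, s2):
                    pairs.append(((t1, s1, e1), (t2, s2, e2)))
        return pairs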
So what does the future hold for AAM? The security of a well-posted message is good, with a lot of caveats. If you use uncrackable passphrases, only use servers that output packets and post through remailers with no distinguishing characteristics, and you are willing to be in a very small anonymity set, go for it. I don't know how many people are using AAM today, but I don't think it is a lot. What that means is that if the government asks for a list of everyone who uses it, they could probably get a really short list of names and dig fairly deeply into each of their lives.

And AAM crucially relies on remailers and news gateways, and these services are dying. Remember that two people, Zax and dizum, post more than 98% of the traffic to AAM. It is also text-based: very limited bandwidth. And the nymservs themselves are pretty crappy, architecturally speaking. We give one-hop proxies like VPNs and UltraSurf a lot of shit because their architecture is not nearly as strong as Tor's, but nymservs are in the same category of "trust this guy not to roll over on you." I feel compelled to mention that the alternative is to use Tor, which you do trust, to send email via throwaway accounts on a service you do not trust. While this is a practice that everyone in this room has probably used, or at least thought of, it's also a really shitty architecture.

Now, the good news is we have something better. We have a very strongly architected nymserv design, the Pynchon Gate, which uses private information retrieval instead of a shared mailbox. It exposes less metadata and resists flooding and size-based correlation attacks. However, it's not built. It's been started, but it's got a very long way to go. And it also requires a remailer network to operate, and we don't really have a remailer network. What we've got is Mixmaster and Mixminion. Mixminion is a bit better than Mixmaster, which uses old crypto with no chance of upgrading. Both of these services suffer from the fact that we don't have a good solution to remailer spam or abuse. We don't have good documentation about them. And they both have horrible network diversity: under 25 people running Mixmaster, under five people running Mixminion.

So if we like the Pynchon Gate, the path forward also involves fixing Mixminion, and Mixminion needs love. It is currently unmaintained, but we have a to-do list that includes the items I have got here.
Some of them are extremely complicated, like moving to a new packet format; others are straightforward, like improving the TLS settings. Others give you practice writing crypto, or writing a complete stand-alone pinger in any language or style that you want. So if you are interested, there are a lot of cool opportunities here.

But what I keep coming back to is the fact that we have no anonymity network that is high-bandwidth and high-latency. We have no anonymity network that would have let someone securely share the Collateral Murder video without WikiLeaks being their proxy. You can't take a video of corruption or police brutality and post it anonymously. Now, I hear you arguing with me in your heads. Use Tor and upload it to YouTube? No, YouTube will take it down. Use Tor and upload it to Mega or some site that will fight fraudulent take-down notices? Okay, but now you are relying on the good graces of a third party, a third party that is known to host the video and can be sued. WikiLeaks was the last organization that was willing to take on that legal fight, and now they are no longer in the business of hosting content for ordinary people. And you can say hidden services, and I will point to the size-based traffic analysis and confirmation attacks that come with a low-latency network, never mind Wyman's paper that pretty much killed hidden services. We can go on and on like this. I hope you will at least concede the point that what you're coming up with are workarounds for a problem that we lack a good solution to.

So if I have been able to entertain you, I'm glad. If I have been able to inspire you to work on anonymity systems, I'm overjoyed. If you want a place to start, I will point you there. Thank you.

(applause)