1 00:00:00,334 --> 00:00:04,876 DANIEL BURROUGHS: And what I'm talking about today, this is basically, 2 00:00:04,876 --> 00:00:08,375 a follow up to a talk I gave a few years ago at DEF CON 18, 3 00:00:08,375 --> 00:00:11,584 about looking at information that's freely available 4 00:00:11,584 --> 00:00:14,292 out there on the net, and doing some trending 5 00:00:14,292 --> 00:00:19,292 and analysis of it, and trying to make something useful out of it. 6 00:00:19,334 --> 00:00:22,459 So a little bit about my background, I'm currently I'm the director 7 00:00:22,459 --> 00:00:25,584 of technology at the Center for Law Enforcement Technology 8 00:00:25,584 --> 00:00:29,834 training and research, which is a nonprofit research center. 9 00:00:29,834 --> 00:00:32,709 They got spun out of work that I used to do when I was a professor 10 00:00:32,709 --> 00:00:35,918 at the University of Central Florida. 11 00:00:35,959 --> 00:00:38,999 I was there for about ten years and in the engineering program, 12 00:00:38,999 --> 00:00:41,375 taught computer engineering, I developed 13 00:00:41,375 --> 00:00:45,626 the computer security curriculum there, and did embedded systems amongst 14 00:00:45,626 --> 00:00:48,083 some other things, and eventually moved away 15 00:00:48,083 --> 00:00:51,292 from teaching and more into research. 16 00:00:51,292 --> 00:00:52,999 And we ended up spinning out the research 17 00:00:52,999 --> 00:00:56,459 into an independent nonprofit center. 18 00:00:56,626 --> 00:00:59,501 I'm also the CTO for Hoverfly Technologies and prior 19 00:00:59,501 --> 00:01:02,584 to this, I used to work as a research associate 20 00:01:02,584 --> 00:01:05,999 up at the institute for security technology studies 21 00:01:05,999 --> 00:01:08,334 the Dartmouth College. 22 00:01:08,709 --> 00:01:13,459 So over the course of the last 20 years, some of the things that I have worked 23 00:01:13,459 --> 00:01:18,334 on appear on this list, and, you know, it took me quite a while to catch 24 00:01:18,334 --> 00:01:21,125 on to kind of what, like, the common theme 25 00:01:21,125 --> 00:01:25,083 between all of the things I was working on. 26 00:01:25,083 --> 00:01:29,459 I'm slow to pick up on these things at times, and eventually, 27 00:01:29,459 --> 00:01:33,876 as I started putting it together and kind. 28 00:01:33,876 --> 00:01:36,999 Realizing some of the same things I was doing, 29 00:01:36,999 --> 00:01:42,459 I realized all of this stuff from information sharing that I'm 30 00:01:42,459 --> 00:01:46,334 working on now, to hardware sensor networks, 31 00:01:46,334 --> 00:01:51,999 to intrusion detections, they rely on sensor fusion. 32 00:01:56,999 --> 00:02:02,083 Everything that we are doing, all of those things that are that I listed 33 00:02:02,083 --> 00:02:06,501 up there, they are all based on taking some sort of sensor, 34 00:02:06,501 --> 00:02:10,125 and using it to try to get some measure of reality, 35 00:02:10,125 --> 00:02:14,334 but the sensor always has some limitations. 36 00:02:14,334 --> 00:02:16,083 Sometimes it's a significant one. 37 00:02:16,083 --> 00:02:19,083 Sometimes it's not so bad, but every sensor that we look at reality, 38 00:02:19,083 --> 00:02:22,417 including ourselves, including when we view things, 39 00:02:22,417 --> 00:02:26,999 it's always got some sort of limitation, and it's one particular view, 40 00:02:26,999 --> 00:02:29,709 and that influences the data we are seeing, 41 00:02:29,709 --> 00:02:32,626 and you can get we have to work towards trying 42 00:02:32,626 --> 00:02:37,125 to get more meaningfulness out of the data that we have. 43 00:02:37,125 --> 00:02:39,999 One of the ways that we do this, and one of the things 44 00:02:39,999 --> 00:02:44,375 the techniques that I find most versatile, I would say, is sensor fusion, 45 00:02:44,375 --> 00:02:47,918 where we take multiple sensors, multiple ways of looking 46 00:02:47,918 --> 00:02:50,918 at the same thing, and kind of put that together, 47 00:02:50,918 --> 00:02:54,918 with the hope that we can take the limitations of one observation 48 00:02:54,918 --> 00:02:58,167 and cancel it out with a different observation that has 49 00:02:58,167 --> 00:03:01,083 a different set of limitations. 50 00:03:01,250 --> 00:03:02,918 So at least that's the hope. 51 00:03:02,918 --> 00:03:04,417 At least, you know, if we can put two halfway decent things 52 00:03:04,417 --> 00:03:08,083 to go and get something that's more than the sum of its parts. 53 00:03:08,876 --> 00:03:12,250 So before I get kind of more into my stuff, I always feel 54 00:03:12,250 --> 00:03:15,999 like with the in this particular subject, that I have to give 55 00:03:15,999 --> 00:03:19,417 an acknowledgment to the guy that inspired kind of some 56 00:03:19,417 --> 00:03:22,334 of these thoughts in my head and it was actually 57 00:03:22,334 --> 00:03:26,250 at DEF CON, way back at DEF CON 13, Broward Horne gave this talk 58 00:03:26,250 --> 00:03:29,375 on Meme mining for fun and profit. 59 00:03:29,626 --> 00:03:33,834 It's problem you know, all great ideas come out of a problem. 60 00:03:33,999 --> 00:03:37,999 I guess a lot of ideas come out of trying to solve a problem too, 61 00:03:37,999 --> 00:03:41,167 but his was a really good idea. 62 00:03:41,167 --> 00:03:43,876 His problem was that he would find that he would, like, 63 00:03:43,876 --> 00:03:47,999 start learning some new technology, some new tool or at least it was new 64 00:03:47,999 --> 00:03:51,459 to him and by the time he felt he had mastered it, it was kind 65 00:03:51,459 --> 00:03:55,167 of on the way out or the market, the job market was just saturated 66 00:03:55,167 --> 00:03:58,918 with people doing that now or it had just fallen by the wayside 67 00:03:58,918 --> 00:04:01,292 and nobody cared about it. 68 00:04:01,292 --> 00:04:06,999 He was always kind of struggling, what should I spend my time learning 69 00:04:06,999 --> 00:04:09,125 to get ahead? 70 00:04:09,125 --> 00:04:11,501 And he ended up kind of thinking about this as like, 71 00:04:11,501 --> 00:04:14,209 everything has this sort of saturation curve where 72 00:04:14,209 --> 00:04:17,626 a trend starts happening and there's a little bit of chatter 73 00:04:17,626 --> 00:04:21,125 about it and eventually it starts taking off, and everybody hears 74 00:04:21,125 --> 00:04:25,999 about it when it's big and growing and then it gets boring and old. 75 00:04:25,999 --> 00:04:30,501 You want to try those things earlier on and went through and did it. 76 00:04:30,501 --> 00:04:33,999 This is a slide pulled out of his old presentation. 77 00:04:35,542 --> 00:04:40,999 What he would do is he would look at new sources and forums and blogs, 78 00:04:40,999 --> 00:04:44,999 for information and key words and kind of pull them 79 00:04:44,999 --> 00:04:50,542 out and see what was trending on there, with the idea that that's kind 80 00:04:50,542 --> 00:04:54,751 of a precursor to seeing that early chatter about it, 81 00:04:54,751 --> 00:04:57,792 so something can take off. 82 00:04:57,792 --> 00:05:01,999 This one in this particular case, this is the red line shows how many times 83 00:05:01,999 --> 00:05:06,999 the word "palladium" showed up in news reports and forums. 84 00:05:06,999 --> 00:05:09,626 And the blue is the "price of palladium" and you can see clearly 85 00:05:09,626 --> 00:05:13,167 there's a lot of chatter about it before the price spiked up and then 86 00:05:13,167 --> 00:05:17,125 the chatter dropped off before the price came back down. 87 00:05:17,125 --> 00:05:21,584 It's a really good indicator about predicting the future there, 88 00:05:21,584 --> 00:05:23,959 what's going on. 89 00:05:24,125 --> 00:05:28,125 Anyway, that thought inspired me and when I was when I was teaching, 90 00:05:28,125 --> 00:05:32,417 I would have students who could come to me and they would want to know, 91 00:05:32,417 --> 00:05:36,584 what skills do they need to get a good job and all of that and I tried 92 00:05:36,584 --> 00:05:40,125 to apply what Broward had done in a similar way by monitoring 93 00:05:40,125 --> 00:05:44,876 and observing trends and this is single variable observation. 94 00:05:44,918 --> 00:05:47,125 It's doing some correlation and it started off looking 95 00:05:47,125 --> 00:05:51,667 at Craigslist data, just because Craigslist is nicely available. 96 00:05:51,667 --> 00:05:53,999 It's well organized by geographic location and you can go 97 00:05:53,999 --> 00:05:57,834 in in certain categories, like where they have the job postings in there, 98 00:05:57,834 --> 00:06:00,999 it's categories by different types of jobs. 99 00:06:00,999 --> 00:06:03,542 Craigslist is not the best place to look for jobs 100 00:06:03,542 --> 00:06:06,792 but it had some interesting properties in that there are a lot 101 00:06:06,792 --> 00:06:10,083 of small companies that post on there or maybe trying new things, 102 00:06:10,083 --> 00:06:12,834 a lot of entrepreneurial companies, start ups things 103 00:06:12,834 --> 00:06:16,626 like that are starting there, not so much the big ones. 104 00:06:16,999 --> 00:06:20,375 That tends to skew it a little bit more towards being 105 00:06:20,375 --> 00:06:25,459 a leading indicator, something that is it will come out a bit ahead 106 00:06:25,459 --> 00:06:27,417 of the curve. 107 00:06:27,417 --> 00:06:29,083 So some of the things that I ended up looking at, just 108 00:06:29,083 --> 00:06:31,999 because I found correlations in here were jobs, items for sale 109 00:06:31,999 --> 00:06:33,918 and adult services. 110 00:06:33,918 --> 00:06:38,375 And I didn't I'm not saying I looked for adult services on Craigslist. 111 00:06:38,375 --> 00:06:40,125 It's just my research took me there. 112 00:06:41,083 --> 00:06:42,626 (Laughter). 113 00:06:42,999 --> 00:06:47,125 So, you know, the things I saw, looked like this. 114 00:06:47,167 --> 00:06:48,375 This is an example. 115 00:06:48,375 --> 00:06:50,584 This is just showing job postings by date. 116 00:06:50,626 --> 00:06:53,876 And there is a this is showing the dips you see there, 117 00:06:53,876 --> 00:06:58,542 these are weekly trends, it goes dead on the weekends. 118 00:06:58,542 --> 00:07:00,999 There's a spike on a Monday and a spike on a Friday. 119 00:07:01,626 --> 00:07:04,209 Okay, it's kind of boring. 120 00:07:04,250 --> 00:07:06,751 It's sort of interesting, but not unexpected. 121 00:07:06,751 --> 00:07:08,959 There were certain things that stood out. 122 00:07:08,999 --> 00:07:11,334 In this particular case, one of the things that jumped 123 00:07:11,334 --> 00:07:14,999 out at me was Austin never had a spike on a Friday. 124 00:07:14,999 --> 00:07:16,125 It always dropped off. 125 00:07:16,125 --> 00:07:19,083 It's kind of hard to see, but it's the orange line in there. 126 00:07:19,083 --> 00:07:20,834 It never has a second spike in it. 127 00:07:20,834 --> 00:07:22,375 I thought that was interesting. 128 00:07:22,375 --> 00:07:23,083 The other thing, and this is what came 129 00:07:23,083 --> 00:07:26,167 out of the adult services was there was a correlation 130 00:07:26,167 --> 00:07:29,999 between adult services being offered or bicycles for sale or a lot 131 00:07:29,999 --> 00:07:32,083 of items for sale. 132 00:07:32,959 --> 00:07:35,626 This led to a couple of interesting discussions. 133 00:07:35,834 --> 00:07:38,292 One of my favorite moments at DEF CON, when someone said, hey, 134 00:07:38,292 --> 00:07:41,083 I think I can help you out, I'm from Austin and my sister 135 00:07:41,083 --> 00:07:42,999 is a prostitute. 136 00:07:45,083 --> 00:07:46,334 (Laughter). 137 00:07:47,292 --> 00:07:49,167 So that and then it led into a discussion 138 00:07:49,167 --> 00:07:52,751 of things could you sell one time, like a bicycle or something that you can 139 00:07:52,751 --> 00:07:55,209 sell over and over and over again. 140 00:07:58,083 --> 00:08:01,999 So, okay, that's what I had done before, and we had looked at that. 141 00:08:01,999 --> 00:08:04,501 There's some interesting stuff there, but I wanted to dig a bit deeper 142 00:08:04,501 --> 00:08:06,709 into the data and look for more relationships 143 00:08:06,709 --> 00:08:09,334 and more correlations between data and hopefully be able 144 00:08:09,334 --> 00:08:12,999 to pull in other sources and do some fusions on this. 145 00:08:12,999 --> 00:08:15,167 I started looking for things like different cycles 146 00:08:15,167 --> 00:08:19,417 in like the job postings or correlations in them. 147 00:08:19,417 --> 00:08:21,375 Because at the time when I was working on them, keep in mind, 148 00:08:21,375 --> 00:08:24,459 I was really trying to help out some of the students and figure 149 00:08:24,459 --> 00:08:27,292 out what skills they needed and what would really help them 150 00:08:27,292 --> 00:08:28,918 get ahead. 151 00:08:30,250 --> 00:08:32,876 There were definitely correlations in there. 152 00:08:32,999 --> 00:08:34,459 You know, there are things that you would see, 153 00:08:34,459 --> 00:08:36,417 but nothing unexpected. 154 00:08:36,417 --> 00:08:39,959 Nothing really interesting that jumped out in related skills. 155 00:08:39,959 --> 00:08:44,250 Could you say if a job was going to have one particular tool set 156 00:08:44,250 --> 00:08:47,999 or skill set listed, there are other ones likely 157 00:08:47,999 --> 00:08:51,209 to be listed with it as well. 158 00:08:51,999 --> 00:08:54,999 Again, nothing really jumped out at me as being unexpected out of it, 159 00:08:54,999 --> 00:08:59,209 but eventually, there were a couple of interesting things that showed up. 160 00:08:59,584 --> 00:09:02,584 One was kind of funny and it was how often 161 00:09:02,584 --> 00:09:05,459 the words drug test or drug screen showed 162 00:09:05,459 --> 00:09:11,375 up in a job advertisement correlated with the different skills in it. 163 00:09:11,375 --> 00:09:13,999 And apparently, like (Laughter) If you don't think you are 164 00:09:13,999 --> 00:09:16,999 going to pass a drug test, don't bother learning SAP, 165 00:09:16,999 --> 00:09:19,999 because it won't do you any good. 166 00:09:21,250 --> 00:09:26,999 If you want to develop IOS applications, you know, go knock yourself out. 167 00:09:27,250 --> 00:09:28,792 (Laughter). 168 00:09:28,876 --> 00:09:30,959 You know, I guess there's probably some logic 169 00:09:30,959 --> 00:09:35,918 here, like how corporate or uncorporate the environment is, I suppose. 170 00:09:36,667 --> 00:09:40,334 Another thing was looking at jobs that had benefits, and 171 00:09:40,334 --> 00:09:43,999 like retirement and health and medical. 172 00:09:44,334 --> 00:09:46,876 You know, the interesting one, the best one was COBOL 173 00:09:46,876 --> 00:09:51,167 but I think it was a bit of an outlier, there were so few jobs with COBOL. 174 00:09:51,167 --> 00:09:52,999 I guess to get any grizzled COBOL programmer 175 00:09:52,999 --> 00:09:57,459 to come work for you, you had to give them a lot of benefits. 176 00:09:59,083 --> 00:10:03,999 Python and Android, and HTML, you won't give them much in the way 177 00:10:03,999 --> 00:10:06,751 of benefits, I suppose. 178 00:10:09,250 --> 00:10:15,125 As I was looking into this actually, this is much more recently. 179 00:10:15,125 --> 00:10:16,501 This is earlier this year. 180 00:10:16,501 --> 00:10:17,999 I came across this article. 181 00:10:17,999 --> 00:10:21,000 This is actually out of the journal of "Psychology" where a psychologist, 182 00:10:21,000 --> 00:10:22,999 Dorothy Brambel. 183 00:10:24,542 --> 00:10:30,250 She looked at the missed connections of Craigslist. 184 00:10:30,751 --> 00:10:34,626 This is where people say, oh, I saw you as I was walking 185 00:10:34,626 --> 00:10:39,209 across the parking lot and tried to catch your eye and they post it 186 00:10:39,209 --> 00:10:42,918 and hope they will make a connection. 187 00:10:43,167 --> 00:10:45,542 This is organized by state. 188 00:10:45,542 --> 00:10:49,375 This is where people had the most missed connections. 189 00:10:51,375 --> 00:10:54,209 Walmart has a log on the South. 190 00:10:54,209 --> 00:10:55,209 (Laughter). 191 00:10:56,417 --> 00:10:59,959 You know, Oklahoma, it's the state fair. 192 00:10:59,959 --> 00:11:00,959 Of course! 193 00:11:00,959 --> 00:11:02,584 You know, it makes perfect sense. 194 00:11:02,584 --> 00:11:04,792 And, you know, in Nevada, it's casinos. 195 00:11:04,999 --> 00:11:07,751 And the one thing I just had to put this up there, 196 00:11:07,751 --> 00:11:11,667 one thing that just jumped out at me like crazy was Indiana, 197 00:11:11,667 --> 00:11:13,459 it's at home. 198 00:11:13,459 --> 00:11:14,459 (Laughter). 199 00:11:14,459 --> 00:11:18,292 I don't know what they're doing in Indiana, but I'm pretty sure 200 00:11:18,292 --> 00:11:21,334 they are doing it wrong. 201 00:11:23,626 --> 00:11:24,999 (Laughter). 202 00:11:25,501 --> 00:11:29,250 So I was talking with a friend of mine about this stuff, 203 00:11:29,250 --> 00:11:33,542 Dave Kerbletski and he told me about something that he had done 204 00:11:33,542 --> 00:11:37,167 in his neighborhood in Orlando, Florida. 205 00:11:37,626 --> 00:11:40,999 They had a rash of crime, recently and they didn't know they had a rash 206 00:11:40,999 --> 00:11:45,125 of crime, until the neighbors got talking to even other. 207 00:11:45,501 --> 00:11:47,334 Everybody knew a little different incident that 208 00:11:47,334 --> 00:11:48,999 had happened. 209 00:11:48,999 --> 00:11:50,999 He went and did some searching and found 210 00:11:50,999 --> 00:11:55,751 out that there was some open source data that the sheriff's department would 211 00:11:55,751 --> 00:11:58,542 post about their CAD, their dispatch calls and 212 00:11:58,542 --> 00:12:01,459 he wrote this little tool to do some geolocating 213 00:12:01,459 --> 00:12:04,999 on it and tweet it out and then you can subscribe to it, 214 00:12:04,999 --> 00:12:08,834 and get tweets from this thing, like, really hyper local things 215 00:12:08,834 --> 00:12:12,999 for your neighborhood, about what's going on there. 216 00:12:13,125 --> 00:12:15,626 And it's actually one thing that's fun. 217 00:12:15,834 --> 00:12:20,417 I pulled this up earlier today, and, like, you know, I was just noticing things. 218 00:12:20,459 --> 00:12:22,334 This is in Orlando area. 219 00:12:22,334 --> 00:12:23,999 You know, the first tweet that's on there, and I'm amazed 220 00:12:23,999 --> 00:12:26,709 at the you know, the sheriff's office is putting this out, 221 00:12:26,709 --> 00:12:29,876 they are basically saying there's a designated patrol area available, 222 00:12:29,876 --> 00:12:31,584 which means there's an area that nobody 223 00:12:31,584 --> 00:12:33,876 is patrolling it currently. 224 00:12:33,999 --> 00:12:38,626 And this is down, like, in a real tourist trap part of Orlando. 225 00:12:38,999 --> 00:12:40,626 That could be useful information to somebody, 226 00:12:40,626 --> 00:12:43,501 to know there are no cops there right now. 227 00:12:43,501 --> 00:12:45,709 And then there's a few accidents and then I guess the people 228 00:12:45,709 --> 00:12:47,918 at the bottom, down on Poppy Avenue would be 229 00:12:47,918 --> 00:12:50,751 happy to note that there's a fugitive from justice running 230 00:12:50,751 --> 00:12:52,792 around in their area. 231 00:12:54,334 --> 00:12:58,417 This kind of led us to look into more sources for data. 232 00:12:58,417 --> 00:13:01,959 What they offered where we were, wasn't very wasn't very useful 233 00:13:01,959 --> 00:13:03,751 or organized. 234 00:13:03,751 --> 00:13:06,751 We found out and started looking in places that kind of subscribe 235 00:13:06,751 --> 00:13:09,209 to the open gov system and this is a movement 236 00:13:09,209 --> 00:13:12,667 to have more transparent government data. 237 00:13:12,792 --> 00:13:15,751 Some cities publish huge amounts of data about what's going 238 00:13:15,751 --> 00:13:19,209 on in their city, the fire department, the police department, 239 00:13:19,209 --> 00:13:24,459 live interesting data, and Seattle, Boston, Chicago, a number of others. 240 00:13:24,459 --> 00:13:27,083 These are three that I spent a bit of time looking at. 241 00:13:27,083 --> 00:13:29,501 There's information about incidents that are going on, like, 242 00:13:29,501 --> 00:13:31,083 police, fire. 243 00:13:31,083 --> 00:13:32,751 In Chicago, you can actually track where 244 00:13:32,751 --> 00:13:35,250 the snowplows are in the city. 245 00:13:35,250 --> 00:13:38,292 You can track where garbage trucks are in realtime, from the city, 246 00:13:38,292 --> 00:13:41,999 which I just find really kind of fascinating. 247 00:13:42,250 --> 00:13:44,709 There's information about where bicycle racks, 248 00:13:44,709 --> 00:13:47,751 public toilets, land marks and even where cameras are, where 249 00:13:47,751 --> 00:13:51,667 the city has all of its cameras posted, which that one I thought was actually 250 00:13:51,667 --> 00:13:53,999 particularly interesting but you can really go 251 00:13:53,999 --> 00:13:56,876 on here and make a map of what is an observable location 252 00:13:56,876 --> 00:14:00,959 throughout the city, and what is not an observable location. 253 00:14:01,083 --> 00:14:02,584 Which, again, that could be useful information 254 00:14:02,584 --> 00:14:04,083 for somebody. 255 00:14:04,250 --> 00:14:07,083 Here's something, the Seattle one is great. 256 00:14:07,083 --> 00:14:10,542 They have their visualization tools built right into this thing and this 257 00:14:10,542 --> 00:14:12,999 is showing a map showing police incidents 258 00:14:12,999 --> 00:14:18,667 over a period of time, around in part of Seattle and I pulled up this area. 259 00:14:18,667 --> 00:14:22,918 You notice, like most of it everything is kind of in that same yellow orange, 260 00:14:22,918 --> 00:14:26,876 except for one big glowing red blob out there and, you know, 261 00:14:26,876 --> 00:14:28,999 over in Georgetown. 262 00:14:28,999 --> 00:14:31,334 I don't know if anybody is from Seattle here. 263 00:14:31,334 --> 00:14:34,834 But I'm like wondering what the heck is going on over in Georgetown. 264 00:14:34,834 --> 00:14:37,501 And you look in a little bit closer and right next to it 265 00:14:37,501 --> 00:14:44,083 is the Boeing Propulsion Engineering Labs, which makes me feel really good. 266 00:14:44,667 --> 00:14:49,584 So coming back, to an area I know a bit more about, back in Orlando, 267 00:14:49,584 --> 00:14:52,417 we pulled up data that had we pulled 268 00:14:52,417 --> 00:14:55,999 out traffic ticket, they don't publish data 269 00:14:55,999 --> 00:14:59,209 about who got ticket, but you can see when 270 00:14:59,209 --> 00:15:02,083 a traffic stop occurred. 271 00:15:02,125 --> 00:15:07,125 And I we looked at it and pulled data, that covered three roads in the area 272 00:15:07,125 --> 00:15:12,542 and this is right out by the University of Central Florida. 273 00:15:12,542 --> 00:15:14,751 These are three roads that run all east west and they are 274 00:15:14,751 --> 00:15:18,417 the three major roads just kind of one is right into the university, and one 275 00:15:18,417 --> 00:15:21,083 is a bit north and one is a bit south and they all have 276 00:15:21,083 --> 00:15:23,876 about the same amount of traffic on and they all have 277 00:15:23,876 --> 00:15:25,999 a similar traffic pattern. 278 00:15:26,083 --> 00:15:30,417 And when we went through and what this chart 279 00:15:30,417 --> 00:15:35,542 is showing here is this is each one of the groupings 280 00:15:35,542 --> 00:15:40,834 is a is a week long period, five week days. 281 00:15:40,834 --> 00:15:42,709 And then it's repeated over six weeks. 282 00:15:42,999 --> 00:15:48,083 And one of the things I found really interesting was the chance 283 00:15:48,083 --> 00:15:52,834 of a traffic ticket occurring on one of these roads, 284 00:15:52,834 --> 00:15:59,167 the order it was always likely at different times of the day. 285 00:15:59,167 --> 00:16:02,083 It always followed the same sort of pattern, particularly 286 00:16:02,083 --> 00:16:06,083 between this highway 50 and University Boulevard. 287 00:16:07,167 --> 00:16:09,751 The Highway 50 traffic stops all preceded 288 00:16:09,751 --> 00:16:13,542 the University Boulevard traffic stops and when you go out there 289 00:16:13,542 --> 00:16:16,417 and you look at the traffic, the traffic pattern 290 00:16:16,417 --> 00:16:19,083 is not really any different. 291 00:16:19,083 --> 00:16:22,250 So if you start thinking about this and start putting together, well, 292 00:16:22,250 --> 00:16:24,501 why you know, why do you always see one 293 00:16:24,501 --> 00:16:27,792 before the other, I don't have you know I don't have hard 294 00:16:27,792 --> 00:16:30,999 evidence to back this up, but our belief is you are seeing 295 00:16:30,999 --> 00:16:35,417 an influence of the patrol pattern of the police in the city. 296 00:16:35,417 --> 00:16:37,667 So you are actually able to kind of get in there and 297 00:16:37,667 --> 00:16:40,584 through their information that they are putting out sort 298 00:16:40,584 --> 00:16:42,834 of start tracking them. 299 00:16:42,834 --> 00:16:45,834 It's kind of like, you know, there's a talk we went to earlier yesterday, 300 00:16:45,834 --> 00:16:49,999 I guess it was, there's a great talk with Brendan O'Connor that was talking 301 00:16:49,999 --> 00:16:54,083 about tracking people by seeing, like, information their devices are spitting 302 00:16:54,083 --> 00:16:56,542 out on wireless networks. 303 00:16:57,626 --> 00:17:00,999 It's a similar concept, that they are putting out a lot 304 00:17:00,999 --> 00:17:04,999 of information here, that is that if you look at it the right way 305 00:17:04,999 --> 00:17:09,250 and you take the right pieces of data, and put it to go, you can pull 306 00:17:09,250 --> 00:17:11,999 a lot more information out about what their 307 00:17:11,999 --> 00:17:15,834 about what they're doing and what's going on. 308 00:17:17,417 --> 00:17:21,083 So, you know, why so by this time, I have kind of changed kind 309 00:17:21,083 --> 00:17:23,876 of what I was interested in doing and probably 310 00:17:23,876 --> 00:17:27,667 because I quit teaching and I left the university so I don't have 311 00:17:27,667 --> 00:17:29,667 students anymore. 312 00:17:29,667 --> 00:17:32,292 I'm not that interested in helping people find jobs. 313 00:17:32,292 --> 00:17:36,375 So now I found it kind of interesting to look at these government entities 314 00:17:36,375 --> 00:17:40,501 and the police and other things that were going on and also 315 00:17:40,501 --> 00:17:44,999 because I have worked with law enforcement a lot. 316 00:17:44,999 --> 00:17:48,375 It's kind of interesting to see how on one hand, they are very protective 317 00:17:48,375 --> 00:17:51,999 of their data, but at the same time, they are putting out a lot 318 00:17:51,999 --> 00:17:55,999 of information that I'm not sure that they quite realize how much that 319 00:17:55,999 --> 00:17:58,584 they are putting out there. 320 00:17:58,584 --> 00:18:01,292 Frankly, I think it's actually kind of a good thing. 321 00:18:01,292 --> 00:18:02,792 I like being able to have more information 322 00:18:02,792 --> 00:18:05,834 and being able to look back on them and like I say, why should 323 00:18:05,834 --> 00:18:08,999 the NSA have all the fun on spying on people? 324 00:18:10,167 --> 00:18:15,083 So the what's next with this, and there's there so much more I would 325 00:18:15,083 --> 00:18:18,999 like to talk about, but these 20 minute talks you have 326 00:18:18,999 --> 00:18:21,542 to be kind of fast in. 327 00:18:22,626 --> 00:18:26,918 What I'm really interested in is actually expanding 328 00:18:26,918 --> 00:18:32,999 the model that we have been using on this data to be analyzed. 329 00:18:32,999 --> 00:18:37,334 We kind of built things that are very purpose driven, that the first net 330 00:18:37,334 --> 00:18:40,209 of analysis, we did was very structured 331 00:18:40,209 --> 00:18:44,584 around the seeking out the jobs, doing that, and then kind 332 00:18:44,584 --> 00:18:50,083 of got side tracked by the crime and going off that direction. 333 00:18:50,083 --> 00:18:52,584 And I want to bring this back together and try 334 00:18:52,584 --> 00:18:56,999 to build a more robust model for analyzing this data and throw some 335 00:18:56,999 --> 00:19:01,999 data mining at this, where so far a lot of what we have done has been what 336 00:19:01,999 --> 00:19:05,209 I would say is like hypothesis based where I make 337 00:19:05,209 --> 00:19:07,999 a prediction about something I should see 338 00:19:07,999 --> 00:19:12,584 in here and then a correlation and see whether it exists in the data 339 00:19:12,584 --> 00:19:14,751 or doesn't exist. 340 00:19:14,751 --> 00:19:19,999 And I'm sure there's a lot of relations in there that are things that I wouldn't 341 00:19:19,999 --> 00:19:24,959 expect or I wouldn't find otherwise, I want to throw a bit of sort 342 00:19:24,959 --> 00:19:30,542 of data mining and kind of that sort of blind either either AI or brute force 343 00:19:30,542 --> 00:19:35,167 approach to finding relations throughout the data. 344 00:19:35,459 --> 00:19:37,542 So I think I'm about out of time right now 345 00:19:37,542 --> 00:19:40,999 and I'm getting a nod from the back and so I will wrap it up there and 346 00:19:40,999 --> 00:19:43,834 if there are any questions, I would happy to take a couple 347 00:19:43,834 --> 00:19:45,959 until they cut me off. 348 00:19:46,667 --> 00:19:47,999 (Applause). 349 00:19:48,834 --> 00:19:49,999 Thank you. 350 00:19:53,083 --> 00:19:54,459 Okay. 351 00:19:57,584 --> 00:19:58,999 Thank you.