So, I want to tell you a story today about intelligence and intelligence collection. And when you talk about intelligence, unless there's been a well-known leak in the press or something, there ain't a whole lot of examples to talk about. And we'll certainly talk about those. But what I really wanted to do today is talk about some actual projects. Most of the time my intelligence clients, they don't want me to talk about what I'm doing for them, for obvious reasons. Sometimes I get lucky and I'm able to talk about it, but normally that's not the case. So a couple of years ago, I decided that I wanted to become my own intelligence client, okay? So what I ended up doing is I went into business with my girlfriend and we have an online retail business. And it's doing pretty well. We're in our second year. And it's growing. And I like to think the reason that it's growing is not because we work hard, though we do work hard. I like to think it's because we have got these little hacker brains and we just have the sales channels that we use. And I'll describe why that is important in a bit here. The other thing that's important for retail businesses is you want to know exactly what kind of inventory to buy before you buy it. I'll show you the intelligence that we use to do that. And I'm also going to show you how we manipulate markets or, as I tell my mother, how we protect our investments. All right. So when people hear the word intelligence, usually they think immediately, military intelligence, right? Well, there's not a whole lot of that that applies to what we do. The other side of it is either business intelligence or competitive intelligence. And those two are really pretty different, so I think it's important that we describe the differences between the two. Business intelligence is what's happening within your business. You use internal data and you focus on knowing your operations and you focus on knowing your resources. 
And what you really are trying to do is make the whole operation just more effective and more efficient and everything. And that's all very useful. But it's also very different than competitive intelligence. Because with competitive intelligence, you're trying to find what's happening outside your organization. It doesn't matter if you're a business or a political organization or if you are an investigative journalist trying to collect information. This is all competitive intelligence. You're using external data, and your focus is on knowing your competitors and knowing your markets. So this is the area where we're going to live today. We're going to live in the competitive intelligence area. So a lot of people, they think competitive intelligence, that's kind of murky. But it's important to know that competitive intelligence is taught in every top business school. In fact, if you are running a business, it's your responsibility to do this kind of stuff. It's also important to know that there is actually a professional association of people who do competitive intelligence, with newsletters and conferences and that whole thing. So I want to talk about the title a little bit, applied intelligence. So when I'm talking about applied intelligence, I'm talking about intelligence that's actionable. Something that's going to change the way you're doing things. The problem is that most of the intelligence that's collected, for the organizations that do collect it, is not actionable. It won't actually change what they're doing. And the other problem is most organizations tend to overcollect intelligence when they do collect it. So when you do that, it comes at a higher cost, and you've got more exposure. The other thing is that a lot of the intelligence that's collected is collected because people just feel obligated to collect. Not for any real reason. So, for example, I don't know how many of you saw this. This is something I had up, I was tweeting about it. 
I came up with a little scraper that every hour would go after the Def Con site and it would make a printable mobile version of the Def Con speaker schedule, and it's still up there if you're looking for it. And if you go through my Twitter account you can find it. So I'm doing this and it's cool and it's useful and I know a lot of people were using it because I was collecting analytics on this stuff, right? So I got the access time, IP addresses, I wrote a cookie so I could tell if people were looking at it more than once. But this is more analytics. It's not really intelligence. And the reason I don't consider this to be good intelligence is because there's absolutely nothing in these logs that would have changed anything that I was doing. In fact, the only thing that I found that was really interesting is probably about 90% of the traffic was bots. It wasn't even human activity. But again that's not going to really affect anything that I was doing. Now, if I wanted to make this applied intelligence, what I could have done is instead of just having a static table with all the schedule on it, I could have given people the option to click on the talks they wanted to go to, and I could have generated a nice little schedule for people to take along. Oh, 4:00 I need to be on this track. That would have been handy. And it also potentially would have created some actionable intelligence. Because if I had this information, and if I had it in aggregate, I could make some projections as to which talks are going to be really popular, which ones are going to run out of seating, and it could have been useful for people doing speaker ops. I didn't do any of this kind of stuff, but I see the potential use that could have changed the way things were done. Okay. The second part of the title, information that's not there, or isn't there, that is a direct reference to metadata. 
Now, for most of us, we had no idea what metadata was until the Edward Snowden disclosures came across, and suddenly we heard the word surveillance, and we all know what that is. And we knew that things that we were doing online, on our phones, even at the library, were now subject to surveillance and there was something called metadata that was involved in this. And people in the United States, we were pretty much confused. But fortunately we have public officials that can help us in this area. So Dianne Feinstein comes up at what was a hastily called meeting, and she goes, "As you know, this is just metadata. There's no content involved." Okay. So as a citizen of the United States, this made me feel pretty damn good, right? It's just, okay. So you know I went to the library but you don't know what I looked at? Okay. That's pretty good. So then our own president got involved. "So nobody is listening to your phone calls. That's not what this program is about." They're not looking at names and they're not looking at content, but sifting through the so-called metadata. They're sifting through the data. I'm pretty familiar with data. I'm not familiar with a sift command. (Laughter.) So this was good. This was very good. (Applause.) Now I know more about metadata. You can sift it. So I'm starting to feel pretty good about metadata. But I really got confused then when Michael Hayden, who is the former NSA boss, chimed in. He said, oh, we kill people based on metadata. (Laughter.) So okay, I need to know a little more now. I need to know just a little more. Okay. So metadata. Metadata is so important, and it's largely misunderstood. And we probably all know that metadata is data that describes other data, right? But more importantly than that, it's data that provides context for information. And usually the very best metadata doesn't even really exist. It needs to be created. And that's where the fun is. That's where the data hacking happens. 
So if you study metadata academically in college, they've got this whole plethora of categories for metadata. I like to keep it simple and say there's two kinds basically. There's parametric data, that's data that has to be collected or created, and then there's embedded data. And this is probably where most people got familiar with metadata initially, like in the '90s. And that's generally user created. It tends to be stuff like the embedded headers on a digital photograph that describe the camera that was used, when the picture was taken, the F-stops, geocodes. This has become a whole branch of law now because it's all admissible in courts. I've got a couple of examples of data that's leaked this way. This is one I used last year if you were at my talk and I'm using it again because I just love it. This is a selfie of a Russian soldier that was uploaded to Instagram. And unfortunately, before he uploaded it, he didn't remove the geocodes. So it revealed the fact that while his government said they had no troops in Ukraine, he was in Ukraine. Another example, another war-related example, this is kind of a famous one. This is the Tony Blair memo. This is the memo that was the justification for the invasion of Iraq in what, 2002 or 2003, something like that. Right after 9/11. And it was actually used by Colin Powell when he went to the United Nations to make his justification. And the assumption was that this was done by area specialists and it was all original work, well-researched. We find out it was actually plagiarized. And not just plagiarized, but it was plagiarized from a graduate student someplace in the UK. The original document was a graduate student's paper. And they know it was plagiarized because not only was the content the same, but so were the grammatical errors. So where this gets interesting though is in the metadata. Microsoft Office files are full of metadata. 
If you look at the metadata here, we can figure out who the people who did the plagiarizing were, and we can see that they were not area specialists but were actually political advisers to Tony Blair, so it's kind of a nasty thing. This kind of stuff happens a lot though. The CEO of Google a couple of years ago, he wanted to just put a PowerPoint presentation online that was just kind of a general status report for how things were happening. When he made that report, he actually made a copy of a very confidential PowerPoint presentation that was for internal use only and he did a lot of editing of the slides. What he neglected to do was edit the speaker notes, and he released the specifics of a project called Google Drive, and it was well before it was officially announced. So watch what you put in your speaker notes because it's out there. Okay. So let's talk -- that was embedded metadata. Let's look a little at parametric metadata and see how that's used. So according to the NSA, this is their own admission, this is what they collect. They collect phone numbers of parties making phone calls; they collect the time a call was placed; they collect the duration of a phone call; and also who initiated the call. Okay? That's kind of invasive, right? But you know, it's actually -- they're not doing anything that any Android app isn't making available to the developer, right? Stuff is in there. It's in all your phones if you have an Android. But with this metadata, you can get a lot of really interesting stuff. What they're really after is the relationships between callers. Because the metadata creates the context for the phone calls, how they were placed, and within that you can figure out the relationships between the callers. And these relationships can then be profiled. And once you've profiled them you can pick up the anomalies and outliers. 
And at that point it becomes possible to differentiate between a phone call being made by a parent at work to their child, making sure they got home from school okay, and some criminal operative calling in for instructions. Those calls are going to look very different. And they're also able to identify things like burner phones, because those are big outliers; those are the things that stick up. The other thing is you can look at the patterns in how people are calling and the relationships and you can tie them to other events, and if other events affect the way people relate, there may be a connection between those people and those events. So these are the kinds of things that the NSA looks for. These are the real red flags. And they'll go three hops. So what that means is if you have a phone call unknowingly with somebody who has had a phone call with somebody who is a person of interest, now suddenly you're kind of a person of interest too. So that's how that all works. So you could look at this and say yeah, it makes sense that they would look at metadata because it's the easy way out. Otherwise they would have to look at all these conversations and they would probably have to do some kind of speech-to-text conversion and deal with languages and they would have to deal with dialects and all that kind of stuff. But it's just better data. Just flat out better data. It's already digital. It can be processed. But it needs to be created. That's the important thing about metadata. All these relationships and stuff need to be figured out. It's kind of like how long do you need to play the game of Clue before you know it's Mrs. Peacock killing Colonel Mustard in the library with a candlestick. If you're good at it you'll figure it out before other people playing the game. How do we do metadata? How do we process things? Well, I'm not the NSA. I don't have unlimited resources and stuff. 
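The "three hops" idea above can be sketched as a breadth-first search over call records. This is only an illustration, not a description of any real system: the record format (caller, callee pairs) and the function names `build_call_graph` and `within_hops` are assumptions made up for this example.

```python
from collections import defaultdict, deque


def build_call_graph(call_records):
    """Turn (caller, callee) pairs into an undirected contact graph."""
    graph = defaultdict(set)
    for caller, callee in call_records:
        graph[caller].add(callee)
        graph[callee].add(caller)
    return graph


def within_hops(graph, start, max_hops=3):
    """Return {number: hop_distance} for everyone within max_hops of start."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    del distance[start]  # the starting number is not its own contact
    return distance


# Hypothetical call records: A called B, B called C, and so on.
calls = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
print(within_hops(build_call_graph(calls), "A"))
```

With these made-up records, B, C and D fall inside three hops of A, while E (four hops away) does not. No call content is needed, only who called whom.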
So the area where I live is called operational security, or OPSEC. That is a military term. What that refers to is looking at your day-to-day operations to see what kind of intelligence you're leaking. And if you have an adversary looking at your day-to-day operations, what kind of actionable intelligence can they collect? Well, I do this online. The Internet is my theater. So I look at things like employment postings. Last year I gave examples about how companies leak strategic plans through their employment postings, and I think you can kind of figure out how that could probably happen, right? If you think individuals leak information about themselves on social media, wait until you get a whole organization together leaking information on social media. A lot of information gets leaked this way. The way orders are fulfilled: if you go online, place an order on a website, and you watch the process, you see what kind of e-mails you get back, you look at order numbers, all this kind of stuff, you can learn a lot about the way a company fulfills items, where things are shipped from, that kind of stuff. If they have a store, and if they maintain a store, you can learn all kinds of stuff about pricing strategies, the products they choose to stock, the products that they choose not to stock. Very rich source of intelligence. If you sell something to a company online, the way they buy things can be very revealing. In fact, you can tell a lot about an organization's financial health just by looking at the check numbers. Just little things like that. And then there's the whole regulatory area. And this is largely things that are done for transparency. You know, financial filings, court records, variances, licenses, all that kind of stuff. Together, this is where I get competitive intelligence from. This is where I live. And for my counterintelligence clients, this is what we protect. Okay? 
So the first area I'm going to talk about, as far as things that we really have been able to capitalize on, is sequential numbers. And the privacy leaks, the data leaks that happen with sequential numbers, it's really staggering. Sequential numbers are used everywhere. I mean, absolutely everywhere. They're on your vehicle identification number. Social security numbers, ticket numbers, order numbers. I mean, they are literally everywhere. And you know, from a data standpoint, you need unique identifiers to represent things like orders, people, all that kind of stuff. But you don't need sequential numbers. And typically the only reason these numbers become sequential is because it's sloppy programming. Somebody programs an online app and instead of providing a unique order number, they basically reveal the index for that order in some table someplace. So they end up being sequential. Unless you've got chargebacks and weird things going on. But you can pretty much count on these things being sequential. So I'm going to tell you a little story to show the power of sequential numbers. And how the U.S. government almost let an entire generation be susceptible to identity fraud. Here's how it happened. Social security cards. They all have numbers on them. And the first three digits are an area number. The middle two are a group number. I really don't care too much about those. They basically kind of identify the region you're from and that kind of thing. But the last four digits are interesting because those are serial numbers. And from 1935 up until 1972 when they changed the law, they were truly sequential numbers. But it wasn't a big deal, because, I don't know about you guys, but I grew up in the '70s and I was probably 14, 15 years old when I got my social security card. 
And if I had gone with a buddy or if there had been somebody else applying for the number at the same time, we would have got sequential social security numbers. Not a big deal, because there wouldn't be a whole lot to connect these two people together. So if you knew one number you probably couldn't guess the other one because you wouldn't know who the other person was. Well, this all changed in 1972. These numbers became no longer sequential. And I don't know why they did that, but I know the way the government works. And I know it wasn't because they had a lot of foresight and were thinking about the tax reform act of 1986. Because in 1986, the IRS said if you are going to declare a dependent on your taxes, that dependent needs to have a social security number. And I think it was probably done more for tax reform to weed out fraudulent activity on taxes, because the year they did this, there were 7 million fewer dependents on tax forms. But again, I believe it was an accident that in '72 they stopped doing the sequential numbers. It wasn't because of this. I can't believe it was. But if they hadn't changed the law in '72, this would have been the scenario. You've got three kids back in the late '80s, you need to declare them as dependents on your tax forms. You've got kids that are maybe ages 5, 3 and 1. They don't have social security numbers. So what do you do? You go to the post office, you go to the social security administration and you file for social security cards for all of your children at once. So the bigger the family, the bigger the threat. Because each of these siblings will have sequential social security numbers. And it's pretty easy to pick up a social security number, right? If you work in HR, if you do credit kind of stuff or if somebody dies, you're all familiar with the social security death list, right? As soon as you die, if you're an American citizen, your social security number gets published. It becomes public so nobody uses it. 
But if you know one of these numbers, and you know they have siblings, it would be trivial to guess what their social security numbers are. So I'm very thankful that they changed the law. But it would have been catastrophic otherwise. How do we use sequential numbers in our business? And sequential numbers again are usually not thought of as being metadata. But they really are. So my girlfriend and I, we started our business, and we had tried other things before, nothing really caught on. But we got excited about this one, because our month-to-month sales were pretty good. We were having like 150% increase in sales pretty much consistently every month we were going on this and I was excited. I mean this was growing a lot faster than my consulting practice. And I think probably in the next year or two it's going to surpass my consulting practice, I think. But we were getting very confident at this point, right? By August and September, we were plowing money into the business. Everything that we were making we were putting back in, plus some. So we were investing in inventory, space, shelving, fulfillment equipment, scales, all kinds of stuff. We were doing it because we had some confidence. We had about six months of consecutive growth. We're in this now, okay? And then came October. Sales just went through the floor. So what you need to remember here is that we sell on a number of different channels and probably 80% of our business comes from this one channel. And our business on that channel was really, really off. And we were getting very concerned because we were still plowing money into this business. So we were thinking we don't know this business. We've only been at this for a few months. Did we enter at a bubble? Did we hit the peak of the bubble and it's going to be this from now on? We didn't know. So we were trying to figure out what was going on. We were pulling up spreadsheets, we're looking at old orders. We were trying to get a clue, right? 
So my girlfriend's reading off order numbers and amounts and I'm plugging things into a spreadsheet and we both noticed at about the same time that the order numbers were getting bigger. And then it was like okay, we need to find two orders that were close together, and we found two that were placed almost at the same time, and the order numbers that we were getting from this channel were incremental. Well, they weren't just incremental, they were sequential. Okay. We got something now. So here's what we did. We took the last order number that we had to fulfill in October, and we subtracted the last order number we had from September, and this gave us an estimate of the number of orders that were fulfilled through the sales channel by us and all of our competitors for the month of October. Pretty cool, huh? We knew that the channel fulfilled 6,500 orders, roughly, that month. This was our bad month, mind you. So we're thinking, well, our average order we figured out to be $12.48, and we had no reason to believe that our orders were any different than anybody else's that was using the sales channel, and we determined this to be pretty typical. It's very competitive. So this is pretty typical. So we took our average sale and -- by the way, that should not be July. It should be October. We multiplied it by the number of orders that we estimated for October. And we figured out that the gross sales on this channel in the month of October, our bad month, was just over $81,000. All right. That's pretty cool. So now let's look at the other months. So we took the last order number from the last order that we fulfilled coming from the sales channel, from April, May, June, July, all the way to October, and we were able to figure out the number of orders that were placed, and from this we were able to figure out the gross sales for that channel. Right? Fun with metadata. Cool stuff, right? We were happy, because we looked at our October sales, and it's like well, it's not consistent with the channel. 
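The arithmetic just described is small enough to sketch in a few lines. This is a minimal illustration, assuming the channel hands out one order number per order with no gaps. The actual order numbers below are hypothetical placeholders; only their difference (roughly 6,500) and the average order (about $12.48, which is what the $81,000 total implies) come from the talk.

```python
def estimate_channel_orders(last_order_prev_month, last_order_this_month):
    """Orders placed on the whole channel between two of our own order numbers."""
    if last_order_this_month < last_order_prev_month:
        raise ValueError("sequential order numbers should only increase")
    return last_order_this_month - last_order_prev_month


def estimate_gross_sales(order_count, average_order_value):
    """Assumes our average order is typical for the channel."""
    return order_count * average_order_value


# Hypothetical order numbers; only the 6,500 difference reflects the talk.
last_september_order = 100_000
last_october_order = 106_500

orders = estimate_channel_orders(last_september_order, last_october_order)
gross = estimate_gross_sales(orders, 12.48)
print(f"{orders} orders, about ${gross:,.2f} gross")
```

The same two functions can be run over each pair of consecutive months to build the month-by-month picture of the channel.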
The market is fine. We screwed up someplace. We did something that caused our sales to be off or maybe it was just a fluke. But we did not enter the market at the peak of a bubble. So we were able to sleep much better at night knowing this. So better living through metadata. So this is what the channel sales actually looked like. Even though ours was really down for October, theirs was actually up a little bit. So we're like, what else can we figure out? Well, if you want to figure out what you can figure out, write down what you know. Well, we know that the channel that we were selling on, their commission is around 20%. So we're able to figure out how much commission they were making every month. And they were making $15,000, $16,000 a month in commission. That's interesting. We also know that when somebody buys something on this channel, they're charged for postage, and as resellers we get some of that back but the channel keeps $1.25 on every sale. So we're able to look at the postage that they get and we're able to figure out that their average monthly profit for their sales channel is about $24,000 a month, and that they have annual profits of about $300,000 a year. Well, this really got us thinking. We're thinking this is probably not information they wanted to share. Probably not with somebody who's got 20 years of web development experience. And this channel, I mean honestly, their biggest task every month would be making PayPal payments, sending commission checks off to people, basically. I can't imagine that it takes anybody more than a couple hours a month to run this, seriously. They don't advertise, they have no expenses, they carry no inventory. Their business model is clearly better than ours. (Laughter.) For now. Again, all because of metadata and sequential numbers. So I'm going to look a little bit at how we buy inventory. One of the things that I've learned in retail is that there's a market price for everything. 
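The channel's income estimate follows from the same order count. Here is a rough sketch using the figures from the talk (about 20% commission, $1.25 of postage kept per sale) and, as an assumption, the October numbers of roughly 6,500 orders at an average of about $12.48. The function name `channel_income` is made up for this example, and the result treats income as profit, which only holds if the channel really has close to no expenses.

```python
def channel_income(order_count, average_order_value,
                   commission_rate=0.20, postage_kept_per_order=1.25):
    """Estimate what the channel keeps in a month from commission and postage."""
    gross = order_count * average_order_value
    commission = gross * commission_rate
    postage = order_count * postage_kept_per_order
    return {
        "gross": gross,
        "commission": commission,
        "postage": postage,
        "total": commission + postage,
    }


october = channel_income(6500, 12.48)
print(f"commission: ${october['commission']:,.0f}")
print(f"postage:    ${october['postage']:,.0f}")
print(f"total:      ${october['total']:,.0f}")
print(f"annualized: ${october['total'] * 12:,.0f}")
```

Under those assumptions the monthly total lands in the neighborhood of the $24,000 quoted above, and twelve such months come out near $300,000.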
And if you're lucky you will get the market price. If it's inventory, as it gets older you lower the price over time. So you really want to get the market price, but you're not going to make it in retail by charging more than market. It's a good way to not sell stuff. So the way you make money in retail is not by selling stuff. You make money when you buy stuff. Because if something has a market value, you know you can probably sell it for that. But if you can buy it for well below market, that's where you make your money. So procurement is really important for us. And there's a number of websites where we buy stuff, too. And we sell on multiple websites, but this one channel that's got like 80% of our business, it's the only true market we deal with. I refer to it as a true market because it's large enough. I don't know how many people sell on this market, but there's a lot of them. They're not eBay but they're big. And there's literally millions of products out there. So it's mature enough. There are enough professional sellers out there. There's enough market demand, a supply and demand curve going on, so it is a true market. And if you want to know what the market value is for something, you go out and look here. So what we do is we have bots, we have software that goes off and autonomously looks for things that we might want to buy, and it compares them to our main channel where we sell, and it looks for market prices. And it comes up with a little report like this for us. Doing this by hand would be really time consuming. Having software that does this is really nice. Basically what it does, it identifies the item, where it found it, what the price is, what the market price is. Anything with a margin of less than 300% we really don't deal with. And it tells us if we should buy it or ignore it. It's really cool. So the little buy thing, there's actually a link. If you click on it, it will actually go off and show you where you can buy it and like I said, it's a real time saver. 
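The buy-or-ignore report can be sketched as a simple filter. This is not the actual tool, just an illustration under a few assumptions: "margin" is taken to mean the markup over the purchase price, the `Listing` fields and the sample items are invented, and the scraping that would collect the prices is left out entirely.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    item: str
    source: str          # where the bot found the item for sale
    price: float         # what it would cost us to buy
    market_price: float  # what it sells for on our main channel


def margin_percent(listing):
    """Markup over purchase price, as a percentage (an assumed definition)."""
    if listing.price <= 0:
        raise ValueError("price must be positive")
    return (listing.market_price - listing.price) / listing.price * 100


def recommend(listing, min_margin=300.0):
    """BUY when the margin meets the threshold, otherwise IGNORE."""
    return "BUY" if margin_percent(listing) >= min_margin else "IGNORE"


# Hypothetical listings for illustration only.
found = [
    Listing("item A", "site-one", 5.00, 25.00),
    Listing("item B", "site-two", 10.00, 30.00),
]
for listing in found:
    print(f"{listing.item:8} {listing.source:10} "
          f"{listing.price:7.2f} {listing.market_price:7.2f} "
          f"{margin_percent(listing):6.0f}%  {recommend(listing)}")
```

The 300% threshold is the one quoted in the talk; a different definition of margin would move which items clear it.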
The next step would be to automate this. Over a nine-month period we bought $20 million worth of cars. Something like that. So one of the reasons we're able to do this is because we're selling unique items. And there are major, major privacy issues for retailers that sell unique items. Truly unique items would be things where there's only one of them. Real estate. You've got an address, one piece of real estate there. Vehicles. You've got a stock number, you've got a VIN. A specific vehicle. Original art is another one. You can have only one copy of an original artwork unless it's like a signed photo or something. And even then you've got a numbering system to identify it. Then there's also likely unique items. That would be things like first edition books, autographed items, most used things. And these are things where it is possible that you've got two copies of something, but you probably don't. It's rare enough that they probably only have one copy of a first edition of Catcher in the Rye, for example. So when you've got a situation like this, here's what we do. Remember, we were very new to this. We did not know this industry at all. But we wanted to learn. And what we did is we wrote software that automatically collected the entire inventory of what we considered to be our main competitors. And this channel actually made it really easy for us to do this because they actually list the top resellers on the channel. So we would look at all of their inventory. And we would do it again. And basically the delta, the difference between these two over maybe like a day or a week's time, would give us metadata to describe what they sold and for how much, and also what didn't sell during that period. So this is how we learned about our competition. And we did this a lot. So the more of this we did, the more metadata we got that described what sold, for how much, how long their inventory stayed online before it was sold, and again, what didn't sell. 
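The inventory delta described above can be sketched by comparing two snapshots. This is an assumed, simplified shape: each snapshot is a dictionary mapping an item identifier to its listed price, the item names and prices are invented, and the collection step is omitted. It also leans on the assumption that a unique item which disappears was sold at its last known price, which can be wrong if a listing was simply withdrawn.

```python
def diff_inventory(earlier, later):
    """Compare two {item_id: price} snapshots of a competitor's inventory."""
    # Items gone from the later snapshot are presumed sold at the last known price.
    sold = {item: price for item, price in earlier.items() if item not in later}
    # Items present in both snapshots did not sell during the period.
    unsold = {item: later[item] for item in earlier if item in later}
    # Items only in the later snapshot are newly listed stock.
    added = {item: price for item, price in later.items() if item not in earlier}
    return sold, unsold, added


# Hypothetical snapshots taken a week apart.
monday = {"first-edition-1": 120.00, "signed-print-7": 45.00, "used-book-3": 8.50}
next_monday = {"signed-print-7": 40.00, "used-book-9": 12.00}

sold, unsold, added = diff_inventory(monday, next_monday)
print("presumed sold:", sold)
print("still listed: ", unsold)
print("newly listed: ", added)
print("estimated revenue:", sum(sold.values()))
```

Repeating the comparison over many snapshots also gives how long each item stayed listed before it disappeared.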
And we're able to do this because if you look at the inventory one day, you look at it another day, if something is missing here, you assume that it sold for the last known price, right? And this is, I mean, you can tell so much about businesses that sell unique or likely unique items. If you think of all the businesses out there that put all of their inventory out online, you can study this to the point where you can almost do their books for them. You know what money they have coming in. So, okay. One more. How do we protect our investments? Well, we have software that looks for situations like this. So if we're seller C and our price is 19.50 and we're competitors with D and E, we have software that will go out and flag situations like this, where people, for whatever reason, are dumping stuff for below market value. So we have software that goes off and we immediately buy these things. This is how we kind of pseudo-manipulate our markets. So if we find things that are underpriced like that, we buy them and they become our inventory and we're protecting our investment. But again, we couldn't do this without intelligence. And like I said, I have to believe, if we were just typical sellers, we would just kind of not be doing that well. You've got to really know the business. And in our case, our proprietary intelligence tools and our collection of metadata really made the difference for us. So if you find this kind of thing interesting, you can follow me on Twitter. And I'll leave my Twitter handle up for the rest of the slides here. I tend to almost live tweet while I'm developing. I'll talk about bugs I'm finding in my own code and stuff like that. The other thing you can do is watch the Def Con website. This set of slides will be made available on that website. These are much more complete than what you've got on your conference DVD. 
There was also an article written about me in the Christian Science Monitor last week; that's something you might want to check out. Six of my other Def Con talks are online. They're all on YouTube. And with the exception of I think one, they all involve intelligence and competitive intelligence. And then finally, the last thing you might want to look at is a book that I wrote. This is the second edition of the book, and right after this I am heading over to the No Starch booth to do a book signing. If you have a question that would be a great time to sit and talk and I would love to see you there. So thank you very much. (Applause.)