Hey guys. Um, so this is the two o'clock, Weaponizing Data Science for Social Engineering. And these are the guys, and we're gonna kick it off. Alright, so DEF CON goons are no longer allowed to drink in red shirts, nor are they allowed to do Shot the Noob. I'm gonna keep this short. It is Philip's first time speaking at DEF CON. John spoke last year but wasn't able to get a shot. So let's do a shot with him and have a good time. Don't fuck it up. Alright, hey guys. My name is John Seymour. Um, so welcome to our talk on weaponizing data science for social engineering. Wow. Dude, that was strong. Um, Weaponizing Data Science for Social Engineering: Automated End-to-End Spear Phishing on Twitter. So, uh, we think this talk's actually a pretty good fit for this conference, right? Every year, Black Hat does this attendee survey, and every year, social media, phishing, spear phishing, social engineering are near the top of the list of concerns. Um, we wanted to try our hand and see how effective using AI to automate spear phishing would be. Now, things like the Social-Engineer Toolkit actually automate the backend of social engineering, right? So, creating a malicious payload, things like that. We're interested in more of the frontend sort of stuff: actually generating links that users will click. Traditionally, there are two different approaches to this. There's phishing, which is very low effort, you know, shotgunning tons and tons of messages, but it also has very, very low success, between like 5 and 14%. And there's spear phishing, which is highly manual. It takes tens of minutes to research a target and create a message that's handcrafted for that actual person. Um, but it also has very high success.
The social media pen testing tool that we released today actually combines the automation of phishing campaigns with the effectiveness of spear phishing campaigns. And with that said, I'm John Seymour. My hacker handle is DeltaZero. I'm a data scientist at ZeroFox by day, and by night I'm a PhD student at the University of Maryland, Baltimore County. In my free time, I like to research malware data sets. Alright, and my name is Philip Tully. I'm a senior data scientist at ZeroFox, and in a past life, I was a PhD student at the University of Edinburgh and the Royal Institute of Technology in Stockholm. In that past life, I studied recurrent neural networks and artificial intelligence, but in a much more biologically oriented way. I was trying to figure out how you could combine neurons together, connect them up with synapses, and simulate networks of neurons to try to get some storage and recall of memories. Um, but nowadays, instead of combining different patterns of spikes to create some biological representation of a memory, I'm using similar techniques to combine text, to generate text with AI. This is not necessarily anything new. The field is known as natural language processing, and it's been around for a really long time. One of the fundamental examples that I think is important to talk about is the ELIZA chatbot. This was something that happened over 50 years ago. ELIZA was designed by Joseph Weizenbaum, a computer scientist at MIT, to simulate a psychotherapist.
And he used it in a very clinical setting. He wanted his patients, who were either on their deathbed or close to death, to be able to interact in some way with the computer. So it was very naive, very ad hoc. It was based on parsing and keyword replacement. It would simply do something like: if the input to the program was "my head hurts," it would output something in response like "why do you say your head hurts" or "how bad does your head hurt." Something like this. And these very early examples were inspiring for people, because they passed some very simple versions of the Turing test, right? Using these kinds of questions and this very ad hoc feedback, it was able to fool people into believing that they might be talking to a human rather than a machine. Fast forward 50 years, and Microsoft came out with a neural-network-based chatbot called Tay. If you've seen this in the news recently, it was a dynamically learning bot that was released on Twitter, and it was a really cool idea: each time a Twitter user tweeted at it, it would learn from that tweet and then reply to it. It was a chatbot, and you see this popping up a lot now on Facebook and other social media services, with more of a marketing twist. But what they didn't foresee was the fact that Twitter tends to be a cesspool sometimes, and tends to be filled with porn and sexually explicit content and overall bad stuff. So what it actually turned into was a porn-ridden, racist Nazi bot, and it turned into quite a PR disaster for Microsoft; they had to shut it down. So indeed, we view infosec and machine learning as prioritizing a defensive orientation, right? You set a perimeter, or you try to detect incoming threats, or you try to remediate once something's already happened. The adversary has to do something in order for you to react to it and defend your network or
whatever it may be. So you have some examples here. These are historical Black Hat talks over the last 10 or 15 years. You have some machine learning talks, one or two per year usually, and they cover anything from spam filtering to botnet identification to network defense to intrusion detection. But what we wanted to propose here was that you could use artificial intelligence techniques and machine learning not only on defense; you can use data to drive an offensive capability. We call our tool SNAP_R, the Social Network Automated Phishing with Reconnaissance tool, and it's split up into two separate phases. The first phase takes as input a set of users who you want to target, and it extracts a subset of them that it deems high-value targets, so it prioritizes them. We'll get into more about this later. And then the second phase of the tool takes those users and crafts a tweet directed at them, based on the content that they have on their historical Twitter timeline. The end result of this is a tweet with an @-mention, the crafted, machine-generated text, and then a shortened link, and we measure success using click-through rates. So with that, if anyone wants to partake in the demo we're gonna do later on in the talk, please tweet at the hashtag snap_r, and that's hashtag SNAP underscore R. We're not gonna target you with any kind of malicious payload. It'll be a shortened link that just redirects to google.com or something like that. But if you want to have your timeline read dynamically and then have a tweet spit back out at you, please do that in the next 20 or 25 minutes. So for the talk, I'll hand it off to John to talk about machine learning on offense, and then we'll go into the two parts of the talk.
So we'll talk about the two parts of the tool, target discovery and spear phishing, and talk in more detail about how we generate the message content, which is kind of the core of the tool. And then we'll talk about how we evaluated the tool and how that evaluation compares to other techniques that have been found in the literature. Alright, cool. So the first question is, why is social media such a great place for spear phishing people, right? Why Twitter in particular? There are a lot of answers to this, and we put a few on this slide. The first is that a lot of these social networks are designed to be easy to use programmatically, so it's actually easy for a tool to use the social network itself. And the second is the language. We pulled a sample tweet to use for our talk, and 20 years ago you wouldn't have any idea what this meant. So the idea here is, basically, machine learning tools, especially generative models, tend to be pretty bad, if you've ever seen subreddit simulator and things like that. But the fact is, the bar for a good tweet on Twitter, one that people will be interested in, is so low that even generative models can do pretty freaking well. Some other things: due to character limits, there are a lot of shortened links on Twitter, I don't know if you've ever used it.
So basically, if you're trying to obfuscate a payload or something like that, people don't actually think twice about clicking shortened links on Twitter, right? Because everything's shortened there. Then there's also the fact that people sort of seem to understand email at this point, or at least some people do: Nigerian prince scams, things like that. A lot of people can actually tell you, hey, you get an email, check the link before you click. On Twitter and other social networks, people don't actually think about what they click on. You don't have that sort of years of awareness built up yet, and that's one of the things we're trying to bring about with this talk. And then finally, people actually want to share content on these social networks, right? For example, on Reddit you want to get upvotes, and on Twitter you want people to share and like your content, right? So there's this idea of incentivizing data disclosure. If you're on Twitter, you're sharing a lot of personal information about yourself, about things that you like, things that you enjoy, and that can all be used against you. We wanted to give a quick shout out, actually: at ShmooCon there was a really, really cool talk about phishing the phishers using Markov chains, and that was a huge inspiration for this talk. But getting right into the tool itself: basically, there are some things built into the tool directly, and there are some things that we also add on top of the tool, right? So, things that the tool does directly: it prepends tweets with an @-mention, and on Twitter this actually changes how the tweets are categorized, right?
Tweets that start with an @-mention are called replies, and only people who follow both the person tweeting and the target can actually see those tweets. So if our bot doesn't have any followers, that means the only person who can see the tweet is the target itself, which is actually very useful in determining whether or not an individual target has clicked. Another thing that's built into the tool is that it shortens the payload uniquely per user, and we'll get into that in a bit. That way, for each of the shortened links that we generate, we can check whether or not that particular link was clicked and map that back to the user who clicked it. Also, we triage users with respect to value and engagement. So we have a machine learning model, which we'll talk about in a bit, that runs first, before the tool actually phishes the person, and checks to see whether or not they're a valuable target: whether they interact a lot with the platform, for example. One reason this is useful: a lot of people have what are known as egg profiles, profiles where they haven't changed the default settings. These people tend not to post a lot; they're not very engaged. And we don't want to waste API requests, or risk possible awareness of the bot, by trying to phish these people. So we just go ahead and triage these users out so that we don't have to worry about them. And then finally, the tool itself obeys rate limits. This is because we wanted to release it as an internal pen testing tool. Obviously people can get around that, but we hope you guys don't. That's all I'll say about that. Now, some things that aren't actually built into the tool that are very, very useful. First off: Twitter's actually pretty good if every single post of yours has a link in it.
They're good at finding that and shutting you down. So one of the things we recommend is to post a couple of non-phishing posts in there, or get ready to make a lot of accounts. And another thing: if you yourself, the bot, have an egg profile, nobody's going to actually click on your links; you're not going to get a lot of clicks. People like to see believable profiles before they click links. So, a very high-level design flow of the tool. First, we have a list of Twitter users that we pass into the tool. It goes through each user and asks whether they're a high-value, high-engagement user or not. If they are, it scrapes their timeline to a specified depth, so for example the last 200 or 400 tweets that they've sent, and uses that to either seed a Markov model or a neural network model, and that generates the actual text of the post. After it's generated the text, you can either have it schedule the tweet for a later time when the target is most engaged, and it actually calculates all that for you, or you can post the tweet immediately and have the tool sleep to obey rate limits. That's actually useful if you're doing an onstage demo. But yeah. Cool. So let's get into the tool. I'll talk about the first phase here, automated target discovery. So this is what Twitter looks like, if anyone's been living under a rock for the last ten years. Twitter is full of interesting information and personal information, like John said. You have this incentivization structure for disclosing personal data. And by that I mean it's not necessarily just the content of the posts, the last tweets that were made. You also have super valuable information present for the user in the description field. People on Twitter tend to like to post about what their job title is and what their interests are generally.
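The high-level design flow just described can be sketched roughly as follows. This is a toy sketch under our own assumptions, not SNAP_R's real code: the helper names (`is_high_value`, `scrape_timeline`, `generate_tweet`), the thresholds, and the canned user data are all illustrative, and the generative step is stubbed out where the real tool would seed a Markov or neural model from the scraped timeline.

```python
# Toy sketch of the SNAP_R-style flow: triage targets, scrape a
# timeline, generate text, prepend an @-mention, append a short link.
# All helpers are illustrative stand-ins, not the tool's real API.

def is_high_value(user):
    """Phase 1 triage: drop egg profiles and inactive accounts."""
    return user["has_custom_profile"] and user["tweet_count"] >= 50

def scrape_timeline(user, depth=200):
    """Grab up to `depth` historical tweets (stubbed with canned data)."""
    return user["recent_tweets"][:depth]

def generate_tweet(timeline):
    """Stand-in for the generative model: echo the target's own words."""
    return " ".join(timeline[0].split()[:5])

def spear_phish(users, payload="https://goo.gl/XXXXXX"):
    posts = []
    for user in users:
        if not is_high_value(user):
            continue  # triage out low-engagement targets
        text = generate_tweet(scrape_timeline(user))
        # reply form: @-mention + generated text + shortened payload
        posts.append(f"@{user['screen_name']} {text} {payload}")
    return posts

users = [
    {"screen_name": "egg123", "has_custom_profile": False,
     "tweet_count": 2, "recent_tweets": ["hi"]},
    {"screen_name": "alice", "has_custom_profile": True,
     "tweet_count": 900,
     "recent_tweets": ["heading to defcon this week so excited"]},
]
print(spear_phish(users))  # only alice survives triage
```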
You get different kinds of data, not just text. You have integers, like how many people you're following, how many people are following you, how many lists you belong to. You have a lot of Boolean fields: have you changed your background profile image, have you changed any of your other default settings from the original instantiation of your registration. It's filled with different dates, like your created-at date, and URLs within the text that you post. So this is what the raw API call looks like from Twitter when you grab it. I'll use Eric Schmidt, the former CEO of Google, as the example for this section. So we implement a clustering algorithm, based on machine learning, to group our users. We grab a bunch of Twitter users and we extract features from these API calls across the different users, and here I list a few of the most interesting and most relevant features that we grab. Like I said, in the description, if you have words that tend to correspond to a job title, like CEO, CSO, even recruiter or engineer or something like this, this is probably going to end up being someone who you might want to target. They might have access to some sensitive information, company information or whatever, if they belong to some organization. Also your level of engagement, so how many people are following you and how many people you're following. You can imagine you don't want to target somebody who's not very active on the platform. You want to make sure it's someone who is actively engaged, who is likely to click on links and is getting updates on their phone. The account age is a good piece of information too, the created-at date of the Twitter profile. You don't want to target somebody who's just made the account and is just getting started with the platform. Same thing for hashtag MyFirstTweet.
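The kinds of fields just described can be flattened into a feature vector along these lines. The field names mirror Twitter's v1.1 user JSON (`description`, `followers_count`, `friends_count`, `default_profile`, `created_at`); the job-title keyword list is our own illustrative choice, not something the tool prescribes.

```python
# Sketch: turn a raw Twitter user object into features for clustering.
# Covers the four data types mentioned: text, integers, booleans, dates.
from datetime import datetime, timezone

# Illustrative job-title keywords; a real list would be much longer.
JOB_WORDS = {"ceo", "cso", "ciso", "recruiter", "engineer", "founder"}

def extract_features(user, now=None):
    now = now or datetime.now(timezone.utc)
    desc_words = set(user.get("description", "").lower().split())
    created = datetime.strptime(user["created_at"],
                                "%a %b %d %H:%M:%S %z %Y")
    return {
        "has_job_title": bool(desc_words & JOB_WORDS),  # text feature
        "followers": user["followers_count"],           # integer
        "following": user["friends_count"],             # integer
        "default_profile": user["default_profile"],     # boolean
        "account_age_days": (now - created).days,       # date
    }

u = {
    "description": "CEO of Example Corp, dad, runner",
    "followers_count": 120000,
    "friends_count": 300,
    "default_profile": False,
    "created_at": "Mon Nov 29 21:27:40 +0000 2010",
}
feats = extract_features(u)
print(feats["has_job_title"], feats["followers"])
```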
And then also a good indicator is the default settings. People who tend to engage a lot with the platform will kind of make it fancy: they'll change all the default settings and make it match what their interests are and what they like. So in a nutshell, this is how it works. We take the clustering algorithm, we start out with our target, Eric Schmidt, and we project it into two dimensions. You can imagine now that each Twitter user is represented on this 2D plot as a single point. Again, I'm projecting it into two dimensions; originally it was a very, very high-dimensional feature space with all those different fields, like the description, number of followers, et cetera. Project it into 2D, and Eric Schmidt falls on this 2D plot somewhere there. Great. What do we do with that? We pass it through the clustering algorithm that we have, and I'll talk in the next slide about how we choose that. But once you do something like that, you actually get to extract a subset of these users that you might deem as relevant targets or high-value targets. So up in the left-hand corner, the cluster of red points there might be the group of people that you deem as high-value targets, and the users who belong to the blue and the green points, you want to throw them aside, de-prioritize them. So in the machine learning world, there are many different clustering algorithms you could choose from, and each of those algorithms has a certain set of hyper-parameters that you could tune to optimize your technique and optimize your clusters. How do we choose? We throw a bunch of clustering algorithms into kind of like a grid search, more or less, right? So we have k-means, and a parameter for the k-means clustering algorithm is the number of clusters that you choose a priori, for example.
And you take those, and you fit the models for each of these different algorithms and their sets of hyper-parameters, and you choose the one that maximizes the silhouette score. So the silhouette score is bounded between negative one and one. The more positive the better, and anywhere from about 0.5 to 0.7 and up is considered some kind of reasonable structure. The silhouette score measures how similar a data point is to its own cluster, the cohesion within that cluster, compared to data points outside that cluster, the separation between them. So on this plot, each individual data point, each individual Twitter user, is represented as a horizontal bar, and the hyper-parameters are on the y-axis. If you look at the top plot there, you have two different sets of hyper-parameters for k-means: one might have two clusters, one might have three clusters. So you calculate the silhouette score for each individual data point, and you calculate the average of that, which is shown here by that red dotted line. And then you want to choose the algorithm that pushes that red dotted line as far right as you possibly can get it. All right. Cool. So before we actually get into the cool machine learning models for generating text, we're going to tease you guys a bit with some of the boilerplate that goes around the tweets. One of the first things that we ran into was that we wanted to choose a URL shortener, right? And we wanted a URL shortener with a lot of different qualities, one of them being that it can actually shorten malicious links. So the first thing is, we went out and found a malicious link. We verified using VirusTotal that it is indeed malicious, and we actually went to it, too, in a sandbox and all of that. And we tried it through a lot of different link shorteners.
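The model-selection step described a moment ago, fitting several algorithms and hyper-parameter settings and keeping whichever maximizes the mean silhouette score, can be sketched with scikit-learn. The toy 2-D blobs stand in for the real high-dimensional Twitter feature vectors; the candidate list is our own small illustrative grid.

```python
# Sketch: grid-search clustering algorithms and hyper-parameters,
# keeping the model with the highest mean silhouette score.
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.metrics import silhouette_score

rng = np.random.RandomState(0)
# Two well-separated blobs, so a good 2-cluster solution exists.
X = np.vstack([rng.randn(50, 2), rng.randn(50, 2) + 8])

candidates = [KMeans(n_clusters=k, n_init=10, random_state=0)
              for k in (2, 3, 4)]
candidates += [AgglomerativeClustering(n_clusters=k) for k in (2, 3)]

best_model, best_score = None, -1.0
for model in candidates:
    labels = model.fit_predict(X)
    score = silhouette_score(X, labels)  # in [-1, 1]; higher is better
    if score > best_score:
        best_model, best_score = model, score

print(type(best_model).__name__, round(best_score, 2))
```

With data this cleanly separated, the two-cluster solution wins with a silhouette score well above the ~0.5 "reasonable structure" threshold mentioned above.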
And apparently, goo.gl lets us shorten it, right? Several others also let us shorten it, but goo.gl gives us a lot of other cool things. First off, it gives us a timeline of when people click. And apparently, this link had already been shortened before and people had clicked it, but that's a tale for another time. Goo.gl also gives us a lot of cool analytics, like who referred the link, for example t.co. What browser did the target use? What country were they based in, or at least, what did their machine say they were? And what platform they used: Windows, Chrome, Android, those sorts of things. So yeah. Goo.gl actually looks pretty legitimate. I ran it by a few guys I know, and they were like, hey, yeah, it comes from Google, it's got to be safe, right? And no. It can link to malicious sites; we verified that. It also gives us really cool analytics, which is very useful if you're trying to spear phish internally, right? You want to know which users clicked. Some other cool things it gives us: you're able to create shortened links on the fly using their APIs. So you can say, hey, here's this general payload, www.google.com; let's shorten it uniquely for each individual user and see which individual users actually click on the link. And you can also obtain all of these analytics programmatically, so there's really no manual step that you need at all in this entire process. And we'll go ahead and note that we never actually posted any malicious links to any targets; we just verified that you can shorten malicious links here. So please don't get mad at us about that.
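The per-user link trick can be sketched locally like this: each target gets their own short token for the same underlying payload, so a click on a token maps back to exactly one user. The `sho.rt` domain and token scheme are invented for illustration (goo.gl itself has since been shut down); any shortener with an API, or your own redirector, supports the same pattern.

```python
# Sketch: one unique short link per target, so click analytics map
# back to individual users. Token scheme is illustrative only.
import hashlib

PAYLOAD = "https://www.google.com"  # benign stand-in payload

def shorten_for(user, payload=PAYLOAD):
    """Derive a unique, stable short token per (user, payload) pair."""
    token = hashlib.sha1(f"{user}:{payload}".encode()).hexdigest()[:6]
    return f"https://sho.rt/{token}", token

# Issue one unique link per target; remember who got which token.
token_to_user = {}
for user in ["alice", "bob", "carol"]:
    link, token = shorten_for(user)
    token_to_user[token] = user

def who_clicked(clicked_tokens):
    """Map observed clicks (from the shortener's analytics) to users."""
    return sorted(token_to_user[t] for t in clicked_tokens
                  if t in token_to_user)

_, bob_token = shorten_for("bob")
print(who_clicked([bob_token]))  # exactly one user per token
```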
And then finally, another thing that the tool does out of the box is some basic recon and profiling. Two things it does: it figures out what time the user is likely to engage with the platform, and it looks at what topics they're interested in and tries to create a tweet based on one of those topics. For figuring out when to schedule the post, what time the user is active, we just use a simple histogram of tweet times, which hours that user tweets. Over on the left you'll actually see my own tweet history timings. You can see that I'm most active at 11 p.m. at night; take that as you will. But it's actually very easy to find this data, right? And for topics, when we first started this project, we were thinking really, really complicated, like, you know, super LDA, all the things and whatnot. But what we found pretty early on was that a simple bag of words, counting frequency, does really well for finding topics, as long as you remove all the stop words. So with these two things, we can seed the models and schedule the tool to tweet at a time that the user is likely to respond, and also tweet on something that they're likely to be engaged with. Great. So at this point now, we've taken a bunch of input users and extracted the subset of them that we want to target. We've calculated what they like to talk about, the topic, and we've also determined at which time they're most active on the Twitter platform. So now, how do we go about sending them a tweet that they might be more likely to click on than any random question? We do this in two separate ways. The first way is, we leverage Markov models. Markov models are popular for text generation.
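The recon step just described, picking the busiest posting hour from a histogram and the top non-stop-word tokens as topics, is only a few lines. The tiny stop-word list here is illustrative; a real run would use a fuller list, for example NLTK's.

```python
# Sketch of the recon step: hour histogram for scheduling, plus
# stop-word-filtered word counts for topic selection.
from collections import Counter

# Illustrative stop-word list; use a real one (e.g. NLTK) in practice.
STOP_WORDS = {"the", "a", "an", "to", "of", "and", "is", "i", "at", "my"}

def best_hour(tweet_hours):
    """Most frequent posting hour (0-23) from a target's history."""
    return Counter(tweet_hours).most_common(1)[0][0]

def top_topics(tweets, n=3):
    """Most frequent non-stop words across a target's timeline."""
    words = [w for t in tweets for w in t.lower().split()
             if w not in STOP_WORDS]
    return [w for w, _ in Counter(words).most_common(n)]

hours = [23, 23, 23, 9, 14, 23, 9]
tweets = [
    "the cats at my office are the best cats",
    "i love cats and coffee",
    "coffee is a way of life",
]
print(best_hour(hours))         # busiest hour: 23 (11 p.m.)
print(top_topics(tweets, n=2))  # ['cats', 'coffee']
```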
Like John said, the subreddit simulator or the InfoSec talk title bot. How it works is, using the Twitter API, you can go grab the last X posts on someone's timeline, right? 200, 500, however many you wanna grab. We call this the corpus. So you take your corpus and you want to learn pairwise frequencies of likeliness between these words, right? For example, you might have the word "I" that occurs a lot within this corpus. Sometimes it might be followed by the word "don't," other times it might be followed by the word "like." So based on the relative co-occurrence of these words in your corpus, you can generate a model that probabilistically determines how likely it is to create a string of words like "I like" or "I don't." And you can continue this for the length of the entire tweet. So it's based purely on transition probabilities from one word to the next. On the other hand, we train a recurrent neural network, in particular one called an LSTM. LSTM is an acronym for long short-term memory. This is a bit more cumbersome; it's less flexible than the Markov model. It took five and a half days to train this neural net. We had to do it on an EC2 instance, using a GPU cluster, and the training set was comprised of approximately two million tweets. And we didn't go out and just grab your run-of-the-mill any two million tweets, because like I said, Twitter is a veritable cesspool. We had to go and find legitimate-looking tweets. To do that: Twitter has an account called @verified, and that account in turn follows all the verified accounts on Twitter, all the ones with the blue check mark next to them. The idea was that the people behind verified accounts are probably more legitimate; they're probably posting relevant information. And so we train it on this huge corpus of tweets. For the network properties, we use three layers in this neural network.
And approximately 500 units per layer. The idea here is that neural networks, or at least this neural network in particular, are much better at learning long-term dependencies between words in a sentence. LSTMs are often employed when people want to learn sequences of data, and in this context, you can imagine a tweet or a sentence being a sequence of words, right? So in contrast to the Markov model, which just cares about the pairwise frequency, the word that follows this word, the recurrent neural network considers longer-term dependencies, because what I talk about in the beginning of my sentence might also relate to something that comes later on. This is common in all languages. In English, and even more so in German, actually, you have these long-term dependencies: you might not know what the context of the sentence is until someone finally finishes the last word of it. So what are the differences between these two approaches? The LSTM, as I mentioned, took a few days to train, so it's a bit less flexible, whereas the Markov chain you can deploy and it can learn within a matter of milliseconds, and that scales depending on how many tweets you choose to train it on. The accuracy for both? Surprisingly, it was super high, even though the LSTM is a bit more generic; by that I mean it learns kind of a deeper representation of what it means to be a Twitter post. And I caution myself not to call it English, because as John said, this isn't English. This is kind of Twitterese: it's filled with hashtags and different syntactical oddities and abbreviations. The availability of both of these tools is public. You can go out and download an LSTM implementation using different Python libraries or otherwise, and the Markov chain as well. And in terms of size, the LSTM is much, much larger to store on disk compared to the Markov chain.
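The Markov half of this comparison is simple enough to sketch in a few lines. This is a toy word-bigram version under our own assumptions: a tiny hand-written timeline stands in for a scraped one, and the RNG is seeded so the example is reproducible. It is not the tool's actual implementation.

```python
# Toy word-bigram Markov chain: learn word-to-word transition counts
# from a target's timeline, then walk the chain to emit new text.
import random
from collections import defaultdict

def train_markov(corpus):
    """Map each word to the list of words observed to follow it."""
    transitions = defaultdict(list)
    for sentence in corpus:
        words = sentence.lower().split()
        for cur, nxt in zip(words, words[1:]):
            transitions[cur].append(nxt)
    return transitions

def generate(transitions, seed_word, max_words=10, rng=None):
    """Walk the chain from seed_word, sampling each next word."""
    rng = rng or random.Random(0)  # seeded for reproducibility
    out = [seed_word]
    while len(out) < max_words and transitions[out[-1]]:
        out.append(rng.choice(transitions[out[-1]]))
    return " ".join(out)

timeline = [
    "i don't like mondays",
    "i like coffee",
    "i like cats",
]
model = train_markov(timeline)
print(generate(model, "i", max_words=4))
```

Because `"like"` follows `"i"` twice and `"don't"` once, the walk favors `"like"` at that step, which is exactly the relative co-occurrence idea described above.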
But like I said, the Markov chain tends to overfit on each specific user. The idea being: let's say you're posting today, or in the next week, about the Olympics or something like that. Maybe two months from now, if I go back and read your historical timeline posts and tweet back at you, it might raise your eyebrows, because the Olympics have been over for a while and you don't really care about that anymore. The cool thing about Markov models, though, is that you don't need to retrain them every time. Like I said, they're very flexible; you can deploy them very fast. What this means is that it generalizes out of the box to different languages. It's language agnostic. So if you're posting on Twitter in Spanish, or even Russian or Chinese, entirely different character sets, because it's based on these pairwise probabilities, it's going to dynamically learn which word likes to be followed by which, and you're going to be able to post a tweet back at somebody based on the language they're typing in. So here's an example that's in Spanish. And if anyone here is from a foreign country, with a lot of foreign-language tweets, and wants to volunteer for the demo, again, please tweet at that hashtag, snap_r. We don't like to think of this as only a Twitter vulnerability, so to speak; this can be applied to other social networks as well. They all have pretty accessible APIs. But the idea here is that, with the rise of AI and the rise of machine learning and the democratization of this, as it becomes more and more possible to do this without a PhD, for example, and the technology grows and becomes more available, this is going to become more and more of a problem, right? So the weak point here is the human. This is classic social engineering. Cool, yeah. So before we get into the evaluation results and demo, I just want to say the tool is public.
So for example, there's a version on your conference CDs, and there will also be a GitHub link that we'll tweet out as soon as we get back home to Baltimore. But first: we trained our first couple of models and started wild-testing them, and we were surprised that they did really, really well. I don't know if you can actually see some of the pictures, but for example, we've got a guy in the top right. The first post is what our bot posted, and the second is the guy responding, saying, hey, thanks, but the link's broken, right? We actually saw this quite a bit. And on the bottom, you can see some example tweets from the first models that we made. So, yeah. We used these first couple of models and did some pilot experiments. We grabbed 90 users from hashtag cat, because cats are awesome, and we went ahead and tried to spear phish all of these users, again with benign links. And we were actually surprised at how well the model did right out of the box. After two hours, 17% of those users had clicked through, and after two days, we had between a 30% and 66% click-through rate. The reason that range is so huge is that there are a lot of bots crawling Twitter clicking on links, so we don't know exactly how many actual humans clicked through. If we use the strictest definition of what a human might be, so making sure that, for example, the referrer is t.co and the location matches up with the location listed on their profile, and those sorts of things, that's where we get that 30% number. If we use slightly more relaxed criteria for judging whether it's a human or a bot, the number of people that we think clicked might be up to 66%.
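The strict-versus-relaxed counting just described might look like the following. The record fields (`referrer`, `country`) are illustrative stand-ins for what a shortener's analytics actually return; the strict rule shown, referrer is t.co and click country matches the target's profile location, follows the criteria described above.

```python
# Sketch: classify clicks as likely-human (strict) vs. any click
# (relaxed), from per-click analytics records. Fields are illustrative.

def is_human_strict(click, profile_country):
    """Strict rule: came via t.co AND country matches the profile."""
    return (click["referrer"] == "t.co"
            and click["country"] == profile_country)

clicks = [
    {"referrer": "t.co", "country": "US"},  # looks like a real user
    {"referrer": "",     "country": "DE"},  # headless crawler?
    {"referrer": "t.co", "country": "RU"},  # location mismatch
]

strict = sum(is_human_strict(c, "US") for c in clicks)
relaxed = len(clicks)  # relaxed rule: count every click
print(strict, relaxed)  # 1 3
```

The spread between the two counts is exactly the 30%-to-66% band reported above: the truth lies somewhere between the strict and relaxed tallies.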
Funny story about these initial models: after we saw how well they did, an information security professional, who will remain unnamed, tweeted at us saying, hey, proof of concept or get the fuck out of here. So we went ahead and used him as a guinea pig, and he did click the link. We will say that.

Cool. So then we iterated on the model some, and we decided we wanted to test it against a human, to see how well a human could spear phish people versus how well the tool could. We scheduled two hours on our calendar, and in those two hours the person was able to tweet at 129 people, which comes out to 1.075 tweets per minute, and he got a total of 49 click-throughs. We ran one instance of our tool, SNAP_R, and in those same two hours it tweeted at 819 people, which comes out to about 6.8 tweets per minute, and 275 of those people clicked through. And we want to emphasize that this is arbitrarily scalable with the number of machines you have; the major limiting factors are rate limiting and the posting mechanism.

Cool. So, as a TL;DR: there are two traditional ways of crafting tweets or messages that people will click on. The first is phishing, which is mostly automated already and has a very low click-through rate, between 5 and 14 percent. The other is spear phishing, which is highly manual and takes tens of minutes per target: you have to go out, research your target, find out what they enjoy doing, what times they're interested in posting at, things like that. The best spear phishing campaigns we've seen get up to about a 45 percent click-through rate.
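The head-to-head numbers reduce to simple arithmetic, reproduced here so the rates can be checked:

```python
def rate_per_minute(count, hours):
    """Tweets sent per minute over the test window."""
    return count / (hours * 60)

def click_through_rate(clicks, attempts):
    """Fraction of targeted users who clicked the link."""
    return clicks / attempts

# Two-hour head-to-head from the talk: human vs. one instance of the tool
human_rate = rate_per_minute(129, 2)      # 1.075 tweets/min
tool_rate = rate_per_minute(819, 2)       # ~6.8 tweets/min
human_ctr = click_through_rate(49, 129)   # ~38%
tool_ctr = click_through_rate(275, 819)   # ~34%
```

The human's per-tweet hit rate is a bit higher, but the tool sends over six times as many tweets in the same window, so total click-throughs scale heavily in the tool's favor, and further with each added machine.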
And we actually split the difference: we combine the automated character of phishing while getting pretty close to the effectiveness of spear phishing. And with that, demo gods willing, we'll do a live demo.

Okay, cool. So about 151 of you have actually tweeted. This is the command to run the tool, and we're going to go ahead and run it, hopefully. Cool. I'm actually the first person on the list, because I wanted to make sure something worked. Okay, let's see. What it's doing is pulling down the user's timeline and generating a tweet for that person. Come on, come on. Cool. Okay, here it starts to come out. So here's the post it generated and published at my handle: the text it built from my profile and the shortened link. So you can see that it actually works, and we're not just saying things. Notice that on my timeline you can't actually see that post, because it's posted as a reply. Hopefully, yep, here's where it shows up: in your notifications, not your tweet history, so you're the only one who can see it. And as you can tell, I just got spear phished if I click this link. It's now running through all of you who tweeted at the hashtag, generating text and posting it. We'll leave that running as long as possible, but it probably won't get through all of you while we wrap up the talk. Cool. Thank you, demo gods.

Right, then just a few words to wrap up. Why did we do this? We want to raise awareness and educate people about the susceptibility and danger around social media security.
Like John said, people usually think about email very cautiously: you would never open a link in an email from somebody you've never interacted with before. We want that same culture to take hold on Twitter and on other social networks. Another way you could use this tool: if you belong to a company or some other organization, you can do internal pen testing to see how susceptible your employees are to this kind of attack. A tool like this could generate good statistics for you and help you refine your security awareness programs. You could also use it for general social engagement and staff recruiting: reading things off people's timelines and crafting a tweet geared at them might be a good way to recruit people. Or even for advertising; the click-through rates we're getting are huge compared to generic advertising campaigns.

So, like I said, machine learning is becoming more and more automated. Data science is growing, a lot more companies are hiring data scientists, and the tools in the toolbox are becoming far more democratized. You can easily go out; there's free software you can use to train these models, including the one we'll release today. So the enemy, the adversary, will be able to leverage this kind of technology sooner rather than later.

One way to prevent these kinds of attacks is to enable the protected-account setting on your Twitter account. If you protect your account, we can't go through the public APIs and grab your data. There might also be ways to detect this stuff using, as I said at the beginning of the talk, automated methods like machine learning classifiers.
And also, if you're ever unsure, always report a user or a post if you see a tweet like this. Twitter is pretty good at responding to these reports. We used google.com as the link our shortener redirects to, so feel safe clicking it. If we'd done something funnier, like redirecting to our Black Hat talk, people might get pissed and report us, and we don't want our bot to get banned.

So, in conclusion: machine learning isn't only used defensively; you can also use it to automate an attack. Twitter is especially nice for this kind of thing because people don't really care if the message is in perfect English. It's slang-laden, it's abbreviation-laden, and those things actually help the accuracy of our tool. And finally, the data is out there. It's publicly available, and it can be leveraged against someone to social engineer them. And with that, we'll take some questions. Just step up to the microphone if you have a question.

Have you tried implementing anything like change point detection? I know some research has been done using Twitter for threat analysis, like trying to pinpoint users who work for, say, ISIL or ISIS. Have you done any research using Markov chains or prior-distribution detection systems?

Do you want to take that one? All right, so we haven't done any research into that for the purposes of this talk, but it's definitely a cool thing we'd like to look into, so if you want to talk to us more afterward, we can trade some ideas.

Great presentation.
Quick question about the mobile environment, since you touched on mobile and mentioned smartphones. Could you give me any additional thoughts on that area?

Sure, so we haven't actually measured the differences between how many people click on mobile versus from a PC or something like that, but it's something we can definitely do, so if you're interested, tweet at us and we can crunch the numbers for you.

You mentioned that your neural network version of the text generation performed better than the Markov model in terms of temporal accuracy. What about the neural network causes that, and what would prevent it from talking about the Olympics a month from now? I'm admittedly a noob at neural networks.

Yeah, sure. I definitely recommend looking at some documentation on LSTMs. Neural networks in principle can approximate any arbitrary function. An LSTM is a special kind of recurrent neural network with learned gates between each step, and these gates turn on and off dynamically, which lets the network remember words from a certain depth back in time. It learns these connections on the fly, and because it can switch them on and off, it's able to capture longer-range contextual information in the text.

I have a question about what considerations you had for preventing bias in your training set. Time biases, or even just using the approved Twitter handles, might introduce bias in the data you're looking at. Can you discuss some of that?

Yeah, that's definitely valid criticism.
So you want to avoid common pitfalls like overfitting to specific users, especially in the clustering step. We didn't do any kind of formal evaluation of the LSTM beyond the loss we tried to minimize over time. In terms of the Markov model, we just tuned it until it looked good enough, and it worked in the sense that we ran several tests in the wild, and as soon as we started getting pretty high click-through rates, we got pretty confident it was working.

Fascinating work with some pretty groundbreaking implications. Given that your intent is to fake people out into believing these are real, have you sort of passed the Twitter Turing test, if you will?

Yeah, that's a really good question. The Turing test is really interesting now; I think there are even conferences dedicated to having machines try to pass it. A much simpler version was introduced 40 or 50 years ago, however long ago it may be, and nowadays you have to check a lot more boxes to get past it. And given our click-through rates, it seems like Twitter is a super easy place to do this kind of thing. I would argue that each positive result in our statistics is more or less a passing of the Turing test, the Twitter Turing test, as it were.

For training the transition probabilities in the Markov model, did you only use bigrams, or did you consider a bigger window?

Only bigrams.

Okay, thanks. All right. Thanks again. Thank you.
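As a coda to the LSTM answer above, the gating being described can be sketched in a few lines of numpy. This is a generic single-step LSTM cell with random weights, not anything from the released tool; the point is just that the input, forget, and output gates each sit in (0, 1) and dynamically decide how much old memory to keep and how much new context to admit, which is what lets the model carry information further back than a bigram Markov chain.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U, b hold the stacked weights for the
    input (i), forget (f), output (o) gates and the candidate (g)."""
    z = W @ x + U @ h_prev + b                    # all four pre-activations
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # gates squashed into (0, 1)
    g = np.tanh(g)                                # candidate memory content
    c = f * c_prev + i * g                        # forget old, admit new
    h = o * np.tanh(c)                            # gated view of the cell
    return h, c

# Tiny example: input size 2, hidden size 3, random weights
rng = np.random.default_rng(0)
n_h, n_x = 3, 2
W = rng.standard_normal((4 * n_h, n_x))
U = rng.standard_normal((4 * n_h, n_h))
b = np.zeros(4 * n_h)
h, c = lstm_step(rng.standard_normal(n_x), np.zeros(n_h), np.zeros(n_h), W, U, b)
```

Because the forget gate `f` can stay near 1 for many steps, the cell state `c` can preserve a word seen far back in the timeline, exactly the long-range context a bigram model cannot hold.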