>>So we're here today to talk about a mechanism that identifies the type of device connecting to a wifi network. It can be quite specific. You can tell the difference between an iPhone 5S and an iPhone 5, between a Samsung Galaxy S7 and an S8, between a Withings scale and a Nest thermostat. Classically this kind of client detection would be called fingerprinting, like the OS fingerprinting mechanisms in Nmap. However, in its current usage the term fingerprinting has evolved to mean identification of specific users, like browser fingerprinting. The word fingerprint refers to an individual's fingers, and as the mechanism discussed here identifies the species of the device and not the individual user, we refer to it as wifi taxonomy. We'll get a chance to try it in the last few minutes, during time for questions.
[sound cuts out as speaker continues] ...TCP/IP packets; they're not routable, they don't leave the wifi network. We'll focus on 2 specific types of frames. The probe request is where the wifi client can ask for all nearby APs or one specific AP to respond. The client includes information about itself and its capabilities in the request, and the AP can respond with its own capabilities in the response. We'll also look at the association request, which is where a client joins a wifi network. The client includes many of the same capabilities as were in its probe request, plus a few more. There are a bunch more MLME frames, like authentication or action frames to modify various parameters, but for the taxonomy mechanism we're talking about today we'll just rely on these 2.
Information elements are type-length-value tuples packed one after another in the management frame. They're all optional, though in practice a few are universal because wifi can't work without them. Each wifi standard has added more types of information elements. In the 802.11b days there were very few.
.11g added a few more, .11n and .11ac added a bunch more, and so on. In addition to the standard elements there's a mechanism for vendors to define their own. Vendor extensions are type 221, with an ID for the vendor called the organizationally unique identifier, or OUI, followed by a subtype so that the vendor can define multiple types of their own. Because the length field provides enough information to skip over the IE, any wifi client device can interoperate whether it understands that vendor extension or not; it just skips over the ones that it doesn't implement.
This is the association frame from an iPhone 7 Plus as broken out by Wireshark. The association request includes the SSID that the client wants to join, information about its supported rates and channels, about its power levels and its radio management capabilities, plus three vendor extensions from Microsoft, Broadcom, and Apple. A few of the vendor extensions are very widespread. The Microsoft extension shown here is for prioritization, and it's widely implemented even on devices that are not running any kind of Windows OS. The Broadcom extension is also quite widespread, owing to how common Broadcom chipsets are. The Apple extension shown here was added in iOS 10.2. We don't really know what it is, but it was added on all devices running that version or later.
The signature lists the tag numbers of the IEs that are present in the frame, in the order that they appear, as a text string of decimal numbers. For vendor extensions it additionally includes the OUI of the vendor and that vendor's subtype. For this part of the signature we ended up with the text shown in red on this slide. This part of the signature is most strongly influenced by the OS of the client device, where the client wifi stack is implemented. It's next most strongly influenced by the wifi chipset.
That holds both in terms of the standards it supports and of any vendor extensions that the vendor implements in its driver.
In addition to the tag numbers, a few of the information elements contain capability bitmasks or other information which is useful in identifying the device. For example, 802.11n defines 16 bits of optional capabilities and .11ac defines 32 bits more. This is most strongly influenced by the chipset and the subset of the standard that's implemented by that ASIC. The transmit power information element depends strongly on the board design and how the antennas are laid out. 2 devices built by the same manufacturer using the same software, or even using the same wifi chipset, will often have different TX power values because their board layouts are different. The number of antennas that are present is encoded in both the .11n and the .11ac capabilities, and it's also indicative of the board design. And there's an extended capabilities bitmask which contains even more optional elements. It's most strongly influenced by the driver and the WPA supplicant software. A number of the capability bitmasks are appended to the signature to further differentiate it, also shown in red on this slide.
Looking at the signature as we've discussed it so far, it has become more complex over time. This shows the association request portion of the signature for 3 devices. The first is from an original iPhone, which is a .11g device. This taxonomy mechanism wouldn't have worked very well in that time frame. There was very little differentiation between devices. iPhone 4S is a .11n device introduced about 4 years later, and it added a number of options to its management frames. iPhone 7 is from about 5 years after that; it's a .11ac device and it added even more.
The full signature contains the list of IEs and the various bitmasks from each of the probe request and the association request, separated by a pipe.
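As a rough illustration of the steps just described, here is a minimal Python sketch that walks the type-length-value elements of a management frame body and builds a signature string from them. It is a sketch under stated assumptions and not the hostapd implementation: the function names, the `probe:` and `assoc:` labels, the `htcap:` field name, and the exact text formatting are all assumptions made for the example, and only one capability bitmask (the first 16 bits of the .11n capabilities element, tag 45) is appended.

```python
import struct

VENDOR_TAG = 221   # vendor extension information element
HT_CAP_TAG = 45    # 802.11n (HT) capabilities element


def parse_ies(body: bytes):
    """Yield (tag, value) for each type-length-value element in the body."""
    pos = 0
    while pos + 2 <= len(body):
        tag, length = body[pos], body[pos + 1]
        value = body[pos + 2 : pos + 2 + length]
        if len(value) < length:
            break  # truncated element: stop rather than guess
        yield tag, value
        pos += 2 + length  # the length field is what lets us skip unknown IEs


def frame_signature(body: bytes) -> str:
    """List IE tags in order of appearance, then append selected bitmasks."""
    tags, bitmasks = [], []
    for tag, value in parse_ies(body):
        if tag == VENDOR_TAG and len(value) >= 4:
            oui, subtype = value[:3].hex(), value[3]
            tags.append(f"{tag}({oui},{subtype})")
        else:
            tags.append(str(tag))
        if tag == HT_CAP_TAG and len(value) >= 2:
            (htcap,) = struct.unpack_from("<H", value)  # 16 capability bits
            bitmasks.append(f"htcap:{htcap:04x}")       # field name is assumed
    return ",".join(tags + bitmasks)


def full_signature(probe_body: bytes, assoc_body: bytes) -> str:
    """Join the probe and association parts with pipes, after a format tag."""
    return (f"wifi4|probe:{frame_signature(probe_body)}"
            f"|assoc:{frame_signature(assoc_body)}")


def make_ie(tag: int, value: bytes) -> bytes:
    """Helper for building sample data: pack one element as tag, length, value."""
    return bytes([tag, len(value)]) + value


if __name__ == "__main__":
    # Made-up sample body: SSID, supported rates, HT capabilities, one vendor IE.
    sample = (make_ie(0, b"test") + make_ie(1, b"\x82")
              + make_ie(45, b"\x2c\x01" + bytes(24))
              + make_ie(221, bytes.fromhex("0050f202") + b"\x01\x01\x00"))
    print(full_signature(sample, sample))
```

The sample bytes are invented for the example and do not come from a real device. A fuller version would append the other bitmasks mentioned in the talk (the .11ac capabilities, transmit power, and extended capabilities) in the same way.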
The whole thing is prefaced by wifi4 because this is the 4th iteration of the signature format. Prepending that string allowed the wifi 1, 2, and 3 signatures to remain in the database while we were working on updating everything. We shall speak no more of the earlier formats. When you include all of this in the signature, it ends up being quite distinctive, and it allows us to identify what the device is. The taxonomy signature is influenced by the client OS, by its wifi chipset, and by its board layout.
The current database of signatures identifies the most common wifi devices, which are overwhelmingly phones nowadays. We have signatures from the most widely sold phones and tablets over the last few years, and a selection of other types of devices, like media streaming devices from Google, Apple, Roku, Amazon and so forth, and internet of things devices from Nest, Honeywell, Withings and so on.
For larger devices like laptops and desktops, which use a separate wifi card, this mechanism identifies their card. We had signatures for some laptop and desktop devices in the database, but it was kind of ridiculous. There was one model of Apple's AirPort Extreme card which could be a MacBook or an iMac or a Mac Pro, basically any machine of that generation; we couldn't tell them apart using this mechanism. Intel Centrino chips as used in Windows laptops are even less distinctive; it could be basically anything. So at this point we don't even try. We don't add signatures from laptops or desktops into the database; it just tends to result in confusion and isn't very useful.
Additionally, there are a few classes of device which we choose not to gather signatures for. First, we only want to focus on common devices, devices that lots of people are likely to have, and we used lists of top-selling consumer electronics over the last few years to target devices that we want to gather signatures for.
If it's something that isn't very common or is unique, we don't really want to put it in the database. The other set of things we don't add to the database is things that would make people uncomfortable if they saw them in the list of devices on their router. That includes various medical devices, devices of an adult nature, home incarceration monitoring devices and so forth.
Many devices have been seen to emit more than one signature, and so there's more than one entry for them in the database. For devices which support both 2.4 and 5 gigahertz operation, the signatures are almost always distinct. There are information elements that are only defined for one band or the other, and the whole of .11ac is only defined for 5 gigahertz operation. So if the device supports both bands, we gather signatures from each of the 2 bands. However, even in the same band, devices often have multiple signatures. They vary what they advertise based on local conditions like noise. This example shows 2 signatures from a Google Pixel phone. It varies its handling of beamforming, presumably based on the noise environment that it sees.
Clients can also behave differently depending on what they see from the AP in response to their probe request. For example, if the AP says that it supports radio resource management, most Apple and some Android devices will include a spectrum management IE in their association request. That's IE number 70, highlighted in red in that list. Another example is that although .11ac is only really defined for 5 gigahertz operation, many vendors have a proprietary extension to it which makes it operate on 2.4 gigahertz, and we will see the .11ac fields in their probe request. They typically only then include it in the association if they see the magic proprietary handshake back from the AP, and so it won't be in the association request.
So when capturing signatures for the database we use 3 different APs to maximize the chance of capturing different signatures.
Sometimes we see the same signature from multiple devices. These examples are all devices using the Broadcom 43362 chipset, running Linux, using the same driver and the same WPA supplicant, and they're all old enough that they don't have a transmit power information element. The signatures are identical. They're an Amazon Dash button, a First Alert thermostat, a Nexus 7 from 2012, a Roku HD and a Withings scale. In most cases like this we distinguish them using the upper 24 bits of the MAC address, which is an organizationally unique identifier. OUIs are assigned to the manufacturer, and adding the OUI as a qualifier can distinguish similar devices from different manufacturers which have the same signature.
We sometimes also use information from DHCP. The options present in a DHCP request can identify the OS. This was originally developed by the Fingerbank project, and that whole mechanism inspired this mechanism for wifi. However, using DHCP gets us further and further from the wifi layer, and so we try to be sparing in using it. In particular, only the access point will be able to see the DHCP request unencrypted. Other devices, like sniffer devices that might want to use this mechanism, would not be able to rely on DHCP.
However, there remain a few cases which are still troublesome, mainly devices made by the same vendor using the same software and the same chipset at about the same time. Often the transmit power information will distinguish them due to the different board designs, but not always. For example, iPad Air second generation and iPhone 6S have the same signature. We can try to use heuristics, like if the DHCP hostname contains the string iPad it's probably an iPad, but if nothing else we return all of the possibilities and say that it's one of these.
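The disambiguation steps above could be sketched roughly as follows. This is a hypothetical illustration and not the code or data layout of the published database: the `DATABASE` structure, the `identify` function, the placeholder signature strings and the placeholder OUI values are all invented for the example, and the hostname check is a simplified stand-in for the heuristic described.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: signature -> list of (OUI qualifier or None, device name).
# The signature strings and OUI values below are placeholders, not real data.
DATABASE: Dict[str, List[Tuple[Optional[str], str]]] = {
    "sig-shared-chipset": [
        ("aa:aa:aa", "Amazon Dash Button"),
        ("bb:bb:bb", "Withings Scale"),
        ("cc:cc:cc", "Roku HD"),
    ],
    "sig-same-vendor": [
        (None, "iPad Air (2nd gen)"),
        (None, "iPhone 6S"),
    ],
}


def identify(signature: str, mac: str, hostname: str = "") -> List[str]:
    """Return every device name consistent with the signature and qualifiers."""
    entries = DATABASE.get(signature, [])
    # The upper 24 bits of the MAC address are the manufacturer's OUI.
    # Assumes a colon-separated MAC such as "bb:bb:bb:12:34:56".
    oui = mac.lower()[:8]
    candidates = [name for qualifier, name in entries
                  if qualifier is None or qualifier == oui]
    # Heuristic: if the DHCP hostname mentions a product word, prefer it.
    if len(candidates) > 1 and hostname:
        narrowed = [name for name in candidates
                    if name.split()[0].lower() in hostname.lower()]
        if narrowed:
            candidates = narrowed
    # If nothing narrows it down, all remaining possibilities are returned.
    return candidates


if __name__ == "__main__":
    print(identify("sig-shared-chipset", "BB:BB:BB:12:34:56"))
    print(identify("sig-same-vendor", "00:11:22:33:44:55", "Janes-iPad"))
    print(identify("sig-same-vendor", "00:11:22:33:44:55"))
```

Returning a list rather than a single name keeps the ambiguous case explicit, so a caller such as a router UI can decide how to present "one of these".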
This mechanism was originally developed as part of a wifi AP project. We intended to focus on identifying the wifi chipset the client was using. We thought that if we could just know what that chipset is, then we'd be able to implement all kinds of very clever bug workarounds and we would make wifi perfect. As it turns out, if bugs can be worked around easily, they mostly get worked around in the client software. Who knew?
Instead, where this kind of information is currently used is in the UI of the router, where there's a list of connected clients. We can get an indication of what the client is. If the client included a useful hostname in its DHCP request, then that's great. If it didn't, or if it includes something like its serial number as its name, then it's much more helpful to say what we think it is, to help the user identify it.
We also use it to correlate with other performance information, to break it out by the kind of client device. My colleague Avery Pennarun gave a talk at Netdev 1.1 on this topic. The graph on this page is from that talk, and it shows wifi throughput getting better and better as the client gets closer to the AP, until it gets really close and then it starts dropping again. That's unusual. Most devices don't do that, and you can only see that this is happening if you break it out by the type of device and see that some of them do some weird things.
In the future we may use the mechanism for more. We might use it for optimizations based on the type of client device. In particular, if we can know how well it handles packet reordering, we could use that to get lower latency on average by allowing the occasional packet to arrive out of order rather than buffering all of them to keep them all in order.
Also, wireless intrusion detection systems might be able to use information like this: if they think they know what kind of device this is, then they know what sorts of network activities would be reasonable from that device.
Other resources. We published a paper about the mechanism which goes one level of detail deeper into how it works. And the Netdev talk that I mentioned earlier is linked from the slides, which you'll be able to get at after the talk. That talk described the overall environment where this mechanism was used and how it was used in that environment.
So, the current status. The implementation to extract signatures for clients went into hostapd in August of 2016, and it's present in hostapd 2.6 and later. The database of known signatures is released as open source code with an Apache license on GitHub, and the link is also in the slides. It currently identifies about 60 percent of wifi clients across a broad swath of the market. The remaining 40 percent of devices are mostly laptops and desktops, but with a very long tail of just other stuff where we don't know what it is.
So what comes next? There's this thing which can identify an interesting subset of wifi client devices. The signature mechanism is in hostapd and the database has been released as open source, but it's only useful if it's integrated into other products and systems: wifi APs, wireless intrusion detection systems, and anything else interesting that people think of. So one of the main reasons for this talk is to build awareness that this thing exists and it's available for use.
Other things that we need to do are to develop better tools for gathering signatures. It's a pretty manually intensive process right now, which means me. Also, the longer we've been at it, the more we realize that the client responds to things that it sees from the AP. We've been using 3 APs for a long time.
We need to start using even more, and more different types of, APs to make sure that we're getting the different signatures devices can emit.
Other things that might happen in the future. This talk has been all about how APs can identify client devices, but running it in reverse would probably work as well. A client device could list off the information elements that are present in the beacon that it sees from an AP, and maybe in the probe response that it sees from the AP, and use that to identify what type of AP it's talking to. Then, for any kind of performance or quality measurements that it does, the client can also associate them with the brand and model of AP.
So, I surveyed coworkers about whether to run a demo, and as you can see the results were quite incorrigible. [laughter] So, you can try it, and you might be able to try it. Let me move it back. You might be able to try it, okay. You can [laughter] you can join. The SSID is Smell Of Wifi Talk and the password is all lowercase, smell of wifi talk, and the system will try to identify what kind of device it is. I, to make sure that the demo worked, used a Nexus 4. [laughter] There ya go. [laughter] Nexus 6P. [laughter] Any questions?
>>Yes, concerning that poll, it seems impossible to train 100,000, I mean 100,000 polling place captains to use wifi [inaudible questions] [laughter]
>>So the question is about voting and polling systems?
>>Yes, voting and wifi intrusion.
>>Um
>>That seems like an impossible problem.
>>Yes, I would not use this mechanism for voting, for protecting voting polling places. Other questions? Okay. Thank you. [applause]