>> Hello, everyone. Thank you for coming. My name is Fernando Arnaboldi and I work as a security consultant for IOActive. I would like to show you today how XML applications are vulnerable to multiple practical attacks. And for that matter, the very first question that I would want addressed if I were sitting over there is: why are we talking about XSLT? This is a programming language that is not so common. It was created when XML was created, and it was a way to parse XML data. So, a couple of years ago Ariel Sanchez, a coworker, found an external entity expansion that allowed him to retrieve some passwords. And I thought, this is pretty cool. I want to learn more about XML, schemas, and XSLT. Those are all the technologies related to XML. And as I was doing that, I noticed that there was nothing talking about how to exploit XML implementations. When you are reviewing a language, you may want to know that. So what we will be doing here is to analyze what those weaknesses are, and we will present five different issues and how you can practically exploit them. So whether you are reviewing code, you are a penetration tester, you are developing technology related to XSLT, or you are just trying to abuse an implementation, hopefully this will come in handy. One of the things is that even today none of the weaknesses has been fixed, so everything will work. And you may be able to affect the confidentiality and the integrity of multiple implementations. And that means that you may even get some profit in certain areas. So, the good thing is that you are not exploiting flaws in the way that a malicious virus or malware would. You will see no assembly code. You will see just XSLT and how that can be used to get some fun things. So, we will briefly talk today about how you can identify your target, how numbers will let you affect the integrity, how random numbers may be predictable sometimes. 
I'll show you how to bypass the same-origin policy in a web browser using XSLT, and finally some information disclosure through errors. So, basically the idea here is to tell you briefly what XSLT does and how XSLT can be attacked. And finally, if you didn't know-- >> How's it going? >> Hello. I was actually expecting this. >> I know. That's why I'm here. So, where's the mic? This is your first time at DEF CON, huh? >> This is my first time at DEF CON, yes. >> How's it going? >> A little bit anxious. >> You seem a little nervous. Do you want me to rub your shoulders? >> I would be a little bit uncomfortable. >> We have a medicine for that. >> It comes in a bottle, right? >> Yes, it does. I gave him the right one, right? >> Why am I getting the water? I don't want the water. >> Moron. >> Welcome to DEF CON. >> Thank you very much. >> Yes. >> [Clapping]. >> That was interesting. >> You feel much better, don't you? >> I feel like a madman. Thank you very much. >> You want to talk directly into the mic... very close to it. (Laughing) I'll show you later. >> I would expect that... and as I was saying, you can identify your target right here. Thank you. >> So basically, XSLT is a language that's being used to modify an XML document. What it does is receive an XML document as input and create a text document, an HTML document, or a new XML document for that matter. So, there are different versions when it comes to XSLT. There are three: V1, V2, V3. The different versions don't mean that they are improvements... well, they are, or should be, but each version has more functionality. And V1 is the most widely implemented version because it is supported by web browsers and because processors for the later versions support it as well: an XSLT processor supporting V2 will also be supporting V1. So, I tested two types of software: server-side processors and client-side processors. 
Server-side processors are standalone tools that you can run from the command line, or they could be libraries hooked up to different languages (Inaudible) Java, whatever. When it comes to client-side processors, basically I believe you have two types: you will have web browsers or eventually XML or XSLT editors, and I believe that a very narrow set of people are using those. So, the processors in the libraries are mostly three. These are the most important ones, developed by (Inaudible) Apache. libxslt is the most widely deployed one. It's not only used by server-side processors, but also by client-side processors, web browsers. And you also have Xalan, developed by the Apache people, which comes in two flavors, C++ and Java. And a similar thing with (Inaudible). And for the client-side processors, here we have browsers. Everything that I tested was the latest available version of all the servers, libraries, and web browsers. So, we have three ways to do this. The first one involves an XSLT processor receiving an XML document and an XSLT document. This only happens when you are calling a command-line processor. And eventually you will get a new document. People will be using this if they need to parse XML on the server side. Another possibility, which is more common from a client perspective, is when the XSLT processor is fetching the XSLT document. There is a small portion in the XML that says: you will find the XSLT document here, go get it for me, and create a new document. And finally, you can embed the XSLT document along with the XML, and by doing that you are supplying just one file to the processor to get the new result. So you might want to know, if you don't know already, who your target is and which kind of properties the target has. By finding out which version and vendor they have, you may know what type of vulnerabilities you could exploit in this target. 
Since clients may also support JavaScript, and that would be the case for a typical web browser, you may retrieve some JavaScript information. All the code that you see here, and that you will be seeing here, is in the white paper, and you can copy and paste it and try it on your target of choice to see what happens. At the end of each section I will show you a brief summary. First the server side. Here we have all of them. (Inaudible) all related to libxslt in this example. Then you have the clients, which would be the web browsers. You will see the first column for the version, the vendor over there, and whether they support JavaScript or not, and basically all web browsers support JavaScript. And there is one final thing. Normally, libxslt is more widely deployed than other things, so you will notice that sometimes when an issue reaches (Inaudible) it may also affect the client one. So let's talk about the issues. The first one is something present on the client side and the server side. And it doesn't matter if you are talking about floating point numbers or integers, all numbers will introduce errors in here. So as I was testing this, it felt a little bit weird that sometimes the arithmetic was not working as I would expect it to. Certain additions and subtractions were not doing what I was expecting. So, the very first thing that I did was define a simple calculation. What I was trying to do was just add a few numbers. So for that matter, I have (Inaudible) specific output here, it says text output. In the middle, you will have this simple thing: 0.2 + 0.1 - 0.3. That should be 0, right? Pretty simple. It may not be that simple for processors. Only two said that that was 0. That was the case for (?) and Chrome. The rest said, well, close to that. Why is this happening? This is weird. The weird thing is that you'll see this across all implementations. Ok. This is cool. But it would be better if we could do something with this. 
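The calculation from the slide can be checked in any language that uses IEEE 754 doubles. What follows is a minimal sketch in Python, which is an assumption of this note and not one of the processors tested in the talk. It evaluates the same expression with binary floating point and, for contrast, with decimal arithmetic.

```python
from decimal import Decimal

# The same expression that the talk feeds to the XSLT processors,
# evaluated with binary floating point (IEEE 754 doubles).
result = 0.2 + 0.1 - 0.3
print(result)        # a tiny non-zero residue
print(result == 0)   # False

# For contrast: decimal arithmetic represents 0.1, 0.2 and 0.3 exactly,
# so the residue disappears.
exact = Decimal("0.2") + Decimal("0.1") - Decimal("0.3")
print(exact == 0)    # True
```

The residue is the rounding error left over because 0.1, 0.2 and 0.3 have no exact binary representation.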
I mean, this shows numbers that were not properly rounded by the programming language. And this is something that is present in all programming languages. I realize that you have this in JavaScript, Python, whatever. This is a common thing. Floating point numbers will have certain decimals hanging around over there that you may take. So I created a simple JavaScript application simulating a bank. This is not my real bank. I wouldn't try this on my real bank. Hopefully my real bank would limit the number of transactions, and it wouldn't allow a very small decimal to be transferred from one account to another. So the very first thing that I tried was this: in this application I deposit a million dollars in the first account, and the second account has a 0 balance at the moment. This is where I will deposit the profit. I noticed that if I remove a very small number from the million dollar account, it will not get subtracted, but it will be added to the secondary account, because that account has a lower number than a million, it has a 0, and that decimal means more for a 0 than it does for the million. So this program will first try to see how big a number it can retrieve. It is a small number. And then it will do millions of transactions to move it to the secondary account. So you will see here that we will be using V8, which is the Chrome JavaScript engine, (Inaudible) and we'll try to see what's the best amount to take here. How much money can I steal from a million dollars without it being noticed? And I will try moving that money from account number 0 to account number 1. And hopefully that will give me a daily profit of around $1,300. It's moving, right? Yeah. So, this was good. But it would be better if it were a higher number. So let's talk about integers. (Inaudible) This should be fairly easy to understand even if you are not a developer. 
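The bank demo just described can be sketched roughly as follows. This is a Python approximation and not the JavaScript application from the talk. The helper names (`largest_unnoticed`, `transfer`), the halving search, and the number of transfers are all assumptions made for illustration, and the sketch does not try to reproduce the $1,300 figure.

```python
def largest_unnoticed(balance):
    """Halve a candidate amount until subtracting it no longer changes balance."""
    amount = 1.0
    while balance - amount != balance:
        amount /= 2
    return amount


def transfer(accounts, src, dst, amount):
    """Naive transfer using floats, with no minimum amount and no rounding."""
    accounts[src] -= amount
    accounts[dst] += amount


# Account 0 holds a million dollars, account 1 starts empty.
accounts = [1_000_000.0, 0.0]
amount = largest_unnoticed(accounts[0])

# Move the tiny amount many times (hypothetical transaction count).
for _ in range(100_000):
    transfer(accounts, 0, 1, amount)

print(accounts[0] == 1_000_000.0)  # the source balance never moved
print(accounts[1] > 0)             # yet the destination account grew
```

The subtraction is lost to rounding next to a large balance, while the same amount is kept when added to a balance near zero. A real ledger would avoid this by using integer cents or a decimal type and by rejecting transfers below a minimum amount.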
You'll see that you have five exponential numbers in here, and the same five numbers written as the number one with a bunch of 0s. The thing is that programming languages do not handle numbers well once you go past 16 digits, because of precision. What I tried to do was print the same number that I had in the XML document and then format it with commas and periods and such so it would be more legible. In here you will see that Saxon is doing great. This is what you want to see. You will see the number one, followed by a bunch of zeros. This is pretty clear. This is awesome. You will have the same for the non-exponential notation. Internet Explorer and Firefox are good. They weren't able to show the exponential notation. But that's ok. The problem comes when you are introducing errors. Because there is nothing worse than believing you have the right number when in fact you do not. In fact, I was noticing today that how the numbers end is different depending on whether you are using exponential notation or not. We will try to use this number in a couple of minutes. Same for Java, almost there. For C, they just don't care about what's going on over there, so anything can happen there. First thing, this is something related to an error in this download, so I want to read this download (Inaudible) But the problem is not with the error. All implementations have problems. It's what you do with floating point numbers and integers that matters. You should be saying: Ok, a number should be within this range, and not allowing a value to be so big if you are not able to handle it. Either way, this shouldn't be working like this. So I reported all of the issues, including this one, the floating point numbers, to the vendors. And the first thing that I heard was that I should be reading Wikipedia to understand how floating point numbers work. That was interesting, but you probably wouldn't find the answers there. 
Then, I heard that I should be reading the (Inaudible), that this was affecting purely V1. That was nice as well, but it clearly was not solving the problem that we have here. And the very same person also said that this is something that you'll see in JavaScript as well. That's fine, I know that you can find this in JavaScript, but I don't want to have this in my programming language, or in any programming language, because these numbers are everywhere. So, we stole some decimals before, and now we are trying to do a similar thing but with integers. The thing is that if you take a number one followed by 17 zeros and subtract the number one, the programming language will not notice that the one is missing. So, I created a fake cryptocurrency, which I have named Fakecoin. Its value is very small, very, very small. So I bought a one followed by 17 zeros of this coin. So I have a lot of coins with a total net value of $1,100,000. And I will try to transfer one coin at a time to a secondary account, which will be my profit account. And hopefully by the end of the day I will have a better profit than by moving decimals. The profit would be better if I used more coins, because I would be able to transfer more coins at the same time. Here I am just going for the minimum amount possible to show you. The minimum amount here gave me a profit of $2,300. If you add a 0 to the coins, you should add a 0 to the daily profit as well. So that was nice. The very next thing that I did was to see how random numbers work. If you have ever developed software, you have needed random numbers. This is something that you should normally see on server-side processors. And you should also know that of course not just any random number generator should be used (Inaudible) Random numbers, you have to be careful with them. In XSLT, the random function comes from EXSLT, which is an extension of XSLT. 
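Going back to the Fakecoin transfer for a moment, the effect is easy to reproduce wherever large integers end up stored as IEEE 754 doubles. A minimal Python sketch follows. It assumes the balances are floats, as they effectively are in XSLT 1.0 and JavaScript; the variable names and the number of transfers are hypothetical.

```python
# A one followed by 17 zeros, stored as a double like an XSLT 1.0 number.
coins = float(10**17)
profit = 0.0

# Transfer one coin at a time (hypothetical number of transfers).
for _ in range(1000):
    coins -= 1.0   # lost: adjacent doubles near 1e17 are 16 apart
    profit += 1.0  # kept: small integers are exact

print(coins == float(10**17))  # True, the big balance never changed
print(profit)                  # 1000.0

# Python's own integers are arbitrary precision, so the same bookkeeping
# done with int is exact.
exact_coins = 10**17 - 1000
print(exact_coins)             # 99999999999999000
```

The comparison with `int` reflects the point made in the talk: the implementation should either handle numbers of this size exactly or refuse values it cannot represent.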
The random function is defined as one that returns a value between 0 and 1, as any random function should. Supposedly, any random number should be a number that doesn't follow any pattern. You shouldn't know what the number will be before calling this math function. That would be fairly logical from a random point of view. So, we normally have two types of random functions. If you have ever developed software you may know that you have functions that are less secure, like random.random in Python, and you have more cryptographically secure mechanisms, like SystemRandom in Python. You may want to use the latter if you are dealing with cryptographic things. And for some of the software that I tested, the server-side processors, you are able to see the code, you are able to see how it was developed. (Inaudible) and Saxon, you will see that in all those cases they are using a pseudorandom generator. Which is fine. The problem may come with the implementation. If people are using these random numbers for any cryptographic purpose, that may be a problem, because with a certain random number generator you may know what is going on. This was a point that we would see in C and C++ and Java. And a good definition comes from (Inaudible). These are chosen by a random number generator. You have to take that into consideration and you shouldn't use them for cryptographic purposes. But there is one more thing when it comes to random numbers that you normally pay attention to, or at least you should: what happens if there is no initialization vector. This is something basic for any random number generator. You need to have something that's changing when you are getting a random number, otherwise you may always get the very same value. And that's not very useful if you're expecting a random number, because you may know in advance which numbers you will be getting. So once you have a proper IV in place, you will have different values every time that you are calling the random function. 
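What the talk calls the initialization vector is usually called the seed of the generator. The effect of a fixed seed can be sketched with the Python functions mentioned above. This is an illustration only: the seed values and the `first_values` helper are assumptions of this note, and the code does not reproduce the numbers from any XSLT processor.

```python
import random


def first_values(seed, count=3):
    """Return the first `count` outputs of a generator started from `seed`."""
    gen = random.Random(seed)
    return [gen.random() for _ in range(count)]


# Two separate "runs" that start from the same fixed seed produce
# exactly the same sequence, so every value is known in advance.
run_a = first_values(1)
run_b = first_values(1)
print(run_a == run_b)                      # True

# A different seed gives a different sequence.
print(first_values(1) == first_values(2))  # False

# SystemRandom draws from the operating system's entropy source and
# cannot be seeded; it is the variant meant for security-sensitive use.
secure = random.SystemRandom()
token = secure.random()
print(0.0 <= token < 1.0)                  # True
```

A generator that is never seeded behaves like the fixed-seed case: every execution starts from the same internal state.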
But let's see again how the functions that we saw before are working with the initialization vector. And here there is only one that doesn't have the IV. Again, libxslt. This is not something new to libxslt. They have known about this since 2006, but this is how it works. So, if you try to create an XSLT that will call a random function, or you see anyone who is trying to produce a random value out of libxslt, you will see something like this. And you will see these kinds of results if you are executing that on the command line. I executed it twice on the same terminal and I got the same number twice. You can see that 7.82: you always get that first number every time that you are executing the random function from libxslt. So, the next thing that I did was try to understand how this can be used in cipher modes when they are doing block ciphers. That is not a way to cipher things if you are using random like this. So I created two executions (Inaudible) to understand what these numbers look like. First I printed the Python version of random.random. And you get two different numbers. Of course, these are from a pseudorandom number generator. They may not be the best, but they are not predictable, and they are not the same every time that I execute that function. But with libxslt you can recognize again the very same number that we saw on the previous slide, that 7.82 thing. That is the very same number. If you are calling Python again with a print of the random.random function, we will see that we have again two different numbers. So, so far, four for Python and one for libxslt. If we are calling libxslt again, you will notice that in second position we'll always have this new value, the 0.13, and it will be repeated every second time they are calling this. So, without having an external (Inaudible) you may know in advance the sequence of numbers that will be generated by libxslt. 
That is pretty cool, because you may know in advance what's being encrypted if they are using this to encrypt something, which would be pretty ridiculous. So again, you may predict values when you see random numbers. The same-origin policy is something present in client-side processors, which means web browsers. Basically it says that if you are on a website, you shouldn't be reading information from other websites, but again that may not be the case for (Inaudible) So this is important: the origin is always defined by the scheme, the host, and the port of a URL. What would be an example of this? The http at the very beginning, or https, would be the scheme. The host would be example.com, and the port would be either port 80 or port 443 or something like that. Generally speaking, when we are retrieving documents from different origins, the web browsers will not share the information. I mean, when we are accessing the same origin over and over, we'll be sending the same cookie over and over to the same website, and that would be ok. Normally, JavaScript is used to try to deter this, but you don't necessarily need to use it to affect the same-origin policy. You should be expecting that when you are connecting to Google.com your browser will be sending the very same cookie to that website, because it has the very same origin. If you are connecting to Microsoft.com you should be seeing a different cookie. This will be a very (Inaudible) You just connected to the website and you are on the main webpage and you are trying to access a second webpage that's being stored over there. That will be fine. That's ok, you are allowed to see that. In fact, you are even allowed to see other webpages on the very same domain. But if you are changing the scheme, if you are changing the host name, or if you are changing the port, you shouldn't be allowed to see any of the information that is present on that other website. 
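The scheme, host, and port comparison described above can be sketched in a few lines. This is a simplified Python illustration and not how any particular browser implements the check. The `origin` and `same_origin` helpers, the default-port table, and the example URLs (including the local file path) are assumptions made for this note; real browsers also apply special rules to `file:` URLs that this sketch does not model.

```python
from urllib.parse import urlsplit

# Default ports for the two schemes mentioned in the talk.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) triple that identifies an origin."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def same_origin(url_a, url_b):
    """Two URLs share an origin only if scheme, host and port all match."""
    return origin(url_a) == origin(url_b)


# Same scheme, host and (default) port: same origin, different pages.
print(same_origin("https://example.com/index.html", "https://example.com:443/other.html"))

# Changing the scheme, the host or the port gives a different origin.
print(same_origin("http://example.com/", "https://example.com/"))
print(same_origin("https://example.com/", "https://www.example.com/"))
print(same_origin("https://example.com/", "https://example.com:8443/"))

# A local file and a remote site never share an origin under this model.
print(same_origin("file:///Users/me/Desktop/test.xml", "https://www.bing.com/"))
```

Only the first comparison yields `True`; the last one corresponds to the situation in the demo, where a local file reads a document from a remote site.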
I mean, you are not sharing private information between websites. That's what you would expect at least. So, there is only one function that reads documents, and that's document(). Ok, so you may try to use that to read another XML document. In fact, since we're speaking about websites, we could also read XHTML, which is a fairly common way for certain web servers to represent a webpage. Once we retrieve the XHTML document, we could see what's inside using either of these two functions (Inaudible) It will show either an XML representation or (Inaudible) representation. So, the very first thing that you need to do if you want to use this is find a server that uses XHTML. Ok. Bing.com uses XHTML. I'm logged in here. What can you do with this? In the upper right corner you will see that my name is in a red box, and that is also reflected in the code. Since it is XHTML, this is some sort of XML, and my name is in an element named ID underscore M. So you may be able to target your web browser to retrieve that value. So let's see how, using that document function and some of the other functions, we can retrieve that information. In here, we can see that the document function is accessing the URL www.bing.com. Then right in the middle we are retrieving the information that we just grabbed from the document. And finally, because I'm lazy, in JavaScript I will be extracting the ID_ element, which has my name. Let's see a demo of this. First I will open Safari and I will show you that I'm using Bing.com as my home page. And then I will open the document that is on the desktop, which is not sharing the same origin because one, Bing.com, is being hosted on https, and the other one is a file, a local file. Let's see what happens. Notice again my name in the upper right corner. And when I open that file, I'm reading the document that is being stored by Bing.com and I'm able to retrieve my name using XSLT. 
Even though the file is not hosted on Bing.com, Safari doesn't care, and it will show you that information. So, basically Safari will allow you to read this. Internet Explorer may show you a warning message, and it will retrieve the information, but you won't be sharing anything related to this. And other browsers didn't show anything. Another cool thing would be that you may use some of this to scan internal networks in case you wanted to. There are multiple ways to try to scan internal networks when you are executing something locally, and this could be another way as well. So another vulnerability that I found, and that I found would be very interesting to discuss, is information disclosure and file reading through errors. This is something that's present in server-side and client-side processors. The focus here is of course on server-side processors, because we wouldn't care what would happen in a web browser. So the thing about this is that it is not possible to read text files in XSLT 1. It's only possible to read XML documents or, as we saw, XHTML documents. And since it's not possible to read plain text files, it doesn't matter what function you are trying to use, because no function should be capable of doing this. Let's see what happens even though the W3C says it's not possible. We saw before that there was one function to read XML documents, and that is the document function. This function allows access to XML documents other than the main document. We have that. We can try to use that. There are also other elements used for accessing XML documents, and those would be "include" and "import". These just retrieve and (Inaudible) and I'll try to use them combined with other (Inaudible) We don't care what the manual says about this, because either way we're not trying to read a stylesheet in here. I created a text file that contains three lines, very simple. 
If you see the contents of my test file you will see a line one, a line two, and a line three. Pretty simple. If you read the documentation, and this comes from the XML documentation, when you are reading a file that is not XML there are a couple of possibilities. The first one is that the XSLT processor may report that it found an error. And this is what some of the processors do. They say this is not allowed in (Inaudible) Okay. That's okay. The other possibility would be to return an empty XML document. That's what Ruby does. Ruby will show you that there is nothing to see in here, and this is something that is expected as well. But again, this doesn't solve our problem: we wanted to read something that was in the test file. libxslt comes again to help us with this. When using document(), xsltproc, PHP, and Perl will show you the first line of our test file. Remember line one of the test file? That's not too much. But it's cool. Perhaps we could do something with that. We also try to use other functions to try to access these files later. But having this unexpected behavior in place may allow us to do something with it. So you may know in advance where I'm going with this. As we saw before, a file may have an interesting first line that would be valuable for us. There are certain specific files that store the most valuable information of a computer on the very first line. So what if we want to be able to read, for example, a password file? Where could we find those passwords? The most common answer for any Linux system would be in /etc/passwd. The next one, if you go a bit further (Inaudible), is /etc/shadow. The possibilities are in your imagination in here. Depending on what you are trying to read, you may be able to retrieve certain information that might be valuable for you or for someone else. You also have the Apache password, and you may also have database passwords. There are a number of possibilities down there. 
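The starting point of this issue is that a plain-text file is not well-formed XML, so any XML parser asked to load it has to fail, and what matters is how much of the file the error message reveals. The sketch below uses Python's standard-library parser only to show the failure itself. It is not one of the processors from the talk and it does not reproduce the libxslt behavior; the `try_parse` helper and the file contents are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Stand-in for the three-line text file from the talk (contents assumed).
text_file_contents = "line 1\nline 2\nline 3\n"


def try_parse(data):
    """Return the parser's error message, or None if data is well-formed XML."""
    try:
        ET.fromstring(data)
    except ET.ParseError as err:
        return str(err)
    return None


# Plain text is rejected: there is no root element to parse.
error_message = try_parse(text_file_contents)
print(error_message)

# A well-formed document parses without an error.
print(try_parse("<root>line 1</root>"))
```

Whether an error message echoes the offending line or reports only a position is an implementation decision, and that decision is what separates the processors that leak the first line of the file from those that do not.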
The thing is that this is what you'll see when using, for example, one of the processors to try to read /etc/passwd. You will see an error, and also something else: the password. Which is cool. I mean, you could now also use XSLT to retrieve this information. Another example, using PHP, could be to try to read the .htpasswd of an Apache. And again, since this is something that is stored on the very first line, you may see a bunch of errors and, right in the middle, what you were planning to see: the password for Joe, in this case. And as I was saying before, just in case they do not care about what they are doing, you could also have someone leaving /etc/shadow available if they are running this as root, and this is what will happen if you are using Ruby to try to retrieve that file. Again, expect all the errors, but also expect the password for root over there. So, this is pretty neat, because I believe this opens the possibility for XSLT to be perhaps as interesting as (Inaudible) expansion as a way to retrieve some information. An attacker may be able to compromise an XSLT because the application is allowing XSLT to be uploaded, or there may be XML documents that rely on document, import, or include and that may be trying to read files. So, either way, if you are able to control an XML and you have an XSLT processor in the back end parsing it, or you are able to control the XSLT, you may compromise the security of an application. As we also saw, we don't always need to have that in place to get the confidentiality and integrity affected, because sometimes, when using the random function or integers, they may be doing that to our profit without us doing anything on our side. So, I would recommend, as a very last thing, that you check your code, or someone else's, to see what's going on when these things are used. So, that's what I have for today. If anyone has any questions, I would be happy to answer them. 
[Clapping] >> Thank you very much. And thank you to all these people who helped me with the presentation.