Simple DDoS Amplification Attack through XML Sitemap Generators
It was all too easy, really. Filling up a 10 Mb/s pipe and tearing down a website with just a handful of tabs open in a browser seems like something that should be out of reach of the average web user, but SEOs like me have made it all too easy by creating simple, largely unprotected tools for unauthorized spidering of websites. So, here is what is going on…
Yesterday a great post was released about new on-site SEO crawlers that let you diagnose a host of SEO issues by merely typing in a domain name. This seems fantastic at first glance, but I immediately saw an opportunity when I realized that none of these tools – and really almost none of the free SEO tools out there – require any form of verification that you actually own the website you are running them on. This creates a real danger.
Amplification and Distributed Denial of Service Attacks
I know most of the readers here are probably not in the security community, so I will define a few terms. A Denial of Service (DoS) attack is simply a method of preventing a website from serving any real visitors by flooding the site with fake traffic. A Distributed Denial of Service (DDoS) attack, which is more common these days, is essentially the same thing, except that the attacker uses a large number of computers – often a botnet of compromised home machines – to overwhelm the target rather than just a handful of computers under their direct control. This makes the attack much more difficult to block, because differentiating between fake and real users is very hard when nearly every IP hitting the site is unique. Amplification is the process of increasing the effect of an attack by turning one bandwidth-requiring action into many. In this case, amplification occurs because a single submission of a URL to an XML sitemap generator results in THOUSANDS of requests being made to that site to build the sitemap.
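To put a rough number on that amplification, here is a quick back-of-envelope sketch; the page count and average page size below are assumptions for illustration, not measurements from my test.

```python
# Back-of-envelope amplification estimate (illustrative numbers, not measurements).
# One form submission to a sitemap generator triggers a full crawl of the target site.

pages_crawled_per_submission = 5_000   # assumed number of pages on the target site
avg_page_size_kb = 200                 # assumed average response size in KB

submission_cost_kb = 2                 # rough cost of a single HTTP POST to the generator
crawl_cost_kb = pages_crawled_per_submission * avg_page_size_kb

amplification_factor = crawl_cost_kb / submission_cost_kb
print(f"~{amplification_factor:,.0f}x bandwidth amplification per submission")
```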
How the Simplest Version of the Attack Works
First, I built a list of a whole bunch of free XML sitemap generators. There are tons out there and this is hardly an exhaustive list, but it was sufficient to fill a 10 Mb/s bandwidth connection on a server. Here are the steps…
- Open up each of the sites below in a tab
- Type your target domain into each
- Once ready, go tab by tab hitting “submit”
- Monitor the tabs and every time one of them completes, hit refresh
Following those steps, I was easily able to send the equivalent of tens of thousands of visitors to the site, with the traffic coming from 47 separate high-bandwidth servers. The proof is in the graphs.
As you can see above, I was able to spike the server to the full 10 Mb/s allowed.
And I was able to take up the full CPU.
Takeaways
The biggest concern here is not someone doing this with browser tabs, but someone building a much more exhaustive list of free, open site crawlers and then triggering them in an automated fashion. Sitting behind Tor, it would be quite easy to fire up hundreds of similar crawlers across the web in full anonymity and ping them over and over again, effectively pulling down a site without controlling anything even remotely resembling a botnet.
The simple solution to this type of problem is for tool creators to require authentication in the form of some sort of site ownership verification (upload a file with an authentication code, for example) before allowing the tool to run.
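Here is a minimal sketch of what that kind of file-based ownership verification could look like on the tool's side; the token format and file-naming convention are assumptions, not an existing standard.

```python
import secrets
import urllib.request
from urllib.parse import urljoin


def issue_verification_token() -> str:
    """Generate a random code the user must upload to their site root."""
    return secrets.token_hex(16)


def is_verified(domain: str, token: str) -> bool:
    """Only allow the crawl if the site serves the token back from a known path.
    The filename convention used here is illustrative, not a standard."""
    check_url = urljoin(f"https://{domain}/", f"sitemap-tool-verify-{token}.txt")
    try:
        with urllib.request.urlopen(check_url, timeout=10) as resp:
            return resp.status == 200 and token in resp.read(1024).decode("utf-8", "ignore")
    except OSError:
        return False


# Usage: issue the token, ask the user to upload the file, then gate the crawl.
# token = issue_verification_token()
# if is_verified("example.com", token):
#     start_crawl("example.com")   # start_crawl is the tool's own crawl entry point
```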
Another measure that would help these XML sitemap generators cut down on people wasting their bandwidth trying this tactic is to add reCAPTCHA, so the tools are harder to refresh continuously in an automated fashion.
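As a rough sketch, the tool could verify the reCAPTCHA token server-side before kicking off a crawl, using Google's siteverify endpoint; the handler wiring in the comment is illustrative.

```python
import json
import urllib.parse
import urllib.request

RECAPTCHA_VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def recaptcha_passed(secret_key: str, client_token: str) -> bool:
    """Verify the token produced by the browser widget before starting a crawl."""
    data = urllib.parse.urlencode(
        {"secret": secret_key, "response": client_token}
    ).encode()
    with urllib.request.urlopen(RECAPTCHA_VERIFY_URL, data=data, timeout=10) as resp:
        return json.load(resp).get("success", False)


# In the sitemap generator's submit handler (names are illustrative):
# if not recaptcha_passed(SECRET_KEY, form["g-recaptcha-response"]):
#     reject the request instead of crawling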
For online sitemap/crawler tools, a captcha would help some against easy automated attacks.
As a developer of desktop site-crawling tools myself (A1 Sitemap Generator and A1 Website Analyzer), it is not *that* easy to stop people from crawling whichever sites they want.
The problem, of course, is that there are many:
- "Mom and dad" users, for whom a verification process involving extra file uploads or similar would be a problem.
- Agencies, for whom quick tests on client sites would be slowed down.
What one can do is have the default settings obey the robots.txt crawl delay and similar things (e.g. set the default crawl speed to maybe only 1/10 of the maximum). If the user has the technical capacity to reconfigure these settings and use Tor/proxies, he or she would probably also be able to find software made for DDoS attacks.
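A minimal sketch of that kind of throttled default, using Python's standard robots.txt parser; the user agent name and fallback delay are assumptions for illustration.

```python
import time
import urllib.request
import urllib.robotparser

DEFAULT_DELAY_SECONDS = 2.0  # assumed conservative default, roughly 1/10 of full speed


def polite_delay(domain: str, user_agent: str = "ExampleSitemapBot") -> float:
    """Use robots.txt Crawl-delay when present, otherwise a conservative default."""
    rp = urllib.robotparser.RobotFileParser(f"https://{domain}/robots.txt")
    try:
        rp.read()
    except OSError:
        return DEFAULT_DELAY_SECONDS
    return float(rp.crawl_delay(user_agent) or DEFAULT_DELAY_SECONDS)


def crawl(domain: str, urls: list) -> None:
    """Fetch each URL with a pause between requests."""
    delay = polite_delay(domain)
    for url in urls:
        with urllib.request.urlopen(url, timeout=10) as resp:
            resp.read()
        time.sleep(delay)  # throttled by default; a verified owner could lower it
```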
Old post, but I thought I'd add a note – if a free crawler is serious about this, it could simply add a sleep of 2-5 seconds between pulling pages from a site, unless the user authenticates with a meta tag, a domain-based email address, or some other method, which would remove the sleep timer.
An average shared server could handle 500 crawlers each pulling 10 pages per minute.
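A sketch of that idea, assuming the meta-tag route; the tag name and verification code are hypothetical, and the 3-second pause sits within the suggested 2-5 second range.

```python
import re
import time
import urllib.request

UNVERIFIED_DELAY_SECONDS = 3  # applied only when ownership has not been proven


def owner_verified(domain: str, code: str) -> bool:
    """Look for a verification meta tag on the homepage; the tag name is illustrative."""
    with urllib.request.urlopen(f"https://{domain}/", timeout=10) as resp:
        html = resp.read(65536).decode("utf-8", "ignore")
    pattern = rf'<meta\s+name="crawler-verify"\s+content="{re.escape(code)}"'
    return re.search(pattern, html) is not None


def crawl(domain: str, urls: list, verification_code: str) -> None:
    """Crawl at full speed for verified owners, otherwise sleep between pages."""
    delay = 0 if owner_verified(domain, verification_code) else UNVERIFIED_DELAY_SECONDS
    for url in urls:
        with urllib.request.urlopen(url, timeout=10) as resp:
            resp.read()
        if delay:
            time.sleep(delay)
```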
By adding reCAPTCHA, can we save our site from DDoS attacks via XML sitemap generators?