Abandoned Websites

For all the work that people do to keep spam off websites, there's a class of sites that spammers use to propagate spam basically unchecked: abandoned websites.

Here's how it typically goes:

  • Bob gets an idea for a site.
  • In a burst of optimism, Bob registers a domain name, and heads to a web host to do some basic setup.
  • Bob configures a phpBB, WordPress or MediaWiki site for his new project and starts setting it up.
  • Bob goes to bed, wakes up the next day, gets busy with other stuff, and doesn't get back to his new project for some time.
  • Time passes...
  • Spammers' bots find the site. "Hey look, phpBB".
  • Armed with algorithms for breaking that version of phpBB's CAPTCHA, the bots start filling the site with spam.

That's it; your project has now become a tool for the dark side.

Names have been changed to protect the guilty; substitute Steve for Bob above and MediaWiki for phpBB.

A few days ago I started making a backup of the databases on my web host. I asked MySQL to back up all of them using mysqldump --all-databases, and noticed it was taking a long time. I asked MySQL what it was doing (SHOW PROCESSLIST), and saw that it was dumping a table called 'text' in a MediaWiki installation that I'd created about a year ago for a project I wanted to work on.

Turns out that table had 12 gig of data in it.
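If you'd rather not wait for a slow dump to tell you something is bloated, you can ask MySQL for table sizes directly. Here's a rough sketch of that idea in Python. The query reads the standard information_schema.tables view (the sizes it reports are approximate); the helper names largest_tables and format_size are made up for illustration, and the rows are assumed to come from whatever MySQL client you already use.

```python
# Query against MySQL's information_schema; sizes are estimates, not exact.
SIZE_QUERY = """
SELECT table_schema, table_name, data_length, index_length
FROM information_schema.tables
WHERE table_schema NOT IN
      ('mysql', 'information_schema', 'performance_schema', 'sys')
"""


def largest_tables(rows, top=5):
    """Rank (schema, table, data_length, index_length) rows by total bytes.

    `rows` is assumed to be the result of running SIZE_QUERY.
    Lengths can be NULL (None) for views, so treat those as zero.
    """
    sized = [
        (schema, table, (data or 0) + (index or 0))
        for schema, table, data, index in rows
    ]
    sized.sort(key=lambda row: row[2], reverse=True)
    return sized[:top]


def format_size(num_bytes):
    """Render a byte count in GiB with one decimal place."""
    return f"{num_bytes / 1024 ** 3:.1f} GiB"


if __name__ == "__main__":
    # Hypothetical sample rows, standing in for real query output.
    sample = [
        ("blog", "wp_posts", 40_000_000, 5_000_000),
        ("wiki", "text", 12 * 1024 ** 3, 0),
        ("wiki", "page", 2_000_000, 1_000_000),
    ]
    for schema, table, size in largest_tables(sample, top=2):
        print(f"{schema}.{table}: {format_size(size)}")
```

Anything that turns up at the top of that list in a project you don't remember touching is worth a closer look.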

Most of the data was either new wiki pages or edits to the Talk pages of existing wiki pages, with links to websites that the spammers were trying to promote. MediaWiki adds a nofollow attribute to outgoing links, so I don't know why spammers would want to do this, but they were doing it hard.

12 gig is a lot of data, and on top of that, there's all the activity of posting it and the activity of all the search engines crawling it. This one abandoned MediaWiki installation was wasting significant resources for a number of companies.

Now it's gone... but how many more are out there?

So this is my public service announcement: Check your domains. Are there any domains you bought, set something up on, and then abandoned? Now is a good time to clean them out.
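If you have shell access to your web host, one rough way to start is to walk the web root looking for installs you haven't touched in a while. The sketch below is only a heuristic: it assumes the usual marker files (wp-config.php for WordPress, LocalSettings.php for MediaWiki, viewtopic.php for phpBB), and it treats the marker file's modification time as a stand-in for "when the owner last worked on this". The function name find_stale_installs, the 180-day cutoff, and the /var/www path are all made up for illustration. It says nothing about what's in the database, so a hit means "go look", not "this is full of spam".

```python
import os
import time

# Marker files that usually sit at the root of each kind of install.
# A heuristic only; renamed or customised installs will be missed.
MARKERS = {
    "wp-config.php": "WordPress",
    "LocalSettings.php": "MediaWiki",
    "viewtopic.php": "phpBB",
}


def find_stale_installs(web_root, max_age_days=180, now=None):
    """Return (directory, software, age_in_days) for untouched installs.

    An install counts as stale when its marker file has not been
    modified within `max_age_days`. `now` can be passed in (as a Unix
    timestamp) to make the result reproducible.
    """
    now = time.time() if now is None else now
    stale = []
    for dirpath, _dirnames, filenames in os.walk(web_root):
        for name in filenames:
            if name not in MARKERS:
                continue
            mtime = os.path.getmtime(os.path.join(dirpath, name))
            age_days = int((now - mtime) // 86400)
            if age_days > max_age_days:
                stale.append((dirpath, MARKERS[name], age_days))
    return sorted(stale)


if __name__ == "__main__":
    # Hypothetical web root; adjust for your own host.
    for path, software, age in find_stale_installs("/var/www"):
        print(f"{path}: {software}, untouched for {age} days")
```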