Day: June 18, 2004
And the horse you rode in on, too!
For the past couple of days, this “Willow Internet Crawler by Twotrees V2.1” has been aggressively crawling my site. And I mean aggressively – they download every single page as quickly as they can, with no pause between requests. This is a bit of a pain, because it means they are sucking down bandwidth that I’d rather use for live human beings or better-behaved applications.
But today was the last straw. I have a robots.txt file because when web crawlers hit my image gallery, they tend to cause errors in the PHP code, and those errors get logged in /var/log/messages. Today I noticed a “Last message repeated 147 times” message scrolling by, so I looked, and sure enough, “Willow Internet Crawler” isn’t obeying the spider guidelines – they haven’t even looked at my robots.txt.
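For what it's worth, honoring robots.txt takes only a few lines. Here is a minimal Python sketch of the check a polite crawler would run before each request. The robots.txt contents, the /gallery/ path, the example.com URLs, and the may_fetch helper are all made up for illustration; they are not my actual file or anything Willow really runs.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; the /gallery/ path is an assumption
# standing in for "keep crawlers out of the image gallery".
ROBOTS_TXT = """\
User-agent: *
Disallow: /gallery/
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to request url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    agent = "Willow Internet Crawler"
    for url in ("http://example.com/index.html",
                "http://example.com/gallery/photo1.php"):
        print(url, "allowed" if may_fetch(agent, url) else "disallowed")
```

A real crawler would fetch the site's actual robots.txt once, cache the parsed rules, and also sleep between requests instead of hammering the server.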
The first thing I did was go to their web site, where I discovered that under “Contact Us”, you can only see their email address while your mouse is hovering over the title – once you move the cursor away to actually type in a mail program, it goes away again. And the address isn’t in the same place as what you are hovering over, which makes it (probably purposely) difficult to cut and paste the address into mutt.
So fine, you want to be an asshole? I can be an asshole too. I opened up /etc/httpd/conf/httpd.conf, found the “Allow from all” line, added a “Deny from 68.244.166.8” after it, and restarted the web server. Now I’m watching “Willow Internet Crawler” get a lot of 403s. So fuck you, Twotrees.net, and the horse you rode in on too.