Archive for June 18th, 2004

I'm the IT manager. Do you fancy me?
Which Office Moron Are You?
Rum and Monkey: jamming your photocopier one tray at a time.

For the past couple of days, this “Willow Internet Crawler by Twotrees V2.1” has been aggressively crawling my site. And I mean aggressively - they download every single page as quickly as they can, with no pause between requests. This is a bit of a pain, because they are sucking down bandwidth that I’d rather use for live human beings or better-behaved applications.

But today was the last straw. I have a robots.txt file because when web crawlers hit my image gallery, they tend to cause errors in the PHP code, which get logged in /var/log/messages. So today I noticed a “Last message repeated 147 times” message scrolling by. I looked, and sure enough, “Willow Internet Crawler” isn’t obeying the spider guidelines - they haven’t even looked at my robots.txt.

The first thing I did was go to their web site, where I discovered that under “Contact Us”, you can only see their email address while your mouse is hovering over the title - once you move the cursor away to actually type in a mail program, it goes away again. And the address isn’t in the same place as what you are hovering over, making it (probably purposely) difficult to cut and paste the address into mutt.

So fine, you want to be an asshole? I can be an asshole too. I opened up /etc/httpd/conf/httpd.conf, found the “Allow from all” line, and added a “Deny from 68.244.166.8” after it, restarted the web server, and now I’m watching “Willow Internet Crawler” get a lot of 403s. So fuck you, Twotrees.net, and the horse you rode in on too.
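
For anyone who wants to do the same, the change looks roughly like this. Treat it as a sketch, not a copy of my actual config: the Directory path is an assumption (use whatever your DocumentRoot is), and it assumes the usual “Order allow,deny” setup, where Allow lines are checked first and a matching Deny then overrides them.

```apache
# Hypothetical path: substitute your own DocumentRoot.
<Directory "/var/www/html">
    # Allow directives are evaluated first, then Deny directives;
    # a request that matches both is refused with a 403.
    Order allow,deny
    Allow from all
    # The address the crawler was coming from.
    Deny from 68.244.166.8
</Directory>
```

The server has to be restarted or reloaded before the change takes effect. Note that the Order/Allow/Deny directives belong to the older access control scheme; Apache 2.4 and later use Require directives instead (for example “Require not ip 68.244.166.8” inside a RequireAll block alongside “Require all granted”).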