Watching my logs scroll by (doesn’t everybody?) I see an awful lot of hits on obscure parts of my web site from the IP 22.214.171.124. Grep back, and see that they’re evidently crawling my blog, and every link from my blog. And even weirder, every URL they grab they use the same URL in the referrer string – an obvious attempt to defeat one of those redirections that shows you a different page if you deep link something instead of going to it from the place you saw it referenced. – although wouldn’t it be simpler to use the page you found the link on instead? Further grepping shows that they did NOT get my robots.txt file. They’re also downloading the pages as fast as they can with no pause before getting the next one – it’s possible that they’re doing several simultaneous ones. Ok, three strikes, you’re out.
Into the /etc/http/conf/httpd.conf file, and a few well-placed
restart the server, and now Mister Badly Behaved (and probably Badly Intentioned) Crawler is getting a lot of 403s instead of pages.