Watching my logs scroll by (doesn’t everybody?), I see an awful lot of hits on obscure parts of my web site from the IP 18.104.22.168. Grepping back, I see that they’re evidently crawling my blog, and every link from my blog. Even weirder, for every URL they grab they send that same URL as the referrer string. That’s an obvious attempt to defeat one of those redirections that shows you a different page if you deep link something instead of going to it from the place you saw it referenced, although wouldn’t it be simpler to use the page you found the link on instead? Further grepping shows that they did NOT get my robots.txt file. They’re also downloading the pages as fast as they can with no pause before getting the next one; it’s possible that they’re fetching several at once. OK, three strikes, you’re out.
Into the /etc/http/conf/httpd.conf file go a few well-placed deny rules for that address,
restart the server, and now Mister Badly Behaved (and probably Badly Intentioned) Crawler is getting a lot of 403s instead of pages.
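For anyone who would rather script the grepping, here is a rough sketch of the three checks above (same URL as referrer, no robots.txt fetch, no pause between requests). It is not what I actually ran. It assumes the log is in Apache's combined format, and the function name `crawler_strikes` and the two-second threshold are made up for illustration.

```python
import re
from datetime import datetime
from urllib.parse import urlsplit

# Assumes Apache "combined" log lines:
# host ident user [time] "METHOD path PROTO" status bytes "referrer" "agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" \d{3} \S+ '
    r'"(?P<referrer>[^"]*)" "[^"]*"'
)


def crawler_strikes(lines, ip, min_gap_seconds=2.0):
    """Return the three 'strikes' for one client IP, or {} if it never appears.

    min_gap_seconds is an arbitrary illustrative threshold, not a standard.
    """
    hits = []
    for line in lines:
        m = LOG_RE.match(line)
        if m and m.group("ip") == ip:
            when = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
            hits.append((when, m.group("path"), m.group("referrer")))
    if not hits:
        return {}
    hits.sort(key=lambda h: h[0])

    # Strike 1: every request names its own URL as the referrer.
    self_referrer = all(
        urlsplit(ref).path == urlsplit(path).path for _, path, ref in hits
    )
    # Strike 2: the client never asked for robots.txt.
    skipped_robots = not any(
        urlsplit(path).path == "/robots.txt" for _, path, _ in hits
    )
    # Strike 3: the average gap between consecutive requests is tiny.
    gaps = [(b[0] - a[0]).total_seconds() for a, b in zip(hits, hits[1:])]
    no_pause = bool(gaps) and sum(gaps) / len(gaps) < min_gap_seconds

    return {
        "self_referrer": self_referrer,
        "skipped_robots": skipped_robots,
        "no_pause": no_pause,
    }
```

Feed it the lines of an access log plus the suspect address, for example `crawler_strikes(open(path_to_log), "18.104.22.168")`. The timestamps in the log only have one-second resolution, so the pause check is coarse at best.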
4 thoughts on “I don’t know what they’re up to, but I don’t like it.”
With crawlers like that I’d rather spare the web server from becoming unclean and just drop the packets at the firewall….
Perhaps a grumble to Cox Communications Inc’s abuse department might be in order.
Not that such a subtle form of action seems to have any great effect these days…
It’d be more fun to redirect every one of those connections to some honeypot.
Redirect to a bandwidth limited server that returns 16Gb files of nothing?