I don’t know what they’re up to, but I don’t like it.

Watching my logs scroll by (doesn’t everybody?), I see an awful lot of hits on obscure parts of my web site from the IP 68.7.32.213. Grepping back, I see that they’re evidently crawling my blog, and every link from my blog. Even weirder, for every URL they grab they send that same URL as the referrer string. That’s an obvious attempt to defeat one of those redirections that shows you a different page if you deep link to something instead of arriving from the place you saw it referenced, although wouldn’t it be simpler to use the page you found the link on instead? Further grepping shows that they did NOT get my robots.txt file. They’re also downloading the pages as fast as they can, with no pause before getting the next one; it’s possible that they’re fetching several at once. OK, three strikes, you’re out.

Into the /etc/httpd/conf/httpd.conf file go a few well-placed lines of
Deny from 68.7.32.213
then I restart the server, and now Mister Badly Behaved (and probably Badly Intentioned) Crawler is getting a lot of 403s instead of pages.
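
For anybody who would rather not do the grepping by hand, here is a rough sketch of two of those checks: did the IP ever ask for robots.txt, and how often does its referrer match the page it is requesting. It assumes Apache's combined log format; the class name, the method names and the sample log lines are all made up for illustration, and the referrer test is a crude "ends with the requested path" comparison that will miscount requests for "/".

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CrawlerCheck {
    // Assumed combined log format:
    // host ident user [time] "METHOD path protocol" status bytes "referrer" "agent"
    private static final Pattern LINE = Pattern.compile(
        "^(\\S+) \\S+ \\S+ \\[[^\\]]+\\] \"(\\S+) (\\S+)[^\"]*\" \\d{3} \\S+ \"([^\"]*)\"");

    /** True if the given IP requested /robots.txt anywhere in the log. */
    static boolean fetchedRobots(List<String> lines, String ip) {
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (m.find() && m.group(1).equals(ip) && m.group(3).equals("/robots.txt")) {
                return true;
            }
        }
        return false;
    }

    /** Counts requests from the IP whose referrer ends with the path being requested. */
    static int selfReferrals(List<String> lines, String ip) {
        int count = 0;
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (m.find() && m.group(1).equals(ip) && m.group(4).endsWith(m.group(3))) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Invented sample lines, not copied from a real log.
        List<String> sample = Arrays.asList(
            "68.7.32.213 - - [01/Jan/2005:10:00:00 -0500] \"GET /blog/a.html HTTP/1.1\" 200 512 \"http://example.com/blog/a.html\" \"-\"",
            "68.7.32.213 - - [01/Jan/2005:10:00:00 -0500] \"GET /pics/b.jpg HTTP/1.1\" 200 900 \"http://example.com/pics/b.jpg\" \"-\"");
        System.out.println("fetched robots.txt: " + fetchedRobots(sample, "68.7.32.213"));
        System.out.println("self-referrals: " + selfReferrals(sample, "68.7.32.213"));
    }
}
```

The third strike, request rate, would need the timestamps parsed as well, which this sketch skips.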

Side effects matter

One of my cow-orkers used his fancy new GUI IDE, which showed him that a variable wasn’t being used in my code, so he commented it out. Only one problem: the variable was one of a list of variables being retrieved from a SQL select statement, and as is common with these things, I was retrieving them with:


int a = rs.getInt(p++);
int b = rs.getInt(p++);
String c = rs.getString(p++);

Notice the problem there? If you comment out one of the getInts without removing the field from the select statement, you also lose the “p++”, so everything after it gets the wrong field stored. Which causes a pretty nasty little bug.
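
To make the failure concrete without needing a database, here is a small sketch. The Row class is a made-up stand-in for a JDBC ResultSet (1-indexed, and like many drivers it will happily hand back an int column as a String), and the method names are hypothetical. It shows the original style, the same code after the "unused" line is commented out, and one possible alternative using explicit indices.

```java
public class SideEffectDemo {
    /** Hypothetical stand-in for one ResultSet row; columns are 1-indexed. */
    static final class Row {
        private final Object[] cols;
        Row(Object... cols) { this.cols = cols; }
        int getInt(int i) { return (Integer) cols[i - 1]; }
        String getString(int i) { return String.valueOf(cols[i - 1]); }
    }

    /** Original style: every read also advances p as a side effect. */
    static String readWithCounter(Row rs) {
        int p = 1;
        int a = rs.getInt(p++);
        int b = rs.getInt(p++);
        String c = rs.getString(p++);
        return a + "," + b + "," + c;
    }

    /** After the "unused" b is commented out: its p++ is lost too. */
    static String readWithCounterBroken(Row rs) {
        int p = 1;
        int a = rs.getInt(p++);
        // int b = rs.getInt(p++);
        String c = rs.getString(p++); // now reads column 2, not column 3
        return a + "," + c;
    }

    /** Explicit indices: removing one line cannot shift the others. */
    static String readWithFixedIndices(Row rs) {
        int a = rs.getInt(1);
        // int b = rs.getInt(2);
        String c = rs.getString(3);
        return a + "," + c;
    }

    public static void main(String[] args) {
        Row row = new Row(10, 20, "thirty");
        System.out.println(readWithCounter(row));
        System.out.println(readWithCounterBroken(row));
        System.out.println(readWithFixedIndices(row));
    }
}
```

With real JDBC, reading by column label (ResultSet has getInt(String) and getString(String)) avoids the positional dependency in the same way, at the cost of tying the code to the column names in the select.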

Thanks, guy. That’s a few hours of my life I’ll never get back.

It figures

I’ve got the mission (to Chicago tomorrow, back home on Sunday), I have my choice of planes, and I have almost perfect VFR weather for the entire 4 days over the whole region (probably the first time that’s happened this winter). But I’ve also got non-refundable plane tickets on United, so instead of 4 hours of fun flying, I’m going to have an hour or more of security, an hour of waiting for a connection in Dulles, and two hours of sitting in torture tubes, and then who knows how many hours getting from O’Hare to my destination (which is close to DuPage airport).

Oh well, at least I don’t have to worry about pre-heating the engine on Sunday at an “away” airport.

This is getting ridiculous

In the last 24 hours, MT-Blacklist has stopped 168 comment spam attempts, and let one through.

Keep in mind that I close comments on any blog entry over 100 days old, so this is probably fewer than 100 blog entries that were the lucky recipients of those 169 comment spam attempts. Neither MT-Blacklist nor I see the attempts to comment spam the older ones unless I look for POST commands in my web log.

One thing I’ve noticed recently is that comment spammers are GET-ing pages on my web site with the referrer string set to the site they’re trying to spam for. I guess they’re hoping that people are running webalizer (which I note is enabled by default in Fedora Core 3) or some similar log analyser that publishes a list of referrer strings somewhere Google can find it. So a warning to everybody reading this: if you’ve got a web log analyser, make sure its output isn’t somewhere where Google or any other search engine can find it.