Spammers must die, Wiki Spammers doubly so.

A little while ago I set up a Wiki to discuss the imminent loss of the Digital Aeronautical Flight Information File (DAFIF). Yesterday, the spammers found it. For two days now, some asshole has registered a new account and turned his “user page” into a spam link farm. I could keep going in and ripping these pages out, but that would get really boring.

I think what I need is to replace the simple Wiki software I’m using (TWiki) with something that has a few more features. Specifically, I’d like to make it so that you can’t start editing until you’ve given the software your real email address, the same way my Mailman mailing lists make you receive and respond to a confirmation mail.
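A confirm-by-email step generally works like this: the server mails out a link containing a token that only it could have produced, and the account stays locked until that token comes back. Here’s a minimal Python sketch of the token part, assuming an HMAC over the address plus a timestamp. `SECRET_KEY`, the three-day expiry and the function names are all hypothetical, and this is not how TWiki or Mailman actually implement it.

```python
import hashlib
import hmac
import time
from typing import Optional

# Hypothetical server-side secret; a real setup would load it from configuration.
SECRET_KEY = b"replace-with-a-long-random-secret"
# Assumed policy: confirmation links expire after three days.
MAX_AGE_SECONDS = 3 * 24 * 3600


def _sign(email: str, issued: int) -> str:
    """HMAC-SHA256 over the normalized address and the issue time."""
    msg = f"{email.strip().lower()}:{issued}".encode("utf-8")
    return hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()


def make_confirmation_token(email: str, now: Optional[float] = None) -> str:
    """Token to put in the link mailed to `email` at registration time."""
    issued = int(time.time() if now is None else now)
    return f"{issued}:{_sign(email, issued)}"


def confirm(email: str, token: str, now: Optional[float] = None) -> bool:
    """True only if `token` was issued for `email` by this server and hasn't expired."""
    try:
        issued_text, signature = token.split(":", 1)
        issued = int(issued_text)
    except ValueError:
        return False  # malformed token
    current = time.time() if now is None else now
    if current - issued > MAX_AGE_SECONDS:
        return False  # link is too old
    # compare_digest avoids leaking, via timing, how many characters matched.
    return hmac.compare_digest(_sign(email, issued), signature)
```

A registration handler would mark the new account as unconfirmed, mail out a link carrying `make_confirmation_token(address)`, and only enable editing once `confirm()` returns True for the token in that link.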

Any suggestions? (And yes, I know I asked for suggestions before and then ignored them all in favour of one that was easier to install. I’ll go back and read your suggestions from that time as well.)

SQLite again

I wrote a few days ago about a problem I was having with SQLite, or rather with DBD::SQLite. It turns out that you should never assume the version you installed on one machine is the same as the version you installed on the other. Once I made sure the machine I was testing on was up to DBD::SQLite version 1.11, that part worked fine.
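One cheap guard against that kind of mismatch is to have a script refuse to run when the library it depends on is older than the version known to work. Below is a minimal Python sketch that uses Python’s own `sqlite3` module as a stand-in (in Perl the value to check would be `$DBD::SQLite::VERSION`). `MINIMUM_SQLITE` is a hypothetical threshold. Versions are compared as tuples of integers because plain string comparison gets them wrong.

```python
import sqlite3
import sys
from typing import Tuple

# Hypothetical minimum; set this to whatever version you have actually verified.
MINIMUM_SQLITE = "3.0.0"


def parse_version(text: str) -> Tuple[int, ...]:
    """'1.11' -> (1, 11); numeric parts compare numerically, so 1.11 > 1.9."""
    return tuple(int(part) for part in text.split("."))


def at_least(installed: str, minimum: str) -> bool:
    """True if `installed` >= `minimum`, padding with zeros so '3.0' == '3.0.0'."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    installed = sqlite3.sqlite_version  # SQLite library Python is linked against
    if not at_least(installed, MINIMUM_SQLITE):
        sys.exit(f"SQLite {installed} is older than {MINIMUM_SQLITE}; upgrade first.")
    print(f"SQLite {installed} OK")
```

Note that `"1.9" > "1.11"` is True when compared as strings, which is exactly the trap the tuple comparison avoids.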

I’ve been doing some timing tests on a generator task that generates 26915 waypoints. Running one at a time, it takes about 45-50 seconds, and running two at a time takes twice as long (about 1:30-1:40). MySQL, by comparison, takes 3:40 for one at a time and 4:45 for two at a time. The fact that the SQLite version takes twice as long when two are running makes me think it’s probably CPU-bound. The fact that it’s way, way faster than the MySQL alternative makes me think this is definitely worth pursuing.
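The timings above can be turned into throughput numbers with a little arithmetic. This Python sketch assumes the midpoint of each quoted range (47.5 s and 95 s for SQLite); the names are just for illustration.

```python
WAYPOINTS = 26915

# Seconds per job, taking the midpoint of each quoted range (an assumption).
TIMINGS = {
    "SQLite": (47.5, 95.0),   # 45-50 s alone; 1:30-1:40 each with two running
    "MySQL": (220.0, 285.0),  # 3:40 alone; 4:45 each with two running
}


def summarize(alone: float, paired: float) -> dict:
    return {
        "waypoints_per_sec": WAYPOINTS / alone,
        # How much longer each job takes when two run at once.
        "slowdown": paired / alone,
        # Jobs finished per second with two running, relative to one running.
        "throughput_gain": (2 / paired) / (1 / alone),
    }


if __name__ == "__main__":
    for engine, (alone, paired) in TIMINGS.items():
        s = summarize(alone, paired)
        print(f"{engine:6}  {s['waypoints_per_sec']:6.1f} wp/s  "
              f"slowdown x{s['slowdown']:.2f}  throughput x{s['throughput_gain']:.2f}")
    print(f"SQLite speedup, one job: x{TIMINGS['MySQL'][0] / TIMINGS['SQLite'][0]:.1f}")
```

With those midpoints, SQLite is about 4.6 times faster for a single job and gains nothing from running two at once (a throughput ratio of 1.0), which fits a single saturated resource such as the CPU. MySQL gets roughly 1.5 times the throughput from two concurrent jobs, which suggests each MySQL job spends a good part of its time waiting rather than computing.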

But there’s a wrinkle. According to a post on the SQLite mailing list, one program can’t commit a write while another one is doing a query, even if the write doesn’t involve the same tables. I guess SQLite’s database-level locking is pretty stupid. And that’s the problem – there are three different types of things going on:

  • Database reloads – these only happen about once a month, only one at a time, and involve reloading FAA and DAFIF data into the waypoint, comm_freqs, runways and other similar tables. The reload scripts can take an hour or more to run.
  • Database generations – these run in the background, and just query those same tables that the reloads are loading. Lots of these can run at once, and lots are run every day. As mentioned above, they only take a few minutes to run.
  • Choosing generation options on the web site. These tasks run when you submit a form page on the generator, and they build the response page. Mostly they run queries, but they also record which options you’ve chosen so that those become the defaults the next time you come back. They do that by updating some tables that aren’t involved in the database generations.
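The restriction from the mailing-list post is easy to reproduce. Here is a minimal Python sketch, assuming SQLite’s classic rollback-journal mode: one connection holds a read transaction on a `waypoint` table (standing in for a generation run) while a second connection tries to commit a change to an unrelated `user_options` table (standing in for the web site saving your options). The `user_options` table, the `nav.db` file and the helper functions are hypothetical.

```python
import os
import sqlite3
import tempfile


def _setup(db_path: str) -> None:
    """Create a tiny stand-in for the real schema (table names are hypothetical)."""
    conn = sqlite3.connect(db_path)
    # Pin the rollback-journal locking mode so the demo doesn't depend on defaults.
    conn.execute("PRAGMA journal_mode=DELETE")
    conn.execute("CREATE TABLE waypoint (id INTEGER PRIMARY KEY, ident TEXT)")
    conn.execute("CREATE TABLE user_options (name TEXT, value TEXT)")
    conn.execute("INSERT INTO waypoint (ident) VALUES ('CYOW')")
    conn.commit()
    conn.close()


def commit_blocked_by_reader(db_path: str) -> bool:
    """True if a COMMIT to user_options fails while another connection reads waypoint."""
    _setup(db_path)
    # isolation_level=None: we issue BEGIN/COMMIT ourselves.
    # timeout=0: report "database is locked" immediately instead of retrying.
    reader = sqlite3.connect(db_path, isolation_level=None, timeout=0)
    writer = sqlite3.connect(db_path, isolation_level=None, timeout=0)
    try:
        reader.execute("BEGIN")
        reader.execute("SELECT * FROM waypoint").fetchall()  # SHARED lock held until COMMIT

        writer.execute("BEGIN")
        writer.execute("INSERT INTO user_options VALUES ('units', 'nm')")  # different table
        try:
            writer.execute("COMMIT")  # needs an EXCLUSIVE lock on the whole file
        except sqlite3.OperationalError:
            writer.execute("ROLLBACK")
            return True
        return False
    finally:
        if reader.in_transaction:
            reader.execute("COMMIT")
        reader.close()
        writer.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(commit_blocked_by_reader(os.path.join(tmp, "nav.db")))
```

The writer is allowed to stage its INSERT, but its COMMIT needs an exclusive lock on the whole database file, which it can’t get while the reader’s shared lock is held. Giving the writer a non-zero timeout only changes the failure into a wait for the reader to finish.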

Obviously a user doesn’t want to sit there waiting for their page to load for as long as it takes somebody else’s generation script to run. I’m going to try putting the tables that store these options in a separate database, to see whether that lets the pages update in a reasonable time.
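Since SQLite locks a whole database file at a time, moving the option tables into their own file should mean a long read of the navigation data no longer blocks them. A minimal Python sketch of that experiment, with hypothetical file names `nav.db` and `options.db` and a hypothetical `user_options` table:

```python
import os
import sqlite3
import tempfile


def options_commit_while_nav_is_read(nav_path: str, options_path: str) -> bool:
    """True if a write to a separate options file commits during a read of the nav file."""
    # Hypothetical schema split: navigation data in one file, user options in another.
    nav = sqlite3.connect(nav_path)
    nav.execute("CREATE TABLE waypoint (id INTEGER PRIMARY KEY, ident TEXT)")
    nav.execute("INSERT INTO waypoint (ident) VALUES ('CYOW')")
    nav.commit()
    nav.close()
    opts = sqlite3.connect(options_path)
    opts.execute("CREATE TABLE user_options (name TEXT, value TEXT)")
    opts.commit()
    opts.close()

    # timeout=0 so any lock conflict shows up immediately as an error.
    reader = sqlite3.connect(nav_path, isolation_level=None, timeout=0)
    writer = sqlite3.connect(options_path, isolation_level=None, timeout=0)
    try:
        reader.execute("BEGIN")
        reader.execute("SELECT * FROM waypoint").fetchall()  # SHARED lock on nav file only

        writer.execute("BEGIN")
        writer.execute("INSERT INTO user_options VALUES ('units', 'nm')")
        try:
            writer.execute("COMMIT")  # needs an exclusive lock on the options file only
        except sqlite3.OperationalError:
            writer.execute("ROLLBACK")
            return False
        return True
    finally:
        if reader.in_transaction:
            reader.execute("COMMIT")
        reader.close()
        writer.close()
```

The reader and writer never touch the same file, so the commit goes through without waiting for the read transaction to end.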

RedHat, you suck

Ok, I found out why inn wouldn’t upgrade: I’ve been using “timecaf” for the news spool. This is a semi-binary format that is supposed to be faster and more efficient than “tradspool”, the old one-file-per-article layout, in a directory tree based on the newsgroup names, that we all used to know and love. “timecaf” instead creates just a few files per day, with many articles in each file. I forget why I stopped using “tradspool”, since this machine is way overpowered; maybe it was to see whether we could use timecaf at NCF.
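To make the contrast concrete, here is a minimal Python sketch of how a tradspool-style layout turns a newsgroup and article number into a file path: dots in the group name become directory levels and the article number is the file name. The spool root below is an assumption; the real location depends on how INN was configured.

```python
from pathlib import PurePosixPath

# Assumed spool location; INN's actual path depends on its configuration.
SPOOL_ARTICLES = "/var/spool/news/articles"


def tradspool_path(newsgroup: str, article_number: int,
                   root: str = SPOOL_ARTICLES) -> PurePosixPath:
    """One file per article: each dot-separated part of the group name is a directory."""
    return PurePosixPath(root, *newsgroup.split("."), str(article_number))
```

For example, `tradspool_path("comp.lang.perl.misc", 1234)` gives `/var/spool/news/articles/comp/lang/perl/misc/1234`. timecaf has no mapping this simple: because many articles share one file, the spool has to record where each article starts inside that file.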

Timecaf has been working pretty well for me, but evidently it embeds binary file offsets in its files. And RedHat (oh, sorry, the “Fedora Community”) arbitrarily decided to enable “Large File Support” between Fedora Core 3 and Fedora Core 4. That means each record in each existing “timecaf” file has a 32-bit file offset attached to it, but the program now expects a 64-bit file offset, so it can’t find any of the records.
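A toy illustration of why the offset width matters. This uses a simplified, hypothetical layout (a bare array of little-endian offsets), not the real timecaf on-disk format: offsets written as 32-bit integers are read back by code that expects 64-bit ones.

```python
import struct
from typing import List


def pack_offsets_32(offsets: List[int]) -> bytes:
    """Write offsets as an older build would: unsigned 32-bit, little-endian (assumed)."""
    return struct.pack(f"<{len(offsets)}I", *offsets)


def read_offsets_64(blob: bytes) -> List[int]:
    """Read the same bytes as a large-file build would: unsigned 64-bit per offset."""
    count = len(blob) // 8
    return list(struct.unpack(f"<{count}Q", blob[: count * 8]))


if __name__ == "__main__":
    on_disk = pack_offsets_32([100, 200, 4096, 8192])
    print(read_offsets_64(on_disk))
```

Four small offsets come back as two enormous ones, each pair of 32-bit values fused into a single 64-bit number (the first is 100 + 200 × 2^32), so every lookup lands far past the end of the file.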

I tried a few things, including compiling the source without large file support, and I still couldn’t get it to work. So I threw in the towel and blew away the whole news spool. After all, this is Usenet; every idea comes around again in a few weeks anyway.

Upgrade nearly painless

The WordPress SpamKarma plugin was complaining about invalid SQL syntax. A little investigation showed that SpamKarma was expecting MySQL 4.0 or newer, but reluctantly allowed you to use MySQL 3.x as long as you accepted there might be some problems. I’d also had other problems with the old versions of software that shipped with Fedora Core 3.

I decided it was time to upgrade to Fedora Core 4. I don’t know for sure, but I think I’ve been on FC3 for a couple of years now. I would rather upgrade to Debian Sarge, since this machine is really just a server now, and I hardly ever use X Windows on it. But that would require too much work. It appeared I could upgrade from FC3 to FC4 using yum, with minimal downtime, so I set aside some time today to do it.

The upgrade is now finished. It went surprisingly smoothly, except for one thing: Inn won’t run. I can’t even rebuild the history files with makehistory – it dies with a SEGV. I’ve tried all sorts of things, and I’ve written to Russ, the main developer. I have a bad feeling that I’m going to have to switch to installing from source, and I don’t want to lose the advantages you get when you let somebody else (in this case, the Fedora team) manage the upgrades and dependencies.