Time well wasted

This weekend, I’ve accomplished two major things of the four or so that I wanted to get done.

Yesterday, I moved my picture gallery from my home machine at http://xcski.com/gallery/ to my colo machine at http://gallery.xcski.com/. That was surprisingly easy once I found the part of the Gallery FAQ that showed what I was doing wrong. The biggest glitch is that when I first brought it up, I had the “Square thumbnails” option turned on, and even though I turned it off and told Gallery to regenerate all the thumbnails, most of them are still square. There’s nothing wrong with square thumbnails, but making them means cropping the pictures, so airplane pictures tend to show the middle of the fuselage instead of the whole plane, and some full-length portraits cut off people’s heads and feet. I may have to try regenerating the thumbnails again.
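The cropping problem is just geometry. Here is a minimal sketch, not Gallery’s own code, comparing a centred square crop with an aspect-preserving fit; the function names and the 150-pixel thumbnail size are illustrative assumptions.

```python
from typing import Tuple


def square_crop_box(width: int, height: int) -> Tuple[int, int, int, int]:
    """Centred square crop as (left, top, right, bottom): keeps only the middle of the picture."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


def fit_size(width: int, height: int, max_side: int) -> Tuple[int, int]:
    """Scale the whole picture to fit in a max_side x max_side box, keeping its aspect ratio."""
    longest = max(width, height)
    return (
        max(1, round(width * max_side / longest)),
        max(1, round(height * max_side / longest)),
    )
```

For a 1600×800 airplane shot, `square_crop_box(1600, 800)` keeps only the middle 800×800 pixels, losing both ends of the plane, while `fit_size(1600, 800, 150)` shrinks the whole picture to 150×75.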

This morning, I woke up to find that the newish external USB drive I use for backing up my colo box had died, so last night’s backups didn’t work. When this has happened with my other external USB drive, power-cycling it has usually fixed it, but this time it didn’t. So I ran “mkfs.ext3” on it and started the nightly backups again. Midway through, the logs started filling up with errors saying that the drive was in the process of being unplugged(?!). I rebooted the server for the first time in 66 days, and that seems to have fixed it. Hopefully tonight’s backup (which will end up being huge because the old Sunday night backup is gone) will work.
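Why does losing one backup make the next one huge? Assuming the nightly runs are incrementals that only copy files changed since the last full (Sunday) backup still on the drive, which the post doesn’t actually spell out, the selection rule looks something like this hypothetical sketch:

```python
from typing import Dict, List, Optional


def select_for_backup(mtimes: Dict[str, float], last_full: Optional[float]) -> List[str]:
    """Return the paths an incremental run would copy.

    mtimes maps path -> modification time (seconds since the epoch).
    last_full is the time of the most recent full backup still on the drive,
    or None if there isn't one (for example, the drive was just reformatted).
    """
    if last_full is None:
        # No reference backup left, so every file counts as changed.
        return sorted(mtimes)
    return sorted(path for path, mtime in mtimes.items() if mtime > last_full)
```

With the drive freshly reformatted there is no reference backup (`last_full is None`), so every file gets copied.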

Today, I’ve mostly been concentrating on trying to restore the archives of my Mailman mailing lists. When I moved my mailing lists from my home server to my Virtual Private Server at linode.com, disk space was an issue, so I trimmed the archives down to just the last two years. I kept the full archives on my home server just in case. Now that the lists are on my colo box, disk space isn’t an issue any more, so I tried to restore them.

My first attempt was to open vim in a split screen, with the old archive (which goes up to mid-2005) in one half and the current archive (which starts in January 2005) in the other, and cut the pre-2005 messages out of the old one and paste them into the new one. That didn’t work very well: I quickly bogged the machine down in extensive swapping.

So then I cut the old archive down to stop at the beginning of 2005 using

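# GNU sed: print from the start through the first mbox "From " line mentioning 2005
sed -n '0,/^From .*2005/p' < old.mbox > old.mbox_to_2005
# drop that last line, i.e. the first 2005 message's "From " line
head -n -1 old.mbox_to_2005 > old.mbox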

and then catted the old and the current one together.
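For comparison, here is a rough Python sketch of the same trim-and-concatenate step, working on lists of lines rather than files. Like the sed command, it assumes the first “From ” separator line mentioning 2005 marks the cut point; the function names are hypothetical.

```python
import re
from typing import List

# An mbox separator ("From_") line that mentions 2005 somewhere after the sender.
FROM_2005 = re.compile(r"^From \S+.*\b2005\b")


def trim_before_year(lines: List[str]) -> List[str]:
    """Keep every line before the first 2005 separator (roughly sed '0,/re/p' plus head -n -1)."""
    for i, line in enumerate(lines):
        if FROM_2005.match(line):
            return lines[:i]
    # No 2005 message found: keep everything (the sed/head pair would drop the last line here).
    return lines


def merge_archives(old: List[str], current: List[str]) -> List[str]:
    """Pre-2005 part of the old archive followed by the whole current one, like 'cat old new'."""
    return trim_before_year(old) + current
```

Requiring a space after “From” means ordinary “From:” header lines, such as an address containing “2005”, can’t trigger the cut.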

That’s when I discovered that Mailman’s arch program, which regenerates the archives from the mbox files, is a huge memory hog and also has a couple of bugs. The first couple of times I tried to run it, it processed a few thousand messages and then died with an error about an empty module name. When I realized it was dying on the same message both times, I discovered that back in 2000 one of the mailing list users had a weird-ass bug in their mailer that was sending email with the header

Content-Type: TEXT/PLAIN; charset=".chrsc"

It was evidently some sort of misconfigured character set. I used sed to change that to “us-ascii”, and arch seemed a lot happier. At least until it happily consumed all the RAM and most of the swap on the system, and everything dragged down to slower than a very slow thing.
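The post doesn’t show the exact sed command. A sketch of the same substitution in Python, restricted to Content-Type header lines so message bodies are left alone, might look like this; it assumes the header isn’t folded onto a continuation line.

```python
import re

# The bogus parameter from the old archive: charset=".chrsc" (case-insensitive, quotes optional).
BOGUS_CHARSET = re.compile(r'charset="?\.chrsc"?', re.IGNORECASE)


def fix_charset(line: str) -> str:
    """Rewrite charset=".chrsc" to charset="us-ascii" on Content-Type lines; leave other lines alone."""
    if line.lower().startswith("content-type:"):
        return BOGUS_CHARSET.sub('charset="us-ascii"', line)
    return line
```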

I found some awk code to split mbox files into smaller chunks, ran it on this huge unified archive, and then ran arch on the chunks. That mostly worked, and only slowed the server down to a not-very-slow thing, except that arch treated any line starting with “From ” as the start of a new message, even when it was just a line in the body of a message. I didn’t discover this until it had been running for quite some time, so I had to start again.
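A minimal chunking sketch (not the awk code from the post) shows where the trouble comes from: with the naive rule that every line beginning with “From ” starts a message, a body line such as “From the desk of Bob” gets counted as a new message too. The function name and chunk size are assumptions.

```python
from typing import Iterable, Iterator, List


def split_mbox(lines: Iterable[str], per_chunk: int) -> Iterator[List[str]]:
    """Yield lists of lines, each holding at most per_chunk messages.

    Naive rule: any line beginning with "From " starts a new message.
    """
    if per_chunk < 1:
        raise ValueError("per_chunk must be at least 1")
    chunk: List[str] = []
    count = 0
    for line in lines:
        if line.startswith("From "):
            if count == per_chunk:
                # This chunk is full; start a new one with this message.
                yield chunk
                chunk, count = [], 0
            count += 1
        chunk.append(line)
    if chunk:
        yield chunk
```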

I used my extensive knowledge of awk (in other words, I cargo-culted something) to make the mbox splitter also change any of these bogus “From ” lines into “>From ”. After another hour or so of running, I discovered a small bug in my splitter that meant it worked for the early archives but not the later ones, probably because of a Mailman upgrade or my switch from sendmail to Postfix.
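One way to tell real separators from bogus ones, sketched here as an assumption rather than the fix actually used, is to accept a “From ” line as a separator only when it follows a blank line (or opens the file) and looks like an envelope line with a date; anything else gets the “>” prefix. The regular expression is a loose approximation of the From_ line format, allowing extra spaces after the sender (different MTAs write slightly different From_ lines) and any number of digits in the year.

```python
import re
from typing import List

# Rough shape of an mbox From_ line: "From <sender> <weekday> <month> <day> hh:mm:ss [zone] <year>".
FROM_LINE = re.compile(
    r"^From \S+\s+\w{3} \w{3}\s+\d{1,2} \d{1,2}:\d{2}:\d{2}(?: \S+)? \d+"
)


def escape_bogus_from(lines: List[str]) -> List[str]:
    """Prefix ">" to "From " lines that don't look like real message separators."""
    out: List[str] = []
    prev_blank = True  # the first line of the file may start a message
    for line in lines:
        if line.startswith("From ") and not (prev_blank and FROM_LINE.match(line)):
            line = ">" + line
        out.append(line)
        prev_blank = line.strip() == ""
    return out
```

A quoted envelope line that happens to follow a blank line will still slip through, so a few stragglers can remain.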

So I fixed that bug and started again. It’s been running for over an hour now and seems to be working fine, apart from 5 or 6 messages from January 2000 that had the year set to “100”, and one place where somebody actually quoted a mailbox header without putting a “>” at the beginning. Minor inconveniences.
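The “100” years are the classic Y2K bug: C’s struct tm stores tm_year as years since 1900, so a mailer that printed it directly wrote 100 for 2000 (1900 + 100 = 2000). The post doesn’t say where the bad year appeared or how it was handled; assuming it was in the Date: header, a repair could look like this hypothetical sketch:

```python
import re

# A three-digit "year" from 100 to 199 in a Date: header, e.g. "Date: Sat, 1 Jan 100 12:00:00 -0500".
Y2K_DATE = re.compile(r"^(Date:.*\b\d{1,2} [A-Z][a-z]{2} )(1\d\d)(?=\s)")


def fix_y2k_date(line: str) -> str:
    """Turn a tm_year-style year (years since 1900) back into a four-digit year."""
    m = Y2K_DATE.match(line)
    if not m:
        return line
    year = 1900 + int(m.group(2))
    return line[: m.start(2)] + str(year) + line[m.end(2):]
```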

I guess tomorrow morning I’ll have to check that the archive is correct, and the nightly backups worked. But for now, I’m going to bed.

QA versus development

On a mailing list somewhere I was musing about why I, as a developer, always find myself annoyed at QA. And that’s not good, because QA and development are partners in making sure that what we develop comes to the customer as good as we can make it. But the problem is that development never has time to properly document for QA what we’re doing, and QA only communicates with development in the form of bug reports.

As far as I can tell, there are only N types of bug reports:

  1. annoying because you already knew about what they are reporting.
  2. annoying because you thought you were done that bit and now you have to go back to it.
  3. annoying because you know that part works and now you’ll have to drop everything to go show them how they are using it wrong.
  4. annoying because you thought that part works and now you’ll have to drop everything to have them show you how they are using it right.
  5. annoying because their bug report doesn’t give you enough detail.
  6. annoying because it goes into excruciating detail when you could tell what is wrong from the first sentence.
  7. annoying because it goes into excruciating detail about stuff you already knew, but glosses over the bit that tells you if it’s a known bug or something new.
  8. annoying because they are describing something that’s working the way it’s documented to work.
  9. annoying because they are describing something that’s working the way you want it to work, but you haven’t had time to document that behaviour yet.
  10. annoying because it’s the same bug they already logged a week ago.
  11. annoying because it’s so poorly written that you can’t tell if it’s the same problem as the one they already logged a week ago.
  12. annoying because you thought you’d fixed that last week but it’s obvious from the report that you missed something.
  13. annoying because you thought you’d fixed that last week and it’s not obvious from the report if they’re testing the new code or not.

But what it all comes down to is that QA is annoying because they’re a constant reminder that you’re not as good as you wish you were. I don’t want to be “only human”, I want to be perfect.

Getting there.

I’ve gotten a few steps closer to moving everything that was on my Linode virtual private server over to my colocation box. Basically, the only thing left there is the hardest one to move: the navaid.com waypoint generator. Part of the problem is that the new site has FCGI instead of FastCGI, and part of the problem is that I’m going to be converting from MySQL to PostgreSQL, and of course the version of MySQL in Debian Sarge doesn’t have the “--compatible” option in mysqldump. Oh well, I’ll get there.

Today I moved my Mailman mailing lists over. Since the versions of Mailman and Postfix were the same in both places, it was a pretty simple matter of copying the files over. The hard part was managing the cut-over so that no mail got lost. That meant getting everything set up on the new site, using rsync to make sure the files were absolutely up to date, checking the permissions, and, once I’d tested the setup using forced fake DNS entries, cutting over the real DNS entry. I think it’s all working right.

Next up, I’m considering moving my Gallery installation over. I’ve also got to get out and install a new hard disk that was given to me.

Tell me, is it a bit weird that on one of my few days off from a stressful software development project I spend the whole day futzing around with computers?

Today’s fascinating discovery

I’ve mentioned already that I put a system on a local rack, and in order to cut costs I divided it up into three sections using Xen. Well, I had this annoying little problem that the “domU”s (user domains, i.e. the shares) weren’t able to use iptables. So I’ve gone back to the drawing board by slapping a couple of drives I have kicking around into my Windows box and trying various experiments.

First, I went back to the “step-by-step” how-to that I’ve been using so far. Its authors have updated it for Xen 3.0.3 (I actually installed Xen 3.0.2 using a version written for 3.0.1). So I ran through it: no joy. The domU boots, but it mounts the ext3 file system as ext2 and won’t do iptables.

Next I tried their instructions for compiling a kernel. Those say to compile in iptables support, but don’t tell you how to compile in the appropriate device driver support, so I ended up with no network in my dom0 (the controller domain).
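Before booting a hand-built kernel, it can help to check its .config for the options you need. A minimal sketch follows; CONFIG_NETFILTER, CONFIG_IP_NF_IPTABLES and CONFIG_IP_NF_FILTER are real netfilter options, but CONFIG_E1000 (an Intel gigabit driver) is only a placeholder for whatever network card the machine actually has.

```python
from typing import Dict, Iterable, List

# Hypothetical checklist: netfilter/iptables plus a placeholder NIC driver.
REQUIRED = ["CONFIG_NETFILTER", "CONFIG_IP_NF_IPTABLES", "CONFIG_IP_NF_FILTER", "CONFIG_E1000"]


def parse_config(lines: Iterable[str]) -> Dict[str, str]:
    """Parse a kernel .config into {option: value}; '# CONFIG_X is not set' lines are skipped."""
    opts: Dict[str, str] = {}
    for line in lines:
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            key, value = line.split("=", 1)
            opts[key] = value
    return opts


def missing_options(opts: Dict[str, str], required: Iterable[str]) -> List[str]:
    """Required options that are neither built in (y) nor built as modules (m)."""
    return [name for name in required if opts.get(name) not in ("y", "m")]
```

Usage: `missing_options(parse_config(open(".config")), REQUIRED)` lists anything that still needs enabling.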

Then I found another how-to, this one based on the fact that Xen is in the Debian “sid” (aka “unstable”) branch. I updated the test machine to “sid” and went through the how-to. Initially I couldn’t get xend to start, but it turned out that I’d installed xen-hypervisor-3.0-unstable instead of xen-hypervisor-3.0.3. I got that installed and got the domU up and running, but DAMMIT, still the same problem. When I tried an “iptables -L”, it told me “QM_MODULES: Function not supported”. Same if I did a “depmod -a” or “lsmod”.

While I was working this angle, I discovered that the Debian Backports project had backported Xen to “sarge”. Hmmm, I thought, if this works out I’ll have to try the Backport to see if I can do this on the rack with minimal hassle and without having to run “unstable” on a “production” server.

That’s when I discovered something interesting: modutils is old, and if you’re only going to be using 2.6+ kernels, people recommend installing module-init-tools instead. modutils relies on the query_module system call (that’s what the “QM_MODULES” in the error refers to), which 2.6 kernels no longer provide. Since I’ve been installing Debian “sarge” (aka “stable”) in the domUs, and “sarge” is designed to support both 2.4 and 2.6 kernels, it installs modutils instead. I installed module-init-tools, and suddenly everything worked.
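The version rule can be written down directly. A small sketch, with a hypothetical helper name, that picks the module tools package from a kernel release string such as the one `platform.release()` returns:

```python
import re


def module_tools_for(release: str) -> str:
    """Pick the Debian package that can manage modules for a kernel release like '2.6.16-2-xen-686'."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if not m:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor = int(m.group(1)), int(m.group(2))
    # query_module(), which modutils relies on, is gone from 2.6 kernels onward.
    return "module-init-tools" if (major, minor) >= (2, 6) else "modutils"
```

Inside a domU, `module_tools_for(platform.release())` would report “module-init-tools” for any 2.6 Xen kernel.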

Hey, I thought, maybe I don’t have to go through all this pain. I went to my real xen system, installed module-init-tools on the domU, and everything works! No need to go for the Backport. Maybe I will later, but for now I’ve got what I want, and I can install ssh-blacklist on my domU.