Archive for November, 2006

Yesterday I went out to the colo to put another hard drive in my 1U box. I’ve shut down my box about 3 times now, and one of the times the third domU got a corrupt disk and had to be wiped and reinstalled. That’s why I tried so hard to make sure that all the disks got mounted as ext3 (with journalling) instead of ext2 (no journalling). This time, to be safe, I ran “xm shutdown [domU name]” on all three domUs before I shut down the box, so that they would all shut down cleanly.

It took a bit of struggling to get the second drive working – I had to jumper the drives as master and slave instead of cable select, and the 80-conductor cable I brought along didn’t quite stretch from one to the other so I had to stick with the existing 40-conductor cable. But other than that, it seemed like everything went fine.

Until I got an email from the owner of the third domU. He couldn’t log in. So I tried the “xm console”, and saw

xm console xen3
attempt to access beyond end of device
hda1: rw=0, want=1357711368, limit=104857600
attempt to access beyond end of device
hda1: rw=0, want=18058643056, limit=104857600
attempt to access beyond end of device
hda1: rw=0, want=2123850752, limit=104857600
attempt to access beyond end of device

and then it would prompt for a userid but never prompt for a password.

I shut down his domU and did an fsck on his lv, and it reported dozens if not hundreds of errors. It boots now, but I’m scared that it’s going to do this again.
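A quick sanity check on those console numbers, as a rough sketch only: assuming want and limit are counted in the kernel's usual 512-byte sectors, the limit works out to a 50 GiB device, and every request asks for a sector far past the end. That would be consistent with garbage block pointers in damaged filesystem metadata rather than a device that is merely a little too small. The function name and dictionary keys below are made up for illustration.

```python
import re

SECTOR_BYTES = 512  # assumption: want/limit are in 512-byte sectors

# Matches lines such as "hda1: rw=0, want=1357711368, limit=104857600"
LINE = re.compile(
    r"^(?P<dev>\w+): rw=(?P<rw>\d+), want=(?P<want>\d+), limit=(?P<limit>\d+)$"
)


def parse_beyond_end(log_text):
    """Return one dict per want/limit line found in the console output."""
    hits = []
    for line in log_text.splitlines():
        match = LINE.match(line.strip())
        if match is None:
            continue  # skip the "attempt to access beyond end of device" lines
        want = int(match.group("want"))
        limit = int(match.group("limit"))
        hits.append({
            "dev": match.group("dev"),
            "want": want,
            "limit": limit,
            "device_gib": limit * SECTOR_BYTES / 2**30,  # device size in GiB
            "overshoot": want / limit,  # how many times past the end
        })
    return hits


SAMPLE = """\
attempt to access beyond end of device
hda1: rw=0, want=1357711368, limit=104857600
attempt to access beyond end of device
hda1: rw=0, want=18058643056, limit=104857600
attempt to access beyond end of device
hda1: rw=0, want=2123850752, limit=104857600
"""

if __name__ == "__main__":
    for hit in parse_beyond_end(SAMPLE):
        print(hit["dev"], hit["device_gib"], round(hit["overshoot"], 1))
```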

As part of my vacation recovery, I decided to submit a patch to make GPSBabel understand CoPilot version 4 files. I wrote the module for understanding CoPilot files back in 2002, but it only understood version 3. I decided to make it check the version number in the header and do the right thing for any version.

Now back in 2002, I cargo culted the existing GPSPilot code, and what I was doing wasn’t hugely different from what I already had there. But I haven’t written C code for a living since … (checking my resume) … 1994. Since that time, I’ve been coding in C++, Java, and perl. And I haven’t even done C++ since 2002. Grovelling along a “pointer to data” to try and extract some binary data into a format I can use is something that these days I’d do using pack/unpack in perl. C just seems so damn primitive now – almost like something that belongs in the last millennium. And it does. I was so impressed with it when I first started using it. But that was a lifetime ago.
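To show the pack/unpack style of "check the version in the header and do the right thing", here is a minimal sketch using Python's struct module, which plays much the same role as perl's pack/unpack. The record layouts are entirely hypothetical: they are not the real CoPilot format and not what the GPSBabel module does, just a two-byte version followed by a version-dependent body.

```python
import struct

# Hypothetical layouts for illustration only, NOT the real CoPilot format.
HEADER = struct.Struct(">H")  # big-endian unsigned 16-bit version number
BODY_BY_VERSION = {
    3: struct.Struct(">dd"),   # assumed: latitude, longitude
    4: struct.Struct(">ddf"),  # assumed: latitude, longitude, elevation
}


def parse_record(data):
    """Read the version from the header, then unpack the matching body."""
    (version,) = HEADER.unpack_from(data, 0)
    try:
        body = BODY_BY_VERSION[version]
    except KeyError:
        raise ValueError(f"unsupported version {version}")
    # The body starts right after the header bytes.
    fields = body.unpack_from(data, HEADER.size)
    return version, fields


if __name__ == "__main__":
    v3 = struct.pack(">Hdd", 3, 43.1, -77.6)
    v4 = struct.pack(">Hddf", 4, 43.1, -77.6, 0.5)
    print(parse_record(v3))
    print(parse_record(v4))
```

The dispatch table keeps each version's layout in one place, so supporting another version means adding one entry rather than another branch of pointer arithmetic.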

On a mailing list somewhere I was musing about why I, as a developer, always find myself annoyed at QA. And that’s not good, because QA and development are partners in making sure that what we develop comes to the customer as good as we can make it. But the problem is that development never has time to properly document what we’re doing to QA, and QA only communicates to development in the form of bug reports.

As far as I can tell, there are only N types of bug reports:

  1. annoying because you already knew about what they are reporting.
  2. annoying because you thought you were done that bit and now you have to go back to it.
  3. annoying because you know that part works and now you’ll have to drop everything to go show them how they are using it wrong.
  4. annoying because you thought that part works and now you’ll have to drop everything to have them show you how they are using it right.
  5. annoying because their bug report doesn’t give you enough detail.
  6. annoying because it goes into excruciating detail when you could tell what is wrong from the first sentence.
  7. annoying because it goes into excruciating detail about stuff you already knew, but glosses over the bit that tells you if it’s a known bug or something new.
  8. annoying because they are describing something that’s working the way it’s documented to work.
  9. annoying because they are describing something that’s working the way you want it to work, but you haven’t had time to document that behaviour yet.
  10. annoying because it’s the same bug they already logged a week ago.
  11. annoying because it’s so poorly written that you can’t tell if it’s the same problem as the one they already logged a week ago.
  12. annoying because you thought you’d fixed that last week but it’s obvious from the report that you missed something.
  13. annoying because you thought you’d fixed that last week and it’s not obvious from the report if they’re testing the new code or not.

But what it all comes down to is that QA is annoying because they’re a constant reminder that you’re not as good as you wish you were. I don’t want to be “only human”, I want to be perfect.

I’ve gotten a few steps closer to moving everything that was on my Linode virtual private server over to my colocation box. Basically, the only thing left there is the hardest one to move, and that’s the navaid.com waypoint generator. Part of the problem is that the new site has FCGI instead of FastCGI, and part of the problem is that I’m going to be converting from MySQL to PostgreSQL, and of course the version of MySQL in Debian Sarge doesn’t have the “compatibility” option in mysqldump. Oh well, I’ll get there.

Today I moved my Mailman mailing lists over. Since the versions of Mailman and Postfix were the same on both places, it was a pretty simple matter of copying the files over. The hard part was managing the cut-over so that no mail got lost. That meant getting everything set up on the new site, using rsync to make sure the files were absolutely up to date, checking out the permissions, and once I’d tested the setup using forced fake DNS entries, cutting over the real DNS entry. I think it’s all working right.

Next up, I’m considering moving my Gallery installation over. I’ve also got to get out and install a new hard disk that was given to me.

Tell me, is it a bit weird that on one of my few days off from a stressful software development project I spend the whole day futzing around with computers?

Back in June, I wrote about how the Maintenance Coordinator for the club’s Lance is downright secretive. Today, Vicki, Laura and I went confidently out to the airport for a flight we’d scheduled last week, taking the Lance out to Albany to spend Thanksgiving at Stevie’s new apartment. We loaded our luggage in the luggage compartment, and Vicki and Laura went back into the FBO while I preflighted. I open the front door and sit down, and there is a sign taped to the yoke saying the plane is grounded. Oh oh. I do a quick walk around and discover that the wingtip appears to have been scraped by somebody or something, and it’s taken off the navlight/strobe fixture, the guts of which are held in place by scotch tape.

Ok, now I’m mad, because when the plane is grounded, the Maintenance Coordinator is supposed to mark the plane as grounded in ScheduleMaster, our on-line scheduling system, so that people don’t expect that they’re going to be able to use it for a trip and don’t find out until they get to the airport that there is a known problem with it.

I head back to the ops room to check ScheduleMaster to see if any of the other planes are available. That’s when I discover something that made me 300% madder still – the squawk list shows that this wing navlight/strobe was squawked on 10/8, over 6 weeks ago! The squawk says that the person reporting it immediately called the Maintenance Coordinator. But the Maintenance Coordinator has let this plane sit there grounded for 6 fucking weeks without letting anybody know!

I called the VP of Maintenance, and he didn’t know either. He says that when it happened, Bill, the Maintenance Coordinator, said he would get it fixed that very week, and he’d assumed Bill had done that. I told him that this is totally unacceptable, and either he removes Bill as Maintenance Coordinator, or I’m going to join Artisan club instead. He asked me if I wanted to “move up” to Maintenance Coordinator, and I said sure.

Meanwhile, none of the other club planes are booked for a long trip, just for a few hours here and there. I call the guy who has the Dakota booked for Friday morning when we are planning to return, and he says he is booked to have some instruction in it so he can’t easily reschedule. One of the Archers is free except that somebody has it right now and isn’t due to return for a few hours. So the choice is to wait three hours and then fly, or drive now. Neither option will get us there before dark. So we elect to drive.

I think it’s time to check out the costs at Artisan. I know they’re a smaller, more expensive club and I don’t like their fleet balance quite as much (other than a Lance, they’ve got a couple of Arrows and a Warrior, and an Arrow doesn’t haul anywhere near what a Dakota does), but they actually put money into their Lance (including a Garmin 530) and it gets flown.

I’ve mentioned already that I put a system on a local rack, and in order to cut costs I divided it up into three sections using Xen. Well, I had this annoying little problem that the “domU” (user domains – i.e. the shares) weren’t able to use iptables. So I’ve gone back to the drawing board by slapping a couple of drives I have kicking around into my Windows box and trying various experiments.

First, I went back to the “step-by-step” how-tos that I’ve been using so far. They’ve updated it for Xen 3.0.3 (I actually installed Xen 3.0.2 using a how-to written for 3.0.1). So I ran through it – no joy. The domU boots, but mounts the ext3 file system as ext2 and won’t do iptables.

Tried again with their instructions on how to compile a kernel, except the instructions say to compile in iptables support, but don’t tell you how to compile in appropriate device driver support so I ended up with no network in my dom0 (the controller domain).

Then I found another “how-to”, this one based on the fact that Xen is in the Debian “sid” (aka “unstable”) branch. Updated the test machine to “sid”, then went through the how-to. Initially, couldn’t get xend to start up, but then it turned out that I’d installed xen-hypervisor-3.0-unstable instead of xen-hypervisor-3.0.3. Got that installed, got the domU up and running, but DAMMIT, still the same problem. When I tried to do an “iptables -L”, it would tell me that “QM_MODULES: Function not supported”. Same if I did a “depmod -a” or “lsmod”.

While I was working this angle, I discovered that the Debian Backports project had backported Xen to “sarge”. Hmmm, I thought, if this works out I’ll have to try the Backport to see if I can do this on the rack with minimal hassle and without having to run “unstable” on a “production” server.

That’s when I discovered something interesting – modutils is old, and if you’re going to be using 2.6+ kernels only, people recommend you install module-init-tools instead. Since I’ve been installing Debian “sarge” (aka “stable”) in the domUs, and “sarge” is designed to support 2.4 and 2.6 kernels, it installs modutils instead. I installed module-init-tools, and suddenly everything worked.

Hey, I thought, maybe I don’t have to go through all this pain. I went to my real xen system, installed module-init-tools on the domU, and everything works! No need to go for the Backport. Maybe I will later, but for now I’ve got what I want, and I can install ssh-blacklist on my domU.
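The underlying rule seems to be that the old modutils tools depend on the query_module system call (hence the QM_MODULES complaint), which 2.6 kernels no longer provide, so a 2.6 domU needs module-init-tools. As a hedged sketch, a check like the following could flag a guest that is likely to hit the problem; the function name is invented here, and it only looks at the kernel release string, not at which package is actually installed.

```python
import platform
import re


def needs_module_init_tools(kernel_release):
    """True for 2.6 and later kernels, given a string like '2.6.16-xen'."""
    match = re.match(r"(\d+)\.(\d+)", kernel_release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {kernel_release!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Tuple comparison: (2, 6) and anything newer needs module-init-tools.
    return (major, minor) >= (2, 6)


if __name__ == "__main__":
    release = platform.release()  # kernel release of the running system
    print(release, needs_module_init_tools(release))
```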

Note: presidents.office is the President’s Office, yup.email.news is the Yale University Press, customer.care is their Customer Care contact email, and opa is the Office of Public Affairs


To: presidents.office@yale.edu
Cc: yup.email.news@yale.edu, customer.care@triliteral.org, opa@yale.edu
Subject: I'm sorry I'm going to have to do this...


The Yale University Press has taken to sending out "spam" (unsolicited commercial email) to email addresses trawled from web sites - I know because they hit addresses that never would have been used for conducting a business relationship. That behaviour is unconscionable. I have no alternative but to block all email from yale.edu to the domains under my control unless and until you cease this practice.


I'm sorry if that makes it harder for you to contact potential and current students, alumni and benefactors, but you should have thought about that before you decided to put the burden for your advertising budget on me and thousands of systems administrators like me instead of yourselves.

The temperature was forecast to go up to 63 degrees this afternoon. I thought I’d make one last attempt to get out for a final kayak trip and put it away. But when the peak temperature arrived, so did heavy rain and thunderstorms. A realistic assessment of my clothing and ability followed, and I decided that the risk of getting hypothermia on a river that nobody else is using was just too high, and I called it off.

The temperature is going to be in the low 50s tomorrow, and down into the 40s all weekend, so I think this is it.

I think I’m going to put a “farmer john” style wet suit and a spray skirt on my Christmas list. Oh, and a paddle float so I can self rescue.

Ok, I’m going to sound like a total Apple fan-boy with this, but I have to say it. Yesterday, my iPod fell out of its case. I picked it up and suddenly without the extra bulk of the case, I was once again struck by how utterly perfect it is. It’s small, it’s light, it’s beautiful, and the user interface is great. It feels good in your hand.

Ok, the screen is a bit scratched up, and so is the shiny back surface. But it’s still a wonder of modern industrial design.

And I look at the Zune, and I see an ugly brown brick, and I think “what the hell were they thinking?”

And before you write me off as a total Apple geek, I had the same feeling with my Treo when I used it for a day without the heavy magnesium innopocket.com case. Not as perfect as an iPod, but definitely smaller and sleeker than I normally think of it because I normally have it in that case.

When I came out of work on Thursday, even though the sun was down my car thermometer said it was an unseasonably warm 60 degrees F. The next morning, I took a quick glance at the weather widget on my Powerbook’s Dashboard, and it said that it was going up to 65. And thus a half-baked plan was born. I quickly put my kayak on my roof rack, which has been left on my car just in case such a day happened.

The intention was to sneak out for a few hours around lunch time and enjoy one last paddle for the year. Unfortunately reality interfered. It turns out that I’d read the weather widget before it had updated, and Friday was actually only going to get up into the mid 50s. Still maybe do-able. But unfortunately I got hellishly busy at work on Friday, and didn’t manage to slip out. Today was warmer, but raining, but still a remote possibility, but I was even busier at work. So I didn’t get out today either. And tomorrow it’s going to be a high of 43F, which would be cold even in a wet suit which I don’t have. Doesn’t look like it’s going up again until Thursday. I guess I’m going to give up and take the kayak off the roof.

Man, if I don’t get off this overtime treadmill soon I’m going to kill myself. Or somebody else.

Today, in spite of how busy we are, we got the word from our new boss Nancy that we all had to go to the monthly division meeting. (Ok, here’s where I prove how little attention I pay to the hierarchy: I don’t know if Nancy is Dave’s boss, or Mike’s boss, and I don’t know what slice of the company that meeting is really for, but let’s just call it ‘division’ for now.) I never go to these things, but first we got a message from Nancy saying she expected everybody there, and then another message from Dave saying he’d gotten the word that no matter how busy we were, we should make every effort to get there.

It was the typical boring monthly meeting – announcing all the anniversaries and stuff. But then they started handing out these enormous plaques to people who’d recently gotten patents. I’ve seen these plaques on people’s desks before, but I’ve never seen them handed out. And I’ve only ever seen them on pretty senior people’s desks. I wasn’t expecting one – my patent was awarded months and months ago, and besides I’m a lowly contractor. But I got one, and it had little tags for both of my patents. My boss, Dave, got one as well, with the same two tags.

Afterwards, Nancy told me that the whole reason she’d made the meeting mandatory was to make sure that Dave and I went, because neither of us were prone to going to these meetings.

Getting the plaque was surprising enough, but even more surprising was that for the rest of the day people were coming up and congratulating me. Now, I’m not thrilled about the concept of software patents at all, so I didn’t really know what to say. At first, I was saying stuff like “Oh, it wasn’t such a big deal” or “I’m not too proud of it”. But then I thought that probably isn’t very gracious of me, and might be insulting to other people who’ve gotten patents or who want them and haven’t gotten them yet. So then I started just saying “Thanks” and leaving it at that. But still later, some fellow software developers came up to congratulate/razz me, and I decided the best response was that it was a team effort and I feel sad that we couldn’t credit everybody on the patent. I also told one of the developers that one of the things she did, an automatic “matcher” algorithm, was definitely worth a patent and she should apply for one herself.

Ok, I’ve delivered a version of the code I’m working on with all the bits hooked together so that it should be test-able. Now I have to wait for a build to be produced so that I can install it on a test machine and start finding out all the ways it’s broken.

Since it’s going to be at least a couple of hours before the build is done, I’m going to go home, try to not watch any election stuff, and relax before I start Stage 2 Of Hell tomorrow.

I’ve been exhausted for days now – I feel like I can’t keep my eyes open. It’s accumulated lack of sleep and stress from this project I’m on – I’ve put in over 55 hours a week for about 2 months now.

Last night I went to bed early to try and get some rest. But I found myself lying in bed awake in the middle of the night, worrying that if I didn’t get back to sleep soon I’d be even more tired and wouldn’t be able to finish the work I promised to get finished by Monday.

Believe me, that’s not conducive to getting back to sleep. It’s a great cycle of stress leading to lack of sleep leading to stress.

I want my weekends back. Although the first weekend I get off after this project is done will probably be spent comatose.

Somebody broke the build badly at work. Sometime two or three days ago, they split a library JAR into two JARs, but didn’t update the Makefiles for all the places that relied on the JAR. I didn’t discover this until today because up until now I wasn’t relying on anything new in the JAR, and so hadn’t noticed that I was using an old copy of the original JAR. Because I had to use the new stuff in the JAR to complete what I am working on today, I ended up going to work to fix the build. Which meant a long, long cycle of doing “clearmake clean; clearmake” to make sure things were really fixed.

Java has a very bad habit of deciding that a source file it just compiled two minutes ago needs to be recompiled. So it can be compiling stuff in package “foo”, but because package “foo” uses some stuff from package “bar”, it might suddenly decide to recompile something in “bar”. This is a problem because I recently added something to a package called “dcms” that meant it needed to include a new jar file in its CLASSPATH. But I didn’t know that a package called “ai”, which uses a file from “dcms”, would suddenly decide it needs to compile a file from “dcms”, and therefore the CLASSPATH in “ai”’s Makefile needs this new jar as well. Or, if you’re a clearer thinker, you change the CLASSPATH in “ai”’s Makefile so that it’s looking at “dcms.jar” instead of “src/dcms”. Unfortunately we’ve all had this bad habit of including paths to source in the Makefiles of other modules. I think we’re eventually going to end up with a monster “include everything” CLASSPATH in all Makefiles. Ugh.
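The CLASSPATH creep described here is really a transitive closure: if a Makefile points at another package's source instead of its jar, it inherits every jar that package needs to compile. A rough sketch of that reasoning follows; the package names "ai" and "dcms" come from the post, while "newlib.jar" and the function name are placeholders invented for the example.

```python
def classpath_for(package, source_deps, jars):
    """Collect every jar needed to compile `package`, assuming any package
    reached through a source path may be recompiled along with it."""
    needed = []
    seen = set()
    stack = [package]
    while stack:
        pkg = stack.pop()
        if pkg in seen:
            continue  # already visited; also guards against dependency cycles
        seen.add(pkg)
        for jar in jars.get(pkg, []):
            if jar not in needed:
                needed.append(jar)
        # Follow source-path dependencies; jar dependencies would stop here.
        stack.extend(source_deps.get(pkg, []))
    return needed


if __name__ == "__main__":
    source_deps = {"ai": ["dcms"]}      # ai's Makefile points at src/dcms
    jars = {"dcms": ["newlib.jar"]}     # hypothetical new jar needed by dcms
    print(classpath_for("ai", source_deps, jars))
```

Pointing "ai" at dcms.jar instead of src/dcms amounts to removing the entry from source_deps, at which point "ai" no longer inherits anything.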

Another discovery is that SpamKarma2’s “RBL check” seems to be hanging until it times out. I’m not sure if that’s due to the move, or maybe the RBL server is down? I don’t know how to check that. I spent way too much time this morning tracking down every missing “auto_increment” column in every table, and I think I have them all now, but maybe I missed one?
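One way to check whether an RBL server is answering at all, sketched under assumptions: DNS blacklists are conventionally queried by reversing the octets of the IP address and appending the list's zone, and 127.0.0.2 is the conventional "always listed" test address. The zone rbl.example.org below is a placeholder, not the list SpamKarma2 uses, and both function names are made up. Timing the lookup by hand would show whether it returns quickly or sits there until the resolver gives up.

```python
import socket
import time


def rbl_query_name(ipv4, zone):
    """Build the DNS name to look up, e.g. 1.2.3.4 -> 4.3.2.1.<zone>."""
    octets = ipv4.split(".")
    if len(octets) != 4 or not all(o.isdigit() and int(o) < 256 for o in octets):
        raise ValueError(f"not a dotted-quad IPv4 address: {ipv4!r}")
    return ".".join(reversed(octets)) + "." + zone


def timed_lookup(name):
    """Resolve `name` and report (answered, seconds taken).

    answered is False both for "not listed" and for a resolver failure,
    so the elapsed time is the interesting part when hunting a hang."""
    start = time.monotonic()
    try:
        socket.gethostbyname(name)
        answered = True
    except OSError:
        answered = False
    return answered, time.monotonic() - start


# Example (performs a real DNS query, so it is left commented out):
# print(timed_lookup(rbl_query_name("127.0.0.2", "rbl.example.org")))
```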

Another discovery is that the only version of php4-gd for my Debian system requires a lot of X Windows libraries to be installed. I had no intention of installing X Windows stuff on this system, since it will never have a graphic head on it. But on the other hand, without php4-gd, I don’t have a CAPTCHA check for SpamKarma2, which means I have to look a bit more carefully at marginal spam.

When you move your blog to a new host:

  1. If you want your Powerbook to see the DNS change, issue the command lookupd -flushcache
  2. When you move a database from MySQL 5 to MySQL 4 using the compatibility mode of mysqldump, it doesn’t move the auto_increment attribute, and you have to restore it using alter table wp_posts change id ID bigint(20) auto_increment;
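Since restoring auto_increment means repeating that statement for every affected table, a small generator can save some typing. This is only a sketch: the wp_posts entry mirrors the statement in step 2, while the wp_comments entry is an assumed example, so check each column's name and type against your own schema before running anything it prints.

```python
def restore_auto_increment(columns):
    """Build one 'alter table' statement per (table, column, type) tuple,
    in the same form as the wp_posts statement from step 2."""
    return [
        f"alter table {table} change {column} {column} {sql_type} auto_increment;"
        for table, column, sql_type in columns
    ]


if __name__ == "__main__":
    columns = [
        ("wp_posts", "ID", "bigint(20)"),             # from the post
        ("wp_comments", "comment_ID", "bigint(20)"),  # assumed, verify first
    ]
    for statement in restore_auto_increment(columns):
        print(statement)
```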