Xen troubles

Yesterday I went out to the colo to put another hard drive in my 1U box. I’ve shut down my box about 3 times now, and one of those times the third domU got a corrupt disk and had to be wiped and reinstalled. That’s why I tried so hard to make sure that all the disks got mounted as ext3 (with journalling) instead of ext2 (no journalling). This time, to be safe, I ran “xm shutdown [domU name]” on all three domUs before I shut down the box, so that they would shut down cleanly.

It took a bit of struggling to get the second drive working – I had to jumper the drives as master and slave instead of cable select, and the 80-conductor cable I brought along didn’t quite stretch from one to the other so I had to stick with the existing 40-conductor cable. But other than that, it seemed like everything went fine.

Until I got an email from the owner of the third domU. He couldn’t log in. So I tried “xm console” and saw:

xm console xen3
attempt to access beyond end of device
hda1: rw=0, want=1357711368, limit=104857600
attempt to access beyond end of device
hda1: rw=0, want=18058643056, limit=104857600
attempt to access beyond end of device
hda1: rw=0, want=2123850752, limit=104857600
attempt to access beyond end of device

and then it would prompt for a userid but never prompt for a password.
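
One way to read those numbers is to compare each “want” against the “limit”. The sketch below is Python and purely illustrative: the `overshoot` helper and its regular expression are my own invention, not part of Xen or the kernel. It parses the lines above and prints how far past the end of the device each request lands. The ratio doesn’t depend on what unit the kernel is counting in, as long as both numbers use the same one, which I’m assuming here.

```python
import re

# Matches kernel lines such as: "hda1: rw=0, want=1357711368, limit=104857600"
LINE_RE = re.compile(r"(\w+): rw=(\d+), want=(\d+), limit=(\d+)")


def overshoot(line):
    """Return (device, want, limit, want/limit), or None if the line doesn't match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    want = int(m.group(3))
    limit = int(m.group(4))
    return m.group(1), want, limit, want / limit


console = """\
attempt to access beyond end of device
hda1: rw=0, want=1357711368, limit=104857600
attempt to access beyond end of device
hda1: rw=0, want=18058643056, limit=104857600
attempt to access beyond end of device
hda1: rw=0, want=2123850752, limit=104857600
"""

for line in console.splitlines():
    result = overshoot(line)
    if result:
        dev, want, limit, ratio = result
        print(f"{dev}: want is {ratio:.1f}x the device limit")
```

Requests that land many times past the end of the device look more like garbage block numbers in damaged filesystem metadata than like a partition that is a little too small, though that is an inference, not a diagnosis.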

I shut down his domU and did an fsck on his lv, and it reported dozens if not hundreds of errors. It boots now, but I’m scared that it’s going to do this again.

Gah! I used to do this for a living?

As part of my vacation recovery, I decided to submit a patch to make GPSBabel understand CoPilot version 4 files. I wrote the module for understanding CoPilot files back in 2002, but it only understood version 3. I decided to make it check the version number in the header and do the right thing for any version.

Now back in 2002, I cargo-culted the existing GPSPilot code, and what I was doing wasn’t hugely different from what I already had there. But I haven’t written C code for a living since … (checking my resume) … 1994. Since that time, I’ve been coding in C++, Java, and Perl. And I haven’t even done C++ since 2002. Grovelling along a “pointer to data” to try and extract some binary data into a format I can use is something that these days I’d do using pack/unpack in Perl. C just seems so damn primitive now – almost like something that belongs in the last millennium. And it does. I was so impressed with it when I first started using it. But that was a lifetime ago.
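
To show what I mean by the pack/unpack style, here is a minimal sketch using Python’s `struct` module, which plays much the same role as Perl’s pack/unpack. It reads a version number from the front of a record and picks the matching layout. The record layouts, field names, and the `parse_record` function are all hypothetical, made up for illustration; this is not the real CoPilot file format and not the GPSBabel code.

```python
import struct

# HYPOTHETICAL layouts, for illustration only; not the real CoPilot format.
# Big-endian: a 2-byte version, then latitude and longitude as doubles,
# and (in the made-up "version 4" only) an extra elevation float.
HEADER = struct.Struct(">H")
BODY_BY_VERSION = {
    3: struct.Struct(">dd"),
    4: struct.Struct(">ddf"),
}


def parse_record(data):
    """Decode one record, choosing the body layout from the version field."""
    (version,) = HEADER.unpack_from(data, 0)
    body = BODY_BY_VERSION.get(version)
    if body is None:
        raise ValueError(f"unsupported record version {version}")
    fields = body.unpack_from(data, HEADER.size)
    lat, lon = fields[0], fields[1]
    elevation = fields[2] if version >= 4 else None
    return {"version": version, "lat": lat, "lon": lon, "elevation": elevation}


# Build one sample record of each made-up version and decode them.
v3 = struct.pack(">Hdd", 3, 43.1, -77.6)
v4 = struct.pack(">Hddf", 4, 43.1, -77.6, 170.0)
print(parse_record(v3))
print(parse_record(v4))
```

The format strings do the work that pointer arithmetic does in C: byte order, field widths, and offsets are declared once instead of being walked by hand.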

QA versus development

On a mailing list somewhere I was musing about why I, as a developer, always find myself annoyed at QA. And that’s not good, because QA and development are partners in making sure that what we develop comes to the customer as good as we can make it. But the problem is that development never has time to properly document for QA what we’re doing, and QA only communicates to development in the form of bug reports.

As far as I can tell, there are only N types of bug reports:

  1. annoying because you already knew about what they are reporting.
  2. annoying because you thought you were done with that bit and now you have to go back to it.
  3. annoying because you know that part works and now you’ll have to drop everything to go show them how they are using it wrong.
  4. annoying because you thought that part works and now you’ll have to drop everything to have them show you how they are using it right.
  5. annoying because their bug report doesn’t give you enough detail.
  6. annoying because it goes into excruciating detail when you could tell what is wrong from the first sentence.
  7. annoying because it goes into excruciating detail about stuff you already knew, but glosses over the bit that tells you if it’s a known bug or something new.
  8. annoying because they are describing something that’s working the way it’s documented to work.
  9. annoying because they are describing something that’s working the way you want it to work, but you haven’t had time to document that behaviour yet.
  10. annoying because it’s the same bug they already logged a week ago.
  11. annoying because it’s so poorly written that you can’t tell if it’s the same problem as the one they already logged a week ago.
  12. annoying because you thought you’d fixed that last week but it’s obvious from the report that you missed something.
  13. annoying because you thought you’d fixed that last week and it’s not obvious from the report if they’re testing the new code or not.

But what it all comes down to is that QA is annoying because they’re a constant reminder that you’re not as good as you wish you were. I don’t want to be “only human”, I want to be perfect.

Getting there.

I’ve gotten a few steps closer to moving everything that was on my Linode virtual private server over to my colocation box. Basically, the only thing left there is the hardest one to move, and that’s the navaid.com waypoint generator. Part of the problem is that the new site has FCGI instead of FastCGI, and part of the problem is that I’m going to be converting from MySQL to PostgreSQL, and of course the version of MySQL in Debian Sarge doesn’t have the “--compatible” option in mysqldump. Oh well, I’ll get there.

Today I moved my Mailman mailing lists over. Since the versions of Mailman and Postfix were the same in both places, it was a pretty simple matter of copying the files over. The hard part was managing the cut-over so that no mail got lost. That meant getting everything set up on the new site, using rsync to make sure the files were absolutely up to date, checking the permissions, and once I’d tested the setup using forced fake DNS entries, cutting over the real DNS entry. I think it’s all working right.
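
The “files up to date, permissions right” check is the sort of thing that can be scripted. Here is a rough Python sketch, not what I actually ran: it assumes both copies of the tree are visible as local directories (for example, one of them pulled down for comparison), and the `snapshot` and `differences` helpers are names I made up. It compares only file sizes and permission bits, not contents or ownership.

```python
import os
import stat
import tempfile


def snapshot(root):
    """Map each relative file path under root to (size, permission bits)."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            rel = os.path.relpath(full, root)
            result[rel] = (st.st_size, stat.S_IMODE(st.st_mode))
    return result


def differences(old_root, new_root):
    """Return sorted relative paths that are missing or differ between the trees."""
    old, new = snapshot(old_root), snapshot(new_root)
    return sorted(p for p in old.keys() | new.keys() if old.get(p) != new.get(p))


# Tiny demonstration with two throwaway directories.
with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
    for root in (a, b):
        path = os.path.join(root, "example.dat")
        with open(path, "w") as f:
            f.write("same")
        os.chmod(path, 0o660)
    os.chmod(os.path.join(b, "example.dat"), 0o600)  # simulate a permission slip
    print(differences(a, b))
```

An empty list means the two trees agree on every file’s size and mode; anything listed deserves a look before the DNS change.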

Next up, I’m considering moving my Gallery installation over. I’ve also got to get out and install a new hard disk that was given to me.

Tell me, is it a bit weird that on one of my few days off from a stressful software development project I spend the whole day futzing around with computers?

Now I’ve *really* had it with this guy.

Back in June, I wrote about how the Maintenance Coordinator for the club’s Lance is downright secretive. Today, Vicki, Laura and I went confidently out to the airport for a flight we’d scheduled last week, taking the Lance out to Albany to spend Thanksgiving at Stevie’s new apartment. We loaded our luggage in the luggage compartment, and Vicki and Laura went back into the FBO while I preflighted. I open the front door and sit down, and there is a sign taped to the yoke saying the plane is grounded. Oh oh. I do a quick walk around and discover that the wingtip appears to have been scraped by somebody or something, and it’s taken off the navlight/strobe fixture, the guts of which are held in place by scotch tape.

OK, now I’m mad, because when the plane is grounded, the Maintenance Coordinator is supposed to mark the plane as grounded in ScheduleMaster, our on-line scheduling system, so that people don’t expect to be able to use it for a trip and then find out only when they get to the airport that there is a known problem with it.

I head back to the ops room to check ScheduleMaster to see if any of the other planes are available. That’s when I discover something that made me 300% madder still – the squawk list shows that this wing navlight/strobe was squawked on 10/8, over 6 weeks ago! The squawk says that the person reporting it immediately called the Maintenance Coordinator. But the Maintenance Coordinator has let this plane sit there grounded for 6 fucking weeks without letting anybody know!

I called the VP of Maintenance, and he didn’t know either. He says that when it happened, Bill, the Maintenance Coordinator, said he would get it fixed that very week, and he’d assumed Bill had done that. I told him that this is totally unacceptable, and either he removes Bill as Maintenance Coordinator, or I’m going to join the Artisan club instead. He asked me if I wanted to “move up” to Maintenance Coordinator, and I said sure.

Meanwhile, none of the other club planes are booked for a long trip, just for a few hours here and there. I call the guy who has the Dakota booked for Friday morning when we are planning to return, and he says he is booked to have some instruction in it so he can’t easily reschedule. One of the Archers is free except that somebody has it right now and isn’t due to return for a few hours. So the choice is to wait three hours and then fly, or drive now. Neither option will get us there before dark. So we elect to drive.

I think it’s time to check out the costs at Artisan. I know they’re a smaller, more expensive club and I don’t like their fleet balance quite as much (other than a Lance, they’ve got a couple of Arrows and a Warrior, and an Arrow doesn’t haul anywhere near what a Dakota does), but they actually put money into their Lance (including a Garmin 530) and it gets flown.