By all means, Paul, …

Yesterday I went to CompUSA to buy a new PCI IDE controller – one of my external USB drives that I bought to perform networked backups of my colo keeps losing its mind, and I was thinking that it might be that the USB controller (either the interface card or the one in the external box) isn’t up to heavy data transfer, so I thought it might be good to move it “indoors” as it were.

I installed the controller and a 250GB hard drive. The system found it at /dev/hdg – I guess I put it on the second of the two IDE controllers on the card. I made /dev/hdg a physical volume (pv) under LVM2, made it a volume group (vg) and put a logical volume (lv) on it for my mp3 collection. After I moved my mp3s from where they’d lived before, on /dev/hdb, I wiped /dev/hdb and made it part of the vg, and made another vg for the colo backup. Yesterday I also discovered the “--link-dest” argument to rsync, so I can keep several days’ worth of backups in much less space.
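For anyone who hasn’t used “--link-dest”: it tells rsync to hard-link any file that is unchanged from a previous snapshot instead of copying it again, so each day’s directory looks like a full backup but only the changed files take up new space. Here is a rough sketch of how a nightly job might build the command. The host name, paths and one-directory-per-day layout are made up for illustration, not my actual setup.

```python
from datetime import date, timedelta
from pathlib import PurePosixPath


def rsync_snapshot_cmd(source, backup_root, day):
    """Build an rsync command for a dated snapshot (hypothetical layout).

    Files unchanged since the previous day's snapshot are hard-linked
    from it rather than copied, which is what --link-dest does.
    """
    root = PurePosixPath(backup_root)
    target = root / day.isoformat()
    previous = root / (day - timedelta(days=1)).isoformat()
    return [
        "rsync",
        "-a",                          # archive mode: keep perms, times, links
        "--delete",                    # drop files that vanished from the source
        f"--link-dest={previous}",     # absolute path to yesterday's snapshot
        f"{source}/",                  # trailing slash: copy contents, not the dir
        f"{target}/",
    ]


# Example with made-up names; pass the list to subprocess.run() to execute it.
cmd = rsync_snapshot_cmd("colo:/home", "/backup/colo", date(2007, 1, 9))
print(" ".join(cmd))
```

Using an absolute path for “--link-dest” avoids surprises, since rsync interprets a relative one relative to the destination directory.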

Tonight I’m going to rip that hard disk out of the external USB drive and put it on my currently eviscerated Windows machine to see if IBM/Hitachi’s “DFT” drive function tester can find any problems with it. If not, I’ll add it to the vg and increase the size of the mp3 lv.

Tomorrow my new colo box should arrive, unless UPS does their customary screw-ups. I’m scheduled to go out to the colo facility on Thursday. I’m going to move the old drives to the new box, and upgrade to the newest version of Xen. I’ve practiced upgrading to the new Xen on my currently eviscerated Windows box (that’s why it’s eviscerated, I had to put scratch disks in it) and it didn’t go well, but I think I know what I did wrong. I also tried a full install of Debian on the dom0, and was able to save the domUs when I tried that.

If that goes well, I’ll be up again in a few hours. If it doesn’t go well, I’ll bring it home and work on it overnight, and I’m tentatively scheduled to go back to the colo on Friday.

The ultimate Heisenbug?

We’ve got a problem that happens apparently at random times at a few customer sites, but which we’ve been unable to reproduce in the lab. I’m not sure if that means it’s a Heisenbug or just a really nasty Bohr-bug.

The affected part of the system consists of three programs:

  • One that generates events, called “tixd”
  • One that is responsible for collecting events from all the programs in the system (not just these three) and delivering them to subscribers, called the “EventBroker” or “eb”
  • One that subscribes to the events that the “tixd” generates, which we call the “scheduled”

What has been happening on these customer sites is that after days or weeks of proper operation, for no apparent reason, the “tixd” would say that it was generating events, but the “scheduled” wasn’t getting them any more. The customer would notice the problem, sometimes a day or two later, and complain that things weren’t happening that were supposed to happen. Our service people would restart the whole system, and everything would start working again.

This bug has been happening for ages now, and every time I get called in to look at their logs because I wrote the “scheduled” and all the fingers point to me. But I couldn’t find any reason why “scheduled” would stop responding to events, or would unsubscribe from events. A few builds ago, Tom put some debug into his “eb” that would log every event that came in and which subscribers it was being delivered to. He also logged subscribes and unsubscribes. And so we waited.

Today, it finally happened again. And this time, I’ve got the logs that show:

  • At 6am, an event is generated by the tixd, and the eb delivers it to the subscriber scheduled
  • Between 10am and 11am, there is a flurry of event subscribes and unsubscribes, all unrelated to scheduled. But some of these unsubscribes are caused when events are being delivered to subscribers that have exited without unsubscribing.
  • At about 1am, there is another event generated by the tixd, and the eb receives it but says there are no subscribers found.
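To make that sequence concrete, here is a toy model of a broker that logs the same kinds of things: subscribes, unsubscribes, who each event is delivered to, and the pruning of subscribers that have gone away. This is not Tom’s code. Every name in it (EventBroker, DeadSubscriber, the “tix” event) is invented for the sketch, and the real eb talks to separate processes rather than calling Python functions.

```python
import logging

log = logging.getLogger("eb")


class DeadSubscriber(Exception):
    """Raised by a callback whose owner has exited (hypothetical signal)."""


class EventBroker:
    def __init__(self):
        # event name -> {subscriber name: callback}
        self.subscribers = {}

    def subscribe(self, event, name, callback):
        self.subscribers.setdefault(event, {})[name] = callback
        log.info("subscribe %s -> %s", event, name)

    def unsubscribe(self, event, name, reason="requested"):
        if self.subscribers.get(event, {}).pop(name, None) is not None:
            log.info("unsubscribe %s -> %s (%s)", event, name, reason)

    def publish(self, event, payload):
        # Copy the items first, because delivery may prune entries.
        targets = list(self.subscribers.get(event, {}).items())
        if not targets:
            log.warning("event %s: no subscribers found", event)
            return []
        delivered = []
        for name, callback in targets:
            try:
                callback(payload)
                delivered.append(name)
            except DeadSubscriber:
                # Subscriber exited without unsubscribing: drop only that one.
                self.unsubscribe(event, name, reason="subscriber exited")
        log.info("event %s delivered to %s", event, delivered)
        return delivered
```

In this toy version, pruning a dead subscriber removes only that one entry, so a healthy subscriber like scheduled keeps receiving events. Whether the real eb does the same during a flurry like the one in the logs is exactly the open question; the logs don’t prove it either way.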

At this point, because the eb log shows no unsubscribe coming from scheduled, I’d say it’s not my bug and pass it off to Tom, the author of the eb. But unfortunately, my employer declined to renew Tom’s contract at the end of last year, so he no longer works here. He dodged this bullet by only 5 days. And so I’ve got to figure out why this is happening. Lucky me.

Are you a pilot who blogs, or a blogger who flies?

I got an email today from “IFR Pilot” (who also signs off as Darrell) cc’ed to a bunch of other pilot-bloggers proposing that we all have a fly-in and get to know each other. After a few massively cc’ed exchanges where people seemed enthusiastic about the idea, I set up a mailing list so that other pilot-bloggers could find this list and sign up. If you are in that category, you can sign up at this link.

A lot of the people on “IFR Pilot”’s list were people I’d never heard of, so I can see I’m going to be adding a whole bunch of new blogs to my RSS reader.

So how’d I do? (Aviation edition)

For 2006 I set myself a few goals for my flying. If I recall correctly, they were

  • Fly 50 hours this year.
  • Do some airwork and get more proficient at smooth flight, especially the use of the rudder.
  • Start work towards a Commercial or Float Plane rating.

Well, it didn’t quite work out that way. I only got 37.9 hours flying time (25.3 complex), although I would have been 5 or so hours closer to my goal if the Lance hadn’t been broken on the day we departed for Oshkosh, and maybe another 3 hours if we’d been able to fly to Albany on Thanksgiving weekend. Oh well. That’s still up from the 20-25 hours I normally put in a year. I also didn’t do much airwork, mostly cross country. So I still find myself having to look at the ball and putting in rudder as an afterthought rather than feeling what needs to be put in. However, I did get training in the Garmin 530, and I think I’m getting more precise in my approaches and IFR en-route flying. I also had a little adventure with ice avoidance and negotiating with ATC for what I needed on my way home from Pinckneyville. So while I didn’t meet my goals, I think I had a pretty satisfying flying year.

I’m not sure if I’m going to get to Oshkosh this year – this is our 10th anniversary and I think I’m going to be spending my vacation time on a cruise or something. So I probably won’t be heading down to Florida for Jack Brown’s Seaplane Base or up to Parry Sound for Georgian Bay Airways for a float rating either.

So my goals for this year remain

  • Become a more proficient yoke and rudder pilot.
  • Continue to fly more than I have been in the past.