Weird one

A week or two ago, somebody from customer support came to me because certain customer sites were having some weird problems: some sites weren’t seeing new content even though it had been delivered, and couldn’t update their schedules. She reported that restarting our services fixed it on most sites, but on one she’d had to reboot. Unfortunately there wasn’t anything obvious in the logs. The weird thing is that she said the problem started when our content provider moved to the new servers, but the content provider says that they didn’t change what they were sending, just what servers they were sending it from.

After a few days, she managed to find one site that was still having the problem, and I poked around and still couldn’t find much, except for the fact that if I went into the postgres command line, psql, it would allow me to do anything on the database except query one particular table. If I tried to do anything on that table, it would freeze up. Hmmm. Lacking any other ideas, I shut down the database server and restarted it, and that cleared up the problem. But shutting down the database server also kills off any processes that might be using the database, so I was starting to think that two processes had been in a deadlock over this one particular table. I filed away the information and asked her to call me if it happened on any other sites.

This morning, she comes and says it’s happened on several sites at once. She logged me into one of the sites, and sure enough psql would block if I tried to select from that same damn table. Time to dig a little deeper. “select * from pg_locks” showed a couple of exclusive locks. Hmmm. Doing a “ps auwwfx” showed that there was a vacuumdb going on. Uh oh. It’s the nightly backup scripts. A couple of years ago (21Sep2005), I threw a call to “vacuumdb --analyze --full” in there. A bit of googling showed that, duh, you’re not supposed to do a “--full” when anything else is going on because it takes an exclusive lock on whole tables. And JDBC has to use some sort of weird workaround for the fact that Postgres doesn’t give it an easy way to turn autocommit on, so it does the equivalent of a “commit;begin;” after every command. That causes some locking of its own, which is clashing with the vacuumdb locking.
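For anyone chasing something similar, here is a rough Python sketch of how the output of that pg_locks query could be sifted to see who is waiting on whom. It assumes the rows have already been fetched somehow into dictionaries keyed by the pg_locks columns relation, pid, mode and granted; the function name find_waiters and the sample values are made up for illustration. It only lists candidate blockers (any other process holding a granted lock on the same table), so it does not prove that the lock modes actually conflict.

```python
from collections import defaultdict

# Query one might run to get the rows; the column names come from pg_locks.
PG_LOCKS_QUERY = (
    "SELECT relation, pid, mode, granted FROM pg_locks "
    "WHERE relation IS NOT NULL"
)


def find_waiters(rows):
    """Map (waiting pid, relation) to the pids holding granted locks on it.

    `rows` is an iterable of dicts with the keys relation, pid, mode, granted.
    The holders listed are candidates only; lock modes are not compared.
    """
    rows = list(rows)

    # Collect every pid that holds a granted lock, per relation.
    holders = defaultdict(set)
    for row in rows:
        if row["granted"]:
            holders[row["relation"]].add(row["pid"])

    # For each ungranted request, list the other pids holding that relation.
    waiters = {}
    for row in rows:
        if not row["granted"]:
            others = holders[row["relation"]] - {row["pid"]}
            waiters[(row["pid"], row["relation"])] = sorted(others)
    return waiters


# Hypothetical sample: pid 100 holds an exclusive lock, pid 200 is stuck.
sample_rows = [
    {"relation": 16384, "pid": 100, "mode": "AccessExclusiveLock", "granted": True},
    {"relation": 16384, "pid": 200, "mode": "AccessShareLock", "granted": False},
    {"relation": 16390, "pid": 200, "mode": "AccessShareLock", "granted": True},
]
```

Matching the pids it reports against the “ps auwwfx” listing is what points the finger at vacuumdb in a case like this one.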

In all this time, it’s never caused any problems, or at least none with any regularity, but evidently the server relocation has caused some content ingestion to happen at the exact time this vacuumdb is running, which is causing these particular sites to have semi-regular problems.

I told them to go around to all the customer sites and edit /etc/init.d/backup_cos_files and remove the “--full” from the vacuumdb command line, and all should be well. I can’t believe I made such a dumb mistake, and that it didn’t cause any problems until now. Actually, I’m sort of hoping it will solve my other mysterious database lockup that was only happening about once a year per site. But that’s probably too much to hope for.
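If hand-editing every site gets tedious, the change itself is simple enough to script. This is only a sketch in Python of the one-line transformation, not what was actually run: the function name strip_full and the database name mydb in the example are invented, and it only handles the long “--full” spelling on a single vacuumdb command line.

```python
import shlex


def strip_full(command_line):
    """Return the command line with any standalone --full argument removed.

    Lines whose command is not vacuumdb are returned unchanged.
    """
    args = shlex.split(command_line)
    if not args or not args[0].endswith("vacuumdb"):
        return command_line
    kept = [arg for arg in args if arg != "--full"]
    # shlex.join (Python 3.8+) re-quotes arguments where needed.
    return shlex.join(kept)


# Hypothetical example line; "mydb" is a placeholder database name.
fixed = strip_full("vacuumdb --analyze --full mydb")
```

With “--full” gone, vacuumdb still runs the analyze and a plain vacuum, which should not need the exclusive table lock that was causing the trouble.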

Is it too early to start planning for Oshkosh?

Ok, I missed last year. I really want to go again, and I really want to fly. But we don’t have a Lance any more and somebody already booked the Dakota, so I’m going to have to be *really* careful about weight if I’m planning to go in an Archer. The other problem is that last time I went we camped, but after a day of walking around I was in incredible pain and damn near useless around the camp, and I felt bad about letting Mark do all the work. Maybe what I should do is pack a tent, a sleeping bag, and a bike to get to the Chinese buffet place across the road? Or maybe I should fly in, but park in GAP (General Aviation Parking) and find a place in one of the dorms or something?

I’m also thinking that, sacrilege of sacrileges, I might not go for the whole week. Maybe come in Saturday before things get started, stay for Jay Hoeneck’s famous party on Wednesday, and leave on Thursday. Much as I hate to admit it, the air shows get pretty repetitive after 4 or 5 days, and unless you’re specifically in the market for something, you can see it all in that time as well. Plus, when I stay the full week I’m in real danger of getting talked into buying a kit. After all, I built a canoe from scratch, how much harder could an airplane be? Oh, that much harder? Ok, never mind then.

If I die

I was reading a forum thread about a scuba accident that killed a friend of my brother’s. My brother was also involved in finding the body after the RCMP tried and failed.

One of the thread contributors posted this thing for divers, but it made me think of Mike and Dave’s recent deaths in their float plane, and my own thoughts about the possibility of dying in a plane crash.

If I should die while diving.

If I should die while diving please do not hesitate to discuss the incident and assess every element with a view to furthering your understanding of how to enhance diver safety.

If I should die while diving get the facts. They won’t be readily available and will definitely not be correct as reported by the media. But get the facts as best you can.

If I should die while diving understand, as I already do, that it will most likely involve fault on my part to some degree or another so do not hesitate to point that out.

If I should die while diving some of the fault will probably belong to my buddy and that needs to be honestly assessed as well though I must admit this is one area where I hope that compassion will be in the mix.

If I should die while diving there might be those who try to squelch discussion out of a misplaced notion of respect for the deceased, family and friends. They can say nice things about me at my funeral… but in the scuba community I want the incident discussed.

If I should die while diving at least I didn’t die in bed.

I could do a search and replace of “diving” with “flying”, and it’s pretty close to something I’d like to say to my fellow pilots and my nervous but understanding wife. Well, except for the bit about buddies – we don’t use a buddy system in flying even when we should.