Anatomy of a failed plan

[Radar picture]

The plan was to go up and visit my daughter. She asked me to bring my plane so we could take her new boyfriend for a ride. Unfortunately, the plane wasn’t available until 3pm on Saturday, but that was ok because she was working until 5pm, so we could go out to dinner and then go flying on Sunday, maybe with my long-distance friend Mark as well. So I booked it. Then came the first hitch – the boyfriend wasn’t available to go flying on Sunday after all. Oh well, we could still do dinner, and Liane and Mark and I could still go flying. And then Mark remembered a prior commitment. Oh well, dinner was still on the menu, and Liane and I could go for brunch and a flight on Sunday.

At 1pm, I was checking the weather, and there were thunderstorms to the south. I thought that was ok (because storms around here generally move to the east), except the flight service weather briefer pointed out that these were moving north. I checked the radar just before leaving for the airport, and the thunderstorms were now over Buffalo, but still a bit south of my route. I decided to go to the airport anyway, but when I got there I checked the radar again and those little red echoes had moved right onto my route, and looked like they were heading straight for my destination. At this point, I gave up. The Aviation Weather site showed a SIGMET (Significant Meteorological Information advisory) for thunderstorms in the area.

The image above is the current radar picture. If I’d left when I’d originally planned, I probably would have been landing about now. Or crashing. There is hail falling here in Rochester right now, and the dog won’t leave my side because he’s scared of the thunder.

I’d love to help you, buddy, but I’d prefer to keep my license

I got a curious email just now.

Good Afternoon- I got your name through the Rochester Flying Club website. I live in Hemlock and work in Rochester. I am looking for a way to be in two places at once…..I have two children graduating from college on the same day in different parts of New York State. My daughter graduates from [college in upstate NY] at 10:30 in the morning and my son graduates from [another college, near here] at 4pm in the afternoon. There isn’t time to drive between the two, but there might be time to get to them both if we fly. I’m hoping that you or someone you might know would might be interested in flying 4 of us from [first location] to [second location] on the afternoon of May 21st. If you can lend any assistance, I’d love to hear from you.

I feel for the guy, and this sort of need to be in two places at once is a pretty compelling reason to become a pilot. But if I, or any other private pilot, were to take him up on the offer, the FAA would be all over the pilot for operating an illegal charter. Plus there is the little matter that for his four-person family, the pilot needs a six-seater, and at 4pm on the 21st I’m going to be flying the club’s six-seater home from the rec.aviation fly-in.

Still scratching my head.

I’m still working on the problem from “That’s a head scratcher”.

I wrote the thread spawning test program, and it ran 18,000+ iterations overnight on a test machine without the slightest hesitation. I pored over the code to see if there was a “Dining Philosophers”-style lock contention issue. I examined the logs for other programs on the system. And I’m still no closer.

I have a horrible suspicion that the lock up is actually in the database code somewhere. I also suspect that instead of using threads and locks to make sure I respond to the events quickly but don’t process more than one event at a time, what I really need is a job queue, so that I can monitor whether a job is taking too long and, if it is, just kill it and start the next one.
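To make the job queue idea a bit more concrete, here is a minimal sketch of what I have in mind. It is not the real schedule daemon: the class name, the method name and the timeout are all made up for illustration, and the jobs are just stand-ins for the database work. A single worker thread runs the jobs one at a time, and the caller gives up on any job that takes longer than the timeout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch only: names and timeout are illustrative assumptions.
class JobQueueSketch {

    // Runs the jobs one at a time on a single worker thread and records
    // the outcome of each. A job that exceeds timeoutMillis is cancelled.
    static List<String> runAll(List<Callable<String>> jobs, long timeoutMillis) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        List<String> results = new ArrayList<>();
        try {
            for (Callable<String> job : jobs) {
                Future<String> future = worker.submit(job);
                try {
                    results.add(future.get(timeoutMillis, TimeUnit.MILLISECONDS));
                } catch (TimeoutException e) {
                    // cancel(true) only interrupts the worker thread; it cannot
                    // force-stop code that ignores interrupts.
                    future.cancel(true);
                    results.add("TIMED OUT");
                } catch (ExecutionException e) {
                    results.add("FAILED: " + e.getCause());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        } finally {
            worker.shutdownNow();
        }
        return results;
    }

    public static void main(String[] args) {
        List<Callable<String>> jobs = new ArrayList<>();
        jobs.add(() -> "event 1 done");
        jobs.add(() -> { Thread.sleep(60_000); return "never"; }); // simulated hang
        jobs.add(() -> "event 3 done");
        System.out.println(runAll(jobs, 200));
    }
}
```

The catch is that “kill it” here really means “interrupt it”. If the hang is inside a database call that doesn’t respond to interrupts, the single worker stays stuck and the next job still won’t start, so a real version would probably need to abandon the stuck worker and start a fresh one.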

But of course since I don’t know where the lock up is actually happening nor can I reproduce it, I’m not sure how to know if my changes are going to fix anything.

That’s a head scratcher

I’m working on the type of bug that might take me a day, it might take me a week, or it might cause me to give up entirely.

In our system, there is a process of mine (the schedule daemon) that gets events from another process (the event broker) and does some database manipulation. Because the events can come thick and fast, and because I don’t want them stepping on each other, each event causes a separate thread to be spawned, and the thread action is guarded by a global “synchronized” object (this is in Java, by the way). Most of the time, this works fine – if an event happens while another thread is still processing, the second one waits for the first one to relinquish the lock, and then does its thing. The event processing threads generally take 5-15 seconds to run.
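For anyone who hasn’t seen this pattern, here is a stripped-down sketch of it. The names are invented for illustration and the “database manipulation” is just a list append; the real daemon is considerably more involved.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: class, method and lock names are illustrative,
// not taken from the real schedule daemon.
class EventLockSketch {

    // One global lock object shared by every event thread.
    private static final Object GLOBAL_LOCK = new Object();

    // Spawns one thread per event and returns the events in the order
    // they were actually processed.
    static List<String> processAll(List<String> events) {
        List<String> processed = Collections.synchronizedList(new ArrayList<>());
        List<Thread> threads = new ArrayList<>();
        for (String event : events) {
            Thread t = new Thread(() -> {
                synchronized (GLOBAL_LOCK) { // only one event is handled at a time
                    processed.add(event);    // stand-in for the database work
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>();
        events.add("event A");
        events.add("event B");
        events.add("event C");
        System.out.println(processAll(events));
    }
}
```

One property of “synchronized” worth keeping in mind: a thread waiting to enter the block waits with no timeout, so everything depends on the thread that holds the lock eventually leaving the block.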

But I have a log file from a customer site, where it appears that one of these event threads started at 04:17, and never finished and never relinquished the lock. So events that happened at 04:51, 05:52 and 06:01 never got processed. And I can’t for the life of me figure out why.

I’ve looked extensively at the code between the last progress message from the 04:17 thread and the progress message it should have printed next. Nothing leaps out at me. And like I said, this code works all over the place, even at this customer site most of the time.

One possibility is that some other program is manipulating the database at the time. I do know that the Playlist that was being retrieved at the time of failure is not present in the next day’s backup, so something may have been deleting it at the time.

I wrote a program that calls the same database method that hung, over and over again in a continuous loop, while I did other stuff to the database, including deleting the playlist in question. But while I’ve got my test program to fail with an exception numerous times, it never hangs. (I’m assuming that if a thread dies with an exception, it will release its locks. Something else to investigate, I guess.)
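That assumption about exceptions is easy enough to check in isolation. Java releases the monitor of a “synchronized” block whether the block exits normally or by throwing. The sketch below (with made-up names, and a thrown exception standing in for the database failure) has one thread die inside the block and then checks that a second thread can still get in.

```java
// Hypothetical sketch: names are illustrative; the exception simulates
// a failure in the database call.
class LockReleaseSketch {

    private static final Object LOCK = new Object();

    // Returns true if a second thread could take LOCK after the first
    // thread died with an exception while holding it.
    static boolean lockFreedAfterException() {
        Thread failing = new Thread(() -> {
            synchronized (LOCK) {
                throw new IllegalStateException("simulated database failure");
            }
        });
        // Swallow the exception so the demo doesn't print a stack trace.
        failing.setUncaughtExceptionHandler((thread, error) -> { });

        boolean[] acquired = {false};
        Thread next = new Thread(() -> {
            synchronized (LOCK) {
                acquired[0] = true;
            }
        });

        try {
            failing.start();
            failing.join();   // first thread is dead by now
            next.start();
            next.join(2000);  // don't wait forever if the lock is stuck
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return acquired[0];
    }

    public static void main(String[] args) {
        System.out.println("lock freed: " + lockFreedAfterException());
    }
}
```

This only covers “synchronized”. With the explicit locks in java.util.concurrent.locks, the unlock() has to go in a finally block or an exception really will leave the lock held.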

I guess my next step is to step up the tree a bit, and instead of calling the low level query multiple times, try spawning the thread that hung multiple times. Other than that, I’m baffled.