What a difference an index makes!

I made some changes to my FAA data loader script and ran it. Four days later it had finished running, I discovered a few bugs, and was getting ready to run it again, but I decided to see if I could improve the speed any. I’d already run the perl profiler and discovered that 95% of the time was spent in mysql. So I tried doing an “EXPLAIN” on all the queries. That’s when I discovered that one very common query was doing the dreaded “ALL” query on a 12,000 row table as step 1. Hmmm. That table isn’t even an important one, it was just table joined to the main “waypoint” table to get one field that was semi-useful. The query has a “waypoint.datasource_key = ?” in it, why isn’t it doing that first? “DESCRIBE TABLE waypoint;” showed me the error of my ways – I’d forgotten to put an index on “datasource_key”.

So I created the index and started the script before going to bed. I was astonished to discover that the script was done this morning. According to my Munin graphs, it had only taken about 6 hours to run. 96 hours down to 6 hours. Yeah, that’s a worthwhile optimization.

Fun fact #143.4

Discovered while trying to debug my nav data loading scripts: The Hendersonville Airport (0A7) and the “W.N.C. Museum Airport” (8NC9) are only 0.03 nautical miles apart, but they’re separate airports. I’m not sure if they’re the closest two ever, but they’re certainly pretty damn close. As a matter of fact, I think this picture Hendersonville Airport from the AirNav.com web site listing for Hendersonville shows both runways, the paved one for Hendersonville and the turf one for the museum. I bet there is a story why they didn’t just build a taxiway between them and call it one airport.

Oh, bugger!

Ok, the big load job just finished, and it appears I was loading the old FAA data, not the data that became current on Wednesday. Also, it appears I have a bug in the code that loads the runways – the old scripts seemed to have taken “U” or “” for the runway end latitudes and longitudes as null, but the new ones are putting those values in as 0. Oops.

I guess I’ll have to run it again – using nohup this time. See you next week.

Hey, Google

If you’re going to send a guy a survey to find out his impressions of your recruiting/interviewing methodology, you might not want to send it to somebody who has been waiting two months for you to pay his travel expense claim. Because he might just think that you’re a bunch of disorganized fuckwads. Just sayin’.

And a bit of further advice – you might want to make sure your expense spreadsheet actually prints out correctly using Google Docs and doesn’t require one to steal a copy of Microsoft Office in order to use it.

Darn it!

Some years ago I got a set of Shape Files (much as I hated ESRI at the time, they did beat us (GeoVision) fair and square with an inferior product and superior marketing) for the provincial and territorial boundaries of Canada and wrote a short little C program to do a “point in polygon” to determine what province a particular point is in. It’s set up so that it parses the shapes, then sits there waiting for a lat/long pairs to come over a pipe and writes the province code back over the pipe. I write a special lat/long pair (-999/-999) when I want it to exit. I use it all the time when I’m loading waypoints. The program has continued to work while my waypoint generator moved from being hosted at home to a webhost (Gradwell.com) to a virtual private server (Linode) to a colocation box. Unfortunately, I just discovered that somewhere along the way I lost the source code. Right now it has a small bug in that when the program that opened the pipe to it dies, it starts consuming all the CPU instead of shutting down. I can live with that, but it’s annoying while I’m doing all this testing with buggy load scripts. That’s why I went looking for the source code.

Maybe I should write a new one just in case?