A Wikipedia for Navigation Data

I was reading a blog entry about airport entries on Wikipedia and COPA (Canadian Owners and Pilots Association), and reflecting on how thousands of volunteers, people with too much time on their hands mostly, have made Wikipedia such a comprehensive resource. And it struck me that if we could harness the same sort of dedication and talent from pilots around the world, the vast number of us who rely on DAFIF (Digital Aeronautical Flight Information File) data for our flight planning, flight simulation, moving map navigators and other applications wouldn't be left out in the cold when DAFIF goes away.

I was envisioning something based on the Wikipedia source, but instead of a free form text entry you’d have a structured form for entering the data, and it would be pre-filled with the data already available from DAFIF, FAA and other databases. Then interested people could regularly check that against their local Aeronautical Information Publication and update it as necessary. The resultant database would be available free to any person or program who currently uses FAA and DAFIF data, such as my own waypoint database generators.
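Just to make the idea concrete, here's a rough sketch of what one structured record in such a system might hold, pre-filled from DAFIF/FAA data and then checked by a volunteer against the local AIP. Every field name and value here is made up for illustration, not taken from DAFIF or any real schema:

    # One hypothetical structured record; field names and values are placeholders.
    my %waypoint = (
        ident         => 'CYOW',                 # placeholder identifier
        name          => 'Ottawa/Macdonald-Cartier Intl',
        type          => 'AIRPORT',
        latitude      => 45.3225,                # decimal degrees, north positive
        longitude     => -75.6692,               # decimal degrees, east positive
        elevation_ft  => 374,
        source        => 'DAFIF',                # where the pre-filled value came from
        verified_by   => undef,                  # volunteer who checked it against the AIP
        verified_date => undef,
    );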

Since I posted a variation of this idea on Usenet last night, I have been informed that one DAFIF-using program is already attempting the volunteer approach. The problem is that they will only target users of their one individual program, and only make the results available to the users of that one program. I think it would be better to make one system that draws volunteers from users of all programs that use DAFIF, and which makes its results available to all users of DAFIF.

My experience trying to recruit volunteers to enter data for my navaid.com systems is that people express interest, but very few of them follow through, and most of those lose interest after a few update cycles. The only solution is to make it as easy as possible for people to contribute, and to draw from the largest possible pool of volunteers.

Update: I contacted the people doing the volunteer project for their own product, and they have no interest in doing something for the benefit of the entire aviation community because they can’t make money off of it, and made disparaging remarks about the types of volunteers that this project would attract. Since they’re being assholes, I took the link to their site off this posting.

Ok, I’m an idiot and Linode is back on the table

Remember that test I ran yesterday that showed Linode was even slower with MySQL than it was with Postgres? Well, it turns out that I'd left the “;host=mysqldb.gradwell.net” in the connect string, so instead of hitting my local MySQL database, I was actually going across the Atlantic Ocean to hit a database at Gradwell. D’OH!
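For anyone else who makes the same mistake, the difference is just one clause in the DBI connect string. This is only a sketch (the database name and credentials are placeholders), but it shows how the stray “;host=” sends every query to Gradwell, while leaving it out lets DBD::mysql use the local server:

    use strict;
    use warnings;
    use DBI;

    # Placeholder credentials and database name, just for illustration.
    my ($user, $password) = ('navaid', 'secret');

    # What I had: the leftover ";host=..." makes every query cross the Atlantic.
    my $remote = DBI->connect(
        'DBI:mysql:database=navaid;host=mysqldb.gradwell.net',
        $user, $password, { RaiseError => 1 },
    );

    # What I meant: with no host clause, DBD::mysql talks to the local server
    # (normally over the Unix socket).
    my $local = DBI->connect(
        'DBI:mysql:database=navaid',
        $user, $password, { RaiseError => 1 },
    );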

I switched to using the local database, and the time came down to a slightly more acceptable 7+ minutes, but I was still I/O rate limited much of the time. Then I switched to using another guy’s database on his Linode (much better provisioned than mine) and the time went down to about 3+ minutes, and I never hit my I/O limit even once. (Which makes me think that multiple generators running at the same time won’t slow to a crawl.)

Linode probably a total washout

I’m starting to think that I won’t be able to host my application on Linode at all. Here are the results of my latest testing:

Database     Home    Gradwell    Linode
PostgreSQL   7:46       -        21:01
MySQL        0:32      1:01      42:40
(times are minutes:seconds)

The abysmal performance on the last run, MySQL on Linode, appears to be because I’ve hit some sort of I/O limiting that they do when people do too much disk I/O (which includes swapping).

I’m going to try the tests again on Linode but with the database hosted somewhere else – either at home or on my Gradwell server. Even if that works, I’m not sure what it will mean for my options.

More bad news on the Linode front

Followup to Rants and Revelations » Bad news on the Linode front:

I ran the same generator task on Gradwell and the Linode. On the Linode it took 21m1.1s; on Gradwell, 1m0.5s. Kind of a huge difference, don’t you think? So I copied the database and code to my home machine, which has 1024MB of RAM instead of 96MB, plus dual Athlon MP 1800+ processors, and it still took 7m46s.

So either Postgres is way slower than MySQL, or I’ve done something really wrong when I ported the code.

I guess my next move is to try the Gradwell MySQL code on my home server and see how long it takes.

Bad news on the Linode front

I’ve started to port my navaid.com applications to run on the Linode. And I’ve got trouble. If I have three or four simultaneous generator processes running, they will all stall out. And I don’t mean that one will keep running while the rest stall; I mean that none of them will make any progress for 10-15 minutes, and then suddenly they’ll all start running again. I’m seeing load averages go up over 10 during the stall, then come down to between two and three while they’re running, and then go back up and stall again.

I think the problem is a lack of RAM. The only explanation I can find for such high load averages and the stalls is that the (virtual) machine is doing a lot of swapping – certainly while the load average is in the stratosphere, “top” is reporting almost no CPU usage. And I can’t really see that paying $5/month for an extra 16MB (bringing me up to 112MB) is going to help a lot. What I really need is the sort of RAM I have on my home server (1024MB), or a way to keep these processes from getting so big.
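If you want to confirm that it really is swapping, a quick look at /proc/meminfo tells the story. Here's a trivial sketch (plain Linux, nothing Linode-specific about it) that you'd run while the generators are stalled:

    #!/usr/bin/perl
    # Report how much RAM is free and how much swap is in use.
    use strict;
    use warnings;

    open my $fh, '<', '/proc/meminfo' or die "can't read /proc/meminfo: $!";
    my %mem;
    while (<$fh>) {
        # Lines look like "MemTotal:    96000 kB"
        $mem{$1} = $2 if /^(\w+):\s+(\d+)\s*kB/;
    }
    close $fh;

    printf "RAM:  %d of %d kB free\n", $mem{MemFree}, $mem{MemTotal};
    printf "Swap: %d of %d kB in use\n",
        $mem{SwapTotal} - $mem{SwapFree}, $mem{SwapTotal};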

The individual “postmaster” processes are quite big – I wonder if turning autocommit back on might shrink them. I’d turned it off, hoping that running each generator inside a transaction would mean that if I load new data in a separate process while somebody is generating, they won’t get an inconsistent view.
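Here's roughly the trade-off, sketched with DBI. The DSN, credentials, and the table and column names are placeholders, not the real navaid schema:

    use strict;
    use warnings;
    use DBI;

    my ($user, $password) = ('navaid', 'secret');   # placeholders

    # AutoCommit off: the whole generator run happens inside one transaction.
    # The hope is that a data load running in another process can't leave the
    # generator with a half-old, half-new view of the waypoints.
    my $dbh = DBI->connect('DBI:Pg:dbname=navaid', $user, $password,
                           { RaiseError => 1, AutoCommit => 0 });

    my $sth = $dbh->prepare('SELECT ident, lat, lon FROM waypoints');
    $sth->execute;
    while (my $row = $sth->fetchrow_hashref) {
        # ... emit this waypoint to the output file ...
    }
    $dbh->commit;   # the generator only reads, so commit vs. rollback hardly matters

    # The alternative I'm wondering about: AutoCommit => 1, giving up the single
    # long transaction in the hope that the postmaster backend stays smaller.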

And the CreateCoPilotDB processes are huge – at least partly because the Palm::PDB code just puts everything into hash tables in memory until it’s all collected, and then writes it out en masse at the end. There’s a reason for that: near the beginning of the PDB file is an index with the file offset of each individual record, and you can’t tell the offset of the first record until you know how big the index is going to be. But I had a thought about that last night – maybe I can write the actual records off to a temp file, and only store in memory their offsets relative to the start of the first record. Then at the end I can write the header and the index using (size of index + stored offset), and then append the temporary record file onto that file. Might be worth a try.
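Here's a sketch of that two-pass idea. It deliberately doesn't use Palm::PDB, and the header and index formats are faked (the real PDB layout is more involved); it's just the shape of the trick: spool the records to a temp file, remember relative offsets, then write the header and index and append the spooled data.

    use strict;
    use warnings;
    use File::Temp qw(tempfile);

    my @records = map { "record $_\n" } 1 .. 3;   # stand-in data from the generator

    # Pass 1: stream each record to a temp file, remembering only its offset
    # relative to the start of the first record.
    my ($tmp, $tmpname) = tempfile(UNLINK => 1);
    binmode $tmp;

    my @rel_offsets;
    my $pos = 0;
    for my $record (@records) {
        push @rel_offsets, $pos;
        print {$tmp} $record;
        $pos += length $record;
    }

    # Pass 2: now that the record count (and therefore the index size) is known,
    # each record's absolute offset is just header size + index size + relative offset.
    my $header     = pack 'A32', 'FAKE-HEADER';   # made-up header, not a real PDB header
    my $index_size = 4 * @rel_offsets;            # pretend: one 32-bit offset per record
    my $data_start = length($header) + $index_size;

    open my $out, '>', 'output.pdb' or die "can't write output.pdb: $!";
    binmode $out;
    print {$out} $header;
    print {$out} pack('N', $data_start + $_) for @rel_offsets;

    # Finally, append the spooled records in 64K chunks.
    seek $tmp, 0, 0;
    local $/ = \65536;
    print {$out} $_ while <$tmp>;
    close $out;

That way only one small number per record stays in memory instead of the record itself, which should deflate the CreateCoPilotDB processes considerably.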

I have another problem where sometimes the web page progress meter will time out and show a server error instead, but for that I can just raise the Apache TimeOut parameter from 30 seconds to 120 seconds, like I have it at home.

I’m not sure what I’m going to do if I can’t fix the performance issues. Possibly bring the navaid site, or at least the Postgres database part of it, back onto my home server. I don’t like that idea at all.