Reason 147 why MySQL is not my favourite RDBMS

There are three components to my waypoint generator:

  • A set of scripts to load or reload some of the data when an update comes in.
  • A set of scripts that actually generate the databases.
  • A web interface.

All three components are written in Perl, and all access the same database. As mentioned previously, I’m using MySQL because PostgreSQL was too slow on the limited resources I have on my Linode.

Last night, I ran one of the load scripts, and while it was running I tried to access the web interface. The web interface’s startup code accesses and updates a couple of “session information” tables, which the load scripts have no reason to touch. So somebody tell me why the web interface startup timed out with this error:

[Wed Jun 08 22:08:37 2005] [error] [client 66.67.112.52] FastCGI: server "/config_backup/navaid.com/htdocs/CoPilot/index.fpl" stderr: DBD::mysql::st execute failed: Lock wait timeout exceeded; try restarting transaction at /config_backup/navaid.com/perl/WaypointDB.pm line 312.

Line 312 in WaypointDB.pm deletes from the sess_main table. And like I said, nothing else should be updating it. So why the hell should the fact that a load script is running cause a lock wait timeout on that table?
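One workaround (not an explanation) would be to catch the timeout and retry instead of dying. Here’s a minimal sketch using DBI; the connection details and the sess_id column are placeholders, since the real code lives in WaypointDB.pm:

    use strict;
    use warnings;
    use DBI;

    # Placeholder connection; the real handle comes from WaypointDB.pm.
    my $dbh = DBI->connect('DBI:mysql:database=waypoints', 'user', 'secret',
                           { RaiseError => 1, AutoCommit => 1 });
    my $session_id = $ARGV[0];

    # Retry the session-table delete a few times rather than dying on
    # the first "Lock wait timeout exceeded" error.
    for my $try (1 .. 3) {
        eval {
            $dbh->do('DELETE FROM sess_main WHERE sess_id = ?',
                     undef, $session_id);
        };
        last unless $@;                                    # success
        die $@ unless $@ =~ /Lock wait timeout exceeded/;  # other failure
        warn "lock wait timeout on try $try, retrying\n";
        sleep 2;
    }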

A Wikipedia for Navigation Data

I was reading a blog entry about airport entries on Wikipedia and COPA (Canadian Owners and Pilots Association), and reflecting on how thousands of volunteers, mostly people with too much time on their hands, have made Wikipedia such a comprehensive resource. And it struck me that if we could harness the same sort of dedication and talent from pilots around the world, the vast number of us who rely on DAFIF (Digital Aeronautical Flight Information File) data for our flight planning, flight simulation, moving map navigators and other applications wouldn’t be left out in the cold when DAFIF goes away.

I was envisioning something based on the Wikipedia source, but instead of a free-form text entry you’d have a structured form for entering the data, pre-filled with the data already available from DAFIF, the FAA and other databases. Then interested people could regularly check that against their local Aeronautical Information Publication and update it as necessary. The resulting database would be available free to any person or program that currently uses FAA and DAFIF data, such as my own waypoint database generators.
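To make “structured form” a bit more concrete, each entry might look something like the sketch below. Every field name here is my own guess at what a record would need; it isn’t the actual DAFIF schema:

    # Hypothetical shape of one waypoint entry; illustrative only.
    my %entry = (
        ident           => 'CYOW',      # pre-filled from DAFIF/FAA data
        name            => 'Ottawa/Macdonald-Cartier Intl',
        type            => 'AIRPORT',
        latitude        => 45.3225,     # decimal degrees
        longitude       => -75.6692,
        elevation_ft    => 374,
        source          => 'DAFIF',     # where the pre-filled values came from
        checked_by      => undef,       # volunteer who verified the entry
        checked_against => undef,       # e.g. 'Canada AIP', plus a date
    );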

Since I posted a variation of this idea on Usenet last night, I have been informed that one DAFIF-using program has already attempted the volunteer approach. The problem with that is that they only target users of their one individual program, and only make the results available to users of that same program. I think it would be better to make one system that draws volunteers from the users of all programs that use DAFIF, and which makes its results available to all users of DAFIF.

My experience trying to recruit volunteers to enter data for my navaid.com systems is that people express interest, but very few follow through, and most of those lose interest after a few update cycles. The only solution is to make it as easy as possible for people to contribute, and to draw from the largest possible pool of volunteers.

Update: I contacted the people doing the volunteer project for their own product, and they have no interest in doing something for the benefit of the entire aviation community because they can’t make money off of it, and made disparaging remarks about the types of volunteers that this project would attract. Since they’re being assholes, I took the link to their site off this posting.

Ok, I’m an idiot and Linode is back on the table

Remember that test I ran yesterday that showed Linode was even slower with MySQL than with PostgreSQL? Well, it turns out that I’d left the “;host=mysqldb.gradwell.net” in the connect string, so instead of hitting my local MySQL database, I was actually going across the Atlantic Ocean to hit a database at Gradwell. D’OH!
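For posterity, the whole difference is one clause in the DBI connect string. A sketch, with the database name and credentials as placeholders (only the host= part is from my actual mistake):

    use DBI;

    # Wrong: every query crosses the Atlantic to Gradwell's MySQL server.
    my $remote_dbh = DBI->connect(
        'DBI:mysql:database=waypoints;host=mysqldb.gradwell.net',
        $user, $password);

    # Right: with no host= clause, DBD::mysql defaults to localhost,
    # so everything stays on the Linode.
    my $local_dbh = DBI->connect('DBI:mysql:database=waypoints',
        $user, $password);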

I switched to using the local database, and the time came down to a slightly more acceptable 7+ minutes, but I was still I/O rate-limited much of the time. Then I switched to using another guy’s database on his Linode (much better provisioned than mine) and the time went down to about 3+ minutes, and I never hit my I/O limit even once. (Which makes me think that multiple generators running at the same time won’t slow to a crawl.)

Linode probably a total washout

I’m starting to think that I won’t be able to host my application on Linode at all. Here are the results of my latest testing:

Database      Home   Gradwell   Linode
PostgreSQL    7:46     n/a      21:01
MySQL         0:32    1:01      42:40

The abysmal performance on the last run, MySQL on Linode, appears to be because I’ve hit some sort of I/O limiting that they do when people do too much disk I/O (i.e. swapping).

I’m going to try the tests again on Linode, but with the database hosted somewhere else, either at home or on my Gradwell server. Even if that works, I’m not sure what that will mean for my options.

More bad news on the Linode front

Followup to Rants and Revelations » Bad news on the Linode front:

I ran the same generator task on Gradwell and the Linode. On Linode, it took 21m1.1s; on Gradwell, 1m0.5s. Kind of a huge difference, don’t you think? So I copied the database and code to my home machine, which has 1024MB of RAM instead of 96MB, and dual Athlon MP 1800+ processors, and it still takes 7m46s.

So either Postgres is way slower than MySQL, or I’ve done something really wrong when I ported the code.

I guess my next move is to try the Gradwell MySQL code on my home server and see how long it takes.