Reason 147 why MySQL is not my favourite RDBMS

There are three components to my waypoint generator:

  • A set of scripts to load or reload some of the data when an update comes in.
  • A set of scripts that actually generate the databases.
  • A web interface.

All three components are written in Perl, and all access the same database. As mentioned previously, I’m using MySQL because PostgreSQL was too slow on the limited resources I have on my Linode.

Last night, I ran one of the load scripts, and while it was running I tried to access the web interface. The web interface startup accesses and updates a couple of “session information” tables, which the load scripts have no reason to access. So somebody tell me why the web interface startup timed out with the error:

[Wed Jun 08 22:08:37 2005] [error] [client 66.67.112.52] FastCGI: server "/config_backup/navaid.com/htdocs/CoPilot/index.fpl" stderr: DBD::mysql::st execute failed: Lock wait timeout exceeded; try restarting transaction at /config_backup/navaid.com/perl/WaypointDB.pm line 312.

Line 312 in WaypointDB.pm is a line that deletes from the table sess_main. And like I said, nobody else should be updating it. So why the hell should the fact that a load script is running cause a lock wait timeout on that table?
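For what it's worth, the error text itself says to try restarting the transaction. Here is a minimal sketch of that idea, in Python rather than the Perl that WaypointDB.pm is written in. `DbError` and `run_with_retry` are hypothetical names made up for illustration; the only real detail is that 1205 is MySQL's error code for a lock wait timeout. It papers over the symptom and does nothing to explain why the lock is being held in the first place.

```python
import time

LOCK_WAIT_TIMEOUT = 1205  # MySQL error code ER_LOCK_WAIT_TIMEOUT


class DbError(Exception):
    """Hypothetical stand-in for a driver exception that carries an error code."""

    def __init__(self, code, message=""):
        super().__init__(message)
        self.code = code


def run_with_retry(transaction, attempts=3, delay=0.5):
    """Call transaction(); rerun it if it fails with a lock wait timeout.

    transaction is assumed to be a callable that performs one whole
    transaction (begin, statements, commit) and can be rerun safely.
    """
    for attempt in range(1, attempts + 1):
        try:
            return transaction()
        except DbError as err:
            # Anything other than a lock wait timeout is re-raised at once,
            # and so is a lock wait timeout on the final attempt.
            if err.code != LOCK_WAIT_TIMEOUT or attempt == attempts:
                raise
            time.sleep(delay * attempt)  # back off a little longer each time
```

The session cleanup at line 312 would be the kind of thing to wrap this way, since deleting stale session rows is safe to repeat.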

Ok, I’m an idiot and Linode is back on the table

It turns out that the test I ran yesterday, the one that showed Linode was even slower in MySQL than it was in Postgres? Well, I’d left the “;host=mysqldb.gradwell.net” in the connect string, so instead of hitting my local MySQL database, I was actually going across the Atlantic Ocean to hit a database at Gradwell. D’OH!

I switched to using the local database, and the time came down to a slightly more acceptable 7+ minutes, but I was still I/O rate limited much of the time. Then I switched to using another guy’s database on his Linode (much better provisioned than mine) and the time went down to about 3+ minutes, and I never hit my I/O limit even once. (Which makes me think that multiple generators running at the same time won’t slow to a crawl.)

Linode probably a total washout

I’m starting to think that I won’t be able to host my application on Linode at all. Here are the results of my latest testing:

Run times (minutes:seconds):

Database     Home    Gradwell   Linode
PostgreSQL   7:46    -          21:01
MySQL        0:32    1:01       42:40

The abysmal performance on the last run, MySQL on Linode, appears to be because I’ve hit some sort of I/O limiting that they do when people do too much disk I/O (i.e. swap).

I’m going to try the tests again on Linode but with the database hosted somewhere else – either at home or on my Gradwell server. Even if that works, I’m not sure what that will mean about my options.

More bad news on the Linode front

Followup to Rants and Revelations » Bad news on the Linode front:

I ran the same generator task on Gradwell and the Linode. On Linode, it took 21m1.1s, on Gradwell, 1m0.5s. Kind of a huge difference, don’t you think? So I copied the database and code to my home machine, which has 1024MB of RAM instead of 96MB, and dual Athlon MP1800+ processors, and it still takes 7m46s.

So either Postgres is way slower than MySQL, or I’ve done something really wrong when I ported the code.

I guess my next move is to try the Gradwell MySQL code on my home server and see how long it takes.

Bad news on the Linode front

I’ve started to port my navaid.com applications to run on the Linode. And I’ve got trouble. If I have three or four simultaneous generator processes running, they all will stall out. And I don’t mean that one will keep running while the rest stall, I mean that none of them will make any progress for 10-15 minutes, and then suddenly they’ll all start running again. I’m seeing load averages go up over 10 during the stall, and then come down to between two and three while they’re running, and then go back up and stall.

I think the problem is a lack of RAM. The only explanation I can find for such high load averages and the stalls is that the (virtual) machine is doing a lot of swapping – certainly while the load average is in the stratosphere, “top” is reporting almost no CPU usage. And I can’t really see that paying $5/month for an extra 16MB (bringing me up to 112MB) is going to help a lot. What I really need is the sort of RAM I have on my home server (1024MB) or a way to keep these processes from getting so big.

The individual “postmaster” processes are quite big – I wonder if turning on autocommit might shrink that. I’d turned it off, hoping that running the generator processes inside a transaction would mean that if I load the data in a separate process while somebody is generating, they won’t get an inconsistent view.

And the CreateCoPilotDB processes are huge – at least part of that is because the Palm::PDB code just puts everything into hash tables in memory until it’s all collected, and then writes it out en masse at the end. There’s a reason for that – near the beginning of the PDB file is an index with the file offset of each individual record – and you can’t tell the offset to the first record until you know how big the index is going to be. But I had a thought about that last night – maybe I can write the actual records off into a temp file, and only store the relative offsets from the start of the first record in memory. Then at the end I can write the header and the index using (sizeof index + stored offset), and then append the temporary record file onto that file. Might be worth a try.
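To make the temp-file idea concrete, here is a rough sketch in Python (not the Perl of Palm::PDB). The file layout is a deliberately simplified, made-up one, not the real PDB format: a 4-byte tag, a 4-byte record count, then one 4-byte absolute offset per record, then the records. `MAGIC` and `write_indexed_file` are hypothetical names. The point is only that the records go straight to a temp file while nothing but their relative offsets stays in memory.

```python
import io
import shutil
import struct
import tempfile

MAGIC = b"DEMO"  # hypothetical 4-byte tag; the real PDB header is different


def write_indexed_file(records, out):
    """Write header + offset index + records to out without buffering records.

    records is any iterable of bytes objects (a generator works, so the
    records never all need to exist at once); out is a binary file object.
    """
    relative_offsets = []
    with tempfile.TemporaryFile() as spool:
        for record in records:
            # Remember where this record starts, relative to the first record.
            relative_offsets.append(spool.tell())
            spool.write(record)

        # Only now is the index size known, so the header can be written.
        header = MAGIC + struct.pack(">I", len(relative_offsets))
        index_size = 4 * len(relative_offsets)
        base = len(header) + index_size  # absolute offset of the first record

        out.write(header)
        for rel in relative_offsets:
            out.write(struct.pack(">I", base + rel))

        # Append the spooled records after the index.
        spool.seek(0)
        shutil.copyfileobj(spool, out)
```

Memory use then grows with the number of records (one integer each) rather than with their total size, at the cost of writing every record to disk twice.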

I have another problem where sometimes the web page progress meter will time out and show a server error instead, but I can just raise the Apache TimeOut parameter from 30 seconds to 120 seconds like I have it at home.

I’m not sure what I’m going to do if I can’t fix the performance issues. Possibly bring the navaid site, or at least the Postgres database part of it, back onto my home server. I don’t like that idea at all.