I’ve started porting my navaid.com applications to run on the Linode, and I’ve run into trouble. If I have three or four generator processes running simultaneously, they all stall. I don’t mean that one keeps running while the rest stall; I mean that none of them makes any progress for 10-15 minutes, and then suddenly they all start running again. The load average climbs above 10 during a stall, drops to between two and three while they’re running, and then climbs again as they stall once more.
I think the problem is a lack of RAM. The only explanation I can find for such high load averages and the stalls is that the (virtual) machine is doing a lot of swapping; certainly, while the load average is in the stratosphere, “top” reports almost no CPU usage. And I can’t see that paying $5/month for an extra 16MB (bringing me up to 112MB) is going to help much. What I really need is the sort of RAM I have on my home server (1024MB), or a way to keep these processes from getting so big.
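One way to test the swapping theory directly, rather than inferring it from the load average, is to watch the kernel’s swap counters. Here is a minimal Python sketch, assuming a Linux guest whose /proc/vmstat contains “pswpin” and “pswpout” lines (cumulative pages swapped in and out since boot). It samples them twice and reports a rate. The function names are hypothetical, and this only reflects swapping done by the guest kernel itself.

```python
import time


def read_swap_counters(path="/proc/vmstat"):
    """Return (pswpin, pswpout): cumulative pages swapped in/out since boot."""
    counters = {}
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) == 2:
                counters[parts[0]] = int(parts[1])
    return counters.get("pswpin", 0), counters.get("pswpout", 0)


def swap_rate(interval=5.0, path="/proc/vmstat"):
    """Sample the counters twice and return (pages/s in, pages/s out)."""
    in0, out0 = read_swap_counters(path)
    time.sleep(interval)
    in1, out1 = read_swap_counters(path)
    return (in1 - in0) / interval, (out1 - out0) / interval
```

Calling swap_rate() during one of the stalls should settle it: sustained non-zero rates would support the swapping theory, while rates near zero would point somewhere else.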
The individual “postmaster” processes are quite big; I wonder whether turning autocommit back on might shrink them. I’d turned it off hoping that running each generator inside a transaction would mean that, if I load new data in a separate process while somebody is generating, they won’t get an inconsistent view.
And the CreateCoPilotDB processes are huge, at least partly because the Palm::PDB code puts everything into hash tables in memory until it’s all collected and then writes it out en masse at the end. There’s a reason for that: near the beginning of the PDB file is an index holding the file offset of each individual record, and you can’t know the offset of the first record until you know how big the index is going to be. But I had a thought about that last night. Maybe I can write the actual records out to a temp file and keep only their offsets relative to the start of the first record in memory. Then at the end I can write the header and the index, computing each record’s offset as (size of header and index + stored offset), and then append the temporary record file. Might be worth a try.
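The two-pass idea can be sketched roughly as follows. This is Python rather than Perl, and it is not how Palm::PDB works; it is a minimal sketch under simplifying assumptions: a fixed 78-byte header (which, as far as I know, is the PDB header size) that here holds only a record count, and 8-byte index entries whose first four bytes are the absolute record offset. Real PDB files carry more header fields, per-record attribute and unique-ID bytes, and usually some padding after the index, all of which this ignores. The name write_pdb_like is hypothetical.

```python
import shutil
import struct
import tempfile

HEADER_SIZE = 78  # assumed fixed header size (the PDB header, as far as I know)
ENTRY_SIZE = 8    # assumed index entry size: 4-byte offset + 4-byte placeholder


def write_pdb_like(path, records):
    """Write `records` (an iterable of bytes) without holding them all in memory.

    Pass 1 spools each record to a temporary file and remembers only its
    offset relative to the first record. Pass 2 writes the header and the
    index, where each absolute offset is header size + index size + the
    stored relative offset, then appends the spooled record data.
    """
    offsets = []
    with tempfile.TemporaryFile() as spool:
        for rec in records:
            offsets.append(spool.tell())
            spool.write(rec)

        data_start = HEADER_SIZE + ENTRY_SIZE * len(offsets)
        with open(path, "wb") as out:
            # Simplified header: big-endian record count, zero-padded.
            out.write(struct.pack(">H", len(offsets)).ljust(HEADER_SIZE, b"\0"))
            for i, rel in enumerate(offsets):
                # Absolute offset, then the record's position as a placeholder
                # for the attribute/unique-ID bytes a real PDB entry carries.
                out.write(struct.pack(">II", data_start + rel, i))
            spool.seek(0)
            shutil.copyfileobj(spool, out)
```

Because records can be a generator, the caller can produce records one at a time as it reads them from the database, so peak memory is roughly one record plus one integer per record instead of the whole file.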
I have another problem: sometimes the web page’s progress meter times out and shows a server error instead. But I can fix that by raising Apache’s TimeOut parameter from 30 seconds to 120 seconds, as I have it at home.
I’m not sure what I’m going to do if I can’t fix the performance issues. Possibly I’ll move the navaid site, or at least its Postgres database, back onto my home server. I don’t like that idea at all.
One has a great deal of control over PostgreSQL’s memory consumption. Have you tweaked the various numbers in /var/lib/pgsql/*.conf? If you suspect your virtual Linux box is swapping, do vmstat/iostat track those events properly? If, on the other hand, the host Linux box is swapping, then you have reason to complain to management.