While most of my blog entries are an example to the world on how to write in a way that can appeal to everybody, this one is going to be mostly a reminder to myself.
I’m having problems with my waypoint generator on the Linode. With only 96 MB of real memory, each individual generator task quickly grows too big, the tasks start swapping, and everything gets horribly I/O bound.
At first it seemed that things were dying right at the very end, so I leapt to the conclusion that the problem must be in the sort phase, where the generator takes all the records it has retrieved from the database and stuck into an array of references to hashes, and sorts that array by ID. I solicited some opinions on that, and got some good ideas on how to sort by ID in the database while still allowing the priority of datasources that I use now. The most interesting one said:
select ...
from waypoints w1
where ....
and field(w1.datasource, 'FAA', 'DAFIF', 'Thompson')
= (select min(field(w2.datasource, 'FAA', 'DAFIF', 'Thompson'))
from waypoints w2
where w1.id = w2.id)
order by w1.id
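To convince myself of what that query does, here is a small sketch that runs it against a toy table. It assumes field() is MySQL's FIELD(), which returns the 1-based position of its first argument among the remaining ones (and 0 if it isn't there). SQLite has no such function, so the sketch registers a Python stand-in for it. The two-column table and the sample rows are made up for illustration.

```python
import sqlite3

def field(needle, *candidates):
    """Stand-in for MySQL FIELD(): 1-based position of needle, 0 if absent."""
    return candidates.index(needle) + 1 if needle in candidates else 0

def best_rows(conn):
    # Register the stand-in so the SQL below can call field() with any arity.
    conn.create_function("field", -1, field)
    return conn.execute("""
        SELECT w1.id, w1.datasource
        FROM waypoints w1
        WHERE field(w1.datasource, 'FAA', 'DAFIF', 'Thompson')
            = (SELECT min(field(w2.datasource, 'FAA', 'DAFIF', 'Thompson'))
               FROM waypoints w2
               WHERE w1.id = w2.id)
        ORDER BY w1.id
    """).fetchall()

# Hypothetical toy data: each id appears in more than one datasource.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE waypoints (id TEXT, datasource TEXT)")
conn.executemany(
    "INSERT INTO waypoints VALUES (?, ?)",
    [("KROC", "DAFIF"), ("KROC", "FAA"), ("KBUF", "Thompson"), ("KBUF", "DAFIF")],
)
rows = best_rows(conn)
print(rows)
```

One thing to watch: a datasource that isn't in the list ranks as 0, which is lower than every listed source, so it would win the min() unless the where clause filters it out.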
But before I had a chance to implement it, I did some testing on my own machine using “ulimit -v” to simulate the reduced memory size. I ran an example query that produces a result file with 71197 records in it, homing in on the minimum memory size that would allow it to finish without getting an “Out of memory” error. Then I cut out the sort stage and did it again. What I found surprised me. Cutting out the sort stage only saved me 375 KB (ulimit -v counts in kilobytes, not bytes), reducing the memory size from 107625 KB to 107250 KB. It also made the time go from 1:46 to 1:35, a scant 11 seconds, or about 10%.
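The homing-in step is really just a binary search over the memory limit. Here is a rough sketch of how it could be automated. The helper names are mine, and it assumes the job exits non-zero when it runs out of memory and that any limit above the minimum also succeeds. The limit is applied to the child process with RLIMIT_AS, which is the limit that “ulimit -v” sets.

```python
import resource
import subprocess

def run_with_limit(cmd, limit_kb):
    """Run cmd with its virtual memory capped at limit_kb; True if it exits 0."""
    def cap():
        nbytes = limit_kb * 1024  # setrlimit takes bytes, ulimit -v takes KB
        resource.setrlimit(resource.RLIMIT_AS, (nbytes, nbytes))
    return subprocess.run(cmd, preexec_fn=cap).returncode == 0

def smallest_passing(passes, lo, hi):
    """Binary search for the smallest limit that passes.

    Assumes passes(lo) is False, passes(hi) is True, and that passing
    is monotone in the limit.
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if passes(mid):
            hi = mid
        else:
            lo = mid
    return hi
```

Usage would look something like `smallest_passing(lambda kb: run_with_limit(["./generate", "query"], kb), 0, 200000)`, where the command line is a placeholder for whatever actually runs the generator.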
Looks like I’m going to have to find another way to reduce the memory footprint. And I keep coming back to this idea I had where I do the sorted query and write each record out to a temporary file as I retrieve it, keeping in memory only the id, the PDB “unique id”, the record number and the offset from the beginning of the temporary file. Then when that’s done, I go back and write the PDB file header and the PDB file index (which consists of the offset from the beginning of the file, attributes, category and the unique id), and then append the contents of the temporary file. That way I can avoid having the entire contents of the database in memory.
Side note about the PDB “unique id”: Each record in a PDB file has a 3-byte “unique id”. Normally when you’re creating a PDB file, you leave that as zero and the PDA itself fills it in when it loads the file. But when Laurie Davis created the CoPilot application, it used the unique id as the key to reference the waypoint records from the flight plans. So if I did leave them as zero and let the PDA fill them in, every time you reloaded your waypoint file your flight plans would get scrambled. So I maintain a table with a unique mapping between waypoint ids and “unique ids”. That way, even if you got, say, “KROC” from the FAA data this time and from the DAFIF data next time, your flight plans including KROC would still work, because both KROC ids would get the same “unique id”. That also means that every time I load new data into the database, I have to find any ids that don’t currently have a “unique id” and generate new ones for them. Occasionally I should purge ids that are no longer used and re-use their unique ids, because 3 bytes doesn’t give you a lot to play with.
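The bookkeeping for new ids can be sketched like this. It is a simplified in-memory version with made-up names; the real mapping lives in a database table, and this sketch doesn't attempt the purge-and-reuse part.

```python
MAX_UNIQUE_ID = 0xFFFFFF  # largest value that fits in 3 bytes

def assign_unique_ids(mapping, waypoint_ids):
    """Give every waypoint id without a unique id the next free one.

    mapping is a dict of waypoint id -> unique id and is updated in place.
    Existing entries are never changed, so flight plans keep working.
    Zero is skipped because it means "let the PDA pick".
    """
    next_id = max(mapping.values(), default=0) + 1
    for wid in sorted(set(waypoint_ids)):
        if wid in mapping:
            continue  # already has a stable unique id
        if next_id > MAX_UNIQUE_ID:
            raise OverflowError("out of 3-byte unique ids; time to purge")
        mapping[wid] = next_id
        next_id += 1
    return mapping

# Hypothetical example: KROC is already known, the other two are new.
mapping = assign_unique_ids({"KROC": 1}, ["KROC", "KBUF", "KSYR"])
print(mapping)
```

Because the key is the waypoint id and not the datasource, KROC keeps the same unique id no matter which source supplied it this time.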