Obviously there’s something wrong with the way I’m profiling perl scripts

I’m trying to reduce the memory foot print of my waypoint generation scripts. In order to see how much memory they use, I’ve been doing a (ulimit -v NNNN; ./CreateCoPilot.pl ….) and adjusting NNNN up and down to bracket where it fails due to lack of memory. I’ve had two problems with that:

  • The break point isn’t constant – a number will work on time and give me an out of memory error on the next run.
  • The numbers are nothing close to what I’d expect.

That second problem is the worst. Near the end of my script, after generating this huge (15,000 record) array of references to hashes, I sort it using the following code:

my $recordsArrRef = $self->{records};
my @newrecs = sort { $a->{waypoint_id} cmp $b->{waypoint_id} }
@{$recordsArrRef};
$self->{records} = \@newrecs;

It appears that at one point, there should be two arrays with 15,000 records in them, and yet when I benchmark the one where I’ve commented this code out against the one that has it, the unsorted one only saves 350 bytes. Ok, maybe it’s sorting in place, and all I’m saving is the actual code. Or maybe that isn’t the point of maximum memory usage. So then I looked in the Palm::PDB code, which is a library from somebody else. And at the end, after getting this array of hashes together, he goes through the array and encodes each one into binary, putting that into a different array. AHA, I thought, that means I’ve got the array of hashes and an array of encoded data records. Maybe what I should do is shrink the array of hashes as we build the array of encoded data. So I changed

foreach $record (@{$self->{records}})
{
...
$data = $self->PackRecord($record);

push @record_data, [ $attributes, $id, $data ];
}

to

while (defined($record = shift(@{$self->{records}})))
{
...
$data = $self->PackRecord($record);

push @record_data, [ $attributes, $id, $data ];
}

and I seem to have saved over 5,000 bytes. Not bad. But I think the has *got* to be a better memory profiling tool for Perl. Time to hit CPAN, I guess.

Oh oh

My linode has been off-line since 8:50 this morning. I checked thier support forum, and evidently 4 or 5 machines were off-line, linodes 39-43. (I’m on 41). The technician was checking into it. Then about 9:50, I suddenly couldn’t get into their support forum either. It looks like their whole site has gone tits up.

All my mailing lists are going to be dead until this gets fixed. Waaaa.

Bear with me here…

While most of my blog entries are an example to the world on how to write in a way that can appeal to everybody, this one is going to be mostly a reminder to myself.

I’m having problems with my waypoint generator on the Linode, mostly because with only 96Mb of real memory, each individual generator task quickly becomes too big and then tasks start swapping, and everything gets horribly I/O bound.

At first it seemed that things were dying right at the very end, and so I lept to the conclusion that it must be in the sort phase, where it takes all the records that it’s retrieved from the database and stuck into an array of references to hashs, and sorts the array by ID. I solicited some opinions on that, and got some good ideas on how to sort by ID in the database while still allowing the priority of datasources that I use now. The most interesting one said

select ...
from waypoints w1
where ....
and field(datasource 'FAA', 'DAFIF', 'Thompson')
= (SELECT min(field(w2.datasource 'FAA', 'DAFIF', 'Thompson'))
from waypoints w2
where w1.id=w2.id)
order by w1.id

But before I had a chance to implement it, I did some testing on my own machine using “ulimit -v” to simulate the reduced memory size. I ran an example query that produces a result file with 71197 records in it, honing in on the minimum memory size that would allow it to finish without getting an “Out of memory” error. Then I cut out the sort stage and did it again. And what I found surprised me. Cutting out the sort stage only saved me 375 bytes, reducing the memory size from 107625 to 107250 bytes. And made the time go from 1:46 to 1:35, a scant 10 seconds or 10%.

Looks like I’m going to have to find another way to reduce the memory footprint. And I keep coming back to this idea I had where I do the sorted query and write each record out to a temporary file as I retrieve it, storing only the id, PDB “unique id”, record number and the offset from the beginning of the temporary file. Then when that’s done, I go back and write the PDB file header, and the PDB file index (which consists of the offset from the beginning of the file, attibutes, category and the unique id), and then append the contents of the temporary file. That way I can avoid having the entire contents of the database in memory.

Side note about the PDB “unique id”: Each record in a PDB file has a 3 byte “unique id”. Normally when you’re creating a PDB file, you leave that as zero and the PDA itself fills it in when it loads the file. But when Laurie Davis created the CoPilot application, it used the unique id as the key to reference the waypoint records from the flight plans. So if I did leave them as zero and let the PDA fill them in, every time you reloaded your waypoint file your flight plans would get scrambled. So I maintain a table with a unique mapping between waypoint ids and “unique ids”. That way, even if you got, say, “KROC” from the FAA data this time and from the DAFIF data next time, your flight plans including KROC would still work, because both KROC ids would get the same “unique id”. That also means every time I load new data into the database, I have to find any ids that don’t currently have a “unique id” for them and generate some new ones. Occasionally I should purge no longer used ids and re-use their unique ids, because 3 bytes doesn’t give you a lot to play with.

Bax

Gordon Baxter died yesterday. He was 81.

I don’t think Gordon Baxter will mean much to most of you, but he was a radio personality (and I do mean “personality” – not a personality-free meat puppet or a screaming idealogue which as what they mean now when they use that term) and writer. And to me, he was the epitome of what it meant to be a pilot.

When I started to fly, I subscribed to a load of flying magazines, including the iconic “Flying”. They’d all arrive about the same time, but before I started on any of them I’d turn to the back of “Flying” and read “The Bax Seat”, Gordon Baxter’s column. One of the first one I read was about how he’d had to surrender his medical and couldn’t fly solo any more. Many of his later ones were about flights taken with kind friends who would be Pilot In Command but let him take the yoke or the stick for old times sake. Many of his articles made me cry.

He didn’t write much about the gory and mundane details of flying, weather, regulations, airspace, or the machines (although he loved his Mooney). Flying for him was about being in the company of people who you love like your best friend on first meeting because they share your love for flying. It was about the places he went and the people he met. And he wrote about it in an easy effortless manner that many have tried and failed to emulate – because it was obvious that they were trying, while “Bax” didn’t have to try, he just wrote.

A few years ago one of the Flying editors said that Bax was too sick to write any more, and they started running some of his best old columns in the magazine. Then they wrote how overwhelmed Bax was by the outpouring of love from people who’d never met him.

I never met Bax, and yet I feel his loss.

Blue skies, Bax.