Obviously there’s something wrong with the way I’m profiling perl scripts

I’m trying to reduce the memory footprint of my waypoint generation scripts. To see how much memory they use, I’ve been running them as (ulimit -v NNNN; ./CreateCoPilot.pl ….) and adjusting NNNN up and down to bracket the point where it fails due to lack of memory. I’ve had two problems with that:

  • The break point isn’t constant – a number will work one time and give me an out-of-memory error on the next run.
  • The numbers are nothing close to what I’d expect.
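
As an aside, on Linux there’s a cruder way to peek at the numbers without all the bracketing: have the script dump its own Vm* lines from /proc/self/status right before it exits. A rough sketch (report_mem is a name I just made up, not something in my scripts):

sub report_mem
{
    # Linux keeps VmSize, VmRSS and friends in /proc/self/status
    open(my $fh, '<', '/proc/self/status') or return;
    while (my $line = <$fh>)
    {
        print STDERR $line if $line =~ /^Vm/;
    }
    close($fh);
}

That only shows how big the process got, though, not which data structure is to blame.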

That second problem is the worst. Near the end of my script, after generating this huge (15,000 record) array of references to hashes, I sort it using the following code:

# sort the array of record hashrefs by waypoint_id, then swap the sorted array in
my $recordsArrRef = $self->{records};
my @newrecs = sort { $a->{waypoint_id} cmp $b->{waypoint_id} }
              @{$recordsArrRef};
$self->{records} = \@newrecs;

It appears that at some point there should be two arrays with 15,000 records in them, and yet when I compare a run with this sort commented out against one with it in, the unsorted version only saves about 350 bytes. OK, maybe Perl is sorting in place and all I’m saving is the code itself. Or maybe this isn’t the point of maximum memory usage.

So then I looked in the Palm::PDB code, which is a library from somebody else. At the end, after getting this array of hashes together, he goes through the array and encodes each record into binary, putting the result into a different array. AHA, I thought: that means I’ve got the array of hashes and an array of encoded data records in memory at the same time. Maybe what I should do is shrink the array of hashes as the array of encoded data gets built. So I changed

foreach $record (@{$self->{records}})
{
    ...
    $data = $self->PackRecord($record);

    push @record_data, [ $attributes, $id, $data ];
}

to

# shift each record off the original array as it gets packed, so the
# array of hashes shrinks while @record_data grows
while (defined($record = shift(@{$self->{records}})))
{
    ...
    $data = $self->PackRecord($record);

    push @record_data, [ $attributes, $id, $data ];
}

and I seem to have saved over 5,000 bytes. Not bad. But I think there has *got* to be a better memory profiling tool for Perl. Time to hit CPAN, I guess.
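
Devel::Size looks like the obvious first thing to try: it can report how much memory a variable takes, with size() covering the variable itself and total_size() following references to everything underneath it. Something like this (untested) ought to show what the records array really costs:

use Devel::Size qw(size total_size);

my $recordsArrRef = $self->{records};
printf "array itself: %d bytes\n", size($recordsArrRef);
printf "array plus everything it references: %d bytes\n",
    total_size($recordsArrRef);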

2 thoughts on “Obviously there’s something wrong with the way I’m profiling perl scripts”

  1. I’ve been thinking about this problem. I’m not sure that Perl and/or SQL is the right choice for the problem you’re trying to solve in a constrained memory environment. In C, 15,000 nodes would only require about 45K of storage (plus some overhead), at the expense of reading all records once, and selected records a second time on output. If you need the records sorted, then sort-on-insert would be efficient. Is there a way you could break the problem in two, using Perl for the display/control and C (or Java, or another language) for the data retrieval?

  2. Richard, I’ve thought about something like that, but I like the flexibility of the relational model. For instance, I have a separate table for runway data, and another one for communications frequencies.

    I’m currently looking at using BerkeleyDB to handle the intermediate storage of the array, instead of slurping it all into memory.
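
    Something along these lines, using the older DB_File tie interface to Berkeley DB (recno mode stores flat strings, so this assumes the records are already packed; records.db is just a placeholder name):

        use Fcntl;
        use DB_File;

        # tie the array to an on-disk recno file so the records sit on
        # disk instead of in RAM
        my @records;
        tie @records, 'DB_File', 'records.db', O_RDWR|O_CREAT, 0666, $DB_RECNO
            or die "Couldn't tie records.db: $!";

        # after that, push @records, $data works exactly as before, but the
        # data lands on disk instead of staying in memory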
