Damn you, Linode

For the second weekend in a row, my Linode has died. This time, the linode.com web site is down as well. From what I can glean from the Linode IRC channel (which isn’t hosted on a linode), about half of their servers are dead to the world.

For me, that means no outgoing email, no mailing lists, and of course my hosted web sites including navaid.com are all down. This sucks.

Last week’s outage happened because some clueless tech at ThePlanet, the colo where their servers live, moved some power connections around (after being explicitly told not to touch anything) and overloaded a power supply. That took several hours to resolve.

Update
It’s up again, after only 6 hours. Geez, this sucks.

Getting spammed in earnest now

After I moved my blog from Movable Type to WordPress, it seemed that the comment spammers couldn’t find my new blog. For a while there, it even seemed that referrer spam had dropped to nearly nothing. But they’re back, with a vengeance. They’re still trying to spam my old blog, which doesn’t exist anymore, and Maddy’s blog, which doesn’t accept comments anymore, but now they’re spamming my blog too. Or attempting to, anyway. SpamKarma is catching them all, but right now that’s 20-30 comment spams a day.

It’s a frustrating waste of my time and resources. I don’t pay for disk space and network bandwidth so that these vandals can use it up.

Obviously there’s something wrong with the way I’m profiling perl scripts

I’m trying to reduce the memory footprint of my waypoint generation scripts. To see how much memory they use, I’ve been doing a (ulimit -v NNNN; ./CreateCoPilot.pl ….) and adjusting NNNN up and down to bracket where it fails due to lack of memory (there’s a quick bisection sketch after the list). I’ve had two problems with that:

  • The break point isn’t constant – a number that works one time will give me an out of memory error on the next run.
  • The numbers are nothing close to what I’d expect.
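
Here’s roughly the kind of bracketing loop I mean. This is a sketch only: the command, the bounds, and the tolerance are placeholders, and remember that ulimit -v counts kilobytes:

use strict;
use warnings;

my $cmd = './CreateCoPilot.pl';       # stand-in for the real invocation
my ($lo, $hi) = (50_000, 200_000);    # assumed KB bracket

# Bisect to the smallest limit at which the script still completes.
# Given the first problem above, each value may deserve several runs.
while ($hi - $lo > 256) {
    my $mid = int(($lo + $hi) / 2);
    my $status = system("bash -c 'ulimit -v $mid; $cmd' >/dev/null 2>&1");
    if ($status == 0) { $hi = $mid }    # finished: try a tighter limit
    else              { $lo = $mid }    # died: loosen it
}
print "Peak usage is somewhere between ${lo}KB and ${hi}KB\n";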

That second problem is the worst. Near the end of my script, after generating this huge (15,000 record) array of references to hashes, I sort it using the following code:

my $recordsArrRef = $self->{records};
my @newrecs = sort { $a->{waypoint_id} cmp $b->{waypoint_id} }
              @{$recordsArrRef};
$self->{records} = \@newrecs;

It appears that at one point there should be two arrays with 15,000 records each, and yet when I benchmark the version where I’ve commented this code out against the one that has it, the unsorted one only saves about 350KB. Ok, maybe it’s sorting in place, and all I’m saving is the actual code. Or maybe that isn’t the point of maximum memory usage.

So then I looked in the Palm::PDB code, which is a library from somebody else. At the end, after getting this array of hashes together, he goes through the array and encodes each record into binary, putting that into a different array. AHA, I thought, that means I’ve got the array of hashes and an array of encoded data records in memory at the same time. Maybe what I should do is shrink the array of hashes as we build the array of encoded data. So I changed

foreach $record (@{$self->{records}})
{
    ...
    $data = $self->PackRecord($record);

    push @record_data, [ $attributes, $id, $data ];
}

to

# shift empties $self->{records} as we go, so each hash can be freed
# as soon as its packed copy has been pushed onto @record_data
while (defined($record = shift(@{$self->{records}})))
{
    ...
    $data = $self->PackRecord($record);

    push @record_data, [ $attributes, $id, $data ];
}

and I seem to have saved over 5,000KB. Not bad. But I think there has *got* to be a better memory profiling tool for Perl. Time to hit CPAN, I guess.
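
One obvious CPAN candidate is Devel::Size, which reports how many bytes a Perl structure actually occupies. A minimal sketch, assuming the records are still sitting in $self->{records} as above:

use Devel::Size qw(size total_size);

my $records = $self->{records};
printf "array itself:        %d bytes\n", size($records);         # just the array structure
printf "array plus contents: %d bytes\n", total_size($records);   # follows references into the hashes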

Oh oh

My linode has been off-line since 8:50 this morning. I checked their support forum, and evidently 4 or 5 machines were off-line, linodes 39-43. (I’m on 41.) The technician was checking into it. Then about 9:50, I suddenly couldn’t get into their support forum either. It looks like their whole site has gone tits up.

All my mailing lists are going to be dead until this gets fixed. Waaaa.

Bear with me here…

While most of my blog entries are an example to the world of how to write in a way that appeals to everybody, this one is going to be mostly a reminder to myself.

I’m having problems with my waypoint generator on the Linode, mostly because with only 96MB of real memory, each individual generator task quickly becomes too big, tasks start swapping, and everything gets horribly I/O bound.

At first it seemed that things were dying right at the very end, and so I leapt to the conclusion that it must be in the sort phase, where it takes all the records that it’s retrieved from the database and stuffed into an array of references to hashes, and sorts that array by ID. I solicited some opinions on that, and got some good ideas on how to sort by ID in the database while still allowing the priority of data sources that I use now. The most interesting one said:

select ...
from waypoints w1
where ....
  and field(w1.datasource, 'FAA', 'DAFIF', 'Thompson')
      = (select min(field(w2.datasource, 'FAA', 'DAFIF', 'Thompson'))
         from waypoints w2
         where w1.id = w2.id)
order by w1.id
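
(For reference: MySQL’s field(str, str1, str2, ...) returns the 1-based position of str in the list, so the subquery picks out, for each id, the row from the highest-priority data source that actually has it.)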

But before I had a chance to implement it, I did some testing on my own machine using “ulimit -v” to simulate the reduced memory size. I ran an example query that produces a result file with 71,197 records in it, homing in on the minimum memory size that would allow it to finish without getting an “Out of memory” error. Then I cut out the sort stage and did it again. And what I found surprised me. Cutting out the sort stage only saved me 375KB, reducing the limit from 107625KB to 107250KB (ulimit -v counts kilobytes, not bytes). And it made the time go from 1:46 to 1:35, a scant 11 seconds, or about 10%.

Looks like I’m going to have to find another way to reduce the memory footprint. And I keep coming back to this idea I had where I do the sorted query and write each record out to a temporary file as I retrieve it, storing only the id, PDB “unique id”, record number and the offset from the beginning of the temporary file. Then when that’s done, I go back and write the PDB file header and the PDB file index (which consists of the offset from the beginning of the file, attributes, category and the unique id), and then append the contents of the temporary file. That way I can avoid having the entire contents of the database in memory.
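
A rough sketch of that two-pass idea. The fetch/pack/header helpers here are placeholders standing in for the real sorted query, the CoPilot record encoder, and the actual PDB header and index formats:

use strict;
use warnings;
use File::Temp qw(tempfile);

my ($tmp, $tmpname) = tempfile(UNLINK => 1);
binmode $tmp;

my @index;       # small per-record metadata only; the records themselves stay on disk
my $offset = 0;

# fetch_next_sorted_record() and pack_record() are placeholders for the
# real database cursor and record encoder
while (my $record = fetch_next_sorted_record()) {
    my $data = pack_record($record);
    print $tmp $data;
    push @index, { unique_id => $record->{unique_id}, offset => $offset };
    $offset += length $data;
}

# Second pass: header and index first, then the bulk data from the temp file
open my $out, '>', 'waypoints.pdb' or die "can't write PDB: $!";
binmode $out;
my $data_start = pdb_header_size() + @index * pdb_index_entry_size();
print $out make_pdb_header(scalar @index);
for my $entry (@index) {
    print $out make_index_entry($entry->{offset} + $data_start,
                                $entry->{unique_id});
}
seek $tmp, 0, 0;
my $buf;
print $out $buf while read($tmp, $buf, 65536);
close $out;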

Side note about the PDB “unique id”: each record in a PDB file has a 3-byte “unique id”. Normally when you’re creating a PDB file, you leave that as zero and the PDA itself fills it in when it loads the file. But when Laurie Davis created the CoPilot application, he used the unique id as the key to reference the waypoint records from the flight plans. So if I did leave them as zero and let the PDA fill them in, every time you reloaded your waypoint file your flight plans would get scrambled.

Instead, I maintain a table with a unique mapping between waypoint ids and “unique ids”. That way, even if you got, say, “KROC” from the FAA data this time and from the DAFIF data next time, your flight plans including KROC would still work, because both KROC records would get the same “unique id”. That also means every time I load new data into the database, I have to find any ids that don’t currently have a “unique id” and generate new ones. Occasionally I should purge no-longer-used ids and re-use their unique ids, because 3 bytes doesn’t give you a lot to play with.
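
A minimal sketch of that bookkeeping step; the table and column names (unique_ids, waypoint_id) are made up for illustration:

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:mysql:waypoints', 'user', 'password',
                       { RaiseError => 1 });

# Find waypoint ids that don't have a "unique id" assigned yet
my $ids = $dbh->selectcol_arrayref(q{
    select w.id
    from waypoints w
    left join unique_ids u on u.waypoint_id = w.id
    where u.waypoint_id is null
});

# Hand each one the next free value; 3 bytes caps us at 0xFFFFFF
my ($next) = $dbh->selectrow_array(
    'select coalesce(max(unique_id), 0) + 1 from unique_ids');
for my $id (@$ids) {
    die "out of 3-byte unique ids -- time to purge unused ones\n"
        if $next > 0xFFFFFF;
    $dbh->do('insert into unique_ids (waypoint_id, unique_id) values (?, ?)',
             undef, $id, $next++);
}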