I may need to rethink this…

I am currently working on a new data source for the waypoint generator. Unfortunately, because of the way it’s licensed, it’s only going to be for the iPhone version of CoPilot, and I can’t make it available for GPX and other users. All of my data loaders have, up until now, been written in Perl, and I have a really good Perl module that performs many of the loading tasks, such as merging existing data with new data.

The new data comes in the form of a gigantic XML file with a kind of weird schema. The provider actually provides both the gigantic file and a smaller set of updates on the 28 day cycle favoured by the ICAO, so hopefully I’ll only have to parse the gigantic file once, and then process the updates. I installed XML::SAX and Expat, and coded up a preliminary decoder to extract some (but not all) of the information that I need, just to make sure I was doing it right. I ran it with a subset of the data, and it seemed to be doing ok, and then just for grins while I was working on improving the code, I fired it off on the whole file. That was 3 days (72 hours) ago. It’s still running. Unfortunately I didn’t put in any progress messages, so I don’t know where it is in the file, only that it’s past the airport section that I care about. I profiled the subset data, and verified that Perl is spending most of its time in Perl code (some of it mine, some of it XML::SAX, and some of it Moose) rather than in native code.
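For what it's worth, here is a rough sketch of the two things the run above was missing: periodic progress messages, and bailing out once the section I care about has closed. It is in Python rather than Perl, using the standard library's xml.sax, and it is not the actual loader. The "Airports" element name, the reporting interval and the function names are all made up for illustration; the real schema is different.

```python
import io
import sys
import xml.sax


class StopParsing(Exception):
    """Raised to abandon the parse once the section of interest has ended."""


class ProgressHandler(xml.sax.ContentHandler):
    def __init__(self, section, every=100_000, log=sys.stderr):
        super().__init__()
        self.section = section  # hypothetical wrapper element we care about
        self.every = every      # print a progress line every N start tags
        self.log = log
        self.count = 0
        self.locator = None

    def setDocumentLocator(self, locator):
        # The parser hands us a locator that can report the current line.
        self.locator = locator

    def startElement(self, name, attrs):
        self.count += 1
        if self.count % self.every == 0:
            line = self.locator.getLineNumber() if self.locator else "?"
            print(f"{self.count} elements, line {line}", file=self.log)

    def endElement(self, name):
        # Nothing after this section matters, so stop reading the file.
        if name == self.section:
            raise StopParsing


def parse_until_section_end(source, section, every=100_000, log=sys.stderr):
    """Stream-parse source, log progress, and return the start-tag count."""
    handler = ProgressHandler(section, every, log)
    try:
        xml.sax.parse(source, handler)
    except StopParsing:
        pass
    return handler.count


if __name__ == "__main__":
    # Tiny made-up document standing in for the gigantic file.
    sample = b"<root><Airports><a/><a/></Airports><Other><b/></Other></root>"
    parse_until_section_end(io.BytesIO(sample), "Airports", every=2)
```

Raising an exception from the handler is a blunt way to stop, but it means the parser never touches the rest of the file, which is the part that was eating the 72 hours.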

So here’s the conundrum: Do I spend the time to re-write this loader code in another language and hope it’s faster? Or do I accept the fact that this is going to take forever, but hopefully I’ll only have to do it once and then the updates will be small enough that I can do them in Perl? Because re-writing in another language means re-writing all the data merging and validation logic, and could be a potentially huge project. And I won’t know until it’s all working whether it’s going to be faster.

Update: I profiled the Perl program with a semi-large dataset. Here are the results:

dprofpp
Total Elapsed Time = 56.86461 Seconds
User+System Time = 46.10461 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c Name
20.5 9.494 23.288 397862 0.0000 0.0001 XML::SAX::Expat::_handle_start
15.4 7.136 12.820 131698 0.0000 0.0000 XML::SAX::Expat::_handle_char
14.7 6.787 55.922 1 6.7867 55.921 XML::Parser::Expat::ParseStream
13.6 6.311 12.977 397862 0.0000 0.0000 XML::SAX::Expat::_handle_end
7.07 3.258 3.258 472462 0.0000 0.0000 XML::NamespaceSupport::_get_ns_details
6.79 3.132 3.132 397862 0.0000 0.0000 XML::NamespaceSupport::push_context
6.48 2.986 5.685 131698 0.0000 0.0000 XML::SAX::Base::characters
4.24 1.953 1.953 131698 0.0000 0.0000 EADHandler::characters
3.87 1.786 4.411 397862 0.0000 0.0000 EADHandler::start_element
3.78 1.744 12.308 211270 0.0000 0.0000 XML::SAX::Base::__ANON__
3.69 1.702 1.838 4000 0.0004 0.0005 Data::Dumper::Dumpxs
2.55 1.174 5.870 397862 0.0000 0.0000 XML::SAX::Base::start_element
2.44 1.124 3.956 397862 0.0000 0.0000 XML::NamespaceSupport::process_element_name
1.93 0.892 0.892 397862 0.0000 0.0000 XML::NamespaceSupport::pop_context
1.85 0.854 5.768 397862 0.0000 0.0000 XML::SAX::Base::end_element

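Note how it’s dominated by XML::SAX::Expat: its three handlers (_handle_start, _handle_char and _handle_end) account for roughly half of the exclusive time between them.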
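To put rough numbers on that, here is a small Python sketch that adds up the %Time column by package. The rows are copied from the dprofpp output above; the function name is just something I made up. The percentages as reported add up to a bit more than 100, so treat the result as a ranking rather than an exact breakdown.

```python
from collections import defaultdict

# (%Time, subroutine) pairs copied from the dprofpp listing above.
ROWS = [
    (20.5, "XML::SAX::Expat::_handle_start"),
    (15.4, "XML::SAX::Expat::_handle_char"),
    (14.7, "XML::Parser::Expat::ParseStream"),
    (13.6, "XML::SAX::Expat::_handle_end"),
    (7.07, "XML::NamespaceSupport::_get_ns_details"),
    (6.79, "XML::NamespaceSupport::push_context"),
    (6.48, "XML::SAX::Base::characters"),
    (4.24, "EADHandler::characters"),
    (3.87, "EADHandler::start_element"),
    (3.78, "XML::SAX::Base::__ANON__"),
    (3.69, "Data::Dumper::Dumpxs"),
    (2.55, "XML::SAX::Base::start_element"),
    (2.44, "XML::NamespaceSupport::process_element_name"),
    (1.93, "XML::NamespaceSupport::pop_context"),
    (1.85, "XML::SAX::Base::end_element"),
]


def share_by_package(rows):
    """Sum the exclusive-time percentages per Perl package, largest first."""
    totals = defaultdict(float)
    for pct, name in rows:
        package = name.rsplit("::", 1)[0]  # drop the subroutine name
        totals[package] += pct
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    for package, pct in share_by_package(ROWS).items():
        print(f"{pct:6.2f}  {package}")
```

My own EADHandler code comes out well behind the XML::SAX::Expat, XML::NamespaceSupport and XML::SAX::Base layers, which is the point: most of the time is going into the SAX plumbing, not into my handler.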

We’re back, baby!

With new hardware donated by a very generous friend, I’m back up and running again. Hopefully I’ll have time to post some of the millions of things that have happened in the couple of weeks I’ve been down, but for now I’ll say that the old “new” server died with a million errors that looked SATA related, the disks checked out fine, and they’ve now been placed in new hardware. Oh, and you never know what you’ve been leaving out of your backups until *after* you type “mkfs.ext3 -j -c -c /dev/xen-space/xen1-disk”

Same shit, different job…

I’ve written a few times (here and here) about how every time you change something, every bug anywhere near that area now becomes your fault.

In my current job, I was in charge of a system called “Entitlements” that controlled who could do what and could access what parts of the system. Which means that dozens of new defects come to me with a note from the business analyst or equivalent person saying “looks like an Entitlement issue”. And I have to look at it and say “no, the reason they can’t access that part of the site isn’t because of Entitlements, it’s because NOBODY WROTE THAT PART OF THE SITE YET”.

Side note: we’re using “Agile Development”, which is a short form way of saying “we don’t know what the fuck we’re doing from day to day, and we’re not sure what has been done and what hasn’t until somebody complains that it’s not done”.

The good part is that because we’re Agile, that means when I discover that the problem is that nobody wrote that part of the site yet, I get to write it. So yay me.

More Mailman idiocy

I’ve written a few times in the past about idiots who get their monthly email reminders from my Mailman mailing lists and then write to me personally to unsubscribe instead of following the instructions in that email reminder.

For the last couple of months, somebody has been doing that, with a pissy “this is the [N]th request” just as a topper. I write back with “That email you got a few hours ago contains 3 different ways of unsubscribing yourself from the list, and nowhere is ‘writing a pissy email to the server administrator’ listed as a viable option”. Unfortunately, in this case it turns out that the email address the guy is writing me from bounces. And it’s not subscribed to any of my mailing lists. So even if I wanted to unsubscribe him, I can’t.

So I guess I’ll just wait to see what “N” he gets to before he gets really mad. Not that I’ll be able to do anything about it. Or care, for that matter.

Why I don’t consider myself a Linux person any more.

Time was, I was an enthusiastic Linux geek, proselytizing, apologizing, saying “well, it doesn’t now, but somebody will write something to do that”, overlooking the visual horror of the clashes of look and feel and user experience of all the disparate programs written on all the disparate X11 widget sets (yes, I could tell the difference between Xt and Xm at a glance), actually not laughing in people’s faces when they said that Gimp was better than Photoshop, ignoring the fact that Richard Stallman is a smelly looney who eats his toe jam in public, etc. But over the years, two things have happened:

  • I care more about user experience than I do about raw computing power
  • I don’t apologize for my computers any more

Or, to quote Three Dead Trolls In A Baggie, “yeah, well I’ve got a girlfriend and things to get done.”

So I use Linux on my servers, and I think it’s a great OS for servers. I even contribute to open source products here and there. I hardly ever use it as a desktop any more, although it was my daily work desktop a year ago, and it was fine for work where video and audio didn’t matter. I’m just not anything like the “freetard” I used to be. Which is why I recognize the type so readily. And when somebody sends me something like this, and thinks it says something about how iPad is nothing new, I can instantly recognize the scent of crazy. Especially since it was sent to me in response to my saying that I hope HP hurries up with the Palm WebOS-based tablet because I like the user experience (UX) of WebOS better than I like iOS.

I’m sorry, but if you think somebody who is debating the subtle differences in UX between WebOS and iOS is going to like a hefty laptop with the keyboard broken off, running Windows XP or Linux, with no multi-touch, a stylus and a battery life that’s probably measured in minutes, you have greatly misunderstood the question. Or the purpose of a tablet. Or the meaning of life.