Archive for January, 2007

Why is it that a tech support problem doesn’t become an emergency until it’s most inconvenient?

Wednesday, January 31st, 2007

I was quite literally standing up and collecting my wallet to go to lunch when I got approached by one of our support people and a fellow developer with a problem at a partner site. They described the problem and asked if I could log on to see if I could figure out the answer. I asked if it could wait until after lunch, and they said no, because the system was going to be packed up and shipped to the evaluation center and it had to be working by then, so we had a 1 hour window to get it working.

So I go over and get Bob to log me on, and at first I’m lost. The subsystem that sends back the heartbeat from one system to the other was the responsibility of a programmer who doesn’t work here any more. The log files don’t help much. But one thing I notice is that according to the logs, they’ve had this problem as far back as the logs go, about 14 days. Ok, it’s been a problem for 14 days, but evidently you can’t be bothered to fix it until there is 1 hour to go before an irrevocable deadline. Why is this my problem again?

I do some poking around a config file that I’ve had very little to do with in the past and I notice something glaring - the system is configured as [system] “2”, but it has another similar parameter set to an id of “1”. Ok, leaving aside the little question of why there are two parameters that control the same thing when one could probably be derived from the other, is this it? I change the second parameter to a 2 and restart it. And almost immediately start seeing heartbeats coming.

I’m out of there before they can find something else for me to work on, and the cafeteria doesn’t close for another 45 minutes. Yay, me.

User interface design for programmers with no sense of style

Wednesday, January 31st, 2007

Years ago I was working on Kodak’s Cineon system, an innovative system for digital post-production of movies. It was a great gig, but unfortunately Kodak pulled the plug because there were too few post-production houses doing digital work to support a competitive marketplace, and because some really questionable business decisions were increasing the development costs. I still think they should have held on a bit longer until the market caught up to us, but that’s life.

One of the projects I worked on was a “clip editor”, where the users got a view of multiple film clips (ie. different shots from the same scene) and they could cut them, shrink or expand them, slide them up and down relative to each other, and then define the transitions between them. Our competitors called theirs a “virtual light table” or something like that. Ed Hanway was doing the guts of the program, and I was doing the user interface. I liked working with Ed - he’s one of a handful of people I’ve worked with over the years I’d consider as good if not better than me, and easy going and easy to get on with in spite of it.

I had the basic outline for what I wanted the clip editor to look and work like, but I felt that my own aesthetic sense was lacking (which you’d agree with if you’ve seen the way I dress), so I wanted some feedback on the aesthetic aspects of the design. Kodak didn’t have a Corporate Design and Usability department like they do now, or if they did nobody was telling me about it, but since the Cineon tech support department was staffed by people who edited movies, I figured they’d have some artistic instincts.

Oh quick aside here - our tech support people often pitched in at customer sites when they were using our software on big projects, which meant that you could always tell the Cineon people at a Rochester movie theatre, because we’d always wait for the very last credit and cheer when it had one of our people’s names.

But I couldn’t get anybody to answer any of my questions. So I figured I’d force the issue by choosing two of the most hideous colours I could find. I think I chose two that had pre-defined colour names in OpenGL, but I toned them down a bit because the people using our software always seemed to do it in dark rooms and the rest of our interface was in shades of grey because of that. I think I ended up with a sort of mauvey-pink and a light limey-green. I knew *somebody* would have to complain about these colours, and then I could ask them what colours they thought it should be.

Oh, another aside - the program had been started at Kodak’s Australian office, and then moved to Rochester, and then we had some code contributed by the office in London England, and then half the development moved to San Francisco for no good reason. One of the things that led to was continual problems with the spelling of the words “colour” and “grey”. You’d find both Commonwealth and US spellings of both those words in method names, and sometimes both variations in the same method. The method name confusion was the worst - you’d write your code to call “adjustColourSpace3D” only to have the compiler bitch because you meant to call “adjustColorSpace3D”.

But I never got any complaints about the colours, so that’s how they went out in the release. And a year or two later, somebody brought some literature back from our big trade show, ShoWest, and lo and behold one of our competitors had copied my hideous colour scheme in their virtual light table.

A few weeks ago I was telling this story at lunch, and one of the other former Cineoners who got to go to customer sites mentioned that the customers had loved my hideous colour scheme because of how well it stood out. Huh. Who knew?

I guess the secret to good user interface design is to purposely make something that offends my senses, and I’ll come up with something that normal people like.

Gratuitous Icon Post

Tuesday, January 30th, 2007

Since cat_macros won’t accept something that isn’t a cat, I present:

No fucking thank you, Quicken tech support

Tuesday, January 30th, 2007

Update: Almost as soon as I finished writing the rant below, I got an email from Quicken tech support explaining what I should have done - used “Activities->Prior Statement” (which would have shown that list of prior statements that I semi-accidentally discovered on my own) and then used splat-D to delete them. Delete doesn’t appear on the Prior Statement window anywhere (nor anywhere on the help that comes up if you press the ? button that I could see), but evidently it’s there on the main menu under Edit, so I should have just known to look.

I got a paper bank statement the other day, covering up to the 5th of January. I went into Quicken as I had so many times before, only to find that all the stuff I’ve downloaded from the bank recently was marked with an “R” (meaning “Reconciled”) instead of a “C” (meaning “Cleared”) like it’s supposed to be. Attempting to click on any of these “R”s just takes you to the Reconcile page, which won’t show you any of these transactions that have supposedly been Reconciled. And it also complains about the fact that the date range you are attempting to reconcile comes before the last “statement” that it evidently processed.

I couldn’t find anything on the local help, or the on-line help, so I clicked the “tech support chat” button, which had me enter some information on what version of Quicken I’m using and a brief description of the problem, then connected me to somebody with an Indian sounding name (which isn’t too surprising at 9pm on a Sunday). I briefly described the problem again, and he/she said they could email me some steps that would solve my problem. I said that would be fine, and clicked off. The mail came some minutes later, but by that time I’d had enough of bill paying and put it off to today to deal with.

Today I read the instructions. The very first step says “First reconcile your transactions, where you see R click on it make it C .” Since the very problem I was complaining about was that once these things were “R”, I couldn’t make them “C”, that was less than helpful. The rest of the instructions about saving a copy of the file to my desktop were a little vague, but when I followed them as best as I could, I ended up with exactly the same results as before. So I threw away the copy of the file, and wrote a reply to the email telling them that their steps didn’t work, and could they please actually try reading the problem statement before sending me boilerplate responses to a similar but different problem?

And just as I sent it off, I got an email asking me to take a customer satisfaction survey. I was pleased to report that they didn’t solve my problem, I was very dissatisfied with the quality of their tech support, and in the “suggestions for improvement” I suggested they actually try reading what the customer is saying.

Then I got to playing with Quicken again. I tried clicking those pesky “R”s with various modifier keys, and found one (I think it was Control) that allowed me to uncheck it. Which brought up a warning about how I was going to mess up the reconciliation, and then another dialog about how I had to save the transaction, and another dialog about something else I was going to mess up. If I was going to have to go through this with every “R” transaction that I wanted to un-reconcile, it was going to be a long night.

Some more messing around brought up a list of previous reconciliations, which showed a succession of monthly manual reconciliations, and then two this month which must have happened at Quicken’s whim when I downloaded bank transactions. I clicked on each of the two bogus reconciliations, and clicked “Re-reconcile”, which brought up the same dialog I normally use to reconcile. I clicked “uncheck all” and then “save”. It warned me that the reconciliation didn’t balance, and it was going to put in a new transaction to fix it. After it was finished with that, I found and deleted the two “reconciliation adjustment” transactions. And then I tried the paper reconciliation. And you know what? It worked. Everything looks fine now, but I don’t doubt I’ll have some problems with the next paper statement as well. But now I think I know how to fix them.

And just in case, I turned off the “auto reconcile” option entirely. I’ve had that on since I started using Quicken in 2002, with the submenu “reconcile when I receive a paper statement on the 22nd of the month”, and it’s never caused problems before, but I suspect it was that causing the problem. The option probably didn’t mean what I thought it meant, but I guess I’ll see when the next paper statement comes.

Getting there, still some collateral damage

Monday, January 29th, 2007

I spent much of the weekend working on my waypoint generator - more specifically on the code that loads new data into the waypoint generator database. As anybody who reads this blog probably heard (and forgot), the main source of data for outside the United States, the NGA’s DAFIF, is no longer available, and so I’ve been working on ways that people can contribute data that fills in the gaps and updates the DAFIF data as it gets old.

The problem is how to integrate all these different sources of data of various levels of quality and completeness, and somehow pick out the “best” parts of them all. And that’s tricky, because points can move and their IDs can change. Worse still, the various data sources aren’t very consistent about a point’s “type”. Points can move because they’re re-surveyed, or because the navaids that define them have been re-aligned with magnetic north. IDs can change because an airport needed to get an ICAO compatible ID so that they could participate in weather reporting or international flights, or because of political expediency (there is an intersection near here named “PTAKI” - you can bet it wasn’t called that before George Pataki was governor of New York, and it probably won’t be called that after he’s gone). And types are the biggest problem - one source will have a “NDB/DME”, but another will have separate NDB and DME of the same ID at the same location, there is some mixing of “VOR/DME” and “VOR” for the same navaid, and one source uses the type “DVOR/DME” for what DAFIF would call a “VOR/DME”. The worst are the airborne points - the FAA and DAFIF could never agree on whether a point was a REP-PT or a CNF or an AWY-INTXN or an RNAV-WP.

And to make matters worse, one of the data sources that I used years ago was “waypoint.nl”, a site that catered to GPS users and Geocachers - it seemed likely to me that many of the airport locations were probably taken at the airport fence, but it had a bunch of airports that nobody else had, so I took them. Well, now that I’m getting better data, one of the things I’m finding is that their airport ids are frequently wrong. I’d like to just get rid of them entirely, but they cover countries that nobody else does.

So what all that leads to is if I’m loading somebody’s data, I can’t just delete all the existing points with the same id (or even the same id in the same general area) and insert what I’ve gotten. So I have to come up with a way to find the same point, even if it moved or changed ids, and use the old, existing data to fill in anything missing in the new data (like if the new data source doesn’t fill in runway or navigation frequency data). And hopefully, not lose any existing waypoints that might be “close”.

Because of the inconsistency of airborne waypoint naming, I separate waypoint types into three categories - “airport” and airport like, ground based navaids, and airborne waypoints. You’ll probably see why I mention that later.

The algorithm I settled on to find the existing point was to call an existing point the progenitor of the new point if the existing point was within 0.1 degrees of latitude and 0.1 degrees of longitude of the new point, it was in the same country (and in the case of the US and Canada, it was in the same state/province), and the same type (unless it was in the airborne waypoint category, where the same category was sufficient). In my first tests with some data I have for Ireland (after converting DVOR/DME to VOR/DME), that worked pretty well - it removed some garbage data from waypoints.nl, and updated some DAFIF data. But there was some collateral damage - a CNF named BAL86 was removed, because it was too close to a REP-PT called MARNO, a similar thing happened to GURGA near KEKUL, the SLG NDB/DME at Sligo was replaced by separate SLG NDB and DME, and who would ever have believed that they’d put waypoints named “TIMAR” and “TIMRA” within 0.17 nm of each other? Doesn’t that last one sound like a cockpit resource management nightmare?
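For anybody who wants to see that matching rule as code, here is a rough Python sketch of it. This is not the loader's actual code: the Waypoint class, the category groupings, the "AIRPORT" type name, the country codes and the coordinates are all made up for illustration. Only the rule itself (0.1 degree box, same country, same state/province for the US and Canada, same type unless airborne) comes from the description above.

```python
from dataclasses import dataclass

# The type names appear in the post; how they are grouped is an assumption.
NAVAID_TYPES = {"VOR", "VOR/DME", "NDB", "NDB/DME", "DME"}
AIRBORNE_TYPES = {"REP-PT", "CNF", "AWY-INTXN", "RNAV-WP"}
# Countries where the state/province must match too (codes are illustrative).
STATE_COUNTRIES = {"US", "CA"}


@dataclass
class Waypoint:
    """Hypothetical record layout, not the real database schema."""
    ident: str
    type: str
    lat: float
    lon: float
    country: str
    state: str = ""


def category(point_type: str) -> str:
    """Map a type onto one of the three broad categories."""
    if point_type in AIRBORNE_TYPES:
        return "airborne"
    if point_type in NAVAID_TYPES:
        return "navaid"
    return "airport"  # "airport" and airport like


def is_progenitor(old: Waypoint, new: Waypoint, tolerance: float = 0.1) -> bool:
    """True if the existing point 'old' counts as the ancestor of 'new'."""
    # Must fall inside a box of +/- tolerance degrees around the new point.
    if abs(old.lat - new.lat) > tolerance or abs(old.lon - new.lon) > tolerance:
        return False
    if old.country != new.country:
        return False
    if new.country in STATE_COUNTRIES and old.state != new.state:
        return False
    # Airborne points only need to share the category; everything else
    # needs exactly the same type.
    if category(new.type) == "airborne":
        return category(old.type) == "airborne"
    return old.type == new.type


# Made-up coordinates, just to show how a distinct nearby point gets swallowed.
marno = Waypoint("MARNO", "REP-PT", 53.50, -6.50, "IE")
bal86 = Waypoint("BAL86", "CNF", 53.52, -6.48, "IE")
collateral = is_progenitor(bal86, marno)  # True: different ids, same box and category
```

Note that the rule never looks at the id, which is exactly why it can follow a renamed point and also why two genuinely different airborne points close together get merged.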

I think I need to make a few small changes to the algorithm, but it’s mostly there. I hope. I need to make some bigger tests as well.

It’s not easy being green

Monday, January 29th, 2007

We got the results of our EnergyStar audit on Saturday. They’re recommending $20,000 worth of work, and promising that we’ll probably save at least $150 a month on average based on last year’s energy bills. They also said we could save another $150 a month if we did the windows, but doing them in a way that’s sensitive to the age and architecture of the house (ie. not replacing leaded glass windows and wood frames with modern plastic crap) would be really expensive - maybe $30,000 to $40,000.

The problem is that the net present value of $150 a month for 10 years (which is the expected lifetime of the new furnace) is only about $14,000. Obviously energy prices will go up, and the only energy year we have records for, last year, was unusually mild, so the savings might be greater in a year like this year. But it’s still hard to say “go ahead and spend that money” with such an uncertain pay-back. So I have to think about the non-monetary pay-back as well, like the fact that the house will be more comfortable, and it will reduce our carbon footprint, and it might have a small positive effect on the value of the house.
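The present value figure can be checked with the standard annuity formula. The sketch below is only a sanity check: the post doesn't say what discount rate was used, so the 5% a year (compounded monthly) is an assumption, picked because it lands close to the $14,000 quoted above.

```python
def present_value(monthly_saving: float, annual_rate: float, years: int) -> float:
    """Present value of a fixed saving received at the end of every month."""
    r = annual_rate / 12   # monthly discount rate
    n = years * 12         # number of monthly savings
    if r == 0:
        return monthly_saving * n
    return monthly_saving * (1 - (1 + r) ** -n) / r


# Assumed 5% annual discount rate; the $150 and 10 years come from the audit.
pv = present_value(150, 0.05, 10)
shortfall = 20_000 - pv  # how far the discounted savings fall short of the cost
print(f"present value: ${pv:,.0f}, shortfall: ${shortfall:,.0f}")
```

A higher assumed rate makes the savings look worse and a lower one makes them look better, but at any positive rate the total comes in under the undiscounted $18,000.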

Still, $20,000.

Sigh.

Hey, tech support guys!

Thursday, January 25th, 2007

I have a really amazing suggestion for you: When the instructions I wrote say

cd /content/tmp
tar xvf /media/usbdisk/upgrade.tar

Either

  • Don’t tell the guy in the field to skip the “cd” step OR
  • Don’t come to me and tell me that the installer program can’t find the files.

Time to worry, or just a glitch

Wednesday, January 24th, 2007

I have mail logs going back to 24 December 2006. Recently, I noticed that every now and then one of my postfix processes will die with a “SEGV” (the dreaded Segmentation Violation). They appear at odd times in my logs, starting January 1, and continuing on the 7th, 8th, 13th, 14th, 15th, 16th, 17th (x2), 19th, 20th (x2), 23rd and 24th. It’s different processes each time, and each time it gives a warning about having some difficulty starting the replacement process (although the mail delivery continues, so I assume it starts up immediately after). I don’t see the same sorts of errors on my colo box, which uses an older version of postfix similarly configured.

I asked on the postfix-users mailing list, and got the totally unhelpful answer that it was something wrong in my config files - obviously wrong because it doesn’t happen if I start and stop postfix. And another person said my system memory probably was going. Well, that’s possible - the system is about 6 years old. It uses PC133 Registered RAM, which is still expensive - replacing the 1Gb I’ve got now would cost around $90, or about the same cost as 1Gb of the newest PC5400 RAM.

This machine is old for a server, and certainly the technology has passed this box by - it has AGP, not PCI Express, it has USB 1 (although I put in a PCI USB 2.0 card so I could use external hard drives), it has IDE instead of SATA, and it refuses to boot without a PS/2 keyboard in spite of the fact that it’s perfectly happy with a USB keyboard after it’s booted. But on the other hand, it’s perfectly fast for what it does, and I’ve got three IDE hard drives and a 16x dual layer DVD burner in there and everything is just working the way I want it to. The only complaint I have is that it’s cranky - if I add new disks, I’ll often have to reboot three or four times before the BIOS will recognize them, and most times it won’t boot from a power on - I have to boot, wait for it to complain that there aren’t any hard drives in it, and then control-alt-delete it.

I don’t want a new server - if I were buying a new desktop now, it would be something to run a certain MMORPG faster. Maybe a Mac Pro with Boot Camp. But I want this server to continue to serve.

Let’s hope this is just a little glitch.

Senator Leahy, you’re my favourite US politician

Tuesday, January 23rd, 2007

Listen here

It is beneath the dignity of this country, a country that has always been a beacon of human rights, to send somebody to another country to be tortured.

and

Attorney General Ashcroft said we got assurances. Assurances? From a country that we also say now “oh, we can’t talk to them because we can’t take their word for anything”.

Man, it’s great to listen to a Senator who remembers that there are three equal branches of government, not just one and two rubber stamps.

What is Hitachi thinking?

Tuesday, January 23rd, 2007

For years now, whenever I’ve had drive or controller problems, I’ve hauled out IBM’s DFT (Drive Fitness Test), even if the drive isn’t a Deathstar, er, Deskstar. Now IBM’s drive division belongs to Hitachi, but DFT lives on. I used it last week to make sure my new colo box could handle the sorts of loads I wanted to put on it. But now that I have my old colo box back, I want to test it to see if the problems I was having might be fixed with a new drive cable before I sell it on eBay.

But this box doesn’t have a floppy. No problem, I thought, the Hitachi site has a bootable CD version. So I downloaded it and burned it and booted with it. But the first thing it does is scan the IDE controllers, and when it’s scanning “Secondary Slave”, it suddenly starts spewing errors about being unable to read A:\COMMAND.COM. Evidently DFT needs to read its own disk just at the moment that the drive is disconnected for scanning. So when they made the CD ISO, they didn’t actually test it, or didn’t think about how it works, and instead of using the “Linux Live CD” model where they make a ramdisk and load themselves into it, they just made a DOS boot partition on the CD and expect it to be there all the time.

I guess it’s off to my junk shelf to see if I have a floppy drive and cable.

If you’ve ever built an airplane model, stand in awe

Monday, January 22nd, 2007

This is the most amazing bit of modelling I’ve ever seen: Supermarine Spitfire Mk.I by David Glen (Scratchbuilt 1/5)

Message of the Day

Saturday, January 20th, 2007

MISS STEPHANY RODNEY (uknationalfiduciaryhqs@yahoo.co.uk): go fuck yourself.

That is all.

That was easy

Saturday, January 20th, 2007

I needed to re-arrange some disk space. I explained the situation in “Why didn’t I use LVM on everything?”, with a table showing the current layout and everything. At the time, my plan was:

  1. Migrate the content of /dev/hdc3 off using “pvmove” and “vgreduce”.
  2. Delete all three partitions on /dev/hdc and add it back to the vg using “pvcreate /dev/hdc; vgextend xen-space /dev/hdc”.
  3. Migrate the content of /dev/hde2 off using “pvmove” and “vgreduce”.
  4. Delete the /dev/hde2 partition and increase the size of /dev/hde1 to fill up the drive, and use resize2fs to make the file system on /dev/hde1 use the whole partition.

I did steps 1-3, and it all worked perfectly. I didn’t have to shut down anything, and it didn’t interrupt the normal operation of either the dom0 or the domUs. But when I’d done that, I realized I actually had enough free space on the vg that I could do an even better plan:

  1. Set up a 250Gb lv.
  2. Use rsync to copy everything from /dev/hde1 to the lv.
  3. Once that was done, shut down domU 1.
  4. Make /dev/hde1 part of the vg.
  5. Make the 250Gb lv bigger using lvextend - I chose to add 100Gb to it, and I have space to add more if I need it.
  6. “e2fsck -f” and “resize2fs” the lv.
  7. Restart the domU 1, using the lv instead of /dev/hde1.

This worked perfectly. The domU was down about 10-15 minutes tops. /dev/hde is still partitioned into two partitions, even though both partitions are part of the same vg. But other than that, it’s exactly what I’d have done if I were setting it up from scratch now.

Why didn’t I use LVM on everything?

Thursday, January 18th, 2007

Due to a series of historical accidents, I have the following disk space layout:

Partition    Size     Use

Disk 1 - 250Gb
/dev/hda1    2Gb      dom0 root
/dev/hda2    2Gb      dom0 swap
/dev/hda3    Rest     part of vg “xen-space”

Disk 2 - 250Gb
/dev/hdc1    2Gb      formerly dom0 root - unused
/dev/hdc2    1Gb      formerly dom0 swap - unused
/dev/hdc3    Rest     part of vg “xen-space”

Disk 3 - 400Gb
/dev/hde1    300Gb    mounted as /dev/hdb on a domU
/dev/hde2    Rest     part of vg “xen-space”

The root partitions of the three domUs are all lvs on vg “xen-space”. There is over 250Gb free on the vg.

What I would like to do is clean up the second drive to get rid of the extraneous partitions, and to grow the partition on /dev/hde1 to the full disk. So what I’m thinking of doing is the following:

  1. Migrate the content of /dev/hdc3 off using “pvmove” and “vgreduce”.
  2. Delete all three partitions on /dev/hdc and add it back to the vg using “pvcreate /dev/hdc; vgextend xen-space /dev/hdc”.
  3. Migrate the content of /dev/hde2 off using “pvmove” and “vgreduce”.
  4. Delete the /dev/hde2 partition and increase the size of /dev/hde1 to fill up the drive, and use resize2fs to make the file system on /dev/hde1 use the whole partition.

The problem is that I don’t know if I can do this stuff without shutting down the domUs. And for the physical partition /dev/hde1, which is mapped to the /dev/hdb on one of my domUs, I don’t know if I have to shut down the domU or just umount it within the domU and remount it afterwards.

That was relatively painless

Wednesday, January 17th, 2007

The box is up. It was only offline for about 20 minutes, tops. Everything seems to be working. Fingers crossed.

I officially love LVM. And I wish my big binary file archive was on LVM, because it’s getting nearly full and I want to expand it. Of course, any changes in it will probably require shutting down because of the way the file system is mounted through Xen.