Upgraded to Kubuntu 12.10

A few weeks ago I got seriously pissed off about all the things that were broken on my Linux box. Not least was the fact that, ever since the last time I upgraded Ubuntu, the program “aptitude” kept telling me that I had to uninstall several hundred packages, including some that looked like majorly important ones. So I bit the bullet and did a fresh install of Kubuntu 12.04. The fresh install went ok, with the usual few glitches and things that needed to be reconfigured. But then almost as soon as I got all that sorted out, I got a notification that Kubuntu 12.10 was out. And I figured that since I hadn’t done all that much since installing 12.04, an upgrade would probably be no sweat.

My first indication of trouble was after it rebooted – I got a “grub rescue” prompt, and bugger all else. I tried a few things that are supposed to allow you to put in your boot partition and boot, but none of them worked. So I hauled out the CD I’d used to install Kubuntu 12.04 and booted into rescue mode. I mounted all the partitions, did a grub-install /dev/sda and rebooted, and I was back in business.

The second problem was that none of our laptops could print to the print queue that is shared out by the Linux box. I had made sure that the CUPS config files hadn’t changed, but evidently that wasn’t enough. I got the two Mac laptops printing to it by “changing” them from ipp to ipps print queues. (I should mention that neither Macs nor Windows boxes actually let you look at an existing print queue and change things like the URL.) On the Windows box, I think what I had to do was change the print queue from using the name “PSC_1500_series” to “PSC-1500-series”. I have no idea what else I changed (because of the aforementioned problem with seeing how a queue is already defined), but I think that was it.

The third problem was worse – this morning I got an email from somebody who reads his email on my box saying he hadn’t gotten any email since the upgrade. I looked in the mail log, and what I could see was that the local delivery program had been changed from procmail to /usr/lib/dovecot/deliver -c /etc/dovecot/conf.d/01-mail-stack-delivery.conf -m "${EXTENSION}". That was an extreme WTF moment. Further investigation revealed that this config file specified maildir instead of mbox. I tried changing it to mbox, but then it complained that it didn’t have permission to write to /var/mail/ptomblin. I couldn’t find an option to tell this deliver program to run setgid to group mail. I also discovered that something had screwed up my postfix configuration to add this local delivery option, and had also removed a bunch of my spam protection checks. So I removed the mail-stack-delivery package and the postfix-dovecot package, and restored all the config files. Things seem to be working again. And I used the formail command to process all the files in the various people’s “maildirs” and put them back in their mboxes.

My next trial and tribulation is that my hourly backup program, which uses lvm snapshots and rsync, is intermittently screwing up. Sometimes it can’t unmount the snapshot partition, sometimes it can’t remove it (with the message Unable to deactivate open lvm2-home-real (252:12)), and sometimes it just fails for no reason. I know there are a ton of race conditions in lvm snapshot stuff, so I already had a “sleep 10” after the lvremove. I added another one after the umount that precedes the lvremove, in case umount suddenly got lazy and the reason it’s failing is that it hasn’t finished unmounting the partition. That seems to have quelled the major problems, but the lvremove command is spitting out the message /sbin/dmeventd: stat failed: No such file or directory and I need to figure out how to suppress that so I don’t get emailed every hour.

How to debug

I see an awful lot of posts on StackOverflow that show that the person asking the question hasn’t got the slightest clue how to go about debugging their problems. So here are a few specifics for a few extremely common situations:

1. It’s not a bug in the compiler
It’s not a bug in the compiler, it’s never a bug in the compiler. Stop making that your default assumption. I’ve been a programmer for over 25 years, and the only time I saw a bug in the compiler was in the early versions of cfront, which was AT&T’s way to convert C++ programs into C programs so you could compile and link them with C tools. If you think there is even the slightest possibility that it’s a bug in the compiler, you’re going to stop looking before you see what you did wrong. And yes, you did something wrong. Similarly…

2. It’s probably not a bug in the library routines
The probability of a bug in the library routines depends a lot on the number of people using it. If it’s a core part of Java, chances are you’re not the first person to notice something the other 25 million Java developers somehow overlooked. If it’s a project that you found on SourceForge that hasn’t been updated in 4 years and only had one developer, it’s a possibility, but one you should discount until you’ve made sure you’re calling it right.

3. Null Pointer Exceptions happen for a reason
If you got a NullPointerException in your code, or any type of exception in library code, you did something wrong. Look at the stack trace. Look for the top-most entry that is your code. Look at the line there. Think about what you see on that line. Can one of those variables be null? Did you initialize all the variables in every possible way through the code to that point? Are you giving the correct arguments to whatever library code you’re calling? If necessary, put a breakpoint there or throw in some debugging statements and print out what you’re using in that line to make sure they’re what you expected.

4. Debugging AJAX calls is hard, but it’s easier than trying to explain it on StackOverflow
A large number of questions are of the nature “I do this ajax call, and it doesn’t work”. What doesn’t work? Are you making the call? Is the server receiving it? Is the server doing the right thing with it? Is it passing back what you’re expecting?

The first thing you need to do is use a good debugger. If the problem happens on Firefox, then you’re in luck because you can use Firebug + FireQuery.

If you’re unlucky, and the problem only happens in IE (and face it, those are your only two alternatives because if a problem happens in any non-IE browser, it happens in all of them, whereas if your code works in IE8 you’re not 100% sure it works in IE7 or IE9), then you need to use whatever debugger options are available to you. I found some useful information here and I end up using a combination of Firebug Lite and IE Developer Toolbar. Fortunately most of the IE8 and IE7 problems I’ve encountered happen in IE9 with the Browser Mode and Document Mode set appropriately.

Once you’ve got your debugger up, you want to set a breakpoint on the actual ajax call (to verify that you’re actually getting to the call and not missing it for some other reason), on the success callback (to verify that the server has sent a response) and on the failure callback (to verify that the server didn’t throw up its hands and give up). It also helps if you’ve got access to the server side logs and can see what’s going on there as well, but that’s often not possible, for instance when you’re calling somebody else’s web API. In the IE debugger, you need to go to the Network tab and “Start Capturing”, and in Firebug you just need to look at the “Console” tab. After the ajax call returns, you can look at the appropriate tab and see what was sent to the server and what came back. And in the success callback you can look at the returned response and single step through the logic to see if you’re doing the right thing with it. And you can do all that in less time than it would take to write a question to StackOverflow. If you’re still stumped, you also have a lot more information you can put in your question, which will help all the eager question answerers out there who don’t have the ability to step through your code.
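If you can’t get a debugger attached at all, the same three checkpoints can be roughly approximated with logging. This is only a sketch: traceAjax and its send argument are made-up names, not part of jQuery or any browser API, and send stands in for whatever function actually makes your call.

```javascript
// Hypothetical helper (not a jQuery or browser API): logs the three
// checkpoints you would otherwise set breakpoints on.
// `send` is any function that accepts (onSuccess, onFailure) callbacks and
// makes the real request; `log` defaults to console.log.
function traceAjax(label, send, log) {
  log = log || function (msg) { console.log(msg); };

  // Checkpoint 1: did we reach the call at all?
  log(label + ": sending request");

  send(
    function (response) {
      // Checkpoint 2: the server sent something back.
      log(label + ": success " + JSON.stringify(response));
    },
    function (error) {
      // Checkpoint 3: the server (or the network) gave up.
      log(label + ": failure " + String(error));
    }
  );
}
```

If you see “sending request” and nothing after it, neither callback ever fired, which is worth saying in your question. It doesn’t replace looking at the Network or Console tab, since it can’t show you what actually went over the wire.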

5. A question and answer site is not the place to learn the syntax of a language
If your code doesn’t even compile, then you don’t know enough to ask a question that is useful to either you or the other users of StackOverflow. Pick up a book and learn the basics.

Internet Exploder, I hate you so much

Yesterday was a fun day in the continuing struggle against IE brokenness.

First problem: the form submit button used to work on IE, but now it doesn’t. Well, no matter, because the form had an onsubmit that did some AJAXy stuff and then cancelled the form submit. Rather than wasting time trying to figure out why it works on real browsers and not on IE, I just changed the submit button into an ordinary button that invoked my function. Problem solved.

Second problem: My form is very dynamic, allowing you to add, delete or clone table rows, each of which contains several select, checkbox, and textarea input fields, all with associated onchange or onclick callbacks. The problem was that when you cloned a row, the callbacks on the new row would apply to the original row. All the callbacks had the row id in the arguments list, and when I clone I use the jQuery attr command and a regular expression to change the row id. That works for real browsers, and it apparently works in IE (if you examine the code in Firebug you see the new id), but apparently the actual callback data is stored internally somewhere. It didn’t seem to matter whether I called clone with true or false in the copyData argument. So I restructured all my callbacks so they were activated by the jQuery on command, and grabbed the row id and other arguments using jQuery(this).parents('tr').
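The idea behind that restructuring, looking up the row at the moment the event fires instead of baking the row id into each callback, can be sketched without jQuery. This is not my actual code: findRowId is a made-up helper name, and it assumes only the standard parentNode, tagName and id properties.

```javascript
// Hypothetical helper: walk up from the element that fired the event to its
// enclosing table row and return that row's id (or null if there is none).
function findRowId(element) {
  var node = element;
  while (node) {
    // Compare case-insensitively, since tagName casing depends on the
    // document type.
    if (String(node.tagName).toUpperCase() === "TR") {
      return node.id;
    }
    node = node.parentNode;
  }
  return null;
}
```

Inside a handler registered once on the table with the jQuery on command, `this` is the element that fired the event, so findRowId(this) plays roughly the same role as jQuery(this).parents('tr') followed by reading the id. A cloned row gets the right answer because nothing about the row is stored in the handler itself.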

It was annoying to have to do all this stuff because IE is so different from real browsers, but the code is probably better for it.