Just how fucked am I?

Unpacking linux-image-2.6.18-4-686 (from …/linux-image-2.6.18-4-686_2.6.18.dfsg.1-11_i386.deb) …
Done.

My colo box consists of xen.xcski.com, the dom0 which controls the others, and then xen1, xen2 and xen3 which are the domUs. Because it was way easier to do it this way, the dom0 is running Debian “etch” (aka “testing”), while the domUs are running Debian “sarge” (aka “stable”). The problem with using “testing” is that there are frequent updates, way more frequent than with “stable”. The problem with remote updates is that if something fucks up, there isn’t any easy way to fix it. Usually that’s not a problem.

Today’s upgrades include a new xen kernel. But it says it’s installing a new kernel, leaving the existing one there. So it shouldn’t be a problem, right? Well, I was wrong. It downloaded the upgrades, then got to the “unpacking” stage and hung. I can’t ssh to the dom0. I can’t kill the upgrade. It’s not responding to the munin probes. The only thing I can think of is doing a power cycle and maybe scheduling a site visit. But the domUs are running fine. So why would I do anything drastic while the real meat of the colo box is still going fine?

I don’t know what to do. Wait and see, I guess.

Another fun day

My colo facility contacted me on Wednesday to say that this weekend they’d be moving my machine to a new rack. They also said they’d gotten a new IP range and I had to switch over to it soon, but they’d let me keep both IPs during the switchover.

So today my system suddenly went off the air. I was sort of expecting it, but I didn’t see any shutdown messages because they just three-finger-saluted it. After a couple of hours, I phoned for an update, and was told that they’d just powered it up. But it still wasn’t responding to pings until I mentioned to them that eth0 and eth1 are in the opposite order from what you’d expect.

Once it came up, I tried to configure an eth0:1 using the new IP. That actually seemed to work on the dom0, so then I tried to do it on my domU. That seemed to work too. I was able to ssh into both IPs on both the dom0 and the domU. So I thought I’d swap the domU IPs around, so the new one was eth0 and the old one was eth0:1, which would make it easier to get rid of the old one when I don’t need it any more. So I changed it in /etc/network/interfaces and rebooted.

But then suddenly things started going pear-shaped. The domU was refusing to boot with an error about being unable to find /dev/hda1. On the dom0, “ifconfig” would just hang. And then it stopped responding at all. Now I was in full panic mode. I called Annexa, and Dave called the guys who were doing the rack move and convinced them to go back to the facility. I met them there, and found that my poor box wasn’t even responding on the KVM. We power cycled it, and found that it wasn’t starting the domUs, and also that while it started up eth0 and eth0:1, it didn’t start the virtual bridge interfaces (peth0, vif0.0, vif7.0, vif8.0, vif9.0, xenbr0). That’s not good. It appears that Xen doesn’t like the extra interface or something. So I got rid of eth0:1, changed eth0 to the new IP, and rebooted. This time, it started up and so did the domUs.

I was still having a bit of a problem with my personal domU: name resolution wasn’t working. Evidently somewhere along the way I’d decided to remove “resolvconf”, a program that is supposed to maintain your name resolution for you, and when I did, it had replaced my resolv.conf with one that looks like it was copied from my home machine. So I fixed that and things sort of worked, but even though I had the old IP on eth0:1, the domU wasn’t answering on it.

So it looks like I’m up and running, but I can’t use the old IPs. So you’re not going to see this until your DNS cache updates and you see the updates I made over at zoneedit.com.

USB speed confusion

Every time my external USB disk disconnects itself (like it did after a short power glitch on Wednesday), I have to google the kernel message to see if it remounted with USB 2.0 speed or the slower USB 1.1 speed. I just can’t seem to keep straight in my head whether “USB full speed” or “USB high speed” is the good one.

Note to self: it’s “USB high speed”.

I think.
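For the record, the note is right. The USB specs name the signalling rates low speed (1.5 Mbit/s), full speed (12 Mbit/s, the USB 1.1 rate) and high speed (480 Mbit/s, the USB 2.0 rate). Here is a minimal Python sketch so I don’t have to google it next time. It assumes the kernel logs a line containing “new … speed USB device”, with either a space (older kernels) or a hyphen (newer ones) before “speed”. The helper name `classify` and the sample messages are my own illustrations, not anything the kernel provides.

```python
import re

# USB speed names mapped to (description, signalling rate in Mbit/s).
USB_SPEEDS = {
    "low": ("USB 1.x low speed", 1.5),
    "full": ("USB 1.1 full speed", 12),
    "high": ("USB 2.0 high speed", 480),
    "super": ("USB 3.0 SuperSpeed", 5000),
}

# Assumed message shapes: "new high speed USB device ..." on older kernels,
# "new high-speed USB device ..." or "new SuperSpeed USB device ..." on newer ones.
_SPEED_RE = re.compile(r"new (low|full|high|super)[ -]?speed", re.IGNORECASE)


def classify(line):
    """Return (description, Mbit/s) for a kernel 'new ... USB device' line, or None."""
    m = _SPEED_RE.search(line)
    if m is None:
        return None
    return USB_SPEEDS[m.group(1).lower()]


if __name__ == "__main__":
    # Illustrative message only; copy the real one from dmesg.
    msg = "usb 1-1: new high speed USB device using ehci_hcd and address 3"
    print(classify(msg))  # ('USB 2.0 high speed', 480)
```

If it prints anything slower than 480, the disk came back at USB 1.1 speed.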

Dear GoDaddy

I set up “automatic domain renewal” so that I wouldn’t have to take any action, or indeed have to think about it, when one of my domains comes up for renewal. So why do you send me four identical emails within 10 minutes telling me that one of my domains is coming up for renewal? I don’t need to know; that’s why I told you to take care of it! Even one message would have been more than sufficient. But do you really need to blast one to the technical contact, one to the billing contact, one to the registrant and one to the email address on the account? What the fuck is the purpose of having a separate “billing contact” if you’re going to write to every email address possibly associated with the account about a billing issue? It’s called “billing contact” for a reason, fuckwads.