Another fun day

My colo facility contacted me on Wednesday to say that this weekend they’d be moving my machine to a new rack. They’d also gotten a new IP range, and I had to switch over to it soon, but they’d let me keep both IPs during the switchover.

So today my system suddenly went off the air. I was sort of expecting it, but I didn’t see any shutdown messages because they just three-finger-saluted it. After a couple of hours, I phoned for an update and was told that they’d just powered it up. But it still wasn’t responding to pings until I mentioned to them that eth0 and eth1 are in the opposite order from what you’d expect.

Once it came up, I tried to configure an eth0:1 alias using the new IP. That actually seemed to work on the dom0, so then I tried to do it on my domU. That seemed to work too. I was able to ssh into both IPs on both the dom0 and the domU. So I thought I’d swap the domU IPs around, so the new one was on eth0 and the old one was on eth0:1, which would make it easier to get rid of the old one when I don’t need it any more. So I changed it in /etc/network/interfaces and rebooted.
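
For illustration, the swapped layout in /etc/network/interfaces would look roughly like the sketch below. The addresses are placeholders from the documentation ranges, not the real ones, and the netmask and gateway values are assumptions.

```
# /etc/network/interfaces on the domU (sketch only; all addresses are placeholders)

# New IP as the primary address (netmask and gateway are assumed values)
auto eth0
iface eth0 inet static
    address 192.0.2.10
    netmask 255.255.255.0
    gateway 192.0.2.1

# Old IP kept as an alias until it is no longer needed
auto eth0:1
iface eth0:1 inet static
    address 198.51.100.10
    netmask 255.255.255.0
```

With this arrangement, retiring the old address later should only mean deleting the eth0:1 stanza.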

But then suddenly things started going pear-shaped. The domU was refusing to boot, with an error about being unable to find /dev/hda1. On the dom0, “ifconfig” would just hang. And then it stopped responding at all. Now I was in full panic mode. I called Annexa, and Dave called the guys who were doing the rack move and convinced them to go back to the facility. I met them there, and found that my poor box wasn’t even responding on the KVM. We power cycled it, and found that it wasn’t starting the domUs, and also that while it started up eth0 and eth0:1, it didn’t start the virtual bridge interfaces (peth0, vif0.0, vif7.0, vif8.0, vif9.0, xenbr0). That’s not good. It appears that Xen doesn’t like the extra interface or something. So I got rid of eth0:1, changed eth0 to the new IP, and rebooted. This time, it started up and so did the domUs.

I was still having a bit of a problem with my personal domU: it didn’t want to resolve names. Evidently somewhere along the way I’d decided to remove this program “resolvconf” that is supposed to maintain your name resolution for you, and when I did, it had replaced my resolv.conf with one that looks like it was copied from my home machine. So I fixed that and things sort of worked, but in spite of the fact that I had the old IP on eth0:1, it wasn’t answering on it.

So it looks like I’m up and running, but I can’t use the old IPs. So you’re not going to see this until your DNS cache updates and you see the updates I made over at