IT’S ALIVE!

The GoogleBox lives!


Yes, after 4 days of downtime, my illustrious yellow 1U server has been revived from the dead. After it died, I asked the colo provider to power cycle it, and they said it didn’t come back up. I asked them to yank the box, and I picked it up and brought it home. Expecting a power problem, I first tried yanking all the hard drives and memory, but even then didn’t get any beeps or other activity. So then, I tried yanking one of the CPUs. I must have gotten lucky, because removing the #2 CPU got me a couple of POST beeps, and when I put back the memory and the hard drives, it booted just fine.

I’ve had this box since January 2007, and the CPUs are dated from 2001, so I guess it’s time to replace it. I ordered a slightly newer box off eBay that has twice as much memory and 4 SATA drive bays. With two 1Tb drives, it will have much more disk space, but more importantly the empty drive bays mean that if I need to expand, I can add newer bigger drives when they become available. I’m considering using software RAID to mirror the two drives, because even 1Tb is bigger than what I have now and I’m not hurting for space. And with lvm, I can plonk in two new 2Tb drives when the time comes, migrate the volume groups to the new drives, and yank the old ones. All the remains now is to decide whether to build a new OS and get everything working on it, or just restore from a backup and continue the upgrade path.

While I had the box home for a few days, I took the opportunity to do a long-delayed upgrade from Debian “etch” to Debian “lenny”. I didn’t want to tackle that remotely because there was a significant chance (and it happened several times) that I was going to get it into the situation where I needed to intervene at the Grub stage, and I couldn’t do that remotely (because the cheap colo facilities don’t give you remote boot consoles like the expensive ones do.) The biggest hassle of this upgrade was that I had to do some messing around to get a console to appear, changing the boot options on the box itself, and also the getty lines in inittab of each of the guest “domU”s. The second biggest hassle was that I had to install “udev” on all the guests so that ssh could work. Also while they were home I took the opportunity to make a backup of the whole thing, including the guests that don’t belong to me. Normally I just back up my own. That should make setting up a new box a lot easier.

I got all the fixing and upgrading and backing up done early this morning. I brought my box over to the colo company office at 10:15. And waited. And waited. And waited. I had a “ping -a” running so I’d know as soon as it came back. And I waited some more. The business office is the other side of town from the colo facility, so I figured there would be some delay. The company that used to own those racks would let me go out to the facility with them, but these guys bought the business from that company and they’re anal about security and won’t let me go. Well, it turns out that their scheduled visit to the colo facility was at 10pm – nobody told me that, of course, until after I’d started panicing that they’d all gone home for the weekend without racking my box. But here it is – they racked the box, phoned me to say it was powering up, and now I’m connected again. Hallelujah!

One thought on “IT’S ALIVE!”

  1. Congrats, Paul, and as always many thanks for keeping this system running.

    “Cheap colo facilities don’t give you remote boot consoles like the expensive ones do” — at my job, we have a lot of machines in off-campus labs, and we can do almost all the management stuff (power cycle, install new operating system from local CD, adjust BIOS settings, etc) remotely. The only support we need from the lab facility is a second ethernet cable to the machine and second IP address.

    Looking at (for instance) Dell’s current offerings, we use the equivalent of “iDRAC 6 Enterprise” (not Express) to get a complete remote console. $350 is a lot of money, though, plus whatever colo fees you’d have to doubling the cabling and IP addresses.

Comments are closed.