- Discovered that one of my hard disks was flakey and returned it. That’s probably why all my previous attempts to set this up failed.
- Removed the daughter card RAID controller. The built-in RAID controller still sees the disks, but reports them as a JBOD (Just a Bunch Of Disks).
- Started a new Debian installation.
- Set up both whole disks as a software RAID1 (instead of just a partition on each disk like I did last time).
- Made the whole RAID (md0) into a physical volume (xen-space) for the LVM.
- Created a 4GB root partition and a 1GB swap partition as logical volumes on the physical volume.
- Did a base install. Noted that because I used software RAID on the whole thing, it uses LILO instead of Grub. Oh well, you can’t have everything.
- Rebooted and the BIOS only saw one of the two disks.
- Fiddled with the disk sled, rebooted, and this time it saw both.
- Evidently the first boot without the second disk caused the RAID to degrade, so I re-added the disk with
mdadm /dev/md0 --add /dev/sdb1
and now it appears to be rebuilding.
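To keep an eye on a rebuild like this one, the kernel's own report in /proc/mdstat can be summarised. This is only a rough sketch: the function name md_status is made up for illustration, and the parsing assumes the usual mdstat layout (a "blocks ... [U_]" line per array, and a "recovery = N%" line while it rebuilds).

```shell
# Summarise software RAID state from an mdstat-format file.
# md_status is a hypothetical helper name, not part of mdadm.
md_status() {
    # $1: file to read; defaults to the live /proc/mdstat
    awk '
        # Remember which array the following lines belong to.
        /^md[0-9]+ :/ { cur = $1 }
        # An underscore in the [UU] member map means a missing member.
        cur != "" && / blocks / {
            print cur, (($0 ~ /\[[U_]*_[U_]*\]/) ? "degraded" : "ok")
        }
        # While rebuilding, the kernel reports a percentage.
        cur != "" && /(recovery|resync) *=/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /%$/) print cur, "rebuilding", $i
        }
    ' "${1:-/proc/mdstat}"
}
```

Running md_status with no argument reads the live file; mdadm --detail /dev/md0 gives a more complete picture of the same thing.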
Day 2:
- Installed smartmontools, and enabled it in /etc/default/smartmontools. Slightly concerned that /dev/sda has an exit status of 64 because of some error in the log, probably due to the late unpleasantness. Will have to figure out how to clear that.
- Installed munin-node and munin-plugins-extra, and copied the configuration from my backup from the last time.
- Installed openssh-server (unselect xauth which gets added automatically because it drags in a ton of X11 libraries). Copied /etc/ssh/sshd_config and /root/.ssh directories from backup.
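About that exit status of 64 from the Day 2 smartmontools entry: smartctl's exit status is a bitmask, so it helps to decode it rather than treat it as a single error number. The sketch below is an illustration only; decode_smartctl_status is a made-up name, and the meanings are paraphrased from the smartctl man page as I understand it, so check them against your installed version.

```shell
# Decode a smartctl exit status into its individual bits.
# decode_smartctl_status is a hypothetical helper; the bit meanings
# are paraphrased from the smartctl man page.
decode_smartctl_status() {
    status=$1
    bit=0
    for meaning in \
        "command line did not parse" \
        "device open failed" \
        "a SMART or ATA command failed" \
        "SMART status: DISK FAILING" \
        "prefail attribute at or below threshold" \
        "attribute was at or below threshold in the past" \
        "device error log contains errors" \
        "self-test log contains errors"
    do
        # Test one bit at a time, lowest bit first.
        if [ $(( (status >> bit) & 1 )) -eq 1 ]; then
            echo "bit $bit: $meaning"
        fi
        bit=$((bit + 1))
    done
}
```

For example, decode_smartctl_status 64 reports only bit 6, which matches the "some error in the log" reading: old entries in the drive's error log, not a currently failing disk.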
Day 3:
- Installed xen-utils. Holy shit that dragged in a lot of dependencies, and it said it had to “reinstall” 200+ packages for some damn reason. But then it gave an error, and when it came back it didn’t have to reinstall them after all. Very odd.
- Didn’t see any xen in /etc/lilo.conf, so installed linux-image-2.6-xen-amd64. (Had originally thought that installing xen-utils would do that; I thought it did last time.)
- Lilo complains that /vmlinuz is too big. According to the docs, lilo and xen don’t play together well, and grub has trouble with /dev/md0 software raid. I think I may have to go back to the drawing board, either re-installing the raid card, or going back to the primary boot partition and putting the software raid on the rest of the disk. Or maybe I can figure out how to get grub working. Once again I’m reminded of “Three Dead Trolls In a Baggie” singing “yeah, but I’ve got a girlfriend and things to get done”.
Day 4:
- Reinstalled the Adaptec RAID card, and set up a hardware RAID-1.
- Partitioned the “drive” with three partitions: one 4G ext3 for /, one 1G swap, and the rest as a physical volume for LVM.
- Installed on /, and when it went to reboot it got to “shutting down md0” and then hung. Will have to check that again. But at least it installed Grub instead of LILO.
- After it booted, tried the “reboot” command and it worked! Yay!
- Installed smartmontools, but discovered (once again) that it doesn’t work with the raid controller, so uninstalled it. I need to find out if there is some other way to monitor the raid controller. I think I tried the dpt_i2o thing before and it didn’t work.
Day 5:
- Installed sshd, copied the configuration from the backup to only allow public key logins. (Bite it, password guessers)
- Installed munin-node
- Installed linux-image-2.6-xen-amd64 and xen-hypervisor-3.2-1-amd64
- Rebooted and the damn thing spewed tons of errors and hung. Tried to reboot with the old kernel (that worked before) and I got the same errors. I guess it’s time to give up on that hardware RAID again.
Day 6:
- Ran the disk “verify” tool in the raid card, and it didn’t find any errors.
- Anything I tried to boot the system (the original kernel that worked before, single user mode) still failed in aacraid.
- Ripped out the raid card again, and installed with /, /boot, /var and swap as primary partitions, and the rest of the space on both drives as a software RAID-1 used as a physical volume for LVM.
- Install openssh-server (and unselect xauth). Copy /etc/ssh/sshd_config and /root/.ssh from backup.
- Install smartmontools and enable it in /etc/default/smartmontools.
- Install munin-node.
- Rebooted to make sure everything starts correctly.
- Installed linux-image-2.6-xen-amd64 and xen-hypervisor-3.2-1-amd64
- Reboot again.
- Ok, it booted, but “xm list” isn’t up.
- Manually start xend and “xm list” is working.
- Rebooted, and this time “xm list” is working.
- Started to create the lvm logical volumes for the domUs
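The logical volumes for a domU follow the same pattern each time, so the commands can be generated rather than typed. This is a sketch under assumptions: plan_domu_volumes is a made-up helper, the "-disk"/"-swap" naming and the sizes are placeholders, and the volume group name has to be whatever was actually created on the software RAID. It only prints the commands so they can be read before anything touches the disks.

```shell
# Print (not run) the commands to create one domU's volumes.
# plan_domu_volumes, the LV naming scheme and the sizes passed in
# are illustrative assumptions.
plan_domu_volumes() {
    vg=$1          # volume group on the software RAID
    name=$2        # domU name, e.g. xen1
    disk_size=$3   # root volume size, e.g. 10G
    swap_size=$4   # swap volume size, e.g. 1G
    echo "lvcreate -L $disk_size -n ${name}-disk $vg"
    echo "lvcreate -L $swap_size -n ${name}-swap $vg"
    echo "mkfs.ext3 /dev/$vg/${name}-disk"
    echo "mkswap /dev/$vg/${name}-swap"
}
```

Something like plan_domu_volumes xen-space xen1 10G 1G shows the four commands; once they look right, the output can be piped to sh as root.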
Day 7:
- Discovered that when I backed up the last nearly successful domU, I forgot to back up the boot partition, so I’m on my own for the grub configuration.
- Untarred my backups of the “xen2” and “xen3” domUs. Got a bunch of kernel messages about kjournald being blocked for more than X number of seconds while that was going on – I assume that’s because I was running up load averages in 7 and 8 range in the dom0, which is probably not a normal thing. I hope that just because things weren’t written to the journal immediately that doesn’t mean they were written wrong, only that I might have been in danger if things had died in the middle.
- Installed rsync so I can restore my backup of the “xen1” domU.
- Installed vim and removed vim-tiny
- Restored backup with
rsync --delete -aSurvx --numeric-ids /mnt/usb0/xen1/Sun/ /mnt/xen1/
- Copy the amd64 kernel modules to the domU’s /lib/modules.
cp -rp /lib/modules/2.6.26-2-xen-amd64 /mnt/xen1/lib/modules
Must remember to exclude /lib/modules when I do any final rsyncing from the live domUs.
- DAMMIT! It appears that I made /var too small again. Once it saves /var/lib/xen/save in it, the file system is full. Need to move things around again.
- Booted into rescue mode, and moved things around. Everything seems to work now.
- Try to rsync some newer backups.
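For the final syncs from the live domUs, the note about /lib/modules can be built into the command. A minimal sketch, assuming the same flags as the restore above; rsync_domu_cmd is a made-up helper and the source and destination are whatever you pass in. It prints the command instead of running it.

```shell
# Print an rsync command that leaves the copied kernel modules alone.
# rsync_domu_cmd is a hypothetical helper; the flags are the ones used
# for the restore, plus an exclude for /lib/modules.
rsync_domu_cmd() {
    src=$1    # source tree, with trailing slash
    dest=$2   # destination tree, with trailing slash
    # The leading slash anchors the pattern at the top of the transfer,
    # so only the top-level lib/modules directory is skipped.
    echo "rsync --delete -aSurvx --numeric-ids --exclude=/lib/modules/ $src $dest"
}
```

Because --delete is used without --delete-excluded, the exclude also protects the modules already copied into the destination from being deleted. Adding -n to the printed command gives a dry run first.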
Further updates as things progress.
If you can afford to reinstall your guests, KVM is much easier to manage.
But I’m probably far too used to the Red Hat way, and I have trouble finding things in Novell’s SLES and have never really tried Debian.
How goes it?