Oops

I’ve mentioned before that, to help defray the costs of putting my stuff on a colo box, I partitioned the box into three Xen virtual machines and rent two of them out. Well, yesterday one of the renters, Terry, asked for a bit of help with his Apache setup. Not knowing his root password, I mounted his hard drive in the “dom0” Xen controller while his virtual machine was still running, using “mount /dev/xen-space/xen2-disk /mnt”, and started poking around. Evidently that managed to confuse ext2, because a few hours later he emailed me to say that his disk had gone “read-only”, and when he tried to reboot, it didn’t come up.

Looking at my munin graphs, it appears that his reboot took down the whole box. I had to email the owner of the rack to power cycle my box, which he can do remotely. When it came back, 2 of the 3 virtual machines came up fine, but Terry’s was asking for a root password to run fsck. I shut down his virtual machine and ran fsck from within the dom0, and it found several things out of whack. Once those were fixed, I was able to restart Terry’s virtual machine.

So, lesson learned. I’m not sure if things would have been happier if I’d mounted it read-only, but in the future, if I need to mount one of the partitions in /dev/xen-space, I’ll shut down the Xen virtual machine instance first.
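For what it’s worth, here is a rough sketch of how that rule could be enforced before mounting. It is not something I actually ran at the time. It assumes the Xen toolstack’s “list” subcommand (“xm list” on older installs, “xl list” on newer ones) prints one header line followed by one line per domain with the name in the first column, and that Terry’s domain is called “xen2”, which is only a guess based on the device name. The function names are made up for illustration.

```python
import subprocess


def listed_domains(listing):
    """Return the domain names found in `xm list` / `xl list` style output.

    Assumes the first line is a header and that each following line
    starts with the domain name (an assumption about the output format).
    """
    names = set()
    for line in listing.splitlines()[1:]:
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def safe_to_mount(domain, listing):
    """True only if the domain does not appear in the listing at all.

    Any listed domain is treated as live, which errs on the side of
    refusing to mount.
    """
    return domain not in listed_domains(listing)


def current_listing(tool="xm"):
    """Ask the toolstack for its domain list (needs root in dom0)."""
    result = subprocess.run(
        [tool, "list"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

In dom0 this would be used as `safe_to_mount("xen2", current_listing())`, going ahead with the mount only if it returns True.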

3 thoughts on “Oops”

  1. If you had the partition mounted read/write and he had the partition mounted read/write, then there would be conflicting block updates (if nothing else, last access times), and the two kernels would have different information in their buffer caches. Not a good state of affairs 🙂

    If you mounted it read-only then your reads wouldn’t break the file system, but if his instance was modifying data at the time then you might have had a corrupted view when blocks have changed without your kernel’s knowledge.

    Basically, don’t do that 🙂

  2. Stephen, I guess I expected that the dom0 kernel would have to mediate all the access to the disk, and so it wouldn’t get confused about the state of the disk even if the domU was writing to it. But evidently that was a bad assumption.

  3. Sure, the basic disk block operations will all go through the dom0, but various forms of filesystem metadata — which files are stored where, which blocks are available for allocation, and so on — are separate, and will tend to be cached in memory because, well, nothing else is supposed to be changing them on disk.

    I’m surprised it even let you mount the filesystem, though; IIRC, on BSD the virtual disk backend would mark the backing device as busy. (Which, of course, serves only to give a false sense of security which is later cruelly shattered when the disk in question is on a multi-initiator SCSI bus, but.)
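The caching point made in the comments above can be illustrated with a toy model. This is only a sketch under heavy simplification, not how ext2 or Xen actually work: the “disk” is a plain dictionary, the free-block bitmap is a set of block numbers, and each “kernel” caches whatever it has read and never notices changes made by the other. All names are hypothetical.

```python
class Kernel:
    """A kernel that caches metadata and assumes nobody else changes the disk."""

    def __init__(self, disk):
        self.disk = disk
        self.cache = {}

    def read(self, key):
        # Only the first read goes to disk; later reads come from the cache.
        if key not in self.cache:
            self.cache[key] = self.disk[key]
        return self.cache[key]

    def write(self, key, value):
        # Write-through: update our own cache and the shared disk.
        self.cache[key] = value
        self.disk[key] = value

    def allocate(self):
        # Take the lowest free block according to *our* view of the bitmap.
        free = self.read("bitmap")
        block = min(free)
        self.write("bitmap", free - {block})
        return block


def simulate():
    disk = {"bitmap": frozenset({5, 6, 7})}  # blocks 5, 6, 7 are free
    domu = Kernel(disk)
    dom0 = Kernel(disk)
    # Both kernels mount the filesystem and cache the bitmap.
    domu.read("bitmap")
    dom0.read("bitmap")
    first = domu.allocate()   # domU takes a block and updates the disk
    second = dom0.allocate()  # dom0 still trusts its stale cached bitmap
    return first, second, disk["bitmap"]
```

Calling `simulate()` hands the same block to both kernels, even though every individual write went to the same underlying disk. That is roughly the kind of disagreement fsck then has to clean up.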
