Time to worry, or just a glitch

I have mail logs going back to 24 December 2006. Recently, I noticed that every now and then one of my postfix processes will die with a “SEGV” (the dreaded Segmentation Violation). They appear at odd times in my logs, starting January 1, and continuing on the 7th, 8th, 13th, 14th, 15th, 16th, 17th (x2), 19th, 20th (x2), 23rd and 24th. It’s different processes each time, and each time it gives a warning about having some difficulty starting the replacement process (although the mail delivery continues, so I assume it starts up immediately after). I don’t see the same sorts of errors on my colo box, which uses an older version of postfix similarly configured.

I asked on the postfix-users mailing list, and got the totally unhelpful answer that it was something wrong in my config files – obviously wrong, because it doesn’t happen when I start and stop postfix. Another person said my system memory was probably going. Well, that’s possible – the system is about 6 years old. It uses PC133 Registered RAM, which is still expensive – replacing the 1GB I’ve got now would cost around $90, about the same as 1GB of the newest PC5400 RAM.

This machine is old for a server, and certainly the technology has passed this box by – it has AGP, not PCI Express; it has USB 1 (although I put in a PCI USB 2.0 card so I could use external hard drives); it has IDE instead of SATA; and it refuses to boot without a PS/2 keyboard in spite of the fact that it’s perfectly happy with a USB keyboard after it’s booted. But on the other hand, it’s perfectly fast for what it does, and I’ve got three IDE hard drives and a 16x dual layer DVD burner in there and everything is just working the way I want it to. The only complaint I have is that it’s cranky – if I add new disks, I’ll often have to reboot three or four times before the BIOS will recognize them, and most times it won’t boot from a power-on – I have to boot, wait for it to complain that there aren’t any hard drives in it, and then control-alt-delete it.

I don’t want a new server – if I were buying a new desktop now, it would be something to run a certain MMORPG faster. Maybe a Mac Pro with Boot Camp. But I want this server to continue to serve.

Let’s hope this is just a little glitch.

What is Hitachi thinking?

For years now, whenever I’ve had drive or controller problems, I’ve hauled out IBM’s DFT (Drive Fitness Test), even if the drive isn’t a Deathstar (er, Deskstar). Now IBM’s drive division belongs to Hitachi, but DFT lives on. I used it last week to make sure my new colo box could handle the sorts of loads I wanted to put on it. But now that I have my old colo box back, I want to test it to see if the problems I was having might be fixed with a new drive cable before I sell it on eBay.

But this box doesn’t have a floppy. No problem, I thought, the Hitachi site has a bootable CD version. So I downloaded it and burned it and booted with it. But the first thing it does is scan the IDE controllers, and when it’s scanning “Secondary Slave”, it suddenly starts spewing errors about being unable to read A:\COMMAND.COM. Evidently DFT needs to read its own boot disk just at the moment that the drive is disconnected for scanning. So when they made the CD ISO, they didn’t actually test it, or didn’t think about how it works: instead of using the “Linux Live CD” model, where they make a ramdisk and load themselves into it, they just made a DOS boot partition on the CD and expect it to be readable all the time.

I guess it’s off to my junk shelf to see if I have a floppy drive and cable.

That was easy

I needed to re-arrange some disk space. I explained the situation in Rants and Revelations » Why didn’t I use LVM on everything? with a table showing the current layout and everything. At the time, my plan was:

  1. Migrate the content of /dev/hdc3 off using “pvmove” and “vgreduce”.
  2. Delete all three partitions on /dev/hdc and add the whole disk back to the vg using “pvcreate /dev/hdc; vgextend xen-space /dev/hdc”.
  3. Migrate the content of /dev/hde2 off using “pvmove” and “vgreduce”.
  4. Delete the /dev/hde2 partition and increase the disk of /dev/hde1 to fill up the drive, and use resize2fs to make /dev/hde1 use the whole partition.
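The plan above can be sketched roughly as the following command sequence. This is my reconstruction, not the exact commands from the post – the vg name “xen-space” comes from step 2, but the partition-deletion step and exact invocations are assumptions. These commands are destructive; they’re shown for illustration only.

```shell
# 1. Move all extents off /dev/hdc3, then drop it from the vg.
pvmove /dev/hdc3
vgreduce xen-space /dev/hdc3

# 2. Delete the partitions on /dev/hdc (fdisk step omitted here),
#    then hand the whole disk to LVM and add it to the vg.
pvcreate /dev/hdc
vgextend xen-space /dev/hdc

# 3. Same dance for /dev/hde2.
pvmove /dev/hde2
vgreduce xen-space /dev/hde2

# 4. After growing the hde1 partition to fill the drive (fdisk step
#    omitted), resize2fs with no size argument fills the partition.
e2fsck -f /dev/hde1
resize2fs /dev/hde1
```

One nice property of pvmove is that it works on a live system, which is why steps 1–3 didn’t require shutting anything down.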

I did steps 1-3, and it all worked perfectly. I didn’t have to shut down anything, and it didn’t interrupt the normal operation of either the dom0 or the domUs. But when I’d done that, I realized I actually had enough free space in the vg that I could do an even better plan:

  1. Set up a 250Gb lv.
  2. Use rsync to copy everything from /dev/hde1 to the lv.
  3. Once that was done, shut down domU 1.
  4. Make /dev/hde1 part of the vg.
  5. Make the 250Gb lv bigger using “lvextend” – I chose to add 100Gb to it, and I have space to add more if I need it.
  6. Run “e2fsck -f” and “resize2fs” on the lv.
  7. Restart domU 1, using the lv instead of /dev/hde1.
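A rough sketch of what that plan looks like as commands. The lv name “domu1-disk” and the mount points are my invention; the vg name “xen-space” is from the earlier entry. Again, destructive and illustrative only.

```shell
# 1. Carve out a 250Gb lv and put a filesystem on it.
lvcreate -L 250G -n domu1-disk xen-space
mkfs.ext3 /dev/xen-space/domu1-disk

# 2. Copy everything across while the domU is still up.
mount /dev/xen-space/domu1-disk /mnt/new
rsync -aH /mnt/hde1/ /mnt/new/

# 3-4. Shut down domU 1, do a final rsync to catch any changes,
#      then hand /dev/hde1 over to the vg.
rsync -aH --delete /mnt/hde1/ /mnt/new/
umount /mnt/new
pvcreate /dev/hde1
vgextend xen-space /dev/hde1

# 5-6. Grow the lv by 100Gb and resize the filesystem into it.
lvextend -L +100G /dev/xen-space/domu1-disk
e2fsck -f /dev/xen-space/domu1-disk
resize2fs /dev/xen-space/domu1-disk

# 7. Point the domU's config at the lv and restart it.
```

Doing the bulk rsync while the domU was still running is what kept the downtime to the final rsync, fsck, and resize – a few minutes rather than hours of copying.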

This worked perfectly. The domU was down about 10-15 minutes tops. /dev/hde is still split into two partitions, even though both partitions are part of the same vg. But other than that, it’s exactly what I’d have done if I were setting it up from scratch now.