I woke up this morning to find every session logged into the server showing “unable to contact the UPS” errors. One window was still barely responding, and running “uptime” there showed a load average just a hair over 230, and rising.
After power cycling, I found in the logs that sometime around 2am, when the nightly cron jobs kick off, the second IDE controller had started throwing errors again.
I’ve got to consider the following possibilities:
- The hardware just miraculously decided to fail when I upgraded.
- The hardware was always a little bit bad, but the 2.6 kernel notices the problem where the 2.4 kernel didn’t.
- There is nothing wrong with the hardware, and it’s a fault in the kernel.
Tonight I’ll have to take the server offline again while I boot a Knoppix CD with a 2.4 kernel and retest the hardware. If that works, I’m going to try 2.6 without SMP, and with the infamous “noapic” flag (whatever the hell that means).