Upgrade still not going well…

I woke up this morning to find every screen logged into the server was showing “unable to contact the UPS” errors. One window was still responding a bit, and an “uptime” command showed the load average just a hair over 230, and rising.

After power cycling, I found that the log shows that sometime around 2am, when the nightly cron jobs kick off, the second IDE controller started throwing errors again.

I’ve got to consider the following possibilities:

  • The hardware just miraculously decided to fail when I upgraded.
  • The hardware was always a little bit bad, but the 2.6 kernel notices the problem and the 2.4 kernel didn’t.
  • There is nothing wrong with the hardware and it’s a fault in the kernel.

Tonight I’m going to have to go offline again, while I try booting with a Knoppix CD with a 2.4 kernel to test the hardware again. If that works, then I’m going to try 2.6 with no smp, and with the infamous “noapic” flag (whatever the hell that means).

My car, again.

As I wrote about in My car, the mother, my car recently required a huge amount of service because it was leaking oil. Well, it’s been about 1500 miles since then, and last night after 5 hours on the highway, I got an “oil pressure” warning light when turning at a traffic light. Oh oh. This morning it was making that clattering noise again. Oh oh. I bought some oil, and threw in two quarts. Then I checked, and it was still showing a quart down, so I threw in another. Un-fucking-believable – they supposedly *fixed* the oil leak, and it went through 3 quarts in 500 miles?

Even my VW Beetle wasn’t that bad when it had over 150,000 miles on it.

What I’m wasting my day on today…

This happens on RedHat 7.3 – haven’t tried on something more recent.

Assume you have a machine where root can rsh to localhost (yeah, I know, but the machine isn’t on a network where there are any users, so it’s not as bad as it could be.)


rsh localhost "/etc/init.d/snmpd restart; echo 'DONE'"

will echo the “DONE” but never return unless you hit ^C twice.


rsh localhost "/etc/init.d/snmpd restart"

works as expected.

Now take a script that does rsh'es to a bunch of machines and runs apt-get on them (as well as on the local machine) and does various configuration tasks on both the local and the other systems, including restarting services. See script run. Now, take the entire script, and put a


{
} 2>&1 | tee -a /var/log/upgrade.log

around the whole thing, and suddenly it never finishes.
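A guess at what might be going on, and it is only an assumption that I have not confirmed against snmpd itself: a pipeline's reader (tee here, or the rsh connection in the earlier case) only sees end-of-file once every process holding the write end has closed it. If a restarted daemon inherits stdout or stderr, it keeps the pipe open long after the script has finished. The sketch below reproduces that with no rsh and no snmpd at all. `leaky_restart`, `detached_restart` and `time_pipeline` are made-up names, and `sleep 3` is a stand-in for the daemon.

```shell
#!/bin/sh
# Hypothetical stand-in for an init script: starts a long-lived child
# that inherits whatever stdout/stderr the caller had.
leaky_restart() {
    sleep 3 &
}

# Same child, but detached from the caller's descriptors.
detached_restart() {
    sleep 3 </dev/null >/dev/null 2>&1 &
}

# Run the named restart function inside a "{ ...; } 2>&1 | reader"
# pipeline and print how many whole seconds the pipeline took.
# cat plays the role of tee: it only exits when the pipe is fully closed.
time_pipeline() {
    start=$(date +%s)
    { "$1"; echo 'DONE'; } 2>&1 | cat >/dev/null
    end=$(date +%s)
    echo $((end - start))
}

leaky=$(time_pipeline leaky_restart)
detached=$(time_pipeline detached_restart)
echo "leaky: ${leaky}s, detached: ${detached}s"
```

In the leaky case the pipeline should only finish once the background sleep exits; a real daemon never exits, so the reader would wait forever. If this really is the cause, the equivalent of `detached_restart` in the real script would be something like `/etc/init.d/snmpd restart </dev/null >/dev/null 2>&1`.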

See Paul waste his whole day trying different variations, each time requiring 45 minutes to put all the machines back to the version 3.3 configuration, and at least 20 minutes for the script to run. Can you say “bored and frustrated”, ladies and gentlemen?

And to top it all off, there’s an AIRMET for icing all along the route to Ottawa, so I won’t be flying after all.