One of the worst tasks I’ve had at this job is working on the automatic upgrader. I hate doing it, because it’s not so much “programming” as “cobbling together a bunch of system administration stuff”. I’ve got it working as well as I can, but there are various flaky problems in the way Red Hat/CentOS works, as well as some dodgy Dell hardware, so I can’t make it work 100% of the time. I’ve written about it before. Whenever something fails, I get called in to forensically reconstruct what went wrong. Today’s fuckup was very similar to the one in that linked article: somebody started the upgrade before going home at night, and somebody else came in the next morning and started it again. That left some things half installed and half upgraded, and some of the “cp” machines decided that they were being “plex built” (built from scratch in the manufacturing area) rather than upgraded, so they all turned themselves into FRUs (field-replaceable units) and shut down. Of course it took me nearly an hour to figure out what the idiots had done and how to fix it. And the upshot is that because these machines are now “bare” and physically powered down, somebody has to go out to the site and set them up. Oh, did I mention that the fuckup also caused all copies of the saved configuration for the entire site to be lost?