For reasons I don’t remember, I ran “mdadm --detail /dev/md0” on my home Linux server and noticed that the RAID was quietly rebuilding itself. That prompted me to try the same command on the dom0 of my colo box, and what I found there was even worse: the second disk of my RAID-1 (mirror) was marked as a “spare”, with some other status indicating that it wasn’t rebuilding, and the mirror disk was marked as missing.
I removed the second disk from the RAID and re-added it. Its status went to “spare, rebuilding” and the RAID status to “active, degraded, rebuilding”, and some hours later the array was back up and happy.
During that time, I discovered that there had been a few emails about SMARTD problems and RAID problems, but because I had set up exim wrong, they weren’t getting delivered. I tried a few things to get exim working, and when they didn’t help I decided that, since I know how to set up postfix just fine, I would switch. I uninstalled exim, installed postfix, and got it configured in less time than it took for the RAID to rebuild.
The fact that the RAID degraded in the first place gives me pause, but the fact that I was able to recover it without any downtime makes me happy that I chose to do a RAID in the first place. I’ll keep an eye on it and maybe order a replacement disk or two so I’m ready if something fails again.
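
Since the warning emails never reached me, keeping an eye on it could be backed up with a small check that doesn’t depend on mail at all. Here is a minimal Python sketch, not something I actually ran during this episode. It assumes the usual /proc/mdstat layout, where each array’s status line ends in something like “[2/1] [U_]” and an underscore marks a missing member. The function name degraded_arrays and the SAMPLE text are made up for illustration, and the regex only covers arrays named md followed by a number.

```python
import re


def degraded_arrays(mdstat_text):
    """Return the names of md arrays whose member map shows a missing disk."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        # An array section starts with a line like "md0 : active raid1 ..."
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # The status line ends like "[2/1] [U_]"; "_" marks a missing member.
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if current and status and "_" in status.group(3):
            degraded.append(current)
    return degraded


# Hypothetical sample in the assumed /proc/mdstat format (not real output
# from either of my machines).
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sda1[0]
      976630336 blocks super 1.2 [2/1] [U_]

md1 : active raid1 sdb2[1] sda2[0]
      104792064 blocks super 1.2 [2/2] [UU]

unused devices: <none>
"""

if __name__ == "__main__":
    print(degraded_arrays(SAMPLE))
```

On a real box the text would come from reading /proc/mdstat instead of SAMPLE, and the script could run from cron and complain however you like when the list isn’t empty.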