Long distance debugging

Reconstructing Spirit’s hopeful road to recovery

Interesting article about trying to fix Spirit.

What stands out to me is that they think the problem stems from having too many files in flash memory. It seems that even they have the same problem as every other software project I’ve been on: not doing a full end-to-end QA cycle. I suspect they just run a few tests, reboot with a new setup, run a few more tests, and so on.

One of the things I like about the project I’m on now is that we have enough QA machinery to run full tests on all sorts of supported hardware. It’s amazing what the QA people find sometimes, including problems that only show up after a test has been running for a couple of days. One of the bugs they found was caused by the changeover from EDT to EST. Fortunately, no customers were affected, and I got it fixed in time for the change from EST back to EDT.
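For anyone who hasn’t been bitten by a time change bug, here is a rough sketch of the general kind of thing that can go wrong. This is a made-up illustration, not the actual bug described above: the date, the fixed-offset `EDT` and `EST` objects, and the function names `naive_elapsed` and `utc_elapsed` are all assumptions picked for the example. It measures the time between two events that straddle the fall changeover, once by subtracting wall-clock readings and once by subtracting offset-aware timestamps.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets used only for this illustration (hypothetical setup).
EDT = timezone(timedelta(hours=-4), "EDT")
EST = timezone(timedelta(hours=-5), "EST")


def naive_elapsed(start, end):
    """Subtract wall-clock readings, throwing away the UTC offsets."""
    return end.replace(tzinfo=None) - start.replace(tzinfo=None)


def utc_elapsed(start, end):
    """Subtract offset-aware datetimes; Python compares them in UTC."""
    return end - start


# Two events on either side of a fall-back changeover.
start = datetime(2003, 10, 26, 1, 30, tzinfo=EDT)  # 05:30 UTC
end = datetime(2003, 10, 26, 1, 15, tzinfo=EST)    # 06:15 UTC

naive = naive_elapsed(start, end)  # wall clock appears to run backwards
real = utc_elapsed(start, end)     # actual time that passed

print("wall-clock difference:", naive)
print("real elapsed time:    ", real)
```

The wall-clock version comes out negative even though 45 minutes really passed, which is exactly the sort of thing a quick test never hits unless it happens to be running across the changeover.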

One thought on “Long distance debugging”

  1. I am surprised that the loss of one filesystem could cause the spacecraft to fall over completely. What piss-poor fault-intolerant design this is!
