Compare and contrast backup strategies

On the one hand, you have Jeff Atwood’s Coding Horror, a blog about programming read by thousands if not hundreds of thousands of people, and, by the same guy, blog.stackoverflow.com. His backup strategy was to make copies of both blogs but leave them on his hosting site, and to trust that when the ISP said they had it backed up, they really had it backed up. Of course, the ISP had some sort of hardware failure, and when they went to restore their backups, they found that they didn’t work. He’s now trying to reconstruct his articles (but of course not the comments, and only a very few of the images that went along with them) from Google’s cache, the Wayback Machine, and the web caches of his readers.

On the other hand, you have this blog, which is about nothing in particular and read by probably 15 people tops. My backup strategy is this:

  1. Daily database dumps, copied to another file system on a different physical volume on the same box. That’s there mostly so I can respond quickly if I accidentally delete the database or an upgrade goes bad or something. If my blog got more traffic and more comments, I’d do those dumps more frequently. (A rough sketch of the nightly jobs follows this list.)
  2. Another backup and a tar file just before I do an upgrade.
  3. Daily rsyncs back to my Linux server at home. I keep a week’s worth of those.
  4. Daily copies of that local copy to removable hard drives. I keep a month’s worth of those.
  5. Every week or so, I move one of those removable hard drives to a physically remote location.
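
Roughly, the nightly jobs boil down to something like this (not my literal crontab; the host names, paths, and database name are made up, and mysqldump stands in for whatever dump tool your database uses):

    # Step 1, on the blog box: nightly dump to a different physical volume.
    mysqldump --single-transaction blogdb | gzip \
        > /backup-vol/db/blogdb-$(date +%F).sql.gz

    # Step 3, on the home server: nightly pull into a day-of-week directory,
    # which leaves a rolling week of copies.
    rsync -a --delete blogbox:/var/www/blog/  /srv/backup/blog/$(date +%a)/
    rsync -a --delete blogbox:/backup-vol/db/ /srv/backup/blog-db/$(date +%a)/

    # Step 4: nightly copy of the local copy onto whichever removable drive
    # is mounted, under a dated directory so a month's worth fits on it.
    rsync -a /srv/backup/ /mnt/removable/$(date +%F)/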

And I did this when my blog was hosted on a VPS that the ISP claimed had some sort of backups, and I do it now that my blog is hosted on a 1U box that I bought on eBay and stuck in a local colo facility. As far as I’m concerned, you’re not backed up until the backup is in your pocket.

Oh yeah, did I mention that some of those Coding Horror blog entries that went missing were about backups and how important they are?

I’m sorry, but the idiocy of this just leaves me shaking my head and wondering why anybody ever believed anything he said about computers. On the other hand, it also makes me glad that I don’t have a huge audience hanging on my every word, because someday I might get something wrong (hey, I know, not likely, right?), and schadenfreude’s a bitch.

10 thoughts on “Compare and contrast backup strategies”

  1. My VPS backup strategy matches my home backup strategy: a level 0 on Sunday, daily incrementals to a /BACKUP partition, and that partition rsync’d to my home machine daily. That, in turn, is also rsync’d to a removable disk.
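
    In cron terms it’s roughly this (illustrative only; dump(8) here stands in for whatever level-0/incremental tool you prefer, and the paths are made up):

        # Sunday: full (level 0) dump of the filesystem into /BACKUP.
        dump -0u -f /BACKUP/root.0.dump /
        # Monday to Saturday: incrementals relative to that level 0.
        dump -1u -f /BACKUP/root.$(date +%a).dump /
        # From home, nightly: pull /BACKUP down, then mirror it to a removable disk.
        rsync -a --delete vps:/BACKUP/ /srv/vps-backup/
        rsync -a --delete /srv/vps-backup/ /media/removable/vps-backup/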

    So if the VPS explodes, I’ve lost at most 24 hours of data, typically less.

    This process proved itself last week when my Linux desktop died (while I was travelling); I was able to use the backups to create a temporary virtual machine and regain access to my mail archives, .newsrc, and so on.

    (Of course, since I don’t run a blog, the contents of my VPS are mostly static or recoverable from another source, and the web pages are mastered on my home machine and rsync’d to the webserver as I make changes. I also have a second VPS at a different ISP with essentially mirrored content/DNS/SMTP, etc., so I have a warm standby.)

    I do appreciate the irony of the Coding Horror site publishing “you must do good backups!” entries… and then failing to do it themselves. Heheheh.

  2. I have important stuff backed up. My blog doesn’t qualify. If it suddenly disappeared, it would make me sad, but I wouldn’t waste a weekend trying to restore it.

    I totally agree that backups should not be outsourced. If *you* don’t have it, then you don’t have it.

  3. Yeah, I stopped reading Coding Horror ages ago, once I realised he generally doesn’t quite understand the topic he’s talking about and so makes subtle mistakes that render his entire post pointless. But I guess if you’re making buckets from ads, being correct doesn’t really matter.

  4. Heh. Yeah, as someone who has lived in not one, but two apartment buildings that caught fire, I am pretty into the idea that your data has not been backed up until it’s been backed up in a couple of locations!

  5. My own data is backed up (suitably, with generations kept on an exponential schedule, etc.) to a 4-disk RAID-1 on my backup server, one disk of which is always in the safe at the SO’s parents’ place.

    Does that make me paranoid? 😉 Not really, because the SO’s parents live only ~20 km away, so an atomic bomb or similar would probably wipe out everything (but of course I wouldn’t care overmuch then).

    cheers,
    &rw

  6. So, you’re backing up flat files? Those are easy to restore if you use relative paths (speaking of Linux file systems).
    What about databases? Are you certain you can restore those? Do you back up the associated binaries, too? A database created with software xyz version 3, then upgraded to 4 and then to 5, sometimes can’t be imported into a cleanly installed version 5.
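
    One cheap sanity check (assuming a MySQL-style dump here purely as an example; the names are made up) is to load the dump into a throwaway database once in a while:

        # Create a scratch database, load last night's dump into it,
        # poke around, then throw it away.
        mysqladmin create blogdb_restore_test
        zcat blogdb-dump.sql.gz | mysql blogdb_restore_test
        mysqladmin -f drop blogdb_restore_test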

  7. I back up the whole systems, including all binaries (and the backup software (BackupPC) sees to it that the same files are only stored once, which greatly saves on space). Databases (the few there are) are regularly dumped, of course, and those dumps are backed up, too.
    All I need for restores is a bog-standard Linux system to which I can attach at least one of the disks of the RAID-1, and yes, I’ve had the “pleasure” of testing this a couple of times.

    cheers,
    &rw

  8. > and the backup software (BackupPC) sees to it that the same files are only stored once

    That’s what rsnapshot can do for you (if you’re running something like Linux), too, but me, myself, and I would not consider this a “backup”; maybe some kind of very entry-level version control, but not a backup that can easily (!) be restored if the whole machine goes away.
    I never liked anything like an “incremental” backup. I can do it for money, of course, but at home I consider it stressful. It’s “full backup or full risk” for me at home. Disk space is cheap now, but WAN bandwidth is a different story.

  9. This is invaluable if you’re backing up multiple machines (I’ve got about 10 active ones, plus 5 for which I keep historical backups for entirely hysterical reasons).

    BackupPC works like this: it checks each file fully against the store (for full backups; incrementals rely on FS-level hints about whether something might have changed, which is why I’m wary of those), and if the file has changed it goes into the store; if not, it’s just another hardlink to the original.
    With the hardlinks it creates a full “view” into the store for each backup, which you can use just like a normal rsync copy (or pipe the whole backup, or a subset, through something like netcat or ssh and untar it on a new disk; et voilà, system restored). The bare-bones version of that hardlink trick is sketched below.
    And as for bandwidth, it can use rsync as the transport and checker, so all’s well on that front, too – the last full backup of a typical remote server (~5GB) took ~2h and transferred only ~200MB of actual data.
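
    (If you want that hardlink trick without BackupPC, this is roughly what rsnapshot automates; it’s not how BackupPC’s pool works internally, but the idea, with made-up paths, is:)

        # Nightly: each dated directory is a full "view"; unchanged files are
        # hardlinked against the previous snapshot instead of being re-copied.
        today=$(date +%F)
        rsync -a --delete --link-dest=/srv/snap/latest \
            remotehost:/home/ /srv/snap/$today/
        ln -sfn /srv/snap/$today /srv/snap/latest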

    TBH, I don’t see a downside compared to “do a simple full backup via rsync each night”, and full backups are very cheap to keep around that way, even for long periods of time. (And it has a nice web interface, so my SO can damn well restore her own accidentally deleted files from last week without pestering me.)

    cheers,
    &rw

Comments are closed.