Backing up

Unlike some minor internet celebrities, like one of the founders of StackOverflow, I am a big believer in making sure my data is backed up locally (for quick retrieval) and offsite in case of disaster. For the Mac, this is dead easy – I have a Time Machine disk attached to my iMac, and I back it up offsite using Backblaze Cloud Backup – $50 a year, and it’s unlimited, so it even backs up my external drive full of my iMovie and Final Cut Pro projects. I also use Syncthing to replicate several important directories between my iMac and my Linux computer. Unlike Dropbox or Box or any of the other cloud replication services, Syncthing is free and doesn’t send your data to any third party.

But I also want the “two backups” system for the Linux computer here in my office and for the virtual private server I have on Linode. And unfortunately Backblaze doesn’t support Linux yet. So for several years now, I’ve been doing an hourly backup of my local Linux computer to an external USB drive. My strategy for that is:

  • Make snapshots of every partition on the box using
    lvcreate -L 10G -s -n ${dir}-snapshot /dev/ssd/${dir}
    (I wish there were a way to snapshot them all at once to get a truly consistent view of the file system, but I haven’t found one.)
  • Mount them all under a root snapshot so I’ve basically got a static view of the system.
  • Back that up to the external drive using rsync. I use “--link-dest” so it only copies the files that have changed and hard-links the ones that haven’t; it’s pretty fast and doesn’t take a lot of disk space to keep weeks of backups.
    rsync -vaSH --delete --exclude .gvfs --link-dest=$PREV/ /mnt/snapshot/ $DEST/allhats2/hour$HOUR/
  • Unmount and lvremove the snapshot file systems. (A sketch of the whole cycle follows this list.)
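
Put together, the hourly job looks roughly like this. This is a sketch, not my actual script – the list of logical volumes, the mount points, and the destination paths are stand-ins:

    #!/bin/bash
    # Rough sketch of the hourly cycle above; volume names, mount points,
    # and destination paths are illustrative.
    HOUR=$(date +%H)
    DEST=/mnt/usbdrive
    PREV=$DEST/allhats2/hour$(printf '%02d' $(( (10#$HOUR + 23) % 24 )))

    # 1. Snapshot each logical volume (sadly not atomic across the set).
    for dir in root home var; do
        lvcreate -L 10G -s -n ${dir}-snapshot /dev/ssd/${dir}
    done

    # 2. Mount the snapshots into one tree for a static view of the system.
    mount /dev/ssd/root-snapshot /mnt/snapshot
    mount /dev/ssd/home-snapshot /mnt/snapshot/home
    mount /dev/ssd/var-snapshot /mnt/snapshot/var

    # 3. Incremental copy: unchanged files become hard links into $PREV.
    rsync -vaSH --delete --exclude .gvfs --link-dest=$PREV/ \
        /mnt/snapshot/ $DEST/allhats2/hour$HOUR/

    # 4. Tear the snapshots back down.
    umount /mnt/snapshot/var /mnt/snapshot/home /mnt/snapshot
    for dir in root home var; do
        lvremove -f /dev/ssd/${dir}-snapshot
    done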

As well, once a day I back up one of these snapshots to a third Linux system I haven’t mentioned yet. A while ago I discovered a company called Kimsufi that was advertising VPSes with Atom processors in Ireland or France (I’m actually not 100% sure which). Nothing too fancy in the CPU or RAM department, but with 1TB of disk space for 10 Euros a month. Not as cheap as Backblaze, but better than losing everything if Linode burns down. So once a day, I copy one of my hourly backups there (still using rsync with the --link-dest option, so multiple daily backups don’t take up much space). I also back up my Linode the same way, except there’s no way to take an LVM snapshot inside the VPS, so it’s just a simple rsync --link-dest of everything.
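
The daily offsite copy is the same rsync trick pointed at the remote box. A sketch – the hostname and remote paths are made up, and the relative --link-dest is resolved on the receiving side:

    # Push the most recent hourly backup offsite, hard-linking unchanged
    # files against yesterday's copy on the remote end.
    TODAY=$(date +%a)                     # e.g. "Tue"
    YESTERDAY=$(date -d yesterday +%a)    # e.g. "Mon"

    rsync -vaSH --delete \
        --link-dest=../$YESTERDAY/ \
        /mnt/usbdrive/allhats2/hour23/ \
        backup@kimsufi.example.net:backups/allhats2/$TODAY/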

That’s been kicking along nicely, and it’s a good backup. But it’s brittle – if an hourly backup fails part way through and the next hour’s run uses it as the --link-dest, I can end up with two full copies of the backup and no hard links between them. And there are other problems as well. Plus keeping the Kimsufi VPS just to back up to seems like a waste.

A while ago, a friend recommended Borg Backup as a better solution for regular backups than trying to roll my own. It also encrypts and compresses the backups. So I set up a parallel hourly and daily backup using Borg. The offsite one still goes to the Kimsufi VPS, but it is faster than rsync. I was just about to pull the plug on the rsync backups and go all-Borg when that same friend recommended Restic. Restic resembles Borg in a lot of ways, though it’s not very well documented. But one thing it does have is the ability to back up to Amazon S3 and Backblaze B2 cloud storage. Doing some quick calculations with the B2 calculator, I think I can do my backups way cheaper this way than by maintaining a VPS in Ireland – on the order of a dollar or two a month, rather than 10 Euros. The only problem is getting it to work.
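
For comparison, the parallel Borg job is essentially the standard create/prune pair. A sketch, with the repository path invented:

    # Hourly Borg archive of the snapshot tree (repo path illustrative).
    borg create --stats --compression lz4 \
        /mnt/usbdrive/borg-repo::hourly-{now:%Y-%m-%d_%H%M} \
        /mnt/snapshot

    # Borg's pruning is cheap enough to run every hour.
    borg prune --keep-hourly=24 --keep-daily=8 --keep-weekly=4 --keep-monthly=12 \
        /mnt/usbdrive/borg-repo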

And here’s the problem. Doing hourly restic backups to my local machine is pretty quick – about the same time as borg takes. The trouble comes when I want to delete yesterday’s hourly backups (or rather, apply the rules to keep 24 hourlies, 8 dailies, 4 weeklies, and 12 monthlies). With borg, I apply these rules every hour when I back up, because they’re quite fast. But they’re so damn slow with restic that I had to start doing it only once per day – the “forget” command that applies the rules and discards the snapshots it doesn’t want takes just a few seconds, but the “prune” command that actually reclaims the space takes over 30 minutes! And that’s on a local file system. I’m afraid to see how long it’s going to take against B2, and whether it bumps up the storage costs terribly.
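
Those retention rules map directly onto restic’s flags. A sketch, assuming a local repository at /mnt/usbdrive/restic-repo:

    # Drop snapshots outside the retention policy – this part is fast.
    restic -r /mnt/usbdrive/restic-repo forget \
        --keep-hourly 24 --keep-daily 8 --keep-weekly 4 --keep-monthly 12

    # Actually reclaim the space – this is the part that takes 30+ minutes.
    restic -r /mnt/usbdrive/restic-repo prune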

I also had a bit of a teething problem with my first backup to B2 – I was doing it over a network link, and I forgot to do it in a tmux session, so when the network connection got interrupted, it left my first backup in a strange state. I did another full backup the next day, and then I was using twice the required storage on the B2 bucket. A restic prune reclaimed the space, but it took 35 hours. That’s not going to be useful. I need to do a couple of non-failing B2 backups and see how long prune takes in those cases – but if it’s going to take hours, I’m going to ditch restic and go back to borg to the VPS.
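
For what it’s worth, the B2 invocation itself is simple. A sketch – the bucket name and credentials are placeholders, and restic reads them from the environment:

    # B2 credentials (placeholders) and the repository password file.
    export B2_ACCOUNT_ID=000123456789
    export B2_ACCOUNT_KEY=K000notarealkey
    export RESTIC_PASSWORD_FILE=/root/.restic-password

    # Run this inside tmux so a dropped connection doesn't strand the backup.
    restic -r b2:my-backup-bucket:allhats2 backup /mnt/snapshot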

Oh, another thing I should mention about restic – it puts a ton of files in ~/.cache. Since I was backing up from the root account, I ended up having to resize my root partition from 4GB to 14GB just to accommodate all the cache files. Very annoying. Borg’s cache is 259MB; restic’s is 7.4GB.
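
If you hit the same thing, restic can at least be pointed at a roomier cache location with its --cache-dir option (or the RESTIC_CACHE_DIR environment variable); the path here is just an example:

    # Keep restic's cache off the root partition (path illustrative).
    restic -r /mnt/usbdrive/restic-repo \
        --cache-dir /home/backups/restic-cache backup /mnt/snapshot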

AWS Training

So over the Thanksgiving week, Udemy had a sale on video courses. Since my job search is going so slowly, I thought I’d occupy the time with some of these courses, and I decided to begin with Amazon Web Services (AWS) certification training, starting with the AWS Certified Developer – Associate. Here are some impressions after watching 10+ hours of course video:

  • The AWS offerings change so fast that it’s really easy for a course to fall behind reality. That might be one reason why they were selling the courses so cheaply.
  • AWS itself is very inconsistent in how the console UI for each individual web service is structured. Some of them are very “here’s a list of options, pick one, configure it, and launch it” and others are “here’s a wizard to walk you through a bunch of steps that hopefully lead to something you can launch”. It’s hard to describe exactly what I mean by that. It’s probably a result of how fast things are changing. Unfortunately, sometimes the course module was made when the console worked the first way, but now it works the second way, and you basically have to watch half the video, then try to remember everything he did on the first screen so you can do it on the Nth screen instead.
  • A couple of times, things in the AWS console have just failed. Sometimes it failed silently, with no indication of why or even that it saw the “Create [blah]” button press. Other times it gave very cryptic messages like “You have requested more instances (1) than your current instance limit of 0 allows” (in that case, changing the instance type from t2.small to t1.micro was the solution). The silent failure happened mostly in the CloudFormation module when I was attempting to create a “Stack”: it appeared to fail silently (nothing was shown in the list of stacks), but when I tried to create it again, it complained that there was already a stack of that name, and suddenly it was there in the list of stacks again.
  • Other than the video being out of date in some places, my main complaint is that he is obviously reusing modules between the AWS Solutions Architect – Associate course and the AWS Developer – Associate course, so he’ll say “for this module, just use the S3 bucket named [blah] that you created in the last module” when you didn’t create anything of the sort. So then you have to hurriedly pause the video and create an S3 bucket, and hope he didn’t expect any special permissions on it or any content in it.
  • A secondary complaint is that he rarely tells you when you’re done with a resource. I try to clean up all the S3 buckets and EC2 instances and whatever when it appears we’re done with them, and I occasionally guess wrong. I wish at the end of a module he’d say “OK, we’re done with that bucket, feel free to delete it.” Sometimes he does, but mostly he doesn’t. I wonder if that’s an artifact of the mixing and matching of modules? I’m probably over-paranoid about leaving stuff around and getting charged for it, although when I started doing this course I discovered that a few years back I’d created an EC2 instance, stopped it, but never terminated it, so I guess their “free threshold” is high enough that I’m unlikely to hit it.