Backing up

Unlike some minor internet celebrities, like one of the founders of Stack Overflow, I am a big believer in making sure my data is backed up locally (for quick retrieval) and offsite in case of disaster. For the Mac, this is dead easy – I have a Time Machine disk attached to my iMac, and I back it up offsite using Backblaze Cloud Backup – $50 a year, and it’s unlimited, so it even backs up my external drive full of my iMovie and Final Cut Pro projects. I also use Syncthing to replicate several important directories between my iMac and my Linux computer. Unlike Dropbox or Box or any of the other cloud replication services, Syncthing is free and doesn’t send your data to any third party.

But I also want the “two backups” system for the Linux computer here in my office, and for the virtual private server I have on Linode. Unfortunately, Backblaze doesn’t support Linux yet. So for several years now, I’ve been doing hourly backups of my local Linux box to an external USB drive. My strategy for that is:

  • Make snapshots of every partition on the box using
    lvcreate -L 10G -s -n ${dir}-snapshot /dev/ssd/${dir}
    (I wish there was a way to snapshot them all at once to get a really consistent view of the file system, but I haven’t found a way.)
  • Mount them all under a root snapshot so I’ve basically got a static view of the system.
  • Back that up to the external drive using rsync. I use “--link-dest” so it really only backs up the files that have changed and makes hard links to the ones that haven’t, so it’s pretty fast and doesn’t take a lot of disk space to keep weeks of backups.
    rsync -vaSH --delete --exclude .gvfs --link-dest=$PREV/ /mnt/snapshot/ $DEST/allhats2/hour$HOUR/
  • Unmount and lvremove the snapshot file systems.

As well, once a day I back up one of these snapshots to a third Linux system I haven’t mentioned yet. A while ago I discovered a company called Kimsufi that was advertising VPSes with Atom processors in Ireland or France (I’m actually not 100% sure which). Nothing too fancy in the CPU or RAM department, but with 1TB of disk space for 10 Euros a month. Not as cheap as Backblaze, but better than losing everything if Linode burns down. So once a day, I copy one of my hourly backups there (still using rsync with the --link-dest option, so multiple daily backups don’t take up much space). I also back up my Linode the same way, except there’s no way to take an LVM snapshot inside the VPS, so it’s just a simple rsync --link-dest of everything.

That’s been kicking along nicely, and it’s a good backup. But it’s brittle – if an hourly backup fails part way through and I try to do a --link-dest the next hour against that one, I can end up with two full copies of the backup with no hard links. And there are other problems as well. Plus keeping the Kimsufi VPS just to back up to seems like a waste.

A while ago, a friend recommended Borg Backup as a better solution for regular backups than trying to roll my own; it also encrypts and compresses the backups. So I set up a parallel hourly and daily backup using Borg. The offsite one still goes to the Kimsufi VPS, but it’s faster than rsync. I was just about to pull the plug on the rsync backups and go to an all-Borg solution when that same friend recommended Restic. Restic resembles Borg in a lot of ways, but it’s not very well documented. One thing it does offer, though, is backing up to Amazon S3 and Backblaze B2 cloud storage. Doing some quick calculations with the B2 calculator, I think I can do my backups way cheaper this way than by maintaining a VPS in Ireland – on the order of a dollar or two a month, rather than 10 Euros. The only problem is getting it to work.
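For reference, the borg version of the hourly job is pleasantly short. This is a sketch with placeholder paths and passphrase, not my actual script, but the flags are standard borg:

```shell
# Sketch of an hourly borg backup with pruning (all paths are placeholders).
REPO=/mnt/usb/borg-repo           # or ssh://user@host/path for the offsite copy
export BORG_PASSPHRASE=changeme   # borg encrypts the repository

# One-time setup: borg init --encryption=repokey "$REPO"

# Back up the static snapshot view; {now} is expanded by borg itself.
borg create --stats --compression lz4 \
    "$REPO::hourly-{now:%Y-%m-%d_%H:%M}" /mnt/snapshot

# Apply the retention rules every hour; with borg this is cheap.
borg prune --keep-hourly=24 --keep-daily=8 --keep-weekly=4 --keep-monthly=12 "$REPO"
```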

And here’s the problem. Hourly restic backups to my local machine are pretty quick – about the same time as borg takes. The trouble comes when I want to delete yesterday’s hourly backups (or rather, apply the rules to keep 24 hourlies, 8 dailies, 4 weeklies, and 12 monthlies). With borg, I apply these rules every hour when I back up, because they’re quite fast. But they’re so damn slow with restic that I had to start doing it only once per day – it takes just a few seconds to run the “forget” command to apply the rules and forget the snapshots it doesn’t want, but the “prune” command that actually reclaims the space takes over 30 minutes! And that’s on a local file system. I’m afraid to see how long it’s going to take on B2, and whether it bumps up the storage costs terribly.
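For the record, the retention run looks something like this (repository location and password are placeholders; the split between forget and prune is exactly where the time goes):

```shell
# Sketch: restic retention (repository and password are placeholders).
export RESTIC_REPOSITORY=/mnt/usb/restic-repo
# For B2, it would be: RESTIC_REPOSITORY=b2:bucketname:path
# plus B2_ACCOUNT_ID and B2_ACCOUNT_KEY in the environment.
export RESTIC_PASSWORD=changeme

# Fast: just marks snapshots as forgotten (a few seconds).
restic forget --keep-hourly 24 --keep-daily 8 --keep-weekly 4 --keep-monthly 12

# Slow: actually repacks the data and reclaims the space (30+ minutes for me).
restic prune
```

restic forget also takes a --prune flag that chains the two, but the repack step is still where the half hour goes.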

I also had a bit of a teething problem with my first backup to B2 – I was doing it over a network link, and I forgot to do it in a tmux session, so when the network connection got interrupted, it left my first backup in a strange state. I did another full backup the next day, and then I was using twice the required storage on the B2 bucket. I did a restic prune which reclaimed the space, but it took 35 hours to do it. That’s not going to be useful. I need to do a couple of non-failing B2 backups and see how long prune takes in those cases – but if it’s going to take hours, I’m going to ditch restic and go back to borg to the VPS.

Oh, another thing I should mention about restic – it puts a ton of files in ~/.cache. Since I was backing up from the root account, I ended up having to resize my root partition from 4GB to 14GB just to accommodate all the cache files. Very annoying. Borg’s cache is 259MB; restic’s is 7.4GB.

AWS Training

So over the Thanksgiving week, Udemy had a sale on video courses. Since my job search is going so slowly, I thought I’d maybe occupy the time by doing some of these courses, and I decided to start with the Amazon Web Services (AWS) certification training, starting with the AWS Certified Developer – Associate. Here are some impressions I’ve had after watching 10+ hours of course video:

  • The AWS offerings change so fast that it’s really easy for the course to fall behind reality. That might be one reason why they were selling the courses so cheaply.
  • AWS itself is very inconsistent in the way the UI for each individual web service is structured in the console. Some of them are very “here’s a list of options, pick one and configure it and launch it” and others are “here is a wizard to walk you through a bunch of steps that hopefully will lead to something you can launch”. It’s hard to describe exactly what I mean by that. That’s probably a result of how fast things are changing. Unfortunately, sometimes the course module was made when the console worked the first way, but now it works the second way, and you basically have to watch half the video, then try to remember everything he did on the first screen so you can do it on the Nth screen instead.
  • A couple of times, things in the AWS console have just failed. Sometimes it’s failed silently, with no indication of why or even that it saw the “Create [blah]” button press. Other times it’s given very cryptic messages like “You have requested more instances (1) than your current instance limit of 0 allows” (in that case, changing an obscure parameter from t2.small to t1.micro was the solution). The silent failure happened mostly in the CloudFormation module when I was attempting to create a “Stack”: after it appeared to fail silently (and nothing was shown in the list of stacks), I tried to create it again, and it complained that there was already a stack of that name – and suddenly it was there in the list of stacks again.
  • Other than the way the video is out of date in some places, my main complaint is that he is obviously reusing modules between the AWS Solutions Architect – Associate course and the AWS Developer – Associate course, and so he’ll say “for this module, just use the S3 bucket named [blah] that you created in the last module” when you didn’t create anything of the sort. So then you have to hurriedly pause the video and create an S3 bucket and hope he didn’t expect any special permissions on it or any content in it.
  • A secondary complaint is that he never tells you when you’re done with a resource. I try to clean up all the S3 buckets and EC2 instances and whatever when it appears we’re done with them. I occasionally guess wrong. I wish at the end of a module he’d say “OK, we’re done with that bucket, feel free to delete it.” Sometimes he does, but mostly he doesn’t. I wonder if that’s an artifact of the fact that he’s mixing and matching modules? I’m probably overly paranoid about leaving stuff around and getting charged for it, although when I started doing this course I discovered that a few years back I’d created an EC2 instance, stopped it, but never terminated it – so I guess their “free threshold” is high enough that I’m unlikely to hit it.

Some random thoughts on naming conventions

Something recently made me think about product naming conventions. It seems to me you can start off with a really nifty naming convention, but after a while, it gets so cluttered with exceptions and new products that it doesn’t work anymore, and then you have to throw out the whole thing and start again.

Take, for instance, Epic Kayaks. Now I’m not 100% sure of the history, but I believe their first surf ski was the V10, and their second was the V10 Sport. Calling it “Sport” didn’t make a ton of sense, because the V10 Sport is actually a less capable surf ski – wider and more stable, to appeal to a less elite audience. To me, “Sport” usually implies a faster or more capable model, like the “Sport” model of many cars that maybe has more horsepower and grippier tires, or maybe just has go-faster stripes and a manual gearbox. They also have a V10L, which was at the time just a low-volume version of the V10. I believe they’ve since redesigned it to be more of its own boat, specifically for lighter paddlers.

But since that time, they’ve added the V12 and V14, each of which is narrower and less stable (and faster) than the previous, and then the V8, V7, and V5, which get progressively more stable and slower as the number decreases. Then they made a boat that was sort of intermediate between the V8 and the V10 Sport (which was already intermediate between the V8 and the V10) and found themselves naming it the V8 Pro. Not as bad a decision as the use of “Sport” in the V10 Sport, because it does imply something faster than the V8, and it is. But still an obvious shoe-horn into a naming convention that was already under stress.

Then this year they demoed a boat that had the same width as a V12 but which was shorter (shorter even than the V10 Sport) to handle short period waves. When they were demoing it, they were calling it the V12M. And that wasn’t a horrible name because really I think it was designed to be “like a V12, but only for specific conditions”. But then they announced it officially as the V11. That to me implies something faster than the V10 and slower than the V12, and it probably actually is.

But I think their number system is getting crowded. It mostly works that the higher the number, the narrower, longer and faster the boat is. But there are exceptions. The space between the V8 and the V10 has two boats, neither of which is called the V9. There are three boats that are called “V10” (ignoring the V10 Double for a second), with pretty different characteristics. People confuse the V10 Sport and V10 a lot. There aren’t that many V10Ls around here, so I don’t know if they get confused for V10s a lot.

Epic is going to continue to design new boats. Some of them are going to be brought to market. I think sooner rather than later they’re going to have to throw out the whole “V number” system, and either just bring in new boats with a different designation or maybe even redesignate the whole fleet.

Naming conventions are tricky. I like that a person can broadly tell whether an Epic boat is more or less elite just by the name. I can’t tell anything about, say, the Fenn boats, because they use proper nouns instead of numbers. But on the other hand, as long as Fenn designers can think up names, they’re never going to have this problem.

At least they aren’t doing stuff like the computer hardware world, where you get horrendous long names with numbers and letters in riotous abandon. I’ve got an HP OfficeJet 6700 Premium printer. That name doesn’t tell me anything about its capabilities or how it stacks up against the OfficeJet J6000 or the OfficeJet L7000 or anything else in the HP printer line.

I’m reminded of the software world. Basically, most software uses monotonically increasing version numbers, usually with a minor and maybe a semi-minor version number as well, and you know that a change in major number probably means something significant and a change in semi-minor is probably invisible. So macOS 10.12.6 is obviously newer than macOS 10.12.5 and possibly just fixes some bugs, but it probably has some feature changes from macOS 10.11.1.

Windows started off with monotonically increasing numbers (Windows 1, 2, 3.11), then switched to the last two digits of the year (being the only people I know stupid enough to set themselves up for a Y2K problem with only 5 years left to go) with Windows 95 and 98, broke the convention with Windows 98SE and Windows ME, then looked like they were going back to it with Windows 2000. But then they switched to names that meant nothing (XP), and then back to numbers for Windows 7 and 8 – but due to problems caused by lazy programmers version-sniffing for 95 and 98, they had to skip Windows 9 and go directly to Windows 10. Ugh, what a mess!

One piece of software I used way back in the day was a dBase III compiler called “Clipper”. I used to love the fact that their naming convention was actually the season and year of release, so Winter ’84 was followed by Summer ’85, etc. Good, because it was easy to tell if the version you found on the shelf was newer than the one you were using. But people evidently didn’t like it, because for their 6th release, they switched to calling it “Clipper 5.00” (yes, it was the 6th release – I guess that means they started from 0) and then “5.01”, then “5.01 Rev 129” because who needs consistency? Although looking at Wikipedia, it’s possible that people didn’t like the seasonal names because they lied a lot. “Summer ’87” was released on 21 December 1987.

So I guess what I’m saying is I’m glad I don’t have to name stuff, because my OCD would want the names to tell you something, but I’d also want to leave room for fill-in products without breaking the convention, while at the same time being memorable and not confusing.

More camera woes

One of the things I’ve struggled with over the years is that a typical waterproof action camera has a battery life of around 80 minutes, and most of my races and training paddles are longer than that, especially if you want to start the camera when you leave the shore for your warm up and not have to faff around on the start line trying to get it started when you really should be concentrating on the race. I’ve experimented with various ways of providing power from a USB battery pack to various cameras, with varying success – they’ve either not worked or they’ve eventually succumbed to water damage.

My newest camera is a GoPro Hero5 Black, which is waterproof without an extra case. It has two openings with waterproof covers, one for the battery and memory card, and one with a USB port and an HDMI port. The USB port can be used for charging or for downloading video. I was assured by people on the GoPro forum that it would be perfectly safe to remove the cover over the ports, plug in a USB cable, and seal around it with one of those silicone putty earplugs they sell to swimmers. I’ve been using it like that all year and it’s been great. With a small USB battery pack, also sealed with silicone putty, I’ve had recording times of over 3 hours with no problems.

However, last Thursday was the first time I actually let the camera get fully immersed, rather than just splashed – I was landing in a big surf and the boat flipped over after I jumped out. The camera seemed fine, although the touch screen was acting a little wonky. I didn’t think much of it – I just figured it didn’t like the water on it and I’d have to remember to disable it next time. I took it home, plugged it into my computer to charge and download the pictures, and then forgot about it.

Until the middle of the night last night, three nights later, when I heard the distinctive sounds a GoPro makes when it’s powering off. That’s odd, I thought, maybe it took this long to fully charge and now it’s shutting off. And then some time later, it happened again. Shit! I got up and stumbled into my office, and discovered it was powered up again. Not wanting to be kept awake all night by this stupid beeping, I took it into another room and removed the battery. That’s when I saw green corrosion on the battery terminals. A very bad sign – it means that water had gotten into the case and reached the electrical parts. I’m afraid to power it back on this morning and see if it’s still working. I’ll have to see if it’s too late to properly dry it out and hope it survives.

Give it a REST

As you might know, I’m currently looking for a job. And one thing you see in job ads is a requirement for experience with “REST APIs” or “RESTful services”. As far as I can tell, it’s nothing more than a naming convention for your basic CRUD (Create, Read, Update, Delete) web services. If you write four URL handlers for the URLs “/item/create”, “/item/{item id}/read”, “/item/{item id}/update” and “/item/{item id}/delete”, then you’re a filthy normie and unemployable; but if instead you make one URL handler for “/item/{item id}” and dispatch on the request type, doing the read, update, or delete based on it being “GET”, “PUT”, or “DELETE” respectively (creation being done with a POST to the URL “/items”), then you’re a “RESTful” guru and will be showered in money.
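To make the mapping concrete, here are the four CRUD operations as curl calls against a completely hypothetical item API (the host, paths, and payloads are all made up):

```shell
# One URL per item; the server dispatches on the HTTP verb.
API=https://api.example.com

curl -X POST   "$API/items"   -d '{"name":"hat"}'   # create a new item
curl -X GET    "$API/item/42"                       # read item 42
curl -X PUT    "$API/item/42" -d '{"name":"cap"}'   # update item 42
curl -X DELETE "$API/item/42"                       # delete item 42
```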

Can we just agree that, it being a naming convention, it takes approximately 5 minutes to train somebody to do it? And if my former employer would give me back my login for an hour or so, I could go back and change all my AJAX calls to fit this naming convention and join the ranks of the REST-API-experienced.