Adventures in Time Machine backups

A few days ago my belovedest Dara got us a shiny new NAS to use as a backup server. I was prepared to get excited about this, as up until now our Time Machine solution has been to use an older Macbook in the house (and by older, I mean, it’s still running Snow Leopard) as a Time Machine server. That box has three hard drives plugged into it, and Dara and I both have been using this for years mostly without problems.

However, as we’ve acquired more machines and that backup server gets older and older, it’s meant that our backups have been… shall we say… less reliable than we’d like.

This is a post about that, in detail, so I can point at it when I put out calls for help. Technogeekery behind the cut!

Dara has several machines at this point:

  • Her primary Macbook
  • The Macbook which used to be my first Macbook (Winnowill) and which she’s repurposed now as a Linux box
  • Her gaming machine (Chonkyboi)
  • Her DAW, the machine formerly in use as her music engineering system
  • Her other Macbook (which is the same model as her primary, and which the primary replaced; this is the one with trackpad issues)

Me, I’ve just got the two I’m actively using. (I can’t count my work Macbook as I’m not allowed to do backups of that on our home LAN; the backup solution for that one is “chuck everything onto OneDrive”.) My current machines are:

  • The Macbook, Aroree
  • My dual-boot Linux/Windows 10 box, the system which used to be my work box when I still worked at Big Fish, and which is now called Savah

With all these machines, a nice robust backup solution sounded like a very fine thing indeed.

But here’s the problem. My Macbook, Aroree, has been having sporadic Time Machine backup issues for a while. The problem behavior has been that Time Machine would, for no apparent reason, just quit and throw up the supremely unhelpful message that it had an error copying files. No sign of what file it was bitching about, or what kind of failure it meant. I mean, could it not find the file? Could it not find a place to put it in the backup? Is it tired of copying and in need of a break? Does it just not like that file? I don’t know! Because the error message wouldn’t tell me!

Up until now, though, I’ve been able to get around that problem by just rebooting the Mac and trying again. This time, not so much.

Once we got that NAS set up, Aroree seemed perfectly happy to let me start a backup. But here’s the maddening thing. Those of you who have been Mac users for a while know that when you do an initial backup of your machine to a new backup location (whether it’s a brand new computer or just a new place to put the backups), that backup is going to be friggin’ huge. In my case, we’re talking over 400G.

So I started an initial backup, and it went fine for a while… until about 220G in, where it went “error copying files”, threw up its hands, and bailed.

Cue the wailing and gnashing of teeth, and also trying to find ways to see exactly what the damn machine was complaining about.

Things I have wound up trying as of this writing include:

  1. Running Disk Utility, which is macOS’s standard “if something’s broken, you can probably fix it by running this” solution. This conducts a first aid run on a problematic hard drive, fixes broken permissions, checks filesystem integrity, that kind of thing. Under the hood this includes running fsck, which the Linux geeks among you should recognize. (Note that I ran Disk Utility both inside macOS itself and also after rebooting the machine into Recovery mode, which lets you run Disk Utility with the filesystem actually unmounted and not while you’re trying to actually, y’know, use the filesystem.)
  2. Running another utility I have called Onyx, which has also been a standard “if shit is broken, try running this to fix it” solution of mine. This one does a lot of the same things Disk Utility does in terms of checking filesystem integrity, but it also clears a bunch of system caches, gets rid of old logs, and runs maintenance scripts.
  3. Rebooting the machine into Safe Mode, and running Time Machine that way. Safe Mode loads only the bare minimum set of drivers and other things needed to make macOS function, and it’s useful for determining whether your issue is in the actual operating system or in something else you’ve installed. Note that the mysterious file copying error did not repro in Safe Mode. But I did get an error in which Time Machine bitched about setting ownership on files in the backup, and in its infinite wisdom it didn’t do that until the backup was almost done. Cue more wailing and gnashing of teeth.
  4. Deleting CleanMyMac off the system since it was running a whole bunch of supplementary threads, and I wanted to see if not having that running helped. Answer: no.
  5. Shutting down the Dropbox desktop app, since it also generally eats a lot of system resources (over 300MB of memory and a bunch of helper threads), to see if not having that running helped. Answer: no.
  6. Seeing if I could back up again to the previous backup server. Answer: no. File copying problem happened there too.
  7. Rebooting normally to see if that helped, since that helped in the past. This time: not so much.
  8. Deleting the failed backup data off the NAS to see if starting over helped. Answer: no.
  9. Seeing if I could actually get real-time data about what Time Machine is doing while it runs. This was one of the few semi-helpful things I was able to do, as this at least let me see more data about what was going on.

A few different web pages led me to a command that looks like this, which I can run from the macOS command line terminal:

log stream --predicate 'subsystem == "com.apple.TimeMachine"' --info

This pretty much lets me monitor what Time Machine does, in real time, as it does it. It’s like doing a tail -f on system.log in Linux.

That said: what this has told me so far hasn’t been nearly as useful as I’d like. All it’s accomplished for me so far is to show me a bunch of incidents where it tries to copy a thing, apparently can’t find a proper place to put it, and just goes “NOPE”. Here’s an example of what that looks like from the log:

2021-04-10 12:19:43.713142-0700 0x58f64 Error 0x94d50 10670 0 backupd: (TimeMachine) [com.apple.TimeMachine:General] Failed to copy '/Volumes/com.apple.TimeMachine.localsnapshots/Backups.backupdb/Aroree/2021-04-10-083143/Aroree HD - Data/Users/annathepiper/Google Drive/Québecois Tunes/Saute de Lapin.pdf' to '(null)', error: -36, srcErr: YES
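
For anyone wanting to do the same kind of digging, here’s a rough sketch of how lines like that could be boiled down to a list of failing paths. It assumes the log stream output was saved to a file, and that the raw log wraps paths in plain straight single quotes. The function names failed_paths and failures_by_dir, and the file name tm.log, are made up for illustration.

```shell
# Print the source path from each "Failed to copy" line read on stdin.
# Assumes the format: Failed to copy '<src>' to '<dst>', error: <n>, ...
failed_paths() {
  sed -n "s/.*Failed to copy '\(.*\)' to '.*', error: .*/\1/p"
}

# Strip the file name from each failing path and count failures per
# parent directory, most frequent first.
failures_by_dir() {
  failed_paths | sed 's|/[^/]*$||' | sort | uniq -c | sort -rn
}
```

Running something like failures_by_dir < tm.log should then show which folders the errors pile up in, which is a lot easier than eyeballing the stream as it scrolls past.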

Googling for what the hell to do about an error -36, so far, has been fruitless. I have seen multiple pages bitching about this in various forms, but so far the proposed solutions have all been things I’ve already tried.
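
For reference, -36 is the old classic Mac OS code for a generic I/O error (ioErr), and the “srcErr: YES” part presumably points at the read side rather than the write side, though that’s a guess from the wording. One way to poke at that, sketched below, is to try reading each flagged file end to end and see which ones the drive chokes on. The function name unreadable_files is made up, and it assumes a list of paths to the live files, one per line (the snapshot paths from the log may not be mounted once the backup has stopped).

```shell
# Read file paths on stdin, one per line, and print the ones that
# cannot be read all the way through (missing, no permission, or I/O error).
unreadable_files() {
  while IFS= read -r path; do
    if ! cat -- "$path" > /dev/null 2>&1; then
      printf 'UNREADABLE: %s\n' "$path"
    fi
  done
}
```

If a file that plainly exists shows up as unreadable here, that would at least point toward the disk itself rather than toward Time Machine.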

Oh and one more thing I did try after seeing these errors: a lot of them were clustered around my library of photos in the macOS Photos app, and a bunch of them were the last few things I saw in the log after the last failure. So I excluded the Photos library from the backup, and tried again.
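
Exclusions can be set in the Time Machine preferences, but there’s also a command line route via tmutil, sketched here. The function name exclude_from_backup is made up, and the Photos library path in the comment is only the macOS default location, so that part is an assumption.

```shell
# Mark a file or folder as excluded from Time Machine backups, then
# print its exclusion status. Returns non-zero if tmutil is unavailable.
exclude_from_backup() {
  if ! command -v tmutil > /dev/null 2>&1; then
    echo "tmutil not found (is this macOS?)" >&2
    return 1
  fi
  tmutil addexclusion "$1" && tmutil isexcluded "$1"
}

# Example call (default Photos library location, an assumption):
# exclude_from_backup "$HOME/Pictures/Photos Library.photoslibrary"
```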

Now, as of this writing, I have another backup running. I’ve seen more errors showing up in the log, but so far none of them have seemed fatal. By which I mean: none of them have shown up in clusters, but they are still happening even though the backup is running.

And this concerns me. It suggests that something is fundamentally fucked about this hard drive, something that Disk Utility wasn’t able to fix. And it suggests that I’m going to have to rescue a lot of data off that drive, wipe it, and start over with another one.

Which is why I’m now writing this post on Savah, and seeing how many of my critical day to day computer activities I can do on the other system.

SO THAT’S FUN. By which I mean: now seriously considering if it’s time to get a new Macbook.

But that’ll be a decision I make after seeing whether setting up one of the spare SSDs in the house with a fresh install of Catalina will get me going again. I still can’t quite justify buying a new M1 Macbook, even if I really kind of want one, since I have this other perfectly functional laptop to use.

And in the meantime, if anybody has any other brilliant ideas about how to fix the kinds of copy errors I’m seeing Time Machine throwing? Talk to me! Because I’d really love to solve this problem.