I remember buying my first HDD when the cost was $1/MB. Buying a gigabyte of storage was a grand (in 1990s dollars) and felt like a huge commitment. Since then, I've watched prices come down, and I feel like there's no excuse to lose data because of lack of storage.
But doing the backups is the hard part.
I've lost data before. Not recently, but enough times early on that I developed a habit of taking backups seriously. The best backup system I've found is one you don't have to think about at all -- not because you've forgotten it, but because it genuinely doesn't need you.
That was the problem with my old manual backup flow: I had to attach storage to each computer and run the backups by hand. It took a NAS and automation to finally make backing up reliable.
Almost everything flows through the NAS, a TrueNAS machine with a pair of ZFS pools.
The larger one, tank, holds both live media and backup datasets. Two datasets
do all the backup work: tank/Backups and tank/Archive. Backups
is the landing zone -- everything comes in here. Archive is a curated subset that gets
pushed offsite. (Also yes, I'm lazy and used the same pool name ("tank") that the ZFS
documentation uses.)
Keeping them separate matters. Backups includes VM disk images and restic repos that are big and reproducible; Archive contains the irreplaceable files I want to survive a house fire. Sending the whole Backups dataset offsite would cost more bandwidth and storage for diminishing returns.
All my VMs -- Plex, the *arr stack, and others -- run on Proxmox. Proxmox has a built-in
backup scheduler that runs vzdump nightly at 9pm. It takes a live snapshot of each running
VM, compresses with zstd, and pushes it to an NFS mount backed by
tank/Backups/pvebackups. Retention is seven daily and one monthly.
That's enough to recover from a botched upgrade or an accidentally deleted container. I don't need years of VM history -- I need last week.
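Proxmox keeps scheduled backup jobs in /etc/pve/jobs.cfg. A job matching that schedule would look roughly like this -- the job ID and storage name here are made up for illustration:

```
vzdump: backup-nightly
        schedule 21:00
        all 1
        mode snapshot
        compress zstd
        storage nas-backups
        prune-backups keep-daily=7,keep-monthly=1
        enabled 1
```

The prune-backups line is what enforces the seven-daily, one-monthly retention.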
The VMs contain services but the machines themselves accumulate state worth keeping too: home directories, config files, things that are annoying to reconstruct. For those I use restic, backed by a restic REST server running on the NAS.
And not just the VMs -- all of my Linux workstations and laptops use restic to back up their home directories too.
Each machine has a small shell wrapper that sets the repository URL and password file, then
passes everything through to restic. Repositories live in tank/Backups/restic/
with a subdirectory per machine. A separate prune job enforces the retention policy:
7 daily, 5 weekly, 12 monthly, 75 yearly.
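The wrapper is tiny. A sketch, assuming a restic REST server on port 8000 and a per-machine repository named after the hostname -- the URL and password-file path are illustrative, not the real values:

```shell
# Per-machine restic wrapper (sketch). Repository URL and password-file
# path are assumptions; each machine gets a repo named after its hostname.
restic_wrapped() {
    RESTIC_REPOSITORY="rest:http://nas.local:8000/$(hostname)" \
    RESTIC_PASSWORD_FILE="$HOME/.config/restic/password" \
    restic "$@"
}

# Nightly backup:   restic_wrapped backup "$HOME"
# The separate prune job enforcing the retention policy would be:
#   restic_wrapped forget --prune --keep-daily 7 --keep-weekly 5 \
#       --keep-monthly 12 --keep-yearly 75
```

Everything after the wrapper name is passed straight through, so `restic_wrapped snapshots` or `restic_wrapped restore` work the same way.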
I have photos going back to 2002 -- scanned prints, early digital camera shots, eventually iPhone photos. That's not something I'm willing to lose, and I don't trust any single cloud service to be the only copy.
Back before I had a smartphone, I had a DSLR and would archive those photos to an external HDD. One time I accidentally wiped 6 months of precious family photos. That hurt and really stuck with me.
icloudpd is an open-source tool that authenticates to iCloud and mirrors your photo library
locally. It runs against tank/Backups/icloud/, organized by year. Having them
locally means I can browse them without an internet connection and include them in the
offsite archive without depending on Apple's continued goodwill.
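The whole thing is one scheduled command. Something like this crontab entry -- the time, Apple ID, and paths are illustrative; check icloudpd --help for the current option names:

```
# Pull new iCloud photos into the backups dataset nightly, one folder per year.
0 3 * * * icloudpd --directory /mnt/tank/Backups/icloud --folder-structure '{:%Y}' --username user@example.com
```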
Not everything in Backups is worth pushing offsite. Every night at 9pm, a script on the
NAS called sync-archive pulls from the latest ZFS snapshot of Backups and
rclones the irreplaceable parts into the Archive dataset: photos, family photos, Google
Takeout exports.
The source is always a snapshot, not the live dataset. That gives a consistent point-in-time view regardless of what's being written during the copy, and means the archive always reflects a completed state, not something half-finished.
The Archive dataset gets its own daily ZFS snapshots with a 14-day retention window. So the archive is itself versioned -- not just the backups that feed it.
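As a sketch, the core of sync-archive is only a few lines. The mount points and the list of sources here are assumptions, and it relies on snapshot names (like TrueNAS's auto-&lt;date&gt; names) sorting chronologically:

```shell
# sync-archive (sketch): copy the irreplaceable subsets out of the newest
# Backups snapshot. $1 = Backups mountpoint, $2 = Archive mountpoint.
sync_archive() {
    backups=$1
    archive=$2

    # ZFS exposes every snapshot read-only under .zfs/snapshot/<name>,
    # so the latest snapshot is just the last (sorted) directory entry.
    snap=$(ls -1 "$backups/.zfs/snapshot" | tail -n 1)
    src="$backups/.zfs/snapshot/$snap"

    # Only the curated subsets go to the Archive dataset.
    rclone sync "$src/icloud"  "$archive/photos"
    rclone sync "$src/takeout" "$archive/takeout"
}
```

Called as `sync_archive /mnt/tank/Backups /mnt/tank/Archive`, it never touches the live dataset at all.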
Two copies on the same NAS is better than one, but not by much. A bad firmware update or
a controller failure could take them both out. I have a second machine -- a Supermicro
server I call archive -- that exists purely to receive a copy of
tank/Archive.
TrueNAS has a built-in replication task that runs after each nightly snapshot and pushes the incremental diff to the archive server via SSH+netcat. The first send was a full transfer; every subsequent one is just the delta. Most nights that's a few hundred kilobytes.
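TrueNAS handles it, but the underlying mechanism is ordinary zfs send piped into zfs recv. A simplified hand-rolled equivalent -- plain SSH rather than TrueNAS's SSH+netcat transport, and the host, pool, and snapshot names are assumptions:

```shell
# Push the delta between two consecutive snapshots to the archive server.
# $1 = previous snapshot name, $2 = new one (e.g. auto-2024-01-02).
replicate_archive() {
    prev=$1
    cur=$2
    # -i sends only the changes between the two snapshots; the receiver
    # must already hold $prev, which the initial full send established.
    zfs send -i "@$prev" "tank/Archive@$cur" | ssh archive zfs recv "pool/Archive"
}
```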
The archive server is normally powered off. A machine that's off can't be compromised, updated into a broken state, or silently fail in a way nobody notices. A cron job on my Raspberry Pi sends an IPMI command at 11:55pm to power it on:
55 23 * * * /home/pi/bin/ipmitool chassis power on
By midnight the machine has booted and the ZFS replication can connect. At 00:15 the offsite sync script runs. Then the machine shuts itself down. Total uptime: roughly 20 minutes on a normal night.
Two local copies still burn together. The archive server runs a script once a week that
pushes /mnt/archive to a Hetzner StorageBox via rclone:
rclone -P --transfers 8 sync . secret:
The secret: remote is an rclone crypt layer. Everything is encrypted on the
client before it leaves the machine -- Hetzner sees opaque blobs, not filenames or content.
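The remote setup lives in rclone.conf. A sketch of the two-layer arrangement -- the StorageBox host, username, and remote names are placeholders, and the real passwords come from running rclone config:

```
[storagebox]
type = sftp
host = uXXXXX.your-storagebox.de
user = uXXXXX

[secret]
type = crypt
remote = storagebox:backup
filename_encryption = standard
password = *** set via "rclone config" ***
```

Anything synced to secret: is encrypted client-side before it ever touches the sftp layer underneath.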
The script runs from cron every night at 00:15 and checks the day of the week. On six nights out of seven it exits immediately; on Monday nights it actually does the rclone sync.
Either way the exit trap calls shutdown, so the Supermicro goes back to sleep
within a minute or two of the script finishing.
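A sketch of that gate -- setting the trap before anything else is the important part, so the machine powers off even if the sync fails. The directory argument is illustrative:

```shell
# sync-to-hetzner (sketch). The EXIT trap fires no matter how the function
# returns, so the box always powers off; only Mondays actually sync.
sync_to_hetzner() {
    trap 'shutdown -h now' EXIT

    # date +%u prints the ISO weekday: 1 is Monday.
    if [ "$(date +%u)" -ne 1 ]; then
        return 0   # six nights out of seven: nothing to do
    fi

    cd "${1:-/mnt/archive}" || return 1
    rclone -P --transfers 8 sync . secret:
}
```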
9:00pm -- Proxmox dumps VM images to the NAS.
9:00pm -- sync-archive assembles the archive dataset from the latest Backups snapshot.
11:55pm -- Raspberry Pi wakes the Supermicro via IPMI.
12:00am -- NAS sends incremental ZFS snapshot to the now-awake Supermicro.
12:15am -- Supermicro runs sync-to-hetzner. On Monday nights, it syncs to Hetzner and
then shuts down. On other nights it skips the rclone and shuts down immediately.
I don't get notifications when the sync runs. I don't log into the archive server to check on it. The TrueNAS replication dashboard shows the last successful run time and I glance at it occasionally. Mostly I leave it alone. That was the whole point.
The restic repos don't go offsite. They're on the NAS but not in the archive dataset. This is a gap I'm aware of. The data I'd actually need after a disaster is probably covered -- photos, documents, exports. But machine restoration would start from a fresh install, not from restic.
I also don't test restores regularly. I've done it a few times -- pulled a photo from the archive, restored a VM after a bad update -- but there's no scheduled restore test. That's the honest admission at the end of any backup writeup.
That said, whenever I rebuild an Arch Linux laptop, I do use the restic backup: I restore my home directory onto the fresh (used ThinkPad) machine and install all of the same packages. Minus some things that live in /etc, that gets my workstations back up and running, looking exactly how I'm used to.
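The rebuild flow, sketched -- the repository URL, password-file path, and package-list filename are assumptions, and the pacman step assumes the explicit package list was saved earlier with pacman -Qqe:

```shell
# Restore a fresh Arch install from the restic backup (sketch).
restore_home() {
    export RESTIC_REPOSITORY="rest:http://nas.local:8000/$(hostname)"
    export RESTIC_PASSWORD_FILE="$HOME/.config/restic/password"

    # Pull the newest snapshot of the home directory onto the new machine.
    restic restore latest --target / --include "$HOME"

    # Reinstall the same explicit packages ("-" reads names from stdin).
    pacman -S --needed - < "$HOME/pkglist.txt"
}
```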