VM Snapshots and Rollback: Safe Experimentation in Your Homelab
One of the best things about running VMs in a homelab is that you can break things without consequences. Want to try upgrading your kernel? Testing a new database version? Experimenting with a firewall rule that might lock you out? Take a snapshot first, and if things go sideways, roll back in seconds.
But snapshots aren't magic. They're not backups. They can eat your storage if you're not careful. And if you don't understand how they work under the hood, they can actually slow your VMs down significantly.
Let's fix that. This guide covers snapshot management on the two most common homelab hypervisors: Proxmox VE and libvirt/KVM. By the end, you'll have automated snapshot schedules, clean rollback procedures, and a solid understanding of when to use snapshots vs. full backups.

What Snapshots Actually Are (And Aren't)
A snapshot captures the state of a virtual machine at a specific point in time. Depending on the hypervisor, this includes:
- Disk state — The contents of the virtual disk(s)
- Memory state — The contents of RAM (optional)
- VM configuration — CPU, memory, network settings
Here's the critical distinction that trips people up:
| Feature | Snapshot | Backup |
|---|---|---|
| Speed to create | Seconds | Minutes to hours |
| Storage location | Same storage as VM | Separate storage |
| Survives storage failure | No | Yes (if stored elsewhere) |
| Performance impact | Yes (grows over time) | None after completion |
| Purpose | Short-term rollback | Long-term data protection |
| Dependency | Requires base disk | Self-contained |
Snapshots are not backups. I'll say it again because this is the number one mistake homelabbers make. A snapshot lives on the same storage as your VM. If that storage dies, you lose both the VM and all its snapshots. Snapshots are for short-term experimentation, not data protection.
How Snapshots Work Under the Hood
When you take a snapshot of a VM's disk, the hypervisor stops writing to the original disk image and creates a new "delta" file. All new writes go to the delta file. Reads check the delta first, then fall back to the original.
This is called copy-on-write (or redirect-on-write, depending on the implementation). The original disk becomes read-only, and the snapshot delta tracks only what's changed since the snapshot was taken.
This has some important implications:
- Snapshots are fast to create because nothing is copied — just a new delta file is started
- Snapshots grow over time because every write to the VM goes to the delta
- Snapshot chains slow things down because reads have to traverse the chain
- Deleting a snapshot isn't instant because the delta has to be merged back into the base (or parent)
Think of it like a stack of transparent overlays on a drawing. Each overlay has only the changes from the previous layer. To see the full picture, you need all the layers. The more layers you have, the longer it takes to find any given piece of data.
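You can watch this overlay mechanism in action with qemu-img, the disk image tool that QEMU-based hypervisors use under the hood. A throwaway sketch — the /tmp path, names, and sizes are all just examples:

```shell
# Build a tiny copy-on-write chain by hand (throwaway files in /tmp)
mkdir -p /tmp/cow-demo && cd /tmp/cow-demo

# The "original disk": a small empty qcow2 image
qemu-img create -f qcow2 base.qcow2 100M

# Taking a snapshot = starting a new delta (overlay) on top of it
qemu-img create -f qcow2 -b base.qcow2 -F qcow2 overlay1.qcow2

# A second snapshot stacks another overlay — now a chain of depth 2
qemu-img create -f qcow2 -b overlay1.qcow2 -F qcow2 overlay2.qcow2

# Each layer records its backing file; reads fall through the chain
qemu-img info --backing-chain overlay2.qcow2
```

Writes land only in overlay2.qcow2; base.qcow2 and overlay1.qcow2 are never touched again until you merge or delete the chain.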
Proxmox VE Snapshot Management
Proxmox is probably the most popular hypervisor for homelabs, and its snapshot support is solid. You can manage snapshots through the web UI or the command line.
Taking Snapshots via the Web UI
- Select your VM in the Proxmox web interface
- Go to the Snapshots tab
- Click Take Snapshot
- Give it a meaningful name (e.g., before-kernel-upgrade-2026-02-09)
- Optionally include VM memory state (RAM contents)
- Click Take Snapshot
Including the RAM state makes the snapshot larger and slower to create, but it lets you restore to a running state instead of a powered-off one. For most homelab scenarios, disk-only snapshots are fine.
Taking Snapshots via CLI
The command-line approach is what you'll want for automation:
# Take a snapshot (disk only)
qm snapshot 100 before-upgrade --description "Before kernel upgrade"
# Take a snapshot with RAM
qm snapshot 100 before-upgrade --vmstate --description "Before kernel upgrade"
# List all snapshots for VM 100
qm listsnapshot 100
# Roll back to a snapshot
qm rollback 100 before-upgrade
# Delete a snapshot (merge delta into base)
qm delsnapshot 100 before-upgrade
The VM ID (100 in these examples) is the numeric ID shown in the Proxmox interface.
Proxmox Snapshot Storage Considerations
Proxmox supports multiple storage backends, and not all of them support snapshots equally:
| Storage Type | Snapshot Support | Notes |
|---|---|---|
| ZFS | Excellent | Native ZFS snapshots, very fast, minimal overhead |
| LVM-thin | Good | Thin provisioning with snapshot support |
| Ceph/RBD | Excellent | Native snapshot support, distributed |
| Directory (qcow2) | Good | QEMU snapshots, works but slower |
| LVM (thick) | No | No snapshot support — use LVM-thin instead |
| NFS (qcow2) | Good | QEMU snapshots via qcow2 format |
| ZFS over iSCSI | Good | Native ZFS snapshots on the target |
If you're setting up a new Proxmox installation and want good snapshot support, ZFS or LVM-thin are your best bets. ZFS is particularly nice because snapshots are essentially free until the data diverges.
# Check your storage configuration
pvesm status
# Example output:
# Name Type Status Total Used Available %
# local dir active 98304 12288 86016 12.50%
# local-lvm lvmthin active 409600 204800 204800 50.00%
# zfspool zfspool active 1843200 368640 1474560 20.00%
ZFS Snapshots in Proxmox
If your Proxmox VMs are on ZFS storage, you get some extra benefits:
# List ZFS snapshots directly
zfs list -t snapshot -r rpool/data
# Example output:
# NAME USED AVAIL REFER MOUNTPOINT
# rpool/data/vm-100-disk-0@before-upgrade 12M - 32.5G -
# rpool/data/vm-100-disk-0@daily-2026-02-08 48M - 32.5G -
# rpool/data/vm-100-disk-0@daily-2026-02-09 8K - 32.5G -
# Check how much space snapshots are consuming
zfs list -o name,used,refer,usedbysnapshots -r rpool/data
# Send a ZFS snapshot to another pool (great for backups!)
zfs send rpool/data/vm-100-disk-0@before-upgrade | zfs recv backup/vm-100
ZFS snapshots are instantaneous and initially take zero additional space. They only grow as the active data diverges from the snapshot. This makes ZFS the ideal storage backend for a snapshot-heavy workflow.
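Once a full copy exists on the backup pool, later syncs don't need to resend everything: `zfs send -i` transmits only the blocks that changed between two snapshots. A sketch continuing with the dataset names from the examples above (it assumes the earlier full send already populated backup/vm-100):

```shell
# Take today's snapshot
zfs snapshot rpool/data/vm-100-disk-0@daily-2026-02-10

# Incremental send: only the delta between yesterday and today crosses the wire
zfs send -i @daily-2026-02-09 rpool/data/vm-100-disk-0@daily-2026-02-10 \
  | zfs recv backup/vm-100
```

Pipe the send through ssh and the receiving pool can live on another machine entirely — that's a real backup, not just a snapshot.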
Libvirt/KVM Snapshot Management
If you're running KVM with libvirt (either directly on a Linux host or through something like virt-manager), you manage snapshots with the virsh command.
Internal vs. External Snapshots
Libvirt supports two types of snapshots, and this is important to understand:
Internal snapshots are stored inside the qcow2 disk image. They're easy to manage but only work with qcow2 format disks and have some performance overhead.
External snapshots create a new overlay file, leaving the original as a read-only backing file. They're faster and more flexible but slightly more complex to manage.
# Take an internal snapshot (disk + memory)
virsh snapshot-create-as myvm snap-before-upgrade \
--description "Before kernel upgrade" \
--atomic
# Take a disk-only snapshot (VM can be running or stopped — note that
# with --disk-only, libvirt defaults to *external* overlay files)
virsh snapshot-create-as myvm snap-before-upgrade \
--description "Before kernel upgrade" \
--disk-only \
--atomic
# Take an external snapshot
virsh snapshot-create-as myvm snap-before-upgrade \
--description "Before kernel upgrade" \
--disk-only \
--diskspec vda,snapshot=external,file=/var/lib/libvirt/images/myvm-snap.qcow2 \
--atomic
# List all snapshots
virsh snapshot-list myvm
# Get snapshot details
virsh snapshot-info myvm snap-before-upgrade
# Revert to a snapshot
virsh snapshot-revert myvm snap-before-upgrade
# Delete a snapshot
virsh snapshot-delete myvm snap-before-upgrade
Managing External Snapshot Chains
External snapshots can form chains that look like this:
base.qcow2 ← snap1.qcow2 ← snap2.qcow2 ← snap3.qcow2 (active)
Each file in the chain depends on everything before it. You can inspect the chain with:
# Show the backing chain for a disk image
qemu-img info --backing-chain /var/lib/libvirt/images/myvm-snap3.qcow2
# Example output:
# image: /var/lib/libvirt/images/myvm-snap3.qcow2
# file format: qcow2
# virtual size: 50 GiB (53687091200 bytes)
# disk size: 256 MiB
# backing file: /var/lib/libvirt/images/myvm-snap2.qcow2
# backing file format: qcow2
#
# image: /var/lib/libvirt/images/myvm-snap2.qcow2
# ...
To merge snapshots and reduce the chain (called "block commit"):
# Merge the active layer into its backing file (commit the top layer down)
virsh blockcommit myvm vda --active --pivot --verbose
# Or merge a specific snapshot into its parent
virsh blockcommit myvm vda --top /path/to/snap2.qcow2 --base /path/to/snap1.qcow2 --verbose
Practical KVM Snapshot Script
Here's a script I use for taking quick snapshots before doing anything risky:
#!/bin/bash
# vm-snap.sh — Quick snapshot management for libvirt VMs
set -euo pipefail
VM_NAME="${1:?Usage: vm-snap.sh <vm-name> [take|list|revert|delete] [snap-name]}"
ACTION="${2:-list}"
SNAP_NAME="${3:-manual-$(date +%Y%m%d-%H%M%S)}"
case "$ACTION" in
take)
echo "Taking snapshot '$SNAP_NAME' of VM '$VM_NAME'..."
virsh snapshot-create-as "$VM_NAME" "$SNAP_NAME" \
--description "Manual snapshot $(date)" \
--atomic
echo "Snapshot created. Current snapshots:"
virsh snapshot-list "$VM_NAME"
;;
list)
virsh snapshot-list "$VM_NAME" --tree
;;
revert)
echo "Reverting VM '$VM_NAME' to snapshot '$SNAP_NAME'..."
virsh snapshot-revert "$VM_NAME" "$SNAP_NAME"
echo "Reverted successfully."
;;
delete)
echo "Deleting snapshot '$SNAP_NAME' from VM '$VM_NAME'..."
virsh snapshot-delete "$VM_NAME" "$SNAP_NAME"
echo "Deleted."
;;
*)
echo "Unknown action: $ACTION"
echo "Usage: vm-snap.sh <vm-name> [take|list|revert|delete] [snap-name]"
exit 1
;;
esac
Usage:
chmod +x vm-snap.sh
# Take a snapshot
./vm-snap.sh myvm take before-nginx-upgrade
# List snapshots
./vm-snap.sh myvm list
# Roll back
./vm-snap.sh myvm revert before-nginx-upgrade
# Clean up
./vm-snap.sh myvm delete before-nginx-upgrade
Snapshot Chains and Performance Impact
This is where most people get bitten. A single snapshot has minimal performance impact. But snapshot chains — multiple snapshots stacked on top of each other — can significantly degrade VM performance.
Why Chains Hurt Performance
Every read operation has to traverse the chain from the active layer back to the base image to find the data. With a chain of 5 snapshots, a read might need to check 6 files before finding the data.
Write operations always go to the active layer, so they aren't affected as much. But the metadata overhead still adds up.
Here's a rough guide to how chain depth affects I/O performance:
| Chain Depth | Read Impact | Write Impact | Recommendation |
|---|---|---|---|
| 1 (just one snapshot) | Minimal (< 5%) | Minimal | Fine for days/weeks |
| 2-3 snapshots | Noticeable (5-15%) | Minimal | OK for short-term testing |
| 4-5 snapshots | Significant (15-30%) | Moderate | Consolidate soon |
| 6+ snapshots | Severe (30%+) | Moderate | Consolidate immediately |
These numbers are approximate and depend heavily on your storage backend, I/O patterns, and whether the data is cached. But the trend is clear: keep your chains short.
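Rather than trusting the table, measure on your own hardware: run the same random-read test before and after stacking snapshots and compare the reported IOPS. A sketch with fio (assumes fio is installed in the guest; file path and sizes are arbitrary):

```shell
# Random 4K reads with the page cache bypassed (--direct=1), so the
# numbers reflect the storage path rather than RAM
fio --name=chain-depth-test --rw=randread --bs=4k --size=64M \
    --runtime=10 --time_based --direct=1 \
    --filename=/tmp/fio-chain-test

# Clean up the test file afterwards
rm -f /tmp/fio-chain-test
```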
Monitoring Snapshot Size
Snapshots grow as the VM writes new data. A snapshot that was 0 bytes when created can balloon to gigabytes if the VM is doing heavy I/O.
# Proxmox — check snapshot sizes
qm listsnapshot 100
# libvirt — check snapshot disk usage
virsh domblkinfo myvm vda
qemu-img info /var/lib/libvirt/images/myvm-snap1.qcow2
# ZFS — check snapshot space usage
zfs list -t snapshot -o name,used,refer -r rpool/data
Set up a simple monitoring script to alert you when snapshot storage gets out of hand:
#!/bin/bash
# check-snapshot-growth.sh — Alert if snapshots are using too much space
set -euo pipefail
MAX_SNAPSHOT_GB=50
STORAGE_PATH="/var/lib/libvirt/images"
# Calculate total snapshot overlay size
SNAP_SIZE_BYTES=$(find "$STORAGE_PATH" -name "*-snap*" -type f -printf "%s\n" | paste -sd+ - | bc)
SNAP_SIZE_GB=$(( ${SNAP_SIZE_BYTES:-0} / 1073741824 ))  # default to 0 if no overlays found
if [ "$SNAP_SIZE_GB" -gt "$MAX_SNAPSHOT_GB" ]; then
echo "WARNING: Snapshot storage is using ${SNAP_SIZE_GB}GB (threshold: ${MAX_SNAPSHOT_GB}GB)"
echo "Consider consolidating or deleting old snapshots."
# Optionally send a notification
# curl -s -o /dev/null "https://ntfy.sh/my-homelab-alerts" \
# -d "Snapshot storage: ${SNAP_SIZE_GB}GB exceeds ${MAX_SNAPSHOT_GB}GB threshold"
fi
Automated Snapshot Schedules
Manual snapshots are great for one-off experiments, but you should also have automated snapshots for your important VMs. These act as quick rollback points for unexpected problems.
Proxmox: Scheduled Snapshots with Cron
Proxmox doesn't have built-in snapshot scheduling (its schedule feature is for full backups), but you can easily add it with cron:
# /etc/cron.d/vm-snapshots
# Automated daily snapshots for important VMs
# Runs at 3:00 AM, keeps last 3 daily snapshots
0 3 * * * root /usr/local/bin/proxmox-auto-snapshot.sh
#!/bin/bash
# /usr/local/bin/proxmox-auto-snapshot.sh
# Automated snapshot management for Proxmox VMs
set -euo pipefail
# VMs to snapshot (space-separated VM IDs)
VMS="100 101 102"
# How many daily snapshots to keep
KEEP_DAYS=3
DATE=$(date +%Y%m%d)
SNAP_PREFIX="auto-daily"
for VMID in $VMS; do
SNAP_NAME="${SNAP_PREFIX}-${DATE}"
# Check if VM exists and is not a template
if ! qm status "$VMID" &>/dev/null; then
echo "VM $VMID does not exist, skipping"
continue
fi
echo "Creating snapshot '$SNAP_NAME' for VM $VMID..."
qm snapshot "$VMID" "$SNAP_NAME" --description "Automated daily snapshot"
# Clean up old snapshots
echo "Cleaning up old snapshots for VM $VMID..."
# Feed the loop via process substitution so a prefix with no matches
# doesn't trip `set -o pipefail` (grep exits non-zero on no match)
while read -r line; do
OLD_SNAP=$(echo "$line" | awk '{print $2}')
# Extract date from snapshot name
OLD_DATE=$(echo "$OLD_SNAP" | grep -oP '\d{8}$' || true)
if [ -n "$OLD_DATE" ]; then
AGE_DAYS=$(( ($(date +%s) - $(date -d "$OLD_DATE" +%s)) / 86400 ))
if [ "$AGE_DAYS" -gt "$KEEP_DAYS" ]; then
echo " Deleting old snapshot: $OLD_SNAP (${AGE_DAYS} days old)"
qm delsnapshot "$VMID" "$OLD_SNAP" || true
fi
fi
done < <(qm listsnapshot "$VMID" | grep "$SNAP_PREFIX" || true)
echo "Done with VM $VMID"
done
echo "Snapshot rotation complete."
Libvirt/KVM: Automated Snapshots with Systemd Timers
Systemd timers are more reliable than cron and give you better logging:
# /etc/systemd/system/vm-snapshot.timer
[Unit]
Description=Daily VM snapshot timer
[Timer]
OnCalendar=*-*-* 03:00:00
Persistent=true
RandomizedDelaySec=300
[Install]
WantedBy=timers.target
# /etc/systemd/system/vm-snapshot.service
[Unit]
Description=Take and rotate VM snapshots
After=libvirtd.service
[Service]
Type=oneshot
ExecStart=/usr/local/bin/kvm-auto-snapshot.sh
StandardOutput=journal
StandardError=journal
#!/bin/bash
# /usr/local/bin/kvm-auto-snapshot.sh
# Automated snapshot management for libvirt/KVM VMs
set -euo pipefail
# VMs to snapshot
VMS="webserver database mediaserver"
KEEP_DAYS=3
DATE=$(date +%Y%m%d)
SNAP_PREFIX="auto-daily"
for VM in $VMS; do
SNAP_NAME="${SNAP_PREFIX}-${DATE}"
# Check if VM exists
if ! virsh dominfo "$VM" &>/dev/null; then
echo "VM $VM does not exist, skipping"
continue
fi
echo "Creating snapshot '$SNAP_NAME' for VM '$VM'..."
virsh snapshot-create-as "$VM" "$SNAP_NAME" \
--description "Automated daily snapshot $(date)" \
--atomic
# Clean up old snapshots
# Process substitution avoids tripping pipefail when nothing matches the prefix
while read -r OLD_SNAP; do
[ -z "$OLD_SNAP" ] && continue
OLD_DATE=$(echo "$OLD_SNAP" | grep -oP '\d{8}$' || true)
if [ -n "$OLD_DATE" ]; then
AGE_DAYS=$(( ($(date +%s) - $(date -d "$OLD_DATE" +%s)) / 86400 ))
if [ "$AGE_DAYS" -gt "$KEEP_DAYS" ]; then
echo " Deleting old snapshot: $OLD_SNAP (${AGE_DAYS} days old)"
virsh snapshot-delete "$VM" "$OLD_SNAP" || true
fi
fi
done < <(virsh snapshot-list "$VM" --name | grep "^${SNAP_PREFIX}" || true)
done
echo "Snapshot rotation complete at $(date)"
Enable the timer:
sudo systemctl daemon-reload
sudo systemctl enable --now vm-snapshot.timer
# Verify it's scheduled
systemctl list-timers vm-snapshot.timer
# Check logs after it runs
journalctl -u vm-snapshot.service --since today
Storage Considerations for Snapshot-Heavy Workflows
If you're using snapshots regularly (and you should be), you need to plan your storage accordingly.
Thin Provisioning Is Your Friend
Thin provisioning means the storage backend only allocates space as data is actually written, rather than reserving the full virtual disk size upfront. This is essential for snapshot workflows because snapshots only consume space proportional to the data that's changed.
# Proxmox — check if your storage uses thin provisioning
pvesm status
# Create a thinly provisioned LVM storage
lvcreate -L 500G -T pve/data # Creates a 500GB thin pool
# ZFS is inherently thin-provisioned
zfs create -V 50G rpool/data/vm-100-disk-0
# Only allocates space as data is written
Estimating Snapshot Storage Needs
A rough formula for snapshot storage:
Snapshot storage = Change rate × Retention period × Number of VMs
For example:
- 5 VMs each writing 2GB/day of new data
- Keeping 3 days of snapshots
- Total snapshot overhead: 5 × 2GB × 3 = 30GB
In practice, this varies wildly. Database servers change a lot of data. Web servers serving static content change very little. Monitor your actual usage for a week before setting up retention policies.
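The formula is trivial to script. A minimal sketch using the example numbers above — and, if you're on ZFS, the `written` property tells you the real change rate to plug in instead of guessing:

```shell
# Example inputs — replace with your measured values
NUM_VMS=5
CHANGE_GB_PER_DAY=2
RETENTION_DAYS=3

SNAPSHOT_GB=$((NUM_VMS * CHANGE_GB_PER_DAY * RETENTION_DAYS))
echo "Estimated snapshot overhead: ${SNAPSHOT_GB}GB"   # → Estimated snapshot overhead: 30GB

# On ZFS, measure the actual data written since the last snapshot:
#   zfs get written rpool/data/vm-100-disk-0
```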
Snapshot Storage Best Practices
| Practice | Why |
|---|---|
| Keep snapshots on the same pool as the VM | Required (snapshots are deltas of the base) |
| Monitor pool free space | Snapshots can fill your pool if you're not watching |
| Set up alerts at 80% pool usage | Leave headroom for snapshot growth |
| Don't keep snapshots for more than a few days | Performance degrades and storage accumulates |
| Delete snapshots before taking new ones if space is tight | Deletion merges the delta, freeing space |
| Use ZFS or LVM-thin, not thick provisioning | Thick provisioning wastes space with snapshots |
When to Use Snapshots vs. Full Backups
This is the decision matrix I use:
Use Snapshots When:
- Upgrading software — Snap before, upgrade, test, delete the snap if it works, or roll back if it doesn't
- Testing configuration changes — Especially firewall rules, DNS changes, or anything that might lock you out
- Before running database migrations — Quick rollback if the migration fails
- Quick development iteration — Snap a known-good state, experiment, roll back, repeat
- Before applying OS updates — Kernel updates, library updates, etc.
Use Full Backups When:
- Long-term data protection — Backups survive storage failure, snapshots don't
- Compliance or audit requirements — Even in a homelab, you might want monthly snapshots of certain configs
- Before major infrastructure changes — Moving to new storage, new hypervisor, etc.
- Migration between hosts — Backups are portable, snapshots usually aren't
- Anything you can't afford to lose — If the answer is "I'd be really upset if I lost this," it needs a backup, not just a snapshot
The Ideal Workflow
The best approach combines both:
- Automated daily snapshots for quick rollback (keep 3-7 days)
- Automated weekly full backups to a separate NAS or storage (keep 4-8 weeks)
- Automated monthly backups to offsite/cloud storage (keep 6-12 months)
- Manual snapshots before any risky change (delete after verifying the change)
# Example backup schedule combining snapshots and backups
# /etc/cron.d/vm-protection
# Daily snapshots at 3 AM (keep 3 days)
0 3 * * * root /usr/local/bin/proxmox-auto-snapshot.sh
# Weekly full backups on Sunday at 4 AM (keep 4 weeks)
0 4 * * 0 root /usr/local/bin/proxmox-backup.sh weekly
# Monthly full backups on the 1st at 5 AM (keep 6 months)
0 5 1 * * root /usr/local/bin/proxmox-backup.sh monthly
Snapshot Cleanup and Consolidation
Forgetting to clean up snapshots is the most common mistake. Here's how to stay on top of it.
Manual Cleanup Procedure
# Proxmox — list all snapshots across all VMs
for VMID in $(qm list | awk 'NR>1 {print $1}'); do
echo "=== VM $VMID ==="
qm listsnapshot "$VMID"
done
# Delete a specific snapshot
qm delsnapshot 100 old-snapshot-name
# libvirt — list all snapshots across all VMs
for VM in $(virsh list --all --name); do
# grep -c . counts non-empty lines (virsh emits a trailing blank line
# that would make wc -l overcount)
COUNT=$(virsh snapshot-list "$VM" --name 2>/dev/null | grep -c . || true)
if [ "$COUNT" -gt 0 ]; then
echo "=== $VM ($COUNT snapshots) ==="
virsh snapshot-list "$VM"
fi
done
Automated Cleanup Script
This script finds and removes snapshots older than a specified age:
#!/bin/bash
# snapshot-cleanup.sh — Find and remove old snapshots across all VMs
set -euo pipefail
MAX_AGE_DAYS="${1:-7}"
DRY_RUN="${2:-false}"
echo "Cleaning up snapshots older than $MAX_AGE_DAYS days"
[ "$DRY_RUN" = "true" ] && echo "(DRY RUN — no changes will be made)"
TOTAL_DELETED=0
for VMID in $(qm list 2>/dev/null | awk 'NR>1 {print $1}'); do
# Feed the loop via process substitution, not a pipeline — a piped
# `while` runs in a subshell, so TOTAL_DELETED updates would be lost
while read -r line; do
SNAP_NAME=$(echo "$line" | awk '{print $2}')
[ -z "$SNAP_NAME" ] && continue
# Try to extract date from snapshot name
SNAP_DATE=$(echo "$SNAP_NAME" | grep -oP '\d{4}-?\d{2}-?\d{2}' | head -1 || true)
if [ -z "$SNAP_DATE" ]; then
echo " Skipping $SNAP_NAME (no date in name)"
continue
fi
# Normalize date format
SNAP_DATE=$(echo "$SNAP_DATE" | sed 's/\([0-9]\{4\}\)\([0-9]\{2\}\)\([0-9]\{2\}\)/\1-\2-\3/')
AGE_DAYS=$(( ($(date +%s) - $(date -d "$SNAP_DATE" +%s)) / 86400 ))
if [ "$AGE_DAYS" -gt "$MAX_AGE_DAYS" ]; then
if [ "$DRY_RUN" = "true" ]; then
echo " Would delete: VM $VMID / $SNAP_NAME (${AGE_DAYS} days old)"
else
echo " Deleting: VM $VMID / $SNAP_NAME (${AGE_DAYS} days old)"
qm delsnapshot "$VMID" "$SNAP_NAME" || echo " Failed to delete $SNAP_NAME"
TOTAL_DELETED=$((TOTAL_DELETED + 1))
fi
fi
done < <(qm listsnapshot "$VMID" 2>/dev/null | grep -v "current" || true)
done
echo "Cleanup complete. Deleted $TOTAL_DELETED snapshots."
Usage:
# Dry run — see what would be deleted
./snapshot-cleanup.sh 7 true
# Actually delete snapshots older than 7 days
./snapshot-cleanup.sh 7
# Delete snapshots older than 3 days
./snapshot-cleanup.sh 3
Practical Snapshot Workflow Examples
Let's walk through some real scenarios where snapshots save the day.
Scenario 1: Kernel Upgrade
# 1. Take a snapshot
qm snapshot 100 before-kernel --description "Before kernel 6.x upgrade"
# 2. SSH into the VM and do the upgrade
ssh root@vm100 "apt update && apt upgrade -y && apt install -y linux-image-amd64"
# 3. Reboot and test
ssh root@vm100 "reboot"
# Wait for it to come back...
ssh root@vm100 "uname -r" # Verify new kernel
ssh root@vm100 "systemctl --failed" # Check for broken services
# 4a. Everything works — delete the snapshot
qm delsnapshot 100 before-kernel
# 4b. Something's broken — roll back
qm rollback 100 before-kernel
# VM is back to exactly where it was before the upgrade
Scenario 2: Database Migration
# 1. Take a snapshot (include memory state for a consistent DB snapshot)
virsh snapshot-create-as dbserver before-migration \
--description "Before schema migration v42" \
--atomic
# 2. Run the migration
ssh root@dbserver "cd /opt/myapp && python manage.py migrate"
# 3. Run verification queries
ssh root@dbserver "psql -U myapp -c 'SELECT count(*) FROM users'"
ssh root@dbserver "psql -U myapp -c 'SELECT count(*) FROM orders'"
# 4a. Migration successful — delete snapshot
virsh snapshot-delete dbserver before-migration
# 4b. Migration failed — roll back
virsh snapshot-revert dbserver before-migration
# Database is back to pre-migration state
Scenario 3: Firewall Rule Testing
# 1. Take a snapshot (this is critical for firewall changes!)
qm snapshot 100 before-firewall --description "Before iptables changes"
# 2. Apply firewall rules with a safety net
ssh root@vm100 'bash -s' <<'EOF'
# Schedule a rule flush in 5 minutes in case we lock ourselves out
echo "iptables -F && iptables -P INPUT ACCEPT" | at now + 5 minutes
# Apply the new rules
iptables -A INPUT -p tcp --dport 22 -j ACCEPT
iptables -A INPUT -p tcp --dport 80 -j ACCEPT
iptables -A INPUT -p tcp --dport 443 -j ACCEPT
iptables -A INPUT -j DROP
EOF
# 3. Test connectivity
curl -s -o /dev/null -w "%{http_code}" http://vm100 # Should be 200
# 4a. Rules work — cancel the pending safety flush and persist
ssh root@vm100 'for j in $(atq | cut -f1); do atrm "$j"; done; iptables-save > /etc/iptables/rules.v4'
qm delsnapshot 100 before-firewall
qm delsnapshot 100 before-firewall
# 4b. Locked out — either wait 5 minutes for the flush, or:
qm rollback 100 before-firewall
Tips and Gotchas
Here are the hard-won lessons from years of snapshot use:
1. Never leave manual snapshots around for more than a day or two. You'll forget about them, they'll grow, and eventually you'll wonder why your storage pool is full.
2. Name your snapshots descriptively. snap1 tells you nothing. before-postgresql-16-upgrade-2026-02-09 tells you everything.
3. Don't snapshot VMs with heavy I/O unless you need to. Database servers, for example, generate huge snapshot deltas. Consider stopping the VM briefly for a consistent snapshot, or use application-level backups (like pg_dump) instead.
4. Be careful with snapshot + live migration. Some hypervisors don't support live-migrating a VM that has snapshots. Check your hypervisor's documentation.
5. Monitor your snapshot chain depth. If you see more than 3-4 snapshots in a chain, consolidate. The performance impact is real.
6. Test your rollback procedure before you need it. Don't wait until you're panicking to learn how rollback works. Take a snapshot, make a small change, roll back, verify. Do this in a low-stakes environment first.
7. Document your snapshot policies. Write down which VMs get automated snapshots, how many you keep, and when manual snapshots should be taken. Future you will thank present you.
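For the application-level alternative mentioned in tip 3, a consistent logical dump often beats a disk snapshot for databases. A sketch assuming PostgreSQL; the database name and backup path are just examples:

```shell
# Consistent logical backup, independent of the VM's disk layer.
# -F c = custom format, which supports selective/parallel restore
pg_dump -U myapp -d myapp_production -F c \
    -f /backup/myapp-$(date +%Y%m%d).dump
```

Unlike a disk snapshot, this dump is portable: you can restore it on different storage, a different VM, or even a different PostgreSQL major version via pg_restore.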
Wrapping Up
Snapshots are one of the most powerful tools in your homelab toolkit. They turn risky operations into reversible experiments. But they need to be understood and managed — they're not "set and forget."
The key takeaways:
- Snapshots are for short-term rollback, not long-term protection — pair them with real backups
- Keep chains short — more than 3-4 snapshots in a chain will noticeably impact performance
- Automate both creation and cleanup — use the cron jobs and scripts in this guide
- Always take a snapshot before risky changes — upgrades, migrations, firewall changes, anything
- Name them well and delete them promptly — storage is finite, and forgotten snapshots are a ticking time bomb
Now go break something in your homelab. You've got a snapshot to fall back on.