
Homelab Storage Tiering: NVMe, SSD, and HDD Strategy

Storage · 2026-02-15 · 11 min read
Tags: storage, nvme, ssd, hdd, zfs, lvm, bcache, tiering, performance

By the HomeLab Starter Editorial Team, home lab enthusiasts covering hardware setup, networking, and self-hosted services for home and small office environments.

Most homelabs start with whatever drives are available. An NVMe boot drive, a couple of SSDs from an old desktop, and some spinning disks bought on sale. This works fine until you realize your database is competing with Jellyfin transcodes for the same SATA SSD, your VM snapshots are filling up your fastest storage, and your backup jobs are tanking everything.


Storage tiering solves this by matching workloads to the right media type. NVMe for latency-sensitive work like VMs and databases. SSDs for active data that needs decent throughput. HDDs for bulk storage where capacity matters more than speed. The trick is implementing this in a way that is manageable and lets data flow between tiers without manual intervention.

This guide covers the principles of homelab storage tiering, the technologies available on Linux for implementing it, and practical configurations you can deploy today.


Why Tier Your Storage

The economics are straightforward. As of early 2026, approximate prices per terabyte:

| Media Type | Cost per TB | Sequential Read | Random 4K IOPS | Typical Capacity |
|------------|-------------|-----------------|----------------|------------------|
| NVMe Gen4  | $80-120     | 5-7 GB/s        | 500K-1M        | 1-4 TB           |
| NVMe Gen3  | $60-90      | 2-3.5 GB/s      | 300-500K       | 1-4 TB           |
| SATA SSD   | $50-80      | 500-560 MB/s    | 80-100K        | 1-8 TB           |
| HDD (CMR)  | $15-25      | 150-250 MB/s    | 100-200        | 4-22 TB          |
| HDD (SMR)  | $12-18      | 150-200 MB/s    | 50-100         | 4-18 TB          |

Putting everything on NVMe would cost 4-8x more than a tiered approach. Putting everything on HDDs means your VMs feel sluggish and your databases cannot keep up. A tiered strategy gives each workload the performance it needs while keeping costs reasonable.
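To make the multiple concrete, here is a back-of-envelope comparison using prices from within the ranges above ($110/TB NVMe Gen4, $65/TB SATA SSD, $18/TB CMR HDD). The 40 TB total and the 2/4/34 TB split are illustrative, not a recommendation:

```shell
# Cost of ~40 TB all-NVMe vs. a tiered 2 TB NVMe + 4 TB SSD + 34 TB HDD mix
all_nvme=$(( 40 * 110 ))
tiered=$(( 2 * 110 + 4 * 65 + 34 * 18 ))
ratio=$(awk -v a="$all_nvme" -v t="$tiered" 'BEGIN { printf "%.1f", a / t }')
echo "all-NVMe: \$$all_nvme, tiered: \$$tiered, roughly ${ratio}x"
```

With these example numbers the all-NVMe build costs about four times the tiered one; the exact multiple depends on where current prices fall within the ranges.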

What Goes Where

Tier 1 (NVMe) -- Hot data that demands the lowest possible latency: VM disks, database files, and container volumes with heavy random I/O.

Tier 2 (SATA/NVMe SSD) -- Warm data needing solid throughput but not extreme latency: application storage (Nextcloud, Gitea), container images, and active file shares.

Tier 3 (HDD) -- Cold data where capacity is the priority: media libraries, backups, archives, and snapshots.

Technology Options

Linux offers several technologies for implementing storage tiering. Each has different trade-offs.

LVM Cache (dm-cache)

LVM cache uses device-mapper to front an HDD logical volume with an SSD cache. Reads and writes hit the SSD first. Hot data stays cached; cold data lives on the HDD. This is the most kernel-native option and works with any filesystem.

Pros: Mature (in-kernel since 3.9), works with any filesystem, supports writeback and writethrough modes, integrates with existing LVM setups.

Cons: Setup is more complex than bcache, performance overhead on cache misses, cache device failure in writeback mode can lose data.

Setting Up LVM Cache

Assume you have an HDD volume group vg_data with a logical volume lv_storage and you want to add an SSD cache device /dev/sdb:

# Create a physical volume on the SSD
pvcreate /dev/sdb

# Extend the volume group to include the SSD
vgextend vg_data /dev/sdb

# Create the cache pool (using the SSD's physical extents)
lvcreate --type cache-pool -n cache_pool -l 100%PVS vg_data /dev/sdb

# Attach the cache pool to the existing LV
lvconvert --type cache --cachepool vg_data/cache_pool vg_data/lv_storage

# Verify
lvs -a -o +devices,cache_mode vg_data

By default, LVM cache uses writethrough mode (safe but slower writes). For better write performance:

# Switch to writeback (faster writes, risk data loss if SSD fails)
lvchange --cachemode writeback vg_data/lv_storage

Check cache hit rate:

lvs -o +cache_read_hits,cache_read_misses,cache_write_hits,cache_write_misses vg_data/lv_storage

A healthy cache should show 80%+ hit rate after warming up. If your hit rate is consistently below 50%, your working set is too large for the cache device.
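If you want the hit rate as a single percentage, the raw counters can be combined with a small helper. This is a sketch: the hit_rate function is hypothetical, and the LV name matches the vg_data/lv_storage example above:

```shell
# Turn hit/miss counters into a percentage
hit_rate() {
  awk -v h="$1" -v m="$2" 'BEGIN {
    if (h + m == 0) { print "0.0"; exit }
    printf "%.1f\n", 100 * h / (h + m)
  }'
}

# On a system with an active LVM cache (uncomment to use):
# read -r hits misses < <(lvs --noheadings --separator ' ' \
#     -o cache_read_hits,cache_read_misses vg_data/lv_storage)
# hit_rate "$hits" "$misses"
```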

bcache

bcache is a block-layer caching framework that pairs one or more backing devices (HDDs) with a caching device (SSD). It was designed from the ground up for SSD caching and offers better performance characteristics than dm-cache for many workloads.

Pros: Better sequential I/O handling (does not cache sequential reads by default), efficient writeback implementation, supports multiple backing devices per cache, built-in garbage collection.

Cons: Requires formatting devices with bcache metadata before use (cannot add to existing devices without migration), slightly less mainstream than LVM cache, kernel support varies by distro.

Setting Up bcache

# Install bcache-tools
sudo apt install bcache-tools  # Debian/Ubuntu
sudo dnf install bcache-tools  # Fedora

# Create the cache device (SSD)
make-bcache -C /dev/sdb

# Create the backing device (HDD)
make-bcache -B /dev/sdc

# Attach the cache to the backing device
# Find the cache set UUID
bcache-super-show /dev/sdb | grep cset.uuid
# Attach
echo <cset-uuid> > /sys/block/bcache0/bcache/attach

# Set writeback mode for better write performance
echo writeback > /sys/block/bcache0/bcache/cache_mode

# Create filesystem on the bcache device
mkfs.ext4 /dev/bcache0

# Mount
mount /dev/bcache0 /mnt/tiered-storage

Key tuning parameters:

# Sequential write cutoff (don't cache large sequential writes)
echo 4M > /sys/block/bcache0/bcache/sequential_cutoff

# Writeback percentage (target % of cache used for dirty data)
echo 20 > /sys/block/bcache0/bcache/writeback_percent

# Check cache stats
cat /sys/block/bcache0/bcache/stats_total/cache_hits
cat /sys/block/bcache0/bcache/stats_total/cache_misses

ZFS Special VDevs

ZFS special vdevs are purpose-built for tiered storage within a ZFS pool. A special vdev stores metadata and optionally small data blocks on fast storage (NVMe/SSD) while keeping bulk data on HDDs. This dramatically accelerates directory listings, file lookups, and small file I/O without caching entire datasets.

Pros: Native ZFS integration, no separate cache layer, metadata always on fast storage, configurable small block threshold, redundancy follows pool rules.

Cons: ZFS only, cannot be removed after adding if the pool contains raidz vdevs (top-level vdev removal only works in all-mirror pools), must mirror the special vdev for redundancy.

Creating a Pool with Special VDev

# Create a pool with HDD main vdevs and an SSD special vdev
# Main storage: mirror of two HDDs
# Special vdev: mirror of two SSDs (for metadata + small blocks)
zpool create tank \
  mirror /dev/sda /dev/sdb \
  special mirror /dev/nvme0n1 /dev/nvme1n1

# Set the small block threshold
# Blocks smaller than this go to the special vdev
zfs set special_small_blocks=64K tank

# Verify special vdev placement
zpool status tank

The special_small_blocks property controls which data blocks go to the special vdev. The default is 0 (only metadata). Setting it to 64K means all blocks 64K or smaller also go to the special vdev. For a homelab with lots of small files (documents, code, configs), this provides a significant speedup.
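Because special_small_blocks is a per-dataset property, you can tier selectively. A sketch with hypothetical dataset names; the interaction with recordsize is worth knowing before you set it:

```shell
# Small-file dataset: blocks up to 64K go to the special vdev
zfs create -o special_small_blocks=64K tank/docs

# Media dataset: metadata only (the default of 0)
zfs create -o special_small_blocks=0 tank/media

# Keep special_small_blocks below the dataset's recordsize (default 128K).
# Setting it equal to or above recordsize sends ALL data blocks to the
# special vdev, which will fill it quickly.
zfs get special_small_blocks,recordsize tank/docs
```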

Important: Always mirror your special vdev. If a non-mirrored special vdev fails, the entire pool is lost because metadata is not replicated to the main vdevs.

ZFS SLOG and L2ARC

Beyond special vdevs, ZFS offers two other tiering mechanisms:

SLOG (Separate Log): An NVMe device used for the ZFS Intent Log. It accelerates synchronous writes (NFS, databases with sync=always) by committing the write intent to fast storage before writing to the main pool. This does not speed up async writes.

# Add a SLOG device to an existing pool
zpool add tank log /dev/nvme2n1

# For redundancy, mirror the SLOG
zpool add tank log mirror /dev/nvme2n1 /dev/nvme3n1

L2ARC (Level 2 Adaptive Replacement Cache): An SSD-based read cache that extends the ARC (RAM cache). Useful when your working set exceeds available RAM. L2ARC consumes some RAM for its index (about 70 bytes per cached block), so sizing matters.

# Add an L2ARC device
zpool add tank cache /dev/ssd1

# Check L2ARC hit rate
arc_summary | grep -i l2arc
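The index overhead mentioned above can be estimated with quick arithmetic. A sketch assuming a 1 TiB cache device and a 64 KiB average cached block size (both illustrative; real block sizes depend on your recordsize and workload):

```shell
# RAM consumed by the L2ARC index at ~70 bytes per cached block
l2arc_index_bytes() {  # args: cache_size_bytes avg_block_bytes
  echo $(( ($1 / $2) * 70 ))
}

cache=$(( 1024 * 1024 * 1024 * 1024 ))   # 1 TiB cache device
ram=$(l2arc_index_bytes "$cache" $(( 64 * 1024 )))
echo "$(( ram / 1024 / 1024 )) MiB of RAM for the L2ARC index"
```

At these numbers the index costs a bit over 1 GiB of RAM; smaller average blocks push the overhead up proportionally, which is why an oversized L2ARC can hurt more than help on low-RAM systems.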

Ceph Tiering

If you run a Ceph cluster, you can configure cache tiering where a pool of SSDs acts as a cache tier in front of an HDD pool. Client I/O hits the SSD tier first, and Ceph promotes hot objects and evicts cold ones automatically.

However, Ceph cache tiering has been semi-deprecated upstream due to complexity and edge-case bugs. The Ceph developers recommend using BlueStore's built-in WAL and DB placement on SSDs instead:

# When deploying OSDs, specify WAL and DB on SSD
ceph-volume lvm create \
  --data /dev/sdc \
  --block.wal /dev/nvme0n1p1 \
  --block.db /dev/nvme0n1p2

This places the write-ahead log and RocksDB metadata on NVMe while keeping data on HDDs. It is simpler and more reliable than cache tiering.

Practical Benchmarks

Here are benchmarks from a real homelab setup to illustrate the performance differences. System: Ryzen 5 5600G, 64 GB DDR4, Samsung 980 Pro 1TB NVMe, Samsung 870 EVO 2TB SATA SSD, WD Red Plus 8TB HDD.

Raw Device Performance (fio, 4K random read/write, iodepth=32)

| Device             | Random Read IOPS | Random Write IOPS | Seq Read MB/s | Seq Write MB/s |
|--------------------|------------------|-------------------|---------------|----------------|
| NVMe (980 Pro)     | 620,000          | 510,000           | 6,800         | 4,900          |
| SATA SSD (870 EVO) | 93,000           | 88,000            | 540           | 520            |
| HDD (WD Red Plus)  | 180              | 195               | 195           | 185            |

bcache Performance (SSD caching HDD, writeback mode)

| Workload         | Raw HDD  | bcache (cold) | bcache (warm) |
|------------------|----------|---------------|---------------|
| Random 4K Read   | 180 IOPS | 200 IOPS      | 78,000 IOPS   |
| Random 4K Write  | 195 IOPS | 42,000 IOPS   | 65,000 IOPS   |
| Sequential Read  | 195 MB/s | 195 MB/s      | 195 MB/s      |
| Sequential Write | 185 MB/s | 480 MB/s      | 490 MB/s      |

The cold cache numbers show that cache misses add minimal overhead. Once the cache warms up, random I/O performance approaches SSD levels. Sequential I/O bypasses the cache by default (which is the right behavior -- you do not want a 4K movie file evicting your database pages).

ZFS Special VDev Impact

| Operation                         | HDD-only Pool | HDD + SSD Special VDev |
|-----------------------------------|---------------|------------------------|
| ls -la (10K files dir)            | 2.3s          | 0.08s                  |
| find . -name "*.txt" (100K files) | 18.4s         | 0.6s                   |
| Small file creation (1K files)    | 12.1s         | 0.4s                   |
| Large sequential write (10 GB)    | 185 MB/s      | 185 MB/s               |

The special vdev accelerates metadata-heavy operations by 20-30x while large sequential I/O remains the same (those blocks go to the HDD vdevs).


Designing Your Tier Layout

Small Homelab (1-2 Servers)

Server 1:
├── 500GB NVMe (boot + VM disks + databases)
├── 1TB SATA SSD (Nextcloud, Gitea, container images)
└── 2x 8TB HDD (ZFS mirror: media, backups, archives)
    └── NVMe partition as SLOG + L2ARC

Total cost: approximately $350-450 for storage hardware. The NVMe handles latency-critical workloads, the SSD takes care of general application storage, and the HDD mirror provides bulk capacity with redundancy.

Medium Homelab (3-5 Servers)

Server 1 (Compute):
├── 1TB NVMe (Proxmox VMs)
└── 500GB SSD (ISO storage, templates)

Server 2 (NAS/Storage):
├── 500GB NVMe (ZFS SLOG + special vdev mirror)
├── 2x 2TB SSD (ZFS special vdev mirror + active shares)
└── 4x 8TB HDD (ZFS RAIDZ2: bulk storage)

Server 3 (Backup):
├── 256GB SSD (boot + BorgBackup index)
└── 2x 12TB HDD (ZFS mirror: backup targets)

This separates concerns cleanly. Compute nodes have fast local storage. The NAS server uses ZFS special vdevs to accelerate metadata on a large HDD pool. The backup server is optimized for capacity.

Monitoring Tier Health

Whatever tiering technology you use, monitor these metrics:

# LVM cache hit rate
lvs -o cache_read_hits,cache_read_misses vg_data/lv_storage

# bcache stats
cat /sys/block/bcache0/bcache/stats_five_minute/cache_hit_ratio

# ZFS ARC and L2ARC
arc_summary

# Generic I/O latency per device
iostat -x 1

Set up alerts when:

- The cache hit rate drops below roughly 50% for a sustained period (the working set has outgrown the cache)
- Dirty writeback data stays near the cache's capacity limit
- SSD SMART wear indicators (TBW consumed, percentage used) climb faster than expected
- Per-device I/O latency in iostat spikes on the backing HDDs
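A minimal sketch of a hit-rate alert check for bcache; the 50% threshold follows the guidance earlier in the article, and the output text is a placeholder for whatever notification mechanism you use:

```shell
THRESHOLD=50
check_hit_ratio() {  # arg: hit ratio as an integer percentage
  if [ "$1" -lt "$THRESHOLD" ]; then
    echo "ALERT: cache hit ratio ${1}% below ${THRESHOLD}%"
  else
    echo "OK: cache hit ratio ${1}%"
  fi
}

# On a live bcache system (uncomment to use):
# ratio=$(cat /sys/block/bcache0/bcache/stats_five_minute/cache_hit_ratio)
# check_hit_ratio "$ratio"
```

Dropping a script like this into cron (or wiring the check into your existing monitoring stack) catches the slow degradation that happens as workloads drift.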

Filesystem Considerations

Your choice of filesystem affects tiering options:

ZFS offers the most integrated tiering with special vdevs, SLOG, and L2ARC. If you are building a new storage setup, ZFS gives you the most flexibility. The cost is higher RAM usage (1 GB per TB of storage as a baseline) and the complexity of managing ZFS pools.

ext4 + LVM is the simplest approach for Linux-native tiering. Use LVM cache to accelerate HDD volumes with SSD cache. ext4 is stable, well-understood, and has no special RAM requirements.

Btrfs does not have built-in tiering, but you can use bcache or LVM cache underneath it. Btrfs can use different RAID profiles for metadata and data, but it cannot pin metadata to a specific device, so metadata-on-SSD tiering still requires a caching layer below the filesystem.

XFS works well with LVM cache and bcache. Its B-tree-based design handles large directories efficiently even on HDDs, reducing the urgency of metadata tiering compared to ext4.

Filesystem Tiering Support Matrix

| Feature              | ZFS             | ext4 + LVM    | Btrfs               | XFS + LVM     |
|----------------------|-----------------|---------------|---------------------|---------------|
| Metadata on SSD      | Special vdev    | LVM cache     | Manual RAID profile | LVM cache     |
| Read cache           | L2ARC           | LVM cache     | bcache              | LVM cache     |
| Write acceleration   | SLOG            | LVM writeback | bcache writeback    | LVM writeback |
| Built-in compression | Yes (lz4, zstd) | No            | Yes (zstd)          | No            |
| Snapshots            | Yes (COW)       | LVM snapshots | Yes (COW)           | LVM snapshots |
| Complexity           | High            | Medium        | Medium              | Medium        |

Migration Strategy

If you have an existing homelab and want to add tiering without rebuilding:

  1. Start with monitoring: Run iostat -x 1 during normal operations for a week. Identify which devices are bottlenecked.

  2. Add bcache to bottlenecked HDDs: bcache requires reformatting, so plan a migration window. Back up data, create bcache devices, restore data.

  3. Add LVM cache if already using LVM: This can be done live without reformatting. Create the cache pool on the SSD, convert the existing LV to a cached LV.

  4. For ZFS pools: Add a special vdev at any time (no reformatting needed). New metadata automatically goes to the special vdev. Existing metadata migrates as data is rewritten.

  5. Validate with benchmarks: Run fio before and after to confirm the tiering is having the expected effect.

# Quick benchmark before/after tiering
fio --name=randread --ioengine=libaio --direct=1 --bs=4k \
    --iodepth=32 --rw=randread --numjobs=4 --size=1G \
    --filename=/path/to/tiered/volume/testfile --runtime=60

Common Mistakes

Undersizing the cache: If your SSD cache is 10% the size of your HDD and your working set is 30% of the HDD, the cache will thrash. Size the cache to cover your active working set, typically 15-25% of the backing device.
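The 15-25% guideline above turns into quick arithmetic. A sketch, with sizes in GB (the 8 TB example is arbitrary):

```shell
# Suggested SSD cache size range for a given backing device
cache_range() {  # arg: backing device size in GB
  echo "$(( $1 * 15 / 100 ))-$(( $1 * 25 / 100 )) GB"
}

cache_range 8000   # 8 TB backing HDD -> "1200-2000 GB"
```

If your measured working set is larger than the top of that range, buy a bigger cache device or accept thrashing; no amount of tuning fixes a cache that is simply too small.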

Using writeback without redundancy: Writeback mode keeps dirty data on the SSD before flushing to the HDD. If the SSD dies, you lose that data. Use writeback only when the SSD is enterprise-grade or mirrored.

Ignoring SSD endurance: Caching workloads amplify writes to the SSD. A consumer SSD rated for 600 TBW may last only 1-2 years as a busy write cache. Use enterprise SSDs or high-endurance consumer drives (look for DWPD or TBW ratings).

Caching sequential workloads: Media streaming and backup transfers are sequential. Caching them wastes SSD space and evicts actually-hot data. Configure sequential_cutoff in bcache or let LVM cache's default policy handle it.

Not monitoring cache health: A cache with a 30% hit rate is consuming power and wearing out the SSD for negligible benefit. Monitor hit rates and resize or reconfigure as workloads change.

Storage tiering is one of those homelab optimizations that pays for itself quickly. A $60 SSD in front of your HDD pool can make directory listings instant and database queries fast, while your bulk data still lives on cheap spinning rust. Start with the simplest option that fits your filesystem (LVM cache for ext4/XFS, special vdev for ZFS, bcache for anything) and expand from there.
