Proxmox VE Clustering for High Availability
A single Proxmox server is a great foundation for a homelab. Two or three Proxmox servers in a cluster unlock the features that make virtualization genuinely powerful: live migration (move running VMs between hosts without downtime), high availability (VMs restart automatically on a surviving node when a host fails), and centralized management of all your virtualization infrastructure from one web interface.
Proxmox VE clustering is built on mature Linux technologies — Corosync for cluster communication, a distributed configuration filesystem (pmxcfs), and the Proxmox HA manager for failover. It works on commodity hardware, doesn't require matching configurations across nodes, and the setup takes about 15 minutes once you understand the requirements.

Prerequisites
Before creating a cluster, make sure your environment meets these requirements:
Network:
- All nodes must be able to reach each other on a dedicated cluster network (ideally a separate NIC or VLAN)
- Low latency between nodes (<2ms round trip). Nodes across a WAN link won't work reliably
- The cluster network carries Corosync heartbeats. If it goes down, the cluster assumes a node is dead
Hostnames and DNS:
- Each node needs a unique hostname
- Hostnames must resolve correctly. Check /etc/hosts on each node: the hostname must NOT resolve to 127.0.0.1. It must point to the node's real cluster network IP
Time:
- All nodes must have synchronized time. Install and configure chrony or NTP on every node
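A minimal way to cover this on Debian-based Proxmox nodes (a sketch; chrony is usually already present on recent Proxmox versions):
# On every node: install chrony and confirm the clock is actually synchronized
apt install chrony
chronyc tracking
timedatectl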
Fresh or compatible installations:
- Joining a node to a cluster replaces its existing Proxmox configuration (/etc/pve) with the cluster's. The joining node must not have any VMs or containers configured; migrate or back them up and remove them before joining. Plan accordingly.
Example Setup
For this guide:
- pve1: 192.168.1.101 (cluster creation node)
- pve2: 192.168.1.102
- pve3: 192.168.1.103
- Cluster network: same subnet, dedicated VLAN preferred
Verify /etc/hosts on each node:
192.168.1.101 pve1
192.168.1.102 pve2
192.168.1.103 pve3
Do NOT have entries like 127.0.1.1 pve1 — this causes Corosync binding issues.
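A quick way to confirm resolution is correct on each node (hostnames and IPs here are from the example setup above):
# Should print the node's cluster IP (e.g. 192.168.1.101), not 127.x.x.x
hostname --ip-address
getent hosts pve1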
Creating the Cluster
On the first node (pve1), create the cluster:
pvecm create homelab-cluster
That's it. One command. Verify it:
pvecm status
You should see a cluster with one node. The web UI (https://192.168.1.101:8006) now shows the cluster name.
Specifying the Cluster Network
If you have a dedicated cluster network interface, specify it during creation:
pvecm create homelab-cluster --link0 192.168.10.101
This binds Corosync to the dedicated interface. For redundancy, add a second link:
pvecm create homelab-cluster --link0 192.168.10.101 --link1 192.168.1.101
Dual links mean the cluster survives the failure of one network path.
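Once the cluster exists, you can check which links Corosync is actually using; corosync-cfgtool ships with Corosync and reports per-link status:
# Show the local node ID and the status of each Corosync link
corosync-cfgtool -s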
Joining Nodes
On pve2 and pve3, join the cluster by pointing at any existing cluster member:
# On pve2
pvecm add 192.168.1.101
# On pve3
pvecm add 192.168.1.101
You'll be prompted for the root password of the target node. After joining, verify:
pvecm status
All three nodes should appear in the output with their quorum votes. The web UI on any node now shows all three nodes and their VMs/containers.
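pvecm also has a nodes subcommand that prints a compact membership list, which is handy for a quick check after each join:
# List cluster members, their node IDs, and their votes
pvecm nodes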
If Joining Fails
Common issues:
- SSH key conflicts: Clear /root/.ssh/known_hosts entries for the cluster nodes and retry (see the sketch after this list)
- Hostname resolution: Double-check /etc/hosts on all nodes
- Firewall: Ports 8006 (web UI), 5405-5412/udp (Corosync), 22 (SSH), and 60000-60050 (live migration) must be open between nodes
- Time skew: Large clock differences between nodes can break certificate validation and cluster communication, so fix time sync before retrying
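For the SSH key case, stale entries can be removed per host; a minimal sketch using the hostnames and IPs from the example setup:
# On the joining node, drop any old host keys for existing cluster members
ssh-keygen -R 192.168.1.101 -f /root/.ssh/known_hosts
ssh-keygen -R pve1 -f /root/.ssh/known_hosts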
Quorum
A cluster with three nodes requires at least two nodes to be online to have quorum (a majority). Without quorum, the cluster becomes read-only to prevent split-brain scenarios.
- 3 nodes: Can tolerate 1 node failure
- 2 nodes: No fault tolerance (losing either node loses quorum). This is problematic — see below
- 5 nodes: Can tolerate 2 failures
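If you ever lose quorum and need to administer the last surviving node, you can temporarily lower the expected vote count. This is a runtime change and should only be used when you are certain the missing nodes are really down:
# Tell Corosync to treat a single vote as quorate so the node is writable again
pvecm expected 1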
The Two-Node Problem
Two Proxmox nodes can form a cluster, but if either goes down, the survivor doesn't have quorum and HA won't function. Solutions:
- QDevice (Corosync QNetd): Add a lightweight third vote from a Raspberry Pi or any Linux machine. It doesn't run Proxmox; it just provides the tie-breaking vote.
# On the QDevice host (any small Linux box)
sudo apt install corosync-qnetd
# On every Proxmox node
apt install corosync-qdevice
# On one Proxmox node, register the QDevice
pvecm qdevice setup 192.168.1.200
- Three nodes: Even a modest third node (a mini PC or old laptop running Proxmox) provides genuine three-way quorum.
For homelabs, the QDevice approach is popular because it doesn't require a third full server.
Shared Storage for Live Migration and HA
Live migration and HA require that VM disk images are accessible from all nodes simultaneously. A VM can only move to another node if that node can access the same disk.
Options for shared storage:
NFS
The simplest option. Export a directory from your NAS and add it as storage on all Proxmox nodes:
Datacenter > Storage > Add > NFS
Server: 192.168.1.50
Export: /mnt/pool/proxmox
Content: Disk image, ISO image, Container template
NFS works well for homelab clusters. Performance is adequate for most workloads, and setup is trivial.
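The same storage can be added from the shell with pvesm instead of the web UI; a sketch using the example NAS above (the storage ID nas-vmstore is arbitrary):
# Add the NFS export as storage available to the whole cluster
pvesm add nfs nas-vmstore --server 192.168.1.50 --export /mnt/pool/proxmox --content images,iso,vztmpl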
Ceph (Built Into Proxmox)
Proxmox includes Ceph integration. Each node contributes local disks to a distributed storage pool. No external NAS needed. This is the most "proper" solution but requires:
- At least 3 nodes (for replication)
- Dedicated SSDs or disks on each node for Ceph OSDs
- A dedicated network for Ceph traffic (10 GbE recommended)
For a three-node homelab cluster, Ceph with SSDs provides excellent performance and redundancy. Setup is done through the Proxmox web UI under Datacenter > Ceph.
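There is also a pveceph CLI if you prefer the shell; the flow looks roughly like this (a sketch: the 10.10.10.0/24 network and /dev/sdb device are placeholders for your own Ceph network and disks):
# On every node: install the Ceph packages
pveceph install
# On one node: initialize Ceph with the dedicated storage network
pveceph init --network 10.10.10.0/24
# On each node: create a monitor and an OSD on a dedicated disk
pveceph mon create
pveceph osd create /dev/sdb
# Create a replicated pool to hold VM disks
pveceph pool create vm-ceph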
iSCSI
Presents a block device from your NAS to all Proxmox nodes. Better raw performance than NFS for I/O-intensive VMs. More complex to set up but well-supported by Proxmox.
ZFS over iSCSI
If your NAS runs ZFS, you can expose ZFS volumes as iSCSI targets. Proxmox has a dedicated storage plugin for this.
Enabling High Availability
With shared storage in place, enabling HA for a VM or container is straightforward.
Via the Web UI
- Select a VM or container
- Go to More > Manage HA
- Set the HA group and priority
- Choose the requested state (started, stopped, disabled)
Via the Command Line
# Add VM 100 to HA with max_restart of 3
ha-manager add vm:100 --state started --max_restart 3 --max_relocate 1
# Check HA status
ha-manager status
HA Groups
HA groups define which nodes a VM can run on and their priority:
# Create a group
ha-manager groupadd preferred-nodes --nodes pve1,pve2 --nofailback 0
- nodes — Which cluster nodes are eligible to host VMs in this group
- nofailback — If set to 0, VMs migrate back to the preferred node when it recovers. If 1, they stay where they are after failover
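To attach an existing HA resource to the group (VM 100 was added to HA earlier), update it with the group option:
# Restrict the HA resource for VM 100 to the preferred-nodes group
ha-manager set vm:100 --group preferred-nodes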
What Happens During a Node Failure
- Corosync detects the node is unreachable (after ~30 seconds of missed heartbeats)
- The HA manager on a surviving node takes over management responsibility
- The failed node is fenced (more on this below)
- HA-managed VMs from the failed node are restarted on surviving nodes
- Restart happens in priority order with configurable delays
Total failover time is typically 1-3 minutes, depending on fencing method and VM boot time.
Fencing
Fencing ensures that a failed node is truly stopped before its VMs are started elsewhere. Without fencing, you risk two copies of the same VM running simultaneously, which corrupts data.
Proxmox supports several fencing methods:
Hardware Watchdog (Recommended for Homelabs)
Most server hardware has an IPMI/iDRAC/iLO watchdog timer. If the node stops refreshing the watchdog, the hardware forces a reboot.
# Check if a hardware watchdog is available
ls /dev/watchdog*
# Proxmox uses the softdog module as a fallback
Proxmox configures the HA manager to use a watchdog by default. If the HA manager on a node loses cluster communication, the watchdog triggers a reboot after a timeout, ensuring the node doesn't continue running VMs that are being started elsewhere.
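If your board does expose a hardware watchdog, you can tell the HA stack to load that driver instead of softdog. The module name below (iTCO_wdt, common on Intel boards) is an example; check which driver matches your hardware, and reboot the node afterwards:
# /etc/default/pve-ha-manager
WATCHDOG_MODULE=iTCO_wdt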
IPMI Fencing
For more reliable fencing, configure IPMI so surviving nodes can force-power-off the failed node:
# Test IPMI connectivity
ipmitool -I lanplus -H 192.168.1.201 -U admin -P password power status
Configure in /etc/pve/ha/fence.cfg:
device ipmi pve1 {
cmd "ipmitool -I lanplus -H 192.168.1.201 -U admin -P password power off"
}
Live Migration
With shared storage, you can move running VMs between nodes with zero downtime:
Via the Web UI
Right-click a VM > Migrate > Select target node > Migrate
Via the Command Line
# Live migrate VM 100 to pve2
qm migrate 100 pve2 --online
Live migration copies the VM's RAM contents to the target node while it continues running, then switches over in the final milliseconds. The VM experiences a brief pause (typically under 100ms) during the switchover.
Requirements for live migration:
- Shared storage for the VM's disk
- Sufficient RAM on the target node
- Same CPU vendor (Intel to Intel, AMD to AMD). Mixed CPU generations work with the VM's CPU type set to a common baseline (e.g., x86-64-v2-AES); see the example after this list
- Network connectivity between nodes on the migration network (ports 60000-60050)
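Setting the baseline CPU type is a one-line change per VM (the VM needs to be powered off and on again for it to take effect):
# Use a generic CPU model so VM 100 can migrate between different CPU generations
qm set 100 --cpu x86-64-v2-AES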
Cluster Network Best Practices
Separate cluster traffic from VM traffic. Corosync heartbeats are small but latency-sensitive. If your cluster network shares bandwidth with a large VM backup or migration, missed heartbeats can trigger false failovers.
Use a dedicated VLAN or physical NIC for Corosync. Even a separate 1 GbE link dedicated to cluster traffic is better than sharing a 10 GbE link with everything else.
Use link bonding for redundancy. A single network cable failure shouldn't partition your cluster. Bond two interfaces or use dual Corosync links.
Set up a dedicated migration network. Under Datacenter > Options > Migration Settings, specify a network for live migration traffic. This prevents large migrations from saturating your cluster or production network.
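The migration network can also be set by editing /etc/pve/datacenter.cfg directly; the 192.168.20.0/24 subnet below is a placeholder for whatever network you dedicate to migration traffic:
# /etc/pve/datacenter.cfg
migration: secure,network=192.168.20.0/24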
Maintenance
Removing a Node
If you need to permanently remove a node:
- Migrate all VMs and containers off the node
- Remove HA resources from that node
- Shut down the node being removed (it must not come back online with its old cluster configuration)
- On a remaining node, remove it from the cluster:
pvecm delnode NODENAME
Updating the Cluster
Update nodes one at a time. Migrate VMs off a node, update it, reboot, verify it rejoins the cluster, then move to the next node. This rolling update approach keeps your services available throughout.
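In practice, a rolling update of one node looks roughly like this (a sketch; VM 100 and the node names come from the example setup):
# 1. Move guests off the node being updated (repeat per VM/CT)
qm migrate 100 pve2 --online
# 2. Update and reboot the node
apt update && apt dist-upgrade
reboot
# 3. Once it is back, confirm it rejoined the cluster before updating the next node
pvecm status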
A Proxmox cluster transforms your homelab from "a couple of servers" into genuine infrastructure. VMs survive hardware failures, maintenance doesn't require downtime, and you manage everything from a single interface. The setup is straightforward enough to complete in an afternoon, and the operational benefits are immediate.