Homelab Temperature Monitoring: Sensors, Alerts, and Dashboards
Heat is the enemy of homelab hardware. A CPU that runs 20°C hotter than it should for months on end ages faster. A hard drive kept above 45°C has a shorter lifespan. A GPU left unchecked at 90°C will start thermal throttling and may eventually fail.
Setting up temperature monitoring takes less than an hour and provides visibility into your hardware's thermal health before problems develop.
Reading Sensors with lm-sensors
lm-sensors is the standard Linux tool for reading CPU, motherboard, and SSD temperatures.
Install:
sudo apt install lm-sensors
Detect hardware sensors:
sudo sensors-detect # follow the prompts; pressing Enter accepts each default
This scans your hardware and loads appropriate kernel modules.
Read current temperatures:
sensors
Sample output:
coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +45.0°C (high = +80.0°C, crit = +100.0°C)
Core 0: +42.0°C (high = +80.0°C, crit = +100.0°C)
Core 1: +43.0°C
nvme-pci-0300
Adapter: PCI adapter
Composite: +35.9°C (low = -273.1°C, high = +84.8°C)
k10temp-pci-00c3
Adapter: PCI adapter
Tctl: +50.2°C
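The output depends on your hardware; the sample above combines chips from different machines for illustration, since coretemp and k10temp never appear together. Intel CPUs report per-core temperatures through the coretemp driver, AMD CPUs report through k10temp, and NVMe drives report a composite temperature.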
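For scripting, `sensors -j` prints the same readings as JSON (available in lm-sensors 3.5 and later). The sketch below shows one way to flatten that structure in Python. The helper name `extract_temperatures` and the sample data are illustrative assumptions, and the exact keys depend on your hardware.

```python
import json

def extract_temperatures(sensors_json: dict) -> dict:
    """Flatten `sensors -j` output into {"chip/label": degrees_celsius}."""
    readings = {}
    for chip, features in sensors_json.items():
        for label, values in features.items():
            # Skip non-feature entries such as "Adapter": "ISA adapter".
            if not isinstance(values, dict):
                continue
            for key, value in values.items():
                # Current readings are stored under keys named temp<N>_input.
                if key.startswith("temp") and key.endswith("_input"):
                    readings[f"{chip}/{label}"] = float(value)
    return readings

# Sample data shaped like `sensors -j` output (values are illustrative).
SAMPLE = json.loads("""
{
  "coretemp-isa-0000": {
    "Adapter": "ISA adapter",
    "Package id 0": {"temp1_input": 45.0, "temp1_max": 80.0, "temp1_crit": 100.0},
    "Core 0": {"temp2_input": 42.0, "temp2_max": 80.0, "temp2_crit": 100.0}
  },
  "nvme-pci-0300": {
    "Adapter": "PCI adapter",
    "Composite": {"temp1_input": 35.9, "temp1_max": 84.8}
  }
}
""")

for name, celsius in extract_temperatures(SAMPLE).items():
    print(f"{name}: {celsius:.1f}°C")
```

On a live system, the input could come from `json.loads(subprocess.run(["sensors", "-j"], capture_output=True, text=True, check=True).stdout)` instead of the sample.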
Hard Drive Temperatures with smartmontools
For spinning hard drives, use smartmontools:
sudo apt install smartmontools
# Check temperature of a specific drive
sudo smartctl -A /dev/sda | grep Temperature
Output:
194 Temperature_Celsius ... 35 (Min/Max 18/44)
For bulk monitoring of multiple drives:
for dev in /dev/sd{a,b,c,d}; do
  # Attribute 194 is Temperature_Celsius; column 10 holds the raw value
  temp=$(sudo smartctl -A "$dev" | awk '$1 == 194 {print $10}')
  echo "$dev: ${temp}°C"
done
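If you would rather process the readings in a script, the same column extraction can be sketched in Python. This is a minimal example that assumes the drive reports its temperature as SMART attribute 194 (some drives use a different attribute); the function name and the sample text are illustrative, not real drive output.

```python
def parse_smart_temperature(smartctl_output: str):
    """Return the raw value of SMART attribute 194, or None if it is absent."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns; the 10th is RAW_VALUE.
        if len(fields) >= 10 and fields[0] == "194":
            return int(fields[9])
    return None

# Illustrative excerpt in the layout `smartctl -A` prints for ATA drives.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       4321
194 Temperature_Celsius     0x0022   035   044   000    Old_age   Always       -       35 (Min/Max 18/44)
"""

print(parse_smart_temperature(SAMPLE))
```

Returning `None` rather than raising lets a monitoring loop skip drives that do not expose the attribute.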
GPU Temperature (NVIDIA)
nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader
# Example output: 65
For fan speed and power:
nvidia-smi --query-gpu=temperature.gpu,fan.speed,power.draw --format=csv
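To feed these values into a script, it helps to add `noheader,nounits` to the format so each line holds bare numbers. The sketch below parses that output in Python. The function names and the sample lines are assumptions for illustration, and `[N/A]` stands in for what a GPU without a reportable fan may print.

```python
import csv
import io

FIELDS = ["temperature.gpu", "fan.speed", "power.draw"]

def _to_number(value: str):
    """Convert a CSV cell to float, or None when the field is not numeric."""
    try:
        return float(value)
    except ValueError:
        return None  # e.g. "[N/A]" for a field the GPU does not report

def parse_gpu_csv(text: str) -> list:
    """Parse the output of
    nvidia-smi --query-gpu=temperature.gpu,fan.speed,power.draw --format=csv,noheader,nounits
    into one dict per GPU."""
    rows = []
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not row:
            continue  # ignore blank lines
        rows.append(dict(zip(FIELDS, (_to_number(cell) for cell in row))))
    return rows

# Illustrative output for a two-GPU machine.
SAMPLE = "65, 45, 120.50\n70, [N/A], 95.10\n"

for index, gpu in enumerate(parse_gpu_csv(SAMPLE)):
    print(index, gpu)
```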
GPU Temperature (AMD)
cat /sys/class/drm/card0/device/hwmon/hwmon*/temp1_input
# Output in millidegrees: 65000 = 65°C
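The millidegree conversion is easy to get wrong in ad hoc scripts, so here is a small Python sketch of it. It assumes the sysfs layout shown above; the function names are illustrative, and the `card0` path in the usage note may differ on your system.

```python
from pathlib import Path

def millidegrees_to_celsius(raw: str) -> float:
    """Convert a hwmon temp*_input value (millidegrees Celsius) to °C."""
    return int(raw.strip()) / 1000.0

def read_hwmon_temps(device_dir: str) -> dict:
    """Read every temp*_input file under <device_dir>/hwmon/hwmon*/."""
    temps = {}
    for path in sorted(Path(device_dir).glob("hwmon/hwmon*/temp*_input")):
        temps[path.name] = millidegrees_to_celsius(path.read_text())
    return temps

print(millidegrees_to_celsius("65000\n"))  # 65.0
```

Calling `read_hwmon_temps("/sys/class/drm/card0/device")` on a machine with an AMD GPU should return a mapping from file names such as `temp1_input` to degrees Celsius.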
Alternatively, the sensors command reports the GPU temperature when the amdgpu kernel module is loaded.
Prometheus Node Exporter for Continuous Monitoring
For continuous monitoring integrated with Prometheus and Grafana, use node_exporter:
services:
  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    restart: unless-stopped
    network_mode: host
    pid: host
    volumes:
      - /:/host:ro,rslave
    command:
      - '--path.rootfs=/host'
      - '--collector.hwmon' # enables temperature sensors
      - '--collector.diskstats'
      - '--collector.filesystem'
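The hwmon collector (enabled by default on Linux, so the flag simply makes it explicit) reads the same kernel hwmon data that lm-sensors uses and exposes it as Prometheus metrics. Chip labels are derived from sysfs device names, so they differ from the names that sensors prints:
node_hwmon_temp_celsius{chip="platform_coretemp_0",sensor="temp1"} 45.0
node_hwmon_fan_rpm{chip="platform_nct6775_656",sensor="fan1"} 1200.0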
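Before wiring up Prometheus, it can be useful to confirm which hwmon series your exporter actually publishes, because chip and sensor label values vary by system. The following is a minimal sketch that filters a `/metrics` body for `node_hwmon_*` samples. It is a simplified parser (it does not handle escaped quotes or braces inside label values), and the sample text is illustrative.

```python
import re

# Matches lines such as: node_hwmon_temp_celsius{chip="x",sensor="temp1"} 45.0
_LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
_LABEL = re.compile(r'(\w+)="([^"]*)"')

def parse_hwmon_metrics(text: str) -> list:
    """Return (name, labels, value) for every node_hwmon_* sample in a /metrics body."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        match = _LINE.match(line)
        if match and match.group("name").startswith("node_hwmon_"):
            labels = dict(_LABEL.findall(match.group("labels")))
            samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples

# Illustrative excerpt of a node_exporter /metrics response.
SAMPLE = """\
# HELP node_hwmon_temp_celsius Hardware monitor for temperature (input)
# TYPE node_hwmon_temp_celsius gauge
node_hwmon_temp_celsius{chip="platform_coretemp_0",sensor="temp1"} 45.0
node_hwmon_fan_rpm{chip="platform_nct6775_656",sensor="fan1"} 1200.0
node_load1 0.42
"""

for name, labels, value in parse_hwmon_metrics(SAMPLE):
    print(name, labels, value)
```

The text to parse would normally come from `http://your-server-ip:9100/metrics`, the same endpoint Prometheus scrapes.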
Prometheus Configuration
Add the node exporter as a scrape target:
# prometheus.yml
scrape_configs:
  - job_name: 'homelab-node'
    static_configs:
      - targets: ['your-server-ip:9100']
Grafana Temperature Dashboard
A useful dashboard configuration for CPU and drive temperatures:
CPU Core Temperature (average):
avg(node_hwmon_temp_celsius{chip=~".*coretemp.*"})
CPU Core Temperature (max):
max(node_hwmon_temp_celsius{chip=~".*coretemp.*"})
NVMe Temperature:
node_hwmon_temp_celsius{chip=~"nvme.*", sensor="temp1"}
Fan Speeds:
node_hwmon_fan_rpm{chip=~".*", sensor=~"fan.*"}
Use a gauge panel for current temperatures and a time series panel for historical trends.
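The same PromQL expressions can be checked outside Grafana through Prometheus's HTTP API, which helps when a panel shows no data. The sketch below uses only the Python standard library. It assumes Prometheus is reachable at a base URL such as `http://localhost:9090` (the default port), and the helper names and sample response values are illustrative.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for Prometheus's /api/v1/query endpoint."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"

def parse_instant_vector(response: dict) -> list:
    """Extract (labels, value) pairs from an instant-query response."""
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response.get('error', 'unknown error')}")
    return [(item["metric"], float(item["value"][1]))
            for item in response["data"]["result"]]

def query(base_url: str, promql: str) -> list:
    """Run an instant query against a live Prometheus server."""
    with urlopen(build_query_url(base_url, promql), timeout=10) as resp:
        return parse_instant_vector(json.load(resp))

# Response shaped like Prometheus's JSON reply (values are illustrative).
SAMPLE_RESPONSE = {
    "status": "success",
    "data": {"resultType": "vector", "result": [
        {"metric": {"chip": "nvme_nvme0", "sensor": "temp1"},
         "value": [1700000000.0, "35.85"]},
    ]},
}

print(parse_instant_vector(SAMPLE_RESPONSE))
```

Against a running server, `query("http://localhost:9090", "node_hwmon_temp_celsius")` lists every temperature series along with its chip and sensor labels, which is a quick way to find the right selectors.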
Import a Pre-Built Dashboard
Grafana's dashboard library includes several node exporter dashboards with temperature support. Dashboard ID 1860 ("Node Exporter Full") includes temperature panels.
- Grafana → Dashboards → Import
- Enter ID: 1860
- Select your Prometheus data source
- Click Import
You may need to customize selectors for your specific sensor names.
Alertmanager Rules for Thermal Alerts
Define Prometheus alerting rules that fire when temperatures exceed safe thresholds; Alertmanager then routes the resulting notifications:
# prometheus/rules/temperature.yml (load it through rule_files in prometheus.yml)
groups:
  - name: temperature
    rules:
      - alert: CPUHighTemperature
        # "by (instance)" keeps the instance label that the summary uses
        expr: max by (instance) (node_hwmon_temp_celsius{chip=~".*coretemp.*"}) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "CPU temperature high on {{ $labels.instance }}"
          description: "CPU temperature is {{ $value }}°C for 5+ minutes"
      - alert: CPUCriticalTemperature
        expr: max by (instance) (node_hwmon_temp_celsius{chip=~".*coretemp.*"}) > 95
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "CPU temperature critical on {{ $labels.instance }}"
      - alert: DiskHighTemperature
        expr: node_hwmon_temp_celsius{chip=~"nvme.*"} > 70
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "NVMe drive temperature high"
Send alerts to Discord, Slack, or email via Alertmanager's notification integrations.
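If none of the built-in integrations fits, Alertmanager can also POST alerts as JSON to a generic webhook receiver. The sketch below shows how such a payload might be turned into short text messages. The function name and message format are illustrative choices, and the sample payload contains only the fields this example reads.

```python
def format_alerts(payload: dict) -> list:
    """Turn an Alertmanager webhook payload into one text line per alert."""
    lines = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        status = alert.get("status", "unknown").upper()
        severity = labels.get("severity", "none")
        # Fall back to the alert name when no summary annotation is set.
        summary = annotations.get("summary", labels.get("alertname", "unnamed alert"))
        lines.append(f"[{status}] ({severity}) {summary}")
    return lines

# Trimmed payload in the shape Alertmanager sends to webhook receivers.
SAMPLE_PAYLOAD = {
    "version": "4",
    "status": "firing",
    "alerts": [{
        "status": "firing",
        "labels": {"alertname": "CPUHighTemperature", "severity": "warning",
                   "instance": "your-server-ip:9100"},
        "annotations": {"summary": "CPU temperature high on your-server-ip:9100"},
    }],
}

for line in format_alerts(SAMPLE_PAYLOAD):
    print(line)
```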
Safe Temperature Ranges
Guidelines by component type:
| Component | Normal | Warning | Critical |
|---|---|---|---|
| Intel CPU (idle) | 30-45°C | 70°C+ | 90°C+ |
| Intel CPU (load) | 50-75°C | 85°C+ | 95°C+ |
| AMD Ryzen (load) | 60-80°C | 90°C+ | 95°C+ |
| NVMe SSD | 30-45°C | 65°C+ | 75°C+ |
| SATA SSD | 25-40°C | 60°C+ | 70°C+ |
| HDD | 30-40°C | 45°C+ | 50°C+ |
| GPU (gaming load) | 65-80°C | 85°C+ | 90°C+ |
These are general guidelines. Check your specific component's datasheet for rated operating temperatures.
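For scripts that check readings outside Prometheus, the table translates directly into a lookup. This is a minimal sketch that copies the warning and critical columns from the table above; the names `THRESHOLDS` and `classify` are illustrative, and the numbers should be adjusted to your components' datasheets.

```python
# (warning, critical) thresholds in °C, copied from the guideline table.
THRESHOLDS = {
    "Intel CPU (idle)": (70, 90),
    "Intel CPU (load)": (85, 95),
    "AMD Ryzen (load)": (90, 95),
    "NVMe SSD": (65, 75),
    "SATA SSD": (60, 70),
    "HDD": (45, 50),
    "GPU (gaming load)": (85, 90),
}

def classify(component: str, celsius: float) -> str:
    """Return "ok", "warning", or "critical" for a reading."""
    warning, critical = THRESHOLDS[component]
    if celsius >= critical:
        return "critical"
    if celsius >= warning:
        return "warning"
    return "ok"

for component, reading in [("HDD", 38), ("HDD", 47), ("NVMe SSD", 76)]:
    print(f"{component} at {reading}°C: {classify(component, reading)}")
```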
Fan Control with fancontrol
fancontrol (from the lm-sensors project; Debian and Ubuntu ship it as the separate fancontrol package) lets you set PWM fan curves based on temperature:
# Generate a fancontrol config
sudo pwmconfig
# Start fan control daemon
sudo systemctl enable --now fancontrol
pwmconfig interactively guides you through mapping fans to temperature sensors and setting temperature curves. The generated config file at /etc/fancontrol defines:
- Which temperature sensor controls which fan
- Temperature thresholds for fan speeds
- Min/max fan speeds
This prevents fans from running at full speed constantly (noisy) while ensuring adequate cooling at high temperatures.
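To make the idea of a fan curve concrete, here is a simplified Python illustration of a linear ramp between a minimum and maximum temperature. It is not fancontrol's exact algorithm (the real daemon also honors settings such as MINSTART and MINSTOP); the function name and the example numbers are assumptions chosen for illustration.

```python
def fan_pwm(temp_c: float, min_temp: float, max_temp: float,
            min_pwm: int = 0, max_pwm: int = 255) -> int:
    """Map a temperature to a PWM duty value (0-255) with a linear ramp."""
    if min_temp >= max_temp:
        raise ValueError("min_temp must be below max_temp")
    if temp_c <= min_temp:
        return min_pwm  # cool enough: run at the quiet floor
    if temp_c >= max_temp:
        return max_pwm  # hot: run at full speed
    fraction = (temp_c - min_temp) / (max_temp - min_temp)
    return round(min_pwm + fraction * (max_pwm - min_pwm))

# Example curve: quiet floor below 40°C, full speed from 70°C upward.
for temp in (30, 40, 55, 70, 80):
    print(temp, fan_pwm(temp, min_temp=40, max_temp=70, min_pwm=55))
```

A nonzero floor such as `min_pwm=55` reflects that many fans stall at very low duty cycles, which is why pwmconfig tests each fan before writing the config.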
Ambient Temperature Monitoring
For rack cooling verification, add an ambient temperature sensor:
- USB temperature sensor: Generic USB HID thermometers work with usb-sensors or custom scripts.
- ESPHome / Tasmota sensor: Attach a DHT22 or DS18B20 sensor to an ESP8266 and expose it as an MQTT or HTTP endpoint. Read it from Prometheus with a custom exporter.
- Home Assistant integration: If you run HA, expose your homelab rack temperature as an entity and pull it into Prometheus via the HA metrics endpoint.
Tracking ambient temperature alongside CPU and drive temperatures helps diagnose cooling issues: if ambient rises and CPU temps follow, you have airflow or cooling problems in the room, not the server.
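A custom exporter for an ambient sensor can be very small. The sketch below serves a single gauge in the Prometheus text format using only the Python standard library. The metric name `homelab_ambient_temperature_celsius`, port 9101, and the placeholder `read_ambient_celsius` function are all assumptions; replace the placeholder with a real read from your sensor.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

def render_metrics(ambient_celsius: float, location: str = "rack") -> str:
    """Render one gauge in the Prometheus text exposition format."""
    return (
        "# HELP homelab_ambient_temperature_celsius Ambient temperature near the rack.\n"
        "# TYPE homelab_ambient_temperature_celsius gauge\n"
        f'homelab_ambient_temperature_celsius{{location="{location}"}} {ambient_celsius}\n'
    )

def read_ambient_celsius() -> float:
    """Placeholder: replace with a real read from your sensor (MQTT, HTTP, USB)."""
    return 24.5

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Every path returns the metrics; Prometheus scrapes /metrics by default.
        body = render_metrics(read_ambient_celsius()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def serve(port: int = 9101) -> None:
    """Serve metrics until interrupted. The port is an arbitrary choice."""
    HTTPServer(("", port), MetricsHandler).serve_forever()

print(render_metrics(24.5), end="")
```

Calling `serve()` starts the endpoint; adding `your-server-ip:9101` as another scrape target then lets you graph ambient temperature next to the node_exporter series.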
