Prometheus and Grafana: Metrics Monitoring for the Homelab

Monitoring · 2026-03-04 · 3 min read
Tags: prometheus, grafana, monitoring, metrics, homelab, docker, node-exporter, cadvisor, dashboards
By HomeLab Starter Editorial Team — Home lab enthusiasts covering hardware setup, networking, and self-hosted services for home and small office environments.

Uptime monitoring (is it up?) and metrics monitoring (how is it performing?) answer different questions. Uptime Kuma handles the first. Prometheus and Grafana handle the second: CPU usage over time, memory trends, disk I/O, network throughput, and container resource consumption. When something is slow or degrading, metrics tell you where.


Architecture

Prometheus: Scrapes metrics from targets (exporters) at configurable intervals. Stores time-series data. Evaluates alert rules.
Exporters: Translate system/service metrics into Prometheus format. Node Exporter for Linux system metrics; cAdvisor for Docker container metrics; SNMP Exporter for network devices.
Grafana: Queries Prometheus and visualizes the results in dashboards. Pre-built dashboards exist for most common exporters.
Alertmanager: Routes alerts from Prometheus to email, Slack, PagerDuty, etc.
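
To make "Prometheus format" concrete: an exporter's /metrics endpoint returns plain text with one sample per line, in the form name{label="value"} number. The sketch below renders one such line. The function name render_sample and the sample value are illustrative only, and real exporters use a client library rather than string formatting.

```python
def render_sample(name: str, labels: dict[str, str], value: float) -> str:
    """Render one sample in the Prometheus text exposition format."""

    def esc(v: str) -> str:
        # Label values escape backslash, double quote, and newline.
        return v.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

    if not labels:
        return f"{name} {value}"
    pairs = ",".join(f'{k}="{esc(v)}"' for k, v in sorted(labels.items()))
    return f"{name}{{{pairs}}} {value}"


# Hypothetical value: seconds that CPU 0 has spent idle since boot.
line = render_sample("node_cpu_seconds_total", {"cpu": "0", "mode": "idle"}, 12345.67)
print(line)
```

Prometheus scrapes lines like this from every target and stores each unique name-plus-labels combination as its own time series.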

Docker Compose Stack

services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    restart: unless-stopped
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - ./alert.rules.yml:/etc/prometheus/alert.rules.yml
      - prometheus-data:/prometheus
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.path=/prometheus
      - --storage.tsdb.retention.time=30d
      - --web.enable-lifecycle  # Allow config reload via HTTP
    ports:
      - 9090:9090

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    restart: unless-stopped
    volumes:
      - grafana-data:/var/lib/grafana
    environment:
      GF_SECURITY_ADMIN_PASSWORD: change-this
      GF_USERS_ALLOW_SIGN_UP: "false"
    ports:
      - 3000:3000

  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    restart: unless-stopped
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - --path.procfs=/host/proc
      - --path.rootfs=/rootfs
      - --path.sysfs=/host/sys
      - --collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)
    network_mode: host  # For accurate network metrics

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    container_name: cadvisor
    restart: unless-stopped
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
    devices:
      - /dev/kmsg
    ports:
      - 8080:8080
    privileged: true

  alertmanager:
    image: prom/alertmanager:latest
    container_name: alertmanager
    restart: unless-stopped
    volumes:
      - ./alertmanager.yml:/etc/alertmanager/alertmanager.yml
    ports:
      - 9093:9093

volumes:
  prometheus-data:
  grafana-data:
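
Because the stack starts Prometheus with --web.enable-lifecycle, a configuration change can be applied without restarting the container by sending an HTTP POST to /-/reload. A minimal sketch, assuming Prometheus is reachable on localhost:9090 as published above; the function names are illustrative.

```python
import urllib.request


def build_reload_request(base_url: str = "http://localhost:9090") -> urllib.request.Request:
    """Build (but do not send) the POST request that triggers a config reload."""
    return urllib.request.Request(f"{base_url.rstrip('/')}/-/reload", method="POST")


def reload_prometheus(base_url: str = "http://localhost:9090") -> int:
    """Send the reload request and return the HTTP status code.

    urlopen raises an HTTPError on a non-2xx response, for example when
    the lifecycle API is disabled or the new config is invalid.
    """
    with urllib.request.urlopen(build_reload_request(base_url), timeout=5) as resp:
        return resp.status
```

Calling reload_prometheus() after editing prometheus.yml or alert.rules.yml should be enough for the changes to take effect.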

Prometheus Configuration

prometheus.yml:

global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["alertmanager:9093"]

rule_files:
  - "alert.rules.yml"

scrape_configs:
  # Prometheus self-monitoring
  - job_name: prometheus
    static_configs:
      - targets: ["localhost:9090"]

  # Linux host metrics
  - job_name: node
    static_configs:
      - targets:
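          # node-exporter runs with network_mode: host, so it is not reachable by
          # service name; use the Docker host's own address (placeholder IP below).
          - 192.168.1.50:9100  # This host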
          - 192.168.1.51:9100  # Second host
          - 192.168.1.52:9100  # Third host

  # Docker container metrics
  - job_name: cadvisor
    static_configs:
      - targets: ["cadvisor:8080"]

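  # Proxmox (via PVE exporter, which serves Proxmox metrics on /pve, not /metrics)
  - job_name: proxmox
    metrics_path: /pve
    static_configs:
      - targets: ["pve-exporter:9221"]  # Must resolve to the host running the exporter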

  # Additional exporters...
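
Once Prometheus is running, the Status → Targets page shows whether each scrape target is healthy. The same information is available from the HTTP API at /api/v1/targets. The sketch below filters a parsed response for targets that are not up. The function name and the sample response are made up for illustration, with only the fields the function reads.

```python
def unhealthy_targets(targets_response: dict) -> list[tuple[str, str, str]]:
    """Return (job, instance, last_error) for every active target that is not 'up'."""
    result = []
    for target in targets_response.get("data", {}).get("activeTargets", []):
        if target.get("health") != "up":
            labels = target.get("labels", {})
            result.append(
                (labels.get("job", "?"), labels.get("instance", "?"), target.get("lastError", ""))
            )
    return result


# Hypothetical, trimmed-down response for illustration.
sample_response = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"labels": {"job": "node", "instance": "192.168.1.51:9100"},
             "health": "up", "lastError": ""},
            {"labels": {"job": "node", "instance": "192.168.1.52:9100"},
             "health": "down", "lastError": "connection refused"},
        ]
    },
}
print(unhealthy_targets(sample_response))
```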

For hosts not running Docker, install node_exporter as a systemd service:

# On each monitored host
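# Release assets are versioned; check the releases page for the current version
VERSION=1.8.2
wget "https://github.com/prometheus/node_exporter/releases/download/v${VERSION}/node_exporter-${VERSION}.linux-amd64.tar.gz"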
tar xzf node_exporter*.tar.gz
sudo cp node_exporter-*/node_exporter /usr/local/bin/
sudo useradd -rs /bin/false node_exporter

# Create systemd service
sudo tee /etc/systemd/system/node_exporter.service > /dev/null << EOF
[Unit]
Description=Node Exporter

[Service]
User=node_exporter
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl enable --now node_exporter

Alert Rules

alert.rules.yml:

groups:
  - name: homelab
    rules:
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 90
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage is {{ $value }}%"

      - alert: DiskSpaceLow
        expr: (node_filesystem_avail_bytes{fstype!~"tmpfs|overlay|squashfs"} / node_filesystem_size_bytes) * 100 < 10
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Low disk space on {{ $labels.instance }}"
          description: "{{ $labels.mountpoint }} has {{ $value }}% free"

      - alert: HighMemoryUsage
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 90
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage on {{ $labels.instance }}"
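
The CPU expression is easier to read once the arithmetic is spelled out: node_cpu_seconds_total{mode="idle"} is a per-core counter of idle seconds, its per-second increase is the fraction of time that core sat idle, and 100 minus the averaged idle share is the usage. A simplified sketch with made-up counter values follows. Prometheus itself also handles counter resets and window extrapolation, which this omits.

```python
def cpu_usage_percent(idle_start: list[float], idle_end: list[float], window_seconds: float) -> float:
    """Approximate host CPU usage from per-core idle counters.

    idle_start / idle_end hold each core's idle-seconds counter at the
    start and end of the window.
    """
    per_core_idle = [(end - start) / window_seconds for start, end in zip(idle_start, idle_end)]
    avg_idle = sum(per_core_idle) / len(per_core_idle)
    return 100 - avg_idle * 100


# Two cores over a 300 s window: core 0 idle for 30 s, core 1 idle for 60 s.
usage = cpu_usage_percent([1000.0, 2000.0], [1030.0, 2060.0], 300)
print(round(usage, 1))  # 85.0, below the 90% threshold, so no alert
```

The for: 5m clause then requires the condition to hold continuously for five minutes before the alert fires, which filters out short spikes.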

Alertmanager Configuration

alertmanager.yml:

global:
  smtp_from: alertmanager@example.com  # placeholder address
  smtp_smarthost: smtp.example.com:587
  smtp_auth_username: alertmanager@example.com  # placeholder address
  smtp_auth_password: smtp-password

route:
  receiver: email-alerts
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h

receivers:
  - name: email-alerts
    email_configs:
      - to: you@example.com  # placeholder address
        subject: '[Homelab Alert] {{ .GroupLabels.alertname }}'

  # Slack alternative
  - name: slack-alerts
    slack_configs:
      - api_url: https://hooks.slack.com/services/...
        channel: '#homelab-alerts'
        text: '{{ range .Alerts }}{{ .Annotations.description }}{{ end }}'

Grafana Setup

Open http://your-server:3000 and log in as admin with the password set in GF_SECURITY_ADMIN_PASSWORD
Add Prometheus data source: Connections → Data Sources → Add → Prometheus → URL: http://prometheus:9090

Pre-built dashboards (import by ID in Grafana → Dashboards → Import):

1860: Node Exporter Full — comprehensive Linux metrics
14282: Proxmox summary dashboard
193: Docker and system monitoring
11074: Node Exporter for Prometheus Dashboard EN

Import a dashboard:

Grafana → Dashboards → New → Import
Enter dashboard ID
Select Prometheus data source
Import
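
The data source step can also be scripted rather than clicked through. A minimal sketch, assuming Grafana's HTTP API (POST /api/datasources) with basic authentication as the admin user. The function name and default values are illustrative, and the request is only built here, not sent.

```python
import base64
import json
import urllib.request


def build_datasource_request(
    grafana_url: str,
    admin_user: str,
    admin_password: str,
    prometheus_url: str = "http://prometheus:9090",
) -> urllib.request.Request:
    """Build the request that registers Prometheus as a Grafana data source."""
    payload = {
        "name": "Prometheus",
        "type": "prometheus",
        "access": "proxy",  # Grafana's backend makes the queries, so the compose service name resolves
        "url": prometheus_url,
        "isDefault": True,
    }
    token = base64.b64encode(f"{admin_user}:{admin_password}".encode()).decode()
    return urllib.request.Request(
        f"{grafana_url.rstrip('/')}/api/datasources",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )


request = build_datasource_request("http://localhost:3000", "admin", "change-this")
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would send it. For a setup that survives container rebuilds, Grafana's file-based provisioning is the more common choice.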

Proxmox Metrics Integration

prometheus-pve-exporter queries the Proxmox API and exposes the results in Prometheus format (port 9221 by default):

# On Proxmox host
apt install prometheus-pve-exporter

# Configure /etc/pve-exporter/config.yml
default:
  user: prometheus@pve
  password: monitoring-password
  verify_ssl: false

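Note that Proxmox's built-in external metric server (Datacenter → Metric Server → Add) pushes data to Graphite or InfluxDB. It does not offer a Prometheus endpoint, so the exporter above is the usual route into Prometheus.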

Retention and Storage

Default Prometheus retention is 15 days. For homelab use, 30-90 days is more useful:

command:
  - --storage.tsdb.retention.time=90d

Expect approximately 5-15 MB per monitored host per day at a 15s scrape interval. A homelab with 5 hosts therefore uses roughly 0.75-2.25 GB per month (5 hosts × 5-15 MB × 30 days).
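
Taking the per-host figure of 5-15 MB per day at face value, the disk budget for a given retention period is simple multiplication. The sketch below works it out for 5 hosts; the function name is illustrative, and real usage depends on how many series each exporter exposes.

```python
def storage_mb(hosts: int, mb_per_host_per_day: float, days: int) -> float:
    """Estimate on-disk TSDB size in MB for a given host count and retention."""
    return hosts * mb_per_host_per_day * days


for retention_days in (30, 90):
    low = storage_mb(5, 5, retention_days)
    high = storage_mb(5, 15, retention_days)
    print(f"{retention_days}d retention, 5 hosts: {low:.0f}-{high:.0f} MB")
```

For 5 hosts this gives 750-2250 MB at 30 days and 2250-6750 MB at 90 days, which still fits comfortably on a small SSD.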

PromQL Basics

Prometheus uses PromQL for queries:

# CPU usage percentage per host (100 minus the idle share)
100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Memory usage percentage
(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100

# Disk usage by mount point
(1 - node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100

# Network receive rate (bytes/sec)
irate(node_network_receive_bytes_total{device="eth0"}[5m])

# Docker container CPU usage
rate(container_cpu_usage_seconds_total{name=~".+"}[5m]) * 100

These form the basis of dashboard panels. Grafana's panel editor shows the resulting graph as you type.
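
The same queries can be run outside Grafana through Prometheus's HTTP API at /api/v1/query, which is handy for scripts and quick checks. A minimal sketch, assuming Prometheus on localhost:9090. The function names and the sample response are illustrative.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url: str, promql: str) -> str:
    """Build the instant-query URL, URL-encoding the PromQL expression."""
    return f"{base_url.rstrip('/')}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def instant_values(response: dict) -> dict[str, float]:
    """Map each series' instance label to its value in an instant-vector response."""
    values = {}
    for series in response["data"]["result"]:
        _timestamp, value = series["value"]  # the value arrives as a string
        values[series["metric"].get("instance", "")] = float(value)
    return values


def run_query(base_url: str, promql: str) -> dict:
    """Send the query and return the decoded JSON response."""
    with urllib.request.urlopen(build_query_url(base_url, promql), timeout=10) as resp:
        return json.load(resp)


# Hypothetical response for the memory usage query, trimmed for illustration.
sample = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [{"metric": {"instance": "192.168.1.51:9100"}, "value": [1700000000.0, "42.5"]}],
    },
}
print(instant_values(sample))
```

Against a live server, instant_values(run_query("http://localhost:9090", "up")) should return 1.0 for each reachable target and 0.0 for each failing one.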