Setting Up Grafana Alloy for Homelab Observability
Running separate agents for metrics, logs, and traces gets old fast. You end up with Promtail on every machine for logs, Node Exporter for metrics, maybe an OTEL Collector somewhere for traces, each with its own config format and deployment lifecycle. Grafana Alloy replaces all of them with a single binary that handles metrics, logs, and traces through one unified configuration.
Alloy is Grafana Labs' successor to Grafana Agent. It shipped as a stable release in early 2024, and Grafana Agent has been officially deprecated since. If you're still running Grafana Agent or juggling multiple collection agents across your homelab, Alloy is the upgrade worth making.

What Is Grafana Alloy
Grafana Alloy is an OpenTelemetry-compatible telemetry collector. It collects metrics, logs, and traces from your infrastructure and applications, processes them with pipeline stages, and ships them to backends like Prometheus, Loki, Tempo, or any OTLP-compatible endpoint.
The key differences from the old Grafana Agent:
- River configuration language — A purpose-built config language that replaces YAML. More expressive and supports referencing values between components.
- Component-based architecture — Everything is a component: sources, processors, and exporters. You wire them together into a pipeline.
- Built-in UI — A web UI at port 12345 showing your running pipeline, component health, and live data flow.
- Native OpenTelemetry support — Full OTLP receiver and exporter support alongside Prometheus and Loki native protocols.
- Single binary — One process replaces Node Exporter, Promtail, OTEL Collector, and Grafana Agent.
Architecture Overview
Alloy runs on every machine you want to observe. Each instance collects local metrics, scrapes log files, and optionally receives traces. Everything flows to your central backends.
[Machine 1: Alloy] ──metrics──→ Prometheus
                   ──logs────→ Loki
                   ──traces──→ Tempo

[Machine 2: Alloy] ──metrics──→ Prometheus
                   ──logs────→ Loki

         [Grafana queries all three backends]
For smaller homelabs (2-5 machines), run the backends and Alloy on the same box. For larger setups, dedicate a machine to your monitoring stack and run Alloy agents on everything else.
Installation
Docker
docker run -d \
--name alloy \
--restart=unless-stopped \
--net=host --pid=host \
-v /:/host:ro,rslave \
-v /var/log:/var/log:ro \
-v /var/run/docker.sock:/var/run/docker.sock:ro \
-v ./config.alloy:/etc/alloy/config.alloy \
grafana/alloy:latest \
run /etc/alloy/config.alloy \
--server.http.listen-addr=0.0.0.0:12345 \
--stability.level=generally-available
The --net=host and --pid=host flags give Alloy access to accurate host metrics. The Docker socket mount enables container discovery.
Bare Metal (Debian/Ubuntu)
sudo mkdir -p /etc/apt/keyrings
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt update && sudo apt install alloy
sudo systemctl enable --now alloy
For Fedora/RHEL: sudo dnf config-manager --add-repo https://rpm.grafana.com && sudo dnf install alloy
The config file lives at /etc/alloy/config.alloy. The built-in UI is at http://localhost:12345.
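Before building out the full pipeline covered later, a minimal config.alloy is enough to confirm the install works. This is a sketch that assumes a Prometheus instance with remote write enabled is reachable on localhost:9090:

```
// /etc/alloy/config.alloy — minimal starter config
logging {
  level  = "info"
  format = "logfmt"
}

// Host metrics with default collectors.
prometheus.exporter.unix "default" { }

prometheus.scrape "node" {
  targets    = prometheus.exporter.unix.default.targets
  forward_to = [prometheus.remote_write.default.receiver]
}

prometheus.remote_write "default" {
  endpoint {
    url = "http://localhost:9090/api/v1/write"
  }
}
```

Restart the service (`sudo systemctl restart alloy`) and the UI at port 12345 should show four healthy components.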
Kubernetes (Helm)
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm install alloy grafana/alloy -n monitoring --create-namespace -f values.yaml
Full Docker Compose Stack
A complete observability stack with Alloy, Prometheus, Loki, and Grafana:
# ~/monitoring/docker-compose.yml
services:
  alloy:
    image: grafana/alloy:latest
    container_name: alloy
    restart: unless-stopped
    network_mode: host
    pid: host
    volumes:
      - /:/host:ro,rslave
      - /var/log:/var/log:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - ./config.alloy:/etc/alloy/config.alloy
    command:
      - run
      - /etc/alloy/config.alloy
      - --server.http.listen-addr=0.0.0.0:12345
      - --stability.level=generally-available

  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    restart: unless-stopped
    ports:
      - "9090:9090"
    volumes:
      - prometheus_data:/prometheus
    command:
      - '--storage.tsdb.retention.time=90d'
      - '--web.enable-remote-write-receiver'
      - '--config.file=/etc/prometheus/prometheus.yml'

  loki:
    image: grafana/loki:3.3.2
    container_name: loki
    restart: unless-stopped
    ports:
      - "3100:3100"
    volumes:
      - ./loki-config.yml:/etc/loki/loki-config.yml
      - loki_data:/loki
    command: -config.file=/etc/loki/loki-config.yml

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    restart: unless-stopped
    ports:
      - "3000:3000"
    volumes:
      - grafana_data:/var/lib/grafana
    environment:
      GF_SECURITY_ADMIN_PASSWORD: "changeme"
      GF_USERS_ALLOW_SIGN_UP: "false"

volumes:
  prometheus_data:
  loki_data:
  grafana_data:
The --web.enable-remote-write-receiver flag on Prometheus is critical — it lets Alloy push metrics via the remote write API instead of requiring Prometheus to scrape each Alloy instance.
Configuration Walkthrough
Alloy uses the River configuration language. If you've used HCL (Terraform), River will feel familiar. Every block is a component with a type, an optional label, and arguments. Components reference each other through expressions, forming a directed pipeline.
Here's a complete config.alloy that collects node metrics, Docker container metrics, Docker logs, and journal logs:
// ============================================================
// METRICS: Node Exporter (host metrics)
// ============================================================
prometheus.exporter.unix "default" {
  set_collectors = [
    "cpu", "diskstats", "filesystem", "loadavg",
    "meminfo", "netdev", "os", "time", "uname",
  ]

  filesystem {
    fs_types_exclude = "^(autofs|binfmt_misc|bpf|cgroup2?|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|iso9660|mqueue|nsfs|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|selinuxfs|squashfs|sysfs|tracefs|tmpfs)$"
  }
}

prometheus.scrape "node" {
  targets         = prometheus.exporter.unix.default.targets
  forward_to      = [prometheus.remote_write.default.receiver]
  scrape_interval = "15s"
}

// ============================================================
// METRICS: Docker container metrics (cAdvisor-style)
// ============================================================
prometheus.exporter.cadvisor "docker" {
  docker_host = "unix:///var/run/docker.sock"
}

prometheus.scrape "cadvisor" {
  targets         = prometheus.exporter.cadvisor.docker.targets
  forward_to      = [prometheus.remote_write.default.receiver]
  scrape_interval = "30s"
}

// ============================================================
// METRICS: Ship to Prometheus
// ============================================================
prometheus.remote_write "default" {
  endpoint {
    url = "http://localhost:9090/api/v1/write"
  }

  external_labels = {
    instance = env("HOSTNAME"),
  }
}

// ============================================================
// LOGS: Docker container logs
// ============================================================
discovery.docker "containers" {
  host = "unix:///var/run/docker.sock"
}

discovery.relabel "docker_logs" {
  targets = discovery.docker.containers.targets

  rule {
    source_labels = ["__meta_docker_container_name"]
    regex         = "/(.*)"
    target_label  = "container"
  }

  rule {
    source_labels = ["__meta_docker_container_label_com_docker_compose_service"]
    target_label  = "compose_service"
  }
}

loki.source.docker "containers" {
  host       = "unix:///var/run/docker.sock"
  targets    = discovery.relabel.docker_logs.output
  forward_to = [loki.process.docker_logs.receiver]
}

loki.process "docker_logs" {
  forward_to = [loki.write.default.receiver]

  stage.drop {
    expression = "(?i)healthcheck|health_check"
  }

  stage.static_labels {
    values = { source = "docker" }
  }
}

// ============================================================
// LOGS: Systemd journal
// ============================================================
loki.relabel "journal_labels" {
  forward_to = []

  rule {
    source_labels = ["__journal__systemd_unit"]
    target_label  = "unit"
  }

  rule {
    source_labels = ["__journal__hostname"]
    target_label  = "hostname"
  }

  rule {
    source_labels = ["__journal_priority_keyword"]
    target_label  = "level"
  }
}

loki.source.journal "system" {
  forward_to    = [loki.process.journal.receiver]
  max_age       = "12h"
  relabel_rules = loki.relabel.journal_labels.rules
  labels        = { source = "journal" }
}

loki.process "journal" {
  forward_to = [loki.write.default.receiver]

  stage.drop {
    source     = "level"
    expression = "debug"
  }
}

// ============================================================
// LOGS: Ship to Loki
// ============================================================
loki.write "default" {
  endpoint {
    url = "http://localhost:3100/loki/api/v1/push"
  }
}
How the Pipeline Works
Every block follows <type> "<label>" { ... }. Blocks wire together through expressions:
- prometheus.exporter.unix.default.targets outputs scrape targets from the unix exporter
- prometheus.scrape "node" consumes those targets and forwards metrics to prometheus.remote_write.default.receiver
- loki.source.docker "containers" discovers running containers and sends logs through loki.process.docker_logs.receiver for filtering
This forms a DAG. The built-in UI at port 12345 visualizes this graph, showing data flow and flagging unhealthy components.
Collecting Node Metrics, Docker Metrics, and Logs
Node Metrics
The prometheus.exporter.unix component replaces the standalone Node Exporter binary. Same metrics, no separate process. Use these PromQL queries in Grafana:
# CPU usage per host
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Memory usage
(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100

# Disk usage
(1 - (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"})) * 100
Docker Container Metrics
The prometheus.exporter.cadvisor component provides container metrics without a separate cAdvisor container:
rate(container_cpu_usage_seconds_total[5m])        # CPU
container_memory_working_set_bytes                 # Memory
rate(container_network_receive_bytes_total[5m])    # Network
Custom Scrape Targets
Add scrape targets for services that expose Prometheus metrics:
prometheus.scrape "traefik" {
targets = [{
__address__ = "traefik:8080",
job = "traefik",
}]
forward_to = [prometheus.remote_write.default.receiver]
metrics_path = "/metrics"
}
Dashboard Setup in Grafana
Open Grafana at http://your-server:3000 and add data sources for Prometheus (http://prometheus:9090) and Loki (http://loki:3100).
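If you prefer configuration over clicking, Grafana can also provision both data sources at startup. A sketch using Grafana's datasource provisioning format — mount this file into the grafana container at /etc/grafana/provisioning/datasources/ (the host path is an assumption to match the compose layout above):

```
# ~/monitoring/provisioning/datasources/datasources.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    url: http://prometheus:9090
    isDefault: true
  - name: Loki
    type: loki
    url: http://loki:3100
```

Provisioned data sources survive container rebuilds, which manual UI setup does not.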
Import Community Dashboards
Go to Dashboards > Import and use these IDs:
- Node Exporter Full (1860) — Works directly because Alloy's unix exporter produces identical metric names.
- Docker Container Monitoring (893) — Compatible with Alloy's cAdvisor metrics.
- Loki Log Dashboard (13639) — Log volume and error rates.
Build a Homelab Overview
Create panels for a single-pane-of-glass view:
# All hosts CPU (Time series)
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Memory by host (Gauge; thresholds: green < 70%, yellow 70-85%, red > 85%)
(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100

# Container count (Stat)
count(container_memory_working_set_bytes{name!=""})
For the log panels, use the Loki data source:
# Error rate by host (Time series)
sum by (hostname) (rate({level=~"err|crit|alert|emerg"}[5m]))

# Recent errors (Logs panel)
{level=~"err|crit|alert|emerg"}
Alloy vs Telegraf vs Vector vs OTEL Collector
| Feature | Grafana Alloy | Telegraf | Vector | OTEL Collector |
|---|---|---|---|---|
| Metrics | Native Prometheus + OTLP | 300+ input plugins | Prometheus + custom | OTLP native |
| Logs | Native Loki + OTLP | File tail, syslog | Excellent pipeline | OTLP logs |
| Traces | OTLP native | Limited | Limited | OTLP native |
| Config | River (HCL-like) | TOML | TOML | YAML |
| Built-in UI | Yes (port 12345) | No | No | No |
| Resource usage | Low-moderate | Low | Very low | Low |
| Grafana integration | Native | Manual | Manual | Manual |
Choose Alloy if you use the Grafana stack. The integration is seamless, and one binary replaces three agents.
Choose Telegraf if you use InfluxDB or need a specific input plugin from its massive ecosystem.
Choose Vector if you have a log-heavy pipeline and need maximum throughput with minimal resources.
Choose OTEL Collector if you need vendor-neutral telemetry that can switch backends easily.
Tips and Best Practices
Use the built-in UI for debugging. Open http://alloy-host:12345 to see the component graph and check for errors. When something isn't working, the UI tells you exactly which component is failing and why.
Keep label cardinality low. Don't add labels with unbounded values like user IDs or request paths. Stick to instance, job, container, and level.
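If an exporter does emit an unbounded label, you can strip it in the pipeline before it reaches Prometheus. A sketch using the prometheus.relabel component (the request_path label name is hypothetical):

```
// Strip a high-cardinality label from all metrics passing through.
prometheus.relabel "drop_noisy_labels" {
  forward_to = [prometheus.remote_write.default.receiver]

  rule {
    action = "labeldrop"
    regex  = "request_path"
  }
}
```

Point a scrape component's forward_to at prometheus.relabel.drop_noisy_labels.receiver instead of the remote write receiver directly.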
Prefer remote write over scrape. Alloy pushes metrics to Prometheus, so you don't need to configure Prometheus with every Alloy instance's address. Each instance just pushes to the Prometheus URL.
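When Prometheus lives on a different machine, the push model also keeps authentication simple: credentials go in the endpoint block on each agent. A sketch — the hostname and environment variable are placeholders:

```
prometheus.remote_write "default" {
  endpoint {
    url = "http://monitoring-host.lan:9090/api/v1/write"

    // Credentials pulled from the environment rather than hardcoded.
    basic_auth {
      username = "alloy"
      password = env("REMOTE_WRITE_PASSWORD")
    }
  }
}
```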
Drop noisy logs early. Use loki.process with stage.drop to filter health checks and debug noise before they reach Loki. Saves storage and makes logs more useful.
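stage.drop can also filter on age, which helps when a backlogged journal replays hours of old entries after downtime. A sketch extending the pipeline pattern from the main config (thresholds are illustrative):

```
loki.process "noise_filter" {
  forward_to = [loki.write.default.receiver]

  // Drop health-check chatter by content.
  stage.drop {
    expression = "(?i)healthcheck|health_check"
  }

  // Drop entries older than an hour; drops are counted under a named reason
  // so you can track how much is being discarded.
  stage.drop {
    older_than          = "1h"
    drop_counter_reason = "too_old"
  }
}
```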
Pin image versions. Use specific tags (e.g., grafana/alloy:v1.5.1) instead of latest so updates don't break your stack unexpectedly.
Monitor Alloy itself. Add a self-scrape to catch issues with the collector:
prometheus.scrape "alloy_internal" {
targets = [{ __address__ = "localhost:12345", job = "alloy" }]
forward_to = [prometheus.remote_write.default.receiver]
}
Start with stability.level=generally-available. Alloy has three stability tiers: GA, public-preview, and experimental. Stick with GA for production. Upgrade the stability flag only when you need a specific preview component.
Conclusion
Grafana Alloy collapses the observability agent sprawl that plagues homelab monitoring setups. Instead of maintaining Node Exporter, Promtail, and cAdvisor on every machine, you deploy one binary with one config file. Metrics, logs, and traces flow through a single pipeline you can visualize and debug from Alloy's built-in UI.
The migration path from an existing setup is straightforward — Alloy's unix exporter produces the same metric names as Node Exporter, so your dashboards and alerts work without changes. The River configuration language takes some getting used to coming from YAML, but the ability to reference component outputs as expressions makes complex pipelines cleaner than any YAML-based alternative.
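For existing Promtail or Prometheus configs, the alloy convert subcommand can generate a starting config.alloy for you (the output usually needs a manual review pass):

```
alloy convert --source-format=promtail --output=config.alloy promtail.yml
alloy convert --source-format=prometheus --output=config.alloy prometheus.yml
```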
One systemd service or Docker container per machine. One config format. One UI to check when things look wrong. That's the kind of consolidation that makes running a homelab sustainable.
