Setting Up Grafana Alloy for Homelab Observability
Running separate agents for metrics, logs, and traces gets old fast. You end up with Promtail on every machine for logs, Node Exporter for metrics, maybe an OTEL Collector somewhere for traces, each with its own config format and deployment lifecycle. Grafana Alloy replaces all of them with a single binary that handles metrics, logs, and traces through one unified configuration.
Alloy is Grafana Labs' successor to Grafana Agent. It shipped as a stable release in early 2024, and Grafana Agent has been officially deprecated since. If you're still running Grafana Agent or juggling multiple collection agents across your homelab, Alloy is the upgrade worth making.

What Is Grafana Alloy
Grafana Alloy is an OpenTelemetry-compatible telemetry collector. It collects metrics, logs, and traces from your infrastructure and applications, processes them with pipeline stages, and ships them to backends like Prometheus, Loki, Tempo, or any OTLP-compatible endpoint.
The key differences from the old Grafana Agent:
- River configuration language — A purpose-built config language that replaces YAML. More expressive and supports referencing values between components.
- Component-based architecture — Everything is a component: sources, processors, and exporters. You wire them together into a pipeline.
- Built-in UI — A web UI at port 12345 showing your running pipeline, component health, and live data flow.
- Native OpenTelemetry support — Full OTLP receiver and exporter support alongside Prometheus and Loki native protocols.
- Single binary — One process replaces Node Exporter, Promtail, OTEL Collector, and Grafana Agent.
Architecture Overview
Alloy runs on every machine you want to observe. Each instance collects local metrics, scrapes log files, and optionally receives traces. Everything flows to your central backends.
[Machine 1: Alloy] ──metrics──→ Prometheus
                   ──logs────→ Loki
                   ──traces──→ Tempo

[Machine 2: Alloy] ──metrics──→ Prometheus
                   ──logs────→ Loki

         [Grafana queries all three backends]
For smaller homelabs (2-5 machines), run the backends and Alloy on the same box. For larger setups, dedicate a machine to your monitoring stack and run Alloy agents on everything else.
Installation
Docker
docker run -d \
--name alloy \
--restart=unless-stopped \
--net=host --pid=host \
-v /:/host:ro,rslave \
-v /var/log:/var/log:ro \
-v /var/run/docker.sock:/var/run/docker.sock:ro \
-v ./config.alloy:/etc/alloy/config.alloy \
grafana/alloy:latest \
run /etc/alloy/config.alloy \
--server.http.listen-addr=0.0.0.0:12345 \
--stability.level=generally-available
The --net=host and --pid=host flags give Alloy access to accurate host metrics. The Docker socket mount enables container discovery.
Bare Metal (Debian/Ubuntu)
sudo mkdir -p /etc/apt/keyrings
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt update && sudo apt install alloy
sudo systemctl enable --now alloy
For Fedora/RHEL: sudo dnf config-manager --add-repo https://rpm.grafana.com && sudo dnf install alloy
The config file lives at /etc/alloy/config.alloy. The built-in UI is at http://localhost:12345.
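Before building out the full pipeline covered later, a minimal config.alloy is enough to confirm the install works. This is a sketch that assumes a Prometheus instance with remote write enabled is reachable on localhost:9090:

```
// /etc/alloy/config.alloy — minimal starter config
logging {
  level  = "info"
  format = "logfmt"
}

// Host metrics with default collectors.
prometheus.exporter.unix "default" { }

prometheus.scrape "node" {
  targets    = prometheus.exporter.unix.default.targets
  forward_to = [prometheus.remote_write.default.receiver]
}

prometheus.remote_write "default" {
  endpoint {
    url = "http://localhost:9090/api/v1/write"
  }
}
```

Restart the service (`sudo systemctl restart alloy`) and the UI at port 12345 should show four healthy components.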
Kubernetes (Helm)
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm install alloy grafana/alloy -n monitoring --create-namespace -f values.yaml
Full Docker Compose Stack
A complete observability stack with Alloy, Prometheus, Loki, and Grafana:
# ~/monitoring/docker-compose.yml
services:
  alloy:
    image: grafana/alloy:latest
    container_name: alloy
    restart: unless-stopped
    network_mode: host
    pid: host
    volumes:
      - /:/host:ro,rslave
      - /var/log:/var/log:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - ./config.alloy:/etc/alloy/config.alloy
    command:
      - run
      - /etc/alloy/config.alloy
      - --server.http.listen-addr=0.0.0.0:12345
      - --stability.level=generally-available

  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    restart: unless-stopped
    ports:
      - "9090:9090"
    volumes:
      - prometheus_data:/prometheus
    command:
      - '--storage.tsdb.retention.time=90d'
      - '--web.enable-remote-write-receiver'
      - '--config.file=/etc/prometheus/prometheus.yml'

  loki:
    image: grafana/loki:3.3.2
    container_name: loki
    restart: unless-stopped
    ports:
      - "3100:3100"
    volumes:
      - ./loki-config.yml:/etc/loki/loki-config.yml
      - loki_data:/loki
    command: -config.file=/etc/loki/loki-config.yml

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    restart: unless-stopped
    ports:
      - "3000:3000"
    volumes:
      - grafana_data:/var/lib/grafana
    environment:
      GF_SECURITY_ADMIN_PASSWORD: "changeme"
      GF_USERS_ALLOW_SIGN_UP: "false"

volumes:
  prometheus_data:
  loki_data:
  grafana_data:
The --web.enable-remote-write-receiver flag on Prometheus is critical — it lets Alloy push metrics via the remote write API instead of requiring Prometheus to scrape each Alloy instance.
Configuration Walkthrough
Alloy uses the River configuration language. If you've used HCL (Terraform), River will feel familiar. Every block is a component with a type, an optional label, and arguments. Components reference each other through expressions, forming a directed pipeline.
Here's a complete config.alloy that collects node metrics, Docker container metrics, Docker logs, and journal logs:
// ============================================================
// METRICS: Node Exporter (host metrics)
// ============================================================
prometheus.exporter.unix "default" {
  set_collectors = [
    "cpu", "diskstats", "filesystem", "loadavg",
    "meminfo", "netdev", "os", "time", "uname",
  ]

  filesystem {
    fs_types_exclude = "^(autofs|binfmt_misc|bpf|cgroup2?|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|iso9660|mqueue|nsfs|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|selinuxfs|squashfs|sysfs|tracefs|tmpfs)$"
  }
}

prometheus.scrape "node" {
  targets         = prometheus.exporter.unix.default.targets
  forward_to      = [prometheus.remote_write.default.receiver]
  scrape_interval = "15s"
}

// ============================================================
// METRICS: Docker container metrics (cAdvisor-style)
// ============================================================
prometheus.exporter.cadvisor "docker" {
  docker_host = "unix:///var/run/docker.sock"
}

prometheus.scrape "cadvisor" {
  targets         = prometheus.exporter.cadvisor.docker.targets
  forward_to      = [prometheus.remote_write.default.receiver]
  scrape_interval = "30s"
}

// ============================================================
// METRICS: Ship to Prometheus
// ============================================================
prometheus.remote_write "default" {
  endpoint {
    url = "http://localhost:9090/api/v1/write"
  }

  external_labels = {
    instance = env("HOSTNAME"),
  }
}

// ============================================================
// LOGS: Docker container logs
// ============================================================
discovery.docker "containers" {
  host = "unix:///var/run/docker.sock"
}

discovery.relabel "docker_logs" {
  targets = discovery.docker.containers.targets

  rule {
    source_labels = ["__meta_docker_container_name"]
    regex         = "/(.*)"
    target_label  = "container"
  }

  rule {
    source_labels = ["__meta_docker_container_label_com_docker_compose_service"]
    target_label  = "compose_service"
  }
}

loki.source.docker "containers" {
  host       = "unix:///var/run/docker.sock"
  targets    = discovery.relabel.docker_logs.output
  forward_to = [loki.process.docker_logs.receiver]
}

loki.process "docker_logs" {
  forward_to = [loki.write.default.receiver]

  stage.drop {
    expression = "(?i)healthcheck|health_check"
  }

  stage.static_labels {
    values = { source = "docker" }
  }
}

// ============================================================
// LOGS: Systemd journal
// ============================================================
loki.relabel "journal_labels" {
  forward_to = []

  rule {
    source_labels = ["__journal__systemd_unit"]
    target_label  = "unit"
  }

  rule {
    source_labels = ["__journal__hostname"]
    target_label  = "hostname"
  }

  rule {
    source_labels = ["__journal_priority_keyword"]
    target_label  = "level"
  }
}

loki.source.journal "system" {
  forward_to    = [loki.process.journal.receiver]
  max_age       = "12h"
  relabel_rules = loki.relabel.journal_labels.rules
  labels        = { source = "journal" }
}

loki.process "journal" {
  forward_to = [loki.write.default.receiver]

  stage.drop {
    source     = "level"
    expression = "debug"
  }
}

// ============================================================
// LOGS: Ship to Loki
// ============================================================
loki.write "default" {
  endpoint {
    url = "http://localhost:3100/loki/api/v1/push"
  }
}
How the Pipeline Works
Every block follows <type> "<label>" { ... }. Blocks wire together through expressions:
- prometheus.exporter.unix.default.targets outputs scrape targets from the unix exporter
- prometheus.scrape "node" consumes those targets and forwards metrics to prometheus.remote_write.default.receiver
- loki.source.docker "containers" discovers running containers and sends logs through loki.process.docker_logs.receiver for filtering
This forms a DAG. The built-in UI at port 12345 visualizes this graph, showing data flow and flagging unhealthy components.
Collecting Node Metrics, Docker Metrics, and Logs
Node Metrics
The prometheus.exporter.unix component replaces the standalone Node Exporter binary. Same metrics, no separate process. Use these PromQL queries in Grafana:
# CPU usage per host
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Memory usage
(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100

# Disk usage
(1 - (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"})) * 100
Docker Container Metrics
The prometheus.exporter.cadvisor component provides container metrics without a separate cAdvisor container:
rate(container_cpu_usage_seconds_total[5m])        # CPU
container_memory_working_set_bytes                 # Memory
rate(container_network_receive_bytes_total[5m])    # Network
Custom Scrape Targets
Add scrape targets for services that expose Prometheus metrics:
prometheus.scrape "traefik" {
targets = [{
__address__ = "traefik:8080",
job = "traefik",
}]
forward_to = [prometheus.remote_write.default.receiver]
metrics_path = "/metrics"
}
Dashboard Setup in Grafana
Open Grafana at http://your-server:3000 and add data sources for Prometheus (http://prometheus:9090) and Loki (http://loki:3100).
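If you prefer configuration over clicking, Grafana can also provision both data sources at startup. A sketch using Grafana's datasource provisioning format — mount this file into the grafana container at /etc/grafana/provisioning/datasources/ (the host path is an assumption to match the compose layout above):

```
# ~/monitoring/provisioning/datasources/datasources.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    url: http://prometheus:9090
    isDefault: true
  - name: Loki
    type: loki
    url: http://loki:3100
```

Provisioned data sources survive container rebuilds, which manual UI setup does not.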
Import Community Dashboards
Go to Dashboards > Import and use these IDs:
- Node Exporter Full (1860) — Works directly because Alloy's unix exporter produces identical metric names.
- Docker Container Monitoring (893) — Compatible with Alloy's cAdvisor metrics.
- Loki Log Dashboard (13639) — Log volume and error rates.
Build a Homelab Overview
Create panels for a single-pane-of-glass view:
# All hosts CPU (Time series)
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Memory by host (Gauge; thresholds: green < 70%, yellow 70-85%, red > 85%)
(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100

# Container count (Stat)
count(container_memory_working_set_bytes{name!=""})
For the log panels, use the Loki data source:
# Error rate by host (Time series)
sum by (hostname) (rate({level=~"err|crit|alert|emerg"}[5m]))

# Recent errors (Logs panel)
{level=~"err|crit|alert|emerg"}
Alloy vs Telegraf vs Vector vs OTEL Collector
| Feature | Grafana Alloy | Telegraf | Vector | OTEL Collector |
|---|---|---|---|---|
| Metrics | Native Prometheus + OTLP | 300+ input plugins | Prometheus + custom | OTLP native |
| Logs | Native Loki + OTLP | File tail, syslog | Excellent pipeline | OTLP logs |
| Traces | OTLP native | Limited | Limited | OTLP native |
| Config | River (HCL-like) | TOML | TOML | YAML |
| Built-in UI | Yes (port 12345) | No | No | No |
| Resource usage | Low-moderate | Low | Very low | Low |
| Grafana integration | Native | Manual | Manual | Manual |
Choose Alloy if you use the Grafana stack. The integration is seamless, and one binary replaces three agents.
Choose Telegraf if you use InfluxDB or need a specific input plugin from its massive ecosystem.
Choose Vector if you have a log-heavy pipeline and need maximum throughput with minimal resources.
Choose OTEL Collector if you need vendor-neutral telemetry that can switch backends easily.
Tips and Best Practices
Use the built-in UI for debugging. Open http://alloy-host:12345 to see the component graph and check for errors. When something isn't working, the UI tells you exactly which component is failing and why.
Keep label cardinality low. Don't add labels with unbounded values like user IDs or request paths. Stick to instance, job, container, and level.
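If an exporter does emit an unbounded label, you can strip it in the pipeline before it reaches Prometheus. A sketch using the prometheus.relabel component (the request_path label name is hypothetical):

```
// Strip a high-cardinality label from all metrics passing through.
prometheus.relabel "drop_noisy_labels" {
  forward_to = [prometheus.remote_write.default.receiver]

  rule {
    action = "labeldrop"
    regex  = "request_path"
  }
}
```

Point a scrape component's forward_to at prometheus.relabel.drop_noisy_labels.receiver instead of the remote write receiver directly.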
Prefer remote write over scrape. Alloy pushes metrics to Prometheus, so you don't need to configure Prometheus with every Alloy instance's address. Each instance just pushes to the Prometheus URL.
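When Prometheus lives on a different machine, the push model also keeps authentication simple: credentials go in the endpoint block on each agent. A sketch — the hostname and environment variable are placeholders:

```
prometheus.remote_write "default" {
  endpoint {
    url = "http://monitoring-host.lan:9090/api/v1/write"

    // Credentials pulled from the environment rather than hardcoded.
    basic_auth {
      username = "alloy"
      password = env("REMOTE_WRITE_PASSWORD")
    }
  }
}
```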
Drop noisy logs early. Use loki.process with stage.drop to filter health checks and debug noise before they reach Loki. Saves storage and makes logs more useful.
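stage.drop can also filter on age, which helps when a backlogged journal replays hours of old entries after downtime. A sketch extending the pipeline pattern from the main config (thresholds are illustrative):

```
loki.process "noise_filter" {
  forward_to = [loki.write.default.receiver]

  // Drop health-check chatter by content.
  stage.drop {
    expression = "(?i)healthcheck|health_check"
  }

  // Drop entries older than an hour; drops are counted under a named reason
  // so you can track how much is being discarded.
  stage.drop {
    older_than          = "1h"
    drop_counter_reason = "too_old"
  }
}
```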
Pin image versions. Use specific tags (e.g., grafana/alloy:v1.5.1) instead of latest so updates don't break your stack unexpectedly.
Monitor Alloy itself. Add a self-scrape to catch issues with the collector:
prometheus.scrape "alloy_internal" {
targets = [{ __address__ = "localhost:12345", job = "alloy" }]
forward_to = [prometheus.remote_write.default.receiver]
}
Start with stability.level=generally-available. Alloy has three stability tiers: GA, public-preview, and experimental. Stick with GA for production. Upgrade the stability flag only when you need a specific preview component.
Conclusion
Grafana Alloy collapses the observability agent sprawl that plagues homelab monitoring setups. Instead of maintaining Node Exporter, Promtail, and cAdvisor on every machine, you deploy one binary with one config file. Metrics, logs, and traces flow through a single pipeline you can visualize and debug from Alloy's built-in UI.
The migration path from an existing setup is straightforward — Alloy's unix exporter produces the same metric names as Node Exporter, so your dashboards and alerts work without changes. The River configuration language takes some getting used to coming from YAML, but the ability to reference component outputs as expressions makes complex pipelines cleaner than any YAML-based alternative.
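For existing Promtail or Prometheus configs, the alloy convert subcommand can generate a starting config.alloy for you (the output usually needs a manual review pass):

```
alloy convert --source-format=promtail --output=config.alloy promtail.yml
alloy convert --source-format=prometheus --output=config.alloy prometheus.yml
```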
One systemd service or Docker container per machine. One config format. One UI to check when things look wrong. That's the kind of consolidation that makes running a homelab sustainable.
