
Homelab Observability with Grafana LGTM Stack

Monitoring · 2026-02-15 · 11 min read · Tags: grafana, loki, tempo, mimir, lgtm, observability, monitoring, docker-compose
By the HomeLab Starter Editorial Team. Home lab enthusiasts covering hardware setup, networking, and self-hosted services for home and small office environments.

Individual monitoring tools get you partway there. Prometheus shows you that CPU spiked at 3 AM. Loki shows you the error log that happened around the same time. But connecting metrics to logs to traces across your entire homelab requires all three signal types flowing into a unified platform where you can correlate them. That platform is the LGTM stack: Loki for logs, Grafana for visualization, Tempo for traces, and Mimir for metrics.


This guide walks through deploying the complete LGTM stack on your homelab using Docker Compose, with Grafana Alloy as the unified collection agent. By the end, you will have a single observability platform where clicking on a metric spike takes you to the relevant logs and traces without switching tools or guessing at timestamps.


Why the Full LGTM Stack

If you already run Prometheus and Grafana, you might wonder why you need Mimir, Loki, and Tempo on top. The short answer is signal correlation. The longer answer involves understanding what each component brings that standalone tools cannot.

Mimir is a horizontally-scalable, long-term metrics store that is fully compatible with the Prometheus remote write API. You can keep running Prometheus as a scraper, but Mimir gives you multi-tenant isolation, cheaper long-term retention via object storage, and global query views across multiple Prometheus instances. For a homelab, the practical benefit is months of metric retention without Prometheus eating all your SSD space.
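If you keep Prometheus as the scraper, forwarding its data into Mimir is a one-block change. The push URL below matches the Mimir endpoint used later in this guide; the scrape job is purely illustrative:

```yaml
# prometheus.yml — forward everything Prometheus scrapes into Mimir
remote_write:
  - url: http://mimir:9009/api/v1/push

scrape_configs:
  - job_name: node          # example job; keep your existing scrape configs
    static_configs:
      - targets: ["localhost:9100"]
```

Prometheus keeps its local TSDB for fast recent queries while Mimir handles long-term retention.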

Loki stores logs indexed only by labels, not by full-text content. This makes it dramatically cheaper to run than Elasticsearch. You query logs using LogQL, which intentionally mirrors PromQL syntax, so you do not need to learn a completely different query language.
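To see how closely the two languages mirror each other, compare a rate query in each (label values are illustrative):

```
# PromQL: per-second request rate for an nginx job
rate(nginx_http_requests_total{job="nginx"}[5m])

# LogQL: per-second rate of nginx log lines containing "error"
rate({job="nginx"} |= "error" [5m])
```

The label selector, range selector, and aggregation functions carry over almost unchanged; LogQL adds line filters like `|=` and parsers like `| logfmt`.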

Tempo stores distributed traces using the same label-based approach as Loki. It is the cheapest trace backend to operate because it does not index trace content. Instead, it relies on trace IDs and service graph generation for discovery. For a homelab running microservices or multi-container applications, Tempo shows you exactly where requests spend their time.

Grafana ties them together. Its Explore view lets you jump from a metric panel to correlated logs to related traces in a single click. The data sources share label conventions, so job="nginx" in Mimir corresponds to {job="nginx"} in Loki.

LGTM vs. Standalone Tools Comparison

Aspect            | Standalone (Prometheus + ELK + Jaeger)   | LGTM Stack
------------------|------------------------------------------|------------------------------------------
Query languages   | PromQL + KQL + Jaeger UI                 | PromQL + LogQL (similar syntax)
Storage backend   | Each has its own                         | Unified object storage for all
Collection agent  | Multiple (node_exporter, Filebeat, OTEL) | Single (Grafana Alloy)
Correlation       | Manual timestamp matching                | Native exemplars + TraceQL links
Memory footprint  | High (Elasticsearch alone needs 4-8 GB)  | Moderate (Loki and Tempo are lightweight)
Configuration     | Three different config formats           | Consistent YAML + River config
Multi-tenancy     | Varies by tool                           | Built-in across all components

Prerequisites

Before deploying, make sure you have:

  - Docker Engine and Docker Compose v2 installed on the host
  - At least 4 GB of free RAM (more once retention and dashboards grow)
  - A few tens of GB of free disk space for metric, log, and trace data

For production homelabs with multiple hosts, you will also want Alloy running on each machine, but we will start with a single-node deployment.

Architecture Decisions

Before writing any configuration, there are a few decisions to make.

Monolithic vs. Microservice Mode

Loki, Mimir, and Tempo each support two deployment modes. Monolithic mode runs all components in a single process. Microservice mode splits read and write paths into separate containers that scale independently.

For a homelab, use monolithic mode. Microservice mode is designed for multi-terabyte-per-day ingestion rates that no homelab will reach. Monolithic mode uses less memory, requires fewer containers, and is simpler to configure.

Storage Backend

All three backends can store data on the local filesystem or in object storage (S3-compatible). For a homelab:

  - Local filesystem storage is the simplest option and performs fine at homelab scale
  - S3-compatible object storage (self-hosted MinIO or a cloud bucket) adds durability and makes a later move to multiple nodes easier

This guide uses local filesystem storage for simplicity. Switching to MinIO later requires only changing the storage configuration blocks.
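As a sketch of what that switch looks like for Mimir, assuming a MinIO instance reachable at minio:9000 with a bucket named mimir (endpoint, bucket name, and credentials here are all placeholders):

```yaml
# mimir.yaml — replace the filesystem block with S3-compatible storage
common:
  storage:
    backend: s3
    s3:
      endpoint: minio:9000
      bucket_name: mimir
      access_key_id: minioadmin        # placeholder credentials
      secret_access_key: minioadmin
      insecure: true                   # plain HTTP inside the Docker network
```

Loki and Tempo accept analogous s3 blocks in their own storage sections.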

Retention Policy

Set retention based on your available disk space. A reasonable starting point:

  - Traces: 14 days (block_retention: 336h in the Tempo config below)
  - Logs: 30 days (retention_period: 720h in the Loki config)
  - Metrics: 90 days (retention_period: 90d in the Mimir config)

Traces are high-volume and low-value for historical analysis, so short retention is standard. Logs stay around longer for debugging. Metrics keep the longest because they compress well and trend analysis benefits from history.


Docker Compose Deployment

Create a directory for your LGTM stack:

mkdir -p ~/docker/lgtm-stack/{config,data/{mimir,loki,tempo,grafana}}
cd ~/docker/lgtm-stack

The Compose File

# ~/docker/lgtm-stack/docker-compose.yml
services:
  mimir:
    image: grafana/mimir:latest
    container_name: mimir
    restart: unless-stopped
    command:
      - -config.file=/etc/mimir/config.yaml
      - -target=all
    volumes:
      - ./config/mimir.yaml:/etc/mimir/config.yaml:ro
      - ./data/mimir:/data
    ports:
      - "9009:9009"
    networks:
      - lgtm

  loki:
    image: grafana/loki:latest
    container_name: loki
    restart: unless-stopped
    command:
      - -config.file=/etc/loki/config.yaml
      - -target=all
    volumes:
      - ./config/loki.yaml:/etc/loki/config.yaml:ro
      - ./data/loki:/loki
    ports:
      - "3100:3100"
    networks:
      - lgtm

  tempo:
    image: grafana/tempo:latest
    container_name: tempo
    restart: unless-stopped
    command:
      - -config.file=/etc/tempo/config.yaml
      - -target=all
    volumes:
      - ./config/tempo.yaml:/etc/tempo/config.yaml:ro
      - ./data/tempo:/var/tempo
    ports:
      - "3200:3200"    # Tempo HTTP API
      - "4317:4317"    # OTLP gRPC
      - "4318:4318"    # OTLP HTTP
    networks:
      - lgtm

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    restart: unless-stopped
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=changeme
      - GF_FEATURE_TOGGLES_ENABLE=traceqlEditor,tempoSearch,tempoBackendSearch
    volumes:
      - ./data/grafana:/var/lib/grafana
      - ./config/grafana-datasources.yaml:/etc/grafana/provisioning/datasources/datasources.yaml:ro
    ports:
      - "3000:3000"
    depends_on:
      - mimir
      - loki
      - tempo
    networks:
      - lgtm

  alloy:
    image: grafana/alloy:latest
    container_name: alloy
    restart: unless-stopped
    command:
      - run
      - /etc/alloy/config.alloy
      - --server.http.listen-addr=0.0.0.0:12345
      - --storage.path=/var/lib/alloy/data
    volumes:
      - ./config/alloy.river:/etc/alloy/config.alloy:ro
      - /var/log:/var/log:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
    ports:
      - "12345:12345"
    depends_on:
      - mimir
      - loki
      - tempo
    networks:
      - lgtm
    pid: host

networks:
  lgtm:
    driver: bridge

Mimir Configuration

# ~/docker/lgtm-stack/config/mimir.yaml
target: all

multitenancy_enabled: false

server:
  http_listen_port: 9009
  log_level: warn

common:
  storage:
    backend: filesystem
    filesystem:
      dir: /data

blocks_storage:
  storage_prefix: blocks
  tsdb:
    dir: /data/tsdb
    retention_period: 90d

compactor:
  data_dir: /data/compactor
  sharding_ring:
    kvstore:
      store: memberlist

distributor:
  ring:
    kvstore:
      store: memberlist

ingester:
  ring:
    kvstore:
      store: memberlist
    replication_factor: 1

store_gateway:
  sharding_ring:
    kvstore:
      store: memberlist

limits:
  max_global_series_per_user: 500000
  ingestion_rate: 50000
  ingestion_burst_size: 100000

ruler_storage:
  backend: filesystem
  filesystem:
    dir: /data/rules

The key settings here: multitenancy_enabled: false simplifies authentication for a single-user homelab, replication_factor: 1 is correct for a single node, and the limits are generous enough for a homelab but prevent runaway cardinality.

Loki Configuration

# ~/docker/lgtm-stack/config/loki.yaml
auth_enabled: false

server:
  http_listen_port: 3100
  log_level: warn

common:
  ring:
    kvstore:
      store: inmemory
  replication_factor: 1
  path_prefix: /loki

schema_config:
  configs:
    - from: "2024-01-01"
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: loki_index_
        period: 24h

storage_config:
  filesystem:
    directory: /loki/chunks

limits_config:
  retention_period: 720h  # 30 days
  max_query_series: 5000
  max_query_parallelism: 4

compactor:
  working_directory: /loki/compactor
  retention_enabled: true
  delete_request_store: filesystem

Loki v3 uses the TSDB store by default, which is significantly faster at query time than the older BoltDB index. The v13 schema enables the latest optimizations.

Tempo Configuration

# ~/docker/lgtm-stack/config/tempo.yaml
server:
  http_listen_port: 3200
  log_level: warn

distributor:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317
        http:
          endpoint: 0.0.0.0:4318

storage:
  trace:
    backend: local
    local:
      path: /var/tempo/traces
    wal:
      path: /var/tempo/wal

compactor:
  compaction:
    block_retention: 336h  # 14 days

metrics_generator:
  registry:
    external_labels:
      source: tempo
      cluster: homelab
  storage:
    path: /var/tempo/generator/wal
    remote_write:
      - url: http://mimir:9009/api/v1/push
        send_exemplars: true

overrides:
  defaults:
    metrics_generator:
      processors:
        - service-graphs
        - span-metrics

The metrics_generator section is critical for LGTM integration. Tempo generates RED metrics (rate, errors, duration) from traces and pushes them to Mimir. This means your trace data automatically creates metrics you can alert on, without instrumenting anything extra.
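Once the generator is running, those RED metrics are queryable in Mimir like any other series. The metric names below follow Tempo's defaults for the span-metrics processor; verify them against your Tempo version:

```
# Per-service request rate from span metrics
sum by (service) (rate(traces_spanmetrics_calls_total[5m]))

# Per-service error rate, using the span status code label
sum by (service) (rate(traces_spanmetrics_calls_total{status_code="STATUS_CODE_ERROR"}[5m]))
```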

Grafana Data Source Provisioning

# ~/docker/lgtm-stack/config/grafana-datasources.yaml
apiVersion: 1

datasources:
  - name: Mimir
    type: prometheus
    access: proxy
    url: http://mimir:9009/prometheus
    isDefault: true
    jsonData:
      httpMethod: POST
      exemplarTraceIdDestinations:
        - name: traceID
          datasourceUid: tempo

  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
    jsonData:
      derivedFields:
        - name: TraceID
          datasourceUid: tempo
          matcherRegex: "traceID=(\\w+)"
          url: "$${__value.raw}"

  - name: Tempo
    type: tempo
    access: proxy
    url: http://tempo:3200
    uid: tempo
    jsonData:
      tracesToLogs:
        datasourceUid: loki
        filterByTraceID: true
        filterBySpanID: true
      tracesToMetrics:
        datasourceUid: mimir
      serviceMap:
        datasourceUid: mimir
      nodeGraph:
        enabled: true

This provisioning file is where the correlation happens. The exemplarTraceIdDestinations in Mimir links metric exemplars to Tempo traces. The derivedFields in Loki extract trace IDs from log lines and link to Tempo. The tracesToLogs and tracesToMetrics in Tempo link back to Loki and Mimir. Every signal type can navigate to the others.
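To sanity-check the matcherRegex before relying on it, you can exercise it against a sample log line. The line below is hypothetical; substitute one your own applications emit:

```python
import re

# Same pattern as matcherRegex in the Loki data source above.
matcher = re.compile(r"traceID=(\w+)")

# Hypothetical logfmt line, as an instrumented app might emit it.
line = 'level=error traceID=4bf92f3577b34da6a3ce929d0e0e4736 msg="upstream timeout"'

m = matcher.search(line)
trace_id = m.group(1) if m else None
print(trace_id)  # 4bf92f3577b34da6a3ce929d0e0e4736
```

If your applications log JSON instead of logfmt, adjust the pattern to match the quoted field, e.g. `"traceID":"(\w+)"`.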

Alloy Collection Configuration

// ~/docker/lgtm-stack/config/alloy.river

// ============================================
// Metrics Collection
// ============================================

// Scrape Alloy's own metrics
prometheus.scrape "alloy_self" {
  targets = [{
    __address__ = "localhost:12345",
    job         = "alloy",
  }]
  forward_to = [prometheus.remote_write.mimir.receiver]
}

// Discover and scrape Docker containers with prometheus labels
discovery.docker "containers" {
  host = "unix:///var/run/docker.sock"
}

discovery.relabel "docker_metrics" {
  targets = discovery.docker.containers.targets

  rule {
    source_labels = ["__meta_docker_container_label_prometheus_scrape"]
    regex         = "true"
    action        = "keep"
  }
  rule {
    source_labels = ["__meta_docker_container_label_prometheus_port"]
    target_label  = "__address__"
    regex         = "(.*)"
    replacement   = "${1}"
  }
  rule {
    source_labels = ["__meta_docker_container_name"]
    target_label  = "container"
  }
}

prometheus.scrape "docker_containers" {
  targets    = discovery.relabel.docker_metrics.output
  forward_to = [prometheus.remote_write.mimir.receiver]
}

// Node-level metrics (host PID namespace required)
prometheus.exporter.unix "node" {}

prometheus.scrape "node_metrics" {
  targets    = prometheus.exporter.unix.node.targets
  forward_to = [prometheus.remote_write.mimir.receiver]
}

prometheus.remote_write "mimir" {
  endpoint {
    url = "http://mimir:9009/api/v1/push"
  }
}

// ============================================
// Log Collection
// ============================================

// System logs (syslog/journal)
local.file_match "syslog" {
  path_targets = [{
    __address__ = "localhost",
    __path__    = "/var/log/syslog",
    job         = "syslog",
    host        = env("HOSTNAME"),
  }]
}

loki.source.file "syslog" {
  targets    = local.file_match.syslog.targets
  forward_to = [loki.process.pipeline.receiver]
}

// Docker container logs
loki.source.docker "containers" {
  host       = "unix:///var/run/docker.sock"
  targets    = discovery.docker.containers.targets
  forward_to = [loki.process.pipeline.receiver]
}

// Log processing pipeline
loki.process "pipeline" {
  // Extract log level
  stage.regex {
    expression = "(?i)(?P<level>error|warn|info|debug)"
  }
  stage.labels {
    values = { level = "" }
  }

  forward_to = [loki.write.loki.receiver]
}

loki.write "loki" {
  endpoint {
    url = "http://loki:3100/loki/api/v1/push"
  }
}

// ============================================
// Trace Collection (OTLP receiver)
// ============================================

// Ports 4327/4328 rather than the standard 4317/4318, which Tempo already
// publishes on the host; containers on the lgtm network reach these via
// alloy:4327 (gRPC) and alloy:4328 (HTTP).
otelcol.receiver.otlp "default" {
  grpc {
    endpoint = "0.0.0.0:4327"
  }
  http {
    endpoint = "0.0.0.0:4328"
  }

  output {
    traces = [otelcol.processor.batch.default.input]
  }
}

otelcol.processor.batch "default" {
  output {
    traces = [otelcol.exporter.otlp.tempo.input]
  }
}

otelcol.exporter.otlp "tempo" {
  client {
    endpoint = "tempo:4317"
    tls {
      insecure = true
    }
  }
}

This Alloy configuration handles all three signal types in a single config file. Metrics are scraped from Docker containers and the host, logs are collected from Docker and syslog, and traces are received via OTLP and forwarded to Tempo.
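The level-extraction stage in the log pipeline is just a regex over each line. This Python sketch reproduces its behavior on a few hypothetical syslog lines, showing which ones gain a level label:

```python
import re

# Mirrors stage.regex in the Alloy pipeline: the first case-insensitive
# occurrence of error/warn/info/debug becomes the "level" label.
level_re = re.compile(r"(?i)(?P<level>error|warn|info|debug)")

samples = [
    "Jan 01 00:00:00 host app[1]: ERROR disk full",
    "Jan 01 00:00:01 host app[1]: info started",
    "Jan 01 00:00:02 host app[1]: plain message",
]
levels = [(m.group("level") if (m := level_re.search(s)) else None) for s in samples]
print(levels)  # ['ERROR', 'info', None]
```

Lines with no recognizable keyword simply get no level label; they are still stored and queryable in Loki.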

Deploying the Stack

cd ~/docker/lgtm-stack
docker compose up -d

Verify all containers are healthy:

docker compose ps

You should see all five containers running. Check individual logs if anything fails:

docker compose logs mimir --tail=50
docker compose logs loki --tail=50
docker compose logs tempo --tail=50

Common startup issues:

  - Permission errors on the ./data bind mounts: the containers run as non-root users, so chown the data directories to match or relax their permissions
  - Port conflicts when something else on the host already listens on 3000, 3100, 3200, 4317, 4318, or 9009
  - Config parse errors that stop Mimir or Loki immediately; the first lines of the container log name the offending field

Building Dashboards

Once the stack is running, access Grafana at http://your-host:3000 and log in with the admin password you set.

Node Overview Dashboard

Create a new dashboard and add these panels:

CPU Usage (Mimir data source):

100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

Memory Usage:

(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100

Disk I/O:

rate(node_disk_read_bytes_total[5m]) + rate(node_disk_written_bytes_total[5m])

Log Explorer

Navigate to Explore, select the Loki data source, and run:

{job="syslog"} |= "error" | logfmt | line_format "{{.msg}}"

This filters syslog entries containing "error", parses structured fields, and formats the output.
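LogQL also supports metric queries over log streams, which is what dashboard panels and log-based alerts build on. For example, errors per host in five-minute windows (the job and host labels come from the syslog pipeline configured earlier):

```
sum by (host) (count_over_time({job="syslog"} |= "error" [5m]))
```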

Trace Exploration

If your applications send OTLP traces, navigate to Explore with the Tempo data source. Use the Search tab to find traces by service name, duration, or status code. Click any trace to see the full span waterfall.

Correlating Signals

The real power of the LGTM stack shows up when you correlate. From a metric panel showing elevated error rates:

  1. Click the exemplar dots on the metric graph (small diamonds on the time series)
  2. Grafana jumps to the Tempo trace for that specific request
  3. From the trace view, click "Logs for this span" to see logs from that exact time window and service

This correlation requires your applications to include trace IDs in log output. Most OpenTelemetry SDKs do this automatically. For applications that log without trace context, the timestamp-based correlation in Grafana still works reasonably well.

Adding Applications to the Pipeline

Instrumenting with OpenTelemetry

For applications you control, add the OpenTelemetry SDK. Here is a Node.js example:

// tracing.js — load before your application
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-grpc');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');

const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({
    url: 'http://alloy:4327',  // Alloy's OTLP gRPC endpoint
  }),
  instrumentations: [getNodeAutoInstrumentations()],
  serviceName: 'my-homelab-app',
});

sdk.start();

Docker Labels for Auto-Discovery

Add labels to your Docker Compose services so Alloy automatically discovers and scrapes them:

services:
  my-app:
    image: my-app:latest
    labels:
      prometheus.scrape: "true"
      prometheus.port: "my-app:8080"   # used verbatim as the scrape host:port

Performance Tuning for Homelab Hardware

The LGTM stack is designed for large-scale deployments, so the defaults are often too aggressive for homelab hardware. Here are the adjustments that matter most.

Memory Limits

Set container memory limits to prevent any single component from consuming all available RAM:

services:
  mimir:
    deploy:
      resources:
        limits:
          memory: 2g
  loki:
    deploy:
      resources:
        limits:
          memory: 1g
  tempo:
    deploy:
      resources:
        limits:
          memory: 1g
  grafana:
    deploy:
      resources:
        limits:
          memory: 512m

Reducing Mimir Resource Usage

Add these to your Mimir config for lower memory consumption:

ingester:
  ring:
    kvstore:
      store: memberlist
    replication_factor: 1

compactor:
  compaction_interval: 2h  # less frequent than the 1h default

querier:
  max_concurrent: 4  # limit parallel queries

Loki Chunk Tuning

ingester:
  chunk_idle_period: 30m
  chunk_retain_period: 1m
  max_chunk_age: 2h

Larger chunks mean fewer index entries and better compression, at the cost of slightly higher memory usage during ingestion.

Alerting

Mimir supports Prometheus-compatible alerting rules. Create a rules file:

# ~/docker/lgtm-stack/config/rules/homelab-alerts.yaml
groups:
  - name: homelab
    interval: 1m
    rules:
      - alert: HighCPU
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 85
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High CPU on {{ $labels.instance }}"

      - alert: DiskSpaceLow
        expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) < 0.1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Disk space below 10% on {{ $labels.instance }}"

      # NOTE: this expression is LogQL, not PromQL, so Mimir's ruler cannot
      # evaluate it. Load this rule through Loki's ruler instead.
      - alert: HighErrorRate
        expr: sum(rate({job=~".+"} |= "error" [5m])) by (job) > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High error rate in {{ $labels.job }} logs"

Mount the rules directory into the Mimir container and add the ruler configuration:

# Add to mimir.yaml
ruler:
  rule_path: /data/rules-temp
  alertmanager_url: http://alertmanager:9093
  ring:
    kvstore:
      store: memberlist
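Since ruler_storage points at /data/rules and multi-tenancy is disabled (requests fall under the built-in anonymous tenant), one way to mount the rules is under a tenant subdirectory. A sketch of the extra volume on the mimir service:

```yaml
# docker-compose.yml — extend the mimir service's volumes
services:
  mimir:
    volumes:
      - ./config/mimir.yaml:/etc/mimir/config.yaml:ro
      - ./data/mimir:/data
      - ./config/rules:/data/rules/anonymous:ro  # "anonymous" = tenant when multitenancy is off
```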

Maintenance and Upgrades

Backup Strategy

The critical data to back up:

  1. Grafana dashboards: Export as JSON or use provisioning files (version-controlled)
  2. Configuration files: Already in your config/ directory (version-control this)
  3. Alert rules: Version-control alongside configurations
  4. Data directories: Optional. Metrics and logs can be re-collected, but historical data is nice to keep

Upgrading Components

The LGTM components follow a regular release cadence. To upgrade:

cd ~/docker/lgtm-stack
docker compose pull
docker compose up -d

Check the Grafana Labs changelog before upgrading. Mimir and Loki occasionally introduce breaking config changes between minor versions. Pin to specific versions in production:

services:
  mimir:
    image: grafana/mimir:2.14.0  # pin version

Monitoring the Monitors

Alloy's built-in UI at port 12345 shows the health of every pipeline component. Check it when something seems wrong. Additionally, Mimir, Loki, and Tempo all expose /ready and /metrics endpoints. Add these to your Alloy scrape config for meta-monitoring.
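A static scrape job in the same alloy.river covers the meta-monitoring piece. A sketch, reusing the remote_write component defined earlier:

```river
// Scrape the LGTM components' own /metrics endpoints.
prometheus.scrape "lgtm_components" {
  targets = [
    { __address__ = "mimir:9009", job = "mimir" },
    { __address__ = "loki:3100",  job = "loki"  },
    { __address__ = "tempo:3200", job = "tempo" },
  ]
  forward_to = [prometheus.remote_write.mimir.receiver]
}
```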

Next Steps

Once the base LGTM stack is running, consider:

  - Moving storage to MinIO for durability and an easier path to multi-node deployments
  - Running Alloy on each additional host, pointed at the same Mimir, Loki, and Tempo endpoints
  - Deploying Alertmanager so the Mimir ruler has somewhere to route notifications
  - Version-controlling your config/ directory and exported dashboards

The LGTM stack is a serious observability platform running on homelab hardware. It gives you the same tooling that companies use to monitor production systems at scale, and once configured, it mostly runs itself. The initial setup investment pays off every time you need to debug an issue and can correlate metrics, logs, and traces in a single pane of glass.
