
Centralized Log Aggregation: ELK, Graylog, Vector, and Syslog-ng

Monitoring · 2026-02-09 · 7 min read · logging · elk · graylog · vector · syslog

When something breaks in your homelab, the answer is almost always in the logs. But the logs are scattered across a dozen machines and fifty containers — Proxmox logs on one server, Docker container logs on another, firewall logs on your router, application logs in various directories. By the time you find the relevant log file, SSH into the right machine, and grep through thousands of lines, the problem has either resolved itself or you've lost the context of what you were looking for.

Centralized log aggregation solves this by shipping all logs to one place where you can search, filter, and correlate them. A query like "show me everything that happened across all services between 2:15 AM and 2:20 AM" becomes trivial instead of impossible.


This guide covers the major options beyond Loki (which we've covered separately) — each with different trade-offs in resource usage, features, and complexity.

The Options at a Glance

| Solution | Architecture | Storage | RAM Minimum | Best For |
| --- | --- | --- | --- | --- |
| Loki + Grafana | Log labels, not indexes | Object store / filesystem | ~256 MB | Already using Grafana, cost-conscious |
| ELK Stack | Full-text indexing | Elasticsearch | ~4 GB | Full-text search, complex queries |
| Graylog | Full-text indexing | OpenSearch/Elasticsearch | ~3 GB | GELF input, stream routing, alerts |
| Vector + ClickHouse | Pipeline + columnar DB | ClickHouse | ~1 GB | High-volume, structured data |
| syslog-ng + flat files | Traditional syslog | Filesystem | ~50 MB | Lightweight, compliance, archival |

ELK Stack (Elasticsearch, Logstash, Kibana)

The ELK stack is the industry standard for log aggregation. Elasticsearch indexes your logs for fast full-text search, Logstash processes and transforms log data, and Kibana provides the web UI for searching and visualizing.

When to Choose ELK

Choose ELK when search power matters more than footprint: full-text indexing makes complex queries fast, Kibana is a mature visualization layer, and the ecosystem of integrations, dashboards, and documentation is the largest of any option here. The trade-off is memory; budget at least 4 GB of RAM for a usable single-node deployment.

Deployment

A common split keeps Filebeat on each client as a lightweight shipper and a central Logstash instance for parsing and enrichment. The stack below runs Elasticsearch, Kibana, and Logstash together on the log server:

# docker-compose.yml
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.12.0
    restart: unless-stopped
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=true
      - ELASTIC_PASSWORD=change-this-password
      - "ES_JAVA_OPTS=-Xms2g -Xmx2g"
    volumes:
      - es-data:/usr/share/elasticsearch/data
    ports:
      - "9200:9200"
    ulimits:
      memlock:
        soft: -1
        hard: -1

  kibana:
    image: docker.elastic.co/kibana/kibana:8.12.0
    restart: unless-stopped
    ports:
      - "5601:5601"
    environment:
      ELASTICSEARCH_HOSTS: '["http://elasticsearch:9200"]'
      ELASTICSEARCH_USERNAME: kibana_system
      ELASTICSEARCH_PASSWORD: kibana-password
    depends_on:
      - elasticsearch

  logstash:
    image: docker.elastic.co/logstash/logstash:8.12.0
    restart: unless-stopped
    ports:
      - "5044:5044"    # Beats input
      - "5514:5514"    # Syslog input
      - "5514:5514/udp"
    volumes:
      - ./logstash/pipeline:/usr/share/logstash/pipeline
    environment:
      LS_JAVA_OPTS: "-Xms512m -Xmx512m"
    depends_on:
      - elasticsearch

volumes:
  es-data:
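
With xpack security enabled, the built-in kibana_system user needs its password set before Kibana can authenticate. One way to bootstrap it, assuming the service names above:

# Start Elasticsearch alone, set the kibana_system password interactively,
# then bring up the rest of the stack
docker compose up -d elasticsearch
docker compose exec elasticsearch bin/elasticsearch-reset-password -u kibana_system -i
docker compose up -d

Enter the same password you set in the kibana service's ELASTICSEARCH_PASSWORD.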

Logstash Pipeline

# logstash/pipeline/homelab.conf
input {
  # Accept syslog from network devices and servers
  syslog {
    port => 5514
    type => "syslog"
  }

  # Accept Filebeat input
  beats {
    port => 5044
  }
}

filter {
  # Parse syslog messages
  if [type] == "syslog" {
    grok {
      match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:source_host} %{DATA:program}(?:\[%{POSINT:pid}\])?: %{GREEDYDATA:log_message}" }
    }
    date {
      match => [ "syslog_timestamp", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]
    }
  }

  # Parse Docker JSON logs
  if [fields][source] == "docker" {
    json {
      source => "message"
    }
  }

  # Add geographic data for firewall logs (optional)
  if [source_ip] {
    geoip {
      source => "source_ip"
    }
  }
}

output {
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]
    user => "elastic"
    password => "change-this-password"
    index => "homelab-%{+YYYY.MM.dd}"
  }
}
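
Once events are flowing, the date-stamped indices can be searched straight from the API. A quick sketch, assuming the elastic password from the compose file and the log_message field produced by the grok filter above:

# Find error-ish syslog messages from the last hour
curl -s -u elastic:change-this-password \
  -H 'Content-Type: application/json' \
  "localhost:9200/homelab-*/_search?pretty" -d '{
  "query": {
    "bool": {
      "must": [
        { "match": { "log_message": "error" } },
        { "range": { "@timestamp": { "gte": "now-1h" } } }
      ]
    }
  },
  "size": 20
}'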

Filebeat on Client Machines

Install Filebeat on each server to ship logs to Logstash:

# /etc/filebeat/filebeat.yml
filebeat.inputs:
  - type: log
    paths:
      - /var/log/syslog
      - /var/log/auth.log
    fields:
      source: system

  - type: container
    paths:
      - /var/lib/docker/containers/*/*.log
    fields:
      source: docker

output.logstash:
  hosts: ["logserver.homelab.lan:5044"]

Install the package and enable the service:

# Install Filebeat
curl -L -O https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-8.12.0-amd64.deb
sudo dpkg -i filebeat-8.12.0-amd64.deb
sudo systemctl enable --now filebeat
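
Filebeat can verify its own configuration and its connection to Logstash before you trust it with anything:

# Sanity-check the config and the connection to Logstash
sudo filebeat test config
sudo filebeat test output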

Index Lifecycle Management

Elasticsearch indexes grow without bound unless you configure retention:

# Create an ILM policy via Kibana Dev Tools or API
curl -X PUT "localhost:9200/_ilm/policy/homelab-policy" \
  -H 'Content-Type: application/json' \
  -u elastic:password \
  -d '{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "5gb",
            "max_age": "7d"
          }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}'
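
The policy does nothing until indices reference it. A minimal index template attaches it to future homelab-* indices (the template name here is arbitrary). Note that the rollover action only works when writing through an alias or data stream, so with the date-based indices from the Logstash output above it is the delete phase that enforces retention:

# Attach the ILM policy to new homelab-* indices
curl -X PUT "localhost:9200/_index_template/homelab-template" \
  -H 'Content-Type: application/json' \
  -u elastic:password \
  -d '{
  "index_patterns": ["homelab-*"],
  "template": {
    "settings": { "index.lifecycle.name": "homelab-policy" }
  }
}'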

Graylog

Graylog is a purpose-built log management platform. Unlike ELK (which is a general-purpose search engine repurposed for logs), Graylog was designed from the ground up for log collection, processing, and alerting.

Why Graylog Over ELK

Graylog bundles what ELK leaves to you: native syslog and GELF inputs, stream-based routing with per-stream retention, alerting, and access permissions, all managed from a single web UI. The trade-off is an extra moving part: MongoDB stores Graylog's configuration state alongside OpenSearch.

Deployment

# docker-compose.yml
services:
  mongodb:
    image: mongo:6
    restart: unless-stopped
    volumes:
      - mongo-data:/data/db

  opensearch:
    image: opensearchproject/opensearch:2
    restart: unless-stopped
    environment:
      - discovery.type=single-node
      - "OPENSEARCH_JAVA_OPTS=-Xms1g -Xmx1g"
      - DISABLE_SECURITY_PLUGIN=true
    volumes:
      - os-data:/usr/share/opensearch/data
    ulimits:
      memlock:
        soft: -1
        hard: -1

  graylog:
    image: graylog/graylog:5.2
    restart: unless-stopped
    depends_on:
      - mongodb
      - opensearch
    ports:
      - "9000:9000"       # Web UI
      - "1514:1514"       # Syslog TCP
      - "1514:1514/udp"   # Syslog UDP
      - "12201:12201"     # GELF TCP
      - "12201:12201/udp" # GELF UDP
    environment:
      GRAYLOG_PASSWORD_SECRET: "a-long-random-string-at-least-16-chars"
      GRAYLOG_ROOT_PASSWORD_SHA2: "your-sha256-hashed-password"
      GRAYLOG_HTTP_EXTERNAL_URI: "http://10.0.0.50:9000/"
      GRAYLOG_ELASTICSEARCH_HOSTS: "http://opensearch:9200"
      GRAYLOG_MONGODB_URI: "mongodb://mongodb:27017/graylog"
    volumes:
      - graylog-data:/usr/share/graylog/data

volumes:
  mongo-data:
  os-data:
  graylog-data:

Generate the password hash:

echo -n "your-admin-password" | sha256sum | cut -d' ' -f1

Docker GELF Logging Driver

Docker can send container logs directly to Graylog via GELF:

# Per container
docker run --log-driver=gelf \
  --log-opt gelf-address=udp://10.0.0.50:12201 \
  --log-opt tag="{{.Name}}" \
  nginx

# Or set as default in /etc/docker/daemon.json
{
  "log-driver": "gelf",
  "log-opts": {
    "gelf-address": "udp://10.0.0.50:12201",
    "tag": "{{.Name}}"
  }
}
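
Restart the Docker daemon after editing daemon.json; the new default applies only to containers created afterwards. A hand-rolled message is a quick way to confirm the input is listening, since the GELF UDP input accepts plain uncompressed JSON:

# Apply the new default logging driver
sudo systemctl restart docker

# Send a test message to the GELF UDP input
echo -n '{"version":"1.1","host":"test","short_message":"hello graylog"}' \
  | nc -u -w1 10.0.0.50 12201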

Stream Configuration

In the Graylog web UI, create streams to organize logs:

  1. Infrastructure: Match on source containing your server hostnames
  2. Docker: Match on facility = docker or tag containing container names
  3. Firewall: Match on application_name = filterlog (pfSense) or source = your firewall
  4. Security: Match on facility = auth or message containing "Failed password"

Each stream can have its own retention period, alert rules, and access permissions.

Vector: The Modern Log Pipeline

Vector (by Datadog, open source) is a high-performance log and metrics pipeline written in Rust. It replaces Logstash, Filebeat, Fluentd, and similar tools with a single binary that can collect, transform, and route observability data.

Why Vector

Deployment as an Agent

# docker-compose.yml
services:
  vector:
    image: timberio/vector:latest-alpine
    restart: unless-stopped
    ports:
      - "8686:8686"    # API
      - "5514:5514"    # Syslog
    volumes:
      - ./vector.yaml:/etc/vector/vector.yaml:ro
      - /var/log:/var/log:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro

Vector Configuration

# vector.yaml
sources:
  # Collect syslog from the host
  host_syslog:
    type: file
    include:
      - /var/log/syslog
      - /var/log/auth.log

  # Collect Docker container logs
  docker_logs:
    type: docker_logs
    include_containers:
      - "*"

  # Accept syslog from network devices
  network_syslog:
    type: syslog
    address: "0.0.0.0:5514"
    mode: udp

transforms:
  # Parse and enrich logs
  parsed_logs:
    type: remap
    inputs:
      - host_syslog
      - docker_logs
      - network_syslog
    source: |
      # Add a homelab source tag
      .homelab_source = "homelab"

      # Parse severity from syslog
      if exists(.severity) {
        .level = to_string!(.severity)
      }

      # Extract container name from Docker logs
      if exists(.container_name) {
        .service = replace!(.container_name, "/", "")
      }

      # Detect and tag error messages
      if match(to_string!(.message), r'(?i)(error|fail|critical|panic)') {
        .is_error = true
      }

  # Filter out noisy health check logs
  filtered:
    type: filter
    inputs:
      - parsed_logs
    condition:
      type: vrl
      source: '!match(to_string!(.message), r"GET /health")'

sinks:
  # Send to Elasticsearch/OpenSearch
  elasticsearch:
    type: elasticsearch
    inputs:
      - filtered
    endpoints:
      - "http://elasticsearch:9200"
    bulk:
      index: "homelab-%Y-%m-%d"
    auth:
      strategy: basic
      user: elastic
      password: change-this-password

  # Also write to local files as backup
  file_backup:
    type: file
    inputs:
      - filtered
    path: "/var/log/vector/homelab-%Y-%m-%d.log"
    encoding:
      codec: json
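
Vector ships a built-in config checker, worth running after every edit. Assuming the compose service above:

# Validate sources, transforms, and sinks before reloading
docker compose exec vector vector validate /etc/vector/vector.yaml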

Vector as a Replacement for the Full ELK Stack

For simpler setups, Vector can write directly to ClickHouse (columnar database, efficient for log queries) instead of Elasticsearch, significantly reducing resource usage:

sinks:
  clickhouse:
    type: clickhouse
    inputs:
      - filtered
    endpoint: "http://clickhouse:8123"
    database: logs
    table: homelab_logs
    auth:
      strategy: basic
      user: default
      password: ""

syslog-ng: Traditional and Lightweight

For pure syslog collection — network devices, servers, firewalls — syslog-ng is battle-tested and uses minimal resources.

Configuration

# /etc/syslog-ng/syslog-ng.conf

@version: 4.4

source s_network {
    # network() speaks classic BSD syslog (RFC 3164), which is what most
    # routers and appliances send; use the syslog() driver instead for
    # RFC 5424 senders. Binding port 514 requires root or CAP_NET_BIND_SERVICE.
    network(
        ip("0.0.0.0")
        port(514)
        transport("udp")
    );
    network(
        ip("0.0.0.0")
        port(514)
        transport("tcp")
    );
};

# Separate logs by source host
destination d_hosts {
    file("/var/log/remote/${HOST}/${YEAR}-${MONTH}-${DAY}.log"
        create-dirs(yes)
    );
};

# Firewall logs to a separate directory
filter f_firewall {
    host("10.0.0.1") or program("filterlog");
};

destination d_firewall {
    file("/var/log/remote/firewall/${YEAR}-${MONTH}-${DAY}.log"
        create-dirs(yes)
    );
};

log {
    source(s_network);
    filter(f_firewall);
    destination(d_firewall);
    flags(final);
};

log {
    source(s_network);
    destination(d_hosts);
};

syslog-ng writes structured log files that you can search with grep, awk, or feed into a log viewer like lnav. It's not as fancy as Elasticsearch, but it uses ~50 MB of RAM and never loses logs because a Java process ran out of heap space.
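
Because everything lands in plain files, the usual Unix tools handle both search and retention. A couple of sketches, with retention numbers as placeholders:

# Search every host's logs for failed SSH logins
grep -r "Failed password" /var/log/remote/

# Compress files older than a day, delete after 90 days
find /var/log/remote -name '*.log' -mtime +1 -exec gzip {} \;
find /var/log/remote -name '*.log.gz' -mtime +90 -delete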

Choosing Your Stack

You have 4+ GB RAM to spare and want the best search experience: ELK or Graylog. ELK has a larger ecosystem; Graylog is more focused on log management.

You want modern and efficient: Vector as the pipeline, with ClickHouse or Elasticsearch as the backend. Best performance per resource dollar.

You just need syslog from network devices: syslog-ng or rsyslog. Add lnav for a nice terminal-based log viewer.

You're already running Grafana and Prometheus: Stick with Loki (covered in our separate guide). It integrates seamlessly and uses the least resources for label-based log queries.

Whatever you choose, the goal is the same: when something goes wrong at 2 AM, you open one interface, type a query, and find the answer instead of SSH-ing into six machines and grepping through log files. That capability alone is worth the setup effort.