
Architecture Design

This document details the architecture of a Highly Available Centralized Logging System based on the Grafana Loki Stack, deployed on an on-premise Kubernetes cluster.


1. Objectives of the Architecture

  • Achieve end-to-end log traceability for services (Node.js, Go, Python)
  • Ensure High Availability (HA) of log collection, storage, and querying
  • Design for on-premise deployment: no dependency on external cloud services
  • Use open source, Kubernetes-native tools
  • Enable log search, dashboarding, and alerting
  • Allow easy scaling and fault recovery

2. Architectural Components

🔹 2.1. Log Shipper: Promtail / Fluent Bit

  • Deployment: DaemonSet (one per node)
  • Function:
    • Tails logs from /var/log/containers/*.log
    • Enriches logs with metadata: namespace, pod, container, labels
    • Sends logs to Loki Distributor via HTTP Push
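As a rough illustration of what the shipper does on each node, the sketch below derives pod metadata from a container log filename and wraps log lines in the JSON body accepted by Loki's push endpoint (`/loki/api/v1/push`). The function names are hypothetical, and real shippers such as Promtail obtain richer metadata (for example pod labels) from the Kubernetes API rather than from the filename alone.

```python
import json
import time


def parse_container_log_path(path):
    """Split a kubelet log filename into pod, namespace, and container.

    Files under /var/log/containers/ are named
    <pod>_<namespace>_<container>-<container-id>.log
    """
    name = path.rsplit("/", 1)[-1]
    if name.endswith(".log"):
        name = name[: -len(".log")]
    # Pod and namespace names cannot contain underscores, so two splits suffice.
    pod, namespace, rest = name.split("_", 2)
    # The container ID follows the last hyphen; container names may contain hyphens.
    container, _, _container_id = rest.rpartition("-")
    return {"pod": pod, "namespace": namespace, "container": container}


def build_push_payload(labels, lines, now_ns=None):
    """Build the JSON body for POST /loki/api/v1/push.

    Each entry is [<unix epoch in nanoseconds, as a string>, <log line>].
    """
    ts = time.time_ns() if now_ns is None else now_ns
    # Offset each line by 1 ns so entries in the batch keep their order.
    values = [[str(ts + i), line] for i, line in enumerate(lines)]
    return {"streams": [{"stream": dict(labels), "values": values}]}


if __name__ == "__main__":
    # Hypothetical filename, for illustration only.
    labels = parse_container_log_path(
        "/var/log/containers/api-7d9f_prod_api-server-0123abcd.log"
    )
    print(json.dumps(build_push_payload(labels, ["request started"]), indent=2))
```

The resulting dictionary would be serialized and sent with `Content-Type: application/json` to the distributor's Service address.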

🔹 2.2. Loki Components

a. Distributor

  • Type: Deployment (≥ 2 replicas)
  • Responsibility: Accepts incoming logs and routes them to available ingesters using consistent hashing
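To make the routing idea concrete, here is a minimal consistent-hashing sketch. It is not Loki's actual ring implementation: the `HashRing` class, the virtual-node count, and the stream key format are assumptions chosen for illustration. It shows the property that matters for HA: a given stream always maps to the same ingester(s), and removing one ingester only remaps the streams that it owned.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes=64):
        self._ring = []  # list of (token, node), sorted by token
        for node in nodes:
            for i in range(vnodes):
                self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()
        self._tokens = [token for token, _ in self._ring]
        self._node_count = len(set(nodes))

    @staticmethod
    def _hash(key):
        # Use the first 8 bytes of SHA-256 as a 64-bit token.
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def lookup(self, stream_key, replicas=1):
        """Return up to `replicas` distinct nodes, walking clockwise from the key."""
        if not self._ring:
            raise ValueError("ring is empty")
        want = min(replicas, self._node_count)
        i = bisect.bisect_right(self._tokens, self._hash(stream_key))
        found = []
        while len(found) < want:
            node = self._ring[i % len(self._ring)][1]
            if node not in found:
                found.append(node)
            i += 1
        return found


if __name__ == "__main__":
    ring = HashRing(["ingester-0", "ingester-1", "ingester-2"])
    # Hypothetical stream key: tenant plus label set.
    print(ring.lookup('tenant-a/{namespace="prod",pod="api-0"}', replicas=2))
```

Asking for more than one replica returns the next distinct ingesters on the ring, which is the general mechanism by which a stream can be written to several ingesters for redundancy.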

b. Ingester

  • Type: StatefulSet with Persistent Volume (PVC)
  • Responsibility: Buffers incoming logs in memory, backed by a write-ahead log on its persistent volume, and flushes them as chunks to long-term object storage (MinIO/S3)

c. Querier

  • Type: Deployment (≥ 2 replicas)
  • Responsibility: Reads logs from object storage (and recent, not-yet-flushed data from the ingesters) and returns search results to Grafana
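For reference, a range query against the read path uses Loki's HTTP endpoint `/loki/api/v1/query_range`. The sketch below only builds the request URL. The base URL, the helper name, and the LogQL selector are placeholders, not values taken from this architecture.

```python
from urllib.parse import urlencode


def build_query_range_url(base_url, logql, start_ns, end_ns, limit=100):
    """Build a GET URL for Loki's /loki/api/v1/query_range endpoint.

    start_ns and end_ns are Unix epoch timestamps in nanoseconds.
    """
    params = {
        "query": logql,
        "start": str(start_ns),
        "end": str(end_ns),
        "limit": str(limit),
        "direction": "backward",  # newest entries first
    }
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical in-cluster Service name; 3100 is Loki's default HTTP port.
    url = build_query_range_url(
        "http://loki-querier:3100",
        '{namespace="prod", container="api-server"} |= "error"',
        start_ns=1_700_000_000_000_000_000,
        end_ns=1_700_000_600_000_000_000,
    )
    print(url)
```

Grafana issues equivalent requests on the user's behalf once Loki is configured as a data source.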

🔹 2.3. MinIO (S3-compatible)

  • Deployment: StatefulSet (4+ nodes recommended for redundancy)
  • Responsibility: Long-term object storage backend for Loki
  • Alternative: Ceph with S3 gateway
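The redundancy of a multi-node MinIO deployment comes from erasure coding, which trades raw capacity for fault tolerance. The helper below is a back-of-the-envelope sketch of that trade-off for a single erasure set. The function name and the example drive count, parity, and drive size are assumptions, and the sketch ignores write-quorum details and filesystem overhead.

```python
def erasure_coding_summary(drives, parity, drive_tb):
    """Estimate usable capacity and read fault tolerance for one erasure set.

    drives:   total drives in the set
    parity:   drives' worth of parity shards (at most half the set)
    drive_tb: capacity of each drive in TB
    """
    if drives < 2 or not 0 < parity <= drives // 2:
        raise ValueError("parity must be between 1 and half the drive count")
    data = drives - parity
    return {
        "raw_tb": drives * drive_tb,
        "usable_tb": data * drive_tb,
        "efficiency": data / drives,
        # Objects stay readable while at most `parity` drives are unavailable.
        "read_drive_failures_tolerated": parity,
    }


if __name__ == "__main__":
    # Hypothetical layout: 4 nodes with one 1 TB drive each, parity of 2.
    print(erasure_coding_summary(drives=4, parity=2, drive_tb=1.0))
```

Higher parity improves tolerance to drive or node loss at the cost of usable space, so the parity setting should be chosen alongside the expected log retention volume.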

🔹 2.4. Grafana

  • Deployment: Deployment + PVC (optional)
  • Responsibility:
    • Queries Loki through the querier
    • Provides dashboards and alerting
    • Enforces RBAC and folder-based access control

3. High Availability Strategy

| Component | Strategy |
|---|---|
| Promtail | DaemonSet (one per node, tolerates pod loss) |
| Distributor | ≥ 2 replicas, load balanced via Service |
| Ingester | StatefulSet + PVC + anti-affinity |
| Querier | ≥ 2 replicas, stateless, scalable |
| MinIO | Distributed mode, data redundancy via erasure coding |
| Grafana | Stateless + optional PVC for dashboards |
| Services | Kubernetes Service (ClusterIP / LoadBalancer) |
| Ingress | NGINX / MetalLB with TLS termination |

4. Log Flow Diagram

Go