You can’t fix what you can’t see. You can’t optimize what you can’t measure.
Prometheus is the standard for Kubernetes metrics. It works beautifully — until you need long-term storage, or multiple clusters, or high availability. Then you hit its limits.
Thanos extends Prometheus without replacing it. Keep your existing setup, add Thanos components, get unlimited retention and global querying.
## The Problem with Standalone Prometheus
Prometheus has built-in limitations:
- Single node — No native clustering or HA
- Local storage — Retention limited by disk size
- Single cluster view — Can’t query across clusters
- No downsampling — Old data takes as much space as new
For a single small cluster with two weeks of retention, these aren't problems. For production multi-cluster environments with compliance requirements, they're blockers.
## Thanos Architecture
Thanos adds components around Prometheus:
```mermaid
flowchart TD
    subgraph clusterA["Cluster A"]
        PA["Prometheus + Sidecar"]
    end
    subgraph clusterB["Cluster B"]
        PB["Prometheus + Sidecar"]
    end
    PA --> OS["Object Storage<br/>(S3/MinIO/GCS)"]
    PB --> OS
    PA --> Q["Querier"]
    PB --> Q
    OS --> SG["Store Gateway"]
    OS --> C["Compactor"]
    SG --> Q
    Q --> G["Grafana"]
```
- **Sidecar** — Runs alongside Prometheus, uploads blocks to object storage
- **Store Gateway** — Serves historical data from object storage
- **Querier** — Aggregates data from sidecars and store gateway
- **Compactor** — Downsamples and deduplicates data in object storage
## Installing Thanos
Using the Bitnami Helm chart:
```shell
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update

helm install thanos bitnami/thanos \
  --namespace monitoring \
  --create-namespace \
  --set objstoreConfig="$(cat thanos-objstore.yaml)"
```
Object store configuration (thanos-objstore.yaml):
```yaml
type: s3
config:
  bucket: thanos-metrics
  endpoint: minio.storage:9000
  access_key: ${MINIO_ACCESS_KEY}
  secret_key: ${MINIO_SECRET_KEY}
  insecure: true  # For MinIO without TLS
```
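The sidecar expects this config as a Kubernetes Secret (referenced below as `thanos-objstore-secret` with key `thanos.yaml`). One way to create it, assuming the file above is saved as `thanos-objstore.yaml` and the credentials are exported in your environment:

```shell
# kubectl won't expand the ${MINIO_*} placeholders itself, so render
# them first (envsubst ships with GNU gettext), then create the Secret.
envsubst < thanos-objstore.yaml > /tmp/thanos.yaml
kubectl create secret generic thanos-objstore-secret \
  --namespace monitoring \
  --from-file=thanos.yaml=/tmp/thanos.yaml
rm /tmp/thanos.yaml
```

The secret name and key must match `objectStorageConfig` in the Prometheus resource below.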
## Prometheus with Thanos Sidecar
Modify your Prometheus deployment to include the sidecar:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
  namespace: monitoring
spec:
  replicas: 2  # HA pair
  retention: 2h  # Short local retention, Thanos handles long-term
  # Thanos sidecar configuration
  thanos:
    baseImage: quay.io/thanos/thanos
    version: v0.34.0
    objectStorageConfig:
      key: thanos.yaml
      name: thanos-objstore-secret
  # External labels for deduplication
  externalLabels:
    cluster: production
    replica: $(POD_NAME)
  # Let Thanos sidecar access Prometheus data
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: longhorn
        resources:
          requests:
            storage: 50Gi
```
The sidecar:
- Exposes Prometheus data to Thanos Querier via gRPC
- Uploads completed TSDB blocks to object storage
- Responds to Store API queries for recent data
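To confirm blocks are actually landing in the bucket, the `thanos tools bucket ls` subcommand lists what has been shipped (a quick check, assuming the same `thanos-objstore.yaml` config file as above):

```shell
# List uploaded TSDB blocks; new block ULIDs should appear roughly
# every 2h, matching Prometheus's block cut interval
thanos tools bucket ls \
  --objstore.config-file=thanos-objstore.yaml
```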
## Thanos Components Configuration
### Querier
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: thanos-querier
  namespace: monitoring
spec:
  replicas: 2
  template:
    spec:
      containers:
        - name: thanos-query
          image: quay.io/thanos/thanos:v0.34.0
          args:
            - query
            - --http-address=0.0.0.0:9090
            - --grpc-address=0.0.0.0:10901
            # Connect to sidecars
            - --store=dnssrv+_grpc._tcp.prometheus-operated.monitoring.svc
            # Connect to store gateway
            - --store=dnssrv+_grpc._tcp.thanos-store.monitoring.svc
            # Deduplication
            - --query.replica-label=replica
          ports:
            - name: http
              containerPort: 9090
            - name: grpc
              containerPort: 10901
```
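Once the Querier is up, its store status endpoint shows which sidecars and gateways it has discovered and whether they're healthy. A quick check against a port-forwarded Querier (deployment name as above; `jq` assumed for readability):

```shell
# Port-forward the Querier, then list the stores it has discovered
kubectl -n monitoring port-forward deploy/thanos-querier 9090:9090 &
curl -s http://localhost:9090/api/v1/stores | jq .
```

Every sidecar and store gateway should appear here; a missing entry usually means a DNS SRV lookup problem.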
### Store Gateway
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: thanos-store
  namespace: monitoring
spec:
  replicas: 2
  template:
    spec:
      containers:
        - name: thanos-store
          image: quay.io/thanos/thanos:v0.34.0
          args:
            - store
            - --http-address=0.0.0.0:10902
            - --grpc-address=0.0.0.0:10901
            - --data-dir=/var/thanos/store
            - --objstore.config-file=/etc/thanos/objstore.yaml
          volumeMounts:
            - name: objstore-config
              mountPath: /etc/thanos
            - name: data
              mountPath: /var/thanos/store
      volumes:
        - name: objstore-config
          secret:
            secretName: thanos-objstore-secret
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        storageClassName: longhorn
        resources:
          requests:
            storage: 10Gi
```
### Compactor
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: thanos-compactor
  namespace: monitoring
spec:
  replicas: 1  # Only one compactor!
  template:
    spec:
      containers:
        - name: thanos-compact
          image: quay.io/thanos/thanos:v0.34.0
          args:
            - compact
            - --http-address=0.0.0.0:10902
            - --data-dir=/var/thanos/compact
            - --objstore.config-file=/etc/thanos/objstore.yaml
            - --retention.resolution-raw=30d
            - --retention.resolution-5m=90d
            - --retention.resolution-1h=1y
            - --wait
          volumeMounts:
            - name: objstore-config
              mountPath: /etc/thanos
            - name: data
              mountPath: /var/thanos/compact
```
Retention configuration:
- Raw data: 30 days at full resolution
- 5m downsampled: 90 days
- 1h downsampled: 1 year
Older data takes less space because it’s downsampled.
## GitOps Deployment
For ArgoCD:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: thanos
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://charts.bitnami.com/bitnami
    chart: thanos
    targetRevision: 12.20.0
    helm:
      values: |
        objstoreConfig: |-
          type: s3
          config:
            bucket: thanos-metrics
            endpoint: minio.storage:9000
            insecure: true
        query:
          enabled: true
          replicaCount: 2
          stores:
            - dnssrv+_grpc._tcp.prometheus-operated.monitoring.svc
        storegateway:
          enabled: true
          replicaCount: 2
          persistence:
            size: 20Gi
        compactor:
          enabled: true
          retentionResolutionRaw: 30d
          retentionResolution5m: 90d
          retentionResolution1h: 1y
          persistence:
            size: 50Gi
        ruler:
          enabled: false  # Use Prometheus rules instead
        receive:
          enabled: false  # Using sidecar mode
  destination:
    server: https://kubernetes.default.svc
    namespace: monitoring
```
## High Availability
Thanos enables Prometheus HA:
```yaml
# Run two Prometheus instances
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
spec:
  replicas: 2
  externalLabels:
    replica: $(POD_NAME)  # Different for each replica
```
Both Prometheus instances scrape the same targets. Thanos Querier deduplicates:
```yaml
# Querier configuration
args:
  - --query.replica-label=replica
  - --query.replica-label=prometheus_replica
```
Queries return deduplicated results automatically.
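You can watch deduplication work by toggling the Querier's `dedup` query parameter. With it off, each replica's series comes back separately (a sketch against a port-forwarded Querier on `localhost:9090`; `jq` assumed):

```shell
# Same instant query with and without deduplication; dedup=false
# returns one series per replica label value, dedup=true merges them
curl -s 'http://localhost:9090/api/v1/query?query=up&dedup=true'  | jq '.data.result | length'
curl -s 'http://localhost:9090/api/v1/query?query=up&dedup=false' | jq '.data.result | length'
```

With two Prometheus replicas scraping the same targets, the second count should be roughly double the first.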
## Global View Across Clusters
Add multiple clusters to the same Thanos deployment:
Cluster A Prometheus:
```yaml
externalLabels:
  cluster: production-eu
  replica: $(POD_NAME)
```
Cluster B Prometheus:
```yaml
externalLabels:
  cluster: production-us
  replica: $(POD_NAME)
```
Querier aggregates both:
```promql
# Total requests across all clusters
sum(rate(http_requests_total[5m]))

# Requests by cluster
sum by (cluster) (rate(http_requests_total[5m]))
```
## Grafana Integration
Point Grafana at Thanos Querier:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-datasources
data:
  thanos.yaml: |
    apiVersion: 1
    datasources:
      - name: Thanos
        type: prometheus
        url: http://thanos-query.monitoring:9090
        access: proxy
        isDefault: true
        jsonData:
          timeInterval: "15s"
```
All your existing Prometheus dashboards work — just point at Thanos instead.
## Recording Rules for Performance
Pre-compute expensive queries:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: recording-rules
spec:
  groups:
    - name: aggregations
      interval: 1m
      rules:
        # Pre-aggregate request rate by service
        - record: service:http_requests:rate5m
          expr: sum by (service) (rate(http_requests_total[5m]))
        # Pre-aggregate error rate
        - record: service:http_errors:rate5m
          expr: sum by (service) (rate(http_requests_total{status=~"5.."}[5m]))
        # Pre-compute availability
        - record: service:availability:ratio
          expr: |
            1 - (
              service:http_errors:rate5m /
              service:http_requests:rate5m
            )
```
Dashboards query the pre-computed `service:*` metrics instead of raw data.
## Alerting Architecture
Keep alerting close to data — run Alertmanager with Prometheus, not Thanos:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
spec:
  alerting:
    alertmanagers:
      - namespace: monitoring
        name: alertmanager
        port: web
  ruleSelector:
    matchLabels:
      role: alert-rules
```
Thanos Ruler exists but adds complexity. For most setups, Prometheus alerting is sufficient.
## Monitoring Thanos Itself
Thanos exposes Prometheus metrics. Monitor:
```promql
# Sidecar upload success
thanos_shipper_uploads_total
thanos_shipper_upload_failures_total

# Store gateway performance
thanos_bucket_store_series_fetch_duration_seconds
thanos_bucket_store_block_loads_total

# Compactor health
thanos_compact_group_compactions_total
thanos_compact_group_compaction_failures_total

# Querier performance
thanos_query_gate_duration_seconds
```
Alert on failures:
```yaml
- alert: ThanosSidecarUploadFailing
  expr: increase(thanos_shipper_upload_failures_total[1h]) > 0
  for: 15m
  labels:
    severity: warning
  annotations:
    summary: "Thanos sidecar failing to upload blocks"
```
## Storage Considerations
Object storage costs for metrics:
| Resolution | Data per day | 1 year cost (S3) |
|---|---|---|
| Raw (15s) | ~100MB/target | ~$4/target |
| 5m downsample | ~3MB/target | ~$0.12/target |
| 1h downsample | ~0.5MB/target | ~$0.02/target |
Downsampling is crucial for cost control. Don’t keep raw data forever.
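The savings compound: plugging the table's rough per-target figures into the 30d/90d/1y retention policy above, total storage per target ends up an order of magnitude below a single year of raw data (back-of-the-envelope sketch; real sizes depend on scrape interval, churn, and cardinality):

```shell
# Per-target storage under the retention policy above (MB, integer math)
raw=$((100 * 30))           # 30 days of raw data at ~100MB/day
ds5m=$((3 * 90))            # 90 days of 5m downsamples at ~3MB/day
ds1h=$((500 * 365 / 1000))  # 1 year of 1h downsamples at ~0.5MB/day
total=$((raw + ds5m + ds1h))
raw_year=$((100 * 365))     # vs. a full year of raw data
echo "$total MB with downsampling vs $raw_year MB raw"
```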
For self-hosted object storage, MinIO works well:
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: minio
spec:
  template:
    spec:
      containers:
        - name: minio
          image: minio/minio:latest
          args:
            - server
            - /data
            - --console-address
            - ":9001"
          env:
            - name: MINIO_ROOT_USER
              valueFrom:
                secretKeyRef:
                  name: minio-credentials
                  key: root-user
            - name: MINIO_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: minio-credentials
                  key: root-password
```
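Thanos won't create the bucket for you, so provision it once with the MinIO client (`mc`). A sketch using the endpoint and bucket name from the objstore config above (the `thanos-minio` alias is arbitrary, and the credentials are assumed to be exported):

```shell
# Register the in-cluster MinIO endpoint and create the metrics bucket
mc alias set thanos-minio http://minio.storage:9000 "$MINIO_ROOT_USER" "$MINIO_ROOT_PASSWORD"
mc mb thanos-minio/thanos-metrics
```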
## My Production Setup
```yaml
# Prometheus with sidecar
prometheus:
  replicas: 2
  retention: 6h  # Very short, Thanos handles long-term
  thanos:
    objectStorageConfig:
      name: thanos-objstore
  externalLabels:
    cluster: production
    environment: prod

# Thanos components
thanos:
  query:
    replicaCount: 2
    stores:
      - dnssrv+_grpc._tcp.prometheus-operated.monitoring.svc
      - dnssrv+_grpc._tcp.thanos-store.monitoring.svc
  storegateway:
    replicaCount: 2
    persistence:
      size: 50Gi
  compactor:
    retentionResolutionRaw: 14d
    retentionResolution5m: 60d
    retentionResolution1h: 365d
    persistence:
      size: 100Gi

# Object storage
minio:
  replicas: 4
  persistence:
    size: 500Gi
```
Key decisions:
- 6h local retention — Sidecar uploads frequently, no need for long local storage
- 14d raw retention — Full resolution for recent debugging
- 1 year 1h retention — Capacity planning and trends
- Self-hosted MinIO — Data sovereignty, no cloud dependency
## Why This Matters
Metrics are how you understand your systems. They answer:
- Is this service healthy?
- What changed before the incident?
- Are we meeting our SLOs?
- Where should we invest in optimization?
Without long-term metrics, you lose the ability to answer “compared to when?” Without cross-cluster queries, you can’t see the full picture.
Prometheus + Thanos gives you unlimited retention, global view, and high availability while keeping the familiar Prometheus interface.
This is understanding at scale.
You can’t improve what you can’t see over time. Thanos extends Prometheus from “what’s happening now” to “what’s been happening” — the difference between reactive firefighting and proactive optimization.
