You can’t fix what you can’t see. You can’t optimize what you can’t measure.
Prometheus is the standard for Kubernetes metrics. It works beautifully — until you need long-term storage, or multiple clusters, or high availability. Then you hit its limits.
Thanos extends Prometheus without replacing it. Keep your existing setup, add Thanos components, get unlimited retention and global querying.
## The Problem with Standalone Prometheus
Prometheus has built-in limitations:
- Single node — No native clustering or HA
- Local storage — Retention limited by disk size
- Single cluster view — Can’t query across clusters
- No downsampling — Old data takes as much space as new
For a single small cluster with two weeks of retention, these aren't problems. For production multi-cluster environments with compliance requirements, they're blockers.
## Thanos Architecture
Thanos adds components around Prometheus:
```mermaid
flowchart TD
    subgraph clusterA["Cluster A"]
        PA["Prometheus + Sidecar"]
    end
    subgraph clusterB["Cluster B"]
        PB["Prometheus + Sidecar"]
    end
    PA --> OS["Object Storage<br/>(S3/MinIO/GCS)"]
    PB --> OS
    PA --> Q["Querier"]
    PB --> Q
    OS --> SG["Store Gateway"]
    OS --> C["Compactor"]
    SG --> Q
    Q --> G["Grafana"]
```
- **Sidecar** — Runs alongside Prometheus, uploads blocks to object storage
- **Store Gateway** — Serves historical data from object storage
- **Querier** — Aggregates data from sidecars and store gateway
- **Compactor** — Downsamples and deduplicates data in object storage
## Installing Thanos
Using the Bitnami Helm chart:
```shell
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update

helm install thanos bitnami/thanos \
  --namespace monitoring \
  --create-namespace \
  --set objstoreConfig="$(cat thanos-objstore.yaml)"
```
Object store configuration (thanos-objstore.yaml):
```yaml
type: s3
config:
  bucket: thanos-metrics
  endpoint: minio.storage:9000
  access_key: ${MINIO_ACCESS_KEY}
  secret_key: ${MINIO_SECRET_KEY}
  insecure: true  # For MinIO without TLS
```
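The sidecar expects this config as a Kubernetes Secret (referenced below as `thanos-objstore-secret` with key `thanos.yaml`). One way to create it, assuming the file above is saved as `thanos-objstore.yaml` and the credentials are exported in your environment:

```shell
# kubectl won't expand the ${MINIO_*} placeholders itself, so render
# them first (envsubst ships with GNU gettext), then create the Secret.
envsubst < thanos-objstore.yaml > /tmp/thanos.yaml
kubectl create secret generic thanos-objstore-secret \
  --namespace monitoring \
  --from-file=thanos.yaml=/tmp/thanos.yaml
rm /tmp/thanos.yaml
```

The secret name and key must match `objectStorageConfig` in the Prometheus resource below.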
## Prometheus with Thanos Sidecar
Modify your Prometheus deployment to include the sidecar:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
  namespace: monitoring
spec:
  replicas: 2  # HA pair
  retention: 2h  # Short local retention, Thanos handles long-term
  # Thanos sidecar configuration
  thanos:
    baseImage: quay.io/thanos/thanos
    version: v0.34.0
    objectStorageConfig:
      key: thanos.yaml
      name: thanos-objstore-secret
  # External labels for deduplication
  externalLabels:
    cluster: production
    replica: $(POD_NAME)
  # Let Thanos sidecar access Prometheus data
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: longhorn
        resources:
          requests:
            storage: 50Gi
```
The sidecar:
- Exposes Prometheus data to Thanos Querier via gRPC
- Uploads completed TSDB blocks to object storage
- Responds to Store API queries for recent data
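To confirm blocks are actually landing in the bucket, the `thanos tools bucket ls` subcommand lists what has been shipped (a quick check, assuming the same `thanos-objstore.yaml` config file as above):

```shell
# List uploaded TSDB blocks; new block ULIDs should appear roughly
# every 2h, matching Prometheus's block cut interval
thanos tools bucket ls \
  --objstore.config-file=thanos-objstore.yaml
```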
## Thanos Components Configuration
### Querier
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: thanos-querier
  namespace: monitoring
spec:
  replicas: 2
  template:
    spec:
      containers:
        - name: thanos-query
          image: quay.io/thanos/thanos:v0.34.0
          args:
            - query
            - --http-address=0.0.0.0:9090
            - --grpc-address=0.0.0.0:10901
            # Connect to sidecars
            - --store=dnssrv+_grpc._tcp.prometheus-operated.monitoring.svc
            # Connect to store gateway
            - --store=dnssrv+_grpc._tcp.thanos-store.monitoring.svc
            # Deduplication
            - --query.replica-label=replica
          ports:
            - name: http
              containerPort: 9090
            - name: grpc
              containerPort: 10901
```
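Once the Querier is up, its store status endpoint shows which sidecars and gateways it has discovered and whether they're healthy. A quick check against a port-forwarded Querier (deployment name as above; `jq` assumed for readability):

```shell
# Port-forward the Querier, then list the stores it has discovered
kubectl -n monitoring port-forward deploy/thanos-querier 9090:9090 &
curl -s http://localhost:9090/api/v1/stores | jq .
```

Every sidecar and store gateway should appear here; a missing entry usually means a DNS SRV lookup problem.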
### Store Gateway
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: thanos-store
  namespace: monitoring
spec:
  replicas: 2
  template:
    spec:
      containers:
        - name: thanos-store
          image: quay.io/thanos/thanos:v0.34.0
          args:
            - store
            - --http-address=0.0.0.0:10902
            - --grpc-address=0.0.0.0:10901
            - --data-dir=/var/thanos/store
            - --objstore.config-file=/etc/thanos/objstore.yaml
          volumeMounts:
            - name: objstore-config
              mountPath: /etc/thanos
            - name: data
              mountPath: /var/thanos/store
      volumes:
        - name: objstore-config
          secret:
            secretName: thanos-objstore-secret
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        storageClassName: longhorn
        resources:
          requests:
            storage: 10Gi
```
### Compactor
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: thanos-compactor
  namespace: monitoring
spec:
  replicas: 1  # Only one compactor!
  template:
    spec:
      containers:
        - name: thanos-compact
          image: quay.io/thanos/thanos:v0.34.0
          args:
            - compact
            - --http-address=0.0.0.0:10902
            - --data-dir=/var/thanos/compact
            - --objstore.config-file=/etc/thanos/objstore.yaml
            - --retention.resolution-raw=30d
            - --retention.resolution-5m=90d
            - --retention.resolution-1h=1y
            - --wait
          volumeMounts:
            - name: objstore-config
              mountPath: /etc/thanos
            - name: data
              mountPath: /var/thanos/compact
```
Retention configuration:
- Raw data: 30 days at full resolution
- 5m downsampled: 90 days
- 1h downsampled: 1 year
Older data takes less space because it’s downsampled.
## GitOps Deployment
For ArgoCD:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: thanos
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://charts.bitnami.com/bitnami
    chart: thanos
    targetRevision: 12.20.0
    helm:
      values: |
        objstoreConfig: |-
          type: s3
          config:
            bucket: thanos-metrics
            endpoint: minio.storage:9000
            insecure: true
        query:
          enabled: true
          replicaCount: 2
          stores:
            - dnssrv+_grpc._tcp.prometheus-operated.monitoring.svc
        storegateway:
          enabled: true
          replicaCount: 2
          persistence:
            size: 20Gi
        compactor:
          enabled: true
          retentionResolutionRaw: 30d
          retentionResolution5m: 90d
          retentionResolution1h: 1y
          persistence:
            size: 50Gi
        ruler:
          enabled: false  # Use Prometheus rules instead
        receive:
          enabled: false  # Using sidecar mode
  destination:
    server: https://kubernetes.default.svc
    namespace: monitoring
```
## High Availability
Thanos enables Prometheus HA:
```yaml
# Run two Prometheus instances
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
spec:
  replicas: 2
  externalLabels:
    replica: $(POD_NAME)  # Different for each replica
```
Both Prometheus instances scrape the same targets. Thanos Querier deduplicates:
```yaml
# Querier configuration
args:
  - --query.replica-label=replica
  - --query.replica-label=prometheus_replica
```
Queries return deduplicated results automatically.
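You can watch deduplication work by toggling the Querier's `dedup` query parameter. With it off, each replica's series comes back separately (a sketch against a port-forwarded Querier on `localhost:9090`; `jq` assumed):

```shell
# Same instant query with and without deduplication; dedup=false
# returns one series per replica label value, dedup=true merges them
curl -s 'http://localhost:9090/api/v1/query?query=up&dedup=true'  | jq '.data.result | length'
curl -s 'http://localhost:9090/api/v1/query?query=up&dedup=false' | jq '.data.result | length'
```

With two Prometheus replicas scraping the same targets, the second count should be roughly double the first.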
## Global View Across Clusters
Add multiple clusters to the same Thanos deployment:
Cluster A Prometheus:
```yaml
externalLabels:
  cluster: production-eu
  replica: $(POD_NAME)
```
Cluster B Prometheus:
```yaml
externalLabels:
  cluster: production-us
  replica: $(POD_NAME)
```
Querier aggregates both:
```promql
# Total requests across all clusters
sum(rate(http_requests_total[5m]))

# Requests by cluster
sum by (cluster) (rate(http_requests_total[5m]))
```
## Grafana Integration
Point Grafana at Thanos Querier:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-datasources
data:
  thanos.yaml: |
    apiVersion: 1
    datasources:
      - name: Thanos
        type: prometheus
        url: http://thanos-query.monitoring:9090
        access: proxy
        isDefault: true
        jsonData:
          timeInterval: "15s"
```
All your existing Prometheus dashboards work — just point at Thanos instead.
## Recording Rules for Performance
Pre-compute expensive queries:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: recording-rules
spec:
  groups:
    - name: aggregations
      interval: 1m
      rules:
        # Pre-aggregate request rate by service
        - record: service:http_requests:rate5m
          expr: sum by (service) (rate(http_requests_total[5m]))
        # Pre-aggregate error rate
        - record: service:http_errors:rate5m
          expr: sum by (service) (rate(http_requests_total{status=~"5.."}[5m]))
        # Pre-compute availability
        - record: service:availability:ratio
          expr: |
            1 - (
              service:http_errors:rate5m /
              service:http_requests:rate5m
            )
```
Dashboards query the pre-computed `service:*` metrics instead of raw data.
## Alerting Architecture
Keep alerting close to data — run Alertmanager with Prometheus, not Thanos:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
spec:
  alerting:
    alertmanagers:
      - namespace: monitoring
        name: alertmanager
        port: web
  ruleSelector:
    matchLabels:
      role: alert-rules
```
Thanos Ruler exists but adds complexity. For most setups, Prometheus alerting is sufficient.
## Monitoring Thanos Itself
Thanos exposes Prometheus metrics. Monitor:
```promql
# Sidecar upload success
thanos_shipper_uploads_total
thanos_shipper_upload_failures_total

# Store gateway performance
thanos_bucket_store_series_fetch_duration_seconds
thanos_bucket_store_block_loads_total

# Compactor health
thanos_compact_group_compactions_total
thanos_compact_group_compaction_failures_total

# Querier performance
thanos_query_gate_duration_seconds
```
Alert on failures:
```yaml
- alert: ThanosSidecarUploadFailing
  expr: increase(thanos_shipper_upload_failures_total[1h]) > 0
  for: 15m
  labels:
    severity: warning
  annotations:
    summary: "Thanos sidecar failing to upload blocks"
```
## Storage Considerations
Object storage costs for metrics:
| Resolution | Data per day | 1 year cost (S3) |
|---|---|---|
| Raw (15s) | ~100MB/target | ~$4/target |
| 5m downsample | ~3MB/target | ~$0.12/target |
| 1h downsample | ~0.5MB/target | ~$0.02/target |
Downsampling is crucial for cost control. Don’t keep raw data forever.
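The savings compound: plugging the table's rough per-target figures into the 30d/90d/1y retention policy above, total storage per target ends up an order of magnitude below a single year of raw data (back-of-the-envelope sketch; real sizes depend on scrape interval, churn, and cardinality):

```shell
# Per-target storage under the retention policy above (MB, integer math)
raw=$((100 * 30))           # 30 days of raw data at ~100MB/day
ds5m=$((3 * 90))            # 90 days of 5m downsamples at ~3MB/day
ds1h=$((500 * 365 / 1000))  # 1 year of 1h downsamples at ~0.5MB/day
total=$((raw + ds5m + ds1h))
raw_year=$((100 * 365))     # vs. a full year of raw data
echo "$total MB with downsampling vs $raw_year MB raw"
```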
For self-hosted object storage, MinIO works well:
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: minio
spec:
  template:
    spec:
      containers:
        - name: minio
          image: minio/minio:latest
          args:
            - server
            - /data
            - --console-address
            - ":9001"
          env:
            - name: MINIO_ROOT_USER
              valueFrom:
                secretKeyRef:
                  name: minio-credentials
                  key: root-user
            - name: MINIO_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: minio-credentials
                  key: root-password
```
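Thanos won't create the bucket for you, so provision it once with the MinIO client (`mc`). A sketch using the endpoint and bucket name from the objstore config above (the `thanos-minio` alias is arbitrary, and the credentials are assumed to be exported):

```shell
# Register the in-cluster MinIO endpoint and create the metrics bucket
mc alias set thanos-minio http://minio.storage:9000 "$MINIO_ROOT_USER" "$MINIO_ROOT_PASSWORD"
mc mb thanos-minio/thanos-metrics
```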
## My Production Setup
```yaml
# Prometheus with sidecar
prometheus:
  replicas: 2
  retention: 6h  # Very short, Thanos handles long-term
  thanos:
    objectStorageConfig:
      name: thanos-objstore
  externalLabels:
    cluster: production
    environment: prod

# Thanos components
thanos:
  query:
    replicaCount: 2
    stores:
      - dnssrv+_grpc._tcp.prometheus-operated.monitoring.svc
      - dnssrv+_grpc._tcp.thanos-store.monitoring.svc
  storegateway:
    replicaCount: 2
    persistence:
      size: 50Gi
  compactor:
    retentionResolutionRaw: 14d
    retentionResolution5m: 60d
    retentionResolution1h: 365d
    persistence:
      size: 100Gi

# Object storage
minio:
  replicas: 4
  persistence:
    size: 500Gi
```
Key decisions:
- 6h local retention — Sidecar uploads frequently, no need for long local storage
- 14d raw retention — Full resolution for recent debugging
- 1 year 1h retention — Capacity planning and trends
- Self-hosted MinIO — Data sovereignty, no cloud dependency
## Why This Matters
Metrics are how you understand your systems. They answer:
- Is this service healthy?
- What changed before the incident?
- Are we meeting our SLOs?
- Where should we invest in optimization?
Without long-term metrics, you lose the ability to answer “compared to when?” Without cross-cluster queries, you can’t see the full picture.
Prometheus + Thanos gives you unlimited retention, global view, and high availability while keeping the familiar Prometheus interface.
This is understanding at scale.
You can’t improve what you can’t see over time. Thanos extends Prometheus from “what’s happening now” to “what’s been happening” — the difference between reactive firefighting and proactive optimization.
