You’ve got Prometheus for metrics. You can see what’s happening across your clusters. But when something breaks, metrics tell you that something is wrong — logs tell you why.
The traditional answer is Elasticsearch. It’s powerful, flexible, and… expensive. It indexes everything, which means you pay for every byte of log data in CPU, memory, and storage.
Loki takes a different approach: index labels, not content. It’s the same philosophy that makes Prometheus efficient for metrics, applied to logs.
Why Loki?
Loki was designed by Grafana Labs with specific goals:
- Cost efficient — Only index metadata (labels), store log lines compressed
- Kubernetes native — Labels from Kubernetes metadata automatically
- Grafana integration — Same dashboards, same alerting, same workflow
- Operationally simple — No JVM tuning, no cluster management complexity
The trade-off: you can’t do full-text search across all logs efficiently. You need to know which labels to filter on first. For Kubernetes workloads where you’re usually looking at specific pods, namespaces, or services, this is fine.
Architecture
flowchart TD
  subgraph cluster["Kubernetes Cluster"]
    subgraph nodes["Nodes"]
      P1["Promtail<br/>(DaemonSet)"]
      P2["Promtail"]
      P3["Promtail"]
    end
    PODS["Pod Logs<br/>(/var/log/pods)"]
  end
  P1 --> PODS
  P2 --> PODS
  P3 --> PODS
  P1 --> L["Loki"]
  P2 --> L
  P3 --> L
  L --> OS["Object Storage<br/>(chunks)"]
  L --> G["Grafana"]
Promtail runs on each node as a DaemonSet, tails log files, adds labels, and ships to Loki.
Loki receives logs, indexes by labels, compresses chunks, stores in object storage.
Grafana queries Loki using LogQL, displays in Explore or dashboards.
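To make the flow concrete, here is a rough sketch of what Promtail does on each push: it POSTs a JSON payload of label sets plus timestamped lines to Loki's push endpoint, `/loki/api/v1/push`. The endpoint and payload shape come from Loki's documented push API; the `loki:3100` service address is an assumption matching the install below.

```python
import json
import time
import urllib.request

# Loki's push API takes "streams": a label set plus [nanosecond-timestamp, line] pairs.
payload = {
    "streams": [
        {
            "stream": {"namespace": "demo", "app": "hello"},
            "values": [[str(time.time_ns()), "hello from a hand-rolled promtail"]],
        }
    ]
}

body = json.dumps(payload).encode()
req = urllib.request.Request(
    "http://loki:3100/loki/api/v1/push",  # assumed in-cluster service address
    data=body,
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(req)  # uncomment inside a cluster where Loki is reachable
print(body.decode())
```

Promtail batches and retries on top of this, but the wire format is exactly that simple.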
Installing Loki
Using the Grafana Helm charts:
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
# Install Loki with custom values
helm install loki grafana/loki \
  --namespace monitoring \
  --create-namespace \
  --values loki-values.yaml
Basic values for a single-binary deployment (good for small clusters):
# loki-values.yaml
loki:
  auth_enabled: false
  commonConfig:
    replication_factor: 1
  storage:
    type: filesystem
  schemaConfig:
    configs:
      - from: 2024-01-01
        store: tsdb
        object_store: filesystem
        schema: v13
        index:
          prefix: index_
          period: 24h
singleBinary:
  replicas: 1
  persistence:
    size: 50Gi
# Disable components not needed for single binary
backend:
  replicas: 0
read:
  replicas: 0
write:
  replicas: 0
For production with object storage:
# loki-values.yaml
loki:
  auth_enabled: false
  commonConfig:
    replication_factor: 3
  storage:
    type: s3
    s3:
      endpoint: minio.storage:9000
      bucketnames: loki-chunks
      access_key_id: ${MINIO_ACCESS_KEY}
      secret_access_key: ${MINIO_SECRET_KEY}
      insecure: true
  schemaConfig:
    configs:
      - from: 2024-01-01
        store: tsdb
        object_store: s3
        schema: v13
        index:
          prefix: index_
          period: 24h
# Scalable deployment
backend:
  replicas: 3
read:
  replicas: 3
write:
  replicas: 3
Installing Promtail
Promtail collects logs from your nodes:
helm install promtail grafana/promtail \
  --namespace monitoring \
  --set "config.clients[0].url=http://loki:3100/loki/api/v1/push"
Promtail configuration for Kubernetes:
# promtail-values.yaml
config:
  clients:
    - url: http://loki:3100/loki/api/v1/push
  snippets:
    # Add Kubernetes metadata as labels
    pipelineStages:
      - cri: {}
      - labeldrop:
          - filename
      - match:
          selector: '{app="nginx"}'
          stages:
            - regex:
                expression: '^(?P<remote_addr>[\d\.]+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] "(?P<request>[^"]+)" (?P<status>\d+)'
            - labels:
                status:
# DaemonSet tolerations for all nodes
tolerations:
  - operator: Exists
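That regex stage is easy to get subtly wrong, so it is worth checking offline. Python's `re` module accepts the same `(?P<name>...)` named-group syntax Promtail's RE2 engine uses, so you can test the expression against a sample line before shipping it; the access-log line below is made up for illustration.

```python
import re

# Same expression as the promtail regex stage above.
pattern = re.compile(
    r'^(?P<remote_addr>[\d\.]+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]+)" (?P<status>\d+)'
)

# A made-up nginx access-log line to test against.
line = '10.0.0.7 - frank [10/Oct/2024:13:55:36 +0000] "GET /api/orders HTTP/1.1" 502'

m = pattern.match(line)
print(m.group("status"))   # the field the pipeline promotes to a label
print(m.group("request"))
```

Note that RE2 lacks a few Python `re` features (backreferences, lookarounds), so stick to the plain constructs shown here.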
Understanding Labels
Labels are everything in Loki. They determine how logs are indexed and queried.
Default Kubernetes labels from Promtail:
| Label | Source | Example |
|---|---|---|
| namespace | Pod namespace | default, monitoring |
| pod | Pod name | nginx-abc123 |
| container | Container name | nginx, sidecar |
| node_name | Node | worker-1 |
| app | Pod label | nginx |
| job | Scrape config | kubernetes-pods |
High cardinality warning: Don’t add labels that have many unique values (like request IDs, user IDs, or timestamps). This kills Loki’s performance. Keep labels to dimensions you’ll filter on.
Good labels:
namespace, app, environment, team
Bad labels:
request_id, user_id, trace_id, timestamp
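The warning is worth quantifying: Loki creates one stream (and one set of chunks) per unique label combination, so the stream count is roughly the product of each label's distinct values. A back-of-the-envelope calculation with illustrative, made-up counts:

```python
from math import prod

# Rough per-label distinct-value counts (illustrative numbers, not measured).
good_labels = {"namespace": 20, "app": 50, "environment": 3, "team": 10}
bad_labels = {**good_labels, "user_id": 100_000}  # one high-cardinality label added

good_streams = prod(good_labels.values())
bad_streams = prod(bad_labels.values())

print(f"good: {good_streams:,} streams")  # 30,000: manageable
print(f"bad:  {bad_streams:,} streams")   # 3,000,000,000: index explosion
```

One bad label multiplies every existing stream, which is why Loki's answer for high-cardinality values is content filters (`|=`, `| json`) rather than labels.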
LogQL: Querying Logs
LogQL is Loki’s query language. It looks like PromQL but for logs.
Basic Queries
# All logs from a namespace
{namespace="production"}
# Specific app
{app="frontend", namespace="production"}
# Multiple containers
{container=~"nginx|envoy"}
# Exclude a namespace
{namespace!="kube-system"}
Filtering Content
# Lines containing "error"
{app="frontend"} |= "error"
# Lines NOT containing "health"
{app="frontend"} != "health"
# Regex match
{app="frontend"} |~ "status=(4|5)[0-9]{2}"
# Case insensitive
{app="frontend"} |~ "(?i)error"
Parsing and Extracting
# Parse JSON logs
{app="api"} | json
# Extract specific field
{app="api"} | json | status_code >= 500
# Parse with pattern
{app="nginx"} | pattern `<ip> - - [<_>] "<method> <path> <_>" <status>`
# Use extracted fields
{app="nginx"} | pattern `<_> - - [<_>] "<method> <path> <_>" <status>` | status >= 400
Aggregations (Log Metrics)
# Count errors per app
sum by (app) (count_over_time({namespace="production"} |= "error" [5m]))
# Rate of requests
sum(rate({app="nginx"} | pattern `<_> "<method> <path> <_>" <status>` [1m])) by (status)
# Bytes per namespace
sum by (namespace) (bytes_over_time({job="kubernetes-pods"}[1h]))
Grafana Integration
Add Loki as a data source in Grafana:
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-datasources
  namespace: monitoring
data:
  loki.yaml: |
    apiVersion: 1
    datasources:
      - name: Loki
        type: loki
        url: http://loki:3100
        access: proxy
        jsonData:
          maxLines: 1000
Explore View
Grafana Explore is perfect for log investigation:
- Select Loki data source
- Build query with label browser
- Filter with content matches
- Click on log lines for context
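Explore is ultimately calling Loki's HTTP query API, and you can hit the same endpoint directly, which is handy for scripting. A sketch against Loki's documented `/loki/api/v1/query_range` endpoint; the service address is an assumption:

```python
import time
import urllib.parse

base = "http://loki:3100/loki/api/v1/query_range"  # assumed in-cluster address
params = {
    "query": '{namespace="production"} |= "error"',
    "start": str(time.time_ns() - 3600 * 10**9),  # last hour, nanosecond timestamps
    "end": str(time.time_ns()),
    "limit": "100",
}
url = base + "?" + urllib.parse.urlencode(params)
print(url)

# Inside the cluster you could then fetch it (needs urllib.request and json):
# resp = json.loads(urllib.request.urlopen(url).read())
# for stream in resp["data"]["result"]:
#     print(stream["stream"], len(stream["values"]), "lines")
```

The response groups matching lines by stream (label set), mirroring how Loki stores them.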
Dashboard Panels
Add logs to your dashboards:
{
  "type": "logs",
  "datasource": "Loki",
  "targets": [
    {
      "expr": "{namespace=\"production\", app=\"frontend\"} |= \"error\"",
      "refId": "A"
    }
  ],
  "options": {
    "showTime": true,
    "showLabels": false,
    "wrapLogMessage": true
  }
}
Correlating Metrics and Logs
The power of Grafana: same dashboard shows metrics and logs.
# Prometheus panel showing error rate
sum(rate(http_requests_total{status=~"5.."}[5m])) by (app)
# Loki panel showing error logs
{app="$app"} |= "error"
Variable $app links both panels. Click a spike in the metrics, see the errors in the logs.
Alerting on Logs
Loki supports alerting through Grafana or its own ruler:
# Grafana alert rule
apiVersion: 1
groups:
  - name: LogAlerts
    rules:
      - alert: HighErrorRate
        expr: |
          sum(count_over_time({namespace="production"} |= "error" [5m])) > 100
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High error rate in production logs"
Using Loki’s ruler component:
# loki-rules.yaml
groups:
  - name: errors
    rules:
      - alert: CriticalError
        expr: |
          count_over_time({app="payment-service"} |= "CRITICAL" [1m]) > 0
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: "Critical error in payment service"
Retention and Storage
Configure retention in Loki:
loki:
  limits_config:
    retention_period: 30d
  compactor:
    working_directory: /var/loki/compactor
    retention_enabled: true
    retention_delete_delay: 2h
    retention_delete_worker_count: 150
Per-tenant retention (if using multi-tenancy):
loki:
  limits_config:
    retention_period: 30d    # Default
  overrides:
    production:
      retention_period: 90d  # Keep production logs longer
    development:
      retention_period: 7d   # Dev logs expire faster
Performance Tuning
Chunk Size
Larger chunks = fewer index entries, better compression, higher latency for small queries:
loki:
  ingester:
    chunk_target_size: 1572864  # 1.5MB
    chunk_idle_period: 30m
    max_chunk_age: 2h
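To see how those knobs interact, consider when a chunk actually flushes: at the earliest of filling to chunk_target_size, sitting idle for chunk_idle_period, or reaching max_chunk_age. A rough calculation with an assumed, illustrative per-stream ingest rate:

```python
# Assumed: one stream writing 200-byte lines at 5 lines/second (illustrative).
line_bytes, lines_per_sec = 200, 5
chunk_target_size = 1_572_864   # 1.5 MB, as configured above
max_chunk_age_s = 2 * 3600      # 2h, as configured above

seconds_to_fill = chunk_target_size / (line_bytes * lines_per_sec)
print(f"fills in ~{seconds_to_fill / 60:.0f} minutes")

# Flush fires at the earliest trigger (idle_period only matters if the stream stops).
flush_after = min(seconds_to_fill, max_chunk_age_s)
print(f"chunk flushes after ~{flush_after / 60:.0f} minutes")
```

At this rate the size trigger wins; quieter streams instead hit the idle or age triggers, producing smaller, less well-compressed chunks.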
Query Limits
Prevent runaway queries:
loki:
  limits_config:
    max_query_length: 721h             # Max time range
    max_query_parallelism: 32          # Concurrent sub-queries
    max_entries_limit_per_query: 5000
Caching
Add caching for better query performance:
loki:
  memcached:
    chunk_cache:
      enabled: true
      host: memcached.monitoring
    results_cache:
      enabled: true
      host: memcached.monitoring
Structured Logging Best Practices
To get the most from Loki, log in structured format:
{
  "level": "error",
  "message": "Failed to process order",
  "order_id": "12345",
  "error": "payment declined",
  "duration_ms": 234
}
Query structured logs easily:
{app="order-service"} | json | level="error" | duration_ms > 1000
Configure your apps to output JSON:
# Spring Boot (application.yaml)
logging.pattern.console: '{"timestamp":"%d","level":"%p","logger":"%c","message":"%m"}%n'

// Node.js with winston
const logger = winston.createLogger({
  format: winston.format.json(),
});

// Go with zap
logger, _ := zap.NewProduction()
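For Python services, the standard library alone gets you there. A minimal JSON formatter sketch, not a full production logging setup:

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line, which Loki's `| json` parses."""
    def format(self, record):
        entry = {
            "level": record.levelname.lower(),
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Carry through structured extras like order_id or duration_ms.
        for key in ("order_id", "duration_ms", "error"):
            if hasattr(record, key):
                entry[key] = getattr(record, key)
        return json.dumps(entry)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("order-service")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.error("Failed to process order",
             extra={"order_id": "12345", "duration_ms": 234})
```

Each call emits one JSON line to stdout/stderr, which Promtail picks up unchanged from the container runtime.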
My Production Setup
# Loki with object storage
loki:
  auth_enabled: false
  commonConfig:
    replication_factor: 3
  storage:
    type: s3
    s3:
      endpoint: minio.storage:9000
      bucketnames: loki-data
  limits_config:
    retention_period: 30d
    ingestion_rate_mb: 10
    ingestion_burst_size_mb: 20

# Three-replica deployment
backend:
  replicas: 3
  persistence:
    size: 50Gi
read:
  replicas: 3
write:
  replicas: 3
  persistence:
    size: 50Gi

# Promtail on all nodes
promtail:
  tolerations:
    - operator: Exists
  config:
    clients:
      - url: http://loki:3100/loki/api/v1/push
Key decisions:
- Object storage: MinIO for sovereignty, no cloud dependency
- 30-day retention: Enough for debugging, not infinite
- Three replicas: Survives node failures
- All nodes: Promtail runs everywhere, including control plane
Loki vs Elasticsearch
| Aspect | Loki | Elasticsearch |
|---|---|---|
| Indexing | Labels only | Full-text |
| Storage cost | Lower | Higher |
| Query flexibility | Label-first | Full-text search |
| Operations | Simpler | Complex |
| Memory usage | Lower | Higher (JVM) |
| Grafana integration | Native | Good |
Choose Loki when:
- You query by known dimensions (namespace, app, pod)
- Cost matters
- You want operational simplicity
- You’re already in the Grafana ecosystem
Choose Elasticsearch when:
- You need full-text search across all logs
- You don’t know what you’re looking for
- Log analytics is a primary use case
Why This Matters
Logs are the narrative of your system. Metrics tell you the health score, logs tell you the story.
With Loki, you get:
- Affordable log retention — Keep logs without breaking the budget
- Kubernetes-native labels — Query by what matters (namespace, app, pod)
- Unified observability — Same Grafana, same workflow as metrics
Combined with Prometheus/Thanos for metrics and traces (covered separately), you have complete observability without the operational complexity of the ELK stack.
Logs are the story your system tells about itself. Loki makes sure you can afford to listen.
