You have metrics telling you something is slow. You have logs telling you errors happened. But which request failed? Where did the latency come from? Which service in the chain caused the timeout?
This is where distributed tracing comes in. It follows individual requests as they flow through your microservices, showing you exactly what happened and where.
## The Observability Triangle

```mermaid
flowchart TD
    subgraph observability["Complete Observability"]
        M["Metrics<br/>(Prometheus/Thanos)<br/>WHAT is happening"]
        L["Logs<br/>(Loki)<br/>WHY it happened"]
        T["Traces<br/>(Tempo)<br/>WHERE it happened"]
    end
    M <--> L
    L <--> T
    T <--> M
    G["Grafana"] --> M
    G --> L
    G --> T
```
- Metrics answer: “What is the error rate? What is the latency?”
- Logs answer: “What error message? What was the context?”
- Traces answer: “Which service? Which call? What was the path?”
Together, they give you complete understanding.
## What is a Trace?

A trace is a tree of spans representing work done for a single request:

```mermaid
flowchart LR
    subgraph trace["Trace: order-12345"]
        A["API Gateway<br/>250ms"] --> B["Order Service<br/>180ms"]
        B --> C["Inventory Check<br/>45ms"]
        B --> D["Payment Service<br/>120ms"]
        D --> E["Bank API<br/>95ms"]
        B --> F["Notification<br/>15ms"]
    end
```
Each box is a span. Spans have:
- Name: What operation (e.g., “HTTP GET /orders”)
- Duration: How long it took
- Parent: Which span initiated this one
- Attributes: Key-value metadata (user_id, order_id, etc.)
- Status: Success/error
The trace ID links all spans from the same request across all services.
## Why Tempo?

Grafana Tempo is designed to be:
- Cost-effective — Object storage backend, no indexing
- Simple — No complex cluster management
- Scalable — Handles massive trace volumes
- Integrated — Native Grafana support, links to metrics/logs
Like Loki for logs, Tempo only indexes trace IDs. It doesn’t index spans or attributes. This keeps costs low but means you need trace IDs to query — you can’t search for “all traces with user_id=123”.
The solution: use metrics and logs to find trace IDs, then deep-dive in Tempo.
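For example, if your services log the active trace ID (most OpenTelemetry logging integrations can inject it), a Loki query like the following surfaces trace IDs you can open in Tempo. The label and field names here are illustrative and depend on your setup:

```logql
{app="checkout", namespace="prod"} |= "error" | logfmt | trace_id != ""
```

Copy the `trace_id` field from a matching log line and paste it into Tempo's trace lookup.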
## Architecture

```mermaid
flowchart TD
    subgraph apps["Applications"]
        A1["Service A<br/>(instrumented)"]
        A2["Service B<br/>(instrumented)"]
        A3["Service C<br/>(instrumented)"]
    end
    subgraph collector["OpenTelemetry"]
        OC["OTel Collector"]
    end
    A1 -->|"OTLP"| OC
    A2 -->|"OTLP"| OC
    A3 -->|"OTLP"| OC
    OC -->|"traces"| T["Tempo"]
    OC -->|"metrics"| P["Prometheus"]
    OC -->|"logs"| L["Loki"]
    T --> OS["Object Storage"]
    T --> G["Grafana"]
    P --> G
    L --> G
```
Applications are instrumented with OpenTelemetry SDKs. The OTel Collector receives telemetry, processes it, and exports it to the right backend. Tempo stores traces in object storage. Grafana visualizes and correlates everything.
## Installing Tempo

Using Helm:

```shell
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

helm install tempo grafana/tempo \
  --namespace monitoring \
  --values tempo-values.yaml
```
Basic single-binary deployment:

```yaml
# tempo-values.yaml
tempo:
  storage:
    trace:
      backend: local
      local:
        path: /var/tempo/traces
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317
        http:
          endpoint: 0.0.0.0:4318

persistence:
  enabled: true
  size: 50Gi
```
Production with object storage:

```yaml
# tempo-values.yaml
tempo:
  storage:
    trace:
      backend: s3
      s3:
        bucket: tempo-traces
        endpoint: minio.storage:9000
        access_key: ${MINIO_ACCESS_KEY}
        secret_key: ${MINIO_SECRET_KEY}
        insecure: true
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317
        http:
          endpoint: 0.0.0.0:4318

# Retention
compactor:
  replicas: 1
  compaction:
    block_retention: 48h

# Distributed mode for scale
distributor:
  replicas: 2
ingester:
  replicas: 3
querier:
  replicas: 2
```
## Installing OpenTelemetry Collector

The OTel Collector acts as a pipeline for all telemetry:

```shell
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts

helm install otel-collector open-telemetry/opentelemetry-collector \
  --namespace monitoring \
  --values otel-collector-values.yaml
```
Collector configuration:

```yaml
# otel-collector-values.yaml
mode: deployment
replicaCount: 2

config:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317
        http:
          endpoint: 0.0.0.0:4318

  processors:
    batch:
      timeout: 1s
      send_batch_size: 1024

    # Add Kubernetes metadata
    k8sattributes:
      auth_type: serviceAccount
      extract:
        metadata:
          - k8s.namespace.name
          - k8s.pod.name
          - k8s.deployment.name

    # Sample to reduce volume (adjust rate as needed; a processor
    # only takes effect once added to a pipeline below)
    probabilistic_sampler:
      sampling_percentage: 10

  exporters:
    otlp/tempo:
      endpoint: tempo.monitoring:4317
      tls:
        insecure: true

    prometheus:
      endpoint: 0.0.0.0:8889
      namespace: otel

  service:
    pipelines:
      traces:
        receivers: [otlp]
        processors: [k8sattributes, batch]
        exporters: [otlp/tempo]
      metrics:
        receivers: [otlp]
        processors: [batch]
        exporters: [prometheus]
```
## Instrumenting Applications

### Auto-Instrumentation (Easy Mode)

For many languages, OpenTelemetry can instrument automatically without code changes.

Java:
```yaml
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
        - name: app
          image: my-java-app:latest
          env:
            - name: JAVA_TOOL_OPTIONS
              value: "-javaagent:/otel/opentelemetry-javaagent.jar"
            - name: OTEL_SERVICE_NAME
              value: "order-service"
            - name: OTEL_EXPORTER_OTLP_ENDPOINT
              value: "http://otel-collector.monitoring:4317"
          volumeMounts:
            - name: otel-agent
              mountPath: /otel
      initContainers:
        - name: otel-agent
          image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:latest
          command: [cp, /javaagent.jar, /otel/opentelemetry-javaagent.jar]
          volumeMounts:
            - name: otel-agent
              mountPath: /otel
      volumes:
        - name: otel-agent
          emptyDir: {}
```
Python:

```dockerfile
FROM python:3.11
RUN pip install opentelemetry-distro opentelemetry-exporter-otlp
RUN opentelemetry-bootstrap -a install
CMD ["opentelemetry-instrument", "python", "app.py"]
```
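The auto-instrumented Python process still needs to know its name and where to send data. The usual approach is to set the standard OTel environment variables on the Deployment, mirroring the Java example (values here are illustrative):

```yaml
env:
  - name: OTEL_SERVICE_NAME
    value: "order-service"
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: "http://otel-collector.monitoring:4317"
```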
Node.js:

```javascript
// tracing.js - require this first
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-grpc');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');

const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({
    url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT || 'http://otel-collector:4317',
  }),
  instrumentations: [getNodeAutoInstrumentations()],
  serviceName: process.env.OTEL_SERVICE_NAME || 'my-service',
});

sdk.start();
```
### Manual Instrumentation (More Control)

For custom spans and attributes:

```go
// Go example
import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
)

func ProcessOrder(ctx context.Context, orderID string) error {
	tracer := otel.Tracer("order-service")
	ctx, span := tracer.Start(ctx, "process-order")
	defer span.End()

	// Add attributes
	span.SetAttributes(
		attribute.String("order.id", orderID),
		attribute.String("order.type", "standard"),
	)

	// Create child span for sub-operation
	ctx, childSpan := tracer.Start(ctx, "validate-inventory")
	err := validateInventory(ctx, orderID)
	childSpan.End()

	if err != nil {
		span.RecordError(err)
		return err
	}
	return nil
}
```
## Context Propagation

For traces to work across services, context must propagate with requests.

HTTP headers (automatic with instrumentation):

```
traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01
tracestate: vendor=value
```
gRPC metadata works the same way (also automatic with instrumentation).

If you’re making manual HTTP calls:

```go
// Inject context into outgoing request
req, _ := http.NewRequestWithContext(ctx, "GET", url, nil)
otel.GetTextMapPropagator().Inject(ctx, propagation.HeaderCarrier(req.Header))

// Extract context from incoming request
ctx := otel.GetTextMapPropagator().Extract(r.Context(), propagation.HeaderCarrier(r.Header))
```
## Grafana Integration

Add Tempo as a data source:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-datasources
data:
  tempo.yaml: |
    apiVersion: 1
    datasources:
      - name: Tempo
        type: tempo
        url: http://tempo.monitoring:3100
        access: proxy
        jsonData:
          tracesToLogs:
            datasourceUid: loki
            tags: ['app', 'namespace']
          tracesToMetrics:
            datasourceUid: prometheus
            tags: ['service.name']
          serviceMap:
            datasourceUid: prometheus
          nodeGraph:
            enabled: true
          search:
            hide: false
          lokiSearch:
            datasourceUid: loki
```
## Finding Traces
In Grafana Explore:
- Select Tempo data source
- Choose “Search” tab
- Filter by service name, duration, status
- Click a trace to see the waterfall
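Recent Tempo versions also support TraceQL, so instead of the form fields you can search with a query. For example, this finds slow checkout traces (the service name is illustrative and depends on your instrumentation):

```traceql
{ resource.service.name = "checkout-service" && duration > 1s }
```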
## Trace to Logs
With tracesToLogs configured, you can jump from a span directly to related logs:
- Open a trace
- Click a span
- Click “Logs for this span”
- See Loki logs with the same trace ID
## Trace to Metrics
Similarly, link traces to request metrics:
- See slow traces
- Check corresponding latency histograms
- Correlate with error rates
## Service Graph

Tempo can generate a service dependency graph from traces:

```yaml
# Enable metrics generator in Tempo
tempo:
  metricsGenerator:
    enabled: true
    remoteWriteUrl: http://prometheus.monitoring:9090/api/v1/write
```
This creates metrics like:

- `traces_service_graph_request_total`
- `traces_service_graph_request_failed_total`
- `traces_service_graph_request_server_seconds`
Grafana displays this as an interactive service map showing traffic flow and error rates between services.
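The generated metrics carry `client` and `server` labels, so you can also query them directly in Prometheus. For example, a per-edge error ratio (a sketch; adjust the rate window to taste):

```promql
sum by (client, server) (rate(traces_service_graph_request_failed_total[5m]))
/
sum by (client, server) (rate(traces_service_graph_request_total[5m]))
```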
## Sampling Strategies

At scale, you can’t store every trace. Sampling strategies:

### Head Sampling (At Collection)

```yaml
# OTel Collector
processors:
  probabilistic_sampler:
    sampling_percentage: 10  # Keep 10% of traces
```

Simple, but you might miss interesting traces.
### Tail Sampling (After Collection)

```yaml
processors:
  tail_sampling:
    decision_wait: 10s
    policies:
      # Always keep errors
      - name: errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      # Always keep slow traces
      - name: slow
        type: latency
        latency:
          threshold_ms: 1000
      # Sample 5% of everything else
      - name: probabilistic
        type: probabilistic
        probabilistic:
          sampling_percentage: 5
```

Better: keeps all errors and slow traces, samples normal ones.
## My Production Setup

```yaml
# Tempo with object storage
tempo:
  storage:
    trace:
      backend: s3
      s3:
        bucket: tempo-traces
        endpoint: minio.storage:9000
  metricsGenerator:
    enabled: true
    remoteWriteUrl: http://prometheus:9090/api/v1/write

compactor:
  compaction:
    block_retention: 72h  # 3 days of traces
```

```yaml
# OTel Collector with tail sampling
otel-collector:
  config:
    processors:
      tail_sampling:
        policies:
          - name: errors
            type: status_code
            status_code:
              status_codes: [ERROR]
          - name: slow
            type: latency
            latency:
              threshold_ms: 500
          - name: sample-rest
            type: probabilistic
            probabilistic:
              sampling_percentage: 5
```
Key decisions:
- 72h retention — Enough to debug recent issues
- Tail sampling — Keep all errors and slow traces
- 5% general sampling — Manageable volume
- Service graph — Visual dependency map
## Debugging with Traces

A real debugging workflow:

- Alert fires: High latency on the checkout service
- Check metrics: P99 latency spiked at 14:32
- Find traces: Search Tempo for checkout-service, duration > 1s, time range 14:30-14:35
- Analyze trace: See that the payment-service call took 4.2s
- Drill into span: See the `db.statement` attribute showing the slow query
- Check logs: Jump to Loki logs for that span, see connection pool exhaustion
- Fix: Increase the connection pool size

Without tracing, you’d be guessing which service caused the latency.
## Why This Matters
Microservices are great for teams but terrible for debugging. A single user request might touch 10 services. When something fails:
- Logs show errors but not causation
- Metrics show symptoms but not root cause
- Only traces show the complete picture
With Prometheus/Thanos for metrics, Loki for logs, and Tempo for traces, you have complete observability. All in Grafana. All correlated. All self-hosted.
No more “works on my machine.” No more “I think it’s the payment service.” Just data.
Metrics tell you the score. Logs tell you the play-by-play. Traces tell you who passed the ball to whom. You need all three to understand the game.
