You have metrics telling you something is slow. You have logs telling you errors happened. But which request failed? Where did the latency come from? Which service in the chain caused the timeout?
This is where distributed tracing comes in. It follows individual requests as they flow through your microservices, showing you exactly what happened and where.
## The Observability Triangle

```mermaid
flowchart TD
    subgraph observability["Complete Observability"]
        M["Metrics<br/>(Prometheus/Thanos)<br/>WHAT is happening"]
        L["Logs<br/>(Loki)<br/>WHY it happened"]
        T["Traces<br/>(Tempo)<br/>WHERE it happened"]
    end
    M <--> L
    L <--> T
    T <--> M
    G["Grafana"] --> M
    G --> L
    G --> T
```
- Metrics answer: “What is the error rate? What is the latency?”
- Logs answer: “What error message? What was the context?”
- Traces answer: “Which service? Which call? What was the path?”
Together, they give you complete understanding.
## What is a Trace?

A trace is a tree of spans representing work done for a single request:

```mermaid
flowchart LR
    subgraph trace["Trace: order-12345"]
        A["API Gateway<br/>250ms"] --> B["Order Service<br/>180ms"]
        B --> C["Inventory Check<br/>45ms"]
        B --> D["Payment Service<br/>120ms"]
        D --> E["Bank API<br/>95ms"]
        B --> F["Notification<br/>15ms"]
    end
```
Each box is a span. Spans have:
- Name: What operation (e.g., “HTTP GET /orders”)
- Duration: How long it took
- Parent: Which span initiated this one
- Attributes: Key-value metadata (user_id, order_id, etc.)
- Status: Success/error
The trace ID links all spans from the same request across all services.
## Why Tempo?

Grafana Tempo is designed to be:
- Cost-effective — Object storage backend, no indexing
- Simple — No complex cluster management
- Scalable — Handles massive trace volumes
- Integrated — Native Grafana support, links to metrics/logs
Like Loki for logs, Tempo only indexes trace IDs. It doesn’t index spans or attributes. This keeps costs low but means you need trace IDs to query — you can’t search for “all traces with user_id=123”.
The solution: use metrics and logs to find trace IDs, then deep-dive in Tempo.
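For example, if your services log the active trace ID (most OpenTelemetry logging integrations can inject it), a Loki query like the following surfaces trace IDs you can open in Tempo. The label and field names here are illustrative and depend on your setup:

```logql
{app="checkout", namespace="prod"} |= "error" | logfmt | trace_id != ""
```

Copy the `trace_id` field from a matching log line and paste it into Tempo's trace lookup.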
## Architecture

```mermaid
flowchart TD
    subgraph apps["Applications"]
        A1["Service A<br/>(instrumented)"]
        A2["Service B<br/>(instrumented)"]
        A3["Service C<br/>(instrumented)"]
    end
    subgraph collector["OpenTelemetry"]
        OC["OTel Collector"]
    end
    A1 -->|"OTLP"| OC
    A2 -->|"OTLP"| OC
    A3 -->|"OTLP"| OC
    OC -->|"traces"| T["Tempo"]
    OC -->|"metrics"| P["Prometheus"]
    OC -->|"logs"| L["Loki"]
    T --> OS["Object Storage"]
    T --> G["Grafana"]
    P --> G
    L --> G
```
Applications are instrumented with OpenTelemetry SDKs. The OTel Collector receives telemetry, processes it, and exports it to the right backend. Tempo stores traces in object storage. Grafana visualizes and correlates everything.
## Installing Tempo

Using Helm:

```shell
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

helm install tempo grafana/tempo \
  --namespace monitoring \
  --values tempo-values.yaml
```
Basic single-binary deployment:

```yaml
# tempo-values.yaml
tempo:
  storage:
    trace:
      backend: local
      local:
        path: /var/tempo/traces
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317
        http:
          endpoint: 0.0.0.0:4318

persistence:
  enabled: true
  size: 50Gi
```
Production with object storage:

```yaml
# tempo-values.yaml
tempo:
  storage:
    trace:
      backend: s3
      s3:
        bucket: tempo-traces
        endpoint: minio.storage:9000
        access_key: ${MINIO_ACCESS_KEY}
        secret_key: ${MINIO_SECRET_KEY}
        insecure: true
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317
        http:
          endpoint: 0.0.0.0:4318

# Retention
compactor:
  replicas: 1
  compaction:
    block_retention: 48h

# Distributed mode for scale
distributor:
  replicas: 2
ingester:
  replicas: 3
querier:
  replicas: 2
```
## Installing OpenTelemetry Collector

The OTel Collector acts as a pipeline for all telemetry:

```shell
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts

helm install otel-collector open-telemetry/opentelemetry-collector \
  --namespace monitoring \
  --values otel-collector-values.yaml
```
Collector configuration:

```yaml
# otel-collector-values.yaml
mode: deployment
replicaCount: 2

config:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317
        http:
          endpoint: 0.0.0.0:4318

  processors:
    batch:
      timeout: 1s
      send_batch_size: 1024

    # Add Kubernetes metadata
    k8sattributes:
      auth_type: serviceAccount
      extract:
        metadata:
          - k8s.namespace.name
          - k8s.pod.name
          - k8s.deployment.name

    # Sample to reduce volume (adjust rate as needed; a processor
    # only takes effect once added to a pipeline below)
    probabilistic_sampler:
      sampling_percentage: 10

  exporters:
    otlp/tempo:
      endpoint: tempo.monitoring:4317
      tls:
        insecure: true

    prometheus:
      endpoint: 0.0.0.0:8889
      namespace: otel

  service:
    pipelines:
      traces:
        receivers: [otlp]
        processors: [k8sattributes, batch]
        exporters: [otlp/tempo]
      metrics:
        receivers: [otlp]
        processors: [batch]
        exporters: [prometheus]
```
## Instrumenting Applications

### Auto-Instrumentation (Easy Mode)

For many languages, OpenTelemetry can instrument automatically without code changes.

Java:
```yaml
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
        - name: app
          image: my-java-app:latest
          env:
            - name: JAVA_TOOL_OPTIONS
              value: "-javaagent:/otel/opentelemetry-javaagent.jar"
            - name: OTEL_SERVICE_NAME
              value: "order-service"
            - name: OTEL_EXPORTER_OTLP_ENDPOINT
              value: "http://otel-collector.monitoring:4317"
          volumeMounts:
            - name: otel-agent
              mountPath: /otel
      initContainers:
        - name: otel-agent
          image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:latest
          command: [cp, /javaagent.jar, /otel/opentelemetry-javaagent.jar]
          volumeMounts:
            - name: otel-agent
              mountPath: /otel
      volumes:
        - name: otel-agent
          emptyDir: {}
```
Python:

```dockerfile
FROM python:3.11
RUN pip install opentelemetry-distro opentelemetry-exporter-otlp
RUN opentelemetry-bootstrap -a install
CMD ["opentelemetry-instrument", "python", "app.py"]
```
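The auto-instrumented Python process still needs to know its name and where to send data. The usual approach is to set the standard OTel environment variables on the Deployment, mirroring the Java example (values here are illustrative):

```yaml
env:
  - name: OTEL_SERVICE_NAME
    value: "order-service"
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: "http://otel-collector.monitoring:4317"
```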
Node.js:

```javascript
// tracing.js - require this first
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-grpc');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');

const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({
    url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT || 'http://otel-collector:4317',
  }),
  instrumentations: [getNodeAutoInstrumentations()],
  serviceName: process.env.OTEL_SERVICE_NAME || 'my-service',
});

sdk.start();
```
### Manual Instrumentation (More Control)

For custom spans and attributes:

```go
// Go example
import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
)

func ProcessOrder(ctx context.Context, orderID string) error {
	tracer := otel.Tracer("order-service")
	ctx, span := tracer.Start(ctx, "process-order")
	defer span.End()

	// Add attributes
	span.SetAttributes(
		attribute.String("order.id", orderID),
		attribute.String("order.type", "standard"),
	)

	// Create child span for sub-operation
	ctx, childSpan := tracer.Start(ctx, "validate-inventory")
	err := validateInventory(ctx, orderID)
	childSpan.End()

	if err != nil {
		span.RecordError(err)
		return err
	}
	return nil
}
```
## Context Propagation

For traces to work across services, context must propagate with requests.

HTTP headers (automatic with instrumentation):

```
traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01
tracestate: vendor=value
```
gRPC metadata works the same way (also automatic with instrumentation).

If you’re making manual HTTP calls:

```go
// Inject context into outgoing request
req, _ := http.NewRequestWithContext(ctx, "GET", url, nil)
otel.GetTextMapPropagator().Inject(ctx, propagation.HeaderCarrier(req.Header))

// Extract context from incoming request
ctx := otel.GetTextMapPropagator().Extract(r.Context(), propagation.HeaderCarrier(r.Header))
```
## Grafana Integration

Add Tempo as a data source:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-datasources
data:
  tempo.yaml: |
    apiVersion: 1
    datasources:
      - name: Tempo
        type: tempo
        url: http://tempo.monitoring:3100
        access: proxy
        jsonData:
          tracesToLogs:
            datasourceUid: loki
            tags: ['app', 'namespace']
          tracesToMetrics:
            datasourceUid: prometheus
            tags: ['service.name']
          serviceMap:
            datasourceUid: prometheus
          nodeGraph:
            enabled: true
          search:
            hide: false
          lokiSearch:
            datasourceUid: loki
```
## Finding Traces
In Grafana Explore:
- Select Tempo data source
- Choose “Search” tab
- Filter by service name, duration, status
- Click a trace to see the waterfall
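Recent Tempo versions also support TraceQL, so instead of the form fields you can search with a query. For example, this finds slow checkout traces (the service name is illustrative and depends on your instrumentation):

```traceql
{ resource.service.name = "checkout-service" && duration > 1s }
```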
## Trace to Logs
With tracesToLogs configured, you can jump from a span directly to related logs:
- Open a trace
- Click a span
- Click “Logs for this span”
- See Loki logs with the same trace ID
## Trace to Metrics
Similarly, link traces to request metrics:
- See slow traces
- Check corresponding latency histograms
- Correlate with error rates
## Service Graph

Tempo can generate a service dependency graph from traces:

```yaml
# Enable metrics generator in Tempo
tempo:
  metricsGenerator:
    enabled: true
    remoteWriteUrl: http://prometheus.monitoring:9090/api/v1/write
```
This creates metrics like:

- `traces_service_graph_request_total`
- `traces_service_graph_request_failed_total`
- `traces_service_graph_request_server_seconds`
Grafana displays this as an interactive service map showing traffic flow and error rates between services.
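The generated metrics carry `client` and `server` labels, so you can also query them directly in Prometheus. For example, a per-edge error ratio (a sketch; adjust the rate window to taste):

```promql
sum by (client, server) (rate(traces_service_graph_request_failed_total[5m]))
/
sum by (client, server) (rate(traces_service_graph_request_total[5m]))
```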
## Sampling Strategies

At scale, you can’t store every trace. Sampling strategies:

### Head Sampling (At Collection)

```yaml
# OTel Collector
processors:
  probabilistic_sampler:
    sampling_percentage: 10  # Keep 10% of traces
```

Simple, but you might miss interesting traces.
### Tail Sampling (After Collection)

```yaml
processors:
  tail_sampling:
    decision_wait: 10s
    policies:
      # Always keep errors
      - name: errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      # Always keep slow traces
      - name: slow
        type: latency
        latency:
          threshold_ms: 1000
      # Sample 5% of everything else
      - name: probabilistic
        type: probabilistic
        probabilistic:
          sampling_percentage: 5
```

Better: keeps all errors and slow traces, samples normal ones.
## My Production Setup

```yaml
# Tempo with object storage
tempo:
  storage:
    trace:
      backend: s3
      s3:
        bucket: tempo-traces
        endpoint: minio.storage:9000
  metricsGenerator:
    enabled: true
    remoteWriteUrl: http://prometheus:9090/api/v1/write

compactor:
  compaction:
    block_retention: 72h  # 3 days of traces
```

```yaml
# OTel Collector with tail sampling
otel-collector:
  config:
    processors:
      tail_sampling:
        policies:
          - name: errors
            type: status_code
            status_code:
              status_codes: [ERROR]
          - name: slow
            type: latency
            latency:
              threshold_ms: 500
          - name: sample-rest
            type: probabilistic
            probabilistic:
              sampling_percentage: 5
```
Key decisions:
- 72h retention — Enough to debug recent issues
- Tail sampling — Keep all errors and slow traces
- 5% general sampling — Manageable volume
- Service graph — Visual dependency map
## Debugging with Traces

A real debugging workflow:

- Alert fires: High latency on the checkout service
- Check metrics: P99 latency spiked at 14:32
- Find traces: Search Tempo for checkout-service, duration > 1s, time range 14:30-14:35
- Analyze trace: See that the payment-service call took 4.2s
- Drill into span: See the `db.statement` attribute showing the slow query
- Check logs: Jump to Loki logs for that span, see connection pool exhaustion
- Fix: Increase the connection pool size

Without tracing, you’d be guessing which service caused the latency.
## Why This Matters
Microservices are great for teams but terrible for debugging. A single user request might touch 10 services. When something fails:
- Logs show errors but not causation
- Metrics show symptoms but not root cause
- Only traces show the complete picture
With Prometheus/Thanos for metrics, Loki for logs, and Tempo for traces, you have complete observability. All in Grafana. All correlated. All self-hosted.
No more “works on my machine.” No more “I think it’s the payment service.” Just data.
Metrics tell you the score. Logs tell you the play-by-play. Traces tell you who passed the ball to whom. You need all three to understand the game.
