cert-manager automatic TLS certificate flow

cert-manager: Automatic TLS Certificates in Kubernetes

Manual certificate management is a recipe for outages. Certificates expire at 3 AM on a holiday weekend. Renewal processes live in tribal knowledge. Teams deploy services without HTTPS because “it’s too complicated.”

cert-manager automates everything. Define what certificates you need, and cert-manager handles issuance, renewal, and Kubernetes Secret management. Forever. This is one of the first things I install in every cluster.

How cert-manager Works

    flowchart TD
        subgraph cluster["Kubernetes Cluster"]
            CM["cert-manager"]
            CERT["Certificate<br/>Resource"]
            SECRET["TLS Secret"]
            INGRESS["Ingress"]
        end
        subgraph external["External"]
            LE["Let's Encrypt<br/>ACME Server"]
            DNS["DNS Provider"]
        end
        CERT -->|"watches"| CM
        CM -->|"creates"| SECRET
        CM <-->|"ACME protocol"| LE
        CM <-->|"DNS challenge"| DNS
        SECRET -->|"mounts"| INGRESS

1. You create a Certificate resource
2. cert-manager requests a certificate from the issuer (Let’s Encrypt, Vault, etc.)
3. cert-manager completes the challenge (HTTP-01 or DNS-01)
4. cert-manager stores the certificate in a Kubernetes Secret
5. Your Ingress/Gateway uses the Secret for TLS

Renewal happens automatically 30 days before expiration. ...
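To make "define what certificates you need" concrete, here is a minimal Python sketch that builds a Certificate resource as a plain dict and computes when renewal would start. The names `example-com`, `letsencrypt-prod`, and both helper functions are hypothetical, and the 30-day window is simply the figure quoted above; in a real cluster the window depends on the certificate's duration and any `renewBefore` setting.

```python
import json
from datetime import datetime, timedelta, timezone


def certificate_manifest(name, namespace, dns_names, issuer_name):
    """Build a cert-manager Certificate resource as a plain dict.

    All argument values are illustrative; the issuer is assumed to be a
    ClusterIssuer that already exists in the cluster.
    """
    return {
        "apiVersion": "cert-manager.io/v1",
        "kind": "Certificate",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Secret that cert-manager creates and keeps up to date
            "secretName": f"{name}-tls",
            "dnsNames": list(dns_names),
            "issuerRef": {"name": issuer_name, "kind": "ClusterIssuer"},
        },
    }


def renewal_time(not_after, renew_before=timedelta(days=30)):
    """Return the moment renewal would start, given the expiry time.

    Assumes a fixed window before expiry, as described in the text.
    """
    return not_after - renew_before


if __name__ == "__main__":
    manifest = certificate_manifest(
        "example-com", "default", ["example.com"], "letsencrypt-prod"
    )
    # JSON is accepted by kubectl just like YAML
    print(json.dumps(manifest, indent=2))

    expiry = datetime(2026, 7, 11, tzinfo=timezone.utc)
    print("renew from:", renewal_time(expiry).isoformat())
```

Saving the printed JSON to a file and applying it with `kubectl apply -f` is one way to try this, assuming cert-manager and the referenced issuer are already installed.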

April 12, 2026 · 6 min read · Tom Meurs
Cilium eBPF networking architecture

Cilium Deep Dive: eBPF Networking for Kubernetes

Kubernetes networking is notoriously complex. CNI plugins, kube-proxy, iptables chains, service meshes: layers upon layers of abstraction that eventually break in ways nobody understands.

Cilium changes this. It uses eBPF to move networking logic into the Linux kernel, bypassing iptables entirely. The result: better performance, more visibility, and network policies that actually make sense. This is what I run in my clusters. Let me show you why.

What is eBPF?

eBPF (extended Berkeley Packet Filter) lets you run sandboxed programs in the Linux kernel without changing kernel source code or loading kernel modules. ...

April 8, 2026 · 7 min read · Tom Meurs
Distributed tracing visualization with Tempo

Distributed Tracing with Tempo and OpenTelemetry

You have metrics telling you something is slow. You have logs telling you errors happened. But which request failed? Where did the latency come from? Which service in the chain caused the timeout?

This is where distributed tracing comes in. It follows individual requests as they flow through your microservices, showing you exactly what happened and where.

The Observability Triangle

    flowchart TD
        subgraph observability["Complete Observability"]
            M["Metrics<br/>(Prometheus/Thanos)<br/>WHAT is happening"]
            L["Logs<br/>(Loki)<br/>WHY it happened"]
            T["Traces<br/>(Tempo)<br/>WHERE it happened"]
        end
        M <--> L
        L <--> T
        T <--> M
        G["Grafana"] --> M
        G --> L
        G --> T

- Metrics answer: “What is the error rate? What is the latency?”
- Logs answer: “What error message? What was the context?”
- Traces answer: “Which service? Which call? What was the path?”

Together, they give you complete understanding. ...
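As a rough illustration of how a trace answers "which service caused the latency?", the sketch below models a trace as spans linked by parent IDs and finds the span with the most self time. The `Span` class, the service names, and the durations are all made up for illustration; this is not Tempo's or OpenTelemetry's data model, and it assumes each span's children run one after another.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None marks the root span of the trace
    service: str
    duration_ms: float


def self_times(spans: List[Span]) -> Dict[str, float]:
    """Time spent in each span itself, excluding its direct children.

    Assumes children of a span run sequentially, so their durations
    can simply be summed and subtracted from the parent.
    """
    child_total: Dict[str, float] = {}
    for span in spans:
        if span.parent_id is not None:
            child_total[span.parent_id] = (
                child_total.get(span.parent_id, 0.0) + span.duration_ms
            )
    return {
        span.span_id: span.duration_ms - child_total.get(span.span_id, 0.0)
        for span in spans
    }


def slowest_service(spans: List[Span]) -> str:
    """Return the service owning the span with the largest self time."""
    times = self_times(spans)
    worst = max(spans, key=lambda span: times[span.span_id])
    return worst.service


if __name__ == "__main__":
    trace = [
        Span("a", None, "gateway", 500.0),
        Span("b", "a", "orders", 450.0),
        Span("c", "b", "db", 400.0),
    ]
    print(slowest_service(trace))
```

In this toy trace the gateway and orders spans each contribute 50 ms of their own, so the database call is where most of the time goes, which a metric on total request latency alone would not reveal.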

April 4, 2026 · 7 min read · Tom Meurs
Loki log aggregation architecture for Kubernetes

Loki for Kubernetes Logging: The Prometheus-Like Approach

You’ve got Prometheus for metrics. You can see what’s happening across your clusters. But when something breaks, metrics tell you that something is wrong — logs tell you why. The traditional answer is Elasticsearch. It’s powerful, flexible, and… expensive. It indexes everything, which means you pay for every byte of log data in CPU, memory, and storage. Loki takes a different approach: index labels, not content. It’s the same philosophy that makes Prometheus efficient for metrics, applied to logs. ...
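The "index labels, not content" idea can be sketched with a toy in-memory store. This is only an illustration of the principle under simplifying assumptions, not how Loki is implemented: the `TinyLabelIndex` class and the label names are hypothetical, and real storage concerns such as chunking and compression are ignored. Only the label sets act as an index; log lines are scanned at query time.

```python
from collections import defaultdict


class TinyLabelIndex:
    """Toy log store: streams are keyed by their label set only."""

    def __init__(self):
        # frozenset of (label, value) pairs -> list of raw log lines
        self._streams = defaultdict(list)

    def push(self, labels, line):
        """Append a line to the stream identified by its labels."""
        self._streams[frozenset(labels.items())].append(line)

    def query(self, selector, contains=""):
        """Select streams by label, then filter their lines by substring.

        The label match uses the index; the substring match is a plain
        scan over the selected streams, since content is not indexed.
        """
        wanted = set(selector.items())
        hits = []
        for stream_labels, lines in self._streams.items():
            if wanted <= stream_labels:
                hits.extend(line for line in lines if contains in line)
        return hits


if __name__ == "__main__":
    store = TinyLabelIndex()
    store.push({"app": "api", "namespace": "prod"}, "GET /health 200")
    store.push({"app": "api", "namespace": "prod"}, "POST /orders 500 timeout")
    store.push({"app": "web", "namespace": "prod"}, "GET / 200")

    print(store.query({"app": "api"}, contains="500"))
```

The trade-off shows up directly: the index stays small because it grows with the number of distinct label sets rather than with log volume, while a content search costs a scan of every line in the selected streams.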

March 31, 2026 · 7 min read · Tom Meurs
Thanos remote write push architecture with edge clusters

Thanos Remote Write: Push-Based Metrics for Edge and Multi-Cluster

In my previous post on Prometheus and Thanos, I covered the sidecar architecture — Thanos Sidecar runs alongside Prometheus, uploads TSDB blocks to object storage, and exposes data to the Querier. It works beautifully for clusters with stable connectivity to your central infrastructure. But what happens when your clusters are at the edge? When they might lose connectivity for hours or days? When you’re running dozens or hundreds of small clusters and don’t want sidecar complexity on each one? ...

March 27, 2026 · 8 min read · Tom Meurs