
Cilium Deep Dive: eBPF Networking for Kubernetes

Kubernetes networking is notoriously complex. CNI plugins, kube-proxy, iptables chains, service meshes — layers upon layers of abstraction that eventually break in ways nobody understands. Cilium changes this. It uses eBPF to move networking logic into the Linux kernel, bypassing iptables entirely. The result: better performance, more visibility, and network policies that actually make sense. This is what I run in my clusters. Let me show you why.

What is eBPF?

eBPF (extended Berkeley Packet Filter) lets you run sandboxed programs in the Linux kernel without changing kernel source code or loading kernel modules. ...
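To give a taste of what label-based, kernel-enforced policy looks like, here is a minimal CiliumNetworkPolicy sketch. The policy name, namespaces, and labels are made up for illustration; real policies pin much more down.

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-frontend-to-api   # hypothetical policy name
spec:
  endpointSelector:
    matchLabels:
      app: api                  # applies to pods labeled app=api
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend       # only frontend pods may connect
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
```

Cilium compiles this intent into eBPF programs attached to the pods' network devices, so enforcement happens in the kernel rather than in iptables chains.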

April 8, 2026 · 7 min read · Tom Meurs

Distributed Tracing with Tempo and OpenTelemetry

You have metrics telling you something is slow. You have logs telling you errors happened. But which request failed? Where did the latency come from? Which service in the chain caused the timeout? This is where distributed tracing comes in. It follows individual requests as they flow through your microservices, showing you exactly what happened and where.

The Observability Triangle

```mermaid
flowchart TD
    subgraph observability["Complete Observability"]
        M["Metrics<br/>(Prometheus/Thanos)<br/>WHAT is happening"]
        L["Logs<br/>(Loki)<br/>WHY it happened"]
        T["Traces<br/>(Tempo)<br/>WHERE it happened"]
    end
    M <--> L
    L <--> T
    T <--> M
    G["Grafana"] --> M
    G --> L
    G --> T
```

Metrics answer: “What is the error rate? What is the latency?” Logs answer: “What error message? What was the context?” Traces answer: “Which service? Which call? What was the path?” Together, they give you complete understanding. ...
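What ties a single request's spans together across services is trace context propagation. Here is a minimal, hand-rolled sketch of the W3C `traceparent` header (format `00-<trace-id>-<span-id>-<flags>`) — an illustration of the mechanism, not the OpenTelemetry SDK:

```python
# Sketch of trace context propagation via the W3C `traceparent` header.
# Every span in one request shares a trace-id; each span has its own span-id.
import secrets

def new_traceparent() -> str:
    """Start a new trace: fresh 16-byte trace-id and 8-byte span-id."""
    trace_id = secrets.token_hex(16)   # 32 hex chars, shared by every span
    span_id = secrets.token_hex(8)     # 16 hex chars, unique per span
    return f"00-{trace_id}-{span_id}-01"

def child_traceparent(parent: str) -> str:
    """A downstream service keeps the trace-id but mints its own span-id."""
    version, trace_id, _parent_span_id, flags = parent.split("-")
    return f"{version}-{trace_id}-{secrets.token_hex(8)}-{flags}"

root = new_traceparent()
child = child_traceparent(root)
# Shared trace-id is what lets Tempo stitch both spans into one trace:
assert root.split("-")[1] == child.split("-")[1]
```

In practice the OpenTelemetry SDK generates and forwards this header for you; Tempo then groups everything with the same trace-id into a single trace.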

April 4, 2026 · 7 min read · Tom Meurs

Loki for Kubernetes Logging: The Prometheus-Like Approach

You’ve got Prometheus for metrics. You can see what’s happening across your clusters. But when something breaks, metrics tell you that something is wrong — logs tell you why. The traditional answer is Elasticsearch. It’s powerful, flexible, and… expensive. It indexes everything, which means you pay for every byte of log data in CPU, memory, and storage. Loki takes a different approach: index labels, not content. It’s the same philosophy that makes Prometheus efficient for metrics, applied to logs. ...
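The index-labels-not-content idea shows up directly in LogQL, Loki's query language: only the selector in braces hits the label index, while everything after it filters raw lines at query time. Label names here are hypothetical:

```logql
{namespace="prod", app="checkout"} |= "timeout"

sum(rate({namespace="prod", app="checkout"} |= "error" [5m]))
```

Only `namespace` and `app` touch the index; the `|= "timeout"` filter and the rate aggregation scan the matching log chunks when you run the query. That trade — cheap ingestion, more work at query time — is the core of the Loki approach.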

March 31, 2026 · 7 min read · Tom Meurs

Thanos Remote Write: Push-Based Metrics for Edge and Multi-Cluster

In my previous post on Prometheus and Thanos, I covered the sidecar architecture — Thanos Sidecar runs alongside Prometheus, uploads TSDB blocks to object storage, and exposes data to the Querier. It works beautifully for clusters with stable connectivity to your central infrastructure. But what happens when your clusters are at the edge? When they might lose connectivity for hours or days? When you’re running dozens or hundreds of small clusters and don’t want sidecar complexity on each one? ...
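In the push model, each edge Prometheus points at a central Thanos Receive endpoint instead of hosting a sidecar. A sketch of the Prometheus side, assuming a hypothetical receiver URL (19291 is Receive's default remote-write port):

```yaml
# Edge-cluster Prometheus: push samples instead of running a sidecar.
remote_write:
  - url: https://thanos-receive.example.com/api/v1/receive  # hypothetical endpoint
    queue_config:
      max_shards: 10           # cap parallelism on small edge nodes
    write_relabel_configs:
      - source_labels: [__name__]
        regex: "go_.*"
        action: drop           # trim low-value series before they cross the WAN
```

Prometheus buffers samples in its write-ahead log while the link is down and replays them when connectivity returns, which is what makes this workable for flaky edge sites.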

March 27, 2026 · 8 min read · Tom Meurs

NixOS vs Talos for Kubernetes Nodes: Two Flavors of Immutable Infrastructure

I’ve written about Talos Linux as the immutable Kubernetes OS, and I’ve compared Arch vs NixOS for workstations. But there’s a question I get asked often: what about NixOS for Kubernetes nodes? Both NixOS and Talos are declarative. Both can be immutable. Both version their configuration. So why would you choose one over the other for running Kubernetes? I’ve run both in production. Here’s what I’ve learned.

The Philosophical Difference

Before diving into specifics, understand the core difference: ...
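The contrast is easiest to see in the configuration itself. A hypothetical minimal NixOS worker node using the `services.kubernetes` module (the address and option set are illustrative; real setups also pin versions and certificates):

```nix
# NixOS: a Kubernetes node is just another module in the system config.
{ config, pkgs, ... }:
{
  services.kubernetes = {
    roles = [ "node" ];
    masterAddress = "10.0.0.1";  # hypothetical control-plane address
  };
}
```

Talos expresses the same intent as a machine-config YAML applied with talosctl — but there the OS offers nothing beyond Kubernetes, while a NixOS node remains a general-purpose system you happen to have pointed at a cluster.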

March 15, 2026 · 9 min read · Tom Meurs