Platform

Infrastructure as Code for People Who Need to Understand

Here is how a lot of infrastructure still gets built in 2026. Someone opens a cloud console, clicks through a wizard, picks some defaults, and a resource appears. It works. The dashboard turns green. Everyone moves on. I can’t work that way. When I click a button and infrastructure appears, I feel like I borrowed it. I want to see what’s happening, I want the configuration written down where I can read it back, and I want to know why something exists instead of just that it exists. The console gives me a green checkmark. It doesn’t give me understanding. ...

Building an Internal Developer Platform: Where to Start

Every platform team eventually asks the same question: should we build an Internal Developer Platform? The honest answer is usually yes. The part that wrecks teams is the how. I’ve watched platforms that cost a small fortune get shipped and then quietly abandoned because nobody wanted to use them. I’ve also seen a couple of Helm charts and a Kyverno policy change how a whole team ships software. The gap between those two outcomes has almost nothing to do with budget or which fashionable tool you picked. It comes down to whether you started by solving a real problem or by building the platform you imagined developers should want. ...

Grafana Dashboards That Actually Get Used

You have Grafana. You have Prometheus metrics. You have logs in Loki and traces in Tempo. The data is all there. You also have 47 dashboards that nobody opens. I have done this to myself more than once. Something breaks at 2 AM, I bolt together a dashboard to see what’s going on, and then it just sits there forever. Multiply that by a year of incidents and a few “let me just add a panel for that” moments, and you end up with a Grafana that’s mostly archaeology. Nobody remembers what half the panels mean. The honest move is to delete most of them, but first it helps to understand what makes the survivors worth keeping. ...

Chaos Engineering: Breaking Your Cluster to Make It Stronger

My dashboard is a wall of green. Pods running, replicas matched, CPU comfortable, no alerts firing. I look at it and feel that small dopamine hit of “everything is fine.” And for the most part, it is fine. The cluster has been up for weeks. Nothing has fallen over. That green wall is also the most dangerous thing in my homelab, because it tells me nothing about what happens when something goes wrong. It only tells me that, right now, nothing has. ...

Longhorn vs Rook-Ceph: Kubernetes Storage Compared

The first time you run a stateful workload on a self-hosted cluster, you hit a wall. No cloud provider storage class to lean on. Just your nodes, their disks, and a Postgres pod that refuses to schedule because nothing can give it a PersistentVolume. So you start reading, and within an hour you’ve narrowed it down to two names that keep coming up: Longhorn and Rook-Ceph. I’ve run both in production. So let me get my bias out of the way before anything else: I default to Longhorn on small clusters, and I’ll explain exactly why later. Keep that in mind as you read, because it colours how I weigh things. Longhorn and Rook are both CNCF projects, both setups give you replicated block storage that survives a node dying, and both are good software. They just disagree about how much complexity you should be signing up for. ...