Infrastructure as Code for People Who Need to Understand

Here is how a lot of infrastructure still gets built in 2026. Someone opens a cloud console, clicks through a wizard, picks some defaults, and a resource appears. It works. The dashboard turns green. Everyone moves on.

I can’t work that way. When I click a button and infrastructure appears, I feel like I borrowed it. I want to see what’s happening, I want the configuration written down where I can read it back, and I want to know why something exists instead of just that it exists. The console gives me a green checkmark. It doesn’t give me understanding.

This is the gap I want to talk about: the distance between infrastructure that happens to you and infrastructure you can actually hold in your head.

What Click-Ops Actually Costs

Click your way to a working system and you get a working system. You also get a pile of decisions nobody wrote down.

Every resource created through a GUI carries settings you didn’t consciously choose, dependencies you never saw form, and state that quietly accumulates where you can’t audit it. None of that hurts on day one. It hurts six months later, when someone asks why a security group has 47 rules, or whether that orphaned load balancer is safe to delete, and the honest answer is that nobody knows.

The documentation won’t save you either. Docs describe what things should be. The moment you change something in the console without updating the wiki, the two drift apart, and they always drift apart. You end up trusting a description of your infrastructure instead of your infrastructure.

What It Looks Like When Everything Is Written Down

Now picture the same system, except every piece of it exists as text you can read, review, and recreate.

# This exists because we need to allow traffic from the monitoring namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-prometheus
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              # kubernetes.io/metadata.name is set automatically on every namespace
              kubernetes.io/metadata.name: monitoring
      ports:
        - port: 9090
          protocol: TCP

The comment explains why. The code shows what. Both live in the same file, both are version-controlled, and both are readable by anyone who clones the repo. There is no archaeology to do, because the artifact and its reasoning never separated.

That single property, explicit over implicit, is what makes Infrastructure as Code feel less like a best practice and more like a way of working I can actually trust.

The Real Cost Is Cognitive

Let me go back to the messy side for a moment, because the surface-level cost of click-ops isn’t the interesting part. The interesting part is what it does to your head.

You made a configuration decision for a reason. Months pass. Was that timeout value load-bearing? Did that IP whitelist protect against something specific, or did someone add it during an incident and forget? Nobody remembers, so the setting stays forever, quietly turning into the kind of technical debt that everyone is slightly afraid to touch.

And there’s always one person on the team who happens to remember why the network looks the way it does. When something breaks, everyone turns to that person. They become a single point of failure made of memory, and memory is the least reliable storage system we have.

Written-down infrastructure moves that knowledge out of one tired person’s head and into a file the whole team can read.

What You Get Back

Once the system lives in Git, a few things stop being heroics and start being ordinary.

History stops being a guess. Every change carries a commit message and an author, so you can trace a decision backwards through time:

git log --oneline -- network-policy.yaml

a1b2c3d Add network policy for Prometheus scraping
d4e5f6a Restrict to specific port (was allowing all)
9b8c7d6 Initial creation for monitoring access

A second pair of eyes gets a chance before production does. The classic “allow all” mistake is hard to catch in a console, but in a diff it jumps off the screen:

# Pull request diff
- cidrBlocks:
-   - 0.0.0.0/0  # DANGER: allows all traffic
+ cidrBlocks:
+   - 10.0.0.0/8  # Internal network only
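
A reviewer catches this by eye, and a small automated check can back them up. What follows is a minimal sketch in Python, not part of the setup described here: the function name find_open_cidrs is hypothetical, and it assumes the CIDR blocks have already been parsed out of the configuration into a list of strings.

```python
import ipaddress


def find_open_cidrs(cidr_blocks):
    """Return the blocks that cover an entire address space (prefix length 0)."""
    open_blocks = []
    for block in cidr_blocks:
        # strict=False tolerates host bits being set, e.g. "10.0.0.1/8"
        network = ipaddress.ip_network(block, strict=False)
        if network.prefixlen == 0:
            open_blocks.append(block)
    return open_blocks


if __name__ == "__main__":
    print(find_open_cidrs(["0.0.0.0/0", "10.0.0.0/8"]))
```

A CI job could fail the pipeline whenever the returned list is non-empty, so the dangerous line gets flagged even on a day when the reviewer is tired.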

Recovery turns into a command instead of a long, anxious afternoon. When a cluster dies, you don’t reconstruct it from memory:

# Recreate everything
terraform apply
kubectl apply -k ./infrastructure
argocd app list -o name | xargs argocd app sync

The declarative part is what makes this hold together. You write down the state you want, and the system’s job is to make reality match it.

# I declare: there should be 3 replicas
# (fragment: metadata, selector, and pod template omitted)
apiVersion: apps/v1
kind: Deployment
spec:
  replicas: 3

A pod dies, Kubernetes notices the drift from three, and brings up a new one without anyone clicking anything. You declared the contract, and the system enforces it.
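
The idea of comparing desired state with observed state and closing the gap can be sketched in a few lines of Python. This is a toy model for illustration only, not how Kubernetes implements its controllers: the function name reconcile and the pod naming scheme are assumptions.

```python
def reconcile(desired_replicas, running_pods):
    """Return the pod list after one reconciliation pass (toy model)."""
    pods = list(running_pods)
    candidate = 0
    # Too few pods: start new ones under names that are not already taken.
    while len(pods) < desired_replicas:
        name = f"pod-{candidate}"
        if name not in pods:
            pods.append(name)
        candidate += 1
    # Too many pods: stop the surplus from the end of the list.
    del pods[desired_replicas:]
    return pods


if __name__ == "__main__":
    # "pod-1" has died; one pass brings the count back to three.
    print(reconcile(3, ["pod-0", "pod-2"]))
```

The caller never says "start a pod". It only states the number it wants, and the loop works out which action closes the gap.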

There is room to write down intent, too, not just mechanics. Tags and comments turn configuration into a message to whoever reads it next, usually a future version of you:

# Terraform: Why this VPC exists
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"

  tags = {
    Name        = "production"
    Environment = "production"
    Purpose     = "Primary VPC for production workloads"
    Owner       = "platform-team"
    CostCenter  = "infrastructure"
  }
}
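
Tags only carry intent if they are actually filled in, and that is easy to check mechanically. A minimal sketch, assuming the five tag keys used above are the required set and that the tags are available as a plain dictionary. The names REQUIRED_TAGS and missing_tags are hypothetical.

```python
REQUIRED_TAGS = {"Name", "Environment", "Purpose", "Owner", "CostCenter"}


def missing_tags(tags, required=REQUIRED_TAGS):
    """Return required tag keys that are absent or blank, sorted by name."""
    return sorted(key for key in required if not str(tags.get(key, "")).strip())


if __name__ == "__main__":
    print(missing_tags({"Name": "production", "Owner": ""}))
```

An empty result means every required tag is present and non-blank. Anything else is a resource whose purpose a future reader would have to guess.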

This is roughly how my homelab is built. Everything that runs there is defined in one repo:

homelab/
├── infrastructure/
│   ├── terraform/        # VMs, networks, DNS
│   └── kubernetes/       # Cluster configuration
├── applications/
│   ├── argocd/          # GitOps controller
│   ├── monitoring/      # Prometheus, Grafana, Loki
│   └── services/        # Home Assistant, GitLab, etc.
└── secrets/
    └── external-secrets/  # ExternalSecret manifests (secrets live in Vault)

If the whole thing burned down tomorrow, I could rebuild it with three commands and a fresh disk, because I never trusted my own memory to hold the configuration in the first place.

Why People Don’t Switch (And Why The Objections Are Soft)

If this is so obviously better, why does click-ops survive? Because the objections feel real in the moment, even when they don’t hold up.

“It’s slower than clicking.” Initially, sure. Writing the YAML takes ten minutes against a five-minute click. But you write it once and reuse it across every environment, while the clicking has to be repeated by hand every time and carries over to nothing.

“Not everything can be Infrastructure as Code.” True. Some steps need a human, and the answer there is to document them honestly rather than pretend they don’t exist:

# manual-steps.md

## Setting up external DNS provider

1. Log into Cloudflare dashboard
2. Create API token with Zone:Edit permission
3. Add token to Vault at `secret/cloudflare/api-token`

Note: This is manual because Cloudflare doesn't support
organization-level tokens via API.

Last performed: 2026-03-15 by @tom

The goal is that nothing is undocumented, even the parts you can’t automate.

“My team won’t adopt it.” Then don’t ask them to adopt it all at once. Automate one painful thing well, and let the story sell itself: the three hours you didn’t spend rebuilding a server, the outage a code review would have caught. People follow results faster than they follow arguments.

Start From Where You Already Are

You don’t have to rewrite your infrastructure to get here. You start by capturing what already exists.

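# Export existing Kubernetes resources
# (the output includes status and server-set metadata; trim those before committing)
kubectl get deployment my-app -o yaml > deployments/my-app.yaml

# Import existing Terraform resources
# (a matching resource "aws_instance" "web" block must already exist in your configuration)
terraform import aws_instance.web i-1234567890abcdef0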

Then give it some structure, split by environment so the differences are visible instead of buried:

infrastructure/
├── base/                    # Common configuration
│   ├── namespaces.yaml
│   ├── rbac.yaml
│   └── network-policies.yaml
├── environments/
│   ├── development/
│   │   └── kustomization.yaml  # Overlays for dev
│   ├── staging/
│   │   └── kustomization.yaml
│   └── production/
│       └── kustomization.yaml  # Production-specific settings
└── applications/
    ├── api/
    ├── frontend/
    └── database/
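
The base-plus-overlay layout boils down to "start from the common settings, then let each environment override what differs". The Python sketch below illustrates that idea with a recursive dictionary merge. It is a deliberate simplification and not Kustomize's actual patching logic, and the function name apply_overlay and the sample settings are assumptions.

```python
def apply_overlay(base, overlay):
    """Return a new dict: base with overlay values merged in recursively."""
    merged = dict(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = apply_overlay(merged[key], value)
        else:
            # Scalars, lists, and new keys: the overlay wins outright.
            merged[key] = value
    return merged


if __name__ == "__main__":
    base = {"replicas": 1, "resources": {"cpu": "100m", "memory": "128Mi"}}
    production = {"replicas": 3, "resources": {"cpu": "500m"}}
    print(apply_overlay(base, production))
```

The production overlay stays short because it lists only what differs from the base, which is what makes the differences between environments visible at a glance.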

And stop applying it by hand. Wire it into CI, or better, hand it to GitOps and let a controller keep reality in sync with the repo:

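apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: infrastructure
  namespace: argocd        # assumes Argo CD is installed in the argocd namespace
spec:
  project: default         # required field; assumes the built-in default project
  source:
    repoURL: https://gitlab.internal/infrastructure
    targetRevision: HEAD
    path: environments/production
  destination:
    server: https://kubernetes.default.svc
  syncPolicy:
    automated:
      selfHeal: true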
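
Self-healing depends on first noticing that live state has drifted from what the repo declares. Conceptually that is a diff between two descriptions of the same thing. The Python sketch below shows the shape of that comparison on flat dictionaries. It is not Argo CD's diffing logic, and the function name find_drift and the sample keys are assumptions.

```python
def find_drift(declared, live):
    """Return {key: (declared_value, live_value)} for every key that differs.

    A key missing on one side is reported with None for that side.
    """
    drift = {}
    for key in sorted(set(declared) | set(live)):
        want, have = declared.get(key), live.get(key)
        if want != have:
            drift[key] = (want, have)
    return drift


if __name__ == "__main__":
    declared = {"replicas": 3, "image": "api:1.4"}
    live = {"replicas": 5, "image": "api:1.4", "debug": True}
    print(find_drift(declared, live))
```

Here the result would point at two things someone changed by hand: the replica count and a debug setting that the repo never declared.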

Why This Connects to Sovereignty

For me this lands on something bigger than tidy YAML. Digital sovereignty goes well past owning hardware. It’s about understanding and controlling the systems you depend on, and you can’t control what you can’t see.

Infrastructure as Code is what makes that control real. It gives you portability, because moving providers becomes a matter of rewriting configuration you can read rather than rediscovering decisions nobody recorded. It gives you reproducibility, because you can recreate everything from scratch. Most of all it gives you the ability to answer “why does this exist” for any piece of your stack, instead of depending on whoever last clicked the buttons and hoping they remember.

That’s the part I actually care about. I understand my infrastructure because I defined it, line by line, in files I can read. When something breaks at 2am, there is no console to spelunk through, just a repo that tells me exactly what should be running and why. For a brain that treats unexplained systems as a low-grade itch, that’s worth far more than the afternoon it cost to write it all down.
