Kubernetes is designed to be self-healing, but what does that actually mean? More importantly: what happens when the components doing the healing themselves fail?

I’ve run Kubernetes clusters through all kinds of failures — planned, unplanned, and “hold my beer” experiments. Here’s what actually happens when things break.

The Components That Can Fail

Before diving into failure scenarios, let’s map out what we’re working with:

Control Plane:

  • kube-apiserver: The API that everything talks to
  • etcd: The database storing all cluster state
  • kube-scheduler: Decides where pods run
  • kube-controller-manager: Runs controllers (ReplicaSet, Deployment, etc.)
  • cloud-controller-manager: Cloud provider integrations (if applicable)

Node Components:

  • kubelet: Manages pods on each node
  • kube-proxy: Handles network rules for Services
  • Container runtime: Actually runs containers

Scenario 1: API Server Down

The kube-apiserver is the single point through which all Kubernetes API requests flow. What happens when it dies?

Immediate impact:

  • kubectl commands fail
  • No new deployments or updates possible
  • No new pod scheduling
  • Existing pods keep running

What keeps working:

  • Running pods continue to run
  • Containers stay alive
  • Network connectivity between pods
  • Services continue to route traffic

What breaks:

  • No new pods can be created
  • Failed pods won’t be replaced
  • Horizontal Pod Autoscaler stops working
  • No changes to any resources

```mermaid
flowchart TD
    subgraph api_down["API Server Down"]
        subgraph control["Control Plane"]
            etcd["etcd ✓"]
            sched["scheduler<br/>idle"]
            ctrl["controller-mgr<br/>idle"]
            apiX["✗ API Server"]
        end
        control -->|"no updates"| nodes
        subgraph nodes["Worker Nodes"]
            N1["Node 1<br/>Pods ✓"]
            N2["Node 2<br/>Pods ✓"]
            N3["Node 3<br/>Pods ✓"]
        end
    end
    note["Pods keep running - they don't need the API"]
```

This is the key insight: Kubernetes is designed for API server outages. The system degrades gracefully — workloads keep running, you just can’t change anything.

For HA setups, you want multiple API servers behind a load balancer. See Kubernetes High Availability: stacked vs external etcd for architecture options.
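When you suspect an API server problem, it helps to probe each replica directly instead of going through the load balancer. A quick sketch, assuming a kubeadm-style cluster where `control-plane-1` is a placeholder for one of your API server hosts:

```shell
# Ask the API server (through your kubeconfig) for its health;
# /livez and /readyz are the structured health endpoints on
# current Kubernetes versions
kubectl get --raw '/livez?verbose'

# Bypass the load balancer and hit one replica directly; the health
# endpoints are readable without authentication by default
curl -k 'https://control-plane-1:6443/readyz?verbose'
```

If the load balancer reports healthy but one replica's /readyz fails, you have a partial outage that kubectl alone won't show you.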

Scenario 2: etcd Down

etcd is the brain of Kubernetes. All cluster state lives here. This is the scariest failure.

Immediate impact:

  • API server can’t read or write state
  • Effectively the same as API server down
  • Existing pods keep running

What keeps working:

  • Running pods continue to run
  • Containers stay alive
  • Network connectivity
  • Services work

What breaks:

  • Same as API server down, plus:
  • Risk of state inconsistency on recovery
  • Split-brain scenarios in partial failures

```mermaid
flowchart TD
    subgraph etcd_down["etcd Down"]
        subgraph control["Control Plane"]
            etcdX["etcd ✗"] --> apiX["API ✗"]
            apiX --> other["other components"]
        end
    end
    note["Without etcd, API server cannot function<br/>Nodes continue running - they cache their assignments"]
```

etcd failures are why backups are non-negotiable. See etcd Deep Dive for understanding etcd’s role and backup strategies.
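A minimal backup-and-restore sketch with etcdctl, assuming etcd v3 with TLS client certificates in the usual kubeadm locations (adjust endpoints and paths for your cluster):

```shell
# Take a snapshot of the live etcd (run on an etcd member)
ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-snapshot.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# Verify the snapshot is readable before you trust it
ETCDCTL_API=3 etcdctl snapshot status /backup/etcd-snapshot.db

# During recovery: restore into a fresh data directory,
# then point etcd at it
ETCDCTL_API=3 etcdctl snapshot restore /backup/etcd-snapshot.db \
  --data-dir=/var/lib/etcd-restored
```

Automate the save step on a schedule; a snapshot you took manually three months ago is barely better than no snapshot.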

Scenario 3: Scheduler Down

The kube-scheduler decides where pods run. When it’s down:

Immediate impact:

  • New pods stay in Pending state
  • No scheduling decisions made

What keeps working:

  • Existing pods keep running
  • API server functions normally
  • You can create resources (they just won’t be scheduled)

What breaks:

  • New pods can’t be scheduled
  • Rescheduling after node failure doesn’t happen
  • HPA creates pods that stay Pending

```mermaid
flowchart TD
    subgraph sched_down["Scheduler Down"]
        apply["kubectl apply deployment"] --> api["API accepts"]
        api --> pending["Pod: Pending..."]
        subgraph queue["Pending Queue"]
            web["Pod: web"]
            apiPod["Pod: api"]
            job["Pod: job"]
            waiting["... waiting"]
        end
    end
    note["Nodes have capacity but nobody assigns pods to them"]
```

In HA setups, you run multiple schedulers with leader election. Only one is active, others wait.
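You can watch the election happen. On recent clusters the scheduler coordinates through a Lease object in kube-system, and the holder changes when the active instance dies (a sketch; field names may vary slightly by version):

```shell
# See which scheduler instance currently holds the leader lease
kubectl get lease -n kube-system kube-scheduler -o yaml

# Watch failover live: kill the active scheduler and holderIdentity
# flips to one of the standbys once the lease expires
kubectl get lease -n kube-system kube-scheduler -w
```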

Scenario 4: Controller Manager Down

The controller manager runs all the controllers that make Kubernetes “self-healing.”

Immediate impact:

  • ReplicaSet controller stops
  • Deployment controller stops
  • Node controller stops
  • All reconciliation loops stop

What keeps working:

  • Existing pods keep running
  • Scheduling still works
  • API server still works

What breaks:

  • Failed pods don’t get replaced
  • Deployments don’t roll out
  • Node failures aren’t handled
  • Orphaned resources aren’t cleaned up

```mermaid
flowchart TD
    subgraph ctrl_down["Controller Manager Down"]
        rs["ReplicaSet: 3 desired, 2 running → no action taken"]
        deploy["Deployment: rollout in progress → stuck"]
        node["Node marked NotReady → pods not evicted"]
    end
    note["The 'self-healing' part of Kubernetes stops"]
```

This is where you notice Kubernetes isn’t magic — it’s just software that runs reconciliation loops. Stop the loops, stop the magic.
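The pattern is simple enough to sketch in a few lines of shell. This toy loop (hypothetical variables, not real controller code) does what the ReplicaSet controller does: compare desired state with observed state and act on the difference:

```shell
# Toy reconciliation loop - the heart of every Kubernetes controller.
# desired/observed stand in for the spec and status of a ReplicaSet.
desired=3
observed=1

while [ "$observed" -lt "$desired" ]; do
  observed=$((observed + 1))
  echo "reconcile: created replica ($observed/$desired)"
done
echo "reconcile: steady state reached"
```

Stop running the loop and nothing converges anymore — which is all "the controller manager is down" really means.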

Scenario 5: Kubelet Down on a Node

The kubelet is the Kubernetes agent on each node. When it fails:

Immediate impact (on that node):

  • Node marked as NotReady after timeout (default ~40 seconds)
  • Pods on that node get evicted (after another timeout)
  • No new pods scheduled to that node

What keeps working:

  • Containers keep running (they don’t need kubelet)
  • Network might still work (depends on CNI)
  • Other nodes unaffected

What breaks:

  • No pod lifecycle management on that node
  • Health checks stop
  • Resource updates stop
  • Eventually pods are rescheduled elsewhere

```mermaid
flowchart LR
    subgraph kubelet_down["Kubelet Down on Node 2"]
        subgraph N1["Node 1 ✓ Ready"]
            P1["pod ✓"]
        end
        subgraph N2["Node 2 ✗ NotReady"]
            P2["pod ?<br/>orphaned"]
        end
        subgraph N3["Node 3 ✓ Ready"]
            P3["pod ✓<br/>rescheduled"]
        end
        P2 -.->|"rescheduled"| P3
    end
    note["After pod-eviction-timeout, pods get rescheduled"]
```

The interesting part: containers keep running even without kubelet. The kubelet manages them, but doesn’t keep them alive.
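You can verify this on a test node by asking the container runtime directly, bypassing Kubernetes entirely (assumes crictl is installed, as it is on most kubeadm nodes):

```shell
# Stop the kubelet on the node...
sudo systemctl stop kubelet

# ...then query the runtime directly: the containers are still
# listed as Running, even though Kubernetes has lost sight of them
sudo crictl ps
```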

Scenario 6: Container Runtime Down

If the container runtime (containerd, CRI-O) fails:

Immediate impact:

  • Running containers might die
  • New containers can’t start
  • Health checks fail

What happens next:

  • kubelet detects failures
  • Pods marked as Failed
  • Pods get rescheduled to other nodes

This is typically a node-level failure that triggers pod eviction.

Scenario 7: Network Partition

Network partitions are the trickiest failures. A node loses connectivity to the control plane but can still run containers.

What happens:

  • Node marked NotReady (can’t reach API server)
  • Pods eventually evicted
  • But they might still be running on the partitioned node
  • Potential for “split brain” — same pod running in two places

```mermaid
flowchart LR
    subgraph partition["Network Partition"]
        subgraph control["Control Plane"]
            cp["Node 2 is NotReady<br/>Evicting pods..."]
        end
        control x--x|"✗"| partitioned
        subgraph partitioned["Partitioned Node"]
            pn["I'm fine, running<br/>these pods..."]
        end
    end
    note["Pod 'web-abc123' now runs on Node 1 AND Node 2<br/>Both think they're the real one"]
```

This is why stateful applications need careful handling. Databases with split-brain can corrupt data.

Failure Timeouts to Know

These timeouts affect how fast Kubernetes reacts to failures:

| Timeout | Default | What it does |
|---|---|---|
| node-monitor-grace-period | 40s | How long before marking a node NotReady |
| pod-eviction-timeout | 5m | How long before evicting pods from a NotReady node |
| node-monitor-period | 5s | How often node status is checked |

For faster failover, you can tune these — but beware of false positives during temporary network issues.
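On current clusters the eviction delay is actually enforced per pod through taint-based eviction: the node controller taints a NotReady node, and a pod is evicted when its toleration for that taint expires (300s by default). That means you can tune failover per workload instead of cluster-wide. A sketch, where fast-failover-demo is a hypothetical example pod:

```shell
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: fast-failover-demo
spec:
  containers:
  - name: app
    image: nginx
  tolerations:
  # Evict this pod 30s after its node goes NotReady or unreachable,
  # instead of the default 300s
  - key: node.kubernetes.io/not-ready
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 30
  - key: node.kubernetes.io/unreachable
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 30
EOF
```

Short tolerationSeconds means faster failover but also more pod churn during a flaky-network blip; pick values per workload, not one number for everything.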

The Blast Radius Principle

Every failure affects a “blast radius”:

| Component | Blast Radius |
|---|---|
| Container | Single container |
| Pod | All containers in pod |
| Kubelet | All pods on node |
| Node | All pods on node |
| Scheduler | New pod scheduling cluster-wide |
| Controller Manager | Self-healing cluster-wide |
| API Server | All management operations |
| etcd | Everything |

Design your HA accordingly. etcd and API server need the most redundancy.

What This Means for You

  1. Running workloads are resilient: Existing pods survive most control plane failures
  2. Management operations aren’t: You need control plane HA for continuous deployment
  3. etcd is the critical path: Protect it, back it up, monitor it
  4. Failures cascade: API server down → looks like everything is down
  5. Timeouts matter: Know your failure detection times

The beauty of Kubernetes is that it was designed with failures in mind. The question isn’t whether components will fail — it’s whether you’ve architected for it.

Testing Failures

Don’t wait for production to find out how your cluster behaves. Test:

```shell
# Simulate API server failure (on a test cluster!)
# kube-apiserver runs as a static pod on kubeadm clusters, so move its
# manifest out of the kubelet's watched directory rather than trying to
# exec into the (often shell-less) container
ssh control-plane-1 'sudo mv /etc/kubernetes/manifests/kube-apiserver.yaml /tmp/'

# Simulate scheduler failure (also a static pod, not a Deployment,
# so kubectl scale won't work on it)
ssh control-plane-1 'sudo mv /etc/kubernetes/manifests/kube-scheduler.yaml /tmp/'

# Simulate kubelet failure on a node
ssh node-1 'sudo systemctl stop kubelet'
```

Better yet, use chaos engineering tools like Litmus Chaos for controlled experiments.


Understanding failure modes isn’t pessimism — it’s engineering. Every system fails. The question is whether you designed for it or get surprised by it.