When I first set up a “production” Kubernetes cluster, I had one control plane node. It worked fine until it didn’t — a failed disk took down the entire cluster. Every pod, every service, everything. That day taught me what “single point of failure” really means.

Kubernetes High Availability isn’t optional for production. But there’s a choice that trips up many people: do you run etcd on your control plane nodes (stacked), or on separate dedicated nodes (external)? Let me break down both approaches.

What is etcd and Why Does it Matter?

etcd is Kubernetes’ brain. It’s a distributed key-value store that holds all cluster state: pod definitions, secrets, configmaps, service accounts — everything. When you kubectl apply something, it ends up in etcd.
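
You can see this directly: everything kubectl creates lands under etcd's /registry prefix. Here's a hedged way to peek at it from a control plane node — the certificate paths are kubeadm defaults and may differ with other installers:

```shell
# List the etcd keys for Deployments in the default namespace.
# --keys-only matters: values are stored as protobuf, not JSON,
# so dumping them prints binary to your terminal.
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  get /registry/deployments/default --prefix --keys-only
```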

If etcd dies, your cluster is brain-dead. The API server can’t function, controllers can’t reconcile state, nothing works. This is why etcd availability directly determines cluster availability.

etcd uses the Raft consensus algorithm, which requires a quorum (majority) of nodes to operate. With 3 etcd nodes, you can lose 1. With 5, you can lose 2. Lose quorum, and etcd stops accepting writes (and consistent reads) until enough members recover.
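
The quorum arithmetic is simple enough to sketch: a majority of n members is floor(n/2) + 1, and fault tolerance is whatever is left over.

```shell
# Fault tolerance for common etcd cluster sizes.
# quorum = floor(n/2) + 1; you can lose n - quorum members.
for n in 1 2 3 4 5; do
  quorum=$(( n / 2 + 1 ))
  echo "members=$n quorum=$quorum can_lose=$(( n - quorum ))"
done
```

Notice that 2 and 4 members tolerate no more failures than 1 and 3 — even cluster sizes add failure surface without adding fault tolerance, which is why etcd clusters come in odd sizes.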

Stacked etcd: The Simple Approach

In a stacked topology, etcd runs on each control plane node alongside the API server, controller manager, and scheduler.

flowchart TD
    subgraph stacked["Stacked etcd Topology"]
        subgraph CP1["Control Plane Node 1"]
            API1["API Server"]
            Ctrl1["Controller"]
            Sched1["Scheduler"]
            etcd1["etcd"]
        end
        subgraph CP2["Control Plane Node 2"]
            API2["API Server"]
            Ctrl2["Controller"]
            Sched2["Scheduler"]
            etcd2["etcd"]
        end
        subgraph CP3["Control Plane Node 3"]
            API3["API Server"]
            Ctrl3["Controller"]
            Sched3["Scheduler"]
            etcd3["etcd"]
        end
    end

Advantages of Stacked etcd

Simplicity: Fewer nodes to manage. Three nodes give you a fully HA control plane with etcd quorum.

Cost: Half the infrastructure compared to external etcd. For homelabs and smaller deployments, this matters.

Easier setup: kubeadm, k3s, and most installers default to stacked because it’s simpler. Run kubeadm init with --control-plane-endpoint and you’re done.

Coupled lifecycle: Control plane and etcd scale together. Add a control plane node, get an etcd member automatically.

Disadvantages of Stacked etcd

Coupled failure: If a node dies, you lose both a control plane component AND an etcd member. The blast radius is larger.

Resource contention: etcd is sensitive to disk latency. If your API server is hammering the disk during a large list operation, etcd might miss heartbeats and trigger leader election.
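
If you suspect disk contention, etcdctl ships a built-in benchmark you can run to sanity-check whether a node's storage keeps up. Run it against a test cluster rather than production — it generates real write load — and adjust the cert paths, which are kubeadm defaults here:

```shell
# Writes benchmark load and reports whether latency/throughput
# pass etcd's baseline expectations.
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  check perf
```

For ongoing monitoring, the etcd_disk_wal_fsync_duration_seconds metric is the one to watch.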

Scaling constraints: You might want 5 API servers but only 3 etcd members (or vice versa). Stacked topology forces them to match.

External etcd: The Decoupled Approach

External etcd runs on dedicated nodes, completely separate from the control plane.

flowchart TD
    subgraph external["External etcd Topology"]
        subgraph etcd_cluster["etcd Cluster"]
            E1["etcd Node 1"]
            E2["etcd Node 2"]
            E3["etcd Node 3"]
        end
        subgraph control_plane["Control Plane Nodes"]
            subgraph CP1["Control Plane 1"]
                API1["API Server<br/>Controller<br/>Scheduler"]
            end
            subgraph CP2["Control Plane 2"]
                API2["API Server<br/>Controller<br/>Scheduler"]
            end
            subgraph CP3["Control Plane 3"]
                API3["API Server<br/>Controller<br/>Scheduler"]
            end
        end
        control_plane --> etcd_cluster
    end

Advantages of External etcd

Isolated failures: A control plane node dying doesn’t affect etcd quorum. An etcd node dying doesn’t affect API server availability.

Independent scaling: Run 3 etcd members and 5 API servers, or whatever your workload needs.

Optimized resources: Dedicate fast SSDs to etcd nodes. Give API servers more RAM. Each component gets what it needs.

Easier etcd maintenance: Backup, restore, and upgrade etcd without touching the control plane.
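
A backup, for example, becomes a plain etcdctl call against the dedicated cluster — the paths below are illustrative, not canonical:

```shell
# Take a point-in-time snapshot of etcd. This works on stacked
# clusters too, but with external etcd it never touches a
# control plane node.
ETCDCTL_API=3 etcdctl \
  --endpoints=https://10.0.0.1:2379 \
  --cacert=/etc/etcd/pki/ca.crt \
  --cert=/etc/etcd/pki/server.crt \
  --key=/etc/etcd/pki/server.key \
  snapshot save /var/backups/etcd-$(date +%F).db
```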

Disadvantages of External etcd

Complexity: More nodes, more moving parts, more network paths to secure.

Cost: Double the node count for the control plane tier.

Network dependency: The API-server-to-etcd path becomes critical. A network partition between the two tiers takes the cluster down just as surely as an etcd failure would.

Setup overhead: kubeadm requires manual etcd cluster setup before initializing the control plane.

When to Use Which

Choose Stacked etcd When:

  • Small to medium clusters: Under 100 nodes, stacked is usually fine
  • Homelab or development: Fewer nodes means less power, less cost
  • Team simplicity: Your team isn’t experienced with etcd operations
  • Using managed Kubernetes: EKS, GKE, AKS handle this for you anyway

Choose External etcd When:

  • Large clusters: 100+ nodes means more API server load, more etcd writes
  • Strict SLAs: When you can’t afford coupled failures
  • Mixed workloads: You need to scale control plane and etcd independently
  • etcd expertise: Your team knows how to operate etcd clusters

Setting Up Stacked HA with kubeadm

Here’s the practical setup for stacked HA:

# On first control plane node
kubeadm init \
  --control-plane-endpoint "loadbalancer.example.com:6443" \
  --upload-certs

# Save the join commands for other control planes
# Join additional control plane nodes
kubeadm join loadbalancer.example.com:6443 \
  --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash> \
  --control-plane \
  --certificate-key <cert-key>

The key is --control-plane-endpoint — this should point to a load balancer in front of your API servers, not a single node IP.
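
That load balancer can be anything that does TCP passthrough on 6443. A minimal HAProxy sketch — the names and addresses are placeholders, and TCP mode matters because the API server terminates its own TLS:

```
frontend k8s-api
    bind *:6443
    mode tcp
    default_backend k8s-api-servers

backend k8s-api-servers
    mode tcp
    balance roundrobin
    option tcp-check
    server cp1 10.0.1.1:6443 check
    server cp2 10.0.1.2:6443 check
    server cp3 10.0.1.3:6443 check
```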

Setting Up External etcd

For external etcd, you first bootstrap the etcd cluster:

# On each etcd node, configure etcd (change name and the
# 10.0.0.x addresses to match that node)
cat > /etc/etcd/etcd.conf.yaml << EOF
name: etcd-1
data-dir: /var/lib/etcd
initial-cluster-state: new
initial-cluster: etcd-1=https://10.0.0.1:2380,etcd-2=https://10.0.0.2:2380,etcd-3=https://10.0.0.3:2380
listen-peer-urls: https://10.0.0.1:2380
listen-client-urls: https://10.0.0.1:2379,https://127.0.0.1:2379
advertise-client-urls: https://10.0.0.1:2379
initial-advertise-peer-urls: https://10.0.0.1:2380
client-transport-security:
  cert-file: /etc/etcd/pki/server.crt
  key-file: /etc/etcd/pki/server.key
  trusted-ca-file: /etc/etcd/pki/ca.crt
peer-transport-security:
  cert-file: /etc/etcd/pki/peer.crt
  key-file: /etc/etcd/pki/peer.key
  trusted-ca-file: /etc/etcd/pki/ca.crt
EOF
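
Once all three members are up, it's worth confirming quorum before pointing kubeadm at the cluster — endpoints and cert paths here match the config above:

```shell
# Each endpoint should report "is healthy" before you proceed.
ETCDCTL_API=3 etcdctl \
  --endpoints=https://10.0.0.1:2379,https://10.0.0.2:2379,https://10.0.0.3:2379 \
  --cacert=/etc/etcd/pki/ca.crt \
  --cert=/etc/etcd/pki/server.crt \
  --key=/etc/etcd/pki/server.key \
  endpoint health
```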

Then point kubeadm to the external cluster:

# kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
etcd:
  external:
    endpoints:
      - https://10.0.0.1:2379
      - https://10.0.0.2:2379
      - https://10.0.0.3:2379
    caFile: /etc/kubernetes/pki/etcd/ca.crt
    certFile: /etc/kubernetes/pki/apiserver-etcd-client.crt
    keyFile: /etc/kubernetes/pki/apiserver-etcd-client.key
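
Then initialize the first control plane node with the config file — otherwise the flow matches the stacked example:

```shell
# --upload-certs still applies so additional control plane
# nodes can join without manual certificate copying.
kubeadm init --config kubeadm-config.yaml --upload-certs
```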

What About K3s?

K3s supports embedded etcd (stacked) with a simpler setup — a single server actually defaults to SQLite, and --cluster-init on the first server switches it to embedded etcd:

# First server
curl -sfL https://get.k3s.io | sh -s - server --cluster-init

# Additional servers
curl -sfL https://get.k3s.io | sh -s - server \
  --server https://first-server:6443 \
  --token <token>

K3s also supports external databases (MySQL, PostgreSQL) as an etcd alternative, which can be useful if you already operate those databases.
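
Pointing K3s at an external database is a single flag — the connection string below is a placeholder:

```shell
# Use an existing MySQL instance instead of embedded etcd.
curl -sfL https://get.k3s.io | sh -s - server \
  --datastore-endpoint="mysql://user:pass@tcp(db.example.com:3306)/k3s"
```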

My Recommendation

For most use cases, start with stacked etcd. It’s simpler, cheaper, and for clusters under 100 nodes with reasonable workloads, it works great.

Move to external etcd when you:

  • Experience etcd performance issues
  • Need independent scaling
  • Have strict availability requirements that justify the complexity

I run stacked etcd in my homelab on three control plane nodes. It’s survived node failures, network issues, and the occasional “oops I rebooted the wrong server.” Stacked doesn’t mean fragile — it means coupled.

The most important thing isn’t stacked vs external. It’s having at least three control plane nodes with etcd. One node isn’t HA. Two nodes is worse than one: quorum is a majority, and a majority of two is two (⌊2/2⌋ + 1 = 2), so losing either node loses quorum — you’ve doubled the hardware that can take the cluster down without gaining any fault tolerance.

Three nodes. That’s the minimum for real HA.


Kubernetes HA is about surviving failures, not preventing them. Failures will happen. The question is whether your cluster survives them.