Reliability

Kubernetes is designed to be self-healing, but what does that actually mean? More importantly: what happens when the components doing the healing themselves fail? I’ve run Kubernetes clusters through all kinds of failures: planned, unplanned, and “hold my beer” experiments. Here’s what actually happens when things break.

The Components That Can Fail

Before diving into failure scenarios, let’s map out what we’re working with:

Control Plane:
kube-apiserver: The API that everything talks to
etcd: The database storing all cluster state
kube-scheduler: Decides where pods run
kube-controller-manager: Runs controllers (ReplicaSet, Deployment, etc.)
cloud-controller-manager: Cloud provider integrations (if applicable)

Node Components: ...
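One practical way to see which of these dependencies the API server itself considers unhealthy is its verbose readiness endpoint, for example `kubectl get --raw='/readyz?verbose'`, which reports one line per check (including an `etcd` check). The sketch below assumes lines of the form `[+]name ok` for passing checks and `[-]name failed: ...` for failing ones, followed by a summary line. The helper name `parse_readyz` and the `SAMPLE` text are hypothetical: the sample is hand-written and trimmed, with an etcd failure injected for illustration, not output captured from a real cluster.

```python
import re

# Assumed per-check line format: "[+]etcd ok" or "[-]etcd failed: reason withheld".
CHECK_LINE = re.compile(r"^\[(?P<sign>[+-])\](?P<name>\S+)\s+(?P<status>.*)$")


def parse_readyz(text):
    """Split verbose /readyz output into (passing, failing) lists of check names."""
    passing, failing = [], []
    for line in text.splitlines():
        m = CHECK_LINE.match(line.strip())
        if not m:
            # Summary lines such as "readyz check passed" do not match and are skipped.
            continue
        (passing if m["sign"] == "+" else failing).append(m["name"])
    return passing, failing


# Hypothetical, hand-written sample in the assumed format (not real cluster output).
SAMPLE = """[+]ping ok
[+]log ok
[-]etcd failed: reason withheld
[+]poststarthook/start-kube-apiserver-admission-initializer ok
readyz check failed"""

if __name__ == "__main__":
    ok, bad = parse_readyz(SAMPLE)
    print("passing checks:", ok)
    print("failing checks:", bad)
```

Parsing check names rather than just the HTTP status makes it easier to tell an etcd problem apart from a failing post-start hook when the API server reports itself as not ready.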

Unbreakable - my fascination.

Graceful Degradation in Kubernetes: What Happens When Components Fail