Every Kubernetes cluster eventually needs persistent storage. The question is: which solution?
For self-hosted clusters without cloud provider storage classes, two options dominate: Longhorn and Rook-Ceph. Both are CNCF projects. Both provide replicated block storage. Both work well.
But they’re very different in philosophy, complexity, and use cases. I’ve run both in production. Let me share what I’ve learned.
The Fundamental Difference
Longhorn: Simple distributed block storage built for Kubernetes. Each volume is replicated across nodes using standard Linux storage primitives.
Rook-Ceph: Kubernetes operator for Ceph, a battle-tested distributed storage system that predates Kubernetes by years. Brings Ceph’s full feature set into Kubernetes.
The trade-off:
- Longhorn prioritizes simplicity
- Rook-Ceph prioritizes features and scale
Longhorn: The Simple Choice
flowchart TD
subgraph longhorn["Longhorn Architecture"]
subgraph node1["Node 1"]
E1["Longhorn Engine"]
R1["Replica"]
end
subgraph node2["Node 2"]
E2["Longhorn Engine"]
R2["Replica"]
end
subgraph node3["Node 3"]
R3["Replica"]
end
end
PV["PersistentVolume"] --> E1
E1 --> R1
E1 --> R2
E1 --> R3
How Longhorn Works
- Engine per volume: Each volume gets a dedicated Longhorn engine on the node where it is attached (in current releases, a process inside that node's instance-manager pod)
- Replicas on nodes: Data replicated to multiple nodes’ local disks
- Synchronous replication: All replicas written before acknowledging
- iSCSI frontend: Engine exposes volume via iSCSI to the workload
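The replica count described above is set per StorageClass, so different workloads can trade durability for speed. A minimal sketch, assuming the stock Longhorn CSI driver; the class name `longhorn-2-replicas` is hypothetical, and the parameter values are illustrative rather than recommendations:

```yaml
# Hypothetical StorageClass: two replicas per volume instead of the default three
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-2-replicas      # illustrative name, not from the original setup
provisioner: driver.longhorn.io  # Longhorn's CSI driver
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "2"          # replicas written synchronously on each write
  staleReplicaTimeout: "2880"    # minutes before a failed replica is cleaned up
  dataLocality: "best-effort"    # try to keep one replica on the node running the workload
  fsType: "ext4"
```

A PVC that sets `storageClassName: longhorn-2-replicas` would then get volumes with two synchronous replicas.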
Installing Longhorn
helm repo add longhorn https://charts.longhorn.io
helm repo update
helm install longhorn longhorn/longhorn \
  --namespace longhorn-system \
  --create-namespace
Basic configuration:
# longhorn-values.yaml
defaultSettings:
  defaultReplicaCount: 3
  defaultDataPath: /var/lib/longhorn
  storageMinimalAvailablePercentage: 15
  defaultLonghornStaticStorageClass: longhorn
persistence:
  defaultClass: true
  defaultClassReplicaCount: 3
Longhorn Strengths
Simplicity: Install with Helm, get storage. No dedicated storage nodes, no complex configuration.
UI: Built-in web interface for volume management, backup status, node health.
Backup: Native backup to S3-compatible storage with incremental snapshots.
Kubernetes-native: Designed for Kubernetes from the start. No legacy baggage.
Longhorn Limitations
Performance: Good for most workloads, but not optimized for extreme IOPS. All I/O for a volume passes through that volume's single engine.
Scale: Works well up to ~100 nodes. Beyond that, consider alternatives.
Storage efficiency: Each replica is a full copy. No erasure coding.
Rook-Ceph: The Feature-Rich Choice
flowchart TD
subgraph rook["Rook-Ceph Architecture"]
subgraph mgmt["Management"]
OP["Rook Operator"]
MON["Ceph Monitors"]
MGR["Ceph Manager"]
end
subgraph storage["Storage"]
OSD1["OSD<br/>(disk 1)"]
OSD2["OSD<br/>(disk 2)"]
OSD3["OSD<br/>(disk 3)"]
OSD4["OSD<br/>(disk 4)"]
end
subgraph access["Access"]
RBD["RBD<br/>(Block)"]
RGW["RGW<br/>(Object)"]
CFS["CephFS<br/>(Filesystem)"]
end
end
PV["PersistentVolume"] --> RBD
RBD --> OSD1
RBD --> OSD2
How Rook-Ceph Works
- OSDs on disks: Each disk is managed by its own Object Storage Daemon (OSD)
- CRUSH algorithm: Data distributed across OSDs using placement rules
- Multiple access methods: Block (RBD), Object (S3-compatible), Filesystem (CephFS)
- Monitors for consensus: Cluster state managed by monitor daemons
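Each access method is enabled by its own custom resource on top of the same OSDs. As an illustration, a shared filesystem might be declared like this. It is a minimal sketch, assuming a cluster in the `rook-ceph` namespace with at least three OSD hosts; the filesystem name `myfs` and the pool sizes are assumptions, not part of the original setup:

```yaml
# Hypothetical CephFS filesystem backed by replicated pools
apiVersion: ceph.rook.io/v1
kind: CephFilesystem
metadata:
  name: myfs               # illustrative name
  namespace: rook-ceph
spec:
  metadataPool:
    replicated:
      size: 3              # filesystem metadata kept on three OSDs
  dataPools:
    - name: replicated
      replicated:
        size: 3
  metadataServer:
    activeCount: 1         # one active MDS
    activeStandby: true    # plus a hot standby for failover
```

Object storage follows the same pattern with a `CephObjectStore` resource, and block storage with the `CephBlockPool` resources shown later in this post.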
Installing Rook-Ceph
helm repo add rook-release https://charts.rook.io/release
helm repo update
# Install Rook operator
helm install rook-ceph rook-release/rook-ceph \
  --namespace rook-ceph \
  --create-namespace
# Create Ceph cluster
kubectl apply -f ceph-cluster.yaml
Cluster configuration:
# ceph-cluster.yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: quay.io/ceph/ceph:v18.2.0
  # Host directory for Ceph config and monitor data (the path used in Rook's example manifests)
  dataDirHostPath: /var/lib/rook
  mon:
    count: 3
    allowMultiplePerNode: false
  mgr:
    count: 2
  storage:
    useAllNodes: true
    useAllDevices: false
    deviceFilter: "^sd[b-z]"  # Use sdb, sdc, etc.
  resources:
    mon:
      requests:
        cpu: 500m
        memory: 1Gi
    osd:
      requests:
        cpu: 500m
        memory: 2Gi
Rook-Ceph Strengths
Scale: Ceph handles petabytes and runs at organizations such as CERN and Bloomberg.
Features: Block, object, and filesystem storage. Erasure coding. Snapshots. Mirroring.
Performance tuning: Extensive options for optimization. Can be tuned for specific workloads.
Storage efficiency: Erasure coding reduces overhead (e.g., 1.5x raw capacity with a 2+1 profile instead of 3x for three-way replication), at the cost of tolerating one failure instead of two.
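The overhead figures come from simple arithmetic. With k data chunks and m coding chunks, every k units of data occupy k + m units of raw disk. Replication with n copies is the special case k = 1, m = n - 1. The 120 TiB raw capacity below is a made-up number for illustration:

```latex
\[
\text{overhead} = \frac{k + m}{k}
\]
\[
\text{EC } 2{+}1:\ \frac{2 + 1}{2} = 1.5
\qquad
\text{3x replication}:\ \frac{1 + 2}{1} = 3
\]
\[
\text{usable from 120 TiB raw}:\quad
\frac{120}{1.5} = 80\ \text{TiB (EC 2+1)}
\qquad
\frac{120}{3} = 40\ \text{TiB (3x replication)}
\]
```

The pool survives the loss of up to m chunks, so a 2+1 pool tolerates one failed host while three-way replication tolerates two.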
Rook-Ceph Limitations
Complexity: More moving parts. Monitors, managers, OSDs all need resources and attention.
Resource overhead: A production cluster typically runs 3 monitors and 2 managers, plus one OSD per disk. Memory usage is significant.
Learning curve: Ceph has decades of features and configuration options.
Dedicated storage nodes: For performance, often need dedicated nodes for OSDs.
Head-to-Head Comparison
| Aspect | Longhorn | Rook-Ceph |
|---|---|---|
| Complexity | Low | High |
| Setup time | 10 minutes | 30+ minutes |
| Resource overhead | Low | High |
| Max scale | ~100 nodes | 1000+ nodes |
| Storage types | Block only | Block, Object, Filesystem |
| Performance | Good | Excellent (when tuned) |
| Storage efficiency | 3x (replication) | 1.5x+ (erasure coding) |
| Backup | Built-in S3 | External tools |
| UI | Excellent | Ceph Dashboard |
| Community | Growing | Mature |
When to Choose Longhorn
Choose Longhorn when:
- Small to medium clusters (under 100 nodes)
- Simplicity matters — You want storage that “just works”
- Limited ops capacity — Small team, can’t dedicate time to storage management
- General workloads — Databases, stateful apps with moderate I/O
- Homelab or edge — Resource-constrained environments
# Typical Longhorn workload
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 100Gi
When to Choose Rook-Ceph
Choose Rook-Ceph when:
- Large clusters (100+ nodes)
- Multiple storage types needed — Block AND object AND filesystem
- Performance critical — Need to tune for specific workloads
- Storage efficiency matters — Erasure coding reduces costs
- Dedicated storage team — People who can learn and operate Ceph
# Rook-Ceph with erasure coding
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicated-pool
  namespace: rook-ceph
spec:
  failureDomain: host
  replicated:
    size: 3
---
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: erasure-coded-pool
  namespace: rook-ceph
spec:
  failureDomain: host
  erasureCoded:
    dataChunks: 2
    codingChunks: 1
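The two pools work as a pair: RBD keeps image metadata in the replicated pool and writes the data itself to the erasure-coded pool. A StorageClass that ties them together might look like the sketch below. It assumes the operator and cluster both run in the `rook-ceph` namespace, which determines the provisioner name and the secret names Rook creates by default. The class name `rook-ceph-block-ec` is hypothetical:

```yaml
# Hypothetical StorageClass: RBD volumes with data on the erasure-coded pool
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block-ec                 # illustrative name
provisioner: rook-ceph.rbd.csi.ceph.com    # prefix is the operator namespace
parameters:
  clusterID: rook-ceph                     # namespace of the CephCluster
  pool: replicated-pool                    # RBD image metadata
  dataPool: erasure-coded-pool             # RBD image data
  imageFormat: "2"
  imageFeatures: layering
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
  csi.storage.k8s.io/fstype: ext4
reclaimPolicy: Delete
allowVolumeExpansion: true
```

With `failureDomain: host`, a 2+1 pool needs at least three OSD hosts so that each chunk lands on a different host.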
Performance Considerations
Longhorn Performance
# Tune replica count for performance vs durability
defaultSettings:
  defaultReplicaCount: 2  # Faster than 3, less durable
  # Use dedicated disk path
  defaultDataPath: /mnt/fast-ssd/longhorn
Longhorn's throughput per volume is bounded by that volume's single engine, so high-IOPS workloads may bottleneck there.
Rook-Ceph Performance
# Dedicated OSD nodes
spec:
  placement:
    osd:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: storage-node
                  operator: In
                  values:
                    - "true"
  # NVMe optimization
  storage:
    config:
      osdsPerDevice: "1"
      storeType: bluestore
Ceph can saturate modern NVMe drives when properly configured.
Backup Strategies
Longhorn Backup
Backups are built in; configure an S3 target:
defaultSettings:
  backupTarget: s3://longhorn-backups@us-east-1/
  backupTargetCredentialSecret: longhorn-s3-credentials
Schedule backups per volume:
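apiVersion: longhorn.io/v1beta2  # v1beta1 is deprecated in current Longhorn releases
kind: RecurringJob
metadata:
  name: daily-backup
  namespace: longhorn-system
spec:
  cron: "0 2 * * *"  # every day at 02:00
  task: backup
  groups:
    - default
  retain: 7
  concurrency: 1  # assumed value: back up one volume at a time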
Rook-Ceph Backup
Use Velero with Ceph CSI snapshots:
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-csi \
  --features=EnableCSI
Or native Ceph mirroring for disaster recovery between clusters.
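For the Velero route, Velero's CSI support needs a VolumeSnapshotClass for the Ceph RBD driver and selects it by label. A minimal sketch, assuming the snapshot CRDs and snapshot controller are installed and that Rook runs in the `rook-ceph` namespace; the class name is the one used in Rook's examples, but treat it as an assumption for your cluster:

```yaml
# Hypothetical VolumeSnapshotClass that Velero can use for RBD-backed PVCs
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-rbdplugin-snapclass
  labels:
    velero.io/csi-volumesnapshot-class: "true"  # tells Velero to use this class
driver: rook-ceph.rbd.csi.ceph.com              # must match the StorageClass provisioner
parameters:
  clusterID: rook-ceph
  csi.storage.k8s.io/snapshotter-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/snapshotter-secret-namespace: rook-ceph
deletionPolicy: Delete
```

Snapshots taken this way live in the same Ceph cluster as the volumes, so they protect against accidental deletion but not against losing the cluster itself.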
My Setup
I run Longhorn in my homelab:
# My Longhorn configuration
defaultSettings:
  defaultReplicaCount: 2  # 3 nodes, 2 replicas
  defaultDataPath: /mnt/storage/longhorn
  backupTarget: s3://backups@minio/longhorn/
  backupTargetCredentialSecret: minio-credentials
  storageMinimalAvailablePercentage: 20
persistence:
  defaultClass: true
Why Longhorn?
- 3 nodes — Too small for Ceph overhead
- Simplicity — I don’t want to debug Ceph at 2 AM
- Backup integration — S3 backups to my MinIO
- Resource efficiency — Every MB matters on small nodes
If I were running 50+ nodes or needed object storage, I’d switch to Ceph.
Migration Path
Starting with Longhorn and growing? You can migrate:
- Back up data from the Longhorn volumes
- Deploy Rook-Ceph alongside
- Restore to Ceph volumes
- Update workloads to use new StorageClass
- Retire Longhorn once migrated
Both are CSI drivers, so workloads see the same PersistentVolumeClaim interface; only the storageClassName changes.
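For small volumes, a direct copy between PVCs can stand in for the backup-and-restore steps. This is a different technique from the one listed above, sketched under these assumptions: the workload is scaled down first, both PVCs are in the same namespace, and a new PVC named `postgres-data-ceph` (hypothetical) already exists on a Ceph-backed StorageClass. The Job name and image are also illustrative:

```yaml
# Hypothetical one-off Job: copy files from the Longhorn PVC to the Ceph PVC
apiVersion: batch/v1
kind: Job
metadata:
  name: copy-postgres-data         # illustrative name
spec:
  backoffLimit: 0                  # do not retry automatically; inspect failures by hand
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: copy
          image: alpine:3.20
          # -a preserves ownership, permissions, and timestamps
          command: ["sh", "-c", "cp -a /old/. /new/"]
          volumeMounts:
            - name: old
              mountPath: /old
              readOnly: true
            - name: new
              mountPath: /new
      volumes:
        - name: old
          persistentVolumeClaim:
            claimName: postgres-data        # existing Longhorn-backed PVC
            readOnly: true
        - name: new
          persistentVolumeClaim:
            claimName: postgres-data-ceph   # hypothetical Ceph-backed PVC
```

Once the Job completes, point the workload at the new claim and keep the old volume until the data is verified.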
Why This Matters
Storage is the hardest part of Kubernetes. Get it wrong and you lose data. Over-engineer it and you waste resources maintaining complexity you don’t need.
The right choice depends on your scale:
- Homelab/small clusters: Longhorn
- Medium production: Either, based on features needed
- Large scale: Rook-Ceph
Both are solid. Both will serve you well. The difference is how much complexity you’re willing to manage.
Simple systems fail in simple ways. Complex systems fail in complex ways. Choose the complexity you can handle.
