GitOps promises that Git is the source of truth. But what if someone kubectl edits a deployment? What if a mutating webhook changes a resource? What if the cluster silently diverges from what Git says it should be?

This is configuration drift, and it’s one of the most insidious problems in Kubernetes operations. ArgoCD can help you detect it — if you configure it correctly.

What Is Configuration Drift?

Drift happens when the actual state of your cluster differs from the desired state in Git.

flowchart LR
    subgraph git["Git says (Source of truth)"]
        G1["replicas: 3"]
        G2["image: v1.2.3"]
        G3["cpu: 100m"]
    end

    subgraph cluster["Cluster has (Actual state)"]
        C1["replicas: 5"]
        C2["image: v1.2.3"]
        C3["cpu: 200m"]
    end

    git -.->|"≠"| cluster

How did replicas become 5 when Git says 3? Possible causes:

  1. Manual changes: Someone ran kubectl scale or kubectl edit
  2. Horizontal Pod Autoscaler: HPA adjusted replicas
  3. Mutating webhooks: Admission controllers modified resources
  4. Controller side effects: Operators made changes
  5. Partial syncs: Sync failed midway

Some drift is intentional (HPA). Most is not. The problem is not knowing which is which.
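
To make the comparison concrete, here is a minimal Python sketch that treats drift as a field-by-field diff between two dictionaries, using the values from the diagram above. It is an illustration only: `find_drift`, `git_state`, and `cluster_state` are hypothetical names, and ArgoCD's real diffing (normalization, defaulted fields, and so on) is considerably more involved.

```python
def find_drift(desired: dict, live: dict, path: str = "") -> dict:
    """Return {field_path: (desired_value, live_value)} for every differing field.

    Only fields present in `desired` are checked, so values that exist only
    in the live object (defaults, status) are not reported as drift.
    """
    drift = {}
    for key, want in desired.items():
        field = f"{path}/{key}"
        have = live.get(key)
        if isinstance(want, dict) and isinstance(have, dict):
            # Recurse into nested objects and keep the full path.
            drift.update(find_drift(want, have, field))
        elif want != have:
            drift[field] = (want, have)
    return drift


# Hypothetical, simplified objects mirroring the diagram.
git_state = {"spec": {"replicas": 3, "image": "v1.2.3", "cpu": "100m"}}
cluster_state = {"spec": {"replicas": 5, "image": "v1.2.3", "cpu": "200m"}}

for field, (want, have) in find_drift(git_state, cluster_state).items():
    print(f"{field}: Git says {want!r}, cluster has {have!r}")
```

The sketch reports what differs, not why. Telling intentional drift (HPA) from accidental drift still needs the context discussed in the rest of this post.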

Why Drift Matters

Without drift detection, you have no guarantee that Git represents reality. This breaks:

  1. Audit trails: “What’s deployed?” becomes “check the cluster” instead of “check Git”
  2. Disaster recovery: Rebuilding from Git won’t match the old state
  3. Security: Unauthorized changes go unnoticed
  4. Reproducibility: Two clusters from the same Git won’t be identical

The moment you have undetected drift, you’ve lost the core benefit of GitOps.

ArgoCD’s Sync Status

ArgoCD continuously compares Git to cluster state. The sync status tells you:

  • Synced: Cluster matches Git exactly
  • OutOfSync: Differences detected
  • Unknown: ArgoCD can’t determine state

Application   Sync Status   Health
frontend      Synced        Healthy
backend       OutOfSync     Healthy
database      Synced        Healthy
cache         Synced        Degraded

“OutOfSync” = drift detected

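OutOfSync means ArgoCD found a difference between Git and the cluster. But ArgoCD’s default behavior might surprise you: unless automated sync is configured, it reports the difference and changes nothing.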

Self-Heal: Automatic Drift Correction

ArgoCD can automatically revert drift:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: frontend
spec:
  syncPolicy:
    automated:
      selfHeal: true  # Revert manual changes
      prune: true     # Delete orphaned resources

With selfHeal: true, when someone runs kubectl scale deployment frontend --replicas=5, ArgoCD will revert it to what Git says within seconds.

This is powerful but has implications:

  • Intentional changes get reverted
  • HPA adjustments get overwritten (when Git also sets replicas)
  • You can’t quickly hotfix production

For most applications, selfHeal should be enabled. It’s the “GitOps purist” approach.

Handling Intentional Drift: Ignore Differences

Some fields should be managed outside Git:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: autoscaled-app
spec:
  ignoreDifferences:
    - group: apps
      kind: Deployment
      jsonPointers:
        - /spec/replicas
    - group: ""
      kind: Service
      jsonPointers:
        - /spec/clusterIP

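This tells ArgoCD: “Don’t report drift for these fields.” It only affects the comparison: a sync still applies the value from Git unless the field is omitted from the manifest or the RespectIgnoreDifferences=true sync option is set.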

Common fields to ignore:

  • /spec/replicas (if using HPA)
  • /spec/clusterIP (assigned by Kubernetes)
  • /metadata/annotations (controller-added)
  • /status (always managed by controllers)
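
As a rough model of what ignoring a field means, the following Python sketch removes the ignored JSON pointers from both sides before comparing them. `drop_pointer` and `in_sync` are hypothetical helpers written for this post. They only walk nested dictionaries (no list indices) and are not how ArgoCD implements the feature internally.

```python
import copy


def drop_pointer(obj: dict, pointer: str) -> None:
    """Remove the field addressed by a JSON pointer such as /spec/replicas, if present."""
    # Decode JSON pointer escapes: ~1 means "/", ~0 means "~".
    parts = [p.replace("~1", "/").replace("~0", "~") for p in pointer.lstrip("/").split("/")]
    node = obj
    for part in parts[:-1]:
        node = node.get(part)
        if not isinstance(node, dict):
            return  # Path does not exist, so there is nothing to ignore.
    node.pop(parts[-1], None)


def in_sync(desired: dict, live: dict, ignored_pointers: list) -> bool:
    """Compare two objects after stripping the ignored fields from copies of both."""
    desired, live = copy.deepcopy(desired), copy.deepcopy(live)
    for pointer in ignored_pointers:
        drop_pointer(desired, pointer)
        drop_pointer(live, pointer)
    return desired == live


desired = {"spec": {"replicas": 3, "image": "v1.2.3"}}
live = {"spec": {"replicas": 5, "image": "v1.2.3"}}

print(in_sync(desired, live, []))                  # replicas differ
print(in_sync(desired, live, ["/spec/replicas"]))  # replicas ignored
```

Because the comparison works on copies, the live object keeps its HPA-chosen replica count; only the verdict changes.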

Detecting Drift Without Auto-Fix

Sometimes you want to know about drift but not automatically fix it:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: critical-app
spec:
  syncPolicy:
    automated:
      selfHeal: false  # Don't auto-fix
      prune: false     # Don't auto-delete

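With selfHeal: false, ArgoCD shows OutOfSync status when the cluster drifts but waits for manual intervention. Because the automated block is still present, new Git commits are still synced automatically; remove automated entirely if every sync must be triggered by hand. This is useful for: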

  • Critical production systems where you want human review
  • Debugging drift sources
  • Applications managed partially outside GitOps
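
The interaction between the automated block and selfHeal is easy to get wrong, so here is a small Python sketch of the decision as I understand it. `should_auto_sync` is a hypothetical function that simplifies ArgoCD's behavior (it ignores retries, sync windows, and similar details), so treat it as a mental model rather than a specification.

```python
from typing import Optional


def should_auto_sync(automated: Optional[dict], git_changed: bool, cluster_changed: bool) -> bool:
    """Decide whether an OutOfSync app would be synced without human action.

    `automated` mirrors spec.syncPolicy.automated; None means the block is absent.
    """
    if automated is None:
        return False  # No automated block: every sync is manual.
    if git_changed:
        return True   # New commits are applied automatically.
    if cluster_changed:
        # Drift in the live cluster is only reverted when selfHeal is on.
        return bool(automated.get("selfHeal", False))
    return False


critical_app = {"selfHeal": False, "prune": False}

print(should_auto_sync(critical_app, git_changed=False, cluster_changed=True))
print(should_auto_sync(critical_app, git_changed=True, cluster_changed=False))
```

Under this model the critical-app configuration leaves manual cluster edits alone but still rolls out new commits on its own.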

Notifications: Alert on Drift

Don’t stare at the dashboard. Get notified:

apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-notifications-cm
  namespace: argocd
data:
  trigger.on-out-of-sync: |
    - when: app.status.sync.status == 'OutOfSync'
      send: [app-out-of-sync]
  template.app-out-of-sync: |
    message: |
      Application {{.app.metadata.name}} is OutOfSync.
      Sync Status: {{.app.status.sync.status}}
      Health: {{.app.status.health.status}}
      Repository: {{.app.spec.source.repoURL}}

Connect this to Slack, PagerDuty, or email. Drift should trigger alerts.
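
If you want a quick report before wiring up notifications, the same fields can be read from the JSON that `argocd app list -o json` prints. The Python sketch below assumes that output is a list of Application objects; `drift_messages` is a hypothetical helper and the repository URLs in the sample are placeholders.

```python
import json


def drift_messages(app_list_json: str) -> list:
    """Build one alert line per OutOfSync application."""
    messages = []
    for app in json.loads(app_list_json):
        status = app.get("status", {})
        sync = status.get("sync", {}).get("status", "Unknown")
        if sync != "OutOfSync":
            continue
        health = status.get("health", {}).get("status", "Unknown")
        repo = app.get("spec", {}).get("source", {}).get("repoURL", "n/a")
        name = app.get("metadata", {}).get("name", "?")
        messages.append(
            f"Application {name} is OutOfSync. "
            f"Sync Status: {sync}. Health: {health}. Repository: {repo}"
        )
    return messages


# Hypothetical sample data shaped like Application objects.
sample = json.dumps([
    {
        "metadata": {"name": "frontend"},
        "spec": {"source": {"repoURL": "https://example.com/org/frontend.git"}},
        "status": {"sync": {"status": "Synced"}, "health": {"status": "Healthy"}},
    },
    {
        "metadata": {"name": "backend"},
        "spec": {"source": {"repoURL": "https://example.com/org/backend.git"}},
        "status": {"sync": {"status": "OutOfSync"}, "health": {"status": "Healthy"}},
    },
])

for message in drift_messages(sample):
    print(message)
```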

The Diff View: Understanding Drift

When drift occurs, ArgoCD shows exactly what changed:

argocd app diff frontend

Or in the UI, click on an OutOfSync application to see the diff:

--- Git (desired)
+++ Cluster (actual)
@@ -1,4 +1,4 @@
 spec:
-  replicas: 3
+  replicas: 5
   template:
     spec:

This is invaluable for understanding what drifted and why.

Refresh vs Sync

Two different operations:

Refresh: Compare Git to cluster, update status. No changes made.

argocd app get frontend --refresh

Sync: Apply Git state to cluster. Changes made.

argocd app sync frontend

Refresh is safe and frequent (every 3 minutes by default). Sync changes the cluster and should be deliberate (unless automated).
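
The difference is the same as between a read and a write. A toy Python model, with hypothetical `refresh` and `sync` functions standing in for the two operations:

```python
import copy


def refresh(desired: dict, live: dict) -> str:
    """Compare only: report the sync status and leave `live` untouched."""
    return "Synced" if desired == live else "OutOfSync"


def sync(desired: dict, live: dict) -> None:
    """Apply: overwrite the live state with a copy of the desired state."""
    live.clear()
    live.update(copy.deepcopy(desired))


desired = {"replicas": 3}
live = {"replicas": 5}

print(refresh(desired, live))  # Comparison only; live is unchanged.
sync(desired, live)            # Live now matches desired.
print(refresh(desired, live))
```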

Drift Detection Strategy

Here’s my approach:

For Development/Staging

  • selfHeal: true — Revert all drift
  • prune: true — Delete orphaned resources
  • Fast feedback, pure GitOps

For Production (Most Apps)

  • selfHeal: true — Revert drift
  • prune: true — Delete orphaned
  • Alerts on any OutOfSync event
  • Investigate why drift happened

For Production (Critical/Special)

  • selfHeal: false — Human review required
  • prune: false — Manual deletion only
  • Strict alerts
  • Explicit sync approval

For HPA-Managed Apps

ignoreDifferences:
  - group: apps
    kind: Deployment
    jsonPointers:
      - /spec/replicas
syncPolicy:
  automated:
    selfHeal: true  # For other fields
  syncOptions:
    - RespectIgnoreDifferences=true  # Don't reset replicas during sync
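
If you generate Application manifests from a script, the tiers above can be encoded in one place. This is a sketch under my own naming: `sync_policy_for` and the tier strings are hypothetical, and the returned dictionaries simply mirror the settings listed in this section.

```python
def sync_policy_for(tier: str, hpa_managed: bool = False) -> dict:
    """Build an Application spec fragment for 'dev', 'staging', 'production', or 'critical'."""
    if tier in ("dev", "staging", "production"):
        # Revert drift and delete orphaned resources automatically.
        spec = {"syncPolicy": {"automated": {"selfHeal": True, "prune": True}}}
    elif tier == "critical":
        # Report drift, but leave correction and deletion to a human.
        spec = {"syncPolicy": {"automated": {"selfHeal": False, "prune": False}}}
    else:
        raise ValueError(f"unknown tier: {tier}")
    if hpa_managed:
        # Let the HPA own the replica count.
        spec["ignoreDifferences"] = [
            {"group": "apps", "kind": "Deployment", "jsonPointers": ["/spec/replicas"]}
        ]
    return spec


print(sync_policy_for("production", hpa_managed=True))
```

Keeping the mapping in code makes the exceptions visible: any app that is not on the default tier has to say so explicitly.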

Finding the Drift Source

When you see drift, investigate:

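  1. Check audit logs: Who ran kubectl? If API server auditing is enabled, the audit log records the user behind each write. Kubernetes events do not, but the object's managedFields show which client last wrote each field:

    kubectl get deployment frontend -o yaml --show-managed-fields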
    
  2. Check controller logs: Did an operator make changes?

  3. Check admission webhooks: Are mutations happening?

    kubectl get mutatingwebhookconfigurations
    
  4. Check the diff: What exactly changed?

    argocd app diff app-name
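
One more place to look is the object's managedFields metadata, which records which client wrote to the resource and when. The Python sketch below assumes the object was fetched as JSON with `kubectl get ... -o json --show-managed-fields`. `field_writers` is a hypothetical helper, and the manager names in the sample are illustrative values, not something to rely on.

```python
def field_writers(resource: dict) -> list:
    """List (time, manager, operation) for each managedFields entry, newest first."""
    entries = resource.get("metadata", {}).get("managedFields", [])
    rows = [
        (e.get("time", ""), e.get("manager", "?"), e.get("operation", "?"))
        for e in entries
    ]
    # RFC 3339 UTC timestamps sort correctly as plain strings.
    return sorted(rows, reverse=True)


# Hypothetical, trimmed-down Deployment metadata.
deployment = {
    "metadata": {
        "name": "frontend",
        "managedFields": [
            {"manager": "argocd-controller", "operation": "Update", "time": "2024-01-01T10:00:00Z"},
            {"manager": "kubectl-edit", "operation": "Update", "time": "2024-01-01T12:30:00Z"},
        ],
    }
}

for time, manager, operation in field_writers(deployment):
    print(f"{time}  {manager}  {operation}")
```

A recent entry from a manager other than your GitOps controller is a good starting point for the investigation.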
    

Preventing Drift at the Source

Better than detecting drift is preventing it:

  1. RBAC restrictions: Limit who can modify resources

    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      name: readonly
    rules:
      - apiGroups: ["*"]
        resources: ["*"]
        verbs: ["get", "list", "watch"]  # No create/update/delete
    
  2. Policy enforcement: Use Kyverno to block manual changes

    apiVersion: kyverno.io/v1
    kind: ClusterPolicy
    metadata:
      name: require-gitops
    spec:
      validationFailureAction: Enforce  # Without this, Kyverno only audits
      rules:
        - name: block-manual-changes
          match:
            resources:
              kinds:
                - Deployment
          exclude:
            subjects:
              - kind: ServiceAccount
                name: argocd-application-controller
                namespace: argocd  # Namespace where ArgoCD runs
          validate:
            message: "Changes must go through GitOps"
            deny: {}
    
  3. Training: Teach teams to change Git, not cluster
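
A cheap guard for the RBAC side is to lint Role manifests before they are merged. The Python sketch below flags any verb other than get, list, or watch. `mutating_verbs` is a hypothetical helper that works on a manifest already parsed into a dictionary, and it treats the wildcard verb as mutating.

```python
READ_ONLY_VERBS = {"get", "list", "watch"}


def mutating_verbs(role: dict) -> set:
    """Return every verb in the Role's rules that is not get, list, or watch."""
    found = set()
    for rule in role.get("rules", []):
        found.update(v for v in rule.get("verbs", []) if v not in READ_ONLY_VERBS)
    return found


# Mirrors the readonly Role shown above.
readonly_role = {
    "kind": "Role",
    "metadata": {"name": "readonly"},
    "rules": [{"apiGroups": ["*"], "resources": ["*"], "verbs": ["get", "list", "watch"]}],
}

print(mutating_verbs(readonly_role) or "read-only")
```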

Monitoring Drift Over Time

Track drift as a metric:

# Prometheus query for out-of-sync apps
count(argocd_app_info{sync_status="OutOfSync"})

Alert if it’s non-zero for too long:

- alert: GitOpsDriftDetected
  expr: count(argocd_app_info{sync_status="OutOfSync"}) > 0
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "GitOps drift detected"
    description: "One or more applications are OutOfSync with Git"
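
The `for: 10m` clause is what keeps short-lived OutOfSync states (a sync in progress, for example) from paging anyone. As a rough model of that behavior, this Python sketch fires only when the count stays above zero for the whole hold period. `alert_fires` and the sample series are hypothetical, and real Prometheus evaluation has more states than this.

```python
def alert_fires(samples: list, hold_seconds: int = 600) -> bool:
    """samples: (unix_time, out_of_sync_count) pairs ordered by time."""
    pending_since = None
    for timestamp, count in samples:
        if count > 0:
            if pending_since is None:
                pending_since = timestamp  # Condition just became true.
            if timestamp - pending_since >= hold_seconds:
                return True                # True for the full hold period.
        else:
            pending_since = None           # Condition cleared; start over.
    return False


# One sample per minute for eleven minutes.
persistent = [(t * 60, 1) for t in range(11)]
# Same series, but the drift clears briefly at minute 5.
flapping = [(t * 60, 0 if t == 5 else 1) for t in range(11)]

print(alert_fires(persistent))
print(alert_fires(flapping))
```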

My Checklist for Drift-Free GitOps

[ ] selfHeal enabled for most applications
[ ] ignoreDifferences configured for HPA-managed replicas
[ ] Notifications set up for OutOfSync events
[ ] RBAC restricts direct cluster modifications
[ ] Policy enforcement prevents manual changes
[ ] Monitoring alerts on drift
[ ] Team trained on GitOps workflow

Configuration drift is the enemy of reliable infrastructure. Detect it immediately, fix it automatically where safe, and investigate ruthlessly when it happens. Git should always reflect reality — that’s the whole point.