“We want to migrate to Kubernetes by November.”

It was September. The client was an e-commerce company. Their biggest sales event of the year — Black Friday — was in late November. I said no. They asked if I knew someone who might take it on anyway.

I did. A fellow platform engineer — someone I respect, highly capable. I made the introduction, but warned him about the timeline. He took the engagement, documented the same concerns I had, got them signed off. The client proceeded anyway.

What happened next is his story, shared with permission. It’s a cautionary tale about why you should never migrate to Kubernetes without proper resource metrics. Let me tell you what he witnessed, and then let’s talk about how to do this right.

The Black Friday Disaster

The client had a monolithic PHP application running on four dedicated servers. Two app servers, one database server, one for Redis and background jobs. Simple, stable, boring — it had handled Black Friday traffic for three years.

But the CTO had been to a conference. Kubernetes was the future. Microservices were the future. They wanted to containerize everything and deploy to EKS before Black Friday to “handle scale better.”

Some context: the CTO was new, brought in specifically to drive innovation. The company had been coasting on the same tech stack for nearly a decade — profitable, but stagnant. The board wanted modernization, and the CTO needed a win to prove the investment was worth it.

But here’s why the timeline wasn’t pure hubris: their legacy hosting contract was ending December 1st. The provider had been acquired, the new owners were sunsetting the platform, and they wouldn’t extend. The options were: migrate before Black Friday, or go month-to-month on emergency hosting at 3x the cost while trying to migrate during December — the one month when code freezes make everything harder and any outage is catastrophic.

The CTO did the math. A risky migration in October with two weeks to stabilize before Black Friday versus a certain cash drain and an even riskier December migration. The CFO signed off. The CEO signed off. It wasn’t reckless — it was a calculated bet. They just calculated wrong because they didn’t have the data to calculate right.

The Warning Signs My Colleague Raised

  1. No metrics. They had basic Nagios monitoring — CPU, memory, disk. No application-level metrics. No request latency histograms. No understanding of how resources correlated with traffic.

  2. No load testing. They had never load tested the application. Their only data point was “it survived last year’s Black Friday.”

  3. No time for proper sizing. Kubernetes resource requests and limits require data. Without metrics, they’d be guessing.

  4. No rollback plan. The old infrastructure was being decommissioned. If Kubernetes failed, there was nothing to fail back to.

He documented all of this. He recommended finding a way to extend the hosting contract, even at higher cost, to buy time for proper preparation. They declined. The math had been done, the executives had signed off, and the decision was final.

What They Deployed

Without proper metrics, they made educated guesses based on the current server specs:

# What they deployed (DON'T DO THIS)
resources:
  requests:
    memory: "512Mi"
    cpu: "250m"
  limits:
    memory: "1Gi"
    cpu: "500m"

You might wonder: why set CPU limits at all? My colleague knows the arguments against them. The answer: the client’s platform team had a policy requiring all deployments to specify both requests and limits for all resources. The reason? At the CTO’s previous company, a cryptominer had been running on their servers for weeks before anyone noticed. It was one of the first policies they implemented after joining: if everything has CPU limits, a rogue process can’t consume unbounded resources.

Reasonable logic, but it assumes you know what limits to set. Without historical data to push back with, and with the timeline pressure, my colleague went along with it. In hindsight, this policy, combined with the guessed values, made things worse.

They set up Horizontal Pod Autoscaler (HPA) based on CPU utilization:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 4
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

Looked reasonable. It wasn’t.

Black Friday Morning

Traffic started climbing at 6 AM. By 8 AM, they were at 3x normal traffic. The HPA did its job — pods scaled from 4 to 12. Everything looked fine.

Then 10 AM hit.

Traffic jumped to 8x normal. The HPA tried to scale to 20 pods. But here’s what nobody knew: the application was memory-bound, not CPU-bound. Each pod was hitting the 1Gi memory limit and getting OOMKilled. The HPA saw low CPU utilization (because pods were dying before they could use CPU) and thought everything was fine.

New pods would start, load the PHP opcache, warm up, start serving traffic, run out of memory, die. Repeat. They had 20 pods constantly cycling through CrashLoopBackOff.
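For anyone debugging a similar loop: this failure mode is visible with a few kubectl commands. The label selector and pod name below are placeholders; substitute your own:

```shell
# Pods stuck in CrashLoopBackOff show climbing restart counts
kubectl get pods -l app=web-app

# The previous container's exit reason; look for "Reason: OOMKilled"
kubectl describe pod <pod-name> | grep -A 3 "Last State"

# Live memory usage per pod (requires metrics-server)
kubectl top pods -l app=web-app
```

An OOMKilled last state combined with low CPU usage is exactly the signature that fooled their HPA.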

The Real Problem

The application’s memory footprint varied wildly based on the type of request:

  • Browse category: ~200MB
  • View product: ~300MB
  • Add to cart: ~400MB
  • Checkout: ~800MB+

On Black Friday, checkout traffic was 5x normal. Every pod was trying to handle checkout requests that needed 800MB, but they’d limited them to 1Gi with no headroom.
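A back-of-the-envelope calculation with these footprints shows how tight 1Gi was. This is a sketch, not the client’s actual configuration: it assumes each PHP-FPM worker holds one request’s footprint at a time, a hypothetical two workers per pod, and illustrative traffic mixes:

```python
# Rough pod memory estimate from the per-request footprints above (MB).
# Assumptions (not measured): 2 PHP-FPM workers per pod, each holding one
# request at a time; the traffic mixes below are illustrative.

FOOTPRINT_MB = {"browse": 200, "view_product": 300, "add_to_cart": 400, "checkout": 800}

def pod_memory_mb(workers: int, mix: dict) -> float:
    """Expected pod memory for a request mix (shares summing to 1)."""
    per_worker = sum(FOOTPRINT_MB[k] * share for k, share in mix.items())
    return workers * per_worker

def worst_case_mb(workers: int) -> int:
    """All workers serving the heaviest request type (checkout) at once."""
    return workers * max(FOOTPRINT_MB.values())

normal_mix = {"browse": 0.50, "view_product": 0.30, "add_to_cart": 0.15, "checkout": 0.05}
bf_mix     = {"browse": 0.35, "view_product": 0.25, "add_to_cart": 0.15, "checkout": 0.25}

print(pod_memory_mb(2, normal_mix))  # → 580.0
print(pod_memory_mb(2, bf_mix))      # → 810.0
print(worst_case_mb(2))              # → 1600, far past the 1024MB limit
```

Even the expected Black Friday mix sits around 80% of the 1Gi limit before counting opcache and PHP runtime overhead, and two simultaneous checkouts alone blow past it.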

Worse: the CPU numbers were wrong too. The 250m requests let the scheduler pack far more pods per node than the application could sustain, and when pods did survive, the 500m limit throttled them hard. Response times went from 200ms to 8 seconds. Users started refreshing pages. More requests. More memory pressure. Death spiral.

The Fallout

  • 4 hours of degraded service during peak Black Friday hours
  • Estimated lost revenue: €240,000
  • Emergency scale-up of node pool (which took 20 minutes to provision)
  • Manual intervention to increase memory limits (which required redeployment)
  • Post-incident review with very uncomfortable executives

The application was fine. Kubernetes was fine. The configuration was the problem — and the configuration was wrong because they didn’t have the data to make it right.

The Right Way: Data-Driven Resource Sizing

Let me show you how this should have been done.

Step 1: Instrument Before You Migrate

You cannot size Kubernetes resources without understanding your application’s resource consumption patterns. At minimum, you need:

Memory metrics:

  • Baseline memory usage (idle)
  • Working set memory under load
  • Peak memory during traffic spikes
  • Memory consumption by request type (if varies)

CPU metrics:

  • CPU utilization under normal load
  • CPU utilization during peaks
  • Request latency at various CPU levels
  • Whether the app is CPU-bound or IO-bound

Request patterns:

  • Requests per second (RPS)
  • Request latency percentiles (p50, p95, p99)
  • Traffic patterns by time of day, day of week
  • Historical peak traffic events
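If the application already runs in containers scraped by Prometheus, queries along these lines cover most of the list above. The container and metric names are common defaults, not this client’s setup; the latency and RPS metrics only exist if the app exports them:

```promql
# Working-set memory per container (cAdvisor metric)
container_memory_working_set_bytes{container="web-app"}

# CPU usage rate over 5 minutes
rate(container_cpu_usage_seconds_total{container="web-app"}[5m])

# p95 request latency (assumes the app exports a latency histogram)
histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))

# Requests per second (assumes an app-exported request counter)
rate(http_requests_total[5m])
```

Two to four weeks of these, through at least one traffic peak, is the minimum dataset for sizing.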

Step 2: Load Test in Isolation

Before touching Kubernetes, load test your application on a single container:

# Run the app in a container with generous limits
docker run -d --name load-test \
  --memory 4g \
  --cpus 2 \
  -p 8080:8080 \
  your-app:latest

# Monitor resource usage during load test
docker stats load-test

Use a load testing tool to simulate realistic traffic:

# Example with k6
k6 run --vus 100 --duration 10m load-test.js
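The load-test.js referenced above might look like this minimal sketch. The base URL and endpoint are placeholders; a realistic script would hit each request type in proportion to production traffic, with extra weight on heavy paths like checkout:

```javascript
// Minimal k6 script (load-test.js). URL and endpoint are placeholders.
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  vus: 100,
  duration: '10m',
};

export default function () {
  const res = http.get('http://localhost:8080/category/shoes'); // placeholder endpoint
  check(res, { 'status is 200': (r) => r.status === 200 });
  sleep(1);
}
```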

Record:

  • Memory high-water mark
  • CPU utilization percentage
  • Where performance degrades

Step 3: Calculate Requests and Limits

Based on your load test data, calculate proper resource specifications:

Memory:

requests.memory = p95 memory usage + 20% buffer
limits.memory = peak memory usage + 30% buffer

CPU:

requests.cpu = average CPU under expected load
limits.cpu = peak CPU during burst (or omit to allow bursting)
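These formulas are trivial to encode. A quick sketch, plugging in the numbers the client later measured (p95 around 800MB, checkout peak around 1.5GB, average CPU around 500m):

```python
import math

# Encode the sizing rules above: requests from typical usage plus a buffer,
# limits from peak usage plus a buffer. Values in MiB / millicores.

def memory_sizing(p95_mb: float, peak_mb: float) -> dict:
    return {
        "requests_mi": math.ceil(p95_mb * 1.2),  # p95 + 20% buffer
        "limits_mi": math.ceil(peak_mb * 1.3),   # peak + 30% buffer
    }

def cpu_sizing(avg_millicores: int) -> dict:
    # requests = average under expected load; no CPU limit (allow bursting)
    return {"requests_m": avg_millicores, "limits_m": None}

print(memory_sizing(p95_mb=800, peak_mb=1500))  # → {'requests_mi': 960, 'limits_mi': 1950}
print(cpu_sizing(500))                          # → {'requests_m': 500, 'limits_m': None}
```

Rounded to scheduler-friendly values, that is the 1Gi request / 2Gi limit shown below.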

For the e-commerce client, proper sizing would have been:

# What they SHOULD have deployed
resources:
  requests:
    memory: "1Gi"      # p95 was ~800MB
    cpu: "500m"        # Average under load
  limits:
    memory: "2Gi"      # Peak during checkout was ~1.5GB
    # No CPU limit - allow bursting

Note: I’m a proponent of not setting CPU limits. CPU is compressible — if you run out, you get throttled. Memory isn’t — if you run out, you get killed. Many experienced Kubernetes practitioners recommend against CPU limits in most cases, and the throttling in this story is a good illustration of why.

Step 4: Use Vertical Pod Autoscaler for Rightsizing

The Vertical Pod Autoscaler (VPA) observes actual resource usage and recommends (or automatically applies) better resource specifications.

Install VPA:

# Clone the autoscaler repo
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler

# Install VPA components
./hack/vpa-up.sh

Create a VPA resource in “Off” mode to get recommendations without auto-applying:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Off"  # Just recommend, don't apply

After running for a few days, check recommendations:

kubectl describe vpa web-app-vpa

Output:

Recommendation:
  Container Recommendations:
    Container Name: web-app
    Lower Bound:
      Cpu:     200m
      Memory:  800Mi
    Target:
      Cpu:     450m
      Memory:  1200Mi
    Upper Bound:
      Cpu:     800m
      Memory:  2Gi

The “Target” values are VPA’s recommended requests. Use these as your baseline, then add appropriate headroom for limits.

Step 5: HPA + VPA Together (Carefully)

HPA and VPA can conflict. HPA scales horizontally based on metrics; VPA changes resource requests. If VPA increases requests, HPA might think pods are underutilized and scale down.

The safe approach:

  1. Use VPA in “Off” mode to get recommendations
  2. Apply recommendations manually as your baseline
  3. Use HPA for horizontal scaling based on custom metrics (not just CPU)
  4. Re-evaluate periodically with VPA recommendations

For the e-commerce client, a better HPA configuration would have been (note that a custom metric like http_requests_per_second only works if a metrics adapter, such as prometheus-adapter, exposes it to the HPA):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 8    # Higher minimum for Black Friday
  maxReplicas: 50   # More headroom
  metrics:
  - type: Resource
    resource:
      name: memory  # Scale on memory, not just CPU
      target:
        type: Utilization
        averageUtilization: 70
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second  # Custom metric
      target:
        type: AverageValue
        averageValue: "100"

The Checklist: Before You Migrate

Before any Kubernetes migration, answer these questions with data:

Question → Data source

  • What’s the application’s memory baseline? → Monitoring (Prometheus, Datadog, etc.)
  • What’s peak memory under load? → Load testing
  • Is the app CPU-bound or memory-bound? → Load testing + profiling
  • What’s p99 latency at expected load? → Load testing
  • What’s the traffic pattern (daily, weekly, seasonal)? → Historical metrics
  • What’s the highest traffic spike in the past year? → Historical metrics
  • How does the app behave at 2x, 5x, 10x load? → Load testing

If you can’t answer these questions, you’re not ready to migrate.

What They Did Differently the Second Time

After the Black Friday disaster, the client finally agreed to the expensive emergency hosting option they’d been trying to avoid. Three times the cost, but it bought them time. My colleague used December to do the migration properly:

  1. Instrumented everything. Prometheus, application metrics, custom metrics for checkout flow.

  2. Load tested extensively. They simulated Black Friday traffic patterns before the migration.

  3. Started with generous resources. Overprovisioned initially, then used VPA to rightsize.

  4. Validated with canary deployments. Ran new configuration alongside old, comparing behavior.

  5. Kept a rollback option. The emergency hosting stayed available as fallback for 30 days post-migration.

The late December migration — after the Christmas rush — was boring. Nothing broke. That’s the goal.

Conclusion

Kubernetes is powerful, but it’s not magic. It won’t fix your scaling problems if you don’t understand your application’s resource requirements. And you can’t understand those requirements without data.

The migration playbook:

  1. Instrument first. Get at least 2-4 weeks of metrics before migration planning.
  2. Load test thoroughly. Simulate your worst-case scenarios.
  3. Size conservatively. Overprovision initially, then rightsize with data.
  4. Use VPA for insights. Even if you don’t auto-apply, the recommendations are valuable.
  5. Keep a rollback plan. Until you’ve proven the new setup handles peak load.

Don’t be the engineer who deploys to Kubernetes two weeks before Black Friday with guessed resource limits. My colleague learned that lesson the hard way — and I’m grateful he shared the story so others don’t have to.

Take the time to do it right. Your future self — and your on-call rotation — will thank you.

