Picture this: You’re monitoring your production Kubernetes cluster when suddenly pods start dying left and right, showing that dreaded OOMKilled status. To make it more confusing, your nodes appear to have plenty of free memory. Sound familiar?

OOMKilled errors remain one of the most frustrating issues in Kubernetes environments, catching even experienced teams off guard. I’ve seen entire services go down because of poorly configured memory limits, and frankly, it’s a problem that keeps happening because the underlying mechanics aren’t well understood.

In this guide, we’ll cut through the confusion and give you practical, battle-tested strategies to prevent and resolve OOMKilled errors.

 

Understanding the OOMKilled Phenomenon

Let’s start with what actually happens when you see that OOMKilled status. Contrary to popular belief, Kubernetes doesn’t kill your pods directly. Instead, it’s the Linux kernel’s OOM Killer that does the dirty work.

Here’s the actual sequence of events:

  1. Container exceeds its memory limit
  2. Linux kernel detects memory pressure
  3. OOM Killer sends SIGKILL (signal 9) to the process
  4. Kubelet detects the termination
  5. Pod status updates to “OOMKilled”

The telltale sign is Exit Code 137 (128 + 9), which means the process was terminated by SIGKILL. Combined with the OOMKilled reason in the pod status, it confirms an OOM kill rather than some other forced termination.

Quick Diagnosis Commands

# Check pod status and recent events
kubectl describe pod <pod-name> -n <namespace>

# Look for the smoking gun in events
kubectl get events -n <namespace> --sort-by='.lastTimestamp' | grep OOMKilled

# Check current resource usage
kubectl top pods -n <namespace> --sort-by=memory
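
To confirm it really was an OOM kill, you can also read the last termination state straight from the pod status (standard kubectl JSONPath queries; substitute your own pod and namespace):

# Exit code of the last terminated container (137 = killed by SIGKILL)
kubectl get pod <pod-name> -n <namespace> \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}{"\n"}'

# Recorded termination reason (expect "OOMKilled")
kubectl get pod <pod-name> -n <namespace> \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}{"\n"}'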

 

 

Root Causes: Why Pods Get OOMKilled

Based on real-world incidents I’ve investigated, OOMKilled errors typically stem from five main causes:

1. Misconfigured Memory Limits

This is the big one. Developers often set memory limits based on guesswork rather than actual usage patterns.

Real Example: A fintech company’s fraud detection system was configured with a 2Gi memory limit, but during peak hours, it actually needed 3.5Gi. Result? Constant OOMKills during business hours.

2. Memory Leaks in Application Code

Memory leaks cause gradual memory consumption growth until the container hits its limit. Different languages have different leak patterns:

  • Java: Older JVM defaults size the heap from host memory rather than the container limit
  • Go: Much improved since Go 1.19 introduced GOMEMLIMIT, but the limit still has to be set explicitly (see the snippet after this list)
  • Node.js: Listeners and closures kept alive across event-loop cycles
  • Python: Circular references and unclosed resources
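
For Go services specifically, one option is to wire GOMEMLIMIT to the container’s memory limit through the Downward API. This is a minimal sketch (the env block goes inside your container spec); note that it maps the soft limit to 100% of the hard limit, so some teams hard-code a slightly lower value instead to keep headroom:

env:
- name: GOMEMLIMIT
  valueFrom:
    resourceFieldRef:
      resource: limits.memory   # injected as a plain byte count, which the Go runtime accepts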

3. Traffic Spikes

Applications that handle variable workloads can experience sudden memory spikes during traffic bursts.

Real Example: An e-commerce checkout service crashed during a flash sale when user traffic increased 10x, causing memory usage to spike beyond configured limits.

4. Node-Level Memory Pressure

Even well-configured pods can get killed when the entire node runs out of memory. Kubernetes follows a priority system based on Quality of Service (QoS) classes:

QoS Class    Priority   Description
BestEffort   Lowest     No resource requests/limits set
Burstable    Medium     Requests < limits
Guaranteed   Highest    Requests = limits

5. Resource Overcommitment

This happens when the sum of all pod memory requests exceeds node capacity, or when pods burst beyond their requests simultaneously.
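
A quick way to spot overcommitment is to compare each node’s allocatable memory with the requests already scheduled onto it (standard kubectl output; formatting may vary slightly between versions):

# Allocatable capacity vs. summed requests/limits on a node
kubectl describe node <node-name> | grep -A 8 "Allocated resources"

# Allocatable memory across all nodes at a glance
kubectl get nodes -o custom-columns=NAME:.metadata.name,ALLOCATABLE_MEM:.status.allocatable.memory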

 

 

Memory Management Best Practices

The Golden Rule: Memory Limit = Memory Request

In 2025, the best practice is setting memory limit = memory request. This might surprise you, especially since we recommend NOT setting CPU limits at all.

Here’s why memory is different:

apiVersion: v1
kind: Pod
spec:
  containers:
  - name: my-app
    resources:
      requests:
        memory: "2Gi"
        cpu: "500m"
      limits:
        memory: "2Gi"  # Same as request
        # CPU limit intentionally omitted

Think of it like a pizza party analogy: If each guest orders 2 slices but you allow them to eat up to 4 slices, you’ll run out of pizza mid-party. Memory works similarly—when actual usage exceeds requests, unpredictable situations arise.

Understanding QoS Classes

Kubernetes automatically assigns QoS classes based on your resource configuration:

# Guaranteed QoS - Highest priority
resources:
  requests:
    memory: "1Gi"
    cpu: "500m"
  limits:
    memory: "1Gi"
    cpu: "500m"

# Burstable QoS - Medium priority  
resources:
  requests:
    memory: "512Mi"
  limits:
    memory: "1Gi"

# BestEffort QoS - Lowest priority
# No requests or limits specified
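
You can verify which class Kubernetes actually assigned by reading the pod’s status field:

kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.status.qosClass}{"\n"}'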

Memory QoS Feature (Kubernetes 1.27+)

The Memory Quality of Service feature, an alpha capability behind the MemoryQoS feature gate (first added in v1.22 and reworked in v1.27), provides finer control over memory management using cgroups v2:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    # Opt out of Memory QoS per pod if needed
    qos.memory.kubernetes.io/disabled: "true"
spec:
  containers:
  - name: my-app
    resources:
      requests:
        memory: "1Gi"
      limits:
        memory: "2Gi"

 

 

Monitoring and Observability Stack

Effective memory management starts with proper monitoring. Here’s the battle-tested stack that works in production:

Setting Up Prometheus + Grafana

# Install kube-prometheus-stack with Helm
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  --set prometheus.prometheusSpec.retention=30d \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi
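
Once the chart is running, you can reach Grafana through a port-forward (the service and secret names below assume the release name monitoring used above):

# Access Grafana at http://localhost:3000
kubectl port-forward svc/monitoring-grafana 3000:80 -n monitoring

# Retrieve the generated admin password
kubectl get secret monitoring-grafana -n monitoring \
  -o jsonpath='{.data.admin-password}' | base64 -d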

Essential Memory Metrics

Container-level metrics:

  • container_memory_working_set_bytes: The metric the kubelet uses for eviction decisions and the closest proxy for OOM-kill risk
  • container_memory_usage_bytes: Current memory usage
  • container_spec_memory_limit_bytes: Configured memory limit

Pod-level metrics:

  • kube_pod_container_resource_limits: Resource limit values
  • kube_pod_container_resource_requests: Resource request values
  • kube_pod_status_phase: Pod status information

Production-Ready PromQL Queries

Memory utilization monitoring:

# Memory usage percentage per pod
100 * (
  container_memory_working_set_bytes{job="kubelet", container!=""}
  /
  container_spec_memory_limit_bytes{job="kubelet", container!=""}
)

# Total memory usage by namespace
sum by (namespace) (container_memory_working_set_bytes{job="kubelet", container!=""})

# Pods using more than 80% of memory limit
(
  container_memory_working_set_bytes{job="kubelet", container!=""}
  /
  container_spec_memory_limit_bytes{job="kubelet", container!=""}
) > 0.8

OOMKilled detection:

# OOMKilled restarts in the last hour (the restart counter has no "reason" label,
# so join it with the last-terminated-reason metric from kube-state-metrics)
increase(kube_pod_container_status_restarts_total[1h])
  * on (namespace, pod, container) group_left
  kube_pod_container_status_last_terminated_reason{reason="OOMKilled"}

# Containers whose most recent termination was an OOM kill
kube_pod_container_status_last_terminated_reason{reason="OOMKilled"} == 1

Grafana Dashboard Configuration

Create panels for comprehensive memory monitoring:

Panel Type    Query Focus                Purpose
Time Series   Memory usage trends        Pattern analysis
Stat          Current memory usage       Real-time status
Table         OOMKilled pod list         Problem identification
Heatmap       Node memory distribution   Resource balance check

 

 

Practical Problem-Solving Scenarios

Let me walk you through real scenarios and their solutions:

Scenario 1: Data Processing Application with Memory Spikes

Problem: Large file processing causes memory spikes leading to OOMKills

Solution Strategy:

# Before: Undersized resources
resources:
  requests:
    memory: "1Gi"
  limits:
    memory: "2Gi"

# After: Properly sized with room for spikes
resources:
  requests:
    memory: "4Gi"
  limits:
    memory: "4Gi"

# Additional: Switch from memory-backed to disk-backed storage
volumes:
- name: temp-storage
  emptyDir:
    medium: ""  # Use disk instead of memory
    sizeLimit: 10Gi

Scenario 2: Java Application Heap Management

Problem: JVM doesn’t properly recognize container memory limits

Solution:

env:
- name: JAVA_OPTS
  value: "-XX:+UseContainerSupport -XX:MaxRAMPercentage=75.0 -XX:+UseG1GC"
resources:
  requests:
    memory: "2Gi"
  limits:
    memory: "2Gi"

Why this works:

  • UseContainerSupport: Makes JVM aware of container limits
  • MaxRAMPercentage=75.0: Leaves 25% headroom for non-heap memory
  • UseG1GC: Better garbage collection for containerized environments
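
One caveat: JAVA_OPTS only takes effect if the image’s entrypoint script actually appends it to the java command line. If you are unsure whether it does, JDK 9+ launchers read JDK_JAVA_OPTIONS on their own, so a hedged alternative is:

env:
- name: JDK_JAVA_OPTIONS   # picked up automatically by the java launcher since JDK 9
  value: "-XX:MaxRAMPercentage=75.0 -XX:+UseG1GC"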

Scenario 3: Node.js Memory Leak

Problem: Gradual memory increase over time

Solution:

env:
- name: NODE_OPTIONS
  value: "--max-old-space-size=1536"  # Limit to 1.5GB
resources:
  requests:
    memory: "2Gi"
  limits:
    memory: "2Gi"

# Add graceful shutdown handling
lifecycle:
  preStop:
    exec:
      command: ["/bin/sh", "-c", "sleep 15"]

Implementing Vertical Pod Autoscaler (VPA)

VPA can automatically adjust resource limits based on actual usage:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"  # Automatically restart pods with new resources
  resourcePolicy:
    containerPolicies:
    - containerName: my-app
      minAllowed:
        memory: "100Mi"
      maxAllowed:
        memory: "8Gi"
      controlledResources: ["memory"]
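
Before trusting Auto mode, it helps to watch the recommendations for a while (assuming the VPA components are installed in the cluster):

# Inspect the current recommendation (target, lower bound, upper bound)
kubectl describe vpa my-app-vpa

# Or pull just the recommendation out of the status
kubectl get vpa my-app-vpa -o jsonpath='{.status.recommendation.containerRecommendations}'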

 

 

Advanced Memory Profiling Techniques

Go Application Profiling

Add pprof endpoints to your Go applications:

import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
)

// Serve pprof on a side port; reach it later with kubectl port-forward
go func() {
    log.Println(http.ListenAndServe("localhost:6060", nil))
}()

Then profile in production:

# Port-forward to the pod
kubectl port-forward pod/<pod-name> 6060:6060 &

# Collect heap profile
go tool pprof http://localhost:6060/debug/pprof/heap

# Analyze memory usage patterns
(pprof) top
(pprof) list <function-name>

eBPF-Based Continuous Profiling

For production environments, eBPF provides continuous profiling with minimal overhead:

# Deploy continuous profiling with Parca
apiVersion: v1
kind: ConfigMap
metadata:
  name: parca-config
data:
  parca.yaml: |
    scrape_configs:
    - job_name: 'kubernetes-pods'
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_profiles_grafana_com_memory_scrape]
        action: keep
        regex: true

 

 

OOMKilled Error Prevention Strategies

Development Phase Best Practices

1. Local Memory Testing

# Monitor Docker container memory usage during development
docker stats <container-id>

# Use load testing with memory tracking
for i in {1..100}; do
  echo "Test iteration $i"
  kubectl top pod <pod-name> >> memory-usage.log
  sleep 30
done

2. Proper Resource Estimation

# Start with generous limits, then optimize
resources:
  requests:
    memory: "2Gi"  # Based on observed usage + 50% buffer
  limits:
    memory: "2Gi"
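
To turn “observed usage” into an actual number, one approach is to query the peak working set over a representative window in Prometheus (the namespace label here is illustrative):

# Peak working-set memory per pod over the last 7 days
max by (pod) (
  max_over_time(container_memory_working_set_bytes{namespace="my-namespace", container!=""}[7d])
)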

Production Monitoring Setup

Proactive Alerting:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: memory-alerts
spec:
  groups:
  - name: memory.rules
    rules:
    - alert: HighMemoryUsage
      expr: |
        (
          container_memory_working_set_bytes{job="kubelet", container!=""}
          /
          container_spec_memory_limit_bytes{job="kubelet", container!=""}
        ) > 0.8
      for: 2m
      labels:
        severity: warning
      annotations:
        summary: "Container memory usage exceeds 80%"
        description: "{{ $labels.namespace }}/{{ $labels.pod }} memory usage is {{ $value | humanizePercentage }}"
    
    - alert: OOMKilledDetected
      expr: |
        (
          increase(kube_pod_container_status_restarts_total[5m])
          * on (namespace, pod, container) group_left
          kube_pod_container_status_last_terminated_reason{reason="OOMKilled"}
        ) > 0
      labels:
        severity: critical
      annotations:
        summary: "OOMKilled event detected"
        description: "{{ $labels.namespace }}/{{ $labels.pod }} was OOMKilled"

Memory-Based Horizontal Pod Autoscaling:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: memory-based-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70  # Scale at 70% memory usage

Infrastructure Optimization

Node Resource Reservations:

# Configure kubelet to reserve system resources
--system-reserved=memory=1Gi,cpu=500m
--kube-reserved=memory=500Mi,cpu=500m
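
If your nodes use a kubelet config file instead of flags, the equivalent settings look roughly like this (a sketch using the standard KubeletConfiguration API; the eviction threshold is an illustrative addition):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
systemReserved:
  memory: "1Gi"
  cpu: "500m"
kubeReserved:
  memory: "500Mi"
  cpu: "500m"
evictionHard:
  memory.available: "200Mi"   # evict pods before the node itself runs out of memory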

Pod Priority and Preemption:

# High-priority workloads get better protection
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000
description: "High priority for critical applications"

---
apiVersion: v1
kind: Pod
spec:
  priorityClassName: high-priority
  containers:
  - name: critical-app
    resources:
      requests:
        memory: "2Gi"
      limits:
        memory: "2Gi"

 

 

OOMKilled errors don’t have to be the bane of your Kubernetes operations. With proper understanding, monitoring, and proactive management, you can minimize their impact and often prevent them entirely.

The key takeaways:

  • Set memory limits equal to requests for predictable behavior
  • Monitor continuously with Prometheus and Grafana
  • Profile applications to understand real memory usage patterns
  • Implement proactive alerts before problems occur
  • Use VPA and HPA for dynamic resource management

 

The landscape of Kubernetes memory management continues to evolve with features like Memory QoS and improved container runtime integration. Stay current with these developments, and your clusters will be more stable and efficient.

 
