Kubernetes manages CPU and memory resources for containers using two primary configurations: requests and limits, which influence both scheduling and runtime enforcement.
1. CPU Management
CPU Requests
- Represent the minimum guaranteed CPU a container can access.
- The kube-scheduler uses requests to place Pods on nodes with sufficient available CPU.
- CPU is compressible, meaning a container can temporarily exceed its request if spare cycles exist.
CPU Limits
- Define the maximum CPU a container may consume.
- Enforced via Linux cgroups using the Completely Fair Scheduler (CFS):
- `cpu.cfs_quota_us` specifies the microseconds of CPU time per scheduling period (`cpu.cfs_period_us`) a container may consume.
- Exceeding the limit triggers CPU throttling, slowing the container without killing it.
- Behavioral nuances:
- Throttling is applied per CFS period: once a container exhausts its quota within a period, it is paused until the next period begins, even if the node has idle CPU.
- Unused CPU is shared among competing containers in proportion to their requests, which are translated into cgroup CPU weights (shares).
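The CFS translation above amounts to simple arithmetic: a CPU limit in millicores maps to a time quota per scheduling period. The sketch below is illustrative (the real conversion happens inside the container runtime, and the function name is made up):

```python
# Sketch: how a CPU limit in millicores maps to a CFS quota.
# Illustrative only; the kubelet/runtime performs the real conversion.

CFS_PERIOD_US = 100_000  # default cpu.cfs_period_us (100 ms)

def cfs_quota_us(cpu_limit_millicores: int, period_us: int = CFS_PERIOD_US) -> int:
    """Microseconds of CPU time the container may consume per period."""
    return cpu_limit_millicores * period_us // 1000

# A 500m limit (half a core) yields 50,000 us of CPU time per 100,000 us period:
print(cfs_quota_us(500))    # 50000
# A 2-core limit may consume two full cores' worth of time each period:
print(cfs_quota_us(2000))   # 200000
```

Once the container accumulates that much CPU time within a period, the kernel pauses it until the next period starts.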
2. Memory Management
Memory Requests
- Indicate the minimum memory guaranteed to a container.
- The scheduler places a Pod only on a node whose allocatable memory covers the Pod's requested memory (based on the sum of existing requests, not instantaneous free memory).
Memory Limits
- Specify the maximum memory a container can use.
- Memory is incompressible: a container that exceeds its limit is OOM-killed (Out-of-Memory) by the Linux kernel.
- If the container is terminated and the Pod's `restartPolicy` allows, Kubernetes restarts it.
- Unlike CPU, exceeding a memory limit results in immediate termination rather than throttling.
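As a concrete example, both kinds of requests and limits are declared under `resources` in the Pod spec (image and values here are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo
spec:
  containers:
  - name: app
    image: nginx            # illustrative image
    resources:
      requests:
        cpu: "250m"         # guaranteed minimum: a quarter core
        memory: "128Mi"     # counted by the scheduler at placement time
      limits:
        cpu: "500m"         # throttled above half a core
        memory: "256Mi"     # OOM-killed above this
```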
Notes on Enforcement
- Cgroup memory limits are enforced immediately by the kernel; node-level eviction, by contrast, is reactive, occurring only when the kubelet detects memory pressure.
- Kubernetes assigns each Pod a QoS class (Guaranteed, Burstable, BestEffort) that influences OOM-kill and eviction priority:
- Guaranteed: requests = limits for every container (CPU and memory); least likely to be killed.
- Burstable: at least one request set, with requests < limits; may be killed under pressure.
- BestEffort: no requests or limits; first to be evicted when memory is scarce.
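The QoS assignment above can be approximated with a small helper. This is a simplified sketch for a single-container Pod (the real rules also handle multiple containers and the defaulting of unset requests to limits):

```python
def qos_class(requests: dict, limits: dict) -> str:
    """Simplified QoS classification for a single-container Pod.
    requests/limits map a resource name ("cpu"/"memory") to a quantity string.
    Note: real Kubernetes defaults an unset request to the limit; omitted here."""
    if not requests and not limits:
        return "BestEffort"
    # Guaranteed: CPU and memory limits both set, and requests equal limits.
    if (set(limits) >= {"cpu", "memory"}
            and all(requests.get(r) == limits[r] for r in limits)):
        return "Guaranteed"
    return "Burstable"

print(qos_class({}, {}))                                  # BestEffort
print(qos_class({"cpu": "500m", "memory": "256Mi"},
                {"cpu": "500m", "memory": "256Mi"}))      # Guaranteed
print(qos_class({"cpu": "250m"}, {"cpu": "500m"}))        # Burstable
```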
3. Pod-Level Resource Aggregation
- A Pod’s total CPU or memory requests/limits are the sum of its containers’ configurations.
- Newer Kubernetes releases also support Pod-level resource requests/limits (a `resources` field on the Pod spec itself), which let containers within a Pod share a common budget and borrow idle capacity from one another.
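The aggregation rule can be shown with a short sketch (millicore arithmetic only; an illustration, not the scheduler's actual code):

```python
def pod_cpu_request_millicores(containers: list[dict]) -> int:
    """Sum per-container CPU requests (millicores) into the Pod total
    the scheduler uses for placement. Missing requests count as 0."""
    return sum(c.get("cpu_request_m", 0) for c in containers)

containers = [
    {"name": "app", "cpu_request_m": 250},
    {"name": "sidecar", "cpu_request_m": 100},
]
print(pod_cpu_request_millicores(containers))  # 350
```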
4. Best Practices
- Always set requests and limits to ensure scheduling stability and prevent runaway consumption.
- For CPU, prioritize requests over hard limits to avoid performance throttling in bursty workloads.
- For memory, set realistic limits above expected peak usage to reduce OOM kills.
- Use QoS classes, HPA (Horizontal Pod Autoscaler), and VPA (Vertical Pod Autoscaler) to dynamically adjust resources.
- Monitor CPU and memory with tools like `kubectl top pod`, Prometheus metrics (`container_cpu_cfs_throttled_seconds_total`, `container_memory_usage_bytes`), and node PSI (Pressure Stall Information).
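For the Prometheus metrics mentioned above, throttling and memory usage are typically inspected with queries along these lines (the `namespace="prod"` selector is illustrative):

```promql
# Rate at which containers in a namespace are being CFS-throttled
rate(container_cpu_cfs_throttled_seconds_total{namespace="prod"}[5m])

# Current memory usage per container, to compare against limits
container_memory_usage_bytes{namespace="prod"}
```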
Key Takeaways:
- Exceeding a CPU limit causes throttling; exceeding a memory limit causes termination.
- Requests ensure Pod placement; limits enforce runtime constraints.
- Pods’ performance and cluster stability depend on setting proper requests and limits, aligned with workload characteristics.