Kubernetes manages CPU and memory resources for containers using two primary configurations: requests and limits, which influence both scheduling and runtime enforcement.
1. CPU Management
CPU Requests
- Represent the minimum guaranteed CPU a container can access.
- The kube-scheduler uses requests to place Pods on nodes with sufficient available CPU.
- CPU is compressible, meaning a container can temporarily exceed its request if spare cycles exist.
CPU Limits
- Define the maximum CPU a container may consume.
- Enforced via Linux cgroups using the Completely Fair Scheduler (CFS):
- `cpu.cfs_quota_us` specifies the microseconds of CPU time per scheduling period (`cpu.cfs_period_us`) a container may consume.
- Exceeding the limit triggers CPU throttling, slowing the container without killing it.
- Behavioral nuances:
- Throttling is applied per CFS period: once a container exhausts its quota within a period, it is paused until the next period begins, even if the node has idle CPU.
- Unused CPU is shared among competing containers in proportion to their requests, which are translated into cgroup CPU weights (shares).
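The CFS translation above amounts to simple arithmetic: a CPU limit in millicores maps to a time quota per scheduling period. The sketch below is illustrative (the real conversion happens inside the container runtime, and the function name is made up):

```python
# Sketch: how a CPU limit in millicores maps to a CFS quota.
# Illustrative only; the kubelet/runtime performs the real conversion.

CFS_PERIOD_US = 100_000  # default cpu.cfs_period_us (100 ms)

def cfs_quota_us(cpu_limit_millicores: int, period_us: int = CFS_PERIOD_US) -> int:
    """Microseconds of CPU time the container may consume per period."""
    return cpu_limit_millicores * period_us // 1000

# A 500m limit (half a core) yields 50,000 us of CPU time per 100,000 us period:
print(cfs_quota_us(500))    # 50000
# A 2-core limit may consume two full cores' worth of time each period:
print(cfs_quota_us(2000))   # 200000
```

Once the container accumulates that much CPU time within a period, the kernel pauses it until the next period starts.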
2. Memory Management
Memory Requests
- Indicate the minimum memory guaranteed to a container.
- The scheduler places a Pod only on a node whose allocatable memory covers the Pod's requested memory (based on the sum of existing requests, not instantaneous free memory).
Memory Limits
- Specify the maximum memory a container can use.
- Memory is incompressible: a container that exceeds its limit is OOM-killed (Out-of-Memory) by the Linux kernel.
- If the container is terminated and the Pod's `restartPolicy` allows, Kubernetes restarts it.
- Unlike CPU, exceeding a memory limit results in immediate termination rather than throttling.
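As a concrete example, both kinds of requests and limits are declared under `resources` in the Pod spec (image and values here are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo
spec:
  containers:
  - name: app
    image: nginx            # illustrative image
    resources:
      requests:
        cpu: "250m"         # guaranteed minimum: a quarter core
        memory: "128Mi"     # counted by the scheduler at placement time
      limits:
        cpu: "500m"         # throttled above half a core
        memory: "256Mi"     # OOM-killed above this
```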
Notes on Enforcement
- Cgroup memory limits are enforced immediately by the kernel; node-level eviction, by contrast, is reactive, occurring only when the kubelet detects memory pressure.
- Kubernetes assigns each Pod a QoS class (Guaranteed, Burstable, BestEffort) that influences OOM-kill and eviction priority:
- Guaranteed: requests = limits for every container (CPU and memory); least likely to be killed.
- Burstable: at least one request set, with requests < limits; may be killed under pressure.
- BestEffort: no requests or limits; first to be evicted when memory is scarce.
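The QoS assignment above can be approximated with a small helper. This is a simplified sketch for a single-container Pod (the real rules also handle multiple containers and the defaulting of unset requests to limits):

```python
def qos_class(requests: dict, limits: dict) -> str:
    """Simplified QoS classification for a single-container Pod.
    requests/limits map a resource name ("cpu"/"memory") to a quantity string.
    Note: real Kubernetes defaults an unset request to the limit; omitted here."""
    if not requests and not limits:
        return "BestEffort"
    # Guaranteed: CPU and memory limits both set, and requests equal limits.
    if (set(limits) >= {"cpu", "memory"}
            and all(requests.get(r) == limits[r] for r in limits)):
        return "Guaranteed"
    return "Burstable"

print(qos_class({}, {}))                                  # BestEffort
print(qos_class({"cpu": "500m", "memory": "256Mi"},
                {"cpu": "500m", "memory": "256Mi"}))      # Guaranteed
print(qos_class({"cpu": "250m"}, {"cpu": "500m"}))        # Burstable
```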
3. Pod-Level Resource Aggregation
- A Pod’s total CPU or memory requests/limits are the sum of its containers’ configurations.
- Newer Kubernetes releases also support Pod-level resource requests/limits (a `resources` field on the Pod spec itself), which let containers within a Pod share a common budget and borrow idle capacity from one another.
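The aggregation rule can be shown with a short sketch (millicore arithmetic only; an illustration, not the scheduler's actual code):

```python
def pod_cpu_request_millicores(containers: list[dict]) -> int:
    """Sum per-container CPU requests (millicores) into the Pod total
    the scheduler uses for placement. Missing requests count as 0."""
    return sum(c.get("cpu_request_m", 0) for c in containers)

containers = [
    {"name": "app", "cpu_request_m": 250},
    {"name": "sidecar", "cpu_request_m": 100},
]
print(pod_cpu_request_millicores(containers))  # 350
```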
4. Best Practices
- Always set requests and limits to ensure scheduling stability and prevent runaway consumption.
- For CPU, prioritize requests over hard limits to avoid performance throttling in bursty workloads.
- For memory, set realistic limits above expected peak usage to reduce OOM kills.
- Use QoS classes, HPA (Horizontal Pod Autoscaler), and VPA (Vertical Pod Autoscaler) to dynamically adjust resources.
- Monitor CPU and memory with tools like `kubectl top pod`, Prometheus metrics (`container_cpu_cfs_throttled_seconds_total`, `container_memory_usage_bytes`), and node PSI (Pressure Stall Information).
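For the Prometheus metrics mentioned above, throttling and memory usage are typically inspected with queries along these lines (the `namespace="prod"` selector is illustrative):

```promql
# Rate at which containers in a namespace are being CFS-throttled
rate(container_cpu_cfs_throttled_seconds_total{namespace="prod"}[5m])

# Current memory usage per container, to compare against limits
container_memory_usage_bytes{namespace="prod"}
```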
Key Takeaways:
- Exceeding a CPU limit causes throttling; exceeding a memory limit causes termination.
- Requests ensure Pod placement; limits enforce runtime constraints.
- Pods’ performance and cluster stability depend on setting proper requests and limits, aligned with workload characteristics.