Sometime back, one of my client contacts reported frequent restarts of his application deployed on Kubernetes. He could not understand why the POD reported a status of OOMKilled despite the node having plenty of free memory.
I’m sure many Kubernetes users have faced this issue and may already know the reason. If you are unfamiliar with it, please read on.
Kubernetes provides a way to specify minimum and maximum resource requirements for containers. Here is an example of a POD specifying minimum (50M) and maximum (200M) memory requirements.
```yaml
spec:
  containers:
  - name: memory-demo
    image: polinux/stress
    resources:
      limits:
        memory: "200M"
      requests:
        memory: "50M"
```
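A note on the suffix: in Kubernetes resource quantities, "200M" means 200 × 10^6 bytes, while "200Mi" would mean 200 × 2^20 bytes. The conversion can be sketched in a few lines of Python (this helper is my own, covering only a few common suffixes, not the full Kubernetes quantity grammar):

```python
# Decimal (SI) and binary suffixes recognised by Kubernetes quantities.
SUFFIXES = {"K": 10**3, "M": 10**6, "G": 10**9,
            "Ki": 2**10, "Mi": 2**20, "Gi": 2**30}

def quantity_to_bytes(q):
    """Convert a memory quantity like '200M' or '50Mi' to bytes."""
    # Check two-character suffixes (Mi) before one-character ones (M).
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if q.endswith(suffix):
            return int(q[:-len(suffix)]) * SUFFIXES[suffix]
    return int(q)  # no suffix: plain bytes

print(quantity_to_bytes("200M"))   # 200000000
print(quantity_to_bytes("50Mi"))   # 52428800
```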
Note that you can specify either minimum (requests) or maximum (limits), or both. Head to the official Kubernetes documentation on managing container resources if you want to understand all the available options and nuances.
When the POD has a memory ‘limit’ (maximum) defined and its memory usage crosses that limit, the POD gets killed and its status is reported as OOMKilled. Note that this happens even when the node has enough free memory.
In order to understand this behaviour, we need to look at the Linux “OOM Killer”. Its purpose is to free up memory when the system is in an out-of-memory situation, which it does by killing selected processes.
The kernel maintains an oom_score value for each process. The higher the value, the more likely that process and its children are to be killed by the OOM Killer. The kernel also exposes a variable, oom_score_adj, to tweak the oom_score, giving some control over the OOM Killer’s process selection.
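On a Linux host you can read both values straight from /proc. A minimal Python sketch (the helper name is my own; it returns None on non-Linux systems or for missing PIDs):

```python
import os

def read_oom_value(pid, name="oom_score"):
    """Read /proc/<pid>/oom_score or /proc/<pid>/oom_score_adj.

    Returns the integer value, or None if the file is unavailable
    (e.g. not running on Linux, or the PID does not exist).
    """
    path = "/proc/%d/%s" % (pid, name)
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None

# Inspect the current process's own values.
print("oom_score:", read_oom_value(os.getpid()))
print("oom_score_adj:", read_oom_value(os.getpid(), "oom_score_adj"))
```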
When a memory limit is set for the POD, Kubernetes sets an oom_score_adj value based on the pod’s quality of service class (Burstable here, since the request is lower than the limit) to ensure the container’s process is a preferred target for the OOM Killer. The oom_score_adj value is calculated as follows:
| Quality of Service | oom_score_adj |
|--------------------|---------------|
| Burstable | min(max(2, 1000 - (1000 * memoryRequestBytes) / machineMemoryCapacityBytes), 999) |
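The Burstable formula can be sketched in a few lines of Python (my own helper, following the published kubelet formula). For example, the 50M request from the earlier spec on a node with 10G of memory yields 995:

```python
def burstable_oom_score_adj(memory_request_bytes, machine_memory_capacity_bytes):
    """oom_score_adj formula for a Burstable pod, per the kubelet's policy."""
    raw = 1000 - (1000 * memory_request_bytes) // machine_memory_capacity_bytes
    # Clamp to the range [2, 999]: a larger request lowers the score,
    # making the process a less likely OOM Killer target.
    return min(max(2, raw), 999)

# 50M request on a 10G node.
print(burstable_oom_score_adj(50 * 10**6, 10 * 10**9))   # 995
```

The intuition: the more memory a pod requests relative to the node’s capacity, the lower its oom_score_adj, so pods that stay within a large request are less likely to be chosen than pods with small requests.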
This is why the application POD gets killed and reports an OOMKilled status.
Let’s see this in action. Fire up your minikube or Docker for Mac with Kubernetes and run the following:
- Create a POD and check the oom_score and oom_score_adj values of the main process
```
$ kubectl create ns memory-demo
$ kubectl create -f https://raw.githubusercontent.com/bpradipt/examples/master/kubeyamls/memory-limit-pod.yaml
$ kubectl get pods memory-demo -o yaml
[..snip..]
  phase: Running
  podIP: 172.17.0.7
  qosClass: Burstable
  startTime: 2018-05-26T05:14:43Z
[..snip..]
$ kubectl exec -it -n memory-demo memory-demo bash
bash-4.3# ps aux
PID   USER     TIME   COMMAND
    1 root       0:00 stress --vm 1 --vm-bytes 150M --vm-hang 1
    7 root     224:00 stress --vm 1 --vm-bytes 150M --vm-hang 1
   20 root       0:00 bash
   26 root       0:00 ps aux
bash-4.3# cat /proc/7/oom_score
1013
bash-4.3# cat /proc/7/oom_score_adj
995
```
- Create a POD to simulate OOM
```
$ kubectl create -f https://raw.githubusercontent.com/bpradipt/examples/master/kubeyamls/memory-limit-pod-oom.yaml
pod "memory-demo-oom" created
$ kubectl get pods -n memory-demo
NAME              READY     STATUS              RESTARTS   AGE
memory-demo-oom   0/1       ContainerCreating   0          2s
$ kubectl get pods -n memory-demo
NAME              READY     STATUS      RESTARTS   AGE
memory-demo-oom   0/1       OOMKilled   0          5s
$ kubectl get pods -n memory-demo
NAME              READY     STATUS             RESTARTS   AGE
memory-demo-oom   0/1       CrashLoopBackOff   1          9s
$ kubectl describe pod/memory-demo-oom -n memory-demo
[..snip..]
    State:          Terminated
      Reason:       OOMKilled
      Exit Code:    1
      Started:      Mon, 28 May 2018 04:47:12 +0000
      Finished:     Mon, 28 May 2018 04:47:12 +0000
    Last State:     Terminated
```