Why Does My Kubernetes Application Show OOMKilled Status?

Some time back, one of my client contacts reported frequent restarts of his application deployed on Kubernetes. He was unable to understand why the POD reported an OOMKilled status despite the node having plenty of free memory.

I’m sure many Kubernetes users have faced this issue and may already know the reason for it. If you are unfamiliar with it, please read on :-)

Kubernetes provides a way to specify minimum and maximum resource requirements for containers. Here is an example of a POD spec with minimum (50M) and maximum (200M) memory settings.

spec:
  containers:
  - name: memory-demo
    image: polinux/stress
    resources:
      limits:
        memory: "200M"
      requests:
        memory: "50M"

Note that you can specify just the minimum (requests), just the maximum (limits), or both. Head to the official Kubernetes documentation if you want to understand all the available options and nuances:

https://kubernetes.io/docs/tasks/configure-pod-container/assign-cpu-resource/
https://kubernetes.io/docs/tasks/configure-pod-container/assign-memory-resource
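
Once such a POD is running, you can verify the requests/limits that were applied and the QoS class Kubernetes assigned to it. A minimal sketch, assuming the POD is named memory-demo and sits in the memory-demo namespace (as in the demo later in this post):

$ kubectl get pod -n memory-demo memory-demo -o jsonpath='{.spec.containers[0].resources}'
$ kubectl get pod -n memory-demo memory-demo -o jsonpath='{.status.qosClass}'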

When a POD has a memory ‘limit’ (maximum) defined and its memory usage crosses the specified limit, the POD gets killed, and the status is reported as OOMKilled. Note that this happens even though the node has enough free memory.
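
Under the hood, the memory limit is enforced through the container’s memory cgroup, which is why the kernel can act on the container while the node as a whole still has memory to spare. A quick way to see the enforced value from inside a running container (a sketch, assuming cgroup v1; with cgroup v2 the file is /sys/fs/cgroup/memory.max; replace <pod-name> with your POD’s name):

$ kubectl exec <pod-name> -- cat /sys/fs/cgroup/memory/memory.limit_in_bytes
# prints roughly 200000000 for the 200M limit above (rounded down to a page multiple)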

To understand this behaviour, we need to look at the Linux “OOM Killer”. Its purpose is to free up memory when the system runs into an out-of-memory situation, and it does this by killing selected processes.

The kernel maintains an oom_score value for each process. The higher the value, the greater the likelihood that the process and its children will be killed by the OOM Killer. The kernel provides an additional variable, oom_score_adj, to tweak the oom_score value, thereby giving some control over OOM Killer process selection.
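
You can see both values for any process under /proc. A minimal illustration in a Linux shell (the exact numbers will differ on your machine, and writing oom_score_adj needs root):

$ cat /proc/$$/oom_score                        # badness score of the current shell
$ cat /proc/$$/oom_score_adj                    # adjustment, 0 by default
$ echo 500 | sudo tee /proc/$$/oom_score_adj    # raise the adjustment ...
$ cat /proc/$$/oom_score                        # ... and the score goes up accordingly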

When a memory limit is set for the POD, Kubernetes sets an oom_score_adj value based on the POD’s quality of service class (the example POD above gets the Burstable QoS class) so that the specific container process is more likely to be selected by the OOM Killer. The oom_score_adj value is calculated as follows:

Quality of Service   oom_score_adj
Guaranteed           -998
BestEffort           1000
Burstable            min(max(2, 1000 - (1000 * memoryRequestBytes) / machineMemoryCapacityBytes), 999)

Src: https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/#node-oom-behavior
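
To get a feel for the Burstable formula, here is a quick worked example with hypothetical numbers: a 50M memory request on a node with about 10 GB of memory.

$ request=50000000; capacity=10000000000
$ echo $(( 1000 - (1000 * request) / capacity ))
995

The min/max in the formula clamps the result to the range [2, 999]. In other words, the larger the memory request relative to the node’s capacity, the lower the adjustment and the less likely that container is to be picked by the OOM Killer.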

This is why the application POD gets killed and reports the OOMKilled status.

Let’s see this in action. Fire up your minikube or Docker for Mac with Kubernetes and run the following:

  • Create a POD and check the oom_score and oom_score_adj values of the main process
$ kubectl create ns memory-demo
$ kubectl create -f https://raw.githubusercontent.com/bpradipt/examples/master/kubeyamls/memory-limit-pod.yaml

$ kubectl get pods -n memory-demo memory-demo -o yaml
[..snip..]
  phase: Running
  podIP: 172.17.0.7
  qosClass: Burstable
  startTime: 2018-05-26T05:14:43Z
[..snip..]

$ kubectl exec -it -n memory-demo memory-demo bash
bash-4.3# ps aux
PID   USER     TIME   COMMAND
    1 root       0:00 stress --vm 1 --vm-bytes 150M --vm-hang 1
    7 root     224:00 stress --vm 1 --vm-bytes 150M --vm-hang 1
   20 root       0:00 bash
   26 root       0:00 ps aux
bash-4.3# cat /proc/7/oom_score
1013
bash-4.3# cat /proc/7/oom_score_adj
995
  • Create a POD to simulate OOM
$ kubectl create -f https://raw.githubusercontent.com/bpradipt/examples/master/kubeyamls/memory-limit-pod-oom.yaml
pod "memory-demo-oom" created
$ kubectl get pods -n memory-demo
NAME              READY     STATUS              RESTARTS   AGE
memory-demo-oom   0/1       ContainerCreating   0          2s

$ kubectl get pods -n memory-demo
NAME              READY     STATUS      RESTARTS   AGE
memory-demo-oom   0/1       OOMKilled   0          5s

$ kubectl get pods -n memory-demo
NAME              READY     STATUS             RESTARTS   AGE
memory-demo-oom   0/1       CrashLoopBackOff   1          9s

$ kubectl describe pod/memory-demo-oom -n memory-demo
[..snip..]
    State:          Terminated
      Reason:       OOMKilled
      Exit Code:    1
      Started:      Mon, 28 May 2018 04:47:12 +0000
      Finished:     Mon, 28 May 2018 04:47:12 +0000
    Last State:     Terminated
