How does Kubernetes kubelet resource reservation work?
I recently tried to bring up a Kubernetes cluster in AWS using kops. But when the worker node (Ubuntu 20.04) started, a `docker load` process on it kept getting OOM-killed even though the node had enough memory (~14 GiB). I tracked the issue down to having set kubelet's memory reservation too small (`--kube-reserved=memory=100Mi...`).
So now I have two questions related to the following paragraph in the documentation:
kube-reserved is meant to capture resource reservation for kubernetes system daemons like the kubelet, container runtime, node problem detector, etc.
https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/#kube-reserved
First, I interpreted the "reservation" as "the amount of memory guaranteed", similar to the concept of a pod's `.spec.resources.requests.memory`. However, it seems like the flag acts as a limit as well. Does this mean Kubernetes intends to manage its system daemons with the "Guaranteed" QoS class concept?
Also, my container runtime, Docker, does not seem to be in the /kube-reserved cgroup; instead, it is in /system.slice:
```shell
$ systemctl status $(pgrep dockerd) | grep CGroup
CGroup: /system.slice/docker.service
```
So why is it getting limited by /kube-reserved? It is not even the kubelet talking to Docker through the CRI, but just my manual `docker load` command.
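For reference, a process's cgroup membership can also be read directly from /proc, independently of systemd (the `dockerd` line below assumes the Docker daemon is running on the node):

```shell
# Every process's cgroup membership is listed in /proc/<pid>/cgroup.
# For the current shell:
cat /proc/$$/cgroup

# For dockerd, when the daemon is running:
cat "/proc/$(pgrep -o dockerd)/cgroup"
```

On a systemd-managed node this typically shows a path under /system.slice, matching the `systemctl status` output above.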
Solution 1:
kube-reserved is a way to protect the Kubernetes system daemons (which include the kubelet) from running out of memory should the pods consume too much. How is this achieved? The pods are limited by default to an "allocatable" value, equal to the memory capacity of the node minus several flag values defined in the URL you posted, one of which is kube-reserved. Here's what this looks like for a 7-GiB DS2_v2 node in AKS.
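The arithmetic behind that picture can be sketched as follows. The reservation formula used here (25% of the first 4 GiB plus 20% of the next 4 GiB, with a 750 MiB hard eviction threshold) is the one commonly documented for AKS-sized nodes; treat it as an illustrative assumption, not an exact spec:

```shell
#!/bin/sh
# Sketch of the "allocatable" arithmetic for a 7-GiB (7168 MiB) node.
capacity_mib=7168

# kube-reserved: 25% of the first 4096 MiB + 20% of the remaining 3072 MiB
kube_reserved=$(( 4096 * 25 / 100 + (capacity_mib - 4096) * 20 / 100 ))

# Hard eviction threshold (assumed 750 MiB, the commonly documented default)
eviction_threshold=750

allocatable=$(( capacity_mib - kube_reserved - eviction_threshold ))

echo "kube-reserved: ${kube_reserved} MiB"   # 1638 MiB
echo "allocatable:   ${allocatable} MiB"     # 4780 MiB
```

The 1,638 MiB figure is the same reservation referred to further down, which is what the pods on the node can never touch.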
But it's not always the Kubernetes system daemons that have to be protected from pods or OS components consuming too much memory. It can just as well be the Kubernetes system daemons themselves that consume too much memory and start affecting the pods or other OS components. To protect against this scenario, there's an additional flag defined:
To optionally enforce `kube-reserved` on kubernetes system daemons, specify the parent control group for kube daemons as the value for `--kube-reserved-cgroup` kubelet flag.
With this new flag in place, should the aggregated memory use of the Kubernetes system daemons exceed the cgroup limit, the OOM killer will step in and terminate one of their processes. Applied to the picture above, with the --kube-reserved-cgroup flag specified, the Kubernetes system daemons are prevented from going over 1,638 MiB.
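As a configuration sketch, enforcement involves the flags below; the cgroup name /kube-reserved is an example rather than a requirement, and the exact filesystem path for reading the resulting limit depends on whether the node runs cgroup v1 or v2:

```shell
# Illustrative kubelet flags enabling enforcement of kube-reserved:
#   --kube-reserved=memory=1638Mi
#   --kube-reserved-cgroup=/kube-reserved
#   --enforce-node-allocatable=pods,kube-reserved

# On a cgroup v1 node, the resulting memory limit could then be read with:
cat /sys/fs/cgroup/memory/kube-reserved/memory.limit_in_bytes

# ...and on a cgroup v2 node:
cat /sys/fs/cgroup/kube-reserved/memory.max
```

Note that enforcement only applies to processes actually placed in that cgroup, which is why a daemon living in /system.slice would not normally be constrained by it.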
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Mihai Albert |

