How do I find the memory cgroup limit?
We have a Kubernetes cluster and run Jenkins in it. Jenkins restarts roughly every 48 hours. When we check the logs on the worker node where Jenkins is deployed, we see this error:
Feb 15 14:52:01 myworker kernel: Memory cgroup out of memory: Kill process 110129 (Computer.thread) score 1972 or sacrifice child
Feb 15 14:52:01 myworker kernel: Killed process 50179 (java), UID 1000, total-vm:17378260kB, anon-rss:8371056kB, file-rss:29676kB, shmem-rss:0kB
where 50179 is the Java process for Jenkins.
We set the memory limit for Jenkins in Kubernetes to 8Gi:
resources:
  limits:
    cpu: 3500m
    memory: 8Gi
  requests:
    cpu: "1"
    memory: 4Gi
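To compare the configured limit with the numbers in the kernel log, it helps to express `8Gi` in bytes and kB. The following is a minimal sketch; the helper name `quantity_to_bytes` is hypothetical and it only handles the binary suffixes (`Ki`, `Mi`, `Gi`, `Ti`), not the full Kubernetes quantity syntax.

```python
# Binary (power-of-two) suffixes used by Kubernetes memory quantities.
BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}


def quantity_to_bytes(quantity: str) -> int:
    """Convert a quantity such as '8Gi' to bytes (binary suffixes only)."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: already a plain number of bytes


limit_bytes = quantity_to_bytes("8Gi")
limit_kb = limit_bytes // 1024  # the kernel log reports values in kB
print(limit_bytes, limit_kb)
```

With the limit above this gives 8589934592 bytes, i.e. 8388608 kB, which is the unit used by the `memory: usage ... limit ...` lines in the kernel log.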
I also checked the New Relic alerts that we integrated with our pods; according to them, memory usage never goes beyond 5 GB.
Detailed logs below.
[root@myworker log]# head messages -n376428 | tail -n 40
Feb 15 14:52:01 myworker kernel: [115493] 1000 115493 2059 462 8 0 969 git
Feb 15 14:52:01 myworker kernel: [115497] 1000 115497 1764 350 8 0 969 git
Feb 15 14:52:01 myworker kernel: [115498] 1000 115498 24351 2784 17 0 969 git-remote-http
Feb 15 14:52:01 myworker kernel: Memory cgroup out of memory: Kill process 115496 (git fetch --tag) score 1972 or sacrifice child
Feb 15 14:52:01 myworker kernel: Killed process 115493 (git), UID 1000, total-vm:8236kB, anon-rss:296kB, file-rss:1552kB, shmem-rss:0kB
Feb 15 14:52:01 myworker containerd: time="2022-02-15T14:52:01.791126760Z" level=info msg="TaskOOM event &TaskOOM{ContainerID:7fc11e70ccd4fd078b8d243f2710ecc1404955bf52a5cb05eb54f2917086420d,XXX_unrecognized:[],}"
Feb 15 14:52:01 myworker kernel: Download metada invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=969
Feb 15 14:52:01 myworker kernel: Download metada cpuset=kubepods-burstable-pod1840326e_dca6_4e8c_a55a_f4fb9a7c95fa.slice:cri-containerd:7fc11e70ccd4fd078b8d243f2710ecc1404955bf52a5cb05eb54f2917086420d mems_allowed=0
Feb 15 14:52:01 myworker kernel: CPU: 6 PID: 115222 Comm: Download metada Kdump: loaded Tainted: G ------------ T 3.10.0-1160.15.2.el7.x86_64 #1
Feb 15 14:52:01 myworker kernel: Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018
Feb 15 14:52:01 myworker kernel: Call Trace:
Feb 15 14:52:01 myworker kernel: [<ffffffff82581fba>] dump_stack+0x19/0x1b
Feb 15 14:52:01 myworker kernel: [<ffffffff8257c8da>] dump_header+0x90/0x229
Feb 15 14:52:01 myworker kernel: [<ffffffff8209d378>] ? ep_poll_callback+0xf8/0x220
Feb 15 14:52:01 myworker kernel: [<ffffffff81fc1d16>] ? find_lock_task_mm+0x56/0xc0
Feb 15 14:52:01 myworker kernel: [<ffffffff8203caa8>] ? try_get_mem_cgroup_from_mm+0x28/0x60
Feb 15 14:52:01 myworker kernel: [<ffffffff81fc227d>] oom_kill_process+0x2cd/0x490
Feb 15 14:52:01 myworker kernel: [<ffffffff82040ebc>] mem_cgroup_oom_synchronize+0x55c/0x590
Feb 15 14:52:01 myworker kernel: [<ffffffff82040320>] ? mem_cgroup_charge_common+0xc0/0xc0
Feb 15 14:52:01 myworker kernel: [<ffffffff81fc2b64>] pagefault_out_of_memory+0x14/0x90
Feb 15 14:52:01 myworker kernel: [<ffffffff8257ade6>] mm_fault_error+0x6a/0x157
Feb 15 14:52:01 myworker kernel: [<ffffffff8258f8d1>] __do_page_fault+0x491/0x500
Feb 15 14:52:01 myworker kernel: [<ffffffff8258f975>] do_page_fault+0x35/0x90
Feb 15 14:52:01 myworker kernel: [<ffffffff8258b778>] page_fault+0x28/0x30
Feb 15 14:52:01 myworker kernel: Task in /system.slice/containerd.service/kubepods-burstable-pod1840326e_dca6_4e8c_a55a_f4fb9a7c95fa.slice:cri-containerd:7fc11e70ccd4fd078b8d243f2710ecc1404955bf52a5cb05eb54f2917086420d killed as a result of limit of /system.slice/containerd.service/kubepods-burstable-pod1840326e_dca6_4e8c_a55a_f4fb9a7c95fa.slice:cri-containerd:7fc11e70ccd4fd078b8d243f2710ecc1404955bf52a5cb05eb54f2917086420d
Feb 15 14:52:01 myworker kernel: memory: usage 8388608kB, limit 8388608kB, failcnt 111634
Feb 15 14:52:01 myworker kernel: memory+swap: usage 8388608kB, limit 9007199254740988kB, failcnt 0
Feb 15 14:52:01 myworker kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Feb 15 14:52:01 myworker kernel: Memory cgroup stats for /system.slice/containerd.service/kubepods-burstable-pod1840326e_dca6_4e8c_a55a_f4fb9a7c95fa.slice:cri-containerd:7fc11e70ccd4fd078b8d243f2710ecc1404955bf52a5cb05eb54f2917086420d: cache:20KB rss:8388588KB rss_huge:6144KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:8388556KB inactive_file:4KB active_file:0KB unevictable:0KB
Feb 15 14:52:01 myworker kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Feb 15 14:52:01 myworker kernel: [41710] 1000 41710 285 1 4 0 969 tini
Feb 15 14:52:01 myworker kernel: [50179] 1000 50179 4344565 2100159 4662 0 969 java
Feb 15 14:52:01 myworker kernel: [115497] 1000 115497 1764 350 8 0 969 git
Feb 15 14:52:01 myworker kernel: [115498] 1000 115498 24351 2784 17 0 969 git-remote-http
Feb 15 14:52:01 myworker kernel: Memory cgroup out of memory: Kill process 110129 (Computer.thread) score 1972 or sacrifice child
Feb 15 14:52:01 myworker kernel: Killed process 50179 (java), UID 1000, total-vm:17378260kB, anon-rss:8371056kB, file-rss:29676kB, shmem-rss:0kB
Feb 15 14:52:03 myworker containerd: time="2022-02-15T14:52:03.132654815Z" level=info msg="Finish piping stderr of container \"7fc11e70ccd4fd078b8d243f2710ecc1404955bf52a5cb05eb54f2917086420d\""
Feb 15 14:52:03 myworker containerd: time="2022-02-15T14:52:03.132676088Z" level=info msg="Finish piping stdout of container \"7fc11e70ccd4fd078b8d243f2710ecc1404955bf52a5cb05eb54f2917086420d\""
Feb 15 14:52:03 myworker containerd: time="2022-02-15T14:52:03.134738144Z" level=info msg="TaskExit event &TaskExit{ContainerID:7fc11e70ccd4fd078b8d243f2710ecc1404955bf52a5cb05eb54f2917086420d,ID:7fc11e70ccd4fd078b8d243f2710ecc1404955bf52a5cb05eb54f2917086420d,Pid:41710,ExitStatus:137,ExitedAt:2022-02-15 14:52:03.134458495 +0000 UTC,XXX_unrecognized:[],}"
Feb 15 14:52:03 myworker containerd: time="2022-02-15T14:52:03.248040140Z" level=info msg="shim disconnected" id=7fc11e70ccd4fd078b8d243f2710ecc1404955bf52a5cb05eb54f2917086420d
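The limit the kernel actually enforced is printed in the OOM report itself, so it can be read straight from the log. The sketch below parses the `memory: usage ..., limit ...` line from the output above and converts the `rss` column of the task list to kB. The 4 kB page size is an assumption (the x86_64 default); it is consistent with the log, where the java `total_vm` of 4344565 pages times 4 gives the reported `total-vm:17378260kB`. The function name is hypothetical.

```python
import re

PAGE_KB = 4  # assumption: 4 kB pages, the default on x86_64


def parse_memory_line(line: str) -> dict:
    """Extract usage, limit (both in kB) and failcnt from the cgroup OOM report."""
    m = re.search(r"memory: usage (\d+)kB, limit (\d+)kB, failcnt (\d+)", line)
    if m is None:
        raise ValueError("not a 'memory: usage' line")
    usage_kb, limit_kb, failcnt = map(int, m.groups())
    return {"usage_kb": usage_kb, "limit_kb": limit_kb, "failcnt": failcnt}


line = ("Feb 15 14:52:01 myworker kernel: memory: usage 8388608kB, "
        "limit 8388608kB, failcnt 111634")
info = parse_memory_line(line)

limit_gib = info["limit_kb"] / 1024**2    # kB -> GiB
at_limit = info["usage_kb"] == info["limit_kb"]
java_rss_kb = 2100159 * PAGE_KB           # rss column (pages) of PID 50179
java_total_vm_kb = 4344565 * PAGE_KB      # total_vm column (pages) of PID 50179
print(limit_gib, at_limit, java_rss_kb, java_total_vm_kb)
```

For this log the cgroup limit comes out as 8.0 GiB, with usage exactly at the limit when the OOM killer ran.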
The only problem I can see is that we tell Kubernetes to allow up to 8 GB, but the memory cgroup might have a limit below 8 GB, so when usage tries to go beyond 5 GB the pod is killed and restarted.
What is the best way to find the memory cgroup limit? And is there a way to know which pods/processes are using this cgroup?
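One way to check the limit and the member processes directly is to read the cgroup's control files. This is a minimal sketch assuming cgroup v1 (which the 3.10 kernel in the log uses), where a memory cgroup directory contains `memory.limit_in_bytes`, `memory.usage_in_bytes` and `cgroup.procs`. The function name is hypothetical, and the directory is an assumption: inside the container it is usually `/sys/fs/cgroup/memory`, while on the node it is typically `/sys/fs/cgroup/memory` followed by the cgroup path printed in the kernel log.

```python
from pathlib import Path


def read_cgroup_v1_memory(cgroup_dir: str) -> dict:
    """Read limit, current usage and member PIDs of a cgroup v1 memory cgroup."""
    base = Path(cgroup_dir)
    limit = int((base / "memory.limit_in_bytes").read_text())
    usage = int((base / "memory.usage_in_bytes").read_text())
    # cgroup.procs lists one PID per line for every process in this cgroup.
    pids = [int(p) for p in (base / "cgroup.procs").read_text().split()]
    return {"limit_bytes": limit, "usage_bytes": usage, "pids": pids}


# Assumed default location when run inside the container (cgroup v1 only).
default_dir = Path("/sys/fs/cgroup/memory")
if (default_dir / "memory.limit_in_bytes").exists():
    print(read_cgroup_v1_memory(str(default_dir)))
```

The PIDs returned can then be matched against `ps` output to see which processes are charged to the same limit.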
Questions:
Q1: What kind of cluster do you use: Minikube, kubeadm, or a cloud-managed one (GKE, EKS, AKS)? A1: kubeadm
Q2: Which version of Kubernetes do you use? A2: v1.21.3
Q3: When did the Jenkins pod restarts start? A3: The issue may have been there from the beginning, but we started noticing it recently, when we moved more jobs to the Kubernetes-based Jenkins.
Q4: Could you paste the output of kubectl describe pod for the Jenkins pod? A4:
# kubectl describe pod -n jenkins jenkins-jenkins-instance
Name: jenkins-jenkins-instance
Namespace: jenkins
Priority: 0
Node: myworker/192.168.X.X
Start Time: Sun, 13 Mar 2022 15:12:19 +0000
Labels: app=jenkins-operator
jenkins-cr=jenkins-instance
Annotations: <none>
Status: Running
IP: 192.168.113.152
IPs:
IP: 192.168.113.152
Controlled By: Jenkins/jenkins-instance
Containers:
jenkins-master:
Container ID: containerd://70e68b7b069404f825b53e9d8f0dac22c595074e5bdc4659cae5248e25af8e00
Image: jenkins/jenkins:lts
Image ID: docker.io/jenkins/jenkins@sha256:b414f82151b865d3efd49ec27a944f624188d09fec58700cddfbe6bae2450f77
Ports: 8080/TCP, 50000/TCP
Host Ports: 0/TCP, 0/TCP
Command:
bash
-c
/var/jenkins/scripts/init.sh && exec /sbin/tini -s -- /usr/local/bin/jenkins.sh
State: Running
Started: Sun, 13 Mar 2022 15:12:20 +0000
Ready: True
Restart Count: 0
Limits:
cpu: 3500m
memory: 8Gi
Requests:
cpu: 1
memory: 4Gi
Liveness: http-get http://:http/login delay=100s timeout=5s period=10s #success=1 #failure=12
Readiness: http-get http://:http/login delay=80s timeout=1s period=10s #success=1 #failure=10
Environment:
COPY_REFERENCE_FILE_LOG: /var/lib/jenkins/copy_reference_file.log
NEW_RELIC_METADATA_KUBERNETES_CLUSTER_NAME: IAD.Prod
NEW_RELIC_METADATA_KUBERNETES_NODE_NAME: (v1:spec.nodeName)
NEW_RELIC_METADATA_KUBERNETES_NAMESPACE_NAME: jenkins (v1:metadata.namespace)
NEW_RELIC_METADATA_KUBERNETES_POD_NAME: jenkins-jenkins-instance (v1:metadata.name)
NEW_RELIC_METADATA_KUBERNETES_CONTAINER_NAME: master
NEW_RELIC_METADATA_KUBERNETES_CONTAINER_IMAGE_NAME: jenkins/jenkins:lts
JAVA_OPTS: -XX:MinRAMPercentage=50.0 -XX:MaxRAMPercentage=80.0 -Djenkins.install.runSetupWizard=false -Djava.awt.headless=true
JENKINS_HOME: /var/lib/jenkins
Mounts:
/var/jenkins/init-configuration from init-configuration (ro)
/var/jenkins/operator-credentials from operator-credentials (ro)
/var/jenkins/scripts from scripts (ro)
/var/lib/jenkins from jenkins-home (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-fc57k (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
jenkins-home:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
scripts:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: jenkins-operator-scripts-jenkins-instance
Optional: false
init-configuration:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: jenkins-operator-init-configuration-jenkins-instance
Optional: false
operator-credentials:
Type: Secret (a volume populated by a Secret)
SecretName: jenkins-operator-credentials-jenkins-instance
Optional: false
kube-api-access-fc57k:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events: <none>
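The JAVA_OPTS in the pod description above set `-XX:MaxRAMPercentage=80.0`. As a rough worked example, and assuming the JVM in the image detects the 8Gi container limit and sizes its maximum heap as that percentage of it, the heap ceiling and the remaining headroom can be estimated as below. The function name is hypothetical and the result is an estimate, not a measurement.

```python
GIB = 1024**3


def max_heap_bytes(container_limit_bytes: int, max_ram_percentage: float) -> int:
    """Estimate the JVM max heap as a percentage of the container memory limit."""
    return int(container_limit_bytes * max_ram_percentage / 100)


limit = 8 * GIB                     # memory limit from the pod spec (8Gi)
heap = max_heap_bytes(limit, 80.0)  # -XX:MaxRAMPercentage=80.0
headroom = limit - heap             # what is left for everything else
print(round(heap / GIB, 2), round(headroom / GIB, 2))
```

Under these assumptions the heap may grow to about 6.4 GiB, leaving about 1.6 GiB of the cgroup limit for everything else charged to the same cgroup: JVM non-heap memory and child processes such as the git processes visible in the OOM task list.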
Q5: How do you deploy Jenkins? A5: We use Jenkins-operator to deploy Jenkins.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
