How to know the memory cgroup limit?

We have a Kubernetes cluster in which we run Jenkins. Jenkins restarts roughly every 48 hours, and when we check the logs on the worker node where Jenkins is deployed, we see this error:

Feb 15 14:52:01 myworker kernel: Memory cgroup out of memory: Kill process 110129 (Computer.thread) score 1972 or sacrifice child

Feb 15 14:52:01 myworker kernel: Killed process 50179 (java), UID 1000, total-vm:17378260kB, anon-rss:8371056kB, file-rss:29676kB, shmem-rss:0kB

where 50179 is the Java process for Jenkins.

We set the Kubernetes memory limit for Jenkins to 8Gi:

    resources:
      limits:
        cpu: 3500m
        memory: 8Gi
      requests:
        cpu: "1"
        memory: 4Gi

I also checked the New Relic alerts we integrated with our pods; they never show memory usage going beyond 5 GB.

Detailed logs below.

Feb 15 14:52:01 myworker kernel: Download metada cpuset=kubepods-burstable-pod1840326e_dca6_4e8c_a55a_f4fb9a7c95fa.slice:cri-containerd:7fc11e70ccd4fd078b8d243f2710ecc1404955bf52a5cb05eb54f2917086420d mems_allowed=0
Feb 15 14:52:01 myworker kernel: CPU: 6 PID: 115222 Comm: Download metada Kdump: loaded Tainted: G               ------------ T 3.10.0-1160.15.2.el7.x86_64 #1
Feb 15 14:52:01 myworker kernel: Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018
Feb 15 14:52:01 myworker kernel: Call Trace:
Feb 15 14:52:01 myworker kernel: [<ffffffff82581fba>] dump_stack+0x19/0x1b
Feb 15 14:52:01 myworker kernel: [<ffffffff8257c8da>] dump_header+0x90/0x229
Feb 15 14:52:01 myworker kernel: [<ffffffff8209d378>] ? ep_poll_callback+0xf8/0x220
Feb 15 14:52:01 myworker kernel: [<ffffffff81fc1d16>] ? find_lock_task_mm+0x56/0xc0
Feb 15 14:52:01 myworker kernel: [<ffffffff8203caa8>] ? try_get_mem_cgroup_from_mm+0x28/0x60
Feb 15 14:52:01 myworker kernel: [<ffffffff81fc227d>] oom_kill_process+0x2cd/0x490
Feb 15 14:52:01 myworker kernel: [<ffffffff82040ebc>] mem_cgroup_oom_synchronize+0x55c/0x590
Feb 15 14:52:01 myworker kernel: [<ffffffff82040320>] ? mem_cgroup_charge_common+0xc0/0xc0
Feb 15 14:52:01 myworker kernel: [<ffffffff81fc2b64>] pagefault_out_of_memory+0x14/0x90
Feb 15 14:52:01 myworker kernel: [<ffffffff8257ade6>] mm_fault_error+0x6a/0x157
Feb 15 14:52:01 myworker kernel: [<ffffffff8258f8d1>] __do_page_fault+0x491/0x500
Feb 15 14:52:01 myworker kernel: [<ffffffff8258f975>] do_page_fault+0x35/0x90
Feb 15 14:52:01 myworker kernel: [<ffffffff8258b778>] page_fault+0x28/0x30
Feb 15 14:52:01 myworker kernel: Task in /system.slice/containerd.service/kubepods-burstable-pod1840326e_dca6_4e8c_a55a_f4fb9a7c95fa.slice:cri-containerd:7fc11e70ccd4fd078b8d243f2710ecc1404955bf52a5cb05eb54f2917086420d killed as a result of limit of /system.slice/containerd.service/kubepods-burstable-pod1840326e_dca6_4e8c_a55a_f4fb9a7c95fa.slice:cri-containerd:7fc11e70ccd4fd078b8d243f2710ecc1404955bf52a5cb05eb54f2917086420d
Feb 15 14:52:01 myworker kernel: memory: usage 8388608kB, limit 8388608kB, failcnt 111634
Feb 15 14:52:01 myworker kernel: memory+swap: usage 8388608kB, limit 9007199254740988kB, failcnt 0
Feb 15 14:52:01 myworker kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Feb 15 14:52:01 myworker kernel: Memory cgroup stats for /system.slice/containerd.service/kubepods-burstable-pod1840326e_dca6_4e8c_a55a_f4fb9a7c95fa.slice:cri-containerd:7fc11e70ccd4fd078b8d243f2710ecc1404955bf52a5cb05eb54f2917086420d: cache:20KB rss:8388588KB rss_huge:6144KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:8388556KB inactive_file:4KB active_file:0KB unevictable:0KB
Feb 15 14:52:01 myworker kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
Feb 15 14:52:01 myworker kernel: [115493]  1000 115493     2059      462       8        0           969 git
Feb 15 14:52:01 myworker kernel: [115497]  1000 115497     1764      350       8        0           969 git
Feb 15 14:52:01 myworker kernel: [115498]  1000 115498    24351     2784      17        0           969 git-remote-http
Feb 15 14:52:01 myworker kernel: Memory cgroup out of memory: Kill process 115496 (git fetch --tag) score 1972 or sacrifice child
Feb 15 14:52:01 myworker kernel: Killed process 115493 (git), UID 1000, total-vm:8236kB, anon-rss:296kB, file-rss:1552kB, shmem-rss:0kB
Feb 15 14:52:01 myworker containerd: time="2022-02-15T14:52:01.791126760Z" level=info msg="TaskOOM event &TaskOOM{ContainerID:7fc11e70ccd4fd078b8d243f2710ecc1404955bf52a5cb05eb54f2917086420d,XXX_unrecognized:[],}"
Feb 15 14:52:01 myworker kernel: Download metada invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=969
Feb 15 14:52:01 myworker kernel: Download metada cpuset=kubepods-burstable-pod1840326e_dca6_4e8c_a55a_f4fb9a7c95fa.slice:cri-containerd:7fc11e70ccd4fd078b8d243f2710ecc1404955bf52a5cb05eb54f2917086420d mems_allowed=0
Feb 15 14:52:01 myworker kernel: CPU: 6 PID: 115222 Comm: Download metada Kdump: loaded Tainted: G               ------------ T 3.10.0-1160.15.2.el7.x86_64 #1
Feb 15 14:52:01 myworker kernel: Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018
Feb 15 14:52:01 myworker kernel: Call Trace:
Feb 15 14:52:01 myworker kernel: [<ffffffff82581fba>] dump_stack+0x19/0x1b
Feb 15 14:52:01 myworker kernel: [<ffffffff8257c8da>] dump_header+0x90/0x229
Feb 15 14:52:01 myworker kernel: [<ffffffff8209d378>] ? ep_poll_callback+0xf8/0x220
Feb 15 14:52:01 myworker kernel: [<ffffffff81fc1d16>] ? find_lock_task_mm+0x56/0xc0
Feb 15 14:52:01 myworker kernel: [<ffffffff8203caa8>] ? try_get_mem_cgroup_from_mm+0x28/0x60
Feb 15 14:52:01 myworker kernel: [<ffffffff81fc227d>] oom_kill_process+0x2cd/0x490
Feb 15 14:52:01 myworker kernel: [<ffffffff82040ebc>] mem_cgroup_oom_synchronize+0x55c/0x590
Feb 15 14:52:01 myworker kernel: [<ffffffff82040320>] ? mem_cgroup_charge_common+0xc0/0xc0
Feb 15 14:52:01 myworker kernel: [<ffffffff81fc2b64>] pagefault_out_of_memory+0x14/0x90
Feb 15 14:52:01 myworker kernel: [<ffffffff8257ade6>] mm_fault_error+0x6a/0x157
Feb 15 14:52:01 myworker kernel: [<ffffffff8258f8d1>] __do_page_fault+0x491/0x500
Feb 15 14:52:01 myworker kernel: [<ffffffff8258f975>] do_page_fault+0x35/0x90
Feb 15 14:52:01 myworker kernel: [<ffffffff8258b778>] page_fault+0x28/0x30
Feb 15 14:52:01 myworker kernel: Task in /system.slice/containerd.service/kubepods-burstable-pod1840326e_dca6_4e8c_a55a_f4fb9a7c95fa.slice:cri-containerd:7fc11e70ccd4fd078b8d243f2710ecc1404955bf52a5cb05eb54f2917086420d killed as a result of limit of /system.slice/containerd.service/kubepods-burstable-pod1840326e_dca6_4e8c_a55a_f4fb9a7c95fa.slice:cri-containerd:7fc11e70ccd4fd078b8d243f2710ecc1404955bf52a5cb05eb54f2917086420d
Feb 15 14:52:01 myworker kernel: memory: usage 8388608kB, limit 8388608kB, failcnt 111634
Feb 15 14:52:01 myworker kernel: memory+swap: usage 8388608kB, limit 9007199254740988kB, failcnt 0
Feb 15 14:52:01 myworker kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Feb 15 14:52:01 myworker kernel: Memory cgroup stats for /system.slice/containerd.service/kubepods-burstable-pod1840326e_dca6_4e8c_a55a_f4fb9a7c95fa.slice:cri-containerd:7fc11e70ccd4fd078b8d243f2710ecc1404955bf52a5cb05eb54f2917086420d: cache:20KB rss:8388588KB rss_huge:6144KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:8388556KB inactive_file:4KB active_file:0KB unevictable:0KB
Feb 15 14:52:01 myworker kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
Feb 15 14:52:01 myworker kernel: [41710]  1000 41710      285        1       4        0           969 tini
Feb 15 14:52:01 myworker kernel: [50179]  1000 50179  4344565  2100159    4662        0           969 java
Feb 15 14:52:01 myworker kernel: [115497]  1000 115497     1764      350       8        0           969 git
Feb 15 14:52:01 myworker kernel: [115498]  1000 115498    24351     2784      17        0           969 git-remote-http
Feb 15 14:52:01 myworker kernel: Memory cgroup out of memory: Kill process 110129 (Computer.thread) score 1972 or sacrifice child
Feb 15 14:52:01 myworker kernel: Killed process 50179 (java), UID 1000, total-vm:17378260kB, anon-rss:8371056kB, file-rss:29676kB, shmem-rss:0kB
Feb 15 14:52:03 myworker containerd: time="2022-02-15T14:52:03.132654815Z" level=info msg="Finish piping stderr of container \"7fc11e70ccd4fd078b8d243f2710ecc1404955bf52a5cb05eb54f2917086420d\""
Feb 15 14:52:03 myworker containerd: time="2022-02-15T14:52:03.132676088Z" level=info msg="Finish piping stdout of container \"7fc11e70ccd4fd078b8d243f2710ecc1404955bf52a5cb05eb54f2917086420d\""
Feb 15 14:52:03 myworker containerd: time="2022-02-15T14:52:03.134738144Z" level=info msg="TaskExit event &TaskExit{ContainerID:7fc11e70ccd4fd078b8d243f2710ecc1404955bf52a5cb05eb54f2917086420d,ID:7fc11e70ccd4fd078b8d243f2710ecc1404955bf52a5cb05eb54f2917086420d,Pid:41710,ExitStatus:137,ExitedAt:2022-02-15 14:52:03.134458495 +0000 UTC,XXX_unrecognized:[],}"
Feb 15 14:52:03 myworker containerd: time="2022-02-15T14:52:03.248040140Z" level=info msg="shim disconnected" id=7fc11e70ccd4fd078b8d243f2710ecc1404955bf52a5cb05eb54f2917086420d

The only problem I can see here is that we are telling Kubernetes to allow up to 8Gi, but the memory cgroup might have a limit below 8Gi, so when the container tries to grow beyond roughly 5 GB the kernel kills the pod and it restarts.
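For reference, the kernel OOM dump above does print the cgroup limit, in kB ("memory: usage 8388608kB, limit 8388608kB"). A quick conversion of that value:

```shell
# The OOM dump reports: "memory: usage 8388608kB, limit 8388608kB".
# Converting the limit from kB to GiB:
echo $((8388608 / 1024 / 1024))   # 8
```

So according to the dump itself, the cgroup limit is exactly 8 GiB, matching the pod's 8Gi limit, and usage had reached that limit when the kill happened.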

What is the best way to find the memory cgroup limit? And is there a way to know which pods/processes are using this cgroup?
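A sketch of how this could be inspected on the node, assuming cgroup v1 (consistent with the 3.10.0 CentOS 7 kernel in the logs). The cgroup path is the one printed in the "Task in /system.slice/..." log line; <pod-slice> and <container-id> below are placeholders for those values:

```shell
# Placeholder path: substitute the pod slice and container ID from the
# "Task in /system.slice/..." line of the OOM dump.
CG='/sys/fs/cgroup/memory/system.slice/containerd.service/<pod-slice>:cri-containerd:<container-id>'

# 8Gi expressed in bytes, for comparison with memory.limit_in_bytes:
echo $((8 * 1024 * 1024 * 1024))   # 8589934592

# Hard limit, current usage, and peak usage in bytes; "|| true" simply
# skips the reads if this path does not exist on the current machine:
cat "$CG/memory.limit_in_bytes" "$CG/memory.usage_in_bytes" "$CG/memory.max_usage_in_bytes" 2>/dev/null || true

# PIDs whose memory is charged to this cgroup (maps back to pod processes):
cat "$CG/cgroup.procs" 2>/dev/null || true

# The same limit as seen from inside the container:
# kubectl exec -n jenkins jenkins-jenkins-instance -- cat /sys/fs/cgroup/memory/memory.limit_in_bytes
```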

Questions:

Q1: What kind of cluster do you use? Minikube, kubeadm, or a managed cloud offering (GKE, EKS, AKS)? A1: kubeadm

Q2: Which version of Kubernetes do you use? A2: v1.21.3

Q3: When did the Jenkins pod restarts start? A3: The issue might have existed from the beginning, but we started noticing it recently, when we moved more jobs to the Kubernetes-based Jenkins.

Q4: Could you paste the output for the Jenkins pod from kubectl describe pod? A4:

# kubectl describe pod -n jenkins jenkins-jenkins-instance
Name:         jenkins-jenkins-instance
Namespace:    jenkins
Priority:     0
Node:         myworker/192.168.X.X
Start Time:   Sun, 13 Mar 2022 15:12:19 +0000
Labels:       app=jenkins-operator
              jenkins-cr=jenkins-instance
Annotations:  <none>
Status:       Running
IP:           192.168.113.152
IPs:
  IP:           192.168.113.152
Controlled By:  Jenkins/jenkins-instance
Containers:
  jenkins-master:
    Container ID:  containerd://70e68b7b069404f825b53e9d8f0dac22c595074e5bdc4659cae5248e25af8e00
    Image:         jenkins/jenkins:lts
    Image ID:      docker.io/jenkins/jenkins@sha256:b414f82151b865d3efd49ec27a944f624188d09fec58700cddfbe6bae2450f77
    Ports:         8080/TCP, 50000/TCP
    Host Ports:    0/TCP, 0/TCP
    Command:
      bash
      -c
      /var/jenkins/scripts/init.sh && exec /sbin/tini -s -- /usr/local/bin/jenkins.sh
    State:          Running
      Started:      Sun, 13 Mar 2022 15:12:20 +0000
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     3500m
      memory:  8Gi
    Requests:
      cpu:      1
      memory:   4Gi
    Liveness:   http-get http://:http/login delay=100s timeout=5s period=10s #success=1 #failure=12
    Readiness:  http-get http://:http/login delay=80s timeout=1s period=10s #success=1 #failure=10
    Environment:
      COPY_REFERENCE_FILE_LOG:                             /var/lib/jenkins/copy_reference_file.log
      NEW_RELIC_METADATA_KUBERNETES_CLUSTER_NAME:          IAD.Prod
      NEW_RELIC_METADATA_KUBERNETES_NODE_NAME:              (v1:spec.nodeName)
      NEW_RELIC_METADATA_KUBERNETES_NAMESPACE_NAME:        jenkins (v1:metadata.namespace)
      NEW_RELIC_METADATA_KUBERNETES_POD_NAME:              jenkins-jenkins-instance (v1:metadata.name)
      NEW_RELIC_METADATA_KUBERNETES_CONTAINER_NAME:        master
      NEW_RELIC_METADATA_KUBERNETES_CONTAINER_IMAGE_NAME:  jenkins/jenkins:lts
      JAVA_OPTS:                                           -XX:MinRAMPercentage=50.0 -XX:MaxRAMPercentage=80.0 -Djenkins.install.runSetupWizard=false -Djava.awt.headless=true
      JENKINS_HOME:                                        /var/lib/jenkins
    Mounts:
      /var/jenkins/init-configuration from init-configuration (ro)
      /var/jenkins/operator-credentials from operator-credentials (ro)
      /var/jenkins/scripts from scripts (ro)
      /var/lib/jenkins from jenkins-home (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-fc57k (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  jenkins-home:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  scripts:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      jenkins-operator-scripts-jenkins-instance
    Optional:  false
  init-configuration:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      jenkins-operator-init-configuration-jenkins-instance
    Optional:  false
  operator-credentials:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  jenkins-operator-credentials-jenkins-instance
    Optional:    false
  kube-api-access-fc57k:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:                      <none>
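One detail worth noting from the JAVA_OPTS in the output above (a possible contributing factor, not a confirmed diagnosis): -XX:MaxRAMPercentage=80.0 caps only the JVM heap, while metaspace, thread stacks, GC structures, and native buffers are charged to the same cgroup on top of the heap. With an 8Gi limit:

```shell
# Max heap allowed by -XX:MaxRAMPercentage=80.0 with an 8Gi (8192 MiB) limit:
echo $((8192 * 80 / 100))   # 6553 (MiB)
# The remaining ~1.6 GiB must cover all non-heap JVM memory; the OOM line
# above shows anon-rss:8371056kB, i.e. the process had grown to roughly
# the full 8Gi cgroup limit.
```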

Q5: How do we deploy Jenkins? A5: We use the Jenkins Operator to deploy Jenkins.



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
