GKE Autopilot - Containers stuck in init phase on particular node

I'm using a GKE Autopilot cluster to run some Kubernetes workloads. Pods scheduled to one of the allocated nodes stay stuck in the init phase for around 10 minutes, while the same pod on a different node is up in seconds.

deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: jobs
spec:
  replicas: 1
  selector:
    matchLabels:
      app: job
  template:
    metadata:
      labels:
        app: job
    spec:
      volumes:
        - name: shared-data
          emptyDir: {}
      initContainers:
        - name: init-volume
          image: gcr.io/dummy_image:latest
          imagePullPolicy: Always
          resources:
            limits:
              memory: "1024Mi"
              cpu: "1000m"
              ephemeral-storage: "10Gi"
          volumeMounts:
            - name: shared-data
              mountPath: /data
          command: ["/bin/sh","-c"]
          args:
          - cp -a /path /data;
      containers:
        - name: job-server
          resources:
            requests:
              ephemeral-storage: "5Gi"
            limits:
              memory: "1024Mi"
              cpu: "1000m"
              ephemeral-storage: "10Gi"
          image: gcr.io/jobprocessor:latest
          imagePullPolicy: Always
          volumeMounts:
            - name: shared-data
              mountPath: /ebdata1
     

This happens only when the pod has an init container. In my case, I'm copying some data from a dummy container to a shared volume, which I then mount in the actual container. But whenever a pod gets scheduled to this particular node, it gets stuck in the init phase for around 10 minutes and then resolves on its own. I couldn't see any errors in the event logs.

kubectl describe node problematic-node

Events:
  Type     Reason      Age   From            Message
  ----     ------      ----  ----            -------
  Warning  SystemOOM   52m   kubelet         System OOM encountered, victim process: cp, pid: 477887
  Warning  OOMKilling  52m   kernel-monitor  Memory cgroup out of memory: Killed process 477887 (cp) total-vm:2140kB, anon-rss:564kB, file-rss:768kB, shmem-rss:0kB, UID:0 pgtables:44kB oom_score_adj:-997

The only message is the above warning. Is this issue caused by some misconfiguration on my side?



Solution 1:[1]

The best recommendation is to manage container compute resources properly within your Kubernetes cluster. When creating a Pod, you can optionally specify how much CPU and memory (RAM) each container needs in order to avoid OOM situations.

When Containers have resource requests specified, the scheduler can make better decisions about which nodes to place Pods on. And when Containers have their limits specified, contention for resources on a node can be handled in a specified manner. CPU specifications are in units of cores, and memory is specified in units of bytes.
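In the manifest in the question, the init container only declares limits. When only a limit is set, Kubernetes defaults the request to the same value, but declaring requests explicitly makes the resources the scheduler works with visible in the manifest. A minimal sketch of the init container block with explicit requests (the values simply mirror the existing limits and are illustrative, not tuned recommendations):

initContainers:
  - name: init-volume
    image: gcr.io/dummy_image:latest
    resources:
      # Explicit requests: these are what the scheduler places the Pod by.
      requests:
        memory: "1024Mi"
        cpu: "1000m"
        ephemeral-storage: "10Gi"
      limits:
        memory: "1024Mi"
        cpu: "1000m"
        ephemeral-storage: "10Gi"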

An event is produced each time the scheduler fails, use the command below to see the status of events:

$ kubectl describe pod <pod-name> | grep -A 20 Events
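Alternatively, the Event objects can be listed directly and sorted by time; <pod-name> and <namespace> below are placeholders for the affected pod and its namespace:

$ kubectl get events -n <namespace> --field-selector involvedObject.name=<pod-name> --sort-by=.lastTimestamp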

Also, read the official Kubernetes guide on “Configure Out Of Resource Handling”. Always make sure to:

- reserve 10-20% of memory capacity for system daemons like the kubelet and the OS kernel;
- identify pods which can be evicted at 90-95% memory utilization to reduce thrashing and the incidence of system OOM.

To support this kind of policy, the kubelet would be launched with options like the ones below:

--eviction-hard=memory.available<xMi
--system-reserved=memory=yGi

Having container monitoring such as Heapster in place is helpful for visualization. Read more on Kubernetes and Docker administration.
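As a quick built-in check of actual usage against the configured requests (assuming cluster metrics are available, which is the default on GKE), kubectl top can be used; <namespace> is a placeholder:

$ kubectl top node problematic-node
$ kubectl top pods --containers -n <namespace>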

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution source:
Solution 1: Nestor Daniel Ortega Perez