How to resolve this error where nginx-ingress-controller fails to start in my k8s cluster?

  • Rancher v2.4.2
  • kubernetes version: v1.17.4

In my k8s cluster, nginx-ingress-controller doesn't work and keeps restarting. I can't find any useful information in the logs. Thanks for your help.

cluster nodes:

> kubectl get nodes  
NAME      STATUS   ROLES                      AGE   VERSION
master1   Ready    controlplane,etcd,worker   18d   v1.17.4
master2   Ready    controlplane,etcd,worker   17d   v1.17.4
node1     Ready    worker                     17d   v1.17.4
node2     Ready    worker                     17d   v1.17.4

cluster pods in the ingress-nginx namespace:

> kubectl get pods -n ingress-nginx
NAME                                    READY   STATUS    RESTARTS   AGE
default-http-backend-5bb77998d7-k7gdh   1/1     Running   1          17d
nginx-ingress-controller-6l4jh          0/1     Running   10         27m
nginx-ingress-controller-bh2pg          1/1     Running   0          63m
nginx-ingress-controller-drtzx          1/1     Running   0          63m
nginx-ingress-controller-qndbw          1/1     Running   0          63m

the pod logs of nginx-ingress-controller-6l4jh:

> kubectl logs nginx-ingress-controller-6l4jh -n ingress-nginx
-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:       nginx-0.25.1-rancher1
  Build:         
  Repository:    https://github.com/rancher/ingress-nginx.git
  nginx version: openresty/1.15.8.1

-------------------------------------------------------------------------------

> 
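Since the container is crash-looping, the log of the current instance can be empty even though the previous instance logged something before it was killed. As a sketch of a next diagnostic step (standard kubectl flags, run against your own cluster):

```shell
# Fetch the log of the PREVIOUS (terminated) container instance,
# which often contains the error that the fresh instance hasn't hit yet.
kubectl logs nginx-ingress-controller-6l4jh -n ingress-nginx --previous

# Watch namespace events in timestamp order to see probe failures live.
kubectl get events -n ingress-nginx --sort-by=.lastTimestamp
```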

describe output:

> kubectl describe pod nginx-ingress-controller-6l4jh -n ingress-nginx
Name:         nginx-ingress-controller-6l4jh
Namespace:    ingress-nginx
Priority:     0
Node:         node2/172.26.13.11
Start Time:   Tue, 19 Apr 2022 07:12:16 +0000
Labels:       app=ingress-nginx
              controller-revision-hash=758cb9dbbc
              pod-template-generation=8
Annotations:  cattle.io/timestamp: 2022-04-19T07:08:51Z
              field.cattle.io/ports:
                [[{"containerPort":80,"dnsName":"nginx-ingress-controller-hostport","hostPort":80,"kind":"HostPort","name":"http","protocol":"TCP","source...
              field.cattle.io/publicEndpoints:
                [{"addresses":["172.26.13.130"],"nodeId":"c-wv692:m-d5802d05bbf0","port":80,"protocol":"TCP"},{"addresses":["172.26.13.130"],"nodeId":"c-w...
              prometheus.io/port: 10254
              prometheus.io/scrape: true
Status:       Running
IP:           172.26.13.11
IPs:
  IP:           172.26.13.11
Controlled By:  DaemonSet/nginx-ingress-controller
Containers:
  nginx-ingress-controller:
    Container ID:  docker://09a6248edb921b9c9cbab678c793fe1cc3d28322ea6abbb8f15c899351ce4b40
    Image:         172.26.13.133:5000/rancher/nginx-ingress-controller:nginx-0.25.1-rancher1
    Image ID:      docker-pullable://172.26.13.133:5000/rancher/nginx-ingress-controller@sha256:fe50ceea3d1a0bc9a7ccef8d5845c9a30b51f608e411467862dff590185a47d2
    Ports:         80/TCP, 443/TCP
    Host Ports:    80/TCP, 443/TCP
    Args:
      /nginx-ingress-controller
      --default-backend-service=$(POD_NAMESPACE)/default-http-backend
      --configmap=$(POD_NAMESPACE)/nginx-configuration
      --tcp-services-configmap=$(POD_NAMESPACE)/tcp-services
      --udp-services-configmap=$(POD_NAMESPACE)/udp-services
      --annotations-prefix=nginx.ingress.kubernetes.io
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    143
      Started:      Tue, 19 Apr 2022 07:40:12 +0000
      Finished:     Tue, 19 Apr 2022 07:41:32 +0000
    Ready:          False
    Restart Count:  11
    Liveness:       http-get http://:10254/healthz delay=60s timeout=20s period=10s #success=1 #failure=3
    Readiness:      http-get http://:10254/healthz delay=60s timeout=20s period=10s #success=1 #failure=3
    Environment:
      POD_NAME:       nginx-ingress-controller-6l4jh (v1:metadata.name)
      POD_NAMESPACE:  ingress-nginx (v1:metadata.namespace)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from nginx-ingress-serviceaccount-token-2kdbj (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  nginx-ingress-serviceaccount-token-2kdbj:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  nginx-ingress-serviceaccount-token-2kdbj
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     :NoExecute
                 :NoSchedule
                 node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/network-unavailable:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute
                 node.kubernetes.io/pid-pressure:NoSchedule
                 node.kubernetes.io/unreachable:NoExecute
                 node.kubernetes.io/unschedulable:NoSchedule
Events:
  Type     Reason     Age                 From               Message
  ----     ------     ----                ----               -------
  Normal   Scheduled  <unknown>           default-scheduler  Successfully assigned ingress-nginx/nginx-ingress-controller-6l4jh to node2
  Normal   Pulled     27m (x3 over 30m)   kubelet, node2     Container image "172.26.13.133:5000/rancher/nginx-ingress-controller:nginx-0.25.1-rancher1" already present on machine
  Normal   Created    27m (x3 over 30m)   kubelet, node2     Created container nginx-ingress-controller
  Normal   Started    27m (x3 over 30m)   kubelet, node2     Started container nginx-ingress-controller
  Normal   Killing    27m (x2 over 28m)   kubelet, node2     Container nginx-ingress-controller failed liveness probe, will be restarted
  Warning  Unhealthy  25m (x10 over 29m)  kubelet, node2     Liveness probe failed: Get http://172.26.13.11:10254/healthz: dial tcp 172.26.13.11:10254: connect: connection refused
  Warning  Unhealthy  10m (x21 over 29m)  kubelet, node2     Readiness probe failed: Get http://172.26.13.11:10254/healthz: dial tcp 172.26.13.11:10254: connect: connection refused
  Warning  BackOff    8s (x69 over 20m)   kubelet, node2     Back-off restarting failed container
> 


Solution 1:

It sounds like the ingress controller pod is failing its liveness/readiness probes, and apparently only on one node (node2). Exit code 143 means the container received SIGTERM, which is consistent with the kubelet restarting it after the failed liveness probe rather than nginx crashing on its own. You could try:

  • checking that node for a firewall blocking the probe port (10254)
  • updating to a newer version than nginx-0.25.1
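To test the first point: the probes hit port 10254 on the pod IP, which here equals the node IP (the controller uses host ports), so you can probe it directly. A minimal sketch, assuming iptables-based firewalling and the addresses from the describe output above:

```shell
# From node2 itself (or another node), check whether the controller's
# health endpoint on port 10254 answers at all.
curl -v http://172.26.13.11:10254/healthz

# Check whether anything is actually listening on 10254 on node2.
ss -tlnp | grep 10254

# Look for firewall rules that could drop traffic to that port
# (assumes iptables; adapt for firewalld/nftables as appropriate).
sudo iptables -L -n | grep 10254
```

Comparing the output on node2 against a node where the controller is healthy (e.g. node1) should show whether the port is blocked or the process simply never starts listening.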

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
