Kubelet on Kubernetes worker nodes breaks when master node is unavailable
I have a situation in our k8s cluster. When a k8s master node goes down, the kubelet service on the worker nodes that were connected to the affected master breaks - the node goes into NotReady status. Below are the messages I could see from kubelet at the time of the incident. As seen, the kubelet keeps trying to fetch secrets/configmaps from the master node that is down.
root@k8s-glance02:~# systemctl status kubelet.service | more
● kubelet.service - Kubernetes Kubelet Server
Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2020-10-16 16:26:48 UTC; 1 years 6 months ago
Docs: https://github.com/GoogleCloudPlatform/kubernetes
Main PID: 22912 (kubelet)
Tasks: 25 (limit: 4915)
CGroup: /system.slice/kubelet.service
└─22912 /usr/local/bin/kubelet --logtostderr=true --v=2 --node-ip=10.69.214.140 --hostname-override=k8s-glance02.global.inova-pipeline-iad.ohthree.com --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --config=/etc/kubernetes/kubelet-config.yaml --kubeconfig=/etc/kubernetes/kubelet.conf --pod-infra-container-image=k8s.gcr.io/pause:3.2 --runtime-cgroups=/systemd/system.slice --network-plugin=cni --cni-conf-dir=/etc/cni/net.d --cni-bin-dir=/opt/cni/bin
May 02 15:32:15 k8s-glance02 kubelet[22912]: E0502 15:32:15.037807 22912 reflector.go:383] object-"kube-system"/"nodelocaldns": Failed to watch *v1.ConfigMap: Get https://10.69.209.53:6443/api/v1/namespaces/kube-system/configmaps?allowWatchBookmarks=true&fieldSelector=metadata.name%3Dnodelocaldns&resourceVersion=463481235&timeout=8m13s&timeoutSeconds=493&watch=true: dial tcp 10.69.209.53:6443: connect: connection refused
May 02 15:32:15 k8s-glance02 kubelet[22912]: E0502 15:32:15.038946 22912 reflector.go:383] object-"kube-system"/"kube-proxy-token-97n9g": Failed to watch *v1.Secret: Get https://10.69.209.53:6443/api/v1/namespaces/kube-system/secrets?allowWatchBookmarks=true&fieldSelector=metadata.name%3Dkube-proxy-token-97n9g&resourceVersion=443070136&timeout=5m13s&timeoutSeconds=313&watch=true: dial tcp 10.69.209.53:6443: connect: connection refused
May 02 15:32:15 k8s-glance02 kubelet[22912]: E0502 15:32:15.040030 22912 reflector.go:383] object-"kube-system"/"calico-node-token-27jzm": Failed to watch *v1.Secret: Get https://10.69.209.53:6443/api/v1/namespaces/kube-system/secrets?allowWatchBookmarks=true&fieldSelector=metadata.name%3Dcalico-node-token-27jzm&resourceVersion=443070136&timeout=9m54s&timeoutSeconds=594&watch=true: dial tcp 10.69.209.53:6443: connect: connection refused
May 02 15:32:15 k8s-glance02 kubelet[22912]: E0502 15:32:15.041136 22912 reflector.go:383] k8s.io/client-go/informers/factory.go:135: Failed to watch *v1beta1.RuntimeClass: Get https://10.69.209.53:6443/apis/node.k8s.io/v1beta1/runtimeclasses?allowWatchBookmarks=true&resourceVersion=141325884&timeout=6m42s&timeoutSeconds=402&watch=true: dial tcp 10.69.209.53:6443: connect: connection refused
May 02 15:32:15 k8s-glance02 kubelet[22912]: E0502 15:32:15.043357 22912 reflector.go:383] object-"kube-system"/"nodelocaldns-token-7dvbx": Failed to watch *v1.Secret: Get https://10.69.209.53:6443/api/v1/namespaces/kube-system/secrets?allowWatchBookmarks=true&fieldSelector=metadata.name%3Dnodelocaldns-token-7dvbx&resourceVersion=443070136&timeout=5m12s&timeoutSeconds=312&watch=true: dial tcp 10.69.209.53:6443: connect: connection refused
May 02 15:32:15 k8s-glance02 kubelet[22912]: E0502 15:32:15.044591 22912 reflector.go:383] object-"kube-system"/"kube-proxy": Failed to watch *v1.ConfigMap: Get https://10.69.209.53:6443/api/v1/namespaces/kube-system/configmaps?allowWatchBookmarks=true&fieldSelector=metadata.name%3Dkube-proxy&resourceVersion=466328076&timeout=8m46s&timeoutSeconds=526&watch=true: dial tcp 10.69.209.53:6443: connect: connection refused
May 02 15:32:15 k8s-glance02 kubelet[22912]: E0502 15:32:15.045779 22912 reflector.go:383] k8s.io/kubernetes/pkg/kubelet/kubelet.go:517: Failed to watch *v1.Service: Get https://10.69.209.53:6443/api/v1/services?allowWatchBookmarks=true&resourceVersion=465632206&timeoutSeconds=348&watch=true: dial tcp 10.69.209.53:6443: connect: connection refused
May 02 15:32:15 k8s-glance02 kubelet[22912]: E0502 15:32:15.046907 22912 reflector.go:383] object-"global"/"default-token-9pbfs": Failed to watch *v1.Secret: Get https://10.69.209.53:6443/api/v1/namespaces/global/secrets?allowWatchBookmarks=true&fieldSelector=metadata.name%3Ddefault-token-9pbfs&resourceVersion=443070136&timeout=9m5s&timeoutSeconds=545&watch=true: dial tcp 10.69.209.53:6443: connect: connection refused
May 02 15:32:15 k8s-glance02 kubelet[22912]: E0502 15:32:15.048051 22912 reflector.go:383] k8s.io/kubernetes/pkg/kube
root@k8s-glance02:~#
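All of the failed watches above are hitting a single apiserver address, 10.69.209.53:6443. To double-check which endpoint a kubelet is pinned to, I look at the server: entries in the kubeconfig files passed on the command line shown above (paths as in our setup):

grep -n 'server:' /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf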
The kubelet service on the affected worker nodes continues to experience issues until it has been restarted manually.
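The only workaround we have so far is a manual restart of the kubelet on each affected node, after which the node goes back to Ready:

systemctl restart kubelet.service
systemctl status kubelet.service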
Shouldn't the kubelet service on worker nodes connect to another k8s master node in the cluster if the one it is connected to goes down? Or are there any configuration parameters that need to be set for that? Our clusters are running versions 1.18.9 and 1.16.9.
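For reference, the cluster section of /etc/kubernetes/kubelet.conf on our workers looks roughly like the sketch below (the cluster name is a placeholder and the certificate data is redacted); the server: field points at one specific master rather than a load-balanced endpoint, which I suspect is why the kubelet never fails over:

apiVersion: v1
kind: Config
clusters:
- name: cluster.local
  cluster:
    certificate-authority-data: <redacted>
    # single master address, no VIP or load balancer in front
    server: https://10.69.209.53:6443

I assume pointing server: at a virtual IP or load balancer in front of all masters would avoid this, but I'm not sure whether that is the recommended approach or whether the kubelet has a built-in way to handle multiple apiserver endpoints.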