How to resolve Kubernetes image repository in a bad state

I have a 3 node bare metal K3s cluster where an install fails on one node, but not another.

My guess is that somehow the Kubernetes image repository on the node where the deployment failed is in a bad state. I don't know how to prove that, or fix it.

I did a helm install yesterday which failed with the following error:

Apr 14 14:28:41 clstr2n1 k3s[18777]: E0414 14:28:41.878018   18777 remote_image.go:114] "PullImage from image service failed" err="rpc error: code = NotFound desc = failed to pull and unpack image \"docker.ssgh.com/device-api:1.2.0-SNAPSHOT\": failed to copy: httpReadSeeker: failed open: could not fetch content descriptor sha256:cd5b8d67fe0f3675553921aeb4310503a746c0bb8db237be6ad5160575a133f9 (application/vnd.docker.image.rootfs.diff.tar.gzip) from remote: not found" image="docker.ssgh.com/device-api:1.2.0-SNAPSHOT"

I verified that I could pull the image from the repository using docker pull docker.ssgh.com/device-api:1.2.0-SNAPSHOT on my development VM and it worked as expected.

I then set the nodeName attribute for the pod specification to force it to one of the other nodes and the deployment worked as expected.
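For reference, a minimal sketch of forcing a one-off test pod onto a specific node with nodeName, using kubectl run with an override; the node name clstr2n2 is a placeholder for whichever node you want to test, and the image is the one from the failing chart:

# Hypothetical throwaway pod pinned to a node via nodeName (bypasses the scheduler).
kubectl run device-api-pull-test \
  --image=docker.ssgh.com/device-api:1.2.0-SNAPSHOT \
  --restart=Never \
  --overrides='{"apiVersion":"v1","spec":{"nodeName":"clstr2n2"}}'

# Watch whether the image pull succeeds on that node, then clean up.
kubectl get pod device-api-pull-test -o wide
kubectl delete pod device-api-pull-test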

In addition, I used cURL to fetch the content descriptor, which also worked as expected.
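For anyone wanting to reproduce that check: the descriptor in the error is a layer digest, so it can be requested from the registry's v2 blob endpoint directly. A sketch, assuming the registry allows anonymous reads (otherwise a Bearer token header is needed):

# HEAD request for the exact layer digest from the error message,
# via the Docker Registry HTTP API v2 blob endpoint.
curl -sSI https://docker.ssgh.com/v2/device-api/blobs/sha256:cd5b8d67fe0f3675553921aeb4310503a746c0bb8db237be6ad5160575a133f9
# A 200 (or a 307 redirect to blob storage) means the layer exists;
# a 404 would match the "not found" in the kubelet error.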

Edit for further detail. My original install included 6 different charts. Initially only 2 of the 6 installed correctly; the remaining 4 reported image pull errors. I deleted the failing 4 and tried again, and this time 2 of the 4 failed. I deleted the failing 2 and tried again. These 2 continued to fail unless I specified a different node, in which case they worked. I deleted them again and waited for an hour to see if Kubernetes would clean up the mess. When I tried again, 1 of them worked, but the other continued to fail. I left it overnight, and it is still failing this morning, unless I force it onto a different node.

It is worth noting that the nodes in question are able to download other images from the same private repo without issue.
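One way to test (and clear) a bad node-local state: K3s caches pulled images in its embedded containerd, so the suspect node's cache can be inspected and the partially pulled image removed before retrying. A sketch, assuming k3s crictl is available on the node (it ships with the k3s binary):

# On the suspect node: see what its containerd has cached for this image.
sudo k3s crictl images | grep device-api

# Remove the cached copy so the next pull has to re-fetch every layer.
sudo k3s crictl rmi docker.ssgh.com/device-api:1.2.0-SNAPSHOT

After that, retrying the chart (or a pull) on that node shows whether the cached state was the problem.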

k3s


Solution 1:[1]

There can be multiple reasons for your pod not pulling the image on a particular node:

Docker on the non-working node is not trusting the image repo

Docker is not able to verify the CA issuer for the repo (see the registries.yaml sketch after this list)

The firewall is not open to the image repo from the non-working node
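Worth noting for K3s specifically: the kubelet pulls through K3s's embedded containerd rather than through Docker, so registry trust and CA settings for a private repo live in /etc/rancher/k3s/registries.yaml on each node. A minimal sketch of what that file might look like for this registry; the CA file path is an assumption, and credentials (if the repo needs them) would also go under configs:

# Hypothetical /etc/rancher/k3s/registries.yaml on the non-working node.
# The ca_file path is a placeholder for illustration.
sudo tee /etc/rancher/k3s/registries.yaml >/dev/null <<'EOF'
mirrors:
  docker.ssgh.com:
    endpoint:
      - "https://docker.ssgh.com"
configs:
  docker.ssgh.com:
    tls:
      ca_file: /etc/rancher/k3s/ssgh-ca.crt
EOF

# K3s only reads this file at startup.
sudo systemctl restart k3s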

Troubleshoot using the following steps to find the cause of the issue (a sketch of these checks, adapted for K3s, follows this list):

Check the connectivity to the image repo from the non-working node

Check the Docker config on the non-working node to see whether it allows the image repo

Do a docker pull on the non-working node
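A sketch of those checks as run from the non-working node; the paths are the K3s defaults, and k3s crictl stands in for docker pull since Docker is not the runtime on a K3s node:

# 1. Connectivity and TLS to the registry (a 401 here still proves the node can reach it).
curl -v https://docker.ssgh.com/v2/ -o /dev/null

# 2. Registry configuration the node's runtime is actually using.
sudo cat /etc/rancher/k3s/registries.yaml
sudo grep -A5 registry /var/lib/rancher/k3s/agent/etc/containerd/config.toml

# 3. Pull directly on the non-working node through the same CRI path the kubelet uses.
sudo k3s crictl pull docker.ssgh.com/device-api:1.2.0-SNAPSHOT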

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
[1] Solution 1: Manmohan Mittal