Kubernetes nginx ingress controller is unreliable

I need help understanding in detail how an ingress controller, specifically the ingress-nginx ingress controller, is supposed to work. To me, it appears as a black box that is supposed to listen on a public IP, terminate TLS, and forward traffic to a pod. But exactly how that happens is a mystery to me.

The primary goal here is understanding, the secondary goal is troubleshooting an immediate issue I'm facing.

I have a cluster with five nodes, and am trying to get the JupyterHub application to run on it. For the most part, it is working fine. I'm using a pretty standard Rancher RKE setup with Flannel/Calico (Canal) for the networking. The nodes run Red Hat 7.9 with iptables and firewalld, and Docker 19.03.

The JupyterHub proxy is set up with a ClusterIP service (I also tried a NodePort service; that also works). I also set up an ingress. The ingress sometimes works, but often does not respond (the connection times out). Specifically, if I delete the ingress and then redeploy my Helm chart, the ingress starts working. Also, if I restart one of my nodes, the ingress starts working again. I have not identified the circumstances under which the ingress stops working.
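
A useful first check when this happens is whether the controller pods themselves are healthy and what they are logging. On a standard RKE install the controller runs as a DaemonSet named nginx-ingress-controller in the ingress-nginx namespace; those names are assumptions here and may differ on your cluster:

kubectl -n ingress-nginx get pods -o wide
kubectl -n ingress-nginx logs ds/nginx-ingress-controller --tail=50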

Here are my relevant services:

kubectl get services
NAME                       TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)    AGE
hub                        ClusterIP   10.32.0.183   <none>        8081/TCP   378d
proxy-api                  ClusterIP   10.32.0.11    <none>        8001/TCP   378d
proxy-public               ClusterIP   10.32.0.30    <none>        80/TCP     378d

This works; telnet 10.32.0.30 80 responds as expected (of course only from one of the nodes). I can also telnet directly to the proxy-public pod (10.244.4.41:8000 in my case).
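
Concretely, both of these checks succeed when run from one of the nodes:

telnet 10.32.0.30 80       # proxy-public service via its ClusterIP
telnet 10.244.4.41 8000    # proxy-public pod directly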

Here is my ingress.

kubectl describe ingress
Name:             jupyterhub
Labels:           app=jupyterhub
                  app.kubernetes.io/managed-by=Helm
                  chart=jupyterhub-1.2.0
                  component=ingress
                  heritage=Helm
                  release=jhub
Namespace:        jhub
Address:          k8s-node4.<redacted>,k8s-node5.<redacted>
Default backend:  default-http-backend:80 (<error: endpoints "default-http-backend" not found>)
TLS:
  tls-jhub terminates jupyterhub.<redacted>
Rules:
  Host                     Path  Backends
  ----                     ----  --------
  jupyterhub.<redacted>
                           /   proxy-public:http (10.244.4.41:8000)
Annotations:               field.cattle.io/publicEndpoints:
                             [{"addresses":["",""],"port":443,"protocol":"HTTPS","serviceName":"jhub:proxy-public","ingressName":"jhub:jupyterhub","hostname":"jupyterh...
                           meta.helm.sh/release-name: jhub
                           meta.helm.sh/release-namespace: jhub
Events:                    <none>

What I understand so far about the ingress in this situation:

Traffic arrives on port 443 at k8s-node4 or k8s-node5. Some magic (controlled by the ingress controller) receives that traffic, terminates TLS, and sends the unencrypted traffic to the pod's IP at port 8000. That's the part I want to understand better.

That black box seems to at least partially involve Flannel/Calico and some iptables magic, and it obviously involves nginx at some point.

Update: in the meantime, I identified what causes Kubernetes to break: restarting firewalld.

As best I can tell, that wipes out all iptables rules, not just the firewalld-generated ones. That includes the rules installed by kube-proxy and the CNI plugin, which is why service and ingress traffic stops flowing until something recreates them.
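
This is easy to verify by counting the Kubernetes-generated NAT rules before and after the restart (run on one of the nodes; the counts are illustrative):

iptables -t nat -S | grep -c 'KUBE-'    # typically hundreds of rules on a healthy node
systemctl restart firewalld
iptables -t nat -S | grep -c 'KUBE-'    # near zero, until kube-proxy resyncs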



Solution 1:

I found the answer to my question here: https://www.stackrox.io/blog/kubernetes-networking-demystified/ One caveat: the details may vary to some extent depending on which CNI plugin you are using, although everything I saw was generic Kubernetes behavior rather than CNI-specific.

I'm still trying to digest the content of that blog post, and I highly recommend referring to it directly instead of relying on my answer, which may be a poor retelling of the story.

Here is approximately how a packet that arrives on port 443 flows.

You will need the following command to see the NAT table:

iptables -t nat -vnL | less

The output of this looks rather intimidating.

The walkthrough below cuts out a lot of other chains and rules to get to the point. In this example:

  • This cluster uses the Canal CNI plugin (Calico combined with Flannel).
  • The listen port is 443.
  • The pod for the nginx-ingress-controller listens (among other addresses) at 10.244.0.183.

In that situation, here is how the packet flows:

  • The packet comes into the PREROUTING chain.
  • The PREROUTING chain calls (among other things) the CNI-HOSTPORT-DNAT chain.
  • The POSTROUTING chain also calls the same chain.
  • The CNI-HOSTPORT-DNAT chain in turn calls several CNI-DN-xxxx chains.
  • The CNI-DN-xxxx chains perform DNAT and change the destination address to 10.244.0.183.
  • The container inside the nginx-ingress-controller pod listens on 10.244.0.183.
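
You can follow this chain of chains yourself with the commands below; the CNI-DN-xxxx suffixes and the 10.244.0.183 address are from this example, and yours will differ:

iptables -t nat -vnL PREROUTING           # shows the jump to CNI-HOSTPORT-DNAT
iptables -t nat -vnL CNI-HOSTPORT-DNAT    # lists the per-hostPort CNI-DN-xxxx chains
iptables -t nat -vnL | grep 10.244.0.183  # finds the DNAT rules pointing at the controller pod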

There is some additional complexity involved if the pod is on a different node than the one the packet arrived on, and also if multiple pods are load-balanced behind the same port. Load balancing is handled with the iptables statistic module, which randomly picks one of several otherwise-identical rules.
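
For illustration, a pair of load-balancing rules for two backends looks roughly like this (the chain names are invented for this sketch; kube-proxy generates the real ones with hashed suffixes):

iptables -t nat -A KUBE-SVC-EXAMPLE -m statistic --mode random --probability 0.5 -j KUBE-SEP-BACKEND1
iptables -t nat -A KUBE-SVC-EXAMPLE -j KUBE-SEP-BACKEND2

The first rule matches 50% of the time; anything that falls through goes to the second backend.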

Internal traffic from a service's ClusterIP to a pod follows a similar, but not identical, flow.

In this example:

  • The service is at 10.32.0.183, port 8081.
  • The pod is at 10.244.6.112, port 8081.

Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
...
KUBE-SERVICES  all  --  *      *       0.0.0.0/0            0.0.0.0/0

Chain KUBE-SERVICES (2 references)
...
/* Traffic from within the cluster to 10.32.0.183:8081 */
0 0 KUBE-SVC-ZHCKOT5PFJF4PASJ  tcp  --  *      *       0.0.0.0/0            10.32.0.183          tcp dpt:8081
...

/* Mark the packet */
Chain KUBE-SVC-ZHCKOT5PFJF4PASJ (1 references)
 pkts bytes target     prot opt in     out     source               destination
    0     0 KUBE-MARK-MASQ  tcp  --  *      *      !10.244.0.0/16        10.32.0.183  tcp dpt:8081
    0     0 KUBE-SEP-RYU73S2VFHOHW4XO  all  --  *      *       0.0.0.0/0            0.0.0.0/0 

/* Perform DNAT, redirecting from 10.32.0.183 to 10.244.6.112 */
Chain KUBE-SEP-RYU73S2VFHOHW4XO (1 references)
 pkts bytes target     prot opt in     out     source               destination
    0     0 KUBE-MARK-MASQ  all  --  *      *       10.244.6.112         0.0.0.0/0
    0     0 DNAT       tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp to:10.244.6.112:8081
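
To confirm that this DNAT is applied to live connections, the conntrack tool (if installed on the node) can list the translated flows for the service IP:

conntrack -L -d 10.32.0.183

Each entry shows the original destination (the service IP) alongside the reply source (the pod IP), which is the NAT in action.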

As for the second part of my question, how to get the nodes to work reliably:

  • Disable firewalld.
  • Use Kubernetes network policies (or Calico network policies if you are using Calico) instead; a minimal sketch follows this list.
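
As an illustration, the following NetworkPolicy admits traffic to the JupyterHub proxy pods only on their web port. The jhub namespace matches this question, but the component=proxy label is an assumption based on the zero-to-jupyterhub chart; verify it with kubectl get pods --show-labels:

kubectl apply -n jhub -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: proxy-allow-web
spec:
  podSelector:
    matchLabels:
      component: proxy    # assumed label for the proxy pods
  policyTypes:
    - Ingress
  ingress:
    - ports:
        - protocol: TCP
          port: 8000      # the port the proxy pod listens on
EOF

Keep in mind that once a policy selects a pod, everything not explicitly allowed is denied, so a real deployment would also need to admit the hub's traffic to the proxy's API port (8001).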

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: Kevin Keane