How to delete a node in an EKS managed node group if the kubelet crashes or stops reporting?

I am using AWS EKS with a managed node group. Twice in the past couple of weeks, the kubelet on one of the nodes crashed or stopped reporting back to the control plane.

In this case I would expect the Auto Scaling group to identify the node as unhealthy and replace it. However, this is not what happens. I have reproduced the issue by creating a node and manually stopping the kubelet, see image below:

[Image: screenshot of the reproduced issue]

My first thought was to create an Event Bus alert that would trigger a Lambda function to take care of this, but I couldn't find the EKS service in the list of services in Event Bus, so …

Does anyone know of a tool or configuration that would help with this? To be clear, I am looking for something that would:

  1. Detect that the kubelet isn't connecting to the control plane
  2. Delete the node in the cluster
  3. Terminate the EC2 instance
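
For illustration, the three steps above could be sketched roughly as follows. This is only a minimal sketch under several assumptions, not a recommended or complete tool: it assumes the official `kubernetes` Python client and `boto3` are installed, that credentials for both the cluster and AWS are already configured, and that a node counts as dead once its `Ready` condition has been `False` or `Unknown` for longer than a grace period. The names `GRACE`, `is_dead`, `instance_id` and `remediate` are hypothetical and were chosen for this example. It does not drain the node or handle errors.

```python
from datetime import datetime, timedelta, timezone

# Assumption: how long a node may stay not Ready before it is replaced.
GRACE = timedelta(minutes=10)


def is_dead(ready_status, last_transition, now, grace=GRACE):
    """True when the Ready condition has been False/Unknown for longer than grace."""
    return ready_status != "True" and now - last_transition > grace


def instance_id(provider_id):
    """Extract the EC2 instance ID from a providerID like aws:///us-east-1a/i-0abc."""
    return provider_id.rsplit("/", 1)[-1]


def remediate():
    # Imported lazily so the helpers above work without these packages.
    import boto3
    from kubernetes import client, config

    config.load_kube_config()  # inside a pod, use config.load_incluster_config()
    v1 = client.CoreV1Api()
    ec2 = boto3.client("ec2")
    now = datetime.now(timezone.utc)

    for node in v1.list_node().items:
        # Step 1: detect nodes whose Ready condition has been stale for too long.
        conditions = node.status.conditions or []
        ready = next((c for c in conditions if c.type == "Ready"), None)
        if ready is None or not node.spec.provider_id:
            continue
        if not is_dead(ready.status, ready.last_transition_time, now):
            continue
        # Step 2: delete the node object from the cluster.
        v1.delete_node(node.metadata.name)
        # Step 3: terminate the EC2 instance behind it.
        ec2.terminate_instances(InstanceIds=[instance_id(node.spec.provider_id)])
```

Calling `remediate()` on a schedule (for example from a cron job) would cover all three steps. Because the Auto Scaling group behind the node group maintains its desired capacity, terminating the instance should cause a replacement to be launched.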

THANKS!!



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
