I'm trying to create a script that looks for pods in `0/1` (Not Ready) status for more than 20 minutes or an hour and deletes them:
BAD_PODS=`kubectl get pods --context ${EKS_CLUSTER_NAME} | grep "0/1" | awk {'print $1'}`
if [ -z "$BAD_PODS" ]; then
    log "No Pod in Not Ready state"
else
    for pod in $BAD_PODS
    do
        duration=`kubectl get pod --context ${EKS_CLUSTER_NAME} $pod | grep "0/1" | awk -F' ' {'print $5'}`
        if [ `echo $duration | egrep "h|d"` ]; then
            log "Pod not running since more than an hour. Deleting it."
            kubectl delete pod --context ${EKS_CLUSTER_NAME} $pod
        elif [ `echo $duration | awk -F'm' {'print $1'}` -gt 20 ]; then
            log "Pod not running since more than 20 minutes. Deleting it."
            kubectl delete pod --context ${EKS_CLUSTER_NAME} $pod
        fi
    done
fi
This, however, does not seem to work when I deploy it to my cluster: I keep getting a CrashLoopBackOff error. I would like some feedback on this.
Solution 1:[1]
Try a command like
kubectl --context ${EKS_CLUSTER_NAME} delete pod $pod
Whole script:
# Names of pods whose READY column shows "0/1"
BAD_PODS=$(kubectl get pods --context "${EKS_CLUSTER_NAME}" | grep "0/1" | awk '{print $1}')
if [ -z "$BAD_PODS" ]; then
    log "No Pod in Not Ready state"
else
    for pod in $BAD_PODS
    do
        # AGE column for this pod, e.g. "45s", "23m", "2h"
        duration=$(kubectl get pod --context "${EKS_CLUSTER_NAME}" "$pod" | grep "0/1" | awk '{print $5}')
        if echo "$duration" | grep -Eq "h|d"; then
            log "Pod not Ready for more than an hour. Deleting it."
            kubectl --context "${EKS_CLUSTER_NAME}" delete pod "$pod"
        elif [ "$(echo "$duration" | awk -F'm' '{print $1}')" -gt 20 ]; then
            log "Pod not Ready for more than 20 minutes. Deleting it."
            kubectl --context "${EKS_CLUSTER_NAME}" delete pod "$pod"
        fi
    done
fi
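One fragility worth noting, independent of flag order: the script parses the human-readable AGE column, so the `-gt 20` test breaks on values like `45s` (not an integer after splitting on `m`) and compound values like `2d1h` never reach the minutes branch. A minimal sketch of a pure-bash helper that normalizes a kubectl AGE value to whole minutes (the function name `age_to_minutes` is my own, not from the original script):

```shell
#!/usr/bin/env bash
# Convert a kubectl AGE string (e.g. "45s", "23m", "2h", "2d1h", "1h30m")
# into a whole number of minutes, rounding seconds down to 0.
age_to_minutes() {
    local age=$1 total=0 num unit
    # Consume leading "<number><unit>" pairs one at a time.
    while [[ $age =~ ^([0-9]+)([dhms]) ]]; do
        num=${BASH_REMATCH[1]}
        unit=${BASH_REMATCH[2]}
        case $unit in
            d) total=$(( total + num * 1440 )) ;;
            h) total=$(( total + num * 60 )) ;;
            m) total=$(( total + num )) ;;
            s) ;;  # seconds contribute nothing at minute granularity
        esac
        age=${age#"${BASH_REMATCH[0]}"}
    done
    echo "$total"
}
```

With a helper like this, both branches collapse into a single check, e.g. `if [ "$(age_to_minutes "$duration")" -gt 20 ]; then ... fi`, since "more than an hour" is already covered by "more than 20 minutes".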
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Harsh Manvar |
