'Correcting clock skew in a GKE cluster
I have the following alert configured in prometheus:
alert: ClockSkewDetected
expr: abs(node_timex_offset_seconds{job="node-exporter"})
> 0.03
for: 2m
labels:
severity: warning
annotations:
message: Clock skew detected on node-exporter {{ $labels.namespace }}/{{ $labels.pod }}. Ensure NTP is configured correctly on this host.
This alert is part of the default kube-prometheus stack which I am using.
I find this alert fires for around 10 mins every day or two.
I'd like to know how to deal with this problem (the alert firing!). It's suggested in this answer that I shouldn't need to run NTP (via a daemonset I guess) myself on GKE.
I'm also keen to use the kube-prometheus defaults where possible - so I'm unsure about increasing the 0.03 value.
Solution 1:[1]
As it turns out, this alert overkill described above can also happen when you have no NTP running anywhere yourself. As stated by @yyyyahir correctly, Google manages NTP for you.
In that case it ended up being a bug in the Prometheus Operator [1] [2] [3] reporting this wrong, which can be fixed by upgrading. There is a lot more details in the issues linked for anyone interested.
I hope answering here makes someone else have one headache less :-)
[1] https://github.com/helm/charts/issues/21941
[2] https://github.com/prometheus-operator/kube-prometheus/issues/495
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | sandrom |
