Detect and set up alerts on clusters being dropped from Envoy

We are using Envoy as a reverse proxy and have a few static and dynamic clusters. I need a way to monitor all the static clusters (all of them are critical) and raise an alert whenever any of them is not reachable. The alerts will help the team take timely action.

I am new to Envoy and still exploring its features. It would be helpful if someone could answer or point me to the right resource.

Thanks



Solution 1:[1]

As far as I know, this is not possible out of the box with Envoy. However, you can use something like Prometheus and Alertmanager to monitor your clusters and create alerts for them.

If you have the admin interface set up (https://www.envoyproxy.io/docs/envoy/v1.21.1/operations/admin), you can query /stats/prometheus to get metrics in Prometheus format.

The following metrics may be useful in your case:

  • envoy_cluster_update_failure{envoy_cluster_name="my-cluster"} : counts failed membership updates for the cluster; it increases when the cluster is not reachable
  • envoy_cluster_update_success{envoy_cluster_name="my-cluster"} : counts successful membership updates for the cluster; it increases when the cluster is reachable
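
As a rough illustration of reading the two counters above from the admin endpoint, here is a minimal Python sketch. The admin address localhost:9901 and the helper names fetch_stats and parse_cluster_counter are assumptions made for this example, not part of Envoy, and the parser only handles simple `name{labels} value` lines.

```python
import re
import urllib.request

# Assumed admin address; the real host and port depend on your Envoy bootstrap config.
ADMIN_URL = "http://localhost:9901/stats/prometheus"

# Matches lines such as:
#   envoy_cluster_update_failure{envoy_cluster_name="my-cluster"} 3
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_cluster_counter(text, metric):
    """Return {cluster_name: value} for one metric in Prometheus text format."""
    result = {}
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # skip "# TYPE ..." and "# HELP ..." comment lines
        match = SAMPLE_RE.match(line)
        if not match or match.group("name") != metric:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels")))
        cluster = labels.get("envoy_cluster_name")
        if cluster is not None:
            result[cluster] = float(match.group("value"))
    return result


def fetch_stats(url=ADMIN_URL, timeout=5):
    """Download the raw Prometheus-format stats from the Envoy admin interface."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling parse_cluster_counter(fetch_stats(), "envoy_cluster_update_failure") against a running Envoy should give one entry per cluster. In practice Prometheus would scrape this endpoint for you; the script is only meant to show what the data looks like.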

I am not an expert in Prometheus/Alertmanager, but an expression like:

increase(envoy_cluster_update_failure{envoy_cluster_name="my-cluster"}[1m]) > 0

should trigger an alert when the cluster my-cluster becomes unreachable.
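
To make the idea behind that expression concrete, here is a small Python sketch of the same check done by hand: compare two samples of the failure counter and flag clusters whose value grew. The function names are hypothetical, and this is only an approximation; Prometheus's increase() also extrapolates over the time window, which this sketch does not attempt.

```python
def counter_increase(previous, current):
    """Increase of a counter between two samples, treating a drop as a reset."""
    if current < previous:
        # The counter went backwards, e.g. because Envoy restarted: count from zero.
        return current
    return current - previous


def clusters_with_new_failures(previous, current):
    """Return the sorted names of clusters whose failure counter grew.

    Both arguments map cluster name -> envoy_cluster_update_failure value,
    taken at two different points in time.
    """
    failing = []
    for cluster, value in current.items():
        if cluster not in previous:
            continue  # only one sample for this cluster, so no increase can be computed
        if counter_increase(previous[cluster], value) > 0:
            failing.append(cluster)
    return sorted(failing)
```

For example, with an earlier sample {"my-cluster": 3.0} and a later sample {"my-cluster": 5.0}, clusters_with_new_failures returns ["my-cluster"], which is the condition under which the expression above would fire.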

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 norbjd