Grafana dashboard best practice for large scale monitoring
We have Spark clusters with 100-200 nodes, and we plot several metrics for the executors and the driver.
We are not sure of the best way to build a dashboard at this scale. Visualizing stats for all 100-200 nodes and their executors doesn't surface problems because there is a lot of noise. It also slows the dashboard down tremendously.
What are some good practices for Grafana dashboards?
- Visualize using top K
- Plot only anomalies? How do we detect anomalies?
- How to reduce noise?
- How to make the dashboard more performant?
We use Prometheus as the backend.
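To make the top-K and anomaly ideas concrete, here is a rough sketch of what we have in mind, written in plain Python rather than as actual dashboard queries. The node names, the CPU values, and the z-score threshold of 3 are all made up for illustration, and a z-score is only one simple way to flag outliers, not necessarily the right one for our metrics.

```python
from statistics import mean, pstdev


def top_k(values, k):
    """Return the k (node, value) pairs with the largest values."""
    return sorted(values.items(), key=lambda kv: kv[1], reverse=True)[:k]


def outliers(values, z_threshold=3.0):
    """Return nodes whose value lies more than z_threshold population
    standard deviations away from the mean across all nodes."""
    mu = mean(values.values())
    sigma = pstdev(values.values())
    if sigma == 0:
        # All nodes report the same value, so nothing stands out.
        return []
    return [n for n, v in values.items() if abs(v - mu) / sigma > z_threshold]


# Hypothetical snapshot: 20 nodes at 40% CPU, one of them spiking to 95%.
cpu = {f"node-{i}": 0.40 for i in range(20)}
cpu["node-7"] = 0.95

worst = top_k(cpu, 3)      # the 3 busiest nodes, highest first
suspects = outliers(cpu)   # nodes far from the cluster-wide mean
```

The idea would be to plot only `worst` or `suspects` instead of all 100-200 series, so each panel draws a handful of lines.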
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
