Quantiles in Prometheus Summary - What can I do with them in PromQL?
I have a summary metric for endpoint latency, "my_metric_api_latency_seconds", with several quantiles calculated by the client (e.g. p50, p90, p95, p99), and a set of labels associated with the metric.
Consider the following time series:
my_metric_api_latency_seconds{host="host-1.win", instance="local", api="/api/foo", status="200", quantile=".95"} = 0.05
my_metric_api_latency_seconds{host="host-2.win", instance="web", api="/api/foo", status="200", quantile=".95"} = 0.76
my_metric_api_latency_seconds{host="host-3.win", instance="native", api="/api/foo", status="200", quantile=".95"} = 0.55
We know that summary quantiles are not aggregatable. On the other hand, because the quantiles are calculated by the client, PromQL queries over them are much faster.
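As a quick illustration of why quantiles from different hosts cannot be combined, here is a small Python sketch with made-up latencies for two hypothetical hosts. The sample data and the nearest-rank `p95` helper are assumptions for illustration only; real Prometheus summaries estimate quantiles on the client, typically over a sliding time window, rather than computing them exactly like this.

```python
def p95(samples):
    """Nearest-rank 95th percentile: smallest value with at least 95% of samples <= it."""
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) using integer math
    return ordered[rank - 1]

# Hypothetical raw latencies in seconds. A summary only exports each host's p95,
# never these raw samples.
host_1 = [0.01] * 900 + [0.05] * 100  # busy host, mostly fast requests
host_2 = [0.5] * 90 + [0.8] * 10      # quiet host, slow requests

per_host = {"host-1.win": p95(host_1), "host-2.win": p95(host_2)}
averaged = sum(per_host.values()) / len(per_host)  # naive "aggregation" of quantiles
true_p95 = p95(host_1 + host_2)                    # requires all raw samples

print(per_host)            # {'host-1.win': 0.05, 'host-2.win': 0.8}
print(round(averaged, 3))  # 0.425
print(true_p95)            # 0.5
```

Neither the average (0.425) nor the maximum (0.8) of the per-host p95 values matches the p95 of the combined traffic (0.5), because the exported quantiles no longer carry how many requests each host served or how they were distributed.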
My questions are:
1. How can I write a PromQL query that gives me the overall p95 latency for the endpoint "/api/foo" across all hosts?
2. If I have another time series for another endpoint, e.g. my_metric_api_latency_seconds{host="host-1.win", instance="local", api="/api/foo2", status="200", quantile=".95"} = 0.05, how can I write a PromQL query that gives me the overall latency of host="host-1.win" aggregated over all the other labels?
Solution 1:[1]
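As you say, quantiles are not aggregatable, so these queries are not possible with this input data. If an average is enough, you can calculate it from the _sum and _count series of the Summary, for example sum(rate(my_metric_api_latency_seconds_sum{api="/api/foo"}[5m])) / sum(rate(my_metric_api_latency_seconds_count{api="/api/foo"}[5m])) for the endpoint across all hosts, or the same query with host="host-1.win" as the matcher for that host across all endpoints. If you want a quantile, use a Histogram instead.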
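The _sum/_count approach works because both series are plain counters, so they can be added across hosts before dividing once. Below is a minimal Python sketch with invented per-host counter increases; in PromQL these values would come from rate() or increase() over the same window, and the window length cancels out in the division. The `windows` data and both helper names are hypothetical.

```python
# Hypothetical increases of the summary's _sum and _count counters over one window.
windows = {
    "host-1.win": {"sum": 12.0, "count": 600},  # 600 requests, 12 s total -> 0.02 s mean
    "host-2.win": {"sum": 48.0, "count": 200},  # 200 requests, 48 s total -> 0.24 s mean
}

def overall_mean(per_host):
    """Mean latency across hosts: add the sums and counts first, then divide once."""
    total_sum = sum(h["sum"] for h in per_host.values())
    total_count = sum(h["count"] for h in per_host.values())
    return total_sum / total_count

def mean_of_means(per_host):
    """Naive alternative that ignores request volume; shown only for comparison."""
    means = [h["sum"] / h["count"] for h in per_host.values()]
    return sum(means) / len(means)

print(overall_mean(windows))             # 0.075
print(round(mean_of_means(windows), 3))  # 0.13
```

Summing before dividing weights each host by its request count, which is why the result (0.075 s) differs from averaging the per-host means (0.13 s) when hosts serve different volumes.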
Solution 2:[2]
Unfortunately, Prometheus-style summary quantiles cannot be aggregated, so switching to Histograms is recommended if you need to aggregate over multiple series. Prometheus provides the histogram_quantile function, which calculates quantiles dynamically from histogram buckets at query time. For example, the following query returns the 95th percentile of the my_metric_api_latency_seconds histogram grouped by host:
histogram_quantile(0.95, sum(rate(my_metric_api_latency_seconds_bucket[5m])) by (host,le))
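To make the histogram_quantile query less of a black box, here is a simplified Python sketch of what it computes. It merges all hosts, i.e. the sum(...) by (le) variant without host, which is what a fleet-wide p95 for one endpoint would need. The bucket boundaries and counts are made up, and the helper names are hypothetical. The interpolation follows the behaviour the Prometheus documentation describes (linear distribution within a bucket, a lower bound of 0 for the first bucket, and the second-highest bound returned when the rank lands in the +Inf bucket), while other edge cases are deliberately ignored.

```python
import math

# Hypothetical increases of my_metric_api_latency_seconds_bucket per host over one window.
# Each entry is (le upper bound in seconds, cumulative count of requests <= le).
host_buckets = {
    "host-1.win": [(0.1, 900), (0.25, 985), (0.5, 1000), (1.0, 1000), (math.inf, 1000)],
    "host-2.win": [(0.1, 0), (0.25, 10), (0.5, 95), (1.0, 100), (math.inf, 100)],
}

def sum_by_le(per_host):
    """Like sum(...) by (le): bucket counts are plain counters, so they simply add."""
    totals = {}
    for buckets in per_host.values():
        for le, count in buckets:
            totals[le] = totals.get(le, 0) + count
    return sorted(totals.items())

def bucket_quantile(q, buckets):
    """Simplified sketch of histogram_quantile's linear interpolation within a bucket."""
    total = buckets[-1][1]  # the +Inf bucket holds the total number of observations
    if total == 0 or not 0 < q <= 1:
        return math.nan  # simplification: empty histograms and odd q are just NaN here
    rank = q * total
    prev_le, prev_count = 0.0, 0  # lower bound of the first bucket assumed to be 0
    for le, count in buckets:
        if count >= rank:
            if math.isinf(le):
                return prev_le  # rank is in the +Inf bucket: highest finite bound
            # Assume observations are spread evenly between prev_le and le.
            return prev_le + (le - prev_le) * (rank - prev_count) / (count - prev_count)
        prev_le, prev_count = le, count
    return math.nan  # only reached if the buckets are malformed

merged = sum_by_le(host_buckets)
print(bucket_quantile(0.95, merged))  # 0.375
```

Because the buckets are summed before a single interpolation, the result describes all requests from both hosts, which pre-computed summary quantiles cannot provide. The estimate can only be placed somewhere inside the bucket containing the requested rank (here between 0.25 s and 0.5 s), which is why the choice of buckets matters.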
P.S. It may be hard to choose the correct set of buckets for Prometheus-style histograms. See this article for possible issues and solutions.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | brian-brazil |
| Solution 2 | valyala |
