Why does stddev_over_time increase the bigger the range vector is?

I am setting up anomaly detection for our web application based on the rate of traffic. Traffic is usually approx. 0.6 requests per second, as you can see with this rate query. We have a cluster of several instances of the inspected application, so I need to aggregate using sum to determine the total req/s (or using avg for the std deviation below).

sum(rate(http_server_requests_seconds_count[1m]))

(screenshot: average rate of traffic per second in a 1m interval)
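For context: without the sum, the query returns one time series per JVM instance rather than a single cluster-wide value (the instance labels in the comments are just illustrative):

# one series per JVM instance, e.g. {instance="app-1"}, {instance="app-2"}, ...
rate(http_server_requests_seconds_count[1m])

# one cluster-wide series: total requests per second across all instances
sum(rate(http_server_requests_seconds_count[1m]))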

When I do stddev_over_time with an interval of 1m, it looks plausible. Note that I need to filter out 0s, because stddev_over_time cannot calculate a std deviation when a particular JVM didn't receive any traffic at that point in time, and we get 0 instead:

avg(stddev_over_time(http_server_requests_seconds_count[1m]) != 0)

This works fine, and values are in the expected range of 0.5 (no deviation) to around 1-2 (relatively improbable deviation).
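To make the effect of the filter explicit: the != 0 comparison drops series whose value is 0 before avg sees them.

# without the filter, idle instances contribute 0 and drag the average down
avg(stddev_over_time(http_server_requests_seconds_count[1m]))

# with the filter, series evaluating to 0 are dropped before averaging
avg(stddev_over_time(http_server_requests_seconds_count[1m]) != 0)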

I want to calculate the z-score to detect the traffic anomaly you can clearly see at 11am in the first screenshot at the top (full loss of traffic -> alert!).

The z-score formula is defined as follows:

z = (datapoint - mean_traffic) / mean_std_deviation
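For illustration, plugging in numbers (the std deviation of 0.2 is just an assumed value here): with a mean of 0.6 req/s, a full loss of traffic (datapoint = 0) gives

z = (0 - 0.6) / 0.2 = -3

i.e. three standard deviations below the mean, which is exactly the outlier I want to alert on.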

Thus, I want something like this:

z = (sum(rate[1m]) - sum(rate[10m])) / avg(stddev_over_time[10m])

or, spelling out the metric names:

(sum(rate(http_server_requests_seconds_count[1m])) - sum(rate(http_server_requests_seconds_count[10m])))
/ avg(stddev_over_time(http_server_requests_seconds_count[10m]) != 0)

However, this does not work: as soon as I increase the range vector of stddev_over_time to 10m, the values seem to somehow add up and no longer reflect reality (std deviation of more than 1). If I increase it further, e.g. to 30m, I end up with values of more than 5.

(screenshot: std deviation going up when increasing the range vector)

What I want, though, is a 10m moving average of the std deviation, because I need it to determine whether the current rate of traffic deviates from the average std deviation (which is what the z-score is all about).
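Perhaps what I actually need is the std deviation of the rate itself rather than of the raw counter. With subqueries (available since Prometheus 2.7, if I'm not mistaken) I imagine it would look roughly like the sketch below, where the 1m resolution is chosen arbitrarily. I am not sure this is correct, though:

# 10m moving average of the cluster-wide 1m rate
avg_over_time(sum(rate(http_server_requests_seconds_count[1m]))[10m:1m])

# std deviation of the cluster-wide 1m rate over the last 10m
stddev_over_time(sum(rate(http_server_requests_seconds_count[1m]))[10m:1m])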



Sources

Source: Stack Overflow, licensed under CC BY-SA 3.0.