'Is there a fast running rolling standard deviation algorithm?
I have a Python script in which for every new sample i have to update the standard deviation of this samples array using a rolling window of length N. Using the simple formula of the standard deviation the code is really slow. I found many different solutions for online calculation but all of them are NOT considering a rolling window for the update. Some of alternative way for computing the variance are explained here (Welford algorithm, parallel, ...) https://en.m.wikipedia.org/wiki/Algorithms_for_calculating_variance but none of them are actually using a rolling window through data set. What i'm looking for is a fast algorithm which won't be prone to catastrophic cancellation phenomenon. Formulas will be appreciated.
Thanks for your help guys.
Solution 1:[1]
Here's an adaptation of the code at the link I put in a comment. It takes O(1) (constant) time for each element added, regardless of window size. Note that if you configure for a window of N elements, the first N-1 results are more-than-less gibberish: it initializes the data to N zeroes. The class also saves the most recent N entries in a collections.deque of maximum size N. Note that this computes the "sample" standard deviation, not "population". Season to taste ;-)
from collections import deque
from math import sqrt
class Running:
def __init__(self, n):
self.n = n
self.data = deque([0.0] * n, maxlen=n)
self.mean = self.variance = self.sdev = 0.0
def add(self, x):
n = self.n
oldmean = self.mean
goingaway = self.data[0]
self.mean = newmean = oldmean + (x - goingaway) / n
self.data.append(x)
self.variance += (x - goingaway) * (
(x - newmean) + (goingaway - oldmean)) / (n - 1)
self.sdev = sqrt(self.variance)
Here's a by-eyeball sanity check. Seems fine. But note that the statistics module makes heroic (and slow!) efforts to maximize floating-point accuracy. The code above just accepts accumulating half a dozen fresh rounding errors per element added.
import statistics
from random import random
from math import ulp
r = Running(50)
for i in range(1000000):
r.add(random() * 100)
assert len(r.data) == 50
if i % 1000 == 0.0:
a, b = r.mean, statistics.mean(r.data)
print(i, "mean", a, b, (a - b) / ulp(b))
a, b = r.sdev, statistics.stdev(r.data)
print(i, "sdev", a, b, (a - b) / ulp(b))
Sample output (will vary across runs):
0 mean 1.4656985567210468 1.4656985567210468 0.0
0 sdev 10.364053886327875 10.364053886327877 -1.0
1000 mean 50.73313401192864 50.73313401192878 -20.0
1000 sdev 31.06576415649153 31.06576415649151 5.0
2000 mean 50.4175663202043 50.41756632020437 -10.0
2000 sdev 27.692406266774878 27.69240626677488 -1.0
3000 mean 53.054435599235525 53.0544355992356 -11.0
3000 sdev 32.439246859431556 32.439246859431606 -7.0
4000 mean 51.66216784517698 51.662167845177 -3.0
4000 sdev 31.026902004950404 31.02690200495047 -18.0
5000 mean 54.08949367166644 54.089493671666425 2.0
5000 sdev 29.405357061221196 29.40535706122128 -24.0
...
Solution 2:[2]
I have a function that I use for standard deviation. I modified it from a php function that calculated standard deviation. You can input an array (or a slice of an array) into it, and it will calculate the standard deviation for that array or slice.
def calc_std_dev(lst, precision=0, sample=True):
"""
:param: lst A Python list containing values
:param: precision The number of decimal places desired.
:param: sample Is the data a sample or a population? Set to True by
default.
"""
sum = 0;
length = len(lst)
if length == 0:
print("The array has zero elements.")
return false
elif (length == 1) and (sample == True):
print("The array has only 1 element.")
return false
else:
sum = math.fsum(lst)
# Calculate the arithmetic mean
mean = sum / length
carry = 0.0
for i in lst:
dev = i - mean
carry += dev * dev
if sample == True:
length = length - 1
variance = carry / length
std_dev = math.sqrt(variance)
std_dev = round(std_dev, precision)
return std_dev
When I need a rolling standard deviation, I pass in a slice of the total list to calculate the value.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Tim Peters |
| Solution 2 | Greg G |
