'Is there a fast running rolling standard deviation algorithm?

I have a Python script in which for every new sample i have to update the standard deviation of this samples array using a rolling window of length N. Using the simple formula of the standard deviation the code is really slow. I found many different solutions for online calculation but all of them are NOT considering a rolling window for the update. Some of alternative way for computing the variance are explained here (Welford algorithm, parallel, ...) https://en.m.wikipedia.org/wiki/Algorithms_for_calculating_variance but none of them are actually using a rolling window through data set. What i'm looking for is a fast algorithm which won't be prone to catastrophic cancellation phenomenon. Formulas will be appreciated.

Thanks for your help guys.



Solution 1:[1]

Here's an adaptation of the code at the link I put in a comment. It takes O(1) (constant) time for each element added, regardless of window size. Note that if you configure for a window of N elements, the first N-1 results are more-than-less gibberish: it initializes the data to N zeroes. The class also saves the most recent N entries in a collections.deque of maximum size N. Note that this computes the "sample" standard deviation, not "population". Season to taste ;-)

from collections import deque
from math import sqrt

class Running:
    def __init__(self, n):
        self.n = n
        self.data = deque([0.0] * n, maxlen=n)
        self.mean = self.variance = self.sdev = 0.0

    def add(self, x):
        n = self.n
        oldmean = self.mean
        goingaway = self.data[0]
        self.mean = newmean = oldmean + (x - goingaway) / n
        self.data.append(x)
        self.variance += (x - goingaway) * (
            (x - newmean) + (goingaway - oldmean)) / (n - 1)
        self.sdev = sqrt(self.variance)

Here's a by-eyeball sanity check. Seems fine. But note that the statistics module makes heroic (and slow!) efforts to maximize floating-point accuracy. The code above just accepts accumulating half a dozen fresh rounding errors per element added.

import statistics
from random import random
from math import ulp
r = Running(50)
for i in range(1000000):
    r.add(random() * 100)
    assert len(r.data) == 50
    if i % 1000 == 0.0:
        a, b = r.mean, statistics.mean(r.data)
        print(i, "mean", a, b, (a - b) / ulp(b))
        a, b = r.sdev, statistics.stdev(r.data)
        print(i, "sdev", a, b, (a - b) / ulp(b))

Sample output (will vary across runs):

0 mean 1.4656985567210468 1.4656985567210468 0.0
0 sdev 10.364053886327875 10.364053886327877 -1.0
1000 mean 50.73313401192864 50.73313401192878 -20.0
1000 sdev 31.06576415649153 31.06576415649151 5.0
2000 mean 50.4175663202043 50.41756632020437 -10.0
2000 sdev 27.692406266774878 27.69240626677488 -1.0
3000 mean 53.054435599235525 53.0544355992356 -11.0
3000 sdev 32.439246859431556 32.439246859431606 -7.0
4000 mean 51.66216784517698 51.662167845177 -3.0
4000 sdev 31.026902004950404 31.02690200495047 -18.0
5000 mean 54.08949367166644 54.089493671666425 2.0
5000 sdev 29.405357061221196 29.40535706122128 -24.0
...

Solution 2:[2]

I have a function that I use for standard deviation. I modified it from a php function that calculated standard deviation. You can input an array (or a slice of an array) into it, and it will calculate the standard deviation for that array or slice.

def calc_std_dev(lst, precision=0, sample=True):
"""
:param: lst A Python list containing values
:param: precision The number of decimal places desired.
:param: sample Is the data a sample or a population? Set to True by
default.
"""
sum = 0;
length = len(lst)
if length == 0:
    print("The array has zero elements.")
    return false
elif (length == 1) and (sample == True):
    print("The array has only 1 element.")
    return false
else:
    sum = math.fsum(lst)
#  Calculate the arithmetic mean
mean = sum / length
carry = 0.0
for i in lst:
    dev = i - mean
    carry += dev * dev
if sample == True:
    length = length - 1
variance = carry / length
std_dev = math.sqrt(variance)
std_dev = round(std_dev, precision)
return std_dev

When I need a rolling standard deviation, I pass in a slice of the total list to calculate the value.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Tim Peters
Solution 2 Greg G