What is the difference between the autocorrelation functions provided by statsmodels, scipy & numpy?
I can see that several libraries provide functions for computing the autocorrelation of a signal in Python.
I've tried the following three functions, and each gives a different output for the sample x = [22, 24, 25, 25, 28, 29, 34, 37, 40, 44, 51, 48, 47, 50, 51]
1. Using statsmodels
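import matplotlib.pyplot as plt
import statsmodels.tsa.stattools  # import the submodule explicitly so that statsmodels.tsa.stattools resolves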
res = statsmodels.tsa.stattools.acf(x)
plt.plot(res)
plt.show()
2. Using scipy
import scipy.signal as signal
res = signal.correlate(x, x, mode = 'same')
res_au = (res-min(res))/(max(res)-min(res))
plt.plot(res_au)
plt.show()
3. Using numpy
import numpy
res = numpy.correlate(x, x, mode='same')
res_norm = (res-min(res))/(max(res)-min(res))
plt.plot(res_norm)
plt.show()
Can anyone please explain the differences between them and when each of them should be used?
My objective is to find autocorrelation for a single channel with itself.
Solution 1:[1]
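Your confusion stems from the difference between the statistical (statsmodels.tsa.stattools.acf) and signal-processing (scipy.signal.correlate / numpy.correlate) definitions of autocorrelation. The statistical autocorrelation subtracts the mean and divides by the variance, so it is normalized onto the interval [-1, 1]; the signal-processing version is the raw sum of lagged products. Your attempt at normalization is incorrect: min-max scaling maps the raw values onto [0, 1] rather than turning them into correlation coefficients, and with mode='same' the zero-lag term sits in the middle of the output, not at index 0.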
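To make the point about the question's normalization concrete, here is a minimal sketch using only numpy and the sample x from the question. The names zero_lag_index, lags and scaled are introduced here for illustration and are not part of any library.

```python
import numpy as np

x = np.array([22, 24, 25, 25, 28, 29, 34, 37, 40, 44, 51, 48, 47, 50, 51])

# Raw (signal-processing) autocorrelation: no mean removal, no scaling.
raw = np.correlate(x, x, mode="same")

# With mode="same" the zero-lag term sits in the middle of the output,
# so the array spans negative and positive lags rather than lags 0..N-1.
zero_lag_index = len(x) // 2
lags = np.arange(len(x)) - zero_lag_index

# Min-max scaling (as in the question) maps the values onto [0, 1];
# it does not produce correlation coefficients in [-1, 1].
scaled = (raw - raw.min()) / (raw.max() - raw.min())

print("lag of the peak:", lags[np.argmax(raw)])
print("range after min-max scaling:", scaled.min(), scaled.max())
```

Plotting scaled against lags instead of against the array index shows a symmetric curve peaking at lag 0, which is a different curve from the one statsmodels returns for lags 0 and up.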
Example using numpy.correlate to match output of statsmodels.tsa.stattools.acf:
import numpy as np
import matplotlib.pyplot as plt
x = np.array([22, 24, 25, 25, 28, 29, 34, 37, 40, 44, 51, 48, 47, 50, 51])
def acorr(x, lags):
    # Remove the mean, keep the non-negative lags, then normalize by N * variance
    x_demeaned = x - x.mean()
    corr = np.correlate(x_demeaned, x_demeaned, 'full')[len(x)-1:] / (np.var(x) * len(x))
    return corr[:len(lags)]
plt.plot(acorr(x, range(len(x))))
plt.show()
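As a cross-check on the approach above, the same quantity can be computed straight from the definition of the sample autocorrelation (the sum of lagged products of the demeaned series, divided by the sum of squared deviations). This is only a sketch: acf_direct and acf_correlate are hypothetical helper names, and nlags is an assumed parameter giving the largest lag to return.

```python
import numpy as np

def acf_direct(x, nlags):
    """Sample ACF computed term by term from its definition."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    denom = np.sum(d * d)  # equals np.var(x) * len(x)
    return np.array(
        [np.sum(d[: len(x) - k] * d[k:]) / denom for k in range(nlags + 1)]
    )

def acf_correlate(x, nlags):
    """Same quantity via np.correlate, keeping only the non-negative lags."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    full = np.correlate(d, d, mode="full")
    return full[len(x) - 1 :][: nlags + 1] / (np.var(x) * len(x))

x = [22, 24, 25, 25, 28, 29, 34, 37, 40, 44, 51, 48, 47, 50, 51]
a = acf_direct(x, nlags=10)
b = acf_correlate(x, nlags=10)
print(np.allclose(a, b))
```

Both versions return 1 at lag 0 and values within [-1, 1] elsewhere, which is the normalization the statistical definition requires.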
Related question: How can I use numpy.correlate to do autocorrelation?
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |




