'determining which signals contain periodicity
I have thousands of signals, some noisy and some with periodic features. Auto-correlation seems like a natural check:
import numpy as np
def autocorr(x):
n = x.size
norm = x - np.mean(x)
result = np.correlate(norm, norm, mode="same")
acorr = result[n // 2 + 1 :] / (x.var() * np.arange(n - 1, n // 2, -1))
return acorr
Below is a plot showing a set of signals, left column being the raw signals, middle being the autocorrelation and far right being the FFT. An additional point about the data - for each batch of signals (~20 per batch, one batch plotted here), the ones that are periodic are highly correlated to one another as well (i.e. same frequency), opening up e.g. clustering/classification options too, though that's probably better done on the smoother autocorrelations.
Although it is quite clear which are periodic from the autocorrelation, it still recasts it as another problem, determining which of those are periodic. Any good ideas on distinguishing say rows 1, 7, 13 from the others? I thought about doing peak detection on the autocorrelation function but I'm sure there are cleaner ways.
I have scaled the alpha of each line to be the inverse of the Durbin-Watson statistic of the autocorrelation result:
from statsmodels.regression.linear_model import OLS
import numpy as np
from statsmodels.stats.stattools import durbin_watson
def dw(data):
ols_res = OLS(data, np.ones(len(data))).fit()
dwout = durbin_watson(ols_res.resid)
return dwout
Though this isn't entirely clear as some of the noisy signals are still highlighted and periodic signals (e.g. 1) slip through.
This looks as such:
fig, ax = plt.subplots(18, 3, sharex=True, constrained_layout=True, figsize=(16, 12))
for item in np.arange(tracks.shape[1]):
sig = tracks[:, item]
acc = autocorr(sig)
dout = dw(autocorr(sig)) # but scaled elsewhere between 0.1 to 1.
ff = np.abs(np.fft.fft(acc))
ax[item, 0].plot(sig, alpha=dout, color="k")
ax[item, 1].plot(acc, alpha=dout, color="k")
ax[item, 2].plot(ff, alpha=dout, color="k")
plt.show()
Traces CSV here: https://www.dropbox.com/s/atr9k56vwkq5mq7/traces.csv?dl=0
Note: these are just some examples of thousands - some are higher/lower wavelength than these few examples that have them.
Update: I thought given my autocorrelation is maximally 1, I could just do peak detection with a threshold of e.g. 0.8 and see if there is more than 1 peak detected (with some minimum distance between them) e.g.
peaks = peakutils.indexes(acc, thres=0.8, min_dist=10, thres_abs=True)
Seems unclean though.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|


