Rolling correlation in pandas returning NaNs
I would like to find the best match of a small array "b" (a few hundred elements) within a much larger array "a" (a few million elements). I am trying to use pandas' rolling and corr to compare "b" with a sliding window over "a". This is my code:
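import pandas as pd
# <file1> and <file2> stand for the actual file paths
a = pd.read_csv(<file1>)  # long series, about 1.8 million rows
b = pd.read_csv(<file2>)  # short pattern, 292 rows
# z-score normalization (normalized_a and normalized_b are not used below)
normalized_a = (a - a.mean()) / a.std()
normalized_b = (b - b.mean()) / b.std()
# rolling correlation of a with b over windows of len(b) rows
res = a.rolling(window=len(b)).corr(b)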
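A vectorized way to do the sliding comparison described at the top (Pearson correlation of b against every len(b)-long window of a) is sketched below. It is plain NumPy rather than a pandas method, and `sliding_pearson` is a hypothetical helper name. The numerator comes from `np.correlate` and the per-window variances from cumulative sums, so no millions-by-hundreds array is built. Pearson correlation is unchanged by shifting and positive rescaling, so the z-score normalization above does not alter the values; centring a first can still reduce round-off in the cumulative-sum variance.

```python
import numpy as np


def sliding_pearson(a, b):
    """Hypothetical helper: Pearson correlation of b with every window of a.

    Entry k is the correlation of a[k:k + len(b)] with b.
    """
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    m = len(b)
    if m < 2 or len(a) < m:
        raise ValueError("need 2 <= len(b) <= len(a)")

    # Centre b once. Because b_c sums to zero, sum(a_win * b_c) equals
    # sum((a_win - mean(a_win)) * b_c), so a needs no per-window centring.
    b_c = b - b.mean()
    num = np.correlate(a, b_c, mode="valid")  # length len(a) - m + 1

    # Per-window sums of a and a**2 from cumulative sums (O(n) time/memory).
    c1 = np.concatenate(([0.0], np.cumsum(a)))
    c2 = np.concatenate(([0.0], np.cumsum(a * a)))
    s1 = c1[m:] - c1[:-m]
    s2 = c2[m:] - c2[:-m]
    # Sum of squared deviations per window; clip tiny negative round-off.
    ssd_a = np.maximum(s2 - s1 * s1 / m, 0.0)

    denom = np.sqrt(ssd_a) * np.sqrt(np.dot(b_c, b_c))
    with np.errstate(invalid="ignore", divide="ignore"):
        # (near-)constant windows give NaN/inf or unreliable values
        return num / denom


# Synthetic demo: b is a scaled, shifted copy of a slice of a.
rng = np.random.default_rng(0)
a_demo = rng.standard_normal(1_000)
b_demo = 3.0 * a_demo[400:450] + 1.0
r = sliding_pearson(a_demo, b_demo)
best = int(np.nanargmax(r))
print(best, r[best])  # expected: 400 and a value of about 1.0
```

Applied to the question's frames, something like `sliding_pearson(normalized_a.iloc[:, 0].to_numpy(), b.iloc[:, 0].to_numpy())` should work, assuming each CSV holds one numeric column as the printouts suggest. Entry k of the result corresponds to the window a[k:k + len(b)], which pandas' rolling results would label k + len(b) - 1.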
DataFrame a:
0
0 0.941042
1 0.656281
2 0.969081
3 0.881595
4 0.848359
... ...
1814386 -1.323574
1814387 -1.351035
1814388 -1.359450
1814389 -1.296941
1814390 -1.266813
DataFrame b:
0 -2.256496
1 -2.949674
2 -1.614618
3 -1.784006
4 -0.976331
.. ...
287 0.378578
288 0.247859
289 0.375981
290 0.444575
291 0.450435
However, res contains only NaNs except for a single element (res.count() returns 1):
0
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
... ..
1814386 NaN
1814387 NaN
1814388 NaN
1814389 NaN
1814390 NaN
The only non-NaN element in res is located at row 291 (found with res.idxmax()):
280 NaN
281 NaN
282 NaN
283 NaN
284 NaN
285 NaN
286 NaN
287 NaN
288 NaN
289 NaN
290 NaN
291 -0.134144
292 NaN
293 NaN
294 NaN
295 NaN
296 NaN
297 NaN
298 NaN
299 NaN
Does anybody know why I get all these NaNs? I would have expected to get meaningful values after row 292. Is corr a pairwise operation?
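On the pairwise question: `Rolling.corr(other)` pairs the two objects by index label, so each window of a is correlated with the values of b that carry the same labels, not with b as a whole. The sketch below is a toy reproduction with made-up values (not the question's data), assuming a recent pandas version; it shows the same single-value pattern, alongside a `rolling().apply` version that correlates every full window with b itself.

```python
import numpy as np
import pandas as pd

# Toy stand-ins (made-up values): a longer series and a short pattern.
a = pd.Series([0.9, 0.6, 1.0, 0.8, 0.3, -0.2, -0.9, -1.3])
b = pd.Series([-2.2, -2.9, -1.6])  # index labels 0, 1, 2

# corr(other) aligns a and b by index label. b has no values for labels
# 3..7, and rows 0 and 1 are incomplete windows, so only the window ending
# at label 2 (= len(b) - 1) has three complete pairs.
aligned = a.rolling(window=len(b)).corr(b)
print(aligned.count(), aligned.first_valid_index())  # 1 2

# Sliding b along a instead: correlate each full window of a with b itself.
b_values = b.to_numpy()
sliding = a.rolling(window=len(b)).apply(
    lambda w: np.corrcoef(w, b_values)[0, 1], raw=True
)
print(sliding.count())  # 6 full windows: labels 2..7
```

The `apply` version calls a Python function once per window, so for a few million rows it is likely to be slow; it is only meant to make the difference in semantics concrete. In the toy case the only non-NaN label is 2 = len(b) - 1, which mirrors row 291 in the question, where b's index ends at 291.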
Thanks!
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
