How to replace NaNs by average of preceding and succeeding values in pandas DataFrame?
If I have some missing values and would like to replace every NaN with the average of the preceding and succeeding values, how can I do that?
I know I can use pandas.DataFrame.fillna with the method='ffill' or method='bfill' options to replace the NaN values with the preceding or succeeding values. However, I would like to apply the average of those two values across the DataFrame, without iterating over rows and columns.
Solution 1:[1]
This may be late, but I just had the same question and the only other answer on this page did not meet my expectations. That is why I am answering now.
Your post states that you want to replace the NaNs with averages. Interpolation is not the right answer for that, in my view, because it fills the empty cells along a straight line between the surrounding valid values rather than with their average. If you want to fill each gap with the average of the preceding and succeeding valid values, this code helped me:
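# Fill each gap from the next valid value (bfill) and from the previous one (ffill).
# df.bfill() / df.ffill() replace fillna(method=...), which is deprecated in recent pandas.
dfb = df.bfill()
dff = df.ffill()
# Average the two filled frames element-wise
dfmeans = (dfb + dff) / 2
dfmeans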
For the example DataFrame, the result looks like this:
     A       B
0  1.0   0.250
1  2.1   2.125
2  3.4   2.125
3  4.7   4.000
4  5.6  12.200
5  6.8  14.400
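As you can see, at index 2 of column A both approaches produce 3.4, because there linear interpolation also gives (2.1 + 4.7)/2. In column B, however, the values differ: two consecutive values are missing, so interpolation would fill them with two different values, whereas the average of the neighbours (0.25 + 4.0)/2 = 2.125 is used for both rows.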
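To check the comparison, here is a minimal, self-contained sketch. The input frame is an assumption: it is reconstructed from the result table (known values kept, NaN positions inferred) and is not copied from the original question. The names neighbour_mean and linear are only illustrative.

```python
import numpy as np
import pandas as pd

# Assumed input, reconstructed from the result table (not the original question's data).
df = pd.DataFrame({
    "A": [1.0, 2.1, np.nan, 4.7, 5.6, 6.8],
    "B": [0.25, np.nan, np.nan, 4.0, 12.2, 14.4],
})

# Mean of the nearest valid value above (ffill) and below (bfill) each gap.
neighbour_mean = (df.ffill() + df.bfill()) / 2

# Default linear interpolation, for comparison.
linear = df.interpolate()

print(neighbour_mean)
print(linear)
```

With this assumed input, the two-row gap in column B is filled with 2.125 in both rows by the neighbour average, while linear interpolation fills it with 1.5 and 2.75.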
For a one-line version and its application to time series, see this post: Average between values with unevenly distributed time in Pandas DataFrame
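One possible one-line form of the approach above is sketched below. It is not necessarily the version given in the linked post, and fill_with_neighbour_mean is a hypothetical helper name chosen for illustration. The small example also shows an edge case worth knowing about.

```python
import numpy as np
import pandas as pd

def fill_with_neighbour_mean(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: replace each NaN with the mean of the
    nearest valid values above and below it in the same column."""
    return (frame.ffill() + frame.bfill()) / 2

# Illustrative data with gaps at the start, in the middle, and at the end.
sample = pd.DataFrame({"x": [np.nan, 1.0, np.nan, 3.0, np.nan]})
filled = fill_with_neighbour_mean(sample)
print(filled)
```

Gaps at the very start or end of a column stay NaN, because only one neighbour exists there and the sum of a number and NaN is NaN.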
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | eliasmaxil |
