Why do I get NaN from a new column consisting of the average of other columns?
I need to create a new column containing the average of other columns in the DataFrame. If I list the columns to be averaged manually, everything works, as in this case:
Case 1 - It works and gives me a new column "BL_SFI_AV" of floats.
matrice_clean['BL_SFI_AV'] = matrice_clean[['BL_SFI_01','BL_SFI_02','BL_SFI_03']].mean(axis = 1)
Unfortunately, when I use a slice to give a range of columns instead, the new column is all NaN. I can't figure out where the problem is. Here is the code I use for the slice:
Case 2 - It doesn't raise any error, but the new column BL_SFI_AV is full of NaN.
matrice_clean['BL_SFI_AV'] = matrice_clean.loc['BL_SFI_01':'BL_SFI_03'].mean (axis = 1)
Solution 1:[1]
First, I think your "case 2" example contains an error. With .loc, the first indexer selects rows and the second selects columns, so to slice columns you need to include a row indexer first:
matrice_clean.loc[:, 'BL_SFI_01':'BL_SFI_03'].mean(axis=1)
#                 ^^
#                 Don't forget this
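To make the row/column distinction concrete, here is a minimal sketch. The column names come from the question; the values and the extra "ID" and "OTHER" columns are made up for illustration, and the columns are assumed to be stored in ascending order.

```python
import pandas as pd

# Hypothetical data: column names follow the question, values are invented.
df = pd.DataFrame({
    "ID": [1, 2],
    "BL_SFI_01": [1.0, 4.0],
    "BL_SFI_02": [2.0, 5.0],
    "BL_SFI_03": [3.0, 6.0],
    "OTHER": [100.0, 200.0],
})

# First indexer selects rows (":" keeps all of them), second selects columns.
# Label slices in .loc include both endpoints.
subset = df.loc[:, "BL_SFI_01":"BL_SFI_03"]
selected_columns = list(subset.columns)

# Row-wise mean over only the selected columns.
df["BL_SFI_AV"] = subset.mean(axis=1)
row_means = df["BL_SFI_AV"].tolist()
```

Here only the three BL_SFI columns end up in the slice, so "ID" and "OTHER" do not affect the average.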
Next, I have a theory: in what order do your columns appear?
A label slice selects the columns positioned between the two endpoints, so if the columns are not stored in the order ['BL_SFI_01','BL_SFI_02','BL_SFI_03'], the slice 'BL_SFI_01':'BL_SFI_03' will not have the desired effect.
For example, this works fine:
In [24]: import numpy as np
    ...: import pandas as pd

In [25]: cols = ['BL_SFI_01', 'BL_SFI_02', 'BL_SFI_03']
    ...: df = pd.DataFrame(np.ones((5, 3)), columns=cols)
    ...: df.loc[:, 'BL_SFI_01':'BL_SFI_03'].mean(axis=1)
Out[25]:
0    1.0
1    1.0
2    1.0
3    1.0
4    1.0
dtype: float64
But this returns NaN, because the column order is reversed and the slice therefore selects no columns:
In [26]: cols = ['BL_SFI_03', 'BL_SFI_02', 'BL_SFI_01']
    ...: df = pd.DataFrame(np.ones((5, 3)), columns=cols)
    ...: df.loc[:, 'BL_SFI_01':'BL_SFI_03'].mean(axis=1)
Out[26]:
0   NaN
1   NaN
2   NaN
3   NaN
4   NaN
dtype: float64
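If column order turns out to be the cause, one possible workaround is to select the columns in a way that does not depend on their position. The sketch below reuses the reversed-order frame from the example above. The regular expression assumes that every column to be averaged is named "BL_SFI_" followed by digits, which is a guess based on the names in the question.

```python
import numpy as np
import pandas as pd

# Columns deliberately stored in reverse order, as in the failing example.
cols = ["BL_SFI_03", "BL_SFI_02", "BL_SFI_01"]
df = pd.DataFrame(np.ones((5, 3)), columns=cols)

# The label slice runs from the last position back to the first,
# so it selects zero columns and the row-wise mean is NaN.
empty_slice = df.loc[:, "BL_SFI_01":"BL_SFI_03"]
n_selected = empty_slice.shape[1]

# Option 1: an explicit list of names does not depend on column order.
wanted = ["BL_SFI_01", "BL_SFI_02", "BL_SFI_03"]
mean_from_list = df[wanted].mean(axis=1)

# Option 2: select by name pattern (the pattern itself is an assumption).
mean_from_filter = df.filter(regex=r"^BL_SFI_\d+$").mean(axis=1)
```

Checking `n_selected` (or `empty_slice.columns`) before averaging is a quick way to confirm whether the slice picked up the columns you expected.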
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
