pandas DataFrame function mean() not working correctly to ignore NaN values
By default, the mean() method should ignore NaN values, but in my case it doesn't: the NaN still ends up in the result.
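import numpy as np
import pandas as pd

a = np.array([1, 9])
b = np.array([3, np.nan])  # np.nan; a bare `nan` is undefined without an import
c = np.array([7, 8])
d = {'value': [a, b, a, c], 'group': [3, 3, 4, 4], 'garbage': ['asd', 'acas', 'asdasdc', 'ghfas']}
df = pd.DataFrame(data=d)
df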
OUTPUT:
        value  group  garbage
0      [1, 9]      3      asd
1  [3.0, nan]      3     acas
2      [1, 9]      4  asdasdc
3      [7, 8]      4    ghfas
for i, j in df.groupby('group')['value']:
    print(j.mean())
    print("=========")
OUTPUT:
[ 2. nan]
=========
[4. 8.5]
=========
Solution 1:[1]
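I am not sure what you are trying to do here, but I'll take a stab at it.
First, the value column is a column of NumPy arrays, so it has object dtype and each cell holds a whole array. When you run groupby, j becomes a pd.Series of NumPy arrays. When you call mean() on it, pandas adds the arrays together element by element and divides by the number of rows. The NaN skipping only applies to missing entries of the Series itself, not to NaN values inside each array, so the NaN in [3, nan] propagates into the result. This is also inadvisable because the arrays can differ in shape, which will cause an error when they cannot be aligned.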
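To make that concrete, here is a minimal sketch that reproduces the first group's result by hand. It assumes the mean over a Series of arrays reduces to "add the arrays, divide by the count", which matches the output shown in the question; the variable name manual is only for illustration.

```python
import numpy as np
import pandas as pd

# Group 3 from the question: a Series whose two cells are NumPy arrays.
j = pd.Series([np.array([1, 9]), np.array([3, np.nan])])

# Add the arrays element by element, then divide by the number of rows.
# Neither cell is itself missing, so nothing gets skipped.
manual = (j.iloc[0] + j.iloc[1]) / len(j)
print(manual)  # [ 2. nan]
```

The second position is (9 + nan) / 2, and any arithmetic with NaN yields NaN, so the result matches what the question reports.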
I think what you are trying to do is take a single mean across all the values in all the arrays of each group, ignoring NaN. You can do that with:
for i, j in df.groupby('group')['value']:
    print(np.nanmean(np.concatenate(j.values)))
Whatever you are trying to do, the data will be much easier to work with once you combine the arrays of each group into a single array inside your loop.
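If what you actually want is a per-position mean that skips NaN (one value per array slot rather than one scalar per group), a possible variation is to stack the arrays and reduce along the first axis. This is only a sketch: it assumes every array within a group has the same length, and elementwise_nanmean is a hypothetical helper name, not a pandas or NumPy function.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "value": [np.array([1, 9]), np.array([3, np.nan]),
              np.array([1, 9]), np.array([7, 8])],
    "group": [3, 3, 4, 4],
})

def elementwise_nanmean(arrays):
    """Hypothetical helper: NaN-aware mean per position.

    Assumes all arrays in `arrays` share the same length.
    """
    stacked = np.stack(list(arrays))    # shape: (n_rows, array_length)
    return np.nanmean(stacked, axis=0)  # average down the rows, skipping NaN

result = {key: elementwise_nanmean(vals)
          for key, vals in df.groupby("group")["value"]}

for key, mean in result.items():
    print(key, mean)
```

For group 3 this averages 1 and 3 in the first slot and uses only the 9 in the second slot, since the NaN is skipped there.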
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Collin Cunningham |
