pandas DataFrame function mean() not working correctly to ignore NaN values

By default, the mean() method should ignore NaN values, but in my case it doesn't. The NaN still ends up in the result.

import numpy as np
import pandas as pd

a = np.array([1, 9])
b = np.array([3, np.nan])
c = np.array([7, 8])
d = {'value': [a, b, a, c], 'group': [3, 3, 4, 4], 'garbage': ['asd', 'acas', 'asdasdc', 'ghfas']}
df = pd.DataFrame(data=d)
df

OUTPUT:
        value  group  garbage
0      [1, 9]      3      asd
1  [3.0, nan]      3     acas
2      [1, 9]      4  asdasdc
3      [7, 8]      4    ghfas

for i,j in df.groupby('group')['value']:
    print(j.mean())
    print("=========")

OUTPUT:
[ 2. nan]
=========
[4.  8.5]
=========

pandas

Solution 1:^[1]

I am not sure what you are trying to do here, but I'll take a stab at it.

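Firstly, the value column is a column of NumPy arrays, so each cell holds a whole array rather than a single number. When you run groupby, j becomes a pd.Series of NumPy arrays. Thus, when you call mean() you are in effect adding the arrays together position by position and dividing by how many there are. The NaN skipping only applies to entries of the Series that are themselves missing, not to a NaN buried inside one of the arrays, so the NaN carries through into the result. This is pretty inadvisable in any case, because the arrays can differ in shape, which will cause an error.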
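To make that concrete, here is a minimal sketch using the first two rows from the question. It does not call pandas' mean() itself; the manual sum-and-divide is my assumption about what mean() amounts to for a column like this, and it reproduces the [ 2. nan] from the question's output.

```python
import numpy as np
import pandas as pd

# Two rows from the question: each cell of 'value' holds a whole NumPy array.
df = pd.DataFrame({'value': [np.array([1, 9]), np.array([3, np.nan])]})
s = df['value']

# Missing-value detection works per cell. Neither cell is itself missing,
# so there is nothing for skipna to skip.
cell_is_missing = s.isna().tolist()
print(cell_is_missing)

# Assumed equivalent of the mean: add the arrays position by position and
# divide by the number of cells. NumPy addition propagates the inner NaN.
manual_mean = (s.iloc[0] + s.iloc[1]) / len(s)
print(manual_mean)
```

The first print shows that pandas sees no missing cells, and the second shows the NaN surviving in the second position.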

I think what you are trying to do is take the mean of all the values across the arrays in each group. You can do that with:

for i,j in df.groupby('group')['value']:
    print(np.nanmean(np.concatenate(j.values)))

Whatever you are trying to do, it is going to be way easier to interact with once you combine the values in your loop.
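If what you actually want is one mean per array position with NaN ignored (so group 3 would give a value for the second position instead of nan), a possible sketch is below. It assumes every array in a group has the same length, and positionwise_nanmean is a hypothetical helper name, not something from pandas or NumPy.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'value': [np.array([1, 9]), np.array([3, np.nan]),
              np.array([1, 9]), np.array([7, 8])],
    'group': [3, 3, 4, 4],
})

def positionwise_nanmean(arrays):
    """Hypothetical helper: stack equal-length arrays into a 2-D array
    and average down each column, ignoring NaN."""
    stacked = np.stack(list(arrays))   # shape: (n_rows_in_group, array_length)
    return np.nanmean(stacked, axis=0)

# One result array per group key.
result = {key: positionwise_nanmean(values)
          for key, values in df.groupby('group')['value']}

for key, means in result.items():
    print(key, means)
```

np.stack raises an error if the arrays in a group differ in length, which is one more reason to settle on a consistent shape before grouping.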

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Collin Cunningham
