Replacing zero values with mean of other columns
I'm trying to replace the zero values in the column "gps_height". Each zero should become the average "gps_height" of its group, where the groups are defined by the column "ward". However, when I ran this code, it raised an error:
df.groupby('ward')['gps_height'].transform(lambda x: df.gps_height.mean() if x == 0 else x)
Solution 1:[1]
This can be improved further, but it works.
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 10], [1, 0], [1, 24], [2, 15], [2, 0], [3, 23]], columns=['ward', 'gps_height'])
df['gps_height'] = df['gps_height'].replace(0, np.nan)  # replace 0 with NaN so zeros are excluded from the average
df2 = df.groupby(['ward'], as_index=False).mean()  # mean gps_height per ward (NaN is skipped)
df = df.merge(df2, on='ward', how='outer')  # merge; columns become gps_height_x (original) and gps_height_y (ward mean)
df.loc[pd.isnull(df.gps_height_x), 'gps_height_x'] = df.gps_height_y  # fill NaN with the ward mean
df = df[['ward', 'gps_height_x']]  # keep only ward and the filled column
df.columns = ['ward', 'gps_height']  # rename columns
df
Result:
   ward  gps_height
0     1        10.0
1     1        17.0
2     1        24.0
3     2        15.0
4     2        15.0
5     3        23.0
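As one possible improvement, the merge can be avoided with `groupby(...).transform("mean")`, which returns the ward mean aligned to every row. The code in the question fails because the lambda passed to `transform` receives a whole Series per group, so `x == 0` is itself a Series and cannot be used in a plain `if`. The sketch below reuses the sample data from this answer; the names `heights` and `ward_mean` are illustrative, not from the original post.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, 10], [1, 0], [1, 24], [2, 15], [2, 0], [3, 23]],
    columns=["ward", "gps_height"],
)

# Treat zeros as missing so they do not pull the group mean down.
heights = df["gps_height"].replace(0, np.nan)

# transform("mean") yields one value per row: the mean of that row's ward,
# computed with NaN (the former zeros) skipped.
ward_mean = heights.groupby(df["ward"]).transform("mean")

# Fill only the missing entries with their ward's mean.
df["gps_height"] = heights.fillna(ward_mean)
print(df)
```

With the sample data this gives the same values as the merge-based version. Note that a ward whose values are all zero has no mean to fall back on, so its rows stay NaN.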
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | jose_bacoy |
