Python: replacing outlier values with median values
I have a pandas DataFrame that contains some outlier values. I would like to replace them with the median of the data, computed as if those outliers were not there.
id Age
10236 766105
11993 288
9337 205
38189 88
35555 82
39443 75
10762 74
33847 72
21194 70
39450 70
So, I want to replace all the values > 75 with the median of the remaining values, i.e., the median of 70, 70, 72, 74, 75, which is 72.
I'm trying to do the following:
- Replace all values greater than 75 with 0.
- Replace the 0s with the median value.
But somehow, the code below is not working:
df['age'].replace(df.age>75,0,inplace=True)
Solution 1:[1]
A more general solution I've tried lately: replace the hard-coded 75 with the median of the whole column, then follow an approach similar to what Bharath suggested:
import numpy as np
median = float(df['Age'].median())
# values above the column median are set to the median; the rest are kept
df['Age'] = np.where(df['Age'] > median, median, df['Age'])
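To see what this does on the sample data from the question, here is a minimal, self-contained sketch. The DataFrame is rebuilt by hand from the table above, and the result should be read as an illustration only. Note that the median here is taken over all ten values, outliers included, so it differs from the "median of the remaining values" the question asks for.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id": [10236, 11993, 9337, 38189, 35555, 39443, 10762, 33847, 21194, 39450],
    "Age": [766105, 288, 205, 88, 82, 75, 74, 72, 70, 70],
})

# Median of the whole column (outliers included): 78.5 for this sample
median = float(df["Age"].median())

# Every value above that median is set to the median itself
df["Age"] = np.where(df["Age"] > median, median, df["Age"])

print(median)
print(df["Age"].tolist())
```

With this data the five largest values are all set to 78.5, while 75, 74, 72, 70 and 70 are left unchanged.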
Solution 2:[2]
Your code is almost right, but there is a gap.
Use:
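# replace() looks up values, not a boolean mask, and inplace=True returns None, so use mask() and assign the result
df['age'] = df['age'].mask(df['age'] > 75, 0)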
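The two steps from the question (zero out, then fill with the median) can also be collapsed into one. The sketch below is one possible way to do it, not the answerer's code: it assumes the column is named `age` as in the question's snippet (the sample table shows `Age`), and the names `threshold`, `is_outlier` and `median_of_rest` are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question; column name "age" is an assumption
df = pd.DataFrame({
    "id": [10236, 11993, 9337, 38189, 35555, 39443, 10762, 33847, 21194, 39450],
    "age": [766105, 288, 205, 88, 82, 75, 74, 72, 70, 70],
})

threshold = 75  # cutoff stated in the question

# Boolean mask marking the rows treated as outliers
is_outlier = df["age"] > threshold

# Median of the values that are NOT outliers (70, 70, 72, 74, 75)
median_of_rest = df.loc[~is_outlier, "age"].median()

# Replace the outliers with that median, keep everything else
df["age"] = df["age"].mask(is_outlier, median_of_rest)

print(median_of_rest)
print(df["age"].tolist())
```

Computing the median before touching the column avoids the intermediate 0s, which would otherwise be indistinguishable from genuine zeros in the data.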
Solution 3:[3]
Actually, this is not an efficient way to deal with outliers in data.
You can refer to this article: https://www.kite.com/python/answers/how-to-remove-outliers-from-a-pandas-dataframe-in-python
By calculating z-scores for a column or for the entire dataset, you can detect and replace outliers using a statistical criterion instead of a fixed, hand-picked threshold.
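A rough sketch of the z-score idea on the question's data follows. It is not taken from the linked article: the cutoff of 2 and the variable names are assumptions chosen for illustration, and `Series.std()` is used with its default sample standard deviation.

```python
import pandas as pd

# Ages copied from the question
ages = pd.Series([766105, 288, 205, 88, 82, 75, 74, 72, 70, 70], name="Age")

# z-score: distance from the mean in units of the standard deviation
z_scores = (ages - ages.mean()) / ages.std()

cutoff = 2  # assumed cutoff; tune it for your own data
is_outlier = z_scores.abs() > cutoff

# Median of the values that were not flagged
median_of_rest = ages[~is_outlier].median()

# Replace flagged values with that median
cleaned = ages.mask(is_outlier, median_of_rest)

print(is_outlier.tolist())
print(cleaned.tolist())
```

On this small sample only 766105 is flagged: a single extreme value inflates the standard deviation so much that 288 and 205 end up with small z-scores. For data like this, a fixed threshold or a median-based rule may be a better fit than z-scores.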
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Behnam Moh |
| Solution 2 | S.B |
| Solution 3 | K_one28 |
