Python: replacing outlier values with median values

I have a pandas DataFrame that contains some outlier values. I would like to replace them with the median of the remaining data, i.e. the median computed as if those values were not there.

id         Age
10236    766105
11993       288
9337        205
38189        88
35555        82
39443        75
10762        74
33847        72
21194        70
39450        70

So I want to replace every value > 75 with the median of the remaining values, i.e. the median of 70, 70, 72, 74, 75.

I'm trying to do the following:

  1. Replace with 0, all the values that are greater than 75
  2. Replace the 0s with median value.

But for some reason the code below is not working:

df['age'].replace(df.age>75,0,inplace=True)


Solution 1:[1]

A more general approach I've tried lately: instead of hard-coding 75, use the median of the whole column as the cutoff, then follow an approach similar to what Bharath suggested:

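import numpy as np
median = float(df['Age'].median())
# Values above the whole-column median are set to that median; the rest are kept
df['Age'] = np.where(df['Age'] > median, median, df['Age'])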
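Note that on the sample data the whole-column median is 78.5, so the snippet above caps every value above 78.5 at 78.5. If the goal is instead what the question describes (a fixed cutoff of 75 and the median of the values that remain), a minimal sketch could look like the following. The DataFrame is rebuilt from the question's table with id assumed to be the index, and the names threshold, is_outlier and median_of_rest are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question; using "id" as the index is an assumption
df = pd.DataFrame(
    {"Age": [766105, 288, 205, 88, 82, 75, 74, 72, 70, 70]},
    index=pd.Index(
        [10236, 11993, 9337, 38189, 35555, 39443, 10762, 33847, 21194, 39450],
        name="id",
    ),
)

threshold = 75  # cutoff taken from the question
is_outlier = df["Age"] > threshold

# Median of the values that are NOT outliers (70, 70, 72, 74, 75)
median_of_rest = df.loc[~is_outlier, "Age"].median()

# mask() replaces the entries where the condition is True
df["Age"] = df["Age"].mask(is_outlier, median_of_rest)
```

Here median_of_rest comes out as 72, so the five values above 75 are all set to 72 and the other five are left unchanged.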

Solution 2:[2]

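Your code is almost right, but there is a gap: replace() matches values rather than a boolean condition, and with inplace=True it returns None, so assigning its result back would overwrite the column.
Use a boolean mask with .loc instead:

df.loc[df['Age'] > 75, 'Age'] = 0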
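For the two-step plan from the question, a placeholder of 0 is risky, because the zeros would themselves be counted when the median is computed afterwards. A hedged sketch of the same idea with NaN as the placeholder, relying on the fact that median() skips NaN by default. The Series is rebuilt from the question's table, and masked and filled are illustrative names.

```python
import pandas as pd

# Sample values rebuilt from the question's table
ages = pd.Series([766105, 288, 205, 88, 82, 75, 74, 72, 70, 70], name="Age")

# Step 1: where() keeps entries that satisfy the condition and sets the rest to NaN
masked = ages.where(ages <= 75)

# Step 2: median() ignores NaN, so it only sees 70, 70, 72, 74, 75
filled = masked.fillna(masked.median())
```

The result is a float column, since NaN cannot be stored in an integer column; cast it back with astype(int) if integers are needed.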

Solution 3:[3]

Actually, this is not an efficient way to deal with outliers in data.

You can refer to this article: https://www.kite.com/python/answers/how-to-remove-outliers-from-a-pandas-dataframe-in-python

By calculating z-scores for a column or for the entire dataset, you can identify outliers statistically instead of with a hard-coded threshold, and then replace them.
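As a rough illustration of the z-score idea, here is a minimal sketch; it is not taken from the linked article. The function name replace_outliers_by_zscore and its threshold parameter are hypothetical, and the choice to fill with the median of the non-outlier values follows the question rather than any library convention.

```python
import pandas as pd


def replace_outliers_by_zscore(values: pd.Series, threshold: float = 3.0) -> pd.Series:
    """Replace entries whose absolute z-score exceeds `threshold`
    with the median of the remaining entries."""
    # z-score based on the sample standard deviation (pandas default, ddof=1)
    z = (values - values.mean()) / values.std()
    is_outlier = z.abs() > threshold
    # Median is computed only over the entries that were not flagged
    return values.mask(is_outlier, values[~is_outlier].median())


# Sample values rebuilt from the question's table
ages = pd.Series([766105, 288, 205, 88, 82, 75, 74, 72, 70, 70], name="Age")
cleaned = replace_outliers_by_zscore(ages, threshold=2.0)
```

With only ten rows, a z-score computed from the sample standard deviation cannot exceed about 2.85, so the common cutoff of 3 would flag nothing here; the call above uses 2.0 purely for illustration. At that cutoff only 766105 is replaced, while 288 and 205 are left alone, which shows that a single extreme value can hide milder outliers from a z-score test.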

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Behnam Moh
Solution 2 S.B
Solution 3 K_one28