Drop outliers from a column that contains a string and then integers
I am having trouble dropping outliers based on one column of a DataFrame. The DataFrame is large, but the most important column looks like this:
| Weight |
|---|
| In kilograms |
| 10000000 |
| 70 |
| 92 |
| ... |
I am trying to drop rows based on outliers in this column, i.e. if the weight is an outlier, the whole row should be dropped. The issue is that whenever I use this code:
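```python
col = dataframe['Weight'].drop(index=0)   # drop the "In kilograms" row
col = col.apply(pd.to_numeric)            # convert the remaining values to numbers
Q1 = col.quantile(0.25)                   # first quartile
Q3 = col.quantile(0.75)                   # third quartile
IQR = Q3 - Q1                             # interquartile range
dataframe = dataframe[col > (Q1 - (1.5 * IQR))]   # drop low outliers
dataframe = dataframe[col < (Q3 + (1.5 * IQR))]   # drop high outliers
```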
I get the following error from the second line (`col.apply(pd.to_numeric)`):
```
ValueError: Unable to parse string "In kilograms" at position 0
```
If I remove the second line, I still get an error. Even though I dropped the first row, the comparison still seems to be using it:
```
TypeError: '<' not supported between instances of 'int' and 'str'
```
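A minimal sketch of one way to do this, assuming the goal is the same 1.5 × IQR rule used in the question and that any non-numeric entry in `Weight` should simply be discarded. The helper name `drop_iqr_outliers` and the sample data are hypothetical. The sketch changes three things: `pd.to_numeric(..., errors="coerce")` converts the whole column at once and turns "In kilograms" into NaN instead of raising, no matter which index label that row has (`drop(index=0)` removes by label, not by position); skipping the conversion would leave the values as strings, which is presumably where the str/int comparison error comes from; and a single boolean mask is built from the full column, so its index matches the DataFrame. In the question's code, `col` is missing the dropped row, and the second filter is applied to an already shortened `dataframe`.

```python
import pandas as pd


def drop_iqr_outliers(df: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
    """Hypothetical helper: drop rows whose `column` value lies outside
    (Q1 - k*IQR, Q3 + k*IQR). Non-numeric entries are dropped as well."""
    # Convert the whole column; unparseable strings become NaN instead of
    # raising ValueError.
    values = pd.to_numeric(df[column], errors="coerce")

    # quantile() skips NaN, so the units row does not affect the bounds.
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr

    # One mask over the full column, so its index lines up with df.
    # Comparisons with NaN are False, so non-numeric rows are dropped too.
    mask = (values > lower) & (values < upper)

    out = df[mask].copy()
    out[column] = values[mask]  # keep the numeric version of the column
    return out


# Hypothetical data shaped like the question's column.
df = pd.DataFrame(
    {
        "Weight": ["In kilograms", "10000000", "70", "92", "85", "78", "66", "101"],
        "Name": ["unit", "a", "b", "c", "d", "e", "f", "g"],
    }
)
cleaned = drop_iqr_outliers(df, "Weight")
print(cleaned)  # the "In kilograms" row and the 10000000 row are removed
```

Filtering once with a combined mask avoids the index mismatch that occurs when a mask computed on one version of the data is applied to another.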
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
