How to extract the other row data for outliers shown in a box plot in Python?
This is my pandas data frame:
| Datetime | SN NO. | Values | data1 | data2 | data3 | data4 | data5 | data6 |
|---|---|---|---|---|---|---|---|---|
| 2020-09-29T14:59:13.4461479+02:00 | 701 | 24.511 | 3.556 | 3.557 | 3.555 | 3.551 | 3.559 | 3.555 |
| 2020-09-29T15:48:04.6368679+02:00 | 702 | 24.516 | 3.554 | 3.555 | 3.555 | 3.556 | 3.552 | 3.557 |
| 2020-09-29T15:51:46.2555875+02:00 | 703 | 24.517 | 3.553 | 3.556 | 3.551 | 3.553 | 3.558 | 3.554 |
| 2020-10-01T12:51:59.2687665+02:00 | 704 | 24.519 | 3.552 | 3.557 | 3.556 | 3.559 | 3.557 | 3.557 |
| 2021-02-01T19:27:09.0472459+02:00 | 705 | 24.511 | 3.551 | 3.558 | 3.558 | 3.550 | 3.551 | 3.552 |
| . | . | . | . | . | . | . | . | . |
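import matplotlib.pyplot as plt
from matplotlib.cbook import boxplot_stats
# Box plot of 'Values', grouped by 'Datetime'
boxplot = df.reset_index().boxplot(column=['Values'], by="Datetime", return_type=None)
# Collect the outlier ("flier") values of the 'Values' column
outliers = [y for stat in boxplot_stats(df['Values']) for y in stat['fliers']]
print(outliers)
boxplot.plot()
plt.show()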
[The box plot image is no longer available.]
As the box plot shows, there are some outliers, but I want to extract the other data from the row that contains each outlier value. For example, one outlier in the data frame is 24.519, and I also need the SN NO., data1, data2, data3, and so on for that specific value. What is the best way to do this?
Solution 1:[1]
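To get a DataFrame with all the outlier rows (outlier_values is the list of outlier values, i.e. the outliers list from the question):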
df_outliers = df.loc[df['Values'].isin(outlier_values), :]
To get the rows for a single outlier value (single_value is one value, e.g. 24.519):
df_outliers = df.loc[df['Values'].eq(single_value), :]
If several rows share the same value in the Values column, all of them are returned.
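As a minimal end-to-end sketch of the two selections above: the data below is hypothetical (the last row, SN NO. 706, is an invented outlier that is not in the question's table), and `df_single` is an illustrative name. The flier values come from `boxplot_stats` with its default whiskers of 1.5 times the interquartile range.

```python
import pandas as pd
from matplotlib.cbook import boxplot_stats

# Hypothetical sample data; the last row is an invented outlier.
df = pd.DataFrame({
    "SN NO.": [701, 702, 703, 704, 705, 706],
    "Values": [24.511, 24.516, 24.517, 24.519, 24.511, 24.900],
    "data1": [3.556, 3.554, 3.553, 3.552, 3.551, 3.600],
    "data2": [3.557, 3.555, 3.556, 3.557, 3.558, 3.610],
})

# boxplot_stats returns one dict per data set; 'fliers' holds the outlier values.
outlier_values = boxplot_stats(df["Values"].to_numpy())[0]["fliers"]

# Full rows whose 'Values' entry is one of the flier values.
df_outliers = df.loc[df["Values"].isin(outlier_values), :]
print(df_outliers)

# Rows for one specific value only.
single_value = 24.900
df_single = df.loc[df["Values"].eq(single_value), :]
print(df_single)
```

Because the flier values are taken from the same column they are matched against, the exact float comparison in `isin` and `eq` finds them reliably here.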
To keep only some columns from the original df:
cols = ['data1', 'data2']
df_outliers = df.loc[df['Values'].isin(outlier_values), cols]
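A possible alternative, not part of the answer above, is to skip the list of outlier values and build a boolean mask directly from the 1.5 × IQR fences that matplotlib uses by default for its whiskers. The sketch below assumes hypothetical sample data (row 706 is invented) and combines the mask with a column subset.

```python
import pandas as pd

# Hypothetical sample data; the last row is an invented outlier.
df = pd.DataFrame({
    "SN NO.": [701, 702, 703, 704, 705, 706],
    "Values": [24.511, 24.516, 24.517, 24.519, 24.511, 24.900],
    "data1": [3.556, 3.554, 3.553, 3.552, 3.551, 3.600],
    "data2": [3.557, 3.555, 3.556, 3.557, 3.558, 3.610],
})

# Quartiles and interquartile range of the 'Values' column.
q1 = df["Values"].quantile(0.25)
q3 = df["Values"].quantile(0.75)
iqr = q3 - q1

# True for rows outside the 1.5 * IQR fences.
mask = (df["Values"] < q1 - 1.5 * iqr) | (df["Values"] > q3 + 1.5 * iqr)

# Keep only the columns of interest for the outlier rows.
cols = ["SN NO.", "data1", "data2"]
df_outliers = df.loc[mask, cols]
print(df_outliers)
```

The mask marks rows rather than values, so no float matching is needed and duplicate values are handled the same way as in the `isin` approach.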
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
