How to extract the other data in the row of an outlier shown in a box plot in Python?

This is my pandas DataFrame:

Datetime SN NO. Values data1 data2 data3 data4 data5 data6
2020-09-29T14:59:13.4461479+02:00 701 24.511 3.556 3.557 3.555 3.551 3.559 3.555
2020-09-29T15:48:04.6368679+02:00 702 24.516 3.554 3.555 3.555 3.556 3.552 3.557
2020-09-29T15:51:46.2555875+02:00 703 24.517 3.553 3.556 3.551 3.553 3.558 3.554
2020-10-01T12:51:59.2687665+02:00 704 24.519 3.552 3.557 3.556 3.559 3.557 3.557
2021-02-01T19:27:09.0472459+02:00 705 24.511 3.551 3.558 3.558 3.550 3.551 3.552
. . . . . . . . .
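import matplotlib.pyplot as plt
from matplotlib.cbook import boxplot_stats

# Draw the box plot of 'Values', grouped by 'Datetime'
boxplot = df.reset_index().boxplot(column=['Values'], by="Datetime", return_type=None)

# Collect the outlier (flier) values that matplotlib computes for 'Values'
outliers = [y for stat in boxplot_stats(df['Values']) for y in stat['fliers']]
print(outliers)
plt.show()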

[Image: box plot of Values; the original picture is no longer available]

As shown in the box plot, there are some outliers, but I want to extract the other data in the rows that contain those specific values. (For example, one outlier from the data frame is 24.519, but I also need the other data in that row, such as SN NO., data1, data2, data3, and so on.) What is the best way to do it?



Solution 1:[1]

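To get a DataFrame with all the outlier rows (outlier_values is the list of outlier values, for example the outliers list computed in the question):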

df_outliers = df.loc[df['Values'].isin(outlier_values), :]
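
As a rough end-to-end sketch of how this could fit with the question's code: the frame below is a cut-down, hypothetical version of the question's data (the sixth row, SN 706 with Values 24.900, is invented so that an outlier exists), and outlier_values is built the same way as the question's outliers list.

```python
import pandas as pd
from matplotlib.cbook import boxplot_stats

# Hypothetical sample: the first five rows mirror the question's data,
# the sixth row is invented so that there is an outlier to find.
df = pd.DataFrame({
    "SN NO.": [701, 702, 703, 704, 705, 706],
    "Values": [24.511, 24.516, 24.517, 24.519, 24.511, 24.900],
    "data1": [3.556, 3.554, 3.553, 3.552, 3.551, 3.550],
    "data2": [3.557, 3.555, 3.556, 3.557, 3.558, 3.560],
})

# Same computation as in the question: the values matplotlib would draw as fliers.
stats = boxplot_stats(df["Values"].to_numpy())
outlier_values = [y for stat in stats for y in stat["fliers"]]

# Keep every row whose 'Values' entry is one of the outlier values.
df_outliers = df.loc[df["Values"].isin(outlier_values), :]
print(df_outliers)
```

Because the flier values are taken directly from the same column, the exact match done by isin should line up with the original rows.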

To get the row for a single outlier value:

df_outliers = df.loc[df['Values'].eq(single_value), :]

If multiple rows share the same value in the Values column, all of them are returned.

To keep only some columns from the original df:

cols = ['data1', 'data2']
df_outliers = df.loc[df['Values'].isin(outlier_values), cols]
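
A possible alternative, which is not part of the answer above, is to skip the list of outlier values and build a boolean mask from the usual 1.5 * IQR rule (the default whisker setting of a box plot). This is only a sketch on hypothetical data: the sixth row is invented, and the names mask and df_outliers_iqr are illustrative.

```python
import pandas as pd

# Hypothetical sample: the first five rows mirror the question's data,
# the sixth row is invented so that there is an outlier to find.
df = pd.DataFrame({
    "SN NO.": [701, 702, 703, 704, 705, 706],
    "Values": [24.511, 24.516, 24.517, 24.519, 24.511, 24.900],
    "data1": [3.556, 3.554, 3.553, 3.552, 3.551, 3.550],
    "data2": [3.557, 3.555, 3.556, 3.557, 3.558, 3.560],
})

# Quartiles and interquartile range of the 'Values' column.
q1 = df["Values"].quantile(0.25)
q3 = df["Values"].quantile(0.75)
iqr = q3 - q1

# True for rows lying more than 1.5 * IQR outside the quartiles.
mask = (df["Values"] < q1 - 1.5 * iqr) | (df["Values"] > q3 + 1.5 * iqr)

# Select the outlier rows and keep only the columns of interest.
cols = ["SN NO.", "Values", "data1", "data2"]
df_outliers_iqr = df.loc[mask, cols]
print(df_outliers_iqr)
```

The mask keeps the selection tied to the row index, so it does not rely on matching floating-point values after the fact.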

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1