Problem understanding a seaborn.boxplot() graph

I don't understand the seaborn.boxplot() graph below.

Data source for the CSV file

The code is:

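%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv('train.csv')
df.head()  # only displayed if it is the last line of a notebook cell
plt.figure(figsize=(8, 8))
# One box per Age group within each Gender; palette="Paired" applies the colours directly
sns.boxplot(x="Gender", y="Purchase", hue="Age", data=df, palette="Paired")
# Put the legend outside the axes, anchored at its upper-left corner
plt.legend(bbox_to_anchor=(1.05, 1), loc="upper left", borderaxespad=0)
plt.grid(True)
plt.show()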

That produces:

[Image: the resulting box plot of Purchase by Gender, with one box per Age group]

df[(df.Gender == 'F') & (df.Age =='55+')].Purchase.describe()

That produces:

count     5083.000000
mean      9007.036199
std       4801.556874
min         12.000000
25%       6039.500000
50%       8084.000000
75%      10067.000000
max      23899.000000
Name: Purchase, dtype: float64

I can find some of these values on the plot, but not all of them; for example, I cannot see the maximum. Above all, I don't understand the clusters of black dots that I circled in red on the graph. I don't know what they correspond to. Do you have any idea what they represent?



Solution 1:[1]

As Johann C has indicated, the whiskers are based on 1.5 times the interquartile range (IQR, the span from the 25th to the 75th percentile, i.e. the middle 50% of the values): each whisker extends to the most extreme data point that lies within 1.5 × IQR of the box. Values beyond the whiskers are treated as outliers, and these are the black dots you circled in red. In theory the whiskers could reach equally far above and below the box, but the minimum value is 12, so the lower whisker stops there. The upper whisker stops well short of the maximum (23899), which is therefore drawn as one of the outlier dots rather than as the end of a whisker. From the looks of it, you have a right-skewed distribution.
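To check this against the describe() output in the question, here is a small worked sketch of the 1.5 × IQR rule (the default `whis=1.5` in matplotlib, which seaborn's boxplot also uses by default as far as I know). The helper name `tukey_fences` is hypothetical; the quartiles, minimum and maximum are the values reported for Gender == 'F' and Age == '55+'.

```python
def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return the (lower, upper) fences of the k * IQR box-plot rule (hypothetical helper)."""
    iqr = q3 - q1  # interquartile range: the middle 50% of the data
    return q1 - k * iqr, q3 + k * iqr


# 25% and 75% values, min and max from describe() for Gender == 'F', Age == '55+'
q1, q3 = 6039.5, 10067.0
data_min, data_max = 12.0, 23899.0

low, high = tukey_fences(q1, q3)
print(low, high)  # -1.75 16108.25

# The lower fence is below the minimum, so the lower whisker simply ends at 12.
print(data_min >= low)  # True
# The maximum lies beyond the upper fence, so it is plotted as an outlier dot.
print(data_max > high)  # True
```

The upper whisker itself ends at the largest purchase that is at most 16108.25, not at the fence; every value above that is drawn as a dot, which is why the maximum only shows up among the dots.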

Solution 2:[2]

From the looks of it, these are outliers that are so numerous that they overlap. You might therefore want to check whether you are actually dealing with two separate populations whose samples have been mixed together, or with a bimodal distribution. Both deserve investigation IMO. However, that would be better discussed in a statistics channel, since it is not specific to seaborn.
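One way to see how numerous these overlapping outliers are is to count, for each Gender/Age box, how many purchases fall outside the 1.5 × IQR fences. The sketch below assumes a DataFrame shaped like the one in the question; the function name `outlier_counts` and the small demo frame are hypothetical. It uses pandas' default linearly interpolated quantiles (the same ones describe() reports), so the counts should match the plotted dots, though this is an assumption worth checking against your seaborn/matplotlib versions.

```python
import pandas as pd


def outlier_counts(df: pd.DataFrame, by: list[str], col: str, k: float = 1.5) -> pd.Series:
    """Count values of `col` outside the k * IQR fences within each group (hypothetical helper)."""
    grouped = df.groupby(by)[col]
    # Quartiles of the group each row belongs to, broadcast back to every row
    q1 = grouped.transform(lambda s: s.quantile(0.25))
    q3 = grouped.transform(lambda s: s.quantile(0.75))
    iqr = q3 - q1
    outside = (df[col] < q1 - k * iqr) | (df[col] > q3 + k * iqr)
    # Summing the boolean flags per group gives the number of outliers in each box
    return outside.groupby([df[c] for c in by]).sum()


# Hypothetical demo data; with the real file, pass the DataFrame read from train.csv
demo = pd.DataFrame({
    "Gender": ["F"] * 6 + ["M"] * 6,
    "Age": ["55+"] * 12,
    "Purchase": [10, 11, 12, 13, 14, 100, 20, 21, 22, 23, 24, 25],
})
counts = outlier_counts(demo, ["Gender", "Age"], "Purchase")
print(counts)
```

With the real data, `outlier_counts(df, ["Gender", "Age"], "Purchase")` gives one count per box in the plot; large counts are consistent with the dots merging into solid-looking clusters.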

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Rufus
Solution 2 I_O