Problem understanding a seaborn.boxplot() graph
I don't understand the graph produced by the seaborn.boxplot() call below.
The code is:
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv('train.csv')
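print(df.head())  # quick look at the first rows
plt.figure(figsize=(8, 8))
# Box plot of Purchase per Gender, split by Age group. The "Paired" palette
# is passed directly, so a separate sns.color_palette() call is not needed.
sns.boxplot(x="Gender", y="Purchase", hue="Age", data=df, palette="Paired")
# Place the legend outside the axes (loc="upper left" is the same as loc=2)
plt.legend(bbox_to_anchor=(1.05, 1), loc="upper left", borderaxespad=0)
plt.grid(True)
plt.show()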
That produces a box plot (the image is not reproduced here). To check one group directly, I ran:
df[(df.Gender == 'F') & (df.Age =='55+')].Purchase.describe()
That produces:
count 5083.000000
mean 9007.036199
std 4801.556874
min 12.000000
25% 6039.500000
50% 8084.000000
75% 10067.000000
max 23899.000000
Name: Purchase, dtype: float64
I can find some of these values on the plot, but not all; for example, I don't see the maximum. Most of all, I don't understand the clusters of black dots that I circled in red on the graph. Do you have any idea what they represent?
Solution 1:[1]
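As Johann C has indicated, the whiskers extend up to 1.5 times the interquartile range (IQR) beyond the box. The IQR is the span from the 25th to the 75th percentile, i.e. it covers the middle 50% of the values, and it is what the box itself shows. Values that lie beyond the whiskers are treated as outliers and drawn as individual points; that is what the dots you circled in red represent. The two whiskers are not necessarily the same length: each one stops at the most extreme data point that still lies within its limit. For the F / 55+ group, the lower limit falls below the minimum value of 12, so the lower whisker simply ends at 12, while the maximum (23899) lies beyond the upper limit and therefore appears as one of the dots rather than at the end of the whisker. From the looks of it, you have a right-skewed distribution.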
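The whisker limits for the F / 55+ group can be checked by hand from the describe() output above. This is a minimal sketch assuming seaborn's default `whis=1.5` (Tukey's rule, as in matplotlib's boxplot); `tukey_fences` is a hypothetical helper written for illustration, not a seaborn function.

```python
def tukey_fences(q1, q3, whis=1.5):
    """Return the (lower, upper) limits beyond which a box plot draws points."""
    iqr = q3 - q1
    return q1 - whis * iqr, q3 + whis * iqr

# Quartiles, min and max copied from the describe() output
# for Gender == 'F' and Age == '55+'.
q1, q3 = 6039.5, 10067.0
data_min, data_max = 12.0, 23899.0

lower, upper = tukey_fences(q1, q3)
print(f"IQR = {q3 - q1}, fences = ({lower}, {upper})")
# prints: IQR = 4027.5, fences = (-1.75, 16108.25)

# No data lie below the lower fence, so the lower whisker stops at the minimum.
print("min inside fences:", data_min >= lower)   # True
# The maximum lies beyond the upper fence, so it is drawn as an outlier dot.
print("max drawn as outlier:", data_max > upper)  # True
```

Every purchase above roughly 16108 in this group is therefore drawn as a separate point, which is why so many dots pile up above the upper whisker.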
Solution 2:[2]
From the look of it, these are outliers that are so numerous they overlap. You might therefore want to check whether you're actually dealing with two separate populations whose samples have been pooled together, or with a bimodal distribution. Both deserve investigation, IMO. However, that would be better discussed in a statistics forum, since it's not specific to seaborn.
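One rough way to start that investigation is to count how many points a group flags as outliers and to look at a coarse histogram for a second peak. The sketch below is illustrative only: it assumes one group's purchase values are available as a plain Python list, and `flag_outliers` and `histogram` are hypothetical helpers, not part of seaborn or pandas. The "inclusive" quartile method matches the linear interpolation that pandas' describe() uses by default.

```python
import statistics
from collections import Counter

def flag_outliers(values, whis=1.5):
    """Return (lower_fence, upper_fence, outliers) using Tukey's rule."""
    # method="inclusive" gives the same quartiles as numpy/pandas defaults
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - whis * iqr, q3 + whis * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers

def histogram(values, bin_width):
    """Count values per bin of width bin_width, keyed by the bin's start."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Made-up sample data, for illustration only.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
lower, upper, out = flag_outliers(sample)
print(lower, upper, out)      # -3.5 14.5 [100]
print(histogram(sample, 5))   # {0: 4, 5: 5, 100: 1}
```

If a large share of a group's values are flagged, and the histogram shows a separate cluster of counts among them rather than a thinning tail, that hints at a second population or mode worth examining.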
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Rufus |
| Solution 2 | I_O |

