Create one boxplot per cluster for each column of information for a dataframe
Consider the following pandas DataFrame:
| value | other_value | cluster |
|---|---|---|
| 1382 | 2.1 | 0 |
| 10 | 3.9 | 1 |
| 104 | 5.9 | 1 |
| 82 | -1.1 | 0 |
| 100 | 0.9 | 2 |
| 1003 | 0.85 | 2 |
| 232 | 4.1 | 0 |
| 19 | 0.6 | 3 |
| 1434 | 0.3 | 3 |
| 23 | 1.6 | 3 |
Using the seaborn module, I want to display a set of boxplots for each column of values, comparing the distribution across the values of the cluster column.
That is, for the above DataFrame, the first graph would show the 'value' column with 4 boxplots, one for each cluster value. The second graph would show the 'other_value' column, again with 1 boxplot for each cluster.
I want to do in Python what this question does in R: "Boxplots of different variables by cluster assigned on one graph in ggplot".
My code only draws one graph at a time. I would like to get a single figure containing all the graphs, as in the question linked above:
sns.boxplot(y='value', x='cluster',
data=df,
palette="colorblind",
hue='cluster')
Thanks for the help offered.
Solution 1:[1]
Most seaborn functions work best with the data in "long form".
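As a minimal sketch of what "long form" means here, assume the table from the question is typed in by hand as `df` (an assumption made for illustration; the code further down reads it from the question page instead). Melting keeps `cluster` as the identifier and stacks the two measured columns into one `values` column, with a `variable` column recording where each number came from:

```python
import pandas as pd

# Data copied by hand from the table in the question.
df = pd.DataFrame({
    "value": [1382, 10, 104, 82, 100, 1003, 232, 19, 1434, 23],
    "other_value": [2.1, 3.9, 5.9, -1.1, 0.9, 0.85, 4.1, 0.6, 0.3, 1.6],
    "cluster": [0, 1, 1, 0, 2, 2, 0, 3, 3, 3],
})

# Wide form: one column per measured variable (10 rows, 3 columns).
# Long form: one row per single measurement (20 rows, 3 columns).
df_long = df.melt(
    id_vars="cluster",
    value_vars=["value", "other_value"],
    var_name="variable",
    value_name="values",
)

print(df_long.shape)             # (20, 3)
print(list(df_long.columns))     # ['cluster', 'variable', 'values']
```

In this layout, seaborn can split the data into panels by `variable` and into boxes by `cluster` without any further reshaping.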
Here is what the code could look like, using `melt` to reshape the data and `catplot` to draw one panel per column:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
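# Read the example table directly from the question page
df = pd.read_html('https://stackoverflow.com/questions/72301993/')[0]
# Reshape to long form: one row per (cluster, variable, values) observation
df_long = df.melt(id_vars='cluster', value_vars=df.columns[:-1], var_name='variable', value_name='values')
# One panel per variable; sharey=False gives each panel its own y-axis scale
sns.catplot(kind='box', data=df_long,
            col='variable', y='values', x='cluster', hue='cluster',
            palette="colorblind", sharey=False, col_wrap=2)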
plt.tight_layout()
plt.show()
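If reshaping is not wanted, a possible alternative (not part of the answer above) is to keep the DataFrame in wide form and draw each column on its own matplotlib axis. The sketch below assumes the question's data typed in by hand; the helper name `boxplots_by_cluster` and its parameters are made up for this example, and the default seaborn colors are used instead of the "colorblind" palette.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Data copied by hand from the table in the question.
df = pd.DataFrame({
    "value": [1382, 10, 104, 82, 100, 1003, 232, 19, 1434, 23],
    "other_value": [2.1, 3.9, 5.9, -1.1, 0.9, 0.85, 4.1, 0.6, 0.3, 1.6],
    "cluster": [0, 1, 1, 0, 2, 2, 0, 3, 3, 3],
})


def boxplots_by_cluster(data, columns, cluster_col="cluster"):
    """Hypothetical helper: one panel per column, one box per cluster."""
    # squeeze=False keeps `axes` two-dimensional even for a single column.
    fig, axes = plt.subplots(
        1, len(columns), figsize=(5 * len(columns), 4), squeeze=False
    )
    for ax, column in zip(axes.flat, columns):
        # Each panel gets its own y-axis, so differently scaled columns stay readable.
        sns.boxplot(data=data, x=cluster_col, y=column, ax=ax)
        ax.set_title(column)
    fig.tight_layout()
    return fig


fig = boxplots_by_cluster(df, ["value", "other_value"])
```

Calling `plt.show()` afterwards displays the figure, and `fig.savefig(...)` writes it to a file. Compared with `catplot`, this version gives direct access to each axis at the cost of managing the subplot grid by hand.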
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | JohanC |

