Create one boxplot per cluster for each column of information for a dataframe

Consider the following Python pandas DataFrame:

value other_value cluster
1382 2.1 0
10 3.9 1
104 5.9 1
82 -1.1 0
100 0.9 2
1003 0.85 2
232 4.1 0
19 0.6 3
1434 0.3 3
23 1.6 3

Using the seaborn module, I want to display a set of boxplots for each column of values, showing the comparative information per value of the cluster column.

That is, for the above DataFrame, it would show a first graph for the 'value' column with 4 boxplots, one for each cluster value. The second graph would include information for the 'other_value' column also showing 1 boxplot for each cluster.

My idea is to do the same as in this R question, but in Python: Boxplots of different variables by cluster assigned on one graph in ggplot

My code only shows the graphs one at a time. I would like to get a joint figure with all the graphs, as in the link above:

sns.boxplot(y='value', x='cluster',
                 data=df, 
                 palette="colorblind",
                 hue='cluster')

Thanks for the help offered.



Solution 1:[1]

Most seaborn functions work best with the data in "long form".

Here is what the code could look like:

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

df = pd.read_html('https://stackoverflow.com/questions/72301993/')[0]
df_long = df.melt(id_vars='cluster', value_vars=df.columns[:-1], var_name='variable', value_name='values')

sns.catplot(kind='box', data=df_long,
            col='variable', y='values', x='cluster', hue='cluster',
            palette="colorblind", sharey=False, col_wrap=2)
plt.tight_layout()
plt.show()

[Figure: sns.catplot with kind='box' on the long-form dataframe]
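
To make the "long form" step concrete, here is a minimal sketch that builds the question's DataFrame by hand (an assumption made here so the example does not depend on reading the web page) and applies the same melt call. It only shows the reshaping, not the plot.

```python
import pandas as pd

# Data copied from the question; constructed by hand instead of pd.read_html.
df = pd.DataFrame({
    "value": [1382, 10, 104, 82, 100, 1003, 232, 19, 1434, 23],
    "other_value": [2.1, 3.9, 5.9, -1.1, 0.9, 0.85, 4.1, 0.6, 0.3, 1.6],
    "cluster": [0, 1, 1, 0, 2, 2, 0, 3, 3, 3],
})

# Wide -> long: one row per (cluster, variable) observation.
df_long = df.melt(id_vars="cluster", value_vars=["value", "other_value"],
                  var_name="variable", value_name="values")

print(df_long.shape)  # (20, 3): 10 rows x 2 melted columns
print(df_long.head(3))
```

Each original column name ends up in the 'variable' column, which is what lets sns.catplot draw one subplot per variable through col='variable'.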

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 JohanC