'Excel create scatterplot by categorical variable
I'm trying to create a scatterplot in excel with the following data:
import pandas as pd
import seaborn as sns
import numpy as np
df = pd.DataFrame(data=np.random.random((100, 2)), columns=['X', 'Y'])
df['category'] = np.random.choice(['groupA', 'groupB', 'groupC'], size=len(df))
df.head()
X Y category
0 0.407807 0.269535 groupB
1 0.658499 0.671735 groupB
2 0.071938 0.199920 groupC
3 0.197185 0.358241 groupA
4 0.033922 0.917605 groupB
In seaborn I would just do something like:
sns.scatterplot(x='X', y='Y', data=df, hue='category')
But I am unable to create this is in excel. Do I really have to create a new column for each category?
Solution 1:[1]
Not sure if there is a better way to do this in Excel but you need to resume your data using a Pivot Table, and then copy/paste the output as range somewhere else and then do your scatterplot (You can't do an scatterplot directly taking a Pivot Table as a source):
The setup of my Pivot Table is:
- Field
Xto rows section - Field
Categoryto Columns section - Field
Yto values section, set it toSum
Anyways this method got an issue: If there are 2 or more points in same category with same X value but different Y values, then Pivot Table will sum them up and that's an error.
What you need to create the scatterlot is a source with the structure showed in step 3. It there is any kind od semiduplicates as mentioned before, I'm afraid you'll need to construct manually (or VBA code) your dataframe with this structure:
Do I really have to create a new column for each category? I'm afraid that yes, you do
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |




