'Excel create scatterplot by categorical variable

I'm trying to create a scatterplot in excel with the following data:

import pandas as pd   
import seaborn as sns   
import numpy as np   

df = pd.DataFrame(data=np.random.random((100, 2)), columns=['X', 'Y'])
df['category'] = np.random.choice(['groupA', 'groupB', 'groupC'], size=len(df))

df.head()

    X           Y           category
0   0.407807    0.269535    groupB
1   0.658499    0.671735    groupB
2   0.071938    0.199920    groupC
3   0.197185    0.358241    groupA
4   0.033922    0.917605    groupB

In seaborn I would just do something like:

sns.scatterplot(x='X', y='Y', data=df, hue='category')

But I am unable to create this is in excel. Do I really have to create a new column for each category?

excel plot

Solution 1:^[1]

Not sure if there is a better way to do this in Excel but you need to resume your data using a Pivot Table, and then copy/paste the output as range somewhere else and then do your scatterplot (You can't do an scatterplot directly taking a Pivot Table as a source):

The setup of my Pivot Table is:

Field X to rows section
Field Category to Columns section
Field Y to values section, set it to Sum

Anyways this method got an issue: If there are 2 or more points in same category with same X value but different Y values, then Pivot Table will sum them up and that's an error.

What you need to create the scatterlot is a source with the structure showed in step 3. It there is any kind od semiduplicates as mentioned before, I'm afraid you'll need to construct manually (or VBA code) your dataframe with this structure:

Do I really have to create a new column for each category? I'm afraid that yes, you do

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1

'Excel create scatterplot by categorical variable

Solution 1:[1]

Sources

Related Questions

Solution 1:^[1]