Visualize the data points of a clustering output using Python

I have a dataframe, df1, containing topics and the cluster each topic belongs to.

Here is a sample input dataframe:

import pandas as pd
import matplotlib.pyplot as plt

df1 = pd.DataFrame({'topics':['algebra', 'calculus', 'number theory', 'atom', 'chemical reaction', 'geometry',
                             'Linear Algebra-Advanced', 'evolution', 'botany',  'electricity', 'quantum',
                             'zoology', 'mechanics','Differential Equations', 'Electric Charges and Fields'],    
                    'cluster':[0, 0, 0, 1, 1, 0,0,  2, 2, 3, 3, 2, 3, 0, 3]
                   })

I visualize the clusters and their elements with the following code, which produces the output shown after it:

#set up colors per clusters using a dict
cluster_colors = {0: '#1b9e77', 1: '#d95f02', 2: '#7570b3', 3: '#EEC900'}

#set up cluster names using a dict
cluster_names = {0: 'Math courses', 
                 1: 'Chemistry courses', 
                 2: 'Biology courses', 
                 3: 'Physics courses'}
xs, ys = df1['topics'], df1['cluster']

df = pd.DataFrame(dict(x=xs, y=ys)) 

#group by cluster
groups = df.groupby('y')

# set up plot
fig, ax = plt.subplots(figsize=(22, 14)) # set size
ax.margins(0.05) # Optional, just adds 5% padding to the autoscaling

#iterate through groups to layer the plot
#note that I use the cluster_names and cluster_colors dicts with the 'name' lookup to return the appropriate color/label
for name, group in groups:
    ax.plot(group.x, group.y, marker='o', linestyle='', ms=14, 
            label=cluster_names[name], color=cluster_colors[name], 
            mec='none')

#these settings apply to the whole axes, so set them once after the loop
#(recent matplotlib versions expect booleans here, not the string 'off')
ax.set_aspect('auto')
ax.tick_params(
    axis='x',           # changes apply to the x-axis
    which='both',       # both major and minor ticks are affected
    bottom=False,       # ticks along the bottom edge are off
    top=False,          # ticks along the top edge are off
    labelbottom=False)  # labels along the bottom edge are off
ax.tick_params(
    axis='y',           # changes apply to the y-axis
    which='both',       # both major and minor ticks are affected
    left=False,         # ticks along the left edge are off
    right=False,        # ticks along the right edge are off
    labelleft=False)    # labels along the left edge are off

ax.legend(numpoints=1)  #show legend with only 1 point

    
plt.show() #show the plot

[image: plot produced by the code above]

What I want is to plot the data points with matplotlib, colored and labeled by cluster, with the points randomly distributed across the plot.

The expected output is:

[image: expected plot]
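
One possible way to get such a plot is sketched below. It is only a sketch under stated assumptions, not a definitive answer: since the dataframe has no coordinates, each point gets a random position drawn around a center assigned to its cluster. The helper names `random_positions` and `plot_clusters`, the circular layout of the centers, the `spread` and `radius` values, and the choice to annotate each point with its topic are all assumptions made for illustration; none of them come from the question.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df1 = pd.DataFrame({
    'topics': ['algebra', 'calculus', 'number theory', 'atom', 'chemical reaction',
               'geometry', 'Linear Algebra-Advanced', 'evolution', 'botany',
               'electricity', 'quantum', 'zoology', 'mechanics',
               'Differential Equations', 'Electric Charges and Fields'],
    'cluster': [0, 0, 0, 1, 1, 0, 0, 2, 2, 3, 3, 2, 3, 0, 3],
})

cluster_colors = {0: '#1b9e77', 1: '#d95f02', 2: '#7570b3', 3: '#EEC900'}
cluster_names = {0: 'Math courses', 1: 'Chemistry courses',
                 2: 'Biology courses', 3: 'Physics courses'}


def random_positions(clusters, spread=0.6, radius=3.0, seed=0):
    """Hypothetical helper: random (x, y) per point, jittered around its cluster center."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(clusters)
    unique = np.unique(labels)
    # Assumption: cluster centers are spaced evenly on a circle.
    angles = 2 * np.pi * np.arange(len(unique)) / len(unique)
    centers = {c: (radius * np.cos(a), radius * np.sin(a))
               for c, a in zip(unique, angles)}
    cx = np.array([centers[c][0] for c in labels])
    cy = np.array([centers[c][1] for c in labels])
    # Gaussian noise scatters each point around its center.
    x = cx + rng.normal(scale=spread, size=len(labels))
    y = cy + rng.normal(scale=spread, size=len(labels))
    return x, y


def plot_clusters(df, seed=0):
    """Hypothetical helper: scatter plot colored by cluster, one legend entry per cluster."""
    x, y = random_positions(df['cluster'], seed=seed)
    plot_df = df.assign(x=x, y=y)
    fig, ax = plt.subplots(figsize=(10, 7))
    for cluster_id, group in plot_df.groupby('cluster'):
        ax.scatter(group['x'], group['y'], s=120,
                   color=cluster_colors[cluster_id],
                   label=cluster_names[cluster_id])
        # Write the topic next to each point (an assumption about the desired look).
        for _, row in group.iterrows():
            ax.annotate(row['topics'], (row['x'], row['y']),
                        xytext=(6, 6), textcoords='offset points')
    # The coordinates are random, so the axis ticks carry no meaning.
    ax.set_xticks([])
    ax.set_yticks([])
    ax.legend()
    return fig, ax


fig, ax = plot_clusters(df1)
```

Calling `plt.show()` afterwards displays the figure. The positions here carry no information beyond cluster membership; if real feature vectors for the topics are available, projecting them to two dimensions would give positions that reflect actual similarity.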



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow