Visualize the data points of a clustering output using Python
I have a dataframe df1 containing the topics along with the cluster each topic belongs to.
Here is a sample input dataframe:
import pandas as pd

df1 = pd.DataFrame({
    'topics': ['algebra', 'calculus', 'number theory', 'atom', 'chemical reaction', 'geometry',
               'Linear Algebra-Advanced', 'evolution', 'botany', 'electricity', 'quantum',
               'zoology', 'mechanics', 'Differential Equations', 'Electric Charges and Fields'],
    'cluster': [0, 0, 0, 1, 1, 0, 0, 2, 2, 3, 3, 2, 3, 0, 3]
})
I visualize the clusters and their elements with the following code:
import matplotlib.pyplot as plt

# set up colors per cluster using a dict
cluster_colors = {0: '#1b9e77', 1: '#d95f02', 2: '#7570b3', 3: '#EEC900'}

# set up cluster names using a dict
cluster_names = {0: 'Math courses',
                 1: 'Chemistry courses',
                 2: 'Biology courses',
                 3: 'Physics courses'}

xs, ys = df1['topics'], df1['cluster']
df = pd.DataFrame(dict(x=xs, y=ys))

# group by cluster
groups = df.groupby('y')

# set up plot
fig, ax = plt.subplots(figsize=(22, 14))  # set size
ax.margins(0.05)  # optional, just adds 5% padding to the autoscaling

# iterate through groups to layer the plot
# note that I use the cluster_names and cluster_colors dicts with the 'name' lookup to return the appropriate color/label
for name, group in groups:
    ax.plot(group.x, group.y, marker='o', linestyle='', ms=14,
            label=cluster_names[name], color=cluster_colors[name],
            mec='none')

ax.set_aspect('auto')
ax.tick_params(
    axis='x',           # changes apply to the x-axis
    which='both',       # both major and minor ticks are affected
    bottom=False,       # ticks along the bottom edge are off
    top=False,          # ticks along the top edge are off
    labelbottom=False)  # labels along the bottom edge are off
ax.tick_params(
    axis='y',           # changes apply to the y-axis
    which='both',       # both major and minor ticks are affected
    left=False,         # ticks along the left edge are off
    right=False,        # ticks along the right edge are off
    labelleft=False)    # labels along the left edge are off

ax.legend(numpoints=1)  # show legend with only 1 point
plt.show()  # show the plot
What I want instead is to plot the data points with matplotlib so that each point is colored by its cluster, labeled, and placed at a random position.
Expected output: (image not reproduced here)
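As a minimal sketch of one way to get such a plot (not the original poster's code), each topic can be given a random position around a centre chosen for its cluster, then drawn with `ax.scatter` and labeled with `ax.annotate`. The helper name `plot_clusters`, the circular placement of the centres, and the `spread`, `radius`, and `seed` parameters are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df1 = pd.DataFrame({
    'topics': ['algebra', 'calculus', 'number theory', 'atom', 'chemical reaction', 'geometry',
               'Linear Algebra-Advanced', 'evolution', 'botany', 'electricity', 'quantum',
               'zoology', 'mechanics', 'Differential Equations', 'Electric Charges and Fields'],
    'cluster': [0, 0, 0, 1, 1, 0, 0, 2, 2, 3, 3, 2, 3, 0, 3]
})

cluster_colors = {0: '#1b9e77', 1: '#d95f02', 2: '#7570b3', 3: '#EEC900'}
cluster_names = {0: 'Math courses', 1: 'Chemistry courses',
                 2: 'Biology courses', 3: 'Physics courses'}


def plot_clusters(df, spread=0.6, radius=3.0, seed=0):
    """Hypothetical helper: scatter topics randomly around one centre per cluster."""
    rng = np.random.default_rng(seed)  # fixed seed so the layout is reproducible
    clusters = sorted(df['cluster'].unique())

    # Assumption: cluster centres are spaced evenly on a circle to keep groups apart.
    angles = np.linspace(0, 2 * np.pi, len(clusters), endpoint=False)
    centres = {c: (radius * np.cos(a), radius * np.sin(a))
               for c, a in zip(clusters, angles)}

    fig, ax = plt.subplots(figsize=(12, 8))
    for c, group in df.groupby('cluster'):
        cx, cy = centres[c]
        # Random (normally distributed) offsets around the cluster centre.
        xs = cx + rng.normal(scale=spread, size=len(group))
        ys = cy + rng.normal(scale=spread, size=len(group))
        ax.scatter(xs, ys, s=150, color=cluster_colors[c], label=cluster_names[c])
        # Write each topic name next to its point.
        for x, y, topic in zip(xs, ys, group['topics']):
            ax.annotate(topic, (x, y), xytext=(6, 6), textcoords='offset points')

    # The coordinates are arbitrary, so hide the ticks.
    ax.set_xticks([])
    ax.set_yticks([])
    ax.legend()
    return fig, ax


fig, ax = plot_clusters(df1)
```

Call `plt.show()` afterwards to display the figure. If real coordinates are available (for example a 2D projection of the features used for clustering), they could replace the random offsets so that distances in the plot become meaningful.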
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow


