Visualization of K-Means Clustering of multiple columns

Hello Community, I need help applying K-Means clustering to this use case.

I have a dataset with 27884 rows and 8933 columns.

Here's a small preview of the dataset:

user_iD	b1	b2	b3	b4	b5	b6	b7	b8	b9	b10	b11
1	1	7	2	3	8	0	4	0	6	0	5
2	7	8	1	2	4	6	5	9	10	3	0
3	0	0	0	0	1	5	2	3	4	0	6
4	1	7	2	3	8	0	5	0	6	0	4
5	0	4	7	0	6	1	5	3	0	0	2
6	1	0	2	3	0	5	4	0	0	6	7

Here the column user_iD represents students, and columns b1-b11 represent book chapters. The values record each student's study sequence: which chapter he/she studied first, then second, then third, and so on. A 0 entry means the student did not study that particular chapter.

This is just a small preview of a big dataset. There are a total of 27884 users and 8932 chapters, named b1 through b8932.

Here's the complete dataset shape information

I'm applying K-Means clustering. How do I visualize all the clusters using all the columns?

As I stated, there are 27884 users and 8932 other columns. So far I have only managed a plot using the user_iD and b1 columns. How do I take all the columns at once?

What I have tried so far

#Build and train the model
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
model = KMeans(n_clusters=5)
model.fit(df3)

#See the predictions
model.labels_
model.cluster_centers_

#Plot the predictions against the original data set
fig = plt.figure(figsize=(6, 6))
#ax = fig.add_subplot(111)
#c=model.labels_ colors each point by its cluster; cmap has no effect without it
plt.scatter(df3['user_iD'], df3['b1'], c=model.labels_, cmap='rainbow',
           linewidths=1, alpha=.7,
           edgecolor='k'
           )
plt.show()

This gives me a cluster visualization based on a single column.

Solution 1: [1]

Well, you cannot do it directly if you have more than 3 columns. However, you can apply Principal Component Analysis (PCA) to reduce the data to 2 dimensions and visualize that instead.

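import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Assumes `model` is the fitted KMeans from the question; store its labels for coloring
df3['clusters'] = model.labels_

pca_num_components = 2

reduced_data = PCA(n_components=pca_num_components).fit_transform(df3.iloc[:,1:12])
results = pd.DataFrame(reduced_data,columns=['pca1','pca2'])

sns.scatterplot(x="pca1", y="pca2", hue=df3['clusters'], data=results)
plt.title('K-means Clustering with 2 dimensions')
plt.show()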
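
For anyone who wants to try the idea end to end without the original data, here is a minimal sketch. It assumes a randomly generated stand-in for df3 (200 users, 11 chapter columns), so the clusters themselves are not meaningful. The names `chapters`, `results`, `centers_2d` and `plot_projection` are made up for this illustration. It clusters on the chapter columns only, leaving out user_iD since it is an identifier, and also projects the cluster centers into the same 2-D space.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Hypothetical stand-in for df3: 200 users, chapter columns b1..b11 (random values)
rng = np.random.default_rng(0)
chapters = [f"b{i}" for i in range(1, 12)]
df = pd.DataFrame(rng.integers(0, 12, size=(200, 11)), columns=chapters)
df.insert(0, "user_iD", np.arange(1, 201))

# Use only the chapter columns as features; user_iD is an identifier
X = df[chapters].to_numpy()

# Cluster in the full feature space
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
df["clusters"] = kmeans.fit_predict(X)

# Reduce the same features to 2 principal components for plotting
pca = PCA(n_components=2)
coords = pca.fit_transform(X)
results = pd.DataFrame(coords, columns=["pca1", "pca2"], index=df.index)
results["clusters"] = df["clusters"]

# Project the cluster centers with the same fitted PCA
centers_2d = pca.transform(kmeans.cluster_centers_)


def plot_projection(results, centers_2d):
    """Scatter the 2-D projection, colored by cluster, with centers marked."""
    fig, ax = plt.subplots(figsize=(6, 6))
    ax.scatter(results["pca1"], results["pca2"], c=results["clusters"],
               cmap="rainbow", alpha=0.7, edgecolor="k")
    ax.scatter(centers_2d[:, 0], centers_2d[:, 1], c="black",
               marker="x", s=100, label="cluster centers")
    ax.set_xlabel("pca1")
    ax.set_ylabel("pca2")
    ax.set_title("K-means clusters projected onto 2 principal components")
    ax.legend()
    plt.show()
```

Calling plot_projection(results, centers_2d) draws the figure. Note that the clustering here is done on all feature columns and PCA is used only for display, so clusters that are well separated in the full space can still overlap in the 2-D picture.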

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	SAL
