Distance between nodes and the centroid in a kmeans cluster?
Is there any way to extract the distance between each node and the centroid in a kmeans cluster?
I have run KMeans clustering over a text embedding data set, and I want to know which nodes are far from the centroid in each cluster, so that I can check which features of those nodes make the difference.
Thanks in advance!
Solution 1:[1]
If you are using Python and sklearn:
From the KMeans documentation: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html#sklearn.cluster.KMeans
you can get labels_ and cluster_centers_ from the fitted model.
Now choose a distance function that takes the vector of each node and the center of its cluster. Filter by labels_ and calculate the distance for each point within each label.
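The steps above can be sketched as follows. This is a minimal example, not a definitive implementation: it assumes Euclidean distance, uses synthetic data from make_blobs as a stand-in for the text embeddings, and the helper name farthest_points is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the embedding matrix (assumption: one row per node).
X, _ = make_blobs(n_samples=300, centers=3, n_features=8, random_state=0)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_
centers = kmeans.cluster_centers_

# Euclidean distance from every point to the centroid of its own cluster.
# centers[labels] picks, for each row of X, the centroid it was assigned to.
dist_to_own_center = np.linalg.norm(X - centers[labels], axis=1)


def farthest_points(label, k=5):
    """Return indices into X of the k points in `label` farthest from its centroid."""
    idx = np.where(labels == label)[0]
    order = np.argsort(dist_to_own_center[idx])[::-1]  # largest distance first
    return idx[order[:k]]


# Hypothetical usage: the 5 farthest points of each cluster.
farthest = {int(label): farthest_points(label) for label in np.unique(labels)}
```

The indices in farthest can then be used to look up the original rows (for example X[farthest[0]]) and inspect their features. Another metric, such as cosine distance, may suit text embeddings better; KMeans itself optimizes squared Euclidean distance.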
Solution 2:[2]
Kevin has a great answer above, but I feel it does not answer the question that was asked (maybe I am reading this completely wrong). If you want to look at each individual cluster center and get the point in that cluster that is furthest from the center, you need to use the cluster labels to get the distance of each point to the centroid of its own cluster. The code above just finds the point in each cluster that is furthest from ALL other cluster centers (which is why, as you can see in the picture, the points are always on the far side of the cluster, away from the other 2 clusters). To look at the individual clusters you would need something like the following:
center_dists = np.array([X_dist[i][x] for i,x in enumerate(y)])
This gives you the distance of each point to the centroid of its own cluster. Then, running almost the same code as Kevin's gives you the point that is furthest from the centroid in each cluster:
max_indices = []
for label in np.unique(kmeans.labels_):
    X_label_indices = np.where(y==label)[0]
    max_label_idx = X_label_indices[np.argmax(center_dists[y==label])]
    max_indices.append(max_label_idx)
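For reference, here is a self-contained sketch of the snippet above. The definitions of X_dist and y are not shown in this answer, so this assumes X_dist = kmeans.transform(X) (the distance of each sample to every cluster center) and y = kmeans.labels_. The data is synthetic and only stands in for real embeddings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in data (assumption); replace with the real embedding matrix.
X, _ = make_blobs(n_samples=200, centers=3, n_features=5, random_state=1)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=1).fit(X)

# Assumed definitions of the two names the snippet relies on.
X_dist = kmeans.transform(X)  # shape (n_samples, n_clusters)
y = kmeans.labels_            # cluster index assigned to each sample

# Distance of each point to the centroid of its own cluster
# (vectorized equivalent of the list comprehension above).
center_dists = X_dist[np.arange(len(y)), y]

# Index (into X) of the farthest point in each cluster.
max_indices = []
for label in np.unique(y):
    X_label_indices = np.where(y == label)[0]
    max_label_idx = X_label_indices[np.argmax(center_dists[X_label_indices])]
    max_indices.append(max_label_idx)
```

X[max_indices] then holds one row per cluster: the sample that lies farthest from that cluster's centroid.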
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | avchauzov |
| Solution 2 |
