Changing label names of KMeans clusters

I am doing k-means clustering with sklearn in Python, and I am wondering how to change the generated label names for the clusters. For example:

data          Cluster
0.2344         1
1.4537         2
2.4428         2
5.7757         3

And I want to get:

data          Cluster
0.2344         black
1.4537         red
2.4428         red
5.7757         blue

I don't mean mapping 1 -> black, 2 -> red myself when printing. I am wondering whether it is possible to set different cluster names in the k-means model itself, by default.



Solution 1:[1]

No.
There isn't any way to change the default labels.
You have to map them separately using a dictionary. You can take a look at all the available methods in the sklearn.cluster.KMeans documentation.
None of the available methods or attributes allows you to change the default labels.

Solution using dictionary:

# Code
a = [0,0,1,1,2,2]
mapping = {0:'black', 1:'red', 2:'blue'}
a = [mapping[i] for i in a]

# Output
['black', 'black', 'red', 'red', 'blue', 'blue']
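For a long label array, the same lookup can be done without a Python loop. The following is a minimal sketch, assuming the labels are the consecutive integers 0..k-1 (which is how sklearn numbers clusters); the variable names are illustrative only.

```python
import numpy as np

# Integer labels as KMeans would return them in `labels_` (0-based).
labels = np.array([0, 0, 1, 1, 2, 2])

# Position i in this array holds the name for cluster i (illustrative names).
names = np.array(['black', 'red', 'blue'])

# NumPy fancy indexing: each integer label picks the name at that position.
named = names[labels]
print(named.tolist())
# ['black', 'black', 'red', 'red', 'blue', 'blue']
```

This only works because the labels double as array positions; for arbitrary label values the dictionary approach above is the safer choice.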

What happens if you change your data or the number of clusters? First, let's visualize the clusters.

Importing and generating random data:

from sklearn.cluster import KMeans
import numpy as np
import matplotlib.pyplot as plt

# 10 random 2-D points with coordinates between 0 and 100
x = np.random.uniform(0, 100, size=(10, 2))

Applying the KMeans algorithm:

kmeans = KMeans(n_clusters=3, random_state=0).fit(x)

Getting cluster centers

arr = kmeans.cluster_centers_

Your cluster centroids look like this:

array([[23.81072765, 77.21281171],
       [ 8.6140551 , 23.15597377],
       [93.37177176, 32.21581703]])

Here, the first row is the centroid of cluster 0, the second row is the centroid of cluster 1, and so on.
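One way to convince yourself of this row-to-label correspondence is to ask the fitted model which cluster each centroid belongs to. This is a small sketch with its own random data (the seed and `n_init=10` are arbitrary choices, not part of the answer above).

```python
import numpy as np
from sklearn.cluster import KMeans

# Reproducible toy data: 10 points in 2-D, coordinates in [0, 100).
rng = np.random.default_rng(0)
x = rng.uniform(0, 100, size=(10, 2))

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(x)

# Each centroid is its own nearest centroid, so predicting on the
# centroid array returns the row index of each centroid.
centroid_labels = kmeans.predict(kmeans.cluster_centers_)
print(centroid_labels)
```

The printed labels should simply count up from 0, matching the row order of `cluster_centers_`.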

Visualizing centroids and data:

plt.scatter(x[:,0],x[:,1])
plt.scatter(arr[:,0], arr[:,1])

You get a graph that looks like this: [figure: scatter plot of the data points and the three centroids]

As you can see, you have access to the centroids as well as the training data. If your training data and the number of clusters stay the same (and random_state is fixed, as above), these centroids don't really change.

But if you add more training data or increase the number of clusters, you will have to create a new mapping according to the centroids that are generated.
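Since the integer attached to each cluster is arbitrary, one option is to derive the mapping from the centroids instead of hard-coding it. The sketch below is not an sklearn feature: `name_clusters_by_centroid` is a hypothetical helper, and ordering by the first centroid coordinate is just one possible convention. It uses the 1-D data from the question.

```python
import numpy as np
from sklearn.cluster import KMeans

def name_clusters_by_centroid(kmeans, names):
    """Hypothetical helper: give names to clusters in order of
    increasing first centroid coordinate."""
    order = np.argsort(kmeans.cluster_centers_[:, 0])
    # order[i] is the integer label of the cluster with the i-th smallest centroid.
    return {int(label): name for label, name in zip(order, names)}

# Data from the question, reshaped to (n_samples, n_features).
data = np.array([0.2344, 1.4537, 2.4428, 5.7757]).reshape(-1, 1)
names = ['black', 'red', 'blue']  # smallest centroid first

results = {}
for seed in (0, 1, 2):
    km = KMeans(n_clusters=3, n_init=10, random_state=seed).fit(data)
    mapping = name_clusters_by_centroid(km, names)
    results[seed] = [mapping[int(label)] for label in km.labels_]
    print(seed, results[seed])
```

With this convention the smallest value is always named 'black' and the largest 'blue', whichever integer labels a particular run happens to produce.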

Solution 2:[2]

Check out the top response on this related post.

sklearn doesn't include this functionality but you can map the values to your dataframe in a fairly straightforward manner.

current_labels = [1, 2, 3]
desired_labels = ['black', 'red', 'blue']
# create a dictionary for your corresponding values
map_dict = dict(zip(current_labels, desired_labels))
print(map_dict)
# {1: 'black', 2: 'red', 3: 'blue'}

# map the desired values back to the dataframe
# note this will replace the original values
data['Cluster'] = data['Cluster'].map(map_dict)

# alternatively you can map to a new column if you want to preserve the old values
data['NewNames'] = data['Cluster'].map(map_dict)
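Putting the pieces together, here is a rough end-to-end sketch using the values from the question. Note that sklearn numbers clusters from 0, not 1, so the dictionary keys here are 0, 1, 2; the column name `ClusterName` and the `n_init=10` setting are choices made for this example.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Values from the question, as a one-column dataframe.
df = pd.DataFrame({'data': [0.2344, 1.4537, 2.4428, 5.7757]})

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(df[['data']])
df['Cluster'] = kmeans.labels_  # integer labels 0, 1, 2

# Build the dictionary from sklearn's 0-based labels.
desired_labels = ['black', 'red', 'blue']
map_dict = dict(zip(range(kmeans.n_clusters), desired_labels))

# Keep the integer labels and add the names in a separate column.
df['ClusterName'] = df['Cluster'].map(map_dict)
print(df)
```

Which colour lands on which cluster depends on the integers KMeans happens to assign, so treat the names as arbitrary unless you tie them to the centroids yourself.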

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 gojandrooo