How to find the optimal number of clusters using k-prototypes in Python

I am trying to cluster a large dataset with the k-prototypes algorithm. I cannot use K-Means because I have both categorical and numeric data. With k-prototypes I am able to create clusters, but only if I specify the value of k myself.

How do I find the appropriate number of clusters for this?

Will the popular methods (such as the elbow method and the silhouette score), which are usually applied to purely numerical data, work for mixed data?



Solution 1:[1]

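You can fit k-prototypes for a range of k values, record the cost of each fit, and look for the elbow in the plot:

# Choosing the optimal k: fit k-prototypes for each candidate k and record the cost
import matplotlib.pyplot as plt
from kmodes.kprototypes import KPrototypes

k_values = range(1, 8)
cost = []
for num_clusters in k_values:
    kproto = KPrototypes(n_clusters=num_clusters, init='Cao')
    # Data is the mixed-type dataset; columns 0-9 are the categorical ones
    kproto.fit_predict(Data, categorical=[0,1,2,3,4,5,6,7,8,9])
    cost.append(kproto.cost_)

# Plot cost against k; the "elbow" is where the curve starts to flatten
plt.plot(k_values, cost)
plt.show()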

Source: https://github.com/aryancodify/Clustering

Solution 2:[2]

Yes, the elbow method is good enough to choose the number of clusters, because it is based on the total within-cluster sum of squares.
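If you would rather not read the elbow off a plot by eye, one common heuristic is to pick the k where the cost curve bends most sharply, that is, where the second difference of the cost sequence is largest. The sketch below assumes you already have a list of costs, one per k; the function name `elbow_k` and the cost values are made up for illustration, and the heuristic is only one of several ways to locate an elbow.

```python
def elbow_k(ks, costs):
    """Return the k where the cost curve bends most sharply.

    Heuristic: the largest second difference of the cost sequence.
    The first and last k cannot be chosen because they have only one neighbour.
    """
    if len(ks) != len(costs) or len(costs) < 3:
        raise ValueError("need at least three (k, cost) pairs of equal length")
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(costs) - 1):
        # Large positive value: steep drop before k, shallow drop after it.
        bend = costs[i - 1] - 2 * costs[i] + costs[i + 1]
        if bend > best_bend:
            best_k, best_bend = ks[i], bend
    return best_k


# Hypothetical costs for k = 1..7 (not real results).
ks = list(range(1, 8))
costs = [1000.0, 600.0, 300.0, 260.0, 235.0, 220.0, 210.0]
chosen_k = elbow_k(ks, costs)
print(chosen_k)
```

Treat the result as a suggestion rather than a verdict: on noisy or smoothly decaying cost curves there may be no clear elbow at all, and it is worth checking the plot as well.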

Solution 3:[3]

Most evaluation methods need a distance matrix.

They will then work with mixed data, as long as you have a distance function that helps solve your problem. But they will not scale well, because a full distance matrix grows quadratically with the number of points.
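As an illustration of this idea, here is a minimal pure-Python sketch that builds a distance matrix for mixed data and computes the mean silhouette score from it. The distance is a Gower-style average (range-scaled absolute difference for numeric columns, 0/1 mismatch for categorical ones); that choice is an assumption for the example, not something the answer prescribes, and the function names and toy data are hypothetical. With scikit-learn installed, the same matrix could instead be passed to `sklearn.metrics.silhouette_score(dist, labels, metric="precomputed")`.

```python
def gower_matrix(rows, categorical):
    """Pairwise Gower-style distances for rows of mixed data.

    rows: list of equal-length tuples; categorical: set of column indices.
    """
    n, m = len(rows), len(rows[0])
    # Range of each numeric column, used to scale differences into [0, 1].
    ranges = {}
    for j in range(m):
        if j not in categorical:
            col = [r[j] for r in rows]
            ranges[j] = (max(col) - min(col)) or 1.0
    dist = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            total = 0.0
            for j in range(m):
                if j in categorical:
                    total += 0.0 if rows[a][j] == rows[b][j] else 1.0
                else:
                    total += abs(rows[a][j] - rows[b][j]) / ranges[j]
            dist[a][b] = dist[b][a] = total / m
    return dist


def silhouette(dist, labels):
    """Mean silhouette coefficient from a precomputed distance matrix."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy mixed data: one numeric column and one categorical column.
rows = [(1.0, "red"), (1.2, "red"), (5.0, "blue"), (5.3, "blue")]
dist = gower_matrix(rows, categorical={1})
good = silhouette(dist, [0, 0, 1, 1])  # labels that match the obvious groups
bad = silhouette(dist, [0, 1, 0, 1])   # labels that split the groups
print(good > bad)
```

To choose k this way, you would compute the score for the labels produced at each candidate k and prefer the k with the highest value. The matrix holds n × n entries, so on large datasets it is usually only practical on a sample.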

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution | Source
Solution 1 | Rohan Nadagouda
Solution 2 | Jack shephard
Solution 3 | Has QUIT--Anony-Mousse