Clustering Loop Python

I am clustering a dataset in Python using k-means. Before clustering, I determined the optimal number of clusters using an elbow curve.
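For readers unfamiliar with the elbow step, here is a minimal sketch of what it might look like, assuming scikit-learn's `KMeans`. The synthetic data, the helper name `elbow_inertias`, and the range of `k` values are illustrative assumptions, not the actual dataset or code behind this question.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in data (assumption): 5 well-separated 2D blobs, 200 points each.
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [10, 0], [0, 10], [10, 10], [5, 20]])
X = np.vstack([c + rng.normal(scale=0.5, size=(200, 2)) for c in centers])


def elbow_inertias(X, k_values, random_state=0):
    """Fit k-means once per k and collect the within-cluster sum of squares."""
    inertias = []
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X)
        inertias.append(model.inertia_)
    return inertias


k_values = list(range(1, 9))
inertias = elbow_inertias(X, k_values)

# The "elbow" is the k after which the inertia stops dropping sharply.
for k, inertia in zip(k_values, inertias):
    print(k, round(inertia, 1))
```

Plotting `inertias` against `k_values` gives the elbow curve; on data like this the bend would be expected around `k = 5`.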

The optimal number of clusters was 5, so after running k-means on the dataset I had 5 clusters.

Here's my question. Now that I have 5 clusters, I would like to cluster each of them again to get smaller clusters, then cluster each of those smaller clusters again, and repeat until each cluster contains only about 20 points. The dataset has more than 1,000,000 observations.
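The repeated splitting described above can be sketched as a recursive function. This is only a rough illustration under stated assumptions: it uses scikit-learn's `KMeans`, the names `recursive_kmeans`, `max_size`, and `n_splits` are hypothetical, each level splits into a fixed 5 sub-clusters rather than re-running an elbow analysis, and the demo runs on small random data rather than 1,000,000+ rows.

```python
import numpy as np
from sklearn.cluster import KMeans


def recursive_kmeans(X, indices=None, max_size=20, n_splits=5, random_state=0):
    """Split X with k-means again and again until each leaf has <= max_size points.

    Returns a list of integer index arrays into X, one per leaf cluster.
    """
    if indices is None:
        indices = np.arange(len(X))
    # Base case: this cluster is already small enough.
    if len(indices) <= max_size:
        return [indices]

    k = min(n_splits, len(indices))
    labels = KMeans(n_clusters=k, n_init=3, random_state=random_state).fit_predict(X[indices])
    groups = [indices[labels == j] for j in range(k)]
    groups = [g for g in groups if len(g) > 0]

    # Guard (assumption): if k-means cannot separate the points, e.g. because
    # they are all duplicates, stop here instead of recursing forever.
    if len(groups) < 2:
        return [indices]

    leaves = []
    for g in groups:
        leaves.extend(recursive_kmeans(X, g, max_size, n_splits, random_state))
    return leaves


# Small demo on random data (hypothetical stand-in for the real dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
leaves = recursive_kmeans(X)
print(len(leaves), max(len(leaf) for leaf in leaves))
```

Note that this caps leaf size at `max_size` rather than targeting about 20 points exactly, so some leaves can end up much smaller. On a dataset of this scale, swapping in `MiniBatchKMeans` for the top levels might reduce run time, though that is untested here.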

What is the best way to do this? Is there a way to build a clustering loop, or is there a completely different, better approach? I know this isn't a specific coding question, but I'd love to hear some thoughts.



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
