Intra-cluster for custom k-means

I'm stuck trying to implement and plot, in Python, the intra-cluster variance of each cluster in k-means in order to choose the best number of clusters k. For a single cluster it is given by this formula:

$$W_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2$$

That is, the sum of the squared distances of the data points belonging to cluster C_k from its centroid (written \mu_k here), normalized by the size of the cluster |C_k|.
The intra-cluster variance over all K clusters is then obtained by adding up the individual cluster variances:

$$W = \sum_{k=1}^{K} W_k$$

Can I get help implementing W_k and W?
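One way to compute the two quantities described above is sketched below. It assumes the labels and centroids have the form returned by the k-means function in this question (one label per row of X, one centroid per cluster). The function name `intra_cluster_variances` is made up for illustration, and treating an empty cluster as contributing 0 is an assumption, not something the formulas specify.

```python
import numpy as np

def intra_cluster_variances(X, labels, centroids):
    """Return (per-cluster variances, their sum) for a given clustering."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels).astype(int)
    centroids = np.asarray(centroids, dtype=float)
    w_k = np.zeros(len(centroids))
    for k, centroid in enumerate(centroids):
        members = X[labels == k]              # points assigned to cluster k
        if len(members) == 0:                 # assumption: empty cluster adds 0
            continue
        sq_dists = ((members - centroid) ** 2).sum(axis=1)
        w_k[k] = sq_dists.sum() / len(members)  # normalize by cluster size
    return w_k, w_k.sum()

# Tiny hand-checkable example: two clusters of two points each.
X_demo = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 14.0]])
labels_demo = np.array([0, 0, 1, 1])
centroids_demo = np.array([[0.0, 1.0], [10.0, 12.0]])
w_k_demo, w_demo = intra_cluster_variances(X_demo, labels_demo, centroids_demo)
```

In the demo, each point of the first cluster lies at squared distance 1 from its centroid and each point of the second at squared distance 4, so the per-cluster values are 1 and 4 and the total is 5.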
The custom k-means implementation:

import numpy as np
import pandas as pd

def kmeans(X, k):
    X = np.asarray(X, dtype=float)
    iterations = 0
    cluster = np.zeros(X.shape[0], dtype=int)
    # take k random data points as the initial centroids
    centroids = pd.DataFrame(X).sample(n=k).values
    while True:
        # for each observation
        for i, row in enumerate(X):
            # Euclidean distance from the point to every centroid (any number of features)
            dists = np.sqrt(((centroids - row) ** 2).sum(axis=1))
            # assign the closest centroid
            cluster[i] = np.argmin(dists)
        # update each centroid to the mean of the data points assigned to it
        new_centroids = pd.DataFrame(X).groupby(by=cluster).mean().values
        iterations += 1
        # stop once the centroids no longer change
        if np.count_nonzero(centroids - new_centroids) == 0:
            break
        # otherwise replace the old centroids with the new ones
        centroids = new_centroids
    return centroids, cluster, iterations
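
To pick k, a common heuristic is to compute the total intra-cluster variance W for a range of k values and look for the point where the curve stops dropping sharply (the "elbow"). The sketch below is one possible way to do that, not a definitive recipe. The names `total_intra_cluster_variance`, `elbow_curve` and `plot_elbow` are hypothetical, and `fit` is assumed to be any function with the same return shape as `kmeans` above, namely (centroids, labels, iterations).

```python
import numpy as np

def total_intra_cluster_variance(X, labels, centroids):
    """Sum over clusters of the mean squared distance to the centroid."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels).astype(int)
    total = 0.0
    for k, centroid in enumerate(np.asarray(centroids, dtype=float)):
        members = X[labels == k]
        if len(members):                      # assumption: skip empty clusters
            total += ((members - centroid) ** 2).sum() / len(members)
    return total

def elbow_curve(X, k_values, fit):
    """Run fit(X, k) for each k and collect the resulting W values."""
    curve = []
    for k in k_values:
        centroids, labels, _ = fit(X, k)
        curve.append(total_intra_cluster_variance(X, labels, centroids))
    return curve

def plot_elbow(k_values, curve):
    """Plot W against k; matplotlib is imported only when plotting."""
    import matplotlib.pyplot as plt
    plt.plot(list(k_values), curve, marker="o")
    plt.xlabel("number of clusters k")
    plt.ylabel("total intra-cluster variance W")
    plt.show()
```

With the function from the question this could be called as `ks = range(1, 8)` followed by `plot_elbow(ks, elbow_curve(X, ks, kmeans))`. Because the initial centroids are sampled at random, the curve can differ between runs, so repeating each k a few times and keeping the smallest W may give a smoother plot.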

python k-means

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
