Intra-cluster for custom k-means

I'm stuck trying to implement and plot, in Python, the intra-cluster variance of each cluster in k-means, in order to find the best number of clusters. It is given by this formula:

$$W_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2$$

  • This is the sum of the squared distances from the centroid \mu_k of the data points that belong to a given cluster, normalized by the size of the cluster, |C_k|.
  • The intra-cluster variance for all clusters is then obtained by adding up the individual cluster variances, using this formula:

$$W = \sum_{k=1}^{K} W_k$$

  • Can I get help implementing Wk and W?
  • The custom k-means implementation:
import numpy as np
import pandas as pd

def kmeans(X, k):
    iterations = 0
    data = pd.DataFrame(X)
    cluster = np.zeros(X.shape[0])
    # take k random samples from the data points as the initial centroids
    centroids = data.sample(n=k).values
    while True:
        # for each observation
        for i, row in enumerate(X):
            mn_dist = float('inf')
            # distance of the point from all centroids
            for idx, centroid in enumerate(centroids):
                # Euclidean distance (works for any number of features)
                d = np.linalg.norm(centroid - row)
                # assign the closest centroid
                if mn_dist > d:
                    mn_dist = d
                    cluster[i] = idx
        # update centroids: mean of all data points in each cluster
        new_centroids = data.groupby(by=cluster).mean().values
        iterations += 1
        # stop once the centroids no longer change
        if np.count_nonzero(centroids - new_centroids) == 0:
            break
        # otherwise replace the old centroids with the new ones
        centroids = new_centroids
    return centroids, cluster, iterations
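
One possible way to compute Wk and W from the values returned by kmeans is sketched below. This is only a sketch under some assumptions: the helper names (intra_cluster_variances, total_intra_cluster_variance, elbow_curve, plot_elbow) are hypothetical and not part of any library, the row order of centroids is assumed to match the cluster labels (true for the groupby update as long as no cluster is empty), and an empty cluster is assumed to contribute 0.

```python
import numpy as np


def intra_cluster_variances(X, centroids, cluster):
    """Return an array holding W_k for every cluster k."""
    X = np.asarray(X, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    labels = np.asarray(cluster).astype(int)  # kmeans returns float labels
    w = np.zeros(len(centroids))
    for k, centroid in enumerate(centroids):
        members = X[labels == k]
        if len(members) == 0:
            continue  # assumption: an empty cluster contributes 0
        # squared Euclidean distance of each member to its centroid
        sq_dist = np.sum((members - centroid) ** 2, axis=1)
        # normalize by the cluster size |C_k|
        w[k] = sq_dist.sum() / len(members)
    return w


def total_intra_cluster_variance(X, centroids, cluster):
    """Return W, the sum of W_k over all clusters."""
    return float(intra_cluster_variances(X, centroids, cluster).sum())


def elbow_curve(X, kmeans_fn, k_values):
    """Run kmeans_fn(X, k) for each k and collect W.

    kmeans_fn is assumed to return (centroids, cluster, iterations).
    """
    w_values = []
    for k in k_values:
        centroids, cluster, _ = kmeans_fn(X, k)
        w_values.append(total_intra_cluster_variance(X, centroids, cluster))
    return w_values


def plot_elbow(k_values, w_values):
    """Plot W against k; matplotlib is imported only when plotting."""
    import matplotlib.pyplot as plt

    plt.plot(list(k_values), w_values, marker="o")
    plt.xlabel("number of clusters k")
    plt.ylabel("intra-cluster variance W")
    plt.show()
```

With the kmeans function from the question, something like `ks = range(1, 10)`, `ws = elbow_curve(X, kmeans, ks)` and `plot_elbow(ks, ws)` should draw the curve; the k where the curve stops dropping sharply is the usual candidate. Because the initial centroids are random, the curve can differ slightly from run to run.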


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
