Intra-cluster for custom k-means
I'm stuck trying to implement and plot, in Python, the intra-cluster variance of each cluster in k-means in order to find the best number of clusters k. It is given by this formula:

$$W_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2$$

- This is the sum of the squared distances of the data points belonging to cluster $C_k$ from its centroid $\mu_k$, normalized by the size of the cluster $|C_k|$.
- We can then compute the intra-cluster variance for all clusters by adding up the individual cluster-specific variances using this formula:

$$W = \sum_{k=1}^{K} W_k$$

- Can I get help implementing $W_k$ and $W$?
- The custom k-means implementation:
```python
import numpy as np
import pandas as pd


def kmeans(X, k):
    iterations = 0
    data = pd.DataFrame(X)
    cluster = np.zeros(X.shape[0])
    # taking random samples from the datapoints as an initialization of centroids
    centroids = data.sample(n=k).values
    while True:
        # for each observation
        for i, row in enumerate(X):
            mn_dist = float('inf')
            # distance of the point from all centroids
            for idx, centroid in enumerate(centroids):
                # calculating euclidean distance (uses only the first two features)
                d = np.sqrt((centroid[0]-row[0])**2 + (centroid[1]-row[1])**2)
                # assign closest centroid
                if mn_dist > d:
                    mn_dist = d
                    cluster[i] = idx
        # updating centroids by taking the mean value of all datapoints of each cluster
        new_centroids = pd.DataFrame(X).groupby(by=cluster).mean().values
        iterations += 1
        # if centroids are same then break.
        if np.count_nonzero(centroids-new_centroids) == 0:
            break
        else:  # else update old centroids with new ones
            centroids = new_centroids
    return centroids, cluster, iterations
```
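
One possible way to compute Wk and W from the values returned by `kmeans` is sketched below. The function name `intra_cluster_variance` is hypothetical and not part of the original code. The sketch assumes that `cluster[i]` is the row index in `centroids` of the centroid assigned to point `i`, and that an empty cluster contributes 0 to W. Wk is taken as described in the question: the sum of squared Euclidean distances from the cluster's points to its centroid, divided by the cluster size.

```python
import numpy as np


def intra_cluster_variance(X, centroids, cluster):
    """Return (W_k for every cluster, total W) for a given clustering.

    Assumption: cluster[i] is the index into `centroids` of the centroid
    assigned to row i of X.
    """
    X = np.asarray(X, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    labels = np.asarray(cluster).astype(int)  # labels may arrive as floats
    w_k = np.zeros(len(centroids))
    for k, centroid in enumerate(centroids):
        members = X[labels == k]  # points assigned to cluster k
        if len(members) == 0:
            continue  # assumption: an empty cluster contributes 0
        # squared Euclidean distance of each member to the centroid
        sq_dist = np.sum((members - centroid) ** 2, axis=1)
        # normalise the sum by the cluster size |C_k|
        w_k[k] = sq_dist.sum() / len(members)
    return w_k, w_k.sum()


# Small hand-made example: two clusters of two points each
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 4.0]])
centroids = np.array([[0.0, 1.0], [10.0, 2.0]])
cluster = np.array([0.0, 0.0, 1.0, 1.0])
w_k, w = intra_cluster_variance(X, centroids, cluster)
print(w_k, w)  # W_k is 1 and 4 for the two clusters, so W is 5
```

To choose k, one could call `kmeans(X, k)` for a range of k values, pass each result to this function, and plot W against k (for example with matplotlib) to look for an elbow. Because the centroids are initialised at random, W for a given k may differ between runs.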
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow

