Elbow Method for optimal no. of clusters

I have a dataset that I am analysing to find the optimal number of clusters using k-means.

I am testing cluster counts from 1 to 11, which produces the following plot:

[Image: elbow plot for 1 to 11 clusters]

The original dataset has six classes, but the elbow plot shows the bend occurring at 3 clusters. Out of curiosity, I overlaid a straight line on the plot, starting at 11 clusters and working backwards, and the curve follows it almost exactly down to 6 clusters. That suggests to me that the real elbow is at 6, but it is hard to see.

So, visually, 3 looks like the right answer, but given the known number of classes (6), the straight line I drew points to 6.

Question:

  • How should you correctly interpret an elbow plot like this (especially when you are not given the classes)?
  • Would you say the elbow is at 3 or 6?


Solution 1:[1]

Based on the plot, I'd say that there are 6 clusters. From my experience and intuition, it makes sense to say that the "elbow" is where the within-cluster sum of squares begins to decrease linearly.
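To make the "begins to decrease linearly" idea concrete, here is a minimal sketch in plain Python. It walks back from the largest k and stops at the first point where the curve bends noticeably. The function name `linear_tail_start`, the threshold `rel_tol`, and the WCSS numbers are all made up for illustration; they are not taken from the plot in the question.

```python
def linear_tail_start(wcss, rel_tol=0.02):
    """Return the smallest k from which WCSS falls roughly linearly.

    wcss[i] is the within-cluster sum of squares for k = i + 1 clusters.
    rel_tol is a hypothetical threshold: a second difference larger than
    rel_tol * (total drop of the curve) counts as a bend.
    """
    if len(wcss) < 3:
        raise ValueError("need WCSS values for at least three values of k")
    total_drop = wcss[0] - wcss[-1]
    if total_drop <= 0:
        raise ValueError("WCSS is expected to decrease as k grows")

    # Walk back from the last interior point while the curve stays straight.
    i = len(wcss) - 2
    while i >= 1:
        second_diff = wcss[i - 1] - 2 * wcss[i] + wcss[i + 1]
        if abs(second_diff) / total_drop > rel_tol:
            break  # first noticeable bend when coming from the right
        i -= 1
    return i + 1  # convert 0-based index to k (returns 1 if no bend is found)


# Illustrative values for k = 1..11 (assumed, not the asker's data):
# a sharp bend at k = 3 and a subtle one at k = 6.
wcss = [1000, 600, 300, 250, 205, 160, 150, 140, 130, 120, 110]

strict = linear_tail_start(wcss, rel_tol=0.02)  # sensitive to the subtle bend
loose = linear_tail_start(wcss, rel_tol=0.05)   # only sees the sharp bend
print(strict, loose)
```

With these made-up numbers the stricter threshold picks 6 and the looser one picks 3, which is exactly the ambiguity described in the question: the elbow you read off depends on how much curvature you are willing to ignore.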

However, for cluster validation, I recommend using silhouette coefficients, since they give a more objective criterion for the "right answer" than reading a bend off a plot. In addition, silhouette coefficients take the separation between clusters into account as well.
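For reference, here is a small pure-Python sketch of how the mean silhouette coefficient is computed, so you can see where separation enters. The helper name `mean_silhouette` and the toy points are assumptions for illustration. In practice you would more likely use a library routine such as `sklearn.metrics.silhouette_score`, compute it for each candidate k from 2 upwards (it is undefined for a single cluster), and prefer the k with the highest score.

```python
from math import dist
from statistics import mean


def mean_silhouette(points, labels):
    """Mean silhouette coefficient over all points (illustrative sketch)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of p's own cluster (cohesion).
        # dist(p, p) == 0, so summing over the whole cluster is safe.
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster (separation).
        b = min(
            mean(dist(p, q) for q in other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return mean(scores)


# Toy data (assumed): two tight groups, far apart along the x-axis.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = mean_silhouette(points, [0, 0, 1, 1])  # labels follow the groups
bad = mean_silhouette(points, [0, 1, 0, 1])   # labels cut across the groups
print(round(good, 3), round(bad, 3))
```

The score ranges from -1 to 1. The labelling that matches the two groups scores close to 1, while the one that cuts across them scores below 0, because each point ends up closer to the other cluster than to its own.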

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Naphat Amundsen