'Threshold in hierarchical clustering
I have a dataset that generates the chart attached at the end of the post. I want to cluster the data. Visually I identify 4 different clusters. As this data may change and so the number of clusters I'm using a hierarchical clustering in which I don't specify in advance the number of clusters to do.
The code
thresh = 1.00
clusters = hcluster.fclusterdata(df[['time', 'mean']].to_numpy(), thresh, criterion="distance")
Produces more than 12000 different clusters while when thresh is 1.02 (an increase of 0.02 points) gives only 1 cluster.
Can someone help to understand where is the problem in my code and how should I do in order to identify the 4 data groups that it is possible to visually observe.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|
