'Scikit HDBSCAN *tree* labeling (not single-slice labeling)
BLUF: For a specific epsilon (or for HDBSCAN's 'favorite' epsilon), I can extract the mapping of my data in that epsilon's partition. But how can I see my data's full tree membership?
I've gotten a ton out of the terrific tutorial here. In scikit learn's HDBSCAN, I can use clusterer.labels to see the best epsilon's partition labels. And I can use clusterer.single_linkage_tree_.get_clusters(0.023, min_cluster_size=2) to see the an arbitrary epsilon's partition labels. I can even plot the entire dendogram using clusterer.condensed_tree_.plot(). But how do I see the dendogram's labels for individual datapoints?
For Example: It's nice that my pets' names are {Spot, Felix, Nemo, Fido, Tigger}. Or the species are {Dog, Cat, Guppy, Dog, Cat}. But I'd like one output that tells me:
| Spot | Dog | Mammal | Animal |
| Felix | Cat | Mammal | Animal |
| Nemo | Guppy | Fish | Animal |
| Fido | Dog | Mammal | Animal |
| Tigger | Cat | Mammal | Animal |
With this sort of output, I could see precisely how related Spot and Felix are, instead of "Do they have the same species? Y/N?" "Do they have the same kingdom? Y/N?"
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|
