String clustering in Python
I have a list of strings and I want to group them into clusters in Python.
strings = ['String1', 'String2', 'String3',...]
I want to use the Levenshtein distance, so I am using the jellyfish library. For two strings, I know their distance can be computed like this:
import jellyfish
jellyfish.levenshtein_distance('string1', 'string2')
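Hierarchical clustering needs the distance between every pair of strings, not just one pair. Below is a minimal sketch of building that input using only the standard library. It assumes `levenshtein` (a hypothetical pure-Python stand-in for `jellyfish.levenshtein_distance`) and `condensed_distances` (also hypothetical), which lists the pairwise distances in the condensed order that scipy's `linkage` accepts for its `y` argument: pairs (0, 1), (0, 2), ..., (0, n-1), then (1, 2), and so on.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Hypothetical stand-in for jellyfish.levenshtein_distance (classic DP)."""
    # prev holds the previous row of the DP table: distances from a[:i-1] to b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution (or match)
        prev = curr
    return prev[-1]


def condensed_distances(strings, dist=levenshtein):
    """Pairwise distances for i < j, in row-major upper-triangle order.

    combinations() yields (0, 1), (0, 2), ..., (1, 2), ..., which is the
    condensed layout that scipy.cluster.hierarchy.linkage accepts as y.
    """
    return [float(dist(s, t)) for s, t in combinations(strings, 2)]
```

With jellyfish installed, `condensed_distances(strings, dist=jellyfish.levenshtein_distance)` should produce the same values. For n strings the result has n * (n - 1) / 2 entries.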
My problem is that I don't know how to use scipy.cluster.hierarchy to get a Python list of the strings in each cluster. I have also tried the linkage function:
linkage(y[, method, metric])
but I cannot get from its output to the final list of clusters.
Solution 1:[1]
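After using linkage to perform hierarchical clustering on your pairwise distances (passed as a condensed distance matrix), use scipy.cluster.hierarchy.cut_tree to cut the resulting tree into the number of clusters you want. It returns one cluster label per input string.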
If you want two clusters:
from scipy import cluster
cluster.hierarchy.cut_tree(linkage_output, n_clusters=2).ravel()  # .ravel() flattens the (n, 1) result into a 1-D array of labels
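To go from labels to the list of clusters the question asks for, here is a minimal end-to-end sketch, assuming numpy and scipy are installed. The helper names `levenshtein` and `cluster_strings`, the default `method="average"`, and the sample words are illustrative assumptions, not part of the original answer; a pure-Python edit distance is used so the sketch runs without jellyfish.

```python
from itertools import combinations

import numpy as np
from scipy.cluster.hierarchy import cut_tree, linkage


def levenshtein(a: str, b: str) -> int:
    """Hypothetical stand-in for jellyfish.levenshtein_distance (classic DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def cluster_strings(strings, n_clusters, method="average"):
    """Hypothetical helper: return one Python list of strings per cluster.

    Needs at least two strings so the condensed matrix is non-empty.
    """
    # Condensed pairwise distance vector (pairs (0,1), (0,2), ..., (1,2), ...).
    y = np.array([levenshtein(s, t) for s, t in combinations(strings, 2)],
                 dtype=float)
    # Build the hierarchy directly from the precomputed distances.
    linkage_output = linkage(y, method=method)
    # Scalar n_clusters gives shape (n, 1); ravel() makes it 1-D.
    labels = cut_tree(linkage_output, n_clusters=n_clusters).ravel()
    # Group the original strings by their cluster label.
    groups = {}
    for s, label in zip(strings, labels):
        groups.setdefault(int(label), []).append(s)
    return [groups[k] for k in sorted(groups)]


words = ["apple", "apples", "appel", "banana", "bananas", "bandana"]
print(cluster_strings(words, 2))
```

On the design choice: scipy's `linkage` documentation notes that the `centroid`, `median` and `ward` methods are only correctly defined for Euclidean distances, so `single`, `complete` or `average` are the safer choices for edit distances.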
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | MattDMo |
