String clustering in Python

I have a list of strings that I want to group using clustering in Python.

strings = ['String1', 'String2', 'String3', ...]

I want to use Levenshtein distance, so I am using the jellyfish library. Given two strings, their distance can be computed like this:

jellyfish.levenshtein_distance('string1', 'string2')

My problem is that I don't know how to use scipy.cluster.hierarchy to get, for each cluster, a Python list of the strings it contains. I have also tried the linkage function:

linkage(y[, method, metric])

But I am not able to get the final list of clusters from its output.



Solution 1:[1]

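After running linkage to build the hierarchical clustering from your distances, use scipy.cluster.hierarchy.cut_tree to cut the tree into flat clusters. Note that linkage expects precomputed distances as a condensed (1-D, upper-triangular) distance matrix. If you want two clusters: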

scipy.cluster.hierarchy.cut_tree(linkage_output, n_clusters=2).ravel()  # .ravel() flattens the (n, 1) result into a 1-D array of cluster labels
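
To show how the pieces fit together, here is a minimal end-to-end sketch, not a definitive implementation. It makes a few assumptions that are not in the question: the sample words are made up, `cluster_strings` is a hypothetical helper name, average linkage is an arbitrary choice, and a small pure-Python `levenshtein` function stands in for `jellyfish.levenshtein_distance` so the example runs without that dependency (swapping the real call in should work the same way).

```python
from itertools import combinations

import numpy as np
from scipy.cluster.hierarchy import cut_tree, linkage


def levenshtein(a, b):
    """Edit distance between two strings (stand-in for jellyfish.levenshtein_distance)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def cluster_strings(strings, n_clusters, method="average"):
    """Hypothetical helper: group strings into n_clusters lists."""
    # combinations() yields pairs (i, j) with i < j in the same order
    # that SciPy uses for a condensed distance matrix.
    condensed = np.array(
        [levenshtein(a, b) for a, b in combinations(strings, 2)],
        dtype=float,
    )
    tree = linkage(condensed, method=method)
    labels = cut_tree(tree, n_clusters=n_clusters).ravel()

    # Collect the original strings under their cluster label.
    groups = {}
    for s, label in zip(strings, labels):
        groups.setdefault(int(label), []).append(s)
    return list(groups.values())


# Made-up sample data for illustration only.
words = ["apple", "apples", "appel", "banana", "bananas", "bananna"]
clusters = cluster_strings(words, n_clusters=2)
print(clusters)
```

The key step is the last loop: cut_tree only returns one integer label per input string, so the strings have to be grouped by label to get the "list of each cluster" the question asks for. Linkage methods such as "ward" or "centroid" assume Euclidean distances, so "average", "single" or "complete" are probably safer choices with an edit distance.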

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 MattDMo