Python / SciPy Ward Clustering: Semipartial R-Squared as height of dendrogram
Is there a way to use semipartial R-squared as the height of the dendrogram tree in SciPy?
After using SAS for a long time, I am now porting my work to Python. Ward clustering is available in both SAS and SciPy, but the implementations differ. In SAS I like to use semipartial R-squared as the height of the dendrogram tree. I was not able to find an equivalent option in SciPy's Ward clustering, where the dendrogram heights are the merge distances.
I rebuilt the example from SAS to compare results, and everything matches. I would just like to have semipartial R-squared as the height in the tree because I think it is "nicer" to interpret.
See page 49 of the SAS documentation for an example: https://support.sas.com/documentation/onlinedoc/stat/142/cluster.pdf
Here is my code:
import numpy as np
from scipy.cluster.hierarchy import ward, fcluster, dendrogram
from scipy.spatial.distance import pdist, squareform
# Distance in miles between cities, Source: SAS Sample Data (https://v8doc.sas.com/sashtml/stat/chap23/sect24.htm)
cities = ["Atlanta", "Chicago", "Denver", "Houston", "Los Angeles", "Miami", "New York", "San Francisco", "Seattle", "Washington D.C."]
distances = np.array([[ 0, 587, 1212, 701, 1936, 604, 748, 2139, 2182, 543],
[ 587, 0, 920, 940, 1745, 1188, 713, 1858, 1737, 597],
[1212, 920, 0, 879, 831, 1726, 1631, 949, 1021, 1494],
[ 701, 940, 879, 0, 1374, 968, 1420, 1645, 1891, 1220],
[1936, 1745, 831, 1374, 0, 2339, 2451, 347, 959, 2300],
[ 604, 1188, 1726, 968, 2339, 0, 1092, 2594, 2734, 923],
[ 748, 713, 1631, 1420, 2451, 1092, 0, 2571, 2408, 205],
[2139, 1858, 949, 1645, 347, 2594, 2571, 0, 678, 2442],
[2182, 1737, 1021, 1891, 959, 2734, 2408, 678, 0, 2329],
[ 543, 597, 1494, 1220, 2300, 923, 205, 2442, 2329, 0]])
# Convert the square matrix to condensed (upper-triangle) form
distances_triangle = squareform(distances)
# use squared distances
dist_squared = np.square(distances_triangle)
# Ward Clustering
Z = ward(dist_squared)
# Display Dendrogram
dendrogram(Z, orientation='left', labels=cities)
Thank you for your help.
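One possible way to get there, offered as a sketch rather than a built-in SciPy feature: as far as I know, SciPy has no option for semipartial R-squared heights, but the heights can be rescaled after the fact. The sketch assumes that `ward` was given plain (unsquared) Euclidean distances in condensed form, which differs from the code above, where the distances are squared first. Under that assumption each merge height `d` corresponds to an increase in the within-cluster sum of squares of `d**2 / 2`, and dividing by the total over all merges gives the semipartial R-squared. The helper name `semipartial_r2_linkage` and the random sample data are hypothetical and only serve the illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import ward, dendrogram
from scipy.spatial.distance import pdist


def semipartial_r2_linkage(Z):
    """Return a copy of a Ward linkage matrix with heights rescaled to
    semipartial R-squared.

    Assumption: Z comes from ward() applied to unsquared Euclidean
    distances, so that height**2 / 2 is the increase in within-cluster
    sum of squares caused by each merge.
    """
    Z_spr = np.array(Z, dtype=float, copy=True)
    merge_ss = Z_spr[:, 2] ** 2 / 2.0          # SS increase per merge
    Z_spr[:, 2] = merge_ss / merge_ss.sum()    # share of the total SS
    return Z_spr


# Hypothetical sample data: 10 random points in the plane
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))

Z = ward(pdist(X))                  # ordinary Ward linkage, distance heights
Z_spr = semipartial_r2_linkage(Z)   # same tree, semipartial R-squared heights

# no_plot=True only computes the layout; drop it to draw with matplotlib
tree = dendrogram(Z_spr, no_plot=True)
```

Because the rescaling is monotonic, the tree keeps the same merge order and only the height axis changes. For a distance table like the city mileages, the same rescaling could be tried on `ward(distances_triangle)`, but whether the result matches the SAS output would need to be checked against the SAS listing.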
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
