Python / SciPy Ward Clustering: Semipartial R-Squared as height of dendrogram

Is there a way to use the semipartial R-squared as the height of the dendrogram in SciPy?

After using SAS for a long time, I am now porting my code to Python. Ward clustering is available in both SAS and SciPy, but the implementations differ. In SAS, I like to have the semipartial R-squared as the height of the dendrogram. I was not able to find an equivalent option in SciPy's Ward clustering, where distances are used for the dendrogram heights.

I rebuilt the example from SAS to compare results, and everything looks fine. I just want to have the semipartial R-squared as the height in the tree because I think it is "nicer" for interpretation.

See the example on page 49 of the SAS documentation: https://support.sas.com/documentation/onlinedoc/stat/142/cluster.pdf

See image of SAS-Example here

Here is my code:

import numpy as np
from scipy.cluster.hierarchy import ward, fcluster, dendrogram
from scipy.spatial.distance import pdist, squareform

# Distance in miles between cities, Source: SAS Sample Data (https://v8doc.sas.com/sashtml/stat/chap23/sect24.htm)
cities = ["Atlanta", "Chicago", "Denver", "Houston", "Los Angeles", "Miami", "New York", "San Francisco", "Seattle", "Washington D.C."]
distances = np.array([[   0,  587, 1212,  701, 1936,  604,  748, 2139, 2182,  543],
                      [ 587,    0,  920,  940, 1745, 1188,  713, 1858, 1737,  597],
                      [1212,  920,    0,  879,  831, 1726, 1631,  949, 1021, 1494],
                      [ 701,  940,  879,    0, 1374,  968, 1420, 1645, 1891, 1220], 
                      [1936, 1745,  831, 1374,    0, 2339, 2451,  347,  959, 2300],
                      [ 604, 1188, 1726,  968, 2339,    0, 1092, 2594, 2734,  923], 
                      [ 748,  713, 1631, 1420, 2451, 1092,    0, 2571, 2408,  205], 
                      [2139, 1858,  949, 1645,  347, 2594, 2571,    0,  678, 2442], 
                      [2182, 1737, 1021, 1891,  959, 2734, 2408,  678,    0, 2329], 
                      [ 543,  597, 1494, 1220, 2300,  923,  205, 2442, 2329,    0]])

# Convert the square matrix to condensed form (upper triangle as a 1-D vector)
distances_triangle = squareform(distances)

# use squared distances
dist_squared = np.square(distances_triangle)

# Ward Clustering
Z = ward(dist_squared)

# Display Dendrogram
dendrogram(Z, orientation='left', labels=cities)
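
One possible direction, as a minimal sketch and not a built-in SciPy feature: for Ward linkage on plain (unsquared) Euclidean distances, the squared merge height in column 2 of `Z` is twice the increase in within-cluster sum of squares, so dividing each squared height by the sum of all squared heights should give the semipartial R-squared of that merge. The helper name `semipartial_r_squared` and the toy points are hypothetical. The sketch assumes `ward` receives unsquared distances, which for the city data would mean `distances_triangle` rather than `dist_squared`; that is an assumption and has not been checked against the SAS output.

```python
import numpy as np
from scipy.cluster.hierarchy import ward, dendrogram
from scipy.spatial.distance import pdist


def semipartial_r_squared(Z):
    """Return a copy of Z whose heights are semipartial R-squared values.

    Assumption: Z comes from Ward linkage on unsquared Euclidean distances.
    """
    # Increase in within-cluster sum of squares caused by each merge
    merge_cost = Z[:, 2] ** 2 / 2.0
    # All merge costs together add up to the total sum of squares
    total_ss = merge_cost.sum()
    Z_sprsq = Z.copy()
    Z_sprsq[:, 2] = merge_cost / total_ss
    return Z_sprsq


# Hypothetical toy data: five points in the plane
points = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 0.0], [4.0, 1.0], [10.0, 5.0]])

Z = ward(pdist(points))              # condensed, unsquared distances
Z_sprsq = semipartial_r_squared(Z)

# no_plot=True only computes the layout; drop it to draw with matplotlib
tree = dendrogram(Z_sprsq, orientation='left', no_plot=True)
```

Because squaring and rescaling keep the merge order, the tree shape stays the same and only the height axis changes. The rescaled heights sum to 1 by construction.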

Thank you for your help.



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow