Ordering colors on colored bar for dendrogram in R
The vignette for the R package dendextend (https://cran.r-project.org/web/packages/dendextend/vignettes/dendextend.html) gives an example of using the colored_bars function with cutreeDynamic from package dynamicTreeCut as follows:
# let's get the clusters
library(dynamicTreeCut)
data(iris)
x <- iris[,-5] %>% as.matrix
hc <- x %>% dist %>% hclust
dend <- hc %>% as.dendrogram
# Find special clusters:
clusters <- cutreeDynamic(hc, distM = as.matrix(dist(x)), method = "tree")
# we need to sort them to the order of the dendrogram:
clusters <- clusters[order.dendrogram(dend)]
clusters_numbers <- unique(clusters) - (0 %in% clusters)
n_clusters <- length(clusters_numbers)
library(colorspace)
cols <- rainbow_hcl(n_clusters)
true_species_cols <- rainbow_hcl(3)[as.numeric(iris[,][order.dendrogram(dend),5])]
dend2 <- dend %>%
  branches_attr_by_clusters(clusters, values = cols) %>%
  color_labels(col = true_species_cols)
plot(dend2)
clusters <- factor(clusters)
levels(clusters)[-1] <- cols[-5][c(1,4,2,3)]
# Get the clusters to have proper colors.
# fix the order of the colors to match the branches.
colored_bars(clusters, dend, sort_by_labels_order = FALSE)
The following line reorders the colors to match the branches:
levels(clusters)[-1] <- cols[-5][c(1,4,2,3)]
I wish to apply this method to my own data which has many more clusters, but I am unclear on how the revised ordering of the colors was determined. This example uses a custom ordering for the iris data. Can anyone explain how this order was determined and is there a way to automate this?
Solution 1:
Just for starters, your example code above is missing two necessary packages: library(dplyr), to be able to use the pipe operator %>%, and library(dendextend), for the label colors set by color_labels().
To answer your question, the reordering happens in the line levels(clusters)[-1] <- cols[-5][c(1,4,2,3)]. As you mention, this is custom to this specific dataset, and I am unaware of why the authors picked this specific order. If you do not want to set the order yourself and would rather have R do it automatically, then sort_by_labels_order = TRUE must be set in the colored_bars() call. Here, it is set to FALSE since the authors use a custom order.
If it is set to TRUE, then, quoting the R documentation directly, "the colors vector/matrix should be provided in the order of the original data order (and it will be re-ordered automatically to the order of the dendrogram)". For more information, see ?colored_bars.
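One possible way to automate the custom ordering is sketched below. It is not the vignette authors' method. It assumes that branches_attr_by_clusters() hands out its values in the order in which the non-zero clusters first appear along the dendrogram, while factor() sorts its levels numerically. Under that assumption, a permutation such as c(1,4,2,3) is simply the position of each cluster number in the order of first appearance, which match() can compute. The cluster vector and color names here are made up for illustration.

```r
# Hypothetical cluster labels, already sorted into dendrogram order
# (0 means "unassigned", as with cutreeDynamic).
clusters <- c(1, 1, 0, 3, 3, 4, 4, 4, 2, 2)

# Placeholder colors, one per non-zero cluster.
cols <- c("red", "green", "blue", "purple")

# Non-zero clusters in order of first appearance along the dendrogram.
appearance <- unique(clusters[clusters != 0])   # 1 3 4 2

# Sorted cluster numbers, which is the order factor() uses for its levels.
cluster_ids <- sort(appearance)                 # 1 2 3 4

# Position of each sorted cluster number in the appearance order.
perm <- match(cluster_ids, appearance)          # 1 4 2 3

# Replace every level except "0" with the color of the matching branch.
clusters_f <- factor(clusters)
bar_cols <- levels(clusters_f)
bar_cols[bar_cols != "0"] <- cols[perm]
levels(clusters_f) <- bar_cols
```

With real data, clusters would be the cutreeDynamic() output indexed by order.dendrogram(dend), and the resulting factor would be passed to colored_bars() with sort_by_labels_order = FALSE. It is worth checking the plot to confirm that the bar colors line up with the branch colors, since the assumption about how branches_attr_by_clusters() assigns values may not hold in every version of dendextend.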
The following code shows the difference between the two settings of this parameter, FALSE and TRUE.
# let's get the clusters
library(dynamicTreeCut)
library(dplyr)
data(iris)
x <- iris[,-5] %>% as.matrix
hc <- x %>% dist %>% hclust
dend <- hc %>% as.dendrogram
# Find special clusters:
clusters <- cutreeDynamic(hc, distM = as.matrix(dist(x)), method = "tree")
# we need to sort them to the order of the dendrogram:
clusters <- clusters[order.dendrogram(dend)]
clusters_numbers <- unique(clusters) - (0 %in% clusters)
n_clusters <- length(clusters_numbers)
library(colorspace)
library(dendextend)
cols <- rainbow_hcl(n_clusters)
true_species_cols <- rainbow_hcl(3)[as.numeric(iris[,][order.dendrogram(dend),5])]
dend2 <- dend %>%
  branches_attr_by_clusters(clusters, values = cols) %>%
  color_labels(col = true_species_cols)
clusters <- factor(clusters)
levels(clusters)[-1] <- cols[-5][c(1,4,2,3)]
plot(dend2);colored_bars(clusters, dend, sort_by_labels_order = FALSE)
# here colored_bars() re-orders the bars to the order of the dendrogram automatically
plot(dend2);colored_bars(clusters, dend, sort_by_labels_order = TRUE)
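To make the two orderings concrete, here is a small base R sketch (not taken from the vignette; the palette is a placeholder). Going by the documentation quoted earlier, sort_by_labels_order = TRUE expects a vector like species_cols, in the original data order. With sort_by_labels_order = FALSE the vector is presumably used as given, so it should already follow the leaves, like species_cols_dend_order.

```r
# Base R only: hclust(), as.dendrogram() and order.dendrogram() come from stats.
x <- as.matrix(iris[, -5])
hc <- hclust(dist(x))
dend <- as.dendrogram(hc)

# One color per observation, in the ORIGINAL row order of the data.
# (Placeholder palette; any three colors would do.)
species_cols <- c("red", "green", "blue")[as.numeric(iris$Species)]

# The same colors rearranged to follow the leaves from left to right.
species_cols_dend_order <- species_cols[order.dendrogram(dend)]

# Leaf i of the dendrogram corresponds to row order.dendrogram(dend)[i] of the data.
leaf_rows <- as.integer(labels(dend))
```

In the code above, clusters has already been indexed by order.dendrogram(dend), so the call with sort_by_labels_order = TRUE likely reorders it a second time. The unsorted cutreeDynamic() output would be the matching input for that setting.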
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Dharman |
