Return Max Correlation and Row Name From Corr Matrix
I am trying to find the maximum correlation in each column of a data.frame object using the cor function. Let's say the object is built like this:
A <- rnorm(100,5,1)
B <- rnorm(100,6,1)
C <- rnorm(100,7,4)
D <- rnorm(100,4,2)
E <- rnorm(100,4,3)
M <- data.frame(A,B,C,D,E)
N <- cor(M)
And the correlation matrix looks like
> N
             A           B            C            D            E
A  1.000000000  0.02676645  0.000462529  0.026875495 -0.054506842
B  0.026766455  1.00000000 -0.150622473  0.037911600 -0.071794930
C  0.000462529 -0.15062247  1.000000000  0.015170017  0.026090225
D  0.026875495  0.03791160  0.015170017  1.000000000 -0.001968634
E -0.054506842 -0.07179493  0.026090225 -0.001968634  1.000000000
For the first column (A), I'd like R to return "D", since it holds the largest non-negative value in column A other than the 1 on the diagonal, along with its associated correlation.
Any ideas?
Solution 1:[1]
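Replace the diagonal with 0 (`diag<-`(N, 0) returns a copy of N with the 1s on the diagonal set to 0) so that a variable is not matched with itself, then use max.col to find the column of each row's maximum. Note that max.col breaks ties at random by default. The column numbers are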
(n <- max.col(`diag<-`(N,0)))
# [1] 4 4 5 2 3
The names are
colnames(N)[n]
# [1] "D" "D" "E" "B" "C"
The values are
N[cbind(seq_len(nrow(N)),n)]
# [1] 0.02687549 0.03791160 0.02609023 0.03791160 0.02609023
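One caveat with setting the diagonal to 0: if every off-diagonal correlation in a row were negative, the 0 would be the row maximum and the variable would be matched with itself. The sketch below is a hypothetical helper, `max_partner()` (not part of any package), that sets the diagonal to -Inf instead, uses which.max on each column, and returns the names and correlations together. It hard-codes the matrix printed in the question so the result is reproducible without a seed.

```r
# Correlation matrix as printed in the question (hard-coded for reproducibility)
N <- matrix(c(
   1.000000000,  0.026766455,  0.000462529,  0.026875495, -0.054506842,
   0.026766455,  1.000000000, -0.150622473,  0.037911600, -0.071794930,
   0.000462529, -0.150622473,  1.000000000,  0.015170017,  0.026090225,
   0.026875495,  0.037911600,  0.015170017,  1.000000000, -0.001968634,
  -0.054506842, -0.071794930,  0.026090225, -0.001968634,  1.000000000),
  nrow = 5, byrow = TRUE,
  dimnames = list(LETTERS[1:5], LETTERS[1:5]))

# Hypothetical helper: for each column of a correlation matrix, find the
# other variable with the highest correlation and report its value.
max_partner <- function(cm) {
  # -Inf (rather than 0) guarantees a variable is never matched with itself,
  # even when all of its other correlations are negative
  diag(cm) <- -Inf
  idx <- apply(cm, 2, which.max)  # row index of the first maximum in each column
  data.frame(variable = colnames(cm),
             partner  = rownames(cm)[idx],
             r        = cm[cbind(idx, seq_len(ncol(cm)))],
             row.names = NULL, stringsAsFactors = FALSE)
}

res <- max_partner(N)
print(res)
```

On this matrix the partner column comes out as D, D, E, B, C, the same as the max.col answer above; the two approaches differ only when a row has no positive off-diagonal correlation.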
Solution 2:[2]
Use apply over the rows to get each row's maximum among the values less than 1 (which excludes the diagonal). Then use which to get the column index of that maximum, and colnames to turn the indices into the actual letters:
set.seed(9)
A <- rnorm(100,5,1)
B <- rnorm(100,6,1)
C <- rnorm(100,7,4)
D <- rnorm(100,4,2)
E <- rnorm(100,4,3)
M <- data.frame(A,B,C,D,E)
N <- cor(M)
N
            A            B           C           D           E
A 1.000000000  0.005865532  0.03595202  0.28933634  0.00795076
B 0.005865532  1.000000000  0.13483843  0.04252079 -0.09567275
C 0.035952017  0.134838434  1.00000000 -0.01160411  0.02588474
D 0.289336335  0.042520787 -0.01160411  1.00000000 -0.12054680
E 0.007950760 -0.095672747  0.02588474 -0.12054680  1.00000000
colnames(N)[apply(N, 1, function (x) which(x==max(x[x<1])))]
[1] "D" "C" "B" "A" "C"
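The one-liner above returns only the names, while the question also asks for the correlation. The sketch below splits it into its three steps and keeps the row maxima so both can be reported together. The variable names `row_max`, `idx` and `result` are illustrative, and the `[1]` after which() is an added safeguard: it keeps only the first match if a row ever contained a tie, which would otherwise make the result a list.

```r
set.seed(9)
A <- rnorm(100, 5, 1)
B <- rnorm(100, 6, 1)
C <- rnorm(100, 7, 4)
D <- rnorm(100, 4, 2)
E <- rnorm(100, 4, 3)
N <- cor(data.frame(A, B, C, D, E))

# Step 1: each row's largest correlation below 1 (this skips the diagonal)
row_max <- apply(N, 1, function(x) max(x[x < 1]))

# Step 2: column index of that maximum; [1] keeps only the first match on ties
idx <- sapply(seq_len(nrow(N)), function(i) which(N[i, ] == row_max[i])[1])

# Step 3: translate indices into variable names and report them with the values
result <- data.frame(variable = rownames(N),
                     partner  = colnames(N)[idx],
                     r        = unname(row_max),
                     stringsAsFactors = FALSE)
print(result)
```

The partner column matches the one-liner's output; the extra r column carries the correlation for each pair.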
Solution 3:[3]
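The corrr package gives a simple way to do it: correlate() returns the correlations as a data frame with NA on the diagonal, and stretch() reshapes that into one row per variable pair, which can then be sorted by r. Note that this lists the strongest pairs overall rather than the best match for each column.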
library(corrr)
library(dplyr)
set.seed(9)
A <- rnorm(100, 5, 1)
B <- rnorm(100, 6, 1)
C <- rnorm(100, 7, 4)
D <- rnorm(100, 4, 2)
E <- rnorm(100, 4, 3)
M <- data.frame(A, B, C, D, E)
N <- corrr::correlate(M)
print(N)
# # A tibble: 5 x 6
# term A B C D E
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 A NA 0.00587 0.0360 0.289 0.00795
# 2 B 0.00587 NA 0.135 0.0425 -0.0957
# 3 C 0.0360 0.135 NA -0.0116 0.0259
# 4 D 0.289 0.0425 -0.0116 NA -0.121
# 5 E 0.00795 -0.0957 0.0259 -0.121 NA
head(dplyr::arrange(corrr::stretch(N, remove.dups = TRUE), desc(r)), 3)
# # A tibble: 3 x 3
# x y r
# <chr> <chr> <dbl>
# 1 A D 0.289
# 2 B C 0.135
# 3 B D 0.0425
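Because stretch() returns one row per pair, the per-column answer the question asks for can also be obtained by keeping both directions of each pair, grouping on x, and taking the row with the largest r. This is a sketch that assumes dplyr 1.0.0 or later (for slice_max()); `best` is an illustrative name.

```r
library(corrr)
library(dplyr)

set.seed(9)
A <- rnorm(100, 5, 1)
B <- rnorm(100, 6, 1)
C <- rnorm(100, 7, 4)
D <- rnorm(100, 4, 2)
E <- rnorm(100, 4, 3)
M <- data.frame(A, B, C, D, E)

best <- corrr::correlate(M) %>%
  corrr::stretch() %>%                        # long format: one row per (x, y) pair, both directions
  filter(!is.na(r)) %>%                       # drop the NA diagonal entries
  group_by(x) %>%
  slice_max(r, n = 1, with_ties = FALSE) %>%  # strongest partner for each variable
  ungroup() %>%
  arrange(x)
print(best)
```

Here y holds the best-matching variable for each x, and r holds the corresponding correlation.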
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | A. Webb |
| Solution 2 | cory |
| Solution 3 | Zettsu Tatsuya |
