Return Max Correlation and Row Name From Corr Matrix

I am trying to find the maximum correlation in each column of a data.frame object using the cor function. Let's say the object looks like this:

A <- rnorm(100,5,1)
B <- rnorm(100,6,1)
C <- rnorm(100,7,4)
D <- rnorm(100,4,2)
E <- rnorm(100,4,3)


M <- data.frame(A,B,C,D,E)
N <- cor(M)

And the correlation matrix looks like

>N

             A           B            C            D            E
A  1.000000000  0.02676645  0.000462529  0.026875495 -0.054506842
B  0.026766455  1.00000000 -0.150622473  0.037911600 -0.071794930
C  0.000462529 -0.15062247  1.000000000  0.015170017  0.026090225
D  0.026875495  0.03791160  0.015170017  1.000000000 -0.001968634
E -0.054506842 -0.07179493  0.026090225 -0.001968634  1.000000000

In the case of the first column (A), I'd like R to return the row name "D", since that row holds the maximum non-negative, non-"1" value in column A, along with its associated correlation.

Any ideas?



Solution 1:[1]

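Zero out the diagonal so that each variable's correlation with itself is ignored, then take the column index of the maximum in each row (the matrix is symmetric, so rows and columns are interchangeable). The column numbers are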

(n <- max.col(`diag<-`(N,0)))
# [1] 4 4 5 2 3

The names are

colnames(N)[n]
# [1] "D" "D" "E" "B" "C"

The values are

N[cbind(seq_len(nrow(N)),n)]
# [1] 0.02687549 0.03791160 0.02609023 0.03791160 0.02609023
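
The three steps above can be combined into a single table. This is a minimal sketch, assuming the correlation matrix from the question is typed in by hand so the result is reproducible. The names N0 and best are illustrative, and ties.method = "first" is an addition, since max.col breaks ties at random by default.

```r
# Correlation matrix from the question, typed in so the example is reproducible
vars <- c("A", "B", "C", "D", "E")
N <- matrix(c(
   1.000000000,  0.026766455,  0.000462529,  0.026875495, -0.054506842,
   0.026766455,  1.000000000, -0.150622473,  0.037911600, -0.071794930,
   0.000462529, -0.150622473,  1.000000000,  0.015170017,  0.026090225,
   0.026875495,  0.037911600,  0.015170017,  1.000000000, -0.001968634,
  -0.054506842, -0.071794930,  0.026090225, -0.001968634,  1.000000000
), nrow = 5, byrow = TRUE, dimnames = list(vars, vars))

# Work on a copy with the diagonal zeroed; same effect as `diag<-`(N, 0)
N0 <- N
diag(N0) <- 0

# Column index of the largest entry in each row (the matrix is symmetric,
# so this is also the row index of the largest entry in each column)
n <- max.col(N0, ties.method = "first")

# One row per variable: its best partner and the correlation between them
best <- data.frame(
  variable = vars,
  partner  = vars[n],
  r        = N[cbind(seq_len(nrow(N)), n)]
)
print(best)
```

Note that if every off-diagonal entry in a row is negative, the zeroed diagonal wins and the variable is paired with itself, which can serve as a signal that no non-negative correlation exists for it.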

Solution 2:[2]

Use apply over the rows to find each row's maximum among the values less than one (which excludes the diagonal). Then use which to get the column index of that maximum, and colnames to turn the index into the variable name.

set.seed(9)
A <- rnorm(100,5,1)
B <- rnorm(100,6,1)
C <- rnorm(100,7,4)
D <- rnorm(100,4,2)
E <- rnorm(100,4,3)

M <- data.frame(A,B,C,D,E)
N <- cor(M)

N
            A            B           C           D           E
A 1.000000000  0.005865532  0.03595202  0.28933634  0.00795076
B 0.005865532  1.000000000  0.13483843  0.04252079 -0.09567275
C 0.035952017  0.134838434  1.00000000 -0.01160411  0.02588474
D 0.289336335  0.042520787 -0.01160411  1.00000000 -0.12054680
E 0.007950760 -0.095672747  0.02588474 -0.12054680  1.00000000

colnames(N)[apply(N, 1, function (x) which(x==max(x[x<1])))]
[1] "D" "C" "B" "A" "C"
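
One caveat with which(x == max(x[x < 1])): it excludes entries by value rather than by position, so a perfect correlation between two different columns would be dropped too, and a tie would make which return more than one index. A possible alternative, sketched here under those assumptions, masks the diagonal explicitly and uses which.max, which returns a single index. best_partner is a hypothetical helper name, not part of the original answer.

```r
set.seed(9)
M <- data.frame(
  A = rnorm(100, 5, 1),
  B = rnorm(100, 6, 1),
  C = rnorm(100, 7, 4),
  D = rnorm(100, 4, 2),
  E = rnorm(100, 4, 3)
)
N <- cor(M)

# Hypothetical helper: for each row of a correlation matrix, find the
# other variable with the largest correlation
best_partner <- function(cm) {
  diag(cm) <- -Inf                   # mask self-correlations by position
  idx <- apply(cm, 1, which.max)     # one column index per row
  data.frame(
    variable  = rownames(cm),
    partner   = colnames(cm)[idx],
    r         = cm[cbind(seq_len(nrow(cm)), idx)],
    row.names = NULL
  )
}

res <- best_partner(N)
print(res)
```

The partner column should match the names returned by the one-liner above, with the correlation included alongside.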

Solution 3:[3]

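The corrr package gives a simple way to do it: correlate() returns the correlations as a tibble with NA on the diagonal, and stretch() reshapes that into one row per pair, which can then be sorted by correlation.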

library(corrr)
library(dplyr)
set.seed(9)
A <- rnorm(100, 5, 1)
B <- rnorm(100, 6, 1)
C <- rnorm(100, 7, 4)
D <- rnorm(100, 4, 2)
E <- rnorm(100, 4, 3)
M <- data.frame(A, B, C, D, E)
N <- corrr::correlate(M)

print(N)
# # A tibble: 5 x 6
#   term         A        B       C       D        E
#   <chr>    <dbl>    <dbl>   <dbl>   <dbl>    <dbl>
# 1 A     NA        0.00587  0.0360  0.289   0.00795
# 2 B      0.00587 NA        0.135   0.0425 -0.0957
# 3 C      0.0360   0.135   NA      -0.0116  0.0259
# 4 D      0.289    0.0425  -0.0116 NA      -0.121
# 5 E      0.00795 -0.0957   0.0259 -0.121  NA

head(dplyr::arrange(corrr::stretch(N, remove.dups = TRUE), desc(r)), 3)
# # A tibble: 3 x 3
#   x     y          r
#   <chr> <chr>  <dbl>
# 1 A     D     0.289
# 2 B     C     0.135
# 3 B     D     0.0425
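
The head() call above lists the three strongest pairs overall. To get the strongest partner for each variable instead, which is what the question asks for, one possible sketch keeps both orientations of every pair and takes the top row per group. It assumes dplyr 1.0.0 or later for slice_max(); per_variable is an illustrative name.

```r
library(corrr)
library(dplyr)

set.seed(9)
M <- data.frame(
  A = rnorm(100, 5, 1),
  B = rnorm(100, 6, 1),
  C = rnorm(100, 7, 4),
  D = rnorm(100, 4, 2),
  E = rnorm(100, 4, 3)
)

per_variable <- correlate(M) %>%
  stretch(na.rm = TRUE) %>%                  # long format (x, y, r); diagonal NAs dropped
  group_by(x) %>%                            # one group per variable
  slice_max(r, n = 1, with_ties = FALSE) %>% # keep the highest correlation in each group
  ungroup()

print(per_variable)
```

Leaving remove.dups at its default keeps both A-D and D-A, so every variable appears in the x column and gets its own row in the result.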

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution    Source
Solution 1  A. Webb
Solution 2  cory
Solution 3  Zettsu Tatsuya