Fill in a matrix based on whether the index rows are similar or different
I have a very large pairwise distance matrix in R. I'd like to code each cell in the matrix based on whether the row/column names are the same or different.
On a smaller scale, the row/column names would be:
individuals <- c("apple", "pear", "apple", "cranberry", "peach", "apple")
I would like a matrix with 1 for each comparison involving apple, except for comparisons of apple to apple. That would look like:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] "0" "1" "1" "1" "1" "1"
[2,] "1" "0" "1" "0" "0" "1"
[3,] "1" "1" "0" "1" "1" "1"
[4,] "1" "0" "1" "0" "0" "1"
[5,] "1" "0" "1" "0" "0" "1"
[6,] "1" "1" "1" "1" "1" "0"
I know I can achieve this by doing:
final.matrix <- matrix(nrow= length(individuals), ncol = length(individuals))
final.matrix[grep("apple", individuals),] <- 1
final.matrix[,grep("apple", individuals)] <- 1
diag(final.matrix) <- 0
final.matrix[is.na(final.matrix)] <- 0
But there's gotta be a cleaner/simpler way. What am I missing?
Additionally, this doesn't work when the row/column names are a tibble, which is how they are in reality. Suggestions for a solution that works with tibbles?
library(tibble)
tibble_inds <- as_tibble(individuals)
grep("apple", tibble_inds)
# 1
Solution 1:[1]
It sounds like you want
outer(x, x, function(a, b) as.integer(a + b == 1L))
where
x <- tibble_inds[[1L]] == "apple"
if you accept only "apple" or
x <- grepl("apple", tibble_inds[[1L]])
if you accept any string having "apple" as a substring.
I am assuming that your character vector individuals is the first variable in tibble_inds. In this case, outer returns
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 0 1 0 1 1 0
## [2,] 1 0 1 0 0 1
## [3,] 0 1 0 1 1 0
## [4,] 1 0 1 0 0 1
## [5,] 1 0 1 0 0 1
## [6,] 0 1 0 1 1 0
for both choices of x. This result doesn't match yours, because your diag<- call misses [1,3], [3,1], [3,6], [6,3], [1,6], and [6,1].
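To make the discrepancy concrete, here is a minimal sketch that builds both matrices and lists the cells where they differ. It assumes exact matching on "apple" and uses the question's approach with the pattern spelled "apple"; the names asker and xor_mat are illustrative and do not come from the original posts. It also uses xor as the function passed to outer, which is an equivalent way of saying "exactly one of the pair is apple".

```r
individuals <- c("apple", "pear", "apple", "cranberry", "peach", "apple")
x <- individuals == "apple"

# Exclusive-or: 1 when exactly one member of the pair is "apple"
xor_mat <- +outer(x, x, xor)

# The question's approach: flag every row and column involving "apple",
# then zero only the diagonal
asker <- matrix(0, length(x), length(x))
asker[x, ] <- 1
asker[, x] <- 1
diag(asker) <- 0

# Row/column indices of the cells where the two results disagree
which(asker != xor_mat, arr.ind = TRUE)
```

The only disagreements are the off-diagonal apple-to-apple cells, which the question's approach leaves at 1.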
Solution 2:[2]
Another possible solution:
individuals <- c("apple", "pear", "apple", "cranberry", "peach", "apple")
m <- matrix(0, length(individuals), length(individuals))
for (i in seq_along(individuals))
  for (j in seq_along(individuals))
    m[i, j] <- +(sum(c(individuals[i], individuals[j]) == "apple") == 1)
m
#> [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,] 0 1 0 1 1 0
#> [2,] 1 0 1 0 0 1
#> [3,] 0 1 0 1 1 0
#> [4,] 1 0 1 0 0 1
#> [5,] 1 0 1 0 0 1
#> [6,] 0 1 0 1 1 0
Or replacing the nested for loop by a nested sapply:
# sapply returns the matrix directly, so assign its result to m
m <- sapply(seq_along(individuals), \(i) sapply(seq_along(individuals),
  \(j) +(sum(c(individuals[i], individuals[j]) == "apple") == 1)))
m
#> [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,] 0 1 0 1 1 0
#> [2,] 1 0 1 0 0 1
#> [3,] 0 1 0 1 1 0
#> [4,] 1 0 1 0 0 1
#> [5,] 1 0 1 0 0 1
#> [6,] 0 1 0 1 1 0
Solution 3:[3]
We can use outer, as shown below:
> x <- grepl("apple",individuals)
> +(outer(x, x, `+`) == 1)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0 1 0 1 1 0
[2,] 1 0 1 0 0 1
[3,] 0 1 0 1 1 0
[4,] 1 0 1 0 0 1
[5,] 1 0 1 0 0 1
[6,] 0 1 0 1 1 0
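Since the question asks for something that also works when the names live in a tibble, the outer approach can be wrapped in a small helper. This is only a sketch: the function name mixed_pair_matrix and its arguments are hypothetical, and it assumes that the labels sit in the first column when a data frame or tibble is passed. It uses exact matching; swap in grepl if substring matching is wanted.

```r
# Hypothetical helper: 1 where exactly one member of a pair equals `target`
mixed_pair_matrix <- function(labels, target) {
  # Assumption: for a data frame or tibble, the labels are the first column
  if (is.data.frame(labels)) labels <- labels[[1L]]
  x <- labels == target
  # `!=` on two logical vectors is TRUE when exactly one of them is TRUE
  m <- +outer(x, x, `!=`)
  # Label rows and columns so the result lines up with the distance matrix
  dimnames(m) <- list(labels, labels)
  m
}

individuals <- c("apple", "pear", "apple", "cranberry", "peach", "apple")
m <- mixed_pair_matrix(individuals, "apple")
m
```

Passing a one-column tibble such as tibble_inds should give the same matrix, because a tibble is a data frame and the helper extracts its first column before comparing.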
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Mikael Jagan |
| Solution 2 | |
| Solution 3 | |
