Association data frame in R

I am trying to build an association matrix between viruses and their hosts. I have a data frame with two columns (pathogen and host), like this one:

pathogen <- c("A_virus", "B_virus","B_virus", "C_virus","C_virus", "D_virus", "D_virus")
host <- c("Human", "Human","Dog", "Lion", "Human", "Gorilla", "Dog")
FoundIn <- data.frame(pathogen,host)

FoundIn

  pathogen    host
1  A_virus   Human
2  B_virus   Human
3  B_virus     Dog
4  C_virus    Lion
5  C_virus   Human
6  D_virus Gorilla
7  D_virus     Dog

I would like a data frame that marks an association as 1 and no association as 0, like this:

         Human  Dog  Lion  Gorilla  
A_virus   1      0     0      0   
B_virus   1      1     0      0  
C_virus   1      0     1      0  
D_virus   0      1     0      1   

Is there a simple way to do this?



Solution 1:[1]

Use xtabs:

xtabs(~ pathogen + host, data = FoundIn)
#          host
# pathogen  Dog Gorilla Human Lion
#   A_virus   0       0     1    0
#   B_virus   1       0     1    0
#   C_virus   0       0     1    1
#   D_virus   1       1     0    0

or

table(FoundIn$pathogen, FoundIn$host) # same counts, but without the pathogen/host dimension labels
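Both calls return counts, which happen to be 0 or 1 here only because each pathogen/host pair appears once in the sample data. If the real data can contain repeated pairs, the counts can exceed 1. A minimal sketch of collapsing counts to presence/absence, using a small hypothetical data frame (dup is made up for illustration and is not from the question):

```r
# Hypothetical data with a repeated pathogen/host pair (not from the question)
dup <- data.frame(
  pathogen = c("A_virus", "A_virus", "B_virus"),
  host     = c("Human", "Human", "Dog")
)

# Plain counts: the A_virus/Human cell is 2 because the pair occurs twice
counts <- table(dup$pathogen, dup$host)

# The comparison yields TRUE/FALSE per cell; adding 0L turns that into integer 1/0
presence <- (counts > 0) + 0L
presence
```

Another option under the same assumption is to call unique() on the data frame before tabulating, so each pair is counted at most once.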

Note that the result of xtabs (or table) is not a data.frame; it has class table. To turn that layout into a data.frame, you have to use row names. That's certainly feasible,

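tbl <- xtabs(~ pathogen + host, data = FoundIn)
# as.data.frame() on a table gives the long format (pathogen, host, Freq),
# so treat the object as a plain matrix first to keep the wide layout
class(tbl) <- "matrix"
as.data.frame(tbl)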
#         Dog Gorilla Human Lion
# A_virus   0       0     1    0
# B_virus   1       0     1    0
# C_virus   0       0     1    1
# D_virus   1       1     0    0

but be aware that many tools (especially dplyr and other tidyverse packages) ignore and sometimes intentionally remove row names, so relying on them is often discouraged. The usual recommendation is to move them into an explicit column instead (e.g., with tibble::rownames_to_column; it is easy enough in base R too).
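As a rough sketch of the base R route mentioned above, the row names can be copied into an ordinary column and then dropped. Note that this uses as.data.frame.matrix() as a swapped-in alternative to the class<- step shown earlier, and that the names wide and the new column name pathogen are choices made for this example:

```r
# Rebuild the example data from the question
pathogen <- c("A_virus", "B_virus", "B_virus", "C_virus", "C_virus", "D_virus", "D_virus")
host <- c("Human", "Human", "Dog", "Lion", "Human", "Gorilla", "Dog")
FoundIn <- data.frame(pathogen, host)

tbl <- xtabs(~ pathogen + host, data = FoundIn)

# Convert the table to a wide data frame; pathogens end up as row names
wide <- as.data.frame.matrix(tbl)

# Copy the row names into an explicit column and reset row names to 1..n
wide <- data.frame(
  pathogen = rownames(wide),
  wide,
  row.names = NULL,
  stringsAsFactors = FALSE
)
wide
```

With the tibble package installed, tibble::rownames_to_column(wide, var = "pathogen") applied to the row-named version should give the same layout.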

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 r2evans