How to run cor.test() on two different dataframes

I would like to run cor.test() on two separate dataframes, but I am unsure how to proceed.

I have two example dataframes with identical columns (patients) but differing rows (bacteria and genes, respectively):

df1 (bacteria):

               1C  1L   2C  2L
Staphylococcus 10  400  20  600
Enterococcus   15  607  39  800

df2 (genes):

     1C  1L   2C  2L
IL4  60  300  90  450
IL8  30  600  54  750
TNFA 89  450  96  600
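For a reproducible example, the two tables could be entered in R as shown here. This is a sketch: the object names df1 and df2 are taken from the cor.test() call in the question, and check.names = FALSE is an assumption used only so the columns keep their original 1C-style names (by default R would rewrite them to X1C, X1L, and so on).

```r
# df1: bacterial counts (rows = bacteria, columns = patients)
df1 <- data.frame(
  `1C` = c(10, 15), `1L` = c(400, 607), `2C` = c(20, 39), `2L` = c(600, 800),
  row.names = c("Staphylococcus", "Enterococcus"),
  check.names = FALSE  # keep non-syntactic names such as "1C"
)

# df2: gene expression (rows = genes, same patient columns as df1)
df2 <- data.frame(
  `1C` = c(60, 30, 89), `1L` = c(300, 600, 450), `2C` = c(90, 54, 96), `2L` = c(450, 750, 600),
  row.names = c("IL4", "IL8", "TNFA"),
  check.names = FALSE
)
```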

I want to run a Spearman correlation test between the two dataframes to see whether bacterial counts (abundance) are associated with increased gene expression. Basically, I want to test every bacterium against every gene.

I have tried running:

cor.test(df1, df2, method = "spearman", alternative = c("two.sided"))

But I get this error:

Error in cor.test.default(df1, df2, method = "spearman",  : 
  'x' and 'y' must have the same length


Solution 1:[1]

I think the issue is that you are passing whole data frames to cor.test(), while the function expects two numeric vectors, x and y, of the same length (paired observations of two variables).
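For a single bacterium and a single gene, a valid call could look like this sketch. It assumes the data are stored as in the question, with bacteria and genes as row names and the patient columns in the same order in both data frames (the data are re-created so the snippet runs on its own). unlist() turns one row into a plain numeric vector with one value per patient.

```r
df1 <- data.frame(`1C` = c(10, 15), `1L` = c(400, 607), `2C` = c(20, 39), `2L` = c(600, 800),
                  row.names = c("Staphylococcus", "Enterococcus"), check.names = FALSE)
df2 <- data.frame(`1C` = c(60, 30, 89), `1L` = c(300, 600, 450), `2C` = c(90, 54, 96),
                  `2L` = c(450, 750, 600), row.names = c("IL4", "IL8", "TNFA"), check.names = FALSE)

# One row from each data frame -> two numeric vectors of length 4 (one value per patient).
# This pairing is only meaningful if the patient columns are in the same order in df1 and df2.
x <- unlist(df1["Staphylococcus", ])
y <- unlist(df2["IL4", ])

res <- cor.test(x, y, method = "spearman", alternative = "two.sided")
res$estimate  # Spearman's rho for this single bacterium-gene pair
res$p.value
```

With this toy data, both rows rank the four patients in the same order (1C < 2C < 1L < 2L), so rho comes out as 1. Testing all bacteria against all genes means repeating this for every pair.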

To compare every gene with every bacterial count across subjects, you have to get the data into a format the function can work with. You can use pivot_longer() from tidyr to reshape both tables into long format, and then merge() to join them on subject.

library(tidyr)  # pivot_longer()

Bacteria <- data.frame(name=c("Staph", "Enter"), C1=c(10,15), L1=c(400,607), C2=c(20,39), L2=c(600, 800))
Genes <- data.frame(name=c("IL4", "IL8", "TNFA"), C1=c(60,30,89), L1=c(300,600,450), C2=c(90,54,96), L2=c(450,750,600))

# Long format: one row per (name, Subject) combination
Bacteria <- pivot_longer(Bacteria, -1, names_to = "Subject", values_to="Counts")
Genes <- pivot_longer(Genes, -1, names_to = "Subject", values_to="Counts")

# Pair every bacterium with every gene within the same subject
FullSet <- merge(Bacteria, Genes, by="Subject", suffixes = c(".Bac", ".Gene"))

# Note: this single test pools all bacterium-gene pairs together
cor.test(FullSet$Counts.Bac, FullSet$Counts.Gene, method="spearman", alternative="two.sided")

Edit: to test each bacterium-gene pair separately and create a nice-looking corrplot with a p-value matrix:

library(tidyverse)  # loads dplyr (bind_rows) and tidyr (pivot_wider)
library(corrplot)   # corrplot()

# Run one Spearman test on the rows belonging to a single bacterium-gene pair
MakeStats <- function(x) {
  result <- cor.test(x$Counts.Bac, x$Counts.Gene, method="spearman", alternative="two.sided")
  return(data.frame(Bacteria=x$name.Bac[1], Gene=x$name.Gene[1], Estimate=result$estimate[1], PValue=result$p.value, row.names=NULL))
}

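# One data frame per bacterium-gene combination
ListOfTests <- split(FullSet, list(FullSet$name.Bac, FullSet$name.Gene), drop = TRUE)
Results <- bind_rows(lapply(ListOfTests, MakeStats))

# Separate p-values and estimates, then reshape each to genes x bacteria
PValues <- Results[, c("Bacteria", "Gene", "PValue")]
Estimates <- Results[, c("Bacteria", "Gene", "Estimate")]
Estimates <- pivot_wider(Estimates, id_cols="Gene", names_from="Bacteria", values_from="Estimate")
PValues <- pivot_wider(PValues, id_cols="Gene", names_from="Bacteria", values_from="PValue")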

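# Numeric matrices with genes as row names and bacteria as columns
EstMatrix <- as.matrix(data.frame(Estimates[-1], row.names = Estimates$Gene))
PMatrix <- as.matrix(data.frame(PValues[-1], row.names = PValues$Gene))

# Cells whose p-value exceeds sig.level (0.05 by default) are marked with the pch symbol
corrplot(EstMatrix, method="square", p.mat = PMatrix, pch=8)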
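As a cross-check that skips the reshaping, base R can build the same two matrices directly. This is a swapped-in base-R alternative, not part of the answer above, and it assumes df1/df2 are laid out as in the question (bacteria and genes as row names, same patient columns). cor() with two matrices correlates every column of the first with every column of the second, so transposing the data (patients become rows) gives one Spearman rho per bacterium-gene pair. The p-values still need one cor.test() per pair, done here with nested sapply().

```r
df1 <- data.frame(`1C` = c(10, 15), `1L` = c(400, 607), `2C` = c(20, 39), `2L` = c(600, 800),
                  row.names = c("Staphylococcus", "Enterococcus"), check.names = FALSE)
df2 <- data.frame(`1C` = c(60, 30, 89), `1L` = c(300, 600, 450), `2C` = c(90, 54, 96),
                  `2L` = c(450, 750, 600), row.names = c("IL4", "IL8", "TNFA"), check.names = FALSE)

# Transpose: 4 patients x 2 bacteria, and 4 patients x 3 genes
bac <- t(as.matrix(df1))
gen <- t(as.matrix(df2))

# Spearman rho for every pair: rows = bacteria, columns = genes
rho <- cor(bac, gen, method = "spearman")

# Matching p-value matrix (same orientation): one cor.test() per pair
pvals <- sapply(colnames(gen), function(g)
  sapply(colnames(bac), function(b)
    cor.test(bac[, b], gen[, g], method = "spearman")$p.value))
```

Both matrices have bacteria as rows, so t(rho) and t(pvals) match the genes-by-bacteria orientation of EstMatrix and PMatrix above and could be passed to corrplot() the same way.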

[Image: Corrplot with PValues]

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1