Issue with cor.test() results

I am running cor.test() to look for potential correlations between genes and bacteria in a data frame, using this code:

spearman=cor.test(FullSet$counts.Bac, FullSet$counts.Gene, method="spearman", alternative=c("two.sided"))

My dataframe is structured as follows:

Subject  name.Bac    counts.Bac  name.Gene  counts.Gene
10C      Finegoldia  -2.07       CCL4       1.73
10C      Finegoldia  -2.07       CKAP4      6.7

In total, my data frame has approximately 4 million rows, because I am testing about 2000 genes against 33 bacteria across 24 patients.

Running the code above gives this result:

Spearman's rank correlation rho

data:  FullSet$counts.Bac and FullSet$counts.Gene
S = 1.1501e+19, p-value = 8.368e-09
alternative hypothesis: true rho is not equal to 0
sample estimates:
         rho 
-0.002845856 

However, I want the results as a matrix containing the individual test result and p-value for each bacterium/gene comparison, so that I can plot them with corrplot(). What is the best way to do this?
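For context, the single rho above arises because cor.test() was given the two whole columns, so it treated all ~4 million rows as one paired sample and mixed every bacterium/gene pair together. A small sketch with hypothetical, randomly generated data in the same long layout shows the difference between one pooled test and one correlation per pair; the names BacA, GeneX, etc. are made up for illustration only.

```r
# Hypothetical toy data in the question's long layout (random values, illustration only)
set.seed(1)
subjects <- paste0("S", 1:6)
bac <- expand.grid(Subject = subjects, name.Bac = c("BacA", "BacB"),
                   stringsAsFactors = FALSE)
bac$counts.Bac <- rnorm(nrow(bac))
gene <- expand.grid(Subject = subjects, name.Gene = c("GeneX", "GeneY"),
                    stringsAsFactors = FALSE)
gene$counts.Gene <- rnorm(nrow(gene))
# Every bacterium paired with every gene within each subject: 6 x 2 x 2 = 24 rows
toy <- merge(bac, gene, by = "Subject")

# Pooled, like the question's call: both whole columns at once, so one rho overall.
# Values repeat across pairs (ties), so exact = FALSE uses the asymptotic p-value
# instead of warning that the exact one cannot be computed.
pooled <- cor.test(toy$counts.Bac, toy$counts.Gene,
                   method = "spearman", exact = FALSE)

# Grouped: one Spearman rho per bacterium/gene pair
per_pair <- sapply(split(toy, list(toy$name.Bac, toy$name.Gene)),
                   function(x) cor(x$counts.Bac, x$counts.Gene, method = "spearman"))

print(pooled$estimate)  # a single number
print(per_pair)         # one number per pair, named like "BacA.GeneX"
```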



Solution 1:[1]

Maybe something like this?
Partition the data by name.Bac and name.Gene, and run one test per group in an lapply loop. Then extract the relevant values with sapply and combine them into a results matrix with cbind.

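# One data frame per bacterium/gene pair; drop = TRUE skips absent combinations
sp <- split(FullSet, list(FullSet$name.Bac, FullSet$name.Gene), drop = TRUE)

# One Spearman test per pair (the \(x) shorthand needs R >= 4.1.0)
spearman_list <- lapply(sp, \(x) {
  cor.test(x$counts.Bac, x$counts.Gene, method = "spearman", alternative = "two.sided")
})

# Extract the S statistic, the rho estimate and the p-value from each htest object
stat <- sapply(spearman_list, `[[`, 'statistic')
rho  <- sapply(spearman_list, `[[`, 'estimate')
pval <- sapply(spearman_list, `[[`, 'p.value')

cbind(stat, rho, pval)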
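corrplot() wants a numeric matrix, whereas the cbind() result has one row per pair. Below is a minimal base-R sketch, assuming the per-pair approach above, that reshapes the results into bacteria × gene matrices of rho and p-values. The FullSet built here is hypothetical random data in the question's layout; with the real data you would skip the simulation part.

```r
# Hypothetical stand-in for FullSet (random values, same long layout as the question)
set.seed(42)
subjects <- paste0("S", 1:8)
bac <- expand.grid(Subject = subjects, name.Bac = c("BacA", "BacB", "BacC"),
                   stringsAsFactors = FALSE)
bac$counts.Bac <- rnorm(nrow(bac))
gene <- expand.grid(Subject = subjects, name.Gene = c("GeneX", "GeneY"),
                    stringsAsFactors = FALSE)
gene$counts.Gene <- rnorm(nrow(gene))
FullSet <- merge(bac, gene, by = "Subject")

# One Spearman test per bacterium/gene pair, keeping the pair labels with the results
sp <- split(FullSet, list(FullSet$name.Bac, FullSet$name.Gene), drop = TRUE)
res <- do.call(rbind, lapply(sp, function(x) {
  ct <- cor.test(x$counts.Bac, x$counts.Gene, method = "spearman")
  data.frame(bac = x$name.Bac[1], gene = x$name.Gene[1],
             rho = unname(ct$estimate), p = ct$p.value)
}))

# Reshape the long results into bacteria (rows) x genes (columns) matrices
bacs  <- sort(unique(res$bac))
genes <- sort(unique(res$gene))
rho_mat <- matrix(NA_real_, length(bacs), length(genes),
                  dimnames = list(bacs, genes))
p_mat <- rho_mat
idx <- cbind(match(res$bac, bacs), match(res$gene, genes))  # (row, col) per result
rho_mat[idx] <- res$rho
p_mat[idx]   <- res$p

print(round(rho_mat, 2))
```

The two matrices can then be passed to corrplot, e.g. corrplot::corrplot(rho_mat, p.mat = p_mat, sig.level = 0.05), which accepts non-square matrices. With roughly 66,000 bacterium/gene tests you may also want to adjust the p-values first; p_mat[] <- p.adjust(p_mat, method = "BH") does this while keeping the matrix shape.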

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Rui Barradas