'Pairwise Correlation Table
I'm new to R, so I apologize if this is a straightforward question, however I've done quite a bit of searching this evening and can't seem to figure it out. I've got a data frame with a whole slew of variables, and what I'd like to do is create a table of the correlations among a subset of these, basically the equivalent of "pwcorr" in Stata, or "correlations" in SPSS. The one key to this is that not only do I want the r, but I also want the significance associated with that value.
Any ideas? This seems like it should be very simple, but I can't seem to figure out a good way.
Solution 1:[1]
Here is something that I just made, I stumbled on this post because I was looking for a way to take every pair of variables, and get a tidy nX3 dataframe. Column 1 is a variable, Column 2 is a variable, and Column 3 and 4 are their absolute value and true correlation. Just pass the function a dataframe of numeric and integer values.
pairwiseCor <- function(dataframe){
pairs <- combn(names(dataframe), 2, simplify=FALSE)
df <- data.frame(Vairable1=rep(0,length(pairs)), Variable2=rep(0,length(pairs)),
AbsCor=rep(0,length(pairs)), Cor=rep(0,length(pairs)))
for(i in 1:length(pairs)){
df[i,1] <- pairs[[i]][1]
df[i,2] <- pairs[[i]][2]
df[i,3] <- round(abs(cor(dataframe[,pairs[[i]][1]], dataframe[,pairs[[i]][2]])),4)
df[i,4] <- round(cor(dataframe[,pairs[[i]][1]], dataframe[,pairs[[i]][2]]),4)
}
pairwiseCorDF <- df
pairwiseCorDF <- pairwiseCorDF[order(pairwiseCorDF$AbsCor, decreasing=TRUE),]
row.names(pairwiseCorDF) <- 1:length(pairs)
pairwiseCorDF <<- pairwiseCorDF
pairwiseCorDF
}
This is what the output is:
> head(pairwiseCorDF)
Vairable1 Variable2 AbsCor Cor
1 roll_belt accel_belt_z 0.9920 -0.9920
2 gyros_dumbbell_x gyros_dumbbell_z 0.9839 -0.9839
3 roll_belt total_accel_belt 0.9811 0.9811
4 total_accel_belt accel_belt_z 0.9752 -0.9752
5 pitch_belt accel_belt_x 0.9658 -0.9658
6 gyros_dumbbell_z gyros_forearm_z 0.9491 0.9491
Solution 2:[2]
I've found that the R package picante does a nice job dealing with the problem that you have. You can easily pass your dataset to the cor.table function and get a table of correlations and p-values for all of your variables. You can specify Pearson's r or Spearman in the function. See this link for help: http://www.inside-r.org/packages/cran/picante/docs/cor.table
Also remember to remove any non-numeric columns from your dataset prior to running the function. Here's an example piece of code:
install.packages("picante")
library(picante)
#Insert the name of your dataset in the code below
cor.table(dataset, cor.method="pearson")
Solution 3:[3]
You can use the sjt.corr function of the sjPlot-package, which gives you a nicely formatted correlation table, ready for use in your Office application.
Simplest function call is just to pass the data frame:
sjt.corr(df)
See examples here.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | user3728456 |
| Solution 2 | jscole |
| Solution 3 | Daniel |
