Corrplot with a lot of variables

How can I fix a corrplot with lots of variables? Visually, it's not appealing. Below is my code:

corrplot(cor(dataT[,c("D_Wavg_EASI_DENSITY_CWR_2", "D_Wavg_EASI_POP16_CWR_2", 
                                "D_Wavg_EASI_URBANPOP_P_CWR_2", "D_Wavg_EASI_RURALPOP_P_CWR_2",
                                "D_Wavg_EASI_MEDHHINC_CWR_2", "D_Wavg_EASI_ED_C_P_CWR_2",
                                "D_Wavg_EASI_WHCOLROCC_P_CWR_2", "D_Wavg_EASI_BLCOLROCC_P_CWR_2",
                                "D_Wavg_EASI_CARTHEFT_CWR_2", "D_Wavg_EASI_TOTCRIME_CWR_2",
                                "D_Wavg_EASI_MAXTEMP_CWR_2", "D_Wavg_EASI_MINTEMP_CWR_2",
                                "D_Wavg_EASI_RAINDAYS_CWR_2", "D_Wavg_EASI_SNOWDAYS_CWR_2",
                                "D_Wavg_EASI_ANNULRAIN_CWR_2", "D_Wavg_EASI_ANNULSNOW_CWR_2",
                                "D_Wavg_EASI_EASIWETHI_CWR_2", "D_Wavg_EASI_MED_INC_CWR_2",
                                "D_Wavg_EASI_PROPCRIME_CWR_2","D_Wavg_EASI_LARCENY_CWR_2",
                                "D_Wavg_EASI_BURGLARY_CWR_2","D_Wavg_EASI_ROBBERY_CWR_2")]))

And this is what the output looks like. The names of the variables overlap each other.

r


Solution 1:[1]

If you have a LOT of variables (e.g. 100), so that reading the individual row names is not feasible, one approach is to label them by common groupings, if such groupings exist in your data. For me this has been useful for illustrating brain networks, where I have hundreds of brain regions but want to show them grouped by network.

The key is to pipe the result of corrplot() into the corrRect() function, which draws rectangles on your plot. Also, in your corrplot() call, remove the messy row names with: tl.pos = 'n'

library(corrplot)
library(magrittr) # provides the %>% pipe used below

cars <- cor(mtcars) # Get the correlation matrix
rownames(cars) # List the row names so you can choose where you would like the boundaries

# Each row names one rectangle as: "top left corner coord 1", "top left corner coord 2",
# "bottom right corner coord 1", "bottom right corner coord 2"
r_horiz <- rbind(c('mpg', 'mpg', 'hp', 'carb'),
                 c('mpg', 'mpg', 'qsec', 'carb'),
                 c('mpg', 'mpg', 'carb', 'carb')) # Corners of your horizontal rectangles

# To go from horizontal to vertical (for symmetrical matrices), flip the order of the 3rd and 4th names
r_vert <- rbind(c('mpg', 'mpg', 'carb', 'hp'),
                c('mpg', 'mpg', 'carb', 'qsec'),
                c('mpg', 'mpg', 'carb', 'carb')) # Corners of your vertical rectangles

# You can include multiple batches of rectangles and use colour to convey their different meanings
Lab2 <- rbind(c('mpg', 'mpg', 'wt', 'carb'),
              c('mpg', 'mpg', 'carb', 'wt'),
              c('mpg', 'mpg', 'carb', 'carb'))

# The original snippet used col(100) without defining col; this palette is an assumption
col <- colorRampPalette(c("blue", "white", "red"))
plot_cor <- corrplot(as.matrix(cars), method = "color",
             col = col(100), col.lim = c(-1, 1), tl.pos = 'n', # HERE, tl.pos = 'n' removes the individual row labels. If you have few enough variables, you can use tl.cex = 0.4 instead to keep them, but smaller
             title = "This is my corrplot", mar = c(0, 0, 1, 0), diag = TRUE) %>%
  corrRect(namesMat = r_horiz) %>% corrRect(namesMat = r_vert) %>% corrRect(namesMat = Lab2, col = "red")

Example corrplot with groupings illustrated with lines
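
If the number of variables is small enough that you still want the names, the tl.cex alternative mentioned in the code comment above is the lighter fix. This is a minimal sketch on mtcars; the label size of 0.4 comes from that comment, while the 45 degree rotation and the black label colour are assumed starting values to tune, not part of the original answer.

```r
library(corrplot)

cars <- cor(mtcars)

# Keep the variable names but shrink them instead of removing them.
# tl.cex scales the label text, tl.srt rotates the column labels,
# and tl.col sets the label colour.
res <- corrplot(cars, method = "color",
                tl.cex = 0.4, tl.srt = 45, tl.col = "black")
```

With names as long as the ones in the question, shrinking alone may not be enough, in which case removing the labels and marking groups as shown above is the fallback.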

To label the groups, you can set the row and column names to empty strings ("") for all but the middle row of each group and put the group name there, or you can add the labels after the fact with annotations. If you find these methods too clunky, perhaps you should use ggplot instead of corrplot. The basic call, starting from a correlation matrix reshaped into long format (one row per pair of variables, here with columns V1, V2 and value), would be

ggplot(data = df, aes(x = V1, y = V2, fill = value)) + geom_tile()

and you could add lines and annotations with

+ geom_hline(yintercept = 5) + geom_vline(xintercept = 10) +
 annotate("text", x = 5, y = 5, label = "Group1") 
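
Put together, the ggplot route could look like the sketch below. It is one possible way to fill in the fragments above, not the original author's code: the reshaping via as.table(), the colour scale, the boundary positions (after the 4th variable) and the label "Group1" are all assumptions chosen for illustration on mtcars.

```r
library(ggplot2)

cars <- cor(mtcars)
vars <- colnames(cars)

# Reshape the correlation matrix into long format: one row per cell.
df <- as.data.frame(as.table(cars), stringsAsFactors = FALSE)
names(df) <- c("V1", "V2", "value")

# Fix the axis order so the plot follows the matrix order,
# with the first variable at the top of the y axis.
df$V1 <- factor(df$V1, levels = vars)
df$V2 <- factor(df$V2, levels = rev(vars))

p <- ggplot(df, aes(x = V1, y = V2, fill = value)) +
  geom_tile() +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red",
                       limits = c(-1, 1)) +
  # Assumed group boundary: after the 4th variable (hp).
  # On a discrete axis, position k.5 lies between items k and k + 1.
  geom_vline(xintercept = 4.5) +
  geom_hline(yintercept = 7.5) +
  # Label placed at the centre of the first 4 x 4 block.
  annotate("text", x = 2.5, y = 9.5, label = "Group1") +
  # Hide the individual variable names, as tl.pos = 'n' does in corrplot.
  theme(axis.text = element_blank(), axis.ticks = element_blank()) +
  labs(x = NULL, y = NULL)

print(p)
```

Because the y axis is reversed, the horizontal boundary sits at 7.5 rather than 4.5: with 11 variables, the first four occupy positions 11 down to 8.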

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Kirk Geier