Correlation problems with two variables that contain NA

I have two variables and I want to know whether they are correlated. Their values are:

X = 14,15,16,18,12,13,14,15
Y = NA, 13,12, NA, NA, 16,16, NA

But when I run the following, the result is NA:

cor(X, Y)

NA



Solution 1:[1]

If you can tolerate dropping every data point for which either X or Y is NA, then you can call cor() with the option use='complete.obs':

X <- c(14, 15, 16, 18, 12, 13, 14, 15)
Y <- c(NA, 13, 12, NA, NA, 16, 16, NA)

cor(X, Y, use='complete.obs', method='pearson')
[1] -0.9393364

You can verify for yourself that the result above is the same as the correlation computed on only the four complete pairs:

X <- c(15, 16, 13, 14)
Y <- c(13, 12, 16, 16)
cor(X, Y, method='pearson')

In other words, use='complete.obs' simply drops those data points for which either X or Y has an NA value.
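
The same filtering can also be written out explicitly rather than retyping the surviving values by hand. This is a minimal sketch, assuming the X and Y vectors from the question; the names keep, r_manual and r_builtin are illustrative and not part of the original answer:

```r
X <- c(14, 15, 16, 18, 12, 13, 14, 15)
Y <- c(NA, 13, 12, NA, NA, 16, 16, NA)

# Logical vector: TRUE where neither X nor Y is NA
keep <- complete.cases(X, Y)

# Number of pairs that remain after dropping incomplete ones
n_pairs <- sum(keep)

# Correlation on the manually filtered pairs
r_manual <- cor(X[keep], Y[keep], method = 'pearson')

# Correlation with cor() doing the filtering itself
r_builtin <- cor(X, Y, use = 'complete.obs', method = 'pearson')

# Compare the two results (all.equal allows for floating-point tolerance)
same <- isTRUE(all.equal(r_manual, r_builtin))
print(c(n_pairs = n_pairs, same = same))
```

Checking n_pairs is worthwhile here: only four of the eight observations are complete, so the correlation rests on a small sample and should be read with that in mind.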

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Tim Biegeleisen