pairwise.complete.obs in cov function in R
I have a simulated dataset (problem) that looks like this:
library(tidyverse)
A = factor(rep("A",252));A
B = factor(rep("B",190));B
FACT = c(A,B)
x = rnorm(252)
y = rnorm(190)
d = c(x,y)
DATA = tibble(FACT,d);DATA
which results in:
# A tibble: 442 x 2
FACT d
<fct> <dbl>
1 A -0.172
2 A 1.23
3 A -0.589
4 A 0.512
5 A -1.00
6 A 0.532
7 A 0.562
8 A -0.403
9 A 2.10
10 A 0.649
# ... with 432 more rows
Now I have a vector of interest that has length 100:
z = rnorm(100)
I want to find the covariance of the vector z with each of the vectors x and y. In R, I tried:
DATA %>%
group_by(FACT)%>%
dplyr::mutate(row = row_number())%>%
tidyr::pivot_wider(names_from = FACT, values_from = d)%>%
dplyr::select(-row)%>%
dplyr::mutate((across(.cols= everything(),~cov(.x,z,use= "pairwise.complete.obs"))))%>%
slice(n=1)%>%
tidyr::pivot_longer( cols = tidyselect::everything(), names_to = "FACT", values_to = "CoV")
But R reports an error that seems to point at the argument use = "pairwise.complete.obs". The error is:
Error in `dplyr::mutate()`:
! Problem while computing `..1 = (across(.cols =
everything(), ~cov(.x, z, use =
"pairwise.complete.obs")))`.
Caused by error in `across()`:
! Problem while computing column `A`.
Caused by error in `cov()`:
! incompatible dimensions
Imagine that my real-world problem has 150 factor categories. How can this be fixed?
Solution 1:[1]
The problem is that you’re trying to get covariance for vectors of different lengths. "pairwise.complete.obs" is just included in the error message because it’s printing the call which raised the error, but it’s not the problem. The important bit is:
Caused by error in `cov()`:
! incompatible dimensions
i.e., you’re requesting the covariance of a 252-length vector with a 100-length vector. If all vectors are the same length, there’s no error:
library(tidyverse)
A = factor(rep("A",100))
B = factor(rep("B",100))
FACT = c(A,B)
x = rnorm(100)
y = rnorm(100)
d = c(x,y)
DATA = tibble(FACT,d)
z = rnorm(100)
DATA %>%
group_by(FACT)%>%
dplyr::mutate(row = row_number())%>%
tidyr::pivot_wider(names_from = FACT, values_from = d) %>%
dplyr::select(-row)%>%
dplyr::mutate((across(.cols= everything(),~cov(.x,z,use= "pairwise.complete.obs"))))%>%
slice(n=1)%>%
tidyr::pivot_longer( cols = tidyselect::everything(), names_to = "FACT", values_to = "CoV")
# # A tibble: 2 x 2
# FACT CoV
# <chr> <dbl>
# 1 A 0.0705
# 2 B -0.214
Edit:
OP comments: "The problem is that the pairwise.complete.obs does not solve the mismatch in length of the needed vectors."
"pairwise.complete.obs" is for dropping rows where either vector is NA. But the input vectors still have to be of equal length. e.g.:
# returns NA due to missing values
cov(
c(1,2,3,NA,5,6),
c(6,NA,2,NA,5,1)
)
# NA
# with pairwise.complete.obs, returns covariance for pairs without NAs
cov(
c(1,2,3,NA,5,6),
c(6,NA,2,NA,5,1),
use = "pairwise.complete.obs"
)
# -3.166667
# but still throws an error for unequal dimensions
cov(
c(1,2,3,NA,5,6,7,8),
c(6,NA,2,NA,5,1),
use = "pairwise.complete.obs"
)
# Error in cov(c(1, 2, 3, NA, 5, 6, 7, 8), c(6, NA, 2, NA, 5, 1), use = "pairwise.complete.obs") :
# incompatible dimensions
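To make the row-dropping concrete, here is a minimal base R sketch that reproduces the "pairwise.complete.obs" result by hand for the two-vector case. The names `keep`, `manual` and `builtin` are just illustrative; the point is that the pairs are formed by position first, and only then are incomplete pairs discarded.

```r
x <- c(1, 2, 3, NA, 5, 6)
y <- c(6, NA, 2, NA, 5, 1)

# TRUE where both x[i] and y[i] are non-missing (positions 1, 3, 5, 6)
keep <- complete.cases(x, y)
sum(keep)
# 4

# Covariance of the complete pairs only
manual <- cov(x[keep], y[keep])

# Same computation done by cov() itself
builtin <- cov(x, y, use = "pairwise.complete.obs")

all.equal(manual, builtin)
# TRUE
```

Because `complete.cases(x, y)` also needs x and y to line up element by element, there is no way for this step to reconcile vectors of different lengths.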
The underlying problem is that covariance is based on pairs of values. One way to think of it is that your input vectors need to be the same length so R knows how you want the values "paired up." So trying to get covariance for different length vectors doesn’t quite make sense.
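With many factor levels (the question mentions 150), it may help to check up front which groups cannot be paired with z at all. The following is a sketch under the assumption that the data look like the simulated DATA from the question; the column names `n_obs` and `matches_z` are made up for illustration.

```r
library(dplyr)

set.seed(1)  # only so the sketch is reproducible
DATA <- tibble(
  FACT = factor(rep(c("A", "B"), times = c(252, 190))),
  d    = rnorm(442)
)
z <- rnorm(100)

# Count observations per factor level and flag the levels whose
# size differs from length(z); those would trigger
# "incompatible dimensions" in cov()
group_sizes <- DATA %>%
  count(FACT, name = "n_obs") %>%
  mutate(matches_z = n_obs == length(z))

group_sizes
```

Here both groups are flagged as not matching, which is exactly the situation that produces the error in the question.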
Postscript: once the vectors have equal length, your code could be simplified quite a bit using dplyr::summarize:
DATA %>%
group_by(FACT) %>%
summarize(CoV = cov(d, z, use= "pairwise.complete.obs"))
# # A tibble: 2 x 2
# FACT CoV
# <fct> <dbl>
# 1 A 0.0705
# 2 B -0.214
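If the groups really are longer than z, one possible workaround is to decide explicitly how values are paired before calling cov(). The sketch below assumes, and this is purely an assumption not stated in the question, that row i within each group corresponds to z[i] and that any rows beyond length(z) can be ignored. If that pairing is not meaningful for your data, the resulting covariance is not meaningful either.

```r
library(dplyr)

set.seed(1)  # only so the sketch is reproducible
DATA <- tibble(
  FACT = factor(rep(c("A", "B"), times = c(252, 190))),
  d    = rnorm(442)
)
z <- rnorm(100)

# ASSUMPTION: the i-th row of each group pairs with z[i];
# rows past length(z) are dropped, and z is shortened for any
# group that has fewer rows than z
cov_by_group <- DATA %>%
  group_by(FACT) %>%
  filter(row_number() <= length(z)) %>%
  summarize(
    n_used = n(),
    CoV    = cov(d, z[seq_len(n())], use = "pairwise.complete.obs")
  )

cov_by_group
```

The `n_used` column is kept so it stays visible how many pairs went into each estimate. This scales to any number of factor levels because the pairing rule is applied inside each group.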
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
