"pairwise.complete.obs" in the cov function in R

I have a simulated dataset (problem) that looks like this:

library(tidyverse)
A = factor(rep("A",252));A
B = factor(rep("B",190));B
FACT = c(A,B)
x = rnorm(252)

y = rnorm(190)
d = c(x,y)
DATA = tibble(FACT,d);DATA

resulting in:

# A tibble: 442 x 2
   FACT       d
   <fct>  <dbl>
 1 A     -0.172
 2 A      1.23 
 3 A     -0.589
 4 A      0.512
 5 A     -1.00 
 6 A      0.532
 7 A      0.562
 8 A     -0.403
 9 A      2.10 
10 A      0.649
# ... with 432 more rows

Now I have a vector of interest, which has length 100.

z = rnorm(100)

I want to find the covariance of vector z with each of the vectors x and y. To do so in R, I tried:

DATA %>%
  group_by(FACT)%>%
  dplyr::mutate(row = row_number())%>%
  tidyr::pivot_wider(names_from = FACT, values_from = d)%>%
  dplyr::select(-row)%>%
  dplyr::mutate((across(.cols= everything(),~cov(.x,z,use= "pairwise.complete.obs"))))%>%
  slice(n=1)%>%
  tidyr::pivot_longer( cols = tidyselect::everything(), names_to = "FACT", values_to = "CoV")

But R reports an error, and it seems to point to the argument use = "pairwise.complete.obs".

The error is:

Error in `dplyr::mutate()`:
! Problem while computing `..1 = (across(.cols =
  everything(), ~cov(.x, z, use =
  "pairwise.complete.obs")))`.
Caused by error in `across()`:
! Problem while computing column `A`.
Caused by error in `cov()`:
! incompatible dimensions

Imagine that my real-world problem has 150 factor categories. How can this be fixed? Any help?

r dplyr covariance

Solution 1:^[1]

The problem is that you’re trying to get covariance for vectors of different lengths. "pairwise.complete.obs" is just included in the error message because it’s printing the call which raised the error, but it’s not the problem. The important bit is:

Caused by error in `cov()`:
! incompatible dimensions

I.e., you’re requesting the covariance of a 252-length vector with a 100-length vector. If all vectors are the same length, there’s no error:

library(tidyverse)
A = factor(rep("A",100))
B = factor(rep("B",100))
FACT = c(A,B)
x = rnorm(100)

y = rnorm(100)
d = c(x,y)
DATA = tibble(FACT,d)

z = rnorm(100)

DATA %>%
  group_by(FACT)%>%
  dplyr::mutate(row = row_number())%>%
  tidyr::pivot_wider(names_from = FACT, values_from = d) %>% 
  dplyr::select(-row)%>%
  dplyr::mutate((across(.cols= everything(),~cov(.x,z,use= "pairwise.complete.obs"))))%>%
  slice(1)%>%
  tidyr::pivot_longer( cols = tidyselect::everything(), names_to = "FACT", values_to = "CoV")

# # A tibble: 2 x 2
#   FACT      CoV
#   <chr>   <dbl>
# 1 A      0.0705
# 2 B     -0.214
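
To confirm that mismatched lengths are the cause in the original data, it may help to compare each group's size with length(z) before calling cov(). This is only a minimal sketch, assuming the 252/190 group sizes from the question; the seed and the names sizes, n_obs and matches_z are illustrative and not from the original post:

```r
library(dplyr)

set.seed(1)  # arbitrary seed, only for reproducibility
DATA <- tibble(
  FACT = factor(rep(c("A", "B"), times = c(252, 190))),
  d    = rnorm(442)
)
z <- rnorm(100)

# One row per group: its size and whether that size equals length(z)
sizes <- DATA %>%
  count(FACT, name = "n_obs") %>%
  mutate(matches_z = n_obs == length(z))
sizes
```

Any group where matches_z is FALSE would trigger the "incompatible dimensions" error when passed to cov() together with z.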

Edit:

OP comments:

"The problem is that the pairwise.complete.obs does not solve the mismatch in length of the needed vectors."

"pairwise.complete.obs" is for dropping rows where either vector is NA, but the input vectors still have to be of equal length. For example:

# returns NA due to missing values
cov(
  c(1,2,3,NA,5,6),
  c(6,NA,2,NA,5,1)
)
# NA

# with pairwise.complete.obs, returns covariance for pairs without NAs
cov(
  c(1,2,3,NA,5,6),
  c(6,NA,2,NA,5,1),
  use = "pairwise.complete.obs"
)
# -3.166667

# but still throws an error for unequal dimensions
cov(
  c(1,2,3,NA,5,6,7,8),
  c(6,NA,2,NA,5,1),
  use = "pairwise.complete.obs"
)
# Error in cov(c(1, 2, 3, NA, 5, 6, 7, 8), c(6, NA, 2, NA, 5, 1), use = "pairwise.complete.obs") : 
#   incompatible dimensions

The underlying problem is that covariance is based on pairs of values. One way to think of it is that your input vectors need to be the same length so R knows how you want the values "paired up." So trying to get covariance for different length vectors doesn’t quite make sense.
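
If the intended pairing really is "the i-th observation in each group goes with z[i]", one possible workaround is to truncate both vectors to the shorter length before calling cov(). This is a sketch under that assumption only; whether positional pairing is meaningful depends on your data, and the helper cov_by_position is a hypothetical name, not part of any package:

```r
library(dplyr)

set.seed(1)  # arbitrary seed, only for reproducibility
DATA <- tibble(
  FACT = factor(rep(c("A", "B"), times = c(252, 190))),
  d    = rnorm(442)
)
z <- rnorm(100)

# ASSUMPTION: v[i] pairs with w[i]; anything beyond the shorter
# vector has no partner and is dropped.
cov_by_position <- function(v, w) {
  n <- min(length(v), length(w))
  cov(v[seq_len(n)], w[seq_len(n)], use = "pairwise.complete.obs")
}

result <- DATA %>%
  group_by(FACT) %>%
  summarize(CoV = cov_by_position(d, z))
result
```

Because the work happens per group inside summarize(), the same code should apply unchanged to 150 factor categories. Note that it silently ignores observations 101 onwards in each group, which may or may not be what you want.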

Postscript: Your code could be simplified quite a bit using dplyr::summarize:

DATA %>%
  group_by(FACT) %>%
  summarize(CoV = cov(d, z, use= "pairwise.complete.obs"))

# # A tibble: 2 x 2
#   FACT      CoV
#   <fct>   <dbl>
# 1 A      0.0705
# 2 B     -0.214
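
For comparison, roughly the same per-group calculation can be sketched in base R with split() and sapply(). This assumes, as in the equal-length example, that every group has exactly length(z) observations; the seed and the name covs are illustrative:

```r
set.seed(1)  # arbitrary seed, only for reproducibility
FACT <- factor(rep(c("A", "B"), each = 100))
d <- rnorm(200)
z <- rnorm(100)

# split() gives one numeric vector per factor level;
# sapply() forwards y and use to cov() for each of them
covs <- sapply(split(d, FACT), cov, y = z, use = "pairwise.complete.obs")
covs
```

The result is a named numeric vector with one covariance per factor level, rather than a tibble.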

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow