How to calculate correlations and scatterplots for each marked cell in multiple columns in R

I have a data frame which looks like this:

ID   Column 1  Column 2  Column 3  Column n  Main variable 1  Main variable 2
1    0         1         0         ...       -0.5             8
2    1         0         0         ...       2.5              14
3    0         1         0         ...       4                6
...  ...       ...       ...       ...       ...              ...

There are two main variables that I want to correlate and plot as a scatterplot, separately for each column, using only the rows marked with 1 in that column. For example, in Column 2 the first and third rows are marked with 1, so only the first and third values of the main variables should go into the correlation calculation. My idea was to create a vector with the column names and pass it to the correlation function, but as a result I only get the correlation value for the marked cells of the first column.

cor(
     Data_frame[which(Data_frame[,column]==1),"main_variable_1"],
     Data_frame[which(Data_frame[,column]==1),"main_variable_2"],
     use = "pairwise.complete.obs"
   )

Does anyone have an idea how to solve this?

r


Solution 1:[1]

library(tidyverse)
library(broom)

data <- tribble(
  ~ID, ~Column.1, ~Column.2, ~Column.3, ~Column.n, ~Main.variable.1, ~Main.variable.2,
  1L, 0L, 1L, 0L, "...", -0.5, 8L,
  2L, 1L, 0L, 0L, "...", 2.5, 14L,
  3L, 0L, 1L, 0L, "...", 4, 6L,
  3L, 0L, 1L, 0L, "...", 4, 6L,
  3L, 0L, 1L, 0L, "...", 4, 6L,
  3L, 0L, 1L, 0L, "...", 4, 6L,
)

data %>%
  colnames() %>%
  keep(~ .x %>% str_detect("^Column\\.[0-9]+$")) %>%
  enframe() %>%
  mutate(
    cor_test = value %>% map(possibly(~ {
      data %>%
        filter(if_all(all_of(.x), ~ .x == 1)) %>%
        cor.test(~ Main.variable.1 + Main.variable.2, data = .) %>%
        tidy()
    }, NA))
  ) %>%
  unnest(cor_test)
#> # A tibble: 3 × 10
#>    name value    estimate statistic p.value parameter conf.low conf.high method 
#>   <int> <chr>       <dbl>     <dbl>   <dbl>     <int>    <dbl>     <dbl> <chr>  
#> 1     1 Column.1       NA        NA      NA        NA       NA        NA <NA>   
#> 2     2 Column.2       -1      -Inf       0         3       -1        -1 Pearso…
#> 3     3 Column.3       NA        NA      NA        NA       NA        NA <NA>   
#> # … with 1 more variable: alternative <chr>

Created on 2022-04-14 by the reprex package (v2.0.0)
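
The question also asks for a scatterplot per column, which the code above does not produce. A minimal base R sketch of one way to get both is shown here. The data frame `df`, its values, and the threshold of three rows are illustrative assumptions, not taken from the original data. It loops over the marker columns, keeps only the rows flagged with 1, and computes one correlation and one plot per column.

```r
# Toy data modelled on the question; names and values are illustrative assumptions.
df <- data.frame(
  ID              = 1:6,
  Column.1        = c(0, 1, 0, 1, 1, 0),
  Column.2        = c(1, 0, 1, 1, 0, 1),
  Main.variable.1 = c(-0.5, 2.5, 4, 1, 3, 0),
  Main.variable.2 = c(8, 14, 6, 10, 12, 7)
)

# Names of the marker columns (assumes they all start with "Column.").
marker_cols <- grep("^Column\\.", names(df), value = TRUE)

# One correlation per marker column, using only the rows flagged with 1.
cors <- sapply(marker_cols, function(col) {
  rows <- which(df[[col]] == 1)
  if (length(rows) < 3) return(NA_real_)  # assumed cut-off: too few points
  cor(df$Main.variable.1[rows], df$Main.variable.2[rows],
      use = "pairwise.complete.obs")
})
print(cors)

# One scatterplot per marker column, drawn side by side.
old_par <- par(mfrow = c(1, length(marker_cols)))
for (col in marker_cols) {
  rows <- which(df[[col]] == 1)
  plot(df$Main.variable.1[rows], df$Main.variable.2[rows],
       xlab = "Main.variable.1", ylab = "Main.variable.2", main = col)
}
par(old_par)
```

Looping over the column names one at a time is what the attempt in the question was missing: `sapply()` calls the function once per column, so each column gets its own row subset, correlation, and plot.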

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 danlooo