How to calculate correlations and scatterplots for each marked cell in multiple columns in R
I have a data frame that looks like this:
| ID | Column 1 | Column 2 | Column 3 | Column n | Main variable 1 | Main variable 2 |
|---|---|---|---|---|---|---|
| 1 | 0 | 1 | 0 | ... | -0.5 | 8 |
| 2 | 1 | 0 | 0 | ... | 2.5 | 14 |
| 3 | 0 | 1 | 0 | ... | 4 | 6 |
| ... | ... | ... | ... | ... | ... | ... |
There are two main variables that I want to correlate, and for each column I want a correlation and a scatterplot that use only the rows marked with 1 in that column. For example, in column 2 the first and third rows are marked with 1, so only the first and third values of the main variables should go into the correlation. My idea was to create a vector with the column names and pass it to the correlation function, but as a result I only get the correlation for the marked cells of the first column:
```r
cor(
  Data_frame[which(Data_frame[, column] == 1), "main_variable_1"],
  Data_frame[which(Data_frame[, column] == 1), "main_variable_2"],
  use = "pairwise.complete.obs"
)
```
Does anyone have an idea how to solve this?
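A likely explanation for the first-column-only result: with several names in `column`, `Data_frame[, column] == 1` is a logical matrix, and `which()` returns positions in that matrix counted down the columns, not row numbers. Only the positions from the first column are valid row numbers; the others point past the last row, select `NA`s, and `use = "pairwise.complete.obs"` silently drops them. Looping over the names so that each column gets its own `cor()` call avoids this. The base-R sketch below assumes a data frame `df` with `Column.*` and `Main.variable.*` names; the first three rows copy the table above and rows 4 to 6 are made up for illustration.

```r
# Toy data shaped like the question's table (rows 4 to 6 are made up)
df <- data.frame(
  ID = 1:6,
  Column.1 = c(0, 1, 0, 1, 1, 0),
  Column.2 = c(1, 0, 1, 1, 0, 1),
  Column.3 = c(0, 0, 1, 1, 1, 0),
  Main.variable.1 = c(-0.5, 2.5, 4, 1, 3, 2),
  Main.variable.2 = c(8, 14, 6, 9, 12, 7)
)

# Names of the 0/1 indicator columns
indicator_cols <- grep("^Column\\.[0-9]+$", names(df), value = TRUE)

# One correlation per indicator column, using only the rows marked with 1
cors <- sapply(indicator_cols, function(col) {
  rows <- which(df[[col]] == 1)  # row numbers of the marked cells in this column
  cor(df[rows, "Main.variable.1"], df[rows, "Main.variable.2"],
      use = "pairwise.complete.obs")
})
cors
```

`sapply()` names the result after `indicator_cols`, so each correlation is labelled with the column it came from.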
Solution 1
```r
library(tidyverse)
library(broom)

data <- tribble(
  ~ID, ~Column.1, ~Column.2, ~Column.3, ~Column.n, ~Main.variable.1, ~Main.variable.2,
  1L, 0L, 1L, 0L, "...", -0.5, 8L,
  2L, 1L, 0L, 0L, "...", 2.5, 14L,
  3L, 0L, 1L, 0L, "...", 4, 6L,
  # the ID 3 row is repeated: cor.test() needs at least 3 rows, and with only
  # two distinct points the estimate for Column.2 is -1
  3L, 0L, 1L, 0L, "...", 4, 6L,
  3L, 0L, 1L, 0L, "...", 4, 6L,
  3L, 0L, 1L, 0L, "...", 4, 6L,
)

data %>%
  colnames() %>%
  # keep only the indicator columns (Column.1, Column.2, ...)
  keep(~ .x %>% str_detect("^Column\\.[0-9]+$")) %>%
  enframe() %>%
  mutate(
    cor_test = value %>% map(possibly(~ {
      data %>%
        # keep only the rows marked with 1 in the current column
        filter(if_all(all_of(.x), ~ .x == 1)) %>%
        cor.test(~ Main.variable.1 + Main.variable.2, data = .) %>%
        tidy()
    }, NA))  # columns where cor.test() fails (too few rows) become NA
  ) %>%
  unnest(cor_test)
#> # A tibble: 3 × 10
#> name value estimate statistic p.value parameter conf.low conf.high method
#> <int> <chr> <dbl> <dbl> <dbl> <int> <dbl> <dbl> <chr>
#> 1 1 Column.1 NA NA NA NA NA NA <NA>
#> 2 2 Column.2 -1 -Inf 0 3 -1 -1 Pearso…
#> 3 3 Column.3 NA NA NA NA NA NA <NA>
#> # … with 1 more variable: alternative <chr>
```
Created on 2022-04-14 by the reprex package (v2.0.0)
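The answer returns the correlations but not the scatterplots the question also asks for. One alternative, sketched below with the same made-up toy data as an assumption (rows 4 to 6 are invented), reshapes the indicator columns into long format with `tidyr::pivot_longer()`. A single grouped `summarise()` then gives one correlation per column, and `facet_wrap()` gives one scatterplot panel per column. Unlike `cor.test()`, `cor()` does not stop with an error when a column has fewer than three marked rows (it returns `NA` or ±1 instead), so the `n` column is included to flag those cases.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Toy data shaped like the question's table (rows 4 to 6 are made up)
data <- tibble(
  ID = 1:6,
  Column.1 = c(0, 1, 0, 1, 1, 0),
  Column.2 = c(1, 0, 1, 1, 0, 1),
  Column.3 = c(0, 0, 1, 1, 1, 0),
  Main.variable.1 = c(-0.5, 2.5, 4, 1, 3, 2),
  Main.variable.2 = c(8, 14, 6, 9, 12, 7)
)

# Long format: one row per ID and indicator column, kept only where marked with 1
marked <- data %>%
  pivot_longer(matches("^Column\\.[0-9]+$"),
               names_to = "column", values_to = "flag") %>%
  filter(flag == 1)

# Number of marked rows and Pearson correlation per indicator column
cors <- marked %>%
  group_by(column) %>%
  summarise(n = n(), r = cor(Main.variable.1, Main.variable.2))
cors

# One scatterplot panel per indicator column
p <- ggplot(marked, aes(Main.variable.1, Main.variable.2)) +
  geom_point() +
  facet_wrap(~ column)
p
```

A column with no marked rows simply has no group in `marked`, so it is absent from both `cors` and the plot rather than producing an error.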
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | danlooo |
