How to mutate two list columns with dplyr::mutate
I have the following data frame:
library(tidyverse)
dat <- structure(list(peptide_name = c(
"foo", "foo", "foo",
"foo", "foo", "foo", "bar", "bar", "bar",
"bar", "bar", "bar"
), predicted = c(
1, 0.965193935171986,
1.002152924502, 1.13340754433401, 1.24280233366, 1.43442435500686,
1, 1.07873571757982, 1.141383975916, 1.247359728244, 1.259245716526,
1.23549751707385
), trueval = c(
1, 1.174927114, 1.279883382, 1.752186589,
1.994169096, 2.358600583, 1, 0.977742448, 1.305246423, 1.500794913,
1.532591415, 1.197138315
)), row.names = c(NA, -12L), class = c(
"tbl_df",
"tbl", "data.frame"
))
dat
It looks like this:
peptide_name predicted trueval
<chr> <dbl> <dbl>
1 foo 1 1
2 foo 0.965 1.17
3 foo 1.00 1.28
4 foo 1.13 1.75
5 foo 1.24 1.99
6 foo 1.43 2.36
7 bar 1 1
8 bar 1.08 0.978
9 bar 1.14 1.31
10 bar 1.25 1.50
11 bar 1.26 1.53
12 bar 1.24 1.20
The foo and bar peptides each contain the same number of rows.
What I want to do is compute the Pearson correlation between predicted and trueval for each peptide.
The following code is my attempt:
dat %>%
group_by(peptide_name) %>%
# Here create list-columns
nest() %>%
mutate(pn = row_number()) %>%
dplyr::select(pn, everything()) %>%
pivot_wider(-pn, names_from = peptide_name, values_from = data) %>%
# Attempt to calculate Pearson correlation
mutate(pearson = cor(foo, bar, method = "pearson"))
But it failed:
Error in `mutate()`:
! Problem while computing `pearson = cor(foo, bar, method =
"pearson")`.
Caused by error in `cor()`:
! 'x' must be numeric
What's the right way to do it?
The final expected result of the correlation:
foo bar type
0.97 0.85 pearson_cor
Solution 1:[1]
The problem seems to be in how you are passing the arguments to the cor() function. I was able to get the following code to work:
dat %>%
group_by(peptide_name) %>%
# Here create list-columns
nest() %>%
mutate(pn = row_number()) %>%
dplyr::select(pn, everything()) %>%
pivot_wider(-pn, names_from = peptide_name, values_from = data) %>%
mutate(pearson_foo = cor(foo[[1]][[1]], foo[[1]][[2]], method = "pearson"),
pearson_bar = cor(bar[[1]][[1]], bar[[1]][[2]], method = "pearson"))
However, I'd be curious to see if anyone has a more elegant solution to your problem, since my solution involves adding an extra column! I'll keep playing around with it and see if I can come up with something better...
Edit: Ritchie's answer with summarise() is way easier!
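For reference, here is a minimal sketch of what a summarise()-based approach could look like. It is not the referenced answer itself, which is not reproduced here. It assumes the goal is one correlation between predicted and trueval per peptide, and it rebuilds a small stand-in for dat with values rounded from the question so the snippet runs on its own.

```r
library(dplyr)

# Stand-in for `dat` from the question (values rounded to three decimals).
dat <- data.frame(
  peptide_name = rep(c("foo", "bar"), each = 6),
  predicted = c(1, 0.965, 1.002, 1.133, 1.243, 1.434,
                1, 1.079, 1.141, 1.247, 1.259, 1.235),
  trueval = c(1, 1.175, 1.280, 1.752, 1.994, 2.359,
              1, 0.978, 1.305, 1.501, 1.533, 1.197)
)

# Within each group, predicted and trueval are plain numeric vectors,
# so cor() can be called on them directly without nesting.
res <- dat %>%
  group_by(peptide_name) %>%
  summarise(pearson_cor = cor(predicted, trueval, method = "pearson"))

res
```

This avoids list-columns entirely, which is why cor() no longer complains that 'x' must be numeric. If the wide layout from the question is needed, the result could then be reshaped, for example with tidyr::pivot_wider(names_from = peptide_name, values_from = pearson_cor).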
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
