How to mutate two list columns with dplyr::mutate

I have the following data frame:

library(tidyverse)
dat <- structure(list(peptide_name = c(
  "foo", "foo", "foo",
  "foo", "foo", "foo", "bar", "bar", "bar",
  "bar", "bar", "bar"
), predicted = c(
  1, 0.965193935171986,
  1.002152924502, 1.13340754433401, 1.24280233366, 1.43442435500686,
  1, 1.07873571757982, 1.141383975916, 1.247359728244, 1.259245716526,
  1.23549751707385
), trueval = c(
  1, 1.174927114, 1.279883382, 1.752186589,
  1.994169096, 2.358600583, 1, 0.977742448, 1.305246423, 1.500794913,
  1.532591415, 1.197138315
)), row.names = c(NA, -12L), class = c(
  "tbl_df",
  "tbl", "data.frame"
))

dat

It looks like this:

   peptide_name predicted trueval
   <chr>            <dbl>   <dbl>
 1 foo              1       1    
 2 foo              0.965   1.17 
 3 foo              1.00    1.28 
 4 foo              1.13    1.75 
 5 foo              1.24    1.99 
 6 foo              1.43    2.36 
 7 bar              1       1    
 8 bar              1.08    0.978
 9 bar              1.14    1.31 
10 bar              1.25    1.50 
11 bar              1.26    1.53 
12 bar              1.24    1.20 

The foo and bar peptides each contain the same number of rows. What I want to do is compute the Pearson correlation between predicted and trueval for each peptide.

The following code is my attempt:

dat %>%  
  group_by(peptide_name) %>% 
  # Here create list-columns
  nest() %>% 
  mutate(pn = row_number()) %>% 
  dplyr::select(pn, everything()) %>% 
  pivot_wider(-pn, names_from = peptide_name, values_from = data) %>% 
  # Attempt to calculate Pearson correlation
  mutate(pearson = cor(foo, bar, method = "pearson")) 

But it failed:

Error in `mutate()`:
! Problem while computing `pearson = cor(foo, bar, method =
  "pearson")`.
Caused by error in `cor()`:
! 'x' must be numeric

What's the right way to do it?

The final expected result of the correlation:

foo   bar  type
0.97 0.85  pearson_cor


Solution 1:[1]

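The problem is in how you are passing the arguments to the cor() function. After nest() and pivot_wider(), foo and bar are list-columns whose single element is a tibble, whereas cor() expects numeric vectors, so the columns have to be pulled out of the nested tibble first. I was able to get the following code to work: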

dat %>%  
  group_by(peptide_name) %>% 
  # Here create list-columns
  nest() %>% 
  mutate(pn = row_number()) %>% 
  dplyr::select(pn, everything()) %>% 
  pivot_wider(-pn, names_from = peptide_name, values_from = data) %>% 
  # foo[[1]] and bar[[1]] are the nested tibbles; extract their numeric
  # columns by name before handing them to cor()
  mutate(pearson_foo = cor(foo[[1]]$predicted, foo[[1]]$trueval, method = "pearson"),
         pearson_bar = cor(bar[[1]]$predicted, bar[[1]]$trueval, method = "pearson"))

However, I'd be curious to see if anyone has a more elegant solution to your problem, since my solution involves adding an extra column! I'll keep playing around with it and see if I can come up with something better...

Edit: Ritchie's answer with summarise() is way easier!
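
Ritchie's answer is not reproduced here, so the following is only a minimal sketch of what a summarise()-based approach might look like, not a copy of that answer. It skips nesting entirely: group by peptide, then collapse each group to one correlation value. The result name res and the column name pearson_cor are my own choices for illustration.

```r
library(dplyr)

# Same values as `dat` in the question, rebuilt so the example runs on its own.
dat <- tibble(
  peptide_name = rep(c("foo", "bar"), each = 6),
  predicted = c(
    1, 0.965193935171986, 1.002152924502, 1.13340754433401,
    1.24280233366, 1.43442435500686,
    1, 1.07873571757982, 1.141383975916, 1.247359728244,
    1.259245716526, 1.23549751707385
  ),
  trueval = c(
    1, 1.174927114, 1.279883382, 1.752186589, 1.994169096, 2.358600583,
    1, 0.977742448, 1.305246423, 1.500794913, 1.532591415, 1.197138315
  )
)

# Within each group, predicted and trueval are plain numeric vectors,
# so cor() can be called on them directly.
res <- dat %>%
  group_by(peptide_name) %>%
  summarise(pearson_cor = cor(predicted, trueval, method = "pearson"))

res
```

This gives one row per peptide rather than one column per peptide; if the wide layout from the question is needed, the result could be reshaped afterwards with pivot_wider().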

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
