Calling user-defined functions from dplyr::mutate

I'm working on a project that involves many different tibbles, all of which have a period variable in the format YYYYMM. Below is an example of what my tibbles look like:

tibble_1 <- tibble::tibble(
  period = c(201901, 201912, 201902, 201903),
  var_1 = rnorm(4),
  var_2 = rnorm(4)
)

But for some operations (e.g. time series plots) it's easier to work with an actual Date variable. So I'm using mutate to transform the period variable into a date as follows:

tibble_1 %>% 
  dplyr::mutate(
    date = lubridate::ymd(stringr::str_c(period, "01"))
)

Since I will be doing this a lot, and the date transformation is not the only mutation I am going to be doing when calling mutate, I'd like to have a user-defined function that I can call from within the mutate call. Here's my function:

period_to_date <- function() {
  lubridate::ymd(stringr::str_c(period, "01"))
}

Which I would later call like this:

tibble_1 %>% 
  dplyr::mutate(
    date = period_to_date()
)

The problem is that R can't find the period object (which is not really an object in itself, but a column of the tibble).

> Error in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE) : object 'period' not found

I'm pretty sure I need to define a data mask so that the environment in which period_to_date is executed can look for the object in its parent environment (which should always be the caller environment, since the tibble containing the period column is not always the same), but I can't figure out how to do it.



Solution 1:[1]

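The function does not know which object you want to modify: period is not defined inside the function, and R looks up such free variables in the environment where the function was defined, not in the columns of the tibble that mutate is working on. Pass the period column to the function as an argument and use it like this: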

period_to_date <- function(period) {
  lubridate::ymd(stringr::str_c(period, "01"))
  #Can also use
  #as.Date(paste0(period,"01"), "%Y%m%d")
}

tibble_1 %>% 
  dplyr::mutate(date = period_to_date(period))

#  period   var_1  var_2 date      
#   <dbl>   <dbl>  <dbl> <date>    
#1 201901 -0.476  -0.456 2019-01-01
#2 201912 -0.645   1.45  2019-12-01
#3 201902 -0.0939 -0.982 2019-02-01
#4 201903  0.410   0.954 2019-03-01
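
To make the difference concrete, here is a minimal sketch that contrasts the zero-argument function from the question with the one-argument version. It uses the base R conversion mentioned in the comment above; the names period_to_date_noarg, tibble_2 and yyyymm are made up for illustration and are not part of the original question.

```r
# Zero-argument version from the question: `period` is a free variable,
# so R looks for it where the function was defined, not in the tibble.
period_to_date_noarg <- function() {
  as.Date(paste0(period, "01"), "%Y%m%d")
}

# One-argument version: the column arrives as an ordinary vector.
period_to_date <- function(period) {
  as.Date(paste0(period, "01"), "%Y%m%d")
}

tibble_1 <- tibble::tibble(period = c(201901, 201912, 201902, 201903))

# The zero-argument call fails inside mutate; capture the error message
# instead of stopping (assumes no global object named `period` exists).
err <- tryCatch(
  dplyr::mutate(tibble_1, date = period_to_date_noarg()),
  error = function(e) conditionMessage(e)
)

# The one-argument version works, whatever the column is called
# (`yyyymm` is a hypothetical column name).
tibble_2 <- tibble::tibble(yyyymm = c(202001, 202002))
res <- dplyr::mutate(tibble_2, date = period_to_date(yyyymm))
```

Because the function only ever sees a vector, it also works outside mutate, e.g. period_to_date(c(201901, 201902)).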

Solution 2:[2]

Consider passing the column as an argument to your function:

library(dplyr)


period_to_date <- function(x) {
  lubridate::ymd(stringr::str_c(x, "01"))
}

df <- data.frame(x = 1:3, period = c('201903', '202001', '201511'))

df %>% mutate(p2 = period_to_date(period))
#>   x period         p2
#> 1 1 201903 2019-03-01
#> 2 2 202001 2020-01-01
#> 3 3 201511 2015-11-01

Created on 2020-01-10 by the reprex package (v0.3.0)
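
If the goal is to hide the mutate call as well, one possible extension is a small wrapper that takes the data frame and the column. The sketch below assumes a version of dplyr/rlang that supports the embrace operator `{{ }}`; add_date is a hypothetical helper name, and yyyymm is a made-up column name.

```r
period_to_date <- function(x) {
  lubridate::ymd(stringr::str_c(x, "01"))
}

# Hypothetical helper: adds a `date` column computed from the column `col`.
# `{{ col }}` forwards the unquoted column name into mutate's data mask.
add_date <- function(data, col) {
  dplyr::mutate(data, date = period_to_date({{ col }}))
}

df <- data.frame(id = 1:2, yyyymm = c("201903", "202001"))
out <- add_date(df, yyyymm)
```

This keeps period_to_date a plain vector function, and only the wrapper needs to know about the data mask.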

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 mrhellmann