Problem using across in mutate in dplyr, where function to apply depends on another tibble

I am trying to mutate certain columns of a tibble, where the specific function to use is named in another tibble. The setup code is below, after which I explain the issue.

library(tidyverse)
library(lubridate)

transforms <- list(
  "floor_date" = function(x) {
    floor_date(dmy(x), "month")
  },
  "integer" = function(x) {
    as.integer(gsub("[^[:digit:]]", "", x))
  }
)

data_meta <- tibble(
  datafield = letters[1:3], 
  transform_to = c("floor_date", "", "integer")
)
# A tibble: 3 x 2
#   datafield transform_to
#   <chr>     <chr>
# 1 a         "floor_date"
# 2 b         ""
# 3 c         "integer"

data <- tibble(
  a = c("09/09/2021", "19/09/2021", "06/10/2021"),
  b = c("lorem", "ipsum", "dolor"),
  c = c("99 bottles", "98 bottles", "97 bottles")
)
# A tibble: 3 x 3
#   a          b     c         
#   <chr>      <chr> <chr>
# 1 09/09/2021 lorem 99 bottles
# 2 19/09/2021 ipsum 98 bottles
# 3 06/10/2021 dolor 97 bottles

The data_meta tibble names the desired transformation function (if any) for each column of the data tibble. The functions themselves are stored in a named list, transforms. To focus only on the columns that need a transform, I define needs_transform:

needs_transform <- data_meta %>%
  filter(nchar(transform_to) > 0)
# A tibble: 2 x 2
#   datafield transform_to
#   <chr>     <chr>
# 1 a         floor_date
# 2 c         integer
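For comparison, the same filtering step can be sketched in base R. Because each column maps to at most one transform, the filtered rows can also be collapsed into a named character vector; fn_lookup is a hypothetical name introduced here for illustration, not part of the original code:

```r
# Recreate the metadata with base R (data.frame instead of tibble)
data_meta <- data.frame(
  datafield    = letters[1:3],
  transform_to = c("floor_date", "", "integer"),
  stringsAsFactors = FALSE
)

# Keep only rows that name a transform; nzchar() is TRUE for non-empty strings
needs_transform <- data_meta[nzchar(data_meta$transform_to), ]

# Hypothetical helper: named vector mapping column name -> transform name
fn_lookup <- setNames(needs_transform$transform_to, needs_transform$datafield)
fn_lookup[["c"]]  # "integer"
```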

I now want to use mutate(across(...)) to apply the transformations. I find that the following gives the correct function, based on the column name:

transforms[[(needs_transform %>% filter(datafield == "a") %>% select(transform_to))[[1,1]]]]
# function(x) {
#   floor_date(dmy(x), "month")
# }
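Note that this lookup evaluates to the function object itself; the column values are only transformed once that function is called on them. A minimal base-R sketch, keeping only the integer entry of transforms and using the values of data$c from above:

```r
# Only the "integer" transform from the question is needed for this sketch
transforms <- list(
  "integer" = function(x) {
    as.integer(gsub("[^[:digit:]]", "", x))
  }
)
data_meta <- data.frame(
  datafield    = c("a", "b", "c"),
  transform_to = c("floor_date", "", "integer"),
  stringsAsFactors = FALSE
)

# Look up the function name for column "c" with logical indexing
fn_name <- data_meta$transform_to[data_meta$datafield == "c"]
f <- transforms[[fn_name]]  # a function, not transformed data
is.function(f)              # TRUE

# Calling it on the column values is what produces the new column
f(c("99 bottles", "98 bottles", "97 bottles"))  # integer: 99 98 97
```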

So I tried the following, using cur_column() to filter on the column currently being processed:

clean_data <- data %>%
  mutate(across(
    needs_transform$datafield,
    ~ transforms[[(needs_transform %>%
                     filter(datafield == cur_column()) %>% select(transform_to))[[1,1]]]]
  ))
# Error in `mutate()`:
#   ! Problem while computing `..1 = across(...)`.
# Caused by error in `across()`:
#   ! Problem while computing column `a`.

Unfortunately this does not work, and I am not sure why, even after inspecting the traceback (I can provide it, but it was not helpful).

My second attempt was to wrap the logic in a function (note that the x argument is unused, but across() requires the function to accept one):

get_transform <- function(x) {
  t <- (needs_transform %>%
          filter(datafield == cur_column()) %>%
          select(transform_to))[[1,1]]
  
  transforms[[t]]
}

clean_data <- data %>%
  mutate(across(
    needs_transform$datafield,
    get_transform
  ))
# Error in `mutate()`:
#   ! Problem while computing `..1 = across(needs_transform$datafield, get_transform)`.
# Caused by error in `across()`:
#   ! Problem while computing column `a`.

Almost exactly the same error message. I have looked through several threads here, and none quite matches what I am trying to do. Could anyone help me get this to work? Or, if this is not a good way to do it, is there a better way?



Solution 1:[1]

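Both of your attempts return the looked-up function itself rather than calling it on the column values, but mutate() needs each new column to be a vector. One option is to look up the function name and apply it to the column inside a helper: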

library(tidyverse)
library(lubridate)

trans <- function(x, y) {
  fn_name <- data_meta %>%
    filter(datafield == y) %>%
    pull(transform_to)
  transforms[[fn_name]](x)
}

data %>%
  mutate(across(needs_transform$datafield, ~ trans(.x, cur_column())))
#> # A tibble: 3 × 3
#>   a          b         c
#>   <date>     <chr> <int>
#> 1 2021-09-01 lorem    99
#> 2 2021-09-01 ipsum    98
#> 3 2021-10-01 dolor    97
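To see the difference in isolation, here is a minimal sketch with a toy tibble (df and double_it are illustrative names, not from the question): a lambda that returns a function makes mutate() fail, while one that calls the function on .x succeeds.

```r
library(dplyr)

df <- tibble(a = 1:3)
double_it <- function(x) x * 2

# Lambda returns the function object: mutate() cannot store it as a column
failed <- tryCatch(
  mutate(df, across(a, ~ double_it)),
  error = function(e) "error"
)
failed  # "error"

# Lambda calls the function on the column values: returns a vector, so it works
ok <- mutate(df, across(a, ~ double_it(.x)))
ok$a  # 2 4 6
```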

Solution 2:[2]

You could also try:

data %>%
  mutate(across(needs_transform$datafield,
                ~ transforms[[with(data_meta,
                                   transform_to[datafield == cur_column()])]](.x)))

  a          b         c
  <date>     <chr> <int>
1 2021-09-01 lorem    99
2 2021-09-01 ipsum    98
3 2021-10-01 dolor    97
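A variant of the same idea builds a named lookup vector once, so no filtering happens per column. This is a sketch under two assumptions: fn_lookup is a hypothetical helper, and the date transform is replaced by a base-R stand-in (as.Date() plus format()) for floor_date(dmy(x), "month") so the example does not need lubridate:

```r
library(dplyr)

# Base-R stand-in for floor_date(dmy(x), "month"), avoiding a lubridate dependency
transforms <- list(
  "floor_date" = function(x) {
    d <- as.Date(x, format = "%d/%m/%Y")
    as.Date(format(d, "%Y-%m-01"))
  },
  "integer" = function(x) {
    as.integer(gsub("[^[:digit:]]", "", x))
  }
)

data_meta <- tibble(
  datafield    = letters[1:3],
  transform_to = c("floor_date", "", "integer")
)

data <- tibble(
  a = c("09/09/2021", "19/09/2021", "06/10/2021"),
  b = c("lorem", "ipsum", "dolor"),
  c = c("99 bottles", "98 bottles", "97 bottles")
)

# Hypothetical lookup: column name -> transform name, built once up front
needs_transform <- filter(data_meta, nzchar(transform_to))
fn_lookup <- setNames(needs_transform$transform_to, needs_transform$datafield)

clean_data <- data %>%
  mutate(across(
    all_of(names(fn_lookup)),
    # index the lookup by the current column, then call the function on .x
    ~ transforms[[fn_lookup[[cur_column()]]]](.x)
  ))
```

all_of() makes it explicit that the columns are selected from a character vector.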

Or even:

data %>%
  mutate(across(needs_transform$datafield,
                ~ .x %>% {transforms %>%
                    getElement(data_meta %>%
                                 filter(datafield == cur_column()) %>%
                                 pull(transform_to))}()))
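If the tidyverse is not a requirement, a plain loop over the metadata rows does the same job. A base-R sketch, assuming the data are held in a data.frame and using a base-R stand-in (as.Date() plus format()) for the lubridate date transform:

```r
transforms <- list(
  # Base-R stand-in for floor_date(dmy(x), "month")
  "floor_date" = function(x) as.Date(format(as.Date(x, format = "%d/%m/%Y"), "%Y-%m-01")),
  "integer"    = function(x) as.integer(gsub("[^[:digit:]]", "", x))
)

data_meta <- data.frame(
  datafield    = letters[1:3],
  transform_to = c("floor_date", "", "integer"),
  stringsAsFactors = FALSE
)

data <- data.frame(
  a = c("09/09/2021", "19/09/2021", "06/10/2021"),
  b = c("lorem", "ipsum", "dolor"),
  c = c("99 bottles", "98 bottles", "97 bottles"),
  stringsAsFactors = FALSE
)

# Only rows with a non-empty transform name need work
todo <- data_meta[nzchar(data_meta$transform_to), ]

clean_data <- data
for (i in seq_len(nrow(todo))) {
  col <- todo$datafield[i]
  fn  <- transforms[[todo$transform_to[i]]]
  clean_data[[col]] <- fn(clean_data[[col]])  # apply the function and replace the column
}
```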

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 stefan
Solution 2