Refer to a quoted column name in a function in R

I want to use the na_omit function from the collapse package in a user-defined function. na_omit requires a column name to be in quotes as one of its arguments. If I didn't need the column name in quotes, I could just refer to the column name in double braces, {{col}}, as mentioned in this vignette, "Programming with dplyr". If I refer to the column using the glue package, such as glue::glue("{col}"), I receive errors.

Here is a reprex:

my_df <-
  data.frame(
    matrix(
      c(
        "V9G","Blue",
        NA,"Red",
        "J4C","White",
        NA,"Brown",
        "F7B","Orange",
        "G3V","Green"
      ),
      nrow = 6,
      ncol = 2,
      byrow = TRUE,
      dimnames = list(NULL,
                      c("color_code", "color"))
    ),
    stringsAsFactors = FALSE
  )

library(collapse)
library(dplyr)
library(glue)

my_func <- function(df, col){
  df %>% 
    collapse::na_omit(cols = c(glue("{col}"))) #Here is the code that fails
}

my_func(my_df, color_code)

The expected output can be generated with the following:

my_df %>% 
  collapse::na_omit(cols = c("color_code")) 

and should produce:

#  color_code  color
#1        V9G   Blue
#2        J4C  White
#3        F7B Orange
#4        G3V  Green

How should I refer to a quoted column name that's a parameter and an argument of a function within a user-defined function in R?



Solution 1:[1]

You have to pass the column name as a character string, for example:

my_func <- function(df, col){
  df %>% 
    collapse::na_omit(cols = c(glue("{col}"))) # Works when col is passed as a character string
}

my_func(my_df, col = "color_code")
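
Since the column name already arrives as a string here, the glue() call is not strictly needed. The following is a minimal sketch of the same idea, passing the string straight to cols; the function name my_func_chr and the input checks are illustrative assumptions, not part of the original answer.

```r
library(collapse)

my_df <- data.frame(
  color_code = c("V9G", NA, "J4C", NA, "F7B", "G3V"),
  color      = c("Blue", "Red", "White", "Brown", "Orange", "Green"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: expects the column name as a single character string
my_func_chr <- function(df, col) {
  # Fail early if col is not a single string naming an existing column
  stopifnot(is.character(col), length(col) == 1, col %in% names(df))
  # Drop rows where the named column is NA
  collapse::na_omit(df, cols = col)
}

res <- my_func_chr(my_df, "color_code")
res
```

The checks at the top are optional; they only make the failure mode clearer when a caller passes something other than a column name.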

Solution 2:[2]

First determine which programming framework you are working in: dplyr (tidy evaluation) or base R. If in dplyr, refer to the documentation for programming with dplyr, rlang, and glue, and to this Stack Overflow answer. If in base R, refer to the documentation on non-standard evaluation, especially wrapping quoted columns in as.character(substitute()) and wrapping functions with unquoted columns in eval(substitute()).

Note that both of the approaches above involve non-standard evaluation. Another approach is to use standard evaluation (or some "combination" of standard evaluation and non-standard evaluation). For example, see the issue raised in this link.

Reasons for this question come, at least partially, from environment confusion. Here are some of the different approaches in a reprex.

Data

my_df <-
  data.frame(
    matrix(
      c(
        "V9G","Blue",
        NA,"Red",
        "J4C","White",
        NA,"Brown",
        "F7B","Orange",
        "G3V","Green"
      ),
      nrow = 6,
      ncol = 2,
      byrow = TRUE,
      dimnames = list(NULL,
                      c("color_code", "color"))
    ),
    stringsAsFactors = FALSE
  )

Packages

library(collapse)
library(dplyr)
library(stringr)
library(glue)

Functional Programming in base R (non-standard evaluation)
with a quoted column name:

my_func <- function(df, col) {
  col_char_ref <- as.character(substitute(col)) #Use as.character(substitute()) to refer to a quoted column name
  df %>% 
    collapse::na_omit(cols = col_char_ref) 
}

my_func(my_df, color_code)

#Should generate output below
my_df %>% 
  collapse::na_omit(cols = "color_code")

and with a non-quoted column name:

my_func <- function(df, col){
  df <- df # This makes sure "df" is available inside the function environment where we evaluate the ftransform expression
  eval(substitute(collapse::ftransform(df, count = stringr::str_length(col)))) # Wrap the function to be evaluated in eval(substitute())
}

my_func(my_df, color)

#Should generate output below
my_df %>% 
  collapse::ftransform(count = stringr::str_length(color))
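
To make the as.character(substitute()) idiom used above more concrete, here is a small base R sketch of what it captures. The function name capture_name is hypothetical and exists only for this illustration. Note the last call: substitute() records the expression the caller typed, so a variable that holds a column name is not looked up.

```r
# Hypothetical helper that returns the text of whatever was typed for col
capture_name <- function(col) {
  as.character(substitute(col))
}

capture_name(color_code)    # bare name   -> "color_code"
capture_name("color_code")  # quoted name -> "color_code"

# Caveat: a variable holding the name is NOT evaluated
nm <- "color_code"
capture_name(nm)            # -> "nm", not "color_code"
```

If callers are likely to pass the column name through a variable, taking a plain character argument (standard evaluation) avoids this surprise.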

Functional programming in dplyr (non-standard evaluation)
with a quoted column name using glue and dplyr functions:

my_func <- function(df, col1, col2) {
  df %>%
    mutate(description := glue("color code: {pull(., {{col1}})}; color: {pull(., {{col2}})}"))
}

my_func(my_df, color_code, color)

#Should generate output below
my_df %>%
  mutate(description = glue("color code: {color_code}; color: {color}"))

or with a quoted column name using a C language wrapper function:

my_func <- function(df, col1, col2) {
  df %>%
    mutate(description := sprintf("color code: %s; color: %s", {{col1}}, {{col2}}))
}

my_func(my_df, color_code, color)

#Should generate output below
my_df %>%
  mutate(description = glue("color code: {color_code}; color: {color}"))

and with a non-quoted column name:

my_func <- function(df, col){
  df %>%  
    dplyr::mutate(count = stringr::str_length({{ col }}))
}

my_func(my_df, color)

#Should generate output below
my_df %>% 
  dplyr::mutate(count = stringr::str_length(color))
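
The dplyr examples above all take bare column names via {{ }}. When the column name arrives as a character string instead, the "Programming with dplyr" vignette describes the .data pronoun for that case. The following is a minimal sketch under that assumption; the function name drop_missing is illustrative, and filter() is used here as a dplyr stand-in for collapse::na_omit().

```r
library(dplyr)

my_df <- data.frame(
  color_code = c("V9G", NA, "J4C", NA, "F7B", "G3V"),
  color      = c("Blue", "Red", "White", "Brown", "Orange", "Green"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: col is a character string naming a column
drop_missing <- function(df, col) {
  df %>%
    # .data[[col]] looks the column up by its string name
    filter(!is.na(.data[[col]]))
}

res <- drop_missing(my_df, "color_code")
res
```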

Correcting error-producing code
The following code that produces an error provides a motivation for the two examples below:

my_func <- function(df, col){
  df <- df
  df %>%  
    collapse::na_omit(cols = as.character(substitute(col))) %>% 
    eval(substitute(collapse::ftransform(description = stringr::str_length(col))))
}

my_func(my_df, color_code)

#Error in ckmatch(cols, nam) : Unknown columns: col

The examples below are alternatives that do not produce errors.

Functional Programming in base R (standard evaluation - requires column to be passed as character string in function)

library(pkgcond)

my_func <- function(df, col) {
  if (!is.character(substitute(col)))
    pkgcond::pkg_error("col must be a quoted string") #if users aren't used to quoted strings as inputs to a function
  df <- na_omit(df, cols = col) 
  df$count <- stringr::str_length(.subset2(df, col))
  df
}

my_func(my_df, "color_code")

#Should generate output below
my_df %>% 
  na_omit(cols = "color_code") %>% 
  ftransform(count = stringr::str_length(color_code))

Functional Programming in base R ("combination" of standard evaluation and non-standard evaluation)

my_func <- function(df, col){
  df <- df
  df <- collapse::na_omit(df, cols = as.character(substitute(col))) # Unlike the code with the error, the function is not piped (using %>%)
  eval(substitute(collapse::ftransform(df, description = stringr::str_length(col))))
}

my_func(my_df, color_code)

#Should generate output below
my_df %>% 
  na_omit(cols = "color_code") %>% 
  ftransform(description = stringr::str_length(color_code))
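
As a possible alternative to mixing as.character(substitute()) with eval(substitute()), the bare name can be converted to a string once with rlang and then used with ordinary standard evaluation for both steps. This is a sketch, not the method from the answer above: the function name my_func_rlang is hypothetical, and base nchar() stands in for stringr::str_length().

```r
library(collapse)
library(rlang)

my_df <- data.frame(
  color_code = c("V9G", NA, "J4C", NA, "F7B", "G3V"),
  color      = c("Blue", "Red", "White", "Brown", "Orange", "Green"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: accepts the column as a bare name or as a string
my_func_rlang <- function(df, col) {
  # Capture the argument and turn it into a plain string
  col_name <- rlang::as_name(rlang::enquo(col))
  # Step 1: na_omit() wants the column name as a string
  df <- collapse::na_omit(df, cols = col_name)
  # Step 2: look the column up by name and count characters
  df$description <- nchar(df[[col_name]])
  df
}

res <- my_func_rlang(my_df, color_code)
res
```

Because the name is resolved to a string up front, the rest of the function body contains no non-standard evaluation and can be piped or rearranged freely.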

More complex examples using the collapse package can be referenced at this link.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Grzegorz Sapijaszko
Solution 2