What is non-standard evaluation and how can you pass an "undefined variable" to a function in R?
ggplot(data = bechdel, aes(x = domgross_2013)) +
geom_histogram(bins = 10, color="purple", fill="white") +
labs(title = "Domestic Growth of Movies", x = " Domestic Growth")
How come we are able to pass the column we would like to map to the x value (domgross_2013)? It seems to be passed like a variable rather than a string.
This is different from this post because, in order to reach that post, you must already know that non-standard evaluation exists and is what allows you to pass an "undefined variable". I didn't know that this kind of evaluation existed, and the explanation within that post is for people who have much more R background, as well as an understanding of what the "undefined variable" is.
Solution 1:[1]
This is called non-standard evaluation (NSE). It can be a nicer interface to not need to use strings or data_frame$column_name or other, longer, syntax. It requires special handling in the way the function is written. The Advanced R book has a chapter on non-standard evaluation, which is a good place to dig in to the mechanics of it. I'll quote the outline of the chapter here, to give an idea of what is covered, and to make a point that it's too complex of a topic to explain well in a single answer on Stack Overflow - a full chapter of a book is much more appropriate.
Outline
- Capturing expressions teaches you how to capture unevaluated expressions using substitute().
- Non-standard evaluation shows you how subset() works by combining substitute() with eval() to allow you to succinctly select rows from a data frame.
- Scoping issues discusses scoping issues specific to NSE, and will show you how to resolve them.
- Calling from another function shows why every function that uses NSE should have an escape hatch, a version that uses regular evaluation.
- Substitute teaches you how to use substitute() to work with functions that don’t have an escape hatch.
- The downsides finishes off the chapter with a discussion of the downsides of NSE.
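To give a feel for the first few items in that outline, here is a minimal sketch of a subset()-like function built from substitute() and eval(), together with an "escape hatch" that takes an already-quoted expression. The names my_subset and my_subset_q are made up for this illustration; this is not how subset() itself is written, only the general idea.

```r
# Hypothetical helper: a stripped-down imitation of subset(), for illustration only.
my_subset <- function(df, condition) {
  # Capture `mpg > 22` as an unevaluated expression instead of evaluating it
  condition_expr <- substitute(condition)
  # Evaluate it with the data frame's columns visible; fall back to the caller's environment
  rows <- eval(condition_expr, df, parent.frame())
  df[rows & !is.na(rows), , drop = FALSE]
}

# Hypothetical "escape hatch": the same operation with standard evaluation,
# taking an expression that the caller has already quoted.
my_subset_q <- function(df, condition_expr, env = parent.frame()) {
  rows <- eval(condition_expr, df, env)
  df[rows & !is.na(rows), , drop = FALSE]
}

nse_result <- my_subset(mtcars, mpg > 22)           # bare column name, NSE
se_result  <- my_subset_q(mtcars, quote(mpg > 22))  # explicit quote(), standard evaluation
identical(nse_result, se_result)
```

The quoted version is the one that is easy to call from another function, because the expression can be built and passed around as an ordinary value.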
It's worth noting that non-standard evaluation shows up even in base R, although its heaviest use seems to be in packages like dplyr, data.table, and ggplot2.
## Examples of NSE in base R:
## library() has non-standard evaluation optionally
library(ggplot2) # this works even though `ggplot2` isn't an object
library("ggplot2") # also works with standard evaluation
## by contrast, install.packages does not allow NSE
install.packages(ggplot2) ## throws an error
# Error in install.packages : object 'ggplot2' not found
install.packages("ggplot2") ## quotes are needed here
## subset() uses NSE on column names,
## even letting you use `:` to choose consecutive columns
subset(mtcars, mpg > 22, select = mpg:hp)
## with() is a wrapper that allows non-standard evaluation inside it
with(mtcars, mpg / wt + disp)
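As a rough check on what with() is doing in the last example, the sketch below compares it against evaluating a quoted expression in the data frame and against the fully spelled-out $ version. The variable names are only for illustration.

```r
# NSE: column names are looked up inside mtcars
via_with <- with(mtcars, mpg / wt + disp)

# The same thing done by hand: quote the expression, then evaluate it with mtcars as the data
via_eval <- eval(quote(mpg / wt + disp), mtcars)

# Standard evaluation: every column spelled out explicitly
via_dollar <- mtcars$mpg / mtcars$wt + mtcars$disp

identical(via_with, via_eval)
identical(via_with, via_dollar)
```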
Solution 2:[2]
The way R passes arguments is to pass the expression to the function. Most functions will evaluate it (this happens automatically when you reference the variable in a normal way), but it is also possible to access the expression itself, and that's what ggplot2 functions do in a lot of situations. As others have said, this is called "non-standard evaluation" or NSE.
The usual way to access the expression is with the substitute() function. For example,
f <- function(x) substitute(x)
f(y + z)
#> y + z
Created on 2022-01-19 by the reprex package (v2.0.1)
This works even if variables y and z don't exist, because that function doesn't ever do standard evaluation on the x argument.
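Continuing that example, here is a small sketch of what can be done with the captured expression afterwards: it is an ordinary R object that can be inspected, turned into text, or evaluated later once values for y and z are supplied. This is roughly the idea behind passing domgross_2013 to aes() and having it looked up in the data later, although ggplot2 uses its own tooling (the rlang package) rather than plain substitute(), so treat this as an illustration of the principle only.

```r
f <- function(x) substitute(x)

expr <- f(y + z)   # y and z still do not exist anywhere

class(expr)        # the captured expression is a "call" object
deparse(expr)      # as text: "y + z"

# Supply values for y and z only at evaluation time
result <- eval(expr, list(y = 1, z = 2))
result             # 3
```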
This is also used by R in lazy evaluation. It won't evaluate an argument until it needs the value, so sometimes you get surprising results, because the value may have changed between the time you called the function and the time you evaluate its arguments.
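Here is a small, contrived sketch of that kind of surprise. The function names g and k are made up for the example, and modifying a global with <<- is done only to make the timing visible.

```r
counter <- 0

g <- function(x) {
  counter <<- counter + 1  # changes the global before x has been looked at
  x                        # the argument expression `counter` is only evaluated here
}

lazy_value <- g(counter)
lazy_value                 # 1, not the 0 that counter held when g() was called

# An argument that is never used is never evaluated at all
k <- function(x) "never touched x"
unused_result <- k(stop("this error is never raised"))
unused_result
```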
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | user2554330 |
