Selecting columns from a set of names with dplyr

I'm attempting to make subsets of a large data frame based on whether the column names are in an externally defined set. So I'm starting with something like:

> x <- c(1,2,3)
> y <- c("a","b","c")
> z <- c(4,5,6)
> 
> df <- data.frame(x=x,y=y,z=z)
> df
  x y z
1 1 a 4
2 2 b 5
3 3 c 6

chosen_columns <- c(x,y)

And I'm trying to use this to end up with:

  x y
1 1 a
2 2 b
3 3 c

It seems like using select() from dplyr should be able to handle this perfectly, but I'm not sure what the arguments would be to get that. Something like:

df_chosen <- df %>%
  select(is.element(___,chosen_columns))

I'm just not sure what would go in the ___ there.

Thank you!



Solution 1:[1]

c(x, y) does not refer to two columns: it concatenates your objects x and y into a single character vector, c("1", "2", "3", "a", "b", "c").
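
To see the coercion concretely, here is a minimal sketch in base R that reuses the x and y vectors from the question. The name combined is only illustrative:

```r
x <- c(1, 2, 3)
y <- c("a", "b", "c")

# c() flattens both vectors into one atomic vector; because y is
# character, the numbers in x are coerced to character as well
combined <- c(x, y)

print(combined)
print(class(combined))
```

The result holds the values of both vectors as strings, so it carries no information about column names.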

Instead, create a character vector of column names and pass it to select(), wrapped in all_of():

library(dplyr)

chosen_columns <- c("x", "y")

df |> select(all_of(chosen_columns))

(Thank you, Gregor Thomas, for the advice to wrap column names in all_of()).
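
If the external set might contain names that are not in the data frame, the strictness of all_of() matters. The sketch below compares it with any_of(), and with a base R subset that follows the is.element() idea from the question. It assumes dplyr is installed and R is 4.1 or later (for the |> pipe). The name "w" and the variables wanted, df_any, strict_result and df_base are hypothetical, added only for illustration:

```r
library(dplyr)

df <- data.frame(x = c(1, 2, 3), y = c("a", "b", "c"), z = c(4, 5, 6))

# "w" is a hypothetical name that is not a column of df
wanted <- c("x", "y", "w")

# any_of() keeps the names that exist and silently skips the rest
df_any <- df |> select(any_of(wanted))

# all_of() is strict: a missing name raises an error, caught here
strict_result <- tryCatch(
  df |> select(all_of(wanted)),
  error = function(e) "error: column not found"
)

# Base R version of the is.element() idea: test each column name
# for membership in the set and keep the matching columns
df_base <- df[names(df) %in% wanted]
```

all_of() is usually the safer default because a misspelled name fails loudly. any_of() fits when the set is intentionally broader than the data frame.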

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow