Selecting columns from a set of names with dplyr
I'm attempting to make subsets of a large data frame based on whether the column names are in an externally defined set. So I'm starting with something like:
> x <- c(1,2,3)
> y <- c("a","b","c")
> z <- c(4,5,6)
>
> df <- data.frame(x=x,y=y,z=z)
> df
  x y z
1 1 a 4
2 2 b 5
3 3 c 6
chosen_columns <- c(x,y)
And I'd like to use it to end up with:
  x y
1 1 a
2 2 b
3 3 c
It seems like dplyr's select() should handle this, but I'm not sure what arguments to pass to get there. Something like:
df_chosen <- df %>%
select(is.element(___,chosen_columns))
I'm just not sure what would go in the ___ there.
Thank you!
Solution 1:[1]
c(x, y) is not a vector of two column names. It concatenates the values stored in your objects x and y, and because y is character, the numbers in x are coerced to character too, giving c("1", "2", "3", "a", "b", "c").
Instead, create a character vector of the column names and pass it to select(), wrapped in all_of():
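A quick way to see this is to run the concatenation on its own and compare it with the data frame's actual column names. This sketch simply rebuilds the question's x, y and df; the name `combined` is illustrative only:

```r
x <- c(1, 2, 3)
y <- c("a", "b", "c")
df <- data.frame(x = x, y = y, z = c(4, 5, 6))

# Concatenating the vectors themselves mixes numbers and characters,
# so R coerces everything to a single character vector of values
combined <- c(x, y)
print(combined)
# [1] "1" "2" "3" "a" "b" "c"

# The column names live in a separate character vector
print(names(df))
# [1] "x" "y" "z"
```

None of the values in `combined` is a column name of df, which is why it cannot be used to select columns.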
library(dplyr)
chosen_columns <- c("x", "y")
df |> select(all_of(chosen_columns))
(Thank you, Gregor Thomas, for the advice to wrap column names in all_of()).
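The question asked what could go inside select() in place of ___. Below is a sketch of a few alternatives, assuming dplyr 1.0 or later (for all_of() and any_of()) and R 4.1 or later (for the native |> pipe; %>% works the same way). The extra name "not_a_column" is made up purely to show how any_of() behaves, and the result names are illustrative:

```r
library(dplyr)

df <- data.frame(x = c(1, 2, 3), y = c("a", "b", "c"), z = c(4, 5, 6))
chosen_columns <- c("x", "y")

# all_of(): errors if any requested name is not a column of df
strict <- df |> select(all_of(chosen_columns))

# any_of(): silently skips names that are not columns of df
lenient <- df |> select(any_of(c(chosen_columns, "not_a_column")))

# Closest to the is.element() idea in the question: turn the logical
# match into integer column positions, which select() accepts
by_position <- df |> select(which(is.element(names(df), chosen_columns)))

# Base R equivalent without dplyr; drop = FALSE keeps a data frame
base_subset <- df[, chosen_columns, drop = FALSE]
```

all_of() is usually the safer default because a typo in chosen_columns raises an error instead of quietly dropping a column; any_of() fits when the set of names is only a wish list.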
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Stack Overflow |
