'compare multiple columns to one column and return names of columns that match condition
I have an df like this:
df <- data.frame(
Death = as.Date(c("2017-09-20")),
First_Date = as.Date(c("2016-09-09", "2018-09-20", "2016-09-09")),
Second_Date = as.Date(c("2019-05-02", "2019-09-20", "2016-09-09")),
new = c("Second_Date", "First_Date, Second_Date", NA),
row_number = c(1,2,3))
And I want to create the column 'new' where if any columns that contain the word "Date" is after the 'Death' date column, then I want to return the names of these columns. For example, you can see:
- in the first row, Second_Date is after Death so new = Second_date
- in the second row, both First_Date and Second_Date are after Death so new = First_Date, Second_Date
- in the third row, none of the dates are after death so new = NA
So far I have this code:
df2 <- df %>% mutate(new = Reduce(coalesce, across(contains("Date"), ~ ifelse(. > Death, cur_column(), NA_character_))))
but I'm only able to return the first column from left to right that meets this condition. Any help would be much appreciated.
Solution 1:[1]
We loop across the columns that have '_Date' as suffix in column names, get the column name (cur_column()) if the values are greater than Death column, return as new column by modifying the .names, then use unite to join those _new column to a single one
library(dplyr)
library(tidyr)
df %>%
mutate(across(ends_with("_Date"),
~ case_when(.x > Death ~ cur_column()), .names = "{.col}_new")) %>%
unite(new, ends_with("_new"), na.rm = TRUE, sep = ", ") %>%
na_if("")
-output
Death First_Date Second_Date row_number new
1 2017-09-20 2016-09-09 2019-05-02 1 Second_Date
2 2017-09-20 2018-09-20 2019-09-20 2 First_Date, Second_Date
3 2017-09-20 2016-09-09 2016-09-09 3 <NA>
NOTE: coalesce returns only the first non-NA value across rows
Solution 2:[2]
Another possible solution, in base R:
df <- data.frame(
Death = as.Date(c("2017-09-20")),
First_Date = as.Date(c("2016-09-09", "2018-09-20", "2016-09-09")),
Second_Date = as.Date(c("2019-05-02", "2019-09-20", "2016-09-09")))
df$new <- apply(df, 1, \(x) if (any(x[1] < x[2:3]))
paste(names(df)[c(F, x[1] < x[2:3])], collapse = ", ") else NA)
df
#> Death First_Date Second_Date new
#> 1 2017-09-20 2016-09-09 2019-05-02 Second_Date
#> 2 2017-09-20 2018-09-20 2019-09-20 First_Date, Second_Date
#> 3 2017-09-20 2016-09-09 2016-09-09 <NA>
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | akrun |
| Solution 2 | PaulS |
