'mutate variable based on other columns with similar names

I have a df here (the desired output, my starting df does not have the Flag variable):

df <- data.frame(
  Person = c('1','2','3'),
  Date = as.Date(c('2010-09-30', '2012-11-20', '2015-03-11')),
  Treatment_1 = as.Date(c('2010-09-30', '2012-11-21', '2015-03-22')),
  Treatment_2 = as.Date(c('2011-09-30', 'NA', '2011-03-22')),
  Treatment_3 = as.Date(c('2012-09-30', '2015-11-21', '2015-06-22')),
  Surgery_1 = as.Date(c(NA, '2016-11-21', '2015-03-12')),
  Surgery_2 = as.Date(c(NA, '2017-11-21', '2019-03-12')),
  Surgery_3 = as.Date(c(NA, '2018-11-21', '2013-03-12')),
  Flag = c('', 'Y', '') 
)

and I want to derive the Flag variable based on these conditions:

  1. For any column that starts with Treatment, set Flag to "" if Date = Treatment
  2. For any column that starts with Surgery, set Flag to "" if Date = Surgery OR Date = Surgery +1 OR Date = Surgery - 1 (basically if the Surgery date is on the day, one day before, or one day after the Date variable, set Flag to "").
  3. else set Flag = "Y"

I've looked into mutate_at but that rewrites the variables and assigns values of True/False.

This is wrong but this is my attempt:

df2 <- df %>%
  mutate(Flag = case_when(
    vars(starts_with("Treatment"), Date == . ) ~ '',
    vars(starts_with("Surgery"), Date == . | Date == . - 1 | Date == . + 1) ~ '',
    TRUE ~ 'Y')
  )


Solution 1:[1]

Here is an alternative using across approach:

library(tidyverse)

df %>% 
  mutate(across(starts_with("Treatment"), ~as.numeric(. %in% Date), .names ="new_{.col}"),
         across(starts_with("Surgery"), ~as.numeric(. %in% c(Date, Date+1, Date-1)), .names ="new_{.col}")) %>% 
  mutate(Flag = ifelse(rowSums(select(., contains('new')))==1, "", "Y"), .keep="used") %>% 
  bind_cols(df)
  Flag Person       Date Treatment_1 Treatment_2 Treatment_3  Surgery_1  Surgery_2  Surgery_3
1           1 2010-09-30  2010-09-30  2011-09-30  2012-09-30       <NA>       <NA>       <NA>
2    Y      2 2012-11-20  2012-11-21        <NA>  2015-11-21 2016-11-21 2017-11-21 2018-11-21
3           3 2015-03-11  2015-03-22  2011-03-22  2015-06-22 2015-03-12 2019-03-12 2013-03-12

Solution 2:[2]

Updated to add data.table approach

If you want a data.table approach, here it is:

df[melt(df, id=c(1,2))[,flag:=fifelse(
  (str_starts(variable,"T") & value==Date) | 
    (str_starts(variable,"S") & abs(value-Date)<=1),"", "Y")][
      , .(flag=min(flag,na.rm=T)), Person], on=.(Person)]

Output


   Person       Date Treatment_1 Treatment_2 Treatment_3  Surgery_1  Surgery_2  Surgery_3 flag
1:      1 2010-09-30  2010-09-30  2011-09-30  2012-09-30       <NA>       <NA>       <NA>     
2:      2 2012-11-20  2012-11-21        <NA>  2015-11-21 2016-11-21 2017-11-21 2018-11-21    Y
3:      3 2015-03-11  2015-03-22  2011-03-22  2015-06-22 2015-03-12 2019-03-12 2013-03-12     

I like Andrew's approach, but I was working on this when his answer came in, so here it is in case you are interested

df %>% inner_join(
  pivot_longer(df, cols=Treatment_1:Surgery_3) %>% 
    mutate(flag=case_when(
        (str_starts(name,"T") & value==Date) | (str_starts(name,"S") & abs(value-Date)<=1) ~ "",
        TRUE ~"Y")) %>% 
    group_by(Person) %>% 
    summarize(flag = min(flag))
)

Output:

  Person       Date Treatment_1 Treatment_2 Treatment_3  Surgery_1  Surgery_2  Surgery_3 flag
1      1 2010-09-30  2010-09-30  2011-09-30  2012-09-30       <NA>       <NA>       <NA>     
2      2 2012-11-20  2012-11-21        <NA>  2015-11-21 2016-11-21 2017-11-21 2018-11-21    Y
3      3 2015-03-11  2015-03-22  2011-03-22  2015-06-22 2015-03-12 2019-03-12 2013-03-12     

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 TarJae
Solution 2