How to find the earliest date across multiple columns in R (issue with NAs)

I have three date columns (class Date) and I want to create a new column that holds the earliest of the three dates. This is the code I used:

df1 <- df %>% mutate(timeout= pmin(date1, date2, end_date))

When date1 and date2 are both NA, I would like the date in end_date to be returned in the timeout column, so timeout should never contain NAs. The code above returns NA for those rows. Any assistance will be greatly appreciated.



Solution 1:[1]

By default, pmin returns NA for a row if any of its inputs is NA. Add na.rm = TRUE and the NAs in each row are ignored when the minimum is calculated.

library(dplyr)

df %>% 
  mutate(timeout = pmin(date1, date2, end_date, na.rm = TRUE))

Output

  id      date1      date2   end_date    timeout
1  1       <NA>       <NA> 2008-01-23 2008-01-23
2  1 2007-10-16 2007-11-01 2008-01-23 2007-10-16
3  2 2007-11-30 2007-11-30 2007-11-30 2007-11-30
4  3 2007-08-17 2007-12-17 2008-12-12 2007-08-17
5  3 2008-11-12 2008-12-12 2008-12-12 2008-11-12
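
The same behaviour can be seen without dplyr. The following is a minimal base R sketch on toy vectors (the values and the names default_min and narm_min are made up for illustration, not taken from the question's data) that compares the default call with the na.rm = TRUE call:

```r
# Toy vectors (hypothetical values for illustration only)
date1    <- as.Date(c(NA, "2007-10-16"))
date2    <- as.Date(c(NA, "2007-11-01"))
end_date <- as.Date(c("2008-01-23", "2008-01-23"))

# Default (na.rm = FALSE): any NA in a row makes that row's result NA
default_min <- pmin(date1, date2, end_date)

# na.rm = TRUE: NAs are skipped and the remaining dates are compared
narm_min <- pmin(date1, date2, end_date, na.rm = TRUE)

print(default_min)
print(narm_min)
```

Note that na.rm = TRUE only helps when at least one value in the row is present. If every input in a row is NA, the result for that row is still NA.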

Data

df <- structure(list(id = c(1L, 1L, 2L, 3L, 3L), date1 = structure(c(NA, 
13802, 13847, 13742, 14195), class = "Date"), date2 = structure(c(NA, 
13818, 13847, 13864, 14225), class = "Date"), end_date = c("2008-01-23", 
"2008-01-23", "2007-11-30", "2008-12-12", "2008-12-12")), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5"))
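
In the dput above, end_date is stored as a character vector rather than as class Date. pmin still returns the expected dates here because the first argument is a Date, but converting the column explicitly may be safer. A small sketch, using a hypothetical stand-in data frame df_demo with the same column types:

```r
# Hypothetical stand-in data frame; end_date is character, as in the dput above
df_demo <- data.frame(
  date1    = as.Date(c(NA, "2007-10-16")),
  end_date = c("2008-01-23", "2008-01-23"),
  stringsAsFactors = FALSE
)

# Convert the character column to class Date before comparing
df_demo$end_date <- as.Date(df_demo$end_date)

# Row-wise earliest date, ignoring NAs
df_demo$timeout <- pmin(df_demo$date1, df_demo$end_date, na.rm = TRUE)

str(df_demo)
```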

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 AndrewGB