'Group by ID and keep latest date

I have a dataset that looks like this

 ID       date
 1        2020-05-01
 1        2020-06-13
 2        2021-01-02
 2        2021-01-02
 3        2022-01-07
 3        2021-12-14

And so on. There are about 30 variables in my dataset. I want to group by ID, and rewrite df$date such that only the latest date remains for each ID:

ID       date
 1        2020-06-13
 1        2020-06-13
 2        2021-01-02
 2        2021-01-02
 3        2022-01-07
 3        2022-01-07

Date is formatted as "%Y-%m-%d"

r date group-by

Solution 1:^[1]

You can use an if_else statement to check for latest date, and replace any date that is not the latest.

library(dplyr)

df %>% 
  group_by(ID) %>% 
  mutate(date = if_else(date == max(date), as.Date(date), as.Date(max(date))))

# A tibble: 6 × 2
# Groups:   ID [3]
     ID date      
  <int> <date>    
1     1 2020-06-13
2     1 2020-06-13
3     2 2021-01-02
4     2 2021-01-02
5     3 2022-01-07
6     3 2022-01-07

Solution 2:^[2]

We could use max(date) with mutate:

library(dplyr)
library(lubridate)
df %>% 
  mutate(date = ymd(date)) %>% # need only to transform to date class
  group_by(ID) %>% 
  mutate(date = max(date))

     ID date      
  <int> <date>    
1     1 2020-06-13
2     1 2020-06-13
3     2 2021-01-02
4     2 2021-01-02
5     3 2022-01-07
6     3 2022-01-07

Solution 3:^[3]

If just keeping the rows with the most recent date per group is your purpose, you could do without group_by.

df %>% 
  arrange(ID, desc(date)) %>% 
  distinct(ID, .keep_all = T)

  ID       date
1  1 2020-06-13
2  2 2021-01-02
3  3 2022-01-07

Solution 4:^[4]

This is where the data.table syntax shines.

library(data.table)
setDT(mydata)[, date := max(date), by = ID]
#    ID       date
# 1:  1 2020-06-13
# 2:  1 2020-06-13
# 3:  2 2021-01-02
# 4:  2 2021-01-02
# 5:  3 2022-01-07
# 6:  3 2022-01-07

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	benson23
Solution 2
Solution 3	Lennyy
Solution 4	Wimpel

'Group by ID and keep latest date

Solution 1:[1]

Solution 2:[2]

Solution 3:[3]

Solution 4:[4]

Sources

Related Questions

Solution 1:^[1]

Solution 2:^[2]

Solution 3:^[3]

Solution 4:^[4]