'Imputing date based on next(or previous) available date grouped by another column

I have a dataframe that looks like this:

  CYCLE date_cycle Randomization_Date COUPLEID
1      0                    2016-02-16    10892
2      1 2016-08-17         2016-02-19    10894
3      1 2016-08-14         2016-02-26    10899
4      1                    2016-02-26    10900
5      2 2016-03---         2016-02-26    10900
6      3 2016-07-19         2016-02-26    10900
7      4 2016-11-15         2016-02-26    10900
8      1                    2016-02-27    10901
9      2 2016-02---         2016-02-27    10901
10     1 2016-03-27         2016-03-03    10902
11     2 2016-04-21         2016-03-03    10902
12     1                    2016-03-03    10903
13     2 2016-03---         2016-03-03    10903
14     0                    2016-03-03    10904
15     1                    2016-03-03    10905
16     2                    2016-03-03    10905
17     3                    2016-03-03    10905
18     4 2016-04-14         2016-03-03    10905
19     5 2016-05---         2016-03-03    10905
20     6 2016-06---         2016-03-03    10905

The goal is to fill in the missing day for a given ID using either an earlier or later date and add/subtract 28 from that. The date_cycle variable was originally in the dataframe as a character type. I have tried to code it as follows:

mutate(rowwise(df),
       newdate = case_when( str_count(date1, pattern = "\\W") >2 ~ lag(as.Date.character(date1, "%Y-%m-%d"),1) + days(28)))

But I need to incorporate it by ID by CYCLE.

An example of my data could be made like this:

data.frame(stringsAsFactors = FALSE,
CYCLE =(0,1,1,1,2,3,4,1,2,1,2,1,2,0,1,2,3,4,5,6),
date_cycle = c(NA,"2016-08-17", "2016-08-14",NA,"2016-03---","2016-07-19", "2016-11-15",NA,"2016-02---", "2016-03-27","2016-04-21",NA, "2016-03---",NA,NA,NA,NA,"2016-04-14", "2016-05---","2016-06---"), Randomization_Date = c("2016-02-16","2016-02-19",
                                       "2016-02-26","2016-02-26",
                                       "2016-02-26","2016-02-26", 
"2016-02-26",
                                       "2016-02-27","2016-02-27", 
"2016-03-03",
                                       "2016-03-03","2016-03-03",
                                       "2016-03-03","2016-03-03", 
"2016-03-03",
                                       "2016-03-03","2016-03-03", 
"2016-03-03",
                                       "2016-03-03","2016-03-03"),
                          COUPLEID = c(10892,10894,10899,10900,
                                       10900,10900,10900,10901,10901,
                                       10902,10902,10903,10903,10904,
                                     
10905,10905,10905,10905,10905,10905)
              )

The output I am after would look like:

COUPLEID  CYCLE   date_cycle     new_date_cycle
  a         1     2014-03-27      2014-03-27
  a         1     2014-04---      2014-04-24
  b         1     2014-03-24      2014-03-24
  b         2                     2014-04-21
  b         3     2014-05---      2014-05-19
  c         1     2014-04---      2014-04-02
  c         2     2014-04-30      2014-04-30

I have also started to make a long conditional, but I wanted to ask here and see if anyone new of a more straight forward way to do it, instead of explicitly writing out all of the possible conditions.

mutate(rowwise(df),
              newdate = case_when(
                              grp == 1 & str_count(date1, pattern = "\\W") >2 & !is.na(lead(date1,1)  ~ lead(date1,1) - days(28),
                              grp == 2 & str_count(date1, pattern = "\\W") >2 & !is.na(lead(date1,1)) ~ lead(date1,1) - days(28),
                              grp == 3 & str_count(date1, pattern = "\\W") >2 & ...)))


Solution 1:[1]

Function to fill dates forward and backwards

filldates <- function(dates) {
  m = which(!is.na(dates))
  if(length(m)>0 & length(m)!=length(dates)) {
    if(m[1]>1) for(i in seq(m,1,-1)) if(is.na(dates[i])) dates[i]=dates[i+1]-28
    if(sum(is.na(dates))>0) for(i in seq_along(dates)) if(is.na(dates[i])) dates[i] = dates[i-1]+28
  } 
  return(dates)
}

Usage:

data %>%
  arrange(ID, grp) %>%
  group_by(ID) %>%
  mutate(date2=filldates(as.Date(date1,"%Y-%m-%d")))

Ouput:

  ID      grp date1      date2   
  <chr> <dbl> <chr>      <date>    
1 a         1 2014-03-27 2014-03-27
2 a         2 2014-04--- 2014-04-24
3 b         1 2014-03-24 2014-03-24
4 b         2 2014-04--- 2014-04-21
5 b         3 2014-05--- 2014-05-19
6 c         1 2014-03--- 2014-04-02
7 c         2 2014-04-30 2014-04-30

Solution 2:[2]

An option using purrr::accumulate().

library(tidyverse)

center <- df %>%
  group_by(ID) %>%
  mutate(helpDate = ymd(str_replace(date1, '---', '-01')),
         refDate = max(ymd(date1), na.rm = T))

backward <- center %>%
  filter(refDate == max(helpDate)) %>%
  mutate(date2 = accumulate(refDate, ~ . - days(28), .dir = 'backward'))

forward <- center %>%
  filter(refDate == min(helpDate)) %>%
  mutate(date2 = accumulate(refDate, ~ . + days(28)))

bind_rows(forward, backward) %>%
  ungroup() %>%
  mutate(date2 = as_date(date2)) %>%
  select(-c('helpDate', 'refDate'))

# # A tibble: 7 x 4
#   ID      grp date1      date2     
#   <chr> <int> <chr>      <date>    
# 1 a         1 2014-03-27 2014-03-27
# 2 a         2 2014-04--- 2014-04-24
# 3 b         1 2014-03-24 2014-03-24
# 4 b         2 2014-04--- 2014-04-21
# 5 b         3 2014-05--- 2014-05-19
# 6 c         1 2014-03--- 2014-04-02
# 7 c         2 2014-04-30 2014-04-30

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 langtang
Solution 2 rjen