How to flag time-varying indicators with overlapping dates in a longitudinal data set?
I have a simulated data set with 5 rows, each representing a block of person-time, each with its own start and end date ('start' and 'end').
- Each row has an associated visit date ('visit'). A visit date repeats on every row up to and including the row whose interval contains that date, after which the next visit date takes over (e.g., '2015-09-11' repeats until the row that contains the next visit date, '2015-09-17').
- 'mo_previsit' is the date 1 month before 'visit'.
- 'flag_mo' marks the row whose interval contains the 'mo_previsit' date, and 'flag_rows' is meant to flag all rows that fall within that 1-month window before the visit.
Problem: 'flag_mo' and 'flag_rows' work for the first visit date ('2015-09-11') but not for the second ('2015-09-17'). I think this is because each row is only compared against the 'mo_previsit' of its own 'visit' value, so the search cannot reach rows that carry a different visit date (and grouping differently does not seem to change this). For example, row 3 contains '2015-08-17', the 'mo_previsit' of the second visit, but it carries the first visit date, so it is never checked. How can I edit this code so that the search spans rows across visit dates when it creates 'flag_mo'?
#Load packages
pacman::p_load(dplyr, tidyr, lubridate)
#Create variables for data set
start <- c('2015-01-01', '2015-04-04', '2015-08-13', '2015-09-11', '2015-09-17')
end <- c('2015-04-03', '2015-08-12', '2015-09-10', '2015-09-16', '2015-12-31')
visit <- c('2015-09-11', '2015-09-11', '2015-09-11', '2015-09-11', '2015-09-17')
row <- c(1, 2, 3, 4, 5)
#Populate data frame with variables
d <- cbind(row)
d <- as.data.frame(d)
#Format dates and add to data frame
d$start <- as.Date(start, format = '%Y-%m-%d')
d$end <- as.Date(end, format = '%Y-%m-%d')
d$visit <- as.Date(visit, format = '%Y-%m-%d')
d1 <- d %>%
  group_by(visit) %>%
  arrange(row) %>%
  #Calculate 'mo_previsit', which is the date that occurs 1 month before each visit date
  mutate(mo_previsit = visit %m-% months(1),
         #Create a flag to mark the row that contains the start of that month before each visit
         flag_mo = ifelse(((mo_previsit >= start) & (mo_previsit <= end)), 1, NA)) %>%
  group_by(visit, flag_mo) %>%
  arrange(visit) %>%
  #Create a new flag so that if the visit date is the same as the start date of a given row,
  #we don't want to count that row as part of the 1 month that comes before the visit date
  mutate(flag_rows = ifelse(visit == start, 0, flag_mo)) %>%
  ungroup()
class(d1$mo_previsit) <- 'Date'
d1
#> # A tibble: 5 × 7
#>     row start      end        visit      mo_previsit flag_mo flag_rows
#>   <dbl> <date>     <date>     <date>     <date>        <dbl>     <dbl>
#> 1     1 2015-01-01 2015-04-03 2015-09-11 2015-08-11       NA        NA
#> 2     2 2015-04-04 2015-08-12 2015-09-11 2015-08-11        1         1
#> 3     3 2015-08-13 2015-09-10 2015-09-11 2015-08-11       NA        NA
#> 4     4 2015-09-11 2015-09-16 2015-09-11 2015-08-11       NA         0
#> 5     5 2015-09-17 2015-12-31 2015-09-17 2015-08-17       NA         0
Created on 2022-05-13 by the reprex package (v2.0.1)
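One possible direction, offered only as a sketch: instead of comparing each row with the 'mo_previsit' stored on that same row, compare every row with the look-back window of every distinct visit date, and flag a row when its interval overlaps any window. The names `visits`, `in_window`, `window_start`, and `window_end` are hypothetical and not part of the original code. The sketch assumes that a row should be flagged when it overlaps the window at all (not only when it lies fully inside it), that the visit day itself is excluded from the window, and that flags can be 1/0 rather than 1/NA.

```r
library(lubridate)

# Rebuild the example data from the question
d <- data.frame(
  row   = 1:5,
  start = as.Date(c('2015-01-01', '2015-04-04', '2015-08-13', '2015-09-11', '2015-09-17')),
  end   = as.Date(c('2015-04-03', '2015-08-12', '2015-09-10', '2015-09-16', '2015-12-31')),
  visit = as.Date(c('2015-09-11', '2015-09-11', '2015-09-11', '2015-09-11', '2015-09-17'))
)

# Every distinct visit date gets its own look-back window
visits <- sort(unique(d$visit))

# in_window[i, j] is TRUE when row i overlaps the month before visit j
in_window <- sapply(seq_along(visits), function(j) {
  window_start <- visits[j] %m-% months(1)  # same 'mo_previsit' as in the question
  window_end   <- visits[j] - 1             # assumption: the visit day itself is excluded
  # Two closed intervals overlap when each starts before the other ends
  d$start <= window_end & d$end >= window_start
})
colnames(in_window) <- format(visits)

# Assumption: a row is flagged if it overlaps the window of any visit
d$flag_rows <- as.integer(rowSums(in_window) > 0)

in_window
d
```

With the example data, this flags rows 2 and 3 for the '2015-09-11' visit and rows 3 and 4 for the '2015-09-17' visit, so 'flag_rows' is 1 for rows 2 to 4. Keeping one column per visit in `in_window` makes it possible to see which visit caused each flag, which a single flag column cannot show when windows overlap.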
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow