Calculating time to next and last events, with unevenly distributed event times

I am trying to write code that will return two separate variables for each event in my data frame: one for the time from the current event to the next event for a specific individual, and one for the time since the last event happened for that individual.

Right now I have a data frame where each row is an event, and the data frame includes many individuals (with distinct ID markers in the ID.2 column). My events are not evenly distributed through time; sometimes no events happen for many days, and then there are 5+ events within one day. I have a Datetime column in POSIXct format that holds my date and time. I have created columns for time to next event (ttne) and time since previous event (tspe), but am not sure how to populate the fields! I am a super R beginner, so would appreciate any help this wonderful community can give!

My data looks like:

    > head(ACss)
            Date         Datetime ID.2 month day year tspe
    1 2019-05-25 2019-05-25 11:57  139     5  25 2019   NA
    2 2019-06-09 2019-06-09 19:42   43     6   9 2019   NA
    3 2019-07-05 2019-07-05 20:12  139     7   5 2019   NA
    4 2019-07-27 2019-07-27 17:27  152     7  27 2019   NA
    5 2019-08-04  2019-08-04 9:13  152     8   4 2019   NA
    6 2019-08-04 2019-08-04 16:18  139     8   4 2019   NA

I have tried a few different mutate and group functions in dplyr, with no luck!




Solution 1:[1]

This is a fast data.table approach (see the input data below). Note also that, within an individual, the time to next event for one row equals the time since the previous event for the following row.

    setorder(data, ID.2, Datetime)
    data[, `:=`(tspe = Datetime - shift(Datetime),
                ttne = shift(Datetime, -1) - Datetime),
         by = .(ID.2)][]

                 Datetime  ID.2           tspe           ttne
                   <POSc> <int>     <difftime>     <difftime>
   1: 2019-01-11 22:07:56     1        NA secs 6605737.1 secs
   2: 2019-03-29 10:03:33     1 6605737.1 secs 1597959.3 secs
   3: 2019-04-16 21:56:13     1 1597959.3 secs 5108436.7 secs
   4: 2019-06-15 00:56:49     1 5108436.7 secs  169861.8 secs
   5: 2019-06-17 00:07:51     1  169861.8 secs  547332.5 secs
  ---                                                        
4996: 2019-12-06 12:49:24   200 7668946.3 secs  471838.4 secs
4997: 2019-12-11 23:53:22   200  471838.4 secs  100658.9 secs
4998: 2019-12-13 03:51:01   200  100658.9 secs  402635.4 secs
4999: 2019-12-17 19:41:36   200  402635.4 secs  547332.5 secs
5000: 2019-12-24 03:43:49   200  547332.5 secs        NA secs
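The difftime columns above print in seconds, which is hard to read at these gaps. For a difftime object, `as.numeric()` accepts a `units` argument that re-expresses it in whatever unit you like (a minimal sketch on a hand-made pair of timestamps, not the generated data):

```r
# Subtracting two POSIXct values gives a difftime; as.numeric() can
# re-express it in the units you want.
gap <- as.POSIXct("2019-03-29 10:03:33", tz = "UTC") -
  as.POSIXct("2019-01-11 22:07:56", tz = "UTC")

as.numeric(gap, units = "days")  # the gap as a plain number of days
as.numeric(gap, units = "secs")  # the same gap in seconds
```

The same conversion can be applied to the tspe and ttne columns after they are computed.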

Or with dplyr (naming the time-to-next-event column ttne, as in the question):

    data %>%
      arrange(ID.2, Datetime) %>%
      group_by(ID.2) %>%
      mutate(tspe = Datetime - lag(Datetime),
             ttne = lead(Datetime) - Datetime)

       Datetime             ID.2 tspe           ttne          
       <dttm>              <int> <drtn>         <drtn>        
     1 2019-01-11 22:07:56     1        NA secs 6605737.1 secs
     2 2019-03-29 10:03:33     1 6605737.1 secs 1597959.3 secs
     3 2019-04-16 21:56:13     1 1597959.3 secs 5108436.7 secs
     4 2019-06-15 00:56:49     1 5108436.7 secs  169861.8 secs
     5 2019-06-17 00:07:51     1  169861.8 secs  547332.5 secs
     6 2019-06-23 08:10:04     1  547332.5 secs 2900233.2 secs
     7 2019-07-26 21:47:17     1 2900233.2 secs  981423.8 secs
     8 2019-08-07 06:24:21     1  981423.8 secs  257938.3 secs
     9 2019-08-10 06:03:19     1  257938.3 secs  553623.7 secs
    10 2019-08-16 15:50:23     1  553623.7 secs  195026.5 secs
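If you want the columns in days from the start, `difftime()` takes the two times plus a `units` argument, so the `mutate()` can produce plain numeric columns directly (a sketch against a small made-up tibble, not the asker's ACss data):

```r
library(dplyr)

# Three toy events for two individuals, mimicking the question's columns.
df <- tibble::tibble(
  ID.2     = c(139, 139, 152),
  Datetime = as.POSIXct(c("2019-05-25 11:57", "2019-08-04 16:18",
                          "2019-07-27 17:27"), tz = "UTC")
)

res <- df %>%
  arrange(ID.2, Datetime) %>%
  group_by(ID.2) %>%
  mutate(tspe = as.numeric(difftime(Datetime, lag(Datetime),  units = "days")),
         ttne = as.numeric(difftime(lead(Datetime), Datetime, units = "days"))) %>%
  ungroup()
res
```

The first and last event per individual get NA, as before, since they have no previous or next event.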

Input:

    library(data.table)

    set.seed(123)
    data <- data.table(
      Datetime = seq(as.POSIXct("2019-01-01"), as.POSIXct("2019-12-31"),
                     length.out = 5000)
    )
    data[, ID.2 := sample(1:200, nrow(data), replace = TRUE)]

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
