Calculating time to next and last events, with unevenly distributed event times
I am trying to write code that returns two variables for each event in my data frame: the time from the current event to that individual's next event, and the time since that individual's last event.
Right now I have a data frame where each row is an event, and the data frame includes many individuals (with distinct ID markers in the ID.2 column). My events are not evenly distributed through time; sometimes no events happen for many days, and then there are 5+ events within one day. I have a Datetime column in POSIXct format that holds both the date and the time. I have created columns for time to next event (ttne) and time since previous event (tspe), but am not sure how to populate them! I am a super R beginner, so I would appreciate any help this wonderful community can give!
My data looks like:
> head(ACss)
        Date         Datetime ID.2 month day year tspe
1 2019-05-25 2019-05-25 11:57  139     5  25 2019   NA
2 2019-06-09 2019-06-09 19:42   43     6   9 2019   NA
3 2019-07-05 2019-07-05 20:12  139     7   5 2019   NA
4 2019-07-27 2019-07-27 17:27  152     7  27 2019   NA
5 2019-08-04  2019-08-04 9:13  152     8   4 2019   NA
6 2019-08-04 2019-08-04 16:18  139     8   4 2019   NA
I have tried a few different mutate and grouping functions in dplyr, with no luck!
Solution 1:
This is a fast data.table approach (see the input data below). Also, note (as in my comment above) that the time to next event for one row equals the time since the previous event for the following row.
# sort by individual, then by time, so shift() compares consecutive events
setorder(data, ID.2, Datetime)
data[, `:=`(tspe = Datetime - shift(Datetime),        # time since previous event
            ttne = shift(Datetime, -1) - Datetime),   # time to next event
     by = .(ID.2)][]
Datetime ID.2 tspe ttne
                 <POSct> <int>     <difftime>     <difftime>
1: 2019-01-11 22:07:56 1 NA secs 6605737.1 secs
2: 2019-03-29 10:03:33 1 6605737.1 secs 1597959.3 secs
3: 2019-04-16 21:56:13 1 1597959.3 secs 5108436.7 secs
4: 2019-06-15 00:56:49 1 5108436.7 secs 169861.8 secs
5: 2019-06-17 00:07:51 1 169861.8 secs 547332.5 secs
---
4996: 2019-12-06 12:49:24 200 7668946.3 secs 471838.4 secs
4997: 2019-12-11 23:53:22 200 471838.4 secs 100658.9 secs
4998: 2019-12-13 03:51:01 200 100658.9 secs 402635.4 secs
4999: 2019-12-17 19:41:36 200 402635.4 secs 547332.5 secs
5000: 2019-12-24 03:43:49 200 547332.5 secs NA secs
Or with dplyr
data %>%
  arrange(ID.2, Datetime) %>%
  group_by(ID.2) %>%
  mutate(tspe = Datetime - lag(Datetime),
         ttne = lead(Datetime) - Datetime)

   Datetime            ID.2 tspe           ttne
   <dttm>             <int> <drtn>         <drtn>
1 2019-01-11 22:07:56 1 NA secs 6605737.1 secs
2 2019-03-29 10:03:33 1 6605737.1 secs 1597959.3 secs
3 2019-04-16 21:56:13 1 1597959.3 secs 5108436.7 secs
4 2019-06-15 00:56:49 1 5108436.7 secs 169861.8 secs
5 2019-06-17 00:07:51 1 169861.8 secs 547332.5 secs
6 2019-06-23 08:10:04 1 547332.5 secs 2900233.2 secs
7 2019-07-26 21:47:17 1 2900233.2 secs 981423.8 secs
8 2019-08-07 06:24:21 1 981423.8 secs 257938.3 secs
9 2019-08-10 06:03:19 1 257938.3 secs 553623.7 secs
10 2019-08-16 15:50:23 1 553623.7 secs 195026.5 secs
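A note on units: subtracting POSIXct values yields a difftime whose unit R chooses automatically (secs here, but it can be days or hours for other spans). If you want a consistent unit across the whole column, you can call difftime() explicitly. A sketch of this, assuming the question's ACss data frame and column names:

```r
library(dplyr)

# Same lag/lead logic, but force both columns to numeric days
# so every row is in the same unit regardless of gap size.
ACss %>%
  arrange(ID.2, Datetime) %>%
  group_by(ID.2) %>%
  mutate(
    tspe = as.numeric(difftime(Datetime, lag(Datetime),  units = "days")),
    ttne = as.numeric(difftime(lead(Datetime), Datetime, units = "days"))
  ) %>%
  ungroup()
```

as.numeric() strips the difftime class, which makes the columns easier to plot or summarise; drop it if you prefer to keep the difftime printing.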
Input:
library(data.table)
set.seed(123)
data = data.table(
  Datetime = seq(as.POSIXct("2019-01-01"), as.POSIXct("2019-12-31"), length.out = 5000)
)
data[, ID.2 := sample(1:200, nrow(data), replace = TRUE)]
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
