Find the overlap between two timestamps in R to assign shifts
Problem
Currently, I have a large flight crew schedule dataset with a start and end time for each shift, and my goal is to identify whether an employee worked a night shift. A night shift is defined as any shift that includes some time between 01:00:00 and 05:59:59. I have looked at functions such as %overlaps%, but they don't seem to work with times of day alone. Some sample data (in UTC):
library(lubridate)
df <- data.frame(start = ymd_hms(c("2018-09-19 23:30:00", "2018-09-19 17:00:00", "2018-09-22 04:30:00")),
end = ymd_hms(c('2018-09-20 07:05:00', "2018-09-19 21:00:00", "2018-09-22 12:00:00")))
Desired output
Ideally, I would like to get the following output, with a Boolean variable indicating whether the employee worked a night shift:
start | end | night.shift
2018-09-19 23:30:00 | 2018-09-20 07:05:00 | TRUE
2018-09-19 17:00:00 | 2018-09-19 21:00:00 | FALSE
2018-09-22 04:30:00 | 2018-09-22 12:00:00 | TRUE
Thanks in advance!
Solution 1:[1]
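Using seq.POSIXt. This checks whether the 01:00:00 or 05:59:59 boundary on the start or end date falls inside the shift, which gives the expected result for the sample data, although a shift lying entirely within the night window (e.g. 02:00 to 04:00) would not be flagged.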
transform(df, night.shift=mapply(\(x, y) any(
as.POSIXct(outer(as.Date(c(x, y)), c('01:00:00', '05:59:59'), paste), tz='GMT') %in%
seq.POSIXt(x, y, by='sec')),
start, end))
# start end night.shift
# 1 2018-09-19 23:30:00 2018-09-20 07:05:00 TRUE
# 2 2018-09-19 17:00:00 2018-09-19 21:00:00 FALSE
# 3 2018-09-22 04:30:00 2018-09-22 12:00:00 TRUE
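Or, almost twice as fast, use %inrange% from the data.table package, which compares the boundaries with the shift's start and end directly instead of enumerating every second: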
library(data.table)
transform(df, night.shift=mapply(\(x, y) any(
as.POSIXct(outer(as.Date(c(x, y)), c('01:00:00', '05:59:59'), paste), tz='GMT') %inrange%
c(x, y)),
start, end))
# start end night.shift
# 1 2018-09-19 23:30:00 2018-09-20 07:05:00 TRUE
# 2 2018-09-19 17:00:00 2018-09-19 21:00:00 FALSE
# 3 2018-09-22 04:30:00 2018-09-22 12:00:00 TRUE
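Both variants above test whether a night-window boundary falls inside the shift, which is not quite the same as testing for overlap. A minimal base-R sketch (no data.table or lubridate), using a hypothetical 02:00 to 04:00 shift that is not in the question's data, shows the difference between the boundary test and the plain interval-overlap condition:

```r
# Hypothetical shift lying entirely inside the night window (not in the question's data)
x <- as.POSIXct("2018-09-21 02:00:00", tz = "UTC")
y <- as.POSIXct("2018-09-21 04:00:00", tz = "UTC")

# Night-window boundaries on the shift's date
night_start <- as.POSIXct(paste(as.Date(x), "01:00:00"), tz = "UTC")
night_end   <- as.POSIXct(paste(as.Date(x), "05:59:59"), tz = "UTC")
bounds <- c(night_start, night_end)

# Boundary test, as in the answers above: does a boundary fall inside [x, y]?
boundary_hit <- any(bounds >= x & bounds <= y)

# Overlap test: closed intervals [x, y] and [night_start, night_end] intersect
# exactly when each one starts no later than the other ends
overlap_hit <- x <= night_end && y >= night_start

boundary_hit  # FALSE
overlap_hit   # TRUE
```

The overlap condition is also cheaper than building a per-second sequence, since it needs only two comparisons per window.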
Solution 2:[2]
You can use interval() or %--% to create an Interval object and int_overlaps() to test if two intervals overlap.
library(dplyr)
library(lubridate)
df %>%
mutate(
night.shift = int_overlaps(
(date(start) + hms("01:00:00")) %--% (date(start) + hms("05:59:59")),
start %--% end
) | int_overlaps(
(date(end) + hms("01:00:00")) %--% (date(end) + hms("05:59:59")),
start %--% end
)
)
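This version compares the shift only with the night windows on its start and end dates. For a shift spanning three calendar days, the middle day's window can lie entirely inside the shift without overlapping either of those two windows. A base-R sketch of the same overlap condition, using a hypothetical shift from 2018-09-21 22:00:00 to 2018-09-23 00:30:00 (not in the question's data), checked once against the start- and end-date windows and once against every calendar day the shift touches:

```r
# Hypothetical shift spanning three calendar days (not in the question's data)
start <- as.POSIXct("2018-09-21 22:00:00", tz = "UTC")
end   <- as.POSIXct("2018-09-23 00:30:00", tz = "UTC")

# Does [start, end] overlap the 01:00:00-05:59:59 window on each given day?
overlaps_night <- function(days, start, end) {
  win_start <- as.POSIXct(paste(days, "01:00:00"), tz = "UTC")
  win_end   <- as.POSIXct(paste(days, "05:59:59"), tz = "UTC")
  win_start <= end & start <= win_end
}

# Only the windows on the start and end dates (as with the two int_overlaps() calls)
start_end_only <- any(overlaps_night(c(as.Date(start), as.Date(end)), start, end))

# Every calendar day from the start date to the end date
every_day <- any(overlaps_night(seq(as.Date(start), as.Date(end), by = "day"), start, end))

start_end_only  # FALSE
every_day       # TRUE
```

For schedules where no shift spans more than two calendar days, the two checks agree.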
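Another way is to use %within% to check whether a date-time object falls within an interval. This variant only tests whether the 01:00:00 and 05:59:59 boundaries on the start and end dates fall inside the shift, so a shift lying entirely within the night window (e.g. 02:00 to 04:00) is not flagged; the int_overlaps() version above does catch that case.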
df %>%
rowwise() %>%
mutate(
night.shift = any(outer(date(c(start, end)), hms(c("01:00:00", "05:59:59")), `+`) %within% (start %--% end))
) %>%
ungroup()
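Output (these results include a fourth test row, 2018-09-22 04:30:00 to 2018-09-23 00:30:00, that is not in the question's df):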
# # A tibble: 4 × 3
# start end night.shift
# <dttm> <dttm> <lgl>
# 1 2018-09-19 23:30:00 2018-09-20 07:05:00 TRUE
# 2 2018-09-19 17:00:00 2018-09-19 21:00:00 FALSE
# 3 2018-09-22 04:30:00 2018-09-22 12:00:00 TRUE
# 4 2018-09-22 04:30:00 2018-09-23 00:30:00 TRUE
Solution 3:[3]
This is super janky and not optimized (it was fun to figure out, though). It compares only times of day and ignores the dates, so it assumes no shift lasts 24 hours or more. You'll want to vectorize it if possible.
library(lubridate)
df <- data.frame(start = ymd_hms(c("2018-09-19 23:30:00", "2018-09-19 17:00:00", "2018-09-22 04:30:00")),
end = ymd_hms(c('2018-09-20 07:05:00', "2018-09-19 21:00:00", "2018-09-22 12:00:00")))
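# Night window as a time-of-day interval: 01:00:00 (3600 s) to 05:59:59 (21599 s)
night <- interval(hms::as_hms(3600), hms::as_hms(21599), tz = "UTC")
df$night.shift <- NA
for (i in seq_len(nrow(df))) {
  s <- df$start[i]
  f <- df$end[i]
  # Reduce each timestamp to seconds since midnight (the date is ignored)
  start_seconds <- hms::as_hms(60*60*hour(s) + 60*minute(s) + second(s))
  end_seconds <- hms::as_hms(60*60*hour(f) + 60*minute(f) + second(f))
  if (end_seconds >= start_seconds) {
    # Same-day shift: a single time-of-day interval
    df$night.shift[i] <- int_overlaps(night, interval(start_seconds, end_seconds, tz = "UTC"))
  } else {
    # Shift crosses midnight: split it into [start, 23:59:59] and [00:00:00, end]
    df$night.shift[i] <- int_overlaps(night, interval(start_seconds, hms::as_hms(86399), tz = "UTC")) ||
      int_overlaps(night, interval(hms::as_hms(0), end_seconds, tz = "UTC"))
  }
}
print(df)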
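One way to vectorize the time-of-day idea is to work in seconds directly. The base-R sketch below assumes, as in the question, that all timestamps are UTC (POSIXct stores seconds since 1970-01-01 00:00:00 UTC, so every UTC midnight is a multiple of 86400); `night_shift()` is a hypothetical helper name. Because the night window repeats daily, only the first window that has not already ended when the shift starts needs checking: the shift overlaps it exactly when the shift ends at or after that window opens.

```r
# Hypothetical helper: TRUE if [start, end] overlaps the night window
# [from, to] (seconds after midnight) on any day. Assumes UTC timestamps.
night_shift <- function(start, end, from = 3600, to = 21599) {
  s <- as.numeric(start)                  # seconds since 1970-01-01 00:00:00 UTC
  e <- as.numeric(end)
  midnight <- floor(s / 86400) * 86400    # UTC midnight of the start day
  sod <- s - midnight                     # start time of day, in seconds
  # First window not yet over at the shift start: today's if the shift
  # starts by `to`, otherwise tomorrow's. The shift never starts after that
  # window ends, so it overlaps exactly when it ends at or after the window opens.
  first_window <- midnight + from + ifelse(sod > to, 86400, 0)
  e >= first_window
}

df <- data.frame(
  start = as.POSIXct(c("2018-09-19 23:30:00", "2018-09-19 17:00:00", "2018-09-22 04:30:00"), tz = "UTC"),
  end   = as.POSIXct(c("2018-09-20 07:05:00", "2018-09-19 21:00:00", "2018-09-22 12:00:00"), tz = "UTC")
)
df$night.shift <- night_shift(df$start, df$end)
df$night.shift
# [1]  TRUE FALSE  TRUE
```

Because the window is anchored to the start date rather than compared by clock time alone, this also handles shifts that cross midnight and shifts of any length. For data in another time zone, the midnight calculation would need the zone's UTC offset.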
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | jay.sf |
| Solution 2 | |
| Solution 3 | Jonathan Graves |
