'R, Python or Excel. How can I count time range occurrences happening during other time ranges?

Say for example, I want to count how many people are in a room while others are in that room for each day. When that person leaves, I want an updated count of how many are in that room. This feels like it should be the most simple question in history, yet I am completely stumped. See table to understand what I am working with.

These times are meant to be checked every 15 minutes and thus are recorded as full duration of their stay in 15 minute intervals.

I have access to R, Python, Excel and Power BI and am fairly familiar with each. So if anyone can help using any of those languages I would be entirely grateful.

Cheers!

ID    |  Date      |  Start Time | End Time |
11111 |  09/01/21  |  0900       | 1700     |
22222 |  09/01/21  |  1000       | 1300     | 
33333 |  09/01/21  |  0900       | 1200     |  
44444 |  09/02/21  |  0900       | 1700     | 
55555 |  09/02/21  |  1200       | 1500     |
66666 |  09/02/21  |  0945       | 1400     | 
77777 |  09/02/21  |  1000       | 1230     |
88888 |  09/02/21  |  0900       | 1445     |
99999 |  09/02/21  |  1300       | 1700     |

Edit: Both Langtang and Maël's solution worked well for me, but I ultimately ended up using something very similar to langtang's solution for my project. Here is the code langtang provided:

============================================================================

library(data.table)

# set to data.table, and create datetime columns sdate and edate
setDT(dt)[
,c("sdate","edate"):=lapply(.SD, \(x) 
lubridate::mdy_hm(paste(Date,x))),
.SDcols = c("Start Time", "End Time")
 ] 

 dt[, in_room:=sapply(edate, \(x) sum(sdate<x & edate>x)), by=Date] 
 [,`:=`(sdate=NULL, edate=NULL)]

============================================================================

Thank you all!



Solution 1:[1]

Here's a way with the ivs and tidyverse libraries in R:

library(ivs)
library(tidyverse)

df <- df %>%
  mutate(Start.Time = mdy_hm(paste(Date, Start.Time)),
         End.Time = mdy_hm(paste(Date, End.Time)),
         ivs = iv(Start.Time, End.Time))

bounds <- range(df$ivs)
lower <- iv_start(bounds[[1]])
upper <- iv_end(bounds[[2]]) - 1L

tibble(minutes = seq(lower, upper, by = 15 * 60),
       count = iv_count_between(minutes, df$ivs)) %>% 
  group_by(gp = data.table::rleid(count)) %>% 
  summarise(StartTime = min(minutes),
            EndTime = max(minutes),
            count = unique(count))

# A tibble: 14 × 4
      gp StartTime           EndTime             count
   <int> <dttm>              <dttm>              <int>
 1     1 2021-09-01 09:00:00 2021-09-01 09:45:00     2
 2     2 2021-09-01 10:00:00 2021-09-01 11:45:00     3
 3     3 2021-09-01 12:00:00 2021-09-01 12:45:00     2
 4     4 2021-09-01 13:00:00 2021-09-01 16:45:00     1
 5     5 2021-09-01 17:00:00 2021-09-02 08:45:00     0
 6     6 2021-09-02 09:00:00 2021-09-02 09:30:00     2
 7     7 2021-09-02 09:45:00 2021-09-02 09:45:00     3
 8     8 2021-09-02 10:00:00 2021-09-02 11:45:00     4
 9     9 2021-09-02 12:00:00 2021-09-02 12:15:00     5
10    10 2021-09-02 12:30:00 2021-09-02 12:45:00     4
11    11 2021-09-02 13:00:00 2021-09-02 13:45:00     5
12    12 2021-09-02 14:00:00 2021-09-02 14:30:00     4
13    13 2021-09-02 14:45:00 2021-09-02 14:45:00     3
14    14 2021-09-02 15:00:00 2021-09-02 16:45:00     2

data

df <- tibble::tribble(
     ~ID,      ~Date, ~Start.Time, ~End.Time,
  11111L, "09/01/21",      "0900",     1700,
  22222L, "09/01/21",      "1000",     1300L,
  33333L, "09/01/21",      "0900",     1200L,
  44444L, "09/02/21",      "0900",     1700L,
  55555L, "09/02/21",      "1200",     1500L,
  66666L, "09/02/21",      "0945",     1400L,
  77777L, "09/02/21",      "1000",     1230L,
  88888L, "09/02/21",      "0900",     1445L,
  99999L, "09/02/21",      "1300",     1700L
  )

Solution 2:[2]

You can use the following approach, that leverages data.table.

library(data.table)

# set to data.table, and create datetime columns sdate and edate
setDT(dt)[
  ,c("sdate","edate"):=lapply(.SD, \(x) lubridate::mdy_hm(paste(Date,x))),
  .SDcols = c("Start Time", "End Time")
] 

dt[, in_room:=sapply(edate, \(x) sum(sdate<x & edate>x)), by=Date][,`:=`(sdate=NULL, edate=NULL)]

Output:

      ID     Date Start Time End Time in_room
   <int>   <char>     <char>   <char>   <int>
1: 11111 09/01/21      09:00    17:00       0
2: 22222 09/01/21      10:00    13:00       1
3: 33333 09/01/21      09:00    12:00       2
4: 44444 09/02/21      09:00    17:00       0
5: 55555 09/02/21      12:00    15:00       2
6: 66666 09/02/21      09:45    14:00       4
7: 77777 09/02/21      10:00    12:30       4
8: 88888 09/02/21      09:00    14:45       3
9: 99999 09/02/21      13:00    17:00       0

Update

The OP has clarified that they are interested in, for each person/Date, the total number of people that were in the room with that person across the span of that person's time in the room. This is also quite simple:

ct_p <- function(s,e) dt[(sdate>e | edate<s)==F, .N]
dt[, overlapped:=ct_p(sdate,edate), by=1:nrow(dt)][,`:=`(sdate=NULL, edate=NULL)]

Output:

      ID     Date Start Time End Time overlapped
   <int>   <char>     <char>   <char>      <int>
1: 11111 09/01/21      09:00    17:00          3
2: 22222 09/01/21      10:00    13:00          3
3: 33333 09/01/21      09:00    12:00          3
4: 44444 09/02/21      09:00    17:00          6
5: 55555 09/02/21      12:00    15:00          6
6: 66666 09/02/21      09:45    14:00          6
7: 77777 09/02/21      10:00    12:30          5
8: 88888 09/02/21      09:00    14:45          6
9: 99999 09/02/21      13:00    17:00          5

Input:

structure(list(ID = c(11111L, 22222L, 33333L, 44444L, 55555L, 
66666L, 77777L, 88888L, 99999L), Date = c("09/01/21", "09/01/21", 
"09/01/21", "09/02/21", "09/02/21", "09/02/21", "09/02/21", "09/02/21", 
"09/02/21"), `Start Time` = c("09:00", "10:00", "09:00", "09:00", 
"12:00", "09:45", "10:00", "09:00", "13:00"), `End Time` = c("17:00", 
"13:00", "12:00", "17:00", "15:00", "14:00", "12:30", "14:45", 
"17:00")), row.names = c(NA, -9L), class = "data.frame")

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2