'R - Excluding repeating, chronological events in row counts

I would like to count distinct events that have a time element to them. When a certain event repeats chronologically, I only want to give credit to this event one time. In this data, the events are mobility events, so passive mobility events are assumed (right or wrong, not important) to be a continuous action and not distinct. This would only matter within a specific event_date For example:

Suppose I have this df

df <- data.frame(id = c("1", "1", "1", "1", "2", "2", "2", "2"),
                 event = c("sitting", "sitting", "sitting", "sitting", "sitting", "walking", "sitting", "sitting"),
                 timestamp = c('2021-12-19 11:30:00', '2021-12-19 12:30:00', '2021-12-19 13:30:00', '2021-12-20 14:30:00', '2021-12-20 12:00:00', '2021-12-20 12:30:00', '2021-12-20 13:30:00', '2021-12-20 14:30:00'))

df$timestamp <- as.POSIXct(df$timestamp)
df$event_date <- as.Date(df$timestamp, format = "%Y-%m-%d")

> df
  id   event           timestamp event_date
1  1 sitting 2021-12-19 11:30:00 2021-12-19
2  1 sitting 2021-12-19 12:30:00 2021-12-19
3  1 sitting 2021-12-19 13:30:00 2021-12-19
4  1 sitting 2021-12-19 14:30:00 2021-12-19
5  2 sitting 2021-12-20 12:00:00 2021-12-20
6  2 walking 2021-12-20 12:30:00 2021-12-20
7  2 sitting 2021-12-20 13:30:00 2021-12-20
8  2 sitting 2021-12-20 14:30:00 2021-12-20

I would like to achieve an output that looks like this:

> df
  id event_date mobility_events
1  1 2021-12-19               1
2  1 2021-12-20               1
3  2 2021-12-20               3

id 1 only gets credit for 1 mobility event the first event_date because it is sitting for 3 consecutive timestamps of the same activity on that event_date. id 2 gets credit for 3 events because the sitting event is interrupted by a different event; however, does not get credit for the last sitting event because it is chronological and uninterrupted by a different event.

I hope this is clear.



Solution 1:[1]

You can use dplyr to simply group_by your id and date and then count the number of times event != lag(event).

library(dplyr)

df %>%
  group_by(id, event_date) %>%
  summarize(mobility_events = 1 + sum(event != lag(event), na.rm = T),
            .groups = "drop")
#> # A tibble: 3 × 3
#>   id    event_date mobility_events
#>   <chr> <date>               <dbl>
#> 1 1     2021-12-19               1
#> 2 1     2021-12-20               1
#> 3 2     2021-12-20               3

Solution 2:[2]

data

df <- data.frame(id = c("1", "1", "1", "1", "2", "2", "2", "2"),
                 event = c("sitting", "sitting", "sitting", "sitting", "sitting", "walking", "sitting", "sitting"),
                 timestamp = c('2021-12-19 11:30:00', '2021-12-19 12:30:00', '2021-12-19 13:30:00', '2021-12-20 14:30:00', '2021-12-20 12:00:00', '2021-12-20 12:30:00', '2021-12-20 13:30:00', '2021-12-20 14:30:00'))

df$timestamp <- as.POSIXct(df$timestamp)
df$event_date <- as.Date(df$timestamp, format = "%Y-%m-%d")

solution using data.table

library(data.table)

setDT(df)

df[, .(mobility_events = max(rleid(event))), by = .(id, event_date)]

output

   id event_date mobility_events
1:  1 2021-12-19               1
2:  1 2021-12-20               1
3:  2 2021-12-20               3

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 caldwellst
Solution 2 Merijn van Tilborg