'R - Excluding repeating, chronological events in row counts
I would like to count distinct events that have a time element to them. When a certain event repeats chronologically, I only want to give credit to this event one time. In this data, the events are mobility events, so passive mobility events are assumed (right or wrong, not important) to be a continuous action and not distinct. This would only matter within a specific event_date For example:
Suppose I have this df
df <- data.frame(id = c("1", "1", "1", "1", "2", "2", "2", "2"),
event = c("sitting", "sitting", "sitting", "sitting", "sitting", "walking", "sitting", "sitting"),
timestamp = c('2021-12-19 11:30:00', '2021-12-19 12:30:00', '2021-12-19 13:30:00', '2021-12-20 14:30:00', '2021-12-20 12:00:00', '2021-12-20 12:30:00', '2021-12-20 13:30:00', '2021-12-20 14:30:00'))
df$timestamp <- as.POSIXct(df$timestamp)
df$event_date <- as.Date(df$timestamp, format = "%Y-%m-%d")
> df
id event timestamp event_date
1 1 sitting 2021-12-19 11:30:00 2021-12-19
2 1 sitting 2021-12-19 12:30:00 2021-12-19
3 1 sitting 2021-12-19 13:30:00 2021-12-19
4 1 sitting 2021-12-19 14:30:00 2021-12-19
5 2 sitting 2021-12-20 12:00:00 2021-12-20
6 2 walking 2021-12-20 12:30:00 2021-12-20
7 2 sitting 2021-12-20 13:30:00 2021-12-20
8 2 sitting 2021-12-20 14:30:00 2021-12-20
I would like to achieve an output that looks like this:
> df
id event_date mobility_events
1 1 2021-12-19 1
2 1 2021-12-20 1
3 2 2021-12-20 3
id 1 only gets credit for 1 mobility event the first event_date because it is sitting for 3 consecutive timestamps of the same activity on that event_date. id 2 gets credit for 3 events because the sitting event is interrupted by a different event; however, does not get credit for the last sitting event because it is chronological and uninterrupted by a different event.
I hope this is clear.
Solution 1:[1]
You can use dplyr to simply group_by your id and date and then count the number of times event != lag(event).
library(dplyr)
df %>%
group_by(id, event_date) %>%
summarize(mobility_events = 1 + sum(event != lag(event), na.rm = T),
.groups = "drop")
#> # A tibble: 3 × 3
#> id event_date mobility_events
#> <chr> <date> <dbl>
#> 1 1 2021-12-19 1
#> 2 1 2021-12-20 1
#> 3 2 2021-12-20 3
Solution 2:[2]
data
df <- data.frame(id = c("1", "1", "1", "1", "2", "2", "2", "2"),
event = c("sitting", "sitting", "sitting", "sitting", "sitting", "walking", "sitting", "sitting"),
timestamp = c('2021-12-19 11:30:00', '2021-12-19 12:30:00', '2021-12-19 13:30:00', '2021-12-20 14:30:00', '2021-12-20 12:00:00', '2021-12-20 12:30:00', '2021-12-20 13:30:00', '2021-12-20 14:30:00'))
df$timestamp <- as.POSIXct(df$timestamp)
df$event_date <- as.Date(df$timestamp, format = "%Y-%m-%d")
solution using data.table
library(data.table)
setDT(df)
df[, .(mobility_events = max(rleid(event))), by = .(id, event_date)]
output
id event_date mobility_events
1: 1 2021-12-19 1
2: 1 2021-12-20 1
3: 2 2021-12-20 3
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | caldwellst |
| Solution 2 | Merijn van Tilborg |
