'How to group by a time window in R?

I want to find the highest average of departure delay in time windows of length 1 week in flights dataset of nycflights13 package. I've used

seq(min(flights:time_hour), max(flights:time_hour), by = "week")

to find the dates with the difference of one week. But I don't know how to group by these dates to find the average departure delay of each period. How can I do this using tidyverse package?

Thank you for your help in advance.



Solution 1:[1]

We can use {lubridate} to round each date to the nearest week. Two wrinkles to think about:

  1. To count weeks beginning with Jan 1, you'll need to specify the week_start arg. Otherwise lubridate will count from the previous Sunday, which in this case is 12/30/2012.
  2. You also need to deal with incomplete weeks. In this case, the last week of the year only contains one day. I chose to drop weeks with < 7 days for this demo.
library(tidyverse)
library(lubridate)
library(nycflights13)
data(flights)

# what weekday was the first of the year? 
weekdays(min(flights$time_hour))
#> [1] "Tuesday"
# Tuesday = day #2 so we'll pass `2` to `week_start`

flights %>% 
  group_by(week = floor_date(time_hour, unit = "week", week_start = 2)) %>% 
  filter(n_distinct(day) == 7) %>%        # drop incomplete weeks
  summarize(dep_delay_avg = mean(dep_delay, na.rm = TRUE)) %>% 
  arrange(desc(dep_delay_avg))

#> # A tibble: 52 x 2
#>    week                dep_delay_avg
#>    <dttm>                      <dbl>
#>  1 2013-06-25 00:00:00          40.6   # week of June 25 had longest delays
#>  2 2013-07-09 00:00:00          24.4
#>  3 2013-12-17 00:00:00          24.0
#>  4 2013-07-23 00:00:00          21.8
#>  5 2013-03-05 00:00:00          21.7
#>  6 2013-04-16 00:00:00          21.6
#>  7 2013-07-16 00:00:00          20.4
#>  8 2013-07-02 00:00:00          20.1
#>  9 2013-12-03 00:00:00          19.9
#> 10 2013-05-21 00:00:00          19.2
#> # ... with 42 more rows

Created on 2022-03-06 by the reprex package (v2.0.1)

Edit: as requested by OP, here is a solution using only core {tidyverse} packages, without {lubridate}:

library(tidyverse)
library(nycflights13)
data(flights)

flights %>% 
  group_by(week = (as.POSIXlt(time_hour)$yday) %/% 7) %>% 
  filter(n_distinct(day) == 7) %>%
  summarize(
    week = as.Date(min(time_hour)),
    dep_delay_avg = mean(dep_delay, na.rm = TRUE)
  ) %>% 
  arrange(desc(dep_delay_avg))

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1