'How to group by a time window in R?
I want to find the highest average of departure delay in time windows of length 1 week in flights dataset of nycflights13 package. I've used
seq(min(flights:time_hour), max(flights:time_hour), by = "week")
to find the dates with the difference of one week. But I don't know how to group by these dates to find the average departure delay of each period. How can I do this using tidyverse package?
Thank you for your help in advance.
Solution 1:[1]
We can use {lubridate} to round each date to the nearest week. Two wrinkles to think about:
- To count weeks beginning with Jan 1, you'll need to specify the
week_startarg. Otherwiselubridatewill count from the previous Sunday, which in this case is 12/30/2012. - You also need to deal with incomplete weeks. In this case, the last week of the year only contains one day. I chose to drop weeks with < 7 days for this demo.
library(tidyverse)
library(lubridate)
library(nycflights13)
data(flights)
# what weekday was the first of the year?
weekdays(min(flights$time_hour))
#> [1] "Tuesday"
# Tuesday = day #2 so we'll pass `2` to `week_start`
flights %>%
group_by(week = floor_date(time_hour, unit = "week", week_start = 2)) %>%
filter(n_distinct(day) == 7) %>% # drop incomplete weeks
summarize(dep_delay_avg = mean(dep_delay, na.rm = TRUE)) %>%
arrange(desc(dep_delay_avg))
#> # A tibble: 52 x 2
#> week dep_delay_avg
#> <dttm> <dbl>
#> 1 2013-06-25 00:00:00 40.6 # week of June 25 had longest delays
#> 2 2013-07-09 00:00:00 24.4
#> 3 2013-12-17 00:00:00 24.0
#> 4 2013-07-23 00:00:00 21.8
#> 5 2013-03-05 00:00:00 21.7
#> 6 2013-04-16 00:00:00 21.6
#> 7 2013-07-16 00:00:00 20.4
#> 8 2013-07-02 00:00:00 20.1
#> 9 2013-12-03 00:00:00 19.9
#> 10 2013-05-21 00:00:00 19.2
#> # ... with 42 more rows
Created on 2022-03-06 by the reprex package (v2.0.1)
Edit: as requested by OP, here is a solution using only core {tidyverse} packages, without {lubridate}:
library(tidyverse)
library(nycflights13)
data(flights)
flights %>%
group_by(week = (as.POSIXlt(time_hour)$yday) %/% 7) %>%
filter(n_distinct(day) == 7) %>%
summarize(
week = as.Date(min(time_hour)),
dep_delay_avg = mean(dep_delay, na.rm = TRUE)
) %>%
arrange(desc(dep_delay_avg))
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
