'How to separate daily data into weekly or monthly data in R
I have daily discharge data from a local stream near me. I am trying to sum and take the average of the daily data into weekly or monthly chunks so I can plot discharge_m3d(discharge) and Qs_sum(depletion) by weekly and monthly timeframes. Does anyone know how I can do this? I attached a figure of how my data frame looks.
Solution 1:[1]
People often use floor_date() from lubridate for these purposes. You can floor to a unit of month or week and then group by the resulting date column. Then you can use summarize() to compute the monthly or weekly sums/averages. From there you can use your plotting library of choice to visualize the result (like ggplot2, not shown).
This works even if you have more than one year of data (i.e. where the month or week number might repeat).
library(dplyr)
library(lubridate)
set.seed(123)
df <- tibble(
date = seq(
from = as.Date("2014-03-01"),
to = as.Date("2016-12-31"),
by = 1
),
Qs_sum = runif(length(date)),
discharge_m3d = runif(length(date))
)
df
#> # A tibble: 1,037 × 3
#> date Qs_sum discharge_m3d
#> <date> <dbl> <dbl>
#> 1 2014-03-01 0.288 0.560
#> 2 2014-03-02 0.788 0.427
#> 3 2014-03-03 0.409 0.448
#> 4 2014-03-04 0.883 0.833
#> 5 2014-03-05 0.940 0.720
#> 6 2014-03-06 0.0456 0.457
#> 7 2014-03-07 0.528 0.521
#> 8 2014-03-08 0.892 0.242
#> 9 2014-03-09 0.551 0.0759
#> 10 2014-03-10 0.457 0.391
#> # … with 1,027 more rows
df %>%
mutate(date = floor_date(date, unit = "month")) %>%
group_by(date) %>%
summarise(
n = n(),
qs_total = sum(Qs_sum),
qs_average = mean(Qs_sum),
discharge_total = sum(discharge_m3d),
discharge_average = mean(discharge_m3d),
.groups = "drop"
)
#> # A tibble: 34 × 6
#> date n qs_total qs_average discharge_total discharge_average
#> <date> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 2014-03-01 31 18.1 0.585 15.3 0.494
#> 2 2014-04-01 30 12.9 0.429 15.2 0.507
#> 3 2014-05-01 31 15.5 0.500 15.3 0.493
#> 4 2014-06-01 30 15.8 0.525 16.3 0.542
#> 5 2014-07-01 31 15.1 0.487 13.9 0.449
#> 6 2014-08-01 31 14.8 0.478 16.2 0.522
#> 7 2014-09-01 30 15.3 0.511 13.1 0.436
#> 8 2014-10-01 31 15.6 0.504 14.7 0.475
#> 9 2014-11-01 30 16.0 0.532 15.1 0.502
#> 10 2014-12-01 31 14.2 0.458 15.5 0.502
#> # … with 24 more rows
# Assert that the "start of the week" is Sunday.
# So groups are made of data from [Sunday -> Monday]
sunday <- 7L
df %>%
mutate(date = floor_date(date, unit = "week", week_start = sunday)) %>%
group_by(date) %>%
summarise(
n = n(),
qs_total = sum(Qs_sum),
qs_average = mean(Qs_sum),
discharge_total = sum(discharge_m3d),
discharge_average = mean(discharge_m3d),
.groups = "drop"
)
#> # A tibble: 149 × 6
#> date n qs_total qs_average discharge_total discharge_average
#> <date> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 2014-02-23 1 0.288 0.288 0.560 0.560
#> 2 2014-03-02 7 4.49 0.641 3.65 0.521
#> 3 2014-03-09 7 3.77 0.539 3.88 0.554
#> 4 2014-03-16 7 4.05 0.579 3.45 0.493
#> 5 2014-03-23 7 4.43 0.632 3.08 0.440
#> 6 2014-03-30 7 4.00 0.572 4.74 0.677
#> 7 2014-04-06 7 2.50 0.357 3.15 0.449
#> 8 2014-04-13 7 2.48 0.355 2.44 0.349
#> 9 2014-04-20 7 2.30 0.329 2.45 0.349
#> 10 2014-04-27 7 3.44 0.492 4.40 0.629
#> # … with 139 more rows
Created on 2022-04-13 by the reprex package (v2.0.1)
Solution 2:[2]
One way to approach this is using the lubridate and dplyr packages in the tidyverse. I assume here that your dates are year-month-day which they appear to be and that you only have one calendar year or at least no repeated months/weeks across two years.
monthly_discharge <- discharge %>%
filter(variable == "discharge") # First select just the rows that represent discharge (not clear if that's necessary here)
mutate(date = ymd(date), # convert date to a lubridate date object
month = month(date), # extract the numbered month from the date
week = week(date)) %>% # extract the numbered week in a year from the date
group_by(month, stream) %>% # group your data by month and stream
summarize(discharge_summary = mean(discharge_m3d)) # summarize your data so that each month has a single row with a single (mean) discharge value
# you can include multiple summary variables within the summarize function
This should produce a data frame with one row per month for each stream and a summary value for discharge. You could summarize by week by changing the month label in group_by to week.
Solution 3:[3]
Make use of the functions week(), month() and year() from the package lubridate to get the corresponding values for your date column. Afterwards we can find the means per week, month or year. For illustration, I added a row with year 2015, since there was only year 2014 in your sample data. Furthermore, for plotting reasons, I added a column "Year_Month" that shows the abbreviated month followed by year (x axis of the plot).
library(dplyr)
library(lubridate)
data <- data %>% mutate(Week = week(date), Month = month(date), Year = year(date)) %>%
group_by(Year, Week) %>%
mutate(mean_Week_Qs = mean(Qs_sum)) %>%
ungroup() %>%
group_by(Year, Month) %>%
mutate(mean_Month_Qs = mean(Qs_sum)) %>%
ungroup() %>%
group_by(Year) %>%
mutate(mean_Year_Qs = mean(Qs_sum)) %>%
ungroup() %>%
mutate(Year_Month = paste0(lubridate::month(date, label = TRUE), " ", Year)) %>%
ungroup()
> data
# A tibble: 12 x 10
date discharge_m3d Qs_sum Week Month Year mean_Week_Qs mean_Month_Qs mean_Year_Qs Year_Month
<date> <dbl> <dbl> <int> <int> <int> <dbl> <dbl> <dbl> <chr>
1 2014-03-01 797 0 9 3 2014 0.0409 0.629 0.629 Mar 2014
2 2014-03-02 826 0.00833 9 3 2014 0.0409 0.629 0.629 Mar 2014
3 2014-03-03 3760 0.114 9 3 2014 0.0409 0.629 0.629 Mar 2014
4 2014-03-04 4330 0.292 10 3 2014 0.785 0.629 0.629 Mar 2014
5 2014-03-05 2600 0.480 10 3 2014 0.785 0.629 0.629 Mar 2014
6 2014-03-06 4620 0.656 10 3 2014 0.785 0.629 0.629 Mar 2014
7 2014-03-07 2510 0.816 10 3 2014 0.785 0.629 0.629 Mar 2014
8 2014-03-08 1620 0.959 10 3 2014 0.785 0.629 0.629 Mar 2014
9 2014-03-09 2270 1.09 10 3 2014 0.785 0.629 0.629 Mar 2014
10 2014-03-10 5650 1.20 10 3 2014 0.785 0.629 0.629 Mar 2014
11 2014-03-11 2530 1.31 11 3 2014 1.31 0.629 0.629 Mar 2014
12 2015-03-06 1470 1.52 10 3 2015 1.52 1.52 1.52 Mar 2015
Now we can plot, for example Qs_sum per year and month, and add the mean as a red dot:
ggplot(data, aes(Year_Month, Qs_sum)) +
theme_classic() +
geom_point(size = 2) +
geom_point(aes(Year_Month, mean_Month_Qs), color = "red", size = 5, alpha = 0.6)
To summarize the results by weekly or monthly averages, you can do as follows, using distinct():
data %>% distinct(Year, Week, mean_Week_Qs)
# A tibble: 4 x 3
Week Year mean_Week_Qs
<int> <int> <dbl>
1 9 2014 0.0409
2 10 2014 0.785
3 11 2014 1.31
4 10 2015 1.52
data %>% distinct(Year, Month, mean_Month_Qs)
# A tibble: 2 x 3
Month Year mean_Month_Qs
<int> <int> <dbl>
1 3 2014 0.629
2 3 2015 1.52
This can only be done after the mutate() and mean() commands above. If you want to get directly to summarized results, you can use summarize() directly on the initial dataframe:
data %>% group_by(Year, Week) %>% summarise(Week_Avg = mean(Qs_sum))
# A tibble: 4 x 3
# Groups: Year [2]
Year Week Week_Avg
<int> <int> <dbl>
1 2014 9 0.0409
2 2014 10 0.785
3 2014 11 1.31
4 2015 10 1.52
data %>% group_by(Year, Month) %>% summarise(Month_Avg = mean(Qs_sum))
# A tibble: 2 x 3
# Groups: Year [2]
Year Month Month_Avg
<int> <int> <dbl>
1 2014 3 0.629
2 2015 3 1.52
Note that for plotting, mutate() is preferred, since it preserves the single weekly points (black in the plot above), if we used summarise() instead, we would be left with only the red points.
Data
data <- structure(list(date = structure(16130:16140, class = "Date"),
discharge_m3d = c(797, 826, 3760, 4330, 2600, 4620, 2510,
1620, 2270, 5650, 2530), Qs_sum = c(0, 0.00833424, 0.114224781,
0.291812109, 0.479780482, 0.656321971, 0.816140731, 0.959334606,
1.087579095, 1.20284046, 1.30695595), Week = c(9L, 9L, 9L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 11L), Month = c(3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L)), row.names = c(NA, -11L
), class = c("tbl_df", "tbl", "data.frame"))
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Davis Vaughan |
| Solution 2 | Robbie Richards |
| Solution 3 |


