'How to separate daily data into weekly or monthly data in R

I have daily discharge data from a local stream near me. I am trying to sum and take the average of the daily data into weekly or monthly chunks so I can plot discharge_m3d(discharge) and Qs_sum(depletion) by weekly and monthly timeframes. Does anyone know how I can do this? I attached a figure of how my data frame looks.

enter image description here

r


Solution 1:[1]

People often use floor_date() from lubridate for these purposes. You can floor to a unit of month or week and then group by the resulting date column. Then you can use summarize() to compute the monthly or weekly sums/averages. From there you can use your plotting library of choice to visualize the result (like ggplot2, not shown).

This works even if you have more than one year of data (i.e. where the month or week number might repeat).

library(dplyr)
library(lubridate)

set.seed(123)

df <- tibble(
  date = seq(
    from = as.Date("2014-03-01"), 
    to = as.Date("2016-12-31"),
    by = 1
  ),
  Qs_sum = runif(length(date)),
  discharge_m3d = runif(length(date))
)

df
#> # A tibble: 1,037 × 3
#>    date       Qs_sum discharge_m3d
#>    <date>      <dbl>         <dbl>
#>  1 2014-03-01 0.288         0.560 
#>  2 2014-03-02 0.788         0.427 
#>  3 2014-03-03 0.409         0.448 
#>  4 2014-03-04 0.883         0.833 
#>  5 2014-03-05 0.940         0.720 
#>  6 2014-03-06 0.0456        0.457 
#>  7 2014-03-07 0.528         0.521 
#>  8 2014-03-08 0.892         0.242 
#>  9 2014-03-09 0.551         0.0759
#> 10 2014-03-10 0.457         0.391 
#> # … with 1,027 more rows

df %>%
  mutate(date = floor_date(date, unit = "month")) %>%
  group_by(date) %>%
  summarise(
    n = n(),
    qs_total = sum(Qs_sum),
    qs_average = mean(Qs_sum),
    discharge_total = sum(discharge_m3d),
    discharge_average = mean(discharge_m3d),
    .groups = "drop"
  )
#> # A tibble: 34 × 6
#>    date           n qs_total qs_average discharge_total discharge_average
#>    <date>     <int>    <dbl>      <dbl>           <dbl>             <dbl>
#>  1 2014-03-01    31     18.1      0.585            15.3             0.494
#>  2 2014-04-01    30     12.9      0.429            15.2             0.507
#>  3 2014-05-01    31     15.5      0.500            15.3             0.493
#>  4 2014-06-01    30     15.8      0.525            16.3             0.542
#>  5 2014-07-01    31     15.1      0.487            13.9             0.449
#>  6 2014-08-01    31     14.8      0.478            16.2             0.522
#>  7 2014-09-01    30     15.3      0.511            13.1             0.436
#>  8 2014-10-01    31     15.6      0.504            14.7             0.475
#>  9 2014-11-01    30     16.0      0.532            15.1             0.502
#> 10 2014-12-01    31     14.2      0.458            15.5             0.502
#> # … with 24 more rows

# Assert that the "start of the week" is Sunday.
# So groups are made of data from [Sunday -> Monday]
sunday <- 7L

df %>%
  mutate(date = floor_date(date, unit = "week", week_start = sunday)) %>%
  group_by(date) %>%
  summarise(
    n = n(),
    qs_total = sum(Qs_sum),
    qs_average = mean(Qs_sum),
    discharge_total = sum(discharge_m3d),
    discharge_average = mean(discharge_m3d),
    .groups = "drop"
  )
#> # A tibble: 149 × 6
#>    date           n qs_total qs_average discharge_total discharge_average
#>    <date>     <int>    <dbl>      <dbl>           <dbl>             <dbl>
#>  1 2014-02-23     1    0.288      0.288           0.560             0.560
#>  2 2014-03-02     7    4.49       0.641           3.65              0.521
#>  3 2014-03-09     7    3.77       0.539           3.88              0.554
#>  4 2014-03-16     7    4.05       0.579           3.45              0.493
#>  5 2014-03-23     7    4.43       0.632           3.08              0.440
#>  6 2014-03-30     7    4.00       0.572           4.74              0.677
#>  7 2014-04-06     7    2.50       0.357           3.15              0.449
#>  8 2014-04-13     7    2.48       0.355           2.44              0.349
#>  9 2014-04-20     7    2.30       0.329           2.45              0.349
#> 10 2014-04-27     7    3.44       0.492           4.40              0.629
#> # … with 139 more rows

Created on 2022-04-13 by the reprex package (v2.0.1)

Solution 2:[2]

One way to approach this is using the lubridate and dplyr packages in the tidyverse. I assume here that your dates are year-month-day which they appear to be and that you only have one calendar year or at least no repeated months/weeks across two years.


monthly_discharge <- discharge %>%
filter(variable == "discharge") # First select just the rows that represent discharge (not clear if that's necessary here)
mutate(date = ymd(date), # convert date to a lubridate date object
       month = month(date), # extract the numbered month from the date
       week = week(date)) %>% # extract the numbered week in a year from the date
group_by(month, stream) %>% # group your data by month and stream
summarize(discharge_summary = mean(discharge_m3d)) # summarize your data so that each month has a single row with a single (mean) discharge value
# you can include multiple summary variables within the summarize function

This should produce a data frame with one row per month for each stream and a summary value for discharge. You could summarize by week by changing the month label in group_by to week.

Solution 3:[3]

Make use of the functions week(), month() and year() from the package lubridate to get the corresponding values for your date column. Afterwards we can find the means per week, month or year. For illustration, I added a row with year 2015, since there was only year 2014 in your sample data. Furthermore, for plotting reasons, I added a column "Year_Month" that shows the abbreviated month followed by year (x axis of the plot).

library(dplyr)
library(lubridate)

data <- data %>% mutate(Week = week(date), Month = month(date), Year = year(date)) %>% 
  group_by(Year, Week) %>% 
  mutate(mean_Week_Qs = mean(Qs_sum)) %>% 
  ungroup() %>%
  group_by(Year, Month) %>%
  mutate(mean_Month_Qs = mean(Qs_sum)) %>% 
  ungroup() %>% 
  group_by(Year) %>% 
  mutate(mean_Year_Qs = mean(Qs_sum)) %>% 
  ungroup() %>% 
  mutate(Year_Month = paste0(lubridate::month(date, label = TRUE), " ", Year)) %>% 
  ungroup()

> data
# A tibble: 12 x 10
   date       discharge_m3d  Qs_sum  Week Month  Year mean_Week_Qs mean_Month_Qs mean_Year_Qs Year_Month
   <date>             <dbl>   <dbl> <int> <int> <int>        <dbl>         <dbl>        <dbl> <chr>     
 1 2014-03-01           797 0           9     3  2014       0.0409         0.629        0.629 Mar 2014  
 2 2014-03-02           826 0.00833     9     3  2014       0.0409         0.629        0.629 Mar 2014  
 3 2014-03-03          3760 0.114       9     3  2014       0.0409         0.629        0.629 Mar 2014  
 4 2014-03-04          4330 0.292      10     3  2014       0.785          0.629        0.629 Mar 2014  
 5 2014-03-05          2600 0.480      10     3  2014       0.785          0.629        0.629 Mar 2014  
 6 2014-03-06          4620 0.656      10     3  2014       0.785          0.629        0.629 Mar 2014  
 7 2014-03-07          2510 0.816      10     3  2014       0.785          0.629        0.629 Mar 2014  
 8 2014-03-08          1620 0.959      10     3  2014       0.785          0.629        0.629 Mar 2014  
 9 2014-03-09          2270 1.09       10     3  2014       0.785          0.629        0.629 Mar 2014  
10 2014-03-10          5650 1.20       10     3  2014       0.785          0.629        0.629 Mar 2014  
11 2014-03-11          2530 1.31       11     3  2014       1.31           0.629        0.629 Mar 2014  
12 2015-03-06          1470 1.52       10     3  2015       1.52           1.52         1.52  Mar 2015 

Now we can plot, for example Qs_sum per year and month, and add the mean as a red dot:

ggplot(data, aes(Year_Month, Qs_sum)) +
  theme_classic() +
  geom_point(size = 2) +
  geom_point(aes(Year_Month, mean_Month_Qs), color = "red", size = 5, alpha = 0.6)

enter image description here

To summarize the results by weekly or monthly averages, you can do as follows, using distinct():

data %>% distinct(Year, Week, mean_Week_Qs)
    
# A tibble: 4 x 3
   Week  Year mean_Week_Qs
  <int> <int>        <dbl>
1     9  2014       0.0409
2    10  2014       0.785 
3    11  2014       1.31  
4    10  2015       1.52  
    
data %>% distinct(Year, Month, mean_Month_Qs)
    
# A tibble: 2 x 3
  Month  Year mean_Month_Qs
  <int> <int>         <dbl>
1     3  2014         0.629
2     3  2015         1.52 

This can only be done after the mutate() and mean() commands above. If you want to get directly to summarized results, you can use summarize() directly on the initial dataframe:

data %>% group_by(Year, Week) %>% summarise(Week_Avg = mean(Qs_sum))

# A tibble: 4 x 3
# Groups:   Year [2]
   Year  Week Week_Avg
  <int> <int>    <dbl>
1  2014     9   0.0409
2  2014    10   0.785 
3  2014    11   1.31  
4  2015    10   1.52  

data %>% group_by(Year, Month) %>% summarise(Month_Avg = mean(Qs_sum))

# A tibble: 2 x 3
# Groups:   Year [2]
   Year Month Month_Avg
  <int> <int>     <dbl>
1  2014     3     0.629
2  2015     3     1.52 

Note that for plotting, mutate() is preferred, since it preserves the single weekly points (black in the plot above), if we used summarise() instead, we would be left with only the red points.

Data

data <- structure(list(date = structure(16130:16140, class = "Date"), 
discharge_m3d = c(797, 826, 3760, 4330, 2600, 4620, 2510, 
1620, 2270, 5650, 2530), Qs_sum = c(0, 0.00833424, 0.114224781, 
0.291812109, 0.479780482, 0.656321971, 0.816140731, 0.959334606, 
1.087579095, 1.20284046, 1.30695595), Week = c(9L, 9L, 9L, 
10L, 10L, 10L, 10L, 10L, 10L, 10L, 11L), Month = c(3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L)), row.names = c(NA, -11L
 ), class = c("tbl_df", "tbl", "data.frame"))

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Davis Vaughan
Solution 2 Robbie Richards
Solution 3