'How do you take average values by date/state in a dataframe?

I have a data frame like this (header):

   state start_date   end_date    created_at cycle party answer candidate_name  pct survey_length
1      Florida 2020-11-02 2020-11-02 6/14/21 15:36  2020   REP  Trump   Donald Trump 48.0        0 days
2         Iowa 2020-11-01 2020-11-02 11/2/20 09:02  2020   REP  Trump   Donald Trump 48.0        1 days
3 Pennsylvania 2020-11-01 2020-11-02 11/2/20 12:49  2020   REP  Trump   Donald Trump 49.2        1 days
4      Florida 2020-11-01 2020-11-02 11/2/20 19:02  2020   REP  Trump   Donald Trump 48.2        1 days
5      Florida 2020-10-31 2020-11-02 11/4/20 09:17  2020   REP  Trump   Donald Trump 49.4        2 days
6       Nevada 2020-10-31 2020-11-02 11/4/20 10:38  2020   REP  Trump   Donald Trump 49.1        2 days

I want to take the average value of the 'pct' column for each month, by state. How would you do this? Would you use a for loop?



Solution 1:[1]

Here's a solution with group_by and summarize.

library(tidyverse)
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union

# simulated data
df <- expand_grid(
  state = c("fl", "io", "pa", "nv"),
  start_date = seq(mdy("1/1/2022"), by = "day", length.out = 300),
) %>% 
  mutate(pct = runif(nrow(.)))

# mean pct by month
df %>%
  mutate(mnth = floor_date(start_date, unit = "month")) %>%
  group_by(state, mnth) %>%
  summarize(pct = mean(pct), .groups = "drop")
#> # A tibble: 40 x 3
#>    state mnth         pct
#>    <chr> <date>     <dbl>
#>  1 fl    2022-01-01 0.443
#>  2 fl    2022-02-01 0.529
#>  3 fl    2022-03-01 0.570
#>  4 fl    2022-04-01 0.583
#>  5 fl    2022-05-01 0.477
#>  6 fl    2022-06-01 0.499
#>  7 fl    2022-07-01 0.497
#>  8 fl    2022-08-01 0.561
#>  9 fl    2022-09-01 0.467
#> 10 fl    2022-10-01 0.437
#> # ... with 30 more rows

Created on 2022-03-14 by the reprex package (v2.0.1)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Arthur