'How do you take average values by date/state in a dataframe?
I have a data frame like this (header):
state start_date end_date created_at cycle party answer candidate_name pct survey_length
1 Florida 2020-11-02 2020-11-02 6/14/21 15:36 2020 REP Trump Donald Trump 48.0 0 days
2 Iowa 2020-11-01 2020-11-02 11/2/20 09:02 2020 REP Trump Donald Trump 48.0 1 days
3 Pennsylvania 2020-11-01 2020-11-02 11/2/20 12:49 2020 REP Trump Donald Trump 49.2 1 days
4 Florida 2020-11-01 2020-11-02 11/2/20 19:02 2020 REP Trump Donald Trump 48.2 1 days
5 Florida 2020-10-31 2020-11-02 11/4/20 09:17 2020 REP Trump Donald Trump 49.4 2 days
6 Nevada 2020-10-31 2020-11-02 11/4/20 10:38 2020 REP Trump Donald Trump 49.1 2 days
I want to take the average value of the 'pct' column for each month, by state. How would you do this? Would you use a for loop?
Solution 1:[1]
Here's a solution with group_by and summarize.
library(tidyverse)
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
# simulated data
df <- expand_grid(
state = c("fl", "io", "pa", "nv"),
start_date = seq(mdy("1/1/2022"), by = "day", length.out = 300),
) %>%
mutate(pct = runif(nrow(.)))
# mean pct by month
df %>%
mutate(mnth = floor_date(start_date, unit = "month")) %>%
group_by(state, mnth) %>%
summarize(pct = mean(pct), .groups = "drop")
#> # A tibble: 40 x 3
#> state mnth pct
#> <chr> <date> <dbl>
#> 1 fl 2022-01-01 0.443
#> 2 fl 2022-02-01 0.529
#> 3 fl 2022-03-01 0.570
#> 4 fl 2022-04-01 0.583
#> 5 fl 2022-05-01 0.477
#> 6 fl 2022-06-01 0.499
#> 7 fl 2022-07-01 0.497
#> 8 fl 2022-08-01 0.561
#> 9 fl 2022-09-01 0.467
#> 10 fl 2022-10-01 0.437
#> # ... with 30 more rows
Created on 2022-03-14 by the reprex package (v2.0.1)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Arthur |
