Average using dplyr if row is equal to previous row in another column
I have the following code:
#load req'd libraries
library(plyr)
library(dplyr)
library(fitzRoy)
#get the raw data from the fitzRoy package (season selection use :)
player <- fetch_player_stats(season = 2020:2021, source = "fryzigg")
#select the req'd cols
player <- player %>%
  select(venue_name, match_date, match_round, player_id, player_first_name, player_last_name,
         kicks, handballs, disposals)
#change the match_date to date format
player$match_date <- as.Date(player$match_date, format = "%Y-%m-%d")
#add a column for the year (season)
player$season <- format(as.Date(player$match_date, format="%Y-%m-%d"),"%Y")
#change format for match_round
player$match_round <- as.numeric(player$match_round)
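#add opponent
#note: this needs player_team, match_home_team and match_away_team, which the select() above drops,
#so either run this step before the select() or keep those columns
player$opponent <- ifelse(player$player_team == player$match_home_team, player$match_away_team,
                   ifelse(player$player_team == player$match_away_team, player$match_home_team, player$match_away_team))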
#sort
player <- player %>%
  arrange(player_id, season, match_round)
head(player)
This gives me a data frame like so:
# A tibble: 6 x 10
  venue_name     match_date match_round player_id player_first_name player_last_name kicks handballs disposals season
  <chr>          <date>           <dbl>     <int> <chr>             <chr>            <int>     <int>     <int> <chr>
1 GIANTS Stadium 2020-03-21           1     11170 Gary              Ablett              16         8        24 2020
2 GMHBA Stadium  2020-06-12           2     11170 Gary              Ablett               9        12        21 2020
3 GMHBA Stadium  2020-06-20           3     11170 Gary              Ablett               8         6        14 2020
4 MCG            2020-06-28           4     11170 Gary              Ablett               3         8        11 2020
5 GMHBA Stadium  2020-07-04           5     11170 Gary              Ablett               6         8        14 2020
6 SCG            2020-07-09           6     11170 Gary              Ablett              11         3        14 2020
I am trying to add several new columns:
1. A season average of disposals by player that is cumulative based on each round. For example, using the table above, the new column would look like:
   | season_average_disposals | 24 | 22.5 | 19.7 | 17.5 | 16.8 | 16.3 |
   When the season changes, say from 2020 to 2021, this would reset, and the first entry would be the disposals total for round 1 of that season.
2. Similar to the above, a season average of disposals by player and venue that is cumulative based on each round.
3. Similar to the above, a season average of disposals by player, venue and opponent that is cumulative based on each round.
4. A career average that is cumulative across seasons and rounds. This would not reset when the season changes; it would just keep accumulating.
I tried using this:
player <- player %>%
  transform(season_average_disposals = ifelse(lag(season) == season, mean(disposals), disposals))
But this does not give me the required results.
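One likely reason, sketched here on a made-up single-season data frame holding the six disposal counts from the table above: `mean(disposals)` inside `ifelse()` is evaluated once over the whole column, so every row whose season matches the previous row receives the same overall mean instead of a running one. The name `toy` is hypothetical, and `dplyr::lag()` is spelled out so that `stats::lag()` is not picked up by accident.

```r
library(dplyr)

# Hypothetical toy data: the six disposal counts from the table, one season.
toy <- data.frame(season = "2020",
                  disposals = c(24, 21, 14, 11, 14, 14),
                  stringsAsFactors = FALSE)

# The attempted approach: mean(disposals) is a single number for the
# whole column, recycled by ifelse() across all matching rows.
attempt <- transform(
  toy,
  season_average_disposals = ifelse(dplyr::lag(season) == season,
                                    mean(disposals), disposals)
)

# First value is NA (no previous row to compare with); the remaining
# values are all identical to mean(toy$disposals).
attempt$season_average_disposals

# What a running (cumulative) mean looks like instead.
running <- cumsum(toy$disposals) / seq_along(toy$disposals)
running
```

The running version divides the cumulative sum by the number of rows seen so far, which is the quantity the question asks for.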
Solution 1:[1]
For 1)
library(dplyr)
player %>%
  group_split(season, player_id) %>%
  purrr::map_dfr(~ .x %>%
    mutate(season_average_disposals = cummean(disposals)))
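Items 2 to 4 are not covered above. As a minimal sketch, not a tested solution for the full fitzRoy data, the same `cummean()` call can be applied under different `group_by()` groupings. The data frame `toy`, its `opponent` values and its 2021 rows are invented for illustration, and the new column names are only suggestions; an `opponent` column is assumed to exist already.

```r
library(dplyr)

# Hypothetical toy data. The 2020 disposals reuse values from the
# question; the opponent labels and the 2021 rows are made up.
toy <- tibble(
  player_id   = 11170L,
  season      = c("2020", "2020", "2020", "2020", "2021", "2021"),
  match_round = c(1, 2, 3, 4, 1, 2),
  venue_name  = c("GIANTS Stadium", "GMHBA Stadium", "GMHBA Stadium",
                  "MCG", "GMHBA Stadium", "MCG"),
  opponent    = c("A", "B", "B", "C", "B", "C"),
  disposals   = c(24, 21, 14, 11, 20, 10)
)

result <- toy %>%
  # cummean() follows row order, so sort before grouping
  arrange(player_id, season, match_round) %>%
  # 1) resets for every player and season
  group_by(player_id, season) %>%
  mutate(season_average_disposals = cummean(disposals)) %>%
  # 2) resets for every player, season and venue
  group_by(player_id, season, venue_name) %>%
  mutate(season_venue_average_disposals = cummean(disposals)) %>%
  # 3) resets for every player, season, venue and opponent
  group_by(player_id, season, venue_name, opponent) %>%
  mutate(season_venue_opponent_average_disposals = cummean(disposals)) %>%
  # 4) career: grouped by player only, so it never resets
  group_by(player_id) %>%
  mutate(career_average_disposals = cummean(disposals)) %>%
  ungroup()

result
```

Each `group_by()` call replaces the previous grouping, so the four averages are computed independently, and unlike `group_split()` the original row order is kept. If `cummean()` is not available or behaves unexpectedly in an installed dplyr version, `cumsum(disposals) / row_number()` should give the same running mean.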
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Julian |
