Average using dplyr if row is equal to previous row in another column

I have the following code:

    #load req'd libraries
    library(plyr)
    library(dplyr)
    library(fitzRoy)
    
    #get the raw data from the fitzRoy package (season selection use :)
    player <- fetch_player_stats(season = 2020:2021, source = "fryzigg")
    
    #select the req'd cols
    player <- player %>%
      select(venue_name, match_date, match_round, player_id, player_first_name, player_last_name,
            kicks, handballs, disposals)
    
    #change the match_date to date format
    player$match_date <- as.Date(player$match_date, format = "%Y-%m-%d")
    
    #add a column for the year (season)
    player$season <- format(as.Date(player$match_date, format="%Y-%m-%d"),"%Y")
    
    #change format for match_round
    player$match_round <- as.numeric(player$match_round)
    
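    #add opponent
    #(player2 is not defined in this snippet; it needs the player_team, match_home_team
    #and match_away_team columns, which the select() above drops)
    player2$opponent <- ifelse(player2$player_team == player2$match_home_team, player2$match_away_team, 
                               ifelse(player2$player_team == player2$match_away_team, player2$match_home_team, player2$match_away_team))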

    #sort
    player <- player %>%
      arrange(player_id, season, match_round)
    
    head(player)

This gives me a data frame like so:

    # A tibble: 6 x 10
      venue_name     match_date match_round player_id player_first_name player_last_name kicks handballs disposals season
      <chr>          <date>           <dbl>     <int> <chr>             <chr>            <int>     <int>     <int> <chr>
    1 GIANTS Stadium 2020-03-21           1     11170 Gary              Ablett              16         8        24 2020
    2 GMHBA Stadium  2020-06-12           2     11170 Gary              Ablett               9        12        21 2020
    3 GMHBA Stadium  2020-06-20           3     11170 Gary              Ablett               8         6        14 2020
    4 MCG            2020-06-28           4     11170 Gary              Ablett               3         8        11 2020
    5 GMHBA Stadium  2020-07-04           5     11170 Gary              Ablett               6         8        14 2020
    6 SCG            2020-07-09           6     11170 Gary              Ablett              11         3        14 2020

I am trying to add several new columns:

  1. A season average of disposals by player that is cumulative by round. For example, using the table above, the new column would look like:

     | season_average_disposals
     | 24
     | 22.5
     | 19.7
     | 17.5
     | 16.8
     | 16.3
    

When the season changes, say from 2020 to 2021, the average would reset, and the first entry would be the disposals total for round 1 of the new season.

  2. Similar to the above, a season average of disposals by player by venue that is cumulative by round.

  3. Similar to the above, a season average of disposals by player by venue and opponent that is cumulative by round.

  4. A career average that is cumulative across seasons and rounds. This would not reset when the season changes; it would just keep accumulating.

I tried using this:

    player <- player %>% 
      transform(season_average_disposals = ifelse(lag(season) == season, mean(disposals), disposals))

But this does not give me the required results.



Solution 1:[1]

For 1)

    library(dplyr)
    player %>% 
      group_split(season, player_id) %>% 
      purrr::map_dfr(~.x %>% 
            mutate(season_average_disposals = cummean(disposals))) 
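For 2) to 4), the same idea should carry over by changing the grouping. Below is a minimal sketch using `group_by()` with `mutate()` in place of `group_split()` and `map_dfr()`. It runs on a small made-up data frame (`toy`) instead of the fitzRoy data, and it assumes an `opponent` column already exists. The new column names other than `season_average_disposals` are placeholders, not names from the question.

```r
library(dplyr)

# Made-up stand-in for `player`; all values here are illustrative only
toy <- tibble::tibble(
  player_id   = c(1, 1, 1, 1, 1, 1),
  season      = c("2020", "2020", "2020", "2020", "2021", "2021"),
  match_round = c(1, 2, 3, 4, 1, 2),
  venue_name  = c("A", "B", "B", "A", "A", "B"),
  opponent    = c("X", "Y", "X", "X", "Y", "Y"),
  disposals   = c(24, 21, 14, 11, 10, 20)
)

result <- toy %>%
  # cummean() follows row order, so sort chronologically first
  arrange(player_id, season, match_round) %>%
  # 1) running average per player, reset each season
  group_by(player_id, season) %>%
  mutate(season_average_disposals = cummean(disposals)) %>%
  # 2) running average per player and venue, reset each season
  group_by(player_id, season, venue_name) %>%
  mutate(season_venue_average_disposals = cummean(disposals)) %>%
  # 3) running average per player, venue and opponent, reset each season
  group_by(player_id, season, venue_name, opponent) %>%
  mutate(season_venue_opponent_average_disposals = cummean(disposals)) %>%
  # 4) career running average: group by player only, so it never resets
  group_by(player_id) %>%
  mutate(career_average_disposals = cummean(disposals)) %>%
  ungroup()

result
```

Each `group_by()` call replaces the previous grouping, so the four running averages are computed independently while the rows stay in their sorted order.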

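As for why the attempt in the question does not work: `mean(disposals)` inside `ifelse()` is evaluated once over the whole column, so every matching row receives the same overall mean, not a running one. A small illustration on a made-up vector (the first four disposal counts from the table in the question):

```r
library(dplyr)

disposals <- c(24, 21, 14, 11)

# mean() collapses the whole vector to a single number
overall <- mean(disposals)

# cummean() returns one running average per element
running <- cummean(disposals)

overall
running
```

The first call yields a single value, while the second yields a vector as long as the input, which is what a per-round cumulative average needs.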
Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Julian