Expanding mean over time per subgroup in dataframe

I am still quite new to R, so I am trying to figure out what I am doing wrong in the code below.

I am trying to calculate the expanding mean over time per subgroup for a dataframe. My code works when there is only a single subgroup in the dataframe, but starts to break when multiple subgroups are available within the dataframe.

Apologies if I have overlooked something, but I can't figure out where exactly my code is incorrect. My hunch is that I am not setting width correctly, but I have not been able to figure out how to make width a dynamically expanding window over time per subgroup.

See my data below; sample file

See my code below:

library(ggplot2)
library(zoo)
library(RcppRoll)
library(dplyr)

x <- read.csv("stackoverflow.csv")

x$datatime <- as.POSIXlt(x$datatime,format="%m/%d/%Y %H:%M",tz=Sys.timezone())
x$Event <- as.factor(x$Event)

x2 <- arrange(x,x$Event,x$datatime) %>% 
  group_by(x$Event) %>% 
  mutate(ma=rollapply(data = x$Actual, width=seq_along(x$Actual), FUN=mean,
                          partial=TRUE, fill=NA,
                          align = "right"))

Any help is very much appreciated!

Thanks

EDIT:

A fix has been found! Thanks to all the useful feedback.

The working code is:

x <- 
  arrange(x, Event, datatime) %>% 
  group_by(Event) %>% 
  mutate(ma=rollapply(data = Actual, 
                      width=seq_along(Actual), 
                      FUN=mean,
                      partial=TRUE, 
                      fill=NA,
                      align = "right"))
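
One way to sanity-check the result without zoo: an expanding mean is the cumulative sum divided by the running count, so base R can compute it per subgroup. This is only a minimal sketch on made-up data. The values and the ma_check column name are assumptions for illustration and are not taken from the sample file.

```r
# Toy stand-in for the CSV from the question (values are made up).
x <- data.frame(
  Event  = factor(c("A", "A", "A", "B", "B")),
  Actual = c(10, 20, 60, 5, 15)
)

# ave() applies FUN to each Event subgroup separately and returns the
# results in the original row order.
x$ma_check <- ave(x$Actual, x$Event,
                  FUN = function(v) cumsum(v) / seq_along(v))
x
```

If the rows are already sorted by time within each Event, ma_check should agree with the ma column from the rollapply() call above.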

Solution 1:^[1]

I think the problem here is that you’re using x$ to extract columns from the original data in mutate(), rather than using the column name directly to refer to the column in the grouped slice. x$Actual is always the full-length column, whereas inside a grouped mutate() each result must match the size of the current group. In dplyr verbs you can (and in the case of grouped operations, must) refer to the columns directly. The solution is to just remove all x$ references from your code in dplyr functions.

Here’s a small example that illustrates what’s going on:

library(dplyr, warn.conflicts = FALSE)

tbl <- tibble(g = c(1, 1, 2, 2, 2), x = 1:5)
tbl
#> # A tibble: 5 x 2
#>       g     x
#>   <dbl> <int>
#> 1     1     1
#> 2     1     2
#> 3     2     3
#> 4     2     4
#> 5     2     5

tbl %>% 
  group_by(g) %>% 
  mutate(y = cumsum(tbl$x))
#> Error in `mutate_cols()`:
#> ! Problem with `mutate()` column `y`.
#> i `y = cumsum(tbl$x)`.
#> i `y` must be size 2 or 1, not 5.
#> i The error occurred in group 1: g = 1.

And how to fix it:

tbl %>% 
  group_by(g) %>% 
  mutate(y = cumsum(x))
#> # A tibble: 5 x 3
#> # Groups:   g [2]
#>       g     x     y
#>   <dbl> <int> <int>
#> 1     1     1     1
#> 2     1     2     3
#> 3     2     3     3
#> 4     2     4     7
#> 5     2     5    12
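
The same pattern carries over to an expanding mean. The following is a rough sketch on the toy tibble from above rather than on the question's data. The column name ma and the use of cumsum() divided by seq_along() are my own choices for illustration.

```r
library(dplyr, warn.conflicts = FALSE)

tbl <- tibble(g = c(1, 1, 2, 2, 2), x = 1:5)

res <- tbl %>%
  group_by(g) %>%
  # Bare `x` refers to the current group's slice, so the running mean
  # restarts at the first row of every group.
  mutate(ma = cumsum(x) / seq_along(x)) %>%
  ungroup()
res
```

Here ma is 1 and 1.5 for g = 1, then 3, 3.5 and 4 for g = 2. dplyr also provides cummean(), which should give the same result in recent versions.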

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Mikko Marttila
