Expanding mean over time per subgroup in dataframe
I am still quite new to R and am trying to figure out what I am doing wrong in the code below.
I am trying to calculate the expanding mean over time per subgroup for a dataframe. My code works when the dataframe contains only a single subgroup, but breaks when it contains multiple subgroups.
Apologies if I have overlooked something, but I can't figure out where exactly my code is incorrect. My hunch is that I am not specifying the width correctly, but I have not been able to figure out how to make width a dynamically expanding window over time per subgroup.
See my data below: sample file
See my code below:
library(ggplot2)
library(zoo)
library(RcppRoll)
library(dplyr)
x <- read.csv("stackoverflow.csv")
x$datatime <- as.POSIXlt(x$datatime,format="%m/%d/%Y %H:%M",tz=Sys.timezone())
x$Event <- as.factor(x$Event)
x2 <- arrange(x,x$Event,x$datatime) %>%
group_by(x$Event) %>%
mutate(ma=rollapply(data = x$Actual, width=seq_along(x$Actual), FUN=mean,
partial=TRUE, fill=NA,
align = "right"))
Any help is very much appreciated!
Thanks
EDIT:
A fix has been found! Thanks to all the useful feedback.
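The working code is:
# Column names are used directly (no x$), so each verb sees the current group's rows.
x <- x %>%
  arrange(Event, datatime) %>%
  group_by(Event) %>%
  mutate(ma = rollapply(data = Actual,
                        width = seq_along(Actual),  # window grows by one row per step
                        FUN = mean,
                        partial = TRUE,
                        fill = NA,
                        align = "right"))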
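As a cross-check on the result, an expanding mean is the running sum divided by the running count, so it can also be sketched in base R without rollapply(). The toy data frame below is hypothetical and only stands in for the sample file; the column names Event and Actual are taken from the question.

```r
# Hypothetical toy data standing in for the sample file (values are made up).
toy <- data.frame(
  Event  = c("a", "a", "a", "b", "b"),
  Actual = c(2, 4, 6, 10, 20)
)

# Expanding mean within each Event: running sum divided by running count.
# ave() applies FUN to each group's values and keeps the original row order.
toy$ma <- ave(toy$Actual, toy$Event,
              FUN = function(v) cumsum(v) / seq_along(v))
toy
```

Here the ma column restarts at the first row of every Event, which is the behaviour the grouped rollapply() call is meant to produce. The rows are assumed to be already sorted by time within each Event.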
Solution 1:[1]
I think the problem here is that you’re using x$ to extract columns from the original data in mutate(), rather than using the column name directly to refer to the column in the grouped slice. x$Actual is always the full-length column, regardless of which group is currently being processed, so each group gets a result of the wrong length.
In dplyr verbs you can (and in the case of grouped operations, must) refer to the columns directly.
The solution is to remove all x$ references from your code in dplyr functions.
Here’s a small example that illustrates what’s going on:
library(dplyr, warn.conflicts = FALSE)
tbl <- tibble(g = c(1, 1, 2, 2, 2), x = 1:5)
tbl
#> # A tibble: 5 x 2
#>       g     x
#>   <dbl> <int>
#> 1     1     1
#> 2     1     2
#> 3     2     3
#> 4     2     4
#> 5     2     5
tbl %>%
  group_by(g) %>%
  mutate(y = cumsum(tbl$x))
#> Error in `mutate_cols()`:
#> ! Problem with `mutate()` column `y`.
#> i `y = cumsum(tbl$x)`.
#> i `y` must be size 2 or 1, not 5.
#> i The error occurred in group 1: g = 1.
And how to fix it:
tbl %>%
  group_by(g) %>%
  mutate(y = cumsum(x))
#> # A tibble: 5 x 3
#> # Groups:   g [2]
#>       g     x     y
#>   <dbl> <int> <int>
#> 1     1     1     1
#> 2     1     2     3
#> 3     2     3     3
#> 4     2     4     7
#> 5     2     5    12
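The same pattern applies to the expanding mean from the question. As a rough sketch, and not the asker's exact code, dplyr's own cummean() can stand in for rollapply() with a growing width. The toy tibble below is hypothetical (made-up values, column names borrowed from the question), and the example assumes dplyr 1.0.0 or later.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical toy data; column names mirror the question, values are made up.
toy <- tibble(
  Event    = c("a", "b", "a", "b", "a"),
  datatime = as.POSIXct("2020-01-01", tz = "UTC") + 3600 * c(2, 0, 0, 1, 1),
  Actual   = c(6, 10, 2, 20, 4)
)

res <- toy %>%
  arrange(Event, datatime) %>%        # order rows in time within each Event
  group_by(Event) %>%
  mutate(ma = cummean(Actual)) %>%    # mean of the group's rows seen so far
  ungroup()
res
```

Because Actual is referenced by name rather than as toy$Actual, cummean() only sees the rows of the current group, so the mean restarts for each Event.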
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Mikko Marttila |
