'Moving average varying window
I have an unbalanced panel, in which I have certain observations (variable x) per ID and month. I am trying to calculate a 6-month-rolling average of x, but only every March. I know that with zoo, I can calculate the average every single time, but I think that is computationally expensive. I have a very large panel, so it would be better to define an index first and pass it to the function. Also, my panel is imbalanced, so sometimes I have all 6 past values at a given March, and sometimes I do not. If there is a minimum of 3 values available, I would still like to compute the average.
Here is some sample code and my solution so far:
library(data.table)
set.seed(1)
time=rep(seq(as.Date("2010-02-01"), length=42, by="1 month") - 1,2)
IDs=rep(letters[1:2],each=length(time))
DT <- data.table(time=time,
ID=IDs,
ind=rep(1:(2*length(time))),
row=1:(2*length(time)),
x=sample(2*length(time)))
DT
DT <- DT[!ind %in% c(11,12,26)]
DT
library(zoo)
DT[,movavg := if(length(x) >= 3){ rollapply(x, 6, sum, na.rm = FALSE,align = "right",fill = NA)}else{
rep(NA,length(x))
},by=ID]
DT
The target is to simply show for each March the corresponding moving average, which contains the past 6 observations. I don't mind if the original panel is kept, that is, only in March the results are shown, or if only the March values are extracted and nothing else is shown.
My code works, but it does the calculation every row/month. What I want it to do is to work only at a defined index. The issue is, as the panel is unbalanced, the distance between the Marches is not equally long. For example, it can be 12 months from one to another year, but it could be 10 months from the next to the following year when 2 observations are unfortunately missing. Can roll apply still be used? Any hints for data table or dplyr are highly appreciated.
Solution 1:[1]
If this code from the question gives what you want
DT[,movavg := if(length(x) >= 3){ rollapply(x, 6, sum, na.rm = FALSE,align = "right",fill = NA)}else{
rep(NA,length(x))
},by=ID]
then the first of these ran 2.8x faster and gave the same result and the second one using frollsum from data.table ran 4.8x faster.
DT[, movavg := rollsumr(x, 6, fill = NA), by = ID]
DT[, movavg := frollsum(x, 6), by = ID]
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | G. Grothendieck |
