Predicting regression with ARIMA errors in the forecast R package for 1 time period with xreg variables
I am following this tutorial and trying to build a regression with ARIMA errors.
I have time periods for 3 years only (2017-2019) and want to predict 2020 with exogenous variables. In total I have 4500 people. I used 2017-2018 for training and 2019 for testing.
# Split data for train and test
df_train <- df %>% filter(Year != 2019)
df_test <- df %>% filter(Year == 2019)
# Select xreg variables
xregvars_train <- cbind(
  # here I combine 9 variables
)
# Convert to matrix
xregvars_train <- matrix(as.numeric(xregvars_train), ncol = 9)
# Retrain model only on train data - 2017 and 2018
trained_model1 <- auto.arima(df_train[, "Y"],
                             xreg = xregvars_train,
                             trace = TRUE,
                             seasonal = FALSE,
                             stepwise = FALSE,
                             approximation = FALSE)
summary(trained_model1)
Best model: Regression with ARIMA(2,0,2) errors
Series: df_train[, "Y"]
Regression with ARIMA(2,0,2) errors
Coefficients:
          ar1      ar2      ma1      ma2    xreg1    xreg2    xreg3    xreg4    xreg5    xreg6    xreg7    xreg8    xreg9
      -0.0042   0.9010  -0.0196  -0.4570   -5e-04  -0.0510  -0.2588   2.4189  -1.1462   0.2989   0.3617    4e-04   5.1636
s.e.   0.0061   0.0061   0.0126   0.0127    2e-04   0.0776   0.0838   1.8556   0.7600   0.0269   0.0182    2e-04   0.0958
sigma^2 = 15.09: log likelihood = -24898.81
AIC=49825.63 AICc=49825.67 BIC=49925.05
Training set error measures:
                      ME     RMSE      MAE MPE MAPE      MASE         ACF1
Training set 0.001099617 3.881168 1.388604 NaN  Inf 0.3068132 -0.003611257
# Select xreg variables for test
xregvars_test <- cbind(
  # here I combine 9 variables
)
# Convert to matrix
xregvars_test <- matrix(as.numeric(xregvars_test), ncol = 9)
# Forecast
myforecasts <- forecast::forecast(trained_model1, xreg = xregvars_test, 1)
summary(myforecasts)
For some reason it prints the same coefficients:
Forecast method: Regression with ARIMA(2,0,2) errors
Model Information:
Series: df_train[, "Y"]
Regression with ARIMA(2,0,2) errors
Coefficients:
          ar1      ar2      ma1      ma2    xreg1    xreg2    xreg3    xreg4    xreg5    xreg6    xreg7    xreg8    xreg9
      -0.0042   0.9010  -0.0196  -0.4570   -5e-04  -0.0510  -0.2588   2.4189  -1.1462   0.2989   0.3617    4e-04   5.1636
s.e.   0.0061   0.0061   0.0126   0.0127    2e-04   0.0776   0.0838   1.8556   0.7600   0.0269   0.0182    2e-04   0.0958
sigma^2 = 15.09: log likelihood = -24898.81
AIC=49825.63 AICc=49825.67 BIC=49925.05
Error measures:
                      ME     RMSE      MAE MPE MAPE      MASE         ACF1
Training set 0.001099617 3.881168 1.388604 NaN  Inf 0.3068132 -0.003611257
Forecasts:
And I get values:
Description:df [4,486 x 5]
     Point Forecast        Lo 80     Hi 80       Lo 95    Hi 95
8973     7.61958386  2.642059677 12.597108  0.00711754 15.23205
8974    45.13170539 40.152777288 50.110633 37.51709196 52.74632
8975    19.75824133 14.310610280 25.205872 11.42680860 28.08967
8976    13.18712620  7.738266115 18.635986  4.85381383 21.52044
8977    26.42824374 20.626598045 32.229889 17.55539233 35.30110
- Is my approach correct? I am not sure, because the RMSE printed after forecasting is the same as the training RMSE.
- Are my Point Forecast values the predictions for 2019? If so, can I export them and calculate the test RMSE from the actual and predicted values?
Thanks!
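For reference, here is a minimal sketch of how a test RMSE could be computed from such a forecast. It runs on simulated data, not the data above: the names `x_all`, `y_all`, `results` and `rmse_test` and the two regressors are hypothetical stand-ins. The sketch relies on two documented behaviours of the forecast package. First, when `xreg` is passed to `forecast()`, the horizon is set to the number of rows of `xreg` and `h` is ignored, so the rows are treated as consecutive future time steps. Second, `summary()` of a forecast object prints the fitted model together with its training-set error measures, which would explain why the coefficients and the RMSE look unchanged. Test-set accuracy has to be computed separately against held-out actuals.

```r
library(forecast)  # assumes the forecast package is installed

set.seed(1)
n_train <- 100
n_test  <- 20

# Simulated stand-ins for the xreg matrix and Y (hypothetical, not the question's data)
x_all <- cbind(x1 = rnorm(n_train + n_test), x2 = rnorm(n_train + n_test))
y_all <- 2 + 1.5 * x_all[, "x1"] - 0.5 * x_all[, "x2"] +
  as.numeric(arima.sim(list(ar = 0.5), n = n_train + n_test))

train_idx <- seq_len(n_train)
x_train <- x_all[train_idx, , drop = FALSE]
x_test  <- x_all[-train_idx, , drop = FALSE]
y_train <- y_all[train_idx]
y_test  <- y_all[-train_idx]

# Regression with ARIMA errors, fitted on the training rows only
fit <- auto.arima(y_train, xreg = x_train, seasonal = FALSE)

# With xreg supplied, the horizon equals nrow(x_test); h is not needed
fc <- forecast(fit, xreg = x_test)

# Point forecasts are stored in fc$mean
results <- data.frame(actual = y_test, predicted = as.numeric(fc$mean))

# Test RMSE computed by hand from actual vs. predicted
rmse_test <- sqrt(mean((results$actual - results$predicted)^2))

# accuracy() reports a "Training set" row and a "Test set" row
acc <- accuracy(fc, y_test)
print(acc)

# Optional export of actual and predicted values:
# write.csv(results, "forecasts_test.csv", row.names = FALSE)
```

The `Test set` row of `accuracy()` should agree with the hand-computed `rmse_test`; the `Training set` row is what `summary()` shows.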
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
