Predicting with a regression with ARIMA errors (forecast R package) for one time period with xreg variables

I am following this tutorial and trying to build a regression with ARIMA errors.

I have data for only three years (2017-2019) and want to predict 2020 using exogenous variables. In total I have 4,500 people. I used 2017-2018 for training and 2019 for testing.

library(dplyr)     # filter(), %>%
library(forecast)  # auto.arima(), forecast()

# Split data for train and test
df_train <- df %>% filter(Year != 2019)
df_test <- df %>% filter(Year == 2019)

# Select xreg variables
xregvars_train <- cbind(
  # here I combine 9 variables
)

# Convert to matrix
xregvars_train <- matrix(as.numeric(xregvars_train), ncol = 9)

# Retrain model only on train data - 2017 and 2018
trained_model1 <- auto.arima(df_train[,"Y"], 
                        xreg = xregvars_train, 
                        trace = TRUE, 
                        seasonal = FALSE,
                        stepwise = FALSE,
                        approximation = FALSE)

summary(trained_model1)

Best model: Regression with ARIMA(2,0,2) errors 

Series: df_train[, "Y"] 
Regression with ARIMA(2,0,2) errors 

Coefficients:
          ar1      ar2      ma1      ma2    xreg1    xreg2    xreg3    xreg4    xreg5    xreg6    xreg7    xreg8    xreg9
      -0.0042   0.9010  -0.0196  -0.4570   -5e-04  -0.0510  -0.2588   2.4189  -1.1462   0.2989   0.3617    4e-04   5.1636
s.e.   0.0061   0.0061   0.0126   0.0127    2e-04   0.0776   0.0838   1.8556   0.7600   0.0269   0.0182    2e-04   0.0958

sigma^2 = 15.09:  log likelihood = -24898.81
AIC=49825.63   AICc=49825.67   BIC=49925.05

Training set error measures:
                      ME     RMSE      MAE MPE MAPE      MASE         ACF1
Training set 0.001099617 3.881168 1.388604 NaN  Inf 0.3068132 -0.003611257


# Select xreg variables for test
xregvars_test <- cbind(
  # here I combine 9 variables
)

# Convert to matrix
xregvars_test <- matrix(as.numeric(xregvars_test), ncol = 9)
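The `cbind()` + `matrix(as.numeric(...), ncol = 9)` steps above work, but `matrix()` creates a new matrix without column names, which is why the coefficients print as `xreg1` ... `xreg9`. A minimal base-R sketch of an alternative, assuming the regressors are numeric columns of `df`: select them by one shared vector of names so train and test get the same columns in the same order, and let `as.matrix()` keep the names. The toy data frame and the column names `x1`, `x2`, `x3` are hypothetical stand-ins for the real 9 variables.

```r
# Toy panel standing in for the real df; x1..x3 are hypothetical regressors
set.seed(1)
df <- data.frame(
  Year = rep(2017:2019, each = 4),
  Y    = rnorm(12),
  x1   = rnorm(12),
  x2   = runif(12),
  x3   = rpois(12, lambda = 3)
)
xreg_cols <- c("x1", "x2", "x3")  # one shared list of regressor names

# Same split as in the question, written in base R
df_train <- df[df$Year != 2019, ]
df_test  <- df[df$Year == 2019, ]

# as.matrix() on all-numeric columns returns a numeric matrix and keeps the
# column names, so the fitted coefficients should be labelled x1, x2, x3
xregvars_train <- as.matrix(df_train[, xreg_cols])
xregvars_test  <- as.matrix(df_test[, xreg_cols])

colnames(xregvars_train)
```

Because both matrices are built from the same `xreg_cols`, a column-order mismatch between the training and test regressors cannot occur.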

# Forecast
myforecasts <- forecast::forecast(trained_model1, xreg = xregvars_test, h = 1)

summary(myforecasts)

For some reason it prints the same coefficients as the model summary:

Forecast method: Regression with ARIMA(2,0,2) errors

Model Information:
Series: df_train[, "Y"] 
Regression with ARIMA(2,0,2) errors 

Coefficients:
          ar1      ar2      ma1      ma2    xreg1    xreg2    xreg3    xreg4    xreg5    xreg6    xreg7    xreg8    xreg9
      -0.0042   0.9010  -0.0196  -0.4570   -5e-04  -0.0510  -0.2588   2.4189  -1.1462   0.2989   0.3617    4e-04   5.1636
s.e.   0.0061   0.0061   0.0126   0.0127    2e-04   0.0776   0.0838   1.8556   0.7600   0.0269   0.0182    2e-04   0.0958

sigma^2 = 15.09:  log likelihood = -24898.81
AIC=49825.63   AICc=49825.67   BIC=49925.05

Error measures:
                      ME     RMSE      MAE MPE MAPE      MASE         ACF1
Training set 0.001099617 3.881168 1.388604 NaN  Inf 0.3068132 -0.003611257

Forecasts:

And I get these values (the result is a data frame of 4,486 rows x 5 columns; excerpt):

     Point Forecast        Lo 80     Hi 80       Lo 95    Hi 95
8973     7.61958386  2.642059677 12.597108  0.00711754 15.23205
8974    45.13170539 40.152777288 50.110633 37.51709196 52.74632
8975    19.75824133 14.310610280 25.205872 11.42680860 28.08967
8976    13.18712620  7.738266115 18.635986  4.85381383 21.52044
8977    26.42824374 20.626598045 32.229889 17.55539233 35.30110
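The forecast package documentation for `forecast()` on ARIMA models states that when `xreg` is supplied, `h` is ignored and the number of forecast periods is set to the number of rows of `xreg`. That would explain why `h = 1` still produced 4,486 rows, one per row of `xregvars_test`. A small sketch to check this behaviour, assuming toy data (a single hypothetical regressor `x`) and a fixed AR(1) error model via `Arima()` instead of `auto.arima()` to keep it fast:

```r
library(forecast)

set.seed(42)
n <- 60
# Toy setup (assumed): y = 2 + 0.5 * x + AR(1) noise
x_train <- matrix(rnorm(n), ncol = 1, dimnames = list(NULL, "x"))
y <- 2 + 0.5 * x_train[, "x"] + arima.sim(list(ar = 0.5), n = n)

# Regression with AR(1) errors; order fixed rather than searched
fit <- Arima(y, order = c(1, 0, 0), xreg = x_train)

# Five rows of future regressor values
x_new <- matrix(rnorm(5), ncol = 1, dimnames = list(NULL, "x"))

# h = 1 is requested, but with xreg supplied the horizon is nrow(x_new)
fc <- forecast(fit, xreg = x_new, h = 1)

length(fc$mean)  # one point forecast per row of x_new
```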
  • Is my approach correct? I am not sure, because the RMSE I get is the same as in the training summary.
  • Are my point forecast values the predictions for 2019? If so, can I export them and calculate the test RMSE from the actual and predicted values?
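The RMSE printed by `summary(myforecasts)` sits on the row labelled "Training set", so it comes from the fitted model, not from the 2019 data. A minimal sketch of computing a test RMSE and exporting actual vs. predicted values, using toy numbers only. In the question's setup, `actual` would presumably be `df_test$Y` and `predicted` would be `as.numeric(myforecasts$mean)` (the "Point Forecast" column), assuming the rows of `xregvars_test` follow the row order of `df_test`. The output file name is hypothetical.

```r
# Toy values for illustration only (not the question's data)
actual    <- c(1, 2, 3, 4)
predicted <- c(1.5, 2, 2.5, 4)

# Test-set RMSE: square root of the mean squared forecast error
rmse_test <- sqrt(mean((actual - predicted)^2))
rmse_test  # sqrt(0.125), about 0.354

# Export actual vs. predicted to CSV (hypothetical file name in tempdir())
out <- data.frame(actual = actual, predicted = predicted)
out_file <- file.path(tempdir(), "forecast_vs_actual.csv")
write.csv(out, out_file, row.names = FALSE)
```

With the forecast package loaded, `accuracy(myforecasts, df_test$Y)` should report both a "Training set" and a "Test set" row, which gives the same test RMSE without computing it by hand.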

Thanks!



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
