Python - How to use fitted ARIMA model on unseen data

I am using statsmodels.tsa.arima.model.ARIMA to fit an ARIMA model to a time series.

How can I use this model to make predictions on unseen data? It seems that the predict and forecast functions can only make predictions starting from the last observation in the training set that the model was fitted to.

So, for instance, I want to use a static model to keep making predictions into the future. This is for real-time multi-step forecasting, where refitting the model isn't an option.

E.g.,

Say we have a dataset of 10,000 points split into train and test sets (70/30). The last reading we train on is 7,000. Is it possible to use the trained model, pass in 6997 to 7000, and predict 7001 to 7004, and then in the following iteration pass in 6998 to 7001 to predict 7002 to 7005, using the same model?
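The rolling scheme in this example can be written out as position windows. A minimal sketch in plain Python, using the question's 1-based numbering; the helper name rolling_windows is hypothetical and only does the index bookkeeping (no ARIMA involved):

```python
def rolling_windows(last_train, n_steps, history=4, horizon=4):
    """Return (inputs, targets) position ranges for a rolling-origin scheme."""
    windows = []
    for k in range(n_steps):
        origin = last_train + k  # newest observation fed to the model
        inputs = range(origin - history + 1, origin + 1)
        targets = range(origin + 1, origin + horizon + 1)
        windows.append((inputs, targets))
    return windows

for inputs, targets in rolling_windows(7000, 2):
    print(f"{inputs[0]}-{inputs[-1]} -> {targets[0]}-{targets[-1]}")
# 6997-7000 -> 7001-7004
# 6998-7001 -> 7002-7005
```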

This type of prediction is common in ML workflows, but it is not apparent to me how to do it with ARIMA. The predict and forecast functions only take index parameters; there is no parameter for passing in fresh data.



Solution 1:[1]

You can do this with the predict method, which was created for this purpose. First train your ARIMA model on all of your data (without splits). When generating forecasts, use the predict method of the fitted results and set the start and end parameters; for example, to predict 7001 to 7004:

results = model.fit()
results.predict(start=7001, end=7004)  # start and end are both inclusive

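By default (dynamic=False), each in-sample prediction is a one-step-ahead forecast that uses the actual observations up to the preceding time point. For a true multi-step forecast from a fixed origin, i.e. 7001 to 7004 conditioned only on data up to 7000, also pass dynamic=True, so that from start onward the model feeds back its own forecasts instead of the observed values. Either way, you do not have to train your model again and again with new data.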

The start and end parameters also accept datetimes or date strings (like "2021-06-30" to "2021-07-31") when the series has a date index.

https://www.statsmodels.org/dev/generated/statsmodels.tsa.arima.model.ARIMAResults.predict.html

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Arne Decker