How to use an ARIMA model to predict events from a multi-column dataframe
The dataset contains the number of passengers entering a train station over one month. The columns, numbered 1-74, are 15-minute time intervals from 5:30 to 24:00, and each row records the total number of passengers for each time period on one day. I want to use the data for the first 28 days to build the ARIMA model and then use it to predict the data for the 29th day. I need to get p, d and q values for each time interval (i.e. each column) and use them to build the models, resulting in 74 predictions in total. Alternatively, I could select one entire column, calculate the p, d and q values from it, and use those values for all the other columns.
I am new to ARIMA and I don't even know how to approach this task.
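The column layout described in the question can be sanity-checked with a little arithmetic: 5:30 to 24:00 is 18.5 hours, which is exactly 74 slots of 15 minutes. The sketch below maps a column number to the interval it covers, assuming column 1 starts at 05:30; the helper names `slot_label` and `hhmm` are hypothetical.

```python
FIRST_SLOT_MINUTES = 5 * 60 + 30  # 05:30 expressed in minutes after midnight
SLOT_MINUTES = 15                 # each column covers 15 minutes
# (24:00 - 05:30) = 1110 minutes, and 1110 // 15 = 74 columns
N_SLOTS = (24 * 60 - FIRST_SLOT_MINUTES) // SLOT_MINUTES


def hhmm(minutes):
    """Format minutes after midnight as HH:MM (midnight at the end of the day prints as 24:00)."""
    return f"{minutes // 60:02d}:{minutes % 60:02d}"


def slot_label(column_number):
    """Return the 15-minute interval covered by a column numbered 1..74."""
    if not 1 <= column_number <= N_SLOTS:
        raise ValueError(f"column number must be between 1 and {N_SLOTS}")
    start = FIRST_SLOT_MINUTES + (column_number - 1) * SLOT_MINUTES
    return f"{hhmm(start)}-{hhmm(start + SLOT_MINUTES)}"


print(N_SLOTS)          # 74
print(slot_label(1))    # 05:30-05:45
print(slot_label(74))   # 23:45-24:00
```

If the CSV headers really are the strings "1" to "74" (an assumption), the same helper could relabel them, e.g. `df.columns = [slot_label(int(c)) for c in df.columns]`.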
Solution 1:[1]
You can use the ARIMA implementation in statsmodels. Train a separate ARIMA model for every time slice (column) using the first 28 observations (days), then forecast one step ahead to get the value for the 29th day.
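import pandas as pd
# statsmodels.tsa.arima_model.ARIMA was removed in statsmodels 0.13; use the newer class
from statsmodels.tsa.arima.model import ARIMA

df = pd.read_csv("inbound_flights.csv")
df.drop(columns=['Unnamed: 0'], inplace=True)

# use all rows but the last (days 1-28) for training and the last row (day 29) for out-of-sample testing
X_train, X_test = df.iloc[:-1, :], df.iloc[[-1], :]

order = (5, 1, 0)  # <- plug in p, d, q here

for col in X_train.columns:
    # fit one model per time slot (column)
    model = ARIMA(X_train[col], order=order)
    model_fit = model.fit()
    # summary of the fitted model
    # print(model_fit.summary())
    # make a one-day forecast; the newer API returns only the predicted values
    # (as a Series indexed right after the training data), not a 3-tuple
    forecast = model_fit.forecast(steps=1)
    print(col, forecast.iloc[0])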
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
