How to use an ARIMA model to predict events from a multicolumn dataframe

https://docs.google.com/spreadsheets/d/1JmMoGcKI9ycFkYi0lrqtnb0sj2JKmJ1l/edit?usp=sharing&rtpof=true&sd=true

The dataset contains the number of passengers entering a train station over a month. Columns 1-74 are 15-minute time intervals from 5:30 to 24:00, and the total number of passengers in each interval is recorded for every day. I want to use the data for the first 28 days to create the ARIMA model and use it to predict the data for the 29th day. I need to get the p, d and q values for each time interval (i.e. each column) and use them to build the models, resulting in 74 predictions in total. Alternatively, I could select a single column, calculate p, d and q for it, and use those values for all the other columns.

I am new to ARIMA and I don't even know how to approach this task.



Solution 1:[1]

You can use the statsmodels ARIMA implementation and train a separate ARIMA model for every time slice (i.e. every column), using the first 28 observations.

import pandas as pd
# statsmodels >= 0.13: the old statsmodels.tsa.arima_model module was removed
from statsmodels.tsa.arima.model import ARIMA

df = pd.read_csv("inbound_flights.csv")
df.drop(columns=['Unnamed: 0'], inplace=True)  # drop the saved index column

# use first rows for training and use last row for out-of-sample testing
X_train, X_test = df.iloc[:-1, :], df.iloc[[-1], :]

order = (5, 1, 0)  # <- plug-in p, d, q here

for col in X_train.columns:
    # fit one model per time interval (column)
    model = ARIMA(X_train[col], order=order)
    model_fit = model.fit()

    # summary of fit model
    # print(model_fit.summary())

    # make one-day forecast; forecast() returns a Series of predicted values
    forecast = model_fit.forecast(steps=1)
    print(forecast.iloc[0])

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
