How to choose initial, period, horizon and cutoffs with Facebook Prophet?

I have around 23,300 hourly data points in my dataset and I am trying to forecast with Facebook Prophet. To fine-tune the hyperparameters, one can use cross-validation:

from fbprophet.diagnostics import cross_validation

The whole procedure is shown here: https://facebook.github.io/prophet/docs/diagnostics.html

Using cross_validation one needs to specify initial, period and horizon:

df_cv = cross_validation(m, initial='xxx', period='xxx', horizon = 'xxx')

How should I configure these three values in my case? As stated, I have about 23,300 hourly data points. Should the horizon be a fixed fraction of the data, or is the exact fraction unimportant, so that I can pick whatever value seems appropriate?

Furthermore, cutoffs can also be defined explicitly, as below:

cutoffs = pd.to_datetime(['2013-02-15', '2013-08-15', '2014-02-15'])
df_cv2 = cross_validation(m, cutoffs=cutoffs, horizon='365 days')

Should these cutoffs be equally spaced as above, or can each cutoff be set individually?



Solution 1:[1]

  • initial is the first training period. It is the minimum amount of data needed to begin training.
  • horizon is the length of time you want to evaluate your forecast over. Say a retail outlet is building a model to predict sales over the next month. A horizon of 30 days makes sense here, because the model is then evaluated over the same forecast length it will be used for.
  • period is the amount of time between the cutoffs of consecutive folds. It can be greater than, less than, or equal to the horizon.
  • cutoffs are the dates where each horizon begins.
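As a rough illustration of how one cutoff divides the data, here is a minimal sketch using only the standard library. The helper name split_fold, the timestamps and the cutoff are made up for the example; to my understanding, cross_validation does an equivalent split internally for every cutoff, training on everything up to the cutoff and evaluating on the horizon that follows it.

```python
from datetime import datetime, timedelta

def split_fold(timestamps, cutoff, horizon):
    """Return (train, test) timestamps for a single cross-validation fold."""
    # Training data: everything up to and including the cutoff.
    train = [t for t in timestamps if t <= cutoff]
    # Evaluation data: the window (cutoff, cutoff + horizon].
    test = [t for t in timestamps if cutoff < t <= cutoff + horizon]
    return train, test

# Hypothetical hourly series: 10 days starting at 2021-01-01 00:00.
start = datetime(2021, 1, 1)
timestamps = [start + timedelta(hours=h) for h in range(10 * 24)]

cutoff = datetime(2021, 1, 8)     # hypothetical cutoff
horizon = timedelta(hours=24)     # evaluate the next 24 hours

train, test = split_fold(timestamps, cutoff, horizon)
print(len(train), "training points,", len(test), "evaluation points")
```

Each fold only ever evaluates on data that comes after its training window, which is why the horizon should match the forecast length you care about rather than a fraction of the dataset.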

You can understand these terms by looking at this image:

[Image] Credits: Forecasting Time Series Data with Facebook Prophet by Greg Rafferty

Let's imagine that a retail outlet wants a model that can predict the next month of daily sales, and they plan on running the model at the beginning of each quarter. They have 3 years of data.

They would then set their initial training data to 2 years. They want to predict the next month of sales, so they would set horizon to 30 days. They plan to run the model each business quarter, so they would set period to 90 days. This setup is also shown in the image above.

Let's apply these parameters into our model:

df_cv = cross_validation(model,
                         horizon='30 days',
                         period='90 days',
                         initial='730 days')
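
To see which cutoffs a configuration like this produces, the sketch below reproduces the arithmetic with the standard library only. It mirrors, as far as I understand it, how Prophet derives cutoffs when only initial, period and horizon are given: it starts one horizon before the end of the data and steps back by period for as long as at least initial worth of history remains. The function name sketch_cutoffs and the date range are hypothetical.

```python
from datetime import datetime, timedelta

def sketch_cutoffs(first, last, initial, period, horizon):
    """List cutoffs from oldest to newest, stepping back from the end of the data."""
    cutoffs = []
    cutoff = last - horizon               # newest cutoff leaves one full horizon
    while cutoff >= first + initial:      # keep at least `initial` of training data
        cutoffs.append(cutoff)
        cutoff -= period                  # move one period further into the past
    return cutoffs[::-1]

# Hypothetical three years of daily data.
first, last = datetime(2020, 1, 1), datetime(2022, 12, 31)
cutoffs = sketch_cutoffs(first, last,
                         initial=timedelta(days=730),
                         period=timedelta(days=90),
                         horizon=timedelta(days=30))
for c in cutoffs:
    print(c.date())
```

With these made-up dates the sketch yields four cutoffs, the newest on 2022-12-01. The same arithmetic applies to hourly data, since the arguments accept pd.Timedelta-style strings such as '48 hours'.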

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 JATIN