How is model.fit affected in a for-loop?

I am implementing walk-forward optimization for a time series. I understand splitting the set into train and test with 20-30% reserved for testing, but for a time series I can't split randomly, nor can I simply take the last portion of the data, because it isn't indicative of the changes happening in most of the training set. If I put model.fit inside a for-loop over my segmented dates, how is it affected? Does it get reset on each iteration, or does it keep what it learned from the previous ones?

Here's the idea (it's not meant to run, just to show the concept):

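from sklearn.linear_model import LinearRegression

for date in seg_dates:
    # rolling window: the timeframeLength rows just before `date`
    x_train = dataframe.iloc[date - timeframeLength : date, x_varColumn]
    y_train = dataframe.iloc[date - timeframeLength : date, y_varColumn]
    model = LinearRegression()  # a new, untrained model on every pass
    model.fit(x_train, y_train)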


Solution 1:[1]

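Calling .fit "resets" the model: scikit-learn re-estimates all of the parameters using only the x_train and y_train passed to that call, and anything learned in earlier iterations is discarded. (Your loop also creates a new LinearRegression() on every pass, so nothing would carry over in any case.)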
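A minimal sketch of this reset behavior, using made-up linear data (the arrays X, y_first and y_second are hypothetical, not from the question): after a second call to fit, the coefficients match those of a brand-new model trained only on the second dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Two toy targets with different true slopes (made-up data for illustration)
X = np.arange(10, dtype=float).reshape(-1, 1)
y_first = 2.0 * X.ravel() + 1.0    # slope 2, intercept 1
y_second = -3.0 * X.ravel() + 5.0  # slope -3, intercept 5

model = LinearRegression()
model.fit(X, y_first)
print(model.coef_, model.intercept_)  # fitted to the first dataset

# Calling fit again re-estimates everything from the new data only
model.fit(X, y_second)
print(model.coef_, model.intercept_)  # nothing from the first fit remains

# Same result as a fresh model trained on the second dataset alone
fresh = LinearRegression().fit(X, y_second)
print(np.allclose(model.coef_, fresh.coef_))  # True
```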

If you want to train on an extended dataset instead, collect the x_train and y_train windows in lists inside the loop and fit once after the loop, rather than fitting on each iteration.

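import pandas as pd
from sklearn.linear_model import LinearRegression

x_train = []
y_train = []
for date in seg_dates:
    # collect every window; do not fit inside the loop
    # (x_varColumn should be a list of column positions so X stays 2-D)
    x_train.append(dataframe.iloc[date - timeframeLength : date, x_varColumn])
    y_train.append(dataframe.iloc[date - timeframeLength : date, y_varColumn])
# stack the windows into one DataFrame/Series, then fit once
model = LinearRegression()
model.fit(pd.concat(x_train), pd.concat(y_train))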

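You can also use sklearn.model_selection.train_test_split to split the data into train and test sets before the .fit line. For a time series, pass shuffle=False so the rows stay in order and the test set comes after the training set.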
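Since the goal is walk-forward validation, sklearn.model_selection.TimeSeriesSplit may be a better fit than a single split: each fold trains only on rows that come before its test block. This is a swapped-in technique, not the answer's method, and the sketch rests on assumptions: a synthetic series stands in for the real data, and max_train_size=40 (an arbitrary choice) turns the default expanding window into a rolling one. A new LinearRegression is fitted for every fold.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import TimeSeriesSplit

# Synthetic series standing in for the real data (assumption: one feature, one target)
rng = np.random.default_rng(0)
n = 120
X = np.arange(n, dtype=float).reshape(-1, 1)
y = 0.5 * X.ravel() + rng.normal(scale=2.0, size=n)

# Each split trains on earlier rows and tests on the block that follows them.
# max_train_size=40 keeps only the 40 most recent rows (a rolling window).
tscv = TimeSeriesSplit(n_splits=4, max_train_size=40)

scores = []
for train_idx, test_idx in tscv.split(X):
    model = LinearRegression()  # fresh model for every fold
    model.fit(X[train_idx], y[train_idx])
    preds = model.predict(X[test_idx])
    scores.append(mean_absolute_error(y[test_idx], preds))
    print(f"train {train_idx[0]}-{train_idx[-1]}, "
          f"test {test_idx[0]}-{test_idx[-1]}, MAE {scores[-1]:.2f}")
```

Dropping max_train_size gives an expanding window, where every fold trains on all earlier rows.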

As for training on time series, I think the best approach is to split the series into many "windows" and use each window of data as one sample.
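One way the windowing idea could look, as a sketch: the helper make_windows (hypothetical, not a library function) turns a 1-D series into rows of `window` consecutive values, with the value right after each window as the target. The sine series and window=5 are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def make_windows(series, window):
    """Hypothetical helper: each row of X holds `window` consecutive values,
    and y holds the value that comes right after that window."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X, y

# Toy series (made up for illustration)
series = np.sin(np.linspace(0, 20, 200))
X, y = make_windows(series, window=5)
print(X.shape, y.shape)  # (195, 5) (195,)

# One fit learns from all windows at once
model = LinearRegression().fit(X, y)
# Predict the step after the last observed window
next_value = model.predict(series[-5:].reshape(1, -1))
```

Because the rows keep their time order, they can still be split chronologically for testing.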

Edit: replaced extend with append, since each element should be a whole window of the time series, not individual data points.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

[1] Solution 1, Stack Overflow