Create train and test with lags of multiple features

I have a classification problem for which I want to create train and test datasets with 21 lags of multiple features (X-variables). I already have an easy way to do this with only one feature, but I don't know how to adjust this code to use more variables (e.g. df['ETHLogReturn']).

The code I have for one variable is:

import numpy as np

Ntest = 252
train = df.iloc[:-Ntest]
test = df.iloc[-Ntest:]

# Create data ready for machine learning algorithm
series = df['BTCLogReturn'].to_numpy()[1:] # first change is NaN

# Did the price go up or down?
target = (series > 0) * 1 # 1 if the log return is positive, else 0

T = 21 # 21 Lags
X = []
Y = []
for t in range(len(series)-T):
  x = series[t:t+T]
  X.append(x)
  y = target[t+T]
  Y.append(y)
  
X = np.array(X).reshape(-1,T)
Y = np.array(Y)
N = len(X)
print("X.shape", X.shape, "Y.shape", Y.shape)

#output --> X.shape (8492, 21) Y.shape (8492,)
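
As a cross-check on the loop above, here is a minimal vectorized sketch of the same single-feature windowing. It assumes NumPy 1.20 or newer (for `sliding_window_view`); the helper name `make_lagged` and the synthetic `demo_series` are placeholders I made up, standing in for `df['BTCLogReturn'].to_numpy()[1:]`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def make_lagged(series, T):
    """Hypothetical helper: X[i] = series[i:i+T], Y[i] = 1 if series[i+T] > 0 else 0."""
    series = np.asarray(series, dtype=float)
    # All windows of length T. The last window has no following value
    # to predict, so it is dropped. copy() makes the result writable.
    X = sliding_window_view(series, T)[:-1].copy()
    # The label for each window is the sign of the value right after it.
    Y = (series[T:] > 0).astype(int)
    return X, Y

# Synthetic stand-in for the real log-return series (assumption)
rng = np.random.default_rng(0)
demo_series = rng.normal(size=100)

X_demo, Y_demo = make_lagged(demo_series, T=21)
print("X.shape", X_demo.shape, "Y.shape", Y_demo.shape)
# X.shape (79, 21) Y.shape (79,)
```

With 100 observations and T = 21 this gives 100 - 21 = 79 rows, the same count the loop produces.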

Then I create my train and test datasets like this:

Xtrain, Ytrain = X[:-Ntest], Y[:-Ntest]
Xtest, Ytest = X[-Ntest:], Y[-Ntest:]

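# example of model:
from sklearn.linear_model import LogisticRegression

lr = LogisticRegression()
lr.fit(Xtrain, Ytrain)
print(lr.score(Xtrain, Ytrain)) # accuracy on the training set
print(lr.score(Xtest, Ytest)) # accuracy on the held-out test set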

Does anyone have a suggestion for how to adjust this code to build a model with lagged variables from multiple columns? For example:

df[['BTCLogReturn','ETHLogReturn']]

Many thanks for your help!
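
One possible approach, sketched under assumptions rather than offered as a definitive answer: window every feature column along the time axis and lay the lags side by side, so each row holds T lags per feature. The helper name `make_lagged_multi`, the synthetic `demo_df`, and the choice of `BTCLogReturn` as the target column are all illustrative assumptions. It again relies on `sliding_window_view` from NumPy 1.20 or newer.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

def make_lagged_multi(df, feature_cols, target_col, T):
    """Hypothetical helper: build X with T lags of each feature and a 0/1 target."""
    # Drop the first row, whose log returns are NaN, as in the question.
    values = df[feature_cols].to_numpy(dtype=float)[1:]           # shape (n, F)
    target = (df[target_col].to_numpy()[1:] > 0).astype(int)      # shape (n,)
    # Windows along the time axis: shape (n - T + 1, F, T).
    # The last window has no following target, so it is dropped.
    windows = sliding_window_view(values, T, axis=0)[:-1]
    # Flatten to (n - T, F * T): first T columns are lags of feature 0,
    # the next T columns are lags of feature 1, and so on.
    X = windows.reshape(len(windows), -1)
    Y = target[T:]
    return X, Y

# Synthetic stand-in for the real dataframe (assumption)
rng = np.random.default_rng(0)
demo_df = pd.DataFrame({
    "BTCLogReturn": rng.normal(size=101),
    "ETHLogReturn": rng.normal(size=101),
})
demo_df.iloc[0] = np.nan  # the first log return is undefined

X_multi, Y_multi = make_lagged_multi(
    demo_df, ["BTCLogReturn", "ETHLogReturn"], "BTCLogReturn", T=21
)
print("X.shape", X_multi.shape, "Y.shape", Y_multi.shape)
# X.shape (79, 42) Y.shape (79,)
```

The resulting X and Y can be split the same way as before, with X[:-Ntest] and X[-Ntest:], because the rows are still in time order.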

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
