Create train and test with lags of multiple features
I have a classification problem for which I want to create a train and test dataframe with 21 lags of multiple features (X-variables). I already have an easy way to do this with only one feature but I don't know how to adjust this code if I want to use more variables (e.g. df['ETHLogReturn']).
The code I have for one variable is:
import numpy as np
from sklearn.linear_model import LogisticRegression
Ntest = 252
train = df.iloc[:-Ntest]
test = df.iloc[-Ntest:]
# Create data ready for machine learning algorithm
series = df['BTCLogReturn'].to_numpy()[1:] # first change is NaN
# Did the price go up or down?
target = (series > 0) * 1
T = 21 # 21 Lags
X = []
Y = []
for t in range(len(series)-T):
    x = series[t:t+T]
    X.append(x)
    y = target[t+T]
    Y.append(y)
X = np.array(X).reshape(-1,T)
Y = np.array(Y)
N = len(X)
print("X.shape", X.shape, "Y.shape", Y.shape)
#output --> X.shape (8492, 21) Y.shape (8492,)
Then I create my train and test datasets like this:
Xtrain, Ytrain = X[:-Ntest], Y[:-Ntest]
Xtest, Ytest = X[-Ntest:], Y[-Ntest:]
# example of model:
lr = LogisticRegression()
lr.fit(Xtrain, Ytrain)
print(lr.score(Xtrain, Ytrain))
print(lr.score(Xtest, Ytest))
Does anyone have a suggestion for how to adjust this code for a model with lagged variables from multiple columns? For example:
df[['BTCLogReturn','ETHLogReturn']]
Many thanks for your help!
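One possible way to extend the single-column loop above to several columns is to window a 2-D array instead of a 1-D series and then flatten each window, so that every row of X holds T lags of each feature. The sketch below is only an illustration under some assumptions: the helper name make_lagged_xy is hypothetical, the dataframe is synthetic random data standing in for the real one, the target is still the sign of BTCLogReturn, and NumPy 1.20 or newer is needed for sliding_window_view.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def make_lagged_xy(df, feature_cols, target_col, T=21):
    """Hypothetical helper: T lags of every feature column, next-step target."""
    # Drop the first row, where the log returns are NaN (as in the question).
    data = df[feature_cols].to_numpy()[1:]              # shape (N, F)
    target = (df[target_col].to_numpy()[1:] > 0) * 1    # 1 = up, 0 = down

    # windows[t, f, :] == data[t:t+T, f]; shape (N - T + 1, F, T)
    windows = sliding_window_view(data, window_shape=T, axis=0)
    windows = windows[:-1]  # the last window has no following target

    # Flatten each window: first T columns are feature 0, next T are feature 1, ...
    X = windows.reshape(len(windows), -1)   # shape (N - T, F * T)
    Y = target[T:]                          # Y[t] == target[t + T]
    return X, Y


# Synthetic stand-in for the real dataframe (assumption, not the asker's data).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "BTCLogReturn": rng.normal(size=500),
    "ETHLogReturn": rng.normal(size=500),
})
df.iloc[0] = np.nan  # the first log return is undefined

T = 21
Ntest = 100  # smaller than 252 only because the synthetic data is short
X, Y = make_lagged_xy(df, ["BTCLogReturn", "ETHLogReturn"], "BTCLogReturn", T=T)

# Same chronological split as in the single-feature version.
Xtrain, Ytrain = X[:-Ntest], Y[:-Ntest]
Xtest, Ytest = X[-Ntest:], Y[-Ntest:]
print("X.shape", X.shape, "Y.shape", Y.shape)
```

With a single column in feature_cols this should reproduce the X and Y built by the original loop, and the resulting Xtrain, Ytrain, Xtest and Ytest can be passed to LogisticRegression in the same way as before.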
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
