Multivariate LSTM model to predict future price data with features - Keras
I am relatively new to machine learning. My current project takes data in 6-hour intervals for the past 50 days (300 rows of data), indexes it by datetime, and tries to predict prices. I have 4 columns of data: the average high price, average low price, high price volume, and low price volume. I want to use these columns as features to predict the average low price every 6 hours for a certain number of days. I have been following guides, which led me to this multivariate LSTM model. It has some errors. This is my code:
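import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
#assumption: Keras is used through TensorFlow, as the error message below suggests
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau, ModelCheckpoint, TensorBoard
df=pd.read_csv('clawdata.csv')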
df['datetime']=pd.to_datetime(df['datetime'],format='%d-%m-%y %H:%M:%S')
df=df.set_index('datetime')
print(df)
df.sort_index(inplace=True)
cols = list(df)[:5]
#removing commas
df = df[cols].astype(str)
for i in cols:
    df[i] = df[i].str.replace(',','')
#convert back to float
df = df.astype(float)
#using multiple predictors/making data into matrix form
training_set = df.values
print("Shape of training set == {}.".format(df.shape))
#different feature have different measurements, scale them all to 1 scale
sc = StandardScaler()
training_set_scaled = sc.fit_transform(training_set)#scaling for all features that should be used as predictors
sc_predict = StandardScaler()
print(sc_predict.fit_transform(training_set[:, 1:2]))#scaling for avgLowPrice
print(training_set)
#creating data structure
x_train = []
y_train = []
n_future_time_intervals = 30#number of 6 hour time intervals we want to predict into the future.
n_past_time_intervals = 100#number of 6 hour time intervals we want to use to predict the future.
print(len(training_set_scaled))
for i in range(n_past_time_intervals, len(training_set_scaled)-n_future_time_intervals + 1):
    x_train.append(training_set_scaled[i - n_past_time_intervals:i, 0:df.shape[1]-1])##********
    y_train.append(training_set_scaled[i + n_future_time_intervals:i + n_future_time_intervals, 0])
x_train,y_train = np.array(x_train), np.array(y_train)
print("x_train shape == {}".format(x_train.shape))
print("y_train shape == {}".format(y_train.shape))
model = Sequential()
model.add(LSTM(64, return_sequences=True,input_shape=(n_past_time_intervals,df.shape[1]-1)))
model.add(LSTM(32,return_sequences=False))
model.add(Dropout(0.25))
model.add(Dense(13, activation='linear'))#receives input from all neurons of its previous layer
model.compile(optimizer= 'adam',loss='mean_squared_error')
print(model.summary())
es = EarlyStopping(monitor='val_loss',min_delta=1e-10,patience=10,verbose=1)
rlr = ReduceLROnPlateau(monitor='val_loss',factor=0.5,patience=10,verbose=1)
mcp = ModelCheckpoint(filepath='weights.h5',monitor='val_loss',verbose=1,save_best_only=True,save_weights_only=True)
tb = TensorBoard('logs')
history = model.fit(x_train,y_train, shuffle=True, epochs=15, callbacks=[es,rlr,mcp,tb],validation_split=0.2,verbose=1,batch_size=256)
- I am struggling to understand the section of code in which I append values to my x_train and y_train.
- When I run the code to test whether it works, I get this error: "Dimensions must be equal but are 13 and 0 for '{{node mean_squared_error/SquaredDifference}} = SquaredDifference[T=DT_FLOAT](sequential/dense/BiasAdd, IteratorGetNext:1)' with input shapes: [?,13], [?,0].". I am puzzled as to what is going on here, as I followed a guide accomplishing a similar goal and tried applying it to my dataset, and the result is not what I was expecting.
- Is this code just taking a certain % of past data to predict the rest of the past data, or can I use it for future days like I wanted?
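To make the three points above concrete, here is a minimal sketch of the windowing step on synthetic data. It is an illustration under assumptions, not a verified fix for the real dataset: the helper `make_windows` and its `target_col` parameter are hypothetical names, the random array stands in for `training_set_scaled`, and the target is assumed to be column 1 because the code scales `training_set[:, 1:2]` as avgLowPrice. Each sample pairs the previous `n_past` rows with the next `n_future` target values. The sketch also shows that a slice whose start and stop are identical, as in the `y_train.append(...)` line, is empty, which would explain the `[?,0]` shape in the error.

```python
import numpy as np


def make_windows(data, n_past, n_future, target_col):
    """Slice a (timesteps, features) array into supervised-learning samples.

    Each sample pairs the previous `n_past` rows (all features) with the
    next `n_future` values of the target column.
    """
    x, y = [], []
    # i marks the first step to be predicted; stop while a full horizon remains.
    for i in range(n_past, len(data) - n_future + 1):
        x.append(data[i - n_past:i, :])             # rows i-n_past .. i-1
        y.append(data[i:i + n_future, target_col])  # rows i .. i+n_future-1
    return np.array(x), np.array(y)


# Synthetic stand-in for the scaled training set: 300 rows, 4 features.
rng = np.random.default_rng(0)
scaled = rng.normal(size=(300, 4))

n_past, n_future = 100, 30
x_train, y_train = make_windows(scaled, n_past, n_future, target_col=1)
print(x_train.shape)  # (171, 100, 4)
print(y_train.shape)  # (171, 30)

# The slice used in the question has identical start and stop, so it is empty.
i = n_past
empty = scaled[i + n_future:i + n_future, 0]
print(empty.shape)  # (0,)

# Input for a genuine forecast: the most recent n_past rows, with a batch axis.
x_future = scaled[-n_past:][np.newaxis, :, :]
print(x_future.shape)  # (1, 100, 4)
```

With targets shaped `(samples, n_future)`, the final `Dense` layer would need `n_future` units (30 here, not 13) for the mean squared error to line up. Training only ever sees windows cut from past data, but once the model is fitted, passing something like `x_future` to `model.predict` should yield the next `n_future` intervals beyond the end of the data, in scaled units that `sc_predict.inverse_transform` can map back to prices.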
Here is the csv for the dataframe created at the start of the code: https://www.filemail.com/d/vxpranwaldbpchu.
This is a personal project to help me learn supervised learning. The best way I could think of to go about it was predicting prices of items, trying to understand how someone else implemented something similar, and seeing how I can apply that to my situation. If you can't answer certain questions, pointing me in the right direction to achieve what I want would be much appreciated :) . FYI, the data is for prices of items in a game I play :D
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
