How to Predict Future Values Using LSTM?
I am fairly new to time series forecasting and deep learning. I have a solar irradiation dataset and I am working in a Jupyter Notebook. The dataset contains hourly values from 2010 to 2020. I split the data into three parts (train, val and test), trained the model, and got predictions on the test set. Now I want to predict future values, for example from 2021 to 2024.
This is what the dataset and the current plot look like:
These are the predictions that I made on the train, val and test datasets.

But I am unable to make future predictions. I extended the dataset to cover the future dates, but I cannot get predictions for them. Also, how should I handle these NaN values?
## Windowing function: each sample is window_size consecutive values, the label is the next value (window size 24)
import numpy as np

def df_to_X_y(df, window_size=24):
    df_as_np = df.to_numpy()
    X = []
    y = []
    for i in range(len(df_as_np) - window_size):
        # wrap each value in a list so X has shape (samples, window_size, 1)
        row = [[a] for a in df_as_np[i:i+window_size]]
        X.append(row)
        label = df_as_np[i+window_size]
        y.append(label)
    return np.array(X), np.array(y)
WINDOW_SIZE = 24
x,y = df_to_X_y(Irr,WINDOW_SIZE)
x.shape,y.shape
## Split the data
X_train, y_train = x[:80000], y[:80000]
X_val , y_val = x[80000:90000] , y[80000:90000]
X_test , y_test = x[90000:] , y[90000:]
## Model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import *
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.losses import MeanSquaredError
from tensorflow.keras.metrics import RootMeanSquaredError
from tensorflow.keras.optimizers import Adam
model1 = Sequential()
model1.add(InputLayer((24, 1)))
model1.add(LSTM(64))
model1.add(Dense(8, 'relu'))
model1.add(Dense(1, 'linear'))
model1.summary()
## cp (a ModelCheckpoint callback) and the model1.compile(...) call are not shown in the post
model1.fit(X_train, y_train,
           validation_data=(X_val, y_val), epochs=50, callbacks=[cp])
train_predictions = model1.predict(X_train).flatten()
train_results = pd.DataFrame(data={'Train Predictions':train_predictions, 'Actuals':y_train})
train_results.index = pd.to_datetime(df['Date'][:80000], format='%Y-%m-%d %H:%M:%S')
train_results.head(125)
## Val predictions
val_predictions = model1.predict(X_val).flatten()
val_results = pd.DataFrame(data={'Val Predictions':val_predictions, 'Actuals':y_val})
val_results.index = pd.to_datetime(df['Date'][80000:90000], format='%Y-%m-%d %H:%M:%S')
val_results
## Now Testing on the Test dataset
test_predictions = model1.predict(X_test).flatten()
test_results = pd.DataFrame(data={'Test Predictions':test_predictions, 'Actuals':y_test})
test_results.index = pd.to_datetime(df['Date'][90000:96408], format='%Y-%m-%d %H:%M:%S')
test_results
## Future Predictions
test_results_last_24 = test_results['Test Predictions'][-24:] ## Taking last 24 values
test_results_last_24
x_input = array(test_results_last_24)
temp_input=list(x_input)
lst_output=[]
i=0
while(i<10):
    if(len(temp_input)>24):
        x_input=array(temp_input[1:])
        print("{} day input {}".format(i,x_input))
        #print(x_input)
        x_input = x_input.reshape((1, n_steps, n_features))
        #print(x_input)
        yhat = model.predict(x_input, verbose=0)
        print("{} day output {}".format(i,yhat))
        temp_input.append(yhat[0][0])
        temp_input=temp_input[1:]
        #print(temp_input)
        lst_output.append(yhat[0][0])
        i=i+1
    else:
        x_input = x_input.reshape((1, n_steps, n_features))
        yhat = model.predict(x_input, verbose=0)
        print(yhat[0])
        temp_input.append(yhat[0][0])
        lst_output.append(yhat[0][0])
        i=i+1
print(lst_output)
I get this error:
TypeError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_13296/1377391710.py in <module>
----> 1 x_input = array(test_results_last_24)
2 temp_input=list(x_input)
3 lst_output=[]
4 i=0
5 n_features = 1
TypeError: array() argument 1 must be a unicode character, not Series
This is what I have used
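The message in the traceback above is the one the standard-library `array.array` constructor gives when its first argument is not a type-code string. That suggests `array` in the notebook came from the `array` module rather than from NumPy. A minimal sketch of the difference, assuming that is the cause (a plain list stands in for the pandas Series `test_results_last_24`):

```python
from array import array as std_array  # stdlib: first argument must be a type code such as 'd'
import numpy as np

# Stand-in for test_results_last_24 (assumption: 24 float values)
last_24 = [float(v) for v in range(24)]

try:
    std_array(last_24)            # wrong constructor for this job
    error_message = None
except TypeError as exc:
    error_message = str(exc)      # same kind of TypeError as in the traceback

# NumPy converts a list (or a pandas Series) to an ndarray directly
x_input = np.asarray(last_24, dtype="float32")
x_input = x_input.reshape((1, 24, 1))  # (batch, n_steps, n_features)

print(error_message)
print(x_input.shape)
```

With a real Series, `test_results_last_24.to_numpy()` would give the same kind of array before the reshape.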
Solution 1:[1]
I don't know how you extended the dataset, but I think it should work like this. In the existing data, the solar irradiation of the previous 24 hours is used to predict the solar irradiation of the next hour, and your data runs from 2010 to 2020. So you can use the last 24 hours of 2020 to predict the first hour of January 1, 2021. Then use the last 23 hours of 2020 (true values) plus the first hour of January 1, 2021 (predicted value) to predict the second hour, and so on. In general, NaN values should not appear in your model's predictions; if they do, you need to retrain your model.
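The rolling procedure described in this answer can be sketched as follows. This is only an illustration under stated assumptions: `recursive_forecast` and `toy_predict` are hypothetical names, and `toy_predict` stands in for the trained model so that the sketch runs without TensorFlow.

```python
import numpy as np

def recursive_forecast(predict_fn, history, n_future, window_size=24):
    """Predict n_future steps, feeding each prediction back into the input window.

    predict_fn is a hypothetical stand-in for the model's predict method:
    it takes an array of shape (1, window_size, 1) and returns shape (1, 1).
    """
    window = list(np.asarray(history, dtype="float32")[-window_size:])
    if len(window) < window_size:
        raise ValueError("history must contain at least window_size values")
    forecasts = []
    for _ in range(n_future):
        x_input = np.asarray(window, dtype="float32").reshape((1, window_size, 1))
        yhat = float(np.asarray(predict_fn(x_input)).reshape(-1)[0])
        forecasts.append(yhat)
        # drop the oldest value and append the new prediction
        window = window[1:] + [yhat]
    return np.array(forecasts)

# Toy predictor (assumption, not the LSTM): returns the mean of the window.
def toy_predict(x):
    return x.mean(axis=1)  # shape (1, 1)

history = np.arange(48, dtype="float32")  # stand-in for the observed series
future = recursive_forecast(toy_predict, history, n_future=3)
print(future)
```

With the trained network, `predict_fn` could be `lambda x: model1.predict(x, verbose=0)` and `history` the last observed irradiation values. Because every step after the first consumes earlier predictions, errors can accumulate, so hourly forecasts several years ahead should be treated with caution.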
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
