How to Predict Future Values Using LSTM?

I am fairly new to time series forecasting and deep learning. I have a solar irradiation dataset and I am working in Jupyter Notebook. I divided the data into three parts (train, validation, and test), trained the model, and got predictions on the test dataset. The dataset covers 2010 to 2020 at an hourly resolution. I want to make future predictions, for example from 2021 to 2024. This is what the dataset and the current plot look like:

[Image: Data and plot of the graph from 2010 to 2020]

These are the predictions I made on the train, validation, and test datasets:

[Images: Train plot, Validation plot, Test plot]

But I am unable to make future predictions. I extended the dataset for future prediction, but I am not able to make the prediction:

[Image: Extended_data]

Also, how should I handle these NaN values?

## Made a function to build input windows, with a window size of 24


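import numpy as np
import pandas as pd


def df_to_X_y(df, window_size=24):
    # Convert the series to a NumPy array
    df_as_np = df.to_numpy()
    X = []
    y = []
    # Each sample is window_size consecutive values; its label is the value right after them
    for i in range(len(df_as_np) - window_size):
        row = [[a] for a in df_as_np[i:i + window_size]]
        X.append(row)
        label = df_as_np[i + window_size]
        y.append(label)
    return np.array(X), np.array(y)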
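For reference, the same windows can be built without a Python-level loop. This is a minimal sketch assuming a 1-D series and NumPy 1.20 or newer (for `sliding_window_view`); `df_to_X_y_fast` is a hypothetical name, not part of the original code. Each sample `X[i]` holds values `i` to `i + window_size - 1`, and its label `y[i]` is value `i + window_size`.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def df_to_X_y_fast(series, window_size=24):
    """Vectorized equivalent of df_to_X_y for a 1-D series (hypothetical helper)."""
    values = np.asarray(series, dtype=float)
    # All windows of length window_size: shape (n - window_size + 1, window_size)
    windows = sliding_window_view(values, window_size)
    # Drop the last window: it has no following value to use as a label
    X = windows[:-1][..., np.newaxis]  # shape (n - window_size, window_size, 1)
    y = values[window_size:]           # shape (n - window_size,)
    return X, y


# Tiny example: 30 hourly values with a 24-hour window
demo = pd.Series(np.arange(30))
X_demo, y_demo = df_to_X_y_fast(demo, window_size=24)
print(X_demo.shape, y_demo.shape)  # (6, 24, 1) (6,)
print(y_demo[0])                   # 24.0, the value right after the first window
```

`sliding_window_view` returns a read-only view of the data; wrap it in `np.array(...)` if a writable copy is needed.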


WINDOW_SIZE = 24
x,y = df_to_X_y(Irr,WINDOW_SIZE)
x.shape,y.shape

## Split the data

X_train, y_train = x[:80000], y[:80000]
X_val , y_val = x[80000:90000] , y[80000:90000]
X_test , y_test = x[90000:] , y[90000:]
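The split above uses hard-coded positions. A sketch that derives the same kind of chronological (unshuffled) split from fractions, so it adapts if the dataset length changes; the 80/10/10 fractions and the `chronological_split` name are assumptions for illustration:

```python
import numpy as np


def chronological_split(X, y, train_frac=0.8, val_frac=0.1):
    """Split windowed samples in time order: train, then validation, then test."""
    n = len(X)
    train_end = round(n * train_frac)
    val_end = train_end + round(n * val_frac)
    return (
        (X[:train_end], y[:train_end]),
        (X[train_end:val_end], y[train_end:val_end]),
        (X[val_end:], y[val_end:]),
    )


X = np.zeros((100, 24, 1))
y = np.arange(100)
train_set, val_set, test_set = chronological_split(X, y)
print(len(train_set[0]), len(val_set[0]), len(test_set[0]))  # 80 10 10
```

Keeping the order intact matters here: shuffling before splitting would let the model see windows from the "future" during training.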

## Model 
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import *
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.losses import MeanSquaredError
from tensorflow.keras.metrics import RootMeanSquaredError
from tensorflow.keras.optimizers import Adam

model1 = Sequential()
model1.add(InputLayer((24, 1)))
model1.add(LSTM(64))
model1.add(Dense(8, 'relu'))
model1.add(Dense(1, 'linear'))

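model1.summary()

# Keep the best model (lowest validation loss) seen during training; the file name is an assumption
cp = ModelCheckpoint('model1.keras', save_best_only=True)
model1.compile(loss=MeanSquaredError(), optimizer=Adam(), metrics=[RootMeanSquaredError()])

model1.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=50, callbacks=[cp])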

train_predictions = model1.predict(X_train).flatten()
train_results = pd.DataFrame(data={'Train Predictions':train_predictions, 'Actuals':y_train})
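# Label i is row i + WINDOW_SIZE of df, so the dates are shifted by the window size
train_results.index = pd.to_datetime(df['Date'][WINDOW_SIZE:WINDOW_SIZE + 80000], format='%Y-%m-%d %H:%M:%S')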
train_results.head(125)

## Val predictions
val_predictions = model1.predict(X_val).flatten()
val_results = pd.DataFrame(data={'Val Predictions':val_predictions, 'Actuals':y_val})
val_results.index = pd.to_datetime(df['Date'][WINDOW_SIZE + 80000:WINDOW_SIZE + 90000], format='%Y-%m-%d %H:%M:%S')
val_results

## Now Testing on the Test dataset

test_predictions = model1.predict(X_test).flatten()
test_results = pd.DataFrame(data={'Predictions':test_predictions, 'Actuals':y_test})
test_results.index = pd.to_datetime(df['Date'][WINDOW_SIZE + 90000:], format='%Y-%m-%d %H:%M:%S')

test_results
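Because each label `y[i]` is the value at row `i + WINDOW_SIZE` of the original data, the timestamp of each prediction is offset by the window size as well. A minimal sketch of building a predictions/actuals frame with that alignment, using synthetic hourly data and a stand-in array in place of `model1.predict(...)`; `make_results` is a hypothetical helper and `dates` is assumed to be a Series such as `df['Date']`:

```python
import numpy as np
import pandas as pd


def make_results(dates, preds, actuals, start, window_size=24):
    """Pair predictions with the timestamps of the rows they predict.

    Sample i of a windowed split that begins at position `start`
    predicts row start + i + window_size of the original data.
    """
    first = start + window_size
    index = pd.DatetimeIndex(pd.to_datetime(dates.iloc[first:first + len(preds)]))
    return pd.DataFrame(
        {"Predictions": np.asarray(preds), "Actuals": np.asarray(actuals)},
        index=index,
    )


# Synthetic example: 48 hourly rows and a 24-hour window give 24 labelled samples
dates = pd.Series(pd.date_range("2020-12-30 00:00", periods=48, freq="h"))
values = np.arange(48, dtype=float)
actuals = values[24:]
preds = actuals + 0.5  # stand-in for model1.predict(X).flatten()
res = make_results(dates, preds, actuals, start=0)
print(res.index[0])  # 2020-12-31 00:00:00
```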

## Future Predictions
test_results_last_24 = test_results['Predictions'].iloc[-24:]  ## Taking the last 24 values
test_results_last_24

n_steps = 24    # window size the model was trained on
n_features = 1  # one variable (irradiation) per time step

# Use NumPy's array, not the standard-library array module
x_input = np.array(test_results_last_24)

temp_input = list(x_input)
lst_output = []
i = 0
while i < 10:
    # Always feed the model the most recent n_steps values
    x_input = np.array(temp_input[-n_steps:]).reshape((1, n_steps, n_features))
    print("{} hour input {}".format(i, x_input.flatten()))
    yhat = model1.predict(x_input, verbose=0)
    print("{} hour output {}".format(i, yhat))
    # Append the prediction so it becomes part of the next input window
    temp_input.append(yhat[0][0])
    lst_output.append(yhat[0][0])
    i = i + 1


print(lst_output)
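The recursive loop produces one value per hour, so covering 2021 through 2024 takes far more iterations than the 10 used above, and errors can compound because each prediction feeds the next window. A sketch, assuming the data ends at 2020-12-31 23:00 and is strictly hourly, of building the timestamps for those steps and attaching them to the forecasts; the placeholder `forecasts` array stands in for `lst_output`:

```python
import numpy as np
import pandas as pd

# Hourly timestamps from 2021-01-01 00:00 through 2024-12-31 23:00 (both inclusive)
future_index = pd.date_range("2021-01-01 00:00", "2024-12-31 23:00", freq="h")
print(len(future_index))  # 35064 = (365 * 3 + 366) days * 24 hours

# Placeholder forecasts; in the notebook this would be lst_output
# after running the loop len(future_index) times.
forecasts = np.zeros(len(future_index))
future_results = pd.Series(forecasts, index=future_index, name="Forecast")
print(future_results.index[0], future_results.index[-1])
```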

Getting the error:

TypeError                                 Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_13296/1377391710.py in <module>
----> 1 x_input = array(test_results_last_24)
  2 temp_input=list(x_input)
  3 lst_output=[]
  4 i=0
  5 n_features = 1

TypeError: array() argument 1 must be a unicode character, not Series
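The message `array() argument 1 must be a unicode character` is what Python's built-in `array.array` constructor raises when its first argument (the type code) is not a one-character string. That suggests `array` in the notebook came from the standard-library `array` module rather than from NumPy; this is an inference from the message, since the import itself is not shown. A minimal reproduction and the NumPy-based fix:

```python
from array import array  # standard library, NOT numpy.array

import numpy as np
import pandas as pd

last_24 = pd.Series(np.linspace(0.0, 1.0, 24))

try:
    array(last_24)  # stdlib array expects a type code such as 'd' first
except TypeError as exc:
    print("TypeError:", exc)

# NumPy's array (or Series.to_numpy()) accepts a Series directly
x_input = np.array(last_24)
print(x_input.shape)  # (24,)
```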

This is what I have used



Solution 1:[1]

I don't know how you extended the dataset, but I think it should work like this. In your existing setup, the solar irradiation data of the previous 24 hours is used to predict the next hour. Your data runs from 2010 to 2020, so you can use the data of the last day of 2020 to predict the first hour of January 1, 2021. Then you use the last 23 hours of 2020 (true values) together with the first hour of January 1, 2021 (the predicted value) to predict the second hour, and so on. NaN values should not normally appear in your model's predictions; if they do, you need to retrain your model.
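The procedure described above (slide the 24-hour window forward, feeding each prediction back in as the newest input) can be sketched as a function that fills the NaN rows of an extended hourly series in order. This is a hypothetical helper, not a Keras API: `predict_fn` stands in for the trained model and must accept an array of shape `(1, window_size, 1)`; the toy `mean_predictor` exists only so the example runs without a trained model.

```python
import numpy as np
import pandas as pd


def fill_future(extended, predict_fn, window_size=24):
    """Recursively fill NaN values of an hourly series, earliest first.

    Each NaN is replaced by predict_fn applied to the previous window_size
    values, which may be true values or earlier predictions.
    """
    values = extended.to_numpy(dtype=float, copy=True)
    for pos in np.flatnonzero(np.isnan(values)):
        if pos < window_size:
            raise ValueError("not enough history before the first NaN")
        window = values[pos - window_size:pos].reshape(1, window_size, 1)
        values[pos] = float(np.ravel(predict_fn(window))[0])
    return pd.Series(values, index=extended.index, name=extended.name)


def mean_predictor(window):
    """Toy stand-in for model1.predict: returns the window mean, shape (1, 1)."""
    return np.array([[window.mean()]])


# 24 known hours at the end of 2020, then 3 unknown hours of 2021
index = pd.date_range("2020-12-31 00:00", periods=27, freq="h")
extended = pd.Series(
    np.concatenate([np.arange(24, dtype=float), np.full(3, np.nan)]),
    index=index,
    name="Irradiation",
)
result = fill_future(extended, mean_predictor)
print(result.tail(3))
```

With the trained network, one option is `fill_future(extended, lambda w: model1.predict(w, verbose=0))`. For many single-sample steps, the Keras documentation suggests calling the model directly (for example `model1(w, training=False)`) instead of `predict`, which is faster for inputs that fit in one batch.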

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1