How do I do a validation split on a Keras dataset?
At the moment the code splits the dataset in half, 50% for training and 50% for testing. How can I split the data in other proportions, such as 80/20?
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=top_words)
I have added the validation_split argument to the model.fit call of my CNN model, but that only splits the training dataset and not the whole dataset.
model.fit(X_train, y_train, validation_split=0.2, epochs=3,
          callbacks=[tensorBoardCallback], batch_size=64)
I have found that sklearn's train_test_split() is the common way to do it, but when I put the code from the website into my code I get the error:
----> 9 X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
NameError: name 'X' is not defined
The code responsible:
# Using keras to load the dataset with the top_words
top_words = 10000
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=top_words)
word_index = keras.datasets.imdb.get_word_index()
# Split into training and testing data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
Solution 1:[1]
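import numpy as np
from keras.datasets import imdb
from sklearn.model_selection import train_test_split

top_words = 10000
# Load the predefined 50/50 split, then combine the data and labels.
# X and y must be defined before train_test_split can use them.
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=top_words)
X = np.concatenate((X_train, X_test), axis=0)
y = np.concatenate((y_train, y_test), axis=0)
# Split the combined data into 80% training and 20% testing data.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)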
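If a separate validation set is also wanted, one possible extension of the split above is to call train_test_split twice. The sketch below is only an illustration: it uses a small synthetic array in place of the combined IMDB data so that it runs without a download, and the random_state and stratify arguments, as well as the names X_val and y_val, are additions that do not appear in the original answer.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the combined IMDB arrays (hypothetical data,
# used only so this sketch runs without downloading the dataset).
rng = np.random.default_rng(0)
X = rng.integers(0, 10000, size=(1000, 20))
y = np.array([0, 1] * 500)  # balanced binary labels

# Step 1: hold out 20% of all samples as the test set.
# stratify=y keeps the class proportions the same in both parts,
# and random_state makes the shuffle reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Step 2: carve a validation set out of the remaining training data.
# 25% of the remaining 80% equals 20% of the original data.
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.25, random_state=42, stratify=y_train
)

print(len(X_train), len(X_val), len(X_test))  # 600 200 200
```

With an explicit validation set like this, it could be passed to Keras as model.fit(..., validation_data=(X_val, y_val)) instead of using validation_split, so the test set is never seen during training.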
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | ML_Enthu |
