I'm getting the error "Inputs to a layer should be tensors" when using tf.data.Dataset and the window creation function
The problem I'm stuck on is an error in the fit method when trying to train a neural network on a dataset produced by the tf.data.Dataset.window transformation. My training dataset is too big to fit in memory, so I have to train on data arranged into windows, and I load the data with tf.data.experimental.CsvDataset.
Each row of the dataset is a sequence of numeric values: the first 7 values are labels and the next 100 are features. Only one of the label values is actually used; the remaining 6 are omitted and serve only for additional experiments with training quality.
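For illustration, a small compatible file can be generated with a hypothetical helper like this (synthetic values, purely to make the snippets below reproducible):
# Hypothetical helper: writes a gzipped, ';'-delimited CSV with the same layout
# (7 label columns followed by 100 feature columns per row).
import csv
import gzip
import random

def write_demo_file(path, n_rows=50, n_cols=107):
    with gzip.open(path, "wt", newline="") as f:
        writer = csv.writer(f, delimiter=";")
        for _ in range(n_rows):
            writer.writerow([round(random.uniform(-1.0, 2.0), 4) for _ in range(n_cols)])

write_demo_file("demo_train.csv.gz")
train_file_list = ["demo_train.csv.gz"]  # hypothetical stand-in for the real file list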
import tensorflow as tf
from tensorflow import keras

XLength = 107  # total number of columns per row: 7 labels + 100 features
PLength = 7    # number of leading label columns (used by the row-packing functions below)
types = [tf.constant(0, dtype=tf.float32)]  # one float32 record_default per column
ds = tf.data.experimental.CsvDataset(train_file_list, types * XLength, header=False,
                                     field_delim=";", compression_type="GZIP")
The pack_row function extracts the value at index 3 of each row as the label and the last 100 values as the features:
def pack_row(*row):
    label = row[3]
    # rows arrive batched, so each column is a vector; stack the feature columns along axis 1
    features = tf.stack(row[PLength:XLength], axis=1)
    return features, label
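As a quick sanity check on synthetic data (shapes only), mapping pack_row over a small batched dataset yields (features, label) pairs of the expected shapes:
# Five synthetic records with XLength scalar columns each.
demo = tf.data.Dataset.from_tensor_slices(
    tuple(tf.random.normal(shape=(5,)) for _ in range(XLength))
)
for features, label in demo.batch(5).map(pack_row).take(1):
    print(features.shape, label.shape)  # (5, 100) (5,)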
Next, I create a dataset whose rows are split into features and labels, and apply the window transformation:
window_ds_train = ds.batch(1000).map(pack_row, num_parallel_calls=4).unbatch().window(10, shift=1, drop_remainder=True)
The features component of the first window looks like this:
for x in window_ds_train.take(1):
    for n in x[0]:
        print(n)
tf.Tensor(
[1.1039783 1.1163003 1.1081576 1.1117266 1.1180297 1.2345679 1.3053098
1.3443557 1.3639535 1.26 1.2604042 1.1780168 1.1761158 1.2451861
1.4478064 1.4914197 1.35623 1.4864376 1.4237918 1.4029851 1.434866
1.1298449 1.0216535 1.0060976 1.0190678 1.0550661 0.99117 0.8632287
0.7545455 0.7396314 0.7372093 0.7226107 0.7727273 0.766129 1.0083683
1.5096774 1.4933333 1.2517985 1.537037 1.6262627 1.5851064 1.2197802
1.1764706 1.6491228 4.631579 5.25 4.7 4.3333335 4.
3.5714285 0.28 0.25 0.2307692 0.212766 0.1904762 0.2159091
0.606383 0.85 0.8198198 0.6308725 0.6149068 0.6506024 0.7988506
0.6696429 0.6623932 0.9917012 1.3052632 1.2941177 1.383871 1.3564669
1.3520249 1.3253012 1.1584415 1.0089086 0.9478079 0.981289 0.9939394
0.9788054 0.8850772 0.6969292 0.7127659 0.7023498 0.6727494 0.7373381
0.6705021 0.6907001 0.8030928 0.8502564 0.8488844 0.7933962 0.7936508
0.7331628 0.7438507 0.7661017 0.81 0.8944306 0.8995017 0.9023987
0.8958163 0.9058149], shape=(100,), dtype=float32)
tf.Tensor(
[1.0480686 1.0768552 1.0823635 1.0807899 1.0946314 1.1049724 1.0976744
1.1112158 1.1066037 1.0180608 1.0143541 1.0478215 1.1168385 1.1465721
1.1544029 1.1672772 1.0481482 1.0198511 0.9598997 1.0053476 1.1888889
0.9557377 0.8722689 0.9482759 0.948718 0.9485149 0.9144603 0.7938144
0.6960168 0.6963124 0.7188209 0.7328605 0.6848341 0.686747 0.589242
0.5806451 0.5614035 0.4371859 0.483965 0.4721408 0.7163461 0.8951613
0.8403361 0.8703704 1.1428572 0.9264706 0.7460318 0.65 0.5925926
0.9615384 1.04 1.6875 1.5384616 1.3404255 1.0793651 0.875
1.1489362 1.19 1.1171172 1.3959732 2.1180124 2.066265 2.2873564
1.78125 1.7222222 1.6970954 1.4561404 1.4602076 1.3645161 1.3911672
1.4361371 1.436747 1.2597402 1.0935411 1.0542798 1.054054 1.0545454
1.1464355 1.0463122 0.8411215 0.9946808 1.0417755 0.9805353 0.9540636
0.8566946 0.8662487 0.872165 0.8953846 0.9543611 0.9858491 0.9822596
0.9036658 0.8999152 0.9110169 0.905 0.9135495 0.9252492 0.9239041
0.9286301 0.954136 ], shape=(100,), dtype=float32)
I had to omit part of the output because it is too long; each window has shape (10, 100).
The labels look like this:
for x in window_ds_train.take(1):
    for n in x[1]:
        print(n)
tf.Tensor(-0.21, shape=(), dtype=float32)
tf.Tensor(-0.22, shape=(), dtype=float32)
tf.Tensor(-0.22, shape=(), dtype=float32)
tf.Tensor(-0.22, shape=(), dtype=float32)
tf.Tensor(-0.19, shape=(), dtype=float32)
tf.Tensor(-0.19, shape=(), dtype=float32)
tf.Tensor(-0.19, shape=(), dtype=float32)
tf.Tensor(-0.19, shape=(), dtype=float32)
tf.Tensor(-0.19, shape=(), dtype=float32)
tf.Tensor(-0.19, shape=(), dtype=float32)
Next, I would like to apply a flat_map transformation to the dataset, but when I try to execute:
flatten = window_ds_train.flat_map(lambda x:x.batch(10))
of course I get an error: TypeError: <lambda>() takes 1 positional argument but 2 were given, since the windowed dataset yields (features, labels) pairs and a single-argument lambda cannot receive both components. The model I'm trying to train looks like this:
inputs = keras.Input(shape=(100,))
x = keras.layers.Dense(204, activation='relu')(inputs)
x = keras.layers.Dropout(0.2)(x)
x = keras.layers.Dense(400, activation='relu')(x)
x = keras.layers.Dropout(0.2)(x)
x = keras.layers.Dense(400, activation='relu')(x)
x = keras.layers.Dropout(0.2)(x)
x = keras.layers.Dense(204, activation='relu')(x)
x = keras.layers.Dropout(0.2)(x)
x = keras.layers.Dense(102, activation='relu')(x)
x = keras.layers.Dropout(0.2)(x)
outputs = keras.layers.Dense(10)(x)
model = keras.Model(inputs, outputs)
model.compile(optimizer=tf.keras.optimizers.Adam(), loss='mse', metrics=['mae'])
If, under these circumstances, I run training:
model.fit(window_ds_train, epochs=1, verbose=1)
then I get an error: TypeError: Inputs to a layer should be tensors. Got: <_VariantDataset shapes: (100,), types: tf.float32>
I understand from this that the inputs to a layer must be tensors, while each window component here is a _VariantDataset, which is not acceptable. To work around the problem, I tried to split the dataset into features and labels and process them in separate flat_map pipelines. For this I introduced two additional functions: the first returns the label, the second the features:
def label_row(*row):
    label = row[3]
    return label

def features_row(*row):
    features = tf.stack(row[PLength:XLength], axis=1)
    return features
Next, I build windowed datasets for the features and the labels separately; each is constructed the same way as window_ds_train above, and then flattened:
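# (Reconstructed for completeness: the same pipeline as before, but mapping
# features and labels through their separate functions.)
feature_window_ds_train = ds.batch(1000).map(features_row, num_parallel_calls=4).unbatch().window(10, shift=1, drop_remainder=True)
label_window_ds_train = ds.batch(1000).map(label_row, num_parallel_calls=4).unbatch().window(10, shift=1, drop_remainder=True)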
feature_flatten = feature_window_ds_train.flat_map(lambda x:x.batch(10))
label_flatten = label_window_ds_train.flat_map(lambda x:x.batch(10))
When trying to train a model:
history = model.fit(feature_flatten, label_flatten, epochs=1, verbose=1)
I get the error: y argument is not supported when using dataset as input
Clearly, the model expects a single dataset that yields both x and y together, while here I am passing x separately from y, which is not allowed. If someone has ideas on how to train a model on input produced by Dataset.window, I would be very grateful for clarification.
Solution 1:[1]
Let's first create a dataset compatible with your model:
N = 50
c = 1
ds = tf.data.Dataset.from_tensor_slices(
    (
        tf.random.normal(shape=(N, c, 100)),
        tf.random.normal(shape=(N, c))
    )
)
Then we can simply
model.fit(ds, epochs=1)
But notice that the return type of window is not the same as that of the initial dataset: ds is a dataset of tuples, while each element produced by window is a tuple of _VariantDatasets.
print(ds)
# <TensorSliceDataset shapes: ((1, 100), (1,)), types: (tf.float32, tf.float32)>

for dsw in ds.window(30):
    print(dsw)
# (<_VariantDataset shapes: (1, 100), types: tf.float32>, <_VariantDataset shapes: (1,), types: tf.float32>)
# (<_VariantDataset shapes: (1, 100), types: tf.float32>, <_VariantDataset shapes: (1,), types: tf.float32>)
To get windows that keep the same element type as the original dataset, you can combine skip and take:
def simple_window(ds, size):
    # cardinality() returns a scalar tensor; convert it to int for use with range()
    for start in range(0, int(ds.cardinality()), size):
        yield ds.skip(start).take(size)
Then you can train on the different windows:
for dsw in simple_window(ds, 30):
    model.fit(dsw, epochs=1)
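Note that ds.skip(start) still iterates over and discards the skipped elements, so each successive window is more expensive to produce; for a large CSV-backed pipeline this cost can add up.
Alternatively (a sketch of a standard tf.data pattern, not part of the original answer), the paired windows from the question can be flattened directly. flat_map unpacks the (features, labels) structure into separate arguments, so a two-argument lambda can batch each component window and zip the two back together, assuming the window size of 10 used in the question:
# Flatten the windowed dataset back into (features, labels) tensor pairs.
flatten = window_ds_train.flat_map(
    lambda x, y: tf.data.Dataset.zip((x.batch(10), y.batch(10)))
)
for features, labels in flatten.take(1):
    print(features.shape, labels.shape)  # (10, 100) (10,)
Each element of flatten is then an ordinary pair of tensors rather than _VariantDatasets, which is what the layers expect.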
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Bob |
