Convert a TensorFlow dataset to a Python list of strings
Consider the following code:
import numpy as np
import tensorflow as tf

simple_data_samples = np.array([
    [1, 1, 1, -1, -1],
    [2, 2, 2, -2, -2],
    [3, 3, 3, -3, -3],
    [4, 4, 4, -4, -4],
    [5, 5, 5, -5, -5],
    [6, 6, 6, -6, -6],
    [7, 7, 7, -7, -7],
    [8, 8, 8, -8, -8],
    [9, 9, 9, -9, -9],
    [10, 10, 10, -10, -10],
    [11, 11, 11, -11, -11],
    [12, 12, 12, -12, -12],
])

def timeseries_dataset_multistep_combined(features, label_slice, input_sequence_length, output_sequence_length, batch_size):
    feature_ds = tf.keras.preprocessing.timeseries_dataset_from_array(features, None, input_sequence_length + output_sequence_length, batch_size=batch_size)

    def split_feature_label(x):
        x = tf.strings.as_string(x)
        return x[:, :input_sequence_length, :], x[:, input_sequence_length:, label_slice]

    feature_ds = feature_ds.map(split_feature_label)
    return feature_ds

ds = timeseries_dataset_multistep_combined(simple_data_samples, slice(None, None, None), input_sequence_length=4, output_sequence_length=2,
                                           batch_size=1)

def print_dataset(ds):
    for inputs, targets in ds:
        print("---Batch---")
        print("Feature:", inputs.numpy())
        print("Label:", targets.numpy())
        print("")

print_dataset(ds)
The TensorFlow dataset "ds" yields pairs of inputs and targets. I would like to convert this dataset to a Python list with the following properties:
Index Type Size Value
0 str 13 1 2 3 4 5 6
1 str 13 1 2 3 4 5 6
2 str 13 1 2 3 4 5 6
3 str 13 -1 -2 -3 -4 -5 -6
4 str 13 -1 -2 -3 -4 -5 -6
5 str 13 2 3 4 5 6 7
.... and so on
The example above is a hypothetical Python list of strings. In the "Value" column, the left-hand side holds the inputs from the TensorFlow dataset (e.g. 1 2 3 4, with a single space between the strings) and the right-hand side holds the corresponding targets (e.g. 5 6, also with a single space between the strings). It is important to note that there is a horizontal tab "\t" between the inputs and the targets (e.g. 1 2 3 4.\t5 6.)
How would I code this?
Solution 1:[1]
I adapted your print_dataset function:
def print_dataset(ds):
    list_sets = []
    for inputs, targets in ds:
        # Take the only sample in the batch (batch_size=1) and transpose it
        # so that each row holds the time steps of one feature column.
        input_rows = np.transpose(np.array(inputs)[0])
        label_rows = np.transpose(np.array(targets)[0])
        for input_set, label_set in zip(input_rows, label_rows):
            # String tensors yield bytes, so decode each value before joining.
            line = " ".join(value.decode("utf-8") for value in input_set)
            line += "\t"  # add the tab between inputs and targets
            line += " ".join(value.decode("utf-8") for value in label_set)
            # print(line)  # prints each line individually
            list_sets.append(line)
    print(list_sets)  # prints the whole list
    return list_sets
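Values read from a string tensor arrive in NumPy as Python bytes objects, which is why a b'...' wrapper appears when they are passed to str(). A minimal sketch of the difference, using a made-up value (the variable names are only illustrative and not part of the code above):

```python
# An example value as it would come out of a string tensor's NumPy array
# (assumed for illustration).
value = b"-10"

# str() on a bytes object returns its literal representation, wrapper included.
as_repr = str(value)

# decode() returns the text that the bytes encode.
as_text = value.decode("utf-8")

print(as_repr)
print(as_text)
```

Decoding avoids having to strip the b' and ' characters by hand.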
Don't worry that the printed list shows "\t" rather than an actual tab. When Python prints a list, it shows the repr() of each string, and repr() writes a tab as the escape sequence \t. If you print the lines individually, the tab is rendered as whitespace and everything works fine.
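The difference between the two ways of printing can be checked with plain Python, without TensorFlow. This is a small sketch with one hard-coded example string (the name row is only illustrative):

```python
row = "1 2 3 4\t5 6"

# Inside a list, Python displays each element's repr(), so the tab
# appears as the two characters backslash and t.
print([row])

# Printed on its own, the string is written as-is and the tab is a real tab.
print(row)

# Either way, the string holds exactly one tab character.
print(len(row), row.count("\t"))
```

The stored string is the same in both cases; only the way it is displayed differs.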
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | gerda die gandalfziege |
