Convert a TensorFlow dataset to a Python list of strings

Consider the following code:

import numpy as np
import tensorflow as tf

simple_data_samples = np.array([
         [1, 1, 1, -1, -1],
         [2, 2, 2, -2, -2],
         [3, 3, 3, -3, -3],
         [4, 4, 4, -4, -4],
         [5, 5, 5, -5, -5],
         [6, 6, 6, -6, -6],
         [7, 7, 7, -7, -7],
         [8, 8, 8, -8, -8],
         [9, 9, 9, -9, -9],
         [10, 10, 10, -10, -10],
         [11, 11, 11, -11, -11],
         [12, 12, 12, -12, -12],
])

def timeseries_dataset_multistep_combined(features, label_slice, input_sequence_length, output_sequence_length, batch_size):
    feature_ds = tf.keras.preprocessing.timeseries_dataset_from_array(features, None, input_sequence_length + output_sequence_length, batch_size=batch_size)

    def split_feature_label(x):
        x=tf.strings.as_string(x)

        return x[:, :input_sequence_length, :], x[:, input_sequence_length:, label_slice]

    feature_ds = feature_ds.map(split_feature_label)

    return feature_ds

ds = timeseries_dataset_multistep_combined(simple_data_samples, slice(None, None, None), input_sequence_length=4, output_sequence_length=2, batch_size=1)

def print_dataset(ds):
    for inputs, targets in ds:
        print("---Batch---")
        print("Feature:", inputs.numpy())
        print("Label:", targets.numpy())
        print("")



print_dataset(ds)

The TensorFlow dataset "ds" yields pairs of inputs and targets. I would like to convert this dataset to a Python list with the following properties:

Index  Type  Size  Value
0      str   13    1 2 3 4        5 6
1      str   13    1 2 3 4        5 6
2      str   13    1 2 3 4        5 6
3      str   13    -1 -2 -3 -4    -5 -6
4      str   13    -1 -2 -3 -4    -5 -6
5      str   13    2 3 4 5        6 7
.... and so on

In the above example, we hypothetically created a Python list of strings. In the "Value" column, the inputs of the TensorFlow dataset are on the left-hand side (e.g. 1 2 3 4, with a single space between the values) and the corresponding targets are on the right-hand side (e.g. 5 6, again separated by a single space). It is important to note that a horizontal tab "\t" separates the inputs from the targets (e.g. 1 2 3 4.\t5 6.)
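To make the target format concrete, here is a minimal sketch in plain Python that builds the same list without TensorFlow. The helper names format_row and rows_from_samples are hypothetical, and the sketch assumes a window stride of 1 over the rows, with one output string per feature column:

```python
def format_row(inputs, targets):
    """Join inputs and targets with single spaces, separated by one tab."""
    def to_text(value):
        # String tensors converted with .numpy() hold bytes, so decode those
        return value.decode("utf-8") if isinstance(value, bytes) else str(value)

    left = " ".join(to_text(v) for v in inputs)
    right = " ".join(to_text(v) for v in targets)
    return left + "\t" + right


def rows_from_samples(samples, input_len, output_len):
    """Slide a window over the rows and emit one string per feature column."""
    window = input_len + output_len
    rows = []
    for start in range(len(samples) - window + 1):
        chunk = samples[start:start + window]
        # zip(*chunk) transposes the window: one tuple per feature column
        for column in zip(*chunk):
            rows.append(format_row(column[:input_len], column[input_len:]))
    return rows


# Same values as simple_data_samples, written as nested lists
samples = [[i, i, i, -i, -i] for i in range(1, 13)]
rows = rows_from_samples(samples, input_len=4, output_len=2)
print(repr(rows[0]))  # '1 2 3 4\t5 6'
print(repr(rows[3]))  # '-1 -2 -3 -4\t-5 -6'
print(repr(rows[5]))  # '2 3 4 5\t6 7'
```

The rows at index 0, 3 and 5 match the corresponding entries in the table above.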

How would I code this?



Solution 1:[1]

I adapted your print_dataset function so that it builds the list:

def print_dataset(ds):

    list_sets = []

    for inputs, targets in ds:

        # batch_size=1, so take the single element of the batch and
        # transpose it to get one row per feature column
        input_rows = np.transpose(inputs.numpy()[0])
        label_rows = np.transpose(targets.numpy()[0])

        for input_set, label_set in zip(input_rows, label_rows):

            # tf.strings.as_string yields bytes, so decode each value
            line = " ".join(value.decode("utf-8") for value in input_set)

            line += "\t" # add the tab

            line += " ".join(value.decode("utf-8") for value in label_set)

            # print(line) # prints each line individually
            list_sets.append(line)

    print(list_sets) # prints the whole list
    return list_sets

Ignore that the printed list shows "\t" instead of an actual tab. When Python prints a list, it displays each string in its repr form, where a tab appears as the escape sequence "\t". The strings themselves contain a real tab character, so if you print the individual lines everything works fine.
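A short sketch of the difference, using a hand-written example string rather than the dataset output:

```python
row = "1 2 3 4\t5 6"

# Printing a list shows each element's repr, so the tab appears escaped
print([row])      # ['1 2 3 4\t5 6']

# Printing the string itself emits the real tab character
print(row)

# The tab counts as one character: 7 + 1 + 3
print(len(row))   # 11
```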

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 gerda die gandalfziege