Group By and Sort a TensorFlow Dataset
I would like to group the rows of a TensorFlow dataset by a key and select the top k rows in each group by some value. This is easy in, e.g., Pandas or SQL, but not so obvious in TF.
I found group_by_window and group_by_reducer in tf.data.experimental, but I can't figure out how to sort a dataset by a specific column.
Each row of my dataset is a dict. What I am looking for is something like:
from tensorflow.data.experimental import group_by_window

def key_f(row):
    return row['id']

def reduce_func(key, ds):
    # sort by a value - except there is no method like this...
    ds = ds.sort(by='value')
    return ds.take(5)

t = group_by_window(key_func=key_f, reduce_func=reduce_func, window_size=100)
ds = dataset.apply(t)
UPDATE: Here is an example. Let's say I want to group by 'id' and sort by 'start' in each group, all within TF:
import pandas as pd

pd.DataFrame([{'id': 1, 'input_a': 0.0, 'start': 5},
              {'id': 1, 'input_a': 10.0, 'start': 15},
              {'id': 2, 'input_a': 20.0, 'start': 25},
              {'id': 2, 'input_a': 30.0, 'start': 35}])
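One possible way to get this behaviour, sketched here under a few assumptions, is to batch each window into a single element inside reduce_func, order that batch with tf.argsort and tf.gather, and then split it back into rows. The names WINDOW_SIZE, TOP_K, key_func and sort_and_take are illustrative and not part of any API. The sketch assumes every group fits into one window; a group with more than WINDOW_SIZE rows would be split across several windows, and each window would then be sorted and truncated on its own. The rows are the ones from the example, reordered so that the sort has a visible effect.

```python
import tensorflow as tf

# The example rows from above, reordered so that sorting has a visible effect.
rows = {
    "id": [1, 1, 2, 2],
    "input_a": [10.0, 0.0, 30.0, 20.0],
    "start": [15, 5, 35, 25],
}
dataset = tf.data.Dataset.from_tensor_slices(rows)

WINDOW_SIZE = 100  # assumption: no group has more rows than this
TOP_K = 5          # number of rows to keep per group


def key_func(row):
    # group_by_window expects a scalar int64 key.
    return tf.cast(row["id"], tf.int64)


def sort_and_take(batch):
    # `batch` holds one whole window as a dict of 1-D tensors.
    # Compute the ordering by 'start' and keep the first TOP_K positions.
    order = tf.argsort(batch["start"], direction="ASCENDING")[:TOP_K]
    return {name: tf.gather(column, order) for name, column in batch.items()}


def reduce_func(key, window):
    # Collapse the window into one element, sort it, then emit rows again.
    return (
        window.batch(WINDOW_SIZE)
        .map(sort_and_take)
        .flat_map(lambda batch: tf.data.Dataset.from_tensor_slices(batch))
    )


grouped = dataset.apply(
    tf.data.experimental.group_by_window(
        key_func=key_func, reduce_func=reduce_func, window_size=WINDOW_SIZE
    )
)

for row in grouped.as_numpy_iterator():
    print(row)
```

Passing direction="DESCENDING" to tf.argsort would keep the rows with the largest 'start' values instead. If groups can exceed the window size, the per-window results would still need to be merged in a further step, which this sketch does not attempt.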
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow