Apache Beam FixedWindow doesn't do anything after GroupByKey transform
I built a pipeline that reads from Confluent Kafka, processes the records, and then uses side outputs to split them into rejected and approved PCollections. The approved PCollection gets written to BigQuery, but I also want to persist the approved records by writing them to a file on GCS.
The code is:
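import time
import apache_beam as beam
from apache_beam import window
from apache_beam.io.fileio import WriteToFiles, destination_prefix_naming

# aproved, ret_key and MY_BUCKET are defined elsewhere in the pipeline.
windowing = (
    aproved
    | 'Create_window' >> beam.WindowInto(window.FixedWindows(60))  # 60-second windows
    | 'AddKey' >> beam.Map(lambda record: (None, record))  # single dummy key
    | 'GBK' >> beam.GroupByKey()
    | 'remove_key' >> beam.FlatMap(ret_key)
    | 'AddTimeStamp' >> beam.Map(lambda record: beam.window.TimestampedValue(record, time.time()))
    | 'Write' >> WriteToFiles(path=MY_BUCKET, file_naming=destination_prefix_naming('.ppl'))
)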
This works when I test it by reading from a file with the DirectRunner, but when I run it on Dataflow in streaming mode, nothing happens after the GroupByKey transform. The graph shows that 20 elements were added to it, but the next transform ('remove_key') never receives an element.
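For anyone trying to reason about this symptom: with the default trigger, a GroupByKey on a fixed window only emits a window's group once the watermark passes the end of that window. The post does not confirm that this is the cause here, so treat the following as a simplified plain-Python model of that rule, not Beam's actual implementation. The function names (window_start, group_and_fire) and the sample events are made up for illustration.

```python
from collections import defaultdict

WINDOW_SIZE = 60  # seconds, matching FixedWindows(60) in the question


def window_start(timestamp, size=WINDOW_SIZE):
    """Return the start of the fixed window that contains `timestamp`."""
    return timestamp - (timestamp % size)


def group_and_fire(events, watermark, size=WINDOW_SIZE):
    """Group (timestamp, record) pairs per window and emit only closed windows.

    In this model a window [start, start + size) is closed once the
    watermark has reached start + size. Everything else stays buffered.
    """
    buffered = defaultdict(list)
    for timestamp, record in events:
        buffered[window_start(timestamp, size)].append(record)

    fired = {s: recs for s, recs in buffered.items() if s + size <= watermark}
    pending = {s: recs for s, recs in buffered.items() if s + size > watermark}
    return fired, pending


# Hypothetical event timestamps (seconds) and records.
events = [(5, "a"), (42, "b"), (61, "c")]

# Watermark stuck before the end of the first window: nothing is emitted,
# even though the elements were accepted by the grouping step.
fired_stalled, pending_stalled = group_and_fire(events, watermark=30)

# Watermark past the end of the first window: that window's group is emitted.
fired_advanced, pending_advanced = group_and_fire(events, watermark=75)
```

In this model, a stalled watermark gives exactly the picture described above: elements are counted as input to the grouping step, but the downstream step never sees any output. A bounded file source finishes and closes every window, which would be consistent with the DirectRunner test working.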
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
