Running jobs independently of each other's failure/success in a single Dataflow pipeline

I am trying to load data in Avro format from GCS to BigQuery using a single pipeline. There are, for instance, 10 tables that I am trying to load, which means 10 parallel jobs in a single pipeline. Now if the 3rd job fails, all the subsequent jobs fail. How can I make the other jobs run independently of the failure/success of one?
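To illustrate, here is a minimal sketch of the kind of pipeline I mean, using the Beam Python SDK (bucket, project, and table names are placeholders):

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Placeholder list; the real pipeline loads 10 tables.
TABLES = ['table_a', 'table_b', 'table_c']

with beam.Pipeline(options=PipelineOptions()) as pipeline:
    for table in TABLES:
        # One independent branch per table, all inside the same job.
        _ = (
            pipeline
            | f'Read {table}' >> beam.io.ReadFromAvro(
                f'gs://my-bucket/avro/{table}/*.avro')
            | f'Write {table}' >> beam.io.WriteToBigQuery(
                f'my-project:my_dataset.{table}',
                # Tables are assumed to already exist.
                create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))
```

Because every branch lives in the same Dataflow job, a branch that keeps failing eventually brings down the whole job, taking the other loads with it.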



Solution 1:[1]

You cannot isolate different steps within a single Dataflow pipeline without implementing custom logic (for example, custom DoFn/ParDo implementations). Some I/O connectors, such as BigQuery, offer a way to send failed requests to a dead-letter queue in some write modes, but this might not give you what you want. If you want full isolation, you should run separate jobs and combine them into a workflow using an orchestration framework such as Apache Airflow.
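As an illustration of the dead-letter option, here is a minimal sketch using the Beam Python SDK. Names are placeholders, and the exact way to access the failed-rows output varies by SDK version. Note that this applies to the streaming-inserts write mode, not to the Avro file loads described in the question, which is why it might not give you what you want:

```python
import json

import apache_beam as beam
from apache_beam.io.gcp.bigquery_tools import RetryStrategy

with beam.Pipeline() as pipeline:
    result = (
        pipeline
        | 'Read' >> beam.io.ReadFromAvro('gs://my-bucket/avro/table_a/*.avro')
        | 'Write' >> beam.io.WriteToBigQuery(
            'my-project:my_dataset.table_a',
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
            method=beam.io.WriteToBigQuery.Method.STREAMING_INSERTS,
            # Do not retry rejected rows; emit them on the failed-rows output.
            insert_retry_strategy=RetryStrategy.RETRY_NEVER))

    # Dead-letter the rejected rows instead of failing the branch.
    _ = (
        result.failed_rows  # older SDKs: result['FailedRows']
        | 'Serialize' >> beam.Map(json.dumps)
        | 'DeadLetter' >> beam.io.WriteToText(
            'gs://my-bucket/dead_letter/table_a'))
```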
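And here is a minimal sketch of the orchestration option: an Airflow DAG that launches one templated Dataflow job per table. The template path and parameter names are assumptions based on the Google-provided "Cloud Storage Avro to BigQuery" template and should be verified against its documentation:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.dataflow import (
    DataflowTemplatedJobStartOperator,
)

TABLES = ['table_a', 'table_b', 'table_c']  # one entry per target table

with DAG(
    dag_id='gcs_avro_to_bigquery',
    start_date=datetime(2022, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    for table in TABLES:
        # One standalone Dataflow job per table; no >> dependencies
        # between tasks, so each load succeeds or fails on its own.
        DataflowTemplatedJobStartOperator(
            task_id=f'load_{table}',
            # Assumed path of the Google-provided template; verify it.
            template='gs://dataflow-templates/latest/GCS_Avro_to_BigQuery',
            location='us-central1',
            parameters={
                'inputFilePattern': f'gs://my-bucket/avro/{table}/*.avro',
                'outputTable': f'my-project:my_dataset.{table}',
                'bigQueryLoadingTemporaryDirectory': 'gs://my-bucket/tmp',
            },
        )
```

Since the tasks declare no dependencies on one another, Airflow runs them independently and a failure in one load does not block the rest.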

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1: chamikara