Google Cloud Dataflow Worker Threading

Say we have one worker with 4 CPU cores. How is parallelism configured on Dataflow worker machines? Do we parallelize beyond the number of cores?



Solution 1:[1]

For batch jobs, Dataflow uses one worker thread per core, and each thread independently processes its own chunk of the input.
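As a rough illustration of the batch model described above, here is a minimal sketch, assuming CPU-bound work and a pool sized to the core count. It is not Dataflow's implementation: `split_into_chunks`, `process_chunk`, and `run_batch` are hypothetical names, and the per-element work is a placeholder. It shows the input split into one chunk per core, with each thread working only on its own chunk.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split items into n_chunks contiguous chunks whose sizes differ by at most 1."""
    size, rem = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < rem else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def process_chunk(chunk):
    # Hypothetical per-element work; each thread touches only its own chunk,
    # so no coordination between threads is needed.
    return [x * x for x in chunk]


def run_batch(items, cores=None):
    # One thread per core (assumption mirroring the batch description).
    # Note: CPython's GIL prevents pure-Python CPU work from running truly in
    # parallel; this only illustrates the one-chunk-per-thread partitioning.
    cores = cores or os.cpu_count() or 1
    chunks = split_into_chunks(items, cores)
    with ThreadPoolExecutor(max_workers=cores) as pool:
        # pool.map returns results in chunk order, so output order is preserved.
        results = list(pool.map(process_chunk, chunks))
    return [y for part in results for y in part]


if __name__ == "__main__":
    print(run_batch(list(range(10)), cores=4))
```

With 4 cores and 10 elements, the chunks are `[0, 1, 2]`, `[3, 4, 5]`, `[6, 7]`, and `[8, 9]`. Because no thread ever waits on anything, adding more threads than cores would not help here.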

For streaming jobs, there can be many more worker threads than cores. At any moment many of those threads are idle, waiting on input, so running extra threads keeps the CPUs busy.
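To see why threads blocked on input justify going beyond the core count, consider a common general-purpose rule of thumb for sizing thread pools: threads ≈ cores × target utilization × (1 + wait time / compute time). This is a minimal sketch, assuming that rule. It is not Dataflow's actual sizing policy, and `threads_to_saturate`, `wait_time`, and `compute_time` are hypothetical names.

```python
import math


def threads_to_saturate(cores, wait_time, compute_time, target_utilization=1.0):
    """Rule-of-thumb pool size: cores * utilization * (1 + wait/compute).

    wait_time:    time a thread spends blocked (e.g. waiting on input) per item.
    compute_time: time a thread spends using the CPU per item.
    Both are hypothetical inputs and must be in the same unit.
    """
    if cores <= 0 or compute_time <= 0:
        raise ValueError("cores and compute_time must be positive")
    return math.ceil(cores * target_utilization * (1 + wait_time / compute_time))


if __name__ == "__main__":
    # CPU-bound (batch-like): no waiting, so one thread per core suffices.
    print(threads_to_saturate(cores=4, wait_time=0, compute_time=10))
    # Input-bound (streaming-like): threads wait 90% of the time,
    # so roughly 10x as many threads as cores are needed to keep the CPUs busy.
    print(threads_to_saturate(cores=4, wait_time=90, compute_time=10))
```

For the 4-core worker in the question, a purely CPU-bound workload stays at 4 threads. A workload whose threads spend 90% of their time waiting calls for about 40.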

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1