Flink SQL behavior

I want to execute Flink SQL on batch data (CSV files stored in S3).

However, I explicitly want Flink to execute the query in a streaming fashion, because I expect that to be faster than batch mode.

For example, my query filters two tables and then joins the filtered results. Rather than having Flink materialize both tables in a blocking, batch fashion and then pipe the results through the join, I want it to use a streaming hash join operator, as in the DataStream API.
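
A query of the shape described above could be expressed as the following sketch. It registers two CSV tables in S3 with Flink's filesystem SQL connector, filters each one, and joins the filtered results. The table names, columns, S3 paths, and the run_filter_join helper are all hypothetical, and reading s3:// paths assumes an S3 filesystem plugin is installed in the Flink distribution. The helper takes an existing PyFlink TableEnvironment and uses only its execute_sql() and sql_query() methods.

```python
"""Hypothetical Flink SQL for a filter-then-join query over two CSV tables in S3."""

# Assumed table names, columns, and S3 paths; replace them with your own.
ORDERS_DDL = """
CREATE TABLE orders (
    order_id BIGINT,
    customer_id BIGINT,
    amount DOUBLE
) WITH (
    'connector' = 'filesystem',
    'path' = 's3://my-bucket/orders/',
    'format' = 'csv'
)
"""

CUSTOMERS_DDL = """
CREATE TABLE customers (
    customer_id BIGINT,
    country STRING
) WITH (
    'connector' = 'filesystem',
    'path' = 's3://my-bucket/customers/',
    'format' = 'csv'
)
"""

# Filter each input first, then join the filtered results on customer_id.
FILTER_JOIN_QUERY = """
SELECT o.order_id, o.amount, c.country
FROM (SELECT * FROM orders WHERE amount > 100) AS o
JOIN (SELECT * FROM customers WHERE country = 'DE') AS c
  ON o.customer_id = c.customer_id
"""


def run_filter_join(t_env):
    """Register both source tables and return the joined Table.

    `t_env` is expected to be a pyflink.table.TableEnvironment; only its
    execute_sql() and sql_query() methods are called here.
    """
    for ddl in (ORDERS_DDL, CUSTOMERS_DDL):
        t_env.execute_sql(ddl)
    return t_env.sql_query(FILTER_JOIN_QUERY)
```

The SQL itself does not decide whether the job runs in a pipelined (streaming) or blocking (batch) fashion; that is determined by the execution mode of the TableEnvironment passed in.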

How do I make this happen? I am using PyFlink.



Solution 1:[1]

The page at https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/execution_mode/ explains how to set the execution mode (STREAMING, BATCH, or AUTOMATIC) of a Flink application. Combine this with https://nightlies.apache.org/flink/flink-docs-master/docs/dev/python/python_config/, which explains how to specify configuration options in Python applications.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Martijn Visser