'Flink SQL behavior
I want to execute Flink SQL on batch data. (CSVs in S3)
However, I explicitly want Flink to execute my query in a streaming fashion because I think it will be faster than the batch mode.
For example, my query consists of filtering on two tables and joining the filtered result. I want Flink not to materialize the two tables in blocking batch fashion and then pipe the result through the join, but use a streaming hash join operator like in the datastream API.
How do I make this happen? I am using PyFlink.
Solution 1:[1]
You can read at https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/execution_mode/ how you can set the Execution Mode for a Flink application. Combine this with https://nightlies.apache.org/flink/flink-docs-master/docs/dev/python/python_config/ which explains how you can specify configuration options in Python applications.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Martijn Visser |
