'best way to expose Flink table both as a changelist and also flat
I would like to use Flink for the following use case:
- I have a table whose columns are
id, datapoint. There are many rows (many uniquedatapoints) for eachid. - My readers want to get all the
datapoints associated with a singleidso they can show a visualization specific to theid. - There are many
datapoints perid, but the set doesn’t change quickly, so watching a stream of updates to the table fromtoChangelogStream()is critical for responsiveness. - But, I want the first load to be quick, so I want to have a flattened cache of all the
datapoints for eachidto start with, even if it’s a few seconds out of date. Doing a fresh load like this is uncommon —- usually clients will watch the sameidfor a long period.
What’s the best way to do this? The options I could think of were:
- Use two sinks, one from
table.toChangelogStream(), and one where I group alldatapoints for anidtogether into a list. But then:- how do I avoid writing gigantic flattened views out after every new/deleted row? (is this a valid use case for an Iteration?)
- if I succeed at the first question, how do I get old entries in the change log back to the point where the most recent flattened view was calculated from, so that I can play the change log forward to get back to a live version of the data?
- will creating these big lists in the flat view cause a big performance penalty? (for example, checkpointing might be slower, etc.)
- Do the same as above, but instead of making the flattened view a normal sink, expose it as Queryable State keyed on the
id, since we only occasionally need it. (perhaps this solves the first problem above, and kind of solves the second one?)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|
