'best way to expose Flink table both as a changelist and also flat

I would like to use Flink for the following use case:

  • I have a table whose columns are id, datapoint. There are many rows (many unique datapoints) for each id.
  • My readers want to get all the datapoints associated with a single id so they can show a visualization specific to the id.
  • There are many datapoints per id, but the set doesn’t change quickly, so watching a stream of updates to the table from toChangelogStream() is critical for responsiveness.
  • But, I want the first load to be quick, so I want to have a flattened cache of all the datapoints for each id to start with, even if it’s a few seconds out of date. Doing a fresh load like this is uncommon —- usually clients will watch the same id for a long period.

What’s the best way to do this? The options I could think of were:

  1. Use two sinks, one from table.toChangelogStream(), and one where I group all datapoints for an id together into a list. But then:
    • how do I avoid writing gigantic flattened views out after every new/deleted row? (is this a valid use case for an Iteration?)
    • if I succeed at the first question, how do I get old entries in the change log back to the point where the most recent flattened view was calculated from, so that I can play the change log forward to get back to a live version of the data?
    • will creating these big lists in the flat view cause a big performance penalty? (for example, checkpointing might be slower, etc.)
  2. Do the same as above, but instead of making the flattened view a normal sink, expose it as Queryable State keyed on the id, since we only occasionally need it. (perhaps this solves the first problem above, and kind of solves the second one?)


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source