Output a sorted KV PCollection to an SST file from Google Cloud Dataflow

I am trying to create an SST file from a PCollection (a simple KV<String, String>) using Dataflow. This SST file will later be used to load RocksDB. However, I see only limited built-in IO support.

Is it possible to write to a different file format while maintaining the order in Dataflow?



Solution 1:[1]

Beam does not have built-in support for SST files.

You could use the FileIO module to write such files, but you would need to develop a Sink (an implementation of FileIO.Sink) for this file format.
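To give a rough idea of what such a Sink might look like, here is a minimal sketch. It is not real Beam code: the Sink interface below is a local stand-in that mirrors the three methods of Beam's FileIO.Sink (open, write, flush) so that the example compiles without the Beam dependency. The class name SortedKvSink and the length-prefixed output format are assumptions made for illustration; a real implementation would emit actual SST bytes instead. The point it shows is that the sink may receive elements in any order, so it buffers them and writes them out in key order on flush.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sink: buffers the pairs of one file, then emits them in key order.
final class SortedKvSink implements Sink<Map.Entry<String, String>> {
  // TreeMap keeps keys in String order; a later value for the same key wins.
  // String order matches bytewise order for ASCII keys only (assumption).
  private final TreeMap<String, String> buffer = new TreeMap<>();
  private DataOutputStream out;

  @Override
  public void open(WritableByteChannel channel) {
    out = new DataOutputStream(Channels.newOutputStream(channel));
    buffer.clear();
  }

  @Override
  public void write(Map.Entry<String, String> element) {
    buffer.put(element.getKey(), element.getValue());
  }

  @Override
  public void flush() throws IOException {
    // Placeholder encoding (not SST): length-prefixed key, then length-prefixed value.
    for (Map.Entry<String, String> e : buffer.entrySet()) {
      writeBytes(e.getKey());
      writeBytes(e.getValue());
    }
    out.flush(); // the channel itself is left open for the caller to close
  }

  private void writeBytes(String s) throws IOException {
    byte[] b = s.getBytes(StandardCharsets.UTF_8);
    out.writeInt(b.length);
    out.write(b);
  }

  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    SortedKvSink sink = new SortedKvSink();
    sink.open(Channels.newChannel(bytes));
    sink.write(new SimpleEntry<>("b", "2")); // arrives out of order
    sink.write(new SimpleEntry<>("a", "1"));
    sink.flush();
    System.out.println(bytes.size() + " bytes written");
  }
}

// Local stand-in mirroring the method names of Beam's FileIO.Sink<ElementT>.
interface Sink<T> {
  void open(WritableByteChannel channel) throws IOException;
  void write(T element) throws IOException;
  void flush() throws IOException;
}
```

Note that this buffers a whole file in memory and only orders the keys within a single output file. Ordering across files would have to come from how the pipeline assigns elements to files.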

You could also write files from a simple ParDo transform, but you have to make sure that the writing is correct and efficient (for example, make sure that you do not perform duplicate writes if a bundle fails and is retried by the runner).
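One common way to keep a retried write from producing duplicates is to make the write idempotent: derive the file name only from a stable shard id, write to a temporary file first, and then move it over the final name. The sketch below shows that idea with plain java.nio on a local file system. It is not Beam code, and the names IdempotentShardWriter, writeShard and the shard-NNNNN.txt naming scheme are made up for illustration. In a real pipeline the shard id would have to be deterministic across retries, and the output would normally go to a remote store such as GCS rather than a local directory.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical helper: writes one shard so that a retry replaces, not duplicates.
final class IdempotentShardWriter {

  static Path writeShard(Path dir, int shardId, SortedMap<String, String> sorted)
      throws IOException {
    Files.createDirectories(dir);
    // The final name depends only on shardId, so a retry targets the same file.
    Path finalPath = dir.resolve(String.format("shard-%05d.txt", shardId));
    Path tmp = Files.createTempFile(dir, "shard-" + shardId + "-", ".tmp");
    try {
      try (BufferedWriter w = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
        // Placeholder text format (not SST): one tab-separated pair per line.
        for (Map.Entry<String, String> e : sorted.entrySet()) {
          w.write(e.getKey());
          w.write('\t');
          w.write(e.getValue());
          w.write('\n');
        }
      }
      // Readers see either the old complete file or the new one, never a partial
      // file. Replacing an existing target on an atomic move is implementation
      // specific per the Javadoc; POSIX rename semantics replace it.
      return Files.move(tmp, finalPath,
          StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
    } finally {
      Files.deleteIfExists(tmp); // cleans up if the write or move failed
    }
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("shards");
    SortedMap<String, String> pairs = new TreeMap<>();
    pairs.put("b", "2");
    pairs.put("a", "1");
    writeShard(dir, 0, pairs);
    writeShard(dir, 0, pairs); // simulated retry of the same shard
    System.out.println("files in output dir: " + dir.toFile().list().length);
  }
}
```

The temporary file is what makes a half-finished attempt harmless: if the process dies before the move, the final name is never touched, and the next attempt starts over.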

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 chamikara