Apache Beam/Dataflow - passing file path to ReadFromText

I have a use case where I want to read the filename from a metadata table. I have written a pipeline function to read the metadata table, but I am not sure how to pass this information to ReadFromText, as it only takes a string as input. Is it possible to pass this value to ReadFromText()? Please suggest some workarounds or ideas for how to achieve this. Thanks.

code: pipeline | 'Read from a File' >> ReadFromText(I want to pass the file path here?, skip_header_lines=1)

Note: There will be various folders and files in storage, and the files are in CSV format, but in my use case I can't pass the storage location or filename directly as the file path in ReadFromText. I want to read it from the metadata table and pass that value. I hope that is clear. Thanks.



Solution 1:[1]

I don't understand why you need to read the metadata. If you want to read all the files inside a folder, you can just provide a glob pattern. This solution works in Python; I'm not sure about Java.

p | ReadFromText("./folder/*.csv")

"*" is the glob wildcard here, which lets the pipeline read every file whose name matches the pattern, in this case every file ending in .csv. You can also add a prefix before the wildcard to narrow the match.

Solution 2:[2]

What you want is textio.ReadAllFromText, which reads file paths (or glob patterns) from a PCollection instead of taking a string directly. That lets the output of your metadata step feed the read step.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Kartikey Garg
Solution 2 Daniel Oliveira