How to write Parquet files to AWS S3 using Apache Beam Java?

I am trying to convert JSON -> GenericRecord -> Parquet and then write the result to S3. I am able to produce the Parquet output, but I don't know how to write the Parquet files directly to S3 without first storing them on the local filesystem.
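For context, the JSON -> GenericRecord step can be sketched with Avro's JsonDecoder. This is a hypothetical sketch, not the original JsonToGeneralRecord implementation; the class name and the schema-as-string serialization trick are assumptions:

```java
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.JsonDecoder;
import org.apache.beam.sdk.transforms.DoFn;

// Hypothetical sketch of a JSON -> GenericRecord DoFn using Avro's JsonDecoder.
public class JsonToGenericRecordFn extends DoFn<String, GenericRecord> {
    // The schema is carried as a String because Avro's Schema is not Serializable.
    private final String schemaString;
    private transient Schema schema;

    public JsonToGenericRecordFn(Schema schema) {
        this.schemaString = schema.toString();
    }

    @Setup
    public void setup() {
        schema = new Schema.Parser().parse(schemaString);
    }

    @ProcessElement
    public void processElement(@Element String json, OutputReceiver<GenericRecord> out)
            throws IOException {
        // Decode one JSON line into a GenericRecord conforming to the schema.
        JsonDecoder decoder = DecoderFactory.get().jsonDecoder(schema, json);
        GenericRecord record = new GenericDatumReader<GenericRecord>(schema).read(null, decoder);
        out.output(record);
    }
}
```

Note that Avro's JsonDecoder expects Avro-flavored JSON (e.g. for union types), so a custom parser may still be needed for arbitrary JSON input.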

Code I wrote for this:

public static void main(String[] args) throws IOException {

    // Parse pipeline options and wire up static AWS credentials
    PipelineOptionsFactory.register(MainConfig.class);
    MainConfig options = PipelineOptionsFactory.fromArgs(args).withValidation().as(MainConfig.class);
    Pipeline pipeLine = Pipeline.create(options);

    BasicAWSCredentials awsCredentials = new BasicAWSCredentials(options.getAWSAccessKey(), options.getAWSSecretKey());
    options.setAwsCredentialsProvider(new AWSStaticCredentialsProvider(awsCredentials));

    Schema jsonSchema = new Schema.Parser().parse(schemaString);
    logger.info(jsonSchema.getFields());

    pipeLine.apply("ReadMyFile", TextIO.read().from(options.getInput()))
            .apply("Convert Json To General Record", ParDo.of(new JsonToGeneralRecord(jsonSchema)))
            .setCoder(AvroCoder.of(GenericRecord.class, jsonSchema))
            .apply("Generate the Parquet files",
                    FileIO.<GenericRecord>write()
                            .via(ParquetIO.sink(jsonSchema))
                            .to(options.getOutput())
                            .withNumShards(1)
                            .withSuffix(".parquet"));

    pipeLine.run();
}

In the end, I just want to write that Parquet output to S3.



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
