Buffer S3 object inputs

Does anyone know whether any AWS service other than Kinesis Firehose can catch the S3 ingest event? I am trying to do some analysis on VPC flow logs. The current setup is CloudWatch Logs -> Kinesis Firehose -> S3 -> Athena.

The problem is that Kinesis Firehose can only buffer up to 128 MB, which is too small for me.



Solution 1:[1]

Events from Amazon S3 can go to:

  • AWS Lambda functions
  • Amazon SNS topic
  • Amazon SQS queue

So, you could send the messages to an SQS queue and then have a regular process (every hour?) that retrieves many messages and writes them to a single file.
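A rough sketch of that periodic process follows. It is not a definitive implementation. It assumes S3 sends its event notifications straight to the SQS queue (not via SNS), that the objects can simply be concatenated byte for byte (plain text lines or gzip members), and that one batch fits in memory. The function names, the queue URL, and the destination bucket and key are hypothetical. The `sqs` and `s3` arguments are expected to be boto3 clients, or anything with the same methods.

```python
import json
from urllib.parse import unquote_plus


def keys_from_message(body):
    """Return the (bucket, key) pairs named in one S3 event notification body."""
    event = json.loads(body)
    pairs = []
    # Test events sent by S3 have no "Records" entry, hence the default.
    for record in event.get("Records", []):
        s3_info = record["s3"]
        # Object keys arrive URL-encoded in S3 notifications.
        pairs.append((s3_info["bucket"]["name"],
                      unquote_plus(s3_info["object"]["key"])))
    return pairs


def merge_pending_objects(sqs, s3, queue_url, dest_bucket, dest_key,
                          max_batches=100):
    """Drain the queue, concatenate the referenced objects, write one file.

    Returns the number of source objects that were merged.
    """
    handles, chunks = [], []
    for _ in range(max_batches):
        resp = sqs.receive_message(QueueUrl=queue_url,
                                   MaxNumberOfMessages=10,
                                   WaitTimeSeconds=1)
        messages = resp.get("Messages", [])
        if not messages:
            break  # queue is empty for now
        for msg in messages:
            for bucket, key in keys_from_message(msg["Body"]):
                obj = s3.get_object(Bucket=bucket, Key=key)
                chunks.append(obj["Body"].read())
            handles.append(msg["ReceiptHandle"])

    if chunks:
        s3.put_object(Bucket=dest_bucket, Key=dest_key, Body=b"".join(chunks))

    # Delete messages only after the merged file is written,
    # at most 10 entries per batch call.
    for start in range(0, len(handles), 10):
        entries = [{"Id": str(i), "ReceiptHandle": h}
                   for i, h in enumerate(handles[start:start + 10])]
        sqs.delete_message_batch(QueueUrl=queue_url, Entries=entries)
    return len(chunks)
```

You would call it from a scheduled job with `boto3.client("sqs")` and `boto3.client("s3")`. The queue's visibility timeout has to be longer than one run, otherwise messages reappear and the same files get merged twice.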

Alternatively, you could keep your current setup and run Amazon Athena on a regular basis to combine multiple files by using CREATE TABLE AS. This would select from the existing files and store the output in a new table (with a new location). You could even use it to transform the files into a format that is easier to query in Athena (e.g. Snappy-compressed Parquet). The hard part is making sure each input file is included only once in this concatenation process (possibly using SymlinkTextInputFormat).
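As an illustration of the CREATE TABLE AS approach, here is a minimal sketch that builds the statement and submits it through the Athena API. The table names, S3 locations, and the `day` column in the usage note are hypothetical, and the `WHERE` filter only stands in for whatever mechanism you use to pick each input file exactly once. The `athena` argument is expected to be a boto3 Athena client.

```python
def build_ctas_query(source_table, target_table, target_location,
                     where_clause="true"):
    """Build an Athena CTAS statement that rewrites rows as Snappy Parquet."""
    return (
        f"CREATE TABLE {target_table}\n"
        f"WITH (\n"
        f"    format = 'PARQUET',\n"
        f"    write_compression = 'SNAPPY',\n"
        f"    external_location = '{target_location}'\n"
        f")\n"
        f"AS SELECT * FROM {source_table}\n"
        f"WHERE {where_clause}"
    )


def submit_ctas(athena, query, database, output_location):
    """Start the query and return its execution ID (it runs asynchronously)."""
    resp = athena.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return resp["QueryExecutionId"]
```

For example, `build_ctas_query("flow_logs_raw", "flow_logs_merged", "s3://my-bucket/merged/", "day = '2024-01-01'")` produces the statement, and `submit_ctas(boto3.client("athena"), query, "my_database", "s3://my-bucket/athena-results/")` runs it. Athena expects the external location to be empty before the query starts.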

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 John Rotenstein