Cloud solution to parse and process 1M+ rows

We are writing a service that can parse a CSV file containing 1M+ rows. Each row results in the insertion (or update) of a DB record. We are currently using DynamoDB.

What AWS services are best suited for this? We are considering Lambda and a queue system.



The solution we are looking at:

  1. API: An API accepts the CSV file upload, reads the file in chunks, and pushes each chunk to a queue for processing.
  2. QUEUE: The queue holds each chunk of the original file as a message to be processed.
  3. PROCESSOR: The chunk processor consumes messages from the queue and inserts the records into DynamoDB (sketches of both sides are shown after this list).
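To make steps 1 and 2 concrete, here is a minimal sketch of the API side, assuming Python with boto3 and an SQS queue. The queue URL, chunk size, and the `upload_id` field are illustrative assumptions, not something from the question.

```python
import csv
import io
import json
import boto3

# Hypothetical values -- the queue URL and chunk size are assumptions.
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/csv-chunks"
CHUNK_SIZE = 500  # rows per message; keeps the JSON body well under the 256 KB SQS limit

sqs = boto3.client("sqs")


def enqueue_csv(file_obj, upload_id):
    """Read an uploaded CSV (binary file-like object) and push it to SQS in chunks."""
    reader = csv.DictReader(io.TextIOWrapper(file_obj, encoding="utf-8"))
    chunk, chunk_index = [], 0
    for row in reader:
        chunk.append(row)
        if len(chunk) == CHUNK_SIZE:
            _send_chunk(upload_id, chunk_index, chunk)
            chunk, chunk_index = [], chunk_index + 1
    if chunk:  # flush the final, possibly smaller, chunk
        _send_chunk(upload_id, chunk_index, chunk)
        chunk_index += 1
    return chunk_index  # total number of chunks sent, useful for completion tracking


def _send_chunk(upload_id, chunk_index, rows):
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps(
            {"upload_id": upload_id, "chunk_index": chunk_index, "rows": rows}
        ),
    )
```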
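And a matching sketch of step 3, assuming a Lambda function triggered by that SQS queue. The `records` table name is an assumption, and `put_item` is used because DynamoDB's PutItem overwrites an existing item with the same key, which covers both the insert and the update case (provided a full overwrite is acceptable).

```python
import json
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("records")  # table name is an assumption


def handler(event, context):
    """Lambda entry point for an SQS trigger: write every row in the chunk to DynamoDB."""
    for record in event["Records"]:
        body = json.loads(record["body"])
        # batch_writer batches PutItem calls and retries unprocessed items;
        # PutItem overwrites an item with the same key, so insert and update
        # are the same operation here.
        with table.batch_writer() as batch:
            for row in body["rows"]:
                batch.put_item(Item=row)
```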


Challenges:

  • How would you trigger an end-of-processing notification, i.e. how do you know the last chunk for a given request has been completed? (A counter-based sketch follows this list.)
  • How would you handle partial failures, and is a rollback feasible with this approach?
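For the first challenge, one common pattern (our assumption, not something the question prescribes) is to record the total chunk count for an upload and atomically decrement it as each chunk finishes; whichever processor brings the counter to zero publishes the notification, for example via SNS. The `csv-jobs` table and topic ARN below are hypothetical.

```python
import boto3

dynamodb = boto3.resource("dynamodb")
sns = boto3.client("sns")

jobs = dynamodb.Table("csv-jobs")  # hypothetical job-tracking table
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:csv-done"  # hypothetical topic


def start_job(upload_id, total_chunks):
    """Record how many chunks belong to this upload."""
    jobs.put_item(Item={"upload_id": upload_id, "remaining": total_chunks})


def mark_chunk_done(upload_id):
    """Called by the processor after a chunk has been written successfully."""
    resp = jobs.update_item(
        Key={"upload_id": upload_id},
        UpdateExpression="SET remaining = remaining - :one",
        ExpressionAttributeValues={":one": 1},
        ReturnValues="UPDATED_NEW",
    )
    if resp["Attributes"]["remaining"] == 0:
        sns.publish(TopicArn=TOPIC_ARN, Message=f"Upload {upload_id} finished")
```

Note that the job item has to exist before the first chunk is consumed, so in practice the API would either count the chunks before enqueueing them or apply a delay to the queue messages. For the second challenge, DynamoDB offers no rollback across millions of writes, so a more workable approach is usually to make each chunk idempotent (keyed puts can be retried safely) and to record per-chunk status in the same job table, so failed chunks can be retried or reported rather than rolled back.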

