Cloud solution to parse and process 1M+ rows
We are writing a service that parses a CSV file containing over 1M rows. Each row contains data that needs to be inserted as (or used to update) a DB record. We are currently using DynamoDB.
What AWS services are best suited for this? We are considering Lambda and a queue system.
Solution we are looking at:
- API: An API accepts the CSV file upload, reads the file in chunks, and pushes each chunk to a queue for processing.
- QUEUE: The queue holds the chunks of the original file as individual messages to be processed.
- PROCESSOR: The chunk processor consumes messages from the queue and inserts the records into DynamoDB (see the sketch after this list).
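A minimal sketch of the producer and processor sides, assuming Python with boto3, an SQS queue, and a Lambda consumer. The queue URL, the `records` table name, the chunk size, and the job-id scheme are illustrative assumptions, not a confirmed design:

```python
# Sketch only. QUEUE_URL, the "records" table, and CHUNK_SIZE are placeholders.
import csv
import io
import json
import uuid

import boto3

sqs = boto3.client("sqs")
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("records")  # hypothetical target table

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/csv-chunks"  # placeholder
CHUNK_SIZE = 500  # rows per message; keep the JSON body under the 256 KB SQS limit


def enqueue_csv(file_obj):
    """API side: read the uploaded CSV in chunks and push each chunk to SQS."""
    job_id = str(uuid.uuid4())
    reader = csv.DictReader(io.TextIOWrapper(file_obj, encoding="utf-8"))
    chunk, total_chunks = [], 0
    for row in reader:
        chunk.append(row)
        if len(chunk) == CHUNK_SIZE:
            _send_chunk(job_id, total_chunks, chunk)
            total_chunks += 1
            chunk = []
    if chunk:
        _send_chunk(job_id, total_chunks, chunk)
        total_chunks += 1
    return job_id, total_chunks


def _send_chunk(job_id, chunk_index, rows):
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps(
            {"job_id": job_id, "chunk_index": chunk_index, "rows": rows}
        ),
    )


def handler(event, context):
    """Processor side: Lambda triggered by SQS; upsert each row into DynamoDB."""
    for record in event["Records"]:
        body = json.loads(record["body"])
        # batch_writer buffers writes and retries unprocessed items;
        # put_item overwrites an item with the same key, i.e. an upsert.
        with table.batch_writer() as batch:
            for row in body["rows"]:
                batch.put_item(Item=row)  # assumes the CSV includes the table's key column
```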
Challenges:
- How would you trigger an end-of-processing notification (to indicate that the last chunk for that request was completed)? A counter-based sketch follows this list.
- How would you handle partial failure and rollback with this approach?
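For the end-of-processing question, one illustrative approach (an assumption, not something stated in the question) is to store the expected chunk count per job and atomically increment a completed-chunks counter as each chunk finishes; when the counter reaches the total, publish a notification. The `jobs` table, its attribute names, and the SNS topic below are hypothetical:

```python
# Illustrative assumption: a per-job counter in a hypothetical "jobs" table,
# plus an SNS notification when the last chunk completes.
import boto3

dynamodb = boto3.resource("dynamodb")
jobs_table = dynamodb.Table("jobs")  # hypothetical job-tracking table
sns = boto3.client("sns")
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:csv-job-complete"  # placeholder


def mark_chunk_done(job_id, total_chunks):
    """Atomically count a finished chunk; notify once all chunks are done."""
    resp = jobs_table.update_item(
        Key={"job_id": job_id},
        UpdateExpression="ADD completed_chunks :one",
        ExpressionAttributeValues={":one": 1},
        ReturnValues="UPDATED_NEW",
    )
    if int(resp["Attributes"]["completed_chunks"]) == total_chunks:
        sns.publish(TopicArn=TOPIC_ARN, Message=f"Job {job_id} finished")
```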