Kinesis vs SQS: which is best for this particular case?
I have been reading about the differences between Kinesis and SQS and when to use each, but I'm struggling to work out which is the appropriate solution for this particular problem:
- Strava-like app where users record their runs
- 50 incoming runs per second
- Processing each run takes exactly 1 minute
- I want users to get their results in less than 5 minutes
- A run is just a GUID; the job that processes it fetches all the info from S3
If I understand correctly, in Kinesis you can have 1 worker per shard, correct? That would mean 1 run per minute per shard. Since I have 3,000 incoming runs per minute, meeting the 5-minute deadline would mean I need 600 shards with 1 worker each.
Is this assumption correct?
With SQS I can just have 1 queue and as many workers as I like, up to SQS's limit of roughly 120,000 in-flight messages per standard queue.
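The capacity figures above can be checked with a little arithmetic. This is a minimal sketch applying Little's law (average jobs in progress = arrival rate × time per job) to the numbers in the question. It assumes steady arrivals and that each worker handles exactly one run at a time; the helper names are hypothetical, and `backlog_after` is a simple fluid approximation rather than a queueing simulation.

```python
# Back-of-the-envelope capacity check using Little's law:
# average number of jobs in progress = arrival rate x time per job.
# Assumptions: steady arrivals, one run per worker at a time.

ARRIVALS_PER_SECOND = 50      # from the question
PROCESSING_SECONDS = 60       # each run takes 1 minute
SQS_INFLIGHT_LIMIT = 120_000  # approximate standard-queue limit quoted above


def required_concurrency(arrivals_per_second, processing_seconds):
    """Workers that must be busy at once to keep up in steady state."""
    return arrivals_per_second * processing_seconds


def backlog_after(minutes, workers):
    """Runs still waiting after `minutes` (fluid approximation)."""
    arrivals = ARRIVALS_PER_SECOND * 60 * minutes
    # Each worker finishes 60 / PROCESSING_SECONDS runs per minute.
    processed = workers * (60 / PROCESSING_SECONDS) * minutes
    return max(0.0, arrivals - processed)


needed = required_concurrency(ARRIVALS_PER_SECOND, PROCESSING_SECONDS)
print(needed)                          # 3000
print(needed <= SQS_INFLIGHT_LIMIT)    # True
print(backlog_after(5, workers=600))   # 12000.0
```

Under these assumptions about 3,000 runs are in progress at any moment, well below the quoted in-flight limit, whereas 600 single-run workers would fall behind by roughly 2,400 runs every minute regardless of which service delivers the messages.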
If a run fails during processing, I want to retry it a few more times and then store it for further inspection.
I don't need to process messages in order, and duplicates are totally fine.
Solution 1:[1]
1 worker per message; after it's processed, I no longer care about the message
In that case, a queuing service such as SQS should be used. Kinesis is a streaming service, which persists data. This means that multiple workers can read messages from a stream for as long as those messages are within the stream's retention period. None of your workers would be able to remove a message from the stream.
SQS also lets you set up dead-letter queues, which capture messages that fail to process after a predefined number of attempts.
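A minimal sketch of the receive, process, delete cycle that makes a queue fit the "1 worker per message" case. It assumes a boto3 SQS client is passed in (`boto3.client("sqs")`); `process_one` and the `handle_run` callback are hypothetical names, and the 120-second visibility timeout is an illustrative value chosen to exceed the 1-minute processing time (the SQS default is 30 seconds, which would let a message reappear mid-job).

```python
def process_one(sqs, queue_url, handle_run):
    """Receive at most one message, process it, and delete it on success.

    `sqs` is assumed to be a boto3 SQS client; `handle_run` is a
    hypothetical callback that takes the run GUID from the message body.
    Returns True if a message was processed and deleted.
    """
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=1,   # one run per worker, as in the question
        WaitTimeSeconds=20,      # long polling (20 s is the maximum)
        VisibilityTimeout=120,   # illustrative; must exceed ~60 s of work
    )
    messages = response.get("Messages", [])
    if not messages:
        return False
    message = messages[0]
    # If handle_run raises, the message is NOT deleted: it becomes visible
    # again after the visibility timeout and another worker can retry it.
    handle_run(message["Body"])
    # Deleting removes the message from the queue for good.
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
    return True
```

A worker would simply call `process_one` in a loop. Because a failed run is never deleted, SQS redelivers it, and that redelivery count is what a dead-letter queue keys on.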
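A dead-letter queue is attached to the main queue through its `RedrivePolicy` attribute, a JSON string with `deadLetterTargetArn` and `maxReceiveCount`. The helper below is a hypothetical convenience function that only builds that attribute; the receive count of 5 is an illustrative value standing in for "a few more times".

```python
import json


def redrive_attributes(dead_letter_queue_arn, max_receive_count=5):
    """Build the queue attribute that attaches a dead-letter queue.

    Once a message's receive count exceeds `max_receive_count` (it was
    received that many times without being deleted), SQS moves it to the
    dead-letter queue. The default of 5 is illustrative only.
    """
    policy = {
        "deadLetterTargetArn": dead_letter_queue_arn,
        "maxReceiveCount": str(max_receive_count),
    }
    return {"RedrivePolicy": json.dumps(policy)}
```

With boto3, the result could be passed as `sqs.set_queue_attributes(QueueUrl=main_queue_url, Attributes=redrive_attributes(dlq_arn))`, where `dlq_arn` can be read from the dead-letter queue with `get_queue_attributes(QueueUrl=dlq_url, AttributeNames=["QueueArn"])`. The queue URLs here are placeholders.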
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Marcin |
