Parallelization factor: AWS Kinesis data streams to Lambda
I'm very confused by the concept of ParallelizationFactor.
My understanding
https://stackoverflow.com/a/57534322/13000229
In the past, one KDS shard could send data to only one Lambda invocation at a time. Multiple Lambda instances could not consume data from the same KDS shard concurrently.
https://aws.amazon.com/blogs/compute/new-aws-lambda-scaling-controls-for-kinesis-and-dynamodb-event-sources/
In Nov 2019, a new parameter, ParallelizationFactor (concurrent batches per shard), was introduced.
The default factor of one exhibits normal behavior. A factor of two allows up to 200 concurrent invocations on 100 Kinesis data shards.
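To make the arithmetic in that quote concrete, here is a minimal sketch, assuming the mapping is created with boto3's `create_event_source_mapping`. The helper names, the stream ARN, and the function name are hypothetical placeholders, not anything from the blog post. The sketch only shows where the parameter is set and how the upper bound on concurrency follows from it; it says nothing about how records are distributed within a shard.

```python
def max_concurrent_invocations(shard_count: int, parallelization_factor: int = 1) -> int:
    """Upper bound on concurrent Lambda invocations: shards x concurrent batches per shard."""
    # AWS documents ParallelizationFactor as an integer from 1 to 10.
    if not 1 <= parallelization_factor <= 10:
        raise ValueError("ParallelizationFactor must be between 1 and 10")
    return shard_count * parallelization_factor


def build_mapping_kwargs(stream_arn: str, function_name: str,
                         batch_size: int = 2, parallelization_factor: int = 2) -> dict:
    """Build the arguments for create_event_source_mapping (hypothetical helper)."""
    return {
        "EventSourceArn": stream_arn,
        "FunctionName": function_name,
        "StartingPosition": "LATEST",
        "BatchSize": batch_size,
        "ParallelizationFactor": parallelization_factor,
    }


def create_mapping(**kwargs):
    """Create the mapping; needs boto3 and AWS credentials, so it is not called here."""
    import boto3  # imported lazily so the helpers above work without boto3 installed
    return boto3.client("lambda").create_event_source_mapping(**kwargs)


# Placeholder ARN and function name, for illustration only.
kwargs = build_mapping_kwargs(
    "arn:aws:kinesis:us-east-1:123456789012:stream/example-stream",
    "example-function",
)
print(max_concurrent_invocations(100, 1))  # default factor: one batch per shard
print(max_concurrent_invocations(100, 2))  # factor of two on 100 shards
```

With BatchSize = 2 and ParallelizationFactor = 2, as in the example in the questions, a single shard would allow at most two batches in flight at once.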
Questions
1. By using ParallelizationFactor, can more than one Lambda instance get different data from the same KDS shard concurrently?
   For example, the shard has data d1, d2, d3, d4, d5 and d6, and we assume BatchSize = 2 and ParallelizationFactor = 2. Lambda instance A can consume d1 and d2, while Lambda instance B can consume d3 and d4 at the same time. Then once Lambda instance A finishes the first batch, it starts processing d5 and d6, and so on.
2. If Question 1 is correct, what might be sacrificed? (e.g. the order within the same shard, or one piece of data being processed more than once)
3. If Question 1 is not correct, how will data in KDS shards be processed by Lambda concurrently?
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow

