Lambda.FunctionError with Kinesis Delivery Stream

I have a data processing pipeline that consists of API Gateway endpoint > Lambda handler > Kinesis delivery stream > Lambda transform function > Datadog.

A request to my endpoint generates around 160k records for processing. These are spread across 11 different delivery streams, with exponential backoff on the Direct PUT into each delivery stream.
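For context, the Direct PUT with exponential backoff described above can be sketched roughly as follows. This is a minimal illustration, not the actual handler code: the function name `put_batch_with_backoff`, the retry limits, and the injected `sleep` parameter are assumptions made for the example. It relies on the `put_record_batch` response reporting `FailedPutCount` and a per-record `ErrorCode` in `RequestResponses`, and it resends only the records that failed.

```python
import random
import time


def put_batch_with_backoff(client, stream_name, records,
                           max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Send records with put_record_batch, retrying only the failed ones.

    `client` is expected to behave like boto3.client("firehose").
    Returns the records that still failed after the last attempt.
    """
    pending = list(records)
    for attempt in range(max_attempts):
        response = client.put_record_batch(
            DeliveryStreamName=stream_name,
            Records=pending,
        )
        if response["FailedPutCount"] == 0:
            return []
        # RequestResponses is index-aligned with the submitted records;
        # entries that failed carry an ErrorCode instead of a RecordId.
        pending = [
            record
            for record, result in zip(pending, response["RequestResponses"])
            if "ErrorCode" in result
        ]
        if attempt < max_attempts - 1:
            # Exponential backoff with a little jitter before the next try.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
    return pending
```

Any records returned by this function were never accepted by the delivery stream, so they would need to be logged or retried elsewhere rather than silently dropped.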

My pipeline is consistently losing around 20k records (only 140k of the 160k show up in Datadog). I have confirmed through the metric aws.firehose.incoming_records that all 160k records are being submitted to the delivery streams.

The transform function's metrics show no errors. I also have error logging in the function itself, which is not revealing any obvious issues.

In the destination error log details for the Firehose, I do see the following:

{
    "deliveryStreamARN": "arn:aws:firehose:us-east-1:837515578404:deliverystream/PUT-DOG-6bnC7",
    "destination": "aws-kinesis-http-intake.logs.datadoghq.com...",
    "deliveryStreamVersionId": 9,
    "message": "The Lambda function was successfully invoked but it returned an error result.",
    "errorCode": "Lambda.FunctionError",
    "processor": "arn:aws:lambda:us-east-1:837515578404:function:prob_weighted_calculations-dev:$LATEST"
}

In addition, there are records in my S3 bucket for failed deliveries. I re-ran the failed records through the transform function (I created a custom test event based on the data in the S3 bucket for failed deliveries). The Lambda executed without an issue.
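A replay like the one described above can be sketched as follows. This is only an illustration under stated assumptions: it assumes the failed-record objects in S3 contain one JSON object per line with a base64 `rawData` field and an `arrivalTimestamp` field, which should be checked against the actual files. The helper name `build_test_event`, the synthetic `replay-N` record IDs, and the placeholder ARN, region, and invocation ID are all hypothetical.

```python
import json


def build_test_event(failed_lines,
                     delivery_stream_arn="arn:aws:firehose:region:account-id:deliverystream/example",
                     region="us-east-1"):
    """Turn failed-record lines from S3 into a transform-style test event."""
    records = []
    for index, line in enumerate(failed_lines):
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        failed = json.loads(line)
        records.append({
            # Synthetic ID; the original recordId is assumed not to be stored.
            "recordId": f"replay-{index}",
            "approximateArrivalTimestamp": int(failed.get("arrivalTimestamp", 0)),
            # Keep the payload base64-encoded, as the transform receives it.
            "data": failed["rawData"],
        })
    return {
        "invocationId": "replay-invocation",
        "deliveryStreamArn": delivery_stream_arn,
        "region": region,
        "records": records,
    }
```

The payloads are passed through untouched so the function sees the same bytes that were originally submitted. The resulting dictionary can be saved as JSON and used as a Lambda test event.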

I read that a mismatch between the number of records sent to the transform function and the number it outputs could cause the above error log. So I added an explicit check in the transform that raises an error within the function if the output records do not match the input. Even with this, there are no errors in the Lambda.
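A check of this kind can be sketched as follows. This is a minimal illustration, not the actual transform function: `transform_payload` is a hypothetical stand-in for the real calculation, and the payloads are assumed to be JSON. Firehose expects every input `recordId` to come back exactly once, each with a `result` of `Ok`, `Dropped`, or `ProcessingFailed` and base64-encoded `data`.

```python
import base64
import json


def transform_payload(payload):
    """Hypothetical placeholder for the real per-record calculation."""
    return payload


def assert_same_record_ids(input_records, output_records):
    """Raise if the output does not contain exactly the input recordIds."""
    input_ids = sorted(r["recordId"] for r in input_records)
    output_ids = sorted(r["recordId"] for r in output_records)
    if input_ids != output_ids:
        raise ValueError(
            f"recordId mismatch: {len(input_ids)} in, {len(output_ids)} out"
        )


def lambda_handler(event, context):
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            transformed = json.dumps(transform_payload(payload)).encode("utf-8")
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(transformed).decode("utf-8"),
            })
        except Exception:
            # Return the original data marked as failed instead of
            # omitting the record from the response.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    assert_same_record_ids(event["records"], output)
    return {"records": output}
```

Comparing record IDs rather than only counts also catches the case where the counts match but an ID was duplicated or altered.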

I am at a loss here as to what could be causing this and do not feel confident in my pipeline given ~20k records are "leaking" without explanation.

Any suggestions on where to look to continue troubleshooting this issue would be greatly appreciated!



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
