TextSizeLimitExceededException when calling the DetectPiiEntities operation

I am using AWS Comprehend for PII redaction. The idea is to detect the PII entities first and then redact them from the text.

The problem is that this API has an input text size limit. How can I increase the limit, perhaps to 1 MB? Or is there another way to detect entities in a large text?

ERROR: botocore.errorfactory.TextSizeLimitExceededException: An error occurred (TextSizeLimitExceededException) when calling the DetectPiiEntities operation: Input text size exceeds limit. Max length of request text allowed is 5000 bytes while in this request the text size is 7776 bytes



Solution 1:[1]

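There's no way to increase this limit. For input text larger than 5000 bytes, split the text into chunks of at most 5000 bytes each, call DetectPiiEntities on every chunk, and then aggregate the results. The limit counts UTF-8 bytes, not characters, so text containing multi-byte characters fits fewer characters per chunk. When aggregating, add each chunk's starting position to the BeginOffset and EndOffset of the entities found in it so they point into the original text. Keep some overlap between adjacent chunks so that each chunk carries over some context from the previous one; entities found twice inside an overlap then need to be de-duplicated.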
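A minimal sketch of the chunk-and-aggregate approach in Python, since the question uses boto3. The helper names `split_utf8` and `detect_pii_large`, the 200-character overlap, and the choice to cut chunks at spaces are illustrative assumptions, not part of the Comprehend API or the linked sample. The detector is passed in as a function so the chunking logic can be tried without AWS credentials, and the code assumes the returned `BeginOffset`/`EndOffset` are character offsets into the chunk that was sent.

```python
from typing import Any, Callable, Dict, List, Tuple

MAX_BYTES = 5000  # limit quoted in the TextSizeLimitExceededException message

Entity = Dict[str, Any]


def split_utf8(text: str, max_bytes: int = MAX_BYTES,
               overlap_chars: int = 200) -> List[Tuple[int, str]]:
    """Hypothetical helper: split text into (char_offset, chunk) pairs.

    Each chunk is at most max_bytes when UTF-8 encoded and ends before a
    space where possible; the next chunk restarts up to overlap_chars
    characters earlier, aligned to a word start, so entities near a cut
    are seen whole in at least one chunk.
    """
    chunks: List[Tuple[int, str]] = []
    start, n = 0, len(text)
    while start < n:
        # Grow the chunk character by character until the byte budget is used.
        end, size = start, 0
        while end < n:
            char_bytes = len(text[end].encode("utf-8"))
            if size + char_bytes > max_bytes:
                break
            size += char_bytes
            end += 1
        # Prefer to cut just before a space so a word is not split in two.
        if end < n:
            cut = text.rfind(" ", start + 1, end)
            if cut != -1:
                end = cut
        chunks.append((start, text[start:end]))
        if end >= n:
            break
        # Step back by up to overlap_chars, then move forward to a word start.
        back = max(end - overlap_chars, start + 1)
        space = text.find(" ", back, end)
        start = space + 1 if space != -1 else back
    return chunks


def detect_pii_large(text: str, detect: Callable[[str], List[Entity]],
                     max_bytes: int = MAX_BYTES,
                     overlap_chars: int = 200) -> List[Entity]:
    """Hypothetical helper: run detect() per chunk, map offsets back to text.

    detect should return dicts shaped like the DetectPiiEntities "Entities"
    list (Type, Score, BeginOffset, EndOffset), relative to its chunk.
    """
    seen = set()
    merged: List[Entity] = []
    for offset, chunk in split_utf8(text, max_bytes, overlap_chars):
        for ent in detect(chunk):
            begin = ent["BeginOffset"] + offset
            end = ent["EndOffset"] + offset
            key = (ent["Type"], begin, end)
            if key in seen:  # same entity reported again from the overlap
                continue
            seen.add(key)
            merged.append({**ent, "BeginOffset": begin, "EndOffset": end})
    merged.sort(key=lambda e: e["BeginOffset"])
    return merged
```

With boto3 this could be wired up as `client = boto3.client("comprehend")` followed by `detect_pii_large(text, lambda c: client.detect_pii_entities(Text=c, LanguageCode="en")["Entities"])`. De-duplicating on the exact `(Type, BeginOffset, EndOffset)` triple is a simplification: Comprehend may label or bound the same span slightly differently in two chunks, in which case a more careful merge would compare overlapping spans and keep the one with the higher `Score`.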

For reference, the Comprehend team's own sample uses a similar approach: https://github.com/aws-samples/amazon-comprehend-s3-object-lambda-functions/blob/main/src/processors.py#L172
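Because the end goal in the question is redaction, here is a hedged sketch of that last step: given entities whose `BeginOffset`/`EndOffset` are already relative to the full text (for example after shifting the per-chunk offsets as described above), splice each span out and replace it with its `Type`. The `redact` function and the `[TYPE]` placeholder format are assumptions for illustration, not something DetectPiiEntities does for you.

```python
from typing import Any, Dict, List


def redact(text: str, entities: List[Dict[str, Any]]) -> str:
    """Hypothetical helper: replace every entity span with a [TYPE] placeholder.

    entities are dicts with Type, BeginOffset and EndOffset (the
    DetectPiiEntities field names), with offsets relative to text.
    """
    out: List[str] = []
    last = 0  # end of the text already copied or redacted
    # Earliest span first; for equal starts, the longest span first.
    for ent in sorted(entities, key=lambda e: (e["BeginOffset"], -e["EndOffset"])):
        begin, end = ent["BeginOffset"], ent["EndOffset"]
        if end <= last:  # fully inside a span that was already redacted
            continue
        begin = max(begin, last)  # trim a partial overlap with the previous span
        out.append(text[last:begin])
        out.append(f"[{ent['Type']}]")
        last = end
    out.append(text[last:])
    return "".join(out)
```

Sorting by start and skipping spans that are already covered keeps nested detections (such as a name inside an email address) from producing duplicate placeholders.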

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 kapilsingh93