400 Document pages exceed the limit: "PAGE_LIMIT_EXCEEDED"

The DocumentProcessorServiceAsyncClient.process_document method fails with the following error message: 400 Document pages exceed the limit: "PAGE_LIMIT_EXCEEDED". According to the API documentation, this processor should be able to handle a maximum of 200 pages. By using the DocumentProcessorServiceAsyncClient instead of the DocumentProcessorServiceClient, I assumed that I would be able to leverage the asynchronous maximum page limit. This does not appear to be the case.

The sample code I am testing:

from google.cloud import documentai

# project_id, gcloud_region and processor_id are defined elsewhere in my code
api_path = f'projects/{project_id}/locations/{gcloud_region}/processors/{processor_id}'
documentai_client = documentai.DocumentProcessorServiceAsyncClient() # maybe pass some client_options here?

async def invoke_invoice_processor(self, filebytes):
    # Wrap the raw PDF bytes for an inline processing request
    raw_document = documentai.RawDocument(
        content=filebytes,
        mime_type="application/pdf",
    )
    request = documentai.ProcessRequest(
        name=api_path,
        raw_document=raw_document,
    )
    # This is the call that raises PAGE_LIMIT_EXCEEDED for PDFs over 10 pages
    response = await documentai_client.process_document(request=request)
    return response.document

The above code block works with PDFs of 10 pages or fewer. It fails only with PDFs longer than 10 pages.

My question: what do I need to change in the above code to successfully process PDFs longer than 10 pages?
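
One possible direction, offered as a sketch rather than a confirmed fix: the "asynchronous" page limit in the documentation most likely refers to batch processing (batch_process_documents), not to the asyncio flavor of the client. process_document is still an online request even when awaited, so the lower page limit would apply to it. The example below assumes the PDF has already been uploaded to Cloud Storage and that an output location exists; the helper names build_batch_request and batch_process_invoice, and the gs:// URIs in the usage note, are hypothetical. The request is passed as a plain dict whose keys mirror the fields of documentai.BatchProcessRequest.

```python
def build_batch_request(processor_name, gcs_input_uri, gcs_output_uri,
                        mime_type="application/pdf"):
    """Build a dict shaped like documentai.BatchProcessRequest (hypothetical helper)."""
    return {
        "name": processor_name,
        # Batch processing reads its input from Cloud Storage, not inline bytes
        "input_documents": {
            "gcs_documents": {
                "documents": [{"gcs_uri": gcs_input_uri, "mime_type": mime_type}]
            }
        },
        # Results are written as JSON files under this Cloud Storage prefix
        "document_output_config": {
            "gcs_output_config": {"gcs_uri": gcs_output_uri}
        },
    }


async def batch_process_invoice(processor_name, gcs_input_uri, gcs_output_uri):
    """Start a batch job and wait for it to finish (sketch; assumes valid credentials)."""
    # Imported here so build_batch_request can be used without the library installed
    from google.cloud import documentai

    client = documentai.DocumentProcessorServiceAsyncClient()
    request = build_batch_request(processor_name, gcs_input_uri, gcs_output_uri)
    # Returns a long-running operation instead of the parsed document
    operation = await client.batch_process_documents(request=request)
    await operation.result()
    # The parsed output now lives under gcs_output_uri and must be read from there
    return gcs_output_uri
```

Usage would look something like asyncio.run(batch_process_invoice(api_path, "gs://my-bucket/in/invoice.pdf", "gs://my-bucket/out/")). Unlike process_document, this does not return the document directly; the output JSON has to be downloaded from the bucket afterwards.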



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
