How can I fetch documents in batches from CosmosDB and process them?

I am trying to fetch documents from CosmosDB and then run a foreach loop over the documents returned. I am doing it as follows:

var productListFromHAPI = 
    await CosmosDb.GetProductDataFromHAPI(brand, deployedCountry,
         primaryLocale, secondaryLocale, _rawDataContainer, log);

var finalListOfObjects = new List<StorelensItemModel_V3>();

foreach (var storeVariantToInsert in productListFromHAPI) { processing here }

The problem is that GetProductDataFromHAPI returns millions of documents, and the host runs out of resources no matter how large I make it.

How can I split this up so that I fetch and process 1000 documents at a time? I know I can use SELECT TOP 1000 and so on, but how do I then know that the second round is not fetching the same items again?

I tried to use OFFSET and LIMIT as well, but I could not get it to work.

Pagination does not seem to be a good fit for this use case.



Solution 1:[1]

It looks like you are using a wrapper GetProductDataFromHAPI that is calling the Cosmos SDK underneath.

The Cosmos SDK FeedIterators allow you to paginate, consuming one page of results at a time:

Reference: https://docs.microsoft.com/dotnet/api/microsoft.azure.cosmos.feediterator-1?view=azure-dotnet#examples

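using (FeedIterator<StorelensItemModel_V3> feedIterator = this.Container.GetItemQueryIterator<StorelensItemModel_V3>(
    "query")) // replace "query" with your SQL query text
{
    while (feedIterator.HasMoreResults)
    {
        // Each call fetches only the next page; the iterator tracks its position,
        // so pages that were already read are not returned again.
        FeedResponse<StorelensItemModel_V3> response = await feedIterator.ReadNextAsync();

        // Process the page here, or yield it to an upper layer.
        // To stop and resume later, save response.ContinuationToken and pass it
        // to GetItemQueryIterator the next time you want to continue the query.
    }
}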
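
To connect the iterator loop above to the original foreach, one option is to expose the pages as an async stream so that only one page is held in memory at a time. The following is a minimal sketch, not the wrapper's actual implementation: the class name ProductBatchReader, the method ReadPagesAsync, and its parameters are hypothetical, and the query text is left to the caller. It assumes the v3 Microsoft.Azure.Cosmos SDK and C# 8 or later. Note that MaxItemCount is an upper bound, so a page may contain fewer than 1000 items.

```csharp
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using Microsoft.Azure.Cosmos;

// Hypothetical helper; names are illustrative and not part of the SDK.
public static class ProductBatchReader
{
    // Streams query results page by page instead of loading everything at once.
    // Pass a previously saved continuation token to resume an earlier run.
    public static async IAsyncEnumerable<FeedResponse<T>> ReadPagesAsync<T>(
        Container container,
        string queryText,
        int pageSize = 1000,
        string continuationToken = null,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        // MaxItemCount caps the number of items returned per page.
        var options = new QueryRequestOptions { MaxItemCount = pageSize };

        using FeedIterator<T> iterator = container.GetItemQueryIterator<T>(
            new QueryDefinition(queryText),
            continuationToken: continuationToken,
            requestOptions: options);

        while (iterator.HasMoreResults)
        {
            // Fetches the next page only; the iterator keeps track of its position.
            FeedResponse<T> page = await iterator.ReadNextAsync(cancellationToken);
            yield return page;
        }
    }
}
```

A caller could then process each page as it arrives, for example:

```csharp
await foreach (FeedResponse<StorelensItemModel_V3> page in
    ProductBatchReader.ReadPagesAsync<StorelensItemModel_V3>(_rawDataContainer, "query"))
{
    foreach (StorelensItemModel_V3 storeVariantToInsert in page)
    {
        // processing here
    }

    // page.ContinuationToken can be persisted here to resume after a restart.
}
```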

As a side note, keep in mind the performance tips when constructing and executing queries: https://docs.microsoft.com/azure/cosmos-db/sql/performance-tips-query-sdk?tabs=v3&pivots=programming-language-csharp

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Matias Quaranta