Server performance question about streaming from Cosmos DB

I read the article here about IAsyncEnumerable, specifically as applied to a Cosmos DB data source:

public async IAsyncEnumerable<T> Get<T>(string containerName, string sqlQuery)
{
    var container = GetContainer(containerName);
    using FeedIterator<T> iterator = container.GetItemQueryIterator<T>(sqlQuery);

    while (iterator.HasMoreResults)
    {
        foreach (var item in await iterator.ReadNextAsync())
        {
            yield return item;
        }
    } 
}

I am wondering how Cosmos DB handles this compared to paging, let's say 100 documents at a time. We have had some "429 - Request rate too large" errors in the past and I don't want to create new ones.

So how will this affect server load and performance? From the server's perspective, I don't see a big difference between the client streaming the results (and doing some quick checks on each item) and the old way: looping on while (iterator.HasMoreResults) and collecting all the documents in a list.



Solution 1:[1]

The SDK retrieves documents in batches whose size can be adjusted by setting MaxItemCount on the QueryRequestOptions (it defaults to 100 if not set). There is no option to throttle RU usage, though, other than letting requests run into the 429 error and relying on the SDK's retry mechanism to try again a little later. Depending on how generously you configure the retry mechanism, it will retry often and long enough to get a proper response.
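As a rough sketch of the two knobs mentioned above, the batch size is set per query and the 429 retry behaviour is set on the client. The connection string, database name, container name and query text are placeholders, and the numeric values are only illustrative; this assumes the Microsoft.Azure.Cosmos (v3) package.

```csharp
using System;
using Microsoft.Azure.Cosmos;

// Placeholders: substitute your own connection string, database and container.
var client = new CosmosClient(
    "<connection-string>",
    new CosmosClientOptions
    {
        // How many times the SDK retries a request that was answered with 429.
        MaxRetryAttemptsOnRateLimitedRequests = 9,
        // Upper bound on the total time spent waiting across those retries.
        MaxRetryWaitTimeOnRateLimitedRequests = TimeSpan.FromSeconds(30)
    });

var container = client.GetContainer("<database>", "<container>");

// MaxItemCount caps how many documents a single ReadNextAsync() page may return.
using FeedIterator<dynamic> iterator = container.GetItemQueryIterator<dynamic>(
    "SELECT * FROM c",
    requestOptions: new QueryRequestOptions { MaxItemCount = 100 });
```

Smaller pages mean each request costs fewer RUs, but more round trips are needed to read the same result set.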

If you want to limit RU usage, for example because multiple processes use the same Cosmos DB account and you don't want them to cause 429 errors, you have to write that logic yourself.

An example of how something like that could look:

var qry = container
    .GetItemLinqQueryable<Item>(requestOptions: new() { MaxItemCount = 2000 })
    .ToFeedIterator();

var results = new List<Item>();

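var stopwatch = new Stopwatch(); // requires: using System.Diagnostics;
var targetRuMsRate = 200d / 1000; // target of 200 RU/s, expressed in RU per millisecond
var previousElapsed = 0L;
var delay = 0;
stopwatch.Start();
var totalCharge = 0d;

while (qry.HasMoreResults)
{
    // Pause only when the previous page consumed more RUs than the target rate allows.
    if (delay > 0)
    {
        await Task.Delay(delay);
    }
    previousElapsed = stopwatch.ElapsedMilliseconds;

    var response = await qry.ReadNextAsync();
    var charge = response.RequestCharge;
    totalCharge += charge; // running total of RUs consumed by the query
    var elapsed = stopwatch.ElapsedMilliseconds;

    // Time the request itself took; a zero or negative delay means no wait is needed.
    var delta = elapsed - previousElapsed;
    delay = (int) ((charge - targetRuMsRate * delta) / targetRuMsRate);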

    foreach (var item in response)
    {
        results.Add(item);
    }
}
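To make the delay arithmetic in the loop above easier to check, here is the same calculation pulled out into a small helper. RuThrottle and DelayMs are hypothetical names used only for illustration, not part of the SDK. The expression is rearranged to charge / rate - elapsed, which is algebraically the same as the loop's (charge - rate * delta) / rate.

```csharp
using System;

public static class RuThrottle
{
    // Hypothetical helper: how many milliseconds to wait before requesting the
    // next page so that the RUs just charged average out to the target rate.
    public static int DelayMs(double chargeRu, long requestMs, double targetRuPerSecond)
    {
        // Time this charge "should" occupy at the target rate, in milliseconds.
        var budgetMs = chargeRu * 1000d / targetRuPerSecond;

        // Subtract the time the request already took; never return a negative delay.
        return (int)Math.Max(0d, budgetMs - requestMs);
    }
}
```

For example, a page that cost 50 RU and took 100 ms against a 200 RU/s target gives RuThrottle.DelayMs(50, 100, 200) == 150, while a page that cost only 10 RU in the same time gives 0, so the loop continues immediately.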

Edit:

Internally, the SDK calls the underlying Cosmos DB REST API. Once your code reaches iterator.ReadNextAsync(), it calls the query documents method in the background. If you dig into the source code or intercept the message sent to HttpClient, you can see that the resulting request lacks the x-ms-max-item-count header, which determines the number of documents it will try to retrieve (unless you have specified a MaxItemCount yourself). According to the Microsoft Docs, it defaults to 100 if not set:

Query requests support pagination through the x-ms-max-item-count and x-ms-continuation request headers. The x-ms-max-item-count header specifies the maximum number of values that can be returned by the query execution. This can be between 1 and 1000, and is configured with a default of 100.
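Since a page is only requested when ReadNextAsync() is called, the streaming version and the collect-into-a-list version should issue the same page requests when the consumer reads everything. They differ only if the consumer stops early. The sketch below simulates this with a fake in-memory source; FakePagedSource and PagingDemo are made-up stand-ins, not the Cosmos SDK, and it assumes the real FeedIterator is equally lazy.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Stand-in for a FeedIterator: serves fixed-size pages and counts page requests.
public sealed class FakePagedSource
{
    private readonly int _totalItems;
    private readonly int _pageSize;
    private int _next;

    public int PageRequests { get; private set; }

    public FakePagedSource(int totalItems, int pageSize)
    {
        _totalItems = totalItems;
        _pageSize = pageSize;
    }

    public bool HasMoreResults => _next < _totalItems;

    public Task<IReadOnlyList<int>> ReadNextAsync()
    {
        PageRequests++; // each call represents one round trip to the server
        var page = new List<int>();
        while (page.Count < _pageSize && _next < _totalItems)
        {
            page.Add(_next++);
        }
        return Task.FromResult<IReadOnlyList<int>>(page);
    }
}

public static class PagingDemo
{
    // Same shape as the Get<T> method in the question.
    public static async IAsyncEnumerable<int> Stream(FakePagedSource source)
    {
        while (source.HasMoreResults)
        {
            foreach (var item in await source.ReadNextAsync())
            {
                yield return item;
            }
        }
    }

    // Consumes the stream until stopAt is seen; returns how many pages were requested.
    public static async Task<int> PagesRequestedUntil(int stopAt, int totalItems, int pageSize)
    {
        var source = new FakePagedSource(totalItems, pageSize);
        await foreach (var item in Stream(source))
        {
            if (item == stopAt) break;
        }
        return source.PageRequests;
    }
}
```

With 1000 items in pages of 100, await PagingDemo.PagesRequestedUntil(150, 1000, 100) returns 2 because enumeration stops inside the second page, whereas a value that never matches (such as -1) reads all 10 pages, the same as collecting everything into a list.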

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1