Azure Data Factory copy to CosmosDB throttling
I have an Azure Data Factory pipeline with a 'Copy' step that takes a blob file containing JSON data and copies it into my CosmosDB.
The blob file is 75 MB and my CosmosDB is scaled to 10,000 RU/s (autoscale). The Azure Data Factory pipeline takes about 5 minutes to copy over all the data, but the main problem is that CosmosDB throttles under the many requests: on the metrics page, 'Normalized RU Consumption' spikes to 100% instantly.
I have been looking for a way to make the Data Factory pipeline spend more time on the copy step instead of pushing the data this fast. I tried adjusting the settings of the 'Copy' step in Data Factory, but that did not change anything.

Is there another way to make sure the Data Factory pipeline does not consume all the RUs? It is not a problem if the pipeline runs for an hour or more. The current issue is that my CosmosDB database is unavailable during the copy because Data Factory takes up all the RUs, so other requests are returned a 429 'Too Many Requests' error.
Any suggestions are welcome!
EDIT: I have upscaled my CosmosDB to 50,000 RU/s just to test. The Data Factory pipeline now succeeded in 2 minutes. That is a good improvement, but it still occupied 100% of the RUs, and the database was unavailable for about 5 minutes (I think CosmosDB still performs some work after the Data Factory pipeline succeeds). This is what I'd like to prevent: the 100% spikes. Ideally only 50% of the RUs would be utilised and the copy would take twice as long. Would this be possible?
Solution 1:[1]
I do not know of any way to set a simple RU limit. But...
Last time I was in this position, it did seem to help to manually limit Data Integration Units (DIUs) and the degree of copy parallelism to small numbers. A smaller number of concurrent clients should put SOME upper limit on write throughput. It's not an exact science, though; it may also depend on the source type how streamed/parallel the read side is to begin with.
Another delaying measure was to set a higher retry interval for the copy activity. This way, when ADF itself got throttled, it created openings for other clients to be served, at the cost of increased duration and cost of the ADF run. That is most likely acceptable for one-time loads.
I also played with the sink write batch size to reduce protocol chattiness and improve overall ingestion time. I'm not sure whether it affected overall RU usage, so it is most likely an aspect to balance.
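Taken together, these knobs live in the copy activity definition. A rough sketch of where they sit (the activity name, dataset types, and the specific values are placeholders to illustrate, not recommendations; tune them against your own RU budget):

```json
{
    "name": "CopyBlobToCosmos",
    "type": "Copy",
    "policy": {
        "retry": 5,
        "retryIntervalInSeconds": 120
    },
    "typeProperties": {
        "source": { "type": "JsonSource" },
        "sink": {
            "type": "CosmosDbSqlApiSink",
            "writeBatchSize": 1000
        },
        "dataIntegrationUnits": 2,
        "parallelCopies": 1
    }
}
```

Lowering `dataIntegrationUnits` and `parallelCopies` throttles the write side, while `retryIntervalInSeconds` controls how long ADF backs off after a 429 before retrying.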
Another trick you could use is to partition the input file into smaller chunks in ADF, push the batches sequentially, and introduce small delays between batches yourself, to keep some RUs available for other clients between every batch.
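The batch-with-pauses idea above can also be driven from outside ADF, e.g. with a small script. A minimal sketch, assuming the documents are already parsed into a list and that `write_batch` is whatever you use to persist a batch (for instance a wrapper around the Cosmos SDK's `upsert_item`); the batch size and pause length are illustrative:

```python
import time
from typing import Callable, Iterable, List


def chunked(items: List[dict], batch_size: int) -> Iterable[List[dict]]:
    """Yield successive fixed-size batches from a list of documents."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def ingest_with_pauses(docs: List[dict],
                       write_batch: Callable[[List[dict]], None],
                       batch_size: int = 500,
                       pause_seconds: float = 2.0) -> int:
    """Push batches sequentially, sleeping between them so some RU
    headroom remains for other clients. Returns the number of batches sent."""
    batches_sent = 0
    for batch in chunked(docs, batch_size):
        write_batch(batch)          # e.g. upsert each doc via the Cosmos SDK
        batches_sent += 1
        time.sleep(pause_seconds)   # deliberate idle window between batches
    return batches_sent
```

The sequential loop plus the sleep is the whole point: it caps the instantaneous write rate, trading total duration for RU headroom, which is exactly the 50%-RUs-for-double-the-time behaviour the question asks about.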
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Imre Pühvel |

