Only Two Applications ever running for Synapse Notebooks in For Each activity

I am running a Synapse Notebook in a For Each activity in a Synapse Pipeline. The notebook loads some data from the data lake into the database and performs some custom processing (which is why we're using a notebook).


No matter what Spark pool configuration I use (small, medium, or large node size; auto-scale on or off; dynamic executor allocation or fixed; 3, 5, or 10 nodes), there are only ever two Spark applications running:

always two there are, no more, no less

The For Each activity should run 10 executions of the notebook; its batch count is not capped at two but left at the default. So in theory I would expect all 10 notebook calls to execute concurrently. Is there any other Spark config that is causing this to cap at 2?



Solution 1:[1]

I had the same issue. In each notebook's session settings we decreased the maximum number of executors to a much lower value. You can see the number of executors actually used after a run in the Spark UI, which helped us benchmark a reasonable value to set as the maximum. In most cases a maximum of 2 executors is all that is needed.
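For illustration, one way to pin the executor count per notebook is the %%configure session magic in the first cell of the Synapse notebook. This is a sketch only; the memory and core values below are assumptions sized for a Small node pool, so adjust them to your pool SKU:

    %%configure -f
    {
        "numExecutors": 2,
        "executorCores": 4,
        "executorMemory": "28g"
    }

The same limit can also be set through the "Configure session" option in the notebook toolbar instead of the magic.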

Consider the math for a Small pool (4 vCores per node) with a maximum of 40 nodes. If I set the max executors in my notebook to 2, that notebook will consume 2 executors × 4 vCores = 8 cores. The pool has 40 nodes × 4 vCores = 160 cores to offer in total, so the notebook is using only 5% of the pool.

But if you only allow your Small pool 10 nodes and a max of 5 executors per notebook, the total available is 10 × 4 = 40 cores. Each notebook would use 5 × 4 = 20 cores, and that's how you end up with only 2 notebooks running at a time.
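As a back-of-the-envelope sketch of that arithmetic (using the same assumed values as above, and ignoring the cores taken by each session's driver for simplicity):

    # rough capacity check for concurrent notebook sessions in a pool
    vcores_per_node = 4            # Small node size
    max_pool_nodes = 10            # pool capped at 10 nodes
    max_executors_per_notebook = 5 # per-notebook executor limit

    pool_cores = max_pool_nodes * vcores_per_node                       # 40
    cores_per_notebook = max_executors_per_notebook * vcores_per_node   # 20

    concurrent_notebooks = pool_cores // cores_per_notebook             # 2
    print(concurrent_notebooks)

Lowering the per-notebook executor limit (or raising the pool's node cap) increases how many notebook sessions can run side by side.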

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 dataengineering1234