Only Two Applications ever running for Synapse Notebooks in For Each activity
I am running a Synapse Notebook in a For Each activity in a Synapse Pipeline. The notebook loads some data from the data lake into the database and does some custom processing (which is why we're using a notebook).
No matter what Spark pool configuration I use (small, medium, or large; auto-scale on or off; dynamic or fixed executor allocation; 3, 5, or 10 nodes), there are only ever two Spark applications running.
The For Each activity should run 10 executions of the notebook. Its batch count is not capped at two; it is left at the default, so in theory I would expect all 10 notebook calls to execute concurrently. Is there some other Spark configuration that is causing this to cap at 2?
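For context, the For Each concurrency is controlled by the activity's `isSequential` and `batchCount` properties in the pipeline JSON. A minimal sketch of the relevant shape (activity and notebook names here are hypothetical, not taken from the original pipeline):

```json
{
  "name": "ForEachTable",
  "type": "ForEach",
  "typeProperties": {
    "isSequential": false,
    "batchCount": 10,
    "items": { "value": "@pipeline().parameters.tableList", "type": "Expression" },
    "activities": [
      {
        "name": "RunLoadNotebook",
        "type": "SynapseNotebook",
        "typeProperties": {
          "notebook": { "referenceName": "LoadToDatabase", "type": "NotebookReference" }
        }
      }
    ]
  }
}
```

With `isSequential` set to `false` and `batchCount` at or above 10 (the default is 20), the pipeline itself will submit all 10 notebook runs; the bottleneck described in the solution below is on the Spark pool side.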
Solution 1:[1]
Had the same issue. In each notebook's session settings we decreased the maximum number of executors to a very low value. You can see the number of executors actually used after a run in the Spark UI, which helped us benchmark a reasonable value to lower our maximum executor count to. In most cases a maximum of 2 executors is all that is needed.
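One place to apply that cap is the `%%configure` magic at the top of the notebook, which sets the session's resource request before the Spark session starts (`-f` forces a restart if a session is already running). A minimal sketch, with purely illustrative values:

```
%%configure -f
{
    "executorCores": 4,
    "executorMemory": "28g",
    "numExecutors": 2
}
```

The same limits can alternatively be set per invocation in the Synapse notebook activity's settings in the pipeline, so different For Each runs can request different sizes.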
Consider the math for a small pool (4 vCores per node) with a maximum of 40 nodes. If I set the max executors in my notebook to 2, that notebook will consume 2 executors × 4 vCores = 8 cores. The pool has 40 nodes × 4 vCores = 160 cores to offer in total, so the notebook is using 5% of the pool.
But if you only allow your small pool 10 nodes with a maximum of 5 executors per notebook, the total available is 10 × 4 = 40 cores, and each notebook uses 5 × 4 = 20 cores. That is how you end up with only 2 notebooks running at a time.
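The same arithmetic as a quick sanity check, ignoring driver overhead (the pool sizes and executor settings are just the two scenarios above):

```python
def max_concurrent_notebooks(pool_nodes: int, vcores_per_node: int,
                             executors_per_notebook: int, cores_per_executor: int) -> int:
    """Rough count of notebook sessions that fit in the pool at once (driver cores ignored)."""
    pool_cores = pool_nodes * vcores_per_node
    cores_per_notebook = executors_per_notebook * cores_per_executor
    return pool_cores // cores_per_notebook

# 40-node small pool, 2 executors per notebook: 160 // 8 = 20 concurrent notebooks
print(max_concurrent_notebooks(40, 4, 2, 4))

# 10-node small pool, 5 executors per notebook: 40 // 20 = 2 concurrent notebooks
print(max_concurrent_notebooks(10, 4, 5, 4))
```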
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
| --- | --- |
| Solution 1 | dataengineering1234 |