Slurm - run more than one array task on the same node?

I am running around 200 MATLAB scripts on a Slurm cluster. The scripts are not explicitly parallelized, but they make heavy use of vectorized operations, so each one uses around 5-6 cores of processing power.
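
For what it's worth, each MATLAB process's implicit multithreading can be capped so that one run fits a fixed core budget; a minimal sketch using MATLAB's built-in maxNumCompThreads (the cap of 4 here is just an example):

# cap MATLAB's implicit multithreading, here to 4 cores
matlab -nodisplay -r "maxNumCompThreads(4); run('main_cluster2.m'); exit"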

The sbatch code I am using is below:

#!/bin/bash
#SBATCH --job-name=sdmodel
#SBATCH --output=logs/out/%a
#SBATCH --error=logs/err/%a
#SBATCH --nodes=1
#SBATCH --partition=common
#SBATCH --exclusive
#SBATCH --mem=0 
#SBATCH --array=1-225

module load Matlab/R2021a
matlab -nodisplay -r "run('main_cluster2.m'); exit"

The script above assigns one cluster node to each MATLAB task (225 tasks in total). However, some cluster nodes have 20 or more cores, which means I could efficiently use one node to run 3 or 4 tasks simultaneously. Is there any way to modify the above script to do so?



Solution 1:[1]

If node sharing is not allowed on your cluster, you should be able to use multiple srun commands in the script to subdivide the node. If you wanted to use 4 cores per task (on a 20-core node), your script would change to something like:

#!/bin/bash
#SBATCH --job-name=sdmodel
#SBATCH --output=logs/out/%a
#SBATCH --error=logs/err/%a
#SBATCH --nodes=1
#SBATCH --partition=common
#SBATCH --exclusive
#SBATCH --mem=0 
#SBATCH --array=1-225

module load Matlab/R2021a

# launch 5 concurrent job steps, each limited to 4 cores of the node
for i in $(seq 1 5)
do
   srun --ntasks=1 --cpus-per-task=4 --exact matlab -nodisplay -r "run('main_cluster2.m'); exit" &
done
wait   # do not exit the batch script until all steps have finished

The "&" at the end of each srun command puts the command into the background so you can skip onto launching multiple copies. The wait at the end makes sure the script waits for all backgrounded processes to finish before exiting.

Note: this may lead to wasted resources if the individual matlab runs take very different amounts of time, as some runs will finish before others and leave their cores idle.
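
One way to reduce that idle time (not from the original answer, just a sketch) is to keep a fixed number of steps running and start the next piece of work as soon as any step finishes. This assumes bash 4.3+ for "wait -n", a hypothetical inputs/ directory holding one file per run, and that main_cluster2.m knows how to read a variable named infile:

max_running=5
for input in inputs/*.mat; do                # hypothetical list of work items
   while (( $(jobs -rp | wc -l) >= max_running )); do
      wait -n                                # block until any backgrounded step exits
   done
   srun --ntasks=1 --cpus-per-task=4 --exact \
        matlab -nodisplay -r "infile='$input'; run('main_cluster2.m'); exit" &
done
wait                                         # wait for the remaining steps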

Sources

[1] Solution 1 by AndyT. Source: Stack Overflow, licensed under CC BY-SA 3.0.