Is it possible to use different CPUs for different job steps with Slurm?

I deployed a Slurm (version 19.05.5) control node and compute node locally. I want to run job steps with srun inside an sbatch script, in parallel, using one CPU per job step.

So I wrote the following script date.sh:

#!/bin/sh

sleep 10
date +%s

And the following sbatch script:

#!/bin/sh
#SBATCH --job-name=TestJob
#SBATCH --time=00:05:00
#SBATCH --cpus-per-task=1
#SBATCH --ntasks=3

srun -l --exclusive --ntasks=1 date.sh &
srun -l --exclusive --ntasks=1 date.sh &
srun -l --exclusive --ntasks=1 date.sh &
srun -l --exclusive --ntasks=1 date.sh &
srun -l --exclusive --ntasks=1 date.sh &
wait

I expected the first three job steps to run in parallel, as three different tasks on three different CPUs. Instead, the result is the following:

0: 1652629977
0: 1652629977
0: 1652629977
srun: Job 42 step creation temporarily disabled, retrying
srun: Job 42 step creation temporarily disabled, retrying
srun: Step created for job 42
srun: Step created for job 42
0: 1652629987
0: 1652629987
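
One way to read this output, assuming each printed timestamp marks the moment a step finished its `sleep 10`, is to count how many steps reported each timestamp. The sketch below is not part of the original scripts; the `count_by_timestamp` helper is a hypothetical name, and the sample text is copied from the output above.

```sh
#!/bin/sh
# Sample output copied verbatim from the run shown above.
output='0: 1652629977
0: 1652629977
0: 1652629977
srun: Job 42 step creation temporarily disabled, retrying
srun: Job 42 step creation temporarily disabled, retrying
srun: Step created for job 42
srun: Step created for job 42
0: 1652629987
0: 1652629987'

# Hypothetical helper: tally output lines per timestamp.
count_by_timestamp() {
    # Keep only "<label>: <epoch>" lines, ignore srun messages,
    # count occurrences of each epoch, and sort for a stable order.
    awk '/^[0-9]+: [0-9]+$/ { n[$2]++ } END { for (t in n) print t, n[t] }' | sort
}

summary=$(printf '%s\n' "$output" | count_by_timestamp)
echo "$summary"
```

For the sample above this prints `1652629977 3` and `1652629987 2`: three steps reported the same second, and the remaining two reported 10 seconds later, which matches the length of the sleep.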

So all the runs seem to have been executed as the same task, which means only one CPU was used. How do I force the steps to run as different tasks, other than splitting the srun commands across multiple scripts and calling sbatch on each one with --ntasks=1?

More info:

  • Removing --exclusive results in all 5 steps running simultaneously in 1 task.
  • Removing both --exclusive and --ntasks=1 results in each step running 3 times, in 3 different tasks.
  • Removing only --ntasks=1 also results in each step running 3 times, in 3 different tasks, but srun first prints warnings that job steps cannot be created and then notifications once they are created.
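
To compare the variants above, it may help to print where each step actually runs rather than relying on the `-l` label, which shows the task rank within a step (so a single-task step is always labelled `0`). The following is a minimal sketch, not part of the original scripts: `report_placement` is a hypothetical helper that could replace the `date +%s` line in date.sh. It assumes a Linux compute node, and the CPU list only reflects Slurm's allocation if task binding (for example via a task affinity or cgroup plugin) is enabled on the cluster.

```sh
#!/bin/sh
# Hypothetical diagnostic helper (not from the original post).
# Prints the job step, the task rank within that step, and the CPUs
# the kernel allows this process to run on.
report_placement() {
    # SLURM_STEP_ID and SLURM_PROCID are set by srun for each task;
    # fall back to "n/a" when run outside Slurm.
    step="${SLURM_STEP_ID:-n/a}"
    rank="${SLURM_PROCID:-n/a}"
    # Cpus_allowed_list is Linux-specific; report "unknown" if it is missing.
    cpus=$(awk '/^Cpus_allowed_list:/ { print $2 }' /proc/self/status 2>/dev/null)
    echo "step=${step} rank=${rank} cpus=${cpus:-unknown} time=$(date +%s)"
}

report_placement
```

If the steps are bound to different CPUs, the `cpus=` field should differ between steps that share a timestamp, even though every line carries the same rank.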


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow