Enforce srun to use exclusive cores on a single socket
I'm using sbatch and I have a node with 2 sockets, each having 18 cores, totalling 36 cores. I'm launching 4 scripts where each has two tasks that share a GPU:
#SBATCH --ntasks=2
#SBATCH --cpus-per-task=4
#SBATCH --gres=gpu:1
Running this configuration 4 times allocates 4 x 2 x 4 = 32 cores. How can I make sure that every distinct job gets exclusive CPUs allocated only within a single socket? In other words, a job must never be allocated e.g. CPUs 0, 1, 22, 33, since they sit on two different sockets, and each job should have exactly 4 CPUs available when looking at cpu-bind.
Of course I could play with CPU masks by hand, but the node configuration and the number of jobs vary, and I don't want to do that for every configuration.
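To make the single-socket requirement concrete, here is a minimal bash sketch that reports which sockets a given CPU list touches. The helper name `sockets_of_cpus` is hypothetical, and it assumes CPUs are numbered contiguously per socket (socket = CPU ID / cores per socket), which may not hold on a real node; the actual mapping can be read with `lscpu -p=CPU,SOCKET`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the distinct socket IDs touched by a list of CPU IDs.
# Assumption: CPUs are numbered contiguously per socket,
# i.e. socket = cpu / cores_per_socket. Real nodes may number CPUs differently.
sockets_of_cpus() {
    local cores_per_socket=$1
    shift
    local cpu
    for cpu in "$@"; do
        echo $(( cpu / cores_per_socket ))
    done | sort -nu | paste -sd' ' -
}

# 18 cores per socket, as on the node described above.
sockets_of_cpus 18 0 1 22 33   # spans both sockets
sockets_of_cpus 18 4 5 6 7     # stays on one socket
```

An allocation satisfies the requirement only when the helper prints a single socket ID.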
I was looking at --cpu-bind=sockets but it seems it does not allocate exclusive processors:
cpu-bind=MASK - mycomp, task 0 0 [83008]: mask 0xff set
cpu-bind=MASK - mycomp, task 1 1 [83009]: mask 0xff set
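The reported masks can be decoded to see why this binding is not exclusive. The following is a small sketch, not part of Slurm: `mask_to_cpus` is a hypothetical helper that lists the CPU IDs whose bits are set in a hex mask. It relies on bash integer arithmetic, so it assumes the mask fits in 63 bits.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the CPU IDs whose bits are set in a hex affinity mask.
# Assumption: the mask fits in a bash integer (fewer than 64 CPUs).
mask_to_cpus() {
    local mask=$(( $1 ))   # bash arithmetic accepts 0x-prefixed hex
    local cpu=0
    local cpus=()
    while (( mask > 0 )); do
        if (( mask & 1 )); then
            cpus+=("$cpu")
        fi
        mask=$(( mask >> 1 ))
        cpu=$(( cpu + 1 ))
    done
    echo "${cpus[*]}"
}

mask_to_cpus 0xff   # the mask reported for both task 0 and task 1
```

Mask `0xff` has bits 0 through 7 set, so both tasks are bound to the same eight CPUs (0-7) rather than to four exclusive CPUs each.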
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
