Repeat one task 100 times in parallel on Slurm

I am new to cluster computing and I want to repeat one empirical experiment 100 times in Python. For each experiment, I need to generate a set of data and solve an optimization problem, and then I want the averaged value across all repetitions. To save time, I hope to do this in parallel. For example, if I can use 20 cores, each core only needs to run 5 repetitions.

Here's an example of a test.slurm script that I use for running the test.py script on a single core:

#!/bin/bash
#SBATCH --job-name=test        
#SBATCH --nodes=1               
#SBATCH --ntasks=1              
#SBATCH --cpus-per-task=1      
#SBATCH --mem=4G                 
#SBATCH --time=72:00:00          
#SBATCH --mail-type=begin       
#SBATCH --mail-type=end         
#SBATCH --mail-user=address@email

module purge
module load anaconda3/2018.12
source activate py36

python test.py

If I want to run it on multiple cores, how should I modify the Slurm file accordingly?



Solution 1:[1]

To run the test on multiple cores, you can use srun with the -n option; -n specifies the number of processes you need to launch:

srun -n 20 python test.py

srun is the task launcher in Slurm: the command above starts 20 independent copies of test.py.
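Because all 20 copies are identical, the script itself has to decide which of the 100 repetitions each copy is responsible for. Slurm sets the SLURM_PROCID and SLURM_NTASKS environment variables for every task started by srun, and the script can read them to split the work. Here is a minimal sketch, with a hypothetical run_experiment(seed) function standing in for your actual data generation and optimization code:

import os

# Hypothetical stand-in for the experiment: generate data, solve the
# optimization problem, and return the value of interest.
def run_experiment(seed):
    return float(seed)  # placeholder result

# Slurm sets these for every task launched by srun.
task_id = int(os.environ.get("SLURM_PROCID", "0"))  # this task's rank, 0..n-1
n_tasks = int(os.environ.get("SLURM_NTASKS", "1"))  # total number of tasks

TOTAL_REPS = 100
reps_per_task = TOTAL_REPS // n_tasks  # 5 repetitions when n_tasks == 20

results = []
for i in range(reps_per_task):
    seed = task_id * reps_per_task + i  # distinct seed per repetition
    results.append(run_experiment(seed))

# Write this task's values to its own file so they can be averaged later.
with open("result_{}.txt".format(task_id), "w") as f:
    f.write("\n".join(str(r) for r in results))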

Alternatively, you can set --ntasks in the Slurm file itself and let srun pick the task count up from the job allocation. The Slurm file will then look like the one below. Note that test.py must still be launched through srun; a bare python test.py would run only a single copy. Also keep in mind that --mem is a per-node limit, so the 4G is now shared by all 20 tasks.

#!/bin/bash
#SBATCH --job-name=test        
#SBATCH --nodes=1               
#SBATCH --ntasks=20              
#SBATCH --cpus-per-task=1      
#SBATCH --mem=4G                 
#SBATCH --time=72:00:00          
#SBATCH --mail-type=begin       
#SBATCH --mail-type=end         
#SBATCH --mail-user=address@email

module purge
module load anaconda3/2018.12
source activate py36
srun python test.py

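Once the job finishes, the per-task result files can be combined into the overall average with a short post-processing script. A minimal sketch, assuming each task wrote one value per line to result_<id>.txt as in the example above:

import glob

# Collect every value written by the parallel tasks and average them.
values = []
for path in glob.glob("result_*.txt"):
    with open(path) as f:
        values.extend(float(line) for line in f if line.strip())

print(f"average over {len(values)} repetitions: {sum(values) / len(values)}")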
Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

[1] Source: Stack Overflow
