Repeat one task 100 times in parallel on Slurm
I am new to cluster computation and I want to repeat one empirical experiment 100 times in Python. For each experiment, I need to generate a set of data, solve an optimization problem, and then average the resulting values. To save time, I hope to do this in parallel. For example, if I can use 20 cores, each core only needs to run 5 repetitions.
Here's an example of a test.slurm script that I use for running the test.py script on a single core:
#!/bin/bash
#SBATCH --job-name=test
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G
#SBATCH --time=72:00:00
#SBATCH --mail-type=begin
#SBATCH --mail-type=end
#SBATCH --mail-user=address@email
module purge
module load anaconda3/2018.12
source activate py36
python test.py
If I want to run it on multiple cores, how should I modify the slurm file accordingly?
Solution 1:[1]
To run the test on multiple cores, you can use the srun -n option: the number after -n is the number of processes (tasks) you want to launch, and the job allocation must request at least that many tasks. For example:
srun -n 20 python test.py
srun is the task launcher in Slurm. Note that it starts 20 identical copies of test.py, so the script itself must decide which of the 100 replicates each copy runs, for example via the SLURM_PROCID environment variable, as sketched below.
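As a minimal sketch (assuming each replicate can be wrapped in a self-contained function, here a hypothetical run_experiment, and that each task writes its partial results to its own file), test.py could split the 100 replicates across the launched tasks like this:

```python
import json
import os
import random


def run_experiment(seed):
    """Hypothetical stand-in for one replicate: generate a data set,
    solve the optimization problem, and return the value of interest."""
    rng = random.Random(seed)
    return rng.random()  # placeholder for the real computation


def main():
    n_replicates = 100
    # Slurm sets these variables for every task launched by srun.
    task_id = int(os.environ.get("SLURM_PROCID", "0"))
    n_tasks = int(os.environ.get("SLURM_NTASKS", "1"))

    # Task k handles replicates k, k + n_tasks, k + 2*n_tasks, ...
    # With 20 tasks that is 5 replicates per task.
    results = [run_experiment(seed=i)
               for i in range(task_id, n_replicates, n_tasks)]

    # Write one partial-result file per task for later averaging.
    with open(f"partial_{task_id}.json", "w") as f:
        json.dump(results, f)


if __name__ == "__main__":
    main()
```

With 20 tasks, each copy runs replicates task_id, task_id + 20, and so on, which gives the 5 repetitions per core you described.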
Alternatively, you can change ntasks (and, if needed, cpus-per-task) in the slurm file itself. The slurm file would then look like this:
#!/bin/bash
#SBATCH --job-name=test
#SBATCH --nodes=1
#SBATCH --ntasks=20
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G
#SBATCH --time=72:00:00
#SBATCH --mail-type=begin
#SBATCH --mail-type=end
#SBATCH --mail-user=address@email
module purge
module load anaconda3/2018.12
source activate py36
# srun launches one copy of test.py per allocated task (20 copies here)
srun python test.py
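Note that --mem=4G is a per-node limit, so with 20 tasks on the node it is shared by all of them; you may need to raise it or use --mem-per-cpu instead. After the job finishes, the per-task partial results still have to be combined. A short follow-up sketch, assuming the partial_<task_id>.json files written by the test.py sketch above:

```python
import glob
import json

# Gather every per-task result file written by the tasks.
values = []
for path in glob.glob("partial_*.json"):
    with open(path) as f:
        values.extend(json.load(f))

# The quantity of interest is the average over all replicates.
print(f"average over {len(values)} replicates: {sum(values) / len(values)}")
```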
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
[1] Solution 1: Stack Overflow
