Slurm: make #SBATCH --array read the number of lines of a txt file
I have the following Slurm script (script.sh) that runs 25 jobs in parallel on an HPC cluster with `#SBATCH --array=0-24`. Each job takes one line from file.txt and uses it as the `$VAR` variable.
```bash
#!/bin/bash
#SBATCH --job-name test
#SBATCH --ntasks 4
#SBATCH --time 00-05:00
#SBATCH --output out
#SBATCH --error err
#SBATCH --array=0-24

# Read file.txt into an array, one line per element,
# then select the line matching this task's array index.
readarray -t VARS < file.txt
VAR=${VARS[$SLURM_ARRAY_TASK_ID]}
export VAR

cat test_"$VAR".txt
```
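For illustration (the file contents here are hypothetical), if file.txt were:

```
sample_A
sample_B
sample_C
```

then array task 0 would get `VAR=sample_A` and run `cat test_sample_A.txt`, task 1 would get sample_B, and so on.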
In this case I know the number of jobs to run by doing `wc -l file.txt`, which returns 25: each line of file.txt corresponds to one job.
Is there a way to avoid running `wc -l file.txt` by hand and have script.sh work out the number of jobs automatically?
Solution 1:[1]
You could use a here document in bash to achieve this. For example, the following script shows one possible approach:
```bash
#!/bin/bash
# Read the number of lines in the file supplied as the first argument
nline=$(wc -l "$1" | awk '{print $1}')

# Create and submit the Slurm script as a here document.
# Dollar signs that belong to the generated script must be escaped: \$
sbatch <<EOF
#!/bin/bash
#SBATCH --job-name test
#SBATCH --ntasks 4
#SBATCH --time 00-05:00
#SBATCH --output out
#SBATCH --error err
#SBATCH --array=0-$(( nline - 1 ))

readarray -t VARS < "$1"
VAR=\${VARS[\$SLURM_ARRAY_TASK_ID]}
export VAR

bash my_script.sh
EOF
```
Note that variables which belong to the generated script, not to the setup (in this case: VARS and SLURM_ARRAY_TASK_ID), must have their `$` escaped (i.e. `\$`), otherwise the wrapping bash script will try to expand them itself.
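As a minimal standalone illustration of this escaping rule (not part of the original answer), compare what an unescaped and an escaped `$` produce inside an unquoted here document:

```bash
#!/bin/bash
# Unescaped $ is expanded immediately by this (outer) shell;
# escaped \$ survives literally into the generated text.
cat <<EOF
expanded now:   $HOME
left for later: \$HOME
EOF
```

Running this prints your home directory on the first line and the literal string `$HOME` on the second.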
If this script were saved in a file runarray.bash, and your file with one line per subjob were file.txt, you would submit the job with:

```bash
bash runarray.bash file.txt
```
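A possible alternative, in case you would rather not generate a script at all: sbatch lets command-line options take precedence over `#SBATCH` directives, so you could keep script.sh unchanged and compute the array range at submission time. A minimal sketch, assuming file.txt ends with a trailing newline (since `wc -l` counts newline characters):

```bash
sbatch --array=0-$(( $(wc -l < file.txt) - 1 )) script.sh
```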
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Stack Overflow |
