Slurm: make #SBATCH --array read the number of lines of a txt file

I have the following Slurm script (script.sh) that runs 25 jobs in parallel on an HPC cluster via #SBATCH --array=0-24. Each job takes one line from file.txt and uses it as the $VAR variable.

#!/bin/bash
#SBATCH --job-name test
#SBATCH --ntasks 4
#SBATCH --time 00-05:00
#SBATCH --output out
#SBATCH --error err
#SBATCH --array=0-24

# Read each line of file.txt into the VARS array,
# then pick the line matching this task's array index
readarray -t VARS < file.txt
VAR=${VARS[$SLURM_ARRAY_TASK_ID]}
export VAR

cat test_"$VAR".txt

In this case I know the number of jobs to run by doing wc -l file.txt, which returns 25: each line of file.txt corresponds to one job.
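For illustration (hypothetical contents), a smaller file makes the off-by-one explicit: wc -l counts the lines, but array task IDs start at 0, so n lines need --array=0-(n-1).

printf 'alpha\nbeta\ngamma\n' > file.txt   # hypothetical three-line input
wc -l file.txt                             # prints: 3 file.txt
# matching directive: #SBATCH --array=0-2
# task 0 reads alpha, task 1 beta, task 2 gamma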

Is there a way to avoid running wc -l file.txt by hand and have script.sh work out the number of jobs automatically?



Solution 1:[1]

You could use a here document in bash to achieve this. For example, the following script shows one possible approach:

#!/bin/bash

# Read number of lines in file supplied as argument
nline=$(wc -l "$1" | awk '{print $1}')

# Create the Slurm script ($ used in script need to be escaped: \$)
sbatch <<EOF
#!/bin/bash
#SBATCH --job-name test
#SBATCH --ntasks 4
#SBATCH --time 00-05:00
#SBATCH --output out
#SBATCH --error err
#SBATCH --array=0-$(( nline - 1 ))

readarray -t VARS < "$1"
VAR=\${VARS[\$SLURM_ARRAY_TASK_ID]}
export VAR

bash my_script.sh
EOF

Note that variables belonging to the generated script rather than to the setup (in this case: VARS and SLURM_ARRAY_TASK_ID) must have their $ escaped (i.e. \$), otherwise the wrapping bash script will expand them while building the here document.
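To see that expansion rule in isolation, here is a minimal sketch independent of Slurm (the variable name N is made up for the demonstration):

# The here document is expanded while it is being built:
# $N is substituted immediately, \$N survives as a literal $N.
N=42
cat <<EOF
unescaped: $N
escaped:   \$N
EOF
# Output:
#   unescaped: 42
#   escaped:   $N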

If this script were saved in a file runarray.bash, and your file with one line per subjob were file.txt, you would submit the job with:

bash runarray.bash file.txt
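As an aside, sbatch also accepts --array on the command line, and command-line options take precedence over the #SBATCH directives inside the script. So one possible sketch, assuming script.sh is kept exactly as in the question (its #SBATCH --array=0-24 line would simply be overridden), computes the range at submission time:

# wc -l < file omits the filename from the output, leaving just the count
sbatch --array=0-$(( $(wc -l < file.txt) - 1 )) script.sh

And to preview what runarray.bash would generate without submitting anything, you could temporarily replace sbatch with cat and inspect the assembled script.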

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

[1] Solution 1: Stack Overflow