How to start multiple job instances in a shell script to process multiple files in a directory?
#!/bin/bash
data_dir=./all   # directory containing the files to process
for file_name in "$data_dir"/*
do
echo "$file_name"
python process.py "$file_name"
done
This script, for example, processes the files in a directory sequentially in a for loop. Is it possible to start multiple process.py instances so that the files are processed concurrently? I want to do this in a shell script.
Solution 1:[1]
I have another possibility for you, if it is still needed. It uses the screen command to start each supplied command in a new detached process.
Here is an example:
#!/bin/bash
data_dir=./all
for file_name in "$data_dir"/*
do
echo "$file_name"
screen -dm python process.py "$file_name"   # -dm: start a detached screen session running this command
done
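Note that this starts one detached screen session per file, with no limit on how many run at once; you can list the resulting sessions afterwards with screen -ls.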
Solution 2:[2]
It's better to handle this from Python itself: list the files with os.listdir and start a new process for each one with subprocess.Popen.
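A minimal sketch of that approach, assuming the same ./all directory and process.py script as in the question (the use of sys.executable and the final waiting loop are additions, not something the answer spells out):
import os
import subprocess
import sys

data_dir = "./all"  # assumed to match the directory from the question

# Start one process.py instance per file; nothing here limits how many run at once.
procs = []
for name in os.listdir(data_dir):
    path = os.path.join(data_dir, name)
    print(path)
    procs.append(subprocess.Popen([sys.executable, "process.py", path]))

# Wait for all child processes to finish.
for p in procs:
    p.wait()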
Solution 3:[3]
With GNU Parallel, like this:
parallel python process.py {} ::: all/*
By default it will run N jobs in parallel, where N is the number of CPU cores you have; you can pass -j4 to run only 4 at a time, for example.
Many, many options (a combined example follows this list) for:
- logging,
- splitting/chunking inputs,
- tagging/separating output,
- staggering job starts,
- massaging input parameters,
- fail and retry handling,
- distributing jobs and data to other machines
- and so on...
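For instance, a hedged combination of a few of those options (the log file name and the parameter values are only illustrative):
parallel -j4 --tag --joblog jobs.log --retries 2 --delay 0.2 python process.py {} ::: all/*
Here --joblog covers the logging point, --tag separates the output per input file, --delay staggers the job starts, and --retries handles failed jobs.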
Try putting [gnu-parallel] in the Stack Overflow search box.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | cfgn |
| Solution 2 | Yehor Smoliakov |
| Solution 3 | |
