Running multiple processes, each with a different set of values for the environment variables
I have a number of jobs. Typically I start the jobs manually by opening a number of terminal windows, and in each terminal window setting certain environment variables to different values and then invoking my programs manually. For example:
Terminal 1 commands:
export OMP_NUM_THREADS=4
./run_application1.sh
Terminal 2 commands:
export OMP_NUM_THREADS=10
./run_application2.sh
.
.
.
Terminal 8 commands:
export OMP_NUM_THREADS=5
./run_application8.sh
As you can see, in each terminal I invoke some application (run_applicationX.sh), and each uses a different value for OMP_NUM_THREADS. Now I want to write a script (bash or python, whichever is most suitable) that generalizes this. In other words, I want to pass a number of jobs (say --jobs=2), an array A[] whose length equals --jobs, and a list of N applications (run_application1.sh, ..., run_applicationN.sh). Then I want to execute all N applications, with at most --jobs applications running in parallel at any instant. Furthermore, each application should use the value in A[#current job number] for its environment variable. In other words, I am looking for something like this:
parfor i=1...N
export OMP_NUM_THREADS=${A[JOB NUMBER]}
./run_application{i}.sh
where at most --jobs applications ever run in parallel. What is the best way to do this? I know that the GNU parallel tool could be used for this, but I am not sure how to assign a different set of environment variables based on the current job number. Note that the job number is an integer between 1 and --jobs, which guarantees that the same set of environment variable values is never used by two applications simultaneously. Thanks
Solution 1:[1]
It is unclear to me what you want, but let's see if we can build it together.
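# Stand-ins for the real applications: each takes the thread count as $1.
app1() {
  export OMP_NUM_THREADS="$1"
  sleep 1
  echo app1 "$OMP_NUM_THREADS"
}
app2() {
  export OMP_NUM_THREADS="$1"
  sleep 1
  echo app2 "$OMP_NUM_THREADS"
}
app3() {
  export OMP_NUM_THREADS="$1"
  sleep 1
  echo app3 "$OMP_NUM_THREADS"
}
app4() {
  export OMP_NUM_THREADS="$1"
  sleep 1
  echo app4 "$OMP_NUM_THREADS"
}
# Export the functions so the shells started by parallel can see them.
export -f app1 app2 app3 app4
# :::+ links the two input sources pairwise: app1 2, app2 3, app3 5, app4 7.
parallel app{1} {2} ::: 1 2 3 4 :::+ 2 3 5 7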
Or compute OMP_NUM_THREADS from the job sequence number using a Perl expression:
seq 4 | parallel app{} '{= $_= seq()*seq()+1 =}'
To guarantee that no two jobs use the same value at the same time (often used for CUDA_VISIBLE_DEVICES), you can use the job slot number:
# 0..3
seq 10 | parallel -j 4 'CUDA_VISIBLE_DEVICES={= $_=slot()-1 =} app{}'
Or:
# 1..4
seq 10 | parallel -j 4 'app{} {%}'
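If GNU parallel is not available, the same slot idea can be sketched in plain bash. This is only an illustration, not something GNU parallel provides: run_with_slots is a hypothetical helper name, and the sketch assumes bash and a system where a FIFO can be opened read-write (as on Linux). The values before "--" play the role of the array A[], so the number of values is also the maximum number of parallel jobs. A FIFO holds the free slot numbers, and a job must take a slot before it starts and hands it back when it finishes.

```shell
#!/usr/bin/env bash
# Hypothetical helper (an assumption for illustration, not a GNU parallel feature).
# Usage: run_with_slots VALUE... -- COMMAND...
# Runs at most <number of VALUEs> commands at once; each command runs with
# OMP_NUM_THREADS set to the value that belongs to the slot it occupies.
run_with_slots() {
    local -a values=()
    # Everything before "--" is a per-slot value (the array A[] of the question).
    while [ "$#" -gt 0 ] && [ "$1" != "--" ]; do
        values+=("$1")
        shift
    done
    if [ "$#" -gt 0 ]; then shift; fi          # drop the "--" separator
    [ "${#values[@]}" -gt 0 ] || return 1      # need at least one slot

    local fifo slot cmd
    fifo=$(mktemp -u "${TMPDIR:-/tmp}/slots.XXXXXX")
    mkfifo "$fifo" || return 1
    exec 3<>"$fifo"    # fd 3 is the pool of free slot numbers
    rm -f "$fifo"      # the open descriptor keeps the pipe alive

    # Initially every slot is free.
    for slot in "${!values[@]}"; do
        echo "$slot" >&3
    done

    for cmd in "$@"; do
        read -r -u 3 slot    # blocks until some slot is free
        (
            # The command itself does not need the slot pool, so close fd 3 for it.
            OMP_NUM_THREADS=${values[slot]} bash -c "$cmd" 3>&-
            echo "$slot" >&3  # hand the slot back
        ) &
    done
    wait           # waits for all background jobs of this shell
    exec 3>&-
}
```

For example, run_with_slots 4 10 -- ./run_application1.sh ./run_application2.sh ./run_application3.sh would run at most two applications at once: the first with OMP_NUM_THREADS=4, the second with 10, and the third with whichever value is released first. Because a slot is only handed out while it is free, two running applications never share a slot.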
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
