How to run multiple Python scripts simultaneously from a wrapper script in such a way that CPU utilization is maximized?

I have to run about 200-300 Python scripts daily, each with different arguments, for example:

python scripts/foo.py -a bla -b blabla ..
python scripts/foo.py -a lol -b lolol ..
....

Let's say I already have all these arguments for every script in a list, and I would like to execute them concurrently so that the CPU is always busy. How can I do so?

My current solution:

Wrapper script for running multiple processes:

    import subprocess

    # jobs is the list of command strings described above, e.g.
    # "python scripts/foo.py -a bla -b blabla"
    workers = 15
    for i in range(0, len(jobs), workers):
        # Join up to `workers` commands with " & " so the shell starts them in the background
        job_string = ""
        for j in range(i, min(i + workers, len(jobs))):
            job_string += jobs[j] + " & "
        if len(job_string) == 0:
            continue
        print(job_string)
        # Hand the whole batch to the helper script and block until every command in it finishes
        val = subprocess.check_call("./scripts/parallelProcessing.sh '%s'" % job_string, shell=True)

scripts/parallelProcessing.sh (used in the above script)

    #!/bin/bash
    # $1 is a string of commands joined with " & ", so eval launches them all in the background
    echo "$1"
    echo "running scripts in parallel"
    eval "$1"
    # Wait for every background job started by eval to finish
    wait
    echo "done processing"

Drawback:

I am executing K processes per batch, then another K, and so on. But CPU core utilization drops steadily within each batch: as commands finish, fewer and fewer processes remain, until only one is running while the rest of that batch's slots sit idle. As a result, the total time taken to complete all the processes is significant.

One simple fix is to ensure that K processes are always running, i.e. as soon as one process finishes, the next one is scheduled in its place. But I am not sure how to implement such a solution.
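
One way to get that behaviour without hand-rolling the scheduling is a worker pool: multiprocessing.Pool (available on Python 2) hands the next command to a worker process as soon as the previous one finishes, so K commands stay in flight until the list is exhausted. A minimal sketch, assuming jobs is the same list of command strings used above (the pool size of 15 just mirrors the current workers value):

    import subprocess
    from multiprocessing import Pool

    def run_job(cmd):
        # Run one command line to completion and return its exit code
        return subprocess.call(cmd, shell=True)

    if __name__ == "__main__":
        jobs = [
            "python scripts/foo.py -a bla -b blabla",
            "python scripts/foo.py -a lol -b lolol",
            # ... the remaining command lines
        ]
        pool = Pool(processes=15)             # at most 15 commands run at any moment
        exit_codes = pool.map(run_job, jobs)  # blocks until every job has finished
        pool.close()
        pool.join()
        print(exit_codes)

Because the pool refills each worker immediately, the batch "straggler" problem described above goes away without any shell glue.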

Expectations:

As the task is not very latency sensitive, I am looking for a simple solution that keeps the CPU mostly busy.

Note: Any two of those processes can execute simultaneously without any concurrency issues. The host where these processes run has python2.
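
If pulling in multiprocessing feels like more than is needed, the same effect can be had with plain subprocess on python2: keep a fixed number of Popen handles alive and start the next command whenever a slot frees up. A rough sketch of that polling loop (the run_all name and the half-second sleep are arbitrary choices, not anything from the original scripts):

    import subprocess
    import time

    def run_all(jobs, workers=15):
        queued = list(jobs)
        running = []
        while queued or running:
            # Top up free slots with the next queued commands
            while queued and len(running) < workers:
                running.append(subprocess.Popen(queued.pop(0), shell=True))
            # Keep only the processes that are still alive, then poll again
            running = [p for p in running if p.poll() is None]
            time.sleep(0.5)

    # e.g. run_all(jobs) with the same list of command strings as above

Either approach keeps roughly K processes running at all times, which is all that is needed to keep the CPU busy.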



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
