Error: process_executor.py:702: ... A worker stopped while some jobs were given to the executor. This can be caused by a too short worker timeout

As per the error in the subject, what is the fix?

Environment:

  • Python 3.9 or 3.10
  • Windows 10 x64

Error occurs when using joblib for parallel processing:

result_chunks = joblib.Parallel(n_jobs=njobs)(joblib.delayed(f_chunk)(i) for i in range(n_chunks))


Solution 1:[1]

The problem is a too-short timeout. It occurs when there is a lot of data to pickle and send to the child processes, and the transfer exceeds the internal worker timeout.

Note: This warning is benign; joblib recovers internally, and the results are accurate and complete.

To fix it, increase the timeout. I used this:

# Increase timeout (tune this number to suit your use case).
timeout=99999
result_chunks = joblib.Parallel(n_jobs=njobs, timeout=timeout)(joblib.delayed(f_chunk)(i) for i in range(n_chunks))

Alternatively, figure out a way to reduce the amount of data that has to be serialized and sent to the child processes.
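One way to reduce the serialization burden is a sketch like the following, assuming the input is a single large NumPy array that can be split up front. Passing each worker only its own slice, together with joblib's max_nbytes setting (arrays above that size are memory-mapped to a temp file rather than pickled), keeps the per-job payload small. The chunking scheme and the "1M" threshold here are illustrative assumptions, not part of the original question:

```python
import numpy as np
import joblib

def f_chunk(chunk):
    # Each worker receives only its own slice, not the whole array.
    return chunk.sum()

data = np.random.rand(1_000_000)   # stand-in for the real input (assumption)
chunks = np.array_split(data, 4)   # hypothetical chunking scheme

# max_nbytes: arrays larger than this are memory-mapped instead of pickled,
# which cuts the amount of data serialized per job.
results = joblib.Parallel(n_jobs=2, max_nbytes="1M")(
    joblib.delayed(f_chunk)(c) for c in chunks
)
total = sum(results)
```

The same idea generalizes: send workers an index, a file path, or a slice, and load or compute the heavy data inside the job rather than shipping it through the pickle pipeline.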

Update 2022-04-03

This can also occur, regardless of the timeout, if n_jobs is so high that total CPU usage runs close to 100%, e.g. 95%. The fix is to reduce n_jobs so that total CPU usage drops, e.g. to 85%.
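One way to keep CPU usage below that ceiling is to derive the worker count from the core count instead of hard-coding it. This is a minimal sketch; the helper name and the 0.85 factor are assumptions to tune for your workload:

```python
import os

def capped_njobs(target_utilization=0.85):
    # Hypothetical heuristic: aim for ~85% of available cores so the
    # machine is not saturated and loky workers are not starved.
    cores = os.cpu_count() or 1
    return max(1, int(cores * target_utilization))

njobs = capped_njobs()
```

The resulting njobs can then be passed straight to joblib.Parallel(n_jobs=njobs, ...).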

Update 2022-04-03

I also observed this happening when I was using Polars within each job, regardless of the timeout and total CPU usage. It did not seem to happen when I switched back to Pandas. This could be because Polars is more efficient and drives CPU usage higher, or (as the error mentions) due to a memory leak, which is unlikely.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1