Error: process_executor.py:702: ... "A worker stopped while some jobs were given to the executor. This can be caused by a too short worker timeout or by a memory leak."
Given the error message above, what is the fix?
Environment:
- Python 3.9 or 3.10
- Windows 10 x64
Error occurs when using joblib for parallel processing:
```python
result_chunks = joblib.Parallel(n_jobs=njobs)(joblib.delayed(f_chunk)(i) for i in n_chunks)
```
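For context, a minimal self-contained version of that call pattern looks like the following. The `f_chunk` body and the toy chunks are placeholders of my own; the real workload presumably passes much more data per task:

```python
import joblib

def f_chunk(chunk):
    # Placeholder for the real per-chunk workload.
    return sum(chunk)

# Toy stand-in for the real chunked data.
n_chunks = [[1, 2, 3], [4, 5, 6]]
njobs = 2

# Same call shape as in the question: one delayed task per chunk.
result_chunks = joblib.Parallel(n_jobs=njobs)(
    joblib.delayed(f_chunk)(i) for i in n_chunks
)
print(result_chunks)  # [6, 15]
```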
Solution 1:[1]
The problem is a timeout that is too short: when a large amount of data has to be serialized and passed to the child processes, the transfer can exceed joblib's internal worker timeout.
Note: This warning is benign, joblib recovers internally and the results are accurate and complete.
To fix it, increase the timeout. I used this:

```python
# Increase timeout (tune this number to suit your use case).
timeout = 99999
result_chunks = joblib.Parallel(n_jobs=njobs, timeout=timeout)(joblib.delayed(f_chunk)(i) for i in n_chunks)
```
Alternatively, figure out a way to reduce the amount of data that has to be serialized and sent to the child processes.
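As a rough illustration of why reducing the data helps (the numbers and names below are illustrative, not from the original post): joblib pickles every task argument to send it to the child process, so passing only an index that the worker uses to load or derive its own chunk serializes far less than passing the chunk itself:

```python
import pickle

# Large in-memory dataset (stand-in for the real data).
big_data = list(range(100_000))

# Pattern A: pass the chunk itself -- the whole list is pickled per task.
chunk_payload = pickle.dumps(big_data)

# Pattern B: pass only an index; the worker loads/derives its chunk locally
# (e.g. from a file, a database, or data initialized on the worker side).
index_payload = pickle.dumps(42)

print(len(chunk_payload), len(index_payload))
```

The index payload is a handful of bytes, while the chunk payload scales with the data size, which is exactly the cost that can push the worker past its timeout.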
Update 2022-04-03
This can also occur, regardless of the timeout, if n_jobs is set so high that total CPU usage runs close to 100% (e.g. 95%). The fix is to reduce n_jobs so total CPU usage drops, e.g. to around 85%.
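One way to apply this (a sketch; the 85% target and the helper name are my own, not from the original post) is to derive n_jobs from the core count instead of hard-coding it:

```python
import os

def pick_njobs(target_fraction=0.85):
    # Use only a fraction of the available cores so total CPU usage
    # stays comfortably below 100% and workers are not starved.
    ncores = os.cpu_count() or 1
    return max(1, int(ncores * target_fraction))

njobs = pick_njobs()
print(njobs)
```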
Update 2022-04-03
I also observed this when using Polars within each job, regardless of the timeout and total CPU usage; it did not seem to happen after switching back to Pandas. This may be because Polars is more efficient and drives CPU usage higher, or (as the error message suggests) due to a memory leak, though that seems unlikely.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
