Dask ProgressBar doesn't work with distributed backend

The progress bar (dask.diagnostics.ProgressBar) works beautifully with the multiprocessing scheduler, but doesn't seem to work at all when the distributed scheduler is the backend.

Is there a way around this, or another solution? The distributed package has its own progress bars, but they all require a list of futures to work.



Solution 1:[1]

The key difference is that with the threaded and multiprocessing schedulers, results are piped back to the controlling thread, whereas with distributed they are computed asynchronously on the cluster (even if that cluster runs on your local machine). If you previously had code like

```python
from dask.diagnostics import ProgressBar

with ProgressBar():
    out = collection.compute()
```

With the distributed scheduler, you can instead do:

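```python
from dask.distributed import Client, progress

c = Client()                  # c is the client; Client() starts a local cluster, pass an address to use an existing one
out = c.compute(collection)   # submits the work and returns a Future immediately
progress(out)
```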

To collect the result, call out.result() or c.gather(out).
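Putting those steps together, here is a minimal end-to-end sketch. It assumes an in-process local cluster (Client(processes=False)) and uses a toy dask array as a stand-in for collection; both are illustrative choices, not part of the original answer.

```python
import dask.array as da
from dask.distributed import Client, progress

# Assumption: an in-process local cluster is enough for a demo.
# For a real cluster, pass the scheduler address to Client instead.
with Client(processes=False) as c:
    # Toy stand-in for `collection`: sum of a 1000x1000 array of ones, in 16 chunks.
    collection = da.ones((1000, 1000), chunks=(250, 250)).sum()

    out = c.compute(collection)  # submits the graph and returns a Future at once
    progress(out)                # text bar in a console, widget in Jupyter
    total = out.result()         # wait for completion and fetch the value

print(total)
```

In a plain console, progress() draws a text bar while the work runs; in a notebook it returns a widget immediately, so the result() call is what actually waits.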

Note that the distributed scheduler also serves a graphical dashboard, by default at http://yourhost:8787 (see the /status page). There you can watch your tasks execute without invoking a progress bar at all.
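If port 8787 is already in use, a local cluster may host the dashboard on a different port. A small sketch, again assuming an in-process local cluster, that prints the actual address through the client's dashboard_link property (it may be an empty string if the dashboard could not start, for example when bokeh is not installed):

```python
from dask.distributed import Client

# Assumption: in-process local cluster; for a remote cluster, pass its address to Client.
with Client(processes=False) as c:
    link = c.dashboard_link  # e.g. http://127.0.0.1:8787/status
    print(link)
```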

Solution 2:[2]

A solution is linked from this issue on tqdm (a popular progress-bar package); hopefully it will be merged at some point: https://github.com/tqdm/tqdm/issues/1230
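In the meantime, one common workaround is to wrap distributed's as_completed iterator in tqdm. This is a minimal sketch of that approach, not the code from the linked issue; it assumes tqdm is installed, an in-process local cluster, and a hypothetical square task submitted with client.map. It still works on a list of futures, which is exactly what client.map returns.

```python
from dask.distributed import Client, as_completed
from tqdm import tqdm


def square(i):
    # Hypothetical task, used only for illustration.
    return i * i


# Assumption: in-process local cluster for the demo.
with Client(processes=False) as client:
    futures = client.map(square, range(100))  # one future per input
    results = []
    # as_completed yields each future as soon as it finishes,
    # so tqdm advances once per completed task.
    for fut in tqdm(as_completed(futures), total=len(futures)):
        results.append(fut.result())
    total = sum(results)

print(total)
```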

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution sources
[1] Solution 1: mdurant
[2] Solution 2: Scott