'Dask Array.compute() peak memory in Jupyterlab
I am working with dask on a distributed cluster, and I noticed a peak memory consumption when getting the results back to the local process.
My minimal example consists in instanciating the cluster and creating a simple array of ~1.6G with dask.array.arange.
I expected the memory consumption to be around the array size, but I observed a memory peak around 3.2G.
Is there any copy done by Dask during the computation ? Or does Jupyterlab needs to make a copy ?
import dask.array
import dask_jobqueue
import distributed
cluster_conf = {
"cores": 1,
"log_directory": "/work/scratch/chevrir/dask-workspace",
"walltime": '06:00:00',
"memory": "5GB"
}
cluster = dask_jobqueue.PBSCluster(**cluster_conf)
cluster.scale(n=1)
client = distributed.Client(cluster)
client
# 1.6 G in memory
a = dask.array.arange(2e8)
%load_ext memory_profiler
%memit a.compute()
# peak memory: 3219.02 MiB, increment: 3064.36 MiB
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|
