Dask Array.compute() peak memory in JupyterLab

I am working with Dask on a distributed cluster, and I noticed a peak in memory consumption when getting the results back to the local process.

My minimal example consists of instantiating the cluster and creating a simple array of ~1.6 GB with dask.array.arange.

I expected the memory consumption to be around the array size, but I observed a peak of around 3.2 GB.

Is there any copy made by Dask during the computation? Or does JupyterLab need to make a copy?
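For reference, here is how I checked the expected size (plain arithmetic, no cluster needed): dask.array.arange(2e8) defaults to float64, so the result should occupy 2e8 × 8 bytes, and the observed peak is almost exactly twice that:

```python
# dask.array.arange(2e8) defaults to float64, i.e. 8 bytes per element
n = int(2e8)
nbytes = n * 8
print(nbytes / 1e9)        # 1.6 (GB)
print(nbytes / 2**20)      # ~1525.9 (MiB)
print(2 * nbytes / 2**20)  # ~3051.8 (MiB), close to the observed 3219 MiB peak
```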

import dask.array
import dask_jobqueue
import distributed

cluster_conf = {
    "cores": 1,
    "log_directory": "/work/scratch/chevrir/dask-workspace",
    "walltime": '06:00:00',
    "memory": "5GB"
}

cluster = dask_jobqueue.PBSCluster(**cluster_conf)
cluster.scale(n=1)
client = distributed.Client(cluster)
client

# ~1.6 GB in memory (2e8 float64 values, 8 bytes each)
a = dask.array.arange(2e8)

%load_ext memory_profiler
%memit a.compute()
# peak memory: 3219.02 MiB, increment: 3064.36 MiB
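My working assumption (not confirmed from Dask internals) is that compute() gathers the chunk results from the workers and then concatenates them into the final NumPy array on the client, so the individual chunks and the assembled result coexist briefly, giving a peak of roughly twice the array size. A minimal NumPy-only sketch, with smaller arrays, reproduces the same ~2x behaviour:

```python
import numpy as np
import tracemalloc

tracemalloc.start()

# simulate gathered chunks: 20 float64 blocks of 1M elements (~160 MB total)
chunks = [np.arange(i * 1_000_000, (i + 1) * 1_000_000, dtype="float64")
          for i in range(20)]

# assembling the final array while the chunks are still referenced
# allocates a second full-size buffer -> peak is roughly 2x the result size
result = np.concatenate(chunks)

_, peak = tracemalloc.get_traced_memory()
print(result.nbytes / 1e6)  # 160.0 (MB, final array)
print(peak / 1e6)           # roughly 320 MB: chunks and result both alive
```

If that is the cause, the doubled peak would be inherent to pulling the whole array back in one piece; writing results to storage from the workers instead of calling compute() on the client should avoid it.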


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
