How can I parallelize my code with dask?

First import some packages:

import numpy as np
from dask import delayed

Suppose I have two NumPy arrays:

a1 = np.ones(5000000)
a2 = np.ones(8000000)

I would like to compute the sum and length of each array, using these functions:

def sum(x):
    result = 0
    for data in x:
        result = result + data
    return result, len(x)

def get_result(x, y):
    return x, y

I have two examples in Colab. The sequential one looks like this:

%%time
result1 = sum(a1)
result2 = sum(a2)
result = get_result(result1, result2)
print(result)

And the output is:

((5000000.0, 5000000), (8000000.0, 8000000))
CPU times: user 1.41 s, sys: 3.7 ms, total: 1.42 s
Wall time: 1.42 s

However, I would like to compute these values in parallel:

%%time
result1 = delayed(sum)(a1)
result2 = delayed(sum)(a2)
result = delayed(get_result)(result1, result2)
result = result.compute()
print(result)

And the output is:

Delayed('get_result-ffbb6330-1014-42c5-b625-06e3e66a56ed')
CPU times: user 1.42 s, sys: 7.97 ms, total: 1.42 s
Wall time: 1.43 s

Why didn't the second program run in parallel? The wall times of the two examples are almost the same.
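One likely explanation: dask.delayed uses the threaded scheduler by default, and a pure-Python for loop holds the GIL, so two such tasks cannot make progress at the same time in separate threads. The sketch below is not a definitive fix. It builds the same task graph but passes the scheduler explicitly, so that "processes" can be tried instead of threads. The names loop_sum (renamed so it does not shadow the built-in sum) and run are introduced here for illustration and are not part of the question.

```python
import numpy as np
from dask import delayed


def loop_sum(x):
    # Same pure-Python loop as in the question; it holds the GIL while running.
    result = 0
    for data in x:
        result = result + data
    return result, len(x)


def get_result(x, y):
    return x, y


def run(a1, a2, scheduler):
    # Hypothetical helper: build the same graph as in the question, then
    # choose the scheduler explicitly when computing.
    result1 = delayed(loop_sum)(a1)
    result2 = delayed(loop_sum)(a2)
    result = delayed(get_result)(result1, result2)
    return result.compute(scheduler=scheduler)


if __name__ == "__main__":
    a1 = np.ones(5000000)
    a2 = np.ones(8000000)
    # "threads" is the default for delayed; "processes" runs each task in a
    # separate Python process, so the GIL is no longer shared between them.
    print(run(a1, a2, scheduler="processes"))
```

Whether this is actually faster depends on how many cores the machine has and on the cost of sending the arrays to the worker processes. A vectorized call such as np.sum(x) avoids the Python-level loop altogether and would probably help more than any scheduler choice.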



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
