How can I create and push to a shared or distributed array of arrays?
I have written Julia code in which I initialize an empty array as follows:
a = []
Later in the code, I simply push to this array as follows:
push!(a, b)
where b = [c, d, e, ...] is another array, and each b can be of different length.
This works just fine in un-parallelized code. However, I want to do the same thing in parallelized code where a = [] is a shared or distributed array that the different processors can push to.
Neither SharedArray nor DArray worked for me. Any advice?
Solution 1:[1]
First, you should always declare what your array holds. [] means Any[], which is almost never a good idea.
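A minimal sketch of the difference, using only Base Julia. The variable names `untyped` and `typed` are illustrative and do not come from the original post, and the printed type names assume a 64-bit Julia 1.6 or later:

```julia
# `[]` creates a Vector{Any}; `Vector{Int}[]` creates a typed container
# whose elements must themselves be Vector{Int}.
untyped = []
typed = Vector{Int}[]

push!(untyped, [1, 2, 3])   # accepted, but the element type stays Any
push!(typed, [1, 2, 3])     # inner vectors may have different lengths
push!(typed, [4, 5])

println(typeof(untyped))    # Vector{Any}
println(typeof(typed))      # Vector{Vector{Int64}}
```

The typed container still accepts inner vectors of any length, so nothing is lost for the use case in the question.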
Let us consider this vector with placeholders:
julia> a=[Int[] for _ in 1:8]
8-element Vector{Vector{Int64}}:
[]
[]
[]
[]
[]
[]
[]
[]
This Vector contains 8 references to other Vectors.
Let us now distribute it:
julia> using Distributed; addprocs(4);
julia> @everywhere using DistributedArrays
julia> b = distribute(a)
8-element DArray{Vector{Int64}, 1, Vector{Vector{Int64}}}:
[]
[]
[]
[]
[]
[]
[]
[]
This new b is now available through all worker processes where each worker holds its localpart of it. Let us mutate it!
julia> fetch(@spawnat 2 append!(localpart(b)[1], [1,2,3,4]));
julia> fetch(@spawnat 3 append!(localpart(b)[2], [10,20]));
julia> fetch(@spawnat 3 push!(localpart(b)[2], 30))
3-element Vector{Int64}:
10
20
30
We can see that everything is working as expected (we have used fetch to make sure our code actually got executed on remote workers).
Let us now check the state of b on the master process:
julia> b
8-element DArray{Vector{Int64}, 1, Vector{Vector{Int64}}}:
[1, 2, 3, 4]
[]
[]
[10, 20, 30]
[]
[]
[]
[]
You can see that we have successfully used remote workers to mutate b.
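To see why `localpart(b)[2]` on worker 3 ended up in the fourth slot, it can help to ask each worker which global indices it owns. The sketch below assumes the same setup as above (four workers, DistributedArrays installed) and a fresh session. It relies on `localindices` and the `Array` conversion from DistributedArrays. The variable names `w`, `idx` and `gathered` are illustrative, and since the exact partitioning depends on the number of workers, no output is shown:

```julia
using Distributed
addprocs(4)
@everywhere using DistributedArrays

b = distribute([Int[] for _ in 1:8])

# Ask every process that holds a chunk which global indices it owns.
for p in procs(b)
    idx = remotecall_fetch(localindices, p, b)
    println("worker ", p, " owns global indices ", idx)
end

# Mutate one local element on the second worker, then copy the whole
# distributed array back into an ordinary Vector on the master process.
w = workers()[2]
fetch(@spawnat w push!(localpart(b)[2], 30))
gathered = Array(b)
println(gathered)
```

Copying back with `Array(b)` is one way to continue with ordinary, non-distributed code once the workers are done.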
Solution 2:[2]
I asked a similar question here. I originally followed Przemyslaw's answer, but in Julia 1.7 I could not get distribute to distribute an already existing array the way I thought it would. What worked for me was distributing the array at the point where it is created:
using Distributed
addprocs(4)
@everywhere using DistributedArrays
a = distribute([[] for _ in procs()])
@sync @distributed for i = 1:10
    b = fill(i, 5)
    append!(localpart(a)[1], b) # I swapped push! for append!
end
a
What this does: first, it creates an array of sub-arrays that is distributed across the workers. Then it distributes the loop iterations, and each worker appends the values it computes to its own local sub-array. Finally, evaluating a on the master process shows all the sub-arrays together, with all the values.
It is interesting to compare this with the exact same code, but with a = distribute([[] for _ in procs()]) replaced by a = [[] for _ in procs()]; distribute(a). The latter does not work as expected (at least in Julia 1.7).
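One possible explanation, offered as an assumption rather than something the answer confirms: `distribute` returns a new `DArray` and leaves its argument untouched, so calling `distribute(a)` without assigning the result leaves `a` an ordinary local array. A small check, assuming a fresh session with DistributedArrays installed:

```julia
using Distributed
addprocs(4)
@everywhere using DistributedArrays

a = [Int[] for _ in workers()]
distribute(a)                 # result discarded; `a` is unchanged
println(a isa DArray)         # false: still a plain Vector

a = distribute(a)             # rebind `a` to the returned DArray
println(a isa DArray)         # true
```

If that is what happened, each worker in the loop would have appended to its own copy of the plain array, and the master's `a` would have stayed empty.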
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Przemyslaw Szufel |
| Solution 2 | |
