'Addition of complex 3D arrays in pyopencl

I'm trying to add 2 three dimensional complex arrays with help of python library pyopencl. The result achieved on gpu differs from the reference result achieved with use of cpu. Why does GPU code do not provide correct addition of complex numbers? I expect corresponding variables res and ref to be equal to each other. The code I use:

import pyopencl as cl
import numpy as np
from numpy.fft import fftn, ifftn
from numpy.random import random, normal
import os


os.environ['PYOPENCL_COMPILER_OUTPUT'] = '1'

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

Lx = 100
Ly = 100
Lz = 1
L = Lx * Ly * Lz


const = np.zeros(4).astype(np.float32)
const[0] = Lx
const[1] = Ly
const[2] = Lz
const[3] = L

const_buf = cl.Buffer(ctx, cl.mem_flags.READ_ONLY | cl.mem_flags.COPY_HOST_PTR,    hostbuf=const)


arr1 = np.random.rand(Lz, Ly, Lx).astype(np.float32)
arr2 = np.random.rand(Lz, Ly, Lx).astype(np.float32)

F1 = fftn(arr1)
F2 = fftn(arr2)

out = np.zeros((Lz,Ly,Lx)).astype(np.complex64)
out_buf = cl.Buffer(ctx, cl.mem_flags.WRITE_ONLY, out.nbytes)

do_it_code = """
#include <pyopencl-complex.h>
__kernel void doit(__global const float *constants,
__global const cfloat_t *IN1,__global const cfloat_t *IN2,
__global cfloat_t *out)

{   int z = get_global_id(0);
    int y = get_global_id(1);
    int x = get_global_id(2);
    const int len = constants[3];
  const int lx = constants[0];
const int ly = constants[1];
const int lz = constants[2];
const int pl = lx * ly;
int i = x + y * lx + z * pl;

if (x <= lx-1 && y <= ly-1 && z <= lz-1) {
out[i] = cfloat_add(IN1[i],IN2[i]);

};
};
"""
bld_do_it = cl.Program(ctx, do_it_code).build()

def do_it(a,b):
    a_buf = cl.Buffer(ctx, cl.mem_flags.READ_ONLY | cl.mem_flags.COPY_HOST_PTR,     hostbuf=a.astype(np.complex64))
    b_buf = cl.Buffer(ctx, cl.mem_flags.READ_ONLY | cl.mem_flags.COPY_HOST_PTR, hostbuf=b.astype(np.complex64))
    launch = bld_do_it.doit(queue, a.shape, None, const_buf, a_buf, b_buf, out_buf)
    launch.wait()
    cl.enqueue_copy(queue, out, out_buf)

    return out

ref=F1+F2
print(ref)
print("_________")
res=do_it(F1,F2)
print(res)

Solution 1:^[1]

To figure out what's wrong usually is good to work on much smaller data set. Let's set Lx=Ly=2 and check what we get:

[[[ 3.40425836+0.j -2.09837677+0.j]
  [-0.27708016+0.j  0.60454108+0.j]]]
_________
[[[ 3.4042583 +0.j -0.27708015+0.j]
  [-2.0983768 +0.j  0.60454106+0.j]]]

The numbers are almost identical, just not all in the places they were expected to be. Simple transpose can fix this:

res=np.transpose(res, (0,2,1))

and then result matches the reference result:

[[[ 3.4042583 +0.j -2.0983768 +0.j]
  [-0.27708015+0.j  0.60454106+0.j]]]

There is still little discrepancy between results - after few digits after comma - but that can be explained by different FPU units being used between CPU and GPU (in my case). More on this can be read for example here.

This should not need a transpose - Lx,Ly,Lz placing must be wrong somewhere - I will let OP to figure out this.

Solution 2:^[2]

I found this update to the OP's GPU code solved the transposition error noted in the discussion thread - and left as an 'exercise for the student' in one response. It turned out, for me at least, to be non-trivial to figure out so I thought this might be helpful to share:

do_it_code = """
#include <pyopencl-complex.h>
__kernel void doit(__global const float *constants,
__global const cfloat_t *IN1,__global const cfloat_t *IN2,
__global cfloat_t *out)

{   int z = get_global_id(0);
    int y = get_global_id(1);
    int x = get_global_id(2);
    const int len = constants[3];
  const int lx = constants[0];
const int ly = constants[1];
const int lz = constants[2];
const int pl = lx * ly;
const int pl2 = lz * ly;
int i = x + y * lx + z * pl;
int i2 = x*pl2 + y*lz  + z;

if (x <= lx-1 && y <= ly-1 && z <= lz-1) {
out[i] = cfloat_add(IN1[i2],IN2[i2]);

};
};
"""

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	doqtor
Solution 2

'Addition of complex 3D arrays in pyopencl

Solution 1:[1]

Solution 2:[2]

Sources

Related Questions

Solution 1:^[1]

Solution 2:^[2]