Iterative image processing in CUDA

I have written a CUDA kernel that processes an image. Depending on the output, I may have to call the kernel again to re-tune the image. For example, consider an image with 9 pixels:

1 2 3
4 5 6
7 8 9 

Suppose that, depending on its neighboring values, the value 9 changes to 10. Since the value has changed, I have to re-process the new image, with the same kernel.

1 2 3
4 5 6
7 8 10

I have already written the algorithm to process the image in a single iteration. The way I'm planning to implement the iterations in CUDA is the following:

__global__ void process_image_GPU(unsigned int *d_input, unsigned int *d_output, int dataH, int dataW, unsigned int *val) {

     __shared__ unsigned int sh_map[TOTAL_WIDTH][TOTAL_WIDTH];
     // Do processing
     // If, during processing, any thread changes a value of the image, call
            { atomicAdd(val, 1); }

}
int main(int argc, char *argv[]) {
    // Allocate d_input, d_output and call cudaMemcpy
    unsigned int *x, *val;
    x = (unsigned int *)malloc(sizeof(unsigned int));
    x[0] = 0;   // host copy of the "something changed" counter
    cudaMalloc((void **)&val, sizeof(unsigned int));
    cudaMemcpy((void *)val, (void *)x, sizeof(unsigned int), cudaMemcpyHostToDevice);
    process_image_GPU<<<dimGrid, dimBlock>>>(d_input, d_output, rows, cols, val);
    cudaMemcpy((void *)x, (void *)val, sizeof(unsigned int), cudaMemcpyDeviceToHost);
    if (x[0] != 0) {   // test the copied-back counter, not the pointer x
        // Call the kernel again (after resetting *val to 0 on the device)
    }
}
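
To make the plan above concrete, here is a minimal sketch of the full loop. The actual processing rule is not shown in this question, so the kernel uses a toy stand-in (each pixel takes the maximum of itself and its four neighbours); the names propagate_max and run_until_stable, the 16x16 block size and the ping-pong buffer swap are all assumptions made for illustration, and error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <utility>
#include <vector>

// Hypothetical stand-in rule: each pixel takes the maximum of itself and its
// 4-connected neighbours. The real per-pixel processing would go here instead.
__global__ void propagate_max(const unsigned int *in, unsigned int *out,
                              int height, int width, unsigned int *changed) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row >= height || col >= width) return;

    int idx = row * width + col;
    unsigned int v = in[idx];
    if (row > 0)          v = max(v, in[idx - width]);
    if (row < height - 1) v = max(v, in[idx + width]);
    if (col > 0)          v = max(v, in[idx - 1]);
    if (col < width - 1)  v = max(v, in[idx + 1]);

    out[idx] = v;                              // always write, so out is a full image
    if (v != in[idx]) atomicAdd(changed, 1u);  // count the pixels this pass modified
}

// Launches the kernel until a pass changes nothing and returns the number of
// passes launched. `image` is overwritten with the converged result.
int run_until_stable(std::vector<unsigned int> &image, int height, int width) {
    size_t bytes = image.size() * sizeof(unsigned int);
    unsigned int *d_in = nullptr, *d_out = nullptr, *d_changed = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMalloc(&d_changed, sizeof(unsigned int));
    cudaMemcpy(d_in, image.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);

    int passes = 0;
    unsigned int h_changed = 0;
    do {
        cudaMemset(d_changed, 0, sizeof(unsigned int));  // reset before every pass
        propagate_max<<<grid, block>>>(d_in, d_out, height, width, d_changed);
        // This blocking copy also waits for the kernel to finish.
        cudaMemcpy(&h_changed, d_changed, sizeof(unsigned int), cudaMemcpyDeviceToHost);
        std::swap(d_in, d_out);  // the output of this pass feeds the next one
        ++passes;
    } while (h_changed != 0);

    // After the final swap, d_in points at the most recent output.
    cudaMemcpy(image.data(), d_in, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    cudaFree(d_changed);
    return passes;
}
```

The image stays on the device for the whole loop; only the 4-byte counter crosses the bus on each pass. Reading from one buffer and writing to the other keeps a pass from seeing values that other threads have already updated in the same pass.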

Is this the only way to do it? Is there a more efficient way to implement the same thing?

Thanks a lot for your time.



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
