Setting up OpenMP for a pattern that requires the value of a previous iteration
So I am having some trouble parallelising my implementation of an algorithm that does some transformation on pixels in an image using OpenMP. A high-level description is given below as pseudocode.
for i in (1, iterations):
    do some setup
    for x in (1, pixels):
        transform x
        save x into new_pixels
    if i is validation iteration:
        for x in (1, pixels):
            check value of x
    pixels = new_pixels
So right now I have OpenMP on the inner loop and on the loop that checks the pixels in the validation iteration. The pragma I used for both of those loops is below. The setup and the assignment of the new pixels have to be done on a single thread and are not parallelisable.
#pragma omp parallel for
The problem with this is that each of these pragmas opens its own parallel region with the configured number of threads, so at every iteration I am spinning up the thread team at least once and sometimes twice. This introduces a lot of overhead, and I get a speedup of at most 1.9x on 32 cores.
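For reference, the current structure looks roughly like the sketch below. The routine names, the float pixel buffers, and the `i % validation_interval` test are placeholders standing in for the real code, not the actual implementation.

```c
#include <stddef.h>

/* Placeholders for the real routines described above. */
void  do_setup(float *pixels, size_t n);
float transform_pixel(float p);
void  check_pixel(float p);

void run(float *pixels, float *new_pixels, size_t n_pixels,
         int iterations, int validation_interval)
{
    for (int i = 0; i < iterations; i++) {
        do_setup(pixels, n_pixels);               /* serial setup */

        #pragma omp parallel for                  /* parallel region #1 */
        for (size_t x = 0; x < n_pixels; x++)
            new_pixels[x] = transform_pixel(pixels[x]);

        if (i % validation_interval == 0) {       /* placeholder validation test */
            #pragma omp parallel for              /* parallel region #2 */
            for (size_t x = 0; x < n_pixels; x++)
                check_pixel(new_pixels[x]);
        }

        for (size_t x = 0; x < n_pixels; x++)     /* serial copy-back */
            pixels[x] = new_pixels[x];
    }
}
```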
So I tried wrapping the iterations loop in a parallel pragma, with a single pragma right after it for the setup, a for pragma over the pixels, a single pragma for the validation check, and another for pragma to actually check each pixel. This is shown in the extended pseudocode below.
#pragma omp parallel
#pragma omp single
for i in (1, iterations):
    do some setup
    #pragma omp for
    for x in (1, pixels):
        transform x
        save x into new_pixels
    #pragma omp single
    if i is validation iteration:
        #pragma omp for
        for x in (1, pixels):
            check value of x
    #pragma omp single
    pixels = new_pixels
This does not work, or even compile, so I am wondering what the right way to accomplish this pattern is. The code is written in C.
Thanks, and if I need to expand on anything please ask.
Solution 1:[1]
Using tasks (e.g. the taskloop construct) you can do exactly what you intend. Note, however, that:
- the parallel overheads of tasks are bigger than the overheads of omp for, so it may not be faster at all, but it is worth a try. Based on the underwhelming speedup, my guess is that your code is not really computation intensive;
- you have to carefully check all your variables and set their data-sharing attributes (private, shared, firstprivate, etc.) properly, and you have to avoid race conditions. If you do not have experience doing so, you may miss something and your code will not work properly.
The task-based OpenMP code is:
#pragma omp parallel
#pragma omp single
for i in (1, iterations):
    do some setup
    #pragma omp taskloop default(none) shared(...) firstprivate(...)
    for x in (1, pixels):
        transform x
        save x into new_pixels
    if i is validation iteration:
        #pragma omp taskloop default(none) shared(...) firstprivate(...)
        for x in (1, pixels):
            check value of x
    pixels = new_pixels
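As a concrete illustration, a C sketch of this structure might look like the following. The routine names, the float buffers, the validation test, and the exact contents of the shared/firstprivate lists are assumptions that have to be adapted to the real variables in your code.

```c
#include <stddef.h>

/* Placeholders for the real routines. */
void  do_setup(float *pixels, size_t n);
float transform_pixel(float p);
void  check_pixel(float p);

void run(float *pixels, float *new_pixels, size_t n_pixels,
         int iterations, int validation_interval)
{
    #pragma omp parallel    /* the thread team is created once, up front */
    #pragma omp single      /* one thread drives the outer loop and spawns tasks */
    for (int i = 0; i < iterations; i++) {
        do_setup(pixels, n_pixels);               /* runs on the single thread */

        /* taskloop hands the iterations to the whole team as tasks and
           waits for them at the end (unless nogroup is specified) */
        #pragma omp taskloop default(none) \
                shared(pixels, new_pixels) firstprivate(n_pixels)
        for (size_t x = 0; x < n_pixels; x++)
            new_pixels[x] = transform_pixel(pixels[x]);

        if (i % validation_interval == 0) {       /* placeholder validation test */
            #pragma omp taskloop default(none) \
                    shared(new_pixels) firstprivate(n_pixels)
            for (size_t x = 0; x < n_pixels; x++)
                check_pixel(new_pixels[x]);
        }

        for (size_t x = 0; x < n_pixels; x++)     /* back on the single thread */
            pixels[x] = new_pixels[x];
    }
}
```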
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Laci |

