Setting up OpenMP for a pattern that requires the value of a previous iteration
So I am having some trouble parallelising my implementation of an algorithm that does some transformation on pixels in an image using OpenMP. A high-level description is given below as pseudocode.
for i in (1, iterations):
    do some setup
    for x in (1, pixels):
        transform x
        save x into new_pixels
    if i is validation iteration:
        for x in (1, pixels):
            check value of x
    pixels = new_pixels
So right now I have OpenMP on the inner loop and on the loop that checks the pixels in the validation iteration. The pragma I used for both of those loops is below. The setup and the assignment of the new pixels have to be done on a single thread and are not parallelisable.
#pragma omp parallel for
The problem with this is that each of these pragmas opens its own parallel region with the configured number of threads, so at every iteration I am spinning up the thread team at least once and sometimes twice. This introduces a lot of overhead, and I get a speedup of at most 1.9x on 32 cores.
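For reference, the current structure looks roughly like the sketch below. The routine names, the float pixel buffers, and the `i % validation_interval` test are placeholders standing in for the real code, not the actual implementation.

```c
#include <stddef.h>

/* Placeholders for the real routines described above. */
void  do_setup(float *pixels, size_t n);
float transform_pixel(float p);
void  check_pixel(float p);

void run(float *pixels, float *new_pixels, size_t n_pixels,
         int iterations, int validation_interval)
{
    for (int i = 0; i < iterations; i++) {
        do_setup(pixels, n_pixels);               /* serial setup */

        #pragma omp parallel for                  /* parallel region #1 */
        for (size_t x = 0; x < n_pixels; x++)
            new_pixels[x] = transform_pixel(pixels[x]);

        if (i % validation_interval == 0) {       /* placeholder validation test */
            #pragma omp parallel for              /* parallel region #2 */
            for (size_t x = 0; x < n_pixels; x++)
                check_pixel(new_pixels[x]);
        }

        for (size_t x = 0; x < n_pixels; x++)     /* serial copy-back */
            pixels[x] = new_pixels[x];
    }
}
```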
So I tried wrapping the iterations loop in a parallel pragma, with a single pragma right after it for the setup, a for pragma over the pixels, a single pragma for the validation check, and another for pragma to actually check each pixel. This is shown in the extended pseudocode below.
#pragma omp parallel
#pragma omp single
for i in (1, iterations):
    do some setup
    #pragma omp for
    for x in (1, pixels):
        transform x
        save x into new_pixels
    #pragma omp single
    if i is validation iteration:
        #pragma omp for
        for x in (1, pixels):
            check value of x
    #pragma omp single
    pixels = new_pixels
This does not work, or even compile, so I am wondering what the right way to accomplish this pattern is. The code is written in C.
Thanks, and if I need to expand on anything please ask.
Solution 1:[1]
Using tasks (e.g. the taskloop construct) you can do exactly what you intend. Note, however, that:
- the parallel overheads of tasks are bigger than the overheads of omp for, so it may not be faster at all, but it is worth a try. Based on the underwhelming speedup, my guess is that your code is not really computation intensive;
- you have to carefully check all your variables and set their data-sharing attributes (private, shared, firstprivate, etc.) properly, and you have to avoid race conditions. If you do not have experience doing so, you may miss something and your code will not work properly.
The task-based OpenMP code is:
#pragma omp parallel
#pragma omp single
for i in (1, iterations):
    do some setup
    #pragma omp taskloop default(none) shared(...) firstprivate(...)
    for x in (1, pixels):
        transform x
        save x into new_pixels
    if i is validation iteration:
        #pragma omp taskloop default(none) shared(...) firstprivate(...)
        for x in (1, pixels):
            check value of x
    pixels = new_pixels
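As a concrete illustration, a C sketch of this structure might look like the following. The routine names, the float buffers, the validation test, and the exact contents of the shared/firstprivate lists are assumptions that have to be adapted to the real variables in your code.

```c
#include <stddef.h>

/* Placeholders for the real routines. */
void  do_setup(float *pixels, size_t n);
float transform_pixel(float p);
void  check_pixel(float p);

void run(float *pixels, float *new_pixels, size_t n_pixels,
         int iterations, int validation_interval)
{
    #pragma omp parallel    /* the thread team is created once, up front */
    #pragma omp single      /* one thread drives the outer loop and spawns tasks */
    for (int i = 0; i < iterations; i++) {
        do_setup(pixels, n_pixels);               /* runs on the single thread */

        /* taskloop hands the iterations to the whole team as tasks and
           waits for them at the end (unless nogroup is specified) */
        #pragma omp taskloop default(none) \
                shared(pixels, new_pixels) firstprivate(n_pixels)
        for (size_t x = 0; x < n_pixels; x++)
            new_pixels[x] = transform_pixel(pixels[x]);

        if (i % validation_interval == 0) {       /* placeholder validation test */
            #pragma omp taskloop default(none) \
                    shared(new_pixels) firstprivate(n_pixels)
            for (size_t x = 0; x < n_pixels; x++)
                check_pixel(new_pixels[x]);
        }

        for (size_t x = 0; x < n_pixels; x++)     /* back on the single thread */
            pixels[x] = new_pixels[x];
    }
}
```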
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Laci |

