Using omp parallel for in multiplication algorithm (BigInt multiplication)
For educational purposes I'm developing a C++ library for operating on large numbers represented as vectors of chars (vector<char>).
Here is the algorithm that I am using for multiplication:
```cpp
string multiplicationInner(CharVector a, CharVector b) {
    reverse(a.begin(), a.end());
    reverse(b.begin(), b.end());
    IntVector stack(a.size() + b.size() + 1);
    int i, j;
    for (i = 0; i < a.size(); i++)
        for (j = 0; j < b.size(); j++)
            stack[i + j] += charToInt(a[i]) * charToInt(b[j]);
    // propagate carries; stop before the last slot so stack[i + 1] stays in range
    for (int i = 0; i + 1 < stack.size(); i++) {
        int num = stack[i] % 10;
        int move = stack[i] / 10;
        stack[i] = num;
        if (stack[i + 1])
            stack[i + 1] += move;
        else if (move)
            stack[i + 1] = move;
    }
    CharVector stackChar = intVectorToCharVector(&stack);
    deleteZerosAtEnd(&stackChar);
    reverse(stackChar.begin(), stackChar.end());
    return charVectorToString(&stackChar);
}
```
This function is called billions of times in my program, so I would like to use #pragma omp parallel for in it.
My question is: how can I parallelize the first loop?
This is what I have tried:
```cpp
int i, j;
#pragma omp parallel for
for (i = 0; i < a.size(); i++) {
    for (j = 0; j < b.size(); j++)
        stack[i + j] += charToInt(a[i]) * charToInt(b[j]);
}
```
With that change the algorithm stops working properly. Advice needed.
Edit: This variant works, but a benchmark shows it is 15x-20x slower with omp parallel for than without it (CPU: M1 Pro, 8 cores):
```cpp
#pragma omp parallel for schedule(dynamic)
for (int k = 0; k < a.size() + b.size(); k++) {
    for (int i = 0; i < a.size(); i++) {
        int j = k - i;
        if (j >= 0 && j < b.size()) {
            stack[k] += charToInt(a[i]) * charToInt(b[j]);
        }
    }
}
```
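One plausible cause of the slowdown is that a parallel region is started on every call, and for operands with only a handful of digits the thread startup and scheduling cost dwarfs the arithmetic being distributed. A minimal sketch, assuming that is the cause, using OpenMP's if clause so the loop only goes parallel above a size threshold (the 512-digit cutoff is an illustrative placeholder to be tuned by benchmarking, not a measured value):

```cpp
// Sketch only: same diagonal loop as above, but the parallel region is entered
// only when the operands are large enough to pay for thread startup.
// The 512-digit threshold is an illustrative placeholder, not a measured value.
const int n = (int)(a.size() + b.size());
#pragma omp parallel for schedule(dynamic) if (n >= 512)
for (int k = 0; k < n; k++) {
    for (int i = 0; i < (int)a.size(); i++) {
        int j = k - i;
        if (j >= 0 && j < (int)b.size())
            stack[k] += charToInt(a[i]) * charToInt(b[j]);
    }
}
```

For operand sizes typical of a Miller-Rabin test the condition will normally be false, so the loop stays serial and the per-call threading overhead goes away; the parallel path only kicks in for genuinely large numbers.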
This is the part of my program where multiplication is called most often (Miller-Rabin test):
```cpp
BigInt modularExponentiation(BigInt base, BigInt exponent, BigInt mod) {
    BigInt x = B_ONE; // 1
    BigInt y = base;
    while (exponent > B_ZERO) { // while exponent > 0
        if (isOdd(exponent))
            x = (x * y) % mod;
        y = (y * y) % mod;
        exponent /= B_TWO; // exponent /= 2
    }
    return (x % mod);
}

bool isMillerRabinTestOk(BigInt candidate) {
    if (candidate < B_TWO)
        return false;
    if (candidate != B_TWO && isEven(candidate))
        return false;
    BigInt canditateMinusOne = candidate - B_ONE;
    BigInt s = canditateMinusOne;
    while (isEven(s))
        s /= B_TWO;
    for (int i = 0; i < MILLER_RABIN_TEST_ITERATIONS; i++) {
        BigInt a = BigInt(rand()) % canditateMinusOne + B_ONE;
        BigInt temp = s;
        BigInt mod = modularExponentiation(a, temp, candidate);
        while (temp != canditateMinusOne && mod != B_ONE && mod != canditateMinusOne) {
            mod = (mod * mod) % candidate;
            temp *= B_TWO;
        }
        if (mod != canditateMinusOne && isEven(temp))
            return false;
    }
    return true;
}
```
Solution 1:[1]
Your loops do not have the proper structure for parallelization: different iterations of the outer i loop write to the same stack[i + j] entries, so they are not independent. However, you can transform them:
```cpp
for (k = 0; k < a.size() + b.size(); k++) {
    for (i = 0; i < a.size(); i++) {
        j = k - i;
        stack[k] += a[i] * b[j];
    }
}
```
Now the outer loop has no conflicts. Look at this as a "coordinate transformation": you're still traversing the same i/j row/column space, but now in new coordinates: k/i stands for diagonal/row.
Btw, this code is a little metaphorical. Check your loop bounds, and use the right multiplication. I'm just indicating the principle here.
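A fleshed-out sketch of that principle, with the bounds check and the questioner's digit conversion put back in. The type aliases, the ASCII-digit charToInt, and the multiplyDiagonals name below are stand-ins for the library's own definitions, the digits are assumed to be stored least-significant first (as they are after the reverse calls in multiplicationInner), and the carry propagation still has to run on the result:

```cpp
#include <vector>

// Stand-ins for the question's types and helper; adjust to the real library.
using CharVector = std::vector<char>;
using IntVector  = std::vector<int>;
static int charToInt(char c) { return c - '0'; }    // assumes digits stored as ASCII

// Diagonal ("k/i") traversal: each k writes only stack[k], so the outer
// loop iterations are independent and safe to run in parallel.
IntVector multiplyDiagonals(const CharVector& a, const CharVector& b) {
    IntVector stack(a.size() + b.size() + 1, 0);
    const int n = (int)(a.size() + b.size());
#pragma omp parallel for schedule(static)
    for (int k = 0; k < n; k++) {
        int sum = 0;
        for (int i = 0; i < (int)a.size(); i++) {
            int j = k - i;                           // column index along diagonal k
            if (j >= 0 && j < (int)b.size())
                sum += charToInt(a[i]) * charToInt(b[j]);
        }
        stack[k] = sum;                              // single writer per k
    }
    return stack;                                    // carries not yet propagated
}
```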
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Stack Overflow |
