'Conditionally and efficiently locking on array of mutexes in C++
I have a simple work that is parallelized with OpenMP with very low scaling (2x speedup with 8 cores) due to too many locking:
struct MutexWithoutFalseSharing
{
std::mutex mut;
char padding[(64-sizeof(std::mutex))>0?(64-sizeof(std::mutex)):64];
};
MutexWithoutFalseSharing mutArr[256];
std::unordered_set<int> mapping[N];
// takes 4 milliseconds with omp, 8 milliseconds without omp
// (8cores)
// (result is vector of std::pair of ints, 50k-100k elements)
#pragma omp parallel for
for(auto& r:result)
{
std::lock_guard<std::mutex> lg(mutArr[r.first&255].mut);
mapping[(r.first].insert(r.second);
}
what it does is simply taking an id (r.first) and another id (r.second) from same std::pair and inserting them into an array of sets.
The std::vector of std::pair (result) contains data like this pattern:
r.first = {0,0,0,5,5,15,15,15,15,1000,1000,1270,4000,4000,4000,4000}
so that the locking algorithm picks same lock repeatedly, causing unnecessary extra latency. To evade the unnecessary extra locking, I'm thinking of this:
int lastId = -1000;
// assuming subResult is sub-set of original result (per-thread)
for(auto & r:subResult)
{
if(lastId != r.first)
{
mutexArray[lastId].unlock();
mutexArray[r.first].lock();
lastId=r.first;
}
do_something_that_uses(r.first);
}
but this is not very exception-friendly nor undefined-behavior/deadlock-free. For example, second thread may run its own iteration before the first thread and unlock is wasted (because the first thread did not lock it).
Is there a sane/RAII-friendly way of doing this, without explicit locking? If there is none, how can I check if the selected lock (from array of locks) was locked before calling unlock?
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|
