Implementing boost::barrier in C++11

I've been trying to rid a project of every Boost reference and switch to pure C++11.

At one point, worker threads are created that wait for a barrier to give the 'go' command, do the work (spread across the N threads), and synchronize when all of them finish. The basic idea is that the main loop gives the go order (boost::barrier's wait()) and waits for the result with the same function.

In a different project I had implemented a custom Barrier based on the Boost version, and everything worked perfectly. The implementation is as follows:

Barrier.h:

class Barrier {
public:
    Barrier(unsigned int n);
    void Wait(void);
private:
    std::mutex counterMutex;
    std::mutex waitMutex;

    unsigned int expectedN;
    unsigned int currentN;
};

Barrier.cpp:

Barrier::Barrier(unsigned int n) {
    expectedN = n;
    currentN = expectedN;
}

void Barrier::Wait(void) {
    counterMutex.lock();

    // If we're the first thread, we want an extra lock at our disposal

    if (currentN == expectedN) {
        waitMutex.lock();
    }

    // Decrease thread counter

    --currentN;

    if (currentN == 0) {
        currentN = expectedN;
        waitMutex.unlock();

        currentN = expectedN;
        counterMutex.unlock();
    } else {
        counterMutex.unlock();

        waitMutex.lock();
        waitMutex.unlock();
    }
}

This code has been used on iOS and with Android's NDK without any problems, but in a Visual Studio 2013 project it fails with the assertion "unlock of unowned mutex". It seems that only the thread that locked a mutex is allowed to unlock it.

Is there a non-spinning (blocking, like this one) barrier that I can use with C++11? I've only been able to find barriers that use busy-waiting, which is something I would like to avoid (unless there is really no reason to avoid it).



Solution 1:[1]

Use a std::condition_variable instead of a std::mutex to block all threads until the last one reaches the barrier.

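#include <condition_variable>
#include <cstddef>
#include <mutex>

// Single-use barrier: once _count reaches zero it is never reset,
// so the same object cannot be waited on a second time.
class Barrier
{
private:
    std::mutex _mutex;
    std::condition_variable _cv;
    std::size_t _count;  // threads that have not arrived yet
public:
    explicit Barrier(std::size_t count) : _count(count) { }
    void Wait()
    {
        std::unique_lock<std::mutex> lock(_mutex);
        if (--_count == 0) {
            // Last thread to arrive releases everyone else.
            _cv.notify_all();
        } else {
            // The predicate protects against spurious wakeups.
            _cv.wait(lock, [this] { return _count == 0; });
        }
    }
};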
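As a rough usage sketch (not part of the original answer), the barrier can be sized for the workers plus the calling thread, so the caller blocks until every worker has reported in. The helper name run_workers_once and the atomic counter are assumptions made for illustration, and the class is repeated only so the block compiles on its own.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Same single-use barrier as in the answer, repeated so this compiles alone.
class Barrier
{
    std::mutex _mutex;
    std::condition_variable _cv;
    std::size_t _count;
public:
    explicit Barrier(std::size_t count) : _count(count) { }
    void Wait()
    {
        std::unique_lock<std::mutex> lock(_mutex);
        if (--_count == 0) {
            _cv.notify_all();
        } else {
            _cv.wait(lock, [this] { return _count == 0; });
        }
    }
};

// Hypothetical helper: starts `workers` threads and returns how many had
// finished their work by the time the calling thread passed the barrier.
std::size_t run_workers_once(std::size_t workers)
{
    assert(workers > 0);                // precondition of this demo only

    Barrier done(workers + 1);          // the workers plus the calling thread
    std::atomic<std::size_t> finished{0};

    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < workers; ++i) {
        threads.emplace_back([&] {
            ++finished;                 // stand-in for the real work
            done.Wait();                // report completion
        });
    }

    done.Wait();                        // blocks until every worker has arrived
    const std::size_t seen = finished.load();

    for (auto& t : threads) t.join();   // join before `done` goes out of scope
    return seen;
}
```

Calling run_workers_once(4) should return 4, because the caller cannot pass the barrier before all four workers have incremented the counter and arrived. Since the count never resets, a second round would need a new Barrier object.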

Solution 2:[2]

Here's my version of the accepted answer above, with auto-reset behavior for repeated use. This is achieved by counting down and up alternately.

    /**
    * @brief Represents a CPU thread barrier
    * @note The barrier automatically resets after all threads are synced
    */
    class Barrier
    {
    private:
        std::mutex m_mutex;
        std::condition_variable m_cv;

        std::size_t m_count;
        const std::size_t m_initial;

        enum State : unsigned char {
            Up, Down
        };
        State m_state;

    public:
        explicit Barrier(std::size_t count) : m_count{ count }, m_initial{ count }, m_state{ State::Down } { }

        /// Blocks until all N threads reach here
        void Sync()
        {
            std::unique_lock<std::mutex> lock{ m_mutex };

            if (m_state == State::Down)
            {
                // Counting down the number of syncing threads
                if (--m_count == 0) {
                    m_state = State::Up;
                    m_cv.notify_all();
                }
                else {
                    m_cv.wait(lock, [this] { return m_state == State::Up; });
                }
            }

            else // (m_state == State::Up)
            {
                // Counting back up for Auto reset
                if (++m_count == m_initial) {
                    m_state = State::Down;
                    m_cv.notify_all();
                }
                else {
                    m_cv.wait(lock, [this] { return m_state == State::Down; });
                }
            }
        }
    };  
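Here is a hedged sketch of how an auto-resetting barrier like this could drive the pattern from the question, where the main thread gives the 'go' and then waits for the results with the same call. The driver run_rounds, the partial vector and the fake "work" are assumptions for illustration. The class is a condensed copy so the block compiles on its own.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Condensed copy of the up/down counting barrier from this answer.
class Barrier
{
    std::mutex m_mutex;
    std::condition_variable m_cv;
    std::size_t m_count;
    const std::size_t m_initial;
    enum State : unsigned char { Up, Down };
    State m_state;

public:
    explicit Barrier(std::size_t count)
        : m_count{ count }, m_initial{ count }, m_state{ State::Down } { }

    void Sync()
    {
        std::unique_lock<std::mutex> lock{ m_mutex };
        if (m_state == State::Down) {
            if (--m_count == 0) { m_state = State::Up; m_cv.notify_all(); }
            else { m_cv.wait(lock, [this] { return m_state == State::Up; }); }
        } else {
            if (++m_count == m_initial) { m_state = State::Down; m_cv.notify_all(); }
            else { m_cv.wait(lock, [this] { return m_state == State::Down; }); }
        }
    }
};

// Hypothetical driver: the calling thread gives the 'go' and collects the
// results once per round. Returns the sum of all partial results.
long run_rounds(std::size_t workers, int rounds)
{
    assert(rounds >= 0);                          // precondition of this demo only

    Barrier barrier(workers + 1);                 // workers plus the calling thread
    std::vector<long> partial(workers, 0);        // one slot per worker
    long total = 0;

    std::vector<std::thread> threads;
    for (std::size_t id = 0; id < workers; ++id) {
        threads.emplace_back([&, id] {
            for (int r = 0; r < rounds; ++r) {
                barrier.Sync();                   // wait for 'go'
                partial[id] = r + 1;              // stand-in for the real work
                barrier.Sync();                   // report 'done'
            }
        });
    }

    for (int r = 0; r < rounds; ++r) {
        barrier.Sync();                           // give 'go'
        barrier.Sync();                           // wait for every worker
        for (long p : partial) total += p;        // workers are parked at 'go' now
    }

    for (auto& t : threads) t.join();
    return total;
}
```

With 3 workers and 4 rounds, each round r adds 3 * (r + 1), so run_rounds(3, 4) should return 30. Reading partial between the 'done' barrier and the next 'go' is safe here because no worker writes again until the caller has re-entered the barrier.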

Solution 3:[3]

The answers above do not seem to work when the barrier is reused in quick succession, for example when it is called twice inside a loop.

Example: each thread runs a while loop that looks like this:

while (true)
{
    threadBarrier->Synch();
    // do heavy computation
    threadBarrier->Synch();
    // small external calculations like timing, loop count, etc, ...
}

Here is a solution using the C++ standard library:

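#include <condition_variable>
#include <mutex>

class ThreadBarrier
{
public:
    int m_threadCount = 0;
    int m_currentThreadCount = 0;
    unsigned long m_generation = 0;  // incremented each time the barrier releases

    std::mutex m_mutex;
    std::condition_variable m_cv;

public:
    explicit ThreadBarrier(int threadCount)
    {
        m_threadCount = threadCount;
    }

public:
    void Synch()
    {
        // Hold the lock from the count update through the wait so that a
        // notification cannot slip in between the two.
        std::unique_lock<std::mutex> lk(m_mutex);

        const unsigned long generation = m_generation;

        m_currentThreadCount = (m_currentThreadCount + 1) % m_threadCount;

        if (m_currentThreadCount != 0)
        {
            // Wait for the round to change; the predicate also filters out
            // spurious wakeups.
            m_cv.wait(lk, [this, generation] { return m_generation != generation; });
        }
        else
        {
            // Last thread of this round: start the next round and wake the rest.
            ++m_generation;
            m_cv.notify_all();
        }
    }

};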
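To check a reusable barrier against the two-calls-per-iteration loop shown earlier, a small stress test along these lines may help. This is only a sketch: GenerationBarrier and count_violations are hypothetical names, and the generation counter is one possible way to make back-to-back calls safe, included here so the block compiles on its own.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Sketch of a reusable barrier; the generation counter tells one round
// of waiting apart from the next.
class GenerationBarrier
{
    std::mutex m_mutex;
    std::condition_variable m_cv;
    const int m_threadCount;
    int m_waiting = 0;
    unsigned long m_generation = 0;

public:
    explicit GenerationBarrier(int threadCount) : m_threadCount(threadCount)
    {
        assert(threadCount > 0);
    }

    void Synch()
    {
        std::unique_lock<std::mutex> lk(m_mutex);
        const unsigned long arrivedIn = m_generation;
        if (++m_waiting == m_threadCount) {
            m_waiting = 0;          // reset for the next round
            ++m_generation;         // release everyone waiting on this round
            m_cv.notify_all();
        } else {
            m_cv.wait(lk, [this, arrivedIn] { return m_generation != arrivedIn; });
        }
    }
};

// Hypothetical stress test: runs the loop from the example and returns how
// often a thread saw a work count that a correct barrier would rule out.
int count_violations(int threads, int iterations)
{
    GenerationBarrier barrier(threads);
    std::atomic<int> work{0};
    std::atomic<int> violations{0};

    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                barrier.Synch();
                ++work;                         // stand-in for the heavy computation
                barrier.Synch();
                // Every thread must have done its work for this iteration,
                // and none may have started the next one yet.
                if (work.load() != threads * (i + 1)) ++violations;
            }
        });
    }
    for (auto& th : pool) th.join();
    return violations.load();
}
```

A correct barrier should give count_violations(4, 1000) == 0. A barrier that lets a fast thread run ahead, or that loses a wakeup, would either report violations or hang.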

And a solution for Windows, using the native synchronization barrier API (available since Windows 8):

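#include <windows.h>

class ThreadBarrier
{
public:
    SYNCHRONIZATION_BARRIER m_barrier;

public:
    inline ThreadBarrier(int threadCount)
    {
        // The third argument is the spin count: how many times a thread
        // spins before it blocks while waiting for the others.
        InitializeSynchronizationBarrier(
            &m_barrier,
            threadCount,
            8000);
    };

    inline ~ThreadBarrier()
    {
        // Release the barrier's resources.
        DeleteSynchronizationBarrier(&m_barrier);
    };

public:
    inline void Synch()
    {
        // Flags of 0 select the default behavior (spin, then block).
        EnterSynchronizationBarrier(
            &m_barrier,
            0);
    };

};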

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Michael Sutton
Solution 3 Nguyễn Hữu Kiệt