'Using optimistic locks in C++ and memory order

I'm wondering how 'optimistic locks' (as described e.g. here: https://db.in.tum.de/~leis/papers/artsync.pdf) can be used in C++ with regard to non-atomic data and what memory orders should be used.

My understanding from the above paper is that following code will synchronize t1 and t2 and t1 will either print both variables updated or none.

static constexpr uint64_t locked_bit = (1ULL << 63);

std::atomic<int> data1 = 0;
std::atomic<int> data2 = 0;
std::atomic<uint64_t> versioned_lock = 0;

int main() {
    std::thread t1([&]{
        restart:
        auto s = versioned_lock.load();
        if (s & locked_bit)
            goto restart;

        auto local_data1 = data1.load();
        auto local_data2 = data2.load();
        if (s == versioned_lock.load()) {
            // data has not been overwritten yet, can safely use local_data
            std::cout << local_data1 << " " << local_data2 << std::endl;
        } else {
            // possible race, local_data might be garbage?
        }
    });

    std::thread t2([&]{
        auto current_lock_version = versioned_lock.load();
        current_lock_version++;
        versioned_lock.store(current_lock_version | locked_bit);
        data1.store(1234);
        data2.store(4321);
        versioned_lock.store(current_lock_version);
    });

    t1.join();
    t2.join();
}

My questions are:

  1. I this a valid C++ code with no UB?
  2. Are my assumption about synchronization correct? Will t1 actually print either "0 0" or "1234 4321"?
  3. What memory orders should be used for reading/writing to versioned_lock? Should it be memory_order_seq_cst or can it be something less restrictive? Will it work also if data1 and data2 are non-atomic (just int or even some more complex data type) and does it affect the required memory order (or maybe there is a need for atomic_thread_fence)?


Solution 1:[1]

Is this a valid C++ code with no UB?

I think so. The usual way to get UB in such a program is a data race, but that can only happen when a non-atomic variable is written concurrently, and every shared variable in this program is atomic.

Are my assumption about synchronization correct? Will t1 actually print either "0 0" or "1234 4321"?

Yes. All the loads and stores in this version are seq_cst, so they occur in some total order, which I will use < to denote. Let L1, L2 denote the two loads of versioned_lock in t1, and S1, S2 the two stores in t2.

If S1, S2 both occur before L1, then both the new values will be seen and we print 1234 4321, which is fine. If S1, S2 both occur after L2 then we print 0 0, which is also fine. If either S1 or S2 occurs in between L1 and L2, then L1 and L2 return different values and we do not print anything, which again is fine.

The only ordering that remains is S1 < L1 < L2 < S2. In this case L1 returns the value with the locked bit set, and so we restart the loop instead of printing.

What memory orders should be used for reading/writing to versioned_lock? Should it be memory_order_seq_cst or can it be something less restrictive?

I think that the following suffices:

static constexpr uint64_t locked_bit = (1ULL << 63);

std::atomic<int> data1 = 0;
std::atomic<int> data2 = 0;
std::atomic<uint64_t> versioned_lock = 0;

int main() {
    std::thread t1([&]{
        restart:
        auto s = versioned_lock.load(std::memory_order_acquire); // L1
        if (s & locked_bit)
            goto restart;

        auto local_data1 = data1.load(std::memory_order_relaxed);
        auto local_data2 = data2.load(std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_acquire);    // AF
        if (s == versioned_lock.load(std::memory_order_relaxed)) { // L2
            // data has not been overwritten yet, can safely use local_data
            std::cout << local_data1 << " " << local_data2 << std::endl;
        } else {
            // possible race, local_data might be garbage?
        }
    });

    std::thread t2([&]{
        auto current_lock_version = versioned_lock.load(std::memory_order_relaxed);
        current_lock_version++;
        versioned_lock.store(current_lock_version | locked_bit, std::memory_order_relaxed); // S1
        std::atomic_thread_fence(std::memory_order_release); // RF
        data1.store(1234, std::memory_order_relaxed);
        data2.store(4321, std::memory_order_relaxed);
        versioned_lock.store(current_lock_version, std::memory_order_release); // S2
    });

    t1.join();
    t2.join();
}

That is, the accesses to versioned_lock that clear and test the locked bit must be release and acquire, the stores of the data must be preceded by a release fence, and the loads of the data followed by an acquire fence. Intuitively, this should keep the data accesses from leaking out from between the lock accesses that protect them. (You could also make all the stores of the data be release and then drop the release fence, but that would be unnecessarily strong: we do not care if the two data stores get reordered with each other. It would make a bigger difference if we had many more data elements. The same applies to the option of making the data loads acquire.)

To prove that this works, notice that the two results we must avoid are 1234 0 and 0 4321. So suppose that the load of data1 returns 1234 (the case when data2 returns 4321 is equivalent). Since this is the value that was stored by t2, the release fence (RF) synchronizes with the acquire fence (AF). Now S1 is sequenced before RF, and AF is sequenced before L2, so we conclude that S1 happens before L2. Thus L2 cannot return 0; it either returns locked_bit or 1.

If L1 returns a different value from L2 then we will not print. If L1 returns locked_bit we also will not print. The one remaining case is that L1 and L2 both return 1. In this case, L1 has read the value written by S2, and L1/S2 are acquire/release, so S2 synchronizes with L1. The data stores are sequenced before S2, and L1 is sequenced before the data loads, so both data stores happen before both data loads. Therefore we see both the new values, and print 1234 4321.

Will it work also if data1 and data2 are non-atomic (just int or even some more complex data type)?

No, that will not work at all. Nothing in this algorithm prevents the stores and loads of data1, data2 from conflicting; it's just that when they do conflict, our tests of the versioned lock tell us not to use the data. But if they were not atomic, the conflicting stores and loads would be a data race, causing undefined behavior whether we use the data or not.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1