'Why doesn't my ARM LDREX/STREX C function work?

I wrote a claim_lock function in C, according to the "Barrier Litmus Tests and Cookbook" document. I examined the generated code, and it all looks good, but it didn't work.

// This code conforms to the section 7.2 of PRD03-GENC-007826:
// "Acquiring and Releasing a Lock"
static inline void claim_lock( uint32_t volatile *lock )
{
  uint32_t failed = 1;
  uint32_t value;

  while (failed) {
    asm volatile ( "ldrex %[value], [%[lock]]"
                   : [value] "=&r" (value)
                   : [lock] "r" (lock) );
    if (value == 0) {
      // The failed and lock registers are not allowed to be the same, so
      // pretend to gcc that the lock pointer may be written as well as read.

      asm volatile ( "strex %[failed], %[value], [%[lock]]"
                     : [failed] "=&r" (failed)
                     , [lock] "+r" (lock)
                     : [value] "r" (1) );
    }
    else {
      asm ( "clrex" );
    }
  }
  asm ( "dmb sy" );
}

Generated code (gcc):

1000:       e3a03001        mov     r3, #1
1004:       e1902f9f        ldrex   r2, [r0]
1008:       e3520000        cmp     r2, #0
100c:       1a000004        bne     1024 <claim_lock+0x24>
1010:       e1802f93        strex   r2, r3, [r0]
1014:       e3520000        cmp     r2, #0
1018:       1afffff9        bne     1004 <claim_lock+0x4>
101c:       f57ff05f        dmb     sy
1020:       e12fff1e        bx      lr
1024:       f57ff01f        clrex
1028:       eafffff5        b       1004 <claim_lock+0x4>

Corresponding release function:

static inline void release_lock( uint32_t volatile *lock )
{
  // Ensure that any changes made while holding the lock are
  // visible before the lock is seen to have been released
  asm ( "dmb sy" );
  *lock = 0;
}

It worked in QEMU, but either hung, or allowed all cores to "claim" the so-called "lock" on real hardware (Raspberry Pi 3 Cortex-A53).



Solution 1:[1]

this is what i found in Context switch section of ARMv7-M Architecture Reference Manual

Blockquote

It is necessary to ensure that the local monitor is in the Open Access state after a context switch. In ARMv7-M, the local monitor is changed to Open Access automatically as part of an exception entry or exit sequence. The local monitor can also be forced to the Open Access state by a CLREX instruction. Note Context switching is not an application level operation. However, this information is included here to complete the description of the exclusive operations. A context switch might cause a subsequent Store-Exclusive to fail, requiring a load … store sequence to be replayed. To minimize the possibility of this happening, ARM recommends that the Store-Exclusive instruction is kept as close as possible to the associated Load-Exclusive instruction, see Load-Exclusive and Store-Exclusive usage restrictions.

Blockquote

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Ishwar Chandra