[lttng-dev] liburcu rcu_xchg_pointer and rcu_cmpxchg_pointer ARM32 barriers

Mathieu Desnoyers mathieu.desnoyers at efficios.com
Mon Dec 5 14:14:47 UTC 2016


Hi Paul,

So about the liburcu rcu_xchg_pointer() barriers, here is the current
situation:

rcu_xchg_pointer is implemented as:

#define _rcu_xchg_pointer(p, v)                         \
        __extension__                                   \
        ({                                              \
                __typeof__(*p) _________pv = (v);       \
                if (!__builtin_constant_p(v) ||         \
                    ((v) != NULL))                      \
                        cmm_wmb();                              \
                uatomic_xchg(p, _________pv);           \
        })

So we actually add a write barrier before the uatomic_xchg(),
which should not be required if we consider that uatomic_xchg()
*should* imply a full barrier before/after.

But in reality, it's ARM32 uatomic_xchg() which does not fulfill
its contract, due to __sync_lock_test_and_set being only
an acquire barrier [1]. So the extra cmm_wmb() is what saved
us here for rcu_xchg_pointer().

The code currently generated by rcu_xchg_pointer() looks like:

   11000:       f3bf 8f5f       dmb     sy
   11004:       e857 ef00       ldrex   lr, [r7]
   11008:       e847 0300       strex   r3, r0, [r7]
   1100c:       2b00            cmp     r3, #0
   1100e:       d1f9            bne.n   11004 <thr_writer+0x70>
   11010:       f3bf 8f5b       dmb     ish


Looking at the cmpxchg variant:

#define _rcu_cmpxchg_pointer(p, old, _new)                              \
        __extension__                                                   \
        ({                                                              \
                __typeof__(*p) _________pold = (old);                   \
                __typeof__(*p) _________pnew = (_new);                  \
                if (!__builtin_constant_p(_new) ||                      \
                    ((_new) != NULL))                                   \
                        cmm_wmb();                                              \
                uatomic_cmpxchg(p, _________pold, _________pnew);       \
        })

We also notice a cmm_wmb() before what should imply a full barrier
(uatomic_cmxchg). The latter is implemented with __sync_val_compare_and_swap_N,
which should imply a full barrier based on [1] (which is as vague as it
gets). Looking at the generated code, we indeed have two barriers before:

   11000:       f3bf 8f5f       dmb     sy
   11004:       f3bf 8f5b       dmb     ish
   11008:       e857 ef00       ldrex   lr, [r7]
   1100c:       45c6            cmp     lr, r8
   1100e:       d103            bne.n   11018 <thr_writer+0x84>
   11010:       e847 0300       strex   r3, r0, [r7]
   11014:       2b00            cmp     r3, #0
   11016:       d1f7            bne.n   11008 <thr_writer+0x74>
   11018:       f3bf 8f5b       dmb     ish

So for stable-0.8 and stable-0.9, I would be tempted to err on
the safe side and simply add the missing cmm_smp_mb() within
uatomic_xchg() before the __sync_lock_test_and_set().

For the master branch, in addition to adding the missing cmm_smp_mb()
to uatomic_xchg(), we could remove the redundant cmm_wmb() in
rcu_cmpxchg_pointer and rcu_xchg_pointer.

Thoughts ?

Thanks,

Mathieu

[1] https://gcc.gnu.org/onlinedocs/gcc-4.1.0/gcc/Atomic-Builtins.html

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


More information about the lttng-dev mailing list