[lttng-dev] liburcu rcu_xchg_pointer and rcu_cmpxchg_pointer ARM32 barriers
Mathieu Desnoyers
mathieu.desnoyers at efficios.com
Mon Dec 5 14:14:47 UTC 2016
Hi Paul,
So about the liburcu rcu_xchg_pointer() barriers, here is the current
situation:
rcu_xchg_pointer is implemented as:
#define _rcu_xchg_pointer(p, v) \
__extension__ \
({ \
__typeof__(*p) _________pv = (v); \
if (!__builtin_constant_p(v) || \
((v) != NULL)) \
cmm_wmb(); \
uatomic_xchg(p, _________pv); \
})
So we actually add a write barrier before the uatomic_xchg(),
which should not be required if we consider that uatomic_xchg()
*should* imply a full barrier before/after.
But in reality, it's ARM32 uatomic_xchg() which does not fulfill
its contract, due to __sync_lock_test_and_set being only
an acquire barrier [1]. So the extra cmm_wmb() is what saved
us here for rcu_xchg_pointer().
The code currently generated by rcu_xchg_pointer() looks like:
11000: f3bf 8f5f dmb sy
11004: e857 ef00 ldrex lr, [r7]
11008: e847 0300 strex r3, r0, [r7]
1100c: 2b00 cmp r3, #0
1100e: d1f9 bne.n 11004 <thr_writer+0x70>
11010: f3bf 8f5b dmb ish
Looking at the cmpxchg variant:
#define _rcu_cmpxchg_pointer(p, old, _new) \
__extension__ \
({ \
__typeof__(*p) _________pold = (old); \
__typeof__(*p) _________pnew = (_new); \
if (!__builtin_constant_p(_new) || \
((_new) != NULL)) \
cmm_wmb(); \
uatomic_cmpxchg(p, _________pold, _________pnew); \
})
We also notice a cmm_wmb() before what should imply a full barrier
(uatomic_cmxchg). The latter is implemented with __sync_val_compare_and_swap_N,
which should imply a full barrier based on [1] (which is as vague as it
gets). Looking at the generated code, we indeed have two barriers before:
11000: f3bf 8f5f dmb sy
11004: f3bf 8f5b dmb ish
11008: e857 ef00 ldrex lr, [r7]
1100c: 45c6 cmp lr, r8
1100e: d103 bne.n 11018 <thr_writer+0x84>
11010: e847 0300 strex r3, r0, [r7]
11014: 2b00 cmp r3, #0
11016: d1f7 bne.n 11008 <thr_writer+0x74>
11018: f3bf 8f5b dmb ish
So for stable-0.8 and stable-0.9, I would be tempted to err on
the safe side and simply add the missing cmm_smp_mb() within
uatomic_xchg() before the __sync_lock_test_and_set().
For the master branch, in addition to adding the missing cmm_smp_mb()
to uatomic_xchg(), we could remove the redundant cmm_wmb() in
rcu_cmpxchg_pointer and rcu_xchg_pointer.
Thoughts ?
Thanks,
Mathieu
[1] https://gcc.gnu.org/onlinedocs/gcc-4.1.0/gcc/Atomic-Builtins.html
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
More information about the lttng-dev
mailing list