[ltt-dev] Userspace RCU library relicensed to LGPLv2.1
Mathieu Desnoyers
mathieu.desnoyers at polymtl.ca
Thu May 14 12:27:43 EDT 2009
* Steve Munroe (sjmunroe at us.ibm.com) wrote:
> Steven J. Munroe
> Linux on Power Toolchain Architect
> IBM Corporation, Linux Technology Center
>
>
> libc-alpha-owner at sourceware.org wrote on 05/14/2009 08:06:39 AM:
>
> > * Jan Blunck (jblunck at suse.de) wrote:
> > > On Wed, May 13, Mathieu Desnoyers wrote:
> > >
> > > > It currently supports x86 and powerpc. LGPL-compatible low-level
> > > > primitive headers will be required for other architectures. Note that
> > > > the build system is at best rudimentary at the moment.
> > >
> > > Is there a specific reason why the atomic_ops implementation was used
> > > instead of the atomic builtins that come with GCC? IIRC, they are
> > > implemented on all architectures already.
> > >
> >
> > Hi Jan,
> >
> > As Evgeniy said, there is the compiler version issue, but in this case
> > there is more:
> >
> > If we look at
> > http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html
> >
> > The instruction closest to an xchg() instruction (to exchange a pointer
> > in memory) is:
> >
> >
> > "type __sync_lock_test_and_set (type *ptr, type value, ...)
> >
> > This builtin, as described by Intel, is not a traditional
> > test-and-set operation, but rather an atomic exchange operation. It
> > writes value into *ptr, and returns the previous contents of *ptr."
> >
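For reference, a minimal sketch of a pointer exchange built directly on __sync_lock_test_and_set. The helper name xchg_ptr_tas() is hypothetical, and it assumes a GCC (4.1 or later) target where the builtin accepts arbitrary pointer values. The GCC documentation warns that some targets only support storing the constant 1, and that this builtin is only an acquire barrier, so it is not a drop-in full-barrier xchg().

```c
/* Hypothetical helper: atomically store newval into *ptr and return the
 * previous contents, using the GCC __sync_lock_test_and_set builtin.
 * Assumption: the target supports a full exchange (not only storing 1).
 * Ordering: acquire barrier only, per the GCC documentation. */
static void *xchg_ptr_tas(void **ptr, void *newval)
{
	return __sync_lock_test_and_set(ptr, newval);
}
```

A caller would write `old = xchg_ptr_tas(&slot, new_ptr);` and get back whatever `slot` held just before the store.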
> It seems like either:
>
> bool __sync_bool_compare_and_swap (type *ptr, type oldval, type newval, ...)
> type __sync_val_compare_and_swap (type *ptr, type oldval, type newval, ...)
> These builtins perform an atomic compare and swap. That is, if the
> current value of *ptr is oldval, then write newval into *ptr.
>
>
> The “bool” version returns true if the comparison is successful and
> newval was written. The “val” version returns the contents of *ptr
> before the operation.
>
> would do the trick with __sync_val_compare_and_swap and a simple while loop.
> Most of the time a single iteration is all that is required, and on PowerPC
> it is the same loop you would need for xchg().
>
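A minimal sketch of the CAS loop suggested above, assuming GCC's __sync_val_compare_and_swap on a pointer-sized slot. The helper name xchg_ptr_cas() is hypothetical. Each failed CAS returns the value currently in memory, which becomes the expected value for the next attempt, so the loop exits as soon as no other thread wrote *ptr between the read and the CAS.

```c
/* Hypothetical helper: emulate xchg() with a compare-and-swap loop.
 * The initial read may be stale; the CAS detects that and the loop
 * retries with the value it actually observed in memory. */
static void *xchg_ptr_cas(void **ptr, void *newval)
{
	void *old = *ptr;
	void *seen;

	for (;;) {
		seen = __sync_val_compare_and_swap(ptr, old, newval);
		if (seen == old)
			return old;	/* CAS succeeded: *ptr now holds newval */
		old = seen;		/* lost a race: retry with fresh value */
	}
}
```

The extra initial load, the compare, and the retry branch are exactly the costs weighed in the reply that follows.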
Even on powerpc it involves extra unneeded branches and memory barriers.
Re-stating part of my answer to Jan Blunck, the downsides of using a
CAS-based solution on many architectures are:
- cache line exchanges increase (shared + exclusive access)
- code size increase (read, extra branches)
- execution speed decrease (extra branches)
- unneeded memory barriers added. Release semantics are part of the
__sync_val_compare_and_swap primitive and, unless I am missing a
scenario, seem unneeded for xchg().
While I can see this as a temporary fall-back for architectures where a
proper atomic primitive is not implemented, it does not strike me as a
neat solution.
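To illustrate the fall-back arrangement described above, a sketch of a hypothetical xchg_ptr() that uses the x86 xchg instruction directly where inline assembly has been written for the architecture, and drops back to a CAS loop everywhere else. Only x86 is covered on the native path here; on x86, xchg with a memory operand is implicitly locked, so no lock prefix is needed. Other architectures would need their own primitive headers.

```c
/* Hypothetical xchg_ptr(): native exchange where available, CAS loop
 * as a temporary fall-back otherwise. */
static void *xchg_ptr(void **ptr, void *newval)
{
#if defined(__i386__) || defined(__x86_64__)
	/* xchg reg, mem is atomic and implies a full barrier on x86;
	 * the "memory" clobber also stops compiler reordering. */
	__asm__ __volatile__("xchg %0, %1"
			     : "+r" (newval), "+m" (*ptr)
			     :
			     : "memory");
	return newval;	/* the register now holds the previous *ptr */
#else
	void *old = *ptr;
	void *seen;

	/* Fall-back for architectures without a hand-written xchg. */
	while ((seen = __sync_val_compare_and_swap(ptr, old, newval)) != old)
		old = seen;
	return old;
#endif
}
```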
Mathieu
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68