[lttng-dev] [RFC PATCH urcu] Document uatomic operations
Mathieu Desnoyers
mathieu.desnoyers at efficios.com
Wed May 16 14:17:42 EDT 2012
* Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote:
> On Tue, May 15, 2012 at 08:10:03AM -0400, Mathieu Desnoyers wrote:
> > * Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote:
> > > On Mon, May 14, 2012 at 10:39:01PM -0400, Mathieu Desnoyers wrote:
> > > > Document each atomic operation provided by urcu/uatomic.h, along with
> > > > their memory barrier guarantees.
> > >
> > > Great to see the documentation!!! Some comments below.
> > >
> > > Thanx, Paul
> > >
> > > > Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
> > > > ---
> > > > diff --git a/doc/Makefile.am b/doc/Makefile.am
> > > > index bec1d7c..db9811c 100644
> > > > --- a/doc/Makefile.am
> > > > +++ b/doc/Makefile.am
> > > > @@ -1 +1 @@
> > > > -dist_doc_DATA = rcu-api.txt
> > > > +dist_doc_DATA = rcu-api.txt uatomic-api.txt
> > > > diff --git a/doc/uatomic-api.txt b/doc/uatomic-api.txt
> > > > new file mode 100644
> > > > index 0000000..3605acf
> > > > --- /dev/null
> > > > +++ b/doc/uatomic-api.txt
> > > > @@ -0,0 +1,80 @@
> > > > +Userspace RCU Atomic Operations API
> > > > +by Mathieu Desnoyers and Paul E. McKenney
> > > > +
> > > > +
> > > > +This document describes the <urcu/uatomic.h> API. These are the atomic
> > > > +operations provided by the Userspace RCU library. The general rule
> > > > +regarding memory barriers is that only uatomic_xchg(),
> > > > +uatomic_cmpxchg(), uatomic_add_return(), and uatomic_sub_return() imply
> > > > +full memory barriers before and after the atomic operation. Other
> > > > +primitives don't guarantee any memory barrier.
> > > > +
> > > > +Only atomic operations performed on integers ("int" and "long", signed
> > > > +and unsigned) are supported on all architectures. Some architectures
> > > > +also support 1-byte and 2-byte atomic operations; these respectively
> > > > +define UATOMIC_HAS_ATOMIC_BYTE and UATOMIC_HAS_ATOMIC_SHORT when
> > > > +uatomic.h is included. An attempt to perform an atomic write to a
> > > > +type size not supported by the architecture will trigger an illegal
> > > > +instruction.
> > > > +
> > > > +In the description below, "type" is a type that can be atomically
> > > > +written to by the architecture. It needs to be at most word-sized, and
> > > > +its alignment needs to be greater than or equal to its size.
> > > > +
> > > > +type uatomic_set(type *addr, type v)
> > > > +
> > > > + Atomically write @v into @addr.
> > >
> > > Wouldn't this instead be "void uatomic_set(type *addr, type v)"?
> >
> > Well, in that case, we'd need to change the macro. Currently,
> > _uatomic_set maps directly to:
> >
> > #define _uatomic_set(addr, v) CMM_STORE_SHARED(*(addr), (v))
> >
> > and CMM_STORE_SHARED returns v. The question becomes: should we change
> > _uatomic_set or CMM_STORE_SHARED so they don't return anything, or
> > document that they return something ?
> >
> > One thing I noticed is that linters often complain that the return value
> > of CMM_STORE_SHARED is never used. One thing we could look into is to
> > try some gcc attributes and/or linter annotations to flag this return
> > value as possibly unused. Thoughts ?
>
> Hmmm...
>
> Does the following work?
>
> #define _uatomic_set(addr, v) ((void)CMM_STORE_SHARED(*(addr), (v)))
Well, it would work, yes, but then we would not be consistent between
return values or no return values of:
uatomic_set()
rcu_assign_pointer()
rcu_set_pointer()
If you notice, in the Linux kernel, rcu_assign_pointer() returns the
new pointer value. But you are right that atomic_set() does not return
anything. So which consistency would be best to keep ?
Thanks,
Mathieu
>
> > > By "Atomically write @v into @addr", what is meant is that no concurrent
> > > operation that reads from addr will see partial effects of uatomic_set(),
> > > correct? In other words, the concurrent read will either see v or
> > > the old value, not a mush of the two.
> >
> > yep. I added that clarification.
> >
> > >
> > > > +
> > > > +type uatomic_read(type *addr)
> > > > +
> > > > +	Atomically read the content of @addr.
> > >
> > > Similar comments on the meaning of "atomically". This may sound picky,
> > > but people coming from an x86 environment might otherwise assume that
> > > there is lock prefix involved...
> >
> > same.
> >
> > >
> > > > +
> > > > +type uatomic_cmpxchg(type *addr, type old, type new)
> > > > +
> > > > + Atomically check if @addr contains @old. If true, then replace
> > > > + the content of @addr by @new. Return the value previously
> > > > +	contained by @addr. This function implies a full memory barrier
> > > > + before and after the atomic operation.
> > >
> > > Suggest "then atomically replace" or some such. It might not hurt
> > > to add that this is an atomic read-modify-write operation.
> >
> > Updated to:
> >
> > type uatomic_cmpxchg(type *addr, type old, type new)
> >
> > An atomic read-modify-write operation that performs this
> > sequence of operations atomically: check if @addr contains @old.
> > If true, then replace the content of @addr by @new. Return the
> > 	value previously contained by @addr. This function implies a full
> > memory barrier before and after the atomic operation.
> >
> > >
> > > Similar comments on the other value-returning atomics.
> >
> > Will do something similar.
> >
> > >
> > > > +
> > > > +type uatomic_xchg(type *addr, type new)
> > > > +
> > > > + Atomically replace the content of @addr by @new, and return the
> > > > +	value previously contained by @addr. This function implies a full
> > > > + memory barrier before and after the atomic operation.
> > > > +
> > > > +type uatomic_add_return(type *addr, type v)
> > > > +type uatomic_sub_return(type *addr, type v)
> > > > +
> > > > + Atomically increment/decrement the content of @addr by @v, and
> > > > +	return the resulting value. These functions imply a full memory
> > > > + barrier before and after the atomic operation.
> > > > +
> > > > +void uatomic_and(type *addr, type mask)
> > > > +void uatomic_or(type *addr, type mask)
> > > > +
> > > > + Atomically write the result of bitwise "and"/"or" between the
> > > > + content of @addr and @mask into @addr. Memory barriers are
> > > > + provided by explicitly using cmm_smp_mb__before_uatomic_and(),
> > > > + cmm_smp_mb__after_uatomic_and(),
> > > > + cmm_smp_mb__before_uatomic_or(), and
> > > > + cmm_smp_mb__after_uatomic_or().
> > >
> > > I suggest replacing "Memory barriers are provided ..." with something like
> > > "These operations do not necessarily imply memory barriers. If memory
> > > barriers are needed, they may be provided ...". Then perhaps add a
> > > sentence stating that the advantage of using the four __before_/__after_
> > > primitives is that they are no-ops on architectures in which the underlying
> > > atomic instructions implicitly supply the needed memory barriers.
> > >
> > > Similar comments on the other non-value-returning atomic operations below.
> >
> > OK, done.
> >
> > Here is the update:
> >
> >
> > diff --git a/doc/Makefile.am b/doc/Makefile.am
> > index 27d3793..3422653 100644
> > --- a/doc/Makefile.am
> > +++ b/doc/Makefile.am
> > @@ -1 +1 @@
> > -dist_doc_DATA = rcu-api.txt cds-api.txt
> > +dist_doc_DATA = rcu-api.txt cds-api.txt uatomic-api.txt
> > diff --git a/doc/uatomic-api.txt b/doc/uatomic-api.txt
> > new file mode 100644
> > index 0000000..3ad8fbb
> > --- /dev/null
> > +++ b/doc/uatomic-api.txt
> > @@ -0,0 +1,102 @@
> > +Userspace RCU Atomic Operations API
> > +by Mathieu Desnoyers and Paul E. McKenney
> > +
> > +
> > +This document describes the <urcu/uatomic.h> API. These are the atomic
> > +operations provided by the Userspace RCU library. The general rule
> > +regarding memory barriers is that only uatomic_xchg(),
> > +uatomic_cmpxchg(), uatomic_add_return(), and uatomic_sub_return() imply
> > +full memory barriers before and after the atomic operation. Other
> > +primitives don't guarantee any memory barrier.
> > +
> > +Only atomic operations performed on integers ("int" and "long", signed
> > +and unsigned) are supported on all architectures. Some architectures
> > +also support 1-byte and 2-byte atomic operations; these respectively
> > +define UATOMIC_HAS_ATOMIC_BYTE and UATOMIC_HAS_ATOMIC_SHORT when
> > +uatomic.h is included. An attempt to perform an atomic write to a
> > +type size not supported by the architecture will trigger an illegal
> > +instruction.
> > +
> > +In the description below, "type" is a type that can be atomically
> > +written to by the architecture. It needs to be at most word-sized, and
> > +its alignment needs to be greater than or equal to its size.
> > +
> > +type uatomic_set(type *addr, type v)
> > +
> > + Atomically write @v into @addr. By "atomically", we mean that no
> > + concurrent operation that reads from addr will see partial
> > + effects of uatomic_set().
>
> Good clarification!
>
> If this is to return the value written, it should say so. (I would prefer
> no return, as this is the most common usage and the value written is easily
> available anyway.)
>
> > +type uatomic_read(type *addr)
> > +
> > +	Atomically read the content of @addr. By "atomically", we mean that
> > + uatomic_read() cannot see a partial effect of any concurrent
> > + uatomic update.
> > +
> > +type uatomic_cmpxchg(type *addr, type old, type new)
> > +
> > + An atomic read-modify-write operation that performs this
> > + sequence of operations atomically: check if @addr contains @old.
> > + If true, then replace the content of @addr by @new. Return the
> > +	value previously contained by @addr. This function implies a full
> > + memory barrier before and after the atomic operation.
>
> Good clarification! (And below as well.)
>
> > +type uatomic_xchg(type *addr, type new)
> > +
> > + An atomic read-modify-write operation that performs this sequence
> > + of operations atomically: replace the content of @addr by @new,
> > + and return the value previously contained by @addr. This
> > +	function implies a full memory barrier before and after the atomic
> > + operation.
> > +
> > +type uatomic_add_return(type *addr, type v)
> > +type uatomic_sub_return(type *addr, type v)
> > +
> > + An atomic read-modify-write operation that performs this
> > + sequence of operations atomically: increment/decrement the
> > +	content of @addr by @v, and return the resulting value. These
> > +	functions imply a full memory barrier before and after the atomic
> > + operation.
> > +
> > +void uatomic_and(type *addr, type mask)
> > +void uatomic_or(type *addr, type mask)
> > +
> > + Atomically write the result of bitwise "and"/"or" between the
> > + content of @addr and @mask into @addr.
> > + These operations do not necessarily imply memory barriers.
> > + If memory barriers are needed, they may be provided by
> > + explicitly using
> > + cmm_smp_mb__before_uatomic_and(),
> > + cmm_smp_mb__after_uatomic_and(),
> > + cmm_smp_mb__before_uatomic_or(), and
> > + cmm_smp_mb__after_uatomic_or(). These explicit barriers are
> > +	no-ops on architectures in which the underlying atomic
> > + instructions implicitly supply the needed memory barriers.
>
> Good as well for these!
>
> Thanx, Paul
>
> > +void uatomic_add(type *addr, type v)
> > +void uatomic_sub(type *addr, type v)
> > +
> > + Atomically increment/decrement the content of @addr by @v.
> > + These operations do not necessarily imply memory barriers.
> > + If memory barriers are needed, they may be provided by
> > + explicitly using
> > + cmm_smp_mb__before_uatomic_add(),
> > + cmm_smp_mb__after_uatomic_add(),
> > + cmm_smp_mb__before_uatomic_sub(), and
> > + cmm_smp_mb__after_uatomic_sub(). These explicit barriers are
> > +	no-ops on architectures in which the underlying atomic
> > + instructions implicitly supply the needed memory barriers.
> > +
> > +void uatomic_inc(type *addr)
> > +void uatomic_dec(type *addr)
> > +
> > + Atomically increment/decrement the content of @addr by 1.
> > + These operations do not necessarily imply memory barriers.
> > + If memory barriers are needed, they may be provided by
> > + explicitly using
> > + cmm_smp_mb__before_uatomic_inc(),
> > + cmm_smp_mb__after_uatomic_inc(),
> > + cmm_smp_mb__before_uatomic_dec(), and
> > + cmm_smp_mb__after_uatomic_dec(). These explicit barriers are
> > +	no-ops on architectures in which the underlying atomic
> > + instructions implicitly supply the needed memory barriers.
> >
> >
> > --
> > Mathieu Desnoyers
> > Operating System Efficiency R&D Consultant
> > EfficiOS Inc.
> > http://www.efficios.com
> >
>
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com