[lttng-dev] [RFC PATCH urcu] Document uatomic operations

Paul E. McKenney paulmck at linux.vnet.ibm.com
Wed May 16 13:18:03 EDT 2012


On Tue, May 15, 2012 at 08:10:03AM -0400, Mathieu Desnoyers wrote:
> * Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote:
> > On Mon, May 14, 2012 at 10:39:01PM -0400, Mathieu Desnoyers wrote:
> > > Document each atomic operation provided by urcu/uatomic.h, along with
> > > their memory barrier guarantees.
> > 
> > Great to see the documentation!!!  Some comments below.
> > 
> > 							Thanx, Paul
> > 
> > > Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
> > > ---
> > > diff --git a/doc/Makefile.am b/doc/Makefile.am
> > > index bec1d7c..db9811c 100644
> > > --- a/doc/Makefile.am
> > > +++ b/doc/Makefile.am
> > > @@ -1 +1 @@
> > > -dist_doc_DATA = rcu-api.txt
> > > +dist_doc_DATA = rcu-api.txt uatomic-api.txt
> > > diff --git a/doc/uatomic-api.txt b/doc/uatomic-api.txt
> > > new file mode 100644
> > > index 0000000..3605acf
> > > --- /dev/null
> > > +++ b/doc/uatomic-api.txt
> > > @@ -0,0 +1,80 @@
> > > +Userspace RCU Atomic Operations API
> > > +by Mathieu Desnoyers and Paul E. McKenney
> > > +
> > > +
> > > +This document describes the <urcu/uatomic.h> API. These are the atomic
> > > +operations provided by the Userspace RCU library. The general rule
> > > +regarding memory barriers is that only uatomic_xchg(),
> > > +uatomic_cmpxchg(), uatomic_add_return(), and uatomic_sub_return() imply
> > > +full memory barriers before and after the atomic operation. Other
> > > +primitives don't guarantee any memory barrier.
> > > +
> > > +Only atomic operations performed on integers ("int" and "long", signed
> > > +and unsigned) are supported on all architectures. Some architectures
> > > +also support 1-byte and 2-byte atomic operations. These architectures
> > > +define UATOMIC_HAS_ATOMIC_BYTE and UATOMIC_HAS_ATOMIC_SHORT,
> > > +respectively, when uatomic.h is included. An attempt to perform an
> > > +atomic write to a type size not supported by the architecture will
> > > +trigger an illegal instruction.
> > > +
> > > +In the description below, "type" is a type that can be atomically
> > > +written to by the architecture. It needs to be at most word-sized, and
> > > +its alignment needs to be greater than or equal to its size.
> > > +
> > > +type uatomic_set(type *addr, type v)
> > > +
> > > +	Atomically write @v into @addr.
> > 
> > Wouldn't this instead be "void uatomic_set(type *addr, type v)"?
> 
> Well, in that case, we'd need to change the macro. Currently,
> _uatomic_set maps directly to:
> 
> #define _uatomic_set(addr, v)   CMM_STORE_SHARED(*(addr), (v))
> 
> and CMM_STORE_SHARED returns v. The question becomes: should we change
> _uatomic_set or CMM_STORE_SHARED so they don't return anything, or
> document that they return something ?
> 
> One thing I noticed is that linters often complain that the return value
> of CMM_STORE_SHARED is never used. One thing we could look into is to
> try some gcc attributes and/or linter annotations to flag this return
> value as possibly unused. Thoughts ?

Hmmm...

Does the following work?

#define _uatomic_set(addr, v)   ((void)CMM_STORE_SHARED(*(addr), (v)))
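
If so, any attempt to use the stored value would then be caught at
compile time.  A quick (untested) illustration:

	int x = 0, y;

	uatomic_set(&x, 1);	/* OK: result discarded. */
	y = uatomic_set(&x, 2);	/* Now a compile error: void value. */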

> > By "Atomically write @v into @addr", what is meant is that no concurrent
> > operation that reads from addr will see partial effects of uatomic_set(),
> > correct?  In other words, the concurrent read will either see v or
> > the old value, not a mush of the two.
> 
> yep. I added that clarification.
> 
> > 
> > > +
> > > +type uatomic_read(type *addr)
> > > +
> > > +	Atomically read the content of @addr.
> > 
> > Similar comments on the meaning of "atomically".  This may sound picky,
> > but people coming from an x86 environment might otherwise assume that
> > there is a lock prefix involved...
> 
> same.
> 
> > 
> > > +
> > > +type uatomic_cmpxchg(type *addr, type old, type new)
> > > +
> > > +	Atomically check if @addr contains @old. If true, then replace
> > > +	the content of @addr by @new. Return the value previously
> > > +	contained by @addr. This function implies a full memory barrier
> > > +	before and after the atomic operation.
> > 
> > Suggest "then atomically replace" or some such.  It might not hurt
> > to add that this is an atomic read-modify-write operation.
> 
> Updated to:
> 
> type uatomic_cmpxchg(type *addr, type old, type new)
> 
>         An atomic read-modify-write operation that performs this 
>         sequence of operations atomically: check if @addr contains @old.
>         If true, then replace the content of @addr by @new. Return the
>         value previously contained by @addr. This function implies a
>         full memory barrier before and after the atomic operation.
> 
> > 
> > Similar comments on the other value-returning atomics.
> 
> Will do something similar.
> 
> > 
> > > +
> > > +type uatomic_xchg(type *addr, type new)
> > > +
> > > +	Atomically replace the content of @addr by @new, and return the
> > > +	value previously contained by @addr. This function implies a full
> > > +	memory barrier before and after the atomic operation.
> > > +
> > > +type uatomic_add_return(type *addr, type v)
> > > +type uatomic_sub_return(type *addr, type v)
> > > +
> > > +	Atomically increment/decrement the content of @addr by @v, and
> > > +	return the resulting value. These functions imply a full
> > > +	memory barrier before and after the atomic operation.
> > > +
> > > +void uatomic_and(type *addr, type mask)
> > > +void uatomic_or(type *addr, type mask)
> > > +
> > > +	Atomically write the result of bitwise "and"/"or" between the
> > > +	content of @addr and @mask into @addr. Memory barriers are
> > > +	provided by explicitly using cmm_smp_mb__before_uatomic_and(),
> > > +	cmm_smp_mb__after_uatomic_and(),
> > > +	cmm_smp_mb__before_uatomic_or(), and
> > > +	cmm_smp_mb__after_uatomic_or().
> > 
> > I suggest replacing "Memory barriers are provided ..." with something like
> > "These operations do not necessarily imply memory barriers.  If memory
> > barriers are needed, they may be provided ...".  Then perhaps add a
> > sentence stating that the advantage of using the four __before_/__after_
> > primitives is that they are no-ops on architectures in which the underlying
> > atomic instructions implicitly supply the needed memory barriers.
> > 
> > Similar comments on the other non-value-returning atomic operations below.
> 
> OK, done.
> 
> Here is the update:
> 
> 
> diff --git a/doc/Makefile.am b/doc/Makefile.am
> index 27d3793..3422653 100644
> --- a/doc/Makefile.am
> +++ b/doc/Makefile.am
> @@ -1 +1 @@
> -dist_doc_DATA = rcu-api.txt cds-api.txt
> +dist_doc_DATA = rcu-api.txt cds-api.txt uatomic-api.txt
> diff --git a/doc/uatomic-api.txt b/doc/uatomic-api.txt
> new file mode 100644
> index 0000000..3ad8fbb
> --- /dev/null
> +++ b/doc/uatomic-api.txt
> @@ -0,0 +1,102 @@
> +Userspace RCU Atomic Operations API
> +by Mathieu Desnoyers and Paul E. McKenney
> +
> +
> +This document describes the <urcu/uatomic.h> API. These are the atomic
> +operations provided by the Userspace RCU library. The general rule
> +regarding memory barriers is that only uatomic_xchg(),
> +uatomic_cmpxchg(), uatomic_add_return(), and uatomic_sub_return() imply
> +full memory barriers before and after the atomic operation. Other
> +primitives don't guarantee any memory barrier.
> +
> +Only atomic operations performed on integers ("int" and "long", signed
> +and unsigned) are supported on all architectures. Some architectures
> +also support 1-byte and 2-byte atomic operations. These architectures
> +define UATOMIC_HAS_ATOMIC_BYTE and UATOMIC_HAS_ATOMIC_SHORT,
> +respectively, when uatomic.h is included. An attempt to perform an
> +atomic write to a type size not supported by the architecture will
> +trigger an illegal instruction.
> +
> +In the description below, "type" is a type that can be atomically
> +written to by the architecture. It needs to be at most word-sized, and
> +its alignment needs to be greater than or equal to its size.
> +
> +type uatomic_set(type *addr, type v)
> +
> +	Atomically write @v into @addr. By "atomically", we mean that no
> +	concurrent operation that reads from @addr will see partial
> +	effects of uatomic_set().

Good clarification!

If this is to return the value written, it should say so.  (I would prefer
no return, as this is the most common usage and the value written is easily
available anyway.)
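
Also, given that uatomic_set()/uatomic_read() guarantee atomicity but
no ordering, perhaps an (untested) sketch like the following would help
readers see where the explicit barriers come in:

	#include <urcu/arch.h>		/* cmm_smp_mb(), caa_cpu_relax() */
	#include <urcu/uatomic.h>

	static int data;
	static int ready;

	static void producer(void)
	{
		data = 42;
		cmm_smp_mb();	/* Order the store to data before ready. */
		uatomic_set(&ready, 1);
	}

	static int consumer(void)
	{
		while (!uatomic_read(&ready))
			caa_cpu_relax();
		cmm_smp_mb();	/* Order the load of ready before data. */
		return data;	/* Sees 42. */
	}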

> +type uatomic_read(type *addr)
> +
> +	Atomically read the content of @addr. By "atomically", we mean
> +	that uatomic_read() cannot see a partial effect of any
> +	concurrent uatomic update.
> +
> +type uatomic_cmpxchg(type *addr, type old, type new)
> +
> +	An atomic read-modify-write operation that performs this
> +	sequence of operations atomically: check if @addr contains @old.
> +	If true, then replace the content of @addr by @new. Return the
> +	value previously contained by @addr. This function implies a full
> +	memory barrier before and after the atomic operation.

Good clarification!  (And below as well.)
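
As a further (untested) illustration of the read-modify-write
semantics, the classic cmpxchg retry loop:

	#include <urcu/uatomic.h>

	static long counter;

	static void counter_add(long v)
	{
		long old;

		do {
			old = uatomic_read(&counter);
		} while (uatomic_cmpxchg(&counter, old, old + v) != old);
	}

Of course, uatomic_add() below does this in a single operation; the
loop is only meant to show the pattern.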

> +type uatomic_xchg(type *addr, type new)
> +
> +	An atomic read-modify-write operation that performs this sequence
> +	of operations atomically: replace the content of @addr by @new,
> +	and return the value previously contained by @addr. This
> +	function implies a full memory barrier before and after the atomic
> +	operation.
> +
> +type uatomic_add_return(type *addr, type v)
> +type uatomic_sub_return(type *addr, type v)
> +
> +	An atomic read-modify-write operation that performs this
> +	sequence of operations atomically: increment/decrement the
> +	content of @addr by @v, and return the resulting value. These
> +	functions imply a full memory barrier before and after the
> +	atomic operation.
> +
> +void uatomic_and(type *addr, type mask)
> +void uatomic_or(type *addr, type mask)
> +
> +	Atomically write the result of bitwise "and"/"or" between the
> +	content of @addr and @mask into @addr.
> +	These operations do not necessarily imply memory barriers.
> +	If memory barriers are needed, they may be provided by
> +	explicitly using
> +	cmm_smp_mb__before_uatomic_and(),
> +	cmm_smp_mb__after_uatomic_and(),
> +	cmm_smp_mb__before_uatomic_or(), and
> +	cmm_smp_mb__after_uatomic_or(). These explicit barriers are
> +	no-ops on architectures in which the underlying atomic
> +	instructions implicitly supply the needed memory barriers.

Good as well for these!
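
For instance, publishing a flag bit with uatomic_or() and ordering it
against later accesses might look like this (untested):

	#include <urcu/uatomic.h>

	static unsigned long flags;

	static void set_flag(unsigned long bit)
	{
		uatomic_or(&flags, bit);
		/* No-op where uatomic_or() already implies a barrier. */
		cmm_smp_mb__after_uatomic_or();
	}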

						Thanx, Paul

> +void uatomic_add(type *addr, type v)
> +void uatomic_sub(type *addr, type v)
> +
> +	Atomically increment/decrement the content of @addr by @v.
> +	These operations do not necessarily imply memory barriers.
> +	If memory barriers are needed, they may be provided by
> +	explicitly using
> +	cmm_smp_mb__before_uatomic_add(),
> +	cmm_smp_mb__after_uatomic_add(),
> +	cmm_smp_mb__before_uatomic_sub(), and
> +	cmm_smp_mb__after_uatomic_sub(). These explicit barriers are
> +	no-ops on architectures in which the underlying atomic
> +	instructions implicitly supply the needed memory barriers.
> +
> +void uatomic_inc(type *addr)
> +void uatomic_dec(type *addr)
> +
> +	Atomically increment/decrement the content of @addr by 1.
> +	These operations do not necessarily imply memory barriers.
> +	If memory barriers are needed, they may be provided by
> +	explicitly using
> +	cmm_smp_mb__before_uatomic_inc(),
> +	cmm_smp_mb__after_uatomic_inc(),
> +	cmm_smp_mb__before_uatomic_dec(), and
> +	cmm_smp_mb__after_uatomic_dec(). These explicit barriers are
> +	no-ops on architectures in which the underlying atomic
> +	instructions implicitly supply the needed memory barriers.
> 
> 
> -- 
> Mathieu Desnoyers
> Operating System Efficiency R&D Consultant
> EfficiOS Inc.
> http://www.efficios.com
> 



