[lttng-dev] [rp] [RFC] Readiness for URCU release with RCU lock-free hash table
Mathieu Desnoyers
mathieu.desnoyers at efficios.com
Wed May 9 10:16:36 EDT 2012
* Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote:
> On Tue, May 08, 2012 at 02:48:27PM -0400, Mathieu Desnoyers wrote:
> > * Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote:
> > > On Mon, May 07, 2012 at 12:10:55PM -0400, Mathieu Desnoyers wrote:
> > > >
> > > > * Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote:
> > > > > On Fri, May 04, 2012 at 12:53:12PM -0400, Mathieu Desnoyers wrote:
> > > > [...]
> > > > > Just to make sure I understand -- the reason that the "del" functions
> > > > > say "no memory barrier" instead of "acts like rcu_dereference()" is
> > > > > that the "del" functions don't return anything.
> > > > >
> > > > [...]
> > > > > > @@ -391,6 +413,7 @@ int cds_lfht_del(struct cds_lfht *ht, struct cds_lfht_node *node);
> > > > > > * function.
> > > > > > * Call with rcu_read_lock held.
> > > > > > * Threads calling this API need to be registered RCU read-side threads.
> > > > > > + * This function does not issue any memory barrier.
> > > > > > */
> > > >
> > > > One more question about the "del" memory ordering semantic. Following
> > > > commit
> > > >
> > > > commit db00ccc36e7fb04ce8044fb1be7964acd1de6ae0
> > > > Author: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
> > > > Date: Mon Dec 19 16:45:51 2011 -0500
> > > >
> > > > rculfhash: Relax atomicity guarantees required by removal operation
> > > >
> > > > The atomicity guarantees for the removal operation do not need to be as
> > > > strict as a cmpxchg. Use a uatomic_set followed by a xchg on a newly
> > > > introduced "REMOVAL_OWNER_FLAG" to perform removal.
> > > >
> > > > Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
> > > >
> > > >
> > > > The "del" operation is performed in two steps:
> > > >
> > > > 1 - uatomic_or(), which sets the "REMOVED" flag (the actual tombstone)
> > > > unconditionally into the node's next pointer.
> > > > 2 - uatomic_xchg(), which atomically exchanges the old pointer with
> > > > its current value (read) or'd with the REMOVAL_OWNER flag. The trick
> > > > is that if the xchg returns a pointer with the REMOVAL_OWNER flag
> > > > set, it means we are not the first thread to set this flag, so we
> > > > should not free the node. However, if xchg returns a node without
> > > > the REMOVAL_OWNER flag set, we are indeed the first to set it, so
> > > > we should call free.
> > > >
> > > > Now regarding memory ordering semantics, should we consider the atomic
> > > > action of "del" to apply when the "or" is called, or when the "xchg" is
> > > > called ? Or should we simply document that the "del" effect on the node
> > > > happens in two separate steps ?
> > > >
> > > > The way I see it, the actual effect of removal, as seen from RCU read
> > > > traversal and lookup point of view, is observed as soon as the "REMOVED"
> > > > tombstone is set, so I would think that the atomic publication of the
> > > > removal is performed by the "or".
> > > >
> > > > However, we ensure full memory barriers around "xchg", but not usually
> > > > around "or". Therefore, the current implementation does not issue a
> > > > memory barrier before the "or", so we should either change our planned
> > > > memory barrier documentation, or the implementation, to match. This
> > > > would probably require creation of a cmm_smp_mb__before_uatomic_or(), so
> > > > x86 does not end up issuing a useless memory barrer.
> > >
> > > My kneejerk reaction is that the "or" is really doing the deletion.
> > > Readers and other updaters care about the deletion, not about which CPU
> > > is going to do the free.
> > >
> > > Or did I misunderstand how this works?
> >
> > You got it right, this is how I see it too.
> >
> > However, in order to provide a full memory barrier before the "or", we'd
> > need to add a cmm_smp_mb() before the "or" (I don't think we want to
> > presume that our "or" operation issues full memory barriers on all
> > architectures).
> >
> > However, on x86, the "lock; or" does issue a full memory barrier. So I
> > think we should introduce a macro that can translate into a memory
> > barrier on architectures that need it, and to nothing on x86.
> >
> > Thoughts ?
>
> Makes sense to me!
Allright, committed the following. I end up using
"cmm_smp_mb__before_uatomic_*".
commit 7b783f818175cd92d98f78e761331f306ff406a5
Author: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
Date: Tue May 8 17:12:20 2012 -0400
rculfhash: document implied memory barriers
We choose to provide full memory barriers before and after successful
hash table update operations. Eventually, new API with weaker semantic
can be added, but let's make the basic API as fool-proof as possible.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
Reviewed-by: "Paul E. McKenney" <paulmck at linux.vnet.ibm.com>
commit 196f4fab9bf26c48bc318ac2ff985469c4f62c7e
Author: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
Date: Tue May 8 17:09:46 2012 -0400
rculfhash: Ensure future-proof memory barrier semantic consistency
Use cmm_smp_mb__before_uatomic_or() prior to the uatomic_or() in
_rcu_lfht_del() to ensure correct memory barrier semantic when we relax
(in the future) the barrier implementation of some architectures.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
commit 42e83919d54e7dc45d11b99a957b436403d16b68
Author: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
Date: Tue May 8 17:07:03 2012 -0400
API cleanup: use "uatomic_*" in cmm_smp_mb__ API
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
commit 2812a2d2cfdfeea621768de1a0216bc1549a4902
Author: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
Date: Tue May 8 16:47:28 2012 -0400
uatomic: add memory barrier API for and/or/add/sub/inc/sub
Implement:
cmm_smp_mb__before_and, cmm_smp_mb__after_and
cmm_smp_mb__before_or, cmm_smp_mb__after_or
cmm_smp_mb__before_add, cmm_smp_mb__after_add
cmm_smp_mb__before_sub, cmm_smp_mb__after_sub
cmm_smp_mb__before_inc, cmm_smp_mb__after_inc
cmm_smp_mb__before_dec, cmm_smp_mb__after_dec
For generic and x86.
These currently translate into simple compiler barriers on all
architectures, but the and/or/add/sub/inc/dec uatomics do not provide
memory ordering guarantees (only uatomic_add_return, uatomic_sub_return,
uatomic_xchg, and uatomic_cmpxchg provides full memory barrier
guarantees before and after the atomic operations).
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
Thanks,
Mathieu
>
> Thanx, Paul
>
> > Thanks,
> >
> > Mathieu
> >
> > >
> > > Thanx, Paul
> > >
> > > > Thoughts ?
> > > >
> > > > Thanks,
> > > >
> > > > Mathieu
> > > >
> > > > --
> > > > Mathieu Desnoyers
> > > > Operating System Efficiency R&D Consultant
> > > > EfficiOS Inc.
> > > > http://www.efficios.com
> > > >
> > > > _______________________________________________
> > > > rp mailing list
> > > > rp at svcs.cs.pdx.edu
> > > > http://svcs.cs.pdx.edu/mailman/listinfo/rp
> > > >
> > >
> > >
> > > _______________________________________________
> > > lttng-dev mailing list
> > > lttng-dev at lists.lttng.org
> > > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
> >
> > --
> > Mathieu Desnoyers
> > Operating System Efficiency R&D Consultant
> > EfficiOS Inc.
> > http://www.efficios.com
> >
>
>
> _______________________________________________
> lttng-dev mailing list
> lttng-dev at lists.lttng.org
> http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
More information about the lttng-dev
mailing list