[lttng-dev] [rp] [RFC] Readiness for URCU release with RCU lock-free hash table

Paul E. McKenney paulmck at linux.vnet.ibm.com
Tue May 1 19:50:07 EDT 2012


On Tue, May 01, 2012 at 05:02:06PM -0400, Mathieu Desnoyers wrote:
> * Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote:
> > On Tue, May 01, 2012 at 01:41:36PM -0400, Mathieu Desnoyers wrote:
> > > * Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote:
> > > > On Tue, May 01, 2012 at 11:21:44AM -0400, Mathieu Desnoyers wrote:
> > > > > * Mathieu Desnoyers (mathieu.desnoyers at efficios.com) wrote:
> > > > > > * Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote:
> > > > > > > On Tue, May 01, 2012 at 10:16:09AM -0400, Mathieu Desnoyers wrote:
> > > > > > > > Hi!
> > > > > > > > 
> > > > > > > > After 1 year of development, with the last 6-7 months spent polishing
> > > > > > > > the API and testing the implementation, I think it is getting about time
> > > > > > > > to release the RCU lock-free hash table in a new Userspace RCU version
> > > > > > > > (0.7).
> > > > > > > > 
> > > > > > > > I recently described the guarantees provided by the hash table in more
> > > > > > > > detail, and created tests for the uniqueness guarantee for traversals
> > > > > > > > performed concurrently with add_unique and add_replace operations. I
> > > > > > > > also added test modes that create long hash chains, to test corner-cases
> > > > > > > > of the hash table.
> > > > > > > > 
> > > > > > > > One small thing I wonder is whether we should document that the hash
> > > > > > > > table update operations imply full memory barriers?
> > > > > > > 
> > > > > > > The Linux kernel's rule seems good here -- if a hash-table operation is
> > > > > > > atomic and returns a value, it should imply a full barrier.  So:
> > > > > > > 
> > > > > > > cds_lfht_new(): No point in claiming barriers -- publishing the
> > > > > > > 	pointer to the hash table is where the barriers are important.
> > > > > > > 
> > > > > > > cds_lfht_destroy(): Ditto.
> > > > > > > 
> > > > > > > cds_lfht_lookup(): Not an update (let alone an atomic update), no barriers.
> > > > > > > 
> > > > > > > cds_lfht_next_duplicate(): Ditto.
> > > > > > > 
> > > > > > > cds_lfht_first(): Ditto.
> > > > > > > 
> > > > > > > cds_lfht_next(): Ditto.
> > > > > > > 
> > > > > > > cds_lfht_add(): Atomic update, but no return value, so no barrier
> > > > > > > 	implied.
> > > > > > 
> > > > > > Yep, makes sense. We use cmpxchg internally to perform the update, but
> > > > > > it could make sense to eventually use a cmpxchg that has no memory
> > > > > > barriers to perform this update. So I agree on not providing a memory
> > > > > > barrier guarantee on the "add" operation, since it does not return any
> > > > > > value.
> > > > > > 
> > > > > > > 
> > > > > > > cds_lfht_add_unique(): Atomic update that returns a value, so should
> > > > > > > 	imply a full memory barrier.
> > > > > > 
> > > > > > add_unique is a bit special:
> > > > > > 
> > > > > > - if it returns the node received as parameter, it means the add
> > > > > >   succeeded, which implies an update, and thus a memory barrier.
> > > > > 
> > > > > Hrm, thinking further: if we make the "add" operation not act as a
> > > > > full memory barrier, then a successful add_unique should not act as
> > > > > a full memory barrier either.
> > > > 
> > > > Think of it as being similar to the Linux kernel's atomic_inc() and
> > > > atomic_add_return() primitives.  The latter guarantees memory barriers
> > > > and the former does not.
> > > 
> > > I think add/add_unique vs atomic_inc/atomic_add_return are fundamentally
> > > different.
> > > 
> > > atomic_inc:
> > >  - input: increment
> > >  - output: none
> > >  - effect: increment target address value
> > > 
> > > atomic_add_return:
> > >  - input: increment
> > >  - output: incremented value
> > >  - effect: increment target address value, atomically reading the
> > >    resulting value.
> > > 
> > > hash table add:
> > >  - input: new node to add
> > >  - output: none
> > >  - effect: atomically add the new node into the table
> > > 
> > > hash table add_unique (success):
> > >  - input: new node to add
> > >  - output: (we just return whether the operation has succeeded)
> > >  - effect: atomically add the new node into the table
> > > 
> > > hash table add_unique (failure):
> > >  - input: new node to try adding
> > >  - output: (we just return whether the operation has succeeded)
> > >  - effect: simple lookup (read)
> > > 
> > > So as we see, the add_unique failure only performs a read. Adding a
> > > memory barrier before this read would require us to add a memory barrier
> > > also on the success path, which would degrade performance. The success
> > > path does: lookup failure, cmpxchg to add the node, retry if changed.
> > > Adding a memory barrier before the lookup would add an extra memory
> > > barrier in addition to the one located in the cmpxchg, and I don't think
> > > we want that overhead.
> > 
> > Perhaps a better analogy is cmpxchg() in the Linux kernel.  Some
> > architectures place the memory barrier before unconditionally, but place
> > a memory barrier after only if the cmpxchg() succeeds.  Of course,
> > architectures for which the cmpxchg instruction implies a barrier
> > don't need any explicit memory-barrier instructions.
> > 
> > I do see your point about the memory barrier in the failure case requiring
> > an additional memory barrier in the success case, however.
> > 
> > > > > > - if it returns a different node than the one received as parameter, it
> > > > > >   failed, and thus means that it only performed a lookup, so there is no
> > > > > >   guarantee that a memory barrier has been executed.
> > > > > > 
> > > > > > 
> > > > > > > 
> > > > > > > cds_lfht_add_replace(): Ditto.
> > > > > > > 
> > > > > > > cds_lfht_replace(): Ditto.
> > > > > > > 
> > > > > > > cds_lfht_del(): Ditto.
> > > > > 
> > > > > One more point: "del" is similar to add_unique: if it succeeds, we
> > > > > execute a full memory barrier. If it fails, no memory barrier is
> > > > > guaranteed. But if we want to relax the guarantees, we might not
> > > > > want to guarantee that a memory barrier is present in any of the "del"
> > > > > cases.
> > > > > 
> > > > > In the end, the only primitives for which I think it really makes
> > > > > sense to provide memory-barrier semantics are add_replace and
> > > > > replace: they actually act as a higher-level "cmpxchg" over the
> > > > > hash table nodes.
> > > > 
> > > > I believe that this should do the same -- memory barrier before, but no
> > > > memory barrier after on failure.
> > > > 
> > > > Another approach is C++11, in which there are a couple of arguments to
> > > > the compare-and-swap primitive specifying the memory ordering constraints
> > > > for the success and failure cases, respectively.
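The C++11 mechanism Paul mentions also exists in C11: atomic_compare_exchange_strong_explicit takes one memory ordering for the success case and a (necessarily no stronger) one for the failure case. A minimal illustration, with `cas_int` as a hypothetical wrapper:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* C11 mirrors the C++11 API: the 4th argument is the ordering
 * applied when the compare-and-swap succeeds, the 5th the weaker
 * ordering applied when it fails (a plain read of the current
 * value).  Hypothetical wrapper for illustration only. */
static bool cas_int(_Atomic int *v, int *expected, int desired)
{
	return atomic_compare_exchange_strong_explicit(
			v, expected, desired,
			memory_order_acq_rel,   /* ordering on success */
			memory_order_relaxed);  /* ordering on failure */
}
```

On failure, the primitive stores the value it actually observed back into *expected, so the caller can retry without an extra load.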
> > > 
> > > Hrm, I think adding this kind of flexibility might clutter the API
> > > with details that "usual cases" don't care about.
> > 
> > ;-)
> > 
> > > > Unless you expect use cases with lots of failing cds_lfht_del() and
> > > > add_unique() calls, the performance difference should not be significant.
> > > 
> > > The problem is that I expect very, very frequent failing add_unique
> > > calls for use-cases like Binary Decision Diagrams (BDD) creation. This
> > > is why I don't think it is appropriate to put memory barriers on the
> > > failure cases, as these should have no more overhead than a simple
> > > lookup.
> > > 
> > > And if we choose not to provide memory barriers for add_unique failure,
> > > we should probably do the same for "del" to keep a consistent behavior
> > > over the API.
> > > 
> > > Thoughts ?
> > 
> > Given your BDD example, letting the failure cases avoid memory barriers
> > seems reasonable.  If problems arise, the default interfaces can be made
> > fully ordered, with faster variants whose failure cases avoid memory
> > barriers.
> 
> I think we agree on the failure cases of del and add_unique, which is
> good.
> 
> Now, I'm thinking slightly further about the success cases of add,
> add_unique and del.
> 
> Typically, when giving a node to add or add_unique for population into
> the hash table, we have to order the writes that initialize the node
> before its insertion into the hash table. So I think it would make sense
> to guarantee that we provide a memory barrier prior to add and
> add_unique (success) operations.
> 
> For the "del" case, it's the other way around: if a deletion succeeds,
> we want to have a memory barrier between the removal flag write and the
> following "free()" of the element. So we need to ensure that there is a
> memory barrier after a successful deletion.
> 
> If we think about it, "add_replace" acts as both an "add" and a "del":
> in all cases, we need a memory barrier before the add_replace, because
> we need to order the node initialization before the atomic insertion
> into the hash table. We also need a memory barrier after, but only in
> the case where add_replace replaced an old node, thus acting as a
> delete of that old node.
> 
> Considering all this, the barrier semantic I'm proposing is:
> 
> add:
> - full memory barrier before the insertion.
> 
> add_unique:
> - success: full memory barrier before the insertion.
> - failure: no memory barrier.
> 
> replace:
> - success: full memory barrier before the insertion, and after the
>   removal of the old node.
> - failure: no memory barrier.
> 
> add_replace:
> - returns an old node (replace): full memory barrier before the
>   insertion, and after the removal of the old node.
> - returns NULL (add): full memory barrier before the insertion.
> 
> del:
> - success: full memory barrier after the removal.
> - failure: no memory barrier.
> 
> I think those are the minimal barriers we need to provide to ensure that
> users will not have to add barriers of their own to use the API.
> 
> Thoughts ?
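Mathieu's placement rule can be illustrated with a toy publish/retire pair. `toy_add` and `toy_del` are hypothetical stand-ins built on plain C11 atomics, not the urcu hash-table code: the insert-side barrier sits before publication (ordering the node-field writes before the store that makes the node reachable), and the delete-side barrier sits after unlinking (ordering the removal before any later free()):

```c
#include <stdatomic.h>
#include <stddef.h>

struct node { int key; int val; };

/* A single slot stands in for a hash bucket in this sketch. */
static _Atomic(struct node *) slot;

/* add: initialize the node, full barrier, then publish -- a reader
 * that finds the node is guaranteed to see its initialized fields. */
static void toy_add(struct node *n, int key, int val)
{
	n->key = key;
	n->val = val;
	atomic_thread_fence(memory_order_seq_cst); /* barrier BEFORE insertion */
	atomic_store_explicit(&slot, n, memory_order_relaxed);
}

/* del: unlink, then full barrier -- the barrier orders the removal
 * before the eventual reuse of the memory (the RCU grace period
 * that must precede free() is elided from this sketch). */
static struct node *toy_del(void)
{
	struct node *n = atomic_exchange_explicit(&slot, NULL,
						  memory_order_relaxed);
	if (n)
		atomic_thread_fence(memory_order_seq_cst); /* barrier AFTER removal */
	return n;
}
```

In the real API, this is the pairing that lets a reader trust the fields of any node a traversal returns, and lets the deleter safely free the node once a grace period has elapsed.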

This means that some other CPU might see an add followed by a del as
happening in the opposite order.  Is that really OK?

							Thanx, Paul

> Thanks,
> 
> Mathieu
> 
> 
> > 
> > 							Thanx, Paul
> > 
> > > Thanks,
> > > 
> > > Mathieu
> > > 
> > > 
> > > > 
> > > > 							Thanx, Paul
> > > > 
> > > > > Thoughts ?
> > > > > 
> > > > > Thanks,
> > > > > 
> > > > > Mathieu
> > > > > 
> > > > > > 
> > > > > > Yep, makes sense.
> > > > > > 
> > > > > > I'll add this documentation in the API.
> > > > > > 
> > > > > > Thanks!
> > > > > > 
> > > > > > Mathieu
> > > > > > 
> > > > > > > 
> > > > > > > cds_lfht_is_node_deleted(): Not an update (let alone an atomic update),
> > > > > > > 	no barriers.
> > > > > > > 
> > > > > > > cds_lfht_resize(): Atomic update, but no return value, so no barrier
> > > > > > > 	implied.
> > > > > > > 
> > > > > > > cds_lfht_for_each(): Not an update (let alone an atomic update),
> > > > > > > 	no barriers.
> > > > > > > 
> > > > > > > cds_lfht_for_each_duplicate(): Ditto.
> > > > > > > 
> > > > > > > cds_lfht_for_each_entry(): Ditto.
> > > > > > > 
> > > > > > > cds_lfht_for_each_entry_duplicate(): Ditto.
> > > > > > > 
> > > > > > > 							Thanx, Paul
> > > > > > > 
> > > > > > > > I'm personally getting confident that the hash table API is clean
> > > > > > > > enough, and the implementation well tested, but I'd like to have your
> > > > > > > > thoughts on the readiness of the hash table for production use.
> > > > > > > > 
> > > > > > > > Thanks,
> > > > > > > > 
> > > > > > > > Mathieu
> > > > > > > > 
> > > > > > > > -- 
> > > > > > > > Mathieu Desnoyers
> > > > > > > > Operating System Efficiency R&D Consultant
> > > > > > > > EfficiOS Inc.
> > > > > > > > http://www.efficios.com
> > > > > > > > 
> > > > > > > > _______________________________________________
> > > > > > > > rp mailing list
> > > > > > > > rp at svcs.cs.pdx.edu
> > > > > > > > http://svcs.cs.pdx.edu/mailman/listinfo/rp
> > > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > _______________________________________________
> > > > > > > lttng-dev mailing list
> > > > > > > lttng-dev at lists.lttng.org
> > > > > > > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
> > > > > > 
> > > > > 
> > > > > 
> > > > 
> > > 
> > > 
> > 
> > 
> 
> 



