[lttng-dev] [rp] [RFC] Readiness for URCU release with RCU lock-free hash table

Paul E. McKenney paulmck at linux.vnet.ibm.com
Tue May 1 15:47:14 EDT 2012


On Tue, May 01, 2012 at 01:41:36PM -0400, Mathieu Desnoyers wrote:
> * Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote:
> > On Tue, May 01, 2012 at 11:21:44AM -0400, Mathieu Desnoyers wrote:
> > > * Mathieu Desnoyers (mathieu.desnoyers at efficios.com) wrote:
> > > > * Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote:
> > > > > On Tue, May 01, 2012 at 10:16:09AM -0400, Mathieu Desnoyers wrote:
> > > > > > Hi!
> > > > > > 
> > > > > > After 1 year of development, with the last 6-7 months spent polishing
> > > > > > the API and testing the implementation, I think it is getting about time
> > > > > > to release the RCU lock-free hash table in a new Userspace RCU version
> > > > > > (0.7).
> > > > > > 
> > > > > > I recently described the guarantees provided by the hash table in more
> > > > > > detail, and created tests for the uniqueness guarantee for traversals
> > > > > > performed concurrently with add_unique and add_replace operations. I
> > > > > > also added test modes that create long hash chains, to test corner-cases
> > > > > > of the hash table.
> > > > > > 
> > > > > > One small thing I wonder is whether we should document that the hash
> > > > > > table update operations imply full memory barriers?
> > > > > 
> > > > > The Linux kernel's rule seems good here -- if a hash-table operation is
> > > > > atomic and returns a value, it should imply a full barrier.  So:
> > > > > 
> > > > > cds_lfht_new(): No point in claiming barriers -- publishing the
> > > > > 	pointer to the hash table is where the barriers are important.
> > > > > 
> > > > > cds_lfht_destroy(): Ditto.
> > > > > 
> > > > > cds_lfht_lookup(): Not an update (let alone an atomic update), no barriers.
> > > > > 
> > > > > cds_lfht_next_duplicate(): Ditto.
> > > > > 
> > > > > cds_lfht_first(): Ditto.
> > > > > 
> > > > > cds_lfht_next(): Ditto.
> > > > > 
> > > > > cds_lfht_add(): Atomic update, but no return value, so no barrier
> > > > > 	implied.
> > > > 
> > > > Yep, makes sense. We use cmpxchg internally to perform the update, but
> > > > it could make sense to eventually use a cmpxchg that has no memory
> > > > barriers to perform this update. So I agree on not providing a memory
> > > > barrier guarantee on the "add" operation, since it does not return any
> > > > value.
> > > > 
> > > > > 
> > > > > cds_lfht_add_unique(): Atomic update that returns a value, so should
> > > > > 	imply a full memory barrier.
> > > > 
> > > > add_unique is a bit special:
> > > > 
> > > > - if it returns the node received as parameter, it means the add
> > > >   succeeded, which implies an update, and thus a memory barrier.
> > > 
> > > Hrm, thinking further: if we make the "add" operation not act as a
> > > full memory barrier, then a successful add_unique should not act as a
> > > full mb either.
> > 
> > Think of it as being similar to the Linux kernel's atomic_inc() and
> > atomic_add_return() primitives.  The latter guarantees memory barriers
> > and the former does not.
> 
> I think add/add_unique vs atomic_inc/atomic_add_return are fundamentally
> different.
> 
> atomic_inc:
>  - input: increment
>  - output: none
>  - effect: increment target address value
> 
> atomic_add_return:
>  - input: increment
>  - output: incremented value
>  - effect: increment target address value, atomically reading the
>    resulting value.
> 
> hash table add:
>  - input: new node to add
>  - output: none
>  - effect: atomically add the new node into the table
> 
> hash table add_unique (success):
>  - input: new node to add
>  - output: (we just return whether the operation has succeeded)
>  - effect: atomically add the new node into the table
> 
> hash table add_unique (failure):
>  - input: new node to try adding
>  - output: (we just return whether the operation has succeeded)
>  - effect: simple lookup (read)
> 
> So as we see, the add_unique failure only performs a read. Adding a
> memory barrier before this read would require us to add a memory barrier
> also on the success path, which would degrade performance. The success
> path does: lookup failure, cmpxchg to add the node, retry if changed.
> Adding a memory barrier before the lookup would add an extra memory
> barrier in addition to the one located in the cmpxchg, and I don't think
> we want that overhead.

Perhaps a better analogy is cmpxchg() in the Linux kernel.  Some
architectures place a memory barrier unconditionally before the
operation, but place one after only if the cmpxchg() succeeds.  Of
course, architectures on which the cmpxchg instruction itself implies a
barrier need no explicit memory-barrier instructions.

I do see your point about the memory barrier in the failure case requiring
an additional memory barrier in the success case, however.

> > > > - if it returns a different node than the one received as parameter, it
> > > >   failed, which means it only performed a lookup, so there is no
> > > >   guarantee that a memory barrier has been executed.
> > > > 
> > > > 
> > > > > 
> > > > > cds_lfht_add_replace(): Ditto.
> > > > > 
> > > > > cds_lfht_replace(): Ditto.
> > > > > 
> > > > > cds_lfht_del(): Ditto.
> > > 
> > > One more point: "del" is similar to add_unique: if it succeeds, we
> > > execute a full memory barrier. If it fails, no memory barrier is
> > > guaranteed. But if we want to relax the guarantees, we might not
> > > want to guarantee that a memory barrier is present in any of the
> > > "del" cases.
> > > 
> > > In the end, the only primitives for which I think it really makes
> > > sense to provide memory-barrier semantics are add_replace and
> > > replace: they actually act as a higher-level "cmpxchg" over the
> > > hash table nodes.
> > 
> > I believe that this should do the same -- memory barrier before, but no
> > memory barrier after on failure.
> > 
> > Another approach is C++11, in which there are a couple of arguments to
> > the compare-and-swap primitive specifying the memory ordering constraints
> > for the success and failure cases, respectively.
> 
> Hrm, I think adding this kind of flexibility might clutter the API with
> details that "usual cases" don't care about.

;-)

> > Unless you expect use cases with lots of failing cds_lfht_del() and
> > add_unique() calls, the performance difference should not be significant.
> 
> The problem is that I expect very, very frequent failing add_unique
> calls for use-cases like Binary Decision Diagrams (BDD) creation. This
> is why I don't think it is appropriate to put memory barriers on the
> failure cases, as these should have no more overhead than a simple
> lookup.
> 
> And if we choose not to provide memory barriers for add_unique failure,
> we should probably do the same for "del" to keep a consistent behavior
> over the API.
> 
> Thoughts?

Given your BDD example, letting the failure cases avoid memory barriers
seems reasonable.  If problems arise, the default interfaces can be made
fully ordered, with faster variants provided whose failure cases avoid
memory barriers.

							Thanx, Paul

> Thanks,
> 
> Mathieu
> 
> 
> > 
> > 							Thanx, Paul
> > 
> > > Thoughts?
> > > 
> > > Thanks,
> > > 
> > > Mathieu
> > > 
> > > > 
> > > > Yep, makes sense.
> > > > 
> > > > I'll add this documentation in the API.
> > > > 
> > > > Thanks!
> > > > 
> > > > Mathieu
> > > > 
> > > > > 
> > > > > cds_lfht_is_node_deleted(): Not an update (let alone an atomic update),
> > > > > 	no barriers.
> > > > > 
> > > > > cds_lfht_resize(): Atomic update, but no return value, so no barrier
> > > > > 	implied.
> > > > > 
> > > > > cds_lfht_for_each(): Not an update (let alone an atomic update),
> > > > > 	no barriers.
> > > > > 
> > > > > cds_lfht_for_each_duplicate(): Ditto.
> > > > > 
> > > > > cds_lfht_for_each_entry(): Ditto.
> > > > > 
> > > > > cds_lfht_for_each_entry_duplicate(): Ditto.
> > > > > 
> > > > > 							Thanx, Paul
> > > > > 
> > > > > > I'm personally getting confident that the hash table API is clean
> > > > > > enough, and the implementation well tested, but I'd like to have your
> > > > > > thoughts on the readiness of the hash table for production use.
> > > > > > 
> > > > > > Thanks,
> > > > > > 
> > > > > > Mathieu
> > > > > > 
> > > > > > -- 
> > > > > > Mathieu Desnoyers
> > > > > > Operating System Efficiency R&D Consultant
> > > > > > EfficiOS Inc.
> > > > > > http://www.efficios.com
> > > > > > 
> > > > > > _______________________________________________
> > > > > > rp mailing list
> > > > > > rp at svcs.cs.pdx.edu
> > > > > > http://svcs.cs.pdx.edu/mailman/listinfo/rp
> > > > > > 
> > > > > 
> > > > > 
> > > > > _______________________________________________
> > > > > lttng-dev mailing list
> > > > > lttng-dev at lists.lttng.org
> > > > > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
> > > > 
> > > 
> > 
> 



