[lttng-dev] [RFC] Readiness for URCU release with RCU lock-free hash table

Tue May 1 13:41:36 EDT 2012

* Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote:
> On Tue, May 01, 2012 at 11:21:44AM -0400, Mathieu Desnoyers wrote:
> > * Mathieu Desnoyers (mathieu.desnoyers at efficios.com) wrote:
> > > * Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote:
> > > > On Tue, May 01, 2012 at 10:16:09AM -0400, Mathieu Desnoyers wrote:
> > > > > Hi!
> > > > > 
> > > > > After 1 year of development, with the last 6-7 months spent polishing
> > > > > the API and testing the implementation, I think it is getting about time
> > > > > to release the RCU lock-free hash table in a new Userspace RCU version
> > > > > (0.7).
> > > > > 
> > > > > I recently described the guarantees provided by the hash table in more
> > > > > detail, and created tests for the uniqueness guarantee for traversals
> > > > > performed concurrently with add_unique and add_replace operations. I
> > > > > also added test modes that create long hash chains, to test corner-cases
> > > > > of the hash table.
> > > > > 
> > > > > One small thing I wonder is whether we should document that the hash
> > > > > table update operations imply full memory barriers ?
> > > > 
> > > > The Linux kernel's rule seems good here -- if a hash-table operation is
> > > > atomic and returns a value, it should imply a full barrier.  So:
> > > > 
> > > > cds_lfht_new(): No point in claiming barriers -- publishing the
> > > > 	pointer to the hash table is where the barriers are important.
> > > > 
> > > > cds_lfht_destroy(): Ditto.
> > > > 
> > > > cds_lfht_lookup(): Not an update (let alone an atomic update), no barriers.
> > > > 
> > > > cds_lfht_next_duplicate(): Ditto.
> > > > 
> > > > cds_lfht_first(): Ditto.
> > > > 
> > > > cds_lfht_next(): Ditto.
> > > > 
> > > > cds_lfht_add(): Atomic update, but no return value, so no barrier
> > > > 	implied.
> > > 
> > > Yep, makes sense. We use cmpxchg internally to perform the update, but
> > > it could make sense to eventually use a cmpxchg that has no memory
> > > barriers to perform this update. So I agree on not providing a memory
> > > barrier guarantee on the "add" operation, since it does not return any
> > > value.
> > > 
> > > > 
> > > > cds_lfht_add_unique(): Atomic update that returns a value, so should
> > > > 	imply a full memory barrier.
> > > 
> > > add_unique is a bit special:
> > > 
> > > - if it returns the node received as parameter, it means the add
> > >   succeeded, which imply an update, and thus a memory barrier.
> > 
> > Hrm, thinking further: if we make the "add" operation not act as a
> > full memory barrier, then the add_unique success should not act as a
> > full mb neither.
> 
> Think of it as being similar to the Linux kernel's atomic_inc() and
> atomic_add_return() primitives.  The latter guarantees memory barriers
> and the former does not.

I think add/add_unique vs atomic_inc/atomic_add_return are fundamentally
different.

atomic_inc:
 - input: increment
 - output: none
 - effect: increment target address value

atomic_add_return:
 - input: increment
 - output: incremented value
 - effect: increment target address value, atomically reading the
   resulting value.

hash table add:
 - input: new node to add
 - output: none
 - effect: atomically add the new node into the table

hash table add_unique (success):
 - input: new node to add
 - output: (we just return whether the operation has succeeded)
 - effect: atomically add the new node into the table

hash table add_unique (failure):
 - input: new node to try adding
 - output: (we just return whether the operation has succeeded)
 - effect: simple lookup (read)

So as we see, the add_unique failure only performs a read. Adding a
memory barrier before this read would require us to add a memory barrier
also on the success path, which would degrade performance. The success
path does: lookup failure, cmpxchg to add the node, retry if changed.
Adding a memory barrier before the lookup would add an extra memory
barrier in addition to the one located in the cmpxchg, and I don't think
we want that overhead.

> 
> > > - if it returns a different node than the one received as parameter, it
> > >   failed, and thus means that it only performed a lookup, so there is no
> > >   guarantee that a memory barrier has been executed.
> > > 
> > > 
> > > > 
> > > > cds_lfht_add_replace(): Ditto.
> > > > 
> > > > cds_lfht_replace(): Ditto.
> > > > 
> > > > cds_lfht_del(): Ditto.
> > 
> > One more point: "del" is similar to add_unique: if it succeeds, we
> > execute a full memory barrier. If it fails, no memory barrier is
> > guaranteed. But if we want to make the guarantees relax, we might not
> > want to guarantee that a memory barrier is present in any of the "del"
> > cases.
> > 
> > In the end, the only primitives for which I think it really makes sense
> > to provide memory barrier semantic is the add_replace and replace : they
> > actually act as an higher-level "cmpxchg" over the hash table nodes.
> 
> I believe that this should do the same -- memory barrier before, but no
> memory barrier after on failure.
> 
> Another approach is C++11, in which there are a couple of arguments to
> the compare-and-swap primitive specifying the memory ordering constraints
> for the success and failure cases, respectively.

Hrm, I think adding this kind of flexibility might make the API too
clobbered by details that "usual cases" don't care about.

> 
> Unless you expect use cases with lots of failing cds_lfht_del() and
> add_unique() calls, the performance difference should not be significant.

The problem is that I expect very, very frequent failing add_unique
calls for use-cases like Binary Decision Diagrams (BDD) creation. This
is why I don't think it is appropriate to put memory barriers on the
failure cases, as these should have no more overhead than a simple
lookup.

And if we choose not to provide memory barriers for add_unique failure,
we should probably do the same for "del" to keep a consistent behavior
over the API.

Thoughts ?

Thanks,

Mathieu

> 
> 							Thanx, Paul
> 
> > Thoughts ?
> > 
> > Thanks,
> > 
> > Mathieu
> > 
> > > 
> > > Yep, makes sense.
> > > 
> > > I'll add this documentation in the API.
> > > 
> > > Thanks!
> > > 
> > > Mathieu
> > > 
> > > > 
> > > > cds_lfht_is_node_deleted(): Not an update (let alone an atomic update),
> > > > 	no barriers.
> > > > 
> > > > cds_lfht_resize(): Atomic update, but no return value, so no barrier
> > > > 	implied.
> > > > 
> > > > cds_lfht_for_each(): Not an update (let alone an atomic update),
> > > > 	no barriers.
> > > > 
> > > > cds_lfht_for_each_duplicate(): Ditto.
> > > > 
> > > > cds_lfht_for_each_entry(): Ditto.
> > > > 
> > > > cds_lfht_for_each_entry_duplicate(): Ditto.
> > > > 
> > > > 							Thanx, Paul
> > > > 
> > > > > I'm personally getting confident that the hash table API is clean
> > > > > enough, and the implementation well tested, but I'd like to have your
> > > > > thoughts on the readiness of the hash table for production use.
> > > > > 
> > > > > Thanks,
> > > > > 
> > > > > Mathieu
> > > > > 
> > > > > -- 
> > > > > Mathieu Desnoyers
> > > > > Operating System Efficiency R&D Consultant
> > > > > EfficiOS Inc.
> > > > > http://www.efficios.com
> > > > > 
> > > > > _______________________________________________
> > > > > rp mailing list
> > > > > rp at svcs.cs.pdx.edu
> > > > > http://svcs.cs.pdx.edu/mailman/listinfo/rp
> > > > > 
> > > > 
> > > > 
> > > > _______________________________________________
> > > > lttng-dev mailing list
> > > > lttng-dev at lists.lttng.org
> > > > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
> > > 
> > > -- 
> > > Mathieu Desnoyers
> > > Operating System Efficiency R&D Consultant
> > > EfficiOS Inc.
> > > http://www.efficios.com
> > > 
> > > _______________________________________________
> > > lttng-dev mailing list
> > > lttng-dev at lists.lttng.org
> > > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
> > 
> > -- 
> > Mathieu Desnoyers
> > Operating System Efficiency R&D Consultant
> > EfficiOS Inc.
> > http://www.efficios.com
> > 
> > _______________________________________________
> > rp mailing list
> > rp at svcs.cs.pdx.edu
> > http://svcs.cs.pdx.edu/mailman/listinfo/rp
> > 
> 

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com