[lttng-dev] [rp] [RFC] Readiness for URCU release with RCU lock-free hash table

Mathieu Desnoyers mathieu.desnoyers at efficios.com
Tue May 1 17:02:06 EDT 2012


* Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote:
> On Tue, May 01, 2012 at 01:41:36PM -0400, Mathieu Desnoyers wrote:
> > * Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote:
> > > On Tue, May 01, 2012 at 11:21:44AM -0400, Mathieu Desnoyers wrote:
> > > > * Mathieu Desnoyers (mathieu.desnoyers at efficios.com) wrote:
> > > > > * Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote:
> > > > > > On Tue, May 01, 2012 at 10:16:09AM -0400, Mathieu Desnoyers wrote:
> > > > > > > Hi!
> > > > > > > 
> > > > > > > After 1 year of development, with the last 6-7 months spent polishing
> > > > > > > the API and testing the implementation, I think it is getting about time
> > > > > > > to release the RCU lock-free hash table in a new Userspace RCU version
> > > > > > > (0.7).
> > > > > > > 
> > > > > > > I recently described the guarantees provided by the hash table in more
> > > > > > > detail, and created tests for the uniqueness guarantee for traversals
> > > > > > > performed concurrently with add_unique and add_replace operations. I
> > > > > > > also added test modes that create long hash chains, to test corner-cases
> > > > > > > of the hash table.
> > > > > > > 
> > > > > > > One small thing I wonder is whether we should document that the hash
> > > > > > > table update operations imply full memory barriers ?
> > > > > > 
> > > > > > The Linux kernel's rule seems good here -- if a hash-table operation is
> > > > > > atomic and returns a value, it should imply a full barrier.  So:
> > > > > > 
> > > > > > cds_lfht_new(): No point in claiming barriers -- publishing the
> > > > > > 	pointer to the hash table is where the barriers are important.
> > > > > > 
> > > > > > cds_lfht_destroy(): Ditto.
> > > > > > 
> > > > > > cds_lfht_lookup(): Not an update (let alone an atomic update), no barriers.
> > > > > > 
> > > > > > cds_lfht_next_duplicate(): Ditto.
> > > > > > 
> > > > > > cds_lfht_first(): Ditto.
> > > > > > 
> > > > > > cds_lfht_next(): Ditto.
> > > > > > 
> > > > > > cds_lfht_add(): Atomic update, but no return value, so no barrier
> > > > > > 	implied.
> > > > > 
> > > > > Yep, makes sense. We use cmpxchg internally to perform the update, but
> > > > > it could make sense to eventually use a cmpxchg that has no memory
> > > > > barriers to perform this update. So I agree on not providing a memory
> > > > > barrier guarantee on the "add" operation, since it does not return any
> > > > > value.
> > > > > 
> > > > > > 
> > > > > > cds_lfht_add_unique(): Atomic update that returns a value, so should
> > > > > > 	imply a full memory barrier.
> > > > > 
> > > > > add_unique is a bit special:
> > > > > 
> > > > > - if it returns the node received as parameter, it means the add
> > > > >   succeeded, which implies an update, and thus a memory barrier.
> > > > 
> > > > Hrm, thinking further: if we make the "add" operation not act as a
> > > > full memory barrier, then the add_unique success should not act as a
> > > > full mb either.
> > > 
> > > Think of it as being similar to the Linux kernel's atomic_inc() and
> > > atomic_add_return() primitives.  The latter guarantees memory barriers
> > > and the former does not.
> > 
> > I think add/add_unique vs atomic_inc/atomic_add_return are fundamentally
> > different.
> > 
> > atomic_inc:
> >  - input: increment
> >  - output: none
> >  - effect: increment target address value
> > 
> > atomic_add_return:
> >  - input: increment
> >  - output: incremented value
> >  - effect: increment target address value, atomically reading the
> >    resulting value.
> > 
> > hash table add:
> >  - input: new node to add
> >  - output: none
> >  - effect: atomically add the new node into the table
> > 
> > hash table add_unique (success):
> >  - input: new node to add
> >  - output: (we just return whether the operation has succeeded)
> >  - effect: atomically add the new node into the table
> > 
> > hash table add_unique (failure):
> >  - input: new node to try adding
> >  - output: (we just return whether the operation has succeeded)
> >  - effect: simple lookup (read)
> > 
> > So as we see, the add_unique failure only performs a read. Adding a
> > memory barrier before this read would require us to also add a memory
> > barrier on the success path, which would degrade performance. The success
> > path does: lookup failure, cmpxchg to add the node, retry if changed.
> > Adding a memory barrier before the lookup would add an extra memory
> > barrier in addition to the one located in the cmpxchg, and I don't think
> > we want that overhead.
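
To illustrate the shape of that path, here is a simplified, hypothetical
sketch on a plain linked list (not the hash table code): the failure path
is a pure read, and the success path gets its only barrier from the
cmpxchg.

#include <urcu/uatomic.h>
#include <urcu/system.h>

struct item {
        int key;
        struct item *next;
};

/*
 * Illustrative insert-if-absent at the head of a singly-linked list,
 * insert-only (no removal). Mirrors the "lookup failure, cmpxchg to
 * add the node, retry if changed" pattern described above.
 */
static struct item *insert_unique(struct item **head, struct item *node)
{
        for (;;) {
                struct item *first = uatomic_read(head), *iter;

                for (iter = first; iter; iter = CMM_LOAD_SHARED(iter->next)) {
                        if (iter->key == node->key)
                                return iter;    /* failure path: reads only */
                }
                node->next = first;
                if (uatomic_cmpxchg(head, first, node) == first)
                        return node;    /* success: barrier implied by cmpxchg */
                /* head changed under us: retry */
        }
}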
> 
> Perhaps a better analogy is cmpxchg() in the Linux kernel.  Some
> architectures place the memory barrier before unconditionally, but place
> a memory barrier after only if the cmpxchg() succeeds.  Of course,
> architectures for which the cmpxchg instruction implies a barrier
> don't need any explicit memory-barrier instructions.
> 
> I do see your point about the memory barrier in the failure case requiring
> an additional memory barrier in the success case, however.
> 
> > > > > - if it returns a different node than the one received as parameter, it
> > > > >   failed, which means it only performed a lookup, so there is no
> > > > >   guarantee that a memory barrier has been executed.
> > > > > 
> > > > > 
> > > > > > 
> > > > > > cds_lfht_add_replace(): Ditto.
> > > > > > 
> > > > > > cds_lfht_replace(): Ditto.
> > > > > > 
> > > > > > cds_lfht_del(): Ditto.
> > > > 
> > > > One more point: "del" is similar to add_unique: if it succeeds, we
> > > > execute a full memory barrier. If it fails, no memory barrier is
> > > > guaranteed. But if we want to relax the guarantees, we might not want
> > > > to guarantee that a memory barrier is present in any of the "del"
> > > > cases.
> > > > 
> > > > In the end, the only primitives for which I think it really makes sense
> > > > to provide memory barrier semantics are add_replace and replace: they
> > > > actually act as a higher-level "cmpxchg" over the hash table nodes.
> > > 
> > > I believe that this should do the same -- memory barrier before, but no
> > > memory barrier after on failure.
> > > 
> > > Another approach is C++11, in which there are a couple of arguments to
> > > the compare-and-swap primitive specifying the memory ordering constraints
> > > for the success and failure cases, respectively.
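
For reference, in C11's <stdatomic.h>, which mirrors the C++11 interface,
this looks roughly as follows (names are only for illustration):

#include <stdatomic.h>
#include <stdbool.h>

static bool try_publish(_Atomic(void *) *slot, void *expected, void *newval)
{
        /*
         * Success and failure orderings are given separately:
         * acquire/release ordering only if the CAS succeeds, relaxed
         * (no ordering) if it fails.
         */
        return atomic_compare_exchange_strong_explicit(slot, &expected,
                        newval, memory_order_acq_rel, memory_order_relaxed);
}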
> > 
> > Hrm, I think adding this kind of flexibility might clutter the API with
> > details that "usual cases" don't care about.
> 
> ;-)
> 
> > > Unless you expect use cases with lots of failing cds_lfht_del() and
> > > add_unique() calls, the performance difference should not be significant.
> > 
> > The problem is that I expect very, very frequent failing add_unique
> > calls for use cases like Binary Decision Diagram (BDD) creation. This
> > is why I don't think it is appropriate to put memory barriers on the
> > failure cases, as these should have no more overhead than a simple
> > lookup.
> > 
> > And if we choose not to provide memory barriers for add_unique failure,
> > we should probably do the same for "del" to keep consistent behavior
> > across the API.
> > 
> > Thoughts ?
> 
> Given your BDD example, letting the failure cases avoid memory barriers
> seems reasonable.  If problems arise, the default interfaces can be made
> fully ordered, with faster variants provided whose failure cases avoid
> memory barriers.

I think we agree on the failure cases of del and add_unique, which is
good.

Now, I'm thinking slightly further about the success cases of add,
add_unique and del.

Typically, when handing a node to add or add_unique for insertion into
the hash table, we have to order the writes that initialize the node
before its insertion into the table. So I think it would make sense to
guarantee a memory barrier prior to the add and add_unique (success)
operations.
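
From a user's perspective, this guarantee means the usual
initialize-then-add pattern needs no explicit barrier. A minimal sketch of
what I have in mind (the entry structure, hash function and allocation
handling below are placeholders, not part of the API, and the calling
thread is assumed to be a registered RCU reader):

#include <stdlib.h>
#include <urcu.h>
#include <urcu/rculfhash.h>

struct entry {
        int key;
        int value;
        struct cds_lfht_node node;      /* linkage into the hash table */
        struct rcu_head rcu_head;       /* for deferred reclaim (see below) */
};

/* Placeholder hash function for the sketch. */
static unsigned long hash_key(int key)
{
        return (unsigned long) key * 2654435761UL;
}

static void insert_entry(struct cds_lfht *ht, int key, int value)
{
        struct entry *e = malloc(sizeof(*e));

        /* Plain initializing stores... */
        e->key = key;
        e->value = value;
        cds_lfht_node_init(&e->node);

        rcu_read_lock();
        /*
         * ...ordered before publication by the full barrier provided
         * prior to the add, so concurrent readers never see an
         * uninitialized node.
         */
        cds_lfht_add(ht, hash_key(key), &e->node);
        rcu_read_unlock();
}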

For the "del" case, it's the other way around: if a deletion succeeds,
we want to have a memory barrier between the removal flag write and the
following "free()" of the element. So we need to ensure that there is a
memory barrier after a successful deletion.
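
Continuing the sketch above for the del side: a successful del followed by
deferred reclaim would then need no barrier from the caller. The RCU grace
period (via liburcu's call_rcu facility) is of course still required
before the memory is actually freed; the barrier only orders the logical
removal before whatever reclaim comes after it:

/* Reclaim callback, invoked after a grace period. */
static void free_entry_rcu(struct rcu_head *head)
{
        struct entry *e = caa_container_of(head, struct entry, rcu_head);

        free(e);
}

static void remove_entry(struct cds_lfht *ht, struct entry *e)
{
        int ret;

        rcu_read_lock();
        ret = cds_lfht_del(ht, &e->node);
        rcu_read_unlock();
        if (!ret) {
                /*
                 * Successful removal: the full barrier after the removal
                 * orders the logical delete before the deferred free.
                 */
                call_rcu(&e->rcu_head, free_entry_rcu);
        }
}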

If we think about it, "add_replace" acts both as an "add" and as a
"del". In all cases, we need a memory barrier before the add_replace,
because we need to order the node initialization before the atomic
insertion into the hash table. We also need a memory barrier after it,
but only really in the case where add_replace replaced an old node, thus
acting as a delete of that old node.
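
Reusing the placeholder helpers from the sketches above (plus a
hypothetical match callback), an add_replace user would rely on the
barrier before the insertion for the new node's initialization, and on the
barrier after the removal when deferring the free of a replaced node:

/* Match callback: compare a hash table node against a lookup key. */
static int match_key(struct cds_lfht_node *node, const void *key)
{
        struct entry *e = caa_container_of(node, struct entry, node);

        return e->key == *(const int *) key;
}

static void update_entry(struct cds_lfht *ht, int key, int value)
{
        struct entry *new_entry = malloc(sizeof(*new_entry));
        struct cds_lfht_node *old_node;

        /* Initializing stores, ordered before the insertion ("add" side). */
        new_entry->key = key;
        new_entry->value = value;
        cds_lfht_node_init(&new_entry->node);

        rcu_read_lock();
        old_node = cds_lfht_add_replace(ht, hash_key(key), match_key, &key,
                                        &new_entry->node);
        rcu_read_unlock();
        if (old_node) {
                /* "del" side: barrier after removal, then deferred free. */
                struct entry *old_entry =
                        caa_container_of(old_node, struct entry, node);

                call_rcu(&old_entry->rcu_head, free_entry_rcu);
        }
}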

Considering all this, the barrier semantic I'm proposing is:

add:
- full memory barrier before the insertion.

add_unique:
- success: full memory barrier before the insertion.
- failure: no memory barrier.

replace:
- success: full memory barrier before the insertion, and after the
  removal of the old node.
- failure: no memory barrier.

add_replace:
- returns an old node (replace): full memory barrier before the
  insertion, and after the removal of the old node.
- returns NULL (add): full memory barrier before the insertion.

del:
- success: full memory barrier after the removal.
- failure: no memory barrier.

I think those are the minimal barriers we need to provide to ensure that
users will not have to add barriers of their own to use the API.
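
As a final illustration, here is the insert-or-lookup pattern from the BDD
discussion above, again with the placeholder helpers from the earlier
sketches, and assuming entries are never removed from the table: the
caller adds no barrier on either path.

static struct entry *get_or_insert(struct cds_lfht *ht, int key, int value)
{
        struct entry *e = malloc(sizeof(*e));
        struct cds_lfht_node *ret;

        e->key = key;
        e->value = value;
        cds_lfht_node_init(&e->node);

        rcu_read_lock();
        ret = cds_lfht_add_unique(ht, hash_key(key), match_key, &key,
                                  &e->node);
        rcu_read_unlock();
        if (ret != &e->node) {
                /*
                 * add_unique failed: the key already existed. Our node
                 * was never published, so it can be freed immediately,
                 * and the existing entry is returned.
                 */
                free(e);
                return caa_container_of(ret, struct entry, node);
        }
        return e;       /* add_unique succeeded: our node was published */
}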

Thoughts ?

Thanks,

Mathieu


> 
> 							Thanx, Paul
> 
> > Thanks,
> > 
> > Mathieu
> > 
> > 
> > > 
> > > 							Thanx, Paul
> > > 
> > > > Thoughts ?
> > > > 
> > > > Thanks,
> > > > 
> > > > Mathieu
> > > > 
> > > > > 
> > > > > Yep, makes sense.
> > > > > 
> > > > > I'll add this documentation in the API.
> > > > > 
> > > > > Thanks!
> > > > > 
> > > > > Mathieu
> > > > > 
> > > > > > 
> > > > > > cds_lfht_is_node_deleted(): Not an update (let alone an atomic update),
> > > > > > 	no barriers.
> > > > > > 
> > > > > > cds_lfht_resize(): Atomic update, but no return value, so no barrier
> > > > > > 	implied.
> > > > > > 
> > > > > > cds_lfht_for_each(): Not an update (let alone an atomic update),
> > > > > > 	no barriers.
> > > > > > 
> > > > > > cds_lfht_for_each_duplicate(): Ditto.
> > > > > > 
> > > > > > cds_lfht_for_each_entry(): Ditto.
> > > > > > 
> > > > > > cds_lfht_for_each_entry_duplicate(): Ditto.
> > > > > > 
> > > > > > 							Thanx, Paul
> > > > > > 
> > > > > > > I'm personally getting confident that the hash table API is clean
> > > > > > > enough, and the implementation well tested, but I'd like to have your
> > > > > > > thoughts on the readiness of the hash table for production use.
> > > > > > > 
> > > > > > > Thanks,
> > > > > > > 
> > > > > > > Mathieu
> > > > > > > 

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com


