[ltt-dev] [rp] [RFC] URCU concurrent data structure API

Sat Sep 3 16:10:54 EDT 2011

On Sat, Sep 03, 2011 at 08:30:55AM -0400, Mathieu Desnoyers wrote:
> * Mathieu Desnoyers (mathieu.desnoyers at efficios.com) wrote:
> > * Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote:
> > > On Wed, Aug 17, 2011 at 10:23:33PM -0400, Mathieu Desnoyers wrote:
> > > > * Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote:
> > > > > On Wed, Aug 17, 2011 at 12:40:39PM -0400, Mathieu Desnoyers wrote:
> [...]
> > > 
> > > Actually, if you do use synchronize_rcu() anywhere in the library,
> > > you need to explicitly prohibit the caller being in an RCU read-side
> > > critical section.  So I am not so sure that doing synchronize_rcu()
> > > in any library function is all that great an idea in the first
> > > place, at least in the general case.
> > > 
> > > A library function doing synchronize_rcu() needs to be documented
> > > and carefully handled to the same extent as would a library function
> > > that does a blocking read across the network, right?
> > 
> > Definitely agreed. synchronize_rcu should really be frowned upon in
> > data container libraries. I was mainly pointing out that having access
> > directly to the library functions provides more freedom to the data
> > structure implementers.
> [...]
> > 
> > > 
> > > > So far, doing macro magic with solution #3 appears to be the best
> > > > approach we have. The current implementation (in urcu master branch
> > > > HEAD) supports linking various flavors of the CDS library within the
> > > > same application (using different symbols for each one).
> > > > 
> > > > Thoughts ?
> > > 
> > > I am not yet convinced that we want to abandon #1 so quickly.  There
> > > will probably be yet more user-level RCU algorithms produced, both
> > > within this library and in other libraries, so minimizing the code
> > > where the compiler needs to know which user-level algorithm is in
> > > use seems to me to be a very good thing.
> > 
> > Yes, agreed.
> > 
> > > For example, if someone implements their own user-level RCU, approach
> > > #1 would allow them to use that in conjunction with the cds_ algorithms
> > > without even recompiling the urcu library.  In contrast, approach #3
> > > would require them to add their implementation to the urcu library,
> > > which would be a bad thing if their RCU implementation was so specialized
> > > that we didn't want to accept it into urcu, right?
> > 
> > One way to do that might be to provide an automated build system that
> > generates .so for all urcu flavors from any given data container. E.g.,
> > if we can automatically append suffixes to cds symbols, we could get
> > somewhere. It would be like the map header trick, but automated.
> > 
> > > 
> > > > Thanks for your feedback!
> > > 
> > > Definitely an interesting topic.  ;-)
> > 
> > I think there is an important point to discuss with respect to this
> > question too: do we want to give the cds_ functions direct control on
> > locking, or do we want to leave that to the caller ?
> > 
> > I can list at least one case where giving control over locking to the
> > cds_ functions would be useful: if we want the update-side of RCU RB
> > tree to eventually scale, we could try assigning sub-parts of the tree
> > to different mutexes, and use one mutex for the first levels of the
> > tree. This would require the data structure to tie the mutexes to the
> > actual data, so there is really no way the application could do that.
> > 
> > So if we choose to let data structures control locking, then the
> > decision to generate one cds .so per RCU flavor would start to make more
> > sense, because RCU can be seen as "locking" in some ways (although it is
> > more formally synchronization than locking).
> > 
> > Giving control over RCU read-side is currently not much of an issue,
> > because we can afford to let the application keep the rcu read-side lock
> > without much practical limitation for the current data structures we
> > have, but I think we might get into limitations for other data
> > structures in the future, as shown with my multi-lock tree example.
> 
> I thought about it some more, and had discussions with various people,
> and there are a few reasons to go for a scheme where the rcu read lock
> is taken by the caller and call_rcu is passed as a parameter to
> the data structure init function:
> 
> A) The advantage, as you pointed out, is that one single .so is enough
>    to support all RCU flavors. Very convenient for external data
>    structure containers.
> 
> B) It clearly documents where rcu read-side locks are needed, so the user
>    keeps control and in-depth understanding of their read-side locks.
> 
> C) When multiple API functions require the RCU read-side lock to be
>    held (sometimes even the same lock) throughout a sequence of API
>    calls, we have no choice but to let the caller hold the read-side
>    lock.
> 
> D) Because rcu read-side locks can be nested, any "improvement" we
>    could get by releasing the read-side lock in retry loops would
>    vanish in the cases where we are called within a nested C.S.
> 
> E) If a library uses synchronize_rcu, this should be clearly documented,
>    and even frowned upon, because this involves important limitations on
>    the design of the caller, and an important performance hit. There are
>    usually ways to reach the same result through use of call_rcu, which
>    should really be used throughout these libraries.
> 
> F) It clearly documents when a data structure needs to use call_rcu
>    internally.
> 
> G) Some very early benchmark results show that there is indeed not
>    much performance gain to achieve by inlining call_rcu, even if it is
>    a version with a cache for the "call_rcu structure" lookup
>    (per-cpu/per-thread/global). So passing it as a parameter to
>    the data structure init function should be fine, even in cases
>    where it is called very often.
> 
> H) For use-cases where applications would like to use more than one
>    RCU flavor concurrently (which is now supported), leaving management
>    of RCU read-side C.S. to the reader allows the application to take
>    more than one RCU read-side lock across API calls. It also lets the
>    application specify its own call_rcu function that could handle more
>    than one RCU flavor.

Plus this allows these data structures to work correctly and
straightforwardly with some special-case application-specific
user-implemented RCU.
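
For illustration only, here is a minimal single-threaded sketch of the
scheme summarized above: the container receives call_rcu at init time,
and the caller holds the read-side lock around the API call. Every name
below (demo_*, toy_*) is hypothetical and is not the urcu API; the toy
"flavor" only counts nesting and queues callbacks, and the stack does no
atomic operations, so this shows the shape of the interface rather than
a real concurrent implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical callback head; a stand-in, not the liburcu type. */
struct demo_rcu_head {
	struct demo_rcu_head *next;
	void (*func)(struct demo_rcu_head *head);
};

/* Flavor-specific call_rcu, supplied by the caller at init time. */
typedef void (*demo_call_rcu_fn)(struct demo_rcu_head *head,
		void (*func)(struct demo_rcu_head *head));

struct demo_node {
	int value;
	struct demo_node *next;
	struct demo_rcu_head rcu;
};

struct demo_stack {
	struct demo_node *top;
	demo_call_rcu_fn call_rcu;	/* the only flavor-dependent part */
};

void demo_stack_init(struct demo_stack *s, demo_call_rcu_fn call_rcu)
{
	s->top = NULL;
	s->call_rcu = call_rcu;
}

void demo_stack_push(struct demo_stack *s, struct demo_node *node)
{
	node->next = s->top;
	s->top = node;
}

static void demo_free_node(struct demo_rcu_head *head)
{
	/* Recover the enclosing node from its embedded callback head. */
	free((char *)head - offsetof(struct demo_node, rcu));
}

/*
 * Documented precondition: the caller holds its flavor's read-side lock.
 * Returns 1 and stores the popped value, or 0 if the stack is empty.
 * Reclamation is deferred through the call_rcu given at init time.
 */
int demo_stack_pop(struct demo_stack *s, int *value)
{
	struct demo_node *node = s->top;

	if (!node)
		return 0;
	s->top = node->next;
	*value = node->value;
	s->call_rcu(&node->rcu, demo_free_node);
	return 1;
}

/* Toy application-specific "RCU flavor" (assumption, single-threaded). */
static int toy_nesting;
static struct demo_rcu_head *toy_pending;

void toy_read_lock(void) { toy_nesting++; }
void toy_read_unlock(void) { assert(toy_nesting > 0); toy_nesting--; }

void toy_call_rcu(struct demo_rcu_head *head,
		void (*func)(struct demo_rcu_head *head))
{
	head->func = func;
	head->next = toy_pending;
	toy_pending = head;
}

/* Runs queued callbacks outside any read-side C.S.; returns the count. */
int toy_run_callbacks(void)
{
	int n = 0;

	assert(toy_nesting == 0);
	while (toy_pending) {
		struct demo_rcu_head *head = toy_pending;

		toy_pending = head->next;	/* read before head is freed */
		head->func(head);
		n++;
	}
	return n;
}

/* Usage: returns the sum of popped values, stores the reclaimed count. */
int demo_run(int *reclaimed)
{
	struct demo_stack s;
	int i, v, sum = 0;

	demo_stack_init(&s, toy_call_rcu);
	for (i = 1; i <= 2; i++) {
		struct demo_node *n = malloc(sizeof(*n));

		if (!n)
			abort();
		n->value = i;
		demo_stack_push(&s, n);
	}
	toy_read_lock();		/* taken by the caller, not the library */
	while (demo_stack_pop(&s, &v))
		sum += v;
	toy_read_unlock();
	*reclaimed = toy_run_callbacks();
	return sum;
}
```

Nothing in the demo_stack code names a particular flavor, so in this
sketch the same object code could serve any flavor, or an
application-specific one, simply by passing a different function to
demo_stack_init().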

> So for all these reasons, I will go back to the API we have in our last
> release (0.6.4), therefore reverting some of the API changes I did on
> the git urcu master branch.

Sounds very good to me!

							Thanx, Paul

> Thank you everyone for this very precious feedback!
> 
> Mathieu
> 
> -- 
> Mathieu Desnoyers
> Operating System Efficiency R&D Consultant
> EfficiOS Inc.
> http://www.efficios.com