[ltt-dev] [rp] [RFC] URCU concurrent data structure API

Wed Aug 17 17:23:52 EDT 2011

On Wed, Aug 17, 2011 at 12:40:39PM -0400, Mathieu Desnoyers wrote:
> Hi,
> 
> I'm currently trying to find a good way to present the cds_ data
> structure APIs within URCU for data structures depending on RCU for
> their synchronization. The main problem is that we have many flavors of
> rcu_read_lock/unlock and call_rcu to deal with.
> 
> Various approaches are possible:
> 
> 1) The current approach: require that the callers pass call_rcu as a
>    parameter to data structure init functions, and require that the
>    callers hold rcu_read_lock across API invocations.
> 
>    Downsides: holds the RCU read lock across busy-waiting loops (for
>    longer than actually needed). Passing call_rcu as a parameter and
>    putting requirements on the lock held when calling the API complicate
>    the API, and make it impossible to inline call_rcu invocations.

The function-call overhead for call_rcu() should not be a big deal.
I am not all that concerned about an RCU read-side critical section
covering the busy waiting -- my guess is that the busy waiting itself
would become a problem long before the overly long RCU read-side
critical section becomes a problem.
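To make option (1) concrete, here is a minimal sketch in C. All demo_* names are hypothetical, and the stub flavor's call_rcu() runs the callback immediately instead of waiting for a grace period, only so that the example runs standalone. The structure merely records the call_rcu() the caller chose; the caller is still expected to hold the matching rcu_read_lock() around each API call.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an RCU callback head (not liburcu's layout). */
struct demo_rcu_head {
	void (*func)(struct demo_rcu_head *head);
};

/* Signature shared by every call_rcu flavor in this sketch. */
typedef void (*call_rcu_fn)(struct demo_rcu_head *head,
			    void (*func)(struct demo_rcu_head *head));

/* Hypothetical flavor stub: invokes the callback at once, no grace period. */
static int demo_callbacks_run;
static void demo_call_rcu(struct demo_rcu_head *head,
			  void (*func)(struct demo_rcu_head *head))
{
	head->func = func;
	demo_callbacks_run++;
	head->func(head);
}

struct demo_node {
	struct demo_rcu_head rcu;	/* first member, so a cast recovers the node */
	int key;
};

/* Option (1): the structure remembers which call_rcu the caller passed. */
struct demo_ht {
	call_rcu_fn call_rcu;
};

static void demo_ht_init(struct demo_ht *ht, call_rcu_fn fn)
{
	assert(fn != NULL);
	ht->call_rcu = fn;
}

static void demo_free_node(struct demo_rcu_head *head)
{
	free((struct demo_node *) head);
}

/* Caller must hold the matching rcu_read_lock() across this call. */
static void demo_ht_del(struct demo_ht *ht, struct demo_node *node)
{
	/* ... unlink node from the structure here ... */
	ht->call_rcu(&node->rcu, demo_free_node);	/* indirect call */
}
```

The indirect call through ht->call_rcu is the "cannot inline" cost mentioned above; as noted, it is a single function call per removal.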

> 2) Require all callers to pass call_rcu *and* rcu_read_lock/unlock as
>    parameters to the data structure init function.
> 
>    Downsides: both call_rcu and read lock/unlock become function calls
>    (slower). Complicates the API.
> 
> 3) Don't require callers to pass anything RCU-related to data structure
>    init. This would require compiling one instance of each data structure
>    per RCU flavor shared object (like we're doing with call_rcu now).
> 
>    Downside: we would need to ship a per-RCU-flavor version of each data
>    structure.
> 
>    Upside: simple API, no rcu read-side lock around busy-waiting loops,
>    ability to inline both call_rcu and rcu_read_lock/unlock within the
>    data structure handling code.
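Option (2) amounts to handing the init function a table of every RCU entry point the structure needs. A minimal sketch, assuming hypothetical demo_* names and stub flavor functions that only count reader nesting; it shows that rcu_read_lock/unlock then become indirect calls that the compiler cannot inline.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an RCU callback head. */
struct demo_rcu_head {
	void (*func)(struct demo_rcu_head *head);
};

/* Option (2): all flavor-specific entry points, passed once at init. */
struct demo_rcu_flavor {
	void (*read_lock)(void);
	void (*read_unlock)(void);
	void (*call_rcu)(struct demo_rcu_head *head,
			 void (*func)(struct demo_rcu_head *head));
};

/* Hypothetical flavor stubs that only track reader nesting. */
static int demo_nesting;
static void demo_read_lock(void)   { demo_nesting++; }
static void demo_read_unlock(void) { demo_nesting--; }
static void demo_call_rcu(struct demo_rcu_head *head,
			  void (*func)(struct demo_rcu_head *head))
{
	func(head);	/* no grace period in this stub */
}

static const struct demo_rcu_flavor demo_flavor = {
	.read_lock = demo_read_lock,
	.read_unlock = demo_read_unlock,
	.call_rcu = demo_call_rcu,
};

struct demo_ht {
	const struct demo_rcu_flavor *flavor;
	int nr_lookups;
};

static void demo_ht_init(struct demo_ht *ht, const struct demo_rcu_flavor *f)
{
	assert(f != NULL && f->read_lock && f->read_unlock && f->call_rcu);
	ht->flavor = f;
	ht->nr_lookups = 0;
}

/* The structure now takes the read lock itself, through a pointer. */
static int demo_ht_lookup(struct demo_ht *ht, int key)
{
	int found;

	ht->flavor->read_lock();	/* indirect call: not inlinable */
	found = (key == 42);		/* stand-in for the real traversal */
	ht->nr_lookups++;
	ht->flavor->read_unlock();
	return found;
}
```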

If we do #3, it is best to make sure that different library functions
making different RCU-flavor choices can be linked into a single program.
More preprocessor tricks, I guess...
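One possible form of that preprocessor trick: give each per-flavor instance flavor-suffixed symbol names, so that instances built for different flavors do not collide at link time while still calling their flavor's read-side primitives directly (and hence inlinably). A minimal single-file sketch, assuming hypothetical flavors "a" and "b" and hypothetical demo_* names; a real library would more likely compile one source file once per flavor with a -D option, which the macro template below merely emulates.

```c
#include <assert.h>

/* Two hypothetical RCU flavors with inlinable read-side primitives.
 * The stubs only count active readers. */
static int a_readers, b_readers;
static inline void demo_read_lock_a(void)   { a_readers++; }
static inline void demo_read_unlock_a(void) { a_readers--; }
static inline void demo_read_lock_b(void)   { b_readers++; }
static inline void demo_read_unlock_b(void) { b_readers--; }

/* Two-level paste so that macro arguments are expanded first. */
#define DEMO_CAT_(x, y) x##_##y
#define DEMO_CAT(x, y)  DEMO_CAT_(x, y)

/*
 * The data-structure code is written once and instantiated per flavor.
 * Every instance gets flavor-suffixed symbol names, so instances built
 * for different flavors can be linked into the same program.
 */
#define DEFINE_DEMO_LOOKUP(flavor)					\
int DEMO_CAT(demo_lookup, flavor)(const int *table, int n, int key)	\
{									\
	int i, found = 0;						\
									\
	assert(n == 0 || table != 0);					\
	DEMO_CAT(demo_read_lock, flavor)();	/* direct call */	\
	for (i = 0; i < n && !found; i++)				\
		found = (table[i] == key);				\
	DEMO_CAT(demo_read_unlock, flavor)();				\
	return found;							\
}

DEFINE_DEMO_LOOKUP(a)	/* defines demo_lookup_a() */
DEFINE_DEMO_LOOKUP(b)	/* defines demo_lookup_b() */
```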

> There are probably others, but I think it gives an idea of the main
> scenarios I consider. I am starting to like (3) more and more, and I'm
> tempted to move to it, but I would really like feedback on this API
> matter before I make any decision.

Actually, #1 seems simpler than #3, and it has few major downsides.

I can imagine the following additional possibilities:

 4) Create a table mapping from the call_rcu() variant to the
    corresponding rcu_read_lock() and rcu_read_unlock() variants.  If busy
    waiting is required, look up the rcu_read_lock() and rcu_read_unlock()
    variants and then call them.  (If you are busy waiting anyway,
    who cares about a bit of extra overhead?)

    I don't see this as a reasonable initial alternative, but it would
    be a decent place to migrate to if starting from #1 and the long
    RCU read-side critical sections did somehow turn out to be a problem.

 5) Like #4, but map to a function that does rcu_read_unlock() followed
    immediately by rcu_read_lock().

 6) Like #1, but make the caller deal with deallocation.  Then the caller
    gets to select which of the call_rcu() variants should be used.
    (Yes, there might be a reason why this is a bad idea, I do need to
    go review the implementations.)
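Option (6) could look roughly like this: the deletion function only unlinks and returns the node, and the caller reclaims it with whichever call_rcu() flavor it uses. A minimal sketch, assuming hypothetical demo_* names and a single-threaded stub in place of real RCU publication and grace-period machinery.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an RCU callback head. */
struct demo_rcu_head {
	void (*func)(struct demo_rcu_head *head);
};

/* Hypothetical caller-chosen flavor; the stub skips the grace period. */
static void demo_call_rcu(struct demo_rcu_head *head,
			  void (*func)(struct demo_rcu_head *head))
{
	head->func = func;
	func(head);
}

struct demo_node {
	struct demo_rcu_head rcu;	/* first member: cast back from rcu_head */
	struct demo_node *next;
	int key;
};

/* Option (6): the list never frees nodes and knows nothing of call_rcu. */
struct demo_list {
	struct demo_node *head;
};

static void demo_list_add(struct demo_list *l, struct demo_node *n)
{
	n->next = l->head;
	l->head = n;	/* a real RCU list would use rcu_assign_pointer() */
}

/* Unlinks and returns the node; reclaiming it is the caller's job. */
static struct demo_node *demo_list_del(struct demo_list *l, int key)
{
	struct demo_node **pp;

	for (pp = &l->head; *pp != NULL; pp = &(*pp)->next) {
		if ((*pp)->key == key) {
			struct demo_node *n = *pp;

			*pp = n->next;
			return n;
		}
	}
	return NULL;
}

/* Caller-side reclaim callback, paired with the caller's own flavor. */
static int demo_freed;
static void demo_free_node(struct demo_rcu_head *head)
{
	free((struct demo_node *) head);
	demo_freed++;
}
```

Usage on the caller side is then `demo_call_rcu(&n->rcu, demo_free_node)` on the node returned by demo_list_del(), so the flavor choice never enters the data structure.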

If #6 is impractical, I still like #1 with either #4 or #5 as fallbacks.
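The #4 and #5 fallbacks might be sketched as a table keyed by the call_rcu() passed in under #1, consulted only on the busy-waiting slow path. This reads #4 as "drop the read lock for the whole wait, then retake it" and #5 as "briefly drop and retake it on each spin"; that reading, the demo_* names, the stub flavor, and the countdown condition standing in for "the other thread is done" are all assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct demo_rcu_head {
	void (*func)(struct demo_rcu_head *head);
};
typedef void (*call_rcu_fn)(struct demo_rcu_head *head,
			    void (*func)(struct demo_rcu_head *head));

/* Hypothetical flavor stubs: they only count reader nesting and breaks. */
static int demo_nesting, demo_breaks;
static void demo_read_lock(void)   { demo_nesting++; }
static void demo_read_unlock(void) { demo_nesting--; }
static void demo_call_rcu(struct demo_rcu_head *head,
			  void (*func)(struct demo_rcu_head *head))
{
	func(head);	/* no grace period in this stub */
}

/* #5 helper: rcu_read_unlock() immediately followed by rcu_read_lock(). */
static void demo_reader_break(void)
{
	demo_read_unlock();
	demo_breaks++;
	demo_read_lock();
}

/* One row per flavor, keyed by its call_rcu() variant. */
struct demo_flavor_map {
	call_rcu_fn call_rcu;
	void (*read_lock)(void);	/* used by #4 */
	void (*read_unlock)(void);	/* used by #4 */
	void (*reader_break)(void);	/* used by #5 */
};

static const struct demo_flavor_map demo_flavors[] = {
	{ demo_call_rcu, demo_read_lock, demo_read_unlock, demo_reader_break },
};

static const struct demo_flavor_map *demo_lookup(call_rcu_fn fn)
{
	size_t i;

	for (i = 0; i < sizeof(demo_flavors) / sizeof(demo_flavors[0]); i++)
		if (demo_flavors[i].call_rcu == fn)
			return &demo_flavors[i];
	return NULL;	/* unknown flavor */
}

/* Hypothetical wait condition: becomes true after *arg failed polls. */
static int demo_countdown(void *arg)
{
	int *remaining = arg;

	return (*remaining)-- <= 0;
}

/*
 * Both waits assume the caller holds the read lock on entry.  Any
 * RCU-protected pointer read before the wait must not be used after
 * it, since a grace period may have elapsed in between.
 */
static int demo_wait_opt4(call_rcu_fn fn, int (*ready)(void *), void *arg)
{
	const struct demo_flavor_map *m = demo_lookup(fn);
	int spins = 0;

	assert(m != NULL);
	m->read_unlock();	/* leave the critical section for the wait... */
	while (!ready(arg))
		spins++;
	m->read_lock();		/* ...and re-enter it before returning */
	return spins;
}

static int demo_wait_opt5(call_rcu_fn fn, int (*ready)(void *), void *arg)
{
	const struct demo_flavor_map *m = demo_lookup(fn);
	int spins = 0;

	assert(m != NULL);
	while (!ready(arg)) {
		m->reader_break();	/* let grace periods proceed each spin */
		spins++;
	}
	return spins;
}
```

Either way, the table lookup and indirect calls are paid only when busy waiting, which matches the point made under #4 about extra overhead.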

So, what am I missing?

							Thanx, Paul