[lttng-dev] my user space rcu code

Mathieu Desnoyers mathieu.desnoyers at efficios.com
Thu Feb 7 12:17:06 EST 2013


* Mathieu Desnoyers (mathieu.desnoyers at efficios.com) wrote:
> * 赵宇龙 (zylthinking at gmail.com) wrote:
> > Hi,
> > 
> > I wrote a user-space RCU; the code is at
> > https://github.com/zylthinking/tools/blob/master/rcu.h
> > https://github.com/zylthinking/tools/blob/master/rcu.c.
> > 
> > I notice the main difference from liburcu is that I use a daemon thread
> > to do work such as waking up sleepers, while liburcu does not.
> > 
> > I wonder why we can't use such a thread. With it, rcu_read_(un)lock
> > no longer has to wake up writers, which should help improve
> > performance. Is the cost of such a daemon thread too high?
> > 
> > zhao yulong
> 
> Hi Zhao,
> 
> The main reason for having the "wakeup writer" present in the
> rcu_read_unlock() path in liburcu is to be energy-efficient: we don't
> want any thread to consume power unless they really have to. For
> instance, endless busy-waiting on a variable is avoided.
> 
> This is why we rely on sys_futex to wake up awaiting writers from
> rcu_read_unlock(). AFAIU, your rcu_daemon() thread is always active, and
> even though it calls "sched_yield()" to be nice to others, it will keep
> one CPU always powered on.
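To make the contrast concrete, here is a minimal sketch of a futex-based writer wait. It is loosely modeled on the idea described above, not copied from liburcu: the names gp_futex, writer_wait() and reader_wake() are hypothetical, the value -1 is assumed to mean "a writer is asleep", and all grace-period tracking is omitted. The writer sleeps inside FUTEX_WAIT and consumes no CPU until a reader wakes it; the re-check loop makes an early wakeup or an already-cleared word harmless.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical futex word: 0 = no writer waiting, -1 = writer asleep. */
static int gp_futex;

/* glibc provides no futex() wrapper, so invoke the system call directly. */
static long futex_call(int *uaddr, int op, int val)
{
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

/* Writer side: the caller stores -1 first, then sleeps in the kernel
 * until a reader resets the word to 0. No CPU is burned while waiting. */
static void writer_wait(void)
{
    while (__atomic_load_n(&gp_futex, __ATOMIC_SEQ_CST) == -1) {
        /* FUTEX_WAIT fails at once with EAGAIN if the word is no longer -1. */
        if (futex_call(&gp_futex, FUTEX_WAIT, -1) == -1 &&
            errno != EAGAIN && errno != EINTR)
            break;   /* unexpected error: stop waiting */
    }
}

/* Reader side: only if a writer is waiting, clear the word and wake it. */
static void reader_wake(void)
{
    if (__atomic_load_n(&gp_futex, __ATOMIC_SEQ_CST) == -1) {
        __atomic_store_n(&gp_futex, 0, __ATOMIC_SEQ_CST);
        futex_call(&gp_futex, FUTEX_WAKE, 1);
    }
}
```

In this model a writer would store -1 into gp_futex, check whether readers remain, and call writer_wait() only if some do; the last reader leaving calls reader_wake(). Only the two threads involved ever run.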
> 
> One important thing to notice is that liburcu rcu_read_unlock() only
> calls sys_futex if a writer is waiting. Therefore, in a scenario with
> frequent reads and infrequent updates, rcu_read_unlock() only incurs
> the overhead of a load, a test and a branch, and skips the futex wake
> call entirely.
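The unlock fast path described above can be sketched as follows. This is a simplified, hypothetical model rather than liburcu's code: reader_lock()/reader_unlock() only track nesting and leave out the grace-period counters, and futex_wake_calls is instrumentation added purely to show when the system call is issued. When no writer has set gp_futex to -1, the outermost unlock costs a memory fence plus one load, one compare and one branch predicted not taken.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical state: per-thread nesting depth and a shared futex word
 * where -1 means "a writer is waiting". */
static __thread unsigned long reader_nesting;
static int gp_futex;
static unsigned long futex_wake_calls;  /* instrumentation only, not thread-safe */

static void wake_up_writer(void)
{
    /* Fast path: one load, one compare, one branch predicted not taken. */
    if (__builtin_expect(__atomic_load_n(&gp_futex, __ATOMIC_SEQ_CST) == -1, 0)) {
        /* Slow path, taken only when a writer actually waits. */
        __atomic_store_n(&gp_futex, 0, __ATOMIC_SEQ_CST);
        futex_wake_calls++;
        syscall(SYS_futex, &gp_futex, FUTEX_WAKE, 1, NULL, NULL, 0);
    }
}

static void reader_lock(void)
{
    reader_nesting++;
}

static void reader_unlock(void)
{
    /* Only the outermost unlock can end a read-side critical section. */
    if (--reader_nesting == 0) {
        /* Order the critical section before the check of gp_futex. */
        __atomic_thread_fence(__ATOMIC_SEQ_CST);
        wake_up_writer();
    }
}
```

With readers only, a loop of nested lock/unlock pairs never enters the kernel; after a writer sets gp_futex to -1, the next outermost unlock issues exactly one FUTEX_WAKE.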
> 
> Another reason for not going for a worker thread to handle wait/wakeup
> between synchronize_rcu and rcu_read_unlock is to minimize impact on the
> application process/threading model. This is especially true for
> applications that rely on fork() _not_ followed by exec(): on Linux,
> the child only gets a copy of the single parent thread that executed
> fork(); the parent's other threads do not exist in the child. Therefore,
> we must be aware
> that adding an in-library thread will require users to handle the
> fork()-not-followed-by-exec() case carefully. Since we had no other
> choice, we rely on worker threads for call_rcu, but we don't use worker
> threads for the "simpler" use-case of synchronize_rcu().
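As an illustration of what that care involves, here is a minimal sketch of a hypothetical library (lib_init(), lib_start_worker() and lib_worker_pid are made-up names, not liburcu API) that registers a pthread_atfork() child handler to recreate its worker thread, since the worker does not survive fork(). It assumes glibc's behavior of tolerating pthread_create() in the child of a multithreaded process; POSIX only guarantees async-signal-safe calls there until exec(), which is part of why an in-library thread burdens applications.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical library state: one background worker thread. */
static pthread_t lib_worker;
static pid_t lib_worker_pid;   /* process the running worker belongs to */

static void *lib_worker_main(void *arg)
{
    (void)arg;
    /* Record which process this worker runs in. */
    __atomic_store_n(&lib_worker_pid, getpid(), __ATOMIC_SEQ_CST);
    for (;;)
        pause();               /* stand-in for "wait for queued work" */
    return NULL;
}

/* Start a worker and wait until it is running in the current process. */
static void lib_start_worker(void)
{
    __atomic_store_n(&lib_worker_pid, 0, __ATOMIC_SEQ_CST);
    if (pthread_create(&lib_worker, NULL, lib_worker_main, NULL) != 0)
        abort();
    while (__atomic_load_n(&lib_worker_pid, __ATOMIC_SEQ_CST) != getpid())
        sched_yield();
}

/* Runs in the child after fork(): the parent's worker was not copied,
 * and lib_worker_pid still names the parent, so start a fresh worker. */
static void lib_atfork_child(void)
{
    lib_start_worker();
}

static void lib_init(void)
{
    lib_start_worker();
    pthread_atfork(NULL, NULL, lib_atfork_child);
}
```

Without the child handler, a child that never calls exec() would find lib_worker_pid still equal to the parent's PID and would queue work to a thread that does not exist.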
> 
> A third reason for directly waking up the writer thread rather than
> having a worker thread dispatching this information is speed. Given the
> power efficiency constraints expressed above, we would have to issue one
> system call from the rcu_read_unlock() site to wake up the rcu_daemon()
> thread (so it does not have to busy-wait), and another system call to
> wake up the writer, involving a third thread in what should really
> involve only two threads. This will therefore add overhead to this
> signalling by requiring the scheduler to perform one extra context
> switch, and may involve extra communication between processors, since
> rcu_daemon() will likely execute on a different CPU, and will have to
> bring in cache lines from other processors.
> 
> Finally, let's discuss the real-time aspect. For RT, we ideally want
> wait-free rcu_read_lock/unlock. Indeed, having a sys_futex wakeup call
> in rcu_read_unlock() could arguably be seen as making the unlock path
> less strictly wait-free (in case you would be concerned about the
> internal implementation of sys_futex wake not being entirely wait-free).
> Currently, on Linux, one way to change the behavior of rcu_read_unlock()
> to make it even more RT-friendly (in case you are concerned about using
> sys_futex() on a pure RT thread) is to undefine CONFIG_RCU_HAVE_FUTEX,
> thus changing the behavior of urcu/futex.h and compat_futex.c. The
> futex_async() call will then do busy-waiting on the FUTEX_WAIT side
> (waiting 10us between attempts), and do exactly _nothing_ on the wake-up

Sorry, I meant 10ms rather than 10us.

Thanks,

Mathieu

> side, which is certainly wait-free. This will be less energy-efficient,
> of course, but will provide a strictly wait-free rcu_read_unlock().
> 
> We might want to consider creating a liburcu-rt.so for real-time
> use-cases that prefer the non-energy-efficient wait, along with the
> strictly wait-free rcu_read_unlock(). Thoughts?
> 
> Thanks,
> 
> Mathieu
> 
> -- 
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com



