[ltt-dev] [RFC git tree] Userspace RCU (urcu) for Linux (repost)
Mathieu Desnoyers
compudj at krystal.dyndns.org
Wed Feb 11 03:58:52 EST 2009
* Lai Jiangshan (laijs at cn.fujitsu.com) wrote:
> Mathieu Desnoyers wrote:
> >
> > I just did a mb() version of the urcu :
> >
> > (uncomment CFLAGS+=-DDEBUG_FULL_MB in the Makefile)
> >
> > Time per read : 48.4086 cycles
> > (about 6-7 times slower, as expected)
> >
>
> I have read many of Paul's papers
> (http://www.rdrop.com/users/paulmck/RCU/)
> and I know Paul worked hard to remove memory barriers from the
> RCU read side in the kernel. His work is significant.
>
> But, I think,
> 1) Userspace RCU's read side can afford the latency of a
> memory barrier (including atomic operations).
> Userspace does not access shared data as frequently as the kernel,
> and the userspace read side is not as fast as the kernel's.
>
> 2) Userspace uses RCU for its strengths, not to save a few CPU cycles
> (http://lwn.net/Articles/263130/).
> One of the most important strengths is being lock-free.
>
>
> If my thinking is right, the following proposal also has some merit.
>
> Use an all-system RCU for userspace RCU.
>
> The all-system RCU is QRCU, which was implemented by Paul.
> http://lwn.net/Articles/223752/
>
> Any system that has mechanisms equivalent to atomic_op,
> __wait_event, wake_up and mutex can also implement QRCU.
> So most systems can implement QRCU, which is why I call QRCU the
> all-system RCU.
>
> Obviously, we can implement a portable QRCU very simply on NPTL,
> and the read lock is:
> 	for (;;) {
> 		int idx = qp->completed & 0x1;
> 		if (likely(atomic_inc_not_zero(qp->ctr + idx)))
> 			return idx;
> 	}
> "atomic_inc_not_zero" is likely to be called only once, so it is fast enough.
>
Hi Lai,

There are a few reasons why we need RCU in userspace for tracing:

- We need very fast per-cpu read-side synchronization for data structure
  handling. Updates are rare (enabling/disabling tracing). Therefore,
  your argument about userspace not needing "fast" RCU does not hold in
  this case. Note that LTTng has the performance it has today in the
  kernel because I made sure to use no memory barriers when unnecessary
  and because I used the minimal number of atomic operations required.
  Those are costly synchronization primitives on quite a few
  architectures.
- Being lock-free (atomic). To trace code executed in signal handlers,
  we need to be able to nest over any user code. With the solution you
  propose above, the busy-loop in the read lock does not seem to be
  signal-safe: if it nests over a writer, it could busy-loop forever.
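
To make the signal-safety concern concrete, here is a minimal userspace
sketch of the quoted read lock, written with C11 atomics. The names
struct qrcu, qrcu_init, inc_not_zero and qrcu_read_lock_bounded are
hypothetical; they are not part of urcu or of the kernel QRCU code. The
retry bound is an addition for illustration only, so that the failing
path can be observed instead of spinning, and the sketch assumes the
counter for the active index starts at 1.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the kernel's QRCU state. */
struct qrcu {
	atomic_int completed;	/* low bit selects the active counter */
	atomic_int ctr[2];	/* assumed: ctr[active] starts at 1, other at 0 */
};

static void qrcu_init(struct qrcu *qp)
{
	atomic_init(&qp->completed, 0);
	atomic_init(&qp->ctr[0], 1);
	atomic_init(&qp->ctr[1], 0);
}

/* Increment *v unless it is zero; returns true if the increment happened. */
static bool inc_not_zero(atomic_int *v)
{
	int old = atomic_load(v);

	while (old != 0) {
		/* On failure, old is refreshed with the current value of *v. */
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return true;
	}
	return false;
}

/*
 * Same loop body as the quoted read lock, but giving up after max_tries
 * failed attempts (illustration only). Returns the counter index on
 * success, or -1 if every attempt found the active counter at zero.
 */
static int qrcu_read_lock_bounded(struct qrcu *qp, int max_tries)
{
	for (int i = 0; i < max_tries; i++) {
		int idx = atomic_load(&qp->completed) & 0x1;

		if (inc_not_zero(&qp->ctr[idx]))
			return idx;
	}
	return -1;
}

static void qrcu_read_unlock(struct qrcu *qp, int idx)
{
	assert(idx == 0 || idx == 1);
	atomic_fetch_sub(&qp->ctr[idx], 1);
}
```

In the common case the first attempt succeeds, which matches the "called
once" expectation in the quoted proposal. The retry path is the one at
issue: if a signal handler runs the unbounded loop while the active
counter is zero, and the only code able to change that is the thread the
handler interrupted, the loop cannot make progress. Whether a QRCU
writer actually leaves such a window is not shown here; the sketch only
illustrates which path would spin.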
Mathieu
> Lai.
>
>
>
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68