[ltt-dev] [RFC git tree] Userspace RCU (urcu) for Linux (repost)

Tue Feb 10 14:17:31 EST 2009

* Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote:
> On Mon, Feb 09, 2009 at 02:03:17AM -0500, Mathieu Desnoyers wrote:
> 
> [ . . . ]
> 
> > I just added modified rcutorture.h and api.h from your git tree
> > specifically for an urcutorture program to the repository. Some results :
> > 
> > 8-way x86_64
> > E5405 @2 GHZ
> > 
> > ./urcutorture 8 perf
> > n_reads: 1937650000  n_updates: 3  nreaders: 8  nupdaters: 1 duration: 1
> > ns/read: 4.12871  ns/update: 3.33333e+08
> > 
> > ./urcutorture 8 uperf
> > n_reads: 0  n_updates: 4413892  nreaders: 0  nupdaters: 8 duration: 1
> > ns/read: nan  ns/update: 1812.46
> > 
> > n_reads: 98844204  n_updates: 10  n_mberror: 0
> > rcu_stress_count: 98844171 33 0 0 0 0 0 0 0 0 0
> > 
> > However, I've tried removing the second switch_qparity() call, and the
> > rcutorture test did not detect anything wrong. I also did a variation
> > which calls the "sched_yield" version of the urcu, "urcutorture-yield".
> 
> My confusion -- I was testing my old approach where the memory barriers
> are in rcu_read_lock() and rcu_read_unlock().  To force the failures in
> your signal-handler-memory-barrier approach, I suspect that you are
> going to need a bigger hammer.  In this case, one such bigger hammer
> would be:
> 
> o	Just before exit from the signal handler, do a
> 	pthread_cond_wait() under a pthread_mutex().
> 
> o	In force_mb_all_threads(), refrain from sending a signal to self.
> 
> 	Then it should be safe in force_mb_all_threads() to do a
> 	pthread_cond_broadcast() under the same pthread_mutex().
> 
> This should raise the probability of seeing the failure in the case
> where there is a single switch_qparity().
> 

I just made an mb() version of urcu:

(uncomment CFLAGS+=-DDEBUG_FULL_MB in the Makefile)

Time per read : 48.4086 cycles
(about 6-7 times slower, as expected)

This will be especially useful for increasing the chance of triggering
races.

I tried removing the second parity switch from the writer. The rcu
torture test has not found the problem yet (maybe I am not using the
correct parameters? It does not run for more than 5 seconds).

So I added a "-n" option to test_urcu that makes the usleep(1) between
writes optional. I also replaced the yield with a usleep of random
duration, and I now use a circular buffer rather than malloc, so we are
sure the memory is not quickly reused by the writer and stays in an
invalid state longer.
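
To illustrate the circular buffer idea, here is a minimal sketch. It is
not the test_urcu code: the names test_slot, RING_SIZE, ring_alloc and
ring_retire, and the valid/invalid marker values, are all assumptions
made for the example. The point is only that a retired slot stays
poisoned until the ring wraps around, instead of being handed back
immediately as malloc/free might do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names and values, for illustration only. */
#define RING_SIZE	8
#define SLOT_VALID	8
#define SLOT_INVALID	0

struct test_slot {
	int a;	/* SLOT_VALID while published, SLOT_INVALID once retired */
};

static struct test_slot ring[RING_SIZE];
static size_t ring_next;

/*
 * Hand out slots in round-robin order, so a retired slot is only
 * reused after the RING_SIZE - 1 other slots have been handed out.
 */
static struct test_slot *ring_alloc(void)
{
	struct test_slot *slot = &ring[ring_next];

	ring_next = (ring_next + 1) % RING_SIZE;
	slot->a = SLOT_VALID;
	return slot;
}

/*
 * Poison the slot instead of freeing it: a reader still holding the
 * pointer sees SLOT_INVALID until the ring wraps around to this slot.
 */
static void ring_retire(struct test_slot *slot)
{
	slot->a = SLOT_INVALID;
}
```

A writer would call ring_alloc() for the new element, publish it, wait
for the grace period, then call ring_retire() on the old one. A larger
RING_SIZE keeps stale data detectable for longer.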

What really makes the problem appear quickly is adding a delay between
the rcu_dereference and the assertion on data validity in thr_reader.
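
The reader-side delay can be sketched as follows. This is a stand-alone
approximation, not thr_reader itself: a C11 acquire load stands in for
rcu_dereference(), the rcu_read_lock()/rcu_read_unlock() calls are only
marked by comments, and shared_ptr, test_data, next_delay_us and
reader_check_once are hypothetical names. The valid value 8 and the
100 us bound are also assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <time.h>

struct test_data {
	int a;	/* assumed: 8 while valid, anything else once retired */
};

/* Hypothetical shared pointer published by the writer. */
static _Atomic(struct test_data *) shared_ptr;

/* Small LCG returning a pseudo-random delay in [0, max_us). */
static long next_delay_us(unsigned int *state, long max_us)
{
	*state = *state * 1103515245u + 12345u;
	return (long)((*state >> 16) % (unsigned long)max_us);
}

/*
 * One reader iteration. Returns 1 if the data was still valid after
 * the delay (or no data was published), 0 if it had gone stale.
 */
static int reader_check_once(unsigned int *state)
{
	struct test_data *p;
	struct timespec ts = { 0, 0 };
	int ok = 1;

	/* rcu_read_lock() would go here in a real reader. */
	p = atomic_load_explicit(&shared_ptr, memory_order_acquire);
	if (p) {
		/* Widen the window between the dereference and the check. */
		ts.tv_nsec = next_delay_us(state, 100) * 1000L;
		nanosleep(&ts, NULL);
		ok = (p->a == 8);
	}
	/* rcu_read_unlock() would go here. */
	return ok;
}
```

In a test loop the caller would assert() on the return value. The
longer the delay, the more likely a writer with a too-short grace
period retires the element while the reader still holds the pointer.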

It now appears after just a few seconds when running
./test_urcu_yield 20 -r -n
compiled with CFLAGS+=-DDEBUG_FULL_MB.

It seems to be much harder to trigger with the signal-based version.
That is expected, because the writer takes about 50 times longer to
execute than with the -DDEBUG_FULL_MB version.

So I'll let the ./test_urcu_yield NN -r -n run for a while on the
correct version (with DEBUG_FULL_MB) and see what it gives.

Mathieu

> 							Thanx, Paul

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68