[ltt-dev] [URCU RFC patch 3/3] call_rcu: remove delay for wakeup scheme

Paul E. McKenney paulmck at linux.vnet.ibm.com
Mon Jun 6 15:41:25 EDT 2011


On Mon, Jun 06, 2011 at 03:21:07PM -0400, Mathieu Desnoyers wrote:
> * Mathieu Desnoyers (mathieu.desnoyers at efficios.com) wrote:
> > I notice that the "poll(NULL, 0, 10);" delay is executed for both the RT
> > and non-RT code.  So given that my goal is to get the call_rcu thread to
> > GC memory as quickly as possible to diminish the overhead of cache
> > misses, I decided to try removing this delay for !RT: the call_rcu
> > thread then wakes up ASAP when the thread invoking call_rcu wakes it. My
> > updates jump to 76349/s (getting there!) ;).
> > 
> > This improvement can be explained by the lower delay between call_rcu and
> > the execution of its callback, which decreases the amount of cache used
> > and therefore provides better cache locality.
> 
> I just wonder if it's worth it: removing this delay from the !RT
> call_rcu thread can cause a high rate of synchronize_rcu() calls. So
> although there might be an advantage in terms of update rate, it will
> likely cause extra cache-line bounces between the call_rcu threads and
> the reader threads.
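
For context, the !RT path of the dispatch loop in urcu-call-rcu-impl.h
boils down to something like the following (a simplified sketch rather
than the exact code: dequeue_all() stands in for the real wait-free
queue splice, and the list walk glosses over the actual rcu_head
layout):

	for (;;) {
		struct rcu_head *list = dequeue_all(&crdp->cbs);

		if (list) {
			synchronize_rcu();	/* one grace period per batch */
			while (list) {
				struct rcu_head *next = list->next;
				list->func(list);	/* reclaim the element */
				list = next;
			}
		} else {
			call_rcu_wait(crdp);	/* queue empty: sleep until woken */
		}
		poll(NULL, 0, 10);		/* the 10 ms delay under discussion */
	}

The longer the thread sleeps between passes, the larger the batch covered
by a single synchronize_rcu(); waking up immediately tends toward one
grace period per call_rcu(), which is where the extra reader-side
cache-line traffic would come from.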
> 
> test_urcu_rbtree 7 1 20 -g 1000000
> 
> With the delay in the call_rcu thread:
> search:  1842857 items/reader thread/s (7 reader threads)
> updates:   21066 items/s (1 update thread)
> ratio: 87 search/update
> 
> Without the delay in the call_rcu thread:
> search:  3064285 items/reader thread/s (7 reader threads)
> updates:   45096 items/s (1 update thread)
> ratio: 68 search/update
> 
> So basically, adding the delay doubles the update performance, at the
> cost of being 33% slower for reads. My first thought is that if an
> application has very frequent updates, then it probably wants fast
> updates, because update throughput is what matters in that case. If the
> application has infrequent updates, then the reads will be fast anyway,
> because rare call_rcu invocations will trigger fewer cache-line bounces
> between readers and writers. Any other thoughts on this trade-off and
> how to deal with it?

One approach would be to let the user handle it using real-time
priority adjustment.  Another approach would be to let the user
specify the wait time in milliseconds, and skip the poll() system
call if the specified wait time is zero.

The latter seems more sane to me.  It also allows the user to
specify (say) 10000 milliseconds for cases where there is a
lot of memory and where amortizing synchronize_rcu() overhead
across a large number of updates is important.
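
Roughly what I have in mind, with the delay_ms field and the setter
name made up for illustration rather than being an actual API proposal:

	/* Hypothetical knob: per-call_rcu-data dispatch delay in ms. */
	void call_rcu_data_set_delay(struct call_rcu_data *crdp, int delay_ms)
	{
		crdp->delay_ms = delay_ms;	/* 0: dispatch as soon as woken */
	}

	/* ... and in the !RT branch of call_rcu_thread(): */
	if (&crdp->cbs.head == _CMM_LOAD_SHARED(crdp->cbs.tail))
		call_rcu_wait(crdp);
	if (crdp->delay_ms != 0)
		poll(NULL, 0, crdp->delay_ms);	/* batch callbacks; skipped when 0 */

The default could stay at 10, an update-heavy application could set it
to 0, and an application with plenty of memory could go as high as
10000 to amortize synchronize_rcu() over large batches.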

Other thoughts?

						Thanx, Paul

> Thanks,
> 
> Mathieu
> 
> 
> > 
> > Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
> > ---
> >  urcu-call-rcu-impl.h |    3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > Index: userspace-rcu/urcu-call-rcu-impl.h
> > ===================================================================
> > --- userspace-rcu.orig/urcu-call-rcu-impl.h
> > +++ userspace-rcu/urcu-call-rcu-impl.h
> > @@ -242,7 +242,8 @@ static void *call_rcu_thread(void *arg)
> >  		else {
> >  			if (&crdp->cbs.head == _CMM_LOAD_SHARED(crdp->cbs.tail))
> >  				call_rcu_wait(crdp);
> > -			poll(NULL, 0, 10);
> > +			else
> > +				poll(NULL, 0, 10);
> >  		}
> >  	}
> >  	call_rcu_lock(&crdp->mtx);
> > 
> 
> -- 
> Mathieu Desnoyers
> Operating System Efficiency R&D Consultant
> EfficiOS Inc.
> http://www.efficios.com



