[ltt-dev] [URCU RFC patch 3/3] call_rcu: remove delay for wakeup scheme

Mathieu Desnoyers mathieu.desnoyers at efficios.com
Mon Jun 6 15:21:07 EDT 2011

* Mathieu Desnoyers (mathieu.desnoyers at efficios.com) wrote:
> I notice that the "poll(NULL, 0, 10);" delay is executed both for the RT
> and non-RT code.  So given that my goal is to get the call_rcu thread to
> GC memory as quickly as possible to diminish the overhead of cache
> misses, I decided to try removing this delay for !RT: the call_rcu
> thread then wakes up ASAP when the thread invoking call_rcu wakes it. My
> updates jump to 76349/s (getting there!) ;).
> This improvement can be explained by a lower delay between call_rcu and
> execution of its callback, which decreases the amount of cache used, and
> therefore provides better cache locality.

I just wonder if it's worth it: removing this delay from the !RT
call_rcu thread can cause a high rate of synchronize_rcu() calls. So
although there might be an advantage in terms of update rate, it will
likely cause extra cache-line bounces between the call_rcu threads and
the reader threads.

test_urcu_rbtree 7 1 20 -g 1000000

With the delay in the call_rcu thread:
search:  1842857 items/reader thread/s (7 reader threads)
updates:   21066 items/s (1 update thread)
ratio: 87 search/update

Without the delay in the call_rcu thread:
search:  3064285 items/reader thread/s (7 reader threads)
updates:   45096 items/s (1 update thread)
ratio: 68 search/update
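
As a quick sanity check on the figures above, the "ratio" lines and the
update speedup can be recomputed from the raw rates. This is only a
sketch: the helper names are made up for illustration, and
round-to-nearest is an assumption that happens to reproduce the printed
87 and 68 (the rounding used by test_urcu_rbtree is not shown here).

```c
#include <stdio.h>

/*
 * Search/update ratio: per-reader search rate divided by the update
 * rate, rounded to the nearest integer (assumed rounding mode).
 */
static unsigned long search_update_ratio(unsigned long search_per_reader,
					 unsigned long updates)
{
	return (search_per_reader + updates / 2) / updates;
}

/* Update rate without the delay, as a percentage of the rate with it. */
static unsigned long update_rate_percent(unsigned long with_delay,
					 unsigned long without_delay)
{
	return without_delay * 100 / with_delay;
}

/* Print the derived figures for the two runs quoted above. */
static void print_summary(void)
{
	printf("ratio with delay:    %lu\n",
	       search_update_ratio(1842857UL, 21066UL));
	printf("ratio without delay: %lu\n",
	       search_update_ratio(3064285UL, 45096UL));
	printf("update rate without delay: %lu%% of with-delay rate\n",
	       update_rate_percent(21066UL, 45096UL));
}
```

Calling print_summary() from a main() should report ratios of 87 and 68,
and an update rate without the delay of about 214% of the rate with it.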

So basically, removing the delay doubles the update performance, at the
cost of being 33% slower for reads. My first thought is that if an
application has very frequent updates, then maybe it wants to have fast
updates because the update throughput is then important. If the
application has infrequent updates, then the reads will be fast anyway,
because rare call_rcu invocations will trigger fewer cache-line bounces
between readers and writers. Any other thoughts on this trade-off and
how to deal with it?



> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
> ---
>  urcu-call-rcu-impl.h |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> Index: userspace-rcu/urcu-call-rcu-impl.h
> ===================================================================
> --- userspace-rcu.orig/urcu-call-rcu-impl.h
> +++ userspace-rcu/urcu-call-rcu-impl.h
> @@ -242,7 +242,8 @@ static void *call_rcu_thread(void *arg)
>  		else {
>  			if (&crdp->cbs.head == _CMM_LOAD_SHARED(crdp->cbs.tail))
>  				call_rcu_wait(crdp);
> -			poll(NULL, 0, 10);
> +			else
> +				poll(NULL, 0, 10);
>  		}
>  	}
>  	call_rcu_lock(&crdp->mtx);

Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
