[ltt-dev] [rp] [URCU RFC patch 3/3] call_rcu: remove delay for wakeup scheme

Mathieu Desnoyers mathieu.desnoyers at efficios.com
Mon Jun 6 18:29:37 EDT 2011


* Phil Howard (pwh at cecs.pdx.edu) wrote:
> On Mon, Jun 6, 2011 at 12:21 PM, Mathieu Desnoyers
> <mathieu.desnoyers at efficios.com> wrote:
> > * Mathieu Desnoyers (mathieu.desnoyers at efficios.com) wrote:
> >> I notice that the "poll(NULL, 0, 10);" delay is executed both for the RT
> >> and non-RT code.  So given that my goal is to get the call_rcu thread to
> >> GC memory as quickly as possible to diminish the overhead of cache
> >> misses, I decided to try removing this delay for !RT: the call_rcu
> >> thread then wakes up ASAP when the thread invoking call_rcu wakes it. My
> >> updates jump to 76349/s (getting there!) ;).
> >>
> >> This improvement can be explained by a lower delay between call_rcu and
> >> execution of its callback, which decreases the amount of cache used, and
> >> therefore provides better cache locality.
> >
> > I just wonder if it's worth it: removing this delay from the !RT
> > call_rcu thread can cause a high rate of synchronize_rcu() calls. So
> > although there might be an advantage in terms of update rate, it will
> > likely cause extra cache-line bounces between the call_rcu threads and
> > the reader threads.
> >
> > test_urcu_rbtree 7 1 20 -g 1000000
> >
> > With the delay in the call_rcu thread:
> > search:  1842857 items/reader thread/s (7 reader threads)
> > updates:   21066 items/s (1 update thread)
> > ratio: 87 search/update
> >
> > Without the delay in the call_rcu thread:
> > search:  3064285 items/reader thread/s (7 reader threads)
> > updates:   45096 items/s (1 update thread)
> > ratio: 68 search/update
> >
> > So basically, adding the delay doubles the update performance, at the
> > cost of being 33% slower for reads. My first thought is that if an
> > application has very frequent updates, then maybe it wants to have fast
> > updates because the update throughput is then important. If the
> > application has infrequent updates, then the reads will be fast anyway,
> > because rare call_rcu invocations will trigger fewer cache-line bounces
> > between readers and writers. Any other thoughts on this trade-off and
> > how to deal with it?
> >
> 
> Did I miss something here? It looks like you more than doubled the
> update rate and almost doubled the lookup rate. The search/update
> ratio is lower, but if both raw rates improved so much, how is
> this a bad thing?

Actually, my discussion of the results was sound, but I mis-entered the
raw numbers. Here is a re-run of the tests, with the results entered
correctly this time. I notice that on repeated runs, the update rates
are much closer between the delay and no-delay cases than the original
difference I reported.

test_urcu_rbtree 7 1 20 -g 1000000

With the delay in the call_rcu thread:
search:  3064285 items/reader thread/s (7 reader threads)
updates:   43051 items/s (1 update thread)
ratio:        71 search/update

Without the delay in the call_rcu thread:
search:  1550000 items/reader thread/s (7 reader threads)
updates:   47221 items/s (1 update thread)
ratio:       33 search/update

So removing the delay seems to hurt read performance quite a lot, and
does not benefit updates as much as I initially thought (only about
9.6%). I would be tempted to just leave the delay in place for the !RT
case then.
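
For reference, here is a minimal, self-contained sketch of the scheme
under discussion. This is not the actual liburcu code: the cb/cb_queue
types, enqueue_cb() and call_rcu_worker() are illustrative names only,
and a pthread condition variable stands in for the futex-based wakeup.
It only shows where the poll(NULL, 0, 10) delay sits relative to the
grace-period wait and the callback invocation:

#include <poll.h>
#include <pthread.h>
#include <stddef.h>
#include <urcu.h>               /* synchronize_rcu() */

struct cb {                     /* retired object + its reclaim function */
        struct cb *next;
        void (*func)(struct cb *cb);
};

struct cb_queue {               /* illustrative stand-in for the callback queue */
        pthread_mutex_t lock;
        pthread_cond_t wakeup;  /* stand-in for the futex-based wakeup */
        struct cb *head;
        int rt;                 /* nonzero: real-time variant */
};

/* Enqueue side: roughly what call_rcu() does. */
static void enqueue_cb(struct cb_queue *q, struct cb *cb)
{
        pthread_mutex_lock(&q->lock);
        cb->next = q->head;
        q->head = cb;
        pthread_cond_signal(&q->wakeup);        /* wake the worker */
        pthread_mutex_unlock(&q->lock);
}

/* Worker side: drain the queue, wait for a grace period, reclaim. */
static void *call_rcu_worker(void *arg)
{
        struct cb_queue *q = arg;

        for (;;) {
                struct cb *list, *next;

                pthread_mutex_lock(&q->lock);
                while (!q->rt && !q->head)
                        pthread_cond_wait(&q->wakeup, &q->lock); /* !RT: sleep until woken */
                list = q->head;
                q->head = NULL;
                pthread_mutex_unlock(&q->lock);

                if (list) {
                        synchronize_rcu();              /* wait for pre-existing readers */
                        for (; list; list = next) {     /* then reclaim */
                                next = list->next;
                                list->func(list);
                        }
                }

                /*
                 * The delay under discussion, executed for RT and !RT alike.
                 * Removing it for !RT means the loop restarts as soon as the
                 * worker is woken, hence more frequent synchronize_rcu() calls.
                 */
                poll(NULL, 0, 10);
        }
        return NULL;
}

Keeping the delay lets several callbacks accumulate so that a single
synchronize_rcu() covers all of them, which is what limits the
cache-line bouncing between the call_rcu worker and the reader threads;
dropping it for !RT makes the worker run again as soon as the enqueue
side signals it.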

Thanks,

Mathieu


-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com



