[lttng-dev] [RFC PATCH] wfqueue: expand API, simplify implementation, small performance boost

Mathieu Desnoyers mathieu.desnoyers at efficios.com
Thu Aug 16 17:11:55 EDT 2012


* Lai Jiangshan (eag0628 at gmail.com) wrote:
> We need the smallest patch first; leave everything else for discussion.
> No free_each(), just simple changes.

I guess you mean for_each().

If we limit ourselves to the versions where the user does the locking,
I don't think there were any issues left.

In the version I provided (in the last series of 2 patches), I took care
of all issues that were raised in our email discussion.

I prefer to introduce these new API members all at once, mainly because
I really don't want to add new members to the public API and then remove
them later when we notice that they expose too many implementation
details. I think the current splice, dequeue, next, and first API
members let any user implement the kind of use case call_rcu has:
this achieves your original goal of not duplicating that code
everywhere.

If you still notice issues with __cds_wfq_for_each_blocking() and
__cds_wfq_for_each_blocking_safe() in the last patch, please let me
know.

Thanks !

Mathieu

> 
> thanks,
> Lai
> 
> On Thu, Aug 16, 2012 at 10:08 AM, Lai Jiangshan <eag0628 at gmail.com> wrote:
> >>>
> >>> Is it false sharing?
> >>> Accesses to q->head.next and q->tail contend with each other
> >>> because they are in the same cache line.
> >>
> >> Yes, you are right! And a quick benchmark confirms it:
> >>
> >> with head and tail on same cache line:
> >>
> >> SUMMARY /home/compudj/doc/userspace-rcu/tests/.libs/lt-test_urcu_wfq testdur   10 nr_enqueuers   1 wdelay      0 nr_dequeuers   1 rdur      0 nr_enqueues    100833595 nr_dequeues     88647134 successful enqueues 100833595 successful dequeues     88646898 end_dequeues 12186697 nr_ops 189480729
> >>
> >> with 256 bytes of padding between head and tail, keeping the mutex on the
> >> "head" cache line:
> >>
> >> SUMMARY /home/compudj/doc/userspace-rcu/tests/.libs/lt-test_urcu_wfq testdur   10 nr_enqueuers   1 wdelay      0 nr_dequeuers   1 rdur      0 nr_enqueues    228992829 nr_dequeues    228921791 successful enqueues 228992829 successful dequeues    228921367 end_dequeues 71462 nr_ops 457914620
> >>
> >> enqueue: 127% speedup
> >> dequeue: 158% speedup
> >>
> >> That is indeed a _really_ huge difference. However, to get this, we
> >> would have to increase the size of struct cds_wfq_queue beyond its
> >> current size, which would break ABI compatibility. Any ideas on how
> >> best to do this without breaking compatibility would be welcome.
> >>
> >
> > Choice 1) Two sets of APIs (cache-line-optimized and non-optimized);
> > many users don't need the cache-line optimization.
> > Choice 2) Just break compatibility for non-LGPL users. I think
> > non-LGPL users of it are rare. And the current version of urcu is
> > below 1.0; I don't want to carry too much compatibility burden before 1.0.
> >
> >
> > thanks,
> > Lai

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com


