[lttng-dev] [RFC PATCH] wfqueue: expand API, simplify implementation, small performance boost
Lai Jiangshan
eag0628 at gmail.com
Thu Aug 16 21:57:34 EDT 2012
Have the patches been sent? I'm on vacation, so I can't receive them.
__cds_wfq_for_each_blocking_safe() sounds like we can delete any node
inside the for_each(), but that breaks the semantics of a QUEUE, and
users would not know how to delete an arbitrary node. for_each_safe()
is really just a special case for call_rcu_thread(), which destroys
all nodes in a temporary queue.
So I prefer to use sync_next() in call_rcu_thread(), as in my original
patch; it is also very simple now. The for_each_safe() sugar is not
required in call_rcu_thread().
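To illustrate what I mean, here is a rough single-file sketch of a
call_rcu_thread()-style drain using sync_next() directly. The names,
node layout, and helpers are all illustrative, not the actual urcu API;
the real code also needs proper memory ordering, which I omit here:

```c
#include <stdatomic.h>
#include <stddef.h>

/* Illustrative node; not the real cds_wfq_node layout. */
struct node {
	_Atomic(struct node *) next;
	void (*func)(struct node *n);	/* callback, as in call_rcu */
};

struct queue {
	struct node head;		/* dummy node */
	_Atomic(struct node *) tail;
};

static void queue_init(struct queue *q)
{
	atomic_store(&q->head.next, NULL);
	atomic_store(&q->tail, &q->head);
}

static void enqueue(struct queue *q, struct node *n)
{
	struct node *old;

	atomic_store(&n->next, NULL);
	old = atomic_exchange(&q->tail, n);	/* 1 xchg */
	atomic_store(&old->next, n);		/* publish link afterward */
}

/*
 * sync_next: an enqueuer first xchg()s the tail, then publishes the
 * link from the old tail to the new node.  A consumer that observes
 * the moved tail may have to busy-wait for that link to appear.
 */
static struct node *sync_next(struct node *n)
{
	struct node *next;

	while ((next = atomic_load(&n->next)) == NULL)
		;	/* enqueuer in progress: spin until link appears */
	return next;
}

/*
 * Drain as call_rcu_thread() would: detach everything with one xchg,
 * then destroy every node of the detached chain.  The next pointer is
 * read before the callback runs, since the callback may free the node.
 * No for_each_safe() sugar needed.  Returns the number of callbacks run.
 */
static int drain_and_run(struct queue *q)
{
	struct node *n, *next, *last;
	int count = 0;

	last = atomic_exchange(&q->tail, &q->head);
	if (last == &q->head)
		return 0;		/* queue was empty */
	n = sync_next(&q->head);
	atomic_store(&q->head.next, NULL);
	for (;;) {
		next = (n == last) ? NULL : sync_next(n);
		n->func(n);		/* may free n */
		count++;
		if (!next)
			break;
		n = next;
	}
	return count;
}
```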
=====
I think we need a dequeue_all() API.
splice() = dequeue_all() + enqueue(list):
dequeue_all(): 1 xchg()
enqueue(): 1 xchg()
splice(): 2 xchg()
Without dequeue_all(), users have to use splice() with a temporary
queue instead, which is not straightforward.
So it is worth having dequeue_all().
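The cost accounting above can be shown in a small single-consumer
sketch (illustrative names and layout, not the urcu API; the real code
needs memory-ordering annotations on the next-pointer publish):

```c
#include <stdatomic.h>
#include <stddef.h>

/* Illustrative layout, single consumer assumed. */
struct node {
	struct node *next;
	int value;
};

struct queue {
	struct node head;		/* dummy node */
	_Atomic(struct node *) tail;
};

static void queue_init(struct queue *q)
{
	q->head.next = NULL;
	atomic_store(&q->tail, &q->head);
}

static void enqueue(struct queue *q, struct node *n)	/* 1 xchg */
{
	struct node *old;

	n->next = NULL;
	old = atomic_exchange(&q->tail, n);
	old->next = n;	/* real code publishes this with proper ordering */
}

/* dequeue_all: detach the whole chain, reset the queue: 1 xchg. */
static struct node *dequeue_all(struct queue *q)
{
	struct node *first;

	if (atomic_exchange(&q->tail, &q->head) == &q->head)
		return NULL;	/* was empty */
	first = q->head.next;
	q->head.next = NULL;
	return first;
}

/* splice: dequeue_all() on src + one enqueue-like xchg on dst: 2 xchg. */
static void splice(struct queue *dst, struct queue *src)
{
	struct node *first, *last, *old;

	last = atomic_exchange(&src->tail, &src->head);	/* xchg #1 */
	if (last == &src->head)
		return;		/* src was empty */
	first = src->head.next;
	src->head.next = NULL;
	old = atomic_exchange(&dst->tail, last);	/* xchg #2 */
	old->next = first;
}
```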
=====
the first, smallest patch can be:
enqueue(), sync_next(), dequeue(), dequeue_all(), splice().
Leave everything else to future patches.
Thanks,
Lai.
The "other things" list, in my view:
- What memory barriers should be added to dequeue()?
- for_each() APIs and their semantics.
- cache-line-optimized queue, or the current (non-cache-line-optimized)
  one, or both, and the related compatibility questions.
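On the last point, a cache-line-optimized layout could look roughly
like the sketch below. The structure name, the alignment, and the
64-byte line size are all assumptions for illustration, not the urcu
definitions; the point is that sizeof() grows, which is exactly the
compatibility problem discussed later in this thread:

```c
#include <stddef.h>

#define CACHE_LINE_SIZE 64	/* assumption; the real value is arch-dependent */

struct node {
	struct node *next;
};

/*
 * Hypothetical cache-line-optimized layout: the dequeue side (head)
 * and the enqueue side (tail) live on different cache lines, so
 * enqueuers and dequeuers no longer false-share.
 */
struct queue_clopt {
	_Alignas(CACHE_LINE_SIZE) struct node head;
	_Alignas(CACHE_LINE_SIZE) struct node *tail;
};
```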
On Fri, Aug 17, 2012 at 5:11 AM, Mathieu Desnoyers
<mathieu.desnoyers at efficios.com> wrote:
> * Lai Jiangshan (eag0628 at gmail.com) wrote:
>> We need the smallest patch first; all other things can be left for discussion.
>> no free_each(), simple changes.
>
> I guess you mean for_each().
>
> If we limit ourselves to the versions where the user does the locking,
> I don't think there were any issues left.
>
> In the version I provided (in the last series of 2 patches), I took care
> of all issues that were raised in our email discussion.
>
> I prefer to introduce these new API members all in one go, mainly because
> I really don't want to add new API members (exposed through the public API)
> and then remove them afterward when we notice that they expose too many
> details. I think the current splice, dequeue, next, and first API
> members allow any user to do the kind of use-case that call_rcu is
> doing: this lets us achieve your original goal of not duplicating the
> code everywhere.
>
> If you still notice issues with __cds_wfq_for_each_blocking() and
> __cds_wfq_for_each_blocking_safe() in the last patch, please let me
> know,
>
> Thanks !
>
> Mathieu
>
>>
>> thanks,
>> Lai
>>
>> On Thu, Aug 16, 2012 at 10:08 AM, Lai Jiangshan <eag0628 at gmail.com> wrote:
>> >>>
>> >>> Is it false sharing?
>> >>> Access to q->head.next and access to q->tail have the same performance
>> >>> because they are in the same cache line.
>> >>
>> >> Yes! you are right! And a quick benchmark confirms it:
>> >>
>> >> with head and tail on same cache line:
>> >>
>> >> SUMMARY /home/compudj/doc/userspace-rcu/tests/.libs/lt-test_urcu_wfq testdur 10 nr_enqueuers 1 wdelay 0 nr_dequeuers 1 rdur 0 nr_enqueues 100833595 nr_dequeues 88647134 successful enqueues 100833595 successful dequeues 88646898 end_dequeues 12186697 nr_ops 189480729
>> >>
>> >> with a 256 bytes padding between head and tail, keeping the mutex on the
>> >> "head" cache line:
>> >>
>> >> SUMMARY /home/compudj/doc/userspace-rcu/tests/.libs/lt-test_urcu_wfq testdur 10 nr_enqueuers 1 wdelay 0 nr_dequeuers 1 rdur 0 nr_enqueues 228992829 nr_dequeues 228921791 successful enqueues 228992829 successful dequeues 228921367 end_dequeues 71462 nr_ops 457914620
>> >>
>> >> enqueue: 127% speedup
>> >> dequeue: 158% speedup
>> >>
>> >> That is indeed a _really_ huge difference. However, to get this, we
>> >> would have to increase the size of struct cds_wfq_queue beyond its
>> >> current size, which would break API compatibility. Any idea on how to
>> >> best do this without causing incompatibility would be welcome.
>> >>
>> >
>> > choice 1) Two sets of APIs (cache-line-optimized and
>> > non-cache-line-optimized)? Many users don't need the
>> > cache-line-optimized one.
>> > choice 2) Just break compatibility for non-LGPL users. I think
>> > non-LGPL users of it are rare, and the current version of urcu is
>> > <1.0; I don't want to carry too much burden while we are <1.0.
>> >
>> >
>> > thanks,
>> > Lai
>
> --
> Mathieu Desnoyers
> Operating System Efficiency R&D Consultant
> EfficiOS Inc.
> http://www.efficios.com