[lttng-dev] [RFC PATCH] wfqueue: expand API, simplify implementation, small performance boost

Wed Aug 15 22:08:54 EDT 2012

>>
>> Is it false sharing?
>> Access to q->head.next and access to q->tail have the same performance
>> because they are in the same cache line.
>
> Yes! you are right! And a quick benchmark confirms it:
>
> with head and tail on same cache line:
>
> SUMMARY /home/compudj/doc/userspace-rcu/tests/.libs/lt-test_urcu_wfq testdur   10 nr_enqueuers   1 wdelay      0 nr_dequeuers   1 rdur      0 nr_enqueues    100833595 nr_dequeues     88647134 successful enqueues 100833595 successful dequeues     88646898 end_dequeues 12186697 nr_ops 189480729
>
> with 256 bytes of padding between head and tail, keeping the mutex on the
> "head" cache line:
>
> SUMMARY /home/compudj/doc/userspace-rcu/tests/.libs/lt-test_urcu_wfq testdur   10 nr_enqueuers   1 wdelay      0 nr_dequeuers   1 rdur      0 nr_enqueues    228992829 nr_dequeues    228921791 successful enqueues 228992829 successful dequeues    228921367 end_dequeues 71462 nr_ops 457914620
>
> enqueue: 127% speedup
> dequeue: 158% speedup
>
> That is indeed a _really_ huge difference. However, to get this, we
> would have to increase the size of struct cds_wfq_queue beyond its
> current size, which would break API compatibility. Any idea on how to
> best do this without causing incompatibility would be welcome.
>

choice 1) Two sets of APIs (cache-line-optimized and
non-cache-line-optimized)? Many users don't need the cache-line
optimization.
choice 2) Just break compatibility for non-LGPL users. I think non-LGPL
users of it are rare. And the current version of urcu is <1.0; I don't
want to carry too much compatibility burden before 1.0.
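
For choice 1, a minimal sketch of what a padded variant next to the
current compact layout could look like. All names here (wfq_node,
wfq_queue_compact, wfq_queue_padded, WFQ_PAD) are hypothetical and
simplified for illustration; they are not the actual urcu definitions.
The 256 is simply the padding used in the benchmark above, not a
measured cache-line size.

```c
#include <pthread.h>
#include <stddef.h>

/* Assumption: 256 bytes, the padding used in the benchmark above. */
#define WFQ_PAD 256

/* Simplified stand-in for the queue node (hypothetical). */
struct wfq_node {
	struct wfq_node *next;
};

/*
 * Compact layout: head and tail are adjacent, so they normally end up
 * in the same cache line and the enqueuer and dequeuer contend on it.
 */
struct wfq_queue_compact {
	struct wfq_node head;
	struct wfq_node *tail;
	pthread_mutex_t lock;
};

/*
 * Padded layout: the dequeue side (head + lock) stays together and the
 * enqueue side (tail) is pushed at least WFQ_PAD bytes away.
 */
struct wfq_queue_padded {
	struct wfq_node head;
	pthread_mutex_t lock;
	char pad[WFQ_PAD];
	struct wfq_node *tail;
};

/* Distance in bytes between head and tail in the compact layout. */
static size_t wfq_compact_gap(void)
{
	return offsetof(struct wfq_queue_compact, tail)
		- offsetof(struct wfq_queue_compact, head);
}

/* Distance in bytes between head and tail in the padded layout. */
static size_t wfq_padded_gap(void)
{
	return offsetof(struct wfq_queue_padded, tail)
		- offsetof(struct wfq_queue_padded, head);
}
```

The padded struct is larger than the compact one, which is exactly why
it cannot silently replace the existing type: callers that allocate
the queue themselves would need to be rebuilt against the new size.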

thanks,
Lai