[lttng-dev] [RFC PATCH] wfqueue: expand API, simplify implementation, small performance boost
Lai Jiangshan
eag0628 at gmail.com
Wed Aug 15 22:08:54 EDT 2012
>>
>> Is it false sharing?
>> Access to q->head.next and access to q->tail have the same performance
>> because they are in the same cache line.
>
> Yes! You are right! And a quick benchmark confirms it:
>
> with head and tail on same cache line:
>
> SUMMARY /home/compudj/doc/userspace-rcu/tests/.libs/lt-test_urcu_wfq testdur 10 nr_enqueuers 1 wdelay 0 nr_dequeuers 1 rdur 0 nr_enqueues 100833595 nr_dequeues 88647134 successful enqueues 100833595 successful dequeues 88646898 end_dequeues 12186697 nr_ops 189480729
>
> with 256 bytes of padding between head and tail, keeping the mutex on the
> "head" cache line:
>
> SUMMARY /home/compudj/doc/userspace-rcu/tests/.libs/lt-test_urcu_wfq testdur 10 nr_enqueuers 1 wdelay 0 nr_dequeuers 1 rdur 0 nr_enqueues 228992829 nr_dequeues 228921791 successful enqueues 228992829 successful dequeues 228921367 end_dequeues 71462 nr_ops 457914620
>
> enqueue: 127% speedup
> dequeue: 158% speedup
>
> That is indeed a _really_ huge difference. However, to get this, we
> would have to increase the size of struct cds_wfq_queue beyond its
> current size, which would break API compatibility. Any idea on how to
> best do this without causing incompatibility would be welcome.
>
Choice 1) Provide two sets of APIs (cache-line-optimized and
non-cache-line-optimized)? Many users don't need the cache-line
optimization.
Choice 2) Just break compatibility for non-LGPL users. I think non-LGPL
users of it are rare. Also, the current version of urcu is < 1.0, and I
don't want to carry too much compatibility burden before 1.0.
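
For reference, here is a rough sketch of the layout change being discussed: a compact queue where head and tail sit next to each other, and a second, padded variant that keeps the mutex next to head and pushes tail 256 bytes away, as in the benchmark quoted above. All type and function names here (sketch_node, sketch_wfq_compact, sketch_wfq_padded, compact_gap, padded_gap, check_layouts) are made up for illustration; this is not the actual definition of struct cds_wfq_queue, only an assumption about what such a pair of layouts could look like.

```c
#include <assert.h>
#include <stddef.h>
#include <pthread.h>

#define SKETCH_PAD_BYTES 256	/* padding size used in the quoted benchmark */

/* Hypothetical node type, for illustration only. */
struct sketch_node {
	struct sketch_node *next;
};

/* Compact layout: head and tail are adjacent, so they will usually
 * land on the same cache line and enqueuers/dequeuers contend on it. */
struct sketch_wfq_compact {
	struct sketch_node head;
	struct sketch_node **tail;
	pthread_mutex_t lock;
};

/* Padded layout: the mutex stays next to head, and tail is pushed
 * away by explicit padding. The struct is larger than the compact one. */
struct sketch_wfq_padded {
	struct sketch_node head;
	pthread_mutex_t lock;
	char pad[SKETCH_PAD_BYTES];
	struct sketch_node **tail;
};

/* Distance in bytes between head and tail in the compact layout. */
size_t compact_gap(void)
{
	return offsetof(struct sketch_wfq_compact, tail)
		- offsetof(struct sketch_wfq_compact, head);
}

/* Distance in bytes between head and tail in the padded layout. */
size_t padded_gap(void)
{
	return offsetof(struct sketch_wfq_padded, tail)
		- offsetof(struct sketch_wfq_padded, head);
}

/* Runtime sanity check: only the padded layout separates head and
 * tail by at least the padding size. */
void check_layouts(void)
{
	assert(compact_gap() < SKETCH_PAD_BYTES);
	assert(padded_gap() >= SKETCH_PAD_BYTES);
}
```

Keeping both types side by side is roughly what choice 1 would mean in practice: existing users keep the small struct, and only users who care about false sharing pay for the larger one.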
Thanks,
Lai