[lttng-dev] [RFC URCU PATCH] wfqueue: ABI v1, simplify implementation, 2.3x to 2.6x performance boost

Mon Aug 20 23:59:24 EDT 2012

On 08/21/2012 09:03 AM, Mathieu Desnoyers wrote:
> * Lai Jiangshan (laijs at cn.fujitsu.com) wrote:
>> On 08/20/2012 09:16 PM, Mathieu Desnoyers wrote:
>>> * Lai Jiangshan (laijs at cn.fujitsu.com) wrote:
>>>> On 08/16/2012 05:31 AM, Mathieu Desnoyers wrote:
>>>>> This work is derived from the patch from Lai Jiangshan submitted as
>>>>> "urcu: new wfqueue implementation"
>>>>> (http://lists.lttng.org/pipermail/lttng-dev/2012-August/018379.html)
>>>>>
>>>>
>>>
>>> Hi Lai,
>>>
>>>> Hi, Mathieu
>>>>
>>>> please add this part to your patch.
>>>>
>>>> Changes(item5 has alternative):
>>>>
>>>> 1) Reorder the head and the tail in struct cds_wfq_queue
>>>>    Reason: wfq is enqueue-preference
>>>
>>> Is this a technical improvement over the original order ? Both cache
>>> lines are aligned, so it should not matter which one comes first. So I'm
>>> not sure why we should favor one order vs the other.
>>
>> struct astruct {
>> 	int foo; /* field offset = 0 */
>> 	int bar; /* field offset = 4 */
>> };
>>
>> Access to p->foo and access to p->bar are different in CPU instruction.
>> Access to p->bar will need to add a offset to p. (the addition may cost a very small time)
>> instructions will be probably longer than the instructions access to p->foo.
>>
>> If the code of access to p->bar is significant more than the code of access to p->foo,
>> we should swap p->bar and p->foo field's order to shorten the total instruction size
>> to reduce footprint.
>>
>> Our enqueue is inline function, when this inline function is expanded in every call site,
>> the code of access to the tail will much significant, so we need to put the tail at first.
>>
> 
> Very good point! I'm pulling this change too.
> 
>>
>>
>>>
>>>> 2) Reorder the code of ___cds_wfq_next_blocking()
>>>>    Reason: short code, better readability
>>>
>>> Yes, I agree with this change. Can you extract it into a separate patch ?
>>
>> You merge it into your patch which ___cds_wfq_next_blocking() is introduced.
>>
> 
> OK. done,
> 
>>>
>>>>
>>>> 3) Add cds_wfq_dequeue_[un]lock
>>>>    Reason: the fields of struct cds_wfq_queue are not exposed to users of lib,
>>>> 	but some APIs needs to have this lock held. Add this locking function
>>>> 	to allow the users use such APIs
>>>
>>> I agree with this change too. You can put it into the same patch as (2).
>>
>> You merge it into your patch.
> 
> OK, done too.
> 
> The current version (volatile branch) is at 
> 
> git://git.dorsal.polymtl.ca/~compudj/userspace-rcu
> branch: urcu/wfqueue-volatile
> 
> Thanks!
> 

We can merge all things into a single patch until the discussion are finished.
I don't want to have several commits in the git-tree that every
commit fix some thing of the previous commit.

When the big single patch is done, we can separate the patch properly.
It will help others to understand the code from git-log.

I will be very appreciate if one or two small patch(s) is committed with my name.

Thanks,
Lai.