[lttng-dev] Userspace RCU: workqueue with batching, cheap wakeup, and work stealing
Lai Jiangshan
laijs at cn.fujitsu.com
Thu Oct 23 21:45:59 EDT 2014
On 10/23/2014 09:06 PM, Mathieu Desnoyers wrote:
> ----- Original Message -----
>> From: "Lai Jiangshan" <laijs at cn.fujitsu.com>
>> To: "Mathieu Desnoyers" <mathieu.desnoyers at efficios.com>
>> Cc: "Ben Maurer" <bmaurer at fb.com>, "lttng-dev" <lttng-dev at lists.lttng.org>, "Yannick Brosseau" <scientist at fb.com>,
>> "Paul E. McKenney" <paulmck at linux.vnet.ibm.com>, "Stephen Hemminger" <stephen at networkplumber.org>
>> Sent: Thursday, October 23, 2014 12:26:58 AM
>> Subject: Re: Userspace RCU: workqueue with batching, cheap wakeup, and work stealing
>>
>> Hi, Mathieu
>
> Hi Lai,
>
>>
>> In the code, how is this problem solved?
>> "All the tasks may continue stealing work without making any progress."
>
> Good point. It is not. I see two possible solutions for this:
>
> 1) We break the circular chain of siblings at the list head.
> This has the downside that work could accumulate at the first
> worker of the list, or
> 2) Whenever we grab work (either from the global queue or from
> a sibling), we could dequeue one element and keep it to ourselves,
> out of reach of siblings. This would ensure progress, since a
> worker stealing work would always keep at least one work element
> for itself.
>
> I personally prefer (2), since it has fewer downsides.
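>
> Roughly, a sketch of (2) (the per-worker queue fields in struct
> urcu_worker here are illustrative, not the actual implementation):
>
> #include <urcu/wfcqueue.h>
>
> struct urcu_worker {
>         struct cds_wfcq_head head;      /* per-worker work queue */
>         struct cds_wfcq_tail tail;
> };
>
> /*
>  * Always dequeue one element for ourselves before splicing the
>  * rest, so a chain of stealing workers cannot livelock: every
>  * successful grab processes at least one work element.
>  */
> static struct cds_wfcq_node *
> urcu_grab_work(struct urcu_worker *worker,
>                 struct cds_wfcq_head *src_head,
>                 struct cds_wfcq_tail *src_tail)
> {
>         struct cds_wfcq_node *work;
>
>         /* Keep one element to ourselves, out of reach of siblings. */
>         work = cds_wfcq_dequeue_blocking(src_head, src_tail);
>         if (!work)
>                 return NULL;
>         /* Move the remainder to our queue; siblings may steal it. */
>         (void) cds_wfcq_splice_blocking(&worker->head, &worker->tail,
>                         src_head, src_tail);
>         return work;
> }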
>
>>
>> Why does ___urcu_steal_work() need cds_wfcq_dequeue_lock()?
>
> This lock is to protect operations on the sibling workqueue.
> Here are the various actors touching it:
>
> - The worker attempting to steal from the sibling (with splice-from),
> - The sibling itself, in urcu_dequeue_work(), with a dequeue
> operation.
>
> Looking at the wfcq synchronization table:
>
> * Synchronization table:
> *
> * External synchronization techniques described in the API below are
> * required between pairs marked with "X". No external synchronization
> * is required between pairs marked with "-".
> *
> * Legend:
> * [1] cds_wfcq_enqueue
> * [2] __cds_wfcq_splice (destination queue)
> * [3] __cds_wfcq_dequeue
> * [4] __cds_wfcq_splice (source queue)
> * [5] __cds_wfcq_first
> * [6] __cds_wfcq_next
> *
> *        [1] [2] [3] [4] [5] [6]
> *    [1]  -   -   -   -   -   -
> *    [2]  -   -   -   -   -   -
> *    [3]  -   -   X   X   X   X
> *    [4]  -   -   X   -   X   X
> *    [5]  -   -   X   X   -   -
> *    [6]  -   -   X   X   -   -
>
> We need a mutex between splice-from and dequeue since they
> are executed by different threads. I can replace this with
> a call to cds_wfcq_splice_blocking() instead, which takes the
> lock internally. That would be neater.
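>
> That is, something like (sketch; same illustrative struct
> urcu_worker as in the previous snippet):
>
> /*
>  * Steal all work from a sibling by splicing its queue into ours.
>  * cds_wfcq_splice_blocking() takes the dequeue lock of the source
>  * queue internally, so the explicit cds_wfcq_dequeue_lock() /
>  * cds_wfcq_dequeue_unlock() pair goes away.
>  */
> static enum cds_wfcq_ret
> ___urcu_steal_work(struct urcu_worker *self, struct urcu_worker *sibling)
> {
>         return cds_wfcq_splice_blocking(&self->head, &self->tail,
>                         &sibling->head, &sibling->tail);
> }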
>
>>
>> It is not a good idea for a worker to call synchronize_rcu() (in
>> urcu_accept_work()).
>> New work may arrive soon.
>
> The reason why the worker calls synchronize_rcu() there is to protect
> the waitqueue lfstack "pop" operation from ABA. It matches the RCU
> read-side critical sections encompassing urcu_dequeue_wake_single().
>
> Another way to achieve this would be to grab a mutex across calls to
> urcu_dequeue_wake_single(), but unfortunately this call needs to be
> done by the dispatcher thread, and I'm trying to keep a low-latency
> bias in favor of the dispatcher here. In this case, it is done at the
> expense of overhead in the worker when the worker is about to busy-wait
> and eventually block, so I thought the tradeoff was not so bad.
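>
> The pattern, sketched (the struct urcu_wait_node layout is
> illustrative; the RCU scheme is one of the synchronization
> techniques urcu/lfstack.h describes for __cds_lfs_pop, assuming
> the dispatcher is the only popper):
>
> #include <urcu.h>               /* rcu_read_lock(), synchronize_rcu() */
> #include <urcu/lfstack.h>
>
> struct urcu_wait_node {
>         struct cds_lfs_node node;
>         int32_t state;          /* illustrative wakeup state */
> };
>
> /*
>  * Dispatcher side: the RCU read-side critical section guarantees
>  * the popped node is not re-pushed while we may still hold a
>  * stale reference to it, protecting the lfstack pop against ABA.
>  */
> static struct urcu_wait_node *
> urcu_dequeue_wake_single(struct cds_lfs_stack *waitqueue)
> {
>         struct cds_lfs_node *snode;
>
>         rcu_read_lock();
>         snode = __cds_lfs_pop(waitqueue);
>         rcu_read_unlock();
>         if (!snode)
>                 return NULL;
>         return caa_container_of(snode, struct urcu_wait_node, node);
> }
>
> /*
>  * Worker side, in urcu_accept_work(): wait for a grace period
>  * before re-pushing our wait node, so no dispatcher still holds
>  * a reference to it from a previous pop.
>  */
> static void
> urcu_worker_queue_wait(struct cds_lfs_stack *waitqueue,
>                 struct urcu_wait_node *wait)
> {
>         synchronize_rcu();
>         cds_lfs_push(waitqueue, &wait->node);
> }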
"single worker pattern" does be a frequent use-case.
you can image the lag and jitter in this case.
How about a mutex&rcu protected idle-workers?
idle-worker is wokenup via rcu-protected scheme.
and dequeued&requeued (in urcu_accept_work() or urcu_worker_unregister()) via mutex-protected scheme.
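
For example, a rough sketch (all names are illustrative, not actual
liburcu code):

#include <pthread.h>
#include <urcu.h>
#include <urcu/rculist.h>
#include <urcu/uatomic.h>

static CDS_LIST_HEAD(idle_workers);
static pthread_mutex_t idle_workers_lock = PTHREAD_MUTEX_INITIALIZER;

/* Worker reduced to the fields used in this sketch. */
struct urcu_worker {
        struct cds_list_head idle_node;
        int32_t futex;          /* illustrative wakeup state */
};

/*
 * Worker side (urcu_accept_work() / urcu_worker_unregister()):
 * enqueue/dequeue under the mutex only. No grace period on this
 * path, so a single worker does not pay a synchronize_rcu() delay
 * for every work item.
 */
static void urcu_worker_set_idle(struct urcu_worker *worker)
{
        pthread_mutex_lock(&idle_workers_lock);
        cds_list_add_rcu(&worker->idle_node, &idle_workers);
        pthread_mutex_unlock(&idle_workers_lock);
}

static void urcu_worker_clear_idle(struct urcu_worker *worker)
{
        pthread_mutex_lock(&idle_workers_lock);
        cds_list_del_rcu(&worker->idle_node);
        pthread_mutex_unlock(&idle_workers_lock);
}

/*
 * Dispatcher side: pick a worker to wake under an RCU read-side
 * critical section, without taking the mutex.
 */
static void urcu_wake_one_idle(void)
{
        struct urcu_worker *worker;

        rcu_read_lock();
        cds_list_for_each_entry_rcu(worker, &idle_workers, idle_node) {
                uatomic_set(&worker->futex, 1);
                /* then wake it (futex/cond var); details omitted */
                break;
        }
        rcu_read_unlock();
}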
>
> Thoughts ?
>
> Thanks for the feedback!
>
> Mathieu
>
>>
>> thanks,
>> Lai
>>
>