[lttng-dev] Re: Re: urcu workqueue thread uses 99% of cpu while workqueue is empty
Minlan Wang
wangminlan at szsandstone.com
Fri Jun 24 02:21:35 EDT 2022
On Thu, Jun 23, 2022 at 10:09:55AM -0400, Mathieu Desnoyers wrote:
>
> By looking at the trace, here is one problematic scenario which ends up emitting "workqueue:futex_wait_again":
>
> [23:33:48.060581417] (+0.000006985) localhost.localdomain workqueue:futex_no_wait: { cpu_id = 4 }, { vpid = 19495, vtid = 21054 }, { futex = 0x5652A1035C40, val = 0 }
> [23:33:48.060581747] (+0.000000330) localhost.localdomain workqueue:futex_wake_up_success: { cpu_id = 6 }, { vpid = 19495, vtid = 20649 }, { futex = 0x5652A1035C40, val = -1 }
> [23:33:48.060581926] (+0.000000179) localhost.localdomain workqueue:futex_dec_success: { cpu_id = 4 }, { vpid = 19495, vtid = 21054 }, { futex = 0x5652A1035C40, old = 0, new = -1 }
> [23:33:48.060582826] (+0.000000900) localhost.localdomain syscall_entry_futex: { cpu_id = 6 }, { vpid = 19495, vtid = 20649, pid = 19495, tid = 20649 }, { uaddr = 94912888659008, op = 1, val = 1, utime = 0, uaddr2 = 0, val3 = 0 }
> [23:33:48.060582855] (+0.000000029) localhost.localdomain syscall_entry_futex: { cpu_id = 4 }, { vpid = 19495, vtid = 21054, pid = 19495, tid = 21054 }, { uaddr = 94912888659008, op = 0, val = 4294967295, utime = 0, uaddr2 = 0, val3 = 0 }
> [23:33:48.060584722] (+0.000001867) localhost.localdomain sched_stat_runtime: { cpu_id = 4 }, { vpid = 19495, vtid = 21054, pid = 19495, tid = 21054 }, { comm = "bcache_writebac", tid = 21054, runtime = 16033, vruntime = 96940983054 }
> [23:33:48.060585840] (+0.000001118) localhost.localdomain sched_switch: { cpu_id = 4 }, { vpid = 19495, vtid = 21054, pid = 19495, tid = 21054 }, { prev_comm = "bcache_writebac", prev_tid = 21054, prev_prio = 20, prev_state = 1, next_comm = "swapper/4", next_tid = 0, next_prio = 20 }
> [23:33:48.060587680] (+0.000001840) localhost.localdomain sched_stat_sleep: { cpu_id = 6 }, { vpid = 19495, vtid = 20649, pid = 19495, tid = 20649 }, { comm = "bcache_writebac", tid = 21054, delay = 2815 }
> [23:33:48.060588560] (+0.000000880) localhost.localdomain sched_wakeup: { cpu_id = 6 }, { vpid = 19495, vtid = 20649, pid = 19495, tid = 20649 }, { comm = "bcache_writebac", tid = 21054, prio = 20, success = 1, target_cpu = 4 }
> [23:33:48.060590437] (+0.000001877) localhost.localdomain syscall_exit_futex: { cpu_id = 4 }, { vpid = 19495, vtid = 21054, pid = 19495, tid = 21054 }, { ret = 0, uaddr = 94912888659008, uaddr2 = 0 }
> [23:33:48.060591337] (+0.000000900) localhost.localdomain syscall_exit_futex: { cpu_id = 6 }, { vpid = 19495, vtid = 20649, pid = 19495, tid = 20649 }, { ret = 1, uaddr = 94912888659008, uaddr2 = 0 }
> [23:33:48.060591385] (+0.000000048) localhost.localdomain workqueue:futex_wait_return: { cpu_id = 4 }, { vpid = 19495, vtid = 21054 }, { futex = 0x5652A1035C40, val = -1, ret = 0 }
> [23:33:48.060592205] (+0.000000820) localhost.localdomain workqueue:futex_wait_again: { cpu_id = 4 }, { vpid = 19495, vtid = 21054 }, { futex = 0x5652A1035C40, val = -1 }
>
> Where the wake_up happens right before the dec_success on the waiter, which
> leads to the sched_wakeup awakening the waiter when the state is 0.
Yes, this makes things clear: urcu_workqueue_queue_work did wake up waiters on
the futex while the futex value was -1.
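For reference, here is a minimal sketch of how I read the wake-side logic
(illustrative only, not verbatim liburcu source; it assumes the usual urcu
futex convention of 0 = worker running, -1 = worker may be sleeping, and an
abridged struct definition):

	#include <errno.h>
	#include <stdint.h>
	#include <stdlib.h>
	#include <urcu/arch.h>		/* cmm_smp_mb() */
	#include <urcu/uatomic.h>	/* uatomic_read(), uatomic_set() */
	#include <urcu/futex.h>		/* futex_async(), FUTEX_WAKE */

	struct urcu_workqueue {
		int32_t futex;
		/* ... queue head/tail and qlen omitted ... */
	};

	static void wake_worker_thread(struct urcu_workqueue *workqueue)
	{
		/* Order the enqueue [a] before reading the futex. */
		cmm_smp_mb();
		/* Only wake if a waiter announced itself with -1. */
		if (uatomic_read(&workqueue->futex) != -1)
			return;
		uatomic_set(&workqueue->futex, 0);
		/* This FUTEX_WAKE is the one that lands late in the trace. */
		if (futex_async(&workqueue->futex, FUTEX_WAKE, 1,
				NULL, NULL, 0) < 0)
			abort();	/* liburcu calls urcu_die() here */
	}

With that convention, a FUTEX_WAKE that is executed after the waiter has
already moved the value back to -1 is exactly the stale wakeup the trace
shows.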
>
> If we want to dig more into why this scenario can happen, we could also add tracepoints
> in urcu_workqueue_queue_work() just before/after cds_wfcq_enqueue(), and in
> workqueue_thread() around each call to cds_wfcq_empty(), and around __cds_wfcq_splice_blocking().
>
> I suspect we end up in a situation where:
>
> * waker:
>
> cds_wfcq_enqueue(&workqueue->cbs_head, &workqueue->cbs_tail, &work->next); [a]
> uatomic_inc(&workqueue->qlen);
> wake_worker_thread(workqueue); [b]
>
> vs
>
> * waiter:
> splice_ret = __cds_wfcq_splice_blocking(&cbs_tmp_head, [c]
> &cbs_tmp_tail, &workqueue->cbs_head, &workqueue->cbs_tail);
> [...]
> if (cds_wfcq_empty(&workqueue->cbs_head, [d]
> &workqueue->cbs_tail)) {
> futex_wait(&workqueue->futex); [e]
> uatomic_dec(&workqueue->futex); [f]
>
> happen in an order such that
>
> [a] enqueues an item. Then [c] splices the item out of the queue because it was already awakened
> by a prior wakeup. So the queue is observed as empty by [c]. Then [b] sets the futex to 0
> (in userspace), which causes [e] to skip waiting on the futex, and [f] brings the value back
> to -1. However, in the next loop, the queue is still observed as empty by [d], thus calling
> [e] again. This time, the value is -1, so it calls sys_futex FUTEX_WAIT. Unfortunately, we
> still have the call to sys_futex FUTEX_WAKE that will eventually be executed, thus awakening
> the futex while the futex state is still -1, which is unexpected.
Yes, I agree with you. We'll apply the patch you mentioned to our code to fix this.
Thanks for your help.
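For the record, here is a sketch of how I understand the direction of the
fix (illustrative only, not necessarily the exact patch): make futex_wait()
loop and re-validate the futex value in user space, so a wakeup that arrives
while the value is still -1 is treated as spurious:

	#include <errno.h>
	#include <stdint.h>
	#include <stdlib.h>
	#include <urcu/arch.h>		/* cmm_smp_mb() */
	#include <urcu/uatomic.h>	/* uatomic_read() */
	#include <urcu/futex.h>		/* futex_async(), FUTEX_WAIT */

	static void futex_wait(int32_t *futex)
	{
		/* Read the queue state before reading the futex. */
		cmm_smp_mb();
		while (uatomic_read(futex) == -1) {
			if (!futex_async(futex, FUTEX_WAIT, -1,
					NULL, NULL, 0)) {
				/*
				 * Woken while the value is still -1: this
				 * is the stale FUTEX_WAKE from the traced
				 * scenario. Re-validate instead of
				 * returning. Sleeping again is safe because
				 * a real waker sets the value to 0 before
				 * calling FUTEX_WAKE.
				 */
				continue;
			}
			switch (errno) {
			case EWOULDBLOCK:
				return;		/* Value already changed. */
			case EINTR:
				break;		/* Retry on signal. */
			default:
				abort();	/* liburcu would urcu_die() */
			}
		}
	}

That way the leftover FUTEX_WAKE from the traced scenario costs one extra
loop iteration instead of leaving the worker spinning.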
Since cds_wfcq_enqueue/__cds_wfcq_splice_blocking do nothing to
workqueue->futex, adding debug info to check whether the workqueue is empty
does not seem necessary to me.
Let me know if you need anything else on this.
Thanks, again.
Minlan