[lttng-dev] urcu workqueue thread uses 99% of cpu while workqueue is empty

Mathieu Desnoyers mathieu.desnoyers at efficios.com
Tue Jun 14 11:53:16 EDT 2022


----- On Jun 13, 2022, at 11:55 PM, Minlan Wang wangminlan at szsandstone.com wrote:

> Hi, Mathieu,
>	We are running CentOS 8.2 on an Intel(R) Xeon(R) CPU E5-2630 v4,
> and using the workqueue interfaces from src/workqueue.h in
> userspace-rcu-latest-0.12.tar.bz2.

Also, I notice that you appear to be using an internal liburcu API (not public)
from outside of the liburcu project, which is not really expected.

If your process forks without exec, make sure you wire up the equivalent of
the rculfhash pthread_atfork handlers, which call urcu_workqueue_pause_worker(),
urcu_workqueue_resume_worker() and urcu_workqueue_create_worker().

Also, can you check whether you have multiple workqueue worker threads trying
to dequeue from the same workqueue in parallel? This is unsupported and would
cause the kind of issue you are observing here.
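
To illustrate why a futex value below -1 shows up as a busy loop, here is a
simplified single-threaded model. It is an assumption based on the backtraces
quoted below (a decrement in workqueue_thread(), a -1 to 0 transition in
futex_wake_up()), not the liburcu source, and every model_* name is
hypothetical. It assumes the worker decrements the value before sleeping and
that both the sleep and the wake-up only act when the value is exactly -1.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Model states: 0 means "worker running", -1 means "one worker announced
 * that it is about to sleep". Any other value is unexpected.
 */

/* Worker side: announce the intent to sleep. */
static void model_announce_sleep(int32_t *futex)
{
	(*futex)--;
}

/* Worker side: the wait only blocks while the value is exactly -1. */
static bool model_wait_would_block(const int32_t *futex)
{
	return *futex == -1;
}

/* Producer side: reset to 0 and wake only if the value is exactly -1. */
static bool model_wake_up(int32_t *futex)
{
	if (*futex != -1)
		return false;
	*futex = 0;
	return true;
}

/*
 * Count how many of `iterations` idle loop passes fail to block, given
 * that `extra_decrements` unexpected decrements already happened.
 */
static int model_count_spins(int extra_decrements, int iterations,
		int32_t *final_value)
{
	int32_t futex = 0;
	int spins = 0;

	for (int i = 0; i < extra_decrements; i++)
		model_announce_sleep(&futex);
	for (int i = 0; i < iterations; i++) {
		model_announce_sleep(&futex);
		if (model_wait_would_block(&futex)) {
			/* A real worker would sleep here until woken up. */
			(void) model_wake_up(&futex);
		} else {
			spins++;	/* wait returns at once: busy loop */
		}
	}
	*final_value = futex;
	return spins;
}
```

In this model, with no extra decrement the value alternates between 0 and -1
and no pass spins. With a single extra decrement every pass spins and the
value keeps falling, which resembles the -1, -2, -3 sequence in the
watchpoint log.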

Thanks,

Mathieu

>	Recently, we found that the workqueue thread drives CPU usage up to 99%.
> After some debugging, we found that the futex in struct urcu_workqueue had
> reached a very large negative value, e.g. -12484, while qlen, cbs_tail, and
> cbs_head suggest that the workqueue is empty.
> We added a watchpoint on workqueue->futex in workqueue_thread(), and got this
> log when workqueue->futex first reached -2:
> ...
> Old value = -1
> New value = 0
> 0x00007ffff37c1d6d in futex_wake_up (futex=0x55555f74aa40) at workqueue.c:160
> 160     in workqueue.c
> #0  0x00007ffff37c1d6d in futex_wake_up (futex=0x55555f74aa40) at
> workqueue.c:160
> #1  0x00007ffff37c2737 in wake_worker_thread (workqueue=0x55555f74aa00) at
> workqueue.c:324
> #2  0x00007ffff37c29fb in urcu_workqueue_queue_work (workqueue=0x55555f74aa00,
> work=0x555566e05e00, func=0x7ffff7523c90 <write_dirty_finish>) at
> workqueue.c:367
> #3  0x00007ffff752c520 in aio_complete_cb (ctx=<optimized out>,
> iocb=<optimized out>, res=<optimized out>, res2=<optimized out>) at
> bio/aio_bio_adapter.c:152
> #4  0x00007ffff752c696 in poll_io_complete (arg=0x555562e4f4a0) at
> bio/aio_bio_adapter.c:289
> #5  0x00007ffff72e6ea5 in start_thread () from /usr/lib64/libpthread.so.0
> #6  0x00007ffff415d96d in clone () from /usr/lib64/libc.so.6
> [Switching to Thread 0x7fffde3f3700 (LWP 821768)]
> Hardware watchpoint 4: -location workqueue->futex
> 
> Old value = 0
> New value = -1
> 0x00007ffff37c2473 in __uatomic_dec (len=4, addr=0x55555f74aa40) at
> ../include/urcu/uatomic.h:490
> 490     ../include/urcu/uatomic.h: No such file or directory.
> #0  0x00007ffff37c2473 in __uatomic_dec (len=4, addr=0x55555f74aa40) at
> ../include/urcu/uatomic.h:490
> #1  workqueue_thread (arg=0x55555f74aa00) at workqueue.c:250
> #2  0x00007ffff72e6ea5 in start_thread () from /usr/lib64/libpthread.so.0
> #3  0x00007ffff415d96d in clone () from /usr/lib64/libc.so.6
> Hardware watchpoint 4: -location workqueue->futex
> 
> Old value = -1
> New value = -2
> 0x00007ffff37c2473 in __uatomic_dec (len=4, addr=0x55555f74aa40) at
> ../include/urcu/uatomic.h:490
> 490     in ../include/urcu/uatomic.h
> #0  0x00007ffff37c2473 in __uatomic_dec (len=4, addr=0x55555f74aa40) at
> ../include/urcu/uatomic.h:490
> #1  workqueue_thread (arg=0x55555f74aa00) at workqueue.c:250
> #2  0x00007ffff72e6ea5 in start_thread () from /usr/lib64/libpthread.so.0
> #3  0x00007ffff415d96d in clone () from /usr/lib64/libc.so.6
> Hardware watchpoint 4: -location workqueue->futex
> 
> Old value = -2
> New value = -3
> 0x00007ffff37c2473 in __uatomic_dec (len=4, addr=0x55555f74aa40) at
> ../include/urcu/uatomic.h:490
> 490     in ../include/urcu/uatomic.h
> #0  0x00007ffff37c2473 in __uatomic_dec (len=4, addr=0x55555f74aa40) at
> ../include/urcu/uatomic.h:490
> #1  workqueue_thread (arg=0x55555f74aa00) at workqueue.c:250
> #2  0x00007ffff72e6ea5 in start_thread () from /usr/lib64/libpthread.so.0
> #3  0x00007ffff415d96d in clone () from /usr/lib64/libc.so.6
> Hardware watchpoint 4: -location workqueue->futex
> ...
> 
> After this, things went wild: workqueue->futex kept reaching larger negative
> values, and the workqueue thread ate up the CPU it was running on.
> This ends only when workqueue->futex underflows back to 0.
> 
> Do you have any idea why this is happening, and how to fix it?
> 
> B.R
> Minlan Wang

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

