[lttng-dev] Deaklock in liblttng-ust
Mathieu Desnoyers
mathieu.desnoyers at efficios.com
Thu Sep 13 07:37:39 EDT 2012
* Chang, Zheng (Zheng.Chang at emc.com) wrote:
> Hi,
>
>
>
> I built a trace.so as a wrapper of lttng-ust, which predefines some
> events and APIs based on lttng-ust.
>
> And here is demo application linked to this share library.
>
>
>
> Sometimes the demo hung at launch time. I did test with the demo and
> easy-ust of lttng-ust on both IA32 and X64 and got the same result.
>
> Lttng-ust version is 2.02.
>
>
>
> I collect some debuging info with gdb here:
>
>
>
> Parent process:
>
>
>
> gdb) thread 3 (constructor of liblttng-ust.so)
>
> [Switching to thread 3 (Thread 0x7f91d5487950 (LWP 21901))]#0
> 0x00007f91d82ad400 in wait () from /lib64/libpthread.so.0
>
> (gdb) bt
>
> #0 0x00007f91d82ad400 in wait () from /lib64/libpthread.so.0
>
> #1 0x00007f91d64a8c4b in wait_for_sessiond (sock_info=0x7f91d66cc640)
> at lttng-ust-comm.c:481
>
> #2 0x00007f91d64a9545 in ust_listener_thread (arg=<value optimized
> out>) at lttng-ust-comm.c:669
>
> #3 0x00007f91d82a5650 in start_thread () from /lib64/libpthread.so.0
>
> #4 0x00007f91d7b0315d in clone () from /lib64/libc.so.6
>
>
>
> (gdb) thread 5 (constructor of demo)
>
> [Switching to thread 5 (Thread 0x7f91d690e950 (LWP 21899))]#0
> 0x00007f91d82ac344 in __lll_lock_wait () from /lib64/libpthread.so.0
>
> (gdb) bt
>
> #0 0x00007f91d82ac344 in __lll_lock_wait () from /lib64/libpthread.so.0
>
> #1 0x00007f91d82a72d0 in _L_lock_102 () from /lib64/libpthread.so.0
>
> #2 0x00007f91d82a6bbe in pthread_mutex_lock () from
> /lib64/libpthread.so.0
>
> #3 0x00007f91d64abf8b in ltt_probe_register (desc=0x7f91d66d0ce0) at
> ltt-probes.c:77
>
> #4 0x00007f91d66e0cfb in __lttng_events_init__sample_component () at
> /usr/include/lttng/ust-tracepoint-event.h:550
>
> #5 0x00007f91d66e71b6 in __do_global_ctors_aux () from mytrace.so
>
> #6 0x00007f91d66e03c3 in _init () from mytrace.so
>
> #7 0x00007f91d66df3d4 in ?? () from mytrace.so
>
> #8 0x00007f91d94f78d8 in ?? () from /lib64/ld-linux-x86-64.so.2
>
> #9 0x00007f91d94f7a07 in ?? () from /lib64/ld-linux-x86-64.so.2
>
> #10 0x00007f91d94fbbde in ?? () from /lib64/ld-linux-x86-64.so.2
>
> #11 0x00007f91d94f7566 in ?? () from /lib64/ld-linux-x86-64.so.2
>
> #12 0x00007f91d94fb38b in ?? () from /lib64/ld-linux-x86-64.so.2
>
> #13 0x00007f91d84bbf9b in ?? () from /lib64/libdl.so.2
>
> #14 0x00007f91d94f7566 in ?? () from /lib64/ld-linux-x86-64.so.2
>
> #15 0x00007f91d84bc34c in ?? () from /lib64/libdl.so.2
>
> #16 0x00007f91d84bbf01 in dlopen () from /lib64/libdl.so.2
>
> ..............................................
>
> #24 0x00007f91d82a5650 in start_thread () from /lib64/libpthread.so.0
>
> #25 0x00007f91d7b0315d in clone () from /lib64/libc.so.6
>
>
>
> (gdb) p sessions_mutex
>
> $4 = {__data = {__lock = 2, __count = 0, __owner = 21901, __nusers = 1,
> __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}},
>
> __size = "\002\000\000\000\000\000\000\000\215U\000\000\001", '\0'
> <repeats 26 times>, __align = 2}
>
>
>
> Child process:
>
>
>
> (gdb) info thread (forked from 21901)
>
> * 1 Thread 0x7f91d5487950 (LWP 21902) 0x00007f91d82ac344 in
> __lll_lock_wait () from /lib64/libpthread.so.0
>
> (gdb) bt
>
> #0 0x00007f91d82ac344 in __lll_lock_wait () from /lib64/libpthread.so.0
>
> #1 0x00007f91d82a72d0 in _L_lock_102 () from /lib64/libpthread.so.0
>
> #2 0x00007f91d82a6bbe in pthread_mutex_lock () from
> /lib64/libpthread.so.0
>
> #3 0x00007f91d64abe58 in ltt_probe_unregister (desc=0x7f91d66d0ce0) at
> ltt-probes.c:129
>
> #4 0x00007f91d66e0d10 in __lttng_events_exit__sample_component () at
> /usr/include/lttng/ust-tracepoint-event.h:557
>
> #5 0x00007f91d66e07cf in __do_global_dtors_aux () from mytrace.so
>
> #6 0x00007f91d66cd664 in global_apps () from /usr/lib/liblttng-ust.so.0
>
> #7 0x00007f91d5479df0 in ?? ()
>
> #8 0x00007f91d66e71dd in _real_fini () from mytrace.so
>
> #9 0x00007f91d66e71d2 in _fini () from mytrace.so
>
> #10 0x00007f91d94f7f54 in ?? () from /lib64/ld-linux-x86-64.so.2
>
> #11 0x00007f91d7a652ed in exit () from /lib64/libc.so.6
>
> #12 0x00007f91d64a9207 in wait_for_sessiond (sock_info=0x7f91d66cc640)
> at lttng-ust-comm.c:542
>
> #13 0x00007f91d64a9545 in ust_listener_thread (arg=<value optimized
> out>) at lttng-ust-comm.c:669
>
> #14 0x00007f91d82a5650 in start_thread () from /lib64/libpthread.so.0
>
> #15 0x00007f91d7b0315d in clone () from /lib64/libc.so.6
>
> #16 0x0000000000000000 in ?? ()
>
> (gdb) p sessions_mutex
>
> $1 = {__data = {__lock = 2, __count = 0, __owner = 21901, __nusers = 1,
> __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}},
>
> __size = "\002\000\000\000\000\000\000\000\215U\000\000\001", '\0'
> <repeats 26 times>, __align = 2}
>
>
>
> What happened seems like this:
>
>
>
> lttng_ust_init
>
> |
>
> |-> ust_listener_thread
>
> |
>
> |-> wait_for_sessiond
>
> |
>
> |->
> ust_lock
> (1)
>
> |->
> get_map_shm
>
> |
> |
>
> |
> |-> get_wait_shm
>
> |
> |
>
> |
> |-> fork
>
> |
> parent -> wait
> (2)
>
> |
> child -> exit -> _fini -> __do_global_dtors_aux -> ...... ->
> ltt_probe_unregister -> ust_lock (3)
>
> |->
> ust_unlock
> (4)
>
>
>
> Deadlock happened at point (1) and (3). Parent waited for child's
> termination and child waited for parent to release the lock.
>
>
>
> Reproduction conditions:
>
> - First time to create share memory
> (/dev/shm/lttng-ust-apps-wait* don't exist)
>
> - Child process got delayed( I'm not quite sure with this, I
> used gdb to hold child process for a while and it happened either)
>
>
>
> In normal case, child process didn't call _fini when it exited so that
> no deadlock happened.
>
>
>
> Is this a known issue?
Yes, see:
commit 3b8b68e73ec9b2b3cf550048046d3f7f69050688
Author: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
Date: Wed Jun 13 04:26:34 2012 -0400
Fix: liblttng-ust-fork deadlock
* Mathieu Desnoyers (mathieu.desnoyers at efficios.com) wrote:
> * Burton, Michael (mburton at ciena.com) wrote:
> > Mathieu,
> >
> > I think there is a deadlock scenario in UST, which has been causing my p
>
> Good catch !
>
> >
> > sessiond is started as root:
> > - creates global sockets ONLY
> > - DOES NOT CREATE shm in $HOME/lttng-ust-wait-<uid>
> >
> > application linked against ust is run as root:
> > - in lttng_ust_init constructor
> > - ust_listener_thread (local_apps)
> > - fails to connect to local_apps in $HOME/.lttng (as expected)
> > - prev_connect_failed=1
> > - ust_unlock()
> > - restart
> > - wait_for_sessiond()
> > --> - ust_lock()
> > | - get_map_shm()
> > | - get_wait_shm()
> > DEADLOCK - shm_open() FAILS (not created by sessiond when run by root
> > | - fork() (trying to create shared memory itself)
> > | - ust_before_fork()
> > ------------> - ust_lock()
> >
> >
> > You should be able to create this with an empty main, with no
> > tracepoints. As long as sessiond is started as root so
> > $HOME/lttng-ust-wait-<uid> is not created. You can also make the
> > lttng-ust constructor (lttng_ust_init) wait forever and then you'll be
> > able to see the deadlock in gdb without even leaving the
> > lttng_ust_init constructor.
>
> Ah, I see. This deadlock is caused by the interaction between
> [ liblttng-ust-fork ] and liblttng-ust (the fork override is
> performed by [ liblttng-ust-fork ]).
This can be reproduced easily with the in-tree tests: by removing the
lttng-ust-apps-wait* files belonging to the user in /dev/shm, running
the "tests/fork" test (with ./run) hangs. If we run "hello" first, and
then the fork test, it works fine.
Fixing this by keeping a nesting counter around the fork() call, so we
return immediately from the pre/post fork handlers if they are
overridden by liblttng-ust-fork.
Reported-by: Michael Burton <mburton at ciena.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
You should upgrade your lttng-ust to latest 2.0.x.
Thanks,
Mathieu
>
>
>
> Thanks
>
> -Zheng
>
>
>
> _______________________________________________
> lttng-dev mailing list
> lttng-dev at lists.lttng.org
> http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
More information about the lttng-dev
mailing list