<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">On 9/13/2012 19:37 PM, Mathieu
Desnoyers wrote:<br>
</div>
<blockquote cite="mid:20120913113739.GA27777@Krystal" type="cite">
<meta http-equiv="Content-Type" content="text/html;
charset=ISO-8859-1">
<meta name="Generator" content="MS Exchange Server version
6.5.7655.1">
<title>Re: [lttng-dev] Deaklock in liblttng-ust</title>
<!-- Converted from text/plain format -->
<p><font size="2">* Chang, Zheng (<a class="moz-txt-link-abbreviated" href="mailto:Zheng.Chang@emc.com">Zheng.Chang@emc.com</a>) wrote:<br>
> Hi,<br>
><br>
> <br>
><br>
> I built a trace.so as a wrapper of lttng-ust, which
predefines some<br>
> events and APIs based on lttng-ust.<br>
><br>
> And here is demo application linked to this share
library.<br>
><br>
> <br>
><br>
> Sometimes the demo hung at launch time. I did test with
the demo and<br>
> easy-ust of lttng-ust on both IA32 and X64 and got the
same result.<br>
><br>
> Lttng-ust version is 2.02.<br>
><br>
> <br>
><br>
> I collect some debuging info with gdb here:<br>
><br>
> <br>
><br>
> Parent process:<br>
><br>
> <br>
><br>
> gdb) thread 3 (constructor of liblttng-ust.so)<br>
><br>
> [Switching to thread 3 (Thread 0x7f91d5487950 (LWP
21901))]#0<br>
> 0x00007f91d82ad400 in wait () from /lib64/libpthread.so.0<br>
><br>
> (gdb) bt<br>
><br>
> #0 0x00007f91d82ad400 in wait () from
/lib64/libpthread.so.0<br>
><br>
> #1 0x00007f91d64a8c4b in wait_for_sessiond
(sock_info=0x7f91d66cc640)<br>
> at lttng-ust-comm.c:481<br>
><br>
> #2 0x00007f91d64a9545 in ust_listener_thread
(arg=<value optimized<br>
> out>) at lttng-ust-comm.c:669<br>
><br>
> #3 0x00007f91d82a5650 in start_thread () from
/lib64/libpthread.so.0<br>
><br>
> #4 0x00007f91d7b0315d in clone () from /lib64/libc.so.6<br>
><br>
> <br>
><br>
> (gdb) thread 5 (constructor of demo)<br>
><br>
> [Switching to thread 5 (Thread 0x7f91d690e950 (LWP
21899))]#0<br>
> 0x00007f91d82ac344 in __lll_lock_wait () from
/lib64/libpthread.so.0<br>
><br>
> (gdb) bt<br>
><br>
> #0 0x00007f91d82ac344 in __lll_lock_wait () from
/lib64/libpthread.so.0<br>
><br>
> #1 0x00007f91d82a72d0 in _L_lock_102 () from
/lib64/libpthread.so.0<br>
><br>
> #2 0x00007f91d82a6bbe in pthread_mutex_lock () from<br>
> /lib64/libpthread.so.0<br>
><br>
> #3 0x00007f91d64abf8b in ltt_probe_register
(desc=0x7f91d66d0ce0) at<br>
> ltt-probes.c:77<br>
><br>
> #4 0x00007f91d66e0cfb in
__lttng_events_init__sample_component () at<br>
> /usr/include/lttng/ust-tracepoint-event.h:550<br>
><br>
> #5 0x00007f91d66e71b6 in __do_global_ctors_aux () from
mytrace.so<br>
><br>
> #6 0x00007f91d66e03c3 in _init () from mytrace.so<br>
><br>
> #7 0x00007f91d66df3d4 in ?? () from mytrace.so<br>
><br>
> #8 0x00007f91d94f78d8 in ?? () from
/lib64/ld-linux-x86-64.so.2<br>
><br>
> #9 0x00007f91d94f7a07 in ?? () from
/lib64/ld-linux-x86-64.so.2<br>
><br>
> #10 0x00007f91d94fbbde in ?? () from
/lib64/ld-linux-x86-64.so.2<br>
><br>
> #11 0x00007f91d94f7566 in ?? () from
/lib64/ld-linux-x86-64.so.2<br>
><br>
> #12 0x00007f91d94fb38b in ?? () from
/lib64/ld-linux-x86-64.so.2<br>
><br>
> #13 0x00007f91d84bbf9b in ?? () from /lib64/libdl.so.2<br>
><br>
> #14 0x00007f91d94f7566 in ?? () from
/lib64/ld-linux-x86-64.so.2<br>
><br>
> #15 0x00007f91d84bc34c in ?? () from /lib64/libdl.so.2<br>
><br>
> #16 0x00007f91d84bbf01 in dlopen () from
/lib64/libdl.so.2<br>
><br>
> ..............................................<br>
><br>
> #24 0x00007f91d82a5650 in start_thread () from
/lib64/libpthread.so.0<br>
><br>
> #25 0x00007f91d7b0315d in clone () from /lib64/libc.so.6<br>
><br>
> <br>
><br>
> (gdb) p sessions_mutex<br>
><br>
> $4 = {__data = {__lock = 2, __count = 0, __owner = 21901,
__nusers = 1,<br>
> __kind = 0, __spins = 0, __list = {__prev = 0x0, __next =
0x0}},<br>
><br>
> __size =
"\002\000\000\000\000\000\000\000\215U\000\000\001", '\0'<br>
> <repeats 26 times>, __align = 2}<br>
><br>
> <br>
><br>
> Child process:<br>
><br>
> <br>
><br>
> (gdb) info thread (forked from 21901)<br>
><br>
> * 1 Thread 0x7f91d5487950 (LWP 21902) 0x00007f91d82ac344
in<br>
> __lll_lock_wait () from /lib64/libpthread.so.0<br>
><br>
> (gdb) bt<br>
><br>
> #0 0x00007f91d82ac344 in __lll_lock_wait () from
/lib64/libpthread.so.0<br>
><br>
> #1 0x00007f91d82a72d0 in _L_lock_102 () from
/lib64/libpthread.so.0<br>
><br>
> #2 0x00007f91d82a6bbe in pthread_mutex_lock () from<br>
> /lib64/libpthread.so.0<br>
><br>
> #3 0x00007f91d64abe58 in ltt_probe_unregister
(desc=0x7f91d66d0ce0) at<br>
> ltt-probes.c:129<br>
><br>
> #4 0x00007f91d66e0d10 in
__lttng_events_exit__sample_component () at<br>
> /usr/include/lttng/ust-tracepoint-event.h:557<br>
><br>
> #5 0x00007f91d66e07cf in __do_global_dtors_aux () from
mytrace.so<br>
><br>
> #6 0x00007f91d66cd664 in global_apps () from
/usr/lib/liblttng-ust.so.0<br>
><br>
> #7 0x00007f91d5479df0 in ?? ()<br>
><br>
> #8 0x00007f91d66e71dd in _real_fini () from mytrace.so<br>
><br>
> #9 0x00007f91d66e71d2 in _fini () from mytrace.so<br>
><br>
> #10 0x00007f91d94f7f54 in ?? () from
/lib64/ld-linux-x86-64.so.2<br>
><br>
> #11 0x00007f91d7a652ed in exit () from /lib64/libc.so.6<br>
><br>
> #12 0x00007f91d64a9207 in wait_for_sessiond
(sock_info=0x7f91d66cc640)<br>
> at lttng-ust-comm.c:542<br>
><br>
> #13 0x00007f91d64a9545 in ust_listener_thread
(arg=<value optimized<br>
> out>) at lttng-ust-comm.c:669<br>
><br>
> #14 0x00007f91d82a5650 in start_thread () from
/lib64/libpthread.so.0<br>
><br>
> #15 0x00007f91d7b0315d in clone () from /lib64/libc.so.6<br>
><br>
> #16 0x0000000000000000 in ?? ()<br>
><br>
> (gdb) p sessions_mutex<br>
><br>
> $1 = {__data = {__lock = 2, __count = 0, __owner = 21901,
__nusers = 1,<br>
> __kind = 0, __spins = 0, __list = {__prev = 0x0, __next =
0x0}},<br>
><br>
> __size =
"\002\000\000\000\000\000\000\000\215U\000\000\001", '\0'<br>
> <repeats 26 times>, __align = 2}<br>
><br>
> <br>
><br>
> What happened seems like this:<br>
><br>
> <br>
><br>
> lttng_ust_init<br>
><br>
> |<br>
><br>
> |-> ust_listener_thread<br>
><br>
> |<br>
><br>
> |->
wait_for_sessiond<br>
><br>
>
|<br>
><br>
>
|-><br>
> ust_lock<br>
> (1)<br>
><br>
>
|-><br>
> get_map_shm<br>
><br>
>
|<br>
> |<br>
><br>
>
|<br>
> |-> get_wait_shm<br>
><br>
>
|<br>
> |<br>
><br>
>
|<br>
> |-> fork<br>
><br>
>
|<br>
> parent -> wait<br>
> (2)<br>
><br>
>
|<br>
> child -> exit -> _fini -> __do_global_dtors_aux
-> ...... -><br>
> ltt_probe_unregister -> ust_lock (3)<br>
><br>
>
|-><br>
> ust_unlock<br>
> (4)<br>
><br>
> <br>
><br>
> Deadlock happened at point (1) and (3). Parent waited for
child's<br>
> termination and child waited for parent to release the
lock.<br>
><br>
> <br>
><br>
> Reproduction conditions:<br>
><br>
> - First time to create share memory<br>
> (/dev/shm/lttng-ust-apps-wait* don't exist)<br>
><br>
> - Child process got delayed( I'm not quite sure
with this, I<br>
> used gdb to hold child process for a while and it
happened either)<br>
><br>
> <br>
><br>
> In normal case, child process didn't call _fini when it
exited so that<br>
> no deadlock happened.<br>
><br>
> <br>
><br>
> Is this a known issue?<br>
<br>
Yes, see:<br>
<br>
commit 3b8b68e73ec9b2b3cf550048046d3f7f69050688<br>
Author: Mathieu Desnoyers
<a class="moz-txt-link-rfc2396E" href="mailto:mathieu.desnoyers@efficios.com"><mathieu.desnoyers@efficios.com></a><br>
Date: Wed Jun 13 04:26:34 2012 -0400<br>
<br>
Fix: liblttng-ust-fork deadlock<br>
<br>
* Mathieu Desnoyers (<a class="moz-txt-link-abbreviated" href="mailto:mathieu.desnoyers@efficios.com">mathieu.desnoyers@efficios.com</a>)
wrote:<br>
> * Burton, Michael (<a class="moz-txt-link-abbreviated" href="mailto:mburton@ciena.com">mburton@ciena.com</a>) wrote:<br>
> > Mathieu,<br>
> ><br>
> > I think there is a deadlock scenario in UST,
which has been causing my p<br>
><br>
> Good catch !<br>
><br>
> ><br>
> > sessiond is started as root:<br>
> > - creates global sockets ONLY<br>
> > - DOES NOT CREATE shm in
$HOME/lttng-ust-wait-<uid><br>
> ><br>
> > application linked against ust is run as root:<br>
> > - in lttng_ust_init constructor<br>
> > - ust_listener_thread (local_apps)<br>
> > - fails to connect to local_apps in
$HOME/.lttng (as expected)<br>
> > - prev_connect_failed=1<br>
> > - ust_unlock()<br>
> > - restart<br>
> > - wait_for_sessiond()<br>
> > --> - ust_lock()<br>
> > | - get_map_shm()<br>
> > | - get_wait_shm()<br>
> > DEADLOCK - shm_open() FAILS (not created by
sessiond when run by root<br>
> > | - fork() (trying to create shared
memory itself)<br>
> > | - ust_before_fork()<br>
> > ------------> - ust_lock()<br>
> ><br>
> ><br>
> > You should be able to create this with an empty
main, with no<br>
> > tracepoints. As long as sessiond is started as
root so<br>
> > $HOME/lttng-ust-wait-<uid> is not
created. You can also make the<br>
> > lttng-ust constructor (lttng_ust_init) wait
forever and then you'll be<br>
> > able to see the deadlock in gdb without even
leaving the<br>
> > lttng_ust_init constructor.<br>
><br>
> Ah, I see. This deadlock is caused by the interaction
between<br>
> [ liblttng-ust-fork ] and liblttng-ust (the fork
override is<br>
> performed by [ liblttng-ust-fork ]).<br>
<br>
This can be reproduced easily with the in-tree tests: by
removing the<br>
lttng-ust-apps-wait* files belonging to the user in
/dev/shm, running<br>
the "tests/fork" test (with ./run) hangs. If we run
"hello" first, and<br>
then the fork test, it works fine.<br>
<br>
Fixing this by keeping a nesting counter around the fork()
call, so we<br>
return immediately from the pre/post fork handlers if they
are<br>
overridden by liblttng-ust-fork.<br>
<br>
Reported-by: Michael Burton <a class="moz-txt-link-rfc2396E" href="mailto:mburton@ciena.com"><mburton@ciena.com></a><br>
Signed-off-by: Mathieu Desnoyers
<a class="moz-txt-link-rfc2396E" href="mailto:mathieu.desnoyers@efficios.com"><mathieu.desnoyers@efficios.com></a><br>
<br>
</font></p>
</blockquote>
Mathieu,<br>
<br>
It's a little different from the issue of Michael.<br>
In my case, the secondary ust_lock is blocked at the destructor of
the demo library, which is triggerred by the exit of child process.<br>
<font size="2"><br>
> #2 0x00007f91d82a6bbe in pthread_mutex_lock () from
/lib64/libpthread.so.0<br>
> #3 0x00007f91d64abe58 in ltt_probe_unregister
(desc=0x7f91d66d0ce0) at ltt-probes.c:129</font><br>
<br>
We can see ltt_probe_unregister calls ust_lock in the destructor. <br>
And here is the setion .dtors of the demo library:<br>
<br>
objdump --section .dtors -S mytrace.so<br>
0002fc98 <__DTOR_LIST__>:<br>
2fc98: ff ff ff ff 00 00 00 00 40 f0 00 00 d0 f0 00 00
........@.......<br>
0002fca8 <__DTOR_END__>:<br>
2fca8: 00 00 00 00
....<br>
0000f040 t __lttng_events_exit__lttng_ust
<ltt_probe_unregister is called by this vector><br>
0000f0d0 t __tracepoints__destroy<br>
<br>
The child process calls _fini when it calls API exit. It gets hung
and meanwhile the parent is waiting for its termination.<br>
I think the whole life-cycle of the process should be considered.
The parent's waiting in critical region is dangerous. <br>
Is it possible to refine the critical region with smaller fineness?<br>
<br>
What do you think?<br>
<br>
BR<br>
-Zheng<br>
<blockquote cite="mid:20120913113739.GA27777@Krystal" type="cite">
<p><font size="2">
<br>
You should upgrade your lttng-ust to latest 2.0.x.<br>
<br>
Thanks,<br>
<br>
Mathieu<br>
<br>
<br>
><br>
> <br>
><br>
> Thanks<br>
><br>
> -Zheng<br>
><br>
> <br>
><br>
<br>
> _______________________________________________<br>
> lttng-dev mailing list<br>
> <a class="moz-txt-link-abbreviated" href="mailto:lttng-dev@lists.lttng.org">lttng-dev@lists.lttng.org</a><br>
> <a moz-do-not-send="true"
href="http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev">http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev</a><br>
<br>
<br>
--<br>
Mathieu Desnoyers<br>
Operating System Efficiency R&D Consultant<br>
EfficiOS Inc.<br>
<a moz-do-not-send="true" href="http://www.efficios.com">http://www.efficios.com</a><br>
<br>
</font>
</p>
</blockquote>
<br>
<br>
</body>
</html>