[lttng-dev] some troubles having lttng-ust actually log something on a shared library
Mathieu Desnoyers
compudj at krystal.dyndns.org
Thu Dec 22 15:33:54 EST 2011
* Mathieu Desnoyers (compudj at krystal.dyndns.org) wrote:
> Some more info about the libuuid issue:
>
> Here is my stack when the program hangs. It turns out that we trigger a
> deadlock between a mutex in ld within the guts of libc and the mutex I
> use to protect operations touching UST data structures:
>
> thread 1:
> - _dl_open first takes the
> __rtld_lock_lock_recursive (GL(dl_load_lock))
> - then, within tracepoint_register_lib, we take the UST lock.
>
[...]
> thread 3: first takes the UST lock in handle_message, then
> tls_get_addr_tail() takes __rtld_lock_lock_recursive
> (GL(dl_load_lock)). This only occurs when libuuid is loaded as a
> dlopen() dependency, not when it is linked directly with the
> application (-luuid). From
> http://gcc.gnu.org/onlinedocs/gcc-3.3.1/gcc/Thread-Local.html and
> http://people.redhat.com/drepper/tls.pdf , it appears that in the
> dlopen() case, TLS is fixed up lazily, on first use. If a lock is taken
> in the fixup, this can lead to interesting deadlocks.
>
> In addition, even though it is for Solaris, the following page seems to
> document well the difference between TLS allocation at program startup
> and at dynamic linking (dlopen):
> http://docs.oracle.com/cd/E23824_01/html/819-0690/gentextid-22601.html#scrolltoc

I've now split liblttng-ust into two libs: liblttng-ust-tracepoint
(dlopen'd by the application to register the tracepoints), and
liblttng-ust (which is used by the tracepoint probes, and talks to
lttng-sessiond).

I've updated the README file in lttng-ust to specify that the tracepoint
probes should not be dlopen()'d. dlopen() just has too many nasty
side-effects with TLS (grabbing a lock that is also taken in
constructors is a major one). Please use LD_PRELOAD or link the
application directly with the tracepoint probe .so. Even if I changed
the locking scheme within liblttng-ust to play nicely with the dynamic
linker mutex, the whole thing would still be fragile, because that
mutex could be hit from an active tracepoint (which could itself be
nested inside various other mutexes).

Note that you can still dlopen() the file that is instrumented with
tracepoints (the file where you define e.g. TRACEPOINT_DEFINE and add
your tracepoint() statements). It's only the probe (with
TRACEPOINT_CREATE_PROBES defined) that cannot be dlopen'd.

Best regards,
Mathieu
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com