[lttng-dev] Problem with UST related to dlload
Martin Ünsal
martinunsal at gmail.com
Wed Apr 30 17:57:39 EDT 2014
Incidentally, I also asked for help with the GNU linker-specific part
(question 2) here:
http://gcc.gnu.org/ml/gcc-help/2014-04/msg00164.html
Martin
On Wed, Apr 30, 2014 at 2:21 PM, Martin Ünsal <martinunsal at gmail.com> wrote:
> Hi LTTng folks
>
> I have a strange problem using LTTng-UST on an ARM-based platform. I have
> done some diagnosis, but I am running low on ideas and was hoping for help
> from the experts. I am using lttng-tools 2.2.0, lttng-ust 2.2.0 and liburcu
> 0.8.1. I know these are old, but unfortunately upgrading is easier said than
> done. I didn't see anything related to this problem in the release notes,
> the mailing list traffic, or the master branch, but I could have missed
> something.
>
> The problem showed up when I switched from GCC 4.6.4 to 4.7.2. Conceptually,
> the situation is that I have a single executable, call it MyProgram, with
> two plugins loaded at runtime with dlopen(); let's call them libPlugin1.so
> and libPlugin2.so. There are three different LTTng-UST tracepoint providers,
> one each for the executable and the two plugins. With GCC 4.7.2, tracepoints
> in libPlugin1 stopped working. The tracepoints in MyProgram and in
> libPlugin2 continue to work correctly.
>
> I have established without a doubt that the toolchain upgrade is the cause
> of the regression.
>
> In the debugger, I confirmed that the tracepoint for libPlugin1.so is being
> executed, but __tracepoint_##provider##___##name.state is always 0 even when
> I enable the tracepoint in lttng-tools. As a result the tracepoint callback
> is not being invoked when it should be. In MyProgram and libPlugin2.so, the
> .state variable correctly reflects whether the tracepoint is enabled, and if
> the tracepoint is enabled, the tracepoint callback is invoked.
>
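The gating described above can be modeled in a few lines. This is a minimal sketch, not lttng-ust's actual code: `struct tp_site`, `tp_fire()` and `count_probe()` are hypothetical names, and the real tracepoint structure carries more fields. It only shows why a `state` field stuck at 0 makes the call site skip its callback even though the site itself executes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified tracepoint site (not the lttng-ust struct). */
struct tp_site {
    const char *name;
    int state;                  /* nonzero only after a session enables it */
    void (*probe)(int payload); /* callback invoked when the site fires */
};

static int probe_hits;

/* Example probe: just counts how many times it was invoked. */
static void count_probe(int payload)
{
    (void)payload;
    probe_hits++;
}

/* The site always executes, but the probe runs only if state is set. */
static void tp_fire(const struct tp_site *tp, int payload)
{
    assert(tp != NULL);
    if (tp->state && tp->probe != NULL)
        tp->probe(payload);
}
```

If enabling an event never flips `state` for one module, that module's sites behave like the disabled case here: they run, but no callback fires.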
> Next, I set a breakpoint in tracepoint_register_lib() and looked at its
> tracepoints_start parameter.
>
> 1) With GCC 4.6.4, everything is as expected:
> a) For MyProgram, tracepoint_register_lib() is called with
> MyProgramProvider's __start___tracepoints_ptrs.
> b) After dlopen() of libPlugin1, tracepoint_register_lib() is called with
> libPlugin1Provider's __start___tracepoints_ptrs.
> c) After dlopen() of libPlugin2, tracepoint_register_lib() is called with
> libPlugin2Provider's __start___tracepoints_ptrs.
>
> 2) With GCC 4.7.2, there is a problem:
> a) For MyProgram, tracepoint_register_lib() is called with
> MyProgramProvider's __start___tracepoints_ptrs.
> b) After dlopen() of libPlugin1, tracepoint_register_lib() is called with
> MyProgramProvider's __start___tracepoints_ptrs (!!!! THIS IS WRONG !!!!)
> c) After dlopen() of libPlugin2, tracepoint_register_lib() is called with
> libPlugin2Provider's __start___tracepoints_ptrs.
>
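The registration step can be sketched with GNU ld's automatic `__start_SECNAME`/`__stop_SECNAME` symbols, which the linker provides for a section whose name is a valid C identifier. This is a minimal sketch assuming GCC and GNU ld (or a compatible linker) on ELF; the section name `my_tp_ptrs`, `struct tp` and `register_lib()` are hypothetical stand-ins for `__tracepoints_ptrs`, the real tracepoint structure and `tracepoint_register_lib()`. Each module places pointers to its tracepoints in the section, and a constructor, which runs at program start or during dlopen(), passes that module's array bounds to the registration function. In the failing case above, it is this start pointer that refers to the wrong module's array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified tracepoint record. */
struct tp {
    const char *name;
    int state;
};

static struct tp tp_a = { "prov:a", 0 };
static struct tp tp_b = { "prov:b", 0 };

/* Pointers to this module's tracepoints, collected in a custom section.
 * "used" keeps the compiler from discarding the otherwise unreferenced
 * variables. */
static struct tp *const tp_ptr_a
    __attribute__((used, section("my_tp_ptrs"))) = &tp_a;
static struct tp *const tp_ptr_b
    __attribute__((used, section("my_tp_ptrs"))) = &tp_b;

/* Never defined in C: GNU ld provides these at link time because the
 * section name is a valid C identifier. */
extern struct tp *const __start_my_tp_ptrs[];
extern struct tp *const __stop_my_tp_ptrs[];

static struct tp *const *registered_start;
static size_t registered_count;

/* Hypothetical stand-in for tracepoint_register_lib(). */
static void register_lib(struct tp *const *start, size_t count)
{
    assert(start != NULL);
    registered_start = start;
    registered_count = count;
}

/* Runs when the module is loaded: at startup for the executable,
 * during dlopen() for a plugin. */
static void __attribute__((constructor)) register_my_tracepoints(void)
{
    register_lib(__start_my_tp_ptrs,
                 (size_t)(__stop_my_tp_ptrs - __start_my_tp_ptrs));
}
```

The order of entries within the section is up to the linker, so only the bounds and the count should be relied on.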
> I looked at the symbol table for libPlugin1.so to see if it would shed some
> light on the problem.
>
> 1) With GCC 4.6.4:
> # objdump -t /usr/lib/.debug/libPlugin1.so | grep __start___tracepoints_ptrs
> 00025bb0 l *ABS* 00000000 __start___tracepoints_ptrs
> # objdump -t /usr/lib/.debug/libPlugin2.so | grep __start___tracepoints_ptrs
> 00041eb4 l *ABS* 00000000 __start___tracepoints_ptrs
>
> 2) With GCC 4.7.2:
> # objdump -t /usr/lib/.debug/libPlugin1.so | grep __start___tracepoints_ptrs
> 00025a90 g __tracepoints_ptrs 00000000 __start___tracepoints_ptrs
> # objdump -t /usr/lib/.debug/libPlugin2.so | grep __start___tracepoints_ptrs
> 00041eb4 g __tracepoints_ptrs 00000000 __start___tracepoints_ptrs
>
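The symbol tables show where each object's section starts, but at runtime it can also help to check directly which loaded module an address belongs to, for example the tracepoints_start value seen at the breakpoint. This is a minimal sketch assuming a Linux/ELF target with glibc's `dl_iterate_phdr()` (a GNU extension declared in `<link.h>`); `object_containing()` and `struct owner_query` are hypothetical helpers, not part of lttng-ust. It walks each loaded object's `PT_LOAD` segments and returns the name of the one containing the address; glibc reports an empty name for the main program.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <link.h>
#include <stddef.h>
#include <stdint.h>

struct owner_query {
    uintptr_t addr;   /* address being looked up */
    const char *name; /* set to the owning object's name on a match */
};

/* dl_iterate_phdr() callback: returns nonzero to stop at the first
 * loaded object whose PT_LOAD segment contains q->addr. */
static int find_owner(struct dl_phdr_info *info, size_t size, void *data)
{
    struct owner_query *q = data;
    (void)size;
    for (ElfW(Half) i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type != PT_LOAD)
            continue;
        uintptr_t lo = (uintptr_t)(info->dlpi_addr + ph->p_vaddr);
        if (q->addr >= lo && q->addr - lo < ph->p_memsz) {
            q->name = info->dlpi_name;
            return 1;
        }
    }
    return 0;
}

/* Hypothetical helper: name of the module containing p, or NULL. */
static const char *object_containing(const void *p)
{
    assert(p != NULL);
    struct owner_query q = { (uintptr_t)p, NULL };
    dl_iterate_phdr(find_owner, &q);
    return q.name;
}
```

Given the breakpoint observation for the GCC 4.7.2 build, applying this to the start pointer passed during libPlugin1's registration would be expected to name the main program rather than libPlugin1.so.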
> My hypothesis at this point is that because __start___tracepoints_ptrs
> changed from a local to a global symbol, the dynamic loader no longer binds
> libPlugin1's weak reference to libPlugin1's own definition and picks another
> module's (MyProgram's) instead. I cannot explain why libPlugin2 still finds
> its own provider; perhaps it is just getting lucky.
>
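One way to make each module bind to its own section bounds, regardless of the binding the linker gives the automatically defined symbols, is to declare the references with hidden visibility, so they must be resolved inside the module at static link time rather than by the dynamic loader. This is a minimal sketch assuming GCC and GNU ld on ELF, not necessarily how lttng-ust declares these symbols; the section name `plugin_tp_ptrs`, `struct tp` and `own_tracepoint_count()` are hypothetical. Declaring the bounds `weak` as well lets a module with no entries in the section still link, with the bounds evaluating to NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified tracepoint record. */
struct tp {
    const char *name;
    int state;
};

static struct tp tp_local = { "plugin:event", 0 };

/* This module's only entry in the (hypothetical) plugin_tp_ptrs section. */
static struct tp *const tp_local_ptr
    __attribute__((used, section("plugin_tp_ptrs"))) = &tp_local;

/* hidden: the reference must be satisfied inside this module, so the
 * dynamic loader cannot bind it to another module's definition.
 * weak: the bounds are NULL instead of a link error when the module
 * has no entries in the section. */
extern struct tp *const __start_plugin_tp_ptrs[]
    __attribute__((weak, visibility("hidden")));
extern struct tp *const __stop_plugin_tp_ptrs[]
    __attribute__((weak, visibility("hidden")));

/* Number of tracepoint pointers in this module's own section. */
static size_t own_tracepoint_count(void)
{
    if (__start_plugin_tp_ptrs == NULL)
        return 0;
    assert(__stop_plugin_tp_ptrs >= __start_plugin_tp_ptrs);
    return (size_t)(__stop_plugin_tp_ptrs - __start_plugin_tp_ptrs);
}
```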
> A few questions come to mind...
> 1) Have you run into a problem like this? Is there a known fix/workaround?
> 2) __start___tracepoints_ptrs is declared extern in tracepoint.h, but it
> is not defined anywhere in the source. This appears to be some sort of
> undocumented linker magic.
> http://gcc.gnu.org/ml/gcc-help/2010-04/msg00120.html is the only reference I
> could find. Do you know where this behavior is documented or specified (if
> at all)?
> 3) Do you know why the symbol binding (local vs. global) of
> __start___tracepoints_ptrs changed between 4.6.4 and 4.7.2?
>
> Thanks for any help. This is a real puzzler for me.
>
> Martin
>