[lttng-dev] Problem with UST related to dlload

Woegerer, Paul Paul_Woegerer at mentor.com
Tue May 27 10:58:09 EDT 2014


On 05/27/2014 04:41 PM, Gerlando Falauto wrote:
> Hi Paul,
> 
> thank you very much for sharing this.
> 
> I had in the meantime run into the same suggestion by
> Henrik Wallin on a thread opened by Martin
> (https://gcc.gnu.org/ml/gcc-help/2014-05/msg00028.html).
> Further updates from Martin also suggest the issue is rather related to
> the OpenEmbedded toolchain.
> 
> I was about to post the "opposite" of your patch, as I don't see the
> need to have those symbols as weak instead. In the end, doesn't weak
> only allow for a further re-definition? In this case we're only
> declaring it as extern, aren't we?
> Definition actually happens by magic, as far as I can tell.
> But please correct me if I got it all wrong.

It's more complicated.

You absolutely need those symbols to be declared as:

    .weak   __start___tracepoints_ptrs
    .weak   __stop___tracepoints_ptrs

*and*

    .hidden __start___tracepoints_ptrs
    .hidden __stop___tracepoints_ptrs

The reason is that the same tracepoint provider can be USED in several
compilation units that all end up in one and the same shared object (or
executable).

All the __start/stop___tracepoints_ptrs references in those compilation
units must then resolve to the single pair of
__start/stop___tracepoints_ptrs definitions that belongs to the shared
object (or executable) they are part of. Weak lets every compilation
unit carry the reference even when the section ends up empty, and hidden
keeps the symbols from being exported, so they cannot be resolved against
another shared object. This is required because tracepoint
initialization happens only once per shared object (or executable),
through the static ctor mechanism that is also defined in tracepoint.h.
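
To illustrate the mechanism, here is a minimal sketch. It is not
lttng-ust code: the demo_tracepoint type, the __demo_tp_ptrs section and
the demo_tp_* functions are made up for the example, and it assumes GCC
or Clang with an ELF linker (such as GNU ld) that provides
__start_<section>/__stop_<section> symbols for sections whose name is a
valid C identifier.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct tracepoint. */
struct demo_tracepoint {
    const char *name;
    int state;
};

struct demo_tracepoint demo_tp_a = { "demo:a", 0 };
struct demo_tracepoint demo_tp_b = { "demo:b", 0 };

/* Each user of a tracepoint drops one pointer into a named section. */
static struct demo_tracepoint *demo_tp_a_ptr
    __attribute__((used, section("__demo_tp_ptrs"))) = &demo_tp_a;
static struct demo_tracepoint *demo_tp_b_ptr
    __attribute__((used, section("__demo_tp_ptrs"))) = &demo_tp_b;

/*
 * The linker defines these bounds for the section. Weak: the reference
 * may appear in many compilation units and may stay undefined if the
 * section is empty. Hidden: the reference binds within this shared
 * object (or executable) only.
 */
extern struct demo_tracepoint * const __start___demo_tp_ptrs[]
    __attribute__((weak, visibility("hidden")));
extern struct demo_tracepoint * const __stop___demo_tp_ptrs[]
    __attribute__((weak, visibility("hidden")));

/* Number of pointers the linker collected for this object. */
size_t demo_tp_count(void)
{
    assert(__stop___demo_tp_ptrs >= __start___demo_tp_ptrs);
    return (size_t)(__stop___demo_tp_ptrs - __start___demo_tp_ptrs);
}

/* Look a tracepoint up by name within this object's section only. */
struct demo_tracepoint *demo_tp_find(const char *name)
{
    size_t i, n = demo_tp_count();

    for (i = 0; i < n; i++) {
        struct demo_tracepoint *tp = __start___demo_tp_ptrs[i];

        if (tp && strcmp(tp->name, name) == 0)
            return tp;
    }
    return NULL;
}
```

If the compiler drops the .hidden directive for the two bounds, they
become ordinary exported weak symbols, and a plugin's references can
then bind to another object's section instead of its own.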

HTH,
Paul

> 
> Thank you,
> Gerlando
> 
> On 05/27/2014 04:32 PM, Woegerer, Paul wrote:
>> Hi Martin, Hi Gerlando,
>>
>> this sounds a lot like the compiler bug I found recently in Yocto 1.6
>> (reproducible on ARM, x86 and PPC).
>>
>> The problem in my case is that the Yocto generated GCC cross-compiler
>> translates:
>>
>> extern struct tracepoint * const __start___tracepoints_ptrs[]
>>      __attribute__((weak, visibility("hidden")));
>> extern struct tracepoint * const __stop___tracepoints_ptrs[]
>>      __attribute__((weak, visibility("hidden")));
>>
>> incorrectly to assembly. For symbols that are declared with
>>
>> __attribute__((weak, visibility("hidden")));
>>
>> and that are also declared extern, the following lines are missing
>> from the generated assembly:
>>
>> .hidden __stop___tracepoints_ptrs
>> .hidden __start___tracepoints_ptrs
>>
>> This causes __stop___tracepoints_ptrs and __start___tracepoints_ptrs
>> to be treated as ordinary weak symbols instead of per-shared-object
>> weak symbols.
>> As a result, the linker resolves any such symbol with the first
>> definition it can see (it does not constrain itself to definitions
>> from within the same shared object). The net result is that only one
>> tracepoint provider gets activated (the first one the linker sees)
>> instead of all the tracepoint providers used in various source files.
>>
>> To fix this I use the following lttng-ust workaround (for now):
>>
>> diff --git a/include/lttng/tracepoint.h b/include/lttng/tracepoint.h
>> index 66e2abd..50cef26 100644
>> --- a/include/lttng/tracepoint.h
>> +++ b/include/lttng/tracepoint.h
>> @@ -313,9 +313,11 @@ __tracepoints__destroy(void)
>>    * (or for the whole main program).
>>    */
>>   extern struct tracepoint * const __start___tracepoints_ptrs[]
>> -       __attribute__((weak, visibility("hidden")));
>> +       __attribute__((weak));
>> +asm(".hidden __start___tracepoints_ptrs");
>>   extern struct tracepoint * const __stop___tracepoints_ptrs[]
>> -       __attribute__((weak, visibility("hidden")));
>> +       __attribute__((weak));
>> +asm(".hidden __stop___tracepoints_ptrs");
>>
>>   /*
>>    * When TRACEPOINT_PROBE_DYNAMIC_LINKAGE is defined, we do not emit a
>>
>>
>> Note that this issue is not reproducible with my GCC on host:
>> gcc version 4.8.1 20130909 [gcc-4_8-branch revision 202388] (SUSE Linux)
>> and also not with the latest Codebench 2014.05 ARM-Linux cross-toolchain.
>>
>> -- 
>> Best,
>> Paul
>>
>> On 05/27/2014 01:55 PM, Gerlando Falauto wrote:
>>> Hi Martin,
>>>
>>> I have been struggling for a while with this issue (see the whole
>>> thread):
>>>
>>> http://lists.lttng.org/pipermail/lttng-dev/2014-May/023035.html
>>>
>>> and landed on the same conclusions as yours (found your message by
>>> searching for __start___tracepoints_ptr!).
>>> So at least you're not alone!
>>>
>>> So, did you ever manage to get any of your questions answered:
>>>
>>>>> 1) Have you run into a problem like this? Is there a known
>>>>> fix/workaround?
>>>>> 2) __start___tracepoints_ptrs is declared as extern in tracepoint.h,
>>>>> but it is not defined. This appears to be some sort of undocumented
>>>>> linker magic.
>>>>> http://gcc.gnu.org/ml/gcc-help/2010-04/msg00120.html is the only
>>>>> reference I could find. Do you know where this behavior is
>>>>> documented or specified (if at all)?
>>>>> 3) Do you know why the symbol visibility for
>>>>> __start___tracepoints_ptrs changed between 4.6.4 and 4.7.2?
>>>
>>> Thank you so much!
>>> Gerlando
>>>
>>> BTW, I'm also running GCC 4.7.2 (lttng-ust is cross-compiled, the test
>>> application is natively compiled).
>>>
>>> On an x86_64 host running either GCC 4.4.6 or 4.4.7, the issue is not
>>> observed.
>>>
>>>
>>> On 04/30/2014 11:57 PM, Martin Ünsal wrote:
>>>> Incidentally I also asked for help on the GNU linker-specific part
>>>> (question 2) here:
>>>>
>>>> http://gcc.gnu.org/ml/gcc-help/2014-04/msg00164.html
>>>>
>>>> Martin
>>>>
>>>>
>>>> On Wed, Apr 30, 2014 at 2:21 PM, Martin Ünsal <martinunsal at gmail.com>
>>>> wrote:
>>>>> Hi LTTng folks
>>>>>
>>>>> I have a strange problem using LTTng-UST on an ARM based platform. I
>>>>> have done some diagnosis but I am running low on ideas and was hoping
>>>>> for help from the experts. I am using lttng-tools 2.2.0, lttng-ust
>>>>> 2.2.0, liburcu 0.8.1. I know these are old but upgrading is easier
>>>>> said than done unfortunately. I didn't see anything related to this
>>>>> problem in relnotes, mailing list traffic, or master branch, but I
>>>>> could have missed something.
>>>>>
>>>>> The problem showed up when I switched from GCC 4.6.4 to 4.7.2.
>>>>> Conceptually, the situation is that I have a single executable, call
>>>>> it MyProgram, with two plugins loaded at runtime with dlopen(), let's
>>>>> call them libPlugin1.so and libPlugin2.so. There are three different
>>>>> LTTng-UST tracepoint providers, one each for the executable and the
>>>>> two plugins. With GCC 4.7.2, tracepoints in libPlugin1 stopped
>>>>> working. The tracepoints in MyProgram and in libPlugin2 continue to
>>>>> work correctly.
>>>>>
>>>>> I have established without a doubt that the toolchain upgrade is the
>>>>> cause of the regression.
>>>>>
>>>>> In the debugger, I confirmed that the tracepoint for libPlugin1.so is
>>>>> being executed, but __tracepoint_##provider##___##name.state is
>>>>> always 0 even when I enable the tracepoint in lttng-tools. As a
>>>>> result the tracepoint callback is not being invoked when it should
>>>>> be. In MyProgram and libPlugin2.so, the .state variable correctly
>>>>> reflects whether the tracepoint is enabled, and if the tracepoint is
>>>>> enabled, the tracepoint callback is invoked.
>>>>>
>>>>> Next I set a breakpoint in tracepoint_register_lib() and looked at
>>>>> the tracepoints_start parameter.
>>>>>
>>>>> 1) With GCC 4.6.4 everything is as expected:
>>>>>      a) tracepoint_register_lib() for MyProgram called with
>>>>>         MyProgramProvider's __start___tracepoints_ptrs.
>>>>>      b) tracepoint_register_lib() after libPlugin1 dlopen() called
>>>>>         with libPlugin1Provider's __start___tracepoints_ptrs
>>>>>      c) tracepoint_register_lib() after libPlugin2 dlopen() called
>>>>>         with libPlugin2Provider's __start___tracepoints_ptrs
>>>>>
>>>>> 2) With GCC 4.7.2 there is a problem:
>>>>>      a) tracepoint_register_lib() for MyProgram called with
>>>>>         MyProgramProvider's __start___tracepoints_ptrs.
>>>>>      b) tracepoint_register_lib() after libPlugin1 dlopen() called
>>>>>         with MyProgramProvider's __start___tracepoints_ptrs
>>>>>         (!!!! THIS IS WRONG !!!!)
>>>>>      c) tracepoint_register_lib() after libPlugin2 dlopen() called
>>>>>         with libPlugin2Provider's __start___tracepoints_ptrs
>>>>>
>>>>> I looked at the symbol table for libPlugin1.so to see if it would
>>>>> shed some light on the problem.
>>>>>
>>>>> 1) With GCC 4.6.4:
>>>>> # objdump -t /usr/lib/.debug/libPlugin1.so | grep __start___tracepoints_ptrs
>>>>> 00025bb0 l       *ABS* 00000000 __start___tracepoints_ptrs
>>>>> # objdump -t /usr/lib/.debug/libPlugin2.so | grep __start___tracepoints_ptrs
>>>>> 00041eb4 l       *ABS* 00000000 __start___tracepoints_ptrs
>>>>>
>>>>> 2) With GCC 4.7.2:
>>>>> # objdump -t /usr/lib/.debug/libPlugin1.so | grep __start___tracepoints_ptrs
>>>>> 00025a90 g       __tracepoints_ptrs 00000000 __start___tracepoints_ptrs
>>>>> # objdump -t /usr/lib/.debug/libPlugin2.so | grep __start___tracepoints_ptrs
>>>>> 00041eb4 g       __tracepoints_ptrs 00000000 __start___tracepoints_ptrs
>>>>>
>>>>> My hypothesis at this point is that since __start___tracepoints_ptrs
>>>>> changed from a local to a global symbol, the dynamic loader no longer
>>>>> knows how to select the correct weak symbol. I cannot explain why
>>>>> libPlugin2 still loads its provider correctly, perhaps it is just
>>>>> getting lucky.
>>>>>
>>>>> A few questions come to mind...
>>>>> 1) Have you run into a problem like this? Is there a known
>>>>> fix/workaround?
>>>>> 2) __start___tracepoints_ptrs is declared as extern in tracepoint.h,
>>>>> but it is not defined. This appears to be some sort of undocumented
>>>>> linker magic.
>>>>> http://gcc.gnu.org/ml/gcc-help/2010-04/msg00120.html is the only
>>>>> reference I could find. Do you know where this behavior is
>>>>> documented or specified (if at all)?
>>>>> 3) Do you know why the symbol visibility for
>>>>> __start___tracepoints_ptrs changed between 4.6.4 and 4.7.2?
>>>>>
>>>>> Thanks for any help. This is a real puzzler for me.
>>>>>
>>>>> Martin
>>>>>
>>>>
>>>> _______________________________________________
>>>> lttng-dev mailing list
>>>> lttng-dev at lists.lttng.org
>>>> http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
>>>>
>>>
>>>
>>
>>
> 


-- 
Paul Woegerer, SW Development Engineer
Sourcery Analyzer <http://go.mentor.com/sourceryanalyzer>
Mentor Graphics, Embedded Software Division


