[lttng-dev] Problem with UST related to dlload
Woegerer, Paul
Paul_Woegerer at mentor.com
Wed May 28 07:05:47 EDT 2014
On 05/28/2014 12:38 PM, Woegerer, Paul wrote:
> It would be interesting if
> this treatment of hidden symbols is standardized or if this is just an
> implementation-specific behavior of GNU ld.
Maybe this link contains the answer to that question:
http://docs.oracle.com/cd/E19683-01/816-7529/chapter6-79797/index.html
At the explanation of STV_HIDDEN it says: "A hidden symbol contained in
a relocatable object is either removed or converted to STB_LOCAL binding
by the link-editor when the relocatable object is included in an
executable file or shared object."
Could it be that "the potential conversion to an STB_LOCAL at link-time"
has exactly the effect of not requiring a definition at link-time?
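
For readers following the thread, here is a minimal sketch of the declaration pattern under discussion. It is not LTTng-UST code: the struct, the section name demo_tp_ptrs and every demo_* identifier are hypothetical stand-ins. It assumes an ELF target with GCC or Clang and a GNU-compatible linker, which defines __start_<section> and __stop_<section> for any section whose name is a valid C identifier. The two boundary symbols are only declared (weak and hidden), never defined in C.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct tracepoint; not the LTTng-UST type. */
struct demo_tracepoint {
    const char *name;
    int state;
};

static struct demo_tracepoint demo_tp_a = { "demo:a", 0 };
static struct demo_tracepoint demo_tp_b = { "demo:b", 0 };

/* One pointer per tracepoint, collected in a custom section. The section
 * name is an assumption made for this sketch. */
static struct demo_tracepoint * const demo_tp_a_ptr
    __attribute__((used, section("demo_tp_ptrs"))) = &demo_tp_a;
static struct demo_tracepoint * const demo_tp_b_ptr
    __attribute__((used, section("demo_tp_ptrs"))) = &demo_tp_b;

/* Declared only. The linker is expected to provide the definitions as the
 * start and end addresses of the merged demo_tp_ptrs output section. */
extern struct demo_tracepoint * const __start_demo_tp_ptrs[]
    __attribute__((weak, visibility("hidden")));
extern struct demo_tracepoint * const __stop_demo_tp_ptrs[]
    __attribute__((weak, visibility("hidden")));

/* Number of pointers the linker gathered into the section. */
static size_t demo_tp_count(void)
{
    return (size_t)(__stop_demo_tp_ptrs - __start_demo_tp_ptrs);
}

/* Sanity check for this single-file sketch: two pointers were defined. */
static void demo_tp_check(void)
{
    assert(demo_tp_count() == 2);
}
```

Calling demo_tp_check() from main() exercises the sketch. Running "objdump -t" on the resulting binary, as done elsewhere in this thread, should show whether the boundary symbols ended up local or global with a given toolchain.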
--
Paul
>
> thanks,
> Paul
>
> On 05/28/2014 02:39 AM, Martin Ünsal wrote:
>> Gerlando, I agree. The __attribute__((weak)) is not strictly necessary
>> in this case and the problem can be worked around temporarily by
>> removing this attribute. The reason is that the
>> __start___tracepoints_ptrs and __stop___tracepoints_ptrs are only
>> being declared, not defined, at compilation time. There is no need for
>> a weak definition if they are not defined at all. In fact the
>> definition is provided automagically by the linker using weak
>> semantics (i.e. only one definition per ELF binary, shared by all
>> declarations in all compilation units) regardless of the presence or
>> absence of the weak attribute. Since __start___tracepoints_ptrs is defined
>> by the linker as the starting address of the __tracepoints_ptrs
>> section, it would be impossible for it to have anything other than
>> weak semantics, because it is nonsensical for different object files
>> in the same ELF binary to have different addresses for the same
>> executable section.
>>
>> Although removing __attribute__((weak)) is successful as a workaround,
>> I would not recommend upstreaming it. Since these symbols have weak
>> semantics, they should have weak declarations. Removing this attribute
>> could cause a lot of confusion for people reading the code.
>>
>> I haven't tried Paul's patch, but it also seems like a reasonable local
>> workaround rather than the sort of thing to upstream.
>>
>> For a long term fix, in my opinion, Yocto/OpenEmbedded needs to fix
>> their compiler patches.
>>
>> Martin
>>
>>
>>
>> On Tue, May 27, 2014 at 9:04 AM, Gerlando Falauto
>> <gerlando.falauto at keymile.com> wrote:
>>
>> Hi Paul,
>>
>> thanks for your explanation, but I'm more puzzled than ever.
>> I'm definitely lacking the appropriate background in both
>> terminology and internals, so I tried to figure out how the whole
>> magic works by empirical testing.
>>
>> Now, when you say:
>>
>>
>> > The reason is that you can have the same tracepoint provider be USED in
>> > several compilation units that will all become part of one and the same
>> > shared object (or executable).
>> >
>> > Then all those __start/stop___tracepoints_ptrs references in different
>> > compilation units should refer to the same
>> > __start/stop___tracepoints_ptrs definitions for the shared object (or
>> > executable) they are part of. This is required because the
>> > initialization of the tracepoints will only happen once per shared
>> > object (or executable) with the static ctor mechanism also defined in
>> > tracepoint.h
>>
>> Who's responsible for initializing the tracepoints? Isn't it the
>> PROVIDER, instead of the user?
>>
>> Here's what I understood (or rather, speculated!), so please point
>> out where my understanding falls short.
>>
>> Tracepoint providers (where TRACEPOINT_DEFINE is defined) are what
>> actually implement tracepoints. You can have multiple source
>> files, each defining one or more tracepoints. So in the end each
>> object file will contain one or more tracepoint pointers within
>> its "__tracepoints_ptrs" section (courtesy of the compiler). When
>> linking (e.g. towards a shared object), a single section
>> __tracepoints_ptrs in the output ELF binary will merge all the
>> sections of the above objects, and hold all the pointers as a
>> contiguous array. This happens courtesy of the linker, which also
>> automagically defines the __start___tracepoints_ptrs /
>> __stop___tracepoints_ptrs symbols to point to the
>> beginning and end of the section.
>>
>> Each object file will contain its own __tracepoints__ptrs_init()
>> constructor, responsible for registering ALL the tracepoints it
>> provides. Actually, we want only ONE constructor per shared object
>> to register all the tracepoint pointers provided by the whole
>> shared object (contained within
>> __start___tracepoints_ptrs/__stop___tracepoints_ptrs). This is
>> where, for instance, __tracepoint_ptrs_registered comes into play.
>> Multiple invocations of the constructor (one per object file)
>> should be avoided and only the first one needs to be performed.
>> And this is why __tracepoint_ptrs_registered needs to be weak
>> (multiple source files could lead to multiple definitions -- we
>> want one and only one per shared object) *AND* hidden (each shared
>> object should have its own copy).
>> If I remove the weak attribute from __tracepoint_ptrs_registered,
>> the linker starts screaming as soon as I compile one of the examples.
>>
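
The "register only once per shared object" guard described above can be sketched as follows. This is an illustration under stated assumptions, not the code from tracepoint.h: all demo_* names are hypothetical, and the two constructors in one file merely stand in for the per-object-file constructors that would normally live in separate compilation units. It relies on the GCC/Clang weak, visibility and constructor attributes.

```c
#include <assert.h>

/* Weak: every object file may carry this definition and the linker keeps one.
 * Hidden: the surviving copy stays private to this shared object or
 * executable. The name is hypothetical. */
int demo_ptrs_registered __attribute__((weak, visibility("hidden")));

/* Counts how often the stand-in registration actually ran. */
static int demo_register_calls;

/* Stand-in for the real library registration call. */
static void demo_register_lib(void)
{
    demo_register_calls++;
}

/* Body shared by every per-object-file constructor. */
static void demo_ptrs_init_common(void)
{
    if (demo_ptrs_registered++)
        return; /* another object file in this binary already registered */
    demo_register_lib();
}

/* Two constructors stand in for two object files linked into one binary. */
static void __attribute__((constructor)) demo_ptrs_init_file1(void)
{
    demo_ptrs_init_common();
}

static void __attribute__((constructor)) demo_ptrs_init_file2(void)
{
    demo_ptrs_init_common();
}

/* Check that registration ran exactly once although two constructors ran. */
static void demo_check_registered_once(void)
{
    assert(demo_register_calls == 1);
    assert(demo_ptrs_registered == 2);
}
```

Both constructors run before main(), but only the first one to run reaches demo_register_lib(). If the counter were not weak, two object files defining it would collide at link time; if it were not hidden, separate shared objects could end up sharing one counter.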
>> On the other hand,
>> __start___tracepoints_ptrs/__stop___tracepoints_ptrs are generated
>> by the linker (or so I want to believe!) so only one instance is
>> emitted.
>> Keeping them hidden prevents the name clash during dynamic
>> linking, as the symbol will not be visible from other shared
>> objects or binaries.
>> But I don't see why they should also be weak.
>>
>> As a matter of fact, removing the weak attribute seems to fix my
>> problem (as far as I could test).
>> What am I missing?
>>
>> Thank you again for your patience,
>> Gerlando
>>
>>
>> On 05/27/2014 04:58 PM, Woegerer, Paul wrote:
>>
>> On 05/27/2014 04:41 PM, Gerlando Falauto wrote:
>>
>> Hi Paul,
>>
>> thank you very much for sharing this.
>>
>> I had in the meantime run into the same suggestion by
>> Henrik Wallin on a thread opened by Martin
>> (https://gcc.gnu.org/ml/gcc-help/2014-05/msg00028.html).
>> Further updates from Martin also suggest the issue is rather related to
>> the OpenEmbedded toolchain.
>>
>> I was about to post the "opposite" of your patch, as I don't see the
>> need to have those symbols as weak instead. In the end, doesn't weak
>> only allow for a further re-definition? In this case we're only
>> declaring it as extern, aren't we?
>> Definition actually happens by magic, as far as I can tell.
>> But please correct me if I got it all wrong.
>>
>>
>> It's more complicated.
>>
>> You absolutely need those symbols to be declared as:
>>
>> .weak __start___tracepoints_ptrs
>> .weak __stop___tracepoints_ptrs
>>
>> *and*
>>
>> .hidden __start___tracepoints_ptrs
>> .hidden __stop___tracepoints_ptrs
>>
>> The reason is that you can have the same tracepoint provider be USED in
>> several compilation units that will all become part of one and the same
>> shared object (or executable).
>>
>> Then all those __start/stop___tracepoints_ptrs references in different
>> compilation units should refer to the same
>> __start/stop___tracepoints_ptrs definitions for the shared object (or
>> executable) they are part of. This is required because the
>> initialization of the tracepoints will only happen once per shared
>> object (or executable) with the static ctor mechanism also defined in
>> tracepoint.h
>>
>> HTH,
>> Paul
>>
>>
>> Thank you,
>> Gerlando
>>
>> On 05/27/2014 04:32 PM, Woegerer, Paul wrote:
>>
>> Hi Martin, Hi Gerlando,
>>
>> this sounds a lot like the compiler bug I found recently in Yocto 1.6
>> (reproducible on ARM, x86 and PPC)
>>
>> The problem in my case is that the Yocto generated GCC cross-compiler
>> translates:
>>
>> extern struct tracepoint * const __start___tracepoints_ptrs[]
>>     __attribute__((weak, visibility("hidden")));
>> extern struct tracepoint * const __stop___tracepoints_ptrs[]
>>     __attribute__((weak, visibility("hidden")));
>>
>> incorrectly to assembly. For these symbols that are declared with
>>
>> __attribute__((weak, visibility("hidden")));
>>
>> that are also defined to be external, in the assembly the following
>> lines are missing:
>>
>> .hidden __stop___tracepoints_ptrs
>> .hidden __start___tracepoints_ptrs
>>
>> This causes __stop___tracepoints_ptrs and __start___tracepoints_ptrs
>> to be further treated as ordinary weak symbols instead of
>> per-shared-object weak symbols.
>> That further will cause the linker to resolve any such symbols with
>> the first definition of those symbols that it can see (it will not
>> constrain itself to only consider definitions from within the same
>> shared object). The net result is that only one tracepoint provider
>> gets activated (the first one the linker sees) instead of all the
>> tracepoint providers used in various source files.
>>
>> To fix this I use the following lttng-ust workaround (for now):
>>
>> diff --git a/include/lttng/tracepoint.h b/include/lttng/tracepoint.h
>> index 66e2abd..50cef26 100644
>> --- a/include/lttng/tracepoint.h
>> +++ b/include/lttng/tracepoint.h
>> @@ -313,9 +313,11 @@ __tracepoints__destroy(void)
>>   * (or for the whole main program).
>>   */
>>  extern struct tracepoint * const __start___tracepoints_ptrs[]
>> -    __attribute__((weak, visibility("hidden")));
>> +    __attribute__((weak));
>> +asm(".hidden __start___tracepoints_ptrs");
>>  extern struct tracepoint * const __stop___tracepoints_ptrs[]
>> -    __attribute__((weak, visibility("hidden")));
>> +    __attribute__((weak));
>> +asm(".hidden __stop___tracepoints_ptrs");
>>
>>  /*
>>   * When TRACEPOINT_PROBE_DYNAMIC_LINKAGE is defined, we do not emit a
>>
>>
>> Note that this issue is not reproducible with my GCC on host:
>> gcc version 4.8.1 20130909 [gcc-4_8-branch revision 202388] (SUSE Linux)
>> and also not with the latest Codebench 2014.05 ARM-Linux cross-toolchain.
>>
>> --
>> Best,
>> Paul
>>
>> On 05/27/2014 01:55 PM, Gerlando Falauto wrote:
>>
>> Hi Martin,
>>
>> I have been struggling for a while with this issue (see the whole
>> thread):
>>
>> http://lists.lttng.org/pipermail/lttng-dev/2014-May/023035.html
>>
>> and landed on the same conclusions as yours (found your message by
>> searching for __start___tracepoints_ptr!).
>> So at least you're not alone!
>>
>> So, did you ever manage to get any of your questions answered:
>>
>> > 1) Have you run into a problem like this? Is there a known
>> > fix/workaround?
>> > 2) __start___tracepoints_ptrs is declared as extern in tracepoint.h,
>> > but it is not defined. This appears to be some sort of undocumented
>> > linker magic. http://gcc.gnu.org/ml/gcc-help/2010-04/msg00120.html
>> > is the only reference I could find. Do you know where this behavior
>> > is documented or specified (if at all)?
>> > 3) Do you know why the symbol visibility for
>> > __start___tracepoints_ptrs changed between 4.6.4 and 4.7.2?
>>
>>
>> Thank you so much!
>> Gerlando
>>
>> BTW, I'm also running GCC 4.7.2 (lttng-ust is cross-compiled, the test
>> application is natively compiled).
>>
>> On an x86_64 host running either GCC 4.4.6 or 4.4.7, the issue is not
>> observed.
>>
>>
>> On 04/30/2014 11:57 PM, Martin Ünsal wrote:
>>
>> Incidentally I also asked for help on the GNU linker-specific part
>> (question 2) here:
>>
>> http://gcc.gnu.org/ml/gcc-help/2014-04/msg00164.html
>>
>> Martin
>>
>>
>> On Wed, Apr 30, 2014 at 2:21 PM, Martin Ünsal
>> <martinunsal at gmail.com> wrote:
>>
>> Hi LTTng folks
>>
>> I have a strange problem using LTTng-UST on an ARM based platform. I
>> have done some diagnosis but I am running low on ideas and was hoping
>> for help from the experts. I am using lttng-tools 2.2.0, lttng-ust
>> 2.2.0, liburcu 0.8.1. I know these are old but upgrading is easier said
>> than done unfortunately. I didn't see anything related to this problem
>> in relnotes, mailing list traffic, or master branch, but I could have
>> missed something.
>>
>> The problem showed up when I switched from GCC 4.6.4 to 4.7.2.
>> Conceptually, the situation is that I have a single executable, call it
>> MyProgram, with two plugins loaded at runtime with dlopen(), let's call
>> them libPlugin1.so and libPlugin2.so. There are three different
>> LTTng-UST tracepoint providers, one each for the executable and the two
>> plugins. With GCC 4.7.2, tracepoints in libPlugin1 stopped working. The
>> tracepoints in MyProgram and in libPlugin2 continue to work correctly.
>>
>> I have established without a doubt that the toolchain upgrade is the
>> cause of the regression.
>>
>> In the debugger, I confirmed that the tracepoint for libPlugin1.so is
>> being executed, but __tracepoint_##provider##___##name.state is always 0
>> even when I enable the tracepoint in lttng-tools. As a result the
>> tracepoint callback is not being invoked when it should be. In
>> MyProgram and libPlugin2.so, the .state variable correctly reflects
>> whether the tracepoint is enabled, and if the tracepoint is enabled,
>> the tracepoint callback is invoked.
>>
>> Next I set a breakpoint in tracepoint_register_lib() and looked at
>> the tracepoints_start parameter.
>>
>> 1) With GCC 4.6.4 everything is as expected:
>>    a) tracepoint_register_lib() for MyProgram called with
>>       MyProgramProvider's __start___tracepoints_ptrs.
>>    b) tracepoint_register_lib() after libPlugin1 dlopen() called with
>>       libPlugin1Provider's __start___tracepoints_ptrs
>>    c) tracepoint_register_lib() after libPlugin2 dlopen() called with
>>       libPlugin2Provider's __start___tracepoints_ptrs
>>
>> 2) With GCC 4.7.2 there is a problem:
>>    a) tracepoint_register_lib() for MyProgram called with
>>       MyProgramProvider's __start___tracepoints_ptrs.
>>    b) tracepoint_register_lib() after libPlugin1 dlopen() called with
>>       MyProgramProvider's __start___tracepoints_ptrs
>>       (!!!! THIS IS WRONG !!!!)
>>    c) tracepoint_register_lib() after libPlugin2 dlopen() called with
>>       libPlugin2Provider's __start___tracepoints_ptrs
>>
>> I looked at the symbol table for libPlugin1.so to see if it would shed
>> some light on the problem.
>>
>> 1) With GCC 4.6.4:
>> # objdump -t /usr/lib/.debug/libPlugin1.so | grep __start___tracepoints_ptrs
>> 00025bb0 l *ABS* 00000000 __start___tracepoints_ptrs
>> # objdump -t /usr/lib/.debug/libPlugin2.so | grep __start___tracepoints_ptrs
>> 00041eb4 l *ABS* 00000000 __start___tracepoints_ptrs
>>
>> 2) With GCC 4.7.2:
>> # objdump -t /usr/lib/.debug/libPlugin1.so | grep __start___tracepoints_ptrs
>> 00025a90 g __tracepoints_ptrs 00000000 __start___tracepoints_ptrs
>> # objdump -t /usr/lib/.debug/libPlugin2.so | grep __start___tracepoints_ptrs
>> 00041eb4 g __tracepoints_ptrs 00000000 __start___tracepoints_ptrs
>>
>> My hypothesis at this point is that since __start___tracepoints_ptrs
>> changed from a local to a global symbol, the dynamic loader no longer
>> knows how to select the correct weak symbol. I cannot explain why
>> libPlugin2 still loads its provider correctly, perhaps it is just
>> getting lucky.
>>
>> A few questions come to mind...
>> 1) Have you run into a problem like this? Is there a known
>> fix/workaround?
>> 2) __start___tracepoints_ptrs is declared as extern in tracepoint.h,
>> but it is not defined. This appears to be some sort of undocumented
>> linker magic. http://gcc.gnu.org/ml/gcc-help/2010-04/msg00120.html
>> is the only reference I could find. Do you know where this behavior
>> is documented or specified (if at all)?
>> 3) Do you know why the symbol visibility for
>> __start___tracepoints_ptrs changed between 4.6.4 and 4.7.2?
>>
>> Thanks for any help. This is a real puzzler for me.
>>
>> Martin
>>
>>
>> _______________________________________________
>> lttng-dev mailing list
>> lttng-dev at lists.lttng.org
>> http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
>
--
Paul Woegerer, SW Development Engineer
Sourcery Analyzer <http://go.mentor.com/sourceryanalyzer>
Mentor Graphics, Embedded Software Division