[lttng-dev] Problem with UST related to dlload

Woegerer, Paul Paul_Woegerer at mentor.com
Wed May 28 07:05:47 EDT 2014


On 05/28/2014 12:38 PM, Woegerer, Paul wrote:

> It would be interesting to know whether
> this treatment of hidden symbols is standardized or whether it is just
> implementation-specific behavior of GNU ld.

Maybe this link contains the answer to that question:
http://docs.oracle.com/cd/E19683-01/816-7529/chapter6-79797/index.html
At the explanation of STV_HIDDEN it says: "A hidden symbol contained in
a relocatable object is either removed or converted to STB_LOCAL binding
by the link-editor when the relocatable object is included in an
executable file or shared object."
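The rule quoted from the linker guide can be written as a small C function that uses the ELF constants from <elf.h>. This is only a sketch. output_binding and output_binding_examples are hypothetical helpers that encode that one sentence. They are not GNU ld's actual implementation, and they do not model the case where the link-editor removes the symbol instead of converting it.

```c
#include <assert.h>
#include <elf.h>

/*
 * Hypothetical model of the quoted rule: when a relocatable object is
 * included in an executable or shared object, a symbol with STV_HIDDEN
 * visibility is converted to STB_LOCAL binding (or removed). Any other
 * symbol keeps its original binding. Returns the binding the symbol would
 * have in the output file, assuming it is kept.
 */
static unsigned char output_binding(unsigned char st_info,
                                    unsigned char st_other)
{
    if (ELF64_ST_VISIBILITY(st_other) == STV_HIDDEN)
        return STB_LOCAL;
    return (unsigned char)ELF64_ST_BIND(st_info);
}

/* Usage: a weak symbol with and without the .hidden directive. */
static void output_binding_examples(void)
{
    /* weak + hidden, as tracepoint.h intends: becomes local */
    assert(output_binding(ELF64_ST_INFO(STB_WEAK, STT_NOTYPE), STV_HIDDEN)
           == STB_LOCAL);
    /* weak without .hidden: nothing converts it to STB_LOCAL */
    assert(output_binding(ELF64_ST_INFO(STB_WEAK, STT_NOTYPE), STV_DEFAULT)
           == STB_WEAK);
}
```

Under this model, the objdump output later in the thread fits the missing .hidden directive. It shows "l" with GCC 4.6.4 and "g" with GCC 4.7.2. Without .hidden, nothing forces the symbol to STB_LOCAL.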

Could it be that "the potential conversion to an STB_LOCAL at link-time"
has exactly the effect of not requiring a definition at link-time?

--
Paul

>
> thanks,
> Paul
>
> On 05/28/2014 02:39 AM, Martin Ünsal wrote:
>> Gerlando, I agree. The __attribute__((weak)) is not strictly necessary
>> in this case and the problem can be worked around temporarily by
>> removing this attribute. The reason is that the
>> __start___tracepoints_ptrs and __stop___tracepoints_ptrs are only
>> being declared, not defined, at compilation time. There is no need for
>> a weak definition if they are not defined at all. In fact the
>> definition is provided automagically by the linker using weak
>> semantics (i.e. only one definition per ELF binary, shared by all
>> declarations in all compilation units) regardless of the presence or
>> absence of weak attribute. Since __start___tracepoints_ptrs is defined
>> by the linker as the starting address of the __tracepoints_ptrs
>> section, it would be impossible for it to have anything other than
>> weak semantics, because it is nonsensical for different object files
>> in the same ELF binary to have different addresses for the same
>> executable section.
>>
>> Although removing __attribute__((weak)) is successful as a workaround,
>> I would not recommend upstreaming it. Since these symbols have weak
>> semantics, they should have weak declarations. Removing this attribute
>> could cause a lot of confusion for people reading the code.
>>
>> I haven't tried Paul's patch, but it also seems like a reasonable local
>> workaround, though not the sort of thing to upstream.
>>
>> For a long term fix, in my opinion, Yocto/OpenEmbedded needs to fix
>> their compiler patches.
>>
>> Martin
>>
>>
>>
>> On Tue, May 27, 2014 at 9:04 AM, Gerlando Falauto
>> <gerlando.falauto at keymile.com <mailto:gerlando.falauto at keymile.com>>
>> wrote:
>>
>>     Hi Paul,
>>
>>     thanks for your explanation, but I'm more puzzled than ever.
>>     I'm definitely lacking the appropriate background in both
>>     terminology and internals, so I tried to figure out how the whole
>>     magic works by empirical testing.
>>
>>     Now, when you say:
>>
>>
>>     > The reason is that you can have the same tracepoint provider be
>>     > USED in several compilation units that will all become part of
>>     > one and the same shared object (or executable).
>>     >
>>     > Then all those __start/stop___tracepoints_ptrs references in
>>     > different compilation units should refer to the same
>>     > __start/stop___tracepoints_ptrs definitions for the shared
>>     > object (or executable) they are part of. This is required
>>     > because the initialization of the tracepoints will only happen
>>     > once per shared object (or executable) with the static ctor
>>     > mechanism also defined in tracepoint.h
>>
>>     Who's responsible for initializing the tracepoints? Isn't it the
>>     PROVIDER, instead of the user?
>>
>>     Here's what I understood (or rather, speculated!), so please point
>>     out where my understanding falls short.
>>
>>     Tracepoint providers (where TRACEPOINT_DEFINE is defined) are what
>>     actually implement tracepoints. You can have multiple source
>>     files, each defining one or more tracepoints. So in the end each
>>     object file will contain one or more tracepoint pointers within
>>     its "__tracepoints_ptrs" section (courtesy of the compiler). When
>>     linking (e.g. into a shared object), the linker merges all those
>>     sections into a single __tracepoints_ptrs section in the output
>>     ELF binary, which holds all the pointers as a contiguous array.
>>     This time the credit goes to the linker, which also automagically
>>     defines the __start___tracepoints_ptrs / __stop___tracepoints_ptrs
>>     symbols to mark the beginning and end of the section.
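The section-merging mechanism described above can be reproduced in one C file. This is a minimal sketch that assumes GCC and GNU ld on an ELF target. The section name demo_tracepoints_ptrs and all demo_* identifiers are hypothetical stand-ins for the __tracepoints_ptrs machinery in lttng-ust. The GNU ld manual states that when an output section's name is a valid C identifier, the linker automatically PROVIDEs __start_SECNAME and __stop_SECNAME. That is where the otherwise undefined __start___tracepoints_ptrs comes from.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for lttng-ust's struct tracepoint. */
struct demo_tracepoint {
    const char *name;
    int state;
};

static struct demo_tracepoint demo_tp_a = { "demo:a", 0 };
static struct demo_tracepoint demo_tp_b = { "demo:b", 0 };

/* Each definition places one pointer into the same named section. The
 * linker concatenates all input sections of that name into one array. */
static struct demo_tracepoint *const demo_tp_a_ptr
    __attribute__((used, section("demo_tracepoints_ptrs"))) = &demo_tp_a;
static struct demo_tracepoint *const demo_tp_b_ptr
    __attribute__((used, section("demo_tracepoints_ptrs"))) = &demo_tp_b;

/* Never defined in C: GNU ld provides these because the section name is a
 * valid C identifier, the section exists, and the symbols are referenced. */
extern struct demo_tracepoint *const __start_demo_tracepoints_ptrs[]
    __attribute__((weak, visibility("hidden")));
extern struct demo_tracepoint *const __stop_demo_tracepoints_ptrs[]
    __attribute__((weak, visibility("hidden")));

/* Number of tracepoint pointers collected in this executable or DSO. */
static size_t demo_tracepoint_count(void)
{
    assert(__start_demo_tracepoints_ptrs <= __stop_demo_tracepoints_ptrs);
    return (size_t)(__stop_demo_tracepoints_ptrs -
                    __start_demo_tracepoints_ptrs);
}
```

The order of the pointers inside the merged section follows the link order, so code that walks the array should not depend on any particular order.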
>>
>>     Each object file will contain its own __tracepoints__ptrs_init()
>>     constructor, responsible for registering ALL the tracepoints it
>>     provides. Actually, we want only ONE constructor per shared object
>>     to register all the tracepoint pointers provided by the whole
>>     shared object (contained within
>>     __start___tracepoints_ptrs/__stop___tracepoints_ptrs). This is
>>     where, for instance, __tracepoint_ptrs_registered comes into play.
>>     Multiple invocations of the constructor (one per object file)
>>     should be avoided and only the first one needs to be performed.
>>     And this is why __tracepoint_ptrs_registered needs to be weak
>>     (multiple source files could lead to multiple definitions -- we
>>     want one and only one per shared object) *AND* hidden (each shared
>>     object should have its own copy).
>>     If I remove the weak attribute from __tracepoint_ptrs_registered,
>>     the linker starts screaming as soon as I compile one of the examples.
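The run-once guard described above can be sketched the same way. This assumes GCC's constructor, weak and visibility attributes. demo_ptrs_registered stands in for __tracepoint_ptrs_registered in lttng-ust, and the two constructors simulate the copies that would normally come from two different object files. Everything here is illustrative, not the actual lttng-ust code.

```c
#include <assert.h>

/* Weak so that every object file may carry a definition and the linker
 * keeps one; hidden so that each executable or DSO has its own copy. */
int demo_ptrs_registered __attribute__((weak, visibility("hidden")));

/* Counts how often the (hypothetical) registration really happened. */
static int demo_register_calls;

static void demo_register_all(void)
{
    /* The guard in the constructors must make this run exactly once. */
    assert(demo_register_calls == 0);
    demo_register_calls++;
}

/* Stand-ins for the constructors that a header would emit in every
 * compilation unit that includes it. */
static void __attribute__((constructor)) demo_ptrs_init_unit1(void)
{
    if (demo_ptrs_registered++)
        return;
    demo_register_all();
}

static void __attribute__((constructor)) demo_ptrs_init_unit2(void)
{
    if (demo_ptrs_registered++)
        return;
    demo_register_all();
}
```

Both constructors increment the same counter, so only the first one to run performs the registration. With two separate object files, the weak attribute is what lets the linker merge their definitions into one instead of reporting a multiple-definition error.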
>>
>>     On the other hand,
>>     __start___tracepoints_ptrs/__stop___tracepoints_ptrs are generated
>>     by the linker (or so I want to believe!) so only one instance is
>>     emitted.
>>     Keeping them hidden prevents the name clash during dynamic
>>     linking, as the symbol will not be visible from other shared
>>     objects or binaries.
>>     But I don't see why they should also be weak.
>>
>>     As a matter of fact, removing the weak attribute seems to fix my
>>     problem (as far as I could test).
>>     What am I missing?
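The weak attribute on the __start/__stop declarations matters in at least one situation. This is an assumption about the motivation in lttng-ust, not something stated in this thread. If an executable or shared object ends up with no section of that name, GNU ld provides no __start/__stop symbols, and a hidden non-weak reference would fail to link. A weak reference resolves to address 0 instead, so the computed array is simply empty. The demo_* names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct demo_tracepoint;

/* Nothing in this program is placed in "demo_absent_ptrs", so the linker
 * has no section to provide these symbols for. Because the references are
 * weak, the link still succeeds and both symbols resolve to address 0. */
extern struct demo_tracepoint *const __start_demo_absent_ptrs[]
    __attribute__((weak, visibility("hidden")));
extern struct demo_tracepoint *const __stop_demo_absent_ptrs[]
    __attribute__((weak, visibility("hidden")));

/* Number of pointers between the two symbols: 0 when the section is absent. */
static size_t demo_absent_count(void)
{
    assert(__start_demo_absent_ptrs <= __stop_demo_absent_ptrs);
    return (size_t)(__stop_demo_absent_ptrs - __start_demo_absent_ptrs);
}
```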
>>
>>     Thank you again for your patience,
>>     Gerlando
>>
>>
>>     On 05/27/2014 04:58 PM, Woegerer, Paul wrote:
>>
>>         On 05/27/2014 04:41 PM, Gerlando Falauto wrote:
>>
>>             Hi Paul,
>>
>>             thank you very much for sharing this.
>>
>>             I had in the meantime run into the same suggestion by
>>             Henrik Wallin on a thread opened by Martin
>>             (https://gcc.gnu.org/ml/gcc-help/2014-05/msg00028.html).
>>             Further updates from Martin also suggest the issue is
>>             rather related to the OpenEmbedded toolchain.
>>
>>             I was about to post the "opposite" of your patch, as I
>>             don't see the need to have those symbols as weak. In the
>>             end, doesn't weak only allow for a further re-definition?
>>             In this case we're only declaring them as extern, aren't
>>             we? Definition actually happens by magic, as far as I can
>>             tell. But please correct me if I got it all wrong.
>>
>>
>>         It's more complicated.
>>
>>         You absolutely need those symbols to be declared as:
>>
>>              .weak   __start___tracepoints_ptrs
>>              .weak   __stop___tracepoints_ptrs
>>
>>         *and*
>>
>>              .hidden __start___tracepoints_ptrs
>>              .hidden __stop___tracepoints_ptrs
>>
>>         The reason is that you can have the same tracepoint provider
>>         be USED in several compilation units that will all become
>>         part of one and the same shared object (or executable).
>>
>>         Then all those __start/stop___tracepoints_ptrs references in
>>         different compilation units should refer to the same
>>         __start/stop___tracepoints_ptrs definitions for the shared
>>         object (or executable) they are part of. This is required
>>         because the initialization of the tracepoints will only
>>         happen once per shared object (or executable) with the
>>         static ctor mechanism also defined in tracepoint.h.
>>
>>         HTH,
>>         Paul
>>
>>
>>             Thank you,
>>             Gerlando
>>
>>             On 05/27/2014 04:32 PM, Woegerer, Paul wrote:
>>
>>                 Hi Martin, Hi Gerlando,
>>
>>                 this sounds a lot like the compiler bug I found
>>                 recently in Yocto 1.6 (reproducible on ARM, x86 and
>>                 PPC).
>>
>>                 The problem in my case is that the Yocto-generated GCC
>>                 cross-compiler translates:
>>
>>                 extern struct tracepoint * const __start___tracepoints_ptrs[]
>>                       __attribute__((weak, visibility("hidden")));
>>                 extern struct tracepoint * const __stop___tracepoints_ptrs[]
>>                       __attribute__((weak, visibility("hidden")));
>>
>>                 incorrectly to assembly. For these symbols, which are
>>                 declared extern with
>>
>>                 __attribute__((weak, visibility("hidden")));
>>
>>                 the following lines are missing from the assembly:
>>
>>                 .hidden __stop___tracepoints_ptrs
>>                 .hidden __start___tracepoints_ptrs
>>
>>                 This causes __stop___tracepoints_ptrs and
>>                 __start___tracepoints_ptrs to be treated as ordinary
>>                 weak symbols instead of per-shared-object weak symbols.
>>                 This in turn causes the linker to resolve any such
>>                 symbol with the first definition of it that it can see
>>                 (it will not restrict itself to definitions from within
>>                 the same shared object). The net result is that only
>>                 one tracepoint provider gets activated (the first one
>>                 the linker sees) instead of all the tracepoint
>>                 providers used in various source files.
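The effect Paul describes can be illustrated with a toy model of symbol resolution. This is a deliberately simplified sketch, not how the dynamic loader is implemented. demo_module and demo_resolve_start are hypothetical. The model assumes that every module exports the symbol unless it is hidden, and that the search order is simply array order with the executable first.

```c
#include <assert.h>
#include <stddef.h>

/* One loaded ELF module in the toy model (hypothetical). */
struct demo_module {
    const char *name;
    unsigned long start_ptrs; /* address of its own __tracepoints_ptrs */
    int start_is_hidden;      /* 1 if __start___tracepoints_ptrs was
                                 hidden, i.e. bound by the static linker */
};

/*
 * Returns the value that the reference to __start___tracepoints_ptrs made
 * from modules[self] ends up with. A hidden symbol was already bound to
 * the module's own section at link time. A non-hidden one is looked up in
 * search order (modules[0] first), and the first exporting module wins.
 */
static unsigned long demo_resolve_start(const struct demo_module *modules,
                                        size_t count, size_t self)
{
    assert(self < count);
    if (modules[self].start_is_hidden)
        return modules[self].start_ptrs;
    for (size_t i = 0; i < count; i++)
        if (!modules[i].start_is_hidden)
            return modules[i].start_ptrs;
    return 0; /* not reached: modules[self] itself exports the symbol */
}
```

When both symbols are hidden, libPlugin1.so resolves to its own section. Without .hidden, it resolves to MyProgram's section, which matches what Martin saw in tracepoint_register_lib() with GCC 4.7.2.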
>>
>>                 To fix this I use the following lttng-ust workaround
>>                 (for now):
>>
>>                 diff --git a/include/lttng/tracepoint.h b/include/lttng/tracepoint.h
>>                 index 66e2abd..50cef26 100644
>>                 --- a/include/lttng/tracepoint.h
>>                 +++ b/include/lttng/tracepoint.h
>>                 @@ -313,9 +313,11 @@ __tracepoints__destroy(void)
>>                     * (or for the whole main program).
>>                     */
>>                    extern struct tracepoint * const __start___tracepoints_ptrs[]
>>                 -       __attribute__((weak, visibility("hidden")));
>>                 +       __attribute__((weak));
>>                 +asm(".hidden __start___tracepoints_ptrs");
>>                    extern struct tracepoint * const __stop___tracepoints_ptrs[]
>>                 -       __attribute__((weak, visibility("hidden")));
>>                 +       __attribute__((weak));
>>                 +asm(".hidden __stop___tracepoints_ptrs");
>>
>>                    /*
>>                     * When TRACEPOINT_PROBE_DYNAMIC_LINKAGE is defined, we do not emit a
>>
>>
>>                 Note that this issue is not reproducible with the GCC on
>>                 my host:
>>                 gcc version 4.8.1 20130909 [gcc-4_8-branch revision 202388] (SUSE Linux)
>>                 nor with the latest Codebench 2014.05 ARM-Linux
>>                 cross-toolchain.
>>
>>                 --
>>                 Best,
>>                 Paul
>>
>>                 On 05/27/2014 01:55 PM, Gerlando Falauto wrote:
>>
>>                     Hi Martin,
>>
>>                     I have been struggling for a while with this issue
>>                     (see the whole thread):
>>
>>                     http://lists.lttng.org/pipermail/lttng-dev/2014-May/023035.html
>>
>>                     and reached the same conclusions as you (I found
>>                     your message by searching for __start___tracepoint_ptr!).
>>                     So at least you're not alone!
>>
>>                     So, did you ever manage to get any of your
>>                     questions answered:
>>
>>                             1) Have you run into a problem like this?
>>                             Is there a known fix/workaround?
>>                             2) __start___tracepoints_ptrs is declared
>>                             as extern in tracepoint.h, but it is not
>>                             defined. This appears to be some sort of
>>                             undocumented linker magic.
>>                             http://gcc.gnu.org/ml/gcc-help/2010-04/msg00120.html
>>                             is the only reference I could find. Do you
>>                             know where this behavior is documented or
>>                             specified (if at all)?
>>                             3) Do you know why the symbol visibility for
>>                             __start___tracepoints_ptrs changed between
>>                             4.6.4 and 4.7.2?
>>
>>
>>                     Thank you so much!
>>                     Gerlando
>>
>>                     BTW, I'm also running GCC 4.7.2 (lttng-ust is
>>                     cross-compiled, the test application is natively
>>                     compiled).
>>
>>                     On an x86_64 host running either GCC 4.4.6 or
>>                     4.4.7, the issue is not observed.
>>
>>
>>                     On 04/30/2014 11:57 PM, Martin Ünsal wrote:
>>
>>                         Incidentally, I also asked for help on the GNU
>>                         linker-specific part (question 2) here:
>>
>>                         http://gcc.gnu.org/ml/gcc-help/2014-04/msg00164.html
>>
>>                         Martin
>>
>>
>>                         On Wed, Apr 30, 2014 at 2:21 PM, Martin Ünsal
>>                         <martinunsal at gmail.com
>>                         <mailto:martinunsal at gmail.com>>
>>                         wrote:
>>
>>                             Hi LTTng folks
>>
>>                             I have a strange problem using LTTng-UST on an ARM-based
>>                             platform. I have done some diagnosis, but I am running
>>                             low on ideas and was hoping for help from the experts.
>>                             I am using lttng-tools 2.2.0, lttng-ust 2.2.0 and
>>                             liburcu 0.8.1. I know these are old, but upgrading is
>>                             easier said than done, unfortunately. I didn't see
>>                             anything related to this problem in the release notes,
>>                             mailing list traffic, or the master branch, but I could
>>                             have missed something.
>>
>>                             The problem showed up when I switched from GCC 4.6.4 to
>>                             4.7.2. Conceptually, the situation is that I have a
>>                             single executable, call it MyProgram, with two plugins
>>                             loaded at runtime with dlopen(); let's call them
>>                             libPlugin1.so and libPlugin2.so. There are three
>>                             different LTTng-UST tracepoint providers, one each for
>>                             the executable and the two plugins. With GCC 4.7.2,
>>                             tracepoints in libPlugin1 stopped working. The
>>                             tracepoints in MyProgram and in libPlugin2 continue to
>>                             work correctly.
>>
>>                             I have established without a doubt that the toolchain
>>                             upgrade is the cause of the regression.
>>
>>                             In the debugger, I confirmed that the tracepoint for
>>                             libPlugin1.so is being executed, but
>>                             __tracepoint_##provider##___##name.state is always 0,
>>                             even when I enable the tracepoint in lttng-tools. As a
>>                             result, the tracepoint callback is not invoked when it
>>                             should be. In MyProgram and libPlugin2.so, the .state
>>                             variable correctly reflects whether the tracepoint is
>>                             enabled, and if the tracepoint is enabled, the
>>                             tracepoint callback is invoked.
>>
>>                             Next I set a breakpoint in tracepoint_register_lib() and
>>                             looked at the tracepoints_start parameter.
>>
>>                             1) With GCC 4.6.4 everything is as expected:
>>                                   a) tracepoint_register_lib() for MyProgram called
>>                                      with MyProgramProvider's
>>                                      __start___tracepoints_ptrs.
>>                                   b) tracepoint_register_lib() after libPlugin1
>>                                      dlopen() called with libPlugin1Provider's
>>                                      __start___tracepoints_ptrs.
>>                                   c) tracepoint_register_lib() after libPlugin2
>>                                      dlopen() called with libPlugin2Provider's
>>                                      __start___tracepoints_ptrs.
>>
>>                             2) With GCC 4.7.2 there is a problem:
>>                                   a) tracepoint_register_lib() for MyProgram called
>>                                      with MyProgramProvider's
>>                                      __start___tracepoints_ptrs.
>>                                   b) tracepoint_register_lib() after libPlugin1
>>                                      dlopen() called with MyProgramProvider's
>>                                      __start___tracepoints_ptrs (!!!! THIS IS WRONG !!!!)
>>                                   c) tracepoint_register_lib() after libPlugin2
>>                                      dlopen() called with libPlugin2Provider's
>>                                      __start___tracepoints_ptrs.
>>
>>                             I looked at the symbol table for libPlugin1.so to see if
>>                             it would shed some light on the problem.
>>
>>                             1) With GCC 4.6.4:
>>                             # objdump -t /usr/lib/.debug/libPlugin1.so | grep __start___tracepoints_ptrs
>>                             00025bb0 l       *ABS* 00000000 __start___tracepoints_ptrs
>>                             # objdump -t /usr/lib/.debug/libPlugin2.so | grep __start___tracepoints_ptrs
>>                             00041eb4 l       *ABS* 00000000 __start___tracepoints_ptrs
>>
>>                             2) With GCC 4.7.2:
>>                             # objdump -t /usr/lib/.debug/libPlugin1.so | grep __start___tracepoints_ptrs
>>                             00025a90 g       __tracepoints_ptrs 00000000 __start___tracepoints_ptrs
>>                             # objdump -t /usr/lib/.debug/libPlugin2.so | grep __start___tracepoints_ptrs
>>                             00041eb4 g       __tracepoints_ptrs 00000000 __start___tracepoints_ptrs
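To check the binding programmatically instead of with objdump -t, you can read the .symtab of an ELF file directly. This is a minimal sketch that assumes a 64-bit, native-endian, unstripped ELF file. Martin's ARM libraries are presumably 32-bit and would need the Elf32_* types instead. symtab_binding, symtab_binding_in_image and the demo_*_marker symbols are hypothetical, and the first matching entry wins.

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Example symbols to look up (hypothetical names). */
int demo_global_marker = 1;                             /* objdump: g */
static int demo_local_marker __attribute__((used)) = 2; /* objdump: l */

/* Searches the .symtab of an in-memory 64-bit ELF image for `name` and
 * returns its binding (STB_LOCAL, STB_GLOBAL, STB_WEAK, ...), or -1.
 * Only the section header table is bounds-checked; this is a sketch. */
static int symtab_binding_in_image(const unsigned char *img, size_t size,
                                   const char *name)
{
    const Elf64_Ehdr *eh = (const Elf64_Ehdr *)img;
    if (size < sizeof *eh || memcmp(eh->e_ident, ELFMAG, SELFMAG) != 0 ||
        eh->e_ident[EI_CLASS] != ELFCLASS64 ||
        eh->e_shoff + (Elf64_Off)eh->e_shnum * sizeof(Elf64_Shdr) > size)
        return -1;

    const Elf64_Shdr *sh = (const Elf64_Shdr *)(img + eh->e_shoff);
    for (size_t i = 0; i < eh->e_shnum; i++) {
        if (sh[i].sh_type != SHT_SYMTAB || sh[i].sh_link >= eh->e_shnum)
            continue;
        const Elf64_Sym *syms = (const Elf64_Sym *)(img + sh[i].sh_offset);
        const char *strtab = (const char *)(img + sh[sh[i].sh_link].sh_offset);
        size_t nsyms = sh[i].sh_size / sizeof(Elf64_Sym);
        for (size_t j = 0; j < nsyms; j++)
            if (strcmp(strtab + syms[j].st_name, name) == 0)
                return ELF64_ST_BIND(syms[j].st_info);
    }
    return -1;
}

/* Reads the whole file at `path` and looks `name` up in its .symtab,
 * roughly what the "l"/"g" column of objdump -t reports. */
static int symtab_binding(const char *path, const char *name)
{
    assert(path != NULL && name != NULL);
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    int result = -1;
    long size = -1;
    unsigned char *img = NULL;
    if (fseek(f, 0, SEEK_END) == 0)
        size = ftell(f);
    if (size > 0 && fseek(f, 0, SEEK_SET) == 0 &&
        (img = malloc((size_t)size)) != NULL &&
        fread(img, 1, (size_t)size, f) == (size_t)size)
        result = symtab_binding_in_image(img, (size_t)size, name);

    free(img);
    fclose(f);
    return result;
}
```

On Linux, symtab_binding("/proc/self/exe", "demo_local_marker") should return STB_LOCAL, which corresponds to objdump's "l". The same call with demo_global_marker should return STB_GLOBAL, which corresponds to "g".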
>>
>>                             My hypothesis at this point is that since
>>                             __start___tracepoints_ptrs changed from a local to a
>>                             global symbol, the dynamic loader no longer knows how to
>>                             select the correct weak symbol. I cannot explain why
>>                             libPlugin2 still loads its provider correctly; perhaps
>>                             it is just getting lucky.
>>
>>                             A few questions come to mind...
>>                             1) Have you run into a problem like this? Is there a
>>                             known fix/workaround?
>>                             2) __start___tracepoints_ptrs is declared as extern in
>>                             tracepoint.h, but it is not defined. This appears to be
>>                             some sort of undocumented linker magic.
>>                             http://gcc.gnu.org/ml/gcc-help/2010-04/msg00120.html is
>>                             the only reference I could find. Do you know where this
>>                             behavior is documented or specified (if at all)?
>>                             3) Do you know why the symbol visibility for
>>                             __start___tracepoints_ptrs changed between 4.6.4 and
>>                             4.7.2?
>>
>>                             Thanks for any help. This is a real puzzler for me.
>>
>>                             Martin
>>
>>
>>                         _______________________________________________
>>                         lttng-dev mailing list
>>                         lttng-dev at lists.lttng.org
>>                         <mailto:lttng-dev at lists.lttng.org>
>>                         http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
>>
>


-- 
Paul Woegerer, SW Development Engineer
Sourcery Analyzer <http://go.mentor.com/sourceryanalyzer>
Mentor Graphics, Embedded Software Division



