[lttng-dev] Problem with UST related to dlload

Gerlando Falauto gerlando.falauto at keymile.com
Wed May 28 09:04:42 EDT 2014


Hi Paul, Martin,

thank you so much for your answers and patience.

On 05/28/2014 12:38 PM, Woegerer, Paul wrote:
> Hi Martin, Hi Gerlando
>
> I tried your approach of removing __attribute__((weak)) and to my
> surprise this really seems to be sufficient.
>
> What stuns me is that providing the visibility attribute hidden
> implicitly also causes the symbol to be treated as a weak symbol (in the
> sense that it can be linked without providing a definition somewhere for
> it) by the linker.

I'll have to disagree here, at least I don't have the same feeling.

Here's what I get [with my buggy compiler] if I
a) remove __attribute__((weak)) *AND*
b) rename section("__tracepoints_ptrs") to section("__tracepoints_ptrs_XXX")

(thereby preventing the linker from creating the correct 
__start__/__stop__ symbols):

/opt/eldk-5.4/powerpc-softfloat/sysroots/powerpc-nf-linux/usr/lib/crtn.o
./.libs/liblttng-ust-runtime.a(lttng-ust-baddr.o):(.got2+0x34): undefined reference to `__start___tracepoints_ptrs'
./.libs/liblttng-ust-runtime.a(lttng-ust-baddr.o):(.got2+0x38): undefined reference to `__stop___tracepoints_ptrs'
./.libs/liblttng-ust-runtime.a(tracef.o):(.got2+0x20): undefined reference to `__start___tracepoints_ptrs'
./.libs/liblttng-ust-runtime.a(tracef.o):(.got2+0x24): undefined reference to `__stop___tracepoints_ptrs'
collect2: error: ld returned 1 exit status

So the hidden symbols are *NOT* weak at all (at least with my buggy 
compiler). They are just automagically defined by the linker.
As a matter of fact, I don't think they should have ever been weak in 
the first place. We *WANT* those symbols to exist and be well-defined, 
and we should make sure the linker complies with this requirement, as 
this is crucial to the correct behaviour of lttng-ust.
If we generate an inconsistency like the above and keep the weak 
attribute, we would end up with code which compiles perfectly but still 
will not work!
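
To make the above concrete, here is a minimal sketch of the mechanism as I
understand it. It is not lttng-ust code: struct demo_tp, the "demo_ptrs"
section and demo_count_registered() are names I made up for illustration,
and it assumes GCC or Clang with GNU ld (or a compatible linker) on an ELF
target. The start/stop symbols are declared hidden but deliberately not weak.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct tracepoint; not the lttng-ust type. */
struct demo_tp {
	const char *name;
	int state;
};

static struct demo_tp demo_tp_a = { "demo:a", 0 };
static struct demo_tp demo_tp_b = { "demo:b", 0 };

/*
 * One pointer per tracepoint goes into the "demo_ptrs" section. Because the
 * section name is a valid C identifier, GNU ld provides __start_demo_ptrs
 * and __stop_demo_ptrs for it.
 */
static struct demo_tp * const demo_tp_a_ptr
	__attribute__((used, section("demo_ptrs"))) = &demo_tp_a;
static struct demo_tp * const demo_tp_b_ptr
	__attribute__((used, section("demo_ptrs"))) = &demo_tp_b;

/* Declared only, hidden but NOT weak: the linker has to supply both. */
extern struct demo_tp * const __start_demo_ptrs[]
	__attribute__((visibility("hidden")));
extern struct demo_tp * const __stop_demo_ptrs[]
	__attribute__((visibility("hidden")));

/* Walk the merged section, mark each entry, and return the entry count. */
size_t demo_count_registered(void)
{
	struct demo_tp * const *p;
	size_t n = 0;

	for (p = __start_demo_ptrs; p < __stop_demo_ptrs; p++) {
		(*p)->state = 1;
		n++;
	}
	return n;
}
```

Calling demo_count_registered() from main() should return 2 here. If the
section name and the symbol names stop matching (as in my renaming
experiment), the link fails with an undefined reference instead of silently
producing an empty list, which is exactly the behaviour I would like to keep.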

BTW, the only reference I could find on how and why ld defines those 
__start_/__stop_ symbols for a section is [1], which admittedly states: 
  "I couldn't find any formal documentation for this feature, only a few 
obscure mailing list references". :-(

So, as to Martin's statement:

 >For a long term fix, in my opinion, Yocto/OpenEmbedded needs to fix 
 >their compiler patches.

This is definitely true. However, we should also somehow spare other 
people the frustration we have both been through.
So if in the end it turns out that removing __attribute__((weak)) is 
*NOT* "The Right Thing To Do (TM)", we should at least implement some 
compiler checking at configure phase, and either bail out with a 
meaningful message, or define a preprocessor define so we can cope with 
that within tracepoint.h.
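
For the preprocessor route, something along these lines is what I have in
mind. This is only a sketch under assumptions: DEMO_HIDDEN_WEAK_BROKEN is a
made-up macro that a configure check would have to define, the demo_* names
and the "demo_cfg_ptrs" section are placeholders, and the fallback branch
simply reuses Paul's asm(".hidden ...") workaround rather than anything
that exists in tracepoint.h today.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct tracepoint; not the lttng-ust type. */
struct demo_tp {
	const char *name;
};

static struct demo_tp demo_tp_x = { "demo:x" };

/* A single entry, so that the linker has a section to bracket. */
static struct demo_tp * const demo_tp_x_ptr
	__attribute__((used, section("demo_cfg_ptrs"))) = &demo_tp_x;

#ifdef DEMO_HIDDEN_WEAK_BROKEN	/* hypothetical, set by a configure check */
/* Compiler drops .hidden for weak externs: emit the directive by hand. */
extern struct demo_tp * const __start_demo_cfg_ptrs[]
	__attribute__((weak));
__asm__(".hidden __start_demo_cfg_ptrs");
extern struct demo_tp * const __stop_demo_cfg_ptrs[]
	__attribute__((weak));
__asm__(".hidden __stop_demo_cfg_ptrs");
#else
/* Well-behaved compiler: the attribute alone is enough. */
extern struct demo_tp * const __start_demo_cfg_ptrs[]
	__attribute__((weak, visibility("hidden")));
extern struct demo_tp * const __stop_demo_cfg_ptrs[]
	__attribute__((weak, visibility("hidden")));
#endif

/* Number of pointers the linker collected into the section. */
size_t demo_cfg_count(void)
{
	return (size_t)(__stop_demo_cfg_ptrs - __start_demo_cfg_ptrs);
}
```

Either branch should give the same result on a sane toolchain; the point is
only that the choice is made once, at configure time, instead of being
discovered at runtime.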

Still, I believe we should "kill the weak". [2] ;-)

What do you think?

Thanks again!
Gerlando

[1] 
http://mgalgs.github.io/2013/05/10/hacking-your-ELF-for-fun-and-profit.html
[2] No Nazi propaganda intended! ;-)

> I thought tagging them weak is required for exactly
> that. But apparently this is not the case. It would be interesting if
> this treatment of hidden symbols is standardized or if this is just an
> implementation-specific behavior of GNU ld.
>
> thanks,
> Paul
>
> On 05/28/2014 02:39 AM, Martin Ünsal wrote:
>> Gerlando, I agree. The __attribute__((weak)) is not strictly necessary
>> in this case and the problem can be worked around temporarily by
>> removing this attribute. The reason is that the
>> __start___tracepoints_ptrs and __stop___tracepoints_ptrs are only
>> being declared, not defined, at compilation time. There is no need for
>> a weak definition if they are not defined at all. In fact the
>> definition is provided automagically by the linker using weak
>> semantics (i.e. only one definition per ELF binary, shared by all
>> declarations in all compilation units) regardless of the presence or
>> absence of weak attribute. Since __start___tracepoints_ptrs is defined
>> by the linker as the starting address of the __tracepoints_ptrs
>> section, it would be impossible for it to have anything other than
>> weak semantics, because it is nonsensical for different object files
>> in the same ELF binary to have different addresses for the same
>> executable section.
>>
>> Although removing __attribute__((weak)) is successful as a workaround,
>> I would not recommend upstreaming it. Since these symbols have weak
>> semantics, they should have weak declarations. Removing this attribute
>> could cause a lot of confusion for people reading the code.
>>
>> I haven't tried Paul's patch but it also seems like a reasonable local
>> workaround but not the sort of thing to upstream.
>>
>> For a long term fix, in my opinion, Yocto/OpenEmbedded needs to fix
>> their compiler patches.
>>
>> Martin
>>
>>
>>
>> On Tue, May 27, 2014 at 9:04 AM, Gerlando Falauto
>> <gerlando.falauto at keymile.com>
>> wrote:
>>
>>      Hi Paul,
>>
>>      thanks for your explanation, but I'm more puzzled than ever.
>>      I'm definitely lacking the appropriate background in both
>>      terminology and internals, so I tried to figure out how the whole
>>      magic works by empirical testing.
>>
>>      Now, when you say:
>>
>>
>>      > The reason is that you can have the same tracepoint provider be
>>      USED in
>>      > several compilation units that will all become part of one and
>>      the same
>>      > shared object (or executable).
>>      >
>>      > Then all those __start/stop___tracepoints_ptrs references in
>>      different
>>      > compilation units should refer to the same
>>      > __start/stop___tracepoints_ptrs definitions for the shared
>>      object (or
>>      > executable) they are part of. This is required because the
>>      > initialization of the tracepoints will only happen once per shared
>>      > object (or executable) with the static ctor mechanism also
>>      defined in
>>      > tracepoint.h
>>
>>      Who's responsible for initializing the tracepoints? Isn't it the
>>      PROVIDER, instead of the user?
>>
>>      Here's what I understood (or rather, speculated!), so please point
>>      out where my understanding falls short.
>>
>>      Tracepoint providers (where TRACEPOINT_DEFINE is defined) are what
>>      actually implement tracepoints. You can have multiple source
>>      files, each defining one or more tracepoints. So in the end each
>>      object file will contain one or more tracepoint pointers within
>>      its "__tracepoints_ptrs" section (courtesy of the compiler). When
>>      linking (e.g. towards a shared object), a single section
>>      __tracepoints_ptrs in the output ELF binary will merge all the
>>      sections of the above objects, and hold all the pointers as a
>>      contiguous array. This time, courtesy of the linker, who also
>>      automagically defines __start___tracepoints_ptrs /
>>      __stop___tracepoints_ptrs symbols to hold pointers to the
>>      beginning and end parts of the section.
>>
>>      Each object file will contain its own __tracepoints__ptrs_init()
>>      constructor, responsible for registering ALL the tracepoints it
>>      provides. Actually, we want only ONE constructor per shared object
>>      to register all the tracepoint pointers provided by the whole
>>      shared object (contained within
>>      __start___tracepoints_ptrs/__stop___tracepoints_ptrs). This is
>>      where, for instance, __tracepoint_ptrs_registered comes into play.
>>      Multiple invocations of the constructor (one per object file)
>>      should be avoided and only the first one needs to be performed.
>>      And this is why __tracepoint_ptrs_registered needs to be weak
>>      (multiple source files could lead to multiple definitions -- we
>>      want one and only one per shared object) *AND* hidden (each shared
>>      object should have its own copy).
>>      If I remove the weak attribute from __tracepoint_ptrs_registered,
>>      the linker starts screaming as soon as I compile one of the examples.
>>
>>      On the other hand,
>>      __start___tracepoints_ptrs/__stop___tracepoints_ptrs are generated
>>      by the linker (or so I want to believe!) so only one instance is
>>      emitted.
>>      Keeping them hidden prevents the name clash during dynamic
>>      linking, as the symbol will not be visible from other shared
>>      objects or binaries.
>>      But I don't see why they should also be weak.
>>
>>      As a matter of fact, removing the weak attribute seems to fix my
>>      problem (as far as I could test).
>>      What am I missing?
>>
>>      Thank you again for your patience,
>>      Gerlando
>>
>>
>>      On 05/27/2014 04:58 PM, Woegerer, Paul wrote:
>>
>>          On 05/27/2014 04:41 PM, Gerlando Falauto wrote:
>>
>>              Hi Paul,
>>
>>              thank you very much for sharing this.
>>
>>              I had in the meantime run into the same suggestion by
>>              Henrik Wallin on a thread opened by Martin
>>              (https://gcc.gnu.org/ml/gcc-help/2014-05/msg00028.html).
>>              Further updates from Martin also suggest the issue is
>>              rather related to
>>              the OpenEmbedded toolchain.
>>
>>              I was about to post the "opposite" of your patch, as I
>>              don't see the
>>              need to have those symbols as weak instead. In the end,
>>              doesn't weak
>>              only allow for a further re-definition? In this case we're
>>              only
>>              declaring it as extern, aren't we?
>>              Definition actually happens by magic, as far as I can tell.
>>              But please correct me if I got it all wrong.
>>
>>
>>          It's more complicated.
>>
>>          You absolutely need those symbols to be declared as:
>>
>>               .weak   __start___tracepoints_ptrs
>>               .weak   __stop___tracepoints_ptrs
>>
>>          *and*
>>
>>               .hidden __start___tracepoints_ptrs
>>               .hidden __stop___tracepoints_ptrs
>>
>>          The reason is that you can have the same tracepoint provider
>>          be USED in
>>          several compilation units that will all become part of one and
>>          the same
>>          shared object (or executable).
>>
>>          Then all those __start/stop___tracepoints_ptrs references in
>>          different
>>          compilation units should refer to the same
>>          __start/stop___tracepoints_ptrs definitions for the shared
>>          object (or
>>          executable) they are part of. This is required because the
>>          initialization of the tracepoints will only happen once per shared
>>          object (or executable) with the static ctor mechanism also
>>          defined in
>>          tracepoint.h
>>
>>          HTH,
>>          Paul
>>
>>
>>              Thank you,
>>              Gerlando
>>
>>              On 05/27/2014 04:32 PM, Woegerer, Paul wrote:
>>
>>                  Hi Martin, Hi Gerlando,
>>
>>                  this sounds a lot like the compiler bug I found
>>                  recently in Yocto 1.6
>>                  (reproducible on ARM, x86 and PPC)
>>
>>                  The problem in my case is that the Yocto generated GCC
>>                  cross-compiler
>>                  translates:
>>
>>                  extern struct tracepoint * const
>>                  __start___tracepoints_ptrs[]
>>                        __attribute__((weak, visibility("hidden")));
>>                  extern struct tracepoint * const
>>                  __stop___tracepoints_ptrs[]
>>                        __attribute__((weak, visibility("hidden")));
>>
>>                  incorrectly to assembly. For these symbols that are
>>                  declared with
>>
>>                  __attribute__((weak, visibility("hidden")));
>>
>>                  that are also defined to be external, in the assembly
>>                  the following
>>                  lines are missing:
>>
>>                  .hidden __stop___tracepoints_ptrs
>>                  .hidden __start___tracepoints_ptrs
>>
>>                  This causes __stop___tracepoints_ptrs and
>>                  __start___tracepoints_ptrs
>>                  to be further treated as ordinary weak symbols instead of
>>                  per-shared-object weak symbols.
>>                  That in turn will cause the linker to resolve any
>>                  such symbols with
>>                  the first definition of those symbols that it can see
>>                  (it will not
>>                  constrain itself to only consider definitions from
>>                  within the same
>>                  shared object). The net result is that only one
>>                  tracepoint provider
>>                  gets activated (the first one the linker sees) instead
>>                  of all the
>>                  tracepoint providers used in various source files.
>>
>>                  To fix this I use the following lttng-ust workaround
>>                  (for now):
>>
>>                  diff --git a/include/lttng/tracepoint.h
>>                  b/include/lttng/tracepoint.h
>>                  index 66e2abd..50cef26 100644
>>                  --- a/include/lttng/tracepoint.h
>>                  +++ b/include/lttng/tracepoint.h
>>                  @@ -313,9 +313,11 @@ __tracepoints__destroy(void)
>>                      * (or for the whole main program).
>>                      */
>>                     extern struct tracepoint * const
>>                  __start___tracepoints_ptrs[]
>>                  -       __attribute__((weak, visibility("hidden")));
>>                  +       __attribute__((weak));
>>                  +asm(".hidden __start___tracepoints_ptrs");
>>                     extern struct tracepoint * const
>>                  __stop___tracepoints_ptrs[]
>>                  -       __attribute__((weak, visibility("hidden")));
>>                  +       __attribute__((weak));
>>                  +asm(".hidden __stop___tracepoints_ptrs");
>>
>>                     /*
>>                      * When TRACEPOINT_PROBE_DYNAMIC_LINKAGE is
>>                  defined, we do not emit a
>>
>>
>>                  Note that this issue is not reproducible with my GCC
>>                  on host:
>>                  gcc version 4.8.1 20130909 [gcc-4_8-branch revision
>>                  202388] (SUSE Linux)
>>                  and also not with the latest Codebench 2014.05
>>                  ARM-Linux cross-toolchain.
>>
>>                  --
>>                  Best,
>>                  Paul
>>
>>                  On 05/27/2014 01:55 PM, Gerlando Falauto wrote:
>>
>>                      Hi Martin,
>>
>>                      I have been struggling for a while with this issue
>>                      (see the whole
>>                      thread):
>>
>>                      http://lists.lttng.org/pipermail/lttng-dev/2014-May/023035.html
>>
>>                      and landed on the same conclusions as yours (found
>>                      your message by
>>                      searching for __start___tracepoints_ptr!).
>>                      So at least you're not alone!
>>
>>                      So, did you ever manage to get any of your
>>                      questions answered:
>>
>>                              1) Have you run into a problem like this?
>>                              Is there a known
>>
>>                      fix/workaround?
>>
>>                              2) __start___tracepoints_ptrs is declared
>>                              as extern in tracepoint.h,
>>
>>                      but it
>>
>>                              is not defined. This appears to be some
>>                              sort of undocumented linker
>>
>>                      magic.
>>
>>                              http://gcc.gnu.org/ml/gcc-help/2010-04/msg00120.html
>>                              is the only
>>
>>                      reference I
>>
>>                              could find. Do you know where this
>>                              behavior is documented or
>>
>>                      specified (if
>>
>>                              at all)?
>>                              3) Do you know why the symbol visibility for
>>                              __start___tracepoints_ptrs
>>                              changed from 4.6.4 to 4.7.2?
>>
>>
>>                      Thank you so much!
>>                      Gerlando
>>
>>                      BTW, I'm also running GCC 4.7.2 (lttng-ust is
>>                      cross-compiled, the test
>>                      application is natively compiled).
>>
>>                      On an x86_64 host running either GCC 4.4.6 or
>>                      4.4.7, the issue is not
>>                      observed.
>>
>>
>>                      On 04/30/2014 11:57 PM, Martin Ünsal wrote:
>>
>>                          Incidentally I also asked for help on the GNU
>>                          linker-specific part
>>                          (question 2) here:
>>
>>                          http://gcc.gnu.org/ml/gcc-help/2014-04/msg00164.html
>>
>>                          Martin
>>
>>
>>                          On Wed, Apr 30, 2014 at 2:21 PM, Martin Ünsal
>>                          <martinunsal at gmail.com>
>>                          wrote:
>>
>>                              Hi LTTng folks
>>
>>                              I have a strange problem using LTTng-UST
>>                              on an ARM based platform. I
>>                              have
>>                              done some diagnosis but I am running low
>>                              on ideas and was hoping for
>>                              help
>>                              from the experts. I am using lttng-tools
>>                              2.2.0, lttng-ust 2.2.0,
>>                              liburcu
>>                              0.8.1. I know these are old but upgrading
>>                              is easier said than done
>>                              unfortunately. I didn't see anything
>>                              related to this problem in
>>                              relnotes,
>>                              mailing list traffic, or master branch,
>>                              but I could have missed
>>                              something.
>>
>>                              The problem showed up when I switched from
>>                              GCC 4.6.4 to 4.7.2.
>>                              Conceptually,
>>                              the situation is that I have a single
>>                              executable, call it MyProgram,
>>                              with
>>                              two plugins loaded at runtime with
>>                              dlopen(), let's call them
>>                              libPlugin1.so
>>                              and libPlugin2.so. There are three
>>                              different LTTng-UST tracepoint
>>                              providers,
>>                              one each for the executable and the two
>>                              plugins. With GCC 4.7.2,
>>                              tracepoints
>>                              in libPlugin1 stopped working. The
>>                              tracepoints in MyProgram and in
>>                              libPlugin2 continue to work correctly.
>>
>>                              I have established without a doubt that
>>                              the toolchain upgrade is the
>>                              cause
>>                              of the regression.
>>
>>                              In the debugger, I confirmed that the
>>                              tracepoint for libPlugin1.so is
>>                              being
>>                              executed, but
>>                              __tracepoint_##provider##___##name.state
>>                              is always 0
>>                              even when
>>                              I enable the tracepoint in lttng-tools. As
>>                              a result the tracepoint
>>                              callback
>>                              is not being invoked when it should be. In
>>                              MyProgram and
>>                              libPlugin2.so, the
>>                              .state variable correctly reflects whether
>>                              the tracepoint is enabled,
>>                              and if
>>                              the tracepoint is enabled, the tracepoint
>>                              callback is invoked.
>>
>>                              Next I set a breakpoint in
>>                              tracepoint_register_lib() and looked at
>>                              tracepoints_start parameter.
>>
>>                              1) With GCC 4.6.4 everything is as expected:
>>                                    a) tracepoint_register_lib() for
>>                              MyProgram called with
>>                              MyProgramProvider's
>>                              __start___tracepoints_ptrs.
>>                                    b) tracepoint_register_lib() after
>>                              libPlugin1 dlopen() called
>>                              with
>>                              libPlugin1Provider's
>>                              __start___tracepoints_ptrs
>>                                    c) tracepoint_register_lib() after
>>                              libPlugin2 dlopen() called
>>                              with
>>                              libPlugin2Provider's __start___tracepoints_ptrs
>>
>>                              2) With GCC 4.7.2 there is a problem:
>>                                    a) tracepoint_register_lib() for
>>                              MyProgram called with
>>                              MyProgramProvider's
>>                              __start___tracepoints_ptrs.
>>                                    b) tracepoint_register_lib() after
>>                              libPlugin1 dlopen() called
>>                              with
>>                              MyProgramProvider's
>>                              __start___tracepoints_ptrs (!!!! THIS IS WRONG
>>                              !!!!)
>>                                    c) tracepoint_register_lib() after
>>                              libPlugin2 dlopen() called
>>                              with
>>                              libPlugin2Provider's __start___tracepoints_ptrs
>>
>>                              I looked at the symbol table for
>>                              libPlugin1.so to see if it would
>>                              shed some
>>                              light on the problem.
>>
>>                              1) With GCC 4.6.4:
>>                              # objdump -t /usr/lib/.debug/libPlugin1.so
>>                              | grep
>>                              __start___tracepoints_ptrs
>>                              00025bb0 l       *ABS* 00000000
>>                              __start___tracepoints_ptrs
>>                              # objdump -t /usr/lib/.debug/libPlugin2.so
>>                              | grep
>>                              __start___tracepoints_ptrs
>>                              00041eb4 l       *ABS* 00000000
>>                              __start___tracepoints_ptrs
>>
>>                              2) With GCC 4.7.2:
>>                              # objdump -t /usr/lib/.debug/libPlugin1.so
>>                              | grep
>>                              __start___tracepoints_ptrs
>>                              00025a90 g       __tracepoints_ptrs 00000000
>>                              __start___tracepoints_ptrs
>>                              # objdump -t /usr/lib/.debug/libPlugin2.so
>>                              | grep
>>                              __start___tracepoints_ptrs
>>                              00041eb4 g       __tracepoints_ptrs 00000000
>>                              __start___tracepoints_ptrs
>>
>>                              My hypothesis at this point is that since
>>                              __start___tracepoints_ptrs
>>                              changed
>>                              from a local to a global symbol, the
>>                              dynamic loader no longer knows
>>                              how to
>>                              select the correct weak symbol. I cannot
>>                              explain why libPlugin2 still
>>                              loads
>>                              its provider correctly, perhaps it is just
>>                              getting lucky.
>>
>>                              A few questions come to mind...
>>                              1) Have you run into a problem like this?
>>                              Is there a known
>>                              fix/workaround?
>>                              2) __start___tracepoints_ptrs is declared
>>                              as extern in tracepoint.h,
>>                              but it
>>                              is not defined. This appears to be some
>>                              sort of undocumented linker
>>                              magic.
>>                              http://gcc.gnu.org/ml/gcc-help/2010-04/msg00120.html
>>                              is the only
>>                              reference I
>>                              could find. Do you know where this
>>                              behavior is documented or
>>                              specified (if
>>                              at all)?
>>                              3) Do you know why the symbol visibility for
>>                              __start___tracepoints_ptrs
>>                              changed from 4.6.4 to 4.7.2?
>>
>>                              Thanks for any help. This is a real
>>                              puzzler for me.
>>
>>                              Martin
>>
>>
>>                          _______________________________________________
>>                          lttng-dev mailing list
>>                          lttng-dev at lists.lttng.org
>>                          http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>
>
> --
> Paul Woegerer, SW Development Engineer
> Sourcery Analyzer <http://go.mentor.com/sourceryanalyzer>
> Mentor Graphics, Embedded Software Division
>



