[lttng-dev] Using lttng-ust 2.13.6 from Yocto Kirkstone and getting weird segfault saying strlen_asimd.S can't be found.

Brian Hutchinson b.hutchman at gmail.com
Mon Jul 29 18:36:48 EDT 2024


Hi Kienan,

Just now answering you on the question I owed you regarding the
experiment you requested about resolved symbols.

See below.

On Wed, Jul 24, 2024 at 12:04 PM Kienan Stewart <kstewart at efficios.com> wrote:
>
> Hi Brian,
>
> On 7/22/24 6:00 PM, Brian Hutchinson wrote:
> > Hi Kienan,
> >
> > Took a while to gather your grocery list but I think I have most of it
> > below ;)
>
> thanks for all the extra info. Replies inline below, but I'll cut a lot
> of the long output for readability.
>
> tl;dr the environment continues to be weird, but my present suspicion is
> that something in either compilation, the linking of your app (eg. with
> ld when producing the executable), or some post linking stripping might
> be causing issues.
>
> I will stop digging into further hypotheticals on my side as there is no
> reproducer for both the environment and the application. If you ever end
> up with a minimal reproducer that you can share, I'd be more than happy
> to examine it.
>
> >
> > I may have not been clear.  Most of the application components are
> > statically linked but I think there are some that are built as shared
> > objects (.so's) so that's what I was referring to.  I know that
> > lttng-ust is dynamically linked ... I think the lttng-ust docs say this
> > is only option but also makes reference to the fact static linking was
> > once possible (in some versions of the documentation) but not supported
> > anymore (I probably have the docs memorized by now ha, ha ... I've
> > looked at many, many versions of them).
> >
> > Just for full disclosure my ldd looks like:
> >
> >         linux-vdso.so.1 (0x0000ffffab196000)
> >         libfcgi.so.0 => /usr/lib/libfcgi.so.0 (0x0000ffffa57f0000)
> >         liblttng-ust.so.1 => /usr/lib/liblttng-ust.so.1
> > (0x0000ffffa5750000)
> >         libxml2.so.2 => /usr/lib/libxml2.so.2 (0x0000ffffa55d0000)
> >         librt.so.1 => /lib/librt.so.1 (0x0000ffffa55b0000)
> >         libm.so.6 => /lib/libm.so.6 (0x0000ffffa5510000)
> >         libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x0000ffffa52f0000)
> >         libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x0000ffffa52c0000)
> >         libc.so.6 => /lib/libc.so.6 (0x0000ffffa5110000)
> >         /lib/ld-linux-aarch64.so.1 (0x0000ffffab15d000)
> >         liblttng-ust-common.so.1 =>
> > /usr/local/lib/liblttng-ust-common.so.1 (0x0000ffffa50e0000)
> >         liblttng-ust-tracepoint.so.1 =>
> > /usr/local/lib/liblttng-ust-tracepoint.so.1 (0x0000ffffa50a0000)
> >         libpthread.so.0 => /lib/libpthread.so.0 (0x0000ffffa5080000)
> >         libz.so.1 => /lib/libz.so.1 (0x0000ffffa5050000)
> >
> >
>
> I find it very suspicious that `liblttng-ust.so.1` is in `/usr/lib`,
> while the other lttng-ust libraries are being loaded from `/usr/local/lib`.
>
> This information also matches the statedump and the LD_DEBUG info from
> later on.
>
> Could you verify some of the following information:
>
> 1. In your build root for lttng-ust, enumerate all the liblttng*so
> files. For each shared object, run `file $libname` and record the value
> of the BuildID hash.
>
> 2. In your deployed environment, enumerate all the liblttng*so files in
> /usr/local/lib and record the value of the BuildID hash.
>
> 3. In your deployed environment, enumerate all the liblttng*so files in
> /usr/lib and record the value of the BuildID hash.
>
> The BuildID will vary from one object to the next, but the deployed
> environment BuildIDs should match the BuildIDs of the corresponding
> library from the build root.
>
> In particular my interest is the `liblttng-ust.so.1` library  that is
> loaded  from `/usr/lib` in your deployed environment, and if there are
> any in `/usr/local/lib`.
>
> In the working hello example, does it also end up linking against
> `/usr/lib/liblttng-ust.so.1`?
>
> >     Further along in stepping back:
> >
> >        - Does make check for lttng-ust pass in your environment?
> >
> >
> > So I'm glad you mentioned this.  I didn't try to run make check because
> > #1 have never seen anything in the documentation about it and what it
> > does and #2 It said that it depended on Perl and since this is an
> > embedded system, I really didn't want to pull it and its dependencies in.
> >
> > Nevertheless, I ran it on my native target build I setup to see what
> > would happen.  I didn't notice it complaining about perl so then I
> > checked my manifest and perl is in fact in our image (I guess some other
> > recipe pulled it in and I never paid attention to that):
> >
> > Making check in include
> > make[1]: Entering directory
> > '/home/iadmin/lttng-ust/lttng-ust-2.13.8/include'
> > ...
>  >
> >
> >        - Does make check for lttng-tools pass in your environment?
> >
> > I'm still using lttng-tools built by yocto so I don't know how to run
> > make check in that situation.  From devtool maybe???  I don't know, I'd
> > have to look into that one.   I never upgraded lttng-tools either, only
> > upgraded lttng-ust and built from source tar to get the fix you
> > mentioned for TLS stuff.
>
> The poky version of the tools recipe seems to have some stuff relating
> to ptest, which seems to be a yocto tool for package testing. There
> might be a run to run that?
>
> https://git.yoctoproject.org/poky/plain/meta/recipes-kernel/lttng/lttng-tools_2.13.13.bb
> https://wiki.yoctoproject.org/wiki/Ptest
>
> Otherwise I would get the lttng-tools source for your version,
> bootstrap, configure, build, and run the tests in tree.
>
> The bulk of regression testing for lttng-ust, -modules, and -tools
> happens during `make check` for lttng-tools.
>
> >
> >        - Is this reproducible in non-yocto environments or on other
> >     architectures with the same project?
> >
> > Hmm, that's a good question.  I don't have any other embedded targets to
> > try this on that are a different arch.  I think I heard at one point
> > early on a heavily stubbed out version ran on a PC but I'd have to look
> > into that.
> >
> >        - Does running the traced application with `LTTNG_UST_DEBUG=1` yield
> >     more information?
> >
> >
> > So far I've just been trying to get the app to run without even starting
> > lttng just to get past segfault issue.  Here is what it looks like with
> > debug turned on without starting lttng or a lttng-sessiond:
> >
> > liblttng_ust_tracepoint[599/599]: Your compiler treats weak symbols with
> > hidden visibility for integer objects as SAME address between compile
> > units part of the same module. (in check_weak_hidden() at
> >
> > liblttng_ust[599/601]: Waiting for local apps sessiond (in
> > wait_for_sessiond() at lttng-ust-comm.c:1758)
> > liblttng_ust[599/600]: Info: sessiond not accepting connections to
> > global apps socket (in ust_listener_thread() at lttng-ust-comm.c:1884)
> > Segmentation fault
> >
> >        - I'd also run lttng-sessiond with the environment variable
> >     `LTTNG_UST_DEBUG=1` set and `-vvv --verbose-consumer` in the program
> >     arguments and capture for stdout & stderr into a file for analysis.
> >        - Verify if the main program is dynamically linked with `ldd`
> >        - Verify which libraries are loaded at runtime and which calls are
> >     shimmed with `LD_DEBUG=libs,binding`
> >
> >
> > I provided some of this above.
> > The output of LD_DEBUG=libs,bindings is attached as file
> > ld_debug-libs-bindings.txt
>
> Thanks. For your own debugging, I would recommend to run the hello world
> example with the same environment variables in order to have comparison
> point to a working example.
>
>  From my observations in my own environment, for the hello example (and
> presumably your app if it's set up in the same way) I would expect
> something similar to the following
>
> ```
>    3624895:     initialize program: ./hello
>     3624895:
>     3624895:     binding file ./hello [0] to
> /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `dlopen' [GLIBC_2.34]
>     3624895:     opening
> file=/x/lttng/master/usr/lib/liblttng-ust-tracepoint.so.1 [0];
> direct_opencount=3
>     3624895:
>     3624895:     binding file ./hello [0] to
> /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `dlsym' [GLIBC_2.34]
>     3624895:     binding file
> /x/lttng/master/usr/lib/liblttng-ust-tracepoint.so.1 [0] to
> /x/lttng/master/usr/lib/liblttng-ust-tracepoint.so.1 [0]: normal symbol
> `lttng_ust_tp_rcu_re
> ad_lock'
>     3624895:     binding file
> /x/lttng/master/usr/lib/liblttng-ust-tracepoint.so.1 [0] to
> /x/lttng/master/usr/lib/liblttng-ust-tracepoint.so.1 [0]: normal symbol
> `lttng_ust_tp_rcu_re
> ad_unlock'
>     3624895:     binding file
> /x/lttng/master/usr/lib/liblttng-ust-tracepoint.so.1 [0] to
> /x/lttng/master/usr/lib/liblttng-ust-tracepoint.so.1 [0]: normal symbol
> `lttng_ust_tp_rcu_de
> reference_sym'
>     3624895:     binding file
> x/lttng/master/usr/lib/liblttng-ust-tracepoint.so.1 [0] to
> /x/lttng/master/usr/lib/liblttng-ust-tracepoint.so.1 [0]: normal symbol
> `lttng_ust_tracepoin
> t_module_register'
>     3624895:     binding file
> /x/lttng/master/usr/lib/liblttng-ust-tracepoint.so.1 [0] to
> /x/lttng/master/usr/lib/liblttng-ust-tracepoint.so.1 [0]: normal symbol
> `lttng_ust_tracepoin
> t_module_unregister'
>     3624895:     binding file
> /x/lttng/master/usr/lib/liblttng-ust-tracepoint.so.1 [0] to
> /x/lttng/master/usr/lib/liblttng-ust-tracepoint.so.1 [0]: normal symbol
> `lttng_ust_tp_disabl
> e_destructors'
>     3624895:     binding file
> /x/lttng/master/usr/lib/liblttng-ust-tracepoint.so.1 [0] to
> /x/lttng/master/usr/lib/liblttng-ust-tracepoint.so.1 [0]: normal symbol
> `lttng_ust_tp_get_de
> structors_state'
>     3624895:     binding file ./hello [0] to
> /x/lttng/master/usr/lib/liblttng-ust.so.1 [0]: normal symbol
> `lttng_ust_probe_register'
>     3624895:
>     3624895:     transferring control: ./hello
>     3624895:
> ```
>
> The bit that is missing from your program is the binding of
> `lttng_ust_probe_register`. Here, it's not like the symbol isn't
> resolved at all: it's found earlier and if it didn't exist at all the
> error would probably be relatively clear like:
>
> ```
> ./doc/examples/hello/hello: symbol lookup error:
> /x/lttng/master/usr/lib/liblttng-ust.so.1: undefined symbol:
> lttng_ust_probe_register
> ```
>
> I'd like you to run another test with your app. This should indicate
> that all symbols should be immediately resolved during dynamic loading
> rather than performing lazy symbol resolution. The goal of this test is
> to separate clearly the symbol resolution from the call that happens in
> the program's initialization. You can also run it with the hello demo to
> compare the runs.
>
> ```
> LTTNG_UST_DEBUG=1 LD_DEBUG=all LD_BIND_NOW=1 ./my_app

Done!  See the attached tar ... it's quite big 38k lines!

I see interesting stuff in there but not quite sure what's going on.

Regards,

Brian
-------------- next part --------------
A non-text attachment was scrubbed...
Name: my_resolved_symbols.tar.xz
Type: application/x-xz
Size: 129328 bytes
Desc: not available
URL: <https://lists.lttng.org/pipermail/lttng-dev/attachments/20240729/ede0071d/attachment-0001.xz>


More information about the lttng-dev mailing list