[lttng-dev] Using lttng-ust 2.13.6 from Yocto Kirkstone and getting weird segfault saying strlen_asimd.S can't be found.
Kienan Stewart
kstewart at efficios.com
Wed Jul 24 12:04:49 EDT 2024
Hi Brian,
On 7/22/24 6:00 PM, Brian Hutchinson wrote:
> Hi Kienan,
>
> Took a while to gather your grocery list but I think I have most of it
> below ;)
thanks for all the extra info. Replies inline below, but I'll cut a lot
of the long output for readability.
tl;dr the environment continues to be weird, but my present suspicion is
that something in either compilation, the linking of your app (eg. with
ld when producing the executable), or some post linking stripping might
be causing issues.
I will stop digging into further hypotheticals on my side as there is no
reproducer for both the environment and the application. If you ever end
up with a minimal reproducer that you can share, I'd be more than happy
to examine it.
>
> I may have not been clear. Most of the application components are
> statically linked but I think there are some that are built as shared
> objects (.so's) so that's what I was referring to. I know that
> lttng-ust is dynamically linked ... I think the lttng-ust docs say this
> is only option but also makes reference to the fact static linking was
> once possible (in some versions of the documentation) but not supported
> anymore (I probably have the docs memorized by now ha, ha ... I've
> looked at many, many versions of them).
>
> Just for full disclosure my ldd looks like:
>
> linux-vdso.so.1 (0x0000ffffab196000)
> libfcgi.so.0 => /usr/lib/libfcgi.so.0 (0x0000ffffa57f0000)
> liblttng-ust.so.1 => /usr/lib/liblttng-ust.so.1
> (0x0000ffffa5750000)
> libxml2.so.2 => /usr/lib/libxml2.so.2 (0x0000ffffa55d0000)
> librt.so.1 => /lib/librt.so.1 (0x0000ffffa55b0000)
> libm.so.6 => /lib/libm.so.6 (0x0000ffffa5510000)
> libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x0000ffffa52f0000)
> libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x0000ffffa52c0000)
> libc.so.6 => /lib/libc.so.6 (0x0000ffffa5110000)
> /lib/ld-linux-aarch64.so.1 (0x0000ffffab15d000)
> liblttng-ust-common.so.1 =>
> /usr/local/lib/liblttng-ust-common.so.1 (0x0000ffffa50e0000)
> liblttng-ust-tracepoint.so.1 =>
> /usr/local/lib/liblttng-ust-tracepoint.so.1 (0x0000ffffa50a0000)
> libpthread.so.0 => /lib/libpthread.so.0 (0x0000ffffa5080000)
> libz.so.1 => /lib/libz.so.1 (0x0000ffffa5050000)
>
>
I find it very suspicious that `liblttng-ust.so.1` is in `/usr/lib`,
while the other lttng-ust libraries are being loaded from `/usr/local/lib`.
This information also matches the statedump and the LD_DEBUG info from
later on.
Could you verify some of the following information:
1. In your build root for lttng-ust, enumerate all the liblttng*so
files. For each shared object, run `file $libname` and record the value
of the BuildID hash.
2. In your deployed environment, enumerate all the liblttng*so files in
/usr/local/lib and record the value of the BuildID hash.
3. In your deployed environment, enumerate all the liblttng*so files in
/usr/lib and record the value of the BuildID hash.
The BuildID will vary from one object to the next, but the deployed
environment BuildIDs should match the BuildIDs of the corresponding
library from the build root.
In particular my interest is the `liblttng-ust.so.1` library that is
loaded from `/usr/lib` in your deployed environment, and if there are
any in `/usr/local/lib`.
In the working hello example, does it also end up linking against
`/usr/lib/liblttng-ust.so.1`?
> Further along in stepping back:
>
> - Does make check for lttng-ust pass in your environment?
>
>
> So I'm glad you mentioned this. I didn't try to run make check because
> #1 have never seen anything in the documentation about it and what it
> does and #2 It said that it depended on Perl and since this is an
> embedded system, I really didn't want to pull it and its dependencies in.
>
> Nevertheless, I ran it on my native target build I setup to see what
> would happen. I didn't notice it complaining about perl so then I
> checked my manifest and perl is in fact in our image (I guess some other
> recipe pulled it in and I never paid attention to that):
>
> Making check in include
> make[1]: Entering directory
> '/home/iadmin/lttng-ust/lttng-ust-2.13.8/include'
> ...
>
>
> - Does make check for lttng-tools pass in your environment?
>
> I'm still using lttng-tools built by yocto so I don't know how to run
> make check in that situation. From devtool maybe??? I don't know, I'd
> have to look into that one. I never upgraded lttng-tools either, only
> upgraded lttng-ust and built from source tar to get the fix you
> mentioned for TLS stuff.
The poky version of the tools recipe seems to have some stuff relating
to ptest, which seems to be a yocto tool for package testing. There
might be a run to run that?
https://git.yoctoproject.org/poky/plain/meta/recipes-kernel/lttng/lttng-tools_2.13.13.bb
https://wiki.yoctoproject.org/wiki/Ptest
Otherwise I would get the lttng-tools source for your version,
bootstrap, configure, build, and run the tests in tree.
The bulk of regression testing for lttng-ust, -modules, and -tools
happens during `make check` for lttng-tools.
>
> - Is this reproducible in non-yocto environments or on other
> architectures with the same project?
>
> Hmm, that's a good question. I don't have any other embedded targets to
> try this on that are a different arch. I think I heard at one point
> early on a heavily stubbed out version ran on a PC but I'd have to look
> into that.
>
> - Does running the traced application with `LTTNG_UST_DEBUG=1` yield
> more information?
>
>
> So far I've just been trying to get the app to run without even starting
> lttng just to get past segfault issue. Here is what it looks like with
> debug turned on without starting lttng or a lttng-sessiond:
>
> liblttng_ust_tracepoint[599/599]: Your compiler treats weak symbols with
> hidden visibility for integer objects as SAME address between compile
> units part of the same module. (in check_weak_hidden() at
>
> liblttng_ust[599/601]: Waiting for local apps sessiond (in
> wait_for_sessiond() at lttng-ust-comm.c:1758)
> liblttng_ust[599/600]: Info: sessiond not accepting connections to
> global apps socket (in ust_listener_thread() at lttng-ust-comm.c:1884)
> Segmentation fault
>
> - I'd also run lttng-sessiond with the environment variable
> `LTTNG_UST_DEBUG=1` set and `-vvv --verbose-consumer` in the program
> arguments and capture for stdout & stderr into a file for analysis.
> - Verify if the main program is dynamically linked with `ldd`
> - Verify which libraries are loaded at runtime and which calls are
> shimmed with `LD_DEBUG=libs,binding`
>
>
> I provided some of this above.
> The output of LD_DEBUG=libs,bindings is attached as file
> ld_debug-libs-bindings.txt
Thanks. For your own debugging, I would recommend to run the hello world
example with the same environment variables in order to have comparison
point to a working example.
From my observations in my own environment, for the hello example (and
presumably your app if it's set up in the same way) I would expect
something similar to the following
```
3624895: initialize program: ./hello
3624895:
3624895: binding file ./hello [0] to
/lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `dlopen' [GLIBC_2.34]
3624895: opening
file=/x/lttng/master/usr/lib/liblttng-ust-tracepoint.so.1 [0];
direct_opencount=3
3624895:
3624895: binding file ./hello [0] to
/lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `dlsym' [GLIBC_2.34]
3624895: binding file
/x/lttng/master/usr/lib/liblttng-ust-tracepoint.so.1 [0] to
/x/lttng/master/usr/lib/liblttng-ust-tracepoint.so.1 [0]: normal symbol
`lttng_ust_tp_rcu_re
ad_lock'
3624895: binding file
/x/lttng/master/usr/lib/liblttng-ust-tracepoint.so.1 [0] to
/x/lttng/master/usr/lib/liblttng-ust-tracepoint.so.1 [0]: normal symbol
`lttng_ust_tp_rcu_re
ad_unlock'
3624895: binding file
/x/lttng/master/usr/lib/liblttng-ust-tracepoint.so.1 [0] to
/x/lttng/master/usr/lib/liblttng-ust-tracepoint.so.1 [0]: normal symbol
`lttng_ust_tp_rcu_de
reference_sym'
3624895: binding file
x/lttng/master/usr/lib/liblttng-ust-tracepoint.so.1 [0] to
/x/lttng/master/usr/lib/liblttng-ust-tracepoint.so.1 [0]: normal symbol
`lttng_ust_tracepoin
t_module_register'
3624895: binding file
/x/lttng/master/usr/lib/liblttng-ust-tracepoint.so.1 [0] to
/x/lttng/master/usr/lib/liblttng-ust-tracepoint.so.1 [0]: normal symbol
`lttng_ust_tracepoin
t_module_unregister'
3624895: binding file
/x/lttng/master/usr/lib/liblttng-ust-tracepoint.so.1 [0] to
/x/lttng/master/usr/lib/liblttng-ust-tracepoint.so.1 [0]: normal symbol
`lttng_ust_tp_disabl
e_destructors'
3624895: binding file
/x/lttng/master/usr/lib/liblttng-ust-tracepoint.so.1 [0] to
/x/lttng/master/usr/lib/liblttng-ust-tracepoint.so.1 [0]: normal symbol
`lttng_ust_tp_get_de
structors_state'
3624895: binding file ./hello [0] to
/x/lttng/master/usr/lib/liblttng-ust.so.1 [0]: normal symbol
`lttng_ust_probe_register'
3624895:
3624895: transferring control: ./hello
3624895:
```
The bit that is missing from your program is the binding of
`lttng_ust_probe_register`. Here, it's not like the symbol isn't
resolved at all: it's found earlier and if it didn't exist at all the
error would probably be relatively clear like:
```
./doc/examples/hello/hello: symbol lookup error:
/x/lttng/master/usr/lib/liblttng-ust.so.1: undefined symbol:
lttng_ust_probe_register
```
I'd like you to run another test with your app. This should indicate
that all symbols should be immediately resolved during dynamic loading
rather than performing lazy symbol resolution. The goal of this test is
to separate clearly the symbol resolution from the call that happens in
the program's initialization. You can also run it with the hello demo to
compare the runs.
```
LTTNG_UST_DEBUG=1 LD_DEBUG=all LD_BIND_NOW=1 ./my_app
```
>
> - Review in detail which gcc commands are executed to produce the
> tracepoint provider and link it to the main executable.
>
> Exactly like hello world example.
Okay. I see your program also has at least some portions in C++, which
is different than the hello world example.
Does your build process use particular optimizations or strip parts from
the resulting object files?
When you're building the hello world example, does it get compiled in
the same buildroot as your app, or is it done differently?
Do you use a custom linker script(s)?
>
>
>
> So the embedded target doesn't have babletrace so I archived the session
> and transferred it over to my dev PC and used tracecompas to view the
> lttng-session.
>
> You can see that in the attached file my_session.csv
This is good, the statedump is happening properly.
>
> ```
>
> >
> > Thanks! I'm running out of things to try.
>
For the next steps I would start disassembling "my_app" or checking in
gdb which bad address is being fed into strnlen.
Sections of interest (`objdump -sD my_app` _and_ `readelf -W -a my_app`):
- lttng_ust_tracepoints_ptrs
- lttng_ust_tracepoints_strings
Symbols of interest (`objdump -sD my_app` _and `nm my_app`):
- lttng_ust_tracepoint_hello_world___my_first_tracepoint
- lttng_ust_tracepoints_ptrs
- __start_lttng_ust_tracepoints_ptrs
- __stop_lttng_ust_tracepoints_ptrs
thanks,
kienan
> Ultimately, given the bespoke environment, build steps, and application
> it's tough to diagnose a lot of things that we would go over with a
> fine
> tooth comb: seeing how the TPPs are built and linked, seeing how the
> application is built and linked, analyzing what's happening at runtime,
> having a coredump that allows us to see the variables, etc.
>
> If you can provide a minimal reproducer, it's possible to dig further
> into it.
>
> Otherwise, to look in more detail at your specific project would be
> covered under a service contract (for which we can sign NDAs, etc. as
> needed). Feel free to reach out the sales at efficios.com
> <mailto:sales at efficios.com> to organise that.
>
>
> I understand but so far schedules & budget/priorities haven't aligned so
> once again I've been scheduled to look at this issue again since we last
> tried it on older Dunfell OS with older versions of lttng and it looks
> like we have the same problem as before.
>
> Thanks for your help!
>
> Regards,
>
> Brian
>
>
> >
> > I don't fully understand the implications of some of this
> documentation,
> > but I have learned enough to know LD_PRELOAD means nothing if
> everything
> > is static built. Our application is heavily multi threaded, uses
> fork,
> > clone and who knows if it's doing double closes or not ... so
> I've tried
> > these LD_PRELOAD "helpers" or whatever they are, before and
> didn't get a
> > different result, but now know it's only for shared library
> object. So
> > now I will experiment with making our tpp a shared object and try
> > LD_PRELOAD of fork, fd and pthread helpers and starting my app
> that way.
>
> LD_PRELOAD has an effect if the application you are running is
> dynamically linked (note: the TPP can be statically linked inside an
> dynamically linked executable, and the preloads are still useful and/or
> needed in that case).
>
> The function of the helpers are to ensure that various system calls
> don't clobber things in use by lttng-ust. E.g., close() is shimmed by
> liblttng-ust-fd.so so that FDs that lttng-ust uses aren't suddenly shut
> by the program.
>
> If you want to see it in action, you could try an experiment with a
> slightly modified hello-static-lib application, with instructions in
> the
> top comment:
> https://gist.github.com/kienanstewart/299eb6a511d92d458569b210e4a418a8 <https://gist.github.com/kienanstewart/299eb6a511d92d458569b210e4a418a8>
>
> The effect of the preloads such as liblttng-ust-fd.so are
> independent of
> how the TPP is built and linked to the application or library.
>
> thanks,
> kienan
>
> >
> > This is what I'm referring to above from the lttng docs:
> >
> >
> > Use LTTng-UST with daemons
> >
> <https://lttng.org/docs/v2.13/#doc-using-lttng-ust-with-daemons
> <https://lttng.org/docs/v2.13/#doc-using-lttng-ust-with-daemons>>
> >
> > If your instrumented application calls fork(2)
> > <https://man7.org/linux/man-pages/man2/fork.2.html
> <https://man7.org/linux/man-pages/man2/fork.2.html>>, clone(2)
> > <https://man7.org/linux/man-pages/man2/clone.2.html
> <https://man7.org/linux/man-pages/man2/clone.2.html>>, or BSD’s
> rfork(2)
> >
> <http://www.freebsd.org/cgi/man.cgi?query=rfork&sektion=2&manpath=FreeBSD+4.10-RELEASE <http://www.freebsd.org/cgi/man.cgi?query=rfork&sektion=2&manpath=FreeBSD+4.10-RELEASE>>, without a following exec(3) <https://man7.org/linux/man-pages/man3/exec.3.html <https://man7.org/linux/man-pages/man3/exec.3.html>>-family system call, you must preload the |liblttng-ust-fork.so| shared object when you start the application.
> >
> > LD_PRELOAD=liblttng-ust-fork.so ./my-app
> >
> > If your tracepoint provider package is a shared library which you
> also
> > preload, you must put both shared objects in |LD_PRELOAD|:
> >
> > LD_PRELOAD=liblttng-ust-fork.so:/path/to/tp.so ./my-app
> >
> >
> > Use LTTng-UST with applications which close file
> descriptors
> > that don’t belong to them
> > <https://lttng.org/docs/v2.13/#doc-liblttng-ust-fd
> <https://lttng.org/docs/v2.13/#doc-liblttng-ust-fd>>
> >
> > Since 2.9
> >
> > If your instrumented application closes one or more file descriptors
> > which it did not open itself, you must preload the
> |liblttng-ust-fd.so|
> > shared object when you start the application:
> >
> > LD_PRELOAD=liblttng-ust-fd.so ./my-app
> >
> > Typical use cases include closing all the file descriptors after
> fork(2)
> > <https://man7.org/linux/man-pages/man2/fork.2.html
> <https://man7.org/linux/man-pages/man2/fork.2.html>> or rfork(2)
> >
> <http://www.freebsd.org/cgi/man.cgi?query=rfork&sektion=2&manpath=FreeBSD+4.10-RELEASE <http://www.freebsd.org/cgi/man.cgi?query=rfork&sektion=2&manpath=FreeBSD+4.10-RELEASE>> and buggy applications doing “double closes”.
> >
> >
> > Thanks,
> >
> > Brian
> >
> >
>
More information about the lttng-dev
mailing list