[lttng-dev] Using lttng-ust 2.13.6 from Yocto Kirkstone and getting weird segfault saying strlen_asimd.S can't be found.
Kienan Stewart
kstewart at efficios.com
Mon Jul 29 15:03:34 EDT 2024
Hi Brian,
On 7/25/24 3:54 PM, Brian Hutchinson wrote:
> Hi Kienan,
>
> I'll answer your questions below, but I've got questions on what I saw
> building and installing lttng-tools (2.13.13) and lttng-ust (2.13.8).
>
> Based on the struggles I've had trying to get lttng to work with my
> app over various Yocto versions (Dunfell & Kirkstone) and lttng
> version, I think the problems I'm facing are mostly around C++ and
> weak and hidden symbols in Yocto toolchain.
>
> When I started my app with the options you mentioned previously a
> while back, Id see things like:
>
> # LTTNG_UST_DEBUG=1 LTTNG_UST_REGISTER_TIMEOUT=-1 /opt/tc/TrafficController
> liblttng_ust_tracepoint[4012/4012]: Your compiler treats weak symbols
> with hidden visibility for integer objects as SAME address between
> compile units part of the same module. (in check_weak_hidden() at
> tracepoint.c:1012)
> liblttng_ust_tracepoint[4012/4012]: Your compiler treats weak symbols
> with hidden visibility for pointer objects as SAME address between
> compile units part of the same module. (in check_weak_hidden() at
> tracepoint.c:1016)
> liblttng_ust_tracepoint[4012/4012]: Your compiler treats weak symbols
> with hidden visibility for 24-byte structure objects as SAME address
> between compile units part of the same module. (in check_weak_hidden()
> at tracepoint.c:1020)
>
These messages are extra information for debugging and not indicative of
a problem in of itself. C.f.
https://github.com/lttng/lttng-ust/blob/24f7193c9b918bf714a40e9fc908eeb4978ada1c/src/lib/lttng-ust-tracepoint/tracepoint.c#L1010
There is a unit test related to this:
https://github.com/lttng/lttng-ust/blob/24f7193c9b918bf714a40e9fc908eeb4978ada1c/tests/unit/gcc-weak-hidden/main.c#L76
> I further researched this whole 'weak symbol' and 'hidden visibility'
> topic in the lttng-dev archives and it smells a lot like what I've
> been seeing. You should be able to mix both tracef and tracepoint
> calls in souce code ... but I could not. I could get a tracef call to
> work but if I put a tracepoint call in the same code then nothing
> would work. This was with Dunfell 3.1.7 and earlier versions of
> lttng.
>
> At one point I could get a tracepoint call to work but I'd have to let
> our cmake build system build and link the tpp.c file and then turn
> around and use gcc to recompile it and copy it to where all the
> objects were to create the huge .a library the app was built against.
> That's when I first learned there are issues with C++. I think g++ is
> used to build even .c files that aren't c++.
>
> Then if I tried to put a tracepoint in another sub project, none of
> the tracepoints would work and I'd get empty traces. This is a
> symptom of the 'weak symbols with hidden visibility' issue ... and I
> finally found others that were having same issue in the archives. I
> don't fully understand the issues here, although I do understand some
> of what's going on ... I just don't know what to do about it.
>
You said initially said that you're using `lttng_ust_tracepoint` exactly
as the hello world from the documentation; however, you have just
described several attempts at doing different things. Which case are we
trying to understand here?
> At this point I was being encouraged to keep upgrading to newer
> versions of lttng. Our app never changed, gcc & lttng etc., kept
> changing. Now with newer versions nothing runs, all I get is an
> immediate segfault. Again, I'm building just like I did before a year
> or so ago with older versions of Yocto and lttng. I say all of that
> to give perspective and history of what I've seen and experienced.
> Now this TLS thing has entered the picture too and so far I've only
> changed lttng, I don't know if I should be applying patches to my gcc
> for that issue. Like I said, I'm currently using Yocto Kirkstone
> 4.0.18 and 6.1.38 kernel.
>
> Now I'll move into the area of things I've seen building/installing
> lttng-tools and lttng-ust natively on the target environment I've
> setup where I can run 'make check' etc. These are in the category of
> "hey, is this ok, should I be worried about this":
>
> While building lttng-tools I see things like:
>
> *** Warning: Linking the executable userspace-probe-elf-binary against
> the loadable module
> *** libfoo.so is not portable!
>
The library is for a test program. My understanding is that the library
is compiled that way to force a stripped shared object to be produced in
order to validate that symbol lookups in libraries with no symtab
function as expected by using the dynsym table.
C.f.
https://github.com/lttng/lttng-tools/commit/ef3dfe5d31c88fb548189a6441aaf8b2afc0bd4b
> In file included from ../../../src/common/macros.h:15,
> from ../../../include/lttng/health-internal.h:19,
> from lttng-ctl-health.c:19:
> In function 'lttng_strnlen',
> inlined from 'lttng_strncpy' at ../../../src/common/macros.h:128:6,
> inlined from 'set_health_socket_path' at lttng-ctl-health.c:146:9,
> inlined from 'lttng_health_query' at lttng-ctl-health.c:264:8:
> ../../../src/common/compat/string.h:19:16: warning: 'strnlen'
> specified bound 4096 may exceed source size 37 [-Wstringop-overread]
> 19 | return strnlen(str, max);
> | ^~~~~~~~~~~~~~~~~
> lttng-ctl-health.c: At top level:
> cc1: note: unrecognized command-line option
> '-Wno-incomplete-setjmp-declaration' may have been intended to silence
> earlier diagnostics
This warning is addressed in
https://github.com/lttng/lttng-tools/commit/b25a59916106e5055be516f61f183a48f459b0b3
> ** Warning: Linking the shared library libbar.la against the loadable module
> *** libzzz.so is not portable!
>
> *** Warning: Linking the shared library libfoo.la against the loadable module
> *** libbar.so is not portable!
>
> While installing lttng-tools I see things like this:
>
> make[4]: Entering directory '/opt/lttng/lttng-tools-2.13.13/src/lib/lttng-ctl'
> CC lttng-ctl.lo
> CC snapshot.lo
> CC lttng-ctl-health.lo
> In file included from ../../../src/common/macros.h:15,
> from ../../../include/lttng/health-internal.h:19,
> from lttng-ctl-health.c:19:
> In function 'lttng_strnlen',
> inlined from 'lttng_strncpy' at ../../../src/common/macros.h:128:6,
> inlined from 'set_health_socket_path' at lttng-ctl-health.c:146:9,
> inlined from 'lttng_health_query' at lttng-ctl-health.c:264:8:
> ../../../src/common/compat/string.h:19:16: warning: 'strnlen'
> specified bound 4096 may exceed source size 37 [-Wstringop-overread]
> 19 | return strnlen(str, max);
> | ^~~~~~~~~~~~~~~~~
> lttng-ctl-health.c: At top level:
> cc1: note: unrecognized command-line option
> '-Wno-incomplete-setjmp-declaration' may have been intended to silence
> earlier diagnostics
>
> Making install in trigger-condition-event-matches
> make[2]: Entering directory
> '/opt/lttng/lttng-tools-2.13.13/doc/examples/trigger-condition-event-matches'
> CC instrumented-app.o
> CC tracepoint-trigger-example.o
> AR libtracepoint-trigger-example.a
> ar: `u' modifier ignored since `D' is the default (see `U')
>
> While building lttng-ust I see things like:
>
> Making all in utils
> make[2]: Entering directory
> '/home/iadmin/lttng-ust/lttng-ust-2.13.8/tests/utils'
> CC tap.o
> AR libtap.a
> ar: `u' modifier ignored since `D' is the default (see `U')
>
While libtool now uses `cr` by default, automake still defines the
default to `cru` which is what ends up getting used in the example.
Since many distros have changed the configuration of ar such that 'D' is
the default rather than the previous behaviour 'U', 'u' is redundant.
The behaviour in automake has been changed in automake 1.16.90+.
C.f.
https://github.com/autotools-mirror/libtool/commit/418129bc63afc312701e84cb8afa5ca413df1ab5
C.f.
http://git.savannah.gnu.org/cgit/automake.git/commit/?id=8cdbdda5aec652c356fe6dbba96810202176ae75
> *** Warning: Linking the shared library libzero.la against the
> loadable module
> *** libfakeust0.so is not portable!
> CCLD app_noust_indirect_abi0
>
> *** Warning: Linking the executable app_noust_indirect_abi0 against
> the loadable module
> *** libzero.so is not portable!
> CC app_noust_indirect_abi0_abi1-app_noust.o
> CC libone.lo
> CCLD libone.la
> CCLD app_noust_indirect_abi0_abi1
>
> *** Warning: Linking the executable app_noust_indirect_abi0_abi1
> against the loadable module
> *** libzero.so is not portable!
>
> *** Warning: Linking the executable app_noust_indirect_abi0_abi1
> against the loadable module
> *** libone.so is not portable!
> CC app_noust_indirect_abi1-app_noust.o
> CCLD app_noust_indirect_abi1
>
> *** Warning: Linking the executable app_noust_indirect_abi0_abi1
> against the loadable module
> *** libone.so is not portable!
> CC app_noust_indirect_abi1-app_noust.o
> CCLD app_noust_indirect_abi1
>
> *** Warning: Linking the executable app_noust_indirect_abi1 against
> the loadable module
> *** libone.so is not portable!
> CC app_ust.o
> CC tp.o
> CCLD app_ust
> CC app_ust_dlopen.o
> CCLD app_ust_dlopen
> CC app_ust_indirect_abi0-app_ust.o
> CC app_ust_indirect_abi0-tp.o
> CCLD app_ust_indirect_abi0
>
> *** Warning: Linking the executable app_ust_indirect_abi0 against the
> loadable module
> *** libzero.so is not portable!
> CC app_ust_indirect_abi0_abi1-app_ust.o
> CC app_ust_indirect_abi0_abi1-tp.o
> CCLD app_ust_indirect_abi0_abi1
>
> *** Warning: Linking the executable app_ust_indirect_abi0_abi1 against
> the loadable module
> *** libzero.so is not portable!
>
> I don't know if these are ok or if I should be worried about any of that.
>
These are all for different tests.
> ... now on to your questions below.
>
>
>
> On Wed, Jul 24, 2024 at 12:04 PM Kienan Stewart <kstewart at efficios.com> wrote:
>>
>> Hi Brian,
>>
>> On 7/22/24 6:00 PM, Brian Hutchinson wrote:
>>> Hi Kienan,
>>>
>>> Took a while to gather your grocery list but I think I have most of it
>>> below ;)
>>
>> thanks for all the extra info. Replies inline below, but I'll cut a lot
>> of the long output for readability.
>>
>> tl;dr the environment continues to be weird, but my present suspicion is
>> that something in either compilation, the linking of your app (eg. with
>> ld when producing the executable), or some post linking stripping might
>> be causing issues.
>
> I'm not aware of any stripping that's going on. In fact everything is
> being built with debug symbols at the moment and I even turned off
> optimization ... even used the debug friendly -O flag to see if that
> made a difference.
>
>>
>> I will stop digging into further hypotheticals on my side as there is no
>> reproducer for both the environment and the application. If you ever end
>> up with a minimal reproducer that you can share, I'd be more than happy
>> to examine it.
>
> I'm planning on trying to make a small reproducer I can share but not there yet.
>
Great! I appreciate that you're taking the time to do so.
>>
>>>
>>> I may have not been clear. Most of the application components are
>>> statically linked but I think there are some that are built as shared
>>> objects (.so's) so that's what I was referring to. I know that
>>> lttng-ust is dynamically linked ... I think the lttng-ust docs say this
>>> is only option but also makes reference to the fact static linking was
>>> once possible (in some versions of the documentation) but not supported
>>> anymore (I probably have the docs memorized by now ha, ha ... I've
>>> looked at many, many versions of them).
>>>
>>> Just for full disclosure my ldd looks like:
>>>
>>> linux-vdso.so.1 (0x0000ffffab196000)
>>> libfcgi.so.0 => /usr/lib/libfcgi.so.0 (0x0000ffffa57f0000)
>>> liblttng-ust.so.1 => /usr/lib/liblttng-ust.so.1
>>> (0x0000ffffa5750000)
>>> libxml2.so.2 => /usr/lib/libxml2.so.2 (0x0000ffffa55d0000)
>>> librt.so.1 => /lib/librt.so.1 (0x0000ffffa55b0000)
>>> libm.so.6 => /lib/libm.so.6 (0x0000ffffa5510000)
>>> libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x0000ffffa52f0000)
>>> libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x0000ffffa52c0000)
>>> libc.so.6 => /lib/libc.so.6 (0x0000ffffa5110000)
>>> /lib/ld-linux-aarch64.so.1 (0x0000ffffab15d000)
>>> liblttng-ust-common.so.1 =>
>>> /usr/local/lib/liblttng-ust-common.so.1 (0x0000ffffa50e0000)
>>> liblttng-ust-tracepoint.so.1 =>
>>> /usr/local/lib/liblttng-ust-tracepoint.so.1 (0x0000ffffa50a0000)
>>> libpthread.so.0 => /lib/libpthread.so.0 (0x0000ffffa5080000)
>>> libz.so.1 => /lib/libz.so.1 (0x0000ffffa5050000)
>>>
>>>
>>
>> I find it very suspicious that `liblttng-ust.so.1` is in `/usr/lib`,
>> while the other lttng-ust libraries are being loaded from `/usr/local/lib`.
>
> So Yocto puts all of the lttng libs into /usr/lib. When I sent the
> previous info I was using lttng-tools and modules built by Yocto/OE
> and I setup a native build environment on the target so I could run
> 'make check' etc., and that's why there were things in /usr/local/lib
> because that's where you guys want stuff to be. So I actually left
> the lttng-ust installables in /usr/local/build but also copied them to
> /usr/lib to overwrite old Yocto versions there.
>
It's not so much that it's "where we want it to be". The documentation
uses `/usr/local/lib` because `/usr/local` is meant for software
installed by the sysadmin administrator, as is the case when building a
custom version. `/usr/lib` should be used by packages shipped with the
system.
C.f. https://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch04s09.html
You're free to do as you see fit, but when you start mixing and matching
libraries and some are put in /usr/lib by your system packages and some
you move there manually I find it more difficult to follow what is going on.
>>
>> This information also matches the statedump and the LD_DEBUG info from
>> later on.
>>
>> Could you verify some of the following information:
>>
>> 1. In your build root for lttng-ust, enumerate all the liblttng*so
>> files. For each shared object, run `file $libname` and record the value
>> of the BuildID hash.R5jow
>
> Sorry, I'm not following you here. The only buildID hash I can think
> of is with 'eu-unstrip -n' but that's on core files, not individual
> libs. And looking at the options I have for 'file' on my target, I
> don't see anything that looks like what you are asking.
Perhaps I wasn't clear, the command to run is really just `file`. As a
fuller example:
```
$ file ./src/lib/lttng-ust-fork/.libs/liblttng-ust-fork.so.1.0.0
./src/lib/lttng-ust-fork/.libs/liblttng-ust-fork.so.1.0.0: ELF 64-bit
LSB shared object, x86-64, version 1 (SYSV), dynamically linked,
BuildID[sha1]=b2b4a0fc449cf317e32c23e0bb57ea1ad702b702, with debug_info,
not stripped
```
BuildID is added by default when linking with ld >= ~ 2.18
The idea here is to use the BuildID as a way to validate that the
libraries built and the ones deployed and loaded at runtime.
>
>>
>> 2. In your deployed environment, enumerate all the liblttng*so files in
>> /usr/local/lib and record the value of the BuildID hash.
>>
>> 3. In your deployed environment, enumerate all the liblttng*so files in
>> /usr/lib and record the value of the BuildID hash.
>>
>> The BuildID will vary from one object to the next, but the deployed
>> environment BuildIDs should match the BuildIDs of the corresponding
>> library from the build root.
>>
>> In particular my interest is the `liblttng-ust.so.1` library that is
>> loaded from `/usr/lib` in your deployed environment, and if there are
>> any in `/usr/local/lib`.
>
> As I said above, when I built lttng-ust from source natively on the
> target and installed it, it went to /usr/local/lib, your default. I
> then copied everything from /usr/local/lib (which was just lttng-ust
> stuff) to /usr/lib to overwrite the old versions Yocto put there.
>
>>
>> In the working hello example, does it also end up linking against
>> `/usr/lib/liblttng-ust.so.1`?
>
> Yes
Okay.
>
>>
>>> Further along in stepping back:
>>>
>>> - Does make check for lttng-ust pass in your environment?
>>>
>>>
>>> So I'm glad you mentioned this. I didn't try to run make check because
>>> #1 have never seen anything in the documentation about it and what it
>>> does and #2 It said that it depended on Perl and since this is an
>>> embedded system, I really didn't want to pull it and its dependencies in.
>>>
>>> Nevertheless, I ran it on my native target build I setup to see what
>>> would happen. I didn't notice it complaining about perl so then I
>>> checked my manifest and perl is in fact in our image (I guess some other
>>> recipe pulled it in and I never paid attention to that):
>>>
>>> Making check in include
>>> make[1]: Entering directory
>>> '/home/iadmin/lttng-ust/lttng-ust-2.13.8/include'
>>> ...
>> >
>>>
>>> - Does make check for lttng-tools pass in your environment?
>>>
>>> I'm still using lttng-tools built by yocto so I don't know how to run
>>> make check in that situation. From devtool maybe??? I don't know, I'd
>>> have to look into that one. I never upgraded lttng-tools either, only
>>> upgraded lttng-ust and built from source tar to get the fix you
>>> mentioned for TLS stuff.
>>
>> The poky version of the tools recipe seems to have some stuff relating
>> to ptest, which seems to be a yocto tool for package testing. There
>> might be a run to run that?
>>
>> https://git.yoctoproject.org/poky/plain/meta/recipes-kernel/lttng/lttng-tools_2.13.13.bb
>> https://wiki.yoctoproject.org/wiki/Ptest
>>
>> Otherwise I would get the lttng-tools source for your version,
>> bootstrap, configure, build, and run the tests in tree.
>>
>> The bulk of regression testing for lttng-ust, -modules, and -tools
>> happens during `make check` for lttng-tools.
>
> I did all of that. First I did it with lttng-ust, building it from
> source tarball on target natively to eliminate any Yocto recipe/patch
> issues. Then did the same with lttng-tools.
Sounds like `make check` for lttng-tools passed then?
>
>>
>>>
>>> - Is this reproducible in non-yocto environments or on other
>>> architectures with the same project?
>>>
>>> Hmm, that's a good question. I don't have any other embedded targets to
>>> try this on that are a different arch. I think I heard at one point
>>> early on a heavily stubbed out version ran on a PC but I'd have to look
>>> into that.
>>>
>>> - Does running the traced application with `LTTNG_UST_DEBUG=1` yield
>>> more information?
>>>
>>>
>>> So far I've just been trying to get the app to run without even starting
>>> lttng just to get past segfault issue. Here is what it looks like with
>>> debug turned on without starting lttng or a lttng-sessiond:
>>>
>>> liblttng_ust_tracepoint[599/599]: Your compiler treats weak symbols with
>>> hidden visibility for integer objects as SAME address between compile
>>> units part of the same module. (in check_weak_hidden() at
>>>
>>> liblttng_ust[599/601]: Waiting for local apps sessiond (in
>>> wait_for_sessiond() at lttng-ust-comm.c:1758)
>>> liblttng_ust[599/600]: Info: sessiond not accepting connections to
>>> global apps socket (in ust_listener_thread() at lttng-ust-comm.c:1884)
>>> Segmentation fault
>>>
>>> - I'd also run lttng-sessiond with the environment variable
>>> `LTTNG_UST_DEBUG=1` set and `-vvv --verbose-consumer` in the program
>>> arguments and capture for stdout & stderr into a file for analysis.
>>> - Verify if the main program is dynamically linked with `ldd`
>>> - Verify which libraries are loaded at runtime and which calls are
>>> shimmed with `LD_DEBUG=libs,binding`
>>>
>>>
>>> I provided some of this above.
>>> The output of LD_DEBUG=libs,bindings is attached as file
>>> ld_debug-libs-bindings.txt
>>
>> Thanks. For your own debugging, I would recommend to run the hello world
>> example with the same environment variables in order to have comparison
>> point to a working example.
>
> I ran the hello world in the example directory
> lttng-ust-2.13.8/doc/examples/hello-static-lib and it ran fine.
>
Good.
> I can send the sessiond.log etc. if you want to see it but it ran just
> fine on target using the native built lttng-ust and lttng-tools.
My understanding at this point is the unit tests are passing for
LTTng-UST on your system, as are the unit and regression tests for
LTTng-tools. The example programs shipped with LTTng-UST work on your
system, as does the example from the documentation. The statedump
tracepoints loaded from LTTng-UST are also working fine, as evinced by
the program logs and the LTTng trace you shared.
Despite my confusion about how exactly you're using the `hello world`
tracepoint in your application (as you've now described several
variations), the direction this points to for me are details related to
how you're using LTTng-UST and/or how are your building and linking your
application. To be clear, I don't mean to say that there is or is not an
issue in LTTng-UST, but to point at where to examine next in detail
including analysis of the produced object files.
>
>>
>> From my observations in my own environment, for the hello example (and
>> presumably your app if it's set up in the same way) I would expect
>> something similar to the following
>>
>> ```
>> 3624895: initialize program: ./hello
>> 3624895:
>> 3624895: binding file ./hello [0] to
>> /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `dlopen' [GLIBC_2.34]
>> 3624895: opening
>> file=/x/lttng/master/usr/lib/liblttng-ust-tracepoint.so.1 [0];
>> direct_opencount=3
>> 3624895:
>> 3624895: binding file ./hello [0] to
>> /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `dlsym' [GLIBC_2.34]
>> 3624895: binding file
>> /x/lttng/master/usr/lib/liblttng-ust-tracepoint.so.1 [0] to
>> /x/lttng/master/usr/lib/liblttng-ust-tracepoint.so.1 [0]: normal symbol
>> `lttng_ust_tp_rcu_re
>> ad_lock'
>> 3624895: binding file
>> /x/lttng/master/usr/lib/liblttng-ust-tracepoint.so.1 [0] to
>> /x/lttng/master/usr/lib/liblttng-ust-tracepoint.so.1 [0]: normal symbol
>> `lttng_ust_tp_rcu_re
>> ad_unlock'
>> 3624895: binding file
>> /x/lttng/master/usr/lib/liblttng-ust-tracepoint.so.1 [0] to
>> /x/lttng/master/usr/lib/liblttng-ust-tracepoint.so.1 [0]: normal symbol
>> `lttng_ust_tp_rcu_de
>> reference_sym'
>> 3624895: binding file
>> x/lttng/master/usr/lib/liblttng-ust-tracepoint.so.1 [0] to
>> /x/lttng/master/usr/lib/liblttng-ust-tracepoint.so.1 [0]: normal symbol
>> `lttng_ust_tracepoin
>> t_module_register'
>> 3624895: binding file
>> /x/lttng/master/usr/lib/liblttng-ust-tracepoint.so.1 [0] to
>> /x/lttng/master/usr/lib/liblttng-ust-tracepoint.so.1 [0]: normal symbol
>> `lttng_ust_tracepoin
>> t_module_unregister'
>> 3624895: binding file
>> /x/lttng/master/usr/lib/liblttng-ust-tracepoint.so.1 [0] to
>> /x/lttng/master/usr/lib/liblttng-ust-tracepoint.so.1 [0]: normal symbol
>> `lttng_ust_tp_disabl
>> e_destructors'
>> 3624895: binding file
>> /x/lttng/master/usr/lib/liblttng-ust-tracepoint.so.1 [0] to
>> /x/lttng/master/usr/lib/liblttng-ust-tracepoint.so.1 [0]: normal symbol
>> `lttng_ust_tp_get_de
>> structors_state'
>> 3624895: binding file ./hello [0] to
>> /x/lttng/master/usr/lib/liblttng-ust.so.1 [0]: normal symbol
>> `lttng_ust_probe_register'
>> 3624895:
>> 3624895: transferring control: ./hello
>> 3624895:
>> ```
>>
>> The bit that is missing from your program is the binding of
>> `lttng_ust_probe_register`. Here, it's not like the symbol isn't
>> resolved at all: it's found earlier and if it didn't exist at all the
>> error would probably be relatively clear like:
>>
>> ```
>> ./doc/examples/hello/hello: symbol lookup error:
>> /x/lttng/master/usr/lib/liblttng-ust.so.1: undefined symbol:
>> lttng_ust_probe_register
>> ```
>>
>> I'd like you to run another test with your app. This should indicate
>> that all symbols should be immediately resolved during dynamic loading
>> rather than performing lazy symbol resolution. The goal of this test is
>> to separate clearly the symbol resolution from the call that happens in
>> the program's initialization. You can also run it with the hello demo to
>> compare the runs.
>
> At this point I need to quit for a bit so I'll send what I have up to
> this point and perform these tests on both my app and the hello
> example and report back as soon as I can.
>
>>
>> ```
>> LTTNG_UST_DEBUG=1 LD_DEBUG=all LD_BIND_NOW=1 ./my_app
>> ```
>>
>>>
>>> - Review in detail which gcc commands are executed to produce the
>>> tracepoint provider and link it to the main executable.
>>>
>>> Exactly like hello world example.
>>
>> Okay. I see your program also has at least some portions in C++, which
>> is different than the hello world example.
>>
>> Does your build process use particular optimizations or strip parts from
>> the resulting object files?
>>
>> When you're building the hello world example, does it get compiled in
>> the same buildroot as your app, or is it done differently?
>>
>> Do you use a custom linker script(s)?
>>
>>
>>>
>>>
>>>
>>> So the embedded target doesn't have babletrace so I archived the session
>>> and transferred it over to my dev PC and used tracecompas to view the
>>> lttng-session.
>>>
>>> You can see that in the attached file my_session.csv
>>
>> This is good, the statedump is happening properly.
>>
>>>
>>> ```
>>>
>>> >
>>> > Thanks! I'm running out of things to try.
>>>
>>
>> For the next steps I would start disassembling "my_app" or checking in
>> gdb which bad address is being fed into strnlen.
>>
>> Sections of interest (`objdump -sD my_app` _and_ `readelf -W -a my_app`):
>>
>> - lttng_ust_tracepoints_ptrs
>> - lttng_ust_tracepoints_strings
>>
>> Symbols of interest (`objdump -sD my_app` _and `nm my_app`):
>>
>> - lttng_ust_tracepoint_hello_world___my_first_tracepoint
>> - lttng_ust_tracepoints_ptrs
>> - __start_lttng_ust_tracepoints_ptrs
>> - __stop_lttng_ust_tracepoints_ptrs
>>
>> thanks,
>> kienan
>>
>>> Ultimately, given the bespoke environment, build steps, and application
>>> it's tough to diagnose a lot of things that we would go over with a
>>> fine
>>> tooth comb: seeing how the TPPs are built and linked, seeing how the
>>> application is built and linked, analyzing what's happening at runtime,
>>> having a coredump that allows us to see the variables, etc.
>>>
>>> If you can provide a minimal reproducer, it's possible to dig further
>>> into it.
>>>
>>> Otherwise, to look in more detail at your specific project would be
>>> covered under a service contract (for which we can sign NDAs, etc. as
>>> needed). Feel free to reach out the sales at efficios.com
>>> <mailto:sales at efficios.com> to organise that.
>>>
>>>
>>> I understand but so far schedules & budget/priorities haven't aligned so
>>> once again I've been scheduled to look at this issue again since we last
>>> tried it on older Dunfell OS with older versions of lttng and it looks
>>> like we have the same problem as before.
>>>
>>> Thanks for your help!
>>>
>>> Regards,
>>>
>>> Brian
>>>
>>>
>>> >
>>> > I don't fully understand the implications of some of this
>>> documentation,
>>> > but I have learned enough to know LD_PRELOAD means nothing if
>>> everything
>>> > is static built. Our application is heavily multi threaded, uses
>>> fork,
>>> > clone and who knows if it's doing double closes or not ... so
>>> I've tried
>>> > these LD_PRELOAD "helpers" or whatever they are, before and
>>> didn't get a
>>> > different result, but now know it's only for shared library
>>> object. So
>>> > now I will experiment with making our tpp a shared object and try
>>> > LD_PRELOAD of fork, fd and pthread helpers and starting my app
>>> that way.
>>>
>>> LD_PRELOAD has an effect if the application you are running is
>>> dynamically linked (note: the TPP can be statically linked inside an
>>> dynamically linked executable, and the preloads are still useful and/or
>>> needed in that case).
>>>
>>> The function of the helpers are to ensure that various system calls
>>> don't clobber things in use by lttng-ust. E.g., close() is shimmed by
>>> liblttng-ust-fd.so so that FDs that lttng-ust uses aren't suddenly shut
>>> by the program.
>>>
>>> If you want to see it in action, you could try an experiment with a
>>> slightly modified hello-static-lib application, with instructions in
>>> the
>>> top comment:
>>> https://gist.github.com/kienanstewart/299eb6a511d92d458569b210e4a418a8 <https://gist.github.com/kienanstewart/299eb6a511d92d458569b210e4a418a8>
>>>
>>> The effect of the preloads such as liblttng-ust-fd.so are
>>> independent of
>>> how the TPP is built and linked to the application or library.
>>>
>>> thanks,
>>> kienan
>>>
>>> >
>>> > This is what I'm referring to above from the lttng docs:
>>> >
>>> >
>>> > Use LTTng-UST with daemons
>>> >
>>> <https://lttng.org/docs/v2.13/#doc-using-lttng-ust-with-daemons
>>> <https://lttng.org/docs/v2.13/#doc-using-lttng-ust-with-daemons>>
>>> >
>>> > If your instrumented application calls fork(2)
>>> > <https://man7.org/linux/man-pages/man2/fork.2.html
>>> <https://man7.org/linux/man-pages/man2/fork.2.html>>, clone(2)
>>> > <https://man7.org/linux/man-pages/man2/clone.2.html
>>> <https://man7.org/linux/man-pages/man2/clone.2.html>>, or BSD’s
>>> rfork(2)
>>> >
>>> <http://www.freebsd.org/cgi/man.cgi?query=rfork&sektion=2&manpath=FreeBSD+4.10-RELEASE <http://www.freebsd.org/cgi/man.cgi?query=rfork&sektion=2&manpath=FreeBSD+4.10-RELEASE>>, without a following exec(3) <https://man7.org/linux/man-pages/man3/exec.3.html <https://man7.org/linux/man-pages/man3/exec.3.html>>-family system call, you must preload the |liblttng-ust-fork.so| shared object when you start the application.
>>> >
>>> > LD_PRELOAD=liblttng-ust-fork.so ./my-app
>>> >
>>> > If your tracepoint provider package is a shared library which you
>>> also
>>> > preload, you must put both shared objects in |LD_PRELOAD|:
>>> >
>>> > LD_PRELOAD=liblttng-ust-fork.so:/path/to/tp.so ./my-app
>>> >
>>> >
>>> > Use LTTng-UST with applications which close file
>>> descriptors
>>> > that don’t belong to them
>>> > <https://lttng.org/docs/v2.13/#doc-liblttng-ust-fd
>>> <https://lttng.org/docs/v2.13/#doc-liblttng-ust-fd>>
>>> >
>>> > Since 2.9
>>> >
>>> > If your instrumented application closes one or more file descriptors
>>> > which it did not open itself, you must preload the
>>> |liblttng-ust-fd.so|
>>> > shared object when you start the application:
>>> >
>>> > LD_PRELOAD=liblttng-ust-fd.so ./my-app
>>> >
>>> > Typical use cases include closing all the file descriptors after
>>> fork(2)
>>> > <https://man7.org/linux/man-pages/man2/fork.2.html
>>> <https://man7.org/linux/man-pages/man2/fork.2.html>> or rfork(2)
>>> >
>>> <http://www.freebsd.org/cgi/man.cgi?query=rfork&sektion=2&manpath=FreeBSD+4.10-RELEASE <http://www.freebsd.org/cgi/man.cgi?query=rfork&sektion=2&manpath=FreeBSD+4.10-RELEASE>> and buggy applications doing “double closes”.
>>> >
>>> >
>>> > Thanks,
>>> >
>>> > Brian
>>> >
>>> >
>>>
thanks,
kienan
More information about the lttng-dev
mailing list