[lttng-dev] Record stacktraces at userspace tracing domain

Tue Dec 10 11:41:37 EST 2024

Hi Alexander, Christophe,

On 12/2/24 4:17 PM, Christophe Bédard via lttng-dev wrote:
 > Hi,
 >
 > I did the same thing a while ago, i.e., trigger tracepoints on
 > malloc/free/etc. using liblttng-ust-libc-wrapper and collect userspace
 > callstack information (so that the indirect calls to malloc/free can be
 > removed from an application).
 >
 > There is a userspace callstack context implementation here for lttng-ust
 > 2.10, see the last 3 commits:
 > https://github.com/tahini/lttng-ust-1/commits/ust-callstack-2.10/. Here's
 > the corresponding lttng-tools 2.10 branch needed to enable the userspace
 > callstack context:
 > https://github.com/tahini/lttng-tools/commits/ust-context-callstack/.
 >
 > I've rebased it on 2.11 here:
 > https://github.com/ApexAI/lttng-ust/commits/ust-callstack-2.11/.
 > lttng-tools:
 > https://github.com/ApexAI/lttng-tools/commits/ust-callstack-2.11/. It
 > shouldn't be too hard to rebase it all on a newer version.
 >
 > Hope that helps,
 >
 > Christophe
 >
 > On Mon, Dec 2, 2024 at 8:32 AM Alexander Krabler via lttng-dev <
 > lttng-dev at lists.lttng.org> wrote:
 >
 >> Hello,
 >>
 >> we want to record stacktraces at specific userspace events like e.g. calls
 >> to malloc and free using liblttng-ust-libc-wrapper.so.
 >> There is the callstack-user context to achieve this in general, however,
 >> it seems like tracing of userspace stacktraces is only available in the
 >> kernel tracing domain.
 >>
 >> Is there already a solution to achieve this goal?

Depending on the type of information you require, it may be possible to 
use the instruction-pointer (ip) context[1], together with the 
statedump[2] and/or lttng-ust-dl[3] for base addresses and the symbol 
table, in order to resolve ip -> symbol name in the offline analysis 
phase with the help of babeltrace's debug-info plugin[4]. While this is 
a bit more work for the analysis, it reduces the impact of the tracing 
on the program at run-time.
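
To illustrate the idea only (this is not how the debug-info plugin is 
implemented), here is a minimal Python sketch of the ip -> symbol lookup. 
The mapping list, the symbol tables and the name resolve_ip are all 
hypothetical; in a real analysis the base addresses would come from the 
statedump / lttng-ust-dl events and the symbols from the binaries. It 
also assumes symbol offsets are relative to the mapping's base address.

```python
from bisect import bisect_right

# Hypothetical data: (base address, size, path) for each loaded object.
MAPPINGS = [
    (0x400000, 0x2000, "/usr/bin/app"),
    (0x7F0000000000, 0x10000, "/usr/lib/libfoo.so"),
]

# Hypothetical symbol tables: path -> list of (offset, size, name),
# sorted by offset.
SYMBOLS = {
    "/usr/bin/app": [(0x1000, 0x80, "main"), (0x1080, 0x40, "do_work")],
    "/usr/lib/libfoo.so": [(0x500, 0x100, "foo_alloc")],
}


def resolve_ip(ip, mappings=MAPPINGS, symbols=SYMBOLS):
    """Return 'symbol+offset (path)' for ip, or None if ip is unmapped."""
    for base, size, path in mappings:
        if base <= ip < base + size:
            offset = ip - base
            table = symbols.get(path, [])
            # Find the last symbol starting at or before the offset.
            idx = bisect_right([start for start, _, _ in table], offset) - 1
            if idx >= 0:
                start, length, name = table[idx]
                if offset < start + length:
                    return f"{name}+{offset - start:#x} ({path})"
            # Inside the mapping, but not covered by any known symbol.
            return f"{offset:#x} ({path})"
    return None
```

With the sample data above, resolve_ip(0x401090) resolves to do_work in 
/usr/bin/app, and an address outside both mappings returns None.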

 >> If not, what would need to be done to achieve this?

The proof of concept suggested by Christophe has some limitations that 
may make it unsuitable for use in a production environment, notably a 
lack of testing, including corner cases such as emitting callstack 
traces from noexcept regions or signal handlers. Run-time stack 
unwinding may be expensive when frame pointers or hardware mechanisms 
aren't available. The typical work-around is to sample the stack and 
unwind in post-processing, which is not a desirable solution in the case 
of LTTng-UST. Future work could also include integration with sframe[5], 
which aims to provide the information necessary for callstack unwinding 
with a lower code-size cost than DWARF and without the run-time cost of 
frame pointers.

If your intention is to use the callstack recording in production, it 
could also help to reduce the sampling rate. For example, only sample 
every N time units or every N calls, or only sample "overly large" 
allocations - whatever that may mean in your context.
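
As a rough sketch of such a policy (the decision logic only, written in 
Python for brevity; an actual implementation would live in the C wrapper 
or tracepoint provider), the class below records on every Nth call and 
for any allocation at or above a size threshold. The class name, its 
parameters and the default values are hypothetical, and time-based 
sampling is left out.

```python
class CallstackSampler:
    """Decide whether a given allocation should get a callstack recorded."""

    def __init__(self, every_n_calls=100, large_bytes=1 << 20):
        self.every_n_calls = every_n_calls  # record every Nth call
        self.large_bytes = large_bytes      # always record at/above this size
        self._calls = 0

    def should_record(self, size):
        self._calls += 1
        if size >= self.large_bytes:
            return True
        return self._calls % self.every_n_calls == 0
```

For example, with every_n_calls=3 and large_bytes=1024, a run of 16-byte 
allocations is recorded on every third call, while a 2048-byte 
allocation is recorded regardless of where it falls in the sequence.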

What is required to get this into upstream UST?

  * Funding to work on making it production grade (robustness, testing, 
performance, and integration with the other LTTng features)

thanks,
kienan

[1]: https://lttng.org/man/3/lttng-ust/v2.13/#doc-_context_information
[2]: https://lttng.org/man/3/lttng-ust/v2.13/#doc-state-dump
[3]: https://lttng.org/man/3/lttng-ust/v2.13/#doc-ust-lib
[4]: https://babeltrace.org/docs/v2.0/man7/babeltrace2-filter.lttng-utils.debug-info.7/
[5]: https://www.sourceware.org/binutils/docs/sframe-spec.html

 >>
 >> Thanks,
 >> Alexander
 >