[lttng-dev] Adding a simple "look here" event to the trace

Wed Sep 27 07:54:02 EDT 2023

On 2023-09-08 06:56, Danter, Richard via lttng-dev wrote:
> Hi all,
> 
> I am investigating an issue that takes some time to reproduce. Finding
> the right point in the logs is therefore very difficult.
> 
> Since I can detect when the issue happens in the kernel I would like to
> be able to emit an event into the trace that I can then search for in
> Trace Compass or through Babeltrace. So basically a kind of flag that
> says "look here". That way I can jump right to the problem and then
> look backwards from there to see what happened just before.
> 
> I have looked at the docs for how to add a trace point, but it seems
> pretty complicated. I may have missed something though, so I wonder if
> there is a trivial way to add such a flag to the log? Up to now I have
> just put a printk() in, which helps, but it would still be nicer to have
> something directly in the log.

Hello Rich,

This is a good question! The easiest way to point directly to the 
relevant part of a trace is to stop capturing trace data immediately 
after the identified issue is encountered. This means you know what 
you're looking for is right at the end of the trace. Stopping the trace 
seems like a good fit in this scenario because you're only interested in 
what happens immediately before the issue and you're able to identify 
when the problem has happened.

Assuming you would like to avoid modifying the kernel code, LTTng 
triggers [1] may be a good fit. Triggers allow you to associate a 
condition (e.g. event X happened) with an action you would like to take 
(e.g. stop tracing). When the condition is encountered, the associated 
action is automatically triggered.

In this scenario we would recommend:

  1. Trace in overwrite mode (flight recorder mode): Since the issue
     takes a while to reproduce and only the events immediately
     preceding it are relevant, keeping just a limited amount of the
     most recent data avoids accumulating data you will never look at.

  2. Detect when the issue is encountered with a trigger: This will
     focus the trace on the problem area.

  3. When the issue is encountered, take a snapshot: This will give you
     a trace that contains what is relevant. What happened immediately
     before the trigger fired will be at the end of the trace.
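
As a rough sketch of steps 1 and 3, the session setup could look
something like the following. The session name, channel name and the
enabled event are placeholders chosen for illustration, not something
from your setup. As far as I know, a session created with --snapshot
already uses overwrite-mode channels, so --overwrite is only spelled out
here for clarity. The small lttng_cmd wrapper is a helper made up for
this example: it only prints the commands unless RUN=1 is set, so you
can review them first.

```shell
#!/bin/sh
# Sketch only. SESSION, CHANNEL and the enabled event are placeholders.
SESSION=look-here
CHANNEL=flight

# Print each lttng command by default; set RUN=1 to actually execute it.
lttng_cmd() {
    if [ "${RUN:-0}" = 1 ]; then
        lttng "$@"
    else
        echo "lttng $*"
    fi
}

# Snapshot-mode session: nothing is written out until a snapshot is taken.
lttng_cmd create "$SESSION" --snapshot

# Overwrite (flight recorder) channel: the oldest data is dropped when
# the ring buffer is full, so only the most recent events are kept.
lttng_cmd enable-channel --kernel --session="$SESSION" --overwrite "$CHANNEL"

# Enable whichever kernel events matter for the investigation.
lttng_cmd enable-event --kernel --session="$SESSION" --channel="$CHANNEL" sched_switch

lttng_cmd start "$SESSION"
```

Kernel tracing needs a root session daemon (or membership in the
tracing group), so the real commands would be run accordingly.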

In terms of defining the trigger condition, you can add a trigger [2]
whose event rule matches a kernel event that occurs as soon as possible
after the issue is encountered. You can narrow the condition further
with the event rule's filter expression, and attach selected event
field values to the resulting notification using capture descriptors
[3]. Ideally, you want a condition that is only true when the issue is
encountered, so that you do not have to sort through the snapshots
manually afterwards. The add-trigger man page provides several
examples [4] that illustrate the condition and action syntax.
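
To make the condition and action pairing a little more concrete, here
is a minimal sketch of such a trigger. The event name, the filter
expression, the trigger name and the session name are all assumptions
for the sake of the example; you would substitute the kernel event and
field values that actually identify your issue. The lttng_cmd wrapper is
again a made-up helper that prints the command unless RUN=1 is set.
Please check the exact option syntax against the man page [2] for your
LTTng version.

```shell
#!/bin/sh
# Sketch only. All names below are placeholders, not taken from the thread.
SESSION=look-here              # recording session to snapshot
EVENT=sched_process_exit       # kernel event expected right after the issue
FILTER='comm == "my_app"'      # hypothetical condition on an event field

# Print each lttng command by default; set RUN=1 to actually execute it.
lttng_cmd() {
    if [ "${RUN:-0}" = 1 ]; then
        lttng "$@"
    else
        echo "lttng $*"
    fi
}

# Condition: a kernel tracepoint event matching EVENT and FILTER occurs.
# Action: take a snapshot of SESSION; the rate policy limits it to the
# first match so that only one snapshot has to be inspected.
lttng_cmd add-trigger --name=look-here-trigger \
    --condition=event-rule-matches \
    --type=kernel:tracepoint --name="$EVENT" --filter="$FILTER" \
    --action=snapshot-session "$SESSION" --rate-policy=once-after:1
```

Using --action=stop-session in place of snapshot-session would instead
stop tracing when the condition is met, and lttng list-triggers can be
used to confirm that the trigger was registered.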

Hope this helps!

Best,
Erica

[1] LTTng triggers - https://lttng.org/docs/v2.13/#doc-trigger
[2] Add trigger - https://lttng.org/man/1/lttng-add-trigger/v2.13/
[3] Trigger capture descriptor - 
https://lttng.org/man/1/lttng-add-trigger/v2.13/#doc-capture-descr
[4] Trigger examples - 
https://lttng.org/man/1/lttng-add-trigger/v2.13/#doc-examples

> 
> If there isn't such a thing already, then would it be a reasonable
> enhancement request to be able to add such a feature?
> 
> Thanks
> Rich