[lttng-dev] lttng live event loss with babeltrace2

Jonathan Rajotte-Julien jonathan.rajotte-julien at efficios.com
Tue May 4 10:44:03 EDT 2021


On Mon, May 03, 2021 at 05:05:10PM -0700, Eqbal via lttng-dev wrote:
> Hi,
> 
> I have a lttng live session trace consumer application using libbabeltrace2
> where I create a graph to consume lttng live session traces and output to
> another sink. I am running the graph in a loop at some polling interval as
> long as I get BT_GRAPH_RUN_STATUS_AGAIN status. What I am noticing is that
> if my polling interval is large enough I tend to lose either all or some of
> the events. I experimented with various polling intervals and it seems if
> the polling interval is less than *DELAYUS* from "lttng-create
> --live=DELAYUS" option then I am able to get all the events, otherwise I
> tend to lose events.
> 
> Here are the steps I follow:
> 1. start session daemon and relay daemon
> 2. create a live session (with default delay of 1s), enable events and start
> 3. Start my application (hello world example from lttng docs)

Not sure if you modified it in any way, but be careful with short-lived apps,
since an app can terminate before lttng-ust has a chance to register.

> 4. Start the consumer application built using libbabeltrace that connects
> to the live session

Hmm. Note that when attaching to a session, the viewer does not start at the
beginning of the trace collected by lttng-relayd; it starts at the last data
lttng-relayd received from lttng-consumerd (LTTNG_VIEWER_SEEK_LAST).

Hence I would recommend that these steps be reversed:

4. Start the consumer application built using libbabeltrace that connects
to the live session
3. Start my application (hello world example from lttng docs)


> 
> I noticed that the events are actually persisted in the ~/lttng-traces by
> the relay daemon, but it does not reach babeltrace consumer application. I
> have noticed the same behavior with babeltrace2 cli.
> 
> I would like to understand what is the reason for such behavior and if
> playing with the polling interval in relation to the DELAYUS value is the
> right thing to do.

I think I reproduced the issue, but I'm not completely sure it is the same
problem. Please file an issue on the bug tracker [1] with as much information
as possible: the exact lttng commands used, the current behaviour, and the
expected behaviour. I'll add my findings if relevant.

But I think it might be caused by how we handle the first "empty" retry and
the subsequent get phase. After the initial phase, everything seems to work as
expected.
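
For reference, the retry pattern discussed in this thread (run the graph again
for as long as it reports "again") can be sketched roughly as below. This is
only an illustration, not the libbabeltrace2 API: the run_status enum, run_fn,
poll_until_done() and the fake_graph helper are all hypothetical stand-ins. In
a real consumer the callback would presumably wrap bt_graph_run() and map
BT_GRAPH_RUN_STATUS_AGAIN to RUN_AGAIN. It does not work around the behaviour
described above; it only shows the shape of the loop.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-ins for the graph run statuses named in this thread. */
enum run_status { RUN_OK, RUN_AGAIN, RUN_ERROR };

/* Hypothetical callback type: one attempt at running the graph. */
typedef enum run_status (*run_fn)(void *ctx);

/* Sleep for the given number of microseconds. */
static void sleep_us(long us)
{
    struct timespec ts = {
        .tv_sec = us / 1000000L,
        .tv_nsec = (us % 1000000L) * 1000L,
    };
    nanosleep(&ts, NULL);
}

/*
 * Call run() until it stops reporting RUN_AGAIN, sleeping poll_us
 * microseconds between attempts. A negative max_retries means "no limit".
 * Returns the last status seen; the number of retries performed is stored
 * in *retries_out when that pointer is not NULL.
 */
static enum run_status poll_until_done(run_fn run, void *ctx, long poll_us,
                                       int max_retries, int *retries_out)
{
    int retries = 0;
    enum run_status st;

    assert(run != NULL);
    while ((st = run(ctx)) == RUN_AGAIN) {
        if (max_retries >= 0 && retries >= max_retries)
            break; /* give up, still reporting RUN_AGAIN */
        retries++;
        sleep_us(poll_us);
    }
    if (retries_out != NULL)
        *retries_out = retries;
    return st;
}

/* Fake graph for demonstration: reports RUN_AGAIN a fixed number of times. */
struct fake_graph { int again_left; };

static enum run_status fake_run(void *ctx)
{
    struct fake_graph *g = ctx;

    if (g->again_left > 0) {
        g->again_left--;
        return RUN_AGAIN;
    }
    return RUN_OK;
}
```

With this sketch, the polling interval is just the poll_us argument, so it is
easy to try values below and above the session's DELAYUS and compare what the
sink receives.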

[1] https://bugs.lttng.org/


-- 
Jonathan Rajotte-Julien
EfficiOS

