[lttng-dev] [lttng-ust] Missing events when just before a process exits

THEUNISSEN Rolf rolf.theunissen at altran.com
Wed Mar 14 12:37:35 EDT 2018


> -----Original Message-----
> From: Jonathan Rajotte-Julien [mailto:jonathan.rajotte-julien at efficios.com]
> Sent: Tuesday, March 13, 2018 4:35 PM
> To: THEUNISSEN Rolf <rolf.theunissen at altran.com>
> Cc: lttng-dev at lists.lttng.org
> Subject: Re: [lttng-dev] [lttng-ust] Missing events when just before a process
> exits
> 
> Hi Rolf,
> 
> On Tue, Mar 13, 2018 at 10:48:17AM +0000, THEUNISSEN Rolf wrote:
> > Hi,
> >
> > I am currently tracing many processes with LTTng-UST on a system with
> > 2 CPUs under heavy load.
> >
> > The traces seem to be missing UST events near the end of a trace. After
> > some analysis, it seems that the missing events are those recorded on a
> > CPU other than the one the process exits on. More concretely:
> >
> > During the lifetime of a process:
> > - Process X executes on CPUs 0 and 1: all events are in the trace.
> > Just before the process is about to exit:
> > - Process X executes on CPU 0: events are missing from the trace.
> 
> How are you validating this?

I am also using kernel tracing to trace the scheduler; in TraceCompass I can view the states of the processes and CPUs.

> 
> > A few milliseconds later:
> > - Process X executes on CPU 1 and exits: the last events are in the
> > trace.
> 
> This is not expected. Do you have a reproducer so we can check what might
> be happening?

I am not allowed to share details about my current traces, and it is hard to build a small reproducer, but I have found some more clues about what is happening; see below.

> 
> We will need the versions of lttng-ust, lttng-tools, and babeltrace.

I am using version 2.10; for analyzing the traces I am using TraceCompass.
Kernel version: 3.10.62

> 
> Note that you can run your application with lttng-ust in debug mode using
> LTTNG_UST_DEBUG=y. Debug statements will be written to stderr.
> 
> e.g:
>     $ LTTNG_UST_DEBUG=y ./my_sample_app
> 

Enabling this debugging flag gave me a clue about what is happening. For all processes, the registration succeeds, but for some processes the unregistration is not logged. On closer analysis (now with tracepoints from SystemTap), I see that the processes whose events are missing and whose unregistration is missing are killed by a SIGKILL signal. This probably explains why some data can be lost. It is perhaps strange, though, that only events traced on a different CPU get lost.

I remain puzzled about why a SIGKILL arrives at the process, since it is a SIGTERM that is sent to it. Somehow the kernel decides to deliver a SIGKILL instead of a SIGTERM.

Rolf

> Cheers
> 
> --
> Jonathan Rajotte-Julien
> EfficiOS

