[lttng-dev] [lttng-ust] Missing events when just before a process exits
Mathieu Desnoyers
mathieu.desnoyers at efficios.com
Wed Mar 14 13:24:34 EDT 2018
----- On Mar 14, 2018, at 12:37 PM, THEUNISSEN Rolf rolf.theunissen at altran.com wrote:
>> -----Original Message-----
>> From: Jonathan Rajotte-Julien [mailto:jonathan.rajotte-julien at efficios.com]
>> Sent: Tuesday, March 13, 2018 4:35 PM
>> To: THEUNISSEN Rolf <rolf.theunissen at altran.com>
>> Cc: lttng-dev at lists.lttng.org
>> Subject: Re: [lttng-dev] [lttng-ust] Missing events when just before a process
>> exits
>>
>> Hi Rolf,
>>
>> On Tue, Mar 13, 2018 at 10:48:17AM +0000, THEUNISSEN Rolf wrote:
>> > Hi,
>> >
>> > I am currently tracing many processes with LTTng-UST on a system under
>> heavy load, with 2 CPUs.
>> >
>> > The traces seem to be missing ust-events near the end of a trace. After
>> doing some analysis, it seems that events are missing from the other CPU
>> then the process exits on. More concretely:
>> >
>> > During the lifetime of a process:
>> > - Process X executes on CPU 0,1: all events are in the trace Just
>> > before the process is about to exit:
>> > - Process X executes on CPU 0: missing events from the trace
>>
>> How are you validating this?
>
> I am also using kernel tracing, to trace the scheduler. In TraceCompass I can
> view the states of the processes and CPUs.
>
>>
>> > A few mili-seconds later:
>> > - Process X executes on CPU1 and exits: The last events are in the
>> > trace
>>
>> This is not expected. Do you have a reproducer so we can check what might
>> be happening?
>
> I am not allowed to provide details about my current traces. And it is hard to
> make a small example, but I got some more clues what is happening, see below.
>
>>
>> We will need the version for lttng-ust, lttng-tools, babeltrace.
>
> I am using version 2.10, for analyzing the traces I am using TraceCompass
> Kernel version: 3.10.62
>
>>
>> Note that you can run your application with lttng-ust in debug mode using
>> LTTNG_UST_DEBUG=y. Debug statements will be outputted on stderr.
>>
>> e.g:
>> $ LTTNG_UST_DEBUG=y ./my_sample_app
>>
>
> Enableling this debugging flag gave me a clue what is happening. For all
> processes, the registration is going OK. But for some processes, the
> un-registration is not logged. On closer analysis (now with tracepoints from
> SystemTap), I see that the processes for which the events are missing and the
> un-registration is missing are killed by SIGKILL signal. This probably explains
> why some data can be lost. But maybe strange that only messages get lost that
> were traced on a different CPU.
>
> I remain puzzled why the SIGKILL arrives at the process, as a SIGTERM is sent to
> the process. Somehow the kernel decides to deliver an SIGKILL instead of a
> SIGTERM.
When using lttng-ust with per-uid buffers, there is a known design
limitation regarding process terminated by non-handled kill signals: if
this occurs while the buffer is being written to (between reserve and commit),
it will cause the rest of the per-cpu buffer to be unreadable. This applies to all
events recorded by all applications of the same user id for that particular
CPU. Recovering from this involves destroying and re-creating the tracing session.
If this kind of scenario is likely for you, you might want to consider using
per-pid buffers (lttng enable-channel -u --buffers-pid), which do not suffer
from this design limitation. The downside of per-pid buffers is that it allocates
more memory on your system for buffering, and adds extra overhead when used with
processes that have a short life-time.
Thanks,
Mathieu
>
> Rolf
>
>> Cheers
>>
>> --
>> Jonathan Rajotte-Julien
>> EfficiOS
> _______________________________________________
> lttng-dev mailing list
> lttng-dev at lists.lttng.org
> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
More information about the lttng-dev
mailing list