[lttng-dev] faulty context for sched_stat_ tracepoints?
Mathieu Desnoyers
mathieu.desnoyers at efficios.com
Thu Nov 26 12:05:51 EST 2015
We could perhaps clarify our documentation to state that the "context"
info gathered by lttng is that of the execution context when the tracepoint
is invoked, but I'm afraid we won't start intermingling the tracepoint payload
information with context. This would be both technologically cumbersome,
and confusing.
I'm CCing Philippe Proulx who maintains our documentation,
Thanks,
Mathieu
----- On Nov 26, 2015, at 11:57 AM, Luetkebohle Ingo (CR/AEA2) <Ingo.Luetkebohle at de.bosch.com> wrote:
> Hi,
> well, the thing is, when I add "context" information, I want more information on
> the task of the tracepoint. That the tracepoint is executed in a situation
> where the *kernel* already has a new task context is of no interest to me as a
> user.
> Therefore, I would argue that what lttng treats as the context should, in this
> situation, be something different from what the kernel treats as its context.
> Don't know if that's *possible*, but its what I would expect ;-)
> YMMV, of course.
> cheers
> Dr. Ingo Lütkebohle
> Software Design and Analysis -- Robotics (CR/AEA2)
> Tel. +49(711)811-12248
> Fax +49(711)811-0
> Mobil +49-1525-8813417
> Ingo.Luetkebohle at de.bosch.com
> Von: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
> Gesendet: Donnerstag, 26. November 2015 17:20
> An: Luetkebohle Ingo (CR/AEA2)
> Cc: lttng-dev
> Betreff: Re: AW: [lttng-dev] faulty context for sched_stat_ tracepoints?
> ----- On Nov 26, 2015, at 10:58 AM, Luetkebohle Ingo (CR/AEA2)
> <Ingo.Luetkebohle at de.bosch.com> wrote:
>> Hi Mathieu,
>> thank you for the explanation!
>> This would suggest that the problem only occurs for scheduler events, because
>> only then we have this situation that the currently scheduled task is not the
>> one that the tracepoint relates that. Am I interpreting that correctly?
> AFAIK, only the scheduler tracepoints pass a task struct as parameter (the one
> being
> acted upon) that can differ from the current task. So yes, this kind of
> situation where a
> tracepoint receives a task other than current as parameter pretty much only
> happens
> in the scheduler instrumentation.
>> If yes, this would certainly ease my concerns, because for most tracepoints, I
>> get the expected behavior. I can also get the information that I want right now
>> by looking at comm instead.
>> However, for future development of LTTng: Isn't the intuitive context that of
>> the *tracepoint*? In other words, isn't it wrong to always use the currently
>> scheduled task?
> By definition, the "current" task holds the current context, which is the
> one saved in the tid/procname lttng context.
> The task passed as parameter to the tracepoint is the task acted upon
> in this case, not the currently running one anymore. Therefore, the "context" of
> the
> tracepoint _is_ the currently running task, not the one passed as parameter.
> This highlights something that might not be obvious at first: when the scheduler
> puts a task into a sleeper queue, it does so while running in the context of the
> newly scheduled task.
> Thanks,
> Mathieu
>> cheers,
>> Dr. Ingo Lütkebohle
>> Software Design and Analysis -- Robotics (CR/AEA2)
>> Tel. +49(711)811-12248
>> Fax +49(711)811-0
>> Mobil +49-1525-8813417
>> Ingo.Luetkebohle at de.bosch.com
>> Von: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
>> Gesendet: Donnerstag, 26. November 2015 16:21
>> An: Luetkebohle Ingo (CR/AEA2)
>> Cc: lttng-dev
>> Betreff: Re: [lttng-dev] faulty context for sched_stat_ tracepoints?
>> ---- On Nov 26, 2015, at 7:34 AM, Luetkebohle Ingo (CR/AEA2)
>> <Ingo.Luetkebohle at de.bosch.com> wrote:
>>> Hi,
>>> I noticed that the “procname” and “tid” * context * variables can differ from
>>> the “comm” and “tid” variables of the sched_stat_-tracepoints (e.g., for
>>> sched_stat_runtime).
>>> For example, see the following two outputs:
>>> [15:49:33.674072890] (+0.000029771) rng2070 sched_stat_sleep: { cpu_id = 4 }, {
>>> pid = 26974, procname = "amcl", tid = 26975 }, { comm = "lttng-sessiond", tid =
>>> 798, delay = 194122524 }
>>> [15:49:33.722894563] (+0.000004851) rng2070 sched_stat_sleep: { cpu_id = 0 }, {
>>> pid = 0, procname = "swapper/0", tid = 0 }, { comm = "amcl", tid = 27078, delay
>>> = 10043658 }
>>> In both cases, procname differs from comm. I’ve looked at the kernel source
>>> code, and the “comm” variable is taken from task_struct.filename. Barring
>>> renames, this would be the correct one, so I’m assuming that the context is
>>> sometimes incorrect.
>>> Now, for this particular case, I can take the correct variables from the
>>> tracepoint, but of course, this makes me worried that the context might also be
>>> wrong in other cases… Do you have any idea what could be going on here?
>>> LTTng is Version 2.7.x+stable+bzr2948+pack17+201511192116~ubuntu14.04.1 and this
>>> is on a 3.19.0-33-generic kernel (btw, that’s not the default kernel for Ubuntu
>>> 14.04, but the LTS vivid backport kernel).
>> Hi,
>> Looking at lttng-modules lttng-context-tid.c tid_record(), we see that it uses
>> task_pid_nr(current).
>> It therefore gets the tid of the currently scheduled task. This is similar for
>> procname.
>> Now looking at the Linux kernel kernel/sched/fair.c: enqueue_sleeper(), which
>> calls
>> trace_sched_stat_sleep(), we see that it passes "tsk" as parameter to the
>> tracepoint.
>> Already here, nothing seems to ensure that "tsk" is the currently scheduled
>> task. It's
>> rather a task that is being put to sleep. Looking at the callers of
>> enqueue_sleeper(),
>> we can understand that the current task is first scheduled out before being
>> enqueued
>> as a sleeper. This is why, when we hit the sched_stat_sleep tracepoint, the
>> current
>> task is not "tsk", which explains the difference between the context information
>> and the tracepoint payload information: they really target different tasks.
>> Does this help solve your concerns ?
>> Thanks,
>> Mathieu
>>> Mit freundlichen Grüßen / Best regards
>>> Ingo Luetkebohle
>>> Application Software (CR/AEA2)
>>> Tel. +49(711)811-12248 | Fax +49(711)811-0 | Ingo.Luetkebohle at de.bosch.com
>>> _______________________________________________
>>> lttng-dev mailing list
>>> lttng-dev at lists.lttng.org
>>> http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
>> --
>> Mathieu Desnoyers
>> EfficiOS Inc.
>> http://www.efficios.com
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.lttng.org/pipermail/lttng-dev/attachments/20151126/0c94b90a/attachment-0001.html>
More information about the lttng-dev
mailing list