[lttng-dev] faulty context for sched_stat_ tracepoints?

Luetkebohle Ingo (CR/AEA2) Ingo.Luetkebohle at de.bosch.com
Fri Nov 27 03:12:53 EST 2015


Hi,


I didn't mean to suggest some kind of "intermingling". I merely thought that, because for every other kind of tracepoint, the "context" is the execution context of the task that *caused* the tracepoint (which happens to be the same as the *current* context in almost all cases), it would be consistent if this were the case for scheduler-tracepoints also. This seems to make more sense to me.


but I don't really feel strongly about this, and you're the experts, so if you prefer it to be the current context in all cases, I'll acquiesce.

cheers

Dr. Ingo Lütkebohle
Software Design and Analysis -- Robotics (CR/AEA2)

Tel. +49(711)811-12248
Fax +49(711)811-0
Mobil +49-1525-8813417
Ingo.Luetkebohle at de.bosch.com<mailto:Ingo.Luetkebohle at de.bosch.com>
________________________________
Von: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
Gesendet: Donnerstag, 26. November 2015 18:05
An: Luetkebohle Ingo (CR/AEA2); Philippe Proulx
Cc: lttng-dev
Betreff: Re: AW: AW: [lttng-dev] faulty context for sched_stat_ tracepoints?

We could perhaps clarify our documentation to state that the "context"
info gathered by lttng is that of the execution context when the tracepoint
is invoked, but I'm afraid we won't start intermingling the tracepoint payload
information with context. This would be both technologically cumbersome,
and confusing.

I'm CCing Philippe Proulx who maintains our documentation,

Thanks,

Mathieu

----- On Nov 26, 2015, at 11:57 AM, Luetkebohle Ingo (CR/AEA2) <Ingo.Luetkebohle at de.bosch.com> wrote:

Hi,


well, the thing is, when I add "context" information, I want more information on the task of the tracepoint.  That the tracepoint is executed in a situation where the *kernel* already has a new task context is of no interest to me as a user.


Therefore, I would argue that what lttng treats as the context should, in this situation, be something different from what the kernel treats as its context.


Don't know if that's *possible*, but its what I would expect ;-)

YMMV, of course.


cheers


Dr. Ingo Lütkebohle
Software Design and Analysis -- Robotics (CR/AEA2)

Tel. +49(711)811-12248
Fax +49(711)811-0
Mobil +49-1525-8813417
Ingo.Luetkebohle at de.bosch.com<mailto:Ingo.Luetkebohle at de.bosch.com>
________________________________
Von: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
Gesendet: Donnerstag, 26. November 2015 17:20
An: Luetkebohle Ingo (CR/AEA2)
Cc: lttng-dev
Betreff: Re: AW: [lttng-dev] faulty context for sched_stat_ tracepoints?


----- On Nov 26, 2015, at 10:58 AM, Luetkebohle Ingo (CR/AEA2) <Ingo.Luetkebohle at de.bosch.com> wrote:

Hi Mathieu,


thank you for the explanation!


This would suggest that the problem only occurs for scheduler events, because only then we have this situation that the currently scheduled task is not the one that the tracepoint relates that. Am I interpreting that correctly?

AFAIK, only the scheduler tracepoints pass a task struct as parameter (the one being
acted upon) that can differ from the current task. So yes, this kind of situation where a
tracepoint receives a task other than current as parameter pretty much only happens
in the scheduler instrumentation.



If yes, this would certainly ease my concerns, because for most tracepoints, I get the expected behavior. I can also get the information that I want right now by looking at comm instead.


However, for future development of LTTng: Isn't the intuitive context that of the *tracepoint*? In other words, isn't it wrong to always use the currently scheduled task?

By definition, the "current" task holds the current context, which is the
one saved in the tid/procname lttng context.

The task passed as parameter to the tracepoint is the task acted upon
in this case, not the currently running one anymore. Therefore, the "context" of the
tracepoint _is_ the currently running task, not the one passed as parameter.

This highlights something that might not be obvious at first: when the scheduler
puts a task into a sleeper queue, it does so while running in the context of the
newly scheduled task.

Thanks,

Mathieu




cheers,


Dr. Ingo Lütkebohle
Software Design and Analysis -- Robotics (CR/AEA2)

Tel. +49(711)811-12248
Fax +49(711)811-0
Mobil +49-1525-8813417
Ingo.Luetkebohle at de.bosch.com<mailto:Ingo.Luetkebohle at de.bosch.com>
________________________________
Von: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
Gesendet: Donnerstag, 26. November 2015 16:21
An: Luetkebohle Ingo (CR/AEA2)
Cc: lttng-dev
Betreff: Re: [lttng-dev] faulty context for sched_stat_ tracepoints?
---- On Nov 26, 2015, at 7:34 AM, Luetkebohle Ingo (CR/AEA2) <Ingo.Luetkebohle at de.bosch.com> wrote:
Hi,

I noticed that the "procname" and "tid" *context* variables can differ from the "comm" and "tid" variables of the sched_stat_-tracepoints (e.g., for sched_stat_runtime).

For example, see the following two outputs:
[15:49:33.674072890] (+0.000029771) rng2070 sched_stat_sleep: { cpu_id = 4 }, { pid = 26974, procname = "amcl", tid = 26975 }, { comm = "lttng-sessiond", tid = 798, delay = 194122524 }
[15:49:33.722894563] (+0.000004851) rng2070 sched_stat_sleep: { cpu_id = 0 }, { pid = 0, procname = "swapper/0", tid = 0 }, { comm = "amcl", tid = 27078, delay = 10043658 }

In both cases, procname differs from comm.  I've looked at the kernel source code, and the "comm" variable is taken from task_struct.filename. Barring renames, this would be the correct one, so I'm assuming that the context is sometimes incorrect.

Now, for this particular case, I can take the correct variables from the tracepoint, but of course, this makes me worried that the context might also be wrong in other cases... Do you have any idea what could be going on here?

LTTng is Version 2.7.x+stable+bzr2948+pack17+201511192116~ubuntu14.04.1 and this is on a 3.19.0-33-generic kernel (btw, that's not the default kernel for Ubuntu 14.04, but the LTS vivid backport kernel).
Hi,

Looking at lttng-modules lttng-context-tid.c tid_record(), we see that it uses task_pid_nr(current).
It therefore gets the tid of the currently scheduled task. This is similar for procname.

Now looking at the Linux kernel kernel/sched/fair.c: enqueue_sleeper(), which calls
trace_sched_stat_sleep(), we see that it passes "tsk" as parameter to the tracepoint.
Already here, nothing seems to ensure that "tsk" is the currently scheduled task. It's
rather a task that is being put to sleep. Looking at the callers of enqueue_sleeper(),
we can understand that the current task is first scheduled out before being enqueued
as a sleeper. This is why, when we hit the sched_stat_sleep tracepoint, the current
task is not "tsk", which explains the difference between the context information
and the tracepoint payload information: they really target different tasks.

Does this help solve your concerns ?

Thanks,

Mathieu


Mit freundlichen Grüßen / Best regards

Ingo Luetkebohle

Application Software (CR/AEA2)
Tel. +49(711)811-12248 | Fax +49(711)811-0 | Ingo.Luetkebohle at de.bosch.com<mailto:Ingo.Luetkebohle at de.bosch.com>



_______________________________________________
lttng-dev mailing list
lttng-dev at lists.lttng.org
http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.lttng.org/pipermail/lttng-dev/attachments/20151127/429a8903/attachment-0001.html>


More information about the lttng-dev mailing list