[lttng-dev] [PATCH 09/11] sched: export task_prio to GPL modules
Mathieu Desnoyers
mathieu.desnoyers at efficios.com
Thu Dec 1 18:47:15 EST 2011
* Greg KH (greg at kroah.com) wrote:
> On Fri, Dec 02, 2011 at 12:06:37AM +0100, Peter Zijlstra wrote:
> > On Thu, 2011-12-01 at 17:15 -0500, Mathieu Desnoyers wrote:
> > >
> > > If you don't want to trace sched_switch, but just conveniently prepend
> > > this information to all your events
> >
> > Oh so you want to debug a scheduler issue but don't want to use the
> > scheduler tracepoint, I guess that makes perfect sense for clueless
> > people.
>
> Mathieu, can't lttng use the scheduler tracepoint for this information?
LTTng lets users choose between two methods, each suited to a
particular use of the tracer:
A) Extraction through the scheduler tracepoint:
LTTng viewers have a full-fledged current state reconstruction of the
traced OS (for any point in time during the trace) performed as one
of the bottom layers of our trace analysis tools. This makes sense
for use-cases where the data needs to be transported, and/or stored,
and where the amount of data throughput needs to be minimized. We use
this technique a lot, of course. This state tracking costs CPU and
memory in the viewer.
B) Extraction through "optional" event context information:
We have, in development, a new "enhanced top" called lttngtop that
uses tracing information, directly read from mmap'd buffers, to
provide second-by-second profile information of the system. It is
not as sensitive to data compactness as the transport/disk storage
use-case, mainly because no data copy is ever required -- the buffers
simply get overwritten after lttngtop has finished aggregating the
information. This has less performance overhead than the big-hammer
"top", which periodically reads all files in /proc, and can provide
much more detailed profiles.
This use-case favors sending additional data from the kernel to
user-space rather than recomputing the OS state within lttngtop:
direct mmap data transport has very low overhead, so recomputing
the state would be needless work.
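For comparison, the viewer-side state reconstruction of method A can be
sketched roughly as below. This is only an illustration, not LTTng viewer
code: the record layout and the names switch_event, cpu_state and state_at
are hypothetical, and real scheduler events carry more fields.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified scheduler switch record. */
struct switch_event {
    uint64_t timestamp;  /* time of the context switch */
    int cpu;             /* CPU on which the switch happened */
    int next_pid;        /* task being switched in */
    int next_prio;       /* priority of the task being switched in */
};

/* Reconstructed state of one CPU at a given point in the trace. */
struct cpu_state {
    int valid;  /* 0 until a switch has been seen on this CPU */
    int pid;
    int prio;
};

/*
 * Replay time-ordered events up to and including `when`, and return
 * the state of `cpu` at that point. The viewer pays for this replay
 * in CPU time, which is the cost method B avoids.
 */
struct cpu_state state_at(const struct switch_event *events, size_t count,
                          int cpu, uint64_t when)
{
    struct cpu_state state = { 0, -1, -1 };
    size_t i;

    for (i = 0; i < count && events[i].timestamp <= when; i++) {
        if (events[i].cpu != cpu)
            continue;
        state.valid = 1;
        state.pid = events[i].next_pid;
        state.prio = events[i].next_prio;
    }
    return state;
}
```

With method B, the priority travels with each event as context, so no such
replay is needed on the consumer side.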
We could very well "cheat" and use a scheduler tracepoint to keep a
duplicate of the current priority value for each CPU within the tracer
kernel module. Let me know if you want me to do this.
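A minimal user-space sketch of that "cheat", under stated assumptions: the
fixed CPU count and the names on_sched_switch and context_prio are
hypothetical, and a real tracer module would more likely keep this in kernel
per-CPU variables updated from a probe on the scheduler tracepoint.

```c
#define SKETCH_NR_CPUS 8    /* hypothetical fixed CPU count for this sketch */
#define PRIO_UNKNOWN   (-1) /* returned before any switch has been seen */

struct prio_slot {
    int valid;  /* set once a switch has been observed on this CPU */
    int prio;   /* priority of the task currently running there */
};

static struct prio_slot cached[SKETCH_NR_CPUS];

/* Stand-in for the probe called on every scheduler switch. */
void on_sched_switch(int cpu, int next_prio)
{
    if (cpu < 0 || cpu >= SKETCH_NR_CPUS)
        return;
    cached[cpu].prio = next_prio;
    cached[cpu].valid = 1;
}

/* Lookup used when prepending the priority to any other event. */
int context_prio(int cpu)
{
    if (cpu < 0 || cpu >= SKETCH_NR_CPUS || !cached[cpu].valid)
        return PRIO_UNKNOWN;
    return cached[cpu].prio;
}
```

The cache is only as fresh as the last switch seen on that CPU, and it
reports PRIO_UNKNOWN until the first one.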
Also, as a matter of fact, the "prio" information exported from the
sched_switch event in mainline trace events does not match the prio
shown in /proc stat files. The "MAX_RT_PRIO" offset is missing.
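To make the mismatch concrete, here is a small sketch of the conversion,
assuming MAX_RT_PRIO is 100 as in mainline kernels of that period. The
function name proc_prio_from_kernel_prio is hypothetical and exists only for
illustration.

```c
/* Assumed value: MAX_RT_PRIO is 100 in mainline Linux of that period. */
#define MAX_RT_PRIO 100

/*
 * Convert the raw kernel priority, as carried in the sched_switch
 * event, to the value shown in the /proc stat files.
 */
int proc_prio_from_kernel_prio(int kernel_prio)
{
    return kernel_prio - MAX_RT_PRIO;
}
```

Under that assumption, a normal task at nice 0 has a kernel priority of 120,
which the sched_switch event reports as 120 while /proc shows 20.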
Thanks,
Mathieu
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com