[ltt-dev] LTTng specialized probes

Thu Oct 9 11:28:31 EDT 2008

* Michael Davidson (md at google.com) wrote:
> On Tue, Oct 7, 2008 at 5:07 PM, Mathieu Desnoyers <compudj at krystal.dyndns.org> wrote:
> 
> >
> > >
> > > This conversion could be done in the kernel, by a user space program
> > > running on the same machine, or by a program running elsewhere.
> >
> > You seem to assume that the same ABI the kernel runs in will be
> > available elsewhere, which might not be true in the embedded field. The
> > same applies to a 64-bit x86 kernel with a 32-bit userland. Therefore,
> > getting data out of the kernel should be done through a standardized ABI;
> > ideally following the kernel ABI for speed, but more importantly:
> > self-described. This is what LTTng does.
> >
> 
> I am confused by your response - I think that we are in agreement but may
> be using terminology slightly differently.
> 
> When I said that the conversion could be done by "a program running
> elsewhere" I was assuming that the metadata which came along with the
> raw trace data would be sufficient to make that possible.
> 
> In other words:
> 
> - the metadata itself must be in a canonical format (ASCII perhaps?) or be
>   sufficiently self-describing that it can be interpreted correctly.
> 
> - the metadata must completely describe the format of the binary data
>   including such things as byte order, size and alignment of data types
>   and the mapping of all binary values into a canonical external format
>   (i.e. maps from event numbers to event names, system call numbers to
>   system call names, etc.)
> 
> Given that information you can write a portable program which can run
> anywhere which can do the conversion.
> 
> I am not sure what the mechanism for extracting the trace data from
> the kernel and getting it into user space has to do with this.
> 
> md

Those are, per se, two distinct topics (getting the data out to userspace,
and exporting metadata which allows the output data to be parsed portably).
They become related in two places:

1 - When we have to consider which data format we use to write the event
data (event-specific payload) into the buffers. Ideally, we would like
that format to be parsable portably and fully described by the metadata.

2 - When we consider the most compact size for event records, the fact
that we have such metadata to describe the event payload matters.
Following the discussions on LKML about the number of TSC bits required,
and given that it would be good to keep the "extended information" as
simple as possible, I think a header with:

<core event header, aligned on 32 bits>
1-bit opt. ext. TSC (active if a 27-bit TSC overflow is detected)
27-bit TSC
4-bit event ID (ID #7 reserved to specify (optional) extended event ID,
                ID #6 reserved to specify both ext. event ID and event
                size)
<extended information, optional>
opt. 16-bit event ID
opt. 16-bit event size (if size == 65535 -> has large event size field)
opt. 32-bit large event size (aligned on 32 bits)
opt. 64-bit TSC MSB (aligned on architecture pointer size)
(realign on 32 or 64 bits depending on architecture before the event
 payload)

should be enough to look up the metadata and see what event payload to
expect for a given event ID. Moreover, given we have 6 event IDs which
can fit in the 32-bit header (0 to 5), many use cases (considering
per-buffer IDs) will fit within these 6 available numbers. And that
would permit having an optional "event size" field, which can be enabled
on a per-event basis.
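
To make the core header layout above concrete, here is a minimal sketch
of packing and unpacking the 32-bit word. The field widths (1, 27 and 4
bits) and the reserved IDs #6 and #7 come from the description above;
the order of the fields inside the word and all function names are
assumptions made for illustration only.

```python
# Field widths come from the proposed header; the bit order inside the
# 32-bit word (ext. TSC flag in the top bit, event ID in the low bits)
# is an assumption of this sketch.
TSC_BITS = 27
ID_BITS = 4
TSC_MASK = (1 << TSC_BITS) - 1
ID_MASK = (1 << ID_BITS) - 1

ID_EXT_ID_AND_SIZE = 6  # reserved: extended event ID and event size follow
ID_EXT_ID = 7           # reserved: extended event ID follows


def pack_core_header(ext_tsc, tsc, event_id):
    """Pack the three core fields into one 32-bit integer."""
    if not 0 <= event_id <= ID_MASK:
        raise ValueError("event ID does not fit in 4 bits")
    return ((int(bool(ext_tsc)) << (TSC_BITS + ID_BITS))
            | ((tsc & TSC_MASK) << ID_BITS)  # keep only the low 27 TSC bits
            | event_id)


def unpack_core_header(word):
    """Return (ext_tsc, tsc, event_id) from a 32-bit header word."""
    ext_tsc = bool((word >> (TSC_BITS + ID_BITS)) & 1)
    tsc = (word >> ID_BITS) & TSC_MASK
    event_id = word & ID_MASK
    return ext_tsc, tsc, event_id


def has_extended_id(event_id):
    """True if the core ID announces an optional 16-bit event ID."""
    return event_id in (ID_EXT_ID, ID_EXT_ID_AND_SIZE)


def has_event_size(event_id):
    """True if the core ID also announces an optional event size field."""
    return event_id == ID_EXT_ID_AND_SIZE
```

A reader would first call unpack_core_header() on each record, then use
has_extended_id() and has_event_size() to decide which optional fields
to read before the payload.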

However, in the current Unified tracing buffer implementation done by
Steven Rostedt, he reserves those 5 bits for internal event IDs (rather
than putting buffer-specific information in a buffer header) and uses
the bits left for the event size (which limits the event size to a
somewhat low value). Although I agree that having the event size here is
very useful to cross-check the typing mechanism against the data actually
written in the buffers by the kernel, I think it should be made a "tracer
debug" mode, not a standard required field, because it uses bits that
would otherwise be available for event IDs.
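
As an illustration of what such a "tracer debug" cross-check could look
like on the reader side, the sketch below compares an event size found in
the buffer with the size derived from the metadata. The metadata table,
its payload formats and the function names are all hypothetical; payloads
are assumed to be packed little-endian without padding.

```python
import struct

# Hypothetical metadata: event ID -> payload layout, written as a Python
# struct format string ("<" means little-endian, no padding).
EVENT_FORMATS = {
    0: "<IQ",  # a 32-bit field followed by a 64-bit field
    1: "<H",   # a single 16-bit field
}


def expected_payload_size(event_id):
    """Payload size in bytes, as derived from the metadata alone."""
    return struct.calcsize(EVENT_FORMATS[event_id])


def check_event_size(event_id, recorded_size):
    """Debug-mode check of the optional event size field.

    Raises ValueError when the size written by the tracer disagrees
    with the size implied by the metadata.
    """
    expected = expected_payload_size(event_id)
    if recorded_size != expected:
        raise ValueError(
            "event %d: recorded size %d, metadata says %d"
            % (event_id, recorded_size, expected))
    return expected
```

When the optional size field is not enabled for an event, the reader
simply relies on expected_payload_size() and no header bits are spent.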

Therefore, I think that whether or not there is metadata which describes
the event payload influences the event header and buffer header
decisions. In the implementation currently done by Steven, he ties
the buffering mechanism to the buffer and event headers, but leaves
the metadata to a "separate" layer, because he only plans to parse the
buffers from within the kernel (he is not interested in exporting them
to userspace). But I think the separation is done at the wrong spot:
the event header should be more closely tied to the available metadata,
and the buffer header should be sufficient to describe the buffer content
without having to reserve any specific event to express buffer-specific
information.
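
To show how metadata can drive portable parsing of the payload, here is
a minimal sketch of a reader that needs nothing but the metadata and the
raw bytes. The ASCII metadata syntax, the event names and the field
names are invented for this example and are not the LTTng format;
alignment is ignored (fields are assumed to be packed).

```python
import struct

# Hypothetical ASCII metadata, invented for this sketch.
METADATA = """\
byte_order little
event 0 irq_entry irq:u32
event 1 syscall_entry nr:u16 ip:u64
"""

# Map of metadata type names to struct format codes.
TYPE_CODES = {"u8": "B", "u16": "H", "u32": "I", "u64": "Q"}


def parse_metadata(text):
    """Return (byte order prefix, {event ID: (name, [(field, type)])})."""
    order = "<"
    events = {}
    for line in text.splitlines():
        words = line.split()
        if not words:
            continue
        if words[0] == "byte_order":
            order = "<" if words[1] == "little" else ">"
        elif words[0] == "event":
            fields = [tuple(w.split(":")) for w in words[3:]]
            events[int(words[1])] = (words[2], fields)
    return order, events


def decode_payload(order, events, event_id, payload):
    """Decode one payload using only what the metadata says about it."""
    name, fields = events[event_id]
    fmt = order + "".join(TYPE_CODES[ftype] for _, ftype in fields)
    values = struct.unpack(fmt, payload)
    return name, dict(zip((fname for fname, _ in fields), values))
```

Because the byte order and field sizes are read from the metadata rather
than taken from the host, the same reader works whatever ABI it runs on.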

Mathieu

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68