[ltt-dev] LTTng specialized probes

Mathieu Desnoyers compudj at krystal.dyndns.org
Mon Oct 6 11:56:36 EDT 2008


* Martin Bligh (mbligh at google.com) wrote:
> On Mon, Oct 6, 2008 at 8:26 AM, Mathieu Desnoyers
> <compudj at krystal.dyndns.org> wrote:
> > The idea is that someone who wants to add a new instrumentation site in
> > the Linux kernel does not have to write a specialized probe up front.
> > The format string parser will take care of writing the typed data into
> > the buffers (default behavior), but can still be overridden by a
> > specialized function which will expect the format string arguments and
> > serialize those into the buffers.
> 
> OK, it seemed mandatory to me, but if it's not, that's good.
> 
> > About what we discussed in Portland and where Steven is currently going:
> > it does not provide any kind of binary standard to export the data
> > between different platforms or even from a 64-bit kernel to
> > 32-bit userland. Steven also clearly states that he doesn't care about
> > exporting this data to userspace in binary format. He wants a
> > supplementary layer to do this formatting, which I don't think will
> > produce the performance results we are looking for. Plus, I think
> > feeding the data through the kernel which recorded the information to
> > decode it is the wrong approach, especially when the system which
> > recorded such information is a small embedded device, where getting the
> > data _out_ is already non-trivial. Feeding it back in seems a bit crazy.
> 
> I know he wants the in-kernel parsing for ease-of-use, and getting things
> upstream ... but it seemed to me that there was nothing in what he was
> doing that made it impossible to get the data in binary form out to userspace.
> Exporting the buffers is obviously easy.

Yes, exporting garbage to userspace is easy too ;) Making sense out of
it, especially without DWARF info, might be a bit more difficult.

> I was under the impression you were
> recording strings in the buffers anyway, in which case I don't see why you
> care, but I might be totally mistaken.

The LTTng buffer format records those marker format strings
only once in a "metadata" channel so the mapping

event id <-> marker name <-> format string

can be extracted from the trace. We can therefore encode event size and
typing in this table and manage to leave that metadata out of the high
throughput tracing stream. By adding a layer that does not take
advantage of such indirection, Steven is actually reserving event IDs
for "internal use" when we could, in many cases, use those bits to put
the event IDs which map to the marker event table. By separating the
low-level event header management from the event ID registration
mechanism, we are aiming at a much less efficient solution.
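
A minimal sketch of that indirection, for illustration only: the struct,
function names, and marker names below are hypothetical and are not the
actual LTTng API. The name and format string are registered once in a
table (standing in for the metadata channel), and the hot path writes
only a compact event ID followed by the payload.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical metadata entry, recorded once per marker. */
struct marker_meta {
    uint16_t event_id;
    const char *name;
    const char *format;
};

#define MAX_MARKERS 64
static struct marker_meta meta_table[MAX_MARKERS];
static uint16_t nr_markers;

/* Register a marker once; returns the compact ID used in the fast path. */
uint16_t register_marker(const char *name, const char *format)
{
    assert(nr_markers < MAX_MARKERS);
    uint16_t id = nr_markers++;
    meta_table[id] = (struct marker_meta){ id, name, format };
    return id;
}

/* Fast path: only the ID and the raw payload go into the stream.
 * Returns the number of bytes written. */
size_t write_event(uint8_t *buf, uint16_t id, const void *payload, size_t len)
{
    memcpy(buf, &id, sizeof(id));
    memcpy(buf + sizeof(id), payload, len);
    return sizeof(id) + len;
}

/* Reader side: recover name and format string from the ID alone. */
const struct marker_meta *lookup_event(const uint8_t *buf)
{
    uint16_t id;
    memcpy(&id, buf, sizeof(id));
    return id < nr_markers ? &meta_table[id] : NULL;
}
```

With this layout a 4-byte payload costs 6 bytes in the stream, and no
format string is ever repeated there.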

Also, by limiting the event reservation so events never cross a page
boundary, we are actually limiting the event size that can be exported
through such a stream to 4kB. To me, 4kB non-contiguous pages should be
_one_ memory backend to use for the buffers (others being video memory
which survives hot reboots or a linearly addressable buffer allocated at
boot time), which clearly do not have the same 4kB restrictions. I
therefore don't see why the higher-level buffer management primitives
(reserve/commit) should suffer from this specific lower-level buffer
limitation, especially given we can encapsulate writes so it's easy to
deal with page-crossing writes (cf. vmap()-less buffers I posted last
week).
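
To illustrate what "encapsulating writes" can look like, here is a rough
sketch, not the code from the vmap()-less patch: the struct and function
names are made up, and a 4096-byte page size is assumed. The writer
takes a logical offset into the buffer and splits the copy wherever it
meets a page boundary, so the caller never sees the backend layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SZ 4096u   /* assumed page size */

/* Hypothetical buffer backed by separately allocated pages. */
struct paged_buf {
    uint8_t **pages;
    size_t nr_pages;
};

/* Copy len bytes to a logical offset, splitting at page boundaries. */
void paged_write(struct paged_buf *b, size_t offset,
                 const void *src, size_t len)
{
    const uint8_t *s = src;

    assert(offset + len <= b->nr_pages * PAGE_SZ);
    while (len > 0) {
        size_t page = offset / PAGE_SZ;
        size_t in_page = offset % PAGE_SZ;
        size_t chunk = PAGE_SZ - in_page;   /* room left in this page */

        if (chunk > len)
            chunk = len;
        memcpy(b->pages[page] + in_page, s, chunk);
        s += chunk;
        offset += chunk;
        len -= chunk;
    }
}
```

The same loop handles an event larger than one page, which is why the
reserve/commit layer does not need to know about the 4kB granularity.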

> Even so, it seems what we'd need
> is just to make sure the buffer headers were exported, plus the decoding
> functions - making C files that will link with both the kernel and into a
> userspace library would be a little tricky, but not impossible?
> 

Linking 64-bit kernel objects into 32-bit userland executables seems
messy to me. And this is without considering cross-architecture concerns
(embedded developers with a small powerpc board but an x86 dev. machine
might want to look at the trace from a non-ABI compatible architecture).
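
As a sketch of the alternative, assuming the trace metadata states each
field's size and byte order explicitly, a userspace reader can decode
fields byte by byte and never depend on its own sizeof(long) or
endianness. The names below are hypothetical and not taken from any
existing tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte order of the machine that recorded the trace (assumed to be
 * stated in the trace metadata). */
enum trace_endian { TRACE_LE, TRACE_BE };

/* Read an unsigned integer of `size` bytes (1 to 8) stored in the
 * trace's byte order, whatever the host architecture is. */
uint64_t read_uint(const uint8_t *p, size_t size, enum trace_endian e)
{
    uint64_t v = 0;

    assert(size >= 1 && size <= 8);
    for (size_t i = 0; i < size; i++) {
        /* Walk from the most significant byte to the least. */
        size_t idx = (e == TRACE_BE) ? i : size - 1 - i;
        v = (v << 8) | p[idx];
    }
    return v;
}
```

An 8-byte field written by a 64-bit big-endian powerpc kernel is then
read the same way on a 32-bit x86 machine: read_uint(p, 8, TRACE_BE).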

Mathieu

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68



