[ltt-dev] LTTng specialized probes

Mathieu Desnoyers compudj at krystal.dyndns.org
Wed Oct 8 22:43:22 EDT 2008


* Martin Bligh (mbligh at google.com) wrote:
> > Has anyone thought of / tried out some caching mechanism for this task?
> > I mean, scan the format string once (I don't think it will change during
> > runtime... :->), save somewhere that it expects n bytes of
> > to-be-serialized data on the caller's stack and then get away with only
> > copying those over into the trace buffer on succeeding marker hits?
> 
> Is it crazy to think of this as "everyone registers a print function", and
> we just provide a default print function that takes a string that people
> can use if they are not concerned with efficiency? Possibly with a different
> macro header if need be to hide it.
> 

Yes, that's how I see it, just from the opposite end: I first aim to
provide a generic fallback so it's easy to use, and leave the ability to
connect a specialized function to serialize the information when
performance matters.
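
For example (a rough sketch only, not actual LTTng code: the probe names
and the trace_buffer_write() helper are made up here to illustrate the
generic-fallback vs. specialized-serializer split):

#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the real trace-buffer reserve/commit path. */
static void trace_buffer_write(const void *data, size_t len)
{
	(void)data;
	(void)len;
}

/* Generic fallback: parse the format string on every event and
 * serialize each argument with va_arg. Easy to use, but pays the
 * parsing cost at every marker hit. */
static void probe_serialize_generic(const char *fmt, ...)
{
	va_list ap;
	const char *p;

	va_start(ap, fmt);
	for (p = fmt; *p; p++) {
		if (*p != '%' || !*(p + 1))
			continue;
		switch (*++p) {
		case 'd': {
			int v = va_arg(ap, int);
			trace_buffer_write(&v, sizeof(v));
			break;
		}
		case 's': {
			const char *s = va_arg(ap, const char *);
			trace_buffer_write(s, strlen(s) + 1);
			break;
		}
		/* ... other conversions ... */
		}
	}
	va_end(ap);
}

/* Specialized serializer for one event: the layout is known at
 * compile time, so the arguments go straight into the buffer. */
static void probe_sched_switch(int prev_pid, int next_pid)
{
	struct { int prev; int next; } payload = { prev_pid, next_pid };

	trace_buffer_write(&payload, sizeof(payload));
}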

> Another random thought ... was talking with Michael and Jiaying here about
> the markers, and it seemed one of the performance / complexity issues
> was the necessity to call multiple tracers from one marker (disclaimer:
> I haven't had the time to look at this in any detail, sorry, so please forgive
> if this totally misses something ...). Would it be possible to always just have
> one thing attached to the marker - this would be optimal for the common case.
> In the event that we do need multiple tracers attached, we could simply make
> the one thing attached to the marker a multiplexor that called a list of tracers
> (transparently).

This kind of optimization is already in the marker infrastructure: if
there is a single probe connected, a single callback is called and no
iteration is required on the probe function pointer array. This is not
done in the tracepoints, however, because it involves either another
level of function call (which we don't mind in the case of markers,
since we already have to do a va_start on the var args, which requires
a first function level anyway) or spilling more code at the
instrumentation site. I have tried to keep the tracepoint code very
compact at the kernel instrumentation site by placing a small optimized
loop there which iterates over every connected probe.
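
In simplified C (this is not the actual kernel code: the RCU and
memory-barrier handling is left out, and names like marker_probe_cb_sketch
are invented here), the marker dispatch is roughly:

#include <stdarg.h>

struct probe_closure {
	void (*func)(void *probe_data, const char *fmt, va_list *args);
	void *probe_data;
};

struct marker_sketch {
	const char *fmt;
	struct probe_closure single;	/* single-probe fast path */
	struct probe_closure *multi;	/* NULL when only one probe */
};

static void marker_probe_cb_sketch(struct marker_sketch *m, ...)
{
	va_list ap;

	if (!m->multi) {
		/* Single probe connected: one callback, no iteration
		 * over a probe function pointer array. */
		va_start(ap, m);
		m->single.func(m->single.probe_data, m->fmt, &ap);
		va_end(ap);
	} else {
		struct probe_closure *c;

		for (c = m->multi; c->func; c++) {
			va_start(ap, m);
			c->func(c->probe_data, m->fmt, &ap);
			va_end(ap);
		}
	}
}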

The problem is that as soon as we add special cases to optimize for the
specific single-probe scenario, we have to spill more code at the kernel
instrumentation site, and if we do that, the kernel developers will
strongly object.

> 
> I expect this would reduce complexity and speed up the common case, at the
> expense of a little performance in the uncommon case. Though all of this
> is fairly second-hand, so maybe is not useful ;-( Sorry, buried in
> Google-specific
> stuff right now.
> 

Well, the tradeoff here is mostly to keep the instrumentation that we
put in the kernel code as small as possible in terms of instruction
cache footprint, which works against any kind of tricky optimization we
would like to do to make some common cases faster when tracing is
enabled.

One thing we could do that might help improve performance is this:

We create a DEFINE_MARKER(name, string) and a get_marker_struct(name).
These would give us exactly what the current markers give us (declaring
an event name, associating an ID with it and declaring its parameters),
but without implying any function call the way markers currently do. By
doing this, we could put the optimized "type-specific" code which writes
the data into the trace buffers directly in the tracepoint probe and
therefore skip two function calls, which should make performance much
better.
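
Roughly, something like this (just a sketch: only DEFINE_MARKER and
get_marker_struct are the names proposed above; the field layout and the
trace_reserve()/trace_commit() helpers are invented for the example):

#include <stddef.h>

struct marker_info {
	const char *ev_name;	/* event name */
	const char *fmt;	/* declares the parameters */
	unsigned int event_id;	/* ID associated with the event */
};

/* Declare the event once; no function call is implied at the
 * declaration site. */
#define DEFINE_MARKER(name, format)				\
	static struct marker_info __mark_##name = {		\
		.ev_name = #name,				\
		.fmt = format,					\
	}

#define get_marker_struct(name)	(&__mark_##name)

/* Hypothetical reserve/commit primitives for the trace buffers. */
extern void *trace_reserve(unsigned int event_id, size_t len);
extern void trace_commit(void *slot);

DEFINE_MARKER(sched_switch, "prev_pid %d next_pid %d");

/* Tracepoint probe with the type-specific serialization written
 * directly inline: no generic marker callback and no va_arg-based
 * serializer, so two function calls are skipped on the fast path. */
static void probe_sched_switch(int prev_pid, int next_pid)
{
	struct marker_info *m = get_marker_struct(sched_switch);
	struct { int prev; int next; } *slot;

	slot = trace_reserve(m->event_id, sizeof(*slot));
	if (!slot)
		return;
	slot->prev = prev_pid;
	slot->next = next_pid;
	trace_commit(slot);
}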

Mathieu


-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68



