[ltt-dev] LTTng specialized probes

Tue Oct 7 20:26:14 EDT 2008

* Jiaying Zhang (jiayingz at google.com) wrote:
> On Mon, Oct 6, 2008 at 7:11 AM, Mathieu Desnoyers <compudj at krystal.dyndns.org> wrote:
> 
> > Hi,
> >
> > I'm currently working towards getting LTTng in shape for what is
> > required for mainline. I got the "TLB-less" buffers and splice()
> > working last week. I then did some performance testing on the flight
> > recorder mode and noticed an optimization that's really worth doing :
> >
> > LTTng's "ltt-serialize.c", which parses the format strings and formats
> > data into the trace buffers, takes a lot of CPU time. I tried keeping
> > only the size calculation (first pass on the format string) and
> > disabling the real data write, and got roughly the following:
> >
> > (default LTTng instrumentation, very approximate numbers)
> >
> > tbench no tracing             : ~1900MB/s
> >        markers enabled        : ~1800MB/s
> >        with size calculation  : ~1400MB/s
> >        size calc + data write :  ~950MB/s
> 
> 
> Thanks a lot for sharing these numbers! Looks like we should
> use special probe functions for high-frequency tracing events.
> Also, do you know why enabling markers adds so much overhead?
> 

Note that enabling markers adds two function calls, including the stack
setup for the marker callbacks, before ltt_vtrace detects that tracing is
not active. Also, the "tbench no tracing" number was taken with
2.6.27-rc7 and the others with 2.6.27-rc8; more tests would be required
to determine exactly what role the two function calls play in this.

Note that these are function calls rather than code inlined at the
instrumentation sites because the last thing we want to do is pollute
the kernel instruction cache. Also, there are two function calls instead
of one because the tracepoints are an in-kernel abstraction which allows
various in-kernel tracers to connect to the traced sites, while the
second call (the marker) presents the data to userspace. This second
marker function call also permits a lot of flexibility, like easily
selecting specialized or generic serialization functions depending on
the type to record.
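
To make the "specialized or generic serialization" selection concrete,
here is a rough user-space sketch. It is only a model of the idea, not
LTTng code: struct probe_sketch, select_probe() and the two serializers
are hypothetical names invented for illustration, and the generic path
is left as a stub where the format-string parsing would go.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical user-space model; none of these names are the LTTng API. */
typedef size_t (*serialize_fn)(unsigned char *buf, const void *args);

struct probe_sketch {
	const char *name;
	const char *format;	/* NULL means "accept any format" (generic) */
	serialize_fn callback;
};

struct two_ints {
	int32_t a;
	int32_t b;
};

/* Generic path: stands in for a serializer that walks the format string. */
static size_t serialize_generic(unsigned char *buf, const void *args)
{
	(void)buf;
	(void)args;
	return 0;	/* format parsing omitted in this sketch */
}

/* Specialized path: the layout is known at compile time, so no parsing. */
static size_t serialize_two_ints(unsigned char *buf, const void *args)
{
	memcpy(buf, args, sizeof(struct two_ints));
	return sizeof(struct two_ints);
}

/* The generic entry comes last so that it acts as the fallback. */
static const struct probe_sketch probes[] = {
	{ "two_ints", "%d %d", serialize_two_ints },
	{ "default",  NULL,    serialize_generic  },
};

/* Return the first probe whose format matches, else the generic one. */
static const struct probe_sketch *select_probe(const char *marker_format)
{
	size_t i;

	for (i = 0; i < sizeof(probes) / sizeof(probes[0]); i++) {
		if (probes[i].format == NULL ||
		    strcmp(probes[i].format, marker_format) == 0)
			return &probes[i];
	}
	return NULL;
}
```

The lookup would happen once, when a probe is connected to a marker, so
the per-event cost is a single indirect call through the chosen callback.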

Mathieu

> Jiaying
> 
> 
> >
> > I then remembered I've done ltt-serialize in such a way that it can be
> > easily overridden by per-format string specialized callbacks.
> >
> > Therefore, it would be worthwhile to create such specialized serializers
> > so the common cases can be made much faster. I think it will have a very
> > significant impact on performance.
> >
> > It's simply a matter of creating a new .c kernel module in ltt/ and
> > creating structures similar to:
> >
> > ltt-serialize.c :
> >
> > struct ltt_available_probe default_probe = {
> >        .name = "default",
> >        .format = NULL,
> >        .probe_func = ltt_vtrace,
> >        .callbacks[0] = ltt_serialize_data,
> > };
> >
> > Give it a non-null format string (just giving the types expected by the
> > callback), a good name, and a callback function which implements the
> > specialized serialization. Note that kernel/marker.c currently expects
> > the format string to match the marker format string exactly, including
> > the type names; this should be changed. The type verification should
> > only check that the %X parameters are the same (and that the same
> > number of arguments is expected).
> >
> > That should not be hard, but it's not what I plan to focus on next.
> > Is anyone willing to work on this?
> >
> > Mathieu
> >
> > --
> > Mathieu Desnoyers
> > OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
> >
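
The relaxed type verification suggested in the quoted message (compare
only the % conversions and their count, ignoring the names around them)
could look roughly like the sketch below. This is plain user-space C,
not the kernel/marker.c code; next_conversion() and formats_compatible()
are hypothetical names, and the sketch assumes only integer, character,
string and pointer conversions appear in the format strings.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Find the next conversion in *p. Returns a pointer to its '%' and
 * stores its length in *len, or returns NULL when none is left.
 * *p is advanced past the conversion that was returned.
 */
static const char *next_conversion(const char **p, size_t *len)
{
	const char *s = *p;

	while ((s = strchr(s, '%')) != NULL) {
		const char *end;

		if (s[1] == '%') {	/* "%%" is a literal percent sign */
			s += 2;
			continue;
		}
		/* A conversion ends at its first conversion letter. */
		end = s + 1 + strcspn(s + 1, "diouxXcspn");
		if (*end == '\0')
			break;		/* truncated conversion: treat as none */
		*len = (size_t)(end - s) + 1;
		*p = end + 1;
		return s;
	}
	*p = *p + strlen(*p);
	return NULL;
}

/* True when both formats carry the same conversions in the same order. */
static bool formats_compatible(const char *probe_fmt, const char *marker_fmt)
{
	for (;;) {
		size_t la = 0, lb = 0;
		const char *a = next_conversion(&probe_fmt, &la);
		const char *b = next_conversion(&marker_fmt, &lb);

		if (a == NULL || b == NULL)
			return a == NULL && b == NULL;	/* same argument count */
		if (la != lb || memcmp(a, b, la) != 0)
			return false;
	}
}
```

With this check, a probe declared with "%d %s" would be accepted for a
marker whose format is "count %d name %s", while "%lu" against "%u" or
a differing number of conversions would be rejected.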

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68