[ltt-dev] LTTng-UST vs SystemTap userspace tracing benchmarks

Wed Feb 16 13:50:56 EST 2011

Stefan was referring to #4 in your taxonomy.

It's indeed the case that what UST uses today is an always-there normal
C code sequence that loads global variables to decide whether to make
indirect function calls.  I don't recall off hand how many layers of
function calls to the libust DSO and such there are in either the
disabled or enabled cases.  At best, there is always the overhead of
several instructions and at least one load in the hot code path, and the
i-cache pollution that goes with that.
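
To make the shape of that hot path concrete, here is a minimal sketch of
a flag-guarded probe site.  It is not the libust code: the struct
marker, TRACE, count_probe and hot_function names are all hypothetical,
and it only shows the general pattern of a global load followed by a
conditional indirect call.

```c
/* Hypothetical probe callback type; not the libust API. */
typedef void (*probe_fn)(const char *name, int value);

/* Hypothetical marker record: a global flag plus a function pointer. */
struct marker {
    const char *name;
    volatile int enabled;   /* loaded on every pass through the hot path */
    probe_fn call;          /* indirect call target, used only when enabled */
};

static int probe_hits;      /* total of the values passed to the probe */

/* Example probe body: just accumulates its argument. */
static void count_probe(const char *name, int value)
{
    (void)name;
    probe_hits += value;
}

static struct marker demo_marker = { "demo_event", 0, count_probe };

/* Always-present check: one load and one branch even when disabled. */
#define TRACE(m, v)                         \
    do {                                    \
        if ((m).enabled)                    \
            (m).call((m).name, (v));        \
    } while (0)

/* Hot function with an embedded probe site. */
static int hot_function(int x)
{
    TRACE(demo_marker, 1);
    return x * 2;
}
```

Setting demo_marker.enabled to 1 turns the probe on without touching the
program text; the price is that the load and the test are executed on
every call to hot_function, whether or not anyone is tracing.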

It's indeed the case that what Systemtap uses today is a
sometimes-inserted normal breakpoint instruction, which is indeed a
software interrupt that requires kernel mediation.  When disabled, there
is as close to zero overhead as you can have, being a tiny placeholder
instruction sequence (currently just one nop), so the runtime overhead
is under a cycle and the i-cache pollution is the smallest possible unit
(one instruction, being just one byte on x86).
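
As a rough illustration of what enabling means at the byte level on x86,
the sketch below swaps a one-byte nop (0x90) for a one-byte breakpoint
(0xCC, int3) in a plain buffer standing in for program text.  The
arm_site and disarm_site names are hypothetical, and this is not how
Systemtap itself is written; in the real case the kernel does the
patching and handles the resulting trap.

```c
#include <stddef.h>
#include <stdint.h>

enum {
    X86_NOP  = 0x90,    /* one-byte placeholder left at the probe site */
    X86_INT3 = 0xCC     /* one-byte software breakpoint */
};

/* Arm a probe: replace the nop with a breakpoint.
 * Returns 0 on success, -1 if the site does not hold a nop. */
static int arm_site(uint8_t *text, size_t off)
{
    if (text[off] != X86_NOP)
        return -1;
    text[off] = X86_INT3;
    return 0;
}

/* Disarm a probe: put the original nop back.
 * Returns 0 on success, -1 if the site does not hold a breakpoint. */
static int disarm_site(uint8_t *text, size_t off)
{
    if (text[off] != X86_INT3)
        return -1;
    text[off] = X86_NOP;
    return 0;
}
```

Because both instructions are a single byte, the disabled site costs
exactly one byte of text, and arming it never disturbs the neighbouring
instructions.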

The "sweet spot" between the two is to have overhead close to
Systemtap's epsilon for a disabled probe, while having overhead close to
UST's pure-user method when a probe is enabled.  In the in-kernel
context, this is what the Linux kernel's latest code (still being hashed
out, but mostly done) has for kernel tracepoints using the so-called
"jump label" method.  That is also possible for sdt markers with some
careful consideration and attention to machine-specific details for each
machine architecture of concern.  It entails making the placeholder in
the hot code path slightly larger (at least for x86, it has to be a
"long nop", being probably negligibly more runtime overhead, and a few
bytes more i-cache pollution), and adding some additional static code
outside the hot path.  The work to enable or disable a probe becomes
just as costly as the current Systemtap method, since it involves
modifying the program text in place (inserting jump instructions rather
than breakpoint ones).  Once enabled, the runtime work of the probes
firing can be very much like what UST does today.
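
The text patching described above can be sketched for x86 as follows.
This is only an illustration on a byte buffer, assuming a 5-byte long
nop (0F 1F 44 00 00) as the placeholder and a 5-byte near jump (E9 plus
a 32-bit displacement) as the enabled form.  The enable_site,
disable_site and encode_jmp_rel32 names are hypothetical, and a real
implementation would additionally have to make the text writable and
cope with other threads executing the site while it is rewritten.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SITE_LEN 5  /* both the long nop and jmp rel32 are 5 bytes */

/* 5-byte x86 nop: nopl 0x0(%rax,%rax,1) */
static const uint8_t long_nop5[SITE_LEN] = { 0x0F, 0x1F, 0x44, 0x00, 0x00 };

/* Encode "jmp rel32" from site_off to target_off (offsets into the same
 * text buffer).  The displacement is relative to the end of the jump. */
static void encode_jmp_rel32(uint8_t out[SITE_LEN],
                             size_t site_off, size_t target_off)
{
    int64_t rel = (int64_t)target_off - (int64_t)(site_off + SITE_LEN);
    uint32_t u = (uint32_t)(int32_t)rel;

    out[0] = 0xE9;                      /* near jump opcode */
    out[1] = (uint8_t)(u & 0xFF);       /* little-endian displacement */
    out[2] = (uint8_t)((u >> 8) & 0xFF);
    out[3] = (uint8_t)((u >> 16) & 0xFF);
    out[4] = (uint8_t)((u >> 24) & 0xFF);
}

/* Enable: overwrite the long nop with a jump to the out-of-line probe
 * code.  Returns 0 on success, -1 if the site is not a long nop. */
static int enable_site(uint8_t *text, size_t site_off, size_t target_off)
{
    uint8_t jmp[SITE_LEN];

    if (memcmp(text + site_off, long_nop5, SITE_LEN) != 0)
        return -1;
    encode_jmp_rel32(jmp, site_off, target_off);
    memcpy(text + site_off, jmp, SITE_LEN);
    return 0;
}

/* Disable: restore the long nop.  Returns -1 if no jump is present. */
static int disable_site(uint8_t *text, size_t site_off)
{
    if (text[site_off] != 0xE9)
        return -1;
    memcpy(text + site_off, long_nop5, SITE_LEN);
    return 0;
}
```

For example, a site at offset 4 jumping to probe code at offset 20 gets
a displacement of 20 - (4 + 5) = 11, so the five bytes become
E9 0B 00 00 00.  The placeholder has to be as long as the jump, which is
why the one-byte nop no longer suffices.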

Thanks,
Roland