[ltt-dev] LTTng-UST vs SystemTap userspace tracing benchmarks

Thu Feb 17 14:33:38 EST 2011

On 02/17/2011 02:11 PM, Josh Stone wrote:
> On 02/17/2011 09:33 AM, Julien Desfossez wrote:
>> Hi,
>>
>> On 02/16/2011 10:30 AM, Tom Tromey wrote:
>>> Julien> 0) Baseline : running the program without any instrumentation
>>> Julien> 1) Flight recorder tracing comparison UST vs SystemTap
>>>
>>> I'd be interested to also see the numbers when the probes are in place
>>> in the source, but not enabled.  That is, what is the overhead of a
>>> disabled probe?
>> I disabled the probe by undefining HAVE_SYSTEMTAP, but I have the same
>> results in flight recorder mode. Of course if the module is not loaded
>> we have no overhead at all. It means that the module is responsible for
>> all the overhead, regardless of whether the probe is called or not.
>> I would be really interested if you know why it happens (and how to fix it).
>> This last test was done on Fedora 14 (kernel
>> 2.6.35.10-74.fc14.x86_64 with SystemTap 1.3-3).
>>
>> If you want to test, the benchmark code is here :
>> git://git.lttng.org/benchmarks.git
> 
> Your "testutrace.stp" is probing with process.function, which means
> you're not using the compiled tracepoint at all, but rather a function
> probe based on dwarf debuginfo.  So compiling !HAVE_SYSTEMTAP in this
> case doesn't matter, the function still exists for the module to probe.
> The correct form for SDT probes is a process.mark probe, as you quoted
> in your original mail, in which case stap would fail to compile the
> module for the !HAVE_SYSTEMTAP case as the marks don't exist.

OK, my bad: I copied an older version of the test when I set up the
repository. It's fixed now. This doesn't change the earlier results, which
were obtained with process.mark, only the ones I posted today.
With the fix, the empty probe is actually much faster, and the overhead
when the probe is disabled is zero, as expected :)
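
For illustration, here is a minimal sketch of why the disabled case costs
nothing once the script uses process.mark: without HAVE_SYSTEMTAP the
marker expands to nothing, so there is no probe site left in the binary.
The provider and probe names, BENCH_TRACE and run_loop() are hypothetical
and are not taken from the benchmarks repository; only HAVE_SYSTEMTAP and
the STAP_PROBE1 macro from <sys/sdt.h> come from the real setup.

```c
#ifdef HAVE_SYSTEMTAP
#include <sys/sdt.h>
/* Marker compiled in: leaves a NOP that a process.mark probe can use.
 * "benchmark" and "single_trace" are assumed names for this sketch. */
#define BENCH_TRACE(i) STAP_PROBE1(benchmark, single_trace, (i))
#else
/* Marker compiled out: expands to nothing, so no code is generated
 * and a process.mark script has nothing to attach to. */
#define BENCH_TRACE(i) ((void)0)
#endif

/* Hypothetical benchmark loop: calls the marker once per iteration
 * and returns the number of iterations actually run. */
unsigned long run_loop(unsigned long iterations)
{
    unsigned long i;
    unsigned long done = 0;

    for (i = 0; i < iterations; i++) {
        BENCH_TRACE(i);
        done++;
    }
    return done;
}
```

Building once with -DHAVE_SYSTEMTAP and once without gives the two
binaries to compare.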

> In the general use case, a script can be conditional on the presence of
> different probe types, as described in "man stapprobes".  For the
> purpose of benchmarking I would avoid this, so we can be absolutely sure
> of what's being probed.  But for reference, it can look like:
>   probe process("foo").mark("myfn")!,
>         process("foo").function("myfn")
>   { ... }

Good to know, thanks.

> Note also that there's about twice the overhead for process.function
> versus process.mark.  With .mark, a NOP instruction is inserted for us
> to place the debug breakpoint on.  As of the uprobes in stap 1.3, we can
> skip the singlestep of probes on a NOP.  But for function probes, the
> debug breakpoint is placed near the beginning of the function, likely on
> a significant instruction, so it must be singlestepped.  Having a
> singlestep means there's basically two traps per probe hit, so it really
> is a big win to use process.mark instead.
> 
> Getting back to Tom's request, I think these are the variations that we
> need to see for a fuller picture:
> 
> 1) Baseline with NO instrumentation compiled in at all.  You may need
> something like an asm("") in single_trace() to keep gcc from compiling
> the loop away altogether.
> 1a) Same binary w/ probe process.function (showing that stap can probe
> unmodified binaries, though I expect this to be slowest of all)
> 
> 2) UST baseline: UST compiled in, but not active.
> 2a) Same binary w/ tracing activated, UST w/ TC
> 2b) Same binary w/ tracing activated, UST w/o TC
> 2c) etc. any other UST variant
> * If the UST variations require different compilation, then split this up
> and report active/inactive numbers each time.
> 
> 3) SDT baseline: stap SDT compiled in, but not active.
> 3a) Same binary w/ active probe process.mark
> 
> 4) SDT-semaphore baseline: SDT compiled in and using a semaphore, not
> active.  The semaphore is TRACEPOINT_BENCHMARK_SINGLE_TRACE_ENABLED(),
> so you could put  if (..._ENABLED()) TRACE(...);
> 4a) Same binary w/ active probe process.mark

Ok, I will do these tests on the same machine and post the results soon.
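
As a rough sketch of how variations 1 and 4 could look in C: the empty
asm statement keeps gcc from removing the baseline loop, and the
semaphore check skips the probe entirely while no tracer is attached.
Everything here is a stand-in for illustration: single_trace_semaphore,
SINGLE_TRACE_ENABLED(), TRACE() and set_tracer_attached() are
hypothetical and only mimic what the real SDT headers and stap would
provide (TRACE() just counts hits instead of firing a probe).

```c
/* Stand-in for the SDT semaphore; with real SDT, the tracer raises it
 * while a probe is attached. 0 means "nobody is tracing". */
static volatile unsigned short single_trace_semaphore;

/* Counts how many times the stand-in probe fired. */
static unsigned long probe_hits;

/* Hypothetical equivalents of the ..._ENABLED() check and the probe. */
#define SINGLE_TRACE_ENABLED() __builtin_expect(single_trace_semaphore, 0)
#define TRACE() (probe_hits++)

/* Test helper standing in for a tracer attaching or detaching. */
void set_tracer_attached(int attached)
{
    single_trace_semaphore = attached ? 1 : 0;
}

/* Variation 1: no instrumentation at all. The empty asm is opaque to
 * the optimizer, so the loop is not compiled away. */
unsigned long run_baseline(unsigned long iterations)
{
    unsigned long i;

    for (i = 0; i < iterations; i++)
        __asm__ volatile("");
    return i;
}

/* Variation 4: probe guarded by the semaphore. Returns the number of
 * times the probe fired during this run. */
unsigned long run_guarded(unsigned long iterations)
{
    unsigned long i;

    probe_hits = 0;
    for (i = 0; i < iterations; i++) {
        if (SINGLE_TRACE_ENABLED())
            TRACE();
    }
    return probe_hits;
}
```

With the semaphore at 0, run_guarded() only pays for the load and the
branch, which is the inactive cost that variation 4 is meant to measure.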

Thanks,

Julien