[lttng-dev] liblttng-ust overhead, take 2
Mathieu Desnoyers
mathieu.desnoyers at efficios.com
Thu Sep 7 17:55:22 UTC 2017
----- On Sep 7, 2017, at 1:06 PM, Milian Wolff milian.wolff at kdab.com wrote:
> On Thursday, September 7, 2017 6:29:58 PM CEST Jérémie Galarneau wrote:
>> On 7 September 2017 at 04:47, Milian Wolff <milian.wolff at kdab.com> wrote:
>> > Hey all,
>> >
>> > after my latest report over liblttng-ust overhead yielded such quick
>> > results, I would like to expand on this with a more realistic workload
>> > that I'm looking at.
>> >
>> > We have two applications, one that sets up an SQLite database and then
>> > launches multiple client applications that try to read and write
>> > concurrently from that database. Of course, this will show file-level
>> > lock contention. I'd like to use LTTng to visualize the effect.
>> >
>> > On my machine I get the following numbers.
>> >
>> > 1) Without LTTng and without UST:
>> > ~~~~~
>> >
>> > Performance counter stats for './launcher/lab_latency_launcher':
>> >
>> >        495.075367   task-clock (msec)   #    0.150 CPUs utilized
>> >             2,340   context-switches    #    0.005 M/sec
>> >               291   cpu-migrations      #    0.588 K/sec
>> >            52,265   page-faults         #    0.106 M/sec
>> >     3,098,760,038   cycles              #    6.259 GHz
>> >     1,894,074,743   instructions        #    0.61  insn per cycle
>> >       354,869,636   branches            #  716.799 M/sec
>> >        10,317,266   branch-misses       #    2.91% of all branches
>> >
>> >       3.298337574 seconds time elapsed
>> >
>> > ~~~~~
>> >
>> > 2) With LTTng (all kernel events enabled) and without UST:
>> > ~~~~~
>> >
>> > Performance counter stats for 'run_lttng_trace.sh ./launcher/lab_latency_launcher':
>> >
>> >        296.329694   task-clock (msec)   #    0.068 CPUs utilized
>> >             1,808   context-switches    #    0.006 M/sec
>> >               398   cpu-migrations      #    0.001 M/sec
>> >            53,493   page-faults         #    0.181 M/sec
>> >     2,411,871,165   cycles              #    8.139 GHz
>> >     2,370,005,487   instructions        #    0.98  insn per cycle
>> >       425,284,747   branches            # 1435.174 M/sec
>> >        10,442,662   branch-misses       #    2.46% of all branches
>> >
>> >       4.338938821 seconds time elapsed
>> >
>> > ~~~~~
>> >
>> > 3) With LTTng (all kernel events enabled) and with UST `tracef()` events
>> > enabled (~400 UST events in total):
>>
>> We need more data to make sense of those numbers.
>>
>> What does the payload of those events look like? Using tracef(),
>> string formatting will most likely dominate over the time taken to
>> write to the buffers.
>
> I attached the complete source code for you to reproduce this yourself. Also,
> string formatting _does not_ dominate the time; otherwise I'd see more CPU
> utilization (cf. the first line of the perf stat output).
>
>> How many clients are you launching? What is their expected lifetime?
>
> 100 clients, 10 in parallel, each taking a couple of milliseconds. See the
> source code for yourself.
I've tried your program here, but cannot reproduce the overhead you describe.
What am I doing wrong? (FYI, I use the master branches of lttng-ust, lttng-modules,
lttng-tools and liburcu.) This is on an Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz,
running Linux 4.9.0-3-amd64 4.9.30-2+deb9u2 (Debian).
* build-no-ust:
time ./launcher/launcher > /dev/null
real 0m1.752s
user 0m0.264s
sys 0m0.172s
* build-no-ust (with kernel syscalls traced):
[ using lttng enable-event -k --syscall -a ]
time ./launcher/launcher > /dev/null
real 0m1.700s
user 0m0.212s
sys 0m0.236s
* build-no-ust (with kernel tracepoints traced):
[ using lttng enable-event -k --tracepoint -a ]
time ./launcher/launcher > /dev/null
real 0m2.031s
user 0m0.248s
sys 0m0.332s
-> this overhead is expected given that we are tracing lots of
kernel events. Selecting specific events instead would help
reduce this overhead.
* build-no-ust (with all kernel events (tracepoints+syscalls)):
[ using lttng enable-event -k -a ]
time ./launcher/launcher > /dev/null
real 0m2.031s
user 0m0.256s
sys 0m0.332s
* build-with-ust (with tracef UST event)
[ using lttng enable-event -u lttng_ust_tracef:event ]
time ./launcher/launcher > /dev/null
real 0m1.525s
user 0m0.432s
sys 0m0.360s
lttng view | wc -l
400
-> 400 events collected.
* build-with-ust (with tracef UST event and all kernel events)
[ using: lttng enable-event -u lttng_ust_tracef:event
and lttng enable-event -k -a ]
time ./launcher/launcher > /dev/null
real 0m2.428s
user 0m0.460s
sys 0m0.892s
The main difference I see between your setup and mine is that I use
"time" instead of perf to run it. Also, I run lttng-sessiond as root;
is that the case for you as well?
Also, please note that lttng-modules exposes a /proc/lttng-logger
file, into which you can write from user-space. Whatever you write
ends up in the kernel trace. This could do the trick if for some
reason lttng-ust does not work for your scenario. However, the
lttng-logger proc file is just there as a convenience for scripts,
and is by no means a "fast" tracing interface.
Thanks,
Mathieu
>
> Thanks
>
> --
> Milian Wolff | milian.wolff at kdab.com | Senior Software Engineer
> KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
> Tel: +49-30-521325470
> KDAB - The Qt Experts
> _______________________________________________
> lttng-dev mailing list
> lttng-dev at lists.lttng.org
> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com