[lttng-dev] liblttng-ust overhead, take 2
Jérémie Galarneau
jeremie.galarneau at efficios.com
Thu Sep 7 16:29:58 UTC 2017
On 7 September 2017 at 04:47, Milian Wolff <milian.wolff at kdab.com> wrote:
> Hey all,
>
> after my latest report about liblttng-ust overhead yielded such quick results,
> I would like to expand on it with a more realistic workload that I'm looking
> at.
>
> We have two applications: one sets up an SQLite database and then launches
> multiple instances of the other, which read from and write to that database
> concurrently. Naturally, this exhibits file-level lock contention, and I'd
> like to use LTTng to visualize that effect.
>
> On my machine I get the following numbers.
>
> 1) Without LTTng and without UST:
> ~~~~~
> Performance counter stats for './launcher/lab_latency_launcher':
>
> 495.075367 task-clock (msec) # 0.150 CPUs utilized
> 2,340 context-switches # 0.005 M/sec
> 291 cpu-migrations # 0.588 K/sec
> 52,265 page-faults # 0.106 M/sec
> 3,098,760,038 cycles # 6.259 GHz
> 1,894,074,743 instructions # 0.61 insn per cycle
> 354,869,636 branches # 716.799 M/sec
> 10,317,266 branch-misses # 2.91% of all branches
>
> 3.298337574 seconds time elapsed
> ~~~~~
>
> 2) With LTTng (all kernel events enabled) and without UST:
> ~~~~~
> Performance counter stats for 'run_lttng_trace.sh ./launcher/lab_latency_launcher':
>
> 296.329694 task-clock (msec) # 0.068 CPUs utilized
> 1,808 context-switches # 0.006 M/sec
> 398 cpu-migrations # 0.001 M/sec
> 53,493 page-faults # 0.181 M/sec
> 2,411,871,165 cycles # 8.139 GHz
> 2,370,005,487 instructions # 0.98 insn per cycle
> 425,284,747 branches # 1435.174 M/sec
> 10,442,662 branch-misses # 2.46% of all branches
>
> 4.338938821 seconds time elapsed
> ~~~~~
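>
> (For context, "all kernel events enabled" means a session set up roughly
> like the following. This is a sketch only; the actual run_lttng_trace.sh
> is in the tarball linked below and may differ in its details:)
>
> ~~~~~
> lttng create overhead-test
> lttng enable-event --kernel --all
> # for run 3 below, additionally:
> # lttng enable-event --userspace 'lttng_ust_tracef:*'
> lttng start
> ./launcher/lab_latency_launcher
> lttng stop
> lttng destroy
> ~~~~~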
>
> 3) With LTTng (all kernel events enabled) and with UST `tracef()` events
> enabled (~400 UST events in total):
>
We need more data to make sense of those numbers.
What does the payload of those events look like? Using tracef(), the
string formatting will most likely dominate over the time taken to
write to the buffers.
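
For what it's worth: whenever the lttng_ust_tracef:* event is enabled,
tracef() runs a printf-style formatting pass at the call site on every hit
before the resulting string is copied into the ring buffer. Roughly, with a
hypothetical payload:

~~~~~
#include <lttng/tracef.h>

static void report_query(int client_id, int elapsed_ms)
{
	/* The formatting happens here, in the traced process,
	 * on every hit while the event is enabled. */
	tracef("client %d: query took %d ms", client_id, elapsed_ms);
}
~~~~~

If the payload is structured, a static tracepoint records the fields in
binary form and skips the formatting entirely. A minimal provider sketch,
with hypothetical provider and field names:

~~~~~
/* tp.h */
#undef TRACEPOINT_PROVIDER
#define TRACEPOINT_PROVIDER lab_latency

#undef TRACEPOINT_INCLUDE
#define TRACEPOINT_INCLUDE "./tp.h"

#if !defined(_TP_H) || defined(TRACEPOINT_HEADER_MULTI_READ)
#define _TP_H

#include <lttng/tracepoint.h>

TRACEPOINT_EVENT(
	lab_latency,
	query_done,
	TP_ARGS(int, client_id, int, elapsed_ms),
	TP_FIELDS(
		ctf_integer(int, client_id, client_id)
		ctf_integer(int, elapsed_ms, elapsed_ms)
	)
)

#endif /* _TP_H */

#include <lttng/tracepoint-event.h>
~~~~~

The call site then becomes tracepoint(lab_latency, query_done, client_id,
elapsed_ms), which boils down to a check on the enable flag plus a buffer
write.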
How many clients are you launching? What is their expected lifetime?
Thanks,
Jérémie
> ~~~~~
> Performance counter stats for './launcher/lab_latency_launcher':
>
> 796.595231 task-clock (msec) # 0.069 CPUs utilized
> 17,728 context-switches # 0.022 M/sec
> 429 cpu-migrations # 0.539 K/sec
> 85,737 page-faults # 0.108 M/sec
> 4,483,999,258 cycles # 5.629 GHz
> 5,112,633,073 instructions # 1.14 insn per cycle
> 985,315,773 branches # 1236.909 M/sec
> 14,993,555 branch-misses # 1.52% of all branches
>
> 11.505288968 seconds time elapsed
> ~~~~~
>
> 4) This is running against the patched userspace-urcu. Without the performance
> fix there, the performance is even worse:
>
> ~~~~~
> Performance counter stats for 'run_lttng_trace.sh ./launcher/lab_latency_launcher':
>
> 952.959433 task-clock (msec) # 0.019 CPUs utilized
> 6,021 context-switches # 0.006 M/sec
> 954 cpu-migrations # 0.001 M/sec
> 69,310 page-faults # 0.073 M/sec
> 5,335,988,506 cycles # 5.599 GHz
> 6,031,685,882 instructions # 1.13 insn per cycle
> 1,093,132,635 branches # 1147.093 M/sec
> 16,090,093 branch-misses # 1.47% of all branches
>
> 50.786464678 seconds time elapsed
> ~~~~~
>
> So, I don't think we need to talk about 4) anymore. But I do think that
> comparing 3) to 2) calls for clarification. Just linking against lttng-ust
> grinds the workload to a halt; the ~400 UST events cannot explain this at all.
> Additionally, the timeline visualization of the processes becomes much harder
> to read, as the UST overhead seems to dominate. This becomes apparent when
> comparing the 2) and 3) data visually in Trace Compass...
>
> Could anyone look into this, please? As-is, I cannot recommend using
> lttng-ust. Rather, I'd opt for "faking" the custom tracepoints by doing
> something like open("/UST-FAKE/$payload-here"), as sketched below. This also
> lets me "annotate" the kernel trace, a bit clumsily, but with extremely low
> overhead compared to UST...
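>
> (What I have in mind is roughly the following; a hypothetical sketch. The
> open() is expected to fail, but the kernel tracer still records the path
> argument of the openat syscall:)
>
> ~~~~~
> #include <fcntl.h>
> #include <stdio.h>
> #include <unistd.h>
>
> /* Encode the payload in the path of an open() that fails with ENOENT.
>  * The syscall event in the kernel trace carries the path, so this acts
>  * as a crude but very cheap trace annotation. */
> static void fake_ust_event(const char *payload)
> {
>         char path[256];
>         int fd;
>
>         snprintf(path, sizeof(path), "/UST-FAKE/%s", payload);
>         fd = open(path, O_RDONLY);
>         if (fd >= 0)
>                 close(fd); /* only if the path unexpectedly exists */
> }
> ~~~~~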
>
> You can find a toy Qt 5 application to reproduce my measurements above here:
>
> https://swanson.kdab.com/owncloud/index.php/s/FGI2PZNvwzoR4RZ
>
> Unpack it, then build it with:
>
> ~~~~~
> cd ust-slowness
> mkdir build-with-ust build-no-ust
> cd build-with-ust
> qmake -r CONFIG+=with_lttng_ust ..
> make
> cd ../build-no-ust
> qmake -r ..
> make
> ~~~~~
>
> Then you can reproduce the issue by comparing runs of the `launcher` binary
> in the two build folders, as shown below. One will be quick, the other will
> be dog slow. I hope you guys find more optimization opportunities here.
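>
> (i.e. something like this, assuming the binary ends up under launcher/ in
> each build directory, as the perf output above suggests:)
>
> ~~~~~
> cd build-no-ust      && perf stat ./launcher/lab_latency_launcher
> cd ../build-with-ust && perf stat ./launcher/lab_latency_launcher
> ~~~~~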
>
> If not, then shouldn't it be possible to simulate something like tracef()
> via a custom syscall that is handled directly by the LTTng kernel module?
> That would hopefully be about as fast as kernel-only tracing, while removing
> the overhead of UST tracing... Just an idea.
>
> Cheers
>
> --
> Milian Wolff | milian.wolff at kdab.com | Senior Software Engineer
> KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
> Tel: +49-30-521325470
> KDAB - The Qt Experts
> _______________________________________________
> lttng-dev mailing list
> lttng-dev at lists.lttng.org
> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
>
--
Jérémie Galarneau
EfficiOS Inc.
http://www.efficios.com