[lttng-dev] liblttng-ust overhead, take 2

Thu Sep 7 17:06:36 UTC 2017

On Thursday, September 7, 2017 6:29:58 PM CEST Jérémie Galarneau wrote:
> On 7 September 2017 at 04:47, Milian Wolff <milian.wolff at kdab.com> wrote:
> > Hey all,
> > 
> > after my latest report over liblttng-ust overhead yielded such quick
> > results, I would like to expand on this with a more realistic workload
> > that I'm looking at.
> > 
> > We have two applications, one that sets up an SQLite database and then
> > launches multiple client applications that try to read and write
> > concurrently from that database. Of course, this will show file-level
> > lock contention. I'd like to use LTTng to visualize the effect.
> > 
> > On my machine I get the following numbers.
> > 
> > 1) Without LTTng and without UST:
> > ~~~~~
> > 
> >  Performance counter stats for './launcher/lab_latency_launcher':
> >         495.075367      task-clock (msec)         #    0.150 CPUs utilized
> >         
> >              2,340      context-switches          #    0.005 M/sec
> >              
> >                291      cpu-migrations            #    0.588 K/sec
> >             
> >             52,265      page-faults               #    0.106 M/sec
> >      
> >      3,098,760,038      cycles                    #    6.259 GHz
> >      1,894,074,743      instructions              #    0.61  insn per
> >      cycle
> >      
> >        354,869,636      branches                  #  716.799 M/sec
> >        
> >         10,317,266      branch-misses             #    2.91% of all
> >         branches
> >        
> >        3.298337574 seconds time elapsed
> > 
> > ~~~~~~
> > 
> > 2) With LTTng (all kernel events enabled) and without UST:
> > ~~~~~
> > 
> >  Performance counter stats for 'run_lttng_trace.sh ./launcher/
> > 
> > lab_latency_launcher':
> >         296.329694      task-clock (msec)         #    0.068 CPUs utilized
> >         
> >              1,808      context-switches          #    0.006 M/sec
> >              
> >                398      cpu-migrations            #    0.001 M/sec
> >             
> >             53,493      page-faults               #    0.181 M/sec
> >      
> >      2,411,871,165      cycles                    #    8.139 GHz
> >      2,370,005,487      instructions              #    0.98  insn per
> >      cycle
> >      
> >        425,284,747      branches                  # 1435.174 M/sec
> >        
> >         10,442,662      branch-misses             #    2.46% of all
> >         branches
> >        
> >        4.338938821 seconds time elapsed
> > 
> > ~~~~~~
> > 
> > 3) With LTTng (all kernel events enabled) and with UST `tracef()` events
> 
> > enabled (~400 UST events in total):
> We need more data to make sense of those numbers.
> 
> What does the payload of those events look like? Using tracef(),
> string formating will most likely dominate over the time taken to
> write to the bufers.

I attached the complete source code for you to reproduce this yourself. Also, 
string formatting _does not_ dominate the time, otherwise I'd see more CPU 
utilization (cf. perf stat output's first line).

> How many clients are you launching? What is their expected lifetime?

100 clients, 10 in parallel, each taking a couple of milliseconds. See the 
source code for yourself.

Thanks

-- 
Milian Wolff | milian.wolff at kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 3826 bytes
Desc: not available
URL: <https://lists.lttng.org/pipermail/lttng-dev/attachments/20170907/af2ba66d/attachment.bin>