[lttng-dev] large liblttng-ust startup overhead (ust_lock)

Milian Wolff milian.wolff at kdab.com
Tue Sep 5 14:34:54 UTC 2017


Hey all,

I have noticed a very large overhead when linking against liblttng-ust:

~~~~~
┌milian at milian-kdab2:/tmp
└$ cat lttng-test.c  
int main()
{
  return 0;
}
┌milian at milian-kdab2:/tmp
└$ gcc -O2 -g -ldl lttng-test.c                     
┌milian at milian-kdab2:/tmp
└$ perf stat -r 5 ./a.out 

 Performance counter stats for './a.out' (5 runs):

          0.209587      task-clock (msec)         #    0.596 CPUs utilized            ( +-  8.76% )
                 0      context-switches          #    0.000 K/sec
                 0      cpu-migrations            #    0.000 K/sec
                49      page-faults               #    0.235 M/sec                    ( +-  1.19% )
           706,854      cycles                    #    3.373 GHz                      ( +-  8.82% )
           773,603      instructions              #    1.09  insn per cycle           ( +-  0.75% )
           147,128      branches                  #  701.987 M/sec                    ( +-  0.70% )
             4,096      branch-misses             #    2.78% of all branches          ( +-  5.27% )

       0.000351422 seconds time elapsed                                          ( +- 11.85% )

┌milian at milian-kdab2:/tmp
└$ gcc -O2 -g -ldl -llttng-ust lttng-test.c
┌milian at milian-kdab2:/tmp
└$ perf stat -r 5 ./a.out 

 Performance counter stats for './a.out' (5 runs):

          2.063040      task-clock (msec)         #    0.009 CPUs utilized            ( +-  1.37% )
                44      context-switches          #    0.021 M/sec                    ( +-  1.95% )
                 2      cpu-migrations            #    0.776 K/sec                    ( +- 25.00% )
               209      page-faults               #    0.101 M/sec                    ( +-  0.34% )
         7,053,686      cycles                    #    3.419 GHz                      ( +-  2.03% )
         6,893,783      instructions              #    0.98  insn per cycle           ( +-  0.25% )
         1,342,492      branches                  #  650.735 M/sec                    ( +-  0.20% )
            29,390      branch-misses             #    2.19% of all branches          ( +-  0.61% )

       0.225597302 seconds time elapsed                                          ( +-  6.68% )
~~~~~
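
For context, my understanding (not verified against the lttng-ust sources, so treat it as an assumption) is that this cost comes from the library's ELF constructor, which runs before main() and does blocking setup/registration work. A toy sketch of that mechanism, where the usleep() is a hypothetical stand-in for whatever the real constructor does:

~~~~~
/* toy.c -- toy illustration only, not lttng-ust's actual code. It shows
 * how merely linking a library can slow startup: an ELF constructor runs
 * before main(). The usleep() is a hypothetical stand-in for blocking
 * setup/registration work done by the real constructor. */
#include <unistd.h>

__attribute__((constructor))
static void lib_init(void)
{
    usleep(200000); /* simulate ~200 ms of blocking setup before main() */
}

int main(void)
{
    return 0;
}
~~~~~

Running this under perf stat shows the same signature as above: wall time far exceeds task-clock, because the process spends the time sleeping rather than computing.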

This is without any LTTng session configured: merely linking against the library increases the wall time from ~0.35 ms to ~226 ms. If I enable LTTng kernel and userspace events, it becomes even worse:

~~~~~
┌milian at milian-kdab2:/tmp
└$ cat $(which run_lttng_trace.sh)
#!/bin/sh

if [ -z "$(pidof lttng-sessiond)" ]; then
    sudo lttng-sessiond --daemonize
fi

sudo lttng create -o ~/lttng-traces/$(date -Iseconds)
sudo lttng enable-channel kernel -k --subbuf-size 16M --num-subbuf 8
sudo lttng enable-event -c kernel -k -a
sudo lttng enable-channel ust -u --subbuf-size 16M --num-subbuf 8
sudo lttng enable-event -c ust -u lttng_ust_tracef:*
sudo lttng start
"$@"
sudo lttng stop

sudo chmod a+rx -R ~/lttng-traces
┌milian at milian-kdab2:/tmp
└$ run_lttng_trace.sh perf stat -r 5 ./a.out 
Session auto-20170905-162818 created.
Traces will be written in /home/milian/lttng-traces/2017-09-05T16:28:18+02:00
Kernel channel kernel enabled for session auto-20170905-162818
All Kernel events are enabled in channel kernel
UST channel ust enabled for session auto-20170905-162818
UST event lttng_ust_tracef:* created in channel ust
Tracing started for session auto-20170905-162818

 Performance counter stats for './a.out' (5 runs):

          4.937820      task-clock (msec)         #    0.005 CPUs utilized            ( +-  1.28% )
               117      context-switches          #    0.024 M/sec                    ( +-  0.70% )
                 3      cpu-migrations            #    0.608 K/sec                    ( +- 23.57% )
               245      page-faults               #    0.050 M/sec                    ( +-  0.20% )
        16,541,355      cycles                    #    3.350 GHz                      ( +-  0.94% )
        20,611,637      instructions              #    1.25  insn per cycle           ( +-  0.23% )
         3,580,525      branches                  #  725.123 M/sec                    ( +-  0.19% )
            52,093      branch-misses             #    1.45% of all branches          ( +-  1.73% )

       0.965545882 seconds time elapsed                                          ( +- 16.96% )

Waiting for data availability...
Tracing stopped for session auto-20170905-162818
~~~~~

I see this ~1 s delay during startup in nearly all test applications where I wanted to use lttng-ust. According to Trace Compass and perf sleep-time profiling, the delay seems to be mostly due to extensive sleeping on mutex locks (from ust_lock in liblttng-ust). With perf trace, I also see membarrier taking a lot of time:

~~~~~
# without a running lttng session:
┌milian at milian-kdab2:/tmp
└$ perf trace --duration 1 ./a.out
     6.492 (52.468 ms): a.out/23672 recvmsg(fd: 3<socket:[1178439]>, msg: 0x7fbe2fbb1070) = 1
     5.077 (54.271 ms): a.out/23671 futex(uaddr: 0x7fbe30d508a0, op: WAIT_BITSET|PRIV|CLKRT, utime: 0x7ffc474ff5a0, val3: 4294967295) = 0
    59.598 (79.379 ms): a.out/23671 membarrier(cmd: 1) = 0
   138.984 (39.945 ms): a.out/23671 membarrier(cmd: 1) = 0
    59.621 (119.324 ms): a.out/23672 futex(uaddr: 0x7fbe303e54c0, op: WAIT|PRIV, val: 2) = 0
   179.045 (79.918 ms): a.out/23671 membarrier(cmd: 1) = 0
   258.971 (39.997 ms): a.out/23671 membarrier(cmd: 1) = 0
   299.052 (79.883 ms): a.out/23671 membarrier(cmd: 1) = 0
   378.942 (59.988 ms): a.out/23671 membarrier(cmd: 1) = 0
   439.022 (69.908 ms): a.out/23671 membarrier(cmd: 1) = 0
   508.937 (80.027 ms): a.out/23671 membarrier(cmd: 1) = 0

# with a running lttng session:
┌milian at milian-kdab2:/tmp
└$ run_lttng_trace.sh perf trace --duration 1 ./a.out 
     6.666 (392.496 ms): a.out-ust/23790 recvmsg(fd: 3<socket:[1185362]>, msg: 0x7fcd81d8a0b0) = 612
   400.238 (68.452 ms): a.out-ust/23790 membarrier(cmd: 1) = 0
   468.698 (39.983 ms): a.out-ust/23790 membarrier(cmd: 1) = 0
   510.567 (98.113 ms): a.out-ust/23790 recvmsg(fd: 3<socket:[1185362]>, msg: 0x7fcd81d8a070) = 1
     6.467 (603.500 ms): a.out/23789 futex(uaddr: 0x7fcd82f298a0, op: WAIT_BITSET|PRIV|CLKRT, utime: 0x7fff4df6c970, val3: 4294967295) = 0
   610.020 ( 6.307 ms): a.out/23789 futex(uaddr: 0x7fcd82f29960, op: WAIT|PRIV, val: 2) = 0
   616.446 (62.265 ms): a.out/23789 membarrier(cmd: 1) = 0
   678.721 (59.916 ms): a.out/23789 membarrier(cmd: 1) = 0
   616.332 (122.335 ms): a.out-ust/23790 futex(uaddr: 0x7fcd82f29960, op: WAIT|PRIV, val: 2) = 0
   738.976 (49.658 ms): a.out/23789 membarrier(cmd: 1) = 0
   788.642 (49.987 ms): a.out/23789 membarrier(cmd: 1) = 0
   739.014 (99.644 ms): a.out-ust/23790 futex(uaddr: 0x7fcd825be4c0, op: WAIT|PRIV, val: 2) = 0
   838.651 (49.977 ms): a.out/23789 membarrier(cmd: 1) = 0
   888.636 (69.983 ms): a.out/23789 membarrier(cmd: 1) = 0
   838.662 (119.978 ms): a.out-ust/23790 futex(uaddr: 0x7fcd825be4c0, op: WAIT|PRIV, val: 2) = 0
   958.739 (59.873 ms): a.out/23789 membarrier(cmd: 1) = 0
  1018.621 (80.058 ms): a.out/23789 membarrier(cmd: 1) = 0
  1098.782 (49.844 ms): a.out/23789 membarrier(cmd: 1) = 0
  1148.634 (59.998 ms): a.out/23789 membarrier(cmd: 1) = 0
  1208.731 (69.940 ms): a.out/23789 membarrier(cmd: 1) = 0
  1278.680 (79.947 ms): a.out/23789 membarrier(cmd: 1) = 0
  1358.727 (39.881 ms): a.out/23789 membarrier(cmd: 1) = 0
  1398.616 (39.980 ms): a.out/23789 membarrier(cmd: 1) = 0
     6.640 (1432.157 ms): a.out-ust/23791 futex(uaddr: 0x7fcd82f90000) = -1 (null) INTERNAL ERROR: strerror_r(512, 0x7ffdbc5467a0, 128)=22
~~~~~
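
Assuming cmd 1 in the trace is MEMBARRIER_CMD_SHARED (and that these calls come from liburcu's sys_membarrier-based synchronize_rcu(), which I haven't confirmed), the raw syscall latency can be measured in isolation with a minimal sketch like this:

~~~~~
/* membarrier-bench.c -- minimal sketch to time membarrier(2) in isolation.
 * Assumes a kernel >= 4.3 where cmd 1 is MEMBARRIER_CMD_SHARED, matching
 * the "membarrier(cmd: 1)" calls in the perf trace output above. */
#define _GNU_SOURCE
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/membarrier.h>

int main(void)
{
    for (int i = 0; i < 10; i++) {
        struct timespec start, end;

        clock_gettime(CLOCK_MONOTONIC, &start);
        if (syscall(__NR_membarrier, MEMBARRIER_CMD_SHARED, 0) != 0) {
            perror("membarrier");
            return 1;
        }
        clock_gettime(CLOCK_MONOTONIC, &end);
        printf("membarrier: %.3f ms\n",
               (end.tv_sec - start.tv_sec) * 1e3 +
               (end.tv_nsec - start.tv_nsec) / 1e6);
    }
    return 0;
}
~~~~~

If each call takes tens of milliseconds here as well, the dozens of membarrier calls in the trace alone would account for most of the ~1 s startup delay.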

This makes lttng-ust pretty much unusable for any short-lived application. Is 
this a known limitation?

I'm using lttng-ust on Arch from the AUR packages, i.e. LTTng 2.9.4. I've also seen this problem on Ubuntu 17.04, though I didn't repeat the above measurements there. So far I had suspected the UST statedump of taking too much time, but I disabled it for the measurements above.

Any help would be appreciated.

-- 
Milian Wolff | milian.wolff at kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts