[lttng-dev] large liblttng-ust startup overhead (ust_lock)
Milian Wolff
milian.wolff at kdab.com
Tue Sep 5 15:10:41 UTC 2017
On Tuesday, September 5, 2017 4:52:42 PM CEST Milian Wolff wrote:
> On Tuesday, September 5, 2017 4:34:54 PM CEST Milian Wolff wrote:
> > Hey all,
> >
> > I have noticed a very large overhead when linking against liblttng-ust:
> >
> > ~~~~~
> > ┌milian at milian-kdab2:/tmp
> > └$ cat lttng-test.c
> > int main()
> > {
> >     return 0;
> > }
> > ┌milian at milian-kdab2:/tmp
> > └$ gcc -O2 -g -ldl lttng-test.c
> > ┌milian at milian-kdab2:/tmp
> > └$ perf stat -r 5 ./a.out
> >
> > Performance counter stats for './a.out' (5 runs):
> >
> >        0.209587  task-clock (msec)   #  0.596 CPUs utilized    ( +-  8.76% )
> >               0  context-switches    #  0.000 K/sec
> >               0  cpu-migrations      #  0.000 K/sec
> >              49  page-faults         #  0.235 M/sec            ( +-  1.19% )
> >         706,854  cycles              #  3.373 GHz              ( +-  8.82% )
> >         773,603  instructions        #  1.09  insn per cycle   ( +-  0.75% )
> >         147,128  branches            #  701.987 M/sec          ( +-  0.70% )
> >           4,096  branch-misses       #  2.78% of all branches  ( +-  5.27% )
> >
> >     0.000351422 seconds time elapsed                           ( +- 11.85% )
> >
> > ┌milian at milian-kdab2:/tmp
> > └$ gcc -O2 -g -ldl -llttng-ust lttng-test.c
> > ┌milian at milian-kdab2:/tmp
> > └$ perf stat -r 5 ./a.out
> >
> > Performance counter stats for './a.out' (5 runs):
> >
> >        2.063040  task-clock (msec)   #  0.009 CPUs utilized    ( +-  1.37% )
> >              44  context-switches    #  0.021 M/sec            ( +-  1.95% )
> >               2  cpu-migrations      #  0.776 K/sec            ( +- 25.00% )
> >             209  page-faults         #  0.101 M/sec            ( +-  0.34% )
> >       7,053,686  cycles              #  3.419 GHz              ( +-  2.03% )
> >       6,893,783  instructions        #  0.98  insn per cycle   ( +-  0.25% )
> >       1,342,492  branches            #  650.735 M/sec          ( +-  0.20% )
> >          29,390  branch-misses       #  2.19% of all branches  ( +-  0.61% )
> >
> >     0.225597302 seconds time elapsed                           ( +-  6.68% )
> > ~~~~~
> >
> > This is without any LTTng session configured. If I enable LTTng kernel and
> > userspace events, this becomes even worse:
> >
> > ~~~~~
> > ┌milian at milian-kdab2:/tmp
> > └$ cat $(which run_lttng_trace.sh)
> > #!/bin/sh
> >
> > if [ -z "$(pidof lttng-sessiond)" ]; then
> >     sudo lttng-sessiond --daemonize
> > fi
> >
> > sudo lttng create -o ~/lttng-traces/$(date -Iseconds)
> > sudo lttng enable-channel kernel -k --subbuf-size 16M --num-subbuf 8
> > sudo lttng enable-event -c kernel -k -a
> > sudo lttng enable-channel ust -u --subbuf-size 16M --num-subbuf 8
> > sudo lttng enable-event -c ust -u lttng_ust_tracef:*
> > sudo lttng start
> > $@
> > sudo lttng stop
> >
> > sudo chmod a+rx -R ~/lttng-traces
> > ┌milian at milian-kdab2:/tmp
> > └$ run_lttng_trace.sh perf stat -r 5 ./a.out
> > Session auto-20170905-162818 created.
> > Traces will be written in /home/milian/lttng-traces/2017-09-05T16:28:18+02:00
> > Kernel channel kernel enabled for session auto-20170905-162818
> > All Kernel events are enabled in channel kernel
> > UST channel ust enabled for session auto-20170905-162818
> > UST event lttng_ust_tracef:* created in channel ust
> > Tracing started for session auto-20170905-162818
> >
> > Performance counter stats for './a.out' (5 runs):
> >
> >        4.937820  task-clock (msec)   #  0.005 CPUs utilized    ( +-  1.28% )
> >             117  context-switches    #  0.024 M/sec            ( +-  0.70% )
> >               3  cpu-migrations      #  0.608 K/sec            ( +- 23.57% )
> >             245  page-faults         #  0.050 M/sec            ( +-  0.20% )
> >      16,541,355  cycles              #  3.350 GHz              ( +-  0.94% )
> >      20,611,637  instructions        #  1.25  insn per cycle   ( +-  0.23% )
> >       3,580,525  branches            #  725.123 M/sec          ( +-  0.19% )
> >          52,093  branch-misses       #  1.45% of all branches  ( +-  1.73% )
> >
> >     0.965545882 seconds time elapsed                           ( +- 16.96% )
> >
> > Waiting for data availability...
> > Tracing stopped for session auto-20170905-162818
> > ~~~~~
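> >
> > (For reference, the lttng_ust_tracef:* events enabled above correspond to the
> > tracef() convenience API; a minimal user of it looks roughly like the sketch
> > below. Note that the empty a.out reproducer emits no events at all -- the
> > overhead above occurs from merely linking against liblttng-ust.)
> >
> > ~~~~~
> > /* tracef-example.c -- hypothetical, minimal tracef() user; not part of
> >  * the measurements above. Build with:
> >  *   gcc -O2 -g tracef-example.c -llttng-ust -ldl
> >  */
> > #include <lttng/tracef.h>
> > #include <unistd.h>
> >
> > int main(void)
> > {
> >     /* Recorded into the "ust" channel via the lttng_ust_tracef:* event. */
> >     tracef("hello from pid %d", (int) getpid());
> >     return 0;
> > }
> > ~~~~~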
> >
> > I see this ~1s delay during startup in nearly all test applications where I
> > wanted to use lttng-ust. Both in Trace Compass and in perf sleep time
> > profiles, this seems to be mostly due to extensive sleeping on mutex locks
> > (ust_lock in liblttng-ust). With perf trace, I also see membarrier taking a
> > lot of time:
> >
> > ~~~~~
> > # without a running lttng session:
> > ┌milian at milian-kdab2:/tmp
> > └$ perf trace --duration 1 ./a.out
> >
> >      6.492 (52.468 ms): a.out/23672 recvmsg(fd: 3<socket:[1178439]>, msg: 0x7fbe2fbb1070) = 1
> >      5.077 (54.271 ms): a.out/23671 futex(uaddr: 0x7fbe30d508a0, op: WAIT_BITSET|PRIV|CLKRT, utime: 0x7ffc474ff5a0, val3: 4294967295) = 0
> >     59.598 (79.379 ms): a.out/23671 membarrier(cmd: 1) = 0
> >    138.984 (39.945 ms): a.out/23671 membarrier(cmd: 1) = 0
> >     59.621 (119.324 ms): a.out/23672 futex(uaddr: 0x7fbe303e54c0, op: WAIT|PRIV, val: 2) = 0
> >    179.045 (79.918 ms): a.out/23671 membarrier(cmd: 1) = 0
> >    258.971 (39.997 ms): a.out/23671 membarrier(cmd: 1) = 0
> >    299.052 (79.883 ms): a.out/23671 membarrier(cmd: 1) = 0
> >    378.942 (59.988 ms): a.out/23671 membarrier(cmd: 1) = 0
> >    439.022 (69.908 ms): a.out/23671 membarrier(cmd: 1) = 0
> >    508.937 (80.027 ms): a.out/23671 membarrier(cmd: 1) = 0
> >
> > # with a running lttng session:
> > ┌milian at milian-kdab2:/tmp
> > └$ run_lttng_trace.sh perf trace --duration 1 ./a.out
> >
> >      6.666 (392.496 ms): a.out-ust/23790 recvmsg(fd: 3<socket:[1185362]>, msg: 0x7fcd81d8a0b0) = 612
> >    400.238 (68.452 ms): a.out-ust/23790 membarrier(cmd: 1) = 0
> >    468.698 (39.983 ms): a.out-ust/23790 membarrier(cmd: 1) = 0
> >    510.567 (98.113 ms): a.out-ust/23790 recvmsg(fd: 3<socket:[1185362]>, msg: 0x7fcd81d8a070) = 1
> >      6.467 (603.500 ms): a.out/23789 futex(uaddr: 0x7fcd82f298a0, op: WAIT_BITSET|PRIV|CLKRT, utime: 0x7fff4df6c970, val3: 4294967295) = 0
> >    610.020 (6.307 ms): a.out/23789 futex(uaddr: 0x7fcd82f29960, op: WAIT|PRIV, val: 2) = 0
> >    616.446 (62.265 ms): a.out/23789 membarrier(cmd: 1) = 0
> >    678.721 (59.916 ms): a.out/23789 membarrier(cmd: 1) = 0
> >    616.332 (122.335 ms): a.out-ust/23790 futex(uaddr: 0x7fcd82f29960, op: WAIT|PRIV, val: 2) = 0
> >    738.976 (49.658 ms): a.out/23789 membarrier(cmd: 1) = 0
> >    788.642 (49.987 ms): a.out/23789 membarrier(cmd: 1) = 0
> >    739.014 (99.644 ms): a.out-ust/23790 futex(uaddr: 0x7fcd825be4c0, op: WAIT|PRIV, val: 2) = 0
> >    838.651 (49.977 ms): a.out/23789 membarrier(cmd: 1) = 0
> >    888.636 (69.983 ms): a.out/23789 membarrier(cmd: 1) = 0
> >    838.662 (119.978 ms): a.out-ust/23790 futex(uaddr: 0x7fcd825be4c0, op: WAIT|PRIV, val: 2) = 0
> >    958.739 (59.873 ms): a.out/23789 membarrier(cmd: 1) = 0
> >   1018.621 (80.058 ms): a.out/23789 membarrier(cmd: 1) = 0
> >   1098.782 (49.844 ms): a.out/23789 membarrier(cmd: 1) = 0
> >   1148.634 (59.998 ms): a.out/23789 membarrier(cmd: 1) = 0
> >   1208.731 (69.940 ms): a.out/23789 membarrier(cmd: 1) = 0
> >   1278.680 (79.947 ms): a.out/23789 membarrier(cmd: 1) = 0
> >   1358.727 (39.881 ms): a.out/23789 membarrier(cmd: 1) = 0
> >   1398.616 (39.980 ms): a.out/23789 membarrier(cmd: 1) = 0
> >      6.640 (1432.157 ms): a.out-ust/23791 futex(uaddr: 0x7fcd82f90000) = -1 (null) INTERNAL ERROR: strerror_r(512, 0x7ffdbc5467a0, 128)=22
> > ~~~~~
> >
> > This makes lttng-ust pretty much unusable for any short-lived application.
> > Is this a known limitation?
> >
> > I'm using lttng-ust on Arch from the AUR packages, i.e. LTTng 2.9.4. I've
> > also seen this problem on Ubuntu 17.04, though I didn't run the above
> > measurements there. So far I had suspected the ust statedump of taking too
> > much time, but I disabled it for the measurements above.
> >
> > Any help would be appreciated.
>
> Here's the perf sleep record data when no session is configured:
>
> ~~~~~
> 100% = 848.063336375 ms
>
> # Overhead  Pid:Command      Shared Object     Symbol          Source:Line
> # ........  ...............  ................  ..............  ...........
> #
>     76.50%  10336:a.out-ust  [kernel.vmlinux]  [k] __schedule  core.c:0
>             ---__schedule core.c:0
>                schedule
>                futex_wait_queue_me futex.c:0
>                futex_wait futex.c:0
>                do_futex
>                sys_futex
>                do_syscall_64
>                return_from_SYSCALL_64 .tmp_entry_64.o:0
>                syscall
>                ust_listener_thread futex.h:65
>                /usr/include/urcu/futex.h:65 (inline)
>                /usr/include/urcu/futex.h:97 (inline)
>                /home/milian/projects/src/lttng/lttng-ust/liblttng-ust/lttng-ust-comm.c:1311 (inline)
>                /home/milian/projects/src/lttng/lttng-ust/liblttng-ust/lttng-ust-comm.c:1373 (inline)
>
>     15.79%  10335:a.out-ust  [kernel.vmlinux]  [k] __schedule  core.c:0
>             ---__schedule core.c:0
>                schedule
>                 --15.70%--futex_wait_queue_me futex.c:0
>                           futex_wait futex.c:0
>                           do_futex
>                           sys_futex
>                           do_syscall_64
>                           return_from_SYSCALL_64 .tmp_entry_64.o:0
>                           __lll_lock_wait
>                           pthread_mutex_lock
>                            --15.69%--0x2338
>                                      0x3b29
>                                      __nptl_deallocate_tsd.part.4 pthread_create.c:0
>                                      start_thread pthread_create.c:0
>                                      __clone
>
>      7.72%  10334:a.out      [kernel.vmlinux]  [k] __schedule  core.c:0
>             ---__schedule core.c:0
>                schedule
>                futex_wait_queue_me futex.c:0
>                futex_wait futex.c:0
>                do_futex
>                sys_futex
>                do_syscall_64
>                return_from_SYSCALL_64 .tmp_entry_64.o:0
>                do_futex_wait sem_timedwait.c:0
>                __new_sem_wait_slow sem_timedwait.c:0
>                lttng_ust_init.part.3 lttng-ust-comm.c:1757
>                call_init.part.0 dl-init.c:0
>                _dl_init
>                _dl_start_user
>
> ~~~~~
>
> Here's the perf sleep record data for an existing session:
>
> ~~~~~
> 100% = 3.0793293265 s
>
> # Overhead  Pid:Command      Shared Object     Symbol          Source:Line
> # ........  ...............  ................  ..............  ...........
> #
>     52.10%  10762:a.out-ust  [kernel.vmlinux]  [k] __schedule  core.c:0
>             ---__schedule core.c:0
>                schedule
>                futex_wait_queue_me futex.c:0
>                futex_wait futex.c:0
>                do_futex
>                sys_futex
>                do_syscall_64
>                return_from_SYSCALL_64 .tmp_entry_64.o:0
>                syscall
>                ust_listener_thread futex.h:65
>                /usr/include/urcu/futex.h:65 (inline)
>                /usr/include/urcu/futex.h:97 (inline)
>                /home/milian/projects/src/lttng/lttng-ust/liblttng-ust/lttng-ust-comm.c:1311 (inline)
>                /home/milian/projects/src/lttng/lttng-ust/liblttng-ust/lttng-ust-comm.c:1373 (inline)
>
>     27.09%  10761:a.out-ust  [kernel.vmlinux]  [k] __schedule  core.c:0
>             ---__schedule core.c:0
>                schedule
>                |--13.69%--schedule_timeout
>                |          unix_stream_read_generic af_unix.c:0
>                |          unix_stream_recvmsg af_unix.c:0
>                |          sock_recvmsg
>                |          ___sys_recvmsg socket.c:0
>                |          __sys_recvmsg
>                |          sys_recvmsg
>                |          do_syscall_64
>                |          return_from_SYSCALL_64 .tmp_entry_64.o:0
>                |          __libc_recvmsg
>                |          ustcomm_recv_unix_sock lttng-ust-comm.c:307
>                 --13.40%--futex_wait_queue_me futex.c:0
>                           futex_wait futex.c:0
>                           do_futex
>                           sys_futex
>                           do_syscall_64
>                           return_from_SYSCALL_64 .tmp_entry_64.o:0
>                           __lll_lock_wait
>                           pthread_mutex_lock
>                           |--8.41%--0x2338
>                           |          0x3b29
>                           |          __nptl_deallocate_tsd.part.4 pthread_create.c:0
>                           |          start_thread pthread_create.c:0
>                           |          __clone
>                            --4.99%--ust_lock_nocheck lttng-ust-comm.c:178
>
>     20.81%  10760:a.out      [kernel.vmlinux]  [k] __schedule  core.c:0
>             ---__schedule core.c:0
>                schedule
>                 --20.81%--futex_wait_queue_me futex.c:0
>                           futex_wait futex.c:0
>                           do_futex
>                           sys_futex
>                           do_syscall_64
>                           return_from_SYSCALL_64 .tmp_entry_64.o:0
>                            --20.77%--do_futex_wait sem_timedwait.c:0
>                                      __new_sem_wait_slow sem_timedwait.c:0
>                                      lttng_ust_init.part.3 lttng-ust-comm.c:1757
>                                      call_init.part.0 dl-init.c:0
>                                      _dl_init
>                                      _dl_start_user
> ~~~~~
>
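> In case anyone wants to reproduce the sleep-time profiling: the data above
> comes from perf's sched_stat-based sleep profiling, recorded roughly along
> these lines (a sketch, not the verbatim commands -- the sched_stat
> tracepoints need schedstats enabled in the kernel):
>
> ~~~~~
> # record scheduler sleep/switch events with call graphs
> perf record -e sched:sched_stat_sleep -e sched:sched_switch \
>             -e sched:sched_process_exit -g -o perf.data.raw ./a.out
> # merge the sleep durations into the switch events
> perf inject -s -i perf.data.raw -o perf.data
> # report weighted by sleep time
> perf report --show-total-period --stdio
> ~~~~~
>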
> Hope that helps
One thing that is apparent from the above, and can also be confirmed by adding
some output to the application right before it quits (see the sketch below):
it's not the main thread that is blocked initially. Rather, the main thread
spawns the secondary -ust threads, runs its own code, and finishes before the
-ust threads manage to initialize. Only when the main thread then quits does
it wait for the -ust threads to finish.
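
Concretely, that check is just a trivial variation of the empty reproducer,
e.g.:

~~~~~
/* Empty reproducer plus a message printed right before main() returns.
 * Built as before: gcc -O2 -g -ldl -llttng-ust lttng-test.c
 * The message shows up on the terminal noticeably before the process
 * actually exits: most of the delay is spent after main() has returned,
 * while process teardown waits for the -ust listener threads. */
#include <stdio.h>

int main(void)
{
    puts("main() done, exiting now");
    return 0;
}
~~~~~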
--
Milian Wolff | milian.wolff at kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts