[lttng-dev] large liblttng-ust startup overhead (ust_lock)

Milian Wolff milian.wolff at kdab.com
Wed Sep 6 07:42:44 UTC 2017


On Tuesday, September 5, 2017 20:20:10 CEST Mathieu Desnoyers wrote:
> ----- On Sep 5, 2017, at 11:10 AM, Milian Wolff milian.wolff at kdab.com wrote:
> > On Tuesday, September 5, 2017 4:52:42 PM CEST Milian Wolff wrote:
> >> On Tuesday, September 5, 2017 4:34:54 PM CEST Milian Wolff wrote:
> >> > Hey all,
> >> > 
> >> > I have noticed a very large overhead when linking against liblttng-ust:

<snip>

> >> > 
> >> > This is without any LTTng session configured. If I enable LTTng kernel
> >> > and
> >> > userspace events, this becomes even worse:
> >> > 
> >> > ~~~~~
> >> > ┌milian at milian-kdab2:/tmp
> >> > └$ cat $(which run_lttng_trace.sh)
> >> > #!/bin/sh
> >> > 
> >> > if [ -z "$(pidof lttng-sessiond)" ]; then
> >> >     sudo lttng-sessiond --daemonize
> >> > fi
> >> > 
> >> > sudo lttng create -o ~/lttng-traces/$(date -Iseconds)
> >> > sudo lttng enable-channel kernel -k --subbuf-size 16M --num-subbuf 8
> >> > sudo lttng enable-event -c kernel -k -a
> >> > sudo lttng enable-channel ust -u --subbuf-size 16M --num-subbuf 8
> >> > sudo lttng enable-event -c ust -u lttng_ust_tracef:*
> >> > sudo lttng start
> >> > $@
> >> > sudo lttng stop
> >> > 
> >> > sudo chmod a+rx -R ~/lttng-traces

<snip>

> > One thing that is apparent from the above, and can also be made visible by
> > printing some output from the application before it quits: it's not the
> > main thread that is blocked initially. Rather, the main thread spins up
> > the secondary -ust threads, then runs its code and finishes before the
> > -ust threads manage to initialize. Before the main thread quits, it waits
> > for the -ust threads to finish.
> 
> Those threads appear to be talking to the session daemon (on recvmsg).
> Please note that there are two cases there:
> 
> - If you use per-pid buffers (explicitly requested at channel creation), the
>   consumer daemon needs to allocate buffers on behalf of each and every new
>   process that appears while tracing, which takes a significant amount of
>   time.
> - If you use default buffers (per-uid), only the first process that
>   registers for a specific UID requires the consumer daemon to allocate
>   buffers. After that, the same buffers are sent to other applications
>   registering for the same user ID.
> 
> Clearly, per-pid buffers incur a significant slowdown at application start.
> This is one reason why they are not used by default.
> 
> If you benchmark per-uid buffers, you will want to benchmark these two
> scenarios:
> 
> 1) startup time of the first process that registers for a UID,
> 
> 2) startup time of subsequent process registrations.
> 
> Technically, the buffer allocation delay only impacts the first process that
> registers for a UID, not all of them.

See my lttng commands above: I'm using the default buffer scheme, which, if I 
understand you correctly, should be per-UID. The directory hierarchy in the 
lttng trace folder confirms this:

~~~~~
┌milian at milian-kdab2:~/lttng-traces/2017-09-05T16:44:39+02:00
└$ find
.
./kernel
./kernel/kernel_6
./kernel/kernel_4
./kernel/kernel_2
./kernel/metadata
./kernel/kernel_1
./kernel/kernel_5
./kernel/index
./kernel/index/kernel_1.idx
./kernel/index/kernel_7.idx
./kernel/index/kernel_4.idx
./kernel/index/kernel_0.idx
./kernel/index/kernel_5.idx
./kernel/index/kernel_3.idx
./kernel/index/kernel_6.idx
./kernel/index/kernel_2.idx
./kernel/kernel_7
./kernel/kernel_0
./kernel/kernel_3
./ust
./ust/uid
./ust/uid/1000
./ust/uid/1000/64-bit
./ust/uid/1000/64-bit/ust_6
./ust/uid/1000/64-bit/ust_7
./ust/uid/1000/64-bit/ust_1
./ust/uid/1000/64-bit/metadata
./ust/uid/1000/64-bit/ust_0
./ust/uid/1000/64-bit/ust_3
./ust/uid/1000/64-bit/index
./ust/uid/1000/64-bit/index/ust_2.idx
./ust/uid/1000/64-bit/index/ust_3.idx
./ust/uid/1000/64-bit/index/ust_5.idx
./ust/uid/1000/64-bit/index/ust_6.idx
./ust/uid/1000/64-bit/index/ust_0.idx
./ust/uid/1000/64-bit/index/ust_4.idx
./ust/uid/1000/64-bit/index/ust_7.idx
./ust/uid/1000/64-bit/index/ust_1.idx
./ust/uid/1000/64-bit/ust_5
./ust/uid/1000/64-bit/ust_4
./ust/uid/1000/64-bit/ust_2
~~~~~
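
For reference, per-PID buffers would have to be requested explicitly at 
channel creation, as you say. As far as I understand, the selection looks 
roughly like this (a sketch; the channel names are placeholders, and 
--buffers-uid is the default for -u channels anyway):

~~~~~
# default: per-UID buffers, shared by all processes of the same user
sudo lttng enable-channel ust -u --buffers-uid --subbuf-size 16M --num-subbuf 8

# per-PID buffers: the consumer daemon must allocate buffers for every
# new process that appears while tracing, slowing down application start
sudo lttng enable-channel ust-pid -u --buffers-pid --subbuf-size 16M --num-subbuf 8
~~~~~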

Note that my measurements somewhat contradict your assumptions though: it is 
not only the first task run while tracing that gets slowed down, but all of 
them. Have a look at the `perf stat` output I pasted:

       0.965545882 seconds time elapsed ( +- 16.96% )

I can reproduce values within that variance by running the example code 
multiple times and measuring each time individually, while keeping the lttng 
session up and running.
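
For reference, the measurement loop is essentially the following (a sketch; 
./example stands in for the benchmarked binary):

~~~~~
# the lttng session keeps running across all iterations, so per-UID
# buffer allocation could only ever affect the very first run, yet the
# slowdown shows up in every single one
for i in $(seq 10); do
    perf stat -- ./example
done
~~~~~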

Bye

-- 
Milian Wolff | milian.wolff at kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts



