[lttng-dev] Reducing profiling overhead

Wed Jun 14 15:10:16 UTC 2017

----- On Jun 13, 2017, at 3:41 PM, Tom Deneau tom.deneau at amd.com wrote:

> I am trying to use lttng to simultaneously trace some kernel and user events.
> The kernel events are a few syscalls (sendto, futex).
> The user events are the pthread ones in lttng-ust-libc-wrapper.
> 
> Looking for recommended lttng configurations to:
>   * avoid dropping any events
>   * avoid slowing down the profiled application too much.
> 
> As background info, the profiled application might be running as many as 64
> software threads.
> Each thread might be doing as many as 4000 "operations"/sec, so approximately
> 250,000 total operations/sec.  A single operation involves about 60-70 trace
> events (the number is variable depending on the amount of lock contention).
> 
> All the threads are very similar so there is no need to profile all the threads.
> 
> Right now I don't get any messages from lttng-stop nor from babeltrace saying
> that any events are being dropped so that's good.
> 
> I am however seeing a pretty significant performance drop in the app with
> profiling.
> The 250,000 total ops/sec can go down to 170,000 ops/sec.  I have measured this
> by profiling for several seconds and observing the effect on the workload
> periodic performance reports.
> And also by measuring the number of ops/sec per thread in the traces themselves
> (an op always has one sendto syscall).
> 
> The subbuf setup I am currently using is probably not optimal:
>        lttng enable-channel k -k --num-subbuf 64 --subbuf-size 512k
>        lttng enable-channel u -u --num-subbuf 64 --subbuf-size 512k
>        lttng add-context -k -c k -t tid -t pid
>        lttng add-context -u -c u -t vtid -t vpid
> 
> When I enable-events, I generally filter on either a pid/vpid (for both kernel
> and user) or on a small number of tids/vtids (to keep the trace smaller).
> 
> Any pointers for subbuf configuration or anything to reduce the profiling
> overhead?
> Or is the 33% perf hit for this kind of profiling load pretty expected?

Try selectively enabling a subset of the events you need, in order
to pinpoint which set is causing this overhead.
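
As a rough sketch of that bisection, the script below runs one short
session per subset so the workload's own ops/sec report can be compared
across runs. The session names and the trace_subset helper are made up
for illustration, the lttng_ust_pthread provider name is what I expect
the pthread wrapper to expose (check with "lttng list -u"), and the
default channel is used rather than the k/u channels from your setup.
By default it only prints the commands.

```shell
#!/bin/sh
# Dry run by default: prints the lttng commands instead of running them.
# Set LTTNG=lttng to execute for real.
LTTNG="${LTTNG:-echo lttng}"

# trace_subset NAME KIND EVENTS
#   NAME   session name (hypothetical, pick anything)
#   KIND   "syscall" for kernel syscalls, "ust" for user space tracepoints
#   EVENTS comma-separated event list
trace_subset() {
    name=$1
    kind=$2
    events=$3
    $LTTNG create "$name"
    case $kind in
        syscall) $LTTNG enable-event --kernel --syscall "$events" ;;
        ust)     $LTTNG enable-event --userspace "$events" ;;
        *)       echo "unknown kind: $kind" >&2; return 1 ;;
    esac
    $LTTNG start
    sleep "${DURATION:-0}"   # time for the workload to print its ops/sec
    $LTTNG stop
    $LTTNG destroy "$name"
}

# One run per subset; compare the workload's ops/sec between runs.
trace_subset only-sendto  syscall sendto
trace_subset only-futex   syscall futex
trace_subset only-pthread ust 'lttng_ust_pthread:*'
```

To run it against the live workload, something like
"LTTNG=lttng DURATION=10 sh subsets.sh" should do (the file name is
arbitrary). Kernel tracing needs root or membership in the tracing group.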

I suspect that the system call tracing would be the culprit. Whenever
syscall tracing is activated, we actually enable the kernel syscall
tracing instrumentation for all threads, and filter early on. However,
this adds significant overhead to all system calls.
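
For a sense of scale, the figures in your report can be turned into an
average added cost per event. This is only back-of-the-envelope
arithmetic under simplifying assumptions that are mine, not measured:
the 64 threads are busy with operations the whole time, the slowdown is
spread evenly, and 65 events per operation is taken as the midpoint of
your 60-70. Because of the early filtering described above, part of this
cost is paid on syscalls that never end up in the trace.

```shell
#!/bin/sh
threads=64
ops_base=250000      # total ops/sec without tracing (from the report)
ops_traced=170000    # total ops/sec with tracing (from the report)
events_per_op=65     # assumed midpoint of the reported 60-70

# Throughput loss in percent (integer division).
loss_pct=$(( (ops_base - ops_traced) * 100 / ops_base ))

# Wall-clock time one thread spends per operation, in nanoseconds.
ns_per_op_base=$(( threads * 1000000000 / ops_base ))
ns_per_op_traced=$(( threads * 1000000000 / ops_traced ))

# Extra time per operation and per event attributable to tracing.
extra_ns_per_op=$(( ns_per_op_traced - ns_per_op_base ))
extra_ns_per_event=$(( extra_ns_per_op / events_per_op ))

echo "throughput loss:      ${loss_pct}%"
echo "per-op time, base:    ${ns_per_op_base} ns"
echo "per-op time, traced:  ${ns_per_op_traced} ns"
echo "extra per op:         ${extra_ns_per_op} ns"
echo "extra per event:      ${extra_ns_per_event} ns"
```

That works out to a 32% loss and a bit under 2 microseconds per event on
average, which is a useful baseline to compare against once the syscall
and pthread subsets are measured separately.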

One possibility here would be to extend the Linux kernel system call tracing
facility to allow tracking specific threads and processes.

I suspect that the pthread events can be heavy on performance too, since
they are traced very, very often. Do you really need to trace each lock
taken/released, or only contention? Perhaps you could be more specific
in your enabled instrumentation to lessen its overhead.
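
If contention is what matters, one possible narrowing is to enable only
the lock request and lock acquire events and leave out unlock and
trylock. The sketch below assumes the pthread wrapper's provider is
named lttng_ust_pthread and that its events are pthread_mutex_lock_req
and pthread_mutex_lock_acq (please confirm with "lttng list -u"), and
that a session with your "u" channel already exists. The event_list
helper is just for illustration. It prints the command unless
LTTNG=lttng is set.

```shell
#!/bin/sh
# Dry run by default: prints the lttng command instead of running it.
LTTNG="${LTTNG:-echo lttng}"
PROVIDER=lttng_ust_pthread   # assumed provider name, verify on your system

# Build "provider:ev1,provider:ev2,..." from short event names.
event_list() {
    list=
    for ev in "$@"; do
        list="${list:+$list,}$PROVIDER:$ev"
    done
    printf '%s\n' "$list"
}

# lock_req is emitted when a thread asks for a mutex and lock_acq once it
# holds it; the time between the two on one thread is the wait time.
events=$(event_list pthread_mutex_lock_req pthread_mutex_lock_acq)

$LTTNG enable-event --userspace --channel u "$events"
```

This still records two events per lock operation, so it cuts the pthread
event volume roughly in half at best. Going further would need
instrumentation that only fires on actual contention.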

Thanks,

Mathieu

> 
> -- Tom Deneau

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com