[lttng-dev] What is the size overhead of UST tracepoints? (hint: very large indeed)

Amit Margalit AMITM at il.ibm.com
Tue Oct 29 04:42:20 EDT 2013


Hello all,

Let me start with my bottom line conclusions, and then explain them:

LTTng's size overhead and start-up time overhead are both huge! In 
practice, an application cannot use (directly, or through shared libraries 
that contain tracepoints) more than about 1000-2000 tracepoints.

I am working on incorporating LTTng as an alternative logging method for 
an existing project, which already has many thousands of places in the 
code that do logging of some data.

By many thousands, I mean > 15K.

I have written an automated instrumentor that generates a custom 
tracepoint for every log location, plus one entry tracepoint and one 
exit tracepoint for every function.

This totals about 100K tracepoints, with approximately 600K different 
fields, and has increased the size of the project's resulting executables 
from ~100MB to ~900MB (!!) - although this is with debug symbols.

Analysis of the reasons for this has revealed some interesting (but 
discouraging) facts:

- Each tracepoint name is stored in a static 256-byte buffer, regardless 
  of how long the name actually is (contributing some 25MB to the size).
- Each trace event field name is also stored in a 256-byte buffer 
  (contributing another 150MB).
- Each event / tracepoint holds an event structure which holds the 
  fields, etc., and is padded generously.
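
The 25MB and 150MB figures follow directly from the counts given earlier (about 100K tracepoints and 600K fields, at 256 bytes per name). The sketch below reproduces that arithmetic and compares it with storing each name as a pointer to a NUL-terminated string. The struct layouts and function names are hypothetical and are not LTTng-UST's actual definitions; the average name length passed to `pointer_name_cost` is likewise an assumption.

```c
#include <stddef.h>

#define NAME_BUF_LEN 256   /* fixed buffer size reported in this post */

/* Hypothetical layouts for illustration only; not LTTng-UST's real structs. */
struct fixed_name   { char name[NAME_BUF_LEN]; };
struct pointer_name { const char *name; };

/* Bytes consumed when every name occupies a full fixed-size buffer. */
size_t fixed_name_cost(size_t count)
{
    return count * sizeof(struct fixed_name);
}

/* Bytes consumed when each name is a pointer plus a NUL-terminated string
 * of avg_name_len characters (avg_name_len is an assumed average). */
size_t pointer_name_cost(size_t count, size_t avg_name_len)
{
    return count * (sizeof(struct pointer_name) + avg_name_len + 1);
}
```

With these definitions, `fixed_name_cost(100000)` is 25,600,000 bytes and `fixed_name_cost(600000)` is 153,600,000 bytes, matching the estimates above. `pointer_name_cost(100000, 31)` would be far smaller, with the exact value depending on the pointer size of the target.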

Additionally, there is a ton of code to calculate the event size, its 
alignment, and to verify that the names are not longer than the 256 byte 
buffer, and more.

Some of this is either used just once, or even never used, merely compiled 
to let the compiler complain if the sizes are wrong, etc.

Now this code won't be included if you compile a statically linked 
executable, but my project compiles almost everything into shared libs.
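
For comparison, a name-length check does not have to cost anything in the final binary. The sketch below uses C11 `_Static_assert`, which is evaluated entirely by the compiler. This only illustrates the general technique; it is not a description of how LTTng-UST implements its checks, and `EVENT_NAME` is a hypothetical name.

```c
#define NAME_BUF_LEN 256                   /* limit described in this post */
#define EVENT_NAME "my_provider:my_event"  /* hypothetical event name */

/* Checked at compile time; emits no code or data on its own. */
_Static_assert(sizeof(EVENT_NAME) <= NAME_BUF_LEN,
               "tracepoint name does not fit in the name buffer");

/* Size of the name including the terminating NUL, as a constant. */
enum { EVENT_NAME_SIZE = sizeof(EVENT_NAME) };
```

If the name were too long, compilation would fail with the given message instead of producing a larger binary.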

Another big big problem is the time it takes to register all of these 
tracepoints. For one of my executables which includes ~10K tracepoints, it 
took over 5 seconds just to get to "main()" due to tracepoint constructor 
code registering the tracepoints...
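
Over 5 seconds for ~10K tracepoints works out to at least roughly 0.5 ms per tracepoint, all of it spent before "main()" runs. The sketch below shows the general mechanism: a function marked with the GCC/Clang `constructor` attribute runs when the executable or shared library is loaded. `register_probe` and `registered_count` are hypothetical stand-ins, not LTTng-UST APIs.

```c
#include <stddef.h>

/* Hypothetical stand-in for a tracepoint registry; not an LTTng-UST API. */
size_t registered_count;

void register_probe(const char *name)
{
    (void)name;          /* a real registry would insert or look up the name */
    registered_count++;
}

/* GCC/Clang extension: runs at load time, before main() is entered. */
__attribute__((constructor))
static void register_provider_a(void)
{
    register_probe("provider_a:entry");
    register_probe("provider_a:exit");
}
```

Every shared library with such a constructor pays its registration cost at load time, so the per-tracepoint cost multiplies across all libraries the executable links against.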

I've started working on an alternative method of doing the same...

Is anyone already aware of this? Is anyone already working on improving 
this? I'd be very happy to work together on this.


Amit Margalit
IBM XIV - Storage Reinvented
XIV-NAS Development Team
Tel. 03-689-7774
Fax. 03-689-7230