[ltt-dev] [linuxtools-dev] Standard protocols/interfaces/formats for performance tools (TCF, LTTng, ...)

Thu Mar 11 14:58:18 EST 2010

> I proposed, and currently chair the newly formed Multicore Association,
> Tool Infrastructure work group (TIWG).  The work group welcomes
> opportunities to better understand other efforts, that TIWG can
> leverage, and learn from.  I will be at the Multicore Expo, where I am
> presenting, and I also plan on attending the EclipseCon.  

Great! It may be a good idea to start accumulating pointers, identified
shortcomings, and ideas in preparation for this and LinuxCon.

>>>> Along those lines, we (Mentor) have a need for a protocol
>>>> to connect to remote trace collectors and configure trace
>>>> triggering/collection, and then efficiently download lots of
>>>> binary trace data.  Sound familiar?
...
>>>> Mentor has a file format we use that was inspired by LTTng's
>>>> format but is optimized for extremely large real-time trace
>>>> logs.  I intend to throw this into the mix.
...
>>> It would be good to ask if the Ftrace team is interested to 
>>> participate in this standardization effort. Proposing 
>>> modifications to the Ftrace file format is on my roadmap.

This is indeed the problem I currently see with Ftrace: suitability for
huge live/realtime traces. For this you need an extremely compact format 
and a good way to pass and update metadata along with the trace. 
Otherwise, Ftrace and Perf offer a large number of exciting features.

In LTTng, following some feedback from Google among others, quite a bit
of information is implicit: per-CPU files and scheduling events obviate
the need to store the pid and CPU id in each event, and the event id
implicitly gives the event size and format. Similarly, event ids are
scoped by channel so that they take little space, and timestamps do not
store all of the most significant bits. Since new modules may be loaded
at any time with new event types, the dynamic allocation of event ids
and the update of the associated metadata must be handled properly.
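
To make the idea of implicit information concrete, here is a minimal
sketch of a compact record in which the event id alone determines the
payload layout and only the low-order timestamp bits are stored. The
event table, the 24-bit timestamp width and the 1-byte event id are
assumptions chosen for illustration; this is not the actual LTTng
layout. It also assumes consecutive events are less than 2^24 ticks
apart, so at most one wrap occurs between two events.

```python
import struct

# Hypothetical event table: event id -> (name, struct format of the payload).
# The id alone tells the reader the payload size and layout.
EVENT_TYPES = {
    1: ("sched_switch", "<HH"),  # prev_pid, next_pid (4 bytes)
    2: ("irq_entry", "<I"),      # irq number (4 bytes)
}

TS_BITS = 24                     # assumed number of timestamp bits stored
TS_MASK = (1 << TS_BITS) - 1


def encode(event_id, timestamp, *fields):
    """Pack a 4-byte header (1-byte id + 24-bit truncated timestamp) and payload."""
    header = struct.pack("<I", (event_id << TS_BITS) | (timestamp & TS_MASK))
    return header + struct.pack(EVENT_TYPES[event_id][1], *fields)


def decode_stream(buf, start_timestamp):
    """Yield (full_timestamp, name, fields) for each record in buf.

    start_timestamp is a full timestamp known from elsewhere (for example
    a buffer header); it supplies the missing most significant bits.
    """
    offset = 0
    last_ts = start_timestamp
    while offset < len(buf):
        (word,) = struct.unpack_from("<I", buf, offset)
        offset += 4
        event_id = word >> TS_BITS
        low_bits = word & TS_MASK
        # Reuse the high bits of the previous timestamp; if the result goes
        # backwards, the truncated counter wrapped once.
        ts = (last_ts & ~TS_MASK) | low_bits
        if ts < last_ts:
            ts += 1 << TS_BITS
        last_ts = ts
        name, fmt = EVENT_TYPES[event_id]
        fields = struct.unpack_from(fmt, buf, offset)
        offset += struct.calcsize(fmt)
        yield ts, name, fields
```

With these assumed sizes, each event takes 8 bytes for a 4-byte
payload, and no pid, CPU id, size or format field appears in the
record itself.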

Other approaches are possible to achieve the same result. Aaron Spear
mentioned "contexts" to qualify node/cpu/pid; I am eager to learn more
about that. You could have "define context" events, where a context id
would be associated with a number of attributes (CPU, pid, event
name...) and could be reused at any time simply by issuing another
"define context" event with the same id but different attributes. The
important part is that each event should use little more than its
specific payload (a typical event has a payload of 4 bytes and occupies
a total of 8 to 12 bytes in LTTng). Ftrace currently has a large number
of common fields and was thus not optimised for this; this rapidly turns
a 10GB trace into a 30GB one.
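
As a rough illustration of how a reader might handle such "define
context" events, here is a small sketch. The record classes and the
attribute names are hypothetical; I have not seen the actual design
Aaron has in mind, so this only shows the redefinition semantics
described above.

```python
from dataclasses import dataclass


@dataclass
class DefineContext:
    """Hypothetical record binding a small context id to a set of attributes."""
    context_id: int
    attributes: dict


@dataclass
class Event:
    """Ordinary event: carries only a context id and its specific payload."""
    context_id: int
    payload: bytes


def resolve(records):
    """Return (attributes, payload) pairs, applying context definitions in order."""
    contexts = {}
    resolved = []
    for rec in records:
        if isinstance(rec, DefineContext):
            # A later definition with the same id replaces the earlier one.
            contexts[rec.context_id] = dict(rec.attributes)
        else:
            resolved.append((contexts[rec.context_id], rec.payload))
    return resolved
```

The reader has to process the stream in order, since the meaning of a
context id depends on the most recent definition seen before the event.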

The second important missing feature is dynamic updates of the metadata 
as new event types are added when modules are loaded. In LTTng, metadata 
is received as events of a predefined type in a dedicated channel. I am 
sure that something similar could be possible for Ftrace.
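
A minimal sketch of that mechanism, assuming a reserved metadata
channel and an invented payload layout (target channel, new event id,
then an ASCII "name:format" string). None of these details are the real
LTTng encoding; the point is only that event types can be registered
while the trace is being read, and that ids are scoped by channel.

```python
import struct

METADATA_CHANNEL = 0  # hypothetical channel reserved for metadata events


class TraceReader:
    """Decodes events whose types are announced in-band on a metadata channel."""

    def __init__(self):
        # (channel, event_id) -> (name, struct format); ids are per channel.
        self.event_types = {}

    def feed(self, channel, event_id, payload):
        """Process one record; return (name, fields) or None for metadata."""
        if channel == METADATA_CHANNEL:
            # Assumed layout: target channel (u8), new event id (u8),
            # followed by an ASCII "name:struct_format" description.
            target_channel, new_id = struct.unpack_from("<BB", payload, 0)
            name, fmt = payload[2:].decode("ascii").split(":", 1)
            self.event_types[(target_channel, new_id)] = (name, fmt)
            return None
        name, fmt = self.event_types[(channel, event_id)]
        return name, struct.unpack(fmt, payload)
```

For example, when a module is loaded, a metadata record can announce a
new event before the first occurrence of that event shows up in its own
channel.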

>> We believe that the future will be heavily multi-core, and it is
>> a difficult problem to solve figuring out graceful ways to partition a
>> complex "application" across these cores effectively.  E.g. a system
>> with SMP Linux on a couple of cores, a low level RTOS on another core,
>> and then some DSP's as well.  Today you often use totally different
>> tools for all of those cores.  How do you understand what the heck is
>> happening in this system, never mind figuring out how to optimize the
>> system as a whole...   I think a good first step is some level of
>> interoperability in data formats so that event data collected from
>> different sources and technologies (e.g. LTTng for Linux and real-time
>> trace for the DSP's) can be correlated and analyzed side by side. 

We now have some neat and fairly sophisticated tools in LTTV to
correlate traces taken on distributed systems with unsynchronized
clocks simply by looking at message exchanges.
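
To give a feel for how message exchanges alone can relate two clocks,
here is a deliberately simplified sketch. It is not a description of
the LTTV implementation: it assumes a constant offset (no drift) and
that the minimum network delay is the same in both directions. The
function names are made up for this example.

```python
def estimate_offset(a_to_b, b_to_a):
    """Estimate (clock_B - clock_A) from matched message timestamps.

    a_to_b: list of (send_time_on_A, recv_time_on_B) pairs
    b_to_a: list of (send_time_on_B, recv_time_on_A) pairs
    """
    # For A -> B: recv_B - send_A = delay + offset
    min_ab = min(recv - send for send, recv in a_to_b)
    # For B -> A: recv_A - send_B = delay - offset
    min_ba = min(recv - send for send, recv in b_to_a)
    # With equal minimum delays, the delay terms cancel out.
    return (min_ab - min_ba) / 2


def to_clock_a(timestamp_b, offset):
    """Convert a timestamp taken on node B to node A's time base."""
    return timestamp_b - offset
```

Taking the minimum over many exchanges filters out messages that were
delayed by queueing, which is why more traffic gives a better estimate.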