[lttng-dev] CTF semantics

Tue Jun 14 17:06:39 UTC 2016

On Tue, Jun 14, 2016 at 12:50 PM, Mathieu Desnoyers
<mathieu.desnoyers at efficios.com> wrote:
> ----- On Jun 14, 2016, at 12:31 PM, Milian Wolff milian.wolff at kdab.com wrote:
>
>> On Tuesday, June 14, 2016 4:10:46 PM CEST Mathieu Desnoyers wrote:
>>> ----- On Jun 14, 2016, at 7:09 AM, Milian Wolff milian.wolff at kdab.com wrote:
>>> > Hey all,
>>> >
>>> > I have looked through the CTF specification and ponder using it to replace
>>> > my custom text-based output format of heaptrack.
>>>
>>> Very cool!
>>>
>>> > At this stage, I have a fundamental question: How do the existing viewers
>>> > like Trace Compass understand the semantics of the data? Or are the
>>> > viewers not generic but instead rely on the existing generators like
>>> > lttng? How does one know e.g. what the backtrace of a given event is?
>>>
>>> CTF only specifies the data layout and associates events/fields with names
>>> (namespacing). The analyses attach meaning to the recorded data by relying
>>> on the namespace of the tracer that collected the trace, together with the
>>> event and field names.
>>>
>>> For an LTTng backtrace (we have an ongoing prototype branch here, which
>>> requires frame pointers for user-space applications:
>>> https://github.com/compudj/lttng-modules-dev/commits/callstack), we can
>>> associate the callstack_user context name with this callstack concept, and
>>> if the trace viewer really wants to show this as a callstack (linked with
>>> debug information to find the function names, for instance), it needs to
>>> know the semantics of this new context field for LTTng.
>>>
>>> With the upcoming CTF 2.0, we plan on adding much more flexibility to the
>>> spec, so we could declare user attributes that would "flag" a specific
>>> aspect of the semantics, across various tracers. But we intend to let the
>>> tracers express their own semantics as much as possible, and then eventually
>>> agree on common sets of constructs that are found in many implementations,
>>> perhaps to create a side-spec of "standard user attributes" in the future.
>>
>> Great, thanks for the in-depth explanation.
>>
>> One off-topic question: You say call stacks require frame pointers, why?
>> libunwind can unwind based on DWARF debug information. Sure, on embedded you
>> don't want that, but on a desktop that is just fine. Or did you reinvent the
>> unwinding and don't use libunwind?
>
> For lttng-modules (kernel tracer): I would ideally like to implement
> libunwind within the kernel to unwind user-space stacks. I know systemtap has
> something that does this, but it seems rather slow, and having to pass each
> ELF file to consider explicitly before tracing is cumbersome. We are working
> with a student at Ecole Polytechnique who is currently prototyping this.
>
> For lttng-ust: libunwind appears to be too slow, and we would like to
> make this reentrant wrt signal handlers, and not have to share state
> (locks) between cores, for scalability reasons.
>
>>
>>> CTF 2.0 will keep the data streams as-is, and change the metadata format
>>> from TSDL (custom grammar) to JSON.
>>
>> Good choice, but JSON does not allow comments. Did you think about that? I
>> haven't used CTF at all yet, but I could think of cases where one wants to add
>> a comment to a complicated grammar. Maybe YAML is a better choice for that
>> reason.
>
> Philippe could tell us more on this topic.

Technically, the grammar is not complicated: it's JSON. The semantics
of the objects encoded with JSON may have a certain degree of complexity.

Now, the metadata stream of CTF 2 is expected to be both generated and
parsed by a machine, so the only use case I see for comments is to explain
some lines in the examples of the specification. There are other ways to
do this, for example line annotations, line numbers, and callouts.

We also discussed reserving a comment property that can be inserted
in any JSON object, just in case.
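
A minimal sketch of how a consumer could handle such a reserved property,
assuming it is named "comment" (a hypothetical name; nothing has been decided)
and that consumers simply drop it wherever it appears. The sample document is
made up for illustration and is not real CTF 2 metadata.

```python
import json

COMMENT_KEY = "comment"  # hypothetical reserved property name


def strip_comments(node):
    """Return a copy of a decoded JSON value without reserved comment properties."""
    if isinstance(node, dict):
        return {k: strip_comments(v) for k, v in node.items() if k != COMMENT_KEY}
    if isinstance(node, list):
        return [strip_comments(v) for v in node]
    return node


# Made-up document, only meant to show comments at several nesting levels.
metadata_text = """
{
  "comment": "hand-written example, not real CTF 2 metadata",
  "fields": [
    {"name": "size", "comment": "allocation size in bytes", "bits": 32}
  ]
}
"""

metadata = strip_comments(json.loads(metadata_text))
print(json.dumps(metadata, sort_keys=True))
```

Since the comments are ordinary JSON strings, any stock JSON parser accepts the
document unchanged.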

Like you, I admire YAML. But one of the design goals of CTF 2 is to ease the
development of consumers, which is why we opt for JSON: it is supported by
open source libraries in all major programming languages, and where none of
them fits, a robust JSON parser can be written in a few hundred lines of C.
I believe the flexibility of YAML adds complexity and dependencies that are
not strictly necessary for our use case. TSDL (CTF 1's metadata language) is
easy to write by hand (it is heavily inspired by C), but few people actually
write plain TSDL from scratch, compared to the numerous tools that generate
it (LTTng 2, Babeltrace's CTF writer, and barectf 2, to name a few).

JSON has other "issues", like no support for hexadecimal integer constants
(0x). Again, this is not a problem as long as the metadata is
machine-generated, and we have other solutions for this.

That being said, you could write the metadata in YAML, with comments, by hand;
converting from YAML to JSON is pretty much a one-line job in Python, for
example.

If you're interested in the development of CTF 2, I'll make sure to Cc you when
I post specification and informal proposals.

Phil

>
>>
>>> > heaptrack's current format heavily interns data to greatly reduce the
>>> > file size of the output data. This is crucial, and can be done with
>>> > minimal overhead. So I'd like to do the same if and when I convert to
>>> > using CTF. But how would e.g. a viewer know that an integer member of
>>> > a struct actually is an index into a list of backtraces?
>>>
>>> Just trying to understand here. So you store a backtrace once in the trace,
>>> associate it with a unique number, and later on, if you need to save the
>>> same backtrace, you just use this number instead ?
>>
>> Basically, yes. I actually go even further, and also intern the parts of
>> the trees, e.g.:
>>
>> A
>> |- B
>>    |- C
>>    |- D
>> |- E
>>    |- F
>>
>> If we assume the leaves of this tree would trigger events with call stacks,
>> then I'd intern the data such that I only output the debug data for every item
>> in the tree once. I.e. I don't do
>>
>> A | B | C
>> A | B | D
>> A | E | F
>>
>> Instead, I essentially store it as
>>
>> 1: A 0
>> 2: B 1
>> 3: C 2
>> 4: D 2
>> 5: E 1
>> 6: F 5
>>
>> This is enough to rebuild the call stacks, and reduces the report size
>> dramatically. E.g. an allocation of 9 bytes from D is represented as
>>
>> a 9 4
>> + <id of above line>
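
A minimal sketch of the interning scheme quoted above, to show how call stacks
can be rebuilt from the table. It assumes each entry maps an id to a function
name and a parent id, that parent id 0 means "no parent", and that the numbers
in an allocation record are decimal, as in the example. The function names are
made up for illustration and are not heaptrack's.

```python
# Interned call tree: id -> (function name, parent id); parent id 0 means "no parent".
NODES = {
    1: ("A", 0),
    2: ("B", 1),
    3: ("C", 2),
    4: ("D", 2),
    5: ("E", 1),
    6: ("F", 5),
}


def call_stack(node_id, nodes=NODES):
    """Walk parent links from a node up to the root and return the stack root-first."""
    frames = []
    while node_id != 0:
        name, parent = nodes[node_id]
        frames.append(name)
        node_id = parent
    return frames[::-1]


def parse_allocation(line):
    """Parse an allocation record such as 'a 9 4' into (size, call stack)."""
    tag, size, node_id = line.split()
    if tag != "a":
        raise ValueError("not an allocation record: " + line)
    return int(size), call_stack(int(node_id))


print(parse_allocation("a 9 4"))  # (9, ['A', 'B', 'D'])
```

Each function name is stored once, and every event only carries a single
integer, whatever the depth of its call stack.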
>
> OK. One main question I have is whether you would like to deal with
> lost CTF packets or not ? There are a few possible approaches there:
>
> 1) you keep track of packet sequence number and discarded event counts,
>    and tell the user when there is missing information,
>
> 2) you dump the entire table at the beginning of each CTF packet (e.g.
>    as a packet context field). Could be achievable if not too large.
>
> 3) you "reset" the information about which state has been dumped at the
>    beginning of each packet, so each packet is self-contained state-wise.
>
> Dealing with lost packets is useful if you plan to use flight-recorder
> (snapshot) type of tracing, where the beginning of the trace is often
> overwritten when capturing the trace snapshot.
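
A rough sketch of approach 1 above, assuming each packet exposes a sequence
number and a cumulative count of discarded events. The tuple layout and the
function name are hypothetical; a real reader would take these values from the
packet context of each packet.

```python
def report_losses(packets):
    """Return warnings for missing packets and discarded events.

    `packets` is a list of (sequence_number, discarded_events_total) tuples in
    stream order; the discarded counter is assumed to be cumulative.
    """
    warnings = []
    prev_seq = None
    prev_discarded = 0
    for seq, discarded in packets:
        # A jump in the sequence number means whole packets are missing.
        if prev_seq is not None and seq != prev_seq + 1:
            warnings.append(f"{seq - prev_seq - 1} packet(s) lost before packet {seq}")
        # A growing counter means individual events were dropped.
        if discarded > prev_discarded:
            warnings.append(f"{discarded - prev_discarded} event(s) discarded by packet {seq}")
        prev_seq, prev_discarded = seq, discarded
    return warnings


for message in report_losses([(0, 0), (1, 0), (4, 0), (5, 12)]):
    print(message)
```

A viewer relying on interned tables could use such warnings to tell the user
that some ids may refer to definitions that were never read.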
>
> With respect to the event field type to use, I would probably use a variant.
>
> For instance:
>
> enum : uint32_t {
>   "id" = 0 .. 4294967294,
>   "map" = 4294967295,
> } f1;
>
> variant <f1> {
>   struct {} id;  /* empty field */
>   struct {
>     uint32_t id;
>     string func;
>   } map;
> } f2;
>
> The above would typically store a 32-bit "id" (the common case), and
> reserve the value 4294967295 to indicate that a non-empty variant
> follows, which has a mapping between id and function string.
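
A minimal sketch of how a reader could decode the layout above. It assumes
little-endian, byte-packed fields with no alignment padding, and it decodes a
bare concatenation of (f1, f2) pairs rather than full CTF events; the function
name is hypothetical.

```python
import struct

MAP_TAG = 0xFFFFFFFF  # 4294967295, the value reserved for "map" in the enum


def decode_fields(buf):
    """Decode consecutive (f1, f2) pairs and return (ids, id -> function table)."""
    ids, table = [], {}
    offset = 0
    while offset < len(buf):
        (tag,) = struct.unpack_from("<I", buf, offset)
        offset += 4
        if tag != MAP_TAG:
            ids.append(tag)  # common case: the tag itself is the id
            continue
        # "map" case: a uint32 id followed by a null-terminated string.
        (new_id,) = struct.unpack_from("<I", buf, offset)
        offset += 4
        end = buf.index(b"\0", offset)
        table[new_id] = buf[offset:end].decode("utf-8")
        offset = end + 1
    return ids, table


payload = (
    struct.pack("<II", MAP_TAG, 4) + b"D\0"  # define id 4 -> "D"
    + struct.pack("<I", 4)                   # then refer to it by id only
)
print(decode_fields(payload))  # ([4], {4: 'D'})
```

In the common case a reference costs four bytes; the function string is only
paid for when a new mapping is introduced.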
>
> Would that fit your use-case ?
>
> Thanks,
>
> Mathieu
>
>
>>
>> Bye
>> --
>> Milian Wolff | milian.wolff at kdab.com | Software Engineer
>> KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
>> Tel: +49-30-521325470
>> KDAB - The Qt Experts
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com