[lttng-dev] [tracecompass-dev] CTF2-PROP-1.0: Proposal for a major revision of the Common Trace Format, version 1.8

Fri Oct 28 00:08:54 UTC 2016

CCing the other mailing lists again.

----- Original Message -----
> From: "Matthew Khouzam" <matthew.khouzam at ericsson.com>
> To: "tracecompass-dev" <tracecompass-dev at eclipse.org>
> Sent: Thursday, 27 October, 2016 15:29:30
> Subject: Re: [tracecompass-dev] CTF2-PROP-1.0: Proposal for a major revision of the Common Trace Format, version 1.8

> On 16-10-25 02:26 PM, Philippe Proulx wrote:
>> A few things that still annoy me:
>>
>> * Should a boolean field type inherit the properties of an integer field
>>   type instead of a simple bit array field type? In other words, should
>>   a boolean field type have a signedness property?
> It is ok to have it but ignore it. I would suggest that boolean not have
> a sign explicitly visible though.
> 
> Here's some quick questions wrt to bools and bit arrays:
> 
> A bool is false if and only if the data is all 0, so I would argue that
> byte order is less interesting than signs.

The byte order of a bit array is important because a bit array field is
still a value, that is, the order of the bit sequence is important.
Obviously the sign would not be needed to encode or decode a boolean
value as a boolean field. The real question here is: is a boolean type
an integer type? In C++/Java/Python/etc., it's not. It's either true or
false. In C, we often use `int` or another integer type to act as a
boolean type, and some applications could use different integer values
as distinct "true" values. IMO this goes against what a boolean value,
or a logical value, is. This is why I would be tempted to keep the
boolean field type inheriting from a bit array field type, and for those
special cases in C, just use a union field type with both a boolean
field type and an integer field type sharing the same size.
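
To make the union idea concrete, here is a minimal Python sketch of a
consumer reading the same 8 bits through both views. The function names
`decode_bool` and `decode_int` are hypothetical and not part of the
proposal; the two's complement interpretation of the signed case is also
an assumption of this sketch.

```python
def decode_bool(bits: int) -> bool:
    # A boolean field is false if and only if all of its bits are 0.
    return bits != 0


def decode_int(bits: int, size: int, signed: bool) -> int:
    # Interpret the same raw bits as an integer of `size` bits
    # (two's complement when signed; an assumption of this sketch).
    if signed and bits & (1 << (size - 1)):
        return bits - (1 << size)
    return bits


raw = 0b11111111  # the same 8 bits, seen through both union members
print(decode_bool(raw))            # True
print(decode_int(raw, 8, True))    # -1
print(decode_int(raw, 8, False))   # 255
```

The boolean view never needs the sign; only the integer member of the
union does.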

> 
> Here's a bit array question:
> 
> You have an array of 8 bool in a byte, can we use formatters to have it
> display the capabilities directly?
> 
> Something like:
> PORTB = 0x11011000

You mean 0b11011000 ;-).

Another AVR enthusiast here.

> can it be represented natively as
> OUT7 | OUT6 | OUT4 | OUT3?

Yes, this is totally possible with a structure field type containing bit
array (or unsigned integer) field types, perhaps with a user attribute
which asks the consumer to hide the decoded fields which have a given
value, something like this (for your scenario):

    {
      "fragment": "field-type-alias",
      "name": "uint1be",
      "field-type": {
        "field-type": "int",
        "size": 1,
        "byte-order": "be",
        "user-attrs": {
          "some-ns": {
            "hide-if-equals": 0
          }
        }
      }
    }

    {
      "field-type": "struct",
      "user-attrs": {
        "basic": {
          "description": "ATmega16 I/O port B"
        }
      },
      "alignment": 8,
      "fields": [
        {"name": "PORTB7", "field-type": "uint1be"},
        {"name": "PORTB6", "field-type": "uint1be"},
        {"name": "PORTB5", "field-type": "uint1be"},
        {"name": "PORTB4", "field-type": "uint1be"},
        {"name": "PORTB3", "field-type": "uint1be"},
        {"name": "PORTB2", "field-type": "uint1be"},
        {"name": "PORTB1", "field-type": "uint1be"},
        {"name": "PORTB0", "field-type": "uint1be"}
      ]
    }

You could also indicate at the structure field type level that you need
the whole structure field to be considered this way:

    {
      "fragment": "field-type-alias",
      "name": "uint1be",
      "field-type": {
        "field-type": "int",
        "size": 1,
        "byte-order": "be"
      }
    }

    {
      "field-type": "struct",
      "user-attrs": {
        "basic": {
          "description": "ATmega16 I/O port B"
        },
        "some-ns": {
          "show-as": "flags"
        }
      },
      "alignment": 8,
      "fields": [
        {"name": "PORTB7", "field-type": "uint1be"},
        {"name": "PORTB6", "field-type": "uint1be"},
        {"name": "PORTB5", "field-type": "uint1be"},
        {"name": "PORTB4", "field-type": "uint1be"},
        {"name": "PORTB3", "field-type": "uint1be"},
        {"name": "PORTB2", "field-type": "uint1be"},
        {"name": "PORTB1", "field-type": "uint1be"},
        {"name": "PORTB0", "field-type": "uint1be"}
      ]
    }       
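
On the consumer side, honoring such a user attribute could look like the
following minimal Python sketch. Everything here is illustrative: the
`some-ns` attributes above are only examples, `decode_port` and
`show_as_flags` are hypothetical names, and I'm assuming that with a
big-endian byte order the first 1-bit field of the structure maps to the
most significant bit of the byte.

```python
FIELD_NAMES = ["PORTB7", "PORTB6", "PORTB5", "PORTB4",
               "PORTB3", "PORTB2", "PORTB1", "PORTB0"]


def decode_port(byte: int) -> dict:
    # Assumption: the first field takes the most significant bit.
    return {name: (byte >> (7 - i)) & 1
            for i, name in enumerate(FIELD_NAMES)}


def show_as_flags(fields: dict) -> str:
    # Keep only the fields that are set, joined like C flags.
    return " | ".join(name for name, value in fields.items() if value) or "0"


print(show_as_flags(decode_port(0b11011000)))
# PORTB7 | PORTB6 | PORTB4 | PORTB3
```

The "hide-if-equals" variant would give the same result here: each
field equal to 0 is simply skipped when printing.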

>
>>
>>   Since the interesting values of a boolean field are really _true_ and
>>   _false_, in my opinion we should not care about any signedness here.
>>   If you need this, you can use the new union field type and match a
>>   boolean field type of size X with a (signed, for example) integer
>>   field type of size X.
>>
>> * Do we really need to support other bases than base 10 in the constant
>>   integer JSON object? AFAIK, other bases are not required to encode and
>>   decode any integer value. They're only there to ease human reading of
>>   the metadata stream... however it's pretty much the only place where
>>   such a human-friendly entity is defined, so is it really needed?
> Does this go towards
> 
> *5- A CTF 2 trace /should/ be as easy as possible to consume?*
> 
> *
> I would argue reading some of the json that this will not be the hardest
> thing to convert to human readable.
> *

What do you mean? Are you in favor of supporting the four bases or only 10?

> 
>>   Keep in mind that keeping the support for bases 2, 8, and 16 requires
>>   each single CTF 2 consumer to be able to convert those strings to
>>   integers.
>>
>> * There's a clear relation between some field types that, the way it's
>>   written now, have no common parent.
>>
>>   For example, a variable-length integer field type describes fields
>>   that, once decoded, provide integer values, just like the integer
>>   field type does. However, they have no relation. Even though they both
>>   share a `signed` property, the variable-length integer field type does
>>   not need a `size` property, which is inherited from the bit array
>>   field type.
>>
>>   Same thing for the text array field type vs. the array field type
>>   (former does not need an `element-field-type` property because it's
>>   implicit).
>>
>>   Array field type and sequence field type could also be related by
>>   their common `element-field-type` property, but they are not as of
>>   this version.
>>
>>   Do you have any idea how to bring them into relation with one another
>>   without making the text too heavy? I'm thinking about some kind of
>>   property mixin (or trait?) which could be applied over a field type.
>>   For example, the "integer field type mixin" could define a single
>>   `signed` property and both the integer field type and the
>>   variable-length integer field type could claim to "implement" this
>>   mixin. "Mixin" is probably not the right term. This could simplify
>>   some parts of the text where a field providing an integer value is
>>   needed: the text could read something like "a field type with the
>>   integer field type mixin applied is needed here".
> This is similar to the clock and mapping of CTF 1.8, I don't think
> there's a way around it.

At least this mapping is outside the field type itself now, so both can
be decoupled in the specification and hopefully in the code.

> 
>>
>> * I'm not impressed by the clock field tags, in that we define in an
>>   upper layer an m-map which can be inserted _within_ an m-map which was
>>   defined in a _lower_ layer of the specification.
>>
>>   However I believe it's important that all field tags which target a
>>   specific scope field be in the same fragment that defines the type of
>>   this scope field. For example, all field tags which target the event
>>   record header scope should be part of the data stream class fragment
>>   where this event record header field type is defined.
> See above, it's a bit of a catch 22 unless you make the clock the parent
> of the trace, you will be stuck like this, since a computer has
> (normally) 1 clock and can make many traces.
> 
> This is the danger of having something 100% free-form, it will be
> inherently unstructured.
>>
>>
> One thing I was having a hard time understanding was this:
> *
> 4. The size of any CTF 2 static field, that is, any field with a
> non-dynamic type, /must/ always be the same in data streams to speed up
> the validation in some situations.*
> 
> In other words, a field described by a fixed-size type, or by a compound
> type containing only fixed-size types, /should/ always have the same
> size, no matter its offset in the data stream.
> 
> **This guarantee allows, in certain situations, to greatly speed up the
> validation process by having a great part of it done at the metadata
> stream level.
> 
> Could you give an example please?

Yes.

It is required that all the decoded fields of a union field have the
same size, that is, after aligning the union, and after decoding each
one of them, the current decoding head is at the exact same offset.
Without this requirement, you could produce some very nasty traces.
That's a decision we took, and we believe it's reasonable.

Considering this, let's say you have the following union field type
(simplified view):

    union (alignment: 1):
      S: struct (min. alignment: 1):
        A: int (size: 8, alignment: 8)
        B: int (size: 16, alignment: 16)
      I: int (size: 32, alignment: 16)

Currently, the structure field type has an automatic alignment which
depends on the alignments of its fields (like CTF 1.8): in the case
above, S is 16-bit aligned. What we know for sure about this union
field type is that:

1. To decode S, whatever the current decoding head is, we need to align
   it to the next multiple of 16, and then decode it: its size is always
   32 in this case: 8-bit A, 8-bit padding, 16-bit B.
2. To decode I, whatever the current decoding head is, we need to align
   it to the next multiple of 16, and then decode it: its size is always
   32.

Therefore, just by looking at this union field type, we know that,
_wherever_ it is used to decode a union field in a data stream, its
condition that the size of its fields is always the same is satisfied.

Now, consider the same example, but this time the _effective_ alignment
of the structure field type is 1 (no implicit 16-bit alignment):

    union (alignment: 1):
      S: struct (alignment: 1):
        A: int (size: 8, alignment: 8)
        B: int (size: 16, alignment: 16)
      I: int (size: 32, alignment: 16)

If the current decoding head is at offset 0, then after decoding S, it's
at offset 32 (after decoding I too):

     0: S.A        I
     8: padding    ...
    16: S.B        ...
    32: done       done

If the current decoding head is at offset 8, then after decoding S, it's
at offset 32 too, BUT after decoding I, it's at offset 48:

     8: S.A        padding
    16: S.B        I
    32: done       ...
    48:            done

Therefore the union condition is not satisfied.

Now, it could very well be that, in a given trace, all union fields with
this type are always 16-bit aligned, so the union condition is always
satisfied. But we could not know it statically, just by inspecting the
metadata, as you can see. This explains the "to greatly speed up the
validation process" part: it's easier and faster to catch CTF errors at
the metadata stream level than dynamically, while reading the data
streams.
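
The two scenarios above can be reproduced with a minimal Python sketch.
The helper names (`align`, `end_of_int`, `end_of_struct`, `union_ends`)
are hypothetical; all offsets, sizes, and alignments are in bits, and the
union itself has an alignment of 1 as in the examples.

```python
def align(offset: int, alignment: int) -> int:
    # Round `offset` up to the next multiple of `alignment`.
    return (offset + alignment - 1) // alignment * alignment


def end_of_int(offset: int, size: int, alignment: int) -> int:
    # Offset of the decoding head after decoding one integer field.
    return align(offset, alignment) + size


def end_of_struct(offset: int, struct_alignment: int, members) -> int:
    # `members` is a list of (size, alignment) integer fields.
    offset = align(offset, struct_alignment)
    for size, alignment in members:
        offset = end_of_int(offset, size, alignment)
    return offset


S_MEMBERS = [(8, 8), (16, 16)]  # A and B


def union_ends(start: int, struct_alignment: int):
    # End offsets after decoding S and after decoding I.
    return (end_of_struct(start, struct_alignment, S_MEMBERS),
            end_of_int(start, 32, 16))


for start in (0, 8):
    print(start, union_ends(start, 16), union_ends(start, 1))
# 0 (32, 32) (32, 32)
# 8 (48, 48) (32, 48)
```

With the implicit 16-bit alignment, both members always end at the same
offset; with an effective alignment of 1, they diverge as soon as the
starting offset is not a multiple of 16.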

This is why we chose to keep the automatic/implicit alignment rule of
the structure field type, following this design goal.

Philippe Proulx
EfficiOS Inc.
http://www.efficios.com/
