[lttng-dev] CTF2-PROP-1.0: Proposal for a major revision of the Common Trace Format, version 1.8

Tue Oct 25 18:26:09 UTC 2016

Hello fellow trace format enthusiasts.

This is a proposal for a major revision of CTF v1.8 (to CTF v2).

I strongly suggest that you read the HTML version at:

    http://diamon.org/ctf/files/CTF2-PROP-1.0.html

since the text below is an AsciiDoc source. You should, however,
inline-comment if you have something to say about a specific section of
the document, but _please_ keep only a few lines of context (what's
necessary) above and below your comment, because the text is about 4500
lines long.

Other emails will follow with other CTF 2 documents. We decided to
decouple some features and optional parts of CTF 2 in different
documents so that each one is really focused on its own subject. Then
producers and consumers may comply with this or that document. For
example, as long as a consumer can decode a CTF 2 trace (following the
specification itself), it's not the end of the world if it doesn't know
that a given integer field type prefers to be displayed in base 16.

The other documents are:

* http://diamon.org/ctf/files/CTF2-DOCID-1.0.html
* http://diamon.org/ctf/files/CTF2-BASICATTRS-1.0.html
* http://diamon.org/ctf/files/CTF2-PMETA-1.0.html
* http://diamon.org/ctf/files/CTF2-FS-1.0.html

Feel free to question this proposal!

A few things that still annoy me:

* Should a boolean field type inherit the properties of an integer field
  type instead of a simple bit array field type? In other words, should
  a boolean field type have a signedness property?

  Since the interesting values of a boolean field are really _true_ and
  _false_, in my opinion we should not care about any signedness here.
  If you need this, you can use the new union field type and match a
  boolean field type of size X with a (signed, for example) integer
  field type of size X.

* Do we really need to support other bases than base 10 in the constant
  integer JSON object? AFAIK, other bases are not required to encode and
  decode any integer value. They're only there to ease human reading of
  the metadata stream... however it's pretty much the only place where
  such a human-friendly entity is defined, so is it really needed?

  Keep in mind that keeping the support for bases 2, 8, and 16 requires
  each single CTF 2 consumer to be able to convert those strings to
  integers.

* There's a clear relation between some field types that, the way it's
  written now, have no common parent.

  For example, a variable-length integer field type describes fields
  that, once decoded, provide integer values, just like the integer
  field type does. However, they have no relation. Even though they both
  share a `signed` property, the variable-length integer field type does
  not need a `size` property, which is inherited from the bit array
  field type.

  Same thing for the text array field type vs. the array field type
  (the former does not need an `element-field-type` property because it's
  implicit).

  Array field type and sequence field type could also be related by
  their common `element-field-type` property, but they are not as of
  this version.

  Do you have any idea how to bring them into relation with one another
  without making the text too heavy? I'm thinking about some kind of
  property mixin (or trait?) which could be applied over a field type.
  For example, the "integer field type mixin" could define a single
  `signed` property and both the integer field type and the
  variable-length integer field type could claim to "implement" this
  mixin. "Mixin" is probably not the right term. This could simplify
  some parts of the text where a field providing an integer value is
  needed: the text could read something like "a field type with the
  integer field type mixin applied is needed here".

* I'm not impressed by the clock field tags, in that we define in an
  upper layer an m-map which can be inserted _within_ an m-map which was
  defined in a _lower_ layer of the specification.

  However I believe it's important that all field tags which target a
  specific scope field be in the same fragment that defines the type of
  this scope field. For example, all field tags which target the event
  record header scope should be part of the data stream class fragment
  where this event record header field type is defined.

  Any idea?

Thanks for your comments!

Philippe Proulx
EfficiOS Inc.
http://www.efficios.com/

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

= CTF2-PROP-1.0: Proposal for a major revision of the Common Trace Format, version 1.8
Philippe Proulx <pproulx at efficios.com>
v1.0, 21 October 2016
:toc:
:toclevels: 5
:ieee754: IEEE 754-2008's binary interchange format

This document is an informal proposal for the next major revision of the
**Common Trace Format** (CTF): version 2.0 (hereafter named
_CTF{nbsp}2_).

This is _not_ a formal reference. Some parts of this document, however,
may be formal enough to be eligible for a specification document.

.RFC 2119
NOTE: The key words _must_, _must not_, _required_, _shall_, _shall
not_, _should_, _should not_, _recommended_, _may_, and _optional_ in
this document, when emphasized, are to be interpreted as described in
https://www.ietf.org/rfc/rfc2119.txt[RFC 2119].

== A new workgroup responsible for the publication of CTF documents

The http://diamon.org/[DiaMon Workgroup], a
https://www.linuxfoundation.org/[Linux Foundation] workgroup which
creates de-facto standards and tools for tracing, monitoring, and
diagnostics, is now responsible for the publication of the official
documents about CTF.

The DiaMon Workgroup is also responsible for making available a platform
where interested parties can comment on the proposals related to CTF.

== A new method for identifying CTF{nbsp}2 documents

We suggest that all documents related to CTF{nbsp}2 bear a unique
**document identifier** (ID) having the following format:

    CTF2-<short name>-<major>.<minor>[r<revision>]

[options="header"]
.Descriptions and roles of CTF{nbsp}2 document ID parts
|===
|Part |Description |Bump _may_ introduce new concepts, procedures, and formats? |Bump _may_ remove or change existing concepts, procedures, and formats?

|+<short{nbsp}name>+
|The capitalized short name of the document,
unique amongst all the CTF{nbsp}2 documents.
|N/A
|N/A

|`<major>`
|The major version number of the document.
|Yes
|Yes

|`<minor>`
|The minor version number of the document.
|Yes
|No

|`<revision>`
|
The revision letter of the document (from `A` to `Z`).

Document revisions are used to add examples, clarify existing concepts,
fix grammar or content mistakes, or reword existing parts, for example.
|No
|No
|===

For example, the short name of this document is `PROP`, for _proposal_,
and its full document ID is `CTF2-PROP-1.0`. The next revision would be
`CTF2-PROP-1.0rA`, and the following would be `CTF2-PROP-1.0rB`.

In any CTF{nbsp}2 document, another CTF{nbsp}2 document _must_ be
referred to by using only its ID. For example: _This concept is further
explained in did:CTF2-SOMEID-1.2_. There is no need to refer to a
specific revision: the reference always targets the latest revision of
the document.

We suggest the following IDs for the initial documents:

* did:CTF2-DOCID-1.0: _CTF{nbsp}2 document identifier format_
* did:CTF2-SPEC-2.0: _The Common Trace Format (CTF), version 2.0_
* did:CTF2-BASICATTRS-1.0: _Basic CTF{nbsp}2 user attributes_
* did:CTF2-FS-1.0: _Layout of a CTF{nbsp}2 trace stored on a file system_
* did:CTF2-PMETA-1.0: _CTF{nbsp}2 metadata stream packet format_
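
As an illustration of the ID format above, a tool could split a document
ID into its parts with a regular expression. This is only a sketch: the
name `parse_doc_id` is hypothetical, and the rule that a short name
contains only uppercase ASCII letters and digits is an assumption made
for this example, not something this proposal states.

```python
import re

# ASSUMPTION: a short name is made of uppercase ASCII letters and digits.
DOC_ID_RE = re.compile(
    r"CTF2-(?P<short_name>[A-Z0-9]+)"
    r"-(?P<major>\d+)\.(?P<minor>\d+)"
    r"(?:r(?P<revision>[A-Z]))?"
)


def parse_doc_id(doc_id):
    """Split a CTF 2 document ID into its parts (hypothetical helper)."""
    match = DOC_ID_RE.fullmatch(doc_id)
    if match is None:
        raise ValueError(f"not a CTF 2 document ID: {doc_id!r}")
    return {
        "short_name": match["short_name"],
        "major": int(match["major"]),
        "minor": int(match["minor"]),
        # None when the ID does not name a specific revision
        "revision": match["revision"],
    }


print(parse_doc_id("CTF2-PROP-1.0rA"))
```

An ID without a revision part, such as `CTF2-PROP-1.0`, yields `None`
for the revision, which matches the rule that such a reference targets
the latest revision.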

== Why CTF{nbsp}2?

Why do we need a major version bump of the CTF specification?

A major version bump is never an easy choice when it comes to revisiting
a software library, a communication protocol, a file format, or anything
that serves as a contract between a producer and a consumer. When such a
decision is taken, it must be justified by solid arguments, since it
makes it impossible for old consumers to consume the product of new
producers.

In this proposal, for instance, CTF{nbsp}2 traces are not backward
compatible with CTF{nbsp}1 traces. Although the binary format is not
changed, the metadata stream is written in a different language.

CTF{nbsp}1 has been used and tested for many years now, by different
producers and consumers. Over that time, we have noted a few gaps in the
trace format, gaps that prevent us from extending CTF{nbsp}1 as much as
we would like, amongst other things. CTF{nbsp}2 is designed to overcome
those gaps, as far as we know them, and to be flexible enough to
gracefully accept future additions while avoiding another major version
bump in the following years.

[[design-goals]]
=== Design goals

The design goals of CTF{nbsp}2 are as follows, in order, beginning with
the most important:

. **CTF{nbsp}2 data streams _must_ be backward compatible with
  CTF{nbsp}1 data streams.**
+
Many applications are already programmed to write valid CTF{nbsp}1
packets. Modifying the code of those applications to produce different
binary packets can be cumbersome, and sometimes impossible if the
application passed acceptance tests, for example.
+
Making sure that applications producing CTF{nbsp}1 traces can also
produce CTF{nbsp}2 traces only by changing the metadata stream is an
absolute necessity.

. **The CTF{nbsp}2 streams _must_ be as efficient as possible to produce
  by a tracer.**
+
This design goal was also one of the major ones which motivated the
design of CTF{nbsp}1.
+
In other words, a small embedded system _must_ be able to produce
CTF{nbsp}2 streams natively. Moreover, the tracer _must_ be able to copy
binary data to the packet buffer of a data stream without altering it.

. **CTF{nbsp}2's model _should_ be backward compatible
  with CTF{nbsp}1's.**
+
Some APIs are already written to deal with CTF{nbsp}1 ``objects'', or
concepts (event record classes, event records, field types, and data
stream clock classes, to name a few). The model of CTF{nbsp}2 _should_
be compatible with the model of CTF{nbsp}1, so that those existing APIs
can operate on CTF{nbsp}2 objects too without requiring huge
refactorings.

. **The size of any CTF{nbsp}2 static field, that is, any field with a
  non-dynamic type, _must_ always be the same in data streams to speed
  up the validation in some situations.**
+
In other words, a field described by a fixed-size type, or by a compound
type containing only fixed-size types, _should_ always have the same
size, no matter its offset in the data stream.
+
This guarantee makes it possible, in certain situations, to greatly
speed up the validation process by having a great part of it done at the
metadata stream level.

. **A CTF{nbsp}2 trace _should_ be as easy as possible to consume.**
+
CTF{nbsp}1 focuses on being easy to produce, which is a good idea
since producers are tracers in this context, and a minimal tracer
_should_ be able to produce a correct CTF trace with minimal code
(design goal 2).
+
However, because CTF{nbsp}1's metadata stream is written in TSDL, a
custom, declarative, C-like DSL designed for CTF, writing a minimal
consumer of CTF{nbsp}1 is not an easy task. TSDL is an intricate
language, with many special cases, many of which are borrowed from the C
language, which cannot be ignored when writing a consumer supporting all
its features. TSDL was developed to ease the _manual_ (human) production
of metadata streams.
+
Over time, we realized that, while producing traces is important,
consuming them to solve problems by analysing the event records is just
as important, if not more.
+
This is why CTF{nbsp}2 _should_ encourage the development of CTF
consumers in any programming language by reducing the number of special
cases, as well as by using a very simple, yet well-known grammar for the
metadata stream.
+
CTF{nbsp}1 tries to accommodate other trace formats, which can be
converted to CTF without changing the data streams by writing the
matching metadata stream. This is also a source of special cases.
CTF{nbsp}2 _should_ build on binary trace conversion (from another,
non-CTF trace format to CTF{nbsp}2) rather than trying to accommodate
other formats.

. **A CTF{nbsp}2 metadata stream _must_ be extensible by users and by
  future minor revisions of the specification (forward compatibility).**
+
CTF{nbsp}1's TSDL grammar is pretty restrictive when it comes to
customizing existing blocks with user-defined attributes.
+
Many protocols and declarative languages support custom user data in
their payload. For example, HTML5 allows any element to have user
attributes by prefixing their names with `data-`.
+
A producer of CTF{nbsp}2 _should_ be able to add custom attributes to
almost any object defined by the specification. This allows standard
consumers to read any CTF{nbsp}2 trace and ignore unknown user
attributes, providing a ``bland'', yet complete view of the trace
fields, while special consumers can be written (or existing consumers
can be extended) to interpret specific user attributes and use them to
present a meaningful visualization.
+
It is also possible for the DiaMon Workgroup to publish a proposal for
new properties for a next minor revision of the specification, but test
them as temporary user attributes for some time in order to collect
comments before updating the specification itself.
+
This design goal also means that an ``old'' CTF{nbsp}2 consumer _should_
be able to decode a ``new'' CTF{nbsp}2 trace, possibly with missing
field semantics if field types are added in minor revisions.

. **CTF{nbsp}2's specification _must_ focus on how to encode and decode
  data streams.**
+
CTF{nbsp}1 has a few base properties in its metadata stream that are not
strictly needed to encode or decode data streams. For example, the
`base` property of the `integer` field type is only useful to visualize
the decoded integer fields: the decoding process does not depend on a
preferred radix. Also, the name of an event record class is not needed
to decode the event records it describes: only its numeric ID is needed
to select the appropriate context and payload field types to use for the
encoding/decoding process.
+
All the object properties defined by the CTF{nbsp}2 specification
document _should_ exist only because they have a role in the
encoding/decoding process. Everything else _should_ be delegated to
other documents which define optional extension layers using the
mechanisms designed as a response to design goal 6.

. **CTF{nbsp}2's specification _must not_ specify how to transport a
  trace, nor how a trace should be stored.**
+
In the CTF{nbsp}2 specification, a CTF{nbsp}2 trace _should_ be defined
as a set of bit streams, without specifying how those streams are
transported or stored. Other official documents published by the DiaMon
Workgroup can define standard ways to support and transport CTF{nbsp}2
streams for specific use cases. Trace producers and consumers can choose
to implement one or more transport/storage mechanisms by following the
other documents.

== Changes since CTF{nbsp}1

Here is a brief summary of the changes, from CTF{nbsp}1 to CTF{nbsp}2,
introduced by this proposal.

* The terminology of the specification and the binary layouts of the
  data streams are **completely detached from the C language** and from
  the behaviour of any C compiler.
+
CTF{nbsp}2 is a programming language-agnostic trace format.

* **Terminology update**:
** _Binary stream_ → _data stream_.
** _Trace_ → _trace class_, when it names a block of
   metadata which describes traces.
** _Stream_ → _data stream class_, when it names a
   block of metadata which describes data streams.
** _Stream ID_ → _data stream class ID_.
** _Stream instance ID_ → _data stream ID_.
** _Event_ → _event record class_, when it names a
   block of metadata which describes event records.
** _Event ID_ → _event record class ID_.
** _Event_ → _event record_, when it names an actual
   recorded event contained in a packet.
** _Declaration_ → _field type_.
** _Type alias_ → _field type alias_.
** _Clock_ → _data stream clock class_, when it names a
   block of metadata which describes actual data stream clocks.
** _Clock_ → _data stream clock_, when it names a
   per-data stream instance of a specific data stream clock class.
** _Native byte order_ → _default byte order_.
** _Stream packet context_ → _data stream packet context_.
** _Stream event header_ → _data stream event record header_.
** _Stream event context_ → _data stream event record context_.
** _Event context_ → _event record context_.
** _Event fields_ → _event record payload_.
** _Events discarded_ → _discarded event record count_.
** _Packet size_ → _packet's total size_.
** _Content size_ → _packet's content size_.
* **The <<metadata-stream,metadata stream>> is written in JSON.** JSON
  values are used to represent <<metadata-types,metadata types>>, the
  internal type system of the CTF{nbsp}2 metadata.
* The CTF{nbsp}2 specification **does not specify a ``packetized''
  metadata stream format**. This is a back-end-specific way of wrapping
  a <<metadata-stream,metadata stream>> as defined in this document.
  Another document specifies the packetized metadata stream format,
  which can be used by standard storage and transport documents if
  needed.
* Most objects (called mdt:map values) of the CTF{nbsp}2 metadata model
  _may_ contain **<<user-attrs,custom user attributes>>**.
* **Properties that are not strictly needed to _encode_ or _decode_ the
  data streams are defined as user attributes in other documents.**
  This includes, but is not limited to, the trace's environment,
  the event record class's name and log level, and the integer field
  type's preferred display base (CTF{nbsp}1's `base` property).
* **New field types**:
** A <<bitarray-field-type,**bit array** field type>> describes the most
   primitive fields of a data stream actually holding values.
+
An integer field type is now defined as a bit array field type with a
signedness property.
+
A floating point number field type is now defined as a bit array field
type: its size property is enough to determine its encoding under
{ieee754}.

** A <<null-field-type,**null** field type>>, which represents
   nonexistent, or missing fields. The intended use of this field type
   is as one of the possible types of a variant field type, especially
   when representing the types of a dynamically-typed language using a
   variant field type.
+
This field type _should_ be used to represent Python's `None`, Java's
and JavaScript's `null`, and Ruby's `nil`, for example.

** A <<bool-field-type,**boolean** field type>>, which is a bit array
   field type with a special meaning. When all the bits of a boolean
   field are cleared, the field's value is said to be _false_.
   Otherwise, the field's value is said to be _true_.
** A <<varbitarray-field-type,**variable-length bit array** field
   type>>. Each byte of a field having this type has its most
   significant bit set if one more byte must be encoded. The 7 low-order
   bits of each byte are concatenated in a specific way to form the
   final, equivalent bit array.
** A <<varbool-field-type,**variable-length boolean** field type>>, a
   <<varint-field-type,**variable-length integer** field type>>, and a
   <<varenum-field-type,**variable-length enumeration** field type>>,
   which use the encoding mechanism of the variable-length bit array
   field type.
** A <<textarray-field-type,**text array** field type>> and a
   <<textsequence-field-type,**text sequence** field type>>, which are
   specialized versions of the array field type and sequence field type
   with a byte as the element field type. Text array and sequence fields
   hold possibly null-terminated UTF-8 string values.
** A <<union-field-type,**union** field type>>, which provides a list of
   one or more field types, each of which is an alternative
   representation of the same binary field.
+
This mechanism _may_ be used to add field types in minor revisions of
CTF{nbsp}2, while still ensuring the forward compatibility of ``old''
consumers.

* **Modified field types**:
** All CTF{nbsp}2 field types have an **alignment** property. Some field
   types, however, impose alignment constraints to match the constraints
   of CTF{nbsp}1's field types and to make consumers easier to develop.
** The default alignment of an <<int-field-type,integer field type>>
   is{nbsp}1. It used to be{nbsp}8 if the size is a multiple of{nbsp}8,
   and{nbsp}1 otherwise, in CTF{nbsp}1.
** The **`base` and `encoding` properties are removed** from the
   <<int-field-type,integer field type>>. The `encoding` property is
   used in CTF{nbsp}1 to indicate that a byte is a UTF-8 character, for
   example, but since some UTF-8 characters are encoded on more than one
   byte, this property makes no sense here. The new
   <<textarray-field-type,text array field type>> and the
   <<textsequence-field-type,text sequence field type>> _may_ be used to
   achieve the same result instead.
** The **`exp` and `mant` properties are removed** from the
   <<float-field-type,floating point number field type>>. As mentioned
   above, a floating point number field type is now defined as a bit
   array field type, which has a `size` property to indicate its total,
   fixed size, in bits. A floating point number field encoded following
   {ieee754} can be decoded knowing only this parameter, the storage
   width of the bit array, from which other parameters can be deduced
   according to the standard.
+
In other words, as long as CTF{nbsp}1's floating point number field type
follows {ieee754} for encoding its fields, only specific pairs of
`exp` and `mant` properties are valid: 8 and 24, 11 and 53, 15 and 113,
19 and 237, and so on.

** The **`encoding` property** of the <<string-field-type,string field
   type>> **is removed**. CTF{nbsp}2 string fields always contain a
   sequence of bytes which encode a string with UTF-8. If a string is
   encoded with the `ascii` encoding in a CTF{nbsp}1 data stream, it's
   still valid in CTF{nbsp}2 since an ASCII string is a UTF-8 string.

* The mechanism by which a relative variant field type's tag or sequence
  field type's length is searched for in preceding scopes (trace packet
  header, data stream packet context, event record context, event record
  payload, etc.) when it's not found in the current scope is removed. It
  has not been shown to be useful and it adds complexity to consumers. It
  is always possible to convert a CTF{nbsp}1 metadata stream using this
  behaviour to a CTF{nbsp}2 metadata stream not using it.
* **``Special'' fields**, such as the magic number field, the data
  stream class ID field, and the packet's total size field, **can have
  any name**: they are _tagged_ with specified tag names by the producer
  for the consumer to know that they have a special meaning without
  relying on reserved names like CTF{nbsp}1's `magic`, `stream_id`, and
  `packet_size`.
+
With this mechanism, it's possible for a field to be tagged multiple
times, with different meanings.

* The frequency of a <<data-stream-clock-class-fragment,data stream
  clock class>> no longer defaults to the arbitrary 1{nbsp}GHz (it's a
  mandatory property).
* There are no more lexical scopes to limit the scope of field type
  aliases and other definitions. <<field-type-alias-fragment,Field type
  aliases>> must have unique names within the whole metadata stream.
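
Some of the field type changes listed above can be sketched in a few
lines of Python. The following is an illustration under stated
assumptions, not a reference decoder. The function names are
hypothetical. Because this summary only says that the 7-bit groups are
concatenated "in a specific way", the sketch assumes that the first byte
carries the least significant group (as in LEB128). The exponent and
precision widths are those defined by IEEE 754-2008 for a given storage
width.

```python
import math


def decode_bool(bits):
    """A boolean field is false only when all its bits are cleared."""
    return bits != 0


def decode_varlen_bit_array(data, offset=0):
    """Decode one variable-length bit array field starting at `offset`.

    Returns (value, bit count, byte count). ASSUMPTION: the first byte
    carries the 7 least significant bits (LEB128-like order).
    """
    value = 0
    bit_count = 0
    pos = offset
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << bit_count
        bit_count += 7
        if not byte & 0x80:  # MSB cleared: this was the last byte
            return value, bit_count, pos - offset


def ieee754_exp_mant(size):
    """Return CTF 1's (exp, mant) pair implied by a storage width in bits."""
    known = {16: (5, 11), 32: (8, 24), 64: (11, 53)}
    if size in known:
        return known[size]
    if size >= 128 and size % 32 == 0:
        # Exponent width rule of IEEE 754-2008 for widths of 128 bits and more
        exp = round(4 * math.log2(size)) - 13
        return exp, size - exp
    raise ValueError(f"no IEEE 754-2008 binary interchange format of {size} bits")
```

For example, `ieee754_exp_mant(32)` gives the pair 8 and 24, and
`ieee754_exp_mant(128)` gives 15 and 113, which is why the size alone is
enough to decode a floating point number field.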

== CTF{nbsp}2 actors

There are two main _actors_ when it comes to tracing:

Producer::
  A software or hardware system which produces (writes) the streams of a
  trace.
+
A trace producer is often called a _tracer_.
+
A producer is only concerned with how to write the metadata stream and
how to encode supported values as CTF{nbsp}2 data fields and serialize
them to one or more data streams.

Consumer::
  A software or hardware system which consumes (reads) the streams of a
  trace.
+
A trace consumer is often called a _trace viewer_ or a _trace analyzer_.
+
A consumer is only concerned with how to read and interpret the metadata
stream, and how to deserialize CTF{nbsp}2 data fields from data streams
and decode them to retrieve the values they represent.

Note that a piece of software can be both a consumer and a producer.
This is the case of a trace converter, for example.

== What is a CTF{nbsp}2 trace?

A _CTF{nbsp}2 trace_ is a set of zero or more <<data-streams,data
streams>> and exactly one <<metadata-stream,metadata stream>>.

The data streams contain actual packets of event records, while the
metadata stream contains information on how to interpret the data
streams of the same trace.

The support or transport of data streams and the metadata stream is not
specified. A stream _may_ be serialized as a single file on a file
system, or it _may_ be sent over the network using TCP, to name a few
examples. The mechanism to identify the metadata stream amongst the
streams of a trace is also not specified.

[[metadata-types]]
=== Metadata types

Many of the following sections of this document describe the _required_
and _optional_ properties of _metadata mdt:map values_. All the metadata
mdt:map values can be represented using a defined set of types. The
values allowed by those types have no specific textual or binary
representation.

To avoid any confusion with field types and JSON types, the `m-` prefix
is used before the names of the metadata types. An **m-value** is any
value allowed by any of the following types.

[NOTE]
.Not to be confused with _field types_!
====
The metadata types define the possible values that can be used to define
the metadata mdt:map values, for example:

* Integer field type mdt:map
* Structure field type mdt:map
* Event record class mdt:map
* Data stream class mdt:map

As an example, here's a JSON representation of a possible integer field
type mdt:map:

[source,json]
----
{
  "field-type": "int",
  "size": 23,
  "alignment": 16
}
----

Here:

* The whole JSON object represents an mdt:map value.
* `"field-type"` represents an mdt:string value, which is an mdt:map key
  here.
* `23` represents an mdt:number value.

The whole mdt:map value represents an integer field type. This integer
field type can be used to encode and decode integer values to and from
binary data fields, depending on where exactly this mdt:map is placed
within the whole <<metadata-array,metadata mdt:array>>.
====

The metadata types are:

mdt:null::
  Null type: the only possible value of this type is the _null_
  value.

mdt:bool::
  Boolean type, that is, the following set of values:
+
--
* _True_
* _False_
--

mdt:int::
  Integer type, that is, the set of all the integer values: negative,
  zero, and positive.

mdt:number::
  Number type, that is, the set of all the rational numbers that can be
  represented with the
  https://en.wikipedia.org/wiki/Decimal_representation[decimal
  representation].

mdt:string::
  String type, that is, the set of all the possible finite sequences of
  Unicode characters, including the zero-length sequence.

mdt:array::
  Array type, that is, the set of all the possible finite sequences of
  any m-values, including the zero-length sequence.

mdt:map::
  Unordered map type, that is, the set of all the possible sets of
  (mdt:string value, m-value) pairs, including the zero-length set. The
  mdt:string value in a given pair is called the _key_ of the
  association. An association _may_ also be called a _property_.

NOTE: For reasons of brevity and style, the word _value_ after a
metadata type name is sometimes discarded in this text. For example,
_you can use an mdt:int to..._ means _you can use an mdt:int value
to..._.

[[json]]
=== Bidirectional association between metadata types and JSON values

The <<metadata-types,metadata types>> can be bidirectionally mapped to
http://json.org/[JSON] values as follows:

.Bidirectional association between metadata types and JSON values
[options="header"]
|===
|Metadata type |JSON values

|mdt:null
|
The _null_ m-value is mapped to the JSON `null` value.

|mdt:bool
|
The _true_ and _false_ m-values are mapped to the `true` and `false`
JSON values respectively.

|mdt:int
|
Any allowed m-value is mapped to a JSON number without a
fractional part or to a <<const-int,constant integer JSON object>>.

|mdt:number
|
Any allowed m-value is mapped to a JSON number or to
a <<const-int,constant integer JSON object>>.

|mdt:string
|
Any allowed m-value is mapped to a JSON string.

|mdt:array
|
Any finite sequence is mapped to a JSON array, where each
m-value is mapped to a JSON value using this table.

|mdt:map
|
Any set of associations is mapped to a JSON object, where each pair's
mdt:string value is mapped to a key (JSON string, using this table) in
this JSON object, and its associated m-value is mapped to a JSON value
using this table and associated to this key.
|===

All the examples of m-values in this document use this mapping to show
textual representations.
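
The mapping above can be sketched as a function which reports which
metadata types a parsed JSON value can represent. This is a minimal
illustration using Python's `json` module; the function name and the
returned labels are made up for this example. It reports any JSON object
as a map: recognizing a constant integer JSON object requires knowing
which type a given property expects, which this sketch ignores.

```python
import json


def metadata_types(json_value):
    """Return the names of the metadata types `json_value` can represent."""
    if json_value is None:
        return {"null"}
    # bool must be tested before int: Python's bool is a subclass of int
    if isinstance(json_value, bool):
        return {"bool"}
    if isinstance(json_value, int):
        # a JSON number without a fractional part
        return {"int", "number"}
    if isinstance(json_value, float):
        return {"number"}
    if isinstance(json_value, str):
        return {"string"}
    if isinstance(json_value, list):
        return {"array"}
    if isinstance(json_value, dict):
        return {"map"}
    raise TypeError(f"not a parsed JSON value: {json_value!r}")


field_type = json.loads('{"field-type": "int", "size": 23, "alignment": 16}')
for key, value in field_type.items():
    print(key, sorted(metadata_types(value)))
```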

[[const-int]]
==== Constant integer JSON object

Unfortunately, JSON does not support binary, octal, or hexadecimal
constant integers. Also, it is known that some JSON parsers have a
limited support for big integers (generally, integer values which do not
fit a 64-bit representation). A constant integer JSON object can
represent an mdt:int value or an mdt:number value without a fractional
part.

It is _recommended_ to use a constant integer JSON object instead of a
JSON number when the mdt:int or mdt:number value to represent is less
than -9223372036854775808 or greater than 9223372036854775807 (signed
64-bit range).

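A constant integer JSON object has the following properties:

.Constant integer JSON object properties
[options="header"]
|===
|Name |Allowed values |Description |Required? |Default value

|`value`
|JSON string
|Digits of the integer in radix `base`, optionally preceded by `-`.
|Required
|N/A

|`base`
|`2`, `8`, `10`, and `16`
|Radix of the number.
|Optional
|`10`
|===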

.Constant integer JSON object: positive decimal integer
====
Equivalent to 2876321721982327:

[source,json]
{"value": "2876321721982327"}
====

.Constant integer JSON object: negative binary integer
====
Equivalent to -253339:

[source,json]
{"base": 2, "value": "-111101110110011011"}
====

.Constant integer JSON object: positive octal integer
====
Equivalent to 420:

[source,json]
{"base": 8, "value": "644"}
====

.Constant integer JSON object: positive hexadecimal integer
====
Equivalent to 3735928559:

[source,json]
{"base": 16, "value": "deadbeef"}
====

.Constant integer JSON object: negative decimal integer
====
Equivalent to -2317:

[source,json]
{"value": "-2317"}
====
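
Converting a constant integer JSON object to an integer is
straightforward in a language with arbitrary-precision integers. The
sketch below is one possible consumer-side helper (the name
`const_int_value` is hypothetical). It assumes that hexadecimal digits
may be written in either case and that no prefix such as `0x` is
allowed; the text above does not settle either point.

```python
import json

DIGITS = "0123456789abcdef"


def const_int_value(obj):
    """Return the integer represented by a constant integer JSON object."""
    base = obj.get("base", 10)  # the radix defaults to 10
    if not isinstance(base, int) or base not in (2, 8, 10, 16):
        raise ValueError(f"unsupported base: {base!r}")
    text = obj["value"]
    digits = text[1:] if text.startswith("-") else text
    # ASSUMPTION: digits are case-insensitive; prefixes like "0x" are rejected
    allowed = DIGITS[:base]
    if not digits or any(ch not in allowed for ch in digits.lower()):
        raise ValueError(f"invalid base-{base} integer: {text!r}")
    return int(text, base)


print(const_int_value(json.loads('{"base": 16, "value": "deadbeef"}')))
```

The explicit digit check is there because Python's `int()` is more
lenient than a metadata format needs to be: it also accepts surrounding
whitespace, underscores between digits, and a radix prefix.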

[[metadata-array]]
=== Metadata mdt:array

The metadata mdt:array is an mdt:array of **fragments** which contains
all the metadata information of a given trace.

A fragment is an m-value.

A fragment is either:

* The version fragment, that is, an mdt:string which is always `CTF 2`.
* One of the other allowed fragments, which are described in the upper
  layers of CTF{nbsp}2.

The first fragment in the metadata mdt:array is always the version
fragment. It is followed by one or more fragments, as described by the
upper layers of CTF{nbsp}2.

[[metadata-stream]]
=== Metadata stream

A _metadata stream_ is the <<json,JSON representation>> of a
<<metadata-array,metadata mdt:array>>, that is, a UTF-8 JSON array which
is written by the producer to describe the data streams of the same
trace.
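
To illustrate how little is needed to start consuming a metadata stream,
here is a sketch which decodes one and checks the constraints on the
metadata mdt:array stated above. The function name is hypothetical, and
the second fragment in the usage example is a placeholder, since the
fragments other than the version fragment are defined by the upper
layers.

```python
import json


def load_metadata_array(stream):
    """Decode a metadata stream (bytes) into a list of fragments."""
    fragments = json.loads(stream.decode("utf-8"))
    if not isinstance(fragments, list):
        raise ValueError("a metadata stream must be a JSON array")
    if not fragments or fragments[0] != "CTF 2":
        raise ValueError("the first fragment must be the version fragment")
    if len(fragments) < 2:
        raise ValueError("the version fragment must be followed by a fragment")
    return fragments


# The second fragment is a placeholder, not a real CTF 2 fragment.
fragments = load_metadata_array(b'["CTF 2", {"placeholder": true}]')
print(len(fragments))
```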

The rationale for choosing JSON over another representation, for example
TSDL (CTF{nbsp}1's metadata language), is as follows:

. JSON can represent all the possible m-values using the
  <<json,m-value to JSON value association table>>.
. JSON is a simple language to consume. A very basic JSON parser
  can be written in a few hundred lines of C code. Moreover, tested
  and documented JSON parsers exist for all the major programming
  languages.
. JSON is a very simple language to produce.
. JSON strings support Unicode.

One of the <<design-goals,design goals>> of CTF{nbsp}2 is to make
consumption as easy as possible. Relieving the burden of implementing a
custom TSDL parser is a substantial part of how this goal is achieved.

Keep in mind that this JSON metadata is expected to be generated by
machines. Shortcuts that would save human beings time are therefore
avoided in favor of easier consumption, without compromising easy and
fast machine generation.

[[data-streams]]
=== Data streams

A CTF{nbsp}2 _data stream_ is a sequence of packets.

Each packet starts with an _optional_ header field followed by an
_optional_ context field, after which is a sequence of event records.

An event record starts with an _optional_ header, followed by an
_optional_ context field defined at the data stream class level,
followed by an _optional_ context field defined at the event record
class level, followed by an _optional_ payload field. An event record's
total binary size _must_ be greater than 0.

=== Summary

A _CTF{nbsp}2 trace_ is a set of:

* One <<metadata-stream,metadata stream>>, which is the UTF-8 JSON
  representation of a <<metadata-array,metadata mdt:array>>. A metadata
  mdt:array is an mdt:array containing fragments. Fragments are m-values
  which describe properties of the trace and the binary layouts of its
  various parts.
* Zero or more <<data-streams,data streams>>. A data stream contains
  packets. A packet contains event records. The layouts of packet
  headers and contexts, and of event record headers, contexts, and
  payloads, are described by the metadata stream of the same trace.

== Structure of the CTF{nbsp}2 specification

We suggest that the concepts of CTF{nbsp}2 be presented in the
specification document as **three layers**:

. The first layer, named the <<ctffer,CTF{nbsp}2 field encoding rules>>
  (CTFFER), shows **how to encode common programming language values as
  binary data fields according to their descriptions, field types**.
  This is a serialization protocol. This is the foundation of
  CTF{nbsp}2, in that the other layers need data fields to have any
  meaning. This layer is independent of the tracing domain, in that it
  can be used to encode any self-described bit stream for any
  application.

. The <<layer-2,second layer>> adds the concept of **packets** and
  **event records**, and how different layouts of packet header and
  context fields, and of event record header, context, and payload
  fields, _may_ exist within different data streams and event records of
  the same trace thanks to **data stream class fragments** and **event
  record class fragments** in the metadata stream.
+
This layer also introduces the field type alias fragment and the trace
class fragment.

. The <<layer-3,third layer>> adds the concept of **time** (clocks).
  Clocks are essential data stream variables in a CTF{nbsp}2 trace
  because they associate event records and packets with one or more
  points in time. Data stream clocks are sampled by the producer when
  writing specific, regular data fields. They are updated by the
  consumer when reading the corresponding fields. Data stream clocks are
  described by **data stream clock class fragments** in the metadata
  stream.

CTF{nbsp}2 is designed so that each layer _may_ be implemented in its
own software package. This structure separates the concepts of
CTF{nbsp}2 into different sections of the text, making it easier to
read. This structure should also make testing easier.

Each layer depends on the previous one.

=== Compliance

A _CTF{nbsp}2 producer_, either a piece of software or a machine,
_must_ implement the first two layers of did:CTF2-SPEC-2.0.

A _CTF{nbsp}2 consumer_, either a piece of software or a machine,
_must_ implement all three layers of did:CTF2-SPEC-2.0.

[[ctffer]]
== Layer 1: CTF{nbsp}2 field encoding rules (CTFFER)

The _CTF{nbsp}2 field encoding rules_, or CTFFER, dictate how to
serialize a _value_ to a binary data field by using the properties of a
_field type_, that is, an mdt:map which describes a set of possible
binary data field values. The field type can later be used to
deserialize a data field back to a value.

The representation of a _value_ depends on the programming language in
which the CTF{nbsp}2 producer or consumer is written. For example,
a producer written in C _may_ serialize an `int` variable as a
CTF{nbsp}2 integer field described by an <<int-field-type,integer field
type>> having the appropriate size, alignment, byte order, and
signedness properties to accommodate any value that an `int` variable
could hold for a specific architecture (32-bit for IA-32 and 16-bit for
AVR, for example). However, since a Python{nbsp}3 `int` object can hold
any integer value, a better choice for a Python{nbsp}3 producer would be
to serialize such an object as a CTF{nbsp}2 variable-length integer
field described by a <<varint-field-type,variable-length integer field
type>>.

The following subsections only describe how to encode values as binary
fields by using field types. The specification does not suggest specific
field type configurations. It is up to the producer side to choose
appropriate field type properties depending on its environment. The
procedures to encode values presented in the following subsections are
very generic: they take no account of optimizations that would be
possible in specific situations. For example, it is often possible, in
programming languages which are aware of their memory layout, to encode
a whole complex structure of values by a simple memory copy to the data
stream, as long as appropriate field types are used to describe this
exact memory layout for further correct decoding. It is in order to
satisfy those situations, which support faster producers, that field
types are flexible: a producer can encode some value in various ways by
choosing the resulting field's alignment, byte order, size, and the rest
within the data stream destination.

The values which can be encoded are:

Null values::
  A value with no size, which usually represents missing or unknown
  data.

Bit array values::
  A finite sequence of contiguous bits without a specific meaning. The
  required size, in bits, to represent a bit array value can be known
  statically or dynamically.

Boolean values::
  A bit array value which is either _false_ (all bits are cleared) or
  _true_ (anything else).

Integer values::
  A bit array value which represents an integer (signed or not).

Number values::
  A real number which can be represented with {ieee754}.

Enumeration values::
  An integer value with an associated label (known by its type).

String values::
  A finite sequence, with a length known dynamically, of Unicode
  characters.

Structure values::
  A finite sequence of values of different types.

Array values::
  A finite sequence, with a length known statically, of values sharing
  the same type.

Sequence values::
  A finite sequence, with a length known dynamically, of values sharing
  the same type.

The CTFFER also supports _variant_ and _union_ fields. A variant field is
an encoded value of a given type amongst many possible types. This type
is dynamically chosen by a tag (a previously encoded enumeration value).
A union field is a data field which represents, at the same time,
different values of different types.

[[byte-order]]
=== Byte order mdt:string

A _byte order mdt:string_ is one of the following values:

`default`::
  Use default byte order.

`be`::
  Big-endian.

`le`::
  Little-endian.

CTF{nbsp}2 does not support middle or mixed endianness.

[[user-attrs]]
=== User attributes mdt:map

Many metadata mdt:map values described in this document may have a
`user-attrs` property, which _must_ be set to an mdt:map, if set at all.

Each key of the user attributes mdt:map is a **namespace**. The value of
a given key is the custom user attribute within this namespace (any
m-value is valid).

The format of a namespace is not specified. It is _recommended_ to use a
URI, or at least to include a domain name owned by the organization
defining the attributes nested under this namespace. A UUID is also a
reasonable option.

What to do with those user attributes from a consumer's standpoint is
not specified by this document. The values of those attributes are _not_
needed to decode the CTF data streams, and _may_ be safely ignored by
any CTF{nbsp}2 consumer.

It is expected that ``industry standards'' defining sets of useful
attributes within given namespaces will emerge naturally over time.
Producers and consumers supporting the same attributes can enhance the
experience of the whole tracing ecosystem.

It is expected that user attributes usually fall into one of the
following categories:

* **Model**: Information about the application data model of an object.
* **Textual style**: Style attributes/hints that could be applied to a
  textual rendering of the object (color, font attributes, print format,
  etc.).
* **Graphical style**: Style attributes/hints that could be applied to a
  graphical output of the object.

.User attributes mdt:map with namespace `diamon.org/ctf/ns/std`
====
JSON representation:

[source,json]
{
  "diamon.org/ctf/ns/std": {
    "base": 16
  }
}
====

.User attributes mdt:map with different namespaces
====
JSON representation:

[source,json]
{
  "diamon.org/ctf/ns/std": {
    "name": "sched_switch",
    "ns": "lttng.org/ctf-ns/modules/2.9"
  },
  "lttng.org/ctf-ns": {
    "tmp-event": true,
    "ignore-ip": true
  }
}
====

.User attributes mdt:map with namespace `mytracer.org/ctf-ns/hints`
====
JSON representation:

[source,json]
{
  "mytracer.org/ctf-ns/hints": {
    "format-string": "{src} sent {size} bytes to {dst} at {addr}"
  }
}
====

Although _not recommended_, an empty mdt:string is a valid namespace:

.User attributes mdt:map with empty namespace
====
JSON representation:

[source,json]
{
  "": {
    "my-option": 23,
    "include": ["this", "and", "that"]
  }
}
====

The value of user attributes for a given namespace need not be an
mdt:map:

.User attributes mdt:map with an mdt:number attribute
====
JSON representation:

[source,json]
{
  "my namespace": -17.22
}
====
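
The following Python sketch shows how a consumer might read a user
attribute while staying tolerant of missing namespaces and of
namespaces whose value is not an mdt:map. The `user_attr` helper is
hypothetical, and treating the `base` attribute of the
`diamon.org/ctf/ns/std` namespace as a preferred display base is an
assumption made only for this illustration.

[source,python]
----
def user_attr(m_map, namespace, default=None):
    """Return the user attribute stored under `namespace`, if any."""
    return m_map.get("user-attrs", {}).get(namespace, default)


field_type = {
    "field-type": "int",
    "size": 32,
    "user-attrs": {"diamon.org/ctf/ns/std": {"base": 16}},
}

std = user_attr(field_type, "diamon.org/ctf/ns/std", {})
# Any m-value is valid under a namespace, so check the type before use.
display_base = std.get("base", 10) if isinstance(std, dict) else 10
----

A consumer which does not know a namespace simply never asks for it,
which is how user attributes can be safely ignored.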

[[scope]]
=== Scope

A _scope_ is a specific field within a data stream. The exact location
of a scope within a data stream depends on the current encoding context.
The upper layer defines which scopes are available in its context and
how to find them by name (mdt:string).

[[field-path]]
=== Field path m-value

A _field path m-value_, used by <<sequence-field-type,sequence>> and
<<variant-field-type,variant>> field types, is a path leading to a
previously encoded data field by ``digging'' into structure and union
fields. It can be either _relative_ (starting from a known field), or
_absolute_ (starting from a user-specified scope field).

[[abs-field-path]]
==== Absolute field path mdt:map

An _absolute field path mdt:map_ defines a field path from a specific
scope.

The `path` property _may_ be empty: this targets the scope field itself.

.Absolute field path mdt:map
====
JSON representation:

[source,json]
{
  "scope": "data-stream-packet-context",
  "path": ["path", "to", "cpu_id"]
}
====

==== Relative field path mdt:array

A _relative field path mdt:array_ is an mdt:array of field names
(mdt:string values) to follow, starting from the sequence/variant field
using the field path.

.Relative field path mdt:array
====
JSON representation:

[source,json]
----
["path", "to", "cpu_id"]
----
====

==== Field lookup mechanism

Field path elements are names of _structure_ or _union_ fields. If one
of those fields is a variant field, then the lookup _must_ recursively
find the variant's current field.

For example, let's say we have the following scope named `my-scope`
(_FT_ means _field type_):

    a: int FT
    b: struct FT
      v: variant FT
        choice1: int FT
        choice2: int FT
        choice3: struct FT
          a: int FT
          b: float FT
        choice4: enum FT
      i: int FT

All the following field path m-values are valid sequence field type
lengths here:

* Trivial:
+
--
[source,json]
{
  "scope": "my-scope",
  "path": ["a"]
}
--
+
and
+
--
[source,json]
{
  "scope": "my-scope",
  "path": ["b", "i"]
}
--

* If `choice1`, `choice2`, or `choice4` is selected
  in `v` when performing the lookup:
+
--
[source,json]
{
  "scope": "my-scope",
  "path": ["b", "v"]
}
--

* If `choice3` is selected in `v` when performing the lookup:
+
--
[source,json]
{
  "scope": "my-scope",
  "path": ["b", "v", "a"]
}
--
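
The absolute lookups above can be sketched as follows in Python. This is
only an illustration under stated assumptions: decoded structure fields
are modeled as dictionaries, and the hypothetical `Variant` class wraps
the currently selected field of a variant field. The `lookup_absolute`
and `current_field` names are also hypothetical.

[source,python]
----
class Variant:
    """Decoded variant field which remembers its currently selected field."""

    def __init__(self, selected):
        self.selected = selected


def current_field(field):
    # Recursively unwrap variant fields to reach their current field.
    while isinstance(field, Variant):
        field = field.selected
    return field


def lookup_absolute(scopes, field_path):
    """Follow an absolute field path mdt:map from its scope field."""
    field = current_field(scopes[field_path["scope"]])
    for name in field_path["path"]:
        field = current_field(field[name])
    return field


# `choice3` is selected in `v`; field values are arbitrary.
scopes = {"my-scope": {"a": 4, "b": {"v": Variant({"a": 7, "b": 1.5}), "i": 9}}}
length = lookup_absolute(scopes, {"scope": "my-scope", "path": ["b", "v", "a"]})
----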

Relative field paths are looked up by going back into the current
structure/union field, and then back into the current structure/union
field's parent structure/union field, and so on. For example:

    z: int FT <------------------------------------.
    y: struct FT                                   |
      a: int FT                                    |
      b: struct FT                                 |
        c: int FT                                  |
        d: string FT                               |
        e: struct FT                               |
          f: int FT <----------------------------. |
          g: int FT                              | |
      h: int FT                                  | |
      i: struct FT                               | |
        j: int FT                                | |
        k: sequence FT, length: ["b", "e", "f"] -' |
    x: sequence FT, length: ["z"] -----------------'
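
One possible reading of this relative lookup is sketched below in
Python. The sketch assumes that the first name of the path is searched
in the current structure field, then in its parent, and so on, and that
the rest of the path is followed from the first structure field in which
this name is found. Structure fields are modeled as dictionaries holding
only the fields decoded so far, and the `lookup_relative` name is
hypothetical.

[source,python]
----
def lookup_relative(enclosing, path):
    """Resolve a relative field path.

    `enclosing` lists the structure fields which contain the
    sequence/variant field, innermost first.
    """
    for struct in enclosing:
        if path[0] in struct:
            field = struct
            for name in path:
                field = field[name]
            return field
    raise LookupError("cannot resolve field path: {}".format(path))


# State when decoding `k` in the diagram above (values are arbitrary).
i = {"j": 8}
y = {"a": 1, "b": {"c": 2, "d": "s", "e": {"f": 5, "g": 6}}, "h": 7, "i": i}
root = {"z": 3, "y": y}

k_length = lookup_relative([i, y, root], ["b", "e", "f"])
x_length = lookup_relative([root], ["z"])
----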

[[enc-ctx]]
=== Encoding context

We define a _current encoding head_, an integer variable which is
initialized by the upper layer before encoding a value to a data field.
This variable is the current position of the writing ``head''.

When a bit array is written, the current encoding head is updated by
adding the written size to it.

Before encoding a value as a data field, the current encoding head
_must_ be **aligned** to respect the alignment requirements of said
field (given by its field type). The following operation can be used to
update the current encoding head _p_ to the beginning of a field with an
effective alignment of _a_ (bits):

    p = (p + a - 1) & -a

For example, if the current encoding head is 37 and the alignment of the
next field to write is 8, then the current encoding head _must_ be
updated to 40 before writing the field. If the current encoding head is
48 and the alignment of the next field to write is 8, then the current
head is already aligned.
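
The operation above relies on _a_ being a power of two. A minimal Python
sketch of it, with a hypothetical `align` function which also rejects
invalid alignments, could look like this:

[source,python]
----
def align(p, a):
    """Return the encoding head `p` (bits) aligned to `a` bits."""
    if a <= 0 or a & (a - 1):
        raise ValueError("alignment must be a power of two, greater than 0")
    # -a has all its bits set except the log2(a) lowest ones.
    return (p + a - 1) & -a


aligned = align(37, 8)
already_aligned = align(48, 8)
----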

[[field-type-m-value]]
=== Field type m-value

A _field type m-value_ can be either an mdt:string value or an mdt:map
value:

* If it's an mdt:string value, it is the name of a field type alias. The
  upper layer has a mapping of field type alias names to complete field
  types (mdt:map values).
+
--
.Field type alias
====
JSON representation:

[source,json]
"my alias"
====
--

* [[field-type-m-map]]If it's an mdt:map value, it has the following
  base properties:
+
--
.Field type mdt:map properties
[options="header"]
|===
|Name |Type |Description |Required? |Default m-value

|`alignment`
|mdt:int
|Field's alignment (_must_ be a power of two, greater than 0).
|Optional
|1
|===
--
+
The following table summarizes the available field types:
+
--
.Available field types
[options="header"]
|===
|`field-type` property (mdt:string) |Name |Inherits properties from

|`null`
|<<null-field-type,Null field type>>.
|<<field-type-m-map,Field type m-map>>.

|`bitarray`
|<<bitarray-field-type,Bit array field type>>.
|<<field-type-m-map,Field type m-map>>.

|`bool`
|<<bool-field-type,Boolean field type>>.
|<<bitarray-field-type,Bit array field type>>.

|`int`
|<<int-field-type,Integer field type>>.
|<<bitarray-field-type,Bit array field type>>.

|`float`
|<<float-field-type,Floating point number field type>>.
|<<bitarray-field-type,Bit array field type>>.

|`enum`
|<<enum-field-type,Enumeration field type>>.
|<<int-field-type,Integer field type>>.

|`varbitarray`
|<<varbitarray-field-type,Variable-length bit array field type>>.
|<<field-type-m-map,Field type m-map>>.

|`varbool`
|<<varbool-field-type,Variable-length boolean field type>>.
|<<varbitarray-field-type,Variable-length bit array field type>>.

|`varint`
|<<varint-field-type,Variable-length integer field type>>.
|<<varbitarray-field-type,Variable-length bit array field type>>.

|`varenum`
|<<varenum-field-type,Variable-length enumeration field type>>.
|<<varint-field-type,Variable-length integer field type>>.

|`string`
|Null-terminated UTF-8 <<string-field-type,string field type>>.
|<<field-type-m-map,Field type m-map>>.

|`struct`
|<<struct-field-type,Structure field type>>.
|<<field-type-m-map,Field type m-map>>.

|`array`
|<<array-field-type,Array field type>>.
|<<field-type-m-map,Field type m-map>>.

|`textarray`
|<<textarray-field-type,Text array field type>>.
|<<field-type-m-map,Field type m-map>>.

|`sequence`
|<<sequence-field-type,Sequence field type>>.
|<<field-type-m-map,Field type m-map>>.

|`textsequence`
|<<textsequence-field-type,Text sequence field type>>.
|<<field-type-m-map,Field type m-map>>.

|`variant`
|<<variant-field-type,Variant field type>>.
|<<field-type-m-map,Field type m-map>>.

|`union`
|<<union-field-type,Union field type>>.
|<<field-type-m-map,Field type m-map>>.
|===
--
+
Note that, on the decoding side, any unknown field type _must_ be
ignored, and any unknown field type mdt:map property _must_ also be
ignored.
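
As an illustration, a consumer could normalize a field type m-value
before using it, as in the Python sketch below. The
`resolve_field_type` name and the `aliases` mapping (which stands for
the alias mapping owned by the upper layer) are hypothetical. Only the
`alignment` default given in the table above is applied.

[source,python]
----
def resolve_field_type(m_value, aliases):
    """Return a complete field type mdt:map for a field type m-value."""
    if isinstance(m_value, str):
        # An mdt:string is the name of a field type alias.
        m_value = aliases[m_value]
    field_type = {"alignment": 1}  # default m-value of `alignment`
    field_type.update(m_value)
    return field_type


# Hypothetical alias, as it could be registered by the upper layer.
aliases = {"uint8": {"field-type": "int", "size": 8}}

by_alias = resolve_field_type("uint8", aliases)
inline = resolve_field_type({"field-type": "null", "alignment": 4}, aliases)
----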

[[null-field-type]]
==== Null field type

A _null field type_ describes null fields.

A null value usually represents missing or unknown data.

.Null field type
====
JSON representation:

[source,json]
{
  "user-attrs": {
    "my-namespace": {
      "my-attr": "desc"
    }
  },
  "field-type": "null",
  "alignment": 4
}
====

.Minimal null field type
====
JSON representation:

[source,json]
{"field-type": "null"}
====

===== Encode a null value as a null field

To encode a null value using a null field type:

* <<enc-ctx,Align>> the current encoding head using the `alignment`
  property of the null field type.

[[bitarray-field-type]]
==== Bit array field type

A _bit array field type_ describes bit array fields.

A bit array value is a simple array of bits. It is not an integer
value (it has no signedness).

.Bit array field type
====
JSON representation:

[source,json]
{
  "field-type": "bitarray",
  "alignment": 16,
  "size": 5,
  "byte-order": "le",
  "user-attrs": {
    "my-namespace": {
      "my-attr": "desc"
    }
  }
}
====

.Minimal bit array field type
====
JSON representation:

[source,json]
{
  "field-type": "bitarray",
  "size": 32
}
====

[[enc-bit-array-field]]
===== Encode a bit array value as a bit array field

To encode a bit array value using a bit array field type:

. <<enc-ctx,Align>> the current encoding head using the `alignment`
  property of the bit array field type.
. Follow the rules of _Common Trace Format v1.8.2_, section 4.1.5, to
  encode the bit array value according to its `byte-order` and `size`
  properties.
. Add the value of the `size` property to the current encoding head.
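
The three steps above can be sketched in Python for the simplest case
only: fields whose size is a multiple of 8{nbsp}bits and which start on
a byte boundary. Sub-byte bit packing, as specified by _Common Trace
Format v1.8.2_, is deliberately left out. The `Encoder` class is
hypothetical. Writing zeros as padding, and resolving a missing or
`default` byte order with a `default_byte_order` parameter, are
assumptions of this sketch.

[source,python]
----
class Encoder:
    """Hypothetical byte-oriented writer tracking the current encoding head."""

    def __init__(self, default_byte_order="le"):
        self.buf = bytearray()
        self.head = 0  # current encoding head, in bits
        self.default_byte_order = default_byte_order

    def align(self, alignment):
        self.head = (self.head + alignment - 1) & -alignment
        # Assumption: padding bits are written as zeros.
        self.buf.extend(bytes(max(self.head // 8 - len(self.buf), 0)))

    def write_bit_array(self, value, field_type):
        size = field_type["size"]
        self.align(field_type.get("alignment", 1))
        if size % 8 or self.head % 8:
            raise NotImplementedError("sub-byte bit packing is not sketched")
        order = field_type.get("byte-order", "default")
        if order == "default":
            order = self.default_byte_order
        name = "big" if order == "be" else "little"
        self.buf += value.to_bytes(size // 8, name)
        self.head += size


enc = Encoder()
enc.write_bit_array(0x1234, {"field-type": "bitarray", "size": 16,
                             "byte-order": "be"})
enc.write_bit_array(1, {"field-type": "bitarray", "size": 8})
enc.write_bit_array(0xDEADBEEF, {"field-type": "bitarray", "size": 32,
                                 "alignment": 32, "byte-order": "le"})
----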

[[bool-field-type]]
==== Boolean field type

A _boolean field type_ describes boolean fields.

A boolean value is a bit array value which, when all its bits are
cleared, is said to be _false_, and otherwise is said to be _true_.

.Boolean field type
====
JSON representation:

[source,json]
{
  "field-type": "bool",
  "size": 8,
  "byte-order": "be",
  "user-attrs": {
    "my-namespace": {
      "my-attr": "desc"
    }
  }
}
====

.Minimal boolean field type
====
JSON representation:

[source,json]
{
  "field-type": "bool",
  "size": 8
}
====

[[enc-bool-field]]
===== Encode a boolean value as a boolean field

To encode a boolean value using a boolean field type:

. Encode the boolean value as a bit array value. This process is
  platform-dependent.
. Follow the rules of how to <<enc-bit-array-field,encode a bit array
  value as a bit array field>>.

[[int-field-type]]
==== Integer field type

An _integer field type_ describes integer fields.

.Integer field type
====
JSON representation:

[source,json]
{
  "field-type": "int",
  "alignment": 16,
  "size": 5,
  "byte-order": "le",
  "signed": true,
  "user-attrs": {
    "my-namespace": {
      "my-attr": "desc"
    }
  }
}
====

.Minimal integer field type
====
JSON representation:

[source,json]
{
  "field-type": "int",
  "size": 32
}
====

[[enc-int-field]]
===== Encode an integer value as an integer field

To encode an integer value using an integer field type:

. Encode the integer value as a bit array value. Follow the two's
  complement representation if the integer value is signed. This
  process is platform-dependent.
. Follow the rules of how to <<enc-bit-array-field,encode a bit array
  value as a bit array field>>.
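
The first step can be illustrated in Python, where integers have no
fixed width. In this sketch, the hypothetical `int_to_bit_array`
function returns the bit array as a non-negative integer holding `size`
significant bits, and rejects values which do not fit the field type:

[source,python]
----
def int_to_bit_array(value, size, signed):
    """Return the `size`-bit pattern representing `value`."""
    if signed:
        lowest, highest = -(1 << (size - 1)), (1 << (size - 1)) - 1
    else:
        lowest, highest = 0, (1 << size) - 1
    if not lowest <= value <= highest:
        raise OverflowError("value does not fit the integer field type")
    # Masking a negative Python int yields its two's complement pattern.
    return value & ((1 << size) - 1)


bits = int_to_bit_array(-3, 5, signed=True)
----

With the 5-bit signed integer field type shown above, the value -3
becomes the bit pattern `11101`.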

[[float-field-type]]
==== Floating point number field type

A _floating point number field type_ describes floating point number
fields encoded with {ieee754}.

A number value (real number) can be encoded as a floating point number
field provided it is representable with one of the versions of
{ieee754}.

The value of the `size` property corresponds to the value of the
parameter _k_ (storage width in bits) in Table 3.5, _Binary interchange
format parameters_, of the IEEE Std 754-2008 document. All the other
parameters of the format needed to encode and decode the floating point
number value can be deduced from the value of _k_.

.Floating point number field type describing the basic binary64 format
====
This floating point number field type describes fields encoded with the
parameters of the basic binary64 format, which is the encoding used by
the ``double'' type of most programming languages.

JSON representation:

[source,json]
{
  "field-type": "float",
  "alignment": 8,
  "size": 64,
  "user-attrs": {
    "my-namespace": {
      "my-attr": "desc"
    }
  }
}
====

.Minimal floating point number field type
====
JSON representation:

[source,json]
{
  "field-type": "float",
  "size": 32
}
====

[[enc-float-field]]
===== Encode a number value as a floating point number field

To encode a number value using a floating point number field type:

. Encode the number value as a bit array value following {ieee754}. This
  process is platform-dependent.
. Follow the rules of how to <<enc-bit-array-field,encode a bit array
  value as a bit array field>>.
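
In Python, the first step can be sketched with the standard `struct`
module, which supports the binary16, binary32, and binary64 formats.
The hypothetical `float_to_bit_array` function returns the bit array as
a non-negative integer of `size` significant bits. Other values of
_k_ are not covered by this sketch.

[source,python]
----
import struct

# Storage width (k, in bits) to `struct` format character.
FORMATS = {16: "e", 32: "f", 64: "d"}


def float_to_bit_array(value, size):
    """Return the IEEE 754 bit pattern of `value` as an integer."""
    raw = struct.pack(">" + FORMATS[size], value)
    return int.from_bytes(raw, "big")


one_binary64 = float_to_bit_array(1.0, 64)
one_binary32 = float_to_bit_array(1.0, 32)
----

The byte order used to pack the value here is only a way to obtain the
bit pattern: the byte order of the resulting field is chosen when
encoding the bit array value as a bit array field.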

[[enum-field-type-member]]
==== Enumeration field type member m-value

An _enumeration field type member m-value_ represents the range of
values mapped to the label of an <<enum-field-type,enumeration field
type>> member.

An enumeration field type member m-value is either:

* An mdt:int value, in which case the member's label is mapped to this
  value. For example (JSON representation):
+
[source,json]
----
28
----

* An mdt:map with the `lower` and `upper` properties (mdt:int values)
  which indicate the lower (inclusive) and upper (inclusive) limits of a
  range to which the member's label is mapped. For example (JSON
  representation):
+
[source,json]
----
{"lower": -3, "upper": 17}
----

[[enum-field-type]]
==== Enumeration field type

An _enumeration field type_ describes enumeration fields.

An enumeration value is an integer value mapped to a label.

.Enumeration field type
====
JSON representation:

[source,json]
{
  "field-type": "enum",
  "alignment": 16,
  "size": 32,
  "signed": true,
  "members": {
    "NEW": [0],
    "TERMINATED": [-1],
    "READY": [2, 17],
    "RUNNING": [-3],
    "WAITING": [
        {"lower": 19, "upper": 199},
        1000
    ],
    "RESTARTING": [
        {"base": 8, "value": "126674015"},
        {"lower": -155, "upper": -98}
    ]
  },
  "user-attrs": {
    "my-namespace": {
      "my-attr": "desc"
    }
  }
}

With this enumeration field type, the following enumeration fields would
have the following associated labels:

* -1: `TERMINATED`
* 17: `READY`
* -101: `RESTARTING`
* 1000: `WAITING`
* 22771725: `RESTARTING`
* 2: `READY`
* 50: `WAITING`
====
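
The label association of the example above can be reproduced with the
following Python sketch. The `to_int` and `labels_of` names are
hypothetical. The sketch assumes that a constant integer JSON object
without a `base` property is in base 10, and it returns every matching
label, since nothing in this example prevents ranges from overlapping.

[source,python]
----
def to_int(m_value):
    """Convert an integer m-value (plain or constant integer object)."""
    if isinstance(m_value, dict):
        return int(m_value["value"], m_value.get("base", 10))
    return m_value


def labels_of(value, members):
    """Return the labels whose values or ranges contain `value`."""
    labels = []
    for label, ranges in members.items():
        for member in ranges:
            if isinstance(member, dict) and "lower" in member:
                lower, upper = to_int(member["lower"]), to_int(member["upper"])
            else:
                lower = upper = to_int(member)
            if lower <= value <= upper:
                labels.append(label)
                break
    return labels


members = {
    "NEW": [0],
    "TERMINATED": [-1],
    "READY": [2, 17],
    "RUNNING": [-3],
    "WAITING": [{"lower": 19, "upper": 199}, 1000],
    "RESTARTING": [
        {"base": 8, "value": "126674015"},
        {"lower": -155, "upper": -98},
    ],
}

label = labels_of(22771725, members)
----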

.Minimal enumeration field type
====
JSON representation:

[source,json]
{
  "field-type": "enum",
  "members": {
    "": [0]
  }
}
====

[[enc-enum-field]]
===== Encode an enumeration value as an enumeration field

To encode an enumeration value (which is an integer value) using an
enumeration field type:

* Follow the rules of how to <<enc-int-field,encode an integer value as
  an integer field>>.

[[string-field-type]]
==== String field type

A _string field type_ describes string fields.

A string value is a finite sequence of Unicode characters.

.String field type
====
JSON representation:

[source,json]
{
  "field-type": "string",
  "alignment": 16,
  "user-attrs": {
    "my-namespace": {
      "my-attr": "desc"
    }
  }
}
====

.Minimal string field type
====
JSON representation:

[source,json]
{"field-type": "string"}
====

[[enc-string-field]]
===== Encode a string value as a string field

To encode a string value using a string field type:

. <<enc-ctx,Align>> the current encoding head using the `alignment`
  property of the string field type.
. For each Unicode character of the string value:
.. Encode this Unicode character as a sequence of 8-bit bit arrays
   (bytes) following UTF-8. This process is platform-dependent.
.. For each resulting UTF-8 byte of this character:
*** Follow the rules of how to <<enc-bit-array-field,encode a bit
    array value as a bit array field>>.
. Follow the rules of how to <<enc-int-field,encode an integer value as
  an integer field>> to encode the UTF-8 null character (U+0000).
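
Ignoring alignment, the bytes produced by this procedure can be
sketched in Python as follows. The `encode_string` name is
hypothetical. Rejecting a string value which contains U+0000 is an
assumption of this sketch: such a character could not be distinguished
from the terminator when decoding.

[source,python]
----
def encode_string(value):
    """Return the bytes of a null-terminated UTF-8 string field."""
    data = value.encode("utf-8")
    if b"\x00" in data:
        # Assumption: U+0000 is reserved for the terminator.
        raise ValueError("string value contains U+0000")
    return data + b"\x00"


field = encode_string("h\u00e9llo")
----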

[[varbitarray-field-type]]
==== Variable-length bit array field type

A _variable-length bit array field type_ describes variable-length bit
array fields.

A bit array value of any size that is a multiple of 7{nbsp}bits, and at
least 7{nbsp}bits, can be dynamically encoded as a variable-length bit
array field.

.Variable-length bit array field type
====
JSON representation:

[source,json]
{
  "field-type": "varbitarray",
  "user-attrs": {
    "my-namespace": {
      "my-attr": "desc"
    }
  }
}
====

.Minimal variable-length bit array field type
====
JSON representation:

[source,json]
{"field-type": "varbitarray"}
====

[[enc-varbitarray-field]]
===== Encode a bit array value as a variable-length bit array field

To encode a bit array value using a variable-length bit array field type:

. <<enc-ctx,Align>> the current encoding head using the `alignment`
  property of the variable-length bit array field type.
. Encode the bit array value as a bit array field following the unsigned
  https://en.wikipedia.org/wiki/LEB128[LEB128] format.
. Add the encoded variable-length bit array field's size (_not_ the
  original bit array value's size) to the current encoding head.
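
The unsigned LEB128 step can be sketched in Python as below. The bit
array value is modeled as a non-negative integer together with its size
in bits, and the function names are hypothetical. Each group of
7{nbsp}bits, starting with the least significant one, becomes one byte
whose most significant bit indicates that another byte follows.

[source,python]
----
def encode_varbitarray(bits, size):
    """Encode a bit array of `size` bits (a multiple of 7, at least 7)."""
    if size < 7 or size % 7:
        raise ValueError("size must be a multiple of 7, at least 7")
    count = size // 7
    out = bytearray()
    for index in range(count):
        group = (bits >> (7 * index)) & 0x7F
        # Set the continuation bit on every byte except the last one.
        out.append(group | (0x80 if index < count - 1 else 0))
    return bytes(out)


def decode_varbitarray(data):
    """Return the (bits, size) pair decoded from the start of `data`."""
    bits = size = 0
    for byte in data:
        bits |= (byte & 0x7F) << size
        size += 7
        if not byte & 0x80:
            return bits, size
    raise ValueError("truncated variable-length bit array field")


field = encode_varbitarray(624485, 21)
head_increment = len(field) * 8  # encoded size, in bits
----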

[[varbool-field-type]]
==== Variable-length boolean field type

A _variable-length boolean field type_ describes variable-length boolean
fields.

A boolean value is a bit array value which, when all its bits are
cleared, is said to be _false_, and otherwise is said to be _true_.

A boolean value of any size that is a multiple of 7{nbsp}bits, and at
least 7{nbsp}bits, can be dynamically encoded as a variable-length
boolean field.

.Variable-length boolean field type
====
JSON representation:

[source,json]
{
  "field-type": "varbool",
  "alignment": 32,
  "user-attrs": {
    "my-namespace": {
      "my-attr": "desc"
    }
  }
}
====

.Minimal variable-length boolean field type
====
JSON representation:

[source,json]
{"field-type": "varbool"}
====

[[enc-varbool-field]]
===== Encode a boolean value as a variable-length boolean field

To encode a boolean value using a variable-length boolean field type:

. Encode the boolean value as a bit array value. This process is
  platform-dependent.
. Follow the rules of how to <<enc-varbitarray-field,encode a bit array
  value as a variable-length bit array field>>.

[[varint-field-type]]
==== Variable-length integer field type

A _variable-length integer field type_ describes variable-length integer
fields.

An integer value of any size can be encoded dynamically as a
variable-length integer field.

.Variable-length integer field type
====
JSON representation:

[source,json]
{
  "field-type": "varint",
  "alignment": 16,
  "signed": true,
  "user-attrs": {
    "my-namespace": {
      "my-attr": "desc"
    }
  }
}
====

.Minimal variable-length integer field type
====
JSON representation:

[source,json]
{"field-type": "varint"}
====

[[enc-varint-field]]
===== Encode an integer value as a variable-length integer field

To encode an integer value using a variable-length integer field type:

. Encode the integer value as a bit array value. Follow the two's
  complement representation if the integer value is signed. Sign-extend
  the bit array (zero-extend it if the integer value is unsigned) to the
  next multiple of 7{nbsp}bits. This process is platform-dependent.
. Follow the rules of how to <<enc-varbitarray-field,encode a bit array
  value as a variable-length bit array field>>.
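
The first step can be sketched in Python, whose integers are not
limited in size. The hypothetical `int_to_varbitarray` function returns
the extended bit array as a non-negative integer along with its size in
bits. Choosing the smallest possible size is a choice of this sketch:
any larger multiple of 7{nbsp}bits represents the same value.

[source,python]
----
def int_to_varbitarray(value, signed):
    """Return the (bits, size) pair for `value`, size a multiple of 7."""
    if signed:
        # Minimal two's complement width, including the sign bit.
        magnitude = value if value >= 0 else ~value
        size = magnitude.bit_length() + 1
    else:
        if value < 0:
            raise ValueError("negative value for an unsigned field type")
        size = max(value.bit_length(), 1)
    size = (size + 6) // 7 * 7  # round up to a multiple of 7
    # Masking a negative Python int yields its two's complement pattern.
    return value & ((1 << size) - 1), size


bits, size = int_to_varbitarray(-123456, signed=True)
----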

[[varenum-field-type]]
==== Variable-length enumeration field type

A _variable-length enumeration field type_ describes variable-length
enumeration fields.

An enumeration value is an integer value mapped to a label.

An enumeration value of any size can be encoded dynamically as a
variable-length enumeration field.

.Variable-length enumeration field type
====
JSON representation:

[source,json]
{
  "field-type": "varenum",
  "signed": true,
  "members": {
    "NEW": [0],
    "TERMINATED": [-1],
    "READY": [2, 17],
    "RUNNING": [-3],
    "WAITING": [
      {"lower": 19, "upper": 199},
      1000
    ],
    "RESTARTING": [
      {"base": 8, "value": "126674015"},
      {"lower": -155, "upper": -98}
    ]
  },
  "user-attrs": {
    "my-namespace": {
      "my-attr": "desc"
    }
  }
}

With this variable-length enumeration field type, the following
enumeration fields would have the following associated labels:

* -1: `TERMINATED`
* 17: `READY`
* -101: `RESTARTING`
* 1000: `WAITING`
* 22771725: `RESTARTING`
* 2: `READY`
* 50: `WAITING`
====

.Minimal variable-length enumeration field type
====
JSON representation:

[source,json]
{
  "field-type": "varenum",
  "members": {
    "": [0]
  }
}
====

[[enc-varenum-field]]
===== Encode an enumeration value as a variable-length enumeration field

To encode an enumeration value (which is an integer value) using a
variable-length enumeration field type:

. Encode the integer value as a bit array value. Follow the two's
  complement representation if the integer value is signed. Sign-extend
  the bit array (zero-extend it if the integer value is unsigned) to the
  next multiple of 7{nbsp}bits. This process is platform-dependent.
. Follow the rules of how to <<enc-varbitarray-field,encode a bit array
  value as a variable-length bit array field>>.

[[struct-union-variant-field]]
==== Structure/union/variant field type field mdt:map

A _structure/union/variant field type field mdt:map_ represents one
field of a <<struct-field-type,structure field type>> or of a
<<union-field-type,union field type>>, or one choice of a
<<variant-field-type,variant field type>>.

.Structure/union/variant field type field mdt:map
====
JSON representation:

[source,json]
{
  "name": "src_addr",
  "field-type": {
    "field-type": "array",
    "length": 4,
    "element-field-type": "uint8"
  },
  "user-attrs": {
    "my-namespace": {
      "my-attr": "desc"
    }
  }
}
====

[[struct-field-type]]
==== Structure field type

A _structure field type_ describes structure fields.

A structure value is a finite sequence of values of different types.
This is sometimes also called a _record_.

The `alignment` property indicates the _minimal_ alignment of the
structure fields encoded with this field type. The _automatic_ alignment
of the structure field type is the greatest value amongst all the
effective alignments of the field type's fields. The _effective_
alignment of the structure field type is the greatest value amongst the
field type's minimal and automatic alignments.

.Structure field type
====
JSON representation:

[source,json]
{
  "field-type": "struct",
  "alignment": 8,
  "fields": [
    {
      "name": "timestamp_begin",
      "field-type": "uint64"
    },
    {
      "name": "timestamp_end",
      "field-type": "uint64"
    },
    {
      "name": "packet_size",
      "field-type": "uint32"
    },
    {
      "name": "content_size",
      "field-type": "uint32"
    },
    {
      "name": "core location",
      "field-type": {
        "field-type": "struct",
        "fields": [
          {
            "name": "x",
            "field-type": {
              "field-type": "int",
              "size": 3
            }
          },
          {
            "name": "y",
            "field-type": {
              "field-type": "int",
              "size": 5
            }
          }
        ]
      }
    }
  ],
  "user-attrs": {
    "my-namespace": {
      "my-attr": "desc"
    }
  }
}

In this example, assuming that the effective alignment of the
`timestamp_end` field is 64, and that this is the greatest alignment
amongst all the alignments of the structure field type's fields, then
the structure field type's effective alignment is also 64.
====

.Minimal (empty) structure field type
====
JSON representation:

[source,json]
{"field-type": "struct"}
====

[[enc-struct-field]]
===== Encode a structure value as a structure field

To encode a structure value using a structure field type:

. <<enc-ctx,Align>> the current encoding head using the _effective_
  alignment of the structure field type.
. For each field of the structure value:
** Encode the field's value using the field type of the
   structure/union/variant field type field mdt:map at the corresponding
   position in the `fields` mdt:array of the structure field type.
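
The steps above can be sketched in Python for a structure made of fixed-size fields. The helper names `align` and `layout_struct` are hypothetical, and the sketch assumes that aligning the encoding head means rounding it up to the next multiple of the alignment (the alignment rule itself is defined with the encoding context, not here).

```python
def align(head, alignment):
    """Round the encoding head (in bits) up to a multiple of `alignment`."""
    return -(-head // alignment) * alignment


def layout_struct(head, struct_alignment, fields):
    """Return ([(name, offset)], new_head) for `fields`, a list of
    (name, size_in_bits, alignment) tuples encoded in order."""
    # Step 1: align the head using the structure's effective alignment.
    head = align(head, struct_alignment)
    offsets = []
    # Step 2: encode each field in the order of the `fields` array.
    for name, size, alignment in fields:
        head = align(head, alignment)
        offsets.append((name, head))
        head += size
    return offsets, head
```

Starting with the head at bit 3, an 8-bit aligned structure holding a 3-bit field `x` and a 5-bit field `y` (both 1-bit aligned) places `x` at bit 8 and `y` at bit 11.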

[[array-field-type]]
==== Array field type

An _array field type_ describes array fields.

An array value is a finite sequence of values.

The length of all the possible array fields represented by a given array
field type is known statically (when producing the array field type).

.Array field type
====
JSON representation:

[source,json]
{
  "field-type": "array",
  "alignment": 64,
  "length": 72,
  "element-field-type": {
    "field-type": "float",
    "size": 64,
    "byte-order": "be"
  },
  "user-attrs": {
    "my-namespace": {
      "my-attr": "desc"
    }
  }
}
====

.Array field type describing UUID fields
====
JSON representation:

[source,json]
{
  "field-type": "array",
  "alignment": 8,
  "length": 16,
  "element-field-type": {
    "field-type": "int",
    "size": 8
  },
  "user-attrs": {
    "my-namespace": {
      "my-attr": "desc"
    }
  }
}
====

.Minimal array field type
====
JSON representation:

[source,json]
{
  "field-type": "array",
  "length": 32,
  "element-field-type": "elem-ft"
}
====

[[enc-array-field]]
===== Encode an array value as an array field

To encode an array value using an array field type:

. <<enc-ctx,Align>> the current encoding head using the `alignment`
  property of the array field type.
. For each element of the array value:
** Encode the value using the `element-field-type` property of the array
   field type.
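
As an illustration of the steps above, here is a minimal Python sketch which encodes a UUID as an array of 16 8-bit integer elements. The function name `encode_uuid_array` is hypothetical, and the sketch assumes that the encoding head is already byte-aligned and that each element is 8-bit aligned.

```python
import uuid


def encode_uuid_array(uuid_string):
    """Encode a UUID as an array of 16 8-bit integer elements, in order."""
    elements = list(uuid.UUID(uuid_string).bytes)  # 16 integers in 0..255
    encoded = bytearray()
    for element in elements:
        # Each 8-bit element is byte-aligned: no padding is inserted.
        encoded.append(element)
    return bytes(encoded)
```

The result always holds exactly 16 bytes, which matches the statically known `length` of such an array field type.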

[[textarray-field-type]]
==== Text array field type

A _text array field type_ describes text array fields. It is a
specialized version of the <<array-field-type,array field type>>.

A text array value is a finite sequence of bytes which form a UTF-8
string. The length of all the possible text array fields represented by
a given text array field type is known statically (when producing the
text array field type). The text array value's length _may_ be greater
than the number of effective UTF-8 bytes, as long as the string is
null-terminated. In this case, the padding bytes after the UTF-8 null
character can have any value.

.Text array field type
====
JSON representation:

[source,json]
{
  "field-type": "textarray",
  "alignment": 32,
  "length": 16,
  "user-attrs": {
    "my-namespace": {
      "my-attr": "desc"
    }
  }
}
====

.Minimal text array field type
====
JSON representation:

[source,json]
{
  "field-type": "textarray",
  "length": 32
}
====

[[enc-textarray-field]]
===== Encode a text array value as a text array field

A text array value is an array value with bytes as elements.

To encode a text array value using a text array field type:

* Follow the rules of how to <<enc-array-field,encode an array value as
  an array field>>.
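
A minimal Python sketch of the null-termination and padding rule follows. The function names are hypothetical, and filling the padding with zero bytes is only one possible choice since padding bytes can have any value.

```python
def encode_text_array(text, length):
    """Encode `text` as a text array field of exactly `length` bytes."""
    data = text.encode("utf-8")
    if len(data) > length:
        raise ValueError("string does not fit in the text array")
    if len(data) < length:
        # A shorter string must be null-terminated; the bytes after the
        # null character are padding (zeros are used here).
        data += b"\x00" * (length - len(data))
    return data


def decode_text_array(data):
    """Return the string stored in a text array field."""
    # The string ends at the first null byte, if there is one.
    return data.split(b"\x00", 1)[0].decode("utf-8")
```

A string which exactly fills the text array needs no null character in this sketch, because its length is already known statically.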

[[sequence-field-type]]
==== Sequence field type

A _sequence field type_ describes sequence fields.

A sequence value is a finite sequence of values. Its length is known
dynamically.

[options="header"]
|===
|Property |Type |Description |Required? |Default value

|`length`
|<<field-path,Field path m-value>>
|Path to a previously encoded integer, enumeration, variable-length
integer, or variable-length enumeration field which
indicates the number of elements contained in the sequence field.
|Required
|
|===

.Sequence field type
====
JSON representation:

[source,json]
{
  "field-type": "sequence",
  "alignment": 32,
  "length": ["msg", "info", "count"],
  "element-field-type": {
    "field-type": "string"
  },
  "user-attrs": {
    "my-namespace": {
      "my-attr": "desc"
    }
  }
}
====

.Minimal sequence field type
====
JSON representation:

[source,json]
{
  "field-type": "sequence",
  "length": ["len"],
  "element-field-type": "elem-ft"
}
====

[[enc-sequence-field]]
===== Encode a sequence value as a sequence field

To encode a sequence value using a sequence field type:

. <<enc-ctx,Align>> the current encoding head using the `alignment`
  property of the sequence field type.
. For each element of the sequence value (the count given by a
  previously encoded integer, enumeration, variable-length integer, or
  variable-length enumeration field, located thanks to the `length`
  property):
** Encode the value using the `element-field-type` property of the
   sequence field type.
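
The dependency between the length field and the sequence field can be sketched in Python. This sketch makes several assumptions which are not part of the rules above: the length field is an unsigned 32-bit little-endian integer encoded immediately before the sequence, the elements are unsigned 8-bit integers, and the function names are hypothetical.

```python
import struct


def encode_counted_sequence(elements):
    """Encode a 32-bit length field followed by the 8-bit elements."""
    encoded = struct.pack("<I", len(elements))
    for element in elements:
        encoded += struct.pack("<B", element)
    return encoded


def decode_counted_sequence(data):
    """Decode the length field first, then that many elements."""
    (count,) = struct.unpack_from("<I", data, 0)
    return list(struct.unpack_from("<%dB" % count, data, 4))
```

The consumer cannot know where the sequence field ends without first decoding the length field, which is why that field must be encoded before the sequence.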

[[textsequence-field-type]]
==== Text sequence field type

A _text sequence field type_ describes text sequence fields. It is a
specialized version of the <<sequence-field-type,sequence field type>>.

A text sequence value is a finite sequence of bytes which form a UTF-8
string. Its length is known dynamically. The text sequence value's
length _may_ be greater than the number of effective UTF-8 bytes, as long
as the string is null-terminated. In this case the padding bytes after
the UTF-8 null character can have any value.

[options="header"]
|===
|Property |Type |Description |Required? |Default value

|`length`
|<<field-path,Field path m-value>>
|Path to a previously encoded integer, enumeration, variable-length
integer, or variable-length enumeration field which
indicates the number of bytes contained in the text sequence field.
|Required
|
|===

.Text sequence field type
====
JSON representation:

[source,json]
{
  "field-type": "textsequence",
  "alignment": 32,
  "length": {
    "scope": "my-scope",
    "path": ["cmd-len"]
  },
  "user-attrs": {
    "my-namespace": {
      "my-attr": "desc"
    }
  }
}
====

.Minimal text sequence field type
====
JSON representation:

[source,json]
{
  "field-type": "textsequence",
  "length": ["len"]
}
====

[[enc-textsequence-field]]
===== Encode a text sequence value as a text sequence field

A text sequence value is a sequence value with bytes as elements.

To encode a text sequence value using a text sequence field type:

* Follow the rules of how to <<enc-sequence-field,encode a sequence
  value as a sequence field>>.

[[variant-field-type]]
==== Variant field type

A _variant field type_ describes variant fields.

A variant value is a value of some type amongst many possible types. The
exact type of the value is indicated dynamically by a previously encoded
tag field (an enumeration or variable-length enumeration field).

[options="header"]
|===
|Property |Type |Description |Required? |Default value

|`tag`
|<<field-path,Field path m-value>>
|Path to a previously encoded enumeration or variable-length
enumeration field which indicates the type, by label name to choice name
association, of the variant field.
|Required
|
|===

The `name` property of all the structure/union/variant field type
field mdt:map values listed in the `choices` property _must_ exist as a
member's label name in the tag's enumeration field type.

.Variant field type
====
JSON representation:

[source,json]
{
  "field-type": "variant",
  "alignment": 16,
  "tag": ["path", "to", "tag"],
  "choices": [
    {
      "user-attrs": {
        "ns": {
          "split": 5
        }
      },
      "name": "ID",
      "field-type": {
        "field-type": "int",
        "size": 35,
        "signed": true,
        "alignment": 32
      }
    },
    {
      "name": "NAME",
      "field-type": {
        "field-type": "string"
      }
    }
  ],
  "user-attrs": {
    "my-namespace": {
      "my-attr": "desc"
    }
  }
}
====

.Minimal variant field type
====
JSON representation:

[source,json]
{
  "field-type": "variant",
  "tag": ["tag"],
  "choices": [
    {
      "name": "",
      "field-type": "some-ft"
    }
  ]
}

Note that such a variant field type, with a single choice, is useless
because this choice could be used directly instead.
====

[[enc-variant-field]]
===== Encode a variant value as a variant field

To encode a variant value using a variant field type:

. <<enc-ctx,Align>> the current encoding head using the `alignment`
  property of the variant field type.
. Follow the encoding rules of the field type, amongst the field types
  of the choices listed in the `choices` property, currently selected by
  the previously encoded `tag` field.
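
The label name to choice name association can be sketched in Python. The function names are hypothetical, and the sketch assumes that enumeration members map each label to a list of single values or of `lower`/`upper` ranges with inclusive bounds, as the enumeration examples of this document suggest.

```python
def label_for(members, tag_value):
    """Find the enumeration label whose values or ranges hold `tag_value`."""
    for label, ranges in members.items():
        for item in ranges:
            if isinstance(item, dict):
                # Range member (bounds assumed to be inclusive).
                if item["lower"] <= tag_value <= item["upper"]:
                    return label
            elif item == tag_value:
                return label
    raise ValueError("no label for tag value %d" % tag_value)


def select_choice(members, choices, tag_value):
    """Return the field type of the choice selected by the tag's value."""
    label = label_for(members, tag_value)
    for choice in choices:
        if choice["name"] == label:
            return choice["field-type"]
    raise ValueError("no choice named %r" % label)
```

With the labels `compact` (0 to 30) and `extended` (31), a tag value of 12 selects the choice named `compact`, and a tag value of 31 selects the choice named `extended`.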

[[union-field-type]]
==== Union field type

A _union field type_ describes union fields.

A union field is a binary field which, once encoded using some field
type, can be decoded using other field types.

.Union field type
====
In this example, the union fields represented by this field type can
_always_ be decoded as either string fields, or as unsigned 64-bit
integer fields. This means that the producer ensures that, when writing
a string field, its length is always 64 bits (including the terminating
null character).

JSON representation:

[source,json]
{
  "field-type": "union",
  "alignment": 8,
  "fields": [
    {
      "name": "as string",
      "field-type": {
        "field-type": "string"
      }
    },
    {
      "name": "as int",
      "field-type": {
        "field-type": "int",
        "size": 64
      }
    }
  ],
  "user-attrs": {
    "my-namespace": {
      "my-attr": "desc"
    }
  }
}
====

.Minimal union field type
====
JSON representation:

[source,json]
{
  "field-type": "union",
  "fields": [
    {
      "name": "",
      "field-type": "some-ft"
    }
  ]
}

Note that such a union field type, with a single field, is useless
because this field could be used directly instead.
====

[[enc-union-field]]
===== Encode a value as a union field

To encode a value using a union field type:

. <<enc-ctx,Align>> the current encoding head using the `alignment`
  property of the union field type.
. Follow the encoding rules of the chosen field type, amongst the field
  types of the fields listed in the `fields` property, corresponding to
  the type of the value to encode.

An important condition when encoding a union field is that, whichever
field type is chosen to decode the field afterwards, the current
decoding head _must_ always be the same after the process.
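
This condition can be illustrated with a short Python sketch, using a union of a string field type and an unsigned 64-bit integer field type. The function names are hypothetical, and the sketch assumes a null-terminated UTF-8 string and a little-endian integer by default.

```python
def decode_as_string(data):
    """Decode a null-terminated UTF-8 string; return (value, bits consumed)."""
    end = data.index(b"\x00")
    return data[:end].decode("utf-8"), (end + 1) * 8


def decode_as_uint64(data, byte_order="little"):
    """Decode an unsigned 64-bit integer; return (value, bits consumed)."""
    return int.from_bytes(data[:8], byte_order), 64


# 7 characters plus the terminating null character: exactly 64 bits.
field = b"abcdefg\x00"
```

Both decoders consume 64 bits of `field`, so the decoding head ends at the same position whichever field type the consumer chooses. A shorter or longer string would break this property.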

[[layer-2]]
== Layer 2: Data streams, packets, and event records

This layer adds the following concepts to the CTF{nbsp}2 specification:

* Field type alias fragment.
* Field tag mdt:map.
* A _trace_ contains zero or more _data streams_. A data stream is a
  sequence of zero or more _packets_. A packet contains zero or more
  _event records_.
* The concepts and layouts of data streams, packets, and event records.
* Trace class, data stream class, and event record class fragments.

[[data-stream]]
=== Data stream

A CTF{nbsp}2 _data stream_ is defined as a sequence of
<<packet,packets>>.

    [packet]
    [packet]
    [packet]
    [packet]
    [packet]
    ...

A <<data-stream-class-fragment,data stream class fragment>> describes,
in the <<metadata-array,metadata mdt:array>>, a class of data streams.
For a single data stream class, there can be zero or more data streams
in the trace.

There are two ways to distinguish individual data streams:

* Rely on the storage or transport back-end to separate individual data
  streams.
+
For example, a trace stored on a file system could contain one data
stream per file. If a trace is sent over the network, the wrapping
network protocol could assign a unique ID to each data stream.

* Tag a field of the trace packet header field with the
  <<data-stream-id,`data-stream-id` field tag>>.
+
This is the _recommended_ way to isolate a data stream: it is more
robust and more portable. With this method, each packet holds the unique
ID of the data stream to which it logically belongs.

[[packet]]
=== Packet

A _packet_ contains two _optional_, contiguous scope fields, named the
_trace packet header_ and the _data stream packet context_ fields,
followed by zero or more event records, and finally by _optional_
padding bits to honor its total size:

    [trace packet header field]
    [data stream packet context field]
    [event record]
    [event record]
    [event record]
    ...
    [padding]

Before the first bit of a packet is written, the current encoding head
of the <<enc-ctx,CTFFER encoding context>> is set to 0. This current
encoding head keeps on incrementing as fields are encoded following the
<<ctffer,CTFFER>> until the beginning of the next packet in the same
<<data-stream,data stream>>, where it is reset to 0 again. This means
that the reference to align any binary field within a packet is the
beginning of a packet, _not_ the beginning of a data stream.

To simplify matters for CTF{nbsp}2 consumers, the size of a packet, in
bits, _must_ be greater than 8, and _must_ be a multiple of 8.

The packet's first padding bit is reached when the current encoding head
is equal to the value of the packet's content size field found in the
data stream packet context field (if any).
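
The size rules above can be sketched as a small Python check. The function name `padding_bits` is hypothetical. The sketch treats a packet without a content size field as having no padding, and it assumes that the content size can never exceed the total size.

```python
def padding_bits(total_size, content_size=None):
    """Return the number of padding bits of a packet (sizes in bits)."""
    if total_size <= 8 or total_size % 8 != 0:
        raise ValueError("packet size must be a multiple of 8 greater than 8")
    if content_size is None:
        # No content size field: the packet has no padding bits.
        content_size = total_size
    if content_size > total_size:
        raise ValueError("content size exceeds packet size")
    # Padding starts where the encoding head reaches the content size.
    return total_size - content_size
```

For example, a 4096-bit packet with a content size of 4000 bits ends with 96 padding bits.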

If there is more than one data stream class in the trace, the trace
packet header field contains a data stream class ID field which
indicates which data stream class describes the data stream packet
context field and the event records.

[[event-record]]
=== Event record

An _event record_ contains four _optional_, contiguous scope fields,
named the _data stream event record header_, the _data stream event
record context_, the _event record context_, and the _event record
payload_ fields.

    [data stream event record header field]
    [data stream event record context field]
    [event record context field]
    [event record payload field]

If there is more than one event record class in a given data stream
class, the data stream event record header field contains an event
record class ID field which indicates which event record class describes
the event record context and event record payload fields.

The encoded size of any event record must be at least 1{nbsp}bit.

[[field-type-alias-fragment]]
=== Field type alias fragment

A _field type alias fragment_ is a <<metadata-array,fragment>> which
associates a name (mdt:string) with a <<field-type-m-value,field type
m-value>> (complete field type mdt:map, or the name of a previously
defined field type alias).

The name of a field type alias can be used by field types which are
written after the field type alias fragment in the metadata mdt:array.

Within a given <<metadata-array,metadata mdt:array>>, two field type
alias fragments cannot have the same `name` property.

.Field type alias fragment
====
JSON representation:

[source,json]
{
  "fragment": "field-type-alias",
  "name": "uint8",
  "field-type": {
    "field-type": "int",
    "size": 8,
    "alignment": 8
  }
}
====

.Field type alias fragment giving another name to a previously defined alias
====
JSON representation:

[source,json]
{
  "fragment": "field-type-alias",
  "name": "uint8_t",
  "field-type": "uint8"
}
====
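
The ordering and uniqueness rules of field type aliases can be sketched in Python. The function name `resolve_aliases` is hypothetical, and the sketch only looks at field type alias fragments, ignoring every other fragment of the metadata mdt:array.

```python
def resolve_aliases(fragments):
    """Map each alias name to its complete field type map, in order."""
    aliases = {}
    for fragment in fragments:
        if fragment.get("fragment") != "field-type-alias":
            continue
        name, field_type = fragment["name"], fragment["field-type"]
        if name in aliases:
            raise ValueError("duplicate field type alias: %s" % name)
        if isinstance(field_type, str):
            # An alias name is only usable if it was defined earlier.
            if field_type not in aliases:
                raise ValueError("unknown field type alias: %s" % field_type)
            field_type = aliases[field_type]
        aliases[name] = field_type
    return aliases
```

Because the fragments are processed in the order of the metadata mdt:array, an alias which refers to a name defined later is rejected by this sketch.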

[[field-tag]]
=== Field tag mdt:map

A _field tag mdt:map_ ``tags'' a scope's field with a special meaning.

The following sections define specific field tag mdt:map values which
can be used in specific contexts.

[[discarded-event-record-count-field-tag]]
=== Discarded event record count field tag mdt:map

A _discarded event record count field tag mdt:map_ is a
<<field-tag,field tag>> which tags a field as being a counter of
discarded event records.

Event records can be discarded for multiple reasons from the producer's
perspective. This document specifies the available reasons, when event
records are lost for those reasons, and how to compute the total number
of discarded event records so far for a given <<data-stream,data
stream>>.

A field tagged with the `legacy` reason has the same behaviour as the
`events_discarded` field of the stream packet context of CTF v1.8.2.

The type of the field located using the `path` property _must_ be an
unsigned <<int-field-type,integer field type>>, an unsigned
<<enum-field-type,enumeration field type>>, an unsigned
<<varint-field-type,variable-length integer field type>>, or an unsigned
<<varenum-field-type,variable-length enumeration field type>>.

.Discarded event record count field tag
====
JSON representation:

[source,json]
{
  "tag": "discarded-event-record-count",
  "path": {
    "scope": "data-stream-packet-context",
    "path": ["number of discarded events"]
  },
  "reason": "legacy"
}
====

[[trace-class-fragment]]
=== Trace class fragment

A _trace class fragment_ is a <<metadata-array,fragment>> which defines
properties that are common to the whole trace, that is, to all the
<<data-stream,data streams>>.

Exactly one trace class fragment _must_ exist in a given
<<metadata-array,metadata mdt:array>>, and it _must_ precede any
<<data-stream-class-fragment,data stream class fragment>>.

Note that, if the `default-byte-order` property is missing, no field
type in the whole <<metadata-array,metadata mdt:array>> can have a
`byte-order` property set to `default`.

[options="header"]
|===
|Property |Type |Description |Required? |Default value

|`packet-header-field-type`
|<<field-type-m-value,Field type m-value>>
|Field type of the trace packet header field.

The name of this scope is `trace-packet-header`, to locate it
with an <<abs-field-path,absolute field path mdt:map>>.
|Optional
|1-bit aligned <<null-field-type,null field type>>.

|`tags`
|mdt:array of <<field-tag,field tag mdt:map values>>.
|Field tags of this trace class. See the allowed field tags below.

Upper layers may also define specific field tags that are allowed here.
|Optional
|Empty mdt:array value.
|===

.Trace class fragment
====
JSON representation:

[source,json]
{
  "fragment": "trace-class",
  "default-byte-order": "le",
  "uuid": "1255cddc-5afe-4b4a-92b2-17aa5dde2ea6",
  "packet-header-field-type": {
    "field-type": "struct",
    "fields": [
      {
        "name": "the magic",
        "field-type": "uint32"
      },
      {
        "name": "the uuid",
        "field-type": {
          "field-type": "array",
          "length": 16,
          "element-field-type": "uint8"
        }
      },
      {
        "name": "the data stream class ID",
        "field-type": "uint8"
      }
    ]
  },
  "tags": [
    {
      "tag": "magic",
      "path": {
        "scope": "trace-packet-header",
        "path": ["the magic"]
      }
    },
    {
      "tag": "uuid",
      "path": {
        "scope": "trace-packet-header",
        "path": ["the uuid"]
      }
    },
    {
      "tag": "data-stream-class-id",
      "path": {
        "scope": "trace-packet-header",
        "path": ["the data stream class ID"]
      }
    }
  ],
  "user-attrs": {
    "my ns": "yes"
  }
}
====

==== Allowed field tags targeting trace packet header fields

In addition to the tags below,
<<discarded-event-record-count-field-tag,discarded event record count
field tags>> are allowed, as is any field tag defined by the upper
layers of CTF{nbsp}2.

.Trace class fragment's allowed tags for the trace packet header fields
[options="header"]
|===
|Tag name |Meaning |Tagged field constraints

|`magic`
|**Magic number field**.

This field indicates the CTF{nbsp}2 data stream magic number.

|This field _must_ be the first field of the trace packet header field
type.

This field's type _must_ be a 32-bit unsigned <<int-field-type,integer
field type>>.

The value of this field in all the data streams _must_ be 3254525889
(0xc1fc1fc1).

|`uuid`
|**Trace's UUID field**.

This field indicates the UUID of the trace.

|This field's type _must_ be an <<array-field-type,array field type>>
of length 16, with an 8-bit aligned, 8-bit
<<int-field-type,integer field type>> as its element.

The value of this field in all the data streams _must_ be equal to the
binary equivalent of the trace class fragment's `uuid` property.

|`data-stream-class-id`
|**Data stream class ID field**.

This field indicates the numeric ID of the
<<data-stream-class-fragment,data stream class>> used to encode the rest
of the packet.

If this tag is not specified, the ID of the data stream class used to
encode the rest of the packet is implicitly 0.

If more than one field is tagged with this tag, the _last_ one to be
encoded in the entire trace packet header field is the effective ID of
the data stream class used to encode the rest of the <<packet,packet>>.

|This field's type _must_ be an unsigned
<<int-field-type,integer field type>>, an unsigned
<<enum-field-type,enumeration field type>>, an unsigned
<<varint-field-type,variable-length integer field type>>, or an
unsigned <<varenum-field-type,variable-length enumeration field type>>.

|[[data-stream-id]]`data-stream-id`
|**Data stream ID field**.

This field indicates the unique numeric ID of the <<data-stream,data
stream>> to which the packet belongs.

If this tag is not specified, the data stream to which the packet
belongs is identified using the storage or transport back-end of the
trace.

If more than one field is tagged with this tag, the _last_ one to be
encoded in the entire trace packet header field is the effective unique
data stream ID.

|This field's type _must_ be an unsigned
<<int-field-type,integer field type>>, an unsigned
<<enum-field-type,enumeration field type>>, an unsigned
<<varint-field-type,variable-length integer field type>>, or an
unsigned <<varenum-field-type,variable-length enumeration field type>>.
|===
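
A consumer could verify the `magic` and `uuid` constraints roughly as follows. This Python sketch rests on assumptions which are not part of the table above: the trace packet header starts with the 32-bit magic number field immediately followed by the 16 UUID bytes, the default byte order is little-endian, the "binary equivalent" of the `uuid` property is the usual 16-byte big-endian form of the UUID, and the name `check_packet_header` is hypothetical.

```python
import struct
import uuid

CTF2_MAGIC = 0xC1FC1FC1  # 3254525889


def check_packet_header(data, trace_uuid, byte_order="<"):
    """Validate a header laid out as: 32-bit magic, then 16 UUID bytes."""
    (magic,) = struct.unpack_from(byte_order + "I", data, 0)
    if magic != CTF2_MAGIC:
        raise ValueError("bad magic number: 0x%08x" % magic)
    if data[4:20] != uuid.UUID(trace_uuid).bytes:
        raise ValueError("packet UUID does not match the trace class UUID")
    return True
```

A magic number which only matches with the opposite byte order is a hint that the consumer assumed the wrong byte order for the data stream.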

[[data-stream-class-fragment]]
=== Data stream class fragment

A _data stream class fragment_ is a <<metadata-array,fragment>> which
defines properties that are common to one or more <<data-stream,data
streams>>.

More than one data stream class fragment may exist in a given
<<metadata-array,metadata mdt:array>>, but they must come after the
<<trace-class-fragment,trace class fragment>>.

Any <<event-record-class-fragment,event record class fragment>> which is
a child of a given data stream class fragment must come after the latter
in the metadata mdt:array.

[options="header"]
|===
|Property |Type |Description |Required? |Default value

|`packet-context-field-type`
|<<field-type-m-value,Field type m-value>>
|Field type of the data stream packet context field.

The name of this scope is `data-stream-packet-context`, to locate it
with an <<abs-field-path,absolute field path mdt:map>>.
|Optional
|1-bit aligned <<null-field-type,null field type>>.

|`event-record-context-field-type`
|<<field-type-m-value,Field type m-value>>
|Field type of the data stream event record context field.

The name of this scope is `data-stream-event-record-context`, to refer
to it with an <<abs-field-path,absolute field path mdt:map>>.
|Optional
|1-bit aligned <<null-field-type,null field type>>.

|`event-record-header-field-type`
|<<field-type-m-value,Field type m-value>>
|Field type of the data stream event record header field.

The name of this scope is `data-stream-event-record-header`, to refer
to it with an <<abs-field-path,absolute field path mdt:map>>.
|Optional
|1-bit aligned <<null-field-type,null field type>>.

|`tags`
|mdt:array of <<field-tag,field tag mdt:map values>>.
|Field tags of this data stream class. See the allowed field tags below.

Upper layers may also define specific field tags that are allowed here.
|Optional
|Empty mdt:array value.
|===

Two data stream class fragments with the same `id` property cannot exist
in the same <<metadata-array,metadata mdt:array>>.

.Data stream class fragment
====
JSON representation:

[source,json]
{
  "fragment": "data-stream-class",
  "id": 3,
  "packet-context-field-type": {
    "field-type": "struct",
    "fields": [
      {
        "name": "packet size",
        "field-type": "uint32"
      },
      {
        "name": "content size",
        "field-type": "uint32"
      },
      {
        "name": "discarded events",
        "field-type": "uint32"
      },
      {
        "name": "timestamp begin",
        "field-type": "uint64"
      },
      {
        "name": "timestamp end",
        "field-type": "uint64"
      },
      {
        "name": "cpu ID",
        "field-type": "uint8"
      }
    ]
  },
  "event-record-header-field-type": {
    "field-type": "struct",
    "fields": [
      {
        "name": "ID",
        "field-type": {
          "field-type": "enum",
          "size": 5,
          "members": {
            "compact": [{"lower": 0, "upper": 30}],
            "extended": [31]
          }
        }
      },
      {
        "name": "compact or extended",
        "field-type": {
          "field-type": "variant",
          "tag": ["ID"],
          "choices": [
            {
              "name": "compact",
              "field-type": {
                "field-type": "struct",
                "fields": [
                  {
                    "name": "timestamp",
                    "field-type": "uint27"
                  }
                ]
              }
            },
            {
              "name": "extended",
              "field-type": {
                "field-type": "struct",
                "fields": [
                  {
                    "name": "ID",
                    "field-type": "uint32"
                  },
                  {
                    "name": "timestamp",
                    "field-type": "uint64"
                  }
                ]
              }
            }
          ]
        }
      }
    ]
  },
  "tags": [
    {
      "tag": "packet-total-size",
      "path": {
        "scope": "data-stream-packet-context",
        "path": ["packet size"]
      }
    },
    {
      "tag": "packet-content-size",
      "path": {
        "scope": "data-stream-packet-context",
        "path": ["content size"]
      }
    },
    {
      "tag": "discarded-event-record-count",
      "path": {
        "scope": "data-stream-packet-context",
        "path": ["discarded events"]
      },
      "reason": "legacy"
    },
    {
      "tag": "event-record-class-id",
      "path": {
        "scope": "data-stream-event-record-header",
        "path": ["ID"]
      }
    },
    {
      "tag": "event-record-class-id",
      "path": {
        "scope": "data-stream-event-record-header",
        "path": ["compact or extended", "ID"]
      }
    }
  ],
  "user-attrs": {
    "my ns": "yes"
  }
}
====

==== Allowed field tags targeting data stream packet context fields

In addition to the tags below,
<<discarded-event-record-count-field-tag,discarded event record count
field tags>> are allowed, as is any field tag defined by the upper
layers of CTF{nbsp}2.

.Data stream class fragment's allowed tags for the data stream packet context fields
[options="header"]
|===
|Tag name |Meaning |Tagged field constraints

|`packet-total-size`
|**Packet's total size field**.

This field indicates the size, in bits, of the _whole_ <<packet,packet>>
in which this data stream packet context field is encoded. This size
includes the padding bits after the packet's content, if any.

If this tag is not specified, this is the only packet of its data
stream, that is, it ends where the <<data-stream,data stream>> ends.

|This field's type _must_ be an unsigned
<<int-field-type,integer field type>>, an unsigned
<<enum-field-type,enumeration field type>>, an unsigned
<<varint-field-type,variable-length integer field type>>, or an
unsigned <<varenum-field-type,variable-length enumeration field type>>.

|`packet-content-size`
|**Packet's content size field**.

This field indicates the content size, in bits, of the <<packet,packet>>
in which this data stream packet context field is encoded. The packet's
content size is the number of bits from the packet's first bit to the
last bit of the last event record (included). The difference between the
packet's total size and the packet's content size is the padding size.

If this tag is not specified, the packet has no padding bits, thus its
content size is the same as its total size.

|This field's type _must_ be an unsigned
<<int-field-type,integer field type>>, an unsigned
<<enum-field-type,enumeration field type>>, an unsigned
<<varint-field-type,variable-length integer field type>>, or an
unsigned <<varenum-field-type,variable-length enumeration field type>>.

|`packet-sequence-number`
|**Packet's sequence number field**.

This field indicates the sequence number of the packet in which this
data stream packet context field is encoded. This is the zero-based
index of the packet within its <<data-stream,data stream>>.

|This field's type _must_ be an unsigned
<<int-field-type,integer field type>>, an unsigned
<<enum-field-type,enumeration field type>>, an unsigned
<<varint-field-type,variable-length integer field type>>, or an
unsigned <<varenum-field-type,variable-length enumeration field type>>.
|===

==== Allowed field tags targeting data stream event record header fields

In addition to the tags below, any field tag defined by the upper layers
of CTF{nbsp}2 is allowed.

.Data stream class fragment's allowed tags for the data stream event record header fields
[options="header"]
|===
|Tag name |Meaning |Tagged field constraints

|`event-record-class-id`
|**Event record class ID field**.

This field indicates the numeric ID of the
<<event-record-class-fragment,event record class>> used to encode the
rest of the <<event-record,event record>>.

If this tag is not specified, the ID of the event record class used to
encode the rest of the event record is implicitly 0.

If more than one field is tagged with this tag, the _last_ one to be
encoded in the entire data stream event record header field is the
effective ID of the event record class used to encode the rest of the
event record.

|This field's type _must_ be an unsigned
<<int-field-type,integer field type>>, an unsigned
<<enum-field-type,enumeration field type>>, an unsigned
<<varint-field-type,variable-length integer field type>>, or an
unsigned <<varenum-field-type,variable-length enumeration field type>>.
|===
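
The "last one wins" rule can be sketched in Python. The function name `effective_class_id` is hypothetical. Returning 0 when the tag is specified but none of the tagged fields was actually encoded (for example, when they all belong to unselected variant choices) is an assumption of this sketch.

```python
def effective_class_id(tagged_values):
    """Return the effective class ID from the values of the tagged fields
    which were actually encoded, listed in encoding order."""
    if not tagged_values:
        # No tagged field: the ID is implicitly 0.
        return 0
    # The last tagged field to be encoded is the effective ID.
    return tagged_values[-1]
```

With a 5-bit `ID` field followed by a variant whose `extended` choice holds a second tagged `ID` field, a header encoding 31 and then 1234 has the effective event record class ID 1234, whereas a compact header encoding only 12 has the effective ID 12.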

[[event-record-class-fragment]]
=== Event record class fragment

An _event record class fragment_ is a <<metadata-array,fragment>> which
defines properties that are common to one or more event records.

More than one event record class fragment may exist in a given
<<metadata-array,metadata mdt:array>>, but they must come after their
parent <<data-stream-class-fragment,data stream class fragment>>.

[options="header"]
|===
|Property |Type |Description |Required? |Default value

|`context-field-type`
|<<field-type-m-value,Field type m-value>>
|Field type of the event record context field.

The name of this scope is `event-record-context`, to locate it
with an <<abs-field-path,absolute field path mdt:map>>.
|Optional
|1-bit aligned <<null-field-type,null field type>>.

|`payload-field-type`
|<<field-type-m-value,Field type m-value>>
|Field type of the event record payload field.

The name of this scope is `event-record-payload`, to locate it
with an <<abs-field-path,absolute field path mdt:map>>.
|Optional
|1-bit aligned <<null-field-type,null field type>>.

|`tags`
|mdt:array of <<field-tag,field tag mdt:map values>>.
|Field tags of this event record class.

Upper layers may also define specific field tags that are allowed here.
|Optional
|Empty mdt:array value.
|===

Two event record class fragments with the same `id` property, and with
the same `parent-data-stream-class-id` property, cannot exist in the
same <<metadata-array,metadata mdt:array>>.

.Event record class fragment
====
JSON representation:

[source,json]
{
  "fragment": "event-record-class",
  "id": 45,
  "parent-data-stream-class-id": 3,
  "payload-field-type": {
    "field-type": "struct",
    "fields": [
      {
        "name": "prev_comm",
        "field-type": {
          "field-type": "textarray",
          "length": 16
        }
      },
      {"name": "prev_tid", "field-type": "uint32"},
      {"name": "prev_prio", "field-type": "uint32"},
      {"name": "prev_state", "field-type": "uint64"},
      {
        "name": "next_comm",
        "field-type": {
          "field-type": "textarray",
          "length": 16
        }
      },
      {"name": "next_tid", "field-type": "uint32"},
      {"name": "next_prio", "field-type": "uint32"}
    ]
  },
  "user-attrs": {
    "my ns": "yes"
  }
}
====

[[layer-3]]
== Layer 3: Timekeeping

This layer adds concepts of time to the CTF{nbsp}2 specification. With
this layer, timestamps _may_ be associated with packet and event record
fields.

[[data-stream-clock-class-fragment]]
=== Data stream clock class fragment

A _data stream clock class fragment_ is a <<metadata-array,fragment>>
which defines properties that are common to one or more data stream
clocks.

[cols="5*"]
|===
|`freq`
|mdt:int
|Frequency (Hz).
|Required
|
|===

Two data stream clock class fragments with the same `name` property
cannot exist in the same <<metadata-array,metadata mdt:array>>.

See _Common Trace Format v1.8.2_, section 8, for more information about
data stream clock classes (equivalent to TSDL's `clock` block).

.Data stream clock class fragment
====
JSON representation:

[source,json]
{
  "fragment": "data-stream-clock-class",
  "name": "monotonic",
  "uuid": "96a25753-91af-4602-a71b-d53b7c3dde45",
  "freq": 1000000000,
  "error-cycles": 1000,
  "offset-seconds": 1326476837,
  "offset-cycles": 897235420,
  "is-absolute": true,
  "user-attrs": {
    "my ns": "yes"
  }
}
====
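
To make the role of `freq`, `offset-seconds`, and `offset-cycles` more
concrete, here is a Python sketch which converts a clock value in cycles
to nanoseconds from the clock's origin. It assumes that the semantics of
CTF v1.8.2, section 8, carry over unchanged (offset in seconds plus
offset in cycles); the function name is hypothetical and missing offsets
are read as 0.

```python
NS_PER_SECOND = 1_000_000_000


def cycles_to_ns_from_origin(clock_class, cycles):
    """Convert a clock value in cycles to nanoseconds from the clock's origin.

    Assumes CTF v1.8.2 semantics: `offset-seconds` is in seconds and
    `offset-cycles` is in cycles of `freq`. Missing offsets read as 0 here
    (an assumption of this sketch).
    """
    freq = clock_class["freq"]
    total_cycles = clock_class.get("offset-cycles", 0) + cycles
    # Integer arithmetic avoids floating-point rounding on large values.
    return (
        clock_class.get("offset-seconds", 0) * NS_PER_SECOND
        + total_cycles * NS_PER_SECOND // freq
    )


# Values copied from the example fragment above.
monotonic = {
    "fragment": "data-stream-clock-class",
    "name": "monotonic",
    "freq": 1000000000,
    "offset-seconds": 1326476837,
    "offset-cycles": 897235420,
}
print(cycles_to_ns_from_origin(monotonic, 1000))  # 1326476837897236420
```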

[[data-stream-clock]]
=== Data stream clock

A _data stream clock_ is an instance of a
<<data-stream-clock-class-fragment,data stream clock class>>.

Each <<data-stream,data stream>> has one such clock instance for each
known data stream clock class.

A data stream clock is an integer variable initialized to 0 before
decoding the <<data-stream,data stream>> to which it is attached. This
variable holds the current value, in cycles, for this specific data
stream, of its data stream clock class.

When any field tagged with one of the data stream clock update tags is
decoded, its associated data stream clock can be updated.
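
As an illustration only, the state described above can be modelled as
one integer per clock class name, kept separately for each data stream.
The class below is a hypothetical helper, and it assumes that "updating"
a clock means assigning the decoded field value to it.

```python
class DataStreamClocks:
    """Clock values, in cycles, of one data stream (hypothetical helper)."""

    def __init__(self, clock_class_names):
        # One clock per known data stream clock class, initialized to 0
        # before the data stream is decoded.
        self.cycles = {name: 0 for name in clock_class_names}

    def update(self, clock_class_name, field_value):
        # Assumption: "update" means plain assignment of the decoded value.
        self.cycles[clock_class_name] = field_value


# Two data streams sharing the same clock class keep independent clocks.
stream_a = DataStreamClocks(["monotonic"])
stream_b = DataStreamClocks(["monotonic"])
stream_a.update("monotonic", 1500)
print(stream_a.cycles["monotonic"], stream_b.cycles["monotonic"])  # 1500 0
```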

[[update-data-stream-clock-now-field-tag]]
=== Update data stream clock now field tag mdt:map

An _update data stream clock now field tag mdt:map_ is a
<<field-tag,field tag>> which tags a field to update a data stream clock
with its value when it is decoded.

This tag _may_ be used in the `tags` property of a
<<trace-class-fragment,trace class fragment mdt:map>>, a
<<data-stream-class-fragment,data stream class fragment mdt:map>>, or an
<<event-record-class-fragment,event record class fragment mdt:map>>.

If this tag is used, a matching <<data-stream-clock-class-fragment,data
stream clock class fragment>> _must_ exist in the
<<metadata-array,metadata mdt:array>> before the fragment in which it is
used.

The tagged field's type _must_ be an unsigned <<int-field-type,integer
field type>>, an unsigned <<enum-field-type,enumeration field type>>, an
unsigned <<varint-field-type,variable-length integer field type>>, or an
unsigned <<varenum-field-type,variable-length enumeration field type>>.

.Data stream class fragment using this field tag
====
JSON representation:

[source,json]
{
  "fragment": "data-stream-class",
  "id": 2,
  "event-record-header-field-type": {
    "field-type": "struct",
    "fields": [
      {
        "name": "ts",
        "field-type": "uint32"
      }
    ]
  },
  "tags": [
    {
      "tag": "update-data-stream-clock-now",
      "data-stream-clock-class-name": "monotonic",
      "path": {
        "scope": "data-stream-event-record-header",
        "path": ["ts"]
      }
    }
  ]
}
====
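
The ordering requirement above (the clock class fragment comes before
the fragment which uses the tag) lends itself to a single pass over the
metadata mdt:array. The sketch below is hypothetical, not normative: it
assumes that "matching" means the tag's `data-stream-clock-class-name`
property equals the clock class's `name` property, as in the example
above.

```python
CLOCK_UPDATE_TAGS = {
    "update-data-stream-clock-now",
    "update-data-stream-clock-after-packet",
}


def find_unresolved_clock_tags(metadata):
    """Return (fragment index, clock class name) for each clock update tag
    whose clock class fragment does not appear earlier in the array."""
    known = set()
    problems = []
    for index, frag in enumerate(metadata):
        if not isinstance(frag, dict):
            continue  # e.g. the leading "CTF 2" string
        if frag.get("fragment") == "data-stream-clock-class":
            known.add(frag.get("name"))
        for tag in frag.get("tags", []):
            name = tag.get("data-stream-clock-class-name")
            if tag.get("tag") in CLOCK_UPDATE_TAGS and name not in known:
                problems.append((index, name))
    return problems


# The clock class is declared too late here, so the tag is reported.
metadata = [
    "CTF 2",
    {
        "fragment": "data-stream-class",
        "tags": [{
            "tag": "update-data-stream-clock-now",
            "data-stream-clock-class-name": "monotonic",
            "path": {"scope": "data-stream-event-record-header", "path": ["ts"]},
        }],
    },
    {"fragment": "data-stream-clock-class", "name": "monotonic", "freq": 1000000000},
]
print(find_unresolved_clock_tags(metadata))  # [(1, 'monotonic')]
```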

[[update-data-stream-clock-after-packet-field-tag]]
=== Update data stream clock after packet field tag mdt:map

An _update data stream clock after packet field tag mdt:map_ is a
<<field-tag,field tag>> which tags a field to update a data stream clock
with its value after the whole packet is decoded (delayed update).

This tag is only allowed in the `tags` property of a
<<data-stream-class-fragment,data stream class fragment mdt:map>>. It
_must_ target the `data-stream-packet-context` scope.

If this tag is used, a matching <<data-stream-clock-class-fragment,data
stream clock class fragment>> _must_ exist in the
<<metadata-array,metadata mdt:array>> before the fragment in which it is
used.

The tagged field's type _must_ be an unsigned <<int-field-type,integer
field type>>, an unsigned <<enum-field-type,enumeration field type>>, an
unsigned <<varint-field-type,variable-length integer field type>>, or an
unsigned <<varenum-field-type,variable-length enumeration field type>>.
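
The difference between an immediate and a delayed update shows up in the
order of operations while decoding one packet. The following Python
sketch is only one way to read the text above: the function name and the
tuple layout are hypothetical, and an update is assumed to be a plain
assignment.

```python
def apply_clock_updates(decoded_fields, clocks):
    """Apply clock update tags for the fields of one packet.

    `decoded_fields` is a list of (tag, clock_class_name, value) tuples in
    decoding order; `clocks` maps clock class names to values in cycles.
    Returns the clock value seen by each field right after it is decoded.
    """
    deferred = {}
    seen = []
    for tag, name, value in decoded_fields:
        if tag == "update-data-stream-clock-now":
            clocks[name] = value
        elif tag == "update-data-stream-clock-after-packet":
            deferred[name] = value  # held until the packet is fully decoded
        seen.append(clocks[name])
    clocks.update(deferred)  # delayed update, once the packet ends
    return seen


clocks = {"clock src": 0}
seen = apply_clock_updates(
    [
        ("update-data-stream-clock-now", "clock src", 100),           # packet begin
        ("update-data-stream-clock-after-packet", "clock src", 900),  # packet end
        ("update-data-stream-clock-now", "clock src", 250),           # event record
    ],
    clocks,
)
print(seen, clocks["clock src"])  # [100, 100, 250] 900
```

The packet end value does not disturb the event records decoded after
the packet context; it only becomes the clock's value once the packet is
over.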

== Complete metadata stream example

Here's an example of a complete, valid <<metadata-stream,metadata
stream>>. It contains one <<data-stream-class-fragment,data stream class
fragment>> with two <<event-record-class-fragment,event record class
fragments>> as its children.

[source,json]
----
[
  "CTF 2",
  {
    "fragment": "field-type-alias",
    "name": "uint8",
    "field-type": {
      "field-type": "int",
      "size": 8,
      "align": 8
    }
  },
  {
    "fragment": "field-type-alias",
    "name": "uint16",
    "field-type": {
      "field-type": "int",
      "size": 16,
      "align": 8
    }
  },
  {
    "fragment": "field-type-alias",
    "name": "uint32",
    "field-type": {
      "field-type": "int",
      "size": 32,
      "align": 8
    }
  },
  {
    "fragment": "field-type-alias",
    "name": "uint64",
    "field-type": {
      "field-type": "int",
      "size": 64,
      "align": 8
    }
  },
  {
    "fragment": "field-type-alias",
    "name": "int64",
    "field-type": {
      "field-type": "int",
      "size": 64,
      "align": 8,
      "signed": true
    }
  },
  {
    "fragment": "field-type-alias",
    "name": "uint27",
    "field-type": {
      "field-type": "int",
      "size": 27
    }
  },
  {
    "fragment": "field-type-alias",
    "name": "uuid",
    "field-type": {
      "field-type": "array",
      "alignment": 8,
      "length": 16,
      "element-field-type": "uint8"
    }
  },
  {
    "fragment": "trace-class",
    "default-byte-order": "le",
    "uuid": "908d7a0d-2bbc-4584-ba3b-a9c73a62f52e",
    "packet-header-field-type": {
      "field-type": "struct",
      "fields": [
        {
          "name": "the magic",
          "field-type": "uint32"
        },
        {
          "name": "the UUID",
          "field-type": "uuid"
        },
        {
          "name": "ids",
          "field-type": {
            "field-type": "struct",
            "fields": [
              {
                "name": "data stream class",
                "field-type": "uint8"
              },
              {
                "name": "data stream",
                "field-type": "uint8"
              }
            ]
          }
        }
      ]
    },
    "tags": [
      {
        "tag": "magic",
        "path": {
          "scope": "trace-packet-header",
          "path": ["the magic"]
        }
      },
      {
        "tag": "uuid",
        "path": {
          "scope": "trace-packet-header",
          "path": ["the UUID"]
        }
      },
      {
        "tag": "data-stream-class-id",
        "path": {
          "scope": "trace-packet-header",
          "path": ["ids", "data stream class"]
        }
      },
      {
        "tag": "data-stream-id",
        "path": {
          "scope": "trace-packet-header",
          "path": ["ids", "data stream"]
        }
      }
    ]
  },
  {
    "fragment": "data-stream-clock-class",
    "name": "clock src",
    "uuid": "96a25753-91af-4602-a71b-d53b7c3dde45",
    "freq": 1000000000,
    "error-cycles": 1000,
    "offset-seconds": 1326476837,
    "offset-cycles": 897235420,
    "is-absolute": true
  },
  {
    "fragment": "data-stream-class",
    "packet-context-field-type": {
      "field-type": "struct",
      "fields": [
        {
          "name": "packet sizes",
          "field-type": {
            "field-type": "struct",
            "fields": [
              {
                "name": "total",
                "field-type": "uint32"
              },
              {
                "name": "content",
                "field-type": "uint32"
              }
            ]
          }
        },
        {
          "name": "discarded events",
          "field-type": "uint32"
        },
        {
          "name": "sequence number",
          "field-type": "uint64"
        },
        {
          "name": "packet begin TS",
          "field-type": "uint64"
        },
        {
          "name": "packet end TS",
          "field-type": "uint64"
        },
        {
          "name": "cpu ID",
          "field-type": "uint8"
        }
      ]
    },
    "event-record-header-field-type": {
      "field-type": "struct",
      "fields": [
        {
          "name": "ID",
          "field-type": {
            "field-type": "enum",
            "size": 5,
            "members": {
              "compact": [{"lower": 0, "upper": 30}],
              "extended": [31]
            }
          }
        },
        {
          "name": "compact or extended",
          "field-type": {
            "field-type": "variant",
            "tag": ["ID"],
            "choices": [
              {
                "name": "compact",
                "field-type": {
                  "field-type": "struct",
                  "fields": [
                    {
                      "name": "timestamp",
                      "field-type": "uint27"
                    }
                  ]
                }
              },
              {
                "name": "extended",
                "field-type": {
                  "field-type": "struct",
                  "fields": [
                    {
                      "name": "ID",
                      "field-type": "uint32"
                    },
                    {
                      "name": "timestamp",
                      "field-type": "uint64"
                    }
                  ]
                }
              }
            ]
          }
        }
      ]
    },
    "tags": [
      {
        "tag": "packet-total-size",
        "path": {
          "scope": "data-stream-packet-context",
          "path": ["packet sizes", "total"]
        }
      },
      {
        "tag": "packet-content-size",
        "path": {
          "scope": "data-stream-packet-context",
          "path": ["packet sizes", "content"]
        }
      },
      {
        "tag": "discarded-event-record-count",
        "path": {
          "scope": "data-stream-packet-context",
          "path": ["discarded events"]
        }
      },
      {
        "tag": "packet-sequence-number",
        "path": {
          "scope": "data-stream-packet-context",
          "path": ["sequence number"]
        }
      },
      {
        "tag": "update-data-stream-clock-now",
        "data-stream-clock-class-name": "clock src",
        "path": {
          "scope": "data-stream-packet-context",
          "path": ["packet begin TS"]
        }
      },
      {
        "tag": "update-data-stream-clock-after-packet",
        "data-stream-clock-class-name": "clock src",
        "path": {
          "scope": "data-stream-packet-context",
          "path": ["packet end TS"]
        }
      },
      {
        "tag": "event-record-class-id",
        "path": {
          "scope": "data-stream-event-record-header",
          "path": ["ID"]
        }
      },
      {
        "tag": "event-record-class-id",
        "path": {
          "scope": "data-stream-event-record-header",
          "path": ["compact or extended", "ID"]
        }
      },
      {
        "tag": "update-data-stream-clock-now",
        "data-stream-clock-class-name": "clock src",
        "path": {
          "scope": "data-stream-event-record-header",
          "path": ["compact or extended", "timestamp"]
        }
      }
    ]
  },
  {
    "fragment": "field-type-alias",
    "name": "comm",
    "field-type": {
      "field-type": "textarray",
      "length": 16
    }
  },
  {
    "fragment": "event-record-class",
    "id": 0,
    "payload-field-type": {
      "field-type": "struct",
      "fields": [
        {"name": "prev_comm", "field-type": "comm"},
        {"name": "prev_tid", "field-type": "uint32"},
        {"name": "prev_prio", "field-type": "uint32"},
        {"name": "prev_state", "field-type": "uint64"},
        {"name": "next_comm", "field-type": "comm"},
        {"name": "next_tid", "field-type": "uint32"},
        {"name": "next_prio", "field-type": "uint32"}
      ]
    }
  },
  {
    "fragment": "event-record-class",
    "id": 1,
    "payload-field-type": {
      "field-type": "struct",
      "fields": [
        {"name": "ret", "field-type": "int64"},
        {"name": "addr", "field-type": "uint64"},
        {"name": "len", "field-type": "uint64"},
        {"name": "prot", "field-type": "uint64"},
        {"name": "flags", "field-type": "uint64"},
        {"name": "fd", "field-type": "uint64"},
        {"name": "pgoff", "field-type": "uint64"}
      ]
    }
  }
]
----
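
The tags in the example above designate fields with absolute field
paths, some of which cross the `compact or extended` variant. As a rough
illustration, the Python sketch below resolves such a path against a
parsed fragment. The scope-to-property table is inferred from the
examples in this document, aliases are not expanded, and a path is
assumed to cross a variant into whichever choice is selected, so several
candidate field types may be returned. All names are hypothetical.

```python
SCOPE_PROPERTIES = {
    "trace-packet-header": "packet-header-field-type",
    "data-stream-packet-context": "packet-context-field-type",
    "data-stream-event-record-header": "event-record-header-field-type",
    "event-record-context": "context-field-type",
    "event-record-payload": "payload-field-type",
}


def _structs_of(field_type):
    """Yield the struct field types reachable without consuming a path item."""
    if not isinstance(field_type, dict):
        return  # alias name: not expanded in this sketch
    if field_type["field-type"] == "struct":
        yield field_type
    elif field_type["field-type"] == "variant":
        # Assumption: a path crosses a variant into any of its choices.
        for choice in field_type["choices"]:
            yield from _structs_of(choice["field-type"])


def resolve_field_path(fragment, field_path):
    """Return the field types that an absolute field path may designate."""
    candidates = [fragment[SCOPE_PROPERTIES[field_path["scope"]]]]
    for name in field_path["path"]:
        next_candidates = []
        for field_type in candidates:
            for struct in _structs_of(field_type):
                next_candidates += [
                    f["field-type"] for f in struct["fields"] if f["name"] == name
                ]
        candidates = next_candidates
    return candidates


# Trimmed copy of the event record header from the example above.
fragment = {
    "fragment": "data-stream-class",
    "event-record-header-field-type": {
        "field-type": "struct",
        "fields": [{
            "name": "compact or extended",
            "field-type": {
                "field-type": "variant",
                "tag": ["ID"],
                "choices": [
                    {"name": "compact", "field-type": {
                        "field-type": "struct",
                        "fields": [{"name": "timestamp", "field-type": "uint27"}],
                    }},
                    {"name": "extended", "field-type": {
                        "field-type": "struct",
                        "fields": [
                            {"name": "ID", "field-type": "uint32"},
                            {"name": "timestamp", "field-type": "uint64"},
                        ],
                    }},
                ],
            },
        }],
    },
}
path = {"scope": "data-stream-event-record-header",
        "path": ["compact or extended", "timestamp"]}
print(resolve_field_path(fragment, path))  # ['uint27', 'uint64']
```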