[lttng-dev] RFC (v2) Streaming and reading traces over the network

Julien Desfossez julien.desfossez at efficios.com
Wed Apr 18 15:00:18 EDT 2012


RFC - Streaming And Reading Traces Over The Network

Author: Julien Desfossez <julien.desfossez at efficios.com>

Contributors:
    * Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
    * David Goulet <david.goulet at efficios.com>

Version:
    - v0.1: 26/03/2012
        * Initial proposal
    - v0.2: 17/04/2012
        * First revision


Introduction
------------

This RFC proposes a way for the LTTng kernel and user-space tracers to
stream traces over the network, and for a viewer to read traces while they
are being generated.


Changelog
---------

The first version of this RFC was posted on March 26th 2012 on the
lttng-dev mailing list. This version is a rewrite that adds more details
about the inner workings of the streaming protocol and clarifies the
synchronization operations.


Prerequisites/assumptions
-------------------------

- work over TCP and UDP
- play nicely with NAT
- trace data and control data are exchanged on different connections
  and possibly on different protocols
- control data is mandatory and must use a reliable connection (TCP)
- trace packets (as defined in CTF) may arrive in any order when using a
  connection-less protocol.


Name convention
---------------

- A : the traced machine streaming trace data over the network to B
- B : the remote consumer, receiving data from A
- C : the viewer on B displaying the streamed trace to the user
- trace data : the multiplexed flow containing all the trace streams


Creating a network session
--------------------------

The first step when creating a trace to stream over the network is to
create a tracing session on A that contains all the information needed to
reach B. The inner details of this part are covered in a separate RFC from
David Goulet; let's just say that it includes the IP/name of the receiving
machine (B) as well as the control and data ports/protocols.

Once the session is started, A sends data to B over the control and data
paths. The data path contains only trace data; the control path is streamed
over a reliable network protocol and contains the session information,
indexes and synchronization information.

When a network trace starts, A announces each of its streams to B over the
control path. The information associated with each stream is A's hostname,
the session name, the channel name, and the stream id. B responds with a
stream handle identifier, unique across B, for each stream.
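
As a rough sketch, the per-stream exchange on the control path could carry
messages along these lines; the structure names, field sizes and the reply
code are assumptions for illustration, not an actual wire format:

    #include <stdint.h>

    /* Hypothetical control-path messages; names and sizes are assumptions. */
    struct stream_announce {                /* A -> B, one message per stream */
            char     hostname[64];          /* A's hostname */
            char     session_name[256];
            char     channel_name[256];
            uint64_t stream_id;             /* stream id, local to A */
    };

    struct stream_announce_reply {          /* B -> A */
            uint64_t stream_handle;         /* handle unique across B */
            uint32_t ret_code;              /* 0 on success */
    };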

B creates the folder hierarchy and acknowledges when it is ready; the trace
then starts on A and the streaming begins.

Upon CPU hotplug, which dynamically adds streams to A, a control message is
exchanged with B to announce the new stream, along with the hostname,
session name and stream id. B responds with a unique stream handle
identifier.


Trace packets encapsulation
---------------------------

The data network stream received by B contains all the trace streams
multiplexed. In order to write the data into the appropriate trace files
and ensure the data is written in order, a network header must be added by
the tracer. This header, located in the CTF packet header, contains the
following unsigned 64-bit identifiers (a sketch of a possible layout
follows the list):

- #stream_handle : the unique stream handle identifier
- #seq : a sequence number relative to each stream
- #prev_seq : the previous sequence number, used to determine if a packet
  was lost by the network or the tracer
- #circuit_id : unique routing ID across all proxies, A, and B (unused for
  now, set to 0).
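
As a rough illustration, the reserved header could be laid out as follows;
the struct name and the field order are assumptions, only the four 64-bit
fields come from the list above:

    #include <stdint.h>

    /* Hypothetical layout of the network header reserved in the CTF packet
     * header; struct name and field order are assumptions. */
    struct packet_net_header {
            uint64_t stream_handle;   /* handle returned by B */
            uint64_t seq;             /* per-stream sequence number (tracer) */
            uint64_t prev_seq;        /* sequence number of the previous packet */
            uint64_t circuit_id;      /* reserved for proxying, 0 for now */
    };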

Apart from #seq, which is generated by the tracer, the other identifiers
are inserted by the consumer on A before sending the packet. In order to do
so without having to copy the data, when the tracer generates a trace
packet, it leaves an empty space at the position where those fields are
located and provides API/ABI calls to allow the consumer to find the
offsets and fill them with the appropriate values. Then the packet is
directly spliced over the network.

Note that the API/ABI calls to get the positions where to write the
stream id and prev_seq value are provided by the tracers and do _not_
involve any interaction with a CTF reader.
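
A minimal sketch of the consumer-side fill-in, assuming the tracer's
offset-lookup call has already provided a pointer to the reserved header
area; the offset values and the helper name are hypothetical:

    #include <stdint.h>
    #include <string.h>

    /* Offsets of the consumer-filled fields inside the reserved network
     * header; the values below are assumptions for illustration. */
    #define NET_HDR_STREAM_HANDLE_OFF  0
    #define NET_HDR_PREV_SEQ_OFF       16
    #define NET_HDR_CIRCUIT_ID_OFF     24

    /*
     * Sketch: fill the fields the consumer is responsible for.  #seq is
     * not touched here, it is written by the tracer itself.
     */
    static void fill_net_header(char *net_hdr, uint64_t stream_handle,
                                uint64_t prev_seq)
    {
            uint64_t circuit_id = 0;        /* unused for now */

            memcpy(net_hdr + NET_HDR_STREAM_HANDLE_OFF, &stream_handle,
                   sizeof(stream_handle));
            memcpy(net_hdr + NET_HDR_PREV_SEQ_OFF, &prev_seq,
                   sizeof(prev_seq));
            memcpy(net_hdr + NET_HDR_CIRCUIT_ID_OFF, &circuit_id,
                   sizeof(circuit_id));
    }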

Note that circuit_id is present for future use in the case of routing
across proxy consumers.


Synchronization
---------------

In order to allow the trace viewer to display the traces without the risk
of later receiving information belonging to a timestamp prior to the
information already displayed (which could be caused by low-traffic
streams), we have to define a buffer flush frequency and a synchronization
algorithm.

In order to avoid sending empty trace data for inactive streams, the
consumer on A is in charge of the synchronization information. At a
predefined frequency, the consumer must trigger a buffer flush and, for
each stream, save the current sequence number. The trace packets are then
sent over the network. During that time, the consumer generates a
synchronization packet to send on the control path. This packet contains,
for each stream, the sequence number sampled before its last buffer flush.
When B receives this packet, it knows that it is safe to read the trace up
to, and including, each sequence number. If a stream is inactive and did
not generate any trace data since the last buffer flush, the last known
sequence number is sent.
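
A sketch of this periodic synchronization on A's consumer, assuming
hypothetical helpers for the buffer flush and the control-path send (the
names and the message layout are illustrative only):

    #include <stddef.h>
    #include <stdint.h>

    /* One entry per stream in the control-path synchronization packet:
     * B may safely read each stream up to and including last_safe_seq. */
    struct sync_entry {
            uint64_t stream_handle;
            uint64_t last_safe_seq;
    };

    /* Hypothetical consumer helpers, declared here for illustration only. */
    extern void flush_stream_buffers(size_t stream_idx);
    extern uint64_t stream_handle_of(size_t stream_idx);
    extern uint64_t current_stream_seq(size_t stream_idx);
    extern void send_control_sync(const struct sync_entry *entries, size_t nr);

    /* Called at the predefined synchronization frequency on A. */
    static void synchronize_streams(struct sync_entry *entries, size_t nr)
    {
            size_t i;

            for (i = 0; i < nr; i++) {
                    flush_stream_buffers(i);
                    entries[i].stream_handle = stream_handle_of(i);
                    /* For an inactive stream this is simply the last
                     * known sequence number. */
                    entries[i].last_safe_seq = current_stream_seq(i);
            }
            send_control_sync(entries, nr);
    }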


Receiving the trace data
------------------------

On B, when trace packets arrive, they must be saved on disk. But since the
medium may not be considered reliable (UDP for example), packets may be
lost or arrive in a different order, while the trace data must be written
in order on disk. When B receives a packet, it reads the stream handle (at
a fixed position in the packet) to determine in which trace file to write.
Then, looking at the sequence and previous sequence numbers, it can
determine whether it must queue the packet (to wait for a previous packet
to arrive) or whether it can write it to disk. The previous packet ID
(#prev_seq) is a way to detect whether data has been discarded by the
tracer or by the network transport. When packets are not received in
consecutive order, B must assume that the missing packet(s) may arrive
later and stop writing trace data for this trace file until the gap is
closed. UDP packets can be lost forever, so a threshold (number of packets
queued or a timeout) must be defined to avoid waiting forever. When this
threshold is reached, B can either discard the missing packet or ask for a
retransmission (if supported on A).
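
A sketch of the per-packet decision B could make for each stream, assuming
a minimal per-stream reassembly state; the names and fields are
illustrative:

    #include <stdbool.h>
    #include <stdint.h>

    /* Minimal per-stream reassembly state on B (illustrative only). */
    struct stream_reorder {
            uint64_t last_written_seq;   /* seq of the last packet written */
            bool     wrote_first;        /* false until something is written */
    };

    /*
     * Decide whether an incoming packet can be written to its trace file
     * right away, or must be queued until the gap before it is closed.
     * For simplicity this sketch assumes the very first packet received
     * for a stream is also the first one sent.
     */
    static bool can_write_now(const struct stream_reorder *st,
                              uint64_t prev_seq)
    {
            if (!st->wrote_first)
                    return true;
            /* The packet's predecessor is the last one written: in order.
             * This also covers packets discarded by the tracer, since
             * #prev_seq then points to the last packet actually sent. */
            if (prev_seq == st->last_written_seq)
                    return true;
            /* Gap detected: queue the packet and wait (up to a threshold
             * or timeout) for the missing packet(s). */
            return false;
    }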



