[lttng-dev] [RFC] Per-user event ID allocation proposal

Mathieu Desnoyers mathieu.desnoyers at efficios.com
Fri Sep 14 08:22:18 EDT 2012

* David Goulet (dgoulet at efficios.com) wrote:
> Hi Mathieu,
> This looks good! I have some questions to clarify part of the RFC.
> Mathieu Desnoyers:
> > Mathieu Desnoyers September 11, 2012
> > 
> > Per-user event ID allocation proposal
> > 
> > The intent of this shared event ID registry is to allow sharing
> > tracing buffers between applications belonging to the same user
> > (UID) for UST (user-space) tracing.
> > 
> > A.1) Overview of System-wide, per-ABI (32 and 64-bit), per-user, 
> > per-session, LTTng-UST event ID allocation:
> > 
> > - Modify LTTng-UST and lttng-tools to keep a system-wide, per-ABI
> > (32 and 64-bit), per-user, per-session registry of enabled events
> > and their associated numeric IDs. - LTTng-UST will have to register
> > its tracepoints to the session daemon, sending the field typing of
> > these tracepoints during registration, - Dynamically check that
> > field types match upon registration of an event in this global
> > registry, refuse registration if the field types do not match, -
> > The metadata will be generated by lttng-tools instead of the 
> > application.
> > 
> > A.2) Per-user Event ID Details:
> > 
> > The event ID registry is shared across all processes for a given 
> > session/ABI/channel/user (UID). The intent is to forbid one user
> > to access tracing data from another user, while keeping the
> > system-wide number of buffers small.
> > 
> > The event ID registry is attached to a: - session, - specific ABI
> > (32/64-bit), - channel, - user (UID).
> > 
> > lttng-sessiond fills this registry by pulling the information as
> > needed from traced processes (a.k.a. applications). This
> > information is needed only when an event is active for a created
> > session. Therefore, applications need not notify the sessiond if
> > no session is created.
> > 
> > The rationale for using a "pull" scheme, where the sessiond pulls
> > information from applications, as opposed to a "push" scheme,
> > where applications would initiate commands to push the
> > information, is that it minimizes the amount of logic required
> > within liblttng-ust, and it does not require liblttng-ust to wait
> > for a reply from lttng-sessiond, which minimizes the impact on
> > application behavior and makes applications resilient to an
> > lttng-sessiond crash.
> > 
> > Updates to this registry are triggered by two distinct scenarios:
> > either an "enable-event" command (could also be "start", depending
> > on the sessiond design) is being executed, or, while tracing, a
> > library is being loaded within the application.
> > 
> > Before we start describing the algorithms that update the registry,
> > it is _very_ important to understand that an event enabled with 
> > "enable-event" can contain a wildcard (e.g.: libc*) and loglevel,
> > and therefore is associated to possibly _many_ events in the
> > application.
> > 
> > Algo (1) When an "enable-event"/"start" command is executed, the
> > sessiond will get, in return for sending an enable-event command to
> > the application (which applies to a channel within a session), a
> > variable-sized array of enabled events (remember, we can enable a
> > wildcard!), along with their name, loglevel, field names, and field
> > types. The sessiond proceeds to check that each event does not
> > conflict with another event in the registry that has the same name
> > but different field names/types or loglevel. If the field
> > names/types or loglevel differ from a previous event, it prints a
> > warning. If it matches a previous event, it re-uses the same ID as
> > the previous event. If there is no match, it allocates a new event
> > ID. It sends a command to the application to let it know the
> > mapping between the event name and ID for the channel. When the
> > application receives that command, it can finally proceed to attach
> > the tracepoint probe to the tracepoint site.
> > The sessiond keeps a per-application/per-channel hash table of 
> > already enabled events, so it does not provide the same event
> > name/id mapping twice for a given channel.
> and per-session?


> Of what I understand of this proposal, an event is associated to
> per-user/per-session/per-apps/per-channel values.

Well, given that channels become per-user, an application will write its
data into the channel with the same UID as itself. (This might imply
some limitations with setuid() in an application; at a minimum we would
need to document those, or overload setuid().)

The "per-app" part is not quite right. Event IDs are re-used and shared
across all applications that belong to the same UID.
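To illustrate the scoping described above, here is a minimal sketch of the registry key this proposal implies. The struct and names are hypothetical (not actual lttng-tools code); the point is that the key carries session, ABI, channel, and UID, but no per-application component, so all applications of one user share one ID space.

```c
#include <stdint.h>
#include <stdbool.h>
#include <sys/types.h>

/*
 * Hypothetical sketch of the registry key implied by the proposal:
 * event IDs are scoped per session/ABI/channel/UID. There is no
 * PID or per-app field, so all apps of one user share the IDs.
 */
struct ust_registry_key {
	uint64_t session_id;	/* tracing session */
	unsigned int abi_bits;	/* 32 or 64 */
	uint32_t channel_id;	/* channel within the session */
	uid_t uid;		/* owner of the shared buffers */
};

static bool registry_key_equal(const struct ust_registry_key *a,
		const struct ust_registry_key *b)
{
	return a->session_id == b->session_id
		&& a->abi_bits == b->abi_bits
		&& a->channel_id == b->channel_id
		&& a->uid == b->uid;	/* note: no per-app field */
}
```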

> (I have a question at the end about how an event ID should be generated)
> > 
> > Algo (2) In the case where a library (.so) is being loaded in the
> > application while tracing, the update sequence goes as follow: the
> > application first checks if there is any session created. If so, it
> > sends a NOTIFY_NEW_EVENTS message to the sessiond through the
> > communication socket (normally used to send ACK to commands). The
> > lttng-sessiond will therefore need to listen (read) to each
> > application communication socket, and will also need to dispatch
> > NOTIFY_NEW_EVENTS messages each time it expects an ACK reply for a
> > command it has sent to the application.
> Taking back the last sentence, can you explain more or clarify the
> mechanism here of "dispatching a NOTIFY_NEW_EVENTS" each time an ACK
> reply is expected?... Do you mean that each time we are waiting for an
> ACK, if we get a NOTIFY instead (which could happen due to a race
> between notification and command handling) you will launch a NOTIFY
> code path where the session daemon check the events hash table and
> check for event(s) to pull from the UST tracer? ... so what about
> getting the real ACK after that ?

In this scheme, the NOTIFY is entirely asynchronous, and gets no
acknowledgment from the sessiond to the app. This means that when
dispatching a NOTIFY received at the ACK site (due to the race you
refer to), we could simply queue this notify within the sessiond so it
gets handled after we have finished handling the current command (e.g.
the next time the thread goes back to polling fds).
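A minimal sketch of that queue-instead-of-handle behavior, with hypothetical message types and names (nothing here is actual sessiond code): while waiting for the ACK to a command we sent, any NOTIFY_NEW_EVENTS that races in is queued for later rather than processed inline.

```c
#include <stddef.h>

enum msg_type { MSG_ACK, MSG_NOTIFY_NEW_EVENTS };

struct msg {
	enum msg_type type;
	int payload;	/* ACK return value, for this sketch */
};

#define NOTIFY_QUEUE_LEN 16

/* Pending NOTIFYs, drained the next time the poll loop runs. */
static struct msg notify_queue[NOTIFY_QUEUE_LEN];
static size_t notify_count;

/*
 * Consume messages until the expected ACK arrives. A racing
 * NOTIFY_NEW_EVENTS is queued, not handled inline, so the current
 * command finishes first and the NOTIFY is processed afterwards.
 */
static int wait_for_ack(const struct msg *stream, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		if (stream[i].type == MSG_NOTIFY_NEW_EVENTS) {
			if (notify_count < NOTIFY_QUEUE_LEN)
				notify_queue[notify_count++] = stream[i];
			continue;	/* keep waiting for the ACK */
		}
		return stream[i].payload;	/* got the ACK */
	}
	return -1;	/* stream ended without an ACK */
}
```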

> > When a NOTIFY_NEW_EVENTS is received from an application, the
> > sessiond iterates on each session, each channel, redoing Algo (1). 
> > The per-app/per-channel hash table that remembers already enabled
> > events will ensure that we don't end up enabling the same event
> > twice.
> > 
> > At application startup, the "registration done" message will only
> > be sent once all the commands setting the mapping between event
> > name and ID are sent. This ensures tracing is not started until all
> > events are enabled (delaying the application by a configurable
> > amount of time).
> > 
> > At library load, a "registration done" will also be sent by the
> > sessiond some time after the NOTIFY_NEW_EVENTS has been received --
> > at the end of Algo (1). This means that library load, within
> > applications, can be delayed for the same amount of time that
> > applies to application start (configurable).
> > 
> > The registry is emptied when the session is destroyed. Event IDs
> > are never freed, only re-used for events with the same name, after
> > loglevel, field name and field type match check.
> This means that event IDs here should be some kind of a hash using a
> combination of values of the event to make sure it's unique on a
> per-event/per-channel/per-session basis ? (considering the sessiond
> should keep them in a separate registry)

I think it would be enough to hash the events by their full name, and
then do a compare to check if the fields match. We _want_ the hash
table lookup to succeed when we get an event with the same name but
different fields, so that our detailed check can then detect the field
mismatch and refuse the registration.
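A minimal sketch of that lookup, under stated assumptions: the linear name scan stands in for a hash-table lookup keyed on the full event name, the field signature is a pre-serialized string, and all names and layout are illustrative, not actual registry code. A full match reuses the existing ID, a name match with a different signature is refused, and an unseen name gets a fresh ID (never freed, per the proposal).

```c
#include <string.h>
#include <stdint.h>

#define MAX_EVENTS 64

/* Hypothetical registry entry: name plus a field signature. */
struct reg_event {
	char name[64];
	char fields[128];	/* serialized field names/types */
	int loglevel;
	uint32_t id;
};

static struct reg_event registry[MAX_EVENTS];
static size_t nr_events;
static uint32_t next_id;

/*
 * Look up by name (stand-in for the hash lookup), then compare the
 * field signature and loglevel. Returns the (possibly reused) event
 * ID, or -1 on a conflicting signature or a full registry.
 */
static int64_t registry_add_event(const char *name, const char *fields,
		int loglevel)
{
	for (size_t i = 0; i < nr_events; i++) {
		if (strcmp(registry[i].name, name) != 0)
			continue;
		if (strcmp(registry[i].fields, fields) == 0
				&& registry[i].loglevel == loglevel)
			return registry[i].id;	/* full match: reuse */
		return -1;	/* same name, conflicting signature */
	}
	if (nr_events == MAX_EVENTS)
		return -1;
	struct reg_event *ev = &registry[nr_events++];
	strncpy(ev->name, name, sizeof(ev->name) - 1);
	strncpy(ev->fields, fields, sizeof(ev->fields) - 1);
	ev->loglevel = loglevel;
	ev->id = next_id++;
	return ev->id;
}
```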

> > 
> > This registry is also used to generate metadata from the sessiond.
> > The sessiond will now be responsible for generation of the metadata
> > stream.
> > 
> This implies that the session daemon will need to keep track of the
> global memory location of each application in order to consume
> metadata streams?

Uh? No. The metadata stream would be _created_ by the sessiond.
Applications would not have anything to do with the metadata stream in
this scheme. Basically, we need to ensure that all the information
required to generate the metadata for a given session/channel is
present in the table that maps numeric event IDs to event name, field
names/types, and loglevel.

Anything still unclear?



> Cheers!
> David

Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
