[lttng-dev] [RFC] Per-user event ID allocation proposal

David Goulet dgoulet at efficios.com
Fri Sep 14 12:09:52 EDT 2012


Mathieu Desnoyers:
> * David Goulet (dgoulet at efficios.com) wrote:
> Mathieu Desnoyers:
>>>> * David Goulet (dgoulet at efficios.com) wrote:
>>>> Hi Mathieu,
>>>> This looks good! I have some questions to clarify part of the
>>>> RFC.
>>>> Mathieu Desnoyers:
>>>>>>> Mathieu Desnoyers September 11, 2012
>>>>>>> Per-user event ID allocation proposal
>>>>>>> The intent of this shared event ID registry is to
>>>>>>> allow sharing tracing buffers between applications
>>>>>>> belonging to the same user (UID) for UST (user-space)
>>>>>>> tracing.
>>>>>>> A.1) Overview of System-wide, per-ABI (32 and 64-bit), 
>>>>>>> per-user, per-session, LTTng-UST event ID allocation:
>>>>>>> - Modify LTTng-UST and lttng-tools to keep a
>>>>>>>   system-wide, per-ABI (32 and 64-bit), per-user,
>>>>>>>   per-session registry of enabled events and their
>>>>>>>   associated numeric IDs.
>>>>>>> - LTTng-UST will have to register its tracepoints to the
>>>>>>>   session daemon, sending the field typing of these
>>>>>>>   tracepoints during registration.
>>>>>>> - Dynamically check that field types match upon
>>>>>>>   registration of an event in this global registry;
>>>>>>>   refuse registration if the field types do not match.
>>>>>>> - The metadata will be generated by lttng-tools instead
>>>>>>>   of the application.
>>>>>>> A.2 Per-user Event ID Details:
>>>>>>> The event ID registry is shared across all processes
>>>>>>> for a given session/ABI/channel/user (UID). The intent
>>>>>>> is to prevent one user from accessing another user's
>>>>>>> tracing data, while keeping the system-wide number of
>>>>>>> buffers small.
>>>>>>> The event ID registry is attached to a:
>>>>>>> - session,
>>>>>>> - specific ABI (32/64-bit),
>>>>>>> - channel,
>>>>>>> - user (UID).
>>>>>>> lttng-sessiond fills this registry by pulling the
>>>>>>> required information, as needed, from traced processes
>>>>>>> (a.k.a. applications). This information is needed only
>>>>>>> when an event is active for a created session.
>>>>>>> Therefore, applications need not notify the sessiond if
>>>>>>> no session is created.
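(Just to make sure I follow: here is a minimal sketch of how I picture the
registry key and entries on the sessiond side. Every struct and field name
below is made up for illustration, not actual lttng-tools code.)

#include <stdint.h>
#include <sys/types.h>

/* Hypothetical sketch only -- none of these names exist in lttng-tools. */
struct event_registry_key {
	uint64_t session_id;	/* tracing session */
	uint32_t abi_bits;	/* ABI: 32 or 64 */
	uint64_t channel_key;	/* channel within the session */
	uid_t uid;		/* user owning the shared buffers */
};

struct event_registry_entry {
	uint32_t id;		/* allocated numeric event ID */
	char name[256];		/* full event name */
	int loglevel;
	char *fields;		/* serialized field names/types */
	/* hash table linkage (keyed by name) omitted */
};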
>>>>>>> The rationale for using a "pull" scheme, where the
>>>>>>> sessiond pulls information from applications, as opposed
>>>>>>> to a "push" scheme, where applications would initiate
>>>>>>> commands to push the information, is that it minimizes
>>>>>>> the amount of logic required within liblttng-ust, and it
>>>>>>> does not require liblttng-ust to wait for a reply from
>>>>>>> lttng-sessiond. This minimizes the impact on application
>>>>>>> behavior and makes applications resilient to an
>>>>>>> lttng-sessiond crash.
>>>>>>> Updates to this registry are triggered by two distinct 
>>>>>>> scenarios: either an "enable-event" command (could also
>>>>>>> be "start", depending on the sessiond design) is being
>>>>>>> executed, or, while tracing, a library is being loaded
>>>>>>> within the application.
>>>>>>> Before we start describing the algorithms that update
>>>>>>> the registry, it is _very_ important to understand that
>>>>>>> an event enabled with "enable-event" can contain a
>>>>>>> wildcard (e.g. libc*) and a loglevel, and is therefore
>>>>>>> possibly associated with _many_ events in the
>>>>>>> application.
>>>>>>> Algo (1) When an "enable-event"/"start" command is
>>>>>>> executed, the sessiond will get, in return for sending
>>>>>>> an enable-event command to the application (which applies
>>>>>>> to a channel within a session), a variable-sized array
>>>>>>> of enabled events (remember, we can enable a wildcard!),
>>>>>>> along with their name, loglevel, field names, and field
>>>>>>> types. The sessiond then checks that each event does not
>>>>>>> conflict with an event already in the registry that has
>>>>>>> the same name but different field names/types or
>>>>>>> loglevel. If the field names/types or loglevel differ
>>>>>>> from a previous event, it prints a warning. If the event
>>>>>>> matches a previous event, it re-uses the same ID as the
>>>>>>> previous event. If there is no match, it allocates a new
>>>>>>> event ID. It then sends a command to the application to
>>>>>>> let it know the mapping between the event name and ID
>>>>>>> for the channel. When the application receives that
>>>>>>> command, it can finally proceed to attach the tracepoint
>>>>>>> probe to the tracepoint site.
>>>>>>> The sessiond keeps a per-application/per-channel hash
>>>>>>> table of already enabled events, so it does not provide
>>>>>>> the same event name/ID mapping twice for a given
>>>>>>> channel.
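Just to check my reading of Algo (1), here is a rough sketch of the sessiond
side once the array of enabled events comes back from the application. All
helpers (registry_find_by_name(), fields_and_loglevel_match(), registry_add(),
send_event_id_to_app()) and types are invented for illustration:

#include <stdio.h>
#include <stddef.h>

static int handle_enabled_events(struct event_registry *reg,
				 struct ust_app *app,
				 struct event_desc *events, size_t count)
{
	size_t i;

	for (i = 0; i < count; i++) {
		struct event_registry_entry *entry;

		entry = registry_find_by_name(reg, events[i].name);
		if (entry && !fields_and_loglevel_match(entry, &events[i])) {
			/* Same name, conflicting definition: warn and,
			 * per A.1, refuse this registration. */
			fprintf(stderr, "Warning: event '%s' redefined with "
				"different fields or loglevel\n",
				events[i].name);
			continue;
		}
		if (!entry) {
			/* No match: allocate a new ID (never freed). */
			entry = registry_add(reg, &events[i], reg->next_id++);
		}
		/* Tell the app the name <-> ID mapping so it can attach
		 * its tracepoint probes. */
		send_event_id_to_app(app, events[i].name, entry->id);
	}
	return 0;
}

and I assume the per-app/per-channel hash table of already enabled events
would be consulted before send_event_id_to_app(), so the same mapping is
never sent twice for a given channel.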
>>>> and per-session ?
>>>>> Yes.
>>>> From what I understand of this proposal, an event is
>>>> associated with per-user/per-session/per-app/per-channel
>>>> values.
>>>>> Well, given that channels become per-user, an application
>>>>> will write its data into the channel with the same UID as
>>>>> itself. (It might imply some limitations with setuid() in
>>>>> an application, or at least require documenting those, or
>>>>> overloading setuid().)
>>>>> The "per-app" part is not quite right. Event IDs are
>>>>> re-used and shared across all applications that belong to
>>>>> the same UID.
>>>> (I have a question at the end about how an event ID should
>>>> be generated)
>>>>>>> Algo (2) In the case where a library (.so) is being
>>>>>>> loaded into the application while tracing, the update
>>>>>>> sequence goes as follows: the application first checks
>>>>>>> whether there is any session created. If so, it sends a
>>>>>>> NOTIFY_NEW_EVENTS message to the sessiond through the
>>>>>>> communication socket (normally used to send ACKs to
>>>>>>> commands). The lttng-sessiond will therefore need to
>>>>>>> listen (read) on each application communication socket,
>>>>>>> and will also need to dispatch NOTIFY_NEW_EVENTS
>>>>>>> messages each time it expects an ACK reply for a command
>>>>>>> it has sent to the application.
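If I read Algo (2) correctly, the application side at library load boils down
to something like the following. The message structure, the session check and
the socket fd are all invented here, only the message name comes from the RFC:

#include <sys/socket.h>

static void notify_sessiond_on_library_load(void)
{
	struct ustcomm_notify_msg msg = { .cmd = NOTIFY_NEW_EVENTS };

	/* No session created: nothing to notify. */
	if (!tracing_session_exists())
		return;

	/* Fire-and-forget: no ACK is expected from the sessiond. */
	(void) send(sessiond_comm_socket_fd, &msg, sizeof(msg), MSG_NOSIGNAL);
}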
>>>> Coming back to the last sentence of the quoted paragraph
>>>> (dispatching NOTIFY_NEW_EVENTS each time an ACK reply is
>>>> expected), can you explain or clarify that mechanism? Do
>>>> you mean that each time we are waiting for an ACK, if we
>>>> get a NOTIFY instead (which could happen due to a race
>>>> between notification and command handling), you will launch
>>>> a NOTIFY code path where the session daemon checks the
>>>> events hash table and looks for event(s) to pull from the
>>>> UST tracer? And what about getting the real ACK after that?
>>>>> In this scheme, the NOTIFY is entirely asynchronous, and
>>>>> gets no acknowledgement from the sessiond to the app. This
>>>>> means that when dispatching a NOTIFY received at the ACK
>>>>> site (due to the race you refer to), we could simply queue
>>>>> this notify within the sessiond so it gets handled after
>>>>> we have finished handling the current command (e.g. the
>>>>> next time the thread goes back to polling fds).
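OK, so on the sessiond side, while waiting for an ACK, something like this
(all message and queue helpers are invented, this is only how I picture it):

/* Sketch: waiting for the ACK of a command just sent to the app. */
static int wait_for_ack(struct ust_app *app, struct app_msg *ack)
{
	struct app_msg msg;

	for (;;) {
		if (recv_app_message(app->sock, &msg) < 0)
			return -1;
		if (msg.cmd == NOTIFY_NEW_EVENTS) {
			/* Race: the notify arrived before the ACK;
			 * defer it within the sessiond. */
			notify_queue_push(&sessiond_notify_queue, app);
			continue;
		}
		*ack = msg;	/* the real ACK for the pending command */
		return 0;
	}
}
/* Queued notifies are drained the next time the thread returns to poll(). */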
>>>>>>> When a NOTIFY_NEW_EVENTS is received from an
>>>>>>> application, the sessiond iterates over each session and
>>>>>>> each channel, redoing Algo (1). The per-app/per-channel
>>>>>>> hash table that remembers already enabled events will
>>>>>>> ensure that we don't end up enabling the same event
>>>>>>> twice.
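i.e., roughly (invented names again, just to confirm my understanding):

static void handle_notify_new_events(struct ust_app *app)
{
	struct session *session;
	struct channel *chan;

	for_each_session(session) {
		for_each_channel(session, chan) {
			/*
			 * Redo Algo (1); the per-app/per-channel hash
			 * table prevents enabling the same event twice.
			 */
			pull_and_register_events(session, chan, app);
		}
	}
}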
>>>>>>> At application startup, the "registration done" message
>>>>>>> will only be sent once all the commands setting the
>>>>>>> mapping between event name and ID have been sent. This
>>>>>>> ensures tracing is not started until all events are
>>>>>>> enabled (delaying the application by a configurable
>>>>>>> delay).
>>>>>>> At library load, a "registration done" will also be sent
>>>>>>> by the sessiond some time after the NOTIFY_NEW_EVENTS
>>>>>>> has been received -- at the end of Algo (1). This means
>>>>>>> that library load, within applications, can be delayed
>>>>>>> by the same (configurable) amount of time that applies
>>>>>>> to application startup.
>>>>>>> The registry is emptied when the session is destroyed.
>>>>>>> Event IDs are never freed, only re-used for events with
>>>>>>> the same name, after checking that the loglevel, field
>>>>>>> names and field types match.
>>>> This means that event IDs here should be some kind of a hash
>>>> using a combination of values of the event to make sure it's
>>>> unique on a per-event/per-channel/per-session basis ?
>>>> (considering the sessiond should keep them in a separate
>>>> registry)
>>>>> I think it would be enough to hash the events by their
>>>>> full name, and then do a compare to check whether the
>>>>> fields match. We _want_ the hash table lookup to succeed
>>>>> if we get an event with the same name but different
>>>>> fields, but then our detailed check for field mismatch
>>>>> would fail.
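Ok. So something along these lines, where the hash table is keyed on the full
event name only and the loglevel/field comparison is a separate explicit
check. Helper names and the field layout are hypothetical:

#include <string.h>

static int registry_lookup(struct event_registry *reg,
			   const struct event_desc *desc,
			   struct event_registry_entry **out)
{
	/* Hash/compare on the name alone... */
	struct event_registry_entry *entry =
		registry_find_by_name(reg, desc->name);

	if (!entry) {
		*out = NULL;	/* unknown name: caller allocates a new ID */
		return 0;
	}
	/* ...then do the detailed loglevel/field comparison. */
	if (entry->loglevel != desc->loglevel
	    || strcmp(entry->fields, desc->fields) != 0) {
		return -1;	/* same name, conflicting definition */
	}
	*out = entry;		/* full match: re-use entry->id */
	return 0;
}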
>>>>>>> This registry is also used to generate metadata from
>>>>>>> within the sessiond. The sessiond will now be
>>>>>>> responsible for generating the metadata stream.
>>>> This implies that the session daemon will need to keep
>>>> track of the global memory location of each application in
>>>> order to consume the metadata streams?
>>>>> Uh? No. The metadata stream would be _created_ by the
>>>>> sessiond. Applications would not have anything to do with
>>>>> the metadata stream in this scheme. Basically, we need to
>>>>> ensure that all the information required to generate the
>>>>> metadata for a given session/channel is present in the
>>>>> table that contains the mapping between numeric event IDs
>>>>> and event names/field names/types/loglevel.
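So, if I understand, metadata generation on the sessiond side would
conceptually be a walk over the registry, emitting one CTF event declaration
per entry, something like this (simplified TSDL output; the iteration macro
and entry layout are invented):

#include <stdio.h>

static void registry_generate_metadata(struct event_registry *reg, FILE *out)
{
	struct event_registry_entry *entry;

	for_each_registry_entry(reg, entry) {
		fprintf(out,
			"event {\n"
			"\tname = \"%s\";\n"
			"\tid = %u;\n"
			"\tloglevel = %d;\n"
			"\tfields := struct { %s };\n"
			"};\n\n",
			entry->name, entry->id, entry->loglevel,
			entry->fields);
	}
}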
> Hmmm... the session daemon creates the metadata now... this means
> you are going to get the full ringbuffer + ctf code inside the
> lttng-tools tree....!?
>> No. Just move the internal representation of the tracepoint
>> metadata that UST currently has (see ust-events.h) into
>> lttng-tools.

Right, so the session daemon has to pass the metadata buffer information up
to the application so that the metadata can be written?

Please, if I'm wrong again, maybe just write a paragraph to explain
the whole shebang :P


>> Thanks,
>> Mathieu
> In any case, I would love to have a rationale for that, since this
> seems to me like an important change.
> David
>>>>> Anything still unclear ?
>>>>> Thanks,
>>>>> Mathieu
>>>> Cheers! David

