[lttng-dev] [RFC] Per-user event ID allocation proposal

Fri Sep 14 12:09:52 EDT 2012

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

Mathieu Desnoyers:
> * David Goulet (dgoulet at efficios.com) wrote:
> 
> 
> Mathieu Desnoyers:
>>>> * David Goulet (dgoulet at efficios.com) wrote: Hi Mathieu,
>>>> 
>>>> This looks good! I have some questions to clarify part of the
>>>> RFC.
>>>> 
>>>> Mathieu Desnoyers:
>>>>>>> Mathieu Desnoyers September 11, 2012
>>>>>>> 
>>>>>>> Per-user event ID allocation proposal
>>>>>>> 
>>>>>>> The intent of this shared event ID registry is to
>>>>>>> allow sharing tracing buffers between applications
>>>>>>> belonging to the same user (UID) for UST (user-space)
>>>>>>> tracing.
>>>>>>> 
>>>>>>> A.1) Overview of System-wide, per-ABI (32 and 64-bit), 
>>>>>>> per-user, per-session, LTTng-UST event ID allocation:
>>>>>>> 
>>>>>>> - Modify LTTng-UST and lttng-tools to keep a
>>>>>>>   system-wide, per-ABI (32 and 64-bit), per-user,
>>>>>>>   per-session registry of enabled events and their
>>>>>>>   associated numeric IDs.
>>>>>>> - LTTng-UST will have to register its tracepoints to
>>>>>>>   the session daemon, sending the field typing of these
>>>>>>>   tracepoints during registration.
>>>>>>> - Dynamically check that field types match upon
>>>>>>>   registration of an event in this global registry;
>>>>>>>   refuse registration if the field types do not match.
>>>>>>> - The metadata will be generated by lttng-tools instead
>>>>>>>   of the application.
>>>>>>> 
>>>>>>> A.2) Per-user Event ID Details:
>>>>>>> 
>>>>>>> The event ID registry is shared across all processes
>>>>>>> for a given session/ABI/channel/user (UID). The intent
>>>>>>> is to forbid one user to access tracing data from
>>>>>>> another user, while keeping the system-wide number of
>>>>>>> buffers small.
>>>>>>> 
>>>>>>> The event ID registry is attached to a:
>>>>>>> - session,
>>>>>>> - specific ABI (32/64-bit),
>>>>>>> - channel,
>>>>>>> - user (UID).
>>>>>>> 
>>>>>>> lttng-sessiond fills this registry by pulling
>>>>>>> information as needed from traced processes (a.k.a.
>>>>>>> applications). This information is needed only when an
>>>>>>> event is active for a created session. Therefore,
>>>>>>> applications need not notify the sessiond if no
>>>>>>> session is created.
>>>>>>> 
>>>>>>> The rationale for using a "pull" scheme, where the
>>>>>>> sessiond pulls information from applications, as
>>>>>>> opposed to a "push" scheme, where applications would
>>>>>>> initiate commands to push the information, is that it
>>>>>>> minimizes the amount of logic required within
>>>>>>> liblttng-ust, and it does not require liblttng-ust to
>>>>>>> wait for a reply from lttng-sessiond, which minimizes
>>>>>>> the impact on application behavior and makes the
>>>>>>> application resilient to lttng-sessiond crashes.
>>>>>>> 
>>>>>>> Updates to this registry are triggered by two distinct 
>>>>>>> scenarios: either an "enable-event" command (could also
>>>>>>> be "start", depending on the sessiond design) is being
>>>>>>> executed, or, while tracing, a library is being loaded
>>>>>>> within the application.
>>>>>>> 
>>>>>>> Before we start describing the algorithms that update
>>>>>>> the registry, it is _very_ important to understand that
>>>>>>> an event enabled with "enable-event" can contain a
>>>>>>> wildcard (e.g.: libc*) and loglevel, and therefore is
>>>>>>> associated with possibly _many_ events in the
>>>>>>> application.
>>>>>>> 
>>>>>>> Algo (1) When an "enable-event"/"start" command is
>>>>>>> executed, the sessiond will get, in return for sending
>>>>>>> an enable-event command to the application (which
>>>>>>> applies to a channel within a session), a
>>>>>>> variable-sized array of enabled events (remember, we
>>>>>>> can enable a wildcard!), along with their name,
>>>>>>> loglevel, field names, and field types. The sessiond
>>>>>>> proceeds to check that each event does not conflict
>>>>>>> with another event in the registry with the same name,
>>>>>>> but having different field names/types or loglevel. If
>>>>>>> its field names/typing or loglevel differ from a
>>>>>>> previous event, it prints a warning. If it matches a
>>>>>>> previous event, it re-uses the same ID as the previous
>>>>>>> event. If there is no match, it allocates a new event
>>>>>>> ID. It sends a command to the application to let it
>>>>>>> know the mapping between the event name and ID for the
>>>>>>> channel. When the application receives that command,
>>>>>>> it can finally proceed to attach the tracepoint probe
>>>>>>> to the tracepoint site.
>>>> 
>>>>>>> The sessiond keeps a per-application/per-channel hash
>>>>>>> table of already enabled events, so it does not provide
>>>>>>> the same event name/id mapping twice for a given
>>>>>>> channel.
>>>> 
>>>> and per-session ?
>>>> 
>>>>> Yes.
>>>> 
>>>> 
>>>> From what I understand of this proposal, an event is
>>>> associated with per-user/per-session/per-app/per-channel
>>>> values.
>>>> 
>>>>> Well, given that channels become per-user, an application
>>>>> will write its data into the channel with the same UID as
>>>>> itself. (This might imply some limitations with setuid()
>>>>> in an application, which we would at least need to
>>>>> document, or that we overload setuid().)
>>>> 
>>>>> The "per-app" part is not quite right. Event IDs are
>>>>> re-used and shared across all applications that belong to
>>>>> the same UID.
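
To make Algo (1) and the clarification above concrete, here is a minimal
Python sketch of the registry lookup. It is an illustration under stated
assumptions, not the lttng-tools implementation: the names EventDesc,
Registry, enable_events, registries and already_sent are all hypothetical,
field types are represented as plain strings, and a conflicting event is
refused (as in A.1) after printing a warning.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EventDesc:
    """What an application reports for one tracepoint (hypothetical layout)."""
    name: str
    loglevel: int
    fields: tuple  # ((field_name, field_type), ...)


@dataclass
class Registry:
    """One registry per (session, ABI, channel, UID). IDs are never freed."""
    by_name: dict = field(default_factory=dict)  # name -> (EventDesc, event_id)
    next_id: int = 0

    def register(self, desc):
        """Return the event ID, or None if desc conflicts with an earlier event."""
        known = self.by_name.get(desc.name)
        if known is None:
            # No event with this name yet: allocate a new ID.
            event_id = self.next_id
            self.next_id += 1
            self.by_name[desc.name] = (desc, event_id)
            return event_id
        known_desc, event_id = known
        if known_desc != desc:
            # Same name, different loglevel or fields: warn and refuse.
            print(f"warning: conflicting description for event {desc.name}")
            return None
        # Exact match: re-use the ID, whichever application registered it first.
        return event_id


registries = {}    # (session, abi, channel, uid) -> Registry
already_sent = {}  # (app, session, channel) -> set of event names already mapped


def enable_events(app, key, descs):
    """Process the events an app returned; return the new name -> ID mappings."""
    session, _abi, channel, _uid = key
    registry = registries.setdefault(key, Registry())
    sent = already_sent.setdefault((app, session, channel), set())
    mappings = {}
    for desc in descs:
        if desc.name in sent:
            continue  # this app already got the mapping for this channel
        event_id = registry.register(desc)
        if event_id is None:
            continue
        sent.add(desc.name)
        mappings[desc.name] = event_id
    return mappings
```

In this sketch, two applications running under the same UID that enable the
same event receive the same ID, while the same event under another UID lands
in a separate registry and gets its own numbering.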
>>>> 
>>>> 
>>>> 
>>>> (I have a question at the end about how an event ID should
>>>> be generated)
>>>> 
>>>>>>> 
>>>>>>> Algo (2) In the case where a library (.so) is being
>>>>>>> loaded in the application while tracing, the update
>>>>>>> sequence goes as follows: the application first checks
>>>>>>> if there is any session created. If so, it sends a
>>>>>>> NOTIFY_NEW_EVENTS message to the sessiond through the
>>>>>>> communication socket (normally used to send ACK to
>>>>>>> commands). The lttng-sessiond will therefore need to
>>>>>>> listen (read) to each application communication
>>>>>>> socket, and will also need to dispatch
>>>>>>> NOTIFY_NEW_EVENTS messages each time it expects an ACK
>>>>>>> reply for a command it has sent to the application.
>>>> 
>>>> Going back to the last sentence, can you explain or
>>>> clarify the mechanism of "dispatching a
>>>> NOTIFY_NEW_EVENTS" each time an ACK reply is expected? Do
>>>> you mean that each time we are waiting for an ACK, if we get
>>>> a NOTIFY instead (which could happen due to a race between
>>>> notification and command handling), you will launch a NOTIFY
>>>> code path where the session daemon checks the events hash
>>>> table and checks for event(s) to pull from the UST tracer?
>>>> And what about getting the real ACK after that?
>>>> 
>>>>> In this scheme, the NOTIFY is entirely asynchronous, and
>>>>> gets no acknowledgment from the sessiond to the app. This
>>>>> means that when dispatching the NOTIFY received at the ACK
>>>>> site (due to the race you refer to), we could simply queue
>>>>> this notify within the sessiond so it gets handled after we
>>>>> finish handling the current command (e.g. next time the
>>>>> thread goes back to polling fds).
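
The queueing described in this reply could be sketched as follows. This is
a hedged Python illustration only: AppChannel, wait_for_ack and
drain_notifies are hypothetical names, and a deque of strings stands in for
the application communication socket.

```python
from collections import deque

ACK = "ACK"
NOTIFY_NEW_EVENTS = "NOTIFY_NEW_EVENTS"


class AppChannel:
    """Sessiond-side view of one application socket (hypothetical)."""

    def __init__(self, incoming):
        self.incoming = deque(incoming)   # stands in for data read from the socket
        self.pending_notifies = deque()   # NOTIFYs seen while waiting for an ACK

    def wait_for_ack(self):
        """Read until the ACK arrives, queueing any NOTIFY that races with it."""
        while self.incoming:
            msg = self.incoming.popleft()
            if msg == ACK:
                return True
            if msg == NOTIFY_NEW_EVENTS:
                # Do not handle it now: the current command is still in flight.
                self.pending_notifies.append(msg)
        return False

    def drain_notifies(self, handle_notify):
        """Called once the current command is done, e.g. before polling again."""
        handled = 0
        while self.pending_notifies:
            self.pending_notifies.popleft()
            handle_notify()   # would redo Algo (1) for this application
            handled += 1
        return handled
```

Because the NOTIFY expects no reply, deferring it this way does not block
the application; it only delays when the sessiond redoes Algo (1).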
>>>> 
>>>> 
>>>>>>> When a NOTIFY_NEW_EVENTS is received from an
>>>>>>> application, the sessiond iterates on each session,
>>>>>>> each channel, redoing Algo (1). The per-app/per-channel
>>>>>>> hash table that remembers already enabled events will
>>>>>>> ensure that we don't end up enabling the same event
>>>>>>> twice.
>>>>>>> 
>>>>>>> At application startup, the "registration done" message
>>>>>>> will only be sent once all the commands setting the
>>>>>>> mapping between event name and ID are sent. This
>>>>>>> ensures tracing is not started until all events are
>>>>>>> enabled (delaying the application by a configurable
>>>>>>> amount of time).
>>>>>>> 
>>>>>>> At library load, a "registration done" will also be
>>>>>>> sent by the sessiond some time after the
>>>>>>> NOTIFY_NEW_EVENTS has been received -- at the end of
>>>>>>> Algo (1). This means that library load, within
>>>>>>> applications, can be delayed for the same amount of
>>>>>>> time that applies to application start (configurable).
>>>>>>> 
>>>>>>> The registry is emptied when the session is destroyed.
>>>>>>> Event IDs are never freed, only re-used for events with
>>>>>>> the same name, after loglevel, field name and field
>>>>>>> type match check.
>>>> 
>>>> This means that event IDs here should be some kind of a hash
>>>> using a combination of values of the event to make sure it's
>>>> unique on a per-event/per-channel/per-session basis ?
>>>> (considering the sessiond should keep them in a separate
>>>> registry)
>>>> 
>>>>> I think it would be enough to hash the events by their full
>>>>> name, and then do a compare to check if the fields match.
>>>>> We _want_ the hash table lookup to succeed if we get an
>>>>> event with the same name but different fields, so that our
>>>>> detailed field check then detects the mismatch.
>>>> 
>>>> 
>>>>>>> 
>>>>>>> This registry is also used to generate metadata from
>>>>>>> the sessiond. The sessiond will now be responsible for
>>>>>>> generation of the metadata stream.
>>>>>>> 
>>>> 
>>>> This implies that the session daemon will need to keep track
>>>> of the global memory location of each application in order
>>>> to consume metadata streams ?
>>>> 
>>>>> Uh ? no. The metadata stream would be _created_ by the
>>>>> sessiond. Applications would not have anything to do with
>>>>> the metadata stream in this scheme. Basically, we need to
>>>>> ensure that all the information required to generate the
>>>>> metadata for a given session/channel is present in the
>>>>> table that contains mapping between numeric event IDs and
>>>>> event name/field names/types/loglevel.
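
As an illustration of generating metadata purely from that table, here is a
small Python sketch. It assumes a hypothetical table layout (event ID mapped
to name, loglevel and fields) and emits text whose shape is loosely modeled
on CTF event declarations; the field type strings are opaque placeholders,
and nothing here is the actual lttng-tools code.

```python
def event_metadata(event_id, name, loglevel, fields, stream_id=0):
    """Render one event declaration from a registry entry (illustrative only)."""
    lines = [
        "event {",
        f'    name = "{name}";',
        f"    id = {event_id};",
        f"    stream_id = {stream_id};",
        f"    loglevel = {loglevel};",
        "    fields := struct {",
    ]
    for field_name, field_type in fields:
        # Field types are passed through as opaque strings in this sketch.
        lines.append(f"        {field_type} _{field_name};")
    lines += ["    };", "};"]
    return "\n".join(lines)


def generate_metadata(table):
    """table maps event_id -> (name, loglevel, fields); output is sorted by ID."""
    return "\n\n".join(
        event_metadata(event_id, *table[event_id]) for event_id in sorted(table)
    )
```

The point of the sketch is that no application memory is touched: the table
alone carries everything needed to describe each event.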
> 
> Hmmm... the session daemon creates the metadata now... this means
> you are going to get the full ringbuffer + ctf code inside the
> lttng-tools tree....!?
> 
>> No. Just move the internal representation of the tracepoint
>> metadata that UST currently has (see ust-events.h) into
>> lttng-tools.

Right so the session daemon has to pass over the metadata buffer
information up to the application in order to write them?

Please, if I'm wrong again, maybe just write a paragraph to explain
the whole shebang :P

Cheers
David

> 
>> Thanks,
> 
>> Mathieu
> 
> 
> In any case, I would love to have a reason for that, since this
> seems to me to be an important change.
> 
> David
> 
>>>> 
>>>>> Anything still unclear ?
>>>> 
>>>>> Thanks,
>>>> 
>>>>> Mathieu
>>>> 
>>>> 
>>>> Cheers! David
>>>> 
> 
-----BEGIN PGP SIGNATURE-----

iQEcBAEBCgAGBQJQU1bNAAoJEELoaioR9I02hYUH/A1ZVDJ36R90xaPUG+BHMTKl
NcN9dK4CBg4dRhVD7JI22HXkG4BYL/rFA0b8FDxIamIqAzdinZwacbNvHlIQ3Qi6
4yFbIjRd0oUJXgkPClyAnl+KPHUNI73HFwuqiFx+PWya/3zNdXBvtvAoyVr3XNSu
DG8qV0rlWhtqnd5c8waoMiabdmQ9unbv4Lbme1tPnb8y4AAW7Bkur16gplwiHmJQ
iVttc1PhQ14cyl6PyxA6KzhLXLP3WbTxE5tGnzZAEPI0rU79cMqVoR1tWP+Ggeee
HMjUQmVvPcM3GJ17ufXcCYKwlwBIIHaovShS1uKnaMgaFkuEUoP+yh92Z5lsaU0=
=IrHn
-----END PGP SIGNATURE-----