[lttng-dev] [RFC] Per-user event ID allocation proposal

Mathieu Desnoyers mathieu.desnoyers at efficios.com
Fri Sep 14 12:13:13 EDT 2012


* David Goulet (dgoulet at efficios.com) wrote:
> 
> Mathieu Desnoyers:
> > * David Goulet (dgoulet at efficios.com) wrote:
> > 
> > 
> > Mathieu Desnoyers:
> >>>> * David Goulet (dgoulet at efficios.com) wrote:
> >>>> Hi Mathieu,
> >>>> 
> >>>> This looks good! I have some questions to clarify part of the
> >>>> RFC.
> >>>> 
> >>>> Mathieu Desnoyers:
> >>>>>>> Mathieu Desnoyers September 11, 2012
> >>>>>>> 
> >>>>>>> Per-user event ID allocation proposal
> >>>>>>> 
> >>>>>>> The intent of this shared event ID registry is to
> >>>>>>> allow sharing tracing buffers between applications
> >>>>>>> belonging to the same user (UID) for UST (user-space)
> >>>>>>> tracing.
> >>>>>>> 
> >>>>>>> A.1) Overview of System-wide, per-ABI (32 and 64-bit), 
> >>>>>>> per-user, per-session, LTTng-UST event ID allocation:
> >>>>>>> 
> >>>>>>> - Modify LTTng-UST and lttng-tools to keep a
> >>>>>>>   system-wide, per-ABI (32 and 64-bit), per-user,
> >>>>>>>   per-session registry of enabled events and their
> >>>>>>>   associated numeric IDs.
> >>>>>>> - LTTng-UST will have to register its tracepoints to
> >>>>>>>   the session daemon, sending the field typing of
> >>>>>>>   these tracepoints during registration.
> >>>>>>> - Dynamically check that field types match upon
> >>>>>>>   registration of an event in this global registry;
> >>>>>>>   refuse registration if the field types do not match.
> >>>>>>> - The metadata will be generated by lttng-tools
> >>>>>>>   instead of the application.
> >>>>>>> 
> >>>>>>> A.2) Per-user Event ID Details:
> >>>>>>> 
> >>>>>>> The event ID registry is shared across all processes
> >>>>>>> for a given session/ABI/channel/user (UID). The intent
> >>>>>>> is to forbid one user from accessing tracing data of
> >>>>>>> another user, while keeping the system-wide number of
> >>>>>>> buffers small.
> >>>>>>> 
> >>>>>>> The event ID registry is attached to a:
> >>>>>>> - session,
> >>>>>>> - specific ABI (32/64-bit),
> >>>>>>> - channel,
> >>>>>>> - user (UID).
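
(To make the keying above concrete, here is a rough sketch of the kind
of structure I have in mind. All struct and field names below are made
up for this email; this is not actual lttng-tools code.)

#include <stdint.h>
#include <sys/types.h>
#include <urcu/rculfhash.h>     /* userspace RCU hash table */

/* One event ID registry per (session, ABI, channel, UID) tuple. */
struct event_registry_key {
        uint64_t session_id;    /* tracing session */
        uint32_t abi_bits;      /* 32 or 64 */
        uint64_t channel_id;    /* channel within the session */
        uid_t uid;              /* user owning the shared buffers */
};

struct event_registry {
        struct event_registry_key key;
        struct cds_lfht *events;   /* event name -> ID + field typing */
        uint32_t next_event_id;    /* monotonic: IDs are never freed */
};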
> >>>>>>> 
> >>>>>>> lttng-sessiond fills this registry by pulling the
> >>>>>>> information as needed from traced processes (a.k.a.
> >>>>>>> applications). This information is needed only when an
> >>>>>>> event is active for a created session; therefore,
> >>>>>>> applications need not notify the sessiond if no session
> >>>>>>> is created.
> >>>>>>> 
> >>>>>>> The rationale for using a "pull" scheme, where the
> >>>>>>> sessiond pulls information from applications, as
> >>>>>>> opposed to a "push" scheme, where applications would
> >>>>>>> initiate commands to push the information, is that it
> >>>>>>> minimizes the amount of logic required within
> >>>>>>> liblttng-ust, and it does not require liblttng-ust to
> >>>>>>> wait for a reply from lttng-sessiond. This minimizes
> >>>>>>> the impact on application behavior, and makes
> >>>>>>> applications resilient to lttng-sessiond crashes.
> >>>>>>> 
> >>>>>>> Updates to this registry are triggered by two distinct 
> >>>>>>> scenarios: either an "enable-event" command (could also
> >>>>>>> be "start", depending on the sessiond design) is being
> >>>>>>> executed, or, while tracing, a library is being loaded
> >>>>>>> within the application.
> >>>>>>> 
> >>>>>>> Before we start describing the algorithms that update
> >>>>>>> the registry, it is _very_ important to understand that
> >>>>>>> an event enabled with "enable-event" can contain a
> >>>>>>> wildcard (e.g.: libc*) and a loglevel, and is therefore
> >>>>>>> possibly associated with _many_ events in the
> >>>>>>> application.
> >>>>>>> 
> >>>>>>> Algo (1) When an "enable-event"/"start" command is
> >>>>>>> executed, the sessiond will get, in return for sending
> >>>>>>> an enable-event command to the application (which
> >>>>>>> applies to a channel within a session), a
> >>>>>>> variable-sized array of enabled events (remember, we
> >>>>>>> can enable a wildcard!), along with their name,
> >>>>>>> loglevel, field names, and field types. The sessiond
> >>>>>>> proceeds to check that each event does not conflict
> >>>>>>> with an event already in the registry that has the same
> >>>>>>> name but different field names/types or loglevel. If
> >>>>>>> the field names/types or loglevel differ from a
> >>>>>>> previous event, it prints a warning. If the event
> >>>>>>> matches a previous event, it re-uses the same ID as the
> >>>>>>> previous event. If there is no match, it allocates a
> >>>>>>> new event ID. It then sends a command to the
> >>>>>>> application to let it know the mapping between the
> >>>>>>> event name and ID for the channel. When the
> >>>>>>> application receives that command, it can finally
> >>>>>>> proceed to attach the tracepoint probe to the
> >>>>>>> tracepoint site.
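
(For illustration, a sketch of Algo (1) on the sessiond side. Every
type and helper below -- registry_lookup(), fields_match(),
registry_add(), send_event_mapping() -- is made up for this email,
not actual lttng-tools code.)

#include <stdio.h>
#include <stddef.h>

struct app;                     /* one registered application */

struct event_desc {             /* one event returned by the app */
        const char *name;
        int loglevel;
        unsigned int id;        /* filled in below */
        /* + field names and field types */
};

struct reg_event {              /* one registry entry */
        unsigned int id;
        int loglevel;
        /* + field names and field types */
};

struct reg_event *registry_lookup(const char *name);
int fields_match(const struct reg_event *known,
                const struct event_desc *ev);
unsigned int registry_add(const struct event_desc *ev);
int send_event_mapping(struct app *app, const char *name,
                unsigned int id);

static int register_enabled_events(struct app *app,
                struct event_desc *events, size_t nr)
{
        size_t i;

        for (i = 0; i < nr; i++) {
                struct event_desc *ev = &events[i];
                struct reg_event *known = registry_lookup(ev->name);

                if (known && (!fields_match(known, ev)
                                || known->loglevel != ev->loglevel)) {
                        fprintf(stderr, "Warning: event '%s' conflicts "
                                "with an already registered event\n",
                                ev->name);
                        continue;       /* refuse this registration */
                }
                ev->id = known ? known->id : registry_add(ev);
                /*
                 * Let the app attach the probe to the tracepoint
                 * site. (A per-app/per-channel table, elided here,
                 * avoids sending the same mapping twice.)
                 */
                if (send_event_mapping(app, ev->name, ev->id) < 0)
                        return -1;
        }
        return 0;
}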
> >>>> 
> >>>>>>> The sessiond keeps a per-application/per-channel hash
> >>>>>>> table of already enabled events, so it does not provide
> >>>>>>> the same event name/id mapping twice for a given
> >>>>>>> channel.
> >>>> 
> >>>> and per-session?
> >>>> 
> >>>>> Yes.
> >>>> 
> >>>> 
> >>>> From what I understand of this proposal, an event is
> >>>> associated with per-user/per-session/per-app/per-channel
> >>>> values.
> >>>> 
> >>>>> Well, given that channels become per-user, an application
> >>>>> will write its data into the channel with the same UID as
> >>>>> itself. (This might imply some limitations with setuid()
> >>>>> in an application, or at least a need to document those,
> >>>>> or that we overload setuid().)
> >>>> 
> >>>>> The "per-app" part is not quite right. Event IDs are
> >>>>> re-used and shared across all applications that belong to
> >>>>> the same UID.
> >>>> 
> >>>> 
> >>>> 
> >>>> (I have a question at the end about how an event ID should
> >>>> be generated)
> >>>> 
> >>>>>>> 
> >>>>>>> Algo (2) In the case where a library (.so) is being
> >>>>>>> loaded into the application while tracing, the update
> >>>>>>> sequence goes as follows: the application first checks
> >>>>>>> whether any session has been created. If so, it sends a
> >>>>>>> NOTIFY_NEW_EVENTS message to the sessiond through the
> >>>>>>> communication socket (normally used to send ACKs to
> >>>>>>> commands). The lttng-sessiond will therefore need to
> >>>>>>> listen (read) to each application communication
> >>>>>>> socket, and will also need to dispatch
> >>>>>>> NOTIFY_NEW_EVENTS messages each time it expects an ACK
> >>>>>>> reply for a command it has sent to the application.
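
(Again for illustration only, the application side of this could look
roughly like the sketch below; NOTIFY_NEW_EVENTS' value, the message
layout and the helper names are all made up.)

#include <sys/socket.h>

#define NOTIFY_NEW_EVENTS 42            /* illustrative command number */

struct ustcomm_msg {
        int cmd;
};

extern int ust_sock;                    /* command socket to the sessiond */
extern int session_created(void);       /* does any session exist? */

static void notify_new_events(void)
{
        struct ustcomm_msg msg = { .cmd = NOTIFY_NEW_EVENTS };

        if (!session_created())
                return;         /* no session: nothing to notify */
        /* Fire-and-forget: the app never waits for the sessiond. */
        (void) send(ust_sock, &msg, sizeof(msg), MSG_NOSIGNAL);
}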
> >>>> 
> >>>> Going back to the last sentence, can you explain or
> >>>> clarify the mechanism of "dispatching a
> >>>> NOTIFY_NEW_EVENTS" each time an ACK reply is expected? Do
> >>>> you mean that each time we are waiting for an ACK, if we
> >>>> get a NOTIFY instead (which could happen due to a race
> >>>> between notification and command handling), you will
> >>>> launch a NOTIFY code path where the session daemon checks
> >>>> the events hash table for event(s) to pull from the UST
> >>>> tracer? And what about getting the real ACK after that?
> >>>> 
> >>>>> In this scheme, the NOTIFY is entirely asynchronous, and
> >>>>> gets no acknowledgment from the sessiond to the app. This
> >>>>> means that when dispatching a NOTIFY received at the ACK
> >>>>> site (due to the race you refer to), we could simply
> >>>>> queue this notify within the sessiond so it gets handled
> >>>>> after we have finished handling the current command (e.g.
> >>>>> the next time the thread goes back to polling fds).
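
(A sketch of what I mean, with made-up names; the point is only the
control flow around the ACK wait:)

struct app;
struct msg { int cmd; };

#define NOTIFY_NEW_EVENTS 42            /* illustrative */

extern int recv_msg(struct app *app, struct msg *m);
extern void queue_notify(struct app *app); /* handled at next poll */

static int wait_for_ack(struct app *app, struct msg *ack)
{
        for (;;) {
                if (recv_msg(app, ack) < 0)
                        return -1;
                if (ack->cmd == NOTIFY_NEW_EVENTS) {
                        queue_notify(app); /* defer: notify needs no reply */
                        continue;          /* keep waiting for the ACK */
                }
                return 0;                  /* got the real ACK */
        }
}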
> >>>> 
> >>>> 
> >>>>>>> When a NOTIFY_NEW_EVENTS is received from an
> >>>>>>> application, the sessiond iterates over each session
> >>>>>>> and each channel, redoing Algo (1). The
> >>>>>>> per-app/per-channel hash table that remembers already
> >>>>>>> enabled events will ensure that we don't end up
> >>>>>>> enabling the same event twice.
> >>>>>>> 
> >>>>>>> At application startup, the "registration done" message
> >>>>>>> will only be sent once all the commands setting the
> >>>>>>> mapping between event names and IDs have been sent.
> >>>>>>> This ensures tracing is not started until all events
> >>>>>>> are enabled (delaying the application by a configurable
> >>>>>>> amount).
> >>>>>>> 
> >>>>>>> At library load, a "registration done" will also be
> >>>>>>> sent by the sessiond some time after the
> >>>>>>> NOTIFY_NEW_EVENTS has been received -- at the end of
> >>>>>>> Algo (1). This means that library load, within
> >>>>>>> applications, can be delayed by the same amount of time
> >>>>>>> that applies to application start (configurable).
> >>>>>>> 
> >>>>>>> The registry is emptied when the session is destroyed.
> >>>>>>> Event IDs are never freed, only re-used for events with
> >>>>>>> the same name, after checking that loglevel, field
> >>>>>>> names and field types match.
> >>>> 
> >>>> This means that event IDs here should be some kind of
> >>>> hash, using a combination of the event's values, to make
> >>>> sure each is unique on a
> >>>> per-event/per-channel/per-session basis? (considering the
> >>>> sessiond should keep them in a separate registry)
> >>>> 
> >>>>> I think it would be enough to hash the events by their
> >>>>> full name, and then do a compare to check whether the
> >>>>> fields match. We _want_ the hash table lookup to succeed
> >>>>> if we get an event with the same name but different
> >>>>> fields, but then our detailed check for field mismatch
> >>>>> would fail.
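
(Sketching this with the same made-up helpers as earlier in this
email -- registry_lookup() and fields_match() are not real
lttng-tools functions:)

#include <errno.h>

/*
 * Returns 1 and sets *id if the event fully matches an existing
 * entry, 0 if no entry with that name exists (caller allocates a
 * new ID), or -EINVAL if the name matches but fields/loglevel do
 * not (registration refused).
 */
static int match_event(const struct event_desc *ev, unsigned int *id)
{
        struct reg_event *known = registry_lookup(ev->name);

        if (!known)
                return 0;       /* lookup by name failed: new event */
        if (!fields_match(known, ev) || known->loglevel != ev->loglevel)
                return -EINVAL; /* lookup succeeded, detailed check failed */
        *id = known->id;        /* full match: re-use the existing ID */
        return 1;
}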
> >>>> 
> >>>> 
> >>>>>>> 
> >>>>>>> This registry is also used to generate metadata from
> >>>>>>> the sessiond. The sessiond will now be responsible for
> >>>>>>> generating the metadata stream.
> >>>>>>> 
> >>>> 
> >>>> This implies that the session daemon will need to keep
> >>>> track of the global memory location of each application
> >>>> in order to consume metadata streams?
> >>>> 
> >>>>> Uh? No. The metadata stream would be _created_ by the
> >>>>> sessiond. Applications would not have anything to do with
> >>>>> the metadata stream in this scheme. Basically, we need to
> >>>>> ensure that all the information required to generate the
> >>>>> metadata for a given session/channel is present in the
> >>>>> table that contains the mapping between numeric event IDs
> >>>>> and event names/field names/types/loglevels.
> > 
> > Hmmm... the session daemon creates the metadata now... this means
> > you are going to get the full ring buffer + CTF code inside the
> > lttng-tools tree...!?
> > 
> >> No. Just move the internal representation of the tracepoint
> >> metadata that UST currently has (see ust-events.h) into
> >> lttng-tools.
> 
> Right, so the session daemon has to pass the metadata buffer
> information up to the application in order to write it?
> 
> Please, if I'm wrong again, maybe just write a paragraph to explain
> the whole shebang :P

No.

The sessiond keeps a registry of the entire metadata for each channel.
The application will _not_ generate the metadata stream. That's the
whole change: the sessiond will now generate this metadata stream. The
applications will have nothing to do with that stream: they will
simply write into the data buffers shared across all processes of a
given user.
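
To illustrate (a sketch only; the helper and its parameters are made
up, but the output follows the CTF metadata syntax), the sessiond can
walk the registry and emit one CTF event declaration per entry:

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

struct reg_event {              /* registry entry, as sketched above */
        unsigned int id;
        int loglevel;
        /* + field names and field types */
};

static void emit_event_metadata(FILE *out, const struct reg_event *ev,
                const char *name, uint64_t stream_id)
{
        fprintf(out,
                "event {\n"
                "\tname = \"%s\";\n"
                "\tid = %u;\n"
                "\tstream_id = %" PRIu64 ";\n",
                name, ev->id, stream_id);
        /* ... emit each field's CTF type from the registry entry ... */
        fprintf(out, "};\n\n");
}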

Not sure if it's still unclear?

Thanks,

Mathieu

> 
> Cheers
> David
> 
> > 
> >> Thanks,
> > 
> >> Mathieu
> > 
> > 
> > In any case, I would love to have a rationale for that, since
> > this seems to me an important change.
> > 
> > David
> > 
> >>>> 
> >>>>> Anything still unclear?
> >>>> 
> >>>>> Thanks,
> >>>> 
> >>>>> Mathieu
> >>>> 
> >>>> 
> >>>> Cheers! David
> >>>> 
> > 

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com


