[lttng-dev] [RFC] Per-user event ID allocation proposal

Mathieu Desnoyers mathieu.desnoyers at efficios.com
Fri Sep 14 12:07:40 EDT 2012


* David Goulet (dgoulet at efficios.com) wrote:
> 
> Mathieu Desnoyers:
> > * David Goulet (dgoulet at efficios.com) wrote:
> > Hi Mathieu,
> > 
> > This looks good! I have some questions to clarify part of the RFC.
> > 
> > Mathieu Desnoyers:
> >>>> Mathieu Desnoyers September 11, 2012
> >>>> 
> >>>> Per-user event ID allocation proposal
> >>>> 
> >>>> The intent of this shared event ID registry is to allow
> >>>> sharing tracing buffers between applications belonging to the
> >>>> same user (UID) for UST (user-space) tracing.
> >>>> 
> >>>> A.1) Overview of System-wide, per-ABI (32 and 64-bit),
> >>>> per-user, per-session, LTTng-UST event ID allocation:
> >>>> 
> >>>> - Modify LTTng-UST and lttng-tools to keep a system-wide,
> >>>> per-ABI (32 and 64-bit), per-user, per-session registry of
> >>>> enabled events and their associated numeric IDs.
> >>>> - LTTng-UST will have to register its tracepoints with the
> >>>> session daemon, sending the field typing of these tracepoints
> >>>> during registration.
> >>>> - Dynamically check that field types match upon registration of
> >>>> an event in this global registry; refuse registration if the
> >>>> field types do not match.
> >>>> - The metadata will be generated by lttng-tools instead of by
> >>>> the application.
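
To make the registration step concrete, here is a minimal C sketch of a
per-tracepoint registration record. All type and field names are
hypothetical illustrations, not the actual liblttng-ust ABI:

    /* Hypothetical per-tracepoint registration record, sent from
     * liblttng-ust to lttng-sessiond during registration. */
    #include <stdint.h>

    #define EVENT_NAME_LEN 256
    #define FIELD_NAME_LEN 64

    enum field_type {               /* simplified field typing */
        FIELD_TYPE_INTEGER,
        FIELD_TYPE_FLOAT,
        FIELD_TYPE_STRING,
    };

    struct field_desc {
        char name[FIELD_NAME_LEN];
        enum field_type type;
    };

    struct tracepoint_reg {
        char name[EVENT_NAME_LEN];  /* full event name */
        int loglevel;
        uint32_t nr_fields;
        struct field_desc fields[]; /* flexible array member */
    };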
> >>>> 
> >>>> A.2) Per-user Event ID Details:
> >>>> 
> >>>> The event ID registry is shared across all processes for a
> >>>> given session/ABI/channel/user (UID). The intent is to prevent
> >>>> one user from accessing tracing data belonging to another user,
> >>>> while keeping the system-wide number of buffers small.
> >>>> 
> >>>> The event ID registry is attached to a:
> >>>> - session,
> >>>> - specific ABI (32/64-bit),
> >>>> - channel,
> >>>> - user (UID).
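
Each registry is thus identified by a four-part key. A minimal C sketch,
with hypothetical type and field names (not an actual LTTng structure):

    /* Hypothetical key identifying one event ID registry. */
    #include <stdint.h>
    #include <sys/types.h>

    struct event_id_registry_key {
        uint64_t session_id;    /* tracing session */
        uint32_t abi_bits;      /* ABI: 32 or 64 */
        uint64_t channel_id;    /* channel within the session */
        uid_t uid;              /* owning user */
    };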
> >>>> 
> >>>> lttng-sessiond fills this registry by pulling information as
> >>>> needed from traced processes (a.k.a. applications). This
> >>>> information is needed only when an event is active for a
> >>>> created session; therefore, applications need not notify the
> >>>> sessiond if no session has been created.
> >>>> 
> >>>> The rationale for using a "pull" scheme, where the sessiond
> >>>> pulls information from applications, as opposed to a "push"
> >>>> scheme, where applications would initiate commands to push the
> >>>> information, is that it minimizes the amount of logic required
> >>>> within liblttng-ust, and it does not require liblttng-ust to
> >>>> wait for a reply from lttng-sessiond. This minimizes the impact
> >>>> on application behavior and makes applications resilient to an
> >>>> lttng-sessiond crash.
> >>>> 
> >>>> Updates to this registry are triggered by two distinct
> >>>> scenarios: either an "enable-event" command (could also be
> >>>> "start", depending on the sessiond design) is being executed,
> >>>> or, while tracing, a library is being loaded within the
> >>>> application.
> >>>> 
> >>>> Before we start describing the algorithms that update the
> >>>> registry, it is _very_ important to understand that an event
> >>>> enabled with "enable-event" can contain a wildcard (e.g.:
> >>>> libc*) and a loglevel, and is therefore associated with
> >>>> possibly _many_ events in the application.
> >>>> 
> >>>> Algo (1) When an "enable-event"/"start" command is executed,
> >>>> the sessiond will get, in return for sending an enable-event
> >>>> command to the application (which applies to a channel within a
> >>>> session), a variable-sized array of enabled events (remember,
> >>>> we can enable a wildcard!), along with their names, loglevels,
> >>>> field names, and field types. The sessiond proceeds to check
> >>>> that each event does not conflict with another event in the
> >>>> registry that has the same name but different field names/types
> >>>> or loglevel. If the field names/typing or loglevel differ from
> >>>> a previous event, it prints a warning. If the event matches a
> >>>> previous event, it re-uses the same ID as the previous event.
> >>>> If there is no match, it allocates a new event ID. It sends a
> >>>> command to the application to let it know the mapping between
> >>>> the event name and ID for the channel. When the application
> >>>> receives that command, it can finally proceed to attach the
> >>>> tracepoint probe to the tracepoint site.
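
The registry-side logic of Algo (1) could be sketched as follows. The
registry structure and the helpers (registry_lookup(), registry_add(),
event_desc_match()) are assumptions for illustration, re-using the
hypothetical tracepoint_reg record sketched earlier:

    #include <stdio.h>

    /* Sketch of the Algo (1) conflict check and ID allocation.
     * Returns the event ID, or a negative value when registration
     * is refused because of a typing/loglevel conflict. */
    int registry_enable_event(struct event_registry *reg,
                              const struct tracepoint_reg *new_ev)
    {
        struct registry_event *ev;

        /* Hash table lookup by full event name. */
        ev = registry_lookup(reg, new_ev->name);
        if (ev) {
            if (!event_desc_match(ev->desc, new_ev)) {
                fprintf(stderr, "Warning: event '%s' redefined with "
                        "different fields or loglevel\n", new_ev->name);
                return -1;              /* refuse registration */
            }
            return ev->id;              /* re-use the previous ID */
        }
        ev = registry_add(reg, new_ev); /* copies the descriptor */
        ev->id = reg->next_id++;        /* allocate a fresh ID */
        return ev->id;
    }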
> > 
> >>>> The sessiond keeps a per-application/per-channel hash table
> >>>> of already enabled events, so it does not provide the same
> >>>> event name/id mapping twice for a given channel.
> > 
> > and per-session?
> > 
> >> Yes.
> > 
> > 
> > From what I understand of this proposal, an event is associated
> > with per-user/per-session/per-app/per-channel values.
> > 
> >> Well, given that channels become per-user, an application will
> >> write its data into the channel with the same UID as itself.
> >> (This might imply some limitations with setuid() in an
> >> application, or at least require documenting those, or
> >> overloading setuid().)
> > 
> >> The "per-app" part is not quite right. Event IDs are re-used and
> >> shared across all applications that belong to the same UID.
> > 
> > 
> > 
> > (I have a question at the end about how an event ID should be
> > generated)
> > 
> >>>> 
> >>>> Algo (2) In the case where a library (.so) is being loaded
> >>>> into the application while tracing, the update sequence goes
> >>>> as follows: the application first checks whether any session
> >>>> has been created. If so, it sends a NOTIFY_NEW_EVENTS message
> >>>> to the sessiond through the communication socket (normally
> >>>> used to send ACKs to commands). The lttng-sessiond will
> >>>> therefore need to listen (read) on each application
> >>>> communication socket, and will also need to dispatch
> >>>> NOTIFY_NEW_EVENTS messages that arrive while it expects an ACK
> >>>> reply for a command it has sent to the application.
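
The application-side path of Algo (2) stays small precisely because the
notification is asynchronous. A sketch, with hypothetical names for the
socket, message structure and helpers:

    #include <sys/socket.h>

    /* Called from the liblttng-ust library-load hook while tracing. */
    static void ust_notify_new_events(void)
    {
        struct ustcomm_msg msg = { .cmd = NOTIFY_NEW_EVENTS };

        if (!ust_session_created())
            return;                 /* no session: nothing to notify */
        /* Fire-and-forget: no reply is expected from the sessiond,
         * so the application never blocks here. */
        (void) send(ust_comm_sock, &msg, sizeof(msg), MSG_NOSIGNAL);
    }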
> > 
> > Coming back to the last sentence of Algo (2), can you explain or
> > clarify the mechanism of "dispatching a NOTIFY_NEW_EVENTS" each
> > time an ACK reply is expected? Do you mean that each time we are
> > waiting for an ACK, if we get a NOTIFY instead (which could happen
> > due to a race between notification and command handling), you will
> > launch a NOTIFY code path where the session daemon checks the
> > events hash table and checks for event(s) to pull from the UST
> > tracer? And what about getting the real ACK after that?
> > 
> >> In this scheme, the NOTIFY is entirely asynchronous and gets no
> >> acknowledgment from the sessiond to the app. This means that when
> >> dispatching a NOTIFY received at the ACK site (due to the race
> >> you refer to), we can simply queue this notify within the
> >> sessiond so it gets handled after we have finished handling the
> >> current command (e.g. the next time the thread goes back to
> >> polling fds).
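
On the sessiond side, the deferral could look like this sketch
(hypothetical names again):

    /* While waiting for an ACK, a NOTIFY_NEW_EVENTS that races in is
     * queued and handled later, when the thread goes back to polling
     * the fds. */
    int wait_for_ack(struct app *app, struct ustcomm_msg *reply)
    {
        for (;;) {
            if (recv_msg(app->sock, reply) < 0)
                return -1;
            if (reply->cmd == NOTIFY_NEW_EVENTS) {
                queue_notify(app);  /* defer until after this command */
                continue;
            }
            return 0;               /* the real ACK */
        }
    }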
> > 
> > 
> >>>> When a NOTIFY_NEW_EVENTS is received from an application,
> >>>> the sessiond iterates over each session and each channel,
> >>>> redoing Algo (1). The per-app/per-channel hash table that
> >>>> remembers already enabled events ensures that we don't end up
> >>>> enabling the same event twice.
> >>>> 
> >>>> At application startup, the "registration done" message will
> >>>> only be sent once all the commands setting the mapping
> >>>> between event name and ID are sent. This ensures tracing is
> >>>> not started until all events are enabled (delaying the
> >>>> application for a configurable delay).
> >>>> 
> >>>> At library load, a "registration done" will also be sent by
> >>>> the sessiond some time after the NOTIFY_NEW_EVENTS has been
> >>>> received -- at the end of Algo (1). This means that library
> >>>> load, within applications, can be delayed for the same amount
> >>>> of time that applies to application start (configurable).
> >>>> 
> >>>> The registry is emptied when the session is destroyed. Event
> >>>> IDs are never freed, only re-used for events with the same
> >>>> name, after a loglevel, field name, and field type match check.
> > 
> > This means that event IDs here should be some kind of hash using
> > a combination of the event's values to make sure each is unique
> > on a per-event/per-channel/per-session basis? (considering that
> > the sessiond should keep them in a separate registry)
> > 
> >> I think it would be enough to hash the events by their full
> >> name, and then do a compare to check whether the fields match.
> >> We _want_ the hash table lookup to succeed if we get an event
> >> with the same name but different fields; our detailed check for
> >> field mismatch would then fail.
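
The detailed check after a successful name lookup could then be as
simple as this sketch, using the hypothetical tracepoint_reg record
from earlier:

    #include <stdint.h>
    #include <string.h>

    /* Returns non-zero when the two descriptors have the same loglevel
     * and the same field names/types, in the same order. */
    static int event_desc_match(const struct tracepoint_reg *a,
                                const struct tracepoint_reg *b)
    {
        uint32_t i;

        if (a->loglevel != b->loglevel || a->nr_fields != b->nr_fields)
            return 0;
        for (i = 0; i < a->nr_fields; i++) {
            if (strcmp(a->fields[i].name, b->fields[i].name) != 0
                || a->fields[i].type != b->fields[i].type)
                return 0;
        }
        return 1;
    }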
> > 
> > 
> >>>> 
> >>>> This registry is also used to generate metadata from the
> >>>> sessiond. The sessiond will now be responsible for generation
> >>>> of the metadata stream.
> >>>> 
> > 
> > This implies that the session daemon will need to keep track of
> > the global memory location of each application in order to
> > consume metadata streams?
> > 
> >> Uh? No. The metadata stream would be _created_ by the sessiond.
> >> Applications would not have anything to do with the metadata
> >> stream in this scheme. Basically, we need to ensure that all the
> >> information required to generate the metadata for a given
> >> session/channel is present in the table that maps numeric event
> >> IDs to event names/field names/types/loglevels.
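
To make this concrete, the sessiond could emit one CTF metadata (TSDL)
event declaration per registry entry, along the lines of this sketch. It
assumes the hypothetical registry_event from the earlier sketch keeps a
pointer "desc" to its tracepoint_reg plus a numeric "id", and it maps
every field to a 32-bit integer for brevity:

    #include <stdio.h>

    /* Print one TSDL "event" block from a registry entry. */
    static void print_metadata_event(FILE *out,
                                     const struct registry_event *ev)
    {
        uint32_t i;

        fprintf(out, "event {\n\tname = \"%s\";\n\tid = %u;\n"
                "\tloglevel = %d;\n\tfields := struct {\n",
                ev->desc->name, ev->id, ev->desc->loglevel);
        for (i = 0; i < ev->desc->nr_fields; i++)
            fprintf(out, "\t\tinteger { size = 32; } _%s;\n",
                    ev->desc->fields[i].name);
        fprintf(out, "\t};\n};\n");
    }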
> 
> Hmmm... the session daemon creates the metadata now... this means you
> are going to get the full ringbuffer + ctf code inside the lttng-tools
> tree....!?

No. Just move the internal representation of the tracepoint metadata
that UST currently has (see ust-events.h) into lttng-tools.

Thanks,

Mathieu

> 
> In any case, I would love to have a reason for that, since this
> seems to me an important change.
> 
> David
> 
> > 
> >> Anything still unclear ?
> > 
> >> Thanks,
> > 
> >> Mathieu
> > 
> > 
> > Cheers! David
> > 

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com


