[lttng-dev] [RFC] Per-user event ID allocation proposal

Fri Sep 14 11:58:08 EDT 2012

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

Mathieu Desnoyers:
> * David Goulet (dgoulet at efficios.com) wrote: Hi Mathieu,
> 
> This looks good! I have some questions to clarify part of the RFC.
> 
> Mathieu Desnoyers:
>>>> Mathieu Desnoyers September 11, 2012
>>>> 
>>>> Per-user event ID allocation proposal
>>>> 
>>>> The intent of this shared event ID registry is to allow
>>>> sharing tracing buffers between applications belonging to the
>>>> same user (UID) for UST (user-space) tracing.
>>>> 
>>>> A.1) Overview of System-wide, per-ABI (32 and 64-bit),
>>>> per-user, per-session, LTTng-UST event ID allocation:
>>>> 
>>>> - Modify LTTng-UST and lttng-tools to keep a system-wide,
>>>>   per-ABI (32 and 64-bit), per-user, per-session registry of
>>>>   enabled events and their associated numeric IDs.
>>>> - LTTng-UST will have to register its tracepoints to the
>>>>   session daemon, sending the field typing of these
>>>>   tracepoints during registration.
>>>> - Dynamically check that field types match upon registration
>>>>   of an event in this global registry; refuse registration if
>>>>   the field types do not match.
>>>> - The metadata will be generated by lttng-tools instead of
>>>>   the application.
>>>> 
>>>> A.2) Per-user Event ID Details:
>>>> 
>>>> The event ID registry is shared across all processes for a
>>>> given session/ABI/channel/user (UID). The intent is to forbid
>>>> one user to access tracing data from another user, while
>>>> keeping the system-wide number of buffers small.
>>>> 
>>>> The event ID registry is attached to a:
>>>> - session,
>>>> - specific ABI (32/64-bit),
>>>> - channel,
>>>> - user (UID).
>>>> 
>>>> lttng-sessiond fills this registry by pulling information as
>>>> needed from traced processes (a.k.a. applications). This
>>>> information is needed only when an event is active for a
>>>> created session. Therefore, applications need not notify the
>>>> sessiond if no session is created.
>>>> 
>>>> The rationale for using a "pull" scheme, where the sessiond
>>>> pulls information from applications, as opposed to a "push"
>>>> scheme, where applications would initiate commands to push
>>>> the information, is that it minimizes the amount of logic
>>>> required within liblttng-ust, and it does not require
>>>> liblttng-ust to wait for a reply from lttng-sessiond. This
>>>> minimizes the impact on application behavior and makes the
>>>> application resilient to an lttng-sessiond crash.
>>>> 
>>>> Updates to this registry are triggered by two distinct
>>>> scenarios: either an "enable-event" command (could also be
>>>> "start", depending on the sessiond design) is being executed,
>>>> or, while tracing, a library is being loaded within the
>>>> application.
>>>> 
>>>> Before we start describing the algorithms that update the
>>>> registry, it is _very_ important to understand that an event
>>>> enabled with "enable-event" can contain a wildcard (e.g.:
>>>> libc*) and loglevel, and therefore is associated with
>>>> possibly _many_ events in the application.
>>>> 
>>>> Algo (1) When an "enable-event"/"start" command is executed,
>>>> the sessiond will get, in return for sending an enable-event
>>>> command to the application (which applies to a channel within
>>>> a session), a variable-sized array of enabled events
>>>> (remember, we can enable a wildcard!), along with their name,
>>>> loglevel, field names, and field types. The sessiond proceeds
>>>> to check that each event does not conflict with another event
>>>> in the registry with the same name, but having different
>>>> field names/types or loglevel. If its field names/typing or
>>>> loglevel differ from a previous event, it prints a warning.
>>>> If it matches a previous event, it re-uses the same ID as the
>>>> previous event. If no match, it allocates a new event ID. It
>>>> sends a command to the application to let it know the mapping
>>>> between the event name and ID for the channel. When the
>>>> application receives that command, it can finally proceed to
>>>> attach the tracepoint probe to the tracepoint site.
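
A minimal sketch of the ID allocation step of Algo (1), written in
Python for brevity. The names EventDesc, EventRegistry and register()
are hypothetical and are not part of lttng-tools. It assumes one
registry instance per session/ABI/channel/UID, and it assumes that a
conflicting event is refused after the warning, as section A.1
suggests.

```python
import warnings
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EventDesc:
    """Event description as reported by the application (hypothetical layout)."""
    name: str
    loglevel: int
    fields: tuple  # ((field_name, field_type), ...)


@dataclass
class EventRegistry:
    """One registry per session/ABI/channel/UID; IDs are never freed."""
    by_name: dict = field(default_factory=dict)  # name -> (EventDesc, event_id)
    next_id: int = 0

    def register(self, desc):
        """Return the event ID for desc, or None if it conflicts."""
        known = self.by_name.get(desc.name)
        if known is not None:
            known_desc, event_id = known
            if known_desc != desc:
                # Same name, different loglevel or fields: warn and refuse
                # (refusal is an assumption taken from section A.1).
                warnings.warn(f"event {desc.name!r} conflicts with a registered event")
                return None
            return event_id  # exact match: re-use the existing ID
        # No event with this name yet: allocate a new ID.
        event_id = self.next_id
        self.next_id += 1
        self.by_name[desc.name] = (desc, event_id)
        return event_id
```

Registering the same description twice returns the same ID, so two
applications of the same user that share a channel end up with
identical name/ID mappings.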
> 
>>>> The sessiond keeps a per-application/per-channel hash table
>>>> of already enabled events, so it does not provide the same
>>>> event name/id mapping twice for a given channel.
> 
> and per-session ?
> 
>> Yes.
> 
> 
> From what I understand of this proposal, an event is associated
> with per-user/per-session/per-apps/per-channel values.
> 
>> Well, given that channels become per-user, an application will
>> write its data into the channel with the same UID as itself.
>> (This might imply some limitations with setuid() in an
>> application, which we would at least need to document, or it
>> might mean that we overload setuid().)
> 
>> The "per-app" part is not quite right. Event IDs are re-used and
>> shared across all applications that belong to the same UID.
> 
> 
> 
> (I have a question at the end about how an event ID should be
> generated)
> 
>>>> 
>>>> Algo (2) In the case where a library (.so) is being loaded in
>>>> the application while tracing, the update sequence goes as
>>>> follows: the application first checks if there is any session
>>>> created. If so, it sends a NOTIFY_NEW_EVENTS message to the
>>>> sessiond through the communication socket (normally used to
>>>> send ACK to commands). The lttng-sessiond will therefore need
>>>> to listen (read) to each application communication socket,
>>>> and will also need to dispatch NOTIFY_NEW_EVENTS messages
>>>> each time it expects an ACK reply for a command it has sent
>>>> to the application.
> 
> Going back to the last sentence, can you explain more or clarify
> the mechanism here of "dispatching a NOTIFY_NEW_EVENTS" each time
> an ACK reply is expected?... Do you mean that each time we are
> waiting for an ACK, if we get a NOTIFY instead (which could happen
> due to a race between notification and command handling) you will
> launch a NOTIFY code path where the session daemon checks the
> events hash table and checks for event(s) to pull from the UST
> tracer? ... so what about getting the real ACK after that ?
> 
>> In this scheme, the NOTIFY is entirely asynchronous, and gets no
>> acknowledgment from the sessiond to the app. This means that when
>> dispatching the NOTIFY received at the ACK site (due to the race
>> you refer to), we could simply queue this notify within the
>> sessiond so it gets handled after we finish handling the
>> current command (e.g. next time the thread goes back to poll fds).
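
The queueing described above can be sketched as follows, in Python for
brevity. The message tuples, wait_for_ack() and drain_notifies() are
hypothetical names chosen for illustration; the real sessiond reads
from a socket, which is modeled here as a plain iterator.

```python
from collections import deque

ACK = "ACK"
NOTIFY_NEW_EVENTS = "NOTIFY_NEW_EVENTS"


def wait_for_ack(incoming, pending_notifies):
    """Consume messages until the ACK arrives, deferring any NOTIFY seen first.

    incoming: iterator of (msg_type, app_id) tuples read from the app socket.
    pending_notifies: deque of app_ids to handle once the command completes.
    """
    for msg_type, app_id in incoming:
        if msg_type == NOTIFY_NEW_EVENTS:
            pending_notifies.append(app_id)  # no reply is sent to the app
        elif msg_type == ACK:
            return True
    return False  # stream ended before the ACK arrived


def drain_notifies(pending_notifies, handle_notify):
    """Run after the current command, e.g. when the thread returns to poll fds."""
    while pending_notifies:
        handle_notify(pending_notifies.popleft())
```

A NOTIFY that races with a command is therefore never mistaken for
the ACK; it only waits in the queue until the command is done.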
> 
> 
>>>> When a NOTIFY_NEW_EVENTS is received from an application,
>>>> the sessiond iterates over each session and each channel,
>>>> redoing Algo (1). The per-app/per-channel hash table that
>>>> remembers already enabled events will ensure that we don't
>>>> end up enabling the same event twice.
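
A minimal sketch of that per-app/per-channel (and per-session) table,
in Python for brevity. SentMappings and filter_new() are hypothetical
names, and a set of event names stands in for the hash table.

```python
from collections import defaultdict


class SentMappings:
    """Remembers which event name/ID mappings each application already has."""

    def __init__(self):
        # (session_name, app_id, channel_name) -> set of event names already sent
        self._sent = defaultdict(set)

    def filter_new(self, session_name, app_id, channel_name, mappings):
        """Return only the (name, event_id) pairs not yet sent, and record them."""
        seen = self._sent[(session_name, app_id, channel_name)]
        fresh = []
        for name, event_id in mappings:
            if name not in seen:
                seen.add(name)
                fresh.append((name, event_id))
        return fresh
```

Redoing Algo (1) after a NOTIFY_NEW_EVENTS then only sends the
mappings that the newly loaded library introduced.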
>>>> 
>>>> At application startup, the "registration done" message will
>>>> only be sent once all the commands setting the mapping
>>>> between event name and ID are sent. This ensures tracing is
>>>> not started until all events are enabled (delaying the
>>>> application for a configurable delay).
>>>> 
>>>> At library load, a "registration done" will also be sent by
>>>> the sessiond some time after the NOTIFY_NEW_EVENTS has been
>>>> received -- at the end of Algo (1). This means that library
>>>> load, within applications, can be delayed for the same amount
>>>> of time that applies to application start (configurable).
>>>> 
>>>> The registry is emptied when the session is destroyed. Event
>>>> IDs are never freed, only re-used for events with the same
>>>> name, after loglevel, field name and field type match check.
> 
> This means that event IDs here should be some kind of a hash using
> a combination of values of the event to make sure it's unique on a 
> per-event/per-channel/per-session basis ? (considering the
> sessiond should keep them in a separate registry)
> 
>> I think it would be enough to hash the events by their full name,
>> and then do a compare to check if the fields match. We _want_ the
>> hash table lookup to succeed if we get an event with the same
>> name but different fields, but then our detailed field
>> comparison would fail.
> 
> 
>>>> 
>>>> This registry is also used to generate metadata from the
>>>> sessiond. The sessiond will now be responsible for generation
>>>> of the metadata stream.
>>>> 
> 
> This implies that the session daemon will need to keep track of
> the global memory location of each application in order to
> consume metadata streams ?
> 
>> Uh ? no. The metadata stream would be _created_ by the sessiond.
>> Applications would not have anything to do with the metadata
>> stream in this scheme. Basically, we need to ensure that all the
>> information required to generate the metadata for a given
>> session/channel is present in the table that contains the
>> mapping between numeric event IDs and event name/field
>> names/types/loglevel.

Hmmm... the session daemon creates the metadata now... this means you
are going to get the full ringbuffer + ctf code inside the lttng-tools
tree....!?

In any case, I would love to have a reason for that, since this seems
to me an important change.

David

> 
>> Anything still unclear ?
> 
>> Thanks,
> 
>> Mathieu
> 
> 
> Cheers! David
> 
-----BEGIN PGP SIGNATURE-----

iQEcBAEBCgAGBQJQU1QQAAoJEELoaioR9I02rfsIAMuPRy0A9bY1A4GFiGy0SnRj
D873hzdwqgxdniyr/R1tT7CqvN4oIqo9at6/jJCm7Si6HS5OCeVvII6iGW0HzuU3
Yo9XdzIt75NoiUBgpIccfXqdRLg5bK8IYYJd+CJjrk7xiP7CVJ+XnarShAic82Tm
RSWgjKustgSpJtYHlxdlu+bBu3Y0Dbv9G6ClGJeC96yODVvlkXGtEtRln2SQtVE+
zJcZgqJz+GDc4m1I3nwrXsrXSKgLF4hQuwv7lcXMGRw1xf8pXCW21+UXAlckjlqK
eAutnIIU3IU7SrZ4m5+I/IixAPxg+uAmPSVdTXpwS6r5StomM1vE94ZICT4tyKk=
=mq+J
-----END PGP SIGNATURE-----