[lttng-dev] [RFC] Per-user event ID allocation proposal

Mathieu Desnoyers mathieu.desnoyers at efficios.com
Fri Sep 14 13:03:52 EDT 2012


* David Goulet (dgoulet at efficios.com) wrote:
> 
> Mathieu Desnoyers:
> > * David Goulet (dgoulet at efficios.com) wrote:
> > 
> > 
> > Mathieu Desnoyers:
> >>>> * David Goulet (dgoulet at efficios.com) wrote:
> >>>> 
> >>>> 
> >>>> Mathieu Desnoyers:
> >>>>>>> * David Goulet (dgoulet at efficios.com) wrote: Hi
> >>>>>>> Mathieu,
> >>>>>>> 
> >>>>>>> This looks good! I have some questions to clarify part
> >>>>>>> of the RFC.
> >>>>>>> 
> >>>>>>> Mathieu Desnoyers:
> >>>>>>>>>> Mathieu Desnoyers September 11, 2012
> >>>>>>>>>> 
> >>>>>>>>>> Per-user event ID allocation proposal
> >>>>>>>>>> 
> >>>>>>>>>> The intent of this shared event ID registry is
> >>>>>>>>>> to allow sharing tracing buffers between
> >>>>>>>>>> applications belonging to the same user (UID) for
> >>>>>>>>>> UST (user-space) tracing.
> >>>>>>>>>> 
> >>>>>>>>>> A.1) Overview of System-wide, per-ABI (32 and
> >>>>>>>>>> 64-bit), per-user, per-session, LTTng-UST event
> >>>>>>>>>> ID allocation:
> >>>>>>>>>> 
> >>>>>>>>>> - Modify LTTng-UST and lttng-tools to keep a
> >>>>>>>>>>   system-wide, per-ABI (32 and 64-bit), per-user,
> >>>>>>>>>>   per-session registry of enabled events and their
> >>>>>>>>>>   associated numeric IDs.
> >>>>>>>>>> - LTTng-UST will have to register its tracepoints
> >>>>>>>>>>   with the session daemon, sending the field typing
> >>>>>>>>>>   of these tracepoints during registration.
> >>>>>>>>>> - Dynamically check that field types match upon
> >>>>>>>>>>   registration of an event in this global registry;
> >>>>>>>>>>   refuse registration if the field types do not
> >>>>>>>>>>   match.
> >>>>>>>>>> - The metadata will be generated by lttng-tools
> >>>>>>>>>>   instead of the application.
> >>>>>>>>>> 
> >>>>>>>>>> A.2) Per-user Event ID Details:
> >>>>>>>>>> 
> >>>>>>>>>> The event ID registry is shared across all
> >>>>>>>>>> processes for a given session/ABI/channel/user
> >>>>>>>>>> (UID). The intent is to forbid one user from
> >>>>>>>>>> accessing tracing data of another user, while
> >>>>>>>>>> keeping the system-wide number of buffers small.
> >>>>>>>>>> 
> >>>>>>>>>> The event ID registry is attached to a:
> >>>>>>>>>> - session,
> >>>>>>>>>> - specific ABI (32/64-bit),
> >>>>>>>>>> - channel,
> >>>>>>>>>> - user (UID).
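> >>>>>>>>>> 
> >>>>>>>>>> (A minimal sketch, in C, of what such a registry
> >>>>>>>>>> key could look like -- the names are illustrative,
> >>>>>>>>>> not the actual LTTng data structures:)
> >>>>>>>>>> 
> >>>>>>>>>> #include <stdint.h>
> >>>>>>>>>> #include <sys/types.h>
> >>>>>>>>>> 
> >>>>>>>>>> struct event_registry_key {
> >>>>>>>>>>         uint64_t session_id;  /* tracing session */
> >>>>>>>>>>         uint32_t abi_bits;    /* 32 or 64 */
> >>>>>>>>>>         uint32_t channel_id;  /* channel in session */
> >>>>>>>>>>         uid_t uid;            /* owning user */
> >>>>>>>>>> };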
> >>>>>>>>>> 
> >>>>>>>>>> lttng-sessiond fills this registry by pulling the
> >>>>>>>>>> information as needed from traced processes (a.k.a.
> >>>>>>>>>> applications). This information is needed only when
> >>>>>>>>>> an event is active for a created session. Therefore,
> >>>>>>>>>> applications need not notify the sessiond if no
> >>>>>>>>>> session is created.
> >>>>>>>>>> 
> >>>>>>>>>> The rationale for using a "pull" scheme, where the
> >>>>>>>>>> sessiond pulls information from applications, as
> >>>>>>>>>> opposed to a "push" scheme, where applications
> >>>>>>>>>> would initiate commands to push the information, is
> >>>>>>>>>> that it minimizes the amount of logic required
> >>>>>>>>>> within liblttng-ust, and it does not require
> >>>>>>>>>> liblttng-ust to wait for a reply from
> >>>>>>>>>> lttng-sessiond. This minimizes the impact on
> >>>>>>>>>> application behavior and makes applications
> >>>>>>>>>> resilient to an lttng-sessiond crash.
> >>>>>>>>>> 
> >>>>>>>>>> Updates to this registry are triggered by two
> >>>>>>>>>> distinct scenarios: either an "enable-event"
> >>>>>>>>>> command (could also be "start", depending on the
> >>>>>>>>>> sessiond design) is being executed, or, while
> >>>>>>>>>> tracing, a library is being loaded within the
> >>>>>>>>>> application.
> >>>>>>>>>> 
> >>>>>>>>>> Before we start describing the algorithms that
> >>>>>>>>>> update the registry, it is _very_ important to
> >>>>>>>>>> understand that an event enabled with
> >>>>>>>>>> "enable-event" can contain a wildcard (e.g.:
> >>>>>>>>>> libc*) and a loglevel, and is therefore possibly
> >>>>>>>>>> associated with _many_ events in the application.
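> >>>>>>>>>> 
> >>>>>>>>>> For instance, a single command line such as:
> >>>>>>>>>> 
> >>>>>>>>>>   lttng enable-event --userspace 'libc*'
> >>>>>>>>>> 
> >>>>>>>>>> matches every tracepoint whose full name starts
> >>>>>>>>>> with "libc", in every traced application of the
> >>>>>>>>>> user.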
> >>>>>>>>>> 
> >>>>>>>>>> Algo (1) When an "enable-event"/"start" command is
> >>>>>>>>>> executed, the sessiond will get, in return for
> >>>>>>>>>> sending an enable-event command to the application
> >>>>>>>>>> (which applies to a channel within a session), a
> >>>>>>>>>> variable-sized array of enabled events (remember,
> >>>>>>>>>> we can enable a wildcard!), along with their name,
> >>>>>>>>>> loglevel, field names, and field types. The
> >>>>>>>>>> sessiond proceeds to check that each event does not
> >>>>>>>>>> conflict with another event in the registry with
> >>>>>>>>>> the same name but different field names/types or
> >>>>>>>>>> loglevel. If its field names/types or loglevel
> >>>>>>>>>> differ from a previous event, it prints a warning.
> >>>>>>>>>> If it matches a previous event, it re-uses the same
> >>>>>>>>>> ID as the previous event. If there is no match, it
> >>>>>>>>>> allocates a new event ID. It then sends a command
> >>>>>>>>>> to the application to let it know the mapping
> >>>>>>>>>> between the event name and ID for the channel. When
> >>>>>>>>>> the application receives that command, it can
> >>>>>>>>>> finally proceed to attach the tracepoint probe to
> >>>>>>>>>> the tracepoint site.
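> >>>>>>>>>> 
> >>>>>>>>>> (A hypothetical C sketch of this registry check --
> >>>>>>>>>> the types and helpers are illustrative, not the
> >>>>>>>>>> real lttng-tools API, and a plain list stands in
> >>>>>>>>>> for the hash table:)
> >>>>>>>>>> 
> >>>>>>>>>> #include <errno.h>
> >>>>>>>>>> #include <stdint.h>
> >>>>>>>>>> #include <stdio.h>
> >>>>>>>>>> #include <stdlib.h>
> >>>>>>>>>> #include <string.h>
> >>>>>>>>>> 
> >>>>>>>>>> struct event_desc {
> >>>>>>>>>>         char name[256];
> >>>>>>>>>>         int loglevel;
> >>>>>>>>>>         const char *fields; /* serialized names/types */
> >>>>>>>>>>         uint32_t id;        /* filled in below */
> >>>>>>>>>> };
> >>>>>>>>>> 
> >>>>>>>>>> struct registry_entry {
> >>>>>>>>>>         struct event_desc desc;
> >>>>>>>>>>         struct registry_entry *next;
> >>>>>>>>>> };
> >>>>>>>>>> 
> >>>>>>>>>> struct registry {
> >>>>>>>>>>         struct registry_entry *head;
> >>>>>>>>>>         uint32_t next_id;
> >>>>>>>>>> };
> >>>>>>>>>> 
> >>>>>>>>>> static int registry_enable_event(struct registry *reg,
> >>>>>>>>>>                 struct event_desc *ev)
> >>>>>>>>>> {
> >>>>>>>>>>         struct registry_entry *e;
> >>>>>>>>>> 
> >>>>>>>>>>         for (e = reg->head; e; e = e->next) {
> >>>>>>>>>>                 if (strcmp(e->desc.name, ev->name) != 0)
> >>>>>>>>>>                         continue;
> >>>>>>>>>>                 if (strcmp(e->desc.fields, ev->fields) != 0
> >>>>>>>>>>                                 || e->desc.loglevel != ev->loglevel) {
> >>>>>>>>>>                         fprintf(stderr, "warning: event '%s' "
> >>>>>>>>>>                                 "conflicts with registry\n",
> >>>>>>>>>>                                 ev->name);
> >>>>>>>>>>                         return -EINVAL; /* refuse registration */
> >>>>>>>>>>                 }
> >>>>>>>>>>                 ev->id = e->desc.id; /* re-use the same ID */
> >>>>>>>>>>                 return 0;
> >>>>>>>>>>         }
> >>>>>>>>>>         e = calloc(1, sizeof(*e));
> >>>>>>>>>>         if (!e)
> >>>>>>>>>>                 return -ENOMEM;
> >>>>>>>>>>         e->desc = *ev;
> >>>>>>>>>>         e->desc.id = ev->id = reg->next_id++; /* new ID */
> >>>>>>>>>>         e->next = reg->head;
> >>>>>>>>>>         reg->head = e;
> >>>>>>>>>>         /* The sessiond then sends the name <-> ID
> >>>>>>>>>>          * mapping to the application. */
> >>>>>>>>>>         return 0;
> >>>>>>>>>> }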
> >>>>>>> 
> >>>>>>>>>> The sessiond keeps a per-application/per-channel
> >>>>>>>>>> hash table of already enabled events, so it does
> >>>>>>>>>> not provide the same event name/id mapping twice
> >>>>>>>>>> for a given channel.
> >>>>>>> 
> >>>>>>> and per-session?
> >>>>>>> 
> >>>>>>>> Yes.
> >>>>>>> 
> >>>>>>> 
> >>>>>>> From what I understand of this proposal, an event is
> >>>>>>> associated with per-user/per-session/per-apps/per-channel
> >>>>>>> values.
> >>>>>>> 
> >>>>>>>> Well, given that channels become per-user, an
> >>>>>>>> application will write its data into the channel with
> >>>>>>>> the same UID as itself. (This might imply some
> >>>>>>>> limitations with setuid() in an application, or at
> >>>>>>>> least require documenting those, or overloading
> >>>>>>>> setuid().)
> >>>>>>> 
> >>>>>>>> The "per-app" part is not quite right. Event IDs are 
> >>>>>>>> re-used and shared across all applications that
> >>>>>>>> belong to the same UID.
> >>>>>>> 
> >>>>>>> 
> >>>>>>> 
> >>>>>>> (I have a question at the end about how an event ID
> >>>>>>> should be generated)
> >>>>>>> 
> >>>>>>>>>> 
> >>>>>>>>>> Algo (2) In the case where a library (.so) is
> >>>>>>>>>> being loaded in the application while tracing, the
> >>>>>>>>>> update sequence goes as follows: the application
> >>>>>>>>>> first checks if there is any session created. If
> >>>>>>>>>> so, it sends a NOTIFY_NEW_EVENTS message to the
> >>>>>>>>>> sessiond through the communication socket (normally
> >>>>>>>>>> used to send ACKs to commands). The lttng-sessiond
> >>>>>>>>>> will therefore need to listen (read) to each
> >>>>>>>>>> application communication socket, and will also
> >>>>>>>>>> need to dispatch NOTIFY_NEW_EVENTS messages that
> >>>>>>>>>> arrive while it expects an ACK reply for a command
> >>>>>>>>>> it has sent to the application.
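> >>>>>>>>>> 
> >>>>>>>>>> (A hypothetical app-side sketch of this
> >>>>>>>>>> fire-and-forget notification -- the message type
> >>>>>>>>>> and layout are assumptions, not the real
> >>>>>>>>>> liblttng-ust protocol:)
> >>>>>>>>>> 
> >>>>>>>>>> #include <unistd.h>
> >>>>>>>>>> 
> >>>>>>>>>> #define NOTIFY_NEW_EVENTS 0x42U /* illustrative */
> >>>>>>>>>> 
> >>>>>>>>>> struct ust_msg {
> >>>>>>>>>>         unsigned int type;
> >>>>>>>>>> };
> >>>>>>>>>> 
> >>>>>>>>>> static void notify_new_events(int sessiond_sock,
> >>>>>>>>>>                 int have_sessions)
> >>>>>>>>>> {
> >>>>>>>>>>         struct ust_msg msg =
> >>>>>>>>>>                 { .type = NOTIFY_NEW_EVENTS };
> >>>>>>>>>> 
> >>>>>>>>>>         if (!have_sessions)
> >>>>>>>>>>                 return; /* no session created */
> >>>>>>>>>>         /* Asynchronous: no reply is awaited. */
> >>>>>>>>>>         (void) write(sessiond_sock, &msg,
> >>>>>>>>>>                 sizeof(msg));
> >>>>>>>>>> }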
> >>>>>>> 
> >>>>>>> Coming back to the last sentence, can you explain or
> >>>>>>> clarify the mechanism here of "dispatching a
> >>>>>>> NOTIFY_NEW_EVENTS" each time an ACK reply is
> >>>>>>> expected?... Do you mean that each time we are waiting
> >>>>>>> for an ACK, if we get a NOTIFY instead (which could
> >>>>>>> happen due to a race between notification and command
> >>>>>>> handling), you will launch a NOTIFY code path where the
> >>>>>>> session daemon checks the events hash table and pulls
> >>>>>>> the event(s) from the UST tracer? ... So what about
> >>>>>>> getting the real ACK after that?
> >>>>>>> 
> >>>>>>>> In this scheme, the NOTIFY is entirely asynchronous,
> >>>>>>>> and gets no acknowledgment from the sessiond to the
> >>>>>>>> app. This means that when dispatching a NOTIFY
> >>>>>>>> received at the ACK site (due to the race you refer
> >>>>>>>> to), we could simply queue this notify within the
> >>>>>>>> sessiond so it gets handled after we have finished
> >>>>>>>> handling the current command (e.g. the next time the
> >>>>>>>> thread goes back to polling fds).
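> >>>>>>>> 
> >>>>>>>> (A hypothetical sessiond-side sketch of that
> >>>>>>>> dispatch, re-using struct ust_msg and
> >>>>>>>> NOTIFY_NEW_EVENTS from the sketch above;
> >>>>>>>> queue_notify() is an assumed helper:)
> >>>>>>>> 
> >>>>>>>> #include <sys/types.h>
> >>>>>>>> #include <unistd.h>
> >>>>>>>> 
> >>>>>>>> static void queue_notify(int app_sock)
> >>>>>>>> {
> >>>>>>>>         /* Assumed helper: remember app_sock so the
> >>>>>>>>          * NOTIFY gets handled at the next poll-loop
> >>>>>>>>          * iteration. */
> >>>>>>>>         (void) app_sock;
> >>>>>>>> }
> >>>>>>>> 
> >>>>>>>> static int wait_for_ack(int app_sock)
> >>>>>>>> {
> >>>>>>>>         struct ust_msg msg;
> >>>>>>>> 
> >>>>>>>>         for (;;) {
> >>>>>>>>                 if (read(app_sock, &msg, sizeof(msg))
> >>>>>>>>                                 != (ssize_t) sizeof(msg))
> >>>>>>>>                         return -1;
> >>>>>>>>                 if (msg.type != NOTIFY_NEW_EVENTS)
> >>>>>>>>                         return 0; /* the real ACK */
> >>>>>>>>                 /* Raced NOTIFY: queue, keep waiting. */
> >>>>>>>>                 queue_notify(app_sock);
> >>>>>>>>         }
> >>>>>>>> }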
> >>>>>>> 
> >>>>>>> 
> >>>>>>>>>> When a NOTIFY_NEW_EVENTS is received from an
> >>>>>>>>>> application, the sessiond iterates over each
> >>>>>>>>>> session and each channel, redoing Algo (1). The
> >>>>>>>>>> per-app/per-channel hash table that remembers
> >>>>>>>>>> already enabled events will ensure that we don't
> >>>>>>>>>> end up enabling the same event twice.
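> >>>>>>>>>> 
> >>>>>>>>>> (A hypothetical sketch of that dispatch -- the
> >>>>>>>>>> session/channel types are illustrative:)
> >>>>>>>>>> 
> >>>>>>>>>> struct channel { struct channel *next; /* ... */ };
> >>>>>>>>>> struct session {
> >>>>>>>>>>         struct session *next;
> >>>>>>>>>>         struct channel *channels;
> >>>>>>>>>> };
> >>>>>>>>>> 
> >>>>>>>>>> static void redo_algo1(struct session *s,
> >>>>>>>>>>                 struct channel *c)
> >>>>>>>>>> {
> >>>>>>>>>>         /* See the Algo (1) sketch above; idempotent
> >>>>>>>>>>          * thanks to the per-app/per-channel hash of
> >>>>>>>>>>          * already enabled events. */
> >>>>>>>>>> }
> >>>>>>>>>> 
> >>>>>>>>>> static void handle_notify_new_events(
> >>>>>>>>>>                 struct session *sessions)
> >>>>>>>>>> {
> >>>>>>>>>>         struct session *s;
> >>>>>>>>>>         struct channel *c;
> >>>>>>>>>> 
> >>>>>>>>>>         for (s = sessions; s; s = s->next)
> >>>>>>>>>>                 for (c = s->channels; c; c = c->next)
> >>>>>>>>>>                         redo_algo1(s, c);
> >>>>>>>>>> }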
> >>>>>>>>>> 
> >>>>>>>>>> At application startup, the "registration done"
> >>>>>>>>>> message will only be sent once all the commands
> >>>>>>>>>> setting the mapping between event names and IDs are
> >>>>>>>>>> sent. This ensures tracing is not started until all
> >>>>>>>>>> events are enabled (delaying application startup by
> >>>>>>>>>> a configurable amount of time).
> >>>>>>>>>> 
> >>>>>>>>>> At library load, a "registration done" will also
> >>>>>>>>>> be sent by the sessiond some time after the
> >>>>>>>>>> NOTIFY_NEW_EVENTS has been received -- at the end
> >>>>>>>>>> of Algo (1). This means that library load, within
> >>>>>>>>>> applications, can be delayed for the same amount
> >>>>>>>>>> of time that applies to application start
> >>>>>>>>>> (configurable).
> >>>>>>>>>> 
> >>>>>>>>>> The registry is emptied when the session is
> >>>>>>>>>> destroyed. Event IDs are never freed, only re-used
> >>>>>>>>>> for events with the same name, after a loglevel,
> >>>>>>>>>> field name and field type match check.
> >>>>>>> 
> >>>>>>> This means that event IDs here should be some kind of
> >>>>>>> hash using a combination of the event's values, to make
> >>>>>>> sure it's unique on a per-event/per-channel/per-session
> >>>>>>> basis? (Considering the sessiond should keep them in a
> >>>>>>> separate registry.)
> >>>>>>> 
> >>>>>>>> I think it would be enough to hash the events by
> >>>>>>>> their full name, and then do a compare to check if
> >>>>>>>> the fields match. We _want_ the hash table lookup to
> >>>>>>>> succeed if we get an event with the same name but
> >>>>>>>> different fields, so that our detailed check can then
> >>>>>>>> detect the field mismatch.
> >>>>>>> 
> >>>>>>> 
> >>>>>>>>>> 
> >>>>>>>>>> This registry is also used to generate metadata
> >>>>>>>>>> from the sessiond. The sessiond will now be
> >>>>>>>>>> responsible for generation of the metadata
> >>>>>>>>>> stream.
> >>>>>>>>>> 
> >>>>>>> 
> >>>>>>> This implies that the session daemon will need to keep
> >>>>>>> track of the global memory location of each
> >>>>>>> application in order to consume metadata streams?
> >>>>>>> 
> >>>>>>>> Uh? No. The metadata stream would be _created_ by
> >>>>>>>> the sessiond. Applications would not have anything to
> >>>>>>>> do with the metadata stream in this scheme.
> >>>>>>>> Basically, we need to ensure that all the information
> >>>>>>>> required to generate the metadata for a given
> >>>>>>>> session/channel is present in the table that maps
> >>>>>>>> numeric event IDs to event name/field
> >>>>>>>> names/types/loglevel.
> >>>> 
> >>>> Hmmm... the session daemon creates the metadata now... this
> >>>> means you are going to get the full ringbuffer + ctf code
> >>>> inside the lttng-tools tree....!?
> >>>> 
> >>>>> No. Just move the internal representation of the
> >>>>> tracepoint metadata that UST currently has (see
> >>>>> ust-events.h) into lttng-tools.
> > 
> > Right, so the session daemon has to pass the metadata buffer
> > information up to the application so it can write it?
> > 
> > Please, if I'm wrong again, maybe just write a paragraph to
> > explain the whole shebang :P
> > 
> >> No.
> > 
> >> The sessiond keeps the registry about the entire metadata for
> >> each channel. The application will _not_ generate the metadata
> >> stream. That's the whole change: now the sessiond will generate
> >> this metadata stream. The applications would now have nothing to
> >> do with that stream: they would simply write into the data
> >> buffers shared across all processes of a given user.
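> >> 
> >> (A hypothetical sketch of the sessiond emitting one CTF
> >> (TSDL) event declaration from a registry entry -- re-using
> >> struct registry_entry from the Algo (1) sketch; the exact
> >> TSDL layout LTTng emits differs in detail, and this assumes
> >> e->desc.fields already holds TSDL field declarations:)
> >> 
> >> #include <stdio.h>
> >> 
> >> static void emit_event_metadata(FILE *fp,
> >>                 struct registry_entry *e, uint32_t stream_id)
> >> {
> >>         fprintf(fp,
> >>                 "event {\n"
> >>                 "  name = \"%s\";\n"
> >>                 "  id = %u;\n"
> >>                 "  stream_id = %u;\n"
> >>                 "  loglevel = %d;\n"
> >>                 "  fields := struct {\n    %s\n  };\n"
> >>                 "};\n\n",
> >>                 e->desc.name, e->desc.id, stream_id,
> >>                 e->desc.loglevel, e->desc.fields);
> >> }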
> > 
> >> Not sure if it's still unclear?
> 
> Ok! So, this isn't a small task, and we might want to elaborate on
> that part, since we have to decide what resources are needed to do
> it, either on the consumer side or in the session daemon... new
> threads?... new dependencies... and so on.

Yes, that's right. This is no small task indeed, but it's a task we
should focus on with a #1 priority.

Thanks,

Mathieu

> 
> Thanks!
> David
> 
> > 
> >> Thanks,
> > 
> >> Mathieu
> > 
> > 
> > Cheers David
> > 
> >>>> 
> >>>>> Thanks,
> >>>> 
> >>>>> Mathieu
> >>>> 
> >>>> 
> >>>> In any case, I would love to have a rationale for that,
> >>>> since this seems to me an important change.
> >>>> David
> >>>> 
> >>>>>>> 
> >>>>>>>> Anything still unclear?
> >>>>>>> 
> >>>>>>>> Thanks,
> >>>>>>> 
> >>>>>>>> Mathieu
> >>>>>>> 
> >>>>>>> 
> >>>>>>> Cheers! David
> >>>>>>> 
> >>>> 
> > 

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com


