[ltt-dev] [RFC for CTF] Storing state metadata

Mon Feb 7 15:41:35 EST 2011

* Alexandre Montplaisir (alexandre.montplaisir at polymtl.ca) wrote:
> Hi Mathieu, thanks for your quick feedback!
> 
> It reminded me, I talked to a couple of people about this but I forgot to
> specify it in the document: there are in fact two independent problems here:
> 
> #1 - How to express state changes in trace metadata
> #2 - What to put in (or "how to organize") the attribute tree
> 
> So maybe I should add a section that specifically addresses #2

Yep, makes sense.

> 
> 
> 
> On 11-02-07 02:25 PM, Mathieu Desnoyers wrote:
> >>        Definitions
> >> --------------------------
> >>
> >> * Attributes
> >> An attribute is a "single element of state", the basic unit, the atom if you
> >> will. Each bit of information we want to store about the state is represented
> >> with an attribute. The idea so far was to organize them in a tree, similar to
> >> the /proc filesystem.
> >> For example:
> >>
> >> host1/CPUs/0/Current_process
> >> host1/Processes/2500/Exec_name
> >>
> >> could be attributes. They would represent, respectively, the current scheduled
> >> process on CPU 0 and the current executable name of process with PID 2500, both
> >> on host "host1".
> > Small point of semantics: CPUs schedule threads, not processes.
> 
> Indeed. We should add a field "Current_thread" with its TGID(?).

With its "PID" as seen from the kernel side. (The kernel PID is actually
the thread ID. The TGID is the thread group ID, identifying the process.)

> But the current_process is interesting too, because we can map which PID
> does a syscall (for example), since the syscall event's payload gives us
> the CPU but not the PID.

We don't have to update the current process. We just have to keep the
info in a thread-specific branch, specifying what its TGID is. It never
has to be updated and it is valid for the lifetime of the thread.
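
A minimal sketch of that idea in C, assuming a hypothetical in-memory layout
(struct thread_state, sched_switch() and current_tgid() are made-up names for
illustration, not part of any existing library). Only the per-CPU current
thread changes on a scheduling event; the process is derived from the
thread's own branch:

```c
#include <stddef.h>

#define MAX_CPUS 4	/* arbitrary size for this sketch */

/* Hypothetical thread-specific branch of the state tree. */
struct thread_state {
	int tid;	/* kernel "PID", i.e. the thread ID */
	int tgid;	/* thread group ID (the process); set once, never updated */
};

/* Per-CPU state: only the currently scheduled thread is tracked. */
static const struct thread_state *current_thread[MAX_CPUS];

/* A scheduling event only updates the CPU's current thread. */
static void sched_switch(int cpu, const struct thread_state *next)
{
	if (cpu >= 0 && cpu < MAX_CPUS)
		current_thread[cpu] = next;
}

/* The current process is looked up through the thread, not stored per CPU.
 * Returns -1 if the CPU is out of range or nothing was scheduled yet. */
static int current_tgid(int cpu)
{
	if (cpu < 0 || cpu >= MAX_CPUS || current_thread[cpu] == NULL)
		return -1;
	return current_thread[cpu]->tgid;
}
```

With this layout, mapping a syscall event to its process is a lookup of
current_tgid(cpu) at the event's timestamp.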

> 
> >>     Points of interest
> >> --------------------------
> >>
> >> * Integer vs Strings state values
> >> The design of the State History so far allows for State values to be either
> >> Integers or variable-length Strings. However, in cases where we have a defined
> >> set of possible values known in advance, it might be interesting to use enum-
> >> like integers instead of strings to save up on storage space. (e.g. system call
> >> names, IRQ names, etc.)
> >>
> >> One thing to remember in this case is that the "mapping" between the enums and
> >> the integers will have to be known by both the tracer and the analysis tool, so
> >> this adds a dependency.
> >> (The State History library does not need to know about it though, we can have it
> >> store any value and it will happily return it without knowing what it means.)
> > One possibility would be to keep one extra type of info: enums would be
> > a ( value , reference to enumeration mapping table ) pair, so that the
> > corresponding string could be extracted from the value without having to
> > keep information about the enumeration mapping table externally. We
> > could even decide to have a whole level of "directories" in the state
> > tree mapped to a single enumeration mapping table, which would apply to
> > all children, so we don't have to repeat the enum table reference. Just
> > food for thought.
> >
> 
> Yes that's a good idea. Those mappings would be different for each
> application anyway, so it makes sense to make it part of the supplied
> information.
> They shouldn't be stored in the History per se (because it's not
> interval-like information that can change during the trace), but could
> come in another container for "static state information" that's valid
> all the time?

Not sure. We already have some data structures planned to keep these
mappings that are invariant across all the trace. The point is mostly to
be able to link the data (in the state history) to these mappings. I
don't see why it would be inconvenient to keep this information inside
the state history: it's just a state that has a duration of the whole
trace, so it does not use much space.
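
To make the ( value , reference to enumeration mapping table ) pair more
concrete, here is a rough sketch in C. All names (struct enum_map,
struct enum_value, enum_to_string()) and the table contents are assumptions
made up for this example:

```c
#include <stddef.h>

/* Hypothetical mapping table, invariant across the whole trace. */
struct enum_map {
	const char *const *names;
	size_t count;
};

/* State value stored as a (value, mapping table reference) pair. */
struct enum_value {
	int value;
	const struct enum_map *map;
};

/* Example table; the entries are illustrative only. */
static const char *const status_names[] = { "running", "preempted" };
static const struct enum_map status_map = { status_names, 2 };

/* Resolve the string without any externally kept mapping information.
 * Returns NULL if the value has no entry in its table. */
static const char *enum_to_string(struct enum_value v)
{
	if (v.map == NULL || v.value < 0 || (size_t)v.value >= v.map->count)
		return NULL;
	return v.map->names[v.value];
}
```

The state history would only store the integer and the table reference; the
table itself is a state whose interval spans the whole trace.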

> 
> >>
> >> * Events vs. State changes
> >> The goal of adding state metadata to trace points is to map state changes to
> >> events. By definition, a state-changing event will define one *or more* state
> >> changes. All the information required to define these state changes has to be
> >> present locally in the scope of the trace point, or in some cases in the state
> >> history itself.
> >>
> >> For example, a scheduling event could cause the following state changes:
> >> - set the "running" status to the process that got scheduled in
> > again, process -> thread
> >
> >> - set the "preempted" (for example) status to the process that got scheduled out
> >> - update the "current running process" on the relevant CPU
> >>
> >> When we explicitly express each one of those changes using the attributes and
> >> values we defined earlier, we can also use the term "attribute modifications".
> >>
> >>
> >> * Conditions
> >> It's also interesting to define conditions at which state changes occur. Once
> >> again those conditions can only use information that is either available locally
> >> or in the state history.
> >>
> >> For example, if we look at the state changes caused by a scheduling event, shown
> >> at the previous point, we might want to *not* insert state changes when the
> >> previous or next pid is "0", since we do not care about the current status of
> >> "process 0".
> > Why would we skip pid 0 ? It's really important to know when the system
> > is going to execute the idle thread.
> 
> Hmm, if I remember right, in this case it was to avoid creating
> un-needed intervals when the CPUs weren't executing anything, and save a
> bit of space. We could still know that "idle" was being executed as the
> History would return "null".

"idle" can actually execute things: interrupt handlers, management of
swap, etc. are executed on behalf of PID 0. I don't see how the current
thread is set to NULL in your state update examples below; it seems to
keep the previous thread as running.

> 
> >> Examples of the declaration
> >> --------------------------
> >>
> >> This is an example for a scheduling event. We assume we have local access to
> >> the usual event payload [next_pid, prev_pid, prev_state] as well as "cpu", the
> >> cpu number on which this event happened.
> >>
> >>
> >>
> >> * Alternative #1:  C-like syntax
> >> (omitted semi-colons, strcat's and the like for clarity)
> >>
> >> state_change changes[3]
> >>
> >> /* Set the status of the process scheduled in */
> >> if ( next_pid != 0 ) {
> >> 	changes[0].type = MODIFY
> >> 	changes[0].attribute_name = "<hostname>/Processes/" + next_pid + "/Status"
> >> 	changes[0].value = STATE_RUNNING
> >> }

Maybe something closer to a C-like statement structure, e.g.

if (condition)
  modify(attribute_name, value);

?

We can then parse this and generate whatever data structure
representation we like.
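
As an illustration only, the three state changes of the quoted scheduling
example could be written in that statement form as below. The flat attribute
table, modify(), lookup(), on_sched_switch() and the value of STATE_RUNNING
are assumptions for this sketch; the pid != 0 conditions are kept only to
mirror the quoted example:

```c
#include <stdio.h>
#include <string.h>

#define MAX_ATTRS 16
#define NAME_LEN 64

enum { STATE_RUNNING = 1 };	/* hypothetical value */

/* Toy attribute store: a flat table of (name, value) pairs. */
struct attribute { char name[NAME_LEN]; int value; };
static struct attribute attrs[MAX_ATTRS];
static int nr_attrs;

/* Set an attribute, creating it on first use. Returns 0, or -1 on error. */
static int modify(const char *name, int value)
{
	int i;

	for (i = 0; i < nr_attrs; i++) {
		if (strcmp(attrs[i].name, name) == 0) {
			attrs[i].value = value;
			return 0;
		}
	}
	if (nr_attrs == MAX_ATTRS || strlen(name) >= NAME_LEN)
		return -1;
	strcpy(attrs[nr_attrs].name, name);
	attrs[nr_attrs++].value = value;
	return 0;
}

/* Read an attribute back. Returns 0 if found, -1 otherwise. */
static int lookup(const char *name, int *value)
{
	int i;

	for (i = 0; i < nr_attrs; i++) {
		if (strcmp(attrs[i].name, name) == 0) {
			*value = attrs[i].value;
			return 0;
		}
	}
	return -1;
}

/* State changes for one scheduling event, in if/modify() form. */
static void on_sched_switch(const char *host, int cpu, int prev_pid,
			    int prev_state, int next_pid)
{
	char name[NAME_LEN];

	if (next_pid != 0) {
		snprintf(name, sizeof(name), "%s/Processes/%d/Status", host, next_pid);
		modify(name, STATE_RUNNING);
	}
	if (prev_pid != 0) {
		snprintf(name, sizeof(name), "%s/Processes/%d/Status", host, prev_pid);
		modify(name, prev_state);
	}
	snprintf(name, sizeof(name), "%s/CPUs/%d/Current_process", host, cpu);
	modify(name, next_pid);
}
```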

> >>
> >> /* Set the status of the process scheduled out */
> >> if ( prev_pid != 0 ) {
> >> 	changes[1].type = MODIFY
> >> 	changes[1].attribute_name = "<hostname>/Processes/" + prev_pid + "/Status"
> >> 	changes[1].value = prev_state
> >> }
> >>
> >> /* Set the current active process on the relevant CPU */
> >> changes[2].type = MODIFY
> >> changes[2].attribute_name = "<hostname>/CPUs/" + cpu + "/Current_process"
> >> changes[2].value = next_pid
> > Clean, understandable, although I'm not convinced that the example is
> > well chosen for the pid != 0.
> >
> 
> Let's suppose it's just an example to show how conditions would work ;)

OK :)

> 
> >> * Alternative #2:  XML syntax
> >>
> >> <statechange>
> >> 	<condition = "next_pid != 0">
> >> 	<type = MODIFY>
> >> 	<attributename>
> >> 		<external>hostname</external>
> >> 		<literal>Processes</literal>
> >> 		<internal>next_pid</internal>
> >> 		<literal>Status</literal>
> >> 	</attributename>
> >> 	<value>
> >> 		<internal>STATE_RUNNING</internal>
> >> 	</value>
> >> </statechange>
> >> <statechange>
> >> 	<condition = "prev_pid != 0">
> >> 	<type = MODIFY>
> >> 	<attributename>
> >> 		<external>hostname</external>
> >> 		<literal>Processes</literal>
> >> 		<internal>prev_pid</internal>
> >> 		<literal>Status</literal>
> >> 	</attributename>
> >> 	<value>
> >> 		<internal>prev_state</internal>
> >> 	</value>
> >> </statechange>
> >> <statechange>
> >> 	<condition = true>	<!-- always record this change -->
> >> 	<type = MODIFY>
> >> 	<attributename>
> >> 		<external>hostname</external>
> >> 		<literal>CPUs</literal>
> >> 		<internal>cpu</internal>
> >> 		<literal>Current_process</literal>
> >> 	</attributename>
> >> 	<value>
> >> 		<internal>next_pid</internal>
> >> 	</value>
> >> </statechange>
> > Hrm, do we really expect people to type this in manually ? ;)
> 
> Web developers maybe xD

heheh

> 
> >> In both cases, attribute names contain either literal, external or internal
> >> components. "Internals" refer to variables available locally. Literals are that,
> >> string literals that will be used as-is in the attribute tree. Externals are
> >> placeholder values that the trace reading library and/or the state history
> >> building mechanism will have to replace with the correct value.
> >>
> >>
> >> (Surely there are a lot of shortcomings in these examples right now, but
> >> hopefully they explain what I'm trying to do ;)
> >>
> >> Personally I find #1 more compact and more readable, but #2 has the advantage
> >> of not having to be in the program itself.
> > Not true. We could parse C-like syntax descriptions provided along with
> > the plugins. We don't have to go with XML for this. A single description
> > format would indeed be better if we can both keep the degree of
> > flexibility required by plugin-provided descriptions and not be too
> > verbose.
> 
> Ah ok, even better then. Yeah I prefer C-like syntax too.
> 
> Maybe it's still too early to decide on that, but I was wondering who
> should "execute" that code? The trace reading library or the viewer?

We should probably make the parsing available at the trace library
level, but we also want to have parsing usable directly at the viewer
level so we can parse descriptions brought with plugins. So I think we
should do both: deploy the code at the trace reading library level, but
provide external symbols so it can be used by the viewer.
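
A very rough sketch of what such a shared entry point might look like,
assuming a hypothetical parse_modify() that only understands a single
"modify(attribute_name, value)" statement with a literal integer value
(the real description syntax is still undecided). The function is left
non-static on purpose, so that it is an external symbol that both the trace
reading library and a viewer plugin could call:

```c
#include <stdio.h>

#define ATTR_NAME_LEN 64

/* Hypothetical parsed form of one state change. */
struct state_change {
	char attribute_name[ATTR_NAME_LEN];
	int value;
};

/*
 * Parse one "modify(<attribute_name>, <integer>)" statement.
 * The attribute name is everything between '(' and ',' (spaces right
 * before the comma are kept). Returns 0 on success, -1 on a syntax error.
 */
int parse_modify(const char *text, struct state_change *out)
{
	char close;

	if (sscanf(text, " modify ( %63[^,] , %d %c",
		   out->attribute_name, &out->value, &close) != 3)
		return -1;
	return close == ')' ? 0 : -1;
}
```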

Thanks,

Mathieu

> 
> > Thanks,
> >
> > Mathieu
> >
> >
> 
> 
> -- 
> Alexandre Montplaisir
> DORSAL lab,
> École Polytechnique de Montréal
> 

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com