[ltt-dev] [RFC for CTF] Storing state metadata

Mathieu Desnoyers compudj at krystal.dyndns.org
Mon Feb 7 14:25:40 EST 2011


* Alexandre Montplaisir (alexandre.montplaisir at polymtl.ca) wrote:
> Hi all,
> 
> As we have discussed over the past weeks, I've been looking at ways to store
> state-related metadata in a way that it can be supplied with
> instrumented applications, instead of with trace viewers.
> 
> Here is an overview of what was discussed and what I had in mind so far.
> It's a rough draft, and still at a very "brainstorming" stage.
> 
> Feedback/comments very welcome! =)
> 
> 
> Thanks,
> 
> -- 
> Alexandre Montplaisir
> DORSAL lab,
> École Polytechnique de Montréal
> 

> Request For Comments / Proposal on how to store state-related metadata in tracepoints
> 
> Alexandre Montplaisir <alexandre.montplaisir at polymtl.ca>
> 
> 
> Trace viewers normally carry their own state machine to represent the state of
> traced systems at any given point in a trace. Typically, the definition of this
> state machine was in the viewer itself, and had to be constantly updated
> whenever the tracing instrumentation would change.
> 
> It would be interesting if we could provide a basic state machine definition
> included with the instrumentation. This would allow viewers to show basic state
> information without having to "know" the type of trace in advance.
> 
> This proposal tries to give an example of how such a state system could be
> defined in trace points (or referred to by the tracepoints), and what
> information would be needed.
> 
> 
> 
>        Definitions
> --------------------------
> 
> * Attributes
> An attribute is a "single element of state", the basic unit, the atom if you
> will. Each bit of information we want to store about the state is represented
> with an attribute. The idea so far was to organize them in a tree, similar to
> the /proc filesystem.
> For example:
> 
> host1/CPUs/0/Current_process
> host1/Processes/2500/Exec_name
> 
> could be attributes. They would represent, respectively, the current scheduled
> process on CPU 0 and the current executable name of process with PID 2500, both
> on host "host1".

A small point of semantics: CPUs schedule threads, not processes.

> 
> A main point about the design of this "attribute tree" is that it does not need
> to be defined in advance : it should be built on the go, as we read information
> from the trace (e.g. we won't know how many CPUs there will be, etc.)
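
To make the "built on the go" idea concrete, here is a minimal Java sketch of
an attribute tree whose nodes are created the first time a path is seen. The
names AttributeNode and getOrCreate are hypothetical, chosen for illustration
only; they are not taken from the State History library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical node type; not part of any existing State History API. */
class AttributeNode {
    final String name;
    final Map<String, AttributeNode> children = new LinkedHashMap<>();

    AttributeNode(String name) {
        this.name = name;
    }

    /** Walks the path from this node, creating any missing node on the way. */
    AttributeNode getOrCreate(String... path) {
        AttributeNode node = this;
        for (String component : path) {
            node = node.children.computeIfAbsent(component, AttributeNode::new);
        }
        return node;
    }

    /** Counts this node plus all of its descendants. */
    int size() {
        int count = 1;
        for (AttributeNode child : children.values()) {
            count += child.size();
        }
        return count;
    }
}

class AttributeTreeDemo {
    public static void main(String[] args) {
        AttributeNode root = new AttributeNode("");
        // Nothing is declared up front: CPU 0 only exists once an event mentions it.
        root.getOrCreate("host1", "CPUs", "0", "Current_process");
        root.getOrCreate("host1", "Processes", "2500", "Exec_name");
        System.out.println("nodes in tree: " + root.size());
    }
}
```

Asking for the same path twice returns the same node, so the trace reader can
call getOrCreate on every event without tracking what already exists.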
> 
> 
> * State values
> The goal of the attributes is to store values. Each "state value" is only valid
> for a certain period of time, or "interval". Only one value exists for a given
> attribute/timestamp pair, but this value can be different at other times.
> 
> For example, attribute "host1/CPUs/0/Current_process" could have value "1750"
> for a given period, which would mean the scheduled process on CPU0 was PID 1750
> during that time.
> 
> "Null" is also a possible and important state value. It means "there is no
> information about this attribute at this time". If a process only lived for two
> minutes in an hour-long trace, everywhere else its attributes will have null
> values.
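
As an illustration of "one value per attribute/timestamp pair", the sketch
below stores the changes of a single attribute and answers queries by
timestamp. It assumes integer values and an in-memory sorted map; the class
AttributeHistory and its methods are hypothetical, and the real storage
format is out of scope here.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical per-attribute history: each entry starts a new interval. */
class AttributeHistory {
    // Key: timestamp at which a value starts being valid.
    // Value: the state value, or null for "no information".
    private final TreeMap<Long, Integer> changes = new TreeMap<>();

    /** Records that the attribute takes this value from the timestamp onwards. */
    void set(long timestamp, Integer value) {
        changes.put(timestamp, value);
    }

    /** Returns the value valid at the timestamp, or null if nothing is known. */
    Integer query(long timestamp) {
        Map.Entry<Long, Integer> entry = changes.floorEntry(timestamp);
        return entry == null ? null : entry.getValue();
    }
}

class AttributeHistoryDemo {
    public static void main(String[] args) {
        AttributeHistory currentProcess = new AttributeHistory();
        currentProcess.set(100L, 1750); // PID 1750 scheduled in at t=100
        currentProcess.set(200L, null); // nothing known from t=200 onwards
        System.out.println(currentProcess.query(150L));
        System.out.println(currentProcess.query(50L));
    }
}
```

Before the first recorded change and after an explicit null, the query
returns null, which matches the "no information at this time" meaning.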
> 
> 
> 
>     Points of interest
> --------------------------
> 
> * Integer vs Strings state values
> The design of the State History so far allows for State values to be either
> Integers or variable-length Strings. However, in cases where we have a defined
> set of possible values known in advance, it might be interesting to use enum-
> like integers instead of strings to save up on storage space. (e.g. system call
> names, IRQ names, etc.)
> 
> One thing to remember in this case is that the "mapping" between the enums and
> the integers will have to be known by both the tracer and the analysis tool, so
> this adds a dependency.
> (The State History library does not need to know about it though, we can have it
> store any value and it will happily return it without knowing what it means.)

One possibility would be to keep one extra type of info: enums would be
a ( value , reference to enumeration mapping table ) pair, so that the
corresponding string could be extracted from the value without having to
keep information about the enumeration mapping table externally. We
could even decide to have a whole level of "directories" in the state
tree mapped to a single enumeration mapping table, which would apply to
all children, so we don't have to repeat the enum table reference. Just
food for thought.
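
A rough Java sketch of that idea, assuming a mapping table attached to a
"directory" node applies to every node below it. EnumTable, StateNode and the
example values are all hypothetical and only meant to show the lookup.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical mapping table: integer value -> human-readable name. */
class EnumTable {
    private final Map<Integer, String> names = new HashMap<>();

    EnumTable define(int value, String name) {
        names.put(value, name);
        return this;
    }

    String nameOf(int value) {
        String name = names.get(value);
        return name != null ? name : "<unknown:" + value + ">";
    }
}

/** Hypothetical state-tree node that may carry a table for its whole subtree. */
class StateNode {
    final StateNode parent;
    EnumTable table; // null means "inherit from the closest ancestor that has one"

    StateNode(StateNode parent) {
        this.parent = parent;
    }

    /** Finds the table that applies to this node by walking up the tree. */
    EnumTable effectiveTable() {
        for (StateNode node = this; node != null; node = node.parent) {
            if (node.table != null) {
                return node.table;
            }
        }
        return null;
    }

    /** Renders a stored integer with the inherited table, or as a plain number. */
    String render(int storedValue) {
        EnumTable t = effectiveTable();
        return t != null ? t.nameOf(storedValue) : Integer.toString(storedValue);
    }
}

class EnumMappingDemo {
    public static void main(String[] args) {
        StateNode directory = new StateNode(null);
        // Illustrative values only; a real table would come from the trace metadata.
        directory.table = new EnumTable().define(0, "read").define(1, "write");
        StateNode leaf = new StateNode(new StateNode(directory));
        System.out.println(leaf.render(1));
    }
}
```

The history still stores plain integers; only the viewer follows the table
reference when it needs a string.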

> 
> 
> * Events vs. State changes
> The goal of adding state metadata to trace points is to map state changes to
> events. By definition, a state-changing event will define one *or more* state
> changes. All the information required to define these state changes has to be
> present locally in the scope of the trace point, or in some cases in the state
> history itself.
> 
> For example, a scheduling event could cause the following state changes:
> - set the "running" status to the process that got scheduled in

again, process -> thread

> - set the "preempted" (for example) status to the process that got scheduled out
> - update the "current running process" on the relevant CPU
> 
> When we explicitly express each one of those changes using the attributes and
> values we defined earlier, we can also use the term "attribute modifications".
> 
> 
> * Conditions
> It's also interesting to define conditions at which state changes occur. Once
> again those conditions can only use information that is either available locally
> or in the state history.
> 
> For example, if we look at the state changes caused by a scheduling event, shown
> at the previous point, we might want to *not* insert state changes when the
> previous or next pid is "0", since we do not care about the current status of
> "process 0".

Why would we skip pid 0? It's really important to know when the system
is going to execute the idle thread.

> 
> 
> * Types of state changes
> Finally, some events affect the state in more complex ways than direct attribute
> modifications. This usually happens when the required information is not
> available locally in the event payload and requires a query on the history.
> 
> The state history library (for now) provides abstractions for these different
> types:
> 
>   MODIFY(timestamp, value, attribute)
>   Bread-and-butter modification method, we insert in the history a state change
>   at "timestamp", in which we now assign "value" to the given "attribute".
>   
>   REMOVE(timestamp, attribute)
>   Similar to MODIFY(timestamp, "null", attribute), except we also "nullify" all
>   the children of the attribute. A bit like "rm -rf". This is needed in some
>   cases where we don't know exactly how many children an attribute has.
>   (e.g. a process dies, we want to remove all of its child-attributes).
>   
>   PUSH(timestamp, value, attribute)
>   POP(timestamp, attribute)
>   In some cases we are not only interested in the latest value of a given
>   attribute, but we want to keep a "stack" of previous ones we have seen so far.
>   This is the case with process execution modes (nested IRQs and syscalls and 
>   the like).
>   
>   INCREMENT(timestamp, attribute)
>   Sometimes we might just want to increment a counter, without having to keep
>   an array in memory just to pass values to MODIFY's. The history will look for
>   the previous value of this attribute and will insert a change that increments
>   the count by 1.
>   This is particularly useful if we want to store statistics in the history.
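
To pin down the intended semantics of the five operations listed above, here
is a small in-memory Java sketch. It only tracks the current state, so the
timestamp argument is dropped, and it assumes attributes are addressed by
"/"-separated path strings. The class CurrentState and its method names are
hypothetical, not the State History library's API.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of the current state; an absent key means "null". */
class CurrentState {
    private final Map<String, Integer> values = new HashMap<>();
    private final Map<String, Deque<Integer>> stacks = new HashMap<>();

    /** MODIFY: assign a value to one attribute. */
    void modify(String attribute, int value) {
        values.put(attribute, value);
    }

    /** REMOVE: nullify the attribute and every attribute below it. */
    void remove(String attribute) {
        String prefix = attribute + "/";
        values.keySet().removeIf(k -> k.equals(attribute) || k.startsWith(prefix));
        stacks.keySet().removeIf(k -> k.equals(attribute) || k.startsWith(prefix));
    }

    /** PUSH: remember the new value on top of the previous ones. */
    void push(String attribute, int value) {
        stacks.computeIfAbsent(attribute, k -> new ArrayDeque<>()).push(value);
        values.put(attribute, value);
    }

    /** POP: drop the latest value and expose the one below it, if any. */
    Integer pop(String attribute) {
        Deque<Integer> stack = stacks.get(attribute);
        if (stack == null || stack.isEmpty()) {
            return null;
        }
        Integer popped = stack.pop();
        if (stack.isEmpty()) {
            values.remove(attribute);
        } else {
            values.put(attribute, stack.peek());
        }
        return popped;
    }

    /** INCREMENT: add 1 to the previous value, starting from 0 if unset. */
    void increment(String attribute) {
        values.merge(attribute, 1, Integer::sum);
    }

    Integer get(String attribute) {
        return values.get(attribute);
    }
}
```

Note that REMOVE matches on the prefix "attribute/" so that removing
"Processes/1" leaves "Processes/10" untouched.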
> 
> 
> (This may add unwanted complexity at the "tracer" level though, but I haven't
> figured out a way of generating different types of changes other than declaring
> them right from the start.)
> 
> 
> Examples of the declaration
> --------------------------
> 
> This is an example for a scheduling event. We assume we have local access to
> the usual event payload [next_pid, prev_pid, prev_state] as well as "cpu", the
> cpu number on which this event happened.
> 
> 
> 
> * Alternative #1:  C-like syntax
> (omitted semi-colons, strcat's and the like for clarity)
> 
> state_change changes[3]
> 
> /* Set the status of the process scheduled in */
> if ( next_pid != 0 ) {
> 	changes[0].type = MODIFY
> 	changes[0].attribute_name = "<hostname>/Processes/" + next_pid + "/Status"
> 	changes[0].value = STATE_RUNNING
> }
> 
> /* Set the status of the process scheduled out */
> if ( prev_pid != 0 ) {
> 	changes[1].type = MODIFY
> 	changes[1].attribute_name = "<hostname>/Processes/" + prev_pid + "/Status"
> 	changes[1].value = prev_state
> }
> 
> /* Set the current active process on the relevant CPU */
> changes[2].type = MODIFY
> changes[2].attribute_name = "<hostname>/CPUs/" + cpu + "/Current_process"
> changes[2].value = next_pid

Clean, understandable, although I'm not convinced that the example is
well chosen for the pid != 0.

> * Alternative #2:  XML syntax
> 
> <statechange>
> 	<condition>next_pid != 0</condition>
> 	<type>MODIFY</type>
> 	<attributename>
> 		<external>hostname</external>
> 		<literal>Processes</literal>
> 		<internal>next_pid</internal>
> 		<literal>Status</literal>
> 	</attributename>
> 	<value>
> 		<internal>STATE_RUNNING</internal>
> 	</value>
> </statechange>
> <statechange>
> 	<condition>prev_pid != 0</condition>
> 	<type>MODIFY</type>
> 	<attributename>
> 		<external>hostname</external>
> 		<literal>Processes</literal>
> 		<internal>prev_pid</internal>
> 		<literal>Status</literal>
> 	</attributename>
> 	<value>
> 		<internal>prev_state</internal>
> 	</value>
> </statechange>
> <statechange>
> 	<condition>true</condition>	<!-- always record this change -->
> 	<type>MODIFY</type>
> 	<attributename>
> 		<external>hostname</external>
> 		<literal>CPUs</literal>
> 		<internal>cpu</internal>
> 		<literal>Current_process</literal>
> 	</attributename>
> 	<value>
> 		<internal>next_pid</internal>
> 	</value>
> </statechange>

Hrm, do we really expect people to type this in manually? ;)

> 
> In both cases, attribute names contain either literal, external or internal
> components. "Internals" refer to variables available locally. Literals are just
> that: string literals that will be used as-is in the attribute tree. Externals are
> placeholder values that the trace reading library and/or the state history
> building mechanism will have to replace with the correct value.
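
The three component kinds can be sketched in Java as a template that is
resolved into a concrete attribute path for each event. This assumes event
fields and external values are both available as string maps; ComponentKind,
Component and AttributeNameResolver are hypothetical names.

```java
import java.util.Map;

enum ComponentKind { LITERAL, INTERNAL, EXTERNAL }

/** One element of an attribute-name template (hypothetical type). */
class Component {
    final ComponentKind kind;
    final String text; // the literal itself, or the name of the value to look up

    Component(ComponentKind kind, String text) {
        this.kind = kind;
        this.text = text;
    }
}

class AttributeNameResolver {
    /**
     * Turns a template into a concrete path. Internals are read from the
     * event payload, externals from values supplied by the trace reader.
     */
    static String[] resolve(Component[] template,
                            Map<String, String> eventFields,
                            Map<String, String> externals) {
        String[] path = new String[template.length];
        for (int i = 0; i < template.length; i++) {
            Component c = template[i];
            switch (c.kind) {
            case LITERAL:
                path[i] = c.text;
                break;
            case INTERNAL:
                path[i] = lookup(eventFields, c.text);
                break;
            default: // EXTERNAL
                path[i] = lookup(externals, c.text);
                break;
            }
        }
        return path;
    }

    private static String lookup(Map<String, String> source, String key) {
        String value = source.get(key);
        if (value == null) {
            throw new IllegalArgumentException("no value for '" + key + "'");
        }
        return value;
    }
}
```

With next_pid = 1750 in the event and hostname = host1 supplied externally,
the template for the first state change resolves to the path
host1 / Processes / 1750 / Status.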
> 
> 
> (Surely there are a lot of shortcomings in these examples right now, but
> hopefully they explain what I'm trying to do ;)
> 
> Personally I find #1 more compact and more readable, but #2 has the advantage
> of not having to be in the program itself.

Not true. We could parse C-like syntax descriptions provided along with
the plugins. We don't have to go with XML for this. A single description
format would indeed be better if we can both keep the degree of
flexibility required by plugin-provided descriptions and not be too
verbose.

Thanks,

Mathieu

> If we want to also support 
> externally-supplied state machines, having a common syntax is probably a good
> thing.
> 
> 
>       Link with the
>     State History API
> --------------------------
> 
> First we define what a "state change" is Java-side.
> 
> 
> enum StateChangeType {MODIFY, REMOVE, PUSH, POP, INC;}
> 
> class StateChange {
> 	StateChangeType type;
> 	String[] attributeName;
> 	int newValue;
> 	long timestamp;
> 
> 	...
> }
> 
> 
> And we add a field "stateChanges" to the Events read from the trace. We suppose
> the trace reading library (a.k.a. Matthew's magical box) will fill up this array
> based on the information in the trace point.
> 
> 
> class Event {
> 	...
> 	StateChange[] stateChanges;
> 	...
> }
> 
> (We will also need to implement how the parser will replace "external" 
> placeholder values with real ones taken from the state history built so far)
> 
> 
> After this, the whole "State Event Handler" mechanism can be replaced with the 
> following snippet:
> 
> /* We assume we have the following already defined:
>  * ts = event.timestamp
>  * history = reference to the State History interface object
>  */
> for ( int i = 0; i < event.stateChanges.length; i++ ) {
> 	StateChange currentChange = event.stateChanges[i];
> 	
> 	switch ( currentChange.type ) {
> 	case MODIFY:
> 		history.modifyAttribute(ts,
> 					currentChange.newValue,
> 					currentChange.attributeName);
> 		break;
> 	case REMOVE:
> 		history.removeAttribute(ts, currentChange.attributeName);
> 		break;
> 	case PUSH:
> 		history.pushAttribute(	ts,
> 					currentChange.newValue,
> 					currentChange.attributeName);
> 		break;
> 	case POP:
> 		history.popAttribute(ts, currentChange.attributeName);
> 		break;
> 	case INC:
> 		history.increment(ts, currentChange.attributeName);
> 		break;
> 	}
> }
> 
> 
> 
> 



-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com



