[ltt-dev] [RFC for CTF] Storing state metadata

Mon Feb 7 15:01:28 EST 2011

Hi Mathieu, thanks for your quick feedback!

It reminded me of something I mentioned to a couple of people but forgot to
specify in the document: there are in fact two independent problems here:

#1 - How to express state changes in trace metadata
#2 - What to put in (or "how to organize") the attribute tree

So maybe I should add a section that specifically addresses #2.

On 11-02-07 02:25 PM, Mathieu Desnoyers wrote:
>>        Definitions
>> --------------------------
>>
>> * Attributes
>> An attribute is a "single element of state", the basic unit, the atom if you
>> will. Each bit of information we want to store about the state is represented
>> with an attribute. The idea so far was to organize them in a tree, similar to
>> the /proc filesystem.
>> For example:
>>
>> host1/CPUs/0/Current_process
>> host1/Processes/2500/Exec_name
>>
>> could be attributes. They would represent, respectively, the current scheduled
>> process on CPU 0 and the current executable name of process with PID 2500, both
>> on host "host1".
> Small point of semantics: CPUs schedule threads, not processes.

Indeed. We should add a "Current_thread" attribute, along with its TGID(?).
But Current_process is interesting too, because it lets us work out which
PID made a syscall (for example): the syscall event's payload gives us
the CPU but not the PID.
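
A rough sketch of that lookup, in Python. This is only an illustration:
the flat dict standing in for the attribute tree and the two helper names
are assumptions, not part of the proposal or of any existing library.

```python
# Hypothetical flat view of the attribute tree: path -> current value.
state = {}

def set_attribute(path, value):
    """Record the current value of one attribute."""
    state[path] = value

def pid_for_syscall(host, cpu):
    """Return the PID currently running on `cpu`, or None if unknown."""
    return state.get(f"{host}/CPUs/{cpu}/Current_process")

# A scheduling event puts PID 2500 on CPU 0 ...
set_attribute("host1/CPUs/0/Current_process", 2500)
# ... so a later syscall event that only carries cpu=0 maps to PID 2500.
syscall_pid = pid_for_syscall("host1", 0)
```

A CPU for which no scheduling event has been seen yet simply has no
Current_process attribute, so the lookup returns None there.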

>>     Points of interest
>> --------------------------
>>
>> * Integer vs Strings state values
>> The design of the State History so far allows for State values to be either
>> Integers or variable-length Strings. However, in cases where we have a defined
>> set of possible values known in advance, it might be interesting to use
>> enum-like integers instead of strings to save storage space (e.g. system call
>> names, IRQ names, etc.).
>>
>> One thing to remember in this case is that the "mapping" between the enums and
>> the integers will have to be known by both the tracer and the analysis tool, so
>> this adds a dependency.
>> (The State History library does not need to know about it though, we can have it
>> store any value and it will happily return it without knowing what it means.)
> One possibility would be to keep one extra type of info: enums would be
> a ( value , reference to enumeration mapping table ) pair, so that the
> corresponding string could be extracted from the value without having to
> keep information about the enumeration mapping table externally. We
> could even decide to have a whole level of "directories" in the state
> tree mapped to a single enumeration mapping table, which would apply to
> all children, so we don't have to repeat the enum table reference. Just
> food for thought.
>

Yes, that's a good idea. Those mappings would be different for each
application anyway, so it makes sense to make it part of the supplied
information.
They shouldn't be stored in the History per se (because it's not
interval-like information that can change during the trace), but could
come in another container for "static state information" that's valid
all the time?
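
A minimal sketch of such a "static" container, in Python, assuming one
mapping table can be attached to a directory of the tree and applies to
everything below it. The table contents and the names ENUM_TABLES,
find_table and decode are invented for the example.

```python
# Hypothetical "static state information": tree prefix -> enum mapping table.
# The numeric values below are invented for the example.
ENUM_TABLES = {
    "host1/Processes": {0: "STATE_RUNNING", 1: "STATE_WAIT"},
}

def find_table(path):
    """Walk up the path until a directory with a mapping table is found."""
    parts = path.split("/")
    while parts:
        table = ENUM_TABLES.get("/".join(parts))
        if table is not None:
            return table
        parts.pop()
    return None

def decode(path, value):
    """Translate a stored integer into its name, or return it unchanged."""
    table = find_table(path)
    if table is None:
        return value
    return table.get(value, value)
```

With this layout the History itself only ever stores the integer; the
viewer calls something like decode("host1/Processes/2500/Status", 0) when
it needs the string.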

>>
>> * Events vs. State changes
>> The goal of adding state metadata to trace points is to map state changes to
>> events. By definition, a state-changing event will define one *or more* state
>> changes. All the information required to define these state changes has to be
>> present locally in the scope of the trace point, or in some cases in the state
>> history itself.
>>
>> For example, a scheduling event could cause the following state changes:
>> - set the "running" status to the process that got scheduled in
> again, process -> thread
>
>> - set the "preempted" (for example) status to the process that got scheduled out
>> - update the "current running process" on the relevant CPU
>>
>> When we explicitly express each one of those changes using the attributes and
>> values we defined earlier, we can also use the term "attribute modifications".
>>
>>
>> * Conditions
>> It's also interesting to define conditions at which state changes occur. Once
>> again those conditions can only use information that is either available locally
>> or in the state history.
>>
>> For example, if we look at the state changes caused by a scheduling event, shown
>> at the previous point, we might want to *not* insert state changes when the
>> previous or next pid is "0", since we do not care about the current status of
>> "process 0".
> Why would we skip pid 0 ? It's really important to know when the system
> is going to execute the idle thread.

Hmm, if I remember right, in this case it was to avoid creating
unneeded intervals when the CPUs weren't executing anything, and so save
a bit of space. We could still know that "idle" was being executed, as a
query on the History would return "null" for that time.
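
To illustrate what "return null" means here, a small Python sketch of an
interval store for a single attribute. It is an assumption about the
behaviour, not the real State History API; add_interval and query are
made-up names.

```python
# Hypothetical interval store for one attribute: (start, end, value),
# with `end` exclusive. Not the real State History API.
intervals = []

def add_interval(start, end, value):
    """Store the fact that the attribute held `value` during [start, end)."""
    intervals.append((start, end, value))

def query(timestamp):
    """Return the value at `timestamp`, or None if no interval covers it."""
    for start, end, value in intervals:
        if start <= timestamp < end:
            return value
    return None

# CPU 0 runs PID 2500 during [10, 20) and PID 2501 during [35, 50).
# Nothing is stored for [20, 35), when the idle thread was running.
add_interval(10, 20, 2500)
add_interval(35, 50, 2501)
```

One thing the sketch makes visible: a timestamp before the first interval
also gives None, so with this scheme "idle" and "not known yet" look the
same to the caller.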

>> Examples of the declaration
>> --------------------------
>>
>> This is an example for a scheduling event. We assume we have local access to
>> the usual event payload [next_pid, prev_pid, prev_state] as well as "cpu", the
>> cpu number on which this event happened.
>>
>>
>>
>> * Alternative #1:  C-like syntax
>> (omitted semi-colons, strcat's and the like for clarity)
>>
>> state_change changes[3]
>>
>> /* Set the status of the process scheduled in */
>> if ( next_pid != 0 ) {
>> 	changes[0].type = MODIFY
>> 	changes[0].attribute_name = "<hostname>/Processes/" + next_pid + "/Status"
>> 	changes[0].value = STATE_RUNNING
>> }
>>
>> /* Set the status of the process scheduled out */
>> if ( prev_pid != 0 ) {
>> 	changes[1].type = MODIFY
>> 	changes[1].attribute_name = "<hostname>/Processes/" + prev_pid + "/Status"
>> 	changes[1].value = prev_state
>> }
>>
>> /* Set the current active process on the relevant CPU */
>> changes[2].type = MODIFY
>> changes[2].attribute_name = "<hostname>/CPUs/" + cpu + "/Current_process"
>> changes[2].value = next_pid
> Clean, understandable, although I'm not convinced that the example is
> well chosen for the pid != 0.
>

Let's suppose it's just an example to show how conditions would work ;)
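
For comparison, the same three changes written as a small runnable Python
function. This is just a sketch: the function name, the (attribute, value)
tuple format and the numeric value of STATE_RUNNING are assumptions made
for the example.

```python
STATE_RUNNING = 0  # placeholder value, invented for the example

def sched_changes(hostname, cpu, next_pid, prev_pid, prev_state):
    """Return the (attribute, value) modifications for one scheduling event."""
    changes = []
    # Conditional: status of the process scheduled in, skipped for PID 0.
    if next_pid != 0:
        changes.append((f"{hostname}/Processes/{next_pid}/Status", STATE_RUNNING))
    # Conditional: status of the process scheduled out, skipped for PID 0.
    if prev_pid != 0:
        changes.append((f"{hostname}/Processes/{prev_pid}/Status", prev_state))
    # Unconditional: the current process on the CPU is always updated.
    changes.append((f"{hostname}/CPUs/{cpu}/Current_process", next_pid))
    return changes
```

Dropping the two "if" tests is all it takes to record PID 0 as well, so
the condition mechanism and the idle-thread question stay independent.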

>> * Alternative #2:  XML syntax
>>
>> <statechange>
>> 	<condition = "next_pid != 0">
>> 	<type = MODIFY>
>> 	<attributename>
>> 		<external>hostname</external>
>> 		<literal>Processes</literal>
>> 		<internal>next_pid</internal>
>> 		<literal>Status</literal>
>> 	</attributename>
>> 	<value>
>> 		<internal>STATE_RUNNING</internal>
>> 	</value>
>> </statechange>
>> <statechange>
>> 	<condition = "prev_pid != 0">
>> 	<type = MODIFY>
>> 	<attributename>
>> 		<external>hostname</external>
>> 		<literal>Processes</literal>
>> 		<internal>prev_pid</internal>
>> 		<literal>Status</literal>
>> 	</attributename>
>> 	<value>
>> 		<internal>prev_state</internal>
>> 	</value>
>> </statechange>
>> <statechange>
>> 	<condition = true>	<!-- always record this change -->
>> 	<type = MODIFY>
>> 	<attributename>
>> 		<external>hostname</external>
>> 		<literal>CPUs</literal>
>> 		<internal>cpu</internal>
>> 		<literal>Current_process</literal>
>> 	</attributename>
>> 	<value>
>> 		<internal>next_pid</internal>
>> 	</value>
>> </statechange>
> Hrm, do we really expect people to type this in manually ? ;)

Web developers maybe xD

>> In both cases, attribute names contain either literal, external or internal
>> components. "Internals" refer to variables available locally. Literals are just that,
>> string literals that will be used as-is in the attribute tree. Externals are
>> placeholder values that the trace reading library and/or the state history
>> building mechanism will have to replace with the correct value.
>>
>>
>> (Surely there are a lot of shortcomings in these examples right now, but
>> hopefully they explain what I'm trying to do ;)
>>
>> Personally I find #1 more compact and more readable, but #2 has the advantage
>> of not having to be in the program itself.
> Not true. We could parse C-like syntax descriptions provided along with
> the plugins. We don't have to go with XML for this. A single description
> format would indeed be better if we can both keep the degree of
> flexibility required by plugin-provided descriptions and not be too
> verbose.

Ah ok, even better then. Yeah I prefer C-like syntax too.

Maybe it's still too early to decide on that, but I was wondering who
should "execute" that code? The trace reading library or the viewer?
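
Whichever side ends up doing it, a good part of the work is resolving
attribute names from their literal, internal and external components. A
minimal Python sketch of that step, assuming the description has already
been parsed into (kind, name) pairs; the resolve function and the pair
format are hypothetical.

```python
def resolve(components, event_fields, externals):
    """Build an attribute path from (kind, name) components."""
    parts = []
    for kind, name in components:
        if kind == "literal":
            # Used as-is in the attribute tree.
            parts.append(name)
        elif kind == "internal":
            # Looked up in the event's own payload/context.
            parts.append(str(event_fields[name]))
        elif kind == "external":
            # Supplied by whoever runs the description (library or viewer).
            parts.append(str(externals[name]))
        else:
            raise ValueError(f"unknown component kind: {kind}")
    return "/".join(parts)

template = [("external", "hostname"), ("literal", "CPUs"),
            ("internal", "cpu"), ("literal", "Current_process")]
path = resolve(template, {"cpu": 0, "next_pid": 2500}, {"hostname": "host1"})
```

The only input that differs between "library" and "viewer" here is who
provides the externals dict, which might help frame the question.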

> Thanks,
>
> Mathieu
>
>

-- 
Alexandre Montplaisir
DORSAL lab,
École Polytechnique de Montréal