[ltt-dev] Bug in LTTv statistics ?

Mathieu Desnoyers compudj at krystal.dyndns.org
Fri Jul 16 11:48:30 EDT 2010


* François Godin (copelnug at gmail.com) wrote:
> 2010/7/16 Mathieu Desnoyers <compudj at krystal.dyndns.org>
> 
> > * François Godin (copelnug at gmail.com) wrote:
> > > Thanks for the answer. That made it clear, but it brings up other questions:
> > >
> > > Is the state for the "page_fault_entry" event the previous state of the
> > > CPU? I suppose so, but a confirmation would be welcome.
> >
> > Well, in reality, the page fault entry event is recorded in page fault
> > context. So the textdump has it right. But that's really just a
> > "modeling artefact" to have one or the other.
> >
> 
> Ok, but I need to know how the mode is determined in LTTv stats so that I can
> compare the two solutions. Then, once I get it right, I can switch back to
> being compatible with the textdump. Is my assumption correct?

The mode is the current "state" when the stats callback is called.
Because the stats callback is called before the trap_entry state update
callback, we end up with the state that was there before the event.
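
To make the ordering concrete, here is a minimal sketch of it. This is a toy
model, not LTTv code: run_hooks and the mode names as Python strings are
hypothetical, and only the page_fault_entry transition is modeled.

```python
# Toy model of the hook ordering; names are hypothetical, not LTTv's API.
USER_MODE, TRAP = "USER_MODE", "TRAP"

def run_hooks(events, initial_mode=USER_MODE):
    mode = initial_mode
    stats_view = []     # mode seen by the stats callback (runs first)
    textdump_view = []  # mode seen by the textdump callback (runs later)
    for name in events:
        # 1. Stats "before" hook: sees the mode from before the event.
        stats_view.append((name, mode))
        # 2. State "before" hook: applies the state transition.
        if name == "page_fault_entry":
            mode = TRAP
        # 3. Event hook (e.g. textdump): sees the updated mode.
        textdump_view.append((name, mode))
    return stats_view, textdump_view

stats, dump = run_hooks(["page_fault_entry"])
print(stats)  # the stats hook attributes the event to the previous mode
print(dump)   # the textdump attributes it to TRAP
```

In this model, the same event ends up in USER_MODE for the stats and in TRAP
for the textdump, which is the discrepancy discussed in this thread.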

> 
> 
> > >
> > > I am trying to understand why, on one trace, there are still
> > > "page_fault_entry" events in TRAP mode. I checked the previous correction,
> > > and it finds the right states for all events except those that LTTv puts
> > > in "TRAP" mode. Am I missing something else, or is it a glitch in LTTv?
> >
> > Not sure what you mean. Can you try to explain a little more?
> >
> 
> I've run tests to verify my assumption. I load the textdump in a custom parser,
> which gives me the mode of each "page_fault_entry" event. I then compare the
> total for each mode to the number of "page_fault_entry" events in each mode in
> the statistics view. For one trace, the result was perfect, but for another
> trace, my parser didn't find any "page_fault_entry" events in "TRAP" mode,
> whereas LTTv stats did. The other modes had the right totals, except one whose
> total given by the parser was the one given by LTTv plus the
> "page_fault_entry" events in TRAP.
> 
> Page_fault_entry by mode:
> LTTv    Parser   Mode
> 1       1        MODE_UNKNOWN
> 971     999      USER_MODE      999 (parser) = 971 (LTTv) + 28 (LTTv-TRAP)
> 26      26       SYSCALL
> 28      0        TRAP

Hrm hrm. I'd guess that the nested trap (a trap within a trap) case, or
the major page faults are not handled correctly. A major page fault
does:

userspace or kernel
  - trap (page fault entry)
  - reenable interrupts
  - allow to be scheduled out while waiting for I/O
...
  - rescheduled
  - disable interrupts
  - iret (page fault exit)

So you could look into those corner-cases to see if we did something
fishy in there.
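
As an illustration of the nested case, here is a minimal sketch that keeps a
mode stack and attributes each page_fault_entry to the mode in effect before
the event. It is a toy model under my own assumptions
(count_entries_by_mode is a hypothetical helper, not LTTv code), and it only
shows one way an entry can land in TRAP; I have not checked that this is what
happens in the trace.

```python
from collections import Counter

def count_entries_by_mode(events, base_mode="USER_MODE"):
    """Count page_fault_entry events by the mode in effect before each one."""
    stack = [base_mode]  # execution mode stack, innermost mode last
    counts = Counter()
    for name in events:
        if name == "page_fault_entry":
            counts[stack[-1]] += 1  # mode before the state update
            stack.append("TRAP")
        elif name == "page_fault_exit":
            if len(stack) > 1:      # never pop the base mode
                stack.pop()
    return dict(counts)

# A trap nested within a trap, followed by a plain page fault.
trace = [
    "page_fault_entry",  # from USER_MODE
    "page_fault_entry",  # nested: the mode before this event is TRAP
    "page_fault_exit",
    "page_fault_exit",
    "page_fault_entry",  # from USER_MODE again
]
print(count_entries_by_mode(trace))
```

A parser that does not keep such a stack would attribute the nested entry to
some other mode, which could produce a shift between columns like the one in
the table above.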

> 
> 
> I've also made another discovery. It seems that the time between the
> "page_fault_entry" or "page_fault_exit" and the previous event is never
> added in the CPU_TIME. Is this intended?

This is why we have the "cumulative cpu time", which keeps track of the
amount of cpu time used by a context _and_ its nested contexts. I'm not
100% sure it works well though.
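
For what it's worth, the difference between the two counters can be sketched
as follows. This is a generic model of per-context versus cumulative time,
not the LTTv implementation; cpu_times and the (timestamp, kind, name) event
tuples are hypothetical.

```python
def cpu_times(events):
    """events: list of (timestamp, kind, name), kind is "enter" or "exit"."""
    stack = []        # each frame: [name, enter timestamp, time in children]
    self_time = {}    # time spent in the context itself
    cumulative = {}   # time spent in the context and its nested contexts
    for ts, kind, name in events:
        if kind == "enter":
            stack.append([name, ts, 0])
        else:
            frame_name, start, child = stack.pop()
            total = ts - start
            cumulative[frame_name] = cumulative.get(frame_name, 0) + total
            self_time[frame_name] = self_time.get(frame_name, 0) + total - child
            if stack:
                # Charge the whole nested span to the parent's child time.
                stack[-1][2] += total
    return self_time, cumulative

# A trap nested inside a system call.
events = [
    (0, "enter", "SYSCALL"),
    (10, "enter", "TRAP"),
    (15, "exit", "TRAP"),
    (30, "exit", "SYSCALL"),
]
self_time, cumulative = cpu_times(events)
print(self_time)
print(cumulative)
```

Here the time spent in the nested TRAP is excluded from the SYSCALL's own
time but included in its cumulative time.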

Thanks,

Mathieu

> 
> 
> >
> > >
> > >
> > >
> > >
> > > 2010/7/15 Mathieu Desnoyers <compudj at krystal.dyndns.org>
> > >
> > > > * François Godin (copelnug at gmail.com) wrote:
> > > > > I think I've found another bug in the LTTv statistics. The problem is
> > > > > with the page_fault_entry event. In the textdump they are classified
> > > > > as being in mode TRAP, but they appear in SYSCALL and USER_MODE mode
> > > > > in the statistics view.
> > > >
> > > > This comes from the way callbacks are dealt with in the priority hook
> > > > list:
> > > >
> > > > When an event is encountered, the statistics "before" event callbacks
> > > > are called first, then the state "before" event callbacks are called,
> > > > then the event callbacks per se (e.g. textdump) are called, and then
> > > > the after state, followed by after stats hooks.
> > > >
> > > > So given that the statistics hooks are called before the state
> > > > modification, and that the textdump hooks are called after the state
> > > > modification, there is a discrepancy between the two. But in reality,
> > > > the statistic hook has the correct state associated (the trap occurs in
> > > > its calling context, not in "itself"). But how to fix the textdump is
> > > > unclear to me.
> > > >
> > > > Thanks,
> > > >
> > > > Mathieu
> > > >
> > > > --
> > > > Mathieu Desnoyers
> > > > Operating System Efficiency R&D Consultant
> > > > EfficiOS Inc.
> > > > http://www.efficios.com
> > > >
> > >
> > >
> > >
> > > --
> > > François Godin
> >
> > --
> > Mathieu Desnoyers
> > Operating System Efficiency R&D Consultant
> > EfficiOS Inc.
> > http://www.efficios.com
> >
> 
> 
> 
> -- 
> François Godin

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com



