[ltt-dev] OMAP3/4 trace clock and lockdep tracing

Thu Apr 7 12:47:06 EDT 2011

* Harald Gustafsson (hgu1972 at gmail.com) wrote:
> 2011/4/6 Mathieu Desnoyers <compudj at krystal.dyndns.org>:
> > I haven't tested this on OMAP4 personally, so I'm really interested in
> > your feedback. I did my development on OMAP3 UP, but I designed the
> > trace clock to support SMP (but it's not tested by myself, others have
> > reported success though).
> Good to know that others have succeeded on an SMP system.
> 
> > You might want to try with and without:
> >
> > - power management suspend/resume
> > - cpufreq changes
> For initial debugging I could turn these off, but the use cases we need
> to analyse must have them on, and the trace clock must stay correct
> across these activities.
> 
> > There is code in trace clock to support each of these. 10 ms offset
> > between the cores is way too much, so I guess there is something wrong
> > there. First identifying if it's suspend/resume or cpufreq that are
> > causing the problems would help us there.
> Yes, it must be something with suspend/resume not resyncing correctly.

Please note that you can be woken up by an interrupt, which means that
an IRQ handler will be executed before you resume executing the
instruction that follows the WFI. Do you resync when woken up by these
interrupts?

> 
> > What do you mean by "enters/leaves WFI" ? (WFI TLA means ?)
> Wait-For-Interrupt is the last instruction that the processor executes
> before actually sleeping or idling. It stalls on this instruction until
> it is woken up again. Hence I placed the calls to save the clock and to
> resync before and after these points.
> 
> > Given what you say above, cpufreq running in performance mode should
> > make it OK as far as cpufreq is concerned, although there might be some
> > discrepancy at boot time if the cpufreq mode is only changed later.
> I have discovered that the idle code sometimes enters a state clocked
> at 32kHz, hence that would mess up the clock counter. This might be
> the cause of the problem, but I have not had time to look in more
> detail.
> 
> > I suspect that the problem might come from that your OMAP4 architecture
> > does not call the trace clock resync after power management resume, like
> > OMAP3 is doing.
> Note that this is not an OMAP4 arch I'm running on but another dual
> core SMP ARMv7. I thought I called resync at all places but maybe not.
> 
> > You might also want to have a look at the Linaro 2.6.38
> > git tree, which integrates LTTng with Linaro, which might have better
> > support for OMAP4 suspend/resume.
> I have not checked that yet, need to take a closer look later on.
> 
> > I'm really interested in fixing this up, but I'll need your help for
> > testing.
> That's generous of you. I'm happy to do testing, but it might be
> difficult since you don't have access to the actual architecture files
> and the issue is likely in those. Could I get back to you on this one?
> I would like to have good performance here, but I'm using code that
> my employer has not yet released (need to follow process). For now,
> because people were waiting on a solution, I decided to implement a
> much simpler trace clock that is always in sync between cores and
> doesn't need a resync at idle/wake-up, but still has decent resolution.
> It looks like this:
> 
> static atomic64_t last_clock_val;
> inline u64 trace_clock_read64(void)
> {
>     u64 clock_val, ret, inc_clock_val;
> 
>     do {
> 	clock_val = cnt32_to_63(clock32k->read(clock32k)) << TRACE_CLOCK_32K_BITSHIFT;

You might want to use the trace clock 32 to 64 provided by LTTng instead
of cnt32_to_63. See kernel/trace/trace-clock-32-to-64.c.

> 	inc_clock_val = atomic64_add_return(1, &last_clock_val); /* contains memory barrier */
>     } while ((clock_val > inc_clock_val) &&
> 	     (atomic64_cmpxchg(&last_clock_val, inc_clock_val, clock_val) != inc_clock_val));

I'd have to look a little bit at this code to figure out exactly what it
does, but I'm glad to hear it works for you :)

Mathieu

> 
>     if(clock_val > inc_clock_val)
> 	ret = clock_val;
>     else
> 	ret = inc_clock_val;
>     return ret;
> }
> 
> Same concept as the generic jiffy + counter, but using an always-on
> 32kHz clock and no timers. It speculatively increments the LSBs of the
> atomic clock variable and replaces it with the up-shifted 32kHz clock
> value if that clock has ticked. Worst case, it spins the while loop
> 1 << TRACE_CLOCK_32K_BITSHIFT times if two threads go in lock-step for
> each line (although the atomic operations can also spin). We are still
> looking into improving it, but it seems to work. ARM has faster cache
> snooping than x86 so it is less of a problem that it uses a shared
> atomic between cores. I'm also considering using a cycle counter's
> progress instead of an increment of 1, but that would mean
> TRACE_CLOCK_32K_BITSHIFT needs to be larger, and that is problematic.
> 
> Regards,
>  Harald
> 
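To make the scheme described above easier to reason about, here is a
minimal user-space sketch in C11. It is not the quoted kernel code: it
folds the add-then-cmpxchg sequence into a single compare-and-swap loop,
and every name in it (SKETCH_SHIFT, sketch_set_coarse_ticks,
sketch_clock_read64, the fake coarse counter) is hypothetical. The fake
counter stands in for the always-on 32kHz clock, and SKETCH_SHIFT plays
the role of TRACE_CLOCK_32K_BITSHIFT with an arbitrarily chosen value.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for TRACE_CLOCK_32K_BITSHIFT (value is arbitrary). */
#define SKETCH_SHIFT 10

/* Last value handed out to any caller, shared between all threads. */
static _Atomic uint64_t sketch_last_val;

/* Fake coarse counter standing in for the always-on 32kHz clock. */
static _Atomic uint64_t sketch_coarse_ticks;

/* Test helper (assumption): lets the caller advance the fake coarse clock. */
void sketch_set_coarse_ticks(uint64_t ticks)
{
	atomic_store(&sketch_coarse_ticks, ticks);
}

/*
 * Return a value that is strictly greater than any value previously
 * returned by this function, on any thread. The upper bits follow the
 * coarse clock; the low SKETCH_SHIFT bits count reads between ticks.
 */
uint64_t sketch_clock_read64(void)
{
	uint64_t coarse = atomic_load(&sketch_coarse_ticks) << SKETCH_SHIFT;
	uint64_t prev = atomic_load(&sketch_last_val);
	uint64_t next;

	do {
		/*
		 * Jump to the shifted coarse value if the coarse clock has
		 * moved past the last value, otherwise just count up by one.
		 */
		next = (coarse > prev) ? coarse : prev + 1;
		/* On failure, prev is reloaded with the current shared value. */
	} while (!atomic_compare_exchange_weak(&sketch_last_val, &prev, next));

	return next;
}
```

As in the quoted version, if more than 1 << SKETCH_SHIFT reads happen
within one coarse tick, the returned values run ahead of the coarse clock
until it catches up. Ordering is preserved, but resolution within that
window is only "one count per read".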

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com