[ltt-dev] feedback on lost events patch, ARM Linux 2.6.21, hardware timer integration

Mathieu Desnoyers mathieu.desnoyers at polymtl.ca
Mon Jan 5 16:39:03 EST 2009


* Venu A (venu_a at hyd.hellosoft.com) wrote:
> Hello Mathieu / Richard 
> Apologies for the long mail... 
> Recently we integrated LTTng on ARM Linux 2.6.21 to isolate a thread control issue during a VoIP call. 
> 
> Some of my experiences on ARM+LTTng: 
> 1) Procedure 
> Being a newbie, I had some teething problems getting LTTng up and running, though the mailing list did help a lot. 
> While the installation procedure is pretty straightforward on x86, it differs on ARM. I will share the procedure (once I am done) for my fellow newbies working on ARM. 
> 
> 2) Lost events in Normal mode or Flight recorder mode 
> Lost events always give me the jitters... there is always a fear of going off course because of the loss of critical data. 
> There were significant lost events when using the lttd daemon and the flight recorder mode on ARM. When using the daemon, lost events could be explained by a commit failure, as described in Mathieu's paper. But lost events in flight recorder mode did not make sense. The same did not happen on x86, making the issue specific to ARM. 
> Luckily I chanced upon Richard Purdie's and Mathieu's conversation and integrated the patch. After that, no more lost events! Thanks for the patch, Richard/Mathieu. 
> 
> 3) Timer IRQ 
> Almost exactly 10 ms was being spent in the timer IRQ every single time; other IRQs would exit in a few microseconds. 
> ------------------------------------------------------------------------------------------ 
> kernel.irq_entry: 42949913.090079349 (/home/voip/ltt/ltt_traces/flt_oncall/flight-cpu_0), 0, 0, UNNAMED, UNBRANDED, 0, 0x0, IRQ { 1, kernel }
> kernel.irq_exit: 42949913.100001224 (/home/voip/ltt/ltt_traces/flt_oncall/flight-cpu_0), 0, 0, UNNAMED, UNBRANDED, 0, 0x0, SYSCALL
> kernel.soft_irq_entry: 42949913.100002444 (/home/voip/ltt/ltt_traces/flt_oncall/flight-cpu_0), 0, 0, UNNAMED, UNBRANDED, 0, 0x0, SOFTIRQ { 1 }
> kernel.timer_set: 42949913.100003665 (/home/voip/ltt/ltt_traces/flt_oncall/flight-cpu_0), 0, 0, UNNAMED, UNBRANDED, 0, 0x0, SOFTIRQ { 24015, 0xbf0237e4, 3275424544 }
> kernel.timer_set: 42949913.100004886 (/home/voip/ltt/ltt_traces/flt_oncall/flight-cpu_0), 0, 0, UNNAMED, UNBRANDED, 0, 0x0, SOFTIRQ { 24015, 0xc02dc8fc, 0 }
> kernel.soft_irq_exit: 42949913.100006106 (/home/voip/ltt/ltt_traces/flt_oncall/flight-cpu_0), 0, 0, UNNAMED, UNBRANDED, 0, 0x0, SYSCALL { 1 }
> -------------------------------------------------------------------------------------------- 
> I incorrectly assumed the nanosecond timestamps were accurate since there was no rounding. Only later did I realize from the mailing list that ARM does not have a TSC, hence a logical clock is used. Also, jiffies is updated in the timer IRQ handler, which causes 10 ms to be added by IRQ exit. This really threw me off course for a couple of days. 
> After I integrated Richard's/Mathieu's patch, the kernel.update_time event started showing up between the timer IRQ entry/exit, and I could safely ignore this behavior. 
> ------------------------------------------------------------------------------------------ 
> kernel.irq_entry: 8593.370026854 (/home/voip/ltt/ltt_traces/lttver4/flt_oncall2/flight-cpu_0), 0, 0, swapper, UNBRANDED, 0, 0x0, IRQ { 1, kernel }
> kernel.update_time: 8593.370028075 (/home/voip/ltt/ltt_traces/lttver4/flt_oncall2/flight-cpu_0), 0, 0, swapper, UNBRANDED, 0, 0x0, IRQ { 4296320922, 13836, 260000000, 0, 0 }
> kernel.irq_exit: 8593.380001220 (/home/voip/ltt/ltt_traces/lttver4/flt_oncall2/flight-cpu_0), 0, 0, swapper, UNBRANDED, 0, 0x0, SYSCALL
> kernel.soft_irq_entry: 8593.380002440 (/home/voip/ltt/ltt_traces/lttver4/flt_oncall2/flight-cpu_0), 0, 0, swapper, UNBRANDED, 0, 0x0, SOFTIRQ { 1 }
> kernel.timer_set: 8593.380003661 (/home/voip/ltt/ltt_traces/lttver4/flt_oncall2/flight-cpu_0), 0, 0, swapper, UNBRANDED, 0, 0x0, SOFTIRQ { 1353627, 0xc02dcbf4, 0 }
> kernel.soft_irq_exit: 8593.380004882 (/home/voip/ltt/ltt_traces/lttver4/flt_oncall2/flight-cpu_0), 0, 0, swapper, UNBRANDED, 0, 0x0, SYSCALL { 1 }
> ------------------------------------------------------------------------------------------ 
> 
> 4) Flight recorder mode 
> To minimize the impact of LTTng profiling, I prefer using flight recorder mode. I noticed that the lttd daemon periodically dumps trace data into a file, causing a burst of CPU processing in the events/0 process. This can distort the picture of a real-time or heavily loaded process. The downside of using flight recorder mode is obvious... only the last few seconds of data are stored (I still need to understand and use the hybrid mode). In my case, with all the probes enabled, the default configuration would store 10 seconds of data. 
> 
> 5) Text Dump 
> The text dump utility proved very useful for in-depth analysis... I felt the viewer was somewhat harder to navigate. It would be helpful if there were some documentation of the other plug-ins and their options. 
> 

Hi Venu,

Yes, indeed, the fact that the logical clock does not follow time
accurately should probably be presented to the user as a warning in LTTV.

> Some questions 
> -------------------- 
> Hardware timer integration on ARM for more precision (our SoC has
> free hardware timers available for use): 
> Regarding timer integration:

Yes, having platform-specific support for this on architectures which
have free timers is definitely the way to go.

> 1) For ARM, can the hardware timer be integrated in
> ltt_get_timestamp32(), following the powerpc example, or in
> get_cycles(), as on MIPS?

Yes. Just make sure this timer increments (and does not decrement). If
the hardware counter counts down, you may need to invert it (e.g.
~value, i.e. 0xFFFFFFFF - value) to make it count upward.
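
For instance, a minimal sketch of what such a read could look like for
a hypothetical memory-mapped, down-counting 32-bit SoC timer
(soc_timer_base, the 0x04 register offset and the function name are all
made up; the real integration points are the hooks named above):

	#include <linux/types.h>
	#include <asm/io.h>

	/* Hypothetical SoC timer base, ioremap()ed at platform init. */
	static void __iomem *soc_timer_base;

	static inline u32 soc_timer_read_cycles(void)
	{
		/* Placeholder count register at offset 0x04. The
		 * hardware counter decrements, so invert the value
		 * (~x == 0xFFFFFFFF - x) to get a monotonically
		 * increasing 32-bit count. */
		return ~readl(soc_timer_base + 0x04);
	}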

> 2) Is there any penalty incurred in reading a timer register in
> ltt_get_timestamp32()? If so, are there any ways to minimize it? Is
> any locking required?

It's pretty cheap; I would not worry about it. It only disables
preemption and uses an RCU-like mechanism, which is very efficient and
has good reentrancy with respect to interrupts, NMIs, ...
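
As a simplified picture (not the actual LTTng code path), the read side
amounts to a plain register read bracketed by preempt_disable()/
preempt_enable(), with no lock taken. This sketch reuses the
hypothetical reader from above:

	#include <linux/preempt.h>

	static inline u32 sketch_get_timestamp32(void)
	{
		u32 ts;

		/* No lock needed: reading a free-running counter
		 * register is atomic. Disabling preemption keeps the
		 * read and any surrounding per-cpu accesses on the
		 * same CPU. */
		preempt_disable();
		ts = soc_timer_read_cycles();
		preempt_enable();

		return ts;
	}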

> 3) Is it OK to reset the dedicated hardware timer at boot-up and let
> it overflow, assuming the heartbeat event and synthetic TSC are used?
> Does the overflow period have any bearing other than on the heartbeat
> event period? 

In the (old) LTTng version you are using, the overflow period affects
both the heartbeat timer and the synthetic TSC 32->64-bit extension. In
newer LTTng, the heartbeat is not required anymore.

Therefore, both must be updated at least once every overflow period
while tracing is active.
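
To illustrate why, here is a sketch of the generic 32->64 extension
technique (not the actual LTTng synthetic TSC code, which keeps its
state per-cpu with an RCU-like scheme): software maintains the upper 32
bits, and a wrap can only be detected if the counter is sampled at
least once per overflow period:

	/* Sketch only; not reentrant as written. */
	static u64 last_tsc64;

	static u64 sketch_extend_tsc(u32 hw_count)
	{
		u32 last_low = (u32)last_tsc64;
		u64 high_bits = last_tsc64 & ~0xFFFFFFFFULL;

		/* A wrap shows up as the new 32-bit value being lower
		 * than the previous one. If two or more wraps happen
		 * between samples, the extra wraps are missed and time
		 * appears to jump backward -- hence the "at least once
		 * per overflow period" requirement. */
		if (hw_count < last_low)
			high_bits += 0x100000000ULL;

		last_tsc64 = high_bits | hw_count;
		return last_tsc64;
	}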

> Regarding timer resolution:
> 1) In Mathieu's paper, for x86, "the time spent in the two TSC reads
> varies between 97 and 105 cycles, with an average of 100.0 cycles".
> Would this overhead also apply to ARM when directly reading the timer
> register?

No, it's platform-dependent. And LTTng has changed a lot since those
timings were taken.

> Also for x86, the average time spent in the probe is 288.5 cycles.
> Can this be used as a ballpark figure for ARM?

I don't think so. Very platform-dependent.

> 2) I feel that if a probe takes x cycles, then the timer resolution
> can be set to 2x or 3x. Is it reasonable to assume that the
> granularity of profiling is limited by the overhead of the probe
> itself?

You can dig back into an LKML thread for a discussion between Linus and
me about this:

http://lkml.org/lkml/2008/9/23/186

> 3) At 300 MHz, for a 32-bit timer:
> a) with no division factor, the timer will overflow in 14.3 seconds;
> resolution is 3.33 ns.
> b) with a division factor of 256, the timer will overflow in about
> 3665 seconds (roughly an hour); resolution is 853 ns, close to 1
> microsecond.
> Which is preferable?

Highest resolution (option a).
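
For reference, the arithmetic behind the two options (a 32-bit counter
at 300 MHz):

	resolution = prescaler / f_clk
	overflow   = 2^32 * prescaler / f_clk

	a) prescaler = 1:   resolution = 1/300e6    ~= 3.33 ns
	                    overflow   = 2^32/300e6 ~= 14.3 s
	b) prescaler = 256: resolution = 256/300e6  ~= 853 ns
	                    overflow   ~= 3665 s    (about 61 minutes)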

> 4) Are there any open issues pending for ARM during sleep mode and
> frequency scaling modes?

Probably. I have not looked much into this area yet. Testing would be
welcome, and there will probably be problems.

> I really appreciate the work being done in the LTTng group.
> 

You are very welcome. Thanks for the useful feedback. Hopefully we'll
manage to integrate your work in the LTTng tree.

Mathieu

> CPU loading suggestion 
> for plotting micro CPU loading on x86 and other platforms with dedicated timers:
> 
> Since the timer interrupt comes periodically every 10 ms, the time difference between the last_event before Timer_IRQ_entry and the previous Timer_IRQ_exit can give the instantaneous loading.
> Eg:
> Timer_IRQ_entry @ T1
> Timer_IRQ_exit @ T2
> {
> some processing
> last_event @ T3
> }
> Timer_IRQ_entry @ T4
> 
> micro processing load % = ((last_event @ T3 - Timer_IRQ_exit @ T2) / 10 ms) * 100
> 
> This data can be averaged over user-defined windows of 10 ms, 100 ms, 1 s, etc. to give an average trend of CPU loading. The CPU loading analyzer could help identify regions in the trace where CPU loading is uneven. Any other ideas in this regard? A rough sketch of the computation follows below.
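> 
> A rough sketch of the per-tick computation (a hypothetical helper;
> the timestamps, here in milliseconds, would come from the trace):
> 
> /* Instantaneous load between one timer tick and the next,
>  * following the formula above. */
> static double micro_load_percent(double t2_irq_exit_ms,
> 				 double t3_last_event_ms)
> {
> 	const double tick_period_ms = 10.0;	/* timer IRQ period */
> 
> 	return (t3_last_event_ms - t2_irq_exit_ms)
> 		/ tick_period_ms * 100.0;
> }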
> 
> regards
> Venu Annamraju
> Team Lead, VoIP-DSP
> HelloSoft India 
> 

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68



