[ltt-dev] feedback on lost events patch ARM linux - 2.6.21 , hardware timer integration

Venu A venu_a at hyd.hellosoft.com
Wed Dec 24 02:44:29 EST 2008


Hello Mathieu / Richard, 
Apologies for the long mail... 
Recently we integrated LTTng on ARM Linux 2.6.21 to isolate a thread-control issue during a VoIP call. 

Some of my experiences on ARM+LTTng: 
1)Procedure 
Being a newbie, I had some teething problems getting LTTng up and running, though the mailing list did help a lot. 
While the installation procedure is pretty straightforward on x86, the same on ARM is different. I will share the procedure (once I get done) for my fellow newbies working on ARM. 

2) Lost events in Normal mode or Flight recorder mode 
Lost events always give me the jitters... there is always a fear of going off course because of the loss of critical data. 
There were significant lost events on ARM both when using the lttd daemon and in flight recorder mode. When using the daemon, lost events could be explained by a commit failure, as described in Mathieu's paper. But lost events in flight recorder mode did not make sense. The same did not happen on x86, making the issue specific to ARM. 
Luckily I chanced upon Richard Purdie's and Mathieu's conversation and integrated the patch. After that, no lost events! Thanks for the patch, Richard/Mathieu. 

3) Timer IRQ 
Almost exactly 10 ms was being spent in the timer IRQ every single time, while other IRQs would exit within a few microseconds. 
------------------------------------------------------------------------------------------ 
kernel.irq_entry: 42949913.090079349 (/home/voip/ltt/ltt_traces/flt_oncall/flight-cpu_0), 0, 0, UNNAMED, UNBRANDED, 0, 0x0, IRQ { 1, kernel }
kernel.irq_exit: 42949913.100001224 (/home/voip/ltt/ltt_traces/flt_oncall/flight-cpu_0), 0, 0, UNNAMED, UNBRANDED, 0, 0x0, SYSCALL
kernel.soft_irq_entry: 42949913.100002444 (/home/voip/ltt/ltt_traces/flt_oncall/flight-cpu_0), 0, 0, UNNAMED, UNBRANDED, 0, 0x0, SOFTIRQ { 1 }
kernel.timer_set: 42949913.100003665 (/home/voip/ltt/ltt_traces/flt_oncall/flight-cpu_0), 0, 0, UNNAMED, UNBRANDED, 0, 0x0, SOFTIRQ { 24015, 0xbf0237e4, 3275424544 }
kernel.timer_set: 42949913.100004886 (/home/voip/ltt/ltt_traces/flt_oncall/flight-cpu_0), 0, 0, UNNAMED, UNBRANDED, 0, 0x0, SOFTIRQ { 24015, 0xc02dc8fc, 0 }
kernel.soft_irq_exit: 42949913.100006106 (/home/voip/ltt/ltt_traces/flt_oncall/flight-cpu_0), 0, 0, UNNAMED, UNBRANDED, 0, 0x0, SYSCALL { 1 }
-------------------------------------------------------------------------------------------- 
I incorrectly assumed the timing in nanoseconds was accurate since there was no rounding. Only later did I realize from the mailing list that ARM has no TSC, so a logical clock is used. Also, jiffies is updated in the timer IRQ handler, causing 10 ms to be added by IRQ exit. This really threw me off course for a couple of days. 
After I integrated Richard's/Mathieu's patch, kernel.update_time started showing up between the timer IRQ entry/exit, and I could safely ignore this behavior. 
------------------------------------------------------------------------------------------ 
kernel.irq_entry: 8593.370026854 (/home/voip/ltt/ltt_traces/lttver4/flt_oncall2/flight-cpu_0), 0, 0, swapper, UNBRANDED, 0, 0x0, IRQ { 1, kernel }
kernel.update_time: 8593.370028075 (/home/voip/ltt/ltt_traces/lttver4/flt_oncall2/flight-cpu_0), 0, 0, swapper, UNBRANDED, 0, 0x0, IRQ { 4296320922, 13836, 260000000, 0, 0 }
kernel.irq_exit: 8593.380001220 (/home/voip/ltt/ltt_traces/lttver4/flt_oncall2/flight-cpu_0), 0, 0, swapper, UNBRANDED, 0, 0x0, SYSCALL
kernel.soft_irq_entry: 8593.380002440 (/home/voip/ltt/ltt_traces/lttver4/flt_oncall2/flight-cpu_0), 0, 0, swapper, UNBRANDED, 0, 0x0, SOFTIRQ { 1 }
kernel.timer_set: 8593.380003661 (/home/voip/ltt/ltt_traces/lttver4/flt_oncall2/flight-cpu_0), 0, 0, swapper, UNBRANDED, 0, 0x0, SOFTIRQ { 1353627, 0xc02dcbf4, 0 }
kernel.soft_irq_exit: 8593.380004882 (/home/voip/ltt/ltt_traces/lttver4/flt_oncall2/flight-cpu_0), 0, 0, swapper, UNBRANDED, 0, 0x0, SYSCALL { 1 }
------------------------------------------------------------------------------------------ 

4) FLIGHT RECORDER MODE 
To minimize the impact of LTTng profiling, I prefer using flight recorder mode. I noticed that the LTT daemon periodically dumps trace data into a file, causing a burst of CPU processing in the events/0 process. This can distort the picture of a real-time or heavily loaded process. The downside of flight recorder mode is obvious... only the last few seconds of data are stored (I still need to understand and use the hybrid mode). In my case, with all the probes enabled, the default configuration would store 10 seconds of data. 

5) Text Dump 
The text dump utility proved very useful for in-depth analysis... I felt the viewer was somewhat harder to navigate. It would be helpful if there was some documentation of the other plug-ins and their options. 

Some questions 
-------------------- 
Hardware timer integration on ARM for more precision (our SoC has free hardware timers available for use): 
Regarding timer integration:
1) On ARM, can the hardware timer be integrated in ltt_get_timestamp32(), following the powerpc example, or in get_cycles(), as on mips?
2) Is there any penalty incurred in reading a timer register in ltt_get_timestamp32()? If so, are there ways to minimize it? Is any locking required?
3) Is it OK to reset the dedicated hardware timer at boot and let it overflow, assuming the heartbeat event and synthetic TSC are used? Does the overflow period have any bearing other than on the heartbeat event period? 
Regarding timer resolution:
1) In Mathieu's paper, for x86, "the time spent in the two TSC reads varies between 97 and 105 cycles, with an average of 100.0 cycles". Would this overhead also apply to ARM when reading the timer register directly?
Also, for x86 the average time spent in the probe is 288.5 cycles. Can this be used as a ballpark figure for ARM?
2) I feel that if a profiling event takes x cycles, then the timer resolution can be set to 2x or 3x. Is it reasonable to assume that the granularity of profiling is limited by the overhead of the profiling event itself?
3) At 300 MHz for a 32-bit timer:
a) with no division factor, the timer overflows in 14.3 seconds; the resolution is 3.3 ns. 
b) with a division factor of 256, the timer overflows in about 3665 seconds, or roughly an hour; the resolution is 853 ns, close to 1 microsecond.
Which is preferable?
4) Are there any open issues pending for ARM during sleep mode and frequency-scaling modes?
Really appreciate the work being done in the LTTng group.

CPU LOADING Suggestion 
For plotting micro CPU loading on x86 and other platforms with dedicated timers:

Since the timer interrupt arrives periodically every 10 ms, the time difference between the last event before a timer IRQ entry and the previous timer IRQ exit can give the instantaneous load.
Eg:
Timer_IRQ_entry @ T1
Timer_IRQ_exit @ T2
{
some processing
last_event @ T3
}
Timer_IRQ_entry @ T4

micro processing load % = ((last_event @ T3 - Timer_IRQ_exit @ T2) / 10 ms) * 100

This data can be averaged over user-defined windows of 10 ms, 100 ms, 1 s, etc., to give an average trend of CPU loading. The CPU loading analyzer could help identify regions in the trace where CPU loading is uneven. Any other ideas in this regard?

regards
Venu Annamraju
Team Lead, VoIP-DSP
HelloSoft India 




More information about the lttng-dev mailing list