[ltt-dev] Help with problem (maybe) in LTTng on 2.6.37 on ARM

B.J. Buchalter bj at mhlabs.com
Sun May 8 02:19:49 EDT 2011


Hi Folks,

I have been trying to use LTTng to track down some scheduling issues in a program that I writing. The issue I am dealing with is basically deadline misses in a user space program that uses isochronous FireWire I/O.

In any case, the results I am getting using LLTng are puzzling, and I am not sure whether they are indicating a severe problem with interrupts being disabled for too, or if it is a problem with some implementation details on the platform I running on.

The platform is TI816x -- a relatively new ARM Cortex-A8 SOC; I have applied the LTTng patchset to the patched 2.6.37 kernel that TI has provided to support this SOC.

There are two things that are happening in traces that I don't understand; they maybe related:

1) I am seeing softirq_entry and softirq_exit events for ids that are much larger the 32 softirqs the could exist, and definitely larger than the 10 that are defined in my kernel. For example, I see a softirq_raise for softirq_id 1 (TIMER), followed by a softirq_entry for softirq_id 255, and then softirq_exit for softirq_id 112.

2) I am also seeing an enormous amount of time being spent in the the softirq that fires after softirq_raise for softirq_id 1 is logged.

I think that what may be happening is that the timer interrupt is adjusting the trace clock in some way. The sequence I see is:

kernel.irq_entry: 28.180053710 (/targetfs/test1/kernel_0), 0, 0, , , 0, 0x0, IRQ { ip = 3221504228, handler = 0xc0054874, irq_id = 67, kernel_mode = 1 }
kernel.timer_update_time: 28.180054931 (/targetfs/test1/kernel_0), 0, 0, , , 0, 0x0, IRQ { jiffies = 4294977392, xtime_sec = 1304810558, xtime_nsec = 329927105, walltomonotonic_sec = -1304810158, walltomonotonic_nsec = 622485308 }
kernel.softirq_raise: 28.180056152 (/targetfs/test1/kernel_0), 0, 0, , , 0, 0x0, IRQ { softirq_id = 1 [softirq 1] }
kernel.softirq_raise: 28.180057373 (/targetfs/test1/kernel_0), 0, 0, , , 0, 0x0, IRQ { softirq_id = 1 [softirq 1] }
kernel.irq_exit: 28.180058593 (/targetfs/test1/kernel_0), 0, 0, , , 0, 0x0, SYSCALL { handled = 1 }
kernel.softirq_entry: 28.180059814 (/targetfs/test1/kernel_0), 0, 0, , , 0, 0x0, SOFTIRQ { softirq_id = 255 [softirq 255] }

//****** note jump of approximately 10ms

kernel.timer_set: 28.190001220 (/targetfs/test1/kernel_0), 0, 0, , , 0, 0x0, SOFTIRQ { expires = 10097, function = 0xc00af198, data = 0 }
kernel.timer_set: 28.190002441 (/targetfs/test1/kernel_0), 0, 0, , , 0, 0x0, SOFTIRQ { expires = 10097, function = 0xbf006210, data = 0 }
kernel.softirq_exit: 28.190003662 (/targetfs/test1/kernel_0), 0, 0, , , 0, 0x0, SYSCALL { softirq_id = 112 [softirq 112] }
kernel.irq_entry: 28.190004882 (/targetfs/test1/kernel_0), 0, 0, , , 0, 0x0, IRQ { ip = 3221504228, handler = 0xc0054874, irq_id = 67, kernel_mode = 1 }
kernel.timer_update_time: 28.190006103 (/targetfs/test1/kernel_0), 0, 0, , , 0, 0x0, IRQ { jiffies = 4294977393, xtime_sec = 1304810558, xtime_nsec = 339927009, walltomonotonic_sec = -1304810158, walltomonotonic_nsec = 622485308 }
kernel.softirq_raise: 28.190007324 (/targetfs/test1/kernel_0), 0, 0, , , 0, 0x0, IRQ { softirq_id = 1 [softirq 1] }
kernel.softirq_raise: 28.190008544 (/targetfs/test1/kernel_0), 0, 0, , , 0, 0x0, IRQ { softirq_id = 1 [softirq 1] }
kernel.irq_exit: 28.190009765 (/targetfs/test1/kernel_0), 0, 0, , , 0, 0x0, SYSCALL { handled = 1 }
kernel.softirq_entry: 28.190010986 (/targetfs/test1/kernel_0), 0, 0, , , 0, 0x0, SOFTIRQ { softirq_id = 255 [softirq 255] }

//****** note jump of approximately 10ms

kernel.timer_set: 28.200001220 (/targetfs/test1/kernel_0), 0, 0, , , 0, 0x0, SOFTIRQ { expires = 10098, function = 0xc00af198, data = 0 }
kernel.timer_set: 28.200002441 (/targetfs/test1/kernel_0), 0, 0, , , 0, 0x0, SOFTIRQ { expires = 10098, function = 0xbf006210, data = 0 }
kernel.softirq_exit: 28.200003662 (/targetfs/test1/kernel_0), 0, 0, , , 0, 0x0, SYSCALL { softirq_id = 112 [softirq 112] }
kernel.irq_entry: 28.200004882 (/targetfs/test1/kernel_0), 0, 0, , , 0, 0x0, IRQ { ip = 3221504228, handler = 0xc0054874, irq_id = 67, kernel_mode = 1 }

(I added the notes about the time jumps to the trace output).

So, my two main questions are:

(1) Why is the softirq_id being reported incorrectly for kernel.softirq_entry and kernel.softirq_exit. It is passed as a simple int in trace_softirq_entry, so there doesn't seem to be a lot of places for it to go wrong, but the probe function in the ltt kernel modules has some complexity to it that I don't understand.

(2) Why does the timestamp jump by approximately 10ms between the kernel.softirq_entry for the timer, and the kernel.timer_set? This is what looks like a huge usage of cpu time, but it is not clear to me that it is accurate. 

In any case, any guidance as to how I might go about resolving these issues so that I can get accurate traces would be greatly appreciated.

One final question, that doesn't seem to be fully addressed by docs I have read. If I want to add custom trace points to the kernel and have them supported by the full toolchain (e.g. in the kernel, enabled for tracing, into the trace files and then interpreted by lttv), what steps do I need to take. If this is documented somewhere, and I am just missing it, please point me at the docs.

Best regards,

B.J. Buchalter
	






More information about the lttng-dev mailing list