[ltt-dev] Kernel crashes when creating a trace
Mathieu Desnoyers
mathieu.desnoyers at polymtl.ca
Wed Nov 18 16:59:44 EST 2009
* Ashwin Tanugula (ashwin.tanugula at broadcom.com) wrote:
>
>
> > >
> > > Without "return 1;", it still hangs, I will enable lockdep and send you the log ASAP.
> > >
>
> > Oh ! I found the mistake. Erroneous "cmpxchg" in the 32-bit version of
> > the trace clock. That should work without the return 1; now. Please try 0.174.
>
> I am using patch-2.6.31.6-lttng-0.176, ltt-control-0.77-18112009 and lttv-0.12.23-18112009.
>
> And I still see the hang without "return 1:"
OK. Can you please try with this patch applied on top of 0.176 ? This
should be very close to mips64 which, AFAIK, should work and does not
need this workaround to deal with 64-bit values. Sorry it's taking so
long. It would help if I could get my hands on a MIPS32 board. More
comments after the patch.
mips32 fix trace clock spinlock
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers at polymtl.ca>
---
arch/mips/kernel/trace-clock.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
Index: linux-2.6-lttng/arch/mips/kernel/trace-clock.c
===================================================================
--- linux-2.6-lttng.orig/arch/mips/kernel/trace-clock.c 2009-11-18 16:47:48.000000000 -0500
+++ linux-2.6-lttng/arch/mips/kernel/trace-clock.c 2009-11-18 16:50:04.000000000 -0500
@@ -61,22 +61,24 @@ static raw_spinlock_t trace_clock_lock =
static inline u64 trace_clock_cmpxchg64(u64 *ptr, u64 old, u64 new)
{
+ unsigned long flags;
u64 val;
+ local_irq_save(flags);
+ __raw_spin_lock(&trace_clock_lock);
val = *ptr;
if (likely(val == old))
*ptr = new;
+ __raw_spin_unlock(&trace_clock_lock);
+ local_irq_restore(flags);
return val;
}
notrace u64 trace_clock_async_tsc_read(void)
{
u64 new_tsc, last_tsc;
- unsigned long flags;
WARN_ON(!async_tsc_refcount || !async_tsc_enabled);
- local_irq_save(flags);
- __raw_spin_lock(&trace_clock_lock);
new_tsc = trace_clock_read_synthetic_tsc();
do {
last_tsc = trace_clock_last_tsc;
@@ -92,8 +94,6 @@ notrace u64 trace_clock_async_tsc_read(v
*/
} while (trace_clock_cmpxchg64(&trace_clock_last_tsc, last_tsc,
new_tsc) < new_tsc);
- __raw_spin_unlock(&trace_clock_lock);
- local_irq_restore(flags);
return new_tsc;
}
EXPORT_SYMBOL_GPL(trace_clock_async_tsc_read);
>
> > Now if you still have some "Kernel/LTTV event size differs for event.."
> > that's another problem we'll have to address. It seems to work fine on x86 32/64 here though.
>
>
> With "return 1;" I can sometimes generate the trace and sometime I cannot as lttd is still running even after destroying the trace.
>
> Please look at the attached log file.
>
running lttd in verbose mode could eventually help here. (see lttd
--help). You would have to start it separately from lttctl though (see
lttctl --help to see how to create a trace, then start lttd, and then
start tracing with lttctl).
> > Do you view the trace on the same architecture you record it on ? What are the arch endiannesses ?
>
> No, I am not viewing on the same architecture.
>
> I am viewing the mips32 (little endian) generated trace on an x86 32-bit machine.
>
> I am getting the following error(s) when I try to view the trace data
>
> i) ERROR:tracefile.c:1676:ltt_seek_next_event: assertion failed: (tf->event.offset < tf->buffer.data_size)
> ./lttv-gui: line 10: 12602 Aborted (core dumped) $LTTV_CMD.real -m lttvwindow -m guievents -m guifilter -m guicontrolflow -m resourceview -m guistatistics -m guitracecontrol $*
>
>
This is a bit weird. Make sure you do a make clean of your lttv tree and
rebuild. Also, compiling lttv with:
./configure CFLAGS=-g
make
make install
and running:
gdb lttv.real
set args -v -d -m textDump -t /tmp/trace
run
and showing the stack trace with:
bt full
would help here. (btw, lttv -m textDump --help shows you the textDump
help information)
You could also put the trace somewhere I can access it, so I can look at
it with lttv on my local machine.
> ii) ** ERROR **: Process 495 has been created at [0.000000000] and inserted at [0.000000005] before
> fork on cpu 1[0.000000005].
> Probably an unsynchronized TSC problem on the traced machine.
> aborting...
> ./lttv-gui: line 10: 12726 Aborted (core dumped) $LTTV_CMD.real -m lttvwindow -m guievents -m guifilter -m guicontrolflow -m resourceview -m guistatistics -m guitracecontrol $*
>
> I guess 2 happens because of the "return 1;" ??
Yes, clearly.
Thanks,
Mathieu
>
>
> Thanks,
> Ashwin
>
>
> > Mathieu
>
>
>
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
More information about the lttng-dev
mailing list