[ltt-dev] Kernel crashes when creating a trace

Mathieu Desnoyers mathieu.desnoyers at polymtl.ca
Wed Nov 18 16:59:44 EST 2009


* Ashwin Tanugula (ashwin.tanugula at broadcom.com) wrote:
> 
> 
> > >
> > > Without "return 1;", it still hangs, I will enable lockdep and send you the log ASAP.
> > >
> 
> > Oh ! I found the mistake. Erroneous "cmpxchg" in the 32-bit version of
> > the trace clock. That should work without the return 1; now. Please try 0.174.
> 
> I am using patch-2.6.31.6-lttng-0.176, ltt-control-0.77-18112009 and lttv-0.12.23-18112009.
> 
> And I still see the hang without "return 1:"

OK. Can you please try with this patch applied on top of 0.176 ? This
should be very close to mips64 which, AFAIK, should work and does not
need this workaround to deal with 64-bit values. Sorry it's taking so
long. It would help if I could get my hands on a MIPS32 board. More
comments after the patch.


mips32 fix trace clock spinlock

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers at polymtl.ca>
---
 arch/mips/kernel/trace-clock.c |   10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

Index: linux-2.6-lttng/arch/mips/kernel/trace-clock.c
===================================================================
--- linux-2.6-lttng.orig/arch/mips/kernel/trace-clock.c	2009-11-18 16:47:48.000000000 -0500
+++ linux-2.6-lttng/arch/mips/kernel/trace-clock.c	2009-11-18 16:50:04.000000000 -0500
@@ -61,22 +61,24 @@ static raw_spinlock_t trace_clock_lock =
 
 static inline u64 trace_clock_cmpxchg64(u64 *ptr, u64 old, u64 new)
 {
+	unsigned long flags;
 	u64 val;
 
+	local_irq_save(flags);
+	__raw_spin_lock(&trace_clock_lock);
 	val = *ptr;
 	if (likely(val == old))
 		*ptr = new;
+	__raw_spin_unlock(&trace_clock_lock);
+	local_irq_restore(flags);
 	return val;
 }
 
 notrace u64 trace_clock_async_tsc_read(void)
 {
 	u64 new_tsc, last_tsc;
-	unsigned long flags;
 
 	WARN_ON(!async_tsc_refcount || !async_tsc_enabled);
-	local_irq_save(flags);
-	__raw_spin_lock(&trace_clock_lock);
 	new_tsc = trace_clock_read_synthetic_tsc();
 	do {
 		last_tsc = trace_clock_last_tsc;
@@ -92,8 +94,6 @@ notrace u64 trace_clock_async_tsc_read(v
 		 */
 	} while (trace_clock_cmpxchg64(&trace_clock_last_tsc, last_tsc,
 				       new_tsc) < new_tsc);
-	__raw_spin_unlock(&trace_clock_lock);
-	local_irq_restore(flags);
 	return new_tsc;
 }
 EXPORT_SYMBOL_GPL(trace_clock_async_tsc_read);

> 
> > Now if you still have some "Kernel/LTTV event size differs for event.."
> > that's another problem we'll have to address. It seems to work fine on x86 32/64 here though.
> 
> 
> With "return 1;" I can sometimes generate the trace and sometime I cannot as lttd is still running even after destroying the trace.
> 
> Please look at the attached log file.
> 

running lttd in verbose mode could eventually help here. (see lttd
--help). You would have to start it separately from lttctl though (see
lttctl --help to see how to create a trace, then start lttd, and then
start tracing with lttctl).

> > Do you view the trace on the same architecture you record it on ? What are the arch endiannesses ?
> 
> No, I am not viewing on the same architecture.
> 
> I am viewing the mips32 (little endian) generated trace on an x86 32-bit machine.
> 
> I am getting the following error(s) when I try to view the trace data
> 
> i) ERROR:tracefile.c:1676:ltt_seek_next_event: assertion failed: (tf->event.offset < tf->buffer.data_size)
> ./lttv-gui: line 10: 12602 Aborted                 (core dumped) $LTTV_CMD.real -m lttvwindow -m guievents -m guifilter -m guicontrolflow -m resourceview -m guistatistics -m guitracecontrol $*
> 
> 

This is a bit weird. Make sure you do a make clean of your lttv tree and
rebuild. Also, compiling lttv with:

./configure CFLAGS=-g
make
make install

and running:

gdb lttv.real
set args -v -d -m textDump -t /tmp/trace
run

and showing the stack trace with:

bt full

would help here. (btw, lttv -m textDump --help   shows you the textDump
help information)

You could also put the trace somewhere I can access it, so I can look at
it with lttv on my local machine.

> ii) ** ERROR **: Process 495 has been created at [0.000000000] and inserted at [0.000000005] before 
> fork on cpu 1[0.000000005].
> Probably an unsynchronized TSC problem on the traced machine.
> aborting...
> ./lttv-gui: line 10: 12726 Aborted                 (core dumped) $LTTV_CMD.real -m lttvwindow -m guievents -m guifilter -m guicontrolflow -m resourceview -m guistatistics -m guitracecontrol $*
> 
> I guess 2 happens because of the "return 1;" ??

Yes, clearly.

Thanks,

Mathieu

> 
> 
> Thanks,
> Ashwin
> 
> 
> > Mathieu
> 
> 
> 



-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68




More information about the lttng-dev mailing list