[ltt-dev] Kernel crashes when creating a trace

Mathieu Desnoyers mathieu.desnoyers at polymtl.ca
Tue Nov 10 23:25:39 EST 2009


* Ashwin Tanugula (ashwin.tanugula at broadcom.com) wrote:
> Hi Mathieu,
> 
> Sorry to bother you, but my mail never got posted. ( I tried posting
> it two times)
> 

Hi Ashwin,

Hrm, this is weird. Did you try posting it to
ltt-dev at lists.casi.polymtl.ca without success ?

> Can you please help me with the following problem.
> 

Sure, Thanks for the thorough bug report. Let's look into this.

> 
> I patched my kernel (MIPS) with the patch-2.6.31.5-lttng-0.165.tar.bz2 patchset.
> 

Can you try with 0.167 instead and see if the problem is still there
first ? I fixed mostly problems with 32-bit machines and erroneous
types. I'm not sure if your MIPS flavor is 32 or 64-bit. However the
32-bit problem occured at trace teardown, not at trace creation.

> I am using the ltt-control-0.72-23102009.tar.gz control package.
> 
> I tried ltt-control-0.71-30092009, but it has the same problem.
> 
> I am able to patch and compile the kernel fine, but when I try to create a trace, then my kernel crashes.
> 
> Here is the dump.
> 
> # lttctl -c trace2
> Linux Trace Toolkit Trace Contro------------[ cut here ]------------
> WARNING: at arch/mips/kernel/trace-clock.c:77 trace_clock_async_tsc_read+0x30/0x17c()

At least some of my own warnings are kicking in. There is nothing like
safety nets when things go wrong :)

So this warning is:

WARN_ON(!async_tsc_refcount || !async_tsc_enabled);

So I think the problem that there is a trace clock read within the
get_trace_clock() call. Hrm. mips get_trace_clock is calling
get_synthetic_tsc(), which reads the trace clock. That's wrong.

I seem to be going back and forth between mutexes and spinlocks to
protect the trace clock:

- spinlocks permits the trace clock enable/disable/update code to be called
  from various contexts, including returning from PM sleep, cpu hotplug,
  etc.
- mutex permit to call the underlying get/put_synthetic_tsc() with,
  which uses get/put_online_cpus() (which can sleep) within the
  trace-clock mutex. However it cannot be called from some of the cpu
  hotplug notifiers.

Also the fact that the enabling synthetic tsc uses
trace_clock_read_synthetic_tsc()is not good, because it uses a layer
that depends on reading the 32 lsb, which creates a circular dependency.

To make things clearer:

- synthetic TSC extends the MIPS 32-bit cycle counter to 64-bit.
- MIPS trace clock modifies that synthetic TSC values to ensure it works
  fine on MIPS processors that have no synchronized TSCs.

So currently, these two layers are well separated, but they are so
dependent on each other that I might have to restructure that a bit.
The solution is not clear to me currently. I think I'll sleep on it and
try poking this tomorrow.

If you remove the WARN_ON from arch/mips/kernel/trace-clock.c:77, does
it help ?

Thanks,

Mathieu

> 
> 
> Controlling tModules linked in:race : trace2
> 
> 
> lttctl: CreatinCall Trace:
> g trace
> [<80018b84>] dump_stack+0x8/0x34
> [<8003e59c>] warn_slowpath_common+0x70/0x98
> [<8001bfc0>] trace_clock_async_tsc_read+0x30/0x17c
> [<80086a18>] trace_clock_read_synthetic_tsc+0x94/0xbc
> [<80086d14>] get_synthetic_tsc+0xe0/0x1a0
> [<8001c1b0>] get_trace_clock+0x34/0x17c
> [<8023019c>] ltt_trace_alloc+0x60/0x56c
> [<8023bdfc>] alloc_write+0x130/0x17c
> [<800c201c>] vfs_write+0xbc/0x180
> [<800c2b24>] sys_write+0x60/0x130
> [<8000339c>] stack_done+0x20/0x3c
> 
> ---[ end trace bc41c847fcc9449a ]---
> ------------[ cut here ]------------
> WARNING: at arch/mips/kernel/trace-clock.c:77 trace_clock_async_tsc_read+0x30/0x17c()
> Modules linked in:
> Call Trace:
> [<80018b84>] dump_stack+0x8/0x34
> [<8003e59c>] warn_slowpath_common+0x70/0x98
> [<8001bfc0>] trace_clock_async_tsc_read+0x30/0x17c
> [<80086a18>] trace_clock_read_synthetic_tsc+0x94/0xbc
> [<8001c014>] trace_clock_async_tsc_read+0x84/0x17c
> [<80086a18>] trace_clock_read_synthetic_tsc+0x94/0xbc
> [<80086d14>] get_synthetic_tsc+0xe0/0x1a0
> [<8001c1b0>] get_trace_clock+0x34/0x17c
> [<8023019c>] ltt_trace_alloc+0x60/0x56c
> [<8023bdfc>] alloc_write+0x130/0x17c
> [<800c201c>] vfs_write+0xbc/0x180
> [<800c2b24>] sys_write+0x60/0x130
> [<8000339c>] stack_done+0x20/0x3c
> 
> ---[ end trace bc41c847fcc9449b ]---
> 
> 
> 
> With  ltt-control-0.70-18082009 and older verions, I am getting a segfault
> 
> # lttctl -c trace1
> Linux Trace Toolkit Trace Control 0.70-18082009
> 
> Controlling trace : trace1
> 
> lttctl: Creating trace
> Segmentation fault
> 
> Here is the strace log
> 
> # strace lttctl -c trace1
> execve("./lttctl", ["lttctl", "-c", "trace1"], [/* 11 vars */]) = 0
> old_mmap(NULL, 20, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aab1000
> stat("/etc/ld.so.cache", 0x7fe4e848)    = -1 ENOENT (No such file or directory)
> open("/usr/local/lib/liblttctl.so.0", O_RDONLY) = -1 ENOENT (No such file or directory)
> open("/lib/liblttctl.so.0", O_RDONLY)   = 3
> fstat(3, {st_mode=S_IFREG|0755, st_size=31577, ...}) = 0
> old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aab2000
> read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\10\0\1\0\0\0@\t\0\0004\0\0\0h"..., 4096) = 4096
> old_mmap(NULL, 20480, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aab3000
> old_mmap(0x2aab3000, 11140, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0x2aab3000
> old_mmap(0x2aab6000, 3132, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x2000) = 0x2aab6000
> old_mmap(0x2aab7000, 3152, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x2a
> ab7000
> close(3)                                = 0
> munmap(0x2aab2000, 4096)                = 0
> open("/usr/local/lib/libc.so.0", O_RDONLY) = -1 ENOENT (No such file or directory)
> open("/lib/libc.so.0", O_RDONLY)        = 3
> fstat(3, {st_mode=S_IFREG|0644, st_size=837630, ...}) = 0
> old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aab2000
> read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\10\0\1\0\0\0p\256\0\0004\0\0\0,"..., 4096) = 4096
> old_mmap(NULL, 675840, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aab8000
> old_mmap(0x2aab8000, 641908, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0x2aab8000
> old_mmap(0x2ab55000, 8032, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x9c000) = 0x2ab55000
> old_mmap(0x2ab57000, 24408, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x2
> ab57000
> close(3)                                = 0
> munmap(0x2aab2000, 4096)                = 0
> open("/lib/libc.so.0", O_RDONLY)        = 3
> fstat(3, {st_mode=S_IFREG|0644, st_size=837630, ...}) = 0
> close(3)                                = 0
> old_mmap(NULL, 2768, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aab2000
> set_thread_area(0x2aab9440)             = 0
> mprotect(0x2ab55000, 4096, PROT_READ)   = 0
> mprotect(0x2aaaf000, 4096, PROT_READ)   = 0
> ioctl(0, TIOCNXCL, {B115200 opost isig icanon echo ...}) = 0
> ioctl(1, TIOCNXCL, {B115200 opost isig icanon echo ...}) = 0
> brk(0)                                  = 0x406000
> brk(0x407000)                           = 0x407000
> write(1, "Linux Trace Toolkit Trace Control"..., 48Linux Trace Toolkit Trace Control 0.70-18082009
> ) = 48
> write(1, "\n"..., 1
> )                    = 1
> write(1, "Controlling trace : trace1\n"..., 27Controlling trace : trace1
> ) = 27
> write(1, "\n"..., 1
> )                    = 1
> open("/proc/mounts", O_RDONLY|O_LARGEFILE) = 3
> ioctl(3, TIOCNXCL, 0x7fe4b980)          = -1 ENOTTY (Inappropriate ioctl for device)
> brk(0x408000)                           = 0x408000
> read(3, "rootfs / rootfs rw 0 0\nubi0:rootf"..., 4096) = 580
> open("/proc/sys/debug/ltt/control/", O_RDONLY|O_NONBLOCK|O_DIRECTORY) = 4
> fstat(4, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> fcntl64(4, F_SETFD, FD_CLOEXEC)         = 0
> brk(0x409000)                           = 0x409000
> close(4)                                = 0
> write(1, "lttctl: Creating trace\n"..., 23lttctl: Creating trace
> ) = 23
> open("/proc/sys/debug/ltt/control/trace1", O_RDONLY|O_NONBLOCK|O_DIRECTORY) = -1 ENOENT (No such fi
> le or directory)
> --- SIGSEGV (Segmentation fault) @ 0 (0) ---
> +++ killed by SIGSEGV +++
> 
> Thanks,
> Ashwin

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68




More information about the lttng-dev mailing list