[ltt-dev] Kernel crashes when creating a trace

Mathieu Desnoyers mathieu.desnoyers at polymtl.ca
Mon Nov 16 13:35:41 EST 2009


Hi Ashwin,

I spent the last 3 days stress-testing lttng with concurrent:

- cpu hotplug
- trace start/stop
- marker armall/disarmall

For minutes and fixed all the problems I encountered.

The reason why I had a hard time reproducing your error scenario was
that the right timing occurs on UP systems.

All the fixes are integrated in LTTng 0.170. I am very confident that
things will work fine now. Please keep testing with the "return 1;" for
the MIPS trace clock, at it is not fixed yet.

Thanks,

Mathieu

* Ashwin Tanugula (ashwin.tanugula at broadcom.com) wrote:
> > > 
> > > I have tested it again and now it's now working now. Sorry for prematurely sending the email.
> > > 
> > > I think we are back to where we were.
> > > 
> > > With "return 1;" at the beginning of trace_clock_async_tsc_read, I can 
> > > start the trace, but the kernel crashes while stopping the trace> 
> 
> > OK.. let's continue to test with the "return 1;" in trace_clock_async_tsc_read and try to fix the teardown crash first. Can you try the new 
> > 0.169 ? I fixes a teardown race with the channel list.
> 
> Mathieu,
> 
> This is what I see now. Please look at all three call traces as all of them are slightly different.
> 
> 
> First Run
> ==========
> # /root/ltt-control-0.74-12112009/lttctl/.libs/lttctl -D trace1
> Linux Trace Toolkit Trace Control 0.74-12112009
> 
> Controlling trace : trace1
> 
> lttctl: Pausing trace
> lttctl: Destroying trace
> LTT: 348 events written in channel metadata (cpu 0, index 0)
> LTT: 15 events written in channel block (cpu 0, index 0)
> LTT: 25 events written in channel block (cpu 1, index 0)
> CPU 1 Unable to handle kernel paging request at virtual address 0000002c, epc == 802340bc, ra == 80234
> 0e8
> Oops[#1]:
> Cpu 1
> $ 0   : 00000000 10008b00 8b889200 00000004
> $ 4   : 00000000 00000012 ffffffff 000025f6
> $ 8   : 00000001 00000000 00000001 6e69202c
> $12   : 2aab3d60 018b6739 0000015c 00000000
> $16   : 00080000 8c1e8000 00040000 00000000
> $20   : 8c1e8034 8b99d300 80460000 00000000
> $24   : 00000001 802824a0                  
> $28   : 8ab30000 8ab31e20 8b99d300 802340e8
> Hi    : 00000000
> Lo    : 000000bd
> epc   : 802340bc ltt_chanbuf_free+0xc8/0x294
>     Not tainted
> ra    : 802340e8 ltt_chanbuf_free+0xf4/0x294
> Status: 10008b03    KERNEL EXL IE 
> Cause : 00800008
> BadVA : 0000002c
> PrId  : 0002a044 (Broadcom BMIPS4380)
> Modules linked in:
> Process lttd (pid: 348, threadinfo=8ab30000, task=8c0e9018, tls=2aab9440)
> Stack : 8bd752a0 0000015c 8c1e8034 00000000 00000000 8b99f60c 8b99d31c 80233ff4
>         8bd78c18 8bd75324 8c0191c0 8b97a60c 00404918 7fd6ec1c 7fd6eba8 80245b38
>         00000002 00000001 00000005 00000004 8b99d300 8ab2bb00 802342e0 804d0000
>         804d0000 800e1724 00000010 8ab2bb00 00000010 802371f8 00000010 8ab2bb00
>         8bd78c18 8bd75324 800c3878 800c3830 8ab2bb00 8b97a60c 8ab2bb00 00000000
>         ...
> Call Trace:
> [<802340bc>] ltt_chanbuf_free+0xc8/0x294
> [<80245b38>] kref_put+0xdc/0x128
> [<802342e0>] ltt_chanbuf_release_read+0x58/0x124
> [<802371f8>] ltt_release+0x10/0x20
> [<800c3878>] __fput+0xec/0x21c
> [<800bf0d8>] filp_close+0x5c/0xa4
> [<800c0c8c>] sys_close+0xc8/0x1a4
> [<8000339c>] stack_done+0x20/0x3c
> 
> 
> Code: 8c440000  00a34006  00081880 <8c85002c> 00081100  00431023  00451021  8c450008  10a0ffea 
> Disabling lock debugging due to kernel taint
> LTT: 140 events written in channel fd_state (cpu 0, index 0)
> LTT: 358 events written in channel fs (cpu 0, index 0)
> LTT: 47 events written in channel fs (cpu 1, index 0)
> LTT: 1 events written in channel global_state (cpu 0, index 0)
> LTT: 10 events written in channel irq_state (cpu 0, index 0)
> LTT: 1 events written in channel jbd2 (cpu 0, index 0)
> LTT: 1 events written in channel jbd2 (cpu 1, index 0)
> LTT: 61606 events written in channel kernel (cpu 0, index 0)
> LTT: 40704 events written in channel kernel (cpu 0, index 1)
> LTT: 72922 events written in channel kernel (cpu 1, index 0)
> LTT: 4893 events written in channel kernel (cpu 1, index 1)
> LTT: 594 events written in channel mm (cpu 0, index 0)
> LTT: 43 events written in channel mm (cpu 1, index 0)
> LTT: 14 events written in channel net (cpu 0, index 0)
> LTT: 2 events written in channel netif_state (cpu 0, index 0)
> LTT: 103 events written in channel rcu (cpu 0, index 0)
> LTT: 10 events written in channel rcu (cpu 1, index 0)
> LTT: 32 events written in channel softirq_state (cpu 0, index 0)
> LTT: 1 events written in channel swap_state (cpu 0, index 0)
> LTT: 43 events written in channel task_state (cpu 0, index 0)
> LTT: 171 events written in channel vm_state (cpu 0, index 0)
> # SysRq : Show Blocked State
>   task                PC stack   pid father
>  
> 
> 
> Second Run
> ==========
> # /root/ltt-control-0.74-12112009/lttctl/.libs/lttctl -D trace1
> Linux Trace Toolkit Trace Control 0.74-12112009
> 
> Controlling trace : trace1
> 
> lttctl: Pausing trace
> lttctl: Destroying trace
> LTT: 29 events written in channel block (cpu 1, index 0)
> LTT: 1 events written in channel block (cpu 0, index 0)
> CPU 1 Unable to handle kernel paging request at virtual address 0000002c, epc == 802340bc, ra == 80234
> 0e8
> Oops[#1]:
> Cpu 1
> $ 0   : 00000000 10008b00 8c0bb300 00000004
> $ 4   : 00000000 00000012 ffffffff 000024a7
> $ 8   : 00000001 00000000 00000001 70632820
> $12   : 2aab3d60 018b6739 00000001 00000000
> $16   : 00080000 8c0ac140 00040000 00000000
> $20   : 8c0ac174 8b992480 80460000 00000000
> $24   : 00000000 802824a0                  
> $28   : 8ab34000 8ab35e20 8b992480 802340e8
> Hi    : 00000000
> Lo    : 000000bd
> epc   : 802340bc ltt_chanbuf_free+0xc8/0x294
>     Not tainted
> ra    : 802340e8 ltt_chanbuf_free+0xf4/0x294
> Status: 10008b03    KERNEL EXL IE 
> Cause : 00800008
> BadVA : 0000002c
> PrId  : 0002a044 (Broadcom BMIPS4380)
> Modules linked in:
> Process lttd (pid: 350, threadinfo=8ab34000, task=8c0609f8, tls=2aab9440)
> Stack : 8bd56e7c 00000001 8c0ac174 00000000 00000000 8b993f8c 8b99249c 80233ff4
>         8bd55ac8 8bd56f00 8c019ec0 8b9a2e8c 00404918 7fbccc4c 7fbccbd8 80245b38
>         00000002 00000001 00000007 00000006 8b992480 8ab26840 802342e0 804d0000
>         804d0000 800e1724 00000010 8ab26840 00000010 802371f8 00000010 8ab26840
>         8bd55ac8 8bd56f00 800c3878 800c3830 8ab26840 8b9a2e8c 8ab26840 00000000
>         ...
> Call Trace:
> [<802340bc>] ltt_chanbuf_free+0xc8/0x294
> [<80245b38>] kref_put+0xdc/0x128
> [<802342e0>] ltt_chanbuf_release_read+0x58/0x124
> [<802371f8>] ltt_release+0x10/0x20
> [<800c3878>] __fput+0xec/0x21c
> [<800bf0d8>] filp_close+0x5c/0xa4
> [<800c0c8c>] sys_close+0xc8/0x1a4
> [<8000339c>] stack_done+0x20/0x3c
> 
> 
> Code: 8c440000  00a34006  00081880 <8c85002c> 00081100  00431023  00451021  8c450008  10a0ffea 
> Disabling lock debugging due to kernel taint
> LTT: 140 events written in channel fd_state (cpu 0, index 0)
> LTT: 168 events written in channel fs (cpu 0, index 0)
> LTT: 50 events written in channel fs (cpu 1, index 0)
> CPU 1 Unable to handle kernel paging request at virtual address 00000040, epc == 80234040, ra == 80245
> b38
> Oops[#2]:
> LTT: 1 events written in channel global_state (cpu 0, index 0)
> Cpu 1
> $ 0   : 00000000 00000001 8b890d84 00010000
> $ 4   : 8b99251c 80233ff4 8ab266c0 00000003
> $ 8   : 804c4fb8 8c0b25f0 00000000 8c046000
> $12   : ffffffff 8ab35940 8040a2d0 67677562
> $16   : 8b99251c 8c0ac000 8bd55c18 00000004
> $20   : 8c019ec0 00000000 8b840574 00000001
> $24   : 00000000 80084330                  
> $28   : 8ab34000 8ab35b30 8b992500 80245b38
> Hi    : 00000005
> Lo    : bd6eed40
> epc   : 80234040 ltt_chanbuf_free+0x4c/0x294
>     Tainted: G      D   
> ra    : 80245b38 kref_put+0xdc/0x128
> Status: 10008b03    KERNEL EXL IE 
> Cause : 00800008
> BadVA : 00000040
> PrId  : 0002a044 (Broadcom BMIPS4380)
> Modules linked in:
> Process lttd (pid: 350, threadinfo=8ab34000, task=8c0609f8, tls=2aab9440)
> Stack : 8bd54148 800dbb94 804c4fb8 8ab26780 00000000 8b993f8c 8b99251c 80233ff4
>         8bd55c18 8bd52090 8c019ec0 80500900 8b840574 00000001 8b992480 80245b38
>         00000002 00000001 00000006 00000005 8b992500 8ab266c0 802342e0 804d0000
>         804d0000 800e1724 00000010 8ab266c0 00000010 802371f8 00000010 8ab266c0
>         8bd55c18 8bd52090 800c3878 800c3830 00000000 ffffffff 8ab266c0 00000000
>         ...
> Call Trace:
> [<80234040>] ltt_chanbuf_free+0x4c/0x294
> [<80245b38>] kref_put+0xdc/0x128
> [<802342e0>] ltt_chanbuf_release_read+0x58/0x124
> [<802371f8>] ltt_release+0x10/0x20
> [<800c3878>] __fput+0xec/0x21c
> [<800bf0d8>] filp_close+0x5c/0xa4
> [<8004150c>] put_files_struct+0x1ec/0x228
> [<8004319c>] do_exit+0x13c/0x764
> [<80018aa0>] die+0x108/0x110
> [<8001d7d0>] do_page_fault+0x31c/0x41c
> [<80001420>] ret_from_exception+0x0/0x24
> [<802340bc>] ltt_chanbuf_free+0xc8/0x294
> [<80245b38>] kref_put+0xdc/0x128
> [<802342e0>] ltt_chanbuf_release_read+0x58/0x124
> [<802371f8>] ltt_release+0x10/0x20
> [<800c3878>] __fput+0xec/0x21c
> [<800bf0d8>] filp_close+0x5c/0xa4
> [<800c0c8c>] sys_close+0xc8/0x1a4
> [<8000339c>] stack_done+0x20/0x3c
> 
> 
> Code: 00021027  02621021  8c550000 <8ea30040> 14600084  00000000  8ea20044  14400078  00000000 
> Fixing recursive fault but reboot is needed!
> LTT: 10 events written in channel irq_state (cpu 0, index 0)
> LTT: 1 events written in channel jbd2 (cpu 1, index 0)
> LTT: 52928 events written in channel kernel (cpu 0, index 0)
> LTT: 39629 events written in channel kernel (cpu 1, index 0)
> LTT: 78 events written in channel mm (cpu 0, index 0)
> LTT: 8 events written in channel mm (cpu 1, index 0)
> LTT: 115 events written in channel net (cpu 0, index 0)
> LTT: 2 events written in channel netif_state (cpu 0, index 0)
> LTT: 100 events written in channel rcu (cpu 0, index 0)
> LTT: 2 events written in channel rcu (cpu 1, index 0)
> LTT: 32 events written in channel softirq_state (cpu 0, index 0)
> LTT: 1 events written in channel swap_state (cpu 0, index 0)
> LTT: 43 events written in channel task_state (cpu 0, index 0)
> LTT: 171 events written in channel vm_state (cpu 0, index 0)
>  
> 
> 
> Third Run
> ==========
> 
> # /root/ltt-control-0.74-12112009/lttctl/.libs/lttctl -D trace1
> Linux Trace Toolkit Trace Control 0.74-12112009
> 
> Controlling trace : trace1
> 
> lttctl: Pausing trace
> lttctl: Destroying trace
> ------------[ cut here ]------------
> WARNING: at ltt/ltt-relay-vfs.c:82 ltt_poll+0x11c/0x124()
> Modules linked in:
> Call Trace:
> [<80018c9c>] dump_stack+0x8/0x34
> [<8003e74c>] warn_slowpath_common+0x70/0x98
> [<80237608>] ltt_poll+0x11c/0x124
> [<800d5ed8>] do_sys_poll+0x1d4/0x464
> [<800d61ec>] sys_poll+0x84/0x118
> [<8000339c>] stack_done+0x20/0x3c
> 
> ---[ end trace 187788b5a716f473 ]---
> Unhandled kernel unaligned access[#1]:
> Cpu 1
> $ 0   : 00000000 10008b00 00000200 80507134
> $ 4   : 00000001 00000000 00000009 000025d3
> $ 8   : 000000bd 00000000 00000002 2d2d5d20
> $12   : 0000000f 8c0df7f0 8040a2d0 00000000
> $16   : 8b9c6880 8ab3fec0 00000000 00000047
> $20   : 8b9c68e8 8b9c6880 8b887c0c 00000028
> $24   : 00000000 802824a0                  
> $28   : 8c0de000 8c0dfb18 8c0dfb50 80237608
> Hi    : 00000000
> Lo    : 000000bd
> epc   : 80237608 ltt_poll+0x11c/0x124
>     Tainted: G        W 
> ra    : 80237608 ltt_poll+0x11c/0x124
> Status: 10008b03    KERNEL EXL IE 
> Cause : 80800010
> BadVA : 0000004b
> PrId  : 0002a044 (Broadcom BMIPS4380)
> Modules linked in:
> Process lttd (pid: 348, threadinfo=8c0de000, task=8c0e8b78, tls=2b376960)
> Stack : 8c0dfb20 80105aac 8c0dfb48 0000000c 00000057 8b9c68d8 8ab3fec0 00000000
>         800d5ed8 800d608c 00000020 00000008 00000057 804c0000 00000001 00000001
>         8b992308 804fea18 8b9c6880 0000001f 00000005 00000003 00000007 00100003
>         00000009 00100003 0000000b 00100003 0000000d 00100003 0000000f 00100003
>         00000011 00100003 00000013 00100003 00000015 00100003 00000017 00100003
>         ...
> Call Trace:
> [<80237608>] ltt_poll+0x11c/0x124
> [<800d5ed8>] do_sys_poll+0x1d4/0x464
> [<800d61ec>] sys_poll+0x84/0x118
> [<8000339c>] stack_done+0x20/0x3c
> 
> 
> Code: 24844efc  0c00f9dd  24050052 <0808dd61> 8e640004  27bdff20  afbf00dc  afbe00d8  afb700d4 
> Disabling lock debugging due to kernel taint
> LTT: 25 events written in channel block (cpu 1, index 0)
> LTT: 15 events written in channel block (cpu 0, index 0)
> CPU 1 Unable to handle kernel paging request at virtual address 00000040, epc == 80234040, ra == 80245
> b38
> Oops[#2]:
> Cpu 1
> $ 0   : 00000000 00000001 8b8be480 00010000
> $ 4   : 8b9c691c 80233ff4 8ab3fd40 00000003
> $ 8   : 804c4c88 8c1ce3f8 800c0bc4 fffffff0
> $12   : 37691fad 00000000 00000001 81196240
> $16   : 8b9c691c 8c096000 8bd78d18 00000000
> $20   : 8c0191c0 00000000 00404918 00000000
> $24   : 004048a4 80084330                  
> $28   : 8ab4e000 8ab4fe20 8b9c6900 80245b38
> Hi    : 00000003
> Lo    : 1ed24a80
> epc   : 80234040 ltt_chanbuf_free+0x4c/0x294
>     Tainted: G      D W 
> ra    : 80245b38 kref_put+0xdc/0x128
> Status: 10008b03    KERNEL EXL IE 
> Cause : 00800008
> BadVA : 00000040
> PrId  : 0002a044 (Broadcom BMIPS4380)
> Modules linked in:
> Process lttd (pid: 347, threadinfo=8ab4e000, task=8c0e0278, tls=2aab9440)
> Stack : 8bd57d74 8bd78a78 804c4fb8 8ab3fe00 00000000 8b9a268c 8b9c691c 80233ff4
>         8bd78d18 8bd57e7c 8c0191c0 8b99760c 00404918 7fc0293c 7fc028c8 80245b38
>         00000002 00000001 00000005 00000004 8b9c6900 8ab3fd40 802342e0 804d0000
>         804d0000 800e1724 00000010 8ab3fd40 00000010 802371f8 00000010 8ab3fd40
>         8bd78d18 8bd57e7c 800c3878 800c3830 8ab3fd40 8b99760c 8ab3fd40 00000000
>         ...
> Call Trace:
> [<80234040>] ltt_chanbuf_free+0x4c/0x294
> [<80245b38>] kref_put+0xdc/0x128
> [<802342e0>] ltt_chanbuf_release_read+0x58/0x124
> [<802371f8>] ltt_release+0x10/0x20
> [<800c3878>] __fput+0xec/0x21c
> [<800bf0d8>] filp_close+0x5c/0xa4
> [<800c0c8c>] sys_close+0xc8/0x1a4
> [<8000339c>] stack_done+0x20/0x3c
> 
> 
> Code: 00021027  02621021  8c550000 <8ea30040> 14600084  00000000  8ea20044  14400078  00000000 
> LTT: 140 events written in channel fd_state (cpu 0, index 0)
> LTT: 185 events written in channel fs (cpu 0, index 0)
> LTT: 72 events written in channel fs (cpu 1, index 0)
> LTT: 1 events written in channel global_state (cpu 0, index 0)
> LTT: 10 events written in channel irq_state (cpu 0, index 0)
> LTT: 1 events written in channel jbd2 (cpu 0, index 0)
> LTT: 1 events written in channel jbd2 (cpu 1, index 0)
> LTT: 50148 events written in channel kernel (cpu 0, index 0)
> LTT: 37566 events written in channel kernel (cpu 1, index 0)
> LTT: 99 events written in channel mm (cpu 0, index 0)
> LTT: 29 events written in channel mm (cpu 1, index 0)
> LTT: 21 events written in channel net (cpu 0, index 0)
> LTT: 2 events written in channel netif_state (cpu 0, index 0)
> LTT: 136 events written in channel rcu (cpu 0, index 0)
> LTT: 9 events written in channel rcu (cpu 1, index 0)
> LTT: 32 events written in channel softirq_state (cpu 0, index 0)
> LTT: 1 events written in channel swap_state (cpu 0, index 0)
> LTT: 43 events written in channel task_state (cpu 0, index 0)
> LTT: 171 events written in channel vm_state (cpu 0, index 0)
> 
> Thanks,
> Ashwin
> 

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68




More information about the lttng-dev mailing list