[lttng-dev] [PATCH lttng-modules] Fix: Building the event list fails on fragmented memory

Jan Glauber jan.glauber at gmail.com
Thu Jun 25 04:08:30 EDT 2015


On Tue, Jun 23, 2015 at 09:04:18PM +0000, Mathieu Desnoyers wrote:
> Nope, it wasn't it. metadata_written is always <= cache_alloc, so the
> current upstream code looks correct.
> 
> I've just tried after changing the CPU configuration from "hypervisor
> default" to "copy host cpu config" on my machine, and was then unable
> to reproduce. I wonder if there is not an issue with __memcpy implementation
> for specific x86_64 processors (e.g. reading too many bytes from the vmalloc'd
> area).
> 
> Thoughts ?

Just speculation without further debugging data from the hypervisor setup
where it fails...

>From my understanding memcpy should work regardless of kmalloc or
vmalloc.

Maybe the length parameter of the memcpy is wrong?
With kmalloc and the 1:1 mapping this wont matter but if the memcpy
crosses a page it might fault if the next page is not vmalloc'ed.

But the fault address ffffc900038d995e looks more like src is completely
messed up.

What is the difference between the CPU configurations you tried?

Regards, Jan


> Thanks,
> 
> Mathieu
> 
> ----- On Jun 23, 2015, at 4:21 PM, Mathieu Desnoyers mathieu.desnoyers at efficios.com wrote:
> 
> > I think I found the culprit (testing now):
> > 
> > int lttng_metadata_printf(struct lttng_session *session,
> >                          const char *fmt, ...)
> > 
> >                tmp_cache_alloc_size = max_t(unsigned int,
> >                                session->metadata_cache->cache_alloc + len,
> >                                session->metadata_cache->cache_alloc << 1)
> > 
> > should be:
> > 
> >                tmp_cache_alloc_size = max_t(unsigned int,
> >                                session->metadata_cache->metadata_written + len,
> >                                session->metadata_cache->cache_alloc << 1);
> > 
> > I think we have a possible memory corruption in the upstream code here.
> > 
> > More coming soon,
> > 
> > Thanks,
> > 
> > Mathieu
> > 
> > 
> > ----- On Jun 23, 2015, at 3:46 PM, Mathieu Desnoyers
> > mathieu.desnoyers at efficios.com wrote:
> > 
> >> ----- On Jun 23, 2015, at 9:46 AM, Jan Glauber jan.glauber at gmail.com wrote:
> >> 
> >>> On Sat, Jun 13, 2015 at 07:32:28PM +0000, Mathieu Desnoyers wrote:
> >>>> ----- On Jun 10, 2015, at 9:17 AM, Martin Leisener martin at leisener.de wrote:
> >>>> 
> >>>> > Hi Mathieu,
> >>>> > 
> >>>> > sorry for sending the broken patch, I fell for some "smart" formating of my
> >>>> > email client, I cannot utilize it myself the way I send it.
> >>>> > Hence I now send the patch again as it was original intended. Sent to myself, I
> >>>> > can apply it, so I hope it now works in general as well
> >>>> > I examined as recommended diff -urN, it seems the output can be utilized by
> >>>> > patch the same way as git diff.
> >>>> > Hope it is now received the way I intended.
> >>>> 
> >>>> Hi Martin,
> >>>> 
> >>>> I tried you patch on a x86-64 machine (my laptop), and although taking a
> >>>> single kernel trace worked fine, it appears to OOPS when I run
> >>>> 
> >>>> (in lttng-tools)
> >>>> cd tests
> >>>> ./run.sh root_regression
> >>> 
> >>> Hi Mathieu,
> >>> 
> >>> can you please share the oops message so I can have a look?
> >> 
> >> Sure,
> >> 
> >> Here is the backtrace happening with the attached patch over lttng-modules
> >> commit 8c6e7f13c778701dec2d6549f1e2ca98970907a0. I modified your patch
> >> slightly.
> >> 
> >> [ 4078.314978] BUG: unable to handle kernel paging request at ffffc900038d995e
> >> [ 4078.315824] IP: [<ffffffff81316f12>] __memcpy+0x12/0x20
> >> [ 4078.315824] PGD 236c92067 PUD 236c93067 PMD bac0c067 PTE 0
> >> [ 4078.315824] Oops: 0000 [#1] SMP
> >> [ 4078.315824] Modules linked in: lttng_probe_workqueue(O) lttng_probe_vmscan(O)
> >> lttng_probe_udp(O) lttng_probe_timer(O) lttng_probe_sunrpc(O)
> >> lttng_probe_statedump(O) lttng_probe_sock(O) lttng_probe_skb(O)
> >> lttng_probe_signal(O) lttng_probe_scsi(O) lttng_probe_sched(O)
> >> lttng_probe_regmap(O) lttng_probe_rcu(O) lttng_probe_random(O)
> >> lttng_probe_printk(O) lttng_probe_power(O) lttng_probe_net(O)
> >> lttng_probe_napi(O) lttng_probe_module(O) lttng_probe_kmem(O)
> >> lttng_probe_jbd2(O) lttng_probe_irq(O) lttng_probe_ext4(O)
> >> lttng_probe_compaction(O) lttng_probe_block(O) lttng_types(O)
> >> lttng_ring_buffer_metadata_mmap_client(O)
> >> lttng_ring_buffer_client_mmap_overwrite(O)
> >> lttng_ring_buffer_client_mmap_discard(O) lttng_ring_buffer_metadata_client(O)
> >> lttng_ring_buffer_client_overwrite(O) lttng_ring_buffer_client_discard(O)
> >> lttng_tracer(O) lttng_statedump(O) lttng_kprobes(O) lttng_lib_ring_buffer(O)
> >> lttng_kretprobes(O) virtio_blk virtio_net virtio_pci virtio_ring virtio
> >> [ 4078.315824] CPU: 5 PID: 4258 Comm: lttng-consumerd Tainted: G           O
> >> 4.1.0 #7
> >> [ 4078.315824] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
> >> 01/01/2011
> >> [ 4078.315824] task: ffff8802350c3660 ti: ffff8800bae84000 task.ti:
> >> ffff8800bae84000
> >> [ 4078.315824] RIP: 0010:[<ffffffff81316f12>]  [<ffffffff81316f12>]
> >> __memcpy+0x12/0x20
> >> [ 4078.315824] RSP: 0018:ffff8800bae87da0  EFLAGS: 00010246
> >> [ 4078.315824] RAX: ffff880235439025 RBX: 0000000000000fd8 RCX: 00000000000001fb
> >> [ 4078.315824] RDX: 0000000000000000 RSI: ffffc900038d995e RDI: ffff880235439025
> >> [ 4078.315824] RBP: ffff8800bae87db8 R08: ffff8800bacecc00 R09: 0000000000008000
> >> [ 4078.315824] R10: 0000000000000000 R11: 0000000000000246 R12: ffff8800bae87dc8
> >> [ 4078.315824] R13: ffff88023466e800 R14: 0000000000000fd8 R15: 0000000000000fd8
> >> [ 4078.315824] FS:  00007f5d3b1cc700(0000) GS:ffff8802372a0000(0000)
> >> knlGS:0000000000000000
> >> [ 4078.315824] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> >> [ 4078.315824] CR2: ffffc900038d995e CR3: 00000000bb1ed000 CR4: 00000000000006e0
> >> [ 4078.315824] Stack:
> >> [ 4078.315824]  ffffffffa01ac797 ffff8800bb5bd480 ffff8800bb5bd4d0
> >> ffff8800bae87e48
> >> [ 4078.315824]  ffffffffa0073060 ffff88023466e800 0000000000000000
> >> 0000000000000fd8
> >> [ 4078.315824]  ffffffff00000001 ffff8800bacecc00 0000000000000fd8
> >> 0000000000008025
> >> [ 4078.315824] Call Trace:
> >> [ 4078.315824]  [<ffffffffa01ac797>] ? lttng_event_write+0x87/0xb0
> >> [lttng_ring_buffer_metadata_client]
> >> [ 4078.315824]  [<ffffffffa0073060>] lttng_metadata_output_channel+0xd0/0x120
> >> [lttng_tracer]
> >> [ 4078.315824]  [<ffffffffa00755f9>] lttng_metadata_ring_buffer_ioctl+0x79/0xd0
> >> [lttng_tracer]
> >> [ 4078.315824]  [<ffffffff8117ba10>] do_vfs_ioctl+0x2e0/0x4e0
> >> [ 4078.315824]  [<ffffffff812b35c7>] ? file_has_perm+0x87/0xa0
> >> [ 4078.315824]  [<ffffffff8117bc91>] SyS_ioctl+0x81/0xa0
> >> [ 4078.315824]  [<ffffffff810115d1>] ? syscall_trace_leave+0xd1/0xe0
> >> [ 4078.315824]  [<ffffffff818bbd37>] tracesys_phase2+0x84/0x89
> >> [ 4078.315824] Code: 5b 5d c3 66 0f 1f 44 00 00 e8 6b fc ff ff eb e1 90 90 90 90
> >> 90 90 90 90 90 0f 1f 44 00 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 <f3> 48 a5
> >> 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 f3
> >> [ 4078.315824] RIP  [<ffffffff81316f12>] __memcpy+0x12/0x20
> >> [ 4078.315824]  RSP <ffff8800bae87da0>
> >> [ 4078.315824] CR2: ffffc900038d995e
> >> [ 4078.315824] ---[ end trace a05b652829ceda48 ]---
> >> [ 4078.433848] LTTng: block device enumeration is not supported by kernel
> >> 
> >> 
> >> --
> >> Mathieu Desnoyers
> >> EfficiOS Inc.
> >> http://www.efficios.com
> >> _______________________________________________
> >> lttng-dev mailing list
> >> lttng-dev at lists.lttng.org
> >> http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
> > 
> > --
> > Mathieu Desnoyers
> > EfficiOS Inc.
> > http://www.efficios.com
> > 
> > _______________________________________________
> > lttng-dev mailing list
> > lttng-dev at lists.lttng.org
> > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
> 
> -- 
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com



More information about the lttng-dev mailing list