[lttng-dev] [PATCH lttng-modules] Fix: Building the event list fails on fragmented memory
Mathieu Desnoyers
mathieu.desnoyers at efficios.com
Thu Jun 25 09:37:29 EDT 2015
----- On Jun 25, 2015, at 9:00 AM, Mathieu Desnoyers mathieu.desnoyers at efficios.com wrote:
> ----- On Jun 25, 2015, at 4:08 AM, Jan Glauber jan.glauber at gmail.com wrote:
>
>> On Tue, Jun 23, 2015 at 09:04:18PM +0000, Mathieu Desnoyers wrote:
>>> Nope, it wasn't it. metadata_written is always <= cache_alloc, so the
>>> current upstream code looks correct.
>>>
>>> I've just tried after changing the CPU configuration from "hypervisor
>>> default" to "copy host cpu config" on my machine, and was then unable
>>> to reproduce. I wonder if there is not an issue with __memcpy implementation
>>> for specific x86_64 processors (e.g. reading too many bytes from the vmalloc'd
>>> area).
>>>
>>> Thoughts ?
>>
>> Just speculation without further debugging data from the hypervisor setup
>> where it fails...
>>
>> From my understanding memcpy should work regardless of kmalloc or
>> vmalloc.
>>
>> Maybe the length parameter of the memcpy is wrong?
>> With kmalloc and the 1:1 mapping this wont matter but if the memcpy
>> crosses a page it might fault if the next page is not vmalloc'ed.
>>
>> But the fault address ffffc900038d995e looks more like src is completely
>> messed up.
>>
>> What is the difference between the CPU configurations you tried?
>
> Actually, I just narrowed it down to a missing mutex around the metadata
> cache accesses. The updater can reallocate the memory, and the reader
> (which outputs from the cache to a ring buffer) assumes that it does not
> have to lock the metadata cache because it only reads its content.
> Unfortunately, this does not take into account that the memory backing
> the cache can be reallocated.
>
> I'll prepare a fix.
I just pushed the two following commits upstream:
commit a606b6e893b99bc31cacee98415942dbc577c90c
Author: Martin Leisener <martin at leisener.de>
Date: Wed Jun 10 15:17:37 2015 +0200
Fix: Building the event list fails on fragmented memory
On a small arm imx6 solo with 256MB RAM it often happens that memory
becomes fragmented rather quickly, so that kmalloc will not be able to
get enough consecutive pages (enocuntered for example if you enable all
kernel events: lttng enable-event -k syscall --all).
This patch switches the allocation to vmalloc. Tested for x86 on Ubuntu
12.04 Lts and on imx6 solo 256MB RAM
If this patch is not applied, you can identify low and/or fragmented
memory failures by looking at the kernel ring buffer (please ignore DMA,
it is due to some memory setup misconfiguration, should read Normal):
...
[ 321.993820] lttng-sessiond: page allocation failure: order:4, mode:0x1040d0
...
[ 321.994711] lowmem_reserve[]: 0 0 0
[ 321.994727] DMA: 801*4kB (UEMC) 424*8kB (EMC) 355*16kB (UEMC) 344*32kB (MC) 340*64kB (C) 8*128kB (C) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB 0*32768kB = 46068kB
[ Edit by Mathieu: use vzalloc() rather than vmalloc() + memset 0. ]
Signed-off-by: Martin Leisener <martin at leisener.de>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
commit 92d9f5e6df0f72842b28ccd303569c88e183325a
Author: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
Date: Thu Jun 25 09:10:52 2015 -0400
Fix: use after free on metadata cache reallocation
When the metadata cache is expanded (reallocated) by
lttng_metadata_printf(), the metadata cache reader
(lttng_metadata_output_channel()) may use freed memory, because the
metadata cache is not protected from concurrent read accesses. The
metadata cache updates are protected from each other by the sessions
mutex, but metadata cache reads do not hold the sessions mutex.
Actually, the comment on top of lttng_metadata_output_channel() stating
"We have exclusive access to our metadata buffer (protected by the
sessions_mutex)" is simply wrong, because this mutex is never held when
calling lttng_metadata_output_channel().
Promote the per-stream lock to the metadata cache used by each of those
metadata streams, thus ensuring mutual exclusion between metadata cache
reallocation and readers.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
Thanks,
Mathieu
>
> Thanks!
>
> Mathieu
>
>
>>
>> Regards, Jan
>>
>>
>>> Thanks,
>>>
>>> Mathieu
>>>
>>> ----- On Jun 23, 2015, at 4:21 PM, Mathieu Desnoyers
>>> mathieu.desnoyers at efficios.com wrote:
>>>
>>> > I think I found the culprit (testing now):
>>> >
>>> > int lttng_metadata_printf(struct lttng_session *session,
>>> > const char *fmt, ...)
>>> >
>>> > tmp_cache_alloc_size = max_t(unsigned int,
>>> > session->metadata_cache->cache_alloc + len,
>>> > session->metadata_cache->cache_alloc << 1)
>>> >
>>> > should be:
>>> >
>>> > tmp_cache_alloc_size = max_t(unsigned int,
>>> > session->metadata_cache->metadata_written + len,
>>> > session->metadata_cache->cache_alloc << 1);
>>> >
>>> > I think we have a possible memory corruption in the upstream code here.
>>> >
>>> > More coming soon,
>>> >
>>> > Thanks,
>>> >
>>> > Mathieu
>>> >
>>> >
>>> > ----- On Jun 23, 2015, at 3:46 PM, Mathieu Desnoyers
>>> > mathieu.desnoyers at efficios.com wrote:
>>> >
>>> >> ----- On Jun 23, 2015, at 9:46 AM, Jan Glauber jan.glauber at gmail.com wrote:
>>> >>
>>> >>> On Sat, Jun 13, 2015 at 07:32:28PM +0000, Mathieu Desnoyers wrote:
>>> >>>> ----- On Jun 10, 2015, at 9:17 AM, Martin Leisener martin at leisener.de wrote:
>>> >>>>
>>> >>>> > Hi Mathieu,
>>> >>>> >
>>> >>>> > sorry for sending the broken patch, I fell for some "smart" formating of my
>>> >>>> > email client, I cannot utilize it myself the way I send it.
>>> >>>> > Hence I now send the patch again as it was original intended. Sent to myself, I
>>> >>>> > can apply it, so I hope it now works in general as well
>>> >>>> > I examined as recommended diff -urN, it seems the output can be utilized by
>>> >>>> > patch the same way as git diff.
>>> >>>> > Hope it is now received the way I intended.
>>> >>>>
>>> >>>> Hi Martin,
>>> >>>>
>>> >>>> I tried you patch on a x86-64 machine (my laptop), and although taking a
>>> >>>> single kernel trace worked fine, it appears to OOPS when I run
>>> >>>>
>>> >>>> (in lttng-tools)
>>> >>>> cd tests
>>> >>>> ./run.sh root_regression
>>> >>>
>>> >>> Hi Mathieu,
>>> >>>
>>> >>> can you please share the oops message so I can have a look?
>>> >>
>>> >> Sure,
>>> >>
>>> >> Here is the backtrace happening with the attached patch over lttng-modules
>>> >> commit 8c6e7f13c778701dec2d6549f1e2ca98970907a0. I modified your patch
>>> >> slightly.
>>> >>
>>> >> [ 4078.314978] BUG: unable to handle kernel paging request at ffffc900038d995e
>>> >> [ 4078.315824] IP: [<ffffffff81316f12>] __memcpy+0x12/0x20
>>> >> [ 4078.315824] PGD 236c92067 PUD 236c93067 PMD bac0c067 PTE 0
>>> >> [ 4078.315824] Oops: 0000 [#1] SMP
>>> >> [ 4078.315824] Modules linked in: lttng_probe_workqueue(O) lttng_probe_vmscan(O)
>>> >> lttng_probe_udp(O) lttng_probe_timer(O) lttng_probe_sunrpc(O)
>>> >> lttng_probe_statedump(O) lttng_probe_sock(O) lttng_probe_skb(O)
>>> >> lttng_probe_signal(O) lttng_probe_scsi(O) lttng_probe_sched(O)
>>> >> lttng_probe_regmap(O) lttng_probe_rcu(O) lttng_probe_random(O)
>>> >> lttng_probe_printk(O) lttng_probe_power(O) lttng_probe_net(O)
>>> >> lttng_probe_napi(O) lttng_probe_module(O) lttng_probe_kmem(O)
>>> >> lttng_probe_jbd2(O) lttng_probe_irq(O) lttng_probe_ext4(O)
>>> >> lttng_probe_compaction(O) lttng_probe_block(O) lttng_types(O)
>>> >> lttng_ring_buffer_metadata_mmap_client(O)
>>> >> lttng_ring_buffer_client_mmap_overwrite(O)
>>> >> lttng_ring_buffer_client_mmap_discard(O) lttng_ring_buffer_metadata_client(O)
>>> >> lttng_ring_buffer_client_overwrite(O) lttng_ring_buffer_client_discard(O)
>>> >> lttng_tracer(O) lttng_statedump(O) lttng_kprobes(O) lttng_lib_ring_buffer(O)
>>> >> lttng_kretprobes(O) virtio_blk virtio_net virtio_pci virtio_ring virtio
>>> >> [ 4078.315824] CPU: 5 PID: 4258 Comm: lttng-consumerd Tainted: G O
>>> >> 4.1.0 #7
>>> >> [ 4078.315824] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
>>> >> 01/01/2011
>>> >> [ 4078.315824] task: ffff8802350c3660 ti: ffff8800bae84000 task.ti:
>>> >> ffff8800bae84000
>>> >> [ 4078.315824] RIP: 0010:[<ffffffff81316f12>] [<ffffffff81316f12>]
>>> >> __memcpy+0x12/0x20
>>> >> [ 4078.315824] RSP: 0018:ffff8800bae87da0 EFLAGS: 00010246
>>> >> [ 4078.315824] RAX: ffff880235439025 RBX: 0000000000000fd8 RCX: 00000000000001fb
>>> >> [ 4078.315824] RDX: 0000000000000000 RSI: ffffc900038d995e RDI: ffff880235439025
>>> >> [ 4078.315824] RBP: ffff8800bae87db8 R08: ffff8800bacecc00 R09: 0000000000008000
>>> >> [ 4078.315824] R10: 0000000000000000 R11: 0000000000000246 R12: ffff8800bae87dc8
>>> >> [ 4078.315824] R13: ffff88023466e800 R14: 0000000000000fd8 R15: 0000000000000fd8
>>> >> [ 4078.315824] FS: 00007f5d3b1cc700(0000) GS:ffff8802372a0000(0000)
>>> >> knlGS:0000000000000000
>>> >> [ 4078.315824] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>>> >> [ 4078.315824] CR2: ffffc900038d995e CR3: 00000000bb1ed000 CR4: 00000000000006e0
>>> >> [ 4078.315824] Stack:
>>> >> [ 4078.315824] ffffffffa01ac797 ffff8800bb5bd480 ffff8800bb5bd4d0
>>> >> ffff8800bae87e48
>>> >> [ 4078.315824] ffffffffa0073060 ffff88023466e800 0000000000000000
>>> >> 0000000000000fd8
>>> >> [ 4078.315824] ffffffff00000001 ffff8800bacecc00 0000000000000fd8
>>> >> 0000000000008025
>>> >> [ 4078.315824] Call Trace:
>>> >> [ 4078.315824] [<ffffffffa01ac797>] ? lttng_event_write+0x87/0xb0
>>> >> [lttng_ring_buffer_metadata_client]
>>> >> [ 4078.315824] [<ffffffffa0073060>] lttng_metadata_output_channel+0xd0/0x120
>>> >> [lttng_tracer]
>>> >> [ 4078.315824] [<ffffffffa00755f9>] lttng_metadata_ring_buffer_ioctl+0x79/0xd0
>>> >> [lttng_tracer]
>>> >> [ 4078.315824] [<ffffffff8117ba10>] do_vfs_ioctl+0x2e0/0x4e0
>>> >> [ 4078.315824] [<ffffffff812b35c7>] ? file_has_perm+0x87/0xa0
>>> >> [ 4078.315824] [<ffffffff8117bc91>] SyS_ioctl+0x81/0xa0
>>> >> [ 4078.315824] [<ffffffff810115d1>] ? syscall_trace_leave+0xd1/0xe0
>>> >> [ 4078.315824] [<ffffffff818bbd37>] tracesys_phase2+0x84/0x89
>>> >> [ 4078.315824] Code: 5b 5d c3 66 0f 1f 44 00 00 e8 6b fc ff ff eb e1 90 90 90 90
>>> >> 90 90 90 90 90 0f 1f 44 00 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 <f3> 48 a5
>>> >> 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 f3
>>> >> [ 4078.315824] RIP [<ffffffff81316f12>] __memcpy+0x12/0x20
>>> >> [ 4078.315824] RSP <ffff8800bae87da0>
>>> >> [ 4078.315824] CR2: ffffc900038d995e
>>> >> [ 4078.315824] ---[ end trace a05b652829ceda48 ]---
>>> >> [ 4078.433848] LTTng: block device enumeration is not supported by kernel
>>> >>
>>> >>
>>> >> --
>>> >> Mathieu Desnoyers
>>> >> EfficiOS Inc.
>>> >> http://www.efficios.com
>>> >> _______________________________________________
>>> >> lttng-dev mailing list
>>> >> lttng-dev at lists.lttng.org
>>> >> http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
>>> >
>>> > --
>>> > Mathieu Desnoyers
>>> > EfficiOS Inc.
>>> > http://www.efficios.com
>>> >
>>> > _______________________________________________
>>> > lttng-dev mailing list
>>> > lttng-dev at lists.lttng.org
>>> > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
>>>
>>> --
>>> Mathieu Desnoyers
>>> EfficiOS Inc.
>> > http://www.efficios.com
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com
>
> _______________________________________________
> lttng-dev mailing list
> lttng-dev at lists.lttng.org
> http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
More information about the lttng-dev
mailing list