[lttng-dev] Segfault at v_read() called from lib_ring_buffer_try_reserve_slow() in LTTng-UST traced app - CPU/VMware dependent
Mathieu Desnoyers
mathieu.desnoyers at efficios.com
Thu Dec 11 10:36:30 EST 2014
----- Original Message -----
> From: "David OShea" <David.OShea at quantum.com>
> To: "lttng-dev" <lttng-dev at lists.lttng.org>
> Sent: Sunday, December 7, 2014 10:30:04 PM
> Subject: [lttng-dev] Segfault at v_read() called from
> lib_ring_buffer_try_reserve_slow() in LTTng-UST traced app - CPU/VMware
> dependent
> Hi all,
> We have encountered a problem with using LTTng-UST tracing with our
> application, where on a particular VMware vCenter cluster we almost always get
> segfaults when tracepoints are enabled, whereas on another vCenter cluster,
> and on every other machine we’ve ever used, we don’t hit this problem.
> I can reproduce this using lttng-ust/tests/hello after using:
> """
> lttng create
> lttng enable-channel channel0 --userspace
> lttng add-context --userspace -t vpid -t vtid -t procname
> lttng enable-event --userspace "ust_tests_hello:*" -c channel0
> lttng start
> """
> In which case I get the following stack trace with an obvious NULL pointer
> dereference:
> """
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0 v_read (config=<optimized out>, v_a=0x0) at vatomic.h:48
> 48 return uatomic_read(&v_a->a);
> [...]
> #0 v_read (config=<optimized out>, v_a=0x0) at vatomic.h:48
> #1 0x00007f4aa10a4804 in lib_ring_buffer_try_reserve_slow (
> buf=0x7f4a98008a00, chan=0x7f4a98008a00, offsets=0x7fffef67c620,
> ctx=0x7fffef67ca40) at ring_buffer_frontend.c:1677
> #2 0x00007f4aa10a6c9f in lib_ring_buffer_reserve_slow (ctx=0x7fffef67ca40)
> at ring_buffer_frontend.c:1819
> #3 0x00007f4aa1095b75 in lib_ring_buffer_reserve (ctx=0x7fffef67ca40,
> config=0x7f4aa12b8ae0 <client_config>)
> at ../libringbuffer/frontend_api.h:211
> #4 lttng_event_reserve (ctx=0x7fffef67ca40, event_id=0)
> at lttng-ring-buffer-client.h:473
> #5 0x000000000040135f in __event_probe__ust_tests_hello___tptest (
> __tp_data=0xed3410, anint=0, netint=0, values=0x7fffef67cb50,
> text=0x7fffef67cb70 "test", textlen=<optimized out>, doublearg=2,
> floatarg=2222, boolarg=true) at ././ust_tests_hello.h:32
> #6 0x0000000000400d2c in __tracepoint_cb_ust_tests_hello___tptest (
> boolarg=true, floatarg=2222, doublearg=2, textlen=4,
> text=0x7fffef67cb70 "test", values=0x7fffef67cb50,
> netint=<optimized out>, anint=0) at ust_tests_hello.h:32
> #7 main (argc=<optimized out>, argv=<optimized out>) at hello.c:92
> """
> I hit this segfault 10 out of 10 times I ran “hello” on a VM on one vCenter
> and 0 out of 10 times I ran it on the other, and the VMs otherwise had the
> same software installed on them:
> - CentOS 6-based
> - kernel-2.6.32-504.1.3.el6 with some minor changes made in networking
> - userspace-rcu-0.8.3, lttng-ust-2.3.2 and lttng-tools-2.3.2 which might have
> some minor patches backported, and leftovers of changes to get them to build
> on CentOS 5
> On the “good” vCenter, I tested on two different VM hosts:
> Processor Type: Intel(R) Xeon(R) CPU E5530 @ 2.40GHz
> EVC Mode: Intel(R) "Nehalem" Generation
> Image Profile: (Updated) ESXi-5.1.0-799733-standard
> Processor Type: Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz
> EVC Mode: Intel(R) "Nehalem" Generation
> Image Profile: (Updated) ESXi-5.1.0-799733-standard
> The “bad” vCenter VM host that I tested on had this configuration:
> ESX Version: VMware ESXi, 5.0.0, 469512
> Processor Type: Intel(R) Xeon(R) CPU X7550 @ 2.00GHz
> Any ideas?
My bet would be that the OS is lying to userspace about the
number of possible CPUs. I wonder what liblttng-ust
libringbuffer/shm.h num_possible_cpus() is returning compared
to what lib_ring_buffer_get_cpu() returns.
Can you check this out?
Thanks,
Mathieu
> Thanks in advance,
> David
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com