[lttng-dev] Segfault at v_read() called from lib_ring_buffer_try_reserve_slow() in LTTng-UST traced app - CPU/VMware dependent

Sun Dec 7 22:30:04 EST 2014

Hi all,

We have encountered a problem using LTTng-UST tracing with our application: on one particular VMware vCenter cluster we almost always get segfaults when tracepoints are enabled, whereas on another vCenter cluster, and on every other machine we've ever used, we don't hit this problem.

I can reproduce this using lttng-ust/tests/hello after setting up a tracing session with:

"""
lttng create
lttng enable-channel channel0 --userspace
lttng add-context --userspace -t vpid -t vtid -t procname
lttng enable-event --userspace "ust_tests_hello:*" -c channel0
lttng start
"""

Running "hello" with that session active, I get the following stack trace, which shows an obvious NULL pointer dereference (v_a=0x0 in v_read()):

"""
Program terminated with signal SIGSEGV, Segmentation fault.
#0  v_read (config=<optimized out>, v_a=0x0) at vatomic.h:48
48              return uatomic_read(&v_a->a);
[...]
#0  v_read (config=<optimized out>, v_a=0x0) at vatomic.h:48
#1  0x00007f4aa10a4804 in lib_ring_buffer_try_reserve_slow (
    buf=0x7f4a98008a00, chan=0x7f4a98008a00, offsets=0x7fffef67c620,
    ctx=0x7fffef67ca40) at ring_buffer_frontend.c:1677
#2  0x00007f4aa10a6c9f in lib_ring_buffer_reserve_slow (ctx=0x7fffef67ca40)
    at ring_buffer_frontend.c:1819
#3  0x00007f4aa1095b75 in lib_ring_buffer_reserve (ctx=0x7fffef67ca40,
    config=0x7f4aa12b8ae0 <client_config>)
    at ../libringbuffer/frontend_api.h:211
#4  lttng_event_reserve (ctx=0x7fffef67ca40, event_id=0)
    at lttng-ring-buffer-client.h:473
#5  0x000000000040135f in __event_probe__ust_tests_hello___tptest (
    __tp_data=0xed3410, anint=0, netint=0, values=0x7fffef67cb50,
    text=0x7fffef67cb70 "test", textlen=<optimized out>, doublearg=2,
    floatarg=2222, boolarg=true) at ././ust_tests_hello.h:32
#6  0x0000000000400d2c in __tracepoint_cb_ust_tests_hello___tptest (
    boolarg=true, floatarg=2222, doublearg=2, textlen=4,
    text=0x7fffef67cb70 "test", values=0x7fffef67cb50,
    netint=<optimized out>, anint=0) at ust_tests_hello.h:32
#7  main (argc=<optimized out>, argv=<optimized out>) at hello.c:92
"""

I hit this segfault 10 out of 10 times when running "hello" on a VM on one vCenter and 0 out of 10 times on the other. The VMs otherwise had the same software installed:

- CentOS 6-based
- kernel-2.6.32-504.1.3.el6 with some minor changes made in networking
- userspace-rcu-0.8.3, lttng-ust-2.3.2 and lttng-tools-2.3.2, which might carry some minor backported patches and leftover changes made to get them to build on CentOS 5
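
For anyone wanting to repeat the 10-run comparison, it could be scripted roughly as follows. This is only a sketch: count_segfaults is a made-up helper name (not part of LTTng), and it assumes the shell reports death by SIGSEGV as exit status 139 (128 + 11), as bash and dash do.

```shell
#!/bin/sh
# count_segfaults RUNS CMD [ARGS...]
# Hypothetical helper: runs CMD the given number of times and prints
# how many of those runs were killed by SIGSEGV.
count_segfaults() {
    runs=$1
    shift
    crashes=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Discard the program's own output; only the exit status matters.
        "$@" >/dev/null 2>&1
        status=$?
        # Assumption: 139 = 128 + 11 (SIGSEGV), as reported by bash/dash.
        if [ "$status" -eq 139 ]; then
            crashes=$((crashes + 1))
        fi
        i=$((i + 1))
    done
    echo "$crashes"
}
```

With the session above started, something like `count_segfaults 10 ./hello` (run from the directory holding the built "hello" test binary, whose exact location is an assumption here) should print the number of crashing runs.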

On the "good" vCenter, I tested on two different VM hosts:

Processor Type: Intel(R) Xeon(R) CPU E5530 @ 2.40GHz
EVC Mode: Intel(R) "Nehalem" Generation
Image Profile: (Updated) ESXi-5.1.0-799733-standard

Processor Type: Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz
EVC Mode: Intel(R) "Nehalem" Generation
Image Profile: (Updated) ESXi-5.1.0-799733-standard

The "bad" vCenter VM host that I tested on had this configuration:

ESX Version: VMware ESXi, 5.0.0, 469512
Processor Type: Intel(R) Xeon(R) CPU X7550 @ 2.00GHz

Any ideas?

Thanks in advance,
David
