<html><body><div style="font-family: times new roman, new york, times, serif; font-size: 12pt; color: #000000"><hr id="zwchr"><blockquote style="border-left:2px solid #1010FF;margin-left:5px;padding-left:5px;color:#000;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt;" data-mce-style="border-left: 2px solid #1010FF; margin-left: 5px; padding-left: 5px; color: #000; font-weight: normal; font-style: normal; text-decoration: none; font-family: Helvetica,Arial,sans-serif; font-size: 12pt;"><b>From: </b>"David OShea" &lt;David.OShea@quantum.com&gt;<br><b>To: </b>"lttng-dev" &lt;lttng-dev@lists.lttng.org&gt;<br><b>Sent: </b>Sunday, December 7, 2014 10:30:04 PM<br><b>Subject: </b>[lttng-dev] Segfault at v_read() called from lib_ring_buffer_try_reserve_slow() in LTTng-UST traced app - CPU/VMware dependent<br><div><br></div><style><!--
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri","sans-serif";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
span.EmailStyle17
{mso-style-type:personal-compose;
font-family:"Calibri","sans-serif";
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
{page:WordSection1;}
--></style><div class="WordSection1"><p class="MsoNormal">Hi all,</p><p class="MsoNormal">We have encountered a problem using LTTng-UST tracing with our application, where on a particular VMware vCenter cluster we almost always get segfaults when tracepoints are enabled, whereas on another vCenter cluster, and on every other machine we’ve ever used, we don’t hit this problem.</p><p class="MsoNormal">I can reproduce this using lttng-ust/tests/hello after running:</p><p class="MsoNormal">"""</p><p class="MsoNormal">lttng create</p><p class="MsoNormal">lttng enable-channel channel0 --userspace</p><p class="MsoNormal">lttng add-context --userspace -t vpid -t vtid -t procname</p><p class="MsoNormal">lttng enable-event --userspace "ust_tests_hello:*" -c channel0</p><p class="MsoNormal">lttng start</p><p class="MsoNormal">"""</p><p class="MsoNormal">In which case I get the following stack trace with an obvious NULL pointer dereference:</p><p class="MsoNormal">"""</p><p class="MsoNormal">Program terminated with signal SIGSEGV, Segmentation fault.</p><p class="MsoNormal">#0 v_read (config=&lt;optimized out&gt;, v_a=0x0) at vatomic.h:48</p><p class="MsoNormal">48 return uatomic_read(&amp;v_a-&gt;a);</p><p class="MsoNormal">[...]</p><p class="MsoNormal">#0 v_read (config=&lt;optimized out&gt;, v_a=0x0) at vatomic.h:48</p><p class="MsoNormal">#1 0x00007f4aa10a4804 in lib_ring_buffer_try_reserve_slow (</p><p class="MsoNormal"> buf=0x7f4a98008a00, chan=0x7f4a98008a00, offsets=0x7fffef67c620,</p><p class="MsoNormal"> ctx=0x7fffef67ca40) at ring_buffer_frontend.c:1677</p><p class="MsoNormal">#2 0x00007f4aa10a6c9f in lib_ring_buffer_reserve_slow (ctx=0x7fffef67ca40)</p><p class="MsoNormal"> at ring_buffer_frontend.c:1819</p><p class="MsoNormal">#3 0x00007f4aa1095b75 in lib_ring_buffer_reserve (ctx=0x7fffef67ca40,</p><p class="MsoNormal"> config=0x7f4aa12b8ae0 &lt;client_config&gt;)</p><p class="MsoNormal"> at ../libringbuffer/frontend_api.h:211</p><p class="MsoNormal">#4 lttng_event_reserve 
(ctx=0x7fffef67ca40, event_id=0)</p><p class="MsoNormal"> at lttng-ring-buffer-client.h:473</p><p class="MsoNormal">#5 0x000000000040135f in __event_probe__ust_tests_hello___tptest (</p><p class="MsoNormal"> __tp_data=0xed3410, anint=0, netint=0, values=0x7fffef67cb50,</p><p class="MsoNormal"> text=0x7fffef67cb70 "test", textlen=&lt;optimized out&gt;, doublearg=2,</p><p class="MsoNormal"> floatarg=2222, boolarg=true) at ././ust_tests_hello.h:32</p><p class="MsoNormal">#6 0x0000000000400d2c in __tracepoint_cb_ust_tests_hello___tptest (</p><p class="MsoNormal"> boolarg=true, floatarg=2222, doublearg=2, textlen=4,</p><p class="MsoNormal"> text=0x7fffef67cb70 "test", values=0x7fffef67cb50,</p><p class="MsoNormal"> netint=&lt;optimized out&gt;, anint=0) at ust_tests_hello.h:32</p><p class="MsoNormal">#7 main (argc=&lt;optimized out&gt;, argv=&lt;optimized out&gt;) at hello.c:92</p><p class="MsoNormal">"""</p><p class="MsoNormal">I hit this segfault 10 out of 10 times I ran “hello” on a VM on one vCenter and 0 out of 10 times I ran it on the other, and the VMs otherwise had the same software installed on them:</p><p class="MsoNormal">- CentOS 6-based</p><p class="MsoNormal">- kernel-2.6.32-504.1.3.el6 with some minor changes made in networking</p><p class="MsoNormal">- userspace-rcu-0.8.3, lttng-ust-2.3.2 and lttng-tools-2.3.2 which might have some minor patches backported, and leftovers of changes to get them to build on CentOS 5</p><p class="MsoNormal">On the “good” vCenter, I tested on two different VM hosts:</p><p class="MsoNormal">Processor Type: Intel(R) Xeon(R) CPU E5530 @ 2.40GHz</p><p class="MsoNormal">EVC Mode: Intel(R) "Nehalem" Generation</p><p class="MsoNormal">Image Profile: (Updated) ESXi-5.1.0-799733-standard</p><p class="MsoNormal">Processor Type: Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz</p><p class="MsoNormal">EVC Mode: Intel(R) "Nehalem" Generation</p><p class="MsoNormal">Image Profile: (Updated) ESXi-5.1.0-799733-standard</p><p class="MsoNormal">The “bad” vCenter VM host 
that I tested on had this configuration:</p><p class="MsoNormal">ESX Version: VMware ESXi, 5.0.0, 469512</p><p class="MsoNormal">Processor Type: Intel(R) Xeon(R) CPU X7550 @ 2.00GHz</p><p class="MsoNormal">Any ideas?</p></div></blockquote><div><br></div><div>My bet would be that the OS is lying to userspace about the<br></div><div>number of possible CPUs. I wonder what liblttng-ust<br></div><div>libringbuffer/shm.h num_possible_cpus() is returning compared<br></div><div>to what lib_ring_buffer_get_cpu() returns.<br></div><div><br></div><div>Can you check this out?<br></div><div><br></div><div>Thanks,<br></div><div><br></div><div>Mathieu<br></div><div><br></div><div><br></div><div><br></div><div><br></div><blockquote style="border-left:2px solid #1010FF;margin-left:5px;padding-left:5px;color:#000;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt;" data-mce-style="border-left: 2px solid #1010FF; margin-left: 5px; padding-left: 5px; color: #000; font-weight: normal; font-style: normal; text-decoration: none; font-family: Helvetica,Arial,sans-serif; font-size: 12pt;"><div class="WordSection1"><p class="MsoNormal">Thanks in advance,<br> David</p></div></blockquote><div><br><br></div><div><br></div><div>-- <br></div><div><span name="x"></span>Mathieu Desnoyers<br>EfficiOS Inc.<br>http://www.efficios.com<span name="x"></span><br></div></div></body></html>