[lttng-dev] Double free or corruption error (fasttop)

Shehab Elsayed shehabyomn at gmail.com
Thu Mar 22 12:24:18 EDT 2018


Actually, i am not sure if this would help. I have been trying to reproduce
the problem on a different machine, but I can't. Even without the patches
!!!!!

Shehab Y. Elsayed, MSc.
PhD Student
The Edwards S. Rogers Sr. Dept. of Electrical and Computer Engineering
University of Toronto
E-mail: shehabyomn at gmail.com
<https://webmail.rice.edu/imp/message.php?mailbox=INBOX&index=11#>

On Wed, Mar 21, 2018 at 8:13 PM, Mathieu Desnoyers <
mathieu.desnoyers at efficios.com> wrote:

> ----- On Mar 21, 2018, at 8:01 PM, Shehab Elsayed <shehabyomn at gmail.com>
> wrote:
>
> Just to clarify more what I meant by custom events; I have new tracepoints
> added in the source code of the benchmark. However, I don't enable the
> corresponding events when I do the actual tracing.
>
> This is the sequence followed in building the benchmark:
> gcc-7.2 -c -O2 -pthread -D_XOPEN_SOURCE=500 -D_POSIX_C_SOURCE=200112
> -std=c11 -g -fno-strict-aliasing -DLTTNG_INST lu.c
> gcc-7.2 -O2 -pthread -D_XOPEN_SOURCE=500 -D_POSIX_C_SOURCE=200112 -std=c11
> -g -fno-strict-aliasing -DLTTNG_INST -o LU_NCB lu.o
> ../../instrumentation/lttng_tp/tp.o -lm -llttng-ust -ldl
>
> LTTNG_INST is just a preprocessor flag I have and tp.o is my custom
> tracepoints
>
> Could you share a repository with your custom instrumentation on github,
> so I could
> try it out ?
>
> My current problem is that I cannot reproduce your issue on my end.
>
> Thanks,
>
> Mathieu
>
>
>
>
> Shehab Y. Elsayed, MSc.
> PhD Student
> The Edwards S. Rogers Sr. Dept. of Electrical and Computer Engineering
> University of Toronto
> E-mail: shehabyomn at gmail.com
> <https://webmail.rice.edu/imp/message.php?mailbox=INBOX&index=11#>
>
> On Wed, Mar 21, 2018 at 7:55 PM, Shehab Elsayed <shehabyomn at gmail.com>
> wrote:
>
>> Still running into same problem. I attached the debug trace I got after
>> applying the 2 patches.
>>
>> The benchmark I am running also includes some custom created tracepoints.
>> I am not adding the events being traced in the files I have provided. Do
>> you think this might be causing a problem when I have tracpoints from 2
>> different packages?
>>
>> Shehab Y. Elsayed, MSc.
>> PhD Student
>> The Edwards S. Rogers Sr. Dept. of Electrical and Computer Engineering
>> University of Toronto
>> E-mail: shehabyomn at gmail.com
>> <https://webmail.rice.edu/imp/message.php?mailbox=INBOX&index=11#>
>>
>> On Wed, Mar 21, 2018 at 4:22 PM, Mathieu Desnoyers <
>> mathieu.desnoyers at efficios.com> wrote:
>>
>>> ----- On Mar 21, 2018, at 1:55 PM, Shehab Elsayed <shehabyomn at gmail.com>
>>> wrote:
>>>
>>> I am so sorry for the late replies.
>>>
>>> I have tried the first patch you sent and the problem still happens
>>> (although seemingly less frequently especially with debugging enabled!!!!).
>>> I have attached the output I got from one of the erroneous runs.
>>>
>>> I will try the updated patch and let you know how it goes.
>>>
>>> Also, I am not sure if these points make any difference:
>>> 1- The error actually happens at the end of the application. It actually
>>> finishes execution, but then something goes wrong.
>>> 2- I run into this problem only for some of the benchmarks (or at least
>>> the problems happens much less frequently for others that I didn't run into
>>> it, not yet)
>>>
>>> Thanks you very much, and sorry again for the late replies.
>>>
>>>
>>> No worries! Looking through your log, I notice that pthread set cancel
>>> state has problems when
>>> called from application threads. We do not restore the original thread's
>>> cancel state. Can you try
>>> with the attached patch applied on top of the previous one ?
>>>
>>> Thanks,
>>>
>>> Mathieu
>>>
>>>
>>>
>>>
>>> Shehab Y. Elsayed, MSc.
>>> PhD Student
>>> The Edwards S. Rogers Sr. Dept. of Electrical and Computer Engineering
>>> University of Toronto
>>> E-mail: shehabyomn at gmail.com
>>> <https://webmail.rice.edu/imp/message.php?mailbox=INBOX&index=11#>
>>>
>>> On Wed, Mar 21, 2018 at 1:27 PM, Mathieu Desnoyers <
>>> mathieu.desnoyers at efficios.com> wrote:
>>>
>>>> ----- On Mar 20, 2018, at 5:42 PM, Mathieu Desnoyers <
>>>> mathieu.desnoyers at efficios.com> wrote:
>>>>
>>>> ----- On Mar 20, 2018, at 4:58 PM, Mathieu Desnoyers <
>>>> mathieu.desnoyers at efficios.com> wrote:
>>>>
>>>> ----- On Mar 20, 2018, at 12:07 PM, Mathieu Desnoyers <
>>>> mathieu.desnoyers at efficios.com> wrote:
>>>>
>>>>
>>>> ----- On Mar 19, 2018, at 4:21 PM, Shehab Elsayed <shehabyomn at gmail.com>
>>>> wrote:
>>>>
>>>> I did "echo "-1" > /proc/sys/kernel/perf_event_paranoid" and made sure
>>>> the value was actually written by "cat /proc/sys/kernel/perf_event_
>>>> paranoid"
>>>>
>>>> It executed normally 2 times but then got the same error.
>>>>
>>>>
>>>> Can you provide the output when reproducing the issue with the
>>>> LTTNG_UST_DEBUG=1 environment variable set when starting
>>>> your test program ?
>>>>
>>>> I just noticed something that might explain what goes wrong here:
>>>>
>>>> lttng-context-perf-counters.c: add_thread_field() grabs the ust_lock().
>>>> Pthread mutex
>>>> in your case is instrumented with the pthread wrapper. This
>>>> "add_thread_field" is invoked
>>>> the first time the perf counter is hit by each given thread. When this
>>>> happens, the
>>>> instrumented pthread mutex will try to call into the instrumentation
>>>> tracepoint again,
>>>> which will call add_thread_field() (again), and so on until we reach
>>>> the libringbuffer
>>>> "lib_ring_buffer_nesting" threshold of 4 calls deep.
>>>>
>>>> I suspect this situation where we recursively call add_thread_field is
>>>> unexpected,
>>>> which may trigger your double-free here.
>>>>
>>>> Will investigate more.
>>>>
>>>> Can you try with the attached patch applied ?
>>>>
>>>> Here is an updated v2 of the patch which tests the notrace tls counter
>>>> sooner (before evaluating
>>>> filter). Please let me know how it goes.
>>>>
>>>> Thanks,
>>>>
>>>> Mathieu
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Mathieu
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Mathieu
>>>>
>>>>
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Mathieu
>>>>
>>>>
>>>>
>>>>
>>>> Shehab Y. Elsayed, MSc.
>>>> PhD Student
>>>> The Edwards S. Rogers Sr. Dept. of Electrical and Computer Engineering
>>>> University of Toronto
>>>> E-mail: shehabyomn at gmail.com
>>>> <https://webmail.rice.edu/imp/message.php?mailbox=INBOX&index=11#>
>>>>
>>>> On Mon, Mar 19, 2018 at 4:01 PM, Mathieu Desnoyers <
>>>> mathieu.desnoyers at efficios.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> ----- On Mar 19, 2018, at 3:53 PM, Shehab Elsayed <
>>>>> shehabyomn at gmail.com> wrote:
>>>>>
>>>>> cat /proc/sys/kernel/perf_event_paranoid ---> returns 1
>>>>>
>>>>> I run the program as a normal user
>>>>>
>>>>> The glibc version I get by running "ldd --version" is 2.17
>>>>>
>>>>>
>>>>> Can you reproduce the issue after you do this as root ?
>>>>>
>>>>> echo "-1" > /proc/sys/kernel/perf_event_paranoid
>>>>>
>>>>> Based on this documentation of the Linux kernel:
>>>>>
>>>>> Documentation/sysctl/kernel.txt:
>>>>>
>>>>> perf_event_paranoid:
>>>>>
>>>>> Controls use of the performance events system by unprivileged
>>>>> users (without CAP_SYS_ADMIN).  The default value is 2.
>>>>>
>>>>>  -1: Allow use of (almost) all events by all users
>>>>>      Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
>>>>> >=0: Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN
>>>>>      Disallow raw tracepoint access by users without CAP_SYS_ADMIN
>>>>> >=1: Disallow CPU event access by users without CAP_SYS_ADMIN
>>>>> >=2: Disallow kernel profiling by users without CAP_SYS_ADMIN
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Mathieu
>>>>>
>>>>>
>>>>>
>>>>> Shehab Y. Elsayed, MSc.
>>>>> PhD Student
>>>>> The Edwards S. Rogers Sr. Dept. of Electrical and Computer Engineering
>>>>> University of Toronto
>>>>> E-mail: shehabyomn at gmail.com
>>>>> <https://webmail.rice.edu/imp/message.php?mailbox=INBOX&index=11#>
>>>>>
>>>>> On Mon, Mar 19, 2018 at 3:31 PM, Mathieu Desnoyers <
>>>>> mathieu.desnoyers at efficios.com> wrote:
>>>>>
>>>>>> ---- On Mar 19, 2018, at 3:26 PM, Mathieu Desnoyers <
>>>>>> mathieu.desnoyers at efficios.com> wrote:
>>>>>>
>>>>>> ----- On Mar 19, 2018, at 2:26 PM, Shehab Elsayed <
>>>>>> shehabyomn at gmail.com> wrote:
>>>>>>
>>>>>> Yes, I tried with only one of those contexts and I still ran into the
>>>>>> same problem.
>>>>>>
>>>>>> What is the setting returned by
>>>>>>
>>>>>> cat /proc/sys/kernel/perf_event_paranoid
>>>>>>
>>>>>> on your system ? And do you run your test program as root or normal
>>>>>> user ?
>>>>>>
>>>>>> Please CC the mailing list on your reply.
>>>>>>
>>>>>>
>>>>>> I will also need to know what glibc version you have on your system.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Mathieu
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Mathieu
>>>>>>
>>>>>>
>>>>>>
>>>>>> Shehab Y. Elsayed, MSc.
>>>>>> PhD Student
>>>>>> The Edwards S. Rogers Sr. Dept. of Electrical and Computer Engineering
>>>>>> University of Toronto
>>>>>> E-mail: shehabyomn at gmail.com
>>>>>> <https://webmail.rice.edu/imp/message.php?mailbox=INBOX&index=11#>
>>>>>>
>>>>>> On Mon, Mar 19, 2018 at 2:24 PM, Mathieu Desnoyers <
>>>>>> mathieu.desnoyers at efficios.com> wrote:
>>>>>>
>>>>>>> ----- On Mar 19, 2018, at 12:36 PM, Shehab Elsayed <
>>>>>>> shehabyomn at gmail.com> wrote:
>>>>>>>
>>>>>>> Hi Mathieu,
>>>>>>>
>>>>>>> Thank you very much for your reply.
>>>>>>>
>>>>>>> I manually built lttng-ust from source (commit #:
>>>>>>> 8a208943e21700211beee3ea64180a5a534c7d2a).
>>>>>>>
>>>>>>> This is how I set up the tracing session:
>>>>>>> 1- lttng create lu_ncb_8_native -o {path}
>>>>>>> 2- lttng enable-event --userspace lttng_ust_pthread:pthread_
>>>>>>> mutex_lock_req
>>>>>>>     lttng enable-event --userspace lttng_ust_pthread:pthread_
>>>>>>> mutex_lock_acq
>>>>>>>     lttng enable-event --userspace lttng_ust_pthread:pthread_
>>>>>>> mutex_lock_trylock
>>>>>>>     lttng enable-event --userspace lttng_ust_pthread:pthread_
>>>>>>> mutex_lock_unlock
>>>>>>> 3- lttng add-context -u -t procname
>>>>>>>     lttng add-context -u -t vpid
>>>>>>>     lttng add-context -u -t pthread_id
>>>>>>>     lttng add-context -u -t vtid
>>>>>>>     lttng add-context -u -t ip
>>>>>>>     lttng add-context -u -t perf:thread:cpu-cycles
>>>>>>>     lttng add-context -u -t perf:thread:cycles
>>>>>>>     lttng add-context -u -t perf:thread:instructions
>>>>>>> 4- lttng start
>>>>>>> 5- LD_PRELOAD=/usr/local/lib/liblttng-ust-pthread-wrapper.so
>>>>>>> ./lu_ncb -p8 -n8096 -b32
>>>>>>> 6- lttng stop
>>>>>>> 7- lttng destroy
>>>>>>>
>>>>>>>
>>>>>>> Can you reproduce if you remove the following contexts ?
>>>>>>>
>>>>>>>     lttng add-context -u -t perf:thread:cpu-cycles
>>>>>>>     lttng add-context -u -t perf:thread:cycles
>>>>>>>     lttng add-context -u -t perf:thread:instructions
>>>>>>>
>>>>>>> And if you only keep a single of those contexts ?
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Mathieu
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Shehab Y. Elsayed, MSc.
>>>>>>> PhD Student
>>>>>>> The Edwards S. Rogers Sr. Dept. of Electrical and Computer
>>>>>>> Engineering
>>>>>>> University of Toronto
>>>>>>> E-mail: shehabyomn at gmail.com
>>>>>>> <https://webmail.rice.edu/imp/message.php?mailbox=INBOX&index=11#>
>>>>>>>
>>>>>>> On Mon, Mar 19, 2018 at 12:21 PM, Mathieu Desnoyers <
>>>>>>> mathieu.desnoyers at efficios.com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> ----- On Mar 16, 2018, at 5:37 PM, Shehab Elsayed <
>>>>>>>> shehabyomn at gmail.com> wrote:
>>>>>>>>
>>>>>>>> Hello All,
>>>>>>>>
>>>>>>>> I am trying to instrument a pthread application using the provided
>>>>>>>> pthread wrapper, but I sometimes run into a "Double free or
>>>>>>>> corruption error (fasttop)" error.
>>>>>>>>
>>>>>>>>
>>>>>>>> Please provide more information about the version of lttng-ust you
>>>>>>>> are using, and how you setup
>>>>>>>> your tracing session.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Mathieu
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Here is a description of what I have tried and noticed:
>>>>>>>> 1- The problem isn't consistent. It sometimes happen and sometimes
>>>>>>>> works as expected.
>>>>>>>> 2- From my experiments, the problem happens (more frequently at
>>>>>>>> least) when adding performance counter contexts (I tried cycles,
>>>>>>>> cpu_cycles and instructions).
>>>>>>>> 3- I am testing using lu_ncb from splash3 benchmark suite after
>>>>>>>> setting LD_PRELOAD to use the pthread wrapper as described in the
>>>>>>>> LTTng documents.
>>>>>>>> 4- Here is the backtrace printed after exiting:
>>>>>>>> ======= Backtrace: =========
>>>>>>>> /lib64/libc.so.6([Thread 0x7ffff5611700 (LWP 97229) exited]
>>>>>>>> /usr/local/lib/liblttng-ust.so.0(lttng_destroy_context+
>>>>>>>> 0x35)[0x7ffff7471575]
>>>>>>>> /usr/local/lib/liblttng-ust.so.0(lttng_session_destroy+
>>>>>>>> 0x21c)[0x7ffff747363c]
>>>>>>>> /usr/local/lib/liblttng-ust.so.0(+0x1e906)[0x7ffff746d906]
>>>>>>>> /usr/local/lib/liblttng-ust.so.0(lttng_ust_objd_unref+
>>>>>>>> 0x9f)[0x7ffff746dccf]
>>>>>>>> /usr/local/lib/liblttng-ust.so.0(lttng_ust_objd_unref+
>>>>>>>> 0x9f)[0x7ffff746dccf]
>>>>>>>> /usr/local/lib/liblttng-ust.so.0(lttng_ust_objd_unref+
>>>>>>>> 0x9f)[0x7ffff746dccf]
>>>>>>>> /usr/local/lib/liblttng-ust.so.0(lttng_ust_abi_exit+0x68)[
>>>>>>>> 0x7ffff746ead8]
>>>>>>>> /usr/local/lib/liblttng-ust.so.0(+0x191d3)[0x7ffff74681d3]
>>>>>>>> /usr/local/lib/liblttng-ust.so.0(lttng_ust_exit+0x67)[
>>>>>>>> 0x7ffff745ed57]
>>>>>>>> /lib64/ld-linux-x86-64.so.2(+0xf85a)[0x7ffff7dec85a]
>>>>>>>> /lib64/libc.so.6(+0x38a49)[0x7ffff6ca6a49]
>>>>>>>> /lib64/libc.so.6(+0x38a95)[0x7ffff6ca6a95]
>>>>>>>> /aenao-99/elsayed9/LTTng/data/scripts/tmp/lu_ncb[0x401b51]
>>>>>>>> /lib64/libc.so.6(__libc_start_main+0xf5)[0x7ffff6c8fb35]
>>>>>>>> /aenao-99/elsayed9/LTTng/data/scripts/tmp/lu_ncb[0x401c44]
>>>>>>>> 5- Also, this is a backtrace I obtained from gdb:
>>>>>>>> #0  0x00007ffff6eac1d7 in raise () from /lib64/libc.so.6
>>>>>>>> #1  0x00007ffff6ead8c8 in abort () from /lib64/libc.so.6
>>>>>>>> #2  0x00007ffff6eebf07 in __libc_message () from /lib64/libc.so.6
>>>>>>>> #3  0x00007ffff6ef3503 in _int_free () from /lib64/libc.so.6
>>>>>>>> #4  0x00007ffff768ad25 in lttng_destroy_perf_counter_field (
>>>>>>>>     field=<optimized out>) at lttng-context-perf-counters.c:418
>>>>>>>> #5  0x00007ffff767a575 in lttng_destroy_context (
>>>>>>>> ctx=0x7ffff0011090) at lttng-context.c:278
>>>>>>>> #6  0x00007ffff767c63c in _lttng_channel_unmap (
>>>>>>>> lttng_chan=0x7ffff0010f40) at lttng-events.c:172
>>>>>>>> #7  lttng_session_destroy (session=0x7ffff0000900)
>>>>>>>>     at lttng-events.c:247
>>>>>>>> #8  0x00007ffff7676906 in lttng_release_session (
>>>>>>>> objd=<optimized out>) at lttng-ust-abi.c:601
>>>>>>>> #9  0x00007ffff7676ccf in lttng_ust_objd_unref (id=1,
>>>>>>>>     is_owner=<optimized out>) at lttng-ust-abi.c:216
>>>>>>>> #10 0x00007ffff7676ccf in lttng_ust_objd_unref (id=2,
>>>>>>>>     is_owner=<optimized out>) at lttng-ust-abi.c:216
>>>>>>>> #11 0x00007ffff7676ccf in lttng_ust_objd_unref (id=id at entry=18,
>>>>>>>>     is_owner=is_owner at entry=1) at lttng-ust-abi.c:216
>>>>>>>> #12 0x00007ffff7677ad8 in objd_table_destroy ()
>>>>>>>>     at lttng-ust-abi.c:235
>>>>>>>> #13 lttng_ust_abi_exit () at lttng-ust-abi.c:1002
>>>>>>>> #14 0x00007ffff76711d3 in lttng_ust_cleanup (exiting=1)
>>>>>>>>     at lttng-ust-comm.c:1807
>>>>>>>> #15 0x00007ffff7667d57 in lttng_ust_exit ()
>>>>>>>>     at lttng-ust-comm.c:1874
>>>>>>>> #16 0x00007ffff7dec85a in _dl_fini ()
>>>>>>>>    from /lib64/ld-linux-x86-64.so.2
>>>>>>>> #17 0x00007ffff6eafa49 in __run_exit_handlers ()
>>>>>>>>    from /lib64/libc.so.6
>>>>>>>> #18 0x00007ffff6eafa95 in exit () from /lib64/libc.so.6
>>>>>>>> #19 0x0000000000401b51 in main (argc=<optimized out>,
>>>>>>>> argv=<optimized out>) at lu.c:368
>>>>>>>>
>>>>>>>> Any ideas, why this happens and how to fix it?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Shehab
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> lttng-dev mailing list
>>>>>>>> lttng-dev at lists.lttng.org
>>>>>>>> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Mathieu Desnoyers
>>>>>>>> EfficiOS Inc.
>>>>>>>> http://www.efficios.com
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Mathieu Desnoyers
>>>>>>> EfficiOS Inc.
>>>>>>> http://www.efficios.com
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Mathieu Desnoyers
>>>>>> EfficiOS Inc.
>>>>>> http://www.efficios.com
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Mathieu Desnoyers
>>>>>> EfficiOS Inc.
>>>>>> http://www.efficios.com
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Mathieu Desnoyers
>>>>> EfficiOS Inc.
>>>>> http://www.efficios.com
>>>>>
>>>>
>>>>
>>>> --
>>>> Mathieu Desnoyers
>>>> EfficiOS Inc.
>>>> http://www.efficios.com
>>>>
>>>>
>>>> --
>>>> Mathieu Desnoyers
>>>> EfficiOS Inc.
>>>> http://www.efficios.com
>>>>
>>>>
>>>> --
>>>> Mathieu Desnoyers
>>>> EfficiOS Inc.
>>>> http://www.efficios.com
>>>>
>>>>
>>>> --
>>>> Mathieu Desnoyers
>>>> EfficiOS Inc.
>>>> http://www.efficios.com
>>>>
>>>
>>>
>>> --
>>> Mathieu Desnoyers
>>> EfficiOS Inc.
>>> http://www.efficios.com
>>>
>>
>>
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.lttng.org/pipermail/lttng-dev/attachments/20180322/71c9ea21/attachment-0001.html>


More information about the lttng-dev mailing list