[lttng-dev] Double free or corruption error (fasttop)
Mathieu Desnoyers
mathieu.desnoyers at efficios.com
Wed Mar 21 13:27:17 EDT 2018
----- On Mar 20, 2018, at 5:42 PM, Mathieu Desnoyers <mathieu.desnoyers at efficios.com> wrote:
> ----- On Mar 20, 2018, at 4:58 PM, Mathieu Desnoyers
> <mathieu.desnoyers at efficios.com> wrote:
>> ----- On Mar 20, 2018, at 12:07 PM, Mathieu Desnoyers
>> <mathieu.desnoyers at efficios.com> wrote:
>>> ----- On Mar 19, 2018, at 4:21 PM, Shehab Elsayed <shehabyomn at gmail.com> wrote:
>>>> I did "echo "-1" > /proc/sys/kernel/perf_event_paranoid " and made sure the
>>>> value was actually written by "cat /proc/sys/kernel/perf_event_paranoid"
>>>> It executed normally 2 times but then got the same error.
>>> Can you provide the output when reproducing the issue with the
>>> LTTNG_UST_DEBUG=1 environment variable set when starting
>>> your test program ?
>> I just noticed something that might explain what goes wrong here:
>> lttng-context-perf-counters.c: add_thread_field() grabs the ust_lock(). Pthread
>> mutex
>> in your case is instrumented with the pthread wrapper. This "add_thread_field"
>> is invoked
>> the first time the perf counter is hit by each given thread. When this happens,
>> the
>> instrumented pthread mutex will try to call into the instrumentation tracepoint
>> again,
>> which will call add_thread_field() (again), and so on until we reach the
>> libringbuffer
>> "lib_ring_buffer_nesting" threshold of 4 calls deep.
>> I suspect this situation where we recursively call add_thread_field is
>> unexpected,
>> which may trigger your double-free here.
>> Will investigate more.
> Can you try with the attached patch applied ?
Here is an updated v2 of the patch which tests the notrace tls counter sooner (before evaluating
filter). Please let me know how it goes.
Thanks,
Mathieu
> Thanks,
> Mathieu
>> Thanks,
>> Mathieu
>>> Thanks,
>>> Mathieu
>>>> Shehab Y. Elsayed, MSc.
>>>> PhD Student
>>>> The Edwards S. Rogers Sr. Dept. of Electrical and Computer Engineering
>>>> University of Toronto
>>>> E-mail: [ https://webmail.rice.edu/imp/message.php?mailbox=INBOX&index=11# |
>>>> shehabyomn at gmail.com ]
>>>> On Mon, Mar 19, 2018 at 4:01 PM, Mathieu Desnoyers < [
>>>> mailto:mathieu.desnoyers at efficios.com | mathieu.desnoyers at efficios.com ] >
>>>> wrote:
>>>>> ----- On Mar 19, 2018, at 3:53 PM, Shehab Elsayed < [
>>>>> mailto:shehabyomn at gmail.com | shehabyomn at gmail.com ] > wrote:
>>>>>> cat /proc/sys/kernel/perf_event_paranoid ---> returns 1
>>>>>> I run the program as a normal user
>>>>>> The glibc version I get by running "ldd --version" is 2.17
>>>>> Can you reproduce the issue after you do this as root ?
>>>>> echo "-1" > /proc/sys/kernel/perf_event_paranoid
>>>>> Based on this documentation of the Linux kernel:
>>>>> Documentation/sysctl/kernel.txt:
>>>>> perf_event_paranoid:
>>>>> Controls use of the performance events system by unprivileged
>>>>> users (without CAP_SYS_ADMIN). The default value is 2.
>>>>> -1: Allow use of (almost) all events by all users
>>>>> Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
>>>>> >=0: Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN
>>>>> Disallow raw tracepoint access by users without CAP_SYS_ADMIN
>>>>> >=1: Disallow CPU event access by users without CAP_SYS_ADMIN
>>>>> >=2: Disallow kernel profiling by users without CAP_SYS_ADMIN
>>>>> Thanks,
>>>>> Mathieu
>>>>>> Shehab Y. Elsayed, MSc.
>>>>>> PhD Student
>>>>>> The Edwards S. Rogers Sr. Dept. of Electrical and Computer Engineering
>>>>>> University of Toronto
>>>>>> E-mail: [ https://webmail.rice.edu/imp/message.php?mailbox=INBOX&index=11# |
>>>>>> shehabyomn at gmail.com ]
>>>>>> On Mon, Mar 19, 2018 at 3:31 PM, Mathieu Desnoyers < [
>>>>>> mailto:mathieu.desnoyers at efficios.com | mathieu.desnoyers at efficios.com ] >
>>>>>> wrote:
>>>>>>> ---- On Mar 19, 2018, at 3:26 PM, Mathieu Desnoyers < [
>>>>>>> mailto:mathieu.desnoyers at efficios.com | mathieu.desnoyers at efficios.com ] >
>>>>>>> wrote:
>>>>>>>> ----- On Mar 19, 2018, at 2:26 PM, Shehab Elsayed < [
>>>>>>>> mailto:shehabyomn at gmail.com | shehabyomn at gmail.com ] > wrote:
>>>>>>>>> Yes, I tried with only one of those contexts and I still ran into the same
>>>>>>>>> problem.
>>>>>>>> What is the setting returned by
>>>>>>>> cat /proc/sys/kernel/perf_event_paranoid
>>>>>>>> on your system ? And do you run your test program as root or normal user ?
>>>>>>>> Please CC the mailing list on your reply.
>>>>>>> I will also need to know what glibc version you have on your system.
>>>>>>> Thanks,
>>>>>>> Mathieu
>>>>>>>> Thanks,
>>>>>>>> Mathieu
>>>>>>>>> Shehab Y. Elsayed, MSc.
>>>>>>>>> PhD Student
>>>>>>>>> The Edwards S. Rogers Sr. Dept. of Electrical and Computer Engineering
>>>>>>>>> University of Toronto
>>>>>>>>> E-mail: [ https://webmail.rice.edu/imp/message.php?mailbox=INBOX&index=11# |
>>>>>>>>> shehabyomn at gmail.com ]
>>>>>>>>> On Mon, Mar 19, 2018 at 2:24 PM, Mathieu Desnoyers < [
>>>>>>>>> mailto:mathieu.desnoyers at efficios.com | mathieu.desnoyers at efficios.com ] >
>>>>>>>>> wrote:
>>>>>>>>>> ----- On Mar 19, 2018, at 12:36 PM, Shehab Elsayed < [
>>>>>>>>>> mailto:shehabyomn at gmail.com | shehabyomn at gmail.com ] > wrote:
>>>>>>>>>>> Hi Mathieu,
>>>>>>>>>>> Thank you very much for your reply.
>>>>>>>>>>> I manually built lttng-ust from source (commit #:
>>>>>>>>>>> 8a208943e21700211beee3ea64180a5a534c7d2a).
>>>>>>>>>>> This is how I set up the tracing session:
>>>>>>>>>>> 1- lttng create lu_ncb_8_native -o {path}
>>>>>>>>>>> 2- lttng enable-event --userspace lttng_ust_pthread:pthread_mutex_lock_req
>>>>>>>>>>> lttng enable-event --userspace lttng_ust_pthread:pthread_mutex_lock_acq
>>>>>>>>>>> lttng enable-event --userspace lttng_ust_pthread:pthread_mutex_lock_trylock
>>>>>>>>>>> lttng enable-event --userspace lttng_ust_pthread:pthread_mutex_lock_unlock
>>>>>>>>>>> 3- lttng add-context -u -t procname
>>>>>>>>>>> lttng add-context -u -t vpid
>>>>>>>>>>> lttng add-context -u -t pthread_id
>>>>>>>>>>> lttng add-context -u -t vtid
>>>>>>>>>>> lttng add-context -u -t ip
>>>>>>>>>>> lttng add-context -u -t perf:thread:cpu-cycles
>>>>>>>>>>> lttng add-context -u -t perf:thread:cycles
>>>>>>>>>>> lttng add-context -u -t perf:thread:instructions
>>>>>>>>>>> 4- lttng start
>>>>>>>>>>> 5- LD_PRELOAD=/usr/local/lib/liblttng-ust-pthread-wrapper.so ./lu_ncb -p8 -n8096
>>>>>>>>>>> -b32
>>>>>>>>>>> 6- lttng stop
>>>>>>>>>>> 7- lttng destroy
>>>>>>>>>> Can you reproduce if you remove the following contexts ?
>>>>>>>>>> lttng add-context -u -t perf:thread:cpu-cycles
>>>>>>>>>> lttng add-context -u -t perf:thread:cycles
>>>>>>>>>> lttng add-context -u -t perf:thread:instructions
>>>>>>>>>> And if you only keep a single of those contexts ?
>>>>>>>>>> Thanks,
>>>>>>>>>> Mathieu
>>>>>>>>>>> Shehab Y. Elsayed, MSc.
>>>>>>>>>>> PhD Student
>>>>>>>>>>> The Edwards S. Rogers Sr. Dept. of Electrical and Computer Engineering
>>>>>>>>>>> University of Toronto
>>>>>>>>>>> E-mail: [ https://webmail.rice.edu/imp/message.php?mailbox=INBOX&index=11# |
>>>>>>>>>>> shehabyomn at gmail.com ]
>>>>>>>>>>> On Mon, Mar 19, 2018 at 12:21 PM, Mathieu Desnoyers < [
>>>>>>>>>>> mailto:mathieu.desnoyers at efficios.com | mathieu.desnoyers at efficios.com ] >
>>>>>>>>>>> wrote:
>>>>>>>>>>>> ----- On Mar 16, 2018, at 5:37 PM, Shehab Elsayed < [
>>>>>>>>>>>> mailto:shehabyomn at gmail.com | shehabyomn at gmail.com ] > wrote:
>>>>>>>>>>>>> Hello All,
>>>>>>>>>>>>> I am trying to instrument a pthread application using the provided pthread
>>>>>>>>>>>>> wrapper, but I sometimes run into a "Double free or corruption error ( fasttop
>>>>>>>>>>>>> )" error.
>>>>>>>>>>>> Please provide more information about the version of lttng-ust you are using,
>>>>>>>>>>>> and how you setup
>>>>>>>>>>>> your tracing session.
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Mathieu
>>>>>>>>>>>>> Here is a description of what I have tried and noticed:
>>>>>>>>>>>>> 1- The problem isn't consistent. It sometimes happen and sometimes works as
>>>>>>>>>>>>> expected.
>>>>>>>>>>>>> 2- From my experiments, the problem happens (more frequently at least) when
>>>>>>>>>>>>> adding performance counter contexts (I tried cycles, cpu _cycles and
>>>>>>>>>>>>> instructions).
>>>>>>>>>>>>> 3- I am testing using lu _ ncb from splash3 benchmark suite after setting LD _
>>>>>>>>>>>>> PRELOAD to use the pthread wrapper as described in the LTTng documents.
>>>>>>>>>>>>> 4- Here is the backtrace printed after exiting:
>>>>>>>>>>>>> ======= Backtrace : =========
>>>>>>>>>>>>> /lib64/ libc .so.6([Thread 0x7ffff5611700 ( LWP 97229) exited]
>>>>>>>>>>>>> / usr /local/lib/ liblttng - ust .so.0( lttng
>>>>>>>>>>>>> _destroy_context+0x35)[0x7ffff7471575]
>>>>>>>>>>>>> / usr /local/lib/ liblttng - ust .so.0( lttng
>>>>>>>>>>>>> _session_destroy+0x21c)[0x7ffff747363c]
>>>>>>>>>>>>> / usr /local/lib/ liblttng - ust .so.0(+0x1e906)[0x7ffff746d906]
>>>>>>>>>>>>> / usr /local/lib/ liblttng - ust .so.0( lttng _ ust _ objd _ unref
>>>>>>>>>>>>> +0x9f)[0x7ffff746dccf]
>>>>>>>>>>>>> / usr /local/lib/ liblttng - ust .so.0( lttng _ ust _ objd _ unref
>>>>>>>>>>>>> +0x9f)[0x7ffff746dccf]
>>>>>>>>>>>>> / usr /local/lib/ liblttng - ust .so.0( lttng _ ust _ objd _ unref
>>>>>>>>>>>>> +0x9f)[0x7ffff746dccf]
>>>>>>>>>>>>> / usr /local/lib/ liblttng - ust .so.0( lttng _ ust _ abi
>>>>>>>>>>>>> _exit+0x68)[0x7ffff746ead8]
>>>>>>>>>>>>> / usr /local/lib/ liblttng - ust .so.0(+0x191d3)[0x7ffff74681d3]
>>>>>>>>>>>>> / usr /local/lib/ liblttng - ust .so.0( lttng _ ust _exit+0x67)[0x7ffff745ed57]
>>>>>>>>>>>>> /lib64/ ld - linux -x86-64.so.2(+0xf85a)[0x7ffff7dec85a]
>>>>>>>>>>>>> /lib64/ libc .so.6(+0x38a49)[0x7ffff6ca6a49]
>>>>>>>>>>>>> /lib64/ libc .so.6(+0x38a95)[0x7ffff6ca6a95]
>>>>>>>>>>>>> / aenao -99/elsayed9/ LTTng /data/scripts/ tmp / lu _ ncb [0x401b51]
>>>>>>>>>>>>> /lib64/ libc .so.6(__ libc _start_main+0xf5)[0x7ffff6c8fb35]
>>>>>>>>>>>>> / aenao -99/elsayed9/ LTTng /data/scripts/ tmp / lu _ ncb [0x401c44]
>>>>>>>>>>>>> 5- Also, this is a backtrace I obtained from gdb :
>>>>>>>>>>>>> #0 0x00007ffff6eac1d7 in raise () from /lib64/ libc .so.6
>>>>>>>>>>>>> #1 0x00007ffff6ead8c8 in abort () from /lib64/ libc .so.6
>>>>>>>>>>>>> #2 0x00007ffff6eebf07 in __ libc _message () from /lib64/ libc .so.6
>>>>>>>>>>>>> #3 0x00007ffff6ef3503 in _int_free () from /lib64/ libc .so.6
>>>>>>>>>>>>> #4 0x00007ffff768ad25 in lttng _destroy_ perf _counter_field (
>>>>>>>>>>>>> field=<optimized out>) at lttng -context- perf -counters.c:418
>>>>>>>>>>>>> #5 0x00007ffff767a575 in lttng _destroy_context (
>>>>>>>>>>>>> ctx =0x7ffff0011090) at lttng -context.c:278
>>>>>>>>>>>>> #6 0x00007ffff767c63c in _ lttng _channel_ unmap (
>>>>>>>>>>>>> lttng _ chan =0x7ffff0010f40) at lttng -events.c:172
>>>>>>>>>>>>> #7 lttng _session_destroy (session=0x7ffff0000900)
>>>>>>>>>>>>> at lttng -events.c:247
>>>>>>>>>>>>> #8 0x00007ffff7676906 in lttng _release_session (
>>>>>>>>>>>>> objd =<optimized out>) at lttng - ust - abi .c:601
>>>>>>>>>>>>> #9 0x00007ffff7676ccf in lttng _ ust _ objd _ unref (id=1,
>>>>>>>>>>>>> is_owner=<optimized out>) at lttng - ust - abi .c:216
>>>>>>>>>>>>> #10 0x00007ffff7676ccf in lttng _ ust _ objd _ unref (id=2,
>>>>>>>>>>>>> is_owner=<optimized out>) at lttng - ust - abi .c:216
>>>>>>>>>>>>> #11 0x00007ffff7676ccf in lttng _ ust _ objd _ unref (id=id at entry=18,
>>>>>>>>>>>>> is_owner=is_owner at entry=1) at lttng - ust - abi .c:216
>>>>>>>>>>>>> #12 0x00007ffff7677ad8 in objd _table_destroy ()
>>>>>>>>>>>>> at lttng - ust - abi .c:235
>>>>>>>>>>>>> #13 lttng _ ust _ abi _exit () at lttng - ust - abi .c:1002
>>>>>>>>>>>>> #14 0x00007ffff76711d3 in lttng _ ust _cleanup (exiting=1)
>>>>>>>>>>>>> at lttng - ust -comm.c:1807
>>>>>>>>>>>>> #15 0x00007ffff7667d57 in lttng _ ust _exit ()
>>>>>>>>>>>>> at lttng - ust -comm.c:1874
>>>>>>>>>>>>> #16 0x00007ffff7dec85a in _ dl _ fini ()
>>>>>>>>>>>>> from /lib64/ ld - linux -x86-64.so.2
>>>>>>>>>>>>> #17 0x00007ffff6eafa49 in __run_exit_handlers ()
>>>>>>>>>>>>> from /lib64/ libc .so.6
>>>>>>>>>>>>> #18 0x00007ffff6eafa95 in exit () from /lib64/ libc .so.6
>>>>>>>>>>>>> #19 0x0000000000401b51 in main ( argc =<optimized out>,
>>>>>>>>>>>>> argv =<optimized out>) at lu .c:368
>>>>>>>>>>>>> Any ideas, why this happens and how to fix it?
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Shehab
>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>> lttng-dev mailing list
>>>>>>>>>>>>> [ mailto:lttng-dev at lists.lttng.org | lttng-dev at lists.lttng.org ]
>>>>>>>>>>>>> [ https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev |
>>>>>>>>>>>>> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev ]
>>>>>>>>>>>> --
>>>>>>>>>>>> Mathieu Desnoyers
>>>>>>>>>>>> EfficiOS Inc.
>>>>>>>>>>>> [ http://www.efficios.com/ | http://www.efficios.com ]
>>>>>>>>>> --
>>>>>>>>>> Mathieu Desnoyers
>>>>>>>>>> EfficiOS Inc.
>>>>>>>>>> [ http://www.efficios.com/ | http://www.efficios.com ]
>>>>>>>> --
>>>>>>>> Mathieu Desnoyers
>>>>>>>> EfficiOS Inc.
>>>>>>>> [ http://www.efficios.com/ | http://www.efficios.com ]
>>>>>>> --
>>>>>>> Mathieu Desnoyers
>>>>>>> EfficiOS Inc.
>>>>>>> [ http://www.efficios.com/ | http://www.efficios.com ]
>>>>> --
>>>>> Mathieu Desnoyers
>>>>> EfficiOS Inc.
>>>>> [ http://www.efficios.com/ | http://www.efficios.com ]
>>> --
>>> Mathieu Desnoyers
>>> EfficiOS Inc.
>>> http://www.efficios.com
>> --
>> Mathieu Desnoyers
>> EfficiOS Inc.
>> http://www.efficios.com
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.lttng.org/pipermail/lttng-dev/attachments/20180321/c531e4db/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Fix-perf-event-mutex-with-pthread-wrapper.patch
Type: text/x-patch
Size: 6243 bytes
Desc: not available
URL: <https://lists.lttng.org/pipermail/lttng-dev/attachments/20180321/c531e4db/attachment-0001.bin>
More information about the lttng-dev
mailing list