[lttng-dev] Double free or corruption error (fasttop)

Mathieu Desnoyers mathieu.desnoyers at efficios.com
Wed Mar 21 16:22:58 EDT 2018


----- On Mar 21, 2018, at 1:55 PM, Shehab Elsayed <shehabyomn at gmail.com> wrote: 

> I am so sorry for the late replies.

> I have tried the first patch you sent and the problem still happens (although
> seemingly less frequently especially with debugging enabled!!!!). I have
> attached the output I got from one of the erroneous runs.

> I will try the updated patch and let you know how it goes.

> Also, I am not sure if these points make any difference:
> 1- The error actually happens at the end of the application. It actually
> finishes execution, but then something goes wrong.
> 2- I run into this problem only for some of the benchmarks (or at least the
> problems happens much less frequently for others that I didn't run into it, not
> yet)

> Thanks you very much, and sorry again for the late replies.

No worries! Looking through your log, I notice that pthread set cancel state has problems when 
called from application threads. We do not restore the original thread's cancel state. Can you try 
with the attached patch applied on top of the previous one ? 

Thanks, 

Mathieu 

> Shehab Y. Elsayed, MSc.
> PhD Student
> The Edwards S. Rogers Sr. Dept. of Electrical and Computer Engineering
> University of Toronto
> E-mail: [ https://webmail.rice.edu/imp/message.php?mailbox=INBOX&index=11# |
> shehabyomn at gmail.com ]

> On Wed, Mar 21, 2018 at 1:27 PM, Mathieu Desnoyers < [
> mailto:mathieu.desnoyers at efficios.com | mathieu.desnoyers at efficios.com ] >
> wrote:

>> ----- On Mar 20, 2018, at 5:42 PM, Mathieu Desnoyers < [
>> mailto:mathieu.desnoyers at efficios.com | mathieu.desnoyers at efficios.com ] >
>> wrote:

>>> ----- On Mar 20, 2018, at 4:58 PM, Mathieu Desnoyers < [
>>> mailto:mathieu.desnoyers at efficios.com | mathieu.desnoyers at efficios.com ] >
>>> wrote:

>>>> ----- On Mar 20, 2018, at 12:07 PM, Mathieu Desnoyers < [
>>>> mailto:mathieu.desnoyers at efficios.com | mathieu.desnoyers at efficios.com ] >
>>>> wrote:

>>>>> ----- On Mar 19, 2018, at 4:21 PM, Shehab Elsayed < [
>>>>> mailto:shehabyomn at gmail.com | shehabyomn at gmail.com ] > wrote:

>>>>>> I did "echo "-1" > /proc/sys/kernel/perf_event_paranoid " and made sure the
>>>>>> value was actually written by "cat /proc/sys/kernel/perf_event_paranoid"

>>>>>> It executed normally 2 times but then got the same error.

>>>>> Can you provide the output when reproducing the issue with the
>>>>> LTTNG_UST_DEBUG=1 environment variable set when starting
>>>>> your test program ?

>>>> I just noticed something that might explain what goes wrong here:

>>>> lttng-context-perf-counters.c: add_thread_field() grabs the ust_lock(). Pthread
>>>> mutex
>>>> in your case is instrumented with the pthread wrapper. This "add_thread_field"
>>>> is invoked
>>>> the first time the perf counter is hit by each given thread. When this happens,
>>>> the
>>>> instrumented pthread mutex will try to call into the instrumentation tracepoint
>>>> again,
>>>> which will call add_thread_field() (again), and so on until we reach the
>>>> libringbuffer
>>>> "lib_ring_buffer_nesting" threshold of 4 calls deep.

>>>> I suspect this situation where we recursively call add_thread_field is
>>>> unexpected,
>>>> which may trigger your double-free here.

>>>> Will investigate more.

>>> Can you try with the attached patch applied ?

>> Here is an updated v2 of the patch which tests the notrace tls counter sooner
>> (before evaluating
>> filter). Please let me know how it goes.

>> Thanks,

>> Mathieu

>>> Thanks,

>>> Mathieu

>>>> Thanks,

>>>> Mathieu

>>>>> Thanks,

>>>>> Mathieu

>>>>>> Shehab Y. Elsayed, MSc.
>>>>>> PhD Student
>>>>>> The Edwards S. Rogers Sr. Dept. of Electrical and Computer Engineering
>>>>>> University of Toronto
>>>>>> E-mail: [ https://webmail.rice.edu/imp/message.php?mailbox=INBOX&index=11# |
>>>>>> shehabyomn at gmail.com ]

>>>>>> On Mon, Mar 19, 2018 at 4:01 PM, Mathieu Desnoyers < [
>>>>>> mailto:mathieu.desnoyers at efficios.com | mathieu.desnoyers at efficios.com ] >
>>>>>> wrote:

>>>>>>> ----- On Mar 19, 2018, at 3:53 PM, Shehab Elsayed < [
>>>>>>> mailto:shehabyomn at gmail.com | shehabyomn at gmail.com ] > wrote:

>>>>>>>> cat /proc/sys/kernel/perf_event_paranoid ---> returns 1

>>>>>>>> I run the program as a normal user

>>>>>>>> The glibc version I get by running "ldd --version" is 2.17

>>>>>>> Can you reproduce the issue after you do this as root ?

>>>>>>> echo "-1" > /proc/sys/kernel/perf_event_paranoid

>>>>>>> Based on this documentation of the Linux kernel:

>>>>>>> Documentation/sysctl/kernel.txt:

>>>>>>> perf_event_paranoid:

>>>>>>> Controls use of the performance events system by unprivileged
>>>>>>> users (without CAP_SYS_ADMIN). The default value is 2.

>>>>>>> -1: Allow use of (almost) all events by all users
>>>>>>> Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
>>>>>>> >=0: Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN
>>>>>>> Disallow raw tracepoint access by users without CAP_SYS_ADMIN
>>>>>>> >=1: Disallow CPU event access by users without CAP_SYS_ADMIN
>>>>>>> >=2: Disallow kernel profiling by users without CAP_SYS_ADMIN

>>>>>>> Thanks,

>>>>>>> Mathieu

>>>>>>>> Shehab Y. Elsayed, MSc.
>>>>>>>> PhD Student
>>>>>>>> The Edwards S. Rogers Sr. Dept. of Electrical and Computer Engineering
>>>>>>>> University of Toronto
>>>>>>>> E-mail: [ https://webmail.rice.edu/imp/message.php?mailbox=INBOX&index=11# |
>>>>>>>> shehabyomn at gmail.com ]

>>>>>>>> On Mon, Mar 19, 2018 at 3:31 PM, Mathieu Desnoyers < [
>>>>>>>> mailto:mathieu.desnoyers at efficios.com | mathieu.desnoyers at efficios.com ] >
>>>>>>>> wrote:

>>>>>>>>> ---- On Mar 19, 2018, at 3:26 PM, Mathieu Desnoyers < [
>>>>>>>>> mailto:mathieu.desnoyers at efficios.com | mathieu.desnoyers at efficios.com ] >
>>>>>>>>> wrote:

>>>>>>>>>> ----- On Mar 19, 2018, at 2:26 PM, Shehab Elsayed < [
>>>>>>>>>> mailto:shehabyomn at gmail.com | shehabyomn at gmail.com ] > wrote:

>>>>>>>>>>> Yes, I tried with only one of those contexts and I still ran into the same
>>>>>>>>>>> problem.

>>>>>>>>>> What is the setting returned by

>>>>>>>>>> cat /proc/sys/kernel/perf_event_paranoid

>>>>>>>>>> on your system ? And do you run your test program as root or normal user ?

>>>>>>>>>> Please CC the mailing list on your reply.

>>>>>>>>> I will also need to know what glibc version you have on your system.

>>>>>>>>> Thanks,

>>>>>>>>> Mathieu

>>>>>>>>>> Thanks,

>>>>>>>>>> Mathieu

>>>>>>>>>>> Shehab Y. Elsayed, MSc.
>>>>>>>>>>> PhD Student
>>>>>>>>>>> The Edwards S. Rogers Sr. Dept. of Electrical and Computer Engineering
>>>>>>>>>>> University of Toronto
>>>>>>>>>>> E-mail: [ https://webmail.rice.edu/imp/message.php?mailbox=INBOX&index=11# |
>>>>>>>>>>> shehabyomn at gmail.com ]

>>>>>>>>>>> On Mon, Mar 19, 2018 at 2:24 PM, Mathieu Desnoyers < [
>>>>>>>>>>> mailto:mathieu.desnoyers at efficios.com | mathieu.desnoyers at efficios.com ] >
>>>>>>>>>>> wrote:

>>>>>>>>>>>> ----- On Mar 19, 2018, at 12:36 PM, Shehab Elsayed < [
>>>>>>>>>>>> mailto:shehabyomn at gmail.com | shehabyomn at gmail.com ] > wrote:

>>>>>>>>>>>>> Hi Mathieu,

>>>>>>>>>>>>> Thank you very much for your reply.

>>>>>>>>>>>>> I manually built lttng-ust from source (commit #:
>>>>>>>>>>>>> 8a208943e21700211beee3ea64180a5a534c7d2a).

>>>>>>>>>>>>> This is how I set up the tracing session:
>>>>>>>>>>>>> 1- lttng create lu_ncb_8_native -o {path}
>>>>>>>>>>>>> 2- lttng enable-event --userspace lttng_ust_pthread:pthread_mutex_lock_req
>>>>>>>>>>>>> lttng enable-event --userspace lttng_ust_pthread:pthread_mutex_lock_acq
>>>>>>>>>>>>> lttng enable-event --userspace lttng_ust_pthread:pthread_mutex_lock_trylock
>>>>>>>>>>>>> lttng enable-event --userspace lttng_ust_pthread:pthread_mutex_lock_unlock
>>>>>>>>>>>>> 3- lttng add-context -u -t procname
>>>>>>>>>>>>> lttng add-context -u -t vpid
>>>>>>>>>>>>> lttng add-context -u -t pthread_id
>>>>>>>>>>>>> lttng add-context -u -t vtid
>>>>>>>>>>>>> lttng add-context -u -t ip
>>>>>>>>>>>>> lttng add-context -u -t perf:thread:cpu-cycles
>>>>>>>>>>>>> lttng add-context -u -t perf:thread:cycles
>>>>>>>>>>>>> lttng add-context -u -t perf:thread:instructions
>>>>>>>>>>>>> 4- lttng start
>>>>>>>>>>>>> 5- LD_PRELOAD=/usr/local/lib/liblttng-ust-pthread-wrapper.so ./lu_ncb -p8 -n8096
>>>>>>>>>>>>> -b32
>>>>>>>>>>>>> 6- lttng stop
>>>>>>>>>>>>> 7- lttng destroy

>>>>>>>>>>>> Can you reproduce if you remove the following contexts ?

>>>>>>>>>>>> lttng add-context -u -t perf:thread:cpu-cycles
>>>>>>>>>>>> lttng add-context -u -t perf:thread:cycles
>>>>>>>>>>>> lttng add-context -u -t perf:thread:instructions

>>>>>>>>>>>> And if you only keep a single of those contexts ?

>>>>>>>>>>>> Thanks,

>>>>>>>>>>>> Mathieu

>>>>>>>>>>>>> Shehab Y. Elsayed, MSc.
>>>>>>>>>>>>> PhD Student
>>>>>>>>>>>>> The Edwards S. Rogers Sr. Dept. of Electrical and Computer Engineering
>>>>>>>>>>>>> University of Toronto
>>>>>>>>>>>>> E-mail: [ https://webmail.rice.edu/imp/message.php?mailbox=INBOX&index=11# |
>>>>>>>>>>>>> shehabyomn at gmail.com ]

>>>>>>>>>>>>> On Mon, Mar 19, 2018 at 12:21 PM, Mathieu Desnoyers < [
>>>>>>>>>>>>> mailto:mathieu.desnoyers at efficios.com | mathieu.desnoyers at efficios.com ] >
>>>>>>>>>>>>> wrote:

>>>>>>>>>>>>>> ----- On Mar 16, 2018, at 5:37 PM, Shehab Elsayed < [
>>>>>>>>>>>>>> mailto:shehabyomn at gmail.com | shehabyomn at gmail.com ] > wrote:

>>>>>>>>>>>>>>> Hello All,

>>>>>>>>>>>>>>> I am trying to instrument a pthread application using the provided pthread
>>>>>>>>>>>>>>> wrapper, but I sometimes run into a "Double free or corruption error ( fasttop
>>>>>>>>>>>>>>> )" error.

>>>>>>>>>>>>>> Please provide more information about the version of lttng-ust you are using,
>>>>>>>>>>>>>> and how you setup
>>>>>>>>>>>>>> your tracing session.

>>>>>>>>>>>>>> Thanks,

>>>>>>>>>>>>>> Mathieu

>>>>>>>>>>>>>>> Here is a description of what I have tried and noticed:
>>>>>>>>>>>>>>> 1- The problem isn't consistent. It sometimes happen and sometimes works as
>>>>>>>>>>>>>>> expected.
>>>>>>>>>>>>>>> 2- From my experiments, the problem happens (more frequently at least) when
>>>>>>>>>>>>>>> adding performance counter contexts (I tried cycles, cpu _cycles and
>>>>>>>>>>>>>>> instructions).
>>>>>>>>>>>>>>> 3- I am testing using lu _ ncb from splash3 benchmark suite after setting LD _
>>>>>>>>>>>>>>> PRELOAD to use the pthread wrapper as described in the LTTng documents.
>>>>>>>>>>>>>>> 4- Here is the backtrace printed after exiting:
>>>>>>>>>>>>>>> ======= Backtrace : =========
>>>>>>>>>>>>>>> /lib64/ libc .so.6([Thread 0x7ffff5611700 ( LWP 97229) exited]
>>>>>>>>>>>>>>> / usr /local/lib/ liblttng - ust .so.0( lttng
>>>>>>>>>>>>>>> _destroy_context+0x35)[0x7ffff7471575]
>>>>>>>>>>>>>>> / usr /local/lib/ liblttng - ust .so.0( lttng
>>>>>>>>>>>>>>> _session_destroy+0x21c)[0x7ffff747363c]
>>>>>>>>>>>>>>> / usr /local/lib/ liblttng - ust .so.0(+0x1e906)[0x7ffff746d906]
>>>>>>>>>>>>>>> / usr /local/lib/ liblttng - ust .so.0( lttng _ ust _ objd _ unref
>>>>>>>>>>>>>>> +0x9f)[0x7ffff746dccf]
>>>>>>>>>>>>>>> / usr /local/lib/ liblttng - ust .so.0( lttng _ ust _ objd _ unref
>>>>>>>>>>>>>>> +0x9f)[0x7ffff746dccf]
>>>>>>>>>>>>>>> / usr /local/lib/ liblttng - ust .so.0( lttng _ ust _ objd _ unref
>>>>>>>>>>>>>>> +0x9f)[0x7ffff746dccf]
>>>>>>>>>>>>>>> / usr /local/lib/ liblttng - ust .so.0( lttng _ ust _ abi
>>>>>>>>>>>>>>> _exit+0x68)[0x7ffff746ead8]
>>>>>>>>>>>>>>> / usr /local/lib/ liblttng - ust .so.0(+0x191d3)[0x7ffff74681d3]
>>>>>>>>>>>>>>> / usr /local/lib/ liblttng - ust .so.0( lttng _ ust _exit+0x67)[0x7ffff745ed57]
>>>>>>>>>>>>>>> /lib64/ ld - linux -x86-64.so.2(+0xf85a)[0x7ffff7dec85a]
>>>>>>>>>>>>>>> /lib64/ libc .so.6(+0x38a49)[0x7ffff6ca6a49]
>>>>>>>>>>>>>>> /lib64/ libc .so.6(+0x38a95)[0x7ffff6ca6a95]
>>>>>>>>>>>>>>> / aenao -99/elsayed9/ LTTng /data/scripts/ tmp / lu _ ncb [0x401b51]
>>>>>>>>>>>>>>> /lib64/ libc .so.6(__ libc _start_main+0xf5)[0x7ffff6c8fb35]
>>>>>>>>>>>>>>> / aenao -99/elsayed9/ LTTng /data/scripts/ tmp / lu _ ncb [0x401c44]
>>>>>>>>>>>>>>> 5- Also, this is a backtrace I obtained from gdb :
>>>>>>>>>>>>>>> #0 0x00007ffff6eac1d7 in raise () from /lib64/ libc .so.6
>>>>>>>>>>>>>>> #1 0x00007ffff6ead8c8 in abort () from /lib64/ libc .so.6
>>>>>>>>>>>>>>> #2 0x00007ffff6eebf07 in __ libc _message () from /lib64/ libc .so.6
>>>>>>>>>>>>>>> #3 0x00007ffff6ef3503 in _int_free () from /lib64/ libc .so.6
>>>>>>>>>>>>>>> #4 0x00007ffff768ad25 in lttng _destroy_ perf _counter_field (
>>>>>>>>>>>>>>> field=<optimized out>) at lttng -context- perf -counters.c:418
>>>>>>>>>>>>>>> #5 0x00007ffff767a575 in lttng _destroy_context (
>>>>>>>>>>>>>>> ctx =0x7ffff0011090) at lttng -context.c:278
>>>>>>>>>>>>>>> #6 0x00007ffff767c63c in _ lttng _channel_ unmap (
>>>>>>>>>>>>>>> lttng _ chan =0x7ffff0010f40) at lttng -events.c:172
>>>>>>>>>>>>>>> #7 lttng _session_destroy (session=0x7ffff0000900)
>>>>>>>>>>>>>>> at lttng -events.c:247
>>>>>>>>>>>>>>> #8 0x00007ffff7676906 in lttng _release_session (
>>>>>>>>>>>>>>> objd =<optimized out>) at lttng - ust - abi .c:601
>>>>>>>>>>>>>>> #9 0x00007ffff7676ccf in lttng _ ust _ objd _ unref (id=1,
>>>>>>>>>>>>>>> is_owner=<optimized out>) at lttng - ust - abi .c:216
>>>>>>>>>>>>>>> #10 0x00007ffff7676ccf in lttng _ ust _ objd _ unref (id=2,
>>>>>>>>>>>>>>> is_owner=<optimized out>) at lttng - ust - abi .c:216
>>>>>>>>>>>>>>> #11 0x00007ffff7676ccf in lttng _ ust _ objd _ unref (id=id at entry=18,
>>>>>>>>>>>>>>> is_owner=is_owner at entry=1) at lttng - ust - abi .c:216
>>>>>>>>>>>>>>> #12 0x00007ffff7677ad8 in objd _table_destroy ()
>>>>>>>>>>>>>>> at lttng - ust - abi .c:235
>>>>>>>>>>>>>>> #13 lttng _ ust _ abi _exit () at lttng - ust - abi .c:1002
>>>>>>>>>>>>>>> #14 0x00007ffff76711d3 in lttng _ ust _cleanup (exiting=1)
>>>>>>>>>>>>>>> at lttng - ust -comm.c:1807
>>>>>>>>>>>>>>> #15 0x00007ffff7667d57 in lttng _ ust _exit ()
>>>>>>>>>>>>>>> at lttng - ust -comm.c:1874
>>>>>>>>>>>>>>> #16 0x00007ffff7dec85a in _ dl _ fini ()
>>>>>>>>>>>>>>> from /lib64/ ld - linux -x86-64.so.2
>>>>>>>>>>>>>>> #17 0x00007ffff6eafa49 in __run_exit_handlers ()
>>>>>>>>>>>>>>> from /lib64/ libc .so.6
>>>>>>>>>>>>>>> #18 0x00007ffff6eafa95 in exit () from /lib64/ libc .so.6
>>>>>>>>>>>>>>> #19 0x0000000000401b51 in main ( argc =<optimized out>,
>>>>>>>>>>>>>>> argv =<optimized out>) at lu .c:368

>>>>>>>>>>>>>>> Any ideas, why this happens and how to fix it?

>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Shehab

>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>> lttng-dev mailing list
>>>>>>>>>>>>>>> [ mailto:lttng-dev at lists.lttng.org | lttng-dev at lists.lttng.org ]
>>>>>>>>>>>>>>> [ https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev |
>>>>>>>>>>>>>>> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev ]

>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Mathieu Desnoyers
>>>>>>>>>>>>>> EfficiOS Inc.
>>>>>>>>>>>>>> [ http://www.efficios.com/ | http://www.efficios.com ]

>>>>>>>>>>>> --
>>>>>>>>>>>> Mathieu Desnoyers
>>>>>>>>>>>> EfficiOS Inc.
>>>>>>>>>>>> [ http://www.efficios.com/ | http://www.efficios.com ]

>>>>>>>>>> --
>>>>>>>>>> Mathieu Desnoyers
>>>>>>>>>> EfficiOS Inc.
>>>>>>>>>> [ http://www.efficios.com/ | http://www.efficios.com ]

>>>>>>>>> --
>>>>>>>>> Mathieu Desnoyers
>>>>>>>>> EfficiOS Inc.
>>>>>>>>> [ http://www.efficios.com/ | http://www.efficios.com ]

>>>>>>> --
>>>>>>> Mathieu Desnoyers
>>>>>>> EfficiOS Inc.
>>>>>>> [ http://www.efficios.com/ | http://www.efficios.com ]

>>>>> --
>>>>> Mathieu Desnoyers
>>>>> EfficiOS Inc.
>>>>> [ http://www.efficios.com/ | http://www.efficios.com ]

>>>> --
>>>> Mathieu Desnoyers
>>>> EfficiOS Inc.
>>>> [ http://www.efficios.com/ | http://www.efficios.com ]

>>> --
>>> Mathieu Desnoyers
>>> EfficiOS Inc.
>>> [ http://www.efficios.com/ | http://www.efficios.com ]

>> --
>> Mathieu Desnoyers
>> EfficiOS Inc.
>> [ http://www.efficios.com/ | http://www.efficios.com ]

-- 
Mathieu Desnoyers 
EfficiOS Inc. 
http://www.efficios.com 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.lttng.org/pipermail/lttng-dev/attachments/20180321/ab77f065/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Fix-restore-original-thread-cancel-state.patch
Type: text/x-patch
Size: 3133 bytes
Desc: not available
URL: <https://lists.lttng.org/pipermail/lttng-dev/attachments/20180321/ab77f065/attachment-0001.bin>


More information about the lttng-dev mailing list