[lttng-dev] Crash on first run of target using liblttng-ust-cyg-profile.so, but subsequent runs succeed

Mathieu Desnoyers mathieu.desnoyers at efficios.com
Wed Mar 2 04:31:58 EST 2016


----- On Mar 1, 2016, at 8:26 PM, Mathieu Desnoyers mathieu.desnoyers at efficios.com wrote:

> ----- On Mar 1, 2016, at 1:50 PM, Michael Simons (VISUAL STUDIO)
> Michael.Simons at microsoft.com wrote:
> 
>> -----Original Message-----
>> From: Mathieu Desnoyers [mailto:mathieu.desnoyers at efficios.com]
>> Sent: Tuesday, March 1, 2016 11:45 AM
>> To: Michael Simons (VISUAL STUDIO) <Michael.Simons at microsoft.com>; Michael
>> Jeanson <mjeanson at efficios.com>
>> Cc: lttng-dev <lttng-dev at lists.lttng.org>
>> Subject: Re: [lttng-dev] Crash on first run of target using
>> liblttng-ust-cyg-profile.so, but subsequent runs succeed
>> 
>> ----- On Mar 1, 2016, at 11:35 AM, Michael msimons at microsoft.com wrote:
>> 
>>> Mathieu Desnoyers <mathieu.desnoyers <at>
>>> https://na01.safelinks.protection.outlook.com/?url=efficios.com&data=01%7c01%7cMichael.Simons%40microsoft.com%7cbd1b8e06dcb04eb3878c08d341f94a1c%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=8rdPStR0c7gJagtnMSVNViOtQct0Wx3RuxovxBLBFV4%3d>
>>> writes:
>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> ----- On Feb 12, 2016, at 3:21 PM, Mathieu Desnoyers
>>> <mathieu.desnoyers <at>
>>> https://na01.safelinks.protection.outlook.com/?url=efficios.com&data=01%7c01%7cMichael.Simons%40microsoft.com%7cbd1b8e06dcb04eb3878c08d341f94a1c%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=8rdPStR0c7gJagtnMSVNViOtQct0Wx3RuxovxBLBFV4%3d>
>>> wrote:
>>>> 
>>>> ----- On Feb 12, 2016, at 12:02 PM, Mathieu Desnoyers
>>> <mathieu.desnoyers <at>
>>> https://na01.safelinks.protection.outlook.com/?url=efficios.com&data=01%7c01%7cMichael.Simons%40microsoft.com%7cbd1b8e06dcb04eb3878c08d341f94a1c%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=8rdPStR0c7gJagtnMSVNViOtQct0Wx3RuxovxBLBFV4%3d>
>>> wrote:
>>>> 
>>>> 
>>>> ---- On Feb 11, 2016, at 9:01 AM, Sean Heelan <seanheelan <at>
>>> https://na01.safelinks.protection.outlook.com/?url=gmail.com&data=01%7c01%7cMichael.Simons%40microsoft.com%7cbd1b8e06dcb04eb3878c08d341f94a1c%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=wd%2fv8rpVDvN5Y3cBldXn3gkPJNMYHYnT7WE5lLH0fvI%3d>
>>> wrote:
>>>> 
>>>> Hi all,I am running a target within a Docker instance, and tracing
>>> function execution using the latest LTTng release (2.7). The commands
>>> I am issuing look as follows:
>>>> ----
>>>> 
>>>> 
>>>> lttng create cc_session -o bla
>>>> lttng enable-event --userspace lttng_ust_cyg_profile:func_entry lttng
>>>> start LD_PRELOAD=liblttng-ust-cyg-profile.so target lttng stop lttng
>>>> destroy
>>>> ----
>>>> 
>>>> When the target is executed it aborts with the following error:
>>>> ----
>>>> php: lttng-ust-comm.c:1582: lttng_ust_init: Assertion `!ret' failed.
>>>> 
>>>> ----
>>>> If I rerun the command it then works fine. In fact, simply doing the
>>> following within the Docker container demonstrates the issue:
>>>> ----
>>>> 
>>>> LD_PRELOAD=liblttng-ust-cyg-profile.so ls
>>>> LD_PRELOAD=liblttng-ust-cyg-profile.so ls
>>>> 
>>>> ----
>>>> The first 'ls' will fail at the same point mentioned above, while the
>>> second will succeed. Off the top of my head I'm struggling to come up
>>> with an explanation as to what impact the first execution using
>>> LD_PRELOAD would have on the second. Does it impact a shared lib cache
>>> in some way, which I'm unaware of?
>>>> Any assistance would be appreciated!
>>>> 
>>>> It appears that sem_timedwait() returns an unexpected error.
>>>> 
>>>> Can you add a ERROR("sem_timedwait"); just before the assert at line
>>> 1582 in your liblttng-ust/lttng-ust-comm.c
>>>> 
>>>> and show us the output ?
>>>> 
>>>> Also, what is the value of your LTTNG_UST_REGISTER_TIMEOUT env. var. ?
>>>> 
>>>> 
>>>> Also, what architecture, Linux distribution, and kernel
>>>> 
>>>> version are you using ?
>>>> 
>>>> 
>>>> 
>>>> I pushed the following fix in lttng-ust:
>>>> 
>>>> 
>>>> commit 5cf81d53e305c81517e48e0dc91620a916cdc0ccAuthor: Mathieu
>>> Desnoyers <mathieu.desnoyers <at>
>>> https://na01.safelinks.protection.outlook.com/?url=efficios.com&data=0
>>> 1%7c01%7cMichael.Simons%40microsoft.com%7cbd1b8e06dcb04eb3878c08d341f9
>>> 4a1c%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=8rdPStR0c7gJagtnMSVN
>>> ViOtQct0Wx3RuxovxBLBFV4%3d>Date:   Fri Feb 12
>>> 15:44:10 2016 -0500    Fix: handle negative range for
>>> LTTNG_UST_REGISTER_TIMEOUT        We should not consider values below
>>> -1 as valid timeout values, this is    is unexpected and could lead to
>>> EINVAL errors returned by sem_timedwait.        Signed-off-by: Mathieu
>>> Desnoyers <mathieu.desnoyers <at>
>>> https://na01.safelinks.protection.outlook.com/?url=efficios.com&data=0
>>> 1%7c01%7cMichael.Simons%40microsoft.com%7cbd1b8e06dcb04eb3878c08d341f9
>>> 4a1c%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=8rdPStR0c7gJagtnMSVN
>>> ViOtQct0Wx3RuxovxBLBFV4%3d>
>>>> 
>>>> 
>>>> Let me know if it helps,
>>>> 
>>>> 
>>>> Thanks,
>>>> 
>>>> 
>>>> Mathieu
>>>> 
>>> 
>>> I am seeing the same issue that Sean originally reported within Docker
>>> with the latest fix Mathieu made.  Setting LTTNG_UST_REGISTER_TIMEOUT
>>> to any positive number cause the assert to fail. This behavior change
>>> appears to have been introduced with Docker 1.10.2.  I logged a Docker
>>> issue for this - https://github.com/docker/docker/issues/20818.
>> 
>> This is really weird. It looks like the futex system call would be blacklisted
>> for some reason, which is really unexpected. This assert was due to an
>> unexpected return value from sem_timedwait.
>> 
>> Michael Jeanson will do some testing with seccomp and lttng-ust.
>> We'll try to confirm which system calls lttng-ust is using when tracing is
>> active and inactive, so we can provide a system call dependency list to our
>> users.
>> 
>> We'll also revisit the failure modes for unavailable system calls:
>> we should not assert, but rather print an error to the user, and bail out.
>> 
>> If you can share with us the glibc version you are using, and the seccomp
>> whitelist your docker environment is using, it would be very helpful in
>> reproducing the issue.
>> 
>> Thanks,
>> 
>> Mathieu
>> 
>> 
>>> _______________________________________________
>>> lttng-dev mailing list
>>> lttng-dev at lists.lttng.org
>>> https://na01.safelinks.protection.outlook.com/?url=http%3a%2f%2flists.
>>> lttng.org%2fcgi-bin%2fmailman%2flistinfo%2flttng-dev&data=01%7c01%7cMi
>>> chael.Simons%40microsoft.com%7cbd1b8e06dcb04eb3878c08d341f94a1c%7c72f9
>>> 88bf86f141af91ab2d7cd011db47%7c1&sdata=cpnATVV0zmwGfFrFb4erUhoNVYR43oa
>>> WtT9pQZCTunU%3d
>> 
>> --
>> Mathieu Desnoyers
>> EfficiOS Inc.
>> https://na01.safelinks.protection.outlook.com/?url=http%3a%2f%2fwww.efficios.com&data=01%7c01%7cMichael.Simons%40microsoft.com%7cbd1b8e06dcb04eb3878c08d341f94a1c%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=GFkYADPxz3FUtmmdLTB845pDwyk4Qsqk3LgzvXyTlV8%3d
>> 
>> Thanks for the response Mathieu.
>> 
>> The dependency list will be very helpful for Docker users.  Additionally,
>> getting a better diagnostic (error message) experience sounds great.
>> 
>> Here are the two things you requested:
>> 
>> glibc version:
>> root at b3f0e6edf45b:/# ldd --version
>> ldd (Ubuntu EGLIBC 2.19-0ubuntu6.6) 2.19
>> Copyright (C) 2014 Free Software Foundation, Inc.
>> This is free software; see the source for copying conditions.  There is NO
>> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>> Written by Roland McGrath and Ulrich Drepper.
>> 
>> Seccomp whitelist:
>> Here is the default docker seccomp configuration -
>> https://github.com/docker/docker/blob/master/profiles/seccomp/default.json
>> 
>> The Docker documentation here list which syscalls are implicitly blocked (by not
>> being on the whitelist) and the reason for it -
>> https://github.com/docker/docker/blob/master/docs/security/seccomp.md
>> 
>> Let me know if you need anything else.
> 
> Hrm, we found the culprit, and it seems to be an "interesting"
> rabbit hole.
> 
> Comparing strace of a simple application with lttng-ust
> (tracing not enabled) with the UST register timeout at
> -1 and 3000:
> 
> LTTNG_UST_REGISTER_TIMEOUT=-1
> [pid    23] set_robust_list(0x7fb32047c9e0, 24) = 0
> [pid    23] umask(037777777137)         = 022
> [pid    23] open(0x7fb32047b2d0, O_RDWR|O_CREAT|O_NOFOLLOW|O_CLOEXEC, 0640) = 3
> [pid    23] ftruncate(3, 4096)          = 0
> [pid    23] exit_group(0)               = ?
> [pid    23] +++ exited with 0 +++
> [pid    20] <... futex resumed> )       = ? ERESTARTSYS (To be restarted if
> SA_RESTART is set)
> [pid    20] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=23,
> si_status=0, si_utime=0, si_stime=0} ---
> [pid    20] futex(0x7fb321f6e000, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
> 
> default (3000)
> [pid    23] set_robust_list(0x7f4ddaad79e0, 24) = 0
> [pid    23] umask(037777777137)         = 022
> [pid    23] open(0x7f4ddaad62d0, O_RDWR|O_CREAT|O_NOFOLLOW|O_CLOEXEC, 0640) = 3
> [pid    23] ftruncate(3, 4096)          = 0
> [pid    23] exit_group(0)               = ?
> [pid    23] +++ exited with 0 +++
> [pid    20] <... futex resumed> )       = ? ERESTART_RESTARTBLOCK (Interrupted
> by signal)
> [pid    20] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=23,
> si_status=0, si_utime=0, si_stime=0} ---
> [ Process PID=20 runs in x32 mode. ]
> [ Process PID=20 runs in 64 bit mode. ]
> [pid    20] write(2, 0x7ffcd134c8e0, 68ls: lttng-ust-comm.c:1585:
> lttng_ust_init: Assertion `!ret' failed.
> 
> The difference here is that with a positive LTTNG_UST_REGISTER_TIMEOUT
> value, we use sem_timedwait(), which behaves differently than
> sem_wait() with respect to restarting when hit by a signal:
> it uses sys_restart_syscall() (which is used when ERESTART_RESTARTBLOCK
> is received).
> 
> Unfortunately, docker decided to blacklist this core system call:
> https://github.com/docker/docker/blob/master/docs/security/seccomp.md
> 
> restart_syscall 	Don't allow containers to restart a syscall. Possible seccomp
> bypass see: https://code.google.com/p/chromium/issues/detail?id=408827.
> 
> Someone should ask them why they really have to blacklist this, since it
> will lead to various races with respect to signal handling in tons
> of applications. It looks like the wrong fix for a security concern.

Adding Thomas Gleixner in CC.

Mathieu

> 
> Thanks,
> 
> Mathieu
> 
> 
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com



More information about the lttng-dev mailing list