[lttng-dev] Crash on first run of target using liblttng-ust-cyg-profile.so, but subsequent runs succeed

Mathieu Desnoyers mathieu.desnoyers at efficios.com
Tue Mar 1 20:26:23 EST 2016


----- On Mar 1, 2016, at 1:50 PM, Michael Simons (VISUAL STUDIO) Michael.Simons at microsoft.com wrote:

> -----Original Message-----
> From: Mathieu Desnoyers [mailto:mathieu.desnoyers at efficios.com]
> Sent: Tuesday, March 1, 2016 11:45 AM
> To: Michael Simons (VISUAL STUDIO) <Michael.Simons at microsoft.com>; Michael
> Jeanson <mjeanson at efficios.com>
> Cc: lttng-dev <lttng-dev at lists.lttng.org>
> Subject: Re: [lttng-dev] Crash on first run of target using
> liblttng-ust-cyg-profile.so, but subsequent runs succeed
> 
> ----- On Mar 1, 2016, at 11:35 AM, Michael msimons at microsoft.com wrote:
> 
>> Mathieu Desnoyers <mathieu.desnoyers <at>
>> https://na01.safelinks.protection.outlook.com/?url=efficios.com&data=01%7c01%7cMichael.Simons%40microsoft.com%7cbd1b8e06dcb04eb3878c08d341f94a1c%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=8rdPStR0c7gJagtnMSVNViOtQct0Wx3RuxovxBLBFV4%3d>
>> writes:
>> 
>>> 
>>> 
>>> 
>>> 
>>> ----- On Feb 12, 2016, at 3:21 PM, Mathieu Desnoyers
>> <mathieu.desnoyers <at>
>> https://na01.safelinks.protection.outlook.com/?url=efficios.com&data=01%7c01%7cMichael.Simons%40microsoft.com%7cbd1b8e06dcb04eb3878c08d341f94a1c%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=8rdPStR0c7gJagtnMSVNViOtQct0Wx3RuxovxBLBFV4%3d>
>> wrote:
>>> 
>>> ----- On Feb 12, 2016, at 12:02 PM, Mathieu Desnoyers
>> <mathieu.desnoyers <at>
>> https://na01.safelinks.protection.outlook.com/?url=efficios.com&data=01%7c01%7cMichael.Simons%40microsoft.com%7cbd1b8e06dcb04eb3878c08d341f94a1c%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=8rdPStR0c7gJagtnMSVNViOtQct0Wx3RuxovxBLBFV4%3d>
>> wrote:
>>> 
>>> 
>>> ---- On Feb 11, 2016, at 9:01 AM, Sean Heelan <seanheelan <at>
>> https://na01.safelinks.protection.outlook.com/?url=gmail.com&data=01%7c01%7cMichael.Simons%40microsoft.com%7cbd1b8e06dcb04eb3878c08d341f94a1c%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=wd%2fv8rpVDvN5Y3cBldXn3gkPJNMYHYnT7WE5lLH0fvI%3d>
>> wrote:
>>> 
>>> Hi all,I am running a target within a Docker instance, and tracing
>> function execution using the latest LTTng release (2.7). The commands
>> I am issuing look as follows:
>>> ----
>>> 
>>> 
>>> lttng create cc_session -o bla
>>> lttng enable-event --userspace lttng_ust_cyg_profile:func_entry lttng
>>> start LD_PRELOAD=liblttng-ust-cyg-profile.so target lttng stop lttng
>>> destroy
>>> ----
>>> 
>>> When the target is executed it aborts with the following error:
>>> ----
>>> php: lttng-ust-comm.c:1582: lttng_ust_init: Assertion `!ret' failed.
>>> 
>>> ----
>>> If I rerun the command it then works fine. In fact, simply doing the
>> following within the Docker container demonstrates the issue:
>>> ----
>>> 
>>> LD_PRELOAD=liblttng-ust-cyg-profile.so ls
>>> LD_PRELOAD=liblttng-ust-cyg-profile.so ls
>>> 
>>> ----
>>> The first 'ls' will fail at the same point mentioned above, while the
>> second will succeed. Off the top of my head I'm struggling to come up
>> with an explanation as to what impact the first execution using
>> LD_PRELOAD would have on the second. Does it impact a shared lib cache
>> in some way, which I'm unaware of?
>>> Any assistance would be appreciated!
>>> 
>>> It appears that sem_timedwait() returns an unexpected error.
>>> 
>>> Can you add a ERROR("sem_timedwait"); just before the assert at line
>> 1582 in your liblttng-ust/lttng-ust-comm.c
>>> 
>>> and show us the output ?
>>> 
>>> Also, what is the value of your LTTNG_UST_REGISTER_TIMEOUT env. var. ?
>>> 
>>> 
>>> Also, what architecture, Linux distribution, and kernel
>>> 
>>> version are you using ?
>>> 
>>> 
>>> 
>>> I pushed the following fix in lttng-ust:
>>> 
>>> 
>>> commit 5cf81d53e305c81517e48e0dc91620a916cdc0ccAuthor: Mathieu
>> Desnoyers <mathieu.desnoyers <at>
>> https://na01.safelinks.protection.outlook.com/?url=efficios.com&data=0
>> 1%7c01%7cMichael.Simons%40microsoft.com%7cbd1b8e06dcb04eb3878c08d341f9
>> 4a1c%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=8rdPStR0c7gJagtnMSVN
>> ViOtQct0Wx3RuxovxBLBFV4%3d>Date:   Fri Feb 12
>> 15:44:10 2016 -0500    Fix: handle negative range for
>> LTTNG_UST_REGISTER_TIMEOUT        We should not consider values below
>> -1 as valid timeout values, this is    is unexpected and could lead to
>> EINVAL errors returned by sem_timedwait.        Signed-off-by: Mathieu
>> Desnoyers <mathieu.desnoyers <at>
>> https://na01.safelinks.protection.outlook.com/?url=efficios.com&data=0
>> 1%7c01%7cMichael.Simons%40microsoft.com%7cbd1b8e06dcb04eb3878c08d341f9
>> 4a1c%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=8rdPStR0c7gJagtnMSVN
>> ViOtQct0Wx3RuxovxBLBFV4%3d>
>>> 
>>> 
>>> Let me know if it helps,
>>> 
>>> 
>>> Thanks,
>>> 
>>> 
>>> Mathieu
>>> 
>> 
>> I am seeing the same issue that Sean originally reported within Docker
>> with the latest fix Mathieu made.  Setting LTTNG_UST_REGISTER_TIMEOUT
>> to any positive number cause the assert to fail. This behavior change
>> appears to have been introduced with Docker 1.10.2.  I logged a Docker
>> issue for this - https://github.com/docker/docker/issues/20818.
> 
> This is really weird. It looks like the futex system call would be blacklisted
> for some reason, which is really unexpected. This assert was due to an
> unexpected return value from sem_timedwait.
> 
> Michael Jeanson will do some testing with seccomp and lttng-ust.
> We'll try to confirm which system calls lttng-ust is using when tracing is
> active and inactive, so we can provide a system call dependency list to our
> users.
> 
> We'll also revisit the failure modes for unavailable system calls:
> we should not assert, but rather print an error to the user, and bail out.
> 
> If you can share with us the glibc version you are using, and the seccomp
> whitelist your docker environment is using, it would be very helpful in
> reproducing the issue.
> 
> Thanks,
> 
> Mathieu
> 
> 
>> _______________________________________________
>> lttng-dev mailing list
>> lttng-dev at lists.lttng.org
>> https://na01.safelinks.protection.outlook.com/?url=http%3a%2f%2flists.
>> lttng.org%2fcgi-bin%2fmailman%2flistinfo%2flttng-dev&data=01%7c01%7cMi
>> chael.Simons%40microsoft.com%7cbd1b8e06dcb04eb3878c08d341f94a1c%7c72f9
>> 88bf86f141af91ab2d7cd011db47%7c1&sdata=cpnATVV0zmwGfFrFb4erUhoNVYR43oa
>> WtT9pQZCTunU%3d
> 
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> https://na01.safelinks.protection.outlook.com/?url=http%3a%2f%2fwww.efficios.com&data=01%7c01%7cMichael.Simons%40microsoft.com%7cbd1b8e06dcb04eb3878c08d341f94a1c%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=GFkYADPxz3FUtmmdLTB845pDwyk4Qsqk3LgzvXyTlV8%3d
> 
> Thanks for the response Mathieu.
> 
> The dependency list will be very helpful for Docker users.  Additionally,
> getting a better diagnostic (error message) experience sounds great.
> 
> Here are the two things you requested:
> 
> glibc version:
> root at b3f0e6edf45b:/# ldd --version
> ldd (Ubuntu EGLIBC 2.19-0ubuntu6.6) 2.19
> Copyright (C) 2014 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions.  There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> Written by Roland McGrath and Ulrich Drepper.
> 
> Seccomp whitelist:
> Here is the default docker seccomp configuration -
> https://github.com/docker/docker/blob/master/profiles/seccomp/default.json
> 
> The Docker documentation here list which syscalls are implicitly blocked (by not
> being on the whitelist) and the reason for it -
> https://github.com/docker/docker/blob/master/docs/security/seccomp.md
> 
> Let me know if you need anything else.

Hrm, we found the culprit, and it seems to be an "interesting"
rabbit hole.

Comparing strace of a simple application with lttng-ust
(tracing not enabled) with the UST register timeout at
-1 and 3000:

LTTNG_UST_REGISTER_TIMEOUT=-1
[pid    23] set_robust_list(0x7fb32047c9e0, 24) = 0
[pid    23] umask(037777777137)         = 022
[pid    23] open(0x7fb32047b2d0, O_RDWR|O_CREAT|O_NOFOLLOW|O_CLOEXEC, 0640) = 3
[pid    23] ftruncate(3, 4096)          = 0
[pid    23] exit_group(0)               = ?
[pid    23] +++ exited with 0 +++
[pid    20] <... futex resumed> )       = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
[pid    20] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=23, si_status=0, si_utime=0, si_stime=0} ---
[pid    20] futex(0x7fb321f6e000, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>

default (3000)
[pid    23] set_robust_list(0x7f4ddaad79e0, 24) = 0
[pid    23] umask(037777777137)         = 022
[pid    23] open(0x7f4ddaad62d0, O_RDWR|O_CREAT|O_NOFOLLOW|O_CLOEXEC, 0640) = 3
[pid    23] ftruncate(3, 4096)          = 0
[pid    23] exit_group(0)               = ?
[pid    23] +++ exited with 0 +++
[pid    20] <... futex resumed> )       = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
[pid    20] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=23, si_status=0, si_utime=0, si_stime=0} ---
[ Process PID=20 runs in x32 mode. ]
[ Process PID=20 runs in 64 bit mode. ]
[pid    20] write(2, 0x7ffcd134c8e0, 68ls: lttng-ust-comm.c:1585: lttng_ust_init: Assertion `!ret' failed.

The difference here is that with a positive LTTNG_UST_REGISTER_TIMEOUT
value, we use sem_timedwait(), which behaves differently than
sem_wait() with respect to restarting when hit by a signal:
it uses sys_restart_syscall() (which is used when ERESTART_RESTARTBLOCK
is received).

Unfortunately, docker decided to blacklist this core system call:
https://github.com/docker/docker/blob/master/docs/security/seccomp.md

restart_syscall 	Don't allow containers to restart a syscall. Possible seccomp bypass see: https://code.google.com/p/chromium/issues/detail?id=408827.

Someone should ask them why they really have to blacklist this, since it
will lead to various races with respect to signal handling in tons
of applications. It looks like the wrong fix for a security concern.

Thanks,

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com



More information about the lttng-dev mailing list