[lttng-dev] Crash on first run of target using liblttng-ust-cyg-profile.so, but subsequent runs succeed

Thomas Gleixner tglx at linutronix.de
Wed Mar 2 04:36:21 EST 2016


Mathieu,

On Wed, 2 Mar 2016, Mathieu Desnoyers wrote:
> > The difference here is that with a positive LTTNG_UST_REGISTER_TIMEOUT
> > value, we use sem_timedwait(), which behaves differently than
> > sem_wait() with respect to restarting when hit by a signal:
> > it uses sys_restart_syscall() (which is used when ERESTART_RESTARTBLOCK
> > is received).
> > 
> > Unfortunately, docker decided to blacklist this core system call:
> > https://github.com/docker/docker/blob/master/docs/security/seccomp.md
> > 
> > restart_syscall 	Don't allow containers to restart a syscall. Possible seccomp
> > bypass see: https://code.google.com/p/chromium/issues/detail?id=408827.
> > 
> > Someone should ask them why they really have to blacklist this, since it
> > will lead to various races with respect to signal handling in tons
> > of applications. It looks like the wrong fix for a security concern.

Well, it's not a fix. It's just silly. The chromium bug is closed as:

      Closing as WontFix -- no PoC, looks to be unreachable.

Find below my analysis from back then when google reported that. This resulted
in the commit 849151dd5481b "compat: nanosleep: Clarify error handling" in
mainline.

So nothing to see here than securuty folks being silly.

Thanks,

	tglx

8<----------------------

> > You're looking at the non-compat case.
> > COMPAT_SYSCALL_DEFINE2(nanosleep,... in kernel/compat.c only sets up
> > fn and .compat_rmtp.
> 
> Well, it actually sets all fields for the real restart case - because
> it calls (and depends on) hrtimer_nanosleep() doing the rest.
> 
> The problem is that hrtimer_nanosleep() doesn't actually set any of
> the restart block stuff for the non-restart cases.

Indeed, but it's subtly correct nevertheless for the compat nanosleep
case.

COMPAT_SYSCALL_DEFINE2(nanosleep, struct compat_timespec __user *, rqtp,
		       struct compat_timespec __user *, rmtp)
{
	struct timespec tu, rmt;
	long ret;

	if (compat_get_timespec(&tu, rqtp))
		return -EFAULT;

	ret = hrtimer_nanosleep(&tu,
				rmtp ? (struct timespec __user *)&rmt : NULL,
				HRTIMER_MODE_REL, CLOCK_MONOTONIC);

Here is the important stuff:

     hrtimer_nanosleep() is called with two variables on the kernel
     stack of the calling task: tu and rmt.

     And what's further important it's called with HRTIMER_MODE_REL.

And hrtimer_nanosleep() does:

	if (do_nanosleep())
	   return 0;

The timer expired, nothing to worry about.

	if (mode == HRTIMER_MODE_ABS) 
	   return -ERESTARTNOHAND;
  
Irrelevant for the compat case, due to mode = HRTIMER_MODE_REL. This
return code CANNOT happen if called from compat_sys_nanosleep(). If it
happens, we really don't have to worry about the rest anymore.
 	   
	if (rmtp && update_rmtp())
	   return -EFAULT;

If that update fails, then we have a way bigger problem than the half
filled in restart block because "rmtp" is on the kernel stack of the
calling task.

	fill_restart_block_completely();

	return -ERESTART_RESTARTBLOCK;

So we either return 0 or -ERESTART_RESTARTBLOCK with a completely
filled in restart block. So the below is correct:

	if (ret) {
		struct restart_block *restart
			= &current_thread_info()->restart_block;

		restart->fn = compat_nanosleep_restart;
		restart->nanosleep.compat_rmtp = rmtp;

		if (rmtp && compat_put_timespec(&rmt, rmtp))
			return -EFAULT;
	}

I admit, it's not really obvious, so a cosmetic documentary change is
definitely due.

Thanks,

	Thomas
	



More information about the lttng-dev mailing list