[lttng-dev] [tip:timers/urgent] timekeeping: Fix HRTICK related deadlock from ntp lock changes

Mathieu Desnoyers mathieu.desnoyers at efficios.com
Tue Sep 17 12:33:03 EDT 2013


* Ingo Molnar (mingo at kernel.org) wrote:
> 
> * Mathieu Desnoyers <mathieu.desnoyers at efficios.com> wrote:
> 
> > * Ingo Molnar (mingo at kernel.org) wrote:
> > > 
> > > * Mathieu Desnoyers <mathieu.desnoyers at efficios.com> wrote:
> > > 
> > > > Hi Ingo,
> > > > 
> > > > Do you have an estimate of the time it will take for this fix to hit 
> > > > mainline, stable-3.10 and stable-3.11? Meanwhile, I'm marking 3.10 and 
> > > > 3.11 as broken for LTTng with a kernel version check at compile time, since 
> > > > this kernel regression currently triggers a hard system lockup when people 
> > > > use LTTng on those kernels, and this is certainly something nobody 
> > > > wants.
> > > 
> > > So, at least as per the description of John, this should only trigger if 
> > > SCHED_HRTICK is enabled in sched_features - which is disabled by default, 
> > > it's a debug-only development feature. Does the bug trigger on more 
> > > regular kernels as well?
> > 
> > Unfortunately, it does happen on a pretty standard kernel config (giving
> > my x230 config as example below). Pasting relevant bug description from
> > http://bugs.lttng.org/issues/631 :
> > 
> > "Starting from Linux kernel commit
> > 06c017fdd4dc48451a29ac37fc1db4a3f86b7f40 "timekeeping: Hold
> > timekeepering locks in do_adjtimex and hardpps" (3.10 kernels), the
> > xtime write seqlock is held across calls to __do_adjtimex(), which
> > includes a call to notify_cmos_timer(), and hence
> > schedule_delayed_work().
> > 
> > This introduces a side-effect for a set of tracepoints, including mainly 
> > the workqueue tracepoints: a tracer hooking on those tracepoints and 
> > reading current time with ktime_get() will cause hard system LOCKUP"
> 
> It's the LTTng tracepoint 'hooking' in something that does something 
> invalid in that context that is causing the hang, not the vanilla kernel 
> itself, right?

Yes, that's correct. To make sure this class of problem is dealt with
for good, I've started working on a synchronization scheme proposed
by Peter Zijlstra that would allow ktime_get() to be called from any
execution context (see:
http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg504089.html).
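
For readers who want to see the lockup mechanism from the bug report in
isolation, here is a minimal user-space sketch. It is not kernel code:
every toy_* name, the tk_seq counter and the max_retries bound are
assumptions made up for illustration. It only models the pattern of a
reader retry loop being entered from inside the write side of the same
sequence lock, which is what a tracepoint probe calling ktime_get() ends
up doing on the do_adjtimex() path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the timekeeping sequence counter. */
struct toy_seqcount {
	unsigned int sequence;	/* odd while a writer is in its critical section */
};

static struct toy_seqcount tk_seq;
static long long toy_time_ns = 1000;

static void toy_write_begin(struct toy_seqcount *s)
{
	s->sequence++;			/* sequence becomes odd */
}

static void toy_write_end(struct toy_seqcount *s)
{
	assert(s->sequence & 1);	/* must pair with toy_write_begin() */
	s->sequence++;			/* sequence becomes even again */
}

/*
 * Reader shaped like a sequence-counter read loop: retry while a write
 * is in progress (odd sequence) or the sequence changed during the read.
 * A real reader has no retry limit; max_retries exists only so that this
 * demo terminates. Returns true if a consistent value was read.
 */
static bool toy_ktime_get(long long *out, int max_retries)
{
	for (int i = 0; i < max_retries; i++) {
		unsigned int start = tk_seq.sequence;

		if (start & 1)
			continue;	/* writer active: keep spinning */
		long long t = toy_time_ns;
		if (tk_seq.sequence == start) {
			*out = t;
			return true;
		}
	}
	return false;			/* an unbounded reader would spin forever */
}

/* Hypothetical tracer probe that timestamps every event it sees. */
static bool toy_tracepoint_probe(void)
{
	long long now;

	return toy_ktime_get(&now, 1000);
}

/*
 * Models the problematic path: the probe fires while the write side is
 * held, in the same execution context, so the writer can never finish.
 * Returns whether the probe managed to read the time.
 */
static bool toy_do_adjtimex(void)
{
	bool probe_ok;

	toy_write_begin(&tk_seq);
	toy_time_ns += 1;
	probe_ok = toy_tracepoint_probe();	/* e.g. a workqueue tracepoint */
	toy_write_end(&tk_seq);
	return probe_ok;
}
```

Calling toy_do_adjtimex() returns false, meaning the probe exhausted its
retries and an unbounded reader would have hung, while calling
toy_tracepoint_probe() on its own afterwards returns true.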

> 
> In that case the 'you get to keep both pieces' policy of out of tree code 
> applies - but the HRTICK fix should solve your problem as well, 
> incidentally.

Thanks,

Mathieu

> 
> Thanks,
> 
> 	Ingo

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com