[lttng-dev] [RFC PATCH lttng-modules] Fix: nmi-safe clock on 32-bit systems
mathieu.desnoyers at efficios.com
Tue Feb 14 20:44:16 UTC 2017
----- On Feb 14, 2017, at 3:02 PM, Trent Piepho tpiepho at kymetacorp.com wrote:
> On Thu, 2017-02-09 at 20:51 -0500, mathieu.desnoyers wrote:
>> On 32-bit systems, the current algorithm assume to have one clock read
>> per 32-bit LSB overflow period, which is not guaranteed. It also has an
>> issue on the first clock reads after module load, because the initial
>> value for the last LSB is 0. It can cause the time to stay stuck at the
>> same value for a few seconds at the beginning of the trace.
> I tested this on my 32-bit ARM system and it has solved the issue with
> timestamp being fixed at zero until the first time the TSC wraps.
> I'm not sure how to properly test that NMIs do not generate any reverse
NMIs don't exist on ARM. They have FIQ which are close (fast interrupts),
but no NMIs. It would be a good thing to ensure that either FIQs are disabled
when the OS updates the clock, or that no tracing is ever done in FIQ
handlers. I don't think FIQs are disabled when updating the clock, but
I suspect that no tracing is done in FIQ handlers.
Also, lttng-modules, with that fix on ARM, just discard events that
would come from NMI context (and since nmi context don't exist on
ARM, it never discards anything).
>> Fix this by using the non-nmi-safe clock source on 32-bit systems,
>> except for x86 systems that support double-word cmpxchg (cmpxchg8b).
> Is there a reason to restrict this to x86 systems? ARM supports 64-bit
> cmpxchg too, and indeed that's what I was testing.
Since ARM don't have NMI, I don't see why we should care about making
the clock source nmi-safe, especially since the algorithm adds a
cmpxchg64_local (slight performance overhead). This might end up
being useful if we ever want to trace fast interrupt handlers though.
>> #if (LINUX_VERSION_CODE >= KERNEL_VERSION(3,17,0) \
>> + && (BITS_PER_LONG == 64 || CONFIG_X86_CMPXCHG64) \
>> && !defined(LTTNG_CLOCK_NMI_SAFE_BROKEN))
> This fails to compile on ARM, as CONFIG_X86_CMPXCHG64 is not defined.
> From what I can see, all ARM have cmpxchg64_local() so I modified this
> to allow ARM support, used defined(), etc.
> It also appears that even if CONFIG_X86_CMPXCHG64 is not defined on x86,
> there is still support via another definition of cmpxchg64_local() using
> cmpxchg8b_emu. Or does this version lack some atomic in the face of NMI
> ability that the cmpxchg8b instruction would have?
Indeed, cmpxchg8b_emu disables interrupts on UP within its emulation, which
is not safe against NMIs.
I am tempted to only use the nmi-safe clock source on architectures
where it matters, and ARM32/64 don't have NMIs. So I would use
the usual seqlock-protected clock source on ARM32/64, unless we
end up figuring out that FIQ handlers are instrumented with
For x86 32/64, when 64-bit local cmpxchg is natively available,
I would like to use the nmi-safe clock source. However, using
double-word CAS on x86-32 is something that would be a new behavior
in lttng-modules, and I don't want to introduce this as a fix into
the stable branches. This is why the commit that landed into master
currently simply revert to the non-nmi-safe clock source on x86-32,
and don't trace NMIs on x86-32 for now.
I would be OK to introduce use of double-word CAS on x86-32 as a new
feature for upcoming lttng-modules versions (2.10+). But I would only use it
when CONFIG_X86_CMPXCHG64 is defined: it tells us that the architecture
does have a 64-bit cmpxchg instruction.
More information about the lttng-dev