[lttng-dev] Xeon Phi memory barriers

Simon Marchi simon.marchi at polymtl.ca
Tue Nov 19 10:26:06 EST 2013


Hello there,

liburcu does not build on the Intel Xeon Phi, because the chip is
recognized as x86_64, but lacks the {s,l,m}fence instructions found on
usual x86_64 processors. The following is taken from the Xeon Phi dev
guide:

The Intel® Xeon Phi™ coprocessor memory model is the same as that of
the Intel® Pentium processor. The reads and writes always appear in
programmed order at the system bus (or the ring interconnect in the
case of the Intel® Xeon Phi™ coprocessor); the exception being that
read misses are permitted to go ahead of buffered writes on the system
bus when all the buffered writes are cached hits and are, therefore,
not directed to the same address being accessed by the read miss.

As a consequence of its stricter memory ordering model, the Intel®
Xeon Phi™ coprocessor does not support the SFENCE, LFENCE, and MFENCE
instructions that provide a more efficient way of controlling memory
ordering on other Intel processors.

While reads and writes from an Intel® Xeon Phi™ coprocessor appear in
program order on the system bus, the compiler can still reorder
unrelated memory operations while maintaining program order on a
single Intel® Xeon Phi™ coprocessor (hardware thread). If software
running on an Intel® Xeon Phi™ coprocessor is dependent on the order
of memory operations on another Intel® Xeon Phi™ coprocessor then a
serializing instruction (e.g., CPUID, instruction with a LOCK prefix)
between the memory operations is required to guarantee completion of
all memory accesses issued prior to the serializing instruction before
any subsequent memory operations are started.

(end of quote)

From what I understand, it is safe to leave out any run-time memory
barriers, but we still need barriers that prevent the compiler from
reordering (using __asm__ __volatile__ ("":::"memory")). In
urcu/arch/x86.h, I see that when CONFIG_RCU_HAVE_FENCE is false,
memory barriers result in both compile-time and run-time memory
barriers:  __asm__ __volatile__ ("lock; addl $0,0(%%esp)":::"memory").
I guess this would work for the Phi, but the lock-prefixed instruction
does not seem necessary.
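
To make the two flavours concrete, here is a rough sketch of the
difference, not the actual liburcu code. All sketch_* names and
SKETCH_HAVE_FENCE are made up for illustration (SKETCH_HAVE_FENCE only
stands in for CONFIG_RCU_HAVE_FENCE), and the non-x86 fallback to
__sync_synchronize() is there just so the sketch builds elsewhere:

```c
/* Compile-time barrier only: emits no instruction, but the compiler
 * may not move memory accesses across it. */
#define sketch_barrier() __asm__ __volatile__ ("" ::: "memory")

/* Compile-time + run-time barrier. SKETCH_HAVE_FENCE is a hypothetical
 * switch for this sketch; leaving it undefined selects the lock-prefixed
 * fallback, which does not need mfence. */
#if defined(__x86_64__) && defined(SKETCH_HAVE_FENCE)
#define sketch_mb() __asm__ __volatile__ ("mfence" ::: "memory")
#elif defined(__x86_64__)
#define sketch_mb() \
	__asm__ __volatile__ ("lock; addl $0,0(%%rsp)" ::: "memory", "cc")
#elif defined(__i386__)
#define sketch_mb() \
	__asm__ __volatile__ ("lock; addl $0,0(%%esp)" ::: "memory", "cc")
#else
/* Assumption: generic GCC builtin on other architectures. */
#define sketch_mb() __sync_synchronize()
#endif

static int sketch_payload;
static int sketch_ready;

/* Writer: store the payload, then the flag, with a full barrier between. */
static void sketch_publish(int value)
{
	sketch_payload = value;
	sketch_mb();
	sketch_ready = 1;
}

/* Reader: returns -1 until the flag is set. Only the compiler is
 * constrained here; no barrier instruction is emitted. */
static int sketch_consume(void)
{
	if (!sketch_ready)
		return -1;
	sketch_barrier();
	return sketch_payload;
}

/* Single-threaded round trip, just to exercise both macros. */
static int sketch_roundtrip(int value)
{
	sketch_publish(value);
	return sketch_consume();
}
```

Building this without SKETCH_HAVE_FENCE corresponds to the lock-prefixed
path I quoted above. Whether sketch_barrier() alone would be enough on
the Phi is exactly what I am unsure about.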

So, should we just set CONFIG_RCU_HAVE_FENCE to false when compiling
for the Phi and go on with our lives, or should we add a specific
config for this case?

Simon


