[lttng-dev] Xeon Phi memory barriers

Fri Dec 6 16:40:45 EST 2013

On Fri, Dec 06, 2013 at 08:15:38PM +0000, Mathieu Desnoyers wrote:
> ----- Original Message -----
> > From: "Simon Marchi" <simon.marchi at polymtl.ca>
> > To: lttng-dev at lists.lttng.org
> > Sent: Tuesday, November 19, 2013 4:26:06 PM
> > Subject: [lttng-dev] Xeon Phi memory barriers
> > 
> > Hello there,
> 
> Hi Simon,
> 
> While reading this reply, please keep in mind that I've just been
> through a full week of meetings, and it's late on Friday evening here.
> So YMMV ;-) I'm CCing Paul E. McKenney, so he can debunk my answer :)
> 
> > 
> > liburcu does not build on the Intel Xeon Phi, because the chip is
> > recognized as x86_64, but lacks the {s,l,m}fence instructions found on
> > usual x86_64 processors. The following is taken from the Xeon Phi dev
> > guide:
> 
> Let's have a look:
> 
> > 
> > The Intel® Xeon Phi™ coprocessor memory model is the same as that of
> > the Intel® Pentium processor. The reads and writes always appear in
> > programmed order at the system bus (or the ring interconnect in the
> > case of the Intel® Xeon Phi™ coprocessor); the exception being that
> > read misses are permitted to go ahead of buffered writes on the system
> > bus when all the buffered writes are cached hits and are, therefore,
> > not directed to the same address being accessed by the read miss.
> 
> OK, so reads can be reordered with respect to following writes.

That would be -preceding- writes, correct?

> > As a consequence of its stricter memory ordering model, the Intel®
> > Xeon Phi™ coprocessor does not support the SFENCE, LFENCE, and MFENCE
> > instructions that provide a more efficient way of controlling memory
> > ordering on other Intel processors.
> 
> I guess sfence and lfence are indeed completely useless, because the only
> ordering we can ever care about is reads vs writes (mfence). But even
> mfence is not there.

The usual approach is an atomic operation to a dummy location on the
stack.  Is that the recommendation for Xeon Phi?

Either way, what should userspace RCU do to detect that it is being built
on a Xeon Phi?  I am sure that Mathieu would welcome the relevant patches
for this.  ;-)
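
One possible compile-time check, sketched here under assumptions: the
Intel compiler predefines __MIC__ when targeting the coprocessor
natively, so a header could key off that macro.  The SKETCH_ and
sketch_ names below are hypothetical and are not part of liburcu, and
other toolchains may well need a different test.

```c
/*
 * Sketch only: compile-time detection of a native Xeon Phi build.
 * Assumption: the compiler predefines __MIC__ for this target (the
 * Intel compiler does so when building native coprocessor code).
 * SKETCH_BUILT_FOR_XEON_PHI is a hypothetical name, not a liburcu macro.
 */
#if defined(__MIC__)
#define SKETCH_BUILT_FOR_XEON_PHI 1
#else
#define SKETCH_BUILT_FOR_XEON_PHI 0
#endif

/* Returns 1 when this translation unit was built for Xeon Phi, else 0. */
static inline int sketch_built_for_xeon_phi(void)
{
	return SKETCH_BUILT_FOR_XEON_PHI;
}
```

A configure-time probe that tries to assemble an mfence would be another
option, and would not depend on any particular compiler's predefines.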

> > While reads and writes from an Intel® Xeon Phi™ coprocessor appear in
> > program order on the system bus,
> 
> This part of the sentence seems misleading to me. Didn't the first
> sentence state the opposite ? "the exception being that
> read misses are permitted to go ahead of buffered writes on the system
> bus when all the buffered writes are cached hits and are, therefore,
> not directed to the same address being accessed by the read miss."
> 
> I'm probably missing something.

The trick might be that read misses are only allowed to pass write
-hits-, which would mean that the system bus would have already seen
the invalidate corresponding to the delayed write, and thus would
have no evidence of any misordering.

> > the compiler can still reorder
> > unrelated memory operations while maintaining program order on a
> > single Intel® Xeon Phi™ coprocessor (hardware thread). If software
> > running on an Intel® Xeon Phi™ coprocessor is dependent on the order
> > of memory operations on another Intel® Xeon Phi™ coprocessor then a
> > serializing instruction (e.g., CPUID, instruction with a LOCK prefix)
> > between the memory operations is required to guarantee completion of
> > all memory accesses issued prior to the serializing instruction before
> > any subsequent memory operations are started.

OK, sounds like my guess of atomic instruction to dummy stack location
is correct, or perhaps carrying out a nearby assignment using an
xchg instruction.
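
For concreteness, a rough sketch of both variants in GCC-style inline
assembly.  The sketch_ names are made up for illustration, the 64-bit
form uses %rsp where the 32-bit form quoted below uses %esp, and the
non-x86 fallback to compiler builtins is only there so that the
fragment builds elsewhere.

```c
/*
 * Sketch only: two ways to get a full barrier without mfence.
 * All sketch_* names are hypothetical, not liburcu API.
 */
#if defined(__x86_64__)
/* Locked add of zero to the top of the stack: memory is left unchanged. */
#define sketch_smp_mb() \
	__asm__ __volatile__ ("lock; addl $0,0(%%rsp)" ::: "memory", "cc")
#elif defined(__i386__)
#define sketch_smp_mb() \
	__asm__ __volatile__ ("lock; addl $0,0(%%esp)" ::: "memory", "cc")
#else
/* Assumption: on non-x86 targets, fall back to the compiler's full barrier. */
#define sketch_smp_mb() __sync_synchronize()
#endif

/*
 * Alternative: carry out the store itself with xchg, which is implicitly
 * locked when one operand is in memory.  Returns the previous value.
 */
static inline int sketch_xchg_store(int *p, int v)
{
#if defined(__x86_64__) || defined(__i386__)
	__asm__ __volatile__ ("xchgl %0, %1"
			      : "+r" (v), "+m" (*p)
			      :
			      : "memory");
	return v;
#else
	return __atomic_exchange_n(p, v, __ATOMIC_SEQ_CST);
#endif
}
```

The xchg form folds the barrier into a store that was needed anyway,
while the locked add can be dropped in wherever a standalone smp_mb()
is wanted.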

> > (end of quote)
> > 
> > From what I understand, it is safe to leave out any run-time memory
> > barriers, but we still need barriers that prevent the compiler from
> > reordering (using __asm__ __volatile__ ("":::"memory")). In
> > urcu/arch/x86.h, I see that when CONFIG_RCU_HAVE_FENCE is false,
> > memory barriers result in both compile-time and run-time memory
> > barriers:  __asm__ __volatile__ ("lock; addl $0,0(%%esp)":::"memory").
> > I guess this would work for the Phi, but the lock instruction does not
> > seem necessary.
> 
> Actually, either a cpuid (core serializing) instruction or a lock-prefixed
> instruction (which serializes memory accesses as a side effect) seems
> required.

It would certainly be safe.  One approach would be to keep it that way
unless/until someone showed it to be unnecessary.

> > So, should we just set CONFIG_RCU_HAVE_FENCE to false when compiling
> > for the Phi and go on with our lives, or should we add a specific
> > config for this case?
> 
> I _think_ we could get away with this mapping:
> 
> smp_wmb() -> barrier()
>   reasoning: write vs write are not reordered by the processor.
> 
> smp_rmb() -> barrier()
>   reasoning: read vs read not reordered by processor.
> 
> smp_mb() -> __asm__ __volatile__ ("lock; addl $0,0(%%esp)":::"memory")
>    or a cpuid instruction
>   reasoning: cpu can reorder reads vs later writes.
> 
> smp_read_barrier_depends() -> nothing at all (not needed at any level).

This should be safe, though I would argue for do { } while (0) for
smp_read_barrier_depends().
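
Spelled out as macros, the mapping above might look something like the
sketch below.  The macro names follow the ones used in this thread, but
this is not the actual contents of urcu/arch/x86.h; the publish() and
consume() helpers are hypothetical and only show where the barriers
would go, and the non-x86 fallback exists only so the fragment builds
elsewhere.

```c
/* Sketch only: the proposed barrier mapping for an x86-TSO-like machine. */

/* Compiler-only barrier: emits no instruction. */
#define barrier()	__asm__ __volatile__ ("" ::: "memory")

/* Stores are not reordered against stores, nor loads against loads. */
#define smp_wmb()	barrier()
#define smp_rmb()	barrier()

/* Full barrier: orders earlier stores before later loads. */
#if defined(__x86_64__)
#define smp_mb() \
	__asm__ __volatile__ ("lock; addl $0,0(%%rsp)" ::: "memory", "cc")
#elif defined(__i386__)
#define smp_mb() \
	__asm__ __volatile__ ("lock; addl $0,0(%%esp)" ::: "memory", "cc")
#else
/* Assumption: non-x86 fallback so that the sketch still compiles. */
#define smp_mb()	__sync_synchronize()
#endif

/* Expands to a statement, so a trailing semicolon behaves as expected. */
#define smp_read_barrier_depends()	do { } while (0)

/* Hypothetical message-passing helpers showing where the barriers go. */
static int payload;
static volatile int ready;

static inline void publish(int v)
{
	payload = v;
	smp_wmb();	/* Order the payload store before the flag store. */
	ready = 1;
}

static inline int consume(int *out)
{
	if (!ready)
		return 0;
	smp_rmb();	/* Order the flag load before the payload load. */
	*out = payload;
	return 1;
}
```

The do { } while (0) form keeps smp_read_barrier_depends() usable
anywhere a statement is expected, for example as the sole body of an
if/else branch.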

> Interestingly enough, AFAIU, this seems to map to x86-TSO. Maybe instead
> of defining a compile option specifically for Xeon Phi, we could instead
> define an x86-tso.h header variant in userspace RCU and use it on all Intel
> processors that map to TSO (hint: the vast majority). The only exceptions
> seem to be Pentium Pro (needing smp_rmb() -> lfence) and some WinChip
> processors which could reorder stores (thus needing smp_wmb() -> sfence).
> 
> Thoughts ?

As long as there is some reasonable way of detecting them.

Actually, why not use the locked add of zero for all x86 systems for
smp_mb()?

							Thanx, Paul

> Thanks,
> 
> Mathieu
> 
> > 
> > Simon
> > 
> 
> -- 
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com
>