[ltt-dev] [URCU PATCH] caa: do not generate code for rmb/wmb on x86_64, rmb on i686

Mathieu Desnoyers compudj at krystal.dyndns.org
Mon Sep 5 11:32:56 EDT 2011


* Paolo Bonzini (pbonzini at redhat.com) wrote:
> On 09/05/2011 05:12 PM, Mathieu Desnoyers wrote:
>>> In userspace we can assume no accesses to write-combining memory occur,
>>> and also that there are no non-temporal load/stores (people would presumably
>>> write those with assembly or intrinsics and put appropriate lfence/sfence
>>> manually).  So rmb and wmb are no-ops on x86.
>>
>> What about memory barriers for DMA with devices ? For these, we might
>> want to define cmm_wmb/rmb and cmm_smp_wmb/rmb differently (keep the
>> fences for DMA accesses).
>
> Yes, splitting wmb/rmb and smp_wmb/rmb makes sense.

Quoting:
www.rdrop.com/users/paulmck/scalability/paper/ordering.2007.09.19a.pdf

"AMD64

AMD64 is compatible with x86, and has recently updated
its memory model [1] to enforce the tighter
ordering that actual implementations have provided
for some time. The AMD64 implementation of the
Linux smp mb() primitive is mfence, smp rmb() is
lfence, and smp wmb() is sfence. In theory, these
might be relaxed, but any such relaxation must take SSE
and 3DNOW instructions into account."

-> So I think we should document that cmm_wmb/rmb/mb take care of SSE,
   3DNOW and DMA accesses, but the cmm_smp_*mb variants do not.
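
A rough, hedged sketch of what this could look like for x86_64 follows
(not the actual patch: the cmm_barrier()/cmm_*mb() macro names follow
the convention used in this thread, and the surrounding header layout
is assumed):

/*
 * Illustration only: cmm_*mb() keep the fences so they also order
 * SSE/3DNOW/non-temporal accesses and DMA; the SMP variants only
 * need to stop the compiler from reordering ordinary accesses.
 */
#define cmm_barrier()	__asm__ __volatile__ ("" : : : "memory")

/* Full fences: also cover weakly-ordered accesses and DMA. */
#define cmm_mb()	__asm__ __volatile__ ("mfence" : : : "memory")
#define cmm_rmb()	__asm__ __volatile__ ("lfence" : : : "memory")
#define cmm_wmb()	__asm__ __volatile__ ("sfence" : : : "memory")

/*
 * SMP variants: ordinary cacheable accesses between CPUs only.
 * Store-load reordering is still possible on x86_64, so the full
 * barrier keeps its fence; rmb/wmb reduce to compiler barriers.
 */
#define cmm_smp_mb()	cmm_mb()
#define cmm_smp_rmb()	cmm_barrier()
#define cmm_smp_wmb()	cmm_barrier()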

"x86

Since the x86 CPUs provide “process ordering” so that all CPUs agree on
the order of a given CPU’s writes to memory, the smp wmb() primitive is
a no-op for the CPU [7]. However, a compiler directive is required to
prevent the compiler from performing optimizations that would result in
reordering across the smp wmb() primitive.  On the other hand, x86 CPUs
have traditionally given no ordering guarantees for loads, so the smp
mb() and smp rmb() primitives expand to lock;addl. This atomic
instruction acts as a barrier to both loads and stores.

More recently, Intel has published a memory model for x86 [8]. It turns
out that Intel’s actual CPUs enforced tighter ordering than was claimed
in the previous specifications, so this model is in effect simply
mandating the earlier de-facto behavior.

However, note that some SSE instructions are weakly ordered (clflush and
non-temporal move instructions [6]). CPUs that have SSE can use mfence
for smp mb(), lfence for smp rmb(), and sfence for smp wmb().

A few versions of the x86 CPU have a mode bit that enables out-of-order
stores, and for these CPUs, smp wmb() must also be defined to be
lock;addl.  Although many older x86 implementations accommodated
self-modifying code without the need for any special instructions, newer
revisions of the x86 architecture no longer require x86 CPUs to be so
accommodating.  Interestingly enough, this relaxation comes just in time
to inconvenience JIT implementors."

-> So for Intel x86, it would make sense to document that cmm_rmb/wmb/mb
   take care of SSE, DMA accesses, non-temporal moves and clflush. We
   should also document that the "smp" variants of those primitives offer
   no guarantee for these cases.
   None of our fences offer ordering guarantees wrt prefetch
   instructions.
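
To make the documented split concrete from the caller's side, here is a
hedged usage sketch (the structs, field names and the DMA scenario are
made up for illustration; only the cmm_wmb()/cmm_smp_wmb() primitives
come from the discussion above, and urcu/arch.h is assumed to be where
they live):

#include <urcu/arch.h>	/* assumed location of cmm_wmb()/cmm_smp_wmb() */

struct dma_desc {
	void *buf;
	volatile int ready;	/* polled by the device */
};

struct msg {
	int payload;
	volatile int ready;	/* polled by another CPU */
};

/*
 * Publishing a buffer to a device (or after non-temporal stores):
 * the full cmm_wmb() is needed, since cmm_smp_wmb() would give no
 * guarantee for write-combining/non-temporal accesses or DMA.
 */
static void publish_to_device(struct dma_desc *d, void *buf)
{
	d->buf = buf;
	cmm_wmb();		/* sfence: order descriptor update before ready */
	d->ready = 1;
}

/*
 * Publishing through ordinary cacheable memory to another CPU:
 * cmm_smp_wmb() (a compiler barrier on x86) is sufficient.
 */
static void publish_to_cpu(struct msg *m, int val)
{
	m->payload = val;
	cmm_smp_wmb();		/* compiler barrier only on x86 */
	m->ready = 1;
}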

Thanks,

Mathieu


-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com



