[ltt-dev] [URCU PATCH] atomic: provide seq_cst semantics on powerpc
Mathieu Desnoyers
compudj at krystal.dyndns.org
Wed Sep 21 20:27:13 EDT 2011
* Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote:
> On Tue, Sep 20, 2011 at 12:35:18PM -0400, Mathieu Desnoyers wrote:
> > * Paolo Bonzini (pbonzini at redhat.com) wrote:
> > > Using lwsync before the load-and-reserve provides acq_rel semantics. Other
> > > architectures provide sequential consistency, so replace the lwsync with
> > > sync (see also http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html).
> >
> > Paul, I think Paolo is right here, and it seems to fit with the powerpc
> > barrier models we discussed at Plumbers. Can you provide feedback on
> > this?
>
> The Cambridge guys have run this through their proof-of-correctness
> stuff, so I am confident in it.
>
> However...
>
> This ordering is weaker than that of the gcc __sync series. To achieve
> that (if we want to, no opinion at the moment), we need to leave the
> leading lwsync and replace the trailing isync with sync.
OK, I pushed commit dabbe4f87217fc22279a02d98db4984b3187b77c that
explains this change.
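For reference, here is a minimal sketch of the two orderings under
discussion, using the 32-bit exchange as the example. The loop bodies
are the ones from urcu/uatomic/ppc.h; the function names are made up
for illustration and the code is untested:

/* seq_cst mapping from the C++0x table (what the patch implements): */
static inline unsigned int xchg_seq_cst(unsigned int *addr, unsigned int val)
{
	unsigned int result;

	__asm__ __volatile__(
		"sync\n"		/* full barrier before the update */
	"1:\t"	"lwarx %0,0,%1\n"	/* load and reserve */
		"stwcx. %2,0,%1\n"	/* store conditional */
		"bne- 1b\n"		/* retry if lost reservation */
		"isync\n"		/* acquire-style barrier after */
			: "=&r"(result)
			: "r"(addr), "r"(val)
			: "memory", "cc");
	return result;
}

/* Ordering of the gcc __sync builtins as Paul describes it above
 * (leading lwsync kept, trailing isync replaced with sync); this
 * variant is hypothetical, not what was merged: */
static inline unsigned int xchg_gcc_sync_style(unsigned int *addr, unsigned int val)
{
	unsigned int result;

	__asm__ __volatile__(
		"lwsync\n"		/* lighter barrier before the update */
	"1:\t"	"lwarx %0,0,%1\n"	/* load and reserve */
		"stwcx. %2,0,%1\n"	/* store conditional */
		"bne- 1b\n"		/* retry if lost reservation */
		"sync\n"		/* full barrier after the update */
			: "=&r"(result)
			: "r"(addr), "r"(val)
			: "memory", "cc");
	return result;
}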
Paolo: if you are interested in optimising the powerpc uatomic ops
further, note that add, sub, inc, dec, "or" and "and" (basically
anything that is not xchg/add_return/cmpxchg) could use versions of
cmpxchg that do not have the lwsync/sync barriers, or equivalently a
plain lwarx/stwcx. loop without barriers.
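For instance, a relaxed 32-bit add could look like this rough,
untested sketch (function name hypothetical, 32-bit case only; the
loop body is the one from _uatomic_add_return minus the barriers):

static inline void _uatomic_add_relaxed_32(unsigned int *addr, unsigned int val)
{
	unsigned int result;

	__asm__ __volatile__(
	"1:\t"	"lwarx %0,0,%1\n"	/* load and reserve */
		"add %0,%2,%0\n"	/* add val to value loaded */
		"stwcx. %0,0,%1\n"	/* store conditional */
		"bne- 1b\n"		/* retry if lost reservation */
			: "=&r"(result)
			: "r"(addr), "r"(val)
			: "memory", "cc");
}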
And we'd really need some documentation that states this consistency
model, covering all of: cmm_*mb(), cmm_smp_*mb(), xchg, add_return,
cmpxchg.
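As a strawman, the powerpc column of such a table could look roughly
like this (the uatomic rows follow from this patch; the cmm_*mb() row
is from memory and would need to be checked against urcu/arch/ppc.h):

  cmm_mb() / cmm_smp_mb():  sync
  uatomic_xchg():           sync; lwarx/stwcx. loop; isync
  uatomic_add_return():     sync; lwarx/add/stwcx. loop; isync
  uatomic_cmpxchg():        sync; lwarx/cmpw/stwcx. loop; isync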
Thanks,
Mathieu
>
> Thanx, Paul
>
> > Thanks,
> >
> > Mathieu
> >
> > > ---
> > > urcu/uatomic/ppc.h | 18 ++++++------------
> > > 1 files changed, 6 insertions(+), 12 deletions(-)
> > >
> > > diff --git a/urcu/uatomic/ppc.h b/urcu/uatomic/ppc.h
> > > index 3eb3d63..8485f67 100644
> > > --- a/urcu/uatomic/ppc.h
> > > +++ b/urcu/uatomic/ppc.h
> > > @@ -27,12 +27,6 @@
> > > extern "C" {
> > > #endif
> > >
> > > -#ifdef __NO_LWSYNC__
> > > -#define LWSYNC_OPCODE "sync\n"
> > > -#else
> > > -#define LWSYNC_OPCODE "lwsync\n"
> > > -#endif
> > > -
> > > #define ILLEGAL_INSTR ".long 0xd00d00"
> > >
> > > /*
> > > @@ -53,7 +47,7 @@ unsigned long _uatomic_exchange(void *addr, unsigned long val, int len)
> > > unsigned int result;
> > >
> > > __asm__ __volatile__(
> > > - LWSYNC_OPCODE
> > > + "sync\n" /* for sequential consistency */
> > > "1:\t" "lwarx %0,0,%1\n" /* load and reserve */
> > > "stwcx. %2,0,%1\n" /* else store conditional */
> > > "bne- 1b\n" /* retry if lost reservation */
> > > @@ -70,7 +64,7 @@ unsigned long _uatomic_exchange(void *addr, unsigned long val, int len)
> > > unsigned long result;
> > >
> > > __asm__ __volatile__(
> > > - LWSYNC_OPCODE
> > > + "sync\n" /* for sequential consistency */
> > > "1:\t" "ldarx %0,0,%1\n" /* load and reserve */
> > > "stdcx. %2,0,%1\n" /* else store conditional */
> > > "bne- 1b\n" /* retry if lost reservation */
> > > @@ -104,7 +98,7 @@ unsigned long _uatomic_cmpxchg(void *addr, unsigned long old,
> > > unsigned int old_val;
> > >
> > > __asm__ __volatile__(
> > > - LWSYNC_OPCODE
> > > + "sync\n" /* for sequential consistency */
> > > "1:\t" "lwarx %0,0,%1\n" /* load and reserve */
> > > "cmpw %0,%3\n" /* if load is not equal to */
> > > "bne 2f\n" /* old, fail */
> > > @@ -125,7 +119,7 @@ unsigned long _uatomic_cmpxchg(void *addr, unsigned long old,
> > > unsigned long old_val;
> > >
> > > __asm__ __volatile__(
> > > - LWSYNC_OPCODE
> > > + "sync\n" /* for sequential consistency */
> > > "1:\t" "ldarx %0,0,%1\n" /* load and reserve */
> > > "cmpd %0,%3\n" /* if load is not equal to */
> > > "bne 2f\n" /* old, fail */
> > > @@ -166,7 +160,7 @@ unsigned long _uatomic_add_return(void *addr, unsigned long val,
> > > unsigned int result;
> > >
> > > __asm__ __volatile__(
> > > - LWSYNC_OPCODE
> > > + "sync\n" /* for sequential consistency */
> > > "1:\t" "lwarx %0,0,%1\n" /* load and reserve */
> > > "add %0,%2,%0\n" /* add val to value loaded */
> > > "stwcx. %0,0,%1\n" /* store conditional */
> > > @@ -184,7 +178,7 @@ unsigned long _uatomic_add_return(void *addr, unsigned long val,
> > > unsigned long result;
> > >
> > > __asm__ __volatile__(
> > > - LWSYNC_OPCODE
> > > + "sync\n" /* for sequential consistency */
> > > "1:\t" "ldarx %0,0,%1\n" /* load and reserve */
> > > "add %0,%2,%0\n" /* add val to value loaded */
> > > "stdcx. %0,0,%1\n" /* store conditional */
> > > --
> > > 1.7.6
> > >
> > >
> > > _______________________________________________
> > > ltt-dev mailing list
> > > ltt-dev at lists.casi.polymtl.ca
> > > http://lists.casi.polymtl.ca/cgi-bin/mailman/listinfo/ltt-dev
> > >
> >
> > --
> > Mathieu Desnoyers
> > Operating System Efficiency R&D Consultant
> > EfficiOS Inc.
> > http://www.efficios.com
>
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com