[ltt-dev] [RFC git tree] Userspace RCU (urcu) for Linux (repost)
Mathieu Desnoyers
compudj at krystal.dyndns.org
Thu Feb 12 15:09:37 EST 2009
* Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote:
> On Thu, Feb 12, 2009 at 02:29:41PM -0500, Mathieu Desnoyers wrote:
> >
> > * Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote:
> > [...]
> > > diff --git a/urcu.c b/urcu.c
> > > index f2aae34..a696439 100644
> > > --- a/urcu.c
> > > +++ b/urcu.c
> > > @@ -99,7 +99,8 @@ static void force_mb_single_thread(pthread_t tid)
> > > * BUSY-LOOP.
> > > */
> > > while (sig_done < 1)
> > > - smp_rmb(); /* ensure we re-read sig-done */
> > > + barrier(); /* ensure compiler re-reads sig-done */
> > > + /* cache coherence guarantees CPU re-read. */
> >
> > OK, this is where I think our points of view differ. Please refer to
> > http://lkml.org/lkml/2007/6/18/299.
> >
> > Basically, cpu_relax() used in the Linux kernel has an
> > architecture-specific implementation which *could* include a smp_rmb()
> > if the architecture doesn't notice writes done by other CPUs. I think
> > Blackfin is the only architecture currently supported by the Linux
> > kernel which defines cpu_relax() as a smp_mb(), because it does not have
> > cache coherency.
> >
> > Therefore, I propose that we create a memory barrier macro which is
> > defined as a
> > barrier() when the cpu has cache coherency
> > cache flush when the cpu does not have cache coherency and is
> > compiled with smp support.
> >
> > We could call that
> >
> > smp_wmc() (for memory-coherency or memory commit)
> > smp_rmc()
> > smp_mc()
> >
> > It would be a good way to identify the location where data exchange
> > between memory and the local cache are is required in the algorithm.
> > What do you think ?
>
> Actually the best way to do this would be:
>
> while (ACCESS_ONCE(sig_done) < 1)
> continue;
>
Interesting idea. Maybe we should define an accessor for the data write
too ?
But I suspect that in a lot of situations, what we will really want is
to do a bunch of read/writes, and only at a particular point do the
cache flush.
> If ACCESS_ONCE() needs to be made architecture-specific to make this
> really work on Blackfin, we should make that change. And, now that
> you mention it, I have heard rumors that other CPU families can violate
> cache coherence in some circumstances.
>
> So perhaps ACCESS_ONCE() becomes:
>
> #ifdef CONFIG_ARCH_CACHE_COHERENT
> #define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
> #else /* #ifdef CONFIG_ARCH_CACHE_COHERENT */
> #define ACCESS_ONCE(x) ({ \
> typeof(x) _________x1; \
> _________x1 = (*(volatile typeof(x) *)&(x)); \
> cpu_relax(); \
I don't think cpu_relax would be the correct primitive to use here. We
definitely don't want a "rep; nop;" or anything like this which _slows
down_ the access. It's just a different goal we are pursuing. So using
something like smp_rmc within the ACCESS_ONCE() macro in this case as I
proposed in the other mail still seems to make sense.
Mathieu
> (_________x1); \
> })
> #endif /* #else #ifdef CONFIG_ARCH_CACHE_COHERENT */
>
> Seem reasonable?
>
> Thanx, Paul
>
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
More information about the lttng-dev
mailing list