[ltt-dev] [RFC git tree] Userspace RCU (urcu) for Linux (repost)

Fri Feb 13 11:18:46 EST 2009

On Fri, 13 Feb 2009, Mathieu Desnoyers wrote:
> 
> I created also
> 
> _STORE_SHARED()
> _LOAD_SHARED()
> 
> which identify the variables which need to have cache flush done before
> (load) or after (store). So we get both speed and identification when
> needed (if we need to do batch updates linked with a single cache flush).
> e.g.

The thing is, THAT JUST ABSOLUTELY SUCKS.

Lookie here - we don't want to flush the cache at every load of a shared 
variable. There's no reason to. If you don't care about the ordering, you 
might as well get the old values. That's what memory ordering _means_, for 
chrissake! In the absence of locks, loads may get stale values. It's that 
easy.

A lot of code wants to access multiple variables, and they are potentially 
nearby, and in the same cacheline. Making them all use _LOAD_SHARED() adds 
absolutely no value - and makes it MUCH MUCH SLOWER.

So what's the answer?

I already outlined it: either you use locks (which will do the magic for 
you), or you use memory barriers. In no case do you make the access magic, 
unless you have a compiler issue where you are afraid that the compiler 
would turn it into _multiple_ accesses and potentially get inconsistent 
results.
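
A minimal sketch of the memory-barrier option, assuming GCC/Clang builtins. The names smp_wmb(), smp_rmb(), payload, ready, publish() and try_consume() are hypothetical userspace stand-ins chosen for illustration, not code from the urcu tree. The accesses themselves stay plain; only the barriers carry the ordering. A real concurrent reader would probably also want the flag read exactly once.

```c
/* Hypothetical userspace stand-ins for kernel-style barriers,
 * built on GCC/Clang builtins (an assumption, not the urcu code). */
#define smp_wmb() __atomic_thread_fence(__ATOMIC_RELEASE)
#define smp_rmb() __atomic_thread_fence(__ATOMIC_ACQUIRE)

static int payload;  /* plain data, no per-access magic */
static int ready;    /* plain flag announcing that payload is valid */

/* Writer: store the data, then the barrier, then the flag. */
static void publish(int value)
{
    payload = value;
    smp_wmb();       /* order the payload store before the flag store */
    ready = 1;
}

/* Reader: returns 1 and fills *out if data was published, else 0. */
static int try_consume(int *out)
{
    if (!ready)
        return 0;    /* a stale "not ready" is fine: just try again later */
    smp_rmb();       /* order the flag load before the payload load */
    *out = payload;
    return 1;
}
```

The ordering lives in the barrier pair, so any number of nearby plain variables can be covered by the same two barriers without paying per access.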

So the point about ACCESS_ONCE() is not, and never has been, about 
re-ordering. We know that the CPU may re-order the accesses and give us 
stale values (or values from the "future" wrt the other accesses around 
it). That's not the point. The point of ACCESS_ONCE() is that we get 
exactly _one_ value, and not two different ones (or none at all) because 
of the compiler either re-loading it several times or not re-loading it at 
all.
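
As an illustration of that "exactly one value" point, here is a hedged userspace sketch. The macro mirrors the usual volatile-cast form and relies on the GCC/Clang __typeof__ extension; struct config, shared_cfg and in_range() are hypothetical names made up for the example.

```c
#include <stddef.h>

/* Userspace stand-in for ACCESS_ONCE(); needs GCC/Clang __typeof__. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

struct config {          /* hypothetical shared data */
    int lo;
    int hi;
};

static struct config *shared_cfg;  /* may be swapped by another thread */

/* Returns 1 if v is in [lo, hi], 0 if not, -1 if no config is set. */
static int in_range(int v)
{
    /* One load only: the NULL check and both dereferences below use
     * the same pointer value, even if shared_cfg changes meanwhile. */
    struct config *c = ACCESS_ONCE(shared_cfg);

    if (c == NULL)
        return -1;
    return v >= c->lo && v <= c->hi;
}
```

Nothing here orders the load against anything else. The value may be stale; it just cannot be two different values within one call.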

Anybody who confuses ACCESS_ONCE() with ordering is simply confused.

And we don't want to make any "load with cache flush" either. Which side 
should the cache flush be on? Before? After? Both? Atomically? There is no 
sane semantics for that.

The only remaining sane semantics is to depend on memory barriers, and 
then make a magic memory barrier that is extra weak and doesn't order 
anything at all, but just says "synchronize very weakly".

And I think we have that in "cpu_relax()". Because if you have somebody 
doing shared memory accesses in a loop without any memory barriers or 
locks or anything (ie the _ordering_ doesn't matter, only that some value 
has been seen), then dang it, I can't see how you can _possibly_ use 
anything else than that "cpu_relax()" somewhere in that loop.
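
A rough userspace sketch of such a loop, under stated assumptions: cpu_relax() is defined here as a compiler barrier (plus the x86 "pause" hint where available), which is only a stand-in for the kernel primitive, and flag and wait_for_flag() are hypothetical names. The spin limit exists only so the example terminates on its own.

```c
/* Assumed userspace stand-in for cpu_relax(): a compiler barrier,
 * plus the "pause" hint on x86. Not the kernel's definition. */
#if defined(__x86_64__) || defined(__i386__)
#define cpu_relax() __asm__ __volatile__("pause" ::: "memory")
#else
#define cpu_relax() __asm__ __volatile__("" ::: "memory")
#endif

static int flag;  /* hypothetical: set to nonzero by another thread */

/* Spin until flag is seen nonzero or max_spins is reached.
 * Returns the number of iterations spent waiting. */
static long wait_for_flag(long max_spins)
{
    long spins = 0;

    /* No locks and no ordering: we only care that some value shows up.
     * The "memory" clobber makes the compiler reload flag each time. */
    while (!flag && spins < max_spins) {
        cpu_relax();
        spins++;
    }
    return spins;
}
```

On an architecture that needed an explicit weak synchronization step, this single call in the loop body would be the one place to put it, instead of wrapping every load.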

			Linus