[lttng-dev] [RFC PATCH] wfqueue: expand API, simplify implementation, small performance boost

Tue Aug 14 19:34:02 EDT 2012

* Mathieu Desnoyers (mathieu.desnoyers at efficios.com) wrote:
[...]
> One more thing: I think we need to add cmm_smp_read_barrier_depends()
> before returning the node pointers in dequeue, first, and next
> operations. This memory barrier would match the implicit barrier in the
> enqueue ordering write to the prior tail's next pointer with writes to
> the node content that would have been performed beforehand by the
> caller.

Please note that the only reason I propose using
"cmm_smp_read_barrier_depends()" before returning nodes in the dequeue,
first, and next functions, rather than a full cmm_smp_mb(), is that I
fail to see a case where an architecture would reorder a dependent store
before the load of the pointer it depends on.

I can very well imagine a use case where node N is embedded within
structure X, as in the scenario below, where we write to X immediately
after dequeue:

CPU 0                                      CPU 1
store X content
  (implicit full memory barrier
   implied before uatomic_xchg)
enqueue X->N
                                           dequeue X->N
                                           (which barrier here?)
                                           write to X immediately.

If we assume that a store that depends on a loaded pointer can never be
reordered before that load, then I think we should be good with only a
read_barrier_depends() on CPU 1. Otherwise, we might want to put a full
barrier there to ensure that we don't race with CPU 0 storing content to
X.
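
To make the scenario concrete, here is a minimal sketch written with C11
atomics rather than the actual liburcu primitives. Everything in it is an
assumption made for illustration: the types (struct node, struct item,
struct queue) and the functions (queue_init, producer, consumer) are
hypothetical and are not the wfqueue API. The seq_cst atomic_exchange()
stands in for uatomic_xchg(), and the memory_order_consume load stands in
for a pointer load followed by cmm_smp_read_barrier_depends(). The
consumer only looks at the first node instead of performing a complete
dequeue, which is enough to show where the barrier question arises.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical types for illustration; not the liburcu wfqueue API. */
struct node {
	_Atomic(struct node *) next;
};

struct item {			/* "X" in the scenario */
	int content;
	struct node n;		/* "N", embedded in X */
};

struct queue {
	struct node head;		/* dummy head node */
	_Atomic(struct node *) tail;	/* last node enqueued */
};

/* Recover the enclosing item (X) from its embedded node (N). */
#define item_of(ptr) \
	((struct item *)((char *)(ptr) - offsetof(struct item, n)))

static void queue_init(struct queue *q)
{
	atomic_init(&q->head.next, NULL);
	atomic_init(&q->tail, &q->head);
}

/* CPU 0: store X content, then enqueue X->N. */
static void producer(struct queue *q, struct item *x, int value)
{
	struct node *old_tail;

	assert(q && x);
	x->content = value;		/* store X content */
	atomic_init(&x->n.next, NULL);	/* node is not shared yet */
	/* seq_cst exchange: stands in for uatomic_xchg() and the full
	 * barrier implied before it. */
	old_tail = atomic_exchange(&q->tail, &x->n);
	/* Publish the node through the prior tail's next pointer. A
	 * release store is used because the C11 model needs one to pair
	 * with the consume load on the other side. */
	atomic_store_explicit(&old_tail->next, &x->n, memory_order_release);
}

/*
 * CPU 1: load the first node, then write to X immediately.
 * Returns the content found in X, or -1 if the queue is empty.
 */
static int consumer(struct queue *q, int new_value)
{
	struct node *n;
	struct item *x;
	int seen;

	assert(q);
	/* "which barrier here?": consume orders only the accesses that
	 * depend on the loaded pointer, including the store below. */
	n = atomic_load_explicit(&q->head.next, memory_order_consume);
	if (!n)
		return -1;
	x = item_of(n);
	seen = x->content;	/* dependent load */
	x->content = new_value;	/* dependent store: write to X immediately */
	return seen;
}
```

In this sketch, replacing memory_order_consume with memory_order_acquire
(or adding a seq_cst fence after the load) would correspond to the more
conservative choice of a stronger barrier on CPU 1. Note that current
compilers generally implement consume as acquire, so the sketch shows
where the ordering is requested rather than what code is generated.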

Thoughts?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com