[lttng-dev] [RFC PATCH] wfqueue: expand API, simplify implementation, small performance boost
Lai Jiangshan
eag0628 at gmail.com
Tue Aug 14 23:42:34 EDT 2012
On Wed, Aug 15, 2012 at 6:27 AM, Mathieu Desnoyers
<mathieu.desnoyers at efficios.com> wrote:
> * Lai Jiangshan (eag0628 at gmail.com) wrote:
>> On Tue, Aug 14, 2012 at 4:12 AM, Mathieu Desnoyers
>> <mathieu.desnoyers at efficios.com> wrote:
> [...]
>> >> > /*
>> >> > + * ___cds_wfq_first_blocking: get first node of a queue, without dequeuing.
>> >> > + *
>> >> > + * Mutual exclusion with "dequeue" and "splice" operations must be ensured
>> >> > + * by the caller.
>> >> > + */
>> >> > +static inline struct cds_wfq_node *
>> >> > +___cds_wfq_first_blocking(struct cds_wfq_queue *q)
>> >> > +{
>> >> > + if (_cds_wfq_empty(q))
>> >> > + return NULL;
>> >> > + return ___cds_wfq_node_sync_next(&q->head);
>> >> > +}
>> >> > +
>> >> > +/*
>> >> > + * ___cds_wfq_next_blocking: get next node of a queue, without dequeuing.
>> >> > + *
>> >> > + * Mutual exclusion with "dequeue" and "splice" operations must be ensured
>> >> > + * by the caller.
>> >> > + */
>> >> > +static inline struct cds_wfq_node *
>> >> > +___cds_wfq_next_blocking(struct cds_wfq_queue *q, struct cds_wfq_node *node)
>> >> > +{
>> >> > + if (CMM_LOAD_SHARED(q->tail) == node)
>> >> > + return NULL;
>> >> > + return ___cds_wfq_node_sync_next(node);
>> >> > +}
>> >>
>> >>
>> >> The same BUG as you told me.
>> >> If q has only one node just enqueued by other thread.
>> >> but if q->head.next is seen, ___cds_wfq_first_blocking() returns a node,
>> >> And the update of q->tail is not seen, it is still &q->head,
>> >> ___cds_wfq_node_sync_next(node) will be loop for every if there is no
>> >> other enqueue.
>> >
>> > Good catch ! :-)
>> >
>> >>
>> >>
>> >>
>> >> static inline struct cds_wfq_node *
>> >> ___cds_wfq_first_blocking(struct cds_wfq_queue *q)
>> >> {
>> >> + struct cds_wfq_node *ret.
>> >> if (_cds_wfq_empty(q))
>> >> return NULL;
>> >> ret = ___cds_wfq_node_sync_next(&q->head);
>> >> + cmm_smp_rmb();
>> >> + return ret;
>> >> }
>> >
>> > However, I think we should add the rmb at the beginning of
>> > ___cds_wfq_next_blocking(), so it applies at each "next" call.
>> > Otherwise, I think we could end up in a situation where we wait for a
>> > NULL next forever in the second of two consecutive
>> > ___cds_wfq_next_blocking() calls. I therefore propose:
>>
>> How this can happen(wait for a NULL next forever in the second ...)?
>>
>> A rmb is enough to leave the state(simgle node && q->tail == &q->head).
>> "CMM_LOAD_SHARED(q->tail) == node" will be true in the loop except this state.
>
> You are indeed right, but I now realise there is a small thing we need
> to change. I tried a couple of scenarios to convince myself of it, and
> here are my conclusions:
>
> - In all cases (except for the first node), comparing q->tail with node
> ensures that we never go beyond the tail. Except for the first node
> (q->head), we always perform this comparison, so at no point can we
> reach the nodes through "next" pointers that would be beyond the
> q->tail node.
>
> - In the case of the first node, we do the comparison with
>
> CMM_LOAD_SHARED(q->head.next) == NULL
> && CMM_LOAD_SHARED(q->tail) == &q->head;
>
> in no particular order (checking if empty) before we move to the
> first node with ___cds_wfq_node_sync_next(&q->head).
> The reason why the first node is different from all the others is
> because we don't use "CMM_LOAD_SHARED(q->tail) == &q->head" alone
> to distinguish between an empty queue and a queue that contains
> nodes: we first check for NULL q->head.next, AFAIU for performance
> reasons: whenever possible, we don't want to read the q->tail pointer
> because it is being heavily updated by the enqueuer.
Is it false sharing?
Access to q->head.next and access to q->tail have the same performance
because they are in the same cache line.
>
> So assuming the goal beyond the CMM_LOAD_SHARED(q->head.next) == NULL
> check in the empty queue check is for performance of dequeue vs enqueue,
> I think it would be appropriate to do a similar optimisation for the
> ___cds_wfq_next_blocking() case: first check the next pointer of the
> current node to see if it is NULL. If not, then we know we can fetch the
> next pointer without comparing our position to the tail (performance
> optimisation). However, if we do that, we'll need a cmm_smp_rmb() at
> each ___cds_wfq_next_blocking() invocation.
>
> Here is the resulting code:
>
> /*
> * ___cds_wfq_first_blocking: get first node of a queue, without dequeuing.
> *
> * Mutual exclusion with "dequeue" and "splice" operations must be ensured
> * by the caller.
> */
> static inline struct cds_wfq_node *
> ___cds_wfq_first_blocking(struct cds_wfq_queue *q)
> {
> if (_cds_wfq_empty(q))
> return NULL;
> return ___cds_wfq_node_sync_next(&q->head);
> }
>
> /*
> * ___cds_wfq_next_blocking: get next node of a queue, without dequeuing.
> *
> * Mutual exclusion with "dequeue" and "splice" operations must be ensured
> * by the caller.
> */
> static inline struct cds_wfq_node *
> ___cds_wfq_next_blocking(struct cds_wfq_queue *q, struct cds_wfq_node *node)
> {
> struct cds_wfq_node *next;
>
> if ((next = CMM_LOAD_SHARED(node->next)) != NULL)
> return next;
> /* Load node->next before q->tail */
> cmm_smp_rmb();
> if (CMM_LOAD_SHARED(q->tail) == node)
> return NULL;
> return ___cds_wfq_node_sync_next(node);
> }
>
> Is my understanding correct ?
>
> Thanks,
>
> Mathieu
>
> --
> Mathieu Desnoyers
> Operating System Efficiency R&D Consultant
> EfficiOS Inc.
> http://www.efficios.com
More information about the lttng-dev
mailing list