[lttng-dev] [RFC URCU PATCH] wfqueue: ABI v1, simplify implementation, 2.3x to 2.6x performance boost

Fri Sep 7 09:03:25 EDT 2012

* Lai Jiangshan (laijs at cn.fujitsu.com) wrote:
> On 09/06/2012 09:43 PM, Mathieu Desnoyers wrote:
> > * Lai Jiangshan (laijs at cn.fujitsu.com) wrote:
> >> ping
> > 
> > we discussed a "batch" design offline. Do you think you would have time
> > to spin a patch implementing it ?
> > 
> 
> I prefer separated enqueue_struct dequeue_struct a little, so I'm
> waiting for you.
> 
> DEADLOCK!

Hahaha :-)

OK, I'm planning to discuss this with Paul next week, to get a 3rd
opinion. I don't want to make choices too quickly for APIs. So
meanwhile, let's try to sum up the possible solutions, those I recall
are:

1) The current API, with a single struct cds_wfqueue for enqueue and
   dequeue, with false-sharing between enqueue and dequeue, but on the
   plus side it provides a compact structure to put on the stack.

2) An API with cache-aligned fields of struct cds_wfqueue for enqueue
   and dequeue. Removes false-sharing, but requires a larger structure
   when extracted onto the stack with our new splice operation.

3) An API with two different queue representations: a "queue" and a
   "batch". The "queue" is bullet (2) above, and when we splice elements
   onto the stack, we put them in a "batch", described by bullet (1)
   above.

4) An API that takes separate enqueue/dequeue structures: it would allow
   the caller to decide the memory layout of those structures. We
   could possibly provide macro helpers for the common cases. E.g.

struct cds_enq {
        struct cds_wfq_node *tail;
};
struct cds_deq {
        struct cds_wfq_node head;
        pthread_mutex_t lock;
};
#define DEFINE_CDS_Q_ON_STACK(q)                                    \
        struct cds_enq __cds_enq_##q = { NULL };                    \
        struct cds_deq __cds_deq_##q =                              \
                        { { NULL }, PTHREAD_MUTEX_INITIALIZER, }

/* For queue head/tail in data */
#define DEFINE_CDS_Q(q)                                             \
        struct cds_enq __cds_enq_##q                                \
                __attribute__((aligned((CAA_CACHE_LINE_SIZE))))     \
                        = { NULL };                                 \
        struct cds_deq __cds_deq_##q                                \
                 __attribute__((aligned((CAA_CACHE_LINE_SIZE))))    \
                        = { { NULL }, PTHREAD_MUTEX_INITIALIZER, }

For dynamic allocation, the user could specify the alignment they want
for their allocated memory (not sure it is worth providing helpers for
that).
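
To make the dynamic-allocation case concrete, here is a minimal sketch,
not part of any patch. It assumes a 64-byte cache line
(SKETCH_CACHE_LINE_SIZE stands in for CAA_CACHE_LINE_SIZE), and every
sketch_* name is hypothetical. It uses posix_memalign() and rounds each
size up to a whole cache line, because an aligned but smaller allocation
could still share the rest of its line with an unrelated object.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L         /* for posix_memalign() */
#endif
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_CACHE_LINE_SIZE 64       /* assumption, not taken from urcu */

struct sketch_node {
        struct sketch_node *next;
};
struct sketch_enq {
        struct sketch_node *tail;
};
struct sketch_deq {
        struct sketch_node head;
        pthread_mutex_t lock;
};

/* Round a size up to a multiple of the assumed cache line size. */
static size_t sketch_round_up(size_t size)
{
        return (size + SKETCH_CACHE_LINE_SIZE - 1)
                / SKETCH_CACHE_LINE_SIZE * SKETCH_CACHE_LINE_SIZE;
}

/*
 * Allocate the enqueue and dequeue halves on separate cache lines.
 * Returns 0 on success, or the error number reported by
 * posix_memalign() or pthread_mutex_init().
 */
static int sketch_queue_alloc(struct sketch_enq **enq, struct sketch_deq **deq)
{
        void *e = NULL, *d = NULL;
        int ret;

        ret = posix_memalign(&e, SKETCH_CACHE_LINE_SIZE,
                        sketch_round_up(sizeof(struct sketch_enq)));
        if (ret)
                return ret;
        ret = posix_memalign(&d, SKETCH_CACHE_LINE_SIZE,
                        sketch_round_up(sizeof(struct sketch_deq)));
        if (ret) {
                free(e);
                return ret;
        }
        ((struct sketch_enq *) e)->tail = NULL;
        ((struct sketch_deq *) d)->head.next = NULL;
        ret = pthread_mutex_init(&((struct sketch_deq *) d)->lock, NULL);
        if (ret) {
                free(d);
                free(e);
                return ret;
        }
        *enq = e;
        *deq = d;
        return 0;
}

/* Release both halves allocated by sketch_queue_alloc(). */
static void sketch_queue_free(struct sketch_enq *enq, struct sketch_deq *deq)
{
        pthread_mutex_destroy(&deq->lock);
        free(deq);
        free(enq);
}
```

A caller who does not care about false sharing could just as well
malloc() both structures next to each other, which is the point of
leaving the layout to the user.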

#define CDS_Q(q)        &__cds_enq_##q, &__cds_deq_##q

Then, when those structures would be passed to enqueue/dequeue:

static DEFINE_CDS_Q(myqueue);

void fct(void) {
   ...
   cds_wfq_enqueue(CDS_Q(myqueue), somenode);
}
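
For illustration, option (4) can be put together as a small
self-contained sketch. This is only a sketch under assumptions, not the
urcu implementation: the sketch_* names and the 64-byte cache line are
hypothetical, the enqueue uses GCC/Clang __atomic builtins, the
helper passes pointers to both structures, and a NULL tail is taken to
mean "nothing enqueued yet", in which case the node is linked after the
dummy head. Dequeue and splice are left out.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CACHE_LINE_SIZE 64       /* assumption, not taken from urcu */

struct sketch_node {
        struct sketch_node *next;
};
struct sketch_enq {
        struct sketch_node *tail;       /* last node, NULL before first enqueue */
};
struct sketch_deq {
        struct sketch_node head;        /* dummy node, head.next is first element */
        pthread_mutex_t lock;           /* would serialize dequeuers (unused here) */
};

/* Define both halves of a queue, each aligned on its own cache line. */
#define SKETCH_DEFINE_Q(q)                                              \
        struct sketch_enq sketch_enq_##q                                \
                __attribute__((aligned(SKETCH_CACHE_LINE_SIZE)))        \
                        = { NULL };                                     \
        struct sketch_deq sketch_deq_##q                                \
                __attribute__((aligned(SKETCH_CACHE_LINE_SIZE)))        \
                        = { { NULL }, PTHREAD_MUTEX_INITIALIZER }

/* Expand to the two pointer arguments expected by sketch_enqueue(). */
#define SKETCH_Q(q)     &sketch_enq_##q, &sketch_deq_##q

/*
 * Append a node: atomically swap the tail pointer, then link the
 * previous tail (or the dummy head on first enqueue) to the new node.
 */
static void sketch_enqueue(struct sketch_enq *enq, struct sketch_deq *deq,
                struct sketch_node *node)
{
        struct sketch_node *old_tail;

        assert(node != NULL);
        node->next = NULL;
        old_tail = __atomic_exchange_n(&enq->tail, node, __ATOMIC_SEQ_CST);
        if (!old_tail)
                old_tail = &deq->head;
        __atomic_store_n(&old_tail->next, node, __ATOMIC_RELEASE);
}
```

With this layout, enqueuers only write to the cache line holding the
tail (plus the previous node), so they do not bounce the line holding
the head and the dequeue lock.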

Thoughts ?

Thanks,

Mathieu

> 
> thanks,
> Lai

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com