[lttng-dev] [RFC URCU PATCH] wfqueue: ABI v1, simplify implementation, 2.3x to 2.6x performance boost

Mathieu Desnoyers mathieu.desnoyers at efficios.com
Fri Sep 7 09:03:25 EDT 2012

* Lai Jiangshan (laijs at cn.fujitsu.com) wrote:
> On 09/06/2012 09:43 PM, Mathieu Desnoyers wrote:
> > * Lai Jiangshan (laijs at cn.fujitsu.com) wrote:
> >> ping
> > 
> > we discussed a "batch" design offline. Do you think you would have time
> > to spin a patch implementing it ?
> > 
> I prefer separated enqueue_struct dequeue_struct a little, so I'm
> waiting for you.

Hahaha :-)

OK, I'm planning to discuss this with Paul next week, to get a 3rd
opinion. I don't want to make choices too quickly for APIs. So
meanwhile, let's try to sum up the possible solutions I recall:

1) The current API, with a single struct cds_wfqueue for enqueue and
   dequeue, with false-sharing between enqueue and dequeue, but on the
   plus side it provides a compact structure to put on the stack.

2) An API with cache-aligned fields of struct cds_wfqueue for enqueue
   and dequeue. Removes false-sharing, but requires a larger structure
   when extracted onto the stack with our new splice operation.

3) An API with two different queue representations: a "queue" and a
   "batch". The "queue" is bullet (2) above, and when we splice elements
   onto the stack, we put them in a "batch", described by bullet (1).

4) An API that takes separate enqueue/dequeue structures: it would allow
   the caller to decide the memory layout of those structures. We
   could possibly provide macro helpers for the common cases, e.g.:

struct cds_enq {
        struct cds_wfq_node *tail;
};

struct cds_deq {
        struct cds_wfq_node head;
        pthread_mutex_t lock;
};

/* For queue head/tail on the stack */
#define DEFINE_CDS_Q_ON_STACK(q)                                    \
        struct cds_enq __cds_enq_##q = { NULL };                    \
        struct cds_deq __cds_deq_##q =                              \
                        { { NULL }, PTHREAD_MUTEX_INITIALIZER, }

/* For queue head/tail in data */
#define DEFINE_CDS_Q(q)                                             \
        struct cds_enq __cds_enq_##q                                \
                __attribute__((aligned((CAA_CACHE_LINE_SIZE))))     \
                        = { NULL };                                 \
        struct cds_deq __cds_deq_##q                                \
                 __attribute__((aligned((CAA_CACHE_LINE_SIZE))))    \
                        = { { NULL }, PTHREAD_MUTEX_INITIALIZER, }

For dynamic allocation, the user could specify the alignment they want
for their allocated memory (not sure it is worth providing helpers for
that).
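
To make the dynamic allocation case concrete, here is a rough sketch of
what a user could do by hand with posix_memalign(). Everything prefixed
with sketch_ or SKETCH_ is hypothetical and not part of any proposed API,
the 64-byte cache line is an assumption (liburcu has CAA_CACHE_LINE_SIZE
for that), and the initial values simply mirror the static initializers
above.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: 64-byte cache lines. */
#define SKETCH_CACHE_LINE_SIZE 64

struct cds_wfq_node {
        struct cds_wfq_node *next;
};

struct cds_enq {
        struct cds_wfq_node *tail;
};

struct cds_deq {
        struct cds_wfq_node head;
        pthread_mutex_t lock;
};

/* Hypothetical helper: round sz up to a multiple of the cache line size. */
static size_t sketch_round_up(size_t sz)
{
        size_t mask = (size_t)SKETCH_CACHE_LINE_SIZE - 1;

        return (sz + mask) & ~mask;
}

/*
 * Hypothetical helper: allocate both halves in one block, each starting
 * on its own cache line. Returns 0 on success, -1 on failure.
 */
static int sketch_alloc_queue(struct cds_enq **enq, struct cds_deq **deq)
{
        size_t enq_sz = sketch_round_up(sizeof(struct cds_enq));
        size_t deq_sz = sketch_round_up(sizeof(struct cds_deq));
        void *mem;

        assert(enq && deq);
        if (posix_memalign(&mem, SKETCH_CACHE_LINE_SIZE, enq_sz + deq_sz))
                return -1;
        *enq = mem;
        *deq = (struct cds_deq *)((char *)mem + enq_sz);
        /* Same initial values as the static initializers. */
        (*enq)->tail = NULL;
        (*deq)->head.next = NULL;
        pthread_mutex_init(&(*deq)->lock, NULL);
        return 0;
}

/* Hypothetical helper: release a queue obtained from sketch_alloc_queue(). */
static void sketch_free_queue(struct cds_enq *enq, struct cds_deq *deq)
{
        pthread_mutex_destroy(&deq->lock);
        free(enq);      /* enq is the start of the single allocation */
}
```

A user who does not care about false-sharing could just as well malloc()
the two structures back to back, which is the point of leaving the layout
decision to the caller.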

#define CDS_Q(q)        __cds_enq_##q, __cds_deq_##q

Then, when those structures would be passed to enqueue/dequeue:

static DEFINE_CDS_Q(myqueue);

void fct(void) {
   cds_wfq_enqueue(CDS_Q(myqueue), somenode);
}
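
To put rough numbers on the size vs. false-sharing trade-off between
solutions (1) and (2), here is a small layout sketch. The sketch_ and
SKETCH_ names are hypothetical, the 64-byte cache line is an assumption,
and the member order in the compact structure is only illustrative. It
relies on the GCC aligned attribute, like the macros above.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines. */
#define SKETCH_CACHE_LINE_SIZE 64

struct cds_wfq_node {
        struct cds_wfq_node *next;
};

struct cds_enq {
        struct cds_wfq_node *tail;
};

struct cds_deq {
        struct cds_wfq_node head;
        pthread_mutex_t lock;
};

/* Solution (1): compact, enqueue and dequeue state packed together. */
struct sketch_compact_q {
        struct cds_wfq_node head;
        pthread_mutex_t lock;
        struct cds_wfq_node *tail;
};

/* Solution (2): each side starts on its own cache line. */
struct sketch_aligned_q {
        struct cds_deq deq __attribute__((aligned(SKETCH_CACHE_LINE_SIZE)));
        struct cds_enq enq __attribute__((aligned(SKETCH_CACHE_LINE_SIZE)));
};

/* Hypothetical helper: do two addresses fall within the same cache line? */
static int sketch_same_line(const void *a, const void *b)
{
        return (uintptr_t)a / SKETCH_CACHE_LINE_SIZE
                == (uintptr_t)b / SKETCH_CACHE_LINE_SIZE;
}
```

With this layout the aligned variant takes at least two full cache lines,
which is the extra stack cost mentioned in (2), while its enqueue and
dequeue sides can never end up on the same line.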

Thoughts ?



> thanks,
> Lai

Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
