[lttng-dev] [RFC URCU PATCH] wfqueue: ABI v1, simplify implementation, 2.3x to 2.6x performance boost
Mathieu Desnoyers
mathieu.desnoyers at efficios.com
Fri Sep 7 09:03:25 EDT 2012
* Lai Jiangshan (laijs at cn.fujitsu.com) wrote:
> On 09/06/2012 09:43 PM, Mathieu Desnoyers wrote:
> > * Lai Jiangshan (laijs at cn.fujitsu.com) wrote:
> >> ping
> >
> > we discussed a "batch" design offline. Do you think you would have time
> > to spin a patch implementing it ?
> >
>
> I slightly prefer separate enqueue_struct and dequeue_struct, so I'm
> waiting for you.
>
> DEADLOCK!
Hahaha :-)
OK, I'm planning to discuss this with Paul next week, to get a 3rd
opinion. I don't want to make choices too quickly for APIs. So
meanwhile, let's try to sum up the possible solutions. Those I recall
are:
1) The current API, with a single struct cds_wfqueue for enqueue and
   dequeue. It has false-sharing between enqueue and dequeue, but on
   the plus side it provides a compact structure to put on the stack.
2) An API with cache-aligned fields of struct cds_wfqueue for enqueue
   and dequeue. Removes false-sharing, but requires a larger structure
   when extracted onto the stack with our new splice operation.
3) An API with two different queue representations: a "queue" and a
   "batch". The "queue" is bullet (2) above, and when we splice elements
   onto the stack, we put them in a "batch", described by bullet (1)
   above.
4) An API that takes separate enqueue/dequeue structures: it would allow
   the caller to decide the memory layout of those structures. We
   could possibly provide macro helpers for the common cases. E.g.
struct cds_enq {
	struct cds_wfq_node *tail;
};

struct cds_deq {
	struct cds_wfq_node head;
	pthread_mutex_t lock;
};

/* For queue head/tail on the stack */
#define DEFINE_CDS_Q_ON_STACK(q) \
	struct cds_enq __cds_enq_##q = { NULL }; \
	struct cds_deq __cds_deq_##q = \
		{ { NULL }, PTHREAD_MUTEX_INITIALIZER, }

/* For queue head/tail in data */
#define DEFINE_CDS_Q(q) \
	struct cds_enq __cds_enq_##q \
		__attribute__((aligned((CAA_CACHE_LINE_SIZE)))) \
		= { NULL }; \
	struct cds_deq __cds_deq_##q \
		__attribute__((aligned((CAA_CACHE_LINE_SIZE)))) \
		= { { NULL }, PTHREAD_MUTEX_INITIALIZER, }
For dynamic allocation, the user could specify the alignment they want
for their allocated memory (I'm not sure it is worth providing helpers
for that).
#define CDS_Q(q) __cds_enq_##q, __cds_deq_##q
Then, when those structures would be passed to enqueue/dequeue:
static DEFINE_CDS_Q(myqueue);

void fct(void) {
	...
	cds_wfq_enqueue(CDS_Q(myqueue), somenode);
}
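
To make option (4) a bit more concrete, here is a minimal, self-contained
sketch of how enqueue and dequeue could look when they take the two
structures separately. It is only an illustration under stated assumptions,
not the urcu implementation: every sketch_* / SKETCH_* name is hypothetical,
the 64-byte cache line stands in for CAA_CACHE_LINE_SIZE, C11 atomics replace
the urcu uatomic primitives, the tail is initialized to point at the dummy
head (rather than NULL), the queue macro expands to the addresses of the two
structures, and dequeue simply returns NULL instead of blocking when it finds
nothing linked to the head.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define SKETCH_CACHE_LINE_SIZE	64	/* assumption, stands in for CAA_CACHE_LINE_SIZE */

struct sketch_node {
	struct sketch_node *_Atomic next;
};

/* Enqueue side: only the tail pointer, written by producers. */
struct sketch_enq {
	struct sketch_node *_Atomic tail;
};

/* Dequeue side: dummy head node plus the consumer lock. */
struct sketch_deq {
	struct sketch_node head;
	pthread_mutex_t lock;
};

/* Each half gets its own cache line; the tail starts at the dummy head. */
#define SKETCH_DEFINE_Q(q)						\
	struct sketch_deq __sketch_deq_##q				\
		__attribute__((aligned(SKETCH_CACHE_LINE_SIZE)))	\
		= { { NULL }, PTHREAD_MUTEX_INITIALIZER };		\
	struct sketch_enq __sketch_enq_##q				\
		__attribute__((aligned(SKETCH_CACHE_LINE_SIZE)))	\
		= { &__sketch_deq_##q.head }

/* Expands to two arguments: the addresses of both halves. */
#define SKETCH_Q(q)	&__sketch_enq_##q, &__sketch_deq_##q

static void sketch_enqueue(struct sketch_enq *enq, struct sketch_deq *deq,
		struct sketch_node *node)
{
	struct sketch_node *old_tail;

	(void) deq;	/* unused here; kept so SKETCH_Q() fits both calls */
	assert(node != NULL);
	atomic_store(&node->next, NULL);
	/* Publish the node as the new tail, then link it after the old one. */
	old_tail = atomic_exchange(&enq->tail, node);
	atomic_store(&old_tail->next, node);
}

/* Returns NULL when no node is linked to the head yet (simplification). */
static struct sketch_node *sketch_dequeue(struct sketch_enq *enq,
		struct sketch_deq *deq)
{
	struct sketch_node *node, *next, *expected;

	pthread_mutex_lock(&deq->lock);
	node = atomic_load(&deq->head.next);
	if (!node)
		goto end;
	next = atomic_load(&node->next);
	if (!next) {
		/* Looks like the last node: try to move the tail back to head. */
		atomic_store(&deq->head.next, NULL);
		expected = node;
		if (atomic_compare_exchange_strong(&enq->tail, &expected,
				&deq->head))
			goto end;
		/* A concurrent enqueue won the race: wait until it links. */
		while (!(next = atomic_load(&node->next)))
			;
	}
	atomic_store(&deq->head.next, next);
end:
	pthread_mutex_unlock(&deq->lock);
	return node;
}

SKETCH_DEFINE_Q(myqueue);
```

With this shape, a caller would write sketch_enqueue(SKETCH_Q(myqueue), &node)
and sketch_dequeue(SKETCH_Q(myqueue)), and the choice between a compact
on-stack layout and a cache-aligned one stays entirely in the macro that
defines the two structures.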
Thoughts ?
Thanks,
Mathieu
>
> thanks,
> Lai
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com