[ltt-dev] [rp] [PATCH RFC] v3 call_rcu() interface for userspace-rcu

Thu Mar 3 09:58:37 EST 2011

* Mathieu Desnoyers (mathieu.desnoyers at efficios.com) wrote:
> * Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote:
> > On Mon, Feb 28, 2011 at 12:25:44PM -0500, Mathieu Desnoyers wrote:
> > > * Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote:
> > > > On Mon, Feb 28, 2011 at 11:27:31AM -0500, Mathieu Desnoyers wrote:
> > > > > * Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote:
> > > > > > Adds call_rcu(), with RCU threads to invoke the callbacks.  By default,
> > > > > > there will be one such RCU thread per process, created the first time
> > > > > > that call_rcu() is invoked.  On systems supporting sched_getcpu(), it
> > > > > > is possible to create one RCU thread per CPU by calling
> > > > > > create_all_cpu_call_rcu_data().
> > > > > > 
> > > > > > This version includes a second round of feedback from Mathieu Desnoyers.
> > > > > > In addition, the tests have been upgraded to randomly configure per-CPU
> > > > > > and per-thread call_rcu() threads.  In turn, the main code contains fixes
> > > > > > > for a number of embarrassing bugs that the resulting testing located.
> > > > > > 
> > > > > > This version does not include changes to make valgrind happy.
> > > > > > I am reviewing valgrind documentation to work out the best approach,
> > > > > > and believe that there is a possible design that does not involve
> > > > > > manually tearing everything down.  More on that later.
> > > > > 
> > > > > Hi Paul!
> > > > > 
> > > > > I'm tempted to pull this, but it is still flagged RFC. Any update on interaction
> > > > > > with valgrind?
> > > > 
> > > > I read through the valgrind documentation, and it appears that a good
> > > > approach is to simply link all of the call_rcu_data structures together
> > > > so that valgrind sees them as not being leaked, even if a thread exits
> > > > (thus losing its __thread variables).
> > > > 
> > > > I am roughing out ideas to allow call_rcu_data structures to be
> > > > deallocated, but would rather start with the simple linking.  If no one
> > > > complains, then we might also end with the simple linking.  ;-)
> > > > 
> > > > So I will re-submit with the call_rcu_data structures linked together,
> > > > hopefully later this week.  Also with the README update you call out
> > > > in your next message.
> > > 
> > > Sounds good. Also, I noticed some oddness in the thread vs fork behavior that
> > > applied to UST, and seems to apply to urcu defer/call_rcu worker threads too.
> > > Basically, the worker threads are not kept over a fork, but all the locks/data
> > > structures are kept exactly in the state they were at the exact moment the fork
> > > happened. So we might want to review pthread_atfork(3) to see the best way to
> > > proceed in this respect. We might want to either just document the limitation,
> > > or deal with the need to re-spawn threads and reinit data structures by hooking
> > > into pthread_atfork.
> > 
> > Heh!
> > 
> > We need to remove all the per-thread data structures -- implicitly
> > unregistering all of the threads.  Otherwise, some random thread from the
> > other process will prevent us from ever again completing a grace period.
> > You need to queue all of the defer_rcu() callbacks for later processing,
> > as you cannot safely process them from the atfork callback, because RCU
> > isn't in operational condition.
> > 
> > I need to free up all of the call_rcu_data structures, then set up the
> > default call_rcu_data structure for the new process.  All callbacks from
> > the old call_rcu_data structures get dumped onto the new default
> > call_rcu_data structure.
> > 
> > But, yow!!!  If any of the callbacks refer to per-thread data, the life
> > of the new process will be nasty, brutal, and short.  So how about if
> > we instead require that the caller insert a call to some rcu_atfork()?
> > Because if the caller is just going to do exec(), which is the common
> > case, why bother with any of this stuff?
> > 
> > To reiterate, automatically doing the adjustments from pthread_atfork()
> > will be unsafe at any speed, resulting in crazy bugs in the common-case
> > usage of fork().
> 
> How about we document a nice recipe to handle fork in applications using urcu?
> E.g., if the application wants to fork without exec in the child, it should,
> prior to the fork:
> 
> - unregister all urcu reader threads
> - tear down all defer/call RCU worker threads by calling defer_unregister_thread/
>   call_rcu unregister thread for all threads using call_rcu/defer_rcu.

Actually, for this part, documentation might be needed. I'll write something in
the README file.

> 
> For urcu-bp.c, the fork() caller must ensure that no thread is within an RCU
> read-side critical section when the fork is done, otherwise it would block grace
> period completion. urcu-bp.c is used by UST, and should be more resilient to
> these weird situations than normal use-cases. I doubt we'll be able to deal with
> that without walking on all the registered threads (in the fork child) and
> clearing their respective nesting count. We can consider these thread
> stacks/per-thread RCU data as being leaked memory (there ain't much we can do
> about this). Note that UST does not use call_rcu nor defer_rcu at the moment.

I just looked more closely at the urcu-bp code, and it seems to be really
bullet-proof ;) It checks if the threads are still there in a "garbage
collection" phase when synchronize_rcu() is invoked. The thread performing the
fork() call keeps the same thread ID in the parent and child, so it stays
consistent. We should be good without any change for that implementation.
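
The thread-ID observation can be checked with a small standalone test. The
sketch below is not part of urcu-bp; thread_id_survives_fork() is a
hypothetical helper that compares pthread_self() before the fork with
pthread_self() in the child. I expect the IDs to match on Linux/glibc, but
treat that as an assumption to verify on other platforms.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns 1 if the forking thread sees the same pthread_t in the child,
 * 0 if it does not, and -1 on error.
 */
static int thread_id_survives_fork(void)
{
	pthread_t before = pthread_self();
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0)	/* child: report the comparison via the exit status */
		_exit(pthread_equal(before, pthread_self()) ? 0 : 1);
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status) == 0;
}
```

Only the thread that called fork() exists in the child, so any other
registered thread will look dead to a liveness check made there.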

Thanks,

Mathieu

> 
> Thanks,
> 
> Mathieu
> 
> > 
> > 							Thanx, Paul
> > 
> > > Thanks,
> > > 
> > > Mathieu
> > > 
> > > > 
> > > > 							Thanx, Paul
> > > > 
> > > > > Thanks,
> > > > > 
> > > > > Mathieu
> > > > > 
> > > > > > 
> > > > > > However, this version does pass torture tests on a 48-CPU Power system
> > > > > > (though I needed to re-apply the ppc asm typo fix).
> > > > > > 
> > > > > > Signed-off-by: Paul E. McKenney <paulmck at linux.vnet.ibm.com>
> > > > > > 
> > > > > > diff --git a/Makefile.am b/Makefile.am
> > > > > > index 79a7152..7956e7e 100644
> > > > > > --- a/Makefile.am
> > > > > > +++ b/Makefile.am
> > > > > > @@ -1,6 +1,7 @@
> > > > > >  INCLUDES = -I$(top_builddir)/urcu
> > > > > >  
> > > > > >  AM_LDFLAGS=-lpthread
> > > > > > +AM_CFLAGS=-Wall
> > > > > >  
> > > > > >  SUBDIRS = . tests
> > > > > >  
> > > > > > @@ -29,8 +30,8 @@ COMPAT+=compat_futex.c
> > > > > >  endif
> > > > > >  
> > > > > >  lib_LTLIBRARIES = liburcu.la liburcu-qsbr.la liburcu-mb.la liburcu-signal.la \
> > > > > > -		  liburcu-bp.la liburcu-defer.la libwfqueue.la libwfstack.la \
> > > > > > -		  librculfqueue.la librculfstack.la
> > > > > > +		  liburcu-bp.la liburcu-defer.la liburcu-call.la \
> > > > > > +		  libwfqueue.la libwfstack.la librculfqueue.la librculfstack.la
> > > > > >  
> > > > > >  liburcu_la_SOURCES = urcu.c urcu-pointer.c $(COMPAT)
> > > > > >  
> > > > > > @@ -44,6 +45,7 @@ liburcu_signal_la_CFLAGS = -DRCU_SIGNAL
> > > > > >  
> > > > > >  liburcu_bp_la_SOURCES = urcu-bp.c urcu-pointer.c $(COMPAT)
> > > > > >  
> > > > > > +liburcu_call_la_SOURCES = urcu-call-rcu.c $(COMPAT)
> > > > > >  liburcu_defer_la_SOURCES = urcu-defer.c $(COMPAT)
> > > > > >  
> > > > > >  libwfqueue_la_SOURCES = wfqueue.c $(COMPAT)
> > > > > > diff --git a/configure.ac b/configure.ac
> > > > > > index 02780e7..88771d4 100644
> > > > > > --- a/configure.ac
> > > > > > +++ b/configure.ac
> > > > > > @@ -34,7 +34,7 @@ AC_TYPE_SIZE_T
> > > > > >  # Checks for library functions.
> > > > > >  AC_FUNC_MALLOC
> > > > > >  AC_FUNC_MMAP
> > > > > > -AC_CHECK_FUNCS([bzero gettimeofday munmap strtoul])
> > > > > > +AC_CHECK_FUNCS([bzero gettimeofday munmap sched_getcpu strtoul sysconf])
> > > > > >  
> > > > > >  # Find arch type
> > > > > >  case $host_cpu in
> > > > > > diff --git a/tests/Makefile.am b/tests/Makefile.am
> > > > > > index a43dd75..3c025a4 100644
> > > > > > --- a/tests/Makefile.am
> > > > > > +++ b/tests/Makefile.am
> > > > > > @@ -1,5 +1,5 @@
> > > > > >  AM_LDFLAGS=-lpthread
> > > > > > -AM_CFLAGS=-I$(top_srcdir) -I$(top_builddir)
> > > > > > +AM_CFLAGS=-I$(top_srcdir) -I$(top_builddir) -g
> > > > > >  
> > > > > >  noinst_PROGRAMS = test_urcu test_urcu_dynamic_link test_urcu_timing \
> > > > > >  	test_urcu_signal test_urcu_signal_dynamic_link test_urcu_signal_timing \
> > > > > > @@ -28,20 +28,21 @@ if COMPAT_FUTEX
> > > > > >  COMPAT+=$(top_srcdir)/compat_futex.c
> > > > > >  endif
> > > > > >  
> > > > > > -URCU=$(top_srcdir)/urcu.c $(top_srcdir)/urcu-pointer.c $(COMPAT)
> > > > > > -URCU_QSBR=$(top_srcdir)/urcu-qsbr.c $(top_srcdir)/urcu-pointer.c $(COMPAT)
> > > > > > +URCU=$(top_srcdir)/urcu.c $(top_srcdir)/urcu-pointer.c $(top_srcdir)/urcu-call-rcu.c $(top_srcdir)/wfqueue.c $(COMPAT)
> > > > > > +URCU_QSBR=$(top_srcdir)/urcu-qsbr.c $(top_srcdir)/urcu-pointer.c $(top_srcdir)/urcu-call-rcu.c $(top_srcdir)/wfqueue.c $(COMPAT)
> > > > > >  # URCU_MB uses urcu.c but -DRCU_MB must be defined
> > > > > > -URCU_MB=$(top_srcdir)/urcu.c $(top_srcdir)/urcu-pointer.c $(COMPAT)
> > > > > > +URCU_MB=$(top_srcdir)/urcu.c $(top_srcdir)/urcu-pointer.c $(top_srcdir)/urcu-call-rcu.c $(top_srcdir)/wfqueue.c $(COMPAT)
> > > > > >  # URCU_SIGNAL uses urcu.c but -DRCU_SIGNAL must be defined
> > > > > > -URCU_SIGNAL=$(top_srcdir)/urcu.c $(top_srcdir)/urcu-pointer.c $(COMPAT)
> > > > > > -URCU_BP=$(top_srcdir)/urcu-bp.c $(top_srcdir)/urcu-pointer.c $(COMPAT)
> > > > > > -URCU_DEFER=$(top_srcdir)/urcu.c $(top_srcdir)/urcu-defer.c $(top_srcdir)/urcu-pointer.c $(COMPAT)
> > > > > > +URCU_SIGNAL=$(top_srcdir)/urcu.c $(top_srcdir)/urcu-pointer.c $(top_srcdir)/urcu-call-rcu.c $(top_srcdir)/wfqueue.c $(COMPAT)
> > > > > > +URCU_BP=$(top_srcdir)/urcu-bp.c $(top_srcdir)/urcu-pointer.c $(top_srcdir)/urcu-call-rcu.c $(top_srcdir)/wfqueue.c $(COMPAT)
> > > > > > +URCU_DEFER=$(top_srcdir)/urcu.c $(top_srcdir)/urcu-defer.c $(top_srcdir)/urcu-pointer.c $(top_srcdir)/urcu-call-rcu.c $(top_srcdir)/wfqueue.c $(COMPAT)
> > > > > >  
> > > > > >  URCU_LIB=$(top_builddir)/liburcu.la
> > > > > >  URCU_QSBR_LIB=$(top_builddir)/liburcu-qsbr.la
> > > > > >  URCU_MB_LIB=$(top_builddir)/liburcu-mb.la
> > > > > >  URCU_SIGNAL_LIB=$(top_builddir)/liburcu-signal.la
> > > > > >  URCU_BP_LIB=$(top_builddir)/liburcu-bp.la
> > > > > > +URCU_CALL_LIB=$(top_builddir)/liburcu-call.la
> > > > > >  WFQUEUE_LIB=$(top_builddir)/libwfqueue.la
> > > > > >  WFSTACK_LIB=$(top_builddir)/libwfstack.la
> > > > > >  RCULFQUEUE_LIB=$(top_builddir)/librculfqueue.la
> > > > > > @@ -95,23 +96,23 @@ test_perthreadlock_SOURCES = test_perthreadlock.c $(URCU_SIGNAL)
> > > > > >  
> > > > > >  rcutorture_urcu_SOURCES = urcutorture.c
> > > > > >  rcutorture_urcu_CFLAGS = -DTORTURE_URCU $(AM_CFLAGS)
> > > > > > -rcutorture_urcu_LDADD = $(URCU)
> > > > > > +rcutorture_urcu_LDADD = $(URCU) $(URCU_CALL_LIB) $(WFQUEUE_LIB)
> > > > > >  
> > > > > >  rcutorture_urcu_mb_SOURCES = urcutorture.c
> > > > > >  rcutorture_urcu_mb_CFLAGS = -DTORTURE_URCU_MB $(AM_CFLAGS)
> > > > > > -rcutorture_urcu_mb_LDADD = $(URCU_MB_LIB)
> > > > > > +rcutorture_urcu_mb_LDADD = $(URCU_MB_LIB) $(URCU_CALL_LIB) $(WFQUEUE_LIB)
> > > > > >  
> > > > > >  rcutorture_qsbr_SOURCES = urcutorture.c
> > > > > >  rcutorture_qsbr_CFLAGS = -DTORTURE_QSBR $(AM_CFLAGS)
> > > > > > -rcutorture_qsbr_LDADD = $(URCU_QSBR_LIB)
> > > > > > +rcutorture_qsbr_LDADD = $(URCU_QSBR_LIB) $(URCU_CALL_LIB) $(WFQUEUE_LIB)
> > > > > >  
> > > > > >  rcutorture_urcu_signal_SOURCES = urcutorture.c
> > > > > >  rcutorture_urcu_signal_CFLAGS = -DTORTURE_URCU_SIGNAL $(AM_CFLAGS)
> > > > > > -rcutorture_urcu_signal_LDADD = $(URCU_SIGNAL_LIB)
> > > > > > +rcutorture_urcu_signal_LDADD = $(URCU_SIGNAL_LIB) $(URCU_CALL_LIB) $(WFQUEUE_LIB)
> > > > > >  
> > > > > >  rcutorture_urcu_bp_SOURCES = urcutorture.c
> > > > > >  rcutorture_urcu_bp_CFLAGS = -DTORTURE_URCU_BP $(AM_CFLAGS)
> > > > > > -rcutorture_urcu_bp_LDADD = $(URCU_BP_LIB)
> > > > > > +rcutorture_urcu_bp_LDADD = $(URCU_BP_LIB) $(URCU_CALL_LIB) $(WFQUEUE_LIB)
> > > > > >  
> > > > > >  test_mutex_SOURCES = test_mutex.c $(URCU)
> > > > > >  
> > > > > > diff --git a/tests/rcutorture.h b/tests/rcutorture.h
> > > > > > index 4dac2f2..b42b8ab 100644
> > > > > > --- a/tests/rcutorture.h
> > > > > > +++ b/tests/rcutorture.h
> > > > > > @@ -65,6 +65,9 @@
> > > > > >   * Test variables.
> > > > > >   */
> > > > > >  
> > > > > > +#include <stdlib.h>
> > > > > > +#include "../urcu-call-rcu.h"
> > > > > > +
> > > > > >  DEFINE_PER_THREAD(long long, n_reads_pt);
> > > > > >  DEFINE_PER_THREAD(long long, n_updates_pt);
> > > > > >  
> > > > > > @@ -147,6 +150,16 @@ void *rcu_update_perf_test(void *arg)
> > > > > >  {
> > > > > >  	long long n_updates_local = 0;
> > > > > >  
> > > > > > +	if ((random() & 0xf00) == 0) {
> > > > > > +		struct call_rcu_data *crdp;
> > > > > > +
> > > > > > +		crdp = create_call_rcu_data(0);
> > > > > > +		if (crdp != NULL) {
> > > > > > +			fprintf(stderr,
> > > > > > +				"Using per-thread call_rcu() worker.\n");
> > > > > > +			set_thread_call_rcu_data(crdp);
> > > > > > +		}
> > > > > > +	}
> > > > > >  	uatomic_inc(&nthreadsrunning);
> > > > > >  	while (goflag == GOFLAG_INIT)
> > > > > >  		poll(NULL, 0, 1);
> > > > > > @@ -296,10 +309,30 @@ void *rcu_read_stress_test(void *arg)
> > > > > >  	return (NULL);
> > > > > >  }
> > > > > >  
> > > > > > +static pthread_mutex_t call_rcu_test_mutex = PTHREAD_MUTEX_INITIALIZER;
> > > > > > +static pthread_cond_t call_rcu_test_cond = PTHREAD_COND_INITIALIZER;
> > > > > > +
> > > > > > +void rcu_update_stress_test_rcu(struct rcu_head *head)
> > > > > > +{
> > > > > > +	if (pthread_mutex_lock(&call_rcu_test_mutex) != 0) {
> > > > > > +		perror("pthread_mutex_lock");
> > > > > > +		exit(-1);
> > > > > > +	}
> > > > > > +	if (pthread_cond_signal(&call_rcu_test_cond) != 0) {
> > > > > > +		perror("pthread_cond_signal");
> > > > > > +		exit(-1);
> > > > > > +	}
> > > > > > +	if (pthread_mutex_unlock(&call_rcu_test_mutex) != 0) {
> > > > > > +		perror("pthread_mutex_unlock");
> > > > > > +		exit(-1);
> > > > > > +	}
> > > > > > +}
> > > > > > +
> > > > > >  void *rcu_update_stress_test(void *arg)
> > > > > >  {
> > > > > >  	int i;
> > > > > >  	struct rcu_stress *p;
> > > > > > +	struct rcu_head rh;
> > > > > >  
> > > > > >  	while (goflag == GOFLAG_INIT)
> > > > > >  		poll(NULL, 0, 1);
> > > > > > @@ -317,7 +350,24 @@ void *rcu_update_stress_test(void *arg)
> > > > > >  		for (i = 0; i < RCU_STRESS_PIPE_LEN; i++)
> > > > > >  			if (i != rcu_stress_idx)
> > > > > >  				rcu_stress_array[i].pipe_count++;
> > > > > > -		synchronize_rcu();
> > > > > > +		if (n_updates & 0x1)
> > > > > > +			synchronize_rcu();
> > > > > > +		else {
> > > > > > +			if (pthread_mutex_lock(&call_rcu_test_mutex) != 0) {
> > > > > > +				perror("pthread_mutex_lock");
> > > > > > +				exit(-1);
> > > > > > +			}
> > > > > > +			call_rcu(&rh, rcu_update_stress_test_rcu);
> > > > > > +			if (pthread_cond_wait(&call_rcu_test_cond,
> > > > > > +					      &call_rcu_test_mutex) != 0) {
> > > > > > +				perror("pthread_cond_wait");
> > > > > > +				exit(-1);
> > > > > > +			}
> > > > > > +			if (pthread_mutex_unlock(&call_rcu_test_mutex) != 0) {
> > > > > > +				perror("pthread_mutex_unlock");
> > > > > > +				exit(-1);
> > > > > > +			}
> > > > > > +		}
> > > > > >  		n_updates++;
> > > > > >  	}
> > > > > >  	return NULL;
> > > > > > @@ -325,6 +375,16 @@ void *rcu_update_stress_test(void *arg)
> > > > > >  
> > > > > >  void *rcu_fake_update_stress_test(void *arg)
> > > > > >  {
> > > > > > +	if ((random() & 0xf00) == 0) {
> > > > > > +		struct call_rcu_data *crdp;
> > > > > > +
> > > > > > +		crdp = create_call_rcu_data(0);
> > > > > > +		if (crdp != NULL) {
> > > > > > +			fprintf(stderr,
> > > > > > +				"Using per-thread call_rcu() worker.\n");
> > > > > > +			set_thread_call_rcu_data(crdp);
> > > > > > +		}
> > > > > > +	}
> > > > > >  	while (goflag == GOFLAG_INIT)
> > > > > >  		poll(NULL, 0, 1);
> > > > > >  	while (goflag == GOFLAG_RUN) {
> > > > > > @@ -396,6 +456,12 @@ int main(int argc, char *argv[])
> > > > > >  
> > > > > >  	smp_init();
> > > > > >  	//rcu_init();
> > > > > > +	srandom(time(NULL));
> > > > > > +	if (random() & 0x100) {
> > > > > > +		fprintf(stderr, "Allocating per-CPU call_rcu threads.\n");
> > > > > > +		if (create_all_cpu_call_rcu_data(0))
> > > > > > +			perror("create_all_cpu_call_rcu_data");
> > > > > > +	}
> > > > > >  
> > > > > >  #ifdef DEBUG_YIELD
> > > > > >  	yield_active |= YIELD_READ;
> > > > > > diff --git a/urcu-call-rcu.c b/urcu-call-rcu.c
> > > > > > new file mode 100644
> > > > > > index 0000000..5c003aa
> > > > > > --- /dev/null
> > > > > > +++ b/urcu-call-rcu.c
> > > > > > @@ -0,0 +1,450 @@
> > > > > > +/*
> > > > > > + * urcu-call-rcu.c
> > > > > > + *
> > > > > > + * Userspace RCU library - batch memory reclamation with kernel API
> > > > > > + *
> > > > > > + * Copyright (c) 2010 Paul E. McKenney <paulmck at linux.vnet.ibm.com>
> > > > > > + *
> > > > > > + * This library is free software; you can redistribute it and/or
> > > > > > + * modify it under the terms of the GNU Lesser General Public
> > > > > > + * License as published by the Free Software Foundation; either
> > > > > > + * version 2.1 of the License, or (at your option) any later version.
> > > > > > + *
> > > > > > + * This library is distributed in the hope that it will be useful,
> > > > > > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > > > > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > > > > > + * Lesser General Public License for more details.
> > > > > > + *
> > > > > > + * You should have received a copy of the GNU Lesser General Public
> > > > > > + * License along with this library; if not, write to the Free Software
> > > > > > + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> > > > > > + */
> > > > > > +
> > > > > > +#include <stdio.h>
> > > > > > +#include <pthread.h>
> > > > > > +#include <signal.h>
> > > > > > +#include <assert.h>
> > > > > > +#include <stdlib.h>
> > > > > > +#include <string.h>
> > > > > > +#include <errno.h>
> > > > > > +#include <poll.h>
> > > > > > +#include <sys/time.h>
> > > > > > +#include <syscall.h>
> > > > > > +#include <unistd.h>
> > > > > > +
> > > > > > +#include "config.h"
> > > > > > +#include "urcu/wfqueue.h"
> > > > > > +#include "urcu-call-rcu.h"
> > > > > > +#include "urcu-pointer.h"
> > > > > > +
> > > > > > +/* Data structure that identifies a call_rcu thread. */
> > > > > > +
> > > > > > +struct call_rcu_data {
> > > > > > +	struct cds_wfq_queue cbs;
> > > > > > +	unsigned long flags;
> > > > > > +	pthread_mutex_t mtx;
> > > > > > +	pthread_cond_t cond;
> > > > > > +	unsigned long qlen;
> > > > > > +	pthread_t tid;
> > > > > > +} __attribute__((aligned(CAA_CACHE_LINE_SIZE)));
> > > > > > +
> > > > > > +/* Link a thread using call_rcu() to its call_rcu thread. */
> > > > > > +
> > > > > > +static __thread struct call_rcu_data *thread_call_rcu_data;
> > > > > > +
> > > > > > +/* Guard call_rcu thread creation. */
> > > > > > +
> > > > > > +static pthread_mutex_t call_rcu_mutex = PTHREAD_MUTEX_INITIALIZER;
> > > > > > +
> > > > > > +/* If a given thread does not have its own call_rcu thread, this is default. */
> > > > > > +
> > > > > > +static struct call_rcu_data *default_call_rcu_data;
> > > > > > +
> > > > > > +extern void synchronize_rcu(void);
> > > > > > +
> > > > > > +/*
> > > > > > + * If the sched_getcpu() and sysconf(_SC_NPROCESSORS_CONF) calls are
> > > > > > + * available, then we can have call_rcu threads assigned to individual
> > > > > > + * CPUs rather than only to specific threads.
> > > > > > + */
> > > > > > +
> > > > > > +#if defined(HAVE_SCHED_GETCPU) && defined(HAVE_SYSCONF)
> > > > > > +
> > > > > > +/*
> > > > > > + * Pointer to array of pointers to per-CPU call_rcu_data structures
> > > > > > + * and # CPUs.
> > > > > > + */
> > > > > > +
> > > > > > +static struct call_rcu_data **per_cpu_call_rcu_data;
> > > > > > +static long maxcpus;
> > > > > > +
> > > > > > +/* Allocate the array if it has not already been allocated. */
> > > > > > +
> > > > > > +static void alloc_cpu_call_rcu_data(void)
> > > > > > +{
> > > > > > +	struct call_rcu_data **p;
> > > > > > +	static int warned = 0;
> > > > > > +
> > > > > > +	if (maxcpus != 0)
> > > > > > +		return;
> > > > > > +	maxcpus = sysconf(_SC_NPROCESSORS_CONF);
> > > > > > +	if (maxcpus <= 0) {
> > > > > > +		return;
> > > > > > +	}
> > > > > > +	p = malloc(maxcpus * sizeof(*per_cpu_call_rcu_data));
> > > > > > +	if (p != NULL) {
> > > > > > +		memset(p, '\0', maxcpus * sizeof(*per_cpu_call_rcu_data));
> > > > > > +		per_cpu_call_rcu_data = p;
> > > > > > +	} else {
> > > > > > +		if (!warned) {
> > > > > > +			fprintf(stderr, "[error] liburcu: unable to allocate per-CPU pointer array\n");
> > > > > > +		}
> > > > > > +		warned = 1;
> > > > > > +	}
> > > > > > +}
> > > > > > +
> > > > > > +#else /* #if defined(HAVE_SCHED_GETCPU) && defined(HAVE_SYSCONF) */
> > > > > > +
> > > > > > +static const struct call_rcu_data **per_cpu_call_rcu_data = NULL;
> > > > > > +static const long maxcpus = -1;
> > > > > > +
> > > > > > +static void alloc_cpu_call_rcu_data(void)
> > > > > > +{
> > > > > > +}
> > > > > > +
> > > > > > +static int sched_getcpu(void)
> > > > > > +{
> > > > > > +	return -1;
> > > > > > +}
> > > > > > +
> > > > > > +#endif /* #else #if defined(HAVE_SCHED_GETCPU) && defined(HAVE_SYSCONF) */
> > > > > > +
> > > > > > +/* Acquire the specified pthread mutex. */
> > > > > > +
> > > > > > +static void call_rcu_lock(pthread_mutex_t *pmp)
> > > > > > +{
> > > > > > +	if (pthread_mutex_lock(pmp) != 0) {
> > > > > > +		perror("pthread_mutex_lock");
> > > > > > +		exit(-1);
> > > > > > +	}
> > > > > > +}
> > > > > > +
> > > > > > +/* Release the specified pthread mutex. */
> > > > > > +
> > > > > > +static void call_rcu_unlock(pthread_mutex_t *pmp)
> > > > > > +{
> > > > > > +	if (pthread_mutex_unlock(pmp) != 0) {
> > > > > > +		perror("pthread_mutex_unlock");
> > > > > > +		exit(-1);
> > > > > > +	}
> > > > > > +}
> > > > > > +
> > > > > > +/* This is the code run by each call_rcu thread. */
> > > > > > +
> > > > > > +static void *call_rcu_thread(void *arg)
> > > > > > +{
> > > > > > +	unsigned long cbcount;
> > > > > > +	struct cds_wfq_node *cbs;
> > > > > > +	struct cds_wfq_node **cbs_tail;
> > > > > > +	struct call_rcu_data *crdp = (struct call_rcu_data *)arg;
> > > > > > +	struct rcu_head *rhp;
> > > > > > +
> > > > > > +	thread_call_rcu_data = crdp;
> > > > > > +	for (;;) {
> > > > > > +		if (&crdp->cbs.head != _CMM_LOAD_SHARED(crdp->cbs.tail)) {
> > > > > > +			while ((cbs = _CMM_LOAD_SHARED(crdp->cbs.head)) == NULL)
> > > > > > +				poll(NULL, 0, 1);
> > > > > > +			_CMM_STORE_SHARED(crdp->cbs.head, NULL);
> > > > > > +			cbs_tail = (struct cds_wfq_node **)
> > > > > > +				uatomic_xchg(&crdp->cbs.tail, &crdp->cbs.head);
> > > > > > +			synchronize_rcu();
> > > > > > +			cbcount = 0;
> > > > > > +			do {
> > > > > > +				while (cbs->next == NULL &&
> > > > > > +				       &cbs->next != cbs_tail)
> > > > > > +				       	poll(NULL, 0, 1);
> > > > > > +				if (cbs == &crdp->cbs.dummy) {
> > > > > > +					cbs = cbs->next;
> > > > > > +					continue;
> > > > > > +				}
> > > > > > +				rhp = (struct rcu_head *)cbs;
> > > > > > +				cbs = cbs->next;
> > > > > > +				rhp->func(rhp);
> > > > > > +				cbcount++;
> > > > > > +			} while (cbs != NULL);
> > > > > > +			uatomic_sub(&crdp->qlen, cbcount);
> > > > > > +		}
> > > > > > +		if (crdp->flags & URCU_CALL_RCU_RT)
> > > > > > +			poll(NULL, 0, 10);
> > > > > > +		else {
> > > > > > +			call_rcu_lock(&crdp->mtx);
> > > > > > +			_CMM_STORE_SHARED(crdp->flags,
> > > > > > +				     crdp->flags & ~URCU_CALL_RCU_RUNNING);
> > > > > > +			if (&crdp->cbs.head ==
> > > > > > +			    _CMM_LOAD_SHARED(crdp->cbs.tail) &&
> > > > > > +			    pthread_cond_wait(&crdp->cond, &crdp->mtx) != 0) {
> > > > > > +				perror("pthread_cond_wait");
> > > > > > +				exit(-1);
> > > > > > +			}
> > > > > > +			_CMM_STORE_SHARED(crdp->flags,
> > > > > > +				     crdp->flags | URCU_CALL_RCU_RUNNING);
> > > > > > +			poll(NULL, 0, 10);
> > > > > > +			call_rcu_unlock(&crdp->mtx);
> > > > > > +		}
> > > > > > +	}
> > > > > > +	return NULL;  /* NOTREACHED */
> > > > > > +}
> > > > > > +
> > > > > > +/*
> > > > > > + * Create both a call_rcu thread and the corresponding call_rcu_data
> > > > > > + * structure, linking the structure in as specified.
> > > > > > + */
> > > > > > +
> > > > > > +void call_rcu_data_init(struct call_rcu_data **crdpp, unsigned long flags)
> > > > > > +{
> > > > > > +	struct call_rcu_data *crdp;
> > > > > > +
> > > > > > +	crdp = malloc(sizeof(*crdp));
> > > > > > +	if (crdp == NULL) {
> > > > > > +		fprintf(stderr, "Out of memory.\n");
> > > > > > +		exit(-1);
> > > > > > +	}
> > > > > > +	memset(crdp, '\0', sizeof(*crdp));
> > > > > > +	cds_wfq_init(&crdp->cbs);
> > > > > > +	crdp->qlen = 0;
> > > > > > +	if (pthread_mutex_init(&crdp->mtx, NULL) != 0) {
> > > > > > +		perror("pthread_mutex_init");
> > > > > > +		exit(-1);
> > > > > > +	}
> > > > > > +	if (pthread_cond_init(&crdp->cond, NULL) != 0) {
> > > > > > +		perror("pthread_cond_init");
> > > > > > +		exit(-1);
> > > > > > +	}
> > > > > > +	crdp->flags = flags | URCU_CALL_RCU_RUNNING;
> > > > > > +	cmm_smp_mb();  /* Structure initialized before pointer is planted. */
> > > > > > +	*crdpp = crdp;
> > > > > > +	if (pthread_create(&crdp->tid, NULL, call_rcu_thread, crdp) != 0) {
> > > > > > +		perror("pthread_create");
> > > > > > +		exit(-1);
> > > > > > +	}
> > > > > > +}
> > > > > > +
> > > > > > +/*
> > > > > > + * Return a pointer to the call_rcu_data structure for the specified
> > > > > > + * CPU, returning NULL if there is none.  We cannot automatically
> > > > > > > + * create it because the platform we are running on might not define
> > > > > > + * sched_getcpu().
> > > > > > + */
> > > > > > +
> > > > > > +struct call_rcu_data *get_cpu_call_rcu_data(int cpu)
> > > > > > +{
> > > > > > +	static int warned = 0;
> > > > > > +
> > > > > > +	if (per_cpu_call_rcu_data == NULL)
> > > > > > +		return NULL;
> > > > > > +	if (!warned && maxcpus > 0 && (cpu < 0 || maxcpus <= cpu)) {
> > > > > > +		fprintf(stderr, "[error] liburcu: get CPU # out of range\n");
> > > > > > +		warned = 1;
> > > > > > +	}
> > > > > > +	if (cpu < 0 || maxcpus <= cpu)
> > > > > > +		return NULL;
> > > > > > +	return per_cpu_call_rcu_data[cpu];
> > > > > > +}
> > > > > > +
> > > > > > +/*
> > > > > > + * Return the tid corresponding to the call_rcu thread whose
> > > > > > + * call_rcu_data structure is specified.
> > > > > > + */
> > > > > > +
> > > > > > +pthread_t get_call_rcu_thread(struct call_rcu_data *crdp)
> > > > > > +{
> > > > > > +	return crdp->tid;
> > > > > > +}
> > > > > > +
> > > > > > +/*
> > > > > > + * Create a call_rcu_data structure (with thread) and return a pointer.
> > > > > > + */
> > > > > > +
> > > > > > +struct call_rcu_data *create_call_rcu_data(unsigned long flags)
> > > > > > +{
> > > > > > +	struct call_rcu_data *crdp;
> > > > > > +
> > > > > > +	call_rcu_data_init(&crdp, flags);
> > > > > > +	return crdp;
> > > > > > +}
> > > > > > +
> > > > > > +/*
> > > > > > + * Set the specified CPU to use the specified call_rcu_data structure.
> > > > > > + */
> > > > > > +
> > > > > > +int set_cpu_call_rcu_data(int cpu, struct call_rcu_data *crdp)
> > > > > > +{
> > > > > > +	int warned = 0;
> > > > > > +
> > > > > > +	call_rcu_lock(&call_rcu_mutex);
> > > > > > +	if (cpu < 0 || maxcpus <= cpu) {
> > > > > > +		if (!warned) {
> > > > > > +			fprintf(stderr, "[error] liburcu: set CPU # out of range\n");
> > > > > > +			warned = 1;
> > > > > > +		}
> > > > > > +		call_rcu_unlock(&call_rcu_mutex);
> > > > > > +		errno = EINVAL;
> > > > > > +		return -EINVAL;
> > > > > > +	}
> > > > > > +	alloc_cpu_call_rcu_data();
> > > > > > +	call_rcu_unlock(&call_rcu_mutex);
> > > > > > +	if (per_cpu_call_rcu_data == NULL) {
> > > > > > +		errno = ENOMEM;
> > > > > > +		return -ENOMEM;
> > > > > > +	}
> > > > > > +	per_cpu_call_rcu_data[cpu] = crdp;
> > > > > > +	return 0;
> > > > > > +}
> > > > > > +
> > > > > > +/*
> > > > > > + * Return a pointer to the default call_rcu_data structure, creating
> > > > > > + * one if need be.  Because we never free call_rcu_data structures,
> > > > > > + * we don't need to be in an RCU read-side critical section.
> > > > > > + */
> > > > > > +
> > > > > > +struct call_rcu_data *get_default_call_rcu_data(void)
> > > > > > +{
> > > > > > +	if (default_call_rcu_data != NULL)
> > > > > > +		return rcu_dereference(default_call_rcu_data);
> > > > > > +	call_rcu_lock(&call_rcu_mutex);
> > > > > > +	if (default_call_rcu_data != NULL) {
> > > > > > +		call_rcu_unlock(&call_rcu_mutex);
> > > > > > +		return default_call_rcu_data;
> > > > > > +	}
> > > > > > +	call_rcu_data_init(&default_call_rcu_data, 0);
> > > > > > +	call_rcu_unlock(&call_rcu_mutex);
> > > > > > +	return default_call_rcu_data;
> > > > > > +}
> > > > > > +
> > > > > > +/*
> > > > > > + * Return the call_rcu_data structure that applies to the currently
> > > > > > + * running thread.  Any call_rcu_data structure assigned specifically
> > > > > > + * to this thread has first priority, followed by any call_rcu_data
> > > > > > + * structure assigned to the CPU on which the thread is running,
> > > > > > + * followed by the default call_rcu_data structure.  If there is not
> > > > > > + * yet a default call_rcu_data structure, one will be created.
> > > > > > + */
> > > > > > +struct call_rcu_data *get_call_rcu_data(void)
> > > > > > +{
> > > > > > +	int curcpu;
> > > > > > +	static int warned = 0;
> > > > > > +
> > > > > > +	if (thread_call_rcu_data != NULL)
> > > > > > +		return thread_call_rcu_data;
> > > > > > +	if (maxcpus <= 0)
> > > > > > +		return get_default_call_rcu_data();
> > > > > > +	curcpu = sched_getcpu();
> > > > > > +	if (!warned && (curcpu < 0 || maxcpus <= curcpu)) {
> > > > > > +		fprintf(stderr, "[error] liburcu: gcrd CPU # out of range\n");
> > > > > > +		warned = 1;
> > > > > > +	}
> > > > > > +	if (curcpu >= 0 && maxcpus > curcpu &&
> > > > > > +	    per_cpu_call_rcu_data != NULL &&
> > > > > > +	    per_cpu_call_rcu_data[curcpu] != NULL)
> > > > > > +	    	return per_cpu_call_rcu_data[curcpu];
> > > > > > +	return get_default_call_rcu_data();
> > > > > > +}
> > > > > > +
> > > > > > +/*
> > > > > > + * Return a pointer to this task's call_rcu_data if there is one.
> > > > > > + */
> > > > > > +
> > > > > > +struct call_rcu_data *get_thread_call_rcu_data(void)
> > > > > > +{
> > > > > > +	return thread_call_rcu_data;
> > > > > > +}
> > > > > > +
> > > > > > +/*
> > > > > > + * Set this task's call_rcu_data structure as specified, regardless
> > > > > > + * of whether or not this task already had one.  (This allows switching
> > > > > > + * to and from real-time call_rcu threads, for example.)
> > > > > > + */
> > > > > > +
> > > > > > +void set_thread_call_rcu_data(struct call_rcu_data *crdp)
> > > > > > +{
> > > > > > +	thread_call_rcu_data = crdp;
> > > > > > +}
> > > > > > +
> > > > > > +/*
> > > > > > + * Create a separate call_rcu thread for each CPU.  This does not
> > > > > > + * replace a pre-existing call_rcu thread -- use the set_cpu_call_rcu_data()
> > > > > > + * function if you want that behavior.
> > > > > > + */
> > > > > > +
> > > > > > +int create_all_cpu_call_rcu_data(unsigned long flags)
> > > > > > +{
> > > > > > +	int i;
> > > > > > +	struct call_rcu_data *crdp;
> > > > > > +	int ret;
> > > > > > +
> > > > > > +	call_rcu_lock(&call_rcu_mutex);
> > > > > > +	alloc_cpu_call_rcu_data();
> > > > > > +	call_rcu_unlock(&call_rcu_mutex);
> > > > > > +	if (maxcpus <= 0) {
> > > > > > +		errno = EINVAL;
> > > > > > +		return -EINVAL;
> > > > > > +	}
> > > > > > +	if (per_cpu_call_rcu_data == NULL) {
> > > > > > +		errno = ENOMEM;
> > > > > > +		return -ENOMEM;
> > > > > > +	}
> > > > > > +	for (i = 0; i < maxcpus; i++) {
> > > > > > +		call_rcu_lock(&call_rcu_mutex);
> > > > > > +		if (get_cpu_call_rcu_data(i)) {
> > > > > > +			call_rcu_unlock(&call_rcu_mutex);
> > > > > > +			continue;
> > > > > > +		}
> > > > > > +		crdp = create_call_rcu_data(flags);
> > > > > > +		if (crdp == NULL) {
> > > > > > +			call_rcu_unlock(&call_rcu_mutex);
> > > > > > +			errno = ENOMEM;
> > > > > > +			return -ENOMEM;
> > > > > > +		}
> > > > > > +		call_rcu_unlock(&call_rcu_mutex);
> > > > > > +		if ((ret = set_cpu_call_rcu_data(i, crdp)) != 0) {
> > > > > > +			/* FIXME: Leaks crdp for now. */
> > > > > > +			return ret; /* Can happen on race. */
> > > > > > +		}
> > > > > > +	}
> > > > > > +	return 0;
> > > > > > +}
> > > > > > +
> > > > > > +/*
> > > > > > + * Schedule a function to be invoked after a following grace period.
> > > > > > + * This is the only function that must be called -- the others are
> > > > > > + * only present to allow applications to tune their use of RCU for
> > > > > > + * maximum performance.
> > > > > > + *
> > > > > > > + * Note that unless a call_rcu thread has already been created,
> > > > > > + * the first invocation of call_rcu() will create one.  So, if you
> > > > > > + * need the first invocation of call_rcu() to be fast, make sure
> > > > > > + * to create a call_rcu thread first.  One way to accomplish this is
> > > > > > + * "get_call_rcu_data();", and another is create_all_cpu_call_rcu_data().
> > > > > > + */
> > > > > > +
> > > > > > +void call_rcu(struct rcu_head *head,
> > > > > > +	      void (*func)(struct rcu_head *head))
> > > > > > +{
> > > > > > +	struct call_rcu_data *crdp;
> > > > > > +
> > > > > > +	cds_wfq_node_init(&head->next);
> > > > > > +	head->func = func;
> > > > > > +	crdp = get_call_rcu_data();
> > > > > > +	cds_wfq_enqueue(&crdp->cbs, &head->next);
> > > > > > +	uatomic_inc(&crdp->qlen);
> > > > > > +	if (!(_CMM_LOAD_SHARED(crdp->flags) & URCU_CALL_RCU_RT)) {
> > > > > > +		call_rcu_lock(&crdp->mtx);
> > > > > > +		if (!(_CMM_LOAD_SHARED(crdp->flags) & URCU_CALL_RCU_RUNNING)) {
> > > > > > +			if (pthread_cond_signal(&crdp->cond) != 0) {
> > > > > > +				perror("pthread_cond_signal");
> > > > > > +				exit(-1);
> > > > > > +			}
> > > > > > +		}
> > > > > > +		call_rcu_unlock(&crdp->mtx);
> > > > > > +	}
> > > > > > +}
> > > > > > diff --git a/urcu-call-rcu.h b/urcu-call-rcu.h
> > > > > > new file mode 100644
> > > > > > index 0000000..2c13388
> > > > > > --- /dev/null
> > > > > > +++ b/urcu-call-rcu.h
> > > > > > @@ -0,0 +1,80 @@
> > > > > > +#ifndef _URCU_CALL_RCU_H
> > > > > > +#define _URCU_CALL_RCU_H
> > > > > > +
> > > > > > +/*
> > > > > > + * urcu-call-rcu.h
> > > > > > + *
> > > > > > + * Userspace RCU header - deferred execution
> > > > > > + *
> > > > > > + * Copyright (c) 2009 Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
> > > > > > + * Copyright (c) 2009 Paul E. McKenney, IBM Corporation.
> > > > > > + *
> > > > > > + * LGPL-compatible code should include this header with :
> > > > > > + *
> > > > > > + * #define _LGPL_SOURCE
> > > > > > > + * #include <urcu-call-rcu.h>
> > > > > > + *
> > > > > > + * This library is free software; you can redistribute it and/or
> > > > > > + * modify it under the terms of the GNU Lesser General Public
> > > > > > + * License as published by the Free Software Foundation; either
> > > > > > + * version 2.1 of the License, or (at your option) any later version.
> > > > > > + *
> > > > > > + * This library is distributed in the hope that it will be useful,
> > > > > > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > > > > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > > > > > + * Lesser General Public License for more details.
> > > > > > + *
> > > > > > + * You should have received a copy of the GNU Lesser General Public
> > > > > > + * License along with this library; if not, write to the Free Software
> > > > > > + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> > > > > > + */
> > > > > > +
> > > > > > +#include <stdlib.h>
> > > > > > +#include <pthread.h>
> > > > > > +
> > > > > > +#include <urcu/wfqueue.h>
> > > > > > +
> > > > > > +#ifdef __cplusplus
> > > > > > +extern "C" {
> > > > > > +#endif
> > > > > > +
> > > > > > +/* Note that struct call_rcu_data is opaque to callers. */
> > > > > > +
> > > > > > +struct call_rcu_data;
> > > > > > +
> > > > > > +/* Flag values. */
> > > > > > +
> > > > > > +#define URCU_CALL_RCU_RT	0x1
> > > > > > +#define URCU_CALL_RCU_RUNNING	0x2
> > > > > > +
> > > > > > +/*
> > > > > > + * The rcu_head data structure is placed in the structure to be freed
> > > > > > + * via call_rcu().
> > > > > > + */
> > > > > > +
> > > > > > +struct rcu_head {
> > > > > > +	struct cds_wfq_node next;
> > > > > > +	void (*func)(struct rcu_head *head);
> > > > > > +};
> > > > > > +
> > > > > > +/*
> > > > > > + * Exported functions
> > > > > > + */
> > > > > > +void call_rcu_data_init(struct call_rcu_data **crdpp, unsigned long flags);
> > > > > > +struct call_rcu_data *get_cpu_call_rcu_data(int cpu);
> > > > > > +pthread_t get_call_rcu_thread(struct call_rcu_data *crdp);
> > > > > > +struct call_rcu_data *create_call_rcu_data(unsigned long flags);
> > > > > > +int set_cpu_call_rcu_data(int cpu, struct call_rcu_data *crdp);
> > > > > > +struct call_rcu_data *get_default_call_rcu_data(void);
> > > > > > +struct call_rcu_data *get_call_rcu_data(void);
> > > > > > +struct call_rcu_data *get_thread_call_rcu_data(void);
> > > > > > +void set_thread_call_rcu_data(struct call_rcu_data *crdp);
> > > > > > +int create_all_cpu_call_rcu_data(unsigned long flags);
> > > > > > +void call_rcu(struct rcu_head *head,
> > > > > > +	      void (*func)(struct rcu_head *head));
> > > > > > +
> > > > > > > +#ifdef __cplusplus
> > > > > > +}
> > > > > > +#endif
> > > > > > +
> > > > > > +#endif /* _URCU_CALL_RCU_H */
> > > > > > diff --git a/urcu-defer.h b/urcu-defer.h
> > > > > > index e161616..a64c75c 100644
> > > > > > --- a/urcu-defer.h
> > > > > > +++ b/urcu-defer.h
> > > > > > @@ -53,14 +53,6 @@ extern "C" {
> > > > > >  extern void defer_rcu(void (*fct)(void *p), void *p);
> > > > > >  
> > > > > >  /*
> > > > > > - * call_rcu will eventually be implemented with an API similar to the Linux
> > > > > > - * kernel call_rcu(), which will allow its use within RCU read-side C.S.
> > > > > > - * Generate an error if used for now.
> > > > > > - */
> > > > > > -
> > > > > > -#define call_rcu	__error_call_rcu_not_implemented_please_use_defer_rcu
> > > > > > -
> > > > > > -/*
> > > > > >   * Thread registration for reclamation.
> > > > > >   */
> > > > > >  extern void rcu_defer_register_thread(void);
> > > > > > diff --git a/urcu/wfqueue-static.h b/urcu/wfqueue-static.h
> > > > > > index 30d6e96..790931b 100644
> > > > > > --- a/urcu/wfqueue-static.h
> > > > > > +++ b/urcu/wfqueue-static.h
> > > > > > @@ -28,6 +28,7 @@
> > > > > >  
> > > > > >  #include <pthread.h>
> > > > > >  #include <assert.h>
> > > > > > +#include <poll.h>
> > > > > >  #include <urcu/compiler.h>
> > > > > >  #include <urcu/uatomic_arch.h>
> > > > > >  
> > > > > > @@ -47,12 +48,12 @@ extern "C" {
> > > > > >  #define WFQ_ADAPT_ATTEMPTS		10	/* Retry if being set */
> > > > > >  #define WFQ_WAIT			10	/* Wait 10 ms if being set */
> > > > > >  
> > > > > > -void _cds_wfq_node_init(struct cds_wfq_node *node)
> > > > > > +static inline void _cds_wfq_node_init(struct cds_wfq_node *node)
> > > > > >  {
> > > > > >  	node->next = NULL;
> > > > > >  }
> > > > > >  
> > > > > > -void _cds_wfq_init(struct cds_wfq_queue *q)
> > > > > > +static inline void _cds_wfq_init(struct cds_wfq_queue *q)
> > > > > >  {
> > > > > >  	int ret;
> > > > > >  
> > > > > > @@ -64,7 +65,8 @@ void _cds_wfq_init(struct cds_wfq_queue *q)
> > > > > >  	assert(!ret);
> > > > > >  }
> > > > > >  
> > > > > > -void _cds_wfq_enqueue(struct cds_wfq_queue *q, struct cds_wfq_node *node)
> > > > > > +static inline void _cds_wfq_enqueue(struct cds_wfq_queue *q,
> > > > > > +				    struct cds_wfq_node *node)
> > > > > >  {
> > > > > >  	struct cds_wfq_node **old_tail;
> > > > > >  
> > > > > > @@ -90,7 +92,7 @@ void _cds_wfq_enqueue(struct cds_wfq_queue *q, struct cds_wfq_node *node)
> > > > > >   * thread to be scheduled. The queue appears empty until tail->next is set by
> > > > > >   * enqueue.
> > > > > >   */
> > > > > > -struct cds_wfq_node *
> > > > > > +static inline struct cds_wfq_node *
> > > > > >  ___cds_wfq_dequeue_blocking(struct cds_wfq_queue *q)
> > > > > >  {
> > > > > >  	struct cds_wfq_node *node, *next;
> > > > > > @@ -128,7 +130,7 @@ ___cds_wfq_dequeue_blocking(struct cds_wfq_queue *q)
> > > > > >  	return node;
> > > > > >  }
> > > > > >  
> > > > > > -struct cds_wfq_node *
> > > > > > +static inline struct cds_wfq_node *
> > > > > >  _cds_wfq_dequeue_blocking(struct cds_wfq_queue *q)
> > > > > >  {
> > > > > >  	struct cds_wfq_node *retnode;
> > > > > > diff --git a/urcu/wfstack-static.h b/urcu/wfstack-static.h
> > > > > > index eed83da..ff18c4a 100644
> > > > > > --- a/urcu/wfstack-static.h
> > > > > > +++ b/urcu/wfstack-static.h
> > > > > > @@ -28,6 +28,7 @@
> > > > > >  
> > > > > >  #include <pthread.h>
> > > > > >  #include <assert.h>
> > > > > > +#include <poll.h>
> > > > > >  #include <urcu/compiler.h>
> > > > > >  #include <urcu/uatomic_arch.h>
> > > > > >  
> > > > > 
> > > > > -- 
> > > > > Mathieu Desnoyers
> > > > > Operating System Efficiency R&D Consultant
> > > > > EfficiOS Inc.
> > > > > http://www.efficios.com
> > > > > 
> > > > > _______________________________________________
> > > > > rp mailing list
> > > > > rp at svcs.cs.pdx.edu
> > > > > http://svcs.cs.pdx.edu/mailman/listinfo/rp
> > > 
> > > -- 
> > > Mathieu Desnoyers
> > > Operating System Efficiency R&D Consultant
> > > EfficiOS Inc.
> > > http://www.efficios.com
> 
> -- 
> Mathieu Desnoyers
> Operating System Efficiency R&D Consultant
> EfficiOS Inc.
> http://www.efficios.com

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com