[lttng-dev] [PATCH] urcu-mb/signal/membarrier: batch concurrent synchronize_rcu()
Mathieu Desnoyers
mathieu.desnoyers at efficios.com
Sun Nov 25 22:22:20 EST 2012
Here are benchmarks of batching synchronize_rcu(), which leads to a
very interesting scalability improvement and speedup, e.g., on a
24-core AMD, with a write-heavy scenario (4 reader threads, 20 updater
threads, each updater calling synchronize_rcu()):
* Serialized grace periods:
./test_urcu 4 20 20
SUMMARY ./test_urcu testdur 20 nr_readers 4
rdur 0 wdur 0 nr_writers 20 wdelay 0
nr_reads 714598368 nr_writes 5032889 nr_ops 719631257
* Batched grace periods:
./test_urcu 4 20 20
SUMMARY ./test_urcu testdur 20 nr_readers 4
rdur 0 wdur 0 nr_writers 20 wdelay 0
nr_reads 611848168 nr_writes 9877965 nr_ops 621726133
That is a 9877965/5032889 = 1.96x speedup in write throughput for 20 updaters.
Of course, we can see that readers have slowed down, probably due to the
increased update traffic, given that there is no change whatsoever to the
read-side code.
Now let's look at the penalty of managing the stack for a single updater.
With 4 readers, single updater:
* Serialized grace periods:
./test_urcu 4 1 20
SUMMARY ./test_urcu testdur 20 nr_readers 4
rdur 0 wdur 0 nr_writers 1 wdelay 0
nr_reads 241959144 nr_writes 11146189 nr_ops 253105333
SUMMARY ./test_urcu testdur 20 nr_readers 4
rdur 0 wdur 0 nr_writers 1 wdelay 0
nr_reads 257131080 nr_writes 12310537 nr_ops 269441617
SUMMARY ./test_urcu testdur 20 nr_readers 4
rdur 0 wdur 0 nr_writers 1 wdelay 0
nr_reads 259973359 nr_writes 12203025 nr_ops 272176384
* Batched grace periods:
SUMMARY ./test_urcu testdur 20 nr_readers 4
rdur 0 wdur 0 nr_writers 1 wdelay 0
nr_reads 298926555 nr_writes 14018748 nr_ops 312945303
SUMMARY ./test_urcu testdur 20 nr_readers 4
rdur 0 wdur 0 nr_writers 1 wdelay 0
nr_reads 272411290 nr_writes 12832166 nr_ops 285243456
SUMMARY ./test_urcu testdur 20 nr_readers 4
rdur 0 wdur 0 nr_writers 1 wdelay 0
nr_reads 267511858 nr_writes 12822026 nr_ops 280333884
Serialized vs batched seem to be similar, with batched possibly even slightly
faster, but this is probably caused by NUMA affinity.
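
To recap the scheme implemented by the patch below: concurrent callers of
synchronize_rcu() add themselves to the gp_waiters queue; the first caller
to get in becomes the grace-period leader, moves all queued waiters onto a
local list, performs the grace period under rcu_gp_lock, and only wakes
everyone up once the final memory barriers have been issued. The urcu-wait.h
helpers themselves (urcu_wait_add(), urcu_adaptative_busy_wait(),
urcu_move_waiters(), urcu_wake_all_waiters()) are not part of this patch, so
here is a minimal, self-contained sketch of the pattern using C11 atomics and
simplified stand-ins (struct waiter, do_grace_period()), rather than the
actual urcu-wait.h implementation:

/*
 * Minimal sketch of the batching pattern, NOT the actual urcu-wait.h
 * code: struct waiter, gp_waiters and do_grace_period() are simplified
 * stand-ins. Concurrent callers push themselves onto a LIFO stack; the
 * caller that finds the stack empty becomes the leader, runs the grace
 * period, then wakes every waiter it grabbed.
 */
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct waiter {
        struct waiter *next;
        atomic_bool done;               /* set by the leader after the GP */
};

static _Atomic(struct waiter *) gp_waiters;     /* stack of pending waiters */

static void do_grace_period(void)
{
        /* Placeholder: the real code waits for pre-existing readers
         * while holding rcu_gp_lock. */
}

void synchronize_rcu_batched(void)
{
        struct waiter self;
        struct waiter *head, *w;

        self.next = NULL;
        atomic_init(&self.done, false);

        /* Push ourself onto the stack of pending waiters. */
        head = atomic_load(&gp_waiters);
        do {
                self.next = head;
        } while (!atomic_compare_exchange_weak(&gp_waiters, &head, &self));

        if (head != NULL) {
                /* Not first in: a leader will run the GP and wake us up. */
                while (!atomic_load(&self.done))
                        sched_yield();  /* real code: adaptative busy-wait */
                return;
        }

        /* First in: become the leader, grab every waiter queued so far. */
        head = atomic_exchange(&gp_waiters, NULL);

        do_grace_period();

        /*
         * Wake all batched waiters. Read ->next before setting ->done:
         * once done is set, the waiter may return and its stack-allocated
         * node disappears.
         */
        for (w = head; w != NULL; ) {
                struct waiter *next = w->next;

                atomic_store(&w->done, true);
                w = next;
        }
}

The important detail is the wake loop reading ->next before publishing
->done: the wait node in the patch is a local variable on each waiter's
stack, so it may vanish as soon as the waiter is released.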
CC: Paul E. McKenney <paulmck at linux.vnet.ibm.com>
CC: Lai Jiangshan <laijs at cn.fujitsu.com>
CC: Alan Stern <stern at rowland.harvard.edu>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
---
diff --git a/urcu.c b/urcu.c
index e6ff0f3..836bad9 100644
--- a/urcu.c
+++ b/urcu.c
@@ -43,6 +43,7 @@
#include "urcu/tls-compat.h"
#include "urcu-die.h"
+#include "urcu-wait.h"
/* Do not #define _LGPL_SOURCE to ensure we can emit the wrapper symbols */
#undef _LGPL_SOURCE
@@ -106,6 +107,12 @@ DEFINE_URCU_TLS(unsigned int, rcu_rand_yield);
static CDS_LIST_HEAD(registry);
+/*
+ * Queue of threads awaiting their turn to wait for a grace period.
+ * Contains struct gp_waiters_thread objects.
+ */
+static DEFINE_URCU_WAIT_QUEUE(gp_waiters);
+
static void mutex_lock(pthread_mutex_t *mutex)
{
int ret;
@@ -306,9 +313,31 @@ void synchronize_rcu(void)
{
CDS_LIST_HEAD(cur_snap_readers);
CDS_LIST_HEAD(qsreaders);
+ DEFINE_URCU_WAIT_NODE(wait, URCU_WAIT_WAITING);
+ struct urcu_waiters waiters;
+
+ /*
+ * Add ourself to gp_waiters queue of threads awaiting to wait
+ * for a grace period. Proceed to perform the grace period only
+ * if we are the first thread added into the queue.
+ */
+ if (urcu_wait_add(&gp_waiters, &wait) != 0) {
+ /* Not first in queue: will be awakened by another thread. */
+ urcu_adaptative_busy_wait(&wait);
+ /* Order following memory accesses after grace period. */
+ cmm_smp_mb();
+ return;
+ }
+ /* We won't need to wake ourself up */
+ urcu_wait_set_state(&wait, URCU_WAIT_RUNNING);
mutex_lock(&rcu_gp_lock);
+ /*
+ * Move all waiters into our local queue.
+ */
+ urcu_move_waiters(&waiters, &gp_waiters);
+
if (cds_list_empty(&registry))
goto out;
@@ -374,6 +403,13 @@ void synchronize_rcu(void)
smp_mb_master(RCU_MB_GROUP);
out:
mutex_unlock(&rcu_gp_lock);
+
+ /*
+ * Wakeup waiters only after we have completed the grace period
+ * and have ensured the memory barriers at the end of the grace
+ * period have been issued.
+ */
+ urcu_wake_all_waiters(&waiters);
}
/*
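
For completeness, here is a rough caller-side illustration of the write-heavy
scenario benchmarked above (this is not the actual test_urcu source, just a
sketch): each updater thread replaces a shared pointer and calls
synchronize_rcu() before freeing the old copy, so with this patch concurrent
updaters end up sharing grace periods. Link against the flavor under test,
e.g. -lurcu, -lurcu-mb or -lurcu-signal, plus -lpthread.

/*
 * Rough caller-side sketch of the write-heavy scenario (NOT the actual
 * test_urcu source): NR_UPDATERS threads each replace a shared pointer
 * and call synchronize_rcu() before freeing the old copy.
 */
#include <pthread.h>
#include <stdlib.h>
#include <urcu.h>
#include <urcu-pointer.h>       /* rcu_xchg_pointer() */

#define NR_UPDATERS     20
#define NR_UPDATES      10000

static int *shared_ptr;

static void *updater(void *arg)
{
        int i;

        (void) arg;
        rcu_register_thread();
        for (i = 0; i < NR_UPDATES; i++) {
                int *new = malloc(sizeof(*new)), *old;

                *new = i;
                old = rcu_xchg_pointer(&shared_ptr, new);
                synchronize_rcu();      /* batched with other updaters */
                free(old);
        }
        rcu_unregister_thread();
        return NULL;
}

int main(void)
{
        pthread_t tid[NR_UPDATERS];
        int i;

        shared_ptr = calloc(1, sizeof(*shared_ptr));
        for (i = 0; i < NR_UPDATERS; i++)
                pthread_create(&tid[i], NULL, updater, NULL);
        for (i = 0; i < NR_UPDATERS; i++)
                pthread_join(tid[i], NULL);
        free(shared_ptr);
        return 0;
}

Before the patch, these 20 concurrent synchronize_rcu() calls each serialize
on rcu_gp_lock and perform their own grace period; with batching they pile up
behind a single leader, which is where the ~2x write throughput above comes
from.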
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com