[ltt-dev] [PATCH resent 7/7] urcu-qsbr: avoid useless futex wakeups and burning CPU for long grace periods
Mathieu Desnoyers
compudj at krystal.dyndns.org
Wed Aug 17 05:39:42 EDT 2011
* Paolo Bonzini (pbonzini at redhat.com) wrote:
> I noticed that urcu makes exactly _one_ attempt at using futexes
> to avoid busy looping on synchronize_rcu. The attached patch instead
> switches from busy waiting to futexes after RCU_QS_ACTIVE_ATTEMPTS.
> To limit the number of system calls, reading threads remember whether
> they have already passed through a quiescent state in this grace period;
> if so, they were already removed from the list and can avoid signaling
> the futex.
>
> Performance measured with rcutorture (nreaders: 10, nupdaters: 1,
> duration: 10, median of nine runs):
>
> RCU_QS_ACTIVE_ATTEMPTS == 100, no patch n_updates = 292
> RCU_QS_ACTIVE_ATTEMPTS == 1, no patch n_updates = 290
> RCU_QS_ACTIVE_ATTEMPTS == 100, with patch n_updates = 408
> RCU_QS_ACTIVE_ATTEMPTS == 1, with patch n_updates = 404
>
> (the first two cases are obviously the same; the only change is
> when the futex is used, but over many calls there is no difference).
>
> This patch matches the update to the Promela model.
Even though the current Promela model does not model the memory
ordering, I feel confident enough now, after reviewing the model and
this implementation, that it is OK. So I'll pull this patch, thanks!

It gets me wondering, though: we might want to create a wait/wakeup
abstraction within urcu, as this is becoming a common use case.
Thanks!
Mathieu
>
> Signed-off-by: Paolo Bonzini <pbonzini at redhat.com>
> ---
> urcu-qsbr.c | 17 ++++++++++++-----
> urcu/static/urcu-qsbr.h | 7 ++++++-
> 2 files changed, 18 insertions(+), 6 deletions(-)
>
> diff --git a/urcu-qsbr.c b/urcu-qsbr.c
> index a239be0..8d8a9cf 100644
> --- a/urcu-qsbr.c
> +++ b/urcu-qsbr.c
> @@ -150,26 +150,33 @@ static void update_counter_and_wait(void)
>  	 */
>  	for (;;) {
>  		wait_loops++;
> -		if (wait_loops == RCU_QS_ACTIVE_ATTEMPTS) {
> -			uatomic_dec(&gp_futex);
> +		if (wait_loops >= RCU_QS_ACTIVE_ATTEMPTS) {
> +			uatomic_set(&gp_futex, -1);
> +			/*
> +			 * Write futex before write waiting (the other side
> +			 * reads them in the opposite order).
> +			 */
> +			cmm_smp_wmb();
> +			cds_list_for_each_entry(index, &registry, node) {
> +				_CMM_STORE_SHARED(index->waiting, 1);
> +			}
>  			/* Write futex before read reader_gp */
>  			cmm_smp_mb();
>  		}
> -
>  		cds_list_for_each_entry_safe(index, tmp, &registry, node) {
>  			if (!rcu_gp_ongoing(&index->ctr))
>  				cds_list_move(&index->node, &qsreaders);
>  		}
>  
>  		if (cds_list_empty(&registry)) {
> -			if (wait_loops == RCU_QS_ACTIVE_ATTEMPTS) {
> +			if (wait_loops >= RCU_QS_ACTIVE_ATTEMPTS) {
>  				/* Read reader_gp before write futex */
>  				cmm_smp_mb();
>  				uatomic_set(&gp_futex, 0);
>  			}
>  			break;
>  		} else {
> -			if (wait_loops == RCU_QS_ACTIVE_ATTEMPTS) {
> +			if (wait_loops >= RCU_QS_ACTIVE_ATTEMPTS) {
>  				wait_gp();
>  			} else {
>  #ifndef HAS_INCOHERENT_CACHES
> diff --git a/urcu/static/urcu-qsbr.h b/urcu/static/urcu-qsbr.h
> index 5b7adac..489abb0 100644
> --- a/urcu/static/urcu-qsbr.h
> +++ b/urcu/static/urcu-qsbr.h
> @@ -124,6 +124,7 @@ struct rcu_reader {
>  	unsigned long ctr;
>  	/* Data used for registry */
>  	struct cds_list_head node __attribute__((aligned(CAA_CACHE_LINE_SIZE)));
> +	int waiting;
>  	pthread_t tid;
>  };
>  
> @@ -136,7 +137,11 @@ extern int32_t gp_futex;
>   */
>  static inline void wake_up_gp(void)
>  {
> -	if (unlikely(uatomic_read(&gp_futex) == -1)) {
> +	if (unlikely(_CMM_LOAD_SHARED(rcu_reader.waiting))) {
> +		_CMM_STORE_SHARED(rcu_reader.waiting, 0);
> +		cmm_smp_mb();
> +		if (uatomic_read(&gp_futex) != -1)
> +			return;
>  		uatomic_set(&gp_futex, 0);
>  		futex_noasync(&gp_futex, FUTEX_WAKE, 1,
>  			      NULL, NULL, 0);
> --
> 1.7.6
>
>
> _______________________________________________
> ltt-dev mailing list
> ltt-dev at lists.casi.polymtl.ca
> http://lists.casi.polymtl.ca/cgi-bin/mailman/listinfo/ltt-dev
>
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com