[lttng-dev] [RFC PATCH] rculfhash: race between replace and del operations

Mathieu Desnoyers mathieu.desnoyers at efficios.com
Mon May 7 11:45:30 EDT 2012


Hi,

I just found a race in the replace vs del operations in rculfhash. This
would be introduced by commit:

commit db00ccc36e7fb04ce8044fb1be7964acd1de6ae0
Author: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
Date:   Mon Dec 19 16:45:51 2011 -0500

    rculfhash: Relax atomicity guarantees required by removal operation
    
    The atomicity guarantees for the removal operation do not need to be as
    strict as a cmpxchg. Use a uatomic_set followed by a xchg on a newly
    introduced "REMOVAL_OWNER_FLAG" to perform removal.
    
Here is the race:

Initially in hash table:  A

T0                                         T1
replace A by B
                                           del A
  read A->next
  -> check REMOVED flag, not set yet.
                                           read A->next
                                           -> check REMOVED flag, not set yet.
  cmpxchg A->next to set REMOVED flag
  -> cmpxchg succeeds
                                           uatomic_or to set REMOVED flag
                                           uatomic_xchg to atomically set the REMOVAL_OWNER flag
                                           -> first to set the flag.
  Replace returns node -> free(A)          Del success -> free(A)

With this race, we have a double-free.

The problem with the replace code is that it does not set the
"REMOVAL_OWNER" flag. Here is the fix I propose for this bug:

diff --git a/rculfhash.c b/rculfhash.c
index b26a690..e9cf062 100644
--- a/rculfhash.c
+++ b/rculfhash.c
@@ -735,12 +735,6 @@ int is_removed(struct cds_lfht_node *node)
 }
 
 static
-struct cds_lfht_node *flag_removed(struct cds_lfht_node *node)
-{
-	return (struct cds_lfht_node *) (((unsigned long) node) | REMOVED_FLAG);
-}
-
-static
 int is_bucket(struct cds_lfht_node *node)
 {
 	return ((unsigned long) node) & BUCKET_FLAG;
@@ -765,6 +759,12 @@ struct cds_lfht_node *flag_removal_owner(struct cds_lfht_node *node)
 }
 
 static
+struct cds_lfht_node *flag_removed_or_removal_owner(struct cds_lfht_node *node)
+{
+	return (struct cds_lfht_node *) (((unsigned long) node) | REMOVED_FLAG | REMOVAL_OWNER_FLAG);
+}
+
+static
 struct cds_lfht_node *get_end(void)
 {
 	return (struct cds_lfht_node *) END_VALUE;
@@ -894,6 +894,12 @@ int _cds_lfht_replace(struct cds_lfht *ht, unsigned long size,
 		}
 		assert(old_next == clear_flag(old_next));
 		assert(new_node != old_next);
+		/*
+		 * REMOVAL_OWNER flag is _NEVER_ set before the REMOVED
+		 * flag. It is either set atomically at the same time
+		 * (replace) or after (del).
+		 */
+		assert(!is_removal_owner(old_next));
 		new_node->next = old_next;
 		/*
 		 * Here is the whole trick for lock-free replace: we add
@@ -906,10 +912,12 @@ int _cds_lfht_replace(struct cds_lfht *ht, unsigned long size,
 		 * the old node, but will not see the new one.
 		 * This is a replacement of a node with another node
 		 * that has the same value: we are therefore not
-		 * removing a value from the hash table.
+		 * removing a value from the hash table. We set both the
+		 * REMOVED and REMOVAL_OWNER flags atomically so we own
+		 * the node after successful cmpxchg.
 		 */
 		ret_next = uatomic_cmpxchg(&old_node->next,
-			      old_next, flag_removed(new_node));
+			old_next, flag_removed_or_removal_owner(new_node));
 		if (ret_next == old_next)
 			break;		/* We performed the replacement. */
 		old_next = ret_next;

Thoughts ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com



More information about the lttng-dev mailing list