[lttng-dev] [RFC PATCH] rculfhash: race between replace and del operations

Mathieu Desnoyers mathieu.desnoyers at efficios.com
Mon May 7 19:04:19 EDT 2012


* Mathieu Desnoyers (mathieu.desnoyers at efficios.com) wrote:
> Hi,
> 
> I just found a race in the replace vs del operations in rculfhash. This
> would be introduced by commit:

Just to show that I am not making this up, I came up with a test-case to
reproduce it:

test_urcu_hash 0 2 20 -A -s -M 1 -N 1 -O 1

(2 threads, doing replace/del, with a hash table that has only a single
key for all values). After just a couple of seconds, either the program
hangs, or, more often, I get:

*** glibc detected ***
/media/truecrypt1/compudj/doc/userspace-rcu/tests/.libs/test_urcu_hash:
malloc(): memory corruption (fast): 0x00007ffff3a29e25 ***

Program received signal SIGSEGV, Segmentation fault.

With my fix below, the problem disappears.

I'll push the fix into the master branch right away.

Thanks,

Mathieu

> 
> commit db00ccc36e7fb04ce8044fb1be7964acd1de6ae0
> Author: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
> Date:   Mon Dec 19 16:45:51 2011 -0500
> 
>     rculfhash: Relax atomicity guarantees required by removal operation
>     
>     The atomicity guarantees for the removal operation do not need to be as
>     strict as a cmpxchg. Use a uatomic_set followed by a xchg on a newly
>     introduced "REMOVAL_OWNER_FLAG" to perform removal.
>     
> Here is the race:
> 
> Initially in hash table:  A
> 
> T0                                         T1
> replace A by B
>                                            del A
>   read A->next
>   -> check REMOVED flag, not set yet.
>                                            read A->next
>                                            -> check REMOVED flag, not set yet.
>   cmpxchg A->next to set REMOVED flag
>   -> cmpxchg succeeds
>                                            uatomic_or to set REMOVED flag
>                                            uatomic_xchg to atomically set the REMOVAL_OWNER flag
>                                            -> first to set the flag.
>   Replace returns node -> free(A)          Del success -> free(A)
> 
> With this race, we have a double-free.
> 
> The problem with the replace code is that it does not set the
> "REMOVAL_OWNER" flag. Here is the fix I propose for this bug:
> 
> diff --git a/rculfhash.c b/rculfhash.c
> index b26a690..e9cf062 100644
> --- a/rculfhash.c
> +++ b/rculfhash.c
> @@ -735,12 +735,6 @@ int is_removed(struct cds_lfht_node *node)
>  }
>  
>  static
> -struct cds_lfht_node *flag_removed(struct cds_lfht_node *node)
> -{
> -	return (struct cds_lfht_node *) (((unsigned long) node) | REMOVED_FLAG);
> -}
> -
> -static
>  int is_bucket(struct cds_lfht_node *node)
>  {
>  	return ((unsigned long) node) & BUCKET_FLAG;
> @@ -765,6 +759,12 @@ struct cds_lfht_node *flag_removal_owner(struct cds_lfht_node *node)
>  }
>  
>  static
> +struct cds_lfht_node *flag_removed_or_removal_owner(struct cds_lfht_node *node)
> +{
> +	return (struct cds_lfht_node *) (((unsigned long) node) | REMOVED_FLAG | REMOVAL_OWNER_FLAG);
> +}
> +
> +static
>  struct cds_lfht_node *get_end(void)
>  {
>  	return (struct cds_lfht_node *) END_VALUE;
> @@ -894,6 +894,12 @@ int _cds_lfht_replace(struct cds_lfht *ht, unsigned long size,
>  		}
>  		assert(old_next == clear_flag(old_next));
>  		assert(new_node != old_next);
> +		/*
> +		 * REMOVAL_OWNER flag is _NEVER_ set before the REMOVED
> +		 * flag. It is either set atomically at the same time
> +		 * (replace) or after (del).
> +		 */
> +		assert(!is_removal_owner(old_next));
>  		new_node->next = old_next;
>  		/*
>  		 * Here is the whole trick for lock-free replace: we add
> @@ -906,10 +912,12 @@ int _cds_lfht_replace(struct cds_lfht *ht, unsigned long size,
>  		 * the old node, but will not see the new one.
>  		 * This is a replacement of a node with another node
>  		 * that has the same value: we are therefore not
> -		 * removing a value from the hash table.
> +		 * removing a value from the hash table. We set both the
> +		 * REMOVED and REMOVAL_OWNER flags atomically so we own
> +		 * the node after successful cmpxchg.
>  		 */
>  		ret_next = uatomic_cmpxchg(&old_node->next,
> -			      old_next, flag_removed(new_node));
> +			old_next, flag_removed_or_removal_owner(new_node));
>  		if (ret_next == old_next)
>  			break;		/* We performed the replacement. */
>  		old_next = ret_next;
> 
> Thoughts ?
> 
> Thanks,
> 
> Mathieu
> 
> -- 
> Mathieu Desnoyers
> Operating System Efficiency R&D Consultant
> EfficiOS Inc.
> http://www.efficios.com
> 
> _______________________________________________
> lttng-dev mailing list
> lttng-dev at lists.lttng.org
> http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com



More information about the lttng-dev mailing list