[ltt-dev] Review of rculfhash

Mathieu Desnoyers compudj at krystal.dyndns.org
Tue Nov 1 15:39:42 EDT 2011


* Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote:
> Hello!
> 
> Cool stuff!!!  A few preliminary comments below, as always.  Followed by
> a patch for one thing that I am confident in.

Thanks for the review and patch! Some answers below,

> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> urcu/rculfhash.h:
> 
> o	A comment for each structure is needed, perhaps somthing like
> 	the following:
> 
> 	_cds_lfht_node: Contains the internal pointers and reverse-hash
> 		value required for traversal of the hash table.

traversal... -> lookup and traversal of the hash table.

> 
> 	cds_lfht_node: Contains the full key and length required to check
> 		for an actual match, and also contains an rcu_head
> 		structure that is used by RCU to track a node through
> 		a given RCU grace period.  There is an instance
> 		of _cds_lfht_node enclosed as a field within each
> 		_cds_lfht_node structure.
> 
> 	cds_lfht_iter: Used to track state while traversing a hash chain.

I'll add these comments using your wording.

> o	So the _cds_lfht_new() is not to be called by ordinary users,
> 	but the only call to it passes rcu_read_lock().  So the idea
> 	is to permit use of any of the predefined RCU implementations,
> 	and "void the warrantee" for any other usage?  ;-)

"other usage" with other RCU implementations (outside of those in
liburcu) would have to implement wrappers for each of these primitives,
or use _cds_lfht_new to specify which callback to use manually.

> 	Should there be an API member that returns whether or not
> 	a resize is in progress?  Or a callback that can be registered
> 	to be invoked when resizes start and finish?  This might allow
> 	users to have greater control over priorities.  (Note: This is
> 	speculation on my part.  Probably should demonstrated need
> 	for this before providing it.  Otherwise, gilded lilies and
> 	all that.)

So far I have not been able to picture a clear use-case for this. If
someone comes up with the need for this, we'll add it, not a problem.

One feature I am currently thinking about that could be quite useful is
to give the user the ability to specify a hash table "max" size, past
which insertion of new nodes would fail. This would enable users to
first try adding a node, then select a node to remove (randomly ?) and
then perform a add_replace. The idea is that the max size check would be
done with a split-counter, so the limit would not be unit-accurate, but
good enough to ensure the hash table does not grow far beyond the
specified limit. Thoughts ?

> 
> o	Comment for cds_lfht_destroy() should state that attr may be
> 	NULL if the caller does not want/need to be informed of the
> 	value passed to cds_lfht_new().

Will add.

> 
> o	Comment for cds_lfht_count_nodes() should say what approx_before,
> 	count, removed, and approx_after have to do with each other.
> 	For example, a node is either counted in *count or in
> 	*removed, but not both.  Probably no need to mention dummy
> 	nodes, as they are invisible to the caller.  As near as I can
> 	tell, *approx_before and *approx_after are both zero unless
> 	debugging.

Will add:

/*
 * cds_lfht_count_nodes - count the number of nodes in the hash table.
 * @ht: the hash table.
 * @split_count_before: Sample the node count split-counter before traversal.
 * @count: Traverse the hash table, count the number of nodes observed.
 * @removed: Number of logically removed nodes observed during traversal.
 * @split_count_after: Sample the node count split-counter after traversal.
 * Call with rcu_read_lock held.
 * Threads calling this API need to be registered RCU read-side threads.
 */

> 
> 
> rculfhash.c:
> 
> o	A comment for each structure is needed, perhaps somthing like
> 	the following:
> 
> 	ht_items_count: Counters of additions and deletions used during
> 		stress testing.
> 

/*
 * ht_items_count: Split-counters counting the number of node addition
 * and removal in the table. Only used if the CDS_LFHT_ACCOUNTING flag
 * is set at hash table creation.
 */

> 	rcu_level: Array of pointers into the list.  Each element of
> 		this array is analogous to the hash-chain headers found
> 		in other types of hash tables.  Resizing a hash table
> 		involves creating a new rcu_level array that replaces
> 		the old array.

Not quite true. For a visual explanation, please refer to
http://www.efficios.com/pub/lpc2011/Presentation-lpc2011-desnoyers-urcu.pdf
slide 20. The rcu_level structure corresponds to the "Dummy node arrays
(per-order)" in the picture.

Documenting as:

/*
 * rcu_level: Contains the per order-index-level dummy node table. The
 * size of each dummy node table is half the number of hashes contained
 * in this order (except for order 0). The minimum allocation size
 * parameter allows combining the dummy node arrays of the lowermost
 * levels to improve cache locality for small index orders.
 */


> 
> 	rcu_table: Reference to current rcu_level array.  Includes
> 		size and desired new size if a resize operation is
> 		in progress.

/*
 * rcu_table: Contains the size and desired new size if a resize
 * operation is in progress, as well as the statically-sized array of
 * rcu_level pointers.
 */

> 
> 	But why isn't the size kept with the array?

See my above comments for explanation. We don't resize the "tbl" array
in the rcu_table (it has a size of 32 or 64, depending on the
architecture word size). So we only use the size to know how many
pointers of the tbl array are populated. Note that this has changed
since our last discussion about the hash table internals, mainly thanks
to feedback from Josh Triplett.

We can think of this "size" entry as a RCU pointer, really. But instead
of changing the RCU pointer that points to a version of the table, we
keep a statically-sized table (up to the max number of entries we can
have in the architecture), and use the "size" as a delimiter of the
number of entries that are valid in that table (kind-of-like an RCU
pointer).

>  This would eliminate
> 	many of the memory barriers, for example, the cmm_smp_wmb()
> 	in init_table(), fini_table(), and the ones that should be
> 	there that is not in cds_lfht_lookup() &c.  Unless I am confused,
> 	the current code allows one size to be used on another array.
> 	(That said, not sure I yet understand the relationship of ->size
> 	and the various arrays.)

Please tell me if you still have questions after reading my comments
above.


> 	cds_lfht: Top-level data structure representing a lock-free
> 		hash table.  Presumably defined in the .c file in
> 		order to make it be an opaque cookie to users.

Yep :)

> 
> 	rcu_resize_work: ???

/*
 * rcu_resize_work: Contains arguments passed to RCU worker thread
 * responsible for performing lazy resize.
 */

> 
> 	partition_resize_work: ???

/*
 * partition_resize_work: Contains arguments passed to worker threads
 * executing the hash table resize on partitions of the hash table
 * assigned to each processor's worker thread.
 */


> 
> o	Cute trick for generating the bit-reversal table.  ;-)

I stole it from the public domaine from Sean Eron Anderson (as
documented in the source code). Nice hack indeed ! :)

> 
> o	fls_ulong(): Shouldn't the CAA_BITS_PER_lONG instead be
> 	CAA_BITS_PER_LONG?

patch merged, thanks!

> 
> o	What defines POISON_FREE?  I don't see anything but #ifdefs
> 	of it.

You really have to defined it with e.g. make CFLAGS=-DPOISON_FREE.
Thoughts ?

> 
> o	Any chance of a comment saying what CDS_LFHT_ACCOUNTING does?
> 	It seems to short-circuit ht_count_add() and ht_count_del().
> 	Appears to be debug-only checks of addition/deletion counts.

It's rather the opposite. The accounting is usedto trigger in the
automatic resize operation. We fall-back on local chain length count to
trigger expand if accounting is not enabled. We use the chain length
count to deal with small table expand too, because the split-counters
are not good at tracking small amount of nodes in the table (they push
their updates to the global counter only every 1024 add and every 1024
remove).

> 
> o	lookup_bucket()
> 
> o	ht_count_add() is interesting in that it does not appear to
> 	clear out ht->split_count[index].add.  The places that read it
> 	do not appear to mask the upper bits, either.  A comment
> 	explaining why this is legitimate?
> 
> 	Ditto for ht_count_del().

Adding:

 * These are free-running counters, never reset to zero. They count the
 * number of add/remove, and trigger every (1 << COUNT_COMMIT_ORDER)
 * operations to update the global counter. We choose a power-of-2 value
 * for the trigger to deal with 32 or 64-bit overflow of the counter.

> 
> o	cds_lfht_lookup() applies clear_flag() to the pointer twice.
> 	Unless I am missing something, the clear_flag() in the "if"
> 	statement in the loop can be omitted.

right, will remove.

> 
> 	Also, the ->next value returned does have its flag bits set.
> 	However, this seems to be necessary for cds_lfht_next_duplicate()
> 	to work correctly.

Where are we clearing its flag bits ?

> 
> o	cds_lfht_next_duplicate() would be more clear if ->compare_fct()
> 	was ->is_different() or some such.  But I guess if that is an
> 	obstacle, you probably shouldn't be looking at this stuff in the
> 	first place.

Fair point! I'm using a "compare" semantic to stay similar to other hash
table implementations (e.g. glib hash table).

> 
> 	Need to document what happens if an insertion of a
> 	duplicate key happens between a cds_lfht_lookup() and a
> 	cds_lfht_next_duplicate().  Looks to me like they will be seen.

cds_lfht_lookup() returns an iterator (node and next pointer). The next
pointer points to a node that has a different key value (by definition
of the problem, because you specify that the hash table already has only
one instance of the key (no duplicate of the key)). So when using
cds_lfht_next_duplicate(), we use the next pointer contained in the
iterator to fetch the next node, which necessarily is not a duplicate,
even if a duplicate has been added between the end of lfht_lookup() and
the beginning of lfht_next_duplicate(). Sounds good ?

If you have ideas on how to phrase this explanation, and where this
comment should be added, please let me know.

> 
> 
> tests/test_urcu_hash.c:
> 
> o	test_compare() -- Why the return -1 if the keys are not equal in
> 	length, but then fail an assert if not equal to size of unsigned
> 	long?

The test is specifying a fixed key size of sizeof(unsigned long). So if
for some reason the test_compare() function receives a different size,
we have an implementation bug.

Thanks!

Mathieu

> 
> ------------------------------------------------------------------------
> 
> Fix CAA_BITS_PER_lONG typo
> 
> Should instead be CAA_BITS_PER_LONG.
> 
> Signed-off-by: Paul E. McKenney <paul.mckenney at linaro.org>
> 
> diff --git a/rculfhash.c b/rculfhash.c
> index d786a3d..314cca8 100644
> --- a/rculfhash.c
> +++ b/rculfhash.c
> @@ -454,7 +454,7 @@ unsigned int fls_u32(uint32_t x)
>  
>  unsigned int fls_ulong(unsigned long x)
>  {
> -#if (CAA_BITS_PER_lONG == 32)
> +#if (CAA_BITS_PER_LONG == 32)
>  	return fls_u32(x);
>  #else
>  	return fls_u64(x);
> 

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com




More information about the lttng-dev mailing list