[lttng-dev] [rp] rculfhash ordering guarantees

Paul E. McKenney paulmck at linux.vnet.ibm.com
Mon Apr 23 18:26:09 EDT 2012


On Mon, Apr 23, 2012 at 03:45:32PM -0400, Mathieu Desnoyers wrote:
> Hi Paul,
> 
> Here is the updated text I plan for the next update. Comments are
> welcome, thanks !

Looks much improved!  The inevitable questions and comments interspersed.

							Thanx, Paul

> Mathieu
> 
>  * Ordering Guarantees:
>  *
>  * To discuss these guarantees, we first define "read" operation as any
>  * of the the basic cds_lfht_lookup, cds_lfht_next_duplicate,
>  * cds_lfht_first, cds_lfht_next operation.
>  *
>  * We define "read traversal" operation as any of the following
>  * group of operations
>  *  - cds_lfht_lookup followed by iteration with cds_lfht_next_duplicate
>  *  - cds_lfht_first followed iteration with cds_lfht_next

Can't cds_lfht_next() and cds_lfht_next_duplicate() be intermixed?
Something like the following:

	rcu_read_lock();
	p = cds_lfht_lookup(...);
	while (p != NULL) {
		do_something(p);
		if (p->dups_of_interest)
			p = cds_lfht_next_duplicate(...);
		else
			p = cds_lfht_next(...);
	}
	rcu_read_unlock();

>  *
>  * We define "write" operations as any of cds_lfht_add,
>  * cds_lfht_add_unique, cds_lfht_add_replace, cds_lfht_del.
>  *
>  * We define "prior" and "later" node as nodes observable by reads and
>  * read traversals respectively before and after a write or sequence of
>  * write operations.
>  *
>  * It is implicit from the requirement of the read, read traversal, and
>  * write operations that RCU read-side locks need to be held across
>  * traversals, and between a read or read traversal and invocation of a
>  * write that receives a node as argument.

This paragraph is implying a lot...  How about the following?

	Hash-table operations are often chained, for example, the
	pointer returned by a cds_lfht_lookup() might be passed to a
	cds_lfht_next(), whose return value might in turn be passed to
	another hash-table operation.  This entire chain of operations
	must be enclosed by a pair of matching rcu_read_lock() and
	rcu_read_unlock() operations.

>  *
>  * The following ordering guarantees are offered by this hash table:
>  *
>  * A.1) "read" after "write": if there is ordering between a write and a
>  *      later read, then the read is guaranteed to see the write or some
>  *      later write.
>  * A.2) "read traversal" after "write": given that there is dependency
>  *      ordering between reads in a "read traversal", if there is
>  *      ordering between a write and the first read of the traversal,
>  *      then the "read traversal" is guaranteed to see the write or
>  *      some later write.
>  * B.1) "write" after "read": if there is ordering between a read and a
>  *      later write, then the read will never see the write.
>  * B.2) "write" after "read traversal": given that there is dependency
>  *      ordering between reads in a "read traversal", if there is
>  *      ordering between the last read of the traversal and a later
>  *      write, then the "read traversal" will never see the write.
>  * C)   "write" while "read traversal": if a write occurs during a "read
>  *      traversal", the traversal may, or may not, see the write.
>  * D)   "write" vs "write": writes occur atomically between their
>  *      invocation and the moment they return. There is a full memory
>  *      barrier before and after the atomic "commit" point of each
>  *      write.

This is a description of the implementation, while the others are guarantees.
Of course, that begs the question of what the guarantee is...  Maybe the
following?

D.1	"write" after "write": if there is ordering between a write and
	a later write, then the later write is guaranteed to see the
	effects of the first write.
D.2	Concurrent "write" pairs: The system will assign an arbitrary order
	to any pair of concurrent conflicting writes.  Non-conflicting
	writes (for example, to different keys) are unordered.

>  * E)   If a grace period separates a "del" or "replace" operation
>  *      and a subsequent operation, then that subsequent operation is
>  *      guaranteed not to see the removed item.
>  * F)   Uniqueness guarantee: given a hash table that does not contain
>  *      duplicate items for a given key, there will only be one item in
>  *      the hash table after an arbitrary sequence of add_unique and/or
>  *      add_replace operations.  Note, however, that a pair of
>  *      concurrent read operations might well access two different items
>  *      with that key.
>  * G.1) Given a hash table that does not contain duplicate items for a
>  *      given key, if a pair of lookups for a given key are ordered
>  *      (e.g. by a memory barrier), then the second lookup will return
>  *      the same node as the previous lookup, or some later node.
>  * G.2) A "read traversal" that starts after the end of a prior "read
>  *      traversal" (ordered by memory barriers) is guaranteed to see the
>  *      same nodes as the previous traversal, or some later nodes.
>  *
>  * Progress guarantees:
>  *
>  * * Reads are wait-free. These operations always move forward in the
>  *   hash table linked list, and this list has no loop.
>  * * Writes are lock-free. Any retry loop performed by a write operation
>  *   is triggered by progress made within another update operation.
> 
> -- 
> Mathieu Desnoyers
> Operating System Efficiency R&D Consultant
> EfficiOS Inc.
> http://www.efficios.com
> 
> _______________________________________________
> rp mailing list
> rp at svcs.cs.pdx.edu
> http://svcs.cs.pdx.edu/mailman/listinfo/rp
> 




More information about the lttng-dev mailing list