[lttng-dev] [rp] rculfhash ordering guarantees
Mathieu Desnoyers
mathieu.desnoyers at efficios.com
Mon Apr 23 22:54:54 EDT 2012
* Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote:
> On Mon, Apr 23, 2012 at 03:45:32PM -0400, Mathieu Desnoyers wrote:
> > Hi Paul,
> >
> > Here is the updated text I plan for the next update. Comments are
> > welcome, thanks !
>
> Looks much improved! The inevitable questions and comments interspersed.
>
> Thanx, Paul
>
> > Mathieu
> >
> > * Ordering Guarantees:
> > *
> > * To discuss these guarantees, we first define "read" operation as any
> > * of the the basic cds_lfht_lookup, cds_lfht_next_duplicate,
> > * cds_lfht_first, cds_lfht_next operation.
> > *
> > * We define "read traversal" operation as any of the following
> > * group of operations
> > * - cds_lfht_lookup followed by iteration with cds_lfht_next_duplicate
> > * - cds_lfht_first followed iteration with cds_lfht_next
>
> Can't cds_lfht_next() and cds_lfht_next_duplicate() be intermixed?
> Something like the following:
>
> rcu_read_lock();
> p = cds_lfht_lookup(...);
> while (p != NULL) {
> do_something(p);
> if (p->dups_of_interest)
> p = cds_lfht_next_duplicate(...);
> else
> p = cds_lfht_next(...);
> }
> rcu_read_unlock();
Yes, they can technically be intermixed, but I doubt that the
"cds_lfht_next" after a lookup will be of any interest: it will get the
nodes with keys following the current node's key (in split-ordered
order).
A use-case that would make more sense would be:
- cds_lfht_first
- iterate on the table with cds_lfht_next(), until we find an
interesting key.
- Then, iterate on all the duplicate of this key.
So I will document that both can be intermixed.
>
> > *
> > * We define "write" operations as any of cds_lfht_add,
> > * cds_lfht_add_unique, cds_lfht_add_replace, cds_lfht_del.
> > *
> > * We define "prior" and "later" node as nodes observable by reads and
> > * read traversals respectively before and after a write or sequence of
> > * write operations.
> > *
> > * It is implicit from the requirement of the read, read traversal, and
> > * write operations that RCU read-side locks need to be held across
> > * traversals, and between a read or read traversal and invocation of a
> > * write that receives a node as argument.
>
> This paragraph is implying a lot... How about the following?
>
> Hash-table operations are often chained, for example, the
> pointer returned by a cds_lfht_lookup() might be passed to a
> cds_lfht_next(), whose return value might in turn be passed to
> another hash-table operation. This entire chain of operations
> must be enclosed by a pair of matching rcu_read_lock() and
> rcu_read_unlock() operations.
looks good, will update.
>
> > *
> > * The following ordering guarantees are offered by this hash table:
> > *
> > * A.1) "read" after "write": if there is ordering between a write and a
> > * later read, then the read is guaranteed to see the write or some
> > * later write.
> > * A.2) "read traversal" after "write": given that there is dependency
> > * ordering between reads in a "read traversal", if there is
> > * ordering between a write and the first read of the traversal,
> > * then the "read traversal" is guaranteed to see the write or
> > * some later write.
> > * B.1) "write" after "read": if there is ordering between a read and a
> > * later write, then the read will never see the write.
> > * B.2) "write" after "read traversal": given that there is dependency
> > * ordering between reads in a "read traversal", if there is
> > * ordering between the last read of the traversal and a later
> > * write, then the "read traversal" will never see the write.
> > * C) "write" while "read traversal": if a write occurs during a "read
> > * traversal", the traversal may, or may not, see the write.
> > * D) "write" vs "write": writes occur atomically between their
> > * invocation and the moment they return. There is a full memory
> > * barrier before and after the atomic "commit" point of each
> > * write.
>
> This is a description of the implementation, while the others are guarantees.
> Of course, that begs the question of what the guarantee is... Maybe the
> following?
>
> D.1 "write" after "write": if there is ordering between a write and
> a later write, then the later write is guaranteed to see the
> effects of the first write.
> D.2 Concurrent "write" pairs: The system will assign an arbitrary order
> to any pair of concurrent conflicting writes. Non-conflicting
> writes (for example, to different keys) are unordered.
yep, looks good.
I also documented more thoroughly the "cds_lfht_add_unique", which acts
as a write (if it returns success) or read (if it returns failure).
Here is the updated version:
* Ordering Guarantees:
*
* To discuss these guarantees, we first define "read" operation as any
* of the the basic cds_lfht_lookup, cds_lfht_next_duplicate,
* cds_lfht_first, cds_lfht_next operation, as well as
* cds_lfht_add_unique (failure).
*
* We define "read traversal" operation as any of the following
* group of operations
* - cds_lfht_lookup followed by iteration with cds_lfht_next_duplicate
* (and/or cds_lfht_next, although less common).
* - cds_lfht_add_unique (failure) followed by iteration with
* cds_lfht_next_duplicate (and/or cds_lfht_next, although less
* common).
* - cds_lfht_first followed iteration with cds_lfht_next (and/or
* cds_lfht_next_duplicate, although less common).
*
* We define "write" operations as any of cds_lfht_add,
* cds_lfht_add_unique (success), cds_lfht_add_replace, cds_lfht_del.
*
* When cds_lfht_add_unique succeeds (returns the node passed as
* parameter), it acts as a "write" operation. When cds_lfht_add_unique
* fails (returns a node different from the one passed as parameter), it
* acts as a "read" operation. A cds_lfht_add_unique failure is a
* cds_lfht_lookup "read" operation, therefore, any ordering guarantee
* referring to "lookup" imply any of "lookup" or cds_lfht_add_unique
* (failure).
*
* We define "prior" and "later" node as nodes observable by reads and
* read traversals respectively before and after a write or sequence of
* write operations.
*
* Hash-table operations are often chained, for example, the pointer
* returned by a cds_lfht_lookup() might be passed to a cds_lfht_next(),
* whose return value might in turn be passed to another hash-table
* operation. This entire chain of operations must be enclosed by a
* pair of matching rcu_read_lock() and rcu_read_unlock() operations.
*
* The following ordering guarantees are offered by this hash table:
*
* A.1) "read" after "write": if there is ordering between a write and a
* later read, then the read is guaranteed to see the write or some
* later write.
* A.2) "read traversal" after "write": given that there is dependency
* ordering between reads in a "read traversal", if there is
* ordering between a write and the first read of the traversal,
* then the "read traversal" is guaranteed to see the write or
* some later write.
* B.1) "write" after "read": if there is ordering between a read and a
* later write, then the read will never see the write.
* B.2) "write" after "read traversal": given that there is dependency
* ordering between reads in a "read traversal", if there is
* ordering between the last read of the traversal and a later
* write, then the "read traversal" will never see the write.
* C) "write" while "read traversal": if a write occurs during a "read
* traversal", the traversal may, or may not, see the write.
* D.1) "write" after "write": if there is ordering between a write and
* a later write, then the later write is guaranteed to see the
* effects of the first write.
* D.2) Concurrent "write" pairs: The system will assign an arbitrary
* order to any pair of concurrent conflicting writes.
* Non-conflicting writes (for example, to different keys) are
* unordered.
* E) If a grace period separates a "del" or "replace" operation
* and a subsequent operation, then that subsequent operation is
* guaranteed not to see the removed item.
* F) Uniqueness guarantee: given a hash table that does not contain
* duplicate items for a given key, there will only be one item in
* the hash table after an arbitrary sequence of add_unique and/or
* add_replace operations. Note, however, that a pair of
* concurrent read operations might well access two different items
* with that key.
* G.1) If a pair of lookups for a given key are ordered (e.g. by a
* memory barrier), then the second lookup will return the same
* node as the previous lookup, or some later node.
* G.2) A "read traversal" that starts after the end of a prior "read
* traversal" (ordered by memory barriers) is guaranteed to see the
* same nodes as the previous traversal, or some later nodes.
*
* Progress guarantees:
*
* * Reads are wait-free. These operations always move forward in the
* hash table linked list, and this list has no loop.
* * Writes are lock-free. Any retry loop performed by a write operation
* is triggered by progress made within another update operation.
*
Thanks,
Mathieu
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
More information about the lttng-dev
mailing list