[ltt-dev] [RFC git tree] Userspace RCU (urcu) for Linux (repost)
Paul E. McKenney
paulmck at linux.vnet.ibm.com
Thu Feb 12 11:18:05 EST 2009
On Thu, Feb 12, 2009 at 12:47:07AM -0500, Mathieu Desnoyers wrote:
> * Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote:
> > On Wed, Feb 11, 2009 at 11:10:44PM -0500, Mathieu Desnoyers wrote:
> > > * Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote:
> > > > On Wed, Feb 11, 2009 at 06:33:08PM -0800, Paul E. McKenney wrote:
> > > > > On Wed, Feb 11, 2009 at 04:35:49PM -0800, Paul E. McKenney wrote:
> > > > > > On Wed, Feb 11, 2009 at 04:42:58PM -0500, Mathieu Desnoyers wrote:
> > > > > > > * Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote:
> > > >
> > > > [ . . . ]
> > > >
> > > > > > > > (BTW, I do not trust my model yet, as it currently cannot detect the
> > > > > > > > failure case I pointed out earlier. :-/ Here and I thought that the
> > > > > > > > point of such models was to detect additional failure cases!!!)
> > > > > > > >
> > > > > > >
> > > > > > > Yes, I'll have to dig deeper into it.
> > > > > >
> > > > > > Well, as I said, I attached the current model and the error trail.
> > > > >
> > > > > And I had bugs in my model that allowed the rcu_read_lock() model
> > > > > to nest indefinitely, which overflowed into the top bit, messing
> > > > > things up. :-/
> > > > >
> > > > > Attached is a fixed model. This model validates correctly (woo-hoo!).
> > > > > Even better, gives the expected error if you comment out line 180 and
> > > > > uncomment line 213, this latter corresponding to the error case I called
> > > > > out a few days ago.
> > > > >
> > > > > I will play with removing models of mb...
> > > >
> > > > And commenting out the models of mb between the counter flips and the
> > > > test for readers still passes validation, as expected, and as shown in
> > > > the attached Promela code.
> > > >
> > >
> > > Hrm, in the email I sent you about the memory barrier, I said that it
> > > would not make the algorithm incorrect, but that it would cause
> > > situations where it would be impossible for the writer to do any
> > > progress as long as there are readers active. I think we would have to
> > > enhance the model or at least express this through some LTL statement to
> > > validate this specific behavior.
> >
> > But if the writer fails to make progress, then the counter remains at a
> > given value, which causes readers to drain, which allows the writer to
> > eventually make progress again. Right?
> >
>
> Not necessarily. If we don't have the proper memory barriers, we can
> have the writer waiting on, say, parity 0 *before* it has performed the
> parity switch. Therefore, even newly coming readers will add up to
> parity 0.
But the write that changes the parity will eventually make it out.
OK, so your argument is that we at least need a compiler barrier?
Regardless, please see attached for a modified version of the Promela
model that fully models omitting out the memory barrier that my
rcu_nest32.[hc] implementation omits. (It is possible to partially
model removal of other memory barriers via #if 0, but to fully model
would need to enumerate the permutations as shown on lines 231-257.)
> In your model, this is not detected, because eventually all readers will
> execute, and only then the writer will be able to update the data. But
> in reality, if we run a very busy 4096-cores machines where there is
> always at least one reader active, the the writer will be stuck forever,
> and that's really bad.
Assuming that the reordering is done by the CPU, the write will
eventually get out -- it is stuck in (say) the store buffer, and the
cache line will eventually arrive, and then the value will eventually
be seen by the readers.
We might need a -compiler- barrier, but then again, I am not sure that
we are talking about the same memory barrier -- again, please see
attached lines 231-257 to see which one that I eliminated.
Also, the original model I sent out has a minor bug that prevents it
from fully modeling the nested-read-side case. The patch below fixes this.
Signed-off-by: Paul E. McKenney <paulmck at linux.vnet.ibm.com>
---
urcu.spin | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/formal-model/urcu.spin b/formal-model/urcu.spin
index e5bfff3..611464b 100644
--- a/formal-model/urcu.spin
+++ b/formal-model/urcu.spin
@@ -124,9 +124,13 @@ proctype urcu_reader()
break;
:: tmp < 4 && reader_progress[tmp] != 0 ->
tmp = tmp + 1;
- :: tmp >= 4 ->
+ :: tmp >= 4 &&
+ reader_progress[0] == reader_progress[3] ->
done = 1;
break;
+ :: tmp >= 4 &&
+ reader_progress[0] != reader_progress[3] ->
+ break;
od;
do
:: tmp < 4 && reader_progress[tmp] == 0 ->
-------------- next part --------------
/*
* urcu_mbmin.spin: Promela code to validate urcu. See commit number
* 3a9e6e9df706b8d39af94d2f027210e2e7d4106e of Mathieu Desnoyer's
* git archive at git://lttng.org/userspace-rcu.git, but with
* memory barriers removed.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
*
* Copyright (c) 2009 Paul E. McKenney, IBM Corporation.
*/
/* Promela validation variables. */
bit removed = 0; /* Has RCU removal happened, e.g., list_del_rcu()? */
bit free = 0; /* Has RCU reclamation happened, e.g., kfree()? */
bit need_mb = 0; /* =1 says need reader mb, =0 for reader response. */
byte reader_progress[4];
/* Count of read-side statement executions. */
/* urcu definitions and variables, taken straight from the algorithm. */
#define RCU_GP_CTR_BIT (1 << 7)
#define RCU_GP_CTR_NEST_MASK (RCU_GP_CTR_BIT - 1)
byte urcu_gp_ctr = 1;
byte urcu_active_readers = 0;
/* Model the RCU read-side critical section. */
proctype urcu_reader()
{
bit done = 0;
bit mbok;
byte tmp;
byte tmp_removed;
byte tmp_free;
/* Absorb any early requests for memory barriers. */
do
:: need_mb == 1 ->
need_mb = 0;
:: 1 -> skip;
:: 1 -> break;
od;
/*
* Each pass through this loop executes one read-side statement
* from the following code fragment:
*
* rcu_read_lock(); [0a]
* rcu_read_lock(); [0b]
* p = rcu_dereference(global_p); [1]
* x = p->data; [2]
* rcu_read_unlock(); [3b]
* rcu_read_unlock(); [3a]
*
* Because we are modeling a weak-memory machine, these statements
* can be seen in any order, the only restriction being that
* rcu_read_unlock() cannot precede the corresponding rcu_read_lock().
* The placement of the inner rcu_read_lock() and rcu_read_unlock()
* is non-deterministic, the above is but one possible placement.
* Intestingly enough, this model validates all possible placements
* of the inner rcu_read_lock() and rcu_read_unlock() statements,
* with the only constraint being that the rcu_read_lock() must
* precede the rcu_read_unlock().
*
* We also respond to memory-barrier requests, but only if our
* execution happens to be ordered. If the current state is
* misordered, we ignore memory-barrier requests.
*/
do
:: 1 ->
if
:: reader_progress[0] < 2 -> /* [0a and 0b] */
tmp = urcu_active_readers;
if
:: (tmp & RCU_GP_CTR_NEST_MASK) == 0 ->
tmp = urcu_gp_ctr;
do
:: (reader_progress[1] +
reader_progress[2] +
reader_progress[3] == 0) && need_mb == 1 ->
need_mb = 0;
:: 1 -> skip;
:: 1 -> break;
od;
urcu_active_readers = tmp;
:: else ->
urcu_active_readers = tmp + 1;
fi;
reader_progress[0] = reader_progress[0] + 1;
:: reader_progress[1] == 0 -> /* [1] */
tmp_removed = removed;
reader_progress[1] = 1;
:: reader_progress[2] == 0 -> /* [2] */
tmp_free = free;
reader_progress[2] = 1;
:: ((reader_progress[0] > reader_progress[3]) &&
(reader_progress[3] < 2)) -> /* [3a and 3b] */
tmp = urcu_active_readers - 1;
urcu_active_readers = tmp;
reader_progress[3] = reader_progress[3] + 1;
:: else -> break;
fi;
/* Process memory-barrier requests, if it is safe to do so. */
atomic {
mbok = 0;
tmp = 0;
do
:: tmp < 4 && reader_progress[tmp] == 0 ->
tmp = tmp + 1;
break;
:: tmp < 4 && reader_progress[tmp] != 0 ->
tmp = tmp + 1;
:: tmp >= 4 &&
reader_progress[0] == reader_progress[3] ->
done = 1;
break;
:: tmp >= 4 &&
reader_progress[0] != reader_progress[3] ->
break;
od;
do
:: tmp < 4 && reader_progress[tmp] == 0 ->
tmp = tmp + 1;
:: tmp < 4 && reader_progress[tmp] != 0 ->
break;
:: tmp >= 4 ->
mbok = 1;
break;
od
}
if
:: mbok == 1 ->
/* We get here if mb processing is safe. */
do
:: need_mb == 1 ->
need_mb = 0;
:: 1 -> skip;
:: 1 -> break;
od;
:: else -> skip;
fi;
/*
* Check to see if we have modeled the entire RCU read-side
* critical section, and leave if so.
*/
if
:: done == 1 -> break;
:: else -> skip;
fi
od;
assert((tmp_free == 0) || (tmp_removed == 1));
/* Process any late-arriving memory-barrier requests. */
do
:: need_mb == 1 ->
need_mb = 0;
:: 1 -> skip;
:: 1 -> break;
od;
}
/* Model the RCU update process. */
proctype urcu_updater()
{
byte tmp;
/* prior synchronize_rcu(), second counter flip. */
need_mb = 1; /* mb() A */
do
:: need_mb == 1 -> skip;
:: need_mb == 0 -> break;
od;
urcu_gp_ctr = urcu_gp_ctr + RCU_GP_CTR_BIT;
need_mb = 1; /* mb() B */
do
:: need_mb == 1 -> skip;
:: need_mb == 0 -> break;
od;
do
:: 1 ->
if
:: (urcu_active_readers & RCU_GP_CTR_NEST_MASK) != 0 &&
(urcu_active_readers & ~RCU_GP_CTR_NEST_MASK) !=
(urcu_gp_ctr & ~RCU_GP_CTR_NEST_MASK) ->
skip;
:: else -> break;
fi
od;
need_mb = 1; /* mb() C absolutely required by analogy with G */
do
:: need_mb == 1 -> skip;
:: need_mb == 0 -> break;
od;
/* Removal statement, e.g., list_del_rcu(). */
removed = 1;
/* current synchronize_rcu(), first counter flip. */
need_mb = 1; /* mb() D suggested */
do
:: need_mb == 1 -> skip;
:: need_mb == 0 -> break;
od;
urcu_gp_ctr = urcu_gp_ctr + RCU_GP_CTR_BIT;
need_mb = 1; /* mb() E required if D not present */
do
:: need_mb == 1 -> skip;
:: need_mb == 0 -> break;
od;
/* current synchronize_rcu(), first-flip check plus second flip. */
if
:: 1 ->
do
:: 1 ->
if
:: (urcu_active_readers & RCU_GP_CTR_NEST_MASK) != 0 &&
(urcu_active_readers & ~RCU_GP_CTR_NEST_MASK) !=
(urcu_gp_ctr & ~RCU_GP_CTR_NEST_MASK) ->
skip;
:: else -> break;
fi;
od;
urcu_gp_ctr = urcu_gp_ctr + RCU_GP_CTR_BIT;
:: 1 ->
tmp = urcu_gp_ctr;
urcu_gp_ctr = urcu_gp_ctr + RCU_GP_CTR_BIT;
do
:: 1 ->
if
:: (urcu_active_readers & RCU_GP_CTR_NEST_MASK) != 0 &&
(urcu_active_readers & ~RCU_GP_CTR_NEST_MASK) !=
(tmp & ~RCU_GP_CTR_NEST_MASK) ->
skip;
:: else -> break;
fi;
od;
fi;
/* current synchronize_rcu(), second counter flip check. */
need_mb = 1; /* mb() F not required */
do
:: need_mb == 1 -> skip;
:: need_mb == 0 -> break;
od;
do
:: 1 ->
if
:: (urcu_active_readers & RCU_GP_CTR_NEST_MASK) != 0 &&
(urcu_active_readers & ~RCU_GP_CTR_NEST_MASK) !=
(urcu_gp_ctr & ~RCU_GP_CTR_NEST_MASK) ->
skip;
:: else -> break;
fi;
od;
need_mb = 1; /* mb() G absolutely required */
do
:: need_mb == 1 -> skip;
:: need_mb == 0 -> break;
od;
/* free-up step, e.g., kfree(). */
free = 1;
}
/*
* Initialize the array, spawn a reader and an updater. Because readers
* are independent of each other, only one reader is needed.
*/
init {
atomic {
reader_progress[0] = 0;
reader_progress[1] = 0;
reader_progress[2] = 0;
reader_progress[3] = 0;
run urcu_reader();
run urcu_updater();
}
}
-------------- next part --------------
A non-text attachment was scrubbed...
Name: urcu_mbmin.sh
Type: application/x-sh
Size: 56 bytes
Desc: urcu_mbmin.sh
URL: <http://lists.casi.polymtl.ca/pipermail/lttng-dev/attachments/20090212/3990ebfb/attachment-0003.sh>
More information about the lttng-dev
mailing list