[ltt-dev] Selective wakeup formal model update

Mathieu Desnoyers mathieu.desnoyers at efficios.com
Mon Aug 29 09:12:54 EDT 2011


* Paolo Bonzini (pbonzini at redhat.com) wrote:
> On 08/27/2011 06:07 PM, Mathieu Desnoyers wrote:
>>      Update nto1 selective model
>>
>>      The nto1 selective wakeup model introduced by Paolo was:
>>
>>      a) too complex (tried to model the full-blown RCU rather than a simple
>>         queue system)
>>      b) did not model progress with wakers running for a limited amount of
>>         iterations, only with wakers running infinitely often (which
>>         therefore does not prove anything).
>>
>>      What we really want to model here is what happens if we have wakers
>>      running a few times, and then not running for a very long time: can we
>>      end up with a missed wakeup, and therefore end up with an enqueued entry
>>      but with the waiter waiting forever ?
>>
>> See
>>
>> http://git.lttng.org/?p=userspace-rcu.git;a=blob;f=futex-wakeup/nto1-selective/futex.spin;h=6f854fceefba73dfbc3781d04734b75590175159;hb=refs/heads/formal-model
>>
>> Thanks!
>
> Makes sense regarding the complexity.
>
> Regarding (b), I am not sure I understand.  It is true that wakers run  
> infinitely often, but they do nothing until all of them have progressed.  
>  Also, since the queue state is binary, Spin should exhaust its state  
> space without running wakers more than twice; it should be possible to  
> add an assertion about this.
>
> I think we are simply testing different things.  Perhaps we want to keep  
> both models?

I chose to go with a simpler model because I tried making a change to
yours, and the results were unexpected, so I suspected a modeling
error. Let's see:

If you start with the userspace-rcu formal-models branch, commit
7f12dad2ba26c922477cc5999f7efef43ce26748, file
futex-wakeup/nto1-selective/futex.spin (your version), and apply:

diff --git a/futex-wakeup/nto1-selective/futex.spin b/futex-wakeup/nto1-selective/futex.spin
index 556f96f..64f3d9d 100644
--- a/futex-wakeup/nto1-selective/futex.spin
+++ b/futex-wakeup/nto1-selective/futex.spin
@@ -76,24 +76,37 @@ active [2] proctype waker()
 {
 	assert(get_pid() < 2);
 
-	do
-	:: 1 ->
-		queue[get_pid()] = gp;
-	
+	queue[get_pid()] = gp;
+
+	if
+	:: (waiting[get_pid()] == 1) ->
+		waiting[get_pid()] = 0;
 		if
-		:: (waiting[get_pid()] == 1) ->
-			waiting[get_pid()] = 0;
-			if
-			:: (gp_futex == -1) ->
-				gp_futex = 0;
+		:: (gp_futex == -1) ->
+			gp_futex = 0;
 #ifndef INJ_QUEUE_NO_WAKE
-				futex_wake = 1;
+			futex_wake = 1;
 #endif
-			:: else ->
-				skip;
-			fi;
+		:: else ->
+			skip;
 		fi;
-	od;
+	fi;
+
+	queue[get_pid()] = gp;
+
+	if
+	:: (waiting[get_pid()] == 1) ->
+		waiting[get_pid()] = 0;
+		if
+		:: (gp_futex == -1) ->
+			gp_futex = 0;
+#ifndef INJ_QUEUE_NO_WAKE
+			futex_wake = 1;
+#endif
+		:: else ->
+			skip;
+		fi;
+	fi;
 }
 
This changes the wakers so that, instead of running infinitely often,
they run twice and stop. Normally, the progress check should ensure that
the waiter progresses even after the flow of wakers stops, but that is
not the case here. So there seems to be something wrong in the model.

Result:
futex_progress.log:State-vector 76 byte, depth reached 385, errors: 1

You can see the state "backtrace" with:
spin -v -t -N futex.ltl futex_progress.spin.input

The endless cycle is:


  <<<<<START OF CYCLE>>>>>
i=4 pno 3
364:	proc  2 (waiter) futex_progress.spin.input:169 (state 36)	[((futex_wake==0))]
i=4 pno 3
366:	proc  2 (waiter) futex_progress.spin.input:170 (state 37)	[(1)]
i=4 pno 3
368:	proc  2 (waiter) futex_progress.spin.input:167 (state 35)	[(1)]
i=4 pno 3
370:	proc  2 (waiter) futex_progress.spin.input:169 (state 36)	[((futex_wake==0))]
i=4 pno 3
372:	proc  2 (waiter) futex_progress.spin.input:170 (state 37)	[(1)]
i=4 pno 3
374:	proc  2 (waiter) futex_progress.spin.input:167 (state 35)	[(1)]
i=4 pno 3
376:	proc  2 (waiter) futex_progress.spin.input:169 (state 36)	[((futex_wake==0))]
i=4 pno 3
378:	proc  2 (waiter) futex_progress.spin.input:170 (state 37)	[(1)]
i=4 pno 3
380:	proc  2 (waiter) futex_progress.spin.input:167 (state 35)	[(1)]
i=4 pno 3
382:	proc  2 (waiter) futex_progress.spin.input:169 (state 36)	[((futex_wake==0))]
i=4 pno 3
384:	proc  2 (waiter) futex_progress.spin.input:170 (state 37)	[(1)]
i=4 pno 3
386:	proc  2 (waiter) futex_progress.spin.input:167 (state 35)	[(1)]

with the following variable state:

                queue[0] = 0
                queue[1] = 1
                waiting[0] = 0
                waiting[1] = 1
                futex_wake = 0
                gp_futex = -1
                gp = 0
                in_registry[0] = 0
                in_registry[1] = 1

So if your model is accurate, synchronize_rcu could wait on futex_wake
forever if all reader threads stop after going through two critical
sections.
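
(For context: the repeated states 35-37 in the cycle are the waiter's
wait loop. Below is a minimal, standalone sketch of such a loop; it is a
simplification written for illustration, not the exact code from
futex.spin, but it shows why looping forever with futex_wake == 0 means
a missed wakeup.)

/* Standalone sketch (not the exact futex.spin code) of the waiter's wait loop. */
int gp_futex = 0;
bit futex_wake = 0;

active proctype waiter_sketch()
{
	gp_futex = -1;			/* announce intent to sleep (futex_wait protocol) */
	do
	:: (futex_wake == 0) ->		/* no wakeup yet: keep "sleeping" */
		skip;
	:: else ->
		futex_wake = 0;		/* consume the wakeup and proceed */
		break;
	od;
	/* The endless cycle above is this do-loop taking the first branch forever:
	 * gp_futex stays at -1 and futex_wake stays at 0, i.e. a missed wakeup. */
}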

Hrm, thinking about it a little more, this would be expected behavior
with QSBR if the reader threads are not put offline. So I think we need
to represent the offline state if we really want to model progress,
because otherwise we only cover the cases where readers report their
quiescent state infinitely often.
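
For what it's worth, here is a rough sketch of what representing the
offline state could look like (illustrative only, untested; it reuses
the in_registry/gp_futex/futex_wake variables visible in the state dump
above, takes the reader id as a parameter instead of using the model's
get_pid() helper, and leaves out the waiting[] bookkeeping of the real
wakeup path):

/* Illustrative declarations, mirroring names from the state dump above. */
bit in_registry[2];
bit futex_wake = 0;
int gp_futex = 0;

/* Hypothetical "go offline" step a reader executes before stopping:
 * remove itself from the registry so the waiter no longer waits on it,
 * and break a pending futex sleep, much like a last quiescent-state report. */
inline reader_go_offline(id)
{
	in_registry[id] = 0;
	if
	:: (gp_futex == -1) ->
		gp_futex = 0;
		futex_wake = 1;		/* wake the waiter so it re-scans the registry */
	:: else ->
		skip;
	fi;
}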

And I agree we can keep that model somewhere (once we get it to work
for the offline case), but it does not belong in the futex-wakeup/ model
directory; it fits better in a urcu-specific directory.

Thoughts ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com



