[ltt-dev] [RFC PATCH 4/7] urcu:implement URCU_WAIT_READER

Thu Aug 18 13:07:12 EDT 2011

On Thu, Aug 18, 2011 at 11:28:28AM +0800, Lai Jiangshan wrote:
> On 08/18/2011 01:06 AM, Paul E. McKenney wrote:
> > On Wed, Aug 17, 2011 at 09:43:59AM +0800, Lai Jiangshan wrote:
> >> On 08/17/2011 03:20 AM, Paul E. McKenney wrote:
> >>> On Tue, Aug 16, 2011 at 03:58:07PM +0800, Lai Jiangshan wrote:
> >>>> synchronize_rcu() find out ongoing readers and wait for them
> >>>
> >>> Cute!
> >>>
> >>> But some questions and spelling nits below.
> >>>
> >>> 							Thanx, Paul
> >>>
> >>>> Signed-off-by: Lai Jiangshan <laijs at cn.fujitsu.com>
> >>>> ---
> >>>>  urcu.c             |   49 +++++++++++++++++++++++++++++++++++++++++++++++++
> >>>>  urcu/static/urcu.h |   42 ++++++++++++++++++++++++++++++++++++++++++
> >>>>  2 files changed, 91 insertions(+), 0 deletions(-)
> >>>>
> >>>> diff --git a/urcu.c b/urcu.c
> >>>> index bd54467..c13be67 100644
> >>>> --- a/urcu.c
> >>>> +++ b/urcu.c
> >>>> @@ -419,6 +419,53 @@ static void urcu_sync_unlock_if_not_proxy_unlocked(int32_t *lock, int32_t self)
> >>>>  		urcu_sync_slow_unlock(lock);
> >>>>  }
> >>>>
> >>>> +/* also implies mb() */
> >>>> +void __urcu_read_unlock_specail(void)
> >>>
> >>> s/__urcu_read_unlock_specail/__urcu_read_unlock_special/, please.
> >>>
> >>>> +{
> >>>> +	urcu_sync_unlock_if_not_proxy_unlocked(&rcu_reader.sync,
> >>>> +			rcu_reader.tid);
> >>>> +}
> >>>> +
> >>>> +/*
> >>>> + * synchronize_rcu		rcu_read_unlock(outmost)
> >>>> + *
> >>>> + *
> >>>> + * sync = tid;			ctr = 0;
> >>>> + * smp_mb_master();		smp_mb_slave();
> >>>
> >>
> >> Sorry, I use this comments to explain why synchronize_rcu()
> >> will not sleep endless when synchronize_rcu() find ctr != 0,
> >>
> >>> But smp_mb_slave() expands to 'asm volatile("" : : : "memory")'
> >>> for RCU_SIGNAL, which would open up a number of vulnerabilities.
> >>>
> >>> And yes, smp_mb_master() does a memory barrier on all CPUs in
> >>> the RCU_SIGNAL case, but that only guarantees that if there was
> >>> a reader pre-dating the beginning of the grace period, that if
> >>> that reader accessed any pre-grace-period state, we also see
> >>> that reader.  Misordering of the rcu_read_unlock() with the
> >>> prior critical section, and this is why there is an smp_mb_master()
> >>> prior to exit from synchronize_rcu().
> >>>
> >>> So for this to work, I believe that you need an smp_mb_master()
> >>> between checking the index->ctr and doing the urcu_sync_proxy_unlock().
> >>
> >> Calling urcu_sync_proxy_unlock() earlier is OK.
> >> __urcu_read_unlock_special() do nothing when it found the lock is proxy unlocked.
> >>
> >> If we find ctr == 0:
> >> I did add a smp_mb_master() after we find ctr == 0(also after urcu_sync_proxy_unlock()).
> > 
> > So it is OK to have urcu_sync_proxy_unlock() execute concurrently
> > with urcu_sync_unlock_if_not_proxy_unlocked()?  If so, no problem.
> 
> Yes. They are OK even they execute concurrently.
> I will add some comments for it.

OK, good!  The following sequence of events illustrates one of my
remaining concerns:

1.	Thread 1 does rcu_read_lock() using the signal version, and is
	delayed for whatever reason.

2.	Thread 2 does synchronize_rcu(), and notices that Thread 1
	is still in a pre-existing RCU read-side critical section.

3.	Thread 1 exits its RCU read-side critical section.  It doesn't
	see a proxy lock, so does nothing.  But because the CPU was
	not constrained by smp_mb_slave(), the check of the proxy
	lock and the exiting of the RCU read-side critical section
	appear to happen out of order to other CPUs.

4.	Then Thread 2 might see thread 1 as still being in its
	RCU read-side critical section, and therefore do urcu_sync_lock(),
	which will never return because the reader didn't release
	the lock.

Ah, but this is not possible because the smp_mb_master() happened between
Thread 2's urcu_sync_proxy_lock() and its check of Thread 1's index->ctr.
The smp_mb_master() will force Thread 1 to execute a full memory at
some point.  If this barrier happens before step 3 above, then Thread 1
will unlock the lock, preventing this scenario.  If this barrier happens
between the two accesses in step 3 above, then Thread 2 will see Thread
1 has having exited the RCU read-side critical section, and thus won't
do urcu_sync_lock().  If this barrier happens after step 3, then Thread 2
will again see Thread 1 having exited its RCU read-side critical section.

So this might actually be OK.

The other concern is that we might boost a reader that has been in its
RCU read-side critical section for only a very short time, which is
quite wasteful for both the reader and updater threads.  It might be
best to do the check after sending the first round of signals -- in
fact, it might be best if the signal sender returns a value indicating
whether any of the readers is in an RCU read-side critical section,
so that we don't needlessly boost.

After all, if we needlessly do urcu_sync_proxy_lock(), we might needlessly
delay a read-side rcu_read_unlock() that we didn't even need to wait on.

I would feel best if we avoided boosting unless the grace period had
gone on for some time -- I use 500 milliseconds by default in the
Linux kernel, for example.

> >>> Or am I missing something that restricts URCU priority boosting
> >>> to RCU_MB (where it looks like it might work)?  Or missing some
> >>> extra synchronization in there somewhere?
> >>
> >> initial: sync=0,ctr>0
> >>
> >> sync = tid;			ctr = 0
> >> smp_mb_master();		smp_mb_slave();
> >> local_ctr = ctr;		local_sync = sync;
> >> 	assert(local_ctr==0 || local_sync == tid);
> >>
> >> So if we find the ctr != 0(local_ctr!=0):
> >> the reader will call __urcu_read_unlock_special(),
> >> synchronize_rcu() can be woke up and will not sleep endless.
> > 
> > Yep, I was worried about synchronizing access to the boost mutex rather
> > than the effects on the grace period.
> > 
> >>>> + * test ctr and wait;		test sync and wakeup;
> >>>> + */
> >>>> +
> >>>> +void synchronize_rcu(void)
> >>>> +{
> >>>> +	struct rcu_reader *index;
> >>>> +	int32_t self = syscall(SYS_gettid);
> >>>> +
> >>>> +	mutex_lock(&rcu_gp_lock);
> >>>> +
> >>>> +	if (cds_list_empty(&registry))
> >>>> +		goto out;
> >>>> +
> >>>> +	cds_list_for_each_entry(index, &registry, node)
> >>>> +		urcu_sync_proxy_lock(&index->sync, index->tid);
> >>>> +
> >>>> +	smp_mb_master(RCU_MB_GROUP);
> >>>> +
> >>>> +	cds_list_for_each_entry(index, &registry, node) {
> >>>> +		if (_CMM_LOAD_SHARED(index->ctr) == 0) {
> >>>> +			urcu_sync_proxy_unlock(&index->sync);
> >>>> +		} else {
> >>>> +			/* reader still running, we need to wait reader */
> >>>> +			urcu_sync_lock(&index->sync, self);
> >>>> +			urcu_sync_unlock(&index->sync, self);
> >>>> +		}
> >>>> +	}
> >>>> +
> >>>> +	/* ensure rcu_read_unlock() finish when we found the ctr==0 */
> >>>> +	smp_mb_master(RCU_MB_GROUP); /* ensure rcu_read_unlock() finish */
> >>
> >> This smp_mb_master() is needed after we find ctr == 0.
> > 
> > Again, my concern was the manipulations of index->sync rather than
> > the grace period.
> 
> index->sync is a userspace lock, it has such values: (the same as pthread mutexes)
> 	0			UNLOCK
> 	tid			LOCKED by tid
> 	tid | FUTEX_WAITERS	LOCKED by tid, some other thread compete for it
> (In this patchset "some other thread" is only the thread calling synchronize_rcu())
> 
> index->sync belongs to a reader thread,
> the reader thread can do 2 things only:
> 	read value of this lock.
> 	unlock it when the reader thread find this lock is locked by itself.
> 
> If synchronize_rcu() finds the reader is still running, because smp_mb_master()/smp_mb_slave(),
> the reader thread must see "index->sync = tid;", it means synchronize_rcu() thread and
> the reader thread both agreed that this lock is held by the reader thread, and it will
> be unlock by the reader thread, So synchronize_rcu() calls
> urcu_sync_lock() compete for it. This competing will make synchronize_rcu() thread
> wait for the reader thread. If index->sync is a pi-lock, the synchronize_rcu() thread
> will boost the reader thread. When urcu_sync_lock() returns, the C.S. of the reader
> thread are totally done.
> 
> Any problem is welcome.

The main remaining concern is what I believe to be the overly aggressive
boosting.

							Thanx, Paul

> Thank you very much.
> Lai
> 
> > 
> >>>> +
> >>>> +out:
> >>>> +	mutex_unlock(&rcu_gp_lock);
> >>>> +}
> >>>>  #endif /* #else #ifndef URCU_WAIT_READER */
> >>>>
> >>>>  /*
> >>>> @@ -437,7 +484,9 @@ void rcu_read_unlock(void)
> >>>>
> >>>>  void rcu_register_thread(void)
> >>>>  {
> >>>> +	rcu_reader.tid = syscall(SYS_gettid);
> >>>>  	rcu_reader.pthread = pthread_self();
> >>>> +	rcu_reader.sync = 0;
> >>>>  	assert(rcu_reader.need_mb == 0);
> >>>>  	assert(!(rcu_reader.ctr & RCU_GP_CTR_NEST_MASK));
> >>>>
> >>>> diff --git a/urcu/static/urcu.h b/urcu/static/urcu.h
> >>>> index 32d1af8..0941fd2 100644
> >>>> --- a/urcu/static/urcu.h
> >>>> +++ b/urcu/static/urcu.h
> >>>> @@ -221,9 +221,11 @@ static inline void smp_mb_slave(int group)
> >>>>  struct rcu_reader {
> >>>>  	/* Data used by both reader and synchronize_rcu() */
> >>>>  	unsigned long ctr;
> >>>> +	int32_t sync;
> >>>>  	char need_mb;
> >>>>  	/* Data used for registry */
> >>>>  	struct cds_list_head node __attribute__((aligned(CAA_CACHE_LINE_SIZE)));
> >>>> +	pid_t tid;
> >>>>  	pthread_t pthread;
> >>>>  };
> >>>>
> >>>> @@ -313,6 +315,46 @@ static inline int32_t urcu_sync_lock_onwer(int32_t *lock)
> >>>>  	return _CMM_LOAD_SHARED(*lock) & ~FUTEX_WAITERS;
> >>>>  }
> >>>>
> >>>> +void __urcu_read_unlock_specail(void);
> >>>> +
> >>>> +static inline void _rcu_read_lock(void)
> >>>> +{
> >>>> +	unsigned long tmp;
> >>>> +
> >>>> +	cmm_barrier();
> >>>> +	tmp = rcu_reader.ctr;
> >>>> +
> >>>> +	if (!tmp) {
> >>>> +		_CMM_STORE_SHARED(rcu_reader.ctr, 1);
> >>>> +		smp_mb_slave(RCU_MB_GROUP);
> >>>> +	} else {
> >>>> +		_CMM_STORE_SHARED(rcu_reader.ctr, tmp + 1);
> >>>> +		cmm_barrier();
> >>>> +	}
> >>>> +}
> >>>> +
> >>>> +static inline void _rcu_read_unlock(void)
> >>>> +{
> >>>> +	unsigned long tmp;
> >>>> +
> >>>> +	cmm_barrier();
> >>>> +	tmp = rcu_reader.ctr;
> >>>> +
> >>>> +	if (tmp == 1) {
> >>>> +		smp_mb_slave(RCU_MB_GROUP);
> >>>> +
> >>>> +		rcu_reader.ctr = 0;
> >>>> +		smp_mb_slave(RCU_MB_GROUP);
> >>>> +
> >>>> +		if (unlikely(urcu_sync_lock_onwer(&rcu_reader.sync)
> >>>> +				== rcu_reader.tid))
> >>>> +			__urcu_read_unlock_specail();
> >>>> +	} else {
> >>>> +		_CMM_STORE_SHARED(rcu_reader.ctr, tmp - 1);
> >>>> +		cmm_barrier();
> >>>> +	}
> >>>> +}
> >>>> +
> >>>>  #endif /* #else #ifndef URCU_WAIT_READER */
> >>>>
> >>>>  #ifdef __cplusplus
> >>>> -- 
> >>>> 1.7.4.4
> >>>>
> >>>
> >>
> > 
>