[ltt-dev] [RFC PATCHv2 4/5] urcu: re-implement urcu-qsbr

Lai Jiangshan laijs at cn.fujitsu.com
Mon Aug 29 22:56:01 EDT 2011


On 08/29/2011 08:56 PM, Mathieu Desnoyers wrote:
> * Lai Jiangshan (laijs at cn.fujitsu.com) wrote:
>> On 08/25/2011 04:35 PM, Paolo Bonzini wrote:
>>> On 08/25/2011 10:00 AM, Lai Jiangshan wrote:
>>>>>> I was measuring with 10 readers, not 3.  It makes sense to wait more with fewer readers.
>>>>
>>>> But my box just has 4 cores(i5 760).
>>>
>>> You cannot always be sure that there are fewer readers than cores.  readers > cores is exactly the case where busy waiting hurts most.
>>>
>>
>>
>> It makes no sense to do a "readers > cores" *performance* rcutorture test.
> 
> I think it does make sense to benchmark this use-case actually. One
> major difference between Userspace RCU and kernel code is that Userspace
> RCU has to handle workloads that can sometimes be ill-fitted with
> respect to the system configuration, and still behave reasonably well.

rcutorture's performance test binds each reader thread to a different CPU;
when "readers > cores", the surplus "readers - cores" reader threads fail
to be bound. That is why I said a "readers > cores" *performance*
rcutorture test makes no sense.
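The point above can be sketched with a toy model (this is not the actual rcutorture binding code; the function name and counting are purely illustrative): when each reader thread needs its own dedicated CPU, any reader beyond the core count is left unbound and at the scheduler's mercy.

```shell
# Toy model (hypothetical, not rcutorture itself): try to give each
# of $1 reader threads its own CPU out of $2 cores; the surplus
# "readers - cores" threads end up with no dedicated CPU.
bind_readers() {
    local readers=$1 cores=$2 unbound=0 i
    for ((i = 0; i < readers; i++)); do
        if ((i >= cores)); then
            unbound=$((unbound + 1))   # no free CPU left to bind to
        fi
    done
    echo "$unbound readers left unbound"
}

bind_readers 10 4   # 10 readers on a 4-core box like the i5 760
```

With 10 readers on 4 cores, 6 readers run unbound, so their placement (and the resulting numbers) depends on the kernel scheduler rather than on the URCU algorithm under test.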

> 
>> When "readers > cores", the kernel scheduler will mess the test up.
> 
> Even though I agree that the kernel scheduler will become heavily
> involved in these tests, I think that if we keep the same scheduler
> configuration between the tests and only modify the URCU algorithm, we
> can compare the impact of URCU well enough.
> 
> So what I'm trying to say here is: I agree with you that we primarily
> need to optimize for performance of the "ideal" configuration (n threads
> for n cpus), but we also need to consider the cases where we have more
> threads than CPUs so, even though this behavior is not the one we
> mainly optimize for, we don't degrade its performance more than we
> should for the sake of very small gains in the ideal configuration.
> 
> Thanks,
> 
> Mathieu
> 

When readers > cores, n_updates is very unstable from run to run, so the n_updates results are less meaningful.

The results show that Paolo's patch is an improvement,
but my patch improves read-side performance even more.

The updater in my patchset has less impact on the readers.

Thanks,
Lai.

---------------------------------------
78bec1:

[laijs at lai tests]$ for ((i=0;i<20;i++)) do ./rcutorture_qsbr 10 perf 2>/dev/null | (read a b c d e; echo $b $d); done
126477522000 37
124138875000 46
125035204000 38
124364462000 2813
126468264000 33
127630616000 37
124336956000 42
126514624000 35
125380877000 2045
123055119000 50
124811675000 1705
127572424000 30
125952102000 36
126772195000 3289
119900155000 50
126700892000 32
125155070000 38
126137397000 36
125569792000 40
125979757000 595
[laijs at lai tests]$ for ((i=0;i<20;i++)) do ./rcutorture_qsbr 50 perf 2>/dev/null | (read a b c d e; echo $b $d); done
133815759000 12
134410744000 114
134054438000 10
134899649000 11
134909105000 10
134866493000 10
135255807000 12
134674536000 11
134548679000 11
134324605000 11
134932753000 812
135272699000 10
134802202000 13
134966508000 10
133966645000 13
134944451000 11
134123677000 12
135216333000 11
135954511000 10
136495744000 11
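The instability of the second column (n_updates) can be quantified by piping the same loop output through a small awk summary; `summarize` is an illustrative helper, not part of the rcutorture tools, and the three sample lines below are taken from the 10-reader runs above.

```shell
# Summarize run-to-run spread of n_updates (column 2 of the
# "echo $b $d" output above): count, mean, min, max.
summarize() {
    awk '{ n++; s += $2
           if (n == 1 || $2 < min) min = $2
           if ($2 > max) max = $2 }
         END { printf "runs=%d mean=%.1f min=%d max=%d\n", n, s/n, min, max }'
}

# e.g. feeding it a few of the runs above:
printf '126477522000 37\n124364462000 2813\n123055119000 50\n' | summarize
# -> runs=3 mean=966.7 min=37 max=2813
```

A min/max spread of nearly two orders of magnitude in n_updates is exactly the instability described above, which is why the first column (n_reads) is the more trustworthy figure in these tables.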

-------------------------------------------------------------------------
83a2c4(=78bec1+Paolo's patch)

[laijs at lai tests]$ for ((i=0;i<20;i++)) do ./rcutorture_qsbr 10 perf 2>/dev/null | (read a b c d e; echo $b $d); done
136249775000 54
137578256000 52
136910456000 54
137308567000 52
137417925000 61
137023710000 53
137380666000 53
136926779000 55
136666836000 50
137323468000 52
137262017000 54
137175620000 2060
136971246000 149
137394253000 56
136799328000 50
137803397000 2284
137365536000 52
137353673000 53
137391468000 55
136296672000 54
[laijs at lai tests]$ for ((i=0;i<20;i++)) do ./rcutorture_qsbr 50 perf 2>/dev/null | (read a b c d e; echo $b $d); done
136294321000 16
137155186000 16
135783080000 15
137418765000 499
137668007000 19
137311146000 15
137443675000 16
137231740000 17
135516661000 16
136909649000 16
136805721000 14
136709665000 18
136655673000 17
137326871000 31
136728430000 16
136911747000 65
136827095000 16
137243937000 17
136833854000 17
136780665000 76

------------------------------------------------------------
78bec1+my patchset:

[laijs at lai tests]$ for ((i=0;i<20;i++)) do ./rcutorture_qsbr 10 perf 2>/dev/null | (read a b c d e; echo $b $d); done
227307909000 272
226744284000 52
226503154000 51
227665894000 1617
227029208000 52
226408806000 55
227959365000 53
226864291000 53
227732800000 52
228040512000 49
227128921000 52
228532620000 54
227128007000 53
227908550000 2692
228121321000 54
227504548000 52
228398981000 55
227060552000 55
228058918000 52
226711856000 53
[laijs at lai tests]$ for ((i=0;i<20;i++)) do ./rcutorture_qsbr 50 perf 2>/dev/null | (read a b c d e; echo $b $d); done
228580953000 16
227967003000 16
228436484000 16
227489810000 17
227318023000 15
227743424000 16
226781106000 41
228138633000 17
227809037000 211
226175890000 17
228283901000 15
226989431000 15
226919435000 45
229552764000 128
227096384000 291
226240792000 17
226834648000 17
226925460000 18
227045700000 16
226696383000 15



