[ltt-dev] [RFC git tree] Userspace RCU (urcu) for Linux (repost)

Paul E. McKenney paulmck at linux.vnet.ibm.com
Mon Feb 9 14:35:06 EST 2009


On Mon, Feb 09, 2009 at 02:15:26PM -0500, Mathieu Desnoyers wrote:
> * Mathieu Desnoyers (compudj at krystal.dyndns.org) wrote:
> > * Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote:
> > > On Mon, Feb 09, 2009 at 10:37:42AM -0800, Paul E. McKenney wrote:
> > > > On Mon, Feb 09, 2009 at 01:13:41PM -0500, Mathieu Desnoyers wrote:
> > > > > * Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote:
> > > 
> > > [ . . . ]
> > > 
> > > > > You know what? Changing RCU_GP_CTR_BIT to 16 uses a
> > > > > testw %ax, %ax instead of a testb %al, %al. The trick here is that
> > > > > RCU_GP_CTR_BIT must be a multiple of 8 so we can use a full 8-bit,
> > > > > 16-bit or 32-bit bitmask for the low-order bits.
> > > > > 
> > > > > On 64-bit, using an RCU_GP_CTR_BIT of 32 is also OK. It uses a testl.
> > > > > 
> > > > > To provide 32-bit compatibility and allow the deepest nesting possible, I
> > > > > think it makes sense to use
> > > > > 
> > > > > /* Use a number of bits equal to half of the architecture long size */
> > > > > #define RCU_GP_CTR_BIT (sizeof(long) << 2)
> > > > 
> > > > You lost me on this one:
> > > > 
> > > > 	sizeof(long) << 2 = 0x10
> > > > 
> > > > I could believe the following (run on a 32-bit machine):
> > > > 
> > > > 	1 << (sizeof(long) * 8 - 1) = 0x80000000
> > > > 
> > > > Or, if you were wanting to use a bit halfway up the word, perhaps this:
> > > > 
> > > > 	1 << (sizeof(long) * 4 - 1) = 0x8000
> > > > 
> > > > Or am I confused?
> > > 
> > > Well, I am at least partly confused.  You were wanting a low-order bit,
> > > so you want to lose the "- 1" above.  Here are some of the possibilities:
> > > 
> > > 	sizeof(long) = 0x4
> > > 	sizeof(long) << 2 = 0x10
> > > 	1 << (sizeof(long) * 8 - 1) = 0x80000000
> > > 	1 << (sizeof(long) * 4) = 0x10000
> > > 	1 << (sizeof(long) * 4 - 1) = 0x8000
> > > 	1 << (sizeof(long) * 2) = 0x100
> > > 	1 << (sizeof(long) * 2 - 1) = 0x80
> > > 
> > > My guess is that 1 << (sizeof(long) * 4) and 1 << (sizeof(long) * 2)
> > > are of the most interest.
> > > 
> > 
> > Exactly. I'll change it to:
> > 
> > #define RCU_GP_CTR_BIT          (1 << (sizeof(long) << 2))
> > 
> > I somehow thought this define was used as a bit number rather than as a
> > bit mask.
> > 
> > Thanks,
> > 
> > Mathieu
> > 
> 
> It's pushed to the git tree. I also removed an increment in the fast
> path by initializing urcu_gp_ctr to RCU_GP_COUNT.
> 
> It brings the benchmarks to:
> 
> Time per read: 6.87183 to 7.25318 cycles
> 
> So we seem to save about half a cycle to a cycle with this.

I like it!!!  ;-)

							Thanx, Paul
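
The bit-number versus bit-mask distinction discussed above can be sketched in C as follows. This is only an illustration, not the code in the git tree: every SKETCH_/sketch_ name is hypothetical, the layout (nesting count in the low-order bits, phase bit just above them) is inferred from the thread, and 8-bit bytes are assumed. The sketch shifts 1UL rather than 1, because shifting a plain int by 32 is undefined when long is 64 bits.

```c
#include <assert.h>

/* Hypothetical names; the SKETCH_ prefix marks them as illustration only. */

/* Bit NUMBER: half the width of long, assuming 8-bit bytes
 * (16 when long is 32 bits, 32 when long is 64 bits). */
#define SKETCH_GP_CTR_SHIFT	(sizeof(long) << 2)

/* Bit MASK: shift 1UL, not 1, so the shift is done in unsigned long
 * and remains defined when the shift count is 32. */
#define SKETCH_GP_CTR_BIT	(1UL << SKETCH_GP_CTR_SHIFT)

/* Every bit below the mask bit; assumed here to hold the nesting count. */
#define SKETCH_GP_CTR_NEST_MASK	(SKETCH_GP_CTR_BIT - 1)

/* Extract the assumed nesting count from a counter value. */
static inline unsigned long sketch_nesting(unsigned long ctr)
{
	return ctr & SKETCH_GP_CTR_NEST_MASK;
}

/* Report whether the assumed phase bit is set in a counter value. */
static inline int sketch_phase(unsigned long ctr)
{
	return (ctr & SKETCH_GP_CTR_BIT) != 0;
}

/* Sanity check: the mask is a single bit at the shift position, and the
 * nesting mask covers exactly the bits below it. */
static inline void sketch_self_check(void)
{
	assert((SKETCH_GP_CTR_BIT >> SKETCH_GP_CTR_SHIFT) == 1UL);
	assert((SKETCH_GP_CTR_BIT & SKETCH_GP_CTR_NEST_MASK) == 0UL);
}
```

With this layout, the low half of the word being a whole number of bytes is what would let a compiler test the nesting bits with a single sub-register test instruction, as described in the quoted text.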



