[ltt-dev] [PATCH 3/4] Add native ARM port for armv7l

Paul E. McKenney paulmck at linux.vnet.ibm.com
Thu Jun 17 00:43:52 EDT 2010


On Wed, Jun 16, 2010 at 09:37:02PM -0400, Mathieu Desnoyers wrote:
> * Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote:
> > On Wed, Jun 16, 2010 at 07:57:55PM -0400, Mathieu Desnoyers wrote:
> > > * Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote:
> > > > On Wed, Jun 16, 2010 at 05:23:15PM -0400, Mathieu Desnoyers wrote:
> > > > > * Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote:
> > > > > > Add native support for armv7l.  Other variants of ARM will likely require
> > > > > > separate ports.
> > > > > > 
> > > > > > Signed-off-by: Paul E. McKenney <paulmck at linux.vnet.ibm.com>
> > > > > > ---
> > > > > >  configure.ac               |    4 +++
> > > > > >  urcu/arch_armv7l.h         |   59 ++++++++++++++++++++++++++++++++++++++++++++
> > > > > >  urcu/uatomic_arch_armv7l.h |   48 +++++++++++++++++++++++++++++++++++
> > > > 
> > > > [ . . . ]
> > > > 
> > > > > > +#ifdef __cplusplus
> > > > > > +extern "C" {
> > > > > > +#endif 
> > > > > > +
> > > > > > +/* xchg */
> > > > > > +#define uatomic_xchg(addr, v) __sync_lock_test_and_set(addr, v)
> > > > > > +
> > > > > > +/* cmpxchg */
> > > > > > +#define uatomic_cmpxchg(addr, old, _new) \
> > > > > > +	__sync_val_compare_and_swap(addr, old, _new)
> > > > > > +
> > > > > > +/* uatomic_add_return */
> > > > > > +#define uatomic_add_return(addr, v) __sync_add_and_fetch(addr, v)
> > > > > 
> > > > > So, do we end up trusting that gcc got the memory barriers right in the ARM
> > > > > __sync_() primitives ? That sounds unlikely.
> > > > > 
> > > > > I'd vote for surrounding these primitives with smp_mb().
> > > > 
> > > > On ARM, my current belief is that the primitives other than
> > > > __sync_synchronize() and __sync_lock_release() are set up correctly.
> > > > 
> > > > However, I must defer to Paolo and Uli on this.
> > > 
> > > There is nothing like a quick test to see the result:
> > > 
> > > With a arm-linux-cs2009q1-203sb1 scratchbox compiler (gcc 4.3.3, provided by
> > > Nokia for the Omap3):
> > > 
> > > arm-none-linux-gnueabi-gcc-4.3.3 (Sourcery G++ Lite 2009q1-203) 4.3.3
> > > 
> > > I compile, with
> > > 
> > > /scratchbox/compilers/arm-linux-cs2009q1-203sb1/bin/arm-none-linux-gnueabi-gcc-4.3.3 -mcpu=cortex-a9 -mtune=cortex-a9 -O2 -o armtest armtest.c
> > > 
> > > the following program:
> > > 
> > > int a;
> > > 
> > > int
> > > f()
> > > {
> > >         __sync_val_compare_and_swap(&a, 4, 1);
> > >         //__sync_lock_test_and_set(&a, 1);
> > >         //__sync_add_and_fetch(&a, 1);
> > >         //__sync_synchronize();
> > > }
> > > 
> > > int main()
> > > {
> > >         f();
> > > }
> > > 
> > > and get:
> > > 
> > > /scratchbox/compilers/arm-linux-cs2009q1-203sb1/bin/arm-none-linux-gnueabi-objdump -S armtest
> > > 
> > > [...]
> > > 
> > > 
> > > 000083cc <f>:
> > >     83cc:       e59f0008        ldr     r0, [pc, #8]    ; 83dc <f+0x10>
> > >     83d0:       e3a01004        mov     r1, #4  ; 0x4
> > >     83d4:       e3a02001        mov     r2, #1  ; 0x1
> > >     83d8:       ea000305        b       8ff4 <__sync_val_compare_and_swap_4>
> > >     83dc:       00011524        .word   0x00011524
> > > 
> > > [...]
> > > 
> > > 00008ff4 <__sync_val_compare_and_swap_4>:
> > >     8ff4:       e92d41f0        push    {r4, r5, r6, r7, r8, lr}
> > >     8ff8:       e59f8034        ldr     r8, [pc, #52]   ; 9034 <__sync_val_compare_and_swap_4+0x40>
> > >     8ffc:       e1a06000        mov     r6, r0
> > >     9000:       e1a05001        mov     r5, r1
> > >     9004:       e1a07002        mov     r7, r2
> > >     9008:       e5964000        ldr     r4, [r6]
> > >     900c:       e1a00005        mov     r0, r5
> > >     9010:       e1550004        cmp     r5, r4
> > >     9014:       e1a01007        mov     r1, r7
> > >     9018:       e1a02006        mov     r2, r6
> > >     901c:       1a000002        bne     902c <__sync_val_compare_and_swap_4+0x38>
> > >     9020:       e12fff38        blx     r8
> > >     9024:       e3500000        cmp     r0, #0  ; 0x0
> > >     9028:       1afffff6        bne     9008 <__sync_val_compare_and_swap_4+0x14>
> > >     902c:       e1a00004        mov     r0, r4
> > >     9030:       e8bd81f0        pop     {r4, r5, r6, r7, r8, pc}
> > >     9034:       ffff0fc0        .word   0xffff0fc0
> > > 
> > > Where sadly the appropriate memory barriers are missing, and even the
> > > appropriate ldrex/teq, strexeq sequence is missing. So not only is this
> > > incorrect in terms of memory barriers, but also in terms of atomicity. Argh. I
> > > don't know if a compiler more recent than 4.3.3 would do better though, but I
> > > start to think that it would be wise to stay far away from gcc __sync_*()
> > > primitives. For ARM at least.
> > 
> > The way it was explained to me is that the "blx r8" above branches to a
> > code page supplied by the kernel, and that this page contains the memory
> > barriers and atomic instructions.  The "x" in the "blx" switches from
> > ARM to Thumb instruction-set format, just to keep things interesting.
> > 
> > The address is 0xffff0fc0, the value at address 0x9035 above.  Naturally,
> > gdb refuses to let me look at this address range.  My attempt to access
> > this range from inside the program also fails.  Which is not a surprise,
> > it is not supposed to be mapped.  Assuming you have a multicore ARMv6
> > or later, the kernel supplies the following code at this address (see
> > arch/arm/kernel/entry-armv.S):
> > 
> > 		smp_dmb
> > 	1:	ldrex	r3, [r2]
> > 		subs	r3, r3, r0
> > 		strexeq	r3, r1, [r2]
> > 		teqeq	r3, #1
> > 		beq	1b
> > 		rsbs	r0, r3, #0
> > 		/* beware -- each __kuser slot must be 8 instructions max */
> > 	#ifdef CONFIG_SMP
> > 		b	__kuser_memory_barrier
> > 	#else
> > 		usr_ret	lr
> > 	#endif
> > 
> > The smp_dmb is an assembler macro (haven't come across one of those for
> > like 30 years!!!) that expands to different things depending on the ARM
> > architecture level that the kernel is built for.  The ldrex and strexeq
> > are ARM's atomic instructions, sort of like larx/stcx on ppc.
> > 
> > This is admittedly a bit involuted, but the really nice thing about it
> > is that it works on a wider variety of ARM architectures.
> > 
> > This approach OK for you?
> 
> Yep, sounds fine :) I wonder if we could document how far back we expect ARM
> Linux kernels to work (e.g. not earlier than 2.6.15). I guess this in-kernel
> cmpxchg has not been there forever.

OK, updating comments and commit message and reposting.

							Thanx, Paul




More information about the lttng-dev mailing list