[ltt-dev] [PATCH 3/4] Add native ARM port for armv7l

Paul E. McKenney paulmck at linux.vnet.ibm.com
Wed Jun 16 20:51:57 EDT 2010


On Wed, Jun 16, 2010 at 07:57:55PM -0400, Mathieu Desnoyers wrote:
> * Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote:
> > On Wed, Jun 16, 2010 at 05:23:15PM -0400, Mathieu Desnoyers wrote:
> > > * Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote:
> > > > Add native support for armv7l.  Other variants of ARM will likely require
> > > > separate ports.
> > > > 
> > > > Signed-off-by: Paul E. McKenney <paulmck at linux.vnet.ibm.com>
> > > > ---
> > > >  configure.ac               |    4 +++
> > > >  urcu/arch_armv7l.h         |   59 ++++++++++++++++++++++++++++++++++++++++++++
> > > >  urcu/uatomic_arch_armv7l.h |   48 +++++++++++++++++++++++++++++++++++
> > 
> > [ . . . ]
> > 
> > > > +#ifdef __cplusplus
> > > > +extern "C" {
> > > > +#endif 
> > > > +
> > > > +/* xchg */
> > > > +#define uatomic_xchg(addr, v) __sync_lock_test_and_set(addr, v)
> > > > +
> > > > +/* cmpxchg */
> > > > +#define uatomic_cmpxchg(addr, old, _new) \
> > > > +	__sync_val_compare_and_swap(addr, old, _new)
> > > > +
> > > > +/* uatomic_add_return */
> > > > +#define uatomic_add_return(addr, v) __sync_add_and_fetch(addr, v)
> > > 
> > > So, do we end up trusting that gcc got the memory barriers right in the ARM
> > > __sync_() primitives ? That sounds unlikely.
> > > 
> > > I'd vote for surrounding these primitives with smp_mb().
> > 
> > On ARM, my current belief is that the primitives other than
> > __sync_synchronize() and __sync_lock_release() are set up correctly.
> > 
> > However, I must defer to Paolo and Uli on this.
> 
> There is nothing like a quick test to see the result:
> 
> With an arm-linux-cs2009q1-203sb1 scratchbox compiler (gcc 4.3.3, provided by
> Nokia for the Omap3):
> 
> arm-none-linux-gnueabi-gcc-4.3.3 (Sourcery G++ Lite 2009q1-203) 4.3.3
> 
> I compile, with
> 
> /scratchbox/compilers/arm-linux-cs2009q1-203sb1/bin/arm-none-linux-gnueabi-gcc-4.3.3 -mcpu=cortex-a9 -mtune=cortex-a9 -O2 -o armtest armtest.c
> 
> the following program:
> 
> int a;
> 
> int
> f()
> {
>         __sync_val_compare_and_swap(&a, 4, 1);
>         //__sync_lock_test_and_set(&a, 1);
>         //__sync_add_and_fetch(&a, 1);
>         //__sync_synchronize();
> }
> 
> int main()
> {
>         f();
> }
> 
> and get:
> 
> /scratchbox/compilers/arm-linux-cs2009q1-203sb1/bin/arm-none-linux-gnueabi-objdump -S armtest
> 
> [...]
> 
> 
> 000083cc <f>:
>     83cc:       e59f0008        ldr     r0, [pc, #8]    ; 83dc <f+0x10>
>     83d0:       e3a01004        mov     r1, #4  ; 0x4
>     83d4:       e3a02001        mov     r2, #1  ; 0x1
>     83d8:       ea000305        b       8ff4 <__sync_val_compare_and_swap_4>
>     83dc:       00011524        .word   0x00011524
> 
> [...]
> 
> 00008ff4 <__sync_val_compare_and_swap_4>:
>     8ff4:       e92d41f0        push    {r4, r5, r6, r7, r8, lr}
>     8ff8:       e59f8034        ldr     r8, [pc, #52]   ; 9034 <__sync_val_compare_and_swap_4+0x40>
>     8ffc:       e1a06000        mov     r6, r0
>     9000:       e1a05001        mov     r5, r1
>     9004:       e1a07002        mov     r7, r2
>     9008:       e5964000        ldr     r4, [r6]
>     900c:       e1a00005        mov     r0, r5
>     9010:       e1550004        cmp     r5, r4
>     9014:       e1a01007        mov     r1, r7
>     9018:       e1a02006        mov     r2, r6
>     901c:       1a000002        bne     902c <__sync_val_compare_and_swap_4+0x38>
>     9020:       e12fff38        blx     r8
>     9024:       e3500000        cmp     r0, #0  ; 0x0
>     9028:       1afffff6        bne     9008 <__sync_val_compare_and_swap_4+0x14>
>     902c:       e1a00004        mov     r0, r4
>     9030:       e8bd81f0        pop     {r4, r5, r6, r7, r8, pc}
>     9034:       ffff0fc0        .word   0xffff0fc0
> 
> Where sadly the appropriate memory barriers are missing, and even the
> appropriate ldrex/teq, strexeq sequence is missing. So not only is this
> incorrect in terms of memory barriers, but also in terms of atomicity. Argh. I
> don't know if a compiler more recent than 4.3.3 would do better though, but I
> start to think that it would be wise to stay far away from gcc __sync_*()
> primitives. For ARM at least.

The way it was explained to me is that the "blx r8" above branches to a
code page supplied by the kernel, and that this page contains the memory
barriers and atomic instructions.  The "x" in the "blx" lets the branch
switch between ARM and Thumb instruction-set formats, depending on the
low bit of the target address, just to keep things interesting.

The address is 0xffff0fc0, the value stored at address 0x9034 above.
Naturally, gdb refuses to let me look at this address range.  My attempt
to access this range from inside the program also fails.  This is not a
surprise, as it is not supposed to be mapped.  Assuming you have a multicore ARMv6
or later, the kernel supplies the following code at this address (see
arch/arm/kernel/entry-armv.S):

		smp_dmb
	1:	ldrex	r3, [r2]
		subs	r3, r3, r0
		strexeq	r3, r1, [r2]
		teqeq	r3, #1
		beq	1b
		rsbs	r0, r3, #0
		/* beware -- each __kuser slot must be 8 instructions max */
	#ifdef CONFIG_SMP
		b	__kuser_memory_barrier
	#else
		usr_ret	lr
	#endif

The smp_dmb is an assembler macro (haven't come across one of those for
like 30 years!!!) that expands to different things depending on the ARM
architecture level that the kernel is built for.  The ldrex and strexeq
are ARM's atomic instructions, sort of like larx/stcx on ppc.
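
To make the larx/stcx comparison a bit more concrete, here is a rough C
model of that kind of retry loop.  This is a sketch and not the kernel's
code: it assumes a compiler with the __atomic builtins (gcc 4.7 or later,
so newer than the 4.3.3 used above), and model_add_return() is a name made
up for illustration.  The weak compare-exchange is allowed to fail
spuriously, much as a store-exclusive can.

```c
/*
 * Illustrative model only: emulate an ldrex/strex-style retry loop
 * with a weak compare-exchange.  model_add_return() is a hypothetical
 * name, not part of liburcu or the kernel.
 */
static int model_add_return(int *addr, int v)
{
	/* Plays the role of "ldrex": fetch the current value. */
	int old = __atomic_load_n(addr, __ATOMIC_RELAXED);

	/*
	 * Plays the role of "strex": on failure, old is refreshed with
	 * the current value and we go around again, like the "beq 1b"
	 * in the kernel helper.
	 */
	while (!__atomic_compare_exchange_n(addr, &old, old + v,
					    1 /* weak */,
					    __ATOMIC_SEQ_CST,
					    __ATOMIC_RELAXED))
		;
	return old + v;
}
```

The full-barrier ordering on success is chosen here to match what the
__sync_*() primitives are documented to provide; a real port might well
choose differently.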

This is admittedly a bit involuted, but the really nice thing about it
is that it works on a wider variety of ARM architectures.
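
For reference, the loop in the __sync_val_compare_and_swap_4 disassembly
quoted above amounts to roughly the following C.  This is my reading of
the objdump output, not libgcc's actual source.  The helper's calling
convention (oldval, newval, ptr, returning zero on success) is the one
the kernel documents for this slot; sketch_val_cmpxchg() is a made-up
name, and the non-ARM stand-in is there only so that the sketch compiles
elsewhere.

```c
#if defined(__arm__) && defined(__linux__)
/*
 * Kernel-provided helper at a fixed address in the vector page.
 * Returns zero if *ptr was changed from oldval to newval.
 */
typedef int (kuser_cmpxchg_t)(int oldval, int newval, volatile int *ptr);
#define kuser_cmpxchg (*(kuser_cmpxchg_t *)0xffff0fc0)
#else
/* Stand-in for non-ARM builds (an assumption, for illustration only). */
static int kuser_cmpxchg(int oldval, int newval, volatile int *ptr)
{
	return !__sync_bool_compare_and_swap(ptr, oldval, newval);
}
#endif

/* Hypothetical name; mirrors the loop seen in the disassembly. */
static int sketch_val_cmpxchg(volatile int *ptr, int oldval, int newval)
{
	int actual;

	do {
		actual = *ptr;			/* ldr r4, [r6] */
		if (actual != oldval)		/* cmp r5, r4 ; bne 902c */
			return actual;
		/* blx r8: nonzero means the swap did not happen, so retry. */
	} while (kuser_cmpxchg(actual, newval, ptr) != 0);
	return actual;
}
```

Note that the plain load and compare sit outside the helper, so any
atomicity and ordering come from the helper call itself.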

This approach OK for you?

							Thanx, Paul



