[ltt-dev] [PATCH 3/4] Add native ARM port for armv7l

Mathieu Desnoyers mathieu.desnoyers at efficios.com
Wed Jun 16 21:37:02 EDT 2010


* Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote:
> On Wed, Jun 16, 2010 at 07:57:55PM -0400, Mathieu Desnoyers wrote:
> > * Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote:
> > > On Wed, Jun 16, 2010 at 05:23:15PM -0400, Mathieu Desnoyers wrote:
> > > > * Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote:
> > > > > Add native support for armv7l.  Other variants of ARM will likely require
> > > > > separate ports.
> > > > > 
> > > > > Signed-off-by: Paul E. McKenney <paulmck at linux.vnet.ibm.com>
> > > > > ---
> > > > >  configure.ac               |    4 +++
> > > > >  urcu/arch_armv7l.h         |   59 ++++++++++++++++++++++++++++++++++++++++++++
> > > > >  urcu/uatomic_arch_armv7l.h |   48 +++++++++++++++++++++++++++++++++++
> > > 
> > > [ . . . ]
> > > 
> > > > > +#ifdef __cplusplus
> > > > > +extern "C" {
> > > > > +#endif 
> > > > > +
> > > > > +/* xchg */
> > > > > +#define uatomic_xchg(addr, v) __sync_lock_test_and_set(addr, v)
> > > > > +
> > > > > +/* cmpxchg */
> > > > > +#define uatomic_cmpxchg(addr, old, _new) \
> > > > > +	__sync_val_compare_and_swap(addr, old, _new)
> > > > > +
> > > > > +/* uatomic_add_return */
> > > > > +#define uatomic_add_return(addr, v) __sync_add_and_fetch(addr, v)
> > > > 
> > > > So, do we end up trusting that gcc got the memory barriers right in the ARM
> > > > __sync_*() primitives? That sounds unlikely.
> > > > 
> > > > I'd vote for surrounding these primitives with smp_mb().
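Here is a minimal sketch of the smp_mb() wrapping suggested above. It assumes GCC or Clang with GNU statement expressions. The smp_mb() definition is a stand-in, not the one from the patch's urcu/arch_armv7l.h: it emits dmb on ARMv7-A and otherwise falls back to __sync_synchronize() so that the sketch also builds on other hosts. Per the GCC documentation, __sync_lock_test_and_set() is only an acquire barrier, so explicit barriers on both sides are needed for uatomic_xchg() to be fully ordered, whatever the other __sync_*() builtins do internally.

```c
/*
 * Sketch: the uatomic_*() macros from the patch, with a full barrier
 * before and after each primitive. smp_mb() is a stand-in definition.
 */
#if defined(__ARM_ARCH_7A__)
#define smp_mb()	__asm__ __volatile__("dmb" : : : "memory")
#else
/* Other hosts (and other ARM variants): GCC's builtin full barrier. */
#define smp_mb()	__sync_synchronize()
#endif

/* xchg: returns the previous value of *addr. */
#define uatomic_xchg(addr, v)						\
	__extension__ ({						\
		__typeof__(*(addr)) _uatomic_ret;			\
		smp_mb();						\
		_uatomic_ret = __sync_lock_test_and_set(addr, v);	\
		smp_mb();						\
		_uatomic_ret;						\
	})

/* cmpxchg: returns the value *addr held before the attempt. */
#define uatomic_cmpxchg(addr, old, _new)				\
	__extension__ ({						\
		__typeof__(*(addr)) _uatomic_ret;			\
		smp_mb();						\
		_uatomic_ret =						\
			__sync_val_compare_and_swap(addr, old, _new);	\
		smp_mb();						\
		_uatomic_ret;						\
	})

/* add_return: returns the new value of *addr. */
#define uatomic_add_return(addr, v)					\
	__extension__ ({						\
		__typeof__(*(addr)) _uatomic_ret;			\
		smp_mb();						\
		_uatomic_ret = __sync_add_and_fetch(addr, v);		\
		smp_mb();						\
		_uatomic_ret;						\
	})
```

The cost is one or two redundant barriers per operation if the builtins already order memory correctly, in exchange for not depending on how a given gcc release implements them.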
> > > 
> > > On ARM, my current belief is that the primitives other than
> > > __sync_synchronize() and __sync_lock_release() are set up correctly.
> > > 
> > > However, I must defer to Paolo and Uli on this.
> > 
> > There is nothing like a quick test to see the result:
> > 
> > With an arm-linux-cs2009q1-203sb1 scratchbox compiler (gcc 4.3.3, provided by
> > Nokia for the Omap3):
> > 
> > arm-none-linux-gnueabi-gcc-4.3.3 (Sourcery G++ Lite 2009q1-203) 4.3.3
> > 
> > I compile, with
> > 
> > /scratchbox/compilers/arm-linux-cs2009q1-203sb1/bin/arm-none-linux-gnueabi-gcc-4.3.3 -mcpu=cortex-a9 -mtune=cortex-a9 -O2 -o armtest armtest.c
> > 
> > the following program:
> > 
> > int a;
> > 
> > int
> > f()
> > {
> >         __sync_val_compare_and_swap(&a, 4, 1);
> >         //__sync_lock_test_and_set(&a, 1);
> >         //__sync_add_and_fetch(&a, 1);
> >         //__sync_synchronize();
> > }
> > 
> > int main()
> > {
> >         f();
> > }
> > 
> > and get:
> > 
> > /scratchbox/compilers/arm-linux-cs2009q1-203sb1/bin/arm-none-linux-gnueabi-objdump -S armtest
> > 
> > [...]
> > 
> > 
> > 000083cc <f>:
> >     83cc:       e59f0008        ldr     r0, [pc, #8]    ; 83dc <f+0x10>
> >     83d0:       e3a01004        mov     r1, #4  ; 0x4
> >     83d4:       e3a02001        mov     r2, #1  ; 0x1
> >     83d8:       ea000305        b       8ff4 <__sync_val_compare_and_swap_4>
> >     83dc:       00011524        .word   0x00011524
> > 
> > [...]
> > 
> > 00008ff4 <__sync_val_compare_and_swap_4>:
> >     8ff4:       e92d41f0        push    {r4, r5, r6, r7, r8, lr}
> >     8ff8:       e59f8034        ldr     r8, [pc, #52]   ; 9034 <__sync_val_compare_and_swap_4+0x40>
> >     8ffc:       e1a06000        mov     r6, r0
> >     9000:       e1a05001        mov     r5, r1
> >     9004:       e1a07002        mov     r7, r2
> >     9008:       e5964000        ldr     r4, [r6]
> >     900c:       e1a00005        mov     r0, r5
> >     9010:       e1550004        cmp     r5, r4
> >     9014:       e1a01007        mov     r1, r7
> >     9018:       e1a02006        mov     r2, r6
> >     901c:       1a000002        bne     902c <__sync_val_compare_and_swap_4+0x38>
> >     9020:       e12fff38        blx     r8
> >     9024:       e3500000        cmp     r0, #0  ; 0x0
> >     9028:       1afffff6        bne     9008 <__sync_val_compare_and_swap_4+0x14>
> >     902c:       e1a00004        mov     r0, r4
> >     9030:       e8bd81f0        pop     {r4, r5, r6, r7, r8, pc}
> >     9034:       ffff0fc0        .word   0xffff0fc0
> > 
> > Where, sadly, the appropriate memory barriers are missing, and even the
> > appropriate ldrex/teq, strexeq sequence is missing. So not only is this
> > incorrect in terms of memory barriers, but also in terms of atomicity. Argh. I
> > don't know whether a compiler more recent than 4.3.3 would do better, but I'm
> > starting to think that it would be wise to stay far away from the gcc
> > __sync_*() primitives, for ARM at least.
> 
> The way it was explained to me is that the "blx r8" above branches to a
> code page supplied by the kernel, and that this page contains the memory
> barriers and atomic instructions.  The "x" in "blx" means the branch can
> also switch between the ARM and Thumb instruction sets, selected by bit 0
> of the target address, just to keep things interesting (0xffff0fc0 has
> bit 0 clear, so this call stays in ARM state).
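To make the control flow of __sync_val_compare_and_swap_4 above easier to follow, here is a rough C reconstruction of that disassembly. It is an illustration, not libgcc's actual source. The helper is passed in as a function pointer so the sketch can run on any host, and stand_in_kuser_cmpxchg() is a hypothetical replacement for the kernel code at 0xffff0fc0, assumed (like that code) to take (oldval, newval, ptr) in r0, r1, r2 and to return 0 when it stored the new value and nonzero otherwise.

```c
#include <stdatomic.h>

/* Signature of the kernel cmpxchg helper: r0 = oldval, r1 = newval, r2 = ptr. */
typedef int kuser_cmpxchg_fn(int oldval, int newval, _Atomic int *ptr);

/*
 * Hypothetical stand-in for the code at 0xffff0fc0: returns 0 if *ptr
 * was oldval and is now newval, nonzero otherwise. The seq_cst CAS gives
 * full ordering, roughly like the kernel helper's barriers.
 */
static int stand_in_kuser_cmpxchg(int oldval, int newval, _Atomic int *ptr)
{
	return !atomic_compare_exchange_strong(ptr, &oldval, newval);
}

/* Reconstruction of the loop in __sync_val_compare_and_swap_4. */
static int sketch_val_compare_and_swap(_Atomic int *ptr, int oldval,
				       int newval, kuser_cmpxchg_fn *helper)
{
	int cur;

	do {
		/* ldr r4, [r6]: plain load of the current value */
		cur = atomic_load_explicit(ptr, memory_order_relaxed);
		/* cmp r5, r4; bne: mismatch, return what was seen */
		if (cur != oldval)
			break;
		/* blx r8; cmp r0, #0; bne: helper failed, reload and retry */
	} while (helper(oldval, newval, ptr) != 0);

	return cur;	/* mov r0, r4 */
}
```

The routine returns the value it last loaded, so a caller gets the old value on success and the conflicting value on a mismatch, matching the final "mov r0, r4".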
> 
> The address is 0xffff0fc0, the value of the .word at address 0x9034 above.  Naturally,
> gdb refuses to let me look at this address range.  My attempt to access
> this range from inside the program also fails.  That is perhaps not a
> surprise: this is the kernel's vector page, not an ordinary user mapping.
> Assuming you have a multicore ARMv6
> or later, the kernel supplies the following code at this address (see
> arch/arm/kernel/entry-armv.S):
> 
> 		smp_dmb
> 	1:	ldrex	r3, [r2]
> 		subs	r3, r3, r0
> 		strexeq	r3, r1, [r2]
> 		teqeq	r3, #1
> 		beq	1b
> 		rsbs	r0, r3, #0
> 		/* beware -- each __kuser slot must be 8 instructions max */
> 	#ifdef CONFIG_SMP
> 		b	__kuser_memory_barrier
> 	#else
> 		usr_ret	lr
> 	#endif
> 
> The smp_dmb is an assembler macro (haven't come across one of those for
> like 30 years!!!) that expands to different things depending on the ARM
> architecture level that the kernel is built for.  The ldrex and strexeq
> are ARM's atomic instructions, sort of like larx/stcx on ppc.
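The SMP ARMv6+ kernel sequence above can be modeled with C11 atomics as follows. This is only a model for reasoning about the semantics, under the assumption that a weak compare-exchange is a fair stand-in for the ldrex/strexeq pair (both may fail spuriously). The fences play the roles of smp_dmb and __kuser_memory_barrier, a spurious store failure loops back like "teqeq r3, #1; beq 1b", and a value mismatch falls straight through. model_kuser_cmpxchg() is a hypothetical name.

```c
#include <stdatomic.h>

/*
 * Hypothetical C11 model of the SMP kernel cmpxchg helper.
 * Returns 0 if *ptr held oldval and was set to newval, nonzero otherwise.
 */
static int model_kuser_cmpxchg(int oldval, int newval, _Atomic int *ptr)
{
	int cur;

	atomic_thread_fence(memory_order_seq_cst);	/* smp_dmb */
	for (;;) {
		cur = oldval;
		/* ldrex/subs/strexeq: store only if *ptr == oldval */
		if (atomic_compare_exchange_weak_explicit(ptr, &cur, newval,
							  memory_order_relaxed,
							  memory_order_relaxed))
			break;			/* store succeeded */
		if (cur != oldval)
			break;			/* value mismatch: give up */
		/* spurious failure (strexeq status 1): retry, as "beq 1b" */
	}
	atomic_thread_fence(memory_order_seq_cst);	/* __kuser_memory_barrier */

	/* zero on success, nonzero on mismatch, like "rsbs r0, r3, #0" */
	return cur != oldval;
}
```

Because both the success and mismatch paths end with the trailing barrier, the caller gets full ordering whether or not the store happens.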
> 
> This is admittedly a bit convoluted, but the really nice thing about it
> is that it works on a wider variety of ARM architectures.
> 
> Is this approach OK for you?

Yep, sounds fine :) I wonder if we could document the oldest ARM Linux kernel
we expect this to work on (e.g. nothing earlier than 2.6.15). I guess this
in-kernel cmpxchg has not been there forever.

Thanks,

Mathieu


-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com




More information about the lttng-dev mailing list