[ltt-dev] [PATCH 3/4] Add native ARM port for armv7l
Paul E. McKenney
paulmck at linux.vnet.ibm.com
Wed Jun 16 20:51:57 EDT 2010
On Wed, Jun 16, 2010 at 07:57:55PM -0400, Mathieu Desnoyers wrote:
> * Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote:
> > On Wed, Jun 16, 2010 at 05:23:15PM -0400, Mathieu Desnoyers wrote:
> > > * Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote:
> > > > Add native support for armv7l. Other variants of ARM will likely require
> > > > separate ports.
> > > >
> > > > Signed-off-by: Paul E. McKenney <paulmck at linux.vnet.ibm.com>
> > > > ---
> > > > configure.ac | 4 +++
> > > > urcu/arch_armv7l.h | 59 ++++++++++++++++++++++++++++++++++++++++++++
> > > > urcu/uatomic_arch_armv7l.h | 48 +++++++++++++++++++++++++++++++++++
> >
> > [ . . . ]
> >
> > > > +#ifdef __cplusplus
> > > > +extern "C" {
> > > > +#endif
> > > > +
> > > > +/* xchg */
> > > > +#define uatomic_xchg(addr, v) __sync_lock_test_and_set(addr, v)
> > > > +
> > > > +/* cmpxchg */
> > > > +#define uatomic_cmpxchg(addr, old, _new) \
> > > > + __sync_val_compare_and_swap(addr, old, _new)
> > > > +
> > > > +/* uatomic_add_return */
> > > > +#define uatomic_add_return(addr, v) __sync_add_and_fetch(addr, v)
> > >
> > > So, do we end up trusting that gcc got the memory barriers right in the ARM
> > > __sync_() primitives ? That sounds unlikely.
> > >
> > > I'd vote for surrounding these primitives with smp_mb().
> >
> > On ARM, my current belief is that the primitives other than
> > __sync_synchronize() and __sync_lock_release() are set up correctly.
> >
> > However, I must defer to Paolo and Uli on this.
>
> There is nothing like a quick test to see the result:
>
> With a arm-linux-cs2009q1-203sb1 scratchbox compiler (gcc 4.3.3, provided by
> Nokia for the Omap3):
>
> arm-none-linux-gnueabi-gcc-4.3.3 (Sourcery G++ Lite 2009q1-203) 4.3.3
>
> I compile, with
>
> /scratchbox/compilers/arm-linux-cs2009q1-203sb1/bin/arm-none-linux-gnueabi-gcc-4.3.3 -mcpu=cortex-a9 -mtune=cortex-a9 -O2 -o armtest armtest.c
>
> the following program:
>
> int a;
>
> int
> f()
> {
> __sync_val_compare_and_swap(&a, 4, 1);
> //__sync_lock_test_and_set(&a, 1);
> //__sync_add_and_fetch(&a, 1);
> //__sync_synchronize();
> }
>
> int main()
> {
> f();
> }
>
> and get:
>
> /scratchbox/compilers/arm-linux-cs2009q1-203sb1/bin/arm-none-linux-gnueabi-objdump -S armtest
>
> [...]
>
>
> 000083cc <f>:
> 83cc: e59f0008 ldr r0, [pc, #8] ; 83dc <f+0x10>
> 83d0: e3a01004 mov r1, #4 ; 0x4
> 83d4: e3a02001 mov r2, #1 ; 0x1
> 83d8: ea000305 b 8ff4 <__sync_val_compare_and_swap_4>
> 83dc: 00011524 .word 0x00011524
>
> [...]
>
> 00008ff4 <__sync_val_compare_and_swap_4>:
> 8ff4: e92d41f0 push {r4, r5, r6, r7, r8, lr}
> 8ff8: e59f8034 ldr r8, [pc, #52] ; 9034 <__sync_val_compare_and_swap_4+0x40>
> 8ffc: e1a06000 mov r6, r0
> 9000: e1a05001 mov r5, r1
> 9004: e1a07002 mov r7, r2
> 9008: e5964000 ldr r4, [r6]
> 900c: e1a00005 mov r0, r5
> 9010: e1550004 cmp r5, r4
> 9014: e1a01007 mov r1, r7
> 9018: e1a02006 mov r2, r6
> 901c: 1a000002 bne 902c <__sync_val_compare_and_swap_4+0x38>
> 9020: e12fff38 blx r8
> 9024: e3500000 cmp r0, #0 ; 0x0
> 9028: 1afffff6 bne 9008 <__sync_val_compare_and_swap_4+0x14>
> 902c: e1a00004 mov r0, r4
> 9030: e8bd81f0 pop {r4, r5, r6, r7, r8, pc}
> 9034: ffff0fc0 .word 0xffff0fc0
>
> Where sadly the appropriate memory barriers are missing, and even the
> appropriate ldrex/teq, strexeq sequence is missing. So not only is this
> incorrect in terms of memory barriers, but also in terms of atomicity. Argh. I
> don't know if a compiler more recent than 4.3.3 would do better though, but I
> start to think that it would be wise to stay far away from gcc __sync_*()
> primitives. For ARM at least.
The way it was explained to me is that the "blx r8" above branches to a
code page supplied by the kernel, and that this page contains the memory
barriers and atomic instructions. The "x" in the "blx" switches from
ARM to Thumb instruction-set format, just to keep things interesting.
The address is 0xffff0fc0, the value at address 0x9035 above. Naturally,
gdb refuses to let me look at this address range. My attempt to access
this range from inside the program also fails. Which is not a surprise,
it is not supposed to be mapped. Assuming you have a multicore ARMv6
or later, the kernel supplies the following code at this address (see
arch/arm/kernel/entry-armv.S):
smp_dmb
1: ldrex r3, [r2]
subs r3, r3, r0
strexeq r3, r1, [r2]
teqeq r3, #1
beq 1b
rsbs r0, r3, #0
/* beware -- each __kuser slot must be 8 instructions max */
#ifdef CONFIG_SMP
b __kuser_memory_barrier
#else
usr_ret lr
#endif
The smp_dmb is an assembler macro (haven't come across one of those for
like 30 years!!!) that expands to different things depending on the ARM
architecture level that the kernel is built for. The ldrex and strexeq
are ARM's atomic instructions, sort of like larx/stcx on ppc.
This is admittedly a bit involuted, but the really nice thing about it
is that it works on a wider variety of ARM architectures.
This approach OK for you?
Thanx, Paul
More information about the lttng-dev
mailing list