[ltt-dev] [PATCH 3/4] Add native ARM port for armv7l
Paul E. McKenney
paulmck at linux.vnet.ibm.com
Thu Jun 17 00:43:52 EDT 2010
On Wed, Jun 16, 2010 at 09:37:02PM -0400, Mathieu Desnoyers wrote:
> * Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote:
> > On Wed, Jun 16, 2010 at 07:57:55PM -0400, Mathieu Desnoyers wrote:
> > > * Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote:
> > > > On Wed, Jun 16, 2010 at 05:23:15PM -0400, Mathieu Desnoyers wrote:
> > > > > * Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote:
> > > > > > Add native support for armv7l. Other variants of ARM will likely require
> > > > > > separate ports.
> > > > > >
> > > > > > Signed-off-by: Paul E. McKenney <paulmck at linux.vnet.ibm.com>
> > > > > > ---
> > > > > > configure.ac | 4 +++
> > > > > > urcu/arch_armv7l.h | 59 ++++++++++++++++++++++++++++++++++++++++++++
> > > > > > urcu/uatomic_arch_armv7l.h | 48 +++++++++++++++++++++++++++++++++++
> > > >
> > > > [ . . . ]
> > > >
> > > > > > +#ifdef __cplusplus
> > > > > > +extern "C" {
> > > > > > +#endif
> > > > > > +
> > > > > > +/* xchg */
> > > > > > +#define uatomic_xchg(addr, v) __sync_lock_test_and_set(addr, v)
> > > > > > +
> > > > > > +/* cmpxchg */
> > > > > > +#define uatomic_cmpxchg(addr, old, _new) \
> > > > > > + __sync_val_compare_and_swap(addr, old, _new)
> > > > > > +
> > > > > > +/* uatomic_add_return */
> > > > > > +#define uatomic_add_return(addr, v) __sync_add_and_fetch(addr, v)
> > > > >
> > > > > So, do we end up trusting that gcc got the memory barriers right in the ARM
> > > > > __sync_() primitives ? That sounds unlikely.
> > > > >
> > > > > I'd vote for surrounding these primitives with smp_mb().
> > > >
> > > > On ARM, my current belief is that the primitives other than
> > > > __sync_synchronize() and __sync_lock_release() are set up correctly.
> > > >
> > > > However, I must defer to Paolo and Uli on this.
> > >
> > > There is nothing like a quick test to see the result:
> > >
> > > With an arm-linux-cs2009q1-203sb1 scratchbox compiler (gcc 4.3.3, provided by
> > > Nokia for the Omap3):
> > >
> > > arm-none-linux-gnueabi-gcc-4.3.3 (Sourcery G++ Lite 2009q1-203) 4.3.3
> > >
> > > I compile, with
> > >
> > > /scratchbox/compilers/arm-linux-cs2009q1-203sb1/bin/arm-none-linux-gnueabi-gcc-4.3.3 -mcpu=cortex-a9 -mtune=cortex-a9 -O2 -o armtest armtest.c
> > >
> > > the following program:
> > >
> > > int a;
> > >
> > > int
> > > f()
> > > {
> > > 	__sync_val_compare_and_swap(&a, 4, 1);
> > > 	//__sync_lock_test_and_set(&a, 1);
> > > 	//__sync_add_and_fetch(&a, 1);
> > > 	//__sync_synchronize();
> > > }
> > >
> > > int main()
> > > {
> > > 	f();
> > > }
> > >
> > > and get:
> > >
> > > /scratchbox/compilers/arm-linux-cs2009q1-203sb1/bin/arm-none-linux-gnueabi-objdump -S armtest
> > >
> > > [...]
> > >
> > >
> > > 000083cc <f>:
> > > 83cc: e59f0008 ldr r0, [pc, #8] ; 83dc <f+0x10>
> > > 83d0: e3a01004 mov r1, #4 ; 0x4
> > > 83d4: e3a02001 mov r2, #1 ; 0x1
> > > 83d8: ea000305 b 8ff4 <__sync_val_compare_and_swap_4>
> > > 83dc: 00011524 .word 0x00011524
> > >
> > > [...]
> > >
> > > 00008ff4 <__sync_val_compare_and_swap_4>:
> > > 8ff4: e92d41f0 push {r4, r5, r6, r7, r8, lr}
> > > 8ff8: e59f8034 ldr r8, [pc, #52] ; 9034 <__sync_val_compare_and_swap_4+0x40>
> > > 8ffc: e1a06000 mov r6, r0
> > > 9000: e1a05001 mov r5, r1
> > > 9004: e1a07002 mov r7, r2
> > > 9008: e5964000 ldr r4, [r6]
> > > 900c: e1a00005 mov r0, r5
> > > 9010: e1550004 cmp r5, r4
> > > 9014: e1a01007 mov r1, r7
> > > 9018: e1a02006 mov r2, r6
> > > 901c: 1a000002 bne 902c <__sync_val_compare_and_swap_4+0x38>
> > > 9020: e12fff38 blx r8
> > > 9024: e3500000 cmp r0, #0 ; 0x0
> > > 9028: 1afffff6 bne 9008 <__sync_val_compare_and_swap_4+0x14>
> > > 902c: e1a00004 mov r0, r4
> > > 9030: e8bd81f0 pop {r4, r5, r6, r7, r8, pc}
> > > 9034: ffff0fc0 .word 0xffff0fc0
> > >
> > > Where sadly the appropriate memory barriers are missing, and even the
> > > appropriate ldrex/teq, strexeq sequence is missing. So not only is this
> > > incorrect in terms of memory barriers, but also in terms of atomicity. Argh. I
> > > don't know if a compiler more recent than 4.3.3 would do better though, but I
> > > start to think that it would be wise to stay far away from gcc __sync_*()
> > > primitives. For ARM at least.
> >
> > The way it was explained to me is that the "blx r8" above branches to a
> > code page supplied by the kernel, and that this page contains the memory
> > barriers and atomic instructions. The "x" in the "blx" allows switching
> > between ARM and Thumb instruction-set formats (selected by the low bit of
> > the target address), just to keep things interesting.
> >
> > The address is 0xffff0fc0, the value at address 0x9034 above. Naturally,
> > gdb refuses to let me look at this address range. My attempt to access
> > this range from inside the program also fails. Which is not a surprise,
> > it is not supposed to be mapped. Assuming you have a multicore ARMv6
> > or later, the kernel supplies the following code at this address (see
> > arch/arm/kernel/entry-armv.S):
> >
> > 	smp_dmb
> > 1:	ldrex	r3, [r2]
> > 	subs	r3, r3, r0
> > 	strexeq	r3, r1, [r2]
> > 	teqeq	r3, #1
> > 	beq	1b
> > 	rsbs	r0, r3, #0
> > 	/* beware -- each __kuser slot must be 8 instructions max */
> > #ifdef CONFIG_SMP
> > 	b	__kuser_memory_barrier
> > #else
> > 	usr_ret	lr
> > #endif
> >
> > The smp_dmb is an assembler macro (haven't come across one of those for
> > like 30 years!!!) that expands to different things depending on the ARM
> > architecture level that the kernel is built for. The ldrex and strexeq
> > are ARM's atomic instructions, sort of like larx/stcx on ppc.
> >
> > This is admittedly a bit involuted, but the really nice thing about it
> > is that it works on a wider variety of ARM architectures.
> >
> > This approach OK for you?
>
> Yep, sounds fine :) I wonder if we could document how far back we expect ARM
> Linux kernels to work (e.g. not earlier than 2.6.15). I guess this in-kernel
> cmpxchg has not been there forever.
OK, updating comments and commit message and reposting.
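
For reference, here is a rough C sketch of what the __sync_val_compare_and_swap_4 loop in the disassembly quoted above appears to do. It is a model for discussion, not the libgcc source. The names model_kuser_cmpxchg, cmpxchg_helper, model_val_compare_and_swap_4 and model_demo are made up for this sketch. The stand-in helper uses a gcc builtin so that the code runs on any host; on ARM Linux the pointer would instead refer to the kernel-supplied code at 0xffff0fc0, which is assumed here to return zero on a successful swap and nonzero otherwise.

```c
#include <assert.h>

/* Helper calling convention as seen in the listing: r0 = oldval, r1 = newval, r2 = ptr. */
typedef int (*kuser_cmpxchg_t)(int oldval, int newval, volatile int *ptr);

/*
 * Hypothetical stand-in for the kernel-supplied helper. It returns 0 if
 * *ptr equaled oldval and was replaced by newval, nonzero otherwise.
 */
static int model_kuser_cmpxchg(int oldval, int newval, volatile int *ptr)
{
	return !__sync_bool_compare_and_swap(ptr, oldval, newval);
}

/* On ARM Linux this would point at 0xffff0fc0 instead (not done in this sketch). */
static kuser_cmpxchg_t cmpxchg_helper = model_kuser_cmpxchg;

/* C rendering of the retry loop at 9008..902c in the listing. */
static int model_val_compare_and_swap_4(volatile int *ptr, int oldval, int newval)
{
	int actual;

	for (;;) {
		/* ldr r4, [r6] */
		actual = *ptr;
		/* cmp r5, r4 ; bne 902c: no match, return the value seen */
		if (actual != oldval)
			return actual;
		/* blx r8 ; cmp r0, #0: zero means the swap happened */
		if (cmpxchg_helper(oldval, newval, ptr) == 0)
			return actual;
		/* bne 9008: the value changed under us, reload and retry */
	}
}

/* Small self-check: one swap that succeeds, then one that fails. */
static int model_demo(void)
{
	volatile int a = 4;
	int seen = model_val_compare_and_swap_4(&a, 4, 1);

	assert(seen == 4 && a == 1);
	/* The second attempt fails because a is no longer 4. */
	seen = model_val_compare_and_swap_4(&a, 4, 7);
	assert(seen == 1 && a == 1);
	return a;
}
```

The point of the sketch is that the plain load in the loop is only a quick pre-check. Atomicity and ordering come entirely from the helper behind the function pointer, which is why nothing of the sort shows up in the compiler-generated code.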
Thanx, Paul