[ltt-dev] [PATCH] fix the "unknown" case
Paul E. McKenney
paulmck at linux.vnet.ibm.com
Fri Jun 11 13:16:13 EDT 2010
On Thu, Jun 10, 2010 at 09:46:31PM -0400, Mathieu Desnoyers wrote:
> * Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote:
> > Use a back-to-back pair of pthread_mutex_lock acquisitions and releases
> > to stand in for a memory barrier, given that ARM doesn't always seem to do
> > the right thing with __sync_synchronize().
>
> Hi Paul,
>
> Hrm, that's odd. More comments below,
Turns out that it is due to a mismatch between compiler and system. Still,
useful to test the "unknown" capability.
> > Tested on ARM and on x86.
> > The x86 testing was performed by removing all of the x86-specific
> > tests from configure.ac and then building from scratch.
> >
> > Signed-off-by: Paul E. McKenney <paulmck at linux.vnet.ibm.com>
> > ---
> >
> > configure.ac | 9 +---
> > urcu/arch_unknown.h | 87 ++++++++++++++++++++++++++++++++++++++++++++
> > urcu/uatomic_arch_unknown.h | 57 ++++++++++++++++++++++++++++
> > 3 files changed, 146 insertions(+), 7 deletions(-)
> >
> > diff --git a/configure.ac b/configure.ac
> > index 1b1ca65..9274337 100644
> > --- a/configure.ac
> > +++ b/configure.ac
> > @@ -54,13 +54,8 @@ case $host_cpu in
> > *) ARCHTYPE="unknown";;
> > esac
> >
> > -if test "$ARCHTYPE" != "unknown"; then
> > - UATOMICSRC=urcu/uatomic_arch_$ARCHTYPE.h
> > - ARCHSRC=urcu/arch_$ARCHTYPE.h
> > -else
> > - UATOMICSRC=urcu/uatomic_generic.h
> > - ARCHSRC=urcu/arch_generic.h
> > -fi
> > +UATOMICSRC=urcu/uatomic_arch_$ARCHTYPE.h
> > +ARCHSRC=urcu/arch_$ARCHTYPE.h
> > if test "x$ARCHTYPE" != xx86 -a "x$ARCHTYPE" != xppc; then
> > APISRC=tests/api_gcc.h
> > else
> > diff --git a/urcu/arch_unknown.h b/urcu/arch_unknown.h
> > new file mode 100644
> > index 0000000..3f78310
> > --- /dev/null
> > +++ b/urcu/arch_unknown.h
> > @@ -0,0 +1,87 @@
> > +#ifndef _URCU_ARCH_ARMV7_H
> > +#define _URCU_ARCH_ARMV7_H
>
> The ifndef/define/endif don't seem to match the header name here.
Oops... Will fix.
> > +
> > +/*
> > + * arch_armv7.h: trivial definitions for the ARMv7-A/R architecture.
>
> Same for this comment.
>
> Do we really want to add this "unknown" arch support ? I thought the "generic"
> architecture support was enough for this.
You had code for "unknown" already, it is just that it didn't compile
all the tests. ;-)
> If ARM has special needs, then we should create a arm-specific header and
> somehow detect that we are building for ARM.
I should be able to get a good compiler, and at that point I will make
real ARM targets. But if we are going to have "unknown", it should at
least work reasonably, right? And if we don't know what the CPU is, then
the double lock acquire/release trick is the only reliable way I know of
to get a full memory barrier. I didn't fully trust __sync_synchronize()
to start with, and I suspect that ARMv6 is not the only system that
doesn't implement it to the degree that we need.
> > + *
> > + * Copyright (c) 2010 Paul E. McKenney, IBM Corporation.
> > + * Copyright (c) 2009 Mathieu Desnoyers <mathieu.desnoyers at polymtl.ca>
> > + *
> > + * This library is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU Lesser General Public
> > + * License as published by the Free Software Foundation; either
> > + * version 2.1 of the License, or (at your option) any later version.
> > +*
> > + * This library is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > + * Lesser General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU Lesser General Public
> > + * License along with this library; if not, write to the Free Software
> > + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> > + */
> > +
> > +#include <urcu/compiler.h>
> > +#include <urcu/config.h>
> > +
> > +#ifdef __cplusplus
> > +extern "C" {
> > +#endif
> > +
> > +/* We don't know, so guess!!! */
> > +#define CACHE_LINE_SIZE 64
>
> This seems to be a rather ARM-specific guess. I've seen that 128 bytes is a bit
> safer for x86 L2 caches, or even 256 for some the power5+ L3 cache line size.
Right, but we have specific targets for x86 and Power. That said, I just
picked a random number here, and am happy with 128 or 256 if you would
prefer.
I have no idea what the cache size really is. Hmmm... I should check
/sys.... No cache-related info that I can see.
So again, let me know what number you would like me to use. As long as
it is reasonable power of two. ;-)
> > +
> > +#define mb() \
> > + do { \
> > + pthread_mutex_t __mb_fake_mutex; \
> > + \
> > + pthread_mutex_init(&__mb_fake_mutex, NULL); \
> > + if (pthread_mutex_lock(&__mb_fake_mutex) != 0) { \
> > + perror("pthread_mutex_lock"); \
> > + abort(); \
> > + } \
> > + if (pthread_mutex_unlock(&__mb_fake_mutex) != 0) { \
> > + perror("pthread_mutex_lock"); \
> > + abort(); \
> > + } \
> > + if (pthread_mutex_lock(&__mb_fake_mutex) != 0) { \
> > + perror("pthread_mutex_lock"); \
> > + abort(); \
> > + } \
> > + if (pthread_mutex_unlock(&__mb_fake_mutex) != 0) { \
> > + perror("pthread_mutex_lock"); \
> > + abort(); \
> > + } \
>
> If we are building for ARM, then why can't we use ARM-specific memory barriers
> (in assembly, not the broken ones provided by gcc) ?
Because the assembler chokes on them. But yes, another work item for me
is to get a sane compiler and assembler, and I will then create ARM targets
(e.g., "armv7l" for the machine in question) and these will use the
ARM memory barriers, which I have now added to my memory-ordering paper.
> Compatibility concerns for older ARM versions maybe ? Then how does glibc handle
> that ?
My main motivation for making "unknown" work is so that people can get
at least some use out of userspace-rcu on new hardware even if they don't
understand the memory-barrier instructions on that new hardware. But this
should be a transient state. Longer term, I would hope that new hardware
will be added as a separate config option.
And I still have no idea how to handle the ARM common case of cross
compilation, but will be worrying about that later. ;-)
> > + } while (0)
> > +
> > +/*
> > + * Serialize core instruction execution. Also acts as a compiler barrier.
> > + */
> > +#define sync_core() asm volatile("isb" : : : "memory")
>
> Is "isb" available on all "unknown" architectures ?
Actually, sync_core() is not used. I therefore intend to send a patch
removing sync_core() from all architectures. Seem reasonable?
> > +
> > +#include <stdlib.h>
> > +#include <sys/time.h>
> > +
> > +typedef unsigned long long cycles_t;
> > +
> > +static inline cycles_t get_cycles (void)
> > +{
> > + long long thetime;
>
> I would use unsigned long long here. Or even cycles_t.
Good point, will fix.
> > + struct timeval tv;
> > +
> > + if (gettimeofday(&tv, NULL) != 0) {
> > + perror("gettimeofday");
> > + abort();
>
> return 0 could do. Some distros dig into libraries that call exit() and warn
> about this kind of behavior.
Good point, ditched both the perror() and the abort() in favor of a
"return 0".
> > + }
> > + thetime = ((long long)tv.tv_sec) * 1000000ULL + ((long long)tv.tv_usec);
> > + return (cycles_t)thetime;
> > +}
> > +
> > +#ifdef __cplusplus
> > +}
> > +#endif
> > +
> > +#include <urcu/arch_generic.h>
> > +
> > +#endif /* _URCU_ARCH_ARMV7_H */
> > diff --git a/urcu/uatomic_arch_unknown.h b/urcu/uatomic_arch_unknown.h
> > new file mode 100644
> > index 0000000..d1cf93e
> > --- /dev/null
> > +++ b/urcu/uatomic_arch_unknown.h
> > @@ -0,0 +1,57 @@
> > +#ifndef _URCU_ARCH_UATOMIC_ARMV7_H
> > +#define _URCU_ARCH_UATOMIC_ARMV7_H
> > +
> > +/*
> > + * Copyright (c) 1991-1994 by Xerox Corporation. All rights reserved.
> > + * Copyright (c) 1996-1999 by Silicon Graphics. All rights reserved.
> > + * Copyright (c) 1999-2004 Hewlett-Packard Development Company, L.P.
> > + * Copyright (c) 2009 Mathieu Desnoyers
> > + * Copyright (c) 2010 Paul E. McKenney, IBM Corporation
> > + * (Adapted from uatomic_arch_ppc.h)
> > + *
> > + * THIS MATERIAL IS PROVIDED AS IS, WITH ABSOLUTELY NO WARRANTY EXPRESSED
> > + * OR IMPLIED. ANY USE IS AT YOUR OWN RISK.
> > + *
> > + * Permission is hereby granted to use or copy this program
> > + * for any purpose, provided the above notices are retained on all copies.
> > + * Permission to modify the code and to distribute modified code is granted,
> > + * provided the above notices are retained, and a notice that the code was
> > + * modified is included with the above copyright notice.
> > + *
> > + * Code inspired from libuatomic_ops-1.2, inherited in part from the
> > + * Boehm-Demers-Weiser conservative garbage collector.
> > + */
> > +
> > +#include <urcu/compiler.h>
> > +#include <urcu/system.h>
> > +
> > +#ifdef __cplusplus
> > +extern "C" {
> > +#endif
> > +
> > +#define ILLEGAL_INSTR ".long 0xd00d00" /* @@@FIXME from ppc to ARM. */
>
> Hrm, someone should test that this is indeed an illegal instruction on ARM.
Or better yet, remove it completely. It was only there for the default
cases on the cmpxchg size-based switch statements, which do not exist
in the "unknown" architecture.
> > +
> > +/*
> > + * Using a isync as second barrier for exchange to provide acquire semantic.
> > + * According to uatomic_ops/sysdeps/gcc/powerpc.h, the documentation is "fairly
> > + * explicit that this also has acquire semantics."
>
> This comment does not make sense for ARM.
I was clearly only partially translating from PowerPC to ARM to "unknown".
Fixed.
> > + * Derived from AO_compare_and_swap(), but removed the comparison.
> > + */
> > +
> > +/* xchg */
> > +#define uatomic_xchg(addr, v) __sync_lock_test_and_set(addr, v);
> > +
> > +/* cmpxchg */
> > +#define uatomic_cmpxchg(addr, old, _new) \
> > + __sync_val_compare_and_swap(addr, old, _new)
> > +
> > +/* uatomic_add_return */
> > +#define uatomic_add_return(addr, v) __sync_add_and_fetch(addr, v)
>
> Can we trust __sync_lock_test_and_set/__sync_add_and_fetch given that
> __sync_synchronize is broken ?
I don't know yet. If it turns out that we cannot, then I will use some
form of global locking. But the __sync_lock_test_and_set() do at least
generate instructions, unlike __sync_synchronize(). ;-)
I will send two patches, one that incorporates your suggestions, and
another that removes sync_core().
Thanx, Paul
> Thanks,
>
> Mathieu
>
> > +
> > +#ifdef __cplusplus
> > +}
> > +#endif
> > +
> > +#include <urcu/uatomic_generic.h>
> > +
> > +#endif /* _URCU_ARCH_UATOMIC_ARMV7_H */
>
> --
> Mathieu Desnoyers
> Operating System Efficiency R&D Consultant
> EfficiOS Inc.
> http://www.efficios.com
More information about the lttng-dev
mailing list