diff mbox

__HAVE_ARCH_CMPXCHG on ARM?

Message ID 20150129185618.GA31459@nathan3500-linux-VM (mailing list archive)
State New, archived
Headers show

Commit Message

Nathan Sullivan Jan. 29, 2015, 6:56 p.m. UTC
While investigating a performance issue on RT 3.14, I noticed that
__HAVE_ARCH_CMPXCHG is not defined for ARM.  This causes -rt locks to
always use the slow path and impacts performance.  ARMv6 and above
have a cmpxchg implementation already that will work just fine.

In 2012, Yong Zhang submitted a patch to fix this[1], but it mysteriously
disappeared afterwards.  I did notice that the s-o-b line has a different
email address than the windriver one.  Yong, do you mind re-submitting the
patch with a fixed s-o-b?

Thanks!
   Nathan Sullivan

[1]: http://lists.infradead.org/pipermail/linux-arm-kernel/2012-May/102122.html (duplicated below)

-- 8< --
From: Yong Zhang <yong.zhang at windriver.com>

Both pi_stress and sigwaittest in rt-test show performance gain with
__HAVE_ARCH_CMPXCHG. Testing result on coretile_express_a9x4:

pi_stress -p 99 --duration=300 (on linux-3.4-rc5; bigger is better)
  vanilla:     Total inversion performed: 5493381
  patched:     Total inversion performed: 5621746

sigwaittest -p 99 -l 100000 (on linux-3.4-rc5-rt6; less is better)
  3.4-rc5-rt6: Min   24, Cur   27, Avg   30, Max   98
  patched:     Min   19, Cur   21, Avg   23, Max   96

Signed-off-by: Yong Zhang <yong.zhang0 at gmail.com>
Cc: Russell King <rmk+kernel at arm.linux.org.uk>
Cc: Nicolas Pitre <nico at linaro.org>
Cc: Will Deacon <will.deacon at arm.com>
Cc: Catalin Marinas <catalin.marinas at arm.com>
Cc: Thomas Gleixner <tglx at linutronix.de>
Cc: linux-arm-kernel at lists.infradead.org
---
 arch/arm/include/asm/cmpxchg.h |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

Comments

Will Deacon Feb. 2, 2015, 1:39 p.m. UTC | #1
On Fri, Jan 30, 2015 at 08:44:48AM +0000, Yong Zhang wrote:
> On Fri, Jan 30, 2015 at 2:56 AM, Nathan Sullivan <nathan.sullivan@ni.com<mailto:nathan.sullivan@ni.com>> wrote:
> 
> While investigating a performance issue on RT 3.14, I noticed that
> __HAVE_ARCH_CMPXCHG is not defined for ARM.  This causes -rt locks to
> always use the slow path and impacts performance.  ARMv6 and above
> have a cmpxchg implementation already that will work just fine.
> 
> In 2012, Yong Zhang submitted a patch to fix this[1], but it mysteriously
> disappeared afterwards.  I did notice that the s-o-b line has a different
> email address than the windriver one.  Yong, do you mind re-submitting the
> 
> It actually doesn't matter. Anyway feel free to add another s-o-b to
> the patch:
> 
> Signed-off-by: Yong Zhang <yong.zhang@windriver.com<mailto:yong.zhang@windriver.com>>

[...]

> diff --git a/arch/arm/include/asm/cmpxchg.h b/arch/arm/include/asm/cmpxchg.h
> index 7eb18c1..a91b44e 100644
> --- a/arch/arm/include/asm/cmpxchg.h
> +++ b/arch/arm/include/asm/cmpxchg.h
> @@ -127,6 +127,8 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
> 
>  #else  /* min ARCH >= ARMv6 */
> 
> +#define __HAVE_ARCH_CMPXCHG 1

Is it even possible to have CONFIG_SMP=y on an architecture that doesn't
support native cmpxchg? Certainly, the asm-generic cmpxchg.h barfs if
SMP, so couldn't we just change/extend the __HAVE_ARCH_CMPXCHG check in
rtmutex.c to work automatically with SMP architectures? That would help
arm64 too.

Will
Nathan Sullivan Feb. 3, 2015, 4:48 p.m. UTC | #2
On Mon, Feb 02, 2015 at 01:39:33PM +0000, Will Deacon wrote:
> On Fri, Jan 30, 2015 at 08:44:48AM +0000, Yong Zhang wrote:
> > On Fri, Jan 30, 2015 at 2:56 AM, Nathan Sullivan <nathan.sullivan@ni.com<mailto:nathan.sullivan@ni.com>> wrote:
> > 
> > While investigating a performance issue on RT 3.14, I noticed that
> > __HAVE_ARCH_CMPXCHG is not defined for ARM.  This causes -rt locks to
> > always use the slow path and impacts performance.  ARMv6 and above
> > have a cmpxchg implementation already that will work just fine.
> > 
> > In 2012, Yong Zhang submitted a patch to fix this[1], but it mysteriously
> > disappeared afterwards.  I did notice that the s-o-b line has a different
> > email address than the windriver one.  Yong, do you mind re-submitting the
> > 
> > It actually doesn't matter. Anyway feel free to add another s-o-b to
> > the patch:
> > 
> > Signed-off-by: Yong Zhang <yong.zhang@windriver.com<mailto:yong.zhang@windriver.com>>
> 
> [...]
> 
> > diff --git a/arch/arm/include/asm/cmpxchg.h b/arch/arm/include/asm/cmpxchg.h
> > index 7eb18c1..a91b44e 100644
> > --- a/arch/arm/include/asm/cmpxchg.h
> > +++ b/arch/arm/include/asm/cmpxchg.h
> > @@ -127,6 +127,8 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
> > 
> >  #else  /* min ARCH >= ARMv6 */
> > 
> > +#define __HAVE_ARCH_CMPXCHG 1
> 
> Is it even possible to have CONFIG_SMP=y on an architecture that doesn't
> support native cmpxchg? Certainly, the asm-generic cmpxchg.h barfs if
> SMP, so couldn't we just change/extend the __HAVE_ARCH_CMPXCHG check in
> rtmutex.c to work automatically with SMP architectures? That would help
> arm64 too.
> 
> Will

What about single Cortex-A8s, for example?  Seems like a non-SMP build should still
have cmpxchg.

Nathan
Will Deacon Feb. 3, 2015, 4:57 p.m. UTC | #3
On Tue, Feb 03, 2015 at 04:48:48PM +0000, Nathan Sullivan wrote:
> On Mon, Feb 02, 2015 at 01:39:33PM +0000, Will Deacon wrote:
> > On Fri, Jan 30, 2015 at 08:44:48AM +0000, Yong Zhang wrote:
> > > On Fri, Jan 30, 2015 at 2:56 AM, Nathan Sullivan <nathan.sullivan@ni.com<mailto:nathan.sullivan@ni.com>> wrote:
> > > 
> > > While investigating a performance issue on RT 3.14, I noticed that
> > > __HAVE_ARCH_CMPXCHG is not defined for ARM.  This causes -rt locks to
> > > always use the slow path and impacts performance.  ARMv6 and above
> > > have a cmpxchg implementation already that will work just fine.
> > > 
> > > In 2012, Yong Zhang submitted a patch to fix this[1], but it mysteriously
> > > disappeared afterwards.  I did notice that the s-o-b line has a different
> > > email address than the windriver one.  Yong, do you mind re-submitting the
> > > 
> > > It actually doesn't matter. Anyway feel free to add another s-o-b to
> > > the patch:
> > > 
> > > Signed-off-by: Yong Zhang <yong.zhang@windriver.com<mailto:yong.zhang@windriver.com>>
> > 
> > [...]
> > 
> > > diff --git a/arch/arm/include/asm/cmpxchg.h b/arch/arm/include/asm/cmpxchg.h
> > > index 7eb18c1..a91b44e 100644
> > > --- a/arch/arm/include/asm/cmpxchg.h
> > > +++ b/arch/arm/include/asm/cmpxchg.h
> > > @@ -127,6 +127,8 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
> > > 
> > >  #else  /* min ARCH >= ARMv6 */
> > > 
> > > +#define __HAVE_ARCH_CMPXCHG 1
> > 
> > Is it even possible to have CONFIG_SMP=y on an architecture that doesn't
> > support native cmpxchg? Certainly, the asm-generic cmpxchg.h barfs if
> > SMP, so couldn't we just change/extend the __HAVE_ARCH_CMPXCHG check in
> > rtmutex.c to work automatically with SMP architectures? That would help
> > arm64 too.
> > 
> > Will
> 
> What about single Cortex-A8s, for example?  Seems like a non-SMP build
> should still have cmpxchg.

If you have benchmarks that show an improvement then sure, but I don't
know how much an uncontended rtmutex would really care.

Will
Josh Cartwright Feb. 3, 2015, 7:27 p.m. UTC | #4
On Mon, Feb 02, 2015 at 01:39:33PM +0000, Will Deacon wrote:
> On Fri, Jan 30, 2015 at 08:44:48AM +0000, Yong Zhang wrote:
> > On Fri, Jan 30, 2015 at 2:56 AM, Nathan Sullivan <nathan.sullivan@ni.com<mailto:nathan.sullivan@ni.com>> wrote:
> > 
> > While investigating a performance issue on RT 3.14, I noticed that
> > __HAVE_ARCH_CMPXCHG is not defined for ARM.  This causes -rt locks to
> > always use the slow path and impacts performance.  ARMv6 and above
> > have a cmpxchg implementation already that will work just fine.
> > 
> > In 2012, Yong Zhang submitted a patch to fix this[1], but it mysteriously
> > disappeared afterwards.  I did notice that the s-o-b line has a different
> > email address than the windriver one.  Yong, do you mind re-submitting the
> > 
> > It actually doesn't matter. Anyway feel free to add another s-o-b to
> > the patch:
> > 
> > Signed-off-by: Yong Zhang <yong.zhang@windriver.com<mailto:yong.zhang@windriver.com>>

I did a quick scan of architectures and looked at the various cmpxchg
implementations.  Assuming that __HAVE_ARCH_CMPXCHG really means "this
architecture implements a native cmpxchg which is faster than the
generic implementation", I put what I found in the following table; note
that I'm not 100% confident these are correct, as in some instances I
had to make a judgement call as to whether the implementation is "fast",
hopefully it's still useful :).

                 native         native
                cmpxchg (UP)   cmpxchg (SMP)    __HAVE_ARCH_CMPXCHG
          alpha     X               X                      X
            arc     X[1]            X[1]
arm (pre ARMv6)                    [2]
   arm (ARMv6+)     X               X
          arm64     X               X
          avr32     X               X                      X
       blackfin                     X
            c6x                    [2]
           cris                    [3]
            frv     X               X
        hexagon     X               X                      X
           ia64     X               X                      X
           m32r     X               X                      X
           m68k     X[4]            X[4]                   X[4]
          metag     X               X                      X
     microblaze                    [2]
           mips     X               X
        mn10300                     X[5]
          nios2                    [2]
       openrisc                    [6]
         parisc     X               X                      X
        powerpc     X               X                      X
           s390     X               X                      X
          score    [7]             [7]                     X[7]
             sh     X               X                      X
          sparc     X               X                      X
           tile     X               X
      unicore32                    [2]
            x86     X               X                      X
         xtensa     X               X

1: Conditional on CONFIG_ARC_HAS_LLSC
2: SMP not a supported configuration
3: No cmpxchg() implementation at all?
4: Conditional on CONFIG_RMW_INSNS
5: Conditional on CONFIG_MN10300_HAS_ATOMIC_OPS_UNIT
6: SMP not supported?  Uses generic cmpxchg.
7: Should use generic cmpxchg/cmpxchg_local?  Setting __HAVE_ARCH_CMPXCHG a bug?

> > +#define __HAVE_ARCH_CMPXCHG 1
> 
> Is it even possible to have CONFIG_SMP=y on an architecture that doesn't
> support native cmpxchg? Certainly, the asm-generic cmpxchg.h barfs if
> SMP, so couldn't we just change/extend the __HAVE_ARCH_CMPXCHG check in
> rtmutex.c to work automatically with SMP architectures? That would help
> arm64 too.

Looks like there are a few arch-specific conditionals that might need to
be worked out for arc, m68k, and mn10300; but this could potentially
work.

If we don't go down that route, we should probably at least add
__HAVE_ARCH_CMPXCHG for arm64, blackfin (on SMP), frv, mips, tile, and
xtensa as well.

  Josh
Sebastian Andrzej Siewior Feb. 18, 2015, 10:43 a.m. UTC | #5
* Will Deacon | 2015-02-02 13:39:33 [+0000]:

>> diff --git a/arch/arm/include/asm/cmpxchg.h b/arch/arm/include/asm/cmpxchg.h
>> index 7eb18c1..a91b44e 100644
>> --- a/arch/arm/include/asm/cmpxchg.h
>> +++ b/arch/arm/include/asm/cmpxchg.h
>> @@ -127,6 +127,8 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
>> 
>>  #else  /* min ARCH >= ARMv6 */
>> 
>> +#define __HAVE_ARCH_CMPXCHG 1
>
>Is it even possible to have CONFIG_SMP=y on an architecture that doesn't
>support native cmpxchg? Certainly, the asm-generic cmpxchg.h barfs if
>SMP, so couldn't we just change/extend the __HAVE_ARCH_CMPXCHG check in
>rtmutex.c to work automatically with SMP architectures? That would help
>arm64 too.

RT mutex was first introduced in 23f78d4a0 ("[PATCH] pi-futex: rt mutex core")
which is v2.6.18. The generic cmpxchg was introduced laster in
068fbad288 ("Add cmpxchg_local to asm-generic for per cpu atomic
operations") which is v2.6.25.
Back then something was required to get RT mutex working with the fast
path on architectures without cmpxchg and this seems to be the result.
Please correct me if I'm wrong.

On the other hand, how do you implement a SMP systems without native cmpxchg()
support? Isn't this a prerequisite?

Right now I don't see a reason why we should not remove
__HAVE_ARCH_CMPXCHG and assume cmpxchg() is always there.

>Will

Sebastian
Sebastian Andrzej Siewior Feb. 18, 2015, 10:44 a.m. UTC | #6
* Josh Cartwright | 2015-02-03 13:27:30 [-0600]:

>                 native         native
>                cmpxchg (UP)   cmpxchg (SMP)    __HAVE_ARCH_CMPXCHG
>         xtensa     X               X

This one does not always have cmpxchg on UP (only the SMP variant which
provides atomic operation).

>  Josh

Sebastian
Josh Cartwright Feb. 18, 2015, 4:34 p.m. UTC | #7
On Wed, Feb 18, 2015 at 11:44:59AM +0100, Sebastian Andrzej Siewior wrote:
> * Josh Cartwright | 2015-02-03 13:27:30 [-0600]:
> 
> >                 native         native
> >                cmpxchg (UP)   cmpxchg (SMP)    __HAVE_ARCH_CMPXCHG
> >         xtensa     X               X
> 
> This one does not always have cmpxchg on UP (only the SMP variant which
> provides atomic operation).

Ah!  Thanks for double checking my work; my eyes started to glaze over
after looking at the other 29 architectures.

  Josh
diff mbox

Patch

diff --git a/arch/arm/include/asm/cmpxchg.h b/arch/arm/include/asm/cmpxchg.h
index 7eb18c1..a91b44e 100644
--- a/arch/arm/include/asm/cmpxchg.h
+++ b/arch/arm/include/asm/cmpxchg.h
@@ -127,6 +127,8 @@  static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
 
 #else	/* min ARCH >= ARMv6 */
 
+#define __HAVE_ARCH_CMPXCHG 1
+
 extern void __bad_cmpxchg(volatile void *ptr, int size);
 
 /*