diff mbox

[RFC] arm: use built-in byte swap function

Message ID 20130128193033.8a0b0a871150c99247f05a95@freescale.com (mailing list archive)
State New, archived
Headers show

Commit Message

Kim Phillips Jan. 29, 2013, 1:30 a.m. UTC
Enable the compiler intrinsic for byte swapping on arch ARM.  This
allows the compiler to detect and be able to optimize out byte
swappings, e.g. in big endian to big endian moves.

AFAICT, arm gcc got __builtin_bswap{32,64} support in 4.6,
and for the 16-bit version in 4.8.

Signed-off-by: Kim Phillips <kim.phillips@freescale.com>
---
akin to: http://comments.gmane.org/gmane.linux.kernel.cross-arch/16016

based on linux-next.  Depends on commit "compiler-gcc{3,4}.h: Use
GCC_VERSION macro" by Daniel Santos <daniel.santos@pobox.com>,
currently in the akpm branch.

RFC because of unfamiliarity with arch ARM, and that at91sam9rl,
at91rm9200, and lpd270 (so far, at least) builds fail with:

include/uapi/linux/swab.h:60: undefined reference to `__bswapsi2'

I'm using eldk-5.2.1/armv7a's arm-linux-gnueabi-gcc (GCC) 4.6.4
20120303 (prerelease)

 arch/arm/Kconfig              |    1 +
 include/linux/compiler-gcc4.h |    3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

Comments

Borislav Petkov Jan. 29, 2013, 8:35 a.m. UTC | #1
On Mon, Jan 28, 2013 at 07:30:33PM -0600, Kim Phillips wrote:
> Enable the compiler intrinsic for byte swapping on arch ARM.  This
> allows the compiler to detect and be able to optimize out byte
> swappings, e.g. in big endian to big endian moves.
> 
> AFAICT, arm gcc got __builtin_bswap{32,64} support in 4.6,
> and for the 16-bit version in 4.8.
> 
> Signed-off-by: Kim Phillips <kim.phillips@freescale.com>
> ---
> akin to: http://comments.gmane.org/gmane.linux.kernel.cross-arch/16016
> 
> based on linux-next.  Depends on commit "compiler-gcc{3,4}.h: Use
> GCC_VERSION macro" by Daniel Santos <daniel.santos@pobox.com>,
> currently in the akpm branch.
> 
> RFC because of unfamiliarity with arch ARM, and that at91sam9rl,
> at91rm9200, and lpd270 (so far, at least) builds fail with:
> 
> include/uapi/linux/swab.h:60: undefined reference to `__bswapsi2'
> 
> I'm using eldk-5.2.1/armv7a's arm-linux-gnueabi-gcc (GCC) 4.6.4
> 20120303 (prerelease)
> 
>  arch/arm/Kconfig              |    1 +
>  include/linux/compiler-gcc4.h |    3 ++-
>  2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index eda8711..437d11a 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -3,6 +3,7 @@ config ARM
>  	default y
>  	select ARCH_BINFMT_ELF_RANDOMIZE_PIE
>  	select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
> +	select ARCH_USE_BUILTIN_BSWAP
>  	select ARCH_HAVE_CUSTOM_GPIO_H
>  	select ARCH_WANT_IPC_PARSE_VERSION
>  	select BUILDTIME_EXTABLE_SORT if MMU
> diff --git a/include/linux/compiler-gcc4.h b/include/linux/compiler-gcc4.h
> index 68b162d..da5f728 100644
> --- a/include/linux/compiler-gcc4.h
> +++ b/include/linux/compiler-gcc4.h
> @@ -67,7 +67,8 @@
>  
>  
>  #ifdef CONFIG_ARCH_USE_BUILTIN_BSWAP
> -#if GCC_VERSION >= 40400
> +#if (!defined(__arm__) && GCC_VERSION >= 40400) || \
> +    (defined(__arm__) && GCC_VERSION >= 40600)

There should be no arch-specific stuff in a generic header. I guess
you probably need to select ARCH_USE_BUILTIN_BSWAP in an arm-specific
compiler.h header after checking compiler version...

Thanks.
Russell King - ARM Linux Jan. 29, 2013, 2:13 p.m. UTC | #2
On Mon, Jan 28, 2013 at 07:30:33PM -0600, Kim Phillips wrote:
> AFAICT, arm gcc got __builtin_bswap{32,64} support in 4.6,
> and for the 16-bit version in 4.8.

Hmm.

$ /usr/local/aeabi/bin/arm-linux-gcc --version
arm-linux-gcc (GCC) 4.5.4
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ for a in armv3 armv4 armv4t armv5t armv5te armv6k armv6 armv7-a; do \
   echo $a:; \
   for f in 16 32 64; do \
      echo 'unsigned foo(unsigned val) { return __builtin_bswap'$f'(val); }' | \
      /usr/local/aeabi/bin/arm-linux-gcc -x c -S -o - - -march=$a | grep 'bl'; \
   done; \
done

produces:

armv3:
        bl __builtin_bswap16
armv4:
        bl __builtin_bswap16
armv4t:
        bl __builtin_bswap16
armv5t:
        bl __builtin_bswap16
armv5te:
        bl __builtin_bswap16
armv6k:
        bl __builtin_bswap16
armv6:
        bl __builtin_bswap16
armv7-a:
        bl __builtin_bswap16

So, this compiler (4.5.4) has support for 32-bit and 64-bit bswaps
across all our architectures, but not the 16-bit ones.
David Woodhouse Jan. 29, 2013, 2:43 p.m. UTC | #3
On Tue, 2013-01-29 at 14:13 +0000, Russell King - ARM Linux wrote:
> 
> So, this compiler (4.5.4) has support for 32-bit and 64-bit bswaps
> across all our architectures, but not the 16-bit ones.

That observation is consistent with my dig through GCC history. I had
come to the conclusion that the 32-bit and 64-bit versions were added
*generically* in 4.4, and that the 16-bit version was added in 4.6 to
that PowerPC back end, and made generic in 4.8. So I *had* put that
arch-specific check into compiler-gcc4.h, for PowerPC. It's just outside
the context of Kim's patch. If it really does end up being different for
every arch, I may reconsider that.

As for the __bswapsi2() calls... if it's ever emitting an out-of-line
call for something like that, that seems like a really dubious decision;
surely it's better being inlined? So rather than adding it to your
bits-of-libgcc.a in the kernel, I'd suggest just *not* using
ARCH_USE_BUILTIN_BSWAP for the offending compilers, and filing a bug to
get them fixed.

But really, this is why I created ARCH_USE_BUILTIN_BSWAP and left it to
architecture maintainers to enable it at their leisure.... :)
Rob Herring Jan. 29, 2013, 2:53 p.m. UTC | #4
On 01/28/2013 07:30 PM, Kim Phillips wrote:
> Enable the compiler intrinsic for byte swapping on arch ARM.  This
> allows the compiler to detect and be able to optimize out byte
> swappings, e.g. in big endian to big endian moves.
> 
> AFAICT, arm gcc got __builtin_bswap{32,64} support in 4.6,
> and for the 16-bit version in 4.8.

I think these are the versions ARM got optimized swap support. The newer
compilers are smart enough to replace a swap written in C with a rev
instruction. Last time I checked, they did not do rev16, but that is
probably what was added in 4.8 (and backported to Linaro gcc). The
Linaro backports make it impossible to define what gcc version has a
specific feature.

If you specify to use the builtin's, do you still get inline rev
instructions emitted?

Rob

> 
> Signed-off-by: Kim Phillips <kim.phillips@freescale.com>
> ---
> akin to: http://comments.gmane.org/gmane.linux.kernel.cross-arch/16016
> 
> based on linux-next.  Depends on commit "compiler-gcc{3,4}.h: Use
> GCC_VERSION macro" by Daniel Santos <daniel.santos@pobox.com>,
> currently in the akpm branch.
> 
> RFC because of unfamiliarity with arch ARM, and that at91sam9rl,
> at91rm9200, and lpd270 (so far, at least) builds fail with:
> 
> include/uapi/linux/swab.h:60: undefined reference to `__bswapsi2'
> 
> I'm using eldk-5.2.1/armv7a's arm-linux-gnueabi-gcc (GCC) 4.6.4
> 20120303 (prerelease)
> 
>  arch/arm/Kconfig              |    1 +
>  include/linux/compiler-gcc4.h |    3 ++-
>  2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index eda8711..437d11a 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -3,6 +3,7 @@ config ARM
>  	default y
>  	select ARCH_BINFMT_ELF_RANDOMIZE_PIE
>  	select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
> +	select ARCH_USE_BUILTIN_BSWAP
>  	select ARCH_HAVE_CUSTOM_GPIO_H
>  	select ARCH_WANT_IPC_PARSE_VERSION
>  	select BUILDTIME_EXTABLE_SORT if MMU
> diff --git a/include/linux/compiler-gcc4.h b/include/linux/compiler-gcc4.h
> index 68b162d..da5f728 100644
> --- a/include/linux/compiler-gcc4.h
> +++ b/include/linux/compiler-gcc4.h
> @@ -67,7 +67,8 @@
>  
>  
>  #ifdef CONFIG_ARCH_USE_BUILTIN_BSWAP
> -#if GCC_VERSION >= 40400
> +#if (!defined(__arm__) && GCC_VERSION >= 40400) || \
> +    (defined(__arm__) && GCC_VERSION >= 40600)
>  #define __HAVE_BUILTIN_BSWAP32__
>  #define __HAVE_BUILTIN_BSWAP64__
>  #endif
>
David Woodhouse Jan. 29, 2013, 3:10 p.m. UTC | #5
On Tue, 2013-01-29 at 08:53 -0600, Rob Herring wrote:
> If you specify to use the builtin's, do you still get inline rev
> instructions emitted?

You mean from the architecture's __arch_swab32() et al. macros?

No. If the architecture enables ARCH_USE_BUILTIN_BSWAP and the compiler
version checks indicate that the correspondingly-sized builtin is
available, then the builtin will be used. Only if the arch *doesn't* set
ARCH_USE_BUILTIN_BSWAP, or if the compiler is old enough not to have the
corresponding intrinsic, will the inline assembler in the __arch_swabXX
macros get used.

Note, however, that there is no such compiler intrinsic for swahb32(),
which is where rev16 gets used. That's still being left to the inline
asm in all cases.
David Woodhouse Jan. 29, 2013, 4:46 p.m. UTC | #6
On Tue, 2013-01-29 at 09:35 +0100, Borislav Petkov wrote:
> 
> >  
> >  #ifdef CONFIG_ARCH_USE_BUILTIN_BSWAP
> > -#if GCC_VERSION >= 40400
> > +#if (!defined(__arm__) && GCC_VERSION >= 40400) || \
> > +    (defined(__arm__) && GCC_VERSION >= 40600)
> 
> There should be no arch-specific stuff in a generic header. I guess
> you probably need to select ARCH_USE_BUILTIN_BSWAP in an arm-specific
> compiler.h header after checking compiler version...

If we're really going to have many different architectures depending on
different versions of GCC for this (if it wasn't sane to use it from
4.4/4.8 when it got introduced, and depends on some later arch-specific
optimisation), then perhaps we'll have the arch provide the
corresponding required GCC_VERSION for using each of 64/32/16 bit
builtins, instead of just a yes/no flag? Or just define
__HAVE_BUILTIN_BSWAPxx__ for itself, perhaps?

In fact we could start by having just the problematic architectures set
__HAVE_BUILTIN_BSWAPxx__ for themselves according to whatever criteria
they want, and then if there's scope for consolidating that in the
generic code then we can do so later.
Borislav Petkov Jan. 29, 2013, 5:42 p.m. UTC | #7
On Tue, Jan 29, 2013 at 04:46:58PM +0000, Woodhouse, David wrote:
> If we're really going to have many different architectures depending
> on different versions of GCC for this (if it wasn't sane to use
> it from 4.4/4.8 when it got introduced, and depends on some later
> arch-specific optimisation), then perhaps we'll have the arch
> provide the corresponding required GCC_VERSION for using each of
> 64/32/16 bit builtins, instead of just a yes/no flag? Or just define
> __HAVE_BUILTIN_BSWAPxx__ for itself, perhaps?

Damn, there's already the __powerpc__ thing in there.

Yeah, something like defininig __HAVE_BUILTIN_BSWAPxx__ makes sense and
can keep the header arch-agnostic without growing all those different
arch defines.

But I liked your other suggestion better to get the offending compilers
fixed.

I dunno though, how generically is stuff like that getting implemented
for every arch so probably single arches doing __HAVE* defines is
probably going to be the realizable solution in the end.

Hmmm.
David Woodhouse Jan. 29, 2013, 5:55 p.m. UTC | #8
On Tue, 2013-01-29 at 18:42 +0100, Borislav Petkov wrote:
> On Tue, Jan 29, 2013 at 04:46:58PM +0000, Woodhouse, David wrote:
> > If we're really going to have many different architectures depending
> > on different versions of GCC for this (if it wasn't sane to use
> > it from 4.4/4.8 when it got introduced, and depends on some later
> > arch-specific optimisation), then perhaps we'll have the arch
> > provide the corresponding required GCC_VERSION for using each of
> > 64/32/16 bit builtins, instead of just a yes/no flag? Or just define
> > __HAVE_BUILTIN_BSWAPxx__ for itself, perhaps?
> 
> Damn, there's already the __powerpc__ thing in there.

That's different though. That's because GCC didn't have a generic
__builtin_bswap16() until 4.8, while PowerPC got it in 4.6.

That's a relatively simple and manageable one-off arch-dependency. But
once we get into a mass of "well, it wasn't actually *usable* for ARM
until 4.9", we start wanting to push it into arch code.

> But I liked your other suggestion better to get the offending
> compilers fixed.

That wasn't an *alternative*. It's required if the compiler is doing
something suboptimal, whatever happens. And then we start to *use* the
compiler from the first version that's known to be fixed.
Borislav Petkov Jan. 29, 2013, 6:10 p.m. UTC | #9
On Tue, Jan 29, 2013 at 05:55:54PM +0000, Woodhouse, David wrote:
> That's different though. That's because GCC didn't have a generic
> __builtin_bswap16() until 4.8, while PowerPC got it in 4.6.
> 
> That's a relatively simple and manageable one-off arch-dependency. But
> once we get into a mass of "well, it wasn't actually *usable* for ARM
> until 4.9", we start wanting to push it into arch code.

Yeah, and once people see one arch mentioned, all of a sudden they think
it is ok to add just one more ...

> > But I liked your other suggestion better to get the offending
> > compilers fixed.
> 
> That wasn't an *alternative*. It's required if the compiler is doing
> something suboptimal, whatever happens. And then we start to *use* the
> compiler from the first version that's known to be fixed.

So, IMHO it sounds to me like we want to explicitly state for each arch
separately that it is ok to use the __builtin_bswapXX things. This also
takes care of the case where the compiler is doing something suboptimal
by excluding the affected versions.

Thanks.
David Woodhouse Jan. 30, 2013, 10:22 a.m. UTC | #10
On Tue, 2013-01-29 at 19:10 +0100, Borislav Petkov wrote:
> So, IMHO it sounds to me like we want to explicitly state for each arch
> separately that it is ok to use the __builtin_bswapXX things. This also
> takes care of the case where the compiler is doing something suboptimal
> by excluding the affected versions.

Well, if it really does end up being different for every architecture,
then that means I probably made the wrong decision when I chose to make
it "generic", and override the __arch_swabXX() macros. I could have just
pushed all the architectures to use the builtins in their __arch_swabXX
macros instead, as appropriate.

Let's see how many special cases we actually end up with, and perhaps
we'll end up switching that round. For now, let's just make ARM set
__HAVE_BUILTIN_BSWAPxx__ for the appropriate sizes in <asm/swab.h>,
according to whatever criteria it needs.

It's not entirely clear how much of a win it is on ARM anyway; we don't
have load-and-swap or store-and-swap instructions so there are only a
few added opportunities for optimisation that we get by letting the
compiler see what's going on.
diff mbox

Patch

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index eda8711..437d11a 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -3,6 +3,7 @@  config ARM
 	default y
 	select ARCH_BINFMT_ELF_RANDOMIZE_PIE
 	select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
+	select ARCH_USE_BUILTIN_BSWAP
 	select ARCH_HAVE_CUSTOM_GPIO_H
 	select ARCH_WANT_IPC_PARSE_VERSION
 	select BUILDTIME_EXTABLE_SORT if MMU
diff --git a/include/linux/compiler-gcc4.h b/include/linux/compiler-gcc4.h
index 68b162d..da5f728 100644
--- a/include/linux/compiler-gcc4.h
+++ b/include/linux/compiler-gcc4.h
@@ -67,7 +67,8 @@ 
 
 
 #ifdef CONFIG_ARCH_USE_BUILTIN_BSWAP
-#if GCC_VERSION >= 40400
+#if (!defined(__arm__) && GCC_VERSION >= 40400) || \
+    (defined(__arm__) && GCC_VERSION >= 40600)
 #define __HAVE_BUILTIN_BSWAP32__
 #define __HAVE_BUILTIN_BSWAP64__
 #endif