From patchwork Wed Feb 6 03:04:36 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kim Phillips
X-Patchwork-Id: 2101491
Return-Path:
X-Original-To: patchwork-linux-arm@patchwork.kernel.org
Delivered-To: patchwork-process-083081@patchwork1.kernel.org
Received: from merlin.infradead.org (merlin.infradead.org [205.233.59.134])
	by patchwork1.kernel.org (Postfix) with ESMTP id 98E183FC23
	for ; Wed, 6 Feb 2013 03:10:35 +0000 (UTC)
Received: from localhost ([::1] helo=merlin.infradead.org)
	by merlin.infradead.org with esmtp (Exim 4.76 #1 (Red Hat Linux))
	id 1U2vM8-0003Hx-N1; Wed, 06 Feb 2013 03:07:28 +0000
Received: from co9ehsobe004.messaging.microsoft.com ([207.46.163.27]
	helo=co9outboundpool.messaging.microsoft.com)
	by merlin.infradead.org with esmtps (Exim 4.76 #1 (Red Hat Linux))
	id 1U2vM5-0003HF-JO
	for linux-arm-kernel@lists.infradead.org; Wed, 06 Feb 2013 03:07:26 +0000
Received: from mail206-co9-R.bigfish.com (10.236.132.236) by
	CO9EHSOBE033.bigfish.com (10.236.130.96) with Microsoft SMTP Server id
	14.1.225.23; Wed, 6 Feb 2013 03:07:23 +0000
Received: from mail206-co9 (localhost [127.0.0.1])
	by mail206-co9-R.bigfish.com (Postfix) with ESMTP id EC054C01B4;
	Wed, 6 Feb 2013 03:07:22 +0000 (UTC)
X-Forefront-Antispam-Report: CIP:70.37.183.190; KIP:(null); UIP:(null);
	IPV:NLI; H:mail.freescale.net; RD:none; EFVD:NLI
X-SpamScore: -4
X-BigFish: VS-4(zz98dI936eI1432I4015Izz1ee6h1de0h1202h1e76h1d1ah1d2ahzz17326ah8275bh8275dhz2dh2a8h668h839h944hd24he5bhf0ah1220h1288h12a5h12a9h12bdh137ah139eh13b6h1441h1504h1537h162dh1631h1758h1898h18e1h1946h19b5h1155h)
Received: from mail206-co9 (localhost.localdomain [127.0.0.1]) by mail206-co9
	(MessageSwitch) id 1360120041191542_20688;
	Wed, 6 Feb 2013 03:07:21 +0000 (UTC)
Received: from CO9EHSMHS019.bigfish.com (unknown [10.236.132.245])
	by mail206-co9.bigfish.com (Postfix) with ESMTP id 21AEE9800B9;
	Wed, 6 Feb 2013 03:07:21 +0000 (UTC)
Received: from mail.freescale.net (70.37.183.190) by CO9EHSMHS019.bigfish.com
	(10.236.130.29) with Microsoft SMTP Server (TLS) id 14.1.225.23;
	Wed, 6 Feb 2013 03:07:20 +0000
Received: from az84smr01.freescale.net (10.64.34.197) by
	039-SN1MMR1-001.039d.mgd.msft.net (10.84.1.13) with Microsoft SMTP Server
	(TLS) id 14.2.318.3; Wed, 6 Feb 2013 03:07:19 +0000
Received: from x9.am.freescale.net (x9.am.freescale.net [10.82.120.9])
	by az84smr01.freescale.net (8.14.3/8.14.0) with SMTP id r1637CoM012638;
	Tue, 5 Feb 2013 20:07:12 -0700
Date: Tue, 5 Feb 2013 21:04:36 -0600
From: Kim Phillips
To: "Woodhouse, David"
Subject: Re: [RFC] arm: use built-in byte swap function
Message-ID: <20130205210436.670c62e26d2121330e87af35@freescale.com>
In-Reply-To: <1359703995.23531.6.camel@shinybook.infradead.org>
References: <20130128193033.8a0b0a871150c99247f05a95@freescale.com>
	<20130129083522.GA14302@pd.tnic>
	<1359478014.3529.157.camel@shinybook.infradead.org>
	<20130129174249.GB25415@pd.tnic>
	<1359482147.3529.161.camel@shinybook.infradead.org>
	<20130129181046.GC25415@pd.tnic>
	<1359541333.3529.186.camel@shinybook.infradead.org>
	<20130130200900.9d7cf7908caeaef4ecee1d61@freescale.com>
	<20130131092801.GV23505@n2100.arm.linux.org.uk>
	<20130131145947.f62474a0600848df86548b96@freescale.com>
	<20130201011712.GF23505@n2100.arm.linux.org.uk>
	<1359703995.23531.6.camel@shinybook.infradead.org>
Organization: Freescale Semiconductor, Inc.
X-Mailer: Sylpheed 3.2.0 (GTK+ 2.24.13; x86_64-pc-linux-gnu)
MIME-Version: 1.0
X-OriginatorOrg: freescale.com
X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3
X-CRM114-CacheID: sfid-20130205_220725_846012_B88379FF
X-CRM114-Status: GOOD ( 30.12 )
X-Spam-Score: -4.2 (----)
X-Spam-Report: SpamAssassin version 3.3.2 on merlin.infradead.org summary:
	Content analysis details: (-4.2 points)
	pts rule name              description
	---- ---------------------- --------------------------------------------------
	-2.3 RCVD_IN_DNSWL_MED      RBL: Sender listed at http://www.dnswl.org/, medium
	                            trust [207.46.163.27 listed in list.dnswl.org]
	-1.9 BAYES_00               BODY: Bayes spam probability is 0 to 1%
	                            [score: 0.0000]
Cc: Russell King - ARM Linux, Rusty Russell,
	"linux-kernel@vger.kernel.org", Daniel Santos, Borislav Petkov,
	David Rientjes, Andrew Morton,
	"linux-arm-kernel@lists.infradead.org"
X-BeenThere: linux-arm-kernel@lists.infradead.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id:
List-Unsubscribe:
List-Archive:
List-Post:
List-Help:
List-Subscribe:
Sender: linux-arm-kernel-bounces@lists.infradead.org
Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org

On Fri, 1 Feb 2013 07:33:17 +0000
"Woodhouse, David" wrote:

> On Fri, 2013-02-01 at 01:17 +0000, Russell King - ARM Linux wrote:
> >
> > > I've tried both gcc 4.6.3 [1] and 4.6.4 [2].  If you can point me to
> > > a 4.5.x, I'll try that, too, but as it stands now, if one moves the
> > > code added to swab.h below outside of its armv6 protection,
> > > gcc adds calls to __bswapsi2.
> >
> > Take a look at the message I sent on the 29th towards the beginning of
> > this thread for details of gcc 4.5.4 behaviour.
>
> I'd like to see a comment (with PR# if appropriate) explaining clearly
> *why* it isn't enabled for
>
> Russell's test also seemed to indicate that the 32-bit and 64-bit swap
> support was present and functional in GCC 4.5.4 (as indeed it should
> have been since 4.4), so I'm still not quite sure why you require 4.6
> for that.

Initially it was based on looking at the gcc commit history for the
'rev' instruction implementation, but now I've got 4.4, 4.5, 4.6 and
4.7 compilers to perform Russell's test:

$ for cc in 4.4 4.5 4.6 4.7; do \
    arm-linux-gnueabi-gcc-$cc --version | grep gcc ; \
    for a in armv3 armv4 armv4t armv5t armv5te armv6k armv6 armv7-a; do \
      echo -n $a:; \
      for f in 16 32 64; do \
        echo 'unsigned foo(unsigned val) { return __builtin_bswap'$f'(val); }' | arm-linux-gnueabi-gcc-$cc -w -x c -S -o - - -march=$a | grep 'bl'; \
      done; \
    done; \
  done

whose output is:

arm-linux-gnueabi-gcc-4.4 (Ubuntu/Linaro 4.4.7-1ubuntu2) 4.4.7
armv3:	bl	__builtin_bswap16
	bl	__bswapsi2
	bl	__bswapdi2
armv4:	bl	__builtin_bswap16
	bl	__bswapsi2
	bl	__bswapdi2
armv4t:	bl	__builtin_bswap16
	bl	__bswapsi2
	bl	__bswapdi2
armv5t:	bl	__builtin_bswap16
	bl	__bswapsi2
	bl	__bswapdi2
armv5te:	bl	__builtin_bswap16
	bl	__bswapsi2
	bl	__bswapdi2
armv6k:	bl	__builtin_bswap16
	bl	__bswapsi2
	bl	__bswapdi2
armv6:	bl	__builtin_bswap16
	bl	__bswapsi2
	bl	__bswapdi2
armv7-a:	bl	__builtin_bswap16
	bl	__bswapsi2
	bl	__bswapdi2
arm-linux-gnueabi-gcc-4.5 (Ubuntu/Linaro 4.5.3-12ubuntu2) 4.5.3
armv3:	bl	__builtin_bswap16
armv4:	bl	__builtin_bswap16
armv4t:	bl	__builtin_bswap16
armv5t:	bl	__builtin_bswap16
armv5te:	bl	__builtin_bswap16
armv6k:	bl	__builtin_bswap16
armv6:	bl	__builtin_bswap16
armv7-a:	bl	__builtin_bswap16
arm-linux-gnueabi-gcc-4.6 (Ubuntu/Linaro 4.6.3-8ubuntu1) 4.6.3 20120624 (prerelease)
armv3:	bl	__builtin_bswap16
armv4:	bl	__builtin_bswap16
armv4t:	bl	__builtin_bswap16
armv5t:	bl	__builtin_bswap16
armv5te:	bl	__builtin_bswap16
armv6k:	bl	__builtin_bswap16
armv6:	bl	__builtin_bswap16
armv7-a:	bl	__builtin_bswap16
arm-linux-gnueabi-gcc-4.7 (Ubuntu/Linaro 4.7.2-1ubuntu1) 4.7.2
armv3:	bl	__builtin_bswap16
armv4:	bl	__builtin_bswap16
armv4t:	bl	__builtin_bswap16
armv5t:	bl	__builtin_bswap16
armv5te:	bl	__builtin_bswap16
armv6k:	bl	__builtin_bswap16
armv6:	bl	__builtin_bswap16
armv7-a:	bl	__builtin_bswap16

So 4.4 should be exempt from using the built-ins because it always
emits __bswapsi2 calls: it doesn't matter whether -Os or -O2 is added
as an option in the test.  gcc 4.5, 4.6, and 4.7 all support the 32-
and 64-bit versions, so we should check for gcc >= 4.5 instead of
gcc >= 4.6.

I've added a new check for !CC_OPTIMIZE_FOR_SIZE and build-tested
all defconfigs with gcc 4.6.3 - here's v5:

From 11aa942a84fe94d204424a19b6b13fdb2b359ee6 Mon Sep 17 00:00:00 2001
From: Kim Phillips
Date: Mon, 28 Jan 2013 19:30:33 -0600
Subject: [PATCH] arm: use built-in byte swap function

Enable the compiler intrinsic for byte swapping on arch ARM.  This
allows the compiler to detect and optimize out byte swaps.

__builtin_bswap{32,64} support was added in gcc 4.4, but until
gcc 4.5, it emitted calls to libgcc's __bswap[sd]i2 (even with -O2).

All gcc versions tested (4.[4567]) emit calls to __bswap[sd]i2 when
optimizing for size, so we add the !CC_OPTIMIZE_FOR_SIZE check.

Support for 16-bit built-ins will be in gcc version 4.8.

This has a tiny benefit on vmlinux text size (gcc 4.6.4):

multi_v7_defconfig:
   text    data     bss     dec     hex filename
3135208  188396  203344 3526948  35d124 vmlinux

multi_v7_defconfig with builtin_bswap:
   text    data     bss     dec     hex filename
3135112  188396  203344 3526852  35d0c4 vmlinux

exynos_defconfig:
   text    data     bss     dec     hex filename
4286605  360564  223172 4870341  4a50c5 vmlinux

exynos_defconfig with builtin_bswap:
   text    data     bss     dec     hex filename
4286405  360564  223172 4870141  4a4ffd vmlinux

The savings come mostly from device-tree related code, and some
from drivers.

Signed-off-by: Kim Phillips
---
akin to: http://comments.gmane.org/gmane.linux.kernel.cross-arch/16016

based on linux-next-20130128.  Depends on commit "compiler-gcc{3,4}.h:
Use GCC_VERSION macro" by Daniel Santos, currently in the akpm branch.

v5: re-work based on new gcc version test data:
- moved outside armv6 protection
- check for gcc 4.6+ demoted to gcc 4.5+ with:
  !defined(CONFIG_CC_OPTIMIZE_FOR_SIZE)

v4:
- undo v2-2's addition of ARCH_DEFINES_BUILTIN_BSWAP per Boris
  and David - object is to find arches that define _HAVE_BSWAP
  and clean it up in the future: patch is much less intrusive. :)

v3:
- moved out of uapi swab.h into arch/arm/include/asm/swab.h
- moved ARCH_DEFINES_BUILTIN_BSWAP help text into commit message
- moved GCC_VERSION >= 40800 ifdef into GCC_VERSION >= 40600 block

v2:
- at91 and lpd270 builds fixed by limiting to ARMv6 and above
  (i.e., ARM cores that have support for the 'rev' instruction).
  Otherwise, the compiler emits calls to libgcc's __bswapsi2 on
  these ARMv4/v5 builds (and arch ARM doesn't link with libgcc).
  All ARM defconfigs now have the same build status as they did
  without this patch (some are broken on linux-next).
- move ARM check from generic compiler.h to arch ARM's swab.h.
  - pretty sure it should be limited to __KERNEL__ builds
- add new ARCH_DEFINES_BUILTIN_BSWAP (see Kconfig help).
  - if set, generic compiler header does not set HAVE_BUILTIN_BSWAPxx
  - not too sure about this having to be a new CONFIG_, but it's hard
    to find a place for it given linux/compiler.h doesn't include any
    arch-specific files.
- move new selects to end of CONFIG_ARM's Kconfig select list,
  as is done in David Woodhouse's original patchseries for ppc/x86.

 arch/arm/include/asm/swab.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/arm/include/asm/swab.h b/arch/arm/include/asm/swab.h
index 537fc9b..159ab16 100644
--- a/arch/arm/include/asm/swab.h
+++ b/arch/arm/include/asm/swab.h
@@ -35,4 +35,13 @@ static inline __attribute_const__ __u32 __arch_swab32(__u32 x)
 #define __arch_swab32 __arch_swab32
 
 #endif
+
+#if !defined(CONFIG_CC_OPTIMIZE_FOR_SIZE) && GCC_VERSION >= 40500
+#define __HAVE_BUILTIN_BSWAP32__
+#define __HAVE_BUILTIN_BSWAP64__
+#if GCC_VERSION >= 40800
+#define __HAVE_BUILTIN_BSWAP16__
+#endif
+#endif
+
 #endif
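
For anyone who wants to poke at the guard outside a kernel tree, here is
a minimal userspace sketch of the same pattern.  It is an illustration
under assumptions, not kernel code: DEMO_GCC_VERSION,
DEMO_OPTIMIZE_FOR_SIZE, DEMO_HAVE_BUILTIN_BSWAP{32,64}, demo_swab32(),
demo_swab64() and demo_swab_selftest() are all hypothetical names, and
DEMO_GCC_VERSION is assumed to use the major*10000 + minor*100 +
patchlevel encoding that the 40500 and 40800 thresholds in the patch
imply.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: same encoding as the thresholds in the patch (4.5.0 -> 40500). */
#if defined(__GNUC__)
#define DEMO_GCC_VERSION \
	(__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__)
#else
#define DEMO_GCC_VERSION 0
#endif

/*
 * Same shape as the guard in the patch.  DEMO_OPTIMIZE_FOR_SIZE is a
 * hypothetical stand-in for CONFIG_CC_OPTIMIZE_FOR_SIZE.
 */
#if !defined(DEMO_OPTIMIZE_FOR_SIZE) && DEMO_GCC_VERSION >= 40500
#define DEMO_HAVE_BUILTIN_BSWAP32
#define DEMO_HAVE_BUILTIN_BSWAP64
#endif

/* Swap the four bytes of x, via the built-in when the guard allows it. */
static inline uint32_t demo_swab32(uint32_t x)
{
#ifdef DEMO_HAVE_BUILTIN_BSWAP32
	return __builtin_bswap32(x);
#else
	/* Open-coded fallback: move each byte to its mirrored position. */
	return ((x & 0x000000ffu) << 24) | ((x & 0x0000ff00u) << 8) |
	       ((x & 0x00ff0000u) >> 8)  | ((x & 0xff000000u) >> 24);
#endif
}

/* Swap the eight bytes of x, via the built-in when the guard allows it. */
static inline uint64_t demo_swab64(uint64_t x)
{
#ifdef DEMO_HAVE_BUILTIN_BSWAP64
	return __builtin_bswap64(x);
#else
	/* Swap each 32-bit half, then exchange the halves. */
	return ((uint64_t)demo_swab32((uint32_t)x) << 32) |
	       demo_swab32((uint32_t)(x >> 32));
#endif
}

/* Both paths must agree on these known byte patterns. */
static void demo_swab_selftest(void)
{
	assert(demo_swab32(0x12345678u) == 0x78563412u);
	assert(demo_swab64(0x0102030405060708ull) == 0x0807060504030201ull);
}
```

Building with -DDEMO_OPTIMIZE_FOR_SIZE forces the open-coded path, which
is roughly the case the !CONFIG_CC_OPTIMIZE_FOR_SIZE check is meant to
cover; demo_swab_selftest() should pass either way.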