From patchwork Sun Aug 5 01:23:58 2012
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Rob Herring
X-Patchwork-Id: 1274311
From: Rob Herring <robherring2@gmail.com>
To: linux-arm-kernel@lists.infradead.org
Cc: Olof Johansson, Russell King, Rob Herring, Arnd Bergmann, Nicolas Pitre
Subject: [PATCH v2 3/5] ARM: use generic unaligned.h
Date: Sat, 4 Aug 2012 20:23:58 -0500
Message-Id: <1344129840-16447-4-git-send-email-robherring2@gmail.com>
X-Mailer: git-send-email 1.7.9.5
In-Reply-To: <1344129840-16447-1-git-send-email-robherring2@gmail.com>
References: <1344129840-16447-1-git-send-email-robherring2@gmail.com>
From: Rob Herring

This moves ARM over to the asm-generic/unaligned.h header. This has the
benefit of better generated code, especially for ARMv7 on gcc 4.7+
compilers.

As Arnd Bergmann points out: The asm-generic version uses the "struct"
version for native-endian unaligned access and the "byteshift" version
for the opposite endianness. The current ARM version however uses the
"byteshift" implementation for both.

Thanks to Nicolas Pitre for the excellent analysis:

Test case:

int foo (int *x) { return get_unaligned(x); }
long long bar (long long *x) { return get_unaligned(x); }

With the current ARM version:

foo:
	ldrb	r3, [r0, #2]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 2B], MEM[(const u8 *)x_1(D) + 2B]
	ldrb	r1, [r0, #1]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 1B], MEM[(const u8 *)x_1(D) + 1B]
	ldrb	r2, [r0, #0]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D)], MEM[(const u8 *)x_1(D)]
	mov	r3, r3, asl #16	@ tmp154, MEM[(const u8 *)x_1(D) + 2B],
	ldrb	r0, [r0, #3]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 3B], MEM[(const u8 *)x_1(D) + 3B]
	orr	r3, r3, r1, asl #8	@, tmp155, tmp154, MEM[(const u8 *)x_1(D) + 1B],
	orr	r3, r3, r2	@ tmp157, tmp155, MEM[(const u8 *)x_1(D)]
	orr	r0, r3, r0, asl #24	@,, tmp157, MEM[(const u8 *)x_1(D) + 3B],
	bx	lr	@

bar:
	stmfd	sp!, {r4, r5, r6, r7}	@,
	mov	r2, #0	@ tmp184,
	ldrb	r5, [r0, #6]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 6B], MEM[(const u8 *)x_1(D) + 6B]
	ldrb	r4, [r0, #5]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 5B], MEM[(const u8 *)x_1(D) + 5B]
	ldrb	ip, [r0, #2]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 2B], MEM[(const u8 *)x_1(D) + 2B]
	ldrb	r1, [r0, #4]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 4B], MEM[(const u8 *)x_1(D) + 4B]
	mov	r5, r5, asl #16	@ tmp175, MEM[(const u8 *)x_1(D) + 6B],
	ldrb	r7, [r0, #1]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 1B], MEM[(const u8 *)x_1(D) + 1B]
	orr	r5, r5, r4, asl #8	@, tmp176, tmp175, MEM[(const u8 *)x_1(D) + 5B],
	ldrb	r6, [r0, #7]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 7B], MEM[(const u8 *)x_1(D) + 7B]
	orr	r5, r5, r1	@ tmp178, tmp176, MEM[(const u8 *)x_1(D) + 4B]
	ldrb	r4, [r0, #0]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D)], MEM[(const u8 *)x_1(D)]
	mov	ip, ip, asl #16	@ tmp188, MEM[(const u8 *)x_1(D) + 2B],
	ldrb	r1, [r0, #3]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 3B], MEM[(const u8 *)x_1(D) + 3B]
	orr	ip, ip, r7, asl #8	@, tmp189, tmp188, MEM[(const u8 *)x_1(D) + 1B],
	orr	r3, r5, r6, asl #24	@,, tmp178, MEM[(const u8 *)x_1(D) + 7B],
	orr	ip, ip, r4	@ tmp191, tmp189, MEM[(const u8 *)x_1(D)]
	orr	ip, ip, r1, asl #24	@, tmp194, tmp191, MEM[(const u8 *)x_1(D) + 3B],
	mov	r1, r3	@,
	orr	r0, r2, ip	@ tmp171, tmp184, tmp194
	ldmfd	sp!, {r4, r5, r6, r7}
	bx	lr

In both cases the code is slightly suboptimal. One may wonder why r2 is
wasted on the constant 0 in the second case, for example. And all the
mov's could be folded into subsequent orr's, etc.

Now with the asm-generic version:

foo:
	ldr	r0, [r0, #0]	@ unaligned	@,* x
	bx	lr	@

bar:
	mov	r3, r0	@ x, x
	ldr	r0, [r0, #0]	@ unaligned	@,* x
	ldr	r1, [r3, #4]	@ unaligned	@,
	bx	lr	@

This is way better of course, but only because this was compiled for
ARMv7. In this case the compiler knows that the hardware can do
unaligned word access.
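For reference, the two competing strategies boil down to something like
the following (a minimal stand-alone sketch, not the kernel's exact
helpers; the function names and the use of stdint types are
illustrative only):

#include <stdint.h>

/* "struct" version: wrap the access in a packed struct and let the
 * compiler decide how to perform it.  Targeting ARMv7, gcc knows
 * unaligned word loads are legal and can emit a single ldr. */
struct una_u32 { uint32_t x; } __attribute__((packed));

static inline uint32_t get_unaligned_struct32(const void *p)
{
	return ((const struct una_u32 *)p)->x;
}

/* "byteshift" version (little-endian shown): unconditionally assemble
 * the value a byte at a time, which pins the compiler to the
 * ldrb/orr sequence seen above regardless of the target. */
static inline uint32_t get_unaligned_byteshift32(const uint8_t *p)
{
	return p[0] | p[1] << 8 | p[2] << 16 | (uint32_t)p[3] << 24;
}

The struct version leaves the choice to the compiler, which knows from
the selected architecture whether unaligned word access traps; the
byteshift version hard-codes the worst case even when the hardware
could do better.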
That the compiler is relying on this knowledge isn't obvious from foo()
alone, but if we remove the get_unaligned() from bar as follows:

long long bar (long long *x) { return *x; }

then the resulting code is:

bar:
	ldmia	r0, {r0, r1}	@ x,,
	bx	lr	@

So this proves that the presumed aligned vs unaligned case does have an
influence on the instructions the compiler may use, and that the above
unaligned code results are not just an accident.

Still... this isn't fully conclusive without at least looking at the
resulting assembly from a pre-ARMv6 compilation. Let's see with an
ARMv5 target:

foo:
	ldrb	r3, [r0, #0]	@ zero_extendqisi2	@ tmp139,* x
	ldrb	r1, [r0, #1]	@ zero_extendqisi2	@ tmp140,
	ldrb	r2, [r0, #2]	@ zero_extendqisi2	@ tmp143,
	ldrb	r0, [r0, #3]	@ zero_extendqisi2	@ tmp146,
	orr	r3, r3, r1, asl #8	@, tmp142, tmp139, tmp140,
	orr	r3, r3, r2, asl #16	@, tmp145, tmp142, tmp143,
	orr	r0, r3, r0, asl #24	@,, tmp145, tmp146,
	bx	lr	@

bar:
	stmfd	sp!, {r4, r5, r6, r7}	@,
	ldrb	r2, [r0, #0]	@ zero_extendqisi2	@ tmp139,* x
	ldrb	r7, [r0, #1]	@ zero_extendqisi2	@ tmp140,
	ldrb	r3, [r0, #4]	@ zero_extendqisi2	@ tmp149,
	ldrb	r6, [r0, #5]	@ zero_extendqisi2	@ tmp150,
	ldrb	r5, [r0, #2]	@ zero_extendqisi2	@ tmp143,
	ldrb	r4, [r0, #6]	@ zero_extendqisi2	@ tmp153,
	ldrb	r1, [r0, #7]	@ zero_extendqisi2	@ tmp156,
	ldrb	ip, [r0, #3]	@ zero_extendqisi2	@ tmp146,
	orr	r2, r2, r7, asl #8	@, tmp142, tmp139, tmp140,
	orr	r3, r3, r6, asl #8	@, tmp152, tmp149, tmp150,
	orr	r2, r2, r5, asl #16	@, tmp145, tmp142, tmp143,
	orr	r3, r3, r4, asl #16	@, tmp155, tmp152, tmp153,
	orr	r0, r2, ip, asl #24	@,, tmp145, tmp146,
	orr	r1, r3, r1, asl #24	@,, tmp155, tmp156,
	ldmfd	sp!, {r4, r5, r6, r7}
	bx	lr

Compared to the initial results, this is really nicely optimized and I
couldn't do much better if I were to hand code it myself.

Signed-off-by: Rob Herring
Reviewed-by: Nicolas Pitre
---
 arch/arm/include/asm/Kbuild      |  1 +
 arch/arm/include/asm/unaligned.h | 19 -------------------
 2 files changed, 1 insertion(+), 19 deletions(-)
 delete mode 100644 arch/arm/include/asm/unaligned.h

diff --git a/arch/arm/include/asm/Kbuild b/arch/arm/include/asm/Kbuild
index 6cc8a13..a86cabf 100644
--- a/arch/arm/include/asm/Kbuild
+++ b/arch/arm/include/asm/Kbuild
@@ -33,3 +33,4 @@ generic-y += sockios.h
 generic-y += termbits.h
 generic-y += timex.h
 generic-y += types.h
+generic-y += unaligned.h
diff --git a/arch/arm/include/asm/unaligned.h b/arch/arm/include/asm/unaligned.h
deleted file mode 100644
index 44593a8..0000000
--- a/arch/arm/include/asm/unaligned.h
+++ /dev/null
@@ -1,19 +0,0 @@
-#ifndef _ASM_ARM_UNALIGNED_H
-#define _ASM_ARM_UNALIGNED_H
-
-#include <linux/unaligned/le_byteshift.h>
-#include <linux/unaligned/be_byteshift.h>
-#include <linux/unaligned/generic.h>
-
-/*
- * Select endianness
- */
-#ifndef __ARMEB__
-#define get_unaligned	__get_unaligned_le
-#define put_unaligned	__put_unaligned_le
-#else
-#define get_unaligned	__get_unaligned_be
-#define put_unaligned	__put_unaligned_be
-#endif
-
-#endif /* _ASM_ARM_UNALIGNED_H */
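The comparison is also easy to reproduce outside the kernel tree with a
stand-alone test file (a sketch reusing the packed-struct trick from
the earlier example; foo here is a hypothetical stand-in for a
get_unaligned() user, and the commands assume an arm-linux-gnueabi
gcc 4.7+ cross toolchain):

#include <stdint.h>

/* packed struct lets the compiler choose the access strategy */
struct una_u32 { uint32_t x; } __attribute__((packed));

uint32_t foo(const void *x)
{
	/* unaligned 32-bit load via the "struct" strategy */
	return ((const struct una_u32 *)x)->x;
}

Compiling with "arm-linux-gnueabi-gcc -O2 -march=armv7-a -S foo.c"
should produce the single-ldr form shown in the ARMv7 listing, while
"-march=armv5te" should fall back to the ldrb/orr byte assembly,
mirroring the two sets of results above.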