From patchwork Thu Nov 7 19:20:42 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Stephen Boyd
X-Patchwork-Id: 3154131
From: Stephen Boyd
To: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Subject: [PATCH] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions
Date: Thu, 7 Nov 2013 11:20:42 -0800
Message-Id: <1383852042-10780-1-git-send-email-sboyd@codeaurora.org>

If we're running on a v7 ARM CPU, detect if the CPU supports the
sdiv/udiv instructions and replace the signed and unsigned
division library functions with an sdiv/udiv instruction.

Running the perf messaging benchmark in pipe mode

 $ perf bench sched messaging -p

shows a modest improvement on my v7 CPU.

before:
(5.060 + 5.960 + 5.971 + 5.643 + 6.029 + 5.665 + 6.050 + 5.870 + 6.117 + 5.683) / 10 = 5.805

after:
(4.884 + 5.549 + 5.749 + 6.001 + 5.460 + 5.103 + 5.956 + 6.112 + 5.468 + 5.093) / 10 = 5.538

(5.805 - 5.538) / 5.805 = 4.6%

Signed-off-by: Stephen Boyd
---

Should we add in the __div0() call if the denominator is 0?
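
For reference, one way a zero check could be layered on top of the
instruction path is sketched below in portable C. This is only an
illustration under stated assumptions, not part of the patch:
div0_report() and model_sdiv() are hypothetical stand-ins (the former
for the kernel's __div0() hook, the latter for the sdiv instruction),
and the model assumes the ARMv7-A behaviour where a zero divisor yields
0 instead of trapping.

```c
#include <limits.h>

/* Hypothetical stand-in for the kernel's __div0() hook; it only counts calls. */
static int div0_calls;

static void div0_report(void)
{
	div0_calls++;
}

/*
 * Portable model of the sdiv instruction (an assumption for illustration):
 * a zero divisor yields 0, INT_MIN / -1 yields INT_MIN, and every other
 * case truncates toward zero like C's '/' operator.
 */
static int model_sdiv(int numerator, int denominator)
{
	if (denominator == 0)
		return 0;
	if (numerator == INT_MIN && denominator == -1)
		return INT_MIN;	/* avoid undefined behaviour in C */
	return numerator / denominator;
}

/* Report a zero divisor first, then take the (modelled) hardware path. */
int idiv_checked(int numerator, int denominator)
{
	if (denominator == 0)
		div0_report();

	return model_sdiv(numerator, denominator);
}
```

With this shape, idiv_checked(5, 0) would call the hook once and still
return 0, at the cost of one extra compare on every division.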
 arch/arm/kernel/setup.c  | 10 +++++++++
 arch/arm/lib/Makefile    |  3 +++
 arch/arm/lib/div-v7.c    | 58 ++++++++++++++++++++++++++++++++++++++++++++++++
 arch/arm/lib/lib1funcs.S | 16 +++++++++++++
 4 files changed, 87 insertions(+)
 create mode 100644 arch/arm/lib/div-v7.c

diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
index 0e1e2b3..7d519f4 100644
--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -30,6 +30,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -365,6 +366,8 @@ void __init early_print(const char *str, ...)
 	printk("%s", buf);
 }

+struct static_key cpu_has_idiv = STATIC_KEY_INIT_FALSE;
+
 static void __init cpuid_init_hwcaps(void)
 {
 	unsigned int divide_instrs, vmsa;
@@ -381,6 +384,13 @@ static void __init cpuid_init_hwcaps(void)
 		elf_hwcap |= HWCAP_IDIVT;
 	}

+#ifdef CONFIG_THUMB2_KERNEL
+	if (elf_hwcap & HWCAP_IDIVT)
+#else
+	if (elf_hwcap & HWCAP_IDIVA)
+#endif
+		static_key_slow_inc(&cpu_has_idiv);
+
 	/* LPAE implies atomic ldrd/strd instructions */
 	vmsa = (read_cpuid_ext(CPUID_EXT_MMFR0) & 0xf) >> 0;
 	if (vmsa >= 5)
diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
index bd454b0..6ed6496 100644
--- a/arch/arm/lib/Makefile
+++ b/arch/arm/lib/Makefile
@@ -15,6 +15,9 @@ lib-y := backtrace.o changebit.o csumipv6.o csumpartial.o \
 		   io-readsb.o io-writesb.o io-readsl.o io-writesl.o \
 		   call_with_stack.o

+lib-$(CONFIG_CPU_V7) += div-v7.o
+CFLAGS_div-v7.o := -march=armv7-a
+
 mmu-y := clear_user.o copy_page.o getuser.o putuser.o

 # the code in uaccess.S is not preemption safe and
diff --git a/arch/arm/lib/div-v7.c b/arch/arm/lib/div-v7.c
new file mode 100644
index 0000000..96ceb92
--- /dev/null
+++ b/arch/arm/lib/div-v7.c
@@ -0,0 +1,58 @@
+/* Copyright (c) 2013, The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#include
+
+extern int ___aeabi_idiv(int, int);
+extern unsigned ___aeabi_uidiv(int, int);
+
+extern struct static_key cpu_has_idiv;
+
+int __aeabi_idiv(int numerator, int denominator)
+{
+	if (static_key_false(&cpu_has_idiv)) {
+		int ret;
+
+		asm volatile (
+		".arch_extension idiv\n"
+		"sdiv %0, %1, %2"
+		: "=&r" (ret)
+		: "r" (numerator), "r" (denominator));
+
+		return ret;
+	}
+
+	return ___aeabi_idiv(numerator, denominator);
+}
+
+int __divsi3(int numerator, int denominator)
+	__attribute__((alias("__aeabi_idiv")));
+
+unsigned __aeabi_uidiv(int numerator, int denominator)
+{
+	if (static_key_false(&cpu_has_idiv)) {
+		int ret;
+
+		asm volatile (
+		".arch_extension idiv\n"
+		"udiv %0, %1, %2"
+		: "=&r" (ret)
+		: "r" (numerator), "r" (denominator));
+
+		return ret;
+	}
+
+	return ___aeabi_uidiv(numerator, denominator);
+}
+
+unsigned __udivsi3(int numerator, int denominator)
+	__attribute__((alias("__aeabi_uidiv")));
diff --git a/arch/arm/lib/lib1funcs.S b/arch/arm/lib/lib1funcs.S
index c562f64..adea088 100644
--- a/arch/arm/lib/lib1funcs.S
+++ b/arch/arm/lib/lib1funcs.S
@@ -205,8 +205,12 @@ Boston, MA 02111-1307, USA. */
 	.endm


+#if defined(ZIMAGE) || !defined(CONFIG_CPU_V7)
 ENTRY(__udivsi3)
 ENTRY(__aeabi_uidiv)
+#else
+ENTRY(___aeabi_uidiv)
+#endif
 UNWIND(.fnstart)

 	subs	r2, r1, #1
@@ -232,8 +236,12 @@ UNWIND(.fnstart)
 	mov	pc, lr

 UNWIND(.fnend)
+#if defined(ZIMAGE) || !defined(CONFIG_CPU_V7)
 ENDPROC(__udivsi3)
 ENDPROC(__aeabi_uidiv)
+#else
+ENDPROC(___aeabi_uidiv)
+#endif

 ENTRY(__umodsi3)
 UNWIND(.fnstart)
@@ -253,8 +261,12 @@ UNWIND(.fnstart)
 UNWIND(.fnend)
 ENDPROC(__umodsi3)

+#if defined(ZIMAGE) || !defined(CONFIG_CPU_V7)
 ENTRY(__divsi3)
 ENTRY(__aeabi_idiv)
+#else
+ENTRY(___aeabi_idiv)
+#endif
 UNWIND(.fnstart)

 	cmp	r1, #0
@@ -293,8 +305,12 @@ UNWIND(.fnstart)
 	mov	pc, lr

 UNWIND(.fnend)
+#if defined(ZIMAGE) || !defined(CONFIG_CPU_V7)
 ENDPROC(__divsi3)
 ENDPROC(__aeabi_idiv)
+#else
+ENDPROC(___aeabi_idiv)
+#endif

 ENTRY(__modsi3)
 UNWIND(.fnstart)