From patchwork Tue Apr 20 02:50:28 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Maciej W. Rozycki" X-Patchwork-Id: 12213215 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.3 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_SANE_1 autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 997C1C433ED for ; Tue, 20 Apr 2021 02:50:30 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 7279B60FE6 for ; Tue, 20 Apr 2021 02:50:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233661AbhDTCvA (ORCPT ); Mon, 19 Apr 2021 22:51:00 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39290 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229508AbhDTCu7 (ORCPT ); Mon, 19 Apr 2021 22:50:59 -0400 Received: from angie.orcam.me.uk (angie.orcam.me.uk [IPv6:2001:4190:8020::4]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 3D736C06174A; Mon, 19 Apr 2021 19:50:29 -0700 (PDT) Received: by angie.orcam.me.uk (Postfix, from userid 500) id 8182592009E; Tue, 20 Apr 2021 04:50:28 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by angie.orcam.me.uk (Postfix) with ESMTP id 7DA0C92009B; Tue, 20 Apr 2021 04:50:28 +0200 (CEST) Date: Tue, 20 Apr 2021 04:50:28 +0200 (CEST) From: "Maciej W. 
Rozycki" To: Arnd Bergmann , Thomas Bogendoerfer cc: Huacai Chen , Huacai Chen , Jiaxun Yang , linux-arch@vger.kernel.org, linux-mips@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH 1/4] lib/math: Add a `do_div' test module In-Reply-To: Message-ID: References: User-Agent: Alpine 2.21 (DEB 202 2017-01-01) MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-mips@vger.kernel.org Implement a module for correctness and performance evaluation for the `do_div' function, often handled in an optimised manner by platform code. Use a somewhat randomly generated set of inputs that is supposed to be representative, using the same set of divisors twice, expressed as a constant and as a variable each, so as to verify the implementation for both cases should they be handled by different code execution paths. Reference results were produced with GNU bc. At the conclusion output the total execution time elapsed. Signed-off-by: Maciej W. Rozycki --- NB there's a `checkpatch.pl' warning for a split line, but I can see no gain from joining it. --- lib/Kconfig.debug | 10 ++ lib/math/Makefile | 2 lib/math/test_div64.c | 249 ++++++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 261 insertions(+) linux-div64-test.diff Index: linux-3maxp-div64/lib/Kconfig.debug =================================================================== --- linux-3maxp-div64.orig/lib/Kconfig.debug +++ linux-3maxp-div64/lib/Kconfig.debug @@ -2027,6 +2027,16 @@ config TEST_SORT If unsure, say N. +config TEST_DIV64 + tristate "64bit/32bit division and modulo test" + depends on DEBUG_KERNEL || m + help + Enable this to turn on 'do_div()' function test. This test is + executed only once during system boot (so affects only boot time), + or at module load time. + + If unsure, say N. 
+ config KPROBES_SANITY_TEST bool "Kprobes sanity tests" depends on DEBUG_KERNEL Index: linux-3maxp-div64/lib/math/Makefile =================================================================== --- linux-3maxp-div64.orig/lib/math/Makefile +++ linux-3maxp-div64/lib/math/Makefile @@ -4,3 +4,5 @@ obj-y += div64.o gcd.o lcm.o int_pow.o i obj-$(CONFIG_CORDIC) += cordic.o obj-$(CONFIG_PRIME_NUMBERS) += prime_numbers.o obj-$(CONFIG_RATIONAL) += rational.o + +obj-$(CONFIG_TEST_DIV64) += test_div64.o Index: linux-3maxp-div64/lib/math/test_div64.c =================================================================== --- /dev/null +++ linux-3maxp-div64/lib/math/test_div64.c @@ -0,0 +1,249 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2021 Maciej W. Rozycki + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include +#include +#include +#include +#include +#include + +#include + +#define TEST_DIV64_N_ITER 1024 + +static const u64 test_div64_dividents[] = { + 0x00000000ab275080, + 0x0000000fe73c1959, + 0x000000e54c0a74b1, + 0x00000d4398ff1ef9, + 0x0000a18c2ee1c097, + 0x00079fb80b072e4a, + 0x0072db27380dd689, + 0x0842f488162e2284, + 0xf66745411d8ab063, +}; +#define SIZE_DIV64_DIVIDENTS ARRAY_SIZE(test_div64_dividents) + +#define TEST_DIV64_DIVISOR_0 0x00000009 +#define TEST_DIV64_DIVISOR_1 0x0000007c +#define TEST_DIV64_DIVISOR_2 0x00000204 +#define TEST_DIV64_DIVISOR_3 0x0000cb5b +#define TEST_DIV64_DIVISOR_4 0x00010000 +#define TEST_DIV64_DIVISOR_5 0x0008a880 +#define TEST_DIV64_DIVISOR_6 0x003fd3ae +#define TEST_DIV64_DIVISOR_7 0x0b658fac +#define TEST_DIV64_DIVISOR_8 0xdc08b349 + +static const u32 test_div64_divisors[] = { + TEST_DIV64_DIVISOR_0, + TEST_DIV64_DIVISOR_1, + TEST_DIV64_DIVISOR_2, + TEST_DIV64_DIVISOR_3, + TEST_DIV64_DIVISOR_4, + TEST_DIV64_DIVISOR_5, + TEST_DIV64_DIVISOR_6, + TEST_DIV64_DIVISOR_7, + TEST_DIV64_DIVISOR_8, +}; +#define SIZE_DIV64_DIVISORS ARRAY_SIZE(test_div64_divisors) + +static const struct { + u64 quotient; + u32 remainder; +} 
test_div64_results[SIZE_DIV64_DIVISORS][SIZE_DIV64_DIVIDENTS] = { + { + { 0x0000000013045e47, 0x00000001 }, + { 0x000000000161596c, 0x00000030 }, + { 0x000000000054e9d4, 0x00000130 }, + { 0x000000000000d776, 0x0000278e }, + { 0x000000000000ab27, 0x00005080 }, + { 0x00000000000013c4, 0x0004ce80 }, + { 0x00000000000002ae, 0x001e143c }, + { 0x000000000000000f, 0x0033e56c }, + { 0x0000000000000000, 0xab275080 }, + }, { + { 0x00000001c45c02d1, 0x00000000 }, + { 0x0000000020d5213c, 0x00000049 }, + { 0x0000000007e3d65f, 0x000001dd }, + { 0x0000000000140531, 0x000065ee }, + { 0x00000000000fe73c, 0x00001959 }, + { 0x000000000001d637, 0x0004e5d9 }, + { 0x0000000000003fc9, 0x000713bb }, + { 0x0000000000000165, 0x029abe7d }, + { 0x0000000000000012, 0x6e9f7e37 }, + }, { + { 0x000000197a3a0cf7, 0x00000002 }, + { 0x00000001d9632e5c, 0x00000021 }, + { 0x0000000071c28039, 0x000001cd }, + { 0x000000000120a844, 0x0000b885 }, + { 0x0000000000e54c0a, 0x000074b1 }, + { 0x00000000001a7bb3, 0x00072331 }, + { 0x00000000000397ad, 0x0002c61b }, + { 0x000000000000141e, 0x06ea2e89 }, + { 0x000000000000010a, 0xab002ad7 }, + }, { + { 0x0000017949e37538, 0x00000001 }, + { 0x0000001b62441f37, 0x00000055 }, + { 0x0000000694a3391d, 0x00000085 }, + { 0x0000000010b2a5d2, 0x0000a753 }, + { 0x000000000d4398ff, 0x00001ef9 }, + { 0x0000000001882ec6, 0x0005cbf9 }, + { 0x000000000035333b, 0x0017abdf }, + { 0x00000000000129f1, 0x0ab4520d }, + { 0x0000000000000f6e, 0x8ac0ce9b }, + }, { + { 0x000011f321a74e49, 0x00000006 }, + { 0x0000014d8481d211, 0x0000005b }, + { 0x0000005025cbd92d, 0x000001e3 }, + { 0x00000000cb5e71e3, 0x000043e6 }, + { 0x00000000a18c2ee1, 0x0000c097 }, + { 0x0000000012a88828, 0x00036c97 }, + { 0x000000000287f16f, 0x002c2a25 }, + { 0x00000000000e2cc7, 0x02d581e3 }, + { 0x000000000000bbf4, 0x1ba08c03 }, + }, { + { 0x0000d8db8f72935d, 0x00000005 }, + { 0x00000fbd5aed7a2e, 0x00000002 }, + { 0x000003c84b6ea64a, 0x00000122 }, + { 0x0000000998fa8829, 0x000044b7 }, + { 0x000000079fb80b07, 
0x00002e4a }, + { 0x00000000e16b20fa, 0x0002a14a }, + { 0x000000001e940d22, 0x00353b2e }, + { 0x0000000000ab40ac, 0x06fba6ba }, + { 0x000000000008debd, 0x72d98365 }, + }, { + { 0x000cc3045b8fc281, 0x00000000 }, + { 0x0000ed1f48b5c9fc, 0x00000079 }, + { 0x000038fb9c63406a, 0x000000e1 }, + { 0x000000909705b825, 0x00000a62 }, + { 0x00000072db27380d, 0x0000d689 }, + { 0x0000000d43fce827, 0x00082b09 }, + { 0x00000001ccaba11a, 0x0037e8dd }, + { 0x000000000a13f729, 0x0566dffd }, + { 0x000000000085a14b, 0x23d36726 }, + }, { + { 0x00eafeb9c993592b, 0x00000001 }, + { 0x00110e5befa9a991, 0x00000048 }, + { 0x00041947b4a1d36a, 0x000000dc }, + { 0x00000a6679327311, 0x0000c079 }, + { 0x00000842f488162e, 0x00002284 }, + { 0x000000f4459740fc, 0x00084484 }, + { 0x0000002122c47bf9, 0x002ca446 }, + { 0x00000000b9936290, 0x004979c4 }, + { 0x00000000099ca89d, 0x9db446bf }, + }, { + { 0x1b60cece589da1d2, 0x00000001 }, + { 0x01fcb42be1453f5b, 0x0000004f }, + { 0x007a3f2457df0749, 0x0000013f }, + { 0x0001363130e3ec7b, 0x000017aa }, + { 0x0000f66745411d8a, 0x0000b063 }, + { 0x00001c757dfab350, 0x00048863 }, + { 0x000003dc4979c652, 0x00224ea7 }, + { 0x000000159edc3144, 0x06409ab3 }, + { 0x000000011eadfee3, 0xa99c48a8 }, + }, +}; + +static inline bool test_div64_verify(u64 quotient, u32 remainder, int i, int j) +{ + return (quotient == test_div64_results[i][j].quotient && + remainder == test_div64_results[i][j].remainder); +} + +/* + * This needs to be a macro, because we don't want to rely on the compiler + * to do constant propagation, and `do_div' may take a different path for + * constants, so we do want to verify that as well. 
+ */ +#define test_div64_one(divident, divisor, i, j) ({ \ + bool result = true; \ + u64 quotient; \ + u32 remainder; \ + \ + quotient = divident; \ + remainder = do_div(quotient, divisor); \ + if (!test_div64_verify(quotient, remainder, i, j)) { \ + pr_err("ERROR: %016llx / %08x => %016llx,%08x\n", \ + divident, divisor, quotient, remainder); \ + pr_err("ERROR: expected value=> %016llx,%08x\n", \ + test_div64_results[i][j].quotient, \ + test_div64_results[i][j].remainder); \ + result = false; \ + } \ + result; \ +}) + +/* + * Run calculation for the same divisor value expressed as a constant + * and as a variable, so as to verify the implementation for both cases + * should they be handled by different code execution paths. + */ +static bool __init test_div64(void) +{ + u64 divident; + int i, j; + + for (i = 0; i < SIZE_DIV64_DIVIDENTS; i++) { + divident = test_div64_dividents[i]; + if (!test_div64_one(divident, TEST_DIV64_DIVISOR_0, i, 0)) + return false; + if (!test_div64_one(divident, TEST_DIV64_DIVISOR_1, i, 1)) + return false; + if (!test_div64_one(divident, TEST_DIV64_DIVISOR_2, i, 2)) + return false; + if (!test_div64_one(divident, TEST_DIV64_DIVISOR_3, i, 3)) + return false; + if (!test_div64_one(divident, TEST_DIV64_DIVISOR_4, i, 4)) + return false; + if (!test_div64_one(divident, TEST_DIV64_DIVISOR_5, i, 5)) + return false; + if (!test_div64_one(divident, TEST_DIV64_DIVISOR_6, i, 6)) + return false; + if (!test_div64_one(divident, TEST_DIV64_DIVISOR_7, i, 7)) + return false; + if (!test_div64_one(divident, TEST_DIV64_DIVISOR_8, i, 8)) + return false; + for (j = 0; j < SIZE_DIV64_DIVISORS; j++) { + if (!test_div64_one(divident, test_div64_divisors[j], + i, j)) + return false; + } + } + return true; +} + +static int __init test_div64_init(void) +{ + struct timespec64 ts, ts0, ts1; + int i; + + pr_info("Starting 64bit/32bit division and modulo test\n"); + ktime_get_ts64(&ts0); + + for (i = 0; i < TEST_DIV64_N_ITER; i++) + if (!test_div64()) + break; + + 
ktime_get_ts64(&ts1); + ts = timespec64_sub(ts1, ts0); + pr_info("Completed 64bit/32bit division and modulo test, " + "%llu.%09lus elapsed\n", ts.tv_sec, ts.tv_nsec); + + return 0; +} + +static void __exit test_div64_exit(void) +{ +} + +module_init(test_div64_init); +module_exit(test_div64_exit); + +MODULE_AUTHOR("Maciej W. Rozycki "); +MODULE_LICENSE("GPL"); +MODULE_DESCRIPTION("64bit/32bit division and modulo test module"); From patchwork Tue Apr 20 02:50:33 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Maciej W. Rozycki" X-Patchwork-Id: 12213217 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.3 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_SANE_1 autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 22BABC433B4 for ; Tue, 20 Apr 2021 02:50:36 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 00ADE60FE6 for ; Tue, 20 Apr 2021 02:50:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229694AbhDTCvF (ORCPT ); Mon, 19 Apr 2021 22:51:05 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39314 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229508AbhDTCvE (ORCPT ); Mon, 19 Apr 2021 22:51:04 -0400 Received: from angie.orcam.me.uk (angie.orcam.me.uk [IPv6:2001:4190:8020::4]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 3D8F6C06174A; Mon, 19 Apr 2021 19:50:34 -0700 (PDT) Received: by angie.orcam.me.uk (Postfix, from userid 500) id 8B4E79200BB; Tue, 20 Apr 2021 04:50:33 +0200 (CEST) Received: from localhost 
(localhost [127.0.0.1]) by angie.orcam.me.uk (Postfix) with ESMTP id 846FD9200B4; Tue, 20 Apr 2021 04:50:33 +0200 (CEST) Date: Tue, 20 Apr 2021 04:50:33 +0200 (CEST) From: "Maciej W. Rozycki" To: Arnd Bergmann , Thomas Bogendoerfer cc: Huacai Chen , Huacai Chen , Jiaxun Yang , linux-arch@vger.kernel.org, linux-mips@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH 2/4] div64: Correct inline documentation for `do_div' In-Reply-To: Message-ID: References: User-Agent: Alpine 2.21 (DEB 202 2017-01-01) MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-mips@vger.kernel.org Correct inline documentation for `do_div', which is a function-like macro the `n' parameter of which has the semantics of a C++ reference: it is both read and written in the context of the caller without an explicit dereference such as with a pointer. In the C programming language it has no equivalent for proper functions, in terms of which the documentation expresses the semantics of `do_div', but substituting a pointer in documentation is misleading, and using the C++ notation should at least raise the reader's attention and encourage them to seek an explanation even if the C++ semantics is not readily understood. While at it observe that "semantics" is an uncountable noun, so refer to it with a singular rather than plural verb. Signed-off-by: Maciej W. Rozycki --- NB there's a `checkpatch.pl' warning for tabs preceded by spaces, but that is just the style of the piece of code quoted and I can see no gain from changing it or worse yet making it inconsistent. 
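For illustration only, the reference-like behaviour of the `n' parameter can be approximated in plain C with a pointer-taking helper wrapped in a macro that applies the address-of operator at the call site. This is a minimal user-space sketch, not the kernel implementation; the names `do_div_sketch' and `do_div_sketch_ptr' are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not kernel code): divide *n by base in place
 * and return the remainder.
 */
static inline uint32_t do_div_sketch_ptr(uint64_t *n, uint32_t base)
{
	uint32_t remainder;

	assert(base != 0);	/* division by zero is not handled here */
	remainder = (uint32_t)(*n % base);
	*n = *n / base;
	return remainder;
}

/*
 * The macro takes `n' as an lvalue and hides the address-of operator,
 * so the call site reads as if `n' were passed by C++ reference.
 */
#define do_div_sketch(n, base) do_div_sketch_ptr(&(n), (base))
```

With `uint64_t n = 100;' the call `do_div_sketch(n, 7)' returns 2 and leaves 14 in `n', with neither `&' nor `*' written by the caller.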
--- include/asm-generic/div64.h | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) linux-div64-doc-fix.diff Index: linux-3maxp-div64/include/asm-generic/div64.h =================================================================== --- linux-3maxp-div64.orig/include/asm-generic/div64.h +++ linux-3maxp-div64/include/asm-generic/div64.h @@ -8,12 +8,14 @@ * Optimization for constant divisors on 32-bit machines: * Copyright (C) 2006-2015 Nicolas Pitre * - * The semantics of do_div() are: + * The semantics of do_div() is, in C++ notation, observing that the name + * is a function-like macro and the n parameter has the semantics of a C++ + * reference: * - * uint32_t do_div(uint64_t *n, uint32_t base) + * uint32_t do_div(uint64_t &n, uint32_t base) * { - * uint32_t remainder = *n % base; - * *n = *n / base; + * uint32_t remainder = n % base; + * n = n / base; * return remainder; * } * From patchwork Tue Apr 20 02:50:40 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Maciej W. 
Rozycki" X-Patchwork-Id: 12213219 Date: Tue, 20 Apr 2021 04:50:40 +0200 (CEST) From: "Maciej W. 
Rozycki" To: Arnd Bergmann , Thomas Bogendoerfer cc: Huacai Chen , Huacai Chen , Jiaxun Yang , linux-arch@vger.kernel.org, linux-mips@vger.kernel.org, linux-kernel@vger.kernel.org, stable@vger.kernel.org Subject: [PATCH 3/4] MIPS: Reinstate platform `__div64_32' handler In-Reply-To: Message-ID: References: User-Agent: Alpine 2.21 (DEB 202 2017-01-01) MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-mips@vger.kernel.org Our current MIPS platform `__div64_32' handler is inactive, because it is incorrectly only enabled for 64-bit configurations, for which generic `do_div' code does not call it anyway. The handler is not suitable for being called from there though as it only calculates 32 bits of the quotient under the assumption the 64-bit dividend has been suitably reduced. Code for such reduction used to be there, however it has been incorrectly removed with commit c21004cd5b4c ("MIPS: Rewrite to work with gcc 4.4.0."), which should have only updated an obsoleted constraint for an inline asm involving $hi and $lo register outputs, while possibly wiring the original MIPS variant of the `do_div' macro as `__div64_32' handler for the generic `do_div' implementation. Correct the handler as follows then: - Revert most of the commit referred to, however retaining the current formatting, except for the final two instructions of the inline asm sequence, which the original commit missed. Omit the original 64-bit parts though. - Rename the original `do_div' macro to `__div64_32'. Use the combined `x' constraint referring to the MD accumulator as a whole, replacing the original individual `h' and `l' constraints used for $hi and $lo registers respectively, of which `h' has been obsoleted with GCC 4.4. Update surrounding code accordingly. We have since removed support for GCC versions before 4.9, so no need for a special arrangement here; GCC has supported the `x' constraint since forever anyway, or at least going back to 1991. 
- Rename the `__base' local variable in `__div64_32' to `__radix' to avoid a conflict with a local variable in `do_div'. - Actually enable this code for 32-bit rather than 64-bit configurations by qualifying it with BITS_PER_LONG being 32 instead of 64. Include for this macro rather than as we don't need anything else. - Finally include last rather than first. This has passed correctness verification with test_div64 and reduced the module's average execution time down to 1.0668s and 0.2629s from 2.1529s and 0.5647s respectively for an R3400 CPU @40MHz and a 5Kc CPU @160MHz. For a reference 64-bit `do_div' code where we have the DDIVU instruction available to do the whole calculation right away averages at 0.0660s for the latter CPU. Reported-by: Huacai Chen Signed-off-by: Maciej W. Rozycki Fixes: c21004cd5b4c ("MIPS: Rewrite to work with gcc 4.4.0.") Cc: stable@vger.kernel.org # v2.6.30+ --- Our handcrafted handler seems to run at ~25% of the performance of the 64-bit hardware instruction; not too bad I would say. Though there's likely some overhead from surrounding code that interferes with the figures. Then there are a couple of `checkpatch.pl' nits about trailing whitespace in inline asm, which however makes it more readable. So the change stays as it is. --- arch/mips/include/asm/div64.h | 57 ++++++++++++++++++++++++++++++------------ 1 file changed, 41 insertions(+), 16 deletions(-) linux-mips-div64-generic-fix.diff Index: linux-3maxp-div64/arch/mips/include/asm/div64.h =================================================================== --- linux-3maxp-div64.orig/arch/mips/include/asm/div64.h +++ linux-3maxp-div64/arch/mips/include/asm/div64.h @@ -1,5 +1,5 @@ /* - * Copyright (C) 2000, 2004 Maciej W. Rozycki + * Copyright (C) 2000, 2004, 2021 Maciej W. 
Rozycki * Copyright (C) 2003, 07 Ralf Baechle (ralf@linux-mips.org) * * This file is subject to the terms and conditions of the GNU General Public @@ -9,25 +9,18 @@ #ifndef __ASM_DIV64_H #define __ASM_DIV64_H -#include - -#if BITS_PER_LONG == 64 +#include -#include +#if BITS_PER_LONG == 32 /* * No traps on overflows for any of these... */ -#define __div64_32(n, base) \ -({ \ +#define do_div64_32(res, high, low, base) ({ \ unsigned long __cf, __tmp, __tmp2, __i; \ unsigned long __quot32, __mod32; \ - unsigned long __high, __low; \ - unsigned long long __n; \ \ - __high = *__n >> 32; \ - __low = __n; \ __asm__( \ " .set push \n" \ " .set noat \n" \ @@ -51,18 +44,50 @@ " subu %0, %0, %z6 \n" \ " addiu %2, %2, 1 \n" \ "3: \n" \ - " bnez %4, 0b\n\t" \ - " srl %5, %1, 0x1f\n\t" \ + " bnez %4, 0b \n" \ + " srl %5, %1, 0x1f \n" \ " .set pop" \ : "=&r" (__mod32), "=&r" (__tmp), \ "=&r" (__quot32), "=&r" (__cf), \ "=&r" (__i), "=&r" (__tmp2) \ - : "Jr" (base), "0" (__high), "1" (__low)); \ + : "Jr" (base), "0" (high), "1" (low)); \ \ - (__n) = __quot32; \ + (res) = __quot32; \ __mod32; \ }) -#endif /* BITS_PER_LONG == 64 */ +#define __div64_32(n, base) ({ \ + unsigned long __upper, __low, __high, __radix; \ + unsigned long long __modquot; \ + unsigned long long __quot; \ + unsigned long long __div; \ + unsigned long __mod; \ + \ + __div = (*n); \ + __radix = (base); \ + \ + __high = __div >> 32; \ + __low = __div; \ + __upper = __high; \ + \ + if (__high) { \ + __asm__("divu $0, %z1, %z2" \ + : "=x" (__modquot) \ + : "Jr" (__high), "Jr" (__radix)); \ + __upper = __modquot >> 32; \ + __high = __modquot; \ + } \ + \ + __mod = do_div64_32(__low, __upper, __low, __radix); \ + \ + __quot = __high; \ + __quot = __quot << 32 | __low; \ + (*n) = __quot; \ + __mod; \ +}) + +#endif /* BITS_PER_LONG == 32 */ + +#include #endif /* __ASM_DIV64_H */ From patchwork Tue Apr 20 02:50:48 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit 
X-Patchwork-Submitter: "Maciej W. Rozycki" X-Patchwork-Id: 12213221 Date: Tue, 20 Apr 2021 04:50:48 +0200 (CEST) From: "Maciej W. 
Rozycki" To: Arnd Bergmann , Thomas Bogendoerfer cc: Huacai Chen , Huacai Chen , Jiaxun Yang , linux-arch@vger.kernel.org, linux-mips@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH 4/4] MIPS: Avoid DIVU in `__div64_32' if result would be zero In-Reply-To: Message-ID: References: User-Agent: Alpine 2.21 (DEB 202 2017-01-01) MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-mips@vger.kernel.org We already check the high part of the dividend against zero to avoid the costly DIVU instruction in that case, needed to reduce the high part of the dividend, so we may as well check against the divisor instead and set the high part of the quotient to zero right away. We need to treat the high part of the dividend in that case though as the remainder that would be calculated by the DIVU instruction we avoided. This has passed correctness verification with test_div64 and reduced the module's average execution time down to 1.0445s and 0.2619s from 1.0668s and 0.2629s respectively for an R3400 CPU @40MHz and a 5Kc CPU @160MHz. Signed-off-by: Maciej W. Rozycki --- I have made an experimental change on top of this to put `__div64_32' out of line, and that increases the averages respectively up to 1.0785s and 0.2705s. Not a terrible loss, especially compared to generic times quoted with 3/4, but still, so I think it would best be made where optimising for size, as noted in the cover letter. 
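The shortcut described above can be sketched in portable C as follows. This is an illustration under stated assumptions rather than the code of this patch: `div64_32_sketch' is a hypothetical name, and a native 64-bit division stands in for the `do_div64_32' inline asm that produces the low 32 bits of the quotient.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical user-space sketch of the two-step 64-bit by 32-bit
 * division: divide *n by radix in place and return the remainder.
 */
static uint32_t div64_32_sketch(uint64_t *n, uint32_t radix)
{
	uint32_t high = (uint32_t)(*n >> 32);
	uint32_t low = (uint32_t)*n;
	uint32_t upper, mod;
	uint64_t rest;

	assert(radix != 0);	/* division by zero is not handled here */

	if (high < radix) {
		/*
		 * The high quotient word would be zero, so skip the
		 * division; the high dividend word is the remainder
		 * that division would have produced.
		 */
		upper = high;
		high = 0;
	} else {
		/* Step 1: reduce the high word (DIVU in the patch). */
		upper = high % radix;
		high = high / radix;
	}

	/*
	 * Step 2: upper < radix holds here, so the quotient of
	 * (upper:low) / radix fits in 32 bits.  The patch uses
	 * do_div64_32 for this; native division stands in here.
	 */
	rest = ((uint64_t)upper << 32) | low;
	low = (uint32_t)(rest / radix);
	mod = (uint32_t)(rest % radix);

	*n = ((uint64_t)high << 32) | low;
	return mod;
}
```

Either branch leaves `upper' smaller than `radix', which is the precondition the second step relies on; the shortcut only changes how that state is reached.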
--- arch/mips/include/asm/div64.h | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) Index: linux-3maxp-div64/arch/mips/include/asm/div64.h =================================================================== --- linux-3maxp-div64.orig/arch/mips/include/asm/div64.h +++ linux-3maxp-div64/arch/mips/include/asm/div64.h @@ -68,9 +68,11 @@ \ __high = __div >> 32; \ __low = __div; \ - __upper = __high; \ \ - if (__high) { \ + if (__high < __radix) { \ + __upper = __high; \ + __high = 0; \ + } else { \ __asm__("divu $0, %z1, %z2" \ : "=x" (__modquot) \ : "Jr" (__high), "Jr" (__radix)); \
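For readers without a MIPS toolchain to hand, the job `__div64_32' hands over to `do_div64_32' (32 quotient bits from a dividend whose high part is already smaller than the divisor) can be approximated with a shift-and-subtract loop. The following is a portable C sketch of that general technique with a hypothetical name, `div64_32_bitserial_sketch'; it is not a translation of the inline asm and makes no claim about its exact instruction sequence.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical sketch: divide the 64-bit value (high:low) by base,
 * assuming high < base so that the quotient fits in 32 bits.  Store
 * the quotient in *res and return the remainder.
 */
static uint32_t div64_32_bitserial_sketch(uint32_t *res, uint32_t high,
					   uint32_t low, uint32_t base)
{
	uint32_t mod = high;
	uint32_t quot = 0;
	uint32_t carry;
	int i;

	assert(base != 0);
	assert(high < base);

	for (i = 0; i < 32; i++) {
		/* Bit shifted out of the 33-bit partial remainder. */
		carry = mod >> 31;
		/* Bring in the next dividend bit, most significant first. */
		mod = (mod << 1) | (low >> 31);
		low <<= 1;
		quot <<= 1;
		/*
		 * With the carry set the true partial remainder is
		 * 2^32 + mod, which always exceeds base; unsigned
		 * wraparound makes the subtraction come out right.
		 */
		if (carry || mod >= base) {
			mod -= base;
			quot |= 1;
		}
	}

	*res = quot;
	return mod;
}
```

The `high < base' precondition is exactly what both branches of the patched `__div64_32' establish before the second step, either by the DIVU reduction or by the comparison that replaces it.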