From patchwork Sat Sep 26 10:26:50 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 11801279 X-Patchwork-Delegate: herbert@gondor.apana.org.au Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 70C8A92C for ; Sat, 26 Sep 2020 10:27:00 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 56DEA2395C for ; Sat, 26 Sep 2020 10:27:00 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1601116020; bh=hyr7KlbrQjTIrBXGGORmyQbaJrD1a0wNg1Gzn4Ce0Js=; h=From:To:Cc:Subject:Date:In-Reply-To:References:List-ID:From; b=ZkCrlCgF/9c3UdgyUK7va/9r525pkqVMzyepPLSGcoo4xxSkdgzuqFhMwWnnFdxZf JIlbu/gRoiy0bQIv3zYWz41IRaq3vMZJZxNf8MhF7hD1mIiz1sSiG5iKisZ2jtt4AX M5r0u/DVvwt26ywCf2wlFVsHJUyu6OaN169hsWlc= Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726923AbgIZK07 (ORCPT ); Sat, 26 Sep 2020 06:26:59 -0400 Received: from mail.kernel.org ([198.145.29.99]:41644 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726210AbgIZK07 (ORCPT ); Sat, 26 Sep 2020 06:26:59 -0400 Received: from e123331-lin.nice.arm.com (lfbn-nic-1-188-42.w2-15.abo.wanadoo.fr [2.15.37.42]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id CD11B238E6; Sat, 26 Sep 2020 10:26:57 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1601116019; bh=hyr7KlbrQjTIrBXGGORmyQbaJrD1a0wNg1Gzn4Ce0Js=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=mPXT58qqdDuEWaH4Ne2Iv8zSwyzpWqXdYTqyQ+aYQvoJNt6hY34ksebLWAMVG9OEC lFaSI6aizV5CwOkRLsxNFB4LIDDt7dGDopQwEj9sHJenx1QQRRGOJMchEwygfalZQR O06lNUMd1PGXjW1AYOpFmZ9E8PY09m8pSSCTDCPw= From: Ard Biesheuvel To: linux-crypto@vger.kernel.org Cc: herbert@gondor.apana.org.au, Ard Biesheuvel , Douglas Anderson , David Laight Subject: [PATCH v2 1/2] crypto: xor - defer load time benchmark to a later time Date: Sat, 26 Sep 2020 12:26:50 +0200 Message-Id: <20200926102651.31598-2-ardb@kernel.org> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20200926102651.31598-1-ardb@kernel.org> References: <20200926102651.31598-1-ardb@kernel.org> Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org Currently, the XOR module performs its boot time benchmark at core initcall time when it is built-in, to ensure that the RAID code can make use of it when it is built-in as well. Let's defer this to a later stage during the boot, to avoid impacting the overall boot time of the system. Instead, just pick an arbitrary implementation from the list, and use that as the preliminary default. Reviewed-by: Douglas Anderson Signed-off-by: Ard Biesheuvel --- crypto/xor.c | 29 +++++++++++++++++++- 1 file changed, 28 insertions(+), 1 deletion(-) diff --git a/crypto/xor.c b/crypto/xor.c index ea7349e6ed23..b42c38343733 100644 --- a/crypto/xor.c +++ b/crypto/xor.c @@ -54,6 +54,28 @@ EXPORT_SYMBOL(xor_blocks); /* Set of all registered templates. */ static struct xor_block_template *__initdata template_list; +#ifndef MODULE +static void __init do_xor_register(struct xor_block_template *tmpl) +{ + tmpl->next = template_list; + template_list = tmpl; +} + +static int __init register_xor_blocks(void) +{ + active_template = XOR_SELECT_TEMPLATE(NULL); + + if (!active_template) { +#define xor_speed do_xor_register + // register all the templates and pick the first as the default + XOR_TRY_TEMPLATES; +#undef xor_speed + active_template = template_list; + } + return 0; +} +#endif + #define BENCH_SIZE (PAGE_SIZE) static void __init @@ -129,6 +151,7 @@ calibrate_xor_blocks(void) #define xor_speed(templ) do_xor_speed((templ), b1, b2) printk(KERN_INFO "xor: measuring software checksum speed\n"); + template_list = NULL; XOR_TRY_TEMPLATES; fastest = template_list; for (f = fastest; f; f = f->next) @@ -150,6 +173,10 @@ static __exit void xor_exit(void) { } MODULE_LICENSE("GPL"); +#ifndef MODULE /* when built-in xor.o must initialize before drivers/md/md.o */ -core_initcall(calibrate_xor_blocks); +core_initcall(register_xor_blocks); +#endif + +module_init(calibrate_xor_blocks); module_exit(xor_exit); From patchwork Sat Sep 26 10:26:51 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 11801281 X-Patchwork-Delegate: herbert@gondor.apana.org.au Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8FED692C for ; Sat, 26 Sep 2020 10:27:04 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 76D34238E6 for ; Sat, 26 Sep 2020 10:27:04 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1601116024; bh=/9CtSjxFQfSsddr24DMLGCw8WbTrj3ejM5RVjJ3ZANw=; h=From:To:Cc:Subject:Date:In-Reply-To:References:List-ID:From; b=GndZxE5tuhldNnPTDUU1ABQFxphEn8L8+3IVAQ5hXOrvpAUi9uYYjTSgF/F5IweU0 OjoGwQ4qN0eWPsmOLAfw0IQj0vtzB7cNAa8YfAZWQoutRRgeY1KthsPqJ2NoRDVxyD rf3NoHeY3jnl26+GG7s8n+RVi5RAqkEx+f7Gyqcw= Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726917AbgIZK1D (ORCPT ); Sat, 26 Sep 2020 06:27:03 -0400 Received: from mail.kernel.org ([198.145.29.99]:41684 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726210AbgIZK1B (ORCPT ); Sat, 26 Sep 2020 06:27:01 -0400 Received: from e123331-lin.nice.arm.com (lfbn-nic-1-188-42.w2-15.abo.wanadoo.fr [2.15.37.42]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 6A859238EE; Sat, 26 Sep 2020 10:26:59 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1601116020; bh=/9CtSjxFQfSsddr24DMLGCw8WbTrj3ejM5RVjJ3ZANw=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=0iiZ9/JKV0Q40qTMk+STOwH8vSE2/AmyRQcRXMcF5veebm5WS1deY2cRr9M2CbU/y 20QJaNOYa4eDKHORE/FtwjlrKkexEB6AzyOfSdz1aCrNMap7k5+eVR7eUl08+EhsnM 7PzjNqDLA3PDvRVRfQ1mygImdW3fOVdSuel4hmO8= From: Ard Biesheuvel To: linux-crypto@vger.kernel.org Cc: herbert@gondor.apana.org.au, Ard Biesheuvel , Douglas Anderson , David Laight Subject: [PATCH v2 2/2] crypto: xor - use ktime for template benchmarking Date: Sat, 26 Sep 2020 12:26:51 +0200 Message-Id: <20200926102651.31598-3-ardb@kernel.org> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20200926102651.31598-1-ardb@kernel.org> References: <20200926102651.31598-1-ardb@kernel.org> Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org Currently, we use the jiffies counter as a time source, by staring at it until a HZ period elapses, and then staring at it again and perform as many XOR operations as we can at the same time until another HZ period elapses, so that we can calculate the throughput. This takes longer than necessary, and depends on HZ, which is undesirable, since HZ is system dependent. Let's use the ktime interface instead, and use it to time a fixed number of XOR operations, which can be done much faster, and makes the time spent depend on the performance level of the system itself, which is much more reasonable. To ensure that we have the resolution we need even on systems with 32 kHz time sources, while not spending too much time in the benchmark on a slow CPU, let's switch to 3 attempts of 800 repetitions each: that way, we will only misidentify algorithms that perform within 10% of each other as the fastest if they are faster than 10 GB/s to begin with, which is not expected to occur on systems with such coarse clocks. On ThunderX2, I get the following results: Before: [72625.956765] xor: measuring software checksum speed [72625.993104] 8regs : 10169.000 MB/sec [72626.033099] 32regs : 12050.000 MB/sec [72626.073095] arm64_neon: 11100.000 MB/sec [72626.073097] xor: using function: 32regs (12050.000 MB/sec) After: [72599.650216] xor: measuring software checksum speed [72599.651188] 8regs : 10491 MB/sec [72599.652006] 32regs : 12345 MB/sec [72599.652871] arm64_neon : 11402 MB/sec [72599.652873] xor: using function: 32regs (12345 MB/sec) Link: https://lore.kernel.org/linux-crypto/20200923182230.22715-3-ardb@kernel.org/ Signed-off-by: Ard Biesheuvel Reviewed-by: Douglas Anderson --- crypto/xor.c | 38 +++++++++----------- 1 file changed, 16 insertions(+), 22 deletions(-) diff --git a/crypto/xor.c b/crypto/xor.c index b42c38343733..a0badbc03577 100644 --- a/crypto/xor.c +++ b/crypto/xor.c @@ -76,49 +76,43 @@ static int __init register_xor_blocks(void) } #endif -#define BENCH_SIZE (PAGE_SIZE) +#define BENCH_SIZE 4096 +#define REPS 800U static void __init do_xor_speed(struct xor_block_template *tmpl, void *b1, void *b2) { int speed; - unsigned long now, j; - int i, count, max; + int i, j, count; + ktime_t min, start, diff; tmpl->next = template_list; template_list = tmpl; preempt_disable(); - /* - * Count the number of XORs done during a whole jiffy, and use - * this to calculate the speed of checksumming. We use a 2-page - * allocation to have guaranteed color L1-cache layout. - */ - max = 0; - for (i = 0; i < 5; i++) { - j = jiffies; - count = 0; - while ((now = jiffies) == j) - cpu_relax(); - while (time_before(jiffies, now + 1)) { + min = (ktime_t)S64_MAX; + for (i = 0; i < 3; i++) { + start = ktime_get(); + for (j = 0; j < REPS; j++) { mb(); /* prevent loop optimzation */ tmpl->do_2(BENCH_SIZE, b1, b2); mb(); count++; mb(); } - if (count > max) - max = count; + diff = ktime_sub(ktime_get(), start); + if (diff < min) + min = diff; } preempt_enable(); - speed = max * (HZ * BENCH_SIZE / 1024); + // bytes/ns == GB/s, multiply by 1000 to get MB/s [not MiB/s] + speed = (1000 * REPS * BENCH_SIZE) / (unsigned int)ktime_to_ns(min); tmpl->speed = speed; - printk(KERN_INFO " %-10s: %5d.%03d MB/sec\n", tmpl->name, - speed / 1000, speed % 1000); + pr_info(" %-16s: %5d MB/sec\n", tmpl->name, speed); } static int __init @@ -158,8 +152,8 @@ calibrate_xor_blocks(void) if (f->speed > fastest->speed) fastest = f; - printk(KERN_INFO "xor: using function: %s (%d.%03d MB/sec)\n", - fastest->name, fastest->speed / 1000, fastest->speed % 1000); + pr_info("xor: using function: %s (%d MB/sec)\n", + fastest->name, fastest->speed); #undef xor_speed