From patchwork Wed Sep 23 18:22:29 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 11795449
X-Patchwork-Delegate: herbert@gondor.apana.org.au
Return-Path:
Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123])
	by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D43A76CA
	for ; Wed, 23 Sep 2020 18:22:43 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id AF14F235FD
	for ; Wed, 23 Sep 2020 18:22:43 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=default; t=1600885363;
	bh=bFlHKsUqlsvmTDL5pqgGS4guHekdp/JPWpXfv0hfy14=;
	h=From:To:Cc:Subject:Date:In-Reply-To:References:List-ID:From;
	b=q8x6c1QGAIds13XjBsJuKdKgyyrShUOsLXa0314ZB+07nhY5KuxcpOUWj+6SfFQvX
	uzfIWQzw4UmyDUp9WMuM8+T2d0eOeoAepGdMqMMcFyXLGJyBXvzpt/mL7Zm88JCMZs
	6a7buem+mzzfgQQXK2vUb2OrRIXl+DT876DMaJ78=
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1726604AbgIWSWn (ORCPT );
	Wed, 23 Sep 2020 14:22:43 -0400
Received: from mail.kernel.org ([198.145.29.99]:33808 "EHLO mail.kernel.org"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1726476AbgIWSWn (ORCPT );
	Wed, 23 Sep 2020 14:22:43 -0400
Received: from e123331-lin.nice.arm.com (lfbn-nic-1-188-42.w2-15.abo.wanadoo.fr [2.15.37.42])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPSA id 3BC4E2223E;
	Wed, 23 Sep 2020 18:22:41 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=default; t=1600885362;
	bh=bFlHKsUqlsvmTDL5pqgGS4guHekdp/JPWpXfv0hfy14=;
	h=From:To:Cc:Subject:Date:In-Reply-To:References:From;
	b=K/P72bHhIZtvjtP/44qQBqfp0GEUma0M//TEcuSrdSJaxLXS2e7/YHWuvMHoKs6Xd
	4k7ZY0jtXc1q/yB4LAZaDK4uz5bRbSTGW1GSmw7e1NgX7cIvOE3Op1MPIYxkfz8fpC
	3lUhq8pRmQONPQa+cXUZDorOD3p0ezsRJmNbegRc=
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org
Cc: herbert@gondor.apana.org.au, Ard Biesheuvel , Douglas Anderson , David Laight
Subject: [PATCH 1/2] crypto: xor - defer load time benchmark to a later time
Date: Wed, 23 Sep 2020 20:22:29 +0200
Message-Id: <20200923182230.22715-2-ardb@kernel.org>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20200923182230.22715-1-ardb@kernel.org>
References: <20200923182230.22715-1-ardb@kernel.org>
Precedence: bulk
List-ID:
X-Mailing-List: linux-crypto@vger.kernel.org

Currently, the XOR module performs its boot time benchmark at core
initcall time when it is built-in, to ensure that the RAID code can
make use of it when it is built-in as well.

Let's defer this to a later stage during the boot, to avoid impacting
the overall boot time of the system. Instead, just pick an arbitrary
implementation from the list, and use that as the preliminary default.

Signed-off-by: Ard Biesheuvel
Reviewed-by: Douglas Anderson
---
 crypto/xor.c | 29 +++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/crypto/xor.c b/crypto/xor.c
index ea7349e6ed23..b42c38343733 100644
--- a/crypto/xor.c
+++ b/crypto/xor.c
@@ -54,6 +54,28 @@ EXPORT_SYMBOL(xor_blocks);
 /* Set of all registered templates. */
 static struct xor_block_template *__initdata template_list;

+#ifndef MODULE
+static void __init do_xor_register(struct xor_block_template *tmpl)
+{
+	tmpl->next = template_list;
+	template_list = tmpl;
+}
+
+static int __init register_xor_blocks(void)
+{
+	active_template = XOR_SELECT_TEMPLATE(NULL);
+
+	if (!active_template) {
+#define xor_speed	do_xor_register
+		// register all the templates and pick the first as the default
+		XOR_TRY_TEMPLATES;
+#undef xor_speed
+		active_template = template_list;
+	}
+	return 0;
+}
+#endif
+
 #define BENCH_SIZE (PAGE_SIZE)

 static void __init
@@ -129,6 +151,7 @@ calibrate_xor_blocks(void)
 #define xor_speed(templ)	do_xor_speed((templ), b1, b2)

 	printk(KERN_INFO "xor: measuring software checksum speed\n");
+	template_list = NULL;
 	XOR_TRY_TEMPLATES;
 	fastest = template_list;
 	for (f = fastest; f; f = f->next)
@@ -150,6 +173,10 @@ static __exit void xor_exit(void) { }

 MODULE_LICENSE("GPL");

+#ifndef MODULE
 /* when built-in xor.o must initialize before drivers/md/md.o */
-core_initcall(calibrate_xor_blocks);
+core_initcall(register_xor_blocks);
+#endif
+
+module_init(calibrate_xor_blocks);
 module_exit(xor_exit);

From patchwork Wed Sep 23 18:22:30 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 11795451
X-Patchwork-Delegate: herbert@gondor.apana.org.au
Return-Path:
Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123])
	by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 693C66CA
	for ; Wed, 23 Sep 2020 18:22:45 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id 4EE592220D
	for ; Wed, 23 Sep 2020 18:22:45 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=default; t=1600885365;
	bh=0doP0aooxFGz+DXE+9HEDhM0qRnc9fhdhI2cG/YkZ4Q=;
	h=From:To:Cc:Subject:Date:In-Reply-To:References:List-ID:From;
	b=FTdcf/6JHhMfB5+e8pyfndTqVjQrW5JedG/mazAIZAQ5RT0KXNYkRR13TxwOIPiqp
	C8tU+YfS8Tktac6PN8Y42XUy7TFzYNoZLdGGa7JtHqvUu1lqkzjwukBsiryevXbrzk
	AowE4A1UVSo23zJ+D10V//qHOyRiBvnIi49LoQfA=
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1726636AbgIWSWo (ORCPT );
	Wed, 23 Sep 2020 14:22:44 -0400
Received: from mail.kernel.org ([198.145.29.99]:33826 "EHLO mail.kernel.org"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1726476AbgIWSWo (ORCPT );
	Wed, 23 Sep 2020 14:22:44 -0400
Received: from e123331-lin.nice.arm.com (lfbn-nic-1-188-42.w2-15.abo.wanadoo.fr [2.15.37.42])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPSA id CC478235F7;
	Wed, 23 Sep 2020 18:22:42 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=default; t=1600885364;
	bh=0doP0aooxFGz+DXE+9HEDhM0qRnc9fhdhI2cG/YkZ4Q=;
	h=From:To:Cc:Subject:Date:In-Reply-To:References:From;
	b=eW0TQObXNZLRjFux6mShjzE6nF4fx7+EimVuNwlsVZgctj5IVjkQ74/msuYCzyo3L
	N/Vyp+GTX2bcVG6Goom4+/JOH0M1dn9uKvgvOumNIF5UnAkfAwiqdCXsZ3DK86Ti8J
	cRo19sjpxLCiRVq5cwLJAVunqwjtlknu3XCr40VU=
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org
Cc: herbert@gondor.apana.org.au, Ard Biesheuvel , Douglas Anderson , David Laight
Subject: [PATCH 2/2] crypto: xor - use ktime for template benchmarking
Date: Wed, 23 Sep 2020 20:22:30 +0200
Message-Id: <20200923182230.22715-3-ardb@kernel.org>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20200923182230.22715-1-ardb@kernel.org>
References: <20200923182230.22715-1-ardb@kernel.org>
Precedence: bulk
List-ID:
X-Mailing-List: linux-crypto@vger.kernel.org

Currently, we use the jiffies counter as a time source, by staring at
it until a HZ period elapses, and then staring at it again while
performing as many XOR operations as we can until another HZ period
elapses, so that we can calculate the throughput. This takes longer
than necessary, and depends on HZ, which is undesirable, since HZ is
system dependent.

Let's use the ktime interface instead, and use it to time a fixed
number of XOR operations, which can be done much faster, and makes
the time spent depend on the performance level of the system itself,
which is much more reasonable.

On ThunderX2, I get the following results:

Before:

  [72625.956765] xor: measuring software checksum speed
  [72625.993104] 8regs : 10169.000 MB/sec
  [72626.033099] 32regs : 12050.000 MB/sec
  [72626.073095] arm64_neon: 11100.000 MB/sec
  [72626.073097] xor: using function: 32regs (12050.000 MB/sec)

After:

  [ 2503.189696] xor: measuring software checksum speed
  [ 2503.189896] 8regs : 10556 MB/sec
  [ 2503.190061] 32regs : 12538 MB/sec
  [ 2503.190250] arm64_neon : 11470 MB/sec
  [ 2503.190252] xor: using function: 32regs (12538 MB/sec)

Signed-off-by: Ard Biesheuvel
---
 crypto/xor.c | 36 ++++++++------------
 1 file changed, 15 insertions(+), 21 deletions(-)

diff --git a/crypto/xor.c b/crypto/xor.c
index b42c38343733..23f98b451b69 100644
--- a/crypto/xor.c
+++ b/crypto/xor.c
@@ -76,49 +76,43 @@ static int __init register_xor_blocks(void)
 }
 #endif

-#define BENCH_SIZE (PAGE_SIZE)
+#define BENCH_SIZE	4096
+#define REPS		100

 static void __init
 do_xor_speed(struct xor_block_template *tmpl, void *b1, void *b2)
 {
 	int speed;
-	unsigned long now, j;
-	int i, count, max;
+	int i, j, count;
+	ktime_t min, start, diff;

 	tmpl->next = template_list;
 	template_list = tmpl;

 	preempt_disable();

-	/*
-	 * Count the number of XORs done during a whole jiffy, and use
-	 * this to calculate the speed of checksumming. We use a 2-page
-	 * allocation to have guaranteed color L1-cache layout.
- */ - max = 0; + min = (ktime_t)S64_MAX; for (i = 0; i < 5; i++) { - j = jiffies; - count = 0; - while ((now = jiffies) == j) - cpu_relax(); - while (time_before(jiffies, now + 1)) { + start = ktime_get(); + for (j = 0; j < REPS; j++) { mb(); /* prevent loop optimzation */ tmpl->do_2(BENCH_SIZE, b1, b2); mb(); count++; mb(); } - if (count > max) - max = count; + diff = ktime_sub(ktime_get(), start); + if (diff < min) + min = diff; } preempt_enable(); - speed = max * (HZ * BENCH_SIZE / 1024); + // bytes/ns == GB/s, multiply by 1000 to get MB/s [not MiB/s] + speed = (1000 * REPS * BENCH_SIZE) / (u32)min; tmpl->speed = speed; - printk(KERN_INFO " %-10s: %5d.%03d MB/sec\n", tmpl->name, - speed / 1000, speed % 1000); + printk(KERN_INFO " %-16s: %5d MB/sec\n", tmpl->name, speed); } static int __init @@ -158,8 +152,8 @@ calibrate_xor_blocks(void) if (f->speed > fastest->speed) fastest = f; - printk(KERN_INFO "xor: using function: %s (%d.%03d MB/sec)\n", - fastest->name, fastest->speed / 1000, fastest->speed % 1000); + printk(KERN_INFO "xor: using function: %s (%d MB/sec)\n", + fastest->name, fastest->speed); #undef xor_speed