From patchwork Mon Dec 19 22:02:11 2022
From: Robert Elliott <elliott@hpe.com>
To: herbert@gondor.apana.org.au, davem@davemloft.net, Jason@zx2c4.com, ardb@kernel.org, ap420073@gmail.com, David.Laight@ACULAB.COM, ebiggers@kernel.org, tim.c.chen@linux.intel.com, peter@n8pjl.ca, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com
Cc: linux-crypto@vger.kernel.org, x86@kernel.org, linux-kernel@vger.kernel.org, Robert Elliott <elliott@hpe.com>
Subject: [PATCH 01/13] x86: protect simd.h header file
Date: Mon, 19 Dec 2022 16:02:11 -0600
Message-Id: <20221219220223.3982176-2-elliott@hpe.com>
In-Reply-To: <20221219220223.3982176-1-elliott@hpe.com>
Add the usual #ifndef/#define include-guard construct around the
contents of simd.h so it doesn't confuse the C preprocessor if the
header is included by multiple include files.

Fixes: 801201aa2564 ("crypto: move x86 to the generic version of ablk_helper")
Signed-off-by: Robert Elliott <elliott@hpe.com>
---
 arch/x86/include/asm/simd.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/include/asm/simd.h b/arch/x86/include/asm/simd.h
index a341c878e977..bd9c672a2792 100644
--- a/arch/x86/include/asm/simd.h
+++ b/arch/x86/include/asm/simd.h
@@ -1,4 +1,6 @@
 /* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_SIMD_H
+#define _ASM_X86_SIMD_H
 
 #include <asm/fpu/api.h>
 
@@ -10,3 +12,5 @@ static __must_check inline bool may_use_simd(void)
 {
 	return irq_fpu_usable();
 }
+
+#endif /* _ASM_X86_SIMD_H */
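For illustration, a minimal sketch (hypothetical translation unit, not
part of the patch) of the failure mode the guard prevents:

/* Hypothetical example: without the #ifndef/#define guard, the second
 * inclusion would redefine may_use_simd() and break the build.  With
 * the guard, the duplicate inclusion is a no-op.
 */
#include <asm/simd.h>
#include <asm/simd.h>	/* harmless duplicate inclusion */

static bool can_use_simd(void)
{
	return may_use_simd();
}
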
From patchwork Mon Dec 19 22:02:12 2022
From: Robert Elliott <elliott@hpe.com>
To: herbert@gondor.apana.org.au, davem@davemloft.net, Jason@zx2c4.com, ardb@kernel.org, ap420073@gmail.com, David.Laight@ACULAB.COM, ebiggers@kernel.org, tim.c.chen@linux.intel.com, peter@n8pjl.ca, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com
Cc: linux-crypto@vger.kernel.org, x86@kernel.org, linux-kernel@vger.kernel.org, Robert Elliott <elliott@hpe.com>
Subject: [PATCH 02/13] x86: add yield FPU context utility function
Date: Mon, 19 Dec 2022 16:02:12 -0600
Message-Id: <20221219220223.3982176-3-elliott@hpe.com>
In-Reply-To: <20221219220223.3982176-1-elliott@hpe.com>

Add a function that may be called to avoid hogging the CPU between
kernel_fpu_begin() and kernel_fpu_end() calls.
Signed-off-by: Robert Elliott <elliott@hpe.com>
---
 arch/x86/include/asm/simd.h | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/arch/x86/include/asm/simd.h b/arch/x86/include/asm/simd.h
index bd9c672a2792..2c887dec95a2 100644
--- a/arch/x86/include/asm/simd.h
+++ b/arch/x86/include/asm/simd.h
@@ -3,6 +3,7 @@
 #define _ASM_X86_SIMD_H
 
 #include <asm/fpu/api.h>
+#include <linux/sched.h>
 
 /*
  * may_use_simd - whether it is allowable at this time to issue SIMD
@@ -13,4 +14,22 @@ static __must_check inline bool may_use_simd(void)
 	return irq_fpu_usable();
 }
 
+/**
+ * kernel_fpu_yield - yield the FPU context if the scheduler wants the CPU
+ *
+ * Call this periodically during long loops between kernel_fpu_begin()
+ * and kernel_fpu_end() calls to avoid hogging the CPU if the
+ * scheduler wants to use the CPU for another thread.
+ *
+ * Return: none
+ */
+static inline void kernel_fpu_yield(void)
+{
+	if (need_resched()) {
+		kernel_fpu_end();
+		cond_resched();
+		kernel_fpu_begin();
+	}
+}
+
 #endif /* _ASM_X86_SIMD_H */
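As a usage sketch of the new helper (hypothetical caller;
do_simd_work() is a placeholder, not a real kernel API — the later
patches in this series apply the same pattern to real transforms):

static void process_large_buffer(const u8 *buf, size_t len)
{
	kernel_fpu_begin();
	while (len) {
		size_t chunk = min_t(size_t, len, 4096);

		do_simd_work(buf, chunk);	/* placeholder SIMD routine */
		buf += chunk;
		len -= chunk;

		if (len)
			kernel_fpu_yield();	/* may end/begin the FPU context */
	}
	kernel_fpu_end();
}
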
From patchwork Mon Dec 19 22:02:13 2022
From: Robert Elliott <elliott@hpe.com>
To: herbert@gondor.apana.org.au, davem@davemloft.net, Jason@zx2c4.com, ardb@kernel.org, ap420073@gmail.com, David.Laight@ACULAB.COM, ebiggers@kernel.org, tim.c.chen@linux.intel.com, peter@n8pjl.ca, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com
Cc: linux-crypto@vger.kernel.org, x86@kernel.org, linux-kernel@vger.kernel.org, Robert Elliott <elliott@hpe.com>
Subject: [PATCH 03/13] crypto: x86/sha - yield FPU context during long loops
Date: Mon, 19 Dec 2022 16:02:13 -0600
Message-Id: <20221219220223.3982176-4-elliott@hpe.com>
In-Reply-To: <20221219220223.3982176-1-elliott@hpe.com>

The x86 assembly language implementations using SIMD process data
between kernel_fpu_begin() and kernel_fpu_end() calls. That disables
scheduler preemption, preventing the CPU core from being used by other
threads.

The update() and finup() functions might be called to process large
quantities of data, which can result in RCU stalls and soft lockups.

Periodically check if the kernel scheduler wants to run something else
on the CPU. If so, yield the kernel FPU context and let the scheduler
intervene.

Fixes: 66be89515888 ("crypto: sha1 - SSSE3 based SHA1 implementation for x86-64")
Fixes: 8275d1aa6422 ("crypto: sha256 - Create module providing optimized SHA256 routines using SSSE3, AVX or AVX2 instructions.")
Fixes: 87de4579f92d ("crypto: sha512 - Create module providing optimized SHA512 routines using SSSE3, AVX or AVX2 instructions.")
Fixes: aa031b8f702e ("crypto: x86/sha512 - load based on CPU features")
Suggested-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Robert Elliott <elliott@hpe.com>
---
 arch/x86/crypto/sha1_avx2_x86_64_asm.S |   6 +-
 arch/x86/crypto/sha1_ni_asm.S          |   8 +-
 arch/x86/crypto/sha1_ssse3_glue.c      | 120 ++++++++++++++++++++-----
 arch/x86/crypto/sha256_ni_asm.S        |   8 +-
 arch/x86/crypto/sha256_ssse3_glue.c    | 115 +++++++++++++++++++-----
 arch/x86/crypto/sha512_ssse3_glue.c    |  89 ++++++++++++++----
 6 files changed, 277 insertions(+), 69 deletions(-)

diff --git a/arch/x86/crypto/sha1_avx2_x86_64_asm.S b/arch/x86/crypto/sha1_avx2_x86_64_asm.S
index c3ee9334cb0f..df03fbb2c42c 100644
--- a/arch/x86/crypto/sha1_avx2_x86_64_asm.S
+++ b/arch/x86/crypto/sha1_avx2_x86_64_asm.S
@@ -58,9 +58,9 @@
 /*
  * SHA-1 implementation with Intel(R) AVX2 instruction set extensions.
  *
- *This implementation is based on the previous SSSE3 release:
- *Visit http://software.intel.com/en-us/articles/
- *and refer to improving-the-performance-of-the-secure-hash-algorithm-1/
+ * This implementation is based on the previous SSSE3 release:
+ * Visit http://software.intel.com/en-us/articles/
+ * and refer to improving-the-performance-of-the-secure-hash-algorithm-1/
  *
  */

diff --git a/arch/x86/crypto/sha1_ni_asm.S b/arch/x86/crypto/sha1_ni_asm.S
index a69595b033c8..d513b85e242c 100644
--- a/arch/x86/crypto/sha1_ni_asm.S
+++ b/arch/x86/crypto/sha1_ni_asm.S
@@ -75,7 +75,7 @@
 .text
 
 /**
- * sha1_ni_transform - Calculate SHA1 hash using the x86 SHA-NI feature set
+ * sha1_transform_ni - Calculate SHA1 hash using the x86 SHA-NI feature set
  * @digest: address of current 20-byte hash value (%rdi, DIGEST_PTR macro)
  * @data: address of data (%rsi, DATA_PTR macro);
  *	  data size must be a multiple of 64 bytes
@@ -94,9 +94,9 @@
  * The non-indented lines are instructions related to the message schedule.
  *
  * Return: none
- * Prototype: asmlinkage void sha1_ni_transform(u32 *digest, const u8 *data, int blocks)
+ * Prototype: asmlinkage void sha1_transform_ni(u32 *digest, const u8 *data, int blocks)
  */
-SYM_TYPED_FUNC_START(sha1_ni_transform)
+SYM_TYPED_FUNC_START(sha1_transform_ni)
 	push		%rbp
 	mov		%rsp, %rbp
 	sub		$FRAME_SIZE, %rsp
@@ -294,7 +294,7 @@ SYM_TYPED_FUNC_START(sha1_ni_transform)
 	pop		%rbp
 
 	RET
-SYM_FUNC_END(sha1_ni_transform)
+SYM_FUNC_END(sha1_transform_ni)
 
 .section	.rodata.cst16.PSHUFFLE_BYTE_FLIP_MASK, "aM", @progbits, 16
 .align 16

diff --git a/arch/x86/crypto/sha1_ssse3_glue.c b/arch/x86/crypto/sha1_ssse3_glue.c
index 44340a1139e0..b269b455fbbe 100644
--- a/arch/x86/crypto/sha1_ssse3_glue.c
+++ b/arch/x86/crypto/sha1_ssse3_glue.c
@@ -41,9 +41,7 @@ static int sha1_update(struct shash_desc *desc, const u8 *data,
 	 */
 	BUILD_BUG_ON(offsetof(struct sha1_state, state) != 0);
 
-	kernel_fpu_begin();
 	sha1_base_do_update(desc, data, len, sha1_xform);
-	kernel_fpu_end();
 
 	return 0;
 }
@@ -54,28 +52,46 @@ static int sha1_finup(struct shash_desc *desc, const u8 *data,
 	if (!crypto_simd_usable())
 		return crypto_sha1_finup(desc, data, len, out);
 
-	kernel_fpu_begin();
 	if (len)
 		sha1_base_do_update(desc, data, len, sha1_xform);
 	sha1_base_do_finalize(desc, sha1_xform);
-	kernel_fpu_end();
 
 	return sha1_base_finish(desc, out);
 }
 
-asmlinkage void sha1_transform_ssse3(struct sha1_state *state,
-				     const u8 *data, int blocks);
+asmlinkage void sha1_transform_ssse3(u32 *digest, const u8 *data, int blocks);
+
+void __sha1_transform_ssse3(struct sha1_state *state, const u8 *data, int blocks)
+{
+	if (blocks <= 0)
+		return;
+
+	kernel_fpu_begin();
+	for (;;) {
+		const int chunks = min(blocks, 4096 / SHA1_BLOCK_SIZE);
+
+		sha1_transform_ssse3(state->state, data, chunks);
+		data += chunks * SHA1_BLOCK_SIZE;
+		blocks -= chunks;
+
+		if (blocks <= 0)
+			break;
+
+		kernel_fpu_yield();
+	}
+	kernel_fpu_end();
+}
 
 static int sha1_ssse3_update(struct shash_desc *desc, const u8 *data,
 			     unsigned int len)
 {
-	return sha1_update(desc, data, len, sha1_transform_ssse3);
+	return sha1_update(desc, data, len, __sha1_transform_ssse3);
 }
 
 static int sha1_ssse3_finup(struct shash_desc *desc, const u8 *data,
 			    unsigned int len, u8 *out)
 {
-	return sha1_finup(desc, data, len, out, sha1_transform_ssse3);
+	return sha1_finup(desc, data, len, out, __sha1_transform_ssse3);
 }
 
 /* Add padding and return the message digest. */
*/ @@ -113,19 +129,39 @@ static void unregister_sha1_ssse3(void) crypto_unregister_shash(&sha1_ssse3_alg); } -asmlinkage void sha1_transform_avx(struct sha1_state *state, - const u8 *data, int blocks); +asmlinkage void sha1_transform_avx(u32 *digest, const u8 *data, int blocks); + +void __sha1_transform_avx(struct sha1_state *state, const u8 *data, int blocks) +{ + if (blocks <= 0) + return; + + kernel_fpu_begin(); + for (;;) { + const int chunks = min(blocks, 4096 / SHA1_BLOCK_SIZE); + + sha1_transform_avx(state->state, data, chunks); + data += chunks * SHA1_BLOCK_SIZE; + blocks -= chunks; + + if (blocks <= 0) + break; + + kernel_fpu_yield(); + } + kernel_fpu_end(); +} static int sha1_avx_update(struct shash_desc *desc, const u8 *data, unsigned int len) { - return sha1_update(desc, data, len, sha1_transform_avx); + return sha1_update(desc, data, len, __sha1_transform_avx); } static int sha1_avx_finup(struct shash_desc *desc, const u8 *data, unsigned int len, u8 *out) { - return sha1_finup(desc, data, len, out, sha1_transform_avx); + return sha1_finup(desc, data, len, out, __sha1_transform_avx); } static int sha1_avx_final(struct shash_desc *desc, u8 *out) @@ -175,8 +211,28 @@ static void unregister_sha1_avx(void) #define SHA1_AVX2_BLOCK_OPTSIZE 4 /* optimal 4*64 bytes of SHA1 blocks */ -asmlinkage void sha1_transform_avx2(struct sha1_state *state, - const u8 *data, int blocks); +asmlinkage void sha1_transform_avx2(u32 *digest, const u8 *data, int blocks); + +void __sha1_transform_avx2(struct sha1_state *state, const u8 *data, int blocks) +{ + if (blocks <= 0) + return; + + kernel_fpu_begin(); + for (;;) { + const int chunks = min(blocks, 4096 / SHA1_BLOCK_SIZE); + + sha1_transform_avx2(state->state, data, chunks); + data += chunks * SHA1_BLOCK_SIZE; + blocks -= chunks; + + if (blocks <= 0) + break; + + kernel_fpu_yield(); + } + kernel_fpu_end(); +} static bool avx2_usable(void) { @@ -193,9 +249,9 @@ static void sha1_apply_transform_avx2(struct sha1_state *state, { /* Select the optimal transform based on data block size */ if (blocks >= SHA1_AVX2_BLOCK_OPTSIZE) - sha1_transform_avx2(state, data, blocks); + __sha1_transform_avx2(state, data, blocks); else - sha1_transform_avx(state, data, blocks); + __sha1_transform_avx(state, data, blocks); } static int sha1_avx2_update(struct shash_desc *desc, const u8 *data, @@ -245,19 +301,39 @@ static void unregister_sha1_avx2(void) } #ifdef CONFIG_AS_SHA1_NI -asmlinkage void sha1_ni_transform(struct sha1_state *digest, const u8 *data, - int rounds); +asmlinkage void sha1_transform_ni(u32 *digest, const u8 *data, int rounds); + +void __sha1_transform_ni(struct sha1_state *state, const u8 *data, int blocks) +{ + if (blocks <= 0) + return; + + kernel_fpu_begin(); + for (;;) { + const int chunks = min(blocks, 4096 / SHA1_BLOCK_SIZE); + + sha1_transform_ni(state->state, data, chunks); + data += chunks * SHA1_BLOCK_SIZE; + blocks -= chunks; + + if (blocks <= 0) + break; + + kernel_fpu_yield(); + } + kernel_fpu_end(); +} static int sha1_ni_update(struct shash_desc *desc, const u8 *data, - unsigned int len) + unsigned int len) { - return sha1_update(desc, data, len, sha1_ni_transform); + return sha1_update(desc, data, len, __sha1_transform_ni); } static int sha1_ni_finup(struct shash_desc *desc, const u8 *data, - unsigned int len, u8 *out) + unsigned int len, u8 *out) { - return sha1_finup(desc, data, len, out, sha1_ni_transform); + return sha1_finup(desc, data, len, out, __sha1_transform_ni); } static int sha1_ni_final(struct shash_desc *desc, u8 *out) diff 
diff --git a/arch/x86/crypto/sha256_ni_asm.S b/arch/x86/crypto/sha256_ni_asm.S
index e7a3b9939327..29458ec970a9 100644
--- a/arch/x86/crypto/sha256_ni_asm.S
+++ b/arch/x86/crypto/sha256_ni_asm.S
@@ -79,7 +79,7 @@
 .text
 
 /**
- * sha256_ni_transform - Calculate SHA256 hash using the x86 SHA-NI feature set
+ * sha256_transform_ni - Calculate SHA256 hash using the x86 SHA-NI feature set
  * @digest: address of current 32-byte hash value (%rdi, DIGEST_PTR macro)
  * @data: address of data (%rsi, DATA_PTR macro);
  *	  data size must be a multiple of 64 bytes
@@ -98,9 +98,9 @@
  * The non-indented lines are instructions related to the message schedule.
  *
  * Return: none
- * Prototype: asmlinkage void sha256_ni_transform(u32 *digest, const u8 *data, int blocks)
+ * Prototype: asmlinkage void sha256_transform_ni(u32 *digest, const u8 *data, int blocks)
  */
-SYM_TYPED_FUNC_START(sha256_ni_transform)
+SYM_TYPED_FUNC_START(sha256_transform_ni)
 	shl		$6, NUM_BLKS		/* convert to bytes */
 	jz		.Ldone_hash
 	add		DATA_PTR, NUM_BLKS	/* pointer to end of data */
@@ -329,7 +329,7 @@ SYM_TYPED_FUNC_START(sha256_ni_transform)
 .Ldone_hash:
 
 	RET
-SYM_FUNC_END(sha256_ni_transform)
+SYM_FUNC_END(sha256_transform_ni)
 
 .section	.rodata.cst256.K256, "aM", @progbits, 256
 .align 64

diff --git a/arch/x86/crypto/sha256_ssse3_glue.c b/arch/x86/crypto/sha256_ssse3_glue.c
index 3a5f6be7dbba..43927cf3d06e 100644
--- a/arch/x86/crypto/sha256_ssse3_glue.c
+++ b/arch/x86/crypto/sha256_ssse3_glue.c
@@ -40,8 +40,28 @@
 #include <asm/cpu_device_id.h>
 #include <asm/simd.h>
 
-asmlinkage void sha256_transform_ssse3(struct sha256_state *state,
-				       const u8 *data, int blocks);
+asmlinkage void sha256_transform_ssse3(u32 *digest, const u8 *data, int blocks);
+
+void __sha256_transform_ssse3(struct sha256_state *state, const u8 *data, int blocks)
+{
+	if (blocks <= 0)
+		return;
+
+	kernel_fpu_begin();
+	for (;;) {
+		const int chunks = min(blocks, 4096 / SHA256_BLOCK_SIZE);
+
+		sha256_transform_ssse3(state->state, data, chunks);
+		data += chunks * SHA256_BLOCK_SIZE;
+		blocks -= chunks;
+
+		if (blocks <= 0)
+			break;
+
+		kernel_fpu_yield();
+	}
+	kernel_fpu_end();
+}
 
 static int _sha256_update(struct shash_desc *desc, const u8 *data,
 			  unsigned int len, sha256_block_fn *sha256_xform)
@@ -58,9 +78,7 @@ static int _sha256_update(struct shash_desc *desc, const u8 *data,
 	 */
 	BUILD_BUG_ON(offsetof(struct sha256_state, state) != 0);
 
-	kernel_fpu_begin();
 	sha256_base_do_update(desc, data, len, sha256_xform);
-	kernel_fpu_end();
 
 	return 0;
 }
@@ -71,11 +89,9 @@ static int sha256_finup(struct shash_desc *desc, const u8 *data,
 	if (!crypto_simd_usable())
 		return crypto_sha256_finup(desc, data, len, out);
 
-	kernel_fpu_begin();
 	if (len)
 		sha256_base_do_update(desc, data, len, sha256_xform);
 	sha256_base_do_finalize(desc, sha256_xform);
-	kernel_fpu_end();
 
 	return sha256_base_finish(desc, out);
 }
@@ -83,13 +99,13 @@ static int sha256_finup(struct shash_desc *desc, const u8 *data,
 static int sha256_ssse3_update(struct shash_desc *desc, const u8 *data,
 			       unsigned int len)
 {
-	return _sha256_update(desc, data, len, sha256_transform_ssse3);
+	return _sha256_update(desc, data, len, __sha256_transform_ssse3);
 }
 
 static int sha256_ssse3_finup(struct shash_desc *desc, const u8 *data,
 			      unsigned int len, u8 *out)
 {
-	return sha256_finup(desc, data, len, out, sha256_transform_ssse3);
+	return sha256_finup(desc, data, len, out, __sha256_transform_ssse3);
 }
 
 /* Add padding and return the message digest. */
*/ @@ -143,19 +159,39 @@ static void unregister_sha256_ssse3(void) ARRAY_SIZE(sha256_ssse3_algs)); } -asmlinkage void sha256_transform_avx(struct sha256_state *state, - const u8 *data, int blocks); +asmlinkage void sha256_transform_avx(u32 *digest, const u8 *data, int blocks); + +void __sha256_transform_avx(struct sha256_state *state, const u8 *data, int blocks) +{ + if (blocks <= 0) + return; + + kernel_fpu_begin(); + for (;;) { + const int chunks = min(blocks, 4096 / SHA256_BLOCK_SIZE); + + sha256_transform_avx(state->state, data, chunks); + data += chunks * SHA256_BLOCK_SIZE; + blocks -= chunks; + + if (blocks <= 0) + break; + + kernel_fpu_yield(); + } + kernel_fpu_end(); +} static int sha256_avx_update(struct shash_desc *desc, const u8 *data, unsigned int len) { - return _sha256_update(desc, data, len, sha256_transform_avx); + return _sha256_update(desc, data, len, __sha256_transform_avx); } static int sha256_avx_finup(struct shash_desc *desc, const u8 *data, unsigned int len, u8 *out) { - return sha256_finup(desc, data, len, out, sha256_transform_avx); + return sha256_finup(desc, data, len, out, __sha256_transform_avx); } static int sha256_avx_final(struct shash_desc *desc, u8 *out) @@ -219,19 +255,39 @@ static void unregister_sha256_avx(void) ARRAY_SIZE(sha256_avx_algs)); } -asmlinkage void sha256_transform_rorx(struct sha256_state *state, - const u8 *data, int blocks); +asmlinkage void sha256_transform_rorx(u32 *state, const u8 *data, int blocks); + +void __sha256_transform_avx2(struct sha256_state *state, const u8 *data, int blocks) +{ + if (blocks <= 0) + return; + + kernel_fpu_begin(); + for (;;) { + const int chunks = min(blocks, 4096 / SHA256_BLOCK_SIZE); + + sha256_transform_rorx(state->state, data, chunks); + data += chunks * SHA256_BLOCK_SIZE; + blocks -= chunks; + + if (blocks <= 0) + break; + + kernel_fpu_yield(); + } + kernel_fpu_end(); +} static int sha256_avx2_update(struct shash_desc *desc, const u8 *data, unsigned int len) { - return _sha256_update(desc, data, len, sha256_transform_rorx); + return _sha256_update(desc, data, len, __sha256_transform_avx2); } static int sha256_avx2_finup(struct shash_desc *desc, const u8 *data, unsigned int len, u8 *out) { - return sha256_finup(desc, data, len, out, sha256_transform_rorx); + return sha256_finup(desc, data, len, out, __sha256_transform_avx2); } static int sha256_avx2_final(struct shash_desc *desc, u8 *out) @@ -294,19 +350,38 @@ static void unregister_sha256_avx2(void) } #ifdef CONFIG_AS_SHA256_NI -asmlinkage void sha256_ni_transform(struct sha256_state *digest, - const u8 *data, int rounds); +asmlinkage void sha256_transform_ni(u32 *digest, const u8 *data, int rounds); + +void __sha256_transform_ni(struct sha256_state *state, const u8 *data, int blocks) +{ + if (blocks <= 0) + return; + + kernel_fpu_begin(); + for (;;) { + const int chunks = min(blocks, 4096 / SHA256_BLOCK_SIZE); + sha256_transform_ni(state->state, data, chunks); + data += chunks * SHA256_BLOCK_SIZE; + blocks -= chunks; + + if (blocks <= 0) + break; + + kernel_fpu_yield(); + } + kernel_fpu_end(); +} static int sha256_ni_update(struct shash_desc *desc, const u8 *data, unsigned int len) { - return _sha256_update(desc, data, len, sha256_ni_transform); + return _sha256_update(desc, data, len, __sha256_transform_ni); } static int sha256_ni_finup(struct shash_desc *desc, const u8 *data, unsigned int len, u8 *out) { - return sha256_finup(desc, data, len, out, sha256_ni_transform); + return sha256_finup(desc, data, len, out, __sha256_transform_ni); } static int 
diff --git a/arch/x86/crypto/sha512_ssse3_glue.c b/arch/x86/crypto/sha512_ssse3_glue.c
index 6d3b85e53d0e..cb6aad9d5052 100644
--- a/arch/x86/crypto/sha512_ssse3_glue.c
+++ b/arch/x86/crypto/sha512_ssse3_glue.c
@@ -39,8 +39,28 @@
 #include <asm/cpu_device_id.h>
 #include <asm/simd.h>
 
-asmlinkage void sha512_transform_ssse3(struct sha512_state *state,
-				       const u8 *data, int blocks);
+asmlinkage void sha512_transform_ssse3(u64 *digest, const u8 *data, int blocks);
+
+void __sha512_transform_ssse3(struct sha512_state *state, const u8 *data, int blocks)
+{
+	if (blocks <= 0)
+		return;
+
+	kernel_fpu_begin();
+	for (;;) {
+		const int chunks = min(blocks, 4096 / SHA512_BLOCK_SIZE);
+
+		sha512_transform_ssse3(&state->state[0], data, chunks);
+		data += chunks * SHA512_BLOCK_SIZE;
+		blocks -= chunks;
+
+		if (blocks <= 0)
+			break;
+
+		kernel_fpu_yield();
+	}
+	kernel_fpu_end();
+}
 
 static int sha512_update(struct shash_desc *desc, const u8 *data,
 			 unsigned int len, sha512_block_fn *sha512_xform)
@@ -57,9 +77,7 @@ static int sha512_update(struct shash_desc *desc, const u8 *data,
 	 */
 	BUILD_BUG_ON(offsetof(struct sha512_state, state) != 0);
 
-	kernel_fpu_begin();
 	sha512_base_do_update(desc, data, len, sha512_xform);
-	kernel_fpu_end();
 
 	return 0;
 }
@@ -70,11 +88,9 @@ static int sha512_finup(struct shash_desc *desc, const u8 *data,
 	if (!crypto_simd_usable())
 		return crypto_sha512_finup(desc, data, len, out);
 
-	kernel_fpu_begin();
 	if (len)
 		sha512_base_do_update(desc, data, len, sha512_xform);
 	sha512_base_do_finalize(desc, sha512_xform);
-	kernel_fpu_end();
 
 	return sha512_base_finish(desc, out);
 }
@@ -82,13 +98,13 @@ static int sha512_finup(struct shash_desc *desc, const u8 *data,
 static int sha512_ssse3_update(struct shash_desc *desc, const u8 *data,
 			       unsigned int len)
 {
-	return sha512_update(desc, data, len, sha512_transform_ssse3);
+	return sha512_update(desc, data, len, __sha512_transform_ssse3);
 }
 
 static int sha512_ssse3_finup(struct shash_desc *desc, const u8 *data,
 			      unsigned int len, u8 *out)
 {
-	return sha512_finup(desc, data, len, out, sha512_transform_ssse3);
+	return sha512_finup(desc, data, len, out, __sha512_transform_ssse3);
 }
 
 /* Add padding and return the message digest. */
@@ -142,8 +158,29 @@ static void unregister_sha512_ssse3(void)
 					ARRAY_SIZE(sha512_ssse3_algs));
 }
 
-asmlinkage void sha512_transform_avx(struct sha512_state *state,
-				     const u8 *data, int blocks);
+asmlinkage void sha512_transform_avx(u64 *digest, const u8 *data, int blocks);
+
+void __sha512_transform_avx(struct sha512_state *state, const u8 *data, int blocks)
+{
+	if (blocks <= 0)
+		return;
+
+	kernel_fpu_begin();
+	for (;;) {
+		const int chunks = min(blocks, 4096 / SHA512_BLOCK_SIZE);
+
+		sha512_transform_avx(state->state, data, chunks);
+		data += chunks * SHA512_BLOCK_SIZE;
+		blocks -= chunks;
+
+		if (blocks <= 0)
+			break;
+
+		kernel_fpu_yield();
+	}
+	kernel_fpu_end();
+}
+
 static bool avx_usable(void)
 {
 	if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL)) {
@@ -158,13 +195,13 @@ static bool avx_usable(void)
 static int sha512_avx_update(struct shash_desc *desc, const u8 *data,
 			     unsigned int len)
 {
-	return sha512_update(desc, data, len, sha512_transform_avx);
+	return sha512_update(desc, data, len, __sha512_transform_avx);
 }
 
 static int sha512_avx_finup(struct shash_desc *desc, const u8 *data,
 			    unsigned int len, u8 *out)
 {
-	return sha512_finup(desc, data, len, out, sha512_transform_avx);
+	return sha512_finup(desc, data, len, out, __sha512_transform_avx);
 }
 
 /* Add padding and return the message digest. */
@@ -218,19 +255,39 @@ static void unregister_sha512_avx(void)
 					ARRAY_SIZE(sha512_avx_algs));
 }
 
-asmlinkage void sha512_transform_rorx(struct sha512_state *state,
-				      const u8 *data, int blocks);
+asmlinkage void sha512_transform_rorx(u64 *digest, const u8 *data, int blocks);
+
+void __sha512_transform_avx2(struct sha512_state *state, const u8 *data, int blocks)
+{
+	if (blocks <= 0)
+		return;
+
+	kernel_fpu_begin();
+	for (;;) {
+		const int chunks = min(blocks, 4096 / SHA512_BLOCK_SIZE);
+
+		sha512_transform_rorx(state->state, data, chunks);
+		data += chunks * SHA512_BLOCK_SIZE;
+		blocks -= chunks;
+
+		if (blocks <= 0)
+			break;
+
+		kernel_fpu_yield();
+	}
+	kernel_fpu_end();
+}
 
 static int sha512_avx2_update(struct shash_desc *desc, const u8 *data,
 			      unsigned int len)
 {
-	return sha512_update(desc, data, len, sha512_transform_rorx);
+	return sha512_update(desc, data, len, __sha512_transform_avx2);
 }
 
 static int sha512_avx2_finup(struct shash_desc *desc, const u8 *data,
 			     unsigned int len, u8 *out)
 {
-	return sha512_finup(desc, data, len, out, sha512_transform_rorx);
+	return sha512_finup(desc, data, len, out, __sha512_transform_avx2);
 }
 
 /* Add padding and return the message digest. */
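The chunking above bounds each kernel_fpu section to 4 KiB of input.
As a worked example (block sizes per the kernel's SHA definitions):

/* 4096 bytes of input per FPU section:
 *   SHA-1:   4096 / SHA1_BLOCK_SIZE   (64)  = 64 blocks per section
 *   SHA-256: 4096 / SHA256_BLOCK_SIZE (64)  = 64 blocks per section
 *   SHA-512: 4096 / SHA512_BLOCK_SIZE (128) = 32 blocks per section
 */
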
From patchwork Mon Dec 19 22:02:14 2022
From: Robert Elliott <elliott@hpe.com>
To: herbert@gondor.apana.org.au, davem@davemloft.net, Jason@zx2c4.com, ardb@kernel.org, ap420073@gmail.com, David.Laight@ACULAB.COM, ebiggers@kernel.org, tim.c.chen@linux.intel.com, peter@n8pjl.ca, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com
Cc: linux-crypto@vger.kernel.org, x86@kernel.org, linux-kernel@vger.kernel.org, Robert Elliott <elliott@hpe.com>
Subject: [PATCH 04/13] crypto: x86/crc - yield FPU context during long loops
Date: Mon, 19 Dec 2022 16:02:14 -0600
Message-Id: <20221219220223.3982176-5-elliott@hpe.com>
In-Reply-To: <20221219220223.3982176-1-elliott@hpe.com>
The x86 assembly language implementations using SIMD process data
between kernel_fpu_begin() and kernel_fpu_end() calls. That disables
scheduler preemption, preventing the CPU core from being used by other
threads.

The update() and finup() functions might be called to process large
quantities of data, which can result in RCU stalls and soft lockups.

Periodically check if the kernel scheduler wants to run something else
on the CPU. If so, yield the kernel FPU context and let the scheduler
intervene.

For crc32, add a pre-alignment loop so the assembly language function
is not repeatedly called with an unaligned starting address.

Fixes: 78c37d191dd6 ("crypto: crc32 - add crc32 pclmulqdq implementation and wrappers for table implementation")
Fixes: 6a8ce1ef3940 ("crypto: crc32c - Optimize CRC32C calculation with PCLMULQDQ instruction")
Fixes: 0b95a7f85718 ("crypto: crct10dif - Glue code to cast accelerated CRCT10DIF assembly as a crypto transform")
Suggested-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Robert Elliott <elliott@hpe.com>
---
 arch/x86/crypto/crc32-pclmul_glue.c     |  49 +++++-----
 arch/x86/crypto/crc32c-intel_glue.c     | 118 +++++++++++++++++-------
 arch/x86/crypto/crct10dif-pclmul_glue.c |  65 ++++++++++---
 3 files changed, 165 insertions(+), 67 deletions(-)

diff --git a/arch/x86/crypto/crc32-pclmul_glue.c b/arch/x86/crypto/crc32-pclmul_glue.c
index 98cf3b4e4c9f..3692b50faf1c 100644
--- a/arch/x86/crypto/crc32-pclmul_glue.c
+++ b/arch/x86/crypto/crc32-pclmul_glue.c
@@ -41,41 +41,50 @@
 #define CHKSUM_BLOCK_SIZE	1
 #define CHKSUM_DIGEST_SIZE	4
 
-#define PCLMUL_MIN_LEN		64L	/* minimum size of buffer
-					 * for crc32_pclmul_le_16 */
-#define SCALE_F			16L	/* size of xmm register */
+#define PCLMUL_MIN_LEN		64U	/* minimum size of buffer for crc32_pclmul_le_16 */
+#define SCALE_F			16U	/* size of xmm register */
 #define SCALE_F_MASK		(SCALE_F - 1)
 
-u32 crc32_pclmul_le_16(unsigned char const *buffer, size_t len, u32 crc32);
+asmlinkage u32 crc32_pclmul_le_16(const u8 *buffer, unsigned int len, u32 crc32);
 
-static u32 __attribute__((pure))
-	crc32_pclmul_le(u32 crc, unsigned char const *p, size_t len)
+static u32 crc32_pclmul_le(u32 crc, const u8 *p, unsigned int len)
 {
 	unsigned int iquotient;
 	unsigned int iremainder;
-	unsigned int prealign;
 
 	if (len < PCLMUL_MIN_LEN + SCALE_F_MASK || !crypto_simd_usable())
 		return crc32_le(crc, p, len);
 
-	if ((long)p & SCALE_F_MASK) {
+	if ((unsigned long)p & SCALE_F_MASK) {
 		/* align p to 16 byte */
-		prealign = SCALE_F - ((long)p & SCALE_F_MASK);
+		unsigned int prealign = SCALE_F - ((unsigned long)p & SCALE_F_MASK);
 
 		crc = crc32_le(crc, p, prealign);
 		len -= prealign;
-		p = (unsigned char *)(((unsigned long)p + SCALE_F_MASK) &
-				     ~SCALE_F_MASK);
+		p += prealign;
 	}
 
-	iquotient = len & (~SCALE_F_MASK);
+	iquotient = len & ~SCALE_F_MASK;
 	iremainder = len & SCALE_F_MASK;
 
-	kernel_fpu_begin();
-	crc = crc32_pclmul_le_16(p, iquotient, crc);
-	kernel_fpu_end();
+	if (iquotient) {
+		kernel_fpu_begin();
+		for (;;) {
+			const unsigned int chunk = min(iquotient, 4096U);
 
-	if (iremainder)
-		crc = crc32_le(crc, p + iquotient, iremainder);
+			crc = crc32_pclmul_le_16(p, chunk, crc);
+			iquotient -= chunk;
+			p += chunk;
+
+			if (iquotient < PCLMUL_MIN_LEN)
+				break;
+
+			kernel_fpu_yield();
+		}
+		kernel_fpu_end();
+	}
+
+	if (iquotient || iremainder)
+		crc = crc32_le(crc, p, iquotient + iremainder);
 
 	return crc;
 }
@@ -120,8 +129,7 @@ static int crc32_pclmul_update(struct shash_desc *desc, const u8 *data,
 }
 
 /* No final XOR 0xFFFFFFFF, like crc32_le */
-static int __crc32_pclmul_finup(u32 *crcp, const u8 *data, unsigned int len,
-				u8 *out)
+static int __crc32_pclmul_finup(u32 *crcp, const u8 *data, unsigned int len, u8 *out)
 {
 	*(__le32 *)out = cpu_to_le32(crc32_pclmul_le(*crcp, data, len));
 	return 0;
@@ -144,8 +152,7 @@ static int crc32_pclmul_final(struct shash_desc *desc, u8 *out)
 static int crc32_pclmul_digest(struct shash_desc *desc, const u8 *data,
 			       unsigned int len, u8 *out)
 {
-	return __crc32_pclmul_finup(crypto_shash_ctx(desc->tfm), data, len,
-				    out);
+	return __crc32_pclmul_finup(crypto_shash_ctx(desc->tfm), data, len, out);
 }
 
 static struct shash_alg alg = {

diff --git a/arch/x86/crypto/crc32c-intel_glue.c b/arch/x86/crypto/crc32c-intel_glue.c
index feccb5254c7e..932574661ef5 100644
--- a/arch/x86/crypto/crc32c-intel_glue.c
+++ b/arch/x86/crypto/crc32c-intel_glue.c
@@ -35,19 +35,24 @@
 #ifdef CONFIG_X86_64
 /*
- * use carryless multiply version of crc32c when buffer
- * size is >= 512 to account
- * for fpu state save/restore overhead.
+ * only use crc_pcl() (carryless multiply version of crc32c) when buffer
+ * size is >= 512 to account for fpu state save/restore overhead.
  */
 #define CRC32C_PCL_BREAKEVEN	512
 
-asmlinkage unsigned int crc_pcl(const u8 *buffer, int len,
-				unsigned int crc_init);
+/*
+ * only pass aligned buffers to crc_pcl() to avoid special handling
+ * in each pass
+ */
+#define ALIGN_CRCPCL		16U
+#define ALIGN_CRCPCL_MASK	(ALIGN_CRCPCL - 1)
+
+asmlinkage u32 crc_pcl(const u8 *buffer, u64 len, u32 crc_init);
 #endif /* CONFIG_X86_64 */
 
-static u32 crc32c_intel_le_hw_byte(u32 crc, unsigned char const *data, size_t length)
+static u32 crc32c_intel_le_hw_byte(u32 crc, const u8 *data, unsigned int len)
 {
-	while (length--) {
+	while (len--) {
 		asm("crc32b %1, %0"
 		    : "+r" (crc) : "rm" (*data));
 		data++;
@@ -56,7 +61,7 @@ static u32 crc32c_intel_le_hw_byte(u32 crc, unsigned char const *data, size_t le
 	return crc;
 }
 
-static u32 __pure crc32c_intel_le_hw(u32 crc, unsigned char const *p, size_t len)
+static u32 __pure crc32c_intel_le_hw(u32 crc, const u8 *p, unsigned int len)
 {
 	unsigned int iquotient = len / SCALE_F;
 	unsigned int iremainder = len % SCALE_F;
@@ -69,8 +74,7 @@ static u32 __pure crc32c_intel_le_hw(u32 crc, unsigned char const *p, size_t len
 	}
 
 	if (iremainder)
-		crc = crc32c_intel_le_hw_byte(crc, (unsigned char *)ptmp,
-				 iremainder);
+		crc = crc32c_intel_le_hw_byte(crc, (u8 *)ptmp, iremainder);
 
 	return crc;
 }
@@ -110,8 +114,8 @@ static int crc32c_intel_update(struct shash_desc *desc, const u8 *data,
 	return 0;
 }
 
-static int __crc32c_intel_finup(u32 *crcp, const u8 *data, unsigned int len,
-				u8 *out)
+static int __crc32c_intel_finup(const u32 *crcp, const u8 *data,
+				unsigned int len, u8 *out)
 {
 	*(__le32 *)out = ~cpu_to_le32(crc32c_intel_le_hw(*crcp, data, len));
 	return 0;
@@ -134,8 +138,7 @@ static int crc32c_intel_final(struct shash_desc *desc, u8 *out)
 static int crc32c_intel_digest(struct shash_desc *desc, const u8 *data,
 			       unsigned int len, u8 *out)
 {
-	return __crc32c_intel_finup(crypto_shash_ctx(desc->tfm), data, len,
-				    out);
+	return __crc32c_intel_finup(crypto_shash_ctx(desc->tfm), data, len, out);
 }
 
 static int crc32c_intel_cra_init(struct crypto_tfm *tfm)
@@ -149,47 +152,96 @@ static int crc32c_intel_cra_init(struct crypto_tfm *tfm)
 
 #ifdef CONFIG_X86_64
 static int crc32c_pcl_intel_update(struct shash_desc *desc, const u8 *data,
-			       unsigned int len)
+				   unsigned int len)
 {
 	u32 *crcp = shash_desc_ctx(desc);
+	u32 crc;
+
+	BUILD_BUG_ON(CRC32C_PCL_BREAKEVEN > 4096U);
 
 	/*
 	 * use faster PCL version if datasize is large enough to
 	 * overcome kernel fpu state save/restore overhead
 	 */
-	if (len >= CRC32C_PCL_BREAKEVEN && crypto_simd_usable()) {
-		kernel_fpu_begin();
-		*crcp = crc_pcl(data, len, *crcp);
-		kernel_fpu_end();
-	} else
+	if (len < CRC32C_PCL_BREAKEVEN + ALIGN_CRCPCL_MASK || !crypto_simd_usable()) {
 		*crcp = crc32c_intel_le_hw(*crcp, data, len);
+		return 0;
+	}
+
+	crc = *crcp;
+
+	/*
+	 * Although crc_pcl() supports unaligned buffers, it is more efficient
+	 * handling a 16-byte aligned buffer.
+	 */
+	if ((unsigned long)data & ALIGN_CRCPCL_MASK) {
+		unsigned int prealign = ALIGN_CRCPCL -
+					((unsigned long)data & ALIGN_CRCPCL_MASK);
+
+		crc = crc32c_intel_le_hw(crc, data, prealign);
+		len -= prealign;
+		data += prealign;
+	}
+
+	kernel_fpu_begin();
+	for (;;) {
+		const unsigned int chunk = min(len, 4096U);
+
+		crc = crc_pcl(data, chunk, crc);
+		len -= chunk;
+
+		if (!len)
+			break;
+
+		data += chunk;
+		kernel_fpu_yield();
+	}
+	kernel_fpu_end();
+
+	*crcp = crc;
 
 	return 0;
 }
 
-static int __crc32c_pcl_intel_finup(u32 *crcp, const u8 *data, unsigned int len,
-				u8 *out)
+static int __crc32c_pcl_intel_finup(const u32 *crcp, const u8 *data,
+				    unsigned int len, u8 *out)
 {
-	if (len >= CRC32C_PCL_BREAKEVEN && crypto_simd_usable()) {
-		kernel_fpu_begin();
-		*(__le32 *)out = ~cpu_to_le32(crc_pcl(data, len, *crcp));
-		kernel_fpu_end();
-	} else
-		*(__le32 *)out =
-			~cpu_to_le32(crc32c_intel_le_hw(*crcp, data, len));
+	u32 crc;
+
+	BUILD_BUG_ON(CRC32C_PCL_BREAKEVEN > 4096U);
+
+	if (len < CRC32C_PCL_BREAKEVEN || !crypto_simd_usable()) {
+		*(__le32 *)out = ~cpu_to_le32(crc32c_intel_le_hw(*crcp, data, len));
+		return 0;
+	}
+
+	crc = *crcp;
+	kernel_fpu_begin();
+	for (;;) {
+		const unsigned int chunk = min(len, 4096U);
+
+		crc = crc_pcl(data, chunk, crc);
+		len -= chunk;
+
+		if (!len)
+			break;
+
+		data += chunk;
+		kernel_fpu_yield();
+	}
+	kernel_fpu_end();
+
+	*(__le32 *)out = ~cpu_to_le32(crc);
 	return 0;
 }
 
 static int crc32c_pcl_intel_finup(struct shash_desc *desc, const u8 *data,
-			      unsigned int len, u8 *out)
+				  unsigned int len, u8 *out)
 {
 	return __crc32c_pcl_intel_finup(shash_desc_ctx(desc), data, len, out);
 }
 
 static int crc32c_pcl_intel_digest(struct shash_desc *desc, const u8 *data,
-			unsigned int len, u8 *out)
+				   unsigned int len, u8 *out)
 {
-	return __crc32c_pcl_intel_finup(crypto_shash_ctx(desc->tfm), data, len,
-					out);
+	return __crc32c_pcl_intel_finup(crypto_shash_ctx(desc->tfm), data, len, out);
 }
 #endif /* CONFIG_X86_64 */

diff --git a/arch/x86/crypto/crct10dif-pclmul_glue.c b/arch/x86/crypto/crct10dif-pclmul_glue.c
index 71291d5af9f4..4d39eac94289 100644
--- a/arch/x86/crypto/crct10dif-pclmul_glue.c
+++ b/arch/x86/crypto/crct10dif-pclmul_glue.c
@@ -34,6 +34,8 @@
 #include <asm/cpu_device_id.h>
 #include <asm/simd.h>
 
+#define PCLMUL_MIN_LEN	16U	/* minimum size of buffer for crc_t10dif_pcl */
+
 asmlinkage u16 crc_t10dif_pcl(u16 init_crc, const u8 *buf, size_t len);
 
 struct chksum_desc_ctx {
@@ -49,17 +51,36 @@ static int chksum_init(struct shash_desc *desc)
 	return 0;
 }
 
-static int chksum_update(struct shash_desc *desc, const u8 *data,
-			 unsigned int length)
+static int chksum_update(struct shash_desc *desc, const u8 *data, unsigned int len)
 {
 	struct chksum_desc_ctx *ctx = shash_desc_ctx(desc);
+	u16 crc;
+
+	if (len < PCLMUL_MIN_LEN || !crypto_simd_usable()) {
+		ctx->crc = crc_t10dif_generic(ctx->crc, data, len);
+		return 0;
+	}
+
+	crc = ctx->crc;
+	kernel_fpu_begin();
+	for (;;) {
+		const unsigned int chunk = min(len, 4096U);
+
+		crc = crc_t10dif_pcl(crc, data, chunk);
+		len -= chunk;
+		data += chunk;
+
+		if (len < PCLMUL_MIN_LEN)
+			break;
+
+		kernel_fpu_yield();
+	}
+	kernel_fpu_end();
+
+	if (len)
+		crc = crc_t10dif_generic(crc, data, len);
 
-	if (length >= 16 && crypto_simd_usable()) {
-		kernel_fpu_begin();
-		ctx->crc = crc_t10dif_pcl(ctx->crc, data, length);
-		kernel_fpu_end();
-	} else
-		ctx->crc = crc_t10dif_generic(ctx->crc, data, length);
+	ctx->crc = crc;
 
 	return 0;
 }
@@ -73,12 +94,30 @@ static int chksum_final(struct shash_desc *desc, u8 *out)
 
 static int __chksum_finup(__u16 crc, const u8 *data, unsigned int len, u8 *out)
 {
-	if (len >= 16 && crypto_simd_usable()) {
-		kernel_fpu_begin();
-		*(__u16 *)out = crc_t10dif_pcl(crc, data, len);
-		kernel_fpu_end();
-	} else
+	if (len < PCLMUL_MIN_LEN || !crypto_simd_usable()) {
 		*(__u16 *)out = crc_t10dif_generic(crc, data, len);
+		return 0;
+	}
+
+	kernel_fpu_begin();
+	for (;;) {
+		const unsigned int chunk = min(len, 4096U);
+
+		crc = crc_t10dif_pcl(crc, data, chunk);
+		len -= chunk;
+		data += chunk;
+
+		if (len < PCLMUL_MIN_LEN)
+			break;
+
+		kernel_fpu_yield();
+	}
+	kernel_fpu_end();
+
+	if (len)
+		crc = crc_t10dif_generic(crc, data, len);
+
+	*(__u16 *)out = crc;
 	return 0;
 }
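To make the pre-alignment arithmetic in the crc32/crc32c glue above
concrete, a sketch with an illustrative address (SCALE_F = 16 and
SCALE_F_MASK = 15, as defined in the patch):

/* Hypothetical example: p = 0x1009, so (unsigned long)p & 15 = 9 and
 * prealign = 16 - 9 = 7.  The first seven bytes go through the generic
 * crc32_le(); the PCLMULQDQ routine then starts at 0x1010, which is
 * 16-byte aligned.
 */
unsigned int prealign = SCALE_F - ((unsigned long)p & SCALE_F_MASK);	/* = 7 */
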
From patchwork Mon Dec 19 22:02:15 2022
From: Robert Elliott <elliott@hpe.com>
To: herbert@gondor.apana.org.au, davem@davemloft.net, Jason@zx2c4.com, ardb@kernel.org, ap420073@gmail.com, David.Laight@ACULAB.COM, ebiggers@kernel.org, tim.c.chen@linux.intel.com, peter@n8pjl.ca, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com
Cc: linux-crypto@vger.kernel.org, x86@kernel.org, linux-kernel@vger.kernel.org, Robert Elliott <elliott@hpe.com>
Subject: [PATCH 05/13] crypto: x86/sm3 - yield FPU context during long loops
Date: Mon, 19 Dec 2022 16:02:15 -0600
Message-Id: <20221219220223.3982176-6-elliott@hpe.com>
In-Reply-To: <20221219220223.3982176-1-elliott@hpe.com>

The x86 assembly language implementations using SIMD process data
between kernel_fpu_begin() and kernel_fpu_end() calls. That disables
scheduler preemption, preventing the CPU core from being used by other
threads.

The update() and finup() functions might be called to process large
quantities of data, which can result in RCU stalls and soft lockups.

Periodically check if the kernel scheduler wants to run something else
on the CPU. If so, yield the kernel FPU context and let the scheduler
intervene.
Fixes: 930ab34d906d ("crypto: x86/sm3 - add AVX assembly implementation")
Suggested-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Robert Elliott <elliott@hpe.com>
---
 arch/x86/crypto/sm3_avx_glue.c | 34 +++++++++++++++++++++++++++++-----
 1 file changed, 29 insertions(+), 5 deletions(-)

diff --git a/arch/x86/crypto/sm3_avx_glue.c b/arch/x86/crypto/sm3_avx_glue.c
index 661b6f22ffcd..9e4b21c0e748 100644
--- a/arch/x86/crypto/sm3_avx_glue.c
+++ b/arch/x86/crypto/sm3_avx_glue.c
@@ -25,8 +25,7 @@ static int sm3_avx_update(struct shash_desc *desc, const u8 *data,
 {
 	struct sm3_state *sctx = shash_desc_ctx(desc);
 
-	if (!crypto_simd_usable() ||
-	    (sctx->count % SM3_BLOCK_SIZE) + len < SM3_BLOCK_SIZE) {
+	if (((sctx->count % SM3_BLOCK_SIZE) + len < SM3_BLOCK_SIZE) || !crypto_simd_usable()) {
 		sm3_update(sctx, data, len);
 		return 0;
 	}
@@ -38,7 +37,19 @@ static int sm3_avx_update(struct shash_desc *desc, const u8 *data,
 	BUILD_BUG_ON(offsetof(struct sm3_state, state) != 0);
 
 	kernel_fpu_begin();
-	sm3_base_do_update(desc, data, len, sm3_transform_avx);
+	for (;;) {
+		const unsigned int chunk = min(len, 4096U);
+
+		sm3_base_do_update(desc, data, chunk, sm3_transform_avx);
+
+		len -= chunk;
+
+		if (!len)
+			break;
+
+		data += chunk;
+		kernel_fpu_yield();
+	}
 	kernel_fpu_end();
 
 	return 0;
@@ -58,8 +69,21 @@ static int sm3_avx_finup(struct shash_desc *desc, const u8 *data,
 	}
 
 	kernel_fpu_begin();
-	if (len)
-		sm3_base_do_update(desc, data, len, sm3_transform_avx);
+	if (len) {
+		for (;;) {
+			const unsigned int chunk = min(len, 4096U);
+
+			sm3_base_do_update(desc, data, chunk, sm3_transform_avx);
+			len -= chunk;
+
+			if (!len)
+				break;
+
+			data += chunk;
+			kernel_fpu_yield();
+		}
+	}
+
 	sm3_base_do_finalize(desc, sm3_transform_avx);
 	kernel_fpu_end();
ih2VpcZi/P8LhwF4Qxc7VXffzWHiebuiuDuUPKoVWuRyBRMYQtn9GidvuqzdxPH7Ijka WQ== Received: from p1lg14881.it.hpe.com (p1lg14881.it.hpe.com [16.230.97.202]) by mx0a-002e3701.pphosted.com (PPS) with ESMTPS id 3mjx3b10xe-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 19 Dec 2022 22:02:54 +0000 Received: from p1lg14885.dc01.its.hpecorp.net (unknown [10.119.18.236]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by p1lg14881.it.hpe.com (Postfix) with ESMTPS id F270C801722; Mon, 19 Dec 2022 22:02:52 +0000 (UTC) Received: from adevxp033-sys.us.rdlabs.hpecorp.net (unknown [16.231.227.36]) by p1lg14885.dc01.its.hpecorp.net (Postfix) with ESMTP id 31411805E9F; Mon, 19 Dec 2022 22:02:52 +0000 (UTC) From: Robert Elliott To: herbert@gondor.apana.org.au, davem@davemloft.net, Jason@zx2c4.com, ardb@kernel.org, ap420073@gmail.com, David.Laight@ACULAB.COM, ebiggers@kernel.org, tim.c.chen@linux.intel.com, peter@n8pjl.ca, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com Cc: linux-crypto@vger.kernel.org, x86@kernel.org, linux-kernel@vger.kernel.org, Robert Elliott Subject: [PATCH 06/13] crypto: x86/ghash - use u8 rather than char Date: Mon, 19 Dec 2022 16:02:16 -0600 Message-Id: <20221219220223.3982176-7-elliott@hpe.com> X-Mailer: git-send-email 2.38.1 In-Reply-To: <20221219220223.3982176-1-elliott@hpe.com> References: <20221219220223.3982176-1-elliott@hpe.com> MIME-Version: 1.0 X-Proofpoint-ORIG-GUID: W0daaAMd-lf2JY2jF3r3OO2XyX_3vs9G X-Proofpoint-GUID: W0daaAMd-lf2JY2jF3r3OO2XyX_3vs9G X-HPE-SCL: -1 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.923,Hydra:6.0.545,FMLib:17.11.122.1 definitions=2022-12-19_01,2022-12-15_02,2022-06-22_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 mlxlogscore=999 spamscore=0 clxscore=1015 impostorscore=0 priorityscore=1501 bulkscore=0 adultscore=0 lowpriorityscore=0 phishscore=0 suspectscore=0 malwarescore=0 mlxscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2212070000 definitions=main-2212190193 Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org Use more consistent, unambiguous types (u8 rather than char) for the source and destination buffer pointer arguments for the asm functions. Declare them with "asmlinkage" as well. Signed-off-by: Robert Elliott --- arch/x86/crypto/ghash-clmulni-intel_asm.S | 6 +++--- arch/x86/crypto/ghash-clmulni-intel_glue.c | 6 +++--- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/arch/x86/crypto/ghash-clmulni-intel_asm.S b/arch/x86/crypto/ghash-clmulni-intel_asm.S index 09cf9271b83a..ad860836f75b 100644 --- a/arch/x86/crypto/ghash-clmulni-intel_asm.S +++ b/arch/x86/crypto/ghash-clmulni-intel_asm.S @@ -96,7 +96,7 @@ SYM_FUNC_END(__clmul_gf128mul_ble) * This supports 64-bit CPUs. * * Return: none (but @dst is updated) - * Prototype: asmlinkage void clmul_ghash_mul(char *dst, const u128 *shash) + * Prototype: asmlinkage void clmul_ghash_mul(u8 *dst, const u128 *shash) */ SYM_FUNC_START(clmul_ghash_mul) FRAME_BEGIN @@ -122,8 +122,8 @@ SYM_FUNC_END(clmul_ghash_mul) * This supports 64-bit CPUs.
* * Return: none (but @dst is updated) - * Prototype: asmlinkage clmul_ghash_update(char *dst, const char *src, - * unsigned int srclen, const u128 *shash); + * Prototype: asmlinkage void clmul_ghash_update(u8 *dst, const u8 *src, + * unsigned int srclen, const u128 *shash); */ SYM_FUNC_START(clmul_ghash_update) FRAME_BEGIN diff --git a/arch/x86/crypto/ghash-clmulni-intel_glue.c b/arch/x86/crypto/ghash-clmulni-intel_glue.c index 1f1a95f3dd0c..beac4b2eddf6 100644 --- a/arch/x86/crypto/ghash-clmulni-intel_glue.c +++ b/arch/x86/crypto/ghash-clmulni-intel_glue.c @@ -23,10 +23,10 @@ #define GHASH_BLOCK_SIZE 16 #define GHASH_DIGEST_SIZE 16 -void clmul_ghash_mul(char *dst, const u128 *shash); +asmlinkage void clmul_ghash_mul(u8 *dst, const u128 *shash); -void clmul_ghash_update(char *dst, const char *src, unsigned int srclen, - const u128 *shash); +asmlinkage void clmul_ghash_update(u8 *dst, const u8 *src, unsigned int srclen, + const u128 *shash); struct ghash_async_ctx { struct cryptd_ahash *cryptd_tfm; From patchwork Mon Dec 19 22:02:17 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Elliott, Robert (Servers)" X-Patchwork-Id: 13077191 X-Patchwork-Delegate: herbert@gondor.apana.org.au Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B0C63C10F1D for ; Mon, 19 Dec 2022 22:03:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232693AbiLSWDi (ORCPT ); Mon, 19 Dec 2022 17:03:38 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60762 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232853AbiLSWDU (ORCPT ); Mon, 19 Dec 2022 17:03:20 -0500 Received: from mx0a-002e3701.pphosted.com (mx0a-002e3701.pphosted.com [148.163.147.86]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 99A87140C9; Mon, 19 Dec 2022 14:03:19 -0800 (PST) Received: from pps.filterd (m0134422.ppops.net [127.0.0.1]) by mx0b-002e3701.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 2BJL1wCX003253; Mon, 19 Dec 2022 22:02:55 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=hpe.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=pps0720; bh=v7bR1szvHVL/RPr0ZNg3zP1MrZYuBcDYIV4Z7L2T914=; b=ONkH+4LS7SXoAXohpr8SBsVPGhgEtOKMi2uiTZ/y20e/Do1S5koY8UmDXOn3lCXox5Gn so3QBUWBG47i6BhCiN1h9GBDOunOEmqQBNWLC4fVDMmZXYbsS4f75dGWSMnEYgyh7vzQ NlR9wRg+i+GW4T+IoGPEXc94OdVPr4bbNuEabW8S6Y6ggDc42r+GRaCCARK0+v67mXCv qygtL8Ok8JtwYw2oFH+YXQdb/ClYASd/ORpDCqkqSSBgxLRPGV23lOu384M0WACUnMbN jB/1ECLjBKvuYhLZ+1EsHtqIWonj1zY4LmsHc4WdNSXYQdhmGOC6XnpXIuA0XRNLaQRt cg== Received: from p1lg14878.it.hpe.com (p1lg14878.it.hpe.com [16.230.97.204]) by mx0b-002e3701.pphosted.com (PPS) with ESMTPS id 3mjyd9rcn4-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 19 Dec 2022 22:02:55 +0000 Received: from p1lg14885.dc01.its.hpecorp.net (unknown [10.119.18.236]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by p1lg14878.it.hpe.com (Postfix) with ESMTPS id AF7183DE2E; Mon, 19 Dec 2022 22:02:54 +0000 (UTC) Received: from adevxp033-sys.us.rdlabs.hpecorp.net (unknown [16.231.227.36]) by 
p1lg14885.dc01.its.hpecorp.net (Postfix) with ESMTP id E37DB80649A; Mon, 19 Dec 2022 22:02:53 +0000 (UTC) From: Robert Elliott To: herbert@gondor.apana.org.au, davem@davemloft.net, Jason@zx2c4.com, ardb@kernel.org, ap420073@gmail.com, David.Laight@ACULAB.COM, ebiggers@kernel.org, tim.c.chen@linux.intel.com, peter@n8pjl.ca, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com Cc: linux-crypto@vger.kernel.org, x86@kernel.org, linux-kernel@vger.kernel.org, Robert Elliott Subject: [PATCH 07/13] crypto: x86/ghash - restructure FPU context saving Date: Mon, 19 Dec 2022 16:02:17 -0600 Message-Id: <20221219220223.3982176-8-elliott@hpe.com> X-Mailer: git-send-email 2.38.1 In-Reply-To: <20221219220223.3982176-1-elliott@hpe.com> References: <20221219220223.3982176-1-elliott@hpe.com> MIME-Version: 1.0 X-Proofpoint-ORIG-GUID: wEAm2iiXTf33vjlVf3B3bGIxapDJMGAh X-Proofpoint-GUID: wEAm2iiXTf33vjlVf3B3bGIxapDJMGAh X-HPE-SCL: -1 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.923,Hydra:6.0.545,FMLib:17.11.122.1 definitions=2022-12-19_01,2022-12-15_02,2022-06-22_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 mlxlogscore=999 clxscore=1015 bulkscore=0 adultscore=0 malwarescore=0 spamscore=0 impostorscore=0 mlxscore=0 lowpriorityscore=0 phishscore=0 suspectscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2212070000 definitions=main-2212190193 Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org Wrap each of the calls to clmul_ghash_update and clmul_ghash_mul in its own set of kernel_fpu_begin and kernel_fpu_end calls, preparing to limit the amount of data processed by each _update call to avoid RCU stalls. This is more like how polyval-clmulni_glue is structured.
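The resulting shape of ghash_update() can be sketched as follows. This is a simplification, not the patch itself: the partial-block buffering is reduced to a hypothetical finished_partial_block flag standing in for the dctx->bytes bookkeeping in the diff below, and the prototypes are those from the previous patch.

	#include <linux/types.h>
	#include <linux/linkage.h>
	#include <crypto/b128ops.h>	/* for u128 */
	#include <asm/fpu/api.h>

	asmlinkage void clmul_ghash_mul(u8 *dst, const u128 *shash);
	asmlinkage void clmul_ghash_update(u8 *dst, const u8 *src, unsigned int srclen,
					   const u128 *shash);

	/* simplified shape: each asm call now sits in its own FPU section */
	static void ghash_update_shape(u8 *dst, const u8 *src, unsigned int srclen,
				       const u128 *shash, bool finished_partial_block)
	{
		if (finished_partial_block) {
			kernel_fpu_begin();
			clmul_ghash_mul(dst, shash);	/* fold the completed block */
			kernel_fpu_end();
		}

		kernel_fpu_begin();
		clmul_ghash_update(dst, src, srclen, shash);	/* bulk data */
		kernel_fpu_end();
	}

The scheduler can now preempt between the two sections, which the next patch exploits by also chunking the bulk call.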
Fixes: 0e1227d356e9 ("crypto: ghash - Add PCLMULQDQ accelerated implementation") Suggested-by: Herbert Xu Signed-off-by: Robert Elliott --- arch/x86/crypto/ghash-clmulni-intel_glue.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/arch/x86/crypto/ghash-clmulni-intel_glue.c b/arch/x86/crypto/ghash-clmulni-intel_glue.c index beac4b2eddf6..1bfde099de0f 100644 --- a/arch/x86/crypto/ghash-clmulni-intel_glue.c +++ b/arch/x86/crypto/ghash-clmulni-intel_glue.c @@ -80,7 +80,6 @@ static int ghash_update(struct shash_desc *desc, struct ghash_ctx *ctx = crypto_shash_ctx(desc->tfm); u8 *dst = dctx->buffer; - kernel_fpu_begin(); if (dctx->bytes) { int n = min(srclen, dctx->bytes); u8 *pos = dst + (GHASH_BLOCK_SIZE - dctx->bytes); @@ -91,10 +90,14 @@ static int ghash_update(struct shash_desc *desc, while (n--) *pos++ ^= *src++; - if (!dctx->bytes) + if (!dctx->bytes) { + kernel_fpu_begin(); clmul_ghash_mul(dst, &ctx->shash); + kernel_fpu_end(); + } } + kernel_fpu_begin(); clmul_ghash_update(dst, src, srclen, &ctx->shash); kernel_fpu_end(); From patchwork Mon Dec 19 22:02:18 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Elliott, Robert (Servers)" X-Patchwork-Id: 13077185 X-Patchwork-Delegate: herbert@gondor.apana.org.au Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0AC23C4332F for ; Mon, 19 Dec 2022 22:03:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229564AbiLSWDR (ORCPT ); Mon, 19 Dec 2022 17:03:17 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60720 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231344AbiLSWDR (ORCPT ); Mon, 19 Dec 2022 17:03:17 -0500 Received: from mx0a-002e3701.pphosted.com (mx0a-002e3701.pphosted.com [148.163.147.86]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3B59313F97; Mon, 19 Dec 2022 14:03:16 -0800 (PST) Received: from pps.filterd (m0134421.ppops.net [127.0.0.1]) by mx0b-002e3701.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 2BJKW3DT015099; Mon, 19 Dec 2022 22:02:57 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=hpe.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=pps0720; bh=hrnSS/acWGmvImUo/QDE8X3zDiPvrvWWK9oGzVt4i0M=; b=STcXxQSXSo0nENAYhUcqOjh7Q66IsWXyNouDI9udlCce7apiUNr0jN4gvb1ifzxSXbGj mgv5IJf3xa1RwT1cyWuAxn8NdS6pCGhv3ZorMrV6hy3edS/3L3hCiOmmn+gtjZny1hzp NdTR9RDEkTh+/bDszmDh3aBpV9+DeCeE9Z2YaKZr7NDumVp4ANYMdX/vuxR4DNcaK1uJ iCNPwbsPWegU6hKLVD4G0rttHtU4xMn7wgO+QG/cTAtd0bP5hgh6P2IRwREbkXdnlHiP PQLcH4iff8aLlicxQnvGaEji9MuItOtS+AAPLKIfXvwUtzubn53EZk8DMDxgtzZVpkgG 2A== Received: from p1lg14879.it.hpe.com (p1lg14879.it.hpe.com [16.230.97.200]) by mx0b-002e3701.pphosted.com (PPS) with ESMTPS id 3mjv4222vs-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 19 Dec 2022 22:02:57 +0000 Received: from p1lg14885.dc01.its.hpecorp.net (unknown [10.119.18.236]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by p1lg14879.it.hpe.com (Postfix) with ESMTPS id 634A5310BD; Mon, 19 Dec 2022 22:02:56 +0000 (UTC) Received: from 
adevxp033-sys.us.rdlabs.hpecorp.net (unknown [16.231.227.36]) by p1lg14885.dc01.its.hpecorp.net (Postfix) with ESMTP id 913C0805634; Mon, 19 Dec 2022 22:02:55 +0000 (UTC) From: Robert Elliott To: herbert@gondor.apana.org.au, davem@davemloft.net, Jason@zx2c4.com, ardb@kernel.org, ap420073@gmail.com, David.Laight@ACULAB.COM, ebiggers@kernel.org, tim.c.chen@linux.intel.com, peter@n8pjl.ca, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com Cc: linux-crypto@vger.kernel.org, x86@kernel.org, linux-kernel@vger.kernel.org, Robert Elliott Subject: [PATCH 08/13] crypto: x86/ghash - yield FPU context during long loops Date: Mon, 19 Dec 2022 16:02:18 -0600 Message-Id: <20221219220223.3982176-9-elliott@hpe.com> X-Mailer: git-send-email 2.38.1 In-Reply-To: <20221219220223.3982176-1-elliott@hpe.com> References: <20221219220223.3982176-1-elliott@hpe.com> MIME-Version: 1.0 X-Proofpoint-ORIG-GUID: OTJy3vKEdfmzvoxGnRwruc0exfrYOwPG X-Proofpoint-GUID: OTJy3vKEdfmzvoxGnRwruc0exfrYOwPG X-HPE-SCL: -1 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.923,Hydra:6.0.545,FMLib:17.11.122.1 definitions=2022-12-19_01,2022-12-15_02,2022-06-22_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 clxscore=1015 adultscore=0 phishscore=0 impostorscore=0 spamscore=0 mlxlogscore=999 malwarescore=0 bulkscore=0 mlxscore=0 lowpriorityscore=0 priorityscore=1501 suspectscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2212070000 definitions=main-2212190193 Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org The x86 assembly language implementations using SIMD process data between kernel_fpu_begin() and kernel_fpu_end() calls. That disables scheduler preemption, so prevents the CPU core from being used by other threads. The update() and finup() functions might be called to process large quantities of data, which can result in RCU stalls and soft lockups. Periodically check if the kernel scheduler wants to run something else on the CPU. If so, yield the kernel FPU context and let the scheduler intervene. 
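Note the masking in the diff below: only whole 16-byte blocks may be consumed per pass, so each chunk is rounded down with chunk & ~(GHASH_BLOCK_SIZE - 1). A small illustrative helper (not part of the patch) makes the arithmetic concrete: srclen = 40 gives a 32-byte pass with 8 bytes left to buffer, and srclen = 4100 gives a full 4096-byte pass with 4 bytes left over.

	#include <linux/minmax.h>

	#define GHASH_BLOCK_SIZE 16

	/* bytes handed to the asm routine in a single pass: whole blocks only */
	static unsigned int ghash_pass_bytes(unsigned int srclen)
	{
		const unsigned int chunk = min(srclen, 4096U);

		return chunk & ~(GHASH_BLOCK_SIZE - 1);
	}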
Fixes: 0e1227d356e9 ("crypto: ghash - Add PCLMULQDQ accelerated implementation") Suggested-by: Herbert Xu Signed-off-by: Robert Elliott --- arch/x86/crypto/ghash-clmulni-intel_glue.c | 26 ++++++++++++++++------ 1 file changed, 19 insertions(+), 7 deletions(-) diff --git a/arch/x86/crypto/ghash-clmulni-intel_glue.c b/arch/x86/crypto/ghash-clmulni-intel_glue.c index 1bfde099de0f..cd44339abdbb 100644 --- a/arch/x86/crypto/ghash-clmulni-intel_glue.c +++ b/arch/x86/crypto/ghash-clmulni-intel_glue.c @@ -82,7 +82,7 @@ static int ghash_update(struct shash_desc *desc, if (dctx->bytes) { int n = min(srclen, dctx->bytes); - u8 *pos = dst + (GHASH_BLOCK_SIZE - dctx->bytes); + u8 *pos = dst + GHASH_BLOCK_SIZE - dctx->bytes; dctx->bytes -= n; srclen -= n; @@ -97,13 +97,25 @@ static int ghash_update(struct shash_desc *desc, } } - kernel_fpu_begin(); - clmul_ghash_update(dst, src, srclen, &ctx->shash); - kernel_fpu_end(); + if (srclen >= GHASH_BLOCK_SIZE) { + kernel_fpu_begin(); + for (;;) { + const unsigned int chunk = min(srclen, 4096U); + + clmul_ghash_update(dst, src, chunk, &ctx->shash); + + srclen -= chunk & ~(GHASH_BLOCK_SIZE - 1); + src += chunk & ~(GHASH_BLOCK_SIZE - 1); + + if (srclen < GHASH_BLOCK_SIZE) + break; + + kernel_fpu_yield(); + } + kernel_fpu_end(); + } - if (srclen & 0xf) { - src += srclen - (srclen & 0xf); - srclen &= 0xf; + if (srclen) { dctx->bytes = GHASH_BLOCK_SIZE - srclen; while (srclen--) *dst++ ^= *src++; From patchwork Mon Dec 19 22:02:19 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Elliott, Robert (Servers)" X-Patchwork-Id: 13077186 X-Patchwork-Delegate: herbert@gondor.apana.org.au Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 123F8C4332F for ; Mon, 19 Dec 2022 22:03:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232641AbiLSWDT (ORCPT ); Mon, 19 Dec 2022 17:03:19 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60726 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232385AbiLSWDS (ORCPT ); Mon, 19 Dec 2022 17:03:18 -0500 Received: from mx0a-002e3701.pphosted.com (mx0a-002e3701.pphosted.com [148.163.147.86]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 62E541409D; Mon, 19 Dec 2022 14:03:17 -0800 (PST) Received: from pps.filterd (m0150241.ppops.net [127.0.0.1]) by mx0a-002e3701.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 2BJJqpHh022240; Mon, 19 Dec 2022 22:02:58 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=hpe.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=pps0720; bh=p9JbhkqMPpi3fDjAvwBFoPmQr84CYKUu0cOLpCX7Oos=; b=R8vnhzElii1PpLsbNT17v0Pq13SNkh97OdCOMmYK3gZBqg08UIHuF8KB2TCKL05kimPn cWrDjYCmo1i+csmB9g8RtK1/xDCphuS5drdfTPT4tUjQZJk08kDb/KSre7BqA/gX8Crf 1TVQtBiuLqKbbVZWS95OfUqk+MBBXSesV3eeXM58KL8ioG+bBi6amFu0AeNZUAMhBEKP 2rQvfcM2jrYvZBixCyOWLtKk87bXFExdBs2MFUoYHJCWFGvNmLn207QGxZv9fnSDOS6+ wgol+YeZCK5INUMepxHDXYQs7fCJxoAiNjGwt+o4AHl0mcXuERH+5XhV7MMOaHjmkNPU 0w== Received: from p1lg14880.it.hpe.com (p1lg14880.it.hpe.com [16.230.97.201]) by mx0a-002e3701.pphosted.com (PPS) with ESMTPS id 3mjx3b10xw-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 19 Dec 2022 22:02:58 +0000 Received: from 
p1lg14885.dc01.its.hpecorp.net (unknown [10.119.18.236]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by p1lg14880.it.hpe.com (Postfix) with ESMTPS id 01CFA807130; Mon, 19 Dec 2022 22:02:57 +0000 (UTC) Received: from adevxp033-sys.us.rdlabs.hpecorp.net (unknown [16.231.227.36]) by p1lg14885.dc01.its.hpecorp.net (Postfix) with ESMTP id 33578805634; Mon, 19 Dec 2022 22:02:57 +0000 (UTC) From: Robert Elliott To: herbert@gondor.apana.org.au, davem@davemloft.net, Jason@zx2c4.com, ardb@kernel.org, ap420073@gmail.com, David.Laight@ACULAB.COM, ebiggers@kernel.org, tim.c.chen@linux.intel.com, peter@n8pjl.ca, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com Cc: linux-crypto@vger.kernel.org, x86@kernel.org, linux-kernel@vger.kernel.org, Robert Elliott Subject: [PATCH 09/13] crypto: x86/poly - yield FPU context only when needed Date: Mon, 19 Dec 2022 16:02:19 -0600 Message-Id: <20221219220223.3982176-10-elliott@hpe.com> X-Mailer: git-send-email 2.38.1 In-Reply-To: <20221219220223.3982176-1-elliott@hpe.com> References: <20221219220223.3982176-1-elliott@hpe.com> MIME-Version: 1.0 X-Proofpoint-ORIG-GUID: twEmfStn_S24aIZ0bUsjbvnDIt-5xTZA X-Proofpoint-GUID: twEmfStn_S24aIZ0bUsjbvnDIt-5xTZA X-HPE-SCL: -1 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.923,Hydra:6.0.545,FMLib:17.11.122.1 definitions=2022-12-19_01,2022-12-15_02,2022-06-22_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 mlxlogscore=999 spamscore=0 clxscore=1015 impostorscore=0 priorityscore=1501 bulkscore=0 adultscore=0 lowpriorityscore=0 phishscore=0 suspectscore=0 malwarescore=0 mlxscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2212070000 definitions=main-2212190193 Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org The x86 assembly language implementations using SIMD process data between kernel_fpu_begin() and kernel_fpu_end() calls. That disables scheduler preemption, so prevents the CPU core from being used by other threads. The update() and finup() functions might be called to process large quantities of data, which can result in RCU stalls and soft lockups. Rather than break the processing into 4 KiB passes, each of which unilaterally calls kernel_fpu_begin() and kernel_fpu_end(), periodically check if the kernel scheduler wants to run something else on the CPU. If so, yield the kernel FPU context and let the scheduler intervene. 
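kernel_fpu_yield() itself is added by an earlier patch in this series and is not shown in this excerpt. A plausible minimal sketch, assuming the helper drops and reacquires the FPU section only when the scheduler has work pending (the real implementation may differ):

	#include <linux/sched.h>
	#include <asm/fpu/api.h>

	/* sketch only; the actual helper is defined earlier in the series */
	static inline void kernel_fpu_yield(void)
	{
		if (need_resched()) {
			kernel_fpu_end();	/* re-enable preemption */
			cond_resched();		/* let the scheduler run */
			kernel_fpu_begin();	/* reclaim the FPU */
		}
	}

This is what makes "only when needed" cheaper than the old loop: in the common case the FPU state is saved and restored once per call rather than once per 4 KiB.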
Suggested-by: Herbert Xu Signed-off-by: Robert Elliott --- arch/x86/crypto/nhpoly1305-avx2-glue.c | 22 +++++++----- arch/x86/crypto/nhpoly1305-sse2-glue.c | 22 +++++++----- arch/x86/crypto/poly1305_glue.c | 47 ++++++++++++-------------- arch/x86/crypto/polyval-clmulni_glue.c | 46 +++++++++++++++---------- 4 files changed, 79 insertions(+), 58 deletions(-) diff --git a/arch/x86/crypto/nhpoly1305-avx2-glue.c b/arch/x86/crypto/nhpoly1305-avx2-glue.c index 46b036204ed9..4afbfd35afda 100644 --- a/arch/x86/crypto/nhpoly1305-avx2-glue.c +++ b/arch/x86/crypto/nhpoly1305-avx2-glue.c @@ -22,15 +22,21 @@ static int nhpoly1305_avx2_update(struct shash_desc *desc, if (srclen < 64 || !crypto_simd_usable()) return crypto_nhpoly1305_update(desc, src, srclen); - do { - unsigned int n = min_t(unsigned int, srclen, SZ_4K); + kernel_fpu_begin(); + for (;;) { + const unsigned int chunk = min(srclen, 4096U); + + crypto_nhpoly1305_update_helper(desc, src, chunk, nh_avx2); + srclen -= chunk; + + if (!srclen) + break; + + src += chunk; + kernel_fpu_yield(); + } + kernel_fpu_end(); - kernel_fpu_begin(); - crypto_nhpoly1305_update_helper(desc, src, n, nh_avx2); - kernel_fpu_end(); - src += n; - srclen -= n; - } while (srclen); return 0; } diff --git a/arch/x86/crypto/nhpoly1305-sse2-glue.c b/arch/x86/crypto/nhpoly1305-sse2-glue.c index 4a4970d75107..f5c757f6f781 100644 --- a/arch/x86/crypto/nhpoly1305-sse2-glue.c +++ b/arch/x86/crypto/nhpoly1305-sse2-glue.c @@ -22,15 +22,21 @@ static int nhpoly1305_sse2_update(struct shash_desc *desc, if (srclen < 64 || !crypto_simd_usable()) return crypto_nhpoly1305_update(desc, src, srclen); - do { - unsigned int n = min_t(unsigned int, srclen, SZ_4K); + kernel_fpu_begin(); + for (;;) { + const unsigned int chunk = min(srclen, 4096U); + + crypto_nhpoly1305_update_helper(desc, src, chunk, nh_sse2); + srclen -= chunk; + + if (!srclen) + break; + + src += chunk; + kernel_fpu_yield(); + } + kernel_fpu_end(); - kernel_fpu_begin(); - crypto_nhpoly1305_update_helper(desc, src, n, nh_sse2); - kernel_fpu_end(); - src += n; - srclen -= n; - } while (srclen); return 0; } diff --git a/arch/x86/crypto/poly1305_glue.c b/arch/x86/crypto/poly1305_glue.c index 1dfb8af48a3c..13e2e134b458 100644 --- a/arch/x86/crypto/poly1305_glue.c +++ b/arch/x86/crypto/poly1305_glue.c @@ -15,20 +15,13 @@ #include #include -asmlinkage void poly1305_init_x86_64(void *ctx, - const u8 key[POLY1305_BLOCK_SIZE]); -asmlinkage void poly1305_blocks_x86_64(void *ctx, const u8 *inp, - const size_t len, const u32 padbit); -asmlinkage void poly1305_emit_x86_64(void *ctx, u8 mac[POLY1305_DIGEST_SIZE], - const u32 nonce[4]); -asmlinkage void poly1305_emit_avx(void *ctx, u8 mac[POLY1305_DIGEST_SIZE], - const u32 nonce[4]); -asmlinkage void poly1305_blocks_avx(void *ctx, const u8 *inp, const size_t len, - const u32 padbit); -asmlinkage void poly1305_blocks_avx2(void *ctx, const u8 *inp, const size_t len, - const u32 padbit); -asmlinkage void poly1305_blocks_avx512(void *ctx, const u8 *inp, - const size_t len, const u32 padbit); +asmlinkage void poly1305_init_x86_64(void *ctx, const u8 key[POLY1305_BLOCK_SIZE]); +asmlinkage void poly1305_blocks_x86_64(void *ctx, const u8 *inp, unsigned int len, u32 padbit); +asmlinkage void poly1305_emit_x86_64(void *ctx, u8 mac[POLY1305_DIGEST_SIZE], const u32 nonce[4]); +asmlinkage void poly1305_emit_avx(void *ctx, u8 mac[POLY1305_DIGEST_SIZE], const u32 nonce[4]); +asmlinkage void poly1305_blocks_avx(void *ctx, const u8 *inp, unsigned int len, const u32 padbit); +asmlinkage void 
poly1305_blocks_avx2(void *ctx, const u8 *inp, unsigned int len, u32 padbit); +asmlinkage void poly1305_blocks_avx512(void *ctx, const u8 *inp, unsigned int len, u32 padbit); static __ro_after_init DEFINE_STATIC_KEY_FALSE(poly1305_use_avx); static __ro_after_init DEFINE_STATIC_KEY_FALSE(poly1305_use_avx2); @@ -86,7 +79,7 @@ static void poly1305_simd_init(void *ctx, const u8 key[POLY1305_BLOCK_SIZE]) poly1305_init_x86_64(ctx, key); } -static void poly1305_simd_blocks(void *ctx, const u8 *inp, size_t len, +static void poly1305_simd_blocks(void *ctx, const u8 *inp, unsigned int len, const u32 padbit) { struct poly1305_arch_internal *state = ctx; @@ -103,21 +96,25 @@ static void poly1305_simd_blocks(void *ctx, const u8 *inp, size_t len, return; } - do { - const size_t bytes = min_t(size_t, len, SZ_4K); + kernel_fpu_begin(); + for (;;) { + const unsigned int chunk = min(len, 4096U); - kernel_fpu_begin(); if (IS_ENABLED(CONFIG_AS_AVX512) && static_branch_likely(&poly1305_use_avx512)) - poly1305_blocks_avx512(ctx, inp, bytes, padbit); + poly1305_blocks_avx512(ctx, inp, chunk, padbit); else if (static_branch_likely(&poly1305_use_avx2)) - poly1305_blocks_avx2(ctx, inp, bytes, padbit); + poly1305_blocks_avx2(ctx, inp, chunk, padbit); else - poly1305_blocks_avx(ctx, inp, bytes, padbit); - kernel_fpu_end(); + poly1305_blocks_avx(ctx, inp, chunk, padbit); + len -= chunk; - len -= bytes; - inp += bytes; - } while (len); + if (!len) + break; + + inp += chunk; + kernel_fpu_yield(); + } + kernel_fpu_end(); } static void poly1305_simd_emit(void *ctx, u8 mac[POLY1305_DIGEST_SIZE], diff --git a/arch/x86/crypto/polyval-clmulni_glue.c b/arch/x86/crypto/polyval-clmulni_glue.c index 8fa58b0f3cb3..a3d72e87d58d 100644 --- a/arch/x86/crypto/polyval-clmulni_glue.c +++ b/arch/x86/crypto/polyval-clmulni_glue.c @@ -45,8 +45,8 @@ struct polyval_desc_ctx { u32 bytes; }; -asmlinkage void clmul_polyval_update(const struct polyval_tfm_ctx *keys, - const u8 *in, size_t nblocks, u8 *accumulator); +asmlinkage void clmul_polyval_update(const struct polyval_tfm_ctx *keys, const u8 *in, + unsigned int nblocks, u8 *accumulator); asmlinkage void clmul_polyval_mul(u8 *op1, const u8 *op2); static inline struct polyval_tfm_ctx *polyval_tfm_ctx(struct crypto_shash *tfm) @@ -55,27 +55,40 @@ static inline struct polyval_tfm_ctx *polyval_tfm_ctx(struct crypto_shash *tfm) } static void internal_polyval_update(const struct polyval_tfm_ctx *keys, - const u8 *in, size_t nblocks, u8 *accumulator) + const u8 *in, unsigned int nblocks, u8 *accumulator) { - if (likely(crypto_simd_usable())) { - kernel_fpu_begin(); - clmul_polyval_update(keys, in, nblocks, accumulator); - kernel_fpu_end(); - } else { + if (!crypto_simd_usable()) { polyval_update_non4k(keys->key_powers[NUM_KEY_POWERS-1], in, nblocks, accumulator); + return; } + + kernel_fpu_begin(); + for (;;) { + const unsigned int chunks = min(nblocks, 4096U / POLYVAL_BLOCK_SIZE); + + clmul_polyval_update(keys, in, chunks, accumulator); + nblocks -= chunks; + + if (!nblocks) + break; + + in += chunks * POLYVAL_BLOCK_SIZE; + kernel_fpu_yield(); + } + kernel_fpu_end(); } static void internal_polyval_mul(u8 *op1, const u8 *op2) { - if (likely(crypto_simd_usable())) { - kernel_fpu_begin(); - clmul_polyval_mul(op1, op2); - kernel_fpu_end(); - } else { + if (!crypto_simd_usable()) { polyval_mul_non4k(op1, op2); + return; } + + kernel_fpu_begin(); + clmul_polyval_mul(op1, op2); + kernel_fpu_end(); } static int polyval_x86_setkey(struct crypto_shash *tfm, @@ -113,7 +126,6 @@ static int 
polyval_x86_update(struct shash_desc *desc, struct polyval_desc_ctx *dctx = shash_desc_ctx(desc); const struct polyval_tfm_ctx *tctx = polyval_tfm_ctx(desc->tfm); u8 *pos; - unsigned int nblocks; unsigned int n; if (dctx->bytes) { @@ -131,9 +143,9 @@ static int polyval_x86_update(struct shash_desc *desc, tctx->key_powers[NUM_KEY_POWERS-1]); } - while (srclen >= POLYVAL_BLOCK_SIZE) { - /* Allow rescheduling every 4K bytes. */ - nblocks = min(srclen, 4096U) / POLYVAL_BLOCK_SIZE; + if (srclen >= POLYVAL_BLOCK_SIZE) { + const unsigned int nblocks = srclen / POLYVAL_BLOCK_SIZE; + internal_polyval_update(tctx, src, nblocks, dctx->buffer); srclen -= nblocks * POLYVAL_BLOCK_SIZE; src += nblocks * POLYVAL_BLOCK_SIZE; From patchwork Mon Dec 19 22:02:20 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Elliott, Robert (Servers)" X-Patchwork-Id: 13077187 X-Patchwork-Delegate: herbert@gondor.apana.org.au Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 50E22C4332F for ; Mon, 19 Dec 2022 22:03:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232875AbiLSWDe (ORCPT ); Mon, 19 Dec 2022 17:03:34 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60736 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232124AbiLSWDT (ORCPT ); Mon, 19 Dec 2022 17:03:19 -0500 Received: from mx0a-002e3701.pphosted.com (mx0a-002e3701.pphosted.com [148.163.147.86]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6D02413D6F; Mon, 19 Dec 2022 14:03:18 -0800 (PST) Received: from pps.filterd (m0150241.ppops.net [127.0.0.1]) by mx0a-002e3701.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 2BJLW1rF025606; Mon, 19 Dec 2022 22:03:00 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=hpe.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=pps0720; bh=cBwySZEgLVUjwgqyHLHULZ/4YxK3sjPT9ThAW/euji8=; b=VJhp6fzD/f6TR2gTVRg7uEutZGe27LkEmWB5s8L1k+jlyUoxHHvYcL9vrIkf1EDtWINJ KVZCUNNgS8FFjj8RKDt57CO5XbZfRv1WDthh3F3mCN/wsMTIPttwCT49hs3Nwq9XBrke L9EqPi0m7GwQrgoPlM03aTV2crOo2V+N+/lA8k99/R8uCjDM9mlbj+nLKgpj0B7bIWBU jgnKaIBr32VmZop61nUkCCZJa3nMpV3mziCJSV+I9VMVDzFGF26UHkNjBSVn8BBqe34S k/mM8FFPf2FGop10BKXxNgJVxDmYlkZVea9rCScx64SsybkudRUsZ5LR9hbYeVz/ykQW lw== Received: from p1lg14878.it.hpe.com (p1lg14878.it.hpe.com [16.230.97.204]) by mx0a-002e3701.pphosted.com (PPS) with ESMTPS id 3mjx3b10y3-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 19 Dec 2022 22:03:00 +0000 Received: from p1lg14885.dc01.its.hpecorp.net (unknown [10.119.18.236]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by p1lg14878.it.hpe.com (Postfix) with ESMTPS id 7AE703DE29; Mon, 19 Dec 2022 22:02:59 +0000 (UTC) Received: from adevxp033-sys.us.rdlabs.hpecorp.net (unknown [16.231.227.36]) by p1lg14885.dc01.its.hpecorp.net (Postfix) with ESMTP id AECC2808734; Mon, 19 Dec 2022 22:02:58 +0000 (UTC) From: Robert Elliott To: herbert@gondor.apana.org.au, davem@davemloft.net, Jason@zx2c4.com, ardb@kernel.org, ap420073@gmail.com, David.Laight@ACULAB.COM, ebiggers@kernel.org, tim.c.chen@linux.intel.com, 
peter@n8pjl.ca, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com Cc: linux-crypto@vger.kernel.org, x86@kernel.org, linux-kernel@vger.kernel.org, Robert Elliott Subject: [PATCH 10/13] crypto: x86/aegis - yield FPU context during long loops Date: Mon, 19 Dec 2022 16:02:20 -0600 Message-Id: <20221219220223.3982176-11-elliott@hpe.com> X-Mailer: git-send-email 2.38.1 In-Reply-To: <20221219220223.3982176-1-elliott@hpe.com> References: <20221219220223.3982176-1-elliott@hpe.com> MIME-Version: 1.0 X-Proofpoint-ORIG-GUID: MWN4LHZ798tbmusyvRonviknI_w6SdXZ X-Proofpoint-GUID: MWN4LHZ798tbmusyvRonviknI_w6SdXZ X-HPE-SCL: -1 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.923,Hydra:6.0.545,FMLib:17.11.122.1 definitions=2022-12-19_01,2022-12-15_02,2022-06-22_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 mlxlogscore=999 spamscore=0 clxscore=1015 impostorscore=0 priorityscore=1501 bulkscore=0 adultscore=0 lowpriorityscore=0 phishscore=0 suspectscore=0 malwarescore=0 mlxscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2212070000 definitions=main-2212190193 Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org Make kernel_fpu_begin() and kernel_fpu_end() calls around each assembly language function that uses FPU context, rather than around the entire set (init, ad, crypt, final). During encryption, periodically check if the kernel scheduler wants to run something else on the CPU. If so, yield the kernel FPU context and let the scheduler intervene. Associated data is not limited. Allow the skcipher_walk functions to sleep again, since they are no longer called inside FPU context. Fixes: 1d373d4e8e15 ("crypto: x86 - Add optimized AEGIS implementations") Fixes: ba6771c0a0bc ("crypto: x86/aegis - fix handling chunked inputs and MAY_SLEEP") Signed-off-by: Robert Elliott --- arch/x86/crypto/aegis128-aesni-glue.c | 49 ++++++++++++++++++++------- 1 file changed, 36 insertions(+), 13 deletions(-) diff --git a/arch/x86/crypto/aegis128-aesni-glue.c b/arch/x86/crypto/aegis128-aesni-glue.c index 4623189000d8..f99f3e597b3c 100644 --- a/arch/x86/crypto/aegis128-aesni-glue.c +++ b/arch/x86/crypto/aegis128-aesni-glue.c @@ -12,8 +12,8 @@ #include #include #include -#include #include +#include #define AEGIS128_BLOCK_ALIGN 16 #define AEGIS128_BLOCK_SIZE 16 @@ -85,15 +85,19 @@ static void crypto_aegis128_aesni_process_ad( if (pos > 0) { unsigned int fill = AEGIS128_BLOCK_SIZE - pos; memcpy(buf.bytes + pos, src, fill); - crypto_aegis128_aesni_ad(state, + kernel_fpu_begin(); + crypto_aegis128_aesni_ad(state->blocks, AEGIS128_BLOCK_SIZE, buf.bytes); + kernel_fpu_end(); pos = 0; left -= fill; src += fill; } - crypto_aegis128_aesni_ad(state, left, src); + kernel_fpu_begin(); + crypto_aegis128_aesni_ad(state->blocks, left, src); + kernel_fpu_end(); src += left & ~(AEGIS128_BLOCK_SIZE - 1); left &= AEGIS128_BLOCK_SIZE - 1; @@ -110,7 +114,9 @@ static void crypto_aegis128_aesni_process_ad( if (pos > 0) { memset(buf.bytes + pos, 0, AEGIS128_BLOCK_SIZE - pos); - crypto_aegis128_aesni_ad(state, AEGIS128_BLOCK_SIZE, buf.bytes); + kernel_fpu_begin(); + crypto_aegis128_aesni_ad(state->blocks, AEGIS128_BLOCK_SIZE, buf.bytes); + kernel_fpu_end(); } } @@ -118,16 +124,31 @@ static void crypto_aegis128_aesni_process_crypt( struct aegis_state *state, struct skcipher_walk *walk, const struct aegis_crypt_ops *ops) { - while (walk->nbytes >= AEGIS128_BLOCK_SIZE) { - ops->crypt_blocks(state, - round_down(walk->nbytes, 
AEGIS128_BLOCK_SIZE), - walk->src.virt.addr, walk->dst.virt.addr); - skcipher_walk_done(walk, walk->nbytes % AEGIS128_BLOCK_SIZE); + if (walk->nbytes >= AEGIS128_BLOCK_SIZE) { + kernel_fpu_begin(); + for (;;) { + unsigned int chunk = min(walk->nbytes, 4096U); + + chunk = round_down(chunk, AEGIS128_BLOCK_SIZE); + + ops->crypt_blocks(state->blocks, chunk, + walk->src.virt.addr, walk->dst.virt.addr); + + skcipher_walk_done(walk, walk->nbytes - chunk); + + if (walk->nbytes < AEGIS128_BLOCK_SIZE) + break; + + kernel_fpu_yield(); + } + kernel_fpu_end(); } if (walk->nbytes) { - ops->crypt_tail(state, walk->nbytes, walk->src.virt.addr, + kernel_fpu_begin(); + ops->crypt_tail(state->blocks, walk->nbytes, walk->src.virt.addr, walk->dst.virt.addr); + kernel_fpu_end(); skcipher_walk_done(walk, 0); } } @@ -172,15 +193,17 @@ static void crypto_aegis128_aesni_crypt(struct aead_request *req, struct skcipher_walk walk; struct aegis_state state; - ops->skcipher_walk_init(&walk, req, true); + ops->skcipher_walk_init(&walk, req, false); kernel_fpu_begin(); + crypto_aegis128_aesni_init(&state.blocks, ctx->key.bytes, req->iv); + kernel_fpu_end(); - crypto_aegis128_aesni_init(&state, ctx->key.bytes, req->iv); crypto_aegis128_aesni_process_ad(&state, req->src, req->assoclen); crypto_aegis128_aesni_process_crypt(&state, &walk, ops); - crypto_aegis128_aesni_final(&state, tag_xor, req->assoclen, cryptlen); + kernel_fpu_begin(); + crypto_aegis128_aesni_final(&state.blocks, tag_xor, req->assoclen, cryptlen); kernel_fpu_end(); } From patchwork Mon Dec 19 22:02:21 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Elliott, Robert (Servers)" X-Patchwork-Id: 13077188 X-Patchwork-Delegate: herbert@gondor.apana.org.au Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 589D2C001B2 for ; Mon, 19 Dec 2022 22:03:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232890AbiLSWDf (ORCPT ); Mon, 19 Dec 2022 17:03:35 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60752 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232747AbiLSWDT (ORCPT ); Mon, 19 Dec 2022 17:03:19 -0500 Received: from mx0a-002e3701.pphosted.com (mx0a-002e3701.pphosted.com [148.163.147.86]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EBDFA13F97; Mon, 19 Dec 2022 14:03:18 -0800 (PST) Received: from pps.filterd (m0134420.ppops.net [127.0.0.1]) by mx0b-002e3701.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 2BJKsHLI016291; Mon, 19 Dec 2022 22:03:01 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=hpe.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=pps0720; bh=nSvaflAAMuZJfAIaCIotghXNKYG+XpkRDzCUXcgsa5g=; b=JtNnDTFqPb0hBmYH8+/JkEsBILIbaFYXXGSZdnc3lzB68YQav0oJHkwYpQFVNbClPyx0 u/Pj0y+ndNK9Nu5Iyw+EEfFJe1QPli4e3N6yhPSeDxP8c189BQ/XGlKTkAGp1OThJANs YhfBAiqZXvtzeCCGdErFar8YZVz4rbKMrVHBwM9wBNa5n3WdFwwOAi8A6DaDCgawEUic QPRCc/EtuC04ZhT7LvEVjVLR4qDuFAqfyEXFrczTRMkj9jbTLQ3swsqQmTSJXhbYsPEA +26v9T3WUkxbEvsHpOCPgWSe1rPjwzaU3a3HC0l/h8u4/9GtYKauJmUhYTuHVMQMlbjW DA== Received: from p1lg14878.it.hpe.com (p1lg14878.it.hpe.com [16.230.97.204]) by mx0b-002e3701.pphosted.com (PPS) with ESMTPS id 3mjy6d0etp-1 (version=TLSv1.2 
cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 19 Dec 2022 22:03:01 +0000 Received: from p1lg14885.dc01.its.hpecorp.net (unknown [10.119.18.236]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by p1lg14878.it.hpe.com (Postfix) with ESMTPS id 3EF413DE2A; Mon, 19 Dec 2022 22:03:01 +0000 (UTC) Received: from adevxp033-sys.us.rdlabs.hpecorp.net (unknown [16.231.227.36]) by p1lg14885.dc01.its.hpecorp.net (Postfix) with ESMTP id 6DAE080649A; Mon, 19 Dec 2022 22:03:00 +0000 (UTC) From: Robert Elliott To: herbert@gondor.apana.org.au, davem@davemloft.net, Jason@zx2c4.com, ardb@kernel.org, ap420073@gmail.com, David.Laight@ACULAB.COM, ebiggers@kernel.org, tim.c.chen@linux.intel.com, peter@n8pjl.ca, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com Cc: linux-crypto@vger.kernel.org, x86@kernel.org, linux-kernel@vger.kernel.org, Robert Elliott Subject: [PATCH 11/13] crypto: x86/blake - yield FPU context only when needed Date: Mon, 19 Dec 2022 16:02:21 -0600 Message-Id: <20221219220223.3982176-12-elliott@hpe.com> X-Mailer: git-send-email 2.38.1 In-Reply-To: <20221219220223.3982176-1-elliott@hpe.com> References: <20221219220223.3982176-1-elliott@hpe.com> MIME-Version: 1.0 X-Proofpoint-ORIG-GUID: O0UNo8-c_31ZUN3VBFPimjZMmRle62hI X-Proofpoint-GUID: O0UNo8-c_31ZUN3VBFPimjZMmRle62hI X-HPE-SCL: -1 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.923,Hydra:6.0.545,FMLib:17.11.122.1 definitions=2022-12-19_01,2022-12-15_02,2022-06-22_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 mlxlogscore=999 lowpriorityscore=0 bulkscore=0 phishscore=0 impostorscore=0 malwarescore=0 suspectscore=0 spamscore=0 mlxscore=0 adultscore=0 clxscore=1015 priorityscore=1501 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2212070000 definitions=main-2212190193 Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org The x86 assembly language implementations using SIMD process data between kernel_fpu_begin() and kernel_fpu_end() calls. That disables scheduler preemption, so prevents the CPU core from being used by other threads. The update() and finup() functions might be called to process large quantities of data, which can result in RCU stalls and soft lockups. Rather than break the processing into 4 KiB passes, each of which unilaterally calls kernel_fpu_begin() and kernel_fpu_end(), periodically check if the kernel scheduler wants to run something else on the CPU. If so, yield the kernel FPU context and let the scheduler intervene. Adjust the type of the length arguments everywhere to be unsigned int rather than size_t to avoid typecasts.
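For scale (illustrative numbers, not from the patch): BLAKE2S_BLOCK_SIZE is 64 bytes, so each pass of the loop in the diff below compresses at most 4096 / 64 = 64 blocks between scheduler checks, and the existing BUILD_BUG_ON guards against a block size so large that a page would hold fewer than 8 blocks.

	#include <linux/minmax.h>

	#define BLAKE2S_BLOCK_SIZE 64

	/* blocks compressed per pass before the next scheduler check */
	static unsigned int blake2s_pass_blocks(unsigned int nblocks)
	{
		return min(nblocks, 4096U / BLAKE2S_BLOCK_SIZE);	/* at most 64 */
	}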
Suggested-by: Herbert Xu Signed-off-by: Robert Elliott --- arch/x86/crypto/blake2s-glue.c | 41 ++++++++++++++++--------------- include/crypto/internal/blake2s.h | 8 +++--- lib/crypto/blake2s-generic.c | 12 ++++----- 3 files changed, 31 insertions(+), 30 deletions(-) diff --git a/arch/x86/crypto/blake2s-glue.c b/arch/x86/crypto/blake2s-glue.c index aaba21230528..bbb0a67ebb1c 100644 --- a/arch/x86/crypto/blake2s-glue.c +++ b/arch/x86/crypto/blake2s-glue.c @@ -12,46 +12,47 @@ #include #include -#include #include #include -asmlinkage void blake2s_compress_ssse3(struct blake2s_state *state, - const u8 *block, const size_t nblocks, - const u32 inc); -asmlinkage void blake2s_compress_avx512(struct blake2s_state *state, - const u8 *block, const size_t nblocks, - const u32 inc); +asmlinkage void blake2s_compress_ssse3(struct blake2s_state *state, const u8 *data, + unsigned int nblocks, u32 inc); +asmlinkage void blake2s_compress_avx512(struct blake2s_state *state, const u8 *data, + unsigned int nblocks, u32 inc); static __ro_after_init DEFINE_STATIC_KEY_FALSE(blake2s_use_ssse3); static __ro_after_init DEFINE_STATIC_KEY_FALSE(blake2s_use_avx512); -void blake2s_compress(struct blake2s_state *state, const u8 *block, - size_t nblocks, const u32 inc) +void blake2s_compress(struct blake2s_state *state, const u8 *data, + unsigned int nblocks, const u32 inc) { /* SIMD disables preemption, so relax after processing each page. */ BUILD_BUG_ON(SZ_4K / BLAKE2S_BLOCK_SIZE < 8); if (!static_branch_likely(&blake2s_use_ssse3) || !may_use_simd()) { - blake2s_compress_generic(state, block, nblocks, inc); + blake2s_compress_generic(state, data, nblocks, inc); return; } - do { - const size_t blocks = min_t(size_t, nblocks, - SZ_4K / BLAKE2S_BLOCK_SIZE); + kernel_fpu_begin(); + for (;;) { + const unsigned int chunks = min(nblocks, 4096U / BLAKE2S_BLOCK_SIZE); - kernel_fpu_begin(); if (IS_ENABLED(CONFIG_AS_AVX512) && static_branch_likely(&blake2s_use_avx512)) - blake2s_compress_avx512(state, block, blocks, inc); + blake2s_compress_avx512(state, data, chunks, inc); else - blake2s_compress_ssse3(state, block, blocks, inc); - kernel_fpu_end(); + blake2s_compress_ssse3(state, data, chunks, inc); - nblocks -= blocks; - block += blocks * BLAKE2S_BLOCK_SIZE; - } while (nblocks); + nblocks -= chunks; + + if (!nblocks) + break; + + data += chunks * BLAKE2S_BLOCK_SIZE; + kernel_fpu_yield(); + } + kernel_fpu_end(); } EXPORT_SYMBOL(blake2s_compress); diff --git a/include/crypto/internal/blake2s.h b/include/crypto/internal/blake2s.h index 506d56530ca9..d6df791e6148 100644 --- a/include/crypto/internal/blake2s.h +++ b/include/crypto/internal/blake2s.h @@ -10,11 +10,11 @@ #include #include -void blake2s_compress_generic(struct blake2s_state *state, const u8 *block, - size_t nblocks, const u32 inc); +void blake2s_compress_generic(struct blake2s_state *state, const u8 *data, + unsigned int nblocks, u32 inc); -void blake2s_compress(struct blake2s_state *state, const u8 *block, - size_t nblocks, const u32 inc); +void blake2s_compress(struct blake2s_state *state, const u8 *data, + unsigned int nblocks, u32 inc); bool blake2s_selftest(void); diff --git a/lib/crypto/blake2s-generic.c b/lib/crypto/blake2s-generic.c index 75ccb3e633e6..6a1caa702698 100644 --- a/lib/crypto/blake2s-generic.c +++ b/lib/crypto/blake2s-generic.c @@ -37,12 +37,12 @@ static inline void blake2s_increment_counter(struct blake2s_state *state, state->t[1] += (state->t[0] < inc); } -void blake2s_compress(struct blake2s_state *state, const u8 *block, - size_t nblocks, 
const u32 inc) +void blake2s_compress(struct blake2s_state *state, const u8 *data, + unsigned int nblocks, u32 inc) __weak __alias(blake2s_compress_generic); -void blake2s_compress_generic(struct blake2s_state *state, const u8 *block, - size_t nblocks, const u32 inc) +void blake2s_compress_generic(struct blake2s_state *state, const u8 *data, + unsigned int nblocks, u32 inc) { u32 m[16]; u32 v[16]; @@ -53,7 +53,7 @@ void blake2s_compress_generic(struct blake2s_state *state, const u8 *block, while (nblocks > 0) { blake2s_increment_counter(state, inc); - memcpy(m, block, BLAKE2S_BLOCK_SIZE); + memcpy(m, data, BLAKE2S_BLOCK_SIZE); le32_to_cpu_array(m, ARRAY_SIZE(m)); memcpy(v, state->h, 32); v[ 8] = BLAKE2S_IV0; @@ -103,7 +103,7 @@ void blake2s_compress_generic(struct blake2s_state *state, const u8 *block, for (i = 0; i < 8; ++i) state->h[i] ^= v[i] ^ v[i + 8]; - block += BLAKE2S_BLOCK_SIZE; + data += BLAKE2S_BLOCK_SIZE; --nblocks; } } From patchwork Mon Dec 19 22:02:22 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Elliott, Robert (Servers)" X-Patchwork-Id: 13077193 X-Patchwork-Delegate: herbert@gondor.apana.org.au Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B7381C4167B for ; Mon, 19 Dec 2022 22:03:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232588AbiLSWDk (ORCPT ); Mon, 19 Dec 2022 17:03:40 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60956 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232869AbiLSWDd (ORCPT ); Mon, 19 Dec 2022 17:03:33 -0500 Received: from mx0a-002e3701.pphosted.com (mx0a-002e3701.pphosted.com [148.163.147.86]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D1A2A140D8; Mon, 19 Dec 2022 14:03:20 -0800 (PST) Received: from pps.filterd (m0150242.ppops.net [127.0.0.1]) by mx0a-002e3701.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 2BJKdEeQ002227; Mon, 19 Dec 2022 22:03:03 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=hpe.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=pps0720; bh=ytx49hxAW/5Vofc1aOHCr8Q+x9j/e6AuWZ4zsK8+9Fc=; b=T/CbVSUjbXTIPnbdEp5IctHSx/fVjKlksErJBnrc/G/sqyWFXQhLUzilM38Zf247E5jt GKvRjMDVq9Xxg/qXAuOxyysQWkh4Hz2rWITY6gNPt10CKx7yYXkBb3gzfcZ3kyWV6pZ+ RZxksUGp30chzOo1BXP/ir5lu7ytBDADYQ2rtEaDRrHJR6uVdc5hvtgwj13YWk/iUIM0 gbnwoJnORqDTpyKpFJo97/TjKEaASY/Ij0d2J45gW6CRFoXI8/KgwHb0oA0uANPVR8+G 1lt8WQqCMEstAg9FqRQeL6ryFpdhlXCeN64F83QqXwtqs+yGNP2zOALvesIp7KCziLEn qw== Received: from p1lg14878.it.hpe.com (p1lg14878.it.hpe.com [16.230.97.204]) by mx0a-002e3701.pphosted.com (PPS) with ESMTPS id 3mjy650gt0-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 19 Dec 2022 22:03:03 +0000 Received: from p1lg14885.dc01.its.hpecorp.net (unknown [10.119.18.236]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by p1lg14878.it.hpe.com (Postfix) with ESMTPS id D89063DE25; Mon, 19 Dec 2022 22:03:02 +0000 (UTC) Received: from adevxp033-sys.us.rdlabs.hpecorp.net (unknown [16.231.227.36]) by p1lg14885.dc01.its.hpecorp.net (Postfix) with ESMTP id 161CF808734; Mon, 19 Dec 2022 
22:03:02 +0000 (UTC) From: Robert Elliott To: herbert@gondor.apana.org.au, davem@davemloft.net, Jason@zx2c4.com, ardb@kernel.org, ap420073@gmail.com, David.Laight@ACULAB.COM, ebiggers@kernel.org, tim.c.chen@linux.intel.com, peter@n8pjl.ca, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com Cc: linux-crypto@vger.kernel.org, x86@kernel.org, linux-kernel@vger.kernel.org, Robert Elliott Subject: [PATCH 12/13] crypto: x86/chacha - yield FPU context only when needed Date: Mon, 19 Dec 2022 16:02:22 -0600 Message-Id: <20221219220223.3982176-13-elliott@hpe.com> X-Mailer: git-send-email 2.38.1 In-Reply-To: <20221219220223.3982176-1-elliott@hpe.com> References: <20221219220223.3982176-1-elliott@hpe.com> MIME-Version: 1.0 X-Proofpoint-GUID: j1P-dXNFLU2g8BowGj6h7B-U2Q2W6YtA X-Proofpoint-ORIG-GUID: j1P-dXNFLU2g8BowGj6h7B-U2Q2W6YtA X-HPE-SCL: -1 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.923,Hydra:6.0.545,FMLib:17.11.122.1 definitions=2022-12-19_01,2022-12-15_02,2022-06-22_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 lowpriorityscore=0 impostorscore=0 mlxscore=0 phishscore=0 clxscore=1015 malwarescore=0 spamscore=0 mlxlogscore=999 adultscore=0 bulkscore=0 priorityscore=1501 suspectscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2212070000 definitions=main-2212190194 Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org The x86 assembly language implementations using SIMD process data between kernel_fpu_begin() and kernel_fpu_end() calls. That disables scheduler preemption, so prevents the CPU core from being used by other threads. Rather than break the processing into 4 KiB passes, each of which unilaterally calls kernel_fpu_begin() and kernel_fpu_end(), periodically check if the kernel scheduler wants to run something else on the CPU. If so, yield the kernel FPU context and let the scheduler intervene. 
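As a worked example of the loop in the diff below (an illustrative helper, not part of the patch): encrypting 10000 bytes takes three passes of 4096, 4096, and 1808 bytes, with a scheduler check between passes but a single kernel_fpu_begin()/kernel_fpu_end() pair overall.

	#include <linux/minmax.h>

	/* count the passes the chunking loop makes for @bytes of input */
	static unsigned int chacha_pass_count(unsigned int bytes)
	{
		unsigned int passes = 0;

		while (bytes) {
			bytes -= min(bytes, 4096U);
			passes++;	/* kernel_fpu_yield() runs between passes */
		}

		return passes;	/* e.g. 10000 bytes -> 3 passes */
	}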
Suggested-by: Herbert Xu Signed-off-by: Robert Elliott --- arch/x86/crypto/chacha_glue.c | 22 +++++++++++++--------- 1 file changed, 13 insertions(+), 9 deletions(-) diff --git a/arch/x86/crypto/chacha_glue.c b/arch/x86/crypto/chacha_glue.c index 7b3a1cf0984b..892cbae958b8 100644 --- a/arch/x86/crypto/chacha_glue.c +++ b/arch/x86/crypto/chacha_glue.c @@ -146,17 +146,21 @@ void chacha_crypt_arch(u32 *state, u8 *dst, const u8 *src, unsigned int bytes, bytes <= CHACHA_BLOCK_SIZE) return chacha_crypt_generic(state, dst, src, bytes, nrounds); - do { - unsigned int todo = min_t(unsigned int, bytes, SZ_4K); + kernel_fpu_begin(); + for (;;) { + const unsigned int chunk = min(bytes, 4096U); - kernel_fpu_begin(); - chacha_dosimd(state, dst, src, todo, nrounds); - kernel_fpu_end(); + chacha_dosimd(state, dst, src, chunk, nrounds); - bytes -= todo; - src += todo; - dst += todo; - } while (bytes); + bytes -= chunk; + if (!bytes) + break; + + src += chunk; + dst += chunk; + kernel_fpu_yield(); + } + kernel_fpu_end(); } EXPORT_SYMBOL(chacha_crypt_arch); From patchwork Mon Dec 19 22:02:23 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Elliott, Robert (Servers)" X-Patchwork-Id: 13077197 X-Patchwork-Delegate: herbert@gondor.apana.org.au Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 63F10C4167B for ; Mon, 19 Dec 2022 22:03:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232922AbiLSWDz (ORCPT ); Mon, 19 Dec 2022 17:03:55 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60978 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232633AbiLSWDf (ORCPT ); Mon, 19 Dec 2022 17:03:35 -0500 Received: from mx0a-002e3701.pphosted.com (mx0a-002e3701.pphosted.com [148.163.147.86]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 80D82140F6; Mon, 19 Dec 2022 14:03:21 -0800 (PST) Received: from pps.filterd (m0150241.ppops.net [127.0.0.1]) by mx0a-002e3701.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 2BJJr66N024598; Mon, 19 Dec 2022 22:03:05 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=hpe.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=pps0720; bh=O4GnIk7zd/4MqgzKF2156bsbM+N7CAdELJWrz2MCgdk=; b=SXjklFqcz0t9iTLogr2jGRxusZQC60LRJIbIqsGnHdNhOeE83d0mLu6phlugRe6kfDdl ujMBmMkL3+aSMn/R5QtS1k8ppkPjA30SDCRTBZjjRBwzMwo5K+1PjoPBD26ysB8lyGd9 of2PLn64zaXCZUBrDiUbra03X5J58JNqmanlAa+mW0NIHQ33QqDfYxADKh0vM7qjFsBH SpVWFE56M6x1iieblh75hPbo7NYyWq5txFHkhqWf/LvJly+c0NXNQ3EMH4LFCyDPiMwz NeyIiMhWVvzLs0BfE7BGpIv6R3CspSX1Sytg5q/vN3npNUgjoDt0DmASHBDpEeVkgWs1 SA== Received: from p1lg14879.it.hpe.com (p1lg14879.it.hpe.com [16.230.97.200]) by mx0a-002e3701.pphosted.com (PPS) with ESMTPS id 3mjx3b10yx-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 19 Dec 2022 22:03:05 +0000 Received: from p1lg14885.dc01.its.hpecorp.net (unknown [10.119.18.236]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by p1lg14879.it.hpe.com (Postfix) with ESMTPS id 75DB84AC45; Mon, 19 Dec 2022 22:03:04 +0000 (UTC) Received: from adevxp033-sys.us.rdlabs.hpecorp.net 
(unknown [16.231.227.36]) by p1lg14885.dc01.its.hpecorp.net (Postfix) with ESMTP id A8B1380649A; Mon, 19 Dec 2022 22:03:03 +0000 (UTC) From: Robert Elliott To: herbert@gondor.apana.org.au, davem@davemloft.net, Jason@zx2c4.com, ardb@kernel.org, ap420073@gmail.com, David.Laight@ACULAB.COM, ebiggers@kernel.org, tim.c.chen@linux.intel.com, peter@n8pjl.ca, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com Cc: linux-crypto@vger.kernel.org, x86@kernel.org, linux-kernel@vger.kernel.org, Robert Elliott Subject: [PATCH 13/13] crypto: x86/aria - yield FPU context only when needed Date: Mon, 19 Dec 2022 16:02:23 -0600 Message-Id: <20221219220223.3982176-14-elliott@hpe.com> X-Mailer: git-send-email 2.38.1 In-Reply-To: <20221219220223.3982176-1-elliott@hpe.com> References: <20221219220223.3982176-1-elliott@hpe.com> MIME-Version: 1.0 X-Proofpoint-ORIG-GUID: l8D768RjsgE5OEabaL3ZFIEKiFHi-Ky- X-Proofpoint-GUID: l8D768RjsgE5OEabaL3ZFIEKiFHi-Ky- X-HPE-SCL: -1 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.923,Hydra:6.0.545,FMLib:17.11.122.1 definitions=2022-12-19_01,2022-12-15_02,2022-06-22_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 mlxlogscore=999 spamscore=0 clxscore=1015 impostorscore=0 priorityscore=1501 bulkscore=0 adultscore=0 lowpriorityscore=0 phishscore=0 suspectscore=0 malwarescore=0 mlxscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2212070000 definitions=main-2212190193 Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org The x86 assembly language implementations using SIMD process data between kernel_fpu_begin() and kernel_fpu_end() calls. That disables scheduler preemption, so prevents the CPU core from being used by other threads. During ctr mode, rather than break the processing into 256 byte passes, each of which unilaterally calls kernel_fpu_begin() and kernel_fpu_end(), periodically check if the kernel scheduler wants to run something else on the CPU. If so, yield the kernel FPU context and let the scheduler intervene. Signed-off-by: Robert Elliott --- arch/x86/crypto/aria_aesni_avx_glue.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/arch/x86/crypto/aria_aesni_avx_glue.c b/arch/x86/crypto/aria_aesni_avx_glue.c index c561ea4fefa5..6657ce576e6c 100644 --- a/arch/x86/crypto/aria_aesni_avx_glue.c +++ b/arch/x86/crypto/aria_aesni_avx_glue.c @@ -5,6 +5,7 @@ * Copyright (c) 2022 Taehee Yoo */ +#include #include #include #include @@ -85,17 +86,19 @@ static int aria_avx_ctr_encrypt(struct skcipher_request *req) const u8 *src = walk.src.virt.addr; u8 *dst = walk.dst.virt.addr; + kernel_fpu_begin(); while (nbytes >= ARIA_AESNI_PARALLEL_BLOCK_SIZE) { u8 keystream[ARIA_AESNI_PARALLEL_BLOCK_SIZE]; - kernel_fpu_begin(); aria_ops.aria_ctr_crypt_16way(ctx, dst, src, keystream, walk.iv); - kernel_fpu_end(); dst += ARIA_AESNI_PARALLEL_BLOCK_SIZE; src += ARIA_AESNI_PARALLEL_BLOCK_SIZE; nbytes -= ARIA_AESNI_PARALLEL_BLOCK_SIZE; + + kernel_fpu_yield(); } + kernel_fpu_end(); while (nbytes >= ARIA_BLOCK_SIZE) { u8 keystream[ARIA_BLOCK_SIZE];