From patchwork Mon Feb 6 22:58:35 2023
X-Patchwork-Submitter: Heiko Stuebner
X-Patchwork-Id: 13130730
X-Patchwork-Delegate: palmer@dabbelt.com
From: Heiko Stuebner
To: palmer@rivosinc.com
Cc: greentime.hu@sifive.com, conor@kernel.org, linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org, christoph.muellner@vrull.eu, Vincent Chen, Heiko Stuebner
Subject: [PATCH RFC 01/12] riscv: Add support for kernel mode vector
Date: Mon, 6 Feb 2023 23:58:35 +0100
Message-Id: <20230206225846.1381789-2-heiko@sntech.de>
In-Reply-To: <20230206225846.1381789-1-heiko@sntech.de>
References: <20230206225846.1381789-1-heiko@sntech.de>

From: Greentime Hu

Add kernel_rvv_begin() and kernel_rvv_end() function declarations
and corresponding definitions in kernel_mode_vector.c.

These are needed to wrap uses of vector in kernel mode.

Co-developed-by: Vincent Chen
Signed-off-by: Vincent Chen
Signed-off-by: Greentime Hu
Signed-off-by: Heiko Stuebner
---
 arch/riscv/include/asm/vector.h        |  14 +++
 arch/riscv/kernel/Makefile             |   1 +
 arch/riscv/kernel/kernel_mode_vector.c | 132 +++++++++++++++++++++++++
 3 files changed, 147 insertions(+)
 create mode 100644 arch/riscv/kernel/kernel_mode_vector.c

diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h
index 7c77696d704a..f38266ec483a 100644
--- a/arch/riscv/include/asm/vector.h
+++ b/arch/riscv/include/asm/vector.h
@@ -134,6 +134,20 @@ static inline void vstate_restore(struct task_struct *task,
 	}
 }
 
+static inline void vector_flush_cpu_state(void)
+{
+	asm volatile (
+		"vsetvli t0, x0, e8, m8, ta, ma\n\t"
+		"vmv.v.i v0, 0\n\t"
+		"vmv.v.i v8, 0\n\t"
+		"vmv.v.i v16, 0\n\t"
+		"vmv.v.i v24, 0\n\t"
+		: : : "t0");
+}
+
+void kernel_rvv_begin(void);
+void kernel_rvv_end(void);
+
 #else /* ! CONFIG_RISCV_ISA_V */
 
 struct pt_regs;
diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index 48d345a5f326..304c500cc1f7 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -56,6 +56,7 @@ obj-$(CONFIG_MMU) += vdso.o vdso/
 obj-$(CONFIG_RISCV_M_MODE) += traps_misaligned.o
 obj-$(CONFIG_FPU) += fpu.o
 obj-$(CONFIG_RISCV_ISA_V) += vector.o
+obj-$(CONFIG_RISCV_ISA_V) += kernel_mode_vector.o
 obj-$(CONFIG_SMP) += smpboot.o
 obj-$(CONFIG_SMP) += smp.o
 obj-$(CONFIG_SMP) += cpu_ops.o
diff --git a/arch/riscv/kernel/kernel_mode_vector.c b/arch/riscv/kernel/kernel_mode_vector.c
new file mode 100644
index 000000000000..0277168af0c5
--- /dev/null
+++ b/arch/riscv/kernel/kernel_mode_vector.c
@@ -0,0 +1,132 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ * Author: Catalin Marinas
+ * Copyright (C) 2017 Linaro Ltd.
+ * Copyright (C) 2021 SiFive
+ */
+#include
+#include
+#include
+#include
+#include
+
+#include
+#include
+
+DECLARE_PER_CPU(bool, vector_context_busy);
+DEFINE_PER_CPU(bool, vector_context_busy);
+
+/*
+ * may_use_vector - whether it is allowable at this time to issue vector
+ *                  instructions or access the vector register file
+ *
+ * Callers must not assume that the result remains true beyond the next
+ * preempt_enable() or return from softirq context.
+ */
+static __must_check inline bool may_use_vector(void)
+{
+	/*
+	 * vector_context_busy is only set while preemption is disabled,
+	 * and is clear whenever preemption is enabled. Since
+	 * this_cpu_read() is atomic w.r.t. preemption, vector_context_busy
+	 * cannot change under our feet -- if it's set we cannot be
+	 * migrated, and if it's clear we cannot be migrated to a CPU
+	 * where it is set.
+	 */
+	return !in_irq() && !irqs_disabled() && !in_nmi() &&
+	       !this_cpu_read(vector_context_busy);
+}
+
+/*
+ * Claim ownership of the CPU vector context for use by the calling context.
+ *
+ * The caller may freely manipulate the vector context metadata until
+ * put_cpu_vector_context() is called.
+ */
+static void get_cpu_vector_context(void)
+{
+	bool busy;
+
+	preempt_disable();
+	busy = __this_cpu_xchg(vector_context_busy, true);
+
+	WARN_ON(busy);
+}
+
+/*
+ * Release the CPU vector context.
+ *
+ * Must be called from a context in which get_cpu_vector_context() was
+ * previously called, with no call to put_cpu_vector_context() in the
+ * meantime.
+ */
+static void put_cpu_vector_context(void)
+{
+	bool busy = __this_cpu_xchg(vector_context_busy, false);
+
+	WARN_ON(!busy);
+	preempt_enable();
+}
+
+/*
+ * kernel_rvv_begin(): obtain the CPU vector registers for use by the calling
+ * context
+ *
+ * Must not be called unless may_use_vector() returns true.
+ * Task context in the vector registers is saved back to memory as necessary.
+ *
+ * A matching call to kernel_rvv_end() must be made before returning from the
+ * calling context.
+ *
+ * The caller may freely use the vector registers until kernel_rvv_end() is
+ * called.
+ */
+void kernel_rvv_begin(void)
+{
+	if (WARN_ON(!has_vector()))
+		return;
+
+	WARN_ON(!may_use_vector());
+
+	/* Acquire kernel mode vector */
+	get_cpu_vector_context();
+
+	/* Save vector state, if any */
+	vstate_save(current, task_pt_regs(current));
+
+	/* Enable vector */
+	rvv_enable();
+
+	/* Invalidate vector regs */
+	vector_flush_cpu_state();
+}
+EXPORT_SYMBOL_GPL(kernel_rvv_begin);
+
+/*
+ * kernel_rvv_end(): give the CPU vector registers back to the current task
+ *
+ * Must be called from a context in which kernel_rvv_begin() was previously
+ * called, with no call to kernel_rvv_end() in the meantime.
+ *
+ * The caller must not use the vector registers after this function is called,
+ * unless kernel_rvv_begin() is called again in the meantime.
+ */
+void kernel_rvv_end(void)
+{
+	if (WARN_ON(!has_vector()))
+		return;
+
+	/* Invalidate vector regs */
+	vector_flush_cpu_state();
+
+	/* Restore vector state, if any */
+	vstate_restore(current, task_pt_regs(current));
+
+	/* disable vector */
+	rvv_disable();
+
+	/* release kernel mode vector */
+	put_cpu_vector_context();
+}
+EXPORT_SYMBOL_GPL(kernel_rvv_end);

From patchwork Mon Feb 6 22:58:36 2023
X-Patchwork-Submitter: Heiko Stuebner
X-Patchwork-Id: 13130735
X-Patchwork-Delegate: palmer@dabbelt.com
From: Heiko Stuebner
To: palmer@rivosinc.com
Cc: greentime.hu@sifive.com, conor@kernel.org, linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org, christoph.muellner@vrull.eu, Han-Kuan Chen, Heiko Stuebner
Subject: [PATCH RFC 02/12] riscv: Add vector extension XOR implementation
Date: Mon, 6 Feb 2023 23:58:36 +0100
Message-Id: <20230206225846.1381789-3-heiko@sntech.de>
In-Reply-To: <20230206225846.1381789-1-heiko@sntech.de>
References: <20230206225846.1381789-1-heiko@sntech.de>

From: Greentime Hu

This patch adds support for vector-optimized XOR. It has been tested
in QEMU.

Co-developed-by: Han-Kuan Chen
Signed-off-by: Han-Kuan Chen
Signed-off-by: Greentime Hu
Signed-off-by: Heiko Stuebner
---
 arch/riscv/include/asm/xor.h | 82 ++++++++++++++++++++++++++++++++++++
 arch/riscv/lib/Makefile      |  1 +
 arch/riscv/lib/xor.S         | 81 +++++++++++++++++++++++++++++++++++
 3 files changed, 164 insertions(+)
 create mode 100644 arch/riscv/include/asm/xor.h
 create mode 100644 arch/riscv/lib/xor.S

diff --git a/arch/riscv/include/asm/xor.h b/arch/riscv/include/asm/xor.h
new file mode 100644
index 000000000000..74867c7fd955
--- /dev/null
+++ b/arch/riscv/include/asm/xor.h
@@ -0,0 +1,82 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (C) 2021 SiFive
+ */
+
+#include
+#include
+#ifdef CONFIG_RISCV_ISA_V
+#include
+#include
+
+void xor_regs_2_(unsigned long bytes, unsigned long *__restrict p1,
+		 const unsigned long *__restrict p2);
+void xor_regs_3_(unsigned long bytes, unsigned long *__restrict p1,
+		 const unsigned long *__restrict p2,
+		 const unsigned long *__restrict p3);
+void xor_regs_4_(unsigned long bytes, unsigned long *__restrict p1,
+		 const unsigned long *__restrict p2,
+		 const unsigned long *__restrict p3,
+		 const unsigned long *__restrict p4);
+void xor_regs_5_(unsigned long bytes, unsigned long *__restrict p1,
+		 const unsigned long *__restrict p2,
+		 const unsigned long *__restrict p3,
+		 const unsigned long *__restrict p4,
+		 const unsigned long *__restrict p5);
+
+static void xor_rvv_2(unsigned long bytes, unsigned long *__restrict p1,
+		      const unsigned long *__restrict p2)
+{
+	kernel_rvv_begin();
+	xor_regs_2_(bytes, p1, p2);
+	kernel_rvv_end();
+}
+
+static void xor_rvv_3(unsigned long bytes, unsigned long *__restrict p1,
+		      const unsigned long *__restrict p2,
+		      const unsigned long *__restrict p3)
+{
+	kernel_rvv_begin();
+	xor_regs_3_(bytes, p1, p2, p3);
+	kernel_rvv_end();
+}
+
+static void xor_rvv_4(unsigned long bytes, unsigned long *__restrict p1,
+		      const unsigned long *__restrict p2,
+		      const unsigned long *__restrict p3,
+		      const unsigned long *__restrict p4)
+{
+	kernel_rvv_begin();
+	xor_regs_4_(bytes, p1, p2, p3, p4);
+	kernel_rvv_end();
+}
+
+static void xor_rvv_5(unsigned long bytes, unsigned long *__restrict p1,
+		      const unsigned long *__restrict p2,
+		      const unsigned long *__restrict p3,
+		      const unsigned long *__restrict p4,
+		      const unsigned long *__restrict p5)
+{
+	kernel_rvv_begin();
+	xor_regs_5_(bytes, p1, p2, p3, p4, p5);
+	kernel_rvv_end();
+}
+
+static struct xor_block_template xor_block_rvv = {
+	.name = "rvv",
+	.do_2 = xor_rvv_2,
+	.do_3 = xor_rvv_3,
+	.do_4 = xor_rvv_4,
+	.do_5 = xor_rvv_5
+};
+
+#undef XOR_TRY_TEMPLATES
+#define XOR_TRY_TEMPLATES \
+	do { \
+		xor_speed(&xor_block_8regs); \
+		xor_speed(&xor_block_32regs); \
+		if (has_vector()) { \
+			xor_speed(&xor_block_rvv);\
+		} \
+	} while (0)
+#endif
diff --git a/arch/riscv/lib/Makefile b/arch/riscv/lib/Makefile
index 25d5c9664e57..acd87ac86d24 100644
--- a/arch/riscv/lib/Makefile
+++ b/arch/riscv/lib/Makefile
@@ -7,3 +7,4 @@ lib-$(CONFIG_MMU) += uaccess.o
 lib-$(CONFIG_64BIT) += tishift.o
 
 obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
+lib-$(CONFIG_RISCV_ISA_V) += xor.o
diff --git a/arch/riscv/lib/xor.S b/arch/riscv/lib/xor.S
new file mode 100644
index 000000000000..3bc059e18171
--- /dev/null
+++ b/arch/riscv/lib/xor.S
@@ -0,0 +1,81 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (C) 2021 SiFive
+ */
+#include
+#include
+#include
+
+ENTRY(xor_regs_2_)
+	vsetvli a3, a0, e8, m8, ta, ma
+	vle8.v v0, (a1)
+	vle8.v v8, (a2)
+	sub a0, a0, a3
+	vxor.vv v16, v0, v8
+	add a2, a2, a3
+	vse8.v v16, (a1)
+	add a1, a1, a3
+	bnez a0, xor_regs_2_
+	ret
+END(xor_regs_2_)
+EXPORT_SYMBOL(xor_regs_2_)
+
+ENTRY(xor_regs_3_)
+	vsetvli a4, a0, e8, m8, ta, ma
+	vle8.v v0, (a1)
+	vle8.v v8, (a2)
+	sub a0, a0, a4
+	vxor.vv v0, v0, v8
+	vle8.v v16, (a3)
+	add a2, a2, a4
+	vxor.vv v16, v0, v16
+	add a3, a3, a4
+	vse8.v v16, (a1)
+	add a1, a1, a4
+	bnez a0, xor_regs_3_
+	ret
+END(xor_regs_3_)
+EXPORT_SYMBOL(xor_regs_3_)
+
+ENTRY(xor_regs_4_)
+	vsetvli a5, a0, e8, m8, ta, ma
+	vle8.v v0, (a1)
+	vle8.v v8, (a2)
+	sub a0, a0, a5
+	vxor.vv v0, v0, v8
+	vle8.v v16, (a3)
+	add a2, a2, a5
+	vxor.vv v0, v0, v16
+	vle8.v v24, (a4)
+	add a3, a3, a5
+	vxor.vv v16, v0, v24
+	add a4, a4, a5
+	vse8.v v16, (a1)
+	add a1, a1, a5
+	bnez a0, xor_regs_4_
+	ret
+END(xor_regs_4_)
+EXPORT_SYMBOL(xor_regs_4_)
+
+ENTRY(xor_regs_5_)
+	vsetvli a6, a0, e8, m8, ta, ma
+	vle8.v v0, (a1)
+	vle8.v v8, (a2)
+	sub a0, a0, a6
+	vxor.vv v0, v0, v8
+	vle8.v v16, (a3)
+	add a2, a2, a6
+	vxor.vv v0, v0, v16
+	vle8.v v24, (a4)
+	add a3, a3, a6
+	vxor.vv v0, v0, v24
+	vle8.v v8, (a5)
+	add a4, a4, a6
+	vxor.vv v16, v0, v8
+	add a5, a5, a6
+	vse8.v v16, (a1)
+	add a1, a1, a6
+	bnez a0, xor_regs_5_
+	ret
+END(xor_regs_5_)
+EXPORT_SYMBOL(xor_regs_5_)

From patchwork Mon Feb 6 22:58:37 2023
X-Patchwork-Submitter: Heiko Stuebner
X-Patchwork-Id: 13130734
X-Patchwork-Delegate: palmer@dabbelt.com
From: Heiko Stuebner
To: palmer@rivosinc.com
Cc: greentime.hu@sifive.com, conor@kernel.org, linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org, christoph.muellner@vrull.eu, Heiko Stuebner
Subject: [PATCH RFC 03/12] RISC-V: add Zbb extension detection
Date: Mon, 6 Feb 2023 23:58:37 +0100
Message-Id: <20230206225846.1381789-4-heiko@sntech.de>
In-Reply-To: <20230206225846.1381789-1-heiko@sntech.de>
References: <20230206225846.1381789-1-heiko@sntech.de>

From: Heiko Stuebner

Add handling for the Zbb extension.

Zbb provides basic bit-manipulation instructions. As multiple subsequent
features want to check for Zbb presence, add the extension handling
without directly including code using it.

Signed-off-by: Heiko Stuebner
---
 arch/riscv/Kconfig             | 23 +++++++++++++++++++++++
 arch/riscv/include/asm/hwcap.h |  1 +
 arch/riscv/kernel/cpu.c        |  1 +
 arch/riscv/kernel/cpufeature.c |  1 +
 4 files changed, 26 insertions(+)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index f4299ba9a843..f4b0e0144516 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -426,6 +426,29 @@ config RISCV_ISA_V
 
 	  If you don't know what to do here, say Y.
 
+config TOOLCHAIN_HAS_ZBB
+	bool
+	default y
+	depends on !64BIT || $(cc-option,-mabi=lp64 -march=rv64ima_zbb)
+	depends on !32BIT || $(cc-option,-mabi=ilp32 -march=rv32ima_zbb)
+	depends on LLD_VERSION >= 150000 || LD_VERSION >= 23900
+	depends on AS_IS_GNU
+
+config RISCV_ISA_ZBB
+	bool "Zbb extension support for bit manipulation instructions"
+	depends on TOOLCHAIN_HAS_ZBB
+	depends on !XIP_KERNEL && MMU
+	default y
+	help
+	  Adds support to dynamically detect the presence of the ZBB
+	  extension (basic bit manipulation) and enable its usage.
+
+	  The Zbb extension provides instructions to accelerate a number
+	  of bit-specific operations (count bit population, sign extending,
+	  bit rotation, etc).
+
+	  If you don't know what to do here, say Y.
+
 config TOOLCHAIN_HAS_ZICBOM
 	bool
 	default y
diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
index f413db6118e5..c8c69b49f0ad 100644
--- a/arch/riscv/include/asm/hwcap.h
+++ b/arch/riscv/include/asm/hwcap.h
@@ -59,6 +59,7 @@ enum riscv_isa_ext_id {
 	RISCV_ISA_EXT_SSTC,
 	RISCV_ISA_EXT_SVINVAL,
 	RISCV_ISA_EXT_SVPBMT,
+	RISCV_ISA_EXT_ZBB,
 	RISCV_ISA_EXT_ZICBOM,
 	RISCV_ISA_EXT_ZIHINTPAUSE,
 	RISCV_ISA_EXT_ID_MAX
diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
index 0bf1c7f663fc..420228e219f7 100644
--- a/arch/riscv/kernel/cpu.c
+++ b/arch/riscv/kernel/cpu.c
@@ -185,6 +185,7 @@ arch_initcall(riscv_cpuinfo_init);
  * New entries to this struct should follow the ordering rules described above.
  */
 static struct riscv_isa_ext_data isa_ext_arr[] = {
+	__RISCV_ISA_EXT_DATA(zbb, RISCV_ISA_EXT_ZBB),
 	__RISCV_ISA_EXT_DATA(zicbom, RISCV_ISA_EXT_ZICBOM),
 	__RISCV_ISA_EXT_DATA(zihintpause, RISCV_ISA_EXT_ZIHINTPAUSE),
 	__RISCV_ISA_EXT_DATA(sscofpmf, RISCV_ISA_EXT_SSCOFPMF),
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index cbd60e744c09..33938f91cbbf 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -234,6 +234,7 @@ printk("!!!! isa-string: %s\n\n\n", isa);
 				SET_ISA_EXT_MAP("sstc", RISCV_ISA_EXT_SSTC);
 				SET_ISA_EXT_MAP("svinval", RISCV_ISA_EXT_SVINVAL);
 				SET_ISA_EXT_MAP("svpbmt", RISCV_ISA_EXT_SVPBMT);
+				SET_ISA_EXT_MAP("zbb", RISCV_ISA_EXT_ZBB);
 				SET_ISA_EXT_MAP("zicbom", RISCV_ISA_EXT_ZICBOM);
 				SET_ISA_EXT_MAP("zihintpause", RISCV_ISA_EXT_ZIHINTPAUSE);
 			}

From patchwork Mon Feb 6 22:58:38 2023
X-Patchwork-Submitter: Heiko Stuebner
X-Patchwork-Id: 13130728
X-Patchwork-Delegate: palmer@dabbelt.com
From: Heiko Stuebner
To: palmer@rivosinc.com
Cc: greentime.hu@sifive.com, conor@kernel.org, linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org, christoph.muellner@vrull.eu, Heiko Stuebner
Subject: [PATCH RFC 04/12] RISC-V: add Zbc extension detection
Date: Mon, 6 Feb 2023 23:58:38 +0100
Message-Id: <20230206225846.1381789-5-heiko@sntech.de>
In-Reply-To: <20230206225846.1381789-1-heiko@sntech.de>
References: <20230206225846.1381789-1-heiko@sntech.de>

From: Heiko Stuebner

Add handling for the Zbc extension.

Zbc provides instructions for carry-less multiplication.

Signed-off-by: Heiko Stuebner
---
 arch/riscv/Kconfig             | 22 ++++++++++++++++++++++
 arch/riscv/include/asm/hwcap.h |  1 +
 arch/riscv/kernel/cpu.c        |  1 +
 arch/riscv/kernel/cpufeature.c |  1 +
 4 files changed, 25 insertions(+)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index f4b0e0144516..05b92bcb7bfe 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -449,6 +449,28 @@ config RISCV_ISA_ZBB
 
 	  If you don't know what to do here, say Y.
 
+config TOOLCHAIN_HAS_ZBC + bool + default y + depends on !64BIT || $(cc-option,-mabi=lp64 -march=rv64ima_zbc) + depends on !32BIT || $(cc-option,-mabi=ilp32 -march=rv32ima_zbc) + depends on LLD_VERSION >= 150000 || LD_VERSION >= 23900 + depends on AS_IS_GNU + +config RISCV_ISA_ZBC + bool "Zbc extension support for bit manipulation instructions" + depends on TOOLCHAIN_HAS_ZBC + depends on !XIP_KERNEL && MMU + default y + help + Adds support to dynamically detect the presence of the ZBC + extension (carry-less multiplication) and enable its usage. + + The Zbc extension provides instructions clmul, clmulh and clmulr + to accelerate carry-less multiplications. + + If you don't know what to do here, say Y. + config TOOLCHAIN_HAS_ZICBOM bool default y diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h index c8c69b49f0ad..8673c2146d20 100644 --- a/arch/riscv/include/asm/hwcap.h +++ b/arch/riscv/include/asm/hwcap.h @@ -60,6 +60,7 @@ enum riscv_isa_ext_id { RISCV_ISA_EXT_SVINVAL, RISCV_ISA_EXT_SVPBMT, RISCV_ISA_EXT_ZBB, + RISCV_ISA_EXT_ZBC, RISCV_ISA_EXT_ZICBOM, RISCV_ISA_EXT_ZIHINTPAUSE, RISCV_ISA_EXT_ID_MAX diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c index 420228e219f7..995462a0de86 100644 --- a/arch/riscv/kernel/cpu.c +++ b/arch/riscv/kernel/cpu.c @@ -186,6 +186,7 @@ arch_initcall(riscv_cpuinfo_init); */ static struct riscv_isa_ext_data isa_ext_arr[] = { __RISCV_ISA_EXT_DATA(zbb, RISCV_ISA_EXT_ZBB), + __RISCV_ISA_EXT_DATA(zbc, RISCV_ISA_EXT_ZBC), __RISCV_ISA_EXT_DATA(zicbom, RISCV_ISA_EXT_ZICBOM), __RISCV_ISA_EXT_DATA(zihintpause, RISCV_ISA_EXT_ZIHINTPAUSE), __RISCV_ISA_EXT_DATA(sscofpmf, RISCV_ISA_EXT_SSCOFPMF), diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c index 33938f91cbbf..ba74f3fa2310 100644 --- a/arch/riscv/kernel/cpufeature.c +++ b/arch/riscv/kernel/cpufeature.c @@ -235,6 +235,7 @@ printk("!!!! 
isa-string: %s\n\n\n", isa); SET_ISA_EXT_MAP("svinval", RISCV_ISA_EXT_SVINVAL); SET_ISA_EXT_MAP("svpbmt", RISCV_ISA_EXT_SVPBMT); SET_ISA_EXT_MAP("zbb", RISCV_ISA_EXT_ZBB); + SET_ISA_EXT_MAP("zbc", RISCV_ISA_EXT_ZBC); SET_ISA_EXT_MAP("zicbom", RISCV_ISA_EXT_ZICBOM); SET_ISA_EXT_MAP("zihintpause", RISCV_ISA_EXT_ZIHINTPAUSE); } From patchwork Mon Feb 6 22:58:39 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Heiko Stuebner X-Patchwork-Id: 13130727 X-Patchwork-Delegate: palmer@dabbelt.com Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 03203C61DA4 for ; Mon, 6 Feb 2023 22:59:17 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To: Message-Id:Date:Subject:Cc:To:From:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=inYC0RH5oJgkzyxn7cDPc9D0Pjy/KI9iOUE+2v6eInQ=; b=JsZuU5Z207RoGM 3cALCRBqdElibJAdPNvUeroxq88fHtUdbfEtBP6UEGvppOrA0W3rXnEFSM4NNpFP1LutMogXx6Ohl QabKobhb21sDJEUHZSb/L2e8pgmLBiRCDF49F4OhzGeELggCihpq+KsI88/vXDEfnM9+4qwF3ncqU xqyZ2nnEBJgzxuvdx0vs5viUy0clu2TYCUblqAwNsTgrl0wNKK6iWeLIZrmLlpiAIuWWqbsJ8wdDW uVO5RhN1vuks3k9EsXkAGF4Mi26eiKRMIlc8LWbY3Kb+PwnkEdAbnfuxrbcGyYC4GctOCkMd362rU KSAzR77cBDMo/2DUgI4Q==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.94.2 #2 (Red Hat Linux)) id 1pPAS6-00A7JW-DX; Mon, 06 Feb 2023 22:59:10 +0000 Received: 
from gloria.sntech.de ([185.11.138.130]) by bombadil.infradead.org with esmtps (Exim 4.94.2 #2 (Red Hat Linux)) id 1pPAS1-00A7F2-4K for linux-riscv@lists.infradead.org; Mon, 06 Feb 2023 22:59:07 +0000 Received: from ip5b412258.dynamic.kabel-deutschland.de ([91.65.34.88] helo=phil.lan) by gloria.sntech.de with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1pPARl-0002Mb-Il; Mon, 06 Feb 2023 23:58:49 +0100 From: Heiko Stuebner To: palmer@rivosinc.com Cc: greentime.hu@sifive.com, conor@kernel.org, linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org, christoph.muellner@vrull.eu, Heiko Stuebner Subject: [PATCH RFC 05/12] RISC-V: add Zbkb extension detection Date: Mon, 6 Feb 2023 23:58:39 +0100 Message-Id: <20230206225846.1381789-6-heiko@sntech.de> X-Mailer: git-send-email 2.39.0 In-Reply-To: <20230206225846.1381789-1-heiko@sntech.de> References: <20230206225846.1381789-1-heiko@sntech.de> MIME-Version: 1.0 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20230206_145905_196747_3DDE5AFE X-CRM114-Status: GOOD ( 10.15 ) X-BeenThere: linux-riscv@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-riscv" Errors-To: linux-riscv-bounces+linux-riscv=archiver.kernel.org@lists.infradead.org From: Heiko Stuebner Add detection for the Zbkb extension. Zbkb is part of the set of scalar cryptography extensions and provides bitmanip instructions for cryptography, with them being a "subset of the Zbb extension particularly useful for cryptography". Zbkb was ratified in January 2022. Code using the extension is expected to pre-encode the Zbkb instructions, so no special toolchain requirements are introduced for now. 
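To illustrate what pre-encoding means here: rather than relying on the assembler to know the brev8 mnemonic, the instruction word can be assembled from its fields and emitted as raw data. The following is only a minimal user-space sketch in C, assuming the brev8 encoding from the ratified scalar cryptography specification (OP-IMM opcode 0x13, funct3 0b101, immediate 0x687). The names BREV8_TEMPLATE, encode_brev8 and emit_brev8 are hypothetical and not part of this series.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Assumed base encoding of "brev8 x0, x0":
 *   imm[11:0] = 0x687, rs1 = 0, funct3 = 0b101, rd = 0, opcode = 0x13
 */
#define BREV8_TEMPLATE 0x68705013u

/* Hypothetical helper: build the 32-bit word for "brev8 rd, rs". */
static inline uint32_t encode_brev8(unsigned int rd, unsigned int rs)
{
	return BREV8_TEMPLATE |
	       ((uint32_t)(rs & 0x1f) << 15) |	/* rs1 field, bits 19:15 */
	       ((uint32_t)(rd & 0x1f) << 7);	/* rd field, bits 11:7 */
}

/* Hypothetical helper: emit the instruction as a raw .word directive. */
static inline void emit_brev8(FILE *out, unsigned int rd, unsigned int rs)
{
	fprintf(out, ".word 0x%08x\n", encode_brev8(rd, rs));
}
```

With register numbers instead of names (t0 is x5), emit_brev8(stdout, 5, 5) writes a .word directive that any RISC-V assembler accepts, whether or not it knows Zbkb.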
Signed-off-by: Heiko Stuebner --- arch/riscv/include/asm/hwcap.h | 1 + arch/riscv/kernel/cpu.c | 1 + arch/riscv/kernel/cpufeature.c | 1 + 3 files changed, 3 insertions(+) diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h index 8673c2146d20..23427b9ed1e6 100644 --- a/arch/riscv/include/asm/hwcap.h +++ b/arch/riscv/include/asm/hwcap.h @@ -61,6 +61,7 @@ enum riscv_isa_ext_id { RISCV_ISA_EXT_SVPBMT, RISCV_ISA_EXT_ZBB, RISCV_ISA_EXT_ZBC, + RISCV_ISA_EXT_ZBKB, RISCV_ISA_EXT_ZICBOM, RISCV_ISA_EXT_ZIHINTPAUSE, RISCV_ISA_EXT_ID_MAX diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c index 995462a0de86..f9f361285b04 100644 --- a/arch/riscv/kernel/cpu.c +++ b/arch/riscv/kernel/cpu.c @@ -187,6 +187,7 @@ arch_initcall(riscv_cpuinfo_init); static struct riscv_isa_ext_data isa_ext_arr[] = { __RISCV_ISA_EXT_DATA(zbb, RISCV_ISA_EXT_ZBB), __RISCV_ISA_EXT_DATA(zbc, RISCV_ISA_EXT_ZBC), + __RISCV_ISA_EXT_DATA(zbkb, RISCV_ISA_EXT_ZBKB), __RISCV_ISA_EXT_DATA(zicbom, RISCV_ISA_EXT_ZICBOM), __RISCV_ISA_EXT_DATA(zihintpause, RISCV_ISA_EXT_ZIHINTPAUSE), __RISCV_ISA_EXT_DATA(sscofpmf, RISCV_ISA_EXT_SSCOFPMF), diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c index ba74f3fa2310..695dfd732483 100644 --- a/arch/riscv/kernel/cpufeature.c +++ b/arch/riscv/kernel/cpufeature.c @@ -236,6 +236,7 @@ printk("!!!! 
isa-string: %s\n\n\n", isa); SET_ISA_EXT_MAP("svpbmt", RISCV_ISA_EXT_SVPBMT); SET_ISA_EXT_MAP("zbb", RISCV_ISA_EXT_ZBB); SET_ISA_EXT_MAP("zbc", RISCV_ISA_EXT_ZBC); + SET_ISA_EXT_MAP("zbkb", RISCV_ISA_EXT_ZBKB); SET_ISA_EXT_MAP("zicbom", RISCV_ISA_EXT_ZICBOM); SET_ISA_EXT_MAP("zihintpause", RISCV_ISA_EXT_ZIHINTPAUSE); } From patchwork Mon Feb 6 22:58:40 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Heiko Stuebner X-Patchwork-Id: 13130733 X-Patchwork-Delegate: palmer@dabbelt.com Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id B3817C61DA4 for ; Mon, 6 Feb 2023 22:59:28 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To: Message-Id:Date:Subject:Cc:To:From:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=OvMWBYrcWcyp3I9mZsNgTNhgXRws5Mr/mnJgxhltWH8=; b=4Po49LMw+3N/ae ujmAHYOSL2/My5kepdxzpNaZb08uYoNaxf+XKxzE5xTi4Cv7A0C+0h0o8l1VwithnBm3GDEdGJRZi tdpTprREuCEzGxvgy/i+EkCbs9DMsZeWVLl29fgdVKBlaxPJ16bqoSHyHRVEU7iEjEtjazbB3Ze6x We52wRH2K5d+axnWslceh8iNC9TeI4HdGAgceGwz8RIINvbcVj4j1qAwF8Px8llYyIv+3unb8KyoG zBwfkCDslmNqEN6h9vev0kvYcnuQHujLRnhlMAT4Ta5gVrlh+mns0TJTDjKkXWneYB6uJPeY9nZRB 69iEnjFWTtNeMHKpC7UA==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.94.2 #2 (Red Hat Linux)) id 1pPASH-00A7Qk-HL; Mon, 06 Feb 2023 22:59:21 +0000 Received: from 
desiato.infradead.org ([2001:8b0:10b:1:d65d:64ff:fe57:4e05]) by bombadil.infradead.org with esmtps (Exim 4.94.2 #2 (Red Hat Linux)) id 1pPASG-00A7Ox-4g for linux-riscv@bombadil.infradead.org; Mon, 06 Feb 2023 22:59:20 +0000 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=desiato.20200630; h=Content-Transfer-Encoding:MIME-Version :References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender:Reply-To: Content-Type:Content-ID:Content-Description; bh=vJF6CvlSXvmq7qX4eunn3BoeOyl7pdqADErsJfHPImU=; b=WnHci2sMnUk93xgn2avH79ybkv gLEidXi4+quzgCF+LxjjusWAp7yL9bEM88Rmj1Y/VQsp9RhHrnn7oDz6y5RHSgXCL+OIdkltPh8Oy XSIL/Wbs+B9/lH51Z4zLX7IOXOHqCo3B1v1MfuLVIjNcEREFs/EVJGsHynqZaLupFN1TZeavfvaio 0aj0ovYANCy/w9RTHPBJtshLuuCc8hNDmrH6BKmOSa+LE7CQrV4pX4Q5ZIW6YDq62fBqW05/zqUmS DCiHYvO8eRfNOfexTgnFtx1cbA7dAuoW+3HLbFzxNhiOOl3s9KlxGOHDSbmU8sjRooaVj9wIkdIac QnRwntCA==; Received: from gloria.sntech.de ([185.11.138.130]) by desiato.infradead.org with esmtps (Exim 4.96 #2 (Red Hat Linux)) id 1pPARV-006ho1-20 for linux-riscv@lists.infradead.org; Mon, 06 Feb 2023 22:58:35 +0000 Received: from ip5b412258.dynamic.kabel-deutschland.de ([91.65.34.88] helo=phil.lan) by gloria.sntech.de with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1pPARl-0002Mb-Qy; Mon, 06 Feb 2023 23:58:49 +0100 From: Heiko Stuebner To: palmer@rivosinc.com Cc: greentime.hu@sifive.com, conor@kernel.org, linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org, christoph.muellner@vrull.eu, Heiko Stuebner Subject: [PATCH RFC 06/12] RISC-V: hook new crypto subdir into build-system Date: Mon, 6 Feb 2023 23:58:40 +0100 Message-Id: <20230206225846.1381789-7-heiko@sntech.de> X-Mailer: git-send-email 2.39.0 In-Reply-To: <20230206225846.1381789-1-heiko@sntech.de> References: <20230206225846.1381789-1-heiko@sntech.de> MIME-Version: 1.0 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: 
sfid-20230206_225833_944605_61C4F61C X-CRM114-Status: GOOD ( 14.17 ) X-BeenThere: linux-riscv@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-riscv" Errors-To: linux-riscv-bounces+linux-riscv=archiver.kernel.org@lists.infradead.org From: Heiko Stuebner Create a crypto subdirectory for added accelerated cryptography routines and hook it into the riscv Kbuild and the main crypto Kconfig. Signed-off-by: Heiko Stuebner --- arch/riscv/Kbuild | 1 + arch/riscv/crypto/Kconfig | 5 +++++ arch/riscv/crypto/Makefile | 4 ++++ crypto/Kconfig | 3 +++ 4 files changed, 13 insertions(+) create mode 100644 arch/riscv/crypto/Kconfig create mode 100644 arch/riscv/crypto/Makefile diff --git a/arch/riscv/Kbuild b/arch/riscv/Kbuild index afa83e307a2e..250d1fd38618 100644 --- a/arch/riscv/Kbuild +++ b/arch/riscv/Kbuild @@ -2,6 +2,7 @@ obj-y += kernel/ mm/ net/ obj-$(CONFIG_BUILTIN_DTB) += boot/dts/ +obj-$(CONFIG_CRYPTO) += crypto/ obj-y += errata/ obj-$(CONFIG_KVM) += kvm/ diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig new file mode 100644 index 000000000000..10d60edc0110 --- /dev/null +++ b/arch/riscv/crypto/Kconfig @@ -0,0 +1,5 @@ +# SPDX-License-Identifier: GPL-2.0 + +menu "Accelerated Cryptographic Algorithms for CPU (riscv)" + +endmenu diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile new file mode 100644 index 000000000000..b3b6332c9f6d --- /dev/null +++ b/arch/riscv/crypto/Makefile @@ -0,0 +1,4 @@ +# SPDX-License-Identifier: GPL-2.0-only +# +# linux/arch/riscv/crypto/Makefile +# diff --git a/crypto/Kconfig b/crypto/Kconfig index 9c86f7045157..003921cb0301 100644 --- a/crypto/Kconfig +++ b/crypto/Kconfig @@ -1401,6 +1401,9 @@ endif if PPC source "arch/powerpc/crypto/Kconfig" endif +if RISCV +source "arch/riscv/crypto/Kconfig" +endif if S390 source "arch/s390/crypto/Kconfig" endif From patchwork Mon Feb 6 22:58:41 2023 Content-Type: 
text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Heiko Stuebner X-Patchwork-Id: 13130731 X-Patchwork-Delegate: palmer@dabbelt.com Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 7C71AC64EC4 for ; Mon, 6 Feb 2023 22:59:20 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To: Message-Id:Date:Subject:Cc:To:From:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=MxxIFf5mukwKO3tcvy0lP9DXZI8q0r7VBXoqI9QRl6o=; b=MY29TiGFCtp8h9 TAAwDeUk2L2evsdWqBTa4e3w62/ad7VbAvrZX/12Vo79IlV5OIfTOgEy9ZERm06Nh7KmHYWDcuy6d +j23L7Y7PPewgDg3fXJ0XAm1kkULBklgh19hwGv8yhRnaRAtRCelqzMlcG26VvH/LbA5I3K3gZeYy b780J8MpZOMpK6S4FX4/JW2xP7qiZN8L+VVOrcBKbcQ86J7bk8lSCfw8aVz6j+0+yR3mS2e/Ga/+O NHVBNNUjibhf+SYHh2wB5U4ml0GOThhSzc1ECEEHUcakdb5IV3ZQPmXbqb0hZXPjTN4OPXByox6y5 IrOQiUjv9RdIUg768IrA==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.94.2 #2 (Red Hat Linux)) id 1pPAS8-00A7KZ-P3; Mon, 06 Feb 2023 22:59:12 +0000 Received: from gloria.sntech.de ([185.11.138.130]) by bombadil.infradead.org with esmtps (Exim 4.94.2 #2 (Red Hat Linux)) id 1pPAS1-00A7F4-4A for linux-riscv@lists.infradead.org; Mon, 06 Feb 2023 22:59:09 +0000 Received: from ip5b412258.dynamic.kabel-deutschland.de ([91.65.34.88] helo=phil.lan) by gloria.sntech.de with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 
4.94.2) (envelope-from ) id 1pPARm-0002Mb-31; Mon, 06 Feb 2023 23:58:50 +0100 From: Heiko Stuebner To: palmer@rivosinc.com Cc: greentime.hu@sifive.com, conor@kernel.org, linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org, christoph.muellner@vrull.eu, Heiko Stuebner Subject: [PATCH RFC 07/12] RISC-V: crypto: add accelerated GCM GHASH implementation Date: Mon, 6 Feb 2023 23:58:41 +0100 Message-Id: <20230206225846.1381789-8-heiko@sntech.de> X-Mailer: git-send-email 2.39.0 In-Reply-To: <20230206225846.1381789-1-heiko@sntech.de> References: <20230206225846.1381789-1-heiko@sntech.de> MIME-Version: 1.0 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20230206_145905_493856_3BB64C54 X-CRM114-Status: GOOD ( 31.80 ) X-BeenThere: linux-riscv@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-riscv" Errors-To: linux-riscv-bounces+linux-riscv=archiver.kernel.org@lists.infradead.org From: Heiko Stuebner With different sets of available extensions, a number of different implementation variants are possible. Quite a number of them are already implemented in OpenSSL or are in the process of being implemented, so pick the relevant OpenSSL code and add suitable glue code, similar to arm64 and powerpc, to use it for kernel-specific cryptography. The prioritization of the algorithms follows the ifdef chain for the assembly callbacks in OpenSSL, but here the algorithms get registered separately so that all of them can be part of the crypto selftests. The crypto subsystem will select the registered algorithm with the highest priority (intended to be the most performant one on the running system) but will selftest all registered ones. 
As a first step, this adds scalar variants using Zbc, Zbb and, where available, Zbkb (the bitmanip crypto extension). The Perl implementation stems from the OpenSSL pull request at https://github.com/openssl/openssl/pull/20078 Signed-off-by: Heiko Stuebner --- arch/riscv/crypto/Kconfig | 11 + arch/riscv/crypto/Makefile | 14 + arch/riscv/crypto/ghash-riscv64-glue.c | 263 ++++++++++++++++ arch/riscv/crypto/ghash-riscv64-zbc.pl | 400 +++++++++++++++++++++++++ arch/riscv/crypto/riscv.pm | 230 ++++++++++++++ 5 files changed, 918 insertions(+) create mode 100644 arch/riscv/crypto/ghash-riscv64-glue.c create mode 100644 arch/riscv/crypto/ghash-riscv64-zbc.pl create mode 100644 arch/riscv/crypto/riscv.pm diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig index 10d60edc0110..010adbbb058a 100644 --- a/arch/riscv/crypto/Kconfig +++ b/arch/riscv/crypto/Kconfig @@ -2,4 +2,15 @@ menu "Accelerated Cryptographic Algorithms for CPU (riscv)" +config CRYPTO_GHASH_RISCV64 + tristate "Hash functions: GHASH" + depends on 64BIT && RISCV_ISA_ZBC + select CRYPTO_HASH + select CRYPTO_LIB_GF128MUL + help + GCM GHASH function (NIST SP800-38D) + + Architecture: riscv64 using one of: + - ZBC extension + endmenu diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile index b3b6332c9f6d..0a158919e9da 100644 --- a/arch/riscv/crypto/Makefile +++ b/arch/riscv/crypto/Makefile @@ -2,3 +2,17 @@ # # linux/arch/riscv/crypto/Makefile # + +obj-$(CONFIG_CRYPTO_GHASH_RISCV64) += ghash-riscv64.o +ghash-riscv64-y := ghash-riscv64-glue.o +ifdef CONFIG_RISCV_ISA_ZBC +ghash-riscv64-y += ghash-riscv64-zbc.o +endif + +quiet_cmd_perlasm = PERLASM $@ + cmd_perlasm = $(PERL) $(<) void $(@) + +$(obj)/ghash-riscv64-zbc.S: $(src)/ghash-riscv64-zbc.pl + $(call cmd,perlasm) + +clean-files += ghash-riscv64-zbc.S diff --git a/arch/riscv/crypto/ghash-riscv64-glue.c b/arch/riscv/crypto/ghash-riscv64-glue.c new file mode 100644 index 000000000000..9802b6718c3c --- /dev/null +++ 
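The selection rule described above can be modelled outside the kernel. What follows is only a simplified user-space sketch in C, not the crypto API itself: struct ghash_variant and pick_ghash are hypothetical stand-ins, and the "available" flag plays the role of the riscv_isa_extension_available() checks that decide whether a variant gets registered at all. Among the registered variants, the one with the largest priority wins.

```c
#include <stddef.h>

/* Hypothetical stand-in for the few shash_alg fields that matter here. */
struct ghash_variant {
	const char *driver_name;	/* models .base.cra_driver_name */
	int priority;			/* models .base.cra_priority */
	int available;			/* 1 if the needed extensions were detected */
};

/* Return the available variant with the highest priority, or NULL if none. */
static inline const struct ghash_variant *
pick_ghash(const struct ghash_variant *v, size_t n)
{
	const struct ghash_variant *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!v[i].available)
			continue;
		if (!best || v[i].priority > best->priority)
			best = &v[i];
	}
	return best;
}
```

Fed with the driver names and priorities used in this patch (250 for Zbc only, 251 for Zbc plus Zbb, 252 for Zbc plus Zbkb), the sketch picks the Zbkb variant when all three are available and falls back to the Zbb one otherwise.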
b/arch/riscv/crypto/ghash-riscv64-glue.c @@ -0,0 +1,263 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * GHASH routines supporting VMX instructions on the Power 8 + * + * Copyright (C) 2015, 2019 International Business Machines Inc. + * + * Author: Marcelo Henrique Cerri + * + * Extended by Daniel Axtens to replace the fallback + * mechanism. The new approach is based on arm64 code, which is: + * Copyright (C) 2014 - 2018 Linaro Ltd. + */ + +#include +#include +#include +#include +#include +#include +#include +#include + +/* Zbc (optional with zbkb improvements) */ +void gcm_ghash_rv64i_zbc(u64 Xi[2], const u128 Htable[16], + const u8 *inp, size_t len); +void gcm_ghash_rv64i_zbc__zbkb(u64 Xi[2], const u128 Htable[16], + const u8 *inp, size_t len); + +struct riscv64_ghash_ctx { + void (*ghash_func)(u64 Xi[2], const u128 Htable[16], + const u8 *inp, size_t len); + + /* key used by vector asm */ + u128 htable[16]; + /* key used by software fallback */ + be128 key; +}; + +struct riscv64_ghash_desc_ctx { + u64 shash[2]; + u8 buffer[GHASH_DIGEST_SIZE]; + int bytes; +}; + +static int riscv64_ghash_init(struct shash_desc *desc) +{ + struct riscv64_ghash_desc_ctx *dctx = shash_desc_ctx(desc); + + dctx->bytes = 0; + memset(dctx->shash, 0, GHASH_DIGEST_SIZE); + return 0; +} + +#ifdef CONFIG_RISCV_ISA_ZBC + +#define RISCV64_ZBC_SETKEY(VARIANT, GHASH) \ +void gcm_init_rv64i_ ## VARIANT(u128 Htable[16], const u64 Xi[2]); \ +static int riscv64_zbc_ghash_setkey_ ## VARIANT(struct crypto_shash *tfm, \ + const u8 *key, \ + unsigned int keylen) \ +{ \ + struct riscv64_ghash_ctx *ctx = crypto_tfm_ctx(crypto_shash_tfm(tfm)); \ + const u64 k[2] = { cpu_to_be64(((const u64 *)key)[0]), \ + cpu_to_be64(((const u64 *)key)[1]) }; \ + \ + if (keylen != GHASH_BLOCK_SIZE) \ + return -EINVAL; \ + \ + memcpy(&ctx->key, key, GHASH_BLOCK_SIZE); \ + gcm_init_rv64i_ ## VARIANT(ctx->htable, k); \ + \ + ctx->ghash_func = gcm_ghash_rv64i_ ## GHASH; \ + \ + return 0; \ +} + +static int 
riscv64_zbc_ghash_update(struct shash_desc *desc, + const u8 *src, unsigned int srclen) +{ + unsigned int len; + struct riscv64_ghash_ctx *ctx = crypto_tfm_ctx(crypto_shash_tfm(desc->tfm)); + struct riscv64_ghash_desc_ctx *dctx = shash_desc_ctx(desc); + + if (dctx->bytes) { + if (dctx->bytes + srclen < GHASH_DIGEST_SIZE) { + memcpy(dctx->buffer + dctx->bytes, src, + srclen); + dctx->bytes += srclen; + return 0; + } + memcpy(dctx->buffer + dctx->bytes, src, + GHASH_DIGEST_SIZE - dctx->bytes); + + ctx->ghash_func(dctx->shash, ctx->htable, + dctx->buffer, GHASH_DIGEST_SIZE); + + src += GHASH_DIGEST_SIZE - dctx->bytes; + srclen -= GHASH_DIGEST_SIZE - dctx->bytes; + dctx->bytes = 0; + } + len = srclen & ~(GHASH_DIGEST_SIZE - 1); + + if (len) { + gcm_ghash_rv64i_zbc(dctx->shash, ctx->htable, + src, len); + src += len; + srclen -= len; + } + + if (srclen) { + memcpy(dctx->buffer, src, srclen); + dctx->bytes = srclen; + } + return 0; +} + +static int riscv64_zbc_ghash_final(struct shash_desc *desc, u8 *out) +{ + int i; + struct riscv64_ghash_ctx *ctx = crypto_tfm_ctx(crypto_shash_tfm(desc->tfm)); + struct riscv64_ghash_desc_ctx *dctx = shash_desc_ctx(desc); + + if (dctx->bytes) { + for (i = dctx->bytes; i < GHASH_DIGEST_SIZE; i++) + dctx->buffer[i] = 0; + ctx->ghash_func(dctx->shash, ctx->htable, + dctx->buffer, GHASH_DIGEST_SIZE); + dctx->bytes = 0; + } + memcpy(out, dctx->shash, GHASH_DIGEST_SIZE); + return 0; +} + +RISCV64_ZBC_SETKEY(zbc, zbc); +struct shash_alg riscv64_zbc_ghash_alg = { + .digestsize = GHASH_DIGEST_SIZE, + .init = riscv64_ghash_init, + .update = riscv64_zbc_ghash_update, + .final = riscv64_zbc_ghash_final, + .setkey = riscv64_zbc_ghash_setkey_zbc, + .descsize = sizeof(struct riscv64_ghash_desc_ctx) + + sizeof(struct ghash_desc_ctx), + .base = { + .cra_name = "ghash", + .cra_driver_name = "riscv64_zbc_ghash", + .cra_priority = 250, + .cra_blocksize = GHASH_BLOCK_SIZE, + .cra_ctxsize = sizeof(struct riscv64_ghash_ctx), + .cra_module = THIS_MODULE, + }, 
+}; + +RISCV64_ZBC_SETKEY(zbc__zbb, zbc); +struct shash_alg riscv64_zbc_zbb_ghash_alg = { + .digestsize = GHASH_DIGEST_SIZE, + .init = riscv64_ghash_init, + .update = riscv64_zbc_ghash_update, + .final = riscv64_zbc_ghash_final, + .setkey = riscv64_zbc_ghash_setkey_zbc__zbb, + .descsize = sizeof(struct riscv64_ghash_desc_ctx) + + sizeof(struct ghash_desc_ctx), + .base = { + .cra_name = "ghash", + .cra_driver_name = "riscv64_zbc_zbb_ghash", + .cra_priority = 251, + .cra_blocksize = GHASH_BLOCK_SIZE, + .cra_ctxsize = sizeof(struct riscv64_ghash_ctx), + .cra_module = THIS_MODULE, + }, +}; + +RISCV64_ZBC_SETKEY(zbc__zbkb, zbc__zbkb); +struct shash_alg riscv64_zbc_zbkb_ghash_alg = { + .digestsize = GHASH_DIGEST_SIZE, + .init = riscv64_ghash_init, + .update = riscv64_zbc_ghash_update, + .final = riscv64_zbc_ghash_final, + .setkey = riscv64_zbc_ghash_setkey_zbc__zbkb, + .descsize = sizeof(struct riscv64_ghash_desc_ctx) + + sizeof(struct ghash_desc_ctx), + .base = { + .cra_name = "ghash", + .cra_driver_name = "riscv64_zbc_zbkb_ghash", + .cra_priority = 252, + .cra_blocksize = GHASH_BLOCK_SIZE, + .cra_ctxsize = sizeof(struct riscv64_ghash_ctx), + .cra_module = THIS_MODULE, + }, +}; + +#endif /* CONFIG_RISCV_ISA_ZBC */ + +#define RISCV64_DEFINED_GHASHES 7 + +static struct shash_alg *riscv64_ghashes[RISCV64_DEFINED_GHASHES]; +static int num_riscv64_ghashes; + +static int __init riscv64_ghash_register(struct shash_alg *ghash) +{ + int ret; + + ret = crypto_register_shash(ghash); + if (ret < 0) { + int i; + + for (i = num_riscv64_ghashes - 1; i >= 0 ; i--) + crypto_unregister_shash(riscv64_ghashes[i]); + + num_riscv64_ghashes = 0; + + return ret; + } + + pr_debug("Registered RISC-V ghash %s\n", ghash->base.cra_driver_name); + riscv64_ghashes[num_riscv64_ghashes] = ghash; + num_riscv64_ghashes++; + return 0; +} + +static int __init riscv64_ghash_mod_init(void) +{ + int ret = 0; + +#ifdef CONFIG_RISCV_ISA_ZBC + if (riscv_isa_extension_available(NULL, ZBC)) { + ret = 
riscv64_ghash_register(&riscv64_zbc_ghash_alg); + if (ret < 0) + return ret; + + if (riscv_isa_extension_available(NULL, ZBB)) { + ret = riscv64_ghash_register(&riscv64_zbc_zbb_ghash_alg); + if (ret < 0) + return ret; + } + + if (riscv_isa_extension_available(NULL, ZBKB)) { + ret = riscv64_ghash_register(&riscv64_zbc_zbkb_ghash_alg); + if (ret < 0) + return ret; + } + } +#endif + + return 0; +} + +static void __exit riscv64_ghash_mod_fini(void) +{ + int i; + + for (i = num_riscv64_ghashes - 1; i >= 0 ; i--) + crypto_unregister_shash(riscv64_ghashes[i]); + + num_riscv64_ghashes = 0; +} + +module_init(riscv64_ghash_mod_init); +module_exit(riscv64_ghash_mod_fini); + +MODULE_DESCRIPTION("GCM GHASH (accelerated)"); +MODULE_AUTHOR("Heiko Stuebner "); +MODULE_LICENSE("GPL v2"); +MODULE_ALIAS_CRYPTO("ghash"); diff --git a/arch/riscv/crypto/ghash-riscv64-zbc.pl b/arch/riscv/crypto/ghash-riscv64-zbc.pl new file mode 100644 index 000000000000..691231ffa11c --- /dev/null +++ b/arch/riscv/crypto/ghash-riscv64-zbc.pl @@ -0,0 +1,400 @@ +#! /usr/bin/env perl +# Copyright 2022 The OpenSSL Project Authors. All Rights Reserved. +# +# Licensed under the Apache License 2.0 (the "License"). You may not use +# this file except in compliance with the License. You can obtain a copy +# in the file LICENSE in the source distribution or at +# https://www.openssl.org/source/license.html + +use strict; +use warnings; + +use FindBin qw($Bin); +use lib "$Bin"; +use lib "$Bin/../../perlasm"; +use riscv; + +# $output is the last argument if it looks like a file (it has an extension) +# $flavour is the first argument if it doesn't look like a file +my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef; +my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? 
shift : undef; + +$output and open STDOUT,">$output"; + +my $code=<<___; +.text +___ + +################################################################################ +# void gcm_init_rv64i_zbc(u128 Htable[16], const u64 H[2]); +# void gcm_init_rv64i_zbc__zbb(u128 Htable[16], const u64 H[2]); +# void gcm_init_rv64i_zbc__zbkb(u128 Htable[16], const u64 H[2]); +# +# input: H: 128-bit H - secret parameter E(K, 0^128) +# output: Htable: Preprocessed key data for gcm_gmult_rv64i_zbc* and +# gcm_ghash_rv64i_zbc* +# +# All callers of this function revert the byte-order unconditionally +# on little-endian machines. So we need to revert the byte-order back. +# Additionally we reverse the bits of each byte. + +{ +my ($Htable,$H,$VAL0,$VAL1,$TMP0,$TMP1,$TMP2) = ("a0","a1","a2","a3","t0","t1","t2"); + +$code .= <<___; +.p2align 3 +.globl gcm_init_rv64i_zbc +.type gcm_init_rv64i_zbc,\@function +gcm_init_rv64i_zbc: + ld $VAL0,0($H) + ld $VAL1,8($H) + @{[brev8_rv64i $VAL0, $TMP0, $TMP1, $TMP2]} + @{[brev8_rv64i $VAL1, $TMP0, $TMP1, $TMP2]} + @{[sd_rev8_rv64i $VAL0, $Htable, 0, $TMP0]} + @{[sd_rev8_rv64i $VAL1, $Htable, 8, $TMP0]} + ret +.size gcm_init_rv64i_zbc,.-gcm_init_rv64i_zbc +___ +} + +{ +my ($Htable,$H,$VAL0,$VAL1,$TMP0,$TMP1,$TMP2) = ("a0","a1","a2","a3","t0","t1","t2"); + +$code .= <<___; +.p2align 3 +.globl gcm_init_rv64i_zbc__zbb +.type gcm_init_rv64i_zbc__zbb,\@function +gcm_init_rv64i_zbc__zbb: + ld $VAL0,0($H) + ld $VAL1,8($H) + @{[brev8_rv64i $VAL0, $TMP0, $TMP1, $TMP2]} + @{[brev8_rv64i $VAL1, $TMP0, $TMP1, $TMP2]} + @{[rev8 $VAL0, $VAL0]} + @{[rev8 $VAL1, $VAL1]} + sd $VAL0,0($Htable) + sd $VAL1,8($Htable) + ret +.size gcm_init_rv64i_zbc__zbb,.-gcm_init_rv64i_zbc__zbb +___ +} + +{ +my ($Htable,$H,$TMP0,$TMP1) = ("a0","a1","t0","t1"); + +$code .= <<___; +.p2align 3 +.globl gcm_init_rv64i_zbc__zbkb +.type gcm_init_rv64i_zbc__zbkb,\@function +gcm_init_rv64i_zbc__zbkb: + ld $TMP0,0($H) + ld $TMP1,8($H) + @{[brev8 $TMP0, $TMP0]} + @{[brev8 $TMP1, $TMP1]} + @{[rev8 
$TMP0, $TMP0]} + @{[rev8 $TMP1, $TMP1]} + sd $TMP0,0($Htable) + sd $TMP1,8($Htable) + ret +.size gcm_init_rv64i_zbc__zbkb,.-gcm_init_rv64i_zbc__zbkb +___ +} + +################################################################################ +# void gcm_gmult_rv64i_zbc(u64 Xi[2], const u128 Htable[16]); +# void gcm_gmult_rv64i_zbc__zbkb(u64 Xi[2], const u128 Htable[16]); +# +# input: Xi: current hash value +# Htable: copy of H +# output: Xi: next hash value Xi +# +# Compute GMULT (Xi*H mod f) using the Zbc (clmul) and Zbb (basic bit manip) +# extensions. Using the no-Karatsuba approach and clmul for the final reduction. +# This results in an implementation with a minimized number of instructions. +# HW with clmul latencies higher than 2 cycles might observe a performance +# improvement with Karatsuba. HW with clmul latencies higher than 6 cycles +# might observe a performance improvement with additionally converting the +# reduction to shift&xor. For a full discussion of these estimates see +# https://github.com/riscv/riscv-crypto/blob/master/doc/supp/gcm-mode-cmul.adoc +{ +my ($Xi,$Htable,$x0,$x1,$y0,$y1) = ("a0","a1","a4","a5","a6","a7"); +my ($z0,$z1,$z2,$z3,$t0,$t1,$polymod) = ("t0","t1","t2","t3","t4","t5","t6"); + +$code .= <<___; +.p2align 3 +.globl gcm_gmult_rv64i_zbc +.type gcm_gmult_rv64i_zbc,\@function +gcm_gmult_rv64i_zbc: + # Load Xi and bit-reverse it + ld $x0, 0($Xi) + ld $x1, 8($Xi) + @{[brev8_rv64i $x0, $z0, $z1, $z2]} + @{[brev8_rv64i $x1, $z0, $z1, $z2]} + + # Load the key (already bit-reversed) + ld $y0, 0($Htable) + ld $y1, 8($Htable) + + # Load the reduction constant + la $polymod, Lpolymod + lbu $polymod, 0($polymod) + + # Multiplication (without Karatsuba) + @{[clmulh $z3, $x1, $y1]} + @{[clmul $z2, $x1, $y1]} + @{[clmulh $t1, $x0, $y1]} + @{[clmul $z1, $x0, $y1]} + xor $z2, $z2, $t1 + @{[clmulh $t1, $x1, $y0]} + @{[clmul $t0, $x1, $y0]} + xor $z2, $z2, $t1 + xor $z1, $z1, $t0 + @{[clmulh $t1, $x0, $y0]} + @{[clmul $z0, $x0, $y0]} + xor $z1, 
$z1, $t1 + + # Reduction with clmul + @{[clmulh $t1, $z3, $polymod]} + @{[clmul $t0, $z3, $polymod]} + xor $z2, $z2, $t1 + xor $z1, $z1, $t0 + @{[clmulh $t1, $z2, $polymod]} + @{[clmul $t0, $z2, $polymod]} + xor $x1, $z1, $t1 + xor $x0, $z0, $t0 + + # Bit-reverse Xi back and store it + @{[brev8_rv64i $x0, $z0, $z1, $z2]} + @{[brev8_rv64i $x1, $z0, $z1, $z2]} + sd $x0, 0($Xi) + sd $x1, 8($Xi) + ret +.size gcm_gmult_rv64i_zbc,.-gcm_gmult_rv64i_zbc +___ +} + +{ +my ($Xi,$Htable,$x0,$x1,$y0,$y1) = ("a0","a1","a4","a5","a6","a7"); +my ($z0,$z1,$z2,$z3,$t0,$t1,$polymod) = ("t0","t1","t2","t3","t4","t5","t6"); + +$code .= <<___; +.p2align 3 +.globl gcm_gmult_rv64i_zbc__zbkb +.type gcm_gmult_rv64i_zbc__zbkb,\@function +gcm_gmult_rv64i_zbc__zbkb: + # Load Xi and bit-reverse it + ld $x0, 0($Xi) + ld $x1, 8($Xi) + @{[brev8 $x0, $x0]} + @{[brev8 $x1, $x1]} + + # Load the key (already bit-reversed) + ld $y0, 0($Htable) + ld $y1, 8($Htable) + + # Load the reduction constant + la $polymod, Lpolymod + lbu $polymod, 0($polymod) + + # Multiplication (without Karatsuba) + @{[clmulh $z3, $x1, $y1]} + @{[clmul $z2, $x1, $y1]} + @{[clmulh $t1, $x0, $y1]} + @{[clmul $z1, $x0, $y1]} + xor $z2, $z2, $t1 + @{[clmulh $t1, $x1, $y0]} + @{[clmul $t0, $x1, $y0]} + xor $z2, $z2, $t1 + xor $z1, $z1, $t0 + @{[clmulh $t1, $x0, $y0]} + @{[clmul $z0, $x0, $y0]} + xor $z1, $z1, $t1 + + # Reduction with clmul + @{[clmulh $t1, $z3, $polymod]} + @{[clmul $t0, $z3, $polymod]} + xor $z2, $z2, $t1 + xor $z1, $z1, $t0 + @{[clmulh $t1, $z2, $polymod]} + @{[clmul $t0, $z2, $polymod]} + xor $x1, $z1, $t1 + xor $x0, $z0, $t0 + + # Bit-reverse Xi back and store it + @{[brev8 $x0, $x0]} + @{[brev8 $x1, $x1]} + sd $x0, 0($Xi) + sd $x1, 8($Xi) + ret +.size gcm_gmult_rv64i_zbc__zbkb,.-gcm_gmult_rv64i_zbc__zbkb +___ +} + +################################################################################ +# void gcm_ghash_rv64i_zbc(u64 Xi[2], const u128 Htable[16], +# const u8 *inp, size_t len); +# void 
gcm_ghash_rv64i_zbc__zbkb(u64 Xi[2], const u128 Htable[16], +# const u8 *inp, size_t len); +# +# input: Xi: current hash value +# Htable: copy of H +# inp: pointer to input data +# len: length of input data in bytes (multiple of block size) +# output: Xi: Xi+1 (next hash value Xi) +{ +my ($Xi,$Htable,$inp,$len,$x0,$x1,$y0,$y1) = ("a0","a1","a2","a3","a4","a5","a6","a7"); +my ($z0,$z1,$z2,$z3,$t0,$t1,$polymod) = ("t0","t1","t2","t3","t4","t5","t6"); + +$code .= <<___; +.p2align 3 +.globl gcm_ghash_rv64i_zbc +.type gcm_ghash_rv64i_zbc,\@function +gcm_ghash_rv64i_zbc: + # Load Xi and bit-reverse it + ld $x0, 0($Xi) + ld $x1, 8($Xi) + @{[brev8_rv64i $x0, $z0, $z1, $z2]} + @{[brev8_rv64i $x1, $z0, $z1, $z2]} + + # Load the key (already bit-reversed) + ld $y0, 0($Htable) + ld $y1, 8($Htable) + + # Load the reduction constant + la $polymod, Lpolymod + lbu $polymod, 0($polymod) + +Lstep: + # Load the input data, bit-reverse them, and XOR them with Xi + ld $t0, 0($inp) + ld $t1, 8($inp) + add $inp, $inp, 16 + add $len, $len, -16 + @{[brev8_rv64i $t0, $z0, $z1, $z2]} + @{[brev8_rv64i $t1, $z0, $z1, $z2]} + xor $x0, $x0, $t0 + xor $x1, $x1, $t1 + + # Multiplication (without Karatsuba) + @{[clmulh $z3, $x1, $y1]} + @{[clmul $z2, $x1, $y1]} + @{[clmulh $t1, $x0, $y1]} + @{[clmul $z1, $x0, $y1]} + xor $z2, $z2, $t1 + @{[clmulh $t1, $x1, $y0]} + @{[clmul $t0, $x1, $y0]} + xor $z2, $z2, $t1 + xor $z1, $z1, $t0 + @{[clmulh $t1, $x0, $y0]} + @{[clmul $z0, $x0, $y0]} + xor $z1, $z1, $t1 + + # Reduction with clmul + @{[clmulh $t1, $z3, $polymod]} + @{[clmul $t0, $z3, $polymod]} + xor $z2, $z2, $t1 + xor $z1, $z1, $t0 + @{[clmulh $t1, $z2, $polymod]} + @{[clmul $t0, $z2, $polymod]} + xor $x1, $z1, $t1 + xor $x0, $z0, $t0 + + # Iterate over all blocks + bnez $len, Lstep + + # Bit-reverse final Xi back and store it + @{[brev8_rv64i $x0, $z0, $z1, $z2]} + @{[brev8_rv64i $x1, $z0, $z1, $z2]} + sd $x0, 0($Xi) + sd $x1, 8($Xi) + ret +.size gcm_ghash_rv64i_zbc,.-gcm_ghash_rv64i_zbc +___ +} + 
+{ +my ($Xi,$Htable,$inp,$len,$x0,$x1,$y0,$y1) = ("a0","a1","a2","a3","a4","a5","a6","a7"); +my ($z0,$z1,$z2,$z3,$t0,$t1,$polymod) = ("t0","t1","t2","t3","t4","t5","t6"); + +$code .= <<___; +.p2align 3 +.globl gcm_ghash_rv64i_zbc__zbkb +.type gcm_ghash_rv64i_zbc__zbkb,\@function +gcm_ghash_rv64i_zbc__zbkb: + # Load Xi and bit-reverse it + ld $x0, 0($Xi) + ld $x1, 8($Xi) + @{[brev8 $x0, $x0]} + @{[brev8 $x1, $x1]} + + # Load the key (already bit-reversed) + ld $y0, 0($Htable) + ld $y1, 8($Htable) + + # Load the reduction constant + la $polymod, Lpolymod + lbu $polymod, 0($polymod) + +Lstep_zkbk: + # Load the input data, bit-reverse them, and XOR them with Xi + ld $t0, 0($inp) + ld $t1, 8($inp) + add $inp, $inp, 16 + add $len, $len, -16 + @{[brev8 $t0, $t0]} + @{[brev8 $t1, $t1]} + xor $x0, $x0, $t0 + xor $x1, $x1, $t1 + + # Multiplication (without Karatsuba) + @{[clmulh $z3, $x1, $y1]} + @{[clmul $z2, $x1, $y1]} + @{[clmulh $t1, $x0, $y1]} + @{[clmul $z1, $x0, $y1]} + xor $z2, $z2, $t1 + @{[clmulh $t1, $x1, $y0]} + @{[clmul $t0, $x1, $y0]} + xor $z2, $z2, $t1 + xor $z1, $z1, $t0 + @{[clmulh $t1, $x0, $y0]} + @{[clmul $z0, $x0, $y0]} + xor $z1, $z1, $t1 + + # Reduction with clmul + @{[clmulh $t1, $z3, $polymod]} + @{[clmul $t0, $z3, $polymod]} + xor $z2, $z2, $t1 + xor $z1, $z1, $t0 + @{[clmulh $t1, $z2, $polymod]} + @{[clmul $t0, $z2, $polymod]} + xor $x1, $z1, $t1 + xor $x0, $z0, $t0 + + # Iterate over all blocks + bnez $len, Lstep_zkbk + + # Bit-reverse final Xi back and store it + @{[brev8 $x0, $x0]} + @{[brev8 $x1, $x1]} + sd $x0, 0($Xi) + sd $x1, 8($Xi) + ret +.size gcm_ghash_rv64i_zbc__zbkb,.-gcm_ghash_rv64i_zbc__zbkb +___ +} + +$code .= <<___; +.p2align 3 +Lbrev8_const: + .dword 0xAAAAAAAAAAAAAAAA + .dword 0xCCCCCCCCCCCCCCCC + .dword 0xF0F0F0F0F0F0F0F0 +.size Lbrev8_const,.-Lbrev8_const + +Lpolymod: + .byte 0x87 +.size Lpolymod,.-Lpolymod +___ + +print $code; + +close STDOUT or die "error closing STDOUT: $!"; diff --git a/arch/riscv/crypto/riscv.pm 
b/arch/riscv/crypto/riscv.pm
new file mode 100644
index 000000000000..61bc4fc41a43
--- /dev/null
+++ b/arch/riscv/crypto/riscv.pm
@@ -0,0 +1,230 @@
+#! /usr/bin/env perl
+# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
+#
+# Licensed under the Apache License 2.0 (the "License"). You may not use
+# this file except in compliance with the License. You can obtain a copy
+# in the file LICENSE in the source distribution or at
+# https://www.openssl.org/source/license.html
+
+use strict;
+use warnings;
+
+# Set $have_stacktrace to 1 if we have Devel::StackTrace
+my $have_stacktrace = 0;
+if (eval {require Devel::StackTrace;1;}) {
+    $have_stacktrace = 1;
+}
+
+my @regs = map("x$_",(0..31));
+my @regaliases = ('zero','ra','sp','gp','tp','t0','t1','t2','s0','s1',
+    map("a$_",(0..7)),
+    map("s$_",(2..11)),
+    map("t$_",(3..6))
+);
+
+my %reglookup;
+@reglookup{@regs} = @regs;
+@reglookup{@regaliases} = @regs;
+
+# Takes a register name, possibly an alias, and converts it to a register index
+# from 0 to 31
+sub read_reg {
+    my $reg = lc shift;
+    if (!exists($reglookup{$reg})) {
+        my $trace = "";
+        if ($have_stacktrace) {
+            $trace = Devel::StackTrace->new->as_string;
+        }
+        die("Unknown register ".$reg."\n".$trace);
+    }
+    my $regstr = $reglookup{$reg};
+    if (!($regstr =~ /^x([0-9]+)$/)) {
+        my $trace = "";
+        if ($have_stacktrace) {
+            $trace = Devel::StackTrace->new->as_string;
+        }
+        die("Could not process register ".$reg."\n".$trace);
+    }
+    return $1;
+}
+
+# Helper functions
+
+sub brev8_rv64i {
+    # brev8 without `brev8` instruction (only in Zbkb)
+    # Bit-reverses the first argument and needs three scratch registers
+    my $val = shift;
+    my $t0 = shift;
+    my $t1 = shift;
+    my $brev8_const = shift;
+    my $seq = <<___;
+    la $brev8_const, Lbrev8_const
+
+    ld $t0, 0($brev8_const) # 0xAAAAAAAAAAAAAAAA
+    slli $t1, $val, 1
+    and $t1, $t1, $t0
+    and $val, $val, $t0
+    srli $val, $val, 1
+    or $val, $t1, $val
+
+    ld $t0, 8($brev8_const) # 0xCCCCCCCCCCCCCCCC
+    slli $t1,
$val, 2 + and $t1, $t1, $t0 + and $val, $val, $t0 + srli $val, $val, 2 + or $val, $t1, $val + + ld $t0, 16($brev8_const) # 0xF0F0F0F0F0F0F0F0 + slli $t1, $val, 4 + and $t1, $t1, $t0 + and $val, $val, $t0 + srli $val, $val, 4 + or $val, $t1, $val +___ + return $seq; +} + +sub sd_rev8_rv64i { + # rev8 without `rev8` instruction (only in Zbb or Zbkb) + # Stores the given value byte-reversed and needs one scratch register + my $val = shift; + my $addr = shift; + my $off = shift; + my $tmp = shift; + my $off0 = ($off + 0); + my $off1 = ($off + 1); + my $off2 = ($off + 2); + my $off3 = ($off + 3); + my $off4 = ($off + 4); + my $off5 = ($off + 5); + my $off6 = ($off + 6); + my $off7 = ($off + 7); + my $seq = <<___; + sb $val, $off7($addr) + srli $tmp, $val, 8 + sb $tmp, $off6($addr) + srli $tmp, $val, 16 + sb $tmp, $off5($addr) + srli $tmp, $val, 24 + sb $tmp, $off4($addr) + srli $tmp, $val, 32 + sb $tmp, $off3($addr) + srli $tmp, $val, 40 + sb $tmp, $off2($addr) + srli $tmp, $val, 48 + sb $tmp, $off1($addr) + srli $tmp, $val, 56 + sb $tmp, $off0($addr) +___ + return $seq; +} + +# Scalar crypto instructions + +sub aes64ds { + # Encoding for aes64ds rd, rs1, rs2 instruction on RV64 + # XXXXXXX_ rs2 _ rs1 _XXX_ rd _XXXXXXX + my $template = 0b0011101_00000_00000_000_00000_0110011; + my $rd = read_reg shift; + my $rs1 = read_reg shift; + my $rs2 = read_reg shift; + return ".word ".($template | ($rs2 << 20) | ($rs1 << 15) | ($rd << 7)); +} + +sub aes64dsm { + # Encoding for aes64dsm rd, rs1, rs2 instruction on RV64 + # XXXXXXX_ rs2 _ rs1 _XXX_ rd _XXXXXXX + my $template = 0b0011111_00000_00000_000_00000_0110011; + my $rd = read_reg shift; + my $rs1 = read_reg shift; + my $rs2 = read_reg shift; + return ".word ".($template | ($rs2 << 20) | ($rs1 << 15) | ($rd << 7)); +} + +sub aes64es { + # Encoding for aes64es rd, rs1, rs2 instruction on RV64 + # XXXXXXX_ rs2 _ rs1 _XXX_ rd _XXXXXXX + my $template = 0b0011001_00000_00000_000_00000_0110011; + my $rd = read_reg shift; + my $rs1 
= read_reg shift; + my $rs2 = read_reg shift; + return ".word ".($template | ($rs2 << 20) | ($rs1 << 15) | ($rd << 7)); +} + +sub aes64esm { + # Encoding for aes64esm rd, rs1, rs2 instruction on RV64 + # XXXXXXX_ rs2 _ rs1 _XXX_ rd _XXXXXXX + my $template = 0b0011011_00000_00000_000_00000_0110011; + my $rd = read_reg shift; + my $rs1 = read_reg shift; + my $rs2 = read_reg shift; + return ".word ".($template | ($rs2 << 20) | ($rs1 << 15) | ($rd << 7)); +} + +sub aes64im { + # Encoding for aes64im rd, rs1 instruction on RV64 + # XXXXXXXXXXXX_ rs1 _XXX_ rd _XXXXXXX + my $template = 0b001100000000_00000_001_00000_0010011; + my $rd = read_reg shift; + my $rs1 = read_reg shift; + return ".word ".($template | ($rs1 << 15) | ($rd << 7)); +} + +sub aes64ks1i { + # Encoding for aes64ks1i rd, rs1, rnum instruction on RV64 + # XXXXXXXX_rnum_ rs1 _XXX_ rd _XXXXXXX + my $template = 0b00110001_0000_00000_001_00000_0010011; + my $rd = read_reg shift; + my $rs1 = read_reg shift; + my $rnum = shift; + return ".word ".($template | ($rnum << 20) | ($rs1 << 15) | ($rd << 7)); +} + +sub aes64ks2 { + # Encoding for aes64ks2 rd, rs1, rs2 instruction on RV64 + # XXXXXXX_ rs2 _ rs1 _XXX_ rd _XXXXXXX + my $template = 0b0111111_00000_00000_000_00000_0110011; + my $rd = read_reg shift; + my $rs1 = read_reg shift; + my $rs2 = read_reg shift; + return ".word ".($template | ($rs2 << 20) | ($rs1 << 15) | ($rd << 7)); +} + +sub brev8 { + # brev8 rd, rs + my $template = 0b011010000111_00000_101_00000_0010011; + my $rd = read_reg shift; + my $rs = read_reg shift; + return ".word ".($template | ($rs << 15) | ($rd << 7)); +} + +sub clmul { + # Encoding for clmul rd, rs1, rs2 instruction on RV64 + # XXXXXXX_ rs2 _ rs1 _XXX_ rd _XXXXXXX + my $template = 0b0000101_00000_00000_001_00000_0110011; + my $rd = read_reg shift; + my $rs1 = read_reg shift; + my $rs2 = read_reg shift; + return ".word ".($template | ($rs2 << 20) | ($rs1 << 15) | ($rd << 7)); +} + +sub clmulh { + # Encoding for clmulh rd, rs1, rs2 
instruction on RV64
+    # XXXXXXX_ rs2 _ rs1 _XXX_ rd _XXXXXXX
+    my $template = 0b0000101_00000_00000_011_00000_0110011;
+    my $rd = read_reg shift;
+    my $rs1 = read_reg shift;
+    my $rs2 = read_reg shift;
+    return ".word ".($template | ($rs2 << 20) | ($rs1 << 15) | ($rd << 7));
+}
+
+sub rev8 {
+    # Encoding for rev8 rd, rs instruction on RV64
+    # XXXXXXXXXXXXX_ rs _XXX_ rd _XXXXXXX
+    my $template = 0b011010111000_00000_101_00000_0010011;
+    my $rd = read_reg shift;
+    my $rs = read_reg shift;
+    return ".word ".($template | ($rs << 15) | ($rd << 7));
+}
+
+1;

From patchwork Mon Feb 6 22:58:42 2023
X-Patchwork-Submitter: Heiko Stuebner
X-Patchwork-Id: 13130732
From: Heiko Stuebner
To: palmer@rivosinc.com
Cc: greentime.hu@sifive.com, conor@kernel.org, linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org, christoph.muellner@vrull.eu, Heiko Stuebner
Subject: [PATCH RFC 08/12] RISC-V: add vector crypto extension detection
Date: Mon, 6 Feb 2023 23:58:42 +0100
Message-Id: <20230206225846.1381789-9-heiko@sntech.de>
In-Reply-To: <20230206225846.1381789-1-heiko@sntech.de>
References: <20230206225846.1381789-1-heiko@sntech.de>

Add detection for some extensions of the vector-crypto specification,
namely
- Zvkb: Vector Bit-manipulation used in Cryptography
- Zvkg: Vector GCM/GMAC
- Zvknha and Zvknhb: NIST Algorithm Suite

As their use is very specific and will likely be limited to a few special
places, we expect current code to just pre-encode those instructions, so
for now we do not introduce toolchain requirements.

Signed-off-by: Heiko Stuebner
---
 arch/riscv/include/asm/hwcap.h | 4 ++++
 arch/riscv/kernel/cpu.c        | 4 ++++
 arch/riscv/kernel/cpufeature.c | 4 ++++
 3 files changed, 12 insertions(+)

diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
index 23427b9ed1e6..ce683dfb849f 100644
--- a/arch/riscv/include/asm/hwcap.h
+++ b/arch/riscv/include/asm/hwcap.h
@@ -64,6 +64,10 @@ enum riscv_isa_ext_id {
 	RISCV_ISA_EXT_ZBKB,
 	RISCV_ISA_EXT_ZICBOM,
 	RISCV_ISA_EXT_ZIHINTPAUSE,
+	RISCV_ISA_EXT_ZVKB,
+	RISCV_ISA_EXT_ZVKG,
+	RISCV_ISA_EXT_ZVKNHA,
+	RISCV_ISA_EXT_ZVKNHB,
 	RISCV_ISA_EXT_ID_MAX
 };
 static_assert(RISCV_ISA_EXT_ID_MAX <= RISCV_ISA_EXT_MAX);
diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
index f9f361285b04..bc615dbbf766 100644
--- a/arch/riscv/kernel/cpu.c
+++ b/arch/riscv/kernel/cpu.c
@@ -190,6 +190,10 @@ static struct riscv_isa_ext_data isa_ext_arr[] = {
 	__RISCV_ISA_EXT_DATA(zbkb, RISCV_ISA_EXT_ZBKB),
 	__RISCV_ISA_EXT_DATA(zicbom, RISCV_ISA_EXT_ZICBOM),
 	__RISCV_ISA_EXT_DATA(zihintpause, RISCV_ISA_EXT_ZIHINTPAUSE),
+	__RISCV_ISA_EXT_DATA(zvkb, RISCV_ISA_EXT_ZVKB),
+	__RISCV_ISA_EXT_DATA(zvkg, RISCV_ISA_EXT_ZVKG),
+	__RISCV_ISA_EXT_DATA(zvknha, RISCV_ISA_EXT_ZVKNHA),
+	__RISCV_ISA_EXT_DATA(zvknhb, RISCV_ISA_EXT_ZVKNHB),
 	__RISCV_ISA_EXT_DATA(sscofpmf, RISCV_ISA_EXT_SSCOFPMF),
 	__RISCV_ISA_EXT_DATA(sstc, RISCV_ISA_EXT_SSTC),
 	__RISCV_ISA_EXT_DATA(svinval, RISCV_ISA_EXT_SVINVAL),
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index 695dfd732483..4f08b7d97810 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -239,6 +239,10 @@ printk("!!!!
isa-string: %s\n\n\n", isa);
 				SET_ISA_EXT_MAP("zbkb", RISCV_ISA_EXT_ZBKB);
 				SET_ISA_EXT_MAP("zicbom", RISCV_ISA_EXT_ZICBOM);
 				SET_ISA_EXT_MAP("zihintpause", RISCV_ISA_EXT_ZIHINTPAUSE);
+				SET_ISA_EXT_MAP("zvkb", RISCV_ISA_EXT_ZVKB);
+				SET_ISA_EXT_MAP("zvkg", RISCV_ISA_EXT_ZVKG);
+				SET_ISA_EXT_MAP("zvknha", RISCV_ISA_EXT_ZVKNHA);
+				SET_ISA_EXT_MAP("zvknhb", RISCV_ISA_EXT_ZVKNHB);
 			}
 #undef SET_ISA_EXT_MAP
 		}

From patchwork Mon Feb 6 22:58:43 2023
X-Patchwork-Submitter: Heiko Stuebner
X-Patchwork-Id: 13130736
From: Heiko Stuebner
To: palmer@rivosinc.com
Cc: greentime.hu@sifive.com, conor@kernel.org, linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org, christoph.muellner@vrull.eu, Heiko Stuebner
Subject: [PATCH RFC 09/12] RISC-V: crypto: update perl include with helpers for vector (crypto) instructions
Date: Mon, 6 Feb 2023 23:58:43 +0100
Message-Id: <20230206225846.1381789-10-heiko@sntech.de>
In-Reply-To: <20230206225846.1381789-1-heiko@sntech.de>
References: <20230206225846.1381789-1-heiko@sntech.de>

The OpenSSL scripts use a number of helpers for handling vector
instructions and instructions from the vector-crypto extensions.
Therefore port these over from OpenSSL.

Signed-off-by: Heiko Stuebner
---
 arch/riscv/crypto/riscv.pm | 493 +++++++++++++++++++++++++++++++++++++
 1 file changed, 493 insertions(+)

diff --git a/arch/riscv/crypto/riscv.pm b/arch/riscv/crypto/riscv.pm
index 61bc4fc41a43..ba3ae0e3de26 100644
--- a/arch/riscv/crypto/riscv.pm
+++ b/arch/riscv/crypto/riscv.pm
@@ -48,6 +48,29 @@ sub read_reg {
     return $1;
 }
 
+my @vregs = map("v$_",(0..31));
+my %vreglookup;
+@vreglookup{@vregs} = @vregs;
+
+sub read_vreg {
+    my $vreg = lc shift;
+    if (!exists($vreglookup{$vreg})) {
+        my $trace = "";
+        if ($have_stacktrace) {
+            $trace = Devel::StackTrace->new->as_string;
+        }
+        die("Unknown vector register ".$vreg."\n".$trace);
+    }
+    if (!($vreg =~ /^v([0-9]+)$/)) {
+        my $trace = "";
+        if ($have_stacktrace) {
+            $trace = Devel::StackTrace->new->as_string;
+        }
+        die("Could not process vector register ".$vreg."\n".$trace);
+    }
+    return $1;
+}
+
 # Helper functions
 
 sub brev8_rv64i {
@@ -227,4 +250,474 @@ sub rev8 {
     return ".word ".($template | ($rs << 15) | ($rd << 7));
 }
 
+# Vector instructions
+
+sub vadd_vv {
+    # vadd.vv vd, vs2, vs1
+    my $template = 0b0000001_00000_00000_000_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $vs1 = read_vreg shift;
+    return ".word ".($template | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7));
+}
+
+sub vand_vi {
+    # vand.vi vd, vs1, imm
+    my $template =
0b0010011_00000_00000_011_00000_1010111; + my $vd = read_vreg shift; + my $vs1 = read_vreg shift; + my $imm = shift; + return ".word ".($template | ($vs1 << 20) | ($imm << 15) | ($vd << 7)) +} + +sub vid_v { + # vid.v vd + my $template = 0b0101001_00000_10001_010_00000_1010111; + my $vd = read_vreg shift; + return ".word ".($template | ($vd << 7)); +} + +sub vl1re32_v { + # vl1re32.v vd, (rs) + my $template = 0b0000001_00000_00000_110_00000_0000111; + my $vd = read_vreg shift; + my $rs = read_reg shift; + return ".word ".($template | ($rs << 15)| ($vd << 7)); +} + +sub vl1re64_v { + # vl1re64.v vd, (rs) + my $template = 0b0000001_00000_00000_111_00000_0000111; + my $vd = read_vreg shift; + my $rs = read_reg shift; + return ".word ".($template | ($rs << 15)| ($vd << 7)); +} + +sub vle32_v { + # vle32.v vd, (rs1) + my $template = 0b0000001_00000_00000_110_00000_0000111; + my $vd = read_vreg shift; + my $rs1 = read_reg shift; + return ".word ".($template | ($rs1 << 15) | ($vd << 7)); +} + +sub vle64_v { + # vle64.v vd, (rs1) + my $template = 0b0000001_00000_00000_111_00000_0000111; + my $vd = read_vreg shift; + my $rs1 = read_reg shift; + return ".word ".($template | ($rs1 << 15) | ($vd << 7)); +} + +sub vlse32_v { + # vlse32.v vd, (rs1), rs2 + my $template = 0b0000101_00000_00000_110_00000_0000111; + my $vd = read_vreg shift; + my $rs1 = read_reg shift; + my $rs2 = read_reg shift; + return ".word ".($template | ($rs2 << 20) | ($rs1 << 15) | ($vd << 7)); +} + +sub vlse64_v { + # vlse64.v vd, (rs1), rs2 + my $template = 0b0000101_00000_00000_111_00000_0000111; + my $vd = read_vreg shift; + my $rs1 = read_reg shift; + my $rs2 = read_reg shift; + return ".word ".($template | ($rs2 << 20) | ($rs1 << 15) | ($vd << 7)); +} + +sub vmerge_vim { + # vmerge.vim vd, vs2, imm, v0 + my $template = 0b0101110_00000_00000_011_00000_1010111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $imm = shift; + return ".word ".($template | ($vs2 << 20) | ($imm << 15) | ($vd << 
7));
+}
+
+sub vmerge_vvm {
+    # vmerge.vvm vd, vs2, vs1, v0
+    my $template = 0b0101110_00000_00000_000_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $vs1 = read_vreg shift;
+    return ".word ".($template | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7))
+}
+
+sub vmseq_vi {
+    # vmseq.vi vd, vs1, imm
+    my $template = 0b0110001_00000_00000_011_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs1 = read_vreg shift;
+    my $imm = shift;
+    return ".word ".($template | ($vs1 << 20) | ($imm << 15) | ($vd << 7))
+}
+
+sub vmv_v_i {
+    # vmv.v.i vd, imm
+    my $template = 0b0101111_00000_00000_011_00000_1010111;
+    my $vd = read_vreg shift;
+    my $imm = shift;
+    return ".word ".($template | ($imm << 15) | ($vd << 7));
+}
+
+sub vmv_v_v {
+    # vmv.v.v vd, vs1
+    my $template = 0b0101111_00000_00000_000_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs1 = read_vreg shift;
+    return ".word ".($template | ($vs1 << 15) | ($vd << 7));
+}
+
+sub vmv_v_x {
+    # vmv.v.x vd, rs1
+    my $template = 0b0101111_00000_00000_100_00000_1010111;
+    my $vd = read_vreg shift;
+    my $rs1 = read_reg shift;
+    return ".word ".($template | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vor_vv_v0t {
+    # vor.vv vd, vs2, vs1, v0.t
+    my $template = 0b0010100_00000_00000_000_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $vs1 = read_vreg shift;
+    return ".word ".($template | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7));
+}
+
+sub vrgather_vv {
+    # vrgather.vv vd, vs2, vs1
+    my $template = 0b0011001_00000_00000_000_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $vs1 = read_vreg shift;
+    if ($vd == $vs1 || $vd == $vs2) {
+        my $trace = Devel::StackTrace->new;
+        die("Source operands and destination operand must not overlap!\n".$trace->as_string);
+    }
+    return ".word ".($template | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7));
+}
+
+sub vse32_v {
+    # vse32.v vd, (rs1)
+    my $template = 0b0000001_00000_00000_110_00000_0100111;
+    my $vd = read_vreg shift;
+    my $rs1 =
read_reg shift;
+    return ".word ".($template | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vse64_v {
+    # vse64.v vd, (rs1)
+    my $template = 0b0000001_00000_00000_111_00000_0100111;
+    my $vd = read_vreg shift;
+    my $rs1 = read_reg shift;
+    return ".word ".($template | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vsetivli__x0_2_e64_m1_ta_ma {
+    # vsetivli x0, 2, e64, m1, ta, ma
+    return ".word 0xcd817057";
+}
+
+sub vsetivli__x0_4_e32_m1_ta_ma {
+    # vsetivli x0, 4, e32, m1, ta, ma
+    return ".word 0xcd027057";
+}
+
+sub vsetivli__x0_4_e64_m1_ta_ma {
+    # vsetivli x0, 4, e64, m1, ta, ma
+    return ".word 0xcd827057";
+}
+
+sub vslidedown_vi {
+    # vslidedown.vi vd, vs2, uimm
+    my $template = 0b0011111_00000_00000_011_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $uimm = shift;
+    return ".word ".($template | ($vs2 << 20) | ($uimm << 15) | ($vd << 7));
+}
+
+sub vslideup_vi_v0t {
+    # vslideup.vi vd, vs2, uimm, v0.t
+    my $template = 0b0011100_00000_00000_011_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $uimm = shift;
+    return ".word ".($template | ($vs2 << 20) | ($uimm << 15) | ($vd << 7));
+}
+
+sub vslideup_vi {
+    # vslideup.vi vd, vs2, uimm
+    my $template = 0b0011101_00000_00000_011_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $uimm = shift;
+    return ".word ".($template | ($vs2 << 20) | ($uimm << 15) | ($vd << 7));
+}
+
+sub vsll_vi {
+    # vsll.vi vd, vs2, uimm, vm
+    my $template = 0b1001011_00000_00000_011_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $uimm = shift;
+    return ".word ".($template | ($vs2 << 20) | ($uimm << 15) | ($vd << 7));
+}
+
+sub vsrl_vx {
+    # vsrl.vx vd, vs2, rs1
+    my $template = 0b1010001_00000_00000_100_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $rs1 = read_reg shift;
+    return ".word ".($template | ($vs2 << 20) | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vsse32_v {
+    # vsse32.v vs3, (rs1), rs2
+    my $template =
0b0000101_00000_00000_110_00000_0100111; + my $vs3 = read_vreg shift; + my $rs1 = read_reg shift; + my $rs2 = read_reg shift; + return ".word ".($template | ($rs2 << 20) | ($rs1 << 15) | ($vs3 << 7)); +} + +sub vsse64_v { + # vsse64.v vs3, (rs1), rs2 + my $template = 0b0000101_00000_00000_111_00000_0100111; + my $vs3 = read_vreg shift; + my $rs1 = read_reg shift; + my $rs2 = read_reg shift; + return ".word ".($template | ($rs2 << 20) | ($rs1 << 15) | ($vs3 << 7)); +} + +sub vxor_vi { + # vxor.vi vd, vs2, imm + my $template = 0b0010111_00000_00000_011_00000_1010111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $imm = shift; + return ".word ".($template | ($vs2 << 20) | ($imm << 15) | ($vd << 7)); +} + +sub vxor_vv_v0t { + # vxor.vv vd, vs2, vs1, v0.t + my $template = 0b0010110_00000_00000_000_00000_1010111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $vs1 = read_vreg shift; + return ".word ".($template | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7)); +} + +sub vxor_vv { + # vxor.vv vd, vs2, vs1 + my $template = 0b0010111_00000_00000_000_00000_1010111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $vs1 = read_vreg shift; + return ".word ".($template | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7)); +} + +# Vector crypto instructions + +## Zvkb instructions + +sub vclmulh_vx { + # vclmulh.vx vd, vs2, rs1 + my $template = 0b0011011_00000_00000_110_00000_1010111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $rs1 = read_reg shift; + return ".word ".($template | ($vs2 << 20) | ($rs1 << 15) | ($vd << 7)); +} + +sub vclmul_vx_v0t { + # vclmul.vx vd, vs2, rs1, v0.t + my $template = 0b0011000_00000_00000_110_00000_1010111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $rs1 = read_reg shift; + return ".word ".($template | ($vs2 << 20) | ($rs1 << 15) | ($vd << 7)); +} + +sub vclmul_vx { + # vclmul.vx vd, vs2, rs1 + my $template = 0b0011001_00000_00000_110_00000_1010111; + my $vd = read_vreg shift; + my 
$vs2 = read_vreg shift;
+    my $rs1 = read_reg shift;
+    return ".word ".($template | ($vs2 << 20) | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vrev8_v {
+    # vrev8.v vd, vs2
+    my $template = 0b0100101_00000_01001_010_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    return ".word ".($template | ($vs2 << 20) | ($vd << 7));
+}
+
+## Zvkg instructions
+
+sub vghmac_vv {
+    # vghmac.vv vd, vs2, vs1
+    my $template = 0b1011001_00000_00000_010_00000_1110111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $vs1 = read_vreg shift;
+
+    return ".word ".($template | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7));
+}
+
+## Zvknha and Zvknhb instructions
+
+sub vsha2ms_vv {
+    # vsha2ms.vv vd, vs2, vs1
+    my $template = 0b1011011_00000_00000_010_00000_1110111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $vs1 = read_vreg shift;
+    return ".word ".($template | ($vs2 << 20)| ($vs1 << 15 )| ($vd << 7));
+}
+
+sub vsha2cl_vv {
+    # vsha2cl.vv vd, vs2, vs1
+    my $template = 0b101111_10000_00000_001_00000_01110111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $vs1 = read_vreg shift;
+    return ".word ".($template | ($vs2 << 20)| ($vs1 << 15 )| ($vd << 7));
+}
+
+sub vsha2ch_vv {
+    # vsha2ch.vv vd, vs2, vs1
+    my $template = 0b101110_10000_00000_001_00000_01110111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $vs1 = read_vreg shift;
+    return ".word ".($template | ($vs2 << 20)| ($vs1 << 15 )| ($vd << 7));
+}
+
+## Zvkns instructions
+
+sub vaesdf_vs {
+    # vaesdf.vs vd, vs2
+    my $template = 0b101001_1_00000_00001_010_00000_1110111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    return ".word ".($template | ($vs2 << 20) | ($vd << 7));
+}
+
+sub vaesdf_vv {
+    # vaesdf.vv vd, vs2
+    my $template = 0b101000_1_00000_00001_010_00000_1110111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    return ".word ".($template | ($vs2 << 20) | ($vd << 7));
+}
+
+sub vaesdm_vs {
+    # vaesdm.vs vd, vs2
+    my $template =
0b101001_1_00000_00000_010_00000_1110111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    return ".word ".($template | ($vs2 << 20) | ($vd << 7));
+}
+
+sub vaesdm_vv {
+    # vaesdm.vv vd, vs2
+    my $template = 0b101000_1_00000_00000_010_00000_1110111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    return ".word ".($template | ($vs2 << 20) | ($vd << 7));
+}
+
+sub vaesef_vs {
+    # vaesef.vs vd, vs2
+    my $template = 0b101001_1_00000_00011_010_00000_1110111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    return ".word ".($template | ($vs2 << 20) | ($vd << 7));
+}
+
+sub vaesef_vv {
+    # vaesef.vv vd, vs2
+    my $template = 0b101000_1_00000_00011_010_00000_1110111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    return ".word ".($template | ($vs2 << 20) | ($vd << 7));
+}
+
+sub vaesem_vs {
+    # vaesem.vs vd, vs2
+    my $template = 0b101001_1_00000_00010_010_00000_1110111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    return ".word ".($template | ($vs2 << 20) | ($vd << 7));
+}
+
+sub vaesem_vv {
+    # vaesem.vv vd, vs2
+    my $template = 0b101000_1_00000_00010_010_00000_1110111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    return ".word ".($template | ($vs2 << 20) | ($vd << 7));
+}
+
+sub vaeskf1_vi {
+    # vaeskf1.vi vd, vs2, uimm
+    my $template = 0b100010_1_00000_00000_010_00000_1110111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $uimm = shift;
+    return ".word ".($template | ($uimm << 15) | ($vs2 << 20) | ($vd << 7));
+}
+
+sub vaeskf2_vi {
+    # vaeskf2.vi vd, vs2, uimm
+    my $template = 0b101010_1_00000_00000_010_00000_1110111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $uimm = shift;
+    return ".word ".($template | ($vs2 << 20) | ($uimm << 15) | ($vd << 7));
+}
+
+sub vaesz_vs {
+    # vaesz.vs vd, vs2
+    my $template = 0b101001_1_00000_00111_010_00000_1110111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    return ".word ".($template | ($vs2 << 20) | ($vd << 7));
+}
+ +## Zvksed instructions + +sub vsm4k_vi { + # vsm4k.vi vd, vs2, uimm + my $template = 0b1000011_00000_00000_010_00000_1110111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $uimm = shift; + return ".word ".($template | ($vs2 << 20) | ($uimm << 15) | ($vd << 7)); +} + +sub vsm4r_vv { + # vsm4r.vv vd, vs2 + my $template = 0b1010001_00000_10000_010_00000_1110111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + return ".word ".($template | ($vs2 << 20) | ($vd << 7)); +} + +sub vsm4r_vs { + # vsm4r.vs vd, vs2 + my $template = 0b1010011_00000_10000_010_00000_1110111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + return ".word ".($template | ($vs2 << 20) | ($vd << 7)); +} + 1; From patchwork Mon Feb 6 22:58:44 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Heiko Stuebner X-Patchwork-Id: 13130768 X-Patchwork-Delegate: palmer@dabbelt.com Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id C4299C05027 for ; Mon, 6 Feb 2023 23:30:04 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To: Message-Id:Date:Subject:Cc:To:From:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=uTk3sEgh336SiUdmQPvwn5E5qUykMr/MSN58DVkUGAw=; b=WWv9FipRL9XysV bJfAz+xCNGNuC5m+h5Co9RP6pppSa52MsFnGqt0pnB92+U51Y5aAZiNlxk+YjTjMnCqPFgAkpHAQE 
Xt/yWL5KxgXlHCuxSV4rCzlF1qFqUOmOzBkvzDVbKnJZitZA4uH8WyCnGFxSGcY6KKcBqkzyqoWhg 9aiE8S0hGSM3EvrKh2knnNlS+sX9H92A813wOOXbblRAk1YiX5hmyJ5J+Gc7fepZjQcWKrBgL1ZAV fZ8SdUikuOT/VaCUkzPRhE7sOPsKcEgWoucybZD5SydC8qghMmoBpK536Vnq/2NCEtPkrG+3hEHZu 3eHOEbbZ8no1UG7vZ2Eg==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.94.2 #2 (Red Hat Linux)) id 1pPAvq-00ABIO-6x; Mon, 06 Feb 2023 23:29:54 +0000 Received: from gloria.sntech.de ([185.11.138.130]) by bombadil.infradead.org with esmtps (Exim 4.94.2 #2 (Red Hat Linux)) id 1pPAvk-00ABG3-Iq for linux-riscv@lists.infradead.org; Mon, 06 Feb 2023 23:29:51 +0000 Received: from ip5b412258.dynamic.kabel-deutschland.de ([91.65.34.88] helo=phil.lan) by gloria.sntech.de with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1pPARm-0002Mb-Sk; Mon, 06 Feb 2023 23:58:50 +0100 From: Heiko Stuebner To: palmer@rivosinc.com Cc: greentime.hu@sifive.com, conor@kernel.org, linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org, christoph.muellner@vrull.eu, Heiko Stuebner Subject: [PATCH RFC 10/12] RISC-V: crypto: add Zvkb accelerated GCM GHASH implementation Date: Mon, 6 Feb 2023 23:58:44 +0100 Message-Id: <20230206225846.1381789-11-heiko@sntech.de> X-Mailer: git-send-email 2.39.0 In-Reply-To: <20230206225846.1381789-1-heiko@sntech.de> References: <20230206225846.1381789-1-heiko@sntech.de> MIME-Version: 1.0 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20230206_152948_940000_A7C95517 X-CRM114-Status: GOOD ( 35.94 ) X-BeenThere: linux-riscv@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-riscv" Errors-To: linux-riscv-bounces+linux-riscv=archiver.kernel.org@lists.infradead.org From: Heiko Stuebner Add a gcm hash implementation using the Zvkb crypto extension. 
It may be registered alongside the Zbc-based variant, with a higher priority so that the crypto subsystem can select the most performant variant, while the algorithm itself is still covered by the crypto selftests that run during registration. Signed-off-by: Heiko Stuebner --- arch/riscv/crypto/Kconfig | 3 +- arch/riscv/crypto/Makefile | 8 +- arch/riscv/crypto/ghash-riscv64-glue.c | 146 ++++++++++ arch/riscv/crypto/ghash-riscv64-zvkb.pl | 346 ++++++++++++++++++++++++ 4 files changed, 501 insertions(+), 2 deletions(-) create mode 100644 arch/riscv/crypto/ghash-riscv64-zvkb.pl diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig index 010adbbb058a..404fd9b3cb7c 100644 --- a/arch/riscv/crypto/Kconfig +++ b/arch/riscv/crypto/Kconfig @@ -4,7 +4,7 @@ menu "Accelerated Cryptographic Algorithms for CPU (riscv)" config CRYPTO_GHASH_RISCV64 tristate "Hash functions: GHASH" - depends on 64BIT && RISCV_ISA_ZBC + depends on 64BIT && (RISCV_ISA_ZBC || RISCV_ISA_V) select CRYPTO_HASH select CRYPTO_LIB_GF128MUL help @@ -12,5 +12,6 @@ config CRYPTO_GHASH_RISCV64 Architecture: riscv64 using one of: - ZBC extension + - ZVKB vector crypto extension endmenu diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile index 0a158919e9da..8ab9a0ae8f2d 100644 --- a/arch/riscv/crypto/Makefile +++ b/arch/riscv/crypto/Makefile @@ -8,6 +8,9 @@ ghash-riscv64-y := ghash-riscv64-glue.o ifdef CONFIG_RISCV_ISA_ZBC ghash-riscv64-y += ghash-riscv64-zbc.o endif +ifdef CONFIG_RISCV_ISA_V +ghash-riscv64-y += ghash-riscv64-zvkb.o +endif quiet_cmd_perlasm = PERLASM $@ cmd_perlasm = $(PERL) $(<) void $(@) @@ -15,4 +18,7 @@ quiet_cmd_perlasm = PERLASM $@ $(obj)/ghash-riscv64-zbc.S: $(src)/ghash-riscv64-zbc.pl $(call cmd,perlasm) -clean-files += ghash-riscv64-zbc.S +$(obj)/ghash-riscv64-zvkb.S: $(src)/ghash-riscv64-zvkb.pl + $(call cmd,perlasm) + +clean-files += ghash-riscv64-zbc.S ghash-riscv64-zvkb.S diff --git a/arch/riscv/crypto/ghash-riscv64-glue.c
b/arch/riscv/crypto/ghash-riscv64-glue.c index 9802b6718c3c..7376a8a793aa 100644 --- a/arch/riscv/crypto/ghash-riscv64-glue.c +++ b/arch/riscv/crypto/ghash-riscv64-glue.c @@ -16,6 +16,7 @@ #include #include #include +#include #include #include #include @@ -26,6 +27,10 @@ void gcm_ghash_rv64i_zbc(u64 Xi[2], const u128 Htable[16], void gcm_ghash_rv64i_zbc__zbkb(u64 Xi[2], const u128 Htable[16], const u8 *inp, size_t len); +/* Zvkb (vector crypto with vclmul) based routines. */ +void gcm_ghash_rv64i_zvkb(u64 Xi[2], const u128 Htable[16], + const u8 *inp, size_t len); + struct riscv64_ghash_ctx { void (*ghash_func)(u64 Xi[2], const u128 Htable[16], const u8 *inp, size_t len); @@ -51,6 +56,139 @@ static int riscv64_ghash_init(struct shash_desc *desc) return 0; } +#ifdef CONFIG_RISCV_ISA_V + +#define RISCV64_ZVK_SETKEY(VARIANT, GHASH) \ +void gcm_init_rv64i_ ## VARIANT(u128 Htable[16], const u64 Xi[2]); \ +static int riscv64_zvk_ghash_setkey_ ## VARIANT(struct crypto_shash *tfm, \ + const u8 *key, \ + unsigned int keylen) \ +{ \ + struct riscv64_ghash_ctx *ctx = crypto_tfm_ctx(crypto_shash_tfm(tfm)); \ + const u64 k[2] = { cpu_to_be64(((const u64 *)key)[0]), \ + cpu_to_be64(((const u64 *)key)[1]) }; \ + \ + if (keylen != GHASH_BLOCK_SIZE) \ + return -EINVAL; \ + \ + memcpy(&ctx->key, key, GHASH_BLOCK_SIZE); \ + kernel_rvv_begin(); \ + gcm_init_rv64i_ ## VARIANT(ctx->htable, k); \ + kernel_rvv_end(); \ + \ + ctx->ghash_func = gcm_ghash_rv64i_ ## GHASH; \ + \ + return 0; \ +} + +static inline void __ghash_block(struct riscv64_ghash_ctx *ctx, + struct riscv64_ghash_desc_ctx *dctx) +{ + if (crypto_simd_usable()) { + kernel_rvv_begin(); + ctx->ghash_func(dctx->shash, ctx->htable, + dctx->buffer, GHASH_DIGEST_SIZE); + kernel_rvv_end(); + } else { + crypto_xor((u8 *)dctx->shash, dctx->buffer, GHASH_BLOCK_SIZE); + gf128mul_lle((be128 *)dctx->shash, &ctx->key); + } +} + +static inline void __ghash_blocks(struct riscv64_ghash_ctx *ctx, + struct riscv64_ghash_desc_ctx *dctx, + 
const u8 *src, unsigned int srclen) +{ + if (crypto_simd_usable()) { + kernel_rvv_begin(); + ctx->ghash_func(dctx->shash, ctx->htable, + src, srclen); + kernel_rvv_end(); + } else { + while (srclen >= GHASH_BLOCK_SIZE) { + crypto_xor((u8 *)dctx->shash, src, GHASH_BLOCK_SIZE); + gf128mul_lle((be128 *)dctx->shash, &ctx->key); + srclen -= GHASH_BLOCK_SIZE; + src += GHASH_BLOCK_SIZE; + } + } +} + +static int riscv64_zvk_ghash_update(struct shash_desc *desc, + const u8 *src, unsigned int srclen) +{ + unsigned int len; + struct riscv64_ghash_ctx *ctx = crypto_tfm_ctx(crypto_shash_tfm(desc->tfm)); + struct riscv64_ghash_desc_ctx *dctx = shash_desc_ctx(desc); + + if (dctx->bytes) { + if (dctx->bytes + srclen < GHASH_DIGEST_SIZE) { + memcpy(dctx->buffer + dctx->bytes, src, + srclen); + dctx->bytes += srclen; + return 0; + } + memcpy(dctx->buffer + dctx->bytes, src, + GHASH_DIGEST_SIZE - dctx->bytes); + + __ghash_block(ctx, dctx); + + src += GHASH_DIGEST_SIZE - dctx->bytes; + srclen -= GHASH_DIGEST_SIZE - dctx->bytes; + dctx->bytes = 0; + } + len = srclen & ~(GHASH_DIGEST_SIZE - 1); + + if (len) { + __ghash_blocks(ctx, dctx, src, len); + src += len; + srclen -= len; + } + + if (srclen) { + memcpy(dctx->buffer, src, srclen); + dctx->bytes = srclen; + } + return 0; +} + +static int riscv64_zvk_ghash_final(struct shash_desc *desc, u8 *out) +{ + int i; + struct riscv64_ghash_ctx *ctx = crypto_tfm_ctx(crypto_shash_tfm(desc->tfm)); + struct riscv64_ghash_desc_ctx *dctx = shash_desc_ctx(desc); + + if (dctx->bytes) { + for (i = dctx->bytes; i < GHASH_DIGEST_SIZE; i++) + dctx->buffer[i] = 0; + __ghash_block(ctx, dctx); + dctx->bytes = 0; + } + memcpy(out, dctx->shash, GHASH_DIGEST_SIZE); + return 0; +} + +RISCV64_ZVK_SETKEY(zvkb, zvkb); +struct shash_alg riscv64_zvkb_ghash_alg = { + .digestsize = GHASH_DIGEST_SIZE, + .init = riscv64_ghash_init, + .update = riscv64_zvk_ghash_update, + .final = riscv64_zvk_ghash_final, + .setkey = riscv64_zvk_ghash_setkey_zvkb, + .descsize = 
sizeof(struct riscv64_ghash_desc_ctx) + + sizeof(struct ghash_desc_ctx), + .base = { + .cra_name = "ghash", + .cra_driver_name = "riscv64_zvkb_ghash", + .cra_priority = 300, + .cra_blocksize = GHASH_BLOCK_SIZE, + .cra_ctxsize = sizeof(struct riscv64_ghash_ctx), + .cra_module = THIS_MODULE, + }, +}; + +#endif /* CONFIG_RISCV_ISA_V */ + #ifdef CONFIG_RISCV_ISA_ZBC #define RISCV64_ZBC_SETKEY(VARIANT, GHASH) \ @@ -241,6 +379,14 @@ static int __init riscv64_ghash_mod_init(void) } #endif +#ifdef CONFIG_RISCV_ISA_V + if (riscv_isa_extension_available(NULL, ZVKB)) { + ret = riscv64_ghash_register(&riscv64_zvkb_ghash_alg); + if (ret < 0) + return ret; + } +#endif + return 0; } diff --git a/arch/riscv/crypto/ghash-riscv64-zvkb.pl b/arch/riscv/crypto/ghash-riscv64-zvkb.pl new file mode 100644 index 000000000000..8450e850108b --- /dev/null +++ b/arch/riscv/crypto/ghash-riscv64-zvkb.pl @@ -0,0 +1,346 @@ +#! /usr/bin/env perl +# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved. +# +# Licensed under the Apache License 2.0 (the "License"). You may not use +# this file except in compliance with the License. You can obtain a copy +# in the file LICENSE in the source distribution or at +# https://www.openssl.org/source/license.html + +use strict; +use warnings; + +use FindBin qw($Bin); +use lib "$Bin"; +use lib "$Bin/../../perlasm"; +use riscv; + +# $output is the last argument if it looks like a file (it has an extension) +# $flavour is the first argument if it doesn't look like a file +my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef; +my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? 
shift : undef; + +$output and open STDOUT,">$output"; + +my $code=<<___; +.text +___ + +################################################################################ +# void gcm_init_rv64i_zvkb(u128 Htable[16], const u64 H[2]); +# +# input: H: 128-bit H - secret parameter E(K, 0^128) +# output: Htable: Preprocessed key data for gcm_gmult_rv64i_zvkb and +# gcm_ghash_rv64i_zvkb +{ +my ($Htable,$H,$TMP0,$TMP1,$TMP2) = ("a0","a1","t0","t1","t2"); +my ($V0,$V1,$V2,$V3,$V4,$V5,$V6) = ("v0","v1","v2","v3","v4","v5","v6"); + +$code .= <<___; +.p2align 3 +.globl gcm_init_rv64i_zvkb +.type gcm_init_rv64i_zvkb,\@function +gcm_init_rv64i_zvkb: + # Load/store data in reverse order. + # This is needed as a part of endianness swap. + add $H, $H, 8 + li $TMP0, -8 + li $TMP1, 63 + la $TMP2, Lpolymod + + @{[vsetivli__x0_2_e64_m1_ta_ma]} # vsetivli x0, 2, e64, m1, ta, ma + + @{[vlse64_v $V1, $H, $TMP0]} # vlse64.v v1, (a1), t0 + @{[vle64_v $V2, $TMP2]} # vle64.v v2, (t2) + + # Shift one left and get the carry bits. + @{[vsrl_vx $V3, $V1, $TMP1]} # vsrl.vx v3, v1, t1 + @{[vsll_vi $V1, $V1, 1]} # vsll.vi v1, v1, 1 + + # Use the fact that the polynomial degree is no more than 128, + # i.e. only the LSB of the upper half could be set. + # Thanks to this we don't need to do the full reduction here. + # Instead simply subtract the reduction polynomial. + # This idea was taken from the x86 ghash implementation in OpenSSL. + @{[vslideup_vi $V4, $V3, 1]} # vslideup.vi v4, v3, 1 + @{[vslidedown_vi $V3, $V3, 1]} # vslidedown.vi v3, v3, 1 + + @{[vmv_v_i $V0, 2]} # vmv.v.i v0, 2 + @{[vor_vv_v0t $V1, $V1, $V4]} # vor.vv v1, v1, v4, v0.t + + # Need to set the mask to 3, if the carry bit is set. + # Not sure if there is a better way of doing this.
+ @{[vmv_v_v $V0, $V3]} # vmv.v.v v0, v3 + @{[vmv_v_i $V3, 0]} # vmv.v.i v3, 0 + @{[vmerge_vim $V3, $V3, 3]} # vmerge.vim v3, v3, 3, v0 + @{[vmv_v_v $V0, $V3]} # vmv.v.v v0, v3 + + @{[vxor_vv_v0t $V1, $V1, $V2]} # vxor.vv v1, v1, v2, v0.t + + @{[vse64_v $V1, $Htable]} # vse64.v v1, (a0) + ret +.size gcm_init_rv64i_zvkb,.-gcm_init_rv64i_zvkb +___ +} + +################################################################################ +# void gcm_gmult_rv64i_zvkb(u64 Xi[2], const u128 Htable[16]); +# +# input: Xi: current hash value +# Htable: preprocessed H +# output: Xi: next hash value Xi = (Xi * H mod f) +{ +my ($Xi,$Htable,$TMP0,$TMP1,$TMP2,$TMP3,$TMP4) = ("a0","a1","t0","t1","t2","t3","t4"); +my ($V0,$V1,$V2,$V3,$V4,$V5,$V6) = ("v0","v1","v2","v3","v4","v5","v6"); + +$code .= <<___; +.text +.p2align 3 +.globl gcm_gmult_rv64i_zvkb +.type gcm_gmult_rv64i_zvkb,\@function +gcm_gmult_rv64i_zvkb: + ld $TMP0, ($Htable) + ld $TMP1, 8($Htable) + li $TMP2, 63 + la $TMP3, Lpolymod + ld $TMP3, 8($TMP3) + + # Load/store data in reverse order. + # This is needed as a part of endianness swap. + add $Xi, $Xi, 8 + li $TMP4, -8 + + @{[vsetivli__x0_2_e64_m1_ta_ma]} # vsetivli x0, 2, e64, m1, ta, ma + + @{[vlse64_v $V5, $Xi, $TMP4]} # vlse64.v v5, (a0), t4 + @{[vrev8_v $V5, $V5]} # vrev8.v v5, v5 + + # Multiplication + + # Do two 64x64 multiplications in one go to save some time + # and simplify things. + + # A = a1a0 (t1, t0) + # B = b1b0 (v5) + # C = c1c0 (256 bit) + # c1 = a1b1 + (a0b1)h + (a1b0)h + # c0 = a0b0 + (a0b1)l + (a1b0)l + + # v1 = (a0b1)l,(a0b0)l + @{[vclmul_vx $V1, $V5, $TMP0]} # vclmul.vx v1, v5, t0 + # v3 = (a0b1)h,(a0b0)h + @{[vclmulh_vx $V3, $V5, $TMP0]} # vclmulh.vx v3, v5, t0 + + # v4 = (a1b1)l,(a1b0)l + @{[vclmul_vx $V4, $V5, $TMP1]} # vclmul.vx v4, v5, t1 + # v2 = (a1b1)h,(a1b0)h + @{[vclmulh_vx $V2, $V5, $TMP1]} # vclmulh.vx v2, v5, t1 + + # Is there a better way to do this? + # Would need to swap the order of elements within a vector register.
+ @{[vslideup_vi $V5, $V3, 1]} # vslideup.vi v5, v3, 1 + @{[vslideup_vi $V6, $V4, 1]} # vslideup.vi v6, v4, 1 + @{[vslidedown_vi $V3, $V3, 1]} # vslidedown.vi v3, v3, 1 + @{[vslidedown_vi $V4, $V4, 1]} # vslidedown.vi v4, v4, 1 + + @{[vmv_v_i $V0, 1]} # vmv.v.i v0, 1 + # v2 += (a0b1)h + @{[vxor_vv_v0t $V2, $V2, $V3]} # vxor.vv v2, v2, v3, v0.t + # v2 += (a1b1)l + @{[vxor_vv_v0t $V2, $V2, $V4]} # vxor.vv v2, v2, v4, v0.t + + @{[vmv_v_i $V0, 2]} # vmv.v.i v0, 2 + # v1 += (a0b0)h,0 + @{[vxor_vv_v0t $V1, $V1, $V5]} # vxor.vv v1, v1, v5, v0.t + # v1 += (a1b0)l,0 + @{[vxor_vv_v0t $V1, $V1, $V6]} # vxor.vv v1, v1, v6, v0.t + + # Now the 256bit product should be stored in (v2,v1) + # v1 = (a0b1)l + (a0b0)h + (a1b0)l, (a0b0)l + # v2 = (a1b1)h, (a1b0)h + (a0b1)h + (a1b1)l + + # Reduction + # Let C := A*B = c3,c2,c1,c0 = v2[1],v2[0],v1[1],v1[0] + # This is a slight variation of the Gueron's Montgomery reduction. + # The difference being the order of some operations has been changed, + # to make a better use of vclmul(h) instructions. 
+ + # First step: + # c1 += (c0 * P)l + # vmv.v.i v0, 2 + @{[vslideup_vi_v0t $V3, $V1, 1]} # vslideup.vi v3, v1, 1, v0.t + @{[vclmul_vx_v0t $V3, $V3, $TMP3]} # vclmul.vx v3, v3, t3, v0.t + @{[vxor_vv_v0t $V1, $V1, $V3]} # vxor.vv v1, v1, v3, v0.t + + # Second step: + # D = d1,d0 is final result + # We want: + # m1 = c1 + (c1 * P)h + # m0 = (c1 * P)l + (c0 * P)h + c0 + # d1 = c3 + m1 + # d0 = c2 + m0 + + #v3 = (c1 * P)l, 0 + @{[vclmul_vx_v0t $V3, $V1, $TMP3]} # vclmul.vx v3, v1, t3, v0.t + #v4 = (c1 * P)h, (c0 * P)h + @{[vclmulh_vx $V4, $V1, $TMP3]} # vclmulh.vx v4, v1, t3 + + @{[vmv_v_i $V0, 1]} # vmv.v.i v0, 1 + @{[vslidedown_vi $V3, $V3, 1]} # vslidedown.vi v3, v3, 1 + + @{[vxor_vv $V1, $V1, $V4]} # vxor.vv v1, v1, v4 + @{[vxor_vv_v0t $V1, $V1, $V3]} # vxor.vv v1, v1, v3, v0.t + + # XOR in the upper part of the product + @{[vxor_vv $V2, $V2, $V1]} # vxor.vv v2, v2, v1 + + @{[vrev8_v $V2, $V2]} # vrev8.v v2, v2 + @{[vsse64_v $V2, $Xi, $TMP4]} # vsse64.v v2, (a0), t4 + ret +.size gcm_gmult_rv64i_zvkb,.-gcm_gmult_rv64i_zvkb +___ +} + +################################################################################ +# void gcm_ghash_rv64i_zvkb(u64 Xi[2], const u128 Htable[16], +# const u8 *inp, size_t len); +# +# input: Xi: current hash value +# Htable: preprocessed H +# inp: pointer to input data +# len: length of input data in bytes (multiple of block size) +# output: Xi: Xi+1 (next hash value Xi) +{ +my ($Xi,$Htable,$inp,$len,$TMP0,$TMP1,$TMP2,$TMP3,$M8,$TMP5,$TMP6) = ("a0","a1","a2","a3","t0","t1","t2","t3","t4","t5","t6"); +my ($V0,$V1,$V2,$V3,$V4,$V5,$V6,$Vinp) = ("v0","v1","v2","v3","v4","v5","v6","v7"); + +$code .= <<___; +.p2align 3 +.globl gcm_ghash_rv64i_zvkb +.type gcm_ghash_rv64i_zvkb,\@function +gcm_ghash_rv64i_zvkb: + ld $TMP0, ($Htable) + ld $TMP1, 8($Htable) + li $TMP2, 63 + la $TMP3, Lpolymod + ld $TMP3, 8($TMP3) + + # Load/store data in reverse order. + # This is needed as a part of endianness swap.
+ add $Xi, $Xi, 8 + add $inp, $inp, 8 + li $M8, -8 + + @{[vsetivli__x0_2_e64_m1_ta_ma]} # vsetivli x0, 2, e64, m1, ta, ma + + @{[vlse64_v $V5, $Xi, $M8]} # vlse64.v v5, (a0), t4 + +Lstep: + # Read input data + @{[vlse64_v $Vinp, $inp, $M8]} # vlse64.v v7, (a2), t4 + add $inp, $inp, 16 + add $len, $len, -16 + # XOR them into Xi + @{[vxor_vv $V5, $V5, $Vinp]} # vxor.vv v5, v5, v7 + + @{[vrev8_v $V5, $V5]} # vrev8.v v5, v5 + + # Multiplication + + # Do two 64x64 multiplications in one go to save some time + # and simplify things. + + # A = a1a0 (t1, t0) + # B = b1b0 (v5) + # C = c1c0 (256 bit) + # c1 = a1b1 + (a0b1)h + (a1b0)h + # c0 = a0b0 + (a0b1)l + (a1b0)l + + # v1 = (a0b1)l,(a0b0)l + @{[vclmul_vx $V1, $V5, $TMP0]} # vclmul.vx v1, v5, t0 + # v3 = (a0b1)h,(a0b0)h + @{[vclmulh_vx $V3, $V5, $TMP0]} # vclmulh.vx v3, v5, t0 + + # v4 = (a1b1)l,(a1b0)l + @{[vclmul_vx $V4, $V5, $TMP1]} # vclmul.vx v4, v5, t1 + # v2 = (a1b1)h,(a1b0)h + @{[vclmulh_vx $V2, $V5, $TMP1]} # vclmulh.vx v2, v5, t1 + + # Is there a better way to do this? + # Would need to swap the order of elements within a vector register.
+ @{[vslideup_vi $V5, $V3, 1]} # vslideup.vi v5, v3, 1 + @{[vslideup_vi $V6, $V4, 1]} # vslideup.vi v6, v4, 1 + @{[vslidedown_vi $V3, $V3, 1]} # vslidedown.vi v3, v3, 1 + @{[vslidedown_vi $V4, $V4, 1]} # vslidedown.vi v4, v4, 1 + + @{[vmv_v_i $V0, 1]} # vmv.v.i v0, 1 + # v2 += (a0b1)h + @{[vxor_vv_v0t $V2, $V2, $V3]} # vxor.vv v2, v2, v3, v0.t + # v2 += (a1b1)l + @{[vxor_vv_v0t $V2, $V2, $V4]} # vxor.vv v2, v2, v4, v0.t + + @{[vmv_v_i $V0, 2]} # vmv.v.i v0, 2 + # v1 += (a0b0)h,0 + @{[vxor_vv_v0t $V1, $V1, $V5]} # vxor.vv v1, v1, v5, v0.t + # v1 += (a1b0)l,0 + @{[vxor_vv_v0t $V1, $V1, $V6]} # vxor.vv v1, v1, v6, v0.t + + # Now the 256bit product should be stored in (v2,v1) + # v1 = (a0b1)l + (a0b0)h + (a1b0)l, (a0b0)l + # v2 = (a1b1)h, (a1b0)h + (a0b1)h + (a1b1)l + + # Reduction + # Let C := A*B = c3,c2,c1,c0 = v2[1],v2[0],v1[1],v1[0] + # This is a slight variation of the Gueron's Montgomery reduction. + # The difference being the order of some operations has been changed, + # to make a better use of vclmul(h) instructions. 
+ + # First step: + # c1 += (c0 * P)l + # vmv.v.i v0, 2 + @{[vslideup_vi_v0t $V3, $V1, 1]} # vslideup.vi v3, v1, 1, v0.t + @{[vclmul_vx_v0t $V3, $V3, $TMP3]} # vclmul.vx v3, v3, t3, v0.t + @{[vxor_vv_v0t $V1, $V1, $V3]} # vxor.vv v1, v1, v3, v0.t + + # Second step: + # D = d1,d0 is final result + # We want: + # m1 = c1 + (c1 * P)h + # m0 = (c1 * P)l + (c0 * P)h + c0 + # d1 = c3 + m1 + # d0 = c2 + m0 + + #v3 = (c1 * P)l, 0 + @{[vclmul_vx_v0t $V3, $V1, $TMP3]} # vclmul.vx v3, v1, t3, v0.t + #v4 = (c1 * P)h, (c0 * P)h + @{[vclmulh_vx $V4, $V1, $TMP3]} # vclmulh.vx v4, v1, t3 + + @{[vmv_v_i $V0, 1]} # vmv.v.i v0, 1 + @{[vslidedown_vi $V3, $V3, 1]} # vslidedown.vi v3, v3, 1 + + @{[vxor_vv $V1, $V1, $V4]} # vxor.vv v1, v1, v4 + @{[vxor_vv_v0t $V1, $V1, $V3]} # vxor.vv v1, v1, v3, v0.t + + # XOR in the upper part of the product + @{[vxor_vv $V2, $V2, $V1]} # vxor.vv v2, v2, v1 + + @{[vrev8_v $V5, $V2]} # vrev8.v v5, v2 + + bnez $len, Lstep + + @{[vsse64_v $V5, $Xi, $M8]} # vsse64.v v5, (a0), t4 + ret +.size gcm_ghash_rv64i_zvkb,.-gcm_ghash_rv64i_zvkb +___ +} + +$code .= <<___; +.p2align 4 +Lpolymod: + .dword 0x0000000000000001 + .dword 0xc200000000000000 +.size Lpolymod,.-Lpolymod +___ + +print $code; + +close STDOUT or die "error closing STDOUT: $!";
From patchwork Mon Feb 6 22:58:45 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Heiko Stuebner
X-Patchwork-Id: 13130769
X-Patchwork-Delegate: palmer@dabbelt.com
From: Heiko Stuebner
To: palmer@rivosinc.com
Cc: greentime.hu@sifive.com, conor@kernel.org, linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org, christoph.muellner@vrull.eu, Heiko Stuebner
Subject: [PATCH RFC 11/12] RISC-V: crypto: add Zvkg accelerated GCM GHASH implementation
Date: Mon, 6 Feb 2023 23:58:45 +0100
Message-Id: <20230206225846.1381789-12-heiko@sntech.de>
X-Mailer: git-send-email 2.39.0
In-Reply-To: <20230206225846.1381789-1-heiko@sntech.de>
References: <20230206225846.1381789-1-heiko@sntech.de>

From: Heiko Stuebner

When the Zvkg vector crypto extension is available, another optimized gcm ghash variant is possible, so add it as another implementation.

Signed-off-by: Heiko Stuebner --- arch/riscv/crypto/Kconfig | 1 + arch/riscv/crypto/Makefile | 7 +- arch/riscv/crypto/ghash-riscv64-glue.c | 80 +++++++++++ arch/riscv/crypto/ghash-riscv64-zvkg.pl | 172 ++++++++++++++++++++++++ 4 files changed, 258 insertions(+), 2 deletions(-) create mode 100644 arch/riscv/crypto/ghash-riscv64-zvkg.pl diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig index 404fd9b3cb7c..84da19bdde8b 100644 --- a/arch/riscv/crypto/Kconfig +++ b/arch/riscv/crypto/Kconfig @@ -13,5 +13,6 @@ config CRYPTO_GHASH_RISCV64 Architecture: riscv64 using one of: - ZBC extension - ZVKB vector crypto extension + - ZVKG vector crypto extension endmenu diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile index 8ab9a0ae8f2d..1ee0ce7d3264 100644 --- a/arch/riscv/crypto/Makefile +++ b/arch/riscv/crypto/Makefile @@ -9,7 +9,7 @@ ifdef CONFIG_RISCV_ISA_ZBC ghash-riscv64-y += ghash-riscv64-zbc.o endif ifdef CONFIG_RISCV_ISA_V -ghash-riscv64-y += ghash-riscv64-zvkb.o +ghash-riscv64-y += ghash-riscv64-zvkb.o ghash-riscv64-zvkg.o endif quiet_cmd_perlasm = PERLASM $@ @@ -21,4 +21,7 @@ $(obj)/ghash-riscv64-zbc.S: $(src)/ghash-riscv64-zbc.pl $(obj)/ghash-riscv64-zvkb.S: $(src)/ghash-riscv64-zvkb.pl $(call cmd,perlasm) -clean-files += ghash-riscv64-zbc.S ghash-riscv64-zvkb.S +$(obj)/ghash-riscv64-zvkg.S: $(src)/ghash-riscv64-zvkg.pl + $(call cmd,perlasm) + +clean-files +=
ghash-riscv64-zbc.S ghash-riscv64-zvkb.S ghash-riscv64-zvkg.S diff --git a/arch/riscv/crypto/ghash-riscv64-glue.c b/arch/riscv/crypto/ghash-riscv64-glue.c index 7376a8a793aa..4fed99e98019 100644 --- a/arch/riscv/crypto/ghash-riscv64-glue.c +++ b/arch/riscv/crypto/ghash-riscv64-glue.c @@ -31,6 +31,10 @@ void gcm_ghash_rv64i_zbc__zbkb(u64 Xi[2], const u128 Htable[16], void gcm_ghash_rv64i_zvkb(u64 Xi[2], const u128 Htable[16], const u8 *inp, size_t len); +/* Zvkg (vector crypto with vghmac.vv). */ +void gcm_ghash_rv64i_zvkg(u64 Xi[2], const u128 Htable[16], + const u8 *inp, size_t len); + struct riscv64_ghash_ctx { void (*ghash_func)(u64 Xi[2], const u128 Htable[16], const u8 *inp, size_t len); @@ -187,6 +191,63 @@ struct shash_alg riscv64_zvkb_ghash_alg = { }, }; +RISCV64_ZVK_SETKEY(zvkg, zvkg); +struct shash_alg riscv64_zvkg_ghash_alg = { + .digestsize = GHASH_DIGEST_SIZE, + .init = riscv64_ghash_init, + .update = riscv64_zvk_ghash_update, + .final = riscv64_zvk_ghash_final, + .setkey = riscv64_zvk_ghash_setkey_zvkg, + .descsize = sizeof(struct riscv64_ghash_desc_ctx) + + sizeof(struct ghash_desc_ctx), + .base = { + .cra_name = "ghash", + .cra_driver_name = "riscv64_zvkg_ghash", + .cra_priority = 301, + .cra_blocksize = GHASH_BLOCK_SIZE, + .cra_ctxsize = sizeof(struct riscv64_ghash_ctx), + .cra_module = THIS_MODULE, + }, +}; + +RISCV64_ZVK_SETKEY(zvkg__zbb_or_zbkb, zvkg); +struct shash_alg riscv64_zvkg_zbb_or_zbkb_ghash_alg = { + .digestsize = GHASH_DIGEST_SIZE, + .init = riscv64_ghash_init, + .update = riscv64_zvk_ghash_update, + .final = riscv64_zvk_ghash_final, + .setkey = riscv64_zvk_ghash_setkey_zvkg__zbb_or_zbkb, + .descsize = sizeof(struct riscv64_ghash_desc_ctx) + + sizeof(struct ghash_desc_ctx), + .base = { + .cra_name = "ghash", + .cra_driver_name = "riscv64_zvkg_zbb_or_zbkb_ghash", + .cra_priority = 302, + .cra_blocksize = GHASH_BLOCK_SIZE, + .cra_ctxsize = sizeof(struct riscv64_ghash_ctx), + .cra_module = THIS_MODULE, + }, +}; + 
+RISCV64_ZVK_SETKEY(zvkg__zvkb, zvkg); +struct shash_alg riscv64_zvkg_zvkb_ghash_alg = { + .digestsize = GHASH_DIGEST_SIZE, + .init = riscv64_ghash_init, + .update = riscv64_zvk_ghash_update, + .final = riscv64_zvk_ghash_final, + .setkey = riscv64_zvk_ghash_setkey_zvkg__zvkb, + .descsize = sizeof(struct riscv64_ghash_desc_ctx) + + sizeof(struct ghash_desc_ctx), + .base = { + .cra_name = "ghash", + .cra_driver_name = "riscv64_zvkg_zvkb_ghash", + .cra_priority = 303, + .cra_blocksize = GHASH_BLOCK_SIZE, + .cra_ctxsize = sizeof(struct riscv64_ghash_ctx), + .cra_module = THIS_MODULE, + }, +}; + #endif /* CONFIG_RISCV_ISA_V */ #ifdef CONFIG_RISCV_ISA_ZBC @@ -385,6 +446,25 @@ static int __init riscv64_ghash_mod_init(void) if (ret < 0) return ret; } + + if (riscv_isa_extension_available(NULL, ZVKG)) { + ret = riscv64_ghash_register(&riscv64_zvkg_ghash_alg); + if (ret < 0) + return ret; + + if (riscv_isa_extension_available(NULL, ZVKB)) { + ret = riscv64_ghash_register(&riscv64_zvkg_zvkb_ghash_alg); + if (ret < 0) + return ret; + } + + if (riscv_isa_extension_available(NULL, ZBB) || + riscv_isa_extension_available(NULL, ZBKB)) { + ret = riscv64_ghash_register(&riscv64_zvkg_zbb_or_zbkb_ghash_alg); + if (ret < 0) + return ret; + } + } #endif return 0; diff --git a/arch/riscv/crypto/ghash-riscv64-zvkg.pl b/arch/riscv/crypto/ghash-riscv64-zvkg.pl new file mode 100644 index 000000000000..1331d498f1f1 --- /dev/null +++ b/arch/riscv/crypto/ghash-riscv64-zvkg.pl @@ -0,0 +1,172 @@ +#! /usr/bin/env perl +# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved. +# +# Licensed under the Apache License 2.0 (the "License"). You may not use +# this file except in compliance with the License. 
You can obtain a copy +# in the file LICENSE in the source distribution or at +# https://www.openssl.org/source/license.html + +use strict; +use warnings; + +use FindBin qw($Bin); +use lib "$Bin"; +use lib "$Bin/../../perlasm"; +use riscv; + +# $output is the last argument if it looks like a file (it has an extension) +# $flavour is the first argument if it doesn't look like a file +my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef; +my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef; + +$output and open STDOUT,">$output"; + +my $code=<<___; +.text +___ + +################################################################################ +# void gcm_init_rv64i_zvkg(u128 Htable[16], const u64 H[2]); +# void gcm_init_rv64i_zvkg__zbb_or_zbkb(u128 Htable[16], const u64 H[2]); +# void gcm_init_rv64i_zvkg__zvkb(u128 Htable[16], const u64 H[2]); +# +# input: H: 128-bit H - secret parameter E(K, 0^128) +# output: Htable: Copy of secret parameter (in normalized byte order) +# +# All callers of this function revert the byte-order unconditionally +# on little-endian machines. So we need to revert the byte-order back. 
+{ +my ($Htable,$H,$VAL0,$VAL1,$TMP0) = ("a0","a1","a2","a3","t0"); + +$code .= <<___; +.p2align 3 +.globl gcm_init_rv64i_zvkg +.type gcm_init_rv64i_zvkg,\@function +gcm_init_rv64i_zvkg: + # First word + ld $VAL0, 0($H) + ld $VAL1, 8($H) + @{[sd_rev8_rv64i $VAL0, $Htable, 0, $TMP0]} + @{[sd_rev8_rv64i $VAL1, $Htable, 8, $TMP0]} + ret +.size gcm_init_rv64i_zvkg,.-gcm_init_rv64i_zvkg +___ +} + +{ +my ($Htable,$H,$TMP0,$TMP1) = ("a0","a1","t0","t1"); + +$code .= <<___; +.p2align 3 +.globl gcm_init_rv64i_zvkg__zbb_or_zbkb +.type gcm_init_rv64i_zvkg__zbb_or_zbkb,\@function +gcm_init_rv64i_zvkg__zbb_or_zbkb: + ld $TMP0,0($H) + ld $TMP1,8($H) + @{[rev8 $TMP0, $TMP0]} #rev8 $TMP0, $TMP0 + @{[rev8 $TMP1, $TMP1]} #rev8 $TMP1, $TMP1 + sd $TMP0,0($Htable) + sd $TMP1,8($Htable) + ret +.size gcm_init_rv64i_zvkg__zbb_or_zbkb,.-gcm_init_rv64i_zvkg__zbb_or_zbkb +___ +} + +{ +my ($Htable,$H,$V0) = ("a0","a1","v0"); + +$code .= <<___; +.p2align 3 +.globl gcm_init_rv64i_zvkg__zvkb +.type gcm_init_rv64i_zvkg__zvkb,\@function +gcm_init_rv64i_zvkg__zvkb: + # All callers of this function revert the byte-order unconditionally + # on little-endian machines. So we need to revert the byte-order back. 
+ @{[vsetivli__x0_2_e64_m1_ta_ma]} # vsetivli x0, 2, e64, m1, ta, ma + @{[vle64_v $V0, $H]} # vle64.v v0, (a1) + @{[vrev8_v $V0, $V0]} # vrev8.v v0, v0 + @{[vse64_v $V0, $Htable]} # vse64.v v0, (a0) + ret +.size gcm_init_rv64i_zvkg__zvkb,.-gcm_init_rv64i_zvkg__zvkb +___ +} + +################################################################################ +# void gcm_gmult_rv64i_zvkg(u64 Xi[2], const u128 Htable[16]); +# +# input: Xi: current hash value +# Htable: copy of H +# output: Xi: next hash value Xi +{ +my ($Xi,$Htable) = ("a0","a1"); +my ($VD,$VS1,$VS2) = ("v1","v2","v3"); + +$code .= <<___; +.p2align 3 +.globl gcm_gmult_rv64i_zvkg +.type gcm_gmult_rv64i_zvkg,\@function +gcm_gmult_rv64i_zvkg: + @{[vsetivli__x0_4_e32_m1_ta_ma]} + @{[vle32_v $VS1, $Htable]} + @{[vle32_v $VD, $Xi]} + # Use a zero-block as input + # This works because zero is the neutral element of XOR + @{[vmv_v_i $VS2, 0]} + @{[vghmac_vv $VD, $VS2, $VS1]} + @{[vse32_v $VD, $Xi]} + ret +.size gcm_gmult_rv64i_zvkg,.-gcm_gmult_rv64i_zvkg +___ +} + +################################################################################ +# void gcm_ghash_rv64i_zvkg(u64 Xi[2], const u128 Htable[16], +# const u8 *inp, size_t len); +# +# input: Xi: current hash value +# Htable: copy of H +# inp: pointer to input data +# len: length of input data in bytes (multiple of block size) +# output: Xi: Xi+1 (next hash value Xi) +{ +my ($Xi,$Htable,$inp,$len) = ("a0","a1","a2","a3"); +my ($vXi,$vH,$vinp,$Vzero) = ("v1","v2","v3","v4"); + +$code .= <<___; +.p2align 3 +.globl gcm_ghash_rv64i_zvkg +.type gcm_ghash_rv64i_zvkg,\@function +gcm_ghash_rv64i_zvkg: + @{[vsetivli__x0_4_e32_m1_ta_ma]} + @{[vle32_v $vH, $Htable]} + @{[vle32_v $vXi, $Xi]} + + # First loop part + @{[vle32_v $vinp, $inp]} + @{[vxor_vv $vXi, $vXi, $vinp]} + add $inp, $inp, 16 + add $len, $len, -16 + beqz $len, Lend + +Lstep: + @{[vle32_v $vinp, $inp]} + add $inp, $inp, 16 + add $len, $len, -16 + @{[vghmac_vv $vXi, $vinp, $vH]} + bnez $len, Lstep +
+Lend: + # Final multiplication (no XOR operation) + @{[vmv_v_i $Vzero, 0]} + @{[vghmac_vv $vXi, $Vzero, $vH]} + + @{[vse32_v $vXi, $Xi]} + ret + +.size gcm_ghash_rv64i_zvkg,.-gcm_ghash_rv64i_zvkg +___ +} + +print $code; + +close STDOUT or die "error closing STDOUT: $!";
From patchwork Mon Feb 6 22:58:46 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Heiko Stuebner
X-Patchwork-Id: 13130770
X-Patchwork-Delegate: palmer@dabbelt.com
From: Heiko Stuebner
To: palmer@rivosinc.com
Cc: greentime.hu@sifive.com, conor@kernel.org, linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org, christoph.muellner@vrull.eu, Heiko Stuebner
Subject: [PATCH RFC 12/12] RISC-V: crypto: add a vector-crypto-accelerated SHA256 implementation
Date: Mon, 6 Feb 2023 23:58:46 +0100
Message-Id: <20230206225846.1381789-13-heiko@sntech.de>
X-Mailer: git-send-email 2.39.0
In-Reply-To: <20230206225846.1381789-1-heiko@sntech.de>
References: <20230206225846.1381789-1-heiko@sntech.de>

From: Heiko Stuebner

This adds an accelerated SHA256 algorithm using either the Zvknha or Zvknhb vector crypto extensions. According to the spec, Zvknhb supports SHA-256 and SHA-512, while Zvknha supports only SHA-256, so the relevant accelerating instructions are included in both.
Signed-off-by: Heiko Stuebner --- arch/riscv/crypto/Kconfig | 10 + arch/riscv/crypto/Makefile | 7 + arch/riscv/crypto/sha256-riscv64-glue.c | 103 +++++ arch/riscv/crypto/sha256-riscv64-zvknha.pl | 502 +++++++++++++++++++++ 4 files changed, 622 insertions(+) create mode 100644 arch/riscv/crypto/sha256-riscv64-glue.c create mode 100644 arch/riscv/crypto/sha256-riscv64-zvknha.pl diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig index 84da19bdde8b..c4e7d7526f1a 100644 --- a/arch/riscv/crypto/Kconfig +++ b/arch/riscv/crypto/Kconfig @@ -15,4 +15,14 @@ config CRYPTO_GHASH_RISCV64 - ZVKB vector crypto extension - ZVKG vector crypto extension +config CRYPTO_SHA256_RISCV64 + tristate "Hash functions: SHA-256" + depends on 64BIT && RISCV_ISA_V + select CRYPTO_HASH + help + SHA-256 secure hash algorithm (FIPS 180) + + Architecture: riscv64 using + - Zvknha or Zvknhb vector crypto extensions + endmenu diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile index 1ee0ce7d3264..02b3b4c32672 100644 --- a/arch/riscv/crypto/Makefile +++ b/arch/riscv/crypto/Makefile @@ -12,6 +12,9 @@ ifdef CONFIG_RISCV_ISA_V ghash-riscv64-y += ghash-riscv64-zvkb.o ghash-riscv64-zvkg.o endif +obj-$(CONFIG_CRYPTO_SHA256_RISCV64) += sha256-riscv64.o +sha256-riscv64-y := sha256-riscv64-glue.o sha256-riscv64-zvknha.o + quiet_cmd_perlasm = PERLASM $@ cmd_perlasm = $(PERL) $(<) void $(@) @@ -24,4 +27,8 @@ $(obj)/ghash-riscv64-zvkb.S: $(src)/ghash-riscv64-zvkb.pl $(obj)/ghash-riscv64-zvkg.S: $(src)/ghash-riscv64-zvkg.pl $(call cmd,perlasm) +$(obj)/sha256-riscv64-zvknha.S: $(src)/sha256-riscv64-zvknha.pl + $(call cmd,perlasm) + clean-files += ghash-riscv64-zbc.S ghash-riscv64-zvkb.S ghash-riscv64-zvkg.S +clean-files += sha256-riscv64-zvknha.S diff --git a/arch/riscv/crypto/sha256-riscv64-glue.c b/arch/riscv/crypto/sha256-riscv64-glue.c new file mode 100644 index 000000000000..bde46907a823 --- /dev/null +++ b/arch/riscv/crypto/sha256-riscv64-glue.c @@ -0,0 +1,103 @@ +// 
SPDX-License-Identifier: GPL-2.0-or-later +/* + * Linux/riscv64 port of the OpenSSL SHA256 implementation for RISCV64 + */ + +#include +#include +#include +#include +#include +#include +#include + +asmlinkage void sha256_block_data_order_zvknha(u32 *digest, const void *data, + unsigned int num_blks); +EXPORT_SYMBOL(sha256_block_data_order_zvknha); + +static void __sha256_block_data_order(struct sha256_state *sst, u8 const *src, + int blocks) +{ + sha256_block_data_order_zvknha(sst->state, src, blocks); +} + +static int riscv64_sha256_update(struct shash_desc *desc, const u8 *data, + unsigned int len) +{ + if (crypto_simd_usable()) { + int ret; + + kernel_rvv_begin(); + ret = sha256_base_do_update(desc, data, len, + __sha256_block_data_order); + kernel_rvv_end(); + return ret; + } else { + return crypto_sha256_update(desc, data, len); + } +} + +static int riscv64_sha256_finup(struct shash_desc *desc, const u8 *data, + unsigned int len, u8 *out) +{ + if (!crypto_simd_usable()) + return crypto_sha256_finup(desc, data, len, out); + + kernel_rvv_begin(); + if (len) + sha256_base_do_update(desc, data, len, + __sha256_block_data_order); + + sha256_base_do_finalize(desc, __sha256_block_data_order); + kernel_rvv_end(); + + return sha256_base_finish(desc, out); +} + +static int riscv64_sha256_final(struct shash_desc *desc, u8 *out) +{ + return riscv64_sha256_finup(desc, NULL, 0, out); +} + +static struct shash_alg sha256_alg = { + .digestsize = SHA256_DIGEST_SIZE, + .init = sha256_base_init, + .update = riscv64_sha256_update, + .final = riscv64_sha256_final, + .finup = riscv64_sha256_finup, + .descsize = sizeof(struct sha256_state), + .base.cra_name = "sha256", + .base.cra_driver_name = "sha256-riscv64-zvknha", + .base.cra_priority = 150, + .base.cra_blocksize = SHA256_BLOCK_SIZE, + .base.cra_module = THIS_MODULE, +}; + +static int __init sha256_mod_init(void) +{ + /* + * From the spec: + * Zvknhb supports SHA-256 and SHA-512. Zvknha supports only SHA-256. 
+ */ + if (riscv_isa_extension_available(NULL, ZVKNHA) || + riscv_isa_extension_available(NULL, ZVKNHB)) + return crypto_register_shash(&sha256_alg); + + return 0; +} + +static void __exit sha256_mod_fini(void) +{ + if (riscv_isa_extension_available(NULL, ZVKNHA) || + riscv_isa_extension_available(NULL, ZVKNHB)) + crypto_unregister_shash(&sha256_alg); +} + +module_init(sha256_mod_init); +module_exit(sha256_mod_fini); + +MODULE_DESCRIPTION("SHA-256 secure hash for riscv64"); +MODULE_AUTHOR("Andy Polyakov "); +MODULE_AUTHOR("Heiko Stuebner "); +MODULE_LICENSE("GPL v2"); +MODULE_ALIAS_CRYPTO("sha256"); diff --git a/arch/riscv/crypto/sha256-riscv64-zvknha.pl b/arch/riscv/crypto/sha256-riscv64-zvknha.pl new file mode 100644 index 000000000000..c15978926287 --- /dev/null +++ b/arch/riscv/crypto/sha256-riscv64-zvknha.pl @@ -0,0 +1,502 @@ +#! /usr/bin/env perl +# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved. +# +# Licensed under the Apache License 2.0 (the "License"). You may not use +# this file except in compliance with the License. You can obtain a copy +# in the file LICENSE in the source distribution or at +# https://www.openssl.org/source/license.html + +# The generated code of this file depends on the following RISC-V extensions: +# - RV64I +# - RISC-V vector ('V') +# - Vector Bit-manipulation used in Cryptography ('Zvkb') +# - Vector SHA-2 Secure Hash ('Zvknha') + +use strict; +use warnings; + +use FindBin qw($Bin); +use lib "$Bin"; +use lib "$Bin/../../perlasm"; +use riscv; + +# $output is the last argument if it looks like a file (it has an extension) +# $flavour is the first argument if it doesn't look like a file +my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef; +my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? 
shift : undef; + +$output and open STDOUT,">$output"; + +my $code=<<___; +.text +___ + +my ($V0, $V10, $V11, $V12, $V13, $V14, $V15, $V16, $V17) = ("v0", "v10", "v11", "v12", "v13", "v14","v15", "v16", "v17"); +my ($V26, $V27) = ("v26", "v27"); + +my $K256 = "K256"; + +# Function arguments +my ($H, $INP, $KT, $LEN, $STRIDE) = ("a0", "a1", "a2", "a3", "t3"); + +################################################################################ +# void sha256_block_data_order_zvknha(u32 *digest, const void *data, +# unsigned int num_blks) +$code .= <<___; +.p2align 2 +.globl sha256_block_data_order_zvknha +.type sha256_block_data_order_zvknha,\@function +sha256_block_data_order_zvknha: + + # The block count arrives in a2; move it to $LEN because a2 is + # reused below as the round constant pointer. + mv $LEN, a2 + + # Register use in this function: + # + # SCALARS: + # a0 (i.e., x10): initially the address of the first byte of `hash`, + # modified during the logic. + # a1: initially the address of the first byte of the message block, + # modified during the initial load. + # a2: initially the address of the first byte of the round constants + # 'Kt', incremented during the rounds. + # + # VECTORS + # v10 - v13 (512-bits / 4*128 bits / 4*4*32 bits), hold the message + # schedule words (Wt). They start with the message block + # content (W0 to W15), then further words in the message + # schedule generated via vsha2ms from previous Wt. + # Initially: + # v10 = W[ 3:0] = { W3, W2, W1, W0} + # v11 = W[ 7:4] = { W7, W6, W5, W4} + # v12 = W[ 11:8] = {W11, W10, W9, W8} + # v13 = W[15:12] = {W15, W14, W13, W12} + # + # v16 - v17 hold the working state variables (a, b, ..., h) + # v16 = {a[t],b[t],e[t],f[t]} + # v17 = {c[t],d[t],g[t],h[t]} + # Initially: + # v16 = {H5i-1, H4i-1, H1i-1 , H0i-1} + # v17 = {H7i-1, H6i-1, H3i-1 , H2i-1} + # + # v0 = masks for vrgather/vmerge. Single value during the 16 rounds.
+ # + # v14 = temporary, Wt+Kt + # v15 = temporary, Kt + # + # v18/v19 = temporaries, in the epilogue, to re-arrange + # and byte-swap v16/v17 + # + # v26/v27 = hold the initial values of the hash, byte-swapped. + # + # v30/v31 = used to generate masks, vrgather indices. + # + # During most of the function the vector state is configured so that each + # vector is interpreted as containing four 32-bit (e32) elements (128 bits). + + # Set vectors as 4 * 32 bits + # + # e32: vector of 32b/4B elements + # m1: LMUL=1 + # ta: tail agnostic (don't care about those lanes) + # ma: mask agnostic (don't care about those lanes) + # x0 is not written, we know the number of vector elements, 4. + @{[vsetivli__x0_4_e32_m1_ta_ma]} + + # Load H[0..7] to produce + # v26 = v16 = {a[t],b[t],e[t],f[t]} + # v27 = v17 = {c[t],d[t],g[t],h[t]} + # + # To minimize per-block work, H is provided as {f,e,b,a, h,g,d,c} + # with the bytes in little endian order, i.e., not in NIST endianness + # or order. + @{[vle32_v $V16, $H]} + @{[vmv_v_v $V26, $V16]} + addi $H, $H, 16 + @{[vle32_v $V17, $H]} + @{[vmv_v_v $V27, $V17]} + addi $H, $H, -16 + + @{[vslideup_vi $V16, $V27, 2]} + @{[vslidedown_vi $V17, $V26, 2]} + @{[vslidedown_vi $V26, $V27, 2]} + @{[vslideup_vi $V17, $V26, 2]} + + @{[vse32_v $V16, $H]} + addi $H, $H, 16 + @{[vse32_v $V17, $H]} + addi $H, $H, -16 + @{[vmv_v_v $V26, $V16]} + + li $STRIDE, -4 + + addi $H, $H, 12 + @{[vlse32_v $V16, $H, $STRIDE]} + @{[vmv_v_v $V26, $V16]} + + addi $H, $H, 16 + @{[vlse32_v $V17, $H, $STRIDE]} + @{[vmv_v_v $V27, $V17]} + addi $H, $H, -28 + + + # Load the 512 bits of the message block in v10-v13 and perform + # an endian swap on each 4-byte element. + # + # If Zvkb is not implemented, one can use vrgather with the right index + # sequence. It requires loading in separate registers since the destination + # of vrgather cannot overlap the source.
+ # # We generate the lane (byte) index sequence + # # v24 = [3 2 1 0 7 6 5 4 11 10 9 8 15 14 13 12] + # # gives us "N ^ 3" as a nice formula to generate + # # this sequence. 'vid' gives us the N. + # # + # # We switch the vector type to SEW=8 temporarily. + # vsetivli x0, 16, e8, m1, ta, ma + # vid.v v24 + # vxor.vi v24, v24, 0x3 + # # Byteswap the bytes in each word of the text. + # vrgather.vv v10, v20, v24 + # vrgather.vv v11, v21, v24 + # vrgather.vv v12, v22, v24 + # vrgather.vv v13, v23, v24 + # # Switch back to SEW=32 + # vsetivli x0, 4, e32, m1, ta, ma + +L_round_loop: + la $KT, $K256 # Load round constants K256 + + # First loop part + @{[vle32_v $V10, $INP]} + @{[vrev8_v $V10, $V10]} + add $INP, $INP, 16 + @{[vle32_v $V11, $INP]} + @{[vrev8_v $V11, $V11]} + add $INP, $INP, 16 + @{[vle32_v $V12, $INP]} + @{[vrev8_v $V12, $V12]} + add $INP, $INP, 16 + @{[vle32_v $V13, $INP]} + @{[vrev8_v $V13, $V13]} + add $INP, $INP, 16 + + # Decrement length by 1 + add $LEN, $LEN, -1 + + # Set v0 up for the vmerge that replaces the first word (idx==0) + @{[vid_v $V0]} + @{[vmseq_vi $V0, $V0, 0x0]} # v0.mask[i] = (i == 0 ? 1 : 0) + + # Overview of the logic in each "quad round". + # + # The code below repeats 16 times the logic implementing four rounds + # of the SHA-256 core loop as documented by NIST. 16 "quad rounds" + # implement the 64 single rounds. + # + # # Load four word (u32) constants (K[t+3], K[t+2], K[t+1], K[t+0]) + # # Output: + # # v15 = {K[t+3], K[t+2], K[t+1], K[t+0]} + # vl1re32.v v15, (a2) + # + # # Increment word constant address by stride (16 bytes, 4*4B, 128b) + # addi a2, a2, 16 + # + # # Add constants to message schedule words: + # # Input + # # v15 = {K[t+3], K[t+2], K[t+1], K[t+0]} + # # v10 = {W[t+3], W[t+2], W[t+1], W[t+0]}; // Vt0 = W[3:0]; + # # Output + # # v14 = {W[t+3]+K[t+3], W[t+2]+K[t+2], W[t+1]+K[t+1], W[t+0]+K[t+0]} + # vadd.vv v14, v15, v10 + # + # # 2 rounds of working variables updates.
+ # # v17[t+4] <- v17[t], v16[t], v14[t] + # # Input: + # # v17 = {c[t],d[t],g[t],h[t]} " = v17[t] " + # # v16 = {a[t],b[t],e[t],f[t]} + # # v14 = {W[t+3]+K[t+3], W[t+2]+K[t+2], W[t+1]+K[t+1], W[t+0]+K[t+0]} + # # Output: + # # v17 = {f[t+2],e[t+2],b[t+2],a[t+2]} " = v16[t+2] " + # # = {h[t+4],g[t+4],d[t+4],c[t+4]} " = v17[t+4] " + # vsha2cl.vv v17, v16, v14 + # + # # 2 rounds of working variables updates. + # # v16[t+4] <- v16[t], v16[t+2], v14[t] + # # Input + # # v16 = {a[t],b[t],e[t],f[t]} " = v16[t] " + # # = {h[t+2],g[t+2],d[t+2],c[t+2]} " = v17[t+2] " + # # v17 = {f[t+2],e[t+2],b[t+2],a[t+2]} " = v16[t+2] " + # # v14 = {W[t+3]+K[t+3], W[t+2]+K[t+2], W[t+1]+K[t+1], W[t+0]+K[t+0]} + # # Output: + # # v16 = {f[t+4],e[t+4],b[t+4],a[t+4]} " = v16[t+4] " + # vsha2ch.vv v16, v17, v14 + # + # # Combine 2QW into 1QW + # # + # # To generate the next 4 words, "new_v10"/"v14" from v10-v13, vsha2ms needs + # # v10[0..3], v11[0], v12[1..3], v13[0, 2..3] + # # and it can only take 3 vectors as inputs. Hence we need to combine + # # v11[0] and v12[1..3] in a single vector. 
+ # # + # # vmerge Vt4, Vt1, Vt2, V0 + # # Input + # # V0 = mask // first word from v11, words 1..3 from v12 + # # V12 = {Wt-8, Wt-7, Wt-6, Wt-5} + # # V11 = {Wt-12, Wt-11, Wt-10, Wt-9} + # # Output + # # Vt4 = {Wt-12, Wt-7, Wt-6, Wt-5} + # vmerge.vvm v14, v12, v11, v0 + # + # # Generate next four message schedule words (hence allowing for 4 more rounds) + # # Input + # # V10 = {W[t+ 3], W[t+ 2], W[t+ 1], W[t+ 0]} W[ 3: 0] + # # V13 = {W[t+15], W[t+14], W[t+13], W[t+12]} W[15:12] + # # V14 = {W[t+11], W[t+10], W[t+ 9], W[t+ 4]} W[11: 9,4] + # # Output (next four message schedule words) + # # v10 = {W[t+19], W[t+18], W[t+17], W[t+16]} W[19:16] + # vsha2ms.vv v10, v14, v13 + # + # BEFORE + # v10 - v13 hold the message schedule words (initially the block words) + # v10 = W[ 3: 0] "oldest" + # v11 = W[ 7: 4] + # v12 = W[11: 8] + # v13 = W[15:12] "newest" + # + # v16 - v17 hold the working state variables + # v16 = {a[t],b[t],e[t],f[t]} // initially {H5,H4,H1,H0} + # v17 = {c[t],d[t],g[t],h[t]} // initially {H7,H6,H3,H2} + # + # AFTER + # v10 - v13 hold the message schedule words (initially the block words) + # v11 = W[ 7: 4] "oldest" + # v12 = W[11: 8] + # v13 = W[15:12] + # v10 = W[19:16] "newest" + # + # v16 and v17 hold the working state variables + # v16 = {a[t+4],b[t+4],e[t+4],f[t+4]} + # v17 = {c[t+4],d[t+4],g[t+4],h[t+4]} + # + # The group of vectors v10,v11,v12,v13 is "rotated" by one in each quad-round, + # hence the uses of those vectors rotate in each round, and we get back to the + # initial configuration every 4 quad-rounds. We could avoid those changes at + # the cost of moving those vectors at the end of each quad-round.
+ + #-------------------------------------------------------------------------------- + # Quad-round 0 (+0, Wt from oldest to newest in v10->v11->v12->v13) + @{[vl1re32_v $V15, $KT]} + addi $KT, $KT, 16 + @{[vadd_vv $V14, $V15, $V10]} + @{[vsha2cl_vv $V17, $V16, $V14]} + @{[vslidedown_vi $V14, $V14, 2]} + @{[vsha2cl_vv $V16, $V17, $V14]} + @{[vmerge_vvm $V14, $V12, $V11, $V0]} + @{[vsha2ms_vv $V10, $V14, $V13]} # Generate W[19:16] + #-------------------------------------------------------------------------------- + # Quad-round 1 (+1, v11->v12->v13->v10) + @{[vl1re32_v $V15, $KT]} + addi $KT, $KT, 16 + @{[vadd_vv $V14, $V15, $V11]} + @{[vsha2cl_vv $V17, $V16, $V14]} + @{[vslidedown_vi $V14, $V14, 2]} + @{[vsha2cl_vv $V16, $V17, $V14]} + @{[vmerge_vvm $V14, $V13, $V12, $V0]} + @{[vsha2ms_vv $V11, $V14, $V10]} # Generate W[23:20] + #-------------------------------------------------------------------------------- + # Quad-round 2 (+2, v12->v13->v10->v11) + @{[vl1re32_v $V15, $KT]} + addi $KT, $KT, 16 + @{[vadd_vv $V14, $V15, $V12]} + @{[vsha2cl_vv $V17, $V16, $V14]} + @{[vslidedown_vi $V14, $V14, 2]} + @{[vsha2cl_vv $V16, $V17, $V14]} + @{[vmerge_vvm $V14, $V10, $V13, $V0]} + @{[vsha2ms_vv $V12, $V14, $V11]} # Generate W[27:24] + #-------------------------------------------------------------------------------- + # Quad-round 3 (+3, v13->v10->v11->v12) + @{[vl1re32_v $V15, $KT]} + addi $KT, $KT, 16 + @{[vadd_vv $V14, $V15, $V13]} + @{[vsha2cl_vv $V17, $V16, $V14]} + @{[vslidedown_vi $V14, $V14, 2]} + @{[vsha2cl_vv $V16, $V17, $V14]} + @{[vmerge_vvm $V14, $V11, $V10, $V0]} + @{[vsha2ms_vv $V13, $V14, $V12]} # Generate W[31:28] + + #-------------------------------------------------------------------------------- + # Quad-round 4 (+0, v10->v11->v12->v13) + @{[vl1re32_v $V15, $KT]} + addi $KT, $KT, 16 + @{[vadd_vv $V14, $V15, $V10]} + @{[vsha2cl_vv $V17, $V16, $V14]} + @{[vslidedown_vi $V14, $V14, 2]} + @{[vsha2cl_vv $V16, $V17, $V14]} + @{[vmerge_vvm $V14, $V12, $V11, 
$V0]} + @{[vsha2ms_vv $V10, $V14, $V13]} # Generate W[35:32] + #-------------------------------------------------------------------------------- + # Quad-round 5 (+1, v11->v12->v13->v10) + @{[vl1re32_v $V15, $KT]} + addi $KT, $KT, 16 + @{[vadd_vv $V14, $V15, $V11]} + @{[vsha2cl_vv $V17, $V16, $V14]} + @{[vslidedown_vi $V14, $V14, 2]} + @{[vsha2cl_vv $V16, $V17, $V14]} + @{[vmerge_vvm $V14, $V13, $V12, $V0]} + @{[vsha2ms_vv $V11, $V14, $V10]} # Generate W[39:36] + #-------------------------------------------------------------------------------- + # Quad-round 6 (+2, v12->v13->v10->v11) + @{[vl1re32_v $V15, $KT]} + addi $KT, $KT, 16 + @{[vadd_vv $V14, $V15, $V12]} + @{[vsha2cl_vv $V17, $V16, $V14]} + @{[vslidedown_vi $V14, $V14, 2]} + @{[vsha2cl_vv $V16, $V17, $V14]} + @{[vmerge_vvm $V14, $V10, $V13, $V0]} + @{[vsha2ms_vv $V12, $V14, $V11]} # Generate W[43:40] + #-------------------------------------------------------------------------------- + # Quad-round 7 (+3, v13->v10->v11->v12) + @{[vl1re32_v $V15, $KT]} + addi $KT, $KT, 16 + @{[vadd_vv $V14, $V15, $V13]} + @{[vsha2cl_vv $V17, $V16, $V14]} + @{[vslidedown_vi $V14, $V14, 2]} + @{[vsha2cl_vv $V16, $V17, $V14]} + @{[vmerge_vvm $V14, $V11, $V10, $V0]} + @{[vsha2ms_vv $V13, $V14, $V12]} # Generate W[47:44] + + #-------------------------------------------------------------------------------- + # Quad-round 8 (+0, v10->v11->v12->v13) + @{[vl1re32_v $V15, $KT]} + addi $KT, $KT, 16 + @{[vadd_vv $V14, $V15, $V10]} + @{[vsha2cl_vv $V17, $V16, $V14]} + @{[vslidedown_vi $V14, $V14, 2]} + @{[vsha2cl_vv $V16, $V17, $V14]} + @{[vmerge_vvm $V14, $V12, $V11, $V0]} + @{[vsha2ms_vv $V10, $V14, $V13]} # Generate W[51:48] + #-------------------------------------------------------------------------------- + # Quad-round 9 (+1, v11->v12->v13->v10) + @{[vl1re32_v $V15, $KT]} + addi $KT, $KT, 16 + @{[vadd_vv $V14, $V15, $V11]} + @{[vsha2cl_vv $V17, $V16, $V14]} + @{[vslidedown_vi $V14, $V14, 2]} + @{[vsha2cl_vv $V16, $V17, $V14]} + 
@{[vmerge_vvm $V14, $V13, $V12, $V0]} + @{[vsha2ms_vv $V11, $V14, $V10]} # Generate W[55:52] + #-------------------------------------------------------------------------------- + # Quad-round 10 (+2, v12->v13->v10->v11) + @{[vl1re32_v $V15, $KT]} + addi $KT, $KT, 16 + @{[vadd_vv $V14, $V15, $V12]} + @{[vsha2cl_vv $V17, $V16, $V14]} + @{[vslidedown_vi $V14, $V14, 2]} + @{[vsha2cl_vv $V16, $V17, $V14]} + @{[vmerge_vvm $V14, $V10, $V13, $V0]} + @{[vsha2ms_vv $V12, $V14, $V11]} # Generate W[59:56] + #-------------------------------------------------------------------------------- + # Quad-round 11 (+3, v13->v10->v11->v12) + @{[vl1re32_v $V15, $KT]} + addi $KT, $KT, 16 + @{[vadd_vv $V14, $V15, $V13]} + @{[vsha2cl_vv $V17, $V16, $V14]} + @{[vslidedown_vi $V14, $V14, 2]} + @{[vsha2cl_vv $V16, $V17, $V14]} + @{[vmerge_vvm $V14, $V11, $V10, $V0]} + @{[vsha2ms_vv $V13, $V14, $V12]} # Generate W[63:60] + + #-------------------------------------------------------------------------------- + # Quad-round 12 (+0, v10->v11->v12->v13) + # Note that we stop generating new message schedule words (Wt, v10-13) + # as we already generated all the words we end up consuming (i.e., W[63:60]). 
+ @{[vl1re32_v $V15, $KT]} + addi $KT, $KT, 16 + @{[vadd_vv $V14, $V15, $V10]} + @{[vsha2cl_vv $V17, $V16, $V14]} + @{[vslidedown_vi $V14, $V14, 2]} + @{[vsha2cl_vv $V16, $V17, $V14]} + #-------------------------------------------------------------------------------- + # Quad-round 13 (+1, v11->v12->v13->v10) + @{[vl1re32_v $V15, $KT]} + addi $KT, $KT, 16 + @{[vadd_vv $V14, $V15, $V11]} + @{[vsha2cl_vv $V17, $V16, $V14]} + @{[vslidedown_vi $V14, $V14, 2]} + @{[vsha2cl_vv $V16, $V17, $V14]} + #-------------------------------------------------------------------------------- + # Quad-round 14 (+2, v12->v13->v10->v11) + @{[vl1re32_v $V15, $KT]} + addi $KT, $KT, 16 + @{[vadd_vv $V14, $V15, $V12]} + @{[vsha2cl_vv $V17, $V16, $V14]} + @{[vslidedown_vi $V14, $V14, 2]} + @{[vsha2cl_vv $V16, $V17, $V14]} + #-------------------------------------------------------------------------------- + # Quad-round 15 (+3, v13->v10->v11->v12) + @{[vl1re32_v $V15, $KT]} + # No kt increment needed. + @{[vadd_vv $V14, $V15, $V13]} + @{[vsha2cl_vv $V17, $V16, $V14]} + @{[vslidedown_vi $V14, $V14, 2]} + @{[vsha2cl_vv $V16, $V17, $V14]} + + # H' = H+{a',b',c',...,h'} + @{[vadd_vv $V16, $V26, $V16]} + @{[vadd_vv $V17, $V27, $V17]} + @{[vmv_v_v $V26, $V16]} + @{[vmv_v_v $V27, $V17]} + bnez $LEN, L_round_loop + +L_end: + @{[vmv_v_v $V26, $V16]} + @{[vmv_v_v $V27, $V17]} + + addi $H, $H, 12 + @{[vsse32_v $V16, $H, $STRIDE]} + @{[vmv_v_v $V26, $V16]} + + addi $H, $H, 16 + @{[vsse32_v $V17, $H, $STRIDE]} + @{[vmv_v_v $V27, $V17]} + addi $H, $H, -28 + + @{[vle32_v $V16, $H]} + @{[vmv_v_v $V26, $V16]} + addi $H, $H, 16 + @{[vle32_v $V17, $H]} + @{[vmv_v_v $V27, $V17]} + addi $H, $H, -16 + + @{[vslideup_vi $V16, $V27, 2]} + @{[vslidedown_vi $V17, $V26, 2]} + @{[vslidedown_vi $V26, $V27, 2]} + @{[vslideup_vi $V17, $V26, 2]} + + @{[vse32_v $V16, $H]} + addi $H, $H, 16 + @{[vse32_v $V17, $H]} + ret + +.p2align 2 +.type $K256,\@object +$K256: + .word 0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5 + .word 
0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5 + .word 0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3 + .word 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174 + .word 0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc + .word 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da + .word 0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7 + .word 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967 + .word 0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13 + .word 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85 + .word 0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3 + .word 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070 + .word 0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5 + .word 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3 + .word 0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208 + .word 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2 +___ + +print $code; +close STDOUT or die "error closing STDOUT: $!";
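
Reviewer aid, not part of the patch: the quad-round structure described in the comments above can be cross-checked against a plain scalar version. The following is a minimal sketch in user-space C, assuming the state is kept in natural H0..H7 order (the vector code permutes it). The names sha256_blocks_ref and sha256_abc_selftest are hypothetical and do not exist in the kernel or in OpenSSL. The 16-word ring buffer w[] plays the role of v10-v13, and the "q < 12" test mirrors the point where the assembly stops issuing vsha2ms.

```c
#include <stdint.h>
#include <string.h>

/* SHA-256 round constants (FIPS 180-4); same values as the K256 table. */
static const uint32_t K256[64] = {
	0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5,
	0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
	0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3,
	0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
	0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc,
	0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
	0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7,
	0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
	0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13,
	0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
	0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3,
	0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
	0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5,
	0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
	0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208,
	0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
};

static uint32_t rotr32(uint32_t x, unsigned int n)
{
	return (x >> n) | (x << (32 - n));
}

/* Hypothetical reference: process num_blks 64-byte blocks, state = H0..H7. */
void sha256_blocks_ref(uint32_t state[8], const uint8_t *data,
		       unsigned int num_blks)
{
	while (num_blks--) {
		uint32_t w[16], s[8];	/* s[0..7] = a, b, c, d, e, f, g, h */
		unsigned int q, i;

		/* Big-endian load of W0..W15 (the vrev8 step in the asm). */
		for (i = 0; i < 16; i++)
			w[i] = ((uint32_t)data[4 * i] << 24) |
			       ((uint32_t)data[4 * i + 1] << 16) |
			       ((uint32_t)data[4 * i + 2] << 8) |
			       data[4 * i + 3];
		memcpy(s, state, sizeof(s));

		for (q = 0; q < 16; q++) {	/* 16 quad-rounds = 64 rounds */
			for (i = 0; i < 4; i++) {
				unsigned int t = 4 * q + i;
				uint32_t t1 = s[7] +
					(rotr32(s[4], 6) ^ rotr32(s[4], 11) ^ rotr32(s[4], 25)) +
					((s[4] & s[5]) ^ (~s[4] & s[6])) +
					K256[t] + w[t & 15];
				uint32_t t2 =
					(rotr32(s[0], 2) ^ rotr32(s[0], 13) ^ rotr32(s[0], 22)) +
					((s[0] & s[1]) ^ (s[0] & s[2]) ^ (s[1] & s[2]));

				/* b..h take the old a..g, then fix up e and a. */
				memmove(&s[1], &s[0], 7 * sizeof(s[0]));
				s[4] += t1;
				s[0] = t1 + t2;
			}
			if (q >= 12)	/* W[63:60] already exists, nothing to add */
				continue;
			/* Replace the four consumed words with W[t+16]. */
			for (i = 0; i < 4; i++) {
				unsigned int t = 4 * q + i;
				uint32_t w1 = w[(t + 1) & 15], w14 = w[(t + 14) & 15];

				w[t & 15] += (rotr32(w1, 7) ^ rotr32(w1, 18) ^ (w1 >> 3)) +
					     w[(t + 9) & 15] +
					     (rotr32(w14, 17) ^ rotr32(w14, 19) ^ (w14 >> 10));
			}
		}
		for (i = 0; i < 8; i++)
			state[i] += s[i];
		data += 64;
	}
}

/* Hash the padded one-block message "abc"; returns 0 on a match. */
int sha256_abc_selftest(void)
{
	static const uint32_t expect[8] = {
		0xba7816bf, 0x8f01cfea, 0x414140de, 0x5dae2223,
		0xb00361a3, 0x96177a9c, 0xb410ff61, 0xf20015ad,
	};
	uint32_t state[8] = {
		0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
		0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19,
	};
	uint8_t block[64] = { 'a', 'b', 'c', 0x80 };

	block[63] = 24;	/* message length in bits */
	sha256_blocks_ref(state, block, 1);
	return memcmp(state, expect, sizeof(expect)) ? -1 : 0;
}
```

Feeding the same blocks to this sketch and to sha256_block_data_order_zvknha() should leave identical state words, which may help when testing on hardware or an emulator that implements the draft vector crypto instructions.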