From patchwork Wed Oct 25 18:36:33 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jerry Shih X-Patchwork-Id: 13436515 X-Patchwork-Delegate: herbert@gondor.apana.org.au Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id F2193C07545 for ; Wed, 25 Oct 2023 18:37:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234846AbjJYShG (ORCPT ); Wed, 25 Oct 2023 14:37:06 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58166 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234757AbjJYShB (ORCPT ); Wed, 25 Oct 2023 14:37:01 -0400 Received: from mail-pg1-x52f.google.com (mail-pg1-x52f.google.com [IPv6:2607:f8b0:4864:20::52f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 19C29115 for ; Wed, 25 Oct 2023 11:37:00 -0700 (PDT) Received: by mail-pg1-x52f.google.com with SMTP id 41be03b00d2f7-5a9bc2ec556so91978a12.0 for ; Wed, 25 Oct 2023 11:37:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sifive.com; s=google; t=1698259019; x=1698863819; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=/AQWefridDFzEX+S3QZsEdRxceu5Syx5iK5d77pjmdM=; b=BEk/TZiyK9j8wTcqs7wcZdkD0N+ykp+xXIBbXRNUL9uCTlc9wcGaIJhk3jJ4wlWBmN 4CyoPG5wg8UnGZLqnLHvjRnfggn3SP7l+ThtNAMXeUWbhmIrZSeuwhJG2/sJlxvi/2Vz e9/YTTyBokdGQFJhPDNCXMqnGrQPJYkd7IcIx0WgErwVX623U8vgo5ipUY91SDucXeBt 7fpqvk2JchFvE2BVh9YQR3VanCGzZ6G1OP+MdDqS7+YtxnX3TZIsz01NBqHYWUEv3ZV6 9+XPxx1eSBiVkdfyclPoQbAVgIPKKbuiEoYHzi8DmI1oLvQ9uyTIKCVAs2YYfHgddsYE b8Gg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1698259019; x=1698863819; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=/AQWefridDFzEX+S3QZsEdRxceu5Syx5iK5d77pjmdM=; b=rhJwCtGayfhqXQWxcKaZwe83f+ir5hyXzNdFE64w6Rf1rI/pt07y/h64GoPkw4KZjl Ks5iCQah3NTK1QHdvT5w94T+WEM2j4VH8ld1VT+jd62G4v0dZkZUAYhbk2g92l4PilaS gtgEVKW295n8K82VRjtlVkZt85RSJ3DRYozndSBrHmQQYKvSouh8C1WQ8CjGphum1OxG 5Bi8SCYJRKMuc4GJ24lOGsCQLleQXBypIgQ183uPwaN40kXbSbgewrSITChZGAKCDRA0 HXF28dGfE9ugzAE6W48B54sfCjaqNLdK6LjXvqC8UciRfBECQOblfvlnmVHQQr3TUPUz 9twA== X-Gm-Message-State: AOJu0YyRtDAkPD9//E6uVt66+uIQZxB/hGnJ+MKFq9ih5TNRgMcHklBt vW7jI9NLRZ+3z9q8tQ83u62Z+A== X-Google-Smtp-Source: AGHT+IFi/NboaaxSjvkdMStkpkrWMbnIqk4WWFJWYRx8koDO2Zab0WQxKwEoeYJQcTe6ecrly6nnFw== X-Received: by 2002:a17:90a:17c5:b0:27d:2100:b57c with SMTP id q63-20020a17090a17c500b0027d2100b57cmr14411038pja.37.1698259019481; Wed, 25 Oct 2023 11:36:59 -0700 (PDT) Received: from localhost.localdomain ([49.216.222.119]) by smtp.gmail.com with ESMTPSA id g3-20020a17090adb0300b00278f1512dd9sm212367pjv.32.2023.10.25.11.36.55 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Wed, 25 Oct 2023 11:36:59 -0700 (PDT) From: Jerry Shih To: paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu, herbert@gondor.apana.org.au, davem@davemloft.net Cc: andy.chiu@sifive.com, greentime.hu@sifive.com, conor.dooley@microchip.com, guoren@kernel.org, bjorn@rivosinc.com, heiko@sntech.de, ebiggers@kernel.org, ardb@kernel.org, phoebe.chen@sifive.com, hongrong.hsu@sifive.com, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, linux-crypto@vger.kernel.org Subject: [PATCH 01/12] RISC-V: add helper function to read the vector VLEN Date: Thu, 26 Oct 2023 02:36:33 +0800 Message-Id: <20231025183644.8735-2-jerry.shih@sifive.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: <20231025183644.8735-1-jerry.shih@sifive.com> References: <20231025183644.8735-1-jerry.shih@sifive.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org From: Heiko Stuebner VLEN describes the length of each vector register and some instructions need specific minimal VLENs to work correctly. The vector code already includes a variable riscv_v_vsize that contains the value of "32 vector registers with vlenb length" that gets filled during boot. vlenb is the value contained in the CSR_VLENB register and the value represents "VLEN / 8". So add riscv_vector_vlen() to return the actual VLEN value for in-kernel users when they need to check the available VLEN. Signed-off-by: Heiko Stuebner Signed-off-by: Jerry Shih --- arch/riscv/include/asm/vector.h | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h index 9fb2dea66abd..1fd3e5510b64 100644 --- a/arch/riscv/include/asm/vector.h +++ b/arch/riscv/include/asm/vector.h @@ -244,4 +244,15 @@ void kernel_vector_allow_preemption(void); #define kernel_vector_allow_preemption() do {} while (0) #endif +/* + * Return the implementation's vlen value. + * + * riscv_v_vsize contains the value of "32 vector registers with vlenb length" + * so rebuild the vlen value in bits from it. + */ +static inline int riscv_vector_vlen(void) +{ + return riscv_v_vsize / 32 * 8; +} + #endif /* ! __ASM_RISCV_VECTOR_H */ From patchwork Wed Oct 25 18:36:34 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jerry Shih X-Patchwork-Id: 13436516 X-Patchwork-Delegate: herbert@gondor.apana.org.au Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 245C4C0032E for ; Wed, 25 Oct 2023 18:37:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234893AbjJYShN (ORCPT ); Wed, 25 Oct 2023 14:37:13 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58234 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234847AbjJYShG (ORCPT ); Wed, 25 Oct 2023 14:37:06 -0400 Received: from mail-pg1-x52c.google.com (mail-pg1-x52c.google.com [IPv6:2607:f8b0:4864:20::52c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F370C18D for ; Wed, 25 Oct 2023 11:37:03 -0700 (PDT) Received: by mail-pg1-x52c.google.com with SMTP id 41be03b00d2f7-5aa481d53e5so89031a12.1 for ; Wed, 25 Oct 2023 11:37:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sifive.com; s=google; t=1698259023; x=1698863823; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=WMA+62ieF0dHRmbCC9GfwETOr1r4110ste5ly+nBz6U=; b=FyWciSEhZaGJi8lq7OaF4XxWpflQgxldWhG6M26PCik/FEGWr34IA6SU/ZdBsMIHTX Oyy+CKsNHq4yAY72nJdPnnqTEHomF4+Plx1qJo10VfZk6kBZ3ySTdt/s7tfTi43MdpZR 2QfHo2UlX8dm4jA9mZ+Vxk2SSn50k8wCOSD1zZPGtoISqSFeqmP/pAj9Ki3bwNLMWCyg LELHMhQWVZCFg5Xa6HBqakc7k4kxSs1uTcO67ZVNLphRIwNODEnUyo/WSNy9pw9zWnnw 42XvCPdhdSlob8q2WnfGH9l8iWjBxsb6JnpPu8vYwCxT27+DeLD+9KNYSbriEfcKs1I1 Og2Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1698259023; x=1698863823; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=WMA+62ieF0dHRmbCC9GfwETOr1r4110ste5ly+nBz6U=; b=cNf580UlRCUzLDqsuPG9kF8S5BpsmYxt3P786KWCCPzDsyvprqU5LVLcgWlTWnpyXI wz/VhyPMpWwMI9iDhjRzj+x0vbHtyAp5hZ6yuYym+UUlHnhOR/kzIr/22VWqVM+m0g7l ThEYIdJqi1VMhU7CDXDPOxe5ZOb8QhneJpVV8L5Rt1qJaT5iST26aWgoZpXju5+TU1Pz zLMn+wXXnK2MQZZrmSjit70VWzFVyoNbuAQc2vWeA4CeDzEpoY+9Blv7o+HzIsZ4QbXv 46jRkY5d48vW3WxDqHQHKidQNiM9psbyvRf5ETpgp4DOin11mTeJUq7sqDnmctp8m6zh FaaA== X-Gm-Message-State: AOJu0YwLaDDrBk3bSHRIWQ02qrWIb1TS0p+rwrXwFwvFcVzw7lshJjHh ath/SwJFeOJf3AwXIjkZTrzmKQ== X-Google-Smtp-Source: AGHT+IGWiBP6XA7K/f/VG9hs7DTTqjpeQ8mNNk//GZj6Js6XKPfQH2D4dC63vtWJoNHA/uNws80I8A== X-Received: by 2002:a17:90a:7889:b0:27d:1dc:aa5a with SMTP id x9-20020a17090a788900b0027d01dcaa5amr15159894pjk.23.1698259023367; Wed, 25 Oct 2023 11:37:03 -0700 (PDT) Received: from localhost.localdomain ([49.216.222.119]) by smtp.gmail.com with ESMTPSA id g3-20020a17090adb0300b00278f1512dd9sm212367pjv.32.2023.10.25.11.36.59 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Wed, 25 Oct 2023 11:37:03 -0700 (PDT) From: Jerry Shih To: paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu, herbert@gondor.apana.org.au, davem@davemloft.net Cc: andy.chiu@sifive.com, greentime.hu@sifive.com, conor.dooley@microchip.com, guoren@kernel.org, bjorn@rivosinc.com, heiko@sntech.de, ebiggers@kernel.org, ardb@kernel.org, phoebe.chen@sifive.com, hongrong.hsu@sifive.com, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, linux-crypto@vger.kernel.org Subject: [PATCH 02/12] RISC-V: hook new crypto subdir into build-system Date: Thu, 26 Oct 2023 02:36:34 +0800 Message-Id: <20231025183644.8735-3-jerry.shih@sifive.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: <20231025183644.8735-1-jerry.shih@sifive.com> References: <20231025183644.8735-1-jerry.shih@sifive.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org From: Heiko Stuebner Create a crypto subdirectory for added accelerated cryptography routines and hook it into the riscv Kbuild and the main crypto Kconfig. Signed-off-by: Heiko Stuebner Signed-off-by: Jerry Shih --- arch/riscv/Kbuild | 1 + arch/riscv/crypto/Kconfig | 5 +++++ arch/riscv/crypto/Makefile | 4 ++++ crypto/Kconfig | 3 +++ 4 files changed, 13 insertions(+) create mode 100644 arch/riscv/crypto/Kconfig create mode 100644 arch/riscv/crypto/Makefile diff --git a/arch/riscv/Kbuild b/arch/riscv/Kbuild index d25ad1c19f88..2c585f7a0b6e 100644 --- a/arch/riscv/Kbuild +++ b/arch/riscv/Kbuild @@ -2,6 +2,7 @@ obj-y += kernel/ mm/ net/ obj-$(CONFIG_BUILTIN_DTB) += boot/dts/ +obj-$(CONFIG_CRYPTO) += crypto/ obj-y += errata/ obj-$(CONFIG_KVM) += kvm/ diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig new file mode 100644 index 000000000000..10d60edc0110 --- /dev/null +++ b/arch/riscv/crypto/Kconfig @@ -0,0 +1,5 @@ +# SPDX-License-Identifier: GPL-2.0 + +menu "Accelerated Cryptographic Algorithms for CPU (riscv)" + +endmenu diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile new file mode 100644 index 000000000000..b3b6332c9f6d --- /dev/null +++ b/arch/riscv/crypto/Makefile @@ -0,0 +1,4 @@ +# SPDX-License-Identifier: GPL-2.0-only +# +# linux/arch/riscv/crypto/Makefile +# diff --git a/crypto/Kconfig b/crypto/Kconfig index 650b1b3620d8..c7b23d2c58e4 100644 --- a/crypto/Kconfig +++ b/crypto/Kconfig @@ -1436,6 +1436,9 @@ endif if PPC source "arch/powerpc/crypto/Kconfig" endif +if RISCV +source "arch/riscv/crypto/Kconfig" +endif if S390 source "arch/s390/crypto/Kconfig" endif From patchwork Wed Oct 25 18:36:35 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Jerry Shih X-Patchwork-Id: 13436517 X-Patchwork-Delegate: herbert@gondor.apana.org.au Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 32100C0032E for ; Wed, 25 Oct 2023 18:37:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234859AbjJYShY (ORCPT ); Wed, 25 Oct 2023 14:37:24 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42836 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234852AbjJYShR (ORCPT ); Wed, 25 Oct 2023 14:37:17 -0400 Received: from mail-pl1-x629.google.com (mail-pl1-x629.google.com [IPv6:2607:f8b0:4864:20::629]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 357D6196 for ; Wed, 25 Oct 2023 11:37:08 -0700 (PDT) Received: by mail-pl1-x629.google.com with SMTP id d9443c01a7336-1c77449a6daso22995ad.0 for ; Wed, 25 Oct 2023 11:37:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sifive.com; s=google; t=1698259027; x=1698863827; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=N6MgwafsLqjywfMCXjdUW6zqI+itRnJtEI6/v6PIAPo=; b=J8pGuzYtbwxp6BKJzugWb4G291Ue4UVOcQSQawBCrNn4p3lu+nplTptpFfof1jRrdI zMds9De++5yQtiqFH/l6oQTQt0bwBi0M83WK7BftijDhpUR7KZ/xc30dkB2p/HI/yd2S 0I2VzrHAxN23Pc1iAOW+gULA9PmjO/wEl8QgLi9X+4R/r4JZBnnXSMi1LuvNYJmbHX1r 9pLICOwfRnbFSJqu7/8Ns0Bc5gR2Cw8b9OdBUAPf1L/S/3QHsx5eEytG7WhJgH+JSxwd Ej47reBYEfb6BDN2r5sRyJga/pfxdG4wjvr1Gd8mYeoHTVu7DHuwL7wy8brLy4raCppS +nPQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1698259027; x=1698863827; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=N6MgwafsLqjywfMCXjdUW6zqI+itRnJtEI6/v6PIAPo=; b=L2d5O9ZrmQVWencUyP+47kYaOhVLW6jZYhMmA+PFCIInjaDNtyr8xD2i/VtWFbCyFK doEIru5hUGkyKHU9aH95C9YtQVSs7+EoFeUsr1T5CyqGk+j4wsGANy8X9fTPWxUKtKHw 5DSQ9fRS4E/7WExt5RP7Fspxspn0hDBhPiJK+ZoT7Ta78j3KVjXq/Tr3Qi+E4CnokHtX YGEn5NlzsZQJpsRiquC9TUOh3MkwAg2Anhnt+2x62v6zhQLNeExlu4FNU4U/gHrDfgmw s/OcCM2O8U+MUlQWQkmcrl0SdrOQjfCSEPsMAxdGNXEuELKd/WSHD+KDGG2wR88Jm0gO QTXg== X-Gm-Message-State: AOJu0YzFoB547qirS4hcJ2Yst+jR6EP3Q9CElDHyuzl58ZQu6WXXrEkx OYEiPl8ku61da7S5ySEDxUC3gQ== X-Google-Smtp-Source: AGHT+IG0z9SRqLNyJg2TehdS4CqqTSOT1S0/ni0xDxr1Z9HFhKdeiZpJ1HKxezskjzDh9blmV89o3A== X-Received: by 2002:a17:90a:1c8:b0:27d:6dd:fb7d with SMTP id 8-20020a17090a01c800b0027d06ddfb7dmr16708676pjd.17.1698259027428; Wed, 25 Oct 2023 11:37:07 -0700 (PDT) Received: from localhost.localdomain ([49.216.222.119]) by smtp.gmail.com with ESMTPSA id g3-20020a17090adb0300b00278f1512dd9sm212367pjv.32.2023.10.25.11.37.03 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Wed, 25 Oct 2023 11:37:06 -0700 (PDT) From: Jerry Shih To: paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu, herbert@gondor.apana.org.au, davem@davemloft.net Cc: andy.chiu@sifive.com, greentime.hu@sifive.com, conor.dooley@microchip.com, guoren@kernel.org, bjorn@rivosinc.com, heiko@sntech.de, ebiggers@kernel.org, ardb@kernel.org, phoebe.chen@sifive.com, hongrong.hsu@sifive.com, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, linux-crypto@vger.kernel.org Subject: [PATCH 03/12] RISC-V: crypto: add OpenSSL perl module for vector instructions Date: Thu, 26 Oct 2023 02:36:35 +0800 Message-Id: <20231025183644.8735-4-jerry.shih@sifive.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: <20231025183644.8735-1-jerry.shih@sifive.com> References: <20231025183644.8735-1-jerry.shih@sifive.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org The OpenSSL has some RISC-V vector cryptography implementations which could be reused for kernel. These implementations use a number of perl helpers for handling vector and vector-crypto-extension instructions. This patch take these perl helpers from OpenSSL(openssl/openssl#21923). The unused scalar crypto instructions in the original perl module are skipped. Co-developed-by: Christoph Müllner Signed-off-by: Christoph Müllner Co-developed-by: Heiko Stuebner Signed-off-by: Heiko Stuebner Co-developed-by: Phoebe Chen Signed-off-by: Phoebe Chen Signed-off-by: Jerry Shih --- arch/riscv/crypto/riscv.pm | 828 +++++++++++++++++++++++++++++++++++++ 1 file changed, 828 insertions(+) create mode 100644 arch/riscv/crypto/riscv.pm diff --git a/arch/riscv/crypto/riscv.pm b/arch/riscv/crypto/riscv.pm new file mode 100644 index 000000000000..e188f7476e3e --- /dev/null +++ b/arch/riscv/crypto/riscv.pm @@ -0,0 +1,828 @@ +#! /usr/bin/env perl +# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause +# +# This file is dual-licensed, meaning that you can use it under your +# choice of either of the following two licenses: +# +# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved. +# +# Licensed under the Apache License 2.0 (the "License"). You can obtain +# a copy in the file LICENSE in the source distribution or at +# https://www.openssl.org/source/license.html +# +# or +# +# Copyright (c) 2023, Christoph Müllner +# Copyright (c) 2023, Jerry Shih +# Copyright (c) 2023, Phoebe Chen +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions +# are met: +# 1. Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# 2. Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +use strict; +use warnings; + +# Set $have_stacktrace to 1 if we have Devel::StackTrace +my $have_stacktrace = 0; +if (eval {require Devel::StackTrace;1;}) { + $have_stacktrace = 1; +} + +my @regs = map("x$_",(0..31)); +# Mapping from the RISC-V psABI ABI mnemonic names to the register number. +my @regaliases = ('zero','ra','sp','gp','tp','t0','t1','t2','s0','s1', + map("a$_",(0..7)), + map("s$_",(2..11)), + map("t$_",(3..6)) +); + +my %reglookup; +@reglookup{@regs} = @regs; +@reglookup{@regaliases} = @regs; + +# Takes a register name, possibly an alias, and converts it to a register index +# from 0 to 31 +sub read_reg { + my $reg = lc shift; + if (!exists($reglookup{$reg})) { + my $trace = ""; + if ($have_stacktrace) { + $trace = Devel::StackTrace->new->as_string; + } + die("Unknown register ".$reg."\n".$trace); + } + my $regstr = $reglookup{$reg}; + if (!($regstr =~ /^x([0-9]+)$/)) { + my $trace = ""; + if ($have_stacktrace) { + $trace = Devel::StackTrace->new->as_string; + } + die("Could not process register ".$reg."\n".$trace); + } + return $1; +} + +# Read the sew setting(8, 16, 32 and 64) and convert to vsew encoding. +sub read_sew { + my $sew_setting = shift; + + if ($sew_setting eq "e8") { + return 0; + } elsif ($sew_setting eq "e16") { + return 1; + } elsif ($sew_setting eq "e32") { + return 2; + } elsif ($sew_setting eq "e64") { + return 3; + } else { + my $trace = ""; + if ($have_stacktrace) { + $trace = Devel::StackTrace->new->as_string; + } + die("Unsupported SEW setting:".$sew_setting."\n".$trace); + } +} + +# Read the LMUL settings and convert to vlmul encoding. +sub read_lmul { + my $lmul_setting = shift; + + if ($lmul_setting eq "mf8") { + return 5; + } elsif ($lmul_setting eq "mf4") { + return 6; + } elsif ($lmul_setting eq "mf2") { + return 7; + } elsif ($lmul_setting eq "m1") { + return 0; + } elsif ($lmul_setting eq "m2") { + return 1; + } elsif ($lmul_setting eq "m4") { + return 2; + } elsif ($lmul_setting eq "m8") { + return 3; + } else { + my $trace = ""; + if ($have_stacktrace) { + $trace = Devel::StackTrace->new->as_string; + } + die("Unsupported LMUL setting:".$lmul_setting."\n".$trace); + } +} + +# Read the tail policy settings and convert to vta encoding. +sub read_tail_policy { + my $tail_setting = shift; + + if ($tail_setting eq "ta") { + return 1; + } elsif ($tail_setting eq "tu") { + return 0; + } else { + my $trace = ""; + if ($have_stacktrace) { + $trace = Devel::StackTrace->new->as_string; + } + die("Unsupported tail policy setting:".$tail_setting."\n".$trace); + } +} + +# Read the mask policy settings and convert to vma encoding. +sub read_mask_policy { + my $mask_setting = shift; + + if ($mask_setting eq "ma") { + return 1; + } elsif ($mask_setting eq "mu") { + return 0; + } else { + my $trace = ""; + if ($have_stacktrace) { + $trace = Devel::StackTrace->new->as_string; + } + die("Unsupported mask policy setting:".$mask_setting."\n".$trace); + } +} + +my @vregs = map("v$_",(0..31)); +my %vreglookup; +@vreglookup{@vregs} = @vregs; + +sub read_vreg { + my $vreg = lc shift; + if (!exists($vreglookup{$vreg})) { + my $trace = ""; + if ($have_stacktrace) { + $trace = Devel::StackTrace->new->as_string; + } + die("Unknown vector register ".$vreg."\n".$trace); + } + if (!($vreg =~ /^v([0-9]+)$/)) { + my $trace = ""; + if ($have_stacktrace) { + $trace = Devel::StackTrace->new->as_string; + } + die("Could not process vector register ".$vreg."\n".$trace); + } + return $1; +} + +# Read the vm settings and convert to mask encoding. +sub read_mask_vreg { + my $vreg = shift; + # The default value is unmasked. + my $mask_bit = 1; + + if (defined($vreg)) { + my $reg_id = read_vreg $vreg; + if ($reg_id == 0) { + $mask_bit = 0; + } else { + my $trace = ""; + if ($have_stacktrace) { + $trace = Devel::StackTrace->new->as_string; + } + die("The ".$vreg." is not the mask register v0.\n".$trace); + } + } + return $mask_bit; +} + +# Vector instructions + +sub vadd_vv { + # vadd.vv vd, vs2, vs1, vm + my $template = 0b000000_0_00000_00000_000_00000_1010111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $vs1 = read_vreg shift; + my $vm = read_mask_vreg shift; + return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7)); +} + +sub vadd_vx { + # vadd.vx vd, vs2, rs1, vm + my $template = 0b000000_0_00000_00000_100_00000_1010111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $rs1 = read_reg shift; + my $vm = read_mask_vreg shift; + return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($rs1 << 15) | ($vd << 7)); +} + +sub vsub_vv { + # vsub.vv vd, vs2, vs1, vm + my $template = 0b000010_0_00000_00000_000_00000_1010111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $vs1 = read_vreg shift; + my $vm = read_mask_vreg shift; + return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7)); +} + +sub vsub_vx { + # vsub.vx vd, vs2, rs1, vm + my $template = 0b000010_0_00000_00000_100_00000_1010111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $rs1 = read_reg shift; + my $vm = read_mask_vreg shift; + return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($rs1 << 15) | ($vd << 7)); +} + +sub vid_v { + # vid.v vd + my $template = 0b0101001_00000_10001_010_00000_1010111; + my $vd = read_vreg shift; + return ".word ".($template | ($vd << 7)); +} + +sub viota_m { + # viota.m vd, vs2, vm + my $template = 0b010100_0_00000_10000_010_00000_1010111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $vm = read_mask_vreg shift; + return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($vd << 7)); +} + +sub vle8_v { + # vle8.v vd, (rs1), vm + my $template = 0b000000_0_00000_00000_000_00000_0000111; + my $vd = read_vreg shift; + my $rs1 = read_reg shift; + my $vm = read_mask_vreg shift; + return ".word ".($template | ($vm << 25) | ($rs1 << 15) | ($vd << 7)); +} + +sub vle32_v { + # vle32.v vd, (rs1), vm + my $template = 0b000000_0_00000_00000_110_00000_0000111; + my $vd = read_vreg shift; + my $rs1 = read_reg shift; + my $vm = read_mask_vreg shift; + return ".word ".($template | ($vm << 25) | ($rs1 << 15) | ($vd << 7)); +} + +sub vle64_v { + # vle64.v vd, (rs1) + my $template = 0b0000001_00000_00000_111_00000_0000111; + my $vd = read_vreg shift; + my $rs1 = read_reg shift; + return ".word ".($template | ($rs1 << 15) | ($vd << 7)); +} + +sub vlse32_v { + # vlse32.v vd, (rs1), rs2 + my $template = 0b0000101_00000_00000_110_00000_0000111; + my $vd = read_vreg shift; + my $rs1 = read_reg shift; + my $rs2 = read_reg shift; + return ".word ".($template | ($rs2 << 20) | ($rs1 << 15) | ($vd << 7)); +} + +sub vlsseg_nf_e32_v { + # vlssege32.v vd, (rs1), rs2 + my $template = 0b0000101_00000_00000_110_00000_0000111; + my $nf = shift; + $nf -= 1; + my $vd = read_vreg shift; + my $rs1 = read_reg shift; + my $rs2 = read_reg shift; + return ".word ".($template | ($nf << 29) | ($rs2 << 20) | ($rs1 << 15) | ($vd << 7)); +} + +sub vlse64_v { + # vlse64.v vd, (rs1), rs2 + my $template = 0b0000101_00000_00000_111_00000_0000111; + my $vd = read_vreg shift; + my $rs1 = read_reg shift; + my $rs2 = read_reg shift; + return ".word ".($template | ($rs2 << 20) | ($rs1 << 15) | ($vd << 7)); +} + +sub vluxei8_v { + # vluxei8.v vd, (rs1), vs2, vm + my $template = 0b000001_0_00000_00000_000_00000_0000111; + my $vd = read_vreg shift; + my $rs1 = read_reg shift; + my $vs2 = read_vreg shift; + my $vm = read_mask_vreg shift; + return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($rs1 << 15) | ($vd << 7)); +} + +sub vmerge_vim { + # vmerge.vim vd, vs2, imm, v0 + my $template = 0b0101110_00000_00000_011_00000_1010111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $imm = shift; + return ".word ".($template | ($vs2 << 20) | ($imm << 15) | ($vd << 7)); +} + +sub vmerge_vvm { + # vmerge.vvm vd vs2 vs1 + my $template = 0b0101110_00000_00000_000_00000_1010111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $vs1 = read_vreg shift; + return ".word ".($template | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7)) +} + +sub vmseq_vi { + # vmseq.vi vd vs1, imm + my $template = 0b0110001_00000_00000_011_00000_1010111; + my $vd = read_vreg shift; + my $vs1 = read_vreg shift; + my $imm = shift; + return ".word ".($template | ($vs1 << 20) | ($imm << 15) | ($vd << 7)) +} + +sub vmsgtu_vx { + # vmsgtu.vx vd vs2, rs1, vm + my $template = 0b011110_0_00000_00000_100_00000_1010111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $rs1 = read_reg shift; + my $vm = read_mask_vreg shift; + return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($rs1 << 15) | ($vd << 7)) +} + +sub vmv_v_i { + # vmv.v.i vd, imm + my $template = 0b0101111_00000_00000_011_00000_1010111; + my $vd = read_vreg shift; + my $imm = shift; + return ".word ".($template | ($imm << 15) | ($vd << 7)); +} + +sub vmv_v_x { + # vmv.v.x vd, rs1 + my $template = 0b0101111_00000_00000_100_00000_1010111; + my $vd = read_vreg shift; + my $rs1 = read_reg shift; + return ".word ".($template | ($rs1 << 15) | ($vd << 7)); +} + +sub vmv_v_v { + # vmv.v.v vd, vs1 + my $template = 0b0101111_00000_00000_000_00000_1010111; + my $vd = read_vreg shift; + my $vs1 = read_vreg shift; + return ".word ".($template | ($vs1 << 15) | ($vd << 7)); +} + +sub vor_vv_v0t { + # vor.vv vd, vs2, vs1, v0.t + my $template = 0b0010100_00000_00000_000_00000_1010111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $vs1 = read_vreg shift; + return ".word ".($template | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7)); +} + +sub vse8_v { + # vse8.v vd, (rs1), vm + my $template = 0b000000_0_00000_00000_000_00000_0100111; + my $vd = read_vreg shift; + my $rs1 = read_reg shift; + my $vm = read_mask_vreg shift; + return ".word ".($template | ($vm << 25) | ($rs1 << 15) | ($vd << 7)); +} + +sub vse32_v { + # vse32.v vd, (rs1), vm + my $template = 0b000000_0_00000_00000_110_00000_0100111; + my $vd = read_vreg shift; + my $rs1 = read_reg shift; + my $vm = read_mask_vreg shift; + return ".word ".($template | ($vm << 25) | ($rs1 << 15) | ($vd << 7)); +} + +sub vssseg_nf_e32_v { + # vsssege32.v vs3, (rs1), rs2 + my $template = 0b0000101_00000_00000_110_00000_0100111; + my $nf = shift; + $nf -= 1; + my $vs3 = read_vreg shift; + my $rs1 = read_reg shift; + my $rs2 = read_reg shift; + return ".word ".($template | ($nf << 29) | ($rs2 << 20) | ($rs1 << 15) | ($vs3 << 7)); +} + +sub vsuxei8_v { + # vsuxei8.v vs3, (rs1), vs2, vm + my $template = 0b000001_0_00000_00000_000_00000_0100111; + my $vs3 = read_vreg shift; + my $rs1 = read_reg shift; + my $vs2 = read_vreg shift; + my $vm = read_mask_vreg shift; + return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($rs1 << 15) | ($vs3 << 7)); +} + +sub vse64_v { + # vse64.v vd, (rs1) + my $template = 0b0000001_00000_00000_111_00000_0100111; + my $vd = read_vreg shift; + my $rs1 = read_reg shift; + return ".word ".($template | ($rs1 << 15) | ($vd << 7)); +} + +sub vsetivli__x0_2_e64_m1_tu_mu { + # vsetivli x0, 2, e64, m1, tu, mu + return ".word 0xc1817057"; +} + +sub vsetivli__x0_4_e32_m1_tu_mu { + # vsetivli x0, 4, e32, m1, tu, mu + return ".word 0xc1027057"; +} + +sub vsetivli__x0_4_e64_m1_tu_mu { + # vsetivli x0, 4, e64, m1, tu, mu + return ".word 0xc1827057"; +} + +sub vsetivli__x0_8_e32_m1_tu_mu { + # vsetivli x0, 8, e32, m1, tu, mu + return ".word 0xc1047057"; +} + +sub vsetvli { + # vsetvli rd, rs1, vtypei + my $template = 0b0_00000000000_00000_111_00000_1010111; + my $rd = read_reg shift; + my $rs1 = read_reg shift; + my $sew = read_sew shift; + my $lmul = read_lmul shift; + my $tail_policy = read_tail_policy shift; + my $mask_policy = read_mask_policy shift; + my $vtypei = ($mask_policy << 7) | ($tail_policy << 6) | ($sew << 3) | $lmul; + + return ".word ".($template | ($vtypei << 20) | ($rs1 << 15) | ($rd << 7)); +} + +sub vsetivli { + # vsetvli rd, uimm, vtypei + my $template = 0b11_0000000000_00000_111_00000_1010111; + my $rd = read_reg shift; + my $uimm = shift; + my $sew = read_sew shift; + my $lmul = read_lmul shift; + my $tail_policy = read_tail_policy shift; + my $mask_policy = read_mask_policy shift; + my $vtypei = ($mask_policy << 7) | ($tail_policy << 6) | ($sew << 3) | $lmul; + + return ".word ".($template | ($vtypei << 20) | ($uimm << 15) | ($rd << 7)); +} + +sub vslidedown_vi { + # vslidedown.vi vd, vs2, uimm + my $template = 0b0011111_00000_00000_011_00000_1010111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $uimm = shift; + return ".word ".($template | ($vs2 << 20) | ($uimm << 15) | ($vd << 7)); +} + +sub vslidedown_vx { + # vslidedown.vx vd, vs2, rs1 + my $template = 0b0011111_00000_00000_100_00000_1010111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $rs1 = read_reg shift; + return ".word ".($template | ($vs2 << 20) | ($rs1 << 15) | ($vd << 7)); +} + +sub vslideup_vi_v0t { + # vslideup.vi vd, vs2, uimm, v0.t + my $template = 0b0011100_00000_00000_011_00000_1010111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $uimm = shift; + return ".word ".($template | ($vs2 << 20) | ($uimm << 15) | ($vd << 7)); +} + +sub vslideup_vi { + # vslideup.vi vd, vs2, uimm + my $template = 0b0011101_00000_00000_011_00000_1010111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $uimm = shift; + return ".word ".($template | ($vs2 << 20) | ($uimm << 15) | ($vd << 7)); +} + +sub vsll_vi { + # vsll.vi vd, vs2, uimm, vm + my $template = 0b1001011_00000_00000_011_00000_1010111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $uimm = shift; + return ".word ".($template | ($vs2 << 20) | ($uimm << 15) | ($vd << 7)); +} + +sub vsrl_vx { + # vsrl.vx vd, vs2, rs1 + my $template = 0b1010001_00000_00000_100_00000_1010111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $rs1 = read_reg shift; + return ".word ".($template | ($vs2 << 20) | ($rs1 << 15) | ($vd << 7)); +} + +sub vsse32_v { + # vse32.v vs3, (rs1), rs2 + my $template = 0b0000101_00000_00000_110_00000_0100111; + my $vs3 = read_vreg shift; + my $rs1 = read_reg shift; + my $rs2 = read_reg shift; + return ".word ".($template | ($rs2 << 20) | ($rs1 << 15) | ($vs3 << 7)); +} + +sub vsse64_v { + # vsse64.v vs3, (rs1), rs2 + my $template = 0b0000101_00000_00000_111_00000_0100111; + my $vs3 = read_vreg shift; + my $rs1 = read_reg shift; + my $rs2 = read_reg shift; + return ".word ".($template | ($rs2 << 20) | ($rs1 << 15) | ($vs3 << 7)); +} + +sub vxor_vv_v0t { + # vxor.vv vd, vs2, vs1, v0.t + my $template = 0b0010110_00000_00000_000_00000_1010111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $vs1 = read_vreg shift; + return ".word ".($template | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7)); +} + +sub vxor_vv { + # vxor.vv vd, vs2, vs1 + my $template = 0b0010111_00000_00000_000_00000_1010111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $vs1 = read_vreg shift; + return ".word ".($template | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7)); +} + +sub vzext_vf2 { + # vzext.vf2 vd, vs2, vm + my $template = 0b010010_0_00000_00110_010_00000_1010111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $vm = read_mask_vreg shift; + return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($vd << 7)); +} + +# Vector crypto instructions + +## Zvbb and Zvkb instructions +## +## vandn (also in zvkb) +## vbrev +## vbrev8 (also in zvkb) +## vrev8 (also in zvkb) +## vclz +## vctz +## vcpop +## vrol (also in zvkb) +## vror (also in zvkb) +## vwsll + +sub vbrev8_v { + # vbrev8.v vd, vs2, vm + my $template = 0b010010_0_00000_01000_010_00000_1010111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $vm = read_mask_vreg shift; + return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($vd << 7)); +} + +sub vrev8_v { + # vrev8.v vd, vs2, vm + my $template = 0b010010_0_00000_01001_010_00000_1010111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $vm = read_mask_vreg shift; + return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($vd << 7)); +} + +sub vror_vi { + # vror.vi vd, vs2, uimm + my $template = 0b01010_0_1_00000_00000_011_00000_1010111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $uimm = shift; + my $uimm_i5 = $uimm >> 5; + my $uimm_i4_0 = $uimm & 0b11111; + + return ".word ".($template | ($uimm_i5 << 26) | ($vs2 << 20) | ($uimm_i4_0 << 15) | ($vd << 7)); +} + +sub vwsll_vv { + # vwsll.vv vd, vs2, vs1, vm + my $template = 0b110101_0_00000_00000_000_00000_1010111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $vs1 = read_vreg shift; + my $vm = read_mask_vreg shift; + return ".word ".($template | ($vm << 25) | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7)); +} + +## Zvbc instructions + +sub vclmulh_vx { + # vclmulh.vx vd, vs2, rs1 + my $template = 0b0011011_00000_00000_110_00000_1010111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $rs1 = read_reg shift; + return ".word ".($template | ($vs2 << 20) | ($rs1 << 15) | ($vd << 7)); +} + +sub vclmul_vx_v0t { + # vclmul.vx vd, vs2, rs1, v0.t + my $template = 0b0011000_00000_00000_110_00000_1010111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $rs1 = read_reg shift; + return ".word ".($template | ($vs2 << 20) | ($rs1 << 15) | ($vd << 7)); +} + +sub vclmul_vx { + # vclmul.vx vd, vs2, rs1 + my $template = 0b0011001_00000_00000_110_00000_1010111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $rs1 = read_reg shift; + return ".word ".($template | ($vs2 << 20) | ($rs1 << 15) | ($vd << 7)); +} + +## Zvkg instructions + +sub vghsh_vv { + # vghsh.vv vd, vs2, vs1 + my $template = 0b1011001_00000_00000_010_00000_1110111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $vs1 = read_vreg shift; + return ".word ".($template | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7)); +} + +sub vgmul_vv { + # vgmul.vv vd, vs2 + my $template = 0b1010001_00000_10001_010_00000_1110111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + return ".word ".($template | ($vs2 << 20) | ($vd << 7)); +} + +## Zvkned instructions + +sub vaesdf_vs { + # vaesdf.vs vd, vs2 + my $template = 0b101001_1_00000_00001_010_00000_1110111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + return ".word ".($template | ($vs2 << 20) | ($vd << 7)); +} + +sub vaesdm_vs { + # vaesdm.vs vd, vs2 + my $template = 0b101001_1_00000_00000_010_00000_1110111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + return ".word ".($template | ($vs2 << 20) | ($vd << 7)); +} + +sub vaesef_vs { + # vaesef.vs vd, vs2 + my $template = 0b101001_1_00000_00011_010_00000_1110111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + return ".word ".($template | ($vs2 << 20) | ($vd << 7)); +} + +sub vaesem_vs { + # vaesem.vs vd, vs2 + my $template = 0b101001_1_00000_00010_010_00000_1110111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + return ".word ".($template | ($vs2 << 20) | ($vd << 7)); +} + +sub vaeskf1_vi { + # vaeskf1.vi vd, vs2, uimmm + my $template = 0b100010_1_00000_00000_010_00000_1110111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $uimm = shift; + return ".word ".($template | ($uimm << 15) | ($vs2 << 20) | ($vd << 7)); +} + +sub vaeskf2_vi { + # vaeskf2.vi vd, vs2, uimm + my $template = 0b101010_1_00000_00000_010_00000_1110111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $uimm = shift; + return ".word ".($template | ($vs2 << 20) | ($uimm << 15) | ($vd << 7)); +} + +sub vaesz_vs { + # vaesz.vs vd, vs2 + my $template = 0b101001_1_00000_00111_010_00000_1110111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + return ".word ".($template | ($vs2 << 20) | ($vd << 7)); +} + +## Zvknha and Zvknhb instructions + +sub vsha2ms_vv { + # vsha2ms.vv vd, vs2, vs1 + my $template = 0b1011011_00000_00000_010_00000_1110111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $vs1 = read_vreg shift; + return ".word ".($template | ($vs2 << 20)| ($vs1 << 15 )| ($vd << 7)); +} + +sub vsha2ch_vv { + # vsha2ch.vv vd, vs2, vs1 + my $template = 0b101110_10000_00000_001_00000_01110111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $vs1 = read_vreg shift; + return ".word ".($template | ($vs2 << 20)| ($vs1 << 15 )| ($vd << 7)); +} + +sub vsha2cl_vv { + # vsha2cl.vv vd, vs2, vs1 + my $template = 0b101111_10000_00000_001_00000_01110111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $vs1 = read_vreg shift; + return ".word ".($template | ($vs2 << 20)| ($vs1 << 15 )| ($vd << 7)); +} + +## Zvksed instructions + +sub vsm4k_vi { + # vsm4k.vi vd, vs2, uimm + my $template = 0b1000011_00000_00000_010_00000_1110111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $uimm = shift; + return ".word ".($template | ($vs2 << 20) | ($uimm << 15) | ($vd << 7)); +} + +sub vsm4r_vs { + # vsm4r.vs vd, vs2 + my $template = 0b1010011_00000_10000_010_00000_1110111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + return ".word ".($template | ($vs2 << 20) | ($vd << 7)); +} + +## zvksh instructions + +sub vsm3c_vi { + # vsm3c.vi vd, vs2, uimm + my $template = 0b1010111_00000_00000_010_00000_1110111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $uimm = shift; + return ".word ".($template | ($vs2 << 20) | ($uimm << 15 ) | ($vd << 7)); +} + +sub vsm3me_vv { + # vsm3me.vv vd, vs2, vs1 + my $template = 0b1000001_00000_00000_010_00000_1110111; + my $vd = read_vreg shift; + my $vs2 = read_vreg shift; + my $vs1 = read_vreg shift; + return ".word ".($template | ($vs2 << 20) | ($vs1 << 15 ) | ($vd << 7)); +} + +1; From patchwork Wed Oct 25 18:36:36 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Jerry Shih X-Patchwork-Id: 13436518 X-Patchwork-Delegate: herbert@gondor.apana.org.au Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D8A8DC07545 for ; Wed, 25 Oct 2023 18:37:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234762AbjJYShZ (ORCPT ); Wed, 25 Oct 2023 14:37:25 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58274 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234719AbjJYShV (ORCPT ); Wed, 25 Oct 2023 14:37:21 -0400 Received: from mail-pl1-x62f.google.com (mail-pl1-x62f.google.com [IPv6:2607:f8b0:4864:20::62f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8842A18D for ; Wed, 25 Oct 2023 11:37:12 -0700 (PDT) Received: by mail-pl1-x62f.google.com with SMTP id d9443c01a7336-1c9d407bb15so16125ad.0 for ; Wed, 25 Oct 2023 11:37:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sifive.com; s=google; t=1698259032; x=1698863832; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=MWKTD/SWI8njkmgyeeN+5f9jkvLnhIgL7e3VWIYvG8Q=; b=OligZW4ClXoFMTp8YmdWgUfsaapPuy6hpOpQA9zMBmB1EXNnFxE1dd7VV14ml5FDBM yTkYGMcI61LSTl7Z8KwhU/vfP0K5Fb499YEzUqFtA1mx7CxkLOnClJGMhZhsAzx4uQkA Q52tvYCe4x1yD8jOwHskx78991r7AWed5ajzE5wJ62/b4Jp5Xj+R4qYxpHdGIOXlPqop LTww+Rm0tubgnfiO16wIW8Laz9h3xjkI8xR517eoJpA0hiV2h7QKhCVjCbzBqUqzbOwB PfKaB7gIyJymwQMgDF8a1cT5Hf8/PkZDJiGSqKxoxnM+X1nU65G1edz+D4kHz/c9lY2Q +ETA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1698259032; x=1698863832; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=MWKTD/SWI8njkmgyeeN+5f9jkvLnhIgL7e3VWIYvG8Q=; b=noBiCAkLq9gNka6w3JL4PLTrQ1F3G457pqy3DCkqBUzChhe6sQQPNYyStLITII8WdB gTtHpiD+gTWTfiy70bKyQnH+z2XunjC6AYnpeb6bdWXe6JhWE93EcUmeZdagph9JmJOe yvHm08XsbtFfi0Icqq1NDzAY+wghlXl71LqZjkyMQhhL9fHdxb8ikchqJlQxibJZfLGz BB7WvhYUEKqXmHEf2wmnycWq/DcB/o62oKgXFASHTDQxA0SQDWFecANs+R44XtC9aETo WF40zhVefdEWjB1iUQZKjMY8DvLX9qOgVfaO7m4G8CqjVl18p0FspUt7quUREpgS4tq7 s61A== X-Gm-Message-State: AOJu0YwxLn0b+fH536vIeWPfy2bc90BYL/itJgtsziV/rxq/WTuO3x8H QS15IbDm8l8Ry1kAiSBkn2dA+g== X-Google-Smtp-Source: AGHT+IGRNYQPO/oiPp8ACsYhU0M5aGJ04WqOdeXXrN2GlGvrHQNkylxeMduls440BMus9XknRraQ5g== X-Received: by 2002:a17:90a:7442:b0:27d:9b67:7fa6 with SMTP id o2-20020a17090a744200b0027d9b677fa6mr14293424pjk.3.1698259031716; Wed, 25 Oct 2023 11:37:11 -0700 (PDT) Received: from localhost.localdomain ([49.216.222.119]) by smtp.gmail.com with ESMTPSA id g3-20020a17090adb0300b00278f1512dd9sm212367pjv.32.2023.10.25.11.37.07 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Wed, 25 Oct 2023 11:37:11 -0700 (PDT) From: Jerry Shih To: paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu, herbert@gondor.apana.org.au, davem@davemloft.net Cc: andy.chiu@sifive.com, greentime.hu@sifive.com, conor.dooley@microchip.com, guoren@kernel.org, bjorn@rivosinc.com, heiko@sntech.de, ebiggers@kernel.org, ardb@kernel.org, phoebe.chen@sifive.com, hongrong.hsu@sifive.com, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, linux-crypto@vger.kernel.org Subject: [PATCH 04/12] RISC-V: crypto: add Zvkned accelerated AES implementation Date: Thu, 26 Oct 2023 02:36:36 +0800 Message-Id: <20231025183644.8735-5-jerry.shih@sifive.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: <20231025183644.8735-1-jerry.shih@sifive.com> References: <20231025183644.8735-1-jerry.shih@sifive.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org The AES implementation using the Zvkned vector crypto extension from OpenSSL(openssl/openssl#21923). Co-developed-by: Christoph Müllner Signed-off-by: Christoph Müllner Co-developed-by: Heiko Stuebner Signed-off-by: Heiko Stuebner Co-developed-by: Phoebe Chen Signed-off-by: Phoebe Chen Signed-off-by: Jerry Shih --- arch/riscv/crypto/Kconfig | 12 + arch/riscv/crypto/Makefile | 11 + arch/riscv/crypto/aes-riscv64-glue.c | 163 +++++++++ arch/riscv/crypto/aes-riscv64-glue.h | 28 ++ arch/riscv/crypto/aes-riscv64-zvkned.pl | 451 ++++++++++++++++++++++++ 5 files changed, 665 insertions(+) create mode 100644 arch/riscv/crypto/aes-riscv64-glue.c create mode 100644 arch/riscv/crypto/aes-riscv64-glue.h create mode 100644 arch/riscv/crypto/aes-riscv64-zvkned.pl diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig index 10d60edc0110..500938317e71 100644 --- a/arch/riscv/crypto/Kconfig +++ b/arch/riscv/crypto/Kconfig @@ -2,4 +2,16 @@ menu "Accelerated Cryptographic Algorithms for CPU (riscv)" +config CRYPTO_AES_RISCV64 + default y if RISCV_ISA_V + tristate "Ciphers: AES" + depends on 64BIT && RISCV_ISA_V + select CRYPTO_AES + select CRYPTO_ALGAPI + help + Block ciphers: AES cipher algorithms (FIPS-197) + + Architecture: riscv64 using: + - Zvkned vector crypto extension + endmenu diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile index b3b6332c9f6d..90ca91d8df26 100644 --- a/arch/riscv/crypto/Makefile +++ b/arch/riscv/crypto/Makefile @@ -2,3 +2,14 @@ # # linux/arch/riscv/crypto/Makefile # + +obj-$(CONFIG_CRYPTO_AES_RISCV64) += aes-riscv64.o +aes-riscv64-y := aes-riscv64-glue.o aes-riscv64-zvkned.o + +quiet_cmd_perlasm = PERLASM $@ + cmd_perlasm = $(PERL) $(<) void $(@) + +$(obj)/aes-riscv64-zvkned.S: $(src)/aes-riscv64-zvkned.pl + $(call cmd,perlasm) + +clean-files += aes-riscv64-zvkned.S diff --git a/arch/riscv/crypto/aes-riscv64-glue.c b/arch/riscv/crypto/aes-riscv64-glue.c new file mode 100644 index 000000000000..5c4f1018d3aa --- /dev/null +++ b/arch/riscv/crypto/aes-riscv64-glue.c @@ -0,0 +1,163 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Port of the OpenSSL AES implementation for RISC-V + * + * Copyright (C) 2023 VRULL GmbH + * Author: Heiko Stuebner + * + * Copyright (C) 2023 SiFive, Inc. + * Author: Jerry Shih + */ + +#include +#include +#include +#include +#include +#include +#include +#include + +#include "aes-riscv64-glue.h" + +/* + * aes cipher using zvkned vector crypto extension + * + * All zvkned-based functions use encryption expending keys for both encryption + * and decryption. + */ +void rv64i_zvkned_encrypt(const u8 *in, u8 *out, const struct aes_key *key); +void rv64i_zvkned_decrypt(const u8 *in, u8 *out, const struct aes_key *key); + +static inline int aes_round_num(unsigned int keylen) +{ + switch (keylen) { + case AES_KEYSIZE_128: + return 10; + case AES_KEYSIZE_192: + return 12; + case AES_KEYSIZE_256: + return 14; + default: + return 0; + } +} + +int riscv64_aes_setkey(struct riscv64_aes_ctx *ctx, const u8 *key, + unsigned int keylen) +{ + /* + * The RISC-V AES vector crypto key expending doesn't support AES-192. + * We just use the generic software key expending here to simplify the key + * expending flow. + */ + u32 aes_rounds; + u32 key_length; + int ret; + + ret = aes_expandkey(&ctx->fallback_ctx, key, keylen); + if (ret < 0) + return -EINVAL; + + /* + * Copy the key from `crypto_aes_ctx` to `aes_key` for zvkned-based AES + * implementations. + */ + aes_rounds = aes_round_num(keylen); + ctx->key.rounds = aes_rounds; + key_length = AES_BLOCK_SIZE * (aes_rounds + 1); + memcpy(ctx->key.key, ctx->fallback_ctx.key_enc, key_length); + + return 0; +} + +void riscv64_aes_encrypt_zvkned(const struct riscv64_aes_ctx *ctx, u8 *dst, + const u8 *src) +{ + if (crypto_simd_usable()) { + kernel_vector_begin(); + rv64i_zvkned_encrypt(src, dst, &ctx->key); + kernel_vector_end(); + } else { + aes_encrypt(&ctx->fallback_ctx, dst, src); + } +} + +void riscv64_aes_decrypt_zvkned(const struct riscv64_aes_ctx *ctx, u8 *dst, + const u8 *src) +{ + if (crypto_simd_usable()) { + kernel_vector_begin(); + rv64i_zvkned_decrypt(src, dst, &ctx->key); + kernel_vector_end(); + } else { + aes_decrypt(&ctx->fallback_ctx, dst, src); + } +} + +static int aes_setkey(struct crypto_tfm *tfm, const u8 *key, + unsigned int keylen) +{ + struct riscv64_aes_ctx *ctx = crypto_tfm_ctx(tfm); + + return riscv64_aes_setkey(ctx, key, keylen); +} + +static void aes_encrypt_zvkned(struct crypto_tfm *tfm, u8 *dst, const u8 *src) +{ + const struct riscv64_aes_ctx *ctx = crypto_tfm_ctx(tfm); + + riscv64_aes_encrypt_zvkned(ctx, dst, src); +} + +static void aes_decrypt_zvkned(struct crypto_tfm *tfm, u8 *dst, const u8 *src) +{ + const struct riscv64_aes_ctx *ctx = crypto_tfm_ctx(tfm); + + riscv64_aes_decrypt_zvkned(ctx, dst, src); +} + +static struct crypto_alg riscv64_aes_alg_zvkned = { + .cra_name = "aes", + .cra_driver_name = "aes-riscv64-zvkned", + .cra_module = THIS_MODULE, + .cra_priority = 300, + .cra_flags = CRYPTO_ALG_TYPE_CIPHER, + .cra_blocksize = AES_BLOCK_SIZE, + .cra_ctxsize = sizeof(struct riscv64_aes_ctx), + .cra_cipher = { + .cia_min_keysize = AES_MIN_KEY_SIZE, + .cia_max_keysize = AES_MAX_KEY_SIZE, + .cia_setkey = aes_setkey, + .cia_encrypt = aes_encrypt_zvkned, + .cia_decrypt = aes_decrypt_zvkned, + }, +}; + +static inline bool check_aes_ext(void) +{ + return riscv_isa_extension_available(NULL, ZVKNED) && + riscv_vector_vlen() >= 128; +} + +static int __init riscv64_aes_mod_init(void) +{ + if (check_aes_ext()) + return crypto_register_alg(&riscv64_aes_alg_zvkned); + + return -ENODEV; +} + +static void __exit riscv64_aes_mod_fini(void) +{ + if (check_aes_ext()) + crypto_unregister_alg(&riscv64_aes_alg_zvkned); +} + +module_init(riscv64_aes_mod_init); +module_exit(riscv64_aes_mod_fini); + +MODULE_DESCRIPTION("AES (RISC-V accelerated)"); +MODULE_AUTHOR("Heiko Stuebner "); +MODULE_LICENSE("GPL"); +MODULE_ALIAS_CRYPTO("aes"); diff --git a/arch/riscv/crypto/aes-riscv64-glue.h b/arch/riscv/crypto/aes-riscv64-glue.h new file mode 100644 index 000000000000..7f1f675aca0d --- /dev/null +++ b/arch/riscv/crypto/aes-riscv64-glue.h @@ -0,0 +1,28 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef AES_RISCV64_GLUE_H +#define AES_RISCV64_GLUE_H + +#include +#include + +struct aes_key { + u32 key[AES_MAX_KEYLENGTH_U32]; + u32 rounds; +}; + +struct riscv64_aes_ctx { + struct aes_key key; + struct crypto_aes_ctx fallback_ctx; +}; + +int riscv64_aes_setkey(struct riscv64_aes_ctx *ctx, const u8 *key, + unsigned int keylen); + +void riscv64_aes_encrypt_zvkned(const struct riscv64_aes_ctx *ctx, u8 *dst, + const u8 *src); + +void riscv64_aes_decrypt_zvkned(const struct riscv64_aes_ctx *ctx, u8 *dst, + const u8 *src); + +#endif /* AES_RISCV64_GLUE_H */ diff --git a/arch/riscv/crypto/aes-riscv64-zvkned.pl b/arch/riscv/crypto/aes-riscv64-zvkned.pl new file mode 100644 index 000000000000..c0ecde77bf56 --- /dev/null +++ b/arch/riscv/crypto/aes-riscv64-zvkned.pl @@ -0,0 +1,451 @@ +#! /usr/bin/env perl +# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause +# +# This file is dual-licensed, meaning that you can use it under your +# choice of either of the following two licenses: +# +# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved. +# +# Licensed under the Apache License 2.0 (the "License"). You can obtain +# a copy in the file LICENSE in the source distribution or at +# https://www.openssl.org/source/license.html +# +# or +# +# Copyright (c) 2023, Christoph Müllner +# Copyright (c) 2023, Phoebe Chen +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions +# are met: +# 1. Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# 2. Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +# - RV64I +# - RISC-V Vector ('V') with VLEN >= 128 +# - RISC-V Vector AES block cipher extension ('Zvkned') + +use strict; +use warnings; + +use FindBin qw($Bin); +use lib "$Bin"; +use lib "$Bin/../../perlasm"; +use riscv; + +# $output is the last argument if it looks like a file (it has an extension) +# $flavour is the first argument if it doesn't look like a file +my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef; +my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef; + +$output and open STDOUT,">$output"; + +my $code=<<___; +.text +___ + +my ($V0, $V1, $V2, $V3, $V4, $V5, $V6, $V7, + $V8, $V9, $V10, $V11, $V12, $V13, $V14, $V15, + $V16, $V17, $V18, $V19, $V20, $V21, $V22, $V23, + $V24, $V25, $V26, $V27, $V28, $V29, $V30, $V31, +) = map("v$_",(0..31)); + +{ +################################################################################ +# void rv64i_zvkned_encrypt(const unsigned char *in, unsigned char *out, +# const AES_KEY *key); +my ($INP, $OUTP, $KEYP) = ("a0", "a1", "a2"); +my ($T0, $T1, $ROUNDS, $T6) = ("a3", "a4", "t5", "t6"); + +$code .= <<___; +.p2align 3 +.globl rv64i_zvkned_encrypt +.type rv64i_zvkned_encrypt,\@function +rv64i_zvkned_encrypt: + # Load number of rounds + lwu $ROUNDS, 240($KEYP) + + # Get proper routine for key size + li $T6, 14 + beq $ROUNDS, $T6, L_enc_256 + li $T6, 10 + beq $ROUNDS, $T6, L_enc_128 + li $T6, 12 + beq $ROUNDS, $T6, L_enc_192 + + j L_fail_m2 +.size rv64i_zvkned_encrypt,.-rv64i_zvkned_encrypt +___ + +$code .= <<___; +.p2align 3 +L_enc_128: + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + + @{[vle32_v $V1, ($INP)]} + + @{[vle32_v $V10, ($KEYP)]} + @{[vaesz_vs $V1, $V10]} # with round key w[ 0, 3] + addi $KEYP, $KEYP, 16 + @{[vle32_v $V11, ($KEYP)]} + @{[vaesem_vs $V1, $V11]} # with round key w[ 4, 7] + addi $KEYP, $KEYP, 16 + @{[vle32_v $V12, ($KEYP)]} + @{[vaesem_vs $V1, $V12]} # with round key w[ 8,11] + addi $KEYP, $KEYP, 16 + @{[vle32_v $V13, ($KEYP)]} + @{[vaesem_vs $V1, $V13]} # with round key w[12,15] + addi $KEYP, $KEYP, 16 + @{[vle32_v $V14, ($KEYP)]} + @{[vaesem_vs $V1, $V14]} # with round key w[16,19] + addi $KEYP, $KEYP, 16 + @{[vle32_v $V15, ($KEYP)]} + @{[vaesem_vs $V1, $V15]} # with round key w[20,23] + addi $KEYP, $KEYP, 16 + @{[vle32_v $V16, ($KEYP)]} + @{[vaesem_vs $V1, $V16]} # with round key w[24,27] + addi $KEYP, $KEYP, 16 + @{[vle32_v $V17, ($KEYP)]} + @{[vaesem_vs $V1, $V17]} # with round key w[28,31] + addi $KEYP, $KEYP, 16 + @{[vle32_v $V18, ($KEYP)]} + @{[vaesem_vs $V1, $V18]} # with round key w[32,35] + addi $KEYP, $KEYP, 16 + @{[vle32_v $V19, ($KEYP)]} + @{[vaesem_vs $V1, $V19]} # with round key w[36,39] + addi $KEYP, $KEYP, 16 + @{[vle32_v $V20, ($KEYP)]} + @{[vaesef_vs $V1, $V20]} # with round key w[40,43] + + @{[vse32_v $V1, ($OUTP)]} + + ret +.size L_enc_128,.-L_enc_128 +___ + +$code .= <<___; +.p2align 3 +L_enc_192: + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + + @{[vle32_v $V1, ($INP)]} + + @{[vle32_v $V10, ($KEYP)]} + @{[vaesz_vs $V1, $V10]} # with round key w[ 0, 3] + addi $KEYP, $KEYP, 16 + @{[vle32_v $V11, ($KEYP)]} + @{[vaesem_vs $V1, $V11]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V12, ($KEYP)]} + @{[vaesem_vs $V1, $V12]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V13, ($KEYP)]} + @{[vaesem_vs $V1, $V13]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V14, ($KEYP)]} + @{[vaesem_vs $V1, $V14]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V15, ($KEYP)]} + @{[vaesem_vs $V1, $V15]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V16, ($KEYP)]} + @{[vaesem_vs $V1, $V16]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V17, ($KEYP)]} + @{[vaesem_vs $V1, $V17]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V18, ($KEYP)]} + @{[vaesem_vs $V1, $V18]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V19, ($KEYP)]} + @{[vaesem_vs $V1, $V19]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V20, ($KEYP)]} + @{[vaesem_vs $V1, $V20]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V21, ($KEYP)]} + @{[vaesem_vs $V1, $V21]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V22, ($KEYP)]} + @{[vaesef_vs $V1, $V22]} + + @{[vse32_v $V1, ($OUTP)]} + ret +.size L_enc_192,.-L_enc_192 +___ + +$code .= <<___; +.p2align 3 +L_enc_256: + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + + @{[vle32_v $V1, ($INP)]} + + @{[vle32_v $V10, ($KEYP)]} + @{[vaesz_vs $V1, $V10]} # with round key w[ 0, 3] + addi $KEYP, $KEYP, 16 + @{[vle32_v $V11, ($KEYP)]} + @{[vaesem_vs $V1, $V11]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V12, ($KEYP)]} + @{[vaesem_vs $V1, $V12]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V13, ($KEYP)]} + @{[vaesem_vs $V1, $V13]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V14, ($KEYP)]} + @{[vaesem_vs $V1, $V14]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V15, ($KEYP)]} + @{[vaesem_vs $V1, $V15]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V16, ($KEYP)]} + @{[vaesem_vs $V1, $V16]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V17, ($KEYP)]} + @{[vaesem_vs $V1, $V17]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V18, ($KEYP)]} + @{[vaesem_vs $V1, $V18]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V19, ($KEYP)]} + @{[vaesem_vs $V1, $V19]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V20, ($KEYP)]} + @{[vaesem_vs $V1, $V20]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V21, ($KEYP)]} + @{[vaesem_vs $V1, $V21]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V22, ($KEYP)]} + @{[vaesem_vs $V1, $V22]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V23, ($KEYP)]} + @{[vaesem_vs $V1, $V23]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V24, ($KEYP)]} + @{[vaesef_vs $V1, $V24]} + + @{[vse32_v $V1, ($OUTP)]} + ret +.size L_enc_256,.-L_enc_256 +___ + +################################################################################ +# void rv64i_zvkned_decrypt(const unsigned char *in, unsigned char *out, +# const AES_KEY *key); +$code .= <<___; +.p2align 3 +.globl rv64i_zvkned_decrypt +.type rv64i_zvkned_decrypt,\@function +rv64i_zvkned_decrypt: + # Load number of rounds + lwu $ROUNDS, 240($KEYP) + + # Get proper routine for key size + li $T6, 14 + beq $ROUNDS, $T6, L_dec_256 + li $T6, 10 + beq $ROUNDS, $T6, L_dec_128 + li $T6, 12 + beq $ROUNDS, $T6, L_dec_192 + + j L_fail_m2 +.size rv64i_zvkned_decrypt,.-rv64i_zvkned_decrypt +___ + +$code .= <<___; +.p2align 3 +L_dec_128: + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + + @{[vle32_v $V1, ($INP)]} + + addi $KEYP, $KEYP, 160 + @{[vle32_v $V20, ($KEYP)]} + @{[vaesz_vs $V1, $V20]} # with round key w[40,43] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V19, ($KEYP)]} + @{[vaesdm_vs $V1, $V19]} # with round key w[36,39] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V18, ($KEYP)]} + @{[vaesdm_vs $V1, $V18]} # with round key w[32,35] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V17, ($KEYP)]} + @{[vaesdm_vs $V1, $V17]} # with round key w[28,31] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V16, ($KEYP)]} + @{[vaesdm_vs $V1, $V16]} # with round key w[24,27] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V15, ($KEYP)]} + @{[vaesdm_vs $V1, $V15]} # with round key w[20,23] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V14, ($KEYP)]} + @{[vaesdm_vs $V1, $V14]} # with round key w[16,19] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V13, ($KEYP)]} + @{[vaesdm_vs $V1, $V13]} # with round key w[12,15] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V12, ($KEYP)]} + @{[vaesdm_vs $V1, $V12]} # with round key w[ 8,11] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V11, ($KEYP)]} + @{[vaesdm_vs $V1, $V11]} # with round key w[ 4, 7] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V10, ($KEYP)]} + @{[vaesdf_vs $V1, $V10]} # with round key w[ 0, 3] + + @{[vse32_v $V1, ($OUTP)]} + + ret +.size L_dec_128,.-L_dec_128 +___ + +$code .= <<___; +.p2align 3 +L_dec_192: + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + + @{[vle32_v $V1, ($INP)]} + + addi $KEYP, $KEYP, 192 + @{[vle32_v $V22, ($KEYP)]} + @{[vaesz_vs $V1, $V22]} # with round key w[48,51] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V21, ($KEYP)]} + @{[vaesdm_vs $V1, $V21]} # with round key w[44,47] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V20, ($KEYP)]} + @{[vaesdm_vs $V1, $V20]} # with round key w[40,43] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V19, ($KEYP)]} + @{[vaesdm_vs $V1, $V19]} # with round key w[36,39] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V18, ($KEYP)]} + @{[vaesdm_vs $V1, $V18]} # with round key w[32,35] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V17, ($KEYP)]} + @{[vaesdm_vs $V1, $V17]} # with round key w[28,31] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V16, ($KEYP)]} + @{[vaesdm_vs $V1, $V16]} # with round key w[24,27] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V15, ($KEYP)]} + @{[vaesdm_vs $V1, $V15]} # with round key w[20,23] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V14, ($KEYP)]} + @{[vaesdm_vs $V1, $V14]} # with round key w[16,19] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V13, ($KEYP)]} + @{[vaesdm_vs $V1, $V13]} # with round key w[12,15] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V12, ($KEYP)]} + @{[vaesdm_vs $V1, $V12]} # with round key w[ 8,11] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V11, ($KEYP)]} + @{[vaesdm_vs $V1, $V11]} # with round key w[ 4, 7] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V10, ($KEYP)]} + @{[vaesdf_vs $V1, $V10]} # with round key w[ 0, 3] + + @{[vse32_v $V1, ($OUTP)]} + + ret +.size L_dec_192,.-L_dec_192 +___ + +$code .= <<___; +.p2align 3 +L_dec_256: + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + + @{[vle32_v $V1, ($INP)]} + + addi $KEYP, $KEYP, 224 + @{[vle32_v $V24, ($KEYP)]} + @{[vaesz_vs $V1, $V24]} # with round key w[56,59] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V23, ($KEYP)]} + @{[vaesdm_vs $V1, $V23]} # with round key w[52,55] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V22, ($KEYP)]} + @{[vaesdm_vs $V1, $V22]} # with round key w[48,51] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V21, ($KEYP)]} + @{[vaesdm_vs $V1, $V21]} # with round key w[44,47] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V20, ($KEYP)]} + @{[vaesdm_vs $V1, $V20]} # with round key w[40,43] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V19, ($KEYP)]} + @{[vaesdm_vs $V1, $V19]} # with round key w[36,39] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V18, ($KEYP)]} + @{[vaesdm_vs $V1, $V18]} # with round key w[32,35] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V17, ($KEYP)]} + @{[vaesdm_vs $V1, $V17]} # with round key w[28,31] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V16, ($KEYP)]} + @{[vaesdm_vs $V1, $V16]} # with round key w[24,27] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V15, ($KEYP)]} + @{[vaesdm_vs $V1, $V15]} # with round key w[20,23] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V14, ($KEYP)]} + @{[vaesdm_vs $V1, $V14]} # with round key w[16,19] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V13, ($KEYP)]} + @{[vaesdm_vs $V1, $V13]} # with round key w[12,15] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V12, ($KEYP)]} + @{[vaesdm_vs $V1, $V12]} # with round key w[ 8,11] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V11, ($KEYP)]} + @{[vaesdm_vs $V1, $V11]} # with round key w[ 4, 7] + addi $KEYP, $KEYP, -16 + @{[vle32_v $V10, ($KEYP)]} + @{[vaesdf_vs $V1, $V10]} # with round key w[ 0, 3] + + @{[vse32_v $V1, ($OUTP)]} + + ret +.size L_dec_256,.-L_dec_256 +___ +} + +$code .= <<___; +L_fail_m1: + li a0, -1 + ret +.size L_fail_m1,.-L_fail_m1 + +L_fail_m2: + li a0, -2 + ret +.size L_fail_m2,.-L_fail_m2 + +L_end: + ret +.size L_end,.-L_end +___ + +print $code; + +close STDOUT or die "error closing STDOUT: $!"; From patchwork Wed Oct 25 18:36:37 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jerry Shih X-Patchwork-Id: 13436519 X-Patchwork-Delegate: herbert@gondor.apana.org.au Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D40D0C0032E for ; Wed, 25 Oct 2023 18:37:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234062AbjJYShi (ORCPT ); Wed, 25 Oct 2023 14:37:38 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59500 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234952AbjJYShV (ORCPT ); Wed, 25 Oct 2023 14:37:21 -0400 Received: from mail-pj1-x1031.google.com (mail-pj1-x1031.google.com [IPv6:2607:f8b0:4864:20::1031]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 637A31BE for ; Wed, 25 Oct 2023 11:37:16 -0700 (PDT) Received: by mail-pj1-x1031.google.com with SMTP id 98e67ed59e1d1-27e0c1222d1so17454a91.0 for ; Wed, 25 Oct 2023 11:37:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sifive.com; s=google; t=1698259036; x=1698863836; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=KWDZyqBJhOVpYqYhwbudR4lRinTiZ1fIB/oQ8fHi3tg=; b=mRDDTlxDfVZ4q4JsFyDbCTEDG/QLsx/GfO41ZwHBPW87ceShqpqIoBk8el3SDT0c7b Z70hpl8okKOoMgyTscvmy8Q2WsMEW9ovIwGyJjUFn6ag3nIqnu/o6tFYmP6v8vJCWdj1 nyXYOBcGgGXk5i4qXzrjEdYxAee8jRMTwG0Lvx1BYPeEIEgk9gD4iTHYCdvGQhHqQR9A KrLLnbQbbMQ0XV8fY0jUFh7qHRF+kW4DkwfoIB7zHlxUxlWuK51Y1eRpyo6M04CjLvLA Fr5VjXUd3tfBiwZOzMW1RS58SCfnCi75cmlZi21VvipOEDTppJLGTklc34sFsGZhg5ME wAgQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1698259036; x=1698863836; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=KWDZyqBJhOVpYqYhwbudR4lRinTiZ1fIB/oQ8fHi3tg=; b=hU0rLvDRVaqeeSylsxS8Laz0b77D0AqdLhU/52ndgSUmRPKNGhE/Yo8yzhgFT7AzIR T5hpNNM72a8Fg437Ve0sNcGrBt8uOj2nRcoCgTj7ndT3ktK1A+0RLJdc1nOQuZVpA27P rOF6+M2n46WUXO8fXpAWX/VNP2fZc3f2AeByb9vU7mJbVhfrcDKDc8zxwkFi1YVKOid8 QEjd59PxaQ83alAn/D9THsgJ4ZajVK59V7Li64he9XNPYvjCMgwRE/57auJYSSepeT99 PIBRMSm2Jd4+M8uChxwbzk4D7q0n9vREuo/1UaaIcGFtl9ZY/p8VrAbiGUKMJpUNt2uR JBxA== X-Gm-Message-State: AOJu0YyFeSEnypdRV5y2vi7z/4XNPhiJLtwAZjnKCVO0PjZD1M+KS4DI VdH/xhGgyPMObUqoOXBgL4/FQA== X-Google-Smtp-Source: AGHT+IFOJxOczYNy9csjaf/jWys3suT6WH12pXc8zs4hoIeoK9YHwvDMemywNi2gacyq+/6UAnDc7Q== X-Received: by 2002:a17:90b:4a8f:b0:27d:b885:17f2 with SMTP id lp15-20020a17090b4a8f00b0027db88517f2mr12749747pjb.30.1698259035682; Wed, 25 Oct 2023 11:37:15 -0700 (PDT) Received: from localhost.localdomain ([49.216.222.119]) by smtp.gmail.com with ESMTPSA id g3-20020a17090adb0300b00278f1512dd9sm212367pjv.32.2023.10.25.11.37.12 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Wed, 25 Oct 2023 11:37:15 -0700 (PDT) From: Jerry Shih To: paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu, herbert@gondor.apana.org.au, davem@davemloft.net Cc: andy.chiu@sifive.com, greentime.hu@sifive.com, conor.dooley@microchip.com, guoren@kernel.org, bjorn@rivosinc.com, heiko@sntech.de, ebiggers@kernel.org, ardb@kernel.org, phoebe.chen@sifive.com, hongrong.hsu@sifive.com, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, linux-crypto@vger.kernel.org Subject: [PATCH 05/12] crypto: scatterwalk - Add scatterwalk_next() to get the next scatterlist in scatter_walk Date: Thu, 26 Oct 2023 02:36:37 +0800 Message-Id: <20231025183644.8735-6-jerry.shih@sifive.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: <20231025183644.8735-1-jerry.shih@sifive.com> References: <20231025183644.8735-1-jerry.shih@sifive.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org In some situations, we might split the `skcipher_request` into several segments. When we try to move to next segment, we might use `scatterwalk_ffwd()` to get the corresponding `scatterlist` iterating from the head of `scatterlist`. This helper function could just gather the information in `skcipher_walk` and move to next `scatterlist` directly. Signed-off-by: Jerry Shih --- include/crypto/scatterwalk.h | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/include/crypto/scatterwalk.h b/include/crypto/scatterwalk.h index 32fc4473175b..b1a90afe695d 100644 --- a/include/crypto/scatterwalk.h +++ b/include/crypto/scatterwalk.h @@ -98,7 +98,12 @@ void scatterwalk_map_and_copy(void *buf, struct scatterlist *sg, unsigned int start, unsigned int nbytes, int out); struct scatterlist *scatterwalk_ffwd(struct scatterlist dst[2], - struct scatterlist *src, - unsigned int len); + struct scatterlist *src, unsigned int len); + +static inline struct scatterlist *scatterwalk_next(struct scatterlist dst[2], + struct scatter_walk *src) +{ + return scatterwalk_ffwd(dst, src->sg, src->offset - src->sg->offset); +} #endif /* _CRYPTO_SCATTERWALK_H */ From patchwork Wed Oct 25 18:36:38 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jerry Shih X-Patchwork-Id: 13436521 X-Patchwork-Delegate: herbert@gondor.apana.org.au Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5D747C0032E for ; Wed, 25 Oct 2023 18:38:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235040AbjJYSiA (ORCPT ); Wed, 25 Oct 2023 14:38:00 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42892 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234892AbjJYShb (ORCPT ); Wed, 25 Oct 2023 14:37:31 -0400 Received: from mail-pg1-x536.google.com (mail-pg1-x536.google.com [IPv6:2607:f8b0:4864:20::536]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8985A199 for ; Wed, 25 Oct 2023 11:37:21 -0700 (PDT) Received: by mail-pg1-x536.google.com with SMTP id 41be03b00d2f7-5b5354da665so69403a12.2 for ; Wed, 25 Oct 2023 11:37:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sifive.com; s=google; t=1698259041; x=1698863841; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=NUBPCI3nv6GuIuqyxLimCC7AgB3vZ5Y1wEWhzJ4254k=; b=aiDpz5gxH4TYos5dJjP4bOrT4mqMG3fe7C/AEgw1p0389yS+6GCINZ2ExOg8NvvU93 yfxG3OcE7Mj9lbdPA/iPh19yU2qoIDn0MUjVu+pVEtvIdAtXMA3CweVeMp+0yOx5D0+V GudTyozZRvDm12eufcJbWV+QMUlc+614s8RBlqus+LwjVoddX2wm+mttCVjbHuNSWS7B TrCC8x5D5HESHJUlpJRwvZIsm5spO/9FSH5mrExXIvWlKE7DrJyJFgzcqvzxXPMBCjZT c+DiB6c6PDIRS5eHVcUrMGmO61LzgSwBQSnztuxiWPjSczHbmJhXlcTM5akTrHQTI6cM 7dRg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1698259041; x=1698863841; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=NUBPCI3nv6GuIuqyxLimCC7AgB3vZ5Y1wEWhzJ4254k=; b=VOpZyqiYTshBMrrkxeVjFdvbjZbWPpxravjS3Mas7eiAlN7/SNTQz4oZEinGoChmG1 cw8x3f2v+UVQFtvXHk2uHzzD2z7cHNdRD0bGCv8uhJAs+e4ykm/6hvRj08y7zkbIWhxE mFpXlQhRSMY8pie/nbGVaziQQwuICCscLF5UWWRu8olp1nkLNMOifwxlt/h240pk3Czb aG/bHAdiXl8hYFPURhkGEkXA+gyjfCc1Oy7nQre7+ZbMMF7tyUZmBYAWFX9JR63psD2G JIDaT2b3R1ZWKghwoXDoauXfToASxQCnkGrwqLwKVvWNPs5pxaGOXzt7ROVvvNU5WEOx t6aA== X-Gm-Message-State: AOJu0Yz64moBsBSVy/4RjzCFRvXnOq2J4ZorG01tVz/IwenmkPssmK+H 0+WXvQrCXiirTwwxMOdmVMJSHw== X-Google-Smtp-Source: AGHT+IFQuAa2SyMzjPJ+9WrEyPCylzasoBrW5l/9jN+1MkD2TxmPOKhkX62uLo9IKNNKkyUjjI6EAg== X-Received: by 2002:a17:90b:f85:b0:27d:75f2:a3ee with SMTP id ft5-20020a17090b0f8500b0027d75f2a3eemr13897309pjb.10.1698259040058; Wed, 25 Oct 2023 11:37:20 -0700 (PDT) Received: from localhost.localdomain ([49.216.222.119]) by smtp.gmail.com with ESMTPSA id g3-20020a17090adb0300b00278f1512dd9sm212367pjv.32.2023.10.25.11.37.16 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Wed, 25 Oct 2023 11:37:19 -0700 (PDT) From: Jerry Shih To: paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu, herbert@gondor.apana.org.au, davem@davemloft.net Cc: andy.chiu@sifive.com, greentime.hu@sifive.com, conor.dooley@microchip.com, guoren@kernel.org, bjorn@rivosinc.com, heiko@sntech.de, ebiggers@kernel.org, ardb@kernel.org, phoebe.chen@sifive.com, hongrong.hsu@sifive.com, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, linux-crypto@vger.kernel.org Subject: [PATCH 06/12] RISC-V: crypto: add accelerated AES-CBC/CTR/ECB/XTS implementations Date: Thu, 26 Oct 2023 02:36:38 +0800 Message-Id: <20231025183644.8735-7-jerry.shih@sifive.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: <20231025183644.8735-1-jerry.shih@sifive.com> References: <20231025183644.8735-1-jerry.shih@sifive.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org Port the vector-crypto accelerated CBC, CTR, ECB and XTS block modes for AES cipher from OpenSSL(openssl/openssl#21923). In addition, support XTS-AES-192 mode which is not existed in OpenSSL. Co-developed-by: Phoebe Chen Signed-off-by: Phoebe Chen Signed-off-by: Jerry Shih --- arch/riscv/crypto/Kconfig | 21 + arch/riscv/crypto/Makefile | 11 + .../crypto/aes-riscv64-block-mode-glue.c | 486 +++++++++ .../crypto/aes-riscv64-zvbb-zvkg-zvkned.pl | 944 ++++++++++++++++++ arch/riscv/crypto/aes-riscv64-zvkb-zvkned.pl | 416 ++++++++ arch/riscv/crypto/aes-riscv64-zvkned.pl | 927 +++++++++++++++-- 6 files changed, 2715 insertions(+), 90 deletions(-) create mode 100644 arch/riscv/crypto/aes-riscv64-block-mode-glue.c create mode 100644 arch/riscv/crypto/aes-riscv64-zvbb-zvkg-zvkned.pl create mode 100644 arch/riscv/crypto/aes-riscv64-zvkb-zvkned.pl diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig index 500938317e71..dfa9d0146d26 100644 --- a/arch/riscv/crypto/Kconfig +++ b/arch/riscv/crypto/Kconfig @@ -14,4 +14,25 @@ config CRYPTO_AES_RISCV64 Architecture: riscv64 using: - Zvkned vector crypto extension +config CRYPTO_AES_BLOCK_RISCV64 + default y if RISCV_ISA_V + tristate "Ciphers: AES, modes: ECB/CBC/CTR/XTS" + depends on 64BIT && RISCV_ISA_V + select CRYPTO_AES_RISCV64 + select CRYPTO_SKCIPHER + help + Length-preserving ciphers: AES cipher algorithms (FIPS-197) + with block cipher modes: + - ECB (Electronic Codebook) mode (NIST SP 800-38A) + - CBC (Cipher Block Chaining) mode (NIST SP 800-38A) + - CTR (Counter) mode (NIST SP 800-38A) + - XTS (XOR Encrypt XOR Tweakable Block Cipher with Ciphertext + Stealing) mode (NIST SP 800-38E and IEEE 1619) + + Architecture: riscv64 using: + - Zvbb vector extension (XTS) + - Zvkb vector crypto extension (CTR/XTS) + - Zvkg vector crypto extension (XTS) + - Zvkned vector crypto extension + endmenu diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile index 90ca91d8df26..42a4e8ec79cf 100644 --- a/arch/riscv/crypto/Makefile +++ b/arch/riscv/crypto/Makefile @@ -6,10 +6,21 @@ obj-$(CONFIG_CRYPTO_AES_RISCV64) += aes-riscv64.o aes-riscv64-y := aes-riscv64-glue.o aes-riscv64-zvkned.o +obj-$(CONFIG_CRYPTO_AES_BLOCK_RISCV64) += aes-block-riscv64.o +aes-block-riscv64-y := aes-riscv64-block-mode-glue.o aes-riscv64-zvbb-zvkg-zvkned.o aes-riscv64-zvkb-zvkned.o + quiet_cmd_perlasm = PERLASM $@ cmd_perlasm = $(PERL) $(<) void $(@) $(obj)/aes-riscv64-zvkned.S: $(src)/aes-riscv64-zvkned.pl $(call cmd,perlasm) +$(obj)/aes-riscv64-zvbb-zvkg-zvkned.S: $(src)/aes-riscv64-zvbb-zvkg-zvkned.pl + $(call cmd,perlasm) + +$(obj)/aes-riscv64-zvkb-zvkned.S: $(src)/aes-riscv64-zvkb-zvkned.pl + $(call cmd,perlasm) + clean-files += aes-riscv64-zvkned.S +clean-files += aes-riscv64-zvbb-zvkg-zvkned.S +clean-files += aes-riscv64-zvkb-zvkned.S diff --git a/arch/riscv/crypto/aes-riscv64-block-mode-glue.c b/arch/riscv/crypto/aes-riscv64-block-mode-glue.c new file mode 100644 index 000000000000..c33e902eff5e --- /dev/null +++ b/arch/riscv/crypto/aes-riscv64-block-mode-glue.c @@ -0,0 +1,486 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Port of the OpenSSL AES block mode implementations for RISC-V + * + * Copyright (C) 2023 SiFive, Inc. + * Author: Jerry Shih + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "aes-riscv64-glue.h" + +#define AES_BLOCK_VALID_SIZE_MASK (~(AES_BLOCK_SIZE - 1)) +#define AES_BLOCK_REMAINING_SIZE_MASK (AES_BLOCK_SIZE - 1) + +struct riscv64_aes_xts_ctx { + struct riscv64_aes_ctx ctx1; + struct riscv64_aes_ctx ctx2; +}; + +/* aes cbc block mode using zvkned vector crypto extension */ +void rv64i_zvkned_cbc_encrypt(const u8 *in, u8 *out, size_t length, + const struct aes_key *key, u8 *ivec); +void rv64i_zvkned_cbc_decrypt(const u8 *in, u8 *out, size_t length, + const struct aes_key *key, u8 *ivec); +/* aes ecb block mode using zvkned vector crypto extension */ +void rv64i_zvkned_ecb_encrypt(const u8 *in, u8 *out, size_t length, + const struct aes_key *key); +void rv64i_zvkned_ecb_decrypt(const u8 *in, u8 *out, size_t length, + const struct aes_key *key); + +/* aes ctr block mode using zvkb and zvkned vector crypto extension */ +/* This func operates on 32-bit counter. Caller has to handle the overflow. */ +void rv64i_zvkb_zvkned_ctr32_encrypt_blocks(const u8 *in, u8 *out, + size_t length, + const struct aes_key *key, + u8 *ivec); + +/* aes xts block mode using zvbb, zvkg and zvkned vector crypto extension */ +void rv64i_zvbb_zvkg_zvkned_aes_xts_encrypt(const u8 *in, u8 *out, + size_t length, + const struct aes_key *key, u8 *iv, + int update_iv); +void rv64i_zvbb_zvkg_zvkned_aes_xts_decrypt(const u8 *in, u8 *out, + size_t length, + const struct aes_key *key, u8 *iv, + int update_iv); + +typedef void (*aes_xts_func)(const u8 *in, u8 *out, size_t length, + const struct aes_key *key, u8 *iv, int update_iv); + +/* ecb */ +static int aes_setkey(struct crypto_skcipher *tfm, const u8 *in_key, + unsigned int key_len) +{ + struct riscv64_aes_ctx *ctx = crypto_skcipher_ctx(tfm); + + return riscv64_aes_setkey(ctx, in_key, key_len); +} + +static int ecb_encrypt(struct skcipher_request *req) +{ + struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); + const struct riscv64_aes_ctx *ctx = crypto_skcipher_ctx(tfm); + struct skcipher_walk walk; + unsigned int nbytes; + int err; + + /* If we have error here, the `nbytes` will be zero. */ + err = skcipher_walk_virt(&walk, req, false); + while ((nbytes = walk.nbytes)) { + kernel_vector_begin(); + rv64i_zvkned_ecb_encrypt(walk.src.virt.addr, walk.dst.virt.addr, + nbytes & AES_BLOCK_VALID_SIZE_MASK, + &ctx->key); + kernel_vector_end(); + err = skcipher_walk_done( + &walk, nbytes & AES_BLOCK_REMAINING_SIZE_MASK); + } + + return err; +} + +static int ecb_decrypt(struct skcipher_request *req) +{ + struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); + const struct riscv64_aes_ctx *ctx = crypto_skcipher_ctx(tfm); + struct skcipher_walk walk; + unsigned int nbytes; + int err; + + err = skcipher_walk_virt(&walk, req, false); + while ((nbytes = walk.nbytes)) { + kernel_vector_begin(); + rv64i_zvkned_ecb_decrypt(walk.src.virt.addr, walk.dst.virt.addr, + nbytes & AES_BLOCK_VALID_SIZE_MASK, + &ctx->key); + kernel_vector_end(); + err = skcipher_walk_done( + &walk, nbytes & AES_BLOCK_REMAINING_SIZE_MASK); + } + + return err; +} + +/* cbc */ +static int cbc_encrypt(struct skcipher_request *req) +{ + struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); + const struct riscv64_aes_ctx *ctx = crypto_skcipher_ctx(tfm); + struct skcipher_walk walk; + unsigned int nbytes; + int err; + + err = skcipher_walk_virt(&walk, req, false); + while ((nbytes = walk.nbytes)) { + kernel_vector_begin(); + rv64i_zvkned_cbc_encrypt(walk.src.virt.addr, walk.dst.virt.addr, + nbytes & AES_BLOCK_VALID_SIZE_MASK, + &ctx->key, walk.iv); + kernel_vector_end(); + err = skcipher_walk_done( + &walk, nbytes & AES_BLOCK_REMAINING_SIZE_MASK); + } + + return err; +} + +static int cbc_decrypt(struct skcipher_request *req) +{ + struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); + const struct riscv64_aes_ctx *ctx = crypto_skcipher_ctx(tfm); + struct skcipher_walk walk; + unsigned int nbytes; + int err; + + err = skcipher_walk_virt(&walk, req, false); + while ((nbytes = walk.nbytes)) { + kernel_vector_begin(); + rv64i_zvkned_cbc_decrypt(walk.src.virt.addr, walk.dst.virt.addr, + nbytes & AES_BLOCK_VALID_SIZE_MASK, + &ctx->key, walk.iv); + kernel_vector_end(); + err = skcipher_walk_done( + &walk, nbytes & AES_BLOCK_REMAINING_SIZE_MASK); + } + + return err; +} + +/* ctr */ +static int ctr_encrypt(struct skcipher_request *req) +{ + struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); + const struct riscv64_aes_ctx *ctx = crypto_skcipher_ctx(tfm); + struct skcipher_walk walk; + unsigned int ctr32; + unsigned int nbytes; + unsigned int blocks; + unsigned int current_blocks; + unsigned int current_length; + int err; + + /* the ctr iv uses big endian */ + ctr32 = get_unaligned_be32(req->iv + 12); + err = skcipher_walk_virt(&walk, req, false); + while ((nbytes = walk.nbytes)) { + if (nbytes != walk.total) { + nbytes &= AES_BLOCK_VALID_SIZE_MASK; + blocks = nbytes / AES_BLOCK_SIZE; + } else { + /* This is the last walk. We should handle the tail data. */ + blocks = (nbytes + (AES_BLOCK_SIZE - 1)) / + AES_BLOCK_SIZE; + } + ctr32 += blocks; + + kernel_vector_begin(); + /* + * The `if` block below detects the overflow, which is then handled by + * limiting the amount of blocks to the exact overflow point. + */ + if (ctr32 >= blocks) { + rv64i_zvkb_zvkned_ctr32_encrypt_blocks( + walk.src.virt.addr, walk.dst.virt.addr, nbytes, + &ctx->key, req->iv); + } else { + /* use 2 ctr32 function calls for overflow case */ + current_blocks = blocks - ctr32; + current_length = + min(nbytes, current_blocks * AES_BLOCK_SIZE); + rv64i_zvkb_zvkned_ctr32_encrypt_blocks( + walk.src.virt.addr, walk.dst.virt.addr, + current_length, &ctx->key, req->iv); + crypto_inc(req->iv, 12); + + if (ctr32) { + rv64i_zvkb_zvkned_ctr32_encrypt_blocks( + walk.src.virt.addr + + current_blocks * AES_BLOCK_SIZE, + walk.dst.virt.addr + + current_blocks * AES_BLOCK_SIZE, + nbytes - current_length, &ctx->key, + req->iv); + } + } + kernel_vector_end(); + + err = skcipher_walk_done(&walk, walk.nbytes - nbytes); + } + + return err; +} + +/* xts */ +static int xts_setkey(struct crypto_skcipher *tfm, const u8 *in_key, + unsigned int key_len) +{ + struct riscv64_aes_xts_ctx *ctx = crypto_skcipher_ctx(tfm); + unsigned int xts_single_key_len = key_len / 2; + int ret; + + ret = xts_verify_key(tfm, in_key, key_len); + if (ret) + return ret; + ret = riscv64_aes_setkey(&ctx->ctx1, in_key, xts_single_key_len); + if (ret) + return ret; + return riscv64_aes_setkey(&ctx->ctx2, in_key + xts_single_key_len, + xts_single_key_len); +} + +static int xts_crypt(struct skcipher_request *req, aes_xts_func func) +{ + struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); + const struct riscv64_aes_xts_ctx *ctx = crypto_skcipher_ctx(tfm); + struct skcipher_request sub_req; + struct scatterlist sg_src[2], sg_dst[2]; + struct scatterlist *src, *dst; + struct skcipher_walk walk; + unsigned int walk_size = crypto_skcipher_walksize(tfm); + unsigned int tail_bytes; + unsigned int head_bytes; + unsigned int nbytes; + unsigned int update_iv = 1; + int err; + + /* xts input size should be bigger than AES_BLOCK_SIZE */ + if (req->cryptlen < AES_BLOCK_SIZE) + return -EINVAL; + + /* + * The tail size should be small than walk_size. Thus, we could make sure the + * walk size for tail elements could be bigger than AES_BLOCK_SIZE. + */ + if (req->cryptlen <= walk_size) { + tail_bytes = req->cryptlen; + head_bytes = 0; + } else { + if (req->cryptlen & AES_BLOCK_REMAINING_SIZE_MASK) { + tail_bytes = req->cryptlen & + AES_BLOCK_REMAINING_SIZE_MASK; + tail_bytes = walk_size + tail_bytes - AES_BLOCK_SIZE; + head_bytes = req->cryptlen - tail_bytes; + } else { + tail_bytes = 0; + head_bytes = req->cryptlen; + } + } + + riscv64_aes_encrypt_zvkned(&ctx->ctx2, req->iv, req->iv); + + if (head_bytes && tail_bytes) { + skcipher_request_set_tfm(&sub_req, tfm); + skcipher_request_set_callback( + &sub_req, skcipher_request_flags(req), NULL, NULL); + skcipher_request_set_crypt(&sub_req, req->src, req->dst, + head_bytes, req->iv); + req = &sub_req; + } + + if (head_bytes) { + err = skcipher_walk_virt(&walk, req, false); + while ((nbytes = walk.nbytes)) { + if (nbytes == walk.total) + update_iv = (tail_bytes > 0); + + nbytes &= AES_BLOCK_VALID_SIZE_MASK; + kernel_vector_begin(); + func(walk.src.virt.addr, walk.dst.virt.addr, nbytes, + &ctx->ctx1.key, req->iv, update_iv); + kernel_vector_end(); + + err = skcipher_walk_done(&walk, walk.nbytes - nbytes); + } + if (err || !tail_bytes) + return err; + + dst = src = scatterwalk_next(sg_src, &walk.in); + if (req->dst != req->src) + dst = scatterwalk_next(sg_dst, &walk.out); + skcipher_request_set_crypt(req, src, dst, tail_bytes, req->iv); + } + + /* tail */ + err = skcipher_walk_virt(&walk, req, false); + if (err) + return err; + if (walk.nbytes != tail_bytes) + return -EINVAL; + kernel_vector_begin(); + func(walk.src.virt.addr, walk.dst.virt.addr, walk.nbytes, + &ctx->ctx1.key, req->iv, 0); + kernel_vector_end(); + + return skcipher_walk_done(&walk, 0); +} + +static int xts_encrypt(struct skcipher_request *req) +{ + return xts_crypt(req, rv64i_zvbb_zvkg_zvkned_aes_xts_encrypt); +} + +static int xts_decrypt(struct skcipher_request *req) +{ + return xts_crypt(req, rv64i_zvbb_zvkg_zvkned_aes_xts_decrypt); +} + +static struct skcipher_alg riscv64_aes_alg_zvkned[] = { { + .base = { + .cra_name = "ecb(aes)", + .cra_driver_name = "ecb-aes-riscv64-zvkned", + .cra_priority = 300, + .cra_blocksize = AES_BLOCK_SIZE, + .cra_ctxsize = sizeof(struct riscv64_aes_ctx), + .cra_module = THIS_MODULE, + }, + .min_keysize = AES_MIN_KEY_SIZE, + .max_keysize = AES_MAX_KEY_SIZE, + .walksize = AES_BLOCK_SIZE * 8, + .setkey = aes_setkey, + .encrypt = ecb_encrypt, + .decrypt = ecb_decrypt, +}, { + .base = { + .cra_name = "cbc(aes)", + .cra_driver_name = "cbc-aes-riscv64-zvkned", + .cra_priority = 300, + .cra_blocksize = AES_BLOCK_SIZE, + .cra_ctxsize = sizeof(struct riscv64_aes_ctx), + .cra_module = THIS_MODULE, + }, + .min_keysize = AES_MIN_KEY_SIZE, + .max_keysize = AES_MAX_KEY_SIZE, + .ivsize = AES_BLOCK_SIZE, + .walksize = AES_BLOCK_SIZE * 8, + .setkey = aes_setkey, + .encrypt = cbc_encrypt, + .decrypt = cbc_decrypt, +} }; + +static struct skcipher_alg riscv64_aes_alg_zvkb_zvkned[] = { { + .base = { + .cra_name = "ctr(aes)", + .cra_driver_name = "ctr-aes-riscv64-zvkb-zvkned", + .cra_priority = 300, + .cra_blocksize = 1, + .cra_ctxsize = sizeof(struct riscv64_aes_ctx), + .cra_module = THIS_MODULE, + }, + .min_keysize = AES_MIN_KEY_SIZE, + .max_keysize = AES_MAX_KEY_SIZE, + .ivsize = AES_BLOCK_SIZE, + .chunksize = AES_BLOCK_SIZE, + .walksize = AES_BLOCK_SIZE * 8, + .setkey = aes_setkey, + .encrypt = ctr_encrypt, + .decrypt = ctr_encrypt, +} }; + +static struct skcipher_alg riscv64_aes_alg_zvbb_zvkg_zvkned[] = { { + .base = { + .cra_name = "xts(aes)", + .cra_driver_name = "xts-aes-riscv64-zvbb-zvkg-zvkned", + .cra_priority = 300, + .cra_blocksize = AES_BLOCK_SIZE, + .cra_ctxsize = sizeof(struct riscv64_aes_xts_ctx), + .cra_module = THIS_MODULE, + }, + .min_keysize = AES_MIN_KEY_SIZE * 2, + .max_keysize = AES_MAX_KEY_SIZE * 2, + .ivsize = AES_BLOCK_SIZE, + .chunksize = AES_BLOCK_SIZE, + .walksize = AES_BLOCK_SIZE * 8, + .setkey = xts_setkey, + .encrypt = xts_encrypt, + .decrypt = xts_decrypt, +} }; + +static int __init riscv64_aes_block_mod_init(void) +{ + int ret = -ENODEV; + + if (riscv_isa_extension_available(NULL, ZVKNED) && + riscv_vector_vlen() >= 128) { + ret = crypto_register_skciphers( + riscv64_aes_alg_zvkned, + ARRAY_SIZE(riscv64_aes_alg_zvkned)); + if (ret) + return ret; + + if (riscv_isa_extension_available(NULL, ZVBB)) { + ret = crypto_register_skciphers( + riscv64_aes_alg_zvkb_zvkned, + ARRAY_SIZE(riscv64_aes_alg_zvkb_zvkned)); + if (ret) + goto unregister_zvkned; + + if (riscv_isa_extension_available(NULL, ZVKG)) { + ret = crypto_register_skciphers( + riscv64_aes_alg_zvbb_zvkg_zvkned, + ARRAY_SIZE( + riscv64_aes_alg_zvbb_zvkg_zvkned)); + if (ret) + goto unregister_zvkb_zvkned; + } + } + } + + return ret; + +unregister_zvkb_zvkned: + crypto_unregister_skciphers(riscv64_aes_alg_zvkb_zvkned, + ARRAY_SIZE(riscv64_aes_alg_zvkb_zvkned)); +unregister_zvkned: + crypto_unregister_skciphers(riscv64_aes_alg_zvkned, + ARRAY_SIZE(riscv64_aes_alg_zvkned)); + + return ret; +} + +static void __exit riscv64_aes_block_mod_fini(void) +{ + if (riscv_isa_extension_available(NULL, ZVKNED) && + riscv_vector_vlen() >= 128) { + crypto_unregister_skciphers(riscv64_aes_alg_zvkned, + ARRAY_SIZE(riscv64_aes_alg_zvkned)); + + if (riscv_isa_extension_available(NULL, ZVBB)) { + crypto_unregister_skciphers( + riscv64_aes_alg_zvkb_zvkned, + ARRAY_SIZE(riscv64_aes_alg_zvkb_zvkned)); + + if (riscv_isa_extension_available(NULL, ZVKG)) { + crypto_unregister_skciphers( + riscv64_aes_alg_zvbb_zvkg_zvkned, + ARRAY_SIZE( + riscv64_aes_alg_zvbb_zvkg_zvkned)); + } + } + } +} + +module_init(riscv64_aes_block_mod_init); +module_exit(riscv64_aes_block_mod_fini); + +MODULE_DESCRIPTION("AES-ECB/CBC/CTR/XTS (RISC-V accelerated)"); +MODULE_AUTHOR("Jerry Shih "); +MODULE_LICENSE("GPL"); +MODULE_ALIAS_CRYPTO("cbc(aes)"); +MODULE_ALIAS_CRYPTO("ctr(aes)"); +MODULE_ALIAS_CRYPTO("ecb(aes)"); +MODULE_ALIAS_CRYPTO("xts(aes)"); diff --git a/arch/riscv/crypto/aes-riscv64-zvbb-zvkg-zvkned.pl b/arch/riscv/crypto/aes-riscv64-zvbb-zvkg-zvkned.pl new file mode 100644 index 000000000000..0daec9c38574 --- /dev/null +++ b/arch/riscv/crypto/aes-riscv64-zvbb-zvkg-zvkned.pl @@ -0,0 +1,944 @@ +#! /usr/bin/env perl +# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause +# +# This file is dual-licensed, meaning that you can use it under your +# choice of either of the following two licenses: +# +# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved. +# +# Licensed under the Apache License 2.0 (the "License"). You can obtain +# a copy in the file LICENSE in the source distribution or at +# https://www.openssl.org/source/license.html +# +# or +# +# Copyright (c) 2023, Jerry Shih +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions +# are met: +# 1. Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# 2. Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +# - RV64I +# - RISC-V Vector ('V') with VLEN >= 128 +# - RISC-V Vector Bit-manipulation extension ('Zvbb') +# - RISC-V Vector GCM/GMAC extension ('Zvkg') +# - RISC-V Vector AES block cipher extension ('Zvkned') +# - RISC-V Zicclsm(Main memory supports misaligned loads/stores) + +use strict; +use warnings; + +use FindBin qw($Bin); +use lib "$Bin"; +use lib "$Bin/../../perlasm"; +use riscv; + +# $output is the last argument if it looks like a file (it has an extension) +# $flavour is the first argument if it doesn't look like a file +my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef; +my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef; + +$output and open STDOUT,">$output"; + +my $code=<<___; +.text +___ + +{ +################################################################################ +# void rv64i_zvbb_zvkg_zvkned_aes_xts_encrypt(const unsigned char *in, +# unsigned char *out, size_t length, +# const AES_KEY *key, +# unsigned char iv[16], +# int update_iv) +my ($INPUT, $OUTPUT, $LENGTH, $KEY, $IV, $UPDATE_IV) = ("a0", "a1", "a2", "a3", "a4", "a5"); +my ($TAIL_LENGTH) = ("a6"); +my ($VL) = ("a7"); +my ($T0, $T1, $T2, $T3) = ("t0", "t1", "t2", "t3"); +my ($STORE_LEN32) = ("t4"); +my ($LEN32) = ("t5"); +my ($V0, $V1, $V2, $V3, $V4, $V5, $V6, $V7, + $V8, $V9, $V10, $V11, $V12, $V13, $V14, $V15, + $V16, $V17, $V18, $V19, $V20, $V21, $V22, $V23, + $V24, $V25, $V26, $V27, $V28, $V29, $V30, $V31, +) = map("v$_",(0..31)); + +# load iv to v28 +sub load_xts_iv0 { + my $code=<<___; + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vle32_v $V28, $IV]} +___ + + return $code; +} + +# prepare input data(v24), iv(v28), bit-reversed-iv(v16), bit-reversed-iv-multiplier(v20) +sub init_first_round { + my $code=<<___; + # load input + @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]} + @{[vle32_v $V24, $INPUT]} + + li $T0, 5 + # We could simplify the initialization steps if we have `block<=1`. + blt $LEN32, $T0, 1f + + # Note: We use `vgmul` for GF(2^128) multiplication. The `vgmul` uses + # different order of coefficients. We should use`vbrev8` to reverse the + # data when we use `vgmul`. + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vbrev8_v $V0, $V28]} + @{[vsetvli "zero", $LEN32, "e32", "m4", "ta", "ma"]} + @{[vmv_v_i $V16, 0]} + # v16: [r-IV0, r-IV0, ...] + @{[vaesz_vs $V16, $V0]} + + # Prepare GF(2^128) multiplier [1, x, x^2, x^3, ...] in v8. + slli $T0, $LEN32, 2 + @{[vsetvli "zero", $T0, "e32", "m1", "ta", "ma"]} + # v2: [`1`, `1`, `1`, `1`, ...] + @{[vmv_v_i $V2, 1]} + # v3: [`0`, `1`, `2`, `3`, ...] + @{[vid_v $V3]} + @{[vsetvli "zero", $T0, "e64", "m2", "ta", "ma"]} + # v4: [`1`, 0, `1`, 0, `1`, 0, `1`, 0, ...] + @{[vzext_vf2 $V4, $V2]} + # v6: [`0`, 0, `1`, 0, `2`, 0, `3`, 0, ...] + @{[vzext_vf2 $V6, $V3]} + slli $T0, $LEN32, 1 + @{[vsetvli "zero", $T0, "e32", "m2", "ta", "ma"]} + # v8: [1<<0=1, 0, 0, 0, 1<<1=x, 0, 0, 0, 1<<2=x^2, 0, 0, 0, ...] + @{[vwsll_vv $V8, $V4, $V6]} + + # Compute [r-IV0*1, r-IV0*x, r-IV0*x^2, r-IV0*x^3, ...] in v16 + @{[vsetvli "zero", $LEN32, "e32", "m4", "ta", "ma"]} + @{[vbrev8_v $V8, $V8]} + @{[vgmul_vv $V16, $V8]} + + # Compute [IV0*1, IV0*x, IV0*x^2, IV0*x^3, ...] in v28. + # Reverse the bits order back. + @{[vbrev8_v $V28, $V16]} + + # Prepare the x^n multiplier in v20. The `n` is the aes-xts block number + # in a LMUL=4 register group. + # n = ((VLEN*LMUL)/(32*4)) = ((VLEN*4)/(32*4)) + # = (VLEN/32) + # We could use vsetvli with `e32, m1` to compute the `n` number. + @{[vsetvli $T0, "zero", "e32", "m1", "ta", "ma"]} + li $T1, 1 + sll $T0, $T1, $T0 + @{[vsetivli "zero", 2, "e64", "m1", "ta", "ma"]} + @{[vmv_v_i $V0, 0]} + @{[vsetivli "zero", 1, "e64", "m1", "tu", "ma"]} + @{[vmv_v_x $V0, $T0]} + @{[vsetivli "zero", 2, "e64", "m1", "ta", "ma"]} + @{[vbrev8_v $V0, $V0]} + @{[vsetvli "zero", $LEN32, "e32", "m4", "ta", "ma"]} + @{[vmv_v_i $V20, 0]} + @{[vaesz_vs $V20, $V0]} + + j 2f +1: + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vbrev8_v $V16, $V28]} +2: +___ + + return $code; +} + +# prepare xts enc last block's input(v24) and iv(v28) +sub handle_xts_enc_last_block { + my $code=<<___; + bnez $TAIL_LENGTH, 2f + + beqz $UPDATE_IV, 1f + ## Store next IV + addi $VL, $VL, -4 + @{[vsetivli "zero", 4, "e32", "m4", "ta", "ma"]} + # multiplier + @{[vslidedown_vx $V16, $V16, $VL]} + + # setup `x` multiplier with byte-reversed order + # 0b00000010 => 0b01000000 (0x40) + li $T0, 0x40 + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vmv_v_i $V28, 0]} + @{[vsetivli "zero", 1, "e8", "m1", "tu", "ma"]} + @{[vmv_v_x $V28, $T0]} + + # IV * `x` + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vgmul_vv $V16, $V28]} + # Reverse the IV's bits order back to big-endian + @{[vbrev8_v $V28, $V16]} + + @{[vse32_v $V28, $IV]} +1: + + ret +2: + # slidedown second to last block + addi $VL, $VL, -4 + @{[vsetivli "zero", 4, "e32", "m4", "ta", "ma"]} + # ciphertext + @{[vslidedown_vx $V24, $V24, $VL]} + # multiplier + @{[vslidedown_vx $V16, $V16, $VL]} + + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vmv_v_v $V25, $V24]} + + # load last block into v24 + # note: We should load the last block before store the second to last block + # for in-place operation. + @{[vsetvli "zero", $TAIL_LENGTH, "e8", "m1", "tu", "ma"]} + @{[vle8_v $V24, $INPUT]} + + # setup `x` multiplier with byte-reversed order + # 0b00000010 => 0b01000000 (0x40) + li $T0, 0x40 + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vmv_v_i $V28, 0]} + @{[vsetivli "zero", 1, "e8", "m1", "tu", "ma"]} + @{[vmv_v_x $V28, $T0]} + + # compute IV for last block + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vgmul_vv $V16, $V28]} + @{[vbrev8_v $V28, $V16]} + + # store second to last block + @{[vsetvli "zero", $TAIL_LENGTH, "e8", "m1", "ta", "ma"]} + @{[vse8_v $V25, $OUTPUT]} +___ + + return $code; +} + +# prepare xts dec second to last block's input(v24) and iv(v29) and +# last block's and iv(v28) +sub handle_xts_dec_last_block { + my $code=<<___; + bnez $TAIL_LENGTH, 2f + + beqz $UPDATE_IV, 1f + ## Store next IV + # setup `x` multiplier with byte-reversed order + # 0b00000010 => 0b01000000 (0x40) + li $T0, 0x40 + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vmv_v_i $V28, 0]} + @{[vsetivli "zero", 1, "e8", "m1", "tu", "ma"]} + @{[vmv_v_x $V28, $T0]} + + beqz $LENGTH, 3f + addi $VL, $VL, -4 + @{[vsetivli "zero", 4, "e32", "m4", "ta", "ma"]} + # multiplier + @{[vslidedown_vx $V16, $V16, $VL]} + +3: + # IV * `x` + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vgmul_vv $V16, $V28]} + # Reverse the IV's bits order back to big-endian + @{[vbrev8_v $V28, $V16]} + + @{[vse32_v $V28, $IV]} +1: + + ret +2: + # load second to last block's ciphertext + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vle32_v $V24, $INPUT]} + addi $INPUT, $INPUT, 16 + + # setup `x` multiplier with byte-reversed order + # 0b00000010 => 0b01000000 (0x40) + li $T0, 0x40 + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vmv_v_i $V20, 0]} + @{[vsetivli "zero", 1, "e8", "m1", "tu", "ma"]} + @{[vmv_v_x $V20, $T0]} + + beqz $LENGTH, 1f + # slidedown third to last block + addi $VL, $VL, -4 + @{[vsetivli "zero", 4, "e32", "m4", "ta", "ma"]} + # multiplier + @{[vslidedown_vx $V16, $V16, $VL]} + + # compute IV for last block + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vgmul_vv $V16, $V20]} + @{[vbrev8_v $V28, $V16]} + + # compute IV for second to last block + @{[vgmul_vv $V16, $V20]} + @{[vbrev8_v $V29, $V16]} + j 2f +1: + # compute IV for second to last block + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vgmul_vv $V16, $V20]} + @{[vbrev8_v $V29, $V16]} +2: +___ + + return $code; +} + +# Load all 11 round keys to v1-v11 registers. +sub aes_128_load_key { + my $code=<<___; + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vle32_v $V1, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V2, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V3, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V4, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V5, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V6, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V7, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V8, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V9, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V10, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V11, $KEY]} +___ + + return $code; +} + +# Load all 13 round keys to v1-v13 registers. +sub aes_192_load_key { + my $code=<<___; + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vle32_v $V1, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V2, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V3, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V4, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V5, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V6, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V7, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V8, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V9, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V10, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V11, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V12, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V13, $KEY]} +___ + + return $code; +} + +# Load all 15 round keys to v1-v15 registers. +sub aes_256_load_key { + my $code=<<___; + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vle32_v $V1, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V2, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V3, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V4, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V5, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V6, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V7, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V8, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V9, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V10, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V11, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V12, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V13, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V14, $KEY]} + addi $KEY, $KEY, 16 + @{[vle32_v $V15, $KEY]} +___ + + return $code; +} + +# aes-128 enc with round keys v1-v11 +sub aes_128_enc { + my $code=<<___; + @{[vaesz_vs $V24, $V1]} + @{[vaesem_vs $V24, $V2]} + @{[vaesem_vs $V24, $V3]} + @{[vaesem_vs $V24, $V4]} + @{[vaesem_vs $V24, $V5]} + @{[vaesem_vs $V24, $V6]} + @{[vaesem_vs $V24, $V7]} + @{[vaesem_vs $V24, $V8]} + @{[vaesem_vs $V24, $V9]} + @{[vaesem_vs $V24, $V10]} + @{[vaesef_vs $V24, $V11]} +___ + + return $code; +} + +# aes-128 dec with round keys v1-v11 +sub aes_128_dec { + my $code=<<___; + @{[vaesz_vs $V24, $V11]} + @{[vaesdm_vs $V24, $V10]} + @{[vaesdm_vs $V24, $V9]} + @{[vaesdm_vs $V24, $V8]} + @{[vaesdm_vs $V24, $V7]} + @{[vaesdm_vs $V24, $V6]} + @{[vaesdm_vs $V24, $V5]} + @{[vaesdm_vs $V24, $V4]} + @{[vaesdm_vs $V24, $V3]} + @{[vaesdm_vs $V24, $V2]} + @{[vaesdf_vs $V24, $V1]} +___ + + return $code; +} + +# aes-192 enc with round keys v1-v13 +sub aes_192_enc { + my $code=<<___; + @{[vaesz_vs $V24, $V1]} + @{[vaesem_vs $V24, $V2]} + @{[vaesem_vs $V24, $V3]} + @{[vaesem_vs $V24, $V4]} + @{[vaesem_vs $V24, $V5]} + @{[vaesem_vs $V24, $V6]} + @{[vaesem_vs $V24, $V7]} + @{[vaesem_vs $V24, $V8]} + @{[vaesem_vs $V24, $V9]} + @{[vaesem_vs $V24, $V10]} + @{[vaesem_vs $V24, $V11]} + @{[vaesem_vs $V24, $V12]} + @{[vaesef_vs $V24, $V13]} +___ + + return $code; +} + +# aes-192 dec with round keys v1-v13 +sub aes_192_dec { + my $code=<<___; + @{[vaesz_vs $V24, $V13]} + @{[vaesdm_vs $V24, $V12]} + @{[vaesdm_vs $V24, $V11]} + @{[vaesdm_vs $V24, $V10]} + @{[vaesdm_vs $V24, $V9]} + @{[vaesdm_vs $V24, $V8]} + @{[vaesdm_vs $V24, $V7]} + @{[vaesdm_vs $V24, $V6]} + @{[vaesdm_vs $V24, $V5]} + @{[vaesdm_vs $V24, $V4]} + @{[vaesdm_vs $V24, $V3]} + @{[vaesdm_vs $V24, $V2]} + @{[vaesdf_vs $V24, $V1]} +___ + + return $code; +} + +# aes-256 enc with round keys v1-v15 +sub aes_256_enc { + my $code=<<___; + @{[vaesz_vs $V24, $V1]} + @{[vaesem_vs $V24, $V2]} + @{[vaesem_vs $V24, $V3]} + @{[vaesem_vs $V24, $V4]} + @{[vaesem_vs $V24, $V5]} + @{[vaesem_vs $V24, $V6]} + @{[vaesem_vs $V24, $V7]} + @{[vaesem_vs $V24, $V8]} + @{[vaesem_vs $V24, $V9]} + @{[vaesem_vs $V24, $V10]} + @{[vaesem_vs $V24, $V11]} + @{[vaesem_vs $V24, $V12]} + @{[vaesem_vs $V24, $V13]} + @{[vaesem_vs $V24, $V14]} + @{[vaesef_vs $V24, $V15]} +___ + + return $code; +} + +# aes-256 dec with round keys v1-v15 +sub aes_256_dec { + my $code=<<___; + @{[vaesz_vs $V24, $V15]} + @{[vaesdm_vs $V24, $V14]} + @{[vaesdm_vs $V24, $V13]} + @{[vaesdm_vs $V24, $V12]} + @{[vaesdm_vs $V24, $V11]} + @{[vaesdm_vs $V24, $V10]} + @{[vaesdm_vs $V24, $V9]} + @{[vaesdm_vs $V24, $V8]} + @{[vaesdm_vs $V24, $V7]} + @{[vaesdm_vs $V24, $V6]} + @{[vaesdm_vs $V24, $V5]} + @{[vaesdm_vs $V24, $V4]} + @{[vaesdm_vs $V24, $V3]} + @{[vaesdm_vs $V24, $V2]} + @{[vaesdf_vs $V24, $V1]} +___ + + return $code; +} + +$code .= <<___; +.p2align 3 +.globl rv64i_zvbb_zvkg_zvkned_aes_xts_encrypt +.type rv64i_zvbb_zvkg_zvkned_aes_xts_encrypt,\@function +rv64i_zvbb_zvkg_zvkned_aes_xts_encrypt: + @{[load_xts_iv0]} + + # aes block size is 16 + andi $TAIL_LENGTH, $LENGTH, 15 + mv $STORE_LEN32, $LENGTH + beqz $TAIL_LENGTH, 1f + sub $LENGTH, $LENGTH, $TAIL_LENGTH + addi $STORE_LEN32, $LENGTH, -16 +1: + # We make the `LENGTH` become e32 length here. + srli $LEN32, $LENGTH, 2 + srli $STORE_LEN32, $STORE_LEN32, 2 + + # Load number of rounds + lwu $T0, 240($KEY) + li $T1, 14 + li $T2, 12 + li $T3, 10 + beq $T0, $T1, aes_xts_enc_256 + beq $T0, $T2, aes_xts_enc_192 + beq $T0, $T3, aes_xts_enc_128 +.size rv64i_zvbb_zvkg_zvkned_aes_xts_encrypt,.-rv64i_zvbb_zvkg_zvkned_aes_xts_encrypt +___ + +$code .= <<___; +.p2align 3 +aes_xts_enc_128: + @{[init_first_round]} + @{[aes_128_load_key]} + + @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]} + j 1f + +.Lenc_blocks_128: + @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]} + # load plaintext into v24 + @{[vle32_v $V24, $INPUT]} + # update iv + @{[vgmul_vv $V16, $V20]} + # reverse the iv's bits order back + @{[vbrev8_v $V28, $V16]} +1: + @{[vxor_vv $V24, $V24, $V28]} + slli $T0, $VL, 2 + sub $LEN32, $LEN32, $VL + add $INPUT, $INPUT, $T0 + @{[aes_128_enc]} + @{[vxor_vv $V24, $V24, $V28]} + + # store ciphertext + @{[vsetvli "zero", $STORE_LEN32, "e32", "m4", "ta", "ma"]} + @{[vse32_v $V24, $OUTPUT]} + add $OUTPUT, $OUTPUT, $T0 + sub $STORE_LEN32, $STORE_LEN32, $VL + + bnez $LEN32, .Lenc_blocks_128 + + @{[handle_xts_enc_last_block]} + + # xts last block + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vxor_vv $V24, $V24, $V28]} + @{[aes_128_enc]} + @{[vxor_vv $V24, $V24, $V28]} + + # store last block ciphertext + addi $OUTPUT, $OUTPUT, -16 + @{[vse32_v $V24, $OUTPUT]} + + ret +.size aes_xts_enc_128,.-aes_xts_enc_128 +___ + +$code .= <<___; +.p2align 3 +aes_xts_enc_192: + @{[init_first_round]} + @{[aes_192_load_key]} + + @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]} + j 1f + +.Lenc_blocks_192: + @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]} + # load plaintext into v24 + @{[vle32_v $V24, $INPUT]} + # update iv + @{[vgmul_vv $V16, $V20]} + # reverse the iv's bits order back + @{[vbrev8_v $V28, $V16]} +1: + @{[vxor_vv $V24, $V24, $V28]} + slli $T0, $VL, 2 + sub $LEN32, $LEN32, $VL + add $INPUT, $INPUT, $T0 + @{[aes_192_enc]} + @{[vxor_vv $V24, $V24, $V28]} + + # store ciphertext + @{[vsetvli "zero", $STORE_LEN32, "e32", "m4", "ta", "ma"]} + @{[vse32_v $V24, $OUTPUT]} + add $OUTPUT, $OUTPUT, $T0 + sub $STORE_LEN32, $STORE_LEN32, $VL + + bnez $LEN32, .Lenc_blocks_192 + + @{[handle_xts_enc_last_block]} + + # xts last block + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vxor_vv $V24, $V24, $V28]} + @{[aes_192_enc]} + @{[vxor_vv $V24, $V24, $V28]} + + # store last block ciphertext + addi $OUTPUT, $OUTPUT, -16 + @{[vse32_v $V24, $OUTPUT]} + + ret +.size aes_xts_enc_192,.-aes_xts_enc_192 +___ + +$code .= <<___; +.p2align 3 +aes_xts_enc_256: + @{[init_first_round]} + @{[aes_256_load_key]} + + @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]} + j 1f + +.Lenc_blocks_256: + @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]} + # load plaintext into v24 + @{[vle32_v $V24, $INPUT]} + # update iv + @{[vgmul_vv $V16, $V20]} + # reverse the iv's bits order back + @{[vbrev8_v $V28, $V16]} +1: + @{[vxor_vv $V24, $V24, $V28]} + slli $T0, $VL, 2 + sub $LEN32, $LEN32, $VL + add $INPUT, $INPUT, $T0 + @{[aes_256_enc]} + @{[vxor_vv $V24, $V24, $V28]} + + # store ciphertext + @{[vsetvli "zero", $STORE_LEN32, "e32", "m4", "ta", "ma"]} + @{[vse32_v $V24, $OUTPUT]} + add $OUTPUT, $OUTPUT, $T0 + sub $STORE_LEN32, $STORE_LEN32, $VL + + bnez $LEN32, .Lenc_blocks_256 + + @{[handle_xts_enc_last_block]} + + # xts last block + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vxor_vv $V24, $V24, $V28]} + @{[aes_256_enc]} + @{[vxor_vv $V24, $V24, $V28]} + + # store last block ciphertext + addi $OUTPUT, $OUTPUT, -16 + @{[vse32_v $V24, $OUTPUT]} + + ret +.size aes_xts_enc_256,.-aes_xts_enc_256 +___ + +################################################################################ +# void rv64i_zvbb_zvkg_zvkned_aes_xts_decrypt(const unsigned char *in, +# unsigned char *out, size_t length, +# const AES_KEY *key, +# unsigned char iv[16], +# int update_iv) +$code .= <<___; +.p2align 3 +.globl rv64i_zvbb_zvkg_zvkned_aes_xts_decrypt +.type rv64i_zvbb_zvkg_zvkned_aes_xts_decrypt,\@function +rv64i_zvbb_zvkg_zvkned_aes_xts_decrypt: + @{[load_xts_iv0]} + + # aes block size is 16 + andi $TAIL_LENGTH, $LENGTH, 15 + beqz $TAIL_LENGTH, 1f + sub $LENGTH, $LENGTH, $TAIL_LENGTH + addi $LENGTH, $LENGTH, -16 +1: + # We make the `LENGTH` become e32 length here. + srli $LEN32, $LENGTH, 2 + + # Load number of rounds + lwu $T0, 240($KEY) + li $T1, 14 + li $T2, 12 + li $T3, 10 + beq $T0, $T1, aes_xts_dec_256 + beq $T0, $T2, aes_xts_dec_192 + beq $T0, $T3, aes_xts_dec_128 +.size rv64i_zvbb_zvkg_zvkned_aes_xts_decrypt,.-rv64i_zvbb_zvkg_zvkned_aes_xts_decrypt +___ + +$code .= <<___; +.p2align 3 +aes_xts_dec_128: + @{[init_first_round]} + @{[aes_128_load_key]} + + beqz $LEN32, 2f + + @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]} + j 1f + +.Ldec_blocks_128: + @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]} + # load ciphertext into v24 + @{[vle32_v $V24, $INPUT]} + # update iv + @{[vgmul_vv $V16, $V20]} + # reverse the iv's bits order back + @{[vbrev8_v $V28, $V16]} +1: + @{[vxor_vv $V24, $V24, $V28]} + slli $T0, $VL, 2 + sub $LEN32, $LEN32, $VL + add $INPUT, $INPUT, $T0 + @{[aes_128_dec]} + @{[vxor_vv $V24, $V24, $V28]} + + # store plaintext + @{[vse32_v $V24, $OUTPUT]} + add $OUTPUT, $OUTPUT, $T0 + + bnez $LEN32, .Ldec_blocks_128 + +2: + @{[handle_xts_dec_last_block]} + + ## xts second to last block + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vxor_vv $V24, $V24, $V29]} + @{[aes_128_dec]} + @{[vxor_vv $V24, $V24, $V29]} + @{[vmv_v_v $V25, $V24]} + + # load last block ciphertext + @{[vsetvli "zero", $TAIL_LENGTH, "e8", "m1", "tu", "ma"]} + @{[vle8_v $V24, $INPUT]} + + # store second to last block plaintext + addi $T0, $OUTPUT, 16 + @{[vse8_v $V25, $T0]} + + ## xts last block + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vxor_vv $V24, $V24, $V28]} + @{[aes_128_dec]} + @{[vxor_vv $V24, $V24, $V28]} + + # store second to last block plaintext + @{[vse32_v $V24, $OUTPUT]} + + ret +.size aes_xts_dec_128,.-aes_xts_dec_128 +___ + +$code .= <<___; +.p2align 3 +aes_xts_dec_192: + @{[init_first_round]} + @{[aes_192_load_key]} + + beqz $LEN32, 2f + + @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]} + j 1f + +.Ldec_blocks_192: + @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]} + # load ciphertext into v24 + @{[vle32_v $V24, $INPUT]} + # update iv + @{[vgmul_vv $V16, $V20]} + # reverse the iv's bits order back + @{[vbrev8_v $V28, $V16]} +1: + @{[vxor_vv $V24, $V24, $V28]} + slli $T0, $VL, 2 + sub $LEN32, $LEN32, $VL + add $INPUT, $INPUT, $T0 + @{[aes_192_dec]} + @{[vxor_vv $V24, $V24, $V28]} + + # store plaintext + @{[vse32_v $V24, $OUTPUT]} + add $OUTPUT, $OUTPUT, $T0 + + bnez $LEN32, .Ldec_blocks_192 + +2: + @{[handle_xts_dec_last_block]} + + ## xts second to last block + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vxor_vv $V24, $V24, $V29]} + @{[aes_192_dec]} + @{[vxor_vv $V24, $V24, $V29]} + @{[vmv_v_v $V25, $V24]} + + # load last block ciphertext + @{[vsetvli "zero", $TAIL_LENGTH, "e8", "m1", "tu", "ma"]} + @{[vle8_v $V24, $INPUT]} + + # store second to last block plaintext + addi $T0, $OUTPUT, 16 + @{[vse8_v $V25, $T0]} + + ## xts last block + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vxor_vv $V24, $V24, $V28]} + @{[aes_192_dec]} + @{[vxor_vv $V24, $V24, $V28]} + + # store second to last block plaintext + @{[vse32_v $V24, $OUTPUT]} + + ret +.size aes_xts_dec_192,.-aes_xts_dec_192 +___ + +$code .= <<___; +.p2align 3 +aes_xts_dec_256: + @{[init_first_round]} + @{[aes_256_load_key]} + + beqz $LEN32, 2f + + @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]} + j 1f + +.Ldec_blocks_256: + @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]} + # load ciphertext into v24 + @{[vle32_v $V24, $INPUT]} + # update iv + @{[vgmul_vv $V16, $V20]} + # reverse the iv's bits order back + @{[vbrev8_v $V28, $V16]} +1: + @{[vxor_vv $V24, $V24, $V28]} + slli $T0, $VL, 2 + sub $LEN32, $LEN32, $VL + add $INPUT, $INPUT, $T0 + @{[aes_256_dec]} + @{[vxor_vv $V24, $V24, $V28]} + + # store plaintext + @{[vse32_v $V24, $OUTPUT]} + add $OUTPUT, $OUTPUT, $T0 + + bnez $LEN32, .Ldec_blocks_256 + +2: + @{[handle_xts_dec_last_block]} + + ## xts second to last block + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vxor_vv $V24, $V24, $V29]} + @{[aes_256_dec]} + @{[vxor_vv $V24, $V24, $V29]} + @{[vmv_v_v $V25, $V24]} + + # load last block ciphertext + @{[vsetvli "zero", $TAIL_LENGTH, "e8", "m1", "tu", "ma"]} + @{[vle8_v $V24, $INPUT]} + + # store second to last block plaintext + addi $T0, $OUTPUT, 16 + @{[vse8_v $V25, $T0]} + + ## xts last block + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vxor_vv $V24, $V24, $V28]} + @{[aes_256_dec]} + @{[vxor_vv $V24, $V24, $V28]} + + # store second to last block plaintext + @{[vse32_v $V24, $OUTPUT]} + + ret +.size aes_xts_dec_256,.-aes_xts_dec_256 +___ +} + +print $code; + +close STDOUT or die "error closing STDOUT: $!"; diff --git a/arch/riscv/crypto/aes-riscv64-zvkb-zvkned.pl b/arch/riscv/crypto/aes-riscv64-zvkb-zvkned.pl new file mode 100644 index 000000000000..bc659da44c53 --- /dev/null +++ b/arch/riscv/crypto/aes-riscv64-zvkb-zvkned.pl @@ -0,0 +1,416 @@ +#! /usr/bin/env perl +# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause +# +# This file is dual-licensed, meaning that you can use it under your +# choice of either of the following two licenses: +# +# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved. +# +# Licensed under the Apache License 2.0 (the "License"). You can obtain +# a copy in the file LICENSE in the source distribution or at +# https://www.openssl.org/source/license.html +# +# or +# +# Copyright (c) 2023, Jerry Shih +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions +# are met: +# 1. Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# 2. Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +# - RV64I +# - RISC-V Vector ('V') with VLEN >= 128 +# - RISC-V Vector Cryptography Bit-manipulation extension ('Zvkb') +# - RISC-V Vector AES block cipher extension ('Zvkned') +# - RISC-V Zicclsm(Main memory supports misaligned loads/stores) + +use strict; +use warnings; + +use FindBin qw($Bin); +use lib "$Bin"; +use lib "$Bin/../../perlasm"; +use riscv; + +# $output is the last argument if it looks like a file (it has an extension) +# $flavour is the first argument if it doesn't look like a file +my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef; +my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef; + +$output and open STDOUT,">$output"; + +my $code=<<___; +.text +___ + +################################################################################ +# void rv64i_zvkb_zvkned_ctr32_encrypt_blocks(const unsigned char *in, +# unsigned char *out, size_t length, +# const void *key, +# unsigned char ivec[16]); +{ +my ($INP, $OUTP, $LEN, $KEYP, $IVP) = ("a0", "a1", "a2", "a3", "a4"); +my ($T0, $T1, $T2, $T3) = ("t0", "t1", "t2", "t3"); +my ($VL) = ("t4"); +my ($LEN32) = ("t5"); +my ($CTR) = ("t6"); +my ($MASK) = ("v0"); +my ($V0, $V1, $V2, $V3, $V4, $V5, $V6, $V7, + $V8, $V9, $V10, $V11, $V12, $V13, $V14, $V15, + $V16, $V17, $V18, $V19, $V20, $V21, $V22, $V23, + $V24, $V25, $V26, $V27, $V28, $V29, $V30, $V31, +) = map("v$_",(0..31)); + +# Prepare the AES ctr input data into v16. +sub init_aes_ctr_input { + my $code=<<___; + # Setup mask into v0 + # The mask pattern for 4*N-th elements + # mask v0: [000100010001....] + # Note: + # We could setup the mask just for the maximum element length instead of + # the VLMAX. + li $T0, 0b10001000 + @{[vsetvli $T2, "zero", "e8", "m1", "ta", "ma"]} + @{[vmv_v_x $MASK, $T0]} + # Load IV. + # v31:[IV0, IV1, IV2, big-endian count] + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vle32_v $V31, $IVP]} + # Convert the big-endian counter into little-endian. + @{[vsetivli "zero", 4, "e32", "m1", "ta", "mu"]} + @{[vrev8_v $V31, $V31, $MASK]} + # Splat the IV to v16 + @{[vsetvli "zero", $LEN32, "e32", "m4", "ta", "ma"]} + @{[vmv_v_i $V16, 0]} + @{[vaesz_vs $V16, $V31]} + # Prepare the ctr pattern into v20 + # v20: [x, x, x, 0, x, x, x, 1, x, x, x, 2, ...] + @{[viota_m $V20, $MASK, $MASK]} + # v16:[IV0, IV1, IV2, count+0, IV0, IV1, IV2, count+1, ...] + @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "mu"]} + @{[vadd_vv $V16, $V16, $V20, $MASK]} +___ + + return $code; +} + +$code .= <<___; +.p2align 3 +.globl rv64i_zvkb_zvkned_ctr32_encrypt_blocks +.type rv64i_zvkb_zvkned_ctr32_encrypt_blocks,\@function +rv64i_zvkb_zvkned_ctr32_encrypt_blocks: + # The aes block size is 16 bytes. + # We try to get the minimum aes block number including the tail data. + addi $T0, $LEN, 15 + # the minimum block number + srli $T0, $T0, 4 + # We make the block number become e32 length here. + slli $LEN32, $T0, 2 + + # Load number of rounds + lwu $T0, 240($KEYP) + li $T1, 14 + li $T2, 12 + li $T3, 10 + + beq $T0, $T1, ctr32_encrypt_blocks_256 + beq $T0, $T2, ctr32_encrypt_blocks_192 + beq $T0, $T3, ctr32_encrypt_blocks_128 + + ret +.size rv64i_zvkb_zvkned_ctr32_encrypt_blocks,.-rv64i_zvkb_zvkned_ctr32_encrypt_blocks +___ + +$code .= <<___; +.p2align 3 +ctr32_encrypt_blocks_128: + # Load all 11 round keys to v1-v11 registers. + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vle32_v $V1, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V2, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V3, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V4, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V5, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V6, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V7, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V8, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V9, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V10, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V11, $KEYP]} + + @{[init_aes_ctr_input]} + + ##### AES body + j 2f +1: + @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "mu"]} + # Increase ctr in v16. + @{[vadd_vx $V16, $V16, $CTR, $MASK]} +2: + # Prepare the AES ctr input into v24. + # The ctr data uses big-endian form. + @{[vmv_v_v $V24, $V16]} + @{[vrev8_v $V24, $V24, $MASK]} + srli $CTR, $VL, 2 + sub $LEN32, $LEN32, $VL + + # Load plaintext in bytes into v20. + @{[vsetvli $T0, $LEN, "e8", "m4", "ta", "ma"]} + @{[vle8_v $V20, $INP]} + sub $LEN, $LEN, $T0 + add $INP, $INP, $T0 + + @{[vsetvli "zero", $VL, "e32", "m4", "ta", "ma"]} + @{[vaesz_vs $V24, $V1]} + @{[vaesem_vs $V24, $V2]} + @{[vaesem_vs $V24, $V3]} + @{[vaesem_vs $V24, $V4]} + @{[vaesem_vs $V24, $V5]} + @{[vaesem_vs $V24, $V6]} + @{[vaesem_vs $V24, $V7]} + @{[vaesem_vs $V24, $V8]} + @{[vaesem_vs $V24, $V9]} + @{[vaesem_vs $V24, $V10]} + @{[vaesef_vs $V24, $V11]} + + # ciphertext + @{[vsetvli "zero", $T0, "e8", "m4", "ta", "ma"]} + @{[vxor_vv $V24, $V24, $V20]} + + # Store the ciphertext. + @{[vse8_v $V24, $OUTP]} + add $OUTP, $OUTP, $T0 + + bnez $LEN, 1b + + ## store ctr iv + @{[vsetivli "zero", 4, "e32", "m1", "ta", "mu"]} + # Increase ctr in v16. + @{[vadd_vx $V16, $V16, $CTR, $MASK]} + # Convert ctr data back to big-endian. + @{[vrev8_v $V16, $V16, $MASK]} + @{[vse32_v $V16, $IVP]} + + ret +.size ctr32_encrypt_blocks_128,.-ctr32_encrypt_blocks_128 +___ + +$code .= <<___; +.p2align 3 +ctr32_encrypt_blocks_192: + # Load all 13 round keys to v1-v13 registers. + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vle32_v $V1, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V2, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V3, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V4, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V5, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V6, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V7, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V8, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V9, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V10, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V11, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V12, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V13, $KEYP]} + + @{[init_aes_ctr_input]} + + ##### AES body + j 2f +1: + @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "mu"]} + # Increase ctr in v16. + @{[vadd_vx $V16, $V16, $CTR, $MASK]} +2: + # Prepare the AES ctr input into v24. + # The ctr data uses big-endian form. + @{[vmv_v_v $V24, $V16]} + @{[vrev8_v $V24, $V24, $MASK]} + srli $CTR, $VL, 2 + sub $LEN32, $LEN32, $VL + + # Load plaintext in bytes into v20. + @{[vsetvli $T0, $LEN, "e8", "m4", "ta", "ma"]} + @{[vle8_v $V20, $INP]} + sub $LEN, $LEN, $T0 + add $INP, $INP, $T0 + + @{[vsetvli "zero", $VL, "e32", "m4", "ta", "ma"]} + @{[vaesz_vs $V24, $V1]} + @{[vaesem_vs $V24, $V2]} + @{[vaesem_vs $V24, $V3]} + @{[vaesem_vs $V24, $V4]} + @{[vaesem_vs $V24, $V5]} + @{[vaesem_vs $V24, $V6]} + @{[vaesem_vs $V24, $V7]} + @{[vaesem_vs $V24, $V8]} + @{[vaesem_vs $V24, $V9]} + @{[vaesem_vs $V24, $V10]} + @{[vaesem_vs $V24, $V11]} + @{[vaesem_vs $V24, $V12]} + @{[vaesef_vs $V24, $V13]} + + # ciphertext + @{[vsetvli "zero", $T0, "e8", "m4", "ta", "ma"]} + @{[vxor_vv $V24, $V24, $V20]} + + # Store the ciphertext. + @{[vse8_v $V24, $OUTP]} + add $OUTP, $OUTP, $T0 + + bnez $LEN, 1b + + ## store ctr iv + @{[vsetivli "zero", 4, "e32", "m1", "ta", "mu"]} + # Increase ctr in v16. + @{[vadd_vx $V16, $V16, $CTR, $MASK]} + # Convert ctr data back to big-endian. + @{[vrev8_v $V16, $V16, $MASK]} + @{[vse32_v $V16, $IVP]} + + ret +.size ctr32_encrypt_blocks_192,.-ctr32_encrypt_blocks_192 +___ + +$code .= <<___; +.p2align 3 +ctr32_encrypt_blocks_256: + # Load all 15 round keys to v1-v15 registers. + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vle32_v $V1, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V2, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V3, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V4, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V5, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V6, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V7, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V8, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V9, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V10, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V11, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V12, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V13, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V14, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V15, $KEYP]} + + @{[init_aes_ctr_input]} + + ##### AES body + j 2f +1: + @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "mu"]} + # Increase ctr in v16. + @{[vadd_vx $V16, $V16, $CTR, $MASK]} +2: + # Prepare the AES ctr input into v24. + # The ctr data uses big-endian form. + @{[vmv_v_v $V24, $V16]} + @{[vrev8_v $V24, $V24, $MASK]} + srli $CTR, $VL, 2 + sub $LEN32, $LEN32, $VL + + # Load plaintext in bytes into v20. + @{[vsetvli $T0, $LEN, "e8", "m4", "ta", "ma"]} + @{[vle8_v $V20, $INP]} + sub $LEN, $LEN, $T0 + add $INP, $INP, $T0 + + @{[vsetvli "zero", $VL, "e32", "m4", "ta", "ma"]} + @{[vaesz_vs $V24, $V1]} + @{[vaesem_vs $V24, $V2]} + @{[vaesem_vs $V24, $V3]} + @{[vaesem_vs $V24, $V4]} + @{[vaesem_vs $V24, $V5]} + @{[vaesem_vs $V24, $V6]} + @{[vaesem_vs $V24, $V7]} + @{[vaesem_vs $V24, $V8]} + @{[vaesem_vs $V24, $V9]} + @{[vaesem_vs $V24, $V10]} + @{[vaesem_vs $V24, $V11]} + @{[vaesem_vs $V24, $V12]} + @{[vaesem_vs $V24, $V13]} + @{[vaesem_vs $V24, $V14]} + @{[vaesef_vs $V24, $V15]} + + # ciphertext + @{[vsetvli "zero", $T0, "e8", "m4", "ta", "ma"]} + @{[vxor_vv $V24, $V24, $V20]} + + # Store the ciphertext. + @{[vse8_v $V24, $OUTP]} + add $OUTP, $OUTP, $T0 + + bnez $LEN, 1b + + ## store ctr iv + @{[vsetivli "zero", 4, "e32", "m1", "ta", "mu"]} + # Increase ctr in v16. + @{[vadd_vx $V16, $V16, $CTR, $MASK]} + # Convert ctr data back to big-endian. + @{[vrev8_v $V16, $V16, $MASK]} + @{[vse32_v $V16, $IVP]} + + ret +.size ctr32_encrypt_blocks_256,.-ctr32_encrypt_blocks_256 +___ +} + +print $code; + +close STDOUT or die "error closing STDOUT: $!"; diff --git a/arch/riscv/crypto/aes-riscv64-zvkned.pl b/arch/riscv/crypto/aes-riscv64-zvkned.pl index c0ecde77bf56..4689f878463a 100644 --- a/arch/riscv/crypto/aes-riscv64-zvkned.pl +++ b/arch/riscv/crypto/aes-riscv64-zvkned.pl @@ -66,6 +66,753 @@ my ($V0, $V1, $V2, $V3, $V4, $V5, $V6, $V7, $V24, $V25, $V26, $V27, $V28, $V29, $V30, $V31, ) = map("v$_",(0..31)); +# Load all 11 round keys to v1-v11 registers. +sub aes_128_load_key { + my $KEYP = shift; + + my $code=<<___; + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vle32_v $V1, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V2, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V3, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V4, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V5, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V6, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V7, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V8, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V9, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V10, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V11, $KEYP]} +___ + + return $code; +} + +# Load all 13 round keys to v1-v13 registers. +sub aes_192_load_key { + my $KEYP = shift; + + my $code=<<___; + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vle32_v $V1, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V2, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V3, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V4, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V5, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V6, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V7, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V8, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V9, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V10, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V11, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V12, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V13, $KEYP]} +___ + + return $code; +} + +# Load all 15 round keys to v1-v15 registers. +sub aes_256_load_key { + my $KEYP = shift; + + my $code=<<___; + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vle32_v $V1, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V2, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V3, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V4, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V5, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V6, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V7, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V8, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V9, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V10, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V11, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V12, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V13, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V14, $KEYP]} + addi $KEYP, $KEYP, 16 + @{[vle32_v $V15, $KEYP]} +___ + + return $code; +} + +# aes-128 encryption with round keys v1-v11 +sub aes_128_encrypt { + my $code=<<___; + @{[vaesz_vs $V24, $V1]} # with round key w[ 0, 3] + @{[vaesem_vs $V24, $V2]} # with round key w[ 4, 7] + @{[vaesem_vs $V24, $V3]} # with round key w[ 8,11] + @{[vaesem_vs $V24, $V4]} # with round key w[12,15] + @{[vaesem_vs $V24, $V5]} # with round key w[16,19] + @{[vaesem_vs $V24, $V6]} # with round key w[20,23] + @{[vaesem_vs $V24, $V7]} # with round key w[24,27] + @{[vaesem_vs $V24, $V8]} # with round key w[28,31] + @{[vaesem_vs $V24, $V9]} # with round key w[32,35] + @{[vaesem_vs $V24, $V10]} # with round key w[36,39] + @{[vaesef_vs $V24, $V11]} # with round key w[40,43] +___ + + return $code; +} + +# aes-128 decryption with round keys v1-v11 +sub aes_128_decrypt { + my $code=<<___; + @{[vaesz_vs $V24, $V11]} # with round key w[40,43] + @{[vaesdm_vs $V24, $V10]} # with round key w[36,39] + @{[vaesdm_vs $V24, $V9]} # with round key w[32,35] + @{[vaesdm_vs $V24, $V8]} # with round key w[28,31] + @{[vaesdm_vs $V24, $V7]} # with round key w[24,27] + @{[vaesdm_vs $V24, $V6]} # with round key w[20,23] + @{[vaesdm_vs $V24, $V5]} # with round key w[16,19] + @{[vaesdm_vs $V24, $V4]} # with round key w[12,15] + @{[vaesdm_vs $V24, $V3]} # with round key w[ 8,11] + @{[vaesdm_vs $V24, $V2]} # with round key w[ 4, 7] + @{[vaesdf_vs $V24, $V1]} # with round key w[ 0, 3] +___ + + return $code; +} + +# aes-192 encryption with round keys v1-v13 +sub aes_192_encrypt { + my $code=<<___; + @{[vaesz_vs $V24, $V1]} # with round key w[ 0, 3] + @{[vaesem_vs $V24, $V2]} # with round key w[ 4, 7] + @{[vaesem_vs $V24, $V3]} # with round key w[ 8,11] + @{[vaesem_vs $V24, $V4]} # with round key w[12,15] + @{[vaesem_vs $V24, $V5]} # with round key w[16,19] + @{[vaesem_vs $V24, $V6]} # with round key w[20,23] + @{[vaesem_vs $V24, $V7]} # with round key w[24,27] + @{[vaesem_vs $V24, $V8]} # with round key w[28,31] + @{[vaesem_vs $V24, $V9]} # with round key w[32,35] + @{[vaesem_vs $V24, $V10]} # with round key w[36,39] + @{[vaesem_vs $V24, $V11]} # with round key w[40,43] + @{[vaesem_vs $V24, $V12]} # with round key w[44,47] + @{[vaesef_vs $V24, $V13]} # with round key w[48,51] +___ + + return $code; +} + +# aes-192 decryption with round keys v1-v13 +sub aes_192_decrypt { + my $code=<<___; + @{[vaesz_vs $V24, $V13]} # with round key w[48,51] + @{[vaesdm_vs $V24, $V12]} # with round key w[44,47] + @{[vaesdm_vs $V24, $V11]} # with round key w[40,43] + @{[vaesdm_vs $V24, $V10]} # with round key w[36,39] + @{[vaesdm_vs $V24, $V9]} # with round key w[32,35] + @{[vaesdm_vs $V24, $V8]} # with round key w[28,31] + @{[vaesdm_vs $V24, $V7]} # with round key w[24,27] + @{[vaesdm_vs $V24, $V6]} # with round key w[20,23] + @{[vaesdm_vs $V24, $V5]} # with round key w[16,19] + @{[vaesdm_vs $V24, $V4]} # with round key w[12,15] + @{[vaesdm_vs $V24, $V3]} # with round key w[ 8,11] + @{[vaesdm_vs $V24, $V2]} # with round key w[ 4, 7] + @{[vaesdf_vs $V24, $V1]} # with round key w[ 0, 3] +___ + + return $code; +} + +# aes-256 encryption with round keys v1-v15 +sub aes_256_encrypt { + my $code=<<___; + @{[vaesz_vs $V24, $V1]} # with round key w[ 0, 3] + @{[vaesem_vs $V24, $V2]} # with round key w[ 4, 7] + @{[vaesem_vs $V24, $V3]} # with round key w[ 8,11] + @{[vaesem_vs $V24, $V4]} # with round key w[12,15] + @{[vaesem_vs $V24, $V5]} # with round key w[16,19] + @{[vaesem_vs $V24, $V6]} # with round key w[20,23] + @{[vaesem_vs $V24, $V7]} # with round key w[24,27] + @{[vaesem_vs $V24, $V8]} # with round key w[28,31] + @{[vaesem_vs $V24, $V9]} # with round key w[32,35] + @{[vaesem_vs $V24, $V10]} # with round key w[36,39] + @{[vaesem_vs $V24, $V11]} # with round key w[40,43] + @{[vaesem_vs $V24, $V12]} # with round key w[44,47] + @{[vaesem_vs $V24, $V13]} # with round key w[48,51] + @{[vaesem_vs $V24, $V14]} # with round key w[52,55] + @{[vaesef_vs $V24, $V15]} # with round key w[56,59] +___ + + return $code; +} + +# aes-256 decryption with round keys v1-v15 +sub aes_256_decrypt { + my $code=<<___; + @{[vaesz_vs $V24, $V15]} # with round key w[56,59] + @{[vaesdm_vs $V24, $V14]} # with round key w[52,55] + @{[vaesdm_vs $V24, $V13]} # with round key w[48,51] + @{[vaesdm_vs $V24, $V12]} # with round key w[44,47] + @{[vaesdm_vs $V24, $V11]} # with round key w[40,43] + @{[vaesdm_vs $V24, $V10]} # with round key w[36,39] + @{[vaesdm_vs $V24, $V9]} # with round key w[32,35] + @{[vaesdm_vs $V24, $V8]} # with round key w[28,31] + @{[vaesdm_vs $V24, $V7]} # with round key w[24,27] + @{[vaesdm_vs $V24, $V6]} # with round key w[20,23] + @{[vaesdm_vs $V24, $V5]} # with round key w[16,19] + @{[vaesdm_vs $V24, $V4]} # with round key w[12,15] + @{[vaesdm_vs $V24, $V3]} # with round key w[ 8,11] + @{[vaesdm_vs $V24, $V2]} # with round key w[ 4, 7] + @{[vaesdf_vs $V24, $V1]} # with round key w[ 0, 3] +___ + + return $code; +} + +{ +############################################################################### +# void rv64i_zvkned_cbc_encrypt(const unsigned char *in, unsigned char *out, +# size_t length, const AES_KEY *key, +# unsigned char *ivec, const int enc); +my ($INP, $OUTP, $LEN, $KEYP, $IVP, $ENC) = ("a0", "a1", "a2", "a3", "a4", "a5"); +my ($T0, $T1, $ROUNDS) = ("t0", "t1", "t2"); + +$code .= <<___; +.p2align 3 +.globl rv64i_zvkned_cbc_encrypt +.type rv64i_zvkned_cbc_encrypt,\@function +rv64i_zvkned_cbc_encrypt: + # check whether the length is a multiple of 16 and >= 16 + li $T1, 16 + blt $LEN, $T1, L_end + andi $T1, $LEN, 15 + bnez $T1, L_end + + # Load number of rounds + lwu $ROUNDS, 240($KEYP) + + # Get proper routine for key size + li $T0, 10 + beq $ROUNDS, $T0, L_cbc_enc_128 + + li $T0, 12 + beq $ROUNDS, $T0, L_cbc_enc_192 + + li $T0, 14 + beq $ROUNDS, $T0, L_cbc_enc_256 + + ret +.size rv64i_zvkned_cbc_encrypt,.-rv64i_zvkned_cbc_encrypt +___ + +$code .= <<___; +.p2align 3 +L_cbc_enc_128: + # Load all 11 round keys to v1-v11 registers. + @{[aes_128_load_key $KEYP]} + + # Load IV. + @{[vle32_v $V16, $IVP]} + + @{[vle32_v $V24, $INP]} + @{[vxor_vv $V24, $V24, $V16]} + j 2f + +1: + @{[vle32_v $V17, $INP]} + @{[vxor_vv $V24, $V24, $V17]} + +2: + # AES body + @{[aes_128_encrypt]} + + @{[vse32_v $V24, $OUTP]} + + addi $INP, $INP, 16 + addi $OUTP, $OUTP, 16 + addi $LEN, $LEN, -16 + + bnez $LEN, 1b + + @{[vse32_v $V24, $IVP]} + + ret +.size L_cbc_enc_128,.-L_cbc_enc_128 +___ + +$code .= <<___; +.p2align 3 +L_cbc_enc_192: + # Load all 13 round keys to v1-v13 registers. + @{[aes_192_load_key $KEYP]} + + # Load IV. + @{[vle32_v $V16, $IVP]} + + @{[vle32_v $V24, $INP]} + @{[vxor_vv $V24, $V24, $V16]} + j 2f + +1: + @{[vle32_v $V17, $INP]} + @{[vxor_vv $V24, $V24, $V17]} + +2: + # AES body + @{[aes_192_encrypt]} + + @{[vse32_v $V24, $OUTP]} + + addi $INP, $INP, 16 + addi $OUTP, $OUTP, 16 + addi $LEN, $LEN, -16 + + bnez $LEN, 1b + + @{[vse32_v $V24, $IVP]} + + ret +.size L_cbc_enc_192,.-L_cbc_enc_192 +___ + +$code .= <<___; +.p2align 3 +L_cbc_enc_256: + # Load all 15 round keys to v1-v15 registers. + @{[aes_256_load_key $KEYP]} + + # Load IV. + @{[vle32_v $V16, $IVP]} + + @{[vle32_v $V24, $INP]} + @{[vxor_vv $V24, $V24, $V16]} + j 2f + +1: + @{[vle32_v $V17, $INP]} + @{[vxor_vv $V24, $V24, $V17]} + +2: + # AES body + @{[aes_256_encrypt]} + + @{[vse32_v $V24, $OUTP]} + + addi $INP, $INP, 16 + addi $OUTP, $OUTP, 16 + addi $LEN, $LEN, -16 + + bnez $LEN, 1b + + @{[vse32_v $V24, $IVP]} + + ret +.size L_cbc_enc_256,.-L_cbc_enc_256 +___ + +############################################################################### +# void rv64i_zvkned_cbc_decrypt(const unsigned char *in, unsigned char *out, +# size_t length, const AES_KEY *key, +# unsigned char *ivec, const int enc); +$code .= <<___; +.p2align 3 +.globl rv64i_zvkned_cbc_decrypt +.type rv64i_zvkned_cbc_decrypt,\@function +rv64i_zvkned_cbc_decrypt: + # check whether the length is a multiple of 16 and >= 16 + li $T1, 16 + blt $LEN, $T1, L_end + andi $T1, $LEN, 15 + bnez $T1, L_end + + # Load number of rounds + lwu $ROUNDS, 240($KEYP) + + # Get proper routine for key size + li $T0, 10 + beq $ROUNDS, $T0, L_cbc_dec_128 + + li $T0, 12 + beq $ROUNDS, $T0, L_cbc_dec_192 + + li $T0, 14 + beq $ROUNDS, $T0, L_cbc_dec_256 + + ret +.size rv64i_zvkned_cbc_decrypt,.-rv64i_zvkned_cbc_decrypt +___ + +$code .= <<___; +.p2align 3 +L_cbc_dec_128: + # Load all 11 round keys to v1-v11 registers. + @{[aes_128_load_key $KEYP]} + + # Load IV. + @{[vle32_v $V16, $IVP]} + + @{[vle32_v $V24, $INP]} + @{[vmv_v_v $V17, $V24]} + j 2f + +1: + @{[vle32_v $V24, $INP]} + @{[vmv_v_v $V17, $V24]} + addi $OUTP, $OUTP, 16 + +2: + # AES body + @{[aes_128_decrypt]} + + @{[vxor_vv $V24, $V24, $V16]} + @{[vse32_v $V24, $OUTP]} + @{[vmv_v_v $V16, $V17]} + + addi $LEN, $LEN, -16 + addi $INP, $INP, 16 + + bnez $LEN, 1b + + @{[vse32_v $V16, $IVP]} + + ret +.size L_cbc_dec_128,.-L_cbc_dec_128 +___ + +$code .= <<___; +.p2align 3 +L_cbc_dec_192: + # Load all 13 round keys to v1-v13 registers. + @{[aes_192_load_key $KEYP]} + + # Load IV. + @{[vle32_v $V16, $IVP]} + + @{[vle32_v $V24, $INP]} + @{[vmv_v_v $V17, $V24]} + j 2f + +1: + @{[vle32_v $V24, $INP]} + @{[vmv_v_v $V17, $V24]} + addi $OUTP, $OUTP, 16 + +2: + # AES body + @{[aes_192_decrypt]} + + @{[vxor_vv $V24, $V24, $V16]} + @{[vse32_v $V24, $OUTP]} + @{[vmv_v_v $V16, $V17]} + + addi $LEN, $LEN, -16 + addi $INP, $INP, 16 + + bnez $LEN, 1b + + @{[vse32_v $V16, $IVP]} + + ret +.size L_cbc_dec_192,.-L_cbc_dec_192 +___ + +$code .= <<___; +.p2align 3 +L_cbc_dec_256: + # Load all 15 round keys to v1-v15 registers. + @{[aes_256_load_key $KEYP]} + + # Load IV. + @{[vle32_v $V16, $IVP]} + + @{[vle32_v $V24, $INP]} + @{[vmv_v_v $V17, $V24]} + j 2f + +1: + @{[vle32_v $V24, $INP]} + @{[vmv_v_v $V17, $V24]} + addi $OUTP, $OUTP, 16 + +2: + # AES body + @{[aes_256_decrypt]} + + @{[vxor_vv $V24, $V24, $V16]} + @{[vse32_v $V24, $OUTP]} + @{[vmv_v_v $V16, $V17]} + + addi $LEN, $LEN, -16 + addi $INP, $INP, 16 + + bnez $LEN, 1b + + @{[vse32_v $V16, $IVP]} + + ret +.size L_cbc_dec_256,.-L_cbc_dec_256 +___ +} + +{ +############################################################################### +# void rv64i_zvkned_ecb_encrypt(const unsigned char *in, unsigned char *out, +# size_t length, const AES_KEY *key, +# const int enc); +my ($INP, $OUTP, $LEN, $KEYP, $ENC) = ("a0", "a1", "a2", "a3", "a4"); +my ($REMAIN_LEN) = ("a5"); +my ($VL) = ("a6"); +my ($T0, $T1, $ROUNDS) = ("t0", "t1", "t2"); +my ($LEN32) = ("t3"); + +$code .= <<___; +.p2align 3 +.globl rv64i_zvkned_ecb_encrypt +.type rv64i_zvkned_ecb_encrypt,\@function +rv64i_zvkned_ecb_encrypt: + # Make the LEN become e32 length. + srli $LEN32, $LEN, 2 + + # Load number of rounds + lwu $ROUNDS, 240($KEYP) + + # Get proper routine for key size + li $T0, 10 + beq $ROUNDS, $T0, L_ecb_enc_128 + + li $T0, 12 + beq $ROUNDS, $T0, L_ecb_enc_192 + + li $T0, 14 + beq $ROUNDS, $T0, L_ecb_enc_256 + + ret +.size rv64i_zvkned_ecb_encrypt,.-rv64i_zvkned_ecb_encrypt +___ + +$code .= <<___; +.p2align 3 +L_ecb_enc_128: + # Load all 11 round keys to v1-v11 registers. + @{[aes_128_load_key $KEYP]} + +1: + @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]} + slli $T0, $VL, 2 + sub $LEN32, $LEN32, $VL + + @{[vle32_v $V24, $INP]} + + # AES body + @{[aes_128_encrypt]} + + @{[vse32_v $V24, $OUTP]} + + add $INP, $INP, $T0 + add $OUTP, $OUTP, $T0 + + bnez $LEN32, 1b + + ret +.size L_ecb_enc_128,.-L_ecb_enc_128 +___ + +$code .= <<___; +.p2align 3 +L_ecb_enc_192: + # Load all 13 round keys to v1-v13 registers. + @{[aes_192_load_key $KEYP]} + +1: + @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]} + slli $T0, $VL, 2 + sub $LEN32, $LEN32, $VL + + @{[vle32_v $V24, $INP]} + + # AES body + @{[aes_192_encrypt]} + + @{[vse32_v $V24, $OUTP]} + + add $INP, $INP, $T0 + add $OUTP, $OUTP, $T0 + + bnez $LEN32, 1b + + ret +.size L_ecb_enc_192,.-L_ecb_enc_192 +___ + +$code .= <<___; +.p2align 3 +L_ecb_enc_256: + # Load all 15 round keys to v1-v15 registers. + @{[aes_256_load_key $KEYP]} + +1: + @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]} + slli $T0, $VL, 2 + sub $LEN32, $LEN32, $VL + + @{[vle32_v $V24, $INP]} + + # AES body + @{[aes_256_encrypt]} + + @{[vse32_v $V24, $OUTP]} + + add $INP, $INP, $T0 + add $OUTP, $OUTP, $T0 + + bnez $LEN32, 1b + + ret +.size L_ecb_enc_256,.-L_ecb_enc_256 +___ + +############################################################################### +# void rv64i_zvkned_ecb_decrypt(const unsigned char *in, unsigned char *out, +# size_t length, const AES_KEY *key, +# const int enc); +$code .= <<___; +.p2align 3 +.globl rv64i_zvkned_ecb_decrypt +.type rv64i_zvkned_ecb_decrypt,\@function +rv64i_zvkned_ecb_decrypt: + # Make the LEN become e32 length. + srli $LEN32, $LEN, 2 + + # Load number of rounds + lwu $ROUNDS, 240($KEYP) + + # Get proper routine for key size + li $T0, 10 + beq $ROUNDS, $T0, L_ecb_dec_128 + + li $T0, 12 + beq $ROUNDS, $T0, L_ecb_dec_192 + + li $T0, 14 + beq $ROUNDS, $T0, L_ecb_dec_256 + + ret +.size rv64i_zvkned_ecb_decrypt,.-rv64i_zvkned_ecb_decrypt +___ + +$code .= <<___; +.p2align 3 +L_ecb_dec_128: + # Load all 11 round keys to v1-v11 registers. + @{[aes_128_load_key $KEYP]} + +1: + @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]} + slli $T0, $VL, 2 + sub $LEN32, $LEN32, $VL + + @{[vle32_v $V24, $INP]} + + # AES body + @{[aes_128_decrypt]} + + @{[vse32_v $V24, $OUTP]} + + add $INP, $INP, $T0 + add $OUTP, $OUTP, $T0 + + bnez $LEN32, 1b + + ret +.size L_ecb_dec_128,.-L_ecb_dec_128 +___ + +$code .= <<___; +.p2align 3 +L_ecb_dec_192: + # Load all 13 round keys to v1-v13 registers. + @{[aes_192_load_key $KEYP]} + +1: + @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]} + slli $T0, $VL, 2 + sub $LEN32, $LEN32, $VL + + @{[vle32_v $V24, $INP]} + + # AES body + @{[aes_192_decrypt]} + + @{[vse32_v $V24, $OUTP]} + + add $INP, $INP, $T0 + add $OUTP, $OUTP, $T0 + + bnez $LEN32, 1b + + ret +.size L_ecb_dec_192,.-L_ecb_dec_192 +___ + +$code .= <<___; +.p2align 3 +L_ecb_dec_256: + # Load all 15 round keys to v1-v15 registers. + @{[aes_256_load_key $KEYP]} + +1: + @{[vsetvli $VL, $LEN32, "e32", "m4", "ta", "ma"]} + slli $T0, $VL, 2 + sub $LEN32, $LEN32, $VL + + @{[vle32_v $V24, $INP]} + + # AES body + @{[aes_256_decrypt]} + + @{[vse32_v $V24, $OUTP]} + + add $INP, $INP, $T0 + add $OUTP, $OUTP, $T0 + + bnez $LEN32, 1b + + ret +.size L_ecb_dec_256,.-L_ecb_dec_256 +___ +} + { ################################################################################ # void rv64i_zvkned_encrypt(const unsigned char *in, unsigned char *out, @@ -98,42 +845,42 @@ $code .= <<___; L_enc_128: @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} - @{[vle32_v $V1, ($INP)]} + @{[vle32_v $V1, $INP]} - @{[vle32_v $V10, ($KEYP)]} + @{[vle32_v $V10, $KEYP]} @{[vaesz_vs $V1, $V10]} # with round key w[ 0, 3] addi $KEYP, $KEYP, 16 - @{[vle32_v $V11, ($KEYP)]} + @{[vle32_v $V11, $KEYP]} @{[vaesem_vs $V1, $V11]} # with round key w[ 4, 7] addi $KEYP, $KEYP, 16 - @{[vle32_v $V12, ($KEYP)]} + @{[vle32_v $V12, $KEYP]} @{[vaesem_vs $V1, $V12]} # with round key w[ 8,11] addi $KEYP, $KEYP, 16 - @{[vle32_v $V13, ($KEYP)]} + @{[vle32_v $V13, $KEYP]} @{[vaesem_vs $V1, $V13]} # with round key w[12,15] addi $KEYP, $KEYP, 16 - @{[vle32_v $V14, ($KEYP)]} + @{[vle32_v $V14, $KEYP]} @{[vaesem_vs $V1, $V14]} # with round key w[16,19] addi $KEYP, $KEYP, 16 - @{[vle32_v $V15, ($KEYP)]} + @{[vle32_v $V15, $KEYP]} @{[vaesem_vs $V1, $V15]} # with round key w[20,23] addi $KEYP, $KEYP, 16 - @{[vle32_v $V16, ($KEYP)]} + @{[vle32_v $V16, $KEYP]} @{[vaesem_vs $V1, $V16]} # with round key w[24,27] addi $KEYP, $KEYP, 16 - @{[vle32_v $V17, ($KEYP)]} + @{[vle32_v $V17, $KEYP]} @{[vaesem_vs $V1, $V17]} # with round key w[28,31] addi $KEYP, $KEYP, 16 - @{[vle32_v $V18, ($KEYP)]} + @{[vle32_v $V18, $KEYP]} @{[vaesem_vs $V1, $V18]} # with round key w[32,35] addi $KEYP, $KEYP, 16 - @{[vle32_v $V19, ($KEYP)]} + @{[vle32_v $V19, $KEYP]} @{[vaesem_vs $V1, $V19]} # with round key w[36,39] addi $KEYP, $KEYP, 16 - @{[vle32_v $V20, ($KEYP)]} + @{[vle32_v $V20, $KEYP]} @{[vaesef_vs $V1, $V20]} # with round key w[40,43] - @{[vse32_v $V1, ($OUTP)]} + @{[vse32_v $V1, $OUTP]} ret .size L_enc_128,.-L_enc_128 @@ -144,48 +891,48 @@ $code .= <<___; L_enc_192: @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} - @{[vle32_v $V1, ($INP)]} + @{[vle32_v $V1, $INP]} - @{[vle32_v $V10, ($KEYP)]} + @{[vle32_v $V10, $KEYP]} @{[vaesz_vs $V1, $V10]} # with round key w[ 0, 3] addi $KEYP, $KEYP, 16 - @{[vle32_v $V11, ($KEYP)]} + @{[vle32_v $V11, $KEYP]} @{[vaesem_vs $V1, $V11]} addi $KEYP, $KEYP, 16 - @{[vle32_v $V12, ($KEYP)]} + @{[vle32_v $V12, $KEYP]} @{[vaesem_vs $V1, $V12]} addi $KEYP, $KEYP, 16 - @{[vle32_v $V13, ($KEYP)]} + @{[vle32_v $V13, $KEYP]} @{[vaesem_vs $V1, $V13]} addi $KEYP, $KEYP, 16 - @{[vle32_v $V14, ($KEYP)]} + @{[vle32_v $V14, $KEYP]} @{[vaesem_vs $V1, $V14]} addi $KEYP, $KEYP, 16 - @{[vle32_v $V15, ($KEYP)]} + @{[vle32_v $V15, $KEYP]} @{[vaesem_vs $V1, $V15]} addi $KEYP, $KEYP, 16 - @{[vle32_v $V16, ($KEYP)]} + @{[vle32_v $V16, $KEYP]} @{[vaesem_vs $V1, $V16]} addi $KEYP, $KEYP, 16 - @{[vle32_v $V17, ($KEYP)]} + @{[vle32_v $V17, $KEYP]} @{[vaesem_vs $V1, $V17]} addi $KEYP, $KEYP, 16 - @{[vle32_v $V18, ($KEYP)]} + @{[vle32_v $V18, $KEYP]} @{[vaesem_vs $V1, $V18]} addi $KEYP, $KEYP, 16 - @{[vle32_v $V19, ($KEYP)]} + @{[vle32_v $V19, $KEYP]} @{[vaesem_vs $V1, $V19]} addi $KEYP, $KEYP, 16 - @{[vle32_v $V20, ($KEYP)]} + @{[vle32_v $V20, $KEYP]} @{[vaesem_vs $V1, $V20]} addi $KEYP, $KEYP, 16 - @{[vle32_v $V21, ($KEYP)]} + @{[vle32_v $V21, $KEYP]} @{[vaesem_vs $V1, $V21]} addi $KEYP, $KEYP, 16 - @{[vle32_v $V22, ($KEYP)]} + @{[vle32_v $V22, $KEYP]} @{[vaesef_vs $V1, $V22]} - @{[vse32_v $V1, ($OUTP)]} + @{[vse32_v $V1, $OUTP]} ret .size L_enc_192,.-L_enc_192 ___ @@ -195,54 +942,54 @@ $code .= <<___; L_enc_256: @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} - @{[vle32_v $V1, ($INP)]} + @{[vle32_v $V1, $INP]} - @{[vle32_v $V10, ($KEYP)]} + @{[vle32_v $V10, $KEYP]} @{[vaesz_vs $V1, $V10]} # with round key w[ 0, 3] addi $KEYP, $KEYP, 16 - @{[vle32_v $V11, ($KEYP)]} + @{[vle32_v $V11, $KEYP]} @{[vaesem_vs $V1, $V11]} addi $KEYP, $KEYP, 16 - @{[vle32_v $V12, ($KEYP)]} + @{[vle32_v $V12, $KEYP]} @{[vaesem_vs $V1, $V12]} addi $KEYP, $KEYP, 16 - @{[vle32_v $V13, ($KEYP)]} + @{[vle32_v $V13, $KEYP]} @{[vaesem_vs $V1, $V13]} addi $KEYP, $KEYP, 16 - @{[vle32_v $V14, ($KEYP)]} + @{[vle32_v $V14, $KEYP]} @{[vaesem_vs $V1, $V14]} addi $KEYP, $KEYP, 16 - @{[vle32_v $V15, ($KEYP)]} + @{[vle32_v $V15, $KEYP]} @{[vaesem_vs $V1, $V15]} addi $KEYP, $KEYP, 16 - @{[vle32_v $V16, ($KEYP)]} + @{[vle32_v $V16, $KEYP]} @{[vaesem_vs $V1, $V16]} addi $KEYP, $KEYP, 16 - @{[vle32_v $V17, ($KEYP)]} + @{[vle32_v $V17, $KEYP]} @{[vaesem_vs $V1, $V17]} addi $KEYP, $KEYP, 16 - @{[vle32_v $V18, ($KEYP)]} + @{[vle32_v $V18, $KEYP]} @{[vaesem_vs $V1, $V18]} addi $KEYP, $KEYP, 16 - @{[vle32_v $V19, ($KEYP)]} + @{[vle32_v $V19, $KEYP]} @{[vaesem_vs $V1, $V19]} addi $KEYP, $KEYP, 16 - @{[vle32_v $V20, ($KEYP)]} + @{[vle32_v $V20, $KEYP]} @{[vaesem_vs $V1, $V20]} addi $KEYP, $KEYP, 16 - @{[vle32_v $V21, ($KEYP)]} + @{[vle32_v $V21, $KEYP]} @{[vaesem_vs $V1, $V21]} addi $KEYP, $KEYP, 16 - @{[vle32_v $V22, ($KEYP)]} + @{[vle32_v $V22, $KEYP]} @{[vaesem_vs $V1, $V22]} addi $KEYP, $KEYP, 16 - @{[vle32_v $V23, ($KEYP)]} + @{[vle32_v $V23, $KEYP]} @{[vaesem_vs $V1, $V23]} addi $KEYP, $KEYP, 16 - @{[vle32_v $V24, ($KEYP)]} + @{[vle32_v $V24, $KEYP]} @{[vaesef_vs $V1, $V24]} - @{[vse32_v $V1, ($OUTP)]} + @{[vse32_v $V1, $OUTP]} ret .size L_enc_256,.-L_enc_256 ___ @@ -275,43 +1022,43 @@ $code .= <<___; L_dec_128: @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} - @{[vle32_v $V1, ($INP)]} + @{[vle32_v $V1, $INP]} addi $KEYP, $KEYP, 160 - @{[vle32_v $V20, ($KEYP)]} + @{[vle32_v $V20, $KEYP]} @{[vaesz_vs $V1, $V20]} # with round key w[40,43] addi $KEYP, $KEYP, -16 - @{[vle32_v $V19, ($KEYP)]} + @{[vle32_v $V19, $KEYP]} @{[vaesdm_vs $V1, $V19]} # with round key w[36,39] addi $KEYP, $KEYP, -16 - @{[vle32_v $V18, ($KEYP)]} + @{[vle32_v $V18, $KEYP]} @{[vaesdm_vs $V1, $V18]} # with round key w[32,35] addi $KEYP, $KEYP, -16 - @{[vle32_v $V17, ($KEYP)]} + @{[vle32_v $V17, $KEYP]} @{[vaesdm_vs $V1, $V17]} # with round key w[28,31] addi $KEYP, $KEYP, -16 - @{[vle32_v $V16, ($KEYP)]} + @{[vle32_v $V16, $KEYP]} @{[vaesdm_vs $V1, $V16]} # with round key w[24,27] addi $KEYP, $KEYP, -16 - @{[vle32_v $V15, ($KEYP)]} + @{[vle32_v $V15, $KEYP]} @{[vaesdm_vs $V1, $V15]} # with round key w[20,23] addi $KEYP, $KEYP, -16 - @{[vle32_v $V14, ($KEYP)]} + @{[vle32_v $V14, $KEYP]} @{[vaesdm_vs $V1, $V14]} # with round key w[16,19] addi $KEYP, $KEYP, -16 - @{[vle32_v $V13, ($KEYP)]} + @{[vle32_v $V13, $KEYP]} @{[vaesdm_vs $V1, $V13]} # with round key w[12,15] addi $KEYP, $KEYP, -16 - @{[vle32_v $V12, ($KEYP)]} + @{[vle32_v $V12, $KEYP]} @{[vaesdm_vs $V1, $V12]} # with round key w[ 8,11] addi $KEYP, $KEYP, -16 - @{[vle32_v $V11, ($KEYP)]} + @{[vle32_v $V11, $KEYP]} @{[vaesdm_vs $V1, $V11]} # with round key w[ 4, 7] addi $KEYP, $KEYP, -16 - @{[vle32_v $V10, ($KEYP)]} + @{[vle32_v $V10, $KEYP]} @{[vaesdf_vs $V1, $V10]} # with round key w[ 0, 3] - @{[vse32_v $V1, ($OUTP)]} + @{[vse32_v $V1, $OUTP]} ret .size L_dec_128,.-L_dec_128 @@ -322,49 +1069,49 @@ $code .= <<___; L_dec_192: @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} - @{[vle32_v $V1, ($INP)]} + @{[vle32_v $V1, $INP]} addi $KEYP, $KEYP, 192 - @{[vle32_v $V22, ($KEYP)]} + @{[vle32_v $V22, $KEYP]} @{[vaesz_vs $V1, $V22]} # with round key w[48,51] addi $KEYP, $KEYP, -16 - @{[vle32_v $V21, ($KEYP)]} + @{[vle32_v $V21, $KEYP]} @{[vaesdm_vs $V1, $V21]} # with round key w[44,47] addi $KEYP, $KEYP, -16 - @{[vle32_v $V20, ($KEYP)]} + @{[vle32_v $V20, $KEYP]} @{[vaesdm_vs $V1, $V20]} # with round key w[40,43] addi $KEYP, $KEYP, -16 - @{[vle32_v $V19, ($KEYP)]} + @{[vle32_v $V19, $KEYP]} @{[vaesdm_vs $V1, $V19]} # with round key w[36,39] addi $KEYP, $KEYP, -16 - @{[vle32_v $V18, ($KEYP)]} + @{[vle32_v $V18, $KEYP]} @{[vaesdm_vs $V1, $V18]} # with round key w[32,35] addi $KEYP, $KEYP, -16 - @{[vle32_v $V17, ($KEYP)]} + @{[vle32_v $V17, $KEYP]} @{[vaesdm_vs $V1, $V17]} # with round key w[28,31] addi $KEYP, $KEYP, -16 - @{[vle32_v $V16, ($KEYP)]} + @{[vle32_v $V16, $KEYP]} @{[vaesdm_vs $V1, $V16]} # with round key w[24,27] addi $KEYP, $KEYP, -16 - @{[vle32_v $V15, ($KEYP)]} + @{[vle32_v $V15, $KEYP]} @{[vaesdm_vs $V1, $V15]} # with round key w[20,23] addi $KEYP, $KEYP, -16 - @{[vle32_v $V14, ($KEYP)]} + @{[vle32_v $V14, $KEYP]} @{[vaesdm_vs $V1, $V14]} # with round key w[16,19] addi $KEYP, $KEYP, -16 - @{[vle32_v $V13, ($KEYP)]} + @{[vle32_v $V13, $KEYP]} @{[vaesdm_vs $V1, $V13]} # with round key w[12,15] addi $KEYP, $KEYP, -16 - @{[vle32_v $V12, ($KEYP)]} + @{[vle32_v $V12, $KEYP]} @{[vaesdm_vs $V1, $V12]} # with round key w[ 8,11] addi $KEYP, $KEYP, -16 - @{[vle32_v $V11, ($KEYP)]} + @{[vle32_v $V11, $KEYP]} @{[vaesdm_vs $V1, $V11]} # with round key w[ 4, 7] addi $KEYP, $KEYP, -16 - @{[vle32_v $V10, ($KEYP)]} + @{[vle32_v $V10, $KEYP]} @{[vaesdf_vs $V1, $V10]} # with round key w[ 0, 3] - @{[vse32_v $V1, ($OUTP)]} + @{[vse32_v $V1, $OUTP]} ret .size L_dec_192,.-L_dec_192 @@ -375,55 +1122,55 @@ $code .= <<___; L_dec_256: @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} - @{[vle32_v $V1, ($INP)]} + @{[vle32_v $V1, $INP]} addi $KEYP, $KEYP, 224 - @{[vle32_v $V24, ($KEYP)]} + @{[vle32_v $V24, $KEYP]} @{[vaesz_vs $V1, $V24]} # with round key w[56,59] addi $KEYP, $KEYP, -16 - @{[vle32_v $V23, ($KEYP)]} + @{[vle32_v $V23, $KEYP]} @{[vaesdm_vs $V1, $V23]} # with round key w[52,55] addi $KEYP, $KEYP, -16 - @{[vle32_v $V22, ($KEYP)]} + @{[vle32_v $V22, $KEYP]} @{[vaesdm_vs $V1, $V22]} # with round key w[48,51] addi $KEYP, $KEYP, -16 - @{[vle32_v $V21, ($KEYP)]} + @{[vle32_v $V21, $KEYP]} @{[vaesdm_vs $V1, $V21]} # with round key w[44,47] addi $KEYP, $KEYP, -16 - @{[vle32_v $V20, ($KEYP)]} + @{[vle32_v $V20, $KEYP]} @{[vaesdm_vs $V1, $V20]} # with round key w[40,43] addi $KEYP, $KEYP, -16 - @{[vle32_v $V19, ($KEYP)]} + @{[vle32_v $V19, $KEYP]} @{[vaesdm_vs $V1, $V19]} # with round key w[36,39] addi $KEYP, $KEYP, -16 - @{[vle32_v $V18, ($KEYP)]} + @{[vle32_v $V18, $KEYP]} @{[vaesdm_vs $V1, $V18]} # with round key w[32,35] addi $KEYP, $KEYP, -16 - @{[vle32_v $V17, ($KEYP)]} + @{[vle32_v $V17, $KEYP]} @{[vaesdm_vs $V1, $V17]} # with round key w[28,31] addi $KEYP, $KEYP, -16 - @{[vle32_v $V16, ($KEYP)]} + @{[vle32_v $V16, $KEYP]} @{[vaesdm_vs $V1, $V16]} # with round key w[24,27] addi $KEYP, $KEYP, -16 - @{[vle32_v $V15, ($KEYP)]} + @{[vle32_v $V15, $KEYP]} @{[vaesdm_vs $V1, $V15]} # with round key w[20,23] addi $KEYP, $KEYP, -16 - @{[vle32_v $V14, ($KEYP)]} + @{[vle32_v $V14, $KEYP]} @{[vaesdm_vs $V1, $V14]} # with round key w[16,19] addi $KEYP, $KEYP, -16 - @{[vle32_v $V13, ($KEYP)]} + @{[vle32_v $V13, $KEYP]} @{[vaesdm_vs $V1, $V13]} # with round key w[12,15] addi $KEYP, $KEYP, -16 - @{[vle32_v $V12, ($KEYP)]} + @{[vle32_v $V12, $KEYP]} @{[vaesdm_vs $V1, $V12]} # with round key w[ 8,11] addi $KEYP, $KEYP, -16 - @{[vle32_v $V11, ($KEYP)]} + @{[vle32_v $V11, $KEYP]} @{[vaesdm_vs $V1, $V11]} # with round key w[ 4, 7] addi $KEYP, $KEYP, -16 - @{[vle32_v $V10, ($KEYP)]} + @{[vle32_v $V10, $KEYP]} @{[vaesdf_vs $V1, $V10]} # with round key w[ 0, 3] - @{[vse32_v $V1, ($OUTP)]} + @{[vse32_v $V1, $OUTP]} ret .size L_dec_256,.-L_dec_256 From patchwork Wed Oct 25 18:36:39 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Jerry Shih X-Patchwork-Id: 13436520 X-Patchwork-Delegate: herbert@gondor.apana.org.au Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4DC07C07545 for ; Wed, 25 Oct 2023 18:38:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235023AbjJYSh7 (ORCPT ); Wed, 25 Oct 2023 14:37:59 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42902 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234969AbjJYSha (ORCPT ); Wed, 25 Oct 2023 14:37:30 -0400 Received: from mail-pg1-x531.google.com (mail-pg1-x531.google.com [IPv6:2607:f8b0:4864:20::531]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E92151AB for ; Wed, 25 Oct 2023 11:37:24 -0700 (PDT) Received: by mail-pg1-x531.google.com with SMTP id 41be03b00d2f7-578b4997decso90695a12.0 for ; Wed, 25 Oct 2023 11:37:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sifive.com; s=google; t=1698259044; x=1698863844; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=l3sIOyZ5p5Uajadw8K50qTKq2/dD0THvRCAR8HIfH6Y=; b=XIUuc6JC9IEcYhaNy7CMwVUskbHCU2XMO8aw1FKHweCDXRt1gf9UclnzkTl7biWcFA YCzDDfCL19ZNAZai1w1AYpG8f8jaEoHBuYBg1jPjLFGFqgjyiiYTryK0rxu5jcoR2Kmg vPdzMRDF0olZgEb+szIZmbq2QaYbxGcny7TADBdPazTqR1uod/AtaiedZd2c9gLFrLPU l6RmudDLPYQqSGHFMlxg9jM4A2HsUKYhSM1ms0PtLoqBqeQLA16NTElti29R7BgVVfy2 lDZp/nWfN4gTu4KOHNWa8brX+cyddpdNKrx7mAjzxWtYtc8voUA77FlzlX5QX+w46T96 e5RQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1698259044; x=1698863844; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=l3sIOyZ5p5Uajadw8K50qTKq2/dD0THvRCAR8HIfH6Y=; b=ZMMC934ZTsGW0N3Z59zJF3uiPP4kB0AcQC9UYBYWk13eqau4s6WkZl2vcS7VedM7xo QBbG6JHGH6uK55JIou2rOCDFxayI2BuTEnBdyhR7bNrgzxYNs3I5lJX89MFx8zo0if6M Swgq92B+okJsi8cor5RRZM56eZrP309+hoX+oyrYAZ6DTjDBE/cyxY5D3HAeMTrpgGsC 9ztZqcpWKh2IbPxnp1DCP/ci2DeSjuOGHuouWcJQjD9nheTx0hWEEI+2MknD4KV4jKNJ xzDmfeL8lZeYwGtVw5eNvAMc9Y9TalzWBTve0qL16Z47ydv8TrJ6xcFnqBysL+khun2i oWcQ== X-Gm-Message-State: AOJu0Yyal4ZVRio5gKqfl9aKFf7fdR87BS2WXlKxmtheWWEleoTsNsnv 9p6Nw5YIQT5q3sa4LoGhyPnD9w== X-Google-Smtp-Source: AGHT+IGu8yxKc9K1usDnSzC0aQvB+RBgSE4w4NZky1E/bDehLZ+KuUk4ZVCDlNGg1awjDL3F+ZoeYA== X-Received: by 2002:a17:90b:288e:b0:27d:7f1b:1bec with SMTP id qc14-20020a17090b288e00b0027d7f1b1becmr14664038pjb.35.1698259044137; Wed, 25 Oct 2023 11:37:24 -0700 (PDT) Received: from localhost.localdomain ([49.216.222.119]) by smtp.gmail.com with ESMTPSA id g3-20020a17090adb0300b00278f1512dd9sm212367pjv.32.2023.10.25.11.37.20 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Wed, 25 Oct 2023 11:37:23 -0700 (PDT) From: Jerry Shih To: paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu, herbert@gondor.apana.org.au, davem@davemloft.net Cc: andy.chiu@sifive.com, greentime.hu@sifive.com, conor.dooley@microchip.com, guoren@kernel.org, bjorn@rivosinc.com, heiko@sntech.de, ebiggers@kernel.org, ardb@kernel.org, phoebe.chen@sifive.com, hongrong.hsu@sifive.com, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, linux-crypto@vger.kernel.org Subject: [PATCH 07/12] RISC-V: crypto: add Zvkg accelerated GCM GHASH implementation Date: Thu, 26 Oct 2023 02:36:39 +0800 Message-Id: <20231025183644.8735-8-jerry.shih@sifive.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: <20231025183644.8735-1-jerry.shih@sifive.com> References: <20231025183644.8735-1-jerry.shih@sifive.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org Add a gcm hash implementation using the Zvkg extension from OpenSSL (openssl/openssl#21923). The perlasm here is different from the original implementation in OpenSSL. The OpenSSL assumes that the H is stored in little-endian. Thus, it needs to convert the H to big-endian for Zvkg instructions. In kernel, we have the big-endian H directly. There is no need for endian conversion. Co-developed-by: Christoph Müllner Signed-off-by: Christoph Müllner Co-developed-by: Heiko Stuebner Signed-off-by: Heiko Stuebner Signed-off-by: Jerry Shih --- arch/riscv/crypto/Kconfig | 14 ++ arch/riscv/crypto/Makefile | 7 + arch/riscv/crypto/ghash-riscv64-glue.c | 191 ++++++++++++++++++++++++ arch/riscv/crypto/ghash-riscv64-zvkg.pl | 100 +++++++++++++ 4 files changed, 312 insertions(+) create mode 100644 arch/riscv/crypto/ghash-riscv64-glue.c create mode 100644 arch/riscv/crypto/ghash-riscv64-zvkg.pl diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig index dfa9d0146d26..00be7177eb1e 100644 --- a/arch/riscv/crypto/Kconfig +++ b/arch/riscv/crypto/Kconfig @@ -35,4 +35,18 @@ config CRYPTO_AES_BLOCK_RISCV64 - Zvkg vector crypto extension (XTS) - Zvkned vector crypto extension +config CRYPTO_GHASH_RISCV64 + default y if RISCV_ISA_V + tristate "Hash functions: GHASH" + depends on 64BIT && RISCV_ISA_V + select CRYPTO_GCM + select CRYPTO_GHASH + select CRYPTO_HASH + select CRYPTO_LIB_GF128MUL + help + GCM GHASH function (NIST SP 800-38D) + + Architecture: riscv64 using: + - Zvkg vector crypto extension + endmenu diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile index 42a4e8ec79cf..532316cc1758 100644 --- a/arch/riscv/crypto/Makefile +++ b/arch/riscv/crypto/Makefile @@ -9,6 +9,9 @@ aes-riscv64-y := aes-riscv64-glue.o aes-riscv64-zvkned.o obj-$(CONFIG_CRYPTO_AES_BLOCK_RISCV64) += aes-block-riscv64.o aes-block-riscv64-y := aes-riscv64-block-mode-glue.o aes-riscv64-zvbb-zvkg-zvkned.o aes-riscv64-zvkb-zvkned.o +obj-$(CONFIG_CRYPTO_GHASH_RISCV64) += ghash-riscv64.o +ghash-riscv64-y := ghash-riscv64-glue.o ghash-riscv64-zvkg.o + quiet_cmd_perlasm = PERLASM $@ cmd_perlasm = $(PERL) $(<) void $(@) @@ -21,6 +24,10 @@ $(obj)/aes-riscv64-zvbb-zvkg-zvkned.S: $(src)/aes-riscv64-zvbb-zvkg-zvkned.pl $(obj)/aes-riscv64-zvkb-zvkned.S: $(src)/aes-riscv64-zvkb-zvkned.pl $(call cmd,perlasm) +$(obj)/ghash-riscv64-zvkg.S: $(src)/ghash-riscv64-zvkg.pl + $(call cmd,perlasm) + clean-files += aes-riscv64-zvkned.S clean-files += aes-riscv64-zvbb-zvkg-zvkned.S clean-files += aes-riscv64-zvkb-zvkned.S +clean-files += ghash-riscv64-zvkg.S diff --git a/arch/riscv/crypto/ghash-riscv64-glue.c b/arch/riscv/crypto/ghash-riscv64-glue.c new file mode 100644 index 000000000000..d5b7f0e4f612 --- /dev/null +++ b/arch/riscv/crypto/ghash-riscv64-glue.c @@ -0,0 +1,191 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * RISC-V optimized GHASH routines + * + * Copyright (C) 2023 VRULL GmbH + * Author: Heiko Stuebner + * + * Copyright (C) 2023 SiFive, Inc. + * Author: Jerry Shih + */ + +#include +#include +#include +#include +#include +#include +#include +#include + +/* ghash using zvkg vector crypto extension */ +void gcm_ghash_rv64i_zvkg(be128 *Xi, const be128 *H, const u8 *inp, size_t len); + +struct riscv64_ghash_context { + be128 key; +}; + +struct riscv64_ghash_desc_ctx { + be128 shash; + u8 buffer[GHASH_BLOCK_SIZE]; + u32 bytes; +}; + +typedef void (*ghash_func)(be128 *Xi, const be128 *H, const u8 *inp, + size_t len); + +static inline void ghash_blocks(const struct riscv64_ghash_context *ctx, + struct riscv64_ghash_desc_ctx *dctx, + const u8 *src, size_t srclen, ghash_func func) +{ + if (crypto_simd_usable()) { + kernel_vector_begin(); + func(&dctx->shash, &ctx->key, src, srclen); + kernel_vector_end(); + } else { + while (srclen >= GHASH_BLOCK_SIZE) { + crypto_xor((u8 *)&dctx->shash, src, GHASH_BLOCK_SIZE); + gf128mul_lle(&dctx->shash, &ctx->key); + srclen -= GHASH_BLOCK_SIZE; + src += GHASH_BLOCK_SIZE; + } + } +} + +static int ghash_update(struct shash_desc *desc, const u8 *src, size_t srclen, + ghash_func func) +{ + size_t len; + const struct riscv64_ghash_context *ctx = + crypto_tfm_ctx(crypto_shash_tfm(desc->tfm)); + struct riscv64_ghash_desc_ctx *dctx = shash_desc_ctx(desc); + + if (dctx->bytes) { + if (dctx->bytes + srclen < GHASH_BLOCK_SIZE) { + memcpy(dctx->buffer + dctx->bytes, src, srclen); + dctx->bytes += srclen; + return 0; + } + memcpy(dctx->buffer + dctx->bytes, src, + GHASH_BLOCK_SIZE - dctx->bytes); + + ghash_blocks(ctx, dctx, dctx->buffer, GHASH_BLOCK_SIZE, func); + + src += GHASH_BLOCK_SIZE - dctx->bytes; + srclen -= GHASH_BLOCK_SIZE - dctx->bytes; + dctx->bytes = 0; + } + len = srclen & ~(GHASH_BLOCK_SIZE - 1); + + if (len) { + ghash_blocks(ctx, dctx, src, len, func); + src += len; + srclen -= len; + } + + if (srclen) { + memcpy(dctx->buffer, src, srclen); + dctx->bytes = srclen; + } + + return 0; +} + +static int ghash_final(struct shash_desc *desc, u8 *out, ghash_func func) +{ + const struct riscv64_ghash_context *ctx = + crypto_tfm_ctx(crypto_shash_tfm(desc->tfm)); + struct riscv64_ghash_desc_ctx *dctx = shash_desc_ctx(desc); + int i; + + if (dctx->bytes) { + for (i = dctx->bytes; i < GHASH_BLOCK_SIZE; i++) + dctx->buffer[i] = 0; + + ghash_blocks(ctx, dctx, dctx->buffer, GHASH_BLOCK_SIZE, func); + dctx->bytes = 0; + } + + memcpy(out, &dctx->shash, GHASH_DIGEST_SIZE); + + return 0; +} + +static int ghash_init(struct shash_desc *desc) +{ + struct riscv64_ghash_desc_ctx *dctx = shash_desc_ctx(desc); + + *dctx = (struct riscv64_ghash_desc_ctx){}; + + return 0; +} + +static int ghash_update_zvkg(struct shash_desc *desc, const u8 *src, + unsigned int srclen) +{ + return ghash_update(desc, src, srclen, gcm_ghash_rv64i_zvkg); +} + +static int ghash_final_zvkg(struct shash_desc *desc, u8 *out) +{ + return ghash_final(desc, out, gcm_ghash_rv64i_zvkg); +} + +static int ghash_setkey(struct crypto_shash *tfm, const u8 *key, + unsigned int keylen) +{ + struct riscv64_ghash_context *ctx = + crypto_tfm_ctx(crypto_shash_tfm(tfm)); + + if (keylen != GHASH_BLOCK_SIZE) + return -EINVAL; + + memcpy(&ctx->key, key, GHASH_BLOCK_SIZE); + + return 0; +} + +static struct shash_alg riscv64_ghash_alg_zvkg = { + .digestsize = GHASH_DIGEST_SIZE, + .init = ghash_init, + .update = ghash_update_zvkg, + .final = ghash_final_zvkg, + .setkey = ghash_setkey, + .descsize = sizeof(struct riscv64_ghash_desc_ctx), + .base = { + .cra_name = "ghash", + .cra_driver_name = "ghash-riscv64-zvkg", + .cra_priority = 303, + .cra_blocksize = GHASH_BLOCK_SIZE, + .cra_ctxsize = sizeof(struct riscv64_ghash_context), + .cra_module = THIS_MODULE, + }, +}; + +static inline bool check_ghash_ext(void) +{ + return riscv_isa_extension_available(NULL, ZVKG) && + riscv_vector_vlen() >= 128; +} + +static int __init riscv64_ghash_mod_init(void) +{ + if (check_ghash_ext()) + return crypto_register_shash(&riscv64_ghash_alg_zvkg); + + return -ENODEV; +} + +static void __exit riscv64_ghash_mod_fini(void) +{ + if (check_ghash_ext()) + crypto_unregister_shash(&riscv64_ghash_alg_zvkg); +} + +module_init(riscv64_ghash_mod_init); +module_exit(riscv64_ghash_mod_fini); + +MODULE_DESCRIPTION("GCM GHASH (RISC-V accelerated)"); +MODULE_AUTHOR("Heiko Stuebner "); +MODULE_LICENSE("GPL"); +MODULE_ALIAS_CRYPTO("ghash"); diff --git a/arch/riscv/crypto/ghash-riscv64-zvkg.pl b/arch/riscv/crypto/ghash-riscv64-zvkg.pl new file mode 100644 index 000000000000..4beea4ac9cbe --- /dev/null +++ b/arch/riscv/crypto/ghash-riscv64-zvkg.pl @@ -0,0 +1,100 @@ +#! /usr/bin/env perl +# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause +# +# This file is dual-licensed, meaning that you can use it under your +# choice of either of the following two licenses: +# +# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved. +# +# Licensed under the Apache License 2.0 (the "License"). You can obtain +# a copy in the file LICENSE in the source distribution or at +# https://www.openssl.org/source/license.html +# +# or +# +# Copyright (c) 2023, Christoph Müllner +# Copyright (c) 2023, Jerry Shih +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions +# are met: +# 1. Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# 2. Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +# - RV64I +# - RISC-V Vector ('V') with VLEN >= 128 +# - RISC-V Vector GCM/GMAC extension ('Zvkg') + +use strict; +use warnings; + +use FindBin qw($Bin); +use lib "$Bin"; +use lib "$Bin/../../perlasm"; +use riscv; + +# $output is the last argument if it looks like a file (it has an extension) +# $flavour is the first argument if it doesn't look like a file +my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef; +my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef; + +$output and open STDOUT,">$output"; + +my $code=<<___; +.text +___ + +############################################################################### +# void gcm_ghash_rv64i_zvkg(be128 *Xi, const be128 *H, const u8 *inp, size_t len) +# +# input: Xi: current hash value +# H: hash key +# inp: pointer to input data +# len: length of input data in bytes (multiple of block size) +# output: Xi: Xi+1 (next hash value Xi) +{ +my ($Xi,$H,$inp,$len) = ("a0","a1","a2","a3"); +my ($vXi,$vH,$vinp,$Vzero) = ("v1","v2","v3","v4"); + +$code .= <<___; +.p2align 3 +.globl gcm_ghash_rv64i_zvkg +.type gcm_ghash_rv64i_zvkg,\@function +gcm_ghash_rv64i_zvkg: + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vle32_v $vH, $H]} + @{[vle32_v $vXi, $Xi]} + +Lstep: + @{[vle32_v $vinp, $inp]} + add $inp, $inp, 16 + add $len, $len, -16 + @{[vghsh_vv $vXi, $vH, $vinp]} + bnez $len, Lstep + + @{[vse32_v $vXi, $Xi]} + ret + +.size gcm_ghash_rv64i_zvkg,.-gcm_ghash_rv64i_zvkg +___ +} + +print $code; + +close STDOUT or die "error closing STDOUT: $!"; From patchwork Wed Oct 25 18:36:40 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Jerry Shih X-Patchwork-Id: 13436522 X-Patchwork-Delegate: herbert@gondor.apana.org.au Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3A4BFC07545 for ; Wed, 25 Oct 2023 18:38:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234866AbjJYSiQ (ORCPT ); Wed, 25 Oct 2023 14:38:16 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55298 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234995AbjJYSh5 (ORCPT ); Wed, 25 Oct 2023 14:37:57 -0400 Received: from mail-pg1-x532.google.com (mail-pg1-x532.google.com [IPv6:2607:f8b0:4864:20::532]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 053EBD4D for ; Wed, 25 Oct 2023 11:37:29 -0700 (PDT) Received: by mail-pg1-x532.google.com with SMTP id 41be03b00d2f7-5a9bc2ec556so92389a12.0 for ; Wed, 25 Oct 2023 11:37:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sifive.com; s=google; t=1698259048; x=1698863848; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=GFhsyXyqq7vflrUvLHWxOLv+JErs0KUA8a4MpjUfb6k=; b=Iy0i9U8iuY45MTcsowiH9w/BygERagIoWqpA3o/ALY9Wep+AvQCNMp0SLNyi4MVLIP Xs8TXaMoPvwDPfO5YM742+EIE6Nt1NqRAdnOq3/ErwBX3P0xg2WMQxeJg5p34nNRS3iI v3hMClCj3Y9ggxE0MItkOekJwBkhk9asIMMYQT8ciz57aos7PAMhBcINabBqoNi+ebYB nDXhSwSjn7sLDAMU3+1/Hu48wI+HC3m/AFRt6YiFmGGfitt0SiDB4pVbnS6zLCmD11ca VA47odCHTM+Gn5vMsDwN4UUyL4mvC2VjwLJE+dSa5pDrnhp/q79QjSLMR/4H2ACLR/ZS NhNA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1698259048; x=1698863848; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=GFhsyXyqq7vflrUvLHWxOLv+JErs0KUA8a4MpjUfb6k=; b=SX2zL5YnvEIAEbJCkjh14E+kIncRW4l76lSs6oa9ExCApa96umtadlpdIJgMZMYHGL H8P1gOIAG0vWJ0kLjlPjp6De6DuwX89eq/HldF+gKJaRj7bynqTaRuY4bj7cyi1f/UMu vo4VoRPkUe24sgXBjK2iSDIskuNg6LaPV7Sl3DAtHX3pacMJegyqNuvI18LmNRHODcod 1xAxXm2wbHxMRhz+dh1PrKIDW0mRC+0nB5whisnRmZ6e44AN8F8ogi0QELWcUHBxIJpX 6q/qpnDt0VhjI65Hkjh3xZoJ2/hNgkX2b08c0Q6Z4+6pB7kZJH7qS967Fn0DR/C472Gz pEbg== X-Gm-Message-State: AOJu0YyGOknftYUVV1VBrOO4LIUAVL9jIKMorguh7CkZp2keZHwh62hh Fu8j0jF8VvOY6Cuc6NgkH5jqcw== X-Google-Smtp-Source: AGHT+IEOlkCWlpWa0D8YyaNcxzwVuxyDQKs6Eh/An49WnNeVSvp6OxIlA5f8xkM1IDNcrtptMmj01g== X-Received: by 2002:a17:90a:1a15:b0:27d:b22b:fb89 with SMTP id 21-20020a17090a1a1500b0027db22bfb89mr12033741pjk.35.1698259048196; Wed, 25 Oct 2023 11:37:28 -0700 (PDT) Received: from localhost.localdomain ([49.216.222.119]) by smtp.gmail.com with ESMTPSA id g3-20020a17090adb0300b00278f1512dd9sm212367pjv.32.2023.10.25.11.37.24 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Wed, 25 Oct 2023 11:37:27 -0700 (PDT) From: Jerry Shih To: paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu, herbert@gondor.apana.org.au, davem@davemloft.net Cc: andy.chiu@sifive.com, greentime.hu@sifive.com, conor.dooley@microchip.com, guoren@kernel.org, bjorn@rivosinc.com, heiko@sntech.de, ebiggers@kernel.org, ardb@kernel.org, phoebe.chen@sifive.com, hongrong.hsu@sifive.com, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, linux-crypto@vger.kernel.org Subject: [PATCH 08/12] RISC-V: crypto: add Zvknha/b accelerated SHA224/256 implementations Date: Thu, 26 Oct 2023 02:36:40 +0800 Message-Id: <20231025183644.8735-9-jerry.shih@sifive.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: <20231025183644.8735-1-jerry.shih@sifive.com> References: <20231025183644.8735-1-jerry.shih@sifive.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org Add SHA224 and 256 implementations using Zvknha or Zvknhb vector crypto extensions from OpenSSL(openssl/openssl#21923). Co-developed-by: Charalampos Mitrodimas Signed-off-by: Charalampos Mitrodimas Co-developed-by: Heiko Stuebner Signed-off-by: Heiko Stuebner Co-developed-by: Phoebe Chen Signed-off-by: Phoebe Chen Signed-off-by: Jerry Shih --- arch/riscv/crypto/Kconfig | 13 + arch/riscv/crypto/Makefile | 7 + arch/riscv/crypto/sha256-riscv64-glue.c | 140 ++++++++ .../sha256-riscv64-zvkb-zvknha_or_zvknhb.pl | 318 ++++++++++++++++++ 4 files changed, 478 insertions(+) create mode 100644 arch/riscv/crypto/sha256-riscv64-glue.c create mode 100644 arch/riscv/crypto/sha256-riscv64-zvkb-zvknha_or_zvknhb.pl diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig index 00be7177eb1e..dbc86592063a 100644 --- a/arch/riscv/crypto/Kconfig +++ b/arch/riscv/crypto/Kconfig @@ -49,4 +49,17 @@ config CRYPTO_GHASH_RISCV64 Architecture: riscv64 using: - Zvkg vector crypto extension +config CRYPTO_SHA256_RISCV64 + default y if RISCV_ISA_V + tristate "Hash functions: SHA-224 and SHA-256" + depends on 64BIT && RISCV_ISA_V + select CRYPTO_HASH + select CRYPTO_SHA256 + help + SHA-256 secure hash algorithm (FIPS 180) + + Architecture: riscv64 using: + - Zvkb vector crypto extension + - Zvknha or Zvknhb vector crypto extensions + endmenu diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile index 532316cc1758..49577fb72cca 100644 --- a/arch/riscv/crypto/Makefile +++ b/arch/riscv/crypto/Makefile @@ -12,6 +12,9 @@ aes-block-riscv64-y := aes-riscv64-block-mode-glue.o aes-riscv64-zvbb-zvkg-zvkne obj-$(CONFIG_CRYPTO_GHASH_RISCV64) += ghash-riscv64.o ghash-riscv64-y := ghash-riscv64-glue.o ghash-riscv64-zvkg.o +obj-$(CONFIG_CRYPTO_SHA256_RISCV64) += sha256-riscv64.o +sha256-riscv64-y := sha256-riscv64-glue.o sha256-riscv64-zvkb-zvknha_or_zvknhb.o + quiet_cmd_perlasm = PERLASM $@ cmd_perlasm = $(PERL) $(<) void $(@) @@ -27,7 +30,11 @@ $(obj)/aes-riscv64-zvkb-zvkned.S: $(src)/aes-riscv64-zvkb-zvkned.pl $(obj)/ghash-riscv64-zvkg.S: $(src)/ghash-riscv64-zvkg.pl $(call cmd,perlasm) +$(obj)/sha256-riscv64-zvkb-zvknha_or_zvknhb.S: $(src)/sha256-riscv64-zvkb-zvknha_or_zvknhb.pl + $(call cmd,perlasm) + clean-files += aes-riscv64-zvkned.S clean-files += aes-riscv64-zvbb-zvkg-zvkned.S clean-files += aes-riscv64-zvkb-zvkned.S clean-files += ghash-riscv64-zvkg.S +clean-files += sha256-riscv64-zvkb-zvknha_or_zvknhb.S diff --git a/arch/riscv/crypto/sha256-riscv64-glue.c b/arch/riscv/crypto/sha256-riscv64-glue.c new file mode 100644 index 000000000000..35acfb6e4b2d --- /dev/null +++ b/arch/riscv/crypto/sha256-riscv64-glue.c @@ -0,0 +1,140 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Linux/riscv64 port of the OpenSSL SHA256 implementation for RISC-V 64 + * + * Copyright (C) 2022 VRULL GmbH + * Author: Heiko Stuebner + */ + +#include +#include +#include +#include +#include +#include +#include + +/* + * sha256 using zvkb and zvknha/b vector crypto extension + * + * This asm function will just take the first 256-bit as the sha256 state from + * the pointer to `struct sha256_state`. + */ +void sha256_block_data_order_zvkb_zvknha_or_zvknhb(struct sha256_state *digest, + const u8 *data, + int num_blks); + +static int riscv64_sha256_update(struct shash_desc *desc, const u8 *data, + unsigned int len) +{ + int ret = 0; + + /* + * Make sure struct sha256_state begins directly with the SHA256 + * 256-bit internal state, as this is what the asm function expect. + */ + BUILD_BUG_ON(offsetof(struct sha256_state, state) != 0); + + if (crypto_simd_usable()) { + kernel_vector_begin(); + ret = sha256_base_do_update( + desc, data, len, + sha256_block_data_order_zvkb_zvknha_or_zvknhb); + kernel_vector_end(); + } else { + ret = crypto_sha256_update(desc, data, len); + } + + return ret; +} + +static int riscv64_sha256_finup(struct shash_desc *desc, const u8 *data, + unsigned int len, u8 *out) +{ + if (crypto_simd_usable()) { + kernel_vector_begin(); + if (len) + sha256_base_do_update( + desc, data, len, + sha256_block_data_order_zvkb_zvknha_or_zvknhb); + sha256_base_do_finalize( + desc, sha256_block_data_order_zvkb_zvknha_or_zvknhb); + kernel_vector_end(); + + return sha256_base_finish(desc, out); + } + + return crypto_sha256_finup(desc, data, len, out); +} + +static int riscv64_sha256_final(struct shash_desc *desc, u8 *out) +{ + return riscv64_sha256_finup(desc, NULL, 0, out); +} + +static struct shash_alg sha256_alg[] = { + { + .digestsize = SHA256_DIGEST_SIZE, + .init = sha256_base_init, + .update = riscv64_sha256_update, + .final = riscv64_sha256_final, + .finup = riscv64_sha256_finup, + .descsize = sizeof(struct sha256_state), + .base.cra_name = "sha256", + .base.cra_driver_name = "sha256-riscv64-zvkb-zvknha_or_zvknhb", + .base.cra_priority = 150, + .base.cra_blocksize = SHA256_BLOCK_SIZE, + .base.cra_module = THIS_MODULE, + }, + { + .digestsize = SHA224_DIGEST_SIZE, + .init = sha224_base_init, + .update = riscv64_sha256_update, + .final = riscv64_sha256_final, + .finup = riscv64_sha256_finup, + .descsize = sizeof(struct sha256_state), + .base.cra_name = "sha224", + .base.cra_driver_name = "sha224-riscv64-zvkb-zvknha_or_zvknhb", + .base.cra_priority = 150, + .base.cra_blocksize = SHA224_BLOCK_SIZE, + .base.cra_module = THIS_MODULE, + } +}; + +static inline bool check_sha256_ext(void) +{ + /* + * From the spec: + * The Zvknhb ext supports both SHA-256 and SHA-512 and Zvknha only + * supports SHA-256. + */ + return (riscv_isa_extension_available(NULL, ZVKNHA) || + riscv_isa_extension_available(NULL, ZVKNHB)) && + riscv_isa_extension_available(NULL, ZVKB) && + riscv_vector_vlen() >= 128; +} + +static int __init riscv64_sha256_mod_init(void) +{ + if (check_sha256_ext()) + return crypto_register_shashes(sha256_alg, + ARRAY_SIZE(sha256_alg)); + + return -ENODEV; +} + +static void __exit riscv64_sha256_mod_fini(void) +{ + if (check_sha256_ext()) + crypto_unregister_shashes(sha256_alg, ARRAY_SIZE(sha256_alg)); +} + +module_init(riscv64_sha256_mod_init); +module_exit(riscv64_sha256_mod_fini); + +MODULE_DESCRIPTION("SHA-256 (RISC-V accelerated)"); +MODULE_AUTHOR("Andy Polyakov "); +MODULE_AUTHOR("Heiko Stuebner "); +MODULE_LICENSE("GPL"); +MODULE_ALIAS_CRYPTO("sha224"); +MODULE_ALIAS_CRYPTO("sha256"); diff --git a/arch/riscv/crypto/sha256-riscv64-zvkb-zvknha_or_zvknhb.pl b/arch/riscv/crypto/sha256-riscv64-zvkb-zvknha_or_zvknhb.pl new file mode 100644 index 000000000000..51b2d9d4f8f1 --- /dev/null +++ b/arch/riscv/crypto/sha256-riscv64-zvkb-zvknha_or_zvknhb.pl @@ -0,0 +1,318 @@ +#! /usr/bin/env perl +# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause +# +# This file is dual-licensed, meaning that you can use it under your +# choice of either of the following two licenses: +# +# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved. +# +# Licensed under the Apache License 2.0 (the "License"). You can obtain +# a copy in the file LICENSE in the source distribution or at +# https://www.openssl.org/source/license.html +# +# or +# +# Copyright (c) 2023, Christoph Müllner +# Copyright (c) 2023, Phoebe Chen +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions +# are met: +# 1. Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# 2. Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +# The generated code of this file depends on the following RISC-V extensions: +# - RV64I +# - RISC-V Vector ('V') with VLEN >= 128 +# - RISC-V Vector Cryptography Bit-manipulation extension ('Zvkb') +# - RISC-V Vector SHA-2 Secure Hash extension ('Zvknha' or 'Zvknhb') + +use strict; +use warnings; + +use FindBin qw($Bin); +use lib "$Bin"; +use lib "$Bin/../../perlasm"; +use riscv; + +# $output is the last argument if it looks like a file (it has an extension) +# $flavour is the first argument if it doesn't look like a file +my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef; +my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef; + +$output and open STDOUT,">$output"; + +my $code=<<___; +.text +___ + +my ($V0, $V1, $V2, $V3, $V4, $V5, $V6, $V7, + $V8, $V9, $V10, $V11, $V12, $V13, $V14, $V15, + $V16, $V17, $V18, $V19, $V20, $V21, $V22, $V23, + $V24, $V25, $V26, $V27, $V28, $V29, $V30, $V31, +) = map("v$_",(0..31)); + +my $K256 = "K256"; + +# Function arguments +my ($H, $INP, $LEN, $KT, $H2, $INDEX_PATTERN) = ("a0", "a1", "a2", "a3", "t3", "t4"); + +sub sha_256_load_constant { + my $code=<<___; + la $KT, $K256 # Load round constants K256 + @{[vle32_v $V10, $KT]} + addi $KT, $KT, 16 + @{[vle32_v $V11, $KT]} + addi $KT, $KT, 16 + @{[vle32_v $V12, $KT]} + addi $KT, $KT, 16 + @{[vle32_v $V13, $KT]} + addi $KT, $KT, 16 + @{[vle32_v $V14, $KT]} + addi $KT, $KT, 16 + @{[vle32_v $V15, $KT]} + addi $KT, $KT, 16 + @{[vle32_v $V16, $KT]} + addi $KT, $KT, 16 + @{[vle32_v $V17, $KT]} + addi $KT, $KT, 16 + @{[vle32_v $V18, $KT]} + addi $KT, $KT, 16 + @{[vle32_v $V19, $KT]} + addi $KT, $KT, 16 + @{[vle32_v $V20, $KT]} + addi $KT, $KT, 16 + @{[vle32_v $V21, $KT]} + addi $KT, $KT, 16 + @{[vle32_v $V22, $KT]} + addi $KT, $KT, 16 + @{[vle32_v $V23, $KT]} + addi $KT, $KT, 16 + @{[vle32_v $V24, $KT]} + addi $KT, $KT, 16 + @{[vle32_v $V25, $KT]} +___ + + return $code; +} + +################################################################################ +# void sha256_block_data_order_zvkb_zvknha_or_zvknhb(void *c, const void *p, size_t len) +$code .= <<___; +.p2align 2 +.globl sha256_block_data_order_zvkb_zvknha_or_zvknhb +.type sha256_block_data_order_zvkb_zvknha_or_zvknhb,\@function +sha256_block_data_order_zvkb_zvknha_or_zvknhb: + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + + @{[sha_256_load_constant]} + + # H is stored as {a,b,c,d},{e,f,g,h}, but we need {f,e,b,a},{h,g,d,c} + # The dst vtype is e32m1 and the index vtype is e8mf4. + # We use index-load with the following index pattern at v26. + # i8 index: + # 20, 16, 4, 0 + # Instead of setting the i8 index, we could use a single 32bit + # little-endian value to cover the 4xi8 index. + # i32 value: + # 0x 00 04 10 14 + li $INDEX_PATTERN, 0x00041014 + @{[vsetivli "zero", 1, "e32", "m1", "ta", "ma"]} + @{[vmv_v_x $V26, $INDEX_PATTERN]} + + addi $H2, $H, 8 + + # Use index-load to get {f,e,b,a},{h,g,d,c} + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + @{[vluxei8_v $V6, $H, $V26]} + @{[vluxei8_v $V7, $H2, $V26]} + + # Setup v0 mask for the vmerge to replace the first word (idx==0) in key-scheduling. + # The AVL is 4 in SHA, so we could use a single e8(8 element masking) for masking. + @{[vsetivli "zero", 1, "e8", "m1", "ta", "ma"]} + @{[vmv_v_i $V0, 0x01]} + + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + +L_round_loop: + # Decrement length by 1 + add $LEN, $LEN, -1 + + # Keep the current state as we need it later: H' = H+{a',b',c',...,h'}. + @{[vmv_v_v $V30, $V6]} + @{[vmv_v_v $V31, $V7]} + + # Load the 512-bits of the message block in v1-v4 and perform + # an endian swap on each 4 bytes element. + @{[vle32_v $V1, $INP]} + @{[vrev8_v $V1, $V1]} + add $INP, $INP, 16 + @{[vle32_v $V2, $INP]} + @{[vrev8_v $V2, $V2]} + add $INP, $INP, 16 + @{[vle32_v $V3, $INP]} + @{[vrev8_v $V3, $V3]} + add $INP, $INP, 16 + @{[vle32_v $V4, $INP]} + @{[vrev8_v $V4, $V4]} + add $INP, $INP, 16 + + # Quad-round 0 (+0, Wt from oldest to newest in v1->v2->v3->v4) + @{[vadd_vv $V5, $V10, $V1]} + @{[vsha2cl_vv $V7, $V6, $V5]} + @{[vsha2ch_vv $V6, $V7, $V5]} + @{[vmerge_vvm $V5, $V3, $V2, $V0]} + @{[vsha2ms_vv $V1, $V5, $V4]} # Generate W[19:16] + + # Quad-round 1 (+1, v2->v3->v4->v1) + @{[vadd_vv $V5, $V11, $V2]} + @{[vsha2cl_vv $V7, $V6, $V5]} + @{[vsha2ch_vv $V6, $V7, $V5]} + @{[vmerge_vvm $V5, $V4, $V3, $V0]} + @{[vsha2ms_vv $V2, $V5, $V1]} # Generate W[23:20] + + # Quad-round 2 (+2, v3->v4->v1->v2) + @{[vadd_vv $V5, $V12, $V3]} + @{[vsha2cl_vv $V7, $V6, $V5]} + @{[vsha2ch_vv $V6, $V7, $V5]} + @{[vmerge_vvm $V5, $V1, $V4, $V0]} + @{[vsha2ms_vv $V3, $V5, $V2]} # Generate W[27:24] + + # Quad-round 3 (+3, v4->v1->v2->v3) + @{[vadd_vv $V5, $V13, $V4]} + @{[vsha2cl_vv $V7, $V6, $V5]} + @{[vsha2ch_vv $V6, $V7, $V5]} + @{[vmerge_vvm $V5, $V2, $V1, $V0]} + @{[vsha2ms_vv $V4, $V5, $V3]} # Generate W[31:28] + + # Quad-round 4 (+0, v1->v2->v3->v4) + @{[vadd_vv $V5, $V14, $V1]} + @{[vsha2cl_vv $V7, $V6, $V5]} + @{[vsha2ch_vv $V6, $V7, $V5]} + @{[vmerge_vvm $V5, $V3, $V2, $V0]} + @{[vsha2ms_vv $V1, $V5, $V4]} # Generate W[35:32] + + # Quad-round 5 (+1, v2->v3->v4->v1) + @{[vadd_vv $V5, $V15, $V2]} + @{[vsha2cl_vv $V7, $V6, $V5]} + @{[vsha2ch_vv $V6, $V7, $V5]} + @{[vmerge_vvm $V5, $V4, $V3, $V0]} + @{[vsha2ms_vv $V2, $V5, $V1]} # Generate W[39:36] + + # Quad-round 6 (+2, v3->v4->v1->v2) + @{[vadd_vv $V5, $V16, $V3]} + @{[vsha2cl_vv $V7, $V6, $V5]} + @{[vsha2ch_vv $V6, $V7, $V5]} + @{[vmerge_vvm $V5, $V1, $V4, $V0]} + @{[vsha2ms_vv $V3, $V5, $V2]} # Generate W[43:40] + + # Quad-round 7 (+3, v4->v1->v2->v3) + @{[vadd_vv $V5, $V17, $V4]} + @{[vsha2cl_vv $V7, $V6, $V5]} + @{[vsha2ch_vv $V6, $V7, $V5]} + @{[vmerge_vvm $V5, $V2, $V1, $V0]} + @{[vsha2ms_vv $V4, $V5, $V3]} # Generate W[47:44] + + # Quad-round 8 (+0, v1->v2->v3->v4) + @{[vadd_vv $V5, $V18, $V1]} + @{[vsha2cl_vv $V7, $V6, $V5]} + @{[vsha2ch_vv $V6, $V7, $V5]} + @{[vmerge_vvm $V5, $V3, $V2, $V0]} + @{[vsha2ms_vv $V1, $V5, $V4]} # Generate W[51:48] + + # Quad-round 9 (+1, v2->v3->v4->v1) + @{[vadd_vv $V5, $V19, $V2]} + @{[vsha2cl_vv $V7, $V6, $V5]} + @{[vsha2ch_vv $V6, $V7, $V5]} + @{[vmerge_vvm $V5, $V4, $V3, $V0]} + @{[vsha2ms_vv $V2, $V5, $V1]} # Generate W[55:52] + + # Quad-round 10 (+2, v3->v4->v1->v2) + @{[vadd_vv $V5, $V20, $V3]} + @{[vsha2cl_vv $V7, $V6, $V5]} + @{[vsha2ch_vv $V6, $V7, $V5]} + @{[vmerge_vvm $V5, $V1, $V4, $V0]} + @{[vsha2ms_vv $V3, $V5, $V2]} # Generate W[59:56] + + # Quad-round 11 (+3, v4->v1->v2->v3) + @{[vadd_vv $V5, $V21, $V4]} + @{[vsha2cl_vv $V7, $V6, $V5]} + @{[vsha2ch_vv $V6, $V7, $V5]} + @{[vmerge_vvm $V5, $V2, $V1, $V0]} + @{[vsha2ms_vv $V4, $V5, $V3]} # Generate W[63:60] + + # Quad-round 12 (+0, v1->v2->v3->v4) + # Note that we stop generating new message schedule words (Wt, v1-13) + # as we already generated all the words we end up consuming (i.e., W[63:60]). + @{[vadd_vv $V5, $V22, $V1]} + @{[vsha2cl_vv $V7, $V6, $V5]} + @{[vsha2ch_vv $V6, $V7, $V5]} + + # Quad-round 13 (+1, v2->v3->v4->v1) + @{[vadd_vv $V5, $V23, $V2]} + @{[vsha2cl_vv $V7, $V6, $V5]} + @{[vsha2ch_vv $V6, $V7, $V5]} + + # Quad-round 14 (+2, v3->v4->v1->v2) + @{[vadd_vv $V5, $V24, $V3]} + @{[vsha2cl_vv $V7, $V6, $V5]} + @{[vsha2ch_vv $V6, $V7, $V5]} + + # Quad-round 15 (+3, v4->v1->v2->v3) + @{[vadd_vv $V5, $V25, $V4]} + @{[vsha2cl_vv $V7, $V6, $V5]} + @{[vsha2ch_vv $V6, $V7, $V5]} + + # H' = H+{a',b',c',...,h'} + @{[vadd_vv $V6, $V30, $V6]} + @{[vadd_vv $V7, $V31, $V7]} + bnez $LEN, L_round_loop + + # Store {f,e,b,a},{h,g,d,c} back to {a,b,c,d},{e,f,g,h}. + @{[vsuxei8_v $V6, $H, $V26]} + @{[vsuxei8_v $V7, $H2, $V26]} + + ret +.size sha256_block_data_order_zvkb_zvknha_or_zvknhb,.-sha256_block_data_order_zvkb_zvknha_or_zvknhb + +.p2align 2 +.type $K256,\@object +$K256: + .word 0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5 + .word 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5 + .word 0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3 + .word 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174 + .word 0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc + .word 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da + .word 0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7 + .word 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967 + .word 0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13 + .word 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85 + .word 0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3 + .word 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070 + .word 0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5 + .word 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3 + .word 0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208 + .word 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2 +.size $K256,.-$K256 +___ + +print $code; + +close STDOUT or die "error closing STDOUT: $!"; From patchwork Wed Oct 25 18:36:41 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Jerry Shih X-Patchwork-Id: 13436524 X-Patchwork-Delegate: herbert@gondor.apana.org.au Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 914E3C0032E for ; Wed, 25 Oct 2023 18:38:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235043AbjJYSig (ORCPT ); Wed, 25 Oct 2023 14:38:36 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42902 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235010AbjJYSh6 (ORCPT ); Wed, 25 Oct 2023 14:37:58 -0400 Received: from mail-pg1-x52a.google.com (mail-pg1-x52a.google.com [IPv6:2607:f8b0:4864:20::52a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DEECD10C3 for ; Wed, 25 Oct 2023 11:37:32 -0700 (PDT) Received: by mail-pg1-x52a.google.com with SMTP id 41be03b00d2f7-5859a7d6556so105382a12.0 for ; Wed, 25 Oct 2023 11:37:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sifive.com; s=google; t=1698259052; x=1698863852; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=7enZoXOtIU3XtaKli1cup0KAoO2RhV7XTzE5tAb0h4E=; b=HHUPa22Y7HYqn6Y4Skkl5V31Ueb+WGodFcJL43YiI8B4mg6qqg3EASryejuaTyhRTT lAZ4RyhXcthsdskZzGGCHbS4Tc5GCvFe5WfW4DkVsX09tXsC9tBjdomVrrD9vY0JYs1I xru5B0i95BSNZt+Kioo7VUI+vruyAZJWBhbbegRDuuQ+ojdFxOcOXlArmBRAbZzfun8n R14U5vHd0hsd8uW5j5JuVZLLVcgbk4J7LB7YPluNQeaqblZMy3d+KtwEeazEYL8sMbpR 7/svpkrfJzhRrMcfLrq1qsxMTSXH9wGY0SkQhP00gH6L01X7kI2AhTiW/OUfp67t5NHd berA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1698259052; x=1698863852; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=7enZoXOtIU3XtaKli1cup0KAoO2RhV7XTzE5tAb0h4E=; b=aWfhokJJlHoQsoC2u8DH9alAkPx5N7ydowJRYQgyGKrImU9+V+UHHtdjSrT2dz8+Jn 4m2eHIYac+qf/nx/Nn31lr36ZpU0Qvlr9sxCv8OIhhgJsT5a2WkFfeqRuMtsoCGs1Edu MqLcoZrPg+R9Y9AgLXjbSjK6oOiRb9s5x79F7WiOVYSpjndnAoVowDkuSLEVCzRglJqe gZDdqjHclNWQVHw9xLmMAs5TXx4KmrYakNjN2+MMv9Xa4ha8RxTDGfhZKFIFAog3SpxY H0u58An86LghgJMDrx7+vo0lBlayJ6mj0ZTzSbJ0cnRKht7DuVnZL/6EvHRTnGezEmZi UZNw== X-Gm-Message-State: AOJu0YyXEXcERd0xbLrBcRRETgh8oFE7RvPHF/AlRC/9dxJHtETylxP3 sJU1vGtYOZPFX0ZmDugsdo7P4A== X-Google-Smtp-Source: AGHT+IFQNnAhIQ8YExaXerfZBQxg2S+wLRmCHdO0Ci+54ovI9KEgDmWZR5yOGY9s0heFc6s5y7jgEQ== X-Received: by 2002:a17:90b:510c:b0:27d:9f6:47a3 with SMTP id sc12-20020a17090b510c00b0027d09f647a3mr16548160pjb.31.1698259052145; Wed, 25 Oct 2023 11:37:32 -0700 (PDT) Received: from localhost.localdomain ([49.216.222.119]) by smtp.gmail.com with ESMTPSA id g3-20020a17090adb0300b00278f1512dd9sm212367pjv.32.2023.10.25.11.37.28 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Wed, 25 Oct 2023 11:37:31 -0700 (PDT) From: Jerry Shih To: paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu, herbert@gondor.apana.org.au, davem@davemloft.net Cc: andy.chiu@sifive.com, greentime.hu@sifive.com, conor.dooley@microchip.com, guoren@kernel.org, bjorn@rivosinc.com, heiko@sntech.de, ebiggers@kernel.org, ardb@kernel.org, phoebe.chen@sifive.com, hongrong.hsu@sifive.com, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, linux-crypto@vger.kernel.org Subject: [PATCH 09/12] RISC-V: crypto: add Zvknhb accelerated SHA384/512 implementations Date: Thu, 26 Oct 2023 02:36:41 +0800 Message-Id: <20231025183644.8735-10-jerry.shih@sifive.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: <20231025183644.8735-1-jerry.shih@sifive.com> References: <20231025183644.8735-1-jerry.shih@sifive.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org Add SHA384 and 512 implementations using Zvknhb vector crypto extension from OpenSSL(openssl/openssl#21923). Co-developed-by: Charalampos Mitrodimas Signed-off-by: Charalampos Mitrodimas Co-developed-by: Heiko Stuebner Signed-off-by: Heiko Stuebner Co-developed-by: Phoebe Chen Signed-off-by: Phoebe Chen Signed-off-by: Jerry Shih --- arch/riscv/crypto/Kconfig | 13 + arch/riscv/crypto/Makefile | 7 + arch/riscv/crypto/sha512-riscv64-glue.c | 133 +++++++++ .../crypto/sha512-riscv64-zvkb-zvknhb.pl | 266 ++++++++++++++++++ 4 files changed, 419 insertions(+) create mode 100644 arch/riscv/crypto/sha512-riscv64-glue.c create mode 100644 arch/riscv/crypto/sha512-riscv64-zvkb-zvknhb.pl diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig index dbc86592063a..a5a34777d9c3 100644 --- a/arch/riscv/crypto/Kconfig +++ b/arch/riscv/crypto/Kconfig @@ -62,4 +62,17 @@ config CRYPTO_SHA256_RISCV64 - Zvkb vector crypto extension - Zvknha or Zvknhb vector crypto extensions +config CRYPTO_SHA512_RISCV64 + default y if RISCV_ISA_V + tristate "Hash functions: SHA-384 and SHA-512" + depends on 64BIT && RISCV_ISA_V + select CRYPTO_HASH + select CRYPTO_SHA512 + help + SHA-512 secure hash algorithm (FIPS 180) + + Architecture: riscv64 using: + - Zvkb vector crypto extension + - Zvknhb vector crypto extension + endmenu diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile index 49577fb72cca..ebb4e861ebdc 100644 --- a/arch/riscv/crypto/Makefile +++ b/arch/riscv/crypto/Makefile @@ -15,6 +15,9 @@ ghash-riscv64-y := ghash-riscv64-glue.o ghash-riscv64-zvkg.o obj-$(CONFIG_CRYPTO_SHA256_RISCV64) += sha256-riscv64.o sha256-riscv64-y := sha256-riscv64-glue.o sha256-riscv64-zvkb-zvknha_or_zvknhb.o +obj-$(CONFIG_CRYPTO_SHA512_RISCV64) += sha512-riscv64.o +sha512-riscv64-y := sha512-riscv64-glue.o sha512-riscv64-zvkb-zvknhb.o + quiet_cmd_perlasm = PERLASM $@ cmd_perlasm = $(PERL) $(<) void $(@) @@ -33,8 +36,12 @@ $(obj)/ghash-riscv64-zvkg.S: $(src)/ghash-riscv64-zvkg.pl $(obj)/sha256-riscv64-zvkb-zvknha_or_zvknhb.S: $(src)/sha256-riscv64-zvkb-zvknha_or_zvknhb.pl $(call cmd,perlasm) +$(obj)/sha512-riscv64-zvkb-zvknhb.S: $(src)/sha512-riscv64-zvkb-zvknhb.pl + $(call cmd,perlasm) + clean-files += aes-riscv64-zvkned.S clean-files += aes-riscv64-zvbb-zvkg-zvkned.S clean-files += aes-riscv64-zvkb-zvkned.S clean-files += ghash-riscv64-zvkg.S clean-files += sha256-riscv64-zvkb-zvknha_or_zvknhb.S +clean-files += sha512-riscv64-zvkb-zvknhb.S diff --git a/arch/riscv/crypto/sha512-riscv64-glue.c b/arch/riscv/crypto/sha512-riscv64-glue.c new file mode 100644 index 000000000000..31202b7b8c9b --- /dev/null +++ b/arch/riscv/crypto/sha512-riscv64-glue.c @@ -0,0 +1,133 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Linux/riscv64 port of the OpenSSL SHA512 implementation for RISC-V 64 + * + * Copyright (C) 2023 VRULL GmbH + * Author: Heiko Stuebner + */ + +#include +#include +#include +#include +#include +#include +#include + +/* + * sha512 using zvkb and zvknhb vector crypto extension + * + * This asm function will just take the first 512-bit as the sha512 state from + * the pointer to `struct sha512_state`. + */ +void sha512_block_data_order_zvkb_zvknhb(struct sha512_state *digest, + const u8 *data, int num_blks); + +static int riscv64_sha512_update(struct shash_desc *desc, const u8 *data, + unsigned int len) +{ + int ret = 0; + + /* + * Make sure struct sha256_state begins directly with the SHA256 + * 256-bit internal state, as this is what the asm function expect. + */ + BUILD_BUG_ON(offsetof(struct sha512_state, state) != 0); + + if (crypto_simd_usable()) { + kernel_vector_begin(); + ret = sha512_base_do_update( + desc, data, len, sha512_block_data_order_zvkb_zvknhb); + kernel_vector_end(); + } else { + ret = crypto_sha512_update(desc, data, len); + } + + return ret; +} + +static int riscv64_sha512_finup(struct shash_desc *desc, const u8 *data, + unsigned int len, u8 *out) +{ + if (crypto_simd_usable()) { + kernel_vector_begin(); + if (len) + sha512_base_do_update( + desc, data, len, + sha512_block_data_order_zvkb_zvknhb); + sha512_base_do_finalize(desc, + sha512_block_data_order_zvkb_zvknhb); + kernel_vector_end(); + + return sha512_base_finish(desc, out); + } + + return crypto_sha512_finup(desc, data, len, out); +} + +static int riscv64_sha512_final(struct shash_desc *desc, u8 *out) +{ + return riscv64_sha512_finup(desc, NULL, 0, out); +} + +static struct shash_alg sha512_alg[] = { + { + .digestsize = SHA512_DIGEST_SIZE, + .init = sha512_base_init, + .update = riscv64_sha512_update, + .final = riscv64_sha512_final, + .finup = riscv64_sha512_finup, + .descsize = sizeof(struct sha512_state), + .base.cra_name = "sha512", + .base.cra_driver_name = "sha512-riscv64-zvkb-zvknhb", + .base.cra_priority = 150, + .base.cra_blocksize = SHA512_BLOCK_SIZE, + .base.cra_module = THIS_MODULE, + }, + { + .digestsize = SHA384_DIGEST_SIZE, + .init = sha384_base_init, + .update = riscv64_sha512_update, + .final = riscv64_sha512_final, + .finup = riscv64_sha512_finup, + .descsize = sizeof(struct sha512_state), + .base.cra_name = "sha384", + .base.cra_driver_name = "sha384-riscv64-zvkb-zvknhb", + .base.cra_priority = 150, + .base.cra_blocksize = SHA384_BLOCK_SIZE, + .base.cra_module = THIS_MODULE, + } +}; + +static inline bool check_sha512_ext(void) +{ + return riscv_isa_extension_available(NULL, ZVKNHB) && + riscv_isa_extension_available(NULL, ZVKB) && + riscv_vector_vlen() >= 128; +} + +static int __init riscv64_sha512_mod_init(void) +{ + if (check_sha512_ext()) + return crypto_register_shashes(sha512_alg, + ARRAY_SIZE(sha512_alg)); + + return -ENODEV; +} + +static void __exit riscv64_sha512_mod_fini(void) +{ + if (check_sha512_ext()) + crypto_unregister_shashes(sha512_alg, ARRAY_SIZE(sha512_alg)); +} + +module_init(riscv64_sha512_mod_init); +module_exit(riscv64_sha512_mod_fini); + +MODULE_DESCRIPTION("SHA-512 (RISC-V accelerated)"); +MODULE_AUTHOR("Andy Polyakov "); +MODULE_AUTHOR("Ard Biesheuvel "); +MODULE_AUTHOR("Heiko Stuebner "); +MODULE_LICENSE("GPL"); +MODULE_ALIAS_CRYPTO("sha384"); +MODULE_ALIAS_CRYPTO("sha512"); diff --git a/arch/riscv/crypto/sha512-riscv64-zvkb-zvknhb.pl b/arch/riscv/crypto/sha512-riscv64-zvkb-zvknhb.pl new file mode 100644 index 000000000000..4be448266a59 --- /dev/null +++ b/arch/riscv/crypto/sha512-riscv64-zvkb-zvknhb.pl @@ -0,0 +1,266 @@ +#! /usr/bin/env perl +# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause +# +# This file is dual-licensed, meaning that you can use it under your +# choice of either of the following two licenses: +# +# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved. +# +# Licensed under the Apache License 2.0 (the "License"). You can obtain +# a copy in the file LICENSE in the source distribution or at +# https://www.openssl.org/source/license.html +# +# or +# +# Copyright (c) 2023, Christoph Müllner +# Copyright (c) 2023, Phoebe Chen +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions +# are met: +# 1. Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# 2. Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +# The generated code of this file depends on the following RISC-V extensions: +# - RV64I +# - RISC-V vector ('V') with VLEN >= 128 +# - RISC-V Vector Cryptography Bit-manipulation extension ('Zvkb') +# - RISC-V Vector SHA-2 Secure Hash extension ('Zvknhb') + +use strict; +use warnings; + +use FindBin qw($Bin); +use lib "$Bin"; +use lib "$Bin/../../perlasm"; +use riscv; + +# $output is the last argument if it looks like a file (it has an extension) +# $flavour is the first argument if it doesn't look like a file +my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef; +my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef; + +$output and open STDOUT,">$output"; + +my $code=<<___; +.text +___ + +my ($V0, $V1, $V2, $V3, $V4, $V5, $V6, $V7, + $V8, $V9, $V10, $V11, $V12, $V13, $V14, $V15, + $V16, $V17, $V18, $V19, $V20, $V21, $V22, $V23, + $V24, $V25, $V26, $V27, $V28, $V29, $V30, $V31, +) = map("v$_",(0..31)); + +my $K512 = "K512"; + +# Function arguments +my ($H, $INP, $LEN, $KT, $H2, $INDEX_PATTERN) = ("a0", "a1", "a2", "a3", "t3", "t4"); + +################################################################################ +# void sha512_block_data_order_zvkb_zvknhb(void *c, const void *p, size_t len) +$code .= <<___; +.p2align 2 +.globl sha512_block_data_order_zvkb_zvknhb +.type sha512_block_data_order_zvkb_zvknhb,\@function +sha512_block_data_order_zvkb_zvknhb: + @{[vsetivli "zero", 4, "e64", "m2", "ta", "ma"]} + + # H is stored as {a,b,c,d},{e,f,g,h}, but we need {f,e,b,a},{h,g,d,c} + # The dst vtype is e64m2 and the index vtype is e8mf4. + # We use index-load with the following index pattern at v1. + # i8 index: + # 40, 32, 8, 0 + # Instead of setting the i8 index, we could use a single 32bit + # little-endian value to cover the 4xi8 index. + # i32 value: + # 0x 00 08 20 28 + li $INDEX_PATTERN, 0x00082028 + @{[vsetivli "zero", 1, "e32", "m1", "ta", "ma"]} + @{[vmv_v_x $V1, $INDEX_PATTERN]} + + addi $H2, $H, 16 + + # Use index-load to get {f,e,b,a},{h,g,d,c} + @{[vsetivli "zero", 4, "e64", "m2", "ta", "ma"]} + @{[vluxei8_v $V22, $H, $V1]} + @{[vluxei8_v $V24, $H2, $V1]} + + # Setup v0 mask for the vmerge to replace the first word (idx==0) in key-scheduling. + # The AVL is 4 in SHA, so we could use a single e8(8 element masking) for masking. + @{[vsetivli "zero", 1, "e8", "m1", "ta", "ma"]} + @{[vmv_v_i $V0, 0x01]} + + @{[vsetivli "zero", 4, "e64", "m2", "ta", "ma"]} + +L_round_loop: + # Load round constants K512 + la $KT, $K512 + + # Decrement length by 1 + addi $LEN, $LEN, -1 + + # Keep the current state as we need it later: H' = H+{a',b',c',...,h'}. + @{[vmv_v_v $V26, $V22]} + @{[vmv_v_v $V28, $V24]} + + # Load the 1024-bits of the message block in v10-v16 and perform the endian + # swap. + @{[vle64_v $V10, $INP]} + @{[vrev8_v $V10, $V10]} + addi $INP, $INP, 32 + @{[vle64_v $V12, $INP]} + @{[vrev8_v $V12, $V12]} + addi $INP, $INP, 32 + @{[vle64_v $V14, $INP]} + @{[vrev8_v $V14, $V14]} + addi $INP, $INP, 32 + @{[vle64_v $V16, $INP]} + @{[vrev8_v $V16, $V16]} + addi $INP, $INP, 32 + + .rept 4 + # Quad-round 0 (+0, v10->v12->v14->v16) + @{[vle64_v $V20, $KT]} + addi $KT, $KT, 32 + @{[vadd_vv $V18, $V20, $V10]} + @{[vsha2cl_vv $V24, $V22, $V18]} + @{[vsha2ch_vv $V22, $V24, $V18]} + @{[vmerge_vvm $V18, $V14, $V12, $V0]} + @{[vsha2ms_vv $V10, $V18, $V16]} + + # Quad-round 1 (+1, v12->v14->v16->v10) + @{[vle64_v $V20, $KT]} + addi $KT, $KT, 32 + @{[vadd_vv $V18, $V20, $V12]} + @{[vsha2cl_vv $V24, $V22, $V18]} + @{[vsha2ch_vv $V22, $V24, $V18]} + @{[vmerge_vvm $V18, $V16, $V14, $V0]} + @{[vsha2ms_vv $V12, $V18, $V10]} + + # Quad-round 2 (+2, v14->v16->v10->v12) + @{[vle64_v $V20, $KT]} + addi $KT, $KT, 32 + @{[vadd_vv $V18, $V20, $V14]} + @{[vsha2cl_vv $V24, $V22, $V18]} + @{[vsha2ch_vv $V22, $V24, $V18]} + @{[vmerge_vvm $V18, $V10, $V16, $V0]} + @{[vsha2ms_vv $V14, $V18, $V12]} + + # Quad-round 3 (+3, v16->v10->v12->v14) + @{[vle64_v $V20, $KT]} + addi $KT, $KT, 32 + @{[vadd_vv $V18, $V20, $V16]} + @{[vsha2cl_vv $V24, $V22, $V18]} + @{[vsha2ch_vv $V22, $V24, $V18]} + @{[vmerge_vvm $V18, $V12, $V10, $V0]} + @{[vsha2ms_vv $V16, $V18, $V14]} + .endr + + # Quad-round 16 (+0, v10->v12->v14->v16) + # Note that we stop generating new message schedule words (Wt, v10-16) + # as we already generated all the words we end up consuming (i.e., W[79:76]). + @{[vle64_v $V20, $KT]} + addi $KT, $KT, 32 + @{[vadd_vv $V18, $V20, $V10]} + @{[vsha2cl_vv $V24, $V22, $V18]} + @{[vsha2ch_vv $V22, $V24, $V18]} + + # Quad-round 17 (+1, v12->v14->v16->v10) + @{[vle64_v $V20, $KT]} + addi $KT, $KT, 32 + @{[vadd_vv $V18, $V20, $V12]} + @{[vsha2cl_vv $V24, $V22, $V18]} + @{[vsha2ch_vv $V22, $V24, $V18]} + + # Quad-round 18 (+2, v14->v16->v10->v12) + @{[vle64_v $V20, $KT]} + addi $KT, $KT, 32 + @{[vadd_vv $V18, $V20, $V14]} + @{[vsha2cl_vv $V24, $V22, $V18]} + @{[vsha2ch_vv $V22, $V24, $V18]} + + # Quad-round 19 (+3, v16->v10->v12->v14) + @{[vle64_v $V20, $KT]} + # No t1 increment needed. + @{[vadd_vv $V18, $V20, $V16]} + @{[vsha2cl_vv $V24, $V22, $V18]} + @{[vsha2ch_vv $V22, $V24, $V18]} + + # H' = H+{a',b',c',...,h'} + @{[vadd_vv $V22, $V26, $V22]} + @{[vadd_vv $V24, $V28, $V24]} + bnez $LEN, L_round_loop + + # Store {f,e,b,a},{h,g,d,c} back to {a,b,c,d},{e,f,g,h}. + @{[vsuxei8_v $V22, $H, $V1]} + @{[vsuxei8_v $V24, $H2, $V1]} + + ret +.size sha512_block_data_order_zvkb_zvknhb,.-sha512_block_data_order_zvkb_zvknhb + +.p2align 3 +.type $K512,\@object +$K512: + .dword 0x428a2f98d728ae22, 0x7137449123ef65cd + .dword 0xb5c0fbcfec4d3b2f, 0xe9b5dba58189dbbc + .dword 0x3956c25bf348b538, 0x59f111f1b605d019 + .dword 0x923f82a4af194f9b, 0xab1c5ed5da6d8118 + .dword 0xd807aa98a3030242, 0x12835b0145706fbe + .dword 0x243185be4ee4b28c, 0x550c7dc3d5ffb4e2 + .dword 0x72be5d74f27b896f, 0x80deb1fe3b1696b1 + .dword 0x9bdc06a725c71235, 0xc19bf174cf692694 + .dword 0xe49b69c19ef14ad2, 0xefbe4786384f25e3 + .dword 0x0fc19dc68b8cd5b5, 0x240ca1cc77ac9c65 + .dword 0x2de92c6f592b0275, 0x4a7484aa6ea6e483 + .dword 0x5cb0a9dcbd41fbd4, 0x76f988da831153b5 + .dword 0x983e5152ee66dfab, 0xa831c66d2db43210 + .dword 0xb00327c898fb213f, 0xbf597fc7beef0ee4 + .dword 0xc6e00bf33da88fc2, 0xd5a79147930aa725 + .dword 0x06ca6351e003826f, 0x142929670a0e6e70 + .dword 0x27b70a8546d22ffc, 0x2e1b21385c26c926 + .dword 0x4d2c6dfc5ac42aed, 0x53380d139d95b3df + .dword 0x650a73548baf63de, 0x766a0abb3c77b2a8 + .dword 0x81c2c92e47edaee6, 0x92722c851482353b + .dword 0xa2bfe8a14cf10364, 0xa81a664bbc423001 + .dword 0xc24b8b70d0f89791, 0xc76c51a30654be30 + .dword 0xd192e819d6ef5218, 0xd69906245565a910 + .dword 0xf40e35855771202a, 0x106aa07032bbd1b8 + .dword 0x19a4c116b8d2d0c8, 0x1e376c085141ab53 + .dword 0x2748774cdf8eeb99, 0x34b0bcb5e19b48a8 + .dword 0x391c0cb3c5c95a63, 0x4ed8aa4ae3418acb + .dword 0x5b9cca4f7763e373, 0x682e6ff3d6b2b8a3 + .dword 0x748f82ee5defb2fc, 0x78a5636f43172f60 + .dword 0x84c87814a1f0ab72, 0x8cc702081a6439ec + .dword 0x90befffa23631e28, 0xa4506cebde82bde9 + .dword 0xbef9a3f7b2c67915, 0xc67178f2e372532b + .dword 0xca273eceea26619c, 0xd186b8c721c0c207 + .dword 0xeada7dd6cde0eb1e, 0xf57d4f7fee6ed178 + .dword 0x06f067aa72176fba, 0x0a637dc5a2c898a6 + .dword 0x113f9804bef90dae, 0x1b710b35131c471b + .dword 0x28db77f523047d84, 0x32caab7b40c72493 + .dword 0x3c9ebe0a15c9bebc, 0x431d67c49c100d4c + .dword 0x4cc5d4becb3e42b6, 0x597f299cfc657e2a + .dword 0x5fcb6fab3ad6faec, 0x6c44198c4a475817 +.size $K512,.-$K512 +___ + +print $code; + +close STDOUT or die "error closing STDOUT: $!"; From patchwork Wed Oct 25 18:36:42 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Jerry Shih X-Patchwork-Id: 13436523 X-Patchwork-Delegate: herbert@gondor.apana.org.au Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7B563C07545 for ; Wed, 25 Oct 2023 18:38:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234969AbjJYSif (ORCPT ); Wed, 25 Oct 2023 14:38:35 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55492 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235044AbjJYSiA (ORCPT ); Wed, 25 Oct 2023 14:38:00 -0400 Received: from mail-pj1-x1032.google.com (mail-pj1-x1032.google.com [IPv6:2607:f8b0:4864:20::1032]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2469C10F2 for ; Wed, 25 Oct 2023 11:37:37 -0700 (PDT) Received: by mail-pj1-x1032.google.com with SMTP id 98e67ed59e1d1-27d4b280e4eso25209a91.1 for ; Wed, 25 Oct 2023 11:37:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sifive.com; s=google; t=1698259056; x=1698863856; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=HmLKITkx1eSP8gqJH+WnAeHuaVXFpzAB9w+G6AVIsu4=; b=iyJEFLwpQMIymHKm5MfRobOYyLy3KRrZa4dQD/GwYe8SNCcdS/DZ4aMduqc/ZcbALi sUtSh9NQKaM+LFSQDiGJ2nK7TtMcO7MfrEjlF5JrNTWrzNL6X2MrSjZRdI59gGfY2+4g 2dkMW7huuXUgfU33aCbk5SSUfwZmbirob2aara/mHcrAXTCokkB38PGODuXY/t0dIwr+ Jk280ei59QPG6BLETCSLQoKM2Xk00gMfLxqTozM4WrISQGcZI+DPpyITfe7ySGF2GWN6 8WyG4wHN1bgVhYJgCBlbGu/ee1OQh2Er5wZeszDP50k+bU+F+KrCUL2ycKqqM77P7kjR cc8A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1698259056; x=1698863856; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=HmLKITkx1eSP8gqJH+WnAeHuaVXFpzAB9w+G6AVIsu4=; b=N41QtLjEfpDv9nZWcusIrcyZDN906ihyNlOWULV9NWjnOV7TlrdbHwyhnqof9x2wgx UJBvLORJNYN2LmzX+MQJ8IcbqlHnxTFmMO90rbZEDxtR+nqRaD+DpUL3ZAJciGxqOyqW GNnvaota5nczX4nKYWoXXk4zqBM3S3WJEOjex1/O8FXa9p7YFQDXdqDpWEAuKxXwmRmK rvtB9wQ84kOeiugf8MAyk+LrPBDprXeYFKyf1JoDfFDWpjbZScd78qYIHKxUFaD2/5cC GV7mMjTSCdJYSAaOqyScZl8YGPYzTMshHziAjjDjXFtYkx17uEqvqA/M9P1DeXHYPMWl SapA== X-Gm-Message-State: AOJu0Yy5cJRJhqwV8dT/3x8JbMbq9Ji5F3S7XxfYF43JinPAlNJC6E+J rKsYN/Mah4Ncu7odoF6Vhq/5EQ== X-Google-Smtp-Source: AGHT+IHzBlS+B5URuxLTQ8oBJy/h5hlYy7abyTtFnLBrnHWfXXOjXLi3op7KwlrM0zcS/751EbwnpQ== X-Received: by 2002:a17:90b:1652:b0:27d:1126:1ff6 with SMTP id il18-20020a17090b165200b0027d11261ff6mr536600pjb.8.1698259056268; Wed, 25 Oct 2023 11:37:36 -0700 (PDT) Received: from localhost.localdomain ([49.216.222.119]) by smtp.gmail.com with ESMTPSA id g3-20020a17090adb0300b00278f1512dd9sm212367pjv.32.2023.10.25.11.37.32 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Wed, 25 Oct 2023 11:37:35 -0700 (PDT) From: Jerry Shih To: paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu, herbert@gondor.apana.org.au, davem@davemloft.net Cc: andy.chiu@sifive.com, greentime.hu@sifive.com, conor.dooley@microchip.com, guoren@kernel.org, bjorn@rivosinc.com, heiko@sntech.de, ebiggers@kernel.org, ardb@kernel.org, phoebe.chen@sifive.com, hongrong.hsu@sifive.com, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, linux-crypto@vger.kernel.org Subject: [PATCH 10/12] RISC-V: crypto: add Zvksed accelerated SM4 implementation Date: Thu, 26 Oct 2023 02:36:42 +0800 Message-Id: <20231025183644.8735-11-jerry.shih@sifive.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: <20231025183644.8735-1-jerry.shih@sifive.com> References: <20231025183644.8735-1-jerry.shih@sifive.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org Add SM4 implementation using Zvksed vector crypto extension from OpenSSL (openssl/openssl#21923). The perlasm here is different from the original implementation in OpenSSL. In OpenSSL, SM4 has the separated set_encrypt_key and set_decrypt_key functions. In kernel, these set_key functions are merged into a single one in order to skip the redundant key expanding instructions. Co-developed-by: Christoph Müllner Signed-off-by: Christoph Müllner Co-developed-by: Heiko Stuebner Signed-off-by: Heiko Stuebner Signed-off-by: Jerry Shih --- arch/riscv/crypto/Kconfig | 19 ++ arch/riscv/crypto/Makefile | 7 + arch/riscv/crypto/sm4-riscv64-glue.c | 120 +++++++++++ arch/riscv/crypto/sm4-riscv64-zvksed.pl | 268 ++++++++++++++++++++++++ 4 files changed, 414 insertions(+) create mode 100644 arch/riscv/crypto/sm4-riscv64-glue.c create mode 100644 arch/riscv/crypto/sm4-riscv64-zvksed.pl diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig index a5a34777d9c3..ff004afa2874 100644 --- a/arch/riscv/crypto/Kconfig +++ b/arch/riscv/crypto/Kconfig @@ -75,4 +75,23 @@ config CRYPTO_SHA512_RISCV64 - Zvkb vector crypto extension - Zvknhb vector crypto extension +config CRYPTO_SM4_RISCV64 + default y if RISCV_ISA_V + tristate "Ciphers: SM4 (ShangMi 4)" + depends on 64BIT && RISCV_ISA_V + select CRYPTO_ALGAPI + select CRYPTO_SM4 + select CRYPTO_SM4_GENERIC + help + SM4 cipher algorithms (OSCCA GB/T 32907-2016, + ISO/IEC 18033-3:2010/Amd 1:2021) + + SM4 (GBT.32907-2016) is a cryptographic standard issued by the + Organization of State Commercial Administration of China (OSCCA) + as an authorized cryptographic algorithms for the use within China. + + Architecture: riscv64 using: + - Zvkb vector crypto extension + - Zvksed vector crypto extension + endmenu diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile index ebb4e861ebdc..ffad095e531f 100644 --- a/arch/riscv/crypto/Makefile +++ b/arch/riscv/crypto/Makefile @@ -18,6 +18,9 @@ sha256-riscv64-y := sha256-riscv64-glue.o sha256-riscv64-zvkb-zvknha_or_zvknhb.o obj-$(CONFIG_CRYPTO_SHA512_RISCV64) += sha512-riscv64.o sha512-riscv64-y := sha512-riscv64-glue.o sha512-riscv64-zvkb-zvknhb.o +obj-$(CONFIG_CRYPTO_SM4_RISCV64) += sm4-riscv64.o +sm4-riscv64-y := sm4-riscv64-glue.o sm4-riscv64-zvksed.o + quiet_cmd_perlasm = PERLASM $@ cmd_perlasm = $(PERL) $(<) void $(@) @@ -39,9 +42,13 @@ $(obj)/sha256-riscv64-zvkb-zvknha_or_zvknhb.S: $(src)/sha256-riscv64-zvkb-zvknha $(obj)/sha512-riscv64-zvkb-zvknhb.S: $(src)/sha512-riscv64-zvkb-zvknhb.pl $(call cmd,perlasm) +$(obj)/sm4-riscv64-zvksed.S: $(src)/sm4-riscv64-zvksed.pl + $(call cmd,perlasm) + clean-files += aes-riscv64-zvkned.S clean-files += aes-riscv64-zvbb-zvkg-zvkned.S clean-files += aes-riscv64-zvkb-zvkned.S clean-files += ghash-riscv64-zvkg.S clean-files += sha256-riscv64-zvkb-zvknha_or_zvknhb.S clean-files += sha512-riscv64-zvkb-zvknhb.S +clean-files += sm4-riscv64-zvksed.S diff --git a/arch/riscv/crypto/sm4-riscv64-glue.c b/arch/riscv/crypto/sm4-riscv64-glue.c new file mode 100644 index 000000000000..2d9983a4b1ee --- /dev/null +++ b/arch/riscv/crypto/sm4-riscv64-glue.c @@ -0,0 +1,120 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Linux/riscv64 port of the OpenSSL SM4 implementation for RISC-V 64 + * + * Copyright (C) 2023 VRULL GmbH + * Author: Heiko Stuebner + * + * Copyright (C) 2023 SiFive, Inc. + * Author: Jerry Shih + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +/* sm4 using zvksed vector crypto extension */ +void rv64i_zvksed_sm4_encrypt(const u8 *in, u8 *out, const u32 *key); +void rv64i_zvksed_sm4_decrypt(const u8 *in, u8 *out, const u32 *key); +int rv64i_zvksed_sm4_set_key(const u8 *userKey, unsigned int key_len, + u32 *enc_key, u32 *dec_key); + +static int riscv64_sm4_setkey_zvksed(struct crypto_tfm *tfm, const u8 *key, + unsigned int key_len) +{ + struct sm4_ctx *ctx = crypto_tfm_ctx(tfm); + int ret = 0; + + if (crypto_simd_usable()) { + kernel_vector_begin(); + if (rv64i_zvksed_sm4_set_key(key, key_len, ctx->rkey_enc, + ctx->rkey_dec)) + ret = -EINVAL; + kernel_vector_end(); + } else { + ret = sm4_expandkey(ctx, key, key_len); + } + + return ret; +} + +static void riscv64_sm4_encrypt_zvksed(struct crypto_tfm *tfm, u8 *dst, + const u8 *src) +{ + const struct sm4_ctx *ctx = crypto_tfm_ctx(tfm); + + if (crypto_simd_usable()) { + kernel_vector_begin(); + rv64i_zvksed_sm4_encrypt(src, dst, ctx->rkey_enc); + kernel_vector_end(); + } else { + sm4_crypt_block(ctx->rkey_enc, dst, src); + } +} + +static void riscv64_sm4_decrypt_zvksed(struct crypto_tfm *tfm, u8 *dst, + const u8 *src) +{ + const struct sm4_ctx *ctx = crypto_tfm_ctx(tfm); + + if (crypto_simd_usable()) { + kernel_vector_begin(); + rv64i_zvksed_sm4_decrypt(src, dst, ctx->rkey_dec); + kernel_vector_end(); + } else { + sm4_crypt_block(ctx->rkey_dec, dst, src); + } +} + +struct crypto_alg riscv64_sm4_zvksed_alg = { + .cra_name = "sm4", + .cra_driver_name = "sm4-riscv64-zvkb-zvksed", + .cra_module = THIS_MODULE, + .cra_priority = 300, + .cra_flags = CRYPTO_ALG_TYPE_CIPHER, + .cra_blocksize = SM4_BLOCK_SIZE, + .cra_ctxsize = sizeof(struct sm4_ctx), + .cra_cipher = { + .cia_min_keysize = SM4_KEY_SIZE, + .cia_max_keysize = SM4_KEY_SIZE, + .cia_setkey = riscv64_sm4_setkey_zvksed, + .cia_encrypt = riscv64_sm4_encrypt_zvksed, + .cia_decrypt = riscv64_sm4_decrypt_zvksed, + }, +}; + +static inline bool check_sm4_ext(void) +{ + return riscv_isa_extension_available(NULL, ZVKSED) && + riscv_isa_extension_available(NULL, ZVKB) && + riscv_vector_vlen() >= 128; +} + +static int __init riscv64_sm4_mod_init(void) +{ + if (check_sm4_ext()) + return crypto_register_alg(&riscv64_sm4_zvksed_alg); + + return -ENODEV; +} + +static void __exit riscv64_sm4_mod_fini(void) +{ + if (check_sm4_ext()) + crypto_unregister_alg(&riscv64_sm4_zvksed_alg); +} + +module_init(riscv64_sm4_mod_init); +module_exit(riscv64_sm4_mod_fini); + +MODULE_DESCRIPTION("SM4 (RISC-V accelerated)"); +MODULE_AUTHOR("Heiko Stuebner "); +MODULE_LICENSE("GPL"); +MODULE_ALIAS_CRYPTO("sm4"); diff --git a/arch/riscv/crypto/sm4-riscv64-zvksed.pl b/arch/riscv/crypto/sm4-riscv64-zvksed.pl new file mode 100644 index 000000000000..a764668c8398 --- /dev/null +++ b/arch/riscv/crypto/sm4-riscv64-zvksed.pl @@ -0,0 +1,268 @@ +#! /usr/bin/env perl +# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause +# +# This file is dual-licensed, meaning that you can use it under your +# choice of either of the following two licenses: +# +# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved. +# +# Licensed under the Apache License 2.0 (the "License"). You can obtain +# a copy in the file LICENSE in the source distribution or at +# https://www.openssl.org/source/license.html +# +# or +# +# Copyright (c) 2023, Christoph Müllner +# Copyright (c) 2023, Jerry Shih +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions +# are met: +# 1. Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# 2. Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +# The generated code of this file depends on the following RISC-V extensions: +# - RV64I +# - RISC-V Vector ('V') with VLEN >= 128 +# - RISC-V Vector Cryptography Bit-manipulation extension ('Zvkb') +# - RISC-V Vector SM4 Block Cipher extension ('Zvksed') + +use strict; +use warnings; + +use FindBin qw($Bin); +use lib "$Bin"; +use lib "$Bin/../../perlasm"; +use riscv; + +# $output is the last argument if it looks like a file (it has an extension) +# $flavour is the first argument if it doesn't look like a file +my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef; +my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef; + +$output and open STDOUT,">$output"; + +my $code=<<___; +.text +___ + +#### +# int rv64i_zvksed_sm4_set_key(const u8 *userKey, unsigned int key_len, +# u32 *enc_key, u32 *dec_key); +# +{ +my ($ukey,$key_len,$enc_key,$dec_key)=("a0","a1","a2","a3"); +my ($fk,$stride)=("a4","a5"); +my ($t0,$t1)=("t0","t1"); +my ($vukey,$vfk,$vk0,$vk1,$vk2,$vk3,$vk4,$vk5,$vk6,$vk7)=("v1","v2","v3","v4","v5","v6","v7","v8","v9","v10"); +$code .= <<___; +.p2align 3 +.globl rv64i_zvksed_sm4_set_key +.type rv64i_zvksed_sm4_set_key,\@function +rv64i_zvksed_sm4_set_key: + li $t0, 16 + beq $t0, $key_len, 1f + li a0, 1 + ret +1: + + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + + # Load the user key + @{[vle32_v $vukey, $ukey]} + @{[vrev8_v $vukey, $vukey]} + + # Load the FK. + la $fk, FK + @{[vle32_v $vfk, $fk]} + + # Generate round keys. + @{[vxor_vv $vukey, $vukey, $vfk]} + @{[vsm4k_vi $vk0, $vukey, 0]} # rk[0:3] + @{[vsm4k_vi $vk1, $vk0, 1]} # rk[4:7] + @{[vsm4k_vi $vk2, $vk1, 2]} # rk[8:11] + @{[vsm4k_vi $vk3, $vk2, 3]} # rk[12:15] + @{[vsm4k_vi $vk4, $vk3, 4]} # rk[16:19] + @{[vsm4k_vi $vk5, $vk4, 5]} # rk[20:23] + @{[vsm4k_vi $vk6, $vk5, 6]} # rk[24:27] + @{[vsm4k_vi $vk7, $vk6, 7]} # rk[28:31] + + # Store enc round keys + @{[vse32_v $vk0, $enc_key]} # rk[0:3] + addi $enc_key, $enc_key, 16 + @{[vse32_v $vk1, $enc_key]} # rk[4:7] + addi $enc_key, $enc_key, 16 + @{[vse32_v $vk2, $enc_key]} # rk[8:11] + addi $enc_key, $enc_key, 16 + @{[vse32_v $vk3, $enc_key]} # rk[12:15] + addi $enc_key, $enc_key, 16 + @{[vse32_v $vk4, $enc_key]} # rk[16:19] + addi $enc_key, $enc_key, 16 + @{[vse32_v $vk5, $enc_key]} # rk[20:23] + addi $enc_key, $enc_key, 16 + @{[vse32_v $vk6, $enc_key]} # rk[24:27] + addi $enc_key, $enc_key, 16 + @{[vse32_v $vk7, $enc_key]} # rk[28:31] + + # Store dec round keys in reverse order + addi $dec_key, $dec_key, 12 + li $stride, -4 + @{[vsse32_v $vk7, $dec_key, $stride]} # rk[31:28] + addi $dec_key, $dec_key, 16 + @{[vsse32_v $vk6, $dec_key, $stride]} # rk[27:24] + addi $dec_key, $dec_key, 16 + @{[vsse32_v $vk5, $dec_key, $stride]} # rk[23:20] + addi $dec_key, $dec_key, 16 + @{[vsse32_v $vk4, $dec_key, $stride]} # rk[19:16] + addi $dec_key, $dec_key, 16 + @{[vsse32_v $vk3, $dec_key, $stride]} # rk[15:12] + addi $dec_key, $dec_key, 16 + @{[vsse32_v $vk2, $dec_key, $stride]} # rk[11:8] + addi $dec_key, $dec_key, 16 + @{[vsse32_v $vk1, $dec_key, $stride]} # rk[7:4] + addi $dec_key, $dec_key, 16 + @{[vsse32_v $vk0, $dec_key, $stride]} # rk[3:0] + + li a0, 0 + ret +.size rv64i_zvksed_sm4_set_key,.-rv64i_zvksed_sm4_set_key +___ +} + +#### +# void rv64i_zvksed_sm4_encrypt(const unsigned char *in, unsigned char *out, +# const SM4_KEY *key); +# +{ +my ($in,$out,$keys,$stride)=("a0","a1","a2","t0"); +my ($vdata,$vk0,$vk1,$vk2,$vk3,$vk4,$vk5,$vk6,$vk7,$vgen)=("v1","v2","v3","v4","v5","v6","v7","v8","v9","v10"); +$code .= <<___; +.p2align 3 +.globl rv64i_zvksed_sm4_encrypt +.type rv64i_zvksed_sm4_encrypt,\@function +rv64i_zvksed_sm4_encrypt: + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + + # Load input data + @{[vle32_v $vdata, $in]} + @{[vrev8_v $vdata, $vdata]} + + # Order of elements was adjusted in sm4_set_key() + # Encrypt with all keys + @{[vle32_v $vk0, $keys]} # rk[0:3] + @{[vsm4r_vs $vdata, $vk0]} + addi $keys, $keys, 16 + @{[vle32_v $vk1, $keys]} # rk[4:7] + @{[vsm4r_vs $vdata, $vk1]} + addi $keys, $keys, 16 + @{[vle32_v $vk2, $keys]} # rk[8:11] + @{[vsm4r_vs $vdata, $vk2]} + addi $keys, $keys, 16 + @{[vle32_v $vk3, $keys]} # rk[12:15] + @{[vsm4r_vs $vdata, $vk3]} + addi $keys, $keys, 16 + @{[vle32_v $vk4, $keys]} # rk[16:19] + @{[vsm4r_vs $vdata, $vk4]} + addi $keys, $keys, 16 + @{[vle32_v $vk5, $keys]} # rk[20:23] + @{[vsm4r_vs $vdata, $vk5]} + addi $keys, $keys, 16 + @{[vle32_v $vk6, $keys]} # rk[24:27] + @{[vsm4r_vs $vdata, $vk6]} + addi $keys, $keys, 16 + @{[vle32_v $vk7, $keys]} # rk[28:31] + @{[vsm4r_vs $vdata, $vk7]} + + # Save the ciphertext (in reverse element order) + @{[vrev8_v $vdata, $vdata]} + li $stride, -4 + addi $out, $out, 12 + @{[vsse32_v $vdata, $out, $stride]} + + ret +.size rv64i_zvksed_sm4_encrypt,.-rv64i_zvksed_sm4_encrypt +___ +} + +#### +# void rv64i_zvksed_sm4_decrypt(const unsigned char *in, unsigned char *out, +# const SM4_KEY *key); +# +{ +my ($in,$out,$keys,$stride)=("a0","a1","a2","t0"); +my ($vdata,$vk0,$vk1,$vk2,$vk3,$vk4,$vk5,$vk6,$vk7,$vgen)=("v1","v2","v3","v4","v5","v6","v7","v8","v9","v10"); +$code .= <<___; +.p2align 3 +.globl rv64i_zvksed_sm4_decrypt +.type rv64i_zvksed_sm4_decrypt,\@function +rv64i_zvksed_sm4_decrypt: + @{[vsetivli "zero", 4, "e32", "m1", "ta", "ma"]} + + # Load input data + @{[vle32_v $vdata, $in]} + @{[vrev8_v $vdata, $vdata]} + + # Order of key elements was adjusted in sm4_set_key() + # Decrypt with all keys + @{[vle32_v $vk7, $keys]} # rk[31:28] + @{[vsm4r_vs $vdata, $vk7]} + addi $keys, $keys, 16 + @{[vle32_v $vk6, $keys]} # rk[27:24] + @{[vsm4r_vs $vdata, $vk6]} + addi $keys, $keys, 16 + @{[vle32_v $vk5, $keys]} # rk[23:20] + @{[vsm4r_vs $vdata, $vk5]} + addi $keys, $keys, 16 + @{[vle32_v $vk4, $keys]} # rk[19:16] + @{[vsm4r_vs $vdata, $vk4]} + addi $keys, $keys, 16 + @{[vle32_v $vk3, $keys]} # rk[15:11] + @{[vsm4r_vs $vdata, $vk3]} + addi $keys, $keys, 16 + @{[vle32_v $vk2, $keys]} # rk[11:8] + @{[vsm4r_vs $vdata, $vk2]} + addi $keys, $keys, 16 + @{[vle32_v $vk1, $keys]} # rk[7:4] + @{[vsm4r_vs $vdata, $vk1]} + addi $keys, $keys, 16 + @{[vle32_v $vk0, $keys]} # rk[3:0] + @{[vsm4r_vs $vdata, $vk0]} + + # Save the ciphertext (in reverse element order) + @{[vrev8_v $vdata, $vdata]} + li $stride, -4 + addi $out, $out, 12 + @{[vsse32_v $vdata, $out, $stride]} + + ret +.size rv64i_zvksed_sm4_decrypt,.-rv64i_zvksed_sm4_decrypt +___ +} + +$code .= <<___; +# Family Key (little-endian 32-bit chunks) +.p2align 3 +FK: + .word 0xA3B1BAC6, 0x56AA3350, 0x677D9197, 0xB27022DC +.size FK,.-FK +___ + +print $code; + +close STDOUT or die "error closing STDOUT: $!"; From patchwork Wed Oct 25 18:36:43 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Jerry Shih X-Patchwork-Id: 13436525 X-Patchwork-Delegate: herbert@gondor.apana.org.au Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6BDCEC07545 for ; Wed, 25 Oct 2023 18:39:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232076AbjJYSi7 (ORCPT ); Wed, 25 Oct 2023 14:38:59 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42862 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234981AbjJYSiQ (ORCPT ); Wed, 25 Oct 2023 14:38:16 -0400 Received: from mail-pj1-x102c.google.com (mail-pj1-x102c.google.com [IPv6:2607:f8b0:4864:20::102c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0A5171715 for ; Wed, 25 Oct 2023 11:37:40 -0700 (PDT) Received: by mail-pj1-x102c.google.com with SMTP id 98e67ed59e1d1-27d5c71b4d7so952585a91.1 for ; Wed, 25 Oct 2023 11:37:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sifive.com; s=google; t=1698259060; x=1698863860; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=rFfELb5x/LsOrFQ7gUXamutHrvvgbPVGCPs6bAZYMRw=; b=mc72EvBvCoo0wAMqIOaKcleEPNK1lXG56y953+1akpcGLMz18a/eKlKrf++3/vEm9m IhaEWKTJePlIiBxKTCw8lP/APOnRWR282TE5RmHQBTTPvrf8mSOrfQ7GpmLSLcIDfHw+ d1xzzIg6tpZWtuFy2+RvHu0lhcQN7JVLXOgD3CQNBGAkvDKGcus7eRt1oeMkOia9QnQH 0LIA1ScvWDOaQ3XFQBZo6MUhHPqtyYgNOV+TxqSC6GoiZ0ap51pNyka9Z0bbqG4H1qFV Ls4mwnwOTi7pEwk68LIEnGKHUxpASF/+nF/hOy3b4z1jpm0P+ndNLKBw3G79v4ULNL6s 4RdA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1698259060; x=1698863860; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=rFfELb5x/LsOrFQ7gUXamutHrvvgbPVGCPs6bAZYMRw=; b=dkWVtdxTIN3bQKi1oTajOAiNXIWBD1j69Guu6jLW+266/DJaMVWW4uAuhOZ0XBpprk TW2229BzMng6ag3o0N/mX6HVglIUajTvCRK48SaSR6NsorKJgxl2281xm3GBi25vma1p NlAUm2lRD1Ecrz5DUQxIvESAtvdfbuaa5VssbRsomY3ipOotawNpzzGs6DJBdabE39Rv DHyiEQ3dgogX7DkTsAXum+kvYUHx3aFjz99za90LaWZAYNCX1aNL1Adp7B0oD8asOxmj 1LN8WdFKqUa/t65YlaYMsNQnpUdhEH16ypnnVq36VybYPR9GJ+czC55An/ikeFYJ78fj 6pHQ== X-Gm-Message-State: AOJu0YwzPizuMC1lc+9z1CUINYBvJdXI4Oz6buofZ9Dp4mIVCb6UV4TX 1rebUa6vaSti6H9b+D/Dg0ghFA== X-Google-Smtp-Source: AGHT+IHABU2hd2qHwiwMTWq1S7m/e937VOUzR8QXiVJeSGfxSSxVPKGmahrvNlXL2eiKB3F4nLI5+Q== X-Received: by 2002:a17:90a:3f16:b0:27d:2d2f:9ca2 with SMTP id l22-20020a17090a3f1600b0027d2d2f9ca2mr462317pjc.24.1698259060209; Wed, 25 Oct 2023 11:37:40 -0700 (PDT) Received: from localhost.localdomain ([49.216.222.119]) by smtp.gmail.com with ESMTPSA id g3-20020a17090adb0300b00278f1512dd9sm212367pjv.32.2023.10.25.11.37.36 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Wed, 25 Oct 2023 11:37:39 -0700 (PDT) From: Jerry Shih To: paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu, herbert@gondor.apana.org.au, davem@davemloft.net Cc: andy.chiu@sifive.com, greentime.hu@sifive.com, conor.dooley@microchip.com, guoren@kernel.org, bjorn@rivosinc.com, heiko@sntech.de, ebiggers@kernel.org, ardb@kernel.org, phoebe.chen@sifive.com, hongrong.hsu@sifive.com, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, linux-crypto@vger.kernel.org Subject: [PATCH 11/12] RISC-V: crypto: add Zvksh accelerated SM3 implementation Date: Thu, 26 Oct 2023 02:36:43 +0800 Message-Id: <20231025183644.8735-12-jerry.shih@sifive.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: <20231025183644.8735-1-jerry.shih@sifive.com> References: <20231025183644.8735-1-jerry.shih@sifive.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org Add SM3 implementation using Zvksh vector crypto extension from OpenSSL (openssl/openssl#21923). Co-developed-by: Christoph Müllner Signed-off-by: Christoph Müllner Co-developed-by: Heiko Stuebner Signed-off-by: Heiko Stuebner Signed-off-by: Jerry Shih --- arch/riscv/crypto/Kconfig | 13 ++ arch/riscv/crypto/Makefile | 7 + arch/riscv/crypto/sm3-riscv64-glue.c | 121 +++++++++++++ arch/riscv/crypto/sm3-riscv64-zvksh.pl | 230 +++++++++++++++++++++++++ 4 files changed, 371 insertions(+) create mode 100644 arch/riscv/crypto/sm3-riscv64-glue.c create mode 100644 arch/riscv/crypto/sm3-riscv64-zvksh.pl diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig index ff004afa2874..2797b37394bb 100644 --- a/arch/riscv/crypto/Kconfig +++ b/arch/riscv/crypto/Kconfig @@ -75,6 +75,19 @@ config CRYPTO_SHA512_RISCV64 - Zvkb vector crypto extension - Zvknhb vector crypto extension +config CRYPTO_SM3_RISCV64 + default y if RISCV_ISA_V + tristate "Hash functions: SM3 (ShangMi 3)" + depends on 64BIT && RISCV_ISA_V + select CRYPTO_HASH + select CRYPTO_SM3 + help + SM3 (ShangMi 3) secure hash function (OSCCA GM/T 0004-2012) + + Architecture: riscv64 using: + - Zvkb vector crypto extension + - Zvksh vector crypto extension + config CRYPTO_SM4_RISCV64 default y if RISCV_ISA_V tristate "Ciphers: SM4 (ShangMi 4)" diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile index ffad095e531f..b772417703fd 100644 --- a/arch/riscv/crypto/Makefile +++ b/arch/riscv/crypto/Makefile @@ -18,6 +18,9 @@ sha256-riscv64-y := sha256-riscv64-glue.o sha256-riscv64-zvkb-zvknha_or_zvknhb.o obj-$(CONFIG_CRYPTO_SHA512_RISCV64) += sha512-riscv64.o sha512-riscv64-y := sha512-riscv64-glue.o sha512-riscv64-zvkb-zvknhb.o +obj-$(CONFIG_CRYPTO_SM3_RISCV64) += sm3-riscv64.o +sm3-riscv64-y := sm3-riscv64-glue.o sm3-riscv64-zvksh.o + obj-$(CONFIG_CRYPTO_SM4_RISCV64) += sm4-riscv64.o sm4-riscv64-y := sm4-riscv64-glue.o sm4-riscv64-zvksed.o @@ -42,6 +45,9 @@ $(obj)/sha256-riscv64-zvkb-zvknha_or_zvknhb.S: $(src)/sha256-riscv64-zvkb-zvknha $(obj)/sha512-riscv64-zvkb-zvknhb.S: $(src)/sha512-riscv64-zvkb-zvknhb.pl $(call cmd,perlasm) +$(obj)/sm3-riscv64-zvksh.S: $(src)/sm3-riscv64-zvksh.pl + $(call cmd,perlasm) + $(obj)/sm4-riscv64-zvksed.S: $(src)/sm4-riscv64-zvksed.pl $(call cmd,perlasm) @@ -51,4 +57,5 @@ clean-files += aes-riscv64-zvkb-zvkned.S clean-files += ghash-riscv64-zvkg.S clean-files += sha256-riscv64-zvkb-zvknha_or_zvknhb.S clean-files += sha512-riscv64-zvkb-zvknhb.S +clean-files += sm3-riscv64-zvksh.S clean-files += sm4-riscv64-zvksed.S diff --git a/arch/riscv/crypto/sm3-riscv64-glue.c b/arch/riscv/crypto/sm3-riscv64-glue.c new file mode 100644 index 000000000000..32d37b602e54 --- /dev/null +++ b/arch/riscv/crypto/sm3-riscv64-glue.c @@ -0,0 +1,121 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Linux/riscv64 port of the OpenSSL SM3 implementation for RISC-V 64 + * + * Copyright (C) 2023 VRULL GmbH + * Author: Heiko Stuebner + */ + +#include +#include +#include +#include +#include +#include +#include + +/* + * sm3 using zvksh vector crypto extension + * + * This asm function will just take the first 256-bit as the sm3 state from + * the pointer to `struct sm3_state`. + */ +void ossl_hwsm3_block_data_order_zvksh(struct sm3_state *digest, u8 const *o, + int num); + +static int riscv64_sm3_update(struct shash_desc *desc, const u8 *data, + unsigned int len) +{ + int ret = 0; + + /* + * Make sure struct sm3_state begins directly with the SM3 256-bit internal + * state, as this is what the asm function expect. + */ + BUILD_BUG_ON(offsetof(struct sm3_state, state) != 0); + + if (crypto_simd_usable()) { + kernel_vector_begin(); + ret = sm3_base_do_update(desc, data, len, + ossl_hwsm3_block_data_order_zvksh); + kernel_vector_end(); + } else { + sm3_update(shash_desc_ctx(desc), data, len); + } + + return ret; +} + +static int riscv64_sm3_finup(struct shash_desc *desc, const u8 *data, + unsigned int len, u8 *out) +{ + struct sm3_state *ctx; + + if (crypto_simd_usable()) { + kernel_vector_begin(); + if (len) + sm3_base_do_update(desc, data, len, + ossl_hwsm3_block_data_order_zvksh); + sm3_base_do_finalize(desc, ossl_hwsm3_block_data_order_zvksh); + kernel_vector_end(); + + return sm3_base_finish(desc, out); + } + + ctx = shash_desc_ctx(desc); + if (len) + sm3_update(ctx, data, len); + sm3_final(ctx, out); + + return 0; +} + +static int riscv64_sm3_final(struct shash_desc *desc, u8 *out) +{ + return riscv64_sm3_finup(desc, NULL, 0, out); +} + +static struct shash_alg sm3_alg = { + .digestsize = SM3_DIGEST_SIZE, + .init = sm3_base_init, + .update = riscv64_sm3_update, + .final = riscv64_sm3_final, + .finup = riscv64_sm3_finup, + .descsize = sizeof(struct sm3_state), + .base.cra_name = "sm3", + .base.cra_driver_name = "sm3-riscv64-zvkb-zvksh", + .base.cra_priority = 150, + .base.cra_blocksize = SM3_BLOCK_SIZE, + .base.cra_module = THIS_MODULE, +}; + +static inline bool check_sm3_ext(void) +{ + return riscv_isa_extension_available(NULL, ZVKSH) && + riscv_isa_extension_available(NULL, ZVKB) && + riscv_vector_vlen() >= 128; +} + +static int __init riscv64_riscv64_sm3_mod_init(void) +{ + if (check_sm3_ext()) + return crypto_register_shash(&sm3_alg); + + return -ENODEV; +} + +static void __exit riscv64_sm3_mod_fini(void) +{ + if (check_sm3_ext()) + crypto_unregister_shash(&sm3_alg); +} + +module_init(riscv64_riscv64_sm3_mod_init); +module_exit(riscv64_sm3_mod_fini); + +MODULE_DESCRIPTION("SM3 (RISC-V accelerated)"); +MODULE_AUTHOR("Andy Polyakov "); +MODULE_AUTHOR("Ard Biesheuvel "); +MODULE_AUTHOR("Heiko Stuebner "); +MODULE_LICENSE("GPL"); +MODULE_ALIAS_CRYPTO("sm3"); diff --git a/arch/riscv/crypto/sm3-riscv64-zvksh.pl b/arch/riscv/crypto/sm3-riscv64-zvksh.pl new file mode 100644 index 000000000000..942d78d982e9 --- /dev/null +++ b/arch/riscv/crypto/sm3-riscv64-zvksh.pl @@ -0,0 +1,230 @@ +#! /usr/bin/env perl +# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause +# +# This file is dual-licensed, meaning that you can use it under your +# choice of either of the following two licenses: +# +# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved. +# +# Licensed under the Apache License 2.0 (the "License"). You can obtain +# a copy in the file LICENSE in the source distribution or at +# https://www.openssl.org/source/license.html +# +# or +# +# Copyright (c) 2023, Christoph Müllner +# Copyright (c) 2023, Jerry Shih +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions +# are met: +# 1. Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# 2. Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +# The generated code of this file depends on the following RISC-V extensions: +# - RV64I +# - RISC-V Vector ('V') with VLEN >= 128 +# - RISC-V Vector Cryptography Bit-manipulation extension ('Zvkb') +# - RISC-V Vector SM3 Secure Hash extension ('Zvksh') + +use strict; +use warnings; + +use FindBin qw($Bin); +use lib "$Bin"; +use lib "$Bin/../../perlasm"; +use riscv; + +# $output is the last argument if it looks like a file (it has an extension) +# $flavour is the first argument if it doesn't look like a file +my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef; +my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef; + +$output and open STDOUT,">$output"; + +my $code=<<___; +.text +___ + +################################################################################ +# ossl_hwsm3_block_data_order_zvksh(SM3_CTX *c, const void *p, size_t num); +{ +my ($CTX, $INPUT, $NUM) = ("a0", "a1", "a2"); +my ($V0, $V1, $V2, $V3, $V4, $V5, $V6, $V7, + $V8, $V9, $V10, $V11, $V12, $V13, $V14, $V15, + $V16, $V17, $V18, $V19, $V20, $V21, $V22, $V23, + $V24, $V25, $V26, $V27, $V28, $V29, $V30, $V31, +) = map("v$_",(0..31)); + +$code .= <<___; +.text +.p2align 3 +.globl ossl_hwsm3_block_data_order_zvksh +.type ossl_hwsm3_block_data_order_zvksh,\@function +ossl_hwsm3_block_data_order_zvksh: + @{[vsetivli "zero", 8, "e32", "m2", "ta", "ma"]} + + # Load initial state of hash context (c->A-H). + @{[vle32_v $V0, $CTX]} + @{[vrev8_v $V0, $V0]} + +L_sm3_loop: + # Copy the previous state to v2. + # It will be XOR'ed with the current state at the end of the round. + @{[vmv_v_v $V2, $V0]} + + # Load the 64B block in 2x32B chunks. + @{[vle32_v $V6, $INPUT]} # v6 := {w7, ..., w0} + addi $INPUT, $INPUT, 32 + + @{[vle32_v $V8, $INPUT]} # v8 := {w15, ..., w8} + addi $INPUT, $INPUT, 32 + + addi $NUM, $NUM, -1 + + # As vsm3c consumes only w0, w1, w4, w5 we need to slide the input + # 2 elements down so we process elements w2, w3, w6, w7 + # This will be repeated for each odd round. + @{[vslidedown_vi $V4, $V6, 2]} # v4 := {X, X, w7, ..., w2} + + @{[vsm3c_vi $V0, $V6, 0]} + @{[vsm3c_vi $V0, $V4, 1]} + + # Prepare a vector with {w11, ..., w4} + @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, X, X, w7, ..., w4} + @{[vslideup_vi $V4, $V8, 4]} # v4 := {w11, w10, w9, w8, w7, w6, w5, w4} + + @{[vsm3c_vi $V0, $V4, 2]} + @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, w11, w10, w9, w8, w7, w6} + @{[vsm3c_vi $V0, $V4, 3]} + + @{[vsm3c_vi $V0, $V8, 4]} + @{[vslidedown_vi $V4, $V8, 2]} # v4 := {X, X, w15, w14, w13, w12, w11, w10} + @{[vsm3c_vi $V0, $V4, 5]} + + @{[vsm3me_vv $V6, $V8, $V6]} # v6 := {w23, w22, w21, w20, w19, w18, w17, w16} + + # Prepare a register with {w19, w18, w17, w16, w15, w14, w13, w12} + @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, X, X, w15, w14, w13, w12} + @{[vslideup_vi $V4, $V6, 4]} # v4 := {w19, w18, w17, w16, w15, w14, w13, w12} + + @{[vsm3c_vi $V0, $V4, 6]} + @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, w19, w18, w17, w16, w15, w14} + @{[vsm3c_vi $V0, $V4, 7]} + + @{[vsm3c_vi $V0, $V6, 8]} + @{[vslidedown_vi $V4, $V6, 2]} # v4 := {X, X, w23, w22, w21, w20, w19, w18} + @{[vsm3c_vi $V0, $V4, 9]} + + @{[vsm3me_vv $V8, $V6, $V8]} # v8 := {w31, w30, w29, w28, w27, w26, w25, w24} + + # Prepare a register with {w27, w26, w25, w24, w23, w22, w21, w20} + @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, X, X, w23, w22, w21, w20} + @{[vslideup_vi $V4, $V8, 4]} # v4 := {w27, w26, w25, w24, w23, w22, w21, w20} + + @{[vsm3c_vi $V0, $V4, 10]} + @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, w27, w26, w25, w24, w23, w22} + @{[vsm3c_vi $V0, $V4, 11]} + + @{[vsm3c_vi $V0, $V8, 12]} + @{[vslidedown_vi $V4, $V8, 2]} # v4 := {x, X, w31, w30, w29, w28, w27, w26} + @{[vsm3c_vi $V0, $V4, 13]} + + @{[vsm3me_vv $V6, $V8, $V6]} # v6 := {w32, w33, w34, w35, w36, w37, w38, w39} + + # Prepare a register with {w35, w34, w33, w32, w31, w30, w29, w28} + @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, X, X, w31, w30, w29, w28} + @{[vslideup_vi $V4, $V6, 4]} # v4 := {w35, w34, w33, w32, w31, w30, w29, w28} + + @{[vsm3c_vi $V0, $V4, 14]} + @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, w35, w34, w33, w32, w31, w30} + @{[vsm3c_vi $V0, $V4, 15]} + + @{[vsm3c_vi $V0, $V6, 16]} + @{[vslidedown_vi $V4, $V6, 2]} # v4 := {X, X, w39, w38, w37, w36, w35, w34} + @{[vsm3c_vi $V0, $V4, 17]} + + @{[vsm3me_vv $V8, $V6, $V8]} # v8 := {w47, w46, w45, w44, w43, w42, w41, w40} + + # Prepare a register with {w43, w42, w41, w40, w39, w38, w37, w36} + @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, X, X, w39, w38, w37, w36} + @{[vslideup_vi $V4, $V8, 4]} # v4 := {w43, w42, w41, w40, w39, w38, w37, w36} + + @{[vsm3c_vi $V0, $V4, 18]} + @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, w43, w42, w41, w40, w39, w38} + @{[vsm3c_vi $V0, $V4, 19]} + + @{[vsm3c_vi $V0, $V8, 20]} + @{[vslidedown_vi $V4, $V8, 2]} # v4 := {X, X, w47, w46, w45, w44, w43, w42} + @{[vsm3c_vi $V0, $V4, 21]} + + @{[vsm3me_vv $V6, $V8, $V6]} # v6 := {w55, w54, w53, w52, w51, w50, w49, w48} + + # Prepare a register with {w51, w50, w49, w48, w47, w46, w45, w44} + @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, X, X, w47, w46, w45, w44} + @{[vslideup_vi $V4, $V6, 4]} # v4 := {w51, w50, w49, w48, w47, w46, w45, w44} + + @{[vsm3c_vi $V0, $V4, 22]} + @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, w51, w50, w49, w48, w47, w46} + @{[vsm3c_vi $V0, $V4, 23]} + + @{[vsm3c_vi $V0, $V6, 24]} + @{[vslidedown_vi $V4, $V6, 2]} # v4 := {X, X, w55, w54, w53, w52, w51, w50} + @{[vsm3c_vi $V0, $V4, 25]} + + @{[vsm3me_vv $V8, $V6, $V8]} # v8 := {w63, w62, w61, w60, w59, w58, w57, w56} + + # Prepare a register with {w59, w58, w57, w56, w55, w54, w53, w52} + @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, X, X, w55, w54, w53, w52} + @{[vslideup_vi $V4, $V8, 4]} # v4 := {w59, w58, w57, w56, w55, w54, w53, w52} + + @{[vsm3c_vi $V0, $V4, 26]} + @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, w59, w58, w57, w56, w55, w54} + @{[vsm3c_vi $V0, $V4, 27]} + + @{[vsm3c_vi $V0, $V8, 28]} + @{[vslidedown_vi $V4, $V8, 2]} # v4 := {X, X, w63, w62, w61, w60, w59, w58} + @{[vsm3c_vi $V0, $V4, 29]} + + @{[vsm3me_vv $V6, $V8, $V6]} # v6 := {w71, w70, w69, w68, w67, w66, w65, w64} + + # Prepare a register with {w67, w66, w65, w64, w63, w62, w61, w60} + @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, X, X, w63, w62, w61, w60} + @{[vslideup_vi $V4, $V6, 4]} # v4 := {w67, w66, w65, w64, w63, w62, w61, w60} + + @{[vsm3c_vi $V0, $V4, 30]} + @{[vslidedown_vi $V4, $V4, 2]} # v4 := {X, X, w67, w66, w65, w64, w63, w62} + @{[vsm3c_vi $V0, $V4, 31]} + + # XOR in the previous state. + @{[vxor_vv $V0, $V0, $V2]} + + bnez $NUM, L_sm3_loop # Check if there are any more block to process +L_sm3_end: + @{[vrev8_v $V0, $V0]} + @{[vse32_v $V0, $CTX]} + ret + +.size ossl_hwsm3_block_data_order_zvksh,.-ossl_hwsm3_block_data_order_zvksh +___ +} + +print $code; + +close STDOUT or die "error closing STDOUT: $!"; From patchwork Wed Oct 25 18:36:44 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jerry Shih X-Patchwork-Id: 13436526 X-Patchwork-Delegate: herbert@gondor.apana.org.au Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0BBFFC25B6E for ; Wed, 25 Oct 2023 18:39:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235101AbjJYSjC (ORCPT ); Wed, 25 Oct 2023 14:39:02 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39588 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234800AbjJYSib (ORCPT ); Wed, 25 Oct 2023 14:38:31 -0400 Received: from mail-pg1-x536.google.com (mail-pg1-x536.google.com [IPv6:2607:f8b0:4864:20::536]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 41D8E172E for ; Wed, 25 Oct 2023 11:37:44 -0700 (PDT) Received: by mail-pg1-x536.google.com with SMTP id 41be03b00d2f7-517ab9a4a13so95058a12.1 for ; Wed, 25 Oct 2023 11:37:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sifive.com; s=google; t=1698259064; x=1698863864; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=Kt3WTfvjOI4LHPOMyEbRdhIq5hHcz3LzqQvN0TjWHcA=; b=kmqU2GTAA3zJ6DhfUM2XtTT9eX5DUOeyQVNJlC4dam4VL4wBlOfoOjhoR7IqHy62PH OVoSTot4sv+cbpsjVCHEM48KLrTrUJ3dTpXV5AwllttK5Wz2lejRynaLT4BaDmStUffV RCJokDmwcQZuw6906hGMxIgxuAOSRGRTxrc+P0JF4wE9+j/QukQUIOxRC84vgE/Iwiiq EbLugRWGk6CWv8j1HOPT64PtyKgy69my8QAPcDvfn0QUJojSvMjRpS/dAg87XZwzkVeL ZflsBvibo3bChHV72YxlIoC4K3ighhwHPjrvo4SyAz5BxRBqNpxByqQyY62PK8fH/Wog Nj6g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1698259064; x=1698863864; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Kt3WTfvjOI4LHPOMyEbRdhIq5hHcz3LzqQvN0TjWHcA=; b=QylwnQDbjp/iHLXC9ZFHz6jaY+XQ4T14qGoMKPmu3A8Xj/pBO+TiyBwOa77owRdXy4 7kUUxv4jSzGU3+msRhAiHlyJlOJV/EHofp1nwBdfkw6AX/gG5ZA/UqH9enVYLEsWkVj3 2XOqQ5w60A2efsAmjX7lveY0lRJ0/eR4d2FFj2LkskXF0980hTj3QoMzL4HEjcVT/MsD iRCMIVg5s2ADHMmno9HAy5f25RHMQAI5aIkgRWFO4arPwH30jEJMudlf7iU93UWk6jhM pBn36ebEjJdBhZYCwqBOUEAnnFVgJuC9FK9Jx27ICENmpWLHMZ1p2jzCAoL41+NmmBfj 3Tiw== X-Gm-Message-State: AOJu0YzYNG9Rf6UKFtAZuH1z2g2Lix1mRGKCAH10NDxpsgAbjb2x0pwa a4Hh3b7pcYi4UDNKvR1bZYIDjw== X-Google-Smtp-Source: AGHT+IFqpwEuw2xGu3F8rzTjT0tmzzxKRCVTrSIiTxria3fgff4tabeHnvS5oCOlsSllxSVw8iiIsg== X-Received: by 2002:a17:90a:a38d:b0:27d:3a34:2194 with SMTP id x13-20020a17090aa38d00b0027d3a342194mr15580807pjp.14.1698259064157; Wed, 25 Oct 2023 11:37:44 -0700 (PDT) Received: from localhost.localdomain ([49.216.222.119]) by smtp.gmail.com with ESMTPSA id g3-20020a17090adb0300b00278f1512dd9sm212367pjv.32.2023.10.25.11.37.40 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Wed, 25 Oct 2023 11:37:43 -0700 (PDT) From: Jerry Shih To: paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu, herbert@gondor.apana.org.au, davem@davemloft.net Cc: andy.chiu@sifive.com, greentime.hu@sifive.com, conor.dooley@microchip.com, guoren@kernel.org, bjorn@rivosinc.com, heiko@sntech.de, ebiggers@kernel.org, ardb@kernel.org, phoebe.chen@sifive.com, hongrong.hsu@sifive.com, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, linux-crypto@vger.kernel.org Subject: [PATCH 12/12] RISC-V: crypto: add Zvkb accelerated ChaCha20 implementation Date: Thu, 26 Oct 2023 02:36:44 +0800 Message-Id: <20231025183644.8735-13-jerry.shih@sifive.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: <20231025183644.8735-1-jerry.shih@sifive.com> References: <20231025183644.8735-1-jerry.shih@sifive.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org Add a ChaCha20 vector implementation from OpenSSL(openssl/openssl#21923). Signed-off-by: Jerry Shih --- arch/riscv/crypto/Kconfig | 12 + arch/riscv/crypto/Makefile | 7 + arch/riscv/crypto/chacha-riscv64-glue.c | 120 +++++++++ arch/riscv/crypto/chacha-riscv64-zvkb.pl | 322 +++++++++++++++++++++++ 4 files changed, 461 insertions(+) create mode 100644 arch/riscv/crypto/chacha-riscv64-glue.c create mode 100644 arch/riscv/crypto/chacha-riscv64-zvkb.pl diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig index 2797b37394bb..41ce453afafa 100644 --- a/arch/riscv/crypto/Kconfig +++ b/arch/riscv/crypto/Kconfig @@ -35,6 +35,18 @@ config CRYPTO_AES_BLOCK_RISCV64 - Zvkg vector crypto extension (XTS) - Zvkned vector crypto extension +config CRYPTO_CHACHA20_RISCV64 + default y if RISCV_ISA_V + tristate "Ciphers: ChaCha20" + depends on 64BIT && RISCV_ISA_V + select CRYPTO_SKCIPHER + select CRYPTO_LIB_CHACHA_GENERIC + help + Length-preserving ciphers: ChaCha20 stream cipher algorithm + + Architecture: riscv64 using: + - Zvkb vector crypto extension + config CRYPTO_GHASH_RISCV64 default y if RISCV_ISA_V tristate "Hash functions: GHASH" diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile index b772417703fd..80b0ebc956a3 100644 --- a/arch/riscv/crypto/Makefile +++ b/arch/riscv/crypto/Makefile @@ -9,6 +9,9 @@ aes-riscv64-y := aes-riscv64-glue.o aes-riscv64-zvkned.o obj-$(CONFIG_CRYPTO_AES_BLOCK_RISCV64) += aes-block-riscv64.o aes-block-riscv64-y := aes-riscv64-block-mode-glue.o aes-riscv64-zvbb-zvkg-zvkned.o aes-riscv64-zvkb-zvkned.o +obj-$(CONFIG_CRYPTO_CHACHA20_RISCV64) += chacha-riscv64.o +chacha-riscv64-y := chacha-riscv64-glue.o chacha-riscv64-zvkb.o + obj-$(CONFIG_CRYPTO_GHASH_RISCV64) += ghash-riscv64.o ghash-riscv64-y := ghash-riscv64-glue.o ghash-riscv64-zvkg.o @@ -36,6 +39,9 @@ $(obj)/aes-riscv64-zvbb-zvkg-zvkned.S: $(src)/aes-riscv64-zvbb-zvkg-zvkned.pl $(obj)/aes-riscv64-zvkb-zvkned.S: $(src)/aes-riscv64-zvkb-zvkned.pl $(call cmd,perlasm) +$(obj)/chacha-riscv64-zvkb.S: $(src)/chacha-riscv64-zvkb.pl + $(call cmd,perlasm) + $(obj)/ghash-riscv64-zvkg.S: $(src)/ghash-riscv64-zvkg.pl $(call cmd,perlasm) @@ -54,6 +60,7 @@ $(obj)/sm4-riscv64-zvksed.S: $(src)/sm4-riscv64-zvksed.pl clean-files += aes-riscv64-zvkned.S clean-files += aes-riscv64-zvbb-zvkg-zvkned.S clean-files += aes-riscv64-zvkb-zvkned.S +clean-files += chacha-riscv64-zvkb.S clean-files += ghash-riscv64-zvkg.S clean-files += sha256-riscv64-zvkb-zvknha_or_zvknhb.S clean-files += sha512-riscv64-zvkb-zvknhb.S diff --git a/arch/riscv/crypto/chacha-riscv64-glue.c b/arch/riscv/crypto/chacha-riscv64-glue.c new file mode 100644 index 000000000000..72011949f705 --- /dev/null +++ b/arch/riscv/crypto/chacha-riscv64-glue.c @@ -0,0 +1,120 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Port of the OpenSSL ChaCha20 implementation for RISC-V 64 + * + * Copyright (C) 2023 SiFive, Inc. + * Author: Jerry Shih + */ + +#include +#include +#include +#include +#include +#include +#include +#include + +#define CHACHA_BLOCK_VALID_SIZE_MASK (~(CHACHA_BLOCK_SIZE - 1)) +#define CHACHA_BLOCK_REMAINING_SIZE_MASK (CHACHA_BLOCK_SIZE - 1) +#define CHACHA_KEY_OFFSET 4 +#define CHACHA_IV_OFFSET 12 + +/* chacha20 using zvkb vector crypto extension */ +void ChaCha20_ctr32_zvkb(u8 *out, const u8 *input, size_t len, const u32 *key, + const u32 *counter); + +static int chacha20_encrypt(struct skcipher_request *req) +{ + u32 state[CHACHA_STATE_WORDS]; + u8 block_buffer[CHACHA_BLOCK_SIZE]; + struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); + const struct chacha_ctx *ctx = crypto_skcipher_ctx(tfm); + struct skcipher_walk walk; + unsigned int nbytes; + unsigned int tail_bytes; + int err; + + chacha_init_generic(state, ctx->key, req->iv); + + err = skcipher_walk_virt(&walk, req, false); + while (walk.nbytes) { + nbytes = walk.nbytes & CHACHA_BLOCK_VALID_SIZE_MASK; + tail_bytes = walk.nbytes & CHACHA_BLOCK_REMAINING_SIZE_MASK; + kernel_vector_begin(); + if (nbytes) { + ChaCha20_ctr32_zvkb(walk.dst.virt.addr, + walk.src.virt.addr, nbytes, + state + CHACHA_KEY_OFFSET, + state + CHACHA_IV_OFFSET); + state[CHACHA_IV_OFFSET] += nbytes / CHACHA_BLOCK_SIZE; + } + if (walk.nbytes == walk.total && tail_bytes > 0) { + memcpy(block_buffer, walk.src.virt.addr + nbytes, + tail_bytes); + ChaCha20_ctr32_zvkb(block_buffer, block_buffer, + CHACHA_BLOCK_SIZE, + state + CHACHA_KEY_OFFSET, + state + CHACHA_IV_OFFSET); + memcpy(walk.dst.virt.addr + nbytes, block_buffer, + tail_bytes); + tail_bytes = 0; + } + kernel_vector_end(); + + err = skcipher_walk_done(&walk, tail_bytes); + } + + return err; +} + +static struct skcipher_alg riscv64_chacha_alg_zvkb[] = { { + .base = { + .cra_name = "chacha20", + .cra_driver_name = "chacha20-riscv64-zvkb", + .cra_priority = 300, + .cra_blocksize = 1, + .cra_ctxsize = sizeof(struct chacha_ctx), + .cra_module = THIS_MODULE, + }, + .min_keysize = CHACHA_KEY_SIZE, + .max_keysize = CHACHA_KEY_SIZE, + .ivsize = CHACHA_IV_SIZE, + .chunksize = CHACHA_BLOCK_SIZE, + .walksize = CHACHA_BLOCK_SIZE * 4, + .setkey = chacha20_setkey, + .encrypt = chacha20_encrypt, + .decrypt = chacha20_encrypt, +} }; + +static inline bool check_chacha20_ext(void) +{ + return riscv_isa_extension_available(NULL, ZVKB) && + riscv_vector_vlen() >= 128; +} + +static int __init riscv64_chacha_mod_init(void) +{ + if (check_chacha20_ext()) + return crypto_register_skciphers( + riscv64_chacha_alg_zvkb, + ARRAY_SIZE(riscv64_chacha_alg_zvkb)); + + return -ENODEV; +} + +static void __exit riscv64_chacha_mod_fini(void) +{ + if (check_chacha20_ext()) + crypto_unregister_skciphers( + riscv64_chacha_alg_zvkb, + ARRAY_SIZE(riscv64_chacha_alg_zvkb)); +} + +module_init(riscv64_chacha_mod_init); +module_exit(riscv64_chacha_mod_fini); + +MODULE_DESCRIPTION("ChaCha20 (RISC-V accelerated)"); +MODULE_AUTHOR("Jerry Shih "); +MODULE_LICENSE("GPL"); +MODULE_ALIAS_CRYPTO("chacha20"); diff --git a/arch/riscv/crypto/chacha-riscv64-zvkb.pl b/arch/riscv/crypto/chacha-riscv64-zvkb.pl new file mode 100644 index 000000000000..9caf7b247804 --- /dev/null +++ b/arch/riscv/crypto/chacha-riscv64-zvkb.pl @@ -0,0 +1,322 @@ +#! /usr/bin/env perl +# SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause +# +# This file is dual-licensed, meaning that you can use it under your +# choice of either of the following two licenses: +# +# Copyright 2023-2023 The OpenSSL Project Authors. All Rights Reserved. +# +# Licensed under the Apache License 2.0 (the "License"). You may not use +# this file except in compliance with the License. You can obtain a copy +# in the file LICENSE in the source distribution or at +# https://www.openssl.org/source/license.html +# +# or +# +# Copyright (c) 2023, Jerry Shih +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions +# are met: +# 1. Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# 2. Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +# - RV64I +# - RISC-V Vector ('V') with VLEN >= 128 +# - RISC-V Vector Cryptography Bit-manipulation extension ('Zvkb') +# - RISC-V Zicclsm(Main memory supports misaligned loads/stores) + +use strict; +use warnings; + +use FindBin qw($Bin); +use lib "$Bin"; +use lib "$Bin/../../perlasm"; +use riscv; + +# $output is the last argument if it looks like a file (it has an extension) +# $flavour is the first argument if it doesn't look like a file +my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef; +my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef; + +$output and open STDOUT, ">$output"; + +my $code = <<___; +.text +___ + +# void ChaCha20_ctr32_zvkb(unsigned char *out, const unsigned char *inp, +# size_t len, const unsigned int key[8], +# const unsigned int counter[4]); +################################################################################ +my ( $OUTPUT, $INPUT, $LEN, $KEY, $COUNTER ) = ( "a0", "a1", "a2", "a3", "a4" ); +my ( $T0 ) = ( "t0" ); +my ( $CONST_DATA0, $CONST_DATA1, $CONST_DATA2, $CONST_DATA3 ) = + ( "a5", "a6", "a7", "t1" ); +my ( $KEY0, $KEY1, $KEY2,$KEY3, $KEY4, $KEY5, $KEY6, $KEY7, + $COUNTER0, $COUNTER1, $NONCE0, $NONCE1 +) = ( "s0", "s1", "s2", "s3", "s4", "s5", "s6", + "s7", "s8", "s9", "s10", "s11" ); +my ( $VL, $STRIDE, $CHACHA_LOOP_COUNT ) = ( "t2", "t3", "t4" ); +my ( + $V0, $V1, $V2, $V3, $V4, $V5, $V6, $V7, $V8, $V9, $V10, + $V11, $V12, $V13, $V14, $V15, $V16, $V17, $V18, $V19, $V20, $V21, + $V22, $V23, $V24, $V25, $V26, $V27, $V28, $V29, $V30, $V31, +) = map( "v$_", ( 0 .. 31 ) ); + +sub chacha_quad_round_group { + my ( + $A0, $B0, $C0, $D0, $A1, $B1, $C1, $D1, + $A2, $B2, $C2, $D2, $A3, $B3, $C3, $D3 + ) = @_; + + my $code = <<___; + # a += b; d ^= a; d <<<= 16; + @{[vadd_vv $A0, $A0, $B0]} + @{[vadd_vv $A1, $A1, $B1]} + @{[vadd_vv $A2, $A2, $B2]} + @{[vadd_vv $A3, $A3, $B3]} + @{[vxor_vv $D0, $D0, $A0]} + @{[vxor_vv $D1, $D1, $A1]} + @{[vxor_vv $D2, $D2, $A2]} + @{[vxor_vv $D3, $D3, $A3]} + @{[vror_vi $D0, $D0, 32 - 16]} + @{[vror_vi $D1, $D1, 32 - 16]} + @{[vror_vi $D2, $D2, 32 - 16]} + @{[vror_vi $D3, $D3, 32 - 16]} + # c += d; b ^= c; b <<<= 12; + @{[vadd_vv $C0, $C0, $D0]} + @{[vadd_vv $C1, $C1, $D1]} + @{[vadd_vv $C2, $C2, $D2]} + @{[vadd_vv $C3, $C3, $D3]} + @{[vxor_vv $B0, $B0, $C0]} + @{[vxor_vv $B1, $B1, $C1]} + @{[vxor_vv $B2, $B2, $C2]} + @{[vxor_vv $B3, $B3, $C3]} + @{[vror_vi $B0, $B0, 32 - 12]} + @{[vror_vi $B1, $B1, 32 - 12]} + @{[vror_vi $B2, $B2, 32 - 12]} + @{[vror_vi $B3, $B3, 32 - 12]} + # a += b; d ^= a; d <<<= 8; + @{[vadd_vv $A0, $A0, $B0]} + @{[vadd_vv $A1, $A1, $B1]} + @{[vadd_vv $A2, $A2, $B2]} + @{[vadd_vv $A3, $A3, $B3]} + @{[vxor_vv $D0, $D0, $A0]} + @{[vxor_vv $D1, $D1, $A1]} + @{[vxor_vv $D2, $D2, $A2]} + @{[vxor_vv $D3, $D3, $A3]} + @{[vror_vi $D0, $D0, 32 - 8]} + @{[vror_vi $D1, $D1, 32 - 8]} + @{[vror_vi $D2, $D2, 32 - 8]} + @{[vror_vi $D3, $D3, 32 - 8]} + # c += d; b ^= c; b <<<= 7; + @{[vadd_vv $C0, $C0, $D0]} + @{[vadd_vv $C1, $C1, $D1]} + @{[vadd_vv $C2, $C2, $D2]} + @{[vadd_vv $C3, $C3, $D3]} + @{[vxor_vv $B0, $B0, $C0]} + @{[vxor_vv $B1, $B1, $C1]} + @{[vxor_vv $B2, $B2, $C2]} + @{[vxor_vv $B3, $B3, $C3]} + @{[vror_vi $B0, $B0, 32 - 7]} + @{[vror_vi $B1, $B1, 32 - 7]} + @{[vror_vi $B2, $B2, 32 - 7]} + @{[vror_vi $B3, $B3, 32 - 7]} +___ + + return $code; +} + +$code .= <<___; +.p2align 3 +.globl ChaCha20_ctr32_zvkb +.type ChaCha20_ctr32_zvkb,\@function +ChaCha20_ctr32_zvkb: + srli $LEN, $LEN, 6 + beqz $LEN, .Lend + + addi sp, sp, -96 + sd s0, 0(sp) + sd s1, 8(sp) + sd s2, 16(sp) + sd s3, 24(sp) + sd s4, 32(sp) + sd s5, 40(sp) + sd s6, 48(sp) + sd s7, 56(sp) + sd s8, 64(sp) + sd s9, 72(sp) + sd s10, 80(sp) + sd s11, 88(sp) + + li $STRIDE, 64 + + #### chacha block data + # "expa" little endian + li $CONST_DATA0, 0x61707865 + # "nd 3" little endian + li $CONST_DATA1, 0x3320646e + # "2-by" little endian + li $CONST_DATA2, 0x79622d32 + # "te k" little endian + li $CONST_DATA3, 0x6b206574 + + lw $KEY0, 0($KEY) + lw $KEY1, 4($KEY) + lw $KEY2, 8($KEY) + lw $KEY3, 12($KEY) + lw $KEY4, 16($KEY) + lw $KEY5, 20($KEY) + lw $KEY6, 24($KEY) + lw $KEY7, 28($KEY) + + lw $COUNTER0, 0($COUNTER) + lw $COUNTER1, 4($COUNTER) + lw $NONCE0, 8($COUNTER) + lw $NONCE1, 12($COUNTER) + +.Lblock_loop: + @{[vsetvli $VL, $LEN, "e32", "m1", "ta", "ma"]} + + # init chacha const states + @{[vmv_v_x $V0, $CONST_DATA0]} + @{[vmv_v_x $V1, $CONST_DATA1]} + @{[vmv_v_x $V2, $CONST_DATA2]} + @{[vmv_v_x $V3, $CONST_DATA3]} + + # init chacha key states + @{[vmv_v_x $V4, $KEY0]} + @{[vmv_v_x $V5, $KEY1]} + @{[vmv_v_x $V6, $KEY2]} + @{[vmv_v_x $V7, $KEY3]} + @{[vmv_v_x $V8, $KEY4]} + @{[vmv_v_x $V9, $KEY5]} + @{[vmv_v_x $V10, $KEY6]} + @{[vmv_v_x $V11, $KEY7]} + + # init chacha key states + @{[vid_v $V12]} + @{[vadd_vx $V12, $V12, $COUNTER0]} + @{[vmv_v_x $V13, $COUNTER1]} + + # init chacha nonce states + @{[vmv_v_x $V14, $NONCE0]} + @{[vmv_v_x $V15, $NONCE1]} + + # load the top-half of input data + @{[vlsseg_nf_e32_v 8, $V16, $INPUT, $STRIDE]} + + li $CHACHA_LOOP_COUNT, 10 +.Lround_loop: + addi $CHACHA_LOOP_COUNT, $CHACHA_LOOP_COUNT, -1 + @{[chacha_quad_round_group + $V0, $V4, $V8, $V12, + $V1, $V5, $V9, $V13, + $V2, $V6, $V10, $V14, + $V3, $V7, $V11, $V15]} + @{[chacha_quad_round_group + $V0, $V5, $V10, $V15, + $V1, $V6, $V11, $V12, + $V2, $V7, $V8, $V13, + $V3, $V4, $V9, $V14]} + bnez $CHACHA_LOOP_COUNT, .Lround_loop + + # load the bottom-half of input data + addi $T0, $INPUT, 32 + @{[vlsseg_nf_e32_v 8, $V24, $T0, $STRIDE]} + + # add chacha top-half initial block states + @{[vadd_vx $V0, $V0, $CONST_DATA0]} + @{[vadd_vx $V1, $V1, $CONST_DATA1]} + @{[vadd_vx $V2, $V2, $CONST_DATA2]} + @{[vadd_vx $V3, $V3, $CONST_DATA3]} + @{[vadd_vx $V4, $V4, $KEY0]} + @{[vadd_vx $V5, $V5, $KEY1]} + @{[vadd_vx $V6, $V6, $KEY2]} + @{[vadd_vx $V7, $V7, $KEY3]} + # xor with the top-half input + @{[vxor_vv $V16, $V16, $V0]} + @{[vxor_vv $V17, $V17, $V1]} + @{[vxor_vv $V18, $V18, $V2]} + @{[vxor_vv $V19, $V19, $V3]} + @{[vxor_vv $V20, $V20, $V4]} + @{[vxor_vv $V21, $V21, $V5]} + @{[vxor_vv $V22, $V22, $V6]} + @{[vxor_vv $V23, $V23, $V7]} + + # save the top-half of output + @{[vssseg_nf_e32_v 8, $V16, $OUTPUT, $STRIDE]} + + # add chacha bottom-half initial block states + @{[vadd_vx $V8, $V8, $KEY4]} + @{[vadd_vx $V9, $V9, $KEY5]} + @{[vadd_vx $V10, $V10, $KEY6]} + @{[vadd_vx $V11, $V11, $KEY7]} + @{[vid_v $V0]} + @{[vadd_vx $V12, $V12, $COUNTER0]} + @{[vadd_vx $V13, $V13, $COUNTER1]} + @{[vadd_vx $V14, $V14, $NONCE0]} + @{[vadd_vx $V15, $V15, $NONCE1]} + @{[vadd_vv $V12, $V12, $V0]} + # xor with the bottom-half input + @{[vxor_vv $V24, $V24, $V8]} + @{[vxor_vv $V25, $V25, $V9]} + @{[vxor_vv $V26, $V26, $V10]} + @{[vxor_vv $V27, $V27, $V11]} + @{[vxor_vv $V29, $V29, $V13]} + @{[vxor_vv $V28, $V28, $V12]} + @{[vxor_vv $V30, $V30, $V14]} + @{[vxor_vv $V31, $V31, $V15]} + + # save the bottom-half of output + addi $T0, $OUTPUT, 32 + @{[vssseg_nf_e32_v 8, $V24, $T0, $STRIDE]} + + # update counter + add $COUNTER0, $COUNTER0, $VL + sub $LEN, $LEN, $VL + # increase offset for `4 * 16 * VL = 64 * VL` + slli $T0, $VL, 6 + add $INPUT, $INPUT, $T0 + add $OUTPUT, $OUTPUT, $T0 + bnez $LEN, .Lblock_loop + + ld s0, 0(sp) + ld s1, 8(sp) + ld s2, 16(sp) + ld s3, 24(sp) + ld s4, 32(sp) + ld s5, 40(sp) + ld s6, 48(sp) + ld s7, 56(sp) + ld s8, 64(sp) + ld s9, 72(sp) + ld s10, 80(sp) + ld s11, 88(sp) + addi sp, sp, 96 + +.Lend: + ret +.size ChaCha20_ctr32_zvkb,.-ChaCha20_ctr32_zvkb +___ + +print $code; + +close STDOUT or die "error closing STDOUT: $!";