diff mbox series

[3/3] crypto: Zhaoxin: Hardware Engine Driver for SHA1/256/384/512

Message ID 20240116063549.3016-4-TonyWWang-oc@zhaoxin.com (mailing list archive)
State Superseded
Delegated to: Herbert Xu
Headers show
Series Add Zhaoxin hardware engine driver support for SHA | expand

Commit Message

Tony W Wang-oc Jan. 16, 2024, 6:35 a.m. UTC
Zhaoxin CPUs have implemented the SHA(Secure Hash Algorithm) as its CPU
instructions, including SHA1, SHA256, SHA384 and SHA512, which conform
to the Secure Hash Algorithms specified by FIPS 180-3.

With the help of implementation of SHA in hardware instead of software,
can develop applications with higher performance, more security and more
flexibility.

Below table gives a summary of test using the driver tcrypt with different
crypt algorithm drivers on Zhaoxin KH-40000 platform:
---------------------------------------------------------------------------
tcrypt     driver   16*    64      256     1024    2048    4096    8192
---------------------------------------------------------------------------
           zhaoxin** 442.80 1309.21 3257.53 5221.56 5813.45 6136.39 6264.50***
403:SHA1   generic** 341.44 813.27  1458.98 1818.03 1896.60 1940.71 1939.06
           ratio    1.30   1.61    2.23    2.87    3.07    3.16    3.23
---------------------------------------------------------------------------
           zhaoxin  451.70 1313.65 2958.71 4658.55 5109.16 5359.08 5459.13
404:SHA256 generic  202.62 463.55  845.01  1070.50 1117.51 1144.79 1155.68
           ratio    2.23   2.83    3.50    4.35    4.57    4.68    4.72
---------------------------------------------------------------------------
           zhaoxin  350.90 1406.42 3166.16 5736.39 6627.77 7182.01 7429.18
405:SHA384 generic  161.76 654.88  979.06  1350.56 1423.08 1496.57 1513.12
           ratio    2.17   2.15    3.23    4.25    4.66    4.80    4.91
---------------------------------------------------------------------------
           zhaoxin  334.49 1394.71 3159.93 5728.86 6625.33 7169.23 7407.80
406:SHA512 generic  161.80 653.84  979.42  1351.41 1444.14 1495.35 1518.43
           ratio    2.07   2.13    3.23    4.24    4.59    4.79    4.88
---------------------------------------------------------------------------
*: The length of each data block to be processed by one complete SHA
   sequence, namely one INIT, multi UPDATEs and one FINAL.
**: Crypt algorithm driver used by tcrypt, "zhaoxin" represents zhaoxin-sha
   while "generic" represents the generic software SHA driver.
***: The speed of each crypt algorithm driver processing different length
   of data blocks, unit is Mb/s.

The ratio in the table implies the performance of SHA implemented by
zhaoxin-sha driver is much higher than the ones implemented by the generic
software driver of sha1/sha256/sha384/sha512.

Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
---
 drivers/crypto/Kconfig       |  15 ++
 drivers/crypto/Makefile      |   1 +
 drivers/crypto/zhaoxin-sha.c | 500 +++++++++++++++++++++++++++++++++++
 drivers/crypto/zhaoxin-sha.h |  16 ++
 4 files changed, 532 insertions(+)
 create mode 100644 drivers/crypto/zhaoxin-sha.c
 create mode 100644 drivers/crypto/zhaoxin-sha.h

Comments

kernel test robot Jan. 17, 2024, 12:45 a.m. UTC | #1
Hi Tony,

kernel test robot noticed the following build errors:

[auto build test ERROR on herbert-cryptodev-2.6/master]
[also build test ERROR on tip/x86/core linus/master v6.7 next-20240112]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Tony-W-Wang-oc/crypto-padlock-sha-Matches-CPU-with-Family-with-6-explicitly/20240116-144827
base:   https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git master
patch link:    https://lore.kernel.org/r/20240116063549.3016-4-TonyWWang-oc%40zhaoxin.com
patch subject: [PATCH 3/3] crypto: Zhaoxin: Hardware Engine Driver for SHA1/256/384/512
config: hexagon-randconfig-r071-20240117 (https://download.01.org/0day-ci/archive/20240117/202401170833.HWvPThMS-lkp@intel.com/config)
compiler: clang version 14.0.6 (https://github.com/llvm/llvm-project.git f28c006a5895fc0e329fe15fead81e37457cb1d1)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240117/202401170833.HWvPThMS-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202401170833.HWvPThMS-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from drivers/crypto/zhaoxin-sha.c:17:
   In file included from include/linux/interrupt.h:11:
   In file included from include/linux/hardirq.h:11:
   In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1:
   In file included from include/asm-generic/hardirq.h:17:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/hexagon/include/asm/io.h:337:
   include/asm-generic/io.h:547:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __raw_readb(PCI_IOBASE + addr);
                             ~~~~~~~~~~ ^
   include/asm-generic/io.h:560:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/little_endian.h:37:51: note: expanded from macro '__le16_to_cpu'
   #define __le16_to_cpu(x) ((__force __u16)(__le16)(x))
                                                     ^
   In file included from drivers/crypto/zhaoxin-sha.c:17:
   In file included from include/linux/interrupt.h:11:
   In file included from include/linux/hardirq.h:11:
   In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1:
   In file included from include/asm-generic/hardirq.h:17:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/hexagon/include/asm/io.h:337:
   include/asm-generic/io.h:573:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/little_endian.h:35:51: note: expanded from macro '__le32_to_cpu'
   #define __le32_to_cpu(x) ((__force __u32)(__le32)(x))
                                                     ^
   In file included from drivers/crypto/zhaoxin-sha.c:17:
   In file included from include/linux/interrupt.h:11:
   In file included from include/linux/hardirq.h:11:
   In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1:
   In file included from include/asm-generic/hardirq.h:17:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/hexagon/include/asm/io.h:337:
   include/asm-generic/io.h:584:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writeb(value, PCI_IOBASE + addr);
                               ~~~~~~~~~~ ^
   include/asm-generic/io.h:594:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
                                                         ~~~~~~~~~~ ^
   include/asm-generic/io.h:604:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
                                                         ~~~~~~~~~~ ^
>> drivers/crypto/zhaoxin-sha.c:20:10: fatal error: 'asm/cpu_device_id.h' file not found
   #include <asm/cpu_device_id.h>
            ^~~~~~~~~~~~~~~~~~~~~
   6 warnings and 1 error generated.


vim +20 drivers/crypto/zhaoxin-sha.c

  > 20	#include <asm/cpu_device_id.h>
    21	#include <asm/fpu/api.h>
    22	#include "zhaoxin-sha.h"
    23
kernel test robot Jan. 19, 2024, 5:04 p.m. UTC | #2
Hi Tony,

kernel test robot noticed the following build errors:

[auto build test ERROR on herbert-cryptodev-2.6/master]
[also build test ERROR on tip/x86/core linus/master v6.7 next-20240119]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Tony-W-Wang-oc/crypto-padlock-sha-Matches-CPU-with-Family-with-6-explicitly/20240116-144827
base:   https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git master
patch link:    https://lore.kernel.org/r/20240116063549.3016-4-TonyWWang-oc%40zhaoxin.com
patch subject: [PATCH 3/3] crypto: Zhaoxin: Hardware Engine Driver for SHA1/256/384/512
config: loongarch-allyesconfig (https://download.01.org/0day-ci/archive/20240120/202401200020.g4KDJOTm-lkp@intel.com/config)
compiler: loongarch64-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240120/202401200020.g4KDJOTm-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202401200020.g4KDJOTm-lkp@intel.com/

All errors (new ones prefixed by >>):

>> drivers/crypto/zhaoxin-sha.c:20:10: fatal error: asm/cpu_device_id.h: No such file or directory
      20 | #include <asm/cpu_device_id.h>
         |          ^~~~~~~~~~~~~~~~~~~~~
   compilation terminated.


vim +20 drivers/crypto/zhaoxin-sha.c

  > 20	#include <asm/cpu_device_id.h>
    21	#include <asm/fpu/api.h>
    22	#include "zhaoxin-sha.h"
    23
diff mbox series

Patch

diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig
index 79c3bb9c99c3..2698e8fcf06d 100644
--- a/drivers/crypto/Kconfig
+++ b/drivers/crypto/Kconfig
@@ -798,4 +798,19 @@  config CRYPTO_DEV_SA2UL
 source "drivers/crypto/aspeed/Kconfig"
 source "drivers/crypto/starfive/Kconfig"
 
+config CRYPTO_DEV_ZHAOXIN_SHA
+	tristate "Support for Zhaoxin SHA1/SHA256/SHA384/SHA512 algorithms"
+	select CRYPTO_HASH
+	select CRYPTO_SHA1
+	select CRYPTO_SHA256
+	select CRYPTO_SHA384
+	select CRYPTO_SHA512
+	help
+	  Use Zhaoxin HW engine for SHA1/SHA256/SHA384/SHA512 algorithms.
+
+	  Available in ZX-C+ and newer processors.
+
+	  If unsure say M. The compiled module will be
+	  called zhaoxin-sha.
+
 endif # CRYPTO_HW
diff --git a/drivers/crypto/Makefile b/drivers/crypto/Makefile
index d859d6a5f3a4..b77c02d6dab7 100644
--- a/drivers/crypto/Makefile
+++ b/drivers/crypto/Makefile
@@ -51,3 +51,4 @@  obj-y += hisilicon/
 obj-$(CONFIG_CRYPTO_DEV_AMLOGIC_GXL) += amlogic/
 obj-y += intel/
 obj-y += starfive/
+obj-$(CONFIG_CRYPTO_DEV_ZHAOXIN_SHA) += zhaoxin-sha.o
diff --git a/drivers/crypto/zhaoxin-sha.c b/drivers/crypto/zhaoxin-sha.c
new file mode 100644
index 000000000000..bcfc77440890
--- /dev/null
+++ b/drivers/crypto/zhaoxin-sha.c
@@ -0,0 +1,500 @@ 
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Cryptographic API.
+ *
+ * Support for Zhaoxin hardware crypto engine.
+ *
+ * Copyright (c) 2023  George Xue <georgexue@zhaoxin.com>
+ */
+
+#include <crypto/internal/hash.h>
+#include <crypto/sha1.h>
+#include <crypto/sha2.h>
+#include <linux/err.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/errno.h>
+#include <linux/interrupt.h>
+#include <linux/kernel.h>
+#include <linux/scatterlist.h>
+#include <asm/cpu_device_id.h>
+#include <asm/fpu/api.h>
+#include "zhaoxin-sha.h"
+
+static inline void zhaoxin_output_block(uint32_t *src, uint32_t *dst, size_t count)
+{
+	while (count--)
+		*dst++ = swab32(*src++);
+}
+
+static int zhaoxin_sha1_init(struct shash_desc *desc)
+{
+	struct sha1_state *sctx = shash_desc_ctx(desc);
+
+	*sctx = (struct sha1_state){
+		.state = { SHA1_H0, SHA1_H1, SHA1_H2, SHA1_H3, SHA1_H4 },
+	};
+
+	return 0;
+}
+
+static int zhaoxin_sha1_update(struct shash_desc *desc, const u8 *data, unsigned int len)
+{
+	struct sha1_state *sctx = shash_desc_ctx(desc);
+	unsigned int partial, done;
+	const u8 *src;
+	u8 buf[SHA1_BLOCK_SIZE * 2];
+	u8 *dst = &buf[0];
+
+	partial = sctx->count & (SHA1_BLOCK_SIZE - 1);
+	sctx->count += len;
+	done = 0;
+	src = data;
+	memcpy(dst, sctx->state, SHA1_DIGEST_SIZE);
+
+	if ((partial + len) >= SHA1_BLOCK_SIZE) {
+
+		/* Append the bytes in state's buffer to a block to handle */
+		if (partial) {
+			done = -partial;
+			memcpy(sctx->buffer + partial, data, done + SHA1_BLOCK_SIZE);
+			src = sctx->buffer;
+
+			asm volatile (".byte 0xf3,0x0f,0xa6,0xc8"
+			: "+S"(src), "+D"(dst)
+			: "a"(-1L), "c"(1UL));
+
+			done += SHA1_BLOCK_SIZE;
+			src = data + done;
+		}
+
+		/* Process the left bytes from the input data */
+		if (len - done >= SHA1_BLOCK_SIZE) {
+			asm volatile (".byte 0xf3,0x0f,0xa6,0xc8"
+			: "+S"(src), "+D"(dst)
+			: "a"(-1L),
+			"c"((unsigned long)((len - done) / SHA1_BLOCK_SIZE)));
+
+			done += ((len - done) - (len - done) % SHA1_BLOCK_SIZE);
+			src = data + done;
+		}
+		partial = 0;
+	}
+	memcpy(sctx->state, dst, SHA1_DIGEST_SIZE);
+	memcpy(sctx->buffer + partial, src, len - done);
+
+	return 0;
+}
+
+static int zhaoxin_sha1_final(struct shash_desc *desc, u8 *out)
+{
+	struct sha1_state *state = shash_desc_ctx(desc);
+	unsigned int partial, padlen;
+	__be64 bits;
+	static const u8 padding[SHA1_BLOCK_SIZE] = {SHA_PADDING_BYTE, };
+	const int bit_offset = SHA1_BLOCK_SIZE - sizeof(__be64);
+
+	bits = cpu_to_be64(state->count << 3);
+
+	/* Padding */
+	partial = state->count & (SHA1_BLOCK_SIZE - 1);
+	padlen = (partial < bit_offset) ? (bit_offset - partial) :
+		((SHA1_BLOCK_SIZE + bit_offset) - partial);
+	zhaoxin_sha1_update(desc, padding, padlen);
+
+	/* Append length field bytes */
+	zhaoxin_sha1_update(desc, (const u8 *)&bits, sizeof(bits));
+
+	/* Swap to output */
+	zhaoxin_output_block(state->state, (uint32_t *)out, SHA1_DIGEST_SIZE/sizeof(uint32_t));
+
+	return 0;
+}
+
+static int zhaoxin_sha256_init(struct shash_desc *desc)
+{
+	struct sha256_state *sctx = shash_desc_ctx(desc);
+
+	*sctx = (struct sha256_state){
+		.state = { SHA256_H0, SHA256_H1, SHA256_H2, SHA256_H3,
+				SHA256_H4, SHA256_H5, SHA256_H6, SHA256_H7},
+	};
+
+	return 0;
+}
+
+static int zhaoxin_sha256_update(struct shash_desc *desc, const u8 *data,
+			  unsigned int len)
+{
+	struct sha256_state *sctx = shash_desc_ctx(desc);
+	unsigned int partial, done;
+	const u8 *src;
+	u8 buf[SHA256_BLOCK_SIZE*2];
+	u8 *dst = &buf[0];
+
+	partial = sctx->count & (SHA256_BLOCK_SIZE - 1);
+	sctx->count += len;
+	done = 0;
+	src = data;
+	memcpy(dst, sctx->state, SHA256_DIGEST_SIZE);
+
+	if ((partial + len) >= SHA256_BLOCK_SIZE) {
+
+		/* Append the bytes in state's buffer to a block to handle */
+		if (partial) {
+			done = -partial;
+			memcpy(sctx->buf + partial, data, done + SHA256_BLOCK_SIZE);
+			src = sctx->buf;
+
+			asm volatile (".byte 0xf3,0x0f,0xa6,0xd0"
+			: "+S"(src), "+D"(dst)
+			: "a"(-1L), "c"(1UL));
+
+			done += SHA256_BLOCK_SIZE;
+			src = data + done;
+		}
+
+		/* Process the left bytes from input data*/
+		if (len - done >= SHA256_BLOCK_SIZE) {
+			asm volatile (".byte 0xf3,0x0f,0xa6,0xd0"
+			: "+S"(src), "+D"(dst)
+			: "a"(-1L),
+			"c"((unsigned long)((len - done) / SHA256_BLOCK_SIZE)));
+
+			done += ((len - done) - (len - done) % SHA256_BLOCK_SIZE);
+			src = data + done;
+		}
+		partial = 0;
+	}
+	memcpy(sctx->state, dst, SHA256_DIGEST_SIZE);
+	memcpy(sctx->buf + partial, src, len - done);
+
+	return 0;
+}
+
+static int zhaoxin_sha256_final(struct shash_desc *desc, u8 *out)
+{
+	struct sha256_state *state = shash_desc_ctx(desc);
+	unsigned int partial, padlen;
+	__be64 bits;
+	static const u8 padding[SHA256_BLOCK_SIZE] = {SHA_PADDING_BYTE, };
+	const int bit_offset = SHA256_BLOCK_SIZE - sizeof(__be64);
+
+	bits = cpu_to_be64(state->count << 3);
+
+	/* Padding */
+	partial = state->count & (SHA256_BLOCK_SIZE - 1);
+	padlen = (partial < bit_offset) ? (bit_offset - partial) :
+		((SHA256_BLOCK_SIZE + bit_offset) - partial);
+	zhaoxin_sha256_update(desc, padding, padlen);
+
+	/* Append length field bytes */
+	zhaoxin_sha256_update(desc, (const u8 *)&bits, sizeof(bits));
+
+	/* Swap to output */
+	zhaoxin_output_block(state->state, (uint32_t *)out, SHA256_DIGEST_SIZE/sizeof(uint32_t));
+
+	return 0;
+}
+
+static inline void zhaoxin_output_block_512(uint64_t *src,
+			uint64_t *dst, size_t count)
+{
+	while (count--)
+		*dst++ = swab64(*src++);
+}
+
+static int zhaoxin_sha384_init(struct shash_desc *desc)
+{
+	struct sha512_state *sctx = shash_desc_ctx(desc);
+
+	*sctx = (struct sha512_state){
+		.state = { SHA384_H0, SHA384_H1, SHA384_H2, SHA384_H3,
+				SHA384_H4, SHA384_H5, SHA384_H6, SHA384_H7},
+		.count = {0, 0},
+	};
+
+	return 0;
+}
+
+static int zhaoxin_sha512_init(struct shash_desc *desc)
+{
+	struct sha512_state *sctx = shash_desc_ctx(desc);
+
+	*sctx = (struct sha512_state){
+		.state = { SHA512_H0, SHA512_H1, SHA512_H2, SHA512_H3,
+				SHA512_H4, SHA512_H5, SHA512_H6, SHA512_H7},
+		.count = {0, 0},
+	};
+
+	return 0;
+}
+
+static int zhaoxin_sha512_update(struct shash_desc *desc, const u8 *data,
+			  unsigned int len)
+{
+	struct sha512_state *sctx = shash_desc_ctx(desc);
+	unsigned int partial, done;
+	const u8 *src;
+	u8 buf[SHA512_BLOCK_SIZE];
+	u8 *dst = &buf[0];
+
+	partial = sctx->count[0] % SHA512_BLOCK_SIZE;
+
+	sctx->count[0] += len;
+	if (sctx->count[0] < len)
+		sctx->count[1]++;
+
+	done = 0;
+	src = data;
+	memcpy(dst, sctx->state, SHA512_DIGEST_SIZE);
+
+	if ((partial + len) >= SHA512_BLOCK_SIZE) {
+		/* Append the bytes in state's buffer to a block to handle */
+		if (partial) {
+
+			done = -partial;
+			memcpy(sctx->buf + partial, data, done + SHA512_BLOCK_SIZE);
+
+			src = sctx->buf;
+
+			asm volatile (".byte 0xf3,0x0f,0xa6,0xe0"
+			: "+S"(src), "+D"(dst)
+			: "c"(1UL));
+
+			done += SHA512_BLOCK_SIZE;
+			src = data + done;
+		}
+
+		/* Process the left bytes from input data*/
+		if (len - done >= SHA512_BLOCK_SIZE) {
+			asm volatile (".byte 0xf3,0x0f,0xa6,0xe0"
+			: "+S"(src), "+D"(dst)
+			: "c"((unsigned long)((len - done) / SHA512_BLOCK_SIZE)));
+
+			done += ((len - done) - (len - done) % SHA512_BLOCK_SIZE);
+			src = data + done;
+		}
+		partial = 0;
+	}
+
+	memcpy(sctx->state, dst, SHA512_DIGEST_SIZE);
+	memcpy(sctx->buf + partial, src, len - done);
+
+	return 0;
+}
+
+static int zhaoxin_sha512_final(struct shash_desc *desc, u8 *out)
+{
+	const int bit_offset = SHA512_BLOCK_SIZE - sizeof(__be64[2]);
+	struct sha512_state *state = shash_desc_ctx(desc);
+	unsigned int partial = state->count[0] % SHA512_BLOCK_SIZE, padlen;
+	__be64 bits2[2];
+
+	// Both SHA384 and SHA512 may be supported.
+	int dgst_size = crypto_shash_digestsize(desc->tfm);
+
+	static u8 padding[SHA512_BLOCK_SIZE];
+
+	memset(padding, 0, SHA512_BLOCK_SIZE);
+	padding[0] = SHA_PADDING_BYTE;
+
+	// Convert byte count in little endian to bit count in big endian.
+	bits2[0] = cpu_to_be64(state->count[1] << 3 | state->count[0] >> 61);
+	bits2[1] = cpu_to_be64(state->count[0] << 3);
+
+	padlen = (partial < bit_offset) ? (bit_offset - partial) :
+		((SHA512_BLOCK_SIZE + bit_offset) - partial);
+
+	zhaoxin_sha512_update(desc, padding, padlen);
+
+	/* Append length field bytes */
+	zhaoxin_sha512_update(desc, (const u8 *)bits2, sizeof(__be64[2]));
+
+	/* Swap to output */
+	zhaoxin_output_block_512(state->state, (uint64_t *)out, dgst_size/sizeof(uint64_t));
+
+	return 0;
+}
+
+static int zhaoxin_sha_export(struct shash_desc *desc,
+				void *out)
+{
+	int statesize = crypto_shash_statesize(desc->tfm);
+	void *sctx = shash_desc_ctx(desc);
+
+	memcpy(out, sctx, statesize);
+	return 0;
+}
+
+static int zhaoxin_sha_import(struct shash_desc *desc,
+				const void *in)
+{
+	int statesize = crypto_shash_statesize(desc->tfm);
+	void *sctx = shash_desc_ctx(desc);
+
+	memcpy(sctx, in, statesize);
+	return 0;
+}
+
+static struct shash_alg sha1_alg = {
+	.digestsize	=	SHA1_DIGEST_SIZE,
+	.init		=	zhaoxin_sha1_init,
+	.update		=	zhaoxin_sha1_update,
+	.final		=	zhaoxin_sha1_final,
+	.export		=	zhaoxin_sha_export,
+	.import		=	zhaoxin_sha_import,
+	.descsize	=	sizeof(struct sha1_state),
+	.statesize	=	sizeof(struct sha1_state),
+	.base		=	{
+		.cra_name		=	"sha1",
+		.cra_driver_name	=	"sha1-zhaoxin",
+		.cra_priority		=	ZHAOXIN_SHA_CRA_PRIORITY,
+		.cra_blocksize		=	SHA1_BLOCK_SIZE,
+		.cra_module		=	THIS_MODULE,
+	}
+};
+
+static struct shash_alg sha256_alg = {
+	.digestsize	=	SHA256_DIGEST_SIZE,
+	.init		=	zhaoxin_sha256_init,
+	.update		=	zhaoxin_sha256_update,
+	.final		=	zhaoxin_sha256_final,
+	.export		=	zhaoxin_sha_export,
+	.import		=	zhaoxin_sha_import,
+	.descsize	=	sizeof(struct sha256_state),
+	.statesize	=	sizeof(struct sha256_state),
+	.base		=	{
+		.cra_name		=	"sha256",
+		.cra_driver_name	=	"sha256-zhaoxin",
+		.cra_priority		=	ZHAOXIN_SHA_CRA_PRIORITY,
+		.cra_blocksize		=	SHA256_BLOCK_SIZE,
+		.cra_module		=	THIS_MODULE,
+	}
+};
+
+static struct shash_alg sha384_alg = {
+	.digestsize	=	SHA384_DIGEST_SIZE,
+	.init		=	zhaoxin_sha384_init,
+	.update		=	zhaoxin_sha512_update,
+	.final		=	zhaoxin_sha512_final,
+	.export		=	zhaoxin_sha_export,
+	.import		=	zhaoxin_sha_import,
+	.descsize	=	sizeof(struct sha512_state),
+	.statesize	=	sizeof(struct sha512_state),
+	.base		=	{
+		.cra_name		=	"sha384",
+		.cra_driver_name	=	"sha384-zhaoxin",
+		.cra_priority		=	ZHAOXIN_SHA_CRA_PRIORITY,
+		.cra_blocksize		=	SHA384_BLOCK_SIZE,
+		.cra_module		=	THIS_MODULE,
+	}
+};
+
+static struct shash_alg sha512_alg = {
+	.digestsize	=	SHA512_DIGEST_SIZE,
+	.init		=	zhaoxin_sha512_init,
+	.update		=	zhaoxin_sha512_update,
+	.final		=	zhaoxin_sha512_final,
+	.export		=	zhaoxin_sha_export,
+	.import		=	zhaoxin_sha_import,
+	.descsize	=	sizeof(struct sha512_state),
+	.statesize	=	sizeof(struct sha512_state),
+	.base		=	{
+		.cra_name		=	"sha512",
+		.cra_driver_name	=	"sha512-zhaoxin",
+		.cra_priority		=	ZHAOXIN_SHA_CRA_PRIORITY,
+		.cra_blocksize		=	SHA512_BLOCK_SIZE,
+		.cra_module		=	THIS_MODULE,
+	}
+};
+
+
+static const struct x86_cpu_id zhaoxin_sha_ids[] = {
+	X86_MATCH_VENDOR_FAM_FEATURE(ZHAOXIN, 6, X86_FEATURE_PHE, NULL),
+	X86_MATCH_VENDOR_FAM_FEATURE(ZHAOXIN, 7, X86_FEATURE_PHE, NULL),
+	X86_MATCH_VENDOR_FAM_FEATURE(CENTAUR, 7, X86_FEATURE_PHE, NULL),
+	{}
+};
+MODULE_DEVICE_TABLE(x86cpu, zhaoxin_sha_ids);
+
+static int __init zhaoxin_sha_init(void)
+{
+	int rc = -ENODEV;
+
+	struct shash_alg *sha1;
+	struct shash_alg *sha256;
+	struct shash_alg *sha384;
+	struct shash_alg *sha512;
+
+	if (!x86_match_cpu(zhaoxin_sha_ids) || !boot_cpu_has(X86_FEATURE_PHE_EN))
+		return -ENODEV;
+
+	sha1 = &sha1_alg;
+	sha256 = &sha256_alg;
+
+	rc = crypto_register_shash(sha1);
+	if (rc)
+		goto out;
+
+	rc = crypto_register_shash(sha256);
+	if (rc)
+		goto out_unreg1;
+
+	if (boot_cpu_has(X86_FEATURE_PHE2_EN)) {
+
+		sha384 = &sha384_alg;
+		sha512 = &sha512_alg;
+
+		rc = crypto_register_shash(sha384);
+		if (rc)
+			goto out_unreg2;
+
+		rc = crypto_register_shash(sha512);
+		if (rc)
+			goto out_unreg3;
+
+		pr_notice("Using Zhaoxin Hardware Engine for SHA1/SHA256/SHA384/SHA512 algorithms.\n");
+	} else
+		pr_notice("Using Zhaoxin Hardware Engine for SHA1/SHA256 algorithms.\n");
+
+
+	return 0;
+
+out_unreg3:
+	if (boot_cpu_has(X86_FEATURE_PHE2_EN))
+		crypto_unregister_shash(sha384);
+
+out_unreg2:
+	crypto_unregister_shash(sha256);
+out_unreg1:
+	crypto_unregister_shash(sha1);
+
+out:
+	pr_err("Zhaoxin Hardware Engine for SHA1/SHA256/SHA384/SHA512 initialization failed.\n");
+	return rc;
+}
+
+static void __exit zhaoxin_sha_fini(void)
+{
+	crypto_unregister_shash(&sha1_alg);
+	crypto_unregister_shash(&sha256_alg);
+
+	if (boot_cpu_has(X86_FEATURE_PHE2_EN)) {
+		crypto_unregister_shash(&sha384_alg);
+		crypto_unregister_shash(&sha512_alg);
+	}
+
+}
+
+module_init(zhaoxin_sha_init);
+module_exit(zhaoxin_sha_fini);
+
+MODULE_DESCRIPTION("Zhaoxin Hardware SHA1/SHA256/SHA384/SHA512 algorithms support.");
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("George Xue");
+
+MODULE_ALIAS_CRYPTO("sha1-zhaoxin");
+MODULE_ALIAS_CRYPTO("sha256-zhaoxin");
+MODULE_ALIAS_CRYPTO("sha384-zhaoxin");
+MODULE_ALIAS_CRYPTO("sha512-zhaoxin");
diff --git a/drivers/crypto/zhaoxin-sha.h b/drivers/crypto/zhaoxin-sha.h
new file mode 100644
index 000000000000..373484384f6e
--- /dev/null
+++ b/drivers/crypto/zhaoxin-sha.h
@@ -0,0 +1,16 @@ 
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Driver for Zhaoxin Sha
+ *
+ * Copyright (c) 2023 George Xue<georgexue@zhaoxin.com>
+ */
+
+#ifndef _ZHAOXIN_SHA_H
+#define _ZHAOXIN_SHA_H
+
+#define ZHAOXIN_SHA_CRA_PRIORITY	300
+#define ZHAOXIN_SHA_COMPOSITE_PRIORITY 400
+
+#define SHA_PADDING_BYTE    0x80
+
+#endif	/* _ZHAOXIN_SHA_H */