diff mbox series

[v4] crypto: stm32 - Save and restore between each request

Message ID Y/3N6zFOZeehJQ/p@gondor.apana.org.au (mailing list archive)
State Changes Requested
Delegated to: Herbert Xu
Headers show
Series [v4] crypto: stm32 - Save and restore between each request | expand

Commit Message

Herbert Xu Feb. 28, 2023, 9:48 a.m. UTC
v4 fixes hmac to not reload the key over and over again causing
the hash state to be corrupted.

---8<---
The Crypto API hashing paradigm requires the hardware state to
be exported between *each* request because multiple unrelated
hashes may be processed concurrently.

The stm32 hardware is capable of producing the hardware hashing
state but it was only doing it in the export function.  This is
not only broken for export as you can't export a kernel pointer
and reimport it, but it also means that concurrent hashing was
fundamentally broken.

Fix this by moving the saving and restoring of hardware hash
state between each and every hashing request.

Also change the emptymsg check in stm32_hash_copy_hash to rely
on whether we have any existing hash state, rather than whether
this particular update request is empty.

Fixes: 8a1012d3f2ab ("crypto: stm32 - Support for STM32 HASH module")
Reported-by: Li kunyu <kunyu@nfschina.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Comments

Linus Walleij Feb. 28, 2023, 8:50 p.m. UTC | #1
On Tue, Feb 28, 2023 at 10:48 AM Herbert Xu <herbert@gondor.apana.org.au> wrote:

> v4 fixes hmac to not reload the key over and over again causing
> the hash state to be corrupted.

OK I tested this, sadly the same results.

Notice though: the HMAC versions fail on test vector 0 and
the non-MAC:ed fail on vector 1, so I guess that means test
vector 0 works with those?

Here is the complete log:

[    2.997312] alg: extra crypto tests enabled.  This is intended for
developer use only.
[   15.203609] Key type encrypted registered
[   22.553791] stm32-hash a03c2000.hash: allocated hmac(sha256) fallback
[   22.561976] alg: ahash: stm32-hmac-sha256 test failed (wrong
result) on test vector 0, cfg="init+update+final aligned buffer"
[   22.573387] Expected:
[   22.575674] 00000000: a2 1b 1f 5d 4c f4 f7 3a 4d d9 39 75 0f 7a 06 6a
[   22.582160] 00000010: 7f 98 cc 13 1c b1 6a 66 92 75 90 21 cf ab 81 81
[   22.588613] Obtained:
[   22.590917] 00000000: 46 24 76 a8 97 dd fd bd 40 d1 42 0e 08 a5 bc fe
[   22.597368] 00000010: eb 25 c3 e2 ad e6 a0 a9 08 3b 32 7b 9e f9 fc a1
[   22.603865] alg: self-tests for hmac(sha256) using
stm32-hmac-sha256 failed (rc=-22)
[   22.603887] ------------[ cut here ]------------
[   22.616297] WARNING: CPU: 1 PID: 75 at crypto/testmgr.c:5864
alg_test.part.0+0x4d0/0x4dc
[   22.624437] alg: self-tests for hmac(sha256) using
stm32-hmac-sha256 failed (rc=-22)
[   22.624448] Modules linked in:
[   22.635258] CPU: 1 PID: 75 Comm: cryptomgr_test Not tainted
6.2.0-12020-g1c3e1a0051be #67
[   22.643437] Hardware name: ST-Ericsson Ux5x0 platform (Device Tree Support)
[   22.650405]  unwind_backtrace from show_stack+0x10/0x14
[   22.655650]  show_stack from dump_stack_lvl+0x40/0x4c
[   22.660724]  dump_stack_lvl from __warn+0x94/0xc0
[   22.665447]  __warn from warn_slowpath_fmt+0x118/0x164
[   22.670601]  warn_slowpath_fmt from alg_test.part.0+0x4d0/0x4dc
[   22.676537]  alg_test.part.0 from cryptomgr_test+0x18/0x38
[   22.682037]  cryptomgr_test from kthread+0xc0/0xc4
[   22.686843]  kthread from ret_from_fork+0x14/0x2c
[   22.691553] Exception stack(0xf0f45fb0 to 0xf0f45ff8)
[   22.696604] 5fa0:                                     00000000
00000000 00000000 00000000
[   22.704779] 5fc0: 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000
[   22.712953] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[   22.719596] ---[ end trace 0000000000000000 ]---
[   22.724494] stm32-hash a03c2000.hash: allocated sha256 fallback
[   22.769732] alg: ahash: stm32-sha256 test failed (wrong result) on
test vector 1, cfg="init+update+final aligned buffer"
[   22.780648] Expected:
[   22.782952] 00000000: ba 78 16 bf 8f 01 cf ea 41 41 40 de 5d ae 22 23
[   22.789392] 00000010: b0 03 61 a3 96 17 7a 9c b4 10 ff 61 f2 00 15 ad
[   22.795874] Obtained:
[   22.798147] 00000000: e3 b0 c4 42 98 fc 1c 14 9a fb f4 c8 99 6f b9 24
[   22.804607] 00000010: 27 ae 41 e4 64 9b 93 4c a4 95 99 1b 78 52 b8 55
[   22.811074] alg: self-tests for sha256 using stm32-sha256 failed (rc=-22)
[   22.811083] ------------[ cut here ]------------
[   22.822480] WARNING: CPU: 1 PID: 85 at crypto/testmgr.c:5864
alg_test.part.0+0x4d0/0x4dc
[   22.830607] alg: self-tests for sha256 using stm32-sha256 failed (rc=-22)
[   22.830615] Modules linked in:
[   22.840457] CPU: 1 PID: 85 Comm: cryptomgr_test Tainted: G        W
         6.2.0-12020-g1c3e1a0051be #67
[   22.850109] Hardware name: ST-Ericsson Ux5x0 platform (Device Tree Support)
[   22.857069]  unwind_backtrace from show_stack+0x10/0x14
[   22.862307]  show_stack from dump_stack_lvl+0x40/0x4c
[   22.867373]  dump_stack_lvl from __warn+0x94/0xc0
[   22.872090]  __warn from warn_slowpath_fmt+0x118/0x164
[   22.877237]  warn_slowpath_fmt from alg_test.part.0+0x4d0/0x4dc
[   22.883167]  alg_test.part.0 from cryptomgr_test+0x18/0x38
[   22.888662]  cryptomgr_test from kthread+0xc0/0xc4
[   22.893462]  kthread from ret_from_fork+0x14/0x2c
[   22.898169] Exception stack(0xf0f6dfb0 to 0xf0f6dff8)
[   22.903216] dfa0:                                     00000000
00000000 00000000 00000000
[   22.911388] dfc0: 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000
[   22.919559] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
[   22.926182] ---[ end trace 0000000000000000 ]---
[   36.677933] stm32-hash a03c2000.hash: allocated hmac(sha1) fallback
[   36.686991] alg: ahash: stm32-hmac-sha1 test failed (wrong result)
on test vector 0, cfg="init+update+final aligned buffer"
[   36.698242] Expected:
[   36.700547] 00000000: b6 17 31 86 55 05 72 64 e2 8b c0 b6 fb 37 8c 8e
[   36.707002] 00000010: f1 46 be 00
[   36.710345] Obtained:
[   36.712624] 00000000: 12 3f d7 8b da 01 00 78 6a e8 6b 76 f5 0f 01 bd
[   36.719072] 00000010: 18 e4 77 f3
[   36.722450] alg: self-tests for hmac(sha1) using stm32-hmac-sha1
failed (rc=-22)
[   36.722472] ------------[ cut here ]------------
[   36.734495] WARNING: CPU: 1 PID: 88 at crypto/testmgr.c:5864
alg_test.part.0+0x4d0/0x4dc
[   36.742628] alg: self-tests for hmac(sha1) using stm32-hmac-sha1
failed (rc=-22)
[   36.742637] Modules linked in:
[   36.753097] CPU: 1 PID: 88 Comm: cryptomgr_test Tainted: G        W
         6.2.0-12020-g1c3e1a0051be #67
[   36.762754] Hardware name: ST-Ericsson Ux5x0 platform (Device Tree Support)
[   36.769719]  unwind_backtrace from show_stack+0x10/0x14
[   36.774963]  show_stack from dump_stack_lvl+0x40/0x4c
[   36.780036]  dump_stack_lvl from __warn+0x94/0xc0
[   36.784759]  __warn from warn_slowpath_fmt+0x118/0x164
[   36.789912]  warn_slowpath_fmt from alg_test.part.0+0x4d0/0x4dc
[   36.795847]  alg_test.part.0 from cryptomgr_test+0x18/0x38
[   36.801347]  cryptomgr_test from kthread+0xc0/0xc4
[   36.806153]  kthread from ret_from_fork+0x14/0x2c
[   36.810862] Exception stack(0xf0f79fb0 to 0xf0f79ff8)
[   36.815912] 9fa0:                                     00000000
00000000 00000000 00000000
[   36.824087] 9fc0: 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000
[   36.832261] 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[   36.838902] ---[ end trace 0000000000000000 ]---
[   36.843762] stm32-hash a03c2000.hash: allocated sha1 fallback
[   36.889782] alg: ahash: stm32-sha1 test failed (wrong result) on
test vector 1, cfg="init+update+final aligned buffer"
[   36.900507] Expected:
[   36.902786] 00000000: a9 99 3e 36 47 06 81 6a ba 3e 25 71 78 50 c2 6c
[   36.909225] 00000010: 9c d0 d8 9d
[   36.912564] Obtained:
[   36.914834] 00000000: da 39 a3 ee 5e 6b 4b 0d 32 55 bf ef 95 60 18 90
[   36.921296] 00000010: af d8 07 09
[   36.924627] alg: self-tests for sha1 using stm32-sha1 failed (rc=-22)
[   36.924635] ------------[ cut here ]------------
[   36.935687] WARNING: CPU: 1 PID: 100 at crypto/testmgr.c:5864
alg_test.part.0+0x4d0/0x4dc
[   36.943902] alg: self-tests for sha1 using stm32-sha1 failed (rc=-22)
[   36.943909] Modules linked in:
[   36.953406] CPU: 1 PID: 100 Comm: cryptomgr_test Tainted: G
W          6.2.0-12020-g1c3e1a0051be #67
[   36.963144] Hardware name: ST-Ericsson Ux5x0 platform (Device Tree Support)
[   36.970103]  unwind_backtrace from show_stack+0x10/0x14
[   36.975340]  show_stack from dump_stack_lvl+0x40/0x4c
[   36.980404]  dump_stack_lvl from __warn+0x94/0xc0
[   36.985120]  __warn from warn_slowpath_fmt+0x118/0x164
[   36.990266]  warn_slowpath_fmt from alg_test.part.0+0x4d0/0x4dc
[   36.996195]  alg_test.part.0 from cryptomgr_test+0x18/0x38
[   37.001689]  cryptomgr_test from kthread+0xc0/0xc4
[   37.006488]  kthread from ret_from_fork+0x14/0x2c
[   37.011193] Exception stack(0xf0f8dfb0 to 0xf0f8dff8)
[   37.016240] dfa0:                                     00000000
00000000 00000000 00000000
[   37.024411] dfc0: 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000
[   37.032581] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
[   37.039222] ---[ end trace 0000000000000000 ]---

Here I have applied a patch like this to see the failing vectors:

commit 1c3e1a0051be234ef109e97075783c28e3b07452 (HEAD ->
ux500-fixup-stm32-cryp-herbert-v4)
Author: Linus Walleij <linus.walleij@linaro.org>
Date:   Mon Dec 26 09:53:10 2022 +0100

    test hacks

diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index c91e93ece20b..db511293933b 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -1203,6 +1203,10 @@ static int check_hash_result(const char *type,
        if (memcmp(result, vec->digest, digestsize) != 0) {
                pr_err("alg: %s: %s test failed (wrong result) on test
vector %s, cfg=\"%s\"\n",
                       type, driver, vec_name, cfg->name);
+               pr_err("Expected:\n");
+               hexdump(vec->digest, digestsize);
+               pr_err("Obtained:\n");
+               hexdump(result, digestsize);
                return -EINVAL;

I'm a bit lost on what to try next :/

Yours,
Linus Walleij
Herbert Xu March 1, 2023, 1:30 a.m. UTC | #2
On Tue, Feb 28, 2023 at 09:50:55PM +0100, Linus Walleij wrote:
> 
> OK I tested this, sadly the same results.
> 
> Notice though: the HMAC versions fail on test vector 0 and
> the non-MAC:ed fail on vector 1, so I guess that means test
> vector 0 works with those?

Hah, test vector 0 for sha256 is an empty message.  While test
vector 1 is the same as test vector 0 for hmac(sha256).

So I guess at least the fallback is still working :)

Cheers,
Herbert Xu March 1, 2023, 1:36 a.m. UTC | #3
On Tue, Feb 28, 2023 at 09:50:55PM +0100, Linus Walleij wrote:
> 
> Notice though: the HMAC versions fail on test vector 0 and
> the non-MAC:ed fail on vector 1, so I guess that means test
> vector 0 works with those?

The failing vector is the first one where we save the state from
the hardware and then try to restore it.

Is your device ux500 or stm32? Perhaps state saving/restoring is
simply broken on ux500 (as the original ux500 driver didn't support
export/import and always used a fallback)?

Thanks,
Herbert Xu March 1, 2023, 1:46 a.m. UTC | #4
On Wed, Mar 01, 2023 at 09:36:08AM +0800, Herbert Xu wrote:
>
> Is your device ux500 or stm32? Perhaps state saving/restoring is
> simply broken on ux500 (as the original ux500 driver didn't support
> export/import and always used a fallback)?

Interesting, I dug up the old ux500 driver and even though
it doesn't have export/import hooked up, it does actually appear
to save/restore hardware state.  In fact it seems to do it multiple
times per request, even when it's unnecessary.

I'll try to see if the saving/restoring is subtly different
between ux500 and stm32.

Cheers,
Linus Walleij March 1, 2023, 12:22 p.m. UTC | #5
On Wed, Mar 1, 2023 at 2:36 AM Herbert Xu <herbert@gondor.apana.org.au> wrote:

> The failing vector is the first one where we save the state from
> the hardware and then try to restore it.

Yeah that's typical :/

> Is your device ux500 or stm32? Perhaps state saving/restoring is
> simply broken on ux500 (as the original ux500 driver didn't support
> export/import and always used a fallback)?

It's Ux500 but I had no problem with import/export before,
and yeah it has state save/restore in HW.

Yours,
Linus Walleij
Herbert Xu March 2, 2023, 1:16 a.m. UTC | #6
On Wed, Mar 01, 2023 at 01:22:13PM +0100, Linus Walleij wrote:
>
> It's Ux500 but I had no problem with import/export before,
> and yeah it has state save/restore in HW.

So with the stm32 driver your ux500 is able to pass the extra
fuzz tests, right? That should indeed test export and import.

Thanks,
Herbert Xu March 2, 2023, 6:04 a.m. UTC | #7
On Wed, Mar 01, 2023 at 01:22:13PM +0100, Linus Walleij wrote:
>
> It's Ux500 but I had no problem with import/export before,
> and yeah it has state save/restore in HW.

I think I see the problem.  My patch wasn't waiting for the hash
computation to complete before saving the state so obviously it
will get the wrong hash state every single time.

I'll fix this up and some other inconsistencies (my reading of the
documentation is that there are 54 registers (0-53), not 53) and
resend the patch.

Cheers,
diff mbox series

Patch

diff --git a/drivers/crypto/stm32/stm32-hash.c b/drivers/crypto/stm32/stm32-hash.c
index 7bf805563ac2..a4c4cb1735d4 100644
--- a/drivers/crypto/stm32/stm32-hash.c
+++ b/drivers/crypto/stm32/stm32-hash.c
@@ -7,7 +7,6 @@ 
  */
 
 #include <linux/clk.h>
-#include <linux/crypto.h>
 #include <linux/delay.h>
 #include <linux/dma-mapping.h>
 #include <linux/dmaengine.h>
@@ -127,6 +126,16 @@  struct stm32_hash_ctx {
 	int			keylen;
 };
 
+struct stm32_hash_state {
+	u16			bufcnt;
+	u16			buflen;
+
+	u8 buffer[HASH_BUFLEN] __aligned(4);
+
+	/* hash state */
+	u32			hw_context[3 + HASH_CSR_REGISTER_NUMBER];
+};
+
 struct stm32_hash_request_ctx {
 	struct stm32_hash_dev	*hdev;
 	unsigned long		flags;
@@ -134,8 +143,6 @@  struct stm32_hash_request_ctx {
 
 	u8 digest[SHA256_DIGEST_SIZE] __aligned(sizeof(u32));
 	size_t			digcnt;
-	size_t			bufcnt;
-	size_t			buflen;
 
 	/* DMA */
 	struct scatterlist	*sg;
@@ -149,10 +156,7 @@  struct stm32_hash_request_ctx {
 
 	u8			data_type;
 
-	u8 buffer[HASH_BUFLEN] __aligned(sizeof(u32));
-
-	/* Export Context */
-	u32			*hw_context;
+	struct stm32_hash_state state;
 };
 
 struct stm32_hash_algs_info {
@@ -183,7 +187,6 @@  struct stm32_hash_dev {
 	struct ahash_request	*req;
 	struct crypto_engine	*engine;
 
-	int			err;
 	unsigned long		flags;
 
 	struct dma_chan		*dma_lch;
@@ -326,11 +329,12 @@  static void stm32_hash_write_ctrl(struct stm32_hash_dev *hdev, int bufcnt)
 
 static void stm32_hash_append_sg(struct stm32_hash_request_ctx *rctx)
 {
+	struct stm32_hash_state *state = &rctx->state;
 	size_t count;
 
-	while ((rctx->bufcnt < rctx->buflen) && rctx->total) {
+	while ((state->bufcnt < state->buflen) && rctx->total) {
 		count = min(rctx->sg->length - rctx->offset, rctx->total);
-		count = min(count, rctx->buflen - rctx->bufcnt);
+		count = min_t(size_t, count, state->buflen - state->bufcnt);
 
 		if (count <= 0) {
 			if ((rctx->sg->length == 0) && !sg_is_last(rctx->sg)) {
@@ -341,10 +345,10 @@  static void stm32_hash_append_sg(struct stm32_hash_request_ctx *rctx)
 			}
 		}
 
-		scatterwalk_map_and_copy(rctx->buffer + rctx->bufcnt, rctx->sg,
-					 rctx->offset, count, 0);
+		scatterwalk_map_and_copy(state->buffer + state->bufcnt,
+					 rctx->sg, rctx->offset, count, 0);
 
-		rctx->bufcnt += count;
+		state->bufcnt += count;
 		rctx->offset += count;
 		rctx->total -= count;
 
@@ -413,26 +417,27 @@  static int stm32_hash_xmit_cpu(struct stm32_hash_dev *hdev,
 static int stm32_hash_update_cpu(struct stm32_hash_dev *hdev)
 {
 	struct stm32_hash_request_ctx *rctx = ahash_request_ctx(hdev->req);
+	struct stm32_hash_state *state = &rctx->state;
 	int bufcnt, err = 0, final;
 
 	dev_dbg(hdev->dev, "%s flags %lx\n", __func__, rctx->flags);
 
 	final = (rctx->flags & HASH_FLAGS_FINUP);
 
-	while ((rctx->total >= rctx->buflen) ||
-	       (rctx->bufcnt + rctx->total >= rctx->buflen)) {
+	while ((rctx->total >= state->buflen) ||
+	       (state->bufcnt + rctx->total >= state->buflen)) {
 		stm32_hash_append_sg(rctx);
-		bufcnt = rctx->bufcnt;
-		rctx->bufcnt = 0;
-		err = stm32_hash_xmit_cpu(hdev, rctx->buffer, bufcnt, 0);
+		bufcnt = state->bufcnt;
+		state->bufcnt = 0;
+		err = stm32_hash_xmit_cpu(hdev, state->buffer, bufcnt, 0);
 	}
 
 	stm32_hash_append_sg(rctx);
 
 	if (final) {
-		bufcnt = rctx->bufcnt;
-		rctx->bufcnt = 0;
-		err = stm32_hash_xmit_cpu(hdev, rctx->buffer, bufcnt, 1);
+		bufcnt = state->bufcnt;
+		state->bufcnt = 0;
+		err = stm32_hash_xmit_cpu(hdev, state->buffer, bufcnt, 1);
 
 		/* If we have an IRQ, wait for that, else poll for completion */
 		if (hdev->polled) {
@@ -441,8 +446,20 @@  static int stm32_hash_update_cpu(struct stm32_hash_dev *hdev)
 			hdev->flags |= HASH_FLAGS_OUTPUT_READY;
 			err = 0;
 		}
+	} else {
+		u32 *preg = state->hw_context;
+		int i;
+
+		if (!hdev->pdata->ux500)
+			*preg++ = stm32_hash_read(hdev, HASH_IMR);
+		*preg++ = stm32_hash_read(hdev, HASH_STR);
+		*preg++ = stm32_hash_read(hdev, HASH_CR);
+		for (i = 0; i < HASH_CSR_REGISTER_NUMBER; i++)
+			*preg++ = stm32_hash_read(hdev, HASH_CSR(i));
 	}
 
+	rctx->flags |= HASH_FLAGS_INIT;
+
 	return err;
 }
 
@@ -584,10 +601,10 @@  static int stm32_hash_dma_init(struct stm32_hash_dev *hdev)
 static int stm32_hash_dma_send(struct stm32_hash_dev *hdev)
 {
 	struct stm32_hash_request_ctx *rctx = ahash_request_ctx(hdev->req);
+	u32 *buffer = (void *)rctx->state.buffer;
 	struct scatterlist sg[1], *tsg;
 	int err = 0, len = 0, reg, ncp = 0;
 	unsigned int i;
-	u32 *buffer = (void *)rctx->buffer;
 
 	rctx->sg = hdev->req->src;
 	rctx->total = hdev->req->nbytes;
@@ -615,7 +632,7 @@  static int stm32_hash_dma_send(struct stm32_hash_dev *hdev)
 
 				ncp = sg_pcopy_to_buffer(
 					rctx->sg, rctx->nents,
-					rctx->buffer, sg->length - len,
+					rctx->state.buffer, sg->length - len,
 					rctx->total - sg->length + len);
 
 				sg->length = len;
@@ -671,6 +688,8 @@  static int stm32_hash_dma_send(struct stm32_hash_dev *hdev)
 		err = stm32_hash_hmac_dma_send(hdev);
 	}
 
+	rctx->flags |= HASH_FLAGS_INIT;
+
 	return err;
 }
 
@@ -749,14 +768,12 @@  static int stm32_hash_init(struct ahash_request *req)
 		return -EINVAL;
 	}
 
-	rctx->bufcnt = 0;
-	rctx->buflen = HASH_BUFLEN;
+	rctx->state.bufcnt = 0;
+	rctx->state.buflen = HASH_BUFLEN;
 	rctx->total = 0;
 	rctx->offset = 0;
 	rctx->data_type = HASH_DATA_8_BITS;
 
-	memset(rctx->buffer, 0, HASH_BUFLEN);
-
 	if (ctx->flags & HASH_FLAGS_HMAC)
 		rctx->flags |= HASH_FLAGS_HMAC;
 
@@ -774,15 +791,16 @@  static int stm32_hash_final_req(struct stm32_hash_dev *hdev)
 {
 	struct ahash_request *req = hdev->req;
 	struct stm32_hash_request_ctx *rctx = ahash_request_ctx(req);
+	struct stm32_hash_state *state = &rctx->state;
+	int buflen = state->bufcnt;
 	int err;
-	int buflen = rctx->bufcnt;
 
-	rctx->bufcnt = 0;
+	state->bufcnt = 0;
 
 	if (!(rctx->flags & HASH_FLAGS_CPU))
 		err = stm32_hash_dma_send(hdev);
 	else
-		err = stm32_hash_xmit_cpu(hdev, rctx->buffer, buflen, 1);
+		err = stm32_hash_xmit_cpu(hdev, state->buffer, buflen, 1);
 
 	/* If we have an IRQ, wait for that, else poll for completion */
 	if (hdev->polled) {
@@ -832,7 +850,7 @@  static void stm32_hash_copy_hash(struct ahash_request *req)
 	__be32 *hash = (void *)rctx->digest;
 	unsigned int i, hashsize;
 
-	if (hdev->pdata->broken_emptymsg && !req->nbytes)
+	if (hdev->pdata->broken_emptymsg && !(rctx->flags & HASH_FLAGS_INIT))
 		return stm32_hash_emptymsg_fallback(req);
 
 	switch (rctx->flags & HASH_FLAGS_ALGO_MASK) {
@@ -882,11 +900,6 @@  static void stm32_hash_finish_req(struct ahash_request *req, int err)
 	if (!err && (HASH_FLAGS_FINAL & hdev->flags)) {
 		stm32_hash_copy_hash(req);
 		err = stm32_hash_finish(req);
-		hdev->flags &= ~(HASH_FLAGS_FINAL | HASH_FLAGS_CPU |
-				 HASH_FLAGS_INIT | HASH_FLAGS_DMA_READY |
-				 HASH_FLAGS_OUTPUT_READY | HASH_FLAGS_HMAC |
-				 HASH_FLAGS_HMAC_INIT | HASH_FLAGS_HMAC_FINAL |
-				 HASH_FLAGS_HMAC_KEY);
 	} else {
 		rctx->flags |= HASH_FLAGS_ERRORS;
 	}
@@ -897,67 +910,61 @@  static void stm32_hash_finish_req(struct ahash_request *req, int err)
 	crypto_finalize_hash_request(hdev->engine, req, err);
 }
 
-static int stm32_hash_hw_init(struct stm32_hash_dev *hdev,
+static void stm32_hash_hw_init(struct stm32_hash_dev *hdev,
 			      struct stm32_hash_request_ctx *rctx)
 {
 	pm_runtime_get_sync(hdev->dev);
-
-	if (!(HASH_FLAGS_INIT & hdev->flags)) {
-		stm32_hash_write(hdev, HASH_CR, HASH_CR_INIT);
-		stm32_hash_write(hdev, HASH_STR, 0);
-		stm32_hash_write(hdev, HASH_DIN, 0);
-		stm32_hash_write(hdev, HASH_IMR, 0);
-		hdev->err = 0;
-	}
-
-	return 0;
 }
 
-static int stm32_hash_one_request(struct crypto_engine *engine, void *areq);
-static int stm32_hash_prepare_req(struct crypto_engine *engine, void *areq);
-
 static int stm32_hash_handle_queue(struct stm32_hash_dev *hdev,
 				   struct ahash_request *req)
 {
 	return crypto_transfer_hash_request_to_engine(hdev->engine, req);
 }
 
-static int stm32_hash_prepare_req(struct crypto_engine *engine, void *areq)
+static int stm32_hash_one_request(struct crypto_engine *engine, void *areq)
 {
 	struct ahash_request *req = container_of(areq, struct ahash_request,
 						 base);
 	struct stm32_hash_ctx *ctx = crypto_ahash_ctx(crypto_ahash_reqtfm(req));
 	struct stm32_hash_dev *hdev = stm32_hash_find_dev(ctx);
 	struct stm32_hash_request_ctx *rctx;
+	int err = 0;
 
 	if (!hdev)
 		return -ENODEV;
 
+	dev_dbg(hdev->dev, "processing new req, op: %lu, nbytes %d\n",
+		rctx->op, req->nbytes);
+
+	stm32_hash_hw_init(hdev, rctx);
+
 	hdev->req = req;
+	hdev->flags = 0;
 
 	rctx = ahash_request_ctx(req);
 
-	dev_dbg(hdev->dev, "processing new req, op: %lu, nbytes %d\n",
-		rctx->op, req->nbytes);
+	if (rctx->flags & HASH_FLAGS_INIT) {
+		u32 *preg = rctx->state.hw_context;
+		u32 reg;
+		int i;
 
-	return stm32_hash_hw_init(hdev, rctx);
-}
-
-static int stm32_hash_one_request(struct crypto_engine *engine, void *areq)
-{
-	struct ahash_request *req = container_of(areq, struct ahash_request,
-						 base);
-	struct stm32_hash_ctx *ctx = crypto_ahash_ctx(crypto_ahash_reqtfm(req));
-	struct stm32_hash_dev *hdev = stm32_hash_find_dev(ctx);
-	struct stm32_hash_request_ctx *rctx;
-	int err = 0;
+		if (!hdev->pdata->ux500)
+			stm32_hash_write(hdev, HASH_IMR, *preg++);
+		stm32_hash_write(hdev, HASH_STR, *preg++);
+		stm32_hash_write(hdev, HASH_CR, *preg);
+		reg = *preg++ | HASH_CR_INIT;
+		stm32_hash_write(hdev, HASH_CR, reg);
 
-	if (!hdev)
-		return -ENODEV;
+		for (i = 0; i < HASH_CSR_REGISTER_NUMBER; i++)
+			stm32_hash_write(hdev, HASH_CSR(i), *preg++);
 
-	hdev->req = req;
+		hdev->flags |= HASH_FLAGS_INIT;
 
-	rctx = ahash_request_ctx(req);
+		if (rctx->flags & HASH_FLAGS_HMAC)
+			hdev->flags |= HASH_FLAGS_HMAC |
+				       HASH_FLAGS_HMAC_KEY;
+	}
 
 	if (rctx->op == HASH_OP_UPDATE)
 		err = stm32_hash_update_req(hdev);
@@ -985,6 +992,7 @@  static int stm32_hash_enqueue(struct ahash_request *req, unsigned int op)
 static int stm32_hash_update(struct ahash_request *req)
 {
 	struct stm32_hash_request_ctx *rctx = ahash_request_ctx(req);
+	struct stm32_hash_state *state = &rctx->state;
 
 	if (!req->nbytes || !(rctx->flags & HASH_FLAGS_CPU))
 		return 0;
@@ -993,7 +1001,7 @@  static int stm32_hash_update(struct ahash_request *req)
 	rctx->sg = req->src;
 	rctx->offset = 0;
 
-	if ((rctx->bufcnt + rctx->total < rctx->buflen)) {
+	if ((state->bufcnt + rctx->total < state->buflen)) {
 		stm32_hash_append_sg(rctx);
 		return 0;
 	}
@@ -1044,35 +1052,13 @@  static int stm32_hash_digest(struct ahash_request *req)
 static int stm32_hash_export(struct ahash_request *req, void *out)
 {
 	struct stm32_hash_request_ctx *rctx = ahash_request_ctx(req);
-	struct stm32_hash_ctx *ctx = crypto_ahash_ctx(crypto_ahash_reqtfm(req));
-	struct stm32_hash_dev *hdev = stm32_hash_find_dev(ctx);
-	u32 *preg;
-	unsigned int i;
-	int ret;
+	bool empty = !(rctx->flags & HASH_FLAGS_INIT);
+	u8 *p = out;
 
-	pm_runtime_get_sync(hdev->dev);
-
-	ret = stm32_hash_wait_busy(hdev);
-	if (ret)
-		return ret;
-
-	rctx->hw_context = kmalloc_array(3 + HASH_CSR_REGISTER_NUMBER,
-					 sizeof(u32),
-					 GFP_KERNEL);
+	*(u8 *)p = empty;
 
-	preg = rctx->hw_context;
-
-	if (!hdev->pdata->ux500)
-		*preg++ = stm32_hash_read(hdev, HASH_IMR);
-	*preg++ = stm32_hash_read(hdev, HASH_STR);
-	*preg++ = stm32_hash_read(hdev, HASH_CR);
-	for (i = 0; i < HASH_CSR_REGISTER_NUMBER; i++)
-		*preg++ = stm32_hash_read(hdev, HASH_CSR(i));
-
-	pm_runtime_mark_last_busy(hdev->dev);
-	pm_runtime_put_autosuspend(hdev->dev);
-
-	memcpy(out, rctx, sizeof(*rctx));
+	if (!empty)
+		memcpy(p + 1, &rctx->state, sizeof(rctx->state));
 
 	return 0;
 }
@@ -1080,32 +1066,14 @@  static int stm32_hash_export(struct ahash_request *req, void *out)
 static int stm32_hash_import(struct ahash_request *req, const void *in)
 {
 	struct stm32_hash_request_ctx *rctx = ahash_request_ctx(req);
-	struct stm32_hash_ctx *ctx = crypto_ahash_ctx(crypto_ahash_reqtfm(req));
-	struct stm32_hash_dev *hdev = stm32_hash_find_dev(ctx);
-	const u32 *preg = in;
-	u32 reg;
-	unsigned int i;
-
-	memcpy(rctx, in, sizeof(*rctx));
+	const u8 *p = in;
 
-	preg = rctx->hw_context;
-
-	pm_runtime_get_sync(hdev->dev);
+	stm32_hash_init(req);
 
-	if (!hdev->pdata->ux500)
-		stm32_hash_write(hdev, HASH_IMR, *preg++);
-	stm32_hash_write(hdev, HASH_STR, *preg++);
-	stm32_hash_write(hdev, HASH_CR, *preg);
-	reg = *preg++ | HASH_CR_INIT;
-	stm32_hash_write(hdev, HASH_CR, reg);
-
-	for (i = 0; i < HASH_CSR_REGISTER_NUMBER; i++)
-		stm32_hash_write(hdev, HASH_CSR(i), *preg++);
-
-	pm_runtime_mark_last_busy(hdev->dev);
-	pm_runtime_put_autosuspend(hdev->dev);
-
-	kfree(rctx->hw_context);
+	if (!*(u8 *)p) {
+		rctx->flags |= HASH_FLAGS_INIT;
+		memcpy(&rctx->state, p + 1, sizeof(rctx->state));
+	}
 
 	return 0;
 }
@@ -1162,8 +1130,6 @@  static int stm32_hash_cra_init_algs(struct crypto_tfm *tfm,
 		ctx->flags |= HASH_FLAGS_HMAC;
 
 	ctx->enginectx.op.do_one_request = stm32_hash_one_request;
-	ctx->enginectx.op.prepare_request = stm32_hash_prepare_req;
-	ctx->enginectx.op.unprepare_request = NULL;
 
 	return stm32_hash_init_fallback(tfm);
 }
@@ -1255,7 +1221,7 @@  static struct ahash_alg algs_md5[] = {
 		.import = stm32_hash_import,
 		.halg = {
 			.digestsize = MD5_DIGEST_SIZE,
-			.statesize = sizeof(struct stm32_hash_request_ctx),
+			.statesize = sizeof(struct stm32_hash_state) + 1,
 			.base = {
 				.cra_name = "md5",
 				.cra_driver_name = "stm32-md5",
@@ -1282,7 +1248,7 @@  static struct ahash_alg algs_md5[] = {
 		.setkey = stm32_hash_setkey,
 		.halg = {
 			.digestsize = MD5_DIGEST_SIZE,
-			.statesize = sizeof(struct stm32_hash_request_ctx),
+			.statesize = sizeof(struct stm32_hash_state) + 1,
 			.base = {
 				.cra_name = "hmac(md5)",
 				.cra_driver_name = "stm32-hmac-md5",
@@ -1311,7 +1277,7 @@  static struct ahash_alg algs_sha1[] = {
 		.import = stm32_hash_import,
 		.halg = {
 			.digestsize = SHA1_DIGEST_SIZE,
-			.statesize = sizeof(struct stm32_hash_request_ctx),
+			.statesize = sizeof(struct stm32_hash_state) + 1,
 			.base = {
 				.cra_name = "sha1",
 				.cra_driver_name = "stm32-sha1",
@@ -1338,7 +1304,7 @@  static struct ahash_alg algs_sha1[] = {
 		.setkey = stm32_hash_setkey,
 		.halg = {
 			.digestsize = SHA1_DIGEST_SIZE,
-			.statesize = sizeof(struct stm32_hash_request_ctx),
+			.statesize = sizeof(struct stm32_hash_state) + 1,
 			.base = {
 				.cra_name = "hmac(sha1)",
 				.cra_driver_name = "stm32-hmac-sha1",
@@ -1367,7 +1333,7 @@  static struct ahash_alg algs_sha224[] = {
 		.import = stm32_hash_import,
 		.halg = {
 			.digestsize = SHA224_DIGEST_SIZE,
-			.statesize = sizeof(struct stm32_hash_request_ctx),
+			.statesize = sizeof(struct stm32_hash_state) + 1,
 			.base = {
 				.cra_name = "sha224",
 				.cra_driver_name = "stm32-sha224",
@@ -1394,7 +1360,7 @@  static struct ahash_alg algs_sha224[] = {
 		.import = stm32_hash_import,
 		.halg = {
 			.digestsize = SHA224_DIGEST_SIZE,
-			.statesize = sizeof(struct stm32_hash_request_ctx),
+			.statesize = sizeof(struct stm32_hash_state) + 1,
 			.base = {
 				.cra_name = "hmac(sha224)",
 				.cra_driver_name = "stm32-hmac-sha224",
@@ -1423,7 +1389,7 @@  static struct ahash_alg algs_sha256[] = {
 		.import = stm32_hash_import,
 		.halg = {
 			.digestsize = SHA256_DIGEST_SIZE,
-			.statesize = sizeof(struct stm32_hash_request_ctx),
+			.statesize = sizeof(struct stm32_hash_state) + 1,
 			.base = {
 				.cra_name = "sha256",
 				.cra_driver_name = "stm32-sha256",
@@ -1450,7 +1416,7 @@  static struct ahash_alg algs_sha256[] = {
 		.setkey = stm32_hash_setkey,
 		.halg = {
 			.digestsize = SHA256_DIGEST_SIZE,
-			.statesize = sizeof(struct stm32_hash_request_ctx),
+			.statesize = sizeof(struct stm32_hash_state) + 1,
 			.base = {
 				.cra_name = "hmac(sha256)",
 				.cra_driver_name = "stm32-hmac-sha256",