[2/2] crypto: DRBG - use caller buffer if suitable
diff mbox

Message ID 1814539.28Fd0X6sOF@positron.chronox.de
State Changes Requested
Delegated to: Herbert Xu
Headers show

Commit Message

Stephan Mueller July 10, 2018, 3:57 p.m. UTC
The SGL can directly operate caller-provided memory with the exception
of stack memory. The DRBG detects whether the caller provided
non-suitable memory and uses the scratchpad only on those circumstances.

This patch increases the speed of the CTR DRBG by 1 to 3 percent
depending on the buffer size of the output buffer.

Signed-off-by: Stephan Mueller <smueller@chronox.de>
---
 crypto/drbg.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

Comments

Herbert Xu July 19, 2018, 9:34 a.m. UTC | #1
On Tue, Jul 10, 2018 at 05:57:00PM +0200, Stephan Müller wrote:
> The SGL can directly operate caller-provided memory with the exception
> of stack memory. The DRBG detects whether the caller provided
> non-suitable memory and uses the scratchpad only on those circumstances.
> 
> This patch increases the speed of the CTR DRBG by 1 to 3 percent
> depending on the buffer size of the output buffer.
> 
> Signed-off-by: Stephan Mueller <smueller@chronox.de>

I think this is an abuse of virt_addr_valid.  It's meant to catch
bogus uses of SG lists, it's not meant to be a guarantee that an
address can be used on an SG list.

A better solution would be either an SG-list interface for rng,
or alternatively a virtual address interface for sync skcipher.

Cheers,
Stephan Mueller July 19, 2018, 8:57 p.m. UTC | #2
Am Donnerstag, 19. Juli 2018, 11:34:33 CEST schrieb Herbert Xu:

Hi Herbert,

> I think this is an abuse of virt_addr_valid.  It's meant to catch
> bogus uses of SG lists, it's not meant to be a guarantee that an
> address can be used on an SG list.

Thanks for your insights.
> 
> A better solution would be either an SG-list interface for rng,
> or alternatively a virtual address interface for sync skcipher.

My goal is to implement an in-place cipher operation drbg_kcapi_sym_ctr 
operating directly on the output buffer if possible. Without knowing whether 
the output buffer is valid for SGLs, I would need to use the scratchpad buffer 
as input/output buffer. That means that the data must be memcpy'ed from the 
scratchpad buffer to the output buffer.

To provide the fastest DRBG implementation, we should eliminate the memcpy in 
drbg_kcapi_sym_ctr and with it the restriction of generating 
DRBG_OUTSCRATCHLEN with one cipher invocation.

I.e. the ultimate goal is to operate directly on the output buffer if at all 
possible.

So far, all RNGs generate data for one buffer only. Thus, I would consider 
converting the RNG interface to use SGLs as overkill. Furthermore, if the RNG 
uses the SHASH ciphers (like the DRBG use case), the input data is expected to 
be a straight buffer.

Also, even the caller of the RNG may not know whether the destination buffer 
is valid for an SGL in case it is an intermediate layer.

Therefore, I am not sure that either having an SGL interface for the RNG API 
or a virtual address interface for the sync skcipher would be helpful.

Is there no way of identifying whether a buffer is valid for SGL operation?

PS: I experimented with the in-place cipher operation in drbg_kcapi_sym_ctr 
while operating directly on the output buffer. The result was a more than 3-
times increase in performance.

Current implementation:

      16 bytes|           12.267661 MB/s|    61338304 bytes |  5000000213 ns
      32 bytes|           23.603770 MB/s|   118018848 bytes |  5000000073 ns
      64 bytes|           46.732262 MB/s|   233661312 bytes |  5000000241 ns
     128 bytes|           90.038042 MB/s|   450190208 bytes |  5000000244 ns
     256 bytes|          160.399616 MB/s|   801998080 bytes |  5000000393 ns
     512 bytes|          259.878400 MB/s|  1299392000 bytes |  5000001675 ns
    1024 bytes|          386.050662 MB/s|  1930253312 bytes |  5000001661 ns
    2048 bytes|          493.641728 MB/s|  2468208640 bytes |  5000001598 ns
    4096 bytes|          581.835981 MB/s|  2909179904 bytes |  5000003426 ns

In-place cipher operation directly on output buffer:

      16 bytes|           16.017830 MB/s|    80089152 bytes |  5000000752 ns
      32 bytes|           30.775686 MB/s|   153878432 bytes |  5000000701 ns
      64 bytes|           58.299430 MB/s|   291497152 bytes |  5000000466 ns
     128 bytes|          114.847462 MB/s|   574237312 bytes |  5000000385 ns
     256 bytes|          218.359859 MB/s|  1091799296 bytes |  5000000476 ns
     512 bytes|          416.003379 MB/s|  2080016896 bytes |  5000001105 ns
    1024 bytes|          714.792346 MB/s|  3573961728 bytes |  5000718637 ns
    2048 bytes|            1.143480 GB/s|  5717401600 bytes |  5000000094 ns
    4096 bytes|            1.609953 GB/s|  8049766400 bytes |  5000001687 ns

Ciao
Stephan
Herbert Xu July 20, 2018, 3:54 a.m. UTC | #3
On Thu, Jul 19, 2018 at 10:57:16PM +0200, Stephan Müller wrote:
>
> Therefore, I am not sure that either having an SGL interface for the RNG API 
> or a virtual address interface for the sync skcipher would be helpful.

Could you please explain again why a virtual address interface to
sync skcipher algorithms won't be useful? It's something that others
have asked for in the past too.

Thanks,
Stephan Mueller July 20, 2018, 5:09 a.m. UTC | #4
> Am 20.07.2018 um 05:54 schrieb Herbert Xu <herbert@gondor.apana.org.au>:
> 
>> On Thu, Jul 19, 2018 at 10:57:16PM +0200, Stephan Müller wrote:
>> 
>> Therefore, I am not sure that either having an SGL interface for the RNG API 
>> or a virtual address interface for the sync skcipher would be helpful.
> 
> Could you please explain again why a virtual address interface to
> sync skcipher algorithms won't be useful? It's something that others
> have asked for in the past too.

Maybe I have a different understanding of how such interface should look like.

Can you give me some more detail on how you envision such virtual address interface should work?

Thanks a lot
Stephan
Herbert Xu July 20, 2018, 5:54 a.m. UTC | #5
On Fri, Jul 20, 2018 at 07:09:05AM +0200, Stephan Mueller wrote:
>
> Maybe I have a different understanding of how such interface should look like.
> 
> Can you give me some more detail on how you envision such virtual address interface should work?

It should look like shash.

Cheers,
Stephan Mueller July 20, 2018, 6:08 a.m. UTC | #6
>> On Fri, Jul 20, 2018 at 07:09:05AM +0200, Stephan Mueller wrote:
>> 
>> Maybe I have a different understanding of how such interface should look like.
>> 
>> Can you give me some more detail on how you envision such virtual address interface should work?
> 
> It should look like shash.
> 

Ok, but then:

- should it be synchronous like blkcipher?

- the TFMs (cipher Impls and templates) all operate on SGLs - should a virt API simply convert a virt address into an SGL? If so, the problem that triggered this thread is still not solved but only pushed to a different layer. If it should not just be an SGL wrapper, should all TFMs be changed?

Thanks
Herbert Xu July 20, 2018, 6:21 a.m. UTC | #7
On Fri, Jul 20, 2018 at 08:08:22AM +0200, Stephan Mueller wrote:
>
> - should it be synchronous like blkcipher?

It should be synchronous.

> - the TFMs (cipher Impls and templates) all operate on SGLs - should a virt API simply convert a virt address into an SGL? If so, the problem that triggered this thread is still not solved but only pushed to a different layer. If it should not just be an SGL wrapper, should all TFMs be changed?

Obviously if we're going to do this then we would be converting
the underlying algorithms over to the virtual address-based interface
just like shash.

Cheers,

Patch
diff mbox

diff --git a/crypto/drbg.c b/crypto/drbg.c
index ee302fd229ad..193354e9d207 100644
--- a/crypto/drbg.c
+++ b/crypto/drbg.c
@@ -1748,14 +1748,20 @@  static int drbg_kcapi_sym_ctr(struct drbg_state *drbg,
 {
 	struct scatterlist *sg_in = &drbg->sg_in, *sg_out = &drbg->sg_out;
 	int ret;
+	bool virt_addr_valid = virt_addr_valid(outbuf);
 
 	sg_set_buf(sg_in, inbuf, inlen);
-	sg_set_buf(sg_out, drbg->outscratchpad, DRBG_OUTSCRATCHLEN);
 
 	while (outlen) {
-		u32 cryptlen = min3(inlen, outlen, (u32)DRBG_OUTSCRATCHLEN);
+		u32 cryptlen = min_t(u32, inlen, outlen);
 
 		/* Output buffer may not be valid for SGL, use scratchpad */
+		if (virt_addr_valid) {
+			sg_set_buf(sg_out, outbuf, cryptlen);
+		} else {
+			cryptlen = min_t(u32, cryptlen, DRBG_OUTSCRATCHLEN);
+			sg_set_buf(sg_out, drbg->outscratchpad, cryptlen);
+		}
 		skcipher_request_set_crypt(drbg->ctr_req, sg_in, sg_out,
 					   cryptlen, drbg->V);
 		ret = crypto_wait_req(crypto_skcipher_encrypt(drbg->ctr_req),
@@ -1765,7 +1771,8 @@  static int drbg_kcapi_sym_ctr(struct drbg_state *drbg,
 
 		crypto_init_wait(&drbg->ctr_wait);
 
-		memcpy(outbuf, drbg->outscratchpad, cryptlen);
+		if (!virt_addr_valid)
+			memcpy(outbuf, drbg->outscratchpad, cryptlen);
 
 		outlen -= cryptlen;
 		outbuf += cryptlen;
@@ -1773,7 +1780,8 @@  static int drbg_kcapi_sym_ctr(struct drbg_state *drbg,
 	ret = 0;
 
 out:
-	memzero_explicit(drbg->outscratchpad, DRBG_OUTSCRATCHLEN);
+	if (!virt_addr_valid)
+		memzero_explicit(drbg->outscratchpad, DRBG_OUTSCRATCHLEN);
 	return ret;
 }
 #endif /* CONFIG_CRYPTO_DRBG_CTR */