diff mbox series

[RFC] crypto: de-prioritize QAT

Message ID 0171515-7267-624-5a22-238af829698f@redhat.com (mailing list archive)
State Changes Requested
Delegated to: Herbert Xu
Headers show
Series [RFC] crypto: de-prioritize QAT | expand

Commit Message

Mikulas Patocka Oct. 16, 2023, 11:26 a.m. UTC
Hi

I created this kernel module that stress-tests the crypto API:
https://people.redhat.com/~mpatocka/benchmarks/qat/tools/module-multithreaded.c

It shows that QAT underperforms significantly compared to AES-NI (for 
large requests it is 10 times slower; for small requests it is even worse) 
- see the second table in this document: 
https://people.redhat.com/~mpatocka/benchmarks/qat/kernel-module.txt

QAT has higher priority than AES-NI, so the kernel prefers it (it is not 
used for dm-crypt because it has the flag "CRYPTO_ALG_ALLOCATES_MEMORY", 
but it is preferred over AES-NI in other cases).

AES-NI has priority 300 or 400, unaccelerated AES has 100, so I suggest to 
lower the priority of QAT from 4001 to 201.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

---
 drivers/crypto/intel/qat/qat_common/qat_algs.c |   12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

Comments

Cabiddu, Giovanni Oct. 20, 2023, 12:58 p.m. UTC | #1
On Mon, Oct 16, 2023 at 01:26:47PM +0200, Mikulas Patocka wrote:
> Hi
> 
> I created this kernel module that stress-tests the crypto API:
> https://people.redhat.com/~mpatocka/benchmarks/qat/tools/module-multithreaded.c
> 
> It shows that QAT underperforms significantly compared to AES-NI (for 
> large requests it is 10 times slower; for small requests it is even worse) 
> - see the second table in this document: 
> https://people.redhat.com/~mpatocka/benchmarks/qat/kernel-module.txt
> 
> QAT has higher priority than AES-NI, so the kernel prefers it (it is not 
> used for dm-crypt because it has the flag "CRYPTO_ALG_ALLOCATES_MEMORY", 
> but it is preferred over AES-NI in other cases).
Probably you can get better performance by modifying your configuration
and test.
From your test application I can infer that you are using a single QAT
device. The driver allocates a ring pair per TFM and it loads balances
allocations between devices.
In addition, jobs are submitted synchronously. This way the cost of
offload is not amortised between requests.

Regards,
Mikulas Patocka Oct. 20, 2023, 8:01 p.m. UTC | #2
On Fri, 20 Oct 2023, Giovanni Cabiddu wrote:

> On Mon, Oct 16, 2023 at 01:26:47PM +0200, Mikulas Patocka wrote:
> > Hi
> > 
> > I created this kernel module that stress-tests the crypto API:
> > https://people.redhat.com/~mpatocka/benchmarks/qat/tools/module-multithreaded.c
> > 
> > It shows that QAT underperforms significantly compared to AES-NI (for 
> > large requests it is 10 times slower; for small requests it is even worse) 
> > - see the second table in this document: 
> > https://people.redhat.com/~mpatocka/benchmarks/qat/kernel-module.txt
> > 
> > QAT has higher priority than AES-NI, so the kernel prefers it (it is not 
> > used for dm-crypt because it has the flag "CRYPTO_ALG_ALLOCATES_MEMORY", 
> > but it is preferred over AES-NI in other cases).
> Probably you can get better performance by modifying your configuration
> and test.
> >From your test application I can infer that you are using a single QAT
> device. The driver allocates a ring pair per TFM and it loads balances
> allocations between devices.
> In addition, jobs are submitted synchronously. This way the cost of
> offload is not amortised between requests.
> 
> Regards,
> 
> -- 
> Giovanni

Yes, I thought about using more TFMs, but I don't have access to the dual 
Xeon machine anymore (I would have to request access and wait until people 
who are using it release it).

We can run more tests, but it would be best to batch all the tests in a 
small timeframe, to minimize blocking the machine for other people.

Regarding synchronous submission - the current test submits 112 requests 
concurrently. Do you think that it is too small and we should submit more? 
What would be the appropriate number of requests to submit concurrently?

I did synchronous submission because it was easier to write it than 
asynchronous submission, but if you think that 112 requests is too small 
and we need to submit more, I can try to do it.

Mikulas
diff mbox series

Patch

Index: linux-2.6/drivers/crypto/intel/qat/qat_common/qat_algs.c
===================================================================
--- linux-2.6.orig/drivers/crypto/intel/qat/qat_common/qat_algs.c
+++ linux-2.6/drivers/crypto/intel/qat/qat_common/qat_algs.c
@@ -1277,7 +1277,7 @@  static struct aead_alg qat_aeads[] = { {
 	.base = {
 		.cra_name = "authenc(hmac(sha1),cbc(aes))",
 		.cra_driver_name = "qat_aes_cbc_hmac_sha1",
-		.cra_priority = 4001,
+		.cra_priority = 201,
 		.cra_flags = CRYPTO_ALG_ASYNC | CRYPTO_ALG_ALLOCATES_MEMORY,
 		.cra_blocksize = AES_BLOCK_SIZE,
 		.cra_ctxsize = sizeof(struct qat_alg_aead_ctx),
@@ -1294,7 +1294,7 @@  static struct aead_alg qat_aeads[] = { {
 	.base = {
 		.cra_name = "authenc(hmac(sha256),cbc(aes))",
 		.cra_driver_name = "qat_aes_cbc_hmac_sha256",
-		.cra_priority = 4001,
+		.cra_priority = 201,
 		.cra_flags = CRYPTO_ALG_ASYNC | CRYPTO_ALG_ALLOCATES_MEMORY,
 		.cra_blocksize = AES_BLOCK_SIZE,
 		.cra_ctxsize = sizeof(struct qat_alg_aead_ctx),
@@ -1311,7 +1311,7 @@  static struct aead_alg qat_aeads[] = { {
 	.base = {
 		.cra_name = "authenc(hmac(sha512),cbc(aes))",
 		.cra_driver_name = "qat_aes_cbc_hmac_sha512",
-		.cra_priority = 4001,
+		.cra_priority = 201,
 		.cra_flags = CRYPTO_ALG_ASYNC | CRYPTO_ALG_ALLOCATES_MEMORY,
 		.cra_blocksize = AES_BLOCK_SIZE,
 		.cra_ctxsize = sizeof(struct qat_alg_aead_ctx),
@@ -1329,7 +1329,7 @@  static struct aead_alg qat_aeads[] = { {
 static struct skcipher_alg qat_skciphers[] = { {
 	.base.cra_name = "cbc(aes)",
 	.base.cra_driver_name = "qat_aes_cbc",
-	.base.cra_priority = 4001,
+	.base.cra_priority = 201,
 	.base.cra_flags = CRYPTO_ALG_ASYNC | CRYPTO_ALG_ALLOCATES_MEMORY,
 	.base.cra_blocksize = AES_BLOCK_SIZE,
 	.base.cra_ctxsize = sizeof(struct qat_alg_skcipher_ctx),
@@ -1347,7 +1347,7 @@  static struct skcipher_alg qat_skciphers
 }, {
 	.base.cra_name = "ctr(aes)",
 	.base.cra_driver_name = "qat_aes_ctr",
-	.base.cra_priority = 4001,
+	.base.cra_priority = 201,
 	.base.cra_flags = CRYPTO_ALG_ASYNC | CRYPTO_ALG_ALLOCATES_MEMORY,
 	.base.cra_blocksize = 1,
 	.base.cra_ctxsize = sizeof(struct qat_alg_skcipher_ctx),
@@ -1365,7 +1365,7 @@  static struct skcipher_alg qat_skciphers
 }, {
 	.base.cra_name = "xts(aes)",
 	.base.cra_driver_name = "qat_aes_xts",
-	.base.cra_priority = 4001,
+	.base.cra_priority = 201,
 	.base.cra_flags = CRYPTO_ALG_ASYNC | CRYPTO_ALG_NEED_FALLBACK |
 			  CRYPTO_ALG_ALLOCATES_MEMORY,
 	.base.cra_blocksize = AES_BLOCK_SIZE,