
[0/6] net/crypto: Introduce crypto_pool

Message ID 20220726201600.1715505-1-dima@arista.com (mailing list archive)
Series net/crypto: Introduce crypto_pool

Message

Dmitry Safonov July 26, 2022, 8:15 p.m. UTC
Add crypto_pool, an API for allocating a per-CPU array of crypto requests
on the slow path (in sleepable context) and using them on the fast path,
which is RX/TX for net/ users (or any other BH-disabled users).
The design is based on the current implementation of md5sig_pool.

Previously, I suggested adding such an API in the TCP-AO patch submission [1],
where Herbert kindly offered to help with introducing a new crypto API.

The new API allows:
- reusing per-CPU ahash_request(s) across different users
- allocating only one per-CPU scratch buffer rather than a new one for
  each user
- a common API for net/ users that need ahash on the RX/TX fast path
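To illustrate the shared scratch buffer, here is a minimal userspace C sketch, assuming one scratch buffer per CPU that every user shares and that only grows when some user reserves more than the current size. The names (model_reserve_scratch(), NR_MODEL_CPUS) are hypothetical; this is not the crypto_pool_reserve_scratch() code from the series, and it ignores the locking or RCU a kernel version would need when swapping buffers.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for num_possible_cpus(); not a kernel API. */
#define NR_MODEL_CPUS 4

static void *scratch[NR_MODEL_CPUS];	/* one buffer per CPU, shared by all users */
static size_t scratch_size;		/* every per-CPU buffer is at least this big */

/*
 * Grow the shared per-CPU scratch buffers to at least @size bytes.
 * A user needing less than the current size reuses the existing
 * buffers, so nothing is allocated per user. Old contents are not
 * preserved. Returns 0 on success, -1 on allocation failure (buffers
 * already replaced stay larger, so the size invariant still holds).
 */
static int model_reserve_scratch(size_t size)
{
	if (size <= scratch_size)
		return 0;

	for (int cpu = 0; cpu < NR_MODEL_CPUS; cpu++) {
		void *buf = malloc(size);

		if (!buf)
			return -1;
		free(scratch[cpu]);
		scratch[cpu] = buf;
	}
	scratch_size = size;
	return 0;
}
```

For example, if one user reserves 64 bytes and another later reserves 32, both keep using the same 64-byte per-CPU buffers; only a request for more than 64 bytes triggers a reallocation.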

In this version I've wired up TCP-MD5 and IPv6-SR-HMAC as users.
xfrm_ipcomp and xfrm_ah could potentially be converted as well.
The original motivation for these patches is to have TCP-AO as a user,
which would let it share a per-CPU crypto_request for any supported
hashing algorithm.

While at it, I've also made the TCP-MD5 static key dynamically
switchable. After TCP-MD5 has been used and the last key is destroyed,
the static branch is disabled, so any potential penalty from checking
tcp_md5sig_info goes away and the system's TCP performance should be as
if no TCP-MD5 key had ever been defined.
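The switching can be modelled as a counter that keeps the branch enabled while any key exists. Below is a minimal userspace C sketch, assuming one reference per live TCP-MD5 key; the helpers are hypothetical. In the kernel, static_branch_inc() and static_branch_dec() provide this kind of counting for a static key, and the series may count tcp_md5sig_info objects rather than individual keys.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: number of live TCP-MD5 keys holding the branch on. */
static int md5_keys_in_use;

/* The fast path would only look at tcp_md5sig_info while this is true. */
static bool md5_branch_enabled(void)
{
	return md5_keys_in_use > 0;
}

/* Hypothetical hook: a TCP-MD5 key was added (slow path). */
static void model_md5_key_add(void)
{
	md5_keys_in_use++;
}

/* Hypothetical hook: a TCP-MD5 key was destroyed; the last one disables the branch. */
static void model_md5_key_del(void)
{
	assert(md5_keys_in_use > 0);
	md5_keys_in_use--;
}
```

The point of the design is the final transition: once the count drops back to zero, the fast path behaves as if TCP-MD5 had never been configured.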

[1]: http://lkml.kernel.org/r/20211106034334.GA18577@gondor.apana.org.au

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: David Ahern <dsahern@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Francesco Ruggeri <fruggeri@arista.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Leonard Crestez <cdleonard@gmail.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Salam Noureddine <noureddine@arista.com>
Cc: netdev@vger.kernel.org
Cc: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org

Dmitry Safonov (6):
  crypto: Introduce crypto_pool
  crypto_pool: Add crypto_pool_reserve_scratch()
  net/tcp: Separate tcp_md5sig_info allocation into
    tcp_md5sig_info_add()
  net/tcp: Disable TCP-MD5 static key on tcp_md5sig_info destruction
  net/tcp: Use crypto_pool for TCP-MD5
  net/ipv6: sr: Switch to using crypto_pool

 crypto/Kconfig           |  12 ++
 crypto/Makefile          |   1 +
 crypto/crypto_pool.c     | 316 +++++++++++++++++++++++++++++++++++++++
 include/crypto/pool.h    |  33 ++++
 include/net/seg6_hmac.h  |   7 -
 include/net/tcp.h        |  32 ++--
 net/ipv4/Kconfig         |   2 +-
 net/ipv4/tcp.c           | 102 ++++---------
 net/ipv4/tcp_ipv4.c      | 153 ++++++++++++-------
 net/ipv4/tcp_minisocks.c |  23 ++-
 net/ipv4/tcp_output.c    |   4 +-
 net/ipv6/Kconfig         |   2 +-
 net/ipv6/seg6.c          |   3 -
 net/ipv6/seg6_hmac.c     | 204 ++++++++++---------------
 net/ipv6/tcp_ipv6.c      |  63 ++++----
 15 files changed, 634 insertions(+), 323 deletions(-)
 create mode 100644 crypto/crypto_pool.c
 create mode 100644 include/crypto/pool.h


base-commit: 058affafc65a74cf54499fb578b66ad0b18f939b

Comments

Herbert Xu July 27, 2022, 12:17 a.m. UTC | #1
On Tue, Jul 26, 2022 at 09:15:54PM +0100, Dmitry Safonov wrote:
> Add crypto_pool - an API for allocating per-CPU array of crypto requests
> on slow-path (in sleep'able context) and to use them on a fast-path,
> which is RX/TX for net/ users (or in any other bh-disabled users).
> The design is based on the current implementations of md5sig_pool.
> 
> Previously, I've suggested to add such API on TCP-AO patch submission [1], 
> where Herbert kindly suggested to help with introducing new crypto API.

What I was suggesting is modifying the actual ahash interface so
that the tfm can be shared between different key users by moving
the key into the request object.

Thanks,
Leonard Crestez July 27, 2022, 3:52 p.m. UTC | #2
On 7/27/22 03:17, Herbert Xu wrote:
> On Tue, Jul 26, 2022 at 09:15:54PM +0100, Dmitry Safonov wrote:
>> Add crypto_pool - an API for allocating per-CPU array of crypto requests
>> on slow-path (in sleep'able context) and to use them on a fast-path,
>> which is RX/TX for net/ users (or in any other bh-disabled users).
>> The design is based on the current implementations of md5sig_pool.
>>
>> Previously, I've suggested to add such API on TCP-AO patch submission [1],
>> where Herbert kindly suggested to help with introducing new crypto API.
> 
> What I was suggesting is modifying the actual ahash interface so
> that the tfm can be shared between different key users by moving
> the key into the request object.

The fact that setkey is implemented at the crypto_ahash instead of the 
ahash_request level is baked into all algorithm implementations 
(including many hardware-specific ones). Changing this seems extremely 
difficult.

Supporting setkey at the tfm level could be achieved by making it an 
optional capability on a per-algorithm basis, then something like 
crypto_pool could detect this scenario and avoid allocating a per-cpu 
tfm. This would also require a crypto_pool_setkey wrapper.

As it stands right now, multiple crypto API users need to duplicate
logic for allocating a percpu array of transforms, so adding this "pool"
API is a useful step forward.

As far as I remember, the requirement for a per-cpu scratch buffer is
based on weird architectures having limitations on what kind of memory
can be passed to the crypto API, so this will have to remain.

--
Regards,
Leonard
Herbert Xu July 28, 2022, 9:26 a.m. UTC | #3
On Wed, Jul 27, 2022 at 06:52:27PM +0300, Leonard Crestez wrote:
>
> The fact that setkey is implemented at the crypto_ahash instead of the
> ahash_request level is baked into all algorithm implementations (including
> many hardware-specific ones). Changing this seems extremely difficult.

What I had in mind is simply making the tfm setkey optional.  That
way you could then have an additional setkey at the request level.

If the key is provided in either place you're allowed to perform
the hash.

This should have minimal impact on existing code.
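A minimal userspace C sketch of this rule, assuming hypothetical model_tfm/model_req structs rather than the real crypto_ahash/ahash_request. The message does not say which key wins when both are set; the sketch assumes the request-level key takes precedence, and it returns -ENOKEY when neither is present.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#ifndef ENOKEY
#define ENOKEY 126	/* Linux value, in case the libc does not expose it */
#endif

/* Hypothetical shared transform: key is NULL if its setkey was skipped. */
struct model_tfm {
	const unsigned char *key;
	size_t keylen;
};

/* Hypothetical request: may carry its own key, or none. */
struct model_req {
	struct model_tfm *tfm;	/* may be shared by many key users */
	const unsigned char *key;
	size_t keylen;
};

/*
 * Choose the key to hash with. Assumption: a request-level key takes
 * precedence over the tfm key. Hashing is allowed if either is set.
 */
static int model_resolve_key(const struct model_req *req,
			     const unsigned char **key, size_t *keylen)
{
	if (req->key) {
		*key = req->key;
		*keylen = req->keylen;
		return 0;
	}
	if (req->tfm && req->tfm->key) {
		*key = req->tfm->key;
		*keylen = req->tfm->keylen;
		return 0;
	}
	return -ENOKEY;
}
```

With this shape, one unkeyed tfm could be shared by many users, each supplying its own key in the request it hashes with.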

Cheers,
Dmitry Safonov July 29, 2022, 4:13 p.m. UTC | #4
Hi Herbert,

On 7/27/22 01:17, Herbert Xu wrote:
> On Tue, Jul 26, 2022 at 09:15:54PM +0100, Dmitry Safonov wrote:
>> Add crypto_pool - an API for allocating per-CPU array of crypto requests
>> on slow-path (in sleep'able context) and to use them on a fast-path,
>> which is RX/TX for net/ users (or in any other bh-disabled users).
>> The design is based on the current implementations of md5sig_pool.
>>
>> Previously, I've suggested to add such API on TCP-AO patch submission [1], 
>> where Herbert kindly suggested to help with introducing new crypto API.
> 
> What I was suggesting is modifying the actual ahash interface so
> that the tfm can be shared between different key users by moving
> the key into the request object.

My impression here is that we're looking at different issues.
1. The necessity of allocating per-CPU ahash_requests.
2. Managing the lifetime and sharing of ahash_request between different
kernel users.

Removing (1) would save (num_possible_cpus() - 1)*(sizeof(struct
ahash_request) + crypto_ahash_reqsize(tfm)) bytes, which would be very
nice for the new fancy CPUs with hundreds of threads.
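As a worked example of the formula, the following C helper computes the saving. The struct size and reqsize are plain inputs here; the 80-byte and 200-byte figures in the usage note are hypothetical, since the real values depend on the kernel configuration and on the algorithm.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Bytes saved by sharing one request instead of allocating one per
 * possible CPU: (ncpus - 1) * (sizeof(struct ahash_request) + reqsize).
 * req_struct_size and reqsize are supplied by the caller.
 */
static size_t model_saved_bytes(size_t ncpus, size_t req_struct_size,
				size_t reqsize)
{
	if (ncpus == 0)
		return 0;
	return (ncpus - 1) * (req_struct_size + reqsize);
}
```

For instance, model_saved_bytes(256, 80, 200) returns 255 * 280 = 71400 bytes, roughly 70 KiB for a single algorithm on a 256-CPU machine.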

For (2), many kernel users try to manage it themselves, resulting in
different implementations; some users even avoid complications like
refcounting by allocating the request only once and never freeing it
until the module is unloaded. For example, introducing TCP-AO would
result in a copy'n'paste of the tcp_md5sig_pool code. Moreover, RFC5925
lets TCP-AO users choose any supported hashing algorithm, while RFC5926
requires hmac(sha1) and cmac(aes). If a user wants more algorithms, such
an implementation would need to be patched.
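The refcounting that a shared pool would take over can be sketched as follows: the first user allocates the per-CPU requests and the last user frees them, instead of keeping them until module unload. This is a hypothetical userspace model (model_pool_get(), model_pool_put(), NR_MODEL_CPUS), not the crypto_pool code; a kernel version would also need a mutex around the refcount and the allocation.

```c
#include <assert.h>
#include <stdlib.h>

#define NR_MODEL_CPUS 4	/* hypothetical stand-in for num_possible_cpus() */

/* Hypothetical pool: per-CPU request stand-ins plus a user refcount. */
struct model_pool {
	void *req[NR_MODEL_CPUS];
	int refcnt;
};

/* Slow path (may sleep in the kernel): take a reference, allocating on first use. */
static int model_pool_get(struct model_pool *p, size_t reqsize)
{
	if (p->refcnt == 0) {
		for (int cpu = 0; cpu < NR_MODEL_CPUS; cpu++) {
			p->req[cpu] = calloc(1, reqsize);
			if (!p->req[cpu]) {
				/* Undo the CPUs already allocated. */
				while (cpu--) {
					free(p->req[cpu]);
					p->req[cpu] = NULL;
				}
				return -1;
			}
		}
	}
	p->refcnt++;
	return 0;
}

/* Drop a reference; the last user frees every per-CPU request. */
static void model_pool_put(struct model_pool *p)
{
	assert(p->refcnt > 0);
	if (--p->refcnt == 0) {
		for (int cpu = 0; cpu < NR_MODEL_CPUS; cpu++) {
			free(p->req[cpu]);
			p->req[cpu] = NULL;
		}
	}
}
```

Both TCP-MD5 and TCP-AO could then take references on the same pool for a given algorithm, and the memory would be released once neither has any keys left.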

Besides TCP-MD5 and TCP-AO, I see quite a few net/ users that could use
a common API for this. They have the same pattern of allocating a crypto
algorithm on the slow path (adding a key or module initialization) and
using it on the fast path, which is RX/TX.
Besides sharing and lifetime management, those users need a temporary
buffer (usually named `scratch'). IIUC, it is needed for async
algorithms that may use a hardware accelerator instead of the CPU and
need to write the result anywhere but on the vmapped stack.

So, here I'm trying to address (2) in order to avoid copy'n'pasting the
tcp_md5sig_pool code when introducing TCP-AO support.
I've also patched the TCP-MD5 code to dynamically disable the static
branch, which is not a crypto change.

There's also a chance I've misunderstood what your proposal is :-)

Thanks,
          Dmitry