mbox series

[RFC,00/18] crypto: Add generic Kerberos library

Message ID 160518586534.2277919.14475638653680231924.stgit@warthog.procyon.org.uk (mailing list archive)
Headers show
Series crypto: Add generic Kerberos library | expand

Message

David Howells Nov. 12, 2020, 12:57 p.m. UTC
Hi Herbert, Bruce,

Here's my first cut at a generic Kerberos crypto library in the kernel so
that I can share code between rxrpc and sunrpc (and cifs?).

I derived some of the parts from the sunrpc gss library and added more
advanced AES and Camellia crypto.  I haven't ported across the DES-based
crypto yet - I figure that can wait a bit till the interface is sorted.

Whilst I have put it into a directory under crypto/, I haven't made an
interface that goes and loads it (analogous to crypto_alloc_skcipher,
say).  Instead, you call:

        const struct krb5_enctype *crypto_krb5_find_enctype(u32 enctype);

to go and get a handler table and then use a bunch of accessor functions to
jump through the hoops.  This is basically the way the sunrpc gsslib does
things.  It might be worth making it so you do something like:

	struct crypto_mech *ctx = crypto_mech_alloc("krb5(18)");

to get enctype 18, but I'm not sure if it's worth the effort.  Also, I'm
not sure if there are any alternatives to kerberos we will need to support.

There are three main interfaces to it:

 (*) I/O crypto: encrypt, decrypt, get_mic and verify_mic.

     These all do in-place crypto, using an sglist to define the buffer
     with the data in it.  Is it necessary to make it able to take separate
     input and output buffers?

 (*) PRF+ calculation for key derivation.
 (*) Kc, Ke, Ki derivation.

     These use krb5_buffer structs to pass objects around.  This is akin to
     the xdr_netobj, but has a void* instead of a u8* data pointer.

In terms of rxrpc's rxgk, there's another step in key derivation that isn't
part of the kerberos standard, but uses the PRF+ function to generate a key
that is then used to generate Kc, Ke and Ki.  Is it worth putting this into
the directory or maybe having a callout to insert an intermediate step in
key derivation?

Note that, for purposes of illustration, I've included some rxrpc patches
that use this interface to implement the rxgk Rx security class.  The
branch also is based on some rxrpc patches that are a prerequisite for
this, but the crypto patches don't need it.

---
The patches can be found here also:

	http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=crypto-krb5

David
---
David Howells (18):
      crypto/krb5: Implement Kerberos crypto core
      crypto/krb5: Add some constants out of sunrpc headers
      crypto/krb5: Provide infrastructure and key derivation
      crypto/krb5: Implement the Kerberos5 rfc3961 key derivation
      crypto/krb5: Implement the Kerberos5 rfc3961 encrypt and decrypt functions
      crypto/krb5: Implement the Kerberos5 rfc3961 get_mic and verify_mic
      crypto/krb5: Implement the AES enctypes from rfc3962
      crypto/krb5: Implement crypto self-testing
      crypto/krb5: Implement the AES enctypes from rfc8009
      crypto/krb5: Implement the AES encrypt/decrypt from rfc8009
      crypto/krb5: Add the AES self-testing data from rfc8009
      crypto/krb5: Implement the Camellia enctypes from rfc6803
      rxrpc: Add the security index for yfs-rxgk
      rxrpc: Add YFS RxGK (GSSAPI) security class
      rxrpc: rxgk: Provide infrastructure and key derivation
      rxrpc: rxgk: Implement the yfs-rxgk security class (GSSAPI)
      rxrpc: rxgk: Implement connection rekeying
      rxgk: Support OpenAFS's rxgk implementation


 crypto/krb5/Kconfig              |    9 +
 crypto/krb5/Makefile             |   11 +-
 crypto/krb5/internal.h           |  101 +++
 crypto/krb5/kdf.c                |  223 ++++++
 crypto/krb5/main.c               |  190 +++++
 crypto/krb5/rfc3961_simplified.c |  732 ++++++++++++++++++
 crypto/krb5/rfc3962_aes.c        |  140 ++++
 crypto/krb5/rfc6803_camellia.c   |  249 ++++++
 crypto/krb5/rfc8009_aes2.c       |  440 +++++++++++
 crypto/krb5/selftest.c           |  543 +++++++++++++
 crypto/krb5/selftest_data.c      |  289 +++++++
 fs/afs/misc.c                    |   13 +
 include/crypto/krb5.h            |  100 +++
 include/keys/rxrpc-type.h        |   17 +
 include/trace/events/rxrpc.h     |    4 +
 include/uapi/linux/rxrpc.h       |   17 +
 net/rxrpc/Kconfig                |   10 +
 net/rxrpc/Makefile               |    5 +
 net/rxrpc/ar-internal.h          |   20 +
 net/rxrpc/conn_object.c          |    2 +
 net/rxrpc/key.c                  |  319 ++++++++
 net/rxrpc/rxgk.c                 | 1232 ++++++++++++++++++++++++++++++
 net/rxrpc/rxgk_app.c             |  424 ++++++++++
 net/rxrpc/rxgk_common.h          |  164 ++++
 net/rxrpc/rxgk_kdf.c             |  271 +++++++
 net/rxrpc/security.c             |    6 +
 26 files changed, 5530 insertions(+), 1 deletion(-)
 create mode 100644 crypto/krb5/kdf.c
 create mode 100644 crypto/krb5/rfc3961_simplified.c
 create mode 100644 crypto/krb5/rfc3962_aes.c
 create mode 100644 crypto/krb5/rfc6803_camellia.c
 create mode 100644 crypto/krb5/rfc8009_aes2.c
 create mode 100644 crypto/krb5/selftest.c
 create mode 100644 crypto/krb5/selftest_data.c
 create mode 100644 net/rxrpc/rxgk.c
 create mode 100644 net/rxrpc/rxgk_app.c
 create mode 100644 net/rxrpc/rxgk_common.h
 create mode 100644 net/rxrpc/rxgk_kdf.c

Comments

David Howells Nov. 12, 2020, 1:44 p.m. UTC | #1
Would it be possible/practical to make the skcipher encrypt functions take an
offset into the scatterlist rather than always starting at the beginning?

David
Chuck Lever Nov. 12, 2020, 2:36 p.m. UTC | #2
> On Nov 12, 2020, at 7:57 AM, David Howells <dhowells@redhat.com> wrote:
> 
> 
> Hi Herbert, Bruce,
> 
> Here's my first cut at a generic Kerberos crypto library in the kernel so
> that I can share code between rxrpc and sunrpc (and cifs?).
> 
> I derived some of the parts from the sunrpc gss library and added more
> advanced AES and Camellia crypto.  I haven't ported across the DES-based
> crypto yet - I figure that can wait a bit till the interface is sorted.
> 
> Whilst I have put it into a directory under crypto/, I haven't made an
> interface that goes and loads it (analogous to crypto_alloc_skcipher,
> say).  Instead, you call:
> 
>        const struct krb5_enctype *crypto_krb5_find_enctype(u32 enctype);
> 
> to go and get a handler table and then use a bunch of accessor functions to
> jump through the hoops.  This is basically the way the sunrpc gsslib does
> things.  It might be worth making it so you do something like:
> 
> 	struct crypto_mech *ctx = crypto_mech_alloc("krb5(18)");
> 
> to get enctype 18, but I'm not sure if it's worth the effort.  Also, I'm
> not sure if there are any alternatives to kerberos we will need to support.
> 
> There are three main interfaces to it:
> 
> (*) I/O crypto: encrypt, decrypt, get_mic and verify_mic.
> 
>     These all do in-place crypto, using an sglist to define the buffer
>     with the data in it.  Is it necessary to make it able to take separate
>     input and output buffers?

Hi David, Wondering if these "I/O" APIs use synchronous or async
crypto under the covers. For small data items like MICs, synchronous
might be a better choice, especially if asynchronous crypto could
result in incoming requests getting re-ordered and falling out of
the GSS sequence number window.

What say ye?


> (*) PRF+ calculation for key derivation.
> (*) Kc, Ke, Ki derivation.
> 
>     These use krb5_buffer structs to pass objects around.  This is akin to
>     the xdr_netobj, but has a void* instead of a u8* data pointer.
> 
> In terms of rxrpc's rxgk, there's another step in key derivation that isn't
> part of the kerberos standard, but uses the PRF+ function to generate a key
> that is then used to generate Kc, Ke and Ki.  Is it worth putting this into
> the directory or maybe having a callout to insert an intermediate step in
> key derivation?
> 
> Note that, for purposes of illustration, I've included some rxrpc patches
> that use this interface to implement the rxgk Rx security class.  The
> branch also is based on some rxrpc patches that are a prerequisite for
> this, but the crypto patches don't need it.
> 
> ---
> The patches can be found here also:
> 
> 	http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=crypto-krb5
> 
> David
> ---
> David Howells (18):
>      crypto/krb5: Implement Kerberos crypto core
>      crypto/krb5: Add some constants out of sunrpc headers
>      crypto/krb5: Provide infrastructure and key derivation
>      crypto/krb5: Implement the Kerberos5 rfc3961 key derivation
>      crypto/krb5: Implement the Kerberos5 rfc3961 encrypt and decrypt functions
>      crypto/krb5: Implement the Kerberos5 rfc3961 get_mic and verify_mic
>      crypto/krb5: Implement the AES enctypes from rfc3962
>      crypto/krb5: Implement crypto self-testing
>      crypto/krb5: Implement the AES enctypes from rfc8009
>      crypto/krb5: Implement the AES encrypt/decrypt from rfc8009
>      crypto/krb5: Add the AES self-testing data from rfc8009
>      crypto/krb5: Implement the Camellia enctypes from rfc6803
>      rxrpc: Add the security index for yfs-rxgk
>      rxrpc: Add YFS RxGK (GSSAPI) security class
>      rxrpc: rxgk: Provide infrastructure and key derivation
>      rxrpc: rxgk: Implement the yfs-rxgk security class (GSSAPI)
>      rxrpc: rxgk: Implement connection rekeying
>      rxgk: Support OpenAFS's rxgk implementation
> 
> 
> crypto/krb5/Kconfig              |    9 +
> crypto/krb5/Makefile             |   11 +-
> crypto/krb5/internal.h           |  101 +++
> crypto/krb5/kdf.c                |  223 ++++++
> crypto/krb5/main.c               |  190 +++++
> crypto/krb5/rfc3961_simplified.c |  732 ++++++++++++++++++
> crypto/krb5/rfc3962_aes.c        |  140 ++++
> crypto/krb5/rfc6803_camellia.c   |  249 ++++++
> crypto/krb5/rfc8009_aes2.c       |  440 +++++++++++
> crypto/krb5/selftest.c           |  543 +++++++++++++
> crypto/krb5/selftest_data.c      |  289 +++++++
> fs/afs/misc.c                    |   13 +
> include/crypto/krb5.h            |  100 +++
> include/keys/rxrpc-type.h        |   17 +
> include/trace/events/rxrpc.h     |    4 +
> include/uapi/linux/rxrpc.h       |   17 +
> net/rxrpc/Kconfig                |   10 +
> net/rxrpc/Makefile               |    5 +
> net/rxrpc/ar-internal.h          |   20 +
> net/rxrpc/conn_object.c          |    2 +
> net/rxrpc/key.c                  |  319 ++++++++
> net/rxrpc/rxgk.c                 | 1232 ++++++++++++++++++++++++++++++
> net/rxrpc/rxgk_app.c             |  424 ++++++++++
> net/rxrpc/rxgk_common.h          |  164 ++++
> net/rxrpc/rxgk_kdf.c             |  271 +++++++
> net/rxrpc/security.c             |    6 +
> 26 files changed, 5530 insertions(+), 1 deletion(-)
> create mode 100644 crypto/krb5/kdf.c
> create mode 100644 crypto/krb5/rfc3961_simplified.c
> create mode 100644 crypto/krb5/rfc3962_aes.c
> create mode 100644 crypto/krb5/rfc6803_camellia.c
> create mode 100644 crypto/krb5/rfc8009_aes2.c
> create mode 100644 crypto/krb5/selftest.c
> create mode 100644 crypto/krb5/selftest_data.c
> create mode 100644 net/rxrpc/rxgk.c
> create mode 100644 net/rxrpc/rxgk_app.c
> create mode 100644 net/rxrpc/rxgk_common.h
> create mode 100644 net/rxrpc/rxgk_kdf.c
> 
> 

--
Chuck Lever
David Howells Nov. 12, 2020, 3:42 p.m. UTC | #3
Chuck Lever <chuck.lever@oracle.com> wrote:

> > There are three main interfaces to it:
> > 
> > (*) I/O crypto: encrypt, decrypt, get_mic and verify_mic.
> > 
> >     These all do in-place crypto, using an sglist to define the buffer
> >     with the data in it.  Is it necessary to make it able to take separate
> >     input and output buffers?
> 
> Hi David, Wondering if these "I/O" APIs use synchronous or async
> crypto under the covers. For small data items like MICs, synchronous
> might be a better choice, especially if asynchronous crypto could
> result in incoming requests getting re-ordered and falling out of
> the GSS sequence number window.
> 
> What say ye?

For the moment I'm using synchronous APIs as that's what sunrpc is using (I
borrowed the basic code from there).

It would be interesting to consider using async, but there's a potential
issue.  For the simplified profile, encryption and integrity checksum
generation can be done simultaneously, but decryption and verification can't.
For the AES-2 profile, the reverse is true.

For my purposes in rxrpc, async mode isn't actually that useful since I'm only
doing the contents of a UDP packet at a time.  Either I'm encrypting with the
intention of immediate transmission or decrypting with the intention of
immediately using the data, so I'm in a context where I can wait anyway.

What might get me more of a boost would be to encrypt the app data directly
into a UDP packet and decrypt the UDP packet directly into the app buffers.
This is easier said than done, though, as there's typically security metadata
inserted into the packet inside the encrypted region.

David
Chuck Lever Nov. 12, 2020, 3:49 p.m. UTC | #4
> On Nov 12, 2020, at 10:42 AM, David Howells <dhowells@redhat.com> wrote:
> 
> Chuck Lever <chuck.lever@oracle.com> wrote:
> 
>>> There are three main interfaces to it:
>>> 
>>> (*) I/O crypto: encrypt, decrypt, get_mic and verify_mic.
>>> 
>>>    These all do in-place crypto, using an sglist to define the buffer
>>>    with the data in it.  Is it necessary to make it able to take separate
>>>    input and output buffers?
>> 
>> Hi David, Wondering if these "I/O" APIs use synchronous or async
>> crypto under the covers. For small data items like MICs, synchronous
>> might be a better choice, especially if asynchronous crypto could
>> result in incoming requests getting re-ordered and falling out of
>> the GSS sequence number window.
>> 
>> What say ye?
> 
> For the moment I'm using synchronous APIs as that's what sunrpc is using (I
> borrowed the basic code from there).

Really? My understanding of the Linux kernel SUNRPC implementation is
that it uses asynchronous, even for small data items. Maybe I'm using
the terminology incorrectly.

The problem that arises is on the server. The asynchronous API can
schedule, and if the server has other work to do, that can delay a
verify_mic long enough that the request drops out of the GSS sequence
number window (even a large window).

Whatever the mechanism, we need to have deterministic ordering, at
least on the server-side.


> It would be interesting to consider using async, but there's a potential
> issue.  For the simplified profile, encryption and integrity checksum
> generation can be done simultaneously, but decryption and verification can't.
> For the AES-2 profile, the reverse is true.
> 
> For my purposes in rxrpc, async mode isn't actually that useful since I'm only
> doing the contents of a UDP packet at a time.  Either I'm encrypting with the
> intention of immediate transmission or decrypting with the intention of
> immediately using the data, so I'm in a context where I can wait anyway.
> 
> What might get me more of a boost would be to encrypt the app data directly
> into a UDP packet and decrypt the UDP packet directly into the app buffers.
> This is easier said than done, though, as there's typically security metadata
> inserted into the packet inside the encrypted region.


--
Chuck Lever
David Howells Nov. 12, 2020, 4:54 p.m. UTC | #5
Chuck Lever <chuck.lever@oracle.com> wrote:

> Really? My understanding of the Linux kernel SUNRPC implementation is
> that it uses asynchronous, even for small data items. Maybe I'm using
> the terminology incorrectly.

Seems to be synchronous, at least in its use of skcipher:

grep -e skcipher *
gss_krb5_crypto.c:#include <crypto/skcipher.h>
gss_krb5_crypto.c:	struct crypto_sync_skcipher *tfm,
gss_krb5_crypto.c:	if (length % crypto_sync_skcipher_blocksize(tfm) != 0)
gss_krb5_crypto.c:	if (crypto_sync_skcipher_ivsize(tfm) > GSS_KRB5_MAX_BLOCKSIZE) {
gss_krb5_crypto.c:			crypto_sync_skcipher_ivsize(tfm));
gss_krb5_crypto.c:		memcpy(local_iv, iv, crypto_sync_skcipher_ivsize(tfm));
gss_krb5_crypto.c:	skcipher_request_set_sync_tfm(req, tfm);
gss_krb5_crypto.c:	skcipher_request_set_callback(req, 0, NULL, NULL);
gss_krb5_crypto.c:	skcipher_request_set_crypt(req, sg, sg, length, local_iv);
gss_krb5_crypto.c:	ret = crypto_skcipher_encrypt(req);
gss_krb5_crypto.c:	skcipher_request_zero(req);
gss_krb5_crypto.c:     struct crypto_sync_skcipher *tfm,
gss_krb5_crypto.c:	if (length % crypto_sync_skcipher_blocksize(tfm) != 0)
gss_krb5_crypto.c:	if (crypto_sync_skcipher_ivsize(tfm) > GSS_KRB5_MAX_BLOCKSIZE) {
gss_krb5_crypto.c:			crypto_sync_skcipher_ivsize(tfm));
gss_krb5_crypto.c:		memcpy(local_iv, iv, crypto_sync_skcipher_ivsize(tfm));
gss_krb5_crypto.c:	skcipher_request_set_sync_tfm(req, tfm);
gss_krb5_crypto.c:	skcipher_request_set_callback(req, 0, NULL, NULL);
gss_krb5_crypto.c:	skcipher_request_set_crypt(req, sg, sg, length, local_iv);
gss_krb5_crypto.c:	ret = crypto_skcipher_decrypt(req);
gss_krb5_crypto.c:	skcipher_request_zero(req);
gss_krb5_crypto.c:	struct skcipher_request *req;
gss_krb5_crypto.c:	struct crypto_sync_skcipher *tfm =
gss_krb5_crypto.c:		crypto_sync_skcipher_reqtfm(desc->req);
gss_krb5_crypto.c:	fraglen = thislen & (crypto_sync_skcipher_blocksize(tfm) - 1);
gss_krb5_crypto.c:	skcipher_request_set_crypt(desc->req, desc->infrags, desc->outfrags,
gss_krb5_crypto.c:	ret = crypto_skcipher_encrypt(desc->req);
gss_krb5_crypto.c:gss_encrypt_xdr_buf(struct crypto_sync_skcipher *tfm, struct xdr_buf *buf,
gss_krb5_crypto.c:	BUG_ON((buf->len - offset) % crypto_sync_skcipher_blocksize(tfm) != 0);
gss_krb5_crypto.c:	skcipher_request_set_sync_tfm(req, tfm);
gss_krb5_crypto.c:	skcipher_request_set_callback(req, 0, NULL, NULL);
gss_krb5_crypto.c:	skcipher_request_zero(req);
gss_krb5_crypto.c:	struct skcipher_request *req;
gss_krb5_crypto.c:	struct crypto_sync_skcipher *tfm =
gss_krb5_crypto.c:		crypto_sync_skcipher_reqtfm(desc->req);
gss_krb5_crypto.c:	fraglen = thislen & (crypto_sync_skcipher_blocksize(tfm) - 1);
gss_krb5_crypto.c:	skcipher_request_set_crypt(desc->req, desc->frags, desc->frags,
gss_krb5_crypto.c:	ret = crypto_skcipher_decrypt(desc->req);
gss_krb5_crypto.c:gss_decrypt_xdr_buf(struct crypto_sync_skcipher *tfm, struct xdr_buf *buf,
gss_krb5_crypto.c:	BUG_ON((buf->len - offset) % crypto_sync_skcipher_blocksize(tfm) != 0);
gss_krb5_crypto.c:	skcipher_request_set_sync_tfm(req, tfm);
gss_krb5_crypto.c:	skcipher_request_set_callback(req, 0, NULL, NULL);
gss_krb5_crypto.c:	skcipher_request_zero(req);
gss_krb5_crypto.c:gss_krb5_cts_crypt(struct crypto_sync_skcipher *cipher, struct xdr_buf *buf,
gss_krb5_crypto.c:	skcipher_request_set_sync_tfm(req, cipher);
gss_krb5_crypto.c:	skcipher_request_set_callback(req, 0, NULL, NULL);
gss_krb5_crypto.c:	skcipher_request_set_crypt(req, sg, sg, len, iv);
gss_krb5_crypto.c:		ret = crypto_skcipher_encrypt(req);
gss_krb5_crypto.c:		ret = crypto_skcipher_decrypt(req);
gss_krb5_crypto.c:	skcipher_request_zero(req);
gss_krb5_crypto.c:	struct crypto_sync_skcipher *cipher, *aux_cipher;
gss_krb5_crypto.c:	blocksize = crypto_sync_skcipher_blocksize(cipher);
gss_krb5_crypto.c:		skcipher_request_set_sync_tfm(req, aux_cipher);
gss_krb5_crypto.c:		skcipher_request_set_callback(req, 0, NULL, NULL);
gss_krb5_crypto.c:		skcipher_request_zero(req);
gss_krb5_crypto.c:	struct crypto_sync_skcipher *cipher, *aux_cipher;
gss_krb5_crypto.c:	blocksize = crypto_sync_skcipher_blocksize(cipher);
gss_krb5_crypto.c:		skcipher_request_set_sync_tfm(req, aux_cipher);
gss_krb5_crypto.c:		skcipher_request_set_callback(req, 0, NULL, NULL);
gss_krb5_crypto.c:		skcipher_request_zero(req);
gss_krb5_keys.c:#include <crypto/skcipher.h>
gss_krb5_keys.c:	struct crypto_sync_skcipher *cipher;
gss_krb5_keys.c:	cipher = crypto_alloc_sync_skcipher(gk5e->encrypt_name, 0, 0);
gss_krb5_keys.c:	if (crypto_sync_skcipher_setkey(cipher, inkey->data, inkey->len))
gss_krb5_keys.c:	crypto_free_sync_skcipher(cipher);
gss_krb5_mech.c:#include <crypto/skcipher.h>
gss_krb5_mech.c:	struct krb5_ctx *ctx, struct crypto_sync_skcipher **res)
gss_krb5_mech.c:	*res = crypto_alloc_sync_skcipher(ctx->gk5e->encrypt_name, 0, 0);
gss_krb5_mech.c:	if (crypto_sync_skcipher_setkey(*res, key.data, key.len)) {
gss_krb5_mech.c:	crypto_free_sync_skcipher(*res);
gss_krb5_mech.c:	crypto_free_sync_skcipher(ctx->seq);
gss_krb5_mech.c:	crypto_free_sync_skcipher(ctx->enc);
gss_krb5_mech.c:static struct crypto_sync_skcipher *
gss_krb5_mech.c:	struct crypto_sync_skcipher *cp;
gss_krb5_mech.c:	cp = crypto_alloc_sync_skcipher(cname, 0, 0);
gss_krb5_mech.c:	if (crypto_sync_skcipher_setkey(cp, key, ctx->gk5e->keylength)) {
gss_krb5_mech.c:		crypto_free_sync_skcipher(cp);
gss_krb5_mech.c:	crypto_free_sync_skcipher(ctx->enc);
gss_krb5_mech.c:	crypto_free_sync_skcipher(ctx->seq);
gss_krb5_mech.c:			crypto_free_sync_skcipher(ctx->initiator_enc_aux);
gss_krb5_mech.c:	crypto_free_sync_skcipher(ctx->acceptor_enc);
gss_krb5_mech.c:	crypto_free_sync_skcipher(ctx->initiator_enc);
gss_krb5_mech.c:	crypto_free_sync_skcipher(kctx->seq);
gss_krb5_mech.c:	crypto_free_sync_skcipher(kctx->enc);
gss_krb5_mech.c:	crypto_free_sync_skcipher(kctx->acceptor_enc);
gss_krb5_mech.c:	crypto_free_sync_skcipher(kctx->initiator_enc);
gss_krb5_mech.c:	crypto_free_sync_skcipher(kctx->acceptor_enc_aux);
gss_krb5_mech.c:	crypto_free_sync_skcipher(kctx->initiator_enc_aux);
gss_krb5_seqnum.c:#include <crypto/skcipher.h>
gss_krb5_seqnum.c:		struct crypto_sync_skcipher *key,
gss_krb5_seqnum.c:	struct crypto_sync_skcipher *key = kctx->seq;
gss_krb5_wrap.c:#include <crypto/skcipher.h>
gss_krb5_wrap.c:	blocksize = crypto_sync_skcipher_blocksize(kctx->enc);
gss_krb5_wrap.c:	blocksize = crypto_sync_skcipher_blocksize(kctx->enc);

David
J. Bruce Fields Nov. 12, 2020, 6:37 p.m. UTC | #6
On Thu, Nov 12, 2020 at 12:57:45PM +0000, David Howells wrote:
> 
> Hi Herbert, Bruce,
> 
> Here's my first cut at a generic Kerberos crypto library in the kernel so
> that I can share code between rxrpc and sunrpc (and cifs?).
> 
> I derived some of the parts from the sunrpc gss library and added more
> advanced AES and Camellia crypto.  I haven't ported across the DES-based
> crypto yet - I figure that can wait a bit till the interface is sorted.
> 
> Whilst I have put it into a directory under crypto/, I haven't made an
> interface that goes and loads it (analogous to crypto_alloc_skcipher,
> say).  Instead, you call:
> 
>         const struct krb5_enctype *crypto_krb5_find_enctype(u32 enctype);
> 
> to go and get a handler table and then use a bunch of accessor functions to
> jump through the hoops.  This is basically the way the sunrpc gsslib does
> things.  It might be worth making it so you do something like:
> 
> 	struct crypto_mech *ctx = crypto_mech_alloc("krb5(18)");
> 
> to get enctype 18, but I'm not sure if it's worth the effort.  Also, I'm
> not sure if there are any alternatives to kerberos we will need to support.

We did have code for a non-krb5 mechanism at some point, but it was torn
out.  So I don't think that's a priority.

(Chuck, will RPC-over-SSL need a new non-krb5 mechanism?)

> There are three main interfaces to it:
> 
>  (*) I/O crypto: encrypt, decrypt, get_mic and verify_mic.
> 
>      These all do in-place crypto, using an sglist to define the buffer
>      with the data in it.  Is it necessary to make it able to take separate
>      input and output buffers?

I don't know.  My memory is that the buffer management in the existing
rpcsec_gss code is complex and fragile.  See e.g. the long comment in
gss_krb5_remove_padding.

--b.

>  (*) PRF+ calculation for key derivation.
>  (*) Kc, Ke, Ki derivation.
> 
>      These use krb5_buffer structs to pass objects around.  This is akin to
>      the xdr_netobj, but has a void* instead of a u8* data pointer.
> 
> In terms of rxrpc's rxgk, there's another step in key derivation that isn't
> part of the kerberos standard, but uses the PRF+ function to generate a key
> that is then used to generate Kc, Ke and Ki.  Is it worth putting this into
> the directory or maybe having a callout to insert an intermediate step in
> key derivation?
> 
> Note that, for purposes of illustration, I've included some rxrpc patches
> that use this interface to implement the rxgk Rx security class.  The
> branch also is based on some rxrpc patches that are a prerequisite for
> this, but the crypto patches don't need it.
> 
> ---
> The patches can be found here also:
> 
> 	http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=crypto-krb5
> 
> David
> ---
> David Howells (18):
>       crypto/krb5: Implement Kerberos crypto core
>       crypto/krb5: Add some constants out of sunrpc headers
>       crypto/krb5: Provide infrastructure and key derivation
>       crypto/krb5: Implement the Kerberos5 rfc3961 key derivation
>       crypto/krb5: Implement the Kerberos5 rfc3961 encrypt and decrypt functions
>       crypto/krb5: Implement the Kerberos5 rfc3961 get_mic and verify_mic
>       crypto/krb5: Implement the AES enctypes from rfc3962
>       crypto/krb5: Implement crypto self-testing
>       crypto/krb5: Implement the AES enctypes from rfc8009
>       crypto/krb5: Implement the AES encrypt/decrypt from rfc8009
>       crypto/krb5: Add the AES self-testing data from rfc8009
>       crypto/krb5: Implement the Camellia enctypes from rfc6803
>       rxrpc: Add the security index for yfs-rxgk
>       rxrpc: Add YFS RxGK (GSSAPI) security class
>       rxrpc: rxgk: Provide infrastructure and key derivation
>       rxrpc: rxgk: Implement the yfs-rxgk security class (GSSAPI)
>       rxrpc: rxgk: Implement connection rekeying
>       rxgk: Support OpenAFS's rxgk implementation
> 
> 
>  crypto/krb5/Kconfig              |    9 +
>  crypto/krb5/Makefile             |   11 +-
>  crypto/krb5/internal.h           |  101 +++
>  crypto/krb5/kdf.c                |  223 ++++++
>  crypto/krb5/main.c               |  190 +++++
>  crypto/krb5/rfc3961_simplified.c |  732 ++++++++++++++++++
>  crypto/krb5/rfc3962_aes.c        |  140 ++++
>  crypto/krb5/rfc6803_camellia.c   |  249 ++++++
>  crypto/krb5/rfc8009_aes2.c       |  440 +++++++++++
>  crypto/krb5/selftest.c           |  543 +++++++++++++
>  crypto/krb5/selftest_data.c      |  289 +++++++
>  fs/afs/misc.c                    |   13 +
>  include/crypto/krb5.h            |  100 +++
>  include/keys/rxrpc-type.h        |   17 +
>  include/trace/events/rxrpc.h     |    4 +
>  include/uapi/linux/rxrpc.h       |   17 +
>  net/rxrpc/Kconfig                |   10 +
>  net/rxrpc/Makefile               |    5 +
>  net/rxrpc/ar-internal.h          |   20 +
>  net/rxrpc/conn_object.c          |    2 +
>  net/rxrpc/key.c                  |  319 ++++++++
>  net/rxrpc/rxgk.c                 | 1232 ++++++++++++++++++++++++++++++
>  net/rxrpc/rxgk_app.c             |  424 ++++++++++
>  net/rxrpc/rxgk_common.h          |  164 ++++
>  net/rxrpc/rxgk_kdf.c             |  271 +++++++
>  net/rxrpc/security.c             |    6 +
>  26 files changed, 5530 insertions(+), 1 deletion(-)
>  create mode 100644 crypto/krb5/kdf.c
>  create mode 100644 crypto/krb5/rfc3961_simplified.c
>  create mode 100644 crypto/krb5/rfc3962_aes.c
>  create mode 100644 crypto/krb5/rfc6803_camellia.c
>  create mode 100644 crypto/krb5/rfc8009_aes2.c
>  create mode 100644 crypto/krb5/selftest.c
>  create mode 100644 crypto/krb5/selftest_data.c
>  create mode 100644 net/rxrpc/rxgk.c
>  create mode 100644 net/rxrpc/rxgk_app.c
>  create mode 100644 net/rxrpc/rxgk_common.h
>  create mode 100644 net/rxrpc/rxgk_kdf.c
>
Chuck Lever Nov. 12, 2020, 6:39 p.m. UTC | #7
> On Nov 12, 2020, at 1:37 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> 
> On Thu, Nov 12, 2020 at 12:57:45PM +0000, David Howells wrote:
>> 
>> Hi Herbert, Bruce,
>> 
>> Here's my first cut at a generic Kerberos crypto library in the kernel so
>> that I can share code between rxrpc and sunrpc (and cifs?).
>> 
>> I derived some of the parts from the sunrpc gss library and added more
>> advanced AES and Camellia crypto.  I haven't ported across the DES-based
>> crypto yet - I figure that can wait a bit till the interface is sorted.
>> 
>> Whilst I have put it into a directory under crypto/, I haven't made an
>> interface that goes and loads it (analogous to crypto_alloc_skcipher,
>> say).  Instead, you call:
>> 
>>        const struct krb5_enctype *crypto_krb5_find_enctype(u32 enctype);
>> 
>> to go and get a handler table and then use a bunch of accessor functions to
>> jump through the hoops.  This is basically the way the sunrpc gsslib does
>> things.  It might be worth making it so you do something like:
>> 
>> 	struct crypto_mech *ctx = crypto_mech_alloc("krb5(18)");
>> 
>> to get enctype 18, but I'm not sure if it's worth the effort.  Also, I'm
>> not sure if there are any alternatives to kerberos we will need to support.
> 
> We did have code for a non-krb5 mechanism at some point, but it was torn
> out.  So I don't think that's a priority.
> 
> (Chuck, will RPC-over-SSL need a new non-krb5 mechanism?)

No, RPC-over-TLS does not involve the GSS infrastructure in any way.


>> There are three main interfaces to it:
>> 
>> (*) I/O crypto: encrypt, decrypt, get_mic and verify_mic.
>> 
>>     These all do in-place crypto, using an sglist to define the buffer
>>     with the data in it.  Is it necessary to make it able to take separate
>>     input and output buffers?
> 
> I don't know.  My memory is that the buffer management in the existing
> rpcsec_gss code is complex and fragile.  See e.g. the long comment in
> gss_krb5_remove_padding.

And even worse, the buffer handling is slightly different in the NFS
client and server code paths.


> --b.
> 
>> (*) PRF+ calculation for key derivation.
>> (*) Kc, Ke, Ki derivation.
>> 
>>     These use krb5_buffer structs to pass objects around.  This is akin to
>>     the xdr_netobj, but has a void* instead of a u8* data pointer.
>> 
>> In terms of rxrpc's rxgk, there's another step in key derivation that isn't
>> part of the kerberos standard, but uses the PRF+ function to generate a key
>> that is then used to generate Kc, Ke and Ki.  Is it worth putting this into
>> the directory or maybe having a callout to insert an intermediate step in
>> key derivation?
>> 
>> Note that, for purposes of illustration, I've included some rxrpc patches
>> that use this interface to implement the rxgk Rx security class.  The
>> branch also is based on some rxrpc patches that are a prerequisite for
>> this, but the crypto patches don't need it.
>> 
>> ---
>> The patches can be found here also:
>> 
>> 	http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=crypto-krb5
>> 
>> David
>> ---
>> David Howells (18):
>>      crypto/krb5: Implement Kerberos crypto core
>>      crypto/krb5: Add some constants out of sunrpc headers
>>      crypto/krb5: Provide infrastructure and key derivation
>>      crypto/krb5: Implement the Kerberos5 rfc3961 key derivation
>>      crypto/krb5: Implement the Kerberos5 rfc3961 encrypt and decrypt functions
>>      crypto/krb5: Implement the Kerberos5 rfc3961 get_mic and verify_mic
>>      crypto/krb5: Implement the AES enctypes from rfc3962
>>      crypto/krb5: Implement crypto self-testing
>>      crypto/krb5: Implement the AES enctypes from rfc8009
>>      crypto/krb5: Implement the AES encrypt/decrypt from rfc8009
>>      crypto/krb5: Add the AES self-testing data from rfc8009
>>      crypto/krb5: Implement the Camellia enctypes from rfc6803
>>      rxrpc: Add the security index for yfs-rxgk
>>      rxrpc: Add YFS RxGK (GSSAPI) security class
>>      rxrpc: rxgk: Provide infrastructure and key derivation
>>      rxrpc: rxgk: Implement the yfs-rxgk security class (GSSAPI)
>>      rxrpc: rxgk: Implement connection rekeying
>>      rxgk: Support OpenAFS's rxgk implementation
>> 
>> 
>> crypto/krb5/Kconfig              |    9 +
>> crypto/krb5/Makefile             |   11 +-
>> crypto/krb5/internal.h           |  101 +++
>> crypto/krb5/kdf.c                |  223 ++++++
>> crypto/krb5/main.c               |  190 +++++
>> crypto/krb5/rfc3961_simplified.c |  732 ++++++++++++++++++
>> crypto/krb5/rfc3962_aes.c        |  140 ++++
>> crypto/krb5/rfc6803_camellia.c   |  249 ++++++
>> crypto/krb5/rfc8009_aes2.c       |  440 +++++++++++
>> crypto/krb5/selftest.c           |  543 +++++++++++++
>> crypto/krb5/selftest_data.c      |  289 +++++++
>> fs/afs/misc.c                    |   13 +
>> include/crypto/krb5.h            |  100 +++
>> include/keys/rxrpc-type.h        |   17 +
>> include/trace/events/rxrpc.h     |    4 +
>> include/uapi/linux/rxrpc.h       |   17 +
>> net/rxrpc/Kconfig                |   10 +
>> net/rxrpc/Makefile               |    5 +
>> net/rxrpc/ar-internal.h          |   20 +
>> net/rxrpc/conn_object.c          |    2 +
>> net/rxrpc/key.c                  |  319 ++++++++
>> net/rxrpc/rxgk.c                 | 1232 ++++++++++++++++++++++++++++++
>> net/rxrpc/rxgk_app.c             |  424 ++++++++++
>> net/rxrpc/rxgk_common.h          |  164 ++++
>> net/rxrpc/rxgk_kdf.c             |  271 +++++++
>> net/rxrpc/security.c             |    6 +
>> 26 files changed, 5530 insertions(+), 1 deletion(-)
>> create mode 100644 crypto/krb5/kdf.c
>> create mode 100644 crypto/krb5/rfc3961_simplified.c
>> create mode 100644 crypto/krb5/rfc3962_aes.c
>> create mode 100644 crypto/krb5/rfc6803_camellia.c
>> create mode 100644 crypto/krb5/rfc8009_aes2.c
>> create mode 100644 crypto/krb5/selftest.c
>> create mode 100644 crypto/krb5/selftest_data.c
>> create mode 100644 net/rxrpc/rxgk.c
>> create mode 100644 net/rxrpc/rxgk_app.c
>> create mode 100644 net/rxrpc/rxgk_common.h
>> create mode 100644 net/rxrpc/rxgk_kdf.c

--
Chuck Lever
J. Bruce Fields Nov. 12, 2020, 9:07 p.m. UTC | #8
On Thu, Nov 12, 2020 at 04:54:06PM +0000, David Howells wrote:
> Chuck Lever <chuck.lever@oracle.com> wrote:
> 
> > Really? My understanding of the Linux kernel SUNRPC implementation is
> > that it uses asynchronous, even for small data items. Maybe I'm using
> > the terminology incorrectly.
> 
> Seems to be synchronous, at least in its use of skcipher:

Yes, it's all synchronous.  The only cases where we defer and revisit a
request is when we need to do upcalls to userspace.

(And those upcalls mostly come after we're done with unwrapping and
verifying a request, so now I'm sort of curious exactly what Chuck was
seeing.)

--b.
Chuck Lever Nov. 12, 2020, 9:09 p.m. UTC | #9
> On Nov 12, 2020, at 4:07 PM, Bruce Fields <bfields@fieldses.org> wrote:
> 
> On Thu, Nov 12, 2020 at 04:54:06PM +0000, David Howells wrote:
>> Chuck Lever <chuck.lever@oracle.com> wrote:
>> 
>>> Really? My understanding of the Linux kernel SUNRPC implementation is
>>> that it uses asynchronous, even for small data items. Maybe I'm using
>>> the terminology incorrectly.
>> 
>> Seems to be synchronous, at least in its use of skcipher:
> 
> Yes, it's all synchronous.  The only cases where we defer and revisit a
> request is when we need to do upcalls to userspace.
> 
> (And those upcalls mostly come after we're done with unwrapping and
> verifying a request, so now I'm sort of curious exactly what Chuck was
> seeing.)

I vaguely recall that setting CRYPTO_TFM_REQ_MAY_SLEEP allows the
crypto API to sleep and defer completion.


--
Chuck Lever
Herbert Xu Nov. 26, 2020, 6:33 a.m. UTC | #10
On Thu, Nov 12, 2020 at 12:57:45PM +0000, David Howells wrote:
> 
> Hi Herbert, Bruce,
> 
> Here's my first cut at a generic Kerberos crypto library in the kernel so
> that I can share code between rxrpc and sunrpc (and cifs?).

Hi David:

I can't find the bit where you are actually sharing this code with
sunrpc, am I missing something?

Cheers,
David Howells Nov. 26, 2020, 8:19 a.m. UTC | #11
Herbert Xu <herbert@gondor.apana.org.au> wrote:

> > Here's my first cut at a generic Kerberos crypto library in the kernel so
> > that I can share code between rxrpc and sunrpc (and cifs?).
> 
> I can't find the bit where you are actually sharing this code with
> sunrpc, am I missing something?

I haven't done that yet.  Sorry, I should've been more explicit with what I
was after.  I was wanting to find out if the nfs/nfsd people are okay with
this (and if there are any gotchas I should know about - it turns out, if I
understand it correctly, the relevant code may being being rewritten a bit
anyway).

And from you, I was wanting to find out if you're okay with an interface of
this kind in crypto/ where the code is just used directly - or whether I'll
be required to wrap it up in the autoloading, module-handling mechanisms.

David
Herbert Xu Nov. 27, 2020, 5:07 a.m. UTC | #12
On Thu, Nov 26, 2020 at 08:19:41AM +0000, David Howells wrote:
>
> I haven't done that yet.  Sorry, I should've been more explicit with what I
> was after.  I was wanting to find out if the nfs/nfsd people are okay with
> this (and if there are any gotchas I should know about - it turns out, if I
> understand it correctly, the relevant code may being being rewritten a bit
> anyway).
> 
> And from you, I was wanting to find out if you're okay with an interface of
> this kind in crypto/ where the code is just used directly - or whether I'll
> be required to wrap it up in the autoloading, module-handling mechanisms.

I don't have any problems with it living under crypto.  However,
I'd like to see what the sunrpc code looks like before going one
way or another.

Cheers,
David Howells Dec. 1, 2020, 8:44 a.m. UTC | #13
Btw, would it be feasible to make it so that an extra parameter can be added
to the cipher buffer-supplying functions, e.g.:

	skcipher_request_set_crypt(req, input, ciphertext_sg, esize, iv);

such that we can pass in an offset into the output sg as well?

David
Herbert Xu Dec. 1, 2020, 8:46 a.m. UTC | #14
On Tue, Dec 01, 2020 at 08:44:33AM +0000, David Howells wrote:
> Btw, would it be feasible to make it so that an extra parameter can be added
> to the cipher buffer-supplying functions, e.g.:
> 
> 	skcipher_request_set_crypt(req, input, ciphertext_sg, esize, iv);
> 
> such that we can pass in an offset into the output sg as well?

Couldn't you just change the output sg to include the offset?

Cheers,
David Howells Dec. 1, 2020, 9:12 a.m. UTC | #15
Herbert Xu <herbert@gondor.apana.org.au> wrote:

> Couldn't you just change the output sg to include the offset?

That depends on whether the caller has passed it elsewhere for some other
parallel purpose, but I think I'm going to have to go down that road and
restore it afterwards.

David
Herbert Xu Dec. 1, 2020, 10:36 a.m. UTC | #16
On Tue, Dec 01, 2020 at 09:12:38AM +0000, David Howells wrote:
> 
> That depends on whether the caller has passed it elsewhere for some other
> parallel purpose, but I think I'm going to have to go down that road and
> restore it afterwards.

Sure but even if you added it to the API the underlying
implementataions would just have to do the same thing.

Since this is particular to your use-case it's better to leave
the complexity where it's needed rather than propagting it to
all the crypto drivers.

Cheers,
David Howells Dec. 4, 2020, 2:59 p.m. UTC | #17
Hi Chuck, Bruce,

Why is gss_krb5_crypto.c using an auxiliary cipher?  For reference, the
gss_krb5_aes_encrypt() code looks like the attached.

From what I can tell, in AES mode, the difference between the main cipher and
the auxiliary cipher is that the latter is "cbc(aes)" whereas the former is
"cts(cbc(aes))" - but they have the same key.

Reading up on CTS, I'm guessing the reason it's like this is that CTS is the
same as the non-CTS, except for the last two blocks, but the non-CTS one is
more efficient.

David
---
	nbytes = buf->len - offset - GSS_KRB5_TOK_HDR_LEN;
	nblocks = (nbytes + blocksize - 1) / blocksize;
	cbcbytes = 0;
	if (nblocks > 2)
		cbcbytes = (nblocks - 2) * blocksize;

	memset(desc.iv, 0, sizeof(desc.iv));

	if (cbcbytes) {
		SYNC_SKCIPHER_REQUEST_ON_STACK(req, aux_cipher);

		desc.pos = offset + GSS_KRB5_TOK_HDR_LEN;
		desc.fragno = 0;
		desc.fraglen = 0;
		desc.pages = pages;
		desc.outbuf = buf;
		desc.req = req;

		skcipher_request_set_sync_tfm(req, aux_cipher);
		skcipher_request_set_callback(req, 0, NULL, NULL);

		sg_init_table(desc.infrags, 4);
		sg_init_table(desc.outfrags, 4);

		err = xdr_process_buf(buf, offset + GSS_KRB5_TOK_HDR_LEN,
				      cbcbytes, encryptor, &desc);
		skcipher_request_zero(req);
		if (err)
			goto out_err;
	}

	/* Make sure IV carries forward from any CBC results. */
	err = gss_krb5_cts_crypt(cipher, buf,
				 offset + GSS_KRB5_TOK_HDR_LEN + cbcbytes,
				 desc.iv, pages, 1);
	if (err) {
		err = GSS_S_FAILURE;
		goto out_err;
	}
J. Bruce Fields Dec. 4, 2020, 3:46 p.m. UTC | #18
On Fri, Dec 04, 2020 at 02:59:35PM +0000, David Howells wrote:
> Hi Chuck, Bruce,
> 
> Why is gss_krb5_crypto.c using an auxiliary cipher?  For reference, the
> gss_krb5_aes_encrypt() code looks like the attached.
> 
> >From what I can tell, in AES mode, the difference between the main cipher and
> the auxiliary cipher is that the latter is "cbc(aes)" whereas the former is
> "cts(cbc(aes))" - but they have the same key.
> 
> Reading up on CTS, I'm guessing the reason it's like this is that CTS is the
> same as the non-CTS, except for the last two blocks, but the non-CTS one is
> more efficient.

CTS is cipher-text stealing, isn't it?  I think it was Kevin Coffman
that did that, and I don't remember the history.  I thought it was
required by some spec or peer implementation (maybe Windows?) but I
really don't remember.  It may predate git.  I'll dig around and see
what I can find.

--b.

> 
> David
> ---
> 	nbytes = buf->len - offset - GSS_KRB5_TOK_HDR_LEN;
> 	nblocks = (nbytes + blocksize - 1) / blocksize;
> 	cbcbytes = 0;
> 	if (nblocks > 2)
> 		cbcbytes = (nblocks - 2) * blocksize;
> 
> 	memset(desc.iv, 0, sizeof(desc.iv));
> 
> 	if (cbcbytes) {
> 		SYNC_SKCIPHER_REQUEST_ON_STACK(req, aux_cipher);
> 
> 		desc.pos = offset + GSS_KRB5_TOK_HDR_LEN;
> 		desc.fragno = 0;
> 		desc.fraglen = 0;
> 		desc.pages = pages;
> 		desc.outbuf = buf;
> 		desc.req = req;
> 
> 		skcipher_request_set_sync_tfm(req, aux_cipher);
> 		skcipher_request_set_callback(req, 0, NULL, NULL);
> 
> 		sg_init_table(desc.infrags, 4);
> 		sg_init_table(desc.outfrags, 4);
> 
> 		err = xdr_process_buf(buf, offset + GSS_KRB5_TOK_HDR_LEN,
> 				      cbcbytes, encryptor, &desc);
> 		skcipher_request_zero(req);
> 		if (err)
> 			goto out_err;
> 	}
> 
> 	/* Make sure IV carries forward from any CBC results. */
> 	err = gss_krb5_cts_crypt(cipher, buf,
> 				 offset + GSS_KRB5_TOK_HDR_LEN + cbcbytes,
> 				 desc.iv, pages, 1);
> 	if (err) {
> 		err = GSS_S_FAILURE;
> 		goto out_err;
> 	}
David Howells Dec. 4, 2020, 4:01 p.m. UTC | #19
Bruce Fields <bfields@fieldses.org> wrote:

> > Reading up on CTS, I'm guessing the reason it's like this is that CTS is the
> > same as the non-CTS, except for the last two blocks, but the non-CTS one is
> > more efficient.
> 
> CTS is cipher-text stealing, isn't it?  I think it was Kevin Coffman
> that did that, and I don't remember the history.  I thought it was
> required by some spec or peer implementation (maybe Windows?) but I
> really don't remember.  It may predate git.  I'll dig around and see
> what I can find.

rfc3961 and rfc3962 specify CTS-CBC with AES.

David
J. Bruce Fields Dec. 4, 2020, 4:03 p.m. UTC | #20
On Fri, Dec 04, 2020 at 04:01:53PM +0000, David Howells wrote:
> Bruce Fields <bfields@fieldses.org> wrote:
> 
> > > Reading up on CTS, I'm guessing the reason it's like this is that CTS is the
> > > same as the non-CTS, except for the last two blocks, but the non-CTS one is
> > > more efficient.
> > 
> > CTS is cipher-text stealing, isn't it?  I think it was Kevin Coffman
> > that did that, and I don't remember the history.  I thought it was
> > required by some spec or peer implementation (maybe Windows?) but I
> > really don't remember.  It may predate git.  I'll dig around and see
> > what I can find.
> 
> rfc3961 and rfc3962 specify CTS-CBC with AES.

OK, I guess I don't understand the question.  I haven't thought about
this code in at least a decade.  What's an auxilary cipher?  Is this a
question about why we're implementing something, or how we're
implementing it?

--b.
Chuck Lever Dec. 4, 2020, 4:05 p.m. UTC | #21
> On Dec 4, 2020, at 10:46 AM, Bruce Fields <bfields@fieldses.org> wrote:
> 
> On Fri, Dec 04, 2020 at 02:59:35PM +0000, David Howells wrote:
>> Hi Chuck, Bruce,
>> 
>> Why is gss_krb5_crypto.c using an auxiliary cipher?  For reference, the
>> gss_krb5_aes_encrypt() code looks like the attached.
>> 
>>> From what I can tell, in AES mode, the difference between the main cipher and
>> the auxiliary cipher is that the latter is "cbc(aes)" whereas the former is
>> "cts(cbc(aes))" - but they have the same key.
>> 
>> Reading up on CTS, I'm guessing the reason it's like this is that CTS is the
>> same as the non-CTS, except for the last two blocks, but the non-CTS one is
>> more efficient.
> 
> CTS is cipher-text stealing, isn't it?  I think it was Kevin Coffman
> that did that, and I don't remember the history.  I thought it was
> required by some spec or peer implementation (maybe Windows?) but I
> really don't remember.  It may predate git.  I'll dig around and see
> what I can find.

I can't add more here, this design comes from well before I started
working on this body of code (though, I worked near Kevin when he
implemented it).


--
Chuck Lever
J. Bruce Fields Dec. 4, 2020, 4:14 p.m. UTC | #22
On Fri, Dec 04, 2020 at 10:46:26AM -0500, Bruce Fields wrote:
> On Fri, Dec 04, 2020 at 02:59:35PM +0000, David Howells wrote:
> > Hi Chuck, Bruce,
> > 
> > Why is gss_krb5_crypto.c using an auxiliary cipher?  For reference, the
> > gss_krb5_aes_encrypt() code looks like the attached.
> > 
> > >From what I can tell, in AES mode, the difference between the main cipher and
> > the auxiliary cipher is that the latter is "cbc(aes)" whereas the former is
> > "cts(cbc(aes))" - but they have the same key.
> > 
> > Reading up on CTS, I'm guessing the reason it's like this is that CTS is the
> > same as the non-CTS, except for the last two blocks, but the non-CTS one is
> > more efficient.
> 
> CTS is cipher-text stealing, isn't it?  I think it was Kevin Coffman
> that did that, and I don't remember the history.  I thought it was
> required by some spec or peer implementation (maybe Windows?) but I
> really don't remember.  It may predate git.  I'll dig around and see
> what I can find.

Like I say, I've got no insight here, I'm just grepping through
mailboxes and stuff, but maybe some of this history's useful;

Addition of CTS mode:

	https://lore.kernel.org/linux-crypto/20080220202543.3209.47410.stgit@jazz.citi.umich.edu/

This rpc/krb5 code went in with 934a95aa1c9c "gss_krb5: add remaining
pieces to enable AES encryption support"; may be worth looking at that
and the series leading up to it, I see the changelogs have some RFC
references that might explain why it's using the crypto it is.

--b.
David Howells Dec. 4, 2020, 4:50 p.m. UTC | #23
Bruce Fields <bfields@fieldses.org> wrote:

> OK, I guess I don't understand the question.  I haven't thought about
> this code in at least a decade.  What's an auxilary cipher?  Is this a
> question about why we're implementing something, or how we're
> implementing it?

That's what the Linux sunrpc implementation calls them:

	struct crypto_sync_skcipher *acceptor_enc;
	struct crypto_sync_skcipher *initiator_enc;
	struct crypto_sync_skcipher *acceptor_enc_aux;
	struct crypto_sync_skcipher *initiator_enc_aux;

Auxiliary ciphers aren't mentioned in rfc396{1,2} so it appears to be
something peculiar to that implementation.

So acceptor_enc and acceptor_enc_aux, for instance, are both based on the same
key, and the implementation seems to pass the IV from one to the other.  The
only difference is that the 'aux' cipher lacks the CTS wrapping - which only
makes a difference for the final two blocks[*] of the encryption (or
decryption) - and only if the data doesn't fully fill out the last block
(ie. it needs padding in some way so that the encryption algorithm can handle
it).

[*] Encryption cipher blocks, that is.

So I think it's purpose is twofold:

 (1) It's a way to be a bit more efficient, cutting out the CTS layer's
     indirection and additional buffering.

 (2) crypto_skcipher_encrypt() assumes that it's doing the entire crypto
     operation in one go and will always impose the final CTS bit, so you
     can't call it repeatedly to progress through a buffer (as
     xdr_process_buf() would like to do) as that would corrupt the data being
     encrypted - unless you made sure that the data was always block-size
     aligned (in which case, there's no point using CTS).

I wonder how much going through three layers of crypto modules costs.  Looking
at how AES can be implemented using, say, Intel AES intructions, it looks like
AES+CBC should be easy to do in a single module.  I wonder if we could have
optimised kerberos crypto that do the AES and the SHA together in a single
loop.

David
Ard Biesheuvel Dec. 4, 2020, 5:06 p.m. UTC | #24
On Fri, 4 Dec 2020 at 17:52, David Howells <dhowells@redhat.com> wrote:
>
> Bruce Fields <bfields@fieldses.org> wrote:
>
> > OK, I guess I don't understand the question.  I haven't thought about
> > this code in at least a decade.  What's an auxilary cipher?  Is this a
> > question about why we're implementing something, or how we're
> > implementing it?
>
> That's what the Linux sunrpc implementation calls them:
>
>         struct crypto_sync_skcipher *acceptor_enc;
>         struct crypto_sync_skcipher *initiator_enc;
>         struct crypto_sync_skcipher *acceptor_enc_aux;
>         struct crypto_sync_skcipher *initiator_enc_aux;
>
> Auxiliary ciphers aren't mentioned in rfc396{1,2} so it appears to be
> something peculiar to that implementation.
>
> So acceptor_enc and acceptor_enc_aux, for instance, are both based on the same
> key, and the implementation seems to pass the IV from one to the other.  The
> only difference is that the 'aux' cipher lacks the CTS wrapping - which only
> makes a difference for the final two blocks[*] of the encryption (or
> decryption) - and only if the data doesn't fully fill out the last block
> (ie. it needs padding in some way so that the encryption algorithm can handle
> it).
>
> [*] Encryption cipher blocks, that is.
>
> So I think it's purpose is twofold:
>
>  (1) It's a way to be a bit more efficient, cutting out the CTS layer's
>      indirection and additional buffering.
>
>  (2) crypto_skcipher_encrypt() assumes that it's doing the entire crypto
>      operation in one go and will always impose the final CTS bit, so you
>      can't call it repeatedly to progress through a buffer (as
>      xdr_process_buf() would like to do) as that would corrupt the data being
>      encrypted - unless you made sure that the data was always block-size
>      aligned (in which case, there's no point using CTS).
>
> I wonder how much going through three layers of crypto modules costs.  Looking
> at how AES can be implemented using, say, Intel AES intructions, it looks like
> AES+CBC should be easy to do in a single module.  I wonder if we could have
> optimised kerberos crypto that do the AES and the SHA together in a single
> loop.
>

The tricky thing with CTS is that you have to ensure that the final
full and partial blocks are presented to the crypto driver as one
chunk, or it won't be able to perform the ciphertext stealing. This
might be the reason for the current approach. If the sunrpc code has
multiple disjoint chunks of data to encrypto, it is always better to
wrap it in a single scatterlist and call into the skcipher only once.

However, I would recommend against it: at least for ARM and arm64, I
have already contributed SIMD based implementations that use SIMD
permutation instructions and overlapping loads and stores to perform
the ciphertext stealing, which means that there is only a single layer
which implements CTS+CBC+AES, and this layer can consume the entire
scatterlist in one go. We could easily do something similar in the
AES-NI driver as well.
David Howells Dec. 4, 2020, 5:19 p.m. UTC | #25
Ard Biesheuvel <ardb@kernel.org> wrote:

> The tricky thing with CTS is that you have to ensure that the final
> full and partial blocks are presented to the crypto driver as one
> chunk, or it won't be able to perform the ciphertext stealing. This
> might be the reason for the current approach. If the sunrpc code has
> multiple disjoint chunks of data to encrypto, it is always better to
> wrap it in a single scatterlist and call into the skcipher only once.

Yeah - the problem with that is that for sunrpc, we might be dealing with 1MB
plus bits of non-contiguous pages, requiring >8K of scatterlist elements
(admittedly, we can chain them, but we may have to do one or more large
allocations).

> However, I would recommend against it:

Sorry, recommend against what?

> at least for ARM and arm64, I
> have already contributed SIMD based implementations that use SIMD
> permutation instructions and overlapping loads and stores to perform
> the ciphertext stealing, which means that there is only a single layer
> which implements CTS+CBC+AES, and this layer can consume the entire
> scatterlist in one go. We could easily do something similar in the
> AES-NI driver as well.

Can you point me at that in the sources?

Can you also do SHA at the same time in the same loop?

Note that the rfc3962 AES does the checksum over the plaintext, but rfc8009
does it over the ciphertext.

David
Ard Biesheuvel Dec. 4, 2020, 5:35 p.m. UTC | #26
On Fri, 4 Dec 2020 at 18:19, David Howells <dhowells@redhat.com> wrote:
>
> Ard Biesheuvel <ardb@kernel.org> wrote:
>
> > The tricky thing with CTS is that you have to ensure that the final
> > full and partial blocks are presented to the crypto driver as one
> > chunk, or it won't be able to perform the ciphertext stealing. This
> > might be the reason for the current approach. If the sunrpc code has
> > multiple disjoint chunks of data to encrypto, it is always better to
> > wrap it in a single scatterlist and call into the skcipher only once.
>
> Yeah - the problem with that is that for sunrpc, we might be dealing with 1MB
> plus bits of non-contiguous pages, requiring >8K of scatterlist elements
> (admittedly, we can chain them, but we may have to do one or more large
> allocations).
>
> > However, I would recommend against it:
>
> Sorry, recommend against what?
>

Recommend against the current approach of manipulating the input like
this and feeding it into the skcipher piecemeal.

Herbert recently made some changes for MSG_MORE support in the AF_ALG
code, which permits a skcipher encryption to be split into several
invocations of the skcipher layer without the need for this complexity
on the side of the caller. Maybe there is a way to reuse that here.
Herbert?

> > at least for ARM and arm64, I
> > have already contributed SIMD based implementations that use SIMD
> > permutation instructions and overlapping loads and stores to perform
> > the ciphertext stealing, which means that there is only a single layer
> > which implements CTS+CBC+AES, and this layer can consume the entire
> > scatterlist in one go. We could easily do something similar in the
> > AES-NI driver as well.
>
> Can you point me at that in the sources?
>

arm64 has

arch/arm64/crypto/aes-glue.c
arch/arm64/crypto/aes-modes.S

where the former implements the skcipher wrapper for an implementation
of "cts(cbc(aes))"

static int cts_cbc_encrypt(struct skcipher_request *req)

walks over the src/dst scatterlist and feeds the data into the asm
helpers, one for the bulk of the input, and one for the final full and
partial blocks (or two final full blocks)

The SIMD asm helpers are

aes_cbc_encrypt
aes_cbc_decrypt
aes_cbc_cts_encrypt
aes_cbc_cts_decrypt

> Can you also do SHA at the same time in the same loop?
>

SHA-1 or HMAC-SHA1? The latter could probably be modeled as an AEAD.
The former doesn't really fit the current API so we'd have to invent
something for it.

> Note that the rfc3962 AES does the checksum over the plaintext, but rfc8009
> does it over the ciphertext.
>
> David
>
Theodore Ts'o Dec. 4, 2020, 6:13 p.m. UTC | #27
On Fri, Dec 04, 2020 at 02:59:35PM +0000, David Howells wrote:
> Hi Chuck, Bruce,
> 
> Why is gss_krb5_crypto.c using an auxiliary cipher?  For reference, the
> gss_krb5_aes_encrypt() code looks like the attached.
> 
> From what I can tell, in AES mode, the difference between the main cipher and
> the auxiliary cipher is that the latter is "cbc(aes)" whereas the former is
> "cts(cbc(aes))" - but they have the same key.
> 
> Reading up on CTS, I'm guessing the reason it's like this is that CTS is the
> same as the non-CTS, except for the last two blocks, but the non-CTS one is
> more efficient.

The reason to use CTS is if you don't want to expand the size of the
cipher text to the cipher block size.  e.g., if you have a 53 byte
plaintext, and you can't afford to let the ciphertext be 56 bytes, the
cryptographic engineer will reach for CTS instead of CBC.

So that probably explains the explanation to use CTS (and it's
required by the spec in any case).  As far as why CBC is being used
instead of CTS, the only reason I can think of is the one you posted.
Perhaps there was some hardware or software configureation where
cbc(aes) was hardware accelerated, and cts(cbc(aes)) would not be?

In any case, using cbc(aes) for all but the last two blocks, and using
cts(cbc(aes)) for the last two blocks, is identical to using
cts(cbc(aes)) for the whole encryption.  So the only reason to do this
in the more complex way would be because for performance reasons.

       	    	    	      	 	 - Ted
Herbert Xu Dec. 4, 2020, 9:08 p.m. UTC | #28
On Fri, Dec 04, 2020 at 06:35:48PM +0100, Ard Biesheuvel wrote:
>
> Herbert recently made some changes for MSG_MORE support in the AF_ALG
> code, which permits a skcipher encryption to be split into several
> invocations of the skcipher layer without the need for this complexity
> on the side of the caller. Maybe there is a way to reuse that here.
> Herbert?

Yes this was one of the reasons I was persuing the continuation
work.  It should allow us to kill the special case for CTS in the
krb5 code.

Hopefully I can get some time to restart work on this soon.

Cheers,
David Howells Dec. 7, 2020, 8:24 a.m. UTC | #29
Herbert Xu <herbert@gondor.apana.org.au> wrote:

> > Herbert recently made some changes for MSG_MORE support in the AF_ALG
> > code, which permits a skcipher encryption to be split into several
> > invocations of the skcipher layer without the need for this complexity
> > on the side of the caller. Maybe there is a way to reuse that here.
> > Herbert?
> 
> Yes this was one of the reasons I was persuing the continuation
> work.  It should allow us to kill the special case for CTS in the
> krb5 code.
> 
> Hopefully I can get some time to restart work on this soon.

In the krb5 case, we know in advance how much data we're going to be dealing
with, if that helps.

David
David Howells Dec. 7, 2020, 12:01 p.m. UTC | #30
Ard Biesheuvel <ardb@kernel.org> wrote:

> > Yeah - the problem with that is that for sunrpc, we might be dealing with 1MB
> > plus bits of non-contiguous pages, requiring >8K of scatterlist elements
> > (admittedly, we can chain them, but we may have to do one or more large
> > allocations).
> >
> > > However, I would recommend against it:
> >
> > Sorry, recommend against what?
> >
> 
> Recommend against the current approach of manipulating the input like
> this and feeding it into the skcipher piecemeal.

Right.  I understand the problem, but as I mentioned above, the scatterlist
itself becomes a performance issue as it may exceed two pages in size.  Double
that as there may need to be separate input and output scatterlists.

> Herbert recently made some changes for MSG_MORE support in the AF_ALG
> code, which permits a skcipher encryption to be split into several
> invocations of the skcipher layer without the need for this complexity
> on the side of the caller. Maybe there is a way to reuse that here.
> Herbert?

I wonder if it would help if the input buffer and output buffer didn't have to
correspond exactly in usage - ie. the output buffer could be used at a slower
rate than the input to allow for buffering inside the crypto algorithm.

> > Can you also do SHA at the same time in the same loop?
> 
> SHA-1 or HMAC-SHA1? The latter could probably be modeled as an AEAD.
> The former doesn't really fit the current API so we'd have to invent
> something for it.

The hashes corresponding to the kerberos enctypes I'm supporting are:

HMAC-SHA1 for aes128-cts-hmac-sha1-96 and aes256-cts-hmac-sha1-96.

HMAC-SHA256 for aes128-cts-hmac-sha256-128

HMAC-SHA384 for aes256-cts-hmac-sha384-192

CMAC-CAMELLIA for camellia128-cts-cmac and camellia256-cts-cmac

I'm not sure you can support all of those with the instructions available.

David
Ard Biesheuvel Dec. 7, 2020, 1:08 p.m. UTC | #31
On Mon, 7 Dec 2020 at 13:02, David Howells <dhowells@redhat.com> wrote:
>
> Ard Biesheuvel <ardb@kernel.org> wrote:
>
> > > Yeah - the problem with that is that for sunrpc, we might be dealing with 1MB
> > > plus bits of non-contiguous pages, requiring >8K of scatterlist elements
> > > (admittedly, we can chain them, but we may have to do one or more large
> > > allocations).
> > >
> > > > However, I would recommend against it:
> > >
> > > Sorry, recommend against what?
> > >
> >
> > Recommend against the current approach of manipulating the input like
> > this and feeding it into the skcipher piecemeal.
>
> Right.  I understand the problem, but as I mentioned above, the scatterlist
> itself becomes a performance issue as it may exceed two pages in size.  Double
> that as there may need to be separate input and output scatterlists.
>

I wasn't aware that Herbert's work hadn't been merged yet. So that
means it is entirely reasonable to split the input like this and feed
the first part into a cbc(aes) skcipher and the last part into a
cts(cbc(aes)) skcipher, provided that you ensure that the last part
covers the final two blocks (one full block and one block that is
either full or partial)

With Herbert's changes, you will be able to use the same skcipher, and
pass a flag to all but the final part that more data is coming. But
for lack of that, the current approach is optimal for cases where
having to cover the entire input with a single scatterlist is
undesirable.

> > Herbert recently made some changes for MSG_MORE support in the AF_ALG
> > code, which permits a skcipher encryption to be split into several
> > invocations of the skcipher layer without the need for this complexity
> > on the side of the caller. Maybe there is a way to reuse that here.
> > Herbert?
>
> I wonder if it would help if the input buffer and output buffer didn't have to
> correspond exactly in usage - ie. the output buffer could be used at a slower
> rate than the input to allow for buffering inside the crypto algorithm.
>

I don't follow - how could one be used at a slower rate?

> > > Can you also do SHA at the same time in the same loop?
> >
> > SHA-1 or HMAC-SHA1? The latter could probably be modeled as an AEAD.
> > The former doesn't really fit the current API so we'd have to invent
> > something for it.
>
> The hashes corresponding to the kerberos enctypes I'm supporting are:
>
> HMAC-SHA1 for aes128-cts-hmac-sha1-96 and aes256-cts-hmac-sha1-96.
>
> HMAC-SHA256 for aes128-cts-hmac-sha256-128
>
> HMAC-SHA384 for aes256-cts-hmac-sha384-192
>
> CMAC-CAMELLIA for camellia128-cts-cmac and camellia256-cts-cmac
>
> I'm not sure you can support all of those with the instructions available.
>

It depends on whether the caller can make use of the authenc()
pattern, which is a type of AEAD we support. There are numerous
implementations of authenc(hmac(shaXXX),cbc(aes)), including h/w
accelerated ones, but none that implement ciphertext stealing. So that
means that, even if you manage to use the AEAD layer to perform both
at the same time, the generic authenc() template will perform the
cts(cbc(aes)) and hmac(shaXXX) by calling into skciphers and ahashes,
respectively, which won't give you any benefit until accelerated
implementations turn up that perform the whole operation in one pass
over the input. And even then, I don't think the performance benefit
will be worth it.
David Howells Dec. 7, 2020, 2:15 p.m. UTC | #32
Ard Biesheuvel <ardb@kernel.org> wrote:

> > I wonder if it would help if the input buffer and output buffer didn't
> > have to correspond exactly in usage - ie. the output buffer could be used
> > at a slower rate than the input to allow for buffering inside the crypto
> > algorithm.
> >
> 
> I don't follow - how could one be used at a slower rate?

I mean that the crypto algorithm might need to buffer the last part of the
input until it has a block's worth before it can write to the output.

> > The hashes corresponding to the kerberos enctypes I'm supporting are:
> >
> > HMAC-SHA1 for aes128-cts-hmac-sha1-96 and aes256-cts-hmac-sha1-96.
> >
> > HMAC-SHA256 for aes128-cts-hmac-sha256-128
> >
> > HMAC-SHA384 for aes256-cts-hmac-sha384-192
> >
> > CMAC-CAMELLIA for camellia128-cts-cmac and camellia256-cts-cmac
> >
> > I'm not sure you can support all of those with the instructions available.
>
> It depends on whether the caller can make use of the authenc()
> pattern, which is a type of AEAD we support.

Interesting.  I didn't realise AEAD was an API.

> There are numerous implementations of authenc(hmac(shaXXX),cbc(aes)),
> including h/w accelerated ones, but none that implement ciphertext
> stealing. So that means that, even if you manage to use the AEAD layer to
> perform both at the same time, the generic authenc() template will perform
> the cts(cbc(aes)) and hmac(shaXXX) by calling into skciphers and ahashes,
> respectively, which won't give you any benefit until accelerated
> implementations turn up that perform the whole operation in one pass over
> the input. And even then, I don't think the performance benefit will be
> worth it.

Also, the rfc8009 variants that use AES with SHA256/384 hash the ciphertext,
not the plaintext.

For the moment, it's probably not worth worrying about, then.  If I can manage
to abstract the sunrpc bits out into a krb5 library, we can improve the
library later.

David
Ard Biesheuvel Dec. 8, 2020, 8:27 a.m. UTC | #33
On Mon, 7 Dec 2020 at 15:15, David Howells <dhowells@redhat.com> wrote:
>
> Ard Biesheuvel <ardb@kernel.org> wrote:
>
> > > I wonder if it would help if the input buffer and output buffer didn't
> > > have to correspond exactly in usage - ie. the output buffer could be used
> > > at a slower rate than the input to allow for buffering inside the crypto
> > > algorithm.
> > >
> >
> > I don't follow - how could one be used at a slower rate?
>
> I mean that the crypto algorithm might need to buffer the last part of the
> input until it has a block's worth before it can write to the output.
>

This is what is typically handled transparently by the driver. When
you populate a scatterlist, it doesn't matter how misaligned the
individual elements are, the scatterlist walker will always present
the data in chunks that the crypto algorithm can manage. This is why
using a single scatterlist for the entire input is preferable in
general.

> > > The hashes corresponding to the kerberos enctypes I'm supporting are:
> > >
> > > HMAC-SHA1 for aes128-cts-hmac-sha1-96 and aes256-cts-hmac-sha1-96.
> > >
> > > HMAC-SHA256 for aes128-cts-hmac-sha256-128
> > >
> > > HMAC-SHA384 for aes256-cts-hmac-sha384-192
> > >
> > > CMAC-CAMELLIA for camellia128-cts-cmac and camellia256-cts-cmac
> > >
> > > I'm not sure you can support all of those with the instructions available.
> >
> > It depends on whether the caller can make use of the authenc()
> > pattern, which is a type of AEAD we support.
>
> Interesting.  I didn't realise AEAD was an API.
>
> > There are numerous implementations of authenc(hmac(shaXXX),cbc(aes)),
> > including h/w accelerated ones, but none that implement ciphertext
> > stealing. So that means that, even if you manage to use the AEAD layer to
> > perform both at the same time, the generic authenc() template will perform
> > the cts(cbc(aes)) and hmac(shaXXX) by calling into skciphers and ahashes,
> > respectively, which won't give you any benefit until accelerated
> > implementations turn up that perform the whole operation in one pass over
> > the input. And even then, I don't think the performance benefit will be
> > worth it.
>
> Also, the rfc8009 variants that use AES with SHA256/384 hash the ciphertext,
> not the plaintext.
>
> For the moment, it's probably not worth worrying about, then.  If I can manage
> to abstract the sunrpc bits out into a krb5 library, we can improve the
> library later.
>
David Howells Dec. 8, 2020, 9:18 a.m. UTC | #34
Ard Biesheuvel <ardb@kernel.org> wrote:

Ard Biesheuvel <ardb@kernel.org> wrote:

> > > > I wonder if it would help if the input buffer and output buffer didn't
> > > > have to correspond exactly in usage - ie. the output buffer could be
> > > > used at a slower rate than the input to allow for buffering inside the
> > > > crypto algorithm.
> > >
> > > I don't follow - how could one be used at a slower rate?
> >
> > I mean that the crypto algorithm might need to buffer the last part of the
> > input until it has a block's worth before it can write to the output.
> 
> This is what is typically handled transparently by the driver. When
> you populate a scatterlist, it doesn't matter how misaligned the
> individual elements are, the scatterlist walker will always present
> the data in chunks that the crypto algorithm can manage. This is why
> using a single scatterlist for the entire input is preferable in
> general.

Yep - but the assumption currently on the part of the callers is that they
provide the input buffer and corresponding output buffer - and that the
algorithm will transfer data from one to the other, such that the same amount
of input and output bufferage will be used.

However, if we start pushing data in progressively, this would no longer hold
true unless we also require the caller to only present in block-size chunks.

For example, if I gave the encryption function 120 bytes of data and a 120
byte output buffer, but the algorithm has a 16-byte blocksize, it will,
presumably, consume 120 bytes of input, but it can only write 112 bytes of
output at this time.  So the current interface would need to evolve to
indicate separately how much input has been consumed and how much output has
been produced - in which case it can't be handled transparently.

For krb5, it's actually worse than that, since we want to be able to
insert/remove a header and a trailer (and might need to go back and update the
header after) - but I think in the krb5 case, we need to treat the header and
trailer specially and update them after the fact in the wrapping case
(unwrapping is not a problem, since we can just cache the header).

David
David Howells Dec. 8, 2020, 1:25 p.m. UTC | #35
I wonder - would it make sense to reserve two arrays of scatterlist structs
and a mutex per CPU sufficient to map up to 1MiB of pages with each array
while the krb5 service is in use?

That way sunrpc could, say, grab the mutex, map the input and output buffers,
do the entire crypto op in one go and then release the mutex - at least for
big ops, small ops needn't use this service.

For rxrpc/afs's use case this would probably be overkill - it's doing crypto
on each packet, not on whole operations - but I could still make use of it
there.

However, that then limits the maximum size of an op to 1MiB, plus dangly bits
on either side (which can be managed with chained scatterlist structs) and
also limits the number of large simultaneous krb5 crypto ops we can do.

David
David Howells Dec. 8, 2020, 2:02 p.m. UTC | #36
David Howells <dhowells@redhat.com> wrote:

> I wonder - would it make sense to reserve two arrays of scatterlist structs
> and a mutex per CPU sufficient to map up to 1MiB of pages with each array
> while the krb5 service is in use?

Actually, simply reserving a set per CPU is probably unnecessary.  We could,
say, set a minimum and a maximum on the reservations (say 2 -> 2*nr_cpus) and
then allocate new ones when we run out.  Then let the memory shrinker clean
them up off an lru list.

David
Ard Biesheuvel Dec. 8, 2020, 2:04 p.m. UTC | #37
On Tue, 8 Dec 2020 at 14:25, David Howells <dhowells@redhat.com> wrote:
>
> I wonder - would it make sense to reserve two arrays of scatterlist structs
> and a mutex per CPU sufficient to map up to 1MiB of pages with each array
> while the krb5 service is in use?
>
> That way sunrpc could, say, grab the mutex, map the input and output buffers,
> do the entire crypto op in one go and then release the mutex - at least for
> big ops, small ops needn't use this service.
>
> For rxrpc/afs's use case this would probably be overkill - it's doing crypto
> on each packet, not on whole operations - but I could still make use of it
> there.
>
> However, that then limits the maximum size of an op to 1MiB, plus dangly bits
> on either side (which can be managed with chained scatterlist structs) and
> also limits the number of large simultaneous krb5 crypto ops we can do.
>

Apparently, it is permitted for gss_krb5_cts_crypt() to do a
kmalloc(GFP_NOFS) in the context from where gss_krb5_aes_encrypt() is
being invoked, and so I don't see why it wouldn't be possible to
simply kmalloc() a scatterlist[] of the appropriate size, populate it
with all the pages, bufs and whatever else gets passed into the
skcipher, and pass it into the skcipher in one go.
David Howells Dec. 8, 2020, 2:13 p.m. UTC | #38
Ard Biesheuvel <ardb@kernel.org> wrote:

> Apparently, it is permitted for gss_krb5_cts_crypt() to do a
> kmalloc(GFP_NOFS) in the context from where gss_krb5_aes_encrypt() is
> being invoked, and so I don't see why it wouldn't be possible to
> simply kmalloc() a scatterlist[] of the appropriate size, populate it
> with all the pages, bufs and whatever else gets passed into the
> skcipher, and pass it into the skcipher in one go.

I never said it wasn't possible.  But doing a pair of order-1 allocations from
there might have a significant detrimental effect on performance - in which
case Trond and co. will say "no".

Remember: to crypt 1MiB of data on a 64-bit machine requires 2 x minimum 8KiB
scatterlist arrays.  That's assuming the pages in the middle are contiguous,
which might not be the case for a direct I/O read/write.  So for the DIO case,
it could be involve an order-2 allocation (or chaining of single pages).

David