diff mbox

crypto: ccm - avoid scatterlist for MAC encryption

Message ID 1476551776-8099-1-git-send-email-ard.biesheuvel@linaro.org (mailing list archive)
State Not Applicable
Delegated to: Johannes Berg
Headers show

Commit Message

Ard Biesheuvel Oct. 15, 2016, 5:16 p.m. UTC
The CCM code goes out of its way to perform the CTR encryption of the MAC
using the subordinate CTR driver. To this end, it tweaks the input and
output scatterlists so the aead_req 'odata' and/or 'auth_tag' fields [which
may live on the stack] are prepended to the CTR payload. This involves
calling sg_set_buf() on addresses which are not direct mapped, which is
not supported.

Since the calculation of the MAC keystream involves a single call into
the cipher, to which we have a handle already given that the CBC-MAC
calculation uses it as well, just calculate the MAC keystream directly,
and record it in the aead_req private context so we can apply it to the
MAC in cypto_ccm_auth_mac(). This greatly simplifies the scatterlist
manipulation, and no longer requires scatterlists to refer to buffers
that may live on the stack.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---

This is an alternative for the patch 'mac80211: aes_ccm: move struct
aead_req off the stack' that I sent out yesterday. IMO, this is a more
correct approach, since it addresses the problem directly in crypto/ccm.c,
which is the only CCM-AES driver that suffers from this issue.

 crypto/ccm.c | 55 +++++++++++---------
 1 file changed, 29 insertions(+), 26 deletions(-)

Comments

Johannes Berg Oct. 17, 2016, 7:28 a.m. UTC | #1
On Sat, 2016-10-15 at 18:16 +0100, Ard Biesheuvel wrote:
> The CCM code goes out of its way to perform the CTR encryption of the
> MAC using the subordinate CTR driver. To this end, it tweaks the
> input and output scatterlists so the aead_req 'odata' and/or
> 'auth_tag' fields [which may live on the stack] are prepended to the
> CTR payload. This involves calling sg_set_buf() on addresses which
> are not direct mapped, which is not supported.

> Since the calculation of the MAC keystream involves a single call
> into the cipher, to which we have a handle already given that the
> CBC-MAC calculation uses it as well, just calculate the MAC keystream
> directly, and record it in the aead_req private context so we can
> apply it to the MAC in cypto_ccm_auth_mac(). This greatly simplifies
> the scatterlist manipulation, and no longer requires scatterlists to
> refer to buffers that may live on the stack.

No objection from me, Herbert?

I'm getting a bit nervous though - I'd rather have any fix first so
people get things working again - so maybe I'll apply your other patch
and mine first, and then we can replace yours by this later.

johannes
Ard Biesheuvel Oct. 17, 2016, 7:37 a.m. UTC | #2
On 17 October 2016 at 08:28, Johannes Berg <johannes@sipsolutions.net> wrote:
> On Sat, 2016-10-15 at 18:16 +0100, Ard Biesheuvel wrote:
>> The CCM code goes out of its way to perform the CTR encryption of the
>> MAC using the subordinate CTR driver. To this end, it tweaks the
>> input and output scatterlists so the aead_req 'odata' and/or
>> 'auth_tag' fields [which may live on the stack] are prepended to the
>> CTR payload. This involves calling sg_set_buf() on addresses which
>> are not direct mapped, which is not supported.
>
>> Since the calculation of the MAC keystream involves a single call
>> into the cipher, to which we have a handle already given that the
>> CBC-MAC calculation uses it as well, just calculate the MAC keystream
>> directly, and record it in the aead_req private context so we can
>> apply it to the MAC in cypto_ccm_auth_mac(). This greatly simplifies
>> the scatterlist manipulation, and no longer requires scatterlists to
>> refer to buffers that may live on the stack.
>
> No objection from me, Herbert?
>
> I'm getting a bit nervous though - I'd rather have any fix first so
> people get things working again - so maybe I'll apply your other patch
> and mine first, and then we can replace yours by this later.
>

Could we get a statement first whether it is supported to allocate
aead_req (and other crypto req structures) on the stack? If not, then
we have our work cut out for us. But if it is, I'd rather we didn't
apply the kzalloc/kfree patch, since it is just a workaround for the
broken generic CCM driver, for which a fix is already available.

Also, regarding your __percpu patch: those are located in the vmalloc
area as well, at least on arm64, and likely other architectures too.
Johannes Berg Oct. 17, 2016, 7:47 a.m. UTC | #3
On Mon, 2016-10-17 at 08:37 +0100, Ard Biesheuvel wrote:

> Could we get a statement first whether it is supported to allocate
> aead_req (and other crypto req structures) on the stack? 

Well, we haven't heard from Herbert :)

> If not, then
> we have our work cut out for us. But if it is, I'd rather we didn't
> apply the kzalloc/kfree patch, since it is just a workaround for the
> broken generic CCM driver, for which a fix is already available.

Yeah but I can't apply it. I just fixed up your kzalloc patch to also
handle GCM and GMAC, and to have error checking. Will send it in a
minute.

> Also, regarding your __percpu patch: those are located in the vmalloc
> area as well, at least on arm64, and likely other architectures too.

Crap. Any other bright ideas?

johannes
Ard Biesheuvel Oct. 17, 2016, 7:50 a.m. UTC | #4
On 17 October 2016 at 08:47, Johannes Berg <johannes@sipsolutions.net> wrote:
> On Mon, 2016-10-17 at 08:37 +0100, Ard Biesheuvel wrote:
>>
>> Could we get a statement first whether it is supported to allocate
>> aead_req (and other crypto req structures) on the stack?
>
> Well, we haven't heard from Herbert :)
>
>> If not, then
>> we have our work cut out for us. But if it is, I'd rather we didn't
>> apply the kzalloc/kfree patch, since it is just a workaround for the
>> broken generic CCM driver, for which a fix is already available.
>
> Yeah but I can't apply it. I just fixed up your kzalloc patch to also
> handle GCM and GMAC, and to have error checking. Will send it in a
> minute.
>

I just realised that patch should probably use
aead_request_alloc/aead_request_free [and drop the memset]. That also
fixes the latent bug where the alignment of the req ctx is not take
into account.

>> Also, regarding your __percpu patch: those are located in the vmalloc
>> area as well, at least on arm64, and likely other architectures too.
>
> Crap. Any other bright ideas?
>

kmem_cache_create() and kmem_cache_alloc()
Johannes Berg Oct. 17, 2016, 7:57 a.m. UTC | #5
On Mon, 2016-10-17 at 08:50 +0100, Ard Biesheuvel wrote:

> I just realised that patch should probably use
> aead_request_alloc/aead_request_free [and drop the memset]. That also
> fixes the latent bug where the alignment of the req ctx is not take
> into account.

Good point, I'll fix that up.

> > 
> > > 
> > > Also, regarding your __percpu patch: those are located in the
> > > vmalloc
> > > area as well, at least on arm64, and likely other architectures
> > > too.
> > 
> > Crap. Any other bright ideas?
> > 
> 
> kmem_cache_create() and kmem_cache_alloc()

for the aad/b0/j0/etc? Hmm. Yeah I guess we should do those dynamically
as well then.

johannes
Andy Lutomirski Oct. 17, 2016, 5:08 p.m. UTC | #6
On Mon, Oct 17, 2016 at 12:37 AM, Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> On 17 October 2016 at 08:28, Johannes Berg <johannes@sipsolutions.net> wrote:
>> On Sat, 2016-10-15 at 18:16 +0100, Ard Biesheuvel wrote:
>>> The CCM code goes out of its way to perform the CTR encryption of the
>>> MAC using the subordinate CTR driver. To this end, it tweaks the
>>> input and output scatterlists so the aead_req 'odata' and/or
>>> 'auth_tag' fields [which may live on the stack] are prepended to the
>>> CTR payload. This involves calling sg_set_buf() on addresses which
>>> are not direct mapped, which is not supported.
>>
>>> Since the calculation of the MAC keystream involves a single call
>>> into the cipher, to which we have a handle already given that the
>>> CBC-MAC calculation uses it as well, just calculate the MAC keystream
>>> directly, and record it in the aead_req private context so we can
>>> apply it to the MAC in cypto_ccm_auth_mac(). This greatly simplifies
>>> the scatterlist manipulation, and no longer requires scatterlists to
>>> refer to buffers that may live on the stack.
>>
>> No objection from me, Herbert?
>>
>> I'm getting a bit nervous though - I'd rather have any fix first so
>> people get things working again - so maybe I'll apply your other patch
>> and mine first, and then we can replace yours by this later.
>>
>
> Could we get a statement first whether it is supported to allocate
> aead_req (and other crypto req structures) on the stack? If not, then
> we have our work cut out for us. But if it is, I'd rather we didn't
> apply the kzalloc/kfree patch, since it is just a workaround for the
> broken generic CCM driver, for which a fix is already available.

I'm not a crypto person, but I don't see why not.  There's even a
helper called SKCIPHER_REQUEST_ON_STACK. :)  The only problem I know
of is pointing a scatterlist at the stack, which is bad for much the
same reason as doing real DMA from the stack.
Ard Biesheuvel Oct. 17, 2016, 5:21 p.m. UTC | #7
On 17 October 2016 at 18:08, Andy Lutomirski <luto@amacapital.net> wrote:
> On Mon, Oct 17, 2016 at 12:37 AM, Ard Biesheuvel
> <ard.biesheuvel@linaro.org> wrote:
>> On 17 October 2016 at 08:28, Johannes Berg <johannes@sipsolutions.net> wrote:
>>> On Sat, 2016-10-15 at 18:16 +0100, Ard Biesheuvel wrote:
>>>> The CCM code goes out of its way to perform the CTR encryption of the
>>>> MAC using the subordinate CTR driver. To this end, it tweaks the
>>>> input and output scatterlists so the aead_req 'odata' and/or
>>>> 'auth_tag' fields [which may live on the stack] are prepended to the
>>>> CTR payload. This involves calling sg_set_buf() on addresses which
>>>> are not direct mapped, which is not supported.
>>>
>>>> Since the calculation of the MAC keystream involves a single call
>>>> into the cipher, to which we have a handle already given that the
>>>> CBC-MAC calculation uses it as well, just calculate the MAC keystream
>>>> directly, and record it in the aead_req private context so we can
>>>> apply it to the MAC in cypto_ccm_auth_mac(). This greatly simplifies
>>>> the scatterlist manipulation, and no longer requires scatterlists to
>>>> refer to buffers that may live on the stack.
>>>
>>> No objection from me, Herbert?
>>>
>>> I'm getting a bit nervous though - I'd rather have any fix first so
>>> people get things working again - so maybe I'll apply your other patch
>>> and mine first, and then we can replace yours by this later.
>>>
>>
>> Could we get a statement first whether it is supported to allocate
>> aead_req (and other crypto req structures) on the stack? If not, then
>> we have our work cut out for us. But if it is, I'd rather we didn't
>> apply the kzalloc/kfree patch, since it is just a workaround for the
>> broken generic CCM driver, for which a fix is already available.
>
> I'm not a crypto person, but I don't see why not.  There's even a
> helper called SKCIPHER_REQUEST_ON_STACK. :)  The only problem I know
> of is pointing a scatterlist at the stack, which is bad for much the
> same reason as doing real DMA from the stack.

Excellent point!

So the CCM code was an easy fix, although the RFC4309 part is still broken:

It does

"""
scatterwalk_map_and_copy(iv + 16, req->src, 0, req->assoclen - 8, 0);
...
sg_set_buf(rctx->src, iv + 16, req->assoclen - 8);
sg = scatterwalk_ffwd(rctx->src + 1, req->src, req->assoclen);
"""

which essentially just hides the last 8 bytes of associated data from
the inner CCM transform. So we'd need to rewrite this part to create a
new scatterlist that omits those 8 bytes instead of just replacing the
first sg entry and point it to another buffer entirely (and copy the
data into it)

The GCM code is much more complicated, and does not easily allow the
offending sg_set_buf() calls to be turned into something that only
involves direct mapped memory references. I haven't looked at anything
else.

Annoyingly, all this complication with scatterlists etc is for doing
asynchronous crypto via DMA capable crypto accelerators, and the
networking code (ipsec as well as mac80211, afaik) only allow
synchronous in the first place, given that they execute in softirq
context.
Herbert Xu Oct. 19, 2016, 3:31 a.m. UTC | #8
On Mon, Oct 17, 2016 at 06:21:14PM +0100, Ard Biesheuvel wrote:
>
> Annoyingly, all this complication with scatterlists etc is for doing
> asynchronous crypto via DMA capable crypto accelerators, and the
> networking code (ipsec as well as mac80211, afaik) only allow
> synchronous in the first place, given that they execute in softirq
> context.

I'm still thinking about the issue (in particular, whether we
should continue to rely on the request context being SG-capable
or allow it to be on the stack for AEAD).

But IPsec definitely supports async crypto.  In fact it was the
very first user of async crypto.

mac80211 on the other hand is currently sync-only.

Cheers,
Johannes Berg Oct. 19, 2016, 7:43 a.m. UTC | #9
On Wed, 2016-10-19 at 11:31 +0800, Herbert Xu wrote:
> On Mon, Oct 17, 2016 at 06:21:14PM +0100, Ard Biesheuvel wrote:
> > 
> > 
> > Annoyingly, all this complication with scatterlists etc is for
> > doing
> > asynchronous crypto via DMA capable crypto accelerators, and the
> > networking code (ipsec as well as mac80211, afaik) only allow
> > synchronous in the first place, given that they execute in softirq
> > context.
> 
> I'm still thinking about the issue (in particular, whether we
> should continue to rely on the request context being SG-capable
> or allow it to be on the stack for AEAD).

:)

> But IPsec definitely supports async crypto.  In fact it was the
> very first user of async crypto.

Yeah.

> mac80211 on the other hand is currently sync-only.

We could probably make mac80211 do that too, but can we guarantee in-
order processing? Anyway, it's pretty low priority, maybe never
happening, since hardly anyone really uses "software" crypto, the wifi
devices mostly have it built in anyway.

(One problem is that the skb->cb is already completely full, so we
can't stash away the AAD there)

johannes
Ard Biesheuvel Oct. 19, 2016, 3:08 p.m. UTC | #10
On 19 October 2016 at 08:43, Johannes Berg <johannes@sipsolutions.net> wrote:
> On Wed, 2016-10-19 at 11:31 +0800, Herbert Xu wrote:
>> On Mon, Oct 17, 2016 at 06:21:14PM +0100, Ard Biesheuvel wrote:
>> >
>> >
>> > Annoyingly, all this complication with scatterlists etc is for
>> > doing
>> > asynchronous crypto via DMA capable crypto accelerators, and the
>> > networking code (ipsec as well as mac80211, afaik) only allow
>> > synchronous in the first place, given that they execute in softirq
>> > context.
>>
>> I'm still thinking about the issue (in particular, whether we
>> should continue to rely on the request context being SG-capable
>> or allow it to be on the stack for AEAD).
>
> :)
>
>> But IPsec definitely supports async crypto.  In fact it was the
>> very first user of async crypto.
>
> Yeah.
>

Ah yes, my bad.

>> mac80211 on the other hand is currently sync-only.
>
> We could probably make mac80211 do that too, but can we guarantee in-
> order processing? Anyway, it's pretty low priority, maybe never
> happening, since hardly anyone really uses "software" crypto, the wifi
> devices mostly have it built in anyway.
>

Indeed. The code is now correct in terms of API requirements, so let's
just wait for someone to complain about any performance regressions.
Ben Greear Oct. 19, 2016, 3:59 p.m. UTC | #11
On 10/19/2016 08:08 AM, Ard Biesheuvel wrote:
> On 19 October 2016 at 08:43, Johannes Berg <johannes@sipsolutions.net> wrote:
>> On Wed, 2016-10-19 at 11:31 +0800, Herbert Xu wrote:

>> We could probably make mac80211 do that too, but can we guarantee in-
>> order processing? Anyway, it's pretty low priority, maybe never
>> happening, since hardly anyone really uses "software" crypto, the wifi
>> devices mostly have it built in anyway.
>>
>
> Indeed. The code is now correct in terms of API requirements, so let's
> just wait for someone to complain about any performance regressions.

Do you actually expect performance regressions?  I'll be complaining if
so, but will test first :)

Thanks,
Ben
Johannes Berg Oct. 19, 2016, 4 p.m. UTC | #12
On Wed, 2016-10-19 at 08:59 -0700, Ben Greear wrote:

> Do you actually expect performance regressions?  I'll be complaining
> if so, but will test first :)

I think we can expect this to use a bit more CPU time, but unless
you're very tight on that you probably shouldn't expect any throughput
difference.

johannes
diff mbox

Patch

diff --git a/crypto/ccm.c b/crypto/ccm.c
index 006d8575ef5c..faa5efcf59e2 100644
--- a/crypto/ccm.c
+++ b/crypto/ccm.c
@@ -46,10 +46,13 @@  struct crypto_ccm_req_priv_ctx {
 	u8 odata[16];
 	u8 idata[16];
 	u8 auth_tag[16];
+	u8 cmac[16];
 	u32 ilen;
 	u32 flags;
-	struct scatterlist src[3];
-	struct scatterlist dst[3];
+	struct scatterlist *src;
+	struct scatterlist *dst;
+	struct scatterlist srcbuf[2];
+	struct scatterlist dstbuf[2];
 	struct skcipher_request skreq;
 };
 
@@ -280,6 +283,8 @@  static int crypto_ccm_auth(struct aead_request *req, struct scatterlist *plain,
 	if (cryptlen)
 		get_data_to_compute(cipher, pctx, plain, cryptlen);
 
+	crypto_xor(odata, pctx->cmac, 16);
+
 out:
 	return err;
 }
@@ -307,10 +312,12 @@  static inline int crypto_ccm_check_iv(const u8 *iv)
 	return 0;
 }
 
-static int crypto_ccm_init_crypt(struct aead_request *req, u8 *tag)
+static int crypto_ccm_init_crypt(struct aead_request *req)
 {
+	struct crypto_aead *aead = crypto_aead_reqtfm(req);
+	struct crypto_ccm_ctx *ctx = crypto_aead_ctx(aead);
 	struct crypto_ccm_req_priv_ctx *pctx = crypto_ccm_reqctx(req);
-	struct scatterlist *sg;
+	struct crypto_cipher *cipher = ctx->cipher;
 	u8 *iv = req->iv;
 	int err;
 
@@ -325,19 +332,16 @@  static int crypto_ccm_init_crypt(struct aead_request *req, u8 *tag)
 	 */
 	memset(iv + 15 - iv[0], 0, iv[0] + 1);
 
-	sg_init_table(pctx->src, 3);
-	sg_set_buf(pctx->src, tag, 16);
-	sg = scatterwalk_ffwd(pctx->src + 1, req->src, req->assoclen);
-	if (sg != pctx->src + 1)
-		sg_chain(pctx->src, 2, sg);
+	/* prepare the key stream for the auth tag  */
+	crypto_cipher_encrypt_one(cipher, pctx->cmac, iv);
 
-	if (req->src != req->dst) {
-		sg_init_table(pctx->dst, 3);
-		sg_set_buf(pctx->dst, tag, 16);
-		sg = scatterwalk_ffwd(pctx->dst + 1, req->dst, req->assoclen);
-		if (sg != pctx->dst + 1)
-			sg_chain(pctx->dst, 2, sg);
-	}
+	/* increment BE counter in IV[] for the actual payload */
+	iv[15] = 1;
+
+	pctx->src = scatterwalk_ffwd(pctx->srcbuf, req->src, req->assoclen);
+	if (req->src != req->dst)
+		pctx->dst = scatterwalk_ffwd(pctx->dstbuf, req->dst,
+					     req->assoclen);
 
 	return 0;
 }
@@ -354,11 +358,11 @@  static int crypto_ccm_encrypt(struct aead_request *req)
 	u8 *iv = req->iv;
 	int err;
 
-	err = crypto_ccm_init_crypt(req, odata);
+	err = crypto_ccm_init_crypt(req);
 	if (err)
 		return err;
 
-	err = crypto_ccm_auth(req, sg_next(pctx->src), cryptlen);
+	err = crypto_ccm_auth(req, pctx->src, cryptlen);
 	if (err)
 		return err;
 
@@ -369,13 +373,13 @@  static int crypto_ccm_encrypt(struct aead_request *req)
 	skcipher_request_set_tfm(skreq, ctx->ctr);
 	skcipher_request_set_callback(skreq, pctx->flags,
 				      crypto_ccm_encrypt_done, req);
-	skcipher_request_set_crypt(skreq, pctx->src, dst, cryptlen + 16, iv);
+	skcipher_request_set_crypt(skreq, pctx->src, dst, cryptlen, iv);
 	err = crypto_skcipher_encrypt(skreq);
 	if (err)
 		return err;
 
 	/* copy authtag to end of dst */
-	scatterwalk_map_and_copy(odata, sg_next(dst), cryptlen,
+	scatterwalk_map_and_copy(odata, dst, cryptlen,
 				 crypto_aead_authsize(aead), 1);
 	return err;
 }
@@ -392,7 +396,7 @@  static void crypto_ccm_decrypt_done(struct crypto_async_request *areq,
 
 	pctx->flags = 0;
 
-	dst = sg_next(req->src == req->dst ? pctx->src : pctx->dst);
+	dst = req->src == req->dst ? pctx->src : pctx->dst;
 
 	if (!err) {
 		err = crypto_ccm_auth(req, dst, cryptlen);
@@ -418,12 +422,11 @@  static int crypto_ccm_decrypt(struct aead_request *req)
 
 	cryptlen -= authsize;
 
-	err = crypto_ccm_init_crypt(req, authtag);
+	err = crypto_ccm_init_crypt(req);
 	if (err)
 		return err;
 
-	scatterwalk_map_and_copy(authtag, sg_next(pctx->src), cryptlen,
-				 authsize, 0);
+	scatterwalk_map_and_copy(authtag, pctx->src, cryptlen, authsize, 0);
 
 	dst = pctx->src;
 	if (req->src != req->dst)
@@ -432,12 +435,12 @@  static int crypto_ccm_decrypt(struct aead_request *req)
 	skcipher_request_set_tfm(skreq, ctx->ctr);
 	skcipher_request_set_callback(skreq, pctx->flags,
 				      crypto_ccm_decrypt_done, req);
-	skcipher_request_set_crypt(skreq, pctx->src, dst, cryptlen + 16, iv);
+	skcipher_request_set_crypt(skreq, pctx->src, dst, cryptlen, iv);
 	err = crypto_skcipher_decrypt(skreq);
 	if (err)
 		return err;
 
-	err = crypto_ccm_auth(req, sg_next(dst), cryptlen);
+	err = crypto_ccm_auth(req, dst, cryptlen);
 	if (err)
 		return err;