[v4] mac80211: move extra crypto data off the stack

Message ID	CAKv+Gu-aZhCBvnEQoZUZLDPXCrvgxO1pSd=6EHz+tMB+dFz5hg@mail.gmail.com (mailing list archive)
State	Not Applicable
Delegated to:	Johannes Berg
Headers	show Return-Path: <linux-wireless-owner@kernel.org> MIME-Version: 1.0 In-Reply-To: <1476693195-14071-1-git-send-email-johannes@sipsolutions.net> References: <1476693195-14071-1-git-send-email-johannes@sipsolutions.net> From: Ard Biesheuvel <ard.biesheuvel@linaro.org> Date: Mon, 17 Oct 2016 10:14:28 +0100 Message-ID: <CAKv+Gu-aZhCBvnEQoZUZLDPXCrvgxO1pSd=6EHz+tMB+dFz5hg@mail.gmail.com> Subject: Re: [PATCH v4] mac80211: move extra crypto data off the stack To: Johannes Berg <johannes@sipsolutions.net> Cc: "<linux-wireless@vger.kernel.org>" <linux-wireless@vger.kernel.org>, Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>, "<netdev@vger.kernel.org>" <netdev@vger.kernel.org>, Herbert Xu <herbert@gondor.apana.org.au>, Johannes Berg <johannes.berg@intel.com> Content-Type: text/plain; charset=UTF-8 Sender: linux-wireless-owner@vger.kernel.org Precedence: bulk

Ard Biesheuvel Oct. 17, 2016, 9:14 a.m. UTC

On 17 October 2016 at 09:33, Johannes Berg <johannes@sipsolutions.net> wrote:
> From: Johannes Berg <johannes.berg@intel.com>
>
> As the stack can (on x86-64) now be virtually mapped rather than
> using "normal" kernel memory, Sergey noticed mac80211 isn't using
> the SG APIs correctly by putting on-stack buffers into SG tables.
> This leads to kernel crashes.
>
> Fix this by allocating the extra fields dynamically on the fly as
> needed, using a kmem cache.
>
> I used per-CPU memory in a previous iteration of this patch, but
> Ard Biesheuvel pointed out that was also vmalloc'ed on some
> architectures.
>
> Reported-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
> Signed-off-by: Johannes Berg <johannes.berg@intel.com>

Apologies for going back and forth on this, but it appears there may
be another way to deal with this.

First of all, we only need this handling for the authenticated data,
and only for CCM and GCM, not CMAC (which does not use scatterlists at
all, it simply calls the AES cipher directly)

So that leaves a fixed 20 bytes for GCM and fixed 32 bytes for CCM,
which we could allocate along with the AEAD request, e..g.,

"""

@@ -49,6 +53,7 @@ int ieee80211_aes_ccm_decrypt(struct crypto_aead
*tfm, u8 *b_0, u8 *aad,
 {
        struct scatterlist sg[3];
        struct aead_request *aead_req;
+       u8 *__aad;
        int err;

        if (data_len == 0)
@@ -58,8 +63,11 @@ int ieee80211_aes_ccm_decrypt(struct crypto_aead
*tfm, u8 *b_0, u8 *aad,
        if (!aead_req)
                return -ENOMEM;

+       __aad = (u8 *)aead_req + crypto_aead_reqsize(tfm);
+       memcpy(__aad, aad, 2 * AES_BLOCK_SIZE);
+
        sg_init_table(sg, 3);
-       sg_set_buf(&sg[0], &aad[2], be16_to_cpup((__be16 *)aad));
+       sg_set_buf(&sg[0], &__aad[2], be16_to_cpup((__be16 *)__aad));
        sg_set_buf(&sg[1], data, data_len);
        sg_set_buf(&sg[2], mic, mic_len);

@@ -90,6 +98,8 @@ struct crypto_aead
*ieee80211_aes_key_setup_encrypt(const u8 key[],
        if (err)
                goto free_aead;

+       crypto_aead_set_reqsize(tfm,
+                               crypto_aead_reqsize(tfm) + 2 * AES_BLOCK_SIZE));
        return tfm;

 free_aead:
"""

Ard Biesheuvel Oct. 17, 2016, 9:17 a.m. UTC | #1

On 17 October 2016 at 10:14, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 17 October 2016 at 09:33, Johannes Berg <johannes@sipsolutions.net> wrote:
>> From: Johannes Berg <johannes.berg@intel.com>
>>
>> As the stack can (on x86-64) now be virtually mapped rather than
>> using "normal" kernel memory, Sergey noticed mac80211 isn't using
>> the SG APIs correctly by putting on-stack buffers into SG tables.
>> This leads to kernel crashes.
>>
>> Fix this by allocating the extra fields dynamically on the fly as
>> needed, using a kmem cache.
>>
>> I used per-CPU memory in a previous iteration of this patch, but
>> Ard Biesheuvel pointed out that was also vmalloc'ed on some
>> architectures.
>>
>> Reported-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
>> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
>
> Apologies for going back and forth on this, but it appears there may
> be another way to deal with this.
>
> First of all, we only need this handling for the authenticated data,
> and only for CCM and GCM, not CMAC (which does not use scatterlists at
> all, it simply calls the AES cipher directly)
>
> So that leaves a fixed 20 bytes for GCM and fixed 32 bytes for CCM,
> which we could allocate along with the AEAD request, e..g.,
>
> """
> diff --git a/net/mac80211/aes_ccm.c b/net/mac80211/aes_ccm.c
> index 8e898a6e8de8..c0c33e6ad94e 100644
> --- a/net/mac80211/aes_ccm.c
> +++ b/net/mac80211/aes_ccm.c
> @@ -24,13 +24,17 @@ int ieee80211_aes_ccm_encrypt(struct crypto_aead
> *tfm, u8 *b_0, u8 *aad,
>  {
>         struct scatterlist sg[3];
>         struct aead_request *aead_req;
> +       u8 *__aad;
>
>         aead_req = aead_request_alloc(tfm, GFP_ATOMIC);
>         if (!aead_req)
>                 return -ENOMEM;
>
> +       __aad = (u8 *)aead_req + crypto_aead_reqsize(tfm);
> +       memcpy(__aad, aad, 2 * AES_BLOCK_SIZE);
> +
>         sg_init_table(sg, 3);
> -       sg_set_buf(&sg[0], &aad[2], be16_to_cpup((__be16 *)aad));
> +       sg_set_buf(&sg[0], &__aad[2], be16_to_cpup((__be16 *)__aad));
>         sg_set_buf(&sg[1], data, data_len);
>         sg_set_buf(&sg[2], mic, mic_len);
>
> @@ -49,6 +53,7 @@ int ieee80211_aes_ccm_decrypt(struct crypto_aead
> *tfm, u8 *b_0, u8 *aad,
>  {
>         struct scatterlist sg[3];
>         struct aead_request *aead_req;
> +       u8 *__aad;
>         int err;
>
>         if (data_len == 0)
> @@ -58,8 +63,11 @@ int ieee80211_aes_ccm_decrypt(struct crypto_aead
> *tfm, u8 *b_0, u8 *aad,
>         if (!aead_req)
>                 return -ENOMEM;
>
> +       __aad = (u8 *)aead_req + crypto_aead_reqsize(tfm);
> +       memcpy(__aad, aad, 2 * AES_BLOCK_SIZE);
> +
>         sg_init_table(sg, 3);
> -       sg_set_buf(&sg[0], &aad[2], be16_to_cpup((__be16 *)aad));
> +       sg_set_buf(&sg[0], &__aad[2], be16_to_cpup((__be16 *)__aad));
>         sg_set_buf(&sg[1], data, data_len);
>         sg_set_buf(&sg[2], mic, mic_len);
>
> @@ -90,6 +98,8 @@ struct crypto_aead
> *ieee80211_aes_key_setup_encrypt(const u8 key[],
>         if (err)
>                 goto free_aead;
>
> +       crypto_aead_set_reqsize(tfm,
> +                               crypto_aead_reqsize(tfm) + 2 * AES_BLOCK_SIZE));
>         return tfm;
>

Darn, it seems crypto_aead_set_reqsize() is internal to the crypto API ... :-(

Johannes Berg Oct. 17, 2016, 9:23 a.m. UTC | #2

> Apologies for going back and forth on this, but it appears there may
> be another way to deal with this.
> 
> First of all, we only need this handling for the authenticated data,

Are you sure b_0/j_0 aren't needed? We pass those
to aead_request_set_crypt(), and I wasn't sure what that really did
internally, perhaps like the internal data.

Testing with that on the stack does seem to work, in fact.

Surely we need zero for GMAC though, since we also put that into the sg
list. Thus for GMAC we definitely need 20+16 bytes, and since I round
up to a cacheline (at least on SMP) it doesn't really matter that we
could get 36 instead of the 48 I have now.

> and only for CCM and GCM, not CMAC (which does not use scatterlists
> at all, it simply calls the AES cipher directly)

I didn't modify CMAC, I think, only GMAC, which also uses scatterlists.

> So that leaves a fixed 20 bytes for GCM and fixed 32 bytes for CCM,

and 36 for GMAC :)

johannes

Ard Biesheuvel Oct. 17, 2016, 9:30 a.m. UTC | #3

On 17 October 2016 at 10:23, Johannes Berg <johannes@sipsolutions.net> wrote:
>
>> Apologies for going back and forth on this, but it appears there may
>> be another way to deal with this.
>>
>> First of all, we only need this handling for the authenticated data,
>
> Are you sure b_0/j_0 aren't needed? We pass those
> to aead_request_set_crypt(), and I wasn't sure what that really did
> internally, perhaps like the internal data.
>

They are the IV[], which is a fixed length parameter of the algorithm.
In contrast, the AAD[] could be of arbitrary length (from the POV of
the crypto API) so it uses scatterlists.

> Testing with that on the stack does seem to work, in fact.
>
> Surely we need zero for GMAC though, since we also put that into the sg
> list. Thus for GMAC we definitely need 20+16 bytes, and since I round
> up to a cacheline (at least on SMP) it doesn't really matter that we
> could get 36 instead of the 48 I have now.
>
>> and only for CCM and GCM, not CMAC (which does not use scatterlists
>> at all, it simply calls the AES cipher directly)
>
> I didn't modify CMAC, I think, only GMAC, which also uses scatterlists.
>

Ah ok, I misread the patch.

>> So that leaves a fixed 20 bytes for GCM and fixed 32 bytes for CCM,
>
> and 36 for GMAC :)

Yes. But as I replied, setting the req size is not supported atm,
although it is reasonable to demand a way to allocate additional data
in the request specifically for this issue. So let's proceed with the
aead_request_alloc/free patch, but I would like to propose something
on the API side to address this particular issue

Johannes Berg Oct. 17, 2016, 9:35 a.m. UTC | #4

On Mon, 2016-10-17 at 10:30 +0100, Ard Biesheuvel wrote:

> Yes. But as I replied, setting the req size is not supported atm,
> although it is reasonable to demand a way to allocate additional data
> in the request specifically for this issue. So let's proceed with the
> aead_request_alloc/free patch, but I would like to propose something
> on the API side to address this particular issue

Well, if your other patch to make it OK to be on-stack would be applied
instead, this wouldn't make much sense either :)

In this particular patch, we could reduce the size of the struct, but I
don't actually think it'll make a difference to go from 48 to 36 bytes,
given alignment etc., so I think I'll just leave it as is.

johannes

Ard Biesheuvel Oct. 17, 2016, 9:49 a.m. UTC | #5

> On 17 Oct 2016, at 10:35, Johannes Berg <johannes@sipsolutions.net> wrote:
> 
>> On Mon, 2016-10-17 at 10:30 +0100, Ard Biesheuvel wrote:
>> 
>> Yes. But as I replied, setting the req size is not supported atm,
>> although it is reasonable to demand a way to allocate additional data
>> in the request specifically for this issue. So let's proceed with the
>> aead_request_alloc/free patch, but I would like to propose something
>> on the API side to address this particular issue
> 
> Well, if your other patch to make it OK to be on-stack would be applied
> instead, this wouldn't make much sense either :)
> 
> In this particular patch, we could reduce the size of the struct, but I
> don't actually think it'll make a difference to go from 48 to 36 bytes,
> given alignment etc., so I think I'll just leave it as is.
> 
> johannes

Ard Biesheuvel Oct. 17, 2016, 9:52 a.m. UTC | #6

> On 17 Oct 2016, at 10:35, Johannes Berg <johannes@sipsolutions.net> wrote:
> 
>> On Mon, 2016-10-17 at 10:30 +0100, Ard Biesheuvel wrote:
>> 
>> Yes. But as I replied, setting the req size is not supported atm,
>> although it is reasonable to demand a way to allocate additional data
>> in the request specifically for this issue. So let's proceed with the
>> aead_request_alloc/free patch, but I would like to propose something
>> on the API side to address this particular issue
> 
> Well, if your other patch to make it OK to be on-stack would be applied
> instead, this wouldn't make much sense either :)
> 

Yes but that one only fixes ccm not gcm

> In this particular patch, we could reduce the size of the struct, but I
> don't actually think it'll make a difference to go from 48 to 36 bytes,
> given alignment etc., so I think I'll just leave it as is.
> 

I understand you are in a hurry, but this is unlikely to be the only such issue. I will propose an 'auxdata' feature for the crypto api that can be used here, but also for any other occurrence where client data assoiciated with the request can no longer be allocated on the stack

Johannes Berg Oct. 17, 2016, 9:54 a.m. UTC | #7

> > Well, if your other patch to make it OK to be on-stack would be
> > applied instead, this wouldn't make much sense either :)
> > 
> 
> Yes but that one only fixes ccm not gcm

Yes, but we can do the same for GCM, no?

> > In this particular patch, we could reduce the size of the struct,
> > but I
> > don't actually think it'll make a difference to go from 48 to 36
> > bytes,
> > given alignment etc., so I think I'll just leave it as is.
> > 
> 
> I understand you are in a hurry, but this is unlikely to be the only
> such issue. I will propose an 'auxdata' feature for the crypto api
> that can be used here, but also for any other occurrence where client
> data assoiciated with the request can no longer be allocated on the
> stack

No objections. I'll merge this anyway today I think, reverting is easy
later.

johannes

Ard Biesheuvel Oct. 17, 2016, 10:02 a.m. UTC | #8

> On 17 Oct 2016, at 10:54, Johannes Berg <johannes@sipsolutions.net> wrote:
> 
> 
>>> Well, if your other patch to make it OK to be on-stack would be
>>> applied instead, this wouldn't make much sense either :)
>>> 
>> 
>> Yes but that one only fixes ccm not gcm
> 
> Yes, but we can do the same for GCM, no?
> 

No, not really. ccm happens to use aes with the same key for the mac and the encryption. gcm uses an another algo entirely for the mac

>>> In this particular patch, we could reduce the size of the struct,
>>> but I
>>> don't actually think it'll make a difference to go from 48 to 36
>>> bytes,
>>> given alignment etc., so I think I'll just leave it as is.
>>> 
>> 
>> I understand you are in a hurry, but this is unlikely to be the only
>> such issue. I will propose an 'auxdata' feature for the crypto api
>> that can be used here, but also for any other occurrence where client
>> data assoiciated with the request can no longer be allocated on the
>> stack
> 
> No objections. I'll merge this anyway today I think, reverting is easy
> later.
> 

ok fair enough

Ard Biesheuvel Oct. 17, 2016, 1:06 p.m. UTC | #9

On 17 October 2016 at 11:02, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>
>
>> On 17 Oct 2016, at 10:54, Johannes Berg <johannes@sipsolutions.net> wrote:
>>
>>
>>>> Well, if your other patch to make it OK to be on-stack would be
>>>> applied instead, this wouldn't make much sense either :)
>>>>
>>>
>>> Yes but that one only fixes ccm not gcm
>>
>> Yes, but we can do the same for GCM, no?
>>
>
> No, not really. ccm happens to use aes with the same key for the mac and the encryption. gcm uses an another algo entirely for the mac
>
>>>> In this particular patch, we could reduce the size of the struct,
>>>> but I
>>>> don't actually think it'll make a difference to go from 48 to 36
>>>> bytes,
>>>> given alignment etc., so I think I'll just leave it as is.
>>>>
>>>
>>> I understand you are in a hurry, but this is unlikely to be the only
>>> such issue. I will propose an 'auxdata' feature for the crypto api
>>> that can be used here, but also for any other occurrence where client
>>> data assoiciated with the request can no longer be allocated on the
>>> stack
>>
>> No objections. I'll merge this anyway today I think, reverting is easy
>> later.
>>
>
> ok fair enough

Actually, while I think it will be worthwhile going forward to
implement such an 'auxiliary data' feature in a generic way, I still
think we should address the issue at hand without too much
complication.

If we pedal back to the version of 'mac80211: move struct aead_req off
the stack' that uses kzalloc() instead of aead_request_alloc(), we can
simply add some space for aad[] and/or zero[], and get rid of the kmem
cache entirely.

If you're past this point already, i won't bother but otherwise I can
rework 'mac80211: move struct aead_req off the stack' so that the
other patch is no longer required (and IIRC, this is actually
something you proposed yourself a couple of iterations ago?)

Johannes Berg Oct. 17, 2016, 1:16 p.m. UTC | #10

On Mon, 2016-10-17 at 14:06 +0100, Ard Biesheuvel wrote:
> 
> Actually, while I think it will be worthwhile going forward to
> implement such an 'auxiliary data' feature in a generic way, I still
> think we should address the issue at hand without too much
> complication.
> 
> If we pedal back to the version of 'mac80211: move struct aead_req
> off the stack' that uses kzalloc() instead of aead_request_alloc(),
> we can simply add some space for aad[] and/or zero[], and get rid of
> the kmem cache entirely.
> 
> If you're past this point already, i won't bother but otherwise I can
> rework 'mac80211: move struct aead_req off the stack' so that the
> other patch is no longer required (and IIRC, this is actually
> something you proposed yourself a couple of iterations ago?)

Yes, I did consider that.

It makes some sense, and I guess the extra memcpy() would be cheaper
than the extra alloc?

I'd happily use that instead of the combination of my two patches. The
aead_request_alloc() is just a simple inline anyway, so no real problem
not using it.

johannes

Ard Biesheuvel Oct. 17, 2016, 1:20 p.m. UTC | #11

On 17 October 2016 at 14:16, Johannes Berg <johannes@sipsolutions.net> wrote:
> On Mon, 2016-10-17 at 14:06 +0100, Ard Biesheuvel wrote:
>>
>> Actually, while I think it will be worthwhile going forward to
>> implement such an 'auxiliary data' feature in a generic way, I still
>> think we should address the issue at hand without too much
>> complication.
>>
>> If we pedal back to the version of 'mac80211: move struct aead_req
>> off the stack' that uses kzalloc() instead of aead_request_alloc(),
>> we can simply add some space for aad[] and/or zero[], and get rid of
>> the kmem cache entirely.
>>
>> If you're past this point already, i won't bother but otherwise I can
>> rework 'mac80211: move struct aead_req off the stack' so that the
>> other patch is no longer required (and IIRC, this is actually
>> something you proposed yourself a couple of iterations ago?)
>
> Yes, I did consider that.
>
> It makes some sense, and I guess the extra memcpy() would be cheaper
> than the extra alloc?
>
> I'd happily use that instead of the combination of my two patches. The
> aead_request_alloc() is just a simple inline anyway, so no real problem
> not using it.
>

Indeed. And it keeps the clutter inside the aes_xxx.c files, which
could easily be updated in the future to use some auxdata feature if
it ever materializes.

I think it would help this code, but also the ESP code you pointed
out, to have some kind of 'ordered synchronous' CRYPTO_xxx flag, where
the crypto API could manage the kmem cache and percpu pointers to
allocations. This goes well beyond what we can do as a fix, though, so
we need an intermediate solution in any case.

Shall I propose the patch?

Johannes Berg Oct. 17, 2016, 1:25 p.m. UTC | #12

> Indeed. And it keeps the clutter inside the aes_xxx.c files, which
> could easily be updated in the future to use some auxdata feature if
> it ever materializes.
> 
> I think it would help this code, but also the ESP code you pointed
> out, to have some kind of 'ordered synchronous' CRYPTO_xxx flag,
> where the crypto API could manage the kmem cache and percpu pointers
> to allocations.

Yeah, could be useful to have that more generally.

>  This goes well beyond what we can do as a fix, though, so we need an
> intermediate solution in any case.
> 
> Shall I propose the patch?

I assume you mean a mac80211 patch - sure, I'll take that instead of
the two I have now.

johannes

[v4] mac80211: move extra crypto data off the stack

Commit Message

Comments

Patch