Message ID | CAKv+Gu-aZhCBvnEQoZUZLDPXCrvgxO1pSd=6EHz+tMB+dFz5hg@mail.gmail.com (mailing list archive) |
---|---|
State | Not Applicable |
Delegated to: | Johannes Berg |
Headers | show |
On 17 October 2016 at 10:14, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: > On 17 October 2016 at 09:33, Johannes Berg <johannes@sipsolutions.net> wrote: >> From: Johannes Berg <johannes.berg@intel.com> >> >> As the stack can (on x86-64) now be virtually mapped rather than >> using "normal" kernel memory, Sergey noticed mac80211 isn't using >> the SG APIs correctly by putting on-stack buffers into SG tables. >> This leads to kernel crashes. >> >> Fix this by allocating the extra fields dynamically on the fly as >> needed, using a kmem cache. >> >> I used per-CPU memory in a previous iteration of this patch, but >> Ard Biesheuvel pointed out that was also vmalloc'ed on some >> architectures. >> >> Reported-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> >> Signed-off-by: Johannes Berg <johannes.berg@intel.com> > > Apologies for going back and forth on this, but it appears there may > be another way to deal with this. > > First of all, we only need this handling for the authenticated data, > and only for CCM and GCM, not CMAC (which does not use scatterlists at > all, it simply calls the AES cipher directly) > > So that leaves a fixed 20 bytes for GCM and fixed 32 bytes for CCM, > which we could allocate along with the AEAD request, e..g., > > """ > diff --git a/net/mac80211/aes_ccm.c b/net/mac80211/aes_ccm.c > index 8e898a6e8de8..c0c33e6ad94e 100644 > --- a/net/mac80211/aes_ccm.c > +++ b/net/mac80211/aes_ccm.c > @@ -24,13 +24,17 @@ int ieee80211_aes_ccm_encrypt(struct crypto_aead > *tfm, u8 *b_0, u8 *aad, > { > struct scatterlist sg[3]; > struct aead_request *aead_req; > + u8 *__aad; > > aead_req = aead_request_alloc(tfm, GFP_ATOMIC); > if (!aead_req) > return -ENOMEM; > > + __aad = (u8 *)aead_req + crypto_aead_reqsize(tfm); > + memcpy(__aad, aad, 2 * AES_BLOCK_SIZE); > + > sg_init_table(sg, 3); > - sg_set_buf(&sg[0], &aad[2], be16_to_cpup((__be16 *)aad)); > + sg_set_buf(&sg[0], &__aad[2], be16_to_cpup((__be16 *)__aad)); > sg_set_buf(&sg[1], data, data_len); > sg_set_buf(&sg[2], mic, mic_len); > > @@ -49,6 +53,7 @@ int ieee80211_aes_ccm_decrypt(struct crypto_aead > *tfm, u8 *b_0, u8 *aad, > { > struct scatterlist sg[3]; > struct aead_request *aead_req; > + u8 *__aad; > int err; > > if (data_len == 0) > @@ -58,8 +63,11 @@ int ieee80211_aes_ccm_decrypt(struct crypto_aead > *tfm, u8 *b_0, u8 *aad, > if (!aead_req) > return -ENOMEM; > > + __aad = (u8 *)aead_req + crypto_aead_reqsize(tfm); > + memcpy(__aad, aad, 2 * AES_BLOCK_SIZE); > + > sg_init_table(sg, 3); > - sg_set_buf(&sg[0], &aad[2], be16_to_cpup((__be16 *)aad)); > + sg_set_buf(&sg[0], &__aad[2], be16_to_cpup((__be16 *)__aad)); > sg_set_buf(&sg[1], data, data_len); > sg_set_buf(&sg[2], mic, mic_len); > > @@ -90,6 +98,8 @@ struct crypto_aead > *ieee80211_aes_key_setup_encrypt(const u8 key[], > if (err) > goto free_aead; > > + crypto_aead_set_reqsize(tfm, > + crypto_aead_reqsize(tfm) + 2 * AES_BLOCK_SIZE)); > return tfm; > Darn, it seems crypto_aead_set_reqsize() is internal to the crypto API ... :-(
> Apologies for going back and forth on this, but it appears there may > be another way to deal with this. > > First of all, we only need this handling for the authenticated data, Are you sure b_0/j_0 aren't needed? We pass those to aead_request_set_crypt(), and I wasn't sure what that really did internally, perhaps like the internal data. Testing with that on the stack does seem to work, in fact. Surely we need zero for GMAC though, since we also put that into the sg list. Thus for GMAC we definitely need 20+16 bytes, and since I round up to a cacheline (at least on SMP) it doesn't really matter that we could get 36 instead of the 48 I have now. > and only for CCM and GCM, not CMAC (which does not use scatterlists > at all, it simply calls the AES cipher directly) I didn't modify CMAC, I think, only GMAC, which also uses scatterlists. > So that leaves a fixed 20 bytes for GCM and fixed 32 bytes for CCM, and 36 for GMAC :) johannes
On 17 October 2016 at 10:23, Johannes Berg <johannes@sipsolutions.net> wrote: > >> Apologies for going back and forth on this, but it appears there may >> be another way to deal with this. >> >> First of all, we only need this handling for the authenticated data, > > Are you sure b_0/j_0 aren't needed? We pass those > to aead_request_set_crypt(), and I wasn't sure what that really did > internally, perhaps like the internal data. > They are the IV[], which is a fixed length parameter of the algorithm. In contrast, the AAD[] could be of arbitrary length (from the POV of the crypto API) so it uses scatterlists. > Testing with that on the stack does seem to work, in fact. > > Surely we need zero for GMAC though, since we also put that into the sg > list. Thus for GMAC we definitely need 20+16 bytes, and since I round > up to a cacheline (at least on SMP) it doesn't really matter that we > could get 36 instead of the 48 I have now. > >> and only for CCM and GCM, not CMAC (which does not use scatterlists >> at all, it simply calls the AES cipher directly) > > I didn't modify CMAC, I think, only GMAC, which also uses scatterlists. > Ah ok, I misread the patch. >> So that leaves a fixed 20 bytes for GCM and fixed 32 bytes for CCM, > > and 36 for GMAC :) Yes. But as I replied, setting the req size is not supported atm, although it is reasonable to demand a way to allocate additional data in the request specifically for this issue. So let's proceed with the aead_request_alloc/free patch, but I would like to propose something on the API side to address this particular issue
On Mon, 2016-10-17 at 10:30 +0100, Ard Biesheuvel wrote: > Yes. But as I replied, setting the req size is not supported atm, > although it is reasonable to demand a way to allocate additional data > in the request specifically for this issue. So let's proceed with the > aead_request_alloc/free patch, but I would like to propose something > on the API side to address this particular issue Well, if your other patch to make it OK to be on-stack would be applied instead, this wouldn't make much sense either :) In this particular patch, we could reduce the size of the struct, but I don't actually think it'll make a difference to go from 48 to 36 bytes, given alignment etc., so I think I'll just leave it as is. johannes
> On 17 Oct 2016, at 10:35, Johannes Berg <johannes@sipsolutions.net> wrote: > >> On Mon, 2016-10-17 at 10:30 +0100, Ard Biesheuvel wrote: >> >> Yes. But as I replied, setting the req size is not supported atm, >> although it is reasonable to demand a way to allocate additional data >> in the request specifically for this issue. So let's proceed with the >> aead_request_alloc/free patch, but I would like to propose something >> on the API side to address this particular issue > > Well, if your other patch to make it OK to be on-stack would be applied > instead, this wouldn't make much sense either :) > > In this particular patch, we could reduce the size of the struct, but I > don't actually think it'll make a difference to go from 48 to 36 bytes, > given alignment etc., so I think I'll just leave it as is. > > johannes
> On 17 Oct 2016, at 10:35, Johannes Berg <johannes@sipsolutions.net> wrote: > >> On Mon, 2016-10-17 at 10:30 +0100, Ard Biesheuvel wrote: >> >> Yes. But as I replied, setting the req size is not supported atm, >> although it is reasonable to demand a way to allocate additional data >> in the request specifically for this issue. So let's proceed with the >> aead_request_alloc/free patch, but I would like to propose something >> on the API side to address this particular issue > > Well, if your other patch to make it OK to be on-stack would be applied > instead, this wouldn't make much sense either :) > Yes but that one only fixes ccm not gcm > In this particular patch, we could reduce the size of the struct, but I > don't actually think it'll make a difference to go from 48 to 36 bytes, > given alignment etc., so I think I'll just leave it as is. > I understand you are in a hurry, but this is unlikely to be the only such issue. I will propose an 'auxdata' feature for the crypto api that can be used here, but also for any other occurrence where client data assoiciated with the request can no longer be allocated on the stack
> > Well, if your other patch to make it OK to be on-stack would be > > applied instead, this wouldn't make much sense either :) > > > > Yes but that one only fixes ccm not gcm Yes, but we can do the same for GCM, no? > > In this particular patch, we could reduce the size of the struct, > > but I > > don't actually think it'll make a difference to go from 48 to 36 > > bytes, > > given alignment etc., so I think I'll just leave it as is. > > > > I understand you are in a hurry, but this is unlikely to be the only > such issue. I will propose an 'auxdata' feature for the crypto api > that can be used here, but also for any other occurrence where client > data assoiciated with the request can no longer be allocated on the > stack No objections. I'll merge this anyway today I think, reverting is easy later. johannes
> On 17 Oct 2016, at 10:54, Johannes Berg <johannes@sipsolutions.net> wrote: > > >>> Well, if your other patch to make it OK to be on-stack would be >>> applied instead, this wouldn't make much sense either :) >>> >> >> Yes but that one only fixes ccm not gcm > > Yes, but we can do the same for GCM, no? > No, not really. ccm happens to use aes with the same key for the mac and the encryption. gcm uses an another algo entirely for the mac >>> In this particular patch, we could reduce the size of the struct, >>> but I >>> don't actually think it'll make a difference to go from 48 to 36 >>> bytes, >>> given alignment etc., so I think I'll just leave it as is. >>> >> >> I understand you are in a hurry, but this is unlikely to be the only >> such issue. I will propose an 'auxdata' feature for the crypto api >> that can be used here, but also for any other occurrence where client >> data assoiciated with the request can no longer be allocated on the >> stack > > No objections. I'll merge this anyway today I think, reverting is easy > later. > ok fair enough
On 17 October 2016 at 11:02, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: > > >> On 17 Oct 2016, at 10:54, Johannes Berg <johannes@sipsolutions.net> wrote: >> >> >>>> Well, if your other patch to make it OK to be on-stack would be >>>> applied instead, this wouldn't make much sense either :) >>>> >>> >>> Yes but that one only fixes ccm not gcm >> >> Yes, but we can do the same for GCM, no? >> > > No, not really. ccm happens to use aes with the same key for the mac and the encryption. gcm uses an another algo entirely for the mac > >>>> In this particular patch, we could reduce the size of the struct, >>>> but I >>>> don't actually think it'll make a difference to go from 48 to 36 >>>> bytes, >>>> given alignment etc., so I think I'll just leave it as is. >>>> >>> >>> I understand you are in a hurry, but this is unlikely to be the only >>> such issue. I will propose an 'auxdata' feature for the crypto api >>> that can be used here, but also for any other occurrence where client >>> data assoiciated with the request can no longer be allocated on the >>> stack >> >> No objections. I'll merge this anyway today I think, reverting is easy >> later. >> > > ok fair enough Actually, while I think it will be worthwhile going forward to implement such an 'auxiliary data' feature in a generic way, I still think we should address the issue at hand without too much complication. If we pedal back to the version of 'mac80211: move struct aead_req off the stack' that uses kzalloc() instead of aead_request_alloc(), we can simply add some space for aad[] and/or zero[], and get rid of the kmem cache entirely. If you're past this point already, i won't bother but otherwise I can rework 'mac80211: move struct aead_req off the stack' so that the other patch is no longer required (and IIRC, this is actually something you proposed yourself a couple of iterations ago?)
On Mon, 2016-10-17 at 14:06 +0100, Ard Biesheuvel wrote: > > Actually, while I think it will be worthwhile going forward to > implement such an 'auxiliary data' feature in a generic way, I still > think we should address the issue at hand without too much > complication. > > If we pedal back to the version of 'mac80211: move struct aead_req > off the stack' that uses kzalloc() instead of aead_request_alloc(), > we can simply add some space for aad[] and/or zero[], and get rid of > the kmem cache entirely. > > If you're past this point already, i won't bother but otherwise I can > rework 'mac80211: move struct aead_req off the stack' so that the > other patch is no longer required (and IIRC, this is actually > something you proposed yourself a couple of iterations ago?) Yes, I did consider that. It makes some sense, and I guess the extra memcpy() would be cheaper than the extra alloc? I'd happily use that instead of the combination of my two patches. The aead_request_alloc() is just a simple inline anyway, so no real problem not using it. johannes
On 17 October 2016 at 14:16, Johannes Berg <johannes@sipsolutions.net> wrote: > On Mon, 2016-10-17 at 14:06 +0100, Ard Biesheuvel wrote: >> >> Actually, while I think it will be worthwhile going forward to >> implement such an 'auxiliary data' feature in a generic way, I still >> think we should address the issue at hand without too much >> complication. >> >> If we pedal back to the version of 'mac80211: move struct aead_req >> off the stack' that uses kzalloc() instead of aead_request_alloc(), >> we can simply add some space for aad[] and/or zero[], and get rid of >> the kmem cache entirely. >> >> If you're past this point already, i won't bother but otherwise I can >> rework 'mac80211: move struct aead_req off the stack' so that the >> other patch is no longer required (and IIRC, this is actually >> something you proposed yourself a couple of iterations ago?) > > Yes, I did consider that. > > It makes some sense, and I guess the extra memcpy() would be cheaper > than the extra alloc? > > I'd happily use that instead of the combination of my two patches. The > aead_request_alloc() is just a simple inline anyway, so no real problem > not using it. > Indeed. And it keeps the clutter inside the aes_xxx.c files, which could easily be updated in the future to use some auxdata feature if it ever materializes. I think it would help this code, but also the ESP code you pointed out, to have some kind of 'ordered synchronous' CRYPTO_xxx flag, where the crypto API could manage the kmem cache and percpu pointers to allocations. This goes well beyond what we can do as a fix, though, so we need an intermediate solution in any case. Shall I propose the patch?
> Indeed. And it keeps the clutter inside the aes_xxx.c files, which > could easily be updated in the future to use some auxdata feature if > it ever materializes. > > I think it would help this code, but also the ESP code you pointed > out, to have some kind of 'ordered synchronous' CRYPTO_xxx flag, > where the crypto API could manage the kmem cache and percpu pointers > to allocations. Yeah, could be useful to have that more generally. > This goes well beyond what we can do as a fix, though, so we need an > intermediate solution in any case. > > Shall I propose the patch? I assume you mean a mac80211 patch - sure, I'll take that instead of the two I have now. johannes
diff --git a/net/mac80211/aes_ccm.c b/net/mac80211/aes_ccm.c index 8e898a6e8de8..c0c33e6ad94e 100644 --- a/net/mac80211/aes_ccm.c +++ b/net/mac80211/aes_ccm.c @@ -24,13 +24,17 @@ int ieee80211_aes_ccm_encrypt(struct crypto_aead *tfm, u8 *b_0, u8 *aad, { struct scatterlist sg[3]; struct aead_request *aead_req; + u8 *__aad; aead_req = aead_request_alloc(tfm, GFP_ATOMIC); if (!aead_req) return -ENOMEM; + __aad = (u8 *)aead_req + crypto_aead_reqsize(tfm); + memcpy(__aad, aad, 2 * AES_BLOCK_SIZE); + sg_init_table(sg, 3); - sg_set_buf(&sg[0], &aad[2], be16_to_cpup((__be16 *)aad)); + sg_set_buf(&sg[0], &__aad[2], be16_to_cpup((__be16 *)__aad)); sg_set_buf(&sg[1], data, data_len); sg_set_buf(&sg[2], mic, mic_len);