[v4,11/14] treewide: Prepare to remove VLA usage for AHASH_REQUEST_ON_STACK

Message ID 20180711203619.1020-12-keescook@chromium.org
State Changes Requested
Delegated to: Herbert Xu

Commit Message

Kees Cook July 11, 2018, 8:36 p.m. UTC
Several uses of AHASH_REQUEST_ON_STACK() will trigger FRAME_WARN warnings
(when less than 2048) once the VLA is no longer hidden from the check:

drivers/block/drbd/drbd_worker.c:325:1: warning: the frame size of 1112 bytes is larger than 1024 bytes [-Wframe-larger-than=]
drivers/block/drbd/drbd_worker.c:352:1: warning: the frame size of 1120 bytes is larger than 1024 bytes [-Wframe-larger-than=]
crypto/ccm.c:235:1: warning: the frame size of 1184 bytes is larger than 1024 bytes [-Wframe-larger-than=]
drivers/md/dm-crypt.c:353:1: warning: the frame size of 1096 bytes is larger than 1024 bytes [-Wframe-larger-than=]
drivers/net/ppp/ppp_mppe.c:158:1: warning: the frame size of 1168 bytes is larger than 1024 bytes [-Wframe-larger-than=]
net/wireless/lib80211_crypt_tkip.c:537:1: warning: the frame size of 1136 bytes is larger than 1024 bytes [-Wframe-larger-than=]
drivers/staging/rtl8192u/ieee80211/ieee80211_crypt_tkip.c:528:1: warning: the frame size of 1136 bytes is larger than 1024 bytes [-Wframe-larger-than=]
drivers/staging/rtl8192e/rtllib_crypt_tkip.c:531:1: warning: the frame size of 1136 bytes is larger than 1024 bytes [-Wframe-larger-than=]

This bumps the affected objects by 20% to silence the warnings while still
providing coverage if anything grows even more.

Signed-off-by: Kees Cook <keescook@chromium.org>
---
 crypto/Makefile                             | 1 +
 drivers/block/drbd/Makefile                 | 2 ++
 drivers/md/Makefile                         | 1 +
 drivers/net/ppp/Makefile                    | 1 +
 drivers/staging/rtl8192e/Makefile           | 1 +
 drivers/staging/rtl8192u/Makefile           | 1 +
 drivers/staging/rtl8192u/ieee80211/Makefile | 1 +
 net/wireless/Makefile                       | 1 +
 8 files changed, 9 insertions(+)

Comments

Arnd Bergmann July 12, 2018, 4:02 p.m. UTC | #1
On Wed, Jul 11, 2018 at 10:36 PM, Kees Cook <keescook@chromium.org> wrote:
> Several uses of AHASH_REQUEST_ON_STACK() will trigger FRAME_WARN warnings
> (when less than 2048) once the VLA is no longer hidden from the check:
>
> drivers/block/drbd/drbd_worker.c:325:1: warning: the frame size of 1112 bytes is larger than 1024 bytes [-Wframe-larger-than=]
> drivers/block/drbd/drbd_worker.c:352:1: warning: the frame size of 1120 bytes is larger than 1024 bytes [-Wframe-larger-than=]
> crypto/ccm.c:235:1: warning: the frame size of 1184 bytes is larger than 1024 bytes [-Wframe-larger-than=]
> drivers/md/dm-crypt.c:353:1: warning: the frame size of 1096 bytes is larger than 1024 bytes [-Wframe-larger-than=]
> drivers/net/ppp/ppp_mppe.c:158:1: warning: the frame size of 1168 bytes is larger than 1024 bytes [-Wframe-larger-than=]
> net/wireless/lib80211_crypt_tkip.c:537:1: warning: the frame size of 1136 bytes is larger than 1024 bytes [-Wframe-larger-than=]
> drivers/staging/rtl8192u/ieee80211/ieee80211_crypt_tkip.c:528:1: warning: the frame size of 1136 bytes is larger than 1024 bytes [-Wframe-larger-than=]
> drivers/staging/rtl8192e/rtllib_crypt_tkip.c:531:1: warning: the frame size of 1136 bytes is larger than 1024 bytes [-Wframe-larger-than=]
>
> This bumps the affected objects by 20% to silence the warnings while still
> providing coverage if anything grows even more.
>
> Signed-off-by: Kees Cook <keescook@chromium.org>

I think this is a dangerous precedent. I wouldn't really want any of those
functions to ever take more than 1024 bytes; even that is really too much,
but we can't easily lower the global limit.

You are patching all users of AHASH_REQUEST_ON_STACK with the exception of
arch/x86/power/hibernate_64.c here (which is always used on 64-bit and has
a larger limit already), which in turn suggests that the AHASH_REQUEST_ON_STACK
macro using bytes is just fundamentally broken by requiring that much space
(808 bytes for the context, plus 8 pointers for struct ahash_request, plus
CRYPTO_MINALIGN_ATTR).
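
As a rough illustration of the arithmetic above, the sketch below adds the request context to eight pointers' worth of struct ahash_request. The helper name is hypothetical, the "8 pointers" figure is taken from this message rather than from the kernel headers, and alignment padding from CRYPTO_MINALIGN_ATTR is ignored.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical estimate of the bytes AHASH_REQUEST_ON_STACK() reserves:
 * the request context plus eight pointer-sized fields, as described in
 * the message above. Alignment padding is deliberately left out.
 */
size_t ahash_on_stack_estimate(size_t reqsize, size_t ptr_bytes)
{
	return reqsize + 8 * ptr_bytes;
}
```

With 8-byte pointers, an 808-byte context comes to 872 bytes before padding, which already consumes most of a 1024-byte frame limit.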

How did you come up with that 808 byte number? I see a total of 39 callers
of crypto_ahash_set_reqsize(), did you check all of those individually?
If 808 bytes is the worst case, what are the next 5 ones? If there are only
a few of them that are badly written, maybe we can fix the drivers instead
and lower that number to something more reasonable.

Looking through some of the drivers, I found this interesting one:

#define SHA_BUFFER_LEN          (PAGE_SIZE / 16)
struct atmel_sha_reqctx {
...
        u8 buffer[SHA_BUFFER_LEN + SHA512_BLOCK_SIZE] __aligned(sizeof(u32));
};

which would result in overrunning the kernel stack immediately if ever
used with 64k PAGE_SIZE (we fortunately don't support that driver on
any architectures with 64k pages yet).
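
To make the page-size dependence concrete, here is a small sketch that evaluates the buffer size from the snippet above for a given PAGE_SIZE. The helper name is made up for illustration; SHA512_BLOCK_SIZE is 128 bytes, and the rest of struct atmel_sha_reqctx (elided above) is not counted.

```c
#include <assert.h>
#include <stddef.h>

#define SHA512_BLOCK_SIZE 128 /* bytes */

/*
 * Hypothetical helper: size of the 'buffer' member of atmel_sha_reqctx,
 * i.e. SHA_BUFFER_LEN + SHA512_BLOCK_SIZE with SHA_BUFFER_LEN defined
 * as PAGE_SIZE / 16. Other members of the struct are not included.
 */
size_t atmel_sha_buffer_bytes(size_t page_size)
{
	size_t sha_buffer_len = page_size / 16;

	return sha_buffer_len + SHA512_BLOCK_SIZE;
}
```

With 4k pages the buffer is 384 bytes; with 64k pages it grows to 4224 bytes, several times the 1024-byte frame limit discussed here.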

The other ones I looked at seem to all be well under 400 bytes (which is
still a lot to put on the stack, but probably ok).

      Arnd
Kees Cook July 12, 2018, 8:17 p.m. UTC | #2
On Thu, Jul 12, 2018 at 9:02 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Wed, Jul 11, 2018 at 10:36 PM, Kees Cook <keescook@chromium.org> wrote:
>> Several uses of AHASH_REQUEST_ON_STACK() will trigger FRAME_WARN warnings
>> (when less than 2048) once the VLA is no longer hidden from the check:
>>
>> drivers/block/drbd/drbd_worker.c:325:1: warning: the frame size of 1112 bytes is larger than 1024 bytes [-Wframe-larger-than=]
>> drivers/block/drbd/drbd_worker.c:352:1: warning: the frame size of 1120 bytes is larger than 1024 bytes [-Wframe-larger-than=]
>> crypto/ccm.c:235:1: warning: the frame size of 1184 bytes is larger than 1024 bytes [-Wframe-larger-than=]
>> drivers/md/dm-crypt.c:353:1: warning: the frame size of 1096 bytes is larger than 1024 bytes [-Wframe-larger-than=]
>> drivers/net/ppp/ppp_mppe.c:158:1: warning: the frame size of 1168 bytes is larger than 1024 bytes [-Wframe-larger-than=]
>> net/wireless/lib80211_crypt_tkip.c:537:1: warning: the frame size of 1136 bytes is larger than 1024 bytes [-Wframe-larger-than=]
>> drivers/staging/rtl8192u/ieee80211/ieee80211_crypt_tkip.c:528:1: warning: the frame size of 1136 bytes is larger than 1024 bytes [-Wframe-larger-than=]
>> drivers/staging/rtl8192e/rtllib_crypt_tkip.c:531:1: warning: the frame size of 1136 bytes is larger than 1024 bytes [-Wframe-larger-than=]
>>
>> This bumps the affected objects by 20% to silence the warnings while still
>> providing coverage if anything grows even more.
>>
>> Signed-off-by: Kees Cook <keescook@chromium.org>
>
> I think this is a dangerous precedent. I wouldn't really want any of those
> functions to ever take more than 1024 bytes; even that is really too much,
> but we can't easily lower the global limit.

The issue is that these are _already_ able to use this much stack
because of the VLA. It was just hidden from the FRAME_WARN checks.

> You are patching all users of AHASH_REQUEST_ON_STACK with the exception of
> arch/x86/power/hibernate_64.c here (which is always used on 64-bit and has
> a larger limit already), which in turn suggests that the AHASH_REQUEST_ON_STACK
> macro using bytes is just fundamentally broken by requiring that much space
> (808 bytes for the context, plus 8 pointers for struct ahash_request, plus
> CRYPTO_MINALIGN_ATTR).

Yes -- it's huge. That's always been true, unfortunately.

> How did you come up with that 808 byte number? I see a total of 39 callers
> of crypto_ahash_set_reqsize(), did you check all of those individually?
> If 808 bytes is the worst case, what are the next 5 ones? If there are only
> a few of them that are badly written, maybe we can fix the drivers instead
> and lower that number to something more reasonable.

That was discussed a bit (maybe not enough?) in the next patch:
https://patchwork.kernel.org/patch/10520407/

I used tcrypt (which examines all sane combinations) and sha512
produces the 808 number. I had done an earlier manual evaluation of
all crypto_ahash_set_reqsize() callers but Herbert and Eric pointed
out issues with my methodology (namely that things can be recursively
stacked (I had calculated too low) but some things will never be
stacked together (so some pathological conditions will never happen)).
So I moved to the tcrypt instrumentation approach, which tests
real-world combinations.

For example, reaching this 808 size is trivially easy to do right now
by just asking for dm-crypt to use a cipher of
capi:cbc(aes)-essiv:sha512.

> Looking through some of the drivers, I found this interesting one:
>
> #define SHA_BUFFER_LEN          (PAGE_SIZE / 16)
> struct atmel_sha_reqctx {
> ...
>         u8 buffer[SHA_BUFFER_LEN + SHA512_BLOCK_SIZE] __aligned(sizeof(u32));
> };
>
> which would result in overrunning the kernel stack immediately if ever
> used with 64k PAGE_SIZE (we fortunately don't support that driver on
> any architectures with 64k pages yet).

Right -- the large page size isn't reachable there. But we don't
overrun the kernel stack because of the check I added in
crypto_ahash_set_reqsize() in the above mentioned patch.

> The other ones I looked at seem to all be well under 400 bytes (which is
> still a lot to put on the stack, but probably ok).

I wish sha512 was "rare", but it's not. :(

So: mainly the crypto VLA removal is about exposing all these giant
stack usages. We can work to fix them, but I want to get these fixed
so we can add -Wvla to the kernel to avoid more being added (we've had
at least 2 added during this linux-next cycle already).

IMO, we're much better off with this stack usage _actually_ being
checked (even with a 20% bump) than staying entirely hidden (as it's
been).

-Kees
Arnd Bergmann July 12, 2018, 9:38 p.m. UTC | #3
On Thu, Jul 12, 2018 at 10:17 PM, Kees Cook <keescook@chromium.org> wrote:
> On Thu, Jul 12, 2018 at 9:02 AM, Arnd Bergmann <arnd@arndb.de> wrote:
>> On Wed, Jul 11, 2018 at 10:36 PM, Kees Cook <keescook@chromium.org> wrote:
>>> Several uses of AHASH_REQUEST_ON_STACK() will trigger FRAME_WARN warnings
>>> (when less than 2048) once the VLA is no longer hidden from the check:
>>>
>>> drivers/block/drbd/drbd_worker.c:325:1: warning: the frame size of 1112 bytes is larger than 1024 bytes [-Wframe-larger-than=]
>>> drivers/block/drbd/drbd_worker.c:352:1: warning: the frame size of 1120 bytes is larger than 1024 bytes [-Wframe-larger-than=]
>>> crypto/ccm.c:235:1: warning: the frame size of 1184 bytes is larger than 1024 bytes [-Wframe-larger-than=]
>>> drivers/md/dm-crypt.c:353:1: warning: the frame size of 1096 bytes is larger than 1024 bytes [-Wframe-larger-than=]
>>> drivers/net/ppp/ppp_mppe.c:158:1: warning: the frame size of 1168 bytes is larger than 1024 bytes [-Wframe-larger-than=]
>>> net/wireless/lib80211_crypt_tkip.c:537:1: warning: the frame size of 1136 bytes is larger than 1024 bytes [-Wframe-larger-than=]
>>> drivers/staging/rtl8192u/ieee80211/ieee80211_crypt_tkip.c:528:1: warning: the frame size of 1136 bytes is larger than 1024 bytes [-Wframe-larger-than=]
>>> drivers/staging/rtl8192e/rtllib_crypt_tkip.c:531:1: warning: the frame size of 1136 bytes is larger than 1024 bytes [-Wframe-larger-than=]
>>>
>>> This bumps the affected objects by 20% to silence the warnings while still
>>> providing coverage if anything grows even more.
>>>
>>> Signed-off-by: Kees Cook <keescook@chromium.org>
>>
>> I think this is a dangerous precedent. I wouldn't really want any of those
>> functions to ever take more than 1024 bytes; even that is really too much,
>> but we can't easily lower the global limit.
>
> The issue is that these are _already_ able to use this much stack
> because of the VLA. It was just hidden from the FRAME_WARN checks.

Yes, of course.

>> You are patching all users of AHASH_REQUEST_ON_STACK with the exception of
>> arch/x86/power/hibernate_64.c here (which is always used on 64-bit and has
>> a larger limit already), which in turn suggests that the AHASH_REQUEST_ON_STACK
>> macro using bytes is just fundamentally broken by requiring that much space
>> (808 bytes for the context, plus 8 pointers for struct ahash_request, plus
>> CRYPTO_MINALIGN_ATTR).
>
> Yes -- it's huge. That's always been true, unfortunately.
>
>> How did you come up with that 808 byte number? I see a total of 39 callers
>> of crypto_ahash_set_reqsize(), did you check all of those individually?
>> If 808 bytes is the worst case, what are the next 5 ones? If there are only
>> a few of them that are badly written, maybe we can fix the drivers instead
>> and lower that number to something more reasonable.
>
> That was discussed a bit (maybe not enough?) in the next patch:
> https://patchwork.kernel.org/patch/10520407/
>
> I used tcrypt (which examines all sane combinations) and sha512
> produces the 808 number. I had done an earlier manual evaluation of
> all crypto_ahash_set_reqsize() callers but Herbert and Eric pointed
> out issues with my methodology (namely that things can be recursively
> stacked (I had calculated too low) but some things will never be
> stacked together (so some pathological conditions will never happen)).
> So I moved to the tcrypt instrumentation approach, which tests
> real-world combinations.
>
> For example, reaching this 808 size is trivially easy to do right now
> by just asking for dm-crypt to use a cipher of
> capi:cbc(aes)-essiv:sha512.

Ok, but is there anything that can be done to the sha512
implementation to lower that number? E.g. if a significant chunk
of struct sha512_hash_ctx is only used to hold temporary data,
could it be replaced with e.g. a percpu buffer?

>> The other ones I looked at seem to all be well under 400 bytes (which is
>> still a lot to put on the stack, but probably ok).
>
> I wish sha512 was "rare", but it's not. :(

Looking at the callers of crypto_ahash_set_reqsize(), it appears
that the only instance that is so bad is specifically
arch/x86/crypto/sha512-mb/sha512_mb.c, which is architecture
specific, and only one of multiple implementations of sha512.

Am I misreading that code, or does that mean that we could get
away with using the 808 byte limit only on x86 when
CONFIG_CRYPTO_SHA512_MB is enabled, but using a smaller
limit everywhere else?

> So: mainly the crypto VLA removal is about exposing all these giant
> stack usages. We can work to fix them, but I want to get these fixed
> so we can add -Wvla to the kernel to avoid more being added (we've had
> at least 2 added during this linux-next cycle already).
>
> IMO, we're much better off with this stack usage _actually_ being
> checked (even with a 20% bump) than staying entirely hidden (as it's
> been).

Yes, definitely. You may recall that I spent several months tracking
down all drivers that grew to insane stack usage when CONFIG_KASAN
was enabled, so we could again turn on the existing stack check
in an allmodconfig build in order to find the normal regressions, so
I'm definitely all for improving both the actual usage and the kind
of diagnostic we have available.

I mainly want to ensure that we have tried anything within reason
to reduce the stack usage of the AHASH_REQUEST_ON_STACK()
users before we resort to changing the warning limit. I'm not
convinced that everything has been tried if we have 808 byte
structures.

      Arnd
Herbert Xu July 13, 2018, 12:40 a.m. UTC | #4
On Thu, Jul 12, 2018 at 06:02:26PM +0200, Arnd Bergmann wrote:
>
> Looking through some of the drivers, I found this interesting one:

As I said before these patches are fundamentally broken.  Users
of AHASH_REQUEST_ON_STACK can only use sync algorithm providers
and therefore drivers are irrelevant.

Cheers,
Kees Cook July 13, 2018, 3:33 a.m. UTC | #5
On Thu, Jul 12, 2018 at 5:40 PM, Herbert Xu <herbert@gondor.apana.org.au> wrote:
> On Thu, Jul 12, 2018 at 06:02:26PM +0200, Arnd Bergmann wrote:
>>
>> Looking through some of the drivers, I found this interesting one:
>
> As I said before these patches are fundamentally broken.  Users
> of AHASH_REQUEST_ON_STACK can only use sync algorithm providers
> and therefore drivers are irrelevant.

I don't understand what this means. Can you give an example of what
you want to see happen that will accomplish the VLA removals?

-Kees
Herbert Xu July 13, 2018, 3:44 a.m. UTC | #6
On Thu, Jul 12, 2018 at 08:33:24PM -0700, Kees Cook wrote:
> On Thu, Jul 12, 2018 at 5:40 PM, Herbert Xu <herbert@gondor.apana.org.au> wrote:
> > On Thu, Jul 12, 2018 at 06:02:26PM +0200, Arnd Bergmann wrote:
> >>
> >> Looking through some of the drivers, I found this interesting one:
> >
> > As I said before these patches are fundamentally broken.  Users
> > of AHASH_REQUEST_ON_STACK can only use sync algorithm providers
> > and therefore drivers are irrelevant.
> 
> I don't understand what this means. Can you give an example of what
> you want to see happen that will accomplish the VLA removals?

Any algorithm that is async must be ignored when you're calculating
the maximum on-stack size of the request.  For example, sha512-mb
is marked as async and therefore must not be used in conjunction
with AHASH_REQUEST_ON_STACK.

Cheers,
Kees Cook July 13, 2018, 5:17 a.m. UTC | #7
On Thu, Jul 12, 2018 at 8:44 PM, Herbert Xu <herbert@gondor.apana.org.au> wrote:
> On Thu, Jul 12, 2018 at 08:33:24PM -0700, Kees Cook wrote:
>> On Thu, Jul 12, 2018 at 5:40 PM, Herbert Xu <herbert@gondor.apana.org.au> wrote:
>> > On Thu, Jul 12, 2018 at 06:02:26PM +0200, Arnd Bergmann wrote:
>> >>
>> >> Looking through some of the drivers, I found this interesting one:
>> >
>> > As I said before these patches are fundamentally broken.  Users
>> > of AHASH_REQUEST_ON_STACK can only use sync algorithm providers
>> > and therefore drivers are irrelevant.
>>
>> I don't understand what this means. Can you give an example of what
>> you want to see happen that will accomplish the VLA removals?
>
> Any algorithm that is async must be ignored when you're calculating
> the maximum on-stack size of the request.  For example, sha512-mb
> is marked as async and therefore must not be used in conjunction
> with AHASH_REQUEST_ON_STACK.

Then why does the instrumented tcrypt output show the huge size? Is
tcrypt doing something incorrectly?

What is the correct value to use for AHASH_REQUEST_ON_STACK?

-Kees
Herbert Xu July 13, 2018, 5:20 a.m. UTC | #8
On Thu, Jul 12, 2018 at 10:17:29PM -0700, Kees Cook wrote:
>
> Then why does the instrumented tcrypt output show the huge size? Is
> tcrypt doing something incorrectly?

tcrypt doesn't even use AHASH_REQUEST_ON_STACK so I don't understand
your point.

> What is the correct value to use for AHASH_REQUEST_ON_STACK?

As I said to arrive at a fixed value you should examine all sync
ahash algorithms (e.g., all shash ones plus ahash ones marked as
sync if there are any).

Cheers,
Kees Cook July 13, 2018, 6 a.m. UTC | #9
On Thu, Jul 12, 2018 at 10:20 PM, Herbert Xu
<herbert@gondor.apana.org.au> wrote:
> On Thu, Jul 12, 2018 at 10:17:29PM -0700, Kees Cook wrote:
>>
>> Then why does the instrumented tcrypt output show the huge size? Is
>> tcrypt doing something incorrectly?
>
> tcrypt doesn't even use AHASH_REQUEST_ON_STACK so I don't understand
> your point.

It's using crypto_ahash_set_reqsize(), which is what
AHASH_REQUEST_ON_STACK() reads back via crypto_ahash_reqsize() (i.e.
tfm->reqsize). It sounds like you're saying that there are cases where
an ahash is constructed (and will call crypto_ahash_set_reqsize()) but
where it cannot be used with AHASH_REQUEST_ON_STACK()? What actually
enforces this, since there will be a difference between
crypto_ahash_set_reqsize() (as seen with sha512-mb) and the actually
allowed stack usage? (i.e. where should I perform a check against the
new fixed value?)

>> What is the correct value to use for AHASH_REQUEST_ON_STACK?
>
> As I said to arrive at a fixed value you should examine all sync
> ahash algorithms (e.g., all shash ones plus ahash ones marked as
> sync if there are any).

The "value" for the ahash I understand: it has a request size
(tfm->reqsize, as set by crypto_ahash_set_reqsize()). But what should be
used to measure the shash value? (And how does this relate to the
value returned by crypto_ahash_reqsize()?) The closest clue I can find
is this:

crypto_init_shash_ops_async() does:
        crt->reqsize = sizeof(struct shash_desc) + crypto_shash_descsize(shash);

and that gets called from crypto_ahash_init_tfm(), so if it starts
with the above reqsize and adds to it with a call to
crypto_ahash_set_reqsize() later, we'll have that maximum?

So, do I want to calculate this answer as:

sizeof(struct shash_desc) + max(shash descsize) + max(ahash-sync reqsize) ?
16 + 360 + 0

It's 0 above because if I look at all the callers of
crypto_ahash_set_reqsize() that do wrapping, all are ASYNC.

So, should this really just be 376? Where is best to validate this
size, as it seems checking in crypto_ahash_set_reqsize() is
inappropriate?
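
The calculation above can be written out as a small sketch. The three constants are the figures quoted in this thread (16, 360 and 0), not values read from kernel headers, and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Figures quoted in this thread; assumptions, not header values. */
#define SHASH_DESC_BYTES       16  /* sizeof(struct shash_desc) */
#define MAX_SHASH_DESCSIZE     360 /* largest shash descsize found */
#define MAX_SYNC_AHASH_REQSIZE 0   /* all wrapping callers are ASYNC */

/* Proposed fixed bound for a sync-only on-stack ahash request context. */
size_t max_sync_ahash_reqsize(void)
{
	return SHASH_DESC_BYTES + MAX_SHASH_DESCSIZE + MAX_SYNC_AHASH_REQSIZE;
}

/* Hypothetical validation: does a given reqsize fit under the bound? */
int reqsize_fits_on_stack(size_t reqsize)
{
	return reqsize <= max_sync_ahash_reqsize();
}
```

Under these assumptions the bound is 376 bytes, and the 808-byte sha512-mb request would be rejected by such a check.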

-Kees
Kees Cook July 13, 2018, 6:16 a.m. UTC | #10
On Thu, Jul 12, 2018 at 5:40 PM, Herbert Xu <herbert@gondor.apana.org.au> wrote:
> On Thu, Jul 12, 2018 at 06:02:26PM +0200, Arnd Bergmann wrote:
>>
>> Looking through some of the drivers, I found this interesting one:
>
> As I said before these patches are fundamentally broken.  Users
> of AHASH_REQUEST_ON_STACK can only use sync algorithm providers
> and therefore drivers are irrelevant.

I've also now gone to look at the few users of AHASH_REQUEST_ON_STACK,
and it seems like they come in two flavors:

- ones that can be trivially converted to shash (hibernate)
- things that use scatter/gather

Is this correct? It seems like you did the bulk of
AHASH_REQUEST_ON_STACK conversions in 2016. Can shash grow an sg
interface?

-Kees
Herbert Xu July 13, 2018, 6:22 a.m. UTC | #11
On Thu, Jul 12, 2018 at 11:16:28PM -0700, Kees Cook wrote:
>
> Is this correct? It seems like you did the bulk of
> AHASH_REQUEST_ON_STACK conversions in 2016. Can shash grow an sg
> interface?

shash does not need to grow an sg interface.  All users of
AHASH_REQUEST_ON_STACK set the CRYPTO_ALG_ASYNC flag to zero
when allocating the tfm.
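
A userspace sketch of what "setting the flag to zero" means in practice: the caller passes type 0 with CRYPTO_ALG_ASYNC in the mask, for example crypto_alloc_ahash(name, 0, CRYPTO_ALG_ASYNC). The function below is a simplified model of the flag comparison done at lookup time, written on the assumption that an algorithm is skipped when its flags differ from the requested type in any masked bit; alg_matches() is an illustrative name, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

#define CRYPTO_ALG_ASYNC 0x00000080u

/*
 * Simplified model of the type/mask test: an algorithm is acceptable
 * only if its flags agree with 'type' in every bit selected by 'mask'.
 */
int alg_matches(uint32_t cra_flags, uint32_t type, uint32_t mask)
{
	return ((cra_flags ^ type) & mask) == 0;
}
```

With type 0 and mask CRYPTO_ALG_ASYNC, an async implementation such as sha512-mb never matches, which is why its request size need not count toward the on-stack maximum.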

Cheers,
Arnd Bergmann July 13, 2018, 10:14 a.m. UTC | #12
On Fri, Jul 13, 2018 at 8:00 AM, Kees Cook <keescook@chromium.org> wrote:
> On Thu, Jul 12, 2018 at 10:20 PM, Herbert Xu
> <herbert@gondor.apana.org.au> wrote:
>> On Thu, Jul 12, 2018 at 10:17:29PM -0700, Kees Cook wrote:
>>> What is the correct value to use for AHASH_REQUEST_ON_STACK?
>>
>> As I said to arrive at a fixed value you should examine all sync
>> ahash algorithms (e.g., all shash ones plus ahash ones marked as
>> sync if there are any).
>
> The "value" for the ahash I understand: it has a request size
> (tfm->reqsize, as set by crypto_ahash_set_reqsize()). But what should be
> used to measure the shash value? (And how does this relate to the
> value returned by crypto_ahash_reqsize()?) The closest clue I can find
> is this:
>
> crypto_init_shash_ops_async() does:
>         crt->reqsize = sizeof(struct shash_desc) + crypto_shash_descsize(shash);
>
> and that gets called from crypto_ahash_init_tfm(), so if it starts
> with the above reqsize and adds to it with a call to
> crypto_ahash_set_reqsize() later, we'll have that maximum?
>
> So, do I want to calculate this answer as:
>
> sizeof(struct shash_desc) + max(shash descsize) + max(ahash-sync reqsize) ?
> 16 + 360 + 0

I arrived at the same number, looking at all the sizes in shash.
The largest I found are sha3_state (360 bytes) and s390_sha_ctx
(336 bytes); everything else is way smaller.

> It's 0 above because if I look at all the callers of
> crypto_ahash_set_reqsize() that do wrapping, all are ASYNC.
>
> So, should this really just be 376? Where is best to validate this
> size, as it seems checking in crypto_ahash_set_reqsize() is
> inappropriate?

How about crypto_init_shash_ops_async()?

      Arnd
Kees Cook July 14, 2018, 3:07 a.m. UTC | #13
On Thu, Jul 12, 2018 at 11:22 PM, Herbert Xu
<herbert@gondor.apana.org.au> wrote:
> On Thu, Jul 12, 2018 at 11:16:28PM -0700, Kees Cook wrote:
>>
>> Is this correct? It seems like you did the bulk of
>> AHASH_REQUEST_ON_STACK conversions in 2016. Can shash grow an sg
>> interface?
>
> shash does not need to grow an sg interface.  All users of
> AHASH_REQUEST_ON_STACK set the CRYPTO_ALG_ASYNC flag to zero
> when allocating the tfm.

On a plane today I started converting all these to shash. IIUC, it
just looks like this (apologies for whitespace damage):


 static int crypt_iv_essiv_init(struct crypt_config *cc)
 {
        struct iv_essiv_private *essiv = &cc->iv_gen_private.essiv;
-       AHASH_REQUEST_ON_STACK(req, essiv->hash_tfm);
-       struct scatterlist sg;
+       SHASH_DESC_ON_STACK(desc, essiv->hash_tfm);
        struct crypto_cipher *essiv_tfm;
        int err;

-       sg_init_one(&sg, cc->key, cc->key_size);
-       ahash_request_set_tfm(req, essiv->hash_tfm);
-       ahash_request_set_callback(req, CRYPTO_TFM_REQ_MAY_SLEEP, NULL, NULL);
-       ahash_request_set_crypt(req, &sg, essiv->salt, cc->key_size);
+       desc->tfm = essiv->hash_tfm;
+       desc->flags = CRYPTO_TFM_REQ_MAY_SLEEP;

-       err = crypto_ahash_digest(req);
-       ahash_request_zero(req);
+       err = crypto_shash_digest(desc, cc->key, cc->key_size, essiv->salt);
+       shash_desc_zero(desc);
        if (err)
                return err;


(I left out all the s/ahash/shash/ in types and function declarations.)

Does this look like what you were thinking of for converting these
away from ahash? The only one I couldn't make sense of was in
drivers/crypto/inside-secure/safexcel_hash.c. I have no idea what's
happening there.

-Kees
Herbert Xu July 15, 2018, 2:44 a.m. UTC | #14
On Fri, Jul 13, 2018 at 08:07:10PM -0700, Kees Cook wrote:
>
> On a plane today I started converting all these to shash. IIUC, it
> just looks like this (apologies for whitespace damage):

Yes if it doesn't actually make use of SGs then shash would be
the way to go.  However, for SG users ahash is the best interface.

Cheers,
Kees Cook July 15, 2018, 2:59 a.m. UTC | #15
On Sat, Jul 14, 2018 at 7:44 PM, Herbert Xu <herbert@gondor.apana.org.au> wrote:
> On Fri, Jul 13, 2018 at 08:07:10PM -0700, Kees Cook wrote:
>>
>> On a plane today I started converting all these to shash. IIUC, it
>> just looks like this (apologies for whitespace damage):
>
> Yes if it doesn't actually make use of SGs then shash would be
> the way to go.  However, for SG users ahash is the best interface.

Nearly all of them artificially build an sg explicitly to use the
ahash interface. :P

So, I'll take that as a "yes, do these conversions." :) Thanks!

-Kees
Kees Cook July 15, 2018, 4:28 a.m. UTC | #16
On Fri, Jul 13, 2018 at 3:14 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Fri, Jul 13, 2018 at 8:00 AM, Kees Cook <keescook@chromium.org> wrote:
>> On Thu, Jul 12, 2018 at 10:20 PM, Herbert Xu
>> <herbert@gondor.apana.org.au> wrote:
>>> On Thu, Jul 12, 2018 at 10:17:29PM -0700, Kees Cook wrote:
>>>> What is the correct value to use for AHASH_REQUEST_ON_STACK?
>>>
>>> As I said to arrive at a fixed value you should examine all sync
>>> ahash algorithms (e.g., all shash ones plus ahash ones marked as
>>> sync if there are any).
>>
>> The "value" for the ahash I understand: it has a request size
>> (tfm->reqsize, as set by crypto_ahash_set_reqsize()). But what should be
>> used to measure the shash value? (And how does this relate to the
>> value returned by crypto_ahash_reqsize()?) The closest clue I can find
>> is this:
>>
>> crypto_init_shash_ops_async() does:
>>         crt->reqsize = sizeof(struct shash_desc) + crypto_shash_descsize(shash);
>>
>> and that gets called from crypto_ahash_init_tfm(), so if it starts
>> with the above reqsize and adds to it with a call to
>> crypto_ahash_set_reqsize() later, we'll have that maximum?
>>
>> So, do I want to calculate this answer as:
>>
>> sizeof(struct shash_desc) + max(shash descsize) + max(ahash-sync reqsize) ?
>> 16 + 360 + 0
>
> I arrived at the same number, looking at all the sizes in shash.
> The largest I found are sha3_state (360 bytes) and s390_sha_ctx
> (336 bytes); everything else is way smaller.

Excellent. Thanks for double-checking this. :)

>
>> It's 0 above because if I look at all the callers of
>> crypto_ahash_set_reqsize() that do wrapping, all are ASYNC.
>>
>> So, should this really just be 376? Where is best to validate this
>> size, as it seems checking in crypto_ahash_set_reqsize() is
>> inappropriate?
>
> How about crypto_init_shash_ops_async()?

Ah yes, that looks good. Nice find!

After my ahash to shash conversions, only ccm is left as an ahash
user, since it actually uses sg. But with the hard-coded value reduced
to 376, this doesn't trip the frame warnings any more. :)

I'll send an updated series soon.

-Kees
Herbert Xu July 16, 2018, 12:01 a.m. UTC | #17
On Sat, Jul 14, 2018 at 07:59:09PM -0700, Kees Cook wrote:
> On Sat, Jul 14, 2018 at 7:44 PM, Herbert Xu <herbert@gondor.apana.org.au> wrote:
> > On Fri, Jul 13, 2018 at 08:07:10PM -0700, Kees Cook wrote:
> >>
> >> On a plane today I started converting all these to shash. IIUC, it
> >> just looks like this (apologies for whitespace damage):
> >
> > Yes if it doesn't actually make use of SGs then shash would be
> > the way to go.  However, for SG users ahash is the best interface.
> 
> Nearly all of them artificially build an sg explicitly to use the
> ahash interface. :P
> 
> So, I'll take that as a "yes, do these conversions." :) Thanks!

Yeah anything that's doing a single-element SG list should just
be converted.

Thanks,
Kees Cook July 16, 2018, 3:39 a.m. UTC | #18
On Sun, Jul 15, 2018 at 5:01 PM, Herbert Xu <herbert@gondor.apana.org.au> wrote:
> On Sat, Jul 14, 2018 at 07:59:09PM -0700, Kees Cook wrote:
>> On Sat, Jul 14, 2018 at 7:44 PM, Herbert Xu <herbert@gondor.apana.org.au> wrote:
>> > On Fri, Jul 13, 2018 at 08:07:10PM -0700, Kees Cook wrote:
>> >>
>> >> On a plane today I started converting all these to shash. IIUC, it
>> >> just looks like this (apologies for whitespace damage):
>> >
>> > Yes if it doesn't actually make use of SGs then shash would be
>> > the way to go.  However, for SG users ahash is the best interface.
>>
>> Nearly all of them artificially build an sg explicitly to use the
>> ahash interface. :P
>>
>> So, I'll take that as a "yes, do these conversions." :) Thanks!
>
> Yeah anything that's doing a single-element SG list should just
> be converted.

There are a few that use multiple-element SG lists, but in each case it's
a locally allocated array of SGs filled with data. All are easily replaced
with calls to ..._update() instead of sg helpers. For example
net/wireless/lib80211_crypt_tkip.c:

-       sg_init_table(sg, 2);
-       sg_set_buf(&sg[0], hdr, 16);
-       sg_set_buf(&sg[1], data, data_len);
...
-       ahash_request_set_tfm(req, tfm_michael);
-       ahash_request_set_callback(req, 0, NULL, NULL);
-       ahash_request_set_crypt(req, sg, mic, data_len + 16);
-       err = crypto_ahash_digest(req);
-       ahash_request_zero(req);
+       err = crypto_shash_init(desc);
+       if (err)
+               goto out;
+       err = crypto_shash_update(desc, hdr, 16);
+       if (err)
+               goto out;
+       err = crypto_shash_update(desc, data, data_len);
+       if (err)
+               goto out;
+       err = crypto_shash_final(desc, mic);
+
+out:
+       shash_desc_zero(desc);
        return err;
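
The conversion above relies on the fact that feeding a hash the header and the payload in two update calls gives the same result as hashing them as one contiguous message. The sketch below demonstrates that property in userspace with a toy FNV-1a hash standing in for the real algorithm; every name here is hypothetical and none of it is kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy streaming hash (32-bit FNV-1a), a stand-in for a shash descriptor. */
struct toy_desc {
	uint32_t state;
};

void toy_init(struct toy_desc *d)
{
	d->state = 2166136261u; /* FNV offset basis */
}

void toy_update(struct toy_desc *d, const uint8_t *p, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		d->state ^= p[i];
		d->state *= 16777619u; /* FNV prime */
	}
}

uint32_t toy_final(const struct toy_desc *d)
{
	return d->state;
}

/* One-shot digest over a single contiguous buffer. */
uint32_t toy_digest(const uint8_t *p, size_t n)
{
	struct toy_desc d;

	toy_init(&d);
	toy_update(&d, p, n);
	return toy_final(&d);
}

/* Returns 1 if init/update/update/final equals the one-shot digest. */
int split_matches_oneshot(void)
{
	uint8_t hdr[16], data[32], joined[48];
	struct toy_desc d;

	memset(hdr, 0xA5, sizeof(hdr));
	memset(data, 0x3C, sizeof(data));
	memcpy(joined, hdr, sizeof(hdr));
	memcpy(joined + sizeof(hdr), data, sizeof(data));

	toy_init(&d);
	toy_update(&d, hdr, sizeof(hdr));
	toy_update(&d, data, sizeof(data));
	return toy_final(&d) == toy_digest(joined, sizeof(joined));
}
```

The same reasoning is what allows a two-element scatterlist passed to a single digest call to be replaced by two update calls on the pieces.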

-Kees
Arnd Bergmann July 16, 2018, 7:24 a.m. UTC | #19
On Mon, Jul 16, 2018 at 5:39 AM, Kees Cook <keescook@chromium.org> wrote:
> On Sun, Jul 15, 2018 at 5:01 PM, Herbert Xu <herbert@gondor.apana.org.au> wrote:
>> On Sat, Jul 14, 2018 at 07:59:09PM -0700, Kees Cook wrote:
>>> On Sat, Jul 14, 2018 at 7:44 PM, Herbert Xu <herbert@gondor.apana.org.au> wrote:
>>> > On Fri, Jul 13, 2018 at 08:07:10PM -0700, Kees Cook wrote:
>>> >>
>>> >> On a plane today I started converting all these to shash. IIUC, it
>>> >> just looks like this (apologies for whitespace damage):
>>> >
>>> > Yes if it doesn't actually make use of SGs then shash would be
>>> > the way to go.  However, for SG users ahash is the best interface.
>>>
>>> Nearly all of them artificially build an sg explicitly to use the
>>> ahash interface. :P
>>>
>>> So, I'll take that as a "yes, do these conversions." :) Thanks!
>>
>> Yeah anything that's doing a single-element SG list should just
>> be converted.
>
> There are a few that use a multiple-element SG list, but in each case
> it's a locally allocated array of SGs that is then filled with data.
> All are easily replaced with plain calls to ..._update() instead of the
> sg helpers. For example, net/wireless/lib80211_crypt_tkip.c:
>
> -       sg_init_table(sg, 2);
> -       sg_set_buf(&sg[0], hdr, 16);
> -       sg_set_buf(&sg[1], data, data_len);
> ...
> -       ahash_request_set_tfm(req, tfm_michael);
> -       ahash_request_set_callback(req, 0, NULL, NULL);
> -       ahash_request_set_crypt(req, sg, mic, data_len + 16);
> -       err = crypto_ahash_digest(req);
> -       ahash_request_zero(req);
> +       err = crypto_shash_init(desc);
> +       if (err)
> +               goto out;
> +       err = crypto_shash_update(desc, hdr, 16);
> +       if (err)
> +               goto out;
> +       err = crypto_shash_update(desc, data, data_len);
> +       if (err)
> +               goto out;
> +       err = crypto_shash_final(desc, mic);
> +
> +out:
> +       shash_desc_zero(desc);
>         return err;

There may be a little overhead in calling crypto_shash_update()/
crypto_shash_final() repeatedly compared to calling
crypto_ahash_digest() once. It's probably no worse (or maybe
better) in this case, since we call only three times and there
is less indirection, but if there are any cases with a long sglist,
it would be good to measure the performance difference.

       Arnd
Arnd Bergmann July 17, 2018, 8:59 p.m. UTC | #20
On Sun, Jul 15, 2018 at 6:28 AM, Kees Cook <keescook@chromium.org> wrote:
>
> After my ahash to shash conversions, only ccm is left as an ahash
> user, since it actually uses sg. But with the hard-coded value reduced
> to 376, this doesn't trip the frame warnings any more. :)
>
> I'll send an updated series soon.

Maybe we should get rid of that one as well then and remove
AHASH_REQUEST_ON_STACK()?

I see that Ard (now on Cc) added this usage only recently. Looking
at the code some more, I also find that the descsize is probably
much smaller than 376 for all possible cases of "cbcmac(*)",
either alg->cra_blocksize plus a few bytes or sizeof(mac_desc_ctx)
(i.e. 20) for arch/arm64/crypto/aes-glue.c.

Walking the sglist here means open-coding a shash_ahash_update()
implementation in crypto_ccm_auth(), but that doesn't seem to
add much complexity over what it already has to do to chain
the sglist today.

      Arnd
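
To make the "walk the sglist and update per segment" idea concrete, here is
a minimal userland sketch. It only models the loop: struct toy_seg stands in
for a scatterlist entry, the hash is a trivial multiply-and-add placeholder,
and the page mapping a real in-kernel walk needs is left out. All toy_*
names are assumptions for illustration, not existing kernel symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one scatterlist entry. */
struct toy_seg {
	const uint8_t *buf;
	size_t len;
};

/* Hypothetical running hash state. */
struct toy_hash {
	uint32_t state;
};

static void toy_hash_init(struct toy_hash *h)
{
	h->state = 17;
}

static void toy_hash_update(struct toy_hash *h, const uint8_t *p, size_t len)
{
	for (size_t i = 0; i < len; i++)
		h->state = h->state * 31u + p[i];
}

/*
 * Feed at most nbytes from the segment list into the hash, one segment
 * at a time. Returns the number of bytes actually consumed.
 */
static size_t toy_hash_walk(struct toy_hash *h, const struct toy_seg *segs,
			    size_t nsegs, size_t nbytes)
{
	size_t done = 0;

	assert(segs != NULL || nsegs == 0);

	for (size_t i = 0; i < nsegs && done < nbytes; i++) {
		size_t n = segs[i].len;

		/* The request length may end in the middle of a segment. */
		if (n > nbytes - done)
			n = nbytes - done;

		toy_hash_update(h, segs[i].buf, n);
		done += n;
	}

	return done;
}
```

Because the state is carried across calls, the walked result should match a
single update over the same bytes laid out contiguously, regardless of where
the segment boundaries fall.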
Ard Biesheuvel July 18, 2018, 2:50 p.m. UTC | #21
On 18 July 2018 at 05:59, Arnd Bergmann <arnd@arndb.de> wrote:
> On Sun, Jul 15, 2018 at 6:28 AM, Kees Cook <keescook@chromium.org> wrote:
>>
>> After my ahash to shash conversions, only ccm is left as an ahash
>> user, since it actually uses sg. But with the hard-coded value reduced
>> to 376, this doesn't trip the frame warnings any more. :)
>>
>> I'll send an updated series soon.
>
> Maybe we should get rid of that one as well then and remove
> AHASH_REQUEST_ON_STACK()?
>
> I see that Ard (now on Cc) added this usage only recently. Looking
> at the code some more, I also find that the descsize is probably
> much smaller than 376 for all possible cases of "cbcmac(*)",
> either alg->cra_blocksize plus a few bytes or sizeof(mac_desc_ctx)
> (i.e. 20) for arch/arm64/crypto/aes-glue.c.
>
> Walking the sglist here means open-coding a shash_ahash_update()
> implementation in crypto_ccm_auth(), but that doesn't seem to
> add much complexity over what it already has to do to chain
> the sglist today.
>

It would be better to add a variably sized ahash request member to
struct crypto_ccm_req_priv_ctx. The only problem is that the last
member of that struct (skreq) is already variably sized, so it would
involve having a struct ahash_request pointer pointing into the same
struct, after the skreq member.
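
A minimal userland sketch of that layout, under stated assumptions: the
toy_* types are hypothetical stand-ins for the real request structures, the
context sizes are passed in by the caller, and calloc() replaces whatever
mechanism the kernel would use to reserve the space. The point is only the
pointer arithmetic: the second variably sized region starts after the first
one, rounded up for alignment.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical request headers; their context bytes follow in memory. */
struct toy_skreq {
	size_t ctx_len;
};

struct toy_ahreq {
	size_t ctx_len;
};

/* Stand-in for the private context: skreq stays the last member. */
struct toy_priv {
	struct toy_ahreq *ahreq;	/* points into this same allocation */
	struct toy_skreq skreq;		/* followed by sk_ctx bytes */
};

static size_t toy_round_up(size_t n, size_t align)
{
	return (n + align - 1) / align * align;
}

static struct toy_priv *toy_priv_alloc(size_t sk_ctx, size_t ah_ctx)
{
	/* Offset of the ahash request: past skreq and its context. */
	size_t ah_off = toy_round_up(sizeof(struct toy_priv) + sk_ctx,
				     alignof(max_align_t));
	struct toy_priv *p;

	p = calloc(1, ah_off + sizeof(struct toy_ahreq) + ah_ctx);
	if (!p)
		return NULL;

	p->skreq.ctx_len = sk_ctx;
	p->ahreq = (struct toy_ahreq *)((unsigned char *)p + ah_off);
	assert((uintptr_t)p->ahreq % alignof(max_align_t) == 0);
	p->ahreq->ctx_len = ah_ctx;

	return p;
}
```

A caller would do something like toy_priv_alloc(40, 24) and release the
whole thing with a single free(), since both requests live in one
allocation.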

Patch

diff --git a/crypto/Makefile b/crypto/Makefile
index 6d1d40eeb964..a4487b61ac4e 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -85,6 +85,7 @@  obj-$(CONFIG_CRYPTO_CTR) += ctr.o
 obj-$(CONFIG_CRYPTO_KEYWRAP) += keywrap.o
 obj-$(CONFIG_CRYPTO_GCM) += gcm.o
 obj-$(CONFIG_CRYPTO_CCM) += ccm.o
+CFLAGS_ccm.o += $(FRAME_WARN_BUMP_FLAG)
 obj-$(CONFIG_CRYPTO_CHACHA20POLY1305) += chacha20poly1305.o
 obj-$(CONFIG_CRYPTO_AEGIS128) += aegis128.o
 obj-$(CONFIG_CRYPTO_AEGIS128L) += aegis128l.o
diff --git a/drivers/block/drbd/Makefile b/drivers/block/drbd/Makefile
index 8bd534697d1b..9b6184487cb4 100644
--- a/drivers/block/drbd/Makefile
+++ b/drivers/block/drbd/Makefile
@@ -7,3 +7,5 @@  drbd-y += drbd_nla.o
 drbd-$(CONFIG_DEBUG_FS) += drbd_debugfs.o
 
 obj-$(CONFIG_BLK_DEV_DRBD)     += drbd.o
+
+CFLAGS_drbd_worker.o += $(FRAME_WARN_BUMP_FLAG)
diff --git a/drivers/md/Makefile b/drivers/md/Makefile
index 822f4e8753bc..639ff6599846 100644
--- a/drivers/md/Makefile
+++ b/drivers/md/Makefile
@@ -47,6 +47,7 @@  obj-$(CONFIG_DM_UNSTRIPED)	+= dm-unstripe.o
 obj-$(CONFIG_DM_BUFIO)		+= dm-bufio.o
 obj-$(CONFIG_DM_BIO_PRISON)	+= dm-bio-prison.o
 obj-$(CONFIG_DM_CRYPT)		+= dm-crypt.o
+CFLAGS_dm-crypt.o		+= $(FRAME_WARN_BUMP_FLAG)
 obj-$(CONFIG_DM_DELAY)		+= dm-delay.o
 obj-$(CONFIG_DM_FLAKEY)		+= dm-flakey.o
 obj-$(CONFIG_DM_MULTIPATH)	+= dm-multipath.o dm-round-robin.o
diff --git a/drivers/net/ppp/Makefile b/drivers/net/ppp/Makefile
index 16c457d6b324..18f35e449c93 100644
--- a/drivers/net/ppp/Makefile
+++ b/drivers/net/ppp/Makefile
@@ -8,6 +8,7 @@  obj-$(CONFIG_PPP_ASYNC) += ppp_async.o
 obj-$(CONFIG_PPP_BSDCOMP) += bsd_comp.o
 obj-$(CONFIG_PPP_DEFLATE) += ppp_deflate.o
 obj-$(CONFIG_PPP_MPPE) += ppp_mppe.o
+CFLAGS_ppp_mppe.o += $(FRAME_WARN_BUMP_FLAG)
 obj-$(CONFIG_PPP_SYNC_TTY) += ppp_synctty.o
 obj-$(CONFIG_PPPOE) += pppox.o pppoe.o
 obj-$(CONFIG_PPPOL2TP) += pppox.o
diff --git a/drivers/staging/rtl8192e/Makefile b/drivers/staging/rtl8192e/Makefile
index 6af519938868..fde738cdf876 100644
--- a/drivers/staging/rtl8192e/Makefile
+++ b/drivers/staging/rtl8192e/Makefile
@@ -15,6 +15,7 @@  obj-$(CONFIG_RTLLIB) += rtllib.o
 
 obj-$(CONFIG_RTLLIB_CRYPTO_CCMP) += rtllib_crypt_ccmp.o
 obj-$(CONFIG_RTLLIB_CRYPTO_TKIP) += rtllib_crypt_tkip.o
+CFLAGS_rtllib_crypt_tkip.o += $(FRAME_WARN_BUMP_FLAG)
 obj-$(CONFIG_RTLLIB_CRYPTO_WEP) += rtllib_crypt_wep.o
 
 obj-$(CONFIG_RTL8192E) += rtl8192e/
diff --git a/drivers/staging/rtl8192u/Makefile b/drivers/staging/rtl8192u/Makefile
index 3022728a364c..ad059546df88 100644
--- a/drivers/staging/rtl8192u/Makefile
+++ b/drivers/staging/rtl8192u/Makefile
@@ -26,5 +26,6 @@  r8192u_usb-y := r8192U_core.o r8180_93cx6.o r8192U_wx.o		\
 		  ieee80211/rtl819x_TSProc.o				\
 		  ieee80211/rtl819x_BAProc.o				\
 		  ieee80211/dot11d.o
+CFLAGS_ieee80211_crypt_tkip.o += $(FRAME_WARN_BUMP_FLAG)
 
 obj-$(CONFIG_RTL8192U) += r8192u_usb.o
diff --git a/drivers/staging/rtl8192u/ieee80211/Makefile b/drivers/staging/rtl8192u/ieee80211/Makefile
index 0d4d6489f767..9f3a06674c1a 100644
--- a/drivers/staging/rtl8192u/ieee80211/Makefile
+++ b/drivers/staging/rtl8192u/ieee80211/Makefile
@@ -17,6 +17,7 @@  ieee80211-rsl-objs := ieee80211_rx.o \
 
 ieee80211_crypt-rsl-objs := ieee80211_crypt.o
 ieee80211_crypt_tkip-rsl-objs := ieee80211_crypt_tkip.o
+CFLAGS_ieee80211_crypt_tkip.o += $(FRAME_WARN_BUMP_FLAG)
 ieee80211_crypt_ccmp-rsl-objs := ieee80211_crypt_ccmp.o
 ieee80211_crypt_wep-rsl-objs := ieee80211_crypt_wep.o
 
diff --git a/net/wireless/Makefile b/net/wireless/Makefile
index 1d84f91bbfb0..f6af5a6233e1 100644
--- a/net/wireless/Makefile
+++ b/net/wireless/Makefile
@@ -4,6 +4,7 @@  obj-$(CONFIG_LIB80211) += lib80211.o
 obj-$(CONFIG_LIB80211_CRYPT_WEP) += lib80211_crypt_wep.o
 obj-$(CONFIG_LIB80211_CRYPT_CCMP) += lib80211_crypt_ccmp.o
 obj-$(CONFIG_LIB80211_CRYPT_TKIP) += lib80211_crypt_tkip.o
+CFLAGS_lib80211_crypt_tkip.o += $(FRAME_WARN_BUMP_FLAG)
 
 obj-$(CONFIG_WEXT_CORE) += wext-core.o
 obj-$(CONFIG_WEXT_PROC) += wext-proc.o