x86-64: Maintain 16-byte stack alignment

Message ID	CA+55aFxP+V6Wbq5Xw_NOksiWouEMg4gjBJgeGa-qFyxDMnTmcA@mail.gmail.com (mailing list archive)
State	Not Applicable
Delegated to:	Herbert Xu
Headers	show Return-Path: <linux-crypto-owner@kernel.org> MIME-Version: 1.0 In-Reply-To: <20170112140215.rh247gwk55fjzmg7@treble> References: <20170110143340.GA3787@gondor.apana.org.au> <20170110143913.GA3822@gondor.apana.org.au> <CA+55aFyuUCr3aUBhFt6T3vKyZOskLyH8Kd_2jeFkPYxxvEaXEg@mail.gmail.com> <20170111031124.GA4515@gondor.apana.org.au> <CA+55aFxJOzMim_d-O2E2yip8JWo0NdYs_72sNwFKSkTjy8q0Sw@mail.gmail.com> <CA+55aFzypqpMog8aTczWYok0MqOcMi4gGzCor6k-NdJzBoLWyQ@mail.gmail.com> <20170111043541.GA4944@gondor.apana.org.au> <CALCETrX_vWTWu=kh4Zawn2OuhPnLip8xTqBGjYU-BvpJ9=ruTg@mail.gmail.com> <CALCETrVb-NyKJajGEsLOrQqQrQ7cRm4sqLRxNr_yVE2-mjiDrg@mail.gmail.com> <20170112140215.rh247gwk55fjzmg7@treble> From: Linus Torvalds <torvalds@linux-foundation.org> Date: Thu, 12 Jan 2017 11:51:23 -0800 Message-ID: <CA+55aFxP+V6Wbq5Xw_NOksiWouEMg4gjBJgeGa-qFyxDMnTmcA@mail.gmail.com> Subject: Re: x86-64: Maintain 16-byte stack alignment To: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net>, Herbert Xu <herbert@gondor.apana.org.au>, Linux Kernel Mailing List <linux-kernel@vger.kernel.org>, Linux Crypto Mailing List <linux-crypto@vger.kernel.org>, Ingo Molnar <mingo@kernel.org>, Thomas Gleixner <tglx@linutronix.de>, Andy Lutomirski <luto@kernel.org>, Ard Biesheuvel <ard.biesheuvel@linaro.org> Content-Type: multipart/mixed; boundary=001a11409132d4d0050545eb0a98 Sender: linux-crypto-owner@vger.kernel.org Precedence: bulk

Linus Torvalds Jan. 12, 2017, 7:51 p.m. UTC

On Thu, Jan 12, 2017 at 6:02 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>
> Just to clarify, I think you're asking if, for versions of gcc which
> don't support -mpreferred-stack-boundary=3, objtool can analyze all C
> functions to ensure their stacks are 16-byte aligned.
>
> It's certainly possible, but I don't see how that solves the problem.
> The stack will still be misaligned by entry code.  Or am I missing
> something?

I think the argument is that we *could* try to align things, if we
just had some tool that actually then verified that we aren't missing
anything.

I'm not entirely happy with checking the generated code, though,
because as Ingo says, you have a 50:50 chance of just getting it right
by mistake. So I'd much rather have some static tool that checks
things at a code level (ie coccinelle or sparse).

Almost totally untested "sparse" patch appended. The problem with
sparse, obviously, is that few enough people run it, and it gives a
lot of other warnings. But maybe Herbert can test whether this would
actually have caught his situation, doing something like an
allmodconfig build with "C=2" to force a sparse run on everything, and
redirecting the warnings to stderr.

But this patch does seem to give a warning for the patch that Herbert
had, and that caused problems.

And in fact it seems to find a few other possible problems (most, but
not all, in crypto). This run was with the broken chacha20 patch
applied, to verify that I get a warning for that case:

   arch/x86/crypto/chacha20_glue.c:70:13: warning: symbol 'state' has
excessive alignment (16)
   arch/x86/crypto/aesni-intel_glue.c:724:12: warning: symbol 'iv' has
excessive alignment (16)
   arch/x86/crypto/aesni-intel_glue.c:803:12: warning: symbol 'iv' has
excessive alignment (16)
   crypto/shash.c:82:12: warning: symbol 'ubuf' has excessive alignment (16)
   crypto/shash.c:118:12: warning: symbol 'ubuf' has excessive alignment (16)
   drivers/char/hw_random/via-rng.c:89:14: warning: symbol 'buf' has
excessive alignment (16)
   net/bridge/netfilter/ebtables.c:1809:31: warning: symbol 'tinfo'
has excessive alignment (64)
   drivers/crypto/padlock-sha.c:85:14: warning: symbol 'buf' has
excessive alignment (16)
   drivers/crypto/padlock-sha.c:147:14: warning: symbol 'buf' has
excessive alignment (16)
   drivers/crypto/padlock-sha.c:304:12: warning: symbol 'buf' has
excessive alignment (16)
   drivers/crypto/padlock-sha.c:388:12: warning: symbol 'buf' has
excessive alignment (16)
   net/openvswitch/actions.c:797:33: warning: symbol 'ovs_rt' has
excessive alignment (64)
   drivers/net/ethernet/neterion/vxge/vxge-config.c:1006:38: warning:
symbol 'vpath' has excessive alignment (64)

although I think at least some of these happen to be ok.

There are a few places that clearly don't care about exact alignment,
and use "__attribute__((aligned))" without any specific alignment
value.

It's just sparse that thinks that implies 16-byte alignment (it
doesn't, really - it's unspecified, and is telling gcc to use "maximum
useful alignment", so who knows _what_ gcc will assume).

But some of them may well be real issues - if the alignment is about
correctness rather than anything else.

Anyway, the advantage of this kind of source-level check is that it
should really catch things regardless of "luck" wrt alignment.

                    Linus
flow.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

Andy Lutomirski Jan. 12, 2017, 8:08 p.m. UTC | #1

On Thu, Jan 12, 2017 at 11:51 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Jan 12, 2017 at 6:02 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>>
>> Just to clarify, I think you're asking if, for versions of gcc which
>> don't support -mpreferred-stack-boundary=3, objtool can analyze all C
>> functions to ensure their stacks are 16-byte aligned.
>>
>> It's certainly possible, but I don't see how that solves the problem.
>> The stack will still be misaligned by entry code.  Or am I missing
>> something?
>
> I think the argument is that we *could* try to align things, if we
> just had some tool that actually then verified that we aren't missing
> anything.
>
> I'm not entirely happy with checking the generated code, though,
> because as Ingo says, you have a 50:50 chance of just getting it right
> by mistake. So I'd much rather have some static tool that checks
> things at a code level (ie coccinelle or sparse).

What I meant was checking the entry code to see if it aligns stack
frames, and good luck getting sparse to do that.  Hmm, getting 16-byte
alignment for real may actually be entirely a lost cause.  After all,
I think we have some inline functions that do asm volatile ("call
..."), and I don't see any credible way of forcing alignment short of
generating an entirely new stack frame and aligning that.  Ick.  This
whole situation stinks, and I wish that the gcc developers had been
less daft here in the first place or that we'd noticed and gotten it
fixed much longer ago.

Can we come up with a macro like STACK_ALIGN_16 that turns into
__aligned__(32) on bad gcc versions and combine that with your sparse
patch?
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Josh Poimboeuf Jan. 12, 2017, 8:15 p.m. UTC | #2

On Thu, Jan 12, 2017 at 12:08:07PM -0800, Andy Lutomirski wrote:
> On Thu, Jan 12, 2017 at 11:51 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> > On Thu, Jan 12, 2017 at 6:02 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> >>
> >> Just to clarify, I think you're asking if, for versions of gcc which
> >> don't support -mpreferred-stack-boundary=3, objtool can analyze all C
> >> functions to ensure their stacks are 16-byte aligned.
> >>
> >> It's certainly possible, but I don't see how that solves the problem.
> >> The stack will still be misaligned by entry code.  Or am I missing
> >> something?
> >
> > I think the argument is that we *could* try to align things, if we
> > just had some tool that actually then verified that we aren't missing
> > anything.
> >
> > I'm not entirely happy with checking the generated code, though,
> > because as Ingo says, you have a 50:50 chance of just getting it right
> > by mistake. So I'd much rather have some static tool that checks
> > things at a code level (ie coccinelle or sparse).
> 
> What I meant was checking the entry code to see if it aligns stack
> frames, and good luck getting sparse to do that.  Hmm, getting 16-byte
> alignment for real may actually be entirely a lost cause.  After all,
> I think we have some inline functions that do asm volatile ("call
> ..."), and I don't see any credible way of forcing alignment short of
> generating an entirely new stack frame and aligning that.

Actually we already found all such cases and fixed them by forcing a new
stack frame, thanks to objtool.  For example, see 55a76b59b5fe.

> Ick.  This
> whole situation stinks, and I wish that the gcc developers had been
> less daft here in the first place or that we'd noticed and gotten it
> fixed much longer ago.
> 
> Can we come up with a macro like STACK_ALIGN_16 that turns into
> __aligned__(32) on bad gcc versions and combine that with your sparse
> patch?

Josh Poimboeuf Jan. 12, 2017, 8:55 p.m. UTC | #3

On Thu, Jan 12, 2017 at 02:15:11PM -0600, Josh Poimboeuf wrote:
> On Thu, Jan 12, 2017 at 12:08:07PM -0800, Andy Lutomirski wrote:
> > On Thu, Jan 12, 2017 at 11:51 AM, Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> > > On Thu, Jan 12, 2017 at 6:02 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > >>
> > >> Just to clarify, I think you're asking if, for versions of gcc which
> > >> don't support -mpreferred-stack-boundary=3, objtool can analyze all C
> > >> functions to ensure their stacks are 16-byte aligned.
> > >>
> > >> It's certainly possible, but I don't see how that solves the problem.
> > >> The stack will still be misaligned by entry code.  Or am I missing
> > >> something?
> > >
> > > I think the argument is that we *could* try to align things, if we
> > > just had some tool that actually then verified that we aren't missing
> > > anything.
> > >
> > > I'm not entirely happy with checking the generated code, though,
> > > because as Ingo says, you have a 50:50 chance of just getting it right
> > > by mistake. So I'd much rather have some static tool that checks
> > > things at a code level (ie coccinelle or sparse).
> > 
> > What I meant was checking the entry code to see if it aligns stack
> > frames, and good luck getting sparse to do that.  Hmm, getting 16-byte
> > alignment for real may actually be entirely a lost cause.  After all,
> > I think we have some inline functions that do asm volatile ("call
> > ..."), and I don't see any credible way of forcing alignment short of
> > generating an entirely new stack frame and aligning that.
> 
> Actually we already found all such cases and fixed them by forcing a new
> stack frame, thanks to objtool.  For example, see 55a76b59b5fe.
> 
> > Ick.  This
> > whole situation stinks, and I wish that the gcc developers had been
> > less daft here in the first place or that we'd noticed and gotten it
> > fixed much longer ago.
> > 
> > Can we come up with a macro like STACK_ALIGN_16 that turns into
> > __aligned__(32) on bad gcc versions and combine that with your sparse
> > patch?

This could work.  Only concerns I'd have are:

- Are there (or will there be in the future) any asm functions which
  assume a 16-byte aligned stack?  (Seems unlikely.  Stack alignment is
  common in the crypto code but they do the alignment manually.)

- Who's going to run sparse all the time to catch unauthorized users of
  __aligned__(16)?

Linus Torvalds Jan. 12, 2017, 9:40 p.m. UTC | #4

On Thu, Jan 12, 2017 at 12:55 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>
> - Who's going to run sparse all the time to catch unauthorized users of
>   __aligned__(16)?

Well, considering that we apparently only have a small handful of
existing users without anybody having ever run any tool at all, I
don't think this is necessarily a huge problem.

One of the build servers could easily add the "make C=2" case to a
build test, and just grep the error reports for the 'excessive
alignment' string. The zero-day build bot already does much fancier
things.

So I don't think it would necessarily be all that hard to get a clean
build, and just say "if you need aligned stack space, you have to do
it yourself by hand".

That saId, if we now always enable frame pointers on x86 (and it has
gotten more and more difficult to avoid it), then the 16-byte
alignment would fairly natural.

The 8-byte alignment mainly makes sense when the basic call sequence
just adds 8 bytes, and you have functions without frames (that still
call other functions).

                       Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Andy Lutomirski Jan. 13, 2017, 1:46 a.m. UTC | #5

On Thu, Jan 12, 2017 at 12:15 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> On Thu, Jan 12, 2017 at 12:08:07PM -0800, Andy Lutomirski wrote:
>> On Thu, Jan 12, 2017 at 11:51 AM, Linus Torvalds
>> <torvalds@linux-foundation.org> wrote:
>> > On Thu, Jan 12, 2017 at 6:02 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>> >>
>> >> Just to clarify, I think you're asking if, for versions of gcc which
>> >> don't support -mpreferred-stack-boundary=3, objtool can analyze all C
>> >> functions to ensure their stacks are 16-byte aligned.
>> >>
>> >> It's certainly possible, but I don't see how that solves the problem.
>> >> The stack will still be misaligned by entry code.  Or am I missing
>> >> something?
>> >
>> > I think the argument is that we *could* try to align things, if we
>> > just had some tool that actually then verified that we aren't missing
>> > anything.
>> >
>> > I'm not entirely happy with checking the generated code, though,
>> > because as Ingo says, you have a 50:50 chance of just getting it right
>> > by mistake. So I'd much rather have some static tool that checks
>> > things at a code level (ie coccinelle or sparse).
>>
>> What I meant was checking the entry code to see if it aligns stack
>> frames, and good luck getting sparse to do that.  Hmm, getting 16-byte
>> alignment for real may actually be entirely a lost cause.  After all,
>> I think we have some inline functions that do asm volatile ("call
>> ..."), and I don't see any credible way of forcing alignment short of
>> generating an entirely new stack frame and aligning that.
>
> Actually we already found all such cases and fixed them by forcing a new
> stack frame, thanks to objtool.  For example, see 55a76b59b5fe.

What I mean is: what guarantees that the stack is properly aligned for
the subroutine call?  gcc promises to set up a stack frame, but does
it promise that rsp will be properly aligned to call a C function?
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Josh Poimboeuf Jan. 13, 2017, 3:11 a.m. UTC | #6

On Thu, Jan 12, 2017 at 05:46:55PM -0800, Andy Lutomirski wrote:
> On Thu, Jan 12, 2017 at 12:15 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > On Thu, Jan 12, 2017 at 12:08:07PM -0800, Andy Lutomirski wrote:
> >> On Thu, Jan 12, 2017 at 11:51 AM, Linus Torvalds
> >> <torvalds@linux-foundation.org> wrote:
> >> > On Thu, Jan 12, 2017 at 6:02 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> >> >>
> >> >> Just to clarify, I think you're asking if, for versions of gcc which
> >> >> don't support -mpreferred-stack-boundary=3, objtool can analyze all C
> >> >> functions to ensure their stacks are 16-byte aligned.
> >> >>
> >> >> It's certainly possible, but I don't see how that solves the problem.
> >> >> The stack will still be misaligned by entry code.  Or am I missing
> >> >> something?
> >> >
> >> > I think the argument is that we *could* try to align things, if we
> >> > just had some tool that actually then verified that we aren't missing
> >> > anything.
> >> >
> >> > I'm not entirely happy with checking the generated code, though,
> >> > because as Ingo says, you have a 50:50 chance of just getting it right
> >> > by mistake. So I'd much rather have some static tool that checks
> >> > things at a code level (ie coccinelle or sparse).
> >>
> >> What I meant was checking the entry code to see if it aligns stack
> >> frames, and good luck getting sparse to do that.  Hmm, getting 16-byte
> >> alignment for real may actually be entirely a lost cause.  After all,
> >> I think we have some inline functions that do asm volatile ("call
> >> ..."), and I don't see any credible way of forcing alignment short of
> >> generating an entirely new stack frame and aligning that.
> >
> > Actually we already found all such cases and fixed them by forcing a new
> > stack frame, thanks to objtool.  For example, see 55a76b59b5fe.
> 
> What I mean is: what guarantees that the stack is properly aligned for
> the subroutine call?  gcc promises to set up a stack frame, but does
> it promise that rsp will be properly aligned to call a C function?

Yes, I did an experiment and you're right.  I had naively assumed that
all stack frames would be aligned.

Andy Lutomirski Jan. 13, 2017, 3:23 a.m. UTC | #7

On Thu, Jan 12, 2017 at 7:11 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> On Thu, Jan 12, 2017 at 05:46:55PM -0800, Andy Lutomirski wrote:
>> On Thu, Jan 12, 2017 at 12:15 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>> > On Thu, Jan 12, 2017 at 12:08:07PM -0800, Andy Lutomirski wrote:
>> >> On Thu, Jan 12, 2017 at 11:51 AM, Linus Torvalds
>> >> <torvalds@linux-foundation.org> wrote:
>> >> > On Thu, Jan 12, 2017 at 6:02 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>> >> >>
>> >> >> Just to clarify, I think you're asking if, for versions of gcc which
>> >> >> don't support -mpreferred-stack-boundary=3, objtool can analyze all C
>> >> >> functions to ensure their stacks are 16-byte aligned.
>> >> >>
>> >> >> It's certainly possible, but I don't see how that solves the problem.
>> >> >> The stack will still be misaligned by entry code.  Or am I missing
>> >> >> something?
>> >> >
>> >> > I think the argument is that we *could* try to align things, if we
>> >> > just had some tool that actually then verified that we aren't missing
>> >> > anything.
>> >> >
>> >> > I'm not entirely happy with checking the generated code, though,
>> >> > because as Ingo says, you have a 50:50 chance of just getting it right
>> >> > by mistake. So I'd much rather have some static tool that checks
>> >> > things at a code level (ie coccinelle or sparse).
>> >>
>> >> What I meant was checking the entry code to see if it aligns stack
>> >> frames, and good luck getting sparse to do that.  Hmm, getting 16-byte
>> >> alignment for real may actually be entirely a lost cause.  After all,
>> >> I think we have some inline functions that do asm volatile ("call
>> >> ..."), and I don't see any credible way of forcing alignment short of
>> >> generating an entirely new stack frame and aligning that.
>> >
>> > Actually we already found all such cases and fixed them by forcing a new
>> > stack frame, thanks to objtool.  For example, see 55a76b59b5fe.
>>
>> What I mean is: what guarantees that the stack is properly aligned for
>> the subroutine call?  gcc promises to set up a stack frame, but does
>> it promise that rsp will be properly aligned to call a C function?
>
> Yes, I did an experiment and you're right.  I had naively assumed that
> all stack frames would be aligned.

Just to check: did you do your experiment with -mpreferred-stack-boundary=4?

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Josh Poimboeuf Jan. 13, 2017, 4:27 a.m. UTC | #8

On Thu, Jan 12, 2017 at 07:23:18PM -0800, Andy Lutomirski wrote:
> On Thu, Jan 12, 2017 at 7:11 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > On Thu, Jan 12, 2017 at 05:46:55PM -0800, Andy Lutomirski wrote:
> >> On Thu, Jan 12, 2017 at 12:15 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> >> > On Thu, Jan 12, 2017 at 12:08:07PM -0800, Andy Lutomirski wrote:
> >> >> On Thu, Jan 12, 2017 at 11:51 AM, Linus Torvalds
> >> >> <torvalds@linux-foundation.org> wrote:
> >> >> > On Thu, Jan 12, 2017 at 6:02 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> >> >> >>
> >> >> >> Just to clarify, I think you're asking if, for versions of gcc which
> >> >> >> don't support -mpreferred-stack-boundary=3, objtool can analyze all C
> >> >> >> functions to ensure their stacks are 16-byte aligned.
> >> >> >>
> >> >> >> It's certainly possible, but I don't see how that solves the problem.
> >> >> >> The stack will still be misaligned by entry code.  Or am I missing
> >> >> >> something?
> >> >> >
> >> >> > I think the argument is that we *could* try to align things, if we
> >> >> > just had some tool that actually then verified that we aren't missing
> >> >> > anything.
> >> >> >
> >> >> > I'm not entirely happy with checking the generated code, though,
> >> >> > because as Ingo says, you have a 50:50 chance of just getting it right
> >> >> > by mistake. So I'd much rather have some static tool that checks
> >> >> > things at a code level (ie coccinelle or sparse).
> >> >>
> >> >> What I meant was checking the entry code to see if it aligns stack
> >> >> frames, and good luck getting sparse to do that.  Hmm, getting 16-byte
> >> >> alignment for real may actually be entirely a lost cause.  After all,
> >> >> I think we have some inline functions that do asm volatile ("call
> >> >> ..."), and I don't see any credible way of forcing alignment short of
> >> >> generating an entirely new stack frame and aligning that.
> >> >
> >> > Actually we already found all such cases and fixed them by forcing a new
> >> > stack frame, thanks to objtool.  For example, see 55a76b59b5fe.
> >>
> >> What I mean is: what guarantees that the stack is properly aligned for
> >> the subroutine call?  gcc promises to set up a stack frame, but does
> >> it promise that rsp will be properly aligned to call a C function?
> >
> > Yes, I did an experiment and you're right.  I had naively assumed that
> > all stack frames would be aligned.
> 
> Just to check: did you do your experiment with -mpreferred-stack-boundary=4?

Yes, but it's too late for me to be doing hard stuff and I think my
first experiment was bogus.  I didn't use all the other kernel-specific
gcc options.

I tried again with all the kernel gcc options, except with
-mpreferred-stack-boundary=4 instead of 3, and actually came up with the
opposite conclusion.

I used the following code:

void otherfunc(void);

static inline void bar(long *f)
{
	asm volatile("call otherfunc" : : "m" (f) : );
}

void foo(void)
{
	long buf[3] = {0, 0, 0};
	bar(buf);
}

The stack frame was always 16-byte aligned regardless of whether the
buf array size was even or odd.

So my half-asleep brain is telling me that my original assumption was
right.

Josh Poimboeuf Jan. 13, 2017, 5:07 a.m. UTC | #9

On Thu, Jan 12, 2017 at 08:37:18PM -0800, Linus Torvalds wrote:
> On Jan 12, 2017 8:28 PM, "Josh Poimboeuf" <jpoimboe@redhat.com> wrote:
> 
> 
> The stack frame was always 16-byte aligned regardless of whether the
> buf array size was even or odd.
> 
> 
> Including with -fomit-frame-pointer?
> 
> With frame pointers, stack frames really are naturally 16 bytes, and then
> keeping the frame 16-byte aligned is just a matter of making any extra
> frame allocations or push/pop sequences that you do also be a multiple of
> 16 bytes.
> 
> But *without* frame pointers, the"native" frame size is just 8 bytes, and a
> function that doesn't need any other local storage and then calls another
> function (think various trivial wrapper functions that just add an argument
> and then munge the return value) would thus naturally cause the frame to
> become misaligned.
> 
> So then the compiler actually needs to start adding useless instructions
> just to keep the stack 16-byte aligned.

Disabling frame pointers didn't seem to help, but I finally got it to
misalign with a different test case.  I think it had been aligning the
array, so instead I made it push a register.


void otherfunc(void);

static inline void bar(int f)
{
	register void *__sp asm(_ASM_SP);
	asm volatile("call otherfunc" : "+r" (__sp) : "b"(f));
}

void foo(void)
{
	bar(5);
}


00000000000020f0 <foo>:
    20f0:	55                   	push   %rbp
    20f1:	48 89 e5             	mov    %rsp,%rbp
    20f4:	53                   	push   %rbx
    20f5:	bb 05 00 00 00       	mov    $0x5,%ebx
    20fa:	e8 00 00 00 00       	callq  20ff <foo+0xf>
			20fb: R_X86_64_PC32	otherfunc-0x4
    20ff:	5b                   	pop    %rbx
    2100:	5d                   	pop    %rbp
    2101:	c3                   	retq   
    2102:	0f 1f 40 00          	nopl   0x0(%rax)
    2106:	66 2e 0f 1f 84 00 00 	nopw   %cs:0x0(%rax,%rax,1)
    210d:	00 00 00

Herbert Xu Jan. 13, 2017, 8:36 a.m. UTC | #10

On Thu, Jan 12, 2017 at 12:08:07PM -0800, Andy Lutomirski wrote:
>
> I think we have some inline functions that do asm volatile ("call
> ..."), and I don't see any credible way of forcing alignment short of
> generating an entirely new stack frame and aligning that.  Ick.  This

A straight asm call from C should always work because gcc keeps
the stack aligned in the prologue.

The only problem with inline assembly is when you start pushing
things onto the stack directly.

Cheers,

Herbert Xu Jan. 13, 2017, 8:38 a.m. UTC | #11

On Thu, Jan 12, 2017 at 01:40:54PM -0800, Linus Torvalds wrote:
> 
> The 8-byte alignment mainly makes sense when the basic call sequence
> just adds 8 bytes, and you have functions without frames (that still
> call other functions).

The question is does it really make sense to save those 8 bytes
of padding on x86-64 when arm64 apparently also requires 16-byte
stack alignment.

Cheers,

Herbert Xu Jan. 13, 2017, 8:39 a.m. UTC | #12

On Thu, Jan 12, 2017 at 05:46:55PM -0800, Andy Lutomirski wrote:
> 
> What I mean is: what guarantees that the stack is properly aligned for
> the subroutine call?  gcc promises to set up a stack frame, but does
> it promise that rsp will be properly aligned to call a C function?

Yes, as long as you don't go behind its back and start directly
pushing things onto the stack with inline asm.

Cheers,

Herbert Xu Jan. 13, 2017, 8:42 a.m. UTC | #13

On Thu, Jan 12, 2017 at 08:37:18PM -0800, Linus Torvalds wrote:
>
> So then the compiler actually needs to start adding useless instructions
> just to keep the stack 16-byte aligned.

Which it does.  Of course most of the time no extra instructions
are required because there are stack variables, so it's just matter
of adding 8 to the value you're subtracting from rsp.  But it is
probably why gcc assumes that the stack is 16-byte aligned which
triggered my original crash.

Here is an example from the function that was involved in the crash,
without frame pointers:

00000000000001b0 <chacha20_simd>:
 1b0:   41 54                   push   %r12
 1b2:   55                      push   %rbp
 1b3:   48 81 ec f8 00 00 00    sub    $0xf8,%rsp

Cheers,

Herbert Xu Jan. 13, 2017, 8:43 a.m. UTC | #14

On Thu, Jan 12, 2017 at 11:07:09PM -0600, Josh Poimboeuf wrote:
> 
> Disabling frame pointers didn't seem to help, but I finally got it to
> misalign with a different test case.  I think it had been aligning the
> array, so instead I made it push a register.

Right.  If you start manipulating the stack directly then all bets
are off.

Cheers,

Josh Poimboeuf Jan. 13, 2017, 1:07 p.m. UTC | #15

On Fri, Jan 13, 2017 at 04:36:48PM +0800, Herbert Xu wrote:
> On Thu, Jan 12, 2017 at 12:08:07PM -0800, Andy Lutomirski wrote:
> >
> > I think we have some inline functions that do asm volatile ("call
> > ..."), and I don't see any credible way of forcing alignment short of
> > generating an entirely new stack frame and aligning that.  Ick.  This
> 
> A straight asm call from C should always work because gcc keeps
> the stack aligned in the prologue.
> 
> The only problem with inline assembly is when you start pushing
> things onto the stack directly.

I tried another approach.  I rebuilt the kernel with
-mpreferred-stack-boundary=4 and used awk (poor man's objtool) to find
all leaf functions with misaligned stacks.

  objdump -d ~/k/vmlinux | awk '/>:/ { f=$2; call=0; push=0 } /fentry/ { next } /callq/ { call=1 } /push/ { push=!push } /sub.*8,%rsp/ { push=!push } /^$/ && call == 0 && push == 0 { print f }'

It found a lot of functions.  Here's one of them:

  ffffffff814ab450 <mpihelp_add_n>:
  ffffffff814ab450:	55                   	push   %rbp
  ffffffff814ab451:	f7 d9                	neg    %ecx
  ffffffff814ab453:	31 c0                	xor    %eax,%eax
  ffffffff814ab455:	4c 63 c1             	movslq %ecx,%r8
  ffffffff814ab458:	48 89 e5             	mov    %rsp,%rbp
  ffffffff814ab45b:	53                   	push   %rbx
  ffffffff814ab45c:	4a 8d 1c c5 00 00 00 	lea    0x0(,%r8,8),%rbx
  ffffffff814ab463:	00 
  ffffffff814ab464:	eb 03                	jmp    ffffffff814ab469 <mpihelp_add_n+0x19>
  ffffffff814ab466:	4c 63 c1             	movslq %ecx,%r8
  ffffffff814ab469:	49 c1 e0 03          	shl    $0x3,%r8
  ffffffff814ab46d:	45 31 c9             	xor    %r9d,%r9d
  ffffffff814ab470:	49 29 d8             	sub    %rbx,%r8
  ffffffff814ab473:	4a 03 04 02          	add    (%rdx,%r8,1),%rax
  ffffffff814ab477:	41 0f 92 c1          	setb   %r9b
  ffffffff814ab47b:	4a 03 04 06          	add    (%rsi,%r8,1),%rax
  ffffffff814ab47f:	41 0f 92 c2          	setb   %r10b
  ffffffff814ab483:	49 89 c3             	mov    %rax,%r11
  ffffffff814ab486:	83 c1 01             	add    $0x1,%ecx
  ffffffff814ab489:	45 0f b6 d2          	movzbl %r10b,%r10d
  ffffffff814ab48d:	4e 89 1c 07          	mov    %r11,(%rdi,%r8,1)
  ffffffff814ab491:	4b 8d 04 0a          	lea    (%r10,%r9,1),%rax
  ffffffff814ab495:	75 cf                	jne    ffffffff814ab466 <mpihelp_add_n+0x16>
  ffffffff814ab497:	5b                   	pop    %rbx
  ffffffff814ab498:	5d                   	pop    %rbp
  ffffffff814ab499:	c3                   	retq   
  ffffffff814ab49a:	66 0f 1f 44 00 00    	nopw   0x0(%rax,%rax,1)

That's a leaf function which, as far as I can tell, doesn't use any
inline asm, but its prologue produces a misaligned stack.

I added inline asm with a call instruction and no operands or clobbers,
and got the same result.

So Andy's theory seems to be correct.  As long as we allow calls from
inline asm, we can't rely on aligned stacks.

x86-64: Maintain 16-byte stack alignment

Commit Message

Comments

Patch