mbox series

[0/7] Ensure stack is aligned for kernel entries

Message ID 1537970184-44348-1-git-send-email-julien.thierry@arm.com (mailing list archive)
Headers show
Series Ensure stack is aligned for kernel entries | expand

Message

Julien Thierry Sept. 26, 2018, 1:56 p.m. UTC
Hi,

Having SCTLR_ELx.SA enabled requires the SP to be 16-bytes aligned before
using it to access memory. When taking an exception, it is possible that
the context during which the exception occured had SP mis-aligned. The
entry code needs to make sure that the stack is aligned before using it to
save the context.

This is only a concern when taking exception from an EL using the same
SP_ELx as the handler. In other cases it can be assumed that the SP being
picked up on exception entry is aligned under the condition that SP is
always aligned when doing eret to an EL using a different SP.

On Juno I see a runtime difference <1% for hackbench. If I do not include
the fast path at EL1 (patch 4) I see a diff of 1-2%.

For EL2 entries, a bit of clean up of stuff getting patched in the vector
has been needed.

Cheers,

Julien

-->

Julien Thierry (7):
  arm64: Add static check for pt_regs alignment
  arm64: sdei: Always use sdei stack for sdei events
  arm64: Align stack when taking exception from EL1
  arm64: Add fast-path for stack alignment
  arm64: Do not apply BP hardening for hyp entries from EL2
  arm64: Do not apply vector harderning for hyp entries from EL2
  arm64: kvm: Align stack for exception coming from EL2

 arch/arm64/include/asm/assembler.h |  9 +++++
 arch/arm64/include/asm/ptrace.h    |  2 +
 arch/arm64/include/asm/sdei.h      |  2 -
 arch/arm64/kernel/cpu_errata.c     | 10 ++++-
 arch/arm64/kernel/entry.S          | 43 +++++++++++++++++++--
 arch/arm64/kernel/sdei.c           | 23 ++++-------
 arch/arm64/kvm/hyp/hyp-entry.S     | 78 +++++++++++++++++++++++++++++++-------
 drivers/firmware/Kconfig           |  1 +
 8 files changed, 132 insertions(+), 36 deletions(-)

--
1.9.1

Comments

Julien Thierry Oct. 19, 2018, 8:07 a.m. UTC | #1
Gentle ping on this series.

On 26/09/18 14:56, Julien Thierry wrote:
> Hi,
> 
> Having SCTLR_ELx.SA enabled requires the SP to be 16-bytes aligned before
> using it to access memory. When taking an exception, it is possible that
> the context during which the exception occured had SP mis-aligned. The
> entry code needs to make sure that the stack is aligned before using it to
> save the context.
> 
> This is only a concern when taking exception from an EL using the same
> SP_ELx as the handler. In other cases it can be assumed that the SP being
> picked up on exception entry is aligned under the condition that SP is
> always aligned when doing eret to an EL using a different SP.
> 
> On Juno I see a runtime difference <1% for hackbench. If I do not include
> the fast path at EL1 (patch 4) I see a diff of 1-2%.
> 
> For EL2 entries, a bit of clean up of stuff getting patched in the vector
> has been needed.
> 
> Cheers,
> 
> Julien
> 
> -->
> 
> Julien Thierry (7):
>    arm64: Add static check for pt_regs alignment
>    arm64: sdei: Always use sdei stack for sdei events
>    arm64: Align stack when taking exception from EL1
>    arm64: Add fast-path for stack alignment
>    arm64: Do not apply BP hardening for hyp entries from EL2
>    arm64: Do not apply vector harderning for hyp entries from EL2
>    arm64: kvm: Align stack for exception coming from EL2
> 
>   arch/arm64/include/asm/assembler.h |  9 +++++
>   arch/arm64/include/asm/ptrace.h    |  2 +
>   arch/arm64/include/asm/sdei.h      |  2 -
>   arch/arm64/kernel/cpu_errata.c     | 10 ++++-
>   arch/arm64/kernel/entry.S          | 43 +++++++++++++++++++--
>   arch/arm64/kernel/sdei.c           | 23 ++++-------
>   arch/arm64/kvm/hyp/hyp-entry.S     | 78 +++++++++++++++++++++++++++++++-------
>   drivers/firmware/Kconfig           |  1 +
>   8 files changed, 132 insertions(+), 36 deletions(-)
> 
> --
> 1.9.1
>
Will Deacon Nov. 7, 2018, 9:58 p.m. UTC | #2
Hi Julien,

On Wed, Sep 26, 2018 at 02:56:17PM +0100, Julien Thierry wrote:
> Having SCTLR_ELx.SA enabled requires the SP to be 16-bytes aligned before
> using it to access memory. When taking an exception, it is possible that
> the context during which the exception occured had SP mis-aligned. The
> entry code needs to make sure that the stack is aligned before using it to
> save the context.

Do you know what we haven't had reports of this crashing? Is it because GCC
tends to keep the SP aligned anyway, so we're getting away with it for the
moment? Trying to work out whether this is a candidate for -stable.

Cheers,

Will
Julien Thierry Nov. 8, 2018, 12:43 p.m. UTC | #3
Hi Will,

On 07/11/18 21:58, Will Deacon wrote:
> Hi Julien,
> 
> On Wed, Sep 26, 2018 at 02:56:17PM +0100, Julien Thierry wrote:
>> Having SCTLR_ELx.SA enabled requires the SP to be 16-bytes aligned before
>> using it to access memory. When taking an exception, it is possible that
>> the context during which the exception occured had SP mis-aligned. The
>> entry code needs to make sure that the stack is aligned before using it to
>> save the context.
> 
> Do you know what we haven't had reports of this crashing? Is it because GCC
> tends to keep the SP aligned anyway, so we're getting away with it for the
> moment? Trying to work out whether this is a candidate for -stable.
> 

I think that GCC tends to keep the SP aligned anyway is the most likely 
explanation, yes.

I tried looking for specific options that could make this more likely, 
but all I could find was the option -mpreferred-stack-boundary only 
available for x86 and -mstack-alignment only provided by clang.

Couldn't find anything yet on the gcc arm64 side that would either 
guarantee we'd have an aligned stack nor that GCC would make it very 
very likely.

I can try to investigate a bit more.

Thanks,
Ard Biesheuvel Nov. 8, 2018, 1:04 p.m. UTC | #4
On 26 September 2018 at 15:56, Julien Thierry <julien.thierry@arm.com> wrote:
> Hi,
>
> Having SCTLR_ELx.SA enabled requires the SP to be 16-bytes aligned before
> using it to access memory. When taking an exception, it is possible that
> the context during which the exception occured had SP mis-aligned.

How is this possible? GCC clearly only manipulates the stack pointer
in 16 byte multiples, and so if we do the same in our asm code (which
I think we already do, given the lack of reports about this issue), is
this handling really necessary?


> The
> entry code needs to make sure that the stack is aligned before using it to
> save the context.
>
> This is only a concern when taking exception from an EL using the same
> SP_ELx as the handler. In other cases it can be assumed that the SP being
> picked up on exception entry is aligned under the condition that SP is
> always aligned when doing eret to an EL using a different SP.
>
> On Juno I see a runtime difference <1% for hackbench. If I do not include
> the fast path at EL1 (patch 4) I see a diff of 1-2%.
>
> For EL2 entries, a bit of clean up of stuff getting patched in the vector
> has been needed.
>
> Cheers,
>
> Julien
>
> -->
>
> Julien Thierry (7):
>   arm64: Add static check for pt_regs alignment
>   arm64: sdei: Always use sdei stack for sdei events
>   arm64: Align stack when taking exception from EL1
>   arm64: Add fast-path for stack alignment
>   arm64: Do not apply BP hardening for hyp entries from EL2
>   arm64: Do not apply vector harderning for hyp entries from EL2
>   arm64: kvm: Align stack for exception coming from EL2
>
>  arch/arm64/include/asm/assembler.h |  9 +++++
>  arch/arm64/include/asm/ptrace.h    |  2 +
>  arch/arm64/include/asm/sdei.h      |  2 -
>  arch/arm64/kernel/cpu_errata.c     | 10 ++++-
>  arch/arm64/kernel/entry.S          | 43 +++++++++++++++++++--
>  arch/arm64/kernel/sdei.c           | 23 ++++-------
>  arch/arm64/kvm/hyp/hyp-entry.S     | 78 +++++++++++++++++++++++++++++++-------
>  drivers/firmware/Kconfig           |  1 +
>  8 files changed, 132 insertions(+), 36 deletions(-)
>
> --
> 1.9.1
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Julien Thierry Nov. 8, 2018, 1:27 p.m. UTC | #5
On 08/11/18 13:04, Ard Biesheuvel wrote:
> On 26 September 2018 at 15:56, Julien Thierry <julien.thierry@arm.com> wrote:
>> Hi,
>>
>> Having SCTLR_ELx.SA enabled requires the SP to be 16-bytes aligned before
>> using it to access memory. When taking an exception, it is possible that
>> the context during which the exception occured had SP mis-aligned.
> 
> How is this possible? GCC clearly only manipulates the stack pointer
> in 16 byte multiples, and so if we do the same in our asm code (which
> I think we already do, given the lack of reports about this issue), is
> this handling really necessary?
> 

Is there anything that actually gives us that guarantee from GCC? I 
agree that currently it looks like aarch64-<...>-gcc only manipulates SP 
aligned to 16 bytes, but I don't know whether that is certain.

The series can be dropped if there is enough confidence that this won't 
happen.

Thanks,

> 
>> The
>> entry code needs to make sure that the stack is aligned before using it to
>> save the context.
>>
>> This is only a concern when taking exception from an EL using the same
>> SP_ELx as the handler. In other cases it can be assumed that the SP being
>> picked up on exception entry is aligned under the condition that SP is
>> always aligned when doing eret to an EL using a different SP.
>>
>> On Juno I see a runtime difference <1% for hackbench. If I do not include
>> the fast path at EL1 (patch 4) I see a diff of 1-2%.
>>
>> For EL2 entries, a bit of clean up of stuff getting patched in the vector
>> has been needed.
>>
>> Cheers,
>>
>> Julien
>>
>> -->
>>
>> Julien Thierry (7):
>>    arm64: Add static check for pt_regs alignment
>>    arm64: sdei: Always use sdei stack for sdei events
>>    arm64: Align stack when taking exception from EL1
>>    arm64: Add fast-path for stack alignment
>>    arm64: Do not apply BP hardening for hyp entries from EL2
>>    arm64: Do not apply vector harderning for hyp entries from EL2
>>    arm64: kvm: Align stack for exception coming from EL2
>>
>>   arch/arm64/include/asm/assembler.h |  9 +++++
>>   arch/arm64/include/asm/ptrace.h    |  2 +
>>   arch/arm64/include/asm/sdei.h      |  2 -
>>   arch/arm64/kernel/cpu_errata.c     | 10 ++++-
>>   arch/arm64/kernel/entry.S          | 43 +++++++++++++++++++--
>>   arch/arm64/kernel/sdei.c           | 23 ++++-------
>>   arch/arm64/kvm/hyp/hyp-entry.S     | 78 +++++++++++++++++++++++++++++++-------
>>   drivers/firmware/Kconfig           |  1 +
>>   8 files changed, 132 insertions(+), 36 deletions(-)
>>
>> --
>> 1.9.1
>>
>> _______________________________________________
>> linux-arm-kernel mailing list
>> linux-arm-kernel@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Ard Biesheuvel Nov. 8, 2018, 2:10 p.m. UTC | #6
(+ Ramana)

On 8 November 2018 at 14:27, Julien Thierry <julien.thierry@arm.com> wrote:
>
>
> On 08/11/18 13:04, Ard Biesheuvel wrote:
>>
>> On 26 September 2018 at 15:56, Julien Thierry <julien.thierry@arm.com>
>> wrote:
>>>
>>> Hi,
>>>
>>> Having SCTLR_ELx.SA enabled requires the SP to be 16-bytes aligned before
>>> using it to access memory. When taking an exception, it is possible that
>>> the context during which the exception occured had SP mis-aligned.
>>
>>
>> How is this possible? GCC clearly only manipulates the stack pointer
>> in 16 byte multiples, and so if we do the same in our asm code (which
>> I think we already do, given the lack of reports about this issue), is
>> this handling really necessary?
>>
>
> Is there anything that actually gives us that guarantee from GCC? I agree
> that currently it looks like aarch64-<...>-gcc only manipulates SP aligned
> to 16 bytes, but I don't know whether that is certain.
>

I think we should get that clarified then. I don't think it makes
sense for GCC to have to reason about whether SP currently has a value
that permits dereferencing.

> The series can be dropped if there is enough confidence that this won't
> happen.
>
> Thanks,
>
>
>>
>>> The
>>> entry code needs to make sure that the stack is aligned before using it
>>> to
>>> save the context.
>>>
>>> This is only a concern when taking exception from an EL using the same
>>> SP_ELx as the handler. In other cases it can be assumed that the SP being
>>> picked up on exception entry is aligned under the condition that SP is
>>> always aligned when doing eret to an EL using a different SP.
>>>
>>> On Juno I see a runtime difference <1% for hackbench. If I do not include
>>> the fast path at EL1 (patch 4) I see a diff of 1-2%.
>>>
>>> For EL2 entries, a bit of clean up of stuff getting patched in the vector
>>> has been needed.
>>>
>>> Cheers,
>>>
>>> Julien
>>>
>>> -->
>>>
>>> Julien Thierry (7):
>>>    arm64: Add static check for pt_regs alignment
>>>    arm64: sdei: Always use sdei stack for sdei events
>>>    arm64: Align stack when taking exception from EL1
>>>    arm64: Add fast-path for stack alignment
>>>    arm64: Do not apply BP hardening for hyp entries from EL2
>>>    arm64: Do not apply vector harderning for hyp entries from EL2
>>>    arm64: kvm: Align stack for exception coming from EL2
>>>
>>>   arch/arm64/include/asm/assembler.h |  9 +++++
>>>   arch/arm64/include/asm/ptrace.h    |  2 +
>>>   arch/arm64/include/asm/sdei.h      |  2 -
>>>   arch/arm64/kernel/cpu_errata.c     | 10 ++++-
>>>   arch/arm64/kernel/entry.S          | 43 +++++++++++++++++++--
>>>   arch/arm64/kernel/sdei.c           | 23 ++++-------
>>>   arch/arm64/kvm/hyp/hyp-entry.S     | 78
>>> +++++++++++++++++++++++++++++++-------
>>>   drivers/firmware/Kconfig           |  1 +
>>>   8 files changed, 132 insertions(+), 36 deletions(-)
>>>
>>> --
>>> 1.9.1
>>>
>>> _______________________________________________
>>> linux-arm-kernel mailing list
>>> linux-arm-kernel@lists.infradead.org
>>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>
>
> --
> Julien Thierry
Ramana Radhakrishnan Nov. 8, 2018, 2:19 p.m. UTC | #7
On 08/11/2018 14:10, Ard Biesheuvel wrote:
> (+ Ramana)
> 
> On 8 November 2018 at 14:27, Julien Thierry <julien.thierry@arm.com> wrote:
>>
>>
>> On 08/11/18 13:04, Ard Biesheuvel wrote:
>>>
>>> On 26 September 2018 at 15:56, Julien Thierry <julien.thierry@arm.com>
>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> Having SCTLR_ELx.SA enabled requires the SP to be 16-bytes aligned before
>>>> using it to access memory. When taking an exception, it is possible that
>>>> the context during which the exception occured had SP mis-aligned.
>>>
>>>
>>> How is this possible? GCC clearly only manipulates the stack pointer
>>> in 16 byte multiples, and so if we do the same in our asm code (which
>>> I think we already do, given the lack of reports about this issue), is
>>> this handling really necessary?
>>>
>>
>> Is there anything that actually gives us that guarantee from GCC? I agree
>> that currently it looks like aarch64-<...>-gcc only manipulates SP aligned
>> to 16 bytes, but I don't know whether that is certain.
>>
> 
> I think we should get that clarified then. I don't think it makes
> sense for GCC to have to reason about whether SP currently has a value
> that permits dereferencing.

The ABI gives that guarantee.

http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf

<quote>

5.2.2.1 Universal stack constraints

<...>

Additionally, at any point at which memory is accessed
via SP, the hardware requires that SP mod 16 = 0.  The stack must be 
quad-word aligned

</end quote>

regards
Ramana
Julien Thierry Nov. 8, 2018, 3:01 p.m. UTC | #8
On 08/11/18 14:19, Ramana Radhakrishnan wrote:
> On 08/11/2018 14:10, Ard Biesheuvel wrote:
>> (+ Ramana)
>>
>> On 8 November 2018 at 14:27, Julien Thierry <julien.thierry@arm.com> wrote:
>>>
>>>
>>> On 08/11/18 13:04, Ard Biesheuvel wrote:
>>>>
>>>> On 26 September 2018 at 15:56, Julien Thierry <julien.thierry@arm.com>
>>>> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Having SCTLR_ELx.SA enabled requires the SP to be 16-bytes aligned before
>>>>> using it to access memory. When taking an exception, it is possible that
>>>>> the context during which the exception occured had SP mis-aligned.
>>>>
>>>>
>>>> How is this possible? GCC clearly only manipulates the stack pointer
>>>> in 16 byte multiples, and so if we do the same in our asm code (which
>>>> I think we already do, given the lack of reports about this issue), is
>>>> this handling really necessary?
>>>>
>>>
>>> Is there anything that actually gives us that guarantee from GCC? I agree
>>> that currently it looks like aarch64-<...>-gcc only manipulates SP aligned
>>> to 16 bytes, but I don't know whether that is certain.
>>>
>>
>> I think we should get that clarified then. I don't think it makes
>> sense for GCC to have to reason about whether SP currently has a value
>> that permits dereferencing.
> 
> The ABI gives that guarantee.
> 
> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf
> 
> <quote>
> 
> 5.2.2.1 Universal stack constraints
> 
> <...>
> 
> Additionally, at any point at which memory is accessed
> via SP, the hardware requires that SP mod 16 = 0.  The stack must be
> quad-word aligned
> 
> </end quote>
> 

Thanks Ramana. Sadly I don't think this clarifies things. The issue is 
that the guarantee is only that SP is aligned when we will use it to 
access memory, but still allows for a scenario like:

-> Updating SP with non 16bytes-aligned value (it's fine as long as the 
following code takes care to align it before accessing memory)
-> IRQ is raised before SP gets aligned
-> Vector entry starts saving context on misaligned stack
-> Misaligned SP exception

The only thing that would avoid us the trouble is a guarantee that GCC 
never modifies SP in such a way that SP is not 16-bytes aligned.

Thanks,
Dave Martin Nov. 8, 2018, 3:30 p.m. UTC | #9
On Thu, Nov 08, 2018 at 02:19:14PM +0000, Ramana Radhakrishnan wrote:
> On 08/11/2018 14:10, Ard Biesheuvel wrote:
> > (+ Ramana)
> > 
> > On 8 November 2018 at 14:27, Julien Thierry <julien.thierry@arm.com> wrote:
> >>
> >>
> >> On 08/11/18 13:04, Ard Biesheuvel wrote:
> >>>
> >>> On 26 September 2018 at 15:56, Julien Thierry <julien.thierry@arm.com>
> >>> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> Having SCTLR_ELx.SA enabled requires the SP to be 16-bytes aligned before
> >>>> using it to access memory. When taking an exception, it is possible that
> >>>> the context during which the exception occured had SP mis-aligned.
> >>>
> >>>
> >>> How is this possible? GCC clearly only manipulates the stack pointer
> >>> in 16 byte multiples, and so if we do the same in our asm code (which
> >>> I think we already do, given the lack of reports about this issue), is
> >>> this handling really necessary?
> >>>
> >>
> >> Is there anything that actually gives us that guarantee from GCC? I agree
> >> that currently it looks like aarch64-<...>-gcc only manipulates SP aligned
> >> to 16 bytes, but I don't know whether that is certain.
> >>
> > 
> > I think we should get that clarified then. I don't think it makes
> > sense for GCC to have to reason about whether SP currently has a value
> > that permits dereferencing.
> 
> The ABI gives that guarantee.
> 
> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf
> 
> <quote>

Surely This only applies at public interfaces?

We make be re-entering the kernel due to an exception taken in the
middle of a function.

In theory, the compiler is allowed to temporarily misalign SP (i.e., the
procedure call standard doesn't forbid that).


So, we'd need to be confident that GCC and LLVM and any other compiler
we care about _guarantee_ never to do that.  (And that there is not asm
in the kernel that will do so.)

Cheers
---Dave
Ramana Radhakrishnan Nov. 8, 2018, 3:33 p.m. UTC | #10
On 08/11/2018 15:30, Dave Martin wrote:
> On Thu, Nov 08, 2018 at 02:19:14PM +0000, Ramana Radhakrishnan wrote:
>> On 08/11/2018 14:10, Ard Biesheuvel wrote:
>>> (+ Ramana)
>>>
>>> On 8 November 2018 at 14:27, Julien Thierry <julien.thierry@arm.com> wrote:
>>>>
>>>>
>>>> On 08/11/18 13:04, Ard Biesheuvel wrote:
>>>>>
>>>>> On 26 September 2018 at 15:56, Julien Thierry <julien.thierry@arm.com>
>>>>> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Having SCTLR_ELx.SA enabled requires the SP to be 16-bytes aligned before
>>>>>> using it to access memory. When taking an exception, it is possible that
>>>>>> the context during which the exception occured had SP mis-aligned.
>>>>>
>>>>>
>>>>> How is this possible? GCC clearly only manipulates the stack pointer
>>>>> in 16 byte multiples, and so if we do the same in our asm code (which
>>>>> I think we already do, given the lack of reports about this issue), is
>>>>> this handling really necessary?
>>>>>
>>>>
>>>> Is there anything that actually gives us that guarantee from GCC? I agree
>>>> that currently it looks like aarch64-<...>-gcc only manipulates SP aligned
>>>> to 16 bytes, but I don't know whether that is certain.
>>>>
>>>
>>> I think we should get that clarified then. I don't think it makes
>>> sense for GCC to have to reason about whether SP currently has a value
>>> that permits dereferencing.
>>
>> The ABI gives that guarantee.
>>
>> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf
>>
>> <quote>
> 
> Surely This only applies at public interfaces?
> 

I don't think this has anything to do with public interfaces. If there 
is a trap with a 16byte misaligned access of the SP then it doesn't 
matter whether it's a public interface or not.

regards
Ramana
Ramana Radhakrishnan Nov. 8, 2018, 3:33 p.m. UTC | #11
On 08/11/2018 15:01, Julien Thierry wrote:
> 
> 
> On 08/11/18 14:19, Ramana Radhakrishnan wrote:
>> On 08/11/2018 14:10, Ard Biesheuvel wrote:
>>> (+ Ramana)
>>>
>>> On 8 November 2018 at 14:27, Julien Thierry <julien.thierry@arm.com> wrote:
>>>>
>>>>
>>>> On 08/11/18 13:04, Ard Biesheuvel wrote:
>>>>>
>>>>> On 26 September 2018 at 15:56, Julien Thierry <julien.thierry@arm.com>
>>>>> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Having SCTLR_ELx.SA enabled requires the SP to be 16-bytes aligned before
>>>>>> using it to access memory. When taking an exception, it is possible that
>>>>>> the context during which the exception occured had SP mis-aligned.
>>>>>
>>>>>
>>>>> How is this possible? GCC clearly only manipulates the stack pointer
>>>>> in 16 byte multiples, and so if we do the same in our asm code (which
>>>>> I think we already do, given the lack of reports about this issue), is
>>>>> this handling really necessary?
>>>>>
>>>>
>>>> Is there anything that actually gives us that guarantee from GCC? I agree
>>>> that currently it looks like aarch64-<...>-gcc only manipulates SP aligned
>>>> to 16 bytes, but I don't know whether that is certain.
>>>>
>>>
>>> I think we should get that clarified then. I don't think it makes
>>> sense for GCC to have to reason about whether SP currently has a value
>>> that permits dereferencing.
>>
>> The ABI gives that guarantee.
>>
>> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf
>>
>> <quote>
>>
>> 5.2.2.1 Universal stack constraints
>>
>> <...>
>>
>> Additionally, at any point at which memory is accessed
>> via SP, the hardware requires that SP mod 16 = 0.  The stack must be
>> quad-word aligned
>>
>> </end quote>
>>
> 
> Thanks Ramana. Sadly I don't think this clarifies things. The issue is
> that the guarantee is only that SP is aligned when we will use it to
> access memory, but still allows for a scenario like:


That is certainly a correct interpretation of the ABI language.

> 
> -> Updating SP with non 16bytes-aligned value (it's fine as long as the
> following code takes care to align it before accessing memory)
> -> IRQ is raised before SP gets aligned
> -> Vector entry starts saving context on misaligned stack
> -> Misaligned SP exception
> 
> The only thing that would avoid us the trouble is a guarantee that GCC
> never modifies SP in such a way that SP is not 16-bytes aligned.


I think it sort of falls out in the implementation from what I remember 
and see (and empirically checked with a couple of colleagues).

I don't think there is an ABI requirement that SP should left 16 byte 
aligned even for intermediate calculations.

Whether GCC does this or not today is immaterial and for userland it's 
certainly not the only code generator in town for this sort of question. 
The question also arises with other jits and other code generators which 
may leave the stack temporarily not aligned at 16 bytes. So it does 
sound like the right thing to do in the kernel defensively.

regards
Ramana
> 
> Thanks,
>
Dave Martin Nov. 8, 2018, 3:39 p.m. UTC | #12
On Thu, Nov 08, 2018 at 03:33:01PM +0000, Ramana Radhakrishnan wrote:
> On 08/11/2018 15:30, Dave Martin wrote:
> > On Thu, Nov 08, 2018 at 02:19:14PM +0000, Ramana Radhakrishnan wrote:
> >> On 08/11/2018 14:10, Ard Biesheuvel wrote:
> >>> (+ Ramana)
> >>>
> >>> On 8 November 2018 at 14:27, Julien Thierry <julien.thierry@arm.com> wrote:
> >>>>
> >>>>
> >>>> On 08/11/18 13:04, Ard Biesheuvel wrote:
> >>>>>
> >>>>> On 26 September 2018 at 15:56, Julien Thierry <julien.thierry@arm.com>
> >>>>> wrote:
> >>>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> Having SCTLR_ELx.SA enabled requires the SP to be 16-bytes aligned before
> >>>>>> using it to access memory. When taking an exception, it is possible that
> >>>>>> the context during which the exception occured had SP mis-aligned.
> >>>>>
> >>>>>
> >>>>> How is this possible? GCC clearly only manipulates the stack pointer
> >>>>> in 16 byte multiples, and so if we do the same in our asm code (which
> >>>>> I think we already do, given the lack of reports about this issue), is
> >>>>> this handling really necessary?
> >>>>>
> >>>>
> >>>> Is there anything that actually gives us that guarantee from GCC? I agree
> >>>> that currently it looks like aarch64-<...>-gcc only manipulates SP aligned
> >>>> to 16 bytes, but I don't know whether that is certain.
> >>>>
> >>>
> >>> I think we should get that clarified then. I don't think it makes
> >>> sense for GCC to have to reason about whether SP currently has a value
> >>> that permits dereferencing.
> >>
> >> The ABI gives that guarantee.
> >>
> >> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf
> >>
> >> <quote>
> > 
> > Surely This only applies at public interfaces?
> > 
> 
> I don't think this has anything to do with public interfaces. If there 
> is a trap with a 16byte misaligned access of the SP then it doesn't 
> matter whether it's a public interface or not.

We're not talking about SP alignment faults here particluarly.

We're talking about any exception that may be taken from EL1h to EL1h,
which may happen on random instructions inside a function, for random
reasons.

There was talk about running the kernel mostly in EL1t but I don't think
we currently do this (somebody please put me right if I'm wrong here!)

Cheers
---Dave
Julien Thierry Nov. 8, 2018, 3:41 p.m. UTC | #13
On 08/11/18 15:33, Ramana Radhakrishnan wrote:
> On 08/11/2018 15:01, Julien Thierry wrote:
>>
>>
>> On 08/11/18 14:19, Ramana Radhakrishnan wrote:
>>> On 08/11/2018 14:10, Ard Biesheuvel wrote:
>>>> (+ Ramana)
>>>>
>>>> On 8 November 2018 at 14:27, Julien Thierry <julien.thierry@arm.com> wrote:
>>>>>
>>>>>
>>>>> On 08/11/18 13:04, Ard Biesheuvel wrote:
>>>>>>
>>>>>> On 26 September 2018 at 15:56, Julien Thierry <julien.thierry@arm.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Having SCTLR_ELx.SA enabled requires the SP to be 16-bytes aligned before
>>>>>>> using it to access memory. When taking an exception, it is possible that
>>>>>>> the context during which the exception occured had SP mis-aligned.
>>>>>>
>>>>>>
>>>>>> How is this possible? GCC clearly only manipulates the stack pointer
>>>>>> in 16 byte multiples, and so if we do the same in our asm code (which
>>>>>> I think we already do, given the lack of reports about this issue), is
>>>>>> this handling really necessary?
>>>>>>
>>>>>
>>>>> Is there anything that actually gives us that guarantee from GCC? I agree
>>>>> that currently it looks like aarch64-<...>-gcc only manipulates SP aligned
>>>>> to 16 bytes, but I don't know whether that is certain.
>>>>>
>>>>
>>>> I think we should get that clarified then. I don't think it makes
>>>> sense for GCC to have to reason about whether SP currently has a value
>>>> that permits dereferencing.
>>>
>>> The ABI gives that guarantee.
>>>
>>> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf
>>>
>>> <quote>
>>>
>>> 5.2.2.1 Universal stack constraints
>>>
>>> <...>
>>>
>>> Additionally, at any point at which memory is accessed
>>> via SP, the hardware requires that SP mod 16 = 0.  The stack must be
>>> quad-word aligned
>>>
>>> </end quote>
>>>
>>
>> Thanks Ramana. Sadly I don't think this clarifies things. The issue is
>> that the guarantee is only that SP is aligned when we will use it to
>> access memory, but still allows for a scenario like:
> 
> 
> That is certainly a correct interpretation of the ABI language.
> 
>>
>> -> Updating SP with non 16bytes-aligned value (it's fine as long as the
>> following code takes care to align it before accessing memory)
>> -> IRQ is raised before SP gets aligned
>> -> Vector entry starts saving context on misaligned stack
>> -> Misaligned SP exception
>>
>> The only thing that would avoid us the trouble is a guarantee that GCC
>> never modifies SP in such a way that SP is not 16-bytes aligned.
> 
> 
> I think it sort of falls out in the implementation from what I remember
> and see (and empirically checked with a couple of colleagues).
> 
> I don't think there is an ABI requirement that SP should left 16 byte
> aligned even for intermediate calculations.
> 
> Whether GCC does this or not today is immaterial and for userland it's
> certainly not the only code generator in town for this sort of question.
> The question also arises with other jits and other code generators which
> may leave the stack temporarily not aligned at 16 bytes. So it does
> sound like the right thing to do in the kernel defensively.
> 

I don't think we really mind about userland. EL1 can (and, in Linux, 
does) use it's own SP. So having an interrupt while EL0 has a misaligned 
SP is not an issue for the kernel.

The cases covered by this series in only for exceptions received at the 
same EL that will handle that exception.

Cheers,
Ard Biesheuvel Nov. 8, 2018, 3:44 p.m. UTC | #14
On 8 November 2018 at 16:39, Dave Martin <Dave.Martin@arm.com> wrote:
> On Thu, Nov 08, 2018 at 03:33:01PM +0000, Ramana Radhakrishnan wrote:
>> On 08/11/2018 15:30, Dave Martin wrote:
>> > On Thu, Nov 08, 2018 at 02:19:14PM +0000, Ramana Radhakrishnan wrote:
>> >> On 08/11/2018 14:10, Ard Biesheuvel wrote:
>> >>> (+ Ramana)
>> >>>
>> >>> On 8 November 2018 at 14:27, Julien Thierry <julien.thierry@arm.com> wrote:
>> >>>>
>> >>>>
>> >>>> On 08/11/18 13:04, Ard Biesheuvel wrote:
>> >>>>>
>> >>>>> On 26 September 2018 at 15:56, Julien Thierry <julien.thierry@arm.com>
>> >>>>> wrote:
>> >>>>>>
>> >>>>>> Hi,
>> >>>>>>
>> >>>>>> Having SCTLR_ELx.SA enabled requires the SP to be 16-bytes aligned before
>> >>>>>> using it to access memory. When taking an exception, it is possible that
>> >>>>>> the context during which the exception occured had SP mis-aligned.
>> >>>>>
>> >>>>>
>> >>>>> How is this possible? GCC clearly only manipulates the stack pointer
>> >>>>> in 16 byte multiples, and so if we do the same in our asm code (which
>> >>>>> I think we already do, given the lack of reports about this issue), is
>> >>>>> this handling really necessary?
>> >>>>>
>> >>>>
>> >>>> Is there anything that actually gives us that guarantee from GCC? I agree
>> >>>> that currently it looks like aarch64-<...>-gcc only manipulates SP aligned
>> >>>> to 16 bytes, but I don't know whether that is certain.
>> >>>>
>> >>>
>> >>> I think we should get that clarified then. I don't think it makes
>> >>> sense for GCC to have to reason about whether SP currently has a value
>> >>> that permits dereferencing.
>> >>
>> >> The ABI gives that guarantee.
>> >>
>> >> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf
>> >>
>> >> <quote>
>> >
>> > Surely This only applies at public interfaces?
>> >
>>
>> I don't think this has anything to do with public interfaces. If there
>> is a trap with a 16byte misaligned access of the SP then it doesn't
>> matter whether it's a public interface or not.
>
> We're not talking about SP alignment faults here particluarly.
>
> We're talking about any exception that may be taken from EL1h to EL1h,
> which may happen on random instructions inside a function, for random
> reasons.
>

Indeed.

But my question remains how likely it is that the compiler we use for
generating the kernel code (so we are not talking about userland JITs
or other crazy stuff here) would play with SP like that, especially
since it is no longer a general purpose register. To me, it would make
sense to attempt to reach an agreement with the GCC folks that
compiler generated code does not muck about with SP like that.

In asm, there is one notable exception, of course, where we swap x0
and sp temporarily in the entry path, but we can't get interrupted
there anyway.

In any case, if people insist, I'm ok with it. To me, it just seems
like unnecessary clutter for a theoretical issue.
Ramana Radhakrishnan Nov. 8, 2018, 4:57 p.m. UTC | #15
> 
> Indeed.
> 
> But my question remains how likely it is that the compiler we use for
> generating the kernel code (so we are not talking about userland JITs
> or other crazy stuff here) would play with SP like that, especially
> since it is no longer a general purpose register. To me, it would make
> sense to attempt to reach an agreement with the GCC folks that
> compiler generated code does not muck about with SP like that.


It falls out from the choice of instructions we use for this sort of 
thing and because frame sizes really get rounded up to 16 bytes.

In the explanation below assume the stack is aligned to 16 bytes on 
entry. Frame sizes are always a multiple of 16.

For frame sizes up to 240 bytes that's a stp fp, lr, [sp, #-<off>]!

For frame sizes up to 4k bytes, that's a single sub instruction. Again 
all frame sizes are 16 byte aligned and therefore it's all ok.

For frame sizes greater than this and up to 64k that's a mov to a 
temporary register with a 16 bit immediate followed by a single 
subtract, again not going to leave your stack frame misaligned.

 From frame sizes above that we split this into a subtract with a 
multiple of 4k (again 16 byte aligned) and the rest.

For frame size above 16MB we use a sequence of mov / movk's with a 
temporary register and then do a single subtract of SP and therefore 
that's also fine.

I think for the sake of this conversation with respect to compiling the 
kernel, I think we can safely say that GCC will end up leaving the stack 
16 byte aligned even with the intermediate computations and that 
shouldn't be an issue from my reading of the AArch64 backend.

So, probably move on but if you really want to be defensive you may want 
to carry this patch. However there's nothing that I will say about 
hand-written assembly (and I say that in the interest of completeness).

regards
Ramana
Ard Biesheuvel Nov. 8, 2018, 5:14 p.m. UTC | #16
On 8 November 2018 at 17:57, Ramana Radhakrishnan
<Ramana.Radhakrishnan@arm.com> wrote:
>
>>
>> Indeed.
>>
>> But my question remains how likely it is that the compiler we use for
>> generating the kernel code (so we are not talking about userland JITs
>> or other crazy stuff here) would play with SP like that, especially
>> since it is no longer a general purpose register. To me, it would make
>> sense to attempt to reach an agreement with the GCC folks that
>> compiler generated code does not muck about with SP like that.
>
>
> It falls out from the choice of instructions we use for this sort of
> thing and because frame sizes really get rounded up to 16 bytes.
>
> In the explanation below assume the stack is aligned to 16 bytes on
> entry. Frame sizes are always a multiple of 16.
>
> For frame sizes up to 240 bytes that's a stp fp, lr, [sp, #-<off>]!
>
> For frame sizes up to 4k bytes, that's a single sub instruction. Again
> all frame sizes are 16 byte aligned and therefore it's all ok.
>
> For frame sizes greater than this and up to 64k that's a mov to a
> temporary register with a 16 bit immediate followed by a single
> subtract, again not going to leave your stack frame misaligned.
>
>  From frame sizes above that we split this into a subtract with a
> multiple of 4k (again 16 byte aligned) and the rest.
>
> For frame size above 16MB we use a sequence of mov / movk's with a
> temporary register and then do a single subtract of SP and therefore
> that's also fine.
>

Thanks Ramana. In the kernel, stack frames are typically limited to ~1
KB in size, so most of these will never even be encountered in our
case.

> I think for the sake of this conversation with respect to compiling the
> kernel, I think we can safely say that GCC will end up leaving the stack
> 16 byte aligned even with the intermediate computations and that
> shouldn't be an issue from my reading of the AArch64 backend.
>
> So, probably move on but if you really want to be defensive you may want
> to carry this patch. However there's nothing that I will say about
> hand-written assembly (and I say that in the interest of completeness).
>

indeed, hand written assembly is an entirely different matter, but
that is a manageable quantity of code which is currenty clean in this
regard, as far as we know.
Julien Thierry Nov. 22, 2018, 11:49 a.m. UTC | #17
Hi,

On 26/09/18 14:56, Julien Thierry wrote:
> Hi,
> 
> Having SCTLR_ELx.SA enabled requires the SP to be 16-bytes aligned before
> using it to access memory. When taking an exception, it is possible that
> the context during which the exception occured had SP mis-aligned. The
> entry code needs to make sure that the stack is aligned before using it to
> save the context.
> 

So, as discussed in this thread:
https://www.spinics.net/lists/arm-kernel/msg688342.html

We might have the compiler provide the guarantee that SP is kept at 
multiples of 16-bytes throughout C functions. If this is accepted, it 
avoids the complexity of this series.

There is just one case remaining, which is less important as it is 
mostly a debug concern. If we take an SP alignment fault from EL1 (due 
to bad implementation) we take recursive exceptions:

- Without CONFIG_VMAP_STACK, kernel just freezes always taking SP 
alignment faults
- With CONFIG_VMAP_STACK after taking a number of exceptions, we 
overflow the stack and the kernel switches to the overflow stack where 
it finally manages to display a kernel panic message

So the question is, do we care about doing something for the above case?

Thanks,
Ard Biesheuvel Nov. 22, 2018, 12:11 p.m. UTC | #18
On Thu, 22 Nov 2018 at 12:49, Julien Thierry <julien.thierry@arm.com> wrote:
>
> Hi,
>
> On 26/09/18 14:56, Julien Thierry wrote:
> > Hi,
> >
> > Having SCTLR_ELx.SA enabled requires the SP to be 16-bytes aligned before
> > using it to access memory. When taking an exception, it is possible that
> > the context during which the exception occured had SP mis-aligned. The
> > entry code needs to make sure that the stack is aligned before using it to
> > save the context.
> >
>
> So, as discussed in this thread:
> https://www.spinics.net/lists/arm-kernel/msg688342.html
>
> We might have the compiler provide the guarantee that SP is kept at
> multiples of 16-bytes throughout C functions. If this is accepted, it
> avoids the complexity of this series.
>

Yes, but we still want the build time sizeof(pt_regs) check and
perhaps other pieces of this series.

> There is just one case remaining, which is less important as it is
> mostly a debug concern. If we take an SP alignment fault from EL1 (due
> to bad implementation) we take recursive exceptions:
>
> - Without CONFIG_VMAP_STACK, kernel just freezes always taking SP
> alignment faults
> - With CONFIG_VMAP_STACK after taking a number of exceptions, we
> overflow the stack and the kernel switches to the overflow stack where
> it finally manages to display a kernel panic message
>
> So the question is, do we care about doing something for the above case?
>

So in the latter case, it is obvious from the debug output that the
stack pointer was misaligned? I'd expect so, given that each
recursively taken exception has the same cause, and so it would be
clear what happened.

So I'm leaning towards just relying on that, given that
CONFIG_VMAP_STACK is the default, although it is unfortunate that
KASAN still disables it.
Julien Thierry Nov. 22, 2018, 3:10 p.m. UTC | #19
On 22/11/18 12:11, Ard Biesheuvel wrote:
> On Thu, 22 Nov 2018 at 12:49, Julien Thierry <julien.thierry@arm.com> wrote:
>>
>> Hi,
>>
>> On 26/09/18 14:56, Julien Thierry wrote:
>>> Hi,
>>>
>>> Having SCTLR_ELx.SA enabled requires the SP to be 16-bytes aligned before
>>> using it to access memory. When taking an exception, it is possible that
>>> the context during which the exception occured had SP mis-aligned. The
>>> entry code needs to make sure that the stack is aligned before using it to
>>> save the context.
>>>
>>
>> So, as discussed in this thread:
>> https://www.spinics.net/lists/arm-kernel/msg688342.html
>>
>> We might have the compiler provide the guarantee that SP is kept at
>> multiples of 16-bytes throughout C functions. If this is accepted, it
>> avoids the complexity of this series.
>>
> 
> Yes, but we still want the build time sizeof(pt_regs) check and
> perhaps other pieces of this series.
> 

Good point.

>> There is just one case remaining, which is less important as it is
>> mostly a debug concern. If we take an SP alignment fault from EL1 (due
>> to bad implementation) we take recursive exceptions:
>>
>> - Without CONFIG_VMAP_STACK, kernel just freezes always taking SP
>> alignment faults
>> - With CONFIG_VMAP_STACK after taking a number of exceptions, we
>> overflow the stack and the kernel switches to the overflow stack where
>> it finally manages to display a kernel panic message
>>
>> So the question is, do we care about doing something for the above case?
>>
> 
> So in the latter case, it is obvious from the debug output that the
> stack pointer was misaligned? I'd expect so, given that each
> recursively taken exception has the same cause, and so it would be
> clear what happened.
> 

So, it is obvious that the error is a misaligned stack pointer as it is 
described in the ESR and as you said, we are always taking the same kind 
of exception.

However the ELR will get overwritten by the recursive exception and its 
last value will be the first instruction accessing memory through SP in 
the exception handler. Meaning we lost the address where the first 
misaligned stack access happened.

But then I don't know whether this case warrants adding complexity to 
the entry code just in case one day fiddles around with the SP and does 
not align it to 16-bytes before accessing memory (or does it while 
interrupts are enabled).

> So I'm leaning towards just relying on that, given that
> CONFIG_VMAP_STACK is the default, although it is unfortunate that
> KASAN still disables it.

Thanks,
Will Deacon Nov. 22, 2018, 3:12 p.m. UTC | #20
On Thu, Nov 22, 2018 at 01:11:43PM +0100, Ard Biesheuvel wrote:
> On Thu, 22 Nov 2018 at 12:49, Julien Thierry <julien.thierry@arm.com> wrote:
> > On 26/09/18 14:56, Julien Thierry wrote:
> > There is just one case remaining, which is less important as it is
> > mostly a debug concern. If we take an SP alignment fault from EL1 (due
> > to bad implementation) we take recursive exceptions:
> >
> > - Without CONFIG_VMAP_STACK, kernel just freezes always taking SP
> > alignment faults
> > - With CONFIG_VMAP_STACK after taking a number of exceptions, we
> > overflow the stack and the kernel switches to the overflow stack where
> > it finally manages to display a kernel panic message
> >
> > So the question is, do we care about doing something for the above case?
> >
> 
> So in the latter case, it is obvious from the debug output that the
> stack pointer was misaligned? I'd expect so, given that each
> recursively taken exception has the same cause, and so it would be
> clear what happened.
> 
> So I'm leaning towards just relying on that, given that
> CONFIG_VMAP_STACK is the default, although it is unfortunate that
> KASAN still disables it.

Whilst it's unfortunate, I don't think it's the end of the world as long
as KASAN remains a debug-only feature. In that case, you'd only enable it
if you were debugging a suspected use-after-free, and I think the splat
you'd see from the overflow is clear enough that it's not one of those.

However, I really would like a written confirmation (i.e. in some
documentation) from GCC that they won't misalign the SP in generated
code before we drop the meat of this series.

Will