Message ID | 1537970184-44348-1-git-send-email-julien.thierry@arm.com (mailing list archive) |
---|---|
Headers | show |
Series | Ensure stack is aligned for kernel entries | expand |
Gentle ping on this series. On 26/09/18 14:56, Julien Thierry wrote: > Hi, > > Having SCTLR_ELx.SA enabled requires the SP to be 16-bytes aligned before > using it to access memory. When taking an exception, it is possible that > the context during which the exception occured had SP mis-aligned. The > entry code needs to make sure that the stack is aligned before using it to > save the context. > > This is only a concern when taking exception from an EL using the same > SP_ELx as the handler. In other cases it can be assumed that the SP being > picked up on exception entry is aligned under the condition that SP is > always aligned when doing eret to an EL using a different SP. > > On Juno I see a runtime difference <1% for hackbench. If I do not include > the fast path at EL1 (patch 4) I see a diff of 1-2%. > > For EL2 entries, a bit of clean up of stuff getting patched in the vector > has been needed. > > Cheers, > > Julien > > --> > > Julien Thierry (7): > arm64: Add static check for pt_regs alignment > arm64: sdei: Always use sdei stack for sdei events > arm64: Align stack when taking exception from EL1 > arm64: Add fast-path for stack alignment > arm64: Do not apply BP hardening for hyp entries from EL2 > arm64: Do not apply vector harderning for hyp entries from EL2 > arm64: kvm: Align stack for exception coming from EL2 > > arch/arm64/include/asm/assembler.h | 9 +++++ > arch/arm64/include/asm/ptrace.h | 2 + > arch/arm64/include/asm/sdei.h | 2 - > arch/arm64/kernel/cpu_errata.c | 10 ++++- > arch/arm64/kernel/entry.S | 43 +++++++++++++++++++-- > arch/arm64/kernel/sdei.c | 23 ++++------- > arch/arm64/kvm/hyp/hyp-entry.S | 78 +++++++++++++++++++++++++++++++------- > drivers/firmware/Kconfig | 1 + > 8 files changed, 132 insertions(+), 36 deletions(-) > > -- > 1.9.1 >
Hi Julien, On Wed, Sep 26, 2018 at 02:56:17PM +0100, Julien Thierry wrote: > Having SCTLR_ELx.SA enabled requires the SP to be 16-bytes aligned before > using it to access memory. When taking an exception, it is possible that > the context during which the exception occured had SP mis-aligned. The > entry code needs to make sure that the stack is aligned before using it to > save the context. Do you know what we haven't had reports of this crashing? Is it because GCC tends to keep the SP aligned anyway, so we're getting away with it for the moment? Trying to work out whether this is a candidate for -stable. Cheers, Will
Hi Will, On 07/11/18 21:58, Will Deacon wrote: > Hi Julien, > > On Wed, Sep 26, 2018 at 02:56:17PM +0100, Julien Thierry wrote: >> Having SCTLR_ELx.SA enabled requires the SP to be 16-bytes aligned before >> using it to access memory. When taking an exception, it is possible that >> the context during which the exception occured had SP mis-aligned. The >> entry code needs to make sure that the stack is aligned before using it to >> save the context. > > Do you know what we haven't had reports of this crashing? Is it because GCC > tends to keep the SP aligned anyway, so we're getting away with it for the > moment? Trying to work out whether this is a candidate for -stable. > I think that GCC tends to keep the SP aligned anyway is the most likely explanation, yes. I tried looking for specific options that could make this more likely, but all I could find was the option -mpreferred-stack-boundary only available for x86 and -mstack-alignment only provided by clang. Couldn't find anything yet on the gcc arm64 side that would either guarantee we'd have an aligned stack nor that GCC would make it very very likely. I can try to investigate a bit more. Thanks,
On 26 September 2018 at 15:56, Julien Thierry <julien.thierry@arm.com> wrote: > Hi, > > Having SCTLR_ELx.SA enabled requires the SP to be 16-bytes aligned before > using it to access memory. When taking an exception, it is possible that > the context during which the exception occured had SP mis-aligned. How is this possible? GCC clearly only manipulates the stack pointer in 16 byte multiples, and so if we do the same in our asm code (which I think we already do, given the lack of reports about this issue), is this handling really necessary? > The > entry code needs to make sure that the stack is aligned before using it to > save the context. > > This is only a concern when taking exception from an EL using the same > SP_ELx as the handler. In other cases it can be assumed that the SP being > picked up on exception entry is aligned under the condition that SP is > always aligned when doing eret to an EL using a different SP. > > On Juno I see a runtime difference <1% for hackbench. If I do not include > the fast path at EL1 (patch 4) I see a diff of 1-2%. > > For EL2 entries, a bit of clean up of stuff getting patched in the vector > has been needed. > > Cheers, > > Julien > > --> > > Julien Thierry (7): > arm64: Add static check for pt_regs alignment > arm64: sdei: Always use sdei stack for sdei events > arm64: Align stack when taking exception from EL1 > arm64: Add fast-path for stack alignment > arm64: Do not apply BP hardening for hyp entries from EL2 > arm64: Do not apply vector harderning for hyp entries from EL2 > arm64: kvm: Align stack for exception coming from EL2 > > arch/arm64/include/asm/assembler.h | 9 +++++ > arch/arm64/include/asm/ptrace.h | 2 + > arch/arm64/include/asm/sdei.h | 2 - > arch/arm64/kernel/cpu_errata.c | 10 ++++- > arch/arm64/kernel/entry.S | 43 +++++++++++++++++++-- > arch/arm64/kernel/sdei.c | 23 ++++------- > arch/arm64/kvm/hyp/hyp-entry.S | 78 +++++++++++++++++++++++++++++++------- > drivers/firmware/Kconfig | 1 + > 8 files changed, 132 insertions(+), 36 deletions(-) > > -- > 1.9.1 > > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
On 08/11/18 13:04, Ard Biesheuvel wrote: > On 26 September 2018 at 15:56, Julien Thierry <julien.thierry@arm.com> wrote: >> Hi, >> >> Having SCTLR_ELx.SA enabled requires the SP to be 16-bytes aligned before >> using it to access memory. When taking an exception, it is possible that >> the context during which the exception occured had SP mis-aligned. > > How is this possible? GCC clearly only manipulates the stack pointer > in 16 byte multiples, and so if we do the same in our asm code (which > I think we already do, given the lack of reports about this issue), is > this handling really necessary? > Is there anything that actually gives us that guarantee from GCC? I agree that currently it looks like aarch64-<...>-gcc only manipulates SP aligned to 16 bytes, but I don't know whether that is certain. The series can be dropped if there is enough confidence that this won't happen. Thanks, > >> The >> entry code needs to make sure that the stack is aligned before using it to >> save the context. >> >> This is only a concern when taking exception from an EL using the same >> SP_ELx as the handler. In other cases it can be assumed that the SP being >> picked up on exception entry is aligned under the condition that SP is >> always aligned when doing eret to an EL using a different SP. >> >> On Juno I see a runtime difference <1% for hackbench. If I do not include >> the fast path at EL1 (patch 4) I see a diff of 1-2%. >> >> For EL2 entries, a bit of clean up of stuff getting patched in the vector >> has been needed. >> >> Cheers, >> >> Julien >> >> --> >> >> Julien Thierry (7): >> arm64: Add static check for pt_regs alignment >> arm64: sdei: Always use sdei stack for sdei events >> arm64: Align stack when taking exception from EL1 >> arm64: Add fast-path for stack alignment >> arm64: Do not apply BP hardening for hyp entries from EL2 >> arm64: Do not apply vector harderning for hyp entries from EL2 >> arm64: kvm: Align stack for exception coming from EL2 >> >> arch/arm64/include/asm/assembler.h | 9 +++++ >> arch/arm64/include/asm/ptrace.h | 2 + >> arch/arm64/include/asm/sdei.h | 2 - >> arch/arm64/kernel/cpu_errata.c | 10 ++++- >> arch/arm64/kernel/entry.S | 43 +++++++++++++++++++-- >> arch/arm64/kernel/sdei.c | 23 ++++------- >> arch/arm64/kvm/hyp/hyp-entry.S | 78 +++++++++++++++++++++++++++++++------- >> drivers/firmware/Kconfig | 1 + >> 8 files changed, 132 insertions(+), 36 deletions(-) >> >> -- >> 1.9.1 >> >> _______________________________________________ >> linux-arm-kernel mailing list >> linux-arm-kernel@lists.infradead.org >> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
(+ Ramana) On 8 November 2018 at 14:27, Julien Thierry <julien.thierry@arm.com> wrote: > > > On 08/11/18 13:04, Ard Biesheuvel wrote: >> >> On 26 September 2018 at 15:56, Julien Thierry <julien.thierry@arm.com> >> wrote: >>> >>> Hi, >>> >>> Having SCTLR_ELx.SA enabled requires the SP to be 16-bytes aligned before >>> using it to access memory. When taking an exception, it is possible that >>> the context during which the exception occured had SP mis-aligned. >> >> >> How is this possible? GCC clearly only manipulates the stack pointer >> in 16 byte multiples, and so if we do the same in our asm code (which >> I think we already do, given the lack of reports about this issue), is >> this handling really necessary? >> > > Is there anything that actually gives us that guarantee from GCC? I agree > that currently it looks like aarch64-<...>-gcc only manipulates SP aligned > to 16 bytes, but I don't know whether that is certain. > I think we should get that clarified then. I don't think it makes sense for GCC to have to reason about whether SP currently has a value that permits dereferencing. > The series can be dropped if there is enough confidence that this won't > happen. > > Thanks, > > >> >>> The >>> entry code needs to make sure that the stack is aligned before using it >>> to >>> save the context. >>> >>> This is only a concern when taking exception from an EL using the same >>> SP_ELx as the handler. In other cases it can be assumed that the SP being >>> picked up on exception entry is aligned under the condition that SP is >>> always aligned when doing eret to an EL using a different SP. >>> >>> On Juno I see a runtime difference <1% for hackbench. If I do not include >>> the fast path at EL1 (patch 4) I see a diff of 1-2%. >>> >>> For EL2 entries, a bit of clean up of stuff getting patched in the vector >>> has been needed. >>> >>> Cheers, >>> >>> Julien >>> >>> --> >>> >>> Julien Thierry (7): >>> arm64: Add static check for pt_regs alignment >>> arm64: sdei: Always use sdei stack for sdei events >>> arm64: Align stack when taking exception from EL1 >>> arm64: Add fast-path for stack alignment >>> arm64: Do not apply BP hardening for hyp entries from EL2 >>> arm64: Do not apply vector harderning for hyp entries from EL2 >>> arm64: kvm: Align stack for exception coming from EL2 >>> >>> arch/arm64/include/asm/assembler.h | 9 +++++ >>> arch/arm64/include/asm/ptrace.h | 2 + >>> arch/arm64/include/asm/sdei.h | 2 - >>> arch/arm64/kernel/cpu_errata.c | 10 ++++- >>> arch/arm64/kernel/entry.S | 43 +++++++++++++++++++-- >>> arch/arm64/kernel/sdei.c | 23 ++++------- >>> arch/arm64/kvm/hyp/hyp-entry.S | 78 >>> +++++++++++++++++++++++++++++++------- >>> drivers/firmware/Kconfig | 1 + >>> 8 files changed, 132 insertions(+), 36 deletions(-) >>> >>> -- >>> 1.9.1 >>> >>> _______________________________________________ >>> linux-arm-kernel mailing list >>> linux-arm-kernel@lists.infradead.org >>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel > > > -- > Julien Thierry
On 08/11/2018 14:10, Ard Biesheuvel wrote: > (+ Ramana) > > On 8 November 2018 at 14:27, Julien Thierry <julien.thierry@arm.com> wrote: >> >> >> On 08/11/18 13:04, Ard Biesheuvel wrote: >>> >>> On 26 September 2018 at 15:56, Julien Thierry <julien.thierry@arm.com> >>> wrote: >>>> >>>> Hi, >>>> >>>> Having SCTLR_ELx.SA enabled requires the SP to be 16-bytes aligned before >>>> using it to access memory. When taking an exception, it is possible that >>>> the context during which the exception occured had SP mis-aligned. >>> >>> >>> How is this possible? GCC clearly only manipulates the stack pointer >>> in 16 byte multiples, and so if we do the same in our asm code (which >>> I think we already do, given the lack of reports about this issue), is >>> this handling really necessary? >>> >> >> Is there anything that actually gives us that guarantee from GCC? I agree >> that currently it looks like aarch64-<...>-gcc only manipulates SP aligned >> to 16 bytes, but I don't know whether that is certain. >> > > I think we should get that clarified then. I don't think it makes > sense for GCC to have to reason about whether SP currently has a value > that permits dereferencing. The ABI gives that guarantee. http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf <quote> 5.2.2.1 Universal stack constraints <...> Additionally, at any point at which memory is accessed via SP, the hardware requires that SP mod 16 = 0. The stack must be quad-word aligned </end quote> regards Ramana
On 08/11/18 14:19, Ramana Radhakrishnan wrote: > On 08/11/2018 14:10, Ard Biesheuvel wrote: >> (+ Ramana) >> >> On 8 November 2018 at 14:27, Julien Thierry <julien.thierry@arm.com> wrote: >>> >>> >>> On 08/11/18 13:04, Ard Biesheuvel wrote: >>>> >>>> On 26 September 2018 at 15:56, Julien Thierry <julien.thierry@arm.com> >>>> wrote: >>>>> >>>>> Hi, >>>>> >>>>> Having SCTLR_ELx.SA enabled requires the SP to be 16-bytes aligned before >>>>> using it to access memory. When taking an exception, it is possible that >>>>> the context during which the exception occured had SP mis-aligned. >>>> >>>> >>>> How is this possible? GCC clearly only manipulates the stack pointer >>>> in 16 byte multiples, and so if we do the same in our asm code (which >>>> I think we already do, given the lack of reports about this issue), is >>>> this handling really necessary? >>>> >>> >>> Is there anything that actually gives us that guarantee from GCC? I agree >>> that currently it looks like aarch64-<...>-gcc only manipulates SP aligned >>> to 16 bytes, but I don't know whether that is certain. >>> >> >> I think we should get that clarified then. I don't think it makes >> sense for GCC to have to reason about whether SP currently has a value >> that permits dereferencing. > > The ABI gives that guarantee. > > http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf > > <quote> > > 5.2.2.1 Universal stack constraints > > <...> > > Additionally, at any point at which memory is accessed > via SP, the hardware requires that SP mod 16 = 0. The stack must be > quad-word aligned > > </end quote> > Thanks Ramana. Sadly I don't think this clarifies things. The issue is that the guarantee is only that SP is aligned when we will use it to access memory, but still allows for a scenario like: -> Updating SP with non 16bytes-aligned value (it's fine as long as the following code takes care to align it before accessing memory) -> IRQ is raised before SP gets aligned -> Vector entry starts saving context on misaligned stack -> Misaligned SP exception The only thing that would avoid us the trouble is a guarantee that GCC never modifies SP in such a way that SP is not 16-bytes aligned. Thanks,
On Thu, Nov 08, 2018 at 02:19:14PM +0000, Ramana Radhakrishnan wrote: > On 08/11/2018 14:10, Ard Biesheuvel wrote: > > (+ Ramana) > > > > On 8 November 2018 at 14:27, Julien Thierry <julien.thierry@arm.com> wrote: > >> > >> > >> On 08/11/18 13:04, Ard Biesheuvel wrote: > >>> > >>> On 26 September 2018 at 15:56, Julien Thierry <julien.thierry@arm.com> > >>> wrote: > >>>> > >>>> Hi, > >>>> > >>>> Having SCTLR_ELx.SA enabled requires the SP to be 16-bytes aligned before > >>>> using it to access memory. When taking an exception, it is possible that > >>>> the context during which the exception occured had SP mis-aligned. > >>> > >>> > >>> How is this possible? GCC clearly only manipulates the stack pointer > >>> in 16 byte multiples, and so if we do the same in our asm code (which > >>> I think we already do, given the lack of reports about this issue), is > >>> this handling really necessary? > >>> > >> > >> Is there anything that actually gives us that guarantee from GCC? I agree > >> that currently it looks like aarch64-<...>-gcc only manipulates SP aligned > >> to 16 bytes, but I don't know whether that is certain. > >> > > > > I think we should get that clarified then. I don't think it makes > > sense for GCC to have to reason about whether SP currently has a value > > that permits dereferencing. > > The ABI gives that guarantee. > > http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf > > <quote> Surely This only applies at public interfaces? We make be re-entering the kernel due to an exception taken in the middle of a function. In theory, the compiler is allowed to temporarily misalign SP (i.e., the procedure call standard doesn't forbid that). So, we'd need to be confident that GCC and LLVM and any other compiler we care about _guarantee_ never to do that. (And that there is not asm in the kernel that will do so.) Cheers ---Dave
On 08/11/2018 15:30, Dave Martin wrote: > On Thu, Nov 08, 2018 at 02:19:14PM +0000, Ramana Radhakrishnan wrote: >> On 08/11/2018 14:10, Ard Biesheuvel wrote: >>> (+ Ramana) >>> >>> On 8 November 2018 at 14:27, Julien Thierry <julien.thierry@arm.com> wrote: >>>> >>>> >>>> On 08/11/18 13:04, Ard Biesheuvel wrote: >>>>> >>>>> On 26 September 2018 at 15:56, Julien Thierry <julien.thierry@arm.com> >>>>> wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> Having SCTLR_ELx.SA enabled requires the SP to be 16-bytes aligned before >>>>>> using it to access memory. When taking an exception, it is possible that >>>>>> the context during which the exception occured had SP mis-aligned. >>>>> >>>>> >>>>> How is this possible? GCC clearly only manipulates the stack pointer >>>>> in 16 byte multiples, and so if we do the same in our asm code (which >>>>> I think we already do, given the lack of reports about this issue), is >>>>> this handling really necessary? >>>>> >>>> >>>> Is there anything that actually gives us that guarantee from GCC? I agree >>>> that currently it looks like aarch64-<...>-gcc only manipulates SP aligned >>>> to 16 bytes, but I don't know whether that is certain. >>>> >>> >>> I think we should get that clarified then. I don't think it makes >>> sense for GCC to have to reason about whether SP currently has a value >>> that permits dereferencing. >> >> The ABI gives that guarantee. >> >> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf >> >> <quote> > > Surely This only applies at public interfaces? > I don't think this has anything to do with public interfaces. If there is a trap with a 16byte misaligned access of the SP then it doesn't matter whether it's a public interface or not. regards Ramana
On 08/11/2018 15:01, Julien Thierry wrote: > > > On 08/11/18 14:19, Ramana Radhakrishnan wrote: >> On 08/11/2018 14:10, Ard Biesheuvel wrote: >>> (+ Ramana) >>> >>> On 8 November 2018 at 14:27, Julien Thierry <julien.thierry@arm.com> wrote: >>>> >>>> >>>> On 08/11/18 13:04, Ard Biesheuvel wrote: >>>>> >>>>> On 26 September 2018 at 15:56, Julien Thierry <julien.thierry@arm.com> >>>>> wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> Having SCTLR_ELx.SA enabled requires the SP to be 16-bytes aligned before >>>>>> using it to access memory. When taking an exception, it is possible that >>>>>> the context during which the exception occured had SP mis-aligned. >>>>> >>>>> >>>>> How is this possible? GCC clearly only manipulates the stack pointer >>>>> in 16 byte multiples, and so if we do the same in our asm code (which >>>>> I think we already do, given the lack of reports about this issue), is >>>>> this handling really necessary? >>>>> >>>> >>>> Is there anything that actually gives us that guarantee from GCC? I agree >>>> that currently it looks like aarch64-<...>-gcc only manipulates SP aligned >>>> to 16 bytes, but I don't know whether that is certain. >>>> >>> >>> I think we should get that clarified then. I don't think it makes >>> sense for GCC to have to reason about whether SP currently has a value >>> that permits dereferencing. >> >> The ABI gives that guarantee. >> >> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf >> >> <quote> >> >> 5.2.2.1 Universal stack constraints >> >> <...> >> >> Additionally, at any point at which memory is accessed >> via SP, the hardware requires that SP mod 16 = 0. The stack must be >> quad-word aligned >> >> </end quote> >> > > Thanks Ramana. Sadly I don't think this clarifies things. The issue is > that the guarantee is only that SP is aligned when we will use it to > access memory, but still allows for a scenario like: That is certainly a correct interpretation of the ABI language. > > -> Updating SP with non 16bytes-aligned value (it's fine as long as the > following code takes care to align it before accessing memory) > -> IRQ is raised before SP gets aligned > -> Vector entry starts saving context on misaligned stack > -> Misaligned SP exception > > The only thing that would avoid us the trouble is a guarantee that GCC > never modifies SP in such a way that SP is not 16-bytes aligned. I think it sort of falls out in the implementation from what I remember and see (and empirically checked with a couple of colleagues). I don't think there is an ABI requirement that SP should left 16 byte aligned even for intermediate calculations. Whether GCC does this or not today is immaterial and for userland it's certainly not the only code generator in town for this sort of question. The question also arises with other jits and other code generators which may leave the stack temporarily not aligned at 16 bytes. So it does sound like the right thing to do in the kernel defensively. regards Ramana > > Thanks, >
On Thu, Nov 08, 2018 at 03:33:01PM +0000, Ramana Radhakrishnan wrote: > On 08/11/2018 15:30, Dave Martin wrote: > > On Thu, Nov 08, 2018 at 02:19:14PM +0000, Ramana Radhakrishnan wrote: > >> On 08/11/2018 14:10, Ard Biesheuvel wrote: > >>> (+ Ramana) > >>> > >>> On 8 November 2018 at 14:27, Julien Thierry <julien.thierry@arm.com> wrote: > >>>> > >>>> > >>>> On 08/11/18 13:04, Ard Biesheuvel wrote: > >>>>> > >>>>> On 26 September 2018 at 15:56, Julien Thierry <julien.thierry@arm.com> > >>>>> wrote: > >>>>>> > >>>>>> Hi, > >>>>>> > >>>>>> Having SCTLR_ELx.SA enabled requires the SP to be 16-bytes aligned before > >>>>>> using it to access memory. When taking an exception, it is possible that > >>>>>> the context during which the exception occured had SP mis-aligned. > >>>>> > >>>>> > >>>>> How is this possible? GCC clearly only manipulates the stack pointer > >>>>> in 16 byte multiples, and so if we do the same in our asm code (which > >>>>> I think we already do, given the lack of reports about this issue), is > >>>>> this handling really necessary? > >>>>> > >>>> > >>>> Is there anything that actually gives us that guarantee from GCC? I agree > >>>> that currently it looks like aarch64-<...>-gcc only manipulates SP aligned > >>>> to 16 bytes, but I don't know whether that is certain. > >>>> > >>> > >>> I think we should get that clarified then. I don't think it makes > >>> sense for GCC to have to reason about whether SP currently has a value > >>> that permits dereferencing. > >> > >> The ABI gives that guarantee. > >> > >> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf > >> > >> <quote> > > > > Surely This only applies at public interfaces? > > > > I don't think this has anything to do with public interfaces. If there > is a trap with a 16byte misaligned access of the SP then it doesn't > matter whether it's a public interface or not. We're not talking about SP alignment faults here particluarly. We're talking about any exception that may be taken from EL1h to EL1h, which may happen on random instructions inside a function, for random reasons. There was talk about running the kernel mostly in EL1t but I don't think we currently do this (somebody please put me right if I'm wrong here!) Cheers ---Dave
On 08/11/18 15:33, Ramana Radhakrishnan wrote: > On 08/11/2018 15:01, Julien Thierry wrote: >> >> >> On 08/11/18 14:19, Ramana Radhakrishnan wrote: >>> On 08/11/2018 14:10, Ard Biesheuvel wrote: >>>> (+ Ramana) >>>> >>>> On 8 November 2018 at 14:27, Julien Thierry <julien.thierry@arm.com> wrote: >>>>> >>>>> >>>>> On 08/11/18 13:04, Ard Biesheuvel wrote: >>>>>> >>>>>> On 26 September 2018 at 15:56, Julien Thierry <julien.thierry@arm.com> >>>>>> wrote: >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> Having SCTLR_ELx.SA enabled requires the SP to be 16-bytes aligned before >>>>>>> using it to access memory. When taking an exception, it is possible that >>>>>>> the context during which the exception occured had SP mis-aligned. >>>>>> >>>>>> >>>>>> How is this possible? GCC clearly only manipulates the stack pointer >>>>>> in 16 byte multiples, and so if we do the same in our asm code (which >>>>>> I think we already do, given the lack of reports about this issue), is >>>>>> this handling really necessary? >>>>>> >>>>> >>>>> Is there anything that actually gives us that guarantee from GCC? I agree >>>>> that currently it looks like aarch64-<...>-gcc only manipulates SP aligned >>>>> to 16 bytes, but I don't know whether that is certain. >>>>> >>>> >>>> I think we should get that clarified then. I don't think it makes >>>> sense for GCC to have to reason about whether SP currently has a value >>>> that permits dereferencing. >>> >>> The ABI gives that guarantee. >>> >>> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf >>> >>> <quote> >>> >>> 5.2.2.1 Universal stack constraints >>> >>> <...> >>> >>> Additionally, at any point at which memory is accessed >>> via SP, the hardware requires that SP mod 16 = 0. The stack must be >>> quad-word aligned >>> >>> </end quote> >>> >> >> Thanks Ramana. Sadly I don't think this clarifies things. The issue is >> that the guarantee is only that SP is aligned when we will use it to >> access memory, but still allows for a scenario like: > > > That is certainly a correct interpretation of the ABI language. > >> >> -> Updating SP with non 16bytes-aligned value (it's fine as long as the >> following code takes care to align it before accessing memory) >> -> IRQ is raised before SP gets aligned >> -> Vector entry starts saving context on misaligned stack >> -> Misaligned SP exception >> >> The only thing that would avoid us the trouble is a guarantee that GCC >> never modifies SP in such a way that SP is not 16-bytes aligned. > > > I think it sort of falls out in the implementation from what I remember > and see (and empirically checked with a couple of colleagues). > > I don't think there is an ABI requirement that SP should left 16 byte > aligned even for intermediate calculations. > > Whether GCC does this or not today is immaterial and for userland it's > certainly not the only code generator in town for this sort of question. > The question also arises with other jits and other code generators which > may leave the stack temporarily not aligned at 16 bytes. So it does > sound like the right thing to do in the kernel defensively. > I don't think we really mind about userland. EL1 can (and, in Linux, does) use it's own SP. So having an interrupt while EL0 has a misaligned SP is not an issue for the kernel. The cases covered by this series in only for exceptions received at the same EL that will handle that exception. Cheers,
On 8 November 2018 at 16:39, Dave Martin <Dave.Martin@arm.com> wrote: > On Thu, Nov 08, 2018 at 03:33:01PM +0000, Ramana Radhakrishnan wrote: >> On 08/11/2018 15:30, Dave Martin wrote: >> > On Thu, Nov 08, 2018 at 02:19:14PM +0000, Ramana Radhakrishnan wrote: >> >> On 08/11/2018 14:10, Ard Biesheuvel wrote: >> >>> (+ Ramana) >> >>> >> >>> On 8 November 2018 at 14:27, Julien Thierry <julien.thierry@arm.com> wrote: >> >>>> >> >>>> >> >>>> On 08/11/18 13:04, Ard Biesheuvel wrote: >> >>>>> >> >>>>> On 26 September 2018 at 15:56, Julien Thierry <julien.thierry@arm.com> >> >>>>> wrote: >> >>>>>> >> >>>>>> Hi, >> >>>>>> >> >>>>>> Having SCTLR_ELx.SA enabled requires the SP to be 16-bytes aligned before >> >>>>>> using it to access memory. When taking an exception, it is possible that >> >>>>>> the context during which the exception occured had SP mis-aligned. >> >>>>> >> >>>>> >> >>>>> How is this possible? GCC clearly only manipulates the stack pointer >> >>>>> in 16 byte multiples, and so if we do the same in our asm code (which >> >>>>> I think we already do, given the lack of reports about this issue), is >> >>>>> this handling really necessary? >> >>>>> >> >>>> >> >>>> Is there anything that actually gives us that guarantee from GCC? I agree >> >>>> that currently it looks like aarch64-<...>-gcc only manipulates SP aligned >> >>>> to 16 bytes, but I don't know whether that is certain. >> >>>> >> >>> >> >>> I think we should get that clarified then. I don't think it makes >> >>> sense for GCC to have to reason about whether SP currently has a value >> >>> that permits dereferencing. >> >> >> >> The ABI gives that guarantee. >> >> >> >> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf >> >> >> >> <quote> >> > >> > Surely This only applies at public interfaces? >> > >> >> I don't think this has anything to do with public interfaces. If there >> is a trap with a 16byte misaligned access of the SP then it doesn't >> matter whether it's a public interface or not. > > We're not talking about SP alignment faults here particluarly. > > We're talking about any exception that may be taken from EL1h to EL1h, > which may happen on random instructions inside a function, for random > reasons. > Indeed. But my question remains how likely it is that the compiler we use for generating the kernel code (so we are not talking about userland JITs or other crazy stuff here) would play with SP like that, especially since it is no longer a general purpose register. To me, it would make sense to attempt to reach an agreement with the GCC folks that compiler generated code does not muck about with SP like that. In asm, there is one notable exception, of course, where we swap x0 and sp temporarily in the entry path, but we can't get interrupted there anyway. In any case, if people insist, I'm ok with it. To me, it just seems like unnecessary clutter for a theoretical issue.
> > Indeed. > > But my question remains how likely it is that the compiler we use for > generating the kernel code (so we are not talking about userland JITs > or other crazy stuff here) would play with SP like that, especially > since it is no longer a general purpose register. To me, it would make > sense to attempt to reach an agreement with the GCC folks that > compiler generated code does not muck about with SP like that. It falls out from the choice of instructions we use for this sort of thing and because frame sizes really get rounded up to 16 bytes. In the explanation below assume the stack is aligned to 16 bytes on entry. Frame sizes are always a multiple of 16. For frame sizes up to 240 bytes that's a stp fp, lr, [sp, #-<off>]! For frame sizes up to 4k bytes, that's a single sub instruction. Again all frame sizes are 16 byte aligned and therefore it's all ok. For frame sizes greater than this and up to 64k that's a mov to a temporary register with a 16 bit immediate followed by a single subtract, again not going to leave your stack frame misaligned. From frame sizes above that we split this into a subtract with a multiple of 4k (again 16 byte aligned) and the rest. For frame size above 16MB we use a sequence of mov / movk's with a temporary register and then do a single subtract of SP and therefore that's also fine. I think for the sake of this conversation with respect to compiling the kernel, I think we can safely say that GCC will end up leaving the stack 16 byte aligned even with the intermediate computations and that shouldn't be an issue from my reading of the AArch64 backend. So, probably move on but if you really want to be defensive you may want to carry this patch. However there's nothing that I will say about hand-written assembly (and I say that in the interest of completeness). regards Ramana
On 8 November 2018 at 17:57, Ramana Radhakrishnan <Ramana.Radhakrishnan@arm.com> wrote: > >> >> Indeed. >> >> But my question remains how likely it is that the compiler we use for >> generating the kernel code (so we are not talking about userland JITs >> or other crazy stuff here) would play with SP like that, especially >> since it is no longer a general purpose register. To me, it would make >> sense to attempt to reach an agreement with the GCC folks that >> compiler generated code does not muck about with SP like that. > > > It falls out from the choice of instructions we use for this sort of > thing and because frame sizes really get rounded up to 16 bytes. > > In the explanation below assume the stack is aligned to 16 bytes on > entry. Frame sizes are always a multiple of 16. > > For frame sizes up to 240 bytes that's a stp fp, lr, [sp, #-<off>]! > > For frame sizes up to 4k bytes, that's a single sub instruction. Again > all frame sizes are 16 byte aligned and therefore it's all ok. > > For frame sizes greater than this and up to 64k that's a mov to a > temporary register with a 16 bit immediate followed by a single > subtract, again not going to leave your stack frame misaligned. > > From frame sizes above that we split this into a subtract with a > multiple of 4k (again 16 byte aligned) and the rest. > > For frame size above 16MB we use a sequence of mov / movk's with a > temporary register and then do a single subtract of SP and therefore > that's also fine. > Thanks Ramana. In the kernel, stack frames are typically limited to ~1 KB in size, so most of these will never even be encountered in our case. > I think for the sake of this conversation with respect to compiling the > kernel, I think we can safely say that GCC will end up leaving the stack > 16 byte aligned even with the intermediate computations and that > shouldn't be an issue from my reading of the AArch64 backend. > > So, probably move on but if you really want to be defensive you may want > to carry this patch. However there's nothing that I will say about > hand-written assembly (and I say that in the interest of completeness). > indeed, hand written assembly is an entirely different matter, but that is a manageable quantity of code which is currenty clean in this regard, as far as we know.
Hi, On 26/09/18 14:56, Julien Thierry wrote: > Hi, > > Having SCTLR_ELx.SA enabled requires the SP to be 16-bytes aligned before > using it to access memory. When taking an exception, it is possible that > the context during which the exception occured had SP mis-aligned. The > entry code needs to make sure that the stack is aligned before using it to > save the context. > So, as discussed in this thread: https://www.spinics.net/lists/arm-kernel/msg688342.html We might have the compiler provide the guarantee that SP is kept at multiples of 16-bytes throughout C functions. If this is accepted, it avoids the complexity of this series. There is just one case remaining, which is less important as it is mostly a debug concern. If we take an SP alignment fault from EL1 (due to bad implementation) we take recursive exceptions: - Without CONFIG_VMAP_STACK, kernel just freezes always taking SP alignment faults - With CONFIG_VMAP_STACK after taking a number of exceptions, we overflow the stack and the kernel switches to the overflow stack where it finally manages to display a kernel panic message So the question is, do we care about doing something for the above case? Thanks,
On Thu, 22 Nov 2018 at 12:49, Julien Thierry <julien.thierry@arm.com> wrote: > > Hi, > > On 26/09/18 14:56, Julien Thierry wrote: > > Hi, > > > > Having SCTLR_ELx.SA enabled requires the SP to be 16-bytes aligned before > > using it to access memory. When taking an exception, it is possible that > > the context during which the exception occured had SP mis-aligned. The > > entry code needs to make sure that the stack is aligned before using it to > > save the context. > > > > So, as discussed in this thread: > https://www.spinics.net/lists/arm-kernel/msg688342.html > > We might have the compiler provide the guarantee that SP is kept at > multiples of 16-bytes throughout C functions. If this is accepted, it > avoids the complexity of this series. > Yes, but we still want the build time sizeof(pt_regs) check and perhaps other pieces of this series. > There is just one case remaining, which is less important as it is > mostly a debug concern. If we take an SP alignment fault from EL1 (due > to bad implementation) we take recursive exceptions: > > - Without CONFIG_VMAP_STACK, kernel just freezes always taking SP > alignment faults > - With CONFIG_VMAP_STACK after taking a number of exceptions, we > overflow the stack and the kernel switches to the overflow stack where > it finally manages to display a kernel panic message > > So the question is, do we care about doing something for the above case? > So in the latter case, it is obvious from the debug output that the stack pointer was misaligned? I'd expect so, given that each recursively taken exception has the same cause, and so it would be clear what happened. So I'm leaning towards just relying on that, given that CONFIG_VMAP_STACK is the default, although it is unfortunate that KASAN still disables it.
On 22/11/18 12:11, Ard Biesheuvel wrote: > On Thu, 22 Nov 2018 at 12:49, Julien Thierry <julien.thierry@arm.com> wrote: >> >> Hi, >> >> On 26/09/18 14:56, Julien Thierry wrote: >>> Hi, >>> >>> Having SCTLR_ELx.SA enabled requires the SP to be 16-bytes aligned before >>> using it to access memory. When taking an exception, it is possible that >>> the context during which the exception occured had SP mis-aligned. The >>> entry code needs to make sure that the stack is aligned before using it to >>> save the context. >>> >> >> So, as discussed in this thread: >> https://www.spinics.net/lists/arm-kernel/msg688342.html >> >> We might have the compiler provide the guarantee that SP is kept at >> multiples of 16-bytes throughout C functions. If this is accepted, it >> avoids the complexity of this series. >> > > Yes, but we still want the build time sizeof(pt_regs) check and > perhaps other pieces of this series. > Good point. >> There is just one case remaining, which is less important as it is >> mostly a debug concern. If we take an SP alignment fault from EL1 (due >> to bad implementation) we take recursive exceptions: >> >> - Without CONFIG_VMAP_STACK, kernel just freezes always taking SP >> alignment faults >> - With CONFIG_VMAP_STACK after taking a number of exceptions, we >> overflow the stack and the kernel switches to the overflow stack where >> it finally manages to display a kernel panic message >> >> So the question is, do we care about doing something for the above case? >> > > So in the latter case, it is obvious from the debug output that the > stack pointer was misaligned? I'd expect so, given that each > recursively taken exception has the same cause, and so it would be > clear what happened. > So, it is obvious that the error is a misaligned stack pointer as it is described in the ESR and as you said, we are always taking the same kind of exception. However the ELR will get overwritten by the recursive exception and its last value will be the first instruction accessing memory through SP in the exception handler. Meaning we lost the address where the first misaligned stack access happened. But then I don't know whether this case warrants adding complexity to the entry code just in case one day fiddles around with the SP and does not align it to 16-bytes before accessing memory (or does it while interrupts are enabled). > So I'm leaning towards just relying on that, given that > CONFIG_VMAP_STACK is the default, although it is unfortunate that > KASAN still disables it. Thanks,
On Thu, Nov 22, 2018 at 01:11:43PM +0100, Ard Biesheuvel wrote: > On Thu, 22 Nov 2018 at 12:49, Julien Thierry <julien.thierry@arm.com> wrote: > > On 26/09/18 14:56, Julien Thierry wrote: > > There is just one case remaining, which is less important as it is > > mostly a debug concern. If we take an SP alignment fault from EL1 (due > > to bad implementation) we take recursive exceptions: > > > > - Without CONFIG_VMAP_STACK, kernel just freezes always taking SP > > alignment faults > > - With CONFIG_VMAP_STACK after taking a number of exceptions, we > > overflow the stack and the kernel switches to the overflow stack where > > it finally manages to display a kernel panic message > > > > So the question is, do we care about doing something for the above case? > > > > So in the latter case, it is obvious from the debug output that the > stack pointer was misaligned? I'd expect so, given that each > recursively taken exception has the same cause, and so it would be > clear what happened. > > So I'm leaning towards just relying on that, given that > CONFIG_VMAP_STACK is the default, although it is unfortunate that > KASAN still disables it. Whilst it's unfortunate, I don't think it's the end of the world as long as KASAN remains a debug-only feature. In that case, you'd only enable it if you were debugging a suspected use-after-free, and I think the splat you'd see from the overflow is clear enough that it's not one of those. However, I really would like a written confirmation (i.e. in some documentation) from GCC that they won't misalign the SP in generated code before we drop the meat of this series. Will