Message ID | 20200319091407.51449-1-remi@remlab.net (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | clean up KPTI / SDEI trampoline data alignment | expand |
On Thu, Mar 19, 2020 at 11:14:05AM +0200, Rémi Denis-Courmont wrote: > From: Rémi Denis-Courmont <remi.denis.courmont@huawei.com> > > This switches from custom instruction patterns to the regular large > memory model sequence with ADRP and LDR. In doing so, the ADD > instruction can be eliminated in the SDEI handler, and the code no > longer assumes that the trampoline vectors and the vectors address both > start on a page boundary. > > Signed-off-by: Rémi Denis-Courmont <remi.denis.courmont@huawei.com> > --- > arch/arm64/kernel/entry.S | 9 ++++----- > 1 file changed, 4 insertions(+), 5 deletions(-) > > diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S > index e5d4e30ee242..24f828739696 100644 > --- a/arch/arm64/kernel/entry.S > +++ b/arch/arm64/kernel/entry.S > @@ -805,9 +805,9 @@ alternative_else_nop_endif > 2: > tramp_map_kernel x30 > #ifdef CONFIG_RANDOMIZE_BASE > - adr x30, tramp_vectors + PAGE_SIZE > + adrp x30, tramp_vectors + PAGE_SIZE > alternative_insn isb, nop, ARM64_WORKAROUND_QCOM_FALKOR_E1003 > - ldr x30, [x30] > + ldr x30, [x30, #:lo12:__entry_tramp_data_start] I think this is busted for !4K kernels once we reduce the alignment of __entry_tramp_data_start. The ADRP gives us a 64K aligned address (with bits 15:0 clear). The lo12 relocation gives us bits 11:0, so we haven't accounted for bits 15:12. I think that's what's causing the hang Catalin sees with 64K pages (and would also be a problem for 16K pages). Ideally, we'd account for those bits with the ADRP, but I'm not sure that an ELF relocation can encode symbol + addr + symbol:15-12, so we likely nned more instructions to explicitly mask that in. ... either that, or leave this page aligned. > #else > ldr x30, =vectors > #endif > @@ -953,9 +953,8 @@ SYM_CODE_START(__sdei_asm_entry_trampoline) > 1: str x4, [x1, #(SDEI_EVENT_INTREGS + S_ORIG_ADDR_LIMIT)] > > #ifdef CONFIG_RANDOMIZE_BASE > - adr x4, tramp_vectors + PAGE_SIZE > - add x4, x4, #:lo12:__sdei_asm_trampoline_next_handler > - ldr x4, [x4] > + adrp x4, tramp_vectors + PAGE_SIZE > + ldr x4, [x4, #:lo12:__sdei_asm_trampoline_next_handler] Likewise here. Thanks, Mark. > #else > ldr x4, =__sdei_asm_handler > #endif > -- > 2.26.0.rc2 >
Le maanantaina 23. maaliskuuta 2020, 14.07.00 EET Mark Rutland a écrit : > On Thu, Mar 19, 2020 at 11:14:05AM +0200, Rémi Denis-Courmont wrote: > > From: Rémi Denis-Courmont <remi.denis.courmont@huawei.com> > > > > This switches from custom instruction patterns to the regular large > > memory model sequence with ADRP and LDR. In doing so, the ADD > > instruction can be eliminated in the SDEI handler, and the code no > > longer assumes that the trampoline vectors and the vectors address both > > start on a page boundary. > > > > Signed-off-by: Rémi Denis-Courmont <remi.denis.courmont@huawei.com> > > --- > > > > arch/arm64/kernel/entry.S | 9 ++++----- > > 1 file changed, 4 insertions(+), 5 deletions(-) > > > > diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S > > index e5d4e30ee242..24f828739696 100644 > > --- a/arch/arm64/kernel/entry.S > > +++ b/arch/arm64/kernel/entry.S > > @@ -805,9 +805,9 @@ alternative_else_nop_endif > > > > 2: > > tramp_map_kernel x30 > > > > #ifdef CONFIG_RANDOMIZE_BASE > > > > - adr x30, tramp_vectors + PAGE_SIZE > > + adrp x30, tramp_vectors + PAGE_SIZE > > > > alternative_insn isb, nop, ARM64_WORKAROUND_QCOM_FALKOR_E1003 > > > > - ldr x30, [x30] > > + ldr x30, [x30, #:lo12:__entry_tramp_data_start] > > I think this is busted for !4K kernels once we reduce the alignment of > __entry_tramp_data_start. > > The ADRP gives us a 64K aligned address (with bits 15:0 clear). The lo12 > relocation gives us bits 11:0, so we haven't accounted for bits 15:12. IMU, ADRP gives a 4K aligned value, regardless of MMU (TCR) settings. I rather suspect that the problem is with my C code diff assuming that PAGE_MASK is 4095.
On Mon, Mar 23, 2020 at 02:08:53PM +0200, Rémi Denis-Courmont wrote: > Le maanantaina 23. maaliskuuta 2020, 14.07.00 EET Mark Rutland a écrit : > > On Thu, Mar 19, 2020 at 11:14:05AM +0200, Rémi Denis-Courmont wrote: > > > From: Rémi Denis-Courmont <remi.denis.courmont@huawei.com> > > > > > > This switches from custom instruction patterns to the regular large > > > memory model sequence with ADRP and LDR. In doing so, the ADD > > > instruction can be eliminated in the SDEI handler, and the code no > > > longer assumes that the trampoline vectors and the vectors address both > > > start on a page boundary. > > > > > > Signed-off-by: Rémi Denis-Courmont <remi.denis.courmont@huawei.com> > > > --- > > > > > > arch/arm64/kernel/entry.S | 9 ++++----- > > > 1 file changed, 4 insertions(+), 5 deletions(-) > > > > > > diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S > > > index e5d4e30ee242..24f828739696 100644 > > > --- a/arch/arm64/kernel/entry.S > > > +++ b/arch/arm64/kernel/entry.S > > > @@ -805,9 +805,9 @@ alternative_else_nop_endif > > > > > > 2: > > > tramp_map_kernel x30 > > > > > > #ifdef CONFIG_RANDOMIZE_BASE > > > > > > - adr x30, tramp_vectors + PAGE_SIZE > > > + adrp x30, tramp_vectors + PAGE_SIZE > > > > > > alternative_insn isb, nop, ARM64_WORKAROUND_QCOM_FALKOR_E1003 > > > > > > - ldr x30, [x30] > > > + ldr x30, [x30, #:lo12:__entry_tramp_data_start] > > > > I think this is busted for !4K kernels once we reduce the alignment of > > __entry_tramp_data_start. > > > > The ADRP gives us a 64K aligned address (with bits 15:0 clear). The lo12 > > relocation gives us bits 11:0, so we haven't accounted for bits 15:12. > > IMU, ADRP gives a 4K aligned value, regardless of MMU (TCR) settings. Sorry, I had erroneously assumed tramp_vectors was page aligned. The issue still stands -- we haven't accounted for bits 15:12, as those can differ between tramp_vectors and __entry_tramp_data_start. Thanks, Mark.
On Mon, Mar 23, 2020 at 12:14:37PM +0000, Mark Rutland wrote: > On Mon, Mar 23, 2020 at 02:08:53PM +0200, Rémi Denis-Courmont wrote: > > Le maanantaina 23. maaliskuuta 2020, 14.07.00 EET Mark Rutland a écrit : > > > On Thu, Mar 19, 2020 at 11:14:05AM +0200, Rémi Denis-Courmont wrote: > > > > From: Rémi Denis-Courmont <remi.denis.courmont@huawei.com> > > > > > > > > This switches from custom instruction patterns to the regular large > > > > memory model sequence with ADRP and LDR. In doing so, the ADD > > > > instruction can be eliminated in the SDEI handler, and the code no > > > > longer assumes that the trampoline vectors and the vectors address both > > > > start on a page boundary. > > > > > > > > Signed-off-by: Rémi Denis-Courmont <remi.denis.courmont@huawei.com> > > > > --- > > > > > > > > arch/arm64/kernel/entry.S | 9 ++++----- > > > > 1 file changed, 4 insertions(+), 5 deletions(-) > > > > > > > > diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S > > > > index e5d4e30ee242..24f828739696 100644 > > > > --- a/arch/arm64/kernel/entry.S > > > > +++ b/arch/arm64/kernel/entry.S > > > > @@ -805,9 +805,9 @@ alternative_else_nop_endif > > > > > > > > 2: > > > > tramp_map_kernel x30 > > > > > > > > #ifdef CONFIG_RANDOMIZE_BASE > > > > > > > > - adr x30, tramp_vectors + PAGE_SIZE > > > > + adrp x30, tramp_vectors + PAGE_SIZE > > > > > > > > alternative_insn isb, nop, ARM64_WORKAROUND_QCOM_FALKOR_E1003 > > > > > > > > - ldr x30, [x30] > > > > + ldr x30, [x30, #:lo12:__entry_tramp_data_start] > > > > > > I think this is busted for !4K kernels once we reduce the alignment of > > > __entry_tramp_data_start. > > > > > > The ADRP gives us a 64K aligned address (with bits 15:0 clear). The lo12 > > > relocation gives us bits 11:0, so we haven't accounted for bits 15:12. > > > > IMU, ADRP gives a 4K aligned value, regardless of MMU (TCR) settings. > > Sorry, I had erroneously assumed tramp_vectors was page aligned. The > issue still stands -- we haven't accounted for bits 15:12, as those can > differ between tramp_vectors and __entry_tramp_data_start. Should we just use adrp on __entry_tramp_data_start? Anyway, the diff below doesn't solve the issue I'm seeing (only reverting patch 3). diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S index ca1340eb46d8..4cc9d1df3985 100644 --- a/arch/arm64/kernel/entry.S +++ b/arch/arm64/kernel/entry.S @@ -810,7 +810,7 @@ alternative_else_nop_endif 2: tramp_map_kernel x30 #ifdef CONFIG_RANDOMIZE_BASE - adrp x30, tramp_vectors + PAGE_SIZE + adrp x30, __entry_tramp_data_start alternative_insn isb, nop, ARM64_WORKAROUND_QCOM_FALKOR_E1003 ldr x30, [x30, #:lo12:__entry_tramp_data_start] #else @@ -964,7 +964,7 @@ SYM_CODE_START(__sdei_asm_entry_trampoline) 1: str x4, [x1, #(SDEI_EVENT_INTREGS + S_ORIG_ADDR_LIMIT)] #ifdef CONFIG_RANDOMIZE_BASE - adrp x4, tramp_vectors + PAGE_SIZE + adrp x4, __sdei_asm_trampoline_next_handler ldr x4, [x4, #:lo12:__sdei_asm_trampoline_next_handler] #else ldr x4, =__sdei_asm_handler
Le maanantaina 23. maaliskuuta 2020, 21.04.09 EET Catalin Marinas a écrit : > On Mon, Mar 23, 2020 at 12:14:37PM +0000, Mark Rutland wrote: > > On Mon, Mar 23, 2020 at 02:08:53PM +0200, Rémi Denis-Courmont wrote: > > > Le maanantaina 23. maaliskuuta 2020, 14.07.00 EET Mark Rutland a écrit : > > > > On Thu, Mar 19, 2020 at 11:14:05AM +0200, Rémi Denis-Courmont wrote: > > > > > From: Rémi Denis-Courmont <remi.denis.courmont@huawei.com> > > > > > > > > > > This switches from custom instruction patterns to the regular large > > > > > memory model sequence with ADRP and LDR. In doing so, the ADD > > > > > instruction can be eliminated in the SDEI handler, and the code no > > > > > longer assumes that the trampoline vectors and the vectors address > > > > > both > > > > > start on a page boundary. > > > > > > > > > > Signed-off-by: Rémi Denis-Courmont <remi.denis.courmont@huawei.com> > > > > > --- > > > > > > > > > > arch/arm64/kernel/entry.S | 9 ++++----- > > > > > 1 file changed, 4 insertions(+), 5 deletions(-) > > > > > > > > > > diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S > > > > > index e5d4e30ee242..24f828739696 100644 > > > > > --- a/arch/arm64/kernel/entry.S > > > > > +++ b/arch/arm64/kernel/entry.S > > > > > @@ -805,9 +805,9 @@ alternative_else_nop_endif > > > > > > > > > > 2: > > > > > tramp_map_kernel x30 > > > > > > > > > > #ifdef CONFIG_RANDOMIZE_BASE > > > > > > > > > > - adr x30, tramp_vectors + PAGE_SIZE > > > > > + adrp x30, tramp_vectors + PAGE_SIZE > > > > > > > > > > alternative_insn isb, nop, ARM64_WORKAROUND_QCOM_FALKOR_E1003 > > > > > > > > > > - ldr x30, [x30] > > > > > + ldr x30, [x30, #:lo12:__entry_tramp_data_start] > > > > > > > > I think this is busted for !4K kernels once we reduce the alignment of > > > > __entry_tramp_data_start. > > > > > > > > The ADRP gives us a 64K aligned address (with bits 15:0 clear). The > > > > lo12 > > > > relocation gives us bits 11:0, so we haven't accounted for bits 15:12. > > > > > > IMU, ADRP gives a 4K aligned value, regardless of MMU (TCR) settings. > > > > Sorry, I had erroneously assumed tramp_vectors was page aligned. The > > issue still stands -- we haven't accounted for bits 15:12, as those can > > differ between tramp_vectors and __entry_tramp_data_start. Does that mean that the SDEI code never worked with page size > 4 KiB? > Should we just use adrp on __entry_tramp_data_start? Anyway, the diff > below doesn't solve the issue I'm seeing (only reverting patch 3). AFAIU, the preexisting code uses the manual PAGE_SIZE offset because the offset in the main vmlinux does not match the architected offset inside the fixmap. If so, then using the symbol directly will not work at all. > diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S > index ca1340eb46d8..4cc9d1df3985 100644 > --- a/arch/arm64/kernel/entry.S > +++ b/arch/arm64/kernel/entry.S > @@ -810,7 +810,7 @@ alternative_else_nop_endif > 2: > tramp_map_kernel x30 > #ifdef CONFIG_RANDOMIZE_BASE > - adrp x30, tramp_vectors + PAGE_SIZE > + adrp x30, __entry_tramp_data_start > alternative_insn isb, nop, ARM64_WORKAROUND_QCOM_FALKOR_E1003 > ldr x30, [x30, #:lo12:__entry_tramp_data_start] > #else > @@ -964,7 +964,7 @@ SYM_CODE_START(__sdei_asm_entry_trampoline) > 1: str x4, [x1, #(SDEI_EVENT_INTREGS + S_ORIG_ADDR_LIMIT)] > > #ifdef CONFIG_RANDOMIZE_BASE > - adrp x4, tramp_vectors + PAGE_SIZE > + adrp x4, __sdei_asm_trampoline_next_handler > ldr x4, [x4, #:lo12:__sdei_asm_trampoline_next_handler] > #else > ldr x4, =__sdei_asm_handler
On Mon, Mar 23, 2020 at 10:42:30PM +0200, Rémi Denis-Courmont wrote: > Le maanantaina 23. maaliskuuta 2020, 21.04.09 EET Catalin Marinas a écrit : > > Should we just use adrp on __entry_tramp_data_start? Anyway, the diff > > below doesn't solve the issue I'm seeing (only reverting patch 3). > > AFAIU, the preexisting code uses the manual PAGE_SIZE offset because the offset > in the main vmlinux does not match the architected offset inside the fixmap. If > so, then using the symbol directly will not work at all. You are right, it broke the defconfig as well.
On Mon, Mar 23, 2020 at 10:42:30PM +0200, Rémi Denis-Courmont wrote: > Le maanantaina 23. maaliskuuta 2020, 21.04.09 EET Catalin Marinas a écrit : > > On Mon, Mar 23, 2020 at 12:14:37PM +0000, Mark Rutland wrote: > > > On Mon, Mar 23, 2020 at 02:08:53PM +0200, Rémi Denis-Courmont wrote: > > > > Le maanantaina 23. maaliskuuta 2020, 14.07.00 EET Mark Rutland a écrit : > > > > > On Thu, Mar 19, 2020 at 11:14:05AM +0200, Rémi Denis-Courmont wrote: > > > > > > From: Rémi Denis-Courmont <remi.denis.courmont@huawei.com> > > > > > > > > > > > > This switches from custom instruction patterns to the regular large > > > > > > memory model sequence with ADRP and LDR. In doing so, the ADD > > > > > > instruction can be eliminated in the SDEI handler, and the code no > > > > > > longer assumes that the trampoline vectors and the vectors address > > > > > > both > > > > > > start on a page boundary. > > > > > > > > > > > > Signed-off-by: Rémi Denis-Courmont <remi.denis.courmont@huawei.com> > > > > > > --- > > > > > > > > > > > > arch/arm64/kernel/entry.S | 9 ++++----- > > > > > > 1 file changed, 4 insertions(+), 5 deletions(-) > > > > > > > > > > > > diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S > > > > > > index e5d4e30ee242..24f828739696 100644 > > > > > > --- a/arch/arm64/kernel/entry.S > > > > > > +++ b/arch/arm64/kernel/entry.S > > > > > > @@ -805,9 +805,9 @@ alternative_else_nop_endif > > > > > > > > > > > > 2: > > > > > > tramp_map_kernel x30 > > > > > > > > > > > > #ifdef CONFIG_RANDOMIZE_BASE > > > > > > > > > > > > - adr x30, tramp_vectors + PAGE_SIZE > > > > > > + adrp x30, tramp_vectors + PAGE_SIZE > > > > > > > > > > > > alternative_insn isb, nop, ARM64_WORKAROUND_QCOM_FALKOR_E1003 > > > > > > > > > > > > - ldr x30, [x30] > > > > > > + ldr x30, [x30, #:lo12:__entry_tramp_data_start] > > > > > > > > > > I think this is busted for !4K kernels once we reduce the alignment of > > > > > __entry_tramp_data_start. > > > > > > > > > > The ADRP gives us a 64K aligned address (with bits 15:0 clear). The > > > > > lo12 > > > > > relocation gives us bits 11:0, so we haven't accounted for bits 15:12. > > > > > > > > IMU, ADRP gives a 4K aligned value, regardless of MMU (TCR) settings. > > > > > > Sorry, I had erroneously assumed tramp_vectors was page aligned. The > > > issue still stands -- we haven't accounted for bits 15:12, as those can > > > differ between tramp_vectors and __entry_tramp_data_start. > > Does that mean that the SDEI code never worked with page size > 4 KiB? I think this happens to work, but is fragile. Because nothing happens to get placed in .rodata between the _entry_tramp_data_start data and the __sdei_asm_trampoline_next_handler data, the __sdei_asm_trampoline_next_handler data doesn't spill into a separate page from the _entry_tramp_data_start data. If we did start adding stuff into .rodata between those two, there'd be a bigger risk of things going wrong. That was why I suggested a .entry.tramp.data section previously. > > Should we just use adrp on __entry_tramp_data_start? Anyway, the diff > > below doesn't solve the issue I'm seeing (only reverting patch 3). > > AFAIU, the preexisting code uses the manual PAGE_SIZE offset because the offset > in the main vmlinux does not match the architected offset inside the fixmap. If > so, then using the symbol directly will not work at all. Indeed. I can't see a neat way of avoiding this right now, so should we drop these patches and leave the code as-is (but with comments as to the special requirements that it has)? Thanks, Mark.
On Tue, Mar 24, 2020 at 10:52:17AM +0000, Mark Rutland wrote: > On Mon, Mar 23, 2020 at 10:42:30PM +0200, Rémi Denis-Courmont wrote: > > Le maanantaina 23. maaliskuuta 2020, 21.04.09 EET Catalin Marinas a écrit : > > > Should we just use adrp on __entry_tramp_data_start? Anyway, the diff > > > below doesn't solve the issue I'm seeing (only reverting patch 3). > > > > AFAIU, the preexisting code uses the manual PAGE_SIZE offset because the offset > > in the main vmlinux does not match the architected offset inside the fixmap. If > > so, then using the symbol directly will not work at all. > > Indeed. I can't see a neat way of avoiding this right now, so should we > drop these patches and leave the code as-is (but with comments as to the > special requirements that it has)? I'm going to drop these three patches from -next for now but I can take any updated comments (they are pretty much missing from this code). Thanks.
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S index e5d4e30ee242..24f828739696 100644 --- a/arch/arm64/kernel/entry.S +++ b/arch/arm64/kernel/entry.S @@ -805,9 +805,9 @@ alternative_else_nop_endif 2: tramp_map_kernel x30 #ifdef CONFIG_RANDOMIZE_BASE - adr x30, tramp_vectors + PAGE_SIZE + adrp x30, tramp_vectors + PAGE_SIZE alternative_insn isb, nop, ARM64_WORKAROUND_QCOM_FALKOR_E1003 - ldr x30, [x30] + ldr x30, [x30, #:lo12:__entry_tramp_data_start] #else ldr x30, =vectors #endif @@ -953,9 +953,8 @@ SYM_CODE_START(__sdei_asm_entry_trampoline) 1: str x4, [x1, #(SDEI_EVENT_INTREGS + S_ORIG_ADDR_LIMIT)] #ifdef CONFIG_RANDOMIZE_BASE - adr x4, tramp_vectors + PAGE_SIZE - add x4, x4, #:lo12:__sdei_asm_trampoline_next_handler - ldr x4, [x4] + adrp x4, tramp_vectors + PAGE_SIZE + ldr x4, [x4, #:lo12:__sdei_asm_trampoline_next_handler] #else ldr x4, =__sdei_asm_handler #endif