riscv: Don't use va_pa_offset on kdump


Message ID

20211002122026.1451269-1-mick@ics.forth.gr (mailing list archive)

State

New, archived

Headers

DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 2E70F619F6
From: Nick Kossifidis <mick@ics.forth.gr>
To: palmer@dabbelt.com, paul.walmsley@sifive.com, aou@eecs.berkeley.edu
Cc: linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org,
 Nick Kossifidis <mick@ics.forth.gr>
Subject: [PATCH] riscv: Don't use va_pa_offset on kdump
Date: Sat,  2 Oct 2021 15:20:26 +0300
Message-Id: <20211002122026.1451269-1-mick@ics.forth.gr>
MIME-Version: 1.0
Precedence: list
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: "linux-riscv" <linux-riscv-bounces@lists.infradead.org>
Errors-To: 
 linux-riscv-bounces+linux-riscv=archiver.kernel.org@lists.infradead.org

Series

riscv: Don't use va_pa_offset on kdump

Commit Message

Nick Kossifidis Oct. 2, 2021, 12:20 p.m. UTC

On kdump, instead of using an intermediate relocation step that lives in a
"control buffer" outside the current kernel's mapping, we jump to the crash
kernel directly by calling riscv_kexec_norelocate(). The current
implementation uses va_pa_offset while switching to physical addressing.
However, since we moved the kernel outside the linear mapping, this no
longer works: riscv_kexec_norelocate() is part of the kernel mapping, so we
would have to use kernel_map.va_kernel_pa_offset instead, and also take the
XIP kernel into account.

We don't really need va_pa_offset in riscv_kexec_norelocate(); we can just
set STVEC to the physical address of the new kernel and let the hart jump
to the new kernel on the next instruction after setting SATP to zero. This
fixes kdump and is also simpler and cleaner.

Signed-off-by: Nick Kossifidis <mick@ics.forth.gr>
---
 arch/riscv/kernel/kexec_relocate.S | 15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)
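
The address arithmetic behind the regression can be sketched in C. This is only an illustration under assumed values: the memory layout constants and the helper names label_pa_old() and label_pa_fixed() are hypothetical and do not come from the patch or the kernel sources. It shows why subtracting the linear-mapping offset from a label that lives in the kernel mapping does not yield a usable physical address.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout, for illustration only (not taken from the patch). */
#define PHYS_RAM_BASE   UINT64_C(0x0000000080000000) /* PA where RAM starts */
#define KERNEL_LOAD_PA  UINT64_C(0x0000000080200000) /* PA the kernel image is loaded at */
#define LINEAR_MAP_VA   UINT64_C(0xffffffe000000000) /* VA where the linear mapping starts */
#define KERNEL_MAP_VA   UINT64_C(0xffffffff80000000) /* VA where the kernel mapping starts */

/* Offset that is only valid for addresses inside the linear mapping. */
static uint64_t va_pa_offset(void)
{
	return LINEAR_MAP_VA - PHYS_RAM_BASE;
}

/* Offset that is valid for addresses inside the kernel mapping. */
static uint64_t va_kernel_pa_offset(void)
{
	return KERNEL_MAP_VA - KERNEL_LOAD_PA;
}

/* What the old code effectively computed for the local label "1:". */
static uint64_t label_pa_old(uint64_t label_va)
{
	assert(label_va >= KERNEL_MAP_VA); /* the label lives in the kernel mapping */
	return label_va - va_pa_offset();
}

/* What it would have had to compute instead (ignoring XIP). */
static uint64_t label_pa_fixed(uint64_t label_va)
{
	assert(label_va >= KERNEL_MAP_VA);
	return label_va - va_kernel_pa_offset();
}
```

With these made-up numbers, a label at KERNEL_MAP_VA + 0x1234 comes out as 0x2000001234 with the linear-mapping offset, but as 0x80201234 with the kernel-mapping offset. The patch sidesteps the question by not computing a physical address for the label at all.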

Comments

Alexandre Ghiti Oct. 6, 2021, 11:13 a.m. UTC | #1

On Sat, Oct 2, 2021 at 2:23 PM Nick Kossifidis <mick@ics.forth.gr> wrote:
>
> On kdump, instead of using an intermediate relocation step that lives in a
> "control buffer" outside the current kernel's mapping, we jump to the crash
> kernel directly by calling riscv_kexec_norelocate(). The current
> implementation uses va_pa_offset while switching to physical addressing.
> However, since we moved the kernel outside the linear mapping, this no
> longer works: riscv_kexec_norelocate() is part of the kernel mapping, so we
> would have to use kernel_map.va_kernel_pa_offset instead, and also take the
> XIP kernel into account.
>
> We don't really need va_pa_offset in riscv_kexec_norelocate(); we can just
> set STVEC to the physical address of the new kernel and let the hart jump
> to the new kernel on the next instruction after setting SATP to zero. This
> fixes kdump and is also simpler and cleaner.
>
> Signed-off-by: Nick Kossifidis <mick@ics.forth.gr>
> ---
>  arch/riscv/kernel/kexec_relocate.S | 15 +++++----------
>  1 file changed, 5 insertions(+), 10 deletions(-)
>
> diff --git a/arch/riscv/kernel/kexec_relocate.S b/arch/riscv/kernel/kexec_relocate.S
> index a80b52a74..e2f34196e 100644
> --- a/arch/riscv/kernel/kexec_relocate.S
> +++ b/arch/riscv/kernel/kexec_relocate.S
> @@ -159,25 +159,15 @@ SYM_CODE_START(riscv_kexec_norelocate)
>          * s0: (const) Phys address to jump to
>          * s1: (const) Phys address of the FDT image
>          * s2: (const) The hartid of the current hart
> -        * s3: (const) kernel_map.va_pa_offset, used when switching MMU off
>          */
>         mv      s0, a1
>         mv      s1, a2
>         mv      s2, a3
> -       mv      s3, a4
>
>         /* Disable / cleanup interrupts */
>         csrw    CSR_SIE, zero
>         csrw    CSR_SIP, zero
>
> -       /* Switch to physical addressing */
> -       la      s4, 1f
> -       sub     s4, s4, s3
> -       csrw    CSR_STVEC, s4
> -       csrw    CSR_SATP, zero
> -
> -.align 2
> -1:
>         /* Pass the arguments to the next kernel  / Cleanup*/
>         mv      a0, s2
>         mv      a1, s1
> @@ -214,6 +204,11 @@ SYM_CODE_START(riscv_kexec_norelocate)
>         csrw    CSR_SCAUSE, zero
>         csrw    CSR_SSCRATCH, zero
>
> +       /* Switch to physical addressing */
> +       csrw    CSR_STVEC, a2
> +       csrw    CSR_SATP, zero
> +
> +       /* This will trigger a jump to CSR_STVEC anyway */
>         jalr    zero, a2, 0

The last jump to a2 can be removed since the fault will be triggered
before even reaching this instruction.

>  SYM_CODE_END(riscv_kexec_norelocate)
>
> --
> 2.32.0

This patch fixes a regression introduced when moving the kernel to the
end of the address space, so we should add:
Fixes: 2bfc6cd81bd1 ("riscv: Move kernel mapping outside of linear mapping")

And it should be backported to 5.13 and 5.14. It seems that the
following tags should be enough:

Cc: <stable@vger.kernel.org> # 5.13
Cc: <stable@vger.kernel.org> # 5.14

And finally, you can add:

Reviewed-by: Alexandre Ghiti <alex@ghiti.fr>

Thanks,

Alex

Nick Kossifidis Oct. 9, 2021, 1:18 p.m. UTC | #2

On 2021-10-06 14:13, Alexandre Ghiti wrote:
>> +
>> +       /* This will trigger a jump to CSR_STVEC anyway */
>>         jalr    zero, a2, 0
> 
> The last jump to a2 can be removed since the fault will be triggered
> before even reaching this instruction.
> 

Just switching SATP to zero doesn't generate a trap unless mstatus.TVM 
is set (for virtualization purposes). The hart will try to execute the 
next instruction, but it's not clear in the spec what happens if the 
code is cached, so I don't want to rely solely on STVEC. I prefer keeping 
this instruction there. Note that some earlier QEMU versions also had 
this behavior (the original kdump patch didn't set STVEC and it worked 
fine after setting SATP to zero).
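
The reasoning for keeping both mechanisms can be sketched as a toy model in C. This is an assumption-laden illustration, not a description of any real hart: the function and parameter names are hypothetical, and the model only captures the two outcomes discussed above, namely that the fetch after clearing SATP either faults or still succeeds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy model, not a description of any real hart: after SATP is cleared,
 * either the next instruction fetch faults (the hart traps to STVEC), or
 * the fetch still succeeds (e.g. the code was cached) and the explicit
 * jalr runs. All names here are hypothetical.
 */
static uint64_t pc_after_satp_clear(bool next_fetch_faults,
				    uint64_t stvec, uint64_t jalr_target)
{
	if (next_fetch_faults)
		return stvec;		/* trap path: the hart vectors to STVEC */
	return jalr_target;		/* fall-through path: jalr zero, a2, 0 */
}

/* Mirrors the patch: both STVEC and the jalr target are loaded from a2. */
static uint64_t handoff_pc(bool next_fetch_faults, uint64_t a2)
{
	/* STVEC's two low bits select the trap mode, so keep the base aligned. */
	assert((a2 & 3) == 0);
	return pc_after_satp_clear(next_fetch_faults, a2, a2);
}
```

In this model, handoff_pc() returns the new kernel's entry point whether or not the fetch faults, which is the point of setting STVEC and keeping the jalr.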

> 
> This patch fixes a regression introduced when moving the kernel to the
> end of the address space, so we should add:
> Fixes: 2bfc6cd81bd1 ("riscv: Move kernel mapping outside of linear 
> mapping")
> 
> And it should be backported to 5.13 and 5.14. It seems that the
> following tags should be enough:
> 
> Cc: <stable@vger.kernel.org> # 5.13
> Cc: <stable@vger.kernel.org> # 5.14
> 
> And finally, you can add:
> 
> Reviewed-by: Alexandre Ghiti <alex@ghiti.fr>
> 

ACK, thanks! I'll resend the patch with the tags you mentioned.

Regards,
Nick

Palmer Dabbelt Oct. 23, 2021, 8:14 p.m. UTC | #3

On Sat, 09 Oct 2021 06:18:48 PDT (-0700), mick@ics.forth.gr wrote:
> On 2021-10-06 14:13, Alexandre Ghiti wrote:
>>> +
>>> +       /* This will trigger a jump to CSR_STVEC anyway */
>>>         jalr    zero, a2, 0
>>
>> The last jump to a2 can be removed since the fault will be triggered
>> before even reaching this instruction.
>>
>
> Just switching SATP to zero doesn't generate a trap unless mstatus.TVM
> is set (for virtualization purposes). The hart will try to execute the
> next instruction, but it's not clear in the spec what happens if the
> code is cached, so I don't want to rely solely on STVEC. I prefer keeping
> this instruction there. Note that some earlier QEMU versions also had
> this behavior (the original kdump patch didn't set STVEC and it worked
> fine after setting SATP to zero).

IIRC this came down to some very specific wording in the spec.  
Something along the lines of the 0 in SATP meaning "no translation", 
SFENCE.VMA ordering translations, and the general "if the spec doesn't 
mention it then it has to work" logic.  I thought I opened a spec issue 
about this for clarification, but I can't find it.

That said, I'm perfectly fine taking the safe approach here as it's not 
like the performance matters here.  Warrants a comment, though.

>
>>
>> This patch fixes a regression introduced when moving the kernel to the
>> end of the address space, so we should add:
>> Fixes: 2bfc6cd81bd1 ("riscv: Move kernel mapping outside of linear
>> mapping")
>>
>> And it should be backported to 5.13 and 5.14. It seems that the
>> following tags should be enough:
>>
>> Cc: <stable@vger.kernel.org> # 5.13
>> Cc: <stable@vger.kernel.org> # 5.14
>>
>> And finally, you can add:
>>
>> Reviewed-by: Alexandre Ghiti <alex@ghiti.fr>
>>
>
> ACK, thanks! I'll resend the patch with the tags you mentioned.

I don't have a v2 in my inbox, did I miss something?  Also, if it's just 
the tags then it's generally not necessary to re-send something.  The 
comment does, though.

LMK if you want me to deal with this, or if there's going to be a v2.

Thanks!

Nick Kossifidis Oct. 25, 2021, 1 a.m. UTC | #4

On 2021-10-23 23:14, Palmer Dabbelt wrote:
> On Sat, 09 Oct 2021 06:18:48 PDT (-0700), mick@ics.forth.gr wrote:
>> On 2021-10-06 14:13, Alexandre Ghiti wrote:
>>>> +
>>>> +       /* This will trigger a jump to CSR_STVEC anyway */
>>>>         jalr    zero, a2, 0
>>> 
>>> The last jump to a2 can be removed since the fault will be triggered
>>> before even reaching this instruction.
>>> 
>> 
>> Just switching SATP to zero doesn't generate a trap unless mstatus.TVM
>> is set (for virtualization purposes). The hart will try to execute the
>> next instruction, but it's not clear in the spec what happens if the
>> code is cached, so I don't want to rely solely on STVEC. I prefer keeping
>> this instruction there. Note that some earlier QEMU versions also had
>> this behavior (the original kdump patch didn't set STVEC and it worked
>> fine after setting SATP to zero).
> 
> IIRC this came down to some very specific wording in the spec.
> Something along the lines of the 0 in SATP meaning "no translation",
> SFENCE.VMA ordering translations, and the general "if the spec doesn't
> mention it then it has to work" logic.  I thought I opened a spec
> issue about this for clarification, but I can't find it.
> 

I guess you mean this one:
https://github.com/riscv/riscv-isa-manual/issues/538

I couldn't find anything regarding cached code, though. It's not as if 
there's going to be a load after setting satp to 0 if the code has been 
cached, so even if the translation is cached we don't have a guarantee 
that the next instruction will result in a trap.

> That said, I'm perfectly fine taking the safe approach here as it's
> not like the performance matters here.  Warrants a comment, though.
> 

ACK

> 
> I don't have a v2 in my inbox, did I miss something?  Also, if it's
> just the tags then it's generally not necessary to re-send something.
> The comment does, though.
> 
> LMK if you want me to deal with this, or if there's going to be a v2.
> 
> Thanks!

I'll send a v2 with the tags and the comment.

Regards,
Nick

diff --git a/arch/riscv/kernel/kexec_relocate.S b/arch/riscv/kernel/kexec_relocate.S
index a80b52a74..e2f34196e 100644
--- a/arch/riscv/kernel/kexec_relocate.S
+++ b/arch/riscv/kernel/kexec_relocate.S
@@ -159,25 +159,15 @@  SYM_CODE_START(riscv_kexec_norelocate)
 	 * s0: (const) Phys address to jump to
 	 * s1: (const) Phys address of the FDT image
 	 * s2: (const) The hartid of the current hart
-	 * s3: (const) kernel_map.va_pa_offset, used when switching MMU off
 	 */
 	mv	s0, a1
 	mv	s1, a2
 	mv	s2, a3
-	mv	s3, a4
 
 	/* Disable / cleanup interrupts */
 	csrw	CSR_SIE, zero
 	csrw	CSR_SIP, zero
 
-	/* Switch to physical addressing */
-	la	s4, 1f
-	sub	s4, s4, s3
-	csrw	CSR_STVEC, s4
-	csrw	CSR_SATP, zero
-
-.align 2
-1:
 	/* Pass the arguments to the next kernel  / Cleanup*/
 	mv	a0, s2
 	mv	a1, s1
@@ -214,6 +204,11 @@  SYM_CODE_START(riscv_kexec_norelocate)
 	csrw	CSR_SCAUSE, zero
 	csrw	CSR_SSCRATCH, zero
 
+	/* Switch to physical addressing */
+	csrw	CSR_STVEC, a2
+	csrw	CSR_SATP, zero
+
+	/* This will trigger a jump to CSR_STVEC anyway */
 	jalr	zero, a2, 0
 SYM_CODE_END(riscv_kexec_norelocate)
