diff mbox series

[V3] riscv: kexec: Fixup synchronization problem between init_mm and active_mm

Message ID 20230713150758.2956316-1-guoren@kernel.org (mailing list archive)
State Superseded
Series [V3] riscv: kexec: Fixup synchronization problem between init_mm and active_mm

Checks

Context                                   Check    Description
conchuod/cover_letter                     success  Single patches do not need cover letters
conchuod/tree_selection                   success  Guessed tree name to be fixes at HEAD b690e266dae2
conchuod/fixes_present                    success  Fixes tag present in non-next series
conchuod/maintainers_pattern              success  MAINTAINERS pattern errors before the patch: 4 and now 4
conchuod/verify_signedoff                 success  Signed-off-by tag matches author and committer
conchuod/kdoc                             success  Errors and warnings before: 0 this patch: 0
conchuod/build_rv64_clang_allmodconfig    success  Errors and warnings before: 46 this patch: 46
conchuod/module_param                     success  Was 0 now: 0
conchuod/build_rv64_gcc_allmodconfig      success  Errors and warnings before: 306 this patch: 306
conchuod/build_rv32_defconfig             success  Build OK
conchuod/dtb_warn_rv64                    success  Errors and warnings before: 3 this patch: 3
conchuod/header_inline                    success  No static functions without inline keyword in header files
conchuod/checkpatch                       warning  CHECK: Comparison to NULL could be written "!control_code_buffer"; CHECK: spaces preferred around that '/' (ctx:VxV)
conchuod/build_rv64_nommu_k210_defconfig  success  Build OK
conchuod/verify_fixes                     success  Fixes tag looks correct
conchuod/build_rv64_nommu_virt_defconfig  success  Build OK

Commit Message

Guo Ren July 13, 2023, 3:07 p.m. UTC
From: Guo Ren <guoren@linux.alibaba.com>

machine_kexec_prepare() uses set_memory_x() to change the direct
mapping attributes of the control code page from RW to RWX. The current
implementation of set_memory_x() does not split hugepages in the linear
mapping, so when a PGD mapping is used, the whole PGD entry is marked
as executable. However, a permission change at the PGD level must be
propagated to all the page tables, and here init_mm and active_mm end
up out of sync. When kexec jumps into the control buffer, an
instruction page fault occurs; there is no minor page fault handling
for it, so the kernel panics.

The bug was found on an MMU_sv39 machine where the direct mapping used
1GB PUD mappings, i.e. the PGD entries. Here is the bug output:

 kexec_core: Starting new kernel
 Will call new kernel at 00300000 from hart id 0
 FDT image at 747c7000
 Bye...
 Unable to handle kernel paging request at virtual address ffffffda23b0d000
 Oops [#1]
 Modules linked in:
 CPU: 0 PID: 53 Comm: uinit Not tainted 6.4.0-rc6 #15
 Hardware name: Sophgo Mango (DT)
 epc : 0xffffffda23b0d000
  ra : machine_kexec+0xa6/0xb0
 epc : ffffffda23b0d000 ra : ffffffff80008272 sp : ffffffc80c173d10
  gp : ffffffff8150e1e0 tp : ffffffd9073d2c40 t0 : 0000000000000000
  t1 : 0000000000000042 t2 : 6567616d69205444 s0 : ffffffc80c173d50
  s1 : ffffffd9076c4800 a0 : ffffffd9076c4800 a1 : 0000000000300000
  a2 : 00000000747c7000 a3 : 0000000000000000 a4 : ffffffd800000000
  a5 : 0000000000000000 a6 : ffffffd903619c40 a7 : ffffffffffffffff
  s2 : ffffffda23b0d000 s3 : 0000000000300000 s4 : 00000000747c7000
  s5 : 0000000000000000 s6 : 0000000000000000 s7 : 0000000000000000
  s8 : 0000000000000000 s9 : 0000000000000000 s10: 0000000000000000
  s11: 0000003f940001a0 t3 : ffffffff815351af t4 : ffffffff815351af
  t5 : ffffffff815351b0 t6 : ffffffc80c173b50
 status: 0000000200000100 badaddr: ffffffda23b0d000 cause: 000000000000000c
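
As a reading aid for the dump above (not part of the patch), the
following user-space C sketch decodes two of its values. It assumes the
standard RISC-V privileged-spec encodings: exception code 12 in the
cause register means "instruction page fault", and under Sv39 bits
38:30 of a virtual address select the top-level page-table entry, each
of which covers 1 GiB. The helper names are hypothetical and exist only
for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: scause exception code 12 is "instruction page fault". */
#define EXC_INST_PAGE_FAULT UINT64_C(12)

/* Assumption: Sv39 VPN[2] is VA bits 38:30, i.e. 9 bits starting at bit 30. */
#define SV39_VPN2_SHIFT 30
#define SV39_VPN_MASK   UINT64_C(0x1ff)

/* True when the trap is an exception (top bit clear) with code 12. */
static bool is_inst_page_fault(uint64_t cause)
{
	bool is_interrupt = (cause >> 63) & 1;
	uint64_t code = cause & ~(UINT64_C(1) << 63);

	return !is_interrupt && code == EXC_INST_PAGE_FAULT;
}

/* Index of the top-level (PGD) entry that translates va under Sv39. */
static unsigned int sv39_vpn2(uint64_t va)
{
	return (unsigned int)((va >> SV39_VPN2_SHIFT) & SV39_VPN_MASK);
}

/* Start of the 1 GiB region covered by that single top-level entry. */
static uint64_t sv39_gigapage_base(uint64_t va)
{
	return va & ~((UINT64_C(1) << SV39_VPN2_SHIFT) - 1);
}
```

Applied to the dump, cause 0xc is an instruction page fault, and epc,
badaddr and s2 all hold ffffffda23b0d000, so the fault is on the first
instruction fetch at the jump target. That address falls in the 1 GiB
region starting at ffffffda00000000, which a single top-level entry
maps.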

Given the current flaw in the set_memory_x implementation, the simplest
solution is to fix machine_kexec() to remap the control code page
outside the linear mapping.

Fixes: 3335068f8721 ("riscv: Use PUD/P4D/PGD pages for the linear mapping")
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
---
Changelog:
V3:
 - Restore set_memory_x to set the _PAGE_EXEC attribute
 - Improve the commit log following Alexandre's advice

V2:
 - Use vm_map_ram instead of modifying set_memory_x
 - Correct Fixes tag
---
 arch/riscv/include/asm/kexec.h    |  1 +
 arch/riscv/kernel/machine_kexec.c | 13 +++++++++++--
 2 files changed, 12 insertions(+), 2 deletions(-)

Comments

Guo Ren July 14, 2023, 3:55 a.m. UTC | #1
Sorry, this patch breaks riscv_kexec_relocate, which depends on the
direct mapping. I'm debugging it. Please drop this patch.

On Thu, Jul 13, 2023 at 11:08 AM <guoren@kernel.org> wrote:
>
> From: Guo Ren <guoren@linux.alibaba.com>
>
> The machine_kexec() uses set_memory_x to modify the direct mapping
> attributes from RW to RWX. The current implementation of set_memory_x
> does not split hugepages in the linear mapping and then when a PGD
> mapping is used, the whole PGD is marked as executable. But changing
> the permissions at the PGD level must be propagated to all the page
> tables. When kexec jumps into control_buffer, the instruction page
> fault happens, and there is no minor_pagefault for it, then panic.
>
> The bug is found on an MMU_sv39 machine, and the direct mapping used a
> 1GB PUD, the pgd entries. Here is the bug output:
>
>  kexec_core: Starting new kernel
>  Will call new kernel at 00300000 from hart id 0
>  FDT image at 747c7000
>  Bye...
>  Unable to handle kernel paging request at virtual address ffffffda23b0d000
>  Oops [#1]
>  Modules linked in:
>  CPU: 0 PID: 53 Comm: uinit Not tainted 6.4.0-rc6 #15
>  Hardware name: Sophgo Mango (DT)
>  epc : 0xffffffda23b0d000
>   ra : machine_kexec+0xa6/0xb0
>  epc : ffffffda23b0d000 ra : ffffffff80008272 sp : ffffffc80c173d10
>   gp : ffffffff8150e1e0 tp : ffffffd9073d2c40 t0 : 0000000000000000
>   t1 : 0000000000000042 t2 : 6567616d69205444 s0 : ffffffc80c173d50
>   s1 : ffffffd9076c4800 a0 : ffffffd9076c4800 a1 : 0000000000300000
>   a2 : 00000000747c7000 a3 : 0000000000000000 a4 : ffffffd800000000
>   a5 : 0000000000000000 a6 : ffffffd903619c40 a7 : ffffffffffffffff
>   s2 : ffffffda23b0d000 s3 : 0000000000300000 s4 : 00000000747c7000
>   s5 : 0000000000000000 s6 : 0000000000000000 s7 : 0000000000000000
>   s8 : 0000000000000000 s9 : 0000000000000000 s10: 0000000000000000
>   s11: 0000003f940001a0 t3 : ffffffff815351af t4 : ffffffff815351af
>   t5 : ffffffff815351b0 t6 : ffffffc80c173b50
>  status: 0000000200000100 badaddr: ffffffda23b0d000 cause: 000000000000000c
>
> Given the current flaw in the set_memory_x implementation, the simplest
> solution is to fix machine_kexec() to remap control code page outside
> the linear mapping.
>
> Fixes: 3335068f8721 ("riscv: Use PUD/P4D/PGD pages for the linear mapping")
> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
> Signed-off-by: Guo Ren <guoren@kernel.org>
> ---
> Changelog:
> V3:
>  - Resume set_memory_x to set the _PAGE_EXEC attribute
>  - Optimize the commit log with Alexandre advice
>
> V2:
>  - Use vm_map_ram instead of modifying set_memory_x
>  - Correct Fixes tag
> ---
>  arch/riscv/include/asm/kexec.h    |  1 +
>  arch/riscv/kernel/machine_kexec.c | 13 +++++++++++--
>  2 files changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/arch/riscv/include/asm/kexec.h b/arch/riscv/include/asm/kexec.h
> index 2b56769cb530..17456e91476e 100644
> --- a/arch/riscv/include/asm/kexec.h
> +++ b/arch/riscv/include/asm/kexec.h
> @@ -41,6 +41,7 @@ crash_setup_regs(struct pt_regs *newregs,
>  struct kimage_arch {
>         void *fdt; /* For CONFIG_KEXEC_FILE */
>         unsigned long fdt_addr;
> +       void *control_code_buffer;
>  };
>
>  extern const unsigned char riscv_kexec_relocate[];
> diff --git a/arch/riscv/kernel/machine_kexec.c b/arch/riscv/kernel/machine_kexec.c
> index 2d139b724bc8..83b499178902 100644
> --- a/arch/riscv/kernel/machine_kexec.c
> +++ b/arch/riscv/kernel/machine_kexec.c
> @@ -86,7 +86,14 @@ machine_kexec_prepare(struct kimage *image)
>
>         /* Copy the assembler code for relocation to the control page */
>         if (image->type != KEXEC_TYPE_CRASH) {
> -               control_code_buffer = page_address(image->control_code_page);
> +               control_code_buffer = vm_map_ram(&image->control_code_page,
> +                                                KEXEC_CONTROL_PAGE_SIZE/PAGE_SIZE,
> +                                                NUMA_NO_NODE);
> +               if (control_code_buffer == NULL) {
> +                       pr_err("Failed to vm_map control page\n");
> +                       return -ENOMEM;
> +               }
> +
>                 control_code_buffer_sz = page_size(image->control_code_page);
>
>                 if (unlikely(riscv_kexec_relocate_size > control_code_buffer_sz)) {
> @@ -99,6 +106,8 @@ machine_kexec_prepare(struct kimage *image)
>
>                 /* Mark the control page executable */
>                 set_memory_x((unsigned long) control_code_buffer, 1);
> +
> +               internal->control_code_buffer = control_code_buffer;
>         }
>
>         return 0;
> @@ -211,7 +220,7 @@ machine_kexec(struct kimage *image)
>         unsigned long this_cpu_id = __smp_processor_id();
>         unsigned long this_hart_id = cpuid_to_hartid_map(this_cpu_id);
>         unsigned long fdt_addr = internal->fdt_addr;
> -       void *control_code_buffer = page_address(image->control_code_page);
> +       void *control_code_buffer = internal->control_code_buffer;
>         riscv_kexec_method kexec_method = NULL;
>
>  #ifdef CONFIG_SMP
> --
> 2.36.1
>


--
Best Regards
 Guo Ren

Patch

diff --git a/arch/riscv/include/asm/kexec.h b/arch/riscv/include/asm/kexec.h
index 2b56769cb530..17456e91476e 100644
--- a/arch/riscv/include/asm/kexec.h
+++ b/arch/riscv/include/asm/kexec.h
@@ -41,6 +41,7 @@  crash_setup_regs(struct pt_regs *newregs,
 struct kimage_arch {
 	void *fdt; /* For CONFIG_KEXEC_FILE */
 	unsigned long fdt_addr;
+	void *control_code_buffer;
 };
 
 extern const unsigned char riscv_kexec_relocate[];
diff --git a/arch/riscv/kernel/machine_kexec.c b/arch/riscv/kernel/machine_kexec.c
index 2d139b724bc8..83b499178902 100644
--- a/arch/riscv/kernel/machine_kexec.c
+++ b/arch/riscv/kernel/machine_kexec.c
@@ -86,7 +86,14 @@  machine_kexec_prepare(struct kimage *image)
 
 	/* Copy the assembler code for relocation to the control page */
 	if (image->type != KEXEC_TYPE_CRASH) {
-		control_code_buffer = page_address(image->control_code_page);
+		control_code_buffer = vm_map_ram(&image->control_code_page,
+						 KEXEC_CONTROL_PAGE_SIZE/PAGE_SIZE,
+						 NUMA_NO_NODE);
+		if (control_code_buffer == NULL) {
+			pr_err("Failed to vm_map control page\n");
+			return -ENOMEM;
+		}
+
 		control_code_buffer_sz = page_size(image->control_code_page);
 
 		if (unlikely(riscv_kexec_relocate_size > control_code_buffer_sz)) {
@@ -99,6 +106,8 @@  machine_kexec_prepare(struct kimage *image)
 
 		/* Mark the control page executable */
 		set_memory_x((unsigned long) control_code_buffer, 1);
+
+		internal->control_code_buffer = control_code_buffer;
 	}
 
 	return 0;
@@ -211,7 +220,7 @@  machine_kexec(struct kimage *image)
 	unsigned long this_cpu_id = __smp_processor_id();
 	unsigned long this_hart_id = cpuid_to_hartid_map(this_cpu_id);
 	unsigned long fdt_addr = internal->fdt_addr;
-	void *control_code_buffer = page_address(image->control_code_page);
+	void *control_code_buffer = internal->control_code_buffer;
 	riscv_kexec_method kexec_method = NULL;
 
 #ifdef CONFIG_SMP