Message ID | 1612140296-12546-1-git-send-email-giancarlo.ferrari89@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | ARM: kexec: Fix panic after TLB are invalidated | expand |
I wish others who know this code would get involved, and such stuff wasn't left to me to research and work out whether a patch is correct or not. On Mon, Feb 01, 2021 at 12:44:56AM +0000, Giancarlo Ferrari wrote: > machine_kexec() need to set rw permission in text and rodata sections > to assign some variables (e.g. kexec_start_address). To do that at > the end (after flushing pdm in memory, etc.) it needs to invalidate > TLB [section] entries. > > If during the TLB invalidation an interrupt occours, which might cause > a context switch, there is the risk to inject invalid TLBs, with ro > permissions. > > When trying to assign .text labels, this lead to the following: > > Unable to handle kernel paging request at virtual address 80112f38 > pgd = fd7ef03e > [80112f38] *pgd=0001141e(bad) > Internal error: Oops: 80d [#1] PREEMPT SMP ARM > ... > > Signed-off-by: Giancarlo Ferrari <giancarlo.ferrari89@gmail.com> I don't know this code very well, but I don't think this patch is correct. What happens if we have CRASH_DUMP enabled, and we enter this function with IRQs already disabled? Should we really be re-enabling IRQs?
On Mon, Feb 01, 2021 at 12:44:56AM +0000, Giancarlo Ferrari wrote: > machine_kexec() need to set rw permission in text and rodata sections > to assign some variables (e.g. kexec_start_address). To do that at > the end (after flushing pdm in memory, etc.) it needs to invalidate > TLB [section] entries. It'd be worth noting explicitly that set_kernel_text_rw() alters current->active_mm... > If during the TLB invalidation an interrupt occours, which might cause > a context switch, there is the risk to inject invalid TLBs, with ro > permissions. ... which is why if there's a context switch things can go wrong, since active_mm isn't stable, and so it's possible that set_kernel_text_rw() updates multiple tables, none of which might be the active table at the point we try to make an access. It would be nice to spell that out rather than saying "invalid TLBs". We could disable preemption to prevent that, which is possibly better than disabling interrupts. Overall, it would be much better to avoid having to mess with the kernel page tables. So rather than going: 1. mark kernel RW 2. alter variables in reloc code 3. copy reloc code into buffer 4. branch to buffer ... we should be able to go: 1. copy reloc code into buffer 2. alter variables in copy of reloc code 3. branch to buffer ... which would avoid this class of problem too. Thanks, Mark. > When trying to assign .text labels, this lead to the following: > > Unable to handle kernel paging request at virtual address 80112f38 > pgd = fd7ef03e > [80112f38] *pgd=0001141e(bad) > Internal error: Oops: 80d [#1] PREEMPT SMP ARM > ... > > Signed-off-by: Giancarlo Ferrari <giancarlo.ferrari89@gmail.com> > --- > arch/arm/kernel/machine_kexec.c | 10 ++++++++++ > 1 file changed, 10 insertions(+) > > diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c > index 5d84ad3..23e8816 100644 > --- a/arch/arm/kernel/machine_kexec.c > +++ b/arch/arm/kernel/machine_kexec.c > @@ -174,6 +174,13 @@ void machine_kexec(struct kimage *image) > > reboot_code_buffer = page_address(image->control_code_page); > > + /* > + * If below part is not atomic TLB entries might be corrupted after TLB > + * invalidation, which leads to Data Abort in .text variable assignment > + */ > + raw_local_irq_disable(); > + local_fiq_disable(); > + > /* Prepare parameters for reboot_code_buffer*/ > set_kernel_text_rw(); > kexec_start_address = image->start; > @@ -181,6 +188,9 @@ void machine_kexec(struct kimage *image) > kexec_mach_type = machine_arch_type; > kexec_boot_atags = image->arch.kernel_r2; > > + local_fiq_enable(); > + raw_local_irq_enable(); > + > /* copy our kernel relocation code to the control code page */ > reboot_entry = fncpy(reboot_code_buffer, > &relocate_new_kernel, > -- > 2.7.4 >
On Mon, Feb 01, 2021 at 12:47:20PM +0000, Mark Rutland wrote: > 1. copy reloc code into buffer > 2. alter variables in copy of reloc code > 3. branch to buffer > > ... which would avoid this class of problem too. Yep, slightly messy to do though: diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c index 5d84ad333f05..6058e0d3a40d 100644 --- a/arch/arm/kernel/machine_kexec.c +++ b/arch/arm/kernel/machine_kexec.c @@ -174,18 +174,27 @@ void machine_kexec(struct kimage *image) reboot_code_buffer = page_address(image->control_code_page); - /* Prepare parameters for reboot_code_buffer*/ - set_kernel_text_rw(); - kexec_start_address = image->start; - kexec_indirection_page = page_list; - kexec_mach_type = machine_arch_type; - kexec_boot_atags = image->arch.kernel_r2; - /* copy our kernel relocation code to the control code page */ reboot_entry = fncpy(reboot_code_buffer, &relocate_new_kernel, relocate_new_kernel_size); +#define set(what, val) \ + do { \ + uintptr_t __funcp_address; \ + int __offset; \ + void *__ptr; \ + asm("" : "=r" (__funcp_address) : "0" (&relocate_new_kernel)); \ + __offset = (uintptr_t)&(what) - (__funcp_address & ~1); \ + __ptr = reboot_code_buffer + __offset; \ + *(__typeof__(&(what)))__ptr = val; \ + } while (0) + + set(kexec_start_address, image->start); + set(kexec_indirection_page, page_list); + set(kexec_mach_type, machine_arch_type); + set(kexec_boot_atags, image->arch.kernel_r2); + /* get the identity mapping physical address for the reboot code */ reboot_entry_phys = virt_to_idmap(reboot_entry);
On Mon, Feb 01, 2021 at 01:03:45PM +0000, Russell King - ARM Linux admin wrote: > On Mon, Feb 01, 2021 at 12:47:20PM +0000, Mark Rutland wrote: > > 1. copy reloc code into buffer > > 2. alter variables in copy of reloc code > > 3. branch to buffer > > > > ... which would avoid this class of problem too. > > Yep, slightly messy to do though: > > diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c > index 5d84ad333f05..6058e0d3a40d 100644 > --- a/arch/arm/kernel/machine_kexec.c > +++ b/arch/arm/kernel/machine_kexec.c > @@ -174,18 +174,27 @@ void machine_kexec(struct kimage *image) > > reboot_code_buffer = page_address(image->control_code_page); > > - /* Prepare parameters for reboot_code_buffer*/ > - set_kernel_text_rw(); > - kexec_start_address = image->start; > - kexec_indirection_page = page_list; > - kexec_mach_type = machine_arch_type; > - kexec_boot_atags = image->arch.kernel_r2; > - > /* copy our kernel relocation code to the control code page */ > reboot_entry = fncpy(reboot_code_buffer, > &relocate_new_kernel, > relocate_new_kernel_size); > > +#define set(what, val) \ > + do { \ > + uintptr_t __funcp_address; \ > + int __offset; \ > + void *__ptr; \ > + asm("" : "=r" (__funcp_address) : "0" (&relocate_new_kernel)); \ > + __offset = (uintptr_t)&(what) - (__funcp_address & ~1); \ > + __ptr = reboot_code_buffer + __offset; \ > + *(__typeof__(&(what)))__ptr = val; \ > + } while (0) > + > + set(kexec_start_address, image->start); > + set(kexec_indirection_page, page_list); > + set(kexec_mach_type, machine_arch_type); > + set(kexec_boot_atags, image->arch.kernel_r2); We could simplify this slightly if we moved the kexec_& variables into a struct (using asm-offset KEXEC_VAR_* offsets and a KEXEC_VAR_SIZE region reserved in the asm), then here we could do something like: static struct kexec_vars *kexec_buffer_vars(void *buffer) { unsigned long code = ((unisigned long)relocate_new_kernel) & ~1; unsigned long vars - (unsigned long)relocate_vars; unsigned long offset = vars - code; return buffer + offset; } ... and in machine_kexec() do: struct kexec_vars *kv = kexec_buffer_vars(reboot_code_buffer); kv->start_address = image->start; kv->indirection_page = page_list; kv->mach_type = machine-arch_type; kv->boot_atags = arch.kernel_r2; ... if that looks any better to you? Mark.
Hi, On Mon, Feb 01, 2021 at 12:47:20PM +0000, Mark Rutland wrote: > On Mon, Feb 01, 2021 at 12:44:56AM +0000, Giancarlo Ferrari wrote: > > machine_kexec() need to set rw permission in text and rodata sections > > to assign some variables (e.g. kexec_start_address). To do that at > > the end (after flushing pdm in memory, etc.) it needs to invalidate > > TLB [section] entries. > > It'd be worth noting explicitly that set_kernel_text_rw() alters > current->active_mm... > > > If during the TLB invalidation an interrupt occours, which might cause > > a context switch, there is the risk to inject invalid TLBs, with ro > > permissions. > > ... which is why if there's a context switch things can go wrong, since > active_mm isn't stable, and so it's possible that set_kernel_text_rw() > updates multiple tables, none of which might be the active table at the > point we try to make an access. > Maybe the behaviour causing issue is not completely clear to me, and I do apologize for that (moreover I haven't eougth debug capabilities). However, current-active_mm is switched among context switches. Correct ? So, in principle, the invalidation, if stopped, is carried on where it left. I thought the issue was that the PageTable entry for the section 0x8010_0000 is global, thus not indexed by ASID (Address Space ID). By the fact that each process has its own version of that entry, is the cause of the issue, as the schedule process might bringing a spurious entry (with ro permission) in the MMU cache. If the entry is not global holds the ASID, and the issue cannot happen. Please note that this behaviour was tested on a armv7 arch board. > It would be nice to spell that out rather than saying "invalid TLBs". > > We could disable preemption to prevent that, which is possibly better > than disabling interrupts. > > Overall, it would be much better to avoid having to mess with the kernel > page tables. So rather than going: > > 1. mark kernel RW > 2. alter variables in reloc code > 3. copy reloc code into buffer > 4. branch to buffer > > ... we should be able to go: > > 1. copy reloc code into buffer > 2. alter variables in copy of reloc code > 3. branch to buffer > > ... which would avoid this class of problem too. > > Thanks, > Mark. Thanks, GF
Hi, On Mon, Feb 01, 2021 at 02:39:46PM +0000, Giancarlo Ferrari wrote: > On Mon, Feb 01, 2021 at 12:47:20PM +0000, Mark Rutland wrote: > > On Mon, Feb 01, 2021 at 12:44:56AM +0000, Giancarlo Ferrari wrote: > > > machine_kexec() need to set rw permission in text and rodata sections > > > to assign some variables (e.g. kexec_start_address). To do that at > > > the end (after flushing pdm in memory, etc.) it needs to invalidate > > > TLB [section] entries. > > > > It'd be worth noting explicitly that set_kernel_text_rw() alters > > current->active_mm... > > > > > If during the TLB invalidation an interrupt occours, which might cause > > > a context switch, there is the risk to inject invalid TLBs, with ro > > > permissions. > > > > ... which is why if there's a context switch things can go wrong, since > > active_mm isn't stable, and so it's possible that set_kernel_text_rw() > > updates multiple tables, none of which might be the active table at the > > point we try to make an access. > > Maybe the behaviour causing issue is not completely clear to me, and I do > apologize for that (moreover I haven't eougth debug capabilities). I think we're in rough agreement that the issue is likely related to the context switch, but our understanding of the specifics differs (and I think we're missing a detail here). > However, current-active_mm is switched among context switches. Correct ? In some cases current->active_mm is not switched, and can be inherited over a context switch. When switching to a user task, we always switch to its mm (which becomes the active_mm), but when switching to a kthread we retain the previous task's mm as the active_mm as part of the lazy context switch. So while a kthread is preemptible, its active_mm (and active ASID) can change under its feet. That could happen anywhere while the task is preemptible, e.g. in the middle of set_kernel_text_rw(), or mid-modification to the kexec variables. > So, in principle, the invalidation, if stopped, is carried on where it > left. That's true so long as all the processes we switch between share the same leaf tables for the region we're altering. If not, then the lazy context switch means that those tables can change under our feet. I believe the tables mapping the kernel text are shared by all threads, and if so this _should_ be true. Russell might be able to confirm that or correct me if I have that wrong. > I thought the issue was that the PageTable entry for the section 0x8010_0000 > is global, thus not indexed by ASID (Address Space ID). By the fact that each > process has its own version of that entry, is the cause of the issue, as the > schedule process might bringing a spurious entry (with ro permission) in the > MMU cache. The TLB invalidation performed under set_kernel_text_rw() affects all ASIDs on the current CPU, so there shouldn't be any stale RO TLB entries to hit unless the kthread is migrated to another CPU. > If the entry is not global holds the ASID, and the issue cannot happen. I don't think that's true, since switching to a different active_mm would also change ASID, and we'd have no additional guarantee. Thanks, Mark.
On Mon, Feb 01, 2021 at 01:57:14PM +0000, Mark Rutland wrote: > We could simplify this slightly if we moved the kexec_& variables into a > struct (using asm-offset KEXEC_VAR_* offsets and a KEXEC_VAR_SIZE region > reserved in the asm), then here we could do something like: > > static struct kexec_vars *kexec_buffer_vars(void *buffer) > { > unsigned long code = ((unisigned long)relocate_new_kernel) & ~1; > unsigned long vars - (unsigned long)relocate_vars; > unsigned long offset = vars - code; > > return buffer + offset; > } > > ... and in machine_kexec() do: > > struct kexec_vars *kv = kexec_buffer_vars(reboot_code_buffer); > > kv->start_address = image->start; > kv->indirection_page = page_list; > kv->mach_type = machine-arch_type; > kv->boot_atags = arch.kernel_r2; > > ... if that looks any better to you? Something like this? diff --git a/arch/arm/include/asm/kexec-internal.h b/arch/arm/include/asm/kexec-internal.h new file mode 100644 index 000000000000..ecc2322db7aa --- /dev/null +++ b/arch/arm/include/asm/kexec-internal.h @@ -0,0 +1,12 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ARM_KEXEC_INTERNAL_H +#define _ARM_KEXEC_INTERNAL_H + +struct kexec_relocate_data { + unsigned long kexec_start_address; + unsigned long kexec_indirection_page; + unsigned long kexec_mach_type; + unsigned long kexec_r2; +}; + +#endif diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c index a1570c8bab25..be8050b0c3df 100644 --- a/arch/arm/kernel/asm-offsets.c +++ b/arch/arm/kernel/asm-offsets.c @@ -12,6 +12,7 @@ #include <linux/mm.h> #include <linux/dma-mapping.h> #include <asm/cacheflush.h> +#include <asm/kexec-internal.h> #include <asm/glue-df.h> #include <asm/glue-pf.h> #include <asm/mach/arch.h> @@ -170,5 +171,9 @@ int main(void) DEFINE(MPU_RGN_PRBAR, offsetof(struct mpu_rgn, prbar)); DEFINE(MPU_RGN_PRLAR, offsetof(struct mpu_rgn, prlar)); #endif + DEFINE(KEXEC_START_ADDR, offsetof(struct kexec_relocate_data, kexec_start_address)); + DEFINE(KEXEC_INDIR_PAGE, offsetof(struct kexec_relocate_data, kexec_indirection_page)); + DEFINE(KEXEC_MACH_TYPE, offsetof(struct kexec_relocate_data, kexec_mach_type)); + DEFINE(KEXEC_R2, offsetof(struct kexec_relocate_data, kexec_r2)); return 0; } diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c index 5d84ad333f05..2b09dad7935e 100644 --- a/arch/arm/kernel/machine_kexec.c +++ b/arch/arm/kernel/machine_kexec.c @@ -13,6 +13,7 @@ #include <linux/of_fdt.h> #include <asm/mmu_context.h> #include <asm/cacheflush.h> +#include <asm/kexec-internal.h> #include <asm/fncpy.h> #include <asm/mach-types.h> #include <asm/smp_plat.h> @@ -22,11 +23,6 @@ extern void relocate_new_kernel(void); extern const unsigned int relocate_new_kernel_size; -extern unsigned long kexec_start_address; -extern unsigned long kexec_indirection_page; -extern unsigned long kexec_mach_type; -extern unsigned long kexec_boot_atags; - static atomic_t waiting_for_crash_ipi; /* @@ -159,6 +155,7 @@ void (*kexec_reinit)(void); void machine_kexec(struct kimage *image) { unsigned long page_list, reboot_entry_phys; + struct kexec_relocate_data *data; void (*reboot_entry)(void); void *reboot_code_buffer; @@ -174,18 +171,17 @@ void machine_kexec(struct kimage *image) reboot_code_buffer = page_address(image->control_code_page); - /* Prepare parameters for reboot_code_buffer*/ - set_kernel_text_rw(); - kexec_start_address = image->start; - kexec_indirection_page = page_list; - kexec_mach_type = machine_arch_type; - kexec_boot_atags = image->arch.kernel_r2; - /* copy our kernel relocation code to the control code page */ reboot_entry = fncpy(reboot_code_buffer, &relocate_new_kernel, relocate_new_kernel_size); + data = reboot_code_buffer + relocate_new_kernel_size; + data->kexec_start_address = image->start; + data->kexec_indirection_page = page_list; + data->kexec_mach_type = machine_arch_type; + data->kexec_r2 = image->arch.kernel_r2; + /* get the identity mapping physical address for the reboot code */ reboot_entry_phys = virt_to_idmap(reboot_entry); diff --git a/arch/arm/kernel/relocate_kernel.S b/arch/arm/kernel/relocate_kernel.S index 72a08786e16e..218d524360fc 100644 --- a/arch/arm/kernel/relocate_kernel.S +++ b/arch/arm/kernel/relocate_kernel.S @@ -5,14 +5,16 @@ #include <linux/linkage.h> #include <asm/assembler.h> +#include <asm/asm-offsets.h> #include <asm/kexec.h> .align 3 /* not needed for this code, but keeps fncpy() happy */ ENTRY(relocate_new_kernel) - ldr r0,kexec_indirection_page - ldr r1,kexec_start_address + adr r7, relocate_new_kernel_end + ldr r0, [r7, #KEXEC_INDIR_PAGE] + ldr r1, [r7, #KEXEC_START_ADDR] /* * If there is no indirection page (we are doing crashdumps) @@ -57,34 +59,16 @@ ENTRY(relocate_new_kernel) 2: /* Jump to relocated kernel */ - mov lr,r1 - mov r0,#0 - ldr r1,kexec_mach_type - ldr r2,kexec_boot_atags - ARM( ret lr ) - THUMB( bx lr ) - - .align - - .globl kexec_start_address -kexec_start_address: - .long 0x0 - - .globl kexec_indirection_page -kexec_indirection_page: - .long 0x0 - - .globl kexec_mach_type -kexec_mach_type: - .long 0x0 - - /* phy addr of the atags for the new kernel */ - .globl kexec_boot_atags -kexec_boot_atags: - .long 0x0 + mov lr, r1 + mov r0, #0 + ldr r1, [r7, #KEXEC_MACH_TYPE] + ldr r2, [r7, #KEXEC_R2] + ARM( ret lr ) + THUMB( bx lr ) ENDPROC(relocate_new_kernel) + .align 3 relocate_new_kernel_end: .globl relocate_new_kernel_size
On Mon, Feb 01, 2021 at 04:08:38PM +0000, Russell King - ARM Linux admin wrote: > On Mon, Feb 01, 2021 at 01:57:14PM +0000, Mark Rutland wrote: > > We could simplify this slightly if we moved the kexec_& variables into a > > struct (using asm-offset KEXEC_VAR_* offsets and a KEXEC_VAR_SIZE region > > reserved in the asm), then here we could do something like: > > > > static struct kexec_vars *kexec_buffer_vars(void *buffer) > > { > > unsigned long code = ((unisigned long)relocate_new_kernel) & ~1; > > unsigned long vars - (unsigned long)relocate_vars; > > unsigned long offset = vars - code; > > > > return buffer + offset; > > } > > > > ... and in machine_kexec() do: > > > > struct kexec_vars *kv = kexec_buffer_vars(reboot_code_buffer); > > > > kv->start_address = image->start; > > kv->indirection_page = page_list; > > kv->mach_type = machine-arch_type; > > kv->boot_atags = arch.kernel_r2; > > > > ... if that looks any better to you? > > Something like this? Nice! That looks about right to me, modulo a bit of cache maintenance noted below. > diff --git a/arch/arm/include/asm/kexec-internal.h b/arch/arm/include/asm/kexec-internal.h > new file mode 100644 > index 000000000000..ecc2322db7aa > --- /dev/null > +++ b/arch/arm/include/asm/kexec-internal.h > @@ -0,0 +1,12 @@ > +/* SPDX-License-Identifier: GPL-2.0 */ > +#ifndef _ARM_KEXEC_INTERNAL_H > +#define _ARM_KEXEC_INTERNAL_H > + > +struct kexec_relocate_data { > + unsigned long kexec_start_address; > + unsigned long kexec_indirection_page; > + unsigned long kexec_mach_type; > + unsigned long kexec_r2; > +}; > + > +#endif > diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c > index a1570c8bab25..be8050b0c3df 100644 > --- a/arch/arm/kernel/asm-offsets.c > +++ b/arch/arm/kernel/asm-offsets.c > @@ -12,6 +12,7 @@ > #include <linux/mm.h> > #include <linux/dma-mapping.h> > #include <asm/cacheflush.h> > +#include <asm/kexec-internal.h> > #include <asm/glue-df.h> > #include <asm/glue-pf.h> > #include <asm/mach/arch.h> > @@ -170,5 +171,9 @@ int main(void) > DEFINE(MPU_RGN_PRBAR, offsetof(struct mpu_rgn, prbar)); > DEFINE(MPU_RGN_PRLAR, offsetof(struct mpu_rgn, prlar)); > #endif > + DEFINE(KEXEC_START_ADDR, offsetof(struct kexec_relocate_data, kexec_start_address)); > + DEFINE(KEXEC_INDIR_PAGE, offsetof(struct kexec_relocate_data, kexec_indirection_page)); > + DEFINE(KEXEC_MACH_TYPE, offsetof(struct kexec_relocate_data, kexec_mach_type)); > + DEFINE(KEXEC_R2, offsetof(struct kexec_relocate_data, kexec_r2)); > return 0; > } > diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c > index 5d84ad333f05..2b09dad7935e 100644 > --- a/arch/arm/kernel/machine_kexec.c > +++ b/arch/arm/kernel/machine_kexec.c > @@ -13,6 +13,7 @@ > #include <linux/of_fdt.h> > #include <asm/mmu_context.h> > #include <asm/cacheflush.h> > +#include <asm/kexec-internal.h> > #include <asm/fncpy.h> > #include <asm/mach-types.h> > #include <asm/smp_plat.h> > @@ -22,11 +23,6 @@ > extern void relocate_new_kernel(void); > extern const unsigned int relocate_new_kernel_size; > > -extern unsigned long kexec_start_address; > -extern unsigned long kexec_indirection_page; > -extern unsigned long kexec_mach_type; > -extern unsigned long kexec_boot_atags; > - > static atomic_t waiting_for_crash_ipi; > > /* > @@ -159,6 +155,7 @@ void (*kexec_reinit)(void); > void machine_kexec(struct kimage *image) > { > unsigned long page_list, reboot_entry_phys; > + struct kexec_relocate_data *data; > void (*reboot_entry)(void); > void *reboot_code_buffer; > > @@ -174,18 +171,17 @@ void machine_kexec(struct kimage *image) > > reboot_code_buffer = page_address(image->control_code_page); > > - /* Prepare parameters for reboot_code_buffer*/ > - set_kernel_text_rw(); > - kexec_start_address = image->start; > - kexec_indirection_page = page_list; > - kexec_mach_type = machine_arch_type; > - kexec_boot_atags = image->arch.kernel_r2; > - > /* copy our kernel relocation code to the control code page */ > reboot_entry = fncpy(reboot_code_buffer, > &relocate_new_kernel, > relocate_new_kernel_size); > > + data = reboot_code_buffer + relocate_new_kernel_size; > + data->kexec_start_address = image->start; > + data->kexec_indirection_page = page_list; > + data->kexec_mach_type = machine_arch_type; > + data->kexec_r2 = image->arch.kernel_r2; I reckon here we need: __cpuc_flush_dcache_area(reboot_code_buffer, relocate_new_kernel_size + sizeof(*data)); ... to make sure both the instructions and data are visible with the MMU off (since fncpy() only cleans to the PoU, not the PoC). Otherwise this all looks good to me. Mark. > + > /* get the identity mapping physical address for the reboot code */ > reboot_entry_phys = virt_to_idmap(reboot_entry); > > diff --git a/arch/arm/kernel/relocate_kernel.S b/arch/arm/kernel/relocate_kernel.S > index 72a08786e16e..218d524360fc 100644 > --- a/arch/arm/kernel/relocate_kernel.S > +++ b/arch/arm/kernel/relocate_kernel.S > @@ -5,14 +5,16 @@ > > #include <linux/linkage.h> > #include <asm/assembler.h> > +#include <asm/asm-offsets.h> > #include <asm/kexec.h> > > .align 3 /* not needed for this code, but keeps fncpy() happy */ > > ENTRY(relocate_new_kernel) > > - ldr r0,kexec_indirection_page > - ldr r1,kexec_start_address > + adr r7, relocate_new_kernel_end > + ldr r0, [r7, #KEXEC_INDIR_PAGE] > + ldr r1, [r7, #KEXEC_START_ADDR] > > /* > * If there is no indirection page (we are doing crashdumps) > @@ -57,34 +59,16 @@ ENTRY(relocate_new_kernel) > > 2: > /* Jump to relocated kernel */ > - mov lr,r1 > - mov r0,#0 > - ldr r1,kexec_mach_type > - ldr r2,kexec_boot_atags > - ARM( ret lr ) > - THUMB( bx lr ) > - > - .align > - > - .globl kexec_start_address > -kexec_start_address: > - .long 0x0 > - > - .globl kexec_indirection_page > -kexec_indirection_page: > - .long 0x0 > - > - .globl kexec_mach_type > -kexec_mach_type: > - .long 0x0 > - > - /* phy addr of the atags for the new kernel */ > - .globl kexec_boot_atags > -kexec_boot_atags: > - .long 0x0 > + mov lr, r1 > + mov r0, #0 > + ldr r1, [r7, #KEXEC_MACH_TYPE] > + ldr r2, [r7, #KEXEC_R2] > + ARM( ret lr ) > + THUMB( bx lr ) > > ENDPROC(relocate_new_kernel) > > + .align 3 > relocate_new_kernel_end: > > .globl relocate_new_kernel_size > > -- > RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ > FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
On Mon, Feb 01, 2021 at 04:32:40PM +0000, Mark Rutland wrote: > I reckon here we need: > > __cpuc_flush_dcache_area(reboot_code_buffer, > relocate_new_kernel_size + sizeof(*data)); > > ... to make sure both the instructions and data are visible with the MMU > off (since fncpy() only cleans to the PoU, not the PoC). We don't. When soft_restart() turns the MMU off, and it calls flush_cache_all() before and afterwards to ensure that all dirty lines are pushed out. The flushing to PoU in fncpy() is to cover other cases.
Hi, On Mon, Feb 01, 2021 at 03:30:12PM +0000, Mark Rutland wrote: > Hi, > > On Mon, Feb 01, 2021 at 02:39:46PM +0000, Giancarlo Ferrari wrote: > > On Mon, Feb 01, 2021 at 12:47:20PM +0000, Mark Rutland wrote: > > > On Mon, Feb 01, 2021 at 12:44:56AM +0000, Giancarlo Ferrari wrote: > > > > machine_kexec() need to set rw permission in text and rodata sections > > > > to assign some variables (e.g. kexec_start_address). To do that at > > > > the end (after flushing pdm in memory, etc.) it needs to invalidate > > > > TLB [section] entries. > > > > > > It'd be worth noting explicitly that set_kernel_text_rw() alters > > > current->active_mm... > > > > > > > If during the TLB invalidation an interrupt occours, which might cause > > > > a context switch, there is the risk to inject invalid TLBs, with ro > > > > permissions. > > > > > > ... which is why if there's a context switch things can go wrong, since > > > active_mm isn't stable, and so it's possible that set_kernel_text_rw() > > > updates multiple tables, none of which might be the active table at the > > > point we try to make an access. > > > > Maybe the behaviour causing issue is not completely clear to me, and I do > > apologize for that (moreover I haven't eougth debug capabilities). > > I think we're in rough agreement that the issue is likely related to the > context switch, but our understanding of the specifics differs (and I > think we're missing a detail here). > > > However, current-active_mm is switched among context switches. Correct ? > > In some cases current->active_mm is not switched, and can be inherited > over a context switch. When switching to a user task, we always switch > to its mm (which becomes the active_mm), but when switching to a kthread > we retain the previous task's mm as the active_mm as part of the lazy > context switch. > > So while a kthread is preemptible, its active_mm (and active ASID) can > change under its feet. That could happen anywhere while the task is > preemptible, e.g. in the middle of set_kernel_text_rw(), or > mid-modification to the kexec variables. > Yes. In my understanding, even in the case of user process, when current->active_mm is switched, we can run into trouble. For instance: - Process A issue kexec (PageTables entry of A has 0x8000_0000-0x8010_0000 with ro permission and section is global, no NG bit set). - A context switch happens in the middle of set_kernel_text_rw(), right after the section 0x8000_0000-0x8010_0000 has been invalidated. - Process B, in its execution, re-inject its own PageTable entry with ro permission, which is not shared with Process A (and is not touched by the previous invalidation) in the MMU cache. - When Process A, is rescheduled, it carries on with the invalidation, but unfortunately I have "wrong" entries in the MMU. > > So, in principle, the invalidation, if stopped, is carried on where it > > left. > > That's true so long as all the processes we switch between share the > same leaf tables for the region we're altering. If not, then the lazy > context switch means that those tables can change under our feet. > > I believe the tables mapping the kernel text are shared by all threads, > and if so this _should_ be true. Russell might be able to confirm that > or correct me if I have that wrong. > I am not ready to put my hand on the fire, but I seen the behaviour described above. > > I thought the issue was that the PageTable entry for the section 0x8010_0000 > > is global, thus not indexed by ASID (Address Space ID). By the fact that each > > process has its own version of that entry, is the cause of the issue, as the > > schedule process might bringing a spurious entry (with ro permission) in the > > MMU cache. > > The TLB invalidation performed under set_kernel_text_rw() affects all > ASIDs on the current CPU, so there shouldn't be any stale RO TLB entries > to hit unless the kthread is migrated to another CPU. > > > If the entry is not global holds the ASID, and the issue cannot happen. > > I don't think that's true, since switching to a different active_mm > would also change ASID, and we'd have no additional guarantee. > I agree, my fault. > Thanks, > Mark. Thanks, GF
Hi, On Mon, Feb 01, 2021 at 04:08:38PM +0000, Russell King - ARM Linux admin wrote: > On Mon, Feb 01, 2021 at 01:57:14PM +0000, Mark Rutland wrote: > > We could simplify this slightly if we moved the kexec_& variables into a > > struct (using asm-offset KEXEC_VAR_* offsets and a KEXEC_VAR_SIZE region > > reserved in the asm), then here we could do something like: > > > > static struct kexec_vars *kexec_buffer_vars(void *buffer) > > { > > unsigned long code = ((unisigned long)relocate_new_kernel) & ~1; > > unsigned long vars - (unsigned long)relocate_vars; > > unsigned long offset = vars - code; > > > > return buffer + offset; > > } > > > > ... and in machine_kexec() do: > > > > struct kexec_vars *kv = kexec_buffer_vars(reboot_code_buffer); > > > > kv->start_address = image->start; > > kv->indirection_page = page_list; > > kv->mach_type = machine-arch_type; > > kv->boot_atags = arch.kernel_r2; > > > > ... if that looks any better to you? > > Something like this? > > diff --git a/arch/arm/include/asm/kexec-internal.h b/arch/arm/include/asm/kexec-internal.h > new file mode 100644 > index 000000000000..ecc2322db7aa > --- /dev/null > +++ b/arch/arm/include/asm/kexec-internal.h > @@ -0,0 +1,12 @@ > +/* SPDX-License-Identifier: GPL-2.0 */ > +#ifndef _ARM_KEXEC_INTERNAL_H > +#define _ARM_KEXEC_INTERNAL_H > + > +struct kexec_relocate_data { > + unsigned long kexec_start_address; > + unsigned long kexec_indirection_page; > + unsigned long kexec_mach_type; > + unsigned long kexec_r2; > +}; > + > +#endif > diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c > index a1570c8bab25..be8050b0c3df 100644 > --- a/arch/arm/kernel/asm-offsets.c > +++ b/arch/arm/kernel/asm-offsets.c > @@ -12,6 +12,7 @@ > #include <linux/mm.h> > #include <linux/dma-mapping.h> > #include <asm/cacheflush.h> > +#include <asm/kexec-internal.h> > #include <asm/glue-df.h> > #include <asm/glue-pf.h> > #include <asm/mach/arch.h> > @@ -170,5 +171,9 @@ int main(void) > DEFINE(MPU_RGN_PRBAR, offsetof(struct mpu_rgn, prbar)); > DEFINE(MPU_RGN_PRLAR, offsetof(struct mpu_rgn, prlar)); > #endif > + DEFINE(KEXEC_START_ADDR, offsetof(struct kexec_relocate_data, kexec_start_address)); > + DEFINE(KEXEC_INDIR_PAGE, offsetof(struct kexec_relocate_data, kexec_indirection_page)); > + DEFINE(KEXEC_MACH_TYPE, offsetof(struct kexec_relocate_data, kexec_mach_type)); > + DEFINE(KEXEC_R2, offsetof(struct kexec_relocate_data, kexec_r2)); > return 0; > } > diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c > index 5d84ad333f05..2b09dad7935e 100644 > --- a/arch/arm/kernel/machine_kexec.c > +++ b/arch/arm/kernel/machine_kexec.c > @@ -13,6 +13,7 @@ > #include <linux/of_fdt.h> > #include <asm/mmu_context.h> > #include <asm/cacheflush.h> > +#include <asm/kexec-internal.h> > #include <asm/fncpy.h> > #include <asm/mach-types.h> > #include <asm/smp_plat.h> > @@ -22,11 +23,6 @@ > extern void relocate_new_kernel(void); > extern const unsigned int relocate_new_kernel_size; > > -extern unsigned long kexec_start_address; > -extern unsigned long kexec_indirection_page; > -extern unsigned long kexec_mach_type; > -extern unsigned long kexec_boot_atags; > - > static atomic_t waiting_for_crash_ipi; > > /* > @@ -159,6 +155,7 @@ void (*kexec_reinit)(void); > void machine_kexec(struct kimage *image) > { > unsigned long page_list, reboot_entry_phys; > + struct kexec_relocate_data *data; > void (*reboot_entry)(void); > void *reboot_code_buffer; > > @@ -174,18 +171,17 @@ void machine_kexec(struct kimage *image) > > reboot_code_buffer = page_address(image->control_code_page); > > - /* Prepare parameters for reboot_code_buffer*/ > - set_kernel_text_rw(); > - kexec_start_address = image->start; > - kexec_indirection_page = page_list; > - kexec_mach_type = machine_arch_type; > - kexec_boot_atags = image->arch.kernel_r2; > - > /* copy our kernel relocation code to the control code page */ > reboot_entry = fncpy(reboot_code_buffer, > &relocate_new_kernel, > relocate_new_kernel_size); > > + data = reboot_code_buffer + relocate_new_kernel_size; > + data->kexec_start_address = image->start; > + data->kexec_indirection_page = page_list; > + data->kexec_mach_type = machine_arch_type; > + data->kexec_r2 = image->arch.kernel_r2; > + > /* get the identity mapping physical address for the reboot code */ > reboot_entry_phys = virt_to_idmap(reboot_entry); > > diff --git a/arch/arm/kernel/relocate_kernel.S b/arch/arm/kernel/relocate_kernel.S > index 72a08786e16e..218d524360fc 100644 > --- a/arch/arm/kernel/relocate_kernel.S > +++ b/arch/arm/kernel/relocate_kernel.S > @@ -5,14 +5,16 @@ > > #include <linux/linkage.h> > #include <asm/assembler.h> > +#include <asm/asm-offsets.h> > #include <asm/kexec.h> > > .align 3 /* not needed for this code, but keeps fncpy() happy */ > > ENTRY(relocate_new_kernel) > > - ldr r0,kexec_indirection_page > - ldr r1,kexec_start_address > + adr r7, relocate_new_kernel_end > + ldr r0, [r7, #KEXEC_INDIR_PAGE] > + ldr r1, [r7, #KEXEC_START_ADDR] > > /* > * If there is no indirection page (we are doing crashdumps) > @@ -57,34 +59,16 @@ ENTRY(relocate_new_kernel) > > 2: > /* Jump to relocated kernel */ > - mov lr,r1 > - mov r0,#0 > - ldr r1,kexec_mach_type > - ldr r2,kexec_boot_atags > - ARM( ret lr ) > - THUMB( bx lr ) > - > - .align > - > - .globl kexec_start_address > -kexec_start_address: > - .long 0x0 > - > - .globl kexec_indirection_page > -kexec_indirection_page: > - .long 0x0 > - > - .globl kexec_mach_type > -kexec_mach_type: > - .long 0x0 > - > - /* phy addr of the atags for the new kernel */ > - .globl kexec_boot_atags > -kexec_boot_atags: > - .long 0x0 > + mov lr, r1 > + mov r0, #0 > + ldr r1, [r7, #KEXEC_MACH_TYPE] > + ldr r2, [r7, #KEXEC_R2] > + ARM( ret lr ) > + THUMB( bx lr ) > > ENDPROC(relocate_new_kernel) > > + .align 3 Nice. Why we should align 3 ? For the fncpy I suppose. > relocate_new_kernel_end: > > .globl relocate_new_kernel_size > > -- > RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ > FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last! I don't know now how to proceed now, as you (Mark and you) do completely the patch. You see is my first kernel patch submission :) . Thanks, GF
On Mon, Feb 01, 2021 at 08:07:37PM +0000, Giancarlo Ferrari wrote: > Hi, Hi, > Why we should align 3 ? For the fncpy I suppose. Slightly arbitary really - it gives a nice 8-byte alignment to the data. .align 2 would also be sufficient. > I don't know now how to proceed now, as you (Mark and you) do completely > the patch. Please can you test my patch and let us know if it solves your problem (or even if it works! I haven't tested it beyond build-testing.) > You see is my first kernel patch submission :) . Yay. Sorry for giving you a different patch - Mark is quite right that there's a better solution to this problem, which is eliminating the set_kernel_text_rw() call. The only reason I cooked up the patch was doing that would be more in-depth (as you can see from the increased size of the patch.)
Russell, On Mon, Feb 01, 2021 at 08:16:33PM +0000, Russell King - ARM Linux admin wrote: > On Mon, Feb 01, 2021 at 08:07:37PM +0000, Giancarlo Ferrari wrote: > > Hi, > > Hi, > > > Why we should align 3 ? For the fncpy I suppose. > > Slightly arbitary really - it gives a nice 8-byte alignment to the data. > .align 2 would also be sufficient. > > > I don't know now how to proceed now, as you (Mark and you) do completely > > the patch. > > Please can you test my patch and let us know if it solves your problem > (or even if it works! I haven't tested it beyond build-testing.) > sure, unfortunately due to restriction, I hope to test it by the end of the week. I hope there will be no rush. Otherwise, please let me know. > > You see is my first kernel patch submission :) . > > Yay. Sorry for giving you a different patch - Mark is quite right that > there's a better solution to this problem, which is eliminating the > set_kernel_text_rw() call. The only reason I cooked up the patch was > doing that would be more in-depth (as you can see from the increased > size of the patch.) > I definitely agree with you. > -- > RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ > FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last! Thanks again, GF
Hi all, On Mon, Feb 01, 2021 at 10:18:26PM +0000, Giancarlo Ferrari wrote: > Russell, > > On Mon, Feb 01, 2021 at 08:16:33PM +0000, Russell King - ARM Linux admin wrote: > > On Mon, Feb 01, 2021 at 08:07:37PM +0000, Giancarlo Ferrari wrote: > > > Hi, > > > > Hi, > > > > > Why we should align 3 ? For the fncpy I suppose. > > > > Slightly arbitary really - it gives a nice 8-byte alignment to the data. > > .align 2 would also be sufficient. > > > > > I don't know now how to proceed now, as you (Mark and you) do completely > > > the patch. > > > > Please can you test my patch and let us know if it solves your problem > > (or even if it works! I haven't tested it beyond build-testing.) > > I have tested your patch and it works. I have tested using kexec -l/kexec -e. Considering that .text var are no more written, I assume the issue is gone. > > sure, unfortunately due to restriction, I hope to test it by the end of the > week. I hope there will be no rush. Otherwise, please let me know. > > > > You see is my first kernel patch submission :) . > > > > Yay. Sorry for giving you a different patch - Mark is quite right that > > there's a better solution to this problem, which is eliminating the > > set_kernel_text_rw() call. The only reason I cooked up the patch was > > doing that would be more in-depth (as you can see from the increased > > size of the patch.) > > > > I definitely agree with you. > > > -- > > RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ > > FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last! > > Thanks again, > > > GF Can I ask about having it integrated ? Thanks in advance, GF
On Thu, Feb 04, 2021 at 11:48:42PM +0000, Giancarlo Ferrari wrote: > Can I ask about having it integrated ? Thanks for testing. Are you willing for me to add: Tested-by: Giancarlo Ferrari <giancarlo.ferrari89@gmail.com> to the commit log? I can move it into the fixes branch which I want to send to Linus by Saturday at the very latest.
Russell, On Fri, Feb 05, 2021 at 12:18:33AM +0000, Russell King - ARM Linux admin wrote: > On Thu, Feb 04, 2021 at 11:48:42PM +0000, Giancarlo Ferrari wrote: > > Can I ask about having it integrated ? > > Thanks for testing. Are you willing for me to add: > > Tested-by: Giancarlo Ferrari <giancarlo.ferrari89@gmail.com> > > to the commit log? > Sure. I have a question regarding the patch. I saw that the structure start at the end of the relocation code: data = reboot_code_buffer + relocate_new_kernel_size; which means it overlap with the global symbol relocate_new_kernel_size. I think is minor comment as the variable is only used in the fncpy() then thrown away. Something like: data = reboot_code_buffer + relocate_new_kernel_size + sizeof(long); and accordingly in the instruction (arch/arm/kernel/relocate_kernel.S): adr r7, relocate_new_kernel_end > I can move it into the fixes branch which I want to send to Linus by > Saturday at the very latest. > I am not used with all the merging process. However it sounds great ! > -- > RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ > FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last! Thanks in advance, GF
Sorry for polluting, On Fri, Feb 05, 2021 at 12:40:54AM +0000, Giancarlo Ferrari wrote: > Russell, > > On Fri, Feb 05, 2021 at 12:18:33AM +0000, Russell King - ARM Linux admin wrote: > > On Thu, Feb 04, 2021 at 11:48:42PM +0000, Giancarlo Ferrari wrote: > > > Can I ask about having it integrated ? > > > > Thanks for testing. Are you willing for me to add: > > > > Tested-by: Giancarlo Ferrari <giancarlo.ferrari89@gmail.com> > > > > to the commit log? > > > > Sure. > > I have a question regarding the patch. I saw that the structure start at > the end of the relocation code: > > data = reboot_code_buffer + relocate_new_kernel_size; > > which means it overlap with the global symbol relocate_new_kernel_size. > I think is minor comment as the variable is only used in the fncpy() > then thrown away. Something like: > > data = reboot_code_buffer + relocate_new_kernel_size + sizeof(long); > > and accordingly in the instruction (arch/arm/kernel/relocate_kernel.S): > > adr r7, relocate_new_kernel_end > my fault. Do as I didn't talk... one is in virtual memory the other in the initial code. > > I can move it into the fixes branch which I want to send to Linus by > > Saturday at the very latest. > > > > I am not used with all the merging process. However it sounds great ! > > > -- > > RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ > > FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last! > > Thanks in advance, > > > GF Thanks, GF
On Fri, Feb 05, 2021 at 12:40:54AM +0000, Giancarlo Ferrari wrote: > Russell, > > On Fri, Feb 05, 2021 at 12:18:33AM +0000, Russell King - ARM Linux admin wrote: > > On Thu, Feb 04, 2021 at 11:48:42PM +0000, Giancarlo Ferrari wrote: > > > Can I ask about having it integrated ? > > > > Thanks for testing. Are you willing for me to add: > > > > Tested-by: Giancarlo Ferrari <giancarlo.ferrari89@gmail.com> > > > > to the commit log? > > > > Sure. > > I have a question regarding the patch. I saw that the structure start at > the end of the relocation code: > > data = reboot_code_buffer + relocate_new_kernel_size; > > which means it overlap with the global symbol relocate_new_kernel_size. > I think is minor comment as the variable is only used in the fncpy() > then thrown away. The same is true of the rest of the kernel image if that's how you'd like to look at it. relocate_new_kernel_size is just there to tell the C code the size of the function, nothing more nothing less. It isn't there to be copied. The copied code doesn't use it. > Something like: > > data = reboot_code_buffer + relocate_new_kernel_size + sizeof(long); No. That will place the structure after the size variable, which we don't want, and... > and accordingly in the instruction (arch/arm/kernel/relocate_kernel.S): > > adr r7, relocate_new_kernel_end ... we will then need to follow this with: add r7, r7, #4 or replace it with: adr r7, relocate_new_kernel_end + 4 so that r7 points at "data".
On Fri, Feb 05, 2021 at 09:44:59AM +0000, Russell King - ARM Linux admin wrote: > On Fri, Feb 05, 2021 at 12:40:54AM +0000, Giancarlo Ferrari wrote: > > Russell, > > > > On Fri, Feb 05, 2021 at 12:18:33AM +0000, Russell King - ARM Linux admin wrote: > > > On Thu, Feb 04, 2021 at 11:48:42PM +0000, Giancarlo Ferrari wrote: > > > > Can I ask about having it integrated ? > > > > > > Thanks for testing. Are you willing for me to add: > > > > > > Tested-by: Giancarlo Ferrari <giancarlo.ferrari89@gmail.com> > > > > > > to the commit log? > > > > > > > Sure. > > > > I have a question regarding the patch. I saw that the structure start at > > the end of the relocation code: > > > > data = reboot_code_buffer + relocate_new_kernel_size; > > > > which means it overlap with the global symbol relocate_new_kernel_size. > > I think is minor comment as the variable is only used in the fncpy() > > then thrown away. > > The same is true of the rest of the kernel image if that's how you'd > like to look at it. > > relocate_new_kernel_size is just there to tell the C code the size of > the function, nothing more nothing less. It isn't there to be copied. > The copied code doesn't use it. > Yes, clear. > > Something like: > > > > data = reboot_code_buffer + relocate_new_kernel_size + sizeof(long); > > No. That will place the structure after the size variable, which we > don't want, and... > > > and accordingly in the instruction (arch/arm/kernel/relocate_kernel.S): > > > > adr r7, relocate_new_kernel_end > > ... we will then need to follow this with: > add r7, r7, #4 > > or replace it with: > adr r7, relocate_new_kernel_end + 4 > > so that r7 points at "data". > It was what I meant with "accordingly". > -- > RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ > FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last! Regards, GF
diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c index 5d84ad3..23e8816 100644 --- a/arch/arm/kernel/machine_kexec.c +++ b/arch/arm/kernel/machine_kexec.c @@ -174,6 +174,13 @@ void machine_kexec(struct kimage *image) reboot_code_buffer = page_address(image->control_code_page); + /* + * If below part is not atomic TLB entries might be corrupted after TLB + * invalidation, which leads to Data Abort in .text variable assignment + */ + raw_local_irq_disable(); + local_fiq_disable(); + /* Prepare parameters for reboot_code_buffer*/ set_kernel_text_rw(); kexec_start_address = image->start; @@ -181,6 +188,9 @@ void machine_kexec(struct kimage *image) kexec_mach_type = machine_arch_type; kexec_boot_atags = image->arch.kernel_r2; + local_fiq_enable(); + raw_local_irq_enable(); + /* copy our kernel relocation code to the control code page */ reboot_entry = fncpy(reboot_code_buffer, &relocate_new_kernel,
machine_kexec() need to set rw permission in text and rodata sections to assign some variables (e.g. kexec_start_address). To do that at the end (after flushing pdm in memory, etc.) it needs to invalidate TLB [section] entries. If during the TLB invalidation an interrupt occours, which might cause a context switch, there is the risk to inject invalid TLBs, with ro permissions. When trying to assign .text labels, this lead to the following: Unable to handle kernel paging request at virtual address 80112f38 pgd = fd7ef03e [80112f38] *pgd=0001141e(bad) Internal error: Oops: 80d [#1] PREEMPT SMP ARM ... Signed-off-by: Giancarlo Ferrari <giancarlo.ferrari89@gmail.com> --- arch/arm/kernel/machine_kexec.c | 10 ++++++++++ 1 file changed, 10 insertions(+)