Message ID: 1453745225-27736-5-git-send-email-suzuki.poulose@arm.com (mailing list archive)
State: New, archived
Hi Suzuki, On Mon, Jan 25, 2016 at 06:07:02PM +0000, Suzuki K. Poulose wrote: > +/* Values for secondary_data.status */ > + > +#define CPU_MMU_OFF -1 > +#define CPU_BOOT_SUCCESS 0 > +/* The cpu invoked ops->cpu_die, synchronise it with cpu_kill */ > +#define CPU_KILL_ME 1 > +/* The cpu couldn't die gracefully and is looping in the kernel */ > +#define CPU_STUCK_IN_KERNEL 2 > +/* Fatal system error detected by secondary CPU, crash the system */ > +#define CPU_PANIC_KERNEL 3 Please add braces around these numbers, just in case (I added them locally). > /* > + * The booting CPU updates the failed status, with MMU turned off, > + * below which lies in head.txt to make sure it doesn't share the same writeback > + * granule. So that we can invalidate it properly. I can't really parse this (it looks like punctuation in the wrong place; also "share the same..." with what?). > + * > + * update_early_cpu_boot_status tmp, status > + * - Corrupts tmp, x0, x1 > + * - Writes 'status' to __early_cpu_boot_status and makes sure > + * it is committed to memory. > + */ > + > + .macro update_early_cpu_boot_status tmp, status > + mov \tmp, lr > + adrp x0, __early_cpu_boot_status > + add x0, x0, #:lo12:__early_cpu_boot_status Nitpick: you could use the adr_l macro. > + mov x1, #\status > + str x1, [x0] > + add x1, x0, 4 > + bl __inval_cache_range > + mov lr, \tmp > + .endm If the CPU that's currently booting has the MMU off, what's the point of invalidating the cache here? The operation may not even be broadcast to the other CPU. So you actually need the invalidation before reading the status on the primary CPU. > + > +ENTRY(__early_cpu_boot_status) > + .long 0 > +END(__early_cpu_boot_status) I think we should just do like __boot_cpu_mode and place it in the .data..cacheline_aligned section. You can always use the safe clean+invalidate before reading the value so that we don't care much about the write-back granule. 
> @@ -89,12 +101,14 @@ static DECLARE_COMPLETION(cpu_running); > int __cpu_up(unsigned int cpu, struct task_struct *idle) > { > int ret; > + int status; > > /* > * We need to tell the secondary core where to find its stack and the > * page tables. > */ > secondary_data.stack = task_stack_page(idle) + THREAD_START_SP; > + update_cpu_boot_status(CPU_MMU_OFF); > __flush_dcache_area(&secondary_data, sizeof(secondary_data)); > > /* > @@ -117,7 +131,35 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle) > pr_err("CPU%u: failed to boot: %d\n", cpu, ret); > } > > + /* Make sure the update to status is visible */ > + smp_rmb(); Which status? In relation to what? > secondary_data.stack = NULL; > + status = READ_ONCE(secondary_data.status); > + if (ret && status) { > + > + if (status == CPU_MMU_OFF) > + status = READ_ONCE(__early_cpu_boot_status); You need cache maintenance before reading this. > + > + switch (status) { > + default: > + pr_err("CPU%u: failed in unknown state : 0x%x\n", > + cpu, status); > + break; > + case CPU_KILL_ME: > + if (!op_cpu_kill(cpu)) { > + pr_crit("CPU%u: died during early boot\n", cpu); > + break; > + } > + /* Fall through */ > + pr_crit("CPU%u: may not have shut down cleanly\n", cpu); > + case CPU_STUCK_IN_KERNEL: > + pr_crit("CPU%u: is stuck in kernel\n", cpu); > + cpus_stuck_in_kernel++; > + break; > + case CPU_PANIC_KERNEL: > + panic("CPU%u detected unsupported configuration\n", cpu); > + } > + } > > return ret; > } BTW, you can send a fix-up on top of this series with corrections, I can fold them in.
On Wed, Feb 03, 2016 at 12:57:38PM +0000, Catalin Marinas wrote: > Hi Suzuki, > > On Mon, Jan 25, 2016 at 06:07:02PM +0000, Suzuki K. Poulose wrote: > > + * update_early_cpu_boot_status tmp, status > > + * - Corrupts tmp, x0, x1 > > + * - Writes 'status' to __early_cpu_boot_status and makes sure > > + * it is committed to memory. > > + */ > > + > > + .macro update_early_cpu_boot_status tmp, status > > + mov \tmp, lr > > + adrp x0, __early_cpu_boot_status > > + add x0, x0, #:lo12:__early_cpu_boot_status > > Nitpick: you could use the adr_l macro. > > > + mov x1, #\status > > + str x1, [x0] > > + add x1, x0, 4 > > + bl __inval_cache_range > > + mov lr, \tmp > > + .endm > > If the CPU that's currently booting has the MMU off, what's the point of > invalidating the cache here? To invalidate stale lines for this address, held in any caches prior to the PoC. I'm assuming that __early_cpu_boot_status is sufficiently padded to the CWG. Cache maintenance works when SCTLR_ELx.M == 0, though barriers are required prior to cache maintenance as non-cacheable accesses do not hazard by VA. The MMU being off has no effect on the cache maintenance itself. > The operation may not even be broadcast to the other CPU. So you > actually need the invalidation before reading the status on the > primary CPU. We require that CPUs are coherent when they enter the kernel, so any cache maintenance operation _must_ affect all coherent caches (i.e. it must be broadcast and must affect all coherent caches prior to the PoC in this case). > > + > > +ENTRY(__early_cpu_boot_status) > > + .long 0 > > +END(__early_cpu_boot_status) > > I think we should just do like __boot_cpu_mode and place it in the > .data..cacheline_aligned section. I think we should add a separate __writeback_aligned annotation for stuff like this, even if it falls in .data..cacheline_aligned for now. Otherwise, agreed. 
> You can always use the safe clean+invalidate before reading the value > so that we don't care much about the write-back granule. To get correct data out we need to pad to the CWG regardless of whether the reader or the writer performs the maintenance. Mark.
On Mon, Jan 25, 2016 at 06:07:02PM +0000, Suzuki K Poulose wrote: > From: Suzuki K. Poulose <suzuki.poulose@arm.com> > > A secondary CPU could fail to come online due to insufficient > capabilities and could simply die or loop in the kernel. > e.g, a CPU with no support for the selected kernel PAGE_SIZE > loops in kernel with MMU turned off. > or a hotplugged CPU which doesn't have one of the advertised > system capability will die during the activation. > > There is no way to synchronise the status of the failing CPU > back to the master. This patch solves the issue by adding a > field to the secondary_data which can be updated by the failing > CPU. If the secondary CPU fails even before turning the MMU on, > it updates the status in a special variable reserved in the head.txt > section to make sure that the update can be cache invalidated safely > without possible sharing of cache write back granule. > > Here are the possible states : > > -1. CPU_MMU_OFF - Initial value set by the master CPU, this value > indicates that the CPU could not turn the MMU on, hence the status > could not be reliably updated in the secondary_data. Instead, the > CPU has updated the status in __early_cpu_boot_status (reserved in > head.txt section) > > 0. CPU_BOOT_SUCCESS - CPU has booted successfully. > > 1. CPU_KILL_ME - CPU has invoked cpu_ops->die, indicating the > master CPU to synchronise by issuing a cpu_ops->cpu_kill. > > 2. CPU_STUCK_IN_KERNEL - CPU couldn't invoke die(), instead is > looping in the kernel. This information could be used by say, > kexec to check if it is really safe to do a kexec reboot. > > 3. CPU_PANIC_KERNEL - CPU detected some serious issues which > requires kernel to crash immediately. The secondary CPU cannot > call panic() until it has initialised the GIC. This flag can > be used to instruct the master to do so. When would we use this last case? 
Perhaps a better option is to always throw any incompatible CPU into an (MMU-off) pen, and assume that it's stuck in the kernel, even if we could theoretically turn it off. A system with incompatible CPUs is a broken system to begin with... Otherwise, comments below. > Cc: Will Deacon <will.deacon@arm.com> > Cc: Mark Rutland <mark.rutland@arm.com> > Cc: Catalin Marinas <catalin.marinas@arm.com> > Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com> > --- > arch/arm64/include/asm/smp.h | 26 ++++++++++++++++++++++ > arch/arm64/kernel/asm-offsets.c | 2 ++ > arch/arm64/kernel/head.S | 39 +++++++++++++++++++++++++++++--- > arch/arm64/kernel/smp.c | 47 +++++++++++++++++++++++++++++++++++++++ > 4 files changed, 111 insertions(+), 3 deletions(-) [...] > @@ -650,6 +679,10 @@ __enable_mmu: > ENDPROC(__enable_mmu) > > __no_granule_support: > + /* Indicate that this CPU can't boot and is stuck in the kernel */ > + update_early_cpu_boot_status x2, CPU_STUCK_IN_KERNEL > +1: > wfe > - b __no_granule_support > + wfi > + b 1b The addition of wfi seems fine, but should be mentioned in the commit message. > ENDPROC(__no_granule_support) > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c > index 59f032b..d2721ae 100644 > --- a/arch/arm64/kernel/smp.c > +++ b/arch/arm64/kernel/smp.c > @@ -63,6 +63,8 @@ > * where to place its SVC stack > */ > struct secondary_data secondary_data; > +/* Number of CPUs which aren't online, but looping in kernel text. */ > +u32 cpus_stuck_in_kernel; Why u32 rather than int? > > enum ipi_msg_type { > IPI_RESCHEDULE, > @@ -72,6 +74,16 @@ enum ipi_msg_type { > IPI_IRQ_WORK, > }; > > +#ifdef CONFIG_HOTPLUG_CPU > +static int op_cpu_kill(unsigned int cpu); > +#else > +static inline int op_cpu_kill(unsigned int cpu) > +{ > + return -ENOSYS; > +} > +#endif There is no !CONFIG_HOTPLUG_CPU configuration any more. > + > + > /* > * Boot a secondary CPU, and assign it the specified idle task. 
> * This also gives us the initial stack to use for this CPU. > @@ -89,12 +101,14 @@ static DECLARE_COMPLETION(cpu_running); > int __cpu_up(unsigned int cpu, struct task_struct *idle) > { > int ret; > + int status; > > /* > * We need to tell the secondary core where to find its stack and the > * page tables. > */ > secondary_data.stack = task_stack_page(idle) + THREAD_START_SP; > + update_cpu_boot_status(CPU_MMU_OFF); > __flush_dcache_area(&secondary_data, sizeof(secondary_data)); > > /* > @@ -117,7 +131,35 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle) > pr_err("CPU%u: failed to boot: %d\n", cpu, ret); > } > > + /* Make sure the update to status is visible */ > + smp_rmb(); > secondary_data.stack = NULL; > + status = READ_ONCE(secondary_data.status); What is the rmb intended to order here? > + if (ret && status) { > + > + if (status == CPU_MMU_OFF) > + status = READ_ONCE(__early_cpu_boot_status); > + > + switch (status) { > + default: > + pr_err("CPU%u: failed in unknown state : 0x%x\n", > + cpu, status); > + break; > + case CPU_KILL_ME: > + if (!op_cpu_kill(cpu)) { > + pr_crit("CPU%u: died during early boot\n", cpu); > + break; > + } > + /* Fall through */ > + pr_crit("CPU%u: may not have shut down cleanly\n", cpu); > + case CPU_STUCK_IN_KERNEL: > + pr_crit("CPU%u: is stuck in kernel\n", cpu); > + cpus_stuck_in_kernel++; > + break; > + case CPU_PANIC_KERNEL: > + panic("CPU%u detected unsupported configuration\n", cpu); > + } > + } > > return ret; > } > @@ -185,6 +227,9 @@ asmlinkage void secondary_start_kernel(void) > */ > pr_info("CPU%u: Booted secondary processor [%08x]\n", > cpu, read_cpuid_id()); > + update_cpu_boot_status(CPU_BOOT_SUCCESS); > + /* Make sure the status update is visible before we complete */ > + smp_wmb(); Surely complete() has appropriate barriers? 
> set_cpu_online(cpu, true); > complete(&cpu_running); > > @@ -327,10 +372,12 @@ void cpu_die_early(void) > set_cpu_present(cpu, 0); > > #ifdef CONFIG_HOTPLUG_CPU > + update_cpu_boot_status(CPU_KILL_ME); > /* Check if we can park ourselves */ > if (cpu_ops[cpu] && cpu_ops[cpu]->cpu_die) > cpu_ops[cpu]->cpu_die(cpu); I think you need a barrier to ensure visibility of the store prior to calling cpu_die (i.e. you want to order an access against execution). A dsb is what you want -- smp_wmb() only expands to a dmb. > #endif > + update_cpu_boot_status(CPU_STUCK_IN_KERNEL); > > cpu_park_loop(); Likewise here. Mark.
On Wed, Feb 03, 2016 at 05:01:15PM +0000, Mark Rutland wrote: > On Mon, Jan 25, 2016 at 06:07:02PM +0000, Suzuki K Poulose wrote: > > From: Suzuki K. Poulose <suzuki.poulose@arm.com> > > > > A secondary CPU could fail to come online due to insufficient > > capabilities and could simply die or loop in the kernel. > > e.g, a CPU with no support for the selected kernel PAGE_SIZE > > loops in kernel with MMU turned off. > > or a hotplugged CPU which doesn't have one of the advertised > > system capability will die during the activation. > > > > There is no way to synchronise the status of the failing CPU > > back to the master. This patch solves the issue by adding a > > field to the secondary_data which can be updated by the failing > > CPU. If the secondary CPU fails even before turning the MMU on, > > it updates the status in a special variable reserved in the head.txt > > section to make sure that the update can be cache invalidated safely > > without possible sharing of cache write back granule. > > > > Here are the possible states : > > > > -1. CPU_MMU_OFF - Initial value set by the master CPU, this value > > indicates that the CPU could not turn the MMU on, hence the status > > could not be reliably updated in the secondary_data. Instead, the > > CPU has updated the status in __early_cpu_boot_status (reserved in > > head.txt section) > > > > 0. CPU_BOOT_SUCCESS - CPU has booted successfully. > > > > 1. CPU_KILL_ME - CPU has invoked cpu_ops->die, indicating the > > master CPU to synchronise by issuing a cpu_ops->cpu_kill. > > > > 2. CPU_STUCK_IN_KERNEL - CPU couldn't invoke die(), instead is > > looping in the kernel. This information could be used by say, > > kexec to check if it is really safe to do a kexec reboot. > > > > 3. CPU_PANIC_KERNEL - CPU detected some serious issues which > > requires kernel to crash immediately. The secondary CPU cannot > > call panic() until it has initialised the GIC. This flag can > > be used to instruct the master to do so. 
> > When would we use this last case? It's used in a subsequent series when verifying the ASID bits. I haven't followed the previous discussions but I guess Suzuki aims to panic the whole kernel rather than just stop the current CPU when incompatible ASID size is found.
Hi Catalin, >> +/* Fatal system error detected by secondary CPU, crash the system */ >> +#define CPU_PANIC_KERNEL 3 > > Please add braces around these numbers, just in case (I added them > locally). OK, I will base my changes on top of the corrections. > >> /* >> + * The booting CPU updates the failed status, with MMU turned off, >> + * below which lies in head.txt to make sure it doesn't share the same writeback >> + * granule. So that we can invalidate it properly. > > I can't really parse this (it looks like punctuation in the wrong place; > also "share the same..." with what?). Sorry it doesn't parse :(. It should have been something like: "The booting CPU updates the failed status @__early_cpu_boot_status (with the MMU turned off). It is placed in head.txt to make sure it doesn't share the same write back granule with anything else while the CPU updates it." >> + .macro update_early_cpu_boot_status tmp, status >> + mov \tmp, lr >> + adrp x0, __early_cpu_boot_status >> + add x0, x0, #:lo12:__early_cpu_boot_status > > Nitpick: you could use the adr_l macro. Yes, that looks much better, will keep in mind. >> @@ -117,7 +131,35 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle) >> pr_err("CPU%u: failed to boot: %d\n", cpu, ret); >> } >> >> + /* Make sure the update to status is visible */ >> + smp_rmb(); > > Which status? In relation to what? The secondary_data.status, updated by the successful CPUs (which have the MMU turned on) before they update cpu_running. But as Mark pointed out, complete() implies a barrier, hence we won't need it. > >> secondary_data.stack = NULL; >> + status = READ_ONCE(secondary_data.status); > > You need cache maintenance before reading this. OK. Thanks Suzuki
On 03/02/16 17:01, Mark Rutland wrote: > On Mon, Jan 25, 2016 at 06:07:02PM +0000, Suzuki K Poulose wrote: >> From: Suzuki K. Poulose <suzuki.poulose@arm.com> >> >> 3. CPU_PANIC_KERNEL - CPU detected some serious issues which >> requires kernel to crash immediately. The secondary CPU cannot >> call panic() until it has initialised the GIC. This flag can >> be used to instruct the master to do so. > > When would we use this last case? As of now, it is used when we have incompatible ASID bits. > > Perhaps a better option is to always throw any incompatible CPU into an > (MMU-off) pen, and assume that it's stuck in the kernel, even if we > could theoretically turn it off. Right, that is another option. I am fine with either. >> - b __no_granule_support >> + wfi >> + b 1b > > The addition of wfi seems fine, but should be mentioned in the commit > message. Sure. >> struct secondary_data secondary_data; >> +/* Number of CPUs which aren't online, but looping in kernel text. */ >> +u32 cpus_stuck_in_kernel; > > Why u32 rather than int? No specific reason; since it is going to be a quantity which cannot be < 0, I kept it unsigned. It could be unsigned int. >> +#ifdef CONFIG_HOTPLUG_CPU >> +static int op_cpu_kill(unsigned int cpu); >> +#else >> +static inline int op_cpu_kill(unsigned int cpu) >> +{ >> + return -ENOSYS; >> +} >> +#endif > > There is no !CONFIG_HOTPLUG_CPU configuration any more. That's what I thought, but then there was [1]. If you disable CONFIG_PM_SLEEP, you can still build with !CONFIG_HOTPLUG_CPU (or in other words allnoconfig) [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-November/384589.html >> >> + /* Make sure the update to status is visible */ >> + smp_rmb(); >> secondary_data.stack = NULL; >> + status = READ_ONCE(secondary_data.status); > > What is the rmb intended to order here? It was for the complete(). But...
>> + update_cpu_boot_status(CPU_BOOT_SUCCESS); >> + /* Make sure the status update is visible before we complete */ >> + smp_wmb(); > > Surely complete() has appropriate barriers? Yes, it does. We can remove it. >> #ifdef CONFIG_HOTPLUG_CPU >> + update_cpu_boot_status(CPU_KILL_ME); >> /* Check if we can park ourselves */ >> if (cpu_ops[cpu] && cpu_ops[cpu]->cpu_die) >> cpu_ops[cpu]->cpu_die(cpu); > > I think you need a barrier to ensure visibility of the store prior to > calling cpu_die (i.e. you want to order an access against execution). A > dsb is what you want -- smp_wmb() only expands to a dmb. > OK. >> #endif >> + update_cpu_boot_status(CPU_STUCK_IN_KERNEL); >> >> cpu_park_loop(); > > Likewise here. OK. Thanks Suzuki
On Wed, Feb 03, 2016 at 04:46:32PM +0000, Mark Rutland wrote: > On Wed, Feb 03, 2016 at 12:57:38PM +0000, Catalin Marinas wrote: > > On Mon, Jan 25, 2016 at 06:07:02PM +0000, Suzuki K. Poulose wrote: > > > + * update_early_cpu_boot_status tmp, status > > > + * - Corrupts tmp, x0, x1 > > > + * - Writes 'status' to __early_cpu_boot_status and makes sure > > > + * it is committed to memory. > > > + */ > > > + > > > + .macro update_early_cpu_boot_status tmp, status > > > + mov \tmp, lr > > > + adrp x0, __early_cpu_boot_status > > > + add x0, x0, #:lo12:__early_cpu_boot_status > > > > Nitpick: you could use the adr_l macro. > > > > > + mov x1, #\status > > > + str x1, [x0] > > > + add x1, x0, 4 > > > + bl __inval_cache_range > > > + mov lr, \tmp > > > + .endm > > > > If the CPU that's currently booting has the MMU off, what's the point of > > invalidating the cache here? > > To invalidate stale lines for this address, held in any caches prior to > the PoC. I'm assuming that __early_cpu_boot_status is sufficiently > padded to the CWG. I would have rather invalidated it before writing the [x0], if that's what it's aimed at. > Cache maintenance works when SCTLR_ELx.M == 0, though barriers are > required prior to cache maintenance as non-cacheable accesses do not > hazard by VA. > > The MMU being off has no effect on the cache maintenance itself. I know, but whether it has an effect on other CPUs is a different question (it probably has). Anyway, I would rather do the invalidation on the CPU that actually reads this status. > > The operation may not even be broadcast to the other CPU. So you > > actually need the invalidation before reading the status on the > > primary CPU. > > We require that CPUs are coherent when they enter the kernel, so any > cache maintenance operation _must_ affect all coherent caches (i.e. it > must be broadcast and must affect all coherent caches prior to the PoC > in this case). 
In general, if you perform cache maintenance on a non-shareable mapping, I don't think it would be broadcast. But in this case, the MMU is off, data accesses default to Device_nGnRnE and considered outer shareable, so it may actually work. Is this stated anywhere in the ARM ARM? But see my point above about invalidating on the secondary CPU before writing the location and invalidating again on the primary CPU before reading it.
On Wed, Feb 03, 2016 at 05:24:07PM +0000, Suzuki K. Poulose wrote: > On 03/02/16 17:01, Mark Rutland wrote: > >On Mon, Jan 25, 2016 at 06:07:02PM +0000, Suzuki K Poulose wrote: > >>From: Suzuki K. Poulose <suzuki.poulose@arm.com> [...] > >> struct secondary_data secondary_data; > >>+/* Number of CPUs which aren't online, but looping in kernel text. */ > >>+u32 cpus_stuck_in_kernel; > > > >Why u32 rather than int? > > No specific reasons, since it is going to be a quantity, which cannot be < 0, > kept it unsigned. It could be unsigned int. Elsewhere, int or unsigned int is used to contain a cpu number. I think either would be preferable to u32, to at least limit the inconsistency. > >>+#ifdef CONFIG_HOTPLUG_CPU > >>+static int op_cpu_kill(unsigned int cpu); > >>+#else > >>+static inline int op_cpu_kill(unsigned int cpu) > >>+{ > >>+ return -ENOSYS; > >>+} > >>+#endif > > > >There is no !CONFIG_HOTPLUG_CPU configuration any more. > > Thats what I thought but then there was [1]. If you disable CONFIG_PM_SLEEP, you can > still build with !CONFIG_HOTPLUG_CPU (or in other words allnoconfig) > > [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-November/384589.html Aargh, indeed. I had confused CONFIG_HOTPLUG_CPU and CONFIG_SMP. Sorry! Mark.
On Wed, Feb 03, 2016 at 05:34:49PM +0000, Catalin Marinas wrote: > On Wed, Feb 03, 2016 at 04:46:32PM +0000, Mark Rutland wrote: > > On Wed, Feb 03, 2016 at 12:57:38PM +0000, Catalin Marinas wrote: > > > On Mon, Jan 25, 2016 at 06:07:02PM +0000, Suzuki K. Poulose wrote: > > > > + * update_early_cpu_boot_status tmp, status > > > > + * - Corrupts tmp, x0, x1 > > > > + * - Writes 'status' to __early_cpu_boot_status and makes sure > > > > + * it is committed to memory. > > > > + */ > > > > + > > > > + .macro update_early_cpu_boot_status tmp, status > > > > + mov \tmp, lr > > > > + adrp x0, __early_cpu_boot_status > > > > + add x0, x0, #:lo12:__early_cpu_boot_status > > > > > > Nitpick: you could use the adr_l macro. > > > > > > > + mov x1, #\status > > > > + str x1, [x0] > > > > + add x1, x0, 4 > > > > + bl __inval_cache_range > > > > + mov lr, \tmp > > > > + .endm > > > > > > If the CPU that's currently booting has the MMU off, what's the point of > > > invalidating the cache here? > > > > To invalidate stale lines for this address, held in any caches prior to > > the PoC. I'm assuming that __early_cpu_boot_status is sufficiently > > padded to the CWG. > > I would have rather invalidated it before writing the [x0], if that's > what it's aimed at. That alone wouldn't be sufficient, due to speculative fetches allocating new (clean) lines prior to the write completing. I was expecting the CWG-aligned region to only be written to with the MMU off, i.e. we'd only have clean stale lines and no dirty lines. > > Cache maintenance works when SCTLR_ELx.M == 0, though barriers are > > required prior to cache maintenance as non-cacheable accesses do not > > hazard by VA. > > > > The MMU being off has no effect on the cache maintenance itself. > > I know, but whether it has an effect on other CPUs is a different > question (it probably has). Anyway, I would rather do the invalidation > on the CPU that actually reads this status.
My only worry would be how this gets ordered against the (non-cacheable) store. I guess we'd complete that with a DSB SY regardless. Given that, I have no problem doing the invalidate on the read side. Assuming we only write from the side with the MMU off, we don't need maintenance on that side. > > > The operation may not even be broadcast to the other CPU. So you > > > actually need the invalidation before reading the status on the > > > primary CPU. > > > > We require that CPUs are coherent when they enter the kernel, so any > > cache maintenance operation _must_ affect all coherent caches (i.e. it > > must be broadcast and must affect all coherent caches prior to the PoC > > in this case). > > In general, if you perform cache maintenance on a non-shareable mapping, > I don't think it would be broadcast. But in this case, the MMU is off, > data accesses default to Device_nGnRnE and considered outer shareable, > so it may actually work. Is this stated anywhere in the ARM ARM? In ARM DDI 0487A.h, D4.2.8 "The effects of disabling a stage of address translation" we state: Cache maintenance instructions act on the target cache regardless of whether any stages of address translation are disabled, and regardless of the values of the memory attributes. However, if a stage of address translation is disabled, they use the flat address mapping for that translation stage. Thanks, Mark.
On Wed, Feb 03, 2016 at 05:53:51PM +0000, Mark Rutland wrote: > On Wed, Feb 03, 2016 at 05:34:49PM +0000, Catalin Marinas wrote: > > On Wed, Feb 03, 2016 at 04:46:32PM +0000, Mark Rutland wrote: > > > On Wed, Feb 03, 2016 at 12:57:38PM +0000, Catalin Marinas wrote: > > > > On Mon, Jan 25, 2016 at 06:07:02PM +0000, Suzuki K. Poulose wrote: > > > > > + * update_early_cpu_boot_status tmp, status > > > > > + * - Corrupts tmp, x0, x1 > > > > > + * - Writes 'status' to __early_cpu_boot_status and makes sure > > > > > + * it is committed to memory. > > > > > + */ > > > > > + > > > > > + .macro update_early_cpu_boot_status tmp, status > > > > > + mov \tmp, lr > > > > > + adrp x0, __early_cpu_boot_status > > > > > + add x0, x0, #:lo12:__early_cpu_boot_status > > > > > > > > Nitpick: you could use the adr_l macro. > > > > > > > > > + mov x1, #\status > > > > > + str x1, [x0] > > > > > + add x1, x0, 4 > > > > > + bl __inval_cache_range > > > > > + mov lr, \tmp > > > > > + .endm > > > > > > > > If the CPU that's currently booting has the MMU off, what's the point of > > > > invalidating the cache here? > > > > > > To invalidate stale lines for this address, held in any caches prior to > > > the PoC. I'm assuming that __early_cpu_boot_status is sufficiently > > > padded to the CWG. > > > > I would have rather invalidated it before writing the [x0], if that's > > what it's aimed at. > > That alone wouldn't not be sufficient, due to speculative fetches > allocating new (clean) lines prior to the write completing. Indeed. > I was expecting the CWG-aligned region to only be written to with the > MMU off, i.e. we'd only have clean stale lines and no dirty lines. I assume that's true even in a guest. > > > Cache maintenance works when SCTLR_ELx.M == 0, though barriers are > > > required prior to cache maintenance as non-cacheable accesses do not > > > hazard by VA. > > > > > > The MMU being off has no effect on the cache maintenance itself. 
> > I know, but whether it has an effect on other CPUs is a different > > question (it probably has). Anyway, I would rather do the invalidation > > on the CPU that actually reads this status. > > My only worry would be how this gets ordered against the (non-cacheable) > store. I guess we'd complete that with a DSB SY regardless. The synchronisation on the primary CPU is a bit flaky anyway and just relies on long timeouts. It waits for a while (1 sec) for the completion and then it reads the status. So it assumes that the secondary CPU finished its writing, but neither DSB nor cache maintenance guarantee this. > Given that, I have no problem doing the invalidate on the read side. > Assuming we only write from the side with the MMU off, we don't need > maintenance on that side. I agree. > > > > The operation may not even be broadcast to the other CPU. So you > > > > actually need the invalidation before reading the status on the > > > > primary CPU. > > > > > > We require that CPUs are coherent when they enter the kernel, so any > > > cache maintenance operation _must_ affect all coherent caches (i.e. it > > > must be broadcast and must affect all coherent caches prior to the PoC > > > in this case). > > > > In general, if you perform cache maintenance on a non-shareable mapping, > > I don't think it would be broadcast. But in this case, the MMU is off, > > data accesses default to Device_nGnRnE and considered outer shareable, > > so it may actually work. Is this stated anywhere in the ARM ARM? > > In ARM DDI 0487A.h, D4.2.8 "The effects of disabling a stage of address > translation" we state: > > Cache maintenance instructions act on the target cache > regardless of whether any stages of address translation are > disabled, and regardless of the values of the memory attributes. > However, if a stage of address translation is disabled, they use > the flat address mapping for that translation stage. But does "target cache" include other CPUs in the system?
On Wed, Feb 03, 2016 at 06:12:52PM +0000, Catalin Marinas wrote: > On Wed, Feb 03, 2016 at 05:53:51PM +0000, Mark Rutland wrote: > > On Wed, Feb 03, 2016 at 05:34:49PM +0000, Catalin Marinas wrote: > > > In general, if you perform cache maintenance on a non-shareable mapping, > > > I don't think it would be broadcast. But in this case, the MMU is off, > > > data accesses default to Device_nGnRnE and considered outer shareable, > > > so it may actually work. Is this stated anywhere in the ARM ARM? > > > > In ARM DDI 0487A.h, D4.2.8 "The effects of disabling a stage of address > > translation" we state: > > > > Cache maintenance instructions act on the target cache > > regardless of whether any stages of address translation are > > disabled, and regardless of the values of the memory attributes. > > However, if a stage of address translation is disabled, they use > > the flat address mapping for that translation stage. > > But does "target cache" include other CPUs in the system? It appears so. There's a more complete description (with a table) on page D3-1718, "Effects of instructions that operate by VA to the Point of Coherency", where it's explicitly stated: For Device memory and Normal memory that is Inner Non-cacheable, Outer Non-cacheable, these instructions must affect the caches of all PEs in the Outer Shareable shareability domain of the PE on which the instruction is operating. In other cases, per the table, the maintenance may be limited to a smaller set of caches. Thanks, Mark.
diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
index 32e75ee..6548769 100644
--- a/arch/arm64/include/asm/smp.h
+++ b/arch/arm64/include/asm/smp.h
@@ -16,6 +16,19 @@
 #ifndef __ASM_SMP_H
 #define __ASM_SMP_H
 
+/* Values for secondary_data.status */
+
+#define CPU_MMU_OFF		-1
+#define CPU_BOOT_SUCCESS	0
+/* The cpu invoked ops->cpu_die, synchronise it with cpu_kill */
+#define CPU_KILL_ME		1
+/* The cpu couldn't die gracefully and is looping in the kernel */
+#define CPU_STUCK_IN_KERNEL	2
+/* Fatal system error detected by secondary CPU, crash the system */
+#define CPU_PANIC_KERNEL	3
+
+#ifndef __ASSEMBLY__
+
 #include <linux/threads.h>
 #include <linux/cpumask.h>
 #include <linux/thread_info.h>
@@ -54,11 +67,17 @@ asmlinkage void secondary_start_kernel(void);
 
 /*
  * Initial data for bringing up a secondary CPU.
+ * @stack  - sp for the secondary CPU
+ * @status - Result passed back from the secondary CPU to
+ *           indicate failure.
  */
 struct secondary_data {
 	void *stack;
+	int status;
 };
+
 extern struct secondary_data secondary_data;
+extern int __early_cpu_boot_status;
 extern void secondary_entry(void);
 
 extern void arch_send_call_function_single_ipi(int cpu);
@@ -78,4 +97,11 @@ static inline void cpu_park_loop(void)
 	}
 }
 
+static inline void update_cpu_boot_status(int val)
+{
+	WRITE_ONCE(secondary_data.status, val);
+}
+
+#endif /* ifndef __ASSEMBLY__ */
+
 #endif /* ifndef __ASM_SMP_H */
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index fffa4ac6..463fcf1 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -104,6 +104,8 @@ int main(void)
   DEFINE(TZ_MINWEST,	offsetof(struct timezone, tz_minuteswest));
   DEFINE(TZ_DSTTIME,	offsetof(struct timezone, tz_dsttime));
   BLANK();
+  DEFINE(CPU_BOOT_STACK,	offsetof(struct secondary_data, stack));
+  BLANK();
 #ifdef CONFIG_KVM_ARM_HOST
   DEFINE(VCPU_CONTEXT,	offsetof(struct kvm_vcpu, arch.ctxt));
   DEFINE(CPU_GP_REGS,	offsetof(struct kvm_cpu_context, gp_regs));
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index ffe9c2b..16bfa34 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -34,6 +34,7 @@
 #include <asm/pgtable-hwdef.h>
 #include <asm/pgtable.h>
 #include <asm/page.h>
+#include <asm/smp.h>
 #include <asm/sysreg.h>
 #include <asm/thread_info.h>
 #include <asm/virt.h>
@@ -606,7 +607,7 @@ ENTRY(secondary_startup)
 ENDPROC(secondary_startup)
 
 ENTRY(__secondary_switched)
-	ldr	x0, [x21]			// get secondary_data.stack
+	ldr	x0, [x21, #CPU_BOOT_STACK]	// get secondary_data.stack
 	mov	sp, x0
 	and	x0, x0, #~(THREAD_SIZE - 1)
 	msr	sp_el0, x0			// save thread_info
@@ -615,6 +616,32 @@ ENTRY(__secondary_switched)
 ENDPROC(__secondary_switched)
 
 /*
+ * The booting CPU updates the failed status, with MMU turned off,
+ * below which lies in head.txt to make sure it doesn't share the same writeback
+ * granule. So that we can invalidate it properly.
+ *
+ * update_early_cpu_boot_status tmp, status
+ *  - Corrupts tmp, x0, x1
+ *  - Writes 'status' to __early_cpu_boot_status and makes sure
+ *    it is committed to memory.
+ */
+
+	.macro	update_early_cpu_boot_status tmp, status
+	mov	\tmp, lr
+	adrp	x0, __early_cpu_boot_status
+	add	x0, x0, #:lo12:__early_cpu_boot_status
+	mov	x1, #\status
+	str	x1, [x0]
+	add	x1, x0, 4
+	bl	__inval_cache_range
+	mov	lr, \tmp
+	.endm
+
+ENTRY(__early_cpu_boot_status)
+	.long 0
+END(__early_cpu_boot_status)
+
+/*
  * Enable the MMU.
  *
  * x0  = SCTLR_EL1 value for turning on the MMU.
@@ -631,12 +658,14 @@ __enable_mmu:
 	ubfx	x2, x1, #ID_AA64MMFR0_TGRAN_SHIFT, 4
 	cmp	x2, #ID_AA64MMFR0_TGRAN_SUPPORTED
 	b.ne	__no_granule_support
+	mov	x4, x0
+	update_early_cpu_boot_status x2, 0
 	ldr	x5, =vectors
 	msr	vbar_el1, x5
 	msr	ttbr0_el1, x25			// load TTBR0
 	msr	ttbr1_el1, x26			// load TTBR1
 	isb
-	msr	sctlr_el1, x0
+	msr	sctlr_el1, x4
 	isb
 	/*
 	 * Invalidate the local I-cache so that any instructions fetched
@@ -650,6 +679,10 @@ __enable_mmu:
 ENDPROC(__enable_mmu)
 
 __no_granule_support:
+	/* Indicate that this CPU can't boot and is stuck in the kernel */
+	update_early_cpu_boot_status x2, CPU_STUCK_IN_KERNEL
+1:
 	wfe
-	b	__no_granule_support
+	wfi
+	b	1b
 ENDPROC(__no_granule_support)
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 59f032b..d2721ae 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -63,6 +63,8 @@
  * where to place its SVC stack
  */
 struct secondary_data secondary_data;
+/* Number of CPUs which aren't online, but looping in kernel text. */
+u32 cpus_stuck_in_kernel;
 
 enum ipi_msg_type {
 	IPI_RESCHEDULE,
@@ -72,6 +74,16 @@ enum ipi_msg_type {
 	IPI_IRQ_WORK,
 };
 
+#ifdef CONFIG_HOTPLUG_CPU
+static int op_cpu_kill(unsigned int cpu);
+#else
+static inline int op_cpu_kill(unsigned int cpu)
+{
+	return -ENOSYS;
+}
+#endif
+
+
 /*
  * Boot a secondary CPU, and assign it the specified idle task.
  * This also gives us the initial stack to use for this CPU.
@@ -89,12 +101,14 @@ static DECLARE_COMPLETION(cpu_running);
 int __cpu_up(unsigned int cpu, struct task_struct *idle)
 {
 	int ret;
+	int status;
 
 	/*
 	 * We need to tell the secondary core where to find its stack and the
 	 * page tables.
*/ secondary_data.stack = task_stack_page(idle) + THREAD_START_SP; + update_cpu_boot_status(CPU_MMU_OFF); __flush_dcache_area(&secondary_data, sizeof(secondary_data)); /* @@ -117,7 +131,35 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle) pr_err("CPU%u: failed to boot: %d\n", cpu, ret); } + /* Make sure the update to status is visible */ + smp_rmb(); secondary_data.stack = NULL; + status = READ_ONCE(secondary_data.status); + if (ret && status) { + + if (status == CPU_MMU_OFF) + status = READ_ONCE(__early_cpu_boot_status); + + switch (status) { + default: + pr_err("CPU%u: failed in unknown state : 0x%x\n", + cpu, status); + break; + case CPU_KILL_ME: + if (!op_cpu_kill(cpu)) { + pr_crit("CPU%u: died during early boot\n", cpu); + break; + } + /* Fall through */ + pr_crit("CPU%u: may not have shut down cleanly\n", cpu); + case CPU_STUCK_IN_KERNEL: + pr_crit("CPU%u: is stuck in kernel\n", cpu); + cpus_stuck_in_kernel++; + break; + case CPU_PANIC_KERNEL: + panic("CPU%u detected unsupported configuration\n", cpu); + } + } return ret; } @@ -185,6 +227,9 @@ asmlinkage void secondary_start_kernel(void) */ pr_info("CPU%u: Booted secondary processor [%08x]\n", cpu, read_cpuid_id()); + update_cpu_boot_status(CPU_BOOT_SUCCESS); + /* Make sure the status update is visible before we complete */ + smp_wmb(); set_cpu_online(cpu, true); complete(&cpu_running); @@ -327,10 +372,12 @@ void cpu_die_early(void) set_cpu_present(cpu, 0); #ifdef CONFIG_HOTPLUG_CPU + update_cpu_boot_status(CPU_KILL_ME); /* Check if we can park ourselves */ if (cpu_ops[cpu] && cpu_ops[cpu]->cpu_die) cpu_ops[cpu]->cpu_die(cpu); #endif + update_cpu_boot_status(CPU_STUCK_IN_KERNEL); cpu_park_loop(); }