Message ID | alpine.DEB.2.20.1612141807410.3556@nanos (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 14/12/2016 18:08, Thomas Gleixner wrote: > On Wed, 14 Dec 2016, Mason wrote: > >> I'm seeing Linux v4.9 crash (dereferencing NULL) when I try to online >> the secondary core, after putting it offline. > > Does the patch below fix the issue? > > Thanks, > > tglx > > 8<--------------- > > diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h > index 22acee76cf4c..2594c287b078 100644 > --- a/include/linux/cpuhotplug.h > +++ b/include/linux/cpuhotplug.h > @@ -101,7 +101,6 @@ enum cpuhp_state { > CPUHP_AP_ARM_L2X0_STARTING, > CPUHP_AP_ARM_ARCH_TIMER_STARTING, > CPUHP_AP_ARM_GLOBAL_TIMER_STARTING, > - CPUHP_AP_DUMMY_TIMER_STARTING, > CPUHP_AP_JCORE_TIMER_STARTING, > CPUHP_AP_EXYNOS4_MCT_TIMER_STARTING, > CPUHP_AP_ARM_TWD_STARTING, > @@ -111,6 +110,7 @@ enum cpuhp_state { > CPUHP_AP_MARCO_TIMER_STARTING, > CPUHP_AP_MIPS_GIC_TIMER_STARTING, > CPUHP_AP_ARC_TIMER_STARTING, > + CPUHP_AP_DUMMY_TIMER_STARTING, > CPUHP_AP_KVM_STARTING, > CPUHP_AP_KVM_ARM_VGIC_INIT_STARTING, > CPUHP_AP_KVM_ARM_VGIC_STARTING, $ patch -p1 < tglx.patch patching file include/linux/cpuhotplug.h Hunk #1 succeeded at 80 (offset -21 lines). Hunk #2 succeeded at 89 (offset -21 lines). It does seem to fix the problem: # echo 0 > /sys/devices/system/cpu/cpu1/online SMC called with a0=0x00[000001 a1=0x00000121 a2=0x00000005 a3 =0xc01189b4 0x00000121 [1][flow/suspend3.c:39] CPU 1 die: jumping6 to. post-boot WFE 402826] CPU1: shutdown SMC called with a0=0x00000001 a1=0x00000122 a2=0x00000000 a3=0x00000000 0x00000122 [0][flow/suspend.c:82] Killing core1 armor+++ armor: core 1 booted, entering wfe... # echo 1 > /sys/devices/system/cpu/cpu1/online [ 215.692700] tango_boot_secondary from __cpu_up SMC called with a0=0x80101500 a1=0x00000105 a2=0x00000000 a3=0x00000000 0x00000105 [ 215.704494] tango_set_aux_boot_addr=0 SMC called with a0=0x00000001 a1=0x00000104 a2=0x00000000 a3=0x00000000 0x00000104 [0][flow/smc_handler.c:127] waking up CPU1 [ 215.719308] tango_start_aux_core=0 I reverted your patch, and the kernel blows up again. So what's the problem, and how does your patch solve it? Regards.
On 14/12/2016 18:47, Mason wrote: > On 14/12/2016 18:08, Thomas Gleixner wrote: > >> On Wed, 14 Dec 2016, Mason wrote: >> >>> I'm seeing Linux v4.9 crash (dereferencing NULL) when I try to online >>> the secondary core, after putting it offline. >> >> Does the patch below fix the issue? >> >> Thanks, >> >> tglx >> >> 8<--------------- >> >> diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h >> index 22acee76cf4c..2594c287b078 100644 >> --- a/include/linux/cpuhotplug.h >> +++ b/include/linux/cpuhotplug.h >> @@ -101,7 +101,6 @@ enum cpuhp_state { >> CPUHP_AP_ARM_L2X0_STARTING, >> CPUHP_AP_ARM_ARCH_TIMER_STARTING, >> CPUHP_AP_ARM_GLOBAL_TIMER_STARTING, >> - CPUHP_AP_DUMMY_TIMER_STARTING, >> CPUHP_AP_JCORE_TIMER_STARTING, >> CPUHP_AP_EXYNOS4_MCT_TIMER_STARTING, >> CPUHP_AP_ARM_TWD_STARTING, >> @@ -111,6 +110,7 @@ enum cpuhp_state { >> CPUHP_AP_MARCO_TIMER_STARTING, >> CPUHP_AP_MIPS_GIC_TIMER_STARTING, >> CPUHP_AP_ARC_TIMER_STARTING, >> + CPUHP_AP_DUMMY_TIMER_STARTING, >> CPUHP_AP_KVM_STARTING, >> CPUHP_AP_KVM_ARM_VGIC_INIT_STARTING, >> CPUHP_AP_KVM_ARM_VGIC_STARTING, > > $ patch -p1 < tglx.patch > patching file include/linux/cpuhotplug.h > Hunk #1 succeeded at 80 (offset -21 lines). > Hunk #2 succeeded at 89 (offset -21 lines). > > It does seem to fix the problem: > > # echo 0 > /sys/devices/system/cpu/cpu1/online > SMC called with a0=0x00[000001 a1=0x00000121 a2=0x00000005 a3 =0xc01189b4 0x00000121 > [1][flow/suspend3.c:39] CPU 1 die: jumping6 to. post-boot WFE > 402826] CPU1: shutdown > SMC called with a0=0x00000001 a1=0x00000122 a2=0x00000000 a3=0x00000000 0x00000122 > [0][flow/suspend.c:82] Killing core1 > armor+++ armor: core 1 booted, entering wfe... > # echo 1 > /sys/devices/system/cpu/cpu1/online > [ 215.692700] tango_boot_secondary from __cpu_up > SMC called with a0=0x80101500 a1=0x00000105 a2=0x00000000 a3=0x00000000 0x00000105 > [ 215.704494] tango_set_aux_boot_addr=0 > SMC called with a0=0x00000001 a1=0x00000104 a2=0x00000000 a3=0x00000000 0x00000104 > [0][flow/smc_handler.c:127] waking up CPU1 > [ 215.719308] tango_start_aux_core=0 > > > I reverted your patch, and the kernel blows up again. > > So what's the problem, and how does your patch solve it? Link to the original report: https://marc.info/?l=linux-arm-kernel&m=148173152524746&w=2 Forgot to CC Robin Murphy, who had provided valuable input in similar circumstances a few months back. Also add LKML, since this doesn't appear to be ARM-specific. Do I need to specify which device tree I was using? Regards.
On Thu, Dec 15, 2016 at 11:35:12AM +0100, Mason wrote: > On 14/12/2016 18:47, Mason wrote: > > On 14/12/2016 18:08, Thomas Gleixner wrote: > >> Does the patch below fix the issue? > >> diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h > >> index 22acee76cf4c..2594c287b078 100644 > >> --- a/include/linux/cpuhotplug.h > >> +++ b/include/linux/cpuhotplug.h > >> @@ -101,7 +101,6 @@ enum cpuhp_state { > >> CPUHP_AP_ARM_L2X0_STARTING, > >> CPUHP_AP_ARM_ARCH_TIMER_STARTING, > >> CPUHP_AP_ARM_GLOBAL_TIMER_STARTING, > >> - CPUHP_AP_DUMMY_TIMER_STARTING, > >> CPUHP_AP_JCORE_TIMER_STARTING, > >> CPUHP_AP_EXYNOS4_MCT_TIMER_STARTING, > >> CPUHP_AP_ARM_TWD_STARTING, > >> @@ -111,6 +110,7 @@ enum cpuhp_state { > >> CPUHP_AP_MARCO_TIMER_STARTING, > >> CPUHP_AP_MIPS_GIC_TIMER_STARTING, > >> CPUHP_AP_ARC_TIMER_STARTING, > >> + CPUHP_AP_DUMMY_TIMER_STARTING, > >> CPUHP_AP_KVM_STARTING, > >> CPUHP_AP_KVM_ARM_VGIC_INIT_STARTING, > >> CPUHP_AP_KVM_ARM_VGIC_STARTING, > > It does seem to fix the problem: > > I reverted your patch, and the kernel blows up again. > > > > So what's the problem, and how does your patch solve it? > > Link to the original report: > https://marc.info/?l=linux-arm-kernel&m=148173152524746&w=2 > > Forgot to CC Robin Murphy, who had provided valuable input > in similar circumstances a few months back. > > Also add LKML, since this doesn't appear to be ARM-specific. > > Do I need to specify which device tree I was using? This is already fixed in the linux-tip tree, with commit messages describing the fix. It's specific to a few clocksources, due to their hotplug callbacks occuring later than the dummy timer. That triggers the bug fixed in: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?h=timers/urgent&id=c1a9eeb938b5433947e5ea22f89baff3182e7075 The relevant timers were fixed in: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?h=smp/urgent&id=9bf11ecce5a2758e5a097c2f3a13d08552d0d6f9 Thanks, Mark.
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h index 22acee76cf4c..2594c287b078 100644 --- a/include/linux/cpuhotplug.h +++ b/include/linux/cpuhotplug.h @@ -101,7 +101,6 @@ enum cpuhp_state { CPUHP_AP_ARM_L2X0_STARTING, CPUHP_AP_ARM_ARCH_TIMER_STARTING, CPUHP_AP_ARM_GLOBAL_TIMER_STARTING, - CPUHP_AP_DUMMY_TIMER_STARTING, CPUHP_AP_JCORE_TIMER_STARTING, CPUHP_AP_EXYNOS4_MCT_TIMER_STARTING, CPUHP_AP_ARM_TWD_STARTING, @@ -111,6 +110,7 @@ enum cpuhp_state { CPUHP_AP_MARCO_TIMER_STARTING, CPUHP_AP_MIPS_GIC_TIMER_STARTING, CPUHP_AP_ARC_TIMER_STARTING, + CPUHP_AP_DUMMY_TIMER_STARTING, CPUHP_AP_KVM_STARTING, CPUHP_AP_KVM_ARM_VGIC_INIT_STARTING, CPUHP_AP_KVM_ARM_VGIC_STARTING,