[0/2] Enable support IPI_CPU_CRASH_STOP to be pseudo-NMI

Message ID 20200924044236.1245808-1-ito-yuichi@fujitsu.com (mailing list archive)

Message

Yuichi Ito Sept. 24, 2020, 4:42 a.m. UTC
Enable support IPI_CPU_CRASH_STOP to be pseudo-NMI

This patchset allows the IPI_CPU_CRASH_STOP IPI to be delivered as a
pseudo-NMI. kdump can then collect system information even when a CPU
is in a HARDLOCKUP state.

Only IPI_CPU_CRASH_STOP uses NMI and the other IPIs remain normal IRQs.

The patch has been tested on ThunderX.

This series is based on Marc's latest IPI patch set [1].
It also uses part of Sumit's IPI patch set for NMI [2].

[1] https://lore.kernel.org/linux-arm-kernel/20200901144324.1071694-1-maz@kernel.org/
[2] https://lore.kernel.org/linux-arm-kernel/1599830924-13990-3-git-send-email-sumit.garg@linaro.org/

$ echo 1 > /proc/sys/kernel/panic_on_rcu_stall
$ echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
   : the kernel panics and the crash kernel boots
   : makedumpfile saves the system state at HARDLOCKUP in vmcore.

crash utility:
crash> bt
  PID: 3213   TASK: fffffd001adc5940  CPU: 8   COMMAND: "bash"
  #0 [fffffe0022fefcf0] lkdtm_HARDLOCKUP at fffffe0010888ab4
  #1 [fffffe0022fefd10] lkdtm_do_action at fffffe00108882bc
  #2 [fffffe0022fefd20] direct_entry at fffffe0010888720
  #3 [fffffe0022fefd70] full_proxy_write at fffffe001058cfe4
  #4 [fffffe0022fefdb0] vfs_write at fffffe00104a4c2c
  #5 [fffffe0022fefdf0] ksys_write at fffffe00104a4f0c
  #6 [fffffe0022fefe40] __arm64_sys_write at fffffe00104a4fbc
  #7 [fffffe0022fefe50] el0_svc_common.constprop.0 at fffffe0010159e38
  #8 [fffffe0022fefe80] do_el0_svc at fffffe0010159fa0
  #9 [fffffe0022fefe90] el0_svc at fffffe00101481d0
  #10 [fffffe0022fefea0] el0_sync_handler at fffffe00101484b4
  #11 [fffffe0022fefff0] el0_sync at fffffe0010142b7c


Sumit Garg (1):
  irqchip/gic-v3: Enable support for SGIs to act as NMIs

Yuichi Ito (1):
  Register IPI_CPU_CRASH_STOP IPI as pseudo-NMI

 arch/arm64/kernel/smp.c      | 39 ++++++++++++++++++++++++++++--------
 drivers/irqchip/irq-gic-v3.c | 13 ++++++++++--
 2 files changed, 42 insertions(+), 10 deletions(-)

Comments

Yuichi Ito Sept. 28, 2020, 2:43 a.m. UTC | #1
Hi Marc, Sumit

I would appreciate any advice you have on this patch.

Yuichi Ito

Marc Zyngier Sept. 28, 2020, 8:59 a.m. UTC | #2
On 2020-09-28 03:43, ito-yuichi@fujitsu.com wrote:
> Hi Marc, Sumit
> 
> I would appreciate if you have any advice on this patch.

I haven't had a chance to look into it, as I'm not even sure I'll
take the core series in the first place (there are outstanding
regressions I can't reproduce, let alone fix them).

> 
> Yuichi Ito
> 
>> -----Original Message-----
>> From: Yuichi Ito <ito-yuichi@fujitsu.com>
>> Sent: Thursday, September 24, 2020 1:43 PM
>> To: maz@kernel.org; sumit.garg@linaro.org; tglx@linutronix.de;
>> jason@lakedaemon.net; catalin.marinas@arm.com; will@kernel.org
>> Cc: linux-arm-kernel@lists.infradead.org; 
>> linux-kernel@vger.kernel.org; Ito,
>> Yuichi/伊藤 有一 <ito-yuichi@fujitsu.com>
>> Subject: [PATCH 0/2] Enable support IPI_CPU_CRASH_STOP to be
>> pseudo-NMI
>> 
>> Enable support IPI_CPU_CRASH_STOP to be pseudo-NMI
>> 
>> This patchset enables IPI_CPU_CRASH_STOP IPI to be pseudo-NMI.
>> This allows kdump to collect system information even when the CPU is 
>> in a
>> HARDLOCKUP state.
>> 
>> Only IPI_CPU_CRASH_STOP uses NMI and the other IPIs remain normal
>> IRQs.
>> 
>> The patch has been tested on ThunderX.

Which ThunderX? TX2 (at least the incarnation I used in the past) wasn't
able to correctly deal with priorities.

         M.
Yuichi Ito Sept. 29, 2020, 5:50 a.m. UTC | #3
Hi Marc

Thank you for your reply.

> On 2020-09-28 03:43, ito-yuichi@fujitsu.com wrote:
> > Hi Marc, Sumit
> >
> > I would appreciate if you have any advice on this patch.
> 
> I haven't had a chance to look into it, as I'm not even sure I'll take the core
> series in the first place (there are outstanding regressions I can't reproduce,
> let alone fix them).
> 

I understand.
Please let me know if there is anything I can do.
I sincerely hope that your patches will be merged into the mainline.

> >
> > Yuichi Ito
> >
> >> Enable support IPI_CPU_CRASH_STOP to be pseudo-NMI
> >>
> >> This patchset enables IPI_CPU_CRASH_STOP IPI to be pseudo-NMI.
> >> This allows kdump to collect system information even when the CPU is
> >> in a HARDLOCKUP state.
> >>
> >> Only IPI_CPU_CRASH_STOP uses NMI and the other IPIs remain normal
> >> IRQs.
> >>
> >> The patch has been tested on ThunderX.
> 
> Which ThunderX? TX2 (at least the incarnation I used in the past) wasn't able
> to correctly deal with priorities.

I tried it with ThunderX CN8890.
If you can tell me the steps to reproduce the TX2 problem, I will
investigate it on TX as well.


Thank you and best regards,

Yuichi Ito
Marc Zyngier Sept. 29, 2020, 10:54 a.m. UTC | #4
On 2020-09-29 06:50, ito-yuichi@fujitsu.com wrote:
> Hi Marc

[...]

>> >> The patch has been tested on ThunderX.
>> 
>> Which ThunderX? TX2 (at least the incarnation I used in the past) 
>> wasn't able
>> to correctly deal with priorities.
> 
> I tried it with ThunderX CN8890.
> If you tell me steps to reproduce the problem of TX2, I will
> investigate it with TX as well.

PMR_EL1 reporting fantasy values, non-uniform priority support across
the interrupt classes, and generally prone to lockups. The original TX
is a very different machine though (TX 1 and 2 only share the engraving
of the manufacturer on the heat-spreader).

         M.
Yuichi Ito Sept. 30, 2020, 8:51 a.m. UTC | #5
Hi Marc

> 
> On 2020-09-29 06:50, ito-yuichi@fujitsu.com wrote:
> > Hi Marc
> 
> [...]
> 
> >> >> The patch has been tested on ThunderX.
> >>
> >> Which ThunderX? TX2 (at least the incarnation I used in the past)
> >> wasn't able
> >> to correctly deal with priorities.
> >
> > I tried it with ThunderX CN8890.
> > If you tell me steps to reproduce the problem of TX2, I will
> > investigate it with TX as well.
> 
> PMR_EL1 reporting fantasy values, non-uniform priority support across
> the interrupt classes, and generally prone to lockups. The original TX
> is a very different machine though (TX 1 and 2 only share the engraving
> of the manufacturer on the heat-spreader).

Thank you for the information.
I will check if we have a ThunderX1 or X2 environment. If we have either one, I will investigate it.



Thank you and best regards,

Yuichi Ito