[0/2] arm64: ACPI GTDT watchdog fixes

Message ID 20210421164317.1718831-1-maz@kernel.org (mailing list archive)

Message

Marc Zyngier April 21, 2021, 4:43 p.m. UTC
Dann recently reported that his ThunderX machine failed to boot since
64b499d8df40 ("irqchip/gic-v3: Configure SGIs as standard
interrupts"), with a not so pretty crash while trying to send an IPI.

It turned out to be caused by a mix of broken firmware and a buggy
GTDT watchdog driver. Both have been buggy forever, but the above
commit revealed that the error handling path of the driver was
probably the worst part of it all.

Anyway, this short series has two goals:
- handle broken firmware in a less broken way
- make sure that the root cause of the problem can be identified
  quickly

Thanks,

	M.

Marc Zyngier (2):
  ACPI: GTDT: Don't corrupt interrupt mappings on watchdow probe failure
  ACPI: irq: Prevent unregistering of GIC SGIs

 drivers/acpi/arm64/gtdt.c | 10 ++++++----
 drivers/acpi/irq.c        |  6 +++++-
 2 files changed, 11 insertions(+), 5 deletions(-)
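
To make the two goals above concrete, here is a minimal userspace sketch in C.
It is not the code from either patch. It assumes, based on the patch titles,
that the failure mode is firmware reporting a watchdog interrupt number inside
the GIC SGI range (INTIDs 0-15), followed by an error path that unmaps an
interrupt it never mapped. Every name below (model_register_gsi,
model_unregister_gsi, model_probe_watchdog, and so on) is hypothetical and
does not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
#define NR_GSI        64
#define GIC_SGI_COUNT 16    /* GIC INTIDs 0-15 are SGIs, used for IPIs */

static bool mapped[NR_GSI];

/* Pretend early boot: the SGIs are mapped before any driver probes. */
void model_init(void)
{
    for (int i = 0; i < NR_GSI; i++)
        mapped[i] = i < GIC_SGI_COUNT;
}

/* True while every SGI mapping is still in place. */
bool model_sgi_intact(void)
{
    for (int i = 0; i < GIC_SGI_COUNT; i++)
        if (!mapped[i])
            return false;
    return true;
}

/* Map a device interrupt; returns a positive number or -1 on failure. */
int model_register_gsi(unsigned int gsi)
{
    if (gsi < GIC_SGI_COUNT || gsi >= NR_GSI)
        return -1;          /* not a valid device interrupt */
    mapped[gsi] = true;
    return (int)gsi;
}

/* Second goal (assumed shape): refuse to tear down an SGI, and say so. */
bool model_unregister_gsi(unsigned int gsi)
{
    if (gsi < GIC_SGI_COUNT) {
        fprintf(stderr, "refusing to unregister SGI %u\n", gsi);
        return false;
    }
    if (gsi >= NR_GSI)
        return false;
    mapped[gsi] = false;
    return true;
}

/* First goal (assumed shape): undo only a mapping this probe created. */
int model_probe_watchdog(unsigned int fw_gsi, bool fw_regs_valid)
{
    int irq = model_register_gsi(fw_gsi);

    if (irq > 0 && fw_regs_valid)
        return irq;         /* probe succeeded */

    if (irq > 0)
        model_unregister_gsi(fw_gsi);
    return -1;
}

/* Walk through the scenarios; returns 0 if every check holds. */
int model_selfcheck(void)
{
    model_init();

    /* Broken firmware: watchdog interrupt reported inside the SGI range. */
    assert(model_probe_watchdog(0, true) < 0);
    assert(model_sgi_intact());

    /* Valid interrupt, other property bad: the mapping is undone. */
    assert(model_probe_watchdog(40, false) < 0);
    assert(!mapped[40]);

    /* Sane firmware: the probe succeeds and the mapping stays. */
    assert(model_probe_watchdog(40, true) == 40);

    /* A direct attempt to unregister an SGI is rejected. */
    assert(!model_unregister_gsi(3));
    assert(model_sgi_intact());
    return 0;
}
```

Calling model_selfcheck() from a test harness exercises all four cases. The
message printed by model_unregister_gsi() stands in for the "identify the root
cause quickly" part: the refusal is reported at the point where the bogus
request is made, rather than surfacing later as a failed IPI.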

Comments

dann frazier April 21, 2021, 6:09 p.m. UTC | #1
On Wed, Apr 21, 2021 at 05:43:15PM +0100, Marc Zyngier wrote:
> [...]

For the series:

Tested-by: dann frazier <dann.frazier@canonical.com>
Hanjun Guo April 22, 2021, 1:42 p.m. UTC | #2
On 2021/4/22 0:43, Marc Zyngier wrote:
> [...]

Tested on a Kunpeng920 ARM64 server; I didn't see any issues after
applying this patch set.

Tested-by: Hanjun Guo <guohanjun@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>

Thanks
Hanjun
Lorenzo Pieralisi April 22, 2021, 2:23 p.m. UTC | #3
On Wed, Apr 21, 2021 at 05:43:15PM +0100, Marc Zyngier wrote:
> [...]

Patch 2 needs an ACK from Rafael - usually these patches go via
the ARM64 tree but I don't think it is compulsory for this series.

Thank you !

Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Catalin Marinas April 23, 2021, 1:43 p.m. UTC | #4
On Thu, Apr 22, 2021 at 03:23:42PM +0100, Lorenzo Pieralisi wrote:
> On Wed, Apr 21, 2021 at 05:43:15PM +0100, Marc Zyngier wrote:
> > [...]
> 
> Patch 2 needs an ACK from Rafael - usually these patches go via
> the ARM64 tree but I don't think it is compulsory for this series.
> 
> Thank you !
> 
> Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>

Thanks Lorenzo.

Rafael, if there are no objections, I'll take these two patches in the
arm64 tree.
Catalin Marinas April 23, 2021, 5:11 p.m. UTC | #5
On Wed, 21 Apr 2021 17:43:15 +0100, Marc Zyngier wrote:
> Dann recently reported that his ThunderX machine failed to boot since
> 64b499d8df40 ("irqchip/gic-v3: Configure SGIs as standard
> interrupts"), with a not so pretty crash while trying to send an IPI.
> 
> It turned out to be caused by a mix of broken firmware and a buggy
> GTDT watchdog driver. Both have been buggy forever, but the above
> commit revealed that the error handling path of the driver was
> probably the worst part of it all.
> 
> [...]

Applied to arm64 (for-next/core), thanks!

[1/2] ACPI: GTDT: Don't corrupt interrupt mappings on watchdow probe failure
      https://git.kernel.org/arm64/c/1ecd5b129252
[2/2] ACPI: irq: Prevent unregistering of GIC SGIs
      https://git.kernel.org/arm64/c/2a20b08f06e7