diff mbox

[v2,1/3] ARM: catch pending imprecise abort on unmask

Message ID 1444905142-21500-2-git-send-email-l.stach@pengutronix.de (mailing list archive)
State New, archived
Headers show

Commit Message

Lucas Stach Oct. 15, 2015, 10:32 a.m. UTC
Install a non-faulting handler just before unmasking imprecise aborts
and switch back to the regular one after unmasking is done.

This catches any pending imprecise abort that the firmware/bootloader
may have left behind that would normally crash the kernel at that point.
As there are apparently a lot of bootlaoders out there that do such a
thing it makes sense to handle it in the common startup code.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
---
v2:
- move temporary fault handler swapping into fault.c to hide it from
  other parts of the kernel
- print FSR, when warning user about bad firmware/bootloader
---
 arch/arm/mm/fault.c | 22 ++++++++++++++++++++++
 arch/arm/mm/fault.h |  1 +
 arch/arm/mm/mmu.c   |  3 ++-
 3 files changed, 25 insertions(+), 1 deletion(-)

Comments

Russell King - ARM Linux Oct. 15, 2015, 3:32 p.m. UTC | #1
On Thu, Oct 15, 2015 at 12:32:20PM +0200, Lucas Stach wrote:
> Install a non-faulting handler just before unmasking imprecise aborts
> and switch back to the regular one after unmasking is done.
> 
> This catches any pending imprecise abort that the firmware/bootloader
> may have left behind that would normally crash the kernel at that point.
> As there are apparently a lot of bootlaoders out there that do such a
> thing it makes sense to handle it in the common startup code.
> 
> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>

Much better.  Please feel free to add it to the patch system, thanks.

I think, given that the original seems to be breaking platforms, this
patch needs to go into -rc kernels, right?
Tony Lindgren Oct. 15, 2015, 3:39 p.m. UTC | #2
* Russell King - ARM Linux <linux@arm.linux.org.uk> [151015 08:37]:
> On Thu, Oct 15, 2015 at 12:32:20PM +0200, Lucas Stach wrote:
> > Install a non-faulting handler just before unmasking imprecise aborts
> > and switch back to the regular one after unmasking is done.
> > 
> > This catches any pending imprecise abort that the firmware/bootloader
> > may have left behind that would normally crash the kernel at that point.
> > As there are apparently a lot of bootlaoders out there that do such a
> > thing it makes sense to handle it in the common startup code.
> > 
> > Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
> 
> Much better.  Please feel free to add it to the patch system, thanks.
> 
> I think, given that the original seems to be breaking platforms, this
> patch needs to go into -rc kernels, right?

Hmm do we still see a trace where the issue happened though with this
one?

What I though wes something pending from the bootloader, turned out
to be a bogus SRAM init code that I was able to easily fix with the
trace.

Regards,

Tony
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Russell King - ARM Linux Oct. 15, 2015, 4:06 p.m. UTC | #3
On Thu, Oct 15, 2015 at 08:39:15AM -0700, Tony Lindgren wrote:
> * Russell King - ARM Linux <linux@arm.linux.org.uk> [151015 08:37]:
> > On Thu, Oct 15, 2015 at 12:32:20PM +0200, Lucas Stach wrote:
> > > Install a non-faulting handler just before unmasking imprecise aborts
> > > and switch back to the regular one after unmasking is done.
> > > 
> > > This catches any pending imprecise abort that the firmware/bootloader
> > > may have left behind that would normally crash the kernel at that point.
> > > As there are apparently a lot of bootlaoders out there that do such a
> > > thing it makes sense to handle it in the common startup code.
> > > 
> > > Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
> > 
> > Much better.  Please feel free to add it to the patch system, thanks.
> > 
> > I think, given that the original seems to be breaking platforms, this
> > patch needs to go into -rc kernels, right?
> 
> Hmm do we still see a trace where the issue happened though with this
> one?

That's not the intention of this specific patch.

This is solely to detect the bootloader induced imprecise exception,
nothing more.  A backtrace for that won't be useful in any shape or
form - in fact, the backtrace will be well known (it'll be from the
site where the imprecise exceptions are unmasked.)

Any imprecise exception which happens after this will be handled in
the normal way: it'll raise a kernel oops, and that will have all the
details that a kernel oops normally has.

The difference is, rather than the boot loader provoking the kernel to
oops at this point, we're able to report the event and continue on.
Tony Lindgren Oct. 15, 2015, 4:23 p.m. UTC | #4
* Russell King - ARM Linux <linux@arm.linux.org.uk> [151015 09:11]:
> On Thu, Oct 15, 2015 at 08:39:15AM -0700, Tony Lindgren wrote:
> > * Russell King - ARM Linux <linux@arm.linux.org.uk> [151015 08:37]:
> > > On Thu, Oct 15, 2015 at 12:32:20PM +0200, Lucas Stach wrote:
> > > > Install a non-faulting handler just before unmasking imprecise aborts
> > > > and switch back to the regular one after unmasking is done.
> > > > 
> > > > This catches any pending imprecise abort that the firmware/bootloader
> > > > may have left behind that would normally crash the kernel at that point.
> > > > As there are apparently a lot of bootlaoders out there that do such a
> > > > thing it makes sense to handle it in the common startup code.
> > > > 
> > > > Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
> > > 
> > > Much better.  Please feel free to add it to the patch system, thanks.
> > > 
> > > I think, given that the original seems to be breaking platforms, this
> > > patch needs to go into -rc kernels, right?
> > 
> > Hmm do we still see a trace where the issue happened though with this
> > one?
> 
> That's not the intention of this specific patch.
> 
> This is solely to detect the bootloader induced imprecise exception,
> nothing more.  A backtrace for that won't be useful in any shape or
> form - in fact, the backtrace will be well known (it'll be from the
> site where the imprecise exceptions are unmasked.)
> 
> Any imprecise exception which happens after this will be handled in
> the normal way: it'll raise a kernel oops, and that will have all the
> details that a kernel oops normally has.

OK makes sense.

> The difference is, rather than the boot loader provoking the kernel to
> oops at this point, we're able to report the event and continue on.

OK

Tony
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Lucas Stach Oct. 16, 2015, 8:21 a.m. UTC | #5
Am Donnerstag, den 15.10.2015, 16:32 +0100 schrieb Russell King - ARM
Linux:
> On Thu, Oct 15, 2015 at 12:32:20PM +0200, Lucas Stach wrote:
> > Install a non-faulting handler just before unmasking imprecise aborts
> > and switch back to the regular one after unmasking is done.
> > 
> > This catches any pending imprecise abort that the firmware/bootloader
> > may have left behind that would normally crash the kernel at that point.
> > As there are apparently a lot of bootlaoders out there that do such a
> > thing it makes sense to handle it in the common startup code.
> > 
> > Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
> 
> Much better.  Please feel free to add it to the patch system, thanks.
> 
Do you feel like taking patches 2 and 3 through your tree too, given
that they are already acked by the platform maintainers, or should they
pick them up themselves?

> I think, given that the original seems to be breaking platforms, this
> patch needs to go into -rc kernels, right?
> 
No, the original patch is only queued for 4.4, so the regression is only
in -next. This patch only needs to make it into your 4.4 pull request.

Thanks,
Lucas
Lucas Stach Oct. 19, 2015, 12:41 p.m. UTC | #6
Am Donnerstag, den 15.10.2015, 16:32 +0100 schrieb Russell King - ARM
Linux:
> On Thu, Oct 15, 2015 at 12:32:20PM +0200, Lucas Stach wrote:
> > Install a non-faulting handler just before unmasking imprecise aborts
> > and switch back to the regular one after unmasking is done.
> > 
> > This catches any pending imprecise abort that the firmware/bootloader
> > may have left behind that would normally crash the kernel at that point.
> > As there are apparently a lot of bootlaoders out there that do such a
> > thing it makes sense to handle it in the common startup code.
> > 
> > Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
> 
> Much better.  Please feel free to add it to the patch system, thanks.
> 
> I think, given that the original seems to be breaking platforms, this
> patch needs to go into -rc kernels, right?
> 
Okay, this is in the patch system as patch 8447/1 now.

As Russell hasn't said whether he would like to pick up the platform
patches I've omitted them for now and would ask the platform maintainers
to pick them up themselves for kernel 4.4.

Regards,
Lucas
diff mbox

Patch

diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index 0d629b8f973f..daafcf121ce0 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -593,6 +593,28 @@  do_PrefetchAbort(unsigned long addr, unsigned int ifsr, struct pt_regs *regs)
 	arm_notify_die("", regs, &info, ifsr, 0);
 }
 
+/*
+ * Abort handler to be used only during first unmasking of asynchronous aborts
+ * on the boot CPU. This makes sure that the machine will not die if the
+ * firmware/bootloader left an imprecise abort pending for us to trip over.
+ */
+static int __init early_abort_handler(unsigned long addr, unsigned int fsr,
+				      struct pt_regs *regs)
+{
+	pr_warn("Hit pending asynchronous external abort (FSR=0x%08x) during "
+		"first unmask, this is most likely caused by a "
+		"firmware/bootloader bug.\n", fsr);
+
+	return 0;
+}
+
+void __init early_abt_enable(void)
+{
+	fsr_info[22].fn = early_abort_handler;
+	local_abt_enable();
+	fsr_info[22].fn = do_bad;
+}
+
 #ifndef CONFIG_ARM_LPAE
 static int __init exceptions_init(void)
 {
diff --git a/arch/arm/mm/fault.h b/arch/arm/mm/fault.h
index cf08bdfbe0d6..05ec5e0df32d 100644
--- a/arch/arm/mm/fault.h
+++ b/arch/arm/mm/fault.h
@@ -24,5 +24,6 @@  static inline int fsr_fs(unsigned int fsr)
 
 void do_bad_area(unsigned long addr, unsigned int fsr, struct pt_regs *regs);
 unsigned long search_exception_table(unsigned long addr);
+void early_abt_enable(void);
 
 #endif	/* __ARCH_ARM_FAULT_H */
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index f65a6f344b6d..4867f5daf82c 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -38,6 +38,7 @@ 
 #include <asm/mach/pci.h>
 #include <asm/fixmap.h>
 
+#include "fault.h"
 #include "mm.h"
 #include "tcm.h"
 
@@ -1365,7 +1366,7 @@  static void __init devicemaps_init(const struct machine_desc *mdesc)
 	flush_cache_all();
 
 	/* Enable asynchronous aborts */
-	local_abt_enable();
+	early_abt_enable();
 }
 
 static void __init kmap_init(void)