ARM: Fix null die() string for unhandled data and prefetch abort cases

Message ID 1563589976-19004-1-git-send-email-george_davis@mentor.com (mailing list archive)
State New, archived
Series ARM: Fix null die() string for unhandled data and prefetch abort cases

Commit Message

George G. Davis July 20, 2019, 2:32 a.m. UTC
When an unhandled data or prefetch abort occurs, the die() string
is empty resulting in backtrace messages similar to the following:

	Internal error: : 1 [#1] PREEMPT SMP ARM

Replace the null string with the name of the abort handler in order
to provide more meaningful hints as to the cause of the fault.

Signed-off-by: George G. Davis <george_davis@mentor.com>
---
 arch/arm/mm/fault.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
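
For illustration only, here is a minimal userspace sketch of how the oops
header line changes with the string argument. The format string is an
assumption modelled on the example line in the commit message above, not a
copy of the kernel source, and format_oops_header() is a hypothetical helper.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Userspace model of the oops header line. The format string is an
 * assumption based on the example in the commit message; the kernel
 * builds the PREEMPT/SMP/ARM suffix from its configuration.
 */
static int format_oops_header(char *buf, size_t len, const char *str,
			      unsigned long err, int count)
{
	return snprintf(buf, len,
			"Internal error: %s: %lx [#%d] PREEMPT SMP ARM",
			str, err, count);
}
```

Passing "" as str with err = 1 reproduces the doubled colon shown above,
while passing a descriptive name (for example the placeholder
"alignment exception") fills the gap between the two colons.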

Comments

Russell King (Oracle) July 20, 2019, 12:30 p.m. UTC | #1
On Fri, Jul 19, 2019 at 10:32:55PM -0400, George G. Davis wrote:
> When an unhandled data or prefetch abort occurs, the die() string
> is empty resulting in backtrace messages similar to the following:
> 
> 	Internal error: : 1 [#1] PREEMPT SMP ARM
> 
> Replace the null string with the name of the abort handler in order
> to provide more meaningful hints as to the cause of the fault.

NAK.

We already print the cause of the abort earlier in the dump, and we've
also added a "cut here" marker to help people include all the necessary
information when reporting a problem.

It's unfortunate that we have the additional colon in the oops dump,
but repeating the information that we've printed on one of the previous
two lines is really not necessary.

> 
> Signed-off-by: George G. Davis <george_davis@mentor.com>
> ---
>  arch/arm/mm/fault.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
> index 0048eadd0681..dddea0a21220 100644
> --- a/arch/arm/mm/fault.c
> +++ b/arch/arm/mm/fault.c
> @@ -557,7 +557,7 @@ do_DataAbort(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
>  		inf->name, fsr, addr);
>  	show_pte(current->mm, addr);
>  
> -	arm_notify_die("", regs, inf->sig, inf->code, (void __user *)addr,
> +	arm_notify_die(inf->name, regs, inf->sig, inf->code, (void __user *)addr,
>  		       fsr, 0);
>  }
>  
> @@ -585,7 +585,7 @@ do_PrefetchAbort(unsigned long addr, unsigned int ifsr, struct pt_regs *regs)
>  	pr_alert("Unhandled prefetch abort: %s (0x%03x) at 0x%08lx\n",
>  		inf->name, ifsr, addr);
>  
> -	arm_notify_die("", regs, inf->sig, inf->code, (void __user *)addr,
> +	arm_notify_die(inf->name, regs, inf->sig, inf->code, (void __user *)addr,
>  		       ifsr, 0);
>  }
>  
> -- 
> 2.7.4
> 
>
George G. Davis July 25, 2019, 9:37 p.m. UTC | #2
Hello Russell,

Thanks for your prompt reply!

On Sat, Jul 20, 2019 at 01:30:23PM +0100, Russell King - ARM Linux admin wrote:
> On Fri, Jul 19, 2019 at 10:32:55PM -0400, George G. Davis wrote:
> > When an unhandled data or prefetch abort occurs, the die() string
> > is empty resulting in backtrace messages similar to the following:
> > 
> > 	Internal error: : 1 [#1] PREEMPT SMP ARM
> > 
> > Replace the null string with the name of the abort handler in order
> > to provide more meaningful hints as to the cause of the fault.
> 
> NAK.
> 
> We already print the cause of the abort earlier in the dump, and we've
> also added a "cut here" marker to help people include all the necessary
> information when reporting a problem.

For what it's worth, I often receive crash dumps which lack the pr_alert
messages and only include the pr_emerg messages which this change would at
least provide extra hints, since the "Internal error" is at EMERG level
whereas the initial messages are only at ALERT level. It's subtle but for
cases where the end user has set loglevel such that they only see EMERG
messages, the change is helpful, to me at least.
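
As a rough illustration of the filtering described here, the following
userspace sketch models the console rule that a message is shown only when
its level is numerically below the console loglevel. reaches_console() is a
hypothetical helper, not a kernel function; the two level values match the
kernel's EMERG (0) and ALERT (1) severities.

```c
#include <stdbool.h>

/* Kernel log severities: a lower value means a more severe message. */
enum { LOGLEVEL_EMERG = 0, LOGLEVEL_ALERT = 1 };

/*
 * Simplified model: a message reaches the console only if its level is
 * strictly below the configured console loglevel.
 */
static bool reaches_console(int msg_level, int console_loglevel)
{
	return msg_level < console_loglevel;
}
```

With a console loglevel of 1, reaches_console(LOGLEVEL_EMERG, 1) is true and
reaches_console(LOGLEVEL_ALERT, 1) is false, so the pr_alert lines naming the
abort are dropped while the "Internal error" line still appears.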

> It's unfortunate that we have the additional colon in the oops dump,

Agreed, it's rather unfortunate that the string is NULL in these cases.

> but repeating the information that we've printed on one of the previous
> two lines is really not necessary.

It depends on the loglevel the user has set. So perhaps it's not such a
bad thing to repeat the information?

Thanks!

> > 
> > Signed-off-by: George G. Davis <george_davis@mentor.com>
> > ---
> >  arch/arm/mm/fault.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
> > index 0048eadd0681..dddea0a21220 100644
> > --- a/arch/arm/mm/fault.c
> > +++ b/arch/arm/mm/fault.c
> > @@ -557,7 +557,7 @@ do_DataAbort(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
> >  		inf->name, fsr, addr);
> >  	show_pte(current->mm, addr);
> >  
> > -	arm_notify_die("", regs, inf->sig, inf->code, (void __user *)addr,
> > +	arm_notify_die(inf->name, regs, inf->sig, inf->code, (void __user *)addr,
> >  		       fsr, 0);
> >  }
> >  
> > @@ -585,7 +585,7 @@ do_PrefetchAbort(unsigned long addr, unsigned int ifsr, struct pt_regs *regs)
> >  	pr_alert("Unhandled prefetch abort: %s (0x%03x) at 0x%08lx\n",
> >  		inf->name, ifsr, addr);
> >  
> > -	arm_notify_die("", regs, inf->sig, inf->code, (void __user *)addr,
> > +	arm_notify_die(inf->name, regs, inf->sig, inf->code, (void __user *)addr,
> >  		       ifsr, 0);
> >  }
> >  
> > -- 
> > 2.7.4
> > 
> > 
> 
> -- 
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
> According to speedtest.net: 11.9Mbps down 500kbps up
Russell King (Oracle) July 25, 2019, 9:55 p.m. UTC | #3
On Thu, Jul 25, 2019 at 05:37:54PM -0400, George G. Davis wrote:
> Hello Russell,
> 
> Thanks for your prompt reply!
> 
> On Sat, Jul 20, 2019 at 01:30:23PM +0100, Russell King - ARM Linux admin wrote:
> > On Fri, Jul 19, 2019 at 10:32:55PM -0400, George G. Davis wrote:
> > > When an unhandled data or prefetch abort occurs, the die() string
> > > is empty resulting in backtrace messages similar to the following:
> > > 
> > > 	Internal error: : 1 [#1] PREEMPT SMP ARM
> > > 
> > > Replace the null string with the name of the abort handler in order
> > > to provide more meaningful hints as to the cause of the fault.
> > 
> > NAK.
> > 
> > We already print the cause of the abort earlier in the dump, and we've
> > also added a "cut here" marker to help people include all the necessary
> > information when reporting a problem.
> 
> For what it's worth, I often receive crash dumps which lack the pr_alert
> messages and only include the pr_emerg messages which this change would at
> least provide extra hints, since the "Internal error" is at EMERG level
> whereas the initial messages are only at ALERT level. It's subtle but for
> cases where the end user has set loglevel such that they only see EMERG
> messages, the change is helpful, to me at least.
> 
> > It's unfortunate that we have the additional colon in the oops dump,
> 
> Agreed, it's rather unfortunate that the string is NULL in these cases.
> 
> > but repeating the information that we've printed on one of the previous
> > two lines is really not necessary.
> 
> It depends on the loglevel the user has set. So perhaps it's not such a
> bad thing to repeat the information?

Or maybe we should arrange for consistent usage of the log levels?
George G. Davis July 25, 2019, 10:24 p.m. UTC | #4
Hello Russell,

On Thu, Jul 25, 2019 at 10:55:40PM +0100, Russell King - ARM Linux admin wrote:
> On Thu, Jul 25, 2019 at 05:37:54PM -0400, George G. Davis wrote:
> > Hello Russell,
> > 
> > Thanks for your prompt reply!
> > 
> > On Sat, Jul 20, 2019 at 01:30:23PM +0100, Russell King - ARM Linux admin wrote:
> > > On Fri, Jul 19, 2019 at 10:32:55PM -0400, George G. Davis wrote:
> > > > When an unhandled data or prefetch abort occurs, the die() string
> > > > is empty resulting in backtrace messages similar to the following:
> > > > 
> > > > 	Internal error: : 1 [#1] PREEMPT SMP ARM
> > > > 
> > > > Replace the null string with the name of the abort handler in order
> > > > to provide more meaningful hints as to the cause of the fault.
> > > 
> > > NAK.
> > > 
> > > We already print the cause of the abort earlier in the dump, and we've
> > > also added a "cut here" marker to help people include all the necessary
> > > information when reporting a problem.
> > 
> > For what it's worth, I often receive crash dumps which lack the pr_alert
> > messages and only include the pr_emerg messages which this change would at
> > least provide extra hints, since the "Internal error" is at EMERG level
> > whereas the initial messages are only at ALERT level. It's subtle but for
> > cases where the end user has set loglevel such that they only see EMERG
> > messages, the change is helpful, to me at least.
> > 
> > > It's unfortunate that we have the additional colon in the oops dump,
> > 
> > Agreed, it's rather unfortunate that the string is NULL in these cases.
> > 
> > > but repeating the information that we've printed on one of the previous
> > > two lines is really not necessary.
> > 
> > It depends on the loglevel the user has set. So perhaps it's not such a
> > bad thing to repeat the information?
> 
> Or maybe we should arrange for consistent usage of the log levels?

Unfortunately, some of the users that I work with have very specific limits
and requirements for kernel error message logging which are driven by
performance and/or storage limitations. So it's not always possible to "arrange
for consistent usage of the log levels" with some users. Meanwhile, these
messages do show up in logs without the pre-amble headers, lacking the string
which is already available. It's hardly a big deal to re-use the same string,
especially for the !user_mode(regs) case, where the kernel will oops at
EMERG loglevel, leaving the NULL string as the reason. I can assure you that
I've tried to convince these users to change the loglevel but they have their
reasons for keeping it as they do and I'm unable to convince them otherwise.

Thanks!
Russell King (Oracle) July 25, 2019, 10:32 p.m. UTC | #5
On Thu, Jul 25, 2019 at 06:24:01PM -0400, George G. Davis wrote:
> Hello Russell,
> 
> On Thu, Jul 25, 2019 at 10:55:40PM +0100, Russell King - ARM Linux admin wrote:
> > On Thu, Jul 25, 2019 at 05:37:54PM -0400, George G. Davis wrote:
> > > Hello Russell,
> > > 
> > > Thanks for your prompt reply!
> > > 
> > > On Sat, Jul 20, 2019 at 01:30:23PM +0100, Russell King - ARM Linux admin wrote:
> > > > On Fri, Jul 19, 2019 at 10:32:55PM -0400, George G. Davis wrote:
> > > > > When an unhandled data or prefetch abort occurs, the die() string
> > > > > is empty resulting in backtrace messages similar to the following:
> > > > > 
> > > > > 	Internal error: : 1 [#1] PREEMPT SMP ARM
> > > > > 
> > > > > Replace the null string with the name of the abort handler in order
> > > > > to provide more meaningful hints as to the cause of the fault.
> > > > 
> > > > NAK.
> > > > 
> > > > We already print the cause of the abort earlier in the dump, and we've
> > > > also added a "cut here" marker to help people include all the necessary
> > > > information when reporting a problem.
> > > 
> > > For what it's worth, I often receive crash dumps which lack the pr_alert
> > > messages and only include the pr_emerg messages which this change would at
> > > least provide extra hints, since the "Internal error" is at EMERG level
> > > whereas the initial messages are only at ALERT level. It's subtle but for
> > > cases where the end user has set loglevel such that they only see EMERG
> > > messages, the change is helpful, to me at least.
> > > 
> > > > It's unfortunate that we have the additional colon in the oops dump,
> > > 
> > > Agreed, it's rather unfortunate that the string is NULL in these cases.
> > > 
> > > > but repeating the information that we've printed on one of the previous
> > > > two lines is really not necessary.
> > > 
> > > It depends on the loglevel the user has set. So perhaps it's not such a
> > > bad thing to repeat the information?
> > 
> > Or maybe we should arrange for consistent usage of the log levels?
> 
> Unfortunately, some of the users that I work with have very specific limits
> and requirements for kernel error message logging which are driven by
> performance and/or storage limitations. So it's not always possible to "arrange
> for consistent usage of the log levels" with some users. Meanwhile, these
> messages do show up in logs without the pre-amble headers, lacking the string
> which is already available. It's hardly a big deal to re-use the same string,
> especially for the !user_mode(regs) case, where the kernel will oops at
> EMERG loglevel, leaving the NULL string as the reason. I can assure you that
> I've tried to convince these users to change the loglevel but they have their
> reasons for keeping it as they do and I'm unable to convince them otherwise.

Sorry, but I really don't buy this.

By your argument, we should get rid of the pre-amble headers because
they're "not useful" in your eyes...
George G. Davis July 25, 2019, 11:15 p.m. UTC | #6
Hello Russell,

On Thu, Jul 25, 2019 at 11:32:49PM +0100, Russell King - ARM Linux admin wrote:
> On Thu, Jul 25, 2019 at 06:24:01PM -0400, George G. Davis wrote:
> > Hello Russell,
> > 
> > On Thu, Jul 25, 2019 at 10:55:40PM +0100, Russell King - ARM Linux admin wrote:
> > > On Thu, Jul 25, 2019 at 05:37:54PM -0400, George G. Davis wrote:
> > > > Hello Russell,
> > > > 
> > > > Thanks for your prompt reply!
> > > > 
> > > > On Sat, Jul 20, 2019 at 01:30:23PM +0100, Russell King - ARM Linux admin wrote:
> > > > > On Fri, Jul 19, 2019 at 10:32:55PM -0400, George G. Davis wrote:
> > > > > > When an unhandled data or prefetch abort occurs, the die() string
> > > > > > is empty resulting in backtrace messages similar to the following:
> > > > > > 
> > > > > > 	Internal error: : 1 [#1] PREEMPT SMP ARM
> > > > > > 
> > > > > > Replace the null string with the name of the abort handler in order
> > > > > > to provide more meaningful hints as to the cause of the fault.
> > > > > 
> > > > > NAK.
> > > > > 
> > > > > We already print the cause of the abort earlier in the dump, and we've
> > > > > also added a "cut here" marker to help people include all the necessary
> > > > > information when reporting a problem.
> > > > 
> > > > For what it's worth, I often receive crash dumps which lack the pr_alert
> > > > messages and only include the pr_emerg messages which this change would at
> > > > least provide extra hints, since the "Internal error" is at EMERG level
> > > > whereas the initial messages are only at ALERT level. It's subtle but for
> > > > cases where the end user has set loglevel such that they only see EMERG
> > > > messages, the change is helpful, to me at least.
> > > > 
> > > > > It's unfortunate that we have the additional colon in the oops dump,
> > > > 
> > > > Agreed, it's rather unfortunate that the string is NULL in these cases.
> > > > 
> > > > > but repeating the information that we've printed on one of the previous
> > > > > two lines is really not necessary.
> > > > 
> > > > It depends on the loglevel the user has set. So perhaps it's not such a
> > > > bad thing to repeat the information?
> > > 
> > > Or maybe we should arrange for consistent usage of the log levels?
> > 
> > Unfortunately, some of the users that I work with have very specific limits
> > and requirements for kernel error message logging which are driven by
> > performance and/or storage limitations. So it's not always possible to "arrange
> > for consistent usage of the log levels" with some users. Meanwhile, these
> > messages do show up in logs without the pre-amble headers, lacking the string
> > which is already available. It's hardly a big deal to re-use the same string,
> > especially for the !user_mode(regs) case, where the kernel will oops at
> > EMERG loglevel, leaving the NULL string as the reason. I can assure you that
> > I've tried to convince these users to change the loglevel but they have their
> > reasons for keeping it as they do and I'm unable to convince them otherwise.
> 
> Sorry, but I really don't buy this.
> 
> By your argument, we should get rid of the pre-amble headers because
> they're "not useful" in your eyes...

For user_mode(regs), the system will remain running and logs may be
checked on the running system as usual in conjunction with signal handler
exception handling. So no, I don't agree that the pre-amble headers are
"not useful", in fact, they are quite useful for interactive and automated
debugging of user faults, and of course most normal deployment cases which
retain full message logs on disk. It's only for the !user_mode(regs) case,
in some embedded deployment cases, where the change is intended to provide more
insight which may be missing otherwise, in admittedly limited use cases.

My last argument in favor of applying the change is this: the string
pointer is already loaded in a register and so likely costs fewer instructions
and less time to simply pass it on to arm_notify_die() compared to the
cost of loading a pointer to an empty string into a register. For the user_mode(regs)
case, the string is not used and costs nothing to pass along. For the !user_mode(regs)
case, it provides information which may be missing otherwise depending on the
loglevel.
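
To make the two cases concrete, here is a simplified userspace model of the
dispatch described above. It is a sketch under stated assumptions, not the
kernel's arm_notify_die(): struct outcome and notify_die_model() are
hypothetical names, and the real function delivers a signal or oopses rather
than returning a record.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of what the model decided to do. */
struct outcome {
	bool signalled;        /* user-mode fault: the task gets a signal */
	const char *oops_str;  /* kernel-mode fault: reason shown in the oops */
};

static struct outcome notify_die_model(const char *str, bool user_mode)
{
	struct outcome o = { false, NULL };

	if (user_mode)
		o.signalled = true;  /* str is ignored on this path */
	else
		o.oops_str = str;    /* str becomes the "Internal error" reason */

	return o;
}
```

In this model the string only matters on the !user_mode path, which is the
case where an empty string leaves the EMERG-level line without a reason.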

Thanks again for your prompt replies and consideration!

Patch

diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index 0048eadd0681..dddea0a21220 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -557,7 +557,7 @@  do_DataAbort(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
 		inf->name, fsr, addr);
 	show_pte(current->mm, addr);
 
-	arm_notify_die("", regs, inf->sig, inf->code, (void __user *)addr,
+	arm_notify_die(inf->name, regs, inf->sig, inf->code, (void __user *)addr,
 		       fsr, 0);
 }
 
@@ -585,7 +585,7 @@  do_PrefetchAbort(unsigned long addr, unsigned int ifsr, struct pt_regs *regs)
 	pr_alert("Unhandled prefetch abort: %s (0x%03x) at 0x%08lx\n",
 		inf->name, ifsr, addr);
 
-	arm_notify_die("", regs, inf->sig, inf->code, (void __user *)addr,
+	arm_notify_die(inf->name, regs, inf->sig, inf->code, (void __user *)addr,
 		       ifsr, 0);
 }