diff mbox series

[v4,2/2] arm64: show signal info for global init

Message ID 20240924074341.37272-2-qiwu.chen@transsion.com (mailing list archive)
State New
Headers show
Series [v4,1/2] panic: add option to dump task maps info in panic_print | expand

Commit Message

chenqiwu Sept. 24, 2024, 7:43 a.m. UTC
Currently, it's hard to debug panic issues caused by kill init
on arm64, since there is no debug info from user mode in current
panic msg such as the user_regs and maps info.

This patch shows signal info sent to the global init, which will
be helpful for debugging kill init issue caused by unhandled exception
from user mode.

- changes history:
v3:
https://lore.kernel.org/all/20240922095504.7182-1-qiwu.chen@transsion.com/
https://lore.kernel.org/all/20240922095504.7182-2-qiwu.chen@transsion.com/
v2: https://lore.kernel.org/all/20231110031553.33186-1-qiwu.chen@transsion.com/
v1: https://lore.kernel.org/all/20231110022720.GA3087@rlk/

Signed-off-by: qiwu.chen <qiwu.chen@transsion.com>
---
 arch/arm64/kernel/traps.c | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

Oleg Nesterov Sept. 24, 2024, 6:36 p.m. UTC | #1
Well, can't comment, I leave this to the arch/arm64/ maintainers.

but let me add a bit of spam just for the record,

On 09/24, qiwu.chen wrote:
>
> --- a/arch/arm64/kernel/traps.c
> +++ b/arch/arm64/kernel/traps.c
> @@ -247,12 +247,20 @@ static void arm64_show_signal(int signo, const char *str)
>  	unsigned long esr = tsk->thread.fault_code;
>  	struct pt_regs *regs = task_pt_regs(tsk);
>
> +	/*
> +	 * The signal sent to the global init needs to be shown,
> +	 * which is useful for debugging kill init issue.
> +	 */
> +	if (unlikely(is_global_init(tsk)))
> +		goto dump;
> +
>  	/* Leave if the signal won't be shown */
>  	if (!show_unhandled_signals ||
>  	    !unhandled_signal(tsk, signo) ||
>  	    !__ratelimit(&rs))
>  		return;
>
> +dump:

So what does this patch try to do? Note that unhandled_signal(tsk) returns true
if is_global_init(tsk).

So it seems that this patch just tries to bypass the show_unhandled_signals and
__ratelimit() checks? Or what?



OTOH. The is_global_init() check in unhandled_signal() (which predates the git
history) doesn't look right to me. If init has a handler for, say, SIGSEGV, why
should the kernel complain? I need to recheck this logic...

Oleg.
chenqiwu Sept. 25, 2024, 3:54 a.m. UTC | #2
On Tue, Sep 24, 2024 at 08:36:35PM +0200, Oleg Nesterov wrote:
> 
> So what does this patch try to do? Note that unhandled_signal(tsk) returns true
> if is_global_init(tsk).
> 
> So it seems that this patch just tries to bypass the show_unhandled_signals and
> __ratelimit() checks? Or what?
> 
Yes, this patch just try to bypass the show_unhandled_signals and
__ratelimit() checks for the global init.

> 
> OTOH. The is_global_init() check in unhandled_signal() (which predates the git
> history) doesn't look right to me. If init has a handler for, say, SIGSEGV, why
> should the kernel complain? I need to recheck this logic...
> 
I think the orignal logic is the signal sent to the global init is
regarded as unhandled becuase it has SIGNAL_UNKILLABLE feature.

Thanks
Qiwu
Oleg Nesterov Sept. 25, 2024, 12:05 p.m. UTC | #3
On 09/25, chenqiwu wrote:
>
> On Tue, Sep 24, 2024 at 08:36:35PM +0200, Oleg Nesterov wrote:
> >
> > So what does this patch try to do? Note that unhandled_signal(tsk) returns true
> > if is_global_init(tsk).
> >
> > So it seems that this patch just tries to bypass the show_unhandled_signals and
> > __ratelimit() checks? Or what?
> >
> Yes, this patch just try to bypass the show_unhandled_signals and
> __ratelimit() checks for the global init.

I think the changelog should explain this.

But this doesn't look right to me, see below.

> > OTOH. The is_global_init() check in unhandled_signal() (which predates the git
> > history) doesn't look right to me. If init has a handler for, say, SIGSEGV, why
> > should the kernel complain? I need to recheck this logic...
> >
> I think the orignal logic is the signal sent to the global init is
> regarded as unhandled becuase it has SIGNAL_UNKILLABLE feature.

And? How does this relate to SIGNAL_UNKILLABLE? Again, this check predates
the git history and the SIGNAL_UNKILLABLE feature. And SIGNAL_UNKILLABLE
has no effect in this case, see force_sig_info_to_task().

I am running this program

	void sigh(int sig)
	{
		printf("SIGSEGV %d\n", getpid());
		execl("/usr/bin/sh", "sh", NULL);
		printf("exec failed\n");
	}

	int main(void)
	{
		signal(SIGSEGV, sigh);
		*((char*)(1234)) = 0;
		return 1;
	}

as a global init under KVM. It works but dmesg reports

	init[1]: segfault at 4d2 ip 00000000004006b5 sp 00007ffcf2975170 error 6 in init[6b5,400000+1000] likely on CPU 0 (core 0, socket 0)
	Code: e8 c0 fe ff ff bf 6b 07 40 00 e8 56 fe ff ff 90 c9 c3 55 48 89 e5 be 56 06 40 00 bf 0b 00 00 00 e8 80 fe ff ff b8 d2 04 00 00 <c6> 00 00 b8 01 00 00 00 5d c3 90 41 57 41 56 41 89 ff 41 55 41 54

I'll send the patch which removes the is_global_init check from unhandled_signal()
which actually has more problems, afaics. But this all is offtopic.

Oleg.
chenqiwu Sept. 26, 2024, 10:12 a.m. UTC | #4
On Wed, Sep 25, 2024 at 02:05:47PM +0200, Oleg Nesterov wrote:
> > >
> > Yes, this patch just try to bypass the show_unhandled_signals and
> > __ratelimit() checks for the global init.
> 
> I think the changelog should explain this.
>
Sure, I will explain this in the changelog of patch v5 if someone else
would ACK this patch.

> But this doesn't look right to me, see below.
> 
> > > OTOH. The is_global_init() check in unhandled_signal() (which predates the git
> > > history) doesn't look right to me. If init has a handler for, say, SIGSEGV, why
> > > should the kernel complain? I need to recheck this logic...
> > >
> > I think the orignal logic is the signal sent to the global init is
> > regarded as unhandled becuase it has SIGNAL_UNKILLABLE feature.
> 
> And? How does this relate to SIGNAL_UNKILLABLE? Again, this check predates
> the git history and the SIGNAL_UNKILLABLE feature. And SIGNAL_UNKILLABLE
> has no effect in this case, see force_sig_info_to_task().
>
Thanks for your explaination!

For the report on ARM64 KVM, the signal sent to the global init by
arm4_force_sig() and the global init do_exit by get_signal().
However, it doesn't show signal info due to show_unhandled_signals
check, because the initial value of show_unhandled_signals is 0
which can be adjusted by /proc/sys/debug/exception-trace in root mode.
In ARM64 productive environment, the default value of show_unhandled_signals
is set to 0 in case of show_unhandled_signals storms, and the debuggers
cannot adjust it dynamically by shell in non-root mode.

Considering the minor case of kill init, this change won't have any
side effect. Let's to see the ARM64 maintainers if really like it :)

Thanks
Qiwu
diff mbox series

Patch

diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index 563cbce11126..3150fb84195f 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -247,12 +247,20 @@  static void arm64_show_signal(int signo, const char *str)
 	unsigned long esr = tsk->thread.fault_code;
 	struct pt_regs *regs = task_pt_regs(tsk);
 
+	/*
+	 * The signal sent to the global init needs to be shown,
+	 * which is useful for debugging kill init issue.
+	 */
+	if (unlikely(is_global_init(tsk)))
+		goto dump;
+
 	/* Leave if the signal won't be shown */
 	if (!show_unhandled_signals ||
 	    !unhandled_signal(tsk, signo) ||
 	    !__ratelimit(&rs))
 		return;
 
+dump:
 	pr_info("%s[%d]: unhandled exception: ", tsk->comm, task_pid_nr(tsk));
 	if (esr)
 		pr_cont("%s, ESR 0x%016lx, ", esr_get_class_string(esr), esr);