Message ID | 20240924074341.37272-2-qiwu.chen@transsion.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v4,1/2] panic: add option to dump task maps info in panic_print | expand |
Well, can't comment, I leave this to the arch/arm64/ maintainers. but let me add a bit of spam just for the record, On 09/24, qiwu.chen wrote: > > --- a/arch/arm64/kernel/traps.c > +++ b/arch/arm64/kernel/traps.c > @@ -247,12 +247,20 @@ static void arm64_show_signal(int signo, const char *str) > unsigned long esr = tsk->thread.fault_code; > struct pt_regs *regs = task_pt_regs(tsk); > > + /* > + * The signal sent to the global init needs to be shown, > + * which is useful for debugging kill init issue. > + */ > + if (unlikely(is_global_init(tsk))) > + goto dump; > + > /* Leave if the signal won't be shown */ > if (!show_unhandled_signals || > !unhandled_signal(tsk, signo) || > !__ratelimit(&rs)) > return; > > +dump: So what does this patch try to do? Note that unhandled_signal(tsk) returns true if is_global_init(tsk). So it seems that this patch just tries to bypass the show_unhandled_signals and __ratelimit() checks? Or what? OTOH. The is_global_init() check in unhandled_signal() (which predates the git history) doesn't look right to me. If init has a handler for, say, SIGSEGV, why should the kernel complain? I need to recheck this logic... Oleg.
On Tue, Sep 24, 2024 at 08:36:35PM +0200, Oleg Nesterov wrote: > > So what does this patch try to do? Note that unhandled_signal(tsk) returns true > if is_global_init(tsk). > > So it seems that this patch just tries to bypass the show_unhandled_signals and > __ratelimit() checks? Or what? > Yes, this patch just try to bypass the show_unhandled_signals and __ratelimit() checks for the global init. > > OTOH. The is_global_init() check in unhandled_signal() (which predates the git > history) doesn't look right to me. If init has a handler for, say, SIGSEGV, why > should the kernel complain? I need to recheck this logic... > I think the orignal logic is the signal sent to the global init is regarded as unhandled becuase it has SIGNAL_UNKILLABLE feature. Thanks Qiwu
On 09/25, chenqiwu wrote: > > On Tue, Sep 24, 2024 at 08:36:35PM +0200, Oleg Nesterov wrote: > > > > So what does this patch try to do? Note that unhandled_signal(tsk) returns true > > if is_global_init(tsk). > > > > So it seems that this patch just tries to bypass the show_unhandled_signals and > > __ratelimit() checks? Or what? > > > Yes, this patch just try to bypass the show_unhandled_signals and > __ratelimit() checks for the global init. I think the changelog should explain this. But this doesn't look right to me, see below. > > OTOH. The is_global_init() check in unhandled_signal() (which predates the git > > history) doesn't look right to me. If init has a handler for, say, SIGSEGV, why > > should the kernel complain? I need to recheck this logic... > > > I think the orignal logic is the signal sent to the global init is > regarded as unhandled becuase it has SIGNAL_UNKILLABLE feature. And? How does this relate to SIGNAL_UNKILLABLE? Again, this check predates the git history and the SIGNAL_UNKILLABLE feature. And SIGNAL_UNKILLABLE has no effect in this case, see force_sig_info_to_task(). I am running this program void sigh(int sig) { printf("SIGSEGV %d\n", getpid()); execl("/usr/bin/sh", "sh", NULL); printf("exec failed\n"); } int main(void) { signal(SIGSEGV, sigh); *((char*)(1234)) = 0; return 1; } as a global init under KVM. It works but dmesg reports init[1]: segfault at 4d2 ip 00000000004006b5 sp 00007ffcf2975170 error 6 in init[6b5,400000+1000] likely on CPU 0 (core 0, socket 0) Code: e8 c0 fe ff ff bf 6b 07 40 00 e8 56 fe ff ff 90 c9 c3 55 48 89 e5 be 56 06 40 00 bf 0b 00 00 00 e8 80 fe ff ff b8 d2 04 00 00 <c6> 00 00 b8 01 00 00 00 5d c3 90 41 57 41 56 41 89 ff 41 55 41 54 I'll send the patch which removes the is_global_init check from unhandled_signal() which actually has more problems, afaics. But this all is offtopic. Oleg.
On Wed, Sep 25, 2024 at 02:05:47PM +0200, Oleg Nesterov wrote: > > > > > Yes, this patch just try to bypass the show_unhandled_signals and > > __ratelimit() checks for the global init. > > I think the changelog should explain this. > Sure, I will explain this in the changelog of patch v5 if someone else would ACK this patch. > But this doesn't look right to me, see below. > > > > OTOH. The is_global_init() check in unhandled_signal() (which predates the git > > > history) doesn't look right to me. If init has a handler for, say, SIGSEGV, why > > > should the kernel complain? I need to recheck this logic... > > > > > I think the orignal logic is the signal sent to the global init is > > regarded as unhandled becuase it has SIGNAL_UNKILLABLE feature. > > And? How does this relate to SIGNAL_UNKILLABLE? Again, this check predates > the git history and the SIGNAL_UNKILLABLE feature. And SIGNAL_UNKILLABLE > has no effect in this case, see force_sig_info_to_task(). > Thanks for your explaination! For the report on ARM64 KVM, the signal sent to the global init by arm4_force_sig() and the global init do_exit by get_signal(). However, it doesn't show signal info due to show_unhandled_signals check, because the initial value of show_unhandled_signals is 0 which can be adjusted by /proc/sys/debug/exception-trace in root mode. In ARM64 productive environment, the default value of show_unhandled_signals is set to 0 in case of show_unhandled_signals storms, and the debuggers cannot adjust it dynamically by shell in non-root mode. Considering the minor case of kill init, this change won't have any side effect. Let's to see the ARM64 maintainers if really like it :) Thanks Qiwu
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c index 563cbce11126..3150fb84195f 100644 --- a/arch/arm64/kernel/traps.c +++ b/arch/arm64/kernel/traps.c @@ -247,12 +247,20 @@ static void arm64_show_signal(int signo, const char *str) unsigned long esr = tsk->thread.fault_code; struct pt_regs *regs = task_pt_regs(tsk); + /* + * The signal sent to the global init needs to be shown, + * which is useful for debugging kill init issue. + */ + if (unlikely(is_global_init(tsk))) + goto dump; + /* Leave if the signal won't be shown */ if (!show_unhandled_signals || !unhandled_signal(tsk, signo) || !__ratelimit(&rs)) return; +dump: pr_info("%s[%d]: unhandled exception: ", tsk->comm, task_pid_nr(tsk)); if (esr) pr_cont("%s, ESR 0x%016lx, ", esr_get_class_string(esr), esr);
Currently, it's hard to debug panic issues caused by kill init on arm64, since there is no debug info from user mode in current panic msg such as the user_regs and maps info. This patch shows signal info sent to the global init, which will be helpful for debugging kill init issue caused by unhandled exception from user mode. - changes history: v3: https://lore.kernel.org/all/20240922095504.7182-1-qiwu.chen@transsion.com/ https://lore.kernel.org/all/20240922095504.7182-2-qiwu.chen@transsion.com/ v2: https://lore.kernel.org/all/20231110031553.33186-1-qiwu.chen@transsion.com/ v1: https://lore.kernel.org/all/20231110022720.GA3087@rlk/ Signed-off-by: qiwu.chen <qiwu.chen@transsion.com> --- arch/arm64/kernel/traps.c | 8 ++++++++ 1 file changed, 8 insertions(+)