[24/30] panic: Refactor the panic path

From: Guilherme G. Piccoli <gpiccoli@igalia.com> Sent: Wednesday, April 27, 2022 3:49 PM

The panic() function is somewhat convoluted - a lot of changes were
made over the years, adding comments that might be misleading/outdated
now, it has a code structure that is a bit complex to follow, with
lots of conditionals, for example. The panic notifier list is something
else - a single list, with multiple callbacks of different purposes,
that run in a non-deterministic order and may affect hardly kdump
reliability - see the "crash_kexec_post_notifiers" workaround-ish flag.

This patch proposes a major refactor on the panic path based on Petr's
idea [0] - basically we split the notifiers list in three, having a set
of different call points in the panic path. Below a list of changes
proposed in this patch, culminating in the panic notifiers level
concept:

(a) First of all, we improved comments all over the function
and removed useless variables / includes. Also, as part of this
clean-up we concentrate the console flushing functions in a helper.

(b) As mentioned before, there is a split of the panic notifier list
in three, based on the purpose of the callback. The code contains
good documentation in form of comments, but a summary of the three
lists follows:

- the hypervisor list aims low-risk procedures to inform hypervisors
or firmware about the panic event, also includes LED-related functions;

- the informational list contains callbacks that provide more details,
like kernel offset or trace dump (if enabled) and also includes the
callbacks aimed at reducing log pollution or warns, like the RCU and
hung task disable callbacks;

- finally, the pre_reboot list is the old notifier list renamed,
containing the more risky callbacks that didn't fit the previous
lists. There is also a 4th list (the post_reboot one), but it's not
related with the original list - it contains late time architecture
callbacks aimed at stopping the machine, for example.

The 3 notifiers lists execute in different moments, hypervisor being
the first, followed by informational and finally the pre_reboot list.

(c) But then, there is the ordering problem of the notifiers against
the crash_kernel() call - kdump must be as reliable as possible.
For that, a simple binary "switch" as "crash_kexec_post_notifiers"
is not enough, hence we introduce here concept of panic notifier
levels: there are 5 levels, from 0 (no notifier executes before
kdump) until 4 (all notifiers run before kdump); the default level
is 2, in which the hypervisor and (iff we have any kmsg dumper)
the informational notifiers execute before kdump.

The detailed documentation of the levels is present in code comments
and in the kernel-parameters.txt file; as an analogy with the previous
panic() implementation, the level 0 is exactly the same as the old
behavior of notifiers, running all after kdump, and the level 4 is
the same as "crash_kexec_post_notifiers=Y" (we kept this parameter as
a deprecated one).

(d) Finally, an important change made here: we now use only the
function "crash_smp_send_stop()" to shut all the secondary CPUs
in the panic path. Before, there was a case of using the regular
"smp_send_stop()", but the better approach is to simplify the
code and try to use the function which was created exclusively
for the panic path. Experiments showed that it works fine, and
code was very simplified with that.

Functional change is expected from this refactor, since now we
call some notifiers by default before kdump, but the goal here
besides code clean-up is to have a better panic path, more
reliable and deterministic, but also very customizable.

[0] https://lore.kernel.org/lkml/YfPxvzSzDLjO5ldp@alley/

Suggested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
---

Special thanks to Petr and Baoquan for the suggestion and feedback in a previous
email thread. There's some important design decisions that worth mentioning and
discussing:

* The default panic notifiers level is 2, based on Petr Mladek's suggestion,
which makes a lot of sense. Of course, this is customizable through the
parameter, but would be something worthwhile to have a KConfig option to set
the default level? It would help distros that want the old behavior
(no notifiers before kdump) as default.

* The implementation choice was to _avoid_ intricate if conditionals in the
panic path, which would _definitely_ be present with the panic notifiers levels
idea; so, instead of lots of if conditionals, the set/clear bits approach with
functions called in 2 points (but executing only in one of them) is much easier
to follow an was used here; the ordering helper function and the comments also
help a lot to avoid confusion (hopefully).

* Choice was to *always* use crash_smp_send_stop() instead of sometimes making
use of the regular smp_send_stop(); for most architectures they are the same,
including Xen (on x86). For the ones that override it, all should work fine,
in the powerpc case it's even more correct (see the subsequent patch
"powerpc: Do not force all panic notifiers to execute before kdump")

There seems to be 2 cases that requires some plumbing to work 100% right:
- ARM doesn't disable local interrupts / FIQs in the crash version of
send_stop(); we patched that early in this series;
- x86 could face an issue if we have VMX and do use crash_smp_send_stop()
_without_ kdump, but this is fixed in the first patch of the series (and
it's a bug present even before this refactor).

* Notice we didn't add a sysrq for panic notifiers level - should have it?
Alejandro proposed recently to add a sysrq for "crash_kexec_post_notifiers",
let me know if you feel the need here Alejandro, since the core parameters are
present in /sys, I didn't consider much gain in having a sysrq, but of course
I'm open to suggestions!

Thanks advance for the review!

 .../admin-guide/kernel-parameters.txt         |  42 ++-
 include/linux/panic_notifier.h                |   1 +
 kernel/kexec_core.c                           |   8 +-
 kernel/panic.c                                | 292 +++++++++++++-----
 .../selftests/pstore/pstore_crash_test        |   5 +-
 5 files changed, 252 insertions(+), 96 deletions(-)

Message ID	20220427224924.592546-25-gpiccoli@igalia.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <xen-devel-bounces@lists.xenproject.org> X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id C2CA6C433EF for <xen-devel@archiver.kernel.org>; Wed, 27 Apr 2022 23:03:31 +0000 (UTC) Received: from list by lists.xenproject.org with outflank-mailman.315606.534420 (Exim 4.92) (envelope-from <xen-devel-bounces@lists.xenproject.org>) id 1njqgo-00067E-EJ; Wed, 27 Apr 2022 23:03:18 +0000 X-Outflank-Mailman: Message body and most headers restored to incoming version Received: by outflank-mailman (output) from mailman id 315606.534420; Wed, 27 Apr 2022 23:03:17 +0000 Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from <xen-devel-bounces@lists.xenproject.org>) id 1njqgm-0005xz-FI; Wed, 27 Apr 2022 23:03:16 +0000 Received: by outflank-mailman (input) for mailman id 315606; Wed, 27 Apr 2022 22:56:36 +0000 Received: from se1-gles-sth1-in.inumbo.com ([159.253.27.254] helo=se1-gles-sth1.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from <SRS0=mIOA=VF=igalia.com=gpiccoli@srs-se1.protection.inumbo.net>) id 1njqaJ-00040Q-Q4 for xen-devel@lists.xenproject.org; Wed, 27 Apr 2022 22:56:36 +0000 Received: from fanzine2.igalia.com (fanzine.igalia.com [178.60.130.6]) by se1-gles-sth1.inumbo.com (Halon) with ESMTPS id 487ea9ed-c67d-11ec-a405-831a346695d4; Thu, 28 Apr 2022 00:56:33 +0200 (CEST) Received: from [179.113.53.197] (helo=localhost) by fanzine2.igalia.com with esmtpsa (Cipher TLS1.3:ECDHE_SECP256R1__RSA_PSS_RSAE_SHA256__AES_256_GCM:256) (Exim) id 1njqa3-0002Pf-2B; Thu, 28 Apr 2022 00:56:20 +0200 X-BeenThere: xen-devel@lists.xenproject.org List-Id: Xen developer discussion <xen-devel.lists.xenproject.org> List-Unsubscribe: <https://lists.xenproject.org/mailman/options/xen-devel>, <mailto:xen-devel-request@lists.xenproject.org?subject=unsubscribe> List-Post: <mailto:xen-devel@lists.xenproject.org> List-Help: <mailto:xen-devel-request@lists.xenproject.org?subject=help> List-Subscribe: <https://lists.xenproject.org/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xenproject.org?subject=subscribe> Errors-To: xen-devel-bounces@lists.xenproject.org Precedence: list Sender: "Xen-devel" <xen-devel-bounces@lists.xenproject.org> X-Inumbo-ID: 487ea9ed-c67d-11ec-a405-831a346695d4 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=igalia.com; s=20170329; h=Content-Transfer-Encoding:MIME-Version:References:In-Reply-To: Message-Id:Date:Subject:Cc:To:From:Sender:Reply-To:Content-Type:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:List-Id:List-Help:List-Unsubscribe:List-Subscribe: List-Post:List-Owner:List-Archive; bh=HBUuAr51OMuxJxYe3RD+OwNeXP+xs2hLWBMDzaZAVcU=; b=TdeuJAIDaqG7cwkqI4nHtdLny3 iyCGAksOLbOqYUJhSHm7/jnC9FYcG7MIAOI+24axWzVXXtY5atDwJhF+jKUhXmqfua6SRtlM+APMM /pYrjZPc6jxGuTXcpWZNvCBeGuUVSbIdA+s5Wy3xFKjzxGB2gHwr9ssNf9AZuLz8yIIYL86UZuejU XVMh5xio++Mw71vSoYtvIu7OGOfqGzJUfFYq3P814XI3qfdU5eIZpgCj0gTmmOGhFHuiK0SwPAxTu lSlbzkU4S5i7NKFBbNSmypNtm9Lj4sjOaRZ0Un4K1h/t8X5QpXDk7e0wi3PVQ1JnV2KXISI4VvN0g bri/UK8A==; From: "Guilherme G. Piccoli" <gpiccoli@igalia.com> To: akpm@linux-foundation.org, bhe@redhat.com, pmladek@suse.com, kexec@lists.infradead.org Cc: linux-kernel@vger.kernel.org, bcm-kernel-feedback-list@broadcom.com, coresight@lists.linaro.org, linuxppc-dev@lists.ozlabs.org, linux-alpha@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-edac@vger.kernel.org, linux-hyperv@vger.kernel.org, linux-leds@vger.kernel.org, linux-mips@vger.kernel.org, linux-parisc@vger.kernel.org, linux-pm@vger.kernel.org, linux-remoteproc@vger.kernel.org, linux-s390@vger.kernel.org, linux-tegra@vger.kernel.org, linux-um@lists.infradead.org, linux-xtensa@linux-xtensa.org, netdev@vger.kernel.org, openipmi-developer@lists.sourceforge.net, rcu@vger.kernel.org, sparclinux@vger.kernel.org, xen-devel@lists.xenproject.org, x86@kernel.org, kernel-dev@igalia.com, gpiccoli@igalia.com, kernel@gpiccoli.net, halves@canonical.com, fabiomirmar@gmail.com, alejandro.j.jimenez@oracle.com, andriy.shevchenko@linux.intel.com, arnd@arndb.de, bp@alien8.de, corbet@lwn.net, d.hatayama@jp.fujitsu.com, dave.hansen@linux.intel.com, dyoung@redhat.com, feng.tang@intel.com, gregkh@linuxfoundation.org, mikelley@microsoft.com, hidehiro.kawai.ez@hitachi.com, jgross@suse.com, john.ogness@linutronix.de, keescook@chromium.org, luto@kernel.org, mhiramat@kernel.org, mingo@redhat.com, paulmck@kernel.org, peterz@infradead.org, rostedt@goodmis.org, senozhatsky@chromium.org, stern@rowland.harvard.edu, tglx@linutronix.de, vgoyal@redhat.com, vkuznets@redhat.com, will@kernel.org Subject: [PATCH 24/30] panic: Refactor the panic path Date: Wed, 27 Apr 2022 19:49:18 -0300 Message-Id: <20220427224924.592546-25-gpiccoli@igalia.com> X-Mailer: git-send-email 2.36.0 In-Reply-To: <20220427224924.592546-1-gpiccoli@igalia.com> References: <20220427224924.592546-1-gpiccoli@igalia.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	The panic notifiers refactor \| expand [00/30] The panic notifiers refactor [01/30] x86/crash,reboot: Avoid re-disabling VMX in all CPUs on crash/restart [02/30] ARM: kexec: Disable IRQs/FIQs also on crash CPUs shutdown path [03/30] notifier: Add panic notifiers info and purge trailing whitespaces [04/30] firmware: google: Convert regular spinlock into trylock on panic path [05/30] misc/pvpanic: Convert regular spinlock into trylock on panic path [06/30] soc: bcm: brcmstb: Document panic notifier action and remove useless header [07/30] mips: ip22: Reword PANICED to PANICKED and remove useless header [08/30] powerpc/setup: Refactor/untangle panic notifiers [09/30] coresight: cpu-debug: Replace mutex with mutex_trylock on panic notifier [10/30] alpha: Clean-up the panic notifier code [11/30] um: Improve panic notifiers consistency and ordering [12/30] parisc: Replace regular spinlock with spin_trylock on panic path [13/30] s390/consoles: Improve panic notifiers reliability [14/30] panic: Properly identify the panic event to the notifiers' callbacks [15/30] bus: brcmstb_gisb: Clean-up panic/die notifiers [16/30] drivers/hv/vmbus, video/hyperv_fb: Untangle and refactor Hyper-V panic notifiers [17/30] tracing: Improve panic/die notifiers [18/30] notifier: Show function names on notifier routines if DEBUG_NOTIFIERS is set [19/30] panic: Add the panic hypervisor notifier list [20/30] panic: Add the panic informational notifier list [21/30] panic: Introduce the panic pre-reboot notifier list [22/30] panic: Introduce the panic post-reboot notifier list [23/30] printk: kmsg_dump: Introduce helper to inform number of dumpers [24/30] panic: Refactor the panic path [25/30] panic, printk: Add console flush parameter and convert panic_print to a notifier [26/30] Drivers: hv: Do not force all panic notifiers to execute before kdump [27/30] powerpc: Do not force all panic notifiers to execute before kdump [28/30] panic: Unexport crash_kexec_post_notifiers [29/30] powerpc: ps3, pseries: Avoid duplicate call to kmsg_dump() on panic [30/30] um: Avoid duplicate call to kmsg_dump()

[24/30] panic: Refactor the panic path

Commit Message

Comments

Patch