[v2,0/2] Introduce the pkill_on_warn parameter

Message ID	20211027233215.306111-1-alex.popov@linux.com (mailing list archive)
State	Changes Requested
Delegated to:	Kees Cook
Headers	show Return-Path: <linux-hardening-owner@kernel.org> X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C8042C433F5 for <linux-hardening@archiver.kernel.org>; Wed, 27 Oct 2021 23:32:34 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A90A361073 for <linux-hardening@archiver.kernel.org>; Wed, 27 Oct 2021 23:32:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229480AbhJ0Xe7 (ORCPT <rfc822;linux-hardening@archiver.kernel.org>); Wed, 27 Oct 2021 19:34:59 -0400 Received: from mail-wm1-f47.google.com ([209.85.128.47]:34743 "EHLO mail-wm1-f47.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229437AbhJ0Xe7 (ORCPT <rfc822;linux-hardening@vger.kernel.org>); Wed, 27 Oct 2021 19:34:59 -0400 Received: by mail-wm1-f47.google.com with SMTP id o4-20020a1c7504000000b0032cab7473caso3679769wmc.1; Wed, 27 Oct 2021 16:32:32 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:mime-version :content-transfer-encoding; bh=gGvRlwy1hKPSRe6UF/gEOA6Zs7pqAF2R7y94ByLf0fc=; b=V70icanAhmAgsDLoIqa/NExiNCZlvioQ1R74St1gs3N+iUPb/BaB3k2mOFqbUGvZGF s0+gFRqjkKidAKEwg8kOvawMCJV7w9GVdDQ8uINFBbq1UnJCgOIKRHRgZv4SSBBUe/19 O3on3D/vBK1A5tOlNeIfWYlCe6CxVz0ScNfnKSGrASjGUEiOdG9wi1vVrPpY/6qUHvWD iXuGifYePMUmVBJs98ZrNcJm0g6fTED6QUPHmS8RX4exYahI6pyXp6X9htzbY3nNM6XA QdZbDVdlRtoLTeL2BjLT0q8vEzLCL1QOPLa3U3IU8QjN20HiUBY35aLj3poYJfsnz1AV +Amg== X-Gm-Message-State: AOAM532C22K7rUEszf39deOYPym6Qci5b//cWkn8b/m2RMZCiZXMv7Pn q5SnM90zBHtTCfZkfGCaqOw= X-Google-Smtp-Source: ABdhPJyj9WWwmhPoreFXZQ7aB4YrttTeEGCd1+PRHxnVW+etewoW2O0IcbL3FlSqY563+nnOXCWjlw== X-Received: by 2002:a1c:9d82:: with SMTP id g124mr775917wme.160.1635377551711; Wed, 27 Oct 2021 16:32:31 -0700 (PDT) Received: from hackbase.. ([46.166.133.199]) by smtp.gmail.com with ESMTPSA id m3sm1199032wrx.52.2021.10.27.16.32.27 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 27 Oct 2021 16:32:31 -0700 (PDT) From: Alexander Popov <alex.popov@linux.com> To: Jonathan Corbet <corbet@lwn.net>, Linus Torvalds <torvalds@linux-foundation.org>, Paul McKenney <paulmck@kernel.org>, Andrew Morton <akpm@linux-foundation.org>, Thomas Gleixner <tglx@linutronix.de>, Peter Zijlstra <peterz@infradead.org>, Joerg Roedel <jroedel@suse.de>, Maciej Rozycki <macro@orcam.me.uk>, Muchun Song <songmuchun@bytedance.com>, Viresh Kumar <viresh.kumar@linaro.org>, Robin Murphy <robin.murphy@arm.com>, Randy Dunlap <rdunlap@infradead.org>, Lu Baolu <baolu.lu@linux.intel.com>, Petr Mladek <pmladek@suse.com>, Kees Cook <keescook@chromium.org>, Luis Chamberlain <mcgrof@kernel.org>, Wei Liu <wl@xen.org>, John Ogness <john.ogness@linutronix.de>, Andy Shevchenko <andriy.shevchenko@linux.intel.com>, Alexey Kardashevskiy <aik@ozlabs.ru>, Christophe Leroy <christophe.leroy@csgroup.eu>, Jann Horn <jannh@google.com>, Greg Kroah-Hartman <gregkh@linuxfoundation.org>, Mark Rutland <mark.rutland@arm.com>, Andy Lutomirski <luto@kernel.org>, Dave Hansen <dave.hansen@linux.intel.com>, Steven Rostedt <rostedt@goodmis.org>, Will Deacon <will@kernel.org>, Ard Biesheuvel <ardb@kernel.org>, Laura Abbott <labbott@kernel.org>, David S Miller <davem@davemloft.net>, Borislav Petkov <bp@alien8.de>, Arnd Bergmann <arnd@arndb.de>, Andrew Scull <ascull@google.com>, Marc Zyngier <maz@kernel.org>, Jessica Yu <jeyu@kernel.org>, Iurii Zaikin <yzaikin@google.com>, Rasmus Villemoes <linux@rasmusvillemoes.dk>, Wang Qing <wangqing@vivo.com>, Mel Gorman <mgorman@suse.de>, Mauro Carvalho Chehab <mchehab+huawei@kernel.org>, Andrew Klychkov <andrew.a.klychkov@gmail.com>, Mathieu Chouquet-Stringer <me@mathieu.digital>, Daniel Borkmann <daniel@iogearbox.net>, Stephen Kitt <steve@sk2.org>, Stephen Boyd <sboyd@kernel.org>, Thomas Bogendoerfer <tsbogend@alpha.franken.de>, Mike Rapoport <rppt@kernel.org>, Bjorn Andersson <bjorn.andersson@linaro.org>, Alexander Popov <alex.popov@linux.com>, kernel-hardening@lists.openwall.com, linux-hardening@vger.kernel.org, linux-doc@vger.kernel.org, linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org Cc: notify@kernel.org Subject: [PATCH v2 0/2] Introduce the pkill_on_warn parameter Date: Thu, 28 Oct 2021 02:32:13 +0300 Message-Id: <20211027233215.306111-1-alex.popov@linux.com> X-Mailer: git-send-email 2.31.1 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: <linux-hardening.vger.kernel.org> X-Mailing-List: linux-hardening@vger.kernel.org

Alexander Popov Oct. 27, 2021, 11:32 p.m. UTC

Hello! This is the v2 of pkill_on_warn.
Changes from v1 and tricks for testing are described below.

Rationale
=========

Currently, the Linux kernel provides two types of reaction to kernel
warnings:
 1. Do nothing (by default),
 2. Call panic() if panic_on_warn is set. That's a very strong reaction,
    so panic_on_warn is usually disabled on production systems.

From a safety point of view, the Linux kernel misses a middle way of
handling kernel warnings:
 - The kernel should stop the activity that provokes a warning,
 - But the kernel should avoid complete denial of service.

From a security point of view, kernel warning messages provide a lot of
useful information for attackers. Many GNU/Linux distributions allow
unprivileged users to read the kernel log, so attackers use kernel
warning infoleak in vulnerability exploits. See the examples:
https://a13xp0p0v.github.io/2021/02/09/CVE-2021-26708.html
https://a13xp0p0v.github.io/2020/02/15/CVE-2019-18683.html
https://googleprojectzero.blogspot.com/2018/09/a-cache-invalidation-bug-in-linux.html

Let's introduce the pkill_on_warn sysctl.
If this parameter is set, the kernel kills all threads in a process that
provoked a kernel warning. This behavior is reasonable from a safety point of
view described above. It is also useful for kernel security hardening because
the system kills an exploit process that hits a kernel warning.

Moreover, bugs usually don't come alone, and a kernel warning may be
followed by memory corruption or other bad effects. So pkill_on_warn allows
the kernel to stop the process when the first signs of wrong behavior
are detected.


Changes from v1
===============

1) Introduce do_pkill_on_warn() and call it in all warning handling paths.

2) Do refactoring without functional changes in a separate patch.

3) Avoid killing init and kthreads.

4) Use do_send_sig_info() instead of do_group_exit().

5) Introduce sysctl instead of using core_param().


Tricks for testing
==================

1) This patch series was tested on x86_64 using CONFIG_LKDTM.
The kernel kills a process that performs this:
  echo WARNING > /sys/kernel/debug/provoke-crash/DIRECT

2) The warn_slowpath_fmt() path was tested using this trick:
4) Changing drivers/misc/lkdtm/bugs.c:lkdtm_WARNING() allowed me
to test all warning flavours:
 - WARN_ON()
 - WARN()
 - WARN_TAINT()
 - WARN_ON_ONCE()
 - WARN_ONCE()
 - WARN_TAINT_ONCE()

Thanks!

Alexander Popov (2):
  bug: do refactoring allowing to add a warning handling action
  sysctl: introduce kernel.pkill_on_warn

 Documentation/admin-guide/sysctl/kernel.rst | 14 ++++++++
 include/asm-generic/bug.h                   | 37 +++++++++++++++------
 include/linux/panic.h                       |  3 ++
 kernel/panic.c                              | 22 +++++++++++-
 kernel/sysctl.c                             |  9 +++++
 lib/bug.c                                   | 22 ++++++++----
 6 files changed, 90 insertions(+), 17 deletions(-)

Alexander Popov Nov. 12, 2021, 6:52 p.m. UTC | #1

On 28.10.2021 02:32, Alexander Popov wrote:
> Hello! This is the v2 of pkill_on_warn.
> Changes from v1 and tricks for testing are described below.

Hello everyone!
Friendly ping for your feedback.

Thanks.
Alexander

> Rationale
> =========
> 
> Currently, the Linux kernel provides two types of reaction to kernel
> warnings:
>   1. Do nothing (by default),
>   2. Call panic() if panic_on_warn is set. That's a very strong reaction,
>      so panic_on_warn is usually disabled on production systems.
> 
>  From a safety point of view, the Linux kernel misses a middle way of
> handling kernel warnings:
>   - The kernel should stop the activity that provokes a warning,
>   - But the kernel should avoid complete denial of service.
> 
>  From a security point of view, kernel warning messages provide a lot of
> useful information for attackers. Many GNU/Linux distributions allow
> unprivileged users to read the kernel log, so attackers use kernel
> warning infoleak in vulnerability exploits. See the examples:
> https://a13xp0p0v.github.io/2021/02/09/CVE-2021-26708.html
> https://a13xp0p0v.github.io/2020/02/15/CVE-2019-18683.html
> https://googleprojectzero.blogspot.com/2018/09/a-cache-invalidation-bug-in-linux.html
> 
> Let's introduce the pkill_on_warn sysctl.
> If this parameter is set, the kernel kills all threads in a process that
> provoked a kernel warning. This behavior is reasonable from a safety point of
> view described above. It is also useful for kernel security hardening because
> the system kills an exploit process that hits a kernel warning.
> 
> Moreover, bugs usually don't come alone, and a kernel warning may be
> followed by memory corruption or other bad effects. So pkill_on_warn allows
> the kernel to stop the process when the first signs of wrong behavior
> are detected.
> 
> 
> Changes from v1
> ===============
> 
> 1) Introduce do_pkill_on_warn() and call it in all warning handling paths.
> 
> 2) Do refactoring without functional changes in a separate patch.
> 
> 3) Avoid killing init and kthreads.
> 
> 4) Use do_send_sig_info() instead of do_group_exit().
> 
> 5) Introduce sysctl instead of using core_param().
> 
> 
> Tricks for testing
> ==================
> 
> 1) This patch series was tested on x86_64 using CONFIG_LKDTM.
> The kernel kills a process that performs this:
>    echo WARNING > /sys/kernel/debug/provoke-crash/DIRECT
> 
> 2) The warn_slowpath_fmt() path was tested using this trick:
> diff --git a/arch/x86/include/asm/bug.h b/arch/x86/include/asm/bug.h
> index 84b87538a15d..3106c203ebb6 100644
> --- a/arch/x86/include/asm/bug.h
> +++ b/arch/x86/include/asm/bug.h
> @@ -73,7 +73,7 @@ do {                                                          \
>    * were to trigger, we'd rather wreck the machine in an attempt to get the
>    * message out than not know about it.
>    */
> -#define __WARN_FLAGS(flags)                                    \
> +#define ___WARN_FLAGS(flags)                                   \
>   do {                                                           \
>          instrumentation_begin();                                \
>          _BUG_FLAGS(ASM_UD2, BUGFLAG_WARNING|(flags));           \
> 
> 3) Testing pkill_on_warn with kthreads was done using this trick:
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index bce848e50512..13c56f472681 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -2133,6 +2133,8 @@ static int __noreturn rcu_gp_kthread(void *unused)
>                  WRITE_ONCE(rcu_state.gp_state, RCU_GP_CLEANUP);
>                  rcu_gp_cleanup();
>                  WRITE_ONCE(rcu_state.gp_state, RCU_GP_CLEANED);
> +
> +               WARN_ONCE(1, "hello from kthread\n");
>          }
>   }
> 
> 4) Changing drivers/misc/lkdtm/bugs.c:lkdtm_WARNING() allowed me
> to test all warning flavours:
>   - WARN_ON()
>   - WARN()
>   - WARN_TAINT()
>   - WARN_ON_ONCE()
>   - WARN_ONCE()
>   - WARN_TAINT_ONCE()
> 
> Thanks!
> 
> Alexander Popov (2):
>    bug: do refactoring allowing to add a warning handling action
>    sysctl: introduce kernel.pkill_on_warn
> 
>   Documentation/admin-guide/sysctl/kernel.rst | 14 ++++++++
>   include/asm-generic/bug.h                   | 37 +++++++++++++++------
>   include/linux/panic.h                       |  3 ++
>   kernel/panic.c                              | 22 +++++++++++-
>   kernel/sysctl.c                             |  9 +++++
>   lib/bug.c                                   | 22 ++++++++----
>   6 files changed, 90 insertions(+), 17 deletions(-)
>

Linus Torvalds Nov. 12, 2021, 9:26 p.m. UTC | #2

On Fri, Nov 12, 2021 at 10:52 AM Alexander Popov <alex.popov@linux.com> wrote:
>
> Hello everyone!
> Friendly ping for your feedback.

I still haven't heard a compelling _reason_ for this all, and why
anybody should ever use this or care?

               Linus

Alexander Popov Nov. 13, 2021, 6:14 p.m. UTC | #3

On 13.11.2021 00:26, Linus Torvalds wrote:
> On Fri, Nov 12, 2021 at 10:52 AM Alexander Popov <alex.popov@linux.com> wrote:
>>
>> Hello everyone!
>> Friendly ping for your feedback.
> 
> I still haven't heard a compelling _reason_ for this all, and why
> anybody should ever use this or care?

Ok, to sum up:

Killing the process that hit a kernel warning complies with the Fail-Fast 
principle [1]. pkill_on_warn sysctl allows the kernel to stop the process when 
the **first signs** of wrong behavior are detected.

By default, the Linux kernel ignores a warning and proceeds the execution from 
the flawed state. That is opposite to the Fail-Fast principle.
A kernel warning may be followed by memory corruption or other negative effects, 
like in CVE-2019-18683 exploit [2] or many other cases detected by the SyzScope 
project [3]. pkill_on_warn would prevent the system from the errors going after 
a warning in the process context.

At the same time, pkill_on_warn does not kill the entire system like 
panic_on_warn. That is the middle way of handling kernel warnings.
Linus, it's similar to your BUG_ON() policy [4]. The process hitting BUG_ON() is 
killed, and the system proceeds to work. pkill_on_warn just brings a similar 
policy to WARN_ON() handling.

I believe that many Linux distros (which don't hit WARN_ON() here and there) 
will enable pkill_on_warn because it's reasonable from the safety and security 
points of view.

And I'm sure that the ELISA project by the Linux Foundation (Enabling Linux In 
Safety Applications [5]) would support the pkill_on_warn sysctl.
[Adding people from this project to CC]

I hope that I managed to show the rationale.

Best regards,
Alexander

[1]: https://en.wikipedia.org/wiki/Fail-fast
[2]: https://a13xp0p0v.github.io/2020/02/15/CVE-2019-18683.html
[3]: https://www.usenix.org/system/files/sec22summer_zou.pdf
[4]: http://lkml.iu.edu/hypermail/linux/kernel/1610.0/01217.html
[5]: https://elisa.tech/

Linus Torvalds Nov. 13, 2021, 7:58 p.m. UTC | #4

On Sat, Nov 13, 2021 at 10:14 AM Alexander Popov <alex.popov@linux.com> wrote:
>
> Killing the process that hit a kernel warning complies with the Fail-Fast
> principle [1].

The thing is a WARNING.

It's not even clear that the warning has anything to do with the
process that triggered it. It could happen in an interrupt, or in some
async context (kernel threads, whatever), or the warning could just be
something that is detected by a different user than the thing that
actually caused the warning to become an issue.

If you want to reboot the machine on a kernel warning, you get that
fail-fast thing you want. There are two situations:

 - kernel testing (pretty much universally done in a virtual machine,
or simply just checking 'dmesg' afterwards)

 - hyperscalers like google etc that just want to take any suspect
machines offline asap

But sending a signal to a random process is just voodoo programming,
and as likely to cause other very odd failures as anything else.

I really don't see the point of that signal.

I'm happy to be proven wrong, but that will require some major
installation actually using it first and having a lot of strong
arguments to counter-act the above.

Seriously, WARN_ON() can happen in situations where sending a signal
may be a REALLY BAD idea, never mind the issue that it's not even
clear who the signal should be sent to.

Yes, yes, your patches have some random "safety guards", in that it
won't send the signal to a PF_KTHREAD or the global init process. But
those safety guards literally make my argument for me: sending a
signal to whoever randomly triggered a warning is simply _wrong_.
Adding random "don't do it in this case" doesn't make it right, it
only shows that "yes, it happens to the wrong person, and here's a
hack to avoid generating obvious problems".

Honestly, if the intent is to not have to parse the dmesg output, then
I think it would be much better to introduce a new /proc file to read
the kernel tainting state, and then some test manager process could be
able to poll() that file or something. Not sending a signal to random
targets, but have a much more explicit model.

That said, I'm not convinced that "just read the kernel message log"
is in any way wrong either.

                  Linus

Marco Elver Nov. 14, 2021, 2:21 p.m. UTC | #5

On Sat, Nov 13, 2021 at 11:58AM -0800, Linus Torvalds wrote:
> On Sat, Nov 13, 2021 at 10:14 AM Alexander Popov <alex.popov@linux.com> wrote:
[...]
> Honestly, if the intent is to not have to parse the dmesg output, then
> I think it would be much better to introduce a new /proc file to read
> the kernel tainting state, and then some test manager process could be
> able to poll() that file or something. Not sending a signal to random
> targets, but have a much more explicit model.
> 
> That said, I'm not convinced that "just read the kernel message log"
> is in any way wrong either.

We had this problem of "need to get errors/warnings that appear in the
kernel log" without actually polling the kernel log all the time. Since
5.12 there's the 'error_report' tracepoint for exactly this purpose [1].

Right now it only generates events on KASAN and KFENCE reports, but we
imagined it's easy enough to extend with more types. Like WARN, should
the need arise (you'd have to add it if you decide to go down that
route).

So you could implement a close-enough variant of the whole thing in
userspace using what tracepoints give you by just monitoring the trace
pipe. It'd be much easier to experiment with different policies as well.

[1] https://git.kernel.org/torvalds/c/9c0dee54eb91d48cca048bd7bd2c1f4a166e0252

Kaiwan N Billimoria Nov. 15, 2021, 8:12 a.m. UTC | #6

On Thu, 2021-10-28 at 02:32 +0300, Alexander Popov wrote:
> [...]
> 
> From a security point of view, kernel warning messages provide a lot of
> useful information for attackers. Many GNU/Linux distributions allow
> unprivileged users to read the kernel log, so attackers use kernel
> warning infoleak in vulnerability exploits. 
At the risk of being too simplistic, if the intention is to cut down infoleaks,
why not simply have a config (and/or sysctl) to toggle it - both at kernel build
as well as at runtime via a sysctl.

A minimal starting attempt at this, definitely incomplete (i've not actually written
the config anywhere, sorry, I'd just like to propose this as an idea for now) could
be something like this? (Am calling the kconfig CONFIG_TERSE_DIAGS_ONWARN):

---
 kernel/panic.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/kernel/panic.c b/kernel/panic.c
index cefd7d82366f..bbf00b0a8110 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -587,10 +587,8 @@ void __warn(const char *file, int line, void *caller, unsigned taint,
    if (args)
        vprintk(args->fmt, args->args);
 
-   print_modules();
-
-   if (regs)
-       show_regs(regs);
+   if (IS_ENABLED(CONFIG_TERSE_DIAGS_ONWARN))
+       return;
 
    if (panic_on_warn) {
        /*
@@ -603,6 +601,11 @@ void __warn(const char *file, int line, void *caller, unsigned taint,
        panic("panic_on_warn set ...\n");
    }   
 
+   print_modules();
+
+   if (regs)
+       show_regs(regs);
+
    if (!regs)
        dump_stack();

Lukas Bulwahn Nov. 15, 2021, 1:59 p.m. UTC | #7

On Sat, Nov 13, 2021 at 7:14 PM Alexander Popov <alex.popov@linux.com> wrote:
>
> On 13.11.2021 00:26, Linus Torvalds wrote:
> > On Fri, Nov 12, 2021 at 10:52 AM Alexander Popov <alex.popov@linux.com> wrote:
> >>
> >> Hello everyone!
> >> Friendly ping for your feedback.
> >
> > I still haven't heard a compelling _reason_ for this all, and why
> > anybody should ever use this or care?
>
> Ok, to sum up:
>
> Killing the process that hit a kernel warning complies with the Fail-Fast
> principle [1]. pkill_on_warn sysctl allows the kernel to stop the process when
> the **first signs** of wrong behavior are detected.
>
> By default, the Linux kernel ignores a warning and proceeds the execution from
> the flawed state. That is opposite to the Fail-Fast principle.
> A kernel warning may be followed by memory corruption or other negative effects,
> like in CVE-2019-18683 exploit [2] or many other cases detected by the SyzScope
> project [3]. pkill_on_warn would prevent the system from the errors going after
> a warning in the process context.
>
> At the same time, pkill_on_warn does not kill the entire system like
> panic_on_warn. That is the middle way of handling kernel warnings.
> Linus, it's similar to your BUG_ON() policy [4]. The process hitting BUG_ON() is
> killed, and the system proceeds to work. pkill_on_warn just brings a similar
> policy to WARN_ON() handling.
>
> I believe that many Linux distros (which don't hit WARN_ON() here and there)
> will enable pkill_on_warn because it's reasonable from the safety and security
> points of view.
>
> And I'm sure that the ELISA project by the Linux Foundation (Enabling Linux In
> Safety Applications [5]) would support the pkill_on_warn sysctl.
> [Adding people from this project to CC]
>
> I hope that I managed to show the rationale.
>

Alex, officially and formally, I cannot talk for the ELISA project
(Enabling Linux In Safety Applications) by the Linux Foundation and I
do not think there is anyone that can confidently do so on such a
detailed technical aspect that you are raising here, and as the
various participants in the ELISA Project have not really agreed on
such a technical aspect being one way or the other and I would not see
that happening quickly. However, I have spent quite some years on the
topic on "what is the right and important topics for using Linux in
safety applications"; so here are my five cents:

One of the general assumptions about safety applications and safety
systems is that the malfunction of a function within a system is more
critical, i.e., more likely to cause harm to people, directly or
indirectly, than the unavailability of the system. So, before
"something potentially unexpected happens"---which can have arbitrary
effects and hence effects difficult to foresee and control---, it is
better to just shutdown/silence the system, i.e., design a fail-safe
or fail-silent system, as the effect of shutdown is pretty easily
foreseeable during the overall system design and you could think about
what the overall system does, when the kernel crashes the usual way.

So, that brings us to what a user would expect from the kernel in a
safety-critical system: Shutdown on any event that is unexpected.

Here, I currently see panic_on_warn as the closest existing feature to
indicate any event that is unexpected and to shutdown the system. That
requires two things for the kernel development:

1. Allow a reasonably configured kernel to boot and run with
panic_on_warn set. Warnings should only be raised when something is
not configured as the developers expect it or the kernel is put into a
state that generally is _unexpected_ and has been exposed little to
the critical thought of the developer, to testing efforts and use in
other systems in the wild. Warnings should not be used for something
informative, which still allows the kernel to continue running in a
proper way in a generally expected environment. Up to my knowledge,
there are some kernels in production that run with panic_on_warn; so,
IMHO, this requirement is generally accepted (we might of course
discuss the one or other use of warn) and is not too much to ask for.

2. Really ensure that the system shuts down when it hits warn and
panic. That requires that the execution path for warn() and panic() is
not overly complicated (stuffed with various bells and whistles).
Otherwise, warn() and panic() could fail in various complex ways and
potentially keep the system running, although it should be shut down.
Some people in the ELISA Project looked a bit into why they believe
panic() shuts down a system but I have not seen a good system analysis
and argument why any third person could be convinced that panic()
works under all circumstances where it is invoked or that at least,
the circumstances under which panic really works is properly
documented. That is a central aspect for using Linux in a
reasonably-designed safety-critical system. That is possibly also
relevant for security, as you might see an attacker obtain information
because it was possible to "block" the kernel shutting down after
invoking panic() and hence, the attacker could obtain certain
information that was only possible because 1. the system got into an
inconsistent state, 2. it was detected by some check leading to warn()
or panic(), and 3. the system's security engineers assumed that the
system must have been shutting down at that point, as panic() was
invoked, and hence, this would be disallowing a lot of further
operations or some specific operations that the attacker would need to
trigger in that inconsistent state to obtain information.

To your feature, Alex, I do not see the need to have any refined
handling of killing a specific process when the kernel warns; stopping
the whole system is the better and more predictable thing to do. I
would prefer if systems, which have those high-integrity requirements,
e.g., in a highly secure---where stopping any unintended information
flow matters more than availability---or in fail-silent environments
in safety systems, can use panic_on_warn. That should address your
concern above of handling certain CVEs as well.

In summary, I am not supporting pkill_on_warn. I would support the
other points I mentioned above, i.e., a good enforced policy for use
of warn() and any investigation to understand the complexity of
panic() and reducing its complexity if triggered by such an
investigation.

Of course, the listeners and participants in the ELISA Project are
very, very diverse and still on a steep learning curve, i.e., what
does the kernel do, how complex are certain aspects in the kernel, and
what are reasonable system designs that are in reach for
certification. So, there might be some stakeholders in the ELISA
Project that consider availability of a Linux system safety-critical,
i.e., if the system with a Linux kernel is not available, something
really bad (harmful to people) happens. I personally do not know how
these stakeholders could (ever) argue the availability of a complex
system with a Linux kernel, with the availability criteria and the
needed confidence (evidence and required methods) for exposing anyone
to such system under our current societal expectations on technical
systems (you would to need show sufficient investigation of the
kernel's availability for a certification), but that does not stop
anyone looking into it... Those stakeholders should really speak for
themselves, if they see any need for such a refined control of
"something unexpected happens" (an invocation of warn) and more
generally what features from the kernel are needed for such systems.

Lukas

Gabriele Paoloni Nov. 15, 2021, 3:51 p.m. UTC | #8

On 15/11/2021 14:59, Lukas Bulwahn wrote:
> On Sat, Nov 13, 2021 at 7:14 PM Alexander Popov <alex.popov@linux.com> wrote:
>>
>> On 13.11.2021 00:26, Linus Torvalds wrote:
>>> On Fri, Nov 12, 2021 at 10:52 AM Alexander Popov <alex.popov@linux.com> wrote:
>>>>
>>>> Hello everyone!
>>>> Friendly ping for your feedback.
>>>
>>> I still haven't heard a compelling _reason_ for this all, and why
>>> anybody should ever use this or care?
>>
>> Ok, to sum up:
>>
>> Killing the process that hit a kernel warning complies with the Fail-Fast
>> principle [1]. pkill_on_warn sysctl allows the kernel to stop the process when
>> the **first signs** of wrong behavior are detected.
>>
>> By default, the Linux kernel ignores a warning and proceeds the execution from
>> the flawed state. That is opposite to the Fail-Fast principle.
>> A kernel warning may be followed by memory corruption or other negative effects,
>> like in CVE-2019-18683 exploit [2] or many other cases detected by the SyzScope
>> project [3]. pkill_on_warn would prevent the system from the errors going after
>> a warning in the process context.
>>
>> At the same time, pkill_on_warn does not kill the entire system like
>> panic_on_warn. That is the middle way of handling kernel warnings.
>> Linus, it's similar to your BUG_ON() policy [4]. The process hitting BUG_ON() is
>> killed, and the system proceeds to work. pkill_on_warn just brings a similar
>> policy to WARN_ON() handling.
>>
>> I believe that many Linux distros (which don't hit WARN_ON() here and there)
>> will enable pkill_on_warn because it's reasonable from the safety and security
>> points of view.
>>
>> And I'm sure that the ELISA project by the Linux Foundation (Enabling Linux In
>> Safety Applications [5]) would support the pkill_on_warn sysctl.
>> [Adding people from this project to CC]
>>
>> I hope that I managed to show the rationale.
>>
> 
> Alex, officially and formally, I cannot talk for the ELISA project
> (Enabling Linux In Safety Applications) by the Linux Foundation and I
> do not think there is anyone that can confidently do so on such a
> detailed technical aspect that you are raising here, and as the
> various participants in the ELISA Project have not really agreed on
> such a technical aspect being one way or the other and I would not see
> that happening quickly. However, I have spent quite some years on the
> topic on "what is the right and important topics for using Linux in
> safety applications"; so here are my five cents:
> 
> One of the general assumptions about safety applications and safety
> systems is that the malfunction of a function within a system is more
> critical, i.e., more likely to cause harm to people, directly or
> indirectly, than the unavailability of the system. So, before
> "something potentially unexpected happens"---which can have arbitrary
> effects and hence effects difficult to foresee and control---, it is
> better to just shutdown/silence the system, i.e., design a fail-safe
> or fail-silent system, as the effect of shutdown is pretty easily
> foreseeable during the overall system design and you could think about
> what the overall system does, when the kernel crashes the usual way.
> 
> So, that brings us to what a user would expect from the kernel in a
> safety-critical system: Shutdown on any event that is unexpected.
> 
> Here, I currently see panic_on_warn as the closest existing feature to
> indicate any event that is unexpected and to shutdown the system. That
> requires two things for the kernel development:
> 
> 1. Allow a reasonably configured kernel to boot and run with
> panic_on_warn set. Warnings should only be raised when something is
> not configured as the developers expect it or the kernel is put into a
> state that generally is _unexpected_ and has been exposed little to
> the critical thought of the developer, to testing efforts and use in
> other systems in the wild. Warnings should not be used for something
> informative, which still allows the kernel to continue running in a
> proper way in a generally expected environment. Up to my knowledge,
> there are some kernels in production that run with panic_on_warn; so,
> IMHO, this requirement is generally accepted (we might of course
> discuss the one or other use of warn) and is not too much to ask for.
> 
> 2. Really ensure that the system shuts down when it hits warn and
> panic. That requires that the execution path for warn() and panic() is
> not overly complicated (stuffed with various bells and whistles).
> Otherwise, warn() and panic() could fail in various complex ways and
> potentially keep the system running, although it should be shut down.
> Some people in the ELISA Project looked a bit into why they believe
> panic() shuts down a system but I have not seen a good system analysis
> and argument why any third person could be convinced that panic()
> works under all circumstances where it is invoked or that at least,
> the circumstances under which panic really works is properly
> documented. That is a central aspect for using Linux in a
> reasonably-designed safety-critical system. That is possibly also
> relevant for security, as you might see an attacker obtain information
> because it was possible to "block" the kernel shutting down after
> invoking panic() and hence, the attacker could obtain certain
> information that was only possible because 1. the system got into an
> inconsistent state, 2. it was detected by some check leading to warn()
> or panic(), and 3. the system's security engineers assumed that the
> system must have been shutting down at that point, as panic() was
> invoked, and hence, this would be disallowing a lot of further
> operations or some specific operations that the attacker would need to
> trigger in that inconsistent state to obtain information.
> 
> To your feature, Alex, I do not see the need to have any refined
> handling of killing a specific process when the kernel warns; stopping
> the whole system is the better and more predictable thing to do. I
> would prefer if systems, which have those high-integrity requirements,
> e.g., in a highly secure---where stopping any unintended information
> flow matters more than availability---or in fail-silent environments
> in safety systems, can use panic_on_warn. That should address your
> concern above of handling certain CVEs as well.
> 
> In summary, I am not supporting pkill_on_warn. I would support the
> other points I mentioned above, i.e., a good enforced policy for use
> of warn() and any investigation to understand the complexity of
> panic() and reducing its complexity if triggered by such an
> investigation.

Hi Alex

I also agree with the summary that Lukas gave here. From my experience
the safety system are always guarded by an external flow monitor (e.g. a
watchdog) that triggers in case the safety relevant workloads slows down
or block (for any reason); given this condition of use, a system that
goes into the panic state is always safe, since the watchdog would
trigger and drive the system automatically into safe state.
So I also don't see a clear advantage of having pkill_on_warn();
actually on the flip side it seems to me that such feature could
introduce more risk, as it kills only the threads of the process that
caused the kernel warning whereas the other processes are trusted to
run on a weaker Kernel (does killing the threads of the process that
caused the kernel warning always fix the Kernel condition that lead to
the warning?)

Thanks
Gab

> 
> Of course, the listeners and participants in the ELISA Project are
> very, very diverse and still on a steep learning curve, i.e., what
> does the kernel do, how complex are certain aspects in the kernel, and
> what are reasonable system designs that are in reach for
> certification. So, there might be some stakeholders in the ELISA
> Project that consider availability of a Linux system safety-critical,
> i.e., if the system with a Linux kernel is not available, something
> really bad (harmful to people) happens. I personally do not know how
> these stakeholders could (ever) argue the availability of a complex
> system with a Linux kernel, with the availability criteria and the
> needed confidence (evidence and required methods) for exposing anyone
> to such system under our current societal expectations on technical
> systems (you would to need show sufficient investigation of the
> kernel's availability for a certification), but that does not stop
> anyone looking into it... Those stakeholders should really speak for
> themselves, if they see any need for such a refined control of
> "something unexpected happens" (an invocation of warn) and more
> generally what features from the kernel are needed for such systems.
> 
> 
> Lukas
> 
> 
> -=-=-=-=-=-=-=-=-=-=-=-
> Links: You receive all messages sent to this group.
> View/Reply Online (#629): https://lists.elisa.tech/g/safety-architecture/message/629
> Mute This Topic: https://lists.elisa.tech/mt/87069369/6239706
> Group Owner: safety-architecture+owner@lists.elisa.tech
> Unsubscribe: https://lists.elisa.tech/g/safety-architecture/unsub [gpaoloni@redhat.com]
> -=-=-=-=-=-=-=-=-=-=-=-
> 
>

Steven Rostedt Nov. 15, 2021, 4:06 p.m. UTC | #9

On Mon, 15 Nov 2021 14:59:57 +0100
Lukas Bulwahn <lukas.bulwahn@gmail.com> wrote:

> 1. Allow a reasonably configured kernel to boot and run with
> panic_on_warn set. Warnings should only be raised when something is
> not configured as the developers expect it or the kernel is put into a
> state that generally is _unexpected_ and has been exposed little to
> the critical thought of the developer, to testing efforts and use in
> other systems in the wild. Warnings should not be used for something
> informative, which still allows the kernel to continue running in a
> proper way in a generally expected environment. Up to my knowledge,
> there are some kernels in production that run with panic_on_warn; so,
> IMHO, this requirement is generally accepted (we might of course

To me, WARN*() is the same as BUG*(). If it gets hit, it's a bug in the
kernel and needs to be fixed. I have several WARN*() calls in my code, and
it's all because the algorithms used is expected to prevent the condition
in the warning from happening. If the warning triggers, it means either that
the algorithm is wrong or my assumption about the algorithm is wrong. In
either case, the kernel needs to be updated. All my tests fail if a WARN*()
gets hit (anywhere in the kernel, not just my own).

After reading all the replies and thinking about this more, I find the
pkill_on_warning actually worse than not doing anything. If you are
concerned about exploits from warnings, the only real solution is a
panic_on_warning. Yes, it brings down the system, but really, it has to be
brought down anyway, because it is in need of a kernel update.

-- Steve

Kees Cook Nov. 15, 2021, 10:06 p.m. UTC | #10

On Mon, Nov 15, 2021 at 11:06:49AM -0500, Steven Rostedt wrote:
> On Mon, 15 Nov 2021 14:59:57 +0100
> Lukas Bulwahn <lukas.bulwahn@gmail.com> wrote:
> 
> > 1. Allow a reasonably configured kernel to boot and run with
> > panic_on_warn set. Warnings should only be raised when something is
> > not configured as the developers expect it or the kernel is put into a
> > state that generally is _unexpected_ and has been exposed little to
> > the critical thought of the developer, to testing efforts and use in
> > other systems in the wild. Warnings should not be used for something
> > informative, which still allows the kernel to continue running in a
> > proper way in a generally expected environment. Up to my knowledge,
> > there are some kernels in production that run with panic_on_warn; so,
> > IMHO, this requirement is generally accepted (we might of course
> 
> To me, WARN*() is the same as BUG*(). If it gets hit, it's a bug in the
> kernel and needs to be fixed. I have several WARN*() calls in my code, and
> it's all because the algorithms used is expected to prevent the condition
> in the warning from happening. If the warning triggers, it means either that
> the algorithm is wrong or my assumption about the algorithm is wrong. In
> either case, the kernel needs to be updated. All my tests fail if a WARN*()
> gets hit (anywhere in the kernel, not just my own).
> 
> After reading all the replies and thinking about this more, I find the
> pkill_on_warning actually worse than not doing anything. If you are
> concerned about exploits from warnings, the only real solution is a
> panic_on_warning. Yes, it brings down the system, but really, it has to be
> brought down anyway, because it is in need of a kernel update.

Hmm, yes. What it originally boiled down to, which is why Linus first
objected to BUG(), was that we don't know what other parts of the system
have been disrupted. The best example is just that of locking: if we
BUG() or do_exit() in the middle of holding a lock, we'll wreck whatever
subsystem that was attached to. Without a deterministic system state
unwinder, there really isn't a "safe" way to just stop a kernel thread.

With this pkill_on_warn, we avoid the BUG problem (since the thread of
execution continues and stops at an 'expected' place: the signal
handler).

However, now we have the newer objection from Linus, which is one of
attribution: the WARN might be hit during an "unrelated" thread of
execution and "current" gets blamed, etc. And beyond that, if we take
down a portion of userspace, what in userspace may be destabilized? In
theory, we get a case where any required daemons would be restarted by
init, but that's not "known".

The safest version of this I can think of is for processes to opt into
this mitigation. That would also cover the "special cases" we've seen
exposed too. i.e. init and kthreads would not opt in.

However, that's a lot to implement when Marco's tracing suggestion might
be sufficient and policy could be entirely implemented in userspace. It
could be as simple as this (totally untested):


diff --git a/include/trace/events/error_report.h b/include/trace/events/error_report.h
index 96f64bf218b2..129d22eb8b6e 100644
--- a/include/trace/events/error_report.h
+++ b/include/trace/events/error_report.h
@@ -16,6 +16,8 @@
 #define __ERROR_REPORT_DECLARE_TRACE_ENUMS_ONCE_ONLY
 
 enum error_detector {
+	ERROR_DETECTOR_WARN,
+	ERROR_DETECTOR_BUG,
 	ERROR_DETECTOR_KFENCE,
 	ERROR_DETECTOR_KASAN
 };
@@ -23,6 +25,8 @@ enum error_detector {
 #endif /* __ERROR_REPORT_DECLARE_TRACE_ENUMS_ONCE_ONLY */
 
 #define error_detector_list	\
+	EM(ERROR_DETECTOR_WARN, "warn")	\
+	EM(ERROR_DETECTOR_BUG, "bug")	\
 	EM(ERROR_DETECTOR_KFENCE, "kfence")	\
 	EMe(ERROR_DETECTOR_KASAN, "kasan")
 /* Always end the list with an EMe. */
diff --git a/lib/bug.c b/lib/bug.c
index 45a0584f6541..201b4070bbbc 100644
--- a/lib/bug.c
+++ b/lib/bug.c
@@ -48,6 +48,7 @@
 #include <linux/sched.h>
 #include <linux/rculist.h>
 #include <linux/ftrace.h>
+#include <trace/events/error_report.h>
 
 extern struct bug_entry __start___bug_table[], __stop___bug_table[];
 
@@ -198,6 +199,7 @@ enum bug_trap_type report_bug(unsigned long bugaddr, struct pt_regs *regs)
 		/* this is a WARN_ON rather than BUG/BUG_ON */
 		__warn(file, line, (void *)bugaddr, BUG_GET_TAINT(bug), regs,
 		       NULL);
+		trace_error_report_end(ERROR_DETECTOR_WARN, bugaddr);
 		return BUG_TRAP_TYPE_WARN;
 	}
 
@@ -206,6 +208,7 @@ enum bug_trap_type report_bug(unsigned long bugaddr, struct pt_regs *regs)
 	else
 		pr_crit("Kernel BUG at %pB [verbose debug info unavailable]\n",
 			(void *)bugaddr);
+	trace_error_report_end(ERROR_DETECTOR_BUG, bugaddr);
 
 	return BUG_TRAP_TYPE_BUG;
 }


Marco, is this the full version of monitoring this from the userspace
side?

	perf record -e error_report:error_report_end

Christophe Leroy Nov. 16, 2021, 6:37 a.m. UTC | #11

Le 15/11/2021 à 17:06, Steven Rostedt a écrit :
> On Mon, 15 Nov 2021 14:59:57 +0100
> Lukas Bulwahn <lukas.bulwahn@gmail.com> wrote:
> 
>> 1. Allow a reasonably configured kernel to boot and run with
>> panic_on_warn set. Warnings should only be raised when something is
>> not configured as the developers expect it or the kernel is put into a
>> state that generally is _unexpected_ and has been exposed little to
>> the critical thought of the developer, to testing efforts and use in
>> other systems in the wild. Warnings should not be used for something
>> informative, which still allows the kernel to continue running in a
>> proper way in a generally expected environment. Up to my knowledge,
>> there are some kernels in production that run with panic_on_warn; so,
>> IMHO, this requirement is generally accepted (we might of course
> 
> To me, WARN*() is the same as BUG*(). If it gets hit, it's a bug in the
> kernel and needs to be fixed. I have several WARN*() calls in my code, and
> it's all because the algorithms used is expected to prevent the condition
> in the warning from happening. If the warning triggers, it means either that
> the algorithm is wrong or my assumption about the algorithm is wrong. In
> either case, the kernel needs to be updated. All my tests fail if a WARN*()
> gets hit (anywhere in the kernel, not just my own).
> 
> After reading all the replies and thinking about this more, I find the
> pkill_on_warning actually worse than not doing anything. If you are
> concerned about exploits from warnings, the only real solution is a
> panic_on_warning. Yes, it brings down the system, but really, it has to be
> brought down anyway, because it is in need of a kernel update.
> 

We also have LIVEPATCH to avoid bringing down the system for a kernel 
update, don't we ? So I wouldn't expect bringing down a vital system 
just for a WARN.

As far as I understand from 
https://www.kernel.org/doc/html/latest/process/deprecated.html#bug-and-bug-on, 
WARN() and WARN_ON() are meant to deal with those situations as 
gracefull as possible, allowing the system to continue running the best 
it can until a human controled action is taken.

So I'd expect the WARN/WARN_ON to be handled and I agree that that 
pkill_on_warning seems dangerous and unrelevant, probably more dangerous 
than doing nothing, especially as the WARN may trigger for a reason 
which has nothing to do with the running thread.

Christophe

Alexander Popov Nov. 16, 2021, 7:52 a.m. UTC | #12

On 15.11.2021 18:51, Gabriele Paoloni wrote:
> 
> 
> On 15/11/2021 14:59, Lukas Bulwahn wrote:
>> On Sat, Nov 13, 2021 at 7:14 PM Alexander Popov <alex.popov@linux.com> wrote:
>>>
>>> On 13.11.2021 00:26, Linus Torvalds wrote:
>>>> On Fri, Nov 12, 2021 at 10:52 AM Alexander Popov <alex.popov@linux.com> wrote:
>>>>>
>>>>> Hello everyone!
>>>>> Friendly ping for your feedback.
>>>>
>>>> I still haven't heard a compelling _reason_ for this all, and why
>>>> anybody should ever use this or care?
>>>
>>> Ok, to sum up:
>>>
>>> Killing the process that hit a kernel warning complies with the Fail-Fast
>>> principle [1]. pkill_on_warn sysctl allows the kernel to stop the process when
>>> the **first signs** of wrong behavior are detected.
>>>
>>> By default, the Linux kernel ignores a warning and proceeds the execution from
>>> the flawed state. That is opposite to the Fail-Fast principle.
>>> A kernel warning may be followed by memory corruption or other negative effects,
>>> like in CVE-2019-18683 exploit [2] or many other cases detected by the SyzScope
>>> project [3]. pkill_on_warn would prevent the system from the errors going after
>>> a warning in the process context.
>>>
>>> At the same time, pkill_on_warn does not kill the entire system like
>>> panic_on_warn. That is the middle way of handling kernel warnings.
>>> Linus, it's similar to your BUG_ON() policy [4]. The process hitting BUG_ON() is
>>> killed, and the system proceeds to work. pkill_on_warn just brings a similar
>>> policy to WARN_ON() handling.
>>>
>>> I believe that many Linux distros (which don't hit WARN_ON() here and there)
>>> will enable pkill_on_warn because it's reasonable from the safety and security
>>> points of view.
>>>
>>> And I'm sure that the ELISA project by the Linux Foundation (Enabling Linux In
>>> Safety Applications [5]) would support the pkill_on_warn sysctl.
>>> [Adding people from this project to CC]
>>>
>>> I hope that I managed to show the rationale.
>>>
>>
>> Alex, officially and formally, I cannot talk for the ELISA project
>> (Enabling Linux In Safety Applications) by the Linux Foundation and I
>> do not think there is anyone that can confidently do so on such a
>> detailed technical aspect that you are raising here, and as the
>> various participants in the ELISA Project have not really agreed on
>> such a technical aspect being one way or the other and I would not see
>> that happening quickly. However, I have spent quite some years on the
>> topic on "what is the right and important topics for using Linux in
>> safety applications"; so here are my five cents:
>>
>> One of the general assumptions about safety applications and safety
>> systems is that the malfunction of a function within a system is more
>> critical, i.e., more likely to cause harm to people, directly or
>> indirectly, than the unavailability of the system. So, before
>> "something potentially unexpected happens"---which can have arbitrary
>> effects and hence effects difficult to foresee and control---, it is
>> better to just shutdown/silence the system, i.e., design a fail-safe
>> or fail-silent system, as the effect of shutdown is pretty easily
>> foreseeable during the overall system design and you could think about
>> what the overall system does, when the kernel crashes the usual way.
>>
>> So, that brings us to what a user would expect from the kernel in a
>> safety-critical system: Shutdown on any event that is unexpected.
>>
>> Here, I currently see panic_on_warn as the closest existing feature to
>> indicate any event that is unexpected and to shutdown the system. That
>> requires two things for the kernel development:
>>
>> 1. Allow a reasonably configured kernel to boot and run with
>> panic_on_warn set. Warnings should only be raised when something is
>> not configured as the developers expect it or the kernel is put into a
>> state that generally is _unexpected_ and has been exposed little to
>> the critical thought of the developer, to testing efforts and use in
>> other systems in the wild. Warnings should not be used for something
>> informative, which still allows the kernel to continue running in a
>> proper way in a generally expected environment. Up to my knowledge,
>> there are some kernels in production that run with panic_on_warn; so,
>> IMHO, this requirement is generally accepted (we might of course
>> discuss the one or other use of warn) and is not too much to ask for.
>>
>> 2. Really ensure that the system shuts down when it hits warn and
>> panic. That requires that the execution path for warn() and panic() is
>> not overly complicated (stuffed with various bells and whistles).
>> Otherwise, warn() and panic() could fail in various complex ways and
>> potentially keep the system running, although it should be shut down.
>> Some people in the ELISA Project looked a bit into why they believe
>> panic() shuts down a system but I have not seen a good system analysis
>> and argument why any third person could be convinced that panic()
>> works under all circumstances where it is invoked or that at least,
>> the circumstances under which panic really works is properly
>> documented. That is a central aspect for using Linux in a
>> reasonably-designed safety-critical system. That is possibly also
>> relevant for security, as you might see an attacker obtain information
>> because it was possible to "block" the kernel shutting down after
>> invoking panic() and hence, the attacker could obtain certain
>> information that was only possible because 1. the system got into an
>> inconsistent state, 2. it was detected by some check leading to warn()
>> or panic(), and 3. the system's security engineers assumed that the
>> system must have been shutting down at that point, as panic() was
>> invoked, and hence, this would be disallowing a lot of further
>> operations or some specific operations that the attacker would need to
>> trigger in that inconsistent state to obtain information.
>>
>> To your feature, Alex, I do not see the need to have any refined
>> handling of killing a specific process when the kernel warns; stopping
>> the whole system is the better and more predictable thing to do. I
>> would prefer if systems, which have those high-integrity requirements,
>> e.g., in a highly secure---where stopping any unintended information
>> flow matters more than availability---or in fail-silent environments
>> in safety systems, can use panic_on_warn. That should address your
>> concern above of handling certain CVEs as well.
>>
>> In summary, I am not supporting pkill_on_warn. I would support the
>> other points I mentioned above, i.e., a good enforced policy for use
>> of warn() and any investigation to understand the complexity of
>> panic() and reducing its complexity if triggered by such an
>> investigation.
> 
> Hi Alex
> 
> I also agree with the summary that Lukas gave here. From my experience
> the safety system are always guarded by an external flow monitor (e.g. a
> watchdog) that triggers in case the safety relevant workloads slows down
> or block (for any reason); given this condition of use, a system that
> goes into the panic state is always safe, since the watchdog would
> trigger and drive the system automatically into safe state.
> So I also don't see a clear advantage of having pkill_on_warn();
> actually on the flip side it seems to me that such feature could
> introduce more risk, as it kills only the threads of the process that
> caused the kernel warning whereas the other processes are trusted to
> run on a weaker Kernel (does killing the threads of the process that
> caused the kernel warning always fix the Kernel condition that lead to
> the warning?)

Lukas, Gabriele, Robert,
Thanks for showing this from the safety point of view.

The part about believing in panic() functionality is amazing :)
Yes, safety critical systems depend on the robust ability to restart.

Best regards,
Alexander

Lukas Bulwahn Nov. 16, 2021, 8:01 a.m. UTC | #13

On Tue, Nov 16, 2021 at 8:52 AM Alexander Popov <alex.popov@linux.com> wrote:
>
> On 15.11.2021 18:51, Gabriele Paoloni wrote:
> >
> >
> > On 15/11/2021 14:59, Lukas Bulwahn wrote:
> >> On Sat, Nov 13, 2021 at 7:14 PM Alexander Popov <alex.popov@linux.com> wrote:
> >>>
> >>> On 13.11.2021 00:26, Linus Torvalds wrote:
> >>>> On Fri, Nov 12, 2021 at 10:52 AM Alexander Popov <alex.popov@linux.com> wrote:
> >>>>>
> >>>>> Hello everyone!
> >>>>> Friendly ping for your feedback.
> >>>>
> >>>> I still haven't heard a compelling _reason_ for this all, and why
> >>>> anybody should ever use this or care?
> >>>
> >>> Ok, to sum up:
> >>>
> >>> Killing the process that hit a kernel warning complies with the Fail-Fast
> >>> principle [1]. pkill_on_warn sysctl allows the kernel to stop the process when
> >>> the **first signs** of wrong behavior are detected.
> >>>
> >>> By default, the Linux kernel ignores a warning and proceeds the execution from
> >>> the flawed state. That is opposite to the Fail-Fast principle.
> >>> A kernel warning may be followed by memory corruption or other negative effects,
> >>> like in CVE-2019-18683 exploit [2] or many other cases detected by the SyzScope
> >>> project [3]. pkill_on_warn would prevent the system from the errors going after
> >>> a warning in the process context.
> >>>
> >>> At the same time, pkill_on_warn does not kill the entire system like
> >>> panic_on_warn. That is the middle way of handling kernel warnings.
> >>> Linus, it's similar to your BUG_ON() policy [4]. The process hitting BUG_ON() is
> >>> killed, and the system proceeds to work. pkill_on_warn just brings a similar
> >>> policy to WARN_ON() handling.
> >>>
> >>> I believe that many Linux distros (which don't hit WARN_ON() here and there)
> >>> will enable pkill_on_warn because it's reasonable from the safety and security
> >>> points of view.
> >>>
> >>> And I'm sure that the ELISA project by the Linux Foundation (Enabling Linux In
> >>> Safety Applications [5]) would support the pkill_on_warn sysctl.
> >>> [Adding people from this project to CC]
> >>>
> >>> I hope that I managed to show the rationale.
> >>>
> >>
> >> Alex, officially and formally, I cannot talk for the ELISA project
> >> (Enabling Linux In Safety Applications) by the Linux Foundation and I
> >> do not think there is anyone that can confidently do so on such a
> >> detailed technical aspect that you are raising here, and as the
> >> various participants in the ELISA Project have not really agreed on
> >> such a technical aspect being one way or the other and I would not see
> >> that happening quickly. However, I have spent quite some years on the
> >> topic on "what is the right and important topics for using Linux in
> >> safety applications"; so here are my five cents:
> >>
> >> One of the general assumptions about safety applications and safety
> >> systems is that the malfunction of a function within a system is more
> >> critical, i.e., more likely to cause harm to people, directly or
> >> indirectly, than the unavailability of the system. So, before
> >> "something potentially unexpected happens"---which can have arbitrary
> >> effects and hence effects difficult to foresee and control---, it is
> >> better to just shutdown/silence the system, i.e., design a fail-safe
> >> or fail-silent system, as the effect of shutdown is pretty easily
> >> foreseeable during the overall system design and you could think about
> >> what the overall system does, when the kernel crashes the usual way.
> >>
> >> So, that brings us to what a user would expect from the kernel in a
> >> safety-critical system: Shutdown on any event that is unexpected.
> >>
> >> Here, I currently see panic_on_warn as the closest existing feature to
> >> indicate any event that is unexpected and to shutdown the system. That
> >> requires two things for the kernel development:
> >>
> >> 1. Allow a reasonably configured kernel to boot and run with
> >> panic_on_warn set. Warnings should only be raised when something is
> >> not configured as the developers expect it or the kernel is put into a
> >> state that generally is _unexpected_ and has been exposed little to
> >> the critical thought of the developer, to testing efforts and use in
> >> other systems in the wild. Warnings should not be used for something
> >> informative, which still allows the kernel to continue running in a
> >> proper way in a generally expected environment. Up to my knowledge,
> >> there are some kernels in production that run with panic_on_warn; so,
> >> IMHO, this requirement is generally accepted (we might of course
> >> discuss the one or other use of warn) and is not too much to ask for.
> >>
> >> 2. Really ensure that the system shuts down when it hits warn and
> >> panic. That requires that the execution path for warn() and panic() is
> >> not overly complicated (stuffed with various bells and whistles).
> >> Otherwise, warn() and panic() could fail in various complex ways and
> >> potentially keep the system running, although it should be shut down.
> >> Some people in the ELISA Project looked a bit into why they believe
> >> panic() shuts down a system but I have not seen a good system analysis
> >> and argument why any third person could be convinced that panic()
> >> works under all circumstances where it is invoked or that at least,
> >> the circumstances under which panic really works is properly
> >> documented. That is a central aspect for using Linux in a
> >> reasonably-designed safety-critical system. That is possibly also
> >> relevant for security, as you might see an attacker obtain information
> >> because it was possible to "block" the kernel shutting down after
> >> invoking panic() and hence, the attacker could obtain certain
> >> information that was only possible because 1. the system got into an
> >> inconsistent state, 2. it was detected by some check leading to warn()
> >> or panic(), and 3. the system's security engineers assumed that the
> >> system must have been shutting down at that point, as panic() was
> >> invoked, and hence, this would be disallowing a lot of further
> >> operations or some specific operations that the attacker would need to
> >> trigger in that inconsistent state to obtain information.
> >>
> >> To your feature, Alex, I do not see the need to have any refined
> >> handling of killing a specific process when the kernel warns; stopping
> >> the whole system is the better and more predictable thing to do. I
> >> would prefer if systems, which have those high-integrity requirements,
> >> e.g., in a highly secure---where stopping any unintended information
> >> flow matters more than availability---or in fail-silent environments
> >> in safety systems, can use panic_on_warn. That should address your
> >> concern above of handling certain CVEs as well.
> >>
> >> In summary, I am not supporting pkill_on_warn. I would support the
> >> other points I mentioned above, i.e., a good enforced policy for use
> >> of warn() and any investigation to understand the complexity of
> >> panic() and reducing its complexity if triggered by such an
> >> investigation.
> >
> > Hi Alex
> >
> > I also agree with the summary that Lukas gave here. From my experience
> > the safety system are always guarded by an external flow monitor (e.g. a
> > watchdog) that triggers in case the safety relevant workloads slows down
> > or block (for any reason); given this condition of use, a system that
> > goes into the panic state is always safe, since the watchdog would
> > trigger and drive the system automatically into safe state.
> > So I also don't see a clear advantage of having pkill_on_warn();
> > actually on the flip side it seems to me that such feature could
> > introduce more risk, as it kills only the threads of the process that
> > caused the kernel warning whereas the other processes are trusted to
> > run on a weaker Kernel (does killing the threads of the process that
> > caused the kernel warning always fix the Kernel condition that lead to
> > the warning?)
>
> Lukas, Gabriele, Robert,
> Thanks for showing this from the safety point of view.
>
> The part about believing in panic() functionality is amazing :)
> Yes, safety critical systems depend on the robust ability to restart.
>

Well, there is really a lot of thought and willingness for engineering
effort to address the fact there must be high confidence that the
shutdown with panic() really works.

The proper start and restart of the kernel is actually another
issue... but there various sanity checks are possible before the
system switches into a mode that could potentially harm people (cause
physical damage, directly or indirectly).

Lukas

> Best regards,
> Alexander

Alexander Popov Nov. 16, 2021, 8:34 a.m. UTC | #14

On 16.11.2021 09:37, Christophe Leroy wrote:
> Le 15/11/2021 à 17:06, Steven Rostedt a écrit :
>> On Mon, 15 Nov 2021 14:59:57 +0100
>> Lukas Bulwahn <lukas.bulwahn@gmail.com> wrote:
>>
>>> 1. Allow a reasonably configured kernel to boot and run with
>>> panic_on_warn set. Warnings should only be raised when something is
>>> not configured as the developers expect it or the kernel is put into a
>>> state that generally is _unexpected_ and has been exposed little to
>>> the critical thought of the developer, to testing efforts and use in
>>> other systems in the wild. Warnings should not be used for something
>>> informative, which still allows the kernel to continue running in a
>>> proper way in a generally expected environment. Up to my knowledge,
>>> there are some kernels in production that run with panic_on_warn; so,
>>> IMHO, this requirement is generally accepted (we might of course
>>
>> To me, WARN*() is the same as BUG*(). If it gets hit, it's a bug in the
>> kernel and needs to be fixed. I have several WARN*() calls in my code, and
>> it's all because the algorithms used is expected to prevent the condition
>> in the warning from happening. If the warning triggers, it means either that
>> the algorithm is wrong or my assumption about the algorithm is wrong. In
>> either case, the kernel needs to be updated. All my tests fail if a WARN*()
>> gets hit (anywhere in the kernel, not just my own).
>>
>> After reading all the replies and thinking about this more, I find the
>> pkill_on_warning actually worse than not doing anything. If you are
>> concerned about exploits from warnings, the only real solution is a
>> panic_on_warning. Yes, it brings down the system, but really, it has to be
>> brought down anyway, because it is in need of a kernel update.
>>
> 
> We also have LIVEPATCH to avoid bringing down the system for a kernel
> update, don't we ? So I wouldn't expect bringing down a vital system
> just for a WARN.

Hello Christophe,

I would say that different systems have different requirements.
Not every Linux-based system needs live patching (it also has own limitations).

That's why I proposed a sysctl and didn't change the default kernel behavior.

> As far as I understand from
> https://www.kernel.org/doc/html/latest/process/deprecated.html#bug-and-bug-on,
> WARN() and WARN_ON() are meant to deal with those situations as
> gracefull as possible, allowing the system to continue running the best
> it can until a human controled action is taken.

I can't agree here. There is a very strong push against adding BUG*() to the 
kernel source code. So there are a lot of cases when WARN*() is used for severe 
problems because kernel developers just don't have other options.

Currently, it looks like there is no consistent error handling policy in the kernel.

> So I'd expect the WARN/WARN_ON to be handled and I agree that that
> pkill_on_warning seems dangerous and unrelevant, probably more dangerous
> than doing nothing, especially as the WARN may trigger for a reason
> which has nothing to do with the running thread.

Sorry, I see a contradiction.
If killing a process hitting a kernel warning is "dangerous and unrelevant",
why killing a process on a kernel oops is fine? That's strange.

Linus calls that behavior "fairly benign" here: 
http://lkml.iu.edu/hypermail/linux/kernel/1610.0/01217.html

Best regards,
Alexander

Petr Mladek Nov. 16, 2021, 8:41 a.m. UTC | #15

On Tue 2021-11-16 10:52:39, Alexander Popov wrote:
> On 15.11.2021 18:51, Gabriele Paoloni wrote:
> > On 15/11/2021 14:59, Lukas Bulwahn wrote:
> > > On Sat, Nov 13, 2021 at 7:14 PM Alexander Popov <alex.popov@linux.com> wrote:
> > > > On 13.11.2021 00:26, Linus Torvalds wrote:
> > > > > On Fri, Nov 12, 2021 at 10:52 AM Alexander Popov <alex.popov@linux.com> wrote:
> > > > Killing the process that hit a kernel warning complies with the Fail-Fast
> > > > principle [1]. pkill_on_warn sysctl allows the kernel to stop the process when
> > > > the **first signs** of wrong behavior are detected.
> > > > 
> > > In summary, I am not supporting pkill_on_warn. I would support the
> > > other points I mentioned above, i.e., a good enforced policy for use
> > > of warn() and any investigation to understand the complexity of
> > > panic() and reducing its complexity if triggered by such an
> > > investigation.
> > 
> > Hi Alex
> > 
> > I also agree with the summary that Lukas gave here. From my experience
> > the safety system are always guarded by an external flow monitor (e.g. a
> > watchdog) that triggers in case the safety relevant workloads slows down
> > or block (for any reason); given this condition of use, a system that
> > goes into the panic state is always safe, since the watchdog would
> > trigger and drive the system automatically into safe state.
> > So I also don't see a clear advantage of having pkill_on_warn();
> > actually on the flip side it seems to me that such feature could
> > introduce more risk, as it kills only the threads of the process that
> > caused the kernel warning whereas the other processes are trusted to
> > run on a weaker Kernel (does killing the threads of the process that
> > caused the kernel warning always fix the Kernel condition that lead to
> > the warning?)
> 
> Lukas, Gabriele, Robert,
> Thanks for showing this from the safety point of view.
> 
> The part about believing in panic() functionality is amazing :)

Nothing is 100% reliable.

With printk() maintainer hat on, the current panic() implementation
is less reliable because it tries hard to provide some debugging
information, for example, error message, backtrace, registry,
flush pending messages on console, crashdump.

See panic() implementation, the reboot is done by emergency_restart().
The rest is about duping the information.

Well, the information is important. Otherwise, it is really hard to
fix the problem.

From my experience, especially the access to consoles is not fully
safe. The reliability might improve a lot when a lockless console
is used. I guess that using non-volatile memory for the log buffer
might be even more reliable.

I am not familiar with the code under emergency_restart(). I am not
sure how reliable it is.

> Yes, safety critical systems depend on the robust ability to restart.

If I wanted to implement a super-reliable panic() I would
use some external device that would cause power-reset when
the watched device is not responding.

Best Regards,
Petr


PS: I do not believe much into the pkill approach as well.

    It is similar to OOM killer. And I always had to restart the
    system when it was triggered.

    Also kernel is not prepared for the situation that an external
    code kills a kthread. And kthreads are used by many subsystems
    to handle work that has to be done asynchronously and/or in
    process context. And I guess that kthreads are non-trivial
    source of WARN().

Lukas Bulwahn Nov. 16, 2021, 8:57 a.m. UTC | #16

On Tue, Nov 16, 2021 at 7:37 AM Christophe Leroy
<christophe.leroy@csgroup.eu> wrote:
>
>
>
> Le 15/11/2021 à 17:06, Steven Rostedt a écrit :
> > On Mon, 15 Nov 2021 14:59:57 +0100
> > Lukas Bulwahn <lukas.bulwahn@gmail.com> wrote:
> >
> >> 1. Allow a reasonably configured kernel to boot and run with
> >> panic_on_warn set. Warnings should only be raised when something is
> >> not configured as the developers expect it or the kernel is put into a
> >> state that generally is _unexpected_ and has been exposed little to
> >> the critical thought of the developer, to testing efforts and use in
> >> other systems in the wild. Warnings should not be used for something
> >> informative, which still allows the kernel to continue running in a
> >> proper way in a generally expected environment. Up to my knowledge,
> >> there are some kernels in production that run with panic_on_warn; so,
> >> IMHO, this requirement is generally accepted (we might of course
> >
> > To me, WARN*() is the same as BUG*(). If it gets hit, it's a bug in the
> > kernel and needs to be fixed. I have several WARN*() calls in my code, and
> > it's all because the algorithms used is expected to prevent the condition
> > in the warning from happening. If the warning triggers, it means either that
> > the algorithm is wrong or my assumption about the algorithm is wrong. In
> > either case, the kernel needs to be updated. All my tests fail if a WARN*()
> > gets hit (anywhere in the kernel, not just my own).
> >
> > After reading all the replies and thinking about this more, I find the
> > pkill_on_warning actually worse than not doing anything. If you are
> > concerned about exploits from warnings, the only real solution is a
> > panic_on_warning. Yes, it brings down the system, but really, it has to be
> > brought down anyway, because it is in need of a kernel update.
> >
>
> We also have LIVEPATCH to avoid bringing down the system for a kernel
> update, don't we ? So I wouldn't expect bringing down a vital system
> just for a WARN.
>
> As far as I understand from
> https://www.kernel.org/doc/html/latest/process/deprecated.html#bug-and-bug-on,
> WARN() and WARN_ON() are meant to deal with those situations as
> gracefull as possible, allowing the system to continue running the best
> it can until a human controled action is taken.
>
> So I'd expect the WARN/WARN_ON to be handled and I agree that that
> pkill_on_warning seems dangerous and unrelevant, probably more dangerous
> than doing nothing, especially as the WARN may trigger for a reason
> which has nothing to do with the running thread.
>

Christophe,

I agree with a reasonable goal that WARN() should allow users "to deal
with those situations as gracefull as possible, allowing the system to
continue running the best it can until a human controled action is
taken."

However, that makes me wonder even more: what does the system after a
WARN() invocation still need to provide as properly working
functionality, so that the human can take action, and how can the
kernel indicate to the whole user applications that a certain
functionality is not working anymore and how adaptive can those user
application really be here? Making that explicit for every WARN()
invocation seems to be tricky and probably also quite error-prone. So,
in the end, after a WARN(), you end up running a system where you have
this uncomfortable feeling of a running system where some things work
and some things do not and it might be insecure (the whole system
security concept is invalidated, because security features do not
work, security holes are opened etc.) or other surprises happen.

The panic_on_warn implements a simple policy of that "run as graceful
as possible": We assume stopping the kernel is _graceful_, and we just
assume that the functionality "panic shuts down the system" still
works properly after any WARN() invocation. Once the system is shut
down, the human can take action and switch it into some (remote)
diagnostic mode for further analysis and repair.

I am wondering if that policy and that assumption holds for all WARN()
invocations in the kernel? I would hope that we can answer this
question, which is much simpler than getting the precise answer on
"what as graceful as possible actually means".

Lukas

Alexander Popov Nov. 16, 2021, 9:12 a.m. UTC | #17

On 16.11.2021 01:06, Kees Cook wrote:
> Hmm, yes. What it originally boiled down to, which is why Linus first
> objected to BUG(), was that we don't know what other parts of the system
> have been disrupted. The best example is just that of locking: if we
> BUG() or do_exit() in the middle of holding a lock, we'll wreck whatever
> subsystem that was attached to. Without a deterministic system state
> unwinder, there really isn't a "safe" way to just stop a kernel thread.
> 
> With this pkill_on_warn, we avoid the BUG problem (since the thread of
> execution continues and stops at an 'expected' place: the signal
> handler).
> 
> However, now we have the newer objection from Linus, which is one of
> attribution: the WARN might be hit during an "unrelated" thread of
> execution and "current" gets blamed, etc. And beyond that, if we take
> down a portion of userspace, what in userspace may be destabilized? In
> theory, we get a case where any required daemons would be restarted by
> init, but that's not "known".
> 
> The safest version of this I can think of is for processes to opt into
> this mitigation. That would also cover the "special cases" we've seen
> exposed too. i.e. init and kthreads would not opt in.
> 
> However, that's a lot to implement when Marco's tracing suggestion might
> be sufficient and policy could be entirely implemented in userspace. It
> could be as simple as this (totally untested):

I don't think that this userspace warning handling can work as pkill_on_warn.

1. The kernel code execution continues after WARN_ON(), it will not wait some 
userspace daemon that is polling trace events. That's not different from 
ignoring and having all negative effects after WARN_ON().

2. This userspace policy will miss WARN_ON_ONCE(), WARN_ONCE() and 
WARN_TAINT_ONCE() after the first hit.

Oh, wait...
I got a crazy idea that may bring more consistency in the error handling mess.

What if the Linux kernel had a LSM module responsible for error handling policy?
That would require adding LSM hooks to BUG*(), WARN*(), KERN_EMERG, etc.
In such LSM policy we can decide immediately how to react on the kernel error.
We can even decide depending on the subsystem and things like that.

(idea for brainstorming)

Best regards,
Alexander

Lukas Bulwahn Nov. 16, 2021, 9:19 a.m. UTC | #18

On Tue, Nov 16, 2021 at 9:41 AM Petr Mladek <pmladek@suse.com> wrote:
>
> On Tue 2021-11-16 10:52:39, Alexander Popov wrote:
> > On 15.11.2021 18:51, Gabriele Paoloni wrote:
> > > On 15/11/2021 14:59, Lukas Bulwahn wrote:
> > > > On Sat, Nov 13, 2021 at 7:14 PM Alexander Popov <alex.popov@linux.com> wrote:
> > > > > On 13.11.2021 00:26, Linus Torvalds wrote:
> > > > > > On Fri, Nov 12, 2021 at 10:52 AM Alexander Popov <alex.popov@linux.com> wrote:
> > > > > Killing the process that hit a kernel warning complies with the Fail-Fast
> > > > > principle [1]. pkill_on_warn sysctl allows the kernel to stop the process when
> > > > > the **first signs** of wrong behavior are detected.
> > > > >
> > > > In summary, I am not supporting pkill_on_warn. I would support the
> > > > other points I mentioned above, i.e., a good enforced policy for use
> > > > of warn() and any investigation to understand the complexity of
> > > > panic() and reducing its complexity if triggered by such an
> > > > investigation.
> > >
> > > Hi Alex
> > >
> > > I also agree with the summary that Lukas gave here. From my experience
> > > the safety system are always guarded by an external flow monitor (e.g. a
> > > watchdog) that triggers in case the safety relevant workloads slows down
> > > or block (for any reason); given this condition of use, a system that
> > > goes into the panic state is always safe, since the watchdog would
> > > trigger and drive the system automatically into safe state.
> > > So I also don't see a clear advantage of having pkill_on_warn();
> > > actually on the flip side it seems to me that such feature could
> > > introduce more risk, as it kills only the threads of the process that
> > > caused the kernel warning whereas the other processes are trusted to
> > > run on a weaker Kernel (does killing the threads of the process that
> > > caused the kernel warning always fix the Kernel condition that lead to
> > > the warning?)
> >
> > Lukas, Gabriele, Robert,
> > Thanks for showing this from the safety point of view.
> >
> > The part about believing in panic() functionality is amazing :)
>
> Nothing is 100% reliable.
>
> With printk() maintainer hat on, the current panic() implementation
> is less reliable because it tries hard to provide some debugging
> information, for example, error message, backtrace, registry,
> flush pending messages on console, crashdump.
>
> See panic() implementation, the reboot is done by emergency_restart().
> The rest is about duping the information.
>
> Well, the information is important. Otherwise, it is really hard to
> fix the problem.
>
> From my experience, especially the access to consoles is not fully
> safe. The reliability might improve a lot when a lockless console
> is used. I guess that using non-volatile memory for the log buffer
> might be even more reliable.
>
> I am not familiar with the code under emergency_restart(). I am not
> sure how reliable it is.
>
> > Yes, safety critical systems depend on the robust ability to restart.
>
> If I wanted to implement a super-reliable panic() I would
> use some external device that would cause power-reset when
> the watched device is not responding.
>

Petr, that is basically the common system design taken.

The whole challenge then remains to show that:

Once panic() was invoked, the watched device does not signal being
alive unintentionally, while the panic() is stuck in its shutdown
routines. That requires having a panic() or other shutdown routine
that still reliably can do something that the kernel routine that
makes the watched device signal does not signal anymore.


Lukas

> Best Regards,
> Petr
>
>
> PS: I do not believe much into the pkill approach as well.
>
>     It is similar to OOM killer. And I always had to restart the
>     system when it was triggered.
>
>     Also kernel is not prepared for the situation that an external
>     code kills a kthread. And kthreads are used by many subsystems
>     to handle work that has to be done asynchronously and/or in
>     process context. And I guess that kthreads are non-trivial
>     source of WARN().

James Bottomley Nov. 16, 2021, 1:07 p.m. UTC | #19

On Mon, 2021-11-15 at 14:06 -0800, Kees Cook wrote:
> On Mon, Nov 15, 2021 at 11:06:49AM -0500, Steven Rostedt wrote:
> > On Mon, 15 Nov 2021 14:59:57 +0100
> > Lukas Bulwahn <lukas.bulwahn@gmail.com> wrote:
> > 
> > > 1. Allow a reasonably configured kernel to boot and run with
> > > panic_on_warn set. Warnings should only be raised when something
> > > is not configured as the developers expect it or the kernel is
> > > put into a state that generally is _unexpected_ and has been
> > > exposed little to the critical thought of the developer, to
> > > testing efforts and use in other systems in the wild. Warnings
> > > should not be used for something informative, which still allows
> > > the kernel to continue running in a proper way in a generally
> > > expected environment. Up to my knowledge, there are some kernels
> > > in production that run with panic_on_warn; so, IMHO, this
> > > requirement is generally accepted (we might of course
> > 
> > To me, WARN*() is the same as BUG*(). If it gets hit, it's a bug in
> > the kernel and needs to be fixed. I have several WARN*() calls in
> > my code, and it's all because the algorithms used is expected to
> > prevent the condition in the warning from happening. If the warning
> > triggers, it means either that the algorithm is wrong or my
> > assumption about the algorithm is wrong. In either case, the kernel
> > needs to be updated. All my tests fail if a WARN*() gets hit
> > (anywhere in the kernel, not just my own).
> > 
> > After reading all the replies and thinking about this more, I find
> > the pkill_on_warning actually worse than not doing anything. If you
> > are concerned about exploits from warnings, the only real solution
> > is a panic_on_warning. Yes, it brings down the system, but really,
> > it has to be brought down anyway, because it is in need of a kernel
> > update.
> 
> Hmm, yes. What it originally boiled down to, which is why Linus first
> objected to BUG(), was that we don't know what other parts of the
> system have been disrupted. The best example is just that of locking:
> if we BUG() or do_exit() in the middle of holding a lock, we'll wreck
> whatever subsystem that was attached to. Without a deterministic
> system state unwinder, there really isn't a "safe" way to just stop a
> kernel thread.

But this misses the real point: the majority of WARN conditions are in
device drivers checking expected device state against an internal state
model.  If this triggers it's a problem with the device not the thread,
so killing the thread is blaming the wrong party and making the
situation worse because it didn't do anything to address the actual
problem.

> With this pkill_on_warn, we avoid the BUG problem (since the thread
> of execution continues and stops at an 'expected' place: the signal
> handler).

And what about the unexpected state?

> However, now we have the newer objection from Linus, which is one of
> attribution: the WARN might be hit during an "unrelated" thread of
> execution and "current" gets blamed, etc. And beyond that, if we take
> down a portion of userspace, what in userspace may be destabilized?
> In theory, we get a case where any required daemons would be
> restarted by init, but that's not "known".
> 
> The safest version of this I can think of is for processes to opt
> into this mitigation. That would also cover the "special cases" we've
> seen exposed too. i.e. init and kthreads would not opt in.
> 
> However, that's a lot to implement when Marco's tracing suggestion
> might be sufficient and policy could be entirely implemented in
> userspace. It could be as simple as this (totally untested):

Really, no, this is precisely wrong thinking.  If the condition were
recoverable it wouldn't result in a WARN.  There are some WARNs where
we think the condition is unexpected enough not to bother adding error
handling (we need these reporting so we know that the assumption was
wrong), but for most if there were a way to handle it we'd have built
it into the usual error flow.  What WARN means is that an unexpected
condition occurred which means the kernel itself is in an unknown
state.  You can't recover from that by killing and restarting random
stuff, you have to reinitialize to a known state (i.e. reset the
system).  Some of the reason we do WARN instead of BUG is that we
believe the state contamination is limited and if you're careful the
system can continue in a degraded state if the user wants to accept the
risk.  Thinking the user can handle the state reset locally by some
preset policy is pure fantasy: if we didn't know how to fix it at the
point it occurred, why would something far away from the action when
most of the information has been lost have a better chance?

Your only policy choices when hitting WARN are

   1. Accept the risk and continue degraded operation, or
   2. reset the system to a known good state.

James

James Bottomley Nov. 16, 2021, 1:20 p.m. UTC | #20

On Tue, 2021-11-16 at 09:41 +0100, Petr Mladek wrote:
[...]
> If I wanted to implement a super-reliable panic() I would
> use some external device that would cause power-reset when
> the watched device is not responding.

They're called watchdog timers.  We have a whole subsystem full of
them:

drivers/watchdog

We used them in old cluster HA systems to guarantee successful recovery
of shared state from contaminated cluster members, but I think they'd
serve the reliable panic need equally well.  Most server class systems
today have them built in (on the BMC if they don't have a separate
mechanism), they're just not usually activated.

James

Kees Cook Nov. 16, 2021, 6:41 p.m. UTC | #21

On Tue, Nov 16, 2021 at 12:12:16PM +0300, Alexander Popov wrote:
> What if the Linux kernel had a LSM module responsible for error handling policy?
> That would require adding LSM hooks to BUG*(), WARN*(), KERN_EMERG, etc.
> In such LSM policy we can decide immediately how to react on the kernel error.
> We can even decide depending on the subsystem and things like that.

That would solve the "atomicity" issue the WARN tracepoint solution has,
and it would allow for very flexible userspace policy.

I actually wonder if the existing panic_on_* sites should serve as a
guide for where to put the hooks. The current sysctls could be replaced
by the hooks and a simple LSM.

Casey Schaufler Nov. 16, 2021, 7 p.m. UTC | #22

On 11/16/2021 10:41 AM, Kees Cook wrote:
> On Tue, Nov 16, 2021 at 12:12:16PM +0300, Alexander Popov wrote:
>> What if the Linux kernel had a LSM module responsible for error handling policy?
>> That would require adding LSM hooks to BUG*(), WARN*(), KERN_EMERG, etc.
>> In such LSM policy we can decide immediately how to react on the kernel error.
>> We can even decide depending on the subsystem and things like that.
> That would solve the "atomicity" issue the WARN tracepoint solution has,
> and it would allow for very flexible userspace policy.
>
> I actually wonder if the existing panic_on_* sites should serve as a
> guide for where to put the hooks. The current sysctls could be replaced
> by the hooks and a simple LSM.

Do you really want to make error handling a "security" issue?
If you add security_bug(), security_warn_on() and the like
you're begging that they be included in SELinux (AppArmor) policy.
BPF, too, come to think of it. Is that what you want?

Kees Cook Nov. 18, 2021, 5:32 p.m. UTC | #23

On Tue, Nov 16, 2021 at 11:00:23AM -0800, Casey Schaufler wrote:
> On 11/16/2021 10:41 AM, Kees Cook wrote:
> > On Tue, Nov 16, 2021 at 12:12:16PM +0300, Alexander Popov wrote:
> > > What if the Linux kernel had a LSM module responsible for error handling policy?
> > > That would require adding LSM hooks to BUG*(), WARN*(), KERN_EMERG, etc.
> > > In such LSM policy we can decide immediately how to react on the kernel error.
> > > We can even decide depending on the subsystem and things like that.
> > That would solve the "atomicity" issue the WARN tracepoint solution has,
> > and it would allow for very flexible userspace policy.
> > 
> > I actually wonder if the existing panic_on_* sites should serve as a
> > guide for where to put the hooks. The current sysctls could be replaced
> > by the hooks and a simple LSM.
> 
> Do you really want to make error handling a "security" issue?
> If you add security_bug(), security_warn_on() and the like
> you're begging that they be included in SELinux (AppArmor) policy.
> BPF, too, come to think of it. Is that what you want?

Yeah, that is what I was thinking. This would give the LSM a view into
kernel state, which seems a reasonable thing to do. If system integrity
is compromised, an LSM may want to stop trusting things.

A dedicated error-handling LSM could be added for those hooks that
implemented the existing default panic_on_* sysctls, and could expand on
that logic for other actions.

Casey Schaufler Nov. 18, 2021, 6:30 p.m. UTC | #24

On 11/18/2021 9:32 AM, Kees Cook wrote:
> On Tue, Nov 16, 2021 at 11:00:23AM -0800, Casey Schaufler wrote:
>> On 11/16/2021 10:41 AM, Kees Cook wrote:
>>> On Tue, Nov 16, 2021 at 12:12:16PM +0300, Alexander Popov wrote:
>>>> What if the Linux kernel had a LSM module responsible for error handling policy?
>>>> That would require adding LSM hooks to BUG*(), WARN*(), KERN_EMERG, etc.
>>>> In such LSM policy we can decide immediately how to react on the kernel error.
>>>> We can even decide depending on the subsystem and things like that.
>>> That would solve the "atomicity" issue the WARN tracepoint solution has,
>>> and it would allow for very flexible userspace policy.
>>>
>>> I actually wonder if the existing panic_on_* sites should serve as a
>>> guide for where to put the hooks. The current sysctls could be replaced
>>> by the hooks and a simple LSM.
>> Do you really want to make error handling a "security" issue?
>> If you add security_bug(), security_warn_on() and the like
>> you're begging that they be included in SELinux (AppArmor) policy.
>> BPF, too, come to think of it. Is that what you want?
> Yeah, that is what I was thinking. This would give the LSM a view into
> kernel state, which seems a reasonable thing to do. If system integrity
> is compromised, an LSM may want to stop trusting things.

How are you planning to communicate the security relevance of the
warning to the LSM? I don't think that __FILE__, __LINE__ or __func__
is great information to base security policy on. Nor is a backtrace.

> A dedicated error-handling LSM could be added for those hooks that
> implemented the existing default panic_on_* sysctls, and could expand on
> that logic for other actions.

I can see having an interface like LSM for choosing a bug/warn policy.
I worry about expanding the LSM hook list for a case where I would
hope no existing LSM would use them, and the new LSM doesn't use any
of the existing hooks.

Kees Cook Nov. 18, 2021, 8:29 p.m. UTC | #25

On Thu, Nov 18, 2021 at 10:30:32AM -0800, Casey Schaufler wrote:
> On 11/18/2021 9:32 AM, Kees Cook wrote:
> > On Tue, Nov 16, 2021 at 11:00:23AM -0800, Casey Schaufler wrote:
> > > On 11/16/2021 10:41 AM, Kees Cook wrote:
> > > > On Tue, Nov 16, 2021 at 12:12:16PM +0300, Alexander Popov wrote:
> > > > > What if the Linux kernel had a LSM module responsible for error handling policy?
> > > > > That would require adding LSM hooks to BUG*(), WARN*(), KERN_EMERG, etc.
> > > > > In such LSM policy we can decide immediately how to react on the kernel error.
> > > > > We can even decide depending on the subsystem and things like that.
> > > > That would solve the "atomicity" issue the WARN tracepoint solution has,
> > > > and it would allow for very flexible userspace policy.
> > > > 
> > > > I actually wonder if the existing panic_on_* sites should serve as a
> > > > guide for where to put the hooks. The current sysctls could be replaced
> > > > by the hooks and a simple LSM.
> > > Do you really want to make error handling a "security" issue?
> > > If you add security_bug(), security_warn_on() and the like
> > > you're begging that they be included in SELinux (AppArmor) policy.
> > > BPF, too, come to think of it. Is that what you want?
> > Yeah, that is what I was thinking. This would give the LSM a view into
> > kernel state, which seems a reasonable thing to do. If system integrity
> > is compromised, an LSM may want to stop trusting things.
> 
> How are you planning to communicate the security relevance of the
> warning to the LSM? I don't think that __FILE__, __LINE__ or __func__
> is great information to base security policy on. Nor is a backtrace.

I think that would be part of the design proposal. Initially, the known
parts are "warn or bug" and "pid".

> > A dedicated error-handling LSM could be added for those hooks that
> > implemented the existing default panic_on_* sysctls, and could expand on
> > that logic for other actions.
> 
> I can see having an interface like LSM for choosing a bug/warn policy.
> I worry about expanding the LSM hook list for a case where I would
> hope no existing LSM would use them, and the new LSM doesn't use any
> of the existing hooks.

Yeah, I can see that, though we've got a history of the "specialized"
hooks getting used by other LSMs. (e.g. loadpin's stuff got hooked up to
other LSMs, etc.)

Marco Elver Nov. 20, 2021, 12:17 p.m. UTC | #26

On Mon, Nov 15, 2021 at 02:06PM -0800, Kees Cook wrote:
[...]
> However, that's a lot to implement when Marco's tracing suggestion might
> be sufficient and policy could be entirely implemented in userspace. It
> could be as simple as this (totally untested):
[...]
> 
> Marco, is this the full version of monitoring this from the userspace
> side?

Sorry I completely missed this email (I somehow wasn't Cc'd... I just
saw it by chance re-reading this thread).

I've sent a patch to add WARN:

	https://lkml.kernel.org/r/20211115085630.1756817-1-elver@google.com

Not sure how useful BUG is, but I have no objection to it also being
traced if you think it's useful.

(I added it to kernel/panic.c, because lib/bug.c requires
CONFIG_GENERIC_BUG.)

> 	perf record -e error_report:error_report_end

I think userspace would want something other than perf tool to handle it
of course.  There are several options:

	1. Open trace pipe to be notified (/sys/kernel/tracing/trace_pipe).
	   This already includes the pid.

	2. As you suggest, use perf events globally (but the handling
	   would be done by some system process).

	3. As of 5.13 there's actually a new perf feature to
	   synchronously SIGTRAP the exact task where an event occurred
	   (see perf_event_attr::sigtrap). This would very closely mimic
	   pkill_on_warn (because the SIGTRAP is synchronous), but lets the
	   process being SIGTRAP'd decide what to do. Not sure how to
	   deploy this though, because a) only root user can create this
	   perf event (because exclude_kernel=0), and b) sigtrap perf
	   events deliberately won't propagate beyond an exec
	   (must remove_on_exec=1 if sigtrap=1) because who knows if
	   the exec'd process has the right SIGTRAP handler.

I think #3 is hard to deploy right, but below is an example program I
played with.

Thanks,
-- Marco

------ >8 ------

#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <linux/perf_event.h>
#include <signal.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

static void sigtrap_handler(int signum, siginfo_t *info, void *ucontext)
{
	// FIXME: check event is error_report_end
	printf("Kernel error in this task!\n");
}
static void generate_warning(void)
{
	... do something to generate a warning ...
}
int main()
{
	struct perf_event_attr attr = {
		.type		= PERF_TYPE_TRACEPOINT,
		.size		= sizeof(attr),
		.config		= 189, // FIXME: error_report_end
		.sample_period	= 1,
		.inherit	= 1, /* Children inherit events ... */
		.remove_on_exec = 1, /* Required by sigtrap. */
		.sigtrap	= 1, /* Request synchronous SIGTRAP on event. */
		.sig_data	= 189, /* FIXME: use to identify error_report_end */
	};
	struct sigaction action = {};
	struct sigaction oldact;
	int fd;
	action.sa_flags = SA_SIGINFO | SA_NODEFER;
	action.sa_sigaction = sigtrap_handler;
	sigemptyset(&action.sa_mask);
	assert(sigaction(SIGTRAP, &action, &oldact) == 0);
	fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, PERF_FLAG_FD_CLOEXEC);
	assert(fd != -1);
	sleep(5); /* Try to generate a warning from elsewhere, nothing will be printed. */
	generate_warning(); /* Warning from this process. */
	sigaction(SIGTRAP, &oldact, NULL);
	close(fd);
	return 0;
}

Steven Rostedt Nov. 22, 2021, 4:21 p.m. UTC | #27

On Sat, 20 Nov 2021 13:17:08 +0100
Marco Elver <elver@google.com> wrote:


> I think userspace would want something other than perf tool to handle it
> of course.  There are several options:
> 
> 	1. Open trace pipe to be notified (/sys/kernel/tracing/trace_pipe).
> 	   This already includes the pid.

I would suggest using /sys/kernel/tracing/per_cpu/cpu*/trace_pipe_raw

and use libtracefs[1] to read it.

-- Steve

[1] https://git.kernel.org/pub/scm/libs/libtrace/libtracefs.git/

[v2,0/2] Introduce the pkill_on_warn parameter

Commit Message

Comments

Patch