[RFC,v1,05/26] printk_safe: externalize printk_context

Message ID	20191018094304.37056-6-glider@google.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <SRS0=Fj1U=YL=kvack.org=owner-linux-mm@kernel.org> DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 9E3F720820 Date: Fri, 18 Oct 2019 11:42:43 +0200 In-Reply-To: <20191018094304.37056-1-glider@google.com> Message-Id: <20191018094304.37056-6-glider@google.com> Mime-Version: 1.0 References: <20191018094304.37056-1-glider@google.com> Subject: [PATCH RFC v1 05/26] printk_safe: externalize printk_context From: glider@google.com To: Alexander Potapenko <glider@google.com> Cc: Vegard Nossum <vegard.nossum@oracle.com>, Petr Mladek <pmladek@suse.com>, Sergey Senozhatsky <sergey.senozhatsky@gmail.com>, Steven Rostedt <rostedt@goodmis.org>, Dmitry Vyukov <dvyukov@google.com>, linux-mm@kvack.org Content-Type: text/plain; charset="UTF-8" Sender: owner-linux-mm@kvack.org Precedence: bulk
Series	Add KernelMemorySanitizer infrastructure \| expand [RFC,v1,00/26] Add KernelMemorySanitizer infrastructure [RFC,v1,01/26] stackdepot: check depot_index before accessing the stack slab [RFC,v1,02/26] stackdepot: prevent Clang from optimizing away stackdepot_memcmp() [RFC,v1,03/26] kasan: stackdepot: move filter_irq_stacks() to stackdepot.c [RFC,v1,04/26] stackdepot: reserve 5 extra bits in depot_stack_handle_t [RFC,v1,05/26] printk_safe: externalize printk_context [RFC,v1,06/26] kasan: compiler.h: rename __no_kasan_or_inline into __no_memory_tool_or_inline [RFC,v1,07/26] kmsan: add ReST documentation [RFC,v1,08/26] kmsan: gfp: introduce __GFP_NO_KMSAN_SHADOW [RFC,v1,09/26] kmsan: introduce __no_sanitize_memory and __SANITIZE_MEMORY__ [RFC,v1,10/26] kmsan: reduce vmalloc space [RFC,v1,11/26] kmsan: add KMSAN runtime [RFC,v1,12/26] kmsan: x86: sync metadata pages on page fault [RFC,v1,13/26] kmsan: add tests for KMSAN [RFC,v1,14/26] kmsan: make READ_ONCE_TASK_STACK() return initialized values [RFC,v1,15/26] kmsan: Kconfig changes to disable options incompatible with KMSAN [RFC,v1,16/26] kmsan: Changing existing files to enable KMSAN builds [RFC,v1,17/26] kmsan: disable KMSAN instrumentation for certain kernel parts [RFC,v1,18/26] kmsan: mm: call KMSAN hooks from SLUB code [RFC,v1,19/26] kmsan: call KMSAN hooks where needed [RFC,v1,20/26] kmsan: disable instrumentation of certain functions [RFC,v1,21/26] kmsan: unpoison \|tlb\| in arch_tlb_gather_mmu() [RFC,v1,22/26] kmsan: use __msan_memcpy() where possible. [RFC,v1,23/26] kmsan: unpoisoning buffers from devices etc. [RFC,v1,24/26] kmsan: hooks for copy_to_user() and friends [RFC,v1,25/26] kmsan: disable strscpy() optimization under KMSAN [RFC,v1,26/26] net: kasan: kmsan: support CONFIG_GENERIC_CSUM on x86, enable it for KASAN/KMSAN

Alexander Potapenko Oct. 18, 2019, 9:42 a.m. UTC

Petr Mladek suggested we use
  this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK
instead of checking the spinlock status in kmsan_pr_err()

This appears to be less intrusive, although we'll need to declare
printk_context in some printk header.

Signed-off-by: Alexander Potapenko <glider@google.com>
To: Alexander Potapenko <glider@google.com>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: linux-mm@kvack.org
---

Change-Id: Idb5e7c7cedf6ab6ba635a64423d58d702ad3eca3
---
 kernel/printk/printk_safe.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Petr Mladek Oct. 21, 2019, 9:09 a.m. UTC | #1

On Fri 2019-10-18 11:42:43, glider@google.com wrote:
> Petr Mladek suggested we use
>   this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK
> instead of checking the spinlock status in kmsan_pr_err()

I would like to understand why the check is needed.

My guess is that it prevents a infinite recursion.
Is this the case? It might be possible to debug this using
trace_printk().

It is important. If it is the recursion then there are
two possibilities how to prevent it. Either prevent
calling the recursive printk(). Or prevent calling
the memory checker recursive. I am not sure what makes
more sense.

Is printk() the only piece of code where you need to avoid
recursion? I wonder how it is prevented in the other cases.

> This appears to be less intrusive, although we'll need to declare
> printk_context in some printk header.

It is easier than the original approach. But the main motivation
is that it is more reliable. The spinlock is a global lock.
But it seems that it is enough to check the state of the local
CPU.

Finally, could you please CC me to the patch(es) that are using
this variable? I would actually prefer to be in CC of entire
patchsets so that I see the full context.

Best Regards,
Petr

Alexander Potapenko Oct. 23, 2019, 5:57 p.m. UTC | #2

On Mon, Oct 21, 2019 at 11:09 AM Petr Mladek <pmladek@suse.com> wrote:
>
> On Fri 2019-10-18 11:42:43, glider@google.com wrote:
> > Petr Mladek suggested we use
> >   this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK
> > instead of checking the spinlock status in kmsan_pr_err()
>
> I would like to understand why the check is needed.
>
> My guess is that it prevents a infinite recursion.
> Is this the case? It might be possible to debug this using
> trace_printk().
This indeed used to prevent infinite recursion, however looks like the
check isn't needed anymore.
When I remove it, KMSAN doesn't hang, probably because printk is doing
a better job at deadlock prevention.

What I'm seeing now is that e.g. in the following case:
  ptr = kmalloc(sizeof(int), GFP_KERNEL);
  if (ptr)
    pr_info("true\n");
  else
    pr_info("false\n");

KMSAN detects errors within pr_info(), namely in vprintk_store().
If I understand correctly, printing from that point causes printk to
use the per-cpu buffer, which is flushed once we're done with the
original printing:

[   58.984971][ T8390] BUG: KMSAN: uninit-value in
kmsan_handle_vprintk+0xa0/0xf0
...
[   59.061976][    C0] BUG: KMSAN: uninit-value in vsnprintf+0x3276/0x32b0
...
[   59.062457][    C0] BUG: KMSAN: uninit-value in format_decode+0x17f/0x1900
...
[   59.062961][    C0] BUG: KMSAN: uninit-value in format_decode+0x17f/0x1900
...
[   59.063338][    C0] Lost 6207 message(s)!

So it turns out we'd better disable reporting under logbuf_lock,
otherwise these lost messages will confuse the developers.
Because the first pr_info() isn't called by KMSAN itself, the tool
still needs a way to know it's running inside printk.

> It is important. If it is the recursion then there are
> two possibilities how to prevent it. Either prevent
> calling the recursive printk(). Or prevent calling
> the memory checker recursive. I am not sure what makes
> more sense.
>
> Is printk() the only piece of code where you need to avoid
> recursion? I wonder how it is prevented in the other cases.
>
> > This appears to be less intrusive, although we'll need to declare
> > printk_context in some printk header.
>
> It is easier than the original approach. But the main motivation
> is that it is more reliable. The spinlock is a global lock.
> But it seems that it is enough to check the state of the local
> CPU.
>
> Finally, could you please CC me to the patch(es) that are using
> this variable? I would actually prefer to be in CC of entire
> patchsets so that I see the full context.
>
> Best Regards,
> Petr

Alexander Potapenko Oct. 23, 2019, 6 p.m. UTC | #3

On Wed, Oct 23, 2019 at 7:57 PM Alexander Potapenko <glider@google.com> wrote:
>
> On Mon, Oct 21, 2019 at 11:09 AM Petr Mladek <pmladek@suse.com> wrote:
> >
> > On Fri 2019-10-18 11:42:43, glider@google.com wrote:
> > > Petr Mladek suggested we use
> > >   this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK
> > > instead of checking the spinlock status in kmsan_pr_err()
> >
> > I would like to understand why the check is needed.
> >
> > My guess is that it prevents a infinite recursion.
> > Is this the case? It might be possible to debug this using
> > trace_printk().
> This indeed used to prevent infinite recursion, however looks like the
> check isn't needed anymore.
> When I remove it, KMSAN doesn't hang, probably because printk is doing
> a better job at deadlock prevention.
>
> What I'm seeing now is that e.g. in the following case:
>   ptr = kmalloc(sizeof(int), GFP_KERNEL);
>   if (ptr)
>     pr_info("true\n");
>   else
>     pr_info("false\n");
>
> KMSAN detects errors within pr_info(), namely in vprintk_store().
> If I understand correctly, printing from that point causes printk to
> use the per-cpu buffer, which is flushed once we're done with the
> original printing:
>
> [   58.984971][ T8390] BUG: KMSAN: uninit-value in
> kmsan_handle_vprintk+0xa0/0xf0
> ...
> [   59.061976][    C0] BUG: KMSAN: uninit-value in vsnprintf+0x3276/0x32b0
> ...
> [   59.062457][    C0] BUG: KMSAN: uninit-value in format_decode+0x17f/0x1900
> ...
> [   59.062961][    C0] BUG: KMSAN: uninit-value in format_decode+0x17f/0x1900
> ...
> [   59.063338][    C0] Lost 6207 message(s)!
>
> So it turns out we'd better disable reporting under logbuf_lock,
> otherwise these lost messages will confuse the developers.
> Because the first pr_info() isn't called by KMSAN itself, the tool
> still needs a way to know it's running inside printk.
>
> > It is important. If it is the recursion then there are
> > two possibilities how to prevent it. Either prevent
> > calling the recursive printk(). Or prevent calling
> > the memory checker recursive. I am not sure what makes
> > more sense.
> >
> > Is printk() the only piece of code where you need to avoid
> > recursion? I wonder how it is prevented in the other cases.
> >
> > > This appears to be less intrusive, although we'll need to declare
> > > printk_context in some printk header.
> >
> > It is easier than the original approach. But the main motivation
> > is that it is more reliable. The spinlock is a global lock.
> > But it seems that it is enough to check the state of the local
> > CPU.
> >
> > Finally, could you please CC me to the patch(es) that are using
> > this variable? I would actually prefer to be in CC of entire
> > patchsets so that I see the full context.
I am very sorry about this. I'm going to re-send the patches soon and
CC everyone on every patch.
What I thought to be reducing the email burden on every reviewer
turned out to be really inconvenient.
> > Best Regards,
> > Petr
>
>
>
> --
> Alexander Potapenko
> Software Engineer
>
> Google Germany GmbH
> Erika-Mann-Straße, 33
> 80636 München
>
> Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
> Registergericht und -nummer: Hamburg, HRB 86891
> Sitz der Gesellschaft: Hamburg

Petr Mladek Oct. 24, 2019, 12:46 p.m. UTC | #4

On Wed 2019-10-23 19:57:39, Alexander Potapenko wrote:
> On Mon, Oct 21, 2019 at 11:09 AM Petr Mladek <pmladek@suse.com> wrote:
> >
> > On Fri 2019-10-18 11:42:43, glider@google.com wrote:
> > > Petr Mladek suggested we use
> > >   this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK
> > > instead of checking the spinlock status in kmsan_pr_err()
> >
> > I would like to understand why the check is needed.
> >
> > My guess is that it prevents a infinite recursion.
> > Is this the case? It might be possible to debug this using
> > trace_printk().
> This indeed used to prevent infinite recursion, however looks like the
> check isn't needed anymore.
> When I remove it, KMSAN doesn't hang, probably because printk is doing
> a better job at deadlock prevention.

Great.

> What I'm seeing now is that e.g. in the following case:
>   ptr = kmalloc(sizeof(int), GFP_KERNEL);
>   if (ptr)
>     pr_info("true\n");
>   else
>     pr_info("false\n");
> 
> KMSAN detects errors within pr_info(), namely in vprintk_store().
> If I understand correctly, printing from that point causes printk to
> use the per-cpu buffer, which is flushed once we're done with the
> original printing:
> 
> [   58.984971][ T8390] BUG: KMSAN: uninit-value in
> kmsan_handle_vprintk+0xa0/0xf0
> ...
> [   59.061976][    C0] BUG: KMSAN: uninit-value in vsnprintf+0x3276/0x32b0
> ...
> [   59.062457][    C0] BUG: KMSAN: uninit-value in format_decode+0x17f/0x1900
> ...
> [   59.062961][    C0] BUG: KMSAN: uninit-value in format_decode+0x17f/0x1900

Please, do you have an explanation where the uninitialized values come
from? Is it a false positive? Or is there really a bug how the
printk() parameters are passed?

> ...
> [   59.063338][    C0] Lost 6207 message(s)!

This message comes from report_message_lost(). It means that many
recursive messages did not fit into the limited per-CPU buffers.

> So it turns out we'd better disable reporting under logbuf_lock,
> otherwise these lost messages will confuse the developers.
> Because the first pr_info() isn't called by KMSAN itself, the tool
> still needs a way to know it's running inside printk.

OK. Well, I still would like to understand why the recursive messages
are printed at all. Is it a bug in the printk() code or a false positive?

There should be no recursive messages if we fix the bug or make
some annotations to prevent the false positives.

That said, I am not completely against disabling the recursive
messages. But it would be pity to hide some real bugs.

Best Regards,
Petr

Alexander Potapenko Oct. 28, 2019, 1:09 p.m. UTC | #5

On Thu, Oct 24, 2019 at 2:46 PM Petr Mladek <pmladek@suse.com> wrote:
>
> On Wed 2019-10-23 19:57:39, Alexander Potapenko wrote:
> > On Mon, Oct 21, 2019 at 11:09 AM Petr Mladek <pmladek@suse.com> wrote:
> > >
> > > On Fri 2019-10-18 11:42:43, glider@google.com wrote:
> > > > Petr Mladek suggested we use
> > > >   this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK
> > > > instead of checking the spinlock status in kmsan_pr_err()
> > >
> > > I would like to understand why the check is needed.
> > >
> > > My guess is that it prevents a infinite recursion.
> > > Is this the case? It might be possible to debug this using
> > > trace_printk().
> > This indeed used to prevent infinite recursion, however looks like the
> > check isn't needed anymore.
> > When I remove it, KMSAN doesn't hang, probably because printk is doing
> > a better job at deadlock prevention.
>
> Great.
>
> > What I'm seeing now is that e.g. in the following case:
> >   ptr = kmalloc(sizeof(int), GFP_KERNEL);
> >   if (ptr)
> >     pr_info("true\n");
> >   else
> >     pr_info("false\n");
> >
> > KMSAN detects errors within pr_info(), namely in vprintk_store().
> > If I understand correctly, printing from that point causes printk to
> > use the per-cpu buffer, which is flushed once we're done with the
> > original printing:
> >
> > [   58.984971][ T8390] BUG: KMSAN: uninit-value in
> > kmsan_handle_vprintk+0xa0/0xf0
> > ...
> > [   59.061976][    C0] BUG: KMSAN: uninit-value in vsnprintf+0x3276/0x32b0
> > ...
> > [   59.062457][    C0] BUG: KMSAN: uninit-value in format_decode+0x17f/0x1900
> > ...
> > [   59.062961][    C0] BUG: KMSAN: uninit-value in format_decode+0x17f/0x1900
>
> Please, do you have an explanation where the uninitialized values come
> from? Is it a false positive? Or is there really a bug how the
> printk() parameters are passed?
I believe these are true bugs.
The problem is, when we pass an uninitialized argument into printk(),
KMSAN will report an error every time this uninitialized argument is
used.
E.g. for an uninitialized format string there'll be at least
strlen(fmt) reports in format_decode(), followed by several reports in
vsnprintf().
Although these reports seem to be real, printing only the first of
them should be more than enough.
In the future we'll actually want KMSAN to check every printk()
argument (which will require parsing the format string to figure out
the arguments' lengths), but disable reports within printk.

> > ...
> > [   59.063338][    C0] Lost 6207 message(s)!
>
> This message comes from report_message_lost(). It means that many
> recursive messages did not fit into the limited per-CPU buffers.
>
> > So it turns out we'd better disable reporting under logbuf_lock,
> > otherwise these lost messages will confuse the developers.
> > Because the first pr_info() isn't called by KMSAN itself, the tool
> > still needs a way to know it's running inside printk.
>
> OK. Well, I still would like to understand why the recursive messages
> are printed at all. Is it a bug in the printk() code or a false positive?
>
> There should be no recursive messages if we fix the bug or make
> some annotations to prevent the false positives.
>
> That said, I am not completely against disabling the recursive
> messages. But it would be pity to hide some real bugs.
>
> Best Regards,
> Petr


--
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

Petr Mladek Oct. 29, 2019, 12:02 p.m. UTC | #6

On Mon 2019-10-28 14:09:29, Alexander Potapenko wrote:
> On Thu, Oct 24, 2019 at 2:46 PM Petr Mladek <pmladek@suse.com> wrote:
> > On Wed 2019-10-23 19:57:39, Alexander Potapenko wrote:
> > > What I'm seeing now is that e.g. in the following case:
> > >   ptr = kmalloc(sizeof(int), GFP_KERNEL);
> > >   if (ptr)
> > >     pr_info("true\n");
> > >   else
> > >     pr_info("false\n");
> > >
> > > KMSAN detects errors within pr_info(), namely in vprintk_store().
> > > If I understand correctly, printing from that point causes printk to
> > > use the per-cpu buffer, which is flushed once we're done with the
> > > original printing:
> > >
> > > [   58.984971][ T8390] BUG: KMSAN: uninit-value in
> > > kmsan_handle_vprintk+0xa0/0xf0
> > > ...
> > > [   59.061976][    C0] BUG: KMSAN: uninit-value in vsnprintf+0x3276/0x32b0
> > > ...
> > > [   59.062457][    C0] BUG: KMSAN: uninit-value in format_decode+0x17f/0x1900
> > > ...
> > > [   59.062961][    C0] BUG: KMSAN: uninit-value in format_decode+0x17f/0x1900
> >
> > Please, do you have an explanation where the uninitialized values come
> > from? Is it a false positive? Or is there really a bug how the
> > printk() parameters are passed?
> I believe these are true bugs.
> The problem is, when we pass an uninitialized argument into printk(),
> KMSAN will report an error every time this uninitialized argument is
> used.

I see, thanks for explanation.

> E.g. for an uninitialized format string there'll be at least
> strlen(fmt) reports in format_decode(), followed by several reports in
> vsnprintf().
> Although these reports seem to be real, printing only the first of
> them should be more than enough.

Isn't this a generic problem? I mean that uninitialized values
can be passed and used also in many other locations.

printk() is special because this problem causes infinite loop. But it
would make sense to report also any other problematic value only once.

> In the future we'll actually want KMSAN to check every printk()
> argument (which will require parsing the format string to figure out
> the arguments' lengths), but disable reports within printk.

What is the motivation for this, please?

It looks to me that you want to do very paranoid checks of variables
passed to printk()? Do you want to prevent printk() crashes? Or
do you want to make sure that printk() produces correct values?

From my POV, printk() is debugging tool. It is used to show values
that people are interested in. On one hand, it might make sense to warn
people that a particular value was not initalized. On the other hand,
printk() is not important for the kernel behavior. It just
reads values and does not affect any behavior.

I would like to understand how many printk-related code is
worth the effort.

Best Regards,
Petr

Alexander Potapenko Oct. 29, 2019, 12:45 p.m. UTC | #7

On Tue, Oct 29, 2019 at 1:02 PM Petr Mladek <pmladek@suse.com> wrote:
>
> On Mon 2019-10-28 14:09:29, Alexander Potapenko wrote:
> > On Thu, Oct 24, 2019 at 2:46 PM Petr Mladek <pmladek@suse.com> wrote:
> > > On Wed 2019-10-23 19:57:39, Alexander Potapenko wrote:
> > > > What I'm seeing now is that e.g. in the following case:
> > > >   ptr = kmalloc(sizeof(int), GFP_KERNEL);
> > > >   if (ptr)
> > > >     pr_info("true\n");
> > > >   else
> > > >     pr_info("false\n");
> > > >
> > > > KMSAN detects errors within pr_info(), namely in vprintk_store().
> > > > If I understand correctly, printing from that point causes printk to
> > > > use the per-cpu buffer, which is flushed once we're done with the
> > > > original printing:
> > > >
> > > > [   58.984971][ T8390] BUG: KMSAN: uninit-value in
> > > > kmsan_handle_vprintk+0xa0/0xf0
> > > > ...
> > > > [   59.061976][    C0] BUG: KMSAN: uninit-value in vsnprintf+0x3276/0x32b0
> > > > ...
> > > > [   59.062457][    C0] BUG: KMSAN: uninit-value in format_decode+0x17f/0x1900
> > > > ...
> > > > [   59.062961][    C0] BUG: KMSAN: uninit-value in format_decode+0x17f/0x1900
> > >
> > > Please, do you have an explanation where the uninitialized values come
> > > from? Is it a false positive? Or is there really a bug how the
> > > printk() parameters are passed?
> > I believe these are true bugs.
> > The problem is, when we pass an uninitialized argument into printk(),
> > KMSAN will report an error every time this uninitialized argument is
> > used.
>
> I see, thanks for explanation.
>
> > E.g. for an uninitialized format string there'll be at least
> > strlen(fmt) reports in format_decode(), followed by several reports in
> > vsnprintf().
> > Although these reports seem to be real, printing only the first of
> > them should be more than enough.
>
> Isn't this a generic problem? I mean that uninitialized values
> can be passed and used also in many other locations.
More or less, yes.
> printk() is special because this problem causes infinite loop. But it
> would make sense to report also any other problematic value only once.
Yes, printk is a bit different, because finding errors in it causes
other printk() calls which now don't cause infinite loops, but may
produce out-of-order error messages.

>
> > In the future we'll actually want KMSAN to check every printk()
> > argument (which will require parsing the format string to figure out
> > the arguments' lengths), but disable reports within printk.
>
> What is the motivation for this, please?
>
> It looks to me that you want to do very paranoid checks of variables
> passed to printk()? Do you want to prevent printk() crashes? Or
> do you want to make sure that printk() produces correct values?
Simply passing an uninitialized value to printk() may result in
cryptic error messages from printk guts, which may be harder to
understand without knowing how printk works.
The idea is to check the arguments right before printk, so that we can
print a really simple diagnostic message.
> From my POV, printk() is debugging tool. It is used to show values
> that people are interested in. On one hand, it might make sense to warn
> people that a particular value was not initalized. On the other hand,
> printk() is not important for the kernel behavior. It just
> reads values and does not affect any behavior.
>
> I would like to understand how many printk-related code is
> worth the effort.
I think overall you're right.
The feature I'm talking about isn't a critical part of KMSAN
functionality, so in order to keep the code simple I'd better drop it,
as long as we're able to report bugs when uninitialized memory is
passed to printk.

The only change to printk that I'll probably still have to make is to
initialize the result of vscnprintf() here:
https://github.com/google/kmsan/blob/master/kernel/printk/printk.c#L1921
Passing uninitialized data to vscnprintf() causes its result to also
be uninitialized, which causes an avalanche of new reports in
vprintk_store()
I'll send the updated patch (together with other KMSAN patches) your way today.


> Best Regards,
> Petr

[RFC,v1,05/26] printk_safe: externalize printk_context

Commit Message

Comments

Patch