diff mbox series

[v6,1/3] printk: Add line-buffered printk() API.

Message ID 1541165517-3557-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp (mailing list archive)
State New, archived
Headers show
Series [v6,1/3] printk: Add line-buffered printk() API. | expand

Commit Message

Tetsuo Handa Nov. 2, 2018, 1:31 p.m. UTC
Sometimes we want to print a whole line without being disturbed by
concurrent printk() from interrupts and/or other threads, for printk()
which does not end with '\n' can be disturbed.

Since mixed printk() output makes it hard to interpret, this patch
introduces API for line-buffered printk() output (so that we can make
sure that printk() ends with '\n').

Since functions introduced by this patch are merely wrapping
printk()/vprintk() calls in order to minimize possibility of using
"struct cont", it is safe to replace printk()/vprintk() with this API.
Since we want to remove "struct cont" eventually, we will try to remove
both "implicit printk() users who are expecting KERN_CONT behavior" and
"explicit pr_cont()/printk(KERN_CONT) users". Therefore, converting to
this API is recommended.

Details:

  A structure named "struct printk_buffer" is introduced for buffering
  up to LOG_LINE_MAX bytes of printk() output which did not end with '\n'.

  get_printk_buffer() tries to assign a "struct printk_buffer" from
  statically preallocated array. get_printk_buffer() returns NULL if
  all "struct printk_buffer" are in use, but the caller does not need to
  check for NULL.

  put_printk_buffer() flushes and releases the "struct printk_buffer".
  put_printk_buffer() must match corresponding get_printk_buffer() as with
  rcu_read_unlock() must match corresponding rcu_read_lock().

  Three functions vprintk_buffered(), printk_buffered() and
  flush_printk_buffer() are provided for using "struct printk_buffer".
  These are like vfprintf(), fprintf(), fflush() except that these receive
  "struct printk_buffer *" for the first argument.

  vprintk_buffered() and printk_buffered() fall back to vprintk() and
  printk() respectively if "struct printk_buffer *" argument is NULL.
  flush_printk_buffer() and put_printk_buffer() become no-op if
  "struct printk_buffer *" argument is NULL. Therefore, the caller of
  get_printk_buffer() does not need to check for NULL.

How to use this API:

  (1) Call get_printk_buffer() and acquire "struct printk_buffer *".

  (2) Rewrite printk() calls in the following way. The "ptr" is
      "struct printk_buffer *" obtained in step (1).

      printk(fmt, ...)     => printk_buffered(ptr, fmt, ...)
      vprintk(fmt, args)   => vprintk_buffered(ptr, fmt, args)
      pr_emerg(fmt, ...)   => bpr_emerg(ptr, fmt, ...)
      pr_alert(fmt, ...)   => bpr_alert(ptr, fmt, ...)
      pr_crit(fmt, ...)    => bpr_crit(ptr, fmt, ...)
      pr_err(fmt, ...)     => bpr_err(ptr, fmt, ...)
      pr_warning(fmt, ...) => bpr_warning(ptr, fmt, ...)
      pr_warn(fmt, ...)    => bpr_warn(ptr, fmt, ...)
      pr_notice(fmt, ...)  => bpr_notice(ptr, fmt, ...)
      pr_info(fmt, ...)    => bpr_info(ptr, fmt, ...)
      pr_cont(fmt, ...)    => bpr_cont(ptr, fmt, ...)

  (3) Release "struct printk_buffer" by calling put_printk_buffer().

Note that since "struct printk_buffer" buffers only up to one line, there
is no need to rewrite if it is known that the "struct printk_buffer" is
empty and printk() ends with '\n'.

  Good example:

    printk("Hello ");    =>  buf = get_printk_buffer();
    pr_cont("world.\n");     printk_buffered(buf, "Hello ");
                             printk_buffered(buf, "world.\n");
                             put_printk_buffer(buf);

  Pointless example:

    printk("Hello\n");   =>  buf = get_printk_buffer();
    printk("World.\n");      printk_buffered(buf, "Hello\n");
                             printk_buffered(buf, "World.\n");
                             put_printk_buffer(buf);

Note that bpr_devel() and bpr_debug() are not defined. This is
because pr_devel()/pr_debug() should not be followed by pr_cont()
because pr_devel()/pr_debug() are conditionally enabled; output from
pr_devel()/pr_debug() should always end with '\n'.

Previous version was proposed at
https://lkml.kernel.org/r/1540375870-6235-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp .

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 include/linux/printk.h          |  43 ++++++++++
 kernel/printk/Makefile          |   2 +-
 kernel/printk/internal.h        |   3 +
 kernel/printk/printk.c          |   3 -
 kernel/printk/printk_buffered.c | 179 ++++++++++++++++++++++++++++++++++++++++
 5 files changed, 226 insertions(+), 4 deletions(-)
 create mode 100644 kernel/printk/printk_buffered.c

Comments

Matthew Wilcox Nov. 2, 2018, 2:40 p.m. UTC | #1
On Fri, Nov 02, 2018 at 10:31:55PM +0900, Tetsuo Handa wrote:
>   get_printk_buffer() tries to assign a "struct printk_buffer" from
>   statically preallocated array. get_printk_buffer() returns NULL if
>   all "struct printk_buffer" are in use, but the caller does not need to
>   check for NULL.

This seems like a great way of wasting 16kB of memory.  Since you've
already made printk_buffered() work with a NULL initial argument, what's
the advantage over just doing kmalloc(1024, GFP_ATOMIC)?
kernel test robot Nov. 2, 2018, 6:12 p.m. UTC | #2
Hi Tetsuo,

I love your patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.19 next-20181102]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Tetsuo-Handa/printk-Add-line-buffered-printk-API/20181103-010234
config: i386-randconfig-x002-201843 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   In file included from kernel/printk/printk_buffered.c:5:0:
>> kernel/printk/internal.h:28:8: error: unknown type name 'raw_spinlock_t'
    extern raw_spinlock_t logbuf_lock;
           ^~~~~~~~~~~~~~

vim +/raw_spinlock_t +28 kernel/printk/internal.h

f956d1a3 Tetsuo Handa       2018-11-02  27  
099f1c84 Sergey Senozhatsky 2016-12-27 @28  extern raw_spinlock_t logbuf_lock;
099f1c84 Sergey Senozhatsky 2016-12-27  29  

:::::: The code at line 28 was first introduced by commit
:::::: 099f1c84c0052ec1b27f1c3942eed5830d86bdbb printk: introduce per-cpu safe_print seq buffer

:::::: TO: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
:::::: CC: Petr Mladek <pmladek@suse.com>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
Tetsuo Handa Nov. 3, 2018, 1:55 a.m. UTC | #3
On 2018/11/02 23:40, Matthew Wilcox wrote:
> On Fri, Nov 02, 2018 at 10:31:55PM +0900, Tetsuo Handa wrote:
>>   get_printk_buffer() tries to assign a "struct printk_buffer" from
>>   statically preallocated array. get_printk_buffer() returns NULL if
>>   all "struct printk_buffer" are in use, but the caller does not need to
>>   check for NULL.
> 
> This seems like a great way of wasting 16kB of memory.  Since you've
> already made printk_buffered() work with a NULL initial argument, what's
> the advantage over just doing kmalloc(1024, GFP_ATOMIC)?

Like "[PATCH 2/3] mm: Use line-buffered printk() for show_free_areas()."
demonstrates, kzalloc(sizeof(struct printk_buffer), GFP_ATOMIC) can fail.

And using statically preallocated buffers helps avoiding

  (1) out of buffers when memory cannot be allocated

  (2) kernel stack overflow when kernel stack is already tight (e.g.
      a memory allocation attempt from an interrupt handler which was
      invoked from deep inside call chain of a process context)

. Whether

  (A) tuning the number of statically preallocated buffers

  (B) allocating buffers on caller side (e.g. kzalloc() or in .bss section)

are useful is a future decision, for too much concurrent printk() will lockup
the system even if there are enough buffers. I think that starting with
statically preallocated buffers is (at least for now) a good choice for
minimizing risk of (1) (2) while offering practically acceptable result.
Sergey Senozhatsky Nov. 6, 2018, 2:35 p.m. UTC | #4
On (11/02/18 22:31), Tetsuo Handa wrote:
>   (1) Call get_printk_buffer() and acquire "struct printk_buffer *".
> 
>   (2) Rewrite printk() calls in the following way. The "ptr" is
>       "struct printk_buffer *" obtained in step (1).
> 
>       printk(fmt, ...)     => printk_buffered(ptr, fmt, ...)
>       vprintk(fmt, args)   => vprintk_buffered(ptr, fmt, args)
>       pr_emerg(fmt, ...)   => bpr_emerg(ptr, fmt, ...)
>       pr_alert(fmt, ...)   => bpr_alert(ptr, fmt, ...)
>       pr_crit(fmt, ...)    => bpr_crit(ptr, fmt, ...)
>       pr_err(fmt, ...)     => bpr_err(ptr, fmt, ...)
>       pr_warning(fmt, ...) => bpr_warning(ptr, fmt, ...)
>       pr_warn(fmt, ...)    => bpr_warn(ptr, fmt, ...)
>       pr_notice(fmt, ...)  => bpr_notice(ptr, fmt, ...)
>       pr_info(fmt, ...)    => bpr_info(ptr, fmt, ...)
>       pr_cont(fmt, ...)    => bpr_cont(ptr, fmt, ...)
> 
>   (3) Release "struct printk_buffer" by calling put_printk_buffer().

[..]

> Since we want to remove "struct cont" eventually, we will try to remove
> both "implicit printk() users who are expecting KERN_CONT behavior" and
> "explicit pr_cont()/printk(KERN_CONT) users". Therefore, converting to
> this API is recommended.

- The printk-fallback sounds like a hint that the existing 'cont' handling
  better stay in the kernel. I don't see how the existing 'cont' is
  significantly worse than
		bpr_warn(NULL, ...)->printk() // no 'cont' support
  I don't see why would we want to do it, sorry. I don't see "it takes 16
  printk-buffers to make a thing go right" as a sure thing.

A question.

How bad would it actually be to:

- Allocate seq_buf 512-bytes buffer (GFP_ATOMIC) just-in-time, when we
  need it.
    // How often systems cannot allocate a 512-byte buffer? //

- OK, assuming that systems around the world are so badly OOM like all the
  time and even kmalloc(512) is absolutely impossible, then have a fallback
  to the existing 'cont' handling; it just looks to me better than a plain
  printk()-fallback with removed 'cont' support.

- Do not allocate seq_buf if we are in printk-safe or in printk-nmi mode.
  To avoid "buffering for the sake of buffering". IOW, when in printk-safe
  use printk-safe.

	-ss
Petr Mladek Nov. 7, 2018, 10:21 a.m. UTC | #5
On Tue 2018-11-06 23:35:02, Sergey Senozhatsky wrote:
> On (11/02/18 22:31), Tetsuo Handa wrote:
> >   (1) Call get_printk_buffer() and acquire "struct printk_buffer *".
> > 
> >   (2) Rewrite printk() calls in the following way. The "ptr" is
> >       "struct printk_buffer *" obtained in step (1).
> > 
> >       printk(fmt, ...)     => printk_buffered(ptr, fmt, ...)
> >       vprintk(fmt, args)   => vprintk_buffered(ptr, fmt, args)
> >       pr_emerg(fmt, ...)   => bpr_emerg(ptr, fmt, ...)
> >       pr_alert(fmt, ...)   => bpr_alert(ptr, fmt, ...)
> >       pr_crit(fmt, ...)    => bpr_crit(ptr, fmt, ...)
> >       pr_err(fmt, ...)     => bpr_err(ptr, fmt, ...)
> >       pr_warning(fmt, ...) => bpr_warning(ptr, fmt, ...)
> >       pr_warn(fmt, ...)    => bpr_warn(ptr, fmt, ...)
> >       pr_notice(fmt, ...)  => bpr_notice(ptr, fmt, ...)
> >       pr_info(fmt, ...)    => bpr_info(ptr, fmt, ...)
> >       pr_cont(fmt, ...)    => bpr_cont(ptr, fmt, ...)
> > 
> >   (3) Release "struct printk_buffer" by calling put_printk_buffer().
> 
> [..]
> 
> > Since we want to remove "struct cont" eventually, we will try to remove
> > both "implicit printk() users who are expecting KERN_CONT behavior" and
> > "explicit pr_cont()/printk(KERN_CONT) users". Therefore, converting to
> > this API is recommended.
> 
> - The printk-fallback sounds like a hint that the existing 'cont' handling
>   better stay in the kernel. I don't see how the existing 'cont' is
>   significantly worse than
> 		bpr_warn(NULL, ...)->printk() // no 'cont' support
>   I don't see why would we want to do it, sorry. I don't see "it takes 16
>   printk-buffers to make a thing go right" as a sure thing.

I see it the following way:

   + mixed cont lines are very rare but they happen

   + 16 buffers are more than 1 so it could only be better [*]

   + the printk_buffer() code is self-contained and does not
     complicate the logic of the classic printk() code [**]


[*] A missing put_printk_buffer() might cause that we would get
    out of buffers. But the same problem is with locks,
    disabled preemption, disabled interrupts, seq_buffer,
    alloc/free. Such problems happen but they are rare.

    Also I do not expect that the same buffer would be shared
    between many functions. Therefore it should be easy
    to use it correctly.


[**] I admit that cont buffer implementation is much easier
     after removing the early flush to consoles but still...


Anyway, I do not think that both implementations are worth it.
We could keep both for some transition period but we should
remove the old one later.


> A question.
> 
> How bad would it actually be to:
> 
> - Allocate seq_buf 512-bytes buffer (GFP_ATOMIC) just-in-time, when we
>   need it.
>     // How often systems cannot allocate a 512-byte buffer? //
> 
> - OK, assuming that systems around the world are so badly OOM like all the
>   time and even kmalloc(512) is absolutely impossible, then have a fallback
>   to the existing 'cont' handling; it just looks to me better than a plain
>   printk()-fallback with removed 'cont' support.

This would prevent removing the fallback to struct cont. OOM is
one important scenario where continuous lines are used.


> - Do not allocate seq_buf if we are in printk-safe or in printk-nmi mode.
>   To avoid "buffering for the sake of buffering". IOW, when in printk-safe
>   use printk-safe.

Sure, my plan is to add a helper function is_buffered_printk_context() or so
that would check printk_context. Then we could do the following in
vprintk_buffered()

	if (is_buffered_printk_context())
		vprintk_func(....);

It might be added on top of the current patchset. I opened this
problem once and it got lost. So I did not want to complicate
it at this moment.

Best Regards,
Petr
Tetsuo Handa Nov. 7, 2018, 10:52 a.m. UTC | #6
On 2018/11/06 23:35, Sergey Senozhatsky wrote:
>> Since we want to remove "struct cont" eventually, we will try to remove
>> both "implicit printk() users who are expecting KERN_CONT behavior" and
>> "explicit pr_cont()/printk(KERN_CONT) users". Therefore, converting to
>> this API is recommended.
> 
> - The printk-fallback sounds like a hint that the existing 'cont' handling
>   better stay in the kernel. I don't see how the existing 'cont' is
>   significantly worse than
> 		bpr_warn(NULL, ...)->printk() // no 'cont' support
>   I don't see why would we want to do it, sorry. I don't see "it takes 16
>   printk-buffers to make a thing go right" as a sure thing.

Existing 'cont' handling will stay for a while. After majority of
pr_cont()/KERN_CONT users are converted, 'cont' support will be removed
(e.g. KERN_CONT becomes "").

> 
> A question.
> 
> How bad would it actually be to:
> 
> - Allocate seq_buf 512-bytes buffer (GFP_ATOMIC) just-in-time, when we
>   need it.
>     // How often systems cannot allocate a 512-byte buffer? //

It is a very bad thing to do GFP_ATOMIC without __GFP_NOWARN. See
"[PATCH 2/3] mm: Use line-buffered printk() for show_free_areas()."
which helps exactly when GFP_ATOMIC without __GFP_NOWARN failed.
Without __GFP_NOWARN, GFP_ATOMIC for printk() can trigger infinite
recursion and kernel stack overflow.

Even without recursion, doing kmalloc(GFP_ATOMIC | __GFP_NOWARN) temporarily
consumes some kernel stack. I don't know the exact amount needed for
kmalloc(GFP_ATOMIC | __GFP_NOWARN), but it might still emit memory allocation
fault injection messages. What GFP_ATOMIC can guarantee is nothing but
"it does not sleep". Not suitable for printk() which might be called from
critically dangerous situations.

> 
> - OK, assuming that systems around the world are so badly OOM like all the
>   time and even kmalloc(512) is absolutely impossible, then have a fallback
>   to the existing 'cont' handling; it just looks to me better than a plain
>   printk()-fallback with removed 'cont' support.

Since I want to eventually remove 'cont' support inside printk(),
I dropped KERN_CONT in patch [2/3] and [3/3].

> 
> - Do not allocate seq_buf if we are in printk-safe or in printk-nmi mode.
>   To avoid "buffering for the sake of buffering". IOW, when in printk-safe
>   use printk-safe.

Why? Since printk_safe_flush_buffer() forcibly flushes the partial line,
calling printk_safe_log_store() after line buffering can reduce possibility of
flushing partial lines, can't it?
David Laight Nov. 7, 2018, 11:01 a.m. UTC | #7
From: Tetsuo Handa
> Sent: 07 November 2018 10:53
> 
> On 2018/11/06 23:35, Sergey Senozhatsky wrote:
> >> Since we want to remove "struct cont" eventually, we will try to remove
> >> both "implicit printk() users who are expecting KERN_CONT behavior" and
> >> "explicit pr_cont()/printk(KERN_CONT) users". Therefore, converting to
> >> this API is recommended.
> >
> > - The printk-fallback sounds like a hint that the existing 'cont' handling
> >   better stay in the kernel. I don't see how the existing 'cont' is
> >   significantly worse than
> > 		bpr_warn(NULL, ...)->printk() // no 'cont' support
> >   I don't see why would we want to do it, sorry. I don't see "it takes 16
> >   printk-buffers to make a thing go right" as a sure thing.
> 
> Existing 'cont' handling will stay for a while. After majority of
> pr_cont()/KERN_CONT users are converted, 'cont' support will be removed
> (e.g. KERN_CONT becomes "").

A though:

Why not make the printf lock slightly 'sticky'?
- If the output line is incomplete save the cpuid.
- If there is a saved cpuid that doesn't match the current cpu then spin for a bit.

Any callers of printk() have to assume they will spin on the buffer for the
longest printk formatting (and symbol lookup might take a while) so a short
additional delay won't matter.

Then two calls to printk() for the same line won't (usually) get split and
none of the callers need any changes.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Petr Mladek Nov. 7, 2018, 11:45 a.m. UTC | #8
On Wed 2018-11-07 19:52:53, Tetsuo Handa wrote:
> On 2018/11/06 23:35, Sergey Senozhatsky wrote:
> > - Do not allocate seq_buf if we are in printk-safe or in printk-nmi mode.
> >   To avoid "buffering for the sake of buffering". IOW, when in printk-safe
> >   use printk-safe.
> 
> Why? Since printk_safe_flush_buffer() forcibly flushes the partial line,
> calling printk_safe_log_store() after line buffering can reduce possibility of
> flushing partial lines, can't it?

Good point.

Well, printk_safe buffers are flushed via irqwork scheduled on the
same CPU. It might get flushed prematurely from other CPU but
I am not sure if this risk is worth the double buffering.

Best Regards,
Petr
Petr Mladek Nov. 7, 2018, noon UTC | #9
On Wed 2018-11-07 11:01:05, David Laight wrote:
> From: Tetsuo Handa
> > Sent: 07 November 2018 10:53
> A though:
> 
> Why not make the printf lock slightly 'sticky'?
> - If the output line is incomplete save the cpuid.
> - If there is a saved cpuid that doesn't match the current cpu then spin for a bit.
> 
> Any callers of printk() have to assume they will spin on the buffer for the
> longest printk formatting (and symbol lookup might take a while) so a short
> additional delay won't matter.

Disabling preemption for a single printk() is questionable. We have
spent many years trying to find an acceptable solution to avoid
softlockups and we did not fully succeeded.

And this patchset is about continuous lines. Therefore we would need
to disable preemption for all code intermixed with the related
printks. This does not have much chances to get accepted.


> Then two calls to printk() for the same line won't (usually) get split and
> none of the callers need any changes.

It would require changes to disable the preemption around the related
calls.

We could not do this without a better API and inspecting all users.
Note that we could not detect this by "\n" because it is too
error-prone.

Best Regards,
Petr
Tetsuo Handa Nov. 7, 2018, 12:54 p.m. UTC | #10
On 2018/11/07 19:21, Petr Mladek wrote:
> On Tue 2018-11-06 23:35:02, Sergey Senozhatsky wrote:
>>> Since we want to remove "struct cont" eventually, we will try to remove
>>> both "implicit printk() users who are expecting KERN_CONT behavior" and
>>> "explicit pr_cont()/printk(KERN_CONT) users". Therefore, converting to
>>> this API is recommended.
>>
>> - The printk-fallback sounds like a hint that the existing 'cont' handling
>>   better stay in the kernel. I don't see how the existing 'cont' is
>>   significantly worse than
>> 		bpr_warn(NULL, ...)->printk() // no 'cont' support
>>   I don't see why would we want to do it, sorry. I don't see "it takes 16
>>   printk-buffers to make a thing go right" as a sure thing.
> 
> I see it the following way:
> 
>    + mixed cont lines are very rare but they happen
> 
>    + 16 buffers are more than 1 so it could only be better [*]
> 
>    + the printk_buffer() code is self-contained and does not
>      complicate the logic of the classic printk() code [**]
> 
> 
> [*] A missing put_printk_buffer() might cause that we would get
>     out of buffers. But the same problem is with locks,
>     disabled preemption, disabled interrupts, seq_buffer,
>     alloc/free. Such problems happen but they are rare.
> 
>     Also I do not expect that the same buffer would be shared
>     between many functions. Therefore it should be easy
>     to use it correctly.

Since we can allocate printk() buffer upon dup_task_struct()
and free it upon free_task_struct(), we can have enough printk()
buffers for task context. Also, since total number of exceptional
contexts (e.g. interrupts, oops) is finite, we can have enough
printk() buffers for exceptional contexts.

Is it possible to identify all locations which should use their own
printk() buffers (e.g. interrupt handlers, oops handlers) ? If yes,
despite Linus's objection, automatically switching printk() buffers
(like memalloc_nofs_save()/memalloc_nofs_restore() instead of
https://lkml.kernel.org/r/201709021512.DJG00503.JHOSOFFMFtVLOQ@I-love.SAKURA.ne.jp )
will be easiest and least error prone.
Petr Mladek Nov. 7, 2018, 1:41 p.m. UTC | #11
On Fri 2018-11-02 22:31:55, Tetsuo Handa wrote:
> Sometimes we want to print a whole line without being disturbed by
> concurrent printk() from interrupts and/or other threads, for printk()
> which does not end with '\n' can be disturbed.
> 
> Since mixed printk() output makes it hard to interpret, this patch
> introduces API for line-buffered printk() output (so that we can make
> sure that printk() ends with '\n').
> 
> Since functions introduced by this patch are merely wrapping
> printk()/vprintk() calls in order to minimize possibility of using
> "struct cont", it is safe to replace printk()/vprintk() with this API.
> Since we want to remove "struct cont" eventually, we will try to remove
> both "implicit printk() users who are expecting KERN_CONT behavior" and
> "explicit pr_cont()/printk(KERN_CONT) users". Therefore, converting to
> this API is recommended.

[...]

> diff --git a/include/linux/printk.h b/include/linux/printk.h
> index cf3eccf..92af345 100644
> --- a/include/linux/printk.h
> +++ b/include/linux/printk.h
> @@ -173,6 +174,20 @@ int printk_emit(int facility, int level,
>  
>  asmlinkage __printf(1, 2) __cold
>  int printk(const char *fmt, ...);
> +struct printk_buffer *get_printk_buffer(void);
> +bool flush_printk_buffer(struct printk_buffer *ptr);
> +__printf(2, 3)
> +int printk_buffered(struct printk_buffer *ptr, const char *fmt, ...);
> +__printf(2, 0)
> +int vprintk_buffered(struct printk_buffer *ptr, const char *fmt, va_list args);
> +/*
> + * In order to avoid accidentally reusing "ptr" after put_printk_buffer("ptr"),
> + * put_printk_buffer() is defined as a macro which explicitly resets "ptr" to
> + * NULL.
> + */
> +void __put_printk_buffer(struct printk_buffer *ptr);
> +#define put_printk_buffer(ptr)					\
> +	do { __put_printk_buffer(ptr); ptr = NULL; } while (0)

I have mixed feelings about this. It is clever but it complicates the
code. I wonder why we need to be more paranoid than kfree() and
friends. Especially that the reuse of printk buffer is much less
dangerous than reusing freed pointers.

Please, remove it unless it gets more supporters.


> --- /dev/null
> +++ b/kernel/printk/printk_buffered.c
> +int vprintk_buffered(struct printk_buffer *ptr, const char *fmt, va_list args)
> +{
> +	va_list tmp_args;
> +	int r;
> +	int pos;
> +
> +	if (!ptr)
> +		return vprintk(fmt, args);
> +	/*
> +	 * Skip KERN_CONT here based on an assumption that KERN_CONT will be
> +	 * given via "fmt" argument when KERN_CONT is given.
> +	 */
> +	pos = (printk_get_level(fmt) == 'c') ? 2 : 0;
> +	while (true) {
> +		va_copy(tmp_args, args);
> +		r = vsnprintf(ptr->buf + ptr->len, sizeof(ptr->buf) - ptr->len,
> +			      fmt + pos, tmp_args);
> +		va_end(tmp_args);
> +		if (likely(r + ptr->len < sizeof(ptr->buf)))
> +			break;
> +		if (!flush_printk_buffer(ptr))
> +			return vprintk(fmt, args);
> +	}
> +	ptr->len += r;
> +	/* Flush already completed lines if any. */
> +	for (pos = ptr->len - 1; pos >= 0; pos--) {
> +		if (ptr->buf[pos] != '\n')
> +			continue;
> +		ptr->buf[pos++] = '\0';
> +		printk("%s\n", ptr->buf);
> +		ptr->len -= pos;
> +		memmove(ptr->buf, ptr->buf + pos, ptr->len);
> +		/* This '\0' will be overwritten by next vsnprintf() above. */
> +		ptr->buf[ptr->len] = '\0';

I am not against explicitly adding '\0' this way. Just note that there
is always the trailing '\0' in the original string, see below.

I mean that we could get the same result with memmove(,, ptr->len+1);


> +		break;
> +	}
> +	return r;
> +}
> +
> +/**
> + * flush_printk_buffer - Flush incomplete line in printk_buffer.
> + *
> + * @ptr: Pointer to "struct printk_buffer". It can be NULL.
> + *
> + * Returns true if flushed something, false otherwise.
> + *
> + * Flush if @ptr contains partial data. But usually there is no need to call
> + * this function because @ptr is flushed by put_printk_buffer().
> + */
> +bool flush_printk_buffer(struct printk_buffer *ptr)
> +{
> +	if (!ptr || !ptr->len)
> +		return false;
> +	/*
> +	 * vprintk_buffered() keeps 0 <= ptr->len < sizeof(ptr->buf) true.
> +	 * But ptr->buf[ptr->len] != '\0' if this function is called due to
> +	 * vsnprintf() + ptr->len >= sizeof(ptr->buf).
> +	 */
> +	ptr->buf[ptr->len] = '\0';

This is not necessary. Note that vsnprintf() always adds the trailing
'\0' even when the string is shrunken.


> +	printk("%s", ptr->buf);
> +	ptr->len = 0;
> +	return true;
> +}
> +EXPORT_SYMBOL(flush_printk_buffer);
> +
> +/**
> + * __put_printk_buffer - Release printk_buffer.
> + *
> + * @ptr: Pointer to "struct printk_buffer". It can be NULL.
> + *
> + * Returns nothing.
> + *
> + * Flush and release @ptr.
> + * Please use put_printk_buffer() in order to catch use-after-free bugs.
> + */
> +void __put_printk_buffer(struct printk_buffer *ptr)
> +{
> +	long i = (unsigned long) ptr - (unsigned long) printk_buffers;
> +
> +	if (!ptr)
> +		return;
> +	if (WARN_ON_ONCE(i % sizeof(struct printk_buffer)))
> +		return;
> +	i /= sizeof(struct printk_buffer);
> +	if (WARN_ON_ONCE(i < 0 || i >= NUM_LINE_BUFFERS))
> +		return;
> +	if (ptr->len)

Nit: This check is superfluous.

> +		flush_printk_buffer(ptr);
> +	clear_bit_unlock(i, printk_buffers_in_use);
> +}
> +EXPORT_SYMBOL(__put_printk_buffer);

Otherwise, it looks good to me.


Best Regards,
Petr
Petr Mladek Nov. 7, 2018, 2:06 p.m. UTC | #12
On Fri 2018-11-02 22:31:55, Tetsuo Handa wrote:
> How to use this API:
> 
>   (1) Call get_printk_buffer() and acquire "struct printk_buffer *".
> 
>   (2) Rewrite printk() calls in the following way. The "ptr" is
>       "struct printk_buffer *" obtained in step (1).
> 
>       printk(fmt, ...)     => printk_buffered(ptr, fmt, ...)
>       vprintk(fmt, args)   => vprintk_buffered(ptr, fmt, args)
>       pr_emerg(fmt, ...)   => bpr_emerg(ptr, fmt, ...)
>       pr_alert(fmt, ...)   => bpr_alert(ptr, fmt, ...)
>       pr_crit(fmt, ...)    => bpr_crit(ptr, fmt, ...)
>       pr_err(fmt, ...)     => bpr_err(ptr, fmt, ...)
>       pr_warning(fmt, ...) => bpr_warning(ptr, fmt, ...)
>       pr_warn(fmt, ...)    => bpr_warn(ptr, fmt, ...)
>       pr_notice(fmt, ...)  => bpr_notice(ptr, fmt, ...)
>       pr_info(fmt, ...)    => bpr_info(ptr, fmt, ...)
>       pr_cont(fmt, ...)    => bpr_cont(ptr, fmt, ...)

I am looking at the sample conversions. We actually won't need
bpr_cont(). We will use buffer_printk() instead.

Well, I think about renaming buffer_printk() to bprintk() or
define it as a wrapper at least.

Best Regards,
Petr
Sergey Senozhatsky Nov. 8, 2018, 2:21 a.m. UTC | #13
On (11/07/18 11:21), Petr Mladek wrote:
> > [..]
> > - The printk-fallback sounds like a hint that the existing 'cont' handling
> >   better stay in the kernel. I don't see how the existing 'cont' is
> >   significantly worse than
> > 		bpr_warn(NULL, ...)->printk() // no 'cont' support
> >   I don't see why would we want to do it, sorry. I don't see "it takes 16
> >   printk-buffers to make a thing go right" as a sure thing.
> 
> I see it the following way:
> 
>    + mixed cont lines are very rare but they happen
> 
>    + 16 buffers are more than 1 so it could only be better [*]

Right, so this is where things go a bit shady. I'll try to explain.

What is the problem:
- we have tons of CPUs, with tons of tasks running on them, with preemption,
  and interrupts, and potentially printk-s coming from various
  contexts/CPUs/tasks etc. so one 'cont' buffer is not enough.

What is the proposed solution:
- if 1 is not enough then 16 will do. And if 16 is not enough then this
  is not our problem anymore, it's a kernel misconfiguration and users'
  fault.

So, maybe it is a "common" kernel development pattern, I really don't know;
but it looks like we are just throwing the issue over the fence; and atop
of that we are killing off the existing 'cont'.

> Anyway, I do not think that both implementations are worth it.
> We could keep both for some transition period but we should
> remove the old one later.
>
[..]
>
> This would prevent removing the fallback to struct cont. OOM is
> one important scenario where continuous lines are used.

Let's have one more look at what we will fix and what we will break.

'cont' has premature flushes.

- it's annoying in some cases
- it's good otherwise

  Why is it good.
  It preserves the correct order of events.

  pr_cont("calling foo->init()....");
  foo->init()
   printk("Can't allocate buffer\n");    // premature flush
  pr_cont("...blah\h");

 Will end up in the logbuf as:
 [12345.123] calling foo->init()....
 [12345.124] Can't allocate buffer
 [12345.125] ...blah

 Where buffered printk will endup as:
 [12345.123] Can't allocate buffer
 [12345.124] calling foo->init().......blah

It's very different.

Not to mention that buffered printk does not flush on panic.
So, frankly, as of now, I don't see buffered printk as a 'cont'
replacement.

If our problem is OOM and lockdep print outs, then let's address only
those two; let's not "fix" the rest of the kernel, especially the early
boot, - we can break more things than we can mend.

[..]
> I opened this problem once and it got lost. So I did not want to
> complicate it at this moment.

Right, I saw it. We have similar points; I raised those in private talks:

- we don't need buffered printk in printk_safe/printk_nmi

- we don't need br_cont()

- unlike 'cont' buffer there is no way for us to flush buffered printk
  buffers on panic

- I don't exactly like the completely of the vprintk_buffered. If
  buffered printk is for single line, then it must be for single
  line only. And I'm not buying the "we will need this for printk
  origin info injection" argument.
  Linus was very clear about this whole idea:

  Linus Torvalds wrote:
  :  Sergey Senozhatsky wrote:
  :  >
  :  > so... I think we don't have to update 'struct printk_log'. we can store
  :  > that "extended data" at the beginning of every message, right after the
  :  > prefix.
  :
  :   No, we really can't. That just means that all the tools would have to
  :   be changed to get the normal messages without the extra crud. And
  :   since it will have lost the difference, that's not even easy to do.
  :
  :   So this is exactly the wrong way around.
  :
  :   If people want to see the extra data, it really should be extra data
  :   that you can get with a new interface from the kernel logs. Not a
  :   "let's just a add it to all lines and make every line uglier and
  :   harder to read.

  So the "we need all that complexity to inject printk info in later
  patches" push doesn't look right.

- It seems that buffered printk attempts to solve too many problems.
  I'd prefer it to address just one.

	-ss
Sergey Senozhatsky Nov. 8, 2018, 2:30 a.m. UTC | #14
On (11/07/18 19:52), Tetsuo Handa wrote:
> > A question.
> > 
> > How bad would it actually be to:
> > 
> > - Allocate seq_buf 512-bytes buffer (GFP_ATOMIC) just-in-time, when we
> >   need it.
> >     // How often systems cannot allocate a 512-byte buffer? //
> 
> It is a very bad thing to do GFP_ATOMIC without __GFP_NOWARN.

Absolutely, __GFP_NOWARN.

> "it does not sleep". Not suitable for printk() which might be called from
> critically dangerous situations.

So I'm really not convinced that we can use buffered printk in critically
dangerous situations. Premature 'cont' flushes and 'cont' flushes on panic
are nice and right in critically dangerous situations.

[..]
> > - Do not allocate seq_buf if we are in printk-safe or in printk-nmi mode.
> >   To avoid "buffering for the sake of buffering". IOW, when in printk-safe
> >   use printk-safe.
> 
> Why? Since printk_safe_flush_buffer() forcibly flushes the partial line

We need to leave printk_safe and enable local IRQs for that partial
flush to occur. I'm not sure that those "partial flushes" from printk_safe
actually happen.

	-ss
Petr Mladek Nov. 8, 2018, 11:24 a.m. UTC | #15
On Thu 2018-11-08 11:21:38, Sergey Senozhatsky wrote:
> On (11/07/18 11:21), Petr Mladek wrote:
> What is the problem:
> - we have tons of CPUs, with tons of tasks running on them, with preemption,
>   and interrupts, and potentially printk-s coming from various
>   contexts/CPUs/tasks etc. so one 'cont' buffer is not enough.
> 
> What is the proposed solution:
> - if 1 is not enough then 16 will do. And if 16 is not enough then this
>   is not our problem anymore, it's a kernel misconfiguration and users'
>   fault.

I believe that I mentioned this more times. 16 buffers is the first
attempt. We could improve it later in several ways:

  + add more buffers
  + combine buffers on stack, dynamically allocated and static ones.

BTW: I do not remember seeing mixed lines from anything even close to 16
CPUs. Maybe I was just lucky or my memory is leaky.


> Let's have one more look at what we will fix and what we will break.
> 
> 'cont' has premature flushes.
> 
>   Why is it good.
>   It preserves the correct order of events.
> 
>   pr_cont("calling foo->init()....");
>   foo->init()
>    printk("Can't allocate buffer\n");    // premature flush
>   pr_cont("...blah\h");
> 
>  Will end up in the logbuf as:
>  [12345.123] calling foo->init()....
>  [12345.124] Can't allocate buffer
>  [12345.125] ...blah
> 
>  Where buffered printk will endup as:
>  [12345.123] Can't allocate buffer
>  [12345.124] calling foo->init().......blah

We will always have this problem with API using explicit buffers.
What do you suggest instead, please?

I am afraid that we are running in cycles. The other serious
alternative was having per-process and per-context buffers
but it was rejected several times.


> Not to mention that buffered printk does not flush on panic.
> So, frankly, as of now, I don't see buffered printk as a 'cont'
> replacement.

The static array of buffers can be flushed on panic.


> If our problem is OOM and lockdep print outs, then let's address only
> those two; let's not "fix" the rest of the kernel, especially the early
> boot, - we can break more things than we can mend.

Do you have any alternative proposal how to handle OOM and lockdep, please?


> [..]
> > I opened this problem once and it got lost. So I did not want to
> > complicate it at this moment.
>
> - I don't exactly like the completely of the vprintk_buffered. If
>   buffered printk is for single line, then it must be for single
>   line only.

My undestanding is that the new API is similar to the current cont
buffer from this point of view:

    + buffer size is LOG_LINE_MAX
    + it is flushed when full

The only difference is that it is flushed also when there is a
complete line. Is this a problem, please?


> And I'm not buying the "we will need this for printk origin
> info injection" argument.

I was against this idea several times. The current API does
not do anything like this.


> - It seems that buffered printk attempts to solve too many problems.
>   I'd prefer it to address just one.

This API tries to handle continuous lines more reliably.
Do I miss anything, please?

Best Regards,
Petr
Tetsuo Handa Nov. 8, 2018, 11:46 a.m. UTC | #16
On 2018/11/08 20:24, Petr Mladek wrote:
>> Let's have one more look at what we will fix and what we will break.
>>
>> 'cont' has premature flushes.
>>
>>   Why is it good.
>>   It preserves the correct order of events.
>>
>>   pr_cont("calling foo->init()....");
>>   foo->init()
>>    printk("Can't allocate buffer\n");    // premature flush
>>   pr_cont("...blah\h");
>>
>>  Will end up in the logbuf as:
>>  [12345.123] calling foo->init()....
>>  [12345.124] Can't allocate buffer
>>  [12345.125] ...blah
>>
>>  Where buffered printk will endup as:
>>  [12345.123] Can't allocate buffer
>>  [12345.124] calling foo->init().......blah
> 
> We will always have this problem with API using explicit buffers.
> What do you suggest instead, please?
> 
> I am afraid that we are running in cycles. The other serious
> alternative was having per-process and per-context buffers
> but it was rejected several times.

Is it possible to identify all locations which should use their own
printk() buffers (e.g. interrupt handlers, oops handlers) ? If yes,
automatically switching printk() buffers (like memalloc_nofs_save()/
memalloc_nofs_restore()) will be easiest and least error prone.
Sergey Senozhatsky Nov. 8, 2018, 12:30 p.m. UTC | #17
On (11/08/18 12:24), Petr Mladek wrote:
> I believe that I mentioned this more times. 16 buffers is the first
> attempt. We could improve it later in several ways

Sure. Like I said - maybe it is a normal development pattern; I really
don't know.

> > Let's have one more look at what we will fix and what we will break.
> > 
> > 'cont' has premature flushes.
> > 
> >   Why is it good.
> >   It preserves the correct order of events.
> > 
> >   pr_cont("calling foo->init()....");
> >   foo->init()
> >    printk("Can't allocate buffer\n");    // premature flush
> >   pr_cont("...blah\h");
> > 
> >  Will end up in the logbuf as:
> >  [12345.123] calling foo->init()....
> >  [12345.124] Can't allocate buffer
> >  [12345.125] ...blah
> > 
> >  Where buffered printk will endup as:
> >  [12345.123] Can't allocate buffer
> >  [12345.124] calling foo->init().......blah
> 
> We will always have this problem with API using explicit buffers.

Exactly.

> What do you suggest instead, please?

So this is why "let's not remove 'cont'" thing. Buffered printk
is not 100% identical to 'cont'. And 'cont' does a good job sometimes,
Because 'cont' looks like a buffered printk, but it behaves like a
normal printk when things go bad. Buffered printk is always buffered;
and not even aware of the fact that things go bad around.

> > If our problem is OOM and lockdep print outs, then let's address only
> > those two; let's not "fix" the rest of the kernel, especially the early
> > boot, - we can break more things than we can mend.
> 
> Do you have any alternative proposal how to handle OOM and lockdep, please?

You misunderstood me. I'm not against the buffered printk in lockdep and
OOM. Albeit I must admit that lockdep patch looks quite big. I don't like
the idea of "we will remove 'cont' and replace it with buffered printk
through out the kernel".

[..]
> > - It seems that buffered printk attempts to solve too many problems.
> >   I'd prefer it to address just one.
> 
> This API tries to handle continuous lines more reliably.
> Do I miss anything, please?

This part:

+       /* Flush already completed lines if any. */
+       for (pos = ptr->len - 1; pos >= 0; pos--) {
+               if (ptr->buf[pos] != '\n')
+                       continue;
+               ptr->buf[pos++] = '\0';
+               printk("%s\n", ptr->buf);
+               ptr->len -= pos;
+               memmove(ptr->buf, ptr->buf + pos, ptr->len);
+               /* This '\0' will be overwritten by next vsnprintf() above. */
+               ptr->buf[ptr->len] = '\0';
+               break;
+       }
+       return r;

If I'm not mistaken, this is for the futute "printk injection" work.

Right now printk("foo\nbar\n") will end up to be a single logbuf
entry, with \n in the middle and at the end. So it will look like
two lines on the serial console:

	[123.123] foo
	          bar

Tetsuo does this \n lookup (on every vprintk_buffered) and split lines
(via memove) for "printk injection", so the output will be

	[123.123] foo
	[123.124] bar

Which makes it simpler to "inject" printk origin into every printed
line.

Without it we can just do

	len = vsprintf();
	if (len && text[len - 1] == '\n' || overflow)
		flush();

Can't we?

	-ss
Petr Mladek Nov. 9, 2018, 2:10 p.m. UTC | #18
On Thu 2018-11-08 21:30:49, Sergey Senozhatsky wrote:
> On (11/08/18 12:24), Petr Mladek wrote:
> > > - It seems that buffered printk attempts to solve too many problems.
> > >   I'd prefer it to address just one.
> > 
> > This API tries to handle continuous lines more reliably.
> > Do I miss anything, please?
> 
> This part:
> 
> +       /* Flush already completed lines if any. */
> +       for (pos = ptr->len - 1; pos >= 0; pos--) {
> +               if (ptr->buf[pos] != '\n')
> +                       continue;
> +               ptr->buf[pos++] = '\0';
> +               printk("%s\n", ptr->buf);
> +               ptr->len -= pos;
> +               memmove(ptr->buf, ptr->buf + pos, ptr->len);
> +               /* This '\0' will be overwritten by next vsnprintf() above. */
> +               ptr->buf[ptr->len] = '\0';
> +               break;
> +       }
> +       return r;
> 
> If I'm not mistaken, this is for the futute "printk injection" work.

No, it does not make sense to distinguish the context at this level.
The buffer is passed as an argument. It should not get passed to another
task or context.

The above code only tries to push complete lines to the main log buffer
and consoles ASAP. It sounds like a Good Idea(tm).


> Right now printk("foo\nbar\n") will end up to be a single logbuf
> entry, with \n in the middle and at the end. So it will look like
> two lines on the serial console:
> 
> 	[123.123] foo
> 	          bar
> 
> Tetsuo does this \n lookup (on every vprintk_buffered) and split lines
> (via memove) for "printk injection", so the output will be
> 
> 	[123.123] foo
> 	[123.124] bar

No, please note that the for cycle searches for '\n' from the end
of the string.

> Which makes it simpler to "inject" printk origin into every printed
> line.
> 
> Without it we can just do
> 
> 	len = vsprintf();
> 	if (len && text[len - 1] == '\n' || overflow)
> 		flush();

I had the same idea. Tetsuo ignored it. I looked for more arguments
and found that '\n' is used in the middle of several pr_cont()
calls, see

     git grep 'pr_cont.*\\n.*[^\\n]"'

Best Regards,
Petr
Sergey Senozhatsky Nov. 12, 2018, 7:59 a.m. UTC | #19
On (11/09/18 15:10), Petr Mladek wrote:
> > 
> > If I'm not mistaken, this is for the futute "printk injection" work.
> 
> The above code only tries to push complete lines to the main log buffer
> and consoles ASAP. It sounds like a Good Idea(tm).

Probably it is. So *quite likely* I'm wrong here.

Hmm... Thinking out loud.
At the same time, splitting a single logbuf entry gives a chance to other
events to mix in. Example:
	pr_cont("Foo:");
	pr_cont("\nbar");
	pr_cont("\n");

Previously it could been stored in one logbuf entry (one logstore,
one console_trylock_spinning), which means that it could have been
printed in one go:

	call_console_drivers()
		spin_trylock_irqsave(&port->lock, flags);
		uart_console_write(.... "Foo:\nbar\n");
		spin_unlock_irqrestore(&port->lock, flags);

While with buffered printk, it will be store in two logbuf entries,
and printed in two goes:

	call_console_drivers()
		spin_trylock_irqsave(&port->lock, flags);
		uart_console_write(.... "Foo:\nbar\n");
		spin_unlock_irqrestore(&port->lock, flags);

	<< ... console driver IRQ, TX port->state->xmit chars ... >>>

	call_console_drivers()
		spin_trylock_irqsave(&port->lock, flags);
		uart_console_write(.... "Foo:\nbar\n");
		spin_unlock_irqrestore(&port->lock, flags);

So, *technically*, we now let more events to happens between printk-s.

But, let's look at some of pr_cont() usage examples.
E.g. dump() from arch/h8300/kernel/traps.c. The code in question looks
as follows:

	pr_info("\nKERNEL STACK:");
	tp = ((unsigned char *) fp) - 0x40;
	for (sp = (unsigned long *) tp, i = 0; (i < 0x40);  i += 4) {
		if ((i % 0x10) == 0)
			pr_info("\n%08x: ", (int) (tp + i));
		pr_info("%08x ", (int) *sp++);
	}
	pr_info("\n");

dmesg

[   15.260099] 0000:    00000000  00000010  00000040  00000090
               0010:    00000100  00000190  00000240  00000310
               0020:    00000400  00000510  00000640  00000790
               0030:    00000900  00000a90  00000c40  00000e10

So we have the entire KERNEL STACK stored as a single line, in
a single logbuf entry.

cat /dev/kmsg

4,687,15260099,c;\x0a0000:    00000000  00000010  00000040  00000090  \x0a0010:    00000100  00000190  00000240  00000310  \x0a0020:    00000400  00000510  00000640  00000790  \x0a0030:    00000900  00000a90  00000c40  00000e10

Shall we consider this as a "reference" representation: data that
pr_cont(), ideally, would have stored as a single logbuf entry? And
then compare the "reference" representation and what buffered printk
does.

Buffered printk always stores this as several lines, IOW several
logbuf entries.

cat /dev/kmsg

4,690,15260152,-;0000:    00000000  00000010  00000040  00000090  
4,691,15260154,-;0010:    00000100  00000190  00000240  00000310  
4,692,15260157,-;0020:    00000400  00000510  00000640  00000790  
4,694,15260161,-;0030:    00000900  00000a90  00000c40  00000e10  

So if we will have concurrent printk()-s happening on other CPUs,
then the KERNEL STACK data block still can be a bit hard to read:

[   15.260152] 0000:    00000000  00000010  00000040  00000090  
<printk CPU1> ... foo bar foo bar
<printk CPU2> ... foo bar foo bar
...
[   15.260154] 0010:    00000100  00000190  00000240  00000310  
<printk CPU3> ... foo bar foo bar
<printk CPU2> ... foo bar foo bar
...
              ... and so on; you got the idea.

> No, please note that the for cycle searches for '\n' from the end
> of the string.

You are right, I didn't notice that. Indeed.


Tetsuo, lockdep report with buffered printks looks a bit different:

 kernel:  Possible unsafe locking scenario:
 kernel:        CPU0                    CPU1
 kernel:        ----                    ----
 kernel:   lock(bar_lock);
 kernel:                                lock(
 kernel: foo_lock);
 kernel:                                lock(bar_lock);
 kernel:   lock(foo_lock);
 kernel: 


> > 	len = vsprintf();
> > 	if (len && text[len - 1] == '\n' || overflow)
> > 		flush();
> 
> I had the same idea. Tetsuo ignored it. I looked for more arguments
> and found that '\n' is used in the middle of several pr_cont()
> calls

OK, so we probably can have that new semantics. But we might split
something that possibly was meant to be a single line with a bunch
of \n in the middle.

	-ss
Tetsuo Handa Nov. 12, 2018, 10:42 a.m. UTC | #20
On 2018/11/12 16:59, Sergey Senozhatsky wrote:
> Tetsuo, lockdep report with buffered printks looks a bit different:
> 
>  kernel:  Possible unsafe locking scenario:
>  kernel:        CPU0                    CPU1
>  kernel:        ----                    ----
>  kernel:   lock(bar_lock);
>  kernel:                                lock(
>  kernel: foo_lock);
>  kernel:                                lock(bar_lock);
>  kernel:   lock(foo_lock);
>  kernel: 
> 

Yes. That is because vprintk_buffered() eliminated KERN_CONT.
Since there was pending partial printk() output, vprintk_buffered()
should not eliminate KERN_CONT. Petr Mladek commented

  1. The mixing of normal and buffered printk calls is a bit confusing
     and error prone. It would make sense to use the buffered printk
     everywhere in the given section of code even when it is not
     strictly needed.

and I made a draft version for how the code would look like if we try to
avoid the mixing of normal and buffered printk calls. The required changes
seem to be too large to apply tree wide. And I suggested try_buffered_printk().
Tetsuo Handa Nov. 17, 2018, 10:14 a.m. UTC | #21
On 2018/11/12 16:59, Sergey Senozhatsky wrote:
> On (11/09/18 15:10), Petr Mladek wrote:
>>>
>>> If I'm not mistaken, this is for the futute "printk injection" work.
>>
>> The above code only tries to push complete lines to the main log buffer
>> and consoles ASAP. It sounds like a Good Idea(tm).
> 
> Probably it is. So *quite likely* I'm wrong here.

In the future, I want to inject caller information to standard console output,
for syzbot is using serial console

  [    0.000000] Linux version 4.20.0-rc2-next-20181116+ (syzkaller@ci) (gcc version 8.0.1 20180413 (experimental) (GCC)) #120 SMP PREEMPT Fri Nov 16 20:21:42 UTC 2018
  [    0.000000] Command line: BOOT_IMAGE=/vmlinuz root=/dev/sda1 console=ttyS0 earlyprintk=serial vsyscall=native rodata=n oops=panic panic_on_warn=1 nmi_watchdog=panic panic=86400 security=apparmor ima_policy=tcb workqueue.watchdog_thresh=140 kvm-intel.nested=1 nf-conntrack-ftp.ports=20000 nf-conntrack-tftp.ports=20000 nf-conntrack-sip.ports=20000 nf-conntrack-irc.ports=20000 nf-conntrack-sane.ports=20000 vivid.n_devs=16 vivid.multiplanar=1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2 nopcid

and is experiencing confusion due to concurrent printk() like

  [  552.907865] syz-executor0: vmalloc: allocation failure: 256 bytes, mode:0x6000c0(GFP_KERNEL), nodemask=(null)
  [  552.921666] ------------[ cut here ]------------
  [  552.926434] DEBUG_LOCKS_WARN_ON(depth <= 0)
  [  552.926563] WARNING: CPU: 0 PID: 24211 at kernel/locking/lockdep.c:3595 lock_release+0x740/0xa10
  [  552.927489] syz-executor0 cpuset=syz0 mems_allowed=0
  [  552.930893] Kernel panic - not syncing: panic_on_warn set ...
  [  552.943750] CPU: 1 PID: 24208 Comm: syz-executor0 Not tainted 4.20.0-rc1-next-20181109+ #110
  [  552.959345] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  [  552.968694] Call Trace:
  [  552.971296]  dump_stack+0x244/0x39d
  [  552.974934]  ? dump_stack_print_info.cold.1+0x20/0x20
  [  552.980138]  ? rcu_lockdep_current_cpu_online+0x1a4/0x210
  [  552.985685]  warn_alloc.cold.116+0xb7/0x1bd
  [  552.990011]  ? zone_watermark_ok_safe+0x3f0/0x3f0
  [  552.994884]  ? __get_vm_area_node+0x130/0x3a0
  [  552.999387]  ? rcu_read_lock_sched_held+0x14f/0x180
  [  553.004414]  ? __might_fault+0x12b/0x1e0
  [  553.008487]  ? __get_vm_area_node+0x2e5/0x3a0
  [  553.013000]  __vmalloc_node_range+0x472/0x750
  [  553.017514]  ? do_replace+0x23b/0x4c0
  [  553.021327]  vmalloc+0x6f/0x80
  (...snipped...)
  [  553.296333]  __warn.cold.8+0x20/0x45
  [  553.296353]  ? lock_release+0x740/0xa10
  [  553.329854] Node 0 DMA free:15908kB min:164kB low:204kB high:244kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
  [  553.332731]  report_bug+0x254/0x2d0
  [  553.332796]  do_error_trap+0x11b/0x200
  [  553.332822]  do_invalid_op+0x36/0x40
  [  553.336825] lowmem_reserve[]: 0 2818 6321 6321
  [  553.362888]  ? lock_release+0x740/0xa10
  [  553.362906]  invalid_op+0x14/0x20
  [  553.362920] RIP: 0010:lock_release+0x740/0xa10
  [  553.362935] Code: 03 38 d0 7c 08 84 d2 0f 85 da 02 00 00 8b 35 a7 95 b3 08 85 f6 75 15 48 c7 c6 20 66 2b 88 48 c7 c7 c0 33 2b 88 e8 10 36 e7 ff <0f> 0b 48 8b 95 e8 fe ff ff 4c 89 f7 48 8b b5 f0 fe ff ff e8 e8 58
  [  553.362941] RSP: 0018:ffff8801839ef868 EFLAGS: 00010086
  [  553.362950] RAX: 0000000000000000 RBX: 1ffff1003073df12 RCX: ffffc9000be41000
  [  553.362957] RDX: 00000000000087b4 RSI: ffffffff8165ba15 RDI: 0000000000000006
  [  553.362966] RBP: ffff8801839ef998 R08: ffff88017c010600 R09: fffffbfff12b2254
  [  553.362974] R10: fffffbfff12b2254 R11: ffffffff895912a3 R12: ffffffff8b0e27a0
  [  553.362989] R13: ffff8801839ef970 R14: ffff88017c010600 R15: ffff8801839ef8b0
  [  553.373138] Node 0 DMA32 free:2887356kB min:30052kB low:37564kB high:45076kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:3129332kB managed:2888772kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:1416kB local_pcp:0kB free_cma:0kB
  [  553.374209]  ? vprintk_func+0x85/0x181
  [  553.387199] lowmem_reserve[]: 0 0 3503 3503
  [  553.390755]  ? lock_release+0x740/0xa10
  [  553.390840]  ? loop_control_ioctl+0xf5/0x4e0

in https://syzkaller.appspot.com/bug?id=e7838e1e659de1ba566bb17410438b9d1dc59eb7 .

Thus, I appreciate if we could mitigate KERN_CONT problem using try_buffered_printk()
and move on to "caller information injection" topic. I'm fine with starting with
CONFIG_DEBUG_AID_FOR_SYZBOT=y.
diff mbox series

Patch

diff --git a/include/linux/printk.h b/include/linux/printk.h
index cf3eccf..92af345 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -157,6 +157,7 @@  static inline void printk_nmi_direct_enter(void) { }
 static inline void printk_nmi_direct_exit(void) { }
 #endif /* PRINTK_NMI */
 
+struct printk_buffer;
 #ifdef CONFIG_PRINTK
 asmlinkage __printf(5, 0)
 int vprintk_emit(int facility, int level,
@@ -173,6 +174,20 @@  int printk_emit(int facility, int level,
 
 asmlinkage __printf(1, 2) __cold
 int printk(const char *fmt, ...);
+struct printk_buffer *get_printk_buffer(void);
+bool flush_printk_buffer(struct printk_buffer *ptr);
+__printf(2, 3)
+int printk_buffered(struct printk_buffer *ptr, const char *fmt, ...);
+__printf(2, 0)
+int vprintk_buffered(struct printk_buffer *ptr, const char *fmt, va_list args);
+/*
+ * In order to avoid accidentally reusing "ptr" after put_printk_buffer("ptr"),
+ * put_printk_buffer() is defined as a macro which explicitly resets "ptr" to
+ * NULL.
+ */
+void __put_printk_buffer(struct printk_buffer *ptr);
+#define put_printk_buffer(ptr)					\
+	do { __put_printk_buffer(ptr); ptr = NULL; } while (0)
 
 /*
  * Special printk facility for scheduler/timekeeping use only, _DO_NOT_USE_ !
@@ -220,6 +235,17 @@  int printk(const char *s, ...)
 {
 	return 0;
 }
+static inline struct printk_buffer *get_printk_buffer(void)
+{
+	return NULL;
+}
+static inline bool flush_printk_buffer(struct printk_buffer *ptr)
+{
+	return false;
+}
+#define printk_buffered(ptr, fmt, ...) printk(fmt, ##__VA_ARGS__)
+#define vprintk_buffered(ptr, fmt, args) vprintk(fmt, args)
+#define put_printk_buffer(ptr) do { ptr = NULL; } while (0)
 static inline __printf(1, 2) __cold
 int printk_deferred(const char *s, ...)
 {
@@ -330,6 +356,23 @@  static inline void printk_safe_flush_on_panic(void)
 	no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
 #endif
 
+#define bpr_emerg(ptr, fmt, ...) \
+	printk_buffered(ptr, KERN_EMERG pr_fmt(fmt), ##__VA_ARGS__)
+#define bpr_alert(ptr, fmt, ...) \
+	printk_buffered(ptr, KERN_ALERT pr_fmt(fmt), ##__VA_ARGS__)
+#define bpr_crit(ptr, fmt, ...) \
+	printk_buffered(ptr, KERN_CRIT pr_fmt(fmt), ##__VA_ARGS__)
+#define bpr_err(ptr, fmt, ...) \
+	printk_buffered(ptr, KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
+#define bpr_warning(ptr, fmt, ...) \
+	printk_buffered(ptr, KERN_WARNING pr_fmt(fmt), ##__VA_ARGS__)
+#define bpr_warn bpr_warning
+#define bpr_notice(ptr, fmt, ...) \
+	printk_buffered(ptr, KERN_NOTICE pr_fmt(fmt), ##__VA_ARGS__)
+#define bpr_info(ptr, fmt, ...) \
+	printk_buffered(ptr, KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
+#define bpr_cont(ptr, fmt, ...) \
+	printk_buffered(ptr, KERN_CONT fmt, ##__VA_ARGS__)
 
 /* If you are writing a driver, please use dev_dbg instead */
 #if defined(CONFIG_DYNAMIC_DEBUG)
diff --git a/kernel/printk/Makefile b/kernel/printk/Makefile
index 4a2ffc3..23b1547 100644
--- a/kernel/printk/Makefile
+++ b/kernel/printk/Makefile
@@ -1,3 +1,3 @@ 
 obj-y	= printk.o
-obj-$(CONFIG_PRINTK)	+= printk_safe.o
+obj-$(CONFIG_PRINTK)	+= printk_safe.o printk_buffered.o
 obj-$(CONFIG_A11Y_BRAILLE_CONSOLE)	+= braille.o
diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 0f18988..5e8c048 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -22,6 +22,9 @@ 
 #define PRINTK_NMI_DIRECT_CONTEXT_MASK	 0x40000000
 #define PRINTK_NMI_CONTEXT_MASK		 0x80000000
 
+#define PREFIX_MAX		32
+#define LOG_LINE_MAX		(1024 - PREFIX_MAX)
+
 extern raw_spinlock_t logbuf_lock;
 
 __printf(5, 0)
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 1b2a029..0b06211 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -421,9 +421,6 @@  __packed __aligned(4)
 static u64 clear_seq;
 static u32 clear_idx;
 
-#define PREFIX_MAX		32
-#define LOG_LINE_MAX		(1024 - PREFIX_MAX)
-
 #define LOG_LEVEL(v)		((v) & 0x07)
 #define LOG_FACILITY(v)		((v) >> 3 & 0xff)
 
diff --git a/kernel/printk/printk_buffered.c b/kernel/printk/printk_buffered.c
new file mode 100644
index 0000000..d181d31
--- /dev/null
+++ b/kernel/printk/printk_buffered.c
@@ -0,0 +1,179 @@ 
+/* SPDX-License-Identifier: GPL-2.0+ */
+
+#include <linux/types.h> /* DECLARE_BITMAP() */
+#include <linux/printk.h>
+#include "internal.h"
+
+/* A structure for line-buffered printk() output. */
+struct printk_buffer {
+	unsigned short int len; /* Valid bytes in buf[]. */
+	char buf[LOG_LINE_MAX];
+} __aligned(1024);
+
+/*
+ * Number of statically preallocated buffers.
+ *
+ * We can introduce a kernel config option if someone wants to tune this value.
+ * But since "struct printk_buffer" makes difference only when there are
+ * multiple threads concurrently calling printk() which does not end with '\n',
+ * and this API will fallback to normal printk() when all buffers are in use,
+ * it is possible that nobody needs to tune this value.
+ */
+#define NUM_LINE_BUFFERS 16
+
+static struct printk_buffer printk_buffers[NUM_LINE_BUFFERS];
+static DECLARE_BITMAP(printk_buffers_in_use, NUM_LINE_BUFFERS);
+
+/**
+ * get_printk_buffer - Try to get printk_buffer.
+ *
+ * Returns pointer to "struct printk_buffer" on success, NULL otherwise.
+ *
+ * If this function returned "struct printk_buffer", the caller is responsible
+ * for passing it to put_printk_buffer() so that "struct printk_buffer" can be
+ * reused in the future.
+ *
+ * Even if this function returned NULL, the caller does not need to check for
+ * NULL, for passing NULL to printk_buffered() simply acts like normal printk()
+ * and passing NULL to flush_printk_buffer()/put_printk_buffer() is a no-op.
+ */
+struct printk_buffer *get_printk_buffer(void)
+{
+	long i;
+
+	for (i = 0; i < NUM_LINE_BUFFERS; i++) {
+		if (test_and_set_bit_lock(i, printk_buffers_in_use))
+			continue;
+		printk_buffers[i].len = 0;
+		return &printk_buffers[i];
+	}
+	return NULL;
+}
+EXPORT_SYMBOL(get_printk_buffer);
+
+/**
+ * vprintk_buffered - Try to vprintk() in line buffered mode.
+ *
+ * @ptr:  Pointer to "struct printk_buffer". It can be NULL.
+ * @fmt:  printk() format string.
+ * @args: va_list structure.
+ *
+ * Returns the return value of vprintk().
+ *
+ * Try to store to @ptr first. If it fails, flush @ptr and then try to store to
+ * @ptr again. If it still fails, use unbuffered printing.
+ */
+int vprintk_buffered(struct printk_buffer *ptr, const char *fmt, va_list args)
+{
+	va_list tmp_args;
+	int r;
+	int pos;
+
+	if (!ptr)
+		return vprintk(fmt, args);
+	/*
+	 * Skip KERN_CONT here based on an assumption that KERN_CONT will be
+	 * given via "fmt" argument when KERN_CONT is given.
+	 */
+	pos = (printk_get_level(fmt) == 'c') ? 2 : 0;
+	while (true) {
+		va_copy(tmp_args, args);
+		r = vsnprintf(ptr->buf + ptr->len, sizeof(ptr->buf) - ptr->len,
+			      fmt + pos, tmp_args);
+		va_end(tmp_args);
+		if (likely(r + ptr->len < sizeof(ptr->buf)))
+			break;
+		if (!flush_printk_buffer(ptr))
+			return vprintk(fmt, args);
+	}
+	ptr->len += r;
+	/* Flush already completed lines if any. */
+	for (pos = ptr->len - 1; pos >= 0; pos--) {
+		if (ptr->buf[pos] != '\n')
+			continue;
+		ptr->buf[pos++] = '\0';
+		printk("%s\n", ptr->buf);
+		ptr->len -= pos;
+		memmove(ptr->buf, ptr->buf + pos, ptr->len);
+		/* This '\0' will be overwritten by next vsnprintf() above. */
+		ptr->buf[ptr->len] = '\0';
+		break;
+	}
+	return r;
+}
+
+/**
+ * printk_buffered - Try to printk() in line buffered mode.
+ *
+ * @ptr: Pointer to "struct printk_buffer". It can be NULL.
+ * @fmt: printk() format string, followed by arguments.
+ *
+ * Returns the return value of printk().
+ *
+ * Try to store to @ptr first. If it fails, flush @ptr and then try to store to
+ * @ptr again. If it still fails, use unbuffered printing.
+ */
+int printk_buffered(struct printk_buffer *ptr, const char *fmt, ...)
+{
+	va_list args;
+	int r;
+
+	va_start(args, fmt);
+	r = vprintk_buffered(ptr, fmt, args);
+	va_end(args);
+	return r;
+}
+EXPORT_SYMBOL(printk_buffered);
+
+/**
+ * flush_printk_buffer - Flush incomplete line in printk_buffer.
+ *
+ * @ptr: Pointer to "struct printk_buffer". It can be NULL.
+ *
+ * Returns true if flushed something, false otherwise.
+ *
+ * Flush if @ptr contains partial data. But usually there is no need to call
+ * this function because @ptr is flushed by put_printk_buffer().
+ */
+bool flush_printk_buffer(struct printk_buffer *ptr)
+{
+	if (!ptr || !ptr->len)
+		return false;
+	/*
+	 * vprintk_buffered() keeps 0 <= ptr->len < sizeof(ptr->buf) true.
+	 * But ptr->buf[ptr->len] != '\0' if this function is called due to
+	 * vsnprintf() + ptr->len >= sizeof(ptr->buf).
+	 */
+	ptr->buf[ptr->len] = '\0';
+	printk("%s", ptr->buf);
+	ptr->len = 0;
+	return true;
+}
+EXPORT_SYMBOL(flush_printk_buffer);
+
+/**
+ * __put_printk_buffer - Release printk_buffer.
+ *
+ * @ptr: Pointer to "struct printk_buffer". It can be NULL.
+ *
+ * Returns nothing.
+ *
+ * Flush and release @ptr.
+ * Please use put_printk_buffer() in order to catch use-after-free bugs.
+ */
+void __put_printk_buffer(struct printk_buffer *ptr)
+{
+	long i = (unsigned long) ptr - (unsigned long) printk_buffers;
+
+	if (!ptr)
+		return;
+	if (WARN_ON_ONCE(i % sizeof(struct printk_buffer)))
+		return;
+	i /= sizeof(struct printk_buffer);
+	if (WARN_ON_ONCE(i < 0 || i >= NUM_LINE_BUFFERS))
+		return;
+	if (ptr->len)
+		flush_printk_buffer(ptr);
+	clear_bit_unlock(i, printk_buffers_in_use);
+}
+EXPORT_SYMBOL(__put_printk_buffer);