diff mbox

printk: Correctly handle preemption in console_unlock()

Message ID 1484313321-17196-1-git-send-email-pmladek@suse.com (mailing list archive)
State New, archived
Headers show

Commit Message

Petr Mladek Jan. 13, 2017, 1:15 p.m. UTC
Some console drivers code calls console_conditional_schedule()
that looks at @console_may_schedule. The value must be cleared
when the drivers are called from console_unlock() with
interrupts disabled. But rescheduling is fine when the same
code is called, for example, from tty operations where the
console semaphore is taken via console_lock().

This is why @console_may_schedule is cleared before calling console
drivers. The original value is stored to decide if we could sleep
between lines.

Now, @console_may_schedule is not cleared when we call
console_trylock() and jump back to the "again" goto label.
This has become a problem, since the commit 6b97a20d3a7909daa066
("printk: set may_schedule for some of console_trylock() callers").
@console_may_schedule might get enabled now.

There is also the opposite problem. console_lock() can be called
only from preemptive context. It can always enable scheduling in
the console code. But console_trylock() is not able to detect it
when CONFIG_PREEMPT_COUNT is disabled. Therefore we should use the
original @console_may_schedule value after re-acquiring
the console semaphore in console_unlock().

This patch solves both problems by clearing and restoring the very
original @may_schedule setting only around call_console_drivers().

Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Petr Mladek <pmladek@suse.com>
---
This is related to the thread
https://lkml.kernel.org/r/201612261954.FJE69201.OFLVtFJSQFOHMO@I-love.SAKURA.ne.jp

 kernel/printk/printk.c | 25 ++++++++++++-------------
 1 file changed, 12 insertions(+), 13 deletions(-)

Comments

Steven Rostedt Jan. 13, 2017, 4:05 p.m. UTC | #1
On Fri, 13 Jan 2017 14:15:21 +0100
Petr Mladek <pmladek@suse.com> wrote:

> ---
> This is related to the thread
> https://lkml.kernel.org/r/201612261954.FJE69201.OFLVtFJSQFOHMO@I-love.SAKURA.ne.jp
> 
>  kernel/printk/printk.c | 25 ++++++++++++-------------
>  1 file changed, 12 insertions(+), 13 deletions(-)
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 7180088cbb23..2ac54291230d 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2150,7 +2150,7 @@ void console_unlock(void)
>  	static u64 seen_seq;
>  	unsigned long flags;
>  	bool wake_klogd = false;
> -	bool do_cond_resched, retry;
> +	bool may_schedule_orig, retry;

<bike-shedding>
 Hmm, I just hate the name of that variable.
 console_may_schedule_orig, keep the full name?
</bike-shedding>

>  
>  	if (console_suspended) {
>  		up_console_sem();
> @@ -2158,17 +2158,15 @@ void console_unlock(void)
>  	}
>  
>  	/*
> -	 * Console drivers are called under logbuf_lock, so
> -	 * @console_may_schedule should be cleared before; however, we may
> -	 * end up dumping a lot of lines, for example, if called from
> -	 * console registration path, and should invoke cond_resched()
> -	 * between lines if allowable.  Not doing so can cause a very long
> -	 * scheduling stall on a slow console leading to RCU stall and
> -	 * softlockup warnings which exacerbate the issue with more
> -	 * messages practically incapacitating the system.
> +	 * Console drivers are called with interrupts disabled, so
> +	 * @console_may_schedule must be cleared before. The original
> +	 * value must be restored so that we could schedule between lines.
> +	 *
> +	 * console_trylock() is not able to detect the preemptive context when
> +	 * CONFIG_PREEMPT_COUNT is disabled. Therefore the value must be
> +	 * stored before the "again" goto label.
>  	 */
> -	do_cond_resched = console_may_schedule;
> -	console_may_schedule = 0;
> +	may_schedule_orig = console_may_schedule;
>  
>  again:
>  	/*
> @@ -2235,12 +2233,13 @@ void console_unlock(void)
>  		raw_spin_unlock(&logbuf_lock);
>  
>  		stop_critical_timings();	/* don't trace print latency */
> +		console_may_schedule = 0;
>  		call_console_drivers(ext_text, ext_len, text, len);
> +		console_may_schedule = may_schedule_orig;
>  		start_critical_timings();
>  		printk_safe_exit_irqrestore(flags);
>  
> -		if (do_cond_resched)
> -			cond_resched();
> +		console_conditional_schedule();

Makes perfect sense to me. The only thing that worries me is that it
does change the logic slightly, and I'm not sure if this will have any
ramifications with it. That is, console_unlock() use to always leave
with console_may_schedule equal to zero, where console_unlock() clears
it. With this change, console_unlock() no longer clears that variable.
Will that have any side effects that we are unaware of?

-- Steve

>  	}
>  	console_locked = 0;
>  

--
To unsubscribe from this list: send the line "unsubscribe linux-fbdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Petr Mladek Jan. 16, 2017, 11 a.m. UTC | #2
On Fri 2017-01-13 11:05:42, Steven Rostedt wrote:
> On Fri, 13 Jan 2017 14:15:21 +0100
> Petr Mladek <pmladek@suse.com> wrote:
> 
> > ---
> > This is related to the thread
> > https://lkml.kernel.org/r/201612261954.FJE69201.OFLVtFJSQFOHMO@I-love.SAKURA.ne.jp
> > 
> >  kernel/printk/printk.c | 25 ++++++++++++-------------
> >  1 file changed, 12 insertions(+), 13 deletions(-)
> > 
> > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > index 7180088cbb23..2ac54291230d 100644
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -2150,7 +2150,7 @@ void console_unlock(void)
> >  	static u64 seen_seq;
> >  	unsigned long flags;
> >  	bool wake_klogd = false;
> > -	bool do_cond_resched, retry;
> > +	bool may_schedule_orig, retry;
> 
> <bike-shedding>
>  Hmm, I just hate the name of that variable.
>  console_may_schedule_orig, keep the full name?
> </bike-shedding>

No problem. I will use the full name in the next iteration
if there will be one.

> >  
> >  	if (console_suspended) {
> >  		up_console_sem();
> > @@ -2158,17 +2158,15 @@ void console_unlock(void)
> >  	}
> >  
> >  	/*
> > -	 * Console drivers are called under logbuf_lock, so
> > -	 * @console_may_schedule should be cleared before; however, we may
> > -	 * end up dumping a lot of lines, for example, if called from
> > -	 * console registration path, and should invoke cond_resched()
> > -	 * between lines if allowable.  Not doing so can cause a very long
> > -	 * scheduling stall on a slow console leading to RCU stall and
> > -	 * softlockup warnings which exacerbate the issue with more
> > -	 * messages practically incapacitating the system.
> > +	 * Console drivers are called with interrupts disabled, so
> > +	 * @console_may_schedule must be cleared before. The original
> > +	 * value must be restored so that we could schedule between lines.
> > +	 *
> > +	 * console_trylock() is not able to detect the preemptive context when
> > +	 * CONFIG_PREEMPT_COUNT is disabled. Therefore the value must be
> > +	 * stored before the "again" goto label.
> >  	 */
> > -	do_cond_resched = console_may_schedule;
> > -	console_may_schedule = 0;
> > +	may_schedule_orig = console_may_schedule;
> >  
> >  again:
> >  	/*
> > @@ -2235,12 +2233,13 @@ void console_unlock(void)
> >  		raw_spin_unlock(&logbuf_lock);
> >  
> >  		stop_critical_timings();	/* don't trace print latency */
> > +		console_may_schedule = 0;
> >  		call_console_drivers(ext_text, ext_len, text, len);
> > +		console_may_schedule = may_schedule_orig;
> >  		start_critical_timings();
> >  		printk_safe_exit_irqrestore(flags);
> >  
> > -		if (do_cond_resched)
> > -			cond_resched();
> > +		console_conditional_schedule();
> 
> Makes perfect sense to me. The only thing that worries me is that it
> does change the logic slightly, and I'm not sure if this will have any
> ramifications with it. That is, console_unlock() use to always leave
> with console_may_schedule equal to zero, where console_unlock() clears
> it. With this change, console_unlock() no longer clears that variable.
> Will that have any side effects that we are unaware of?

Good question!

If I get it correctly, the variable should never be used without the
console semaphore. IMHO, if it was used without the semaphore or if
it was not set correctly when the semaphore was taken, it would be a
bug. It means that leaving the variable set might actually help
to find a buggy usage if there is any.

My findings:

  + console_may_lock is set only by functions that get the console
    semaphore.

  + The function that takes the semaphore and does not set the
    variable is resume_console(). IMHO, it is a bug.

    We are on the safe side because the function is called from
    the same context as suspend_console() and it allows rescheduling.


  + I am not aware of any use of the variable without the
    semaphore. But it is not easy to prove just be reading
    the code.


Best Regards,
Petr
--
To unsubscribe from this list: send the line "unsubscribe linux-fbdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sergey Senozhatsky Jan. 18, 2017, 5:45 a.m. UTC | #3
On (01/16/17 12:00), Petr Mladek wrote:
[..]
> > Makes perfect sense to me. The only thing that worries me is that it
> > does change the logic slightly, and I'm not sure if this will have any
> > ramifications with it. That is, console_unlock() use to always leave
> > with console_may_schedule equal to zero, where console_unlock() clears
> > it. With this change, console_unlock() no longer clears that variable.
> > Will that have any side effects that we are unaware of?
> 
> Good question!

it does look a bit worrisome.

> If I get it correctly, the variable should never be used without the
> console semaphore. IMHO, if it was used without the semaphore or if
> it was not set correctly when the semaphore was taken, it would be a
> bug. It means that leaving the variable set might actually help
> to find a buggy usage if there is any.
> 
> My findings:
> 
>   + console_may_lock is set only by functions that get the console
>     semaphore.
> 
>   + The function that takes the semaphore and does not set the
>     variable is resume_console(). IMHO, it is a bug.
> 
>     We are on the safe side because the function is called from
>     the same context as suspend_console() and it allows rescheduling.
> 
> 
>   + I am not aware of any use of the variable without the
>     semaphore. But it is not easy to prove just be reading
>     the code.

there is a function that clears @console_may_schedule out of
console_sem scope - console_flush_on_panic().
so I *may be* can think about a worst case scenario of race
condition between
	console_flush_on_panic()->console_may_schedule = 0 on panic CPU
and
	console_unlock()->console_may_schedule = 1 from CPU that panic CPU
failed to stop (smp_send_stop() can return with secondary CPUs still being
online).

thoughts?

	-ss
--
To unsubscribe from this list: send the line "unsubscribe linux-fbdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sergey Senozhatsky Jan. 18, 2017, 7:21 a.m. UTC | #4
On (01/18/17 14:45), Sergey Senozhatsky wrote:
[..]
> 
> there is a function that clears @console_may_schedule out of
> console_sem scope - console_flush_on_panic().
> so I *may be* can think about a worst case scenario of race
> condition between
> 	console_flush_on_panic()->console_may_schedule = 0 on panic CPU
> and
> 	console_unlock()->console_may_schedule = 1 from CPU that panic CPU
> failed to stop (smp_send_stop() can return with secondary CPUs still being
> online).

what I mean, is that we can have, let's say, 2 CPUs spinning in
console_unlock(), both with @console_may_schedule == 1 (because secondary
CPU restores global @console_may_schedule value). now, suppose, we have
misbehaving scheduler (well, we are in panic after all). secondary CPU
will cond_resched() and may be lockup somewhere in the scheduler. which is
fine, we don't care about that secondary CPU anyway. but the same can happen
to panic CPU as well.

what do you think?

	-ss
--
To unsubscribe from this list: send the line "unsubscribe linux-fbdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Petr Mladek Jan. 25, 2017, 12:34 p.m. UTC | #5
On Wed 2017-01-18 16:21:41, Sergey Senozhatsky wrote:
> On (01/18/17 14:45), Sergey Senozhatsky wrote:
> [..]
> > 
> > there is a function that clears @console_may_schedule out of
> > console_sem scope - console_flush_on_panic().
> > so I *may be* can think about a worst case scenario of race
> > condition between
> > 	console_flush_on_panic()->console_may_schedule = 0 on panic CPU
> > and
> > 	console_unlock()->console_may_schedule = 1 from CPU that panic CPU
> > failed to stop (smp_send_stop() can return with secondary CPUs still being
> > online).
> 
> what I mean, is that we can have, let's say, 2 CPUs spinning in
> console_unlock(), both with @console_may_schedule == 1 (because secondary
> CPU restores global @console_may_schedule value). now, suppose, we have
> misbehaving scheduler (well, we are in panic after all). secondary CPU
> will cond_resched() and may be lockup somewhere in the scheduler. which is
> fine, we don't care about that secondary CPU anyway. but the same can happen
> to panic CPU as well.
> 
> what do you think?

Great catch!

console_flush_on_panic() is called after smp_send_stop();
so only one CPU should be running. But it is not guaranteed.

Better be on the safe side. I am going to use a conservative
solution that will only move the "again" goto label.


Just some thoughts for a future work:

The dependencies between console_sem, console_may_schedule,
console_locked, and console_suspended are complex like hell.
There are several surprises.

For example, console_trylock() and console_lock() behave differently
when console_suspended is set. console_trylock() completely fails.
console_lock() succeeds but it does not modify console_locked
and console_may_schedule.

This is the reason why we do not need to check console_suspended
after the "again" goto target.


IMHO, the key to make it more straightforward is to split
console flushing functionality from console_unlock().

It is a bit problematic. console_unlock() guarantees that all
messages are flushed when the semaphore is finally released.
IMHO, it might get more relaxed with some deferred techniques.
The deferred handling is perfectly fine most of the time.
In emergency situations, the console_sem is either available
or we rely on console_flush_on_panic() anyway.

Best Regards,
Petr
--
To unsubscribe from this list: send the line "unsubscribe linux-fbdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 7180088cbb23..2ac54291230d 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2150,7 +2150,7 @@  void console_unlock(void)
 	static u64 seen_seq;
 	unsigned long flags;
 	bool wake_klogd = false;
-	bool do_cond_resched, retry;
+	bool may_schedule_orig, retry;
 
 	if (console_suspended) {
 		up_console_sem();
@@ -2158,17 +2158,15 @@  void console_unlock(void)
 	}
 
 	/*
-	 * Console drivers are called under logbuf_lock, so
-	 * @console_may_schedule should be cleared before; however, we may
-	 * end up dumping a lot of lines, for example, if called from
-	 * console registration path, and should invoke cond_resched()
-	 * between lines if allowable.  Not doing so can cause a very long
-	 * scheduling stall on a slow console leading to RCU stall and
-	 * softlockup warnings which exacerbate the issue with more
-	 * messages practically incapacitating the system.
+	 * Console drivers are called with interrupts disabled, so
+	 * @console_may_schedule must be cleared before. The original
+	 * value must be restored so that we could schedule between lines.
+	 *
+	 * console_trylock() is not able to detect the preemptive context when
+	 * CONFIG_PREEMPT_COUNT is disabled. Therefore the value must be
+	 * stored before the "again" goto label.
 	 */
-	do_cond_resched = console_may_schedule;
-	console_may_schedule = 0;
+	may_schedule_orig = console_may_schedule;
 
 again:
 	/*
@@ -2235,12 +2233,13 @@  void console_unlock(void)
 		raw_spin_unlock(&logbuf_lock);
 
 		stop_critical_timings();	/* don't trace print latency */
+		console_may_schedule = 0;
 		call_console_drivers(ext_text, ext_len, text, len);
+		console_may_schedule = may_schedule_orig;
 		start_critical_timings();
 		printk_safe_exit_irqrestore(flags);
 
-		if (do_cond_resched)
-			cond_resched();
+		console_conditional_schedule();
 	}
 	console_locked = 0;