diff mbox

[RFC] x86, mce: change the mce notifier to 'blocking' from 'atomic'

Message ID 20170412224232.GM3956@linux.vnet.ibm.com (mailing list archive)
State New, archived
Headers show

Commit Message

Paul E. McKenney April 12, 2017, 10:42 p.m. UTC
On Wed, Apr 12, 2017 at 11:50:45PM +0200, Thomas Gleixner wrote:
> On Wed, 12 Apr 2017, Borislav Petkov wrote:
> 
> > On Wed, Apr 12, 2017 at 08:27:05PM +0000, Verma, Vishal L wrote:
> > > But isn't the atomic notifier call chain always called in atomic
> > > context?
> > 
> > No, it isn't. We're calling it in normal process context in
> > mce_gen_pool_process() too.
> > 
> > So this early exit will avoid any sleeping in atomic context. And since
> > there's nothing you can do about the errors reported in atomic context,
> > we can actually use that fact.
> 
> No, you can't.
> 
> CONFIG_RCU_PREEMPT=n + CONFIG_PREEMPT_COUNT will disable preemption from
> within __atomic_notifier_call_chain() via rcu_read_lock(). Ergo you wont
> ever enter the handler.
> 
> The behaviour in the RCU code is inconsistent. CONFIG_RCU_PREEMPT=y does
> obviouly not disable preemption, but it should still trigger the
> might_sleep() check when a blocking function is called from within a rcu
> read side critical section.

Maybe something like the (untested) patch below.  Please note that this
would need some help to work correctly in -rt.  This applies only against
-rcu tip, but in that case you can just get it directly from -rcu.

							Thanx, Paul

------------------------------------------------------------------------

commit 122acec803471468d8a453d08219ca2fc94f5556
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Wed Apr 12 15:29:14 2017 -0700

    rcu: Complain if blocking in preemptible RCU read-side critical section
    
    Although preemptible RCU allows its read-side critical sections to be
    preempted, general blocking is forbidden.  The reason for this is that
    excessive preemption times can be handled by CONFIG_RCU_BOOST=y, but a
    voluntarily blocked task doesn't care how high you boost its priority.
    Because preemptible RCU is a global mechanism, one ill-behaved reader
    hurts everyone.  Hence the prohibition against general blocking in
    RCU-preempt read-side critical sections.  Preemption yes, blocking no.
    
    This commit enforces this prohibition.
    
    There is a special exception for the -rt patchset (which they kindly
    volunteered to implement):  It is OK to block (as opposed to merely being
    preempted) within an RCU-preempt read-side critical section, but only if
    the blocking is subject to priority inheritance.  This exception permits
    CONFIG_RCU_BOOST=y to get -rt RCU readers out of trouble.
    
    Why doesn't this exception also apply to mainline's rt_mutex?  Because
    of the possibility that someone does general blocking while holding
    an rt_mutex.  Yes, the priority boosting will affect the rt_mutex,
    but it won't help with the task doing general blocking while holding
    that rt_mutex.
    
    Reported-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

Comments

Paul E. McKenney April 12, 2017, 11:45 p.m. UTC | #1
On Wed, Apr 12, 2017 at 03:42:32PM -0700, Paul E. McKenney wrote:
> On Wed, Apr 12, 2017 at 11:50:45PM +0200, Thomas Gleixner wrote:
> > On Wed, 12 Apr 2017, Borislav Petkov wrote:
> > 
> > > On Wed, Apr 12, 2017 at 08:27:05PM +0000, Verma, Vishal L wrote:
> > > > But isn't the atomic notifier call chain always called in atomic
> > > > context?
> > > 
> > > No, it isn't. We're calling it in normal process context in
> > > mce_gen_pool_process() too.
> > > 
> > > So this early exit will avoid any sleeping in atomic context. And since
> > > there's nothing you can do about the errors reported in atomic context,
> > > we can actually use that fact.
> > 
> > No, you can't.
> > 
> > CONFIG_RCU_PREEMPT=n + CONFIG_PREEMPT_COUNT will disable preemption from
> > within __atomic_notifier_call_chain() via rcu_read_lock(). Ergo you wont
> > ever enter the handler.
> > 
> > The behaviour in the RCU code is inconsistent. CONFIG_RCU_PREEMPT=y does
> > obviouly not disable preemption, but it should still trigger the
> > might_sleep() check when a blocking function is called from within a rcu
> > read side critical section.
> 
> Maybe something like the (untested) patch below.  Please note that this
> would need some help to work correctly in -rt.  This applies only against
> -rcu tip, but in that case you can just get it directly from -rcu.

So I injected a schedule_timeout_interruptible() into rcutorture's RCU
read-side critical section, and the patch complained as expected.  But is
also got a "scheduling while atomic" and a "DEBUG_LOCKS_WARN_ON(val >
preempt_count())" and a warning at "kernel/time/timer.c:1275", which
is this:

	if (count != preempt_count()) {
		WARN_ONCE(1, "timer: %pF preempt leak: %08x -> %08x\n",
			  fn, count, preempt_count());
		/*
		 * Restore the preempt count. That gives us a decent
		 * chance to survive and extract information. If the
		 * callback kept a lock held, bad luck, but not worse
		 * than the BUG() we had.
		 */
		preempt_count_set(count);
	}

So you are saying that you are seeing blocking in RCU-preempt read-side
critical sections being ignored?

Here is the Kconfig fragment used by this test:

	CONFIG_SMP=y
	CONFIG_NR_CPUS=8
	CONFIG_PREEMPT_NONE=n
	CONFIG_PREEMPT_VOLUNTARY=n
	CONFIG_PREEMPT=y
	#CHECK#CONFIG_PREEMPT_RCU=y
	CONFIG_HZ_PERIODIC=n
	CONFIG_NO_HZ_IDLE=y
	CONFIG_NO_HZ_FULL=n
	CONFIG_RCU_FAST_NO_HZ=n
	CONFIG_RCU_TRACE=n
	CONFIG_HOTPLUG_CPU=n
	CONFIG_SUSPEND=n
	CONFIG_HIBERNATION=n
	CONFIG_RCU_FANOUT=3
	CONFIG_RCU_FANOUT_LEAF=3
	CONFIG_RCU_NOCB_CPU=n
	CONFIG_DEBUG_LOCK_ALLOC=y
	CONFIG_PROVE_LOCKING=n
	CONFIG_RCU_BOOST=n
	CONFIG_RCU_EXPERT=y
	CONFIG_RCU_TORTURE_TEST_SLOW_CLEANUP=y
	CONFIG_RCU_TORTURE_TEST_SLOW_INIT=y
	CONFIG_RCU_TORTURE_TEST_SLOW_PREINIT=y
	CONFIG_DEBUG_OBJECTS=y
	CONFIG_DEBUG_OBJECTS_RCU_HEAD=y

I will run other scenarios overnight.

							Thanx, Paul
Paul E. McKenney April 13, 2017, 2:34 p.m. UTC | #2
On Wed, Apr 12, 2017 at 04:45:59PM -0700, Paul E. McKenney wrote:
> On Wed, Apr 12, 2017 at 03:42:32PM -0700, Paul E. McKenney wrote:
> > On Wed, Apr 12, 2017 at 11:50:45PM +0200, Thomas Gleixner wrote:
> > > On Wed, 12 Apr 2017, Borislav Petkov wrote:
> > > 
> > > > On Wed, Apr 12, 2017 at 08:27:05PM +0000, Verma, Vishal L wrote:
> > > > > But isn't the atomic notifier call chain always called in atomic
> > > > > context?
> > > > 
> > > > No, it isn't. We're calling it in normal process context in
> > > > mce_gen_pool_process() too.
> > > > 
> > > > So this early exit will avoid any sleeping in atomic context. And since
> > > > there's nothing you can do about the errors reported in atomic context,
> > > > we can actually use that fact.
> > > 
> > > No, you can't.
> > > 
> > > CONFIG_RCU_PREEMPT=n + CONFIG_PREEMPT_COUNT will disable preemption from
> > > within __atomic_notifier_call_chain() via rcu_read_lock(). Ergo you wont
> > > ever enter the handler.
> > > 
> > > The behaviour in the RCU code is inconsistent. CONFIG_RCU_PREEMPT=y does
> > > obviouly not disable preemption, but it should still trigger the
> > > might_sleep() check when a blocking function is called from within a rcu
> > > read side critical section.
> > 
> > Maybe something like the (untested) patch below.  Please note that this
> > would need some help to work correctly in -rt.  This applies only against
> > -rcu tip, but in that case you can just get it directly from -rcu.
> 
> So I injected a schedule_timeout_interruptible() into rcutorture's RCU
> read-side critical section, and the patch complained as expected.  But is
> also got a "scheduling while atomic" and a "DEBUG_LOCKS_WARN_ON(val >
> preempt_count())" and a warning at "kernel/time/timer.c:1275", which
> is this:
> 
> 	if (count != preempt_count()) {
> 		WARN_ONCE(1, "timer: %pF preempt leak: %08x -> %08x\n",
> 			  fn, count, preempt_count());
> 		/*
> 		 * Restore the preempt count. That gives us a decent
> 		 * chance to survive and extract information. If the
> 		 * callback kept a lock held, bad luck, but not worse
> 		 * than the BUG() we had.
> 		 */
> 		preempt_count_set(count);
> 	}
> 
> So you are saying that you are seeing blocking in RCU-preempt read-side
> critical sections being ignored?
> 
> Here is the Kconfig fragment used by this test:
> 
> 	CONFIG_SMP=y
> 	CONFIG_NR_CPUS=8
> 	CONFIG_PREEMPT_NONE=n
> 	CONFIG_PREEMPT_VOLUNTARY=n
> 	CONFIG_PREEMPT=y
> 	#CHECK#CONFIG_PREEMPT_RCU=y
> 	CONFIG_HZ_PERIODIC=n
> 	CONFIG_NO_HZ_IDLE=y
> 	CONFIG_NO_HZ_FULL=n
> 	CONFIG_RCU_FAST_NO_HZ=n
> 	CONFIG_RCU_TRACE=n
> 	CONFIG_HOTPLUG_CPU=n
> 	CONFIG_SUSPEND=n
> 	CONFIG_HIBERNATION=n
> 	CONFIG_RCU_FANOUT=3
> 	CONFIG_RCU_FANOUT_LEAF=3
> 	CONFIG_RCU_NOCB_CPU=n
> 	CONFIG_DEBUG_LOCK_ALLOC=y
> 	CONFIG_PROVE_LOCKING=n
> 	CONFIG_RCU_BOOST=n
> 	CONFIG_RCU_EXPERT=y
> 	CONFIG_RCU_TORTURE_TEST_SLOW_CLEANUP=y
> 	CONFIG_RCU_TORTURE_TEST_SLOW_INIT=y
> 	CONFIG_RCU_TORTURE_TEST_SLOW_PREINIT=y
> 	CONFIG_DEBUG_OBJECTS=y
> 	CONFIG_DEBUG_OBJECTS_RCU_HEAD=y
> 
> I will run other scenarios overnight.

Well, that was an extremely poor choice of scenario to test.  Yes, it
has CONFIG_PREEMPT_RCU=y, but it tests RCU-bh because the TREE01.boot
file contains "rcutorture.torture_type=rcu_bh maxcpus=8".  Hence all the
"scheduling while atomic" and other complaints.  :-/

Without the patch, RCU-preempt does not complain, as reported, as
shown below.  Other RCU flavors always give "scheduling while atomic".
With the patch, all flavors always complain, including RCU-preempt.

TREE01 ------- 18589 grace periods (103.272 per second)
BUG: 2028 Reader Batch close calls in 3 minute run: /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2017.04.12-17:51:14/TREE01
BUG: FAILURE, 1982 instances
WARNING: Assertion failure in /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2017.04.12-17:51:14/TREE01/console.log
WARNING: Summary: Warnings: 2 Bugs: 2033 Call Traces: 2034
----> Lots of:
	"scheduling while atomic"
	"DEBUG_LOCKS_WARN_ON(val > preempt_count())"
----> Preemptible, but testing RCU-bh, thus blocking makes too-short GP.
      Failures are thus expected behavior.
      CONFIG_HOTPLUG_CPU=y, CONFIG_MAXSMP=y, CONFIG_CPUMASK_OFFSTACK=y,
      CONFIG_RCU_NOCB_CPU_ZERO=y.

TREE02 ------- 3270 grace periods (18.1667 per second)
----> Preemptible, CONFIG_HOTPLUG_CPU=n, CONFIG_DEBUG_LOCK_ALLOC=y,
      CONFIG_PROVE_LOCKING=n.  No complaints.

TREE03 ------- 2963 grace periods (16.4611 per second)
CPU count limited from 16 to 8  (Yeah, yeah, wimpy laptop.)
----> Preemptible, CONFIG_HOTPLUG_CPU=y, CONFIG_RCU_BOOST=y,
      CONFIG_RCU_KTHREAD_PRIO=2.  Again, no complaints.

TREE04 ------- 272 grace periods (1.51111 per second)
BUG: 33 Reader Batch close calls in 3 minute run: /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2017.04.12-17:51:14/TREE04
BUG: FAILURE, 72 instances
WARNING: Assertion failure in /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2017.04.12-17:51:14/TREE04/console.log
WARNING: Summary: Warnings: 1 Bugs: 30 Call Traces: 31 Stalls: 1
!!! PID 29727 hung at 360 vs. 180 seconds
----> Nonpreemptible, so failures are expected behavior, especially given that
      this is testing RCU-bh.
      "scheduling while atomic"

TREE05 ------- 4520 grace periods (25.1111 per second)
BUG: 1770 Reader Batch close calls in 3 minute run: /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2017.04.12-17:51:14/TREE05
BUG: FAILURE, 1358 instances
WARNING: Assertion failure in /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2017.04.12-17:51:14/TREE05/console.log
WARNING: Summary: Warnings: 1 Bugs: 2 Call Traces: 37 Stalls: 14 Starves: 2
!!! PID 13911 hung at 370 vs. 180 seconds
----> Nonpreemptible, so failures are expected behavior.  Also testing RCU-sched.
      "scheduling while atomic"

TREE06 ------- 4012 grace periods (22.2889 per second)
BUG: 2314 Reader Batch close calls in 3 minute run: /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2017.04.12-17:51:14/TREE06
BUG: FAILURE, 1783 instances
WARNING: Assertion failure in /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2017.04.12-17:51:14/TREE06/console.log
WARNING: Summary: Warnings: 3 Bugs: 1 Call Traces: 4
----> Nonpreemptible, so failures are expected behavior.
      CONFIG_PROVE_RCU=y, hence:
      "Illegal context switch in RCU read-side critical section!"
      Also:
      "scheduling while atomic"

TREE07 ------- 1853 grace periods (10.2944 per second)
BUG: 278 Reader Batch close calls in 3 minute run: /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2017.04.12-17:51:14/TREE07
BUG: FAILURE, 621 instances
WARNING: Assertion failure in /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2017.04.12-17:51:14/TREE07/console.log
WARNING: Summary: Warnings: 1 Call Traces: 1
CPU count limited from 16 to 8
----> Nonpreemptible, so failures are expected behavior.
      But no complaints other than from rcutorture.

TREE08 ------- 2725 grace periods (15.1389 per second)
BUG: 2265 Reader Batch close calls in 3 minute run: /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2017.04.12-17:51:14/TREE08
BUG: FAILURE, 1152 instances
WARNING: Assertion failure in /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2017.04.12-17:51:14/TREE08/console.log
WARNING: Summary: Warnings: 3 Bugs: 4290 Call Traces: 4292
----> Lots of:
	"scheduling while atomic"
	"DEBUG_LOCKS_WARN_ON(val > preempt_count())"
----> Preemptible, but testing RCU-bh, so failures are expected behavior.
      CONFIG_HOTPLUG_CPU=n, CONFIG_PROVE_LOCKING=n, CONFIG_RCU_EQS_DEBUG=y.

TREE09 ------- 12148 grace periods (67.4889 per second)
----> Preemptible.  Also !SMP.  No complaints.

------------------------------------------------------------------------

Now with the patch, we always get a splat no matter what:

TREE01 ------- 12418 grace periods (68.9889 per second)
BUG: 2768 Reader Batch close calls in 3 minute run: /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2017.04.12-21:13:30/TREE01
BUG: FAILURE, 2692 instances
WARNING: Assertion failure in /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2017.04.12-21:13:30/TREE01/console.log
WARNING: Summary: Warnings: 2 Bugs: 2786 Call Traces: 2787

TREE02 ------- 3221 grace periods (17.8944 per second)
WARNING: Assertion failure in /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2017.04.12-21:13:30/TREE02/console.log
WARNING: Summary: Warnings: 1 Call Traces: 1

TREE03 ------- 3129 grace periods (17.3833 per second)
WARNING: Assertion failure in /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2017.04.12-21:13:30/TREE03/console.log
WARNING: Summary: Warnings: 1 Call Traces: 1
CPU count limited from 16 to 8

TREE04 ------- 497 grace periods (2.76111 per second)
BUG: 38 Reader Batch close calls in 3 minute run: /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2017.04.12-21:13:30/TREE04
BUG: FAILURE, 99 instances
WARNING: Assertion failure in /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2017.04.12-21:13:30/TREE04/console.log
WARNING: Summary: Warnings: 1 Bugs: 34 Call Traces: 35 Stalls: 1
!!! PID 12749 hung at 360 vs. 180 seconds

TREE05 ------- 5793 grace periods (32.1833 per second)
BUG: 2138 Reader Batch close calls in 3 minute run: /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2017.04.12-21:13:30/TREE05
BUG: FAILURE, 1672 instances
WARNING: Assertion failure in /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2017.04.12-21:13:30/TREE05/console.log
WARNING: Summary: Warnings: 1 Call Traces: 1

TREE06 ------- 4072 grace periods (22.6222 per second)
BUG: 2339 Reader Batch close calls in 3 minute run: /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2017.04.12-21:13:30/TREE06
BUG: FAILURE, 1762 instances
WARNING: Assertion failure in /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2017.04.12-21:13:30/TREE06/console.log
WARNING: Summary: Warnings: 2 Call Traces: 2

TREE07 ------- 1863 grace periods (10.35 per second)
BUG: 288 Reader Batch close calls in 3 minute run: /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2017.04.12-21:13:30/TREE07
BUG: FAILURE, 590 instances
WARNING: Assertion failure in /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2017.04.12-21:13:30/TREE07/console.log
WARNING: Summary: Warnings: 1 Call Traces: 1
CPU count limited from 16 to 8

TREE08 ------- 2628 grace periods (14.6 per second)
BUG: 2203 Reader Batch close calls in 3 minute run: /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2017.04.12-21:13:30/TREE08
BUG: FAILURE, 1086 instances
   
WARNING: Assertion failure in /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2017.04.12-21:13:30/TREE08/console.log
WARNING: Summary: Warnings: 2 Bugs: 4265 Call Traces: 4266

TREE09 ------- 12109 grace periods (67.2722 per second)
WARNING: Assertion failure in /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2017.04.12-21:13:30/TREE09/console.log
WARNING: Summary: Warnings: 1 Call Traces: 1

------------------------------------------------------------------------

So I have the patch queued for 4.13.

							Thanx, Paul
diff mbox

Patch

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index d013bd4767a7..abc09d368b3a 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -465,7 +465,7 @@  void rcu_note_context_switch(bool preempt)
 	barrier(); /* Avoid RCU read-side critical sections leaking down. */
 	trace_rcu_utilization(TPS("Start context switch"));
 	rcu_sched_qs();
-	rcu_preempt_note_context_switch();
+	rcu_preempt_note_context_switch(preempt);
 	/* Load rcu_urgent_qs before other flags. */
 	if (!smp_load_acquire(this_cpu_ptr(&rcu_dynticks.rcu_urgent_qs)))
 		goto out;
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 0e598ab08fea..781fe684f230 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -476,7 +476,7 @@  DECLARE_PER_CPU(char, rcu_cpu_has_work);
 
 /* Forward declarations for rcutree_plugin.h */
 static void rcu_bootup_announce(void);
-static void rcu_preempt_note_context_switch(void);
+static void rcu_preempt_note_context_switch(bool preempt);
 static int rcu_preempt_blocked_readers_cgp(struct rcu_node *rnp);
 #ifdef CONFIG_HOTPLUG_CPU
 static bool rcu_preempt_has_tasks(struct rcu_node *rnp);
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 6d8f7f82259c..67a90158f32e 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -286,12 +286,13 @@  static void rcu_preempt_qs(void)
  *
  * Caller must disable interrupts.
  */
-static void rcu_preempt_note_context_switch(void)
+static void rcu_preempt_note_context_switch(bool preempt)
 {
 	struct task_struct *t = current;
 	struct rcu_data *rdp;
 	struct rcu_node *rnp;
 
+	WARN_ON_ONCE(!preempt && t->rcu_read_lock_nesting > 0);
 	if (t->rcu_read_lock_nesting > 0 &&
 	    !t->rcu_read_unlock_special.b.blocked) {
 
@@ -738,7 +739,7 @@  static void __init rcu_bootup_announce(void)
  * Because preemptible RCU does not exist, we never have to check for
  * CPUs being in quiescent states.
  */
-static void rcu_preempt_note_context_switch(void)
+static void rcu_preempt_note_context_switch(bool preempt)
 {
 }