diff mbox series

[5/6] doc/rcutorture: Add description of rcutorture.stall_cpu_block

Message ID 20230510171238.2189921-5-paulmck@kernel.org (mailing list archive)
State Accepted
Commit 3cbc1c327de116a41475deab665b5490ea2583f9
Headers show
Series Torture-test updates for v6.5 | expand

Commit Message

Paul E. McKenney May 10, 2023, 5:12 p.m. UTC
From: Zqiang <qiang1.zhang@intel.com>

If you build a kernel with CONFIG_PREEMPTION=n and CONFIG_PREEMPT_COUNT=y,
then run the rcutorture tests specifying stalls as follows:

runqemu kvm slirp nographic qemuparams="-m 1024 -smp 4" \
	bootparams="console=ttyS0 rcutorture.stall_cpu=30 \
	rcutorture.stall_no_softlockup=1 rcutorture.stall_cpu_block=1" -d

The tests will produce the following splat:

[   10.841071] rcu-torture: rcu_torture_stall begin CPU stall
[   10.841073] rcu_torture_stall start on CPU 3.
[   10.841077] BUG: scheduling while atomic: rcu_torture_sta/66/0x0000000
....
[   10.841108] Call Trace:
[   10.841110]  <TASK>
[   10.841112]  dump_stack_lvl+0x64/0xb0
[   10.841118]  dump_stack+0x10/0x20
[   10.841121]  __schedule_bug+0x8b/0xb0
[   10.841126]  __schedule+0x2172/0x2940
[   10.841157]  schedule+0x9b/0x150
[   10.841160]  schedule_timeout+0x2e8/0x4f0
[   10.841192]  schedule_timeout_uninterruptible+0x47/0x50
[   10.841195]  rcu_torture_stall+0x2e8/0x300
[   10.841199]  kthread+0x175/0x1a0
[   10.841206]  ret_from_fork+0x2c/0x50

This is because the rcutorture.stall_cpu_block=1 module parameter causes
rcu_torture_stall() to invoke schedule_timeout_uninterruptible() within
an RCU read-side critical section.  This in turn results in a quiescent
state (which prevents the stall) and a sleep in an atomic context (which
produces the above splat).

Although this code is operating as designed, the design has proven to
be counterintuitive to many.  This commit therefore updates the description
in kernel-parameters.txt accordingly.

Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 Documentation/admin-guide/kernel-parameters.txt | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

Comments

Joel Fernandes May 11, 2023, 5:47 a.m. UTC | #1
On Wed, May 10, 2023 at 10:12 AM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> From: Zqiang <qiang1.zhang@intel.com>
>
> If you build a kernel with CONFIG_PREEMPTION=n and CONFIG_PREEMPT_COUNT=y,
> then run the rcutorture tests specifying stalls as follows:
>
> runqemu kvm slirp nographic qemuparams="-m 1024 -smp 4" \
>         bootparams="console=ttyS0 rcutorture.stall_cpu=30 \
>         rcutorture.stall_no_softlockup=1 rcutorture.stall_cpu_block=1" -d
>
> The tests will produce the following splat:
>
> [   10.841071] rcu-torture: rcu_torture_stall begin CPU stall
> [   10.841073] rcu_torture_stall start on CPU 3.
> [   10.841077] BUG: scheduling while atomic: rcu_torture_sta/66/0x0000000
> ....
> [   10.841108] Call Trace:
> [   10.841110]  <TASK>
> [   10.841112]  dump_stack_lvl+0x64/0xb0
> [   10.841118]  dump_stack+0x10/0x20
> [   10.841121]  __schedule_bug+0x8b/0xb0
> [   10.841126]  __schedule+0x2172/0x2940
> [   10.841157]  schedule+0x9b/0x150
> [   10.841160]  schedule_timeout+0x2e8/0x4f0
> [   10.841192]  schedule_timeout_uninterruptible+0x47/0x50
> [   10.841195]  rcu_torture_stall+0x2e8/0x300
> [   10.841199]  kthread+0x175/0x1a0
> [   10.841206]  ret_from_fork+0x2c/0x50

Another way to get rid of the warning would be to replace the
cur_ops->readlock() with rcu_read_lock(). Though perhaps that will not
test whether the particular RCU flavor under testing is capable of
causing a stall :-).

>         rcutorture.stall_cpu_block= [KNL]
>                         Sleep while stalling if set.  This will result
> -                       in warnings from preemptible RCU in addition
> -                       to any other stall-related activity.
> +                       in warnings from preemptible RCU in addition to
> +                       any other stall-related activity.  Note that
> +                       in kernels built with CONFIG_PREEMPTION=n and
> +                       CONFIG_PREEMPT_COUNT=y, this parameter will
> +                       cause the CPU to pass through a quiescent state.
> +                       Any such quiescent states will suppress RCU CPU
> +                       stall warnings, but the time-based sleep will
> +                       also result in scheduling-while-atomic splats.

Could change last part to "but may also result in
scheduling-while-atomic splats as preemption might be disabled for
certain RCU flavors in order to cause the stall".

> +                       Which might or might not be what you want.
> +

Suggest drop this line ;-).

 - Joel

>         rcutorture.stall_cpu_holdoff= [KNL]
>                         Time to wait (s) after boot before inducing stall.
> --
> 2.40.1
>
Paul E. McKenney May 11, 2023, 6:11 p.m. UTC | #2
On Wed, May 10, 2023 at 10:47:36PM -0700, Joel Fernandes wrote:
> On Wed, May 10, 2023 at 10:12 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > From: Zqiang <qiang1.zhang@intel.com>
> >
> > If you build a kernel with CONFIG_PREEMPTION=n and CONFIG_PREEMPT_COUNT=y,
> > then run the rcutorture tests specifying stalls as follows:
> >
> > runqemu kvm slirp nographic qemuparams="-m 1024 -smp 4" \
> >         bootparams="console=ttyS0 rcutorture.stall_cpu=30 \
> >         rcutorture.stall_no_softlockup=1 rcutorture.stall_cpu_block=1" -d
> >
> > The tests will produce the following splat:
> >
> > [   10.841071] rcu-torture: rcu_torture_stall begin CPU stall
> > [   10.841073] rcu_torture_stall start on CPU 3.
> > [   10.841077] BUG: scheduling while atomic: rcu_torture_sta/66/0x0000000
> > ....
> > [   10.841108] Call Trace:
> > [   10.841110]  <TASK>
> > [   10.841112]  dump_stack_lvl+0x64/0xb0
> > [   10.841118]  dump_stack+0x10/0x20
> > [   10.841121]  __schedule_bug+0x8b/0xb0
> > [   10.841126]  __schedule+0x2172/0x2940
> > [   10.841157]  schedule+0x9b/0x150
> > [   10.841160]  schedule_timeout+0x2e8/0x4f0
> > [   10.841192]  schedule_timeout_uninterruptible+0x47/0x50
> > [   10.841195]  rcu_torture_stall+0x2e8/0x300
> > [   10.841199]  kthread+0x175/0x1a0
> > [   10.841206]  ret_from_fork+0x2c/0x50
> 
> Another way to get rid of the warning would be to replace the
> cur_ops->readlock() with rcu_read_lock(). Though perhaps that will not
> test whether the particular RCU flavor under testing is capable of
> causing a stall :-).

Exactly!

> >         rcutorture.stall_cpu_block= [KNL]
> >                         Sleep while stalling if set.  This will result
> > -                       in warnings from preemptible RCU in addition
> > -                       to any other stall-related activity.
> > +                       in warnings from preemptible RCU in addition to
> > +                       any other stall-related activity.  Note that
> > +                       in kernels built with CONFIG_PREEMPTION=n and
> > +                       CONFIG_PREEMPT_COUNT=y, this parameter will
> > +                       cause the CPU to pass through a quiescent state.
> > +                       Any such quiescent states will suppress RCU CPU
> > +                       stall warnings, but the time-based sleep will
> > +                       also result in scheduling-while-atomic splats.
> 
> Could change last part to "but may also result in
> scheduling-while-atomic splats as preemption might be disabled for
> certain RCU flavors in order to cause the stall".

Is that needed given the earlier "in kernels built with
CONFIG_PREEMPTION=n and CONFIG_PREEMPT_COUNT=y"?

> > +                       Which might or might not be what you want.
> > +
> 
> Suggest drop this line ;-).

OK, I will bite.  ;-)

What is your concern with this line?

							Thanx, Paul

>  - Joel
> 
> >         rcutorture.stall_cpu_holdoff= [KNL]
> >                         Time to wait (s) after boot before inducing stall.
> > --
> > 2.40.1
> >
Joel Fernandes May 12, 2023, 5 a.m. UTC | #3
On Thu, May 11, 2023 at 11:11 AM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Wed, May 10, 2023 at 10:47:36PM -0700, Joel Fernandes wrote:
> > On Wed, May 10, 2023 at 10:12 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > >
> > > From: Zqiang <qiang1.zhang@intel.com>
> > >
> > > If you build a kernel with CONFIG_PREEMPTION=n and CONFIG_PREEMPT_COUNT=y,
> > > then run the rcutorture tests specifying stalls as follows:
> > >
> > > runqemu kvm slirp nographic qemuparams="-m 1024 -smp 4" \
> > >         bootparams="console=ttyS0 rcutorture.stall_cpu=30 \
> > >         rcutorture.stall_no_softlockup=1 rcutorture.stall_cpu_block=1" -d
> > >
> > > The tests will produce the following splat:
> > >
> > > [   10.841071] rcu-torture: rcu_torture_stall begin CPU stall
> > > [   10.841073] rcu_torture_stall start on CPU 3.
> > > [   10.841077] BUG: scheduling while atomic: rcu_torture_sta/66/0x0000000
> > > ....
> > > [   10.841108] Call Trace:
> > > [   10.841110]  <TASK>
> > > [   10.841112]  dump_stack_lvl+0x64/0xb0
> > > [   10.841118]  dump_stack+0x10/0x20
> > > [   10.841121]  __schedule_bug+0x8b/0xb0
> > > [   10.841126]  __schedule+0x2172/0x2940
> > > [   10.841157]  schedule+0x9b/0x150
> > > [   10.841160]  schedule_timeout+0x2e8/0x4f0
> > > [   10.841192]  schedule_timeout_uninterruptible+0x47/0x50
> > > [   10.841195]  rcu_torture_stall+0x2e8/0x300
> > > [   10.841199]  kthread+0x175/0x1a0
> > > [   10.841206]  ret_from_fork+0x2c/0x50
> >
> > Another way to get rid of the warning would be to replace the
> > cur_ops->readlock() with rcu_read_lock(). Though perhaps that will not
> > test whether the particular RCU flavor under testing is capable of
> > causing a stall :-).
>
> Exactly!
>
> > >         rcutorture.stall_cpu_block= [KNL]
> > >                         Sleep while stalling if set.  This will result
> > > -                       in warnings from preemptible RCU in addition
> > > -                       to any other stall-related activity.
> > > +                       in warnings from preemptible RCU in addition to
> > > +                       any other stall-related activity.  Note that
> > > +                       in kernels built with CONFIG_PREEMPTION=n and
> > > +                       CONFIG_PREEMPT_COUNT=y, this parameter will
> > > +                       cause the CPU to pass through a quiescent state.
> > > +                       Any such quiescent states will suppress RCU CPU
> > > +                       stall warnings, but the time-based sleep will
> > > +                       also result in scheduling-while-atomic splats.
> >
> > Could change last part to "but may also result in
> > scheduling-while-atomic splats as preemption might be disabled for
> > certain RCU flavors in order to cause the stall".
>
> Is that needed given the earlier "in kernels built with
> CONFIG_PREEMPTION=n and CONFIG_PREEMPT_COUNT=y"?

Hmm, I guess is not clear to the reader without code reading about why
preempt got disabled. So I would add that last part I mentioned, but I
am Ok either way, it is just a suggestion.

>
> > > +                       Which might or might not be what you want.
> > > +
> >
> > Suggest drop this line ;-).
>
> OK, I will bite.  ;-)
>
> What is your concern with this line?

It is not needed IMO.

thanks,

 - Joel


> > >         rcutorture.stall_cpu_holdoff= [KNL]
> > >                         Time to wait (s) after boot before inducing stall.
> > > --
> > > 2.40.1
> > >
Paul E. McKenney May 15, 2023, 6:04 p.m. UTC | #4
On Thu, May 11, 2023 at 10:00:18PM -0700, Joel Fernandes wrote:
> On Thu, May 11, 2023 at 11:11 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Wed, May 10, 2023 at 10:47:36PM -0700, Joel Fernandes wrote:
> > > On Wed, May 10, 2023 at 10:12 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > >
> > > > From: Zqiang <qiang1.zhang@intel.com>
> > > >
> > > > If you build a kernel with CONFIG_PREEMPTION=n and CONFIG_PREEMPT_COUNT=y,
> > > > then run the rcutorture tests specifying stalls as follows:
> > > >
> > > > runqemu kvm slirp nographic qemuparams="-m 1024 -smp 4" \
> > > >         bootparams="console=ttyS0 rcutorture.stall_cpu=30 \
> > > >         rcutorture.stall_no_softlockup=1 rcutorture.stall_cpu_block=1" -d
> > > >
> > > > The tests will produce the following splat:
> > > >
> > > > [   10.841071] rcu-torture: rcu_torture_stall begin CPU stall
> > > > [   10.841073] rcu_torture_stall start on CPU 3.
> > > > [   10.841077] BUG: scheduling while atomic: rcu_torture_sta/66/0x0000000
> > > > ....
> > > > [   10.841108] Call Trace:
> > > > [   10.841110]  <TASK>
> > > > [   10.841112]  dump_stack_lvl+0x64/0xb0
> > > > [   10.841118]  dump_stack+0x10/0x20
> > > > [   10.841121]  __schedule_bug+0x8b/0xb0
> > > > [   10.841126]  __schedule+0x2172/0x2940
> > > > [   10.841157]  schedule+0x9b/0x150
> > > > [   10.841160]  schedule_timeout+0x2e8/0x4f0
> > > > [   10.841192]  schedule_timeout_uninterruptible+0x47/0x50
> > > > [   10.841195]  rcu_torture_stall+0x2e8/0x300
> > > > [   10.841199]  kthread+0x175/0x1a0
> > > > [   10.841206]  ret_from_fork+0x2c/0x50
> > >
> > > Another way to get rid of the warning would be to replace the
> > > cur_ops->readlock() with rcu_read_lock(). Though perhaps that will not
> > > test whether the particular RCU flavor under testing is capable of
> > > causing a stall :-).
> >
> > Exactly!
> >
> > > >         rcutorture.stall_cpu_block= [KNL]
> > > >                         Sleep while stalling if set.  This will result
> > > > -                       in warnings from preemptible RCU in addition
> > > > -                       to any other stall-related activity.
> > > > +                       in warnings from preemptible RCU in addition to
> > > > +                       any other stall-related activity.  Note that
> > > > +                       in kernels built with CONFIG_PREEMPTION=n and
> > > > +                       CONFIG_PREEMPT_COUNT=y, this parameter will
> > > > +                       cause the CPU to pass through a quiescent state.
> > > > +                       Any such quiescent states will suppress RCU CPU
> > > > +                       stall warnings, but the time-based sleep will
> > > > +                       also result in scheduling-while-atomic splats.
> > >
> > > Could change last part to "but may also result in
> > > scheduling-while-atomic splats as preemption might be disabled for
> > > certain RCU flavors in order to cause the stall".
> >
> > Is that needed given the earlier "in kernels built with
> > CONFIG_PREEMPTION=n and CONFIG_PREEMPT_COUNT=y"?
> 
> Hmm, I guess is not clear to the reader without code reading about why
> preempt got disabled. So I would add that last part I mentioned, but I
> am Ok either way, it is just a suggestion.

I will figure something out to more tightly tie this to the previous
CONFIG_PREEMPTION=n.

> > > > +                       Which might or might not be what you want.
> > > > +
> > >
> > > Suggest drop this line ;-).
> >
> > OK, I will bite.  ;-)
> >
> > What is your concern with this line?
> 
> It is not needed IMO.

It actually is, otherwise the various testing services complain about
getting splats.  I will upgrade it to something more explicit.

							Thanx, Paul

> thanks,
> 
>  - Joel
> 
> 
> > > >         rcutorture.stall_cpu_holdoff= [KNL]
> > > >                         Time to wait (s) after boot before inducing stall.
> > > > --
> > > > 2.40.1
> > > >
diff mbox series

Patch

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 9e5bab29685f..eaffe0f8771d 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5087,8 +5087,16 @@ 
 
 	rcutorture.stall_cpu_block= [KNL]
 			Sleep while stalling if set.  This will result
-			in warnings from preemptible RCU in addition
-			to any other stall-related activity.
+			in warnings from preemptible RCU in addition to
+			any other stall-related activity.  Note that
+			in kernels built with CONFIG_PREEMPTION=n and
+			CONFIG_PREEMPT_COUNT=y, this parameter will
+			cause the CPU to pass through a quiescent state.
+			Any such quiescent states will suppress RCU CPU
+			stall warnings, but the time-based sleep will
+			also result in scheduling-while-atomic splats.
+			Which might or might not be what you want.
+
 
 	rcutorture.stall_cpu_holdoff= [KNL]
 			Time to wait (s) after boot before inducing stall.