diff mbox series

[PATCHv2,3/3] rcu: coordinate tick dependency during concurrent offlining

Message ID 20220915055825.21525-4-kernelfans@gmail.com (mailing list archive)
State New, archived
Headers show
Series rcu: Enhance the capability to cope with concurrent cpu offlining/onlining | expand

Commit Message

Pingfan Liu Sept. 15, 2022, 5:58 a.m. UTC
As Paul pointed out  "The tick_dep_clear() is SMP-safe because it uses
atomic operations, but the problem is that if there are multiple
nohz_full CPUs going offline concurrently, the first CPU to invoke
rcutree_dead_cpu() will turn the tick off.  This might require an
atomically manipulated counter to mediate the calls to
rcutree_dead_cpu(). "

This patch introduces a new member ->dying to rcu_node, which reflects
the number of concurrent offlining cpu. TICK_DEP_BIT_RCU is set by
the first entrance and cleared by the last.

Note: now, tick_dep_set() is put under the rnp->lock, but since it takes
no lock, no extra locking order is introduced.

Suggested-by: "Paul E. McKenney" <paulmck@kernel.org>
Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
To: rcu@vger.kernel.org
---
 kernel/rcu/tree.c | 19 ++++++++++++++-----
 kernel/rcu/tree.h |  1 +
 2 files changed, 15 insertions(+), 5 deletions(-)

Comments

Frederic Weisbecker Sept. 16, 2022, 1:42 p.m. UTC | #1
On Thu, Sep 15, 2022 at 01:58:25PM +0800, Pingfan Liu wrote:
> As Paul pointed out  "The tick_dep_clear() is SMP-safe because it uses
> atomic operations, but the problem is that if there are multiple
> nohz_full CPUs going offline concurrently, the first CPU to invoke
> rcutree_dead_cpu() will turn the tick off.  This might require an
> atomically manipulated counter to mediate the calls to
> rcutree_dead_cpu(). "
> 
> This patch introduces a new member ->dying to rcu_node, which reflects
> the number of concurrent offlining cpu. TICK_DEP_BIT_RCU is set by
> the first entrance and cleared by the last.
> 
> Note: now, tick_dep_set() is put under the rnp->lock, but since it takes
> no lock, no extra locking order is introduced.
> 
> Suggested-by: "Paul E. McKenney" <paulmck@kernel.org>
> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> Cc: David Woodhouse <dwmw@amazon.co.uk>
> Cc: Frederic Weisbecker <frederic@kernel.org>
> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> Cc: Joel Fernandes <joel@joelfernandes.org>
> Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
> To: rcu@vger.kernel.org
> ---
>  kernel/rcu/tree.c | 19 ++++++++++++++-----
>  kernel/rcu/tree.h |  1 +
>  2 files changed, 15 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 8a829b64f5b2..f8bd0fc5fd2f 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -2164,13 +2164,19 @@ int rcutree_dead_cpu(unsigned int cpu)
>  {
>  	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
>  	struct rcu_node *rnp = rdp->mynode;  /* Outgoing CPU's rdp & rnp. */
> +	unsigned long flags;
> +	u8 dying;
>  
>  	if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
>  		return 0;
>  
>  	WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus - 1);
> -	// Stop-machine done, so allow nohz_full to disable tick.
> -	tick_dep_clear(TICK_DEP_BIT_RCU);
> +	raw_spin_lock_irqsave_rcu_node(rnp, flags);
> +	dying = --rnp->dying;
> +	if (!dying)
> +		// Stop-machine done, so allow nohz_full to disable tick.
> +		tick_dep_clear(TICK_DEP_BIT_RCU);
> +	raw_spin_lock_irqsave_rcu_node(rnp, flags);

Note this is only locking the rdp's node, not the root node.
Therefore if CPU 0 and CPU 256 are going off at the same time and they
don't belong to the same node, the above won't protect against concurrent
TICK_DEP_BIT_RCU set/clear.

My suspicion is that we don't need this TICK_DEP_BIT_RCU tick dependency
anymore. I believe it was there because of issues that were fixed with:

	53e87e3cdc15 (timers/nohz: Last resort update jiffies on nohz_full IRQ entry)
and:

	a1ff03cd6fb9 (tick: Detect and fix jiffies update stall)

It's unfortunately just suspicion because the reason for that tick dependency
is unclear but I believe it should be safe to remove now.

Thanks.
Pingfan Liu Sept. 20, 2022, 7:26 a.m. UTC | #2
On Fri, Sep 16, 2022 at 03:42:58PM +0200, Frederic Weisbecker wrote:
> On Thu, Sep 15, 2022 at 01:58:25PM +0800, Pingfan Liu wrote:
> > As Paul pointed out  "The tick_dep_clear() is SMP-safe because it uses
> > atomic operations, but the problem is that if there are multiple
> > nohz_full CPUs going offline concurrently, the first CPU to invoke
> > rcutree_dead_cpu() will turn the tick off.  This might require an
> > atomically manipulated counter to mediate the calls to
> > rcutree_dead_cpu(). "
> > 
> > This patch introduces a new member ->dying to rcu_node, which reflects
> > the number of concurrent offlining cpu. TICK_DEP_BIT_RCU is set by
> > the first entrance and cleared by the last.
> > 
> > Note: now, tick_dep_set() is put under the rnp->lock, but since it takes
> > no lock, no extra locking order is introduced.
> > 
> > Suggested-by: "Paul E. McKenney" <paulmck@kernel.org>
> > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > Cc: David Woodhouse <dwmw@amazon.co.uk>
> > Cc: Frederic Weisbecker <frederic@kernel.org>
> > Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
> > Cc: Josh Triplett <josh@joshtriplett.org>
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> > Cc: Joel Fernandes <joel@joelfernandes.org>
> > Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
> > To: rcu@vger.kernel.org
> > ---
> >  kernel/rcu/tree.c | 19 ++++++++++++++-----
> >  kernel/rcu/tree.h |  1 +
> >  2 files changed, 15 insertions(+), 5 deletions(-)
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 8a829b64f5b2..f8bd0fc5fd2f 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -2164,13 +2164,19 @@ int rcutree_dead_cpu(unsigned int cpu)
> >  {
> >  	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> >  	struct rcu_node *rnp = rdp->mynode;  /* Outgoing CPU's rdp & rnp. */
> > +	unsigned long flags;
> > +	u8 dying;
> >  
> >  	if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
> >  		return 0;
> >  
> >  	WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus - 1);
> > -	// Stop-machine done, so allow nohz_full to disable tick.
> > -	tick_dep_clear(TICK_DEP_BIT_RCU);
> > +	raw_spin_lock_irqsave_rcu_node(rnp, flags);
> > +	dying = --rnp->dying;
> > +	if (!dying)
> > +		// Stop-machine done, so allow nohz_full to disable tick.
> > +		tick_dep_clear(TICK_DEP_BIT_RCU);
> > +	raw_spin_lock_irqsave_rcu_node(rnp, flags);
> 
> Note this is only locking the rdp's node, not the root node.
> Therefore if CPU 0 and CPU 256 are going off at the same time and they
> don't belong to the same node, the above won't protect against concurrent
> TICK_DEP_BIT_RCU set/clear.
> 

Nice, thanks for the careful thoughts. How about moving the counting
place to the root node?

> My suspicion is that we don't need this TICK_DEP_BIT_RCU tick dependency
> anymore. I believe it was there because of issues that were fixed with:
> 
> 	53e87e3cdc15 (timers/nohz: Last resort update jiffies on nohz_full IRQ entry)
> and:
> 
> 	a1ff03cd6fb9 (tick: Detect and fix jiffies update stall)
> 
> It's unfortunately just suspicion because the reason for that tick dependency
> is unclear but I believe it should be safe to remove now.
> 

I have gone through this tick dependency again, but got less.

I think at least from the RCU's viewpoint, it is useless since
multi_cpu_stop()->rcu_momentary_dyntick_idle() has eliminate the
requirement for tick interrupt.


Is there a way to have a convincing test so that these code can be removed?
Or this code will be got along with?


Thanks,

	Pingfan
Frederic Weisbecker Sept. 20, 2022, 9:46 a.m. UTC | #3
On Tue, Sep 20, 2022 at 03:26:28PM +0800, Pingfan Liu wrote:
> On Fri, Sep 16, 2022 at 03:42:58PM +0200, Frederic Weisbecker wrote:
> > Note this is only locking the rdp's node, not the root node.
> > Therefore if CPU 0 and CPU 256 are going off at the same time and they
> > don't belong to the same node, the above won't protect against concurrent
> > TICK_DEP_BIT_RCU set/clear.
> > 
> 
> Nice, thanks for the careful thoughts. How about moving the counting
> place to the root node?

You could but then you'd need to lock the root node.

> 
> > My suspicion is that we don't need this TICK_DEP_BIT_RCU tick dependency
> > anymore. I believe it was there because of issues that were fixed with:
> > 
> > 	53e87e3cdc15 (timers/nohz: Last resort update jiffies on nohz_full IRQ entry)
> > and:
> > 
> > 	a1ff03cd6fb9 (tick: Detect and fix jiffies update stall)
> > 
> > It's unfortunately just suspicion because the reason for that tick dependency
> > is unclear but I believe it should be safe to remove now.
> > 
> 
> I have gone through this tick dependency again, but got less.
> 
> I think at least from the RCU's viewpoint, it is useless since
> multi_cpu_stop()->rcu_momentary_dyntick_idle() has eliminate the
> requirement for tick interrupt.

Partly yes.

> Is there a way to have a convincing test so that these code can be removed?
> Or this code will be got along with?

Hmm, Paul might remember which rcutorture scenario would trigger it?
Paul E. McKenney Sept. 20, 2022, 7:13 p.m. UTC | #4
On Tue, Sep 20, 2022 at 11:46:45AM +0200, Frederic Weisbecker wrote:
> On Tue, Sep 20, 2022 at 03:26:28PM +0800, Pingfan Liu wrote:
> > On Fri, Sep 16, 2022 at 03:42:58PM +0200, Frederic Weisbecker wrote:
> > > Note this is only locking the rdp's node, not the root node.
> > > Therefore if CPU 0 and CPU 256 are going off at the same time and they
> > > don't belong to the same node, the above won't protect against concurrent
> > > TICK_DEP_BIT_RCU set/clear.
> > > 
> > 
> > Nice, thanks for the careful thoughts. How about moving the counting
> > place to the root node?
> 
> You could but then you'd need to lock the root node.
> 
> > > My suspicion is that we don't need this TICK_DEP_BIT_RCU tick dependency
> > > anymore. I believe it was there because of issues that were fixed with:
> > > 
> > > 	53e87e3cdc15 (timers/nohz: Last resort update jiffies on nohz_full IRQ entry)
> > > and:
> > > 
> > > 	a1ff03cd6fb9 (tick: Detect and fix jiffies update stall)
> > > 
> > > It's unfortunately just suspicion because the reason for that tick dependency
> > > is unclear but I believe it should be safe to remove now.
> > > 
> > 
> > I have gone through this tick dependency again, but got less.
> > 
> > I think at least from the RCU's viewpoint, it is useless since
> > multi_cpu_stop()->rcu_momentary_dyntick_idle() has eliminate the
> > requirement for tick interrupt.
> 
> Partly yes.
> 
> > Is there a way to have a convincing test so that these code can be removed?
> > Or this code will be got along with?
> 
> Hmm, Paul might remember which rcutorture scenario would trigger it?

TREE04 on multisocket systems, preferably with faster CPU-hotplug
operations.  This can be accomplished by adding this to the kvm.sh
command line:

	rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30

It does take some time to run.  I did 4,000 hours worth of TREE04
to confirm lack of bug.  But an 80-CPU dual-socket system can run
10 concurrent instances of TREE04, which gets things down to a more
manageable 400 hours.  Please let me know if you don't have access
to a few such systems.

I will let Frederic identify which commit(s) should be reverted in
order to test the test.

							Thanx, Paul
Pingfan Liu Sept. 22, 2022, 9:29 a.m. UTC | #5
On Tue, Sep 20, 2022 at 12:13:39PM -0700, Paul E. McKenney wrote:
> On Tue, Sep 20, 2022 at 11:46:45AM +0200, Frederic Weisbecker wrote:
> > On Tue, Sep 20, 2022 at 03:26:28PM +0800, Pingfan Liu wrote:
> > > On Fri, Sep 16, 2022 at 03:42:58PM +0200, Frederic Weisbecker wrote:
> > > > Note this is only locking the rdp's node, not the root node.
> > > > Therefore if CPU 0 and CPU 256 are going off at the same time and they
> > > > don't belong to the same node, the above won't protect against concurrent
> > > > TICK_DEP_BIT_RCU set/clear.
> > > > 
> > > 
> > > Nice, thanks for the careful thoughts. How about moving the counting
> > > place to the root node?
> > 
> > You could but then you'd need to lock the root node.
> > 
> > > > My suspicion is that we don't need this TICK_DEP_BIT_RCU tick dependency
> > > > anymore. I believe it was there because of issues that were fixed with:
> > > > 
> > > > 	53e87e3cdc15 (timers/nohz: Last resort update jiffies on nohz_full IRQ entry)
> > > > and:
> > > > 
> > > > 	a1ff03cd6fb9 (tick: Detect and fix jiffies update stall)
> > > > 
> > > > It's unfortunately just suspicion because the reason for that tick dependency
> > > > is unclear but I believe it should be safe to remove now.
> > > > 
> > > 
> > > I have gone through this tick dependency again, but got less.
> > > 
> > > I think at least from the RCU's viewpoint, it is useless since
> > > multi_cpu_stop()->rcu_momentary_dyntick_idle() has eliminate the
> > > requirement for tick interrupt.
> > 
> > Partly yes.
> > 
> > > Is there a way to have a convincing test so that these code can be removed?
> > > Or this code will be got along with?
> > 
> > Hmm, Paul might remember which rcutorture scenario would trigger it?
> 
> TREE04 on multisocket systems, preferably with faster CPU-hotplug
> operations.  This can be accomplished by adding this to the kvm.sh
> command line:
> 
> 	rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30
> 

Is it ok with "sh kvm.sh  --bootargs "rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30" --configs TREE04"

> It does take some time to run.  I did 4,000 hours worth of TREE04
                                        ^^^ '--duration=4000h' can serve this purpose?
Is it related with the cpu's freq?

> to confirm lack of bug.  But an 80-CPU dual-socket system can run
> 10 concurrent instances of TREE04, which gets things down to a more

The total demanded hours H = 4000/(system_cpu_num/8)?

> manageable 400 hours.  Please let me know if you don't have access
> to a few such systems.
>

I am happy to have a try if needed. I will try to get a powerful
machine, which can shrink the test time.

> I will let Frederic identify which commit(s) should be reverted in
> order to test the test.
> 

My understanding is after removing the tick dep by
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 79aea7df4345..cbfc884f04a4 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2171,8 +2171,6 @@ int rcutree_dead_cpu(unsigned int cpu)
        WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus - 1);
        /* Adjust any no-longer-needed kthreads. */
        rcu_boost_kthread_setaffinity(rnp, -1);
-       // Stop-machine done, so allow nohz_full to disable tick.
-       tick_dep_clear(TICK_DEP_BIT_RCU);
        return 0;
 }

@@ -4008,8 +4006,6 @@ int rcutree_online_cpu(unsigned int cpu)
        sync_sched_exp_online_cleanup(cpu);
        rcutree_affinity_setting(cpu, -1);

-       // Stop-machine done, so allow nohz_full to disable tick.
-       tick_dep_clear(TICK_DEP_BIT_RCU);
        return 0;
 }

@@ -4031,8 +4027,6 @@ int rcutree_offline_cpu(unsigned int cpu)

        rcutree_affinity_setting(cpu, cpu);

-       // nohz_full CPUs need the tick for stop-machine to work quickly
-       tick_dep_set(TICK_DEP_BIT_RCU);
        return 0;
 }

If the TREE04 can success, then move on to revert the commit(s)
identified by Frederic, and do test again.

At this time, a TREE04 failure is expected.

If the above two results are observed, TICK_DEP_BIT_RCU can be
removed.

Is my understanding right?


Thanks,

	Pingfan
Paul E. McKenney Sept. 22, 2022, 1:54 p.m. UTC | #6
On Thu, Sep 22, 2022 at 05:29:32PM +0800, Pingfan Liu wrote:
> On Tue, Sep 20, 2022 at 12:13:39PM -0700, Paul E. McKenney wrote:
> > On Tue, Sep 20, 2022 at 11:46:45AM +0200, Frederic Weisbecker wrote:
> > > On Tue, Sep 20, 2022 at 03:26:28PM +0800, Pingfan Liu wrote:
> > > > On Fri, Sep 16, 2022 at 03:42:58PM +0200, Frederic Weisbecker wrote:
> > > > > Note this is only locking the rdp's node, not the root node.
> > > > > Therefore if CPU 0 and CPU 256 are going off at the same time and they
> > > > > don't belong to the same node, the above won't protect against concurrent
> > > > > TICK_DEP_BIT_RCU set/clear.
> > > > > 
> > > > 
> > > > Nice, thanks for the careful thoughts. How about moving the counting
> > > > place to the root node?
> > > 
> > > You could but then you'd need to lock the root node.
> > > 
> > > > > My suspicion is that we don't need this TICK_DEP_BIT_RCU tick dependency
> > > > > anymore. I believe it was there because of issues that were fixed with:
> > > > > 
> > > > > 	53e87e3cdc15 (timers/nohz: Last resort update jiffies on nohz_full IRQ entry)
> > > > > and:
> > > > > 
> > > > > 	a1ff03cd6fb9 (tick: Detect and fix jiffies update stall)
> > > > > 
> > > > > It's unfortunately just suspicion because the reason for that tick dependency
> > > > > is unclear but I believe it should be safe to remove now.
> > > > > 
> > > > 
> > > > I have gone through this tick dependency again, but got less.
> > > > 
> > > > I think at least from the RCU's viewpoint, it is useless since
> > > > multi_cpu_stop()->rcu_momentary_dyntick_idle() has eliminate the
> > > > requirement for tick interrupt.
> > > 
> > > Partly yes.
> > > 
> > > > Is there a way to have a convincing test so that these code can be removed?
> > > > Or this code will be got along with?
> > > 
> > > Hmm, Paul might remember which rcutorture scenario would trigger it?
> > 
> > TREE04 on multisocket systems, preferably with faster CPU-hotplug
> > operations.  This can be accomplished by adding this to the kvm.sh
> > command line:
> > 
> > 	rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30
> > 
> 
> Is it ok with "sh kvm.sh  --bootargs "rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30" --configs TREE04"

If you have tools/.../rcutorture/bin on your path, yes.  This would default
to a 30-minute run.  If you have at least 16 CPUs, you should add
"--allcpus" to do concurrrent runs.  For example, given 64 CPUs you could
do this:

tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 10h --bootargs "rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30" --configs "4*TREE04"

This would run four concurrent instances of the TREE04 scenario, each for
10 hours, for a total of 40 hours of test time.

> > It does take some time to run.  I did 4,000 hours worth of TREE04
>                                         ^^^ '--duration=4000h' can serve this purpose?

You could, at least if you replace the "=" with a space character, but
that really would run a six-month test, which is probably not what you
want to do.  There being 8,760 hours in a year and all that.

> Is it related with the cpu's freq?

Not at all.  '--duration 10h' would run ten hours of wall-clock time
regardless of the CPU frequencies.

> > to confirm lack of bug.  But an 80-CPU dual-socket system can run
> > 10 concurrent instances of TREE04, which gets things down to a more
> 
> The total demanded hours H = 4000/(system_cpu_num/8)?

Yes.  You can also use multiple systems, which is what kvm-remote.sh is
intended for, again assuming 80 CPUs per system to keep the arithmetic
simple:

tools/testing/selftests/rcutorture/bin/kvm-remote.sh "sys1 sys2 ... sys20" --duration 20h --cpus 80 --bootargs "rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30" --configs "200*TREE04"

Here "sys1" is the name of the system, on which you must have an account
so that "ssh sys1 date" runs the date command on the first remote system.
You really do have to use the "--cpus 80" because kvm-remote.sh does not
assume that the system that it is running on is one of the test systems.

> > manageable 400 hours.  Please let me know if you don't have access
> > to a few such systems.
> 
> I am happy to have a try if needed. I will try to get a powerful
> machine, which can shrink the test time.

Larger numbers of little systems work, also, but in my experience you need
a dual-socket system to have a reasonable chance of reproducing this bug.
Each socket can be small, though, if that helps.

If you work for a cloud provider or some such, you can probably get a
large number of systems.  If you can only get a few, you can do initial
testing, and then we can work out what to do about heavier-duty testing.

> > I will let Frederic identify which commit(s) should be reverted in
> > order to test the test.
> > 
> 
> My understanding is after removing the tick dep by
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 79aea7df4345..cbfc884f04a4 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -2171,8 +2171,6 @@ int rcutree_dead_cpu(unsigned int cpu)
>         WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus - 1);
>         /* Adjust any no-longer-needed kthreads. */
>         rcu_boost_kthread_setaffinity(rnp, -1);
> -       // Stop-machine done, so allow nohz_full to disable tick.
> -       tick_dep_clear(TICK_DEP_BIT_RCU);
>         return 0;
>  }
> 
> @@ -4008,8 +4006,6 @@ int rcutree_online_cpu(unsigned int cpu)
>         sync_sched_exp_online_cleanup(cpu);
>         rcutree_affinity_setting(cpu, -1);
> 
> -       // Stop-machine done, so allow nohz_full to disable tick.
> -       tick_dep_clear(TICK_DEP_BIT_RCU);
>         return 0;
>  }
> 
> @@ -4031,8 +4027,6 @@ int rcutree_offline_cpu(unsigned int cpu)
> 
>         rcutree_affinity_setting(cpu, cpu);
> 
> -       // nohz_full CPUs need the tick for stop-machine to work quickly
> -       tick_dep_set(TICK_DEP_BIT_RCU);
>         return 0;
>  }
> 
> If the TREE04 can success, then move on to revert the commit(s)
> identified by Frederic, and do test again.
> 
> At this time, a TREE04 failure is expected.
> 
> If the above two results are observed, TICK_DEP_BIT_RCU can be
> removed.
> 
> Is my understanding right?

Seems plausible to me, but I again defer to Frederic.

							Thanx, Paul
Frederic Weisbecker Sept. 23, 2022, 10:13 p.m. UTC | #7
On Thu, Sep 22, 2022 at 06:54:42AM -0700, Paul E. McKenney wrote:
> On Thu, Sep 22, 2022 at 05:29:32PM +0800, Pingfan Liu wrote:
> > My understanding is after removing the tick dep by
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 79aea7df4345..cbfc884f04a4 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -2171,8 +2171,6 @@ int rcutree_dead_cpu(unsigned int cpu)
> >         WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus - 1);
> >         /* Adjust any no-longer-needed kthreads. */
> >         rcu_boost_kthread_setaffinity(rnp, -1);
> > -       // Stop-machine done, so allow nohz_full to disable tick.
> > -       tick_dep_clear(TICK_DEP_BIT_RCU);
> >         return 0;
> >  }
> > 
> > @@ -4008,8 +4006,6 @@ int rcutree_online_cpu(unsigned int cpu)
> >         sync_sched_exp_online_cleanup(cpu);
> >         rcutree_affinity_setting(cpu, -1);
> > 
> > -       // Stop-machine done, so allow nohz_full to disable tick.
> > -       tick_dep_clear(TICK_DEP_BIT_RCU);
> >         return 0;
> >  }
> > 
> > @@ -4031,8 +4027,6 @@ int rcutree_offline_cpu(unsigned int cpu)
> > 
> >         rcutree_affinity_setting(cpu, cpu);
> > 
> > -       // nohz_full CPUs need the tick for stop-machine to work quickly
> > -       tick_dep_set(TICK_DEP_BIT_RCU);
> >         return 0;
> >  }
> > 
> > If the TREE04 can success, then move on to revert the commit(s)
> > identified by Frederic, and do test again.
> > 
> > At this time, a TREE04 failure is expected.
> > 
> > If the above two results are observed, TICK_DEP_BIT_RCU can be
> > removed.
> > 
> > Is my understanding right?
> 
> Seems plausible to me, but I again defer to Frederic.

Right, more precisely the commit to revert is:

     96926686deab ("rcu: Make CPU-hotplug removal operations enable tick")

which is the diff Pingfan Liu proposed above. The other uses of TICK_DEP_BIT_RCU
will still be needed though.

Thanks.
Pingfan Liu Sept. 26, 2022, 6:34 a.m. UTC | #8
Sorry to reply late. I just realize this e-mail misses in my gmail.

On Thu, Sep 22, 2022 at 06:54:42AM -0700, Paul E. McKenney wrote:
[...]
> 
> If you have tools/.../rcutorture/bin on your path, yes.  This would default
> to a 30-minute run.  If you have at least 16 CPUs, you should add
                                            ^^^ TREE04 has CONFIG_NR_CPUS=8, so I think here the num is 8

> "--allcpus" to do concurrrent runs.  For example, given 64 CPUs you could
> do this:
> 
> tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 10h --bootargs "rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30" --configs "4*TREE04"
> 

I have tried to find a two socket system with 128 cpus and run
  sh kvm.sh --allcpus --duration 250h --bootargs rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30 --configs 16*TREE04

Where 250*16=4000


> This would run four concurrent instances of the TREE04 scenario, each for
> 10 hours, for a total of 40 hours of test time.
> 
> > > It does take some time to run.  I did 4,000 hours worth of TREE04
> >                                         ^^^ '--duration=4000h' can serve this purpose?
> 
> You could, at least if you replace the "=" with a space character, but
> that really would run a six-month test, which is probably not what you
> want to do.  There being 8,760 hours in a year and all that.
> 
> > Is it related with the cpu's freq?
> 
> Not at all.  '--duration 10h' would run ten hours of wall-clock time
> regardless of the CPU frequencies.
> 
> > > to confirm lack of bug.  But an 80-CPU dual-socket system can run
> > > 10 concurrent instances of TREE04, which gets things down to a more
> > 
> > The total demanded hours H = 4000/(system_cpu_num/8)?
> 
> Yes.  You can also use multiple systems, which is what kvm-remote.sh is
> intended for, again assuming 80 CPUs per system to keep the arithmetic
> simple:
> 
> tools/testing/selftests/rcutorture/bin/kvm-remote.sh "sys1 sys2 ... sys20" --duration 20h --cpus 80 --bootargs "rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30" --configs "200*TREE04"
> 

That is appealing.

I will see if any opportunity to grasp a batch of machines to run the
test.


Thanks,

	Pingfan
> Here "sys1" is the name of the system, on which you must have an account
> so that "ssh sys1 date" runs the date command on the first remote system.
> You really do have to use the "--cpus 80" because kvm-remote.sh does not
> assume that the system that it is running on is one of the test systems.
> 
> > > manageable 400 hours.  Please let me know if you don't have access
> > > to a few such systems.
> > 
> > I am happy to have a try if needed. I will try to get a powerful
> > machine, which can shrink the test time.
> 
> Larger numbers of little systems work, also, but in my experience you need
> a dual-socket system to have a reasonable chance of reproducing this bug.
> Each socket can be small, though, if that helps.
> 
> If you work for a cloud provider or some such, you can probably get a
> large number of systems.  If you can only get a few, you can do initial
> testing, and then we can work out what to do about heavier-duty testing.
> 
> > > I will let Frederic identify which commit(s) should be reverted in
> > > order to test the test.
> > > 
> > 
> > My understanding is after removing the tick dep by
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 79aea7df4345..cbfc884f04a4 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -2171,8 +2171,6 @@ int rcutree_dead_cpu(unsigned int cpu)
> >         WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus - 1);
> >         /* Adjust any no-longer-needed kthreads. */
> >         rcu_boost_kthread_setaffinity(rnp, -1);
> > -       // Stop-machine done, so allow nohz_full to disable tick.
> > -       tick_dep_clear(TICK_DEP_BIT_RCU);
> >         return 0;
> >  }
> > 
> > @@ -4008,8 +4006,6 @@ int rcutree_online_cpu(unsigned int cpu)
> >         sync_sched_exp_online_cleanup(cpu);
> >         rcutree_affinity_setting(cpu, -1);
> > 
> > -       // Stop-machine done, so allow nohz_full to disable tick.
> > -       tick_dep_clear(TICK_DEP_BIT_RCU);
> >         return 0;
> >  }
> > 
> > @@ -4031,8 +4027,6 @@ int rcutree_offline_cpu(unsigned int cpu)
> > 
> >         rcutree_affinity_setting(cpu, cpu);
> > 
> > -       // nohz_full CPUs need the tick for stop-machine to work quickly
> > -       tick_dep_set(TICK_DEP_BIT_RCU);
> >         return 0;
> >  }
> > 
> > If the TREE04 can success, then move on to revert the commit(s)
> > identified by Frederic, and do test again.
> > 
> > At this time, a TREE04 failure is expected.
> > 
> > If the above two results are observed, TICK_DEP_BIT_RCU can be
> > removed.
> > 
> > Is my understanding right?
> 
> Seems plausible to me, but I again defer to Frederic.
> 
> 							Thanx, Paul
Joel Fernandes Sept. 26, 2022, 4:13 p.m. UTC | #9
On Thu, Sep 15, 2022 at 01:58:25PM +0800, Pingfan Liu wrote:
> As Paul pointed out  "The tick_dep_clear() is SMP-safe because it uses
> atomic operations, but the problem is that if there are multiple
> nohz_full CPUs going offline concurrently, the first CPU to invoke
> rcutree_dead_cpu() will turn the tick off.  This might require an
> atomically manipulated counter to mediate the calls to
> rcutree_dead_cpu(). "
> 
> This patch introduces a new member ->dying to rcu_node, which reflects
> the number of concurrent offlining cpu. TICK_DEP_BIT_RCU is set by
> the first entrance and cleared by the last.
> 
> Note: now, tick_dep_set() is put under the rnp->lock, but since it takes
> no lock, no extra locking order is introduced.
> 
> Suggested-by: "Paul E. McKenney" <paulmck@kernel.org>
> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> Cc: David Woodhouse <dwmw@amazon.co.uk>
> Cc: Frederic Weisbecker <frederic@kernel.org>
> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> Cc: Joel Fernandes <joel@joelfernandes.org>
> Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
> To: rcu@vger.kernel.org
> ---
>  kernel/rcu/tree.c | 19 ++++++++++++++-----
>  kernel/rcu/tree.h |  1 +
>  2 files changed, 15 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 8a829b64f5b2..f8bd0fc5fd2f 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -2164,13 +2164,19 @@ int rcutree_dead_cpu(unsigned int cpu)
>  {
>  	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
>  	struct rcu_node *rnp = rdp->mynode;  /* Outgoing CPU's rdp & rnp. */
> +	unsigned long flags;
> +	u8 dying;
>  
>  	if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
>  		return 0;
>  
>  	WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus - 1);
> -	// Stop-machine done, so allow nohz_full to disable tick.
> -	tick_dep_clear(TICK_DEP_BIT_RCU);
> +	raw_spin_lock_irqsave_rcu_node(rnp, flags);
> +	dying = --rnp->dying;
> +	if (!dying)
> +		// Stop-machine done, so allow nohz_full to disable tick.
> +		tick_dep_clear(TICK_DEP_BIT_RCU);
> +	raw_spin_lock_irqsave_rcu_node(rnp, flags);
>  	return 0;
>  }
>  
> @@ -4020,17 +4026,20 @@ int rcutree_offline_cpu(unsigned int cpu)
>  	unsigned long flags;
>  	struct rcu_data *rdp;
>  	struct rcu_node *rnp;
> +	u8 dying;
>  
>  	rdp = per_cpu_ptr(&rcu_data, cpu);
>  	rnp = rdp->mynode;
>  	raw_spin_lock_irqsave_rcu_node(rnp, flags);
>  	rnp->ffmask &= ~rdp->grpmask;

Just to ensure the first increment sets the tick dep and the last decrement
resets it, it would be nice to add a check here:

WARN_ON_ONCE(!rnp->dying && tick_dep_test(TICK_DEP_BIT_RCU));

And correpondingly on the tick decrement:
WARN_ON_ONCE(rnp->dying > 0 && !tick_dep_test(TICK_DEP_BIT_RCU));

Of course that will require adding a new API: tick_dep_test, but might be
worth it.

(I think this should catch concurrency bugs such as involving the rnp lock
that Frederic pointed out).

thanks,

 - Joel



> +	/* Let rcutree_dead_cpu() know a new offlining. */
> +	dying = rnp->dying++;
> +	if (!dying)
> +		// nohz_full CPUs need the tick for stop-machine to work quickly
> +		tick_dep_set(TICK_DEP_BIT_RCU);



>  	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
>  
>  	rcutree_affinity_setting(cpu, cpu);
> -
> -	// nohz_full CPUs need the tick for stop-machine to work quickly
> -	tick_dep_set(TICK_DEP_BIT_RCU);
>  	return 0;
>  }
>  
> diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> index d4a97e40ea9c..b508a12ac953 100644
> --- a/kernel/rcu/tree.h
> +++ b/kernel/rcu/tree.h
> @@ -81,6 +81,7 @@ struct rcu_node {
>  	int	grphi;		/* highest-numbered CPU here. */
>  	u8	grpnum;		/* group number for next level up. */
>  	u8	level;		/* root is at level 0. */
> +	u8	dying;		/* num of concurrent rdp offlining */
>  	bool	wait_blkd_tasks;/* Necessary to wait for blocked tasks to */
>  				/*  exit RCU read-side critical sections */
>  				/*  before propagating offline up the */
> -- 
> 2.31.1
>
Paul E. McKenney Sept. 26, 2022, 10:23 p.m. UTC | #10
On Mon, Sep 26, 2022 at 02:34:17PM +0800, Pingfan Liu wrote:
> Sorry to reply late. I just realize this e-mail misses in my gmail.
> 
> On Thu, Sep 22, 2022 at 06:54:42AM -0700, Paul E. McKenney wrote:
> [...]
> > 
> > If you have tools/.../rcutorture/bin on your path, yes.  This would default
> > to a 30-minute run.  If you have at least 16 CPUs, you should add
>                                             ^^^ TREE04 has CONFIG_NR_CPUS=8, so I think here the num is 8

Yes, you will get some benefit from --allcpus on systems with from 9-15
CPUs as well as for 16 and more.  At 8 CPUs, it wouldn't matter.

> > "--allcpus" to do concurrrent runs.  For example, given 64 CPUs you could
> > do this:
> > 
> > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 10h --bootargs "rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30" --configs "4*TREE04"
> > 
> 
> I have tried to find a two socket system with 128 cpus and run
>   sh kvm.sh --allcpus --duration 250h --bootargs rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30 --configs 16*TREE04
> 
> Where 250*16=4000

That would work.

> > This would run four concurrent instances of the TREE04 scenario, each for
> > 10 hours, for a total of 40 hours of test time.
> > 
> > > > It does take some time to run.  I did 4,000 hours worth of TREE04
> > >                                         ^^^ '--duration=4000h' can serve this purpose?
> > 
> > You could, at least if you replace the "=" with a space character, but
> > that really would run a six-month test, which is probably not what you
> > want to do.  There being 8,760 hours in a year and all that.
> > 
> > > Is it related with the cpu's freq?
> > 
> > Not at all.  '--duration 10h' would run ten hours of wall-clock time
> > regardless of the CPU frequencies.
> > 
> > > > to confirm lack of bug.  But an 80-CPU dual-socket system can run
> > > > 10 concurrent instances of TREE04, which gets things down to a more
> > > 
> > > The total demanded hours H = 4000/(system_cpu_num/8)?
> > 
> > Yes.  You can also use multiple systems, which is what kvm-remote.sh is
> > intended for, again assuming 80 CPUs per system to keep the arithmetic
> > simple:
> > 
> > tools/testing/selftests/rcutorture/bin/kvm-remote.sh "sys1 sys2 ... sys20" --duration 20h --cpus 80 --bootargs "rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30" --configs "200*TREE04"
> > 
> 
> That is appealing.
> 
> I will see if any opportunity to grasp a batch of machines to run the
> test.

Initial tests with smaller numbers of CPUs are also useful, for example,
in case reversion causes some bug due to bad interaction with a later
commit.

Please let me know how it goes!

							Thanx, Paul

> Thanks,
> 
> 	Pingfan
> > Here "sys1" is the name of the system, on which you must have an account
> > so that "ssh sys1 date" runs the date command on the first remote system.
> > You really do have to use the "--cpus 80" because kvm-remote.sh does not
> > assume that the system that it is running on is one of the test systems.
> > 
> > > > manageable 400 hours.  Please let me know if you don't have access
> > > > to a few such systems.
> > > 
> > > I am happy to have a try if needed. I will try to get a powerful
> > > machine, which can shrink the test time.
> > 
> > Larger numbers of little systems work, also, but in my experience you need
> > a dual-socket system to have a reasonable chance of reproducing this bug.
> > Each socket can be small, though, if that helps.
> > 
> > If you work for a cloud provider or some such, you can probably get a
> > large number of systems.  If you can only get a few, you can do initial
> > testing, and then we can work out what to do about heavier-duty testing.
> > 
> > > > I will let Frederic identify which commit(s) should be reverted in
> > > > order to test the test.
> > > > 
> > > 
> > > My understanding is after removing the tick dep by
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 79aea7df4345..cbfc884f04a4 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -2171,8 +2171,6 @@ int rcutree_dead_cpu(unsigned int cpu)
> > >         WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus - 1);
> > >         /* Adjust any no-longer-needed kthreads. */
> > >         rcu_boost_kthread_setaffinity(rnp, -1);
> > > -       // Stop-machine done, so allow nohz_full to disable tick.
> > > -       tick_dep_clear(TICK_DEP_BIT_RCU);
> > >         return 0;
> > >  }
> > > 
> > > @@ -4008,8 +4006,6 @@ int rcutree_online_cpu(unsigned int cpu)
> > >         sync_sched_exp_online_cleanup(cpu);
> > >         rcutree_affinity_setting(cpu, -1);
> > > 
> > > -       // Stop-machine done, so allow nohz_full to disable tick.
> > > -       tick_dep_clear(TICK_DEP_BIT_RCU);
> > >         return 0;
> > >  }
> > > 
> > > @@ -4031,8 +4027,6 @@ int rcutree_offline_cpu(unsigned int cpu)
> > > 
> > >         rcutree_affinity_setting(cpu, cpu);
> > > 
> > > -       // nohz_full CPUs need the tick for stop-machine to work quickly
> > > -       tick_dep_set(TICK_DEP_BIT_RCU);
> > >         return 0;
> > >  }
> > > 
> > > If the TREE04 can success, then move on to revert the commit(s)
> > > identified by Frederic, and do test again.
> > > 
> > > At this time, a TREE04 failure is expected.
> > > 
> > > If the above two results are observed, TICK_DEP_BIT_RCU can be
> > > removed.
> > > 
> > > Is my understanding right?
> > 
> > Seems plausible to me, but I again defer to Frederic.
> > 
> > 							Thanx, Paul
Pingfan Liu Sept. 27, 2022, 9:42 a.m. UTC | #11
On Mon, Sep 26, 2022 at 04:13:42PM +0000, Joel Fernandes wrote:
> On Thu, Sep 15, 2022 at 01:58:25PM +0800, Pingfan Liu wrote:
> > As Paul pointed out  "The tick_dep_clear() is SMP-safe because it uses
> > atomic operations, but the problem is that if there are multiple
> > nohz_full CPUs going offline concurrently, the first CPU to invoke
> > rcutree_dead_cpu() will turn the tick off.  This might require an
> > atomically manipulated counter to mediate the calls to
> > rcutree_dead_cpu(). "
> > 
> > This patch introduces a new member ->dying to rcu_node, which reflects
> > the number of concurrent offlining cpu. TICK_DEP_BIT_RCU is set by
> > the first entrance and cleared by the last.
> > 
> > Note: now, tick_dep_set() is put under the rnp->lock, but since it takes
> > no lock, no extra locking order is introduced.
> > 
> > Suggested-by: "Paul E. McKenney" <paulmck@kernel.org>
> > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > Cc: David Woodhouse <dwmw@amazon.co.uk>
> > Cc: Frederic Weisbecker <frederic@kernel.org>
> > Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
> > Cc: Josh Triplett <josh@joshtriplett.org>
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> > Cc: Joel Fernandes <joel@joelfernandes.org>
> > Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
> > To: rcu@vger.kernel.org
> > ---
> >  kernel/rcu/tree.c | 19 ++++++++++++++-----
> >  kernel/rcu/tree.h |  1 +
> >  2 files changed, 15 insertions(+), 5 deletions(-)
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 8a829b64f5b2..f8bd0fc5fd2f 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -2164,13 +2164,19 @@ int rcutree_dead_cpu(unsigned int cpu)
> >  {
> >  	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> >  	struct rcu_node *rnp = rdp->mynode;  /* Outgoing CPU's rdp & rnp. */
> > +	unsigned long flags;
> > +	u8 dying;
> >  
> >  	if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
> >  		return 0;
> >  
> >  	WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus - 1);
> > -	// Stop-machine done, so allow nohz_full to disable tick.
> > -	tick_dep_clear(TICK_DEP_BIT_RCU);
> > +	raw_spin_lock_irqsave_rcu_node(rnp, flags);
> > +	dying = --rnp->dying;
> > +	if (!dying)
> > +		// Stop-machine done, so allow nohz_full to disable tick.
> > +		tick_dep_clear(TICK_DEP_BIT_RCU);
> > +	raw_spin_lock_irqsave_rcu_node(rnp, flags);
> >  	return 0;
> >  }
> >  
> > @@ -4020,17 +4026,20 @@ int rcutree_offline_cpu(unsigned int cpu)
> >  	unsigned long flags;
> >  	struct rcu_data *rdp;
> >  	struct rcu_node *rnp;
> > +	u8 dying;
> >  
> >  	rdp = per_cpu_ptr(&rcu_data, cpu);
> >  	rnp = rdp->mynode;
> >  	raw_spin_lock_irqsave_rcu_node(rnp, flags);
> >  	rnp->ffmask &= ~rdp->grpmask;
> 
> Just to ensure the first increment sets the tick dep and the last decrement
> resets it, it would be nice to add a check here:
> 
> WARN_ON_ONCE(!rnp->dying && tick_dep_test(TICK_DEP_BIT_RCU));
> 
> And correpondingly on the tick decrement:
> WARN_ON_ONCE(rnp->dying > 0 && !tick_dep_test(TICK_DEP_BIT_RCU));
> 
> Of course that will require adding a new API: tick_dep_test, but might be
> worth it.
> 
> (I think this should catch concurrency bugs such as involving the rnp lock
> that Frederic pointed out).
> 

Thank for your suggestion. But let us see the agressive method firstly,
i.e. removing the tick dep completely. At present, it seems promising.


Thanks,

	Pingfan
Pingfan Liu Sept. 27, 2022, 9:59 a.m. UTC | #12
On Mon, Sep 26, 2022 at 03:23:52PM -0700, Paul E. McKenney wrote:
> On Mon, Sep 26, 2022 at 02:34:17PM +0800, Pingfan Liu wrote:
> > Sorry to reply late. I just realize this e-mail misses in my gmail.
> > 
> > On Thu, Sep 22, 2022 at 06:54:42AM -0700, Paul E. McKenney wrote:
> > [...]
> > > 
> > > If you have tools/.../rcutorture/bin on your path, yes.  This would default
> > > to a 30-minute run.  If you have at least 16 CPUs, you should add
> >                                             ^^^ TREE04 has CONFIG_NR_CPUS=8, so I think here the num is 8
> 
> Yes, you will get some benefit from --allcpus on systems with from 9-15
> CPUs as well as for 16 and more.  At 8 CPUs, it wouldn't matter.
> 
> > > "--allcpus" to do concurrrent runs.  For example, given 64 CPUs you could
> > > do this:
> > > 
> > > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 10h --bootargs "rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30" --configs "4*TREE04"
> > > 
> > 
> > I have tried to find a two socket system with 128 cpus and run
> >   sh kvm.sh --allcpus --duration 250h --bootargs rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30 --configs 16*TREE04
> > 
> > Where 250*16=4000
> 
> That would work.
> 

This job has successfully run 24+ hours. (But maybe I can only keep it
about 180 hours)

> > > This would run four concurrent instances of the TREE04 scenario, each for
> > > 10 hours, for a total of 40 hours of test time.
> > > 
> > > > > It does take some time to run.  I did 4,000 hours worth of TREE04
> > > >                                         ^^^ '--duration=4000h' can serve this purpose?
> > > 
> > > You could, at least if you replace the "=" with a space character, but
> > > that really would run a six-month test, which is probably not what you
> > > want to do.  There being 8,760 hours in a year and all that.
> > > 
> > > > Is it related with the cpu's freq?
> > > 
> > > Not at all.  '--duration 10h' would run ten hours of wall-clock time
> > > regardless of the CPU frequencies.
> > > 
> > > > > to confirm lack of bug.  But an 80-CPU dual-socket system can run
> > > > > 10 concurrent instances of TREE04, which gets things down to a more
> > > > 
> > > > The total demanded hours H = 4000/(system_cpu_num/8)?
> > > 
> > > Yes.  You can also use multiple systems, which is what kvm-remote.sh is
> > > intended for, again assuming 80 CPUs per system to keep the arithmetic
> > > simple:
> > > 
> > > tools/testing/selftests/rcutorture/bin/kvm-remote.sh "sys1 sys2 ... sys20" --duration 20h --cpus 80 --bootargs "rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30" --configs "200*TREE04"
> > > 
> > 
> > That is appealing.
> > 
> > I will see if any opportunity to grasp a batch of machines to run the
> > test.
> 
> Initial tests with smaller numbers of CPUs are also useful, for example,
> in case reversion causes some bug due to bad interaction with a later
> commit.
> 
> Please let me know how it goes!
> 

I have managed to grasp three two-socket machine, each has 256 cpus.
The test has run about 7 hours till now without any problem by the following command:
tools/testing/selftests/rcutorture/bin/kvm-remote.sh "sys1 sys2 sys3" \
--duration 45h --cpus 256 --bootargs "rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30" --configs "96*TREE04"

It seems promising.


Thanks,

	Pingfan
Pingfan Liu Sept. 29, 2022, 8:19 a.m. UTC | #13
On Tue, Sep 27, 2022 at 5:59 PM Pingfan Liu <kernelfans@gmail.com> wrote:
>
> On Mon, Sep 26, 2022 at 03:23:52PM -0700, Paul E. McKenney wrote:
> > On Mon, Sep 26, 2022 at 02:34:17PM +0800, Pingfan Liu wrote:
> > > Sorry to reply late. I just realize this e-mail misses in my gmail.
> > >
> > > On Thu, Sep 22, 2022 at 06:54:42AM -0700, Paul E. McKenney wrote:
> > > [...]
> > > >
> > > > If you have tools/.../rcutorture/bin on your path, yes.  This would default
> > > > to a 30-minute run.  If you have at least 16 CPUs, you should add
> > >                                             ^^^ TREE04 has CONFIG_NR_CPUS=8, so I think here the num is 8
> >
> > Yes, you will get some benefit from --allcpus on systems with from 9-15
> > CPUs as well as for 16 and more.  At 8 CPUs, it wouldn't matter.
> >
> > > > "--allcpus" to do concurrrent runs.  For example, given 64 CPUs you could
> > > > do this:
> > > >
> > > > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 10h --bootargs "rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30" --configs "4*TREE04"
> > > >
> > >
> > > I have tried to find a two socket system with 128 cpus and run
> > >   sh kvm.sh --allcpus --duration 250h --bootargs rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30 --configs 16*TREE04
> > >
> > > Where 250*16=4000
> >
> > That would work.
> >
>
> This job has successfully run 24+ hours. (But maybe I can only keep it
> about 180 hours)
>
> > > > This would run four concurrent instances of the TREE04 scenario, each for
> > > > 10 hours, for a total of 40 hours of test time.
> > > >
> > > > > > It does take some time to run.  I did 4,000 hours worth of TREE04
> > > > >                                         ^^^ '--duration=4000h' can serve this purpose?
> > > >
> > > > You could, at least if you replace the "=" with a space character, but
> > > > that really would run a six-month test, which is probably not what you
> > > > want to do.  There being 8,760 hours in a year and all that.
> > > >
> > > > > Is it related with the cpu's freq?
> > > >
> > > > Not at all.  '--duration 10h' would run ten hours of wall-clock time
> > > > regardless of the CPU frequencies.
> > > >
> > > > > > to confirm lack of bug.  But an 80-CPU dual-socket system can run
> > > > > > 10 concurrent instances of TREE04, which gets things down to a more
> > > > >
> > > > > The total demanded hours H = 4000/(system_cpu_num/8)?
> > > >
> > > > Yes.  You can also use multiple systems, which is what kvm-remote.sh is
> > > > intended for, again assuming 80 CPUs per system to keep the arithmetic
> > > > simple:
> > > >
> > > > tools/testing/selftests/rcutorture/bin/kvm-remote.sh "sys1 sys2 ... sys20" --duration 20h --cpus 80 --bootargs "rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30" --configs "200*TREE04"
> > > >
> > >
> > > That is appealing.
> > >
> > > I will see if any opportunity to grasp a batch of machines to run the
> > > test.
> >
> > Initial tests with smaller numbers of CPUs are also useful, for example,
> > in case reversion causes some bug due to bad interaction with a later
> > commit.
> >
> > Please let me know how it goes!
> >
>
> I have managed to grasp three two-socket machine, each has 256 cpus.
> The test has run about 7 hours till now without any problem by the following command:
> tools/testing/selftests/rcutorture/bin/kvm-remote.sh "sys1 sys2 sys3" \
> --duration 45h --cpus 256 --bootargs "rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30" --configs "96*TREE04"
>
> It seems promising.
>

The test is against v6.0-rc7 kernel, and only with 96926686deab ("rcu:
Make CPU-hotplug removal operations enable tick") reverted. It is
close to the end, but unfortunately it fails.
Quote from remote-log
"
TREE04.57 ------- 4410955 GPs (27.2281/s) [rcu: g36045577 f0x0
total-gps=9011687] n_max_cbs: 4111392
TREE04.58 ------- 4368391 GPs (26.9654/s) [rcu: g35630093 f0x0
total-gps=8907816] n_max_cbs: 2411104
TREE04.59 ------- 800516 GPs (4.94146/s) n_max_cbs: 3634471
QEMU killed
TREE04.59 no success message, 10547 successful version messages
^[[033mWARNING: ^[[mTREE04.59 GP HANG at 800516 torture stat 1925
^[[033mWARNING: ^[[mAssertion failure in
/home/linux/tools/testing/selftests/rcutorture/res/2022.09.26-23.33.34-remote/TREE04.59/console.log
TREE04.59
^[[033mWARNING: ^[[mSummary: Call Traces: 1 Stalls: 8615
TREE04.6 ------- 4348443 GPs (26.8422/s) [rcu: g35341129 f0x0
total-gps=8835575] n_max_cbs: 2329432
...
...
TREE04.91 ------- 4895716 GPs (30.2205/s) [rcu: g39322065 f0x0
total-gps=9830808] n_max_cbs: 2208839
TREE04.92 ------- 4902696 GPs (30.2636/s) [rcu: g39113441 f0x0
total-gps=9778652] n_max_cbs: 1412377
TREE04.93 ------- 4891393 GPs (30.1938/s) [rcu: g39244749 f0x0
total-gps=9811481] n_max_cbs: 1772653
TREE04.94 ------- 4921510 GPs (30.3797/s) [rcu: g39187349 f0x0
total-gps=9797129] n_max_cbs: 1120534
TREE04.95 ------- 4885795 GPs (30.1592/s) [rcu: g39020985 f0x0
total-gps=9755538] n_max_cbs: 1178416
TREE04.96 ------- 4889097 GPs (30.1796/s) [rcu: g39097057 f0x0
total-gps=9774556] n_max_cbs: 1861434
1 runs with runtime errors.
 --- Done at Wed Sep 28 08:40:31 PM EDT 2022 (1d 21:06:57) exitcode 2
"

Quote from  console.log of TREE04.59
"
.....
[162001.696486] rcu-torture: rcu_torture_barrier_cbs is stopping
[162001.697004] rcu-torture: Stopping rcu_torture_fwd_prog task
[162001.697662] rcu_torture_fwd_prog n_max_cbs: 0
[162001.698195] rcu_torture_fwd_prog: Starting forward-progress test 0
[162001.698782] rcu_torture_fwd_prog_cr: Starting forward-progress test 0
[162001.707571] rcu_torture_fwd_prog_cr: Waiting for CBs:
rcu_barrier+0x0/0x3b0() 0
[162002.738504] rcu_torture_fwd_prog_nr: Starting forward-progress test 0
[162002.746491] rcu_torture_fwd_prog_nr: Waiting for CBs:
rcu_barrier+0x0/0x3b0() 0
[162002.850483] rcu_torture_fwd_prog: tested 2105 tested_tries 2107
[162002.851008] rcu-torture: rcu_torture_fwd_prog is stopping
[162002.851542] rcu-torture: Stopping rcu_torture_writer task
[162004.530463] rcu-torture: rtc: 00000000ac003c99 ver: 800516 tfle: 0
rta: 800517 rtaf: 0 rtf: 800507 rtmbe: 0 rtmbkf: 0/142699 rtbe: 0
rtbke: 0 rtbre: 0 rtbf: 0 rtb: 0 nt: 205710931 onoff:
185194/185194:185196/185196 1,1860:1,3263 25610601:47601063 (HZ=1000)
barrier: 773783/773783:0 read-exits: 184960 nocb-toggles: 0:0
[162004.532583] rcu-torture: Reader Pipe:  343007605654 1359216 0 0 0
0 0 0 0 0 0
[162004.533113] rcu-torture: Reader Batch:  342996212546 12752324 0 0
0 0 0 0 0 0 0
[162004.533648] rcu-torture: Free-Block Circulation:  800516 800515
800514 800513 800512 800511 800510 800509 800508 800507 0
[162004.534442] ??? Writer stall state RTWS_EXP_SYNC(4) g30755544 f0x0
->state 0x2 cpu 0
[162004.535057] rcu: rcu_sched: wait state: RCU_GP_WAIT_GPS(1)
->state: 0x402 ->rt_priority 0 delta ->gp_start 1674 ->gp_activity
1670 ->gp_req_activity 1674 ->gp_wake_time 1674 ->gp_wake_seq 30755540
->gp_seq 30755544 ->gp_seq_needed 30755544 ->gp_max 989 ->gp_flags 0x0
[162004.536805] rcu:    CB 1^0->2 KbclSW F2838 L2838 C0 ..... q0 S CPU 0
[162004.537277] rcu:    CB 2^0->3 KbclSW F2911 L2911 C7 ..... q0 S CPU 0
[162004.537742] rcu:    CB 3^0->-1 KbclSW F1686 L1686 C2 ..... q0 S CPU 0
[162004.538217] rcu: nocb GP 4 KldtS W[..] ..:0 rnp 4:7 2176869 S CPU 0
[162004.538729] rcu:    CB 4^4->5 KbclSW F2912 L2912 C7 ..... q0 S CPU 0
[162004.539202] rcu:    CB 5^4->6 KbclSW F2871 L2872 C1 ..... q0 S CPU 0
[162004.539667] rcu:    CB 6^4->7 KbclSW F4060 L4060 C0 ..... q0 S CPU 0
[162004.540136] rcu:    CB 7^4->-1 KbclSW F5763 L5763 C1 ..... q0 S CPU 0
[162004.540653] rcu: RCU callbacks invoked since boot: 1431149091
[162004.541076] rcu-torture: rcu_torture_stats is stopping
"

I have no idea whether this is related to the reverted commit.


Thanks,

Pingfan
Pingfan Liu Sept. 29, 2022, 8:20 a.m. UTC | #14
On Thu, Sep 29, 2022 at 4:19 PM Pingfan Liu <kernelfans@gmail.com> wrote:
>
[...]
> "
>
> I have no idea whether this is related to the reverted commit.
>

I have started another test against clean v6.0-rc7 to see whether this
is an issue with the mainline.
Joel Fernandes Sept. 30, 2022, 1:04 p.m. UTC | #15
On Thu, Sep 29, 2022 at 4:21 AM Pingfan Liu <kernelfans@gmail.com> wrote:
>
> On Thu, Sep 29, 2022 at 4:19 PM Pingfan Liu <kernelfans@gmail.com> wrote:
> >
> [...]
> > "
> >
> > I have no idea whether this is related to the reverted commit.
> >
>
> I have started another test against clean v6.0-rc7 to see whether this
> is an issue with the mainline.

I am not sure what exactly you are reverting (you could clarify that),
but if you are just removing the entire TICK_DEP_BIT_RCU, I do
remember (and mentioned on IRC to others recently) that without this
NOHZ_FULL has a hard time ending grace-periods because the forcing of
tick is needed for this configuration if we are spinning in the kernel
with the tick turned off. That seems to align with your TREE04
(NOHZ_FULL) configuration.

Also, the commit Frederic suggested to revert seems to be a cosmetic
optimization in the interrupt-entry path. That should not change
functionality I believe. So I did not fully follow why reverting that
is relevant (maybe Frederic can clarify?).

Or, are you trying to just remove its use from the hotplug path?

thanks,

 - Joel
Paul E. McKenney Sept. 30, 2022, 3:44 p.m. UTC | #16
On Thu, Sep 29, 2022 at 04:19:28PM +0800, Pingfan Liu wrote:
> On Tue, Sep 27, 2022 at 5:59 PM Pingfan Liu <kernelfans@gmail.com> wrote:
> >
> > On Mon, Sep 26, 2022 at 03:23:52PM -0700, Paul E. McKenney wrote:
> > > On Mon, Sep 26, 2022 at 02:34:17PM +0800, Pingfan Liu wrote:
> > > > Sorry to reply late. I just realize this e-mail misses in my gmail.
> > > >
> > > > On Thu, Sep 22, 2022 at 06:54:42AM -0700, Paul E. McKenney wrote:
> > > > [...]
> > > > >
> > > > > If you have tools/.../rcutorture/bin on your path, yes.  This would default
> > > > > to a 30-minute run.  If you have at least 16 CPUs, you should add
> > > >                                             ^^^ TREE04 has CONFIG_NR_CPUS=8, so I think here the num is 8
> > >
> > > Yes, you will get some benefit from --allcpus on systems with from 9-15
> > > CPUs as well as for 16 and more.  At 8 CPUs, it wouldn't matter.
> > >
> > > > > "--allcpus" to do concurrrent runs.  For example, given 64 CPUs you could
> > > > > do this:
> > > > >
> > > > > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 10h --bootargs "rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30" --configs "4*TREE04"
> > > > >
> > > >
> > > > I have tried to find a two socket system with 128 cpus and run
> > > >   sh kvm.sh --allcpus --duration 250h --bootargs rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30 --configs 16*TREE04
> > > >
> > > > Where 250*16=4000
> > >
> > > That would work.
> > >
> >
> > This job has successfully run 24+ hours. (But maybe I can only keep it
> > about 180 hours)
> >
> > > > > This would run four concurrent instances of the TREE04 scenario, each for
> > > > > 10 hours, for a total of 40 hours of test time.
> > > > >
> > > > > > > It does take some time to run.  I did 4,000 hours worth of TREE04
> > > > > >                                         ^^^ '--duration=4000h' can serve this purpose?
> > > > >
> > > > > You could, at least if you replace the "=" with a space character, but
> > > > > that really would run a six-month test, which is probably not what you
> > > > > want to do.  There being 8,760 hours in a year and all that.
> > > > >
> > > > > > Is it related with the cpu's freq?
> > > > >
> > > > > Not at all.  '--duration 10h' would run ten hours of wall-clock time
> > > > > regardless of the CPU frequencies.
> > > > >
> > > > > > > to confirm lack of bug.  But an 80-CPU dual-socket system can run
> > > > > > > 10 concurrent instances of TREE04, which gets things down to a more
> > > > > >
> > > > > > The total demanded hours H = 4000/(system_cpu_num/8)?
> > > > >
> > > > > Yes.  You can also use multiple systems, which is what kvm-remote.sh is
> > > > > intended for, again assuming 80 CPUs per system to keep the arithmetic
> > > > > simple:
> > > > >
> > > > > tools/testing/selftests/rcutorture/bin/kvm-remote.sh "sys1 sys2 ... sys20" --duration 20h --cpus 80 --bootargs "rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30" --configs "200*TREE04"
> > > > >
> > > >
> > > > That is appealing.
> > > >
> > > > I will see if any opportunity to grasp a batch of machines to run the
> > > > test.
> > >
> > > Initial tests with smaller numbers of CPUs are also useful, for example,
> > > in case reversion causes some bug due to bad interaction with a later
> > > commit.
> > >
> > > Please let me know how it goes!
> > >
> >
> > I have managed to grasp three two-socket machine, each has 256 cpus.
> > The test has run about 7 hours till now without any problem by the following command:
> > tools/testing/selftests/rcutorture/bin/kvm-remote.sh "sys1 sys2 sys3" \
> > --duration 45h --cpus 256 --bootargs "rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30" --configs "96*TREE04"
> >
> > It seems promising.
> >
> 
> The test is against v6.0-rc7 kernel, and only with 96926686deab ("rcu:
> Make CPU-hotplug removal operations enable tick") reverted. It is
> close to the end, but unfortunately it fails.
> Quote from remote-log
> "
> TREE04.57 ------- 4410955 GPs (27.2281/s) [rcu: g36045577 f0x0
> total-gps=9011687] n_max_cbs: 4111392
> TREE04.58 ------- 4368391 GPs (26.9654/s) [rcu: g35630093 f0x0
> total-gps=8907816] n_max_cbs: 2411104
> TREE04.59 ------- 800516 GPs (4.94146/s) n_max_cbs: 3634471
> QEMU killed
> TREE04.59 no success message, 10547 successful version messages
> ^[[033mWARNING: ^[[mTREE04.59 GP HANG at 800516 torture stat 1925
> ^[[033mWARNING: ^[[mAssertion failure in
> /home/linux/tools/testing/selftests/rcutorture/res/2022.09.26-23.33.34-remote/TREE04.59/console.log
> TREE04.59
> ^[[033mWARNING: ^[[mSummary: Call Traces: 1 Stalls: 8615
> TREE04.6 ------- 4348443 GPs (26.8422/s) [rcu: g35341129 f0x0
> total-gps=8835575] n_max_cbs: 2329432

First, thank you for running this!

This is not the typical failure that we were seeing, which would show
up as a 2.199.0-second RCU CPU stall during which time there would be
no console messages.

But please do let me know how continuing tests go!

							Thanx, Paul

> ...
> ...
> TREE04.91 ------- 4895716 GPs (30.2205/s) [rcu: g39322065 f0x0
> total-gps=9830808] n_max_cbs: 2208839
> TREE04.92 ------- 4902696 GPs (30.2636/s) [rcu: g39113441 f0x0
> total-gps=9778652] n_max_cbs: 1412377
> TREE04.93 ------- 4891393 GPs (30.1938/s) [rcu: g39244749 f0x0
> total-gps=9811481] n_max_cbs: 1772653
> TREE04.94 ------- 4921510 GPs (30.3797/s) [rcu: g39187349 f0x0
> total-gps=9797129] n_max_cbs: 1120534
> TREE04.95 ------- 4885795 GPs (30.1592/s) [rcu: g39020985 f0x0
> total-gps=9755538] n_max_cbs: 1178416
> TREE04.96 ------- 4889097 GPs (30.1796/s) [rcu: g39097057 f0x0
> total-gps=9774556] n_max_cbs: 1861434
> 1 runs with runtime errors.
>  --- Done at Wed Sep 28 08:40:31 PM EDT 2022 (1d 21:06:57) exitcode 2
> "
> 
> Quote from  console.log of TREE04.59
> "
> .....
> [162001.696486] rcu-torture: rcu_torture_barrier_cbs is stopping
> [162001.697004] rcu-torture: Stopping rcu_torture_fwd_prog task
> [162001.697662] rcu_torture_fwd_prog n_max_cbs: 0
> [162001.698195] rcu_torture_fwd_prog: Starting forward-progress test 0
> [162001.698782] rcu_torture_fwd_prog_cr: Starting forward-progress test 0
> [162001.707571] rcu_torture_fwd_prog_cr: Waiting for CBs:
> rcu_barrier+0x0/0x3b0() 0
> [162002.738504] rcu_torture_fwd_prog_nr: Starting forward-progress test 0
> [162002.746491] rcu_torture_fwd_prog_nr: Waiting for CBs:
> rcu_barrier+0x0/0x3b0() 0
> [162002.850483] rcu_torture_fwd_prog: tested 2105 tested_tries 2107
> [162002.851008] rcu-torture: rcu_torture_fwd_prog is stopping
> [162002.851542] rcu-torture: Stopping rcu_torture_writer task
> [162004.530463] rcu-torture: rtc: 00000000ac003c99 ver: 800516 tfle: 0
> rta: 800517 rtaf: 0 rtf: 800507 rtmbe: 0 rtmbkf: 0/142699 rtbe: 0
> rtbke: 0 rtbre: 0 rtbf: 0 rtb: 0 nt: 205710931 onoff:
> 185194/185194:185196/185196 1,1860:1,3263 25610601:47601063 (HZ=1000)
> barrier: 773783/773783:0 read-exits: 184960 nocb-toggles: 0:0
> [162004.532583] rcu-torture: Reader Pipe:  343007605654 1359216 0 0 0
> 0 0 0 0 0 0
> [162004.533113] rcu-torture: Reader Batch:  342996212546 12752324 0 0
> 0 0 0 0 0 0 0
> [162004.533648] rcu-torture: Free-Block Circulation:  800516 800515
> 800514 800513 800512 800511 800510 800509 800508 800507 0
> [162004.534442] ??? Writer stall state RTWS_EXP_SYNC(4) g30755544 f0x0
> ->state 0x2 cpu 0
> [162004.535057] rcu: rcu_sched: wait state: RCU_GP_WAIT_GPS(1)
> ->state: 0x402 ->rt_priority 0 delta ->gp_start 1674 ->gp_activity
> 1670 ->gp_req_activity 1674 ->gp_wake_time 1674 ->gp_wake_seq 30755540
> ->gp_seq 30755544 ->gp_seq_needed 30755544 ->gp_max 989 ->gp_flags 0x0
> [162004.536805] rcu:    CB 1^0->2 KbclSW F2838 L2838 C0 ..... q0 S CPU 0
> [162004.537277] rcu:    CB 2^0->3 KbclSW F2911 L2911 C7 ..... q0 S CPU 0
> [162004.537742] rcu:    CB 3^0->-1 KbclSW F1686 L1686 C2 ..... q0 S CPU 0
> [162004.538217] rcu: nocb GP 4 KldtS W[..] ..:0 rnp 4:7 2176869 S CPU 0
> [162004.538729] rcu:    CB 4^4->5 KbclSW F2912 L2912 C7 ..... q0 S CPU 0
> [162004.539202] rcu:    CB 5^4->6 KbclSW F2871 L2872 C1 ..... q0 S CPU 0
> [162004.539667] rcu:    CB 6^4->7 KbclSW F4060 L4060 C0 ..... q0 S CPU 0
> [162004.540136] rcu:    CB 7^4->-1 KbclSW F5763 L5763 C1 ..... q0 S CPU 0
> [162004.540653] rcu: RCU callbacks invoked since boot: 1431149091
> [162004.541076] rcu-torture: rcu_torture_stats is stopping
> "
> 
> I have no idea whether this is related to the reverted commit.
> 
> 
> Thanks,
> 
> Pingfan
Pingfan Liu Oct. 2, 2022, 1:29 p.m. UTC | #17
On Fri, Sep 30, 2022 at 11:45 PM Paul E. McKenney <paulmck@kernel.org> wrote:
>
[...]
> > > I have managed to grasp three two-socket machine, each has 256 cpus.
> > > The test has run about 7 hours till now without any problem by the following command:
> > > tools/testing/selftests/rcutorture/bin/kvm-remote.sh "sys1 sys2 sys3" \
> > > --duration 45h --cpus 256 --bootargs "rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30" --configs "96*TREE04"
> > >
> > > It seems promising.
> > >
> >
> > The test is against v6.0-rc7 kernel, and only with 96926686deab ("rcu:
> > Make CPU-hotplug removal operations enable tick") reverted. It is
> > close to the end, but unfortunately it fails.
> > Quote from remote-log
> > "
> > TREE04.57 ------- 4410955 GPs (27.2281/s) [rcu: g36045577 f0x0
> > total-gps=9011687] n_max_cbs: 4111392
> > TREE04.58 ------- 4368391 GPs (26.9654/s) [rcu: g35630093 f0x0
> > total-gps=8907816] n_max_cbs: 2411104
> > TREE04.59 ------- 800516 GPs (4.94146/s) n_max_cbs: 3634471
> > QEMU killed
> > TREE04.59 no success message, 10547 successful version messages
> > ^[[033mWARNING: ^[[mTREE04.59 GP HANG at 800516 torture stat 1925
> > ^[[033mWARNING: ^[[mAssertion failure in
> > /home/linux/tools/testing/selftests/rcutorture/res/2022.09.26-23.33.34-remote/TREE04.59/console.log
> > TREE04.59
> > ^[[033mWARNING: ^[[mSummary: Call Traces: 1 Stalls: 8615
> > TREE04.6 ------- 4348443 GPs (26.8422/s) [rcu: g35341129 f0x0
> > total-gps=8835575] n_max_cbs: 2329432
>
> First, thank you for running this!
>
> This is not the typical failure that we were seeing, which would show
> up as a 2.199.0-second RCU CPU stall during which time there would be
> no console messages.
>
> But please do let me know how continuing tests go!
>

This time, the same test environment except against v6.0-rc7 mainline,
also encountered the not typical failure.

Quote from remote-log:
"
TREE04.11 ------- 555429 GPs (3.42857/s) n_max_cbs: 1907355
QEMU killed
TREE04.11 no success message, 10547 successful version messages
^[[033mWARNING: ^[[mTREE04.11 GP HANG at 555429 torture stat 1330
^[[033mWARNING: ^[[mAssertion failure in
/home/linux/tools/testing/selftests/rcutorture/res/2022.09.29-00.28.29-remote/TREE04.11/console.log
TREE04.11
^[[033mWARNING: ^[[mSummary: Call Traces: 1 Stalls: 9213
TREE04.12 ------- 4379523 GPs (27.0341/s) [rcu: g35803501 f0x0
total-gps=8951168] n_max_cbs: 2585955
...
TREE04.42 ------- 4383911 GPs (27.0612/s) [rcu: g35706861 f0x0
total-gps=8927008] n_max_cbs: 2544071
TREE04.43 ------- 4370806 GPs (26.9803/s) [rcu: g35793529 f0x0
total-gps=8948675] n_max_cbs: 2490251
^[[033mWARNING: ^[[mAssertion failure in
/home/linux/tools/testing/selftests/rcutorture/res/2022.09.29-00.28.29-remote/TREE04.43/console.log
TREE04.43
^[[033mWARNING: ^[[mSummary: Warnings: 1 Call Traces: 1
TREE04.44 ------- 4379942 GPs (27.0367/s) [rcu: g35806165 f0x0
total-gps=8951834] n_max_cbs: 1473528
...
TREE04.95 ------- 4850446 GPs (29.941/s) [rcu: g38867685 f0x0
total-gps=9717212] n_max_cbs: 2917447
TREE04.96 ------- 4880570 GPs (30.127/s) [rcu: g39023821 f0x0
total-gps=9756247] n_max_cbs: 1267969
2 runs with runtime errors.
 --- Done at Fri Sep 30 09:35:31 PM EDT 2022 (1d 21:07:02) exitcode 2
"

Quote from TREE04.11 console.log
"
...
[162001.301549] rcu_torture_fwd_prog n_max_cbs: 281053
[162001.301994] rcu_torture_fwd_prog: Starting forward-progress test 0
[162001.302531] rcu_torture_fwd_prog_cr: Starting forward-progress test 0
[162001.308449] rcu_torture_fwd_prog_cr: Waiting for CBs:
rcu_barrier+0x0/0x3b0() 0
[162002.352445] rcu_torture_fwd_prog_nr: Starting forward-progress test 0
[162002.360429] rcu_torture_fwd_prog_nr: Waiting for CBs:
rcu_barrier+0x0/0x3b0() 0
[162002.464428] rcu_torture_fwd_prog: tested 2104 tested_tries 2105
[162002.464989] rcu-torture: rcu_torture_fwd_prog is stopping
[162002.465518] rcu-torture: Stopping rcu_torture_writer task
[162005.040419] rcu-torture: rtc: 0000000089bf6f56 ver: 555429 tfle: 0
rta: 555430 rtaf: 0 rtf: 555420 rtmbe: 0 rtmbkf: 0/234319 rtbe: 0
rtbke: 0 rtbre: 0 rtbf: 0 rtb: 0 nt: 206274892 onoff:
184783/184783:184787/184787 1,2001:2,3303 25759026:47497333 (HZ=1000)
barrier: 773866/773866:0 read-exits: 185056 nocb-toggles: 0:0
[162005.042858] rcu-torture: Reader Pipe:  344206401977 936546 0 0 0 0 0 0 0 0 0
[162005.043504] rcu-torture: Reader Batch:  344194661258 12677265 0 0
0 0 0 0 0 0 0
[162005.044047] rcu-torture: Free-Block Circulation:  555429 555428
555427 555426 555425 555424 555423 555422 555421 555420 0
[162005.044857] ??? Writer stall state RTWS_COND_SYNC_EXP(8) g30399832
f0x0 ->state 0x2 cpu 1
[162005.045474] rcu: rcu_sched: wait state: RCU_GP_WAIT_GPS(1)
->state: 0x402 ->rt_priority 0 delta ->gp_start 2575 ->gp_activity
2571 ->gp_req_activity 2575 ->gp_wake_time 2575 ->gp_wake_seq 30399828
->gp_seq 30399832 ->gp_seq_needed 30399832 ->gp_max 1118 ->gp_flags
0x0
[162005.047265] rcu:    CB 1^0->2 KbclSW F3852 L3852 C5 ..... q0 S CPU 0
[162005.047856] rcu:    CB 2^0->3 KbclSW F3747 L3747 C2 ..... q0 S CPU 0
[162005.048415] rcu:    CB 3^0->-1 KbclSW F2583 L2583 C2 ..... q0 S CPU 0
[162005.048992] rcu: nocb GP 4 KldtS W[..] ..:0 rnp 4:7 2042607 S CPU 0
[162005.049569] rcu:    CB 4^4->5 KbclSW F3839 L3839 C6 ..... q0 S CPU 0
[162005.050040] rcu:    CB 5^4->6 KbclSW F4449 L4449 C0 ..... q0 S CPU 0
[162005.050515] rcu:    CB 6^4->7 KbclSW F3784 L3785 C1 ..... q0 S CPU 0
[162005.050984] rcu:    CB 7^4->-1 KbclSW F3837 L3837 C7 ..... q0 S CPU 0
[162005.051478] rcu: RCU callbacks invoked since boot: 1422160737
[162005.051908] rcu-torture: rcu_torture_stats is stopping
"

Quote from TREE04.43 console.log
"
...
[162003.262292] rcu-torture: rtc: 0000000000000000 VER: 4371112 tfle:
0 rta: 4371112 rtaf: 0 rtf: 4371103 rtmbe: 0 rtmbkf: 0/1801158 rtbe: 0
rtbke: 0 rtbre: 0 rtbf: 0 rtb: 0 nt: 206435148 onoff:
189317/189317:189321/189321 1,1877:1,3248 25561605:45469848 (HZ=1000)
barrier: 769073/769073:0 read-exits: 185104 nocb-toggles: 0:0
[162003.264714] rcu-torture: Reader Pipe:  338936620402 7452360 0 0 0
0 0 0 0 0 0
[162003.265336] rcu-torture: Reader Batch:  338929088717 14984045 0 0
0 0 0 0 0 0 0
[162003.265967] rcu-torture: Free-Block Circulation:  4371111 4371111
4371110 4371109 4371108 4371107 4371106 4371105 4371104 4371103 0
[162003.266980] rcu-torture: rcu_torture_stats is stopping
[162003.267450] rcu_torture_cleanup: Invoking rcu_barrier+0x0/0x3b0().
[162003.278480] Stack Depot allocating hash table of 32768 entries with kvcalloc
[162003.279947] mem_dump_obj() slab test: rcu_torture_stats =
0000000000000000, &rhp = ffffa0aa0035feb8, rhp = ffff965e01a58000, &z
= ffffffffa676d9a0
[162003.281079] mem_dump_obj(ZERO_SIZE_PTR): zero-size pointer
[162003.281563] mem_dump_obj(NULL): NULL pointer
[162003.281935] mem_dump_obj(ffffa0aa0035feb8): 4-page vmalloc region
starting at 0xffffa0aa0035c000 allocated at kernel_clone+0x92/0x3b0
[162003.283044] mem_dump_obj(ffff965e01a58000): slab rcuscale start
ffff965e01a58000 pointer offset 0 allocated at
rcu_torture_cleanup+0x432/0x4e0
[162003.284138]     kmem_cache_alloc+0x19d/0x1b0
[162003.284522]     rcu_torture_cleanup+0x432/0x4e0
[162003.284918]     torture_shutdown+0xce/0x1c0
[162003.285285]     kthread+0xb7/0xe0
[162003.285587]     ret_from_fork+0x22/0x30
[162003.285926] mem_dump_obj(ffff965e01a58008): slab rcuscale start
ffff965e01a58000 pointer offset 8 allocated at
rcu_torture_cleanup+0x432/0x4e0
[162003.287045]     kmem_cache_alloc+0x19d/0x1b0
[162003.287436]     rcu_torture_cleanup+0x432/0x4e0
[162003.287835]     torture_shutdown+0xce/0x1c0
[162003.288202]     kthread+0xb7/0xe0
[162003.288510]     ret_from_fork+0x22/0x30
[162003.288848] mem_dump_obj(ffffffffa676d9a0): non-slab/vmalloc memory
[162003.289426] mem_dump_obj() kmalloc test: rcu_torture_stats =
0000000000000000, &rhp = ffffa0aa0035feb8, rhp = ffff965e144447e0
[162003.290421] mem_dump_obj(kmalloc ffff965e144447e0): slab
kmalloc-16 start ffff965e144447e0 pointer offset 0 size 16
[162003.291316] mem_dump_obj(kmalloc ffff965e144447e8): slab
kmalloc-16 start ffff965e144447e0 pointer offset 8 size 16
[162003.292211] mem_dump_obj() vmalloc test: rcu_torture_stats =
0000000000000000, &rhp = ffffa0aa0035feb8, rhp = ffffa0aa0002d000
[162003.293186] mem_dump_obj(vmalloc ffffa0aa0002d000): 1-page vmalloc
region starting at 0xffffa0aa0002d000 allocated at
rcu_torture_cleanup.cold+0x287/0x3ef
[162003.294372] mem_dump_obj(vmalloc ffffa0aa0002d008): 1-page vmalloc
region starting at 0xffffa0aa0002d000 allocated at
rcu_torture_cleanup.cold+0x287/0x3ef
[162003.295548] rcu-torture: rtc: 0000000000000000 VER: 4371112 tfle:
0 rta: 4371112 rtaf: 0 rtf: 4371103 rtmbe: 0 rtmbkf: 0/1801158 rtbe: 0
rtbke: 0 rtbre: 0 rtbf: 0 rtb: 0 nt: 206435148 onoff:
189317/189317:189321/189321 1,1877:1,3248 25561605:45469848 (HZ=1000)
barrier: 769073/769073:0 read-exits: 185104 nocb-toggles: 0:0
[162003.297956] rcu-torture: Reader Pipe:  338936620402 7452360 0 0 0
0 0 0 0 0 0
[162003.298578] rcu-torture: Reader Batch:  338929088717 14984045 0 0
0 0 0 0 0 0 0
[162003.299208] rcu-torture: Free-Block Circulation:  4371111 4371111
4371110 4371109 4371108 4371107 4371106 4371105 4371104 4371103 0
[162003.300220] rcu-torture:--- End of test: SUCCESS: nreaders=7
nfakewriters=4 stat_interval=15 verbose=1 test_no_idle_hz=1
shuffle_interval=3 stutter=5 irqreader=1 fqs_duration=0 fqs_holdoff=0
fqs_stutter=3 test_boost=1/0 test_boost_interval=7
test_boost_duration=4 shutdown_secs=162000 stall_cpu=0
stall_cpu_holdoff=10 stall_cpu_irqsoff=0 stall_cpu_block=0
n_barrier_cbs=4 onoff_interval=200 onoff_holdoff=30 read_exit_delay=13
read_exit_burst=16 nocbs_nthreads=0 nocbs_toggle=1000
[162003.304162] i2c i2c-0: shutdown
[162003.304461] alarmtimer alarmtimer.0.auto: shutdown
[162003.304878] psmouse serio1: shutdown
...
"

I remember if reverting the commits suggested by Frederic, the failure
would occur within an hour.

But unfortunately, I did not keep the data. I will run it again and
submit the data.


Thanks,

Pingfan
Pingfan Liu Oct. 2, 2022, 2:06 p.m. UTC | #18
On Fri, Sep 30, 2022 at 9:04 PM Joel Fernandes <joel@joelfernandes.org> wrote:
>
> On Thu, Sep 29, 2022 at 4:21 AM Pingfan Liu <kernelfans@gmail.com> wrote:
> >
> > On Thu, Sep 29, 2022 at 4:19 PM Pingfan Liu <kernelfans@gmail.com> wrote:
> > >
> > [...]
> > > "
> > >
> > > I have no idea whether this is related to the reverted commit.
> > >
> >
> > I have started another test against clean v6.0-rc7 to see whether this
> > is an issue with the mainline.
>
> I am not sure what exactly you are reverting (you could clarify that),

commit 96926686deab ("rcu: Make CPU-hotplug removal operations enable tick").
But due to conflict, "git revert" can not work directly. So I applied
it by hand.

> but if you are just removing the entire TICK_DEP_BIT_RCU, I do
> remember (and mentioned on IRC to others recently) that without this
> NOHZ_FULL has a hard time ending grace-periods because the forcing of
> tick is needed for this configuration if we are spinning in the kernel
> with the tick turned off. That seems to align with your TREE04
> (NOHZ_FULL) configuration.
>

Yes, that is the scenario.

> Also, the commit Frederic suggested to revert seems to be a cosmetic
> optimization in the interrupt-entry path. That should not change
> functionality I believe. So I did not fully follow why reverting that
> is relevant (maybe Frederic can clarify?).
>

Leave this question to Frederic.

And I schedule a test which reverts three commits
  96926686deab ("rcu: Make CPU-hotplug removal operations enable tick").
  53e87e3cdc15 (timers/nohz: Last resort update jiffies on nohz_full IRQ entry)
and:
  a1ff03cd6fb9 (tick: Detect and fix jiffies update stall)

Let us see the result. (Last time, it failed within an hour)

> Or, are you trying to just remove its use from the hotplug path?
>

Not from the hotplug path. Just revert the commit and run the standard
TREE04 test.

Thanks,

     Pingfan
Frederic Weisbecker Oct. 2, 2022, 3:08 p.m. UTC | #19
On Sun, Oct 02, 2022 at 09:29:59PM +0800, Pingfan Liu wrote:
> On Fri, Sep 30, 2022 at 11:45 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> [...]
> > > > I have managed to grasp three two-socket machine, each has 256 cpus.
> > > > The test has run about 7 hours till now without any problem by the following command:
> > > > tools/testing/selftests/rcutorture/bin/kvm-remote.sh "sys1 sys2 sys3" \
> > > > --duration 45h --cpus 256 --bootargs "rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30" --configs "96*TREE04"
> > > >
> > > > It seems promising.
> > > >
> > >
> > > The test is against v6.0-rc7 kernel, and only with 96926686deab ("rcu:
> > > Make CPU-hotplug removal operations enable tick") reverted. It is
> > > close to the end, but unfortunately it fails.
> > > Quote from remote-log
> > > "
> > > TREE04.57 ------- 4410955 GPs (27.2281/s) [rcu: g36045577 f0x0
> > > total-gps=9011687] n_max_cbs: 4111392
> > > TREE04.58 ------- 4368391 GPs (26.9654/s) [rcu: g35630093 f0x0
> > > total-gps=8907816] n_max_cbs: 2411104
> > > TREE04.59 ------- 800516 GPs (4.94146/s) n_max_cbs: 3634471
> > > QEMU killed
> > > TREE04.59 no success message, 10547 successful version messages
> > > ^[[033mWARNING: ^[[mTREE04.59 GP HANG at 800516 torture stat 1925
> > > ^[[033mWARNING: ^[[mAssertion failure in
> > > /home/linux/tools/testing/selftests/rcutorture/res/2022.09.26-23.33.34-remote/TREE04.59/console.log
> > > TREE04.59
> > > ^[[033mWARNING: ^[[mSummary: Call Traces: 1 Stalls: 8615
> > > TREE04.6 ------- 4348443 GPs (26.8422/s) [rcu: g35341129 f0x0
> > > total-gps=8835575] n_max_cbs: 2329432
> >
> > First, thank you for running this!
> >
> > This is not the typical failure that we were seeing, which would show
> > up as a 2.199.0-second RCU CPU stall during which time there would be
> > no console messages.
> >
> > But please do let me know how continuing tests go!
> >
> 
> This time, the same test environment except against v6.0-rc7 mainline,
> also encountered the not typical failure.

Interesting, I'm trying to reproduce...

Thanks!
Joel Fernandes Oct. 2, 2022, 4:11 p.m. UTC | #20
On 10/2/2022 10:06 AM, Pingfan Liu wrote:
> On Fri, Sep 30, 2022 at 9:04 PM Joel Fernandes <joel@joelfernandes.org> wrote:
>>
>> On Thu, Sep 29, 2022 at 4:21 AM Pingfan Liu <kernelfans@gmail.com> wrote:
>>>
>>> On Thu, Sep 29, 2022 at 4:19 PM Pingfan Liu <kernelfans@gmail.com> wrote:
>>>>
>>> [...]
>>>> "
>>>>
>>>> I have no idea whether this is related to the reverted commit.
>>>>
>>>
>>> I have started another test against clean v6.0-rc7 to see whether this
>>> is an issue with the mainline.
>>
>> I am not sure what exactly you are reverting (you could clarify that),
> 
> commit 96926686deab ("rcu: Make CPU-hotplug removal operations enable tick").
> But due to conflict, "git revert" can not work directly. So I applied
> it by hand.
> 
>> but if you are just removing the entire TICK_DEP_BIT_RCU, I do
>> remember (and mentioned on IRC to others recently) that without this
>> NOHZ_FULL has a hard time ending grace-periods because the forcing of
>> tick is needed for this configuration if we are spinning in the kernel
>> with the tick turned off. That seems to align with your TREE04
>> (NOHZ_FULL) configuration.
>>
> 
> Yes, that is the scenario.
> 
>> Also, the commit Frederic suggested to revert seems to be a cosmetic
>> optimization in the interrupt-entry path. That should not change
>> functionality I believe. So I did not fully follow why reverting that
>> is relevant (maybe Frederic can clarify?).
>>
> 
> Leave this question to Frederic.

I take this comment back, Sorry. Indeed the commits Frederic mentioned will make
a functional change to CPU hotplug path.

Sorry for the noise.

Excited to see exact reason why TICK_DEP_BIT_RCU matters in the hotplug paths. I
might jump into the investigation with you guys, but I have to make time for
Lazy-RCU v7 next :)

Thanks.
Paul E. McKenney Oct. 2, 2022, 4:20 p.m. UTC | #21
On Sun, Oct 02, 2022 at 09:29:59PM +0800, Pingfan Liu wrote:
> On Fri, Sep 30, 2022 at 11:45 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> [...]
> > > > I have managed to grasp three two-socket machine, each has 256 cpus.
> > > > The test has run about 7 hours till now without any problem by the following command:
> > > > tools/testing/selftests/rcutorture/bin/kvm-remote.sh "sys1 sys2 sys3" \
> > > > --duration 45h --cpus 256 --bootargs "rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30" --configs "96*TREE04"
> > > >
> > > > It seems promising.
> > > >
> > >
> > > The test is against v6.0-rc7 kernel, and only with 96926686deab ("rcu:
> > > Make CPU-hotplug removal operations enable tick") reverted. It is
> > > close to the end, but unfortunately it fails.
> > > Quote from remote-log
> > > "
> > > TREE04.57 ------- 4410955 GPs (27.2281/s) [rcu: g36045577 f0x0
> > > total-gps=9011687] n_max_cbs: 4111392
> > > TREE04.58 ------- 4368391 GPs (26.9654/s) [rcu: g35630093 f0x0
> > > total-gps=8907816] n_max_cbs: 2411104
> > > TREE04.59 ------- 800516 GPs (4.94146/s) n_max_cbs: 3634471
> > > QEMU killed
> > > TREE04.59 no success message, 10547 successful version messages
> > > ^[[033mWARNING: ^[[mTREE04.59 GP HANG at 800516 torture stat 1925
> > > ^[[033mWARNING: ^[[mAssertion failure in
> > > /home/linux/tools/testing/selftests/rcutorture/res/2022.09.26-23.33.34-remote/TREE04.59/console.log
> > > TREE04.59
> > > ^[[033mWARNING: ^[[mSummary: Call Traces: 1 Stalls: 8615
> > > TREE04.6 ------- 4348443 GPs (26.8422/s) [rcu: g35341129 f0x0
> > > total-gps=8835575] n_max_cbs: 2329432
> >
> > First, thank you for running this!
> >
> > This is not the typical failure that we were seeing, which would show
> > up as a 2.199.0-second RCU CPU stall during which time there would be
> > no console messages.
> >
> > But please do let me know how continuing tests go!
> >
> 
> This time, the same test environment except against v6.0-rc7 mainline,
> also encountered the not typical failure.
> 
> Quote from remote-log:
> "
> TREE04.11 ------- 555429 GPs (3.42857/s) n_max_cbs: 1907355
> QEMU killed
> TREE04.11 no success message, 10547 successful version messages
> ^[[033mWARNING: ^[[mTREE04.11 GP HANG at 555429 torture stat 1330
> ^[[033mWARNING: ^[[mAssertion failure in
> /home/linux/tools/testing/selftests/rcutorture/res/2022.09.29-00.28.29-remote/TREE04.11/console.log
> TREE04.11
> ^[[033mWARNING: ^[[mSummary: Call Traces: 1 Stalls: 9213
> TREE04.12 ------- 4379523 GPs (27.0341/s) [rcu: g35803501 f0x0
> total-gps=8951168] n_max_cbs: 2585955
> ...
> TREE04.42 ------- 4383911 GPs (27.0612/s) [rcu: g35706861 f0x0
> total-gps=8927008] n_max_cbs: 2544071
> TREE04.43 ------- 4370806 GPs (26.9803/s) [rcu: g35793529 f0x0
> total-gps=8948675] n_max_cbs: 2490251
> ^[[033mWARNING: ^[[mAssertion failure in
> /home/linux/tools/testing/selftests/rcutorture/res/2022.09.29-00.28.29-remote/TREE04.43/console.log
> TREE04.43
> ^[[033mWARNING: ^[[mSummary: Warnings: 1 Call Traces: 1
> TREE04.44 ------- 4379942 GPs (27.0367/s) [rcu: g35806165 f0x0
> total-gps=8951834] n_max_cbs: 1473528
> ...
> TREE04.95 ------- 4850446 GPs (29.941/s) [rcu: g38867685 f0x0
> total-gps=9717212] n_max_cbs: 2917447
> TREE04.96 ------- 4880570 GPs (30.127/s) [rcu: g39023821 f0x0
> total-gps=9756247] n_max_cbs: 1267969
> 2 runs with runtime errors.
>  --- Done at Fri Sep 30 09:35:31 PM EDT 2022 (1d 21:07:02) exitcode 2
> "
> 
> Quote from TREE04.11 console.log
> "
> ...
> [162001.301549] rcu_torture_fwd_prog n_max_cbs: 281053
> [162001.301994] rcu_torture_fwd_prog: Starting forward-progress test 0
> [162001.302531] rcu_torture_fwd_prog_cr: Starting forward-progress test 0
> [162001.308449] rcu_torture_fwd_prog_cr: Waiting for CBs:
> rcu_barrier+0x0/0x3b0() 0
> [162002.352445] rcu_torture_fwd_prog_nr: Starting forward-progress test 0
> [162002.360429] rcu_torture_fwd_prog_nr: Waiting for CBs:
> rcu_barrier+0x0/0x3b0() 0
> [162002.464428] rcu_torture_fwd_prog: tested 2104 tested_tries 2105
> [162002.464989] rcu-torture: rcu_torture_fwd_prog is stopping
> [162002.465518] rcu-torture: Stopping rcu_torture_writer task
> [162005.040419] rcu-torture: rtc: 0000000089bf6f56 ver: 555429 tfle: 0
> rta: 555430 rtaf: 0 rtf: 555420 rtmbe: 0 rtmbkf: 0/234319 rtbe: 0
> rtbke: 0 rtbre: 0 rtbf: 0 rtb: 0 nt: 206274892 onoff:
> 184783/184783:184787/184787 1,2001:2,3303 25759026:47497333 (HZ=1000)
> barrier: 773866/773866:0 read-exits: 185056 nocb-toggles: 0:0
> [162005.042858] rcu-torture: Reader Pipe:  344206401977 936546 0 0 0 0 0 0 0 0 0
> [162005.043504] rcu-torture: Reader Batch:  344194661258 12677265 0 0
> 0 0 0 0 0 0 0
> [162005.044047] rcu-torture: Free-Block Circulation:  555429 555428
> 555427 555426 555425 555424 555423 555422 555421 555420 0
> [162005.044857] ??? Writer stall state RTWS_COND_SYNC_EXP(8) g30399832
> f0x0 ->state 0x2 cpu 1
> [162005.045474] rcu: rcu_sched: wait state: RCU_GP_WAIT_GPS(1)
> ->state: 0x402 ->rt_priority 0 delta ->gp_start 2575 ->gp_activity
> 2571 ->gp_req_activity 2575 ->gp_wake_time 2575 ->gp_wake_seq 30399828
> ->gp_seq 30399832 ->gp_seq_needed 30399832 ->gp_max 1118 ->gp_flags
> 0x0
> [162005.047265] rcu:    CB 1^0->2 KbclSW F3852 L3852 C5 ..... q0 S CPU 0
> [162005.047856] rcu:    CB 2^0->3 KbclSW F3747 L3747 C2 ..... q0 S CPU 0
> [162005.048415] rcu:    CB 3^0->-1 KbclSW F2583 L2583 C2 ..... q0 S CPU 0
> [162005.048992] rcu: nocb GP 4 KldtS W[..] ..:0 rnp 4:7 2042607 S CPU 0
> [162005.049569] rcu:    CB 4^4->5 KbclSW F3839 L3839 C6 ..... q0 S CPU 0
> [162005.050040] rcu:    CB 5^4->6 KbclSW F4449 L4449 C0 ..... q0 S CPU 0
> [162005.050515] rcu:    CB 6^4->7 KbclSW F3784 L3785 C1 ..... q0 S CPU 0
> [162005.050984] rcu:    CB 7^4->-1 KbclSW F3837 L3837 C7 ..... q0 S CPU 0
> [162005.051478] rcu: RCU callbacks invoked since boot: 1422160737
> [162005.051908] rcu-torture: rcu_torture_stats is stopping
> "
> 
> Quote from TREE04.43 console.log
> "
> ...
> [162003.262292] rcu-torture: rtc: 0000000000000000 VER: 4371112 tfle:
> 0 rta: 4371112 rtaf: 0 rtf: 4371103 rtmbe: 0 rtmbkf: 0/1801158 rtbe: 0
> rtbke: 0 rtbre: 0 rtbf: 0 rtb: 0 nt: 206435148 onoff:
> 189317/189317:189321/189321 1,1877:1,3248 25561605:45469848 (HZ=1000)
> barrier: 769073/769073:0 read-exits: 185104 nocb-toggles: 0:0
> [162003.264714] rcu-torture: Reader Pipe:  338936620402 7452360 0 0 0
> 0 0 0 0 0 0
> [162003.265336] rcu-torture: Reader Batch:  338929088717 14984045 0 0
> 0 0 0 0 0 0 0
> [162003.265967] rcu-torture: Free-Block Circulation:  4371111 4371111
> 4371110 4371109 4371108 4371107 4371106 4371105 4371104 4371103 0
> [162003.266980] rcu-torture: rcu_torture_stats is stopping
> [162003.267450] rcu_torture_cleanup: Invoking rcu_barrier+0x0/0x3b0().
> [162003.278480] Stack Depot allocating hash table of 32768 entries with kvcalloc
> [162003.279947] mem_dump_obj() slab test: rcu_torture_stats =
> 0000000000000000, &rhp = ffffa0aa0035feb8, rhp = ffff965e01a58000, &z
> = ffffffffa676d9a0
> [162003.281079] mem_dump_obj(ZERO_SIZE_PTR): zero-size pointer
> [162003.281563] mem_dump_obj(NULL): NULL pointer
> [162003.281935] mem_dump_obj(ffffa0aa0035feb8): 4-page vmalloc region
> starting at 0xffffa0aa0035c000 allocated at kernel_clone+0x92/0x3b0
> [162003.283044] mem_dump_obj(ffff965e01a58000): slab rcuscale start
> ffff965e01a58000 pointer offset 0 allocated at
> rcu_torture_cleanup+0x432/0x4e0
> [162003.284138]     kmem_cache_alloc+0x19d/0x1b0
> [162003.284522]     rcu_torture_cleanup+0x432/0x4e0
> [162003.284918]     torture_shutdown+0xce/0x1c0
> [162003.285285]     kthread+0xb7/0xe0
> [162003.285587]     ret_from_fork+0x22/0x30
> [162003.285926] mem_dump_obj(ffff965e01a58008): slab rcuscale start
> ffff965e01a58000 pointer offset 8 allocated at
> rcu_torture_cleanup+0x432/0x4e0
> [162003.287045]     kmem_cache_alloc+0x19d/0x1b0
> [162003.287436]     rcu_torture_cleanup+0x432/0x4e0
> [162003.287835]     torture_shutdown+0xce/0x1c0
> [162003.288202]     kthread+0xb7/0xe0
> [162003.288510]     ret_from_fork+0x22/0x30
> [162003.288848] mem_dump_obj(ffffffffa676d9a0): non-slab/vmalloc memory
> [162003.289426] mem_dump_obj() kmalloc test: rcu_torture_stats =
> 0000000000000000, &rhp = ffffa0aa0035feb8, rhp = ffff965e144447e0
> [162003.290421] mem_dump_obj(kmalloc ffff965e144447e0): slab
> kmalloc-16 start ffff965e144447e0 pointer offset 0 size 16
> [162003.291316] mem_dump_obj(kmalloc ffff965e144447e8): slab
> kmalloc-16 start ffff965e144447e0 pointer offset 8 size 16
> [162003.292211] mem_dump_obj() vmalloc test: rcu_torture_stats =
> 0000000000000000, &rhp = ffffa0aa0035feb8, rhp = ffffa0aa0002d000
> [162003.293186] mem_dump_obj(vmalloc ffffa0aa0002d000): 1-page vmalloc
> region starting at 0xffffa0aa0002d000 allocated at
> rcu_torture_cleanup.cold+0x287/0x3ef
> [162003.294372] mem_dump_obj(vmalloc ffffa0aa0002d008): 1-page vmalloc
> region starting at 0xffffa0aa0002d000 allocated at
> rcu_torture_cleanup.cold+0x287/0x3ef
> [162003.295548] rcu-torture: rtc: 0000000000000000 VER: 4371112 tfle:
> 0 rta: 4371112 rtaf: 0 rtf: 4371103 rtmbe: 0 rtmbkf: 0/1801158 rtbe: 0
> rtbke: 0 rtbre: 0 rtbf: 0 rtb: 0 nt: 206435148 onoff:
> 189317/189317:189321/189321 1,1877:1,3248 25561605:45469848 (HZ=1000)
> barrier: 769073/769073:0 read-exits: 185104 nocb-toggles: 0:0
> [162003.297956] rcu-torture: Reader Pipe:  338936620402 7452360 0 0 0
> 0 0 0 0 0 0
> [162003.298578] rcu-torture: Reader Batch:  338929088717 14984045 0 0
> 0 0 0 0 0 0 0
> [162003.299208] rcu-torture: Free-Block Circulation:  4371111 4371111
> 4371110 4371109 4371108 4371107 4371106 4371105 4371104 4371103 0
> [162003.300220] rcu-torture:--- End of test: SUCCESS: nreaders=7
> nfakewriters=4 stat_interval=15 verbose=1 test_no_idle_hz=1
> shuffle_interval=3 stutter=5 irqreader=1 fqs_duration=0 fqs_holdoff=0
> fqs_stutter=3 test_boost=1/0 test_boost_interval=7
> test_boost_duration=4 shutdown_secs=162000 stall_cpu=0
> stall_cpu_holdoff=10 stall_cpu_irqsoff=0 stall_cpu_block=0
> n_barrier_cbs=4 onoff_interval=200 onoff_holdoff=30 read_exit_delay=13
> read_exit_burst=16 nocbs_nthreads=0 nocbs_toggle=1000
> [162003.304162] i2c i2c-0: shutdown
> [162003.304461] alarmtimer alarmtimer.0.auto: shutdown
> [162003.304878] psmouse serio1: shutdown
> ...
> "

Huh.  Shutdown writer stall on the first one, but I am not seeing the
but in the second one.  Was the "Call Trace" located somewhere else in
the console.log file?

I have not seen the shutdown writer stall lately, but such stalls have
flagged many sorts of bugs for me to fix in the past.


So maybe there is yet another bug that my testing is for whatever reason
not seeing.

> I remember if reverting the commits suggested by Frederic, the failure
> would occur within an hour.
> 
> But unfortunately, I did not keep the data. I will run it again and
> submit the data.

Sounds good!

							Thanx, Paul
Paul E. McKenney Oct. 2, 2022, 4:20 p.m. UTC | #22
On Sun, Oct 02, 2022 at 05:08:58PM +0200, Frederic Weisbecker wrote:
> On Sun, Oct 02, 2022 at 09:29:59PM +0800, Pingfan Liu wrote:
> > On Fri, Sep 30, 2022 at 11:45 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> > >
> > [...]
> > > > > I have managed to grasp three two-socket machine, each has 256 cpus.
> > > > > The test has run about 7 hours till now without any problem by the following command:
> > > > > tools/testing/selftests/rcutorture/bin/kvm-remote.sh "sys1 sys2 sys3" \
> > > > > --duration 45h --cpus 256 --bootargs "rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30" --configs "96*TREE04"
> > > > >
> > > > > It seems promising.
> > > > >
> > > >
> > > > The test is against v6.0-rc7 kernel, and only with 96926686deab ("rcu:
> > > > Make CPU-hotplug removal operations enable tick") reverted. It is
> > > > close to the end, but unfortunately it fails.
> > > > Quote from remote-log
> > > > "
> > > > TREE04.57 ------- 4410955 GPs (27.2281/s) [rcu: g36045577 f0x0
> > > > total-gps=9011687] n_max_cbs: 4111392
> > > > TREE04.58 ------- 4368391 GPs (26.9654/s) [rcu: g35630093 f0x0
> > > > total-gps=8907816] n_max_cbs: 2411104
> > > > TREE04.59 ------- 800516 GPs (4.94146/s) n_max_cbs: 3634471
> > > > QEMU killed
> > > > TREE04.59 no success message, 10547 successful version messages
> > > > ^[[033mWARNING: ^[[mTREE04.59 GP HANG at 800516 torture stat 1925
> > > > ^[[033mWARNING: ^[[mAssertion failure in
> > > > /home/linux/tools/testing/selftests/rcutorture/res/2022.09.26-23.33.34-remote/TREE04.59/console.log
> > > > TREE04.59
> > > > ^[[033mWARNING: ^[[mSummary: Call Traces: 1 Stalls: 8615
> > > > TREE04.6 ------- 4348443 GPs (26.8422/s) [rcu: g35341129 f0x0
> > > > total-gps=8835575] n_max_cbs: 2329432
> > >
> > > First, thank you for running this!
> > >
> > > This is not the typical failure that we were seeing, which would show
> > > up as a 2.199.0-second RCU CPU stall during which time there would be
> > > no console messages.
> > >
> > > But please do let me know how continuing tests go!
> > >
> > 
> > This time, the same test environment except against v6.0-rc7 mainline,
> > also encountered the not typical failure.
> 
> Interesting, I'm trying to reproduce...

Very good, thank you!

							Thanx, Paul
Paul E. McKenney Oct. 2, 2022, 4:24 p.m. UTC | #23
On Sun, Oct 02, 2022 at 12:11:07PM -0400, Joel Fernandes wrote:
> 
> 
> On 10/2/2022 10:06 AM, Pingfan Liu wrote:
> > On Fri, Sep 30, 2022 at 9:04 PM Joel Fernandes <joel@joelfernandes.org> wrote:
> >>
> >> On Thu, Sep 29, 2022 at 4:21 AM Pingfan Liu <kernelfans@gmail.com> wrote:
> >>>
> >>> On Thu, Sep 29, 2022 at 4:19 PM Pingfan Liu <kernelfans@gmail.com> wrote:
> >>>>
> >>> [...]
> >>>> "
> >>>>
> >>>> I have no idea whether this is related to the reverted commit.
> >>>>
> >>>
> >>> I have started another test against clean v6.0-rc7 to see whether this
> >>> is an issue with the mainline.
> >>
> >> I am not sure what exactly you are reverting (you could clarify that),
> > 
> > commit 96926686deab ("rcu: Make CPU-hotplug removal operations enable tick").
> > But due to conflict, "git revert" can not work directly. So I applied
> > it by hand.
> > 
> >> but if you are just removing the entire TICK_DEP_BIT_RCU, I do
> >> remember (and mentioned on IRC to others recently) that without this
> >> NOHZ_FULL has a hard time ending grace-periods because the forcing of
> >> tick is needed for this configuration if we are spinning in the kernel
> >> with the tick turned off. That seems to align with your TREE04
> >> (NOHZ_FULL) configuration.
> >>
> > 
> > Yes, that is the scenario.
> > 
> >> Also, the commit Frederic suggested to revert seems to be a cosmetic
> >> optimization in the interrupt-entry path. That should not change
> >> functionality I believe. So I did not fully follow why reverting that
> >> is relevant (maybe Frederic can clarify?).
> >>
> > 
> > Leave this question to Frederic.
> 
> I take this comment back, Sorry. Indeed the commits Frederic mentioned will make
> a functional change to CPU hotplug path.
> 
> Sorry for the noise.
> 
> Excited to see exact reason why TICK_DEP_BIT_RCU matters in the hotplug paths. I
> might jump into the investigation with you guys, but I have to make time for
> Lazy-RCU v7 next :)

One historical reason was that a nohz_full CPU could enter the kernel with
the tick still disabled, and stay that way indefinitely.  Among other
things, this can interfere with the grace-period wait in the offlining
code path.

							Thanx, Paul
Joel Fernandes Oct. 2, 2022, 4:30 p.m. UTC | #24
> On Oct 2, 2022, at 12:24 PM, Paul E. McKenney <paulmck@kernel.org> wrote:
> 
> On Sun, Oct 02, 2022 at 12:11:07PM -0400, Joel Fernandes wrote:
>> 
>> 
>>> On 10/2/2022 10:06 AM, Pingfan Liu wrote:
>>> On Fri, Sep 30, 2022 at 9:04 PM Joel Fernandes <joel@joelfernandes.org> wrote:
>>>> 
>>>> On Thu, Sep 29, 2022 at 4:21 AM Pingfan Liu <kernelfans@gmail.com> wrote:
>>>>> 
>>>>> On Thu, Sep 29, 2022 at 4:19 PM Pingfan Liu <kernelfans@gmail.com> wrote:
>>>>>> 
>>>>> [...]
>>>>>> "
>>>>>> 
>>>>>> I have no idea whether this is related to the reverted commit.
>>>>>> 
>>>>> 
>>>>> I have started another test against clean v6.0-rc7 to see whether this
>>>>> is an issue with the mainline.
>>>> 
>>>> I am not sure what exactly you are reverting (you could clarify that),
>>> 
>>> commit 96926686deab ("rcu: Make CPU-hotplug removal operations enable tick").
>>> But due to conflict, "git revert" can not work directly. So I applied
>>> it by hand.
>>> 
>>>> but if you are just removing the entire TICK_DEP_BIT_RCU, I do
>>>> remember (and mentioned on IRC to others recently) that without this
>>>> NOHZ_FULL has a hard time ending grace-periods because the forcing of
>>>> tick is needed for this configuration if we are spinning in the kernel
>>>> with the tick turned off. That seems to align with your TREE04
>>>> (NOHZ_FULL) configuration.
>>>> 
>>> 
>>> Yes, that is the scenario.
>>> 
>>>> Also, the commit Frederic suggested to revert seems to be a cosmetic
>>>> optimization in the interrupt-entry path. That should not change
>>>> functionality I believe. So I did not fully follow why reverting that
>>>> is relevant (maybe Frederic can clarify?).
>>>> 
>>> 
>>> Leave this question to Frederic.
>> 
>> I take this comment back, Sorry. Indeed the commits Frederic mentioned will make
>> a functional change to CPU hotplug path.
>> 
>> Sorry for the noise.
>> 
>> Excited to see exact reason why TICK_DEP_BIT_RCU matters in the hotplug paths. I
>> might jump into the investigation with you guys, but I have to make time for
>> Lazy-RCU v7 next :)
> 
> One historical reason was that a nohz_full CPU could enter the kernel with
> the tick still disabled, and stay that way indefinitely.  Among other
> things, this can interfere with the grace-period wait in the offlining
> code path.

Thanks for clarification/confirmation, that was my hunch as well. Does the emergency forcing on of the tick not suffice for that? If I understand you are sending an IPI to set the same tick dependency if grace periods are being held too long. Another potential fix instead of setting tick dependency in hotplug, then, could be to reduce the time before which emergency IPIs are sent to free up nohz_full CPUs, I think.

Thanks!

- Joel 

> 
>                            Thanx, Paul
Paul E. McKenney Oct. 2, 2022, 4:57 p.m. UTC | #25
On Sun, Oct 02, 2022 at 12:30:52PM -0400, Joel Fernandes wrote:
> 
> 
> > On Oct 2, 2022, at 12:24 PM, Paul E. McKenney <paulmck@kernel.org> wrote:
> > 
> > On Sun, Oct 02, 2022 at 12:11:07PM -0400, Joel Fernandes wrote:
> >> 
> >> 
> >>> On 10/2/2022 10:06 AM, Pingfan Liu wrote:
> >>> On Fri, Sep 30, 2022 at 9:04 PM Joel Fernandes <joel@joelfernandes.org> wrote:
> >>>> 
> >>>> On Thu, Sep 29, 2022 at 4:21 AM Pingfan Liu <kernelfans@gmail.com> wrote:
> >>>>> 
> >>>>> On Thu, Sep 29, 2022 at 4:19 PM Pingfan Liu <kernelfans@gmail.com> wrote:
> >>>>>> 
> >>>>> [...]
> >>>>>> "
> >>>>>> 
> >>>>>> I have no idea whether this is related to the reverted commit.
> >>>>>> 
> >>>>> 
> >>>>> I have started another test against clean v6.0-rc7 to see whether this
> >>>>> is an issue with the mainline.
> >>>> 
> >>>> I am not sure what exactly you are reverting (you could clarify that),
> >>> 
> >>> commit 96926686deab ("rcu: Make CPU-hotplug removal operations enable tick").
> >>> But due to conflict, "git revert" can not work directly. So I applied
> >>> it by hand.
> >>> 
> >>>> but if you are just removing the entire TICK_DEP_BIT_RCU, I do
> >>>> remember (and mentioned on IRC to others recently) that without this
> >>>> NOHZ_FULL has a hard time ending grace-periods because the forcing of
> >>>> tick is needed for this configuration if we are spinning in the kernel
> >>>> with the tick turned off. That seems to align with your TREE04
> >>>> (NOHZ_FULL) configuration.
> >>>> 
> >>> 
> >>> Yes, that is the scenario.
> >>> 
> >>>> Also, the commit Frederic suggested to revert seems to be a cosmetic
> >>>> optimization in the interrupt-entry path. That should not change
> >>>> functionality I believe. So I did not fully follow why reverting that
> >>>> is relevant (maybe Frederic can clarify?).
> >>>> 
> >>> 
> >>> Leave this question to Frederic.
> >> 
> >> I take this comment back, Sorry. Indeed the commits Frederic mentioned will make
> >> a functional change to CPU hotplug path.
> >> 
> >> Sorry for the noise.
> >> 
> >> Excited to see exact reason why TICK_DEP_BIT_RCU matters in the hotplug paths. I
> >> might jump into the investigation with you guys, but I have to make time for
> >> Lazy-RCU v7 next :)
> > 
> > One historical reason was that a nohz_full CPU could enter the kernel with
> > the tick still disabled, and stay that way indefinitely.  Among other
> > things, this can interfere with the grace-period wait in the offlining
> > code path.
> 
> Thanks for clarification/confirmation, that was my hunch as well. Does the emergency forcing on of the tick not suffice for that? If I understand you are sending an IPI to set the same tick dependency if grace periods are being held too long. Another potential fix instead of setting tick dependency in hotplug, then, could be to reduce the time before which emergency IPIs are sent to free up nohz_full CPUs, I think.

Why not try modifying it to work this way and seeing what (if anything)
breaks?  If nothing breaks, then there is still the option of digging
into the code.

In both cases, after Lazy-RCU v7, of course.  ;-)

								Thanx, Paul
Joel Fernandes Oct. 2, 2022, 4:59 p.m. UTC | #26
> On Oct 2, 2022, at 12:57 PM, Paul E. McKenney <paulmck@kernel.org> wrote:
> 
> On Sun, Oct 02, 2022 at 12:30:52PM -0400, Joel Fernandes wrote:
>> 
>> 
>>>> On Oct 2, 2022, at 12:24 PM, Paul E. McKenney <paulmck@kernel.org> wrote:
>>> 
>>> On Sun, Oct 02, 2022 at 12:11:07PM -0400, Joel Fernandes wrote:
>>>> 
>>>> 
>>>>> On 10/2/2022 10:06 AM, Pingfan Liu wrote:
>>>>> On Fri, Sep 30, 2022 at 9:04 PM Joel Fernandes <joel@joelfernandes.org> wrote:
>>>>>> 
>>>>>> On Thu, Sep 29, 2022 at 4:21 AM Pingfan Liu <kernelfans@gmail.com> wrote:
>>>>>>> 
>>>>>>> On Thu, Sep 29, 2022 at 4:19 PM Pingfan Liu <kernelfans@gmail.com> wrote:
>>>>>>>> 
>>>>>>> [...]
>>>>>>>> "
>>>>>>>> 
>>>>>>>> I have no idea whether this is related to the reverted commit.
>>>>>>>> 
>>>>>>> 
>>>>>>> I have started another test against clean v6.0-rc7 to see whether this
>>>>>>> is an issue with the mainline.
>>>>>> 
>>>>>> I am not sure what exactly you are reverting (you could clarify that),
>>>>> 
>>>>> commit 96926686deab ("rcu: Make CPU-hotplug removal operations enable tick").
>>>>> But due to conflict, "git revert" can not work directly. So I applied
>>>>> it by hand.
>>>>> 
>>>>>> but if you are just removing the entire TICK_DEP_BIT_RCU, I do
>>>>>> remember (and mentioned on IRC to others recently) that without this
>>>>>> NOHZ_FULL has a hard time ending grace-periods because the forcing of
>>>>>> tick is needed for this configuration if we are spinning in the kernel
>>>>>> with the tick turned off. That seems to align with your TREE04
>>>>>> (NOHZ_FULL) configuration.
>>>>>> 
>>>>> 
>>>>> Yes, that is the scenario.
>>>>> 
>>>>>> Also, the commit Frederic suggested to revert seems to be a cosmetic
>>>>>> optimization in the interrupt-entry path. That should not change
>>>>>> functionality I believe. So I did not fully follow why reverting that
>>>>>> is relevant (maybe Frederic can clarify?).
>>>>>> 
>>>>> 
>>>>> Leave this question to Frederic.
>>>> 
>>>> I take this comment back, Sorry. Indeed the commits Frederic mentioned will make
>>>> a functional change to CPU hotplug path.
>>>> 
>>>> Sorry for the noise.
>>>> 
>>>> Excited to see exact reason why TICK_DEP_BIT_RCU matters in the hotplug paths. I
>>>> might jump into the investigation with you guys, but I have to make time for
>>>> Lazy-RCU v7 next :)
>>> 
>>> One historical reason was that a nohz_full CPU could enter the kernel with
>>> the tick still disabled, and stay that way indefinitely.  Among other
>>> things, this can interfere with the grace-period wait in the offlining
>>> code path.
>> 
>> Thanks for clarification/confirmation, that was my hunch as well. Does the emergency forcing on of the tick not suffice for that? If I understand you are sending an IPI to set the same tick dependency if grace periods are being held too long. Another potential fix instead of setting tick dependency in hotplug, then, could be to reduce the time before which emergency IPIs are sent to free up nohz_full CPUs, I think.
> 
> Why not try modifying it to work this way and seeing what (if anything)
> breaks?  If nothing breaks, then there is still the option of digging
> into the code.

Yes happy to try :)

> 
> In both cases, after Lazy-RCU v7, of course.  ;-)

Sounds good and thanks for understanding ;)

Thanks,

 - Joel

> 
>                                Thanx, Paul
Paul E. McKenney Oct. 27, 2022, 5:46 p.m. UTC | #27
On Mon, Oct 10, 2022 at 09:55:26AM +0800, Pingfan Liu wrote:
> On Mon, Oct 3, 2022 at 12:20 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> [...]
> 
> > >
> > > But unfortunately, I did not keep the data. I will run it again and
> > > submit the data.
> >
> 
> I have finished the test on a machine with two sockets and 256 cpus.
> The test runs against the kernel with three commits reverted.
>   96926686deab ("rcu: Make CPU-hotplug removal operations enable tick")
>   53e87e3cdc15 ("timers/nohz: Last resort update jiffies on nohz_full
> IRQ entry")
>   a1ff03cd6fb9c5 ("tick: Detect and fix jiffies update stall")
> 
> Summary from console.log
> "
>  --- Sat Oct  8 11:34:02 AM EDT 2022 Test summary:
> Results directory:
> /home/linux/tools/testing/selftests/rcutorture/res/2022.10.07-23.10.54
> tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration
> 125h --bootargs rcutorture.onoff_interval=200
> rcutorture.onoff_holdoff=30 --configs 32*TREE04
> TREE04 ------- 1365444 GPs (3.03432/s) n_max_cbs: 850290
> TREE04 no success message, 2897 successful version messages
> Completed in 44512 vs. 450000
> TREE04.10 ------- 1331565 GPs (2.95903/s) n_max_cbs: 909075
> TREE04.10 no success message, 2897 successful version messages
> Completed in 44511 vs. 450000
> TREE04.11 ------- 1331535 GPs (2.95897/s) n_max_cbs: 1213974
> TREE04.11 no success message, 2897 successful version messages
> Completed in 44511 vs. 450000
> TREE04.12 ------- 1322160 GPs (2.93813/s) n_max_cbs: 2615313
> TREE04.12 no success message, 2897 successful version messages
> Completed in 44511 vs. 450000
> TREE04.13 ------- 1320032 GPs (2.9334/s) n_max_cbs: 914751
> TREE04.13 no success message, 2897 successful version messages
> Completed in 44511 vs. 450000
> TREE04.14 ------- 1339969 GPs (2.97771/s) n_max_cbs: 1560203
> TREE04.14 no success message, 2897 successful version messages
> Completed in 44511 vs. 450000
> TREE04.15 ------- 1318805 GPs (2.93068/s) n_max_cbs: 1757478
> TREE04.15 no success message, 2897 successful version messages
> Completed in 44510 vs. 450000
> TREE04.16 ------- 1340633 GPs (2.97918/s) n_max_cbs: 1377647
> TREE04.16 no success message, 2897 successful version messages
> Completed in 44510 vs. 450000
> TREE04.17 ------- 1322798 GPs (2.93955/s) n_max_cbs: 1266344
> TREE04.17 no success message, 2897 successful version messages
> Completed in 44511 vs. 450000
> TREE04.18 ------- 1346302 GPs (2.99178/s) n_max_cbs: 1030713
> TREE04.18 no success message, 2897 successful version messages
> Completed in 44511 vs. 450000
> TREE04.19 ------- 1322499 GPs (2.93889/s) n_max_cbs: 917118
> TREE04.19 no success message, 2897 successful version messages
> Completed in 44511 vs. 450000
> ...
> TREE04.4 ------- 1310283 GPs (2.91174/s) n_max_cbs: 2146905
> TREE04.4 no success message, 2897 successful version messages
> Completed in 44511 vs. 450000
> TREE04.5 ------- 1333238 GPs (2.96275/s) n_max_cbs: 1027172
> TREE04.5 no success message, 2897 successful version messages
> Completed in 44511 vs. 450000
> TREE04.6 ------- 1313915 GPs (2.91981/s) n_max_cbs: 1017511
> TREE04.6 no success message, 2897 successful version messages
> Completed in 44511 vs. 450000
> TREE04.7 ------- 1341871 GPs (2.98194/s) n_max_cbs: 816265
> TREE04.7 no success message, 2897 successful version messages
> Completed in 44511 vs. 450000
> TREE04.8 ------- 1339412 GPs (2.97647/s) n_max_cbs: 1316404
> TREE04.8 no success message, 2897 successful version messages
> Completed in 44511 vs. 450000
> TREE04.9 ------- 1327240 GPs (2.94942/s) n_max_cbs: 1409531
> TREE04.9 no success message, 2897 successful version messages
> Completed in 44510 vs. 450000
> 32 runs with runtime errors.
>  --- Done at Sat Oct  8 11:34:10 AM EDT 2022 (12:23:16) exitcode 2
> "
> I have no idea about the test so just arbitrarily pick up the
> console.log of TREE04.10 as an example. Please get it from attachment.

Very good, thank you!

Could you please clearly indicate what you tested?  For example, if
you have an externally visible git tree, please point me at the tree
and the SHA-1.  Or send a patch series clearly indicating what it is
based on.

Then I can try a long run on a larger collection of systems.

If that works out, we can see about adjustments to mainline.  ;-)

							Thanx, Paul
Paul E. McKenney Oct. 27, 2022, 6:13 p.m. UTC | #28
On Mon, Oct 10, 2022 at 09:42:07AM +0800, Pingfan Liu wrote:
> On Mon, Oct 3, 2022 at 12:20 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Sun, Oct 02, 2022 at 09:29:59PM +0800, Pingfan Liu wrote:
> [...]
> 
> > >
> > > Quote from TREE04.43 console.log
> > > "
> > > ...
> > > [162003.262292] rcu-torture: rtc: 0000000000000000 VER: 4371112 tfle:
> > > 0 rta: 4371112 rtaf: 0 rtf: 4371103 rtmbe: 0 rtmbkf: 0/1801158 rtbe: 0
> > > rtbke: 0 rtbre: 0 rtbf: 0 rtb: 0 nt: 206435148 onoff:
> > > 189317/189317:189321/189321 1,1877:1,3248 25561605:45469848 (HZ=1000)
> > > barrier: 769073/769073:0 read-exits: 185104 nocb-toggles: 0:0
> > > [162003.264714] rcu-torture: Reader Pipe:  338936620402 7452360 0 0 0
> > > 0 0 0 0 0 0
> > > [162003.265336] rcu-torture: Reader Batch:  338929088717 14984045 0 0
> > > 0 0 0 0 0 0 0
> > > [162003.265967] rcu-torture: Free-Block Circulation:  4371111 4371111
> > > 4371110 4371109 4371108 4371107 4371106 4371105 4371104 4371103 0
> > > [162003.266980] rcu-torture: rcu_torture_stats is stopping
> > > [162003.267450] rcu_torture_cleanup: Invoking rcu_barrier+0x0/0x3b0().
> > > [162003.278480] Stack Depot allocating hash table of 32768 entries with kvcalloc
> > > [162003.279947] mem_dump_obj() slab test: rcu_torture_stats =
> > > 0000000000000000, &rhp = ffffa0aa0035feb8, rhp = ffff965e01a58000, &z
> > > = ffffffffa676d9a0
> > > [162003.281079] mem_dump_obj(ZERO_SIZE_PTR): zero-size pointer
> > > [162003.281563] mem_dump_obj(NULL): NULL pointer
> > > [162003.281935] mem_dump_obj(ffffa0aa0035feb8): 4-page vmalloc region
> > > starting at 0xffffa0aa0035c000 allocated at kernel_clone+0x92/0x3b0
> > > [162003.283044] mem_dump_obj(ffff965e01a58000): slab rcuscale start
> > > ffff965e01a58000 pointer offset 0 allocated at
> > > rcu_torture_cleanup+0x432/0x4e0
> > > [162003.284138]     kmem_cache_alloc+0x19d/0x1b0
> > > [162003.284522]     rcu_torture_cleanup+0x432/0x4e0
> > > [162003.284918]     torture_shutdown+0xce/0x1c0
> > > [162003.285285]     kthread+0xb7/0xe0
> > > [162003.285587]     ret_from_fork+0x22/0x30
> > > [162003.285926] mem_dump_obj(ffff965e01a58008): slab rcuscale start
> > > ffff965e01a58000 pointer offset 8 allocated at
> > > rcu_torture_cleanup+0x432/0x4e0
> > > [162003.287045]     kmem_cache_alloc+0x19d/0x1b0
> > > [162003.287436]     rcu_torture_cleanup+0x432/0x4e0
> > > [162003.287835]     torture_shutdown+0xce/0x1c0
> > > [162003.288202]     kthread+0xb7/0xe0
> > > [162003.288510]     ret_from_fork+0x22/0x30
> > > [162003.288848] mem_dump_obj(ffffffffa676d9a0): non-slab/vmalloc memory
> > > [162003.289426] mem_dump_obj() kmalloc test: rcu_torture_stats =
> > > 0000000000000000, &rhp = ffffa0aa0035feb8, rhp = ffff965e144447e0
> > > [162003.290421] mem_dump_obj(kmalloc ffff965e144447e0): slab
> > > kmalloc-16 start ffff965e144447e0 pointer offset 0 size 16
> > > [162003.291316] mem_dump_obj(kmalloc ffff965e144447e8): slab
> > > kmalloc-16 start ffff965e144447e0 pointer offset 8 size 16
> > > [162003.292211] mem_dump_obj() vmalloc test: rcu_torture_stats =
> > > 0000000000000000, &rhp = ffffa0aa0035feb8, rhp = ffffa0aa0002d000
> > > [162003.293186] mem_dump_obj(vmalloc ffffa0aa0002d000): 1-page vmalloc
> > > region starting at 0xffffa0aa0002d000 allocated at
> > > rcu_torture_cleanup.cold+0x287/0x3ef
> > > [162003.294372] mem_dump_obj(vmalloc ffffa0aa0002d008): 1-page vmalloc
> > > region starting at 0xffffa0aa0002d000 allocated at
> > > rcu_torture_cleanup.cold+0x287/0x3ef
> > > [162003.295548] rcu-torture: rtc: 0000000000000000 VER: 4371112 tfle:
> > > 0 rta: 4371112 rtaf: 0 rtf: 4371103 rtmbe: 0 rtmbkf: 0/1801158 rtbe: 0
> > > rtbke: 0 rtbre: 0 rtbf: 0 rtb: 0 nt: 206435148 onoff:
> > > 189317/189317:189321/189321 1,1877:1,3248 25561605:45469848 (HZ=1000)
> > > barrier: 769073/769073:0 read-exits: 185104 nocb-toggles: 0:0
> > > [162003.297956] rcu-torture: Reader Pipe:  338936620402 7452360 0 0 0
> > > 0 0 0 0 0 0
> > > [162003.298578] rcu-torture: Reader Batch:  338929088717 14984045 0 0
> > > 0 0 0 0 0 0 0
> > > [162003.299208] rcu-torture: Free-Block Circulation:  4371111 4371111
> > > 4371110 4371109 4371108 4371107 4371106 4371105 4371104 4371103 0
> > > [162003.300220] rcu-torture:--- End of test: SUCCESS: nreaders=7
> > > nfakewriters=4 stat_interval=15 verbose=1 test_no_idle_hz=1
> > > shuffle_interval=3 stutter=5 irqreader=1 fqs_duration=0 fqs_holdoff=0
> > > fqs_stutter=3 test_boost=1/0 test_boost_interval=7
> > > test_boost_duration=4 shutdown_secs=162000 stall_cpu=0
> > > stall_cpu_holdoff=10 stall_cpu_irqsoff=0 stall_cpu_block=0
> > > n_barrier_cbs=4 onoff_interval=200 onoff_holdoff=30 read_exit_delay=13
> > > read_exit_burst=16 nocbs_nthreads=0 nocbs_toggle=1000
> > > [162003.304162] i2c i2c-0: shutdown
> > > [162003.304461] alarmtimer alarmtimer.0.auto: shutdown
> > > [162003.304878] psmouse serio1: shutdown
> > > ...
> > > "
> >
> > Huh.  Shutdown writer stall on the first one, but I am not seeing the
> > but in the second one.  Was the "Call Trace" located somewhere else in
> > the console.log file?
> >
> > I have not seen the shutdown writer stall lately, but such stalls have
> > flagged many sorts of bugs for me to fix in the past.
> >
> >
> > So maybe there is yet another bug that my testing is for whatever reason
> > not seeing.
> 
> Sorry to reply late, I just came back from a long holiday.

Given my delay, I guess that I cannot complain...

> The attachment is the complete log of TREE04.43.

This happens because rcutorture hit OOM while doing RCU callback flood
testing.  This -might- be related to the changes, for example if timers
slowed down, thus slowing down RCU grace periods, thus allowing the
kthread doing the flooding exhaust memory.

Grace periods are progressing in the histogram:

[12077.630343] rcu_torture_fwd_cb_hist: Callback-invocation histogram 0 (duration 1321 jiffies): 1s/10: 75043:17 2s/10: 137085:35 3s/10: 143871:14 4s/10: 66038:26 5s/10: 51754:18 6s/10: 67000:16 7s/10: 54155:17 8s/10: 56749:23 9s/10: 120850:8 10s/10: 87789:17 11s/10: 111123:32 12s/10: 136322:50 13s/10: 108813:33 14s/10: 138489:1
[12077.632824] rcu: rcu_fwd_progress_check: GP age 17 jiffies

And there is this:

[12077.644478] rcu: RCU callbacks invoked since boot: 102942584
[12077.644974] rcu: rcu_fwd_progress_check: callbacks 0: 45 4: 7973717
[12077.695770] rcutorture_oom_notify: Freed 592481 RCU callbacks.

So another question is "how much memory was available to rcutorture?"

Less than 512MB could explain this.

(I do see this from time to time, but on single-CPU TREE09 runs.)

							Thanx, Paul
Pingfan Liu Oct. 31, 2022, 2:10 a.m. UTC | #29
On Thu, Oct 27, 2022 at 11:13:46AM -0700, Paul E. McKenney wrote:
> On Mon, Oct 10, 2022 at 09:42:07AM +0800, Pingfan Liu wrote:
> > On Mon, Oct 3, 2022 at 12:20 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > >
> > > On Sun, Oct 02, 2022 at 09:29:59PM +0800, Pingfan Liu wrote:
> > [...]
> > 
> > > >
> > > > Quote from TREE04.43 console.log
> > > > "
> > > > ...
> > > > [162003.262292] rcu-torture: rtc: 0000000000000000 VER: 4371112 tfle:
> > > > 0 rta: 4371112 rtaf: 0 rtf: 4371103 rtmbe: 0 rtmbkf: 0/1801158 rtbe: 0
> > > > rtbke: 0 rtbre: 0 rtbf: 0 rtb: 0 nt: 206435148 onoff:
> > > > 189317/189317:189321/189321 1,1877:1,3248 25561605:45469848 (HZ=1000)
> > > > barrier: 769073/769073:0 read-exits: 185104 nocb-toggles: 0:0
> > > > [162003.264714] rcu-torture: Reader Pipe:  338936620402 7452360 0 0 0
> > > > 0 0 0 0 0 0
> > > > [162003.265336] rcu-torture: Reader Batch:  338929088717 14984045 0 0
> > > > 0 0 0 0 0 0 0
> > > > [162003.265967] rcu-torture: Free-Block Circulation:  4371111 4371111
> > > > 4371110 4371109 4371108 4371107 4371106 4371105 4371104 4371103 0
> > > > [162003.266980] rcu-torture: rcu_torture_stats is stopping
> > > > [162003.267450] rcu_torture_cleanup: Invoking rcu_barrier+0x0/0x3b0().
> > > > [162003.278480] Stack Depot allocating hash table of 32768 entries with kvcalloc
> > > > [162003.279947] mem_dump_obj() slab test: rcu_torture_stats =
> > > > 0000000000000000, &rhp = ffffa0aa0035feb8, rhp = ffff965e01a58000, &z
> > > > = ffffffffa676d9a0
> > > > [162003.281079] mem_dump_obj(ZERO_SIZE_PTR): zero-size pointer
> > > > [162003.281563] mem_dump_obj(NULL): NULL pointer
> > > > [162003.281935] mem_dump_obj(ffffa0aa0035feb8): 4-page vmalloc region
> > > > starting at 0xffffa0aa0035c000 allocated at kernel_clone+0x92/0x3b0
> > > > [162003.283044] mem_dump_obj(ffff965e01a58000): slab rcuscale start
> > > > ffff965e01a58000 pointer offset 0 allocated at
> > > > rcu_torture_cleanup+0x432/0x4e0
> > > > [162003.284138]     kmem_cache_alloc+0x19d/0x1b0
> > > > [162003.284522]     rcu_torture_cleanup+0x432/0x4e0
> > > > [162003.284918]     torture_shutdown+0xce/0x1c0
> > > > [162003.285285]     kthread+0xb7/0xe0
> > > > [162003.285587]     ret_from_fork+0x22/0x30
> > > > [162003.285926] mem_dump_obj(ffff965e01a58008): slab rcuscale start
> > > > ffff965e01a58000 pointer offset 8 allocated at
> > > > rcu_torture_cleanup+0x432/0x4e0
> > > > [162003.287045]     kmem_cache_alloc+0x19d/0x1b0
> > > > [162003.287436]     rcu_torture_cleanup+0x432/0x4e0
> > > > [162003.287835]     torture_shutdown+0xce/0x1c0
> > > > [162003.288202]     kthread+0xb7/0xe0
> > > > [162003.288510]     ret_from_fork+0x22/0x30
> > > > [162003.288848] mem_dump_obj(ffffffffa676d9a0): non-slab/vmalloc memory
> > > > [162003.289426] mem_dump_obj() kmalloc test: rcu_torture_stats =
> > > > 0000000000000000, &rhp = ffffa0aa0035feb8, rhp = ffff965e144447e0
> > > > [162003.290421] mem_dump_obj(kmalloc ffff965e144447e0): slab
> > > > kmalloc-16 start ffff965e144447e0 pointer offset 0 size 16
> > > > [162003.291316] mem_dump_obj(kmalloc ffff965e144447e8): slab
> > > > kmalloc-16 start ffff965e144447e0 pointer offset 8 size 16
> > > > [162003.292211] mem_dump_obj() vmalloc test: rcu_torture_stats =
> > > > 0000000000000000, &rhp = ffffa0aa0035feb8, rhp = ffffa0aa0002d000
> > > > [162003.293186] mem_dump_obj(vmalloc ffffa0aa0002d000): 1-page vmalloc
> > > > region starting at 0xffffa0aa0002d000 allocated at
> > > > rcu_torture_cleanup.cold+0x287/0x3ef
> > > > [162003.294372] mem_dump_obj(vmalloc ffffa0aa0002d008): 1-page vmalloc
> > > > region starting at 0xffffa0aa0002d000 allocated at
> > > > rcu_torture_cleanup.cold+0x287/0x3ef
> > > > [162003.295548] rcu-torture: rtc: 0000000000000000 VER: 4371112 tfle:
> > > > 0 rta: 4371112 rtaf: 0 rtf: 4371103 rtmbe: 0 rtmbkf: 0/1801158 rtbe: 0
> > > > rtbke: 0 rtbre: 0 rtbf: 0 rtb: 0 nt: 206435148 onoff:
> > > > 189317/189317:189321/189321 1,1877:1,3248 25561605:45469848 (HZ=1000)
> > > > barrier: 769073/769073:0 read-exits: 185104 nocb-toggles: 0:0
> > > > [162003.297956] rcu-torture: Reader Pipe:  338936620402 7452360 0 0 0
> > > > 0 0 0 0 0 0
> > > > [162003.298578] rcu-torture: Reader Batch:  338929088717 14984045 0 0
> > > > 0 0 0 0 0 0 0
> > > > [162003.299208] rcu-torture: Free-Block Circulation:  4371111 4371111
> > > > 4371110 4371109 4371108 4371107 4371106 4371105 4371104 4371103 0
> > > > [162003.300220] rcu-torture:--- End of test: SUCCESS: nreaders=7
> > > > nfakewriters=4 stat_interval=15 verbose=1 test_no_idle_hz=1
> > > > shuffle_interval=3 stutter=5 irqreader=1 fqs_duration=0 fqs_holdoff=0
> > > > fqs_stutter=3 test_boost=1/0 test_boost_interval=7
> > > > test_boost_duration=4 shutdown_secs=162000 stall_cpu=0
> > > > stall_cpu_holdoff=10 stall_cpu_irqsoff=0 stall_cpu_block=0
> > > > n_barrier_cbs=4 onoff_interval=200 onoff_holdoff=30 read_exit_delay=13
> > > > read_exit_burst=16 nocbs_nthreads=0 nocbs_toggle=1000
> > > > [162003.304162] i2c i2c-0: shutdown
> > > > [162003.304461] alarmtimer alarmtimer.0.auto: shutdown
> > > > [162003.304878] psmouse serio1: shutdown
> > > > ...
> > > > "
> > >
> > > Huh.  Shutdown writer stall on the first one, but I am not seeing the
> > > but in the second one.  Was the "Call Trace" located somewhere else in
> > > the console.log file?
> > >
> > > I have not seen the shutdown writer stall lately, but such stalls have
> > > flagged many sorts of bugs for me to fix in the past.
> > >
> > >
> > > So maybe there is yet another bug that my testing is for whatever reason
> > > not seeing.
> > 
> > Sorry to reply late, I just came back from a long holiday.
> 
> Given my delay, I guess that I cannot complain...
> 

Thanks for insight to the log. I am a little astray in the whole torture
test. I will pay more time to study it.

> > The attachment is the complete log of TREE04.43.
> 
> This happens because rcutorture hit OOM while doing RCU callback flood
> testing.  This -might- be related to the changes, for example if timers
> slowed down, thus slowing down RCU grace periods, thus allowing the
> kthread doing the flooding exhaust memory.
> 
> Grace periods are progressing in the histogram:
> 
> [12077.630343] rcu_torture_fwd_cb_hist: Callback-invocation histogram 0 (duration 1321 jiffies): 1s/10: 75043:17 2s/10: 137085:35 3s/10: 143871:14 4s/10: 66038:26 5s/10: 51754:18 6s/10: 67000:16 7s/10: 54155:17 8s/10: 56749:23 9s/10: 120850:8 10s/10: 87789:17 11s/10: 111123:32 12s/10: 136322:50 13s/10: 108813:33 14s/10: 138489:1
> [12077.632824] rcu: rcu_fwd_progress_check: GP age 17 jiffies
> 
> And there is this:
> 
> [12077.644478] rcu: RCU callbacks invoked since boot: 102942584
> [12077.644974] rcu: rcu_fwd_progress_check: callbacks 0: 45 4: 7973717
> [12077.695770] rcutorture_oom_notify: Freed 592481 RCU callbacks.
> 
> So another question is "how much memory was available to rcutorture?"
> 
> Less than 512MB could explain this.
> 

Each machine has 262144 MB RAM, and runs 32 TREE04 instance. So each
instance has nearly 8192 MB RAM. (I do not run any other load)


Thanks,

	Pingfan

> (I do see this from time to time, but on single-CPU TREE09 runs.)
> 
> 							Thanx, Paul
Pingfan Liu Oct. 31, 2022, 3:24 a.m. UTC | #30
On Fri, Oct 28, 2022 at 1:46 AM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Mon, Oct 10, 2022 at 09:55:26AM +0800, Pingfan Liu wrote:
> > On Mon, Oct 3, 2022 at 12:20 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > >
> > [...]
> >
> > > >
> > > > But unfortunately, I did not keep the data. I will run it again and
> > > > submit the data.
> > >
> >
> > I have finished the test on a machine with two sockets and 256 cpus.
> > The test runs against the kernel with three commits reverted.
> >   96926686deab ("rcu: Make CPU-hotplug removal operations enable tick")
> >   53e87e3cdc15 ("timers/nohz: Last resort update jiffies on nohz_full
> > IRQ entry")
> >   a1ff03cd6fb9c5 ("tick: Detect and fix jiffies update stall")
> >
> > Summary from console.log
> > "
> >  --- Sat Oct  8 11:34:02 AM EDT 2022 Test summary:
> > Results directory:
> > /home/linux/tools/testing/selftests/rcutorture/res/2022.10.07-23.10.54
> > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration
> > 125h --bootargs rcutorture.onoff_interval=200
> > rcutorture.onoff_holdoff=30 --configs 32*TREE04
> > TREE04 ------- 1365444 GPs (3.03432/s) n_max_cbs: 850290
> > TREE04 no success message, 2897 successful version messages
> > Completed in 44512 vs. 450000
> > TREE04.10 ------- 1331565 GPs (2.95903/s) n_max_cbs: 909075
> > TREE04.10 no success message, 2897 successful version messages
> > Completed in 44511 vs. 450000
> > TREE04.11 ------- 1331535 GPs (2.95897/s) n_max_cbs: 1213974
> > TREE04.11 no success message, 2897 successful version messages
> > Completed in 44511 vs. 450000
> > TREE04.12 ------- 1322160 GPs (2.93813/s) n_max_cbs: 2615313
> > TREE04.12 no success message, 2897 successful version messages
> > Completed in 44511 vs. 450000
> > TREE04.13 ------- 1320032 GPs (2.9334/s) n_max_cbs: 914751
> > TREE04.13 no success message, 2897 successful version messages
> > Completed in 44511 vs. 450000
> > TREE04.14 ------- 1339969 GPs (2.97771/s) n_max_cbs: 1560203
> > TREE04.14 no success message, 2897 successful version messages
> > Completed in 44511 vs. 450000
> > TREE04.15 ------- 1318805 GPs (2.93068/s) n_max_cbs: 1757478
> > TREE04.15 no success message, 2897 successful version messages
> > Completed in 44510 vs. 450000
> > TREE04.16 ------- 1340633 GPs (2.97918/s) n_max_cbs: 1377647
> > TREE04.16 no success message, 2897 successful version messages
> > Completed in 44510 vs. 450000
> > TREE04.17 ------- 1322798 GPs (2.93955/s) n_max_cbs: 1266344
> > TREE04.17 no success message, 2897 successful version messages
> > Completed in 44511 vs. 450000
> > TREE04.18 ------- 1346302 GPs (2.99178/s) n_max_cbs: 1030713
> > TREE04.18 no success message, 2897 successful version messages
> > Completed in 44511 vs. 450000
> > TREE04.19 ------- 1322499 GPs (2.93889/s) n_max_cbs: 917118
> > TREE04.19 no success message, 2897 successful version messages
> > Completed in 44511 vs. 450000
> > ...
> > TREE04.4 ------- 1310283 GPs (2.91174/s) n_max_cbs: 2146905
> > TREE04.4 no success message, 2897 successful version messages
> > Completed in 44511 vs. 450000
> > TREE04.5 ------- 1333238 GPs (2.96275/s) n_max_cbs: 1027172
> > TREE04.5 no success message, 2897 successful version messages
> > Completed in 44511 vs. 450000
> > TREE04.6 ------- 1313915 GPs (2.91981/s) n_max_cbs: 1017511
> > TREE04.6 no success message, 2897 successful version messages
> > Completed in 44511 vs. 450000
> > TREE04.7 ------- 1341871 GPs (2.98194/s) n_max_cbs: 816265
> > TREE04.7 no success message, 2897 successful version messages
> > Completed in 44511 vs. 450000
> > TREE04.8 ------- 1339412 GPs (2.97647/s) n_max_cbs: 1316404
> > TREE04.8 no success message, 2897 successful version messages
> > Completed in 44511 vs. 450000
> > TREE04.9 ------- 1327240 GPs (2.94942/s) n_max_cbs: 1409531
> > TREE04.9 no success message, 2897 successful version messages
> > Completed in 44510 vs. 450000
> > 32 runs with runtime errors.
> >  --- Done at Sat Oct  8 11:34:10 AM EDT 2022 (12:23:16) exitcode 2
> > "
> > I have no idea about the test so just arbitrarily pick up the
> > console.log of TREE04.10 as an example. Please get it from attachment.
>
> Very good, thank you!
>
> Could you please clearly indicate what you tested?  For example, if
> you have an externally visible git tree, please point me at the tree
> and the SHA-1.  Or send a patch series clearly indicating what it is
> based on.
>

Yes, it is a good way to eliminate any unexpected mistakes before a rigid test.

Please clone it from https://github.com/pfliu/linux.git  branch:
rcu#revert_tick_dep


> Then I can try a long run on a larger collection of systems.
>

Thank you very much.

> If that works out, we can see about adjustments to mainline.  ;-)
>

Eager to see.

Thanks,

    Pingfan
Paul E. McKenney Nov. 3, 2022, 4:51 p.m. UTC | #31
On Mon, Oct 31, 2022 at 11:24:37AM +0800, Pingfan Liu wrote:
> On Fri, Oct 28, 2022 at 1:46 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Mon, Oct 10, 2022 at 09:55:26AM +0800, Pingfan Liu wrote:
> > > On Mon, Oct 3, 2022 at 12:20 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > >
> > > [...]
> > >
> > > > >
> > > > > But unfortunately, I did not keep the data. I will run it again and
> > > > > submit the data.
> > > >
> > >
> > > I have finished the test on a machine with two sockets and 256 cpus.
> > > The test runs against the kernel with three commits reverted.
> > >   96926686deab ("rcu: Make CPU-hotplug removal operations enable tick")
> > >   53e87e3cdc15 ("timers/nohz: Last resort update jiffies on nohz_full
> > > IRQ entry")
> > >   a1ff03cd6fb9c5 ("tick: Detect and fix jiffies update stall")
> > >
> > > Summary from console.log
> > > "
> > >  --- Sat Oct  8 11:34:02 AM EDT 2022 Test summary:
> > > Results directory:
> > > /home/linux/tools/testing/selftests/rcutorture/res/2022.10.07-23.10.54
> > > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration
> > > 125h --bootargs rcutorture.onoff_interval=200
> > > rcutorture.onoff_holdoff=30 --configs 32*TREE04
> > > TREE04 ------- 1365444 GPs (3.03432/s) n_max_cbs: 850290
> > > TREE04 no success message, 2897 successful version messages
> > > Completed in 44512 vs. 450000
> > > TREE04.10 ------- 1331565 GPs (2.95903/s) n_max_cbs: 909075
> > > TREE04.10 no success message, 2897 successful version messages
> > > Completed in 44511 vs. 450000
> > > TREE04.11 ------- 1331535 GPs (2.95897/s) n_max_cbs: 1213974
> > > TREE04.11 no success message, 2897 successful version messages
> > > Completed in 44511 vs. 450000
> > > TREE04.12 ------- 1322160 GPs (2.93813/s) n_max_cbs: 2615313
> > > TREE04.12 no success message, 2897 successful version messages
> > > Completed in 44511 vs. 450000
> > > TREE04.13 ------- 1320032 GPs (2.9334/s) n_max_cbs: 914751
> > > TREE04.13 no success message, 2897 successful version messages
> > > Completed in 44511 vs. 450000
> > > TREE04.14 ------- 1339969 GPs (2.97771/s) n_max_cbs: 1560203
> > > TREE04.14 no success message, 2897 successful version messages
> > > Completed in 44511 vs. 450000
> > > TREE04.15 ------- 1318805 GPs (2.93068/s) n_max_cbs: 1757478
> > > TREE04.15 no success message, 2897 successful version messages
> > > Completed in 44510 vs. 450000
> > > TREE04.16 ------- 1340633 GPs (2.97918/s) n_max_cbs: 1377647
> > > TREE04.16 no success message, 2897 successful version messages
> > > Completed in 44510 vs. 450000
> > > TREE04.17 ------- 1322798 GPs (2.93955/s) n_max_cbs: 1266344
> > > TREE04.17 no success message, 2897 successful version messages
> > > Completed in 44511 vs. 450000
> > > TREE04.18 ------- 1346302 GPs (2.99178/s) n_max_cbs: 1030713
> > > TREE04.18 no success message, 2897 successful version messages
> > > Completed in 44511 vs. 450000
> > > TREE04.19 ------- 1322499 GPs (2.93889/s) n_max_cbs: 917118
> > > TREE04.19 no success message, 2897 successful version messages
> > > Completed in 44511 vs. 450000
> > > ...
> > > TREE04.4 ------- 1310283 GPs (2.91174/s) n_max_cbs: 2146905
> > > TREE04.4 no success message, 2897 successful version messages
> > > Completed in 44511 vs. 450000
> > > TREE04.5 ------- 1333238 GPs (2.96275/s) n_max_cbs: 1027172
> > > TREE04.5 no success message, 2897 successful version messages
> > > Completed in 44511 vs. 450000
> > > TREE04.6 ------- 1313915 GPs (2.91981/s) n_max_cbs: 1017511
> > > TREE04.6 no success message, 2897 successful version messages
> > > Completed in 44511 vs. 450000
> > > TREE04.7 ------- 1341871 GPs (2.98194/s) n_max_cbs: 816265
> > > TREE04.7 no success message, 2897 successful version messages
> > > Completed in 44511 vs. 450000
> > > TREE04.8 ------- 1339412 GPs (2.97647/s) n_max_cbs: 1316404
> > > TREE04.8 no success message, 2897 successful version messages
> > > Completed in 44511 vs. 450000
> > > TREE04.9 ------- 1327240 GPs (2.94942/s) n_max_cbs: 1409531
> > > TREE04.9 no success message, 2897 successful version messages
> > > Completed in 44510 vs. 450000
> > > 32 runs with runtime errors.
> > >  --- Done at Sat Oct  8 11:34:10 AM EDT 2022 (12:23:16) exitcode 2
> > > "
> > > I have no idea about the test so just arbitrarily pick up the
> > > console.log of TREE04.10 as an example. Please get it from attachment.
> >
> > Very good, thank you!
> >
> > Could you please clearly indicate what you tested?  For example, if
> > you have an externally visible git tree, please point me at the tree
> > and the SHA-1.  Or send a patch series clearly indicating what it is
> > based on.
> >
> 
> Yes, it is a good way to eliminate any unexpected mistakes before a rigid test.
> 
> Please clone it from https://github.com/pfliu/linux.git  branch:
> rcu#revert_tick_dep

Thank you very much!

> > Then I can try a long run on a larger collection of systems.
> >
> 
> Thank you very much.
> 
> > If that works out, we can see about adjustments to mainline.  ;-)
> >
> 
> Eager to see.

I ran 200 hours of TREE04 and got an RCU CPU stall warning.  I ran 2000
hours on v6.0, which precedes these commits, and everything passed.

I will run more, primarily on v6.0, but that is what I have thus far.
At the moment, I have some concerns about this change.

							Thanx, Paul
Paul E. McKenney Nov. 7, 2022, 4:07 p.m. UTC | #32
On Thu, Nov 03, 2022 at 09:51:43AM -0700, Paul E. McKenney wrote:
> On Mon, Oct 31, 2022 at 11:24:37AM +0800, Pingfan Liu wrote:
> > On Fri, Oct 28, 2022 at 1:46 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > >
> > > On Mon, Oct 10, 2022 at 09:55:26AM +0800, Pingfan Liu wrote:
> > > > On Mon, Oct 3, 2022 at 12:20 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > >
> > > > [...]
> > > >
> > > > > >
> > > > > > But unfortunately, I did not keep the data. I will run it again and
> > > > > > submit the data.
> > > > >
> > > >
> > > > I have finished the test on a machine with two sockets and 256 cpus.
> > > > The test runs against the kernel with three commits reverted.
> > > >   96926686deab ("rcu: Make CPU-hotplug removal operations enable tick")
> > > >   53e87e3cdc15 ("timers/nohz: Last resort update jiffies on nohz_full
> > > > IRQ entry")
> > > >   a1ff03cd6fb9c5 ("tick: Detect and fix jiffies update stall")
> > > >
> > > > Summary from console.log
> > > > "
> > > >  --- Sat Oct  8 11:34:02 AM EDT 2022 Test summary:
> > > > Results directory:
> > > > /home/linux/tools/testing/selftests/rcutorture/res/2022.10.07-23.10.54
> > > > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration
> > > > 125h --bootargs rcutorture.onoff_interval=200
> > > > rcutorture.onoff_holdoff=30 --configs 32*TREE04
> > > > TREE04 ------- 1365444 GPs (3.03432/s) n_max_cbs: 850290
> > > > TREE04 no success message, 2897 successful version messages
> > > > Completed in 44512 vs. 450000
> > > > TREE04.10 ------- 1331565 GPs (2.95903/s) n_max_cbs: 909075
> > > > TREE04.10 no success message, 2897 successful version messages
> > > > Completed in 44511 vs. 450000
> > > > TREE04.11 ------- 1331535 GPs (2.95897/s) n_max_cbs: 1213974
> > > > TREE04.11 no success message, 2897 successful version messages
> > > > Completed in 44511 vs. 450000
> > > > TREE04.12 ------- 1322160 GPs (2.93813/s) n_max_cbs: 2615313
> > > > TREE04.12 no success message, 2897 successful version messages
> > > > Completed in 44511 vs. 450000
> > > > TREE04.13 ------- 1320032 GPs (2.9334/s) n_max_cbs: 914751
> > > > TREE04.13 no success message, 2897 successful version messages
> > > > Completed in 44511 vs. 450000
> > > > TREE04.14 ------- 1339969 GPs (2.97771/s) n_max_cbs: 1560203
> > > > TREE04.14 no success message, 2897 successful version messages
> > > > Completed in 44511 vs. 450000
> > > > TREE04.15 ------- 1318805 GPs (2.93068/s) n_max_cbs: 1757478
> > > > TREE04.15 no success message, 2897 successful version messages
> > > > Completed in 44510 vs. 450000
> > > > TREE04.16 ------- 1340633 GPs (2.97918/s) n_max_cbs: 1377647
> > > > TREE04.16 no success message, 2897 successful version messages
> > > > Completed in 44510 vs. 450000
> > > > TREE04.17 ------- 1322798 GPs (2.93955/s) n_max_cbs: 1266344
> > > > TREE04.17 no success message, 2897 successful version messages
> > > > Completed in 44511 vs. 450000
> > > > TREE04.18 ------- 1346302 GPs (2.99178/s) n_max_cbs: 1030713
> > > > TREE04.18 no success message, 2897 successful version messages
> > > > Completed in 44511 vs. 450000
> > > > TREE04.19 ------- 1322499 GPs (2.93889/s) n_max_cbs: 917118
> > > > TREE04.19 no success message, 2897 successful version messages
> > > > Completed in 44511 vs. 450000
> > > > ...
> > > > TREE04.4 ------- 1310283 GPs (2.91174/s) n_max_cbs: 2146905
> > > > TREE04.4 no success message, 2897 successful version messages
> > > > Completed in 44511 vs. 450000
> > > > TREE04.5 ------- 1333238 GPs (2.96275/s) n_max_cbs: 1027172
> > > > TREE04.5 no success message, 2897 successful version messages
> > > > Completed in 44511 vs. 450000
> > > > TREE04.6 ------- 1313915 GPs (2.91981/s) n_max_cbs: 1017511
> > > > TREE04.6 no success message, 2897 successful version messages
> > > > Completed in 44511 vs. 450000
> > > > TREE04.7 ------- 1341871 GPs (2.98194/s) n_max_cbs: 816265
> > > > TREE04.7 no success message, 2897 successful version messages
> > > > Completed in 44511 vs. 450000
> > > > TREE04.8 ------- 1339412 GPs (2.97647/s) n_max_cbs: 1316404
> > > > TREE04.8 no success message, 2897 successful version messages
> > > > Completed in 44511 vs. 450000
> > > > TREE04.9 ------- 1327240 GPs (2.94942/s) n_max_cbs: 1409531
> > > > TREE04.9 no success message, 2897 successful version messages
> > > > Completed in 44510 vs. 450000
> > > > 32 runs with runtime errors.
> > > >  --- Done at Sat Oct  8 11:34:10 AM EDT 2022 (12:23:16) exitcode 2
> > > > "
> > > > I have no idea about the test so just arbitrarily pick up the
> > > > console.log of TREE04.10 as an example. Please get it from attachment.
> > >
> > > Very good, thank you!
> > >
> > > Could you please clearly indicate what you tested?  For example, if
> > > you have an externally visible git tree, please point me at the tree
> > > and the SHA-1.  Or send a patch series clearly indicating what it is
> > > based on.
> > >
> > 
> > Yes, it is a good way to eliminate any unexpected mistakes before a rigid test.
> > 
> > Please clone it from https://github.com/pfliu/linux.git  branch:
> > rcu#revert_tick_dep
> 
> Thank you very much!
> 
> > > Then I can try a long run on a larger collection of systems.
> > >
> > 
> > Thank you very much.
> > 
> > > If that works out, we can see about adjustments to mainline.  ;-)
> > >
> > 
> > Eager to see.
> 
> I ran 200 hours of TREE04 and got an RCU CPU stall warning.  I ran 2000
> hours on v6.0, which precedes these commits, and everything passed.
> 
> I will run more, primarily on v6.0, but that is what I have thus far.
> At the moment, I have some concerns about this change.

OK, so I have run a total of 8000 hours on v6.0 without failure.  I have
run 4200 hours on rcu#revert_tick_dep with 15 failures.  The ones I
looked at were RCU CPU stall warnings with timer failures.

This data suggests that the kernel is not yet ready for that commit
to be reverted.

							Thanx, Paul
Joel Fernandes Nov. 9, 2022, 6:55 p.m. UTC | #33
On Mon, Nov 07, 2022 at 08:07:26AM -0800, Paul E. McKenney wrote:
> On Thu, Nov 03, 2022 at 09:51:43AM -0700, Paul E. McKenney wrote:
> > On Mon, Oct 31, 2022 at 11:24:37AM +0800, Pingfan Liu wrote:
> > > On Fri, Oct 28, 2022 at 1:46 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > >
> > > > On Mon, Oct 10, 2022 at 09:55:26AM +0800, Pingfan Liu wrote:
> > > > > On Mon, Oct 3, 2022 at 12:20 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > > >
> > > > > [...]
> > > > >
> > > > > > >
> > > > > > > But unfortunately, I did not keep the data. I will run it again and
> > > > > > > submit the data.
> > > > > >
> > > > >
> > > > > I have finished the test on a machine with two sockets and 256 cpus.
> > > > > The test runs against the kernel with three commits reverted.
> > > > >   96926686deab ("rcu: Make CPU-hotplug removal operations enable tick")
> > > > >   53e87e3cdc15 ("timers/nohz: Last resort update jiffies on nohz_full
> > > > > IRQ entry")
> > > > >   a1ff03cd6fb9c5 ("tick: Detect and fix jiffies update stall")
> > > > >
> > > > > Summary from console.log
> > > > > "
> > > > >  --- Sat Oct  8 11:34:02 AM EDT 2022 Test summary:
> > > > > Results directory:
> > > > > /home/linux/tools/testing/selftests/rcutorture/res/2022.10.07-23.10.54
> > > > > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration
> > > > > 125h --bootargs rcutorture.onoff_interval=200
> > > > > rcutorture.onoff_holdoff=30 --configs 32*TREE04
> > > > > TREE04 ------- 1365444 GPs (3.03432/s) n_max_cbs: 850290
> > > > > TREE04 no success message, 2897 successful version messages
> > > > > Completed in 44512 vs. 450000
> > > > > TREE04.10 ------- 1331565 GPs (2.95903/s) n_max_cbs: 909075
> > > > > TREE04.10 no success message, 2897 successful version messages
> > > > > Completed in 44511 vs. 450000
> > > > > TREE04.11 ------- 1331535 GPs (2.95897/s) n_max_cbs: 1213974
> > > > > TREE04.11 no success message, 2897 successful version messages
> > > > > Completed in 44511 vs. 450000
> > > > > TREE04.12 ------- 1322160 GPs (2.93813/s) n_max_cbs: 2615313
> > > > > TREE04.12 no success message, 2897 successful version messages
> > > > > Completed in 44511 vs. 450000
> > > > > TREE04.13 ------- 1320032 GPs (2.9334/s) n_max_cbs: 914751
> > > > > TREE04.13 no success message, 2897 successful version messages
> > > > > Completed in 44511 vs. 450000
> > > > > TREE04.14 ------- 1339969 GPs (2.97771/s) n_max_cbs: 1560203
> > > > > TREE04.14 no success message, 2897 successful version messages
> > > > > Completed in 44511 vs. 450000
> > > > > TREE04.15 ------- 1318805 GPs (2.93068/s) n_max_cbs: 1757478
> > > > > TREE04.15 no success message, 2897 successful version messages
> > > > > Completed in 44510 vs. 450000
> > > > > TREE04.16 ------- 1340633 GPs (2.97918/s) n_max_cbs: 1377647
> > > > > TREE04.16 no success message, 2897 successful version messages
> > > > > Completed in 44510 vs. 450000
> > > > > TREE04.17 ------- 1322798 GPs (2.93955/s) n_max_cbs: 1266344
> > > > > TREE04.17 no success message, 2897 successful version messages
> > > > > Completed in 44511 vs. 450000
> > > > > TREE04.18 ------- 1346302 GPs (2.99178/s) n_max_cbs: 1030713
> > > > > TREE04.18 no success message, 2897 successful version messages
> > > > > Completed in 44511 vs. 450000
> > > > > TREE04.19 ------- 1322499 GPs (2.93889/s) n_max_cbs: 917118
> > > > > TREE04.19 no success message, 2897 successful version messages
> > > > > Completed in 44511 vs. 450000
> > > > > ...
> > > > > TREE04.4 ------- 1310283 GPs (2.91174/s) n_max_cbs: 2146905
> > > > > TREE04.4 no success message, 2897 successful version messages
> > > > > Completed in 44511 vs. 450000
> > > > > TREE04.5 ------- 1333238 GPs (2.96275/s) n_max_cbs: 1027172
> > > > > TREE04.5 no success message, 2897 successful version messages
> > > > > Completed in 44511 vs. 450000
> > > > > TREE04.6 ------- 1313915 GPs (2.91981/s) n_max_cbs: 1017511
> > > > > TREE04.6 no success message, 2897 successful version messages
> > > > > Completed in 44511 vs. 450000
> > > > > TREE04.7 ------- 1341871 GPs (2.98194/s) n_max_cbs: 816265
> > > > > TREE04.7 no success message, 2897 successful version messages
> > > > > Completed in 44511 vs. 450000
> > > > > TREE04.8 ------- 1339412 GPs (2.97647/s) n_max_cbs: 1316404
> > > > > TREE04.8 no success message, 2897 successful version messages
> > > > > Completed in 44511 vs. 450000
> > > > > TREE04.9 ------- 1327240 GPs (2.94942/s) n_max_cbs: 1409531
> > > > > TREE04.9 no success message, 2897 successful version messages
> > > > > Completed in 44510 vs. 450000
> > > > > 32 runs with runtime errors.
> > > > >  --- Done at Sat Oct  8 11:34:10 AM EDT 2022 (12:23:16) exitcode 2
> > > > > "
> > > > > I have no idea about the test so just arbitrarily pick up the
> > > > > console.log of TREE04.10 as an example. Please get it from attachment.
> > > >
> > > > Very good, thank you!
> > > >
> > > > Could you please clearly indicate what you tested?  For example, if
> > > > you have an externally visible git tree, please point me at the tree
> > > > and the SHA-1.  Or send a patch series clearly indicating what it is
> > > > based on.
> > > >
> > > 
> > > Yes, it is a good way to eliminate any unexpected mistakes before a rigid test.
> > > 
> > > Please clone it from https://github.com/pfliu/linux.git  branch:
> > > rcu#revert_tick_dep
> > 
> > Thank you very much!
> > 
> > > > Then I can try a long run on a larger collection of systems.
> > > >
> > > 
> > > Thank you very much.
> > > 
> > > > If that works out, we can see about adjustments to mainline.  ;-)
> > > >
> > > 
> > > Eager to see.
> > 
> > I ran 200 hours of TREE04 and got an RCU CPU stall warning.  I ran 2000
> > hours on v6.0, which precedes these commits, and everything passed.
> > 
> > I will run more, primarily on v6.0, but that is what I have thus far.
> > At the moment, I have some concerns about this change.
> 
> OK, so I have run a total of 8000 hours on v6.0 without failure.  I have
> run 4200 hours on rcu#revert_tick_dep with 15 failures.  The ones I
> looked at were RCU CPU stall warnings with timer failures.
> 
> This data suggests that the kernel is not yet ready for that commit
> to be reverted.

Even if the tests pass, can we really survive with this patch
that he reverted?
https://github.com/pfliu/linux/commit/03179ef33e8e2608184ade99a27f760f9d01e6b7

If stop machine on a CPU spends a good amount of time in kernel mode, while a
grace period starts on another CPU, then we're kind of screwed if we don't
have the tick enabled right?

Or, did we make any changes to stop machine such that, that's no longer an
issue?

thanks,

 - Joel
Frederic Weisbecker Nov. 17, 2022, 2:39 p.m. UTC | #34
On Mon, Nov 07, 2022 at 08:07:26AM -0800, Paul E. McKenney wrote:
> > I ran 200 hours of TREE04 and got an RCU CPU stall warning.  I ran 2000
> > hours on v6.0, which precedes these commits, and everything passed.
> > 
> > I will run more, primarily on v6.0, but that is what I have thus far.
> > At the moment, I have some concerns about this change.
> 
> OK, so I have run a total of 8000 hours on v6.0 without failure.  I have
> run 4200 hours on rcu#revert_tick_dep with 15 failures.  The ones I
> looked at were RCU CPU stall warnings with timer failures.
> 
> This data suggests that the kernel is not yet ready for that commit
> to be reverted.

But that branch has the three commits reverted:

1) tick: Detect and fix jiffies update stall
2) timers/nohz: Last resort update jiffies on nohz_full IRQ entry*
3) rcu: Make CPU-hotplug removal operations enable tick

Reverting all of them is expected to fail anyway.

What we would like to know is if reverting just 3) is fine. Because
1) and 2) are supposed to fix the underlying issue.

I personally didn't manage to trigger failures with just reverting 3)
after thousands hours. But it failed with reverting all of them.

Has someone managed to trigger a failure with only 3) reverted?

Thanks.
Pingfan Liu Nov. 18, 2022, 1:45 a.m. UTC | #35
On Thu, Nov 17, 2022 at 10:39 PM Frederic Weisbecker
<frederic@kernel.org> wrote:
>
> On Mon, Nov 07, 2022 at 08:07:26AM -0800, Paul E. McKenney wrote:
> > > I ran 200 hours of TREE04 and got an RCU CPU stall warning.  I ran 2000
> > > hours on v6.0, which precedes these commits, and everything passed.
> > >
> > > I will run more, primarily on v6.0, but that is what I have thus far.
> > > At the moment, I have some concerns about this change.
> >
> > OK, so I have run a total of 8000 hours on v6.0 without failure.  I have
> > run 4200 hours on rcu#revert_tick_dep with 15 failures.  The ones I
> > looked at were RCU CPU stall warnings with timer failures.
> >
> > This data suggests that the kernel is not yet ready for that commit
> > to be reverted.
>
> But that branch has the three commits reverted:
>
> 1) tick: Detect and fix jiffies update stall
> 2) timers/nohz: Last resort update jiffies on nohz_full IRQ entry*
> 3) rcu: Make CPU-hotplug removal operations enable tick
>
> Reverting all of them is expected to fail anyway.
>
> What we would like to know is if reverting just 3) is fine. Because
> 1) and 2) are supposed to fix the underlying issue.
>
> I personally didn't manage to trigger failures with just reverting 3)
> after thousands hours. But it failed with reverting all of them.
>
> Has someone managed to trigger a failure with only 3) reverted?
>

Oh, a long long thread hides the history!   Once it was done [1].  But
I have not kept the data (IIRC, it occupied ~1GB).

If the information and attachment in [1] are not enough for you, I am
happy to redo the test.

Thanks,

    Pingfan

[1]: https://lore.kernel.org/all/CAFgQCTtLm-JXRyQfKo6-+P00SShVGujZGau+khmtCe1AiRodQA@mail.gmail.com/

> Thanks.
Pingfan Liu Nov. 18, 2022, 12:08 p.m. UTC | #36
On Thu, Nov 10, 2022 at 2:55 AM Joel Fernandes <joel@joelfernandes.org> wrote:
>
> On Mon, Nov 07, 2022 at 08:07:26AM -0800, Paul E. McKenney wrote:
> > On Thu, Nov 03, 2022 at 09:51:43AM -0700, Paul E. McKenney wrote:
> > > On Mon, Oct 31, 2022 at 11:24:37AM +0800, Pingfan Liu wrote:
> > > > On Fri, Oct 28, 2022 at 1:46 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > >
> > > > > On Mon, Oct 10, 2022 at 09:55:26AM +0800, Pingfan Liu wrote:
> > > > > > On Mon, Oct 3, 2022 at 12:20 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > > > >
> > > > > > [...]
> > > > > >
> > > > > > > >
> > > > > > > > But unfortunately, I did not keep the data. I will run it again and
> > > > > > > > submit the data.
> > > > > > >
> > > > > >
> > > > > > I have finished the test on a machine with two sockets and 256 cpus.
> > > > > > The test runs against the kernel with three commits reverted.
> > > > > >   96926686deab ("rcu: Make CPU-hotplug removal operations enable tick")
> > > > > >   53e87e3cdc15 ("timers/nohz: Last resort update jiffies on nohz_full
> > > > > > IRQ entry")
> > > > > >   a1ff03cd6fb9c5 ("tick: Detect and fix jiffies update stall")
> > > > > >
> > > > > > Summary from console.log
> > > > > > "
> > > > > >  --- Sat Oct  8 11:34:02 AM EDT 2022 Test summary:
> > > > > > Results directory:
> > > > > > /home/linux/tools/testing/selftests/rcutorture/res/2022.10.07-23.10.54
> > > > > > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration
> > > > > > 125h --bootargs rcutorture.onoff_interval=200
> > > > > > rcutorture.onoff_holdoff=30 --configs 32*TREE04
> > > > > > TREE04 ------- 1365444 GPs (3.03432/s) n_max_cbs: 850290
> > > > > > TREE04 no success message, 2897 successful version messages
> > > > > > Completed in 44512 vs. 450000
> > > > > > TREE04.10 ------- 1331565 GPs (2.95903/s) n_max_cbs: 909075
> > > > > > TREE04.10 no success message, 2897 successful version messages
> > > > > > Completed in 44511 vs. 450000
> > > > > > TREE04.11 ------- 1331535 GPs (2.95897/s) n_max_cbs: 1213974
> > > > > > TREE04.11 no success message, 2897 successful version messages
> > > > > > Completed in 44511 vs. 450000
> > > > > > TREE04.12 ------- 1322160 GPs (2.93813/s) n_max_cbs: 2615313
> > > > > > TREE04.12 no success message, 2897 successful version messages
> > > > > > Completed in 44511 vs. 450000
> > > > > > TREE04.13 ------- 1320032 GPs (2.9334/s) n_max_cbs: 914751
> > > > > > TREE04.13 no success message, 2897 successful version messages
> > > > > > Completed in 44511 vs. 450000
> > > > > > TREE04.14 ------- 1339969 GPs (2.97771/s) n_max_cbs: 1560203
> > > > > > TREE04.14 no success message, 2897 successful version messages
> > > > > > Completed in 44511 vs. 450000
> > > > > > TREE04.15 ------- 1318805 GPs (2.93068/s) n_max_cbs: 1757478
> > > > > > TREE04.15 no success message, 2897 successful version messages
> > > > > > Completed in 44510 vs. 450000
> > > > > > TREE04.16 ------- 1340633 GPs (2.97918/s) n_max_cbs: 1377647
> > > > > > TREE04.16 no success message, 2897 successful version messages
> > > > > > Completed in 44510 vs. 450000
> > > > > > TREE04.17 ------- 1322798 GPs (2.93955/s) n_max_cbs: 1266344
> > > > > > TREE04.17 no success message, 2897 successful version messages
> > > > > > Completed in 44511 vs. 450000
> > > > > > TREE04.18 ------- 1346302 GPs (2.99178/s) n_max_cbs: 1030713
> > > > > > TREE04.18 no success message, 2897 successful version messages
> > > > > > Completed in 44511 vs. 450000
> > > > > > TREE04.19 ------- 1322499 GPs (2.93889/s) n_max_cbs: 917118
> > > > > > TREE04.19 no success message, 2897 successful version messages
> > > > > > Completed in 44511 vs. 450000
> > > > > > ...
> > > > > > TREE04.4 ------- 1310283 GPs (2.91174/s) n_max_cbs: 2146905
> > > > > > TREE04.4 no success message, 2897 successful version messages
> > > > > > Completed in 44511 vs. 450000
> > > > > > TREE04.5 ------- 1333238 GPs (2.96275/s) n_max_cbs: 1027172
> > > > > > TREE04.5 no success message, 2897 successful version messages
> > > > > > Completed in 44511 vs. 450000
> > > > > > TREE04.6 ------- 1313915 GPs (2.91981/s) n_max_cbs: 1017511
> > > > > > TREE04.6 no success message, 2897 successful version messages
> > > > > > Completed in 44511 vs. 450000
> > > > > > TREE04.7 ------- 1341871 GPs (2.98194/s) n_max_cbs: 816265
> > > > > > TREE04.7 no success message, 2897 successful version messages
> > > > > > Completed in 44511 vs. 450000
> > > > > > TREE04.8 ------- 1339412 GPs (2.97647/s) n_max_cbs: 1316404
> > > > > > TREE04.8 no success message, 2897 successful version messages
> > > > > > Completed in 44511 vs. 450000
> > > > > > TREE04.9 ------- 1327240 GPs (2.94942/s) n_max_cbs: 1409531
> > > > > > TREE04.9 no success message, 2897 successful version messages
> > > > > > Completed in 44510 vs. 450000
> > > > > > 32 runs with runtime errors.
> > > > > >  --- Done at Sat Oct  8 11:34:10 AM EDT 2022 (12:23:16) exitcode 2
> > > > > > "
> > > > > > I have no idea about the test so just arbitrarily pick up the
> > > > > > console.log of TREE04.10 as an example. Please get it from attachment.
> > > > >
> > > > > Very good, thank you!
> > > > >
> > > > > Could you please clearly indicate what you tested?  For example, if
> > > > > you have an externally visible git tree, please point me at the tree
> > > > > and the SHA-1.  Or send a patch series clearly indicating what it is
> > > > > based on.
> > > > >
> > > >
> > > > Yes, it is a good way to eliminate any unexpected mistakes before a rigid test.
> > > >
> > > > Please clone it from https://github.com/pfliu/linux.git  branch:
> > > > rcu#revert_tick_dep
> > >
> > > Thank you very much!
> > >
> > > > > Then I can try a long run on a larger collection of systems.
> > > > >
> > > >
> > > > Thank you very much.
> > > >
> > > > > If that works out, we can see about adjustments to mainline.  ;-)
> > > > >
> > > >
> > > > Eager to see.
> > >
> > > I ran 200 hours of TREE04 and got an RCU CPU stall warning.  I ran 2000
> > > hours on v6.0, which precedes these commits, and everything passed.
> > >
> > > I will run more, primarily on v6.0, but that is what I have thus far.
> > > At the moment, I have some concerns about this change.
> >
> > OK, so I have run a total of 8000 hours on v6.0 without failure.  I have
> > run 4200 hours on rcu#revert_tick_dep with 15 failures.  The ones I
> > looked at were RCU CPU stall warnings with timer failures.
> >
> > This data suggests that the kernel is not yet ready for that commit
> > to be reverted.
>
> Even if the tests pass, can we really survive with this patch
> that he reverted?
> https://github.com/pfliu/linux/commit/03179ef33e8e2608184ade99a27f760f9d01e6b7
>
> If stop machine on a CPU spends a good amount of time in kernel mode, while a
> grace period starts on another CPU, then we're kind of screwed if we don't
> have the tick enabled right?
>

In this case, I think multi_cpu_stop()->rcu_momentary_dyntick_idle()
can serve this purpose.

Thanks,

    Pingfan

> Or, did we make any changes to stop machine such that, that's no longer an
> issue?
>
> thanks,
>
>  - Joel
>
Paul E. McKenney Nov. 18, 2022, 11:30 p.m. UTC | #37
On Fri, Nov 18, 2022 at 08:08:35PM +0800, Pingfan Liu wrote:
> On Thu, Nov 10, 2022 at 2:55 AM Joel Fernandes <joel@joelfernandes.org> wrote:
> >
> > On Mon, Nov 07, 2022 at 08:07:26AM -0800, Paul E. McKenney wrote:
> > > On Thu, Nov 03, 2022 at 09:51:43AM -0700, Paul E. McKenney wrote:
> > > > On Mon, Oct 31, 2022 at 11:24:37AM +0800, Pingfan Liu wrote:
> > > > > On Fri, Oct 28, 2022 at 1:46 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > > >
> > > > > > On Mon, Oct 10, 2022 at 09:55:26AM +0800, Pingfan Liu wrote:
> > > > > > > On Mon, Oct 3, 2022 at 12:20 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > > > > >
> > > > > > > [...]
> > > > > > >
> > > > > > > > >
> > > > > > > > > But unfortunately, I did not keep the data. I will run it again and
> > > > > > > > > submit the data.
> > > > > > > >
> > > > > > >
> > > > > > > I have finished the test on a machine with two sockets and 256 cpus.
> > > > > > > The test runs against the kernel with three commits reverted.
> > > > > > >   96926686deab ("rcu: Make CPU-hotplug removal operations enable tick")
> > > > > > >   53e87e3cdc15 ("timers/nohz: Last resort update jiffies on nohz_full
> > > > > > > IRQ entry")
> > > > > > >   a1ff03cd6fb9c5 ("tick: Detect and fix jiffies update stall")
> > > > > > >
> > > > > > > Summary from console.log
> > > > > > > "
> > > > > > >  --- Sat Oct  8 11:34:02 AM EDT 2022 Test summary:
> > > > > > > Results directory:
> > > > > > > /home/linux/tools/testing/selftests/rcutorture/res/2022.10.07-23.10.54
> > > > > > > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration
> > > > > > > 125h --bootargs rcutorture.onoff_interval=200
> > > > > > > rcutorture.onoff_holdoff=30 --configs 32*TREE04
> > > > > > > TREE04 ------- 1365444 GPs (3.03432/s) n_max_cbs: 850290
> > > > > > > TREE04 no success message, 2897 successful version messages
> > > > > > > Completed in 44512 vs. 450000
> > > > > > > TREE04.10 ------- 1331565 GPs (2.95903/s) n_max_cbs: 909075
> > > > > > > TREE04.10 no success message, 2897 successful version messages
> > > > > > > Completed in 44511 vs. 450000
> > > > > > > TREE04.11 ------- 1331535 GPs (2.95897/s) n_max_cbs: 1213974
> > > > > > > TREE04.11 no success message, 2897 successful version messages
> > > > > > > Completed in 44511 vs. 450000
> > > > > > > TREE04.12 ------- 1322160 GPs (2.93813/s) n_max_cbs: 2615313
> > > > > > > TREE04.12 no success message, 2897 successful version messages
> > > > > > > Completed in 44511 vs. 450000
> > > > > > > TREE04.13 ------- 1320032 GPs (2.9334/s) n_max_cbs: 914751
> > > > > > > TREE04.13 no success message, 2897 successful version messages
> > > > > > > Completed in 44511 vs. 450000
> > > > > > > TREE04.14 ------- 1339969 GPs (2.97771/s) n_max_cbs: 1560203
> > > > > > > TREE04.14 no success message, 2897 successful version messages
> > > > > > > Completed in 44511 vs. 450000
> > > > > > > TREE04.15 ------- 1318805 GPs (2.93068/s) n_max_cbs: 1757478
> > > > > > > TREE04.15 no success message, 2897 successful version messages
> > > > > > > Completed in 44510 vs. 450000
> > > > > > > TREE04.16 ------- 1340633 GPs (2.97918/s) n_max_cbs: 1377647
> > > > > > > TREE04.16 no success message, 2897 successful version messages
> > > > > > > Completed in 44510 vs. 450000
> > > > > > > TREE04.17 ------- 1322798 GPs (2.93955/s) n_max_cbs: 1266344
> > > > > > > TREE04.17 no success message, 2897 successful version messages
> > > > > > > Completed in 44511 vs. 450000
> > > > > > > TREE04.18 ------- 1346302 GPs (2.99178/s) n_max_cbs: 1030713
> > > > > > > TREE04.18 no success message, 2897 successful version messages
> > > > > > > Completed in 44511 vs. 450000
> > > > > > > TREE04.19 ------- 1322499 GPs (2.93889/s) n_max_cbs: 917118
> > > > > > > TREE04.19 no success message, 2897 successful version messages
> > > > > > > Completed in 44511 vs. 450000
> > > > > > > ...
> > > > > > > TREE04.4 ------- 1310283 GPs (2.91174/s) n_max_cbs: 2146905
> > > > > > > TREE04.4 no success message, 2897 successful version messages
> > > > > > > Completed in 44511 vs. 450000
> > > > > > > TREE04.5 ------- 1333238 GPs (2.96275/s) n_max_cbs: 1027172
> > > > > > > TREE04.5 no success message, 2897 successful version messages
> > > > > > > Completed in 44511 vs. 450000
> > > > > > > TREE04.6 ------- 1313915 GPs (2.91981/s) n_max_cbs: 1017511
> > > > > > > TREE04.6 no success message, 2897 successful version messages
> > > > > > > Completed in 44511 vs. 450000
> > > > > > > TREE04.7 ------- 1341871 GPs (2.98194/s) n_max_cbs: 816265
> > > > > > > TREE04.7 no success message, 2897 successful version messages
> > > > > > > Completed in 44511 vs. 450000
> > > > > > > TREE04.8 ------- 1339412 GPs (2.97647/s) n_max_cbs: 1316404
> > > > > > > TREE04.8 no success message, 2897 successful version messages
> > > > > > > Completed in 44511 vs. 450000
> > > > > > > TREE04.9 ------- 1327240 GPs (2.94942/s) n_max_cbs: 1409531
> > > > > > > TREE04.9 no success message, 2897 successful version messages
> > > > > > > Completed in 44510 vs. 450000
> > > > > > > 32 runs with runtime errors.
> > > > > > >  --- Done at Sat Oct  8 11:34:10 AM EDT 2022 (12:23:16) exitcode 2
> > > > > > > "
> > > > > > > I have no idea about the test so just arbitrarily pick up the
> > > > > > > console.log of TREE04.10 as an example. Please get it from attachment.
> > > > > >
> > > > > > Very good, thank you!
> > > > > >
> > > > > > Could you please clearly indicate what you tested?  For example, if
> > > > > > you have an externally visible git tree, please point me at the tree
> > > > > > and the SHA-1.  Or send a patch series clearly indicating what it is
> > > > > > based on.
> > > > > >
> > > > >
> > > > > Yes, it is a good way to eliminate any unexpected mistakes before a rigid test.
> > > > >
> > > > > Please clone it from https://github.com/pfliu/linux.git  branch:
> > > > > rcu#revert_tick_dep
> > > >
> > > > Thank you very much!
> > > >
> > > > > > Then I can try a long run on a larger collection of systems.
> > > > > >
> > > > >
> > > > > Thank you very much.
> > > > >
> > > > > > If that works out, we can see about adjustments to mainline.  ;-)
> > > > > >
> > > > >
> > > > > Eager to see.
> > > >
> > > > I ran 200 hours of TREE04 and got an RCU CPU stall warning.  I ran 2000
> > > > hours on v6.0, which precedes these commits, and everything passed.
> > > >
> > > > I will run more, primarily on v6.0, but that is what I have thus far.
> > > > At the moment, I have some concerns about this change.
> > >
> > > OK, so I have run a total of 8000 hours on v6.0 without failure.  I have
> > > run 4200 hours on rcu#revert_tick_dep with 15 failures.  The ones I
> > > looked at were RCU CPU stall warnings with timer failures.
> > >
> > > This data suggests that the kernel is not yet ready for that commit
> > > to be reverted.
> >
> > Even if the tests pass, can we really survive with this patch
> > that he reverted?
> > https://github.com/pfliu/linux/commit/03179ef33e8e2608184ade99a27f760f9d01e6b7
> >
> > If stop machine on a CPU spends a good amount of time in kernel mode, while a
> > grace period starts on another CPU, then we're kind of screwed if we don't
> > have the tick enabled right?
> 
> In this case, I think multi_cpu_stop()->rcu_momentary_dyntick_idle()
> can serve this purpose.

There are a lot of things that can go wrong in this scenario.  Does that
added rcu_momentary_dyntick_idle() cover everything that still needs to
be covered?

							Thanx, Paul
Pingfan Liu Nov. 21, 2022, 3:48 a.m. UTC | #38
On Sat, Nov 19, 2022 at 7:30 AM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Fri, Nov 18, 2022 at 08:08:35PM +0800, Pingfan Liu wrote:
> > On Thu, Nov 10, 2022 at 2:55 AM Joel Fernandes <joel@joelfernandes.org> wrote:
> > >
> > > On Mon, Nov 07, 2022 at 08:07:26AM -0800, Paul E. McKenney wrote:
> > > > On Thu, Nov 03, 2022 at 09:51:43AM -0700, Paul E. McKenney wrote:
> > > > > On Mon, Oct 31, 2022 at 11:24:37AM +0800, Pingfan Liu wrote:
> > > > > > On Fri, Oct 28, 2022 at 1:46 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > > > >
> > > > > > > On Mon, Oct 10, 2022 at 09:55:26AM +0800, Pingfan Liu wrote:
> > > > > > > > On Mon, Oct 3, 2022 at 12:20 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > > > > > >
> > > > > > > > [...]
> > > > > > > >
> > > > > > > > > >
> > > > > > > > > > But unfortunately, I did not keep the data. I will run it again and
> > > > > > > > > > submit the data.
> > > > > > > > >
> > > > > > > >
> > > > > > > > I have finished the test on a machine with two sockets and 256 cpus.
> > > > > > > > The test runs against the kernel with three commits reverted.
> > > > > > > >   96926686deab ("rcu: Make CPU-hotplug removal operations enable tick")
> > > > > > > >   53e87e3cdc15 ("timers/nohz: Last resort update jiffies on nohz_full
> > > > > > > > IRQ entry")
> > > > > > > >   a1ff03cd6fb9c5 ("tick: Detect and fix jiffies update stall")
> > > > > > > >
> > > > > > > > Summary from console.log
> > > > > > > > "
> > > > > > > >  --- Sat Oct  8 11:34:02 AM EDT 2022 Test summary:
> > > > > > > > Results directory:
> > > > > > > > /home/linux/tools/testing/selftests/rcutorture/res/2022.10.07-23.10.54
> > > > > > > > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration
> > > > > > > > 125h --bootargs rcutorture.onoff_interval=200
> > > > > > > > rcutorture.onoff_holdoff=30 --configs 32*TREE04
> > > > > > > > TREE04 ------- 1365444 GPs (3.03432/s) n_max_cbs: 850290
> > > > > > > > TREE04 no success message, 2897 successful version messages
> > > > > > > > Completed in 44512 vs. 450000
> > > > > > > > TREE04.10 ------- 1331565 GPs (2.95903/s) n_max_cbs: 909075
> > > > > > > > TREE04.10 no success message, 2897 successful version messages
> > > > > > > > Completed in 44511 vs. 450000
> > > > > > > > TREE04.11 ------- 1331535 GPs (2.95897/s) n_max_cbs: 1213974
> > > > > > > > TREE04.11 no success message, 2897 successful version messages
> > > > > > > > Completed in 44511 vs. 450000
> > > > > > > > TREE04.12 ------- 1322160 GPs (2.93813/s) n_max_cbs: 2615313
> > > > > > > > TREE04.12 no success message, 2897 successful version messages
> > > > > > > > Completed in 44511 vs. 450000
> > > > > > > > TREE04.13 ------- 1320032 GPs (2.9334/s) n_max_cbs: 914751
> > > > > > > > TREE04.13 no success message, 2897 successful version messages
> > > > > > > > Completed in 44511 vs. 450000
> > > > > > > > TREE04.14 ------- 1339969 GPs (2.97771/s) n_max_cbs: 1560203
> > > > > > > > TREE04.14 no success message, 2897 successful version messages
> > > > > > > > Completed in 44511 vs. 450000
> > > > > > > > TREE04.15 ------- 1318805 GPs (2.93068/s) n_max_cbs: 1757478
> > > > > > > > TREE04.15 no success message, 2897 successful version messages
> > > > > > > > Completed in 44510 vs. 450000
> > > > > > > > TREE04.16 ------- 1340633 GPs (2.97918/s) n_max_cbs: 1377647
> > > > > > > > TREE04.16 no success message, 2897 successful version messages
> > > > > > > > Completed in 44510 vs. 450000
> > > > > > > > TREE04.17 ------- 1322798 GPs (2.93955/s) n_max_cbs: 1266344
> > > > > > > > TREE04.17 no success message, 2897 successful version messages
> > > > > > > > Completed in 44511 vs. 450000
> > > > > > > > TREE04.18 ------- 1346302 GPs (2.99178/s) n_max_cbs: 1030713
> > > > > > > > TREE04.18 no success message, 2897 successful version messages
> > > > > > > > Completed in 44511 vs. 450000
> > > > > > > > TREE04.19 ------- 1322499 GPs (2.93889/s) n_max_cbs: 917118
> > > > > > > > TREE04.19 no success message, 2897 successful version messages
> > > > > > > > Completed in 44511 vs. 450000
> > > > > > > > ...
> > > > > > > > TREE04.4 ------- 1310283 GPs (2.91174/s) n_max_cbs: 2146905
> > > > > > > > TREE04.4 no success message, 2897 successful version messages
> > > > > > > > Completed in 44511 vs. 450000
> > > > > > > > TREE04.5 ------- 1333238 GPs (2.96275/s) n_max_cbs: 1027172
> > > > > > > > TREE04.5 no success message, 2897 successful version messages
> > > > > > > > Completed in 44511 vs. 450000
> > > > > > > > TREE04.6 ------- 1313915 GPs (2.91981/s) n_max_cbs: 1017511
> > > > > > > > TREE04.6 no success message, 2897 successful version messages
> > > > > > > > Completed in 44511 vs. 450000
> > > > > > > > TREE04.7 ------- 1341871 GPs (2.98194/s) n_max_cbs: 816265
> > > > > > > > TREE04.7 no success message, 2897 successful version messages
> > > > > > > > Completed in 44511 vs. 450000
> > > > > > > > TREE04.8 ------- 1339412 GPs (2.97647/s) n_max_cbs: 1316404
> > > > > > > > TREE04.8 no success message, 2897 successful version messages
> > > > > > > > Completed in 44511 vs. 450000
> > > > > > > > TREE04.9 ------- 1327240 GPs (2.94942/s) n_max_cbs: 1409531
> > > > > > > > TREE04.9 no success message, 2897 successful version messages
> > > > > > > > Completed in 44510 vs. 450000
> > > > > > > > 32 runs with runtime errors.
> > > > > > > >  --- Done at Sat Oct  8 11:34:10 AM EDT 2022 (12:23:16) exitcode 2
> > > > > > > > "
> > > > > > > > I have no idea about the test so just arbitrarily pick up the
> > > > > > > > console.log of TREE04.10 as an example. Please get it from attachment.
> > > > > > >
> > > > > > > Very good, thank you!
> > > > > > >
> > > > > > > Could you please clearly indicate what you tested?  For example, if
> > > > > > > you have an externally visible git tree, please point me at the tree
> > > > > > > and the SHA-1.  Or send a patch series clearly indicating what it is
> > > > > > > based on.
> > > > > > >
> > > > > >
> > > > > > Yes, it is a good way to eliminate any unexpected mistakes before a rigid test.
> > > > > >
> > > > > > Please clone it from https://github.com/pfliu/linux.git  branch:
> > > > > > rcu#revert_tick_dep
> > > > >
> > > > > Thank you very much!
> > > > >
> > > > > > > Then I can try a long run on a larger collection of systems.
> > > > > > >
> > > > > >
> > > > > > Thank you very much.
> > > > > >
> > > > > > > If that works out, we can see about adjustments to mainline.  ;-)
> > > > > > >
> > > > > >
> > > > > > Eager to see.
> > > > >
> > > > > I ran 200 hours of TREE04 and got an RCU CPU stall warning.  I ran 2000
> > > > > hours on v6.0, which precedes these commits, and everything passed.
> > > > >
> > > > > I will run more, primarily on v6.0, but that is what I have thus far.
> > > > > At the moment, I have some concerns about this change.
> > > >
> > > > OK, so I have run a total of 8000 hours on v6.0 without failure.  I have
> > > > run 4200 hours on rcu#revert_tick_dep with 15 failures.  The ones I
> > > > looked at were RCU CPU stall warnings with timer failures.
> > > >
> > > > This data suggests that the kernel is not yet ready for that commit
> > > > to be reverted.
> > >
> > > Even if the tests pass, can we really survive with this patch
> > > that he reverted?
> > > https://github.com/pfliu/linux/commit/03179ef33e8e2608184ade99a27f760f9d01e6b7
> > >
> > > If stop machine on a CPU spends a good amount of time in kernel mode, while a
> > > grace period starts on another CPU, then we're kind of screwed if we don't
> > > have the tick enabled right?
> >
> > In this case, I think multi_cpu_stop()->rcu_momentary_dyntick_idle()
> > can serve this purpose.
>
> There are a lot of things that can go wrong in this scenario.  Does that
> added rcu_momentary_dyntick_idle() cover everything that still needs to
> be covered?
>

Absorbed by the context "while a grace period starts on another CPU",
I assumed it is from the RCU perspective. Then I think every non-idle
cpu can report its quiescent state in that case.

But from a system perspective, I think that it is promising but not
sure. And that is what the TREE04 torture test tries to verify.

Thanks,

    Pingfan
Paul E. McKenney Nov. 21, 2022, 5:14 p.m. UTC | #39
On Mon, Nov 21, 2022 at 11:48:29AM +0800, Pingfan Liu wrote:
> On Sat, Nov 19, 2022 at 7:30 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Fri, Nov 18, 2022 at 08:08:35PM +0800, Pingfan Liu wrote:
> > > On Thu, Nov 10, 2022 at 2:55 AM Joel Fernandes <joel@joelfernandes.org> wrote:
> > > >
> > > > On Mon, Nov 07, 2022 at 08:07:26AM -0800, Paul E. McKenney wrote:
> > > > > On Thu, Nov 03, 2022 at 09:51:43AM -0700, Paul E. McKenney wrote:
> > > > > > On Mon, Oct 31, 2022 at 11:24:37AM +0800, Pingfan Liu wrote:
> > > > > > > On Fri, Oct 28, 2022 at 1:46 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > > > > >
> > > > > > > > On Mon, Oct 10, 2022 at 09:55:26AM +0800, Pingfan Liu wrote:
> > > > > > > > > On Mon, Oct 3, 2022 at 12:20 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > > > > > > >
> > > > > > > > > [...]
> > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > But unfortunately, I did not keep the data. I will run it again and
> > > > > > > > > > > submit the data.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > I have finished the test on a machine with two sockets and 256 cpus.
> > > > > > > > > The test runs against the kernel with three commits reverted.
> > > > > > > > >   96926686deab ("rcu: Make CPU-hotplug removal operations enable tick")
> > > > > > > > >   53e87e3cdc15 ("timers/nohz: Last resort update jiffies on nohz_full
> > > > > > > > > IRQ entry")
> > > > > > > > >   a1ff03cd6fb9c5 ("tick: Detect and fix jiffies update stall")
> > > > > > > > >
> > > > > > > > > Summary from console.log
> > > > > > > > > "
> > > > > > > > >  --- Sat Oct  8 11:34:02 AM EDT 2022 Test summary:
> > > > > > > > > Results directory:
> > > > > > > > > /home/linux/tools/testing/selftests/rcutorture/res/2022.10.07-23.10.54
> > > > > > > > > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration
> > > > > > > > > 125h --bootargs rcutorture.onoff_interval=200
> > > > > > > > > rcutorture.onoff_holdoff=30 --configs 32*TREE04
> > > > > > > > > TREE04 ------- 1365444 GPs (3.03432/s) n_max_cbs: 850290
> > > > > > > > > TREE04 no success message, 2897 successful version messages
> > > > > > > > > Completed in 44512 vs. 450000
> > > > > > > > > TREE04.10 ------- 1331565 GPs (2.95903/s) n_max_cbs: 909075
> > > > > > > > > TREE04.10 no success message, 2897 successful version messages
> > > > > > > > > Completed in 44511 vs. 450000
> > > > > > > > > TREE04.11 ------- 1331535 GPs (2.95897/s) n_max_cbs: 1213974
> > > > > > > > > TREE04.11 no success message, 2897 successful version messages
> > > > > > > > > Completed in 44511 vs. 450000
> > > > > > > > > TREE04.12 ------- 1322160 GPs (2.93813/s) n_max_cbs: 2615313
> > > > > > > > > TREE04.12 no success message, 2897 successful version messages
> > > > > > > > > Completed in 44511 vs. 450000
> > > > > > > > > TREE04.13 ------- 1320032 GPs (2.9334/s) n_max_cbs: 914751
> > > > > > > > > TREE04.13 no success message, 2897 successful version messages
> > > > > > > > > Completed in 44511 vs. 450000
> > > > > > > > > TREE04.14 ------- 1339969 GPs (2.97771/s) n_max_cbs: 1560203
> > > > > > > > > TREE04.14 no success message, 2897 successful version messages
> > > > > > > > > Completed in 44511 vs. 450000
> > > > > > > > > TREE04.15 ------- 1318805 GPs (2.93068/s) n_max_cbs: 1757478
> > > > > > > > > TREE04.15 no success message, 2897 successful version messages
> > > > > > > > > Completed in 44510 vs. 450000
> > > > > > > > > TREE04.16 ------- 1340633 GPs (2.97918/s) n_max_cbs: 1377647
> > > > > > > > > TREE04.16 no success message, 2897 successful version messages
> > > > > > > > > Completed in 44510 vs. 450000
> > > > > > > > > TREE04.17 ------- 1322798 GPs (2.93955/s) n_max_cbs: 1266344
> > > > > > > > > TREE04.17 no success message, 2897 successful version messages
> > > > > > > > > Completed in 44511 vs. 450000
> > > > > > > > > TREE04.18 ------- 1346302 GPs (2.99178/s) n_max_cbs: 1030713
> > > > > > > > > TREE04.18 no success message, 2897 successful version messages
> > > > > > > > > Completed in 44511 vs. 450000
> > > > > > > > > TREE04.19 ------- 1322499 GPs (2.93889/s) n_max_cbs: 917118
> > > > > > > > > TREE04.19 no success message, 2897 successful version messages
> > > > > > > > > Completed in 44511 vs. 450000
> > > > > > > > > ...
> > > > > > > > > TREE04.4 ------- 1310283 GPs (2.91174/s) n_max_cbs: 2146905
> > > > > > > > > TREE04.4 no success message, 2897 successful version messages
> > > > > > > > > Completed in 44511 vs. 450000
> > > > > > > > > TREE04.5 ------- 1333238 GPs (2.96275/s) n_max_cbs: 1027172
> > > > > > > > > TREE04.5 no success message, 2897 successful version messages
> > > > > > > > > Completed in 44511 vs. 450000
> > > > > > > > > TREE04.6 ------- 1313915 GPs (2.91981/s) n_max_cbs: 1017511
> > > > > > > > > TREE04.6 no success message, 2897 successful version messages
> > > > > > > > > Completed in 44511 vs. 450000
> > > > > > > > > TREE04.7 ------- 1341871 GPs (2.98194/s) n_max_cbs: 816265
> > > > > > > > > TREE04.7 no success message, 2897 successful version messages
> > > > > > > > > Completed in 44511 vs. 450000
> > > > > > > > > TREE04.8 ------- 1339412 GPs (2.97647/s) n_max_cbs: 1316404
> > > > > > > > > TREE04.8 no success message, 2897 successful version messages
> > > > > > > > > Completed in 44511 vs. 450000
> > > > > > > > > TREE04.9 ------- 1327240 GPs (2.94942/s) n_max_cbs: 1409531
> > > > > > > > > TREE04.9 no success message, 2897 successful version messages
> > > > > > > > > Completed in 44510 vs. 450000
> > > > > > > > > 32 runs with runtime errors.
> > > > > > > > >  --- Done at Sat Oct  8 11:34:10 AM EDT 2022 (12:23:16) exitcode 2
> > > > > > > > > "
> > > > > > > > > I have no idea about the test so just arbitrarily pick up the
> > > > > > > > > console.log of TREE04.10 as an example. Please get it from attachment.
> > > > > > > >
> > > > > > > > Very good, thank you!
> > > > > > > >
> > > > > > > > Could you please clearly indicate what you tested?  For example, if
> > > > > > > > you have an externally visible git tree, please point me at the tree
> > > > > > > > and the SHA-1.  Or send a patch series clearly indicating what it is
> > > > > > > > based on.
> > > > > > > >
> > > > > > >
> > > > > > > Yes, it is a good way to eliminate any unexpected mistakes before a rigid test.
> > > > > > >
> > > > > > > Please clone it from https://github.com/pfliu/linux.git  branch:
> > > > > > > rcu#revert_tick_dep
> > > > > >
> > > > > > Thank you very much!
> > > > > >
> > > > > > > > Then I can try a long run on a larger collection of systems.
> > > > > > > >
> > > > > > >
> > > > > > > Thank you very much.
> > > > > > >
> > > > > > > > If that works out, we can see about adjustments to mainline.  ;-)
> > > > > > > >
> > > > > > >
> > > > > > > Eager to see.
> > > > > >
> > > > > > I ran 200 hours of TREE04 and got an RCU CPU stall warning.  I ran 2000
> > > > > > hours on v6.0, which precedes these commits, and everything passed.
> > > > > >
> > > > > > I will run more, primarily on v6.0, but that is what I have thus far.
> > > > > > At the moment, I have some concerns about this change.
> > > > >
> > > > > OK, so I have run a total of 8000 hours on v6.0 without failure.  I have
> > > > > run 4200 hours on rcu#revert_tick_dep with 15 failures.  The ones I
> > > > > looked at were RCU CPU stall warnings with timer failures.
> > > > >
> > > > > This data suggests that the kernel is not yet ready for that commit
> > > > > to be reverted.
> > > >
> > > > Even if the tests pass, can we really survive with this patch
> > > > that he reverted?
> > > > https://github.com/pfliu/linux/commit/03179ef33e8e2608184ade99a27f760f9d01e6b7
> > > >
> > > > If stop machine on a CPU spends a good amount of time in kernel mode, while a
> > > > grace period starts on another CPU, then we're kind of screwed if we don't
> > > > have the tick enabled right?
> > >
> > > In this case, I think multi_cpu_stop()->rcu_momentary_dyntick_idle()
> > > can serve this purpose.
> >
> > There are a lot of things that can go wrong in this scenario.  Does that
> > added rcu_momentary_dyntick_idle() cover everything that still needs to
> > be covered?
> 
> Absorbed by the context "while a grace period starts on another CPU",
> I assumed it is from the RCU perspective. Then I think every non-idle
> cpu can report its quiescent state in that case.
> 
> But from a system perspective, I think that it is promising but not
> sure. And that is what the TREE04 torture test tries to verify.

The specific case I was thinking of is when all but the incoming
CPU are spinning with interrupts disabled, and the incoming CPU is
getting hammered with scheduler-tick interrupts.  In that case, the RCU
grace-period kthread won't be running anyway.  So there will be nothing
to notice the calls to rcu_momentary_dyntick_idle().

							Thanx, Paul
diff mbox series

Patch

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 8a829b64f5b2..f8bd0fc5fd2f 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2164,13 +2164,19 @@  int rcutree_dead_cpu(unsigned int cpu)
 {
 	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
 	struct rcu_node *rnp = rdp->mynode;  /* Outgoing CPU's rdp & rnp. */
+	unsigned long flags;
+	u8 dying;
 
 	if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
 		return 0;
 
 	WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus - 1);
-	// Stop-machine done, so allow nohz_full to disable tick.
-	tick_dep_clear(TICK_DEP_BIT_RCU);
+	raw_spin_lock_irqsave_rcu_node(rnp, flags);
+	dying = --rnp->dying;
+	if (!dying)
+		// Stop-machine done, so allow nohz_full to disable tick.
+		tick_dep_clear(TICK_DEP_BIT_RCU);
+	raw_spin_lock_irqsave_rcu_node(rnp, flags);
 	return 0;
 }
 
@@ -4020,17 +4026,20 @@  int rcutree_offline_cpu(unsigned int cpu)
 	unsigned long flags;
 	struct rcu_data *rdp;
 	struct rcu_node *rnp;
+	u8 dying;
 
 	rdp = per_cpu_ptr(&rcu_data, cpu);
 	rnp = rdp->mynode;
 	raw_spin_lock_irqsave_rcu_node(rnp, flags);
 	rnp->ffmask &= ~rdp->grpmask;
+	/* Let rcutree_dead_cpu() know a new offlining. */
+	dying = rnp->dying++;
+	if (!dying)
+		// nohz_full CPUs need the tick for stop-machine to work quickly
+		tick_dep_set(TICK_DEP_BIT_RCU);
 	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 
 	rcutree_affinity_setting(cpu, cpu);
-
-	// nohz_full CPUs need the tick for stop-machine to work quickly
-	tick_dep_set(TICK_DEP_BIT_RCU);
 	return 0;
 }
 
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index d4a97e40ea9c..b508a12ac953 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -81,6 +81,7 @@  struct rcu_node {
 	int	grphi;		/* highest-numbered CPU here. */
 	u8	grpnum;		/* group number for next level up. */
 	u8	level;		/* root is at level 0. */
+	u8	dying;		/* num of concurrent rdp offlining */
 	bool	wait_blkd_tasks;/* Necessary to wait for blocked tasks to */
 				/*  exit RCU read-side critical sections */
 				/*  before propagating offline up the */