
[1/3] rcu: remove redundant cpu affinity setting during teardown

Message ID 20220905033852.18988-1-kernelfans@gmail.com (mailing list archive)
State New, archived
Series [1/3] rcu: remove redundant cpu affinity setting during teardown

Commit Message

Pingfan Liu Sept. 5, 2022, 3:38 a.m. UTC
At present, during cpu teardown, rcu_boost_kthread_setaffinity() is
called twice: first by rcutree_offline_cpu(), then by
rcutree_dead_cpu(), the teardown callback of the CPUHP_RCUTREE_PREP cpuhp_step.

From the scheduler's perspective, a set bit in cpu_online_mask means that the cpu
is visible to the scheduler, while a set bit in cpu_active_mask means that the
cpu is suitable as a migration destination.

Now, turning back to RCU offlining: sched_cpu_deactivate() has already
disallowed the dying cpu as a migration destination before
rcutree_offline_cpu() runs.  Furthermore, if the boost kthread is on the dying
cpu, it will be migrated to another suitable online cpu by the scheduler.
So the affinity setting by rcutree_offline_cpu() is redundant and can be
eliminated.

Besides, this patch makes a trivial code rearrangement by open-coding
rcutree_affinity_setting() in rcutree_online_cpu(), since the latter is
the only remaining user of the former.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
To: rcu@vger.kernel.org
---
 kernel/rcu/tree.c | 14 +-------------
 1 file changed, 1 insertion(+), 13 deletions(-)

Comments

Paul E. McKenney Sept. 6, 2022, 5:24 p.m. UTC | #1
On Mon, Sep 05, 2022 at 11:38:50AM +0800, Pingfan Liu wrote:
> At present, during the cpu teardown, rcu_boost_kthread_setaffinity() is
> called twice. Firstly by rcutree_offline_cpu(), then by
> rcutree_dead_cpu() as the CPUHP_RCUTREE_PREP  cpuhp_step callback.
> 
> From the scheduler's perspective, a bit in cpu_online_mask means that the cpu
> is visible to the scheduler. Furthermore, a bit in cpu_active_mask
> means that the cpu is suitable as a migration destination.
> 
> Now turning back to the case in rcu offlining. sched_cpu_deactivate()
> has disabled the dying cpu as the migration destination before
> rcutree_offline_cpu().  Furthermore, if the boost kthread is on the dying
> cpu, it will be migrated to another suitable online cpu by the scheduler.
> So the affinity setting by rcutree_offline_cpu() is redundant and can be
> eliminated.
> 
> Besides, this patch does an trival code rearrangement by unfolding
> rcutree_affinity_setting() into rcutree_online_cpu(), considering that
> the latter one is the only user of the former.
> 
> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> Cc: David Woodhouse <dwmw@amazon.co.uk>
> Cc: Frederic Weisbecker <frederic@kernel.org>
> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> Cc: Joel Fernandes <joel@joelfernandes.org>
> Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
> To: rcu@vger.kernel.org
> ---
>  kernel/rcu/tree.c | 14 +-------------
>  1 file changed, 1 insertion(+), 13 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 79aea7df4345..b90f6487fd45 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3978,16 +3978,6 @@ int rcutree_prepare_cpu(unsigned int cpu)
>  	return 0;
>  }
>  
> -/*
> - * Update RCU priority boot kthread affinity for CPU-hotplug changes.
> - */
> -static void rcutree_affinity_setting(unsigned int cpu, int outgoing)
> -{
> -	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> -
> -	rcu_boost_kthread_setaffinity(rdp->mynode, outgoing);
> -}

Good point, given how simple a wrapper this is and how little it is used,
getting rid of it does sound like a reasonable idea.

>  /*
>   * Near the end of the CPU-online process.  Pretty much all services
>   * enabled, and the CPU is now very much alive.
> @@ -4006,7 +3996,7 @@ int rcutree_online_cpu(unsigned int cpu)
>  	if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
>  		return 0; /* Too early in boot for scheduler work. */
>  	sync_sched_exp_online_cleanup(cpu);
> -	rcutree_affinity_setting(cpu, -1);
> +	rcu_boost_kthread_setaffinity(rdp->mynode, -1);
>  
>  	// Stop-machine done, so allow nohz_full to disable tick.
>  	tick_dep_clear(TICK_DEP_BIT_RCU);
> @@ -4029,8 +4019,6 @@ int rcutree_offline_cpu(unsigned int cpu)
>  	rnp->ffmask &= ~rdp->grpmask;
>  	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
>  
> -	rcutree_affinity_setting(cpu, cpu);

We do need to keep this one because the CPU is going away.

On the other hand, it might well be that we could get rid of the call
to rcutree_affinity_setting() in rcutree_dead_cpu().

Or am I missing something subtle here?

							Thanx, Paul

> -
>  	// nohz_full CPUs need the tick for stop-machine to work quickly
>  	tick_dep_set(TICK_DEP_BIT_RCU);
>  	return 0;
> -- 
> 2.31.1
>
Pingfan Liu Sept. 7, 2022, 1:40 a.m. UTC | #2
On Tue, Sep 06, 2022 at 10:24:41AM -0700, Paul E. McKenney wrote:
> On Mon, Sep 05, 2022 at 11:38:50AM +0800, Pingfan Liu wrote:
> > At present, during the cpu teardown, rcu_boost_kthread_setaffinity() is
> > called twice. Firstly by rcutree_offline_cpu(), then by
> > rcutree_dead_cpu() as the CPUHP_RCUTREE_PREP  cpuhp_step callback.
> > 
> > From the scheduler's perspective, a bit in cpu_online_mask means that the cpu
> > is visible to the scheduler. Furthermore, a bit in cpu_active_mask
> > means that the cpu is suitable as a migration destination.
> > 
> > Now turning back to the case in rcu offlining. sched_cpu_deactivate()
> > has disabled the dying cpu as the migration destination before
> > rcutree_offline_cpu().  Furthermore, if the boost kthread is on the dying
> > cpu, it will be migrated to another suitable online cpu by the scheduler.
> > So the affinity setting by rcutree_offline_cpu() is redundant and can be
> > eliminated.
> > 
> > Besides, this patch does an trival code rearrangement by unfolding
> > rcutree_affinity_setting() into rcutree_online_cpu(), considering that
> > the latter one is the only user of the former.
> > 
> > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > Cc: David Woodhouse <dwmw@amazon.co.uk>
> > Cc: Frederic Weisbecker <frederic@kernel.org>
> > Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
> > Cc: Josh Triplett <josh@joshtriplett.org>
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> > Cc: Joel Fernandes <joel@joelfernandes.org>
> > Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
> > To: rcu@vger.kernel.org
> > ---
> >  kernel/rcu/tree.c | 14 +-------------
> >  1 file changed, 1 insertion(+), 13 deletions(-)
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 79aea7df4345..b90f6487fd45 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -3978,16 +3978,6 @@ int rcutree_prepare_cpu(unsigned int cpu)
> >  	return 0;
> >  }
> >  
> > -/*
> > - * Update RCU priority boot kthread affinity for CPU-hotplug changes.
> > - */
> > -static void rcutree_affinity_setting(unsigned int cpu, int outgoing)
> > -{
> > -	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> > -
> > -	rcu_boost_kthread_setaffinity(rdp->mynode, outgoing);
> > -}
> 
> Good point, tiven how simple a wrapper this is and how little it is used,
> getting rid of it does sound like a reasonable idea.
> 
> >  /*
> >   * Near the end of the CPU-online process.  Pretty much all services
> >   * enabled, and the CPU is now very much alive.
> > @@ -4006,7 +3996,7 @@ int rcutree_online_cpu(unsigned int cpu)
> >  	if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
> >  		return 0; /* Too early in boot for scheduler work. */
> >  	sync_sched_exp_online_cleanup(cpu);
> > -	rcutree_affinity_setting(cpu, -1);
> > +	rcu_boost_kthread_setaffinity(rdp->mynode, -1);
> >  
> >  	// Stop-machine done, so allow nohz_full to disable tick.
> >  	tick_dep_clear(TICK_DEP_BIT_RCU);
> > @@ -4029,8 +4019,6 @@ int rcutree_offline_cpu(unsigned int cpu)
> >  	rnp->ffmask &= ~rdp->grpmask;
> >  	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> >  
> > -	rcutree_affinity_setting(cpu, cpu);
> 
> We do need to keep this one because the CPU is going away.
> 
> One the other hand, it might well be that we could get rid of the call
> to rcutree_affinity_setting() in rcutree_dead_cpu().
> 
> Or am I missing something subtle here?
> 

Oops, I think I need to rephrase my commit log to describe this nuance.
The key point is whether ->qsmaskinitnext is stable.

The teardown code path on a single dying cpu looks like:

sched_cpu_deactivate() // prevent this dying cpu from being a migration dst.

rcutree_offline_cpu()  // as a result, the scheduler core will take care
                       // of the transient affinity mismatch until
                       // rcutree_dead_cpu(). (I think this also holds for
                       // concurrent offlining)

rcu_report_dead()      // running on the dying cpu; clears its bit in ->qsmaskinitnext

rcutree_dead_cpu()     // running on the initiator (an initiator cpu will
                       // execute this function for each dying cpu)
                       // At this point, ->qsmaskinitnext reflects the
                       // offlining, and the affinity can be set correctly.

Sorry, my commit log emphasized the first part but forgot to mention
->qsmaskinitnext.
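
To make the dependency on ->qsmaskinitnext concrete, the affinity rebuild
boils down to something like the sketch below (simplified for illustration,
not the exact tree_plugin.h code; the empty-mask fallback is left out):

static void boost_affinity_sketch(struct rcu_node *rnp, int outgoingcpu)
{
	struct task_struct *t = rnp->boost_kthread_task;
	cpumask_var_t cm;
	int cpu;

	if (!t || !zalloc_cpumask_var(&cm, GFP_KERNEL))
		return;
	/* Collect the node's still-online CPUs, minus the outgoing one. */
	for_each_leaf_node_possible_cpu(rnp, cpu)
		if ((READ_ONCE(rnp->qsmaskinitnext) & leaf_node_cpu_bit(rnp, cpu)) &&
		    cpu != outgoingcpu)
			cpumask_set_cpu(cpu, cm);
	/* Only correct once rcu_report_dead() has cleared the dying CPU's bit. */
	set_cpus_allowed_ptr(t, cm);
	free_cpumask_var(cm);
}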


Does this justification stand?

Thanks,

	Pingfan
Paul E. McKenney Sept. 10, 2022, 8:36 p.m. UTC | #3
On Wed, Sep 07, 2022 at 09:40:29AM +0800, Pingfan Liu wrote:
> On Tue, Sep 06, 2022 at 10:24:41AM -0700, Paul E. McKenney wrote:
> > On Mon, Sep 05, 2022 at 11:38:50AM +0800, Pingfan Liu wrote:
> > > At present, during the cpu teardown, rcu_boost_kthread_setaffinity() is
> > > called twice. Firstly by rcutree_offline_cpu(), then by
> > > rcutree_dead_cpu() as the CPUHP_RCUTREE_PREP  cpuhp_step callback.
> > > 
> > > From the scheduler's perspective, a bit in cpu_online_mask means that the cpu
> > > is visible to the scheduler. Furthermore, a bit in cpu_active_mask
> > > means that the cpu is suitable as a migration destination.
> > > 
> > > Now turning back to the case in rcu offlining. sched_cpu_deactivate()
> > > has disabled the dying cpu as the migration destination before
> > > rcutree_offline_cpu().  Furthermore, if the boost kthread is on the dying
> > > cpu, it will be migrated to another suitable online cpu by the scheduler.
> > > So the affinity setting by rcutree_offline_cpu() is redundant and can be
> > > eliminated.
> > > 
> > > Besides, this patch does an trival code rearrangement by unfolding
> > > rcutree_affinity_setting() into rcutree_online_cpu(), considering that
> > > the latter one is the only user of the former.
> > > 
> > > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > > Cc: David Woodhouse <dwmw@amazon.co.uk>
> > > Cc: Frederic Weisbecker <frederic@kernel.org>
> > > Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
> > > Cc: Josh Triplett <josh@joshtriplett.org>
> > > Cc: Steven Rostedt <rostedt@goodmis.org>
> > > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > > Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> > > Cc: Joel Fernandes <joel@joelfernandes.org>
> > > Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
> > > To: rcu@vger.kernel.org
> > > ---
> > >  kernel/rcu/tree.c | 14 +-------------
> > >  1 file changed, 1 insertion(+), 13 deletions(-)
> > > 
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 79aea7df4345..b90f6487fd45 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -3978,16 +3978,6 @@ int rcutree_prepare_cpu(unsigned int cpu)
> > >  	return 0;
> > >  }
> > >  
> > > -/*
> > > - * Update RCU priority boot kthread affinity for CPU-hotplug changes.
> > > - */
> > > -static void rcutree_affinity_setting(unsigned int cpu, int outgoing)
> > > -{
> > > -	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> > > -
> > > -	rcu_boost_kthread_setaffinity(rdp->mynode, outgoing);
> > > -}
> > 
> > Good point, tiven how simple a wrapper this is and how little it is used,
> > getting rid of it does sound like a reasonable idea.
> > 
> > >  /*
> > >   * Near the end of the CPU-online process.  Pretty much all services
> > >   * enabled, and the CPU is now very much alive.
> > > @@ -4006,7 +3996,7 @@ int rcutree_online_cpu(unsigned int cpu)
> > >  	if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
> > >  		return 0; /* Too early in boot for scheduler work. */
> > >  	sync_sched_exp_online_cleanup(cpu);
> > > -	rcutree_affinity_setting(cpu, -1);
> > > +	rcu_boost_kthread_setaffinity(rdp->mynode, -1);
> > >  
> > >  	// Stop-machine done, so allow nohz_full to disable tick.
> > >  	tick_dep_clear(TICK_DEP_BIT_RCU);
> > > @@ -4029,8 +4019,6 @@ int rcutree_offline_cpu(unsigned int cpu)
> > >  	rnp->ffmask &= ~rdp->grpmask;
> > >  	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > >  
> > > -	rcutree_affinity_setting(cpu, cpu);
> > 
> > We do need to keep this one because the CPU is going away.
> > 
> > One the other hand, it might well be that we could get rid of the call
> > to rcutree_affinity_setting() in rcutree_dead_cpu().
> > 
> > Or am I missing something subtle here?
> 
> Oops, I think I need to rephrase my commit log to describe this nuance.
> The keypoint is whether ->qsmaskinitnext is stable.
> 
> The teardown code path on a single dying cpu looks like:
> 
> sched_cpu_deactivate() // prevent this dying cpu as a migration dst.

Suppose that this was the last CPU that the task was permitted to
run on.

> rcutree_offline_cpu()  // as a result, the scheduler core will take care
>                        // of the transient affinity mismatching until
> 		       // rcutree_dead_cpu(). (I think it also stands in
> 		       // the concurrent offlining)
> 
> rcu_report_dead()      // running on the dying cpu, and clear its bit in ->qsmaskinitnext
> 
> rcutree_dead_cpu()     // running on the initiator (a initiator cpu will
>                        // execute this function for each dying cpu)
> 		       // At this point, ->qsmaskinitnext reflects the
> 		       // offlining, and the affinity can get right.
> 
> Sorry that my commit log had emphasized on the first part, but forgot to
> mention the ->qsmaskinitnext.
> 
> 
> Does this justification stand?

We should ensure that the task's permitted set of CPUs always contains
at least one online CPU.  Unless I am missing something, your suggested
change will sometimes end up with the task having no online CPUs in
its mask.

So what am I missing here?

							Thanx, Paul
Pingfan Liu Sept. 14, 2022, 10:12 a.m. UTC | #4
On Sat, Sep 10, 2022 at 01:36:22PM -0700, Paul E. McKenney wrote:
> On Wed, Sep 07, 2022 at 09:40:29AM +0800, Pingfan Liu wrote:
> > On Tue, Sep 06, 2022 at 10:24:41AM -0700, Paul E. McKenney wrote:
> > > On Mon, Sep 05, 2022 at 11:38:50AM +0800, Pingfan Liu wrote:
> > > > At present, during the cpu teardown, rcu_boost_kthread_setaffinity() is
> > > > called twice. Firstly by rcutree_offline_cpu(), then by
> > > > rcutree_dead_cpu() as the CPUHP_RCUTREE_PREP  cpuhp_step callback.
> > > > 
> > > > From the scheduler's perspective, a bit in cpu_online_mask means that the cpu
> > > > is visible to the scheduler. Furthermore, a bit in cpu_active_mask
> > > > means that the cpu is suitable as a migration destination.
> > > > 
> > > > Now turning back to the case in rcu offlining. sched_cpu_deactivate()
> > > > has disabled the dying cpu as the migration destination before
> > > > rcutree_offline_cpu().  Furthermore, if the boost kthread is on the dying
> > > > cpu, it will be migrated to another suitable online cpu by the scheduler.
> > > > So the affinity setting by rcutree_offline_cpu() is redundant and can be
> > > > eliminated.
> > > > 
> > > > Besides, this patch does an trival code rearrangement by unfolding
> > > > rcutree_affinity_setting() into rcutree_online_cpu(), considering that
> > > > the latter one is the only user of the former.
> > > > 
> > > > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > > > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > > > Cc: David Woodhouse <dwmw@amazon.co.uk>
> > > > Cc: Frederic Weisbecker <frederic@kernel.org>
> > > > Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
> > > > Cc: Josh Triplett <josh@joshtriplett.org>
> > > > Cc: Steven Rostedt <rostedt@goodmis.org>
> > > > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > > > Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> > > > Cc: Joel Fernandes <joel@joelfernandes.org>
> > > > Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
> > > > To: rcu@vger.kernel.org
> > > > ---
> > > >  kernel/rcu/tree.c | 14 +-------------
> > > >  1 file changed, 1 insertion(+), 13 deletions(-)
> > > > 
> > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > index 79aea7df4345..b90f6487fd45 100644
> > > > --- a/kernel/rcu/tree.c
> > > > +++ b/kernel/rcu/tree.c
> > > > @@ -3978,16 +3978,6 @@ int rcutree_prepare_cpu(unsigned int cpu)
> > > >  	return 0;
> > > >  }
> > > >  
> > > > -/*
> > > > - * Update RCU priority boot kthread affinity for CPU-hotplug changes.
> > > > - */
> > > > -static void rcutree_affinity_setting(unsigned int cpu, int outgoing)
> > > > -{
> > > > -	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> > > > -
> > > > -	rcu_boost_kthread_setaffinity(rdp->mynode, outgoing);
> > > > -}
> > > 
> > > Good point, tiven how simple a wrapper this is and how little it is used,
> > > getting rid of it does sound like a reasonable idea.
> > > 
> > > >  /*
> > > >   * Near the end of the CPU-online process.  Pretty much all services
> > > >   * enabled, and the CPU is now very much alive.
> > > > @@ -4006,7 +3996,7 @@ int rcutree_online_cpu(unsigned int cpu)
> > > >  	if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
> > > >  		return 0; /* Too early in boot for scheduler work. */
> > > >  	sync_sched_exp_online_cleanup(cpu);
> > > > -	rcutree_affinity_setting(cpu, -1);
> > > > +	rcu_boost_kthread_setaffinity(rdp->mynode, -1);
> > > >  
> > > >  	// Stop-machine done, so allow nohz_full to disable tick.
> > > >  	tick_dep_clear(TICK_DEP_BIT_RCU);
> > > > @@ -4029,8 +4019,6 @@ int rcutree_offline_cpu(unsigned int cpu)
> > > >  	rnp->ffmask &= ~rdp->grpmask;
> > > >  	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > > >  
> > > > -	rcutree_affinity_setting(cpu, cpu);
> > > 
> > > We do need to keep this one because the CPU is going away.
> > > 
> > > One the other hand, it might well be that we could get rid of the call
> > > to rcutree_affinity_setting() in rcutree_dead_cpu().
> > > 
> > > Or am I missing something subtle here?
> > 
> > Oops, I think I need to rephrase my commit log to describe this nuance.
> > The keypoint is whether ->qsmaskinitnext is stable.
> > 
> > The teardown code path on a single dying cpu looks like:
> > 
> > sched_cpu_deactivate() // prevent this dying cpu as a migration dst.
> 
> Suppose that this was the last CPU that the task was permitted to
> run on.
> 

I get your concern; please see the comment below.

> > rcutree_offline_cpu()  // as a result, the scheduler core will take care
> >                        // of the transient affinity mismatching until
> > 		       // rcutree_dead_cpu(). (I think it also stands in
> > 		       // the concurrent offlining)
> > 
> > rcu_report_dead()      // running on the dying cpu, and clear its bit in ->qsmaskinitnext
> > 
> > rcutree_dead_cpu()     // running on the initiator (a initiator cpu will
> >                        // execute this function for each dying cpu)
> > 		       // At this point, ->qsmaskinitnext reflects the
> > 		       // offlining, and the affinity can get right.
> > 
> > Sorry that my commit log had emphasized on the first part, but forgot to
> > mention the ->qsmaskinitnext.
> > 
> > 
> > Does this justification stand?
> 
> We should ensure that the task's permitted set of CPUs always contained
> at least one online CPU.  Unless I am missing something, your suggested
> change will sometimes end up with the task having no online CPUs in
> its mask.
> 

Actually, the scheduler will pick one online cpu for the boost kthread.

On the last cpu, the boost kthread is subject to migration, and is pushed away by
__balance_push_cpu_stop()
{
        if (task_rq(p) == rq && task_on_rq_queued(p)) {
                cpu = select_fallback_rq(rq->cpu, p);
                rq = __migrate_task(rq, &rf, p, cpu);
        }
}

Now, turning to select_fallback_rq(): inside that function, if
p->cpus_ptr has no suitable cpus, the cpuset or possible cases will widen the
mask to the task's cpuset or to cpu_possible_mask, so finally a valid cpu
is picked. (I have not found where an online cpu is guaranteed there; the
trick must hide elsewhere. But the result is the same as in v5.16, where
rcu_boost_kthread_setaffinity() did cpumask_setall(cm).)
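
For reference, the escalation inside select_fallback_rq() roughly works like
the sketch below (a paraphrase for illustration only, not the actual
kernel/sched/core.c code):

static int select_fallback_rq_sketch(struct task_struct *p)
{
	enum { cpuset, possible, fail } state = cpuset;
	int dest_cpu;

	for (;;) {
		/* Any allowed and still-active CPU left for p? */
		for_each_cpu(dest_cpu, p->cpus_ptr)
			if (cpu_active(dest_cpu))
				return dest_cpu;

		/* Nothing usable: widen the task's mask and retry. */
		switch (state) {
		case cpuset:		/* fall back to the task's cpuset */
			cpuset_cpus_allowed_fallback(p);
			state = possible;
			break;
		case possible:		/* last resort: all possible CPUs */
			do_set_cpus_allowed(p, cpu_possible_mask);
			state = fail;
			break;
		case fail:
			BUG();
		}
	}
}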

But now, in 6.0, that has changed to housekeeping_cpumask(HK_TYPE_RCU),
which calls for more careful thought. Is there a cpuset for housekeeping
so that select_fallback_rq() can pick a cpu from it?

To Frederic: could you shed some light here? Is it worth introducing
such a cpuset so that select_fallback_rq() can pick a cpu from the
housekeeping set in this case?


Anyway, as a last resort, since TICK_DEP_BIT_RCU already demands awareness
of the concurrent rcutree_offline_cpu(), the affinity setting could be done
there if select_fallback_rq() cannot do the job.


Thanks,

	Pingfan
Frederic Weisbecker Sept. 14, 2022, 12:27 p.m. UTC | #5
On Wed, Sep 14, 2022 at 06:12:27PM +0800, Pingfan Liu wrote:
> On Sat, Sep 10, 2022 at 01:36:22PM -0700, Paul E. McKenney wrote:
> Actually, the scheduler will pick up one online cpu for the boost
> thread.
> 
> On the last cpu, boost is subject to migration, and is pushed away by
> __balance_push_cpu_stop()
> {
>         if (task_rq(p) == rq && task_on_rq_queued(p)) {
>                 cpu = select_fallback_rq(rq->cpu, p);
>                 rq = __migrate_task(rq, &rf, p, cpu);
>         }
> }
> 
> Now, turning to select_fallback_rq(), inside that function, if
> p->cpus_ptr has no suitable cpus, then case cpuset or possible will make it
> point to cpuset or cpu_possible_mask.  So finally, there is a valid cpu
> is picked up. (I have not found the online cpu there. The trick should hide
> elsewhere. But that rendered the same result as in v5.16, where
> rcu_boost_kthread_setaffinity()->cpumask_setall(cm))
> 
> But now, in 6.0, it changes into housekeeping_cpumask(HK_TYPE_RCU),
> which appeals for a more careful thinking.
> If there is a cpuset for housekeeping so that select_fallback_rq() can
> pick up one from that cpuset?
> 
> To Frederic, could you show some light here and is it worth introducing
> such a cpuset so that select_fallback_rq() can pick up a cpu in
> housekeeping set in this case?
> 
> 
> Anyway, as the last resort, TICK_DEP_BIT_RCU has already appealed for
> the awareness of the concurrent rcutree_offline_cpu(), the affinity
> setting could be done there if select_fallback_rq() can not work.

Here is the way I understand it: suppose the outgoing CPU was the last online
one in this rcu node. In this case the migration performed in
sched_cpu_deactivate() will find only one CPU in p->cpus_mask, but that CPU is
now out of cpu_active_mask, so the task is affined by default to cpu_possible_mask.

So there is a short window between sched_cpu_deactivate() and
rcutree_offline_cpu() where the task may run on all cpu_possible_mask. Does it
matter? Maybe, I don't know...

After that we reach rcutree_offline_cpu() -> rcu_boost_kthread_setaffinity(),
which fails to fill the cpumask since the node is now empty. As the resulting
cpumask is empty, it is filled as a last resort with the HK_TYPE_RCU housekeeping
set, which has to have at least 1 online CPU (nohz_full constraint). So the task
is finally affine to that housekeeping set.
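
In code terms, that last-resort path looks roughly like the sketch below
(condensed for illustration, not the exact tree_plugin.h code;
node_online_mask stands in for a cpumask built from the node's
->qsmaskinitnext):

static void boost_affinity_last_resort(struct task_struct *t,
				       const struct cpumask *node_online_mask)
{
	cpumask_var_t cm;

	if (!zalloc_cpumask_var(&cm, GFP_KERNEL))
		return;
	cpumask_and(cm, node_online_mask, housekeeping_cpumask(HK_TYPE_RCU));
	if (cpumask_empty(cm))	/* node has no online CPU left */
		cpumask_copy(cm, housekeeping_cpumask(HK_TYPE_RCU));
	/* HK_TYPE_RCU always has at least one online CPU (nohz_full constraint). */
	set_cpus_allowed_ptr(t, cm);
	free_cpumask_var(cm);
}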

Like Paul said, I'd rather indeed remove the rcu_boost_kthread_setaffinity()
call from rcutree_dead_cpu() as this one looks useless.

Thanks.




> 
> Thanks,
> 
> 	Pingfan
Pingfan Liu Sept. 14, 2022, 1:34 p.m. UTC | #6
On Wed, Sep 14, 2022 at 8:27 PM Frederic Weisbecker <frederic@kernel.org> wrote:
>
> On Wed, Sep 14, 2022 at 06:12:27PM +0800, Pingfan Liu wrote:
> > On Sat, Sep 10, 2022 at 01:36:22PM -0700, Paul E. McKenney wrote:
> > Actually, the scheduler will pick up one online cpu for the boost
> > thread.
> >
> > On the last cpu, boost is subject to migration, and is pushed away by
> > __balance_push_cpu_stop()
> > {
> >         if (task_rq(p) == rq && task_on_rq_queued(p)) {
> >                 cpu = select_fallback_rq(rq->cpu, p);
> >                 rq = __migrate_task(rq, &rf, p, cpu);
> >         }
> > }
> >
> > Now, turning to select_fallback_rq(), inside that function, if
> > p->cpus_ptr has no suitable cpus, then case cpuset or possible will make it
> > point to cpuset or cpu_possible_mask.  So finally, there is a valid cpu
> > is picked up. (I have not found the online cpu there. The trick should hide
> > elsewhere. But that rendered the same result as in v5.16, where
> > rcu_boost_kthread_setaffinity()->cpumask_setall(cm))
> >
> > But now, in 6.0, it changes into housekeeping_cpumask(HK_TYPE_RCU),
> > which appeals for a more careful thinking.
> > If there is a cpuset for housekeeping so that select_fallback_rq() can
> > pick up one from that cpuset?
> >
> > To Frederic, could you show some light here and is it worth introducing
> > such a cpuset so that select_fallback_rq() can pick up a cpu in
> > housekeeping set in this case?
> >
> >
> > Anyway, as the last resort, TICK_DEP_BIT_RCU has already appealed for
> > the awareness of the concurrent rcutree_offline_cpu(), the affinity
> > setting could be done there if select_fallback_rq() can not work.
>
> Here is the way I understand it: suppose the outgoing CPU was the last online
> one in this rcu node, in this case the migration performed in
> sched_cpu_deactivate() will find only one CPU in p->cpus_mask, but that CPU is
> now out of cpu_active_mask, so the task is affined by default to cpu_possible_mask.
>
> So there is a short window between sched_cpu_deactivate() and
> rcutree_offline_cpu() where the task may run on all cpu_possible_mask. Does it
> matter? Maybe, I don't know...
>

OK, then it is pointless to introduce a housekeeping cpuset.

> After that we reach rcutree_offline_cpu() -> rcu_boost_kthread_setaffinity()
> that fails to fill the cpumask since the node is now empty. As the resulting
> cpumask is empty, it fills it as a last resort to the HK_TYPE_RCU housekeeping
> set which has to have at least 1 online CPU (nohz_full constraint). So the task
> is finally affine to that housekeeping set.
>

Yes, exactly.

> Like Paul said, I'd rather indeed remove the rcu_boost_kthread_setaffinity()
> call from rcutree_dead_cpu() as this one looks useless.
>

OK, I will take that approach.

Thanks to both of you for the generous help. I will post v2 soon.


Thanks,

    Pingfan

Patch

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 79aea7df4345..b90f6487fd45 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3978,16 +3978,6 @@  int rcutree_prepare_cpu(unsigned int cpu)
 	return 0;
 }
 
-/*
- * Update RCU priority boot kthread affinity for CPU-hotplug changes.
- */
-static void rcutree_affinity_setting(unsigned int cpu, int outgoing)
-{
-	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
-
-	rcu_boost_kthread_setaffinity(rdp->mynode, outgoing);
-}
-
 /*
  * Near the end of the CPU-online process.  Pretty much all services
  * enabled, and the CPU is now very much alive.
@@ -4006,7 +3996,7 @@  int rcutree_online_cpu(unsigned int cpu)
 	if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
 		return 0; /* Too early in boot for scheduler work. */
 	sync_sched_exp_online_cleanup(cpu);
-	rcutree_affinity_setting(cpu, -1);
+	rcu_boost_kthread_setaffinity(rdp->mynode, -1);
 
 	// Stop-machine done, so allow nohz_full to disable tick.
 	tick_dep_clear(TICK_DEP_BIT_RCU);
@@ -4029,8 +4019,6 @@  int rcutree_offline_cpu(unsigned int cpu)
 	rnp->ffmask &= ~rdp->grpmask;
 	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 
-	rcutree_affinity_setting(cpu, cpu);
-
 	// nohz_full CPUs need the tick for stop-machine to work quickly
 	tick_dep_set(TICK_DEP_BIT_RCU);
 	return 0;