
[PATCHv2,2/3] rcu: Resort to cpu_dying_mask for affinity when offlining

Message ID 20220915055825.21525-3-kernelfans@gmail.com (mailing list archive)
State New, archived
Series rcu: Enhance the capability to cope with concurrent cpu offlining/onlining

Commit Message

Pingfan Liu Sept. 15, 2022, 5:58 a.m. UTC
During offlining, concurrent invocations of rcutree_offline_cpu() cannot
be aware of each other through ->qsmaskinitnext.  But cpu_dying_mask
carries that information at this point and can be utilized.

Besides, this includes a trivial change that removes the redundant call
to rcu_boost_kthread_setaffinity() in rcutree_dead_cpu(), since
rcutree_offline_cpu() can fully serve that purpose.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
To: rcu@vger.kernel.org
---
 kernel/rcu/tree.c        | 2 --
 kernel/rcu/tree_plugin.h | 6 ++++++
 2 files changed, 6 insertions(+), 2 deletions(-)

Comments

Frederic Weisbecker Sept. 16, 2022, 2:23 p.m. UTC | #1
On Thu, Sep 15, 2022 at 01:58:24PM +0800, Pingfan Liu wrote:
> During offlining, the concurrent rcutree_offline_cpu() can not be aware
> of each other through ->qsmaskinitnext.  But cpu_dying_mask carries such
> information at that point and can be utilized.
> 
> Besides, a trivial change which removes the redudant call to
> rcu_boost_kthread_setaffinity() in rcutree_dead_cpu() since
> rcutree_offline_cpu() can fully serve that purpose.
> 
> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> Cc: David Woodhouse <dwmw@amazon.co.uk>
> Cc: Frederic Weisbecker <frederic@kernel.org>
> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> Cc: Joel Fernandes <joel@joelfernandes.org>
> Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
> To: rcu@vger.kernel.org
> ---
>  kernel/rcu/tree.c        | 2 --
>  kernel/rcu/tree_plugin.h | 6 ++++++
>  2 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 79aea7df4345..8a829b64f5b2 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -2169,8 +2169,6 @@ int rcutree_dead_cpu(unsigned int cpu)
>  		return 0;
>  
>  	WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus - 1);
> -	/* Adjust any no-longer-needed kthreads. */
> -	rcu_boost_kthread_setaffinity(rnp, -1);
>  	// Stop-machine done, so allow nohz_full to disable tick.
>  	tick_dep_clear(TICK_DEP_BIT_RCU);
>  	return 0;

I would suggest making this a separate change, for bisectability and
readability.

> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index ef6d3ae239b9..e5afc63bd97f 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -1243,6 +1243,12 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
>  		    cpu != outgoingcpu)
>  			cpumask_set_cpu(cpu, cm);
>  	cpumask_and(cm, cm, housekeeping_cpumask(HK_TYPE_RCU));
> +	/*
> +	 * For concurrent offlining, bit of qsmaskinitnext is not cleared yet.
> +	 * So resort to cpu_dying_mask, whose changes has already been visible.
> +	 */
> +	if (outgoingcpu != -1)
> +		cpumask_andnot(cm, cm, cpu_dying_mask);

I'm not sure how the infrastructure changes in your concurrent down patchset,
but can cpu_dying_mask concurrently change at this stage?

Thanks.

>  	if (cpumask_empty(cm))
>  		cpumask_copy(cm, housekeeping_cpumask(HK_TYPE_RCU));
>  	set_cpus_allowed_ptr(t, cm);
> -- 
> 2.31.1
>
Pingfan Liu Sept. 19, 2022, 4:33 a.m. UTC | #2
On Fri, Sep 16, 2022 at 10:24 PM Frederic Weisbecker
<frederic@kernel.org> wrote:
>
> On Thu, Sep 15, 2022 at 01:58:24PM +0800, Pingfan Liu wrote:
> > During offlining, the concurrent rcutree_offline_cpu() can not be aware
> > of each other through ->qsmaskinitnext.  But cpu_dying_mask carries such
> > information at that point and can be utilized.
> >
> > Besides, a trivial change which removes the redudant call to
> > rcu_boost_kthread_setaffinity() in rcutree_dead_cpu() since
> > rcutree_offline_cpu() can fully serve that purpose.
> >
> > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > Cc: David Woodhouse <dwmw@amazon.co.uk>
> > Cc: Frederic Weisbecker <frederic@kernel.org>
> > Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
> > Cc: Josh Triplett <josh@joshtriplett.org>
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> > Cc: Joel Fernandes <joel@joelfernandes.org>
> > Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
> > To: rcu@vger.kernel.org
> > ---
> >  kernel/rcu/tree.c        | 2 --
> >  kernel/rcu/tree_plugin.h | 6 ++++++
> >  2 files changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 79aea7df4345..8a829b64f5b2 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -2169,8 +2169,6 @@ int rcutree_dead_cpu(unsigned int cpu)
> >               return 0;
> >
> >       WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus - 1);
> > -     /* Adjust any no-longer-needed kthreads. */
> > -     rcu_boost_kthread_setaffinity(rnp, -1);
> >       // Stop-machine done, so allow nohz_full to disable tick.
> >       tick_dep_clear(TICK_DEP_BIT_RCU);
> >       return 0;
>
> I would suggest to make this a separate change, for bisectability and
> readability.
>

OK, I will.

> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index ef6d3ae239b9..e5afc63bd97f 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -1243,6 +1243,12 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
> >                   cpu != outgoingcpu)
> >                       cpumask_set_cpu(cpu, cm);
> >       cpumask_and(cm, cm, housekeeping_cpumask(HK_TYPE_RCU));
> > +     /*
> > +      * For concurrent offlining, bit of qsmaskinitnext is not cleared yet.
> > +      * So resort to cpu_dying_mask, whose changes has already been visible.
> > +      */
> > +     if (outgoingcpu != -1)
> > +             cpumask_andnot(cm, cm, cpu_dying_mask);
>
> I'm not sure how the infrastructure changes in your concurrent down patchset
> but can the cpu_dying_mask concurrently change at this stage?
>

The concurrent down patchset [1] extends cpu_down() so that an
initiator can tear down several CPUs in a batch and in parallel.

As the first step, every CPU to be torn down goes through
cpuhp_set_state(cpu, st, CPUHP_TEARDOWN_CPU); that is what sets its
bit in cpu_dying_mask [2].  Only then is the cpu hotplug kthread on
each teardown CPU kicked to work.  (Indeed, [2] has a bug, and I need
to fix it by using a separate loop to call cpuhp_kick_ap_work_async(cpu).)

At the outermost level, the cpu_maps_update_begin()/cpu_maps_update_done()
pair still prevents any new initiator from launching another concurrent
hot-add/remove operation.  So cpu_dying_mask stays stable during the
batched, concurrent teardown of the CPUs.

[1]: https://lore.kernel.org/all/20220822021520.6996-1-kernelfans@gmail.com/
[2]: https://lore.kernel.org/all/20220822021520.6996-4-kernelfans@gmail.com/
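
To illustrate, here is a minimal sketch of that two-phase ordering,
written as if inside kernel/cpu.c.  The batch interface and
cpuhp_kick_ap_work_async() come from the concurrent-down series above
and are not mainline APIs, so treat the names and details as
assumptions rather than the final code:

static int cpus_down_concurrent(const struct cpumask *batch)
{
	unsigned int cpu;

	/*
	 * Phase 1: mark every CPU in the batch as dying.  cpuhp_set_state()
	 * sets the CPU's bit in cpu_dying_mask (the real code also records
	 * the previous state for rollback), so once this loop finishes the
	 * mask covers the whole batch.
	 */
	for_each_cpu(cpu, batch) {
		struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);

		cpuhp_set_state(cpu, st, CPUHP_TEARDOWN_CPU);
	}

	/*
	 * Phase 2: only after cpu_dying_mask is fully populated, kick the
	 * per-CPU hotplug kthreads so the teardown callbacks (including
	 * rcutree_offline_cpu()) run in parallel.
	 */
	for_each_cpu(cpu, batch)
		cpuhp_kick_ap_work_async(cpu);

	return 0;
}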

Thanks,

    Pingfan
Frederic Weisbecker Sept. 19, 2022, 10:34 a.m. UTC | #3
On Mon, Sep 19, 2022 at 12:33:23PM +0800, Pingfan Liu wrote:
> On Fri, Sep 16, 2022 at 10:24 PM Frederic Weisbecker
> > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > index ef6d3ae239b9..e5afc63bd97f 100644
> > > --- a/kernel/rcu/tree_plugin.h
> > > +++ b/kernel/rcu/tree_plugin.h
> > > @@ -1243,6 +1243,12 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
> > >                   cpu != outgoingcpu)
> > >                       cpumask_set_cpu(cpu, cm);
> > >       cpumask_and(cm, cm, housekeeping_cpumask(HK_TYPE_RCU));
> > > +     /*
> > > +      * For concurrent offlining, bit of qsmaskinitnext is not cleared yet.
> > > +      * So resort to cpu_dying_mask, whose changes has already been visible.
> > > +      */
> > > +     if (outgoingcpu != -1)
> > > +             cpumask_andnot(cm, cm, cpu_dying_mask);
> >
> > I'm not sure how the infrastructure changes in your concurrent down patchset
> > but can the cpu_dying_mask concurrently change at this stage?
> >
> 
> For the concurrent down patchset [1], it extends the cpu_down()
> capability to let an initiator to tear down several cpus in a batch
> and in parallel.
> 
> At the first step, all cpus to be torn down should experience
> cpuhp_set_state(cpu, st, CPUHP_TEARDOWN_CPU), by that way, they are
> set in the bitmap cpu_dying_mask [2]. Then the cpu hotplug kthread on
> each teardown cpu can be kicked to work. (Indeed, [2] has a bug, and I
> need to fix it by using another loop to call
> cpuhp_kick_ap_work_async(cpu);)

So if I understand correctly, there is a synchronization point for all
CPUs between cpuhp_set_state() and CPUHP_AP_RCUTREE_ONLINE?

And how about rollbacks through cpuhp_reset_state()?

Thanks.
Pingfan Liu Sept. 20, 2022, 3:16 a.m. UTC | #4
On Mon, Sep 19, 2022 at 12:34:32PM +0200, Frederic Weisbecker wrote:
> On Mon, Sep 19, 2022 at 12:33:23PM +0800, Pingfan Liu wrote:
> > On Fri, Sep 16, 2022 at 10:24 PM Frederic Weisbecker
> > > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > > index ef6d3ae239b9..e5afc63bd97f 100644
> > > > --- a/kernel/rcu/tree_plugin.h
> > > > +++ b/kernel/rcu/tree_plugin.h
> > > > @@ -1243,6 +1243,12 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
> > > >                   cpu != outgoingcpu)
> > > >                       cpumask_set_cpu(cpu, cm);
> > > >       cpumask_and(cm, cm, housekeeping_cpumask(HK_TYPE_RCU));
> > > > +     /*
> > > > +      * For concurrent offlining, bit of qsmaskinitnext is not cleared yet.
> > > > +      * So resort to cpu_dying_mask, whose changes has already been visible.
> > > > +      */
> > > > +     if (outgoingcpu != -1)
> > > > +             cpumask_andnot(cm, cm, cpu_dying_mask);
> > >
> > > I'm not sure how the infrastructure changes in your concurrent down patchset
> > > but can the cpu_dying_mask concurrently change at this stage?
> > >
> > 
> > For the concurrent down patchset [1], it extends the cpu_down()
> > capability to let an initiator to tear down several cpus in a batch
> > and in parallel.
> > 
> > At the first step, all cpus to be torn down should experience
> > cpuhp_set_state(cpu, st, CPUHP_TEARDOWN_CPU), by that way, they are
> > set in the bitmap cpu_dying_mask [2]. Then the cpu hotplug kthread on
> > each teardown cpu can be kicked to work. (Indeed, [2] has a bug, and I
> > need to fix it by using another loop to call
> > cpuhp_kick_ap_work_async(cpu);)
> 
> So if I understand correctly there is a synchronization point for all
> CPUs between cpuhp_set_state() and CPUHP_AP_RCUTREE_ONLINE ?
> 

Yes, your understanding is right.

> And how about rollbacks through cpuhp_reset_state() ?
> 

Originally, cpuhp_reset_state() was not considered in my fast kexec
reboot series, since at that point all devices have been shut down and
there is no way back; the rebooting just ventures to move on.

But yes, as you point out, cpuhp_reset_state() makes it a challenge to
keep cpu_dying_mask stable.

Consider the following order:
1. when offlining
  set_cpu_dying(true)
  rcutree_offline_cpu()
2. when rolling back
  set_cpu_dying(false)
  rcutree_online_cpu()

In both directions the dying mask is updated before the RCU routines
run, and rnp->boost_kthread_mutex can be used to build an ordering so
that the latest cpu_dying_mask is observed, as in [1/3]; see the sketch
below.
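
For reference, a condensed sketch of rcu_boost_kthread_setaffinity()
with this patch applied, showing where rnp->boost_kthread_mutex
provides that ordering.  It abbreviates the real function, so take the
details as an illustration rather than the exact code:

static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
{
	struct task_struct *t = rnp->boost_kthread_task;
	unsigned long mask;
	cpumask_var_t cm;
	int cpu;

	if (!t || !zalloc_cpumask_var(&cm, GFP_KERNEL))
		return;

	mutex_lock(&rnp->boost_kthread_mutex);
	mask = READ_ONCE(rnp->qsmaskinitnext);
	for_each_leaf_node_possible_cpu(rnp, cpu)
		if ((mask & leaf_node_cpu_bit(rnp, cpu)) && cpu != outgoingcpu)
			cpumask_set_cpu(cpu, cm);
	cpumask_and(cm, cm, housekeeping_cpumask(HK_TYPE_RCU));
	/*
	 * A concurrently offlining CPU may not have cleared its bit in
	 * ->qsmaskinitnext yet, but it did set its bit in cpu_dying_mask
	 * before reaching the RCU notifiers (step 1 above), so masking
	 * cpu_dying_mask out here cannot miss it.  A rollback (step 2)
	 * clears the dying bit and then rcutree_online_cpu() re-runs this
	 * function under the same mutex, restoring the CPU.
	 */
	if (outgoingcpu != -1)
		cpumask_andnot(cm, cm, cpu_dying_mask);
	if (cpumask_empty(cm))
		cpumask_copy(cm, housekeeping_cpumask(HK_TYPE_RCU));
	set_cpus_allowed_ptr(t, cm);
	mutex_unlock(&rnp->boost_kthread_mutex);
	free_cpumask_var(cm);
}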


Thanks,

	Pingfan
Frederic Weisbecker Sept. 20, 2022, 9 a.m. UTC | #5
On Tue, Sep 20, 2022 at 11:16:09AM +0800, Pingfan Liu wrote:
> On Mon, Sep 19, 2022 at 12:34:32PM +0200, Frederic Weisbecker wrote:
> > On Mon, Sep 19, 2022 at 12:33:23PM +0800, Pingfan Liu wrote:
> > > On Fri, Sep 16, 2022 at 10:24 PM Frederic Weisbecker
> > > > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > > > index ef6d3ae239b9..e5afc63bd97f 100644
> > > > > --- a/kernel/rcu/tree_plugin.h
> > > > > +++ b/kernel/rcu/tree_plugin.h
> > > > > @@ -1243,6 +1243,12 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
> > > > >                   cpu != outgoingcpu)
> > > > >                       cpumask_set_cpu(cpu, cm);
> > > > >       cpumask_and(cm, cm, housekeeping_cpumask(HK_TYPE_RCU));
> > > > > +     /*
> > > > > +      * For concurrent offlining, bit of qsmaskinitnext is not cleared yet.
> > > > > +      * So resort to cpu_dying_mask, whose changes has already been visible.
> > > > > +      */
> > > > > +     if (outgoingcpu != -1)
> > > > > +             cpumask_andnot(cm, cm, cpu_dying_mask);
> > > >
> > > > I'm not sure how the infrastructure changes in your concurrent down patchset
> > > > but can the cpu_dying_mask concurrently change at this stage?
> > > >
> > > 
> > > For the concurrent down patchset [1], it extends the cpu_down()
> > > capability to let an initiator to tear down several cpus in a batch
> > > and in parallel.
> > > 
> > > At the first step, all cpus to be torn down should experience
> > > cpuhp_set_state(cpu, st, CPUHP_TEARDOWN_CPU), by that way, they are
> > > set in the bitmap cpu_dying_mask [2]. Then the cpu hotplug kthread on
> > > each teardown cpu can be kicked to work. (Indeed, [2] has a bug, and I
> > > need to fix it by using another loop to call
> > > cpuhp_kick_ap_work_async(cpu);)
> > 
> > So if I understand correctly there is a synchronization point for all
> > CPUs between cpuhp_set_state() and CPUHP_AP_RCUTREE_ONLINE ?
> > 
> 
> Yes, your understanding is right.
> 
> > And how about rollbacks through cpuhp_reset_state() ?
> > 
> 
> Originally, cpuhp_reset_state() is not considered in my fast kexec
> reboot series since at that point, all devices have been shutdown and
> have no way to back. The rebooting just adventures to move on.
> 
> But yes as you point out, cpuhp_reset_state() throws a challenge to keep
> the stability of cpu_dying_mask.
> 
> Considering we have the following order.
> 1.
>   set_cpu_dying(true)
>   rcutree_offline_cpu()
> 2. when rollback
>   set_cpu_dying(false)
>   rcutree_online_cpu()
> 
> 
> The dying mask is stable before rcu routines, and
> rnp->boost_kthread_mutex can be used to build a order to access the
> latest cpu_dying_mask as in [1/3].

Ok thanks for the clarification!
Frederic Weisbecker Sept. 20, 2022, 9:38 a.m. UTC | #6
On Thu, Sep 15, 2022 at 01:58:24PM +0800, Pingfan Liu wrote:
> During offlining, the concurrent rcutree_offline_cpu() can not be aware
> of each other through ->qsmaskinitnext.  But cpu_dying_mask carries such
> information at that point and can be utilized.
> 
> Besides, a trivial change which removes the redudant call to
> rcu_boost_kthread_setaffinity() in rcutree_dead_cpu() since
> rcutree_offline_cpu() can fully serve that purpose.
> 
> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> Cc: David Woodhouse <dwmw@amazon.co.uk>
> Cc: Frederic Weisbecker <frederic@kernel.org>
> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> Cc: Joel Fernandes <joel@joelfernandes.org>
> Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
> To: rcu@vger.kernel.org
> ---
>  kernel/rcu/tree.c        | 2 --
>  kernel/rcu/tree_plugin.h | 6 ++++++
>  2 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 79aea7df4345..8a829b64f5b2 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -2169,8 +2169,6 @@ int rcutree_dead_cpu(unsigned int cpu)
>  		return 0;
>  
>  	WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus - 1);
> -	/* Adjust any no-longer-needed kthreads. */
> -	rcu_boost_kthread_setaffinity(rnp, -1);
>  	// Stop-machine done, so allow nohz_full to disable tick.
>  	tick_dep_clear(TICK_DEP_BIT_RCU);
>  	return 0;
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index ef6d3ae239b9..e5afc63bd97f 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -1243,6 +1243,12 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
>  		    cpu != outgoingcpu)
>  			cpumask_set_cpu(cpu, cm);
>  	cpumask_and(cm, cm, housekeeping_cpumask(HK_TYPE_RCU));
> +	/*
> +	 * For concurrent offlining, bit of qsmaskinitnext is not cleared yet.

For clarification, the comment could be:

           While concurrently offlining, rcu_report_dead() can race, making
           ->qsmaskinitnext unstable. So rely on cpu_dying_mask which is stable
           and already contains all the currently offlining CPUs.

Thanks!

> +	 * So resort to cpu_dying_mask, whose changes has already been visible.
> +	 */
> +	if (outgoingcpu != -1)
> +		cpumask_andnot(cm, cm, cpu_dying_mask);
>  	if (cpumask_empty(cm))
>  		cpumask_copy(cm, housekeeping_cpumask(HK_TYPE_RCU));
>  	set_cpus_allowed_ptr(t, cm);
> -- 
> 2.31.1
>
Pingfan Liu Sept. 21, 2022, 11:48 a.m. UTC | #7
On Tue, Sep 20, 2022 at 5:39 PM Frederic Weisbecker <frederic@kernel.org> wrote:
>
> On Thu, Sep 15, 2022 at 01:58:24PM +0800, Pingfan Liu wrote:
> > During offlining, the concurrent rcutree_offline_cpu() can not be aware
> > of each other through ->qsmaskinitnext.  But cpu_dying_mask carries such
> > information at that point and can be utilized.
> >
> > Besides, a trivial change which removes the redudant call to
> > rcu_boost_kthread_setaffinity() in rcutree_dead_cpu() since
> > rcutree_offline_cpu() can fully serve that purpose.
> >
> > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > Cc: David Woodhouse <dwmw@amazon.co.uk>
> > Cc: Frederic Weisbecker <frederic@kernel.org>
> > Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
> > Cc: Josh Triplett <josh@joshtriplett.org>
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> > Cc: Joel Fernandes <joel@joelfernandes.org>
> > Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
> > To: rcu@vger.kernel.org
> > ---
> >  kernel/rcu/tree.c        | 2 --
> >  kernel/rcu/tree_plugin.h | 6 ++++++
> >  2 files changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 79aea7df4345..8a829b64f5b2 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -2169,8 +2169,6 @@ int rcutree_dead_cpu(unsigned int cpu)
> >               return 0;
> >
> >       WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus - 1);
> > -     /* Adjust any no-longer-needed kthreads. */
> > -     rcu_boost_kthread_setaffinity(rnp, -1);
> >       // Stop-machine done, so allow nohz_full to disable tick.
> >       tick_dep_clear(TICK_DEP_BIT_RCU);
> >       return 0;
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index ef6d3ae239b9..e5afc63bd97f 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -1243,6 +1243,12 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
> >                   cpu != outgoingcpu)
> >                       cpumask_set_cpu(cpu, cm);
> >       cpumask_and(cm, cm, housekeeping_cpumask(HK_TYPE_RCU));
> > +     /*
> > +      * For concurrent offlining, bit of qsmaskinitnext is not cleared yet.
>
> For clarification, the comment could be:
>
>            While concurrently offlining, rcu_report_dead() can race, making
>            ->qsmaskinitnext unstable. So rely on cpu_dying_mask which is stable
>            and already contains all the currently offlining CPUs.
>

It is a neat description.

Thanks,

    Pingfan

Patch

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 79aea7df4345..8a829b64f5b2 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2169,8 +2169,6 @@  int rcutree_dead_cpu(unsigned int cpu)
 		return 0;
 
 	WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus - 1);
-	/* Adjust any no-longer-needed kthreads. */
-	rcu_boost_kthread_setaffinity(rnp, -1);
 	// Stop-machine done, so allow nohz_full to disable tick.
 	tick_dep_clear(TICK_DEP_BIT_RCU);
 	return 0;
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index ef6d3ae239b9..e5afc63bd97f 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1243,6 +1243,12 @@  static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
 		    cpu != outgoingcpu)
 			cpumask_set_cpu(cpu, cm);
 	cpumask_and(cm, cm, housekeeping_cpumask(HK_TYPE_RCU));
+	/*
+	 * For concurrent offlining, the ->qsmaskinitnext bit is not cleared yet.
+	 * So resort to cpu_dying_mask, whose changes are already visible.
+	 */
+	if (outgoingcpu != -1)
+		cpumask_andnot(cm, cm, cpu_dying_mask);
 	if (cpumask_empty(cm))
 		cpumask_copy(cm, housekeeping_cpumask(HK_TYPE_RCU));
 	set_cpus_allowed_ptr(t, cm);