diff mbox series

[1/6] xen/sched: call cpu_disable_scheduler() via cpu notifier

Message ID 20190318131155.29450-2-jgross@suse.com (mailing list archive)
State Superseded
Headers show
Series xen: simplify suspend/resume handling | expand

Commit Message

Jürgen Groß March 18, 2019, 1:11 p.m. UTC
cpu_disable_scheduler() is being called from __cpu_disable() today.
There is no need to call it on the cpu just being disabled, so use
the CPU_DEAD case of the cpu notifier chain.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/arch/x86/smpboot.c |  3 ---
 xen/common/schedule.c  | 12 +++++-------
 2 files changed, 5 insertions(+), 10 deletions(-)

Comments

Andrew Cooper March 27, 2019, 3:34 p.m. UTC | #1
On 18/03/2019 13:11, Juergen Gross wrote:
> cpu_disable_scheduler() is being called from __cpu_disable() today.
> There is no need to call it on the cpu just being disabled, so use
> the CPU_DEAD case of the cpu notifier chain.
>
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
>  xen/arch/x86/smpboot.c |  3 ---
>  xen/common/schedule.c  | 12 +++++-------
>  2 files changed, 5 insertions(+), 10 deletions(-)
>
> diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
> index 7d1226d7bc..b7a0a4a419 100644
> --- a/xen/arch/x86/smpboot.c
> +++ b/xen/arch/x86/smpboot.c
> @@ -1221,9 +1221,6 @@ void __cpu_disable(void)
>      cpumask_clear_cpu(cpu, &cpu_online_map);
>      fixup_irqs(&cpu_online_map, 1);
>      fixup_eoi();
> -
> -    if ( cpu_disable_scheduler(cpu) )
> -        BUG();
>  }

It looks like ARM needs an equivalent adjustment.

>  
>  void __cpu_die(unsigned int cpu)
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index 60755a631e..665747f247 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -773,8 +773,9 @@ void restore_vcpu_affinity(struct domain *d)
>  }
>  
>  /*
> - * This function is used by cpu_hotplug code from stop_machine context
> + * This function is used by cpu_hotplug code via cpu notifier chain
>   * and from cpupools to switch schedulers on a cpu.
> + * Caller must get domlist_read_lock.

s/get/hold/ ?

With at least the ARM side adjusted, Reviewed-by: Andrew Cooper
<andrew.cooper3@citrix.com>

>   */
>  int cpu_disable_scheduler(unsigned int cpu)
>  {
> @@ -789,12 +790,6 @@ int cpu_disable_scheduler(unsigned int cpu)
>      if ( c == NULL )
>          return ret;
>  
> -    /*
> -     * We'd need the domain RCU lock, but:
> -     *  - when we are called from cpupool code, it's acquired there already;
> -     *  - when we are called for CPU teardown, we're in stop-machine context,
> -     *    so that's not be a problem.
> -     */
>      for_each_domain_in_cpupool ( d, c )
>      {
>          for_each_vcpu ( d, v )
> @@ -1738,6 +1733,9 @@ static int cpu_schedule_callback(
>          rc = cpu_schedule_up(cpu);
>          break;
>      case CPU_DEAD:
> +        rcu_read_lock(&domlist_read_lock);
> +        cpu_disable_scheduler(cpu);
> +        rcu_read_unlock(&domlist_read_lock);
>          SCHED_OP(sched, deinit_pdata, sd->sched_priv, cpu);
>          /* Fallthrough */
>      case CPU_UP_CANCELED:
George Dunlap March 27, 2019, 3:35 p.m. UTC | #2
On 3/18/19 1:11 PM, Juergen Gross wrote:
> cpu_disable_scheduler() is being called from __cpu_disable() today.
> There is no need to call it on the cpu just being disabled, so use
> the CPU_DEAD case of the cpu notifier chain.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>

Scheduler change:

Acked-by: George Dunlap <george.dunlap@citrix.com>
Jan Beulich March 27, 2019, 4:22 p.m. UTC | #3
>>> On 18.03.19 at 14:11, <jgross@suse.com> wrote:
> @@ -1738,6 +1733,9 @@ static int cpu_schedule_callback(
>          rc = cpu_schedule_up(cpu);
>          break;
>      case CPU_DEAD:
> +        rcu_read_lock(&domlist_read_lock);
> +        cpu_disable_scheduler(cpu);
> +        rcu_read_unlock(&domlist_read_lock);
>          SCHED_OP(sched, deinit_pdata, sd->sched_priv, cpu);
>          /* Fallthrough */
>      case CPU_UP_CANCELED:

cpu_disable_scheduler() has a return value (and hence a means to
fail) - is ignoring it here really appropriate?

Also, while indeed (as the description says) there's no need to
run the function on the CPU itself, it's not obvious to me that
it's safe to run it outside of stop_machine() context. Or to be
more precise, it's not clear to me that leaving stop_machine()
context before the adjustments have been made won't lead to
problems (due to the gap between leaving that context and
acquiring the RCU lock). Could you clarify this in the
description, please (if it indeed is fine this way)?

Jan
Dario Faggioli March 27, 2019, 4:24 p.m. UTC | #4
On Mon, 2019-03-18 at 14:11 +0100, Juergen Gross wrote:
> cpu_disable_scheduler() is being called from __cpu_disable() today.
> There is no need to call it on the cpu just being disabled, so use
> the CPU_DEAD case of the cpu notifier chain.
> 
So, what do you mean by "There is no need to call it on the cpu just
being disabled"?

Because we still (even after this patch, I mean) call
cpu_disable_scheduler() on all non-boot CPUs, don't we? It's just that
right now we call it from __cpu_disable(); with the patch we call it
slightly later.

Another difference, it seems to me, is that right now we call
cpu_disable_scheduler() from stop-machine context, which I think is no
longer true with this patch. Perhaps the changelog could explain why
that is ok?

Regards,
Dario
Jürgen Groß March 27, 2019, 4:31 p.m. UTC | #5
On 27/03/2019 17:24, Dario Faggioli wrote:
> On Mon, 2019-03-18 at 14:11 +0100, Juergen Gross wrote:
>> cpu_disable_scheduler() is being called from __cpu_disable() today.
>> There is no need to call it on the cpu just being disabled, so use
>> the CPU_DEAD case of the cpu notifier chain.
>>
> So, what do you mean with "There is no need to call it on the cpu just
> being disabled"?
> 
> Because we still (even after this patch, I mean) call
> cpu_disable_scheduler() on all non-boot CPUs, aren't we? It's just that
> right now we call it from __cpu_disable(), with the patch we call it
> slightly later.

The CPU_DEAD notifier chain is called on the CPU requesting the other
one to go down (so on the boot CPU in the suspend case). So we call it
_for_ all non-boot CPUs on the boot CPU.

> And another difference looks to me to be that right now we call
> cpu_disable_scheduler() from stop-machine context, which I think is no
> longer true with this patch. Perhaps the changelog could tell why that
> is ok?

Okay, I'll add something.


Juergen
Jürgen Groß March 27, 2019, 4:45 p.m. UTC | #6
On 27/03/2019 17:22, Jan Beulich wrote:
>>>> On 18.03.19 at 14:11, <jgross@suse.com> wrote:
>> @@ -1738,6 +1733,9 @@ static int cpu_schedule_callback(
>>          rc = cpu_schedule_up(cpu);
>>          break;
>>      case CPU_DEAD:
>> +        rcu_read_lock(&domlist_read_lock);
>> +        cpu_disable_scheduler(cpu);
>> +        rcu_read_unlock(&domlist_read_lock);
>>          SCHED_OP(sched, deinit_pdata, sd->sched_priv, cpu);
>>          /* Fallthrough */
>>      case CPU_UP_CANCELED:
> 
> cpu_disable_scheduler() has a return value (and hence a means to
> fail) - is ignoring it here really appropriate?

You are right, I should handle those cases in cpu_disable_scheduler().
Without the complete series committed this leaves one case not handled
correctly: dom0 pinning a vcpu to a physical cpu other than cpu 0 via
SCHEDOP_pin_override and then suspending in that state. I'm not aware
of any dom0 trying to do that.

> Also while indeed (as the description says) there's no need to
> run the function on the CPU itself, it's not obvious to me that
> it's safe to run it outside of stop_machine() context. Or to be
> more precise, it's not clear to me that leaving stop_machine()
> context with the adjustments not done yet is not going to
> lead to problems (due to the gap between leaving that context
> and acquiring the RCU lock). Could you clarify this in the
> description, please (if it indeed is fine this way)?

It is fine: the chances are zero that any code will still run on the
cpu just taken down, and that cpu is not holding any locks we might need.

I'll add that to the commit message.


Juergen
Dario Faggioli March 27, 2019, 4:51 p.m. UTC | #7
On Wed, 2019-03-27 at 17:31 +0100, Juergen Gross wrote:
> On 27/03/2019 17:24, Dario Faggioli wrote:
> > On Mon, 2019-03-18 at 14:11 +0100, Juergen Gross wrote:
> > > cpu_disable_scheduler() is being called from __cpu_disable()
> > > today.
> > > There is no need to call it on the cpu just being disabled, so
> > > use
> > > the CPU_DEAD case of the cpu notifier chain.
> > > 
> > So, what do you mean with "There is no need to call it on the cpu
> > just
> > being disabled"?
> > 
> > Because we still (even after this patch, I mean) call
> > cpu_disable_scheduler() on all non-boot CPUs, aren't we? It's just
> > that
> > right now we call it from __cpu_disable(), with the patch we call
> > it
> > slightly later.
> 
> The CPU_DEAD notifier chain is called on the CPU requesting the other
> one to go down (so on the boot CPU in suspend case). So we call it
> _for_
> all non-boot CPUs in the boot CPU.
> 
Mmm... ok, I see what you mean now.

I guess part of "the problem" is that "call func on cpu A" reads, at
least to me, as both 1) call func so that it acts on and changes the
state of cpu A, and 2) call func in such a way that it executes on cpu
A.

But I'm not a native speaker, so it may very well be that the confusion
is all mine.

Dario
Jürgen Groß March 27, 2019, 4:53 p.m. UTC | #8
On 27/03/2019 17:51, Dario Faggioli wrote:
> On Wed, 2019-03-27 at 17:31 +0100, Juergen Gross wrote:
>> On 27/03/2019 17:24, Dario Faggioli wrote:
>>> On Mon, 2019-03-18 at 14:11 +0100, Juergen Gross wrote:
>>>> cpu_disable_scheduler() is being called from __cpu_disable()
>>>> today.
>>>> There is no need to call it on the cpu just being disabled, so
>>>> use
>>>> the CPU_DEAD case of the cpu notifier chain.
>>>>
>>> So, what do you mean with "There is no need to call it on the cpu
>>> just
>>> being disabled"?
>>>
>>> Because we still (even after this patch, I mean) call
>>> cpu_disable_scheduler() on all non-boot CPUs, aren't we? It's just
>>> that
>>> right now we call it from __cpu_disable(), with the patch we call
>>> it
>>> slightly later.
>>
>> The CPU_DEAD notifier chain is called on the CPU requesting the other
>> one to go down (so on the boot CPU in suspend case). So we call it
>> _for_
>> all non-boot CPUs in the boot CPU.
>>
> Mmm... ok, I see what you mean now.
> 
> I guess part of "the problem" is that "call func on cpu A" reads, at
> least to me, as both 1) call func so that it acts on and change the
> state of cpu A, and 2) call func in such a way that it executes on cpu
> A.

I'll rephrase to "execute func on cpu...".


Juergen
Jan Beulich March 27, 2019, 4:58 p.m. UTC | #9
>>> On 27.03.19 at 17:45, <jgross@suse.com> wrote:
> On 27/03/2019 17:22, Jan Beulich wrote:
>> Also while indeed (as the description says) there's no need to
>> run the function on the CPU itself, it's not obvious to me that
>> it's safe to run it outside of stop_machine() context. Or to be
>> more precise, it's not clear to me that leaving stop_machine()
>> context with the adjustments not done yet is not going to
>> lead to problems (due to the gap between leaving that context
>> and acquiring the RCU lock). Could you clarify this in the
>> description, please (if it indeed is fine this way)?
> 
> It is fine, as the chances are zero that any code will run on the cpu
> just taken down and that cpu is not holding any locks we might need.

Well, of course nothing's going to run on that CPU anymore.
But vCPU-s may still have associations with it, so what I'm
worried about is e.g. something finding v->processor pointing
at an offline CPU and getting confused. Another, more exotic
(or should I say contrived) scenario might be a soft-online
request coming very quickly after a prior soft-offline one,
with this function not yet having had a chance to run. Or
basically anything else that accesses the same state the
function means to update (or use).

Jan
Jürgen Groß March 27, 2019, 5:06 p.m. UTC | #10
On 27/03/2019 17:58, Jan Beulich wrote:
>>>> On 27.03.19 at 17:45, <jgross@suse.com> wrote:
>> On 27/03/2019 17:22, Jan Beulich wrote:
>>> Also while indeed (as the description says) there's no need to
>>> run the function on the CPU itself, it's not obvious to me that
>>> it's safe to run it outside of stop_machine() context. Or to be
>>> more precise, it's not clear to me that leaving stop_machine()
>>> context with the adjustments not done yet is not going to
>>> lead to problems (due to the gap between leaving that context
>>> and acquiring the RCU lock). Could you clarify this in the
>>> description, please (if it indeed is fine this way)?
>>
>> It is fine, as the chances are zero that any code will run on the cpu
>> just taken down and that cpu is not holding any locks we might need.
> 
> Well, of course nothing's going to run on that CPU anymore.
> But vCPU-s may still have associations with it, so what I'm
> worried about is e.g. something finding v->processor pointing
> at an offline CPU and getting confused. Another, more exotic

v->processor is allowed to have a stale value as long as the vcpu
isn't running.

> (or should I say contrived) scenario might be a soft-online
> request coming very quickly after a prior soft-offline one, with
> this function not having got around to run yet. Or basically
> anything else that accesses the same state the function
> means to update (or use).

The CPU_DEAD notifier chain is activated before calling
cpu_hotplug_done(). I don't see how an online request could make it
in between.


Juergen
Dario Faggioli March 27, 2019, 11:05 p.m. UTC | #11
On Wed, 2019-03-27 at 18:06 +0100, Juergen Gross wrote:
> On 27/03/2019 17:58, Jan Beulich wrote:
> > > > > 
> > Well, of course nothing's going to run on that CPU anymore.
> > But vCPU-s may still have associations with it, so what I'm
> > worried about is e.g. something finding v->processor pointing
> > at an offline CPU and getting confused. Another, more exotic
> 
> v->processor is allowed to have a stale value as long as the vcpu
> isn't running.
> 
Like Jan, I was also concerned about things no longer being done in
stop-machine context, and in fact I asked for similar reassurance.

Thinking more about this, I agree with Juergen that it should work. In
fact, it's important for v->processor to point to a CPU that has its
scheduling data structure allocated and valid.

In the current approach, right before suspend, that is only true for the
BSP, and that's why we move every vcpu there. But after this series,
since we no longer deallocate those structs, it should be ok for
v->processor to retain its value.

Regards,
Dario

Patch

diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index 7d1226d7bc..b7a0a4a419 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -1221,9 +1221,6 @@  void __cpu_disable(void)
     cpumask_clear_cpu(cpu, &cpu_online_map);
     fixup_irqs(&cpu_online_map, 1);
     fixup_eoi();
-
-    if ( cpu_disable_scheduler(cpu) )
-        BUG();
 }
 
 void __cpu_die(unsigned int cpu)
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 60755a631e..665747f247 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -773,8 +773,9 @@  void restore_vcpu_affinity(struct domain *d)
 }
 
 /*
- * This function is used by cpu_hotplug code from stop_machine context
+ * This function is used by cpu_hotplug code via cpu notifier chain
  * and from cpupools to switch schedulers on a cpu.
+ * Caller must get domlist_read_lock.
  */
 int cpu_disable_scheduler(unsigned int cpu)
 {
@@ -789,12 +790,6 @@  int cpu_disable_scheduler(unsigned int cpu)
     if ( c == NULL )
         return ret;
 
-    /*
-     * We'd need the domain RCU lock, but:
-     *  - when we are called from cpupool code, it's acquired there already;
-     *  - when we are called for CPU teardown, we're in stop-machine context,
-     *    so that's not be a problem.
-     */
     for_each_domain_in_cpupool ( d, c )
     {
         for_each_vcpu ( d, v )
@@ -1738,6 +1733,9 @@  static int cpu_schedule_callback(
         rc = cpu_schedule_up(cpu);
         break;
     case CPU_DEAD:
+        rcu_read_lock(&domlist_read_lock);
+        cpu_disable_scheduler(cpu);
+        rcu_read_unlock(&domlist_read_lock);
         SCHED_OP(sched, deinit_pdata, sd->sched_priv, cpu);
         /* Fallthrough */
     case CPU_UP_CANCELED: