[v2] xen/sched: let credit scheduler control its timer all alone

Message ID 20191007063519.2912-1-jgross@suse.com
State New

Commit Message

Jürgen Groß Oct. 7, 2019, 6:35 a.m. UTC
The credit scheduler is the only scheduler with tick_suspend and
tick_resume callbacks. Today those callbacks are invoked without being
guarded by the scheduler lock, which is critical when the cpu on which
those callbacks are active is at the same time being moved to or from
a cpupool.

Crashes like the following are possible due to that race:

(XEN) ----[ Xen-4.13.0-8.0.12-d  x86_64  debug=y   Not tainted ]----
(XEN) CPU:    79
(XEN) RIP:    e008:[<ffff82d0802467dc>] set_timer+0x39/0x1f7
(XEN) RFLAGS: 0000000000010002   CONTEXT: hypervisor
<snip>
(XEN) Xen call trace:
(XEN)    [<ffff82d0802467dc>] set_timer+0x39/0x1f7
(XEN)    [<ffff82d08022c1f4>]
sched_credit.c#csched_tick_resume+0x54/0x59
(XEN)    [<ffff82d080241dfe>] sched_tick_resume+0x67/0x86
(XEN)    [<ffff82d0802eda52>] mwait-idle.c#mwait_idle+0x32b/0x357
(XEN)    [<ffff82d08027939e>] domain.c#idle_loop+0xa6/0xc2
(XEN)
(XEN) Pagetable walk from 0000000000000048:
(XEN)  L4[0x000] = 00000082cfb9c063 ffffffffffffffff
(XEN)  L3[0x000] = 00000082cfb9b063 ffffffffffffffff
(XEN)  L2[0x000] = 00000082cfb9a063 ffffffffffffffff
(XEN)  L1[0x000] = 0000000000000000 ffffffffffffffff
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 79:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0000]
(XEN) Faulting linear address: 0000000000000048
(XEN) ****************************************

The callbacks are used when the cpu is going to or coming from idle in
order to allow higher C-states.

The credit scheduler knows when it is going to schedule an idle
scheduling unit or another one after idle, so it can easily stop or
resume the timer itself, removing the need to do so via the callback.
As this timer handling is done in the main scheduling function, the
scheduler lock is still held, so the race with cpupool operations can
no longer occur. Note that calling the callbacks from schedule_cpu_rm()
and schedule_cpu_add() is no longer needed, as the transitions to and
from idle in a cpupool with credit active will occur automatically
and do the right thing.
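In model form, the decision this patch moves into the scheduler boils
down to the following transition logic. This is a self-contained sketch
for illustration, not the Xen code itself; the MICROSECS() definition,
the idle checks and the expiry-alignment expression mirror what the
patch does in csched_schedule():

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Microseconds to nanoseconds, as Xen's MICROSECS() macro does. */
#define MICROSECS(us) ((uint64_t)(us) * 1000ULL)

enum tick_action { TICK_NONE, TICK_STOP, TICK_START };

/*
 * Decide what to do with the per-cpu accounting ticker on a
 * prev-unit -> next-unit transition: stop it when going to idle,
 * restart it when coming back from idle, leave it alone otherwise.
 */
static enum tick_action tick_transition(bool prev_is_idle, bool next_is_idle)
{
    if ( !prev_is_idle && next_is_idle )
        return TICK_STOP;
    if ( prev_is_idle && !next_is_idle )
        return TICK_START;
    return TICK_NONE;
}

/*
 * Next ticker expiry, aligned to the tick period, matching the
 * set_timer() argument in the patch:
 *   now + MICROSECS(period) - now % MICROSECS(period)
 */
static uint64_t next_tick(uint64_t now, unsigned int tick_period_us)
{
    uint64_t period = MICROSECS(tick_period_us);

    return now + period - now % period;
}
```

Because this decision is taken inside the scheduling function, it is by
construction made under the scheduler lock, which is exactly what closes
the race described above.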

With the last user of the callbacks gone, those can be removed.

Suggested-by: George Dunlap <george.dunlap@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
---
V2:
- instead of locking handle timer in credit only (George Dunlap)
---
 xen/common/sched_arinc653.c |  3 ---
 xen/common/sched_credit.c   | 34 ++++++++--------------------------
 xen/common/schedule.c       | 26 ++------------------------
 xen/include/xen/sched-if.h  | 15 ---------------
 4 files changed, 10 insertions(+), 68 deletions(-)

Comments

Dario Faggioli Oct. 7, 2019, 9:05 a.m. UTC | #1
On Mon, 2019-10-07 at 08:35 +0200, Juergen Gross wrote:
> The credit scheduler is the only scheduler with tick_suspend and
> tick_resume callbacks. Today those callbacks are invoked without being
> guarded by the scheduler lock, which is critical when the cpu on which
> those callbacks are active is at the same time being moved to or from
> a cpupool.
> 
> Crashes like the following are possible due to that race:
> 
> [...]
> 
> The callbacks are used when the cpu is going to or coming from idle
> in
> order to allow higher C-states.
> 
> The credit scheduler knows when it is going to schedule an idle
> scheduling unit or another one after idle, so it can easily stop or
> resume the timer itself removing the need to do so via the callback.
> As this timer handling is done in the main scheduling function the
> scheduler lock is still held, so the race with cpupool operations can
> no longer occur. Note that calling the callbacks from
> schedule_cpu_rm()
> and schedule_cpu_add() is no longer needed, as the transitions to and
> from idle in the cpupool with credit active will automatically occur
> and do the right thing.
> 
> With the last user of the callbacks gone those can be removed.
> 
Which is great! :-0

> Suggested-by: George Dunlap <george.dunlap@citrix.com>
> Signed-off-by: Juergen Gross <jgross@suse.com>
>
Well, unless I'm missing something, I guess that, at this point:

> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -3082,32 +3078,14 @@ void schedule_dump(struct cpupool *c)
>  
>  void sched_tick_suspend(void)
>  {
> -    struct scheduler *sched;
> -    unsigned int cpu = smp_processor_id();
> -
> -    rcu_read_lock(&sched_res_rculock);
> -
> -    sched = get_sched_res(cpu)->scheduler;
> -    sched_do_tick_suspend(sched, cpu);
> -    rcu_idle_enter(cpu);
> +    rcu_idle_enter(smp_processor_id());
>      rcu_idle_timer_start();
> -
> -    rcu_read_unlock(&sched_res_rculock);
>  }
> 
sched_tick_suspend() could go away and rcu_idle_enter() be called
directly (with rcu_idle_timer_start() becoming static and called
directly by rcu_idle_enter() itself).

And the same for sched_tick_resume(), rcu_idle_timer_stop() and
rcu_idle_exit().
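
In completely untested sketch form, for the schedule.c side only (the
idle-loop callers would then invoke rcu_idle_enter()/rcu_idle_exit()
directly, and the rcu_idle_timer_*() calls would move into those):

```diff
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
-void sched_tick_suspend(void)
-{
-    rcu_idle_enter(smp_processor_id());
-    rcu_idle_timer_start();
-}
-
-void sched_tick_resume(void)
-{
-    rcu_idle_timer_stop();
-    rcu_idle_exit(smp_processor_id());
-}
```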

I'll give my:

Reviewed-by: Dario Faggioli <dfaggioli@suse.com>

To this patch, though, as I appreciate we want it in to be able to
continue testing core-scheduling during 4.13 rc-phase.

It'd be cool if the adjustments described above (if agreed upon) could
come as a follow-up.

Regards
Jürgen Groß Oct. 7, 2019, 9:19 a.m. UTC | #2
On 07.10.19 11:05, Dario Faggioli wrote:
> On Mon, 2019-10-07 at 08:35 +0200, Juergen Gross wrote:
>> The credit scheduler is the only scheduler with tick_suspend and
>> tick_resume callbacks. Today those callbacks are invoked without being
>> guarded by the scheduler lock, which is critical when the cpu on which
>> those callbacks are active is at the same time being moved to or from
>> a cpupool.
>>
>> Crashes like the following are possible due to that race:
>>
>> [...]
>>
>> The callbacks are used when the cpu is going to or coming from idle
>> in
>> order to allow higher C-states.
>>
>> The credit scheduler knows when it is going to schedule an idle
>> scheduling unit or another one after idle, so it can easily stop or
>> resume the timer itself removing the need to do so via the callback.
>> As this timer handling is done in the main scheduling function the
>> scheduler lock is still held, so the race with cpupool operations can
>> no longer occur. Note that calling the callbacks from
>> schedule_cpu_rm()
>> and schedule_cpu_add() is no longer needed, as the transitions to and
>> from idle in the cpupool with credit active will automatically occur
>> and do the right thing.
>>
>> With the last user of the callbacks gone those can be removed.
>>
> Which is great! :-0
> 
>> Suggested-by: George Dunlap <george.dunlap@citrix.com>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>
> Well, unless I'm missing something, I guess that, at this point:
> 
>> --- a/xen/common/schedule.c
>> +++ b/xen/common/schedule.c
>> @@ -3082,32 +3078,14 @@ void schedule_dump(struct cpupool *c)
>>   
>>   void sched_tick_suspend(void)
>>   {
>> -    struct scheduler *sched;
>> -    unsigned int cpu = smp_processor_id();
>> -
>> -    rcu_read_lock(&sched_res_rculock);
>> -
>> -    sched = get_sched_res(cpu)->scheduler;
>> -    sched_do_tick_suspend(sched, cpu);
>> -    rcu_idle_enter(cpu);
>> +    rcu_idle_enter(smp_processor_id());
>>       rcu_idle_timer_start();
>> -
>> -    rcu_read_unlock(&sched_res_rculock);
>>   }
>>
> sched_tick_suspend() could go away and rcu_idle_enter() be called
> directly (with rcu_idle_timer_start() becoming static and called
> directly by rcu_idle_enter() itself).
> 
> And the same for sched_tick_resume(), rcu_idle_timer_stop() and
> rcu_idle_exit().
> 
> I'll give my:
> 
> Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
> 
> To this patch, though, as I appreciate we want it in to be able to
> continue testing core-scheduling during 4.13 rc-phase.
> 
> It'd be cool if the adjustments described above (if agreed upon), could
> come as a follow-up.

Noted on my "scheduling cleanup" todo list.


Juergen
George Dunlap Oct. 7, 2019, 10:23 a.m. UTC | #3
On 10/7/19 7:35 AM, Juergen Gross wrote:
> The credit scheduler is the only scheduler with tick_suspend and
> tick_resume callbacks. Today those callbacks are invoked without being
> guarded by the scheduler lock, which is critical when the cpu on which
> those callbacks are active is at the same time being moved to or from
> a cpupool.
> 
> Crashes like the following are possible due to that race:
> 
> [...]
> 
> The callbacks are used when the cpu is going to or coming from idle in
> order to allow higher C-states.
> 
> The credit scheduler knows when it is going to schedule an idle
> scheduling unit or another one after idle, so it can easily stop or
> resume the timer itself removing the need to do so via the callback.
> As this timer handling is done in the main scheduling function the
> scheduler lock is still held, so the race with cpupool operations can
> no longer occur. Note that calling the callbacks from schedule_cpu_rm()
> and schedule_cpu_add() is no longer needed, as the transitions to and
> from idle in the cpupool with credit active will automatically occur
> and do the right thing.
> 
> With the last user of the callbacks gone those can be removed.
> 
> Suggested-by: George Dunlap <george.dunlap@citrix.com>
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
> V2:
> - instead of locking handle timer in credit only (George Dunlap)
> ---
>  xen/common/sched_arinc653.c |  3 ---
>  xen/common/sched_credit.c   | 34 ++++++++--------------------------
>  xen/common/schedule.c       | 26 ++------------------------
>  xen/include/xen/sched-if.h  | 15 ---------------
>  4 files changed, 10 insertions(+), 68 deletions(-)

Like those diffstats. :-)

Pushed, thanks.

 -George

Patch

diff --git a/xen/common/sched_arinc653.c b/xen/common/sched_arinc653.c
index 45c05c6cd9..565575c326 100644
--- a/xen/common/sched_arinc653.c
+++ b/xen/common/sched_arinc653.c
@@ -724,9 +724,6 @@  static const struct scheduler sched_arinc653_def = {
 
     .dump_settings  = NULL,
     .dump_cpu_state = NULL,
-
-    .tick_suspend   = NULL,
-    .tick_resume    = NULL,
 };
 
 REGISTER_SCHEDULER(sched_arinc653_def);
diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 31fdcd6a2f..fbffcf3996 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -1831,6 +1831,7 @@  static void csched_schedule(
 {
     const unsigned int cur_cpu = smp_processor_id();
     const unsigned int sched_cpu = sched_get_resource_cpu(cur_cpu);
+    struct csched_pcpu *spc = CSCHED_PCPU(cur_cpu);
     struct list_head * const runq = RUNQ(sched_cpu);
     struct csched_unit * const scurr = CSCHED_UNIT(unit);
     struct csched_private *prv = CSCHED_PRIV(ops);
@@ -2000,6 +2001,13 @@  out:
     unit->next_task = snext->unit;
     snext->unit->migrated = migrated;
 
+    /* Stop credit tick when going to idle, restart it when coming from idle. */
+    if ( !is_idle_unit(unit) && is_idle_unit(unit->next_task) )
+        stop_timer(&spc->ticker);
+    if ( is_idle_unit(unit) && !is_idle_unit(unit->next_task) )
+        set_timer(&spc->ticker, now + MICROSECS(prv->tick_period_us)
+                                - now % MICROSECS(prv->tick_period_us) );
+
     CSCHED_UNIT_CHECK(unit->next_task);
 }
 
@@ -2237,29 +2245,6 @@  csched_deinit(struct scheduler *ops)
     }
 }
 
-static void csched_tick_suspend(const struct scheduler *ops, unsigned int cpu)
-{
-    struct csched_pcpu *spc;
-
-    spc = CSCHED_PCPU(cpu);
-
-    stop_timer(&spc->ticker);
-}
-
-static void csched_tick_resume(const struct scheduler *ops, unsigned int cpu)
-{
-    struct csched_private *prv;
-    struct csched_pcpu *spc;
-    uint64_t now = NOW();
-
-    spc = CSCHED_PCPU(cpu);
-
-    prv = CSCHED_PRIV(ops);
-
-    set_timer(&spc->ticker, now + MICROSECS(prv->tick_period_us)
-            - now % MICROSECS(prv->tick_period_us) );
-}
-
 static const struct scheduler sched_credit_def = {
     .name           = "SMP Credit Scheduler",
     .opt_name       = "credit",
@@ -2295,9 +2280,6 @@  static const struct scheduler sched_credit_def = {
     .switch_sched   = csched_switch_sched,
     .alloc_domdata  = csched_alloc_domdata,
     .free_domdata   = csched_free_domdata,
-
-    .tick_suspend   = csched_tick_suspend,
-    .tick_resume    = csched_tick_resume,
 };
 
 REGISTER_SCHEDULER(sched_credit_def);
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 217fcb09ce..c327c40b92 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -2871,8 +2871,6 @@  int schedule_cpu_add(unsigned int cpu, struct cpupool *c)
     /* _Not_ pcpu_schedule_unlock(): schedule_lock has changed! */
     spin_unlock_irqrestore(old_lock, flags);
 
-    sched_do_tick_resume(new_ops, cpu);
-
     sr->granularity = cpupool_get_granularity(c);
     sr->cpupool = c;
     /* The  cpu is added to a pool, trigger it to go pick up some work */
@@ -2943,8 +2941,6 @@  int schedule_cpu_rm(unsigned int cpu)
     ASSERT(cpumask_test_cpu(cpu, &cpupool_free_cpus));
     ASSERT(!cpumask_test_cpu(cpu, sr->cpupool->cpu_valid));
 
-    sched_do_tick_suspend(old_ops, cpu);
-
     /* See comment in schedule_cpu_add() regarding lock switching. */
     old_lock = pcpu_schedule_lock_irqsave(cpu, &flags);
 
@@ -3082,32 +3078,14 @@  void schedule_dump(struct cpupool *c)
 
 void sched_tick_suspend(void)
 {
-    struct scheduler *sched;
-    unsigned int cpu = smp_processor_id();
-
-    rcu_read_lock(&sched_res_rculock);
-
-    sched = get_sched_res(cpu)->scheduler;
-    sched_do_tick_suspend(sched, cpu);
-    rcu_idle_enter(cpu);
+    rcu_idle_enter(smp_processor_id());
     rcu_idle_timer_start();
-
-    rcu_read_unlock(&sched_res_rculock);
 }
 
 void sched_tick_resume(void)
 {
-    struct scheduler *sched;
-    unsigned int cpu = smp_processor_id();
-
-    rcu_read_lock(&sched_res_rculock);
-
     rcu_idle_timer_stop();
-    rcu_idle_exit(cpu);
-    sched = get_sched_res(cpu)->scheduler;
-    sched_do_tick_resume(sched, cpu);
-
-    rcu_read_unlock(&sched_res_rculock);
+    rcu_idle_exit(smp_processor_id());
 }
 
 void wait(void)
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index cd731d7172..29715652bc 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -353,9 +353,6 @@  struct scheduler {
                                     struct xen_sysctl_scheduler_op *);
     void         (*dump_settings)  (const struct scheduler *);
     void         (*dump_cpu_state) (const struct scheduler *, int);
-
-    void         (*tick_suspend)    (const struct scheduler *, unsigned int);
-    void         (*tick_resume)     (const struct scheduler *, unsigned int);
 };
 
 static inline int sched_init(struct scheduler *s)
@@ -387,18 +384,6 @@  static inline void sched_dump_cpu_state(const struct scheduler *s, int cpu)
         s->dump_cpu_state(s, cpu);
 }
 
-static inline void sched_do_tick_suspend(const struct scheduler *s, int cpu)
-{
-    if ( s->tick_suspend )
-        s->tick_suspend(s, cpu);
-}
-
-static inline void sched_do_tick_resume(const struct scheduler *s, int cpu)
-{
-    if ( s->tick_resume )
-        s->tick_resume(s, cpu);
-}
-
 static inline void *sched_alloc_domdata(const struct scheduler *s,
                                         struct domain *d)
 {