sched: fix scheduler_disable() with core scheduling

Message ID 20200409094137.13836-1-sergey.dyasli@citrix.com (mailing list archive)
State New, archived

Commit Message

Sergey Dyasli April 9, 2020, 9:41 a.m. UTC
In core-scheduling mode, Xen might crash when entering ACPI S5 state.
This happens in sched_slave() during is_idle_unit(next) check because
next->vcpu_list is stale and points to already freed memory.

This situation happens shortly after scheduler_disable() is called if
some CPU is still inside sched_slave() softirq. Current logic simply
returns prev->next_task from sched_wait_rendezvous_in() which causes
the described crash because next_task->vcpu_list has become invalid.

Fix the crash by returning NULL from sched_wait_rendezvous_in() in
the case when scheduler_disable() has been called.

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
---
CC: Juergen Gross <jgross@suse.com>
CC: Dario Faggioli <dfaggioli@suse.com>
CC: George Dunlap <george.dunlap@citrix.com>
CC: Jan Beulich <jbeulich@suse.com>
---
 xen/common/sched/core.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)
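
The bail-out described in the commit message can be modelled outside Xen as a minimal sketch. Everything below is a simplified stand-in, not the hypervisor code: struct unit, wait_rendezvous_in() and slave_step() are hypothetical names, the locking and the actual wait loop are omitted, and plain ints replace the atomic counters. Only the combined condition and the counter resets follow the lines visible in the patch; the NULL return follows the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for Xen's struct sched_unit. */
struct unit {
    struct unit *next_task;
    bool is_idle;
    int rendezvous_in_cnt;
    int rendezvous_out_cnt;   /* an atomic_t in the patch, a plain int here */
};

/* Models the global flag that scheduler_disable() clears. */
static bool scheduler_active = true;

/*
 * Simplified model of the check in sched_wait_rendezvous_in().
 * sr is the scheduling resource seen on entry, cur_sr the one currently
 * assigned to the CPU (get_sched_res(cpu) in the patch).
 */
static struct unit *wait_rendezvous_in(struct unit *prev,
                                       const void *sr, const void *cur_sr)
{
    /* Bail out if the resource was switched or the scheduler is disabled. */
    if (sr != cur_sr || !scheduler_active) {
        assert(prev->is_idle);
        prev->next_task->rendezvous_out_cnt = 0;
        prev->rendezvous_in_cnt = 0;
        return NULL;          /* never hand out a possibly stale next_task */
    }
    return prev->next_task;
}

/*
 * Hypothetical caller in the role of sched_slave(): it must test for NULL
 * before inspecting the returned unit. Returns true if a non-idle unit
 * was selected.
 */
static bool slave_step(struct unit *prev, const void *sr, const void *cur_sr)
{
    struct unit *next = wait_rendezvous_in(prev, sr, cur_sr);

    if (next == NULL)
        return false;         /* bail-out path: next is not dereferenced */
    return !next->is_idle;
}
```

In this model, clearing scheduler_active makes wait_rendezvous_in() return NULL even though the scheduling resource is unchanged, so the caller never reaches the idle check on a unit whose memory may already be gone.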

Comments

Jürgen Groß April 9, 2020, 12:50 p.m. UTC | #1
On 09.04.20 11:41, Sergey Dyasli wrote:
> In core-scheduling mode, Xen might crash when entering ACPI S5 state.
> This happens in sched_slave() during is_idle_unit(next) check because
> next->vcpu_list is stale and points to an already freed memory.
> 
> This situation happens shortly after scheduler_disable() is called if
> some CPU is still inside sched_slave() softirq. Current logic simply
> returns prev->next_task from sched_wait_rendezvous_in() which causes
> the described crash because next_task->vcpu_list has become invalid.
> 
> Fix the crash by returning NULL from sched_wait_rendezvous_in() in
> the case when scheduler_disable() has been called.
> 
> Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>

Good catch!

Have you seen any further problems (e.g. with cpu on/offlining) with
this patch applied?

Reviewed-by: Juergen Gross <jgross@suse.com>


Juergen

Sergey Dyasli April 14, 2020, 12:37 p.m. UTC | #2
(CC Igor)

On 09/04/2020 13:50, Jürgen Groß wrote:
> On 09.04.20 11:41, Sergey Dyasli wrote:
>> In core-scheduling mode, Xen might crash when entering ACPI S5 state.
>> This happens in sched_slave() during is_idle_unit(next) check because
>> next->vcpu_list is stale and points to an already freed memory.
>>
>> This situation happens shortly after scheduler_disable() is called if
>> some CPU is still inside sched_slave() softirq. Current logic simply
>> returns prev->next_task from sched_wait_rendezvous_in() which causes
>> the described crash because next_task->vcpu_list has become invalid.
>>
>> Fix the crash by returning NULL from sched_wait_rendezvous_in() in
>> the case when scheduler_disable() has been called.
>>
>> Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
> 
> Good catch!
> 
> Have you seen any further problems (e.g. with cpu on/offlining) with
> this patch applied?

This patch shouldn't affect cpu on/offlining AFAICS. Igor was the one testing
cpu on/offlining and I think he came to a conclusion that it's broken even
without core-scheduling enabled.

> Reviewed-by: Juergen Gross <jgross@suse.com>

Thanks!

--
Sergey

Dario Faggioli April 16, 2020, 4:10 p.m. UTC | #3
On Thu, 2020-04-09 at 14:50 +0200, Jürgen Groß wrote:
> On 09.04.20 11:41, Sergey Dyasli wrote:
> > In core-scheduling mode, Xen might crash when entering ACPI S5
> > state.
> > This happens in sched_slave() during is_idle_unit(next) check
> > because
> > next->vcpu_list is stale and points to an already freed memory.
> > 
> > This situation happens shortly after scheduler_disable() is called
> > if
> > some CPU is still inside sched_slave() softirq. Current logic
> > simply
> > returns prev->next_task from sched_wait_rendezvous_in() which
> > causes
> > the described crash because next_task->vcpu_list has become
> > invalid.
> > 
> > Fix the crash by returning NULL from sched_wait_rendezvous_in() in
> > the case when scheduler_disable() has been called.
> > 
> > Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
> 
> Reviewed-by: Juergen Gross <jgross@suse.com>
> 
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>

Thanks and Regards

Patch

diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
index 626861a3fe..d4a6489929 100644
--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -2484,19 +2484,15 @@  static struct sched_unit *sched_wait_rendezvous_in(struct sched_unit *prev,
 
         *lock = pcpu_schedule_lock_irq(cpu);
 
-        if ( unlikely(!scheduler_active) )
-        {
-            ASSERT(is_idle_unit(prev));
-            atomic_set(&prev->next_task->rendezvous_out_cnt, 0);
-            prev->rendezvous_in_cnt = 0;
-        }
-
         /*
          * Check for scheduling resource switched. This happens when we are
          * moved away from our cpupool and cpus are subject of the idle
          * scheduler now.
+         *
+         * This is also a bail out case when scheduler_disable() has been
+         * called.
          */
-        if ( unlikely(sr != get_sched_res(cpu)) )
+        if ( unlikely(sr != get_sched_res(cpu) || !scheduler_active) )
         {
             ASSERT(is_idle_unit(prev));
             atomic_set(&prev->next_task->rendezvous_out_cnt, 0);