Message ID | 20200409094137.13836-1-sergey.dyasli@citrix.com (mailing list archive) |
---|---|
State | New, archived |
Series | sched: fix scheduler_disable() with core scheduling |

On 09.04.20 11:41, Sergey Dyasli wrote:
> In core-scheduling mode, Xen might crash when entering ACPI S5 state.
> This happens in sched_slave() during is_idle_unit(next) check because
> next->vcpu_list is stale and points to an already freed memory.
>
> This situation happens shortly after scheduler_disable() is called if
> some CPU is still inside sched_slave() softirq. Current logic simply
> returns prev->next_task from sched_wait_rendezvous_in() which causes
> the described crash because next_task->vcpu_list has become invalid.
>
> Fix the crash by returning NULL from sched_wait_rendezvous_in() in
> the case when scheduler_disable() has been called.
>
> Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>

Good catch!

Have you seen any further problems (e.g. with cpu on/offlining) with
this patch applied?

Reviewed-by: Juergen Gross <jgross@suse.com>

Juergen

(CC Igor)

On 09/04/2020 13:50, Jürgen Groß wrote:
> On 09.04.20 11:41, Sergey Dyasli wrote:
>> In core-scheduling mode, Xen might crash when entering ACPI S5 state.
>> This happens in sched_slave() during is_idle_unit(next) check because
>> next->vcpu_list is stale and points to an already freed memory.
>>
>> This situation happens shortly after scheduler_disable() is called if
>> some CPU is still inside sched_slave() softirq. Current logic simply
>> returns prev->next_task from sched_wait_rendezvous_in() which causes
>> the described crash because next_task->vcpu_list has become invalid.
>>
>> Fix the crash by returning NULL from sched_wait_rendezvous_in() in
>> the case when scheduler_disable() has been called.
>>
>> Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
>
> Good catch!
>
> Have you seen any further problems (e.g. with cpu on/offlining) with
> this patch applied?

This patch shouldn't affect cpu on/offlining AFAICS. Igor was the one
testing cpu on/offlining and I think he came to a conclusion that it's
broken even without core-scheduling enabled.

> Reviewed-by: Juergen Gross <jgross@suse.com>

Thanks!

--
Sergey

On Thu, 2020-04-09 at 14:50 +0200, Jürgen Groß wrote:
> On 09.04.20 11:41, Sergey Dyasli wrote:
> > In core-scheduling mode, Xen might crash when entering ACPI S5 state.
> > This happens in sched_slave() during is_idle_unit(next) check because
> > next->vcpu_list is stale and points to an already freed memory.
> >
> > This situation happens shortly after scheduler_disable() is called if
> > some CPU is still inside sched_slave() softirq. Current logic simply
> > returns prev->next_task from sched_wait_rendezvous_in() which causes
> > the described crash because next_task->vcpu_list has become invalid.
> >
> > Fix the crash by returning NULL from sched_wait_rendezvous_in() in
> > the case when scheduler_disable() has been called.
> >
> > Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
>
> Reviewed-by: Juergen Gross <jgross@suse.com>
>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>

Thanks and Regards

diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
index 626861a3fe..d4a6489929 100644
--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -2484,19 +2484,15 @@ static struct sched_unit *sched_wait_rendezvous_in(struct sched_unit *prev,
 
         *lock = pcpu_schedule_lock_irq(cpu);
 
-        if ( unlikely(!scheduler_active) )
-        {
-            ASSERT(is_idle_unit(prev));
-            atomic_set(&prev->next_task->rendezvous_out_cnt, 0);
-            prev->rendezvous_in_cnt = 0;
-        }
-
         /*
          * Check for scheduling resource switched. This happens when we are
          * moved away from our cpupool and cpus are subject of the idle
          * scheduler now.
+         *
+         * This is also a bail out case when scheduler_disable() has been
+         * called.
          */
-        if ( unlikely(sr != get_sched_res(cpu)) )
+        if ( unlikely(sr != get_sched_res(cpu) || !scheduler_active) )
         {
             ASSERT(is_idle_unit(prev));
             atomic_set(&prev->next_task->rendezvous_out_cnt, 0);
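
For readers who want to see the control-flow change in isolation, the following is a minimal stand-alone sketch of the bail-out logic, not the Xen code itself. The type `struct unit_model`, the function `wait_rendezvous_model()` and the `res_switched` parameter are hypothetical names made up for illustration; only the idea (return NULL instead of `prev->next_task` once the scheduler has been disabled) is taken from the patch. Locking, RCU and the real rendezvous with other CPUs are deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for struct sched_unit. */
struct unit_model {
    int rendezvous_in_cnt;          /* CPUs still expected at the rendezvous */
    struct unit_model *next_task;   /* unit chosen to run next */
};

/* Stand-in for Xen's scheduler_active flag; false once the scheduler is disabled. */
static bool scheduler_active = true;

/*
 * Model of the wait loop after the patch. The caller is expected to treat
 * a NULL return as "nothing to switch to, just leave", so it never touches
 * next_task, which may already be stale at that point.
 */
static struct unit_model *wait_rendezvous_model(struct unit_model *prev,
                                                bool res_switched)
{
    while (prev->rendezvous_in_cnt) {
        /* Single bail-out test: resource switched OR scheduler disabled. */
        if (res_switched || !scheduler_active) {
            prev->rendezvous_in_cnt = 0;
            return NULL;
        }
        /* Assumption: stands in for another CPU arriving at the rendezvous. */
        prev->rendezvous_in_cnt--;
    }
    return prev->next_task;
}
```

In this model, calling `wait_rendezvous_model()` with `scheduler_active` set to false returns NULL and resets the counter, whereas the pre-patch logic would have cleared the counter, fallen out of the loop and handed back `next_task`.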

In core-scheduling mode, Xen might crash when entering ACPI S5 state.
This happens in sched_slave() during is_idle_unit(next) check because
next->vcpu_list is stale and points to an already freed memory.

This situation happens shortly after scheduler_disable() is called if
some CPU is still inside sched_slave() softirq. Current logic simply
returns prev->next_task from sched_wait_rendezvous_in() which causes
the described crash because next_task->vcpu_list has become invalid.

Fix the crash by returning NULL from sched_wait_rendezvous_in() in
the case when scheduler_disable() has been called.

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
---
CC: Juergen Gross <jgross@suse.com>
CC: Dario Faggioli <dfaggioli@suse.com>
CC: George Dunlap <george.dunlap@citrix.com>
CC: Jan Beulich <jbeulich@suse.com>
---
 xen/common/sched/core.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)