
xen/sched: fix sched_move_domain()

Message ID 20231019112314.22665-1-jgross@suse.com (mailing list archive)
State Superseded
Series xen/sched: fix sched_move_domain()

Commit Message

Jürgen Groß Oct. 19, 2023, 11:23 a.m. UTC
When moving a domain out of a cpupool running with the credit2
scheduler and having multiple run-queues, the following ASSERT() can
be observed:

(XEN) Xen call trace:
(XEN)    [<ffff82d04023a700>] R credit2.c#csched2_unit_remove+0xe3/0xe7
(XEN)    [<ffff82d040246adb>] S sched_move_domain+0x2f3/0x5b1
(XEN)    [<ffff82d040234cf7>] S cpupool.c#cpupool_move_domain_locked+0x1d/0x3b
(XEN)    [<ffff82d040236025>] S cpupool_move_domain+0x24/0x35
(XEN)    [<ffff82d040206513>] S domain_kill+0xa5/0x116
(XEN)    [<ffff82d040232b12>] S do_domctl+0xe5f/0x1951
(XEN)    [<ffff82d0402276ba>] S timer.c#timer_lock+0x69/0x143
(XEN)    [<ffff82d0402dc71b>] S pv_hypercall+0x44e/0x4a9
(XEN)    [<ffff82d0402012b7>] S lstar_enter+0x137/0x140
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 1:
(XEN) Assertion 'svc->rqd == c2rqd(sched_unit_master(unit))' failed at common/sched/credit2.c:1159
(XEN) ****************************************

This is happening as sched_move_domain() is setting a different cpu
for a scheduling unit without telling the scheduler. When this unit is
removed from the scheduler, the ASSERT() will trigger.

In non-debug builds the result is usually a clobbered pointer, leading
to another crash a short time later.

Fix that by swapping the two involved actions (setting another cpu and
removing the unit from the scheduler).

Cc: Henry Wang <Henry.Wang@arm.com>
Fixes: 70fadc41635b ("xen/cpupool: support moving domain between cpupools with different granularity")
Signed-off-by: Juergen Gross <jgross@suse.com>
---
This fixes a regression introduced in Xen 4.15. The fix is very simple
and it will affect only configurations with multiple cpupools. I think
whether to include it in 4.18 should be decided by the release manager
based on the current state of the release (I think I wouldn't have
added it that late in the release cycle when I was the release manager).
---
 xen/common/sched/core.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

Comments

Henry Wang Oct. 19, 2023, 11:31 a.m. UTC | #1
Hi Juergen,

> On Oct 19, 2023, at 19:23, Juergen Gross <jgross@suse.com> wrote:
> 
> When moving a domain out of a cpupool running with the credit2
> scheduler and having multiple run-queues, the following ASSERT() can
> be observed:
> 
> (XEN) Xen call trace:
> (XEN)    [<ffff82d04023a700>] R credit2.c#csched2_unit_remove+0xe3/0xe7
> (XEN)    [<ffff82d040246adb>] S sched_move_domain+0x2f3/0x5b1
> (XEN)    [<ffff82d040234cf7>] S cpupool.c#cpupool_move_domain_locked+0x1d/0x3b
> (XEN)    [<ffff82d040236025>] S cpupool_move_domain+0x24/0x35
> (XEN)    [<ffff82d040206513>] S domain_kill+0xa5/0x116
> (XEN)    [<ffff82d040232b12>] S do_domctl+0xe5f/0x1951
> (XEN)    [<ffff82d0402276ba>] S timer.c#timer_lock+0x69/0x143
> (XEN)    [<ffff82d0402dc71b>] S pv_hypercall+0x44e/0x4a9
> (XEN)    [<ffff82d0402012b7>] S lstar_enter+0x137/0x140
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 1:
> (XEN) Assertion 'svc->rqd == c2rqd(sched_unit_master(unit))' failed at common/sched/credit2.c:1159
> (XEN) ****************************************
> 
> This is happening as sched_move_domain() is setting a different cpu
> for a scheduling unit without telling the scheduler. When this unit is
> removed from the scheduler, the ASSERT() will trigger.
> 
> In non-debug builds the result is usually a clobbered pointer, leading
> to another crash a short time later.
> 
> Fix that by swapping the two involved actions (setting another cpu and
> removing the unit from the scheduler).
> 
> Cc: Henry Wang <Henry.Wang@arm.com>

Emmm, I think ^ this Cc is better to be moved below the scissors line, otherwise
if this patch is committed, this line will be shown in the commit message...

> Fixes: 70fadc41635b ("xen/cpupool: support moving domain between cpupools with different granularity")
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
> This fixes a regression introduced in Xen 4.15. The fix is very simple
> and it will affect only configurations with multiple cpupools. I think
> whether to include it in 4.18 should be decided by the release manager
> based on the current state of the release (I think I wouldn't have
> added it that late in the release while being the release manager).

Thanks for the reminder :)

Please correct me if I am wrong: if this is fixing a regression introduced in
4.15, shouldn’t this patch be backported to 4.15, 4.16, 4.17 and soon
4.18? So honestly I think, at least for 4.18, adding this patch now or
later won’t make much difference… I am ok either way I guess.

Kind regards,
Henry
Jürgen Groß Oct. 19, 2023, 11:38 a.m. UTC | #2
On 19.10.23 13:31, Henry Wang wrote:
> Hi Juergen,
> 
>> On Oct 19, 2023, at 19:23, Juergen Gross <jgross@suse.com> wrote:
>>
>> When moving a domain out of a cpupool running with the credit2
>> scheduler and having multiple run-queues, the following ASSERT() can
>> be observed:
>>
>> (XEN) Xen call trace:
>> (XEN)    [<ffff82d04023a700>] R credit2.c#csched2_unit_remove+0xe3/0xe7
>> (XEN)    [<ffff82d040246adb>] S sched_move_domain+0x2f3/0x5b1
>> (XEN)    [<ffff82d040234cf7>] S cpupool.c#cpupool_move_domain_locked+0x1d/0x3b
>> (XEN)    [<ffff82d040236025>] S cpupool_move_domain+0x24/0x35
>> (XEN)    [<ffff82d040206513>] S domain_kill+0xa5/0x116
>> (XEN)    [<ffff82d040232b12>] S do_domctl+0xe5f/0x1951
>> (XEN)    [<ffff82d0402276ba>] S timer.c#timer_lock+0x69/0x143
>> (XEN)    [<ffff82d0402dc71b>] S pv_hypercall+0x44e/0x4a9
>> (XEN)    [<ffff82d0402012b7>] S lstar_enter+0x137/0x140
>> (XEN)
>> (XEN)
>> (XEN) ****************************************
>> (XEN) Panic on CPU 1:
>> (XEN) Assertion 'svc->rqd == c2rqd(sched_unit_master(unit))' failed at common/sched/credit2.c:1159
>> (XEN) ****************************************
>>
>> This is happening as sched_move_domain() is setting a different cpu
>> for a scheduling unit without telling the scheduler. When this unit is
>> removed from the scheduler, the ASSERT() will trigger.
>>
>> In non-debug builds the result is usually a clobbered pointer, leading
>> to another crash a short time later.
>>
>> Fix that by swapping the two involved actions (setting another cpu and
>> removing the unit from the scheduler).
>>
>> Cc: Henry Wang <Henry.Wang@arm.com>
> 
> Emmm, I think ^ this CC is better to me moved to the scissors line, otherwise
> if this patch is committed, this line will be shown in the commit message...
> 
>> Fixes: 70fadc41635b ("xen/cpupool: support moving domain between cpupools with different granularity")
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
>> This fixes a regression introduced in Xen 4.15. The fix is very simple
>> and it will affect only configurations with multiple cpupools. I think
>> whether to include it in 4.18 should be decided by the release manager
>> based on the current state of the release (I think I wouldn't have
>> added it that late in the release while being the release manager).
> 
> Thanks for the reminder :)
> 
> Please correct me if I am wrong, if this is fixing the regression introduced in
> 4.15, shouldn’t this patch being backported to 4.15, 4.16, 4.17 and soon
> 4.18? So honestly I think at least for 4.18 either add this patch now or
> later won’t make much difference…I am ok either way I guess.

You are right, the patch needs to be backported.

OTOH nobody noticed the regression until a few days ago, so delaying the fix
for a few weeks would probably not hurt too many people. Every patch changing
code risks introducing another regression, so it's your decision whether
you want to take that risk. Especially changes like this one, touching the
core scheduling code, always carry a latent risk of opening up a small race
window (this could be said for the original patch I'm fixing here, too :-) ).


Juergen
Henry Wang Oct. 19, 2023, 11:49 a.m. UTC | #3
Hi Juergen,

> On Oct 19, 2023, at 19:38, Juergen Gross <jgross@suse.com> wrote:
> 
> On 19.10.23 13:31, Henry Wang wrote:
>> Hi Juergen,
>>> On Oct 19, 2023, at 19:23, Juergen Gross <jgross@suse.com> wrote:
>>> 
>>> When moving a domain out of a cpupool running with the credit2
>>> scheduler and having multiple run-queues, the following ASSERT() can
>>> be observed:
>>> 
>>> (XEN) Xen call trace:
>>> (XEN)    [<ffff82d04023a700>] R credit2.c#csched2_unit_remove+0xe3/0xe7
>>> (XEN)    [<ffff82d040246adb>] S sched_move_domain+0x2f3/0x5b1
>>> (XEN)    [<ffff82d040234cf7>] S cpupool.c#cpupool_move_domain_locked+0x1d/0x3b
>>> (XEN)    [<ffff82d040236025>] S cpupool_move_domain+0x24/0x35
>>> (XEN)    [<ffff82d040206513>] S domain_kill+0xa5/0x116
>>> (XEN)    [<ffff82d040232b12>] S do_domctl+0xe5f/0x1951
>>> (XEN)    [<ffff82d0402276ba>] S timer.c#timer_lock+0x69/0x143
>>> (XEN)    [<ffff82d0402dc71b>] S pv_hypercall+0x44e/0x4a9
>>> (XEN)    [<ffff82d0402012b7>] S lstar_enter+0x137/0x140
>>> (XEN)
>>> (XEN)
>>> (XEN) ****************************************
>>> (XEN) Panic on CPU 1:
>>> (XEN) Assertion 'svc->rqd == c2rqd(sched_unit_master(unit))' failed at common/sched/credit2.c:1159
>>> (XEN) ****************************************
>>> 
>>> This is happening as sched_move_domain() is setting a different cpu
>>> for a scheduling unit without telling the scheduler. When this unit is
>>> removed from the scheduler, the ASSERT() will trigger.
>>> 
>>> In non-debug builds the result is usually a clobbered pointer, leading
>>> to another crash a short time later.
>>> 
>>> Fix that by swapping the two involved actions (setting another cpu and
>>> removing the unit from the scheduler).
>>> 
>>> Cc: Henry Wang <Henry.Wang@arm.com>
>> Emmm, I think ^ this CC is better to me moved to the scissors line, otherwise
>> if this patch is committed, this line will be shown in the commit message...
>>> Fixes: 70fadc41635b ("xen/cpupool: support moving domain between cpupools with different granularity")
>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>> ---
>>> This fixes a regression introduced in Xen 4.15. The fix is very simple
>>> and it will affect only configurations with multiple cpupools. I think
>>> whether to include it in 4.18 should be decided by the release manager
>>> based on the current state of the release (I think I wouldn't have
>>> added it that late in the release while being the release manager).
>> Thanks for the reminder :)
>> Please correct me if I am wrong, if this is fixing the regression introduced in
>> 4.15, shouldn’t this patch being backported to 4.15, 4.16, 4.17 and soon
>> 4.18? So honestly I think at least for 4.18 either add this patch now or
>> later won’t make much difference…I am ok either way I guess.
> 
> You are right, the patch needs to be backported.
> 
> OTOH nobody noticed the regression until a few days ago. So delaying the fix
> for a few weeks would probably not hurt too many people.

I am planning to branch next week, so I would say this patch will probably be
delayed by one week at most, I guess.

> Each patch changing
> code is a risk to introduce another regression, so its your decision whether
> you want to take this risk. Especially changes like this one touching the
> core scheduling code always have a latent risk to open up a small race window
> (this could be said for the original patch I'm fixing here, too :-) ).

That said, given that this patch fixes a regression introduced years ago and
will be backported to the stable releases, I am leaning more towards merging
it when the tree reopens (unless others strongly object), so that the
patch will get the opportunity to be tested properly and we won’t take too much
risk of delaying the 4.18 release. Also, this decision is consistent with your
(an ex release manager's) words below the scissors line :)

Hopefully you will also be ok with that.

Kind regards,
Henry

> 
> 
> Juergen
Jürgen Groß Oct. 19, 2023, 12:03 p.m. UTC | #4
On 19.10.23 13:49, Henry Wang wrote:
> Hi Juergen,
> 
>> On Oct 19, 2023, at 19:38, Juergen Gross <jgross@suse.com> wrote:
>>
>> On 19.10.23 13:31, Henry Wang wrote:
>>> Hi Juergen,
>>>> On Oct 19, 2023, at 19:23, Juergen Gross <jgross@suse.com> wrote:
>>>>
>>>> When moving a domain out of a cpupool running with the credit2
>>>> scheduler and having multiple run-queues, the following ASSERT() can
>>>> be observed:
>>>>
>>>> (XEN) Xen call trace:
>>>> (XEN)    [<ffff82d04023a700>] R credit2.c#csched2_unit_remove+0xe3/0xe7
>>>> (XEN)    [<ffff82d040246adb>] S sched_move_domain+0x2f3/0x5b1
>>>> (XEN)    [<ffff82d040234cf7>] S cpupool.c#cpupool_move_domain_locked+0x1d/0x3b
>>>> (XEN)    [<ffff82d040236025>] S cpupool_move_domain+0x24/0x35
>>>> (XEN)    [<ffff82d040206513>] S domain_kill+0xa5/0x116
>>>> (XEN)    [<ffff82d040232b12>] S do_domctl+0xe5f/0x1951
>>>> (XEN)    [<ffff82d0402276ba>] S timer.c#timer_lock+0x69/0x143
>>>> (XEN)    [<ffff82d0402dc71b>] S pv_hypercall+0x44e/0x4a9
>>>> (XEN)    [<ffff82d0402012b7>] S lstar_enter+0x137/0x140
>>>> (XEN)
>>>> (XEN)
>>>> (XEN) ****************************************
>>>> (XEN) Panic on CPU 1:
>>>> (XEN) Assertion 'svc->rqd == c2rqd(sched_unit_master(unit))' failed at common/sched/credit2.c:1159
>>>> (XEN) ****************************************
>>>>
>>>> This is happening as sched_move_domain() is setting a different cpu
>>>> for a scheduling unit without telling the scheduler. When this unit is
>>>> removed from the scheduler, the ASSERT() will trigger.
>>>>
>>>> In non-debug builds the result is usually a clobbered pointer, leading
>>>> to another crash a short time later.
>>>>
>>>> Fix that by swapping the two involved actions (setting another cpu and
>>>> removing the unit from the scheduler).
>>>>
>>>> Cc: Henry Wang <Henry.Wang@arm.com>
>>> Emmm, I think ^ this CC is better to me moved to the scissors line, otherwise
>>> if this patch is committed, this line will be shown in the commit message...
>>>> Fixes: 70fadc41635b ("xen/cpupool: support moving domain between cpupools with different granularity")
>>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>>> ---
>>>> This fixes a regression introduced in Xen 4.15. The fix is very simple
>>>> and it will affect only configurations with multiple cpupools. I think
>>>> whether to include it in 4.18 should be decided by the release manager
>>>> based on the current state of the release (I think I wouldn't have
>>>> added it that late in the release while being the release manager).
>>> Thanks for the reminder :)
>>> Please correct me if I am wrong, if this is fixing the regression introduced in
>>> 4.15, shouldn’t this patch being backported to 4.15, 4.16, 4.17 and soon
>>> 4.18? So honestly I think at least for 4.18 either add this patch now or
>>> later won’t make much difference…I am ok either way I guess.
>>
>> You are right, the patch needs to be backported.
>>
>> OTOH nobody noticed the regression until a few days ago. So delaying the fix
>> for a few weeks would probably not hurt too many people.
> 
> I am planning to branch next week, so I would say this patch probably will be
> delayed max 1 week I guess.
> 
>> Each patch changing
>> code is a risk to introduce another regression, so its your decision whether
>> you want to take this risk. Especially changes like this one touching the
>> core scheduling code always have a latent risk to open up a small race window
>> (this could be said for the original patch I'm fixing here, too :-) ).
> 
> That said, given the fact that this patch is fixing a regression years ago and
> will be backported to the stable releases, I am more leaning towards merging
> this patch when the tree reopens (unless others strongly object), so that this
> patch will get the opportunity to be tested properly, and we won’t take too much
> risk to delay the 4.18 release. Also this decision is consistent with your (an
> ex release manager) above words in the scissors line :)
> 
> Hopefully you will also be ok with that.

Of course I am.


Juergen
Andrew Cooper Oct. 20, 2023, 6:03 p.m. UTC | #5
On 19/10/2023 12:23 pm, Juergen Gross wrote:
> When moving a domain out of a cpupool running with the credit2
> scheduler and having multiple run-queues, the following ASSERT() can
> be observed:
>
> (XEN) Xen call trace:
> (XEN)    [<ffff82d04023a700>] R credit2.c#csched2_unit_remove+0xe3/0xe7
> (XEN)    [<ffff82d040246adb>] S sched_move_domain+0x2f3/0x5b1
> (XEN)    [<ffff82d040234cf7>] S cpupool.c#cpupool_move_domain_locked+0x1d/0x3b
> (XEN)    [<ffff82d040236025>] S cpupool_move_domain+0x24/0x35
> (XEN)    [<ffff82d040206513>] S domain_kill+0xa5/0x116
> (XEN)    [<ffff82d040232b12>] S do_domctl+0xe5f/0x1951
> (XEN)    [<ffff82d0402276ba>] S timer.c#timer_lock+0x69/0x143
> (XEN)    [<ffff82d0402dc71b>] S pv_hypercall+0x44e/0x4a9
> (XEN)    [<ffff82d0402012b7>] S lstar_enter+0x137/0x140
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 1:
> (XEN) Assertion 'svc->rqd == c2rqd(sched_unit_master(unit))' failed at common/sched/credit2.c:1159
> (XEN) ****************************************
>
> This is happening as sched_move_domain() is setting a different cpu
> for a scheduling unit without telling the scheduler. When this unit is
> removed from the scheduler, the ASSERT() will trigger.
>
> In non-debug builds the result is usually a clobbered pointer, leading
> to another crash a short time later.
>
> Fix that by swapping the two involved actions (setting another cpu and
> removing the unit from the scheduler).
>
> Cc: Henry Wang <Henry.Wang@arm.com>
> Fixes: 70fadc41635b ("xen/cpupool: support moving domain between cpupools with different granularity")

Link: https://github.com/Dasharo/dasharo-issues/issues/488

And a Reported/Tested-by if the discoverer wishes.

> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
> This fixes a regression introduced in Xen 4.15. The fix is very simple
> and it will affect only configurations with multiple cpupools. I think
> whether to include it in 4.18 should be decided by the release manager
> based on the current state of the release (I think I wouldn't have
> added it that late in the release while being the release manager).
> ---
>  xen/common/sched/core.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
> index 12deefa745..e9f7486197 100644
> --- a/xen/common/sched/core.c
> +++ b/xen/common/sched/core.c
> @@ -738,12 +738,13 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
>      new_p = cpumask_first(d->cpupool->cpu_valid);
>      for_each_sched_unit ( d, unit )
>      {
> -        spinlock_t *lock = unit_schedule_lock_irq(unit);
> +        spinlock_t *lock;
> +
> +        sched_remove_unit(old_ops, unit);
>  
> +        lock = unit_schedule_lock_irq(unit);
>          sched_set_res(unit, get_sched_res(new_p));
>          spin_unlock_irq(lock);
> -
> -        sched_remove_unit(old_ops, unit);

I'm happy to see the T-by, and that you know what's going on here, but I
don't understand the explanation.

The change here is the ordering of sched_remove_unit() with respect to
the lock/get&set/unlock block.

remove unit is moderately complicated, but the get&set is just an RCU
pointer assignment.  But as the assertion fires in the latter action, it
must be the get&set causing problems.

And that's because new_p is actually a cpu number, which has the
consequence of causing Credit2's c2rqd() to materialise the wrong
csched2_runqueue_data pointer, and then we're operating on someone
else's data without a suitable lock held.

And it's only by luck that none of the other schedulers tie something to
per-cpu data like this?


Other observations.

I think sched_move_domain() would be easier to follow with
s/new_p/new_cpu/ (and similar for unit_p) as "p" is not obviously
processor unless you notice the cpumask_first() calls.

Why do we move a domain back to cpupool0 on domain destroy?  There's
nothing magic about cpupool0 that I'm aware of here.  It seems like
unnecessary complexity.

~Andrew
Jürgen Groß Oct. 23, 2023, 6:31 a.m. UTC | #6
On 20.10.23 20:03, Andrew Cooper wrote:
> On 19/10/2023 12:23 pm, Juergen Gross wrote:
>> When moving a domain out of a cpupool running with the credit2
>> scheduler and having multiple run-queues, the following ASSERT() can
>> be observed:
>>
>> (XEN) Xen call trace:
>> (XEN)    [<ffff82d04023a700>] R credit2.c#csched2_unit_remove+0xe3/0xe7
>> (XEN)    [<ffff82d040246adb>] S sched_move_domain+0x2f3/0x5b1
>> (XEN)    [<ffff82d040234cf7>] S cpupool.c#cpupool_move_domain_locked+0x1d/0x3b
>> (XEN)    [<ffff82d040236025>] S cpupool_move_domain+0x24/0x35
>> (XEN)    [<ffff82d040206513>] S domain_kill+0xa5/0x116
>> (XEN)    [<ffff82d040232b12>] S do_domctl+0xe5f/0x1951
>> (XEN)    [<ffff82d0402276ba>] S timer.c#timer_lock+0x69/0x143
>> (XEN)    [<ffff82d0402dc71b>] S pv_hypercall+0x44e/0x4a9
>> (XEN)    [<ffff82d0402012b7>] S lstar_enter+0x137/0x140
>> (XEN)
>> (XEN)
>> (XEN) ****************************************
>> (XEN) Panic on CPU 1:
>> (XEN) Assertion 'svc->rqd == c2rqd(sched_unit_master(unit))' failed at common/sched/credit2.c:1159
>> (XEN) ****************************************
>>
>> This is happening as sched_move_domain() is setting a different cpu
>> for a scheduling unit without telling the scheduler. When this unit is
>> removed from the scheduler, the ASSERT() will trigger.
>>
>> In non-debug builds the result is usually a clobbered pointer, leading
>> to another crash a short time later.
>>
>> Fix that by swapping the two involved actions (setting another cpu and
>> removing the unit from the scheduler).
>>
>> Cc: Henry Wang <Henry.Wang@arm.com>
>> Fixes: 70fadc41635b ("xen/cpupool: support moving domain between cpupools with different granularity")
> 
> Link: https://github.com/Dasharo/dasharo-issues/issues/488
> 
> And a Reported/Tested-by if the discoverer wishes.
> 
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
>> This fixes a regression introduced in Xen 4.15. The fix is very simple
>> and it will affect only configurations with multiple cpupools. I think
>> whether to include it in 4.18 should be decided by the release manager
>> based on the current state of the release (I think I wouldn't have
>> added it that late in the release while being the release manager).
>> ---
>>   xen/common/sched/core.c | 7 ++++---
>>   1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
>> index 12deefa745..e9f7486197 100644
>> --- a/xen/common/sched/core.c
>> +++ b/xen/common/sched/core.c
>> @@ -738,12 +738,13 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
>>       new_p = cpumask_first(d->cpupool->cpu_valid);
>>       for_each_sched_unit ( d, unit )
>>       {
>> -        spinlock_t *lock = unit_schedule_lock_irq(unit);
>> +        spinlock_t *lock;
>> +
>> +        sched_remove_unit(old_ops, unit);
>>   
>> +        lock = unit_schedule_lock_irq(unit);
>>           sched_set_res(unit, get_sched_res(new_p));
>>           spin_unlock_irq(lock);
>> -
>> -        sched_remove_unit(old_ops, unit);
> 
> I'm happy to see the T-by, and that you know what's going on here, but I
> don't understand the explanation.
> 
> The change here is the ordering of sched_remove_unit() with respect to
> the lock/get&set/unlock block.

Yes.

> remove unit is moderately complicated, but the get&set is just an RCU
> pointer assignment.  But as the assertion fires in the latter action, it
> must be the get&set causing problems.

Yes. As I wrote, it is because the cpu assignment of the unit is being
changed without telling the specific scheduler (credit2) about it.

> And that's because new_p is actually a cpu number, which has the
> consequence of causing Credit2's c2rqd() to materialise the wrong
> csched2_runqueue_data pointer, and then we're operating on someone
> else's data without a suitable lock held.

I haven't looked into the exact details of what went wrong further down the
road, but I think it is not only the missing lock, but probably also the
fact that, depending on the use case, different run-queues will be addressed
for the same unit, causing inconsistencies.

> And it's only by luck that none of the other schedulers tie something to
> per-cpu data like this?

I didn't look into the other schedulers.

> Other observations.
> 
> I think sched_move_domain() would be easier to follow with
> s/new_p/new_cpu/ (and similar for unit_p) as "p" is not obviously
> processor unless you notice the cpumask_first() calls.

Maybe. I don't have that problem, OTOH I've written that code. :-D

> Why do we move a domain back to cpupool0 on domain destroy?  There's
> nothing magic about cpupool0 that I'm aware of here.  It seems like
> unnecessary complexity.

See commit bac6334b51d9.


Juergen

Patch

diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
index 12deefa745..e9f7486197 100644
--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -738,12 +738,13 @@  int sched_move_domain(struct domain *d, struct cpupool *c)
     new_p = cpumask_first(d->cpupool->cpu_valid);
     for_each_sched_unit ( d, unit )
     {
-        spinlock_t *lock = unit_schedule_lock_irq(unit);
+        spinlock_t *lock;
+
+        sched_remove_unit(old_ops, unit);
 
+        lock = unit_schedule_lock_irq(unit);
         sched_set_res(unit, get_sched_res(new_p));
         spin_unlock_irq(lock);
-
-        sched_remove_unit(old_ops, unit);
     }
 
     old_units = d->sched_unit_list;