
[2/3] xen: Have schedulers revise initial placement

Message ID 1468605722-24239-2-git-send-email-george.dunlap@citrix.com (mailing list archive)
State New, archived

Commit Message

George Dunlap July 15, 2016, 6:02 p.m. UTC
The generic domain creation logic in
xen/common/domctl.c:default_vcpu0_location() attempts to do initial
placement load-balancing by placing vcpu 0 on the least-busy
non-primary hyperthread available.  Unfortunately, the logic can end
up picking a pcpu that's not in the online mask.  When this is passed
to a scheduler which assumes that the initial assignment is valid, it
causes a null pointer dereference when looking up the runqueue.

Furthermore, this initial placement doesn't take into account hard or
soft affinity, or any scheduler-specific knowledge (such as historic
runqueue load, as in credit2).

To solve this, when inserting a vcpu, always call the per-scheduler
"pick" function to revise the initial placement.  This will
automatically take all knowledge the scheduler has into account.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
---
CC: Dario Faggioli <dario.faggioli@citrix.com>
CC: Anshul Makkar <anshul.makkar@citrix.com>
CC: Meng Xu <mengxu@cis.upenn.edu>
CC: Jan Beulich <jbeulich@suse.com>
---
 xen/common/sched_credit.c  |  3 +++
 xen/common/sched_credit2.c | 12 ++++++++++--
 xen/common/sched_rt.c      |  3 +++
 3 files changed, 16 insertions(+), 2 deletions(-)

Comments

Andrew Cooper July 15, 2016, 6:07 p.m. UTC | #1
On 15/07/16 19:02, George Dunlap wrote:
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index 3b9aa27..5a04985 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -1620,15 +1620,23 @@ csched2_vcpu_insert(const struct scheduler *ops, struct vcpu *vc)
>  
>      BUG_ON(is_idle_vcpu(vc));
>  
> +    /* Locks in cpu_pick expect irqs to be disabled */
> +    local_irq_disable();

This doesn't make the problem much worse, but is there a plan to fix
this issue?

None of the scheduler-accounting functions should be disabling interrupts.

~Andrew
Dario Faggioli July 16, 2016, 2:12 p.m. UTC | #2
On Fri, 2016-07-15 at 19:07 +0100, Andrew Cooper wrote:
> On 15/07/16 19:02, George Dunlap wrote:
> > 
> > diff --git a/xen/common/sched_credit2.c
> > b/xen/common/sched_credit2.c
> > index 3b9aa27..5a04985 100644
> > --- a/xen/common/sched_credit2.c
> > +++ b/xen/common/sched_credit2.c
> > @@ -1620,15 +1620,23 @@ csched2_vcpu_insert(const struct scheduler
> > *ops, struct vcpu *vc)
> >  
> >      BUG_ON(is_idle_vcpu(vc));
> >  
> > +    /* Locks in cpu_pick expect irqs to be disabled */
> > +    local_irq_disable();
> This doesn't make the problem much worse, but is there a plan to fix
> this issue?
> 
There's a little bit more than a plan.

I've got a proof of concept implementation which was working (now I
need to refresh it), but for which I never really managed to evaluate
the performance impact as accurately as I wanted to.

In fact, I have a couple of variants implemented, which I was
comparing against each other as well as against 'vanilla'. The
problem was that I was not seeing any impact at all, which looked
strange (I was expecting an improvement, at least on some workloads),
and I wanted to investigate further.

I'm leaving here links to two branches where I stashed some of the
code I have come up with so far. As I said, it's WIP and needs
refreshing and reworking.

> None of the scheduler-accounting functions should be disabling
> interrupts.
> 
They don't. But you can't keep IRQs disabled for some operations and
enabled for others on the same lock (because of the irq-safety
spinlock checks/enforcement).

So you have to always keep IRQs enabled for all scheduling operations,
which is OK for _almost_ all of them, with the only exception being
the wakeup of a vcpu.

So the idea was to treat that one case specially, i.e., put the waking
vcpus in a queue, and then drain the queue somehow. The insertion into
the queue needs to be done with interrupts disabled, but the draining
--which is where the actual scheduling-related hooks and operations are
done-- can be done with IRQs on, which is what we want.
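
Just to make the shape of the idea concrete, here is a minimal
user-space C sketch of that split (the names -- defer_wake(),
drain_wakeups(), wake_queue -- are made up for illustration, and the
IRQ/softirq handling is only indicated in comments; this is not the
code in the branches below):

#include <stdio.h>
#include <stdlib.h>

struct vcpu {
    int id;
    struct vcpu *next;
};

static struct vcpu *wake_queue;   /* one queue per pCPU in a real implementation */

/* Runs in "interrupt context": keep it short, take no scheduler locks. */
static void defer_wake(struct vcpu *v)
{
    /* In the hypervisor this would run with interrupts disabled
     * (e.g. under local_irq_save()). */
    v->next = wake_queue;
    wake_queue = v;
    /* ...and then raise the scheduling softirq to trigger the drain. */
}

/* Runs in softirq context with IRQs enabled: do the real wakeup work. */
static void drain_wakeups(void)
{
    struct vcpu *v = wake_queue;

    wake_queue = NULL;
    while ( v != NULL )
    {
        struct vcpu *next = v->next;

        printf("waking vcpu %d with IRQs enabled\n", v->id);
        free(v);
        v = next;
    }
}

int main(void)
{
    for ( int i = 0; i < 3; i++ )
    {
        struct vcpu *v = malloc(sizeof(*v));

        v->id = i;
        defer_wake(v);
    }

    drain_wakeups();

    return 0;
}

In the hypervisor, defer_wake() would be the only part running with
interrupts disabled, while drain_wakeups() would run from the
scheduling softirq and could take the scheduler locks with IRQs
enabled.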

What I was experimenting with was different ways of managing such a
queue, e.g., only one queue for all CPUs or per-CPU queues; whether to
always drain the queue completely or only pick a couple of vcpus and
defer the rest again; whether to allow concurrent draining of the
queue, or only have one CPU (at a time) doing that; etc.

The "1 queue for all" and "per-CPU queues" is what's in the following
two branches:

git://xenbits.xen.org/people/dariof/xen.git  wip/sched/irq-enabled
http://xenbits.xen.org/gitweb/?p=people/dariof/xen.git;a=shortlog;h=refs/heads/wip/sched/irq-enabled

git://xenbits.xen.org/people/dariof/xen.git  wip/sched/irq-enabled-percpu
http://xenbits.xen.org/gitweb/?p=people/dariof/xen.git;a=shortlog;h=refs/heads/wip/sched/irq-enabled-percpu

I'll get back to this soon. In the meantime, feel free to comment,
toss ideas, criticize, whatever. :-D

Regards,
Dario
Dario Faggioli July 18, 2016, 10:28 a.m. UTC | #3
On Fri, 2016-07-15 at 19:02 +0100, George Dunlap wrote:
> The generic domain creation logic in
> xen/common/domctl.c:default_vcpu0_location() attempts to try to do
> initial placement load-balancing by placing vcpu 0 on the least-busy
> non-primary hyperthread available.  Unfortunately, the logic can end
> up picking a pcpu that's not in the online mask.  When this is passed
> to a scheduler which assumes that the initial assignment is
> valid, it causes a null pointer dereference looking up the runqueue.
> 
> Furthermore, this initial placement doesn't take into account hard or
> soft affinity, or any scheduler-specific knowledge (such as historic
> runqueue load, as in credit2).
> 
> To solve this, when inserting a vcpu, always call the per-scheduler
> "pick" function to revise the initial placement.  This will
> automatically take all knowledge the scheduler has into account.
> 
> Signed-off-by: George Dunlap <george.dunlap@citrix.com>
> ---
> CC: Dario Faggioli <dario.faggioli@citrix.com>
> CC: Anshul Makkar <anshul.makkar@citrix.com>
> CC: Meng Xu <mengxu@cis.upenn.edu>
> CC: Jan Beulich <jbeulich@suse.com>
>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>

Regards,
Dario
Andrew Cooper July 18, 2016, 6:10 p.m. UTC | #4
On 16/07/16 15:12, Dario Faggioli wrote:
> On Fri, 2016-07-15 at 19:07 +0100, Andrew Cooper wrote:
>
>> None of the scheduler-accounting functions should be disabling
>> interrupts.
>>
> They don't. But you can't keep irq disabled for some operations and
> enabled for others, on the same lock (because of the irq-safety
> spinlock checks/enforcement).
>
> So you have to always keep IRQ enabled, for all scheduling operations,
> which is ok for _almost_ all of them, with the only exception of the
> wakeup of a vcpu.

I know that it is all or nothing.  What specific action about waking a
vcpu requires holding a lock?

If it is simply re-queueing the vcpu onto the runnable queue, there are a
number of lockless queuing algorithms which can be used.

~Andrew
Dario Faggioli July 18, 2016, 6:55 p.m. UTC | #5
On Mon, 2016-07-18 at 19:10 +0100, Andrew Cooper wrote:
> On 16/07/16 15:12, Dario Faggioli wrote:
> > On Fri, 2016-07-15 at 19:07 +0100, Andrew Cooper wrote:
> > So you have to always keep IRQ enabled, for all scheduling
> > operations,
> > which is ok for _almost_ all of them, with the only exception of
> > the
> > wakeup of a vcpu.
> I know that it is all or nothing.  What specific action about waking
> a
> vcpu requires holding a lock?
> 
> If it is simply re-queueing the vcpu onto the runable queue, there
> are a
> number of lockless queuing algorithms which can be used.
> 
Yes, it's vcpu_wake() that does vcpu_schedule_lock_irqsave() before,
among other things, calling SCHED_OP(wake, v).

What the implementation of that hook does is scheduler dependent. In
most cases the core of it is, as you say, enqueueing the vcpu, but that
may not be all.

For instance, something that all the schedulers do after queueing is
tickle pCPUs to have them pick up the queued work, and that also
requires serialization to be correct and effective (i.e., "just"
mandating that runqueues be lockless would not be enough).

> ~Andrew
Andrew Cooper July 18, 2016, 9:36 p.m. UTC | #6
On 18/07/2016 19:55, Dario Faggioli wrote:
> On Mon, 2016-07-18 at 19:10 +0100, Andrew Cooper wrote:
>> On 16/07/16 15:12, Dario Faggioli wrote:
>>> On Fri, 2016-07-15 at 19:07 +0100, Andrew Cooper wrote:
>>> So you have to always keep IRQ enabled, for all scheduling
>>> operations,
>>> which is ok for _almost_ all of them, with the only exception of
>>> the
>>> wakeup of a vcpu.
>> I know that it is all or nothing.  What specific action about waking
>> a
>> vcpu requires holding a lock?
>>
>> If it is simply re-queueing the vcpu onto the runable queue, there
>> are a
>> number of lockless queuing algorithms which can be used.
>>
> Yes, it's vcpu_wake() that does vcpu_schedule_lock_irqsave() before,
> among other things, calling SCHED_OP(wake, v).

Right - this looks easy to fix.  Use a per-pcpu singly-linked list,
which can be mutated safely with cmpxchg() rather than locks, have
vcpu_wake() add v to the list, and raise a SCHEDULE_SOFTIRQ (possibly
a new SCHEDULE_WAKE_SOFTIRQ with higher priority than
SCHEDULE_SOFTIRQ).

Then the schedule softirq can take the vcpu schedule lock without
disabling interrupts and run the internals of what is currently
vcpu_wake().  The current content of vcpu_wake() is very large for
something typically executed from interrupt context.
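
A rough user-space sketch of that push/detach scheme, using C11
atomics (the names and structure here are purely illustrative, not the
Xen API):

#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

struct wake_node {
    int vcpu_id;
    struct wake_node *next;
};

/* One such head per pCPU in a real implementation. */
static _Atomic(struct wake_node *) wake_list;

/* Lock-free push: safe to call from interrupt context. */
static void wake_push(struct wake_node *n)
{
    struct wake_node *old = atomic_load(&wake_list);

    do {
        n->next = old;
    } while ( !atomic_compare_exchange_weak(&wake_list, &old, n) );
}

/* Detach the whole list in one atomic swap; the caller (the softirq
 * handler) then walks it with IRQs enabled and can take the scheduler
 * lock normally. */
static struct wake_node *wake_detach_all(void)
{
    return atomic_exchange(&wake_list, NULL);
}

int main(void)
{
    for ( int i = 0; i < 4; i++ )
    {
        struct wake_node *n = malloc(sizeof(*n));

        n->vcpu_id = i;
        wake_push(n);
    }

    for ( struct wake_node *n = wake_detach_all(); n != NULL; )
    {
        struct wake_node *next = n->next;

        printf("softirq wakes vcpu %d\n", n->vcpu_id);
        free(n);
        n = next;
    }

    return 0;
}

Note the detached list comes out in LIFO order; whether that needs
reversing before the wakeups are processed is a detail for the real
implementation.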

~Andrew
Dario Faggioli July 19, 2016, 7:14 a.m. UTC | #7
On Mon, 2016-07-18 at 22:36 +0100, Andrew Cooper wrote:
> On 18/07/2016 19:55, Dario Faggioli wrote:
> > 
> > On Mon, 2016-07-18 at 19:10 +0100, Andrew Cooper wrote:
> > > 
> > > On 16/07/16 15:12, Dario Faggioli wrote:
> > > > 
> > > > On Fri, 2016-07-15 at 19:07 +0100, Andrew Cooper wrote:
> > > > So you have to always keep IRQ enabled, for all scheduling
> > > > operations,
> > > > which is ok for _almost_ all of them, with the only exception
> > > > of
> > > > the
> > > > wakeup of a vcpu.
> > > I know that it is all or nothing.  What specific action about
> > > waking
> > > a
> > > vcpu requires holding a lock?
> > > 
> > > If it is simply re-queueing the vcpu onto the runable queue,
> > > there
> > > are a
> > > number of lockless queuing algorithms which can be used.
> > > 
> > Yes, it's vcpu_wake() that does vcpu_schedule_lock_irqsave()
> > before,
> > among other things, calling SCHED_OP(wake, v).
> Right - this looks easy to fix.  Use a per-pcpu single linked list,
> which can be mutated safely with cmpxchg() rather than locks, have
> vcpu_wake() add v to the linked list, and schedule itself a
> SCHEDULE_SOFTIRQ, (possibly a new SCHEDULE_WAKE_SOFTIRQ with higher
> priority than SCHEDULE_SOFTIRQ).
> 
That's exactly what I'm doing in one of the variants I've implemented
so far (with a dedicated lock, rather than cmpxchg, but I was indeed
looking at removing that):

http://xenbits.xen.org/gitweb/?p=people/dariof/xen.git;a=blobdiff;f=xen/include/xen/softirq.h;h=e62c247fdc41dd4e36ceb5f8094647fa62530c56;hp=0895a162031b64ad6acff6be9d17707dad7a89bf;hb=431423e69c9fe4503d26bd4c657699284e0223b7;hpb=a282bcd6f7751d2ff428ca6c1e14120aa6fcc5fc

:-)

> Then in the schedule softirq can take the vcpu schedule lock without
> disabling interrupts and run the internals of what is currently
> vcpu_wake().  The current content of vcpu_wake() very large for what
> is
> typically executed from interrupt context.
> 
Totally agree.

Something I was reasoning about, and trying to assess by means of
benchmarks, is, for instance, whether, when taking the vcpus out of the
queue and waking them, to always do that "to completion" (considering
that new vcpus may well be added to the queue itself while I'm trying
to drain it), or in batches (after each batch, let Xen do something
else and re-trigger the softirq).

And this (and other similar "subtleties") is where the numbers I could
get were not conclusive.
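
For concreteness, a minimal sketch of the batched-draining variant
(hypothetical names and a stubbed raise_softirq(); this is not the
code from the branches above):

#include <stdbool.h>
#include <stdio.h>

#define WAKE_BATCH 2

static int pending_wakeups = 5;   /* stand-in for the real wakeup queue */

static void raise_softirq(void)
{
    printf("re-raising SCHEDULE_SOFTIRQ\n");
}

/* Drain at most WAKE_BATCH wakeups; return true if work was deferred. */
static bool drain_wakeups_batched(void)
{
    int done = 0;

    while ( pending_wakeups > 0 && done < WAKE_BATCH )
    {
        printf("waking one vcpu\n");
        pending_wakeups--;
        done++;
    }

    if ( pending_wakeups > 0 )
    {
        raise_softirq();
        return true;
    }

    return false;
}

int main(void)
{
    /* In Xen the softirq machinery would invoke this again after the
     * re-raise; here we just loop until the queue is empty. */
    while ( drain_wakeups_batched() )
        ;

    return 0;
}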

I'll dust off the code and let you know. :-)

Regards,
Dario
George Dunlap July 25, 2016, 11:17 a.m. UTC | #8
On 18/07/16 11:28, Dario Faggioli wrote:
> On Fri, 2016-07-15 at 19:02 +0100, George Dunlap wrote:
>> The generic domain creation logic in
>> xen/common/domctl.c:default_vcpu0_location() attempts to try to do
>> initial placement load-balancing by placing vcpu 0 on the least-busy
>> non-primary hyperthread available.  Unfortunately, the logic can end
>> up picking a pcpu that's not in the online mask.  When this is passed
>> to a scheduler which assumes that the initial assignment is
>> valid, it causes a null pointer dereference looking up the runqueue.
>>
>> Furthermore, this initial placement doesn't take into account hard or
>> soft affinity, or any scheduler-specific knowledge (such as historic
>> runqueue load, as in credit2).
>>
>> To solve this, when inserting a vcpu, always call the per-scheduler
>> "pick" function to revise the initial placement.  This will
>> automatically take all knowledge the scheduler has into account.
>>
>> Signed-off-by: George Dunlap <george.dunlap@citrix.com>
>> ---
>> CC: Dario Faggioli <dario.faggioli@citrix.com>
>> CC: Anshul Makkar <anshul.makkar@citrix.com>
>> CC: Meng Xu <mengxu@cis.upenn.edu>
>> CC: Jan Beulich <jbeulich@suse.com>
>>
> Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>

Just to clarify - does this cover the sched_rt.c changes as well?

Thanks,
 -George
Meng Xu July 25, 2016, 2:35 p.m. UTC | #9
On Fri, Jul 15, 2016 at 2:02 PM, George Dunlap <george.dunlap@citrix.com> wrote:
>
> The generic domain creation logic in
> xen/common/domctl.c:default_vcpu0_location() attempts to try to do
> initial placement load-balancing by placing vcpu 0 on the least-busy
> non-primary hyperthread available.  Unfortunately, the logic can end
> up picking a pcpu that's not in the online mask.  When this is passed
> to a scheduler which assumes that the initial assignment is
> valid, it causes a null pointer dereference looking up the runqueue.
>
> Furthermore, this initial placement doesn't take into account hard or
> soft affinity, or any scheduler-specific knowledge (such as historic
> runqueue load, as in credit2).
>
> To solve this, when inserting a vcpu, always call the per-scheduler
> "pick" function to revise the initial placement.  This will
> automatically take all knowledge the scheduler has into account.
>
> Signed-off-by: George Dunlap <george.dunlap@citrix.com>
> ---
> CC: Dario Faggioli <dario.faggioli@citrix.com>
> CC: Anshul Makkar <anshul.makkar@citrix.com>
> CC: Meng Xu <mengxu@cis.upenn.edu>
> CC: Jan Beulich <jbeulich@suse.com>
> ---
>  xen/common/sched_credit.c  |  3 +++
>  xen/common/sched_credit2.c | 12 ++++++++++--
>  xen/common/sched_rt.c      |  3 +++
>  3 files changed, 16 insertions(+), 2 deletions(-)


As for xen/common/sched_rt.c,

Reviewed-by: Meng Xu <mengxu@cis.upenn.edu>

------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/
Meng Xu July 25, 2016, 2:36 p.m. UTC | #10
On Mon, Jul 25, 2016 at 7:17 AM, George Dunlap <george.dunlap@citrix.com> wrote:
> On 18/07/16 11:28, Dario Faggioli wrote:
>> On Fri, 2016-07-15 at 19:02 +0100, George Dunlap wrote:
>>> The generic domain creation logic in
>>> xen/common/domctl.c:default_vcpu0_location() attempts to try to do
>>> initial placement load-balancing by placing vcpu 0 on the least-busy
>>> non-primary hyperthread available.  Unfortunately, the logic can end
>>> up picking a pcpu that's not in the online mask.  When this is passed
>>> to a scheduler which assumes that the initial assignment is
>>> valid, it causes a null pointer dereference looking up the runqueue.
>>>
>>> Furthermore, this initial placement doesn't take into account hard or
>>> soft affinity, or any scheduler-specific knowledge (such as historic
>>> runqueue load, as in credit2).
>>>
>>> To solve this, when inserting a vcpu, always call the per-scheduler
>>> "pick" function to revise the initial placement.  This will
>>> automatically take all knowledge the scheduler has into account.
>>>
>>> Signed-off-by: George Dunlap <george.dunlap@citrix.com>
>>> ---
>>> CC: Dario Faggioli <dario.faggioli@citrix.com>
>>> CC: Anshul Makkar <anshul.makkar@citrix.com>
>>> CC: Meng Xu <mengxu@cis.upenn.edu>
>>> CC: Jan Beulich <jbeulich@suse.com>
>>>
>> Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
>
> Just to clarify - does this cover the sched_rt.c changes as well?

Hi George,

I had a look at sched_rt.c, and it looks good to me. I sent out my
review tag for the sched_rt.c changes just now.

Meng

-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/
Dario Faggioli July 26, 2016, 9:17 a.m. UTC | #11
On Mon, 2016-07-25 at 12:17 +0100, George Dunlap wrote:
> On 18/07/16 11:28, Dario Faggioli wrote:
> > 
> > On Fri, 2016-07-15 at 19:02 +0100, George Dunlap wrote:
> > > 
> > > Signed-off-by: George Dunlap <george.dunlap@citrix.com>
> > > ---
> > > CC: Dario Faggioli <dario.faggioli@citrix.com>
> > > CC: Anshul Makkar <anshul.makkar@citrix.com>
> > > CC: Meng Xu <mengxu@cis.upenn.edu>
> > > CC: Jan Beulich <jbeulich@suse.com>
> > > 
> > Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
> Just to clarify - does this cover the sched_rt.c changes as well?
> 
Oh, sorry, I missed this yesterday. :-(

Yes, it did, but in any case Meng has sent in his own Reviewed-by (and
has already done so for v2 as well), so, whatever. :-P

Thanks and Regards,
Dario
Jan Beulich Aug. 1, 2016, 10:40 a.m. UTC | #12
>>> On 15.07.16 at 20:02, <george.dunlap@citrix.com> wrote:
> The generic domain creation logic in
> xen/common/domctl.c:default_vcpu0_location() attempts to try to do
> initial placement load-balancing by placing vcpu 0 on the least-busy
> non-primary hyperthread available.  Unfortunately, the logic can end
> up picking a pcpu that's not in the online mask.  When this is passed
> to a scheduler which assumes that the initial assignment is
> valid, it causes a null pointer dereference looking up the runqueue.
> 
> Furthermore, this initial placement doesn't take into account hard or
> soft affinity, or any scheduler-specific knowledge (such as historic
> runqueue load, as in credit2).
> 
> To solve this, when inserting a vcpu, always call the per-scheduler
> "pick" function to revise the initial placement.  This will
> automatically take all knowledge the scheduler has into account.
> 
> Signed-off-by: George Dunlap <george.dunlap@citrix.com>

Should this and patch 3 be backported?

Jan
Dario Faggioli Aug. 1, 2016, 12:32 p.m. UTC | #13
On Mon, 2016-08-01 at 04:40 -0600, Jan Beulich wrote:
> > > > On 15.07.16 at 20:02, <george.dunlap@citrix.com> wrote:
> > 
> > To solve this, when inserting a vcpu, always call the per-scheduler
> > "pick" function to revise the initial placement.  This will
> > automatically take all knowledge the scheduler has into account.
> > 
> > Signed-off-by: George Dunlap <george.dunlap@citrix.com>
>
> Should this and patch 3 be backported?
> 
Yes, I think they're good backporting candidates.

Dario
Jan Beulich Aug. 5, 2016, 1:24 p.m. UTC | #14
>>> On 01.08.16 at 14:32, <dario.faggioli@citrix.com> wrote:
> On Mon, 2016-08-01 at 04:40 -0600, Jan Beulich wrote:
>> > > > On 15.07.16 at 20:02, <george.dunlap@citrix.com> wrote:
>> > 
>> > To solve this, when inserting a vcpu, always call the per-scheduler
>> > "pick" function to revise the initial placement.  This will
>> > automatically take all knowledge the scheduler has into account.
>> > 
>> > Signed-off-by: George Dunlap <george.dunlap@citrix.com>
>>
>> Should this and patch 3 be backported?
>> 
> Yes, I think they're good backporting candidates.

Well, they appear to work fine on 4.7, but putting them onto 4.5
causes an early boot crash (BUG_ON( cpu != svc->vcpu->processor )
in __runq_insert()). Pulling in e59321d154 ("credit: remove cpu
argument to __runq_insert()") obviously makes that crash go away,
only to hit, a little later, a similar one close to the top of
csched_load_balance().

I'd really like to have those backported, but I have to ask one
of you to identify which prereqs are needed on 4.6 and 4.5
(I'll revert them from 4.5 right away, but I'll wait for an osstest
flight to confirm the same issue exists on 4.6). On 4.5 I'd then
additionally depend on you to indicate whether sedf needs
some additional adjustment.

Jan
Dario Faggioli Aug. 5, 2016, 2:09 p.m. UTC | #15
On Fri, 2016-08-05 at 07:24 -0600, Jan Beulich wrote:
> > > > On 01.08.16 at 14:32, <dario.faggioli@citrix.com> wrote:
> > On Mon, 2016-08-01 at 04:40 -0600, Jan Beulich wrote:
> > > > > > On 15.07.16 at 20:02, <george.dunlap@citrix.com> wrote:
> > > > Signed-off-by: George Dunlap <george.dunlap@citrix.com>
> > > Should this and patch 3 be backported?
> > > 
> > Yes, I think they're good backporting candidates.
> Well, they appear to work fine on 4.7, but putting them onto 4.5
> causes an early boot crash (BUG_ON( cpu != svc->vcpu->processor )
> in __runq_insert()). Pulling in e59321d154 ("credit: remove cpu
> argument to __runq_insert()") obviously makes that crash go
> away, just to, a little later, hit the similar one close to the top
> of
> csched_load_balance().
> 
Ah, I see... Thanks for trying and letting us know.

> I'd really like to have those backported, but I have to ask one
> of you to identify which prereq-s are needed on 4.6 and 4.5
> (I'll revert them from 4.5 right away, but I'll wait for an osstest
> flight to confirm the same issue exists on 4.6). On 4.5 I'd then
> additionally depend on you to indicate whether sedf needs
> some additional adjustment.
> 
Ok. I can't, immediately.

But if Monday is fine, I'm happy to take care of this.

Dario
Jan Beulich Aug. 5, 2016, 2:44 p.m. UTC | #16
>>> On 05.08.16 at 16:09, <dario.faggioli@citrix.com> wrote:
> On Fri, 2016-08-05 at 07:24 -0600, Jan Beulich wrote:
>> > > > On 01.08.16 at 14:32, <dario.faggioli@citrix.com> wrote:
>> > On Mon, 2016-08-01 at 04:40 -0600, Jan Beulich wrote:
>> > > > > > On 15.07.16 at 20:02, <george.dunlap@citrix.com> wrote:
>> > > > Signed-off-by: George Dunlap <george.dunlap@citrix.com>
>> > > Should this and patch 3 be backported?
>> > > 
>> > Yes, I think they're good backporting candidates.
>> Well, they appear to work fine on 4.7, but putting them onto 4.5
>> causes an early boot crash (BUG_ON( cpu != svc->vcpu->processor )
>> in __runq_insert()). Pulling in e59321d154 ("credit: remove cpu
>> argument to __runq_insert()") obviously makes that crash go
>> away, just to, a little later, hit the similar one close to the top
>> of
>> csched_load_balance().
>> 
> Ah, I see.. Thanks for trying and letting us know.
> 
>> I'd really like to have those backported, but I have to ask one
>> of you to identify which prereq-s are needed on 4.6 and 4.5
>> (I'll revert them from 4.5 right away, but I'll wait for an osstest
>> flight to confirm the same issue exists on 4.6). On 4.5 I'd then
>> additionally depend on you to indicate whether sedf needs
>> some additional adjustment.
>> 
> Ok. I can't, immediately.
> 
> But if Monday is fine, I'm happy to take care of this.

Oh, of course, any time next week would be fine. I'd like to have
this for 4.5.4, but in the worst case we'll ship that without it.

Jan
Dario Faggioli Aug. 12, 2016, 8:58 a.m. UTC | #17
On Fri, 2016-08-05 at 07:24 -0600, Jan Beulich wrote:
> > > > On 01.08.16 at 14:32, <dario.faggioli@citrix.com> wrote:
> > Yes, I think they're good backporting candidates.
> Well, they appear to work fine on 4.7, but putting them onto 4.5
> causes an early boot crash (BUG_ON( cpu != svc->vcpu->processor )
> in __runq_insert()). 
>
I just tested staging-4.5 plus your backport of these patches (i.e.,
the two commits that you then reverted), and I haven't seen this...
how can that be?

> Pulling in e59321d154 ("credit: remove cpu
> argument to __runq_insert()") obviously makes that crash go
> away, just to, a little later, hit the similar one close to the top
> of
> csched_load_balance().
> 
In fact, as far as I can see, the same thing I said about backporting
to 4.6 does the trick for 4.5 as well:

https://lists.xen.org/archives/html/xen-devel/2016-08/msg01673.html

I've double checked that I am using the proper tree and code, and it
looks like I do.

Can (any of) you have a go at that?

If I'm right, and "just" avoiding to call insert_vcpu() for idle vcpu
is enough, the question about what's the best path is the same as fo
4.6.

Regards,
Dario

Patch

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index ac22746..a3ef00a 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -998,6 +998,9 @@  csched_vcpu_insert(const struct scheduler *ops, struct vcpu *vc)
 
     BUG_ON( is_idle_vcpu(vc) );
 
+    /* This is safe because vc isn't yet being scheduled */
+    vc->processor = csched_cpu_pick(ops, vc);
+
     lock = vcpu_schedule_lock_irq(vc);
 
     if ( !__vcpu_on_runq(svc) && vcpu_runnable(vc) && !vc->is_running )
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 3b9aa27..5a04985 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -1620,15 +1620,23 @@  csched2_vcpu_insert(const struct scheduler *ops, struct vcpu *vc)
 
     BUG_ON(is_idle_vcpu(vc));
 
+    /* Locks in cpu_pick expect irqs to be disabled */
+    local_irq_disable();
+    
+    /* This is safe because vc isn't yet being scheduled */
+    vc->processor = csched2_cpu_pick(ops, vc);
+
     /* Add vcpu to runqueue of initial processor */
-    lock = vcpu_schedule_lock_irq(vc);
+    lock = vcpu_schedule_lock(vc);
 
     runq_assign(ops, vc);
 
-    vcpu_schedule_unlock_irq(lock, vc);
+    vcpu_schedule_unlock(lock, vc);
 
     sdom->nr_vcpus++;
 
+    local_irq_enable();
+
     SCHED_STAT_CRANK(vcpu_insert);
 
     CSCHED2_VCPU_CHECK(vc);
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index bd3a2a0..41c61a7 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -874,6 +874,9 @@  rt_vcpu_insert(const struct scheduler *ops, struct vcpu *vc)
 
     BUG_ON( is_idle_vcpu(vc) );
 
+    /* This is safe because vc isn't yet being scheduled */
+    vc->processor = rt_cpu_pick(ops, vc);
+
     lock = vcpu_schedule_lock_irq(vc);
 
     now = NOW();