Message ID | 1468605722-24239-2-git-send-email-george.dunlap@citrix.com (mailing list archive) |
---|---|
State | New, archived |
On 15/07/16 19:02, George Dunlap wrote:
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index 3b9aa27..5a04985 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -1620,15 +1620,23 @@ csched2_vcpu_insert(const struct scheduler *ops, struct vcpu *vc)
>
>     BUG_ON(is_idle_vcpu(vc));
>
> +    /* Locks in cpu_pick expect irqs to be disabled */
> +    local_irq_disable();

This doesn't make the problem much worse, but is there a plan to fix this issue?

None of the scheduler-accounting functions should be disabling interrupts.

~Andrew
On Fri, 2016-07-15 at 19:07 +0100, Andrew Cooper wrote:
> On 15/07/16 19:02, George Dunlap wrote:
> >
> > diff --git a/xen/common/sched_credit2.c
> > b/xen/common/sched_credit2.c
> > index 3b9aa27..5a04985 100644
> > --- a/xen/common/sched_credit2.c
> > +++ b/xen/common/sched_credit2.c
> > @@ -1620,15 +1620,23 @@ csched2_vcpu_insert(const struct scheduler
> > *ops, struct vcpu *vc)
> >
> >     BUG_ON(is_idle_vcpu(vc));
> >
> > +    /* Locks in cpu_pick expect irqs to be disabled */
> > +    local_irq_disable();
> This doesn't make the problem much worse, but is there a plan to fix
> this issue?
>
There's a little bit more than a plan. I have a proof-of-concept implementation that was working (it now needs refreshing), but whose performance impact I never managed to evaluate as accurately as I wanted.

In fact, I have a couple of variants implemented, which I was comparing against each other, in addition to against 'vanilla'. The problem was that I was not seeing any impact at all, which looked strange (I was expecting an improvement, at least on some workloads), and I wanted to investigate further.

I'm leaving here the links to two branches, where I stashed some of the code I have come up with so far. As I said, it's WIP and needs refreshing and reworking.

> None of the scheduler-accounting functions should be disabling
> interrupts.
>
They don't. But you can't keep IRQs disabled for some operations and enabled for others, on the same lock (because of the irq-safety spinlock checks/enforcement).

So you have to always keep IRQs enabled, for all scheduling operations, which is ok for _almost_ all of them, with the only exception of the wakeup of a vcpu. So, the idea was to treat that one case specially, i.e., put the waking vcpus in a queue, and then drain the queue somehow.
The insertion in the queue needs to be done with interrupts disabled, but the draining -- which is where the actual scheduling-related hooks and operations are run -- can be done with IRQs on, which is what we want.

What I was experimenting with was different ways of managing such a queue, e.g., only one queue for all CPUs versus per-CPU queues; whether to always drain the queue completely or only pick a couple of vcpus and defer the rest; whether to allow concurrent draining of the queue, or only have one CPU (at a time) doing that; etc.

The "1 queue for all" and "per-CPU queues" variants are what's in the following two branches:

git://xenbits.xen.org/people/dariof/xen.git wip/sched/irq-enabled
http://xenbits.xen.org/gitweb/?p=people/dariof/xen.git;a=shortlog;h=refs/heads/wip/sched/irq-enabled

git://xenbits.xen.org/people/dariof/xen.git wip/sched/irq-enabled-percpu
http://xenbits.xen.org/gitweb/?p=people/dariof/xen.git;a=shortlog;h=refs/heads/wip/sched/irq-enabled-percpu

I'll get back to this soon. In the meanwhile, feel free to comment, toss ideas, criticize, whatever. :-D

Regards,
Dario
On Fri, 2016-07-15 at 19:02 +0100, George Dunlap wrote:
> The generic domain creation logic in
> xen/common/domctl.c:default_vcpu0_location() attempts to try to do
> initial placement load-balancing by placing vcpu 0 on the least-busy
> non-primary hyperthread available.  Unfortunately, the logic can end
> up picking a pcpu that's not in the online mask.  When this is passed
> to a scheduler such which assumes that the initial assignment is
> valid, it causes a null pointer dereference looking up the runqueue.
>
> Furthermore, this initial placement doesn't take into account hard or
> soft affinity, or any scheduler-specific knowledge (such as historic
> runqueue load, as in credit2).
>
> To solve this, when inserting a vcpu, always call the per-scheduler
> "pick" function to revise the initial placement.  This will
> automatically take all knowledge the scheduler has into account.
>
> Signed-off-by: George Dunlap <george.dunlap@citrix.com>
> ---
> CC: Dario Faggioli <dario.faggioli@citrix.com>
> CC: Anshul Makkar <anshul.makkar@citrix.com>
> CC: Meng Xu <mengxu@cis.upenn.edu>
> CC: Jan Beulich <jbeulich@suse.com>
>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>

Regards,
Dario
On 16/07/16 15:12, Dario Faggioli wrote:
> On Fri, 2016-07-15 at 19:07 +0100, Andrew Cooper wrote:
>
>> None of the scheduler-accounting functions should be disabling
>> interrupts.
>>
> They don't. But you can't keep irq disabled for some operations and
> enabled for others, on the same lock (because of the irq-safety
> spinlock checks/enforcement).
>
> So you have to always keep IRQ enabled, for all scheduling operations,
> which is ok for _almost_ all of them, with the only exception of the
> wakeup of a vcpu.

I know that it is all or nothing. What specific action about waking a vcpu requires holding a lock?

If it is simply re-queueing the vcpu onto the runnable queue, there are a number of lockless queuing algorithms which can be used.

~Andrew
On Mon, 2016-07-18 at 19:10 +0100, Andrew Cooper wrote:
> On 16/07/16 15:12, Dario Faggioli wrote:
> > On Fri, 2016-07-15 at 19:07 +0100, Andrew Cooper wrote:
> > So you have to always keep IRQ enabled, for all scheduling
> > operations, which is ok for _almost_ all of them, with the only
> > exception of the wakeup of a vcpu.
> I know that it is all or nothing.  What specific action about waking
> a vcpu requires holding a lock?
>
> If it is simply re-queueing the vcpu onto the runable queue, there
> are a number of lockless queuing algorithms which can be used.
>
Yes, it's vcpu_wake() that does vcpu_schedule_lock_irqsave() before, among other things, calling SCHED_OP(wake, v).

What the implementation of that hook does is scheduler dependent. In most cases the core of it is, as you say, enqueueing the vcpu, but that may not be all. For instance, something that all the schedulers do after queueing is tickling pCPUs to have them pick up the queued work, and that also requires serialization to be correct and effective (i.e., "just" mandating that runqueues be lockless would not be enough).

> ~Andrew
On 18/07/2016 19:55, Dario Faggioli wrote:
> On Mon, 2016-07-18 at 19:10 +0100, Andrew Cooper wrote:
>> On 16/07/16 15:12, Dario Faggioli wrote:
>>> On Fri, 2016-07-15 at 19:07 +0100, Andrew Cooper wrote:
>>> So you have to always keep IRQ enabled, for all scheduling
>>> operations, which is ok for _almost_ all of them, with the only
>>> exception of the wakeup of a vcpu.
>> I know that it is all or nothing.  What specific action about waking
>> a vcpu requires holding a lock?
>>
>> If it is simply re-queueing the vcpu onto the runable queue, there
>> are a number of lockless queuing algorithms which can be used.
>>
> Yes, it's vcpu_wake() that does vcpu_schedule_lock_irqsave() before,
> among other things, calling SCHED_OP(wake, v).

Right - this looks easy to fix. Use a per-pcpu singly linked list, which can be mutated safely with cmpxchg() rather than locks, have vcpu_wake() add v to the linked list, and schedule itself a SCHEDULE_SOFTIRQ (possibly a new SCHEDULE_WAKE_SOFTIRQ with higher priority than SCHEDULE_SOFTIRQ).

Then the schedule softirq can take the vcpu schedule lock without disabling interrupts and run the internals of what is currently vcpu_wake(). The current content of vcpu_wake() is very large for something typically executed from interrupt context.

~Andrew
On Mon, 2016-07-18 at 22:36 +0100, Andrew Cooper wrote:
> On 18/07/2016 19:55, Dario Faggioli wrote:
> > On Mon, 2016-07-18 at 19:10 +0100, Andrew Cooper wrote:
> > > On 16/07/16 15:12, Dario Faggioli wrote:
> > > > On Fri, 2016-07-15 at 19:07 +0100, Andrew Cooper wrote:
> > > > So you have to always keep IRQ enabled, for all scheduling
> > > > operations, which is ok for _almost_ all of them, with the only
> > > > exception of the wakeup of a vcpu.
> > > I know that it is all or nothing.  What specific action about
> > > waking a vcpu requires holding a lock?
> > >
> > > If it is simply re-queueing the vcpu onto the runable queue,
> > > there are a number of lockless queuing algorithms which can be
> > > used.
> > >
> > Yes, it's vcpu_wake() that does vcpu_schedule_lock_irqsave()
> > before, among other things, calling SCHED_OP(wake, v).
> Right - this looks easy to fix.  Use a per-pcpu single linked list,
> which can be mutated safely with cmpxchg() rather than locks, have
> vcpu_wake() add v to the linked list, and schedule itself a
> SCHEDULE_SOFTIRQ, (possibly a new SCHEDULE_WAKE_SOFTIRQ with higher
> priority than SCHEDULE_SOFTIRQ).
>
That's exactly what I'm doing in one of the variants I've implemented so far (with a dedicated lock, rather than cmpxchg, but I was indeed looking at removing that):

http://xenbits.xen.org/gitweb/?p=people/dariof/xen.git;a=blobdiff;f=xen/include/xen/softirq.h;h=e62c247fdc41dd4e36ceb5f8094647fa62530c56;hp=0895a162031b64ad6acff6be9d17707dad7a89bf;hb=431423e69c9fe4503d26bd4c657699284e0223b7;hpb=a282bcd6f7751d2ff428ca6c1e14120aa6fcc5fc

:-)

> Then in the schedule softirq can take the vcpu schedule lock without
> disabling interrupts and run the internals of what is currently
> vcpu_wake().  The current content of vcpu_wake() very large for what
> is typically executed from interrupt context.
>
Totally agree.
Something I was reasoning about, and trying to assess by means of benchmarks, is, for instance, whether, when taking the vcpus out of the queue and waking them, to always do that "to completion" (considering that it's probably even possible for new vcpus to be added to the queue while I'm trying to drain it), or in batches (and after each batch, let Xen do something else and re-trigger the softirq).

And this (and other similar "subtleties") is where the numbers I could get were not conclusive.

I'll dust off the code and let you know. :-)

Regards,
Dario
On 18/07/16 11:28, Dario Faggioli wrote:
> On Fri, 2016-07-15 at 19:02 +0100, George Dunlap wrote:
>> The generic domain creation logic in
>> xen/common/domctl.c:default_vcpu0_location() attempts to try to do
>> initial placement load-balancing by placing vcpu 0 on the least-busy
>> non-primary hyperthread available.  Unfortunately, the logic can end
>> up picking a pcpu that's not in the online mask.  When this is passed
>> to a scheduler such which assumes that the initial assignment is
>> valid, it causes a null pointer dereference looking up the runqueue.
>>
>> Furthermore, this initial placement doesn't take into account hard or
>> soft affinity, or any scheduler-specific knowledge (such as historic
>> runqueue load, as in credit2).
>>
>> To solve this, when inserting a vcpu, always call the per-scheduler
>> "pick" function to revise the initial placement.  This will
>> automatically take all knowledge the scheduler has into account.
>>
>> Signed-off-by: George Dunlap <george.dunlap@citrix.com>
>> ---
>> CC: Dario Faggioli <dario.faggioli@citrix.com>
>> CC: Anshul Makkar <anshul.makkar@citrix.com>
>> CC: Meng Xu <mengxu@cis.upenn.edu>
>> CC: Jan Beulich <jbeulich@suse.com>
>>
> Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>

Just to clarify - does this cover the sched_rt.c changes as well?

Thanks,
 -George
On Fri, Jul 15, 2016 at 2:02 PM, George Dunlap <george.dunlap@citrix.com> wrote:
>
> The generic domain creation logic in
> xen/common/domctl.c:default_vcpu0_location() attempts to try to do
> initial placement load-balancing by placing vcpu 0 on the least-busy
> non-primary hyperthread available.  Unfortunately, the logic can end
> up picking a pcpu that's not in the online mask.  When this is passed
> to a scheduler such which assumes that the initial assignment is
> valid, it causes a null pointer dereference looking up the runqueue.
>
> Furthermore, this initial placement doesn't take into account hard or
> soft affinity, or any scheduler-specific knowledge (such as historic
> runqueue load, as in credit2).
>
> To solve this, when inserting a vcpu, always call the per-scheduler
> "pick" function to revise the initial placement.  This will
> automatically take all knowledge the scheduler has into account.
>
> Signed-off-by: George Dunlap <george.dunlap@citrix.com>
> ---
> CC: Dario Faggioli <dario.faggioli@citrix.com>
> CC: Anshul Makkar <anshul.makkar@citrix.com>
> CC: Meng Xu <mengxu@cis.upenn.edu>
> CC: Jan Beulich <jbeulich@suse.com>
> ---
>  xen/common/sched_credit.c  |  3 +++
>  xen/common/sched_credit2.c | 12 ++++++++++--
>  xen/common/sched_rt.c      |  3 +++
>  3 files changed, 16 insertions(+), 2 deletions(-)

As to xen/common/sched_rt.c,

Reviewed-by: Meng Xu <mengxu@cis.upenn.edu>

------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/
On Mon, Jul 25, 2016 at 7:17 AM, George Dunlap <george.dunlap@citrix.com> wrote:
> On 18/07/16 11:28, Dario Faggioli wrote:
>> On Fri, 2016-07-15 at 19:02 +0100, George Dunlap wrote:
>>> The generic domain creation logic in
>>> xen/common/domctl.c:default_vcpu0_location() attempts to try to do
>>> initial placement load-balancing by placing vcpu 0 on the least-busy
>>> non-primary hyperthread available.  Unfortunately, the logic can end
>>> up picking a pcpu that's not in the online mask.  When this is passed
>>> to a scheduler such which assumes that the initial assignment is
>>> valid, it causes a null pointer dereference looking up the runqueue.
>>>
>>> Furthermore, this initial placement doesn't take into account hard or
>>> soft affinity, or any scheduler-specific knowledge (such as historic
>>> runqueue load, as in credit2).
>>>
>>> To solve this, when inserting a vcpu, always call the per-scheduler
>>> "pick" function to revise the initial placement.  This will
>>> automatically take all knowledge the scheduler has into account.
>>>
>>> Signed-off-by: George Dunlap <george.dunlap@citrix.com>
>>> ---
>>> CC: Dario Faggioli <dario.faggioli@citrix.com>
>>> CC: Anshul Makkar <anshul.makkar@citrix.com>
>>> CC: Meng Xu <mengxu@cis.upenn.edu>
>>> CC: Jan Beulich <jbeulich@suse.com>
>>>
>> Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
>
> Just to clarify - does this cover the sched_rt.c changes as well?

Hi George,

I had a look at sched_rt.c; it looks good to me. I sent out my review tag for sched_rt.c just now.

Meng

-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/
On Mon, 2016-07-25 at 12:17 +0100, George Dunlap wrote:
> On 18/07/16 11:28, Dario Faggioli wrote:
> >
> > On Fri, 2016-07-15 at 19:02 +0100, George Dunlap wrote:
> > >
> > > Signed-off-by: George Dunlap <george.dunlap@citrix.com>
> > > ---
> > > CC: Dario Faggioli <dario.faggioli@citrix.com>
> > > CC: Anshul Makkar <anshul.makkar@citrix.com>
> > > CC: Meng Xu <mengxu@cis.upenn.edu>
> > > CC: Jan Beulich <jbeulich@suse.com>
> > >
> > Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
> Just to clarify - does this cover the sched_rt.c changes as well?
>
Oh, sorry, I missed this yesterday. :-(

Yes, it did. But in any case, Meng has sent in his own Reviewed-by (and already did so for v2 as well), so either way works. :-P

Thanks and Regards,
Dario
>>> On 15.07.16 at 20:02, <george.dunlap@citrix.com> wrote:
> The generic domain creation logic in
> xen/common/domctl.c:default_vcpu0_location() attempts to try to do
> initial placement load-balancing by placing vcpu 0 on the least-busy
> non-primary hyperthread available.  Unfortunately, the logic can end
> up picking a pcpu that's not in the online mask.  When this is passed
> to a scheduler such which assumes that the initial assignment is
> valid, it causes a null pointer dereference looking up the runqueue.
>
> Furthermore, this initial placement doesn't take into account hard or
> soft affinity, or any scheduler-specific knowledge (such as historic
> runqueue load, as in credit2).
>
> To solve this, when inserting a vcpu, always call the per-scheduler
> "pick" function to revise the initial placement.  This will
> automatically take all knowledge the scheduler has into account.
>
> Signed-off-by: George Dunlap <george.dunlap@citrix.com>

Should this and patch 3 be backported?

Jan
On Mon, 2016-08-01 at 04:40 -0600, Jan Beulich wrote:
> > > > On 15.07.16 at 20:02, <george.dunlap@citrix.com> wrote:
> >
> > To solve this, when inserting a vcpu, always call the per-scheduler
> > "pick" function to revise the initial placement.  This will
> > automatically take all knowledge the scheduler has into account.
> >
> > Signed-off-by: George Dunlap <george.dunlap@citrix.com>
>
> Should this and patch 3 be backported?
>
Yes, I think they're good backporting candidates.

Dario
>>> On 01.08.16 at 14:32, <dario.faggioli@citrix.com> wrote:
> On Mon, 2016-08-01 at 04:40 -0600, Jan Beulich wrote:
>> > > > On 15.07.16 at 20:02, <george.dunlap@citrix.com> wrote:
>> >
>> > To solve this, when inserting a vcpu, always call the per-scheduler
>> > "pick" function to revise the initial placement.  This will
>> > automatically take all knowledge the scheduler has into account.
>> >
>> > Signed-off-by: George Dunlap <george.dunlap@citrix.com>
>>
>> Should this and patch 3 be backported?
>>
> Yes, I think they're good backporting candidates.

Well, they appear to work fine on 4.7, but putting them onto 4.5 causes an early boot crash (BUG_ON( cpu != svc->vcpu->processor ) in __runq_insert()). Pulling in e59321d154 ("credit: remove cpu argument to __runq_insert()") obviously makes that crash go away, just to, a little later, hit the similar one close to the top of csched_load_balance().

I'd really like to have those backported, but I have to ask one of you to identify which prereqs are needed on 4.6 and 4.5 (I'll revert them from 4.5 right away, but I'll wait for an osstest flight to confirm the same issue exists on 4.6). On 4.5 I'd then additionally depend on you to indicate whether sedf needs some additional adjustment.

Jan
On Fri, 2016-08-05 at 07:24 -0600, Jan Beulich wrote:
> > > > On 01.08.16 at 14:32, <dario.faggioli@citrix.com> wrote:
> > On Mon, 2016-08-01 at 04:40 -0600, Jan Beulich wrote:
> > > > > > On 15.07.16 at 20:02, <george.dunlap@citrix.com> wrote:
> > > > Signed-off-by: George Dunlap <george.dunlap@citrix.com>
> > > Should this and patch 3 be backported?
> > >
> > Yes, I think they're good backporting candidates.
> Well, they appear to work fine on 4.7, but putting them onto 4.5
> causes an early boot crash (BUG_ON( cpu != svc->vcpu->processor )
> in __runq_insert()).  Pulling in e59321d154 ("credit: remove cpu
> argument to __runq_insert()") obviously makes that crash go
> away, just to, a little later, hit the similar one close to the top
> of csched_load_balance().
>
Ah, I see. Thanks for trying and letting us know.

> I'd really like to have those backported, but I have to ask one
> of you to identify which prereq-s are needed on 4.6 and 4.5
> (I'll revert them from 4.5 right away, but I'll wait for an osstest
> flight to confirm the same issue exists on 4.6).  On 4.5 I'd then
> additionally depend on you to indicate whether sedf needs
> some additional adjustment.
>
Ok. I can't, immediately.

But if Monday is fine, I'm happy to take care of this.

Dario
>>> On 05.08.16 at 16:09, <dario.faggioli@citrix.com> wrote:
> On Fri, 2016-08-05 at 07:24 -0600, Jan Beulich wrote:
>> > > > On 01.08.16 at 14:32, <dario.faggioli@citrix.com> wrote:
>> > On Mon, 2016-08-01 at 04:40 -0600, Jan Beulich wrote:
>> > > > > > On 15.07.16 at 20:02, <george.dunlap@citrix.com> wrote:
>> > > > Signed-off-by: George Dunlap <george.dunlap@citrix.com>
>> > > Should this and patch 3 be backported?
>> > >
>> > Yes, I think they're good backporting candidates.
>> Well, they appear to work fine on 4.7, but putting them onto 4.5
>> causes an early boot crash (BUG_ON( cpu != svc->vcpu->processor )
>> in __runq_insert()).  Pulling in e59321d154 ("credit: remove cpu
>> argument to __runq_insert()") obviously makes that crash go
>> away, just to, a little later, hit the similar one close to the top
>> of csched_load_balance().
>>
> Ah, I see.. Thanks for trying and letting us know.
>
>> I'd really like to have those backported, but I have to ask one
>> of you to identify which prereq-s are needed on 4.6 and 4.5
>> (I'll revert them from 4.5 right away, but I'll wait for an osstest
>> flight to confirm the same issue exists on 4.6).  On 4.5 I'd then
>> additionally depend on you to indicate whether sedf needs
>> some additional adjustment.
>>
> Ok. I can't, immediately.
>
> But if Monday is fine, I'm happy to take care of this.

Oh, of course, any time next week would be fine. I'd like to have this for 4.5.4, but in the worst case we'll ship that without it.

Jan
On Fri, 2016-08-05 at 07:24 -0600, Jan Beulich wrote:
> > > > On 01.08.16 at 14:32, <dario.faggioli@citrix.com> wrote:
> > Yes, I think they're good backporting candidates.
> Well, they appear to work fine on 4.7, but putting them onto 4.5
> causes an early boot crash (BUG_ON( cpu != svc->vcpu->processor )
> in __runq_insert()).
>
I just tested staging-4.5 plus your backport of these patches (i.e., the two commits that you have since reverted), and I haven't seen this... how can that be?

> Pulling in e59321d154 ("credit: remove cpu
> argument to __runq_insert()") obviously makes that crash go
> away, just to, a little later, hit the similar one close to the top
> of csched_load_balance().
>
In fact, as far as I can see, the same thing I said about backporting to 4.6 does the trick for 4.5 as well:
https://lists.xen.org/archives/html/xen-devel/2016-08/msg01673.html

I've double-checked that I am using the proper tree and code, and it looks like I am. Can (any of) you have a go with that?

If I'm right, and "just" avoiding the insert_vcpu() call for the idle vcpu is enough, the question about what's the best path is the same as for 4.6.

Regards,
Dario
diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index ac22746..a3ef00a 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -998,6 +998,9 @@ csched_vcpu_insert(const struct scheduler *ops, struct vcpu *vc)
 
     BUG_ON( is_idle_vcpu(vc) );
 
+    /* This is safe because vc isn't yet being scheduled */
+    vc->processor = csched_cpu_pick(ops, vc);
+
     lock = vcpu_schedule_lock_irq(vc);
 
     if ( !__vcpu_on_runq(svc) && vcpu_runnable(vc) && !vc->is_running )
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 3b9aa27..5a04985 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -1620,15 +1620,23 @@ csched2_vcpu_insert(const struct scheduler *ops, struct vcpu *vc)
 
     BUG_ON(is_idle_vcpu(vc));
 
+    /* Locks in cpu_pick expect irqs to be disabled */
+    local_irq_disable();
+
+    /* This is safe because vc isn't yet being scheduled */
+    vc->processor = csched2_cpu_pick(ops, vc);
+
     /* Add vcpu to runqueue of initial processor */
-    lock = vcpu_schedule_lock_irq(vc);
+    lock = vcpu_schedule_lock(vc);
 
     runq_assign(ops, vc);
 
-    vcpu_schedule_unlock_irq(lock, vc);
+    vcpu_schedule_unlock(lock, vc);
 
     sdom->nr_vcpus++;
 
+    local_irq_enable();
+
     SCHED_STAT_CRANK(vcpu_insert);
 
     CSCHED2_VCPU_CHECK(vc);
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index bd3a2a0..41c61a7 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -874,6 +874,9 @@ rt_vcpu_insert(const struct scheduler *ops, struct vcpu *vc)
 
     BUG_ON( is_idle_vcpu(vc) );
 
+    /* This is safe because vc isn't yet being scheduled */
+    vc->processor = rt_cpu_pick(ops, vc);
+
     lock = vcpu_schedule_lock_irq(vc);
 
     now = NOW();
The generic domain creation logic in xen/common/domctl.c:default_vcpu0_location() attempts to do initial placement load-balancing by placing vcpu 0 on the least-busy non-primary hyperthread available. Unfortunately, the logic can end up picking a pcpu that's not in the online mask. When this is passed to a scheduler which assumes that the initial assignment is valid, it causes a null pointer dereference looking up the runqueue.

Furthermore, this initial placement doesn't take into account hard or soft affinity, or any scheduler-specific knowledge (such as historic runqueue load, as in credit2).

To solve this, when inserting a vcpu, always call the per-scheduler "pick" function to revise the initial placement. This will automatically take all knowledge the scheduler has into account.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
---
CC: Dario Faggioli <dario.faggioli@citrix.com>
CC: Anshul Makkar <anshul.makkar@citrix.com>
CC: Meng Xu <mengxu@cis.upenn.edu>
CC: Jan Beulich <jbeulich@suse.com>
---
 xen/common/sched_credit.c  |  3 +++
 xen/common/sched_credit2.c | 12 ++++++++++--
 xen/common/sched_rt.c      |  3 +++
 3 files changed, 16 insertions(+), 2 deletions(-)