diff mbox

[2/3] xen: Have schedulers revise initial placement

Message ID 1470927584.6250.26.camel@citrix.com (mailing list archive)
State New, archived
Headers show

Commit Message

Dario Faggioli Aug. 11, 2016, 2:59 p.m. UTC
On Fri, 2016-08-05 at 07:24 -0600, Jan Beulich wrote:
> I'd really like to have those backported, but I have to ask one
> of you to identify which prereq-s are needed on 4.6 and 4.5
> (I'll revert them from 4.5 right away, but I'll wait for an osstest
> flight to confirm the same issue exists on 4.6). 
>
Hey, I could only start working on this this morning (sorry for the
delay), and I'll continue tomorrow but, at least here, staging-4.6
(plus the patches!) crashes like this:

(XEN) [    0.000000] ----[ Xen-4.6.4-pre  x86_64  debug=y  Not tainted ]----
(XEN) [    0.000000] CPU:    0
(XEN) [    0.000000] RIP:    e008:[<ffff82d0801238c4>] _csched_cpu_pick+0x156/0x612
(XEN) [    0.000000] RFLAGS: 0000000000010046   CONTEXT: hypervisor
(XEN) [    0.000000] rax: 0000000000000040   rbx: 0000000000000040   rcx: 0000000000000000
(XEN) [    0.000000] rdx: 000000000000003f   rsi: 0000000000000040   rdi: 0000000000000040
(XEN) [    0.000000] rbp: ffff82d0802f7d68   rsp: ffff82d0802f7c78   r8:  0000000000000000
(XEN) [    0.000000] r9:  0000000000000001   r10: 0000000000000001   r11: 00000000000000b4
(XEN) [    0.000000] r12: 0000000000000040   r13: ffff83032152bf40   r14: ffff82d08033ae40
(XEN) [    0.000000] r15: ffff8300dbdf4000   cr0: 0000000080050033   cr4: 00000000000006e0
(XEN) [    0.000000] cr3: 00000000dba9f000   cr2: 0000000000000000
(XEN) [    0.000000] ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) [    0.000000] Xen stack trace from rsp=ffff82d0802f7c78:
(XEN) [    0.000000]    0000000000000046 0000000200000092 ffff82d08033ae48 ffff82d08028b020
(XEN) [    0.000000]    0000000000000000 ffff82d0802ff040 0000000100000001 ffff82d08033ae40
(XEN) [    0.000000]    0000000000000097 ffff82d0802f7cd8 0000000000000006 ffff82d0802f7ce8
(XEN) [    0.000000]    ffff82d08012d027 ffff830321532000 ffff82d0802f7d28 ffff82d08013c779
(XEN) [    0.000000]    0000000000000000 0000000000000010 0000000000000048 0000000000000048
(XEN) [    0.000000]    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) [    0.000000]    ffff82d0802f7d78 ffff8300dbdf4000 ffff83032152a000 ffff83032152bf40
(XEN) [    0.000000]    0000000000000000 ffff82d08028e020 ffff82d0802f7d78 ffff82d080123d8e
(XEN) [    0.000000]    ffff82d0802f7db8 ffff82d080123db0 ffff83032152a000 ffff8300dbdf4000
(XEN) [    0.000000]    ffff83032152a000 0000000000000000 0000000000000000 ffff82d08028e020
(XEN) [    0.000000]    ffff82d0802f7de8 ffff82d080129e8d ffff82d0802f7de8 ffff82d08013cda0
(XEN) [    0.000000]    ffff8300dbdf4000 ffff83032152a000 ffff82d0802f7e18 ffff82d080105f59
(XEN) [    0.000000]    ffff82d0802a1720 ffff82d0802a1718 ffff82d0802a1720 ffff82d0802daa60
(XEN) [    0.000000]    ffff82d0802f7e48 ffff82d0802a8145 0000000000000002 ffff83032152b610
(XEN) [    0.000000]    0000000000000002 0000000000000001 ffff82d0802f7f08 ffff82d0802c5c6f
(XEN) [    0.000000]    0000000000000000 0000000000100000 00000000014ae000 0000000000324000
(XEN) [    0.000000]    0080016300000000 0000000300000015 0000000000000000 0000000000000010
(XEN) [    0.000000]    ffff83000008dd50 ffff83000008df20 0000000000000002 ffff83000008dfb0
(XEN) [    0.000000]    0000000800000000 000000010000006e 0000000000000003 00000000000002f8
(XEN) [    0.000000]    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) [    0.000000] Xen call trace:
(XEN) [    0.000000]    [<ffff82d0801238c4>] _csched_cpu_pick+0x156/0x612
(XEN) [    0.000000]    [<ffff82d080123d8e>] csched_cpu_pick+0xe/0x10
(XEN) [    0.000000]    [<ffff82d080123db0>] csched_vcpu_insert+0x20/0x145
(XEN) [    0.000000]    [<ffff82d080129e8d>] sched_init_vcpu+0x1d1/0x218
(XEN) [    0.000000]    [<ffff82d080105f59>] alloc_vcpu+0x1ba/0x2a4
(XEN) [    0.000000]    [<ffff82d0802a8145>] scheduler_init+0x1b0/0x271
(XEN) [    0.000000]    [<ffff82d0802c5c6f>] __start_xen+0x1f85/0x2550
(XEN) [    0.000000]    [<ffff82d080100073>] __high_start+0x53/0x55
(XEN) [    0.000000] 
(XEN) [    0.000000] 
(XEN) [    0.000000] ****************************************
(XEN) [    0.000000] Panic on CPU 0:
(XEN) [    0.000000] Assertion 'cpu < nr_cpu_ids' failed at ...e/SOURCES/xen/xen.git/xen/include/xen/cpumask.h:97

Which, I think needs at least this hunk (from 6b53bb4ab3c9  "sched:
better handle (not) inserting idle vCPUs in runqueues"):


So, yeah, it's proving a little more complicated than how I thought it
would have, just by looking at the patches. :-/

Will let know.

Regards,
Dario

Comments

Andrew Cooper Aug. 11, 2016, 3:51 p.m. UTC | #1
On 11/08/16 15:59, Dario Faggioli wrote:
> On Fri, 2016-08-05 at 07:24 -0600, Jan Beulich wrote:
>> I'd really like to have those backported, but I have to ask one
>> of you to identify which prereq-s are needed on 4.6 and 4.5
>> (I'll revert them from 4.5 right away, but I'll wait for an osstest
>> flight to confirm the same issue exists on 4.6). 
>>
> Hey, I could only start working on this this morning (sorry for the
> delay), and I'll continue tomorrow but, at least here, staging-4.6
> (plus the patches!) crashes like this:
>
> (XEN) [    0.000000] ----[ Xen-4.6.4-pre  x86_64  debug=y  Not tainted ]----
> (XEN) [    0.000000] CPU:    0
> (XEN) [    0.000000] RIP:    e008:[<ffff82d0801238c4>] _csched_cpu_pick+0x156/0x612
> (XEN) [    0.000000] RFLAGS: 0000000000010046   CONTEXT: hypervisor
> (XEN) [    0.000000] rax: 0000000000000040   rbx: 0000000000000040   rcx: 0000000000000000
> (XEN) [    0.000000] rdx: 000000000000003f   rsi: 0000000000000040   rdi: 0000000000000040
> (XEN) [    0.000000] rbp: ffff82d0802f7d68   rsp: ffff82d0802f7c78   r8:  0000000000000000
> (XEN) [    0.000000] r9:  0000000000000001   r10: 0000000000000001   r11: 00000000000000b4
> (XEN) [    0.000000] r12: 0000000000000040   r13: ffff83032152bf40   r14: ffff82d08033ae40
> (XEN) [    0.000000] r15: ffff8300dbdf4000   cr0: 0000000080050033   cr4: 00000000000006e0
> (XEN) [    0.000000] cr3: 00000000dba9f000   cr2: 0000000000000000
> (XEN) [    0.000000] ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> (XEN) [    0.000000] Xen stack trace from rsp=ffff82d0802f7c78:
> (XEN) [    0.000000]    0000000000000046 0000000200000092 ffff82d08033ae48 ffff82d08028b020
> (XEN) [    0.000000]    0000000000000000 ffff82d0802ff040 0000000100000001 ffff82d08033ae40
> (XEN) [    0.000000]    0000000000000097 ffff82d0802f7cd8 0000000000000006 ffff82d0802f7ce8
> (XEN) [    0.000000]    ffff82d08012d027 ffff830321532000 ffff82d0802f7d28 ffff82d08013c779
> (XEN) [    0.000000]    0000000000000000 0000000000000010 0000000000000048 0000000000000048
> (XEN) [    0.000000]    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) [    0.000000]    ffff82d0802f7d78 ffff8300dbdf4000 ffff83032152a000 ffff83032152bf40
> (XEN) [    0.000000]    0000000000000000 ffff82d08028e020 ffff82d0802f7d78 ffff82d080123d8e
> (XEN) [    0.000000]    ffff82d0802f7db8 ffff82d080123db0 ffff83032152a000 ffff8300dbdf4000
> (XEN) [    0.000000]    ffff83032152a000 0000000000000000 0000000000000000 ffff82d08028e020
> (XEN) [    0.000000]    ffff82d0802f7de8 ffff82d080129e8d ffff82d0802f7de8 ffff82d08013cda0
> (XEN) [    0.000000]    ffff8300dbdf4000 ffff83032152a000 ffff82d0802f7e18 ffff82d080105f59
> (XEN) [    0.000000]    ffff82d0802a1720 ffff82d0802a1718 ffff82d0802a1720 ffff82d0802daa60
> (XEN) [    0.000000]    ffff82d0802f7e48 ffff82d0802a8145 0000000000000002 ffff83032152b610
> (XEN) [    0.000000]    0000000000000002 0000000000000001 ffff82d0802f7f08 ffff82d0802c5c6f
> (XEN) [    0.000000]    0000000000000000 0000000000100000 00000000014ae000 0000000000324000
> (XEN) [    0.000000]    0080016300000000 0000000300000015 0000000000000000 0000000000000010
> (XEN) [    0.000000]    ffff83000008dd50 ffff83000008df20 0000000000000002 ffff83000008dfb0
> (XEN) [    0.000000]    0000000800000000 000000010000006e 0000000000000003 00000000000002f8
> (XEN) [    0.000000]    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) [    0.000000] Xen call trace:
> (XEN) [    0.000000]    [<ffff82d0801238c4>] _csched_cpu_pick+0x156/0x612
> (XEN) [    0.000000]    [<ffff82d080123d8e>] csched_cpu_pick+0xe/0x10
> (XEN) [    0.000000]    [<ffff82d080123db0>] csched_vcpu_insert+0x20/0x145
> (XEN) [    0.000000]    [<ffff82d080129e8d>] sched_init_vcpu+0x1d1/0x218
> (XEN) [    0.000000]    [<ffff82d080105f59>] alloc_vcpu+0x1ba/0x2a4
> (XEN) [    0.000000]    [<ffff82d0802a8145>] scheduler_init+0x1b0/0x271
> (XEN) [    0.000000]    [<ffff82d0802c5c6f>] __start_xen+0x1f85/0x2550
> (XEN) [    0.000000]    [<ffff82d080100073>] __high_start+0x53/0x55
> (XEN) [    0.000000] 
> (XEN) [    0.000000] 
> (XEN) [    0.000000] ****************************************
> (XEN) [    0.000000] Panic on CPU 0:
> (XEN) [    0.000000] Assertion 'cpu < nr_cpu_ids' failed at ...e/SOURCES/xen/xen.git/xen/include/xen/cpumask.h:97
>
> Which, I think needs at least this hunk (from 6b53bb4ab3c9  "sched:
> better handle (not) inserting idle vCPUs in runqueues"):
>
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index 2beebe8..fddcd52 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -240,20 +240,22 @@ int sched_init_vcpu(struct vcpu *v, unsigned int processor)
>      init_timer(&v->poll_timer, poll_timer_fn,
>                 v, v->processor);
>  
> -    /* Idle VCPUs are scheduled immediately. */
> +    v->sched_priv = SCHED_OP(DOM2OP(d), alloc_vdata, v, d->sched_priv);
> +    if ( v->sched_priv == NULL )
> +        return 1;
> +
> +    TRACE_2D(TRC_SCHED_DOM_ADD, v->domain->domain_id, v->vcpu_id);
> +
> +    /* Idle VCPUs are scheduled immediately, so don't put them in runqueue. */
>      if ( is_idle_domain(d) )
>      {
>          per_cpu(schedule_data, v->processor).curr = v;
>          v->is_running = 1;
>      }
> -
> -    TRACE_2D(TRC_SCHED_DOM_ADD, v->domain->domain_id, v->vcpu_id);
> -
> -    v->sched_priv = SCHED_OP(DOM2OP(d), alloc_vdata, v, d->sched_priv);
> -    if ( v->sched_priv == NULL )
> -        return 1;
> -
> -    SCHED_OP(DOM2OP(d), insert_vcpu, v);
> +    else
> +    {
> +        SCHED_OP(DOM2OP(d), insert_vcpu, v);
> +    }
>  
>      return 0;
>  }
>
> So, yeah, it's proving a little more complicated than how I thought it
> would have, just by looking at the patches. :-/
>
> Will let know.

FWIW, this looks very similar to the regression I just raised against
Xen 4.7 "[Xen-devel] Scheduler regression in 4.7".  The stack traces are
suspiciously similar.  I expect they have the same root cause.

~Andrew
Dario Faggioli Aug. 11, 2016, 11:35 p.m. UTC | #2
On Thu, 2016-08-11 at 16:51 +0100, Andrew Cooper wrote:
> On 11/08/16 15:59, Dario Faggioli wrote:
> > 
> > Which, I think needs at least this hunk (from 6b53bb4ab3c9  "sched:
> > better handle (not) inserting idle vCPUs in runqueues"):
> > 
> > diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> > index 2beebe8..fddcd52 100644
> > --- a/xen/common/schedule.c
> > +++ b/xen/common/schedule.c
> > @@ -240,20 +240,22 @@ int sched_init_vcpu(struct vcpu *v, unsigned
> > int processor)
> >      init_timer(&v->poll_timer, poll_timer_fn,
> >                 v, v->processor);
> >  
> > -    /* Idle VCPUs are scheduled immediately. */
> > +    v->sched_priv = SCHED_OP(DOM2OP(d), alloc_vdata, v, d-
> > >sched_priv);
> > +    if ( v->sched_priv == NULL )
> > +        return 1;
> > +
> > +    TRACE_2D(TRC_SCHED_DOM_ADD, v->domain->domain_id, v->vcpu_id);
> > +
> > +    /* Idle VCPUs are scheduled immediately, so don't put them in
> > runqueue. */
> >      if ( is_idle_domain(d) )
> >      {
> >          per_cpu(schedule_data, v->processor).curr = v;
> >          v->is_running = 1;
> >      }
> > -
> > -    TRACE_2D(TRC_SCHED_DOM_ADD, v->domain->domain_id, v->vcpu_id);
> > -
> > -    v->sched_priv = SCHED_OP(DOM2OP(d), alloc_vdata, v, d-
> > >sched_priv);
> > -    if ( v->sched_priv == NULL )
> > -        return 1;
> > -
> > -    SCHED_OP(DOM2OP(d), insert_vcpu, v);
> > +    else
> > +    {
> > +        SCHED_OP(DOM2OP(d), insert_vcpu, v);
> > +    }
> >  
> >      return 0;
> >  }
> > 
> > So, yeah, it's proving a little more complicated than how I thought
> > it
> > would have, just by looking at the patches. :-/
> > 
> > Will let know.
> FWIW, this looks very similar to the regression I just raised against
> Xen 4.7 "[Xen-devel] Scheduler regression in 4.7".  The stack traces
> are
> suspiciously similar.  
>
I thought the same at the beginning, but they actually may not be the
same or even related.

This happens at early boot, and reason is we try to call
csched_cpu_pick() on the idle vcpus, which does not make any sense, and
in fact one of the ASSERTS triggers.

In your case, system has booted fine already. And the reason for that
is you're looking at 4.7, and 4.7 is no longer calling insert_vcpu(),
which then calls csched_cpu_pick(), on idle vcpus at boot, thanks to
the patch I'm mentioning above.

And in fact, I confirm that, on 4.6, with "just" the hunk above of said
patch, I can boot, create a (large) VM, play a bit with it, shutdown or
reboot it, and shutdown the host as well.

Also, yours seems to _explode_ because of a race on the runq (in
IS_RUNQ_IDLE()), this one _asserts_ here:

        /* Pick an online CPU from the proper affinity mask */
        csched_balance_cpumask(vc, balance_step, &cpus);

        cpumask_and(&cpus, &cpus, online);
        /* If present, prefer vc's current processor */
        cpu = cpumask_test_cpu(vc->processor, &cpus)
                ? vc->processor
                : cpumask_cycle(vc->processor, &cpus);
        ASSERT(cpumask_test_cpu(cpu, &cpus));

Because, as I said, we're on early boot, and most likely, there's
almost no one in online!

> I expect they have the same root cause.
> 
No, I think they're two different things.

Regards,
Dario
diff mbox

Patch

diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 2beebe8..fddcd52 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -240,20 +240,22 @@  int sched_init_vcpu(struct vcpu *v, unsigned int processor)
     init_timer(&v->poll_timer, poll_timer_fn,
                v, v->processor);
 
-    /* Idle VCPUs are scheduled immediately. */
+    v->sched_priv = SCHED_OP(DOM2OP(d), alloc_vdata, v, d->sched_priv);
+    if ( v->sched_priv == NULL )
+        return 1;
+
+    TRACE_2D(TRC_SCHED_DOM_ADD, v->domain->domain_id, v->vcpu_id);
+
+    /* Idle VCPUs are scheduled immediately, so don't put them in runqueue. */
     if ( is_idle_domain(d) )
     {
         per_cpu(schedule_data, v->processor).curr = v;
         v->is_running = 1;
     }
-
-    TRACE_2D(TRC_SCHED_DOM_ADD, v->domain->domain_id, v->vcpu_id);
-
-    v->sched_priv = SCHED_OP(DOM2OP(d), alloc_vdata, v, d->sched_priv);
-    if ( v->sched_priv == NULL )
-        return 1;
-
-    SCHED_OP(DOM2OP(d), insert_vcpu, v);
+    else
+    {
+        SCHED_OP(DOM2OP(d), insert_vcpu, v);
+    }
 
     return 0;
 }