
[v6,RFC] xen: sched: convert RTDS from time to event driven model

Message ID 1457109279.2959.553.camel@citrix.com (mailing list archive)
State New, archived

Commit Message

Dario Faggioli March 4, 2016, 4:34 p.m. UTC
On Fri, 2016-02-26 at 13:33 -0500, Chen, Tianyang wrote:
> Attached.
> 
Hey,

here I am... sorry it took a bit.

I've had a look, and I've been able to come up with some code that I at
least do not dislike too much! ;-P

Have a look at the attached patch (you should apply it on top of the
sched_rt.c file that you sent me as an attachment).

About what was happening...

> On 2016-02-26 13:09, Dario Faggioli wrote:
> > On Fri, 2016-02-26 at 12:28 -0500, Tianyang Chen wrote:
> > > 
> > > I added debug prints in all these functions and noticed that it
> > > could
> > > be 
> > > caused by racing between spurious wakeups and context switching.
> > > 
It's not really a spurious wakeup: it's a regular wakeup happening
immediately after the vcpu blocked, while we still haven't been able
to execute rt_context_saved().

It's not really a race, and in fact we have the _RTDS_scheduled and
_RTDS_delayed_runq_add flags that are meant to deal with exactly these
situations.

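Condensed, the handshake between the two paths looks more or less like
this (just a sketch, with most of the actual sched_rt.c code elided):

  /* Wake path: the vcpu is still in the scheduling tail. */
  if ( unlikely(svc->flags & RTDS_scheduled) )
  {
      /* Defer the actual queueing until after the context switch. */
      set_bit(__RTDS_delayed_runq_add, &svc->flags);
      return;
  }

  /* Context switch tail (rt_context_saved()). */
  clear_bit(__RTDS_scheduled, &svc->flags);
  if ( test_and_clear_bit(__RTDS_delayed_runq_add, &svc->flags) &&
       likely(vcpu_runnable(vc)) )
      __runq_insert(ops, svc);
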
In fact (I added some more debug printk):

> > > (XEN) vcpu1 wakes up nowhere
> > > (XEN) cpu1 picked vcpu1     *** vcpu1 is on a cpu
> > > (XEN) cpu1 picked idle      *** vcpu1 is waiting to be context
> > > switched
> > > (XEN) vcpu2 wakes up nowhere
> > > (XEN) cpu0 picked vcpu0
> > > (XEN) cpu2 picked vcpu2
> > > (XEN) cpu0 picked vcpu0
> > > (XEN) cpu0 picked vcpu0
> > > (XEN) d0 attempted to change d0v2's CR4 flags 00000620 ->
> > > 00040660
> > > (XEN) cpu0 picked vcpu0
> > > (XEN) vcpu2 sleeps on cpu
> > > (XEN) vcpu1 wakes up nowhere      *** vcpu1 wakes up without
> > > sleep?
> > > 
> > > (XEN) Assertion '!__vcpu_on_replq(svc)' failed at sched_rt.c:526
> > > (XEN) ----[ Xen-4.7-unstable  x86_64  debug=y  Tainted:    C ]---
> > > -
>
(XEN) [   56.113897] vcpu13 wakes up nowhere
(XEN) [   56.113906] cpu4 picked vcpu13
(XEN) [   56.113962] vcpu13 blocks
(XEN) [   56.113965] cpu4 picked idle
(XEN) [   56.113969] vcpu13 unblocks
(XEN) [   56.113972] vcpu13 wakes up nowhere
(XEN) [   56.113980] vcpu13 woken while still in scheduling tail
(XEN) [   56.113985] vcpu13 context saved and added back to queue
(XEN) [   56.113992] cpu4 picked vcpu13

So, as you can see, at 56.113962 vcpu13 blocks (note, _blocks_ !=
_sleeps_), and cpu4 goes idle. Ideally, rt_context_saved() would run
and remove the replenishment event of vcpu13 from the replenishment
queue.

But, at 56.113969, vcpu13 wakes up already, before rt_context_saved()
had a chance to run (which happens at 56.113985). It is not a spurious
wakeup, as the vcpu was actually blocked and is being unblocked.

Since rt_context_saved() hasn't run yet, the replenishment event is
still in the queue, and hence any ASSERT asking for
!__vcpu_on_replq(vcpu13) is doomed to making Xen explode! :-/

That does not happen in the execution trace above, because it was
collected with my patch applied, where I either leave the replenishment
event alone, if it is still valid (i.e., no deadline miss happened
during the blocked period), or, if a new one needs to be enqueued, I
dequeue the old one first.
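
In code terms, the relevant bit of the wake path then becomes
(condensed from the attached patch; the full diff is at the bottom):

  if ( unlikely(svc->flags & RTDS_scheduled) )
  {
      set_bit(__RTDS_delayed_runq_add, &svc->flags);
      /* The event queued before the vcpu blocked is still there. */
      if ( missed )
      {
          __replq_remove(ops, svc);
          __replq_insert(ops, svc);
      }
      return;
  }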

The end result is not particularly pretty, but I am up for doing even
worse, for the sake of keeping things like this:

 ASSERT( !__vcpu_on_replq(svc) );

at the beginning of replq_insert() (and its appropriate counterpart at
the beginning of replq_remove()).

In fact, I consider them really, really helpful when reasoning about
the code and trying to figure out how it works... There is nothing I
hate more than an 'enqueue' function for which you don't know whether
it is ok to call it when the entity being enqueued is in the queue
already (and what actually happens if it is).

These ASSERTs greatly help, from that point of view, by clearly stating
that, _no_, in this case it's absolutely not right to call the enqueue
function if the event is in the queue already. :-)
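
Concretely, this is the kind of thing I want to be able to keep at the
very top of the two helpers (again, just a sketch, everything else in
the functions stays as it is):

  static void
  __replq_insert(const struct scheduler *ops, struct rt_vcpu *svc)
  {
      /* It is never OK to enqueue an event that is queued already. */
      ASSERT( !__vcpu_on_replq(svc) );
      /* ... insertion in deadline order ... */
  }

  static void
  __replq_remove(const struct scheduler *ops, struct rt_vcpu *svc)
  {
      /* It is never OK to dequeue an event that is not queued. */
      ASSERT( __vcpu_on_replq(svc) );
      /* ... removal, and re-programming of the timer ... */
  }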

Hope I made myself clear.

I gave the attached patch some testing, and it seems to work for me,
but I'd appreciate it if you could do more of that.

Regards,
Dario

Patch

--- /home/dario/Desktop/sched_rt.c	2016-03-04 15:47:14.901787122 +0100
+++ xen/common/sched_rt.c	2016-03-04 16:55:06.746641424 +0100
@@ -439,6 +439,8 @@ 
     struct list_head *replq = rt_replq(ops);
     struct timer* repl_timer = prv->repl_timer;
 
+    ASSERT( __vcpu_on_replq(svc) );
+
     /*
      * Disarm the timer before re-programming it.
      * It doesn't matter if the vcpu to be removed
@@ -580,7 +582,7 @@ 
 }
 
 static void
-rt_deinit(const struct scheduler *ops)
+rt_deinit(struct scheduler *ops)
 {
     struct rt_private *prv = rt_priv(ops);
 
@@ -1153,6 +1155,7 @@ 
 {
     struct rt_vcpu * const svc = rt_vcpu(vc);
     s_time_t now = NOW();
+    bool_t missed;
 
     BUG_ON( is_idle_vcpu(vc) );
 
@@ -1181,28 +1184,42 @@ 
 
     /*
      * If a deadline passed while svc was asleep/blocked, we need new
-     * scheduling parameters ( a new deadline and full budget), and
-     * also a new replenishment event
+     * scheduling parameters (a new deadline and full budget).
      */
-    if ( now >= svc->cur_deadline)
-    {   
+    if ( (missed = now >= svc->cur_deadline) )
         rt_update_deadline(now, svc);
-        __replq_remove(ops, svc);
-    }
-
-   // if( !__vcpu_on_replq(svc) )
-        __replq_insert(ops, svc);
 
-    /* If context hasn't been saved for this vcpu yet, we can't put it on
-     * the Runqueue/DepletedQ. Instead, we set a flag so that it will be
-     * put on the Runqueue/DepletedQ after the context has been saved.
+    /* 
+     * If context hasn't been saved for this vcpu yet, we can't put it on
+     * the run-queue/depleted-queue. Instead, we set the appropriate flag,
+     * and the vcpu will be put back on the queue after the context has
+     * been saved (in rt_context_saved()).
      */
     if ( unlikely(svc->flags & RTDS_scheduled) )
     {
+        printk("vcpu%d woken while still in scheduling tail\n", vc->vcpu_id);
         set_bit(__RTDS_delayed_runq_add, &svc->flags);
+        /*
+         * The vcpu is waking up already, and we didn't even have the time to
+         * remove its next replenishment event from the replenishment queue
+         * when it blocked! No big deal. If we did not miss the deadline in
+         * the meantime, let's just leave it there. If we did, let's remove it
+         * and queue a new one (to occur at our new deadline).
+         */
+        if ( missed )
+        {
+           // XXX You may want to implement something like replq_reinsert(),
+           //     for dealing with this case, and calling that one from here
+           //     (of course!). I'd do that by merging _remove and _insert
+           //     and killing the duplicated bits.
+           __replq_remove(ops, svc); 
+           __replq_insert(ops, svc); 
+        }
         return;
     }
 
+    /* Replenishment event got cancelled when we blocked. Add it back. */
+    __replq_insert(ops, svc); 
     /* insert svc to runq/depletedq because svc is not in queue now */
     __runq_insert(ops, svc);