diff mbox

[1/1] Improved RTDS scheduler

Message ID 1451555121-8085-2-git-send-email-tiche@seas.upenn.edu (mailing list archive)
State New, archived
Headers show

Commit Message

Tianyang Chen Dec. 31, 2015, 9:45 a.m. UTC
Budget replenishment is now handled by a dedicated timer which is
triggered at the most imminent release time of all runnable vcpus.

Signed-off-by: Tianyang Chen <tiche@seas.upenn.edu>
Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
Signed-off-by: Dagaen Golomb <dgolomb@seas.upenn.edu>
---
 0000-cover-letter.patch            |   16 +++
 0001-Improved-RTDS-scheduler.patch |  280 ++++++++++++++++++++++++++++++++++++
 bak                                |   62 ++++++++
 xen/common/sched_rt.c              |  159 +++++++++++---------
 4 files changed, 453 insertions(+), 64 deletions(-)
 create mode 100644 0000-cover-letter.patch
 create mode 100644 0001-Improved-RTDS-scheduler.patch
 create mode 100644 bak

Comments

Ian Campbell Jan. 4, 2016, 11:42 a.m. UTC | #1
On Thu, 2015-12-31 at 04:45 -0500, Tianyang Chen wrote:
> Budget replenishment is now handled by a dedicated timer which is
> triggered at the most imminent release time of all runnable vcpus.
> 
> Signed-off-by: Tianyang Chen <tiche@seas.upenn.edu>
> Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
> Signed-off-by: Dagaen Golomb <dgolomb@seas.upenn.edu>
> ---
>  0000-cover-letter.patch            |   16 +++
>  0001-Improved-RTDS-scheduler.patch |  280 ++++++++++++++++++++++++++++++++++++
>  bak                                |   62 ++++++++

I assume you didn't actually intend to commit those ;-)

(In general single patches do not require a separate cover letter unless
the required context contains a large amount of information which is not
appropriate for the commit message of the actual change, I'll leave it to
you and the scheduler maintainers to decide how much of your cover letter
it would be appropriate to move to the commit message).

Ian.
Meng Xu Jan. 4, 2016, 1:48 p.m. UTC | #2
On Mon, Jan 4, 2016 at 7:42 PM, Ian Campbell <ian.campbell@citrix.com> wrote:
> On Thu, 2015-12-31 at 04:45 -0500, Tianyang Chen wrote:
>> Budget replenishment is now handled by a dedicated timer which is
>> triggered at the most imminent release time of all runnable vcpus.
>>
>> Signed-off-by: Tianyang Chen <tiche@seas.upenn.edu>
>> Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
>> Signed-off-by: Dagaen Golomb <dgolomb@seas.upenn.edu>
>> ---
>>  0000-cover-letter.patch            |   16 +++
>>  0001-Improved-RTDS-scheduler.patch |  280 ++++++++++++++++++++++++++++++++++++
>>  bak                                |   62 ++++++++
>
> I assume you didn't actually intend to commit those ;-)

No. This version was sent by mistake. Please ignore this version. :-(

Tianyang sent another version (v2) with the subject "[PATCH V2 1/1]
Improved RTDS scheduler".


>
> (In general single patches do not require a separate cover letter unless
> the required context contains a large amount of information which is not
> appropriate for the commit message of the actual change, I'll leave it to
> you and the scheduler maintainers to decide how much of your cover letter
> it would be appropriate to move to the commit message).

The cover letter is supposed to explain the design idea of the
improved RTDS scheduler so that reviewers could (potentially) save
time in reviewing the patch, we think. :-)
Probably, the commit message should be refined and self-contained?

Thank you very much for your comments and suggestions!

Best,

Meng
Ian Campbell Jan. 4, 2016, 2:04 p.m. UTC | #3
On Mon, 2016-01-04 at 21:48 +0800, Meng Xu wrote:
> > (In general single patches do not require a separate cover letter
> > unless
> > the required context contains a large amount of information which is
> > not
> > appropriate for the commit message of the actual change, I'll leave it
> > to
> > you and the scheduler maintainers to decide how much of your cover
> > letter
> > it would be appropriate to move to the commit message).
> 
> The cover letter is supposed to explain the design idea of the
> improved RTDS scheduler so that reviewers could (potentially) save
> time in reviewing the patch, we think. :-)
> Probably, the commit message should be refined and self-contained?

If I were maintainer of this code I would likely ask that a bunch of the
information from the cover letter (i..e anything which would still be
relevant to a code archaeologist in 6 months or 10 years) was moved into
the commit message, but I'm not maintainer of this code.

Ian.
Dario Faggioli Jan. 4, 2016, 2:36 p.m. UTC | #4
On Mon, 2016-01-04 at 21:48 +0800, Meng Xu wrote:
> On Mon, Jan 4, 2016 at 7:42 PM, Ian Campbell <ian.campbell@citrix.com
> > wrote:
> > On Thu, 2015-12-31 at 04:45 -0500, Tianyang Chen wrote:
> > > Budget replenishment is now handled by a dedicated timer which is
> > > triggered at the most imminent release time of all runnable
> > > vcpus.
> > > 
> > > Signed-off-by: Tianyang Chen <tiche@seas.upenn.edu>
> > > Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
> > > Signed-off-by: Dagaen Golomb <dgolomb@seas.upenn.edu>
> > > ---
> > >  0000-cover-letter.patch            |   16 +++
> > >  0001-Improved-RTDS-scheduler.patch |  280
> > > ++++++++++++++++++++++++++++++++++++
> > >  bak                                |   62 ++++++++
> > 
> > I assume you didn't actually intend to commit those ;-)
> 
> No. This version was sent by mistake. Please ignore this version. :-(
> 
> Tianyang sent another version (v2) with the subject "[PATCH V2 1/1]
> Improved RTDS scheduler".
> 
Indeed. BTW, I'm back today from Winter Holidays. I'll give a look at
this series as soon as practical.

I'll comment directly on V2, with the only exception of this very
email.

So, first of all, Cc list (which is the same here and in V2): how have
the names of the people that are in there been picked?

The patch is only about xen/common/sched_rt.c. As per MAINTAINERS, the
maintainer of the code hosted in that file is just me. Surely it's a
good thing to also include George, as he's the other maintainer of
scheduling in general.

All the other people, though, should not be bothered by being copied
directly... Especially IanC and IanJ, I'd say.

> > (In general single patches do not require a separate cover letter
> > unless
> > the required context contains a large amount of information which
> > is not
> > appropriate for the commit message of the actual change, I'll leave
> > it to
> > you and the scheduler maintainers to decide how much of your cover
> > letter
> > it would be appropriate to move to the commit message).
> 
> The cover letter is supposed to explain the design idea of the
> improved RTDS scheduler so that reviewers could (potentially) save
> time in reviewing the patch, we think. :-)
> Probably, the commit message should be refined and self-contained?
> 
That's true. In particular, the "context", defined as summary of what
happened and have been discussed on the mailing list before, during
previous submissions, etc., definitely belongs to a cover letter.

High level design ideas and concepts, also do, e.g., in order to avoid
ending up with commit changelogs that are all several pages long! :-)

All that being said, we certainly want patches' subject lines and
changelogs to be descriptive of at least what is being done, and why.
If we commit this as it is right now, here it is what we'll find in
`git log':

  Improved RTDS scheduler

  Budget replenishment is now handled by a dedicated timer which is
  triggered at the most imminent release time of all runnable vcpus.

Which won't, I'm quite sure about it, be that much useful when looking
at it, say, in 3 years! :-D

I know, finding the proper balance of what's better put where is not
always super easy, especially at the beginning... but that's part of
the process of cooking a good patch (series) :-)

A few suggestions:

 - we want tags in the subject. E.g., in this case, something like
   "xen: sched: Improved RTDS scheduler"

 - avoid (in both subject and changelog) generic terms like "improve"
   "fix", etc., unless you can explain more specifically to what they 
   refer... I mean, pretty much all commits that touch sched_rt.c can 
   have "Improve RTDS" as a subject, can't they? (Unless we decide to 
   commit something making things deliberately worse!! :-))

   In this case, e.g. (although, perhaps a bit long):
     "xen: sched: convert RTDS from time to event driven model"

 - as Ian said, there are information in the cover letter that can
   well be moved in the changelog, making it a lot more useful, and
   not less refined or self contained.

Thanks and Regards,
Dario
Tianyang Chen Jan. 6, 2016, 7:11 a.m. UTC | #5
On 1/4/2016 9:36 AM, Dario Faggioli wrote:
> On Mon, 2016-01-04 at 21:48 +0800, Meng Xu wrote:
>> On Mon, Jan 4, 2016 at 7:42 PM, Ian Campbell <ian.campbell@citrix.com
>>> wrote:
>>> On Thu, 2015-12-31 at 04:45 -0500, Tianyang Chen wrote:
>>>> Budget replenishment is now handled by a dedicated timer which is
>>>> triggered at the most imminent release time of all runnable
>>>> vcpus.
>>>>
>>>> Signed-off-by: Tianyang Chen <tiche@seas.upenn.edu>
>>>> Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
>>>> Signed-off-by: Dagaen Golomb <dgolomb@seas.upenn.edu>
>>>> ---
>>>>   0000-cover-letter.patch            |   16 +++
>>>>   0001-Improved-RTDS-scheduler.patch |  280
>>>> ++++++++++++++++++++++++++++++++++++
>>>>   bak                                |   62 ++++++++
>>>
>>> I assume you didn't actually intend to commit those ;-)
>>
>> No. This version was sent by mistake. Please ignore this version. :-(
>>
>> Tianyang sent another version (v2) with the subject "[PATCH V2 1/1]
>> Improved RTDS scheduler".
>>
> Indeed. BTW, I'm back today from Winter Holidays. I'll give a look at
> this series as soon as practical.
>
> I'll comment directly on V2, with the only exception of this very
> email.
>
> So, first of all, Cc list (which is the same here and in V2): how have
> the names of the people that are in there been picked?
>

Dario: I picked the names based on previous discussion threads between 
you, Meng and Dagaen. Now I have changed who will be cc'ed so they won't 
be bothered.

> The patch is only about xen/common/sched_rt.c. As per MAINTAINERS, the
> maintainer of the code hosted in that file is just me. Surely it's a
> good thing to also include George, as he's the other maintainer of
> scheduling in general.
>
> All the other people, though, should not be bothered by being copied
> directly... Especially IanC and IanJ, I'd say.
>
>>> (In general single patches do not require a separate cover letter
>>> unless
>>> the required context contains a large amount of information which
>>> is not
>>> appropriate for the commit message of the actual change, I'll leave
>>> it to
>>> you and the scheduler maintainers to decide how much of your cover
>>> letter
>>> it would be appropriate to move to the commit message).
>>
>> The cover letter is supposed to explain the design idea of the
>> improved RTDS scheduler so that reviewers could (potentially) save
>> time in reviewing the patch, we think. :-)
>> Probably, the commit message should be refined and self-contained?
>>
> That's true. In particular, the "context", defined as summary of what
> happened and have been discussed on the mailing list before, during
> previous submissions, etc., definitely belongs to a cover letter.
>

There are pages of discussions on the current design and we figured we 
should make a cover page to summarize it and put more details in.

> High level design ideas and concepts, also do, e.g., in order to avoid
> ending up with commit changelogs that are all several pages long! :-)
>
> All that being said, we certainly want patches' subject lines and
> changelogs to be descriptive of at least what is being done, and why.
> If we commit this as it is right now, here it is what we'll find in
> `git log':
>
>    Improved RTDS scheduler
>
>    Budget replenishment is now handled by a dedicated timer which is
>    triggered at the most imminent release time of all runnable vcpus.
>
> Which won't, I'm quite sure about it, be that much useful when looking
> at it, say, in 3 years! :-D
>
> I know, finding the proper balance of what's better put where is not
> always super easy, especially at the beginning... but that's part of
> the process of cooking a good patch (series) :-)
>

> A few suggestions:
>
>   - we want tags in the subject. E.g., in this case, something like
>     "xen: sched: Improved RTDS scheduler"
>
>   - avoid (in both subject and changelog) generic terms like "improve"
>     "fix", etc., unless you can explain more specifically to what they
>     refer... I mean, pretty much all commits that touch sched_rt.c can
>     have "Improve RTDS" as a subject, can't they? (Unless we decide to
>     commit something making things deliberately worse!! :-))
>


Good call.

>     In this case, e.g. (although, perhaps a bit long):
>       "xen: sched: convert RTDS from time to event driven model"
>
>   - as Ian said, there are information in the cover letter that can
>     well be moved in the changelog, making it a lot more useful, and
>     not less refined or self contained.
>

Here is what I propose, please let me know your thoughts:

Cover letter:
1. How this patch could improve RTDS
2. Summary of previous discussion

Current RTDS scheduler is time driven and is called every 1ms. During 
each scheduler call, the repl_update() scans both runq and depeletedq, 
which might not be necessary every 1ms.

Since each vcpu is implemented as a deferable server, budget is 
preserved during its period and refilled in the next. It is not 
necessary to check every 1ms as the current design does. The 
replenishment is needed at the nearest next period(nearest 
current_deadline) of all runnable vcpus.

This improved design tries to reduce scheduler invocation by using an 
event driven approach;rt_schedule() will return a value when the 
scheduler needs to be called next time. In addition, the sched_rt will 
have one dedicated timer to handle replenishment when necessary. In 
other words, the budget replenishment and scheduler 
decision(rt_schedule) are separated.
----------------------------------------------------------------------

Changelog:
1. Changed/removed functions
2. Function graphs for handler and rt_schedule

xen: sched: convert RTDS from time to event driven model

Budget replenishment and enforcement are separated by adding a 
replenishment timer, which fires at the next most imminent release time 
of all runnable vcpus.

repl_handler(): a timer handler which is reprogrammed to fire at the 
nearest vcpu deadline to replenish vcpus on runq and depeletedq. When 
the replenishment is done, each replenished vcpu in the runq should 
tickle a pcpu to see if it needs to preempt any running vcpus.

rt_schedule(): picks the highest runnable vcpu based on cpu affinity and 
ret.time will be passed to schedule(). repl_update() has been removed.

The following functions have major changes:

rt_vcpu_wake(): when a vcpu is awake, it tickles instead of picking one 
from runq.

rt_context_saved(): when context switching is finished, the preempted 
vcpu will be put back into the runq. Picking from runq and tickling are 
removed.

schedule.c SCHEDULE_SOFTIRQ:
     rt_schedule():
         [spin_lock]
         burn_budget(scurr)
         snext = runq_pick()
         [spin_unlock]


sched_rt.c TIMER_SOFTIRQ
     replenishment_timer_handler()
         [spin_lock]
         <for_each_depleted_vcpu(i, i.repl_time < NOW()) {
             replenish(i)
             runq_tickle(i)
         }>
         program_timer()
         [spin_lock]



Thanks,
Tianyang Chen
> Thanks and Regards,
> Dario
>
diff mbox

Patch

diff --git a/0000-cover-letter.patch b/0000-cover-letter.patch
new file mode 100644
index 0000000..f5aca91
--- /dev/null
+++ b/0000-cover-letter.patch
@@ -0,0 +1,16 @@ 
+From 25ca27ea281885eb9873244a11f08e6987efb36e Mon Sep 17 00:00:00 2001
+From: Tianyang Chen <tiche@seas.upenn.edu>
+Date: Thu, 31 Dec 2015 04:05:21 -0500
+Subject: [PATCH] *** SUBJECT HERE ***
+
+*** BLURB HERE ***
+
+Tianyang Chen (1):
+  Improved RTDS scheduler
+
+ xen/common/sched_rt.c |  159 +++++++++++++++++++++++++++++--------------------
+ 1 file changed, 95 insertions(+), 64 deletions(-)
+
+-- 
+1.7.9.5
+
diff --git a/0001-Improved-RTDS-scheduler.patch b/0001-Improved-RTDS-scheduler.patch
new file mode 100644
index 0000000..02cb1f7
--- /dev/null
+++ b/0001-Improved-RTDS-scheduler.patch
@@ -0,0 +1,280 @@ 
+From 25ca27ea281885eb9873244a11f08e6987efb36e Mon Sep 17 00:00:00 2001
+From: Tianyang Chen <tianyangpenn@gmail.com>
+Date: Thu, 31 Dec 2015 01:55:19 -0500
+Subject: [PATCH] Improved RTDS scheduler
+
+Budget replenishment is now handled by a dedicated timer which is
+triggered at the most imminent release time of all runnable vcpus.
+
+Signed-off-by: Tianyang Chen <tiche@seas.upenn.edu>
+Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
+Signed-off-by: Dagaen Golomb <dgolomb@seas.upenn.edu>
+---
+ xen/common/sched_rt.c |  159 +++++++++++++++++++++++++++++--------------------
+ 1 file changed, 95 insertions(+), 64 deletions(-)
+
+diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
+index 4372486..d522272 100644
+--- a/xen/common/sched_rt.c
++++ b/xen/common/sched_rt.c
+@@ -16,6 +16,7 @@
+ #include <xen/delay.h>
+ #include <xen/event.h>
+ #include <xen/time.h>
++#include <xen/timer.h>
+ #include <xen/perfc.h>
+ #include <xen/sched-if.h>
+ #include <xen/softirq.h>
+@@ -147,6 +148,16 @@ static unsigned int nr_rt_ops;
+  * Global lock is referenced by schedule_data.schedule_lock from all
+  * physical cpus. It can be grabbed via vcpu_schedule_lock_irq()
+  */
++
++/* dedicated timer for replenishment */
++static struct timer repl_timer;
++
++/* controls when to first start the timer*/
++static int timer_started;
++
++/* handler for the replenishment timer */
++static void repl_handler(void *data); 
++
+ struct rt_private {
+     spinlock_t lock;            /* the global coarse grand lock */
+     struct list_head sdom;      /* list of availalbe domains, used for dump */
+@@ -426,6 +437,7 @@ __runq_insert(const struct scheduler *ops, struct rt_vcpu *svc)
+ static int
+ rt_init(struct scheduler *ops)
+ {
++    const int cpu = smp_processor_id(); 
+     struct rt_private *prv = xzalloc(struct rt_private);
+ 
+     printk("Initializing RTDS scheduler\n"
+@@ -454,6 +466,8 @@ rt_init(struct scheduler *ops)
+ 
+     ops->sched_data = prv;
+ 
++    init_timer(&repl_timer, repl_handler, ops, 0);
++
+     return 0;
+ 
+  no_mem:
+@@ -473,6 +487,9 @@ rt_deinit(const struct scheduler *ops)
+         xfree(_cpumask_scratch);
+         _cpumask_scratch = NULL;
+     }
++
++    kill_timer(&repl_timer);
++
+     xfree(prv);
+ }
+ 
+@@ -635,6 +652,13 @@ rt_vcpu_insert(const struct scheduler *ops, struct vcpu *vc)
+ 
+     /* add rt_vcpu svc to scheduler-specific vcpu list of the dom */
+     list_add_tail(&svc->sdom_elem, &svc->sdom->vcpu);
++
++    if(!timer_started)
++    {   
++        /* the first vcpu starts the timer for the first time*/
++        timer_started = 1;
++        set_timer(&repl_timer,svc->cur_deadline);
++    }
+ }
+ 
+ /*
+@@ -792,44 +816,6 @@ __runq_pick(const struct scheduler *ops, const cpumask_t *mask)
+ }
+ 
+ /*
+- * Update vcpu's budget and
+- * sort runq by insert the modifed vcpu back to runq
+- * lock is grabbed before calling this function
+- */
+-static void
+-__repl_update(const struct scheduler *ops, s_time_t now)
+-{
+-    struct list_head *runq = rt_runq(ops);
+-    struct list_head *depletedq = rt_depletedq(ops);
+-    struct list_head *iter;
+-    struct list_head *tmp;
+-    struct rt_vcpu *svc = NULL;
+-
+-    list_for_each_safe(iter, tmp, runq)
+-    {
+-        svc = __q_elem(iter);
+-        if ( now < svc->cur_deadline )
+-            break;
+-
+-        rt_update_deadline(now, svc);
+-        /* reinsert the vcpu if its deadline is updated */
+-        __q_remove(svc);
+-        __runq_insert(ops, svc);
+-    }
+-
+-    list_for_each_safe(iter, tmp, depletedq)
+-    {
+-        svc = __q_elem(iter);
+-        if ( now >= svc->cur_deadline )
+-        {
+-            rt_update_deadline(now, svc);
+-            __q_remove(svc); /* remove from depleted queue */
+-            __runq_insert(ops, svc); /* add to runq */
+-        }
+-    }
+-}
+-
+-/*
+  * schedule function for rt scheduler.
+  * The lock is already grabbed in schedule.c, no need to lock here
+  */
+@@ -848,7 +834,6 @@ rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_sched
+     /* burn_budget would return for IDLE VCPU */
+     burn_budget(ops, scurr, now);
+ 
+-    __repl_update(ops, now);
+ 
+     if ( tasklet_work_scheduled )
+     {
+@@ -889,7 +874,7 @@ rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_sched
+         }
+     }
+ 
+-    ret.time = MIN(snext->budget, MAX_SCHEDULE); /* sched quantum */
++    ret.time = snext->budget; /* invoke the scheduler next time */
+     ret.task = snext->vcpu;
+ 
+     /* TRACE */
+@@ -1033,10 +1018,6 @@ rt_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
+ {
+     struct rt_vcpu * const svc = rt_vcpu(vc);
+     s_time_t now = NOW();
+-    struct rt_private *prv = rt_priv(ops);
+-    struct rt_vcpu *snext = NULL; /* highest priority on RunQ */
+-    struct rt_dom *sdom = NULL;
+-    cpumask_t *online;
+ 
+     BUG_ON( is_idle_vcpu(vc) );
+ 
+@@ -1074,14 +1055,7 @@ rt_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
+     /* insert svc to runq/depletedq because svc is not in queue now */
+     __runq_insert(ops, svc);
+ 
+-    __repl_update(ops, now);
+-
+-    ASSERT(!list_empty(&prv->sdom));
+-    sdom = list_entry(prv->sdom.next, struct rt_dom, sdom_elem);
+-    online = cpupool_scheduler_cpumask(sdom->dom->cpupool);
+-    snext = __runq_pick(ops, online); /* pick snext from ALL valid cpus */
+-
+-    runq_tickle(ops, snext);
++    runq_tickle(ops, svc);
+ 
+     return;
+ }
+@@ -1094,10 +1068,6 @@ static void
+ rt_context_saved(const struct scheduler *ops, struct vcpu *vc)
+ {
+     struct rt_vcpu *svc = rt_vcpu(vc);
+-    struct rt_vcpu *snext = NULL;
+-    struct rt_dom *sdom = NULL;
+-    struct rt_private *prv = rt_priv(ops);
+-    cpumask_t *online;
+     spinlock_t *lock = vcpu_schedule_lock_irq(vc);
+ 
+     clear_bit(__RTDS_scheduled, &svc->flags);
+@@ -1108,15 +1078,8 @@ rt_context_saved(const struct scheduler *ops, struct vcpu *vc)
+     if ( test_and_clear_bit(__RTDS_delayed_runq_add, &svc->flags) &&
+          likely(vcpu_runnable(vc)) )
+     {
++        /* only insert the pre-empted vcpu back*/
+         __runq_insert(ops, svc);
+-        __repl_update(ops, NOW());
+-
+-        ASSERT(!list_empty(&prv->sdom));
+-        sdom = list_entry(prv->sdom.next, struct rt_dom, sdom_elem);
+-        online = cpupool_scheduler_cpumask(sdom->dom->cpupool);
+-        snext = __runq_pick(ops, online); /* pick snext from ALL cpus */
+-
+-        runq_tickle(ops, snext);
+     }
+ out:
+     vcpu_schedule_unlock_irq(lock, vc);
+@@ -1167,6 +1130,74 @@ rt_dom_cntl(
+     return rc;
+ }
+ 
++static void repl_handler(void *data){
++    unsigned long flags;
++    s_time_t now = NOW();
++    s_time_t min_repl = LONG_MAX; /* max time used in comparison*/
++    struct scheduler *ops = data; 
++    struct rt_private *prv = rt_priv(ops);
++    struct list_head *runq = rt_runq(ops);
++    struct list_head *depletedq = rt_depletedq(ops);
++    struct list_head *iter;
++    struct list_head *tmp;
++    struct rt_vcpu *svc = NULL;
++
++    spin_lock_irqsave(&prv->lock,flags);
++
++    stop_timer(&repl_timer);
++
++    list_for_each_safe(iter, tmp, runq)
++    {
++        svc = __q_elem(iter);
++        if ( now < svc->cur_deadline )
++            break;
++        rt_update_deadline(now, svc);
++        /* scan the runq to find the min release time 
++         * this happens when vcpus on runq miss deadline
++         */
++        if( min_repl> svc->cur_deadline )
++        {
++            min_repl = svc->cur_deadline;
++        }
++        /* reinsert the vcpu if its deadline is updated */
++        __q_remove(svc);
++        __runq_insert(ops, svc);
++    }
++
++    list_for_each_safe(iter, tmp, depletedq)
++    {
++        svc = __q_elem(iter);
++        if ( now >= svc->cur_deadline )
++        {
++            rt_update_deadline(now, svc);
++
++            /* scan the delp_q to find the minimal release time */
++            if(min_repl > svc->cur_deadline)
++            {    
++                min_repl = svc->cur_deadline;
++            }
++            __q_remove(svc);
++            __runq_insert(ops, svc);
++            runq_tickle(ops, svc);
++        }
++    }
++
++    /* if timer was triggered but none of the vcpus
++     * need to be refilled, set the timer to be the   
++     * default period + now
++     */
++    if(min_repl == LONG_MAX)
++    {
++        set_timer(&repl_timer, now + RTDS_DEFAULT_PERIOD);
++    }
++    else
++    {
++        /* reprogram the timer using the most imminent release time*/
++        set_timer(&repl_timer, min_repl);
++    }
++    spin_unlock_irqrestore(&prv->lock,flags);
++}
++
+ static struct rt_private _rt_priv;
+ 
+ const struct scheduler sched_rtds_def = {
+-- 
+1.7.9.5
+
diff --git a/bak b/bak
new file mode 100644
index 0000000..fa7b197
--- /dev/null
+++ b/bak
@@ -0,0 +1,62 @@ 
+From 25ca27ea281885eb9873244a11f08e6987efb36e Mon Sep 17 00:00:00 2001
+From: Tianyang Chen <tiche@seas.upenn.edu>
+Date: Thu, 31 Dec 2015 03:42:33 -0500
+Subject: [PATCH 0/1] Improved RTDS scheduler
+
+Current RTDS scheduler is time driven and is called every 1ms. During each scheduler call, the repl_update() scans both runq and depeletedq, which might not be necessary every 1ms.
+
+Since each vcpu is implemented as a deferable server, budget is preserved during its period and refilled in the next. It is not necessary to check every 1ms as the current design does. The replenishment is needed at the nearest next period(nearest current_deadline) of all runnable vcpus.
+
+This improved design tries to reduce scheduler invocation by using an event driven approach;rt_schedule() will return a value when the scheduler needs to be called next time. In addition, the sched_rt will have one dedicated timer to handle replenishment when necessary. In other words, the budget replenishment and scheduler decision(rt_schedule) are separated.
+
+Based on previous decision between Dario, Dagaen and Meng, the improved design can be implemented/modified as follows:
+
+rt_schedule(): picks the highest runnable vcpu based on cpu affinity and ret.time will be passed to schedule().
+
+rt_vcpu_wake(): when a vcpu is awake, it tickles instead of picking one from runq.
+
+rt_context_saved(): when context switching is finished, the preempted vcpu will be put back into the runq. Picking from runq and tickling are removed.
+
+repl_handler(): a timer handler which is reprogrammed to fire at the nearest vcpu deadline to replenish vcpus on depeletedq while keeping the runq sorted. When the replenishment is done, each replenished vcpu in the runq should tickle a pcpu to see if it needs to preempt any running vcpus.
+
+
+An extra field to record the last replenishing time will be added.
+
+schedule.c SCHEDULE_SOFTIRQ:
+    rt_schedule():
+        [spin_lock]
+        burn_budget(scurr)
+        snext = runq_pick()
+        [spin_unlock]
+
+
+sched_rt.c TIMER_SOFTIRQ
+    replenishment_timer_handler()
+        [spin_lock]
+        <for_each_depleted_vcpu(i, i.repl_time < NOW()) {
+            replenish(i)
+            runq_tickle(i)
+        }>
+        program_timer()
+        [spin_lock]
+
+The transient behavior should be noted. It happens between a vcpu tickles and a pcpu actually picks it. As previous discussions, this is unavoidable.
+
+Previous discussions:
+http://lists.xenproject.org/archives/html/xen-devel/2015-06/msg02629.html
+
+Signed-off-by: Tianyang Chen <tiche@seas.upenn.edu>
+Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
+Signed-off-by: Dagaen Golomb <dgolomb@seas.upenn.edu>
+
+
+
+Tianyang Chen (1):
+  Improved RTDS scheduler
+
+ xen/common/sched_rt.c |  159 +++++++++++++++++++++++++++++--------------------
+ 1 file changed, 95 insertions(+), 64 deletions(-)
+
+-- 
+1.7.9.5
+
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index 4372486..d522272 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -16,6 +16,7 @@ 
 #include <xen/delay.h>
 #include <xen/event.h>
 #include <xen/time.h>
+#include <xen/timer.h>
 #include <xen/perfc.h>
 #include <xen/sched-if.h>
 #include <xen/softirq.h>
@@ -147,6 +148,16 @@  static unsigned int nr_rt_ops;
  * Global lock is referenced by schedule_data.schedule_lock from all
  * physical cpus. It can be grabbed via vcpu_schedule_lock_irq()
  */
+
+/* dedicated timer for replenishment */
+static struct timer repl_timer;
+
+/* controls when to first start the timer*/
+static int timer_started;
+
+/* handler for the replenishment timer */
+static void repl_handler(void *data); 
+
 struct rt_private {
     spinlock_t lock;            /* the global coarse grand lock */
     struct list_head sdom;      /* list of availalbe domains, used for dump */
@@ -426,6 +437,7 @@  __runq_insert(const struct scheduler *ops, struct rt_vcpu *svc)
 static int
 rt_init(struct scheduler *ops)
 {
+    const int cpu = smp_processor_id(); 
     struct rt_private *prv = xzalloc(struct rt_private);
 
     printk("Initializing RTDS scheduler\n"
@@ -454,6 +466,8 @@  rt_init(struct scheduler *ops)
 
     ops->sched_data = prv;
 
+    init_timer(&repl_timer, repl_handler, ops, 0);
+
     return 0;
 
  no_mem:
@@ -473,6 +487,9 @@  rt_deinit(const struct scheduler *ops)
         xfree(_cpumask_scratch);
         _cpumask_scratch = NULL;
     }
+
+    kill_timer(&repl_timer);
+
     xfree(prv);
 }
 
@@ -635,6 +652,13 @@  rt_vcpu_insert(const struct scheduler *ops, struct vcpu *vc)
 
     /* add rt_vcpu svc to scheduler-specific vcpu list of the dom */
     list_add_tail(&svc->sdom_elem, &svc->sdom->vcpu);
+
+    if(!timer_started)
+    {   
+        /* the first vcpu starts the timer for the first time*/
+        timer_started = 1;
+        set_timer(&repl_timer,svc->cur_deadline);
+    }
 }
 
 /*
@@ -792,44 +816,6 @@  __runq_pick(const struct scheduler *ops, const cpumask_t *mask)
 }
 
 /*
- * Update vcpu's budget and
- * sort runq by insert the modifed vcpu back to runq
- * lock is grabbed before calling this function
- */
-static void
-__repl_update(const struct scheduler *ops, s_time_t now)
-{
-    struct list_head *runq = rt_runq(ops);
-    struct list_head *depletedq = rt_depletedq(ops);
-    struct list_head *iter;
-    struct list_head *tmp;
-    struct rt_vcpu *svc = NULL;
-
-    list_for_each_safe(iter, tmp, runq)
-    {
-        svc = __q_elem(iter);
-        if ( now < svc->cur_deadline )
-            break;
-
-        rt_update_deadline(now, svc);
-        /* reinsert the vcpu if its deadline is updated */
-        __q_remove(svc);
-        __runq_insert(ops, svc);
-    }
-
-    list_for_each_safe(iter, tmp, depletedq)
-    {
-        svc = __q_elem(iter);
-        if ( now >= svc->cur_deadline )
-        {
-            rt_update_deadline(now, svc);
-            __q_remove(svc); /* remove from depleted queue */
-            __runq_insert(ops, svc); /* add to runq */
-        }
-    }
-}
-
-/*
  * schedule function for rt scheduler.
  * The lock is already grabbed in schedule.c, no need to lock here
  */
@@ -848,7 +834,6 @@  rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_sched
     /* burn_budget would return for IDLE VCPU */
     burn_budget(ops, scurr, now);
 
-    __repl_update(ops, now);
 
     if ( tasklet_work_scheduled )
     {
@@ -889,7 +874,7 @@  rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_sched
         }
     }
 
-    ret.time = MIN(snext->budget, MAX_SCHEDULE); /* sched quantum */
+    ret.time = snext->budget; /* invoke the scheduler next time */
     ret.task = snext->vcpu;
 
     /* TRACE */
@@ -1033,10 +1018,6 @@  rt_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
 {
     struct rt_vcpu * const svc = rt_vcpu(vc);
     s_time_t now = NOW();
-    struct rt_private *prv = rt_priv(ops);
-    struct rt_vcpu *snext = NULL; /* highest priority on RunQ */
-    struct rt_dom *sdom = NULL;
-    cpumask_t *online;
 
     BUG_ON( is_idle_vcpu(vc) );
 
@@ -1074,14 +1055,7 @@  rt_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
     /* insert svc to runq/depletedq because svc is not in queue now */
     __runq_insert(ops, svc);
 
-    __repl_update(ops, now);
-
-    ASSERT(!list_empty(&prv->sdom));
-    sdom = list_entry(prv->sdom.next, struct rt_dom, sdom_elem);
-    online = cpupool_scheduler_cpumask(sdom->dom->cpupool);
-    snext = __runq_pick(ops, online); /* pick snext from ALL valid cpus */
-
-    runq_tickle(ops, snext);
+    runq_tickle(ops, svc);
 
     return;
 }
@@ -1094,10 +1068,6 @@  static void
 rt_context_saved(const struct scheduler *ops, struct vcpu *vc)
 {
     struct rt_vcpu *svc = rt_vcpu(vc);
-    struct rt_vcpu *snext = NULL;
-    struct rt_dom *sdom = NULL;
-    struct rt_private *prv = rt_priv(ops);
-    cpumask_t *online;
     spinlock_t *lock = vcpu_schedule_lock_irq(vc);
 
     clear_bit(__RTDS_scheduled, &svc->flags);
@@ -1108,15 +1078,8 @@  rt_context_saved(const struct scheduler *ops, struct vcpu *vc)
     if ( test_and_clear_bit(__RTDS_delayed_runq_add, &svc->flags) &&
          likely(vcpu_runnable(vc)) )
     {
+        /* only insert the pre-empted vcpu back*/
         __runq_insert(ops, svc);
-        __repl_update(ops, NOW());
-
-        ASSERT(!list_empty(&prv->sdom));
-        sdom = list_entry(prv->sdom.next, struct rt_dom, sdom_elem);
-        online = cpupool_scheduler_cpumask(sdom->dom->cpupool);
-        snext = __runq_pick(ops, online); /* pick snext from ALL cpus */
-
-        runq_tickle(ops, snext);
     }
 out:
     vcpu_schedule_unlock_irq(lock, vc);
@@ -1167,6 +1130,74 @@  rt_dom_cntl(
     return rc;
 }
 
+static void repl_handler(void *data){
+    unsigned long flags;
+    s_time_t now = NOW();
+    s_time_t min_repl = LONG_MAX; /* max time used in comparison*/
+    struct scheduler *ops = data; 
+    struct rt_private *prv = rt_priv(ops);
+    struct list_head *runq = rt_runq(ops);
+    struct list_head *depletedq = rt_depletedq(ops);
+    struct list_head *iter;
+    struct list_head *tmp;
+    struct rt_vcpu *svc = NULL;
+
+    spin_lock_irqsave(&prv->lock,flags);
+
+    stop_timer(&repl_timer);
+
+    list_for_each_safe(iter, tmp, runq)
+    {
+        svc = __q_elem(iter);
+        if ( now < svc->cur_deadline )
+            break;
+        rt_update_deadline(now, svc);
+        /* scan the runq to find the min release time 
+         * this happens when vcpus on runq miss deadline
+         */
+        if( min_repl> svc->cur_deadline )
+        {
+            min_repl = svc->cur_deadline;
+        }
+        /* reinsert the vcpu if its deadline is updated */
+        __q_remove(svc);
+        __runq_insert(ops, svc);
+    }
+
+    list_for_each_safe(iter, tmp, depletedq)
+    {
+        svc = __q_elem(iter);
+        if ( now >= svc->cur_deadline )
+        {
+            rt_update_deadline(now, svc);
+
+            /* scan the delp_q to find the minimal release time */
+            if(min_repl > svc->cur_deadline)
+            {    
+                min_repl = svc->cur_deadline;
+            }
+            __q_remove(svc);
+            __runq_insert(ops, svc);
+            runq_tickle(ops, svc);
+        }
+    }
+
+    /* if timer was triggered but none of the vcpus
+     * need to be refilled, set the timer to be the   
+     * default period + now
+     */
+    if(min_repl == LONG_MAX)
+    {
+        set_timer(&repl_timer, now + RTDS_DEFAULT_PERIOD);
+    }
+    else
+    {
+        /* reprogram the timer using the most imminent release time*/
+        set_timer(&repl_timer, min_repl);
+    }
+    spin_unlock_irqrestore(&prv->lock,flags);
+}
+
 static struct rt_private _rt_priv;
 
 const struct scheduler sched_rtds_def = {