diff mbox

xen:rtds:fix bug in accounting budget

Message ID 1476890041-4248-1-git-send-email-mengxu@cis.upenn.edu (mailing list archive)
State New, archived
Headers show

Commit Message

Meng Xu Oct. 19, 2016, 3:13 p.m. UTC
The bug is introduced in Xen 4.7 when we converted RTDS scheduler
from quantum-driven model to event-driven model.
We assumed rt_schedule() is always called for a VCPU
before the VCPUs budget replenished handler.
This assumption does not hold, when system is overloaded, or
when the VCPU budget is almost equal its period.

Buggy behavior:
1) A VCPU may get less budget that assigned in a period.
2) A full capacity VCPU, i.e., a VCPU whose period is equal to budget,
   may not get any budget in some period.

Bug analysis:
1) A VCPU deadline can be fast-forwarded by more than one period.
   However, the VCPU last_start time was not updated immediately.
   If rt_schedule() is called after rt_update_deadline(), which happens
   when VCPU budget is equal to period or when VCPU has deadline miss,
   burn_budget() will burn the budget that was just replenished,
   although the replenished budget should be used in the most recent period only.

   We should update VCPU last_start time to the start of the current period
   when rt_update_deadline() updates a VCPU period.

2) When a full capacity VCPU depletes its budget and is context switching out,
   but has not updated the cores current running VCPU,
   the budget replenish timer may be triggerred.
   The replenish handler failed to re-schedule the full capacity VCPU
   because it thought the VCPU is running.

   When a VCPU budget is replenished, we try to tickle a CPU.
   When we find a core for a VCPU to tickle and the VCPU is context switching out,
   we will always tickle the core where the VCPU was running,
   if the VCPU cannot find another core to tickle

This bug was reported by Dagaen Golomb

Signed-off-by: Meng Xu

---
Cc: Dagaen Golomb <dgolomb@seas.upenn.edu>
Cc: Dario Faggioli <dario.faggioli@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Linh Thi Xuan Phan <linhphan@cis.upenn.edu>
Cc: Haoran Li <lihaoran@wustl.edu>
Cc: Meng Xu <xumengpanda@gmail.com>
---
 xen/common/sched_rt.c | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

Comments

Meng Xu Oct. 19, 2016, 3:30 p.m. UTC | #1
On Wed, Oct 19, 2016 at 11:13 AM, Meng Xu <mengxu@cis.upenn.edu> wrote:
> The bug is introduced in Xen 4.7 when we converted RTDS scheduler
> from quantum-driven model to event-driven model.
> We assumed rt_schedule() is always called for a VCPU
> before the VCPUs budget replenished handler.
> This assumption does not hold, when system is overloaded, or
> when the VCPU budget is almost equal its period.
>
> Buggy behavior:
> 1) A VCPU may get less budget that assigned in a period.
> 2) A full capacity VCPU, i.e., a VCPU whose period is equal to budget,
>    may not get any budget in some period.

I'm speculating that this bug may make a VCPU starve from receiving
budget. This may cause the dom0 to freeze for an arbitrary long time
in some "random" cases, especially if the dom0 only has one VCPU with
large period.

In the feature document, we said:
+# Known issues
+
+* OSSTest reports occasional failures on ARM.

Hopefully, this known issue can be fixed as well.

Meng
Wei Liu Oct. 19, 2016, 3:45 p.m. UTC | #2
On Wed, Oct 19, 2016 at 11:13:54AM -0400, Meng Xu wrote:
> The bug is introduced in Xen 4.7 when we converted RTDS scheduler
> from quantum-driven model to event-driven model.
> We assumed rt_schedule() is always called for a VCPU
> before the VCPUs budget replenished handler.
> This assumption does not hold, when system is overloaded, or
> when the VCPU budget is almost equal its period.
> 
> Buggy behavior:
> 1) A VCPU may get less budget that assigned in a period.
> 2) A full capacity VCPU, i.e., a VCPU whose period is equal to budget,
>    may not get any budget in some period.
> 
> Bug analysis:
> 1) A VCPU deadline can be fast-forwarded by more than one period.
>    However, the VCPU last_start time was not updated immediately.
>    If rt_schedule() is called after rt_update_deadline(), which happens
>    when VCPU budget is equal to period or when VCPU has deadline miss,
>    burn_budget() will burn the budget that was just replenished,
>    although the replenished budget should be used in the most recent period only.
> 
>    We should update VCPU last_start time to the start of the current period
>    when rt_update_deadline() updates a VCPU period.
> 
> 2) When a full capacity VCPU depletes its budget and is context switching out,
>    but has not updated the cores current running VCPU,
>    the budget replenish timer may be triggerred.
>    The replenish handler failed to re-schedule the full capacity VCPU
>    because it thought the VCPU is running.
> 
>    When a VCPU budget is replenished, we try to tickle a CPU.
>    When we find a core for a VCPU to tickle and the VCPU is context switching out,
>    we will always tickle the core where the VCPU was running,
>    if the VCPU cannot find another core to tickle
> 
> This bug was reported by Dagaen Golomb
> 
> Signed-off-by: Meng Xu

You missed you email address here.

Wei.
Meng Xu Oct. 19, 2016, 3:55 p.m. UTC | #3
[cc. Tianyang...]

On Wed, Oct 19, 2016 at 11:45 AM, Wei Liu <wei.liu2@citrix.com> wrote:
> On Wed, Oct 19, 2016 at 11:13:54AM -0400, Meng Xu wrote:
>> The bug is introduced in Xen 4.7 when we converted RTDS scheduler
>> from quantum-driven model to event-driven model.
>> We assumed rt_schedule() is always called for a VCPU
>> before the VCPUs budget replenished handler.
>> This assumption does not hold, when system is overloaded, or
>> when the VCPU budget is almost equal its period.
>>
>> Buggy behavior:
>> 1) A VCPU may get less budget that assigned in a period.
>> 2) A full capacity VCPU, i.e., a VCPU whose period is equal to budget,
>>    may not get any budget in some period.
>>
>> Bug analysis:
>> 1) A VCPU deadline can be fast-forwarded by more than one period.
>>    However, the VCPU last_start time was not updated immediately.
>>    If rt_schedule() is called after rt_update_deadline(), which happens
>>    when VCPU budget is equal to period or when VCPU has deadline miss,
>>    burn_budget() will burn the budget that was just replenished,
>>    although the replenished budget should be used in the most recent period only.
>>
>>    We should update VCPU last_start time to the start of the current period
>>    when rt_update_deadline() updates a VCPU period.
>>
>> 2) When a full capacity VCPU depletes its budget and is context switching out,
>>    but has not updated the cores current running VCPU,
>>    the budget replenish timer may be triggerred.
>>    The replenish handler failed to re-schedule the full capacity VCPU
>>    because it thought the VCPU is running.
>>
>>    When a VCPU budget is replenished, we try to tickle a CPU.
>>    When we find a core for a VCPU to tickle and the VCPU is context switching out,
>>    we will always tickle the core where the VCPU was running,
>>    if the VCPU cannot find another core to tickle
>>
>> This bug was reported by Dagaen Golomb
>>
>> Signed-off-by: Meng Xu
>
> You missed you email address here.
>

Oh, sorry! :-(
Thank you very much for pointing it out, Wei!

I will add my email address <mengxu@cis.upenn.edu> in the next version.

Let's see if Dario and George have some comments on the fix. I can
incorporate them in the next version.

Meng

-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/
Dario Faggioli Oct. 21, 2016, 5:36 p.m. UTC | #4
On Wed, 2016-10-19 at 11:13 -0400, Meng Xu wrote:
> The bug is introduced in Xen 4.7 when we converted RTDS scheduler
> from quantum-driven model to event-driven model.
> We assumed rt_schedule() is always called for a VCPU
> before the VCPUs budget replenished handler.
>
No, we didn't.

Or at least, I've never done so, and tried as hard as I could to tell
you guys to not make any assumptions on who run first.

So, yes, I agree that, if there is code that only works under such
assumption, it's a bug.

> This assumption does not hold, when system is overloaded, or
> when the VCPU budget is almost equal its period.
> 
> Buggy behavior:
> 1) A VCPU may get less budget that assigned in a period.
> 2) A full capacity VCPU, i.e., a VCPU whose period is equal to
> budget,
>    may not get any budget in some period.
> 
So, there are two bugs. And things are very subtle, as far as I can
judge from both the bugs description and the code.

It would be, therefore, a lot more clear if you could send _one_ patch
per bug.

> Bug analysis:
> 1) A VCPU deadline can be fast-forwarded by more than one period.
>    However, the VCPU last_start time was not updated immediately.
>    If rt_schedule() is called after rt_update_deadline(), which
> happens
>    when VCPU budget is equal to period or when VCPU has deadline
> miss,
>    burn_budget() will burn the budget that was just replenished,
>    although the replenished budget should be used in the most recent
> period only.
> 
-EPARSE.

I've looked at the code and try to match current behavior, your
proposed change, and this description, but failed.

Can you be a little more precise and specific about what happens when?

I'll keep looking and thinking, but any help in making all this a bit
more clear would be very welcome.

> 2) When a full capacity VCPU depletes its budget and is context
> switching out,
>    but has not updated the cores current running VCPU,
>
"has not updated the cores current running VCPU,"

I've not idea what this sentence means. What is it that has not yet
been updated?

>    the budget replenish timer may be triggerred.
>    The replenish handler failed to re-schedule the full capacity VCPU
>    because it thought the VCPU is running.
> 
>    When a VCPU budget is replenished, we try to tickle a CPU.
>    When we find a core for a VCPU to tickle and the VCPU is context
> switching out,
>    we will always tickle the core where the VCPU was running,
>    if the VCPU cannot find another core to tickle
> 
Can't understand much again... I guess this is the description of the
solution to the bug?

> This bug was reported by Dagaen Golomb
> 
You can give credit by using the following tag:

Reported by: Dagaen Golomb <xxx@yyy.zz>

Thanks and Regards,
Dario
Meng Xu Oct. 21, 2016, 8:58 p.m. UTC | #5
On Fri, Oct 21, 2016 at 1:36 PM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
>
> On Wed, 2016-10-19 at 11:13 -0400, Meng Xu wrote:
> > The bug is introduced in Xen 4.7 when we converted RTDS scheduler
> > from quantum-driven model to event-driven model.
> > We assumed rt_schedule() is always called for a VCPU
> > before the VCPUs budget replenished handler.
> >
> No, we didn't.
>
> Or at least, I've never done so, and tried as hard as I could to tell
> you guys to not make any assumptions on who run first.


OK... I never made such assumption either. To be precise, I guess I
should rephrase the words as "The current event-driven RTDS scheduler
does not correctly handle the corner cases when
 r
t_schedule()
is called for a VCPU after the VCPU's budget replenishment handler.
 "
>
>
> So, yes, I agree that, if there is code that only works under such
> assumption, it's a bug.
>
> > This assumption does not hold, when system is overloaded, or
> > when the VCPU budget is almost equal its period.
> >
> > Buggy behavior:
> > 1) A VCPU may get less budget that assigned in a period.
> > 2) A full capacity VCPU, i.e., a VCPU whose period is equal to
> > budget,
> >    may not get any budget in some period.
> >
> So, there are two bugs. And things are very subtle, as far as I can
> judge from both the bugs description and the code.
>
> It would be, therefore, a lot more clear if you could send _one_ patch
> per bug.


OK. Will do that soon.


>
>
> > Bug analysis:
> > 1) A VCPU deadline can be fast-forwarded by more than one period.
> >    However, the VCPU last_start time was not updated immediately.
> >    If rt_schedule() is called after rt_update_deadline(), which
> > happens
> >    when VCPU budget is equal to period or when VCPU has deadline
> > miss,
> >    burn_budget() will burn the budget that was just replenished,
> >    although the replenished budget should be used in the most recent
> > period only.
> >
> -EPARSE.
>
> I've looked at the code and try to match current behavior, your
> proposed change, and this description, but failed.
>
> Can you be a little more precise and specific about what happens when?


Sure!

Let me give an example in the new patch.


>
> I'll keep looking and thinking, but any help in making all this a bit
> more clear would be very welcome.


I'm sending a new patch in one or two hours.


>
>
> > 2) When a full capacity VCPU depletes its budget and is context
> > switching out,
> >    but has not updated the cores current running VCPU,
> >
> "has not updated the cores current running VCPU,"
>
> I've not idea what this sentence means. What is it that has not yet
> been updated?


The current running VCPU is per_cpu(schedule_data, c).curr .
The key point here is that rt_schedule() decides to deschedule a VCPU
whose budget is just depleted, and the repl_timer_handler is called
before the VCPU finishes the context switch. Until when
rt_contexted_saved() is called, we won't update the
per_cpu(schedule_data, c).curr. So the repl_timer_handler() still
thought the VCPU is running, while the VCPU is in the progress of
being scheduled out. Then repl_timer_handler() fails to reschedule the
VCPU since the VCPU's budget is replenished.

>
>
> >    the budget replenish timer may be triggerred.
> >    The replenish handler failed to re-schedule the full capacity VCPU
> >    because it thought the VCPU is running.
> >
> >    When a VCPU budget is replenished, we try to tickle a CPU.
> >    When we find a core for a VCPU to tickle and the VCPU is context
> > switching out,
> >    we will always tickle the core where the VCPU was running,
> >    if the VCPU cannot find another core to tickle
> >
> Can't understand much again... I guess this is the description of the
> solution to the bug?


Yes. This is the solution.

Hmm, I will describe the steps of how the bugs happens in the next
version and clearly mark which part is the solution idea.
>
>
> > This bug was reported by Dagaen Golomb
> >
> You can give credit by using the following tag:
>
> Reported by: Dagaen Golomb <xxx@yyy.zz>


Sure!  Will do.

Thanks and best regards,

Meng

>
>
> Thanks and Regards,
> Dario
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
diff mbox

Patch

diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index d95f798..cdc5c06 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -407,6 +407,13 @@  rt_update_deadline(s_time_t now, struct rt_vcpu *svc)
         svc->cur_deadline += count * svc->period;
     }
 
+    /*
+     * rt_schedule may be scheduled after update deadline
+     * we should only deduct the budget consumed in current period
+     */
+    if ( svc->last_start < (svc->cur_deadline - svc->period) )
+        svc->last_start = svc->cur_deadline - svc->period;
+
     svc->cur_budget = svc->budget;
 
     /* TRACE */
@@ -1195,6 +1202,19 @@  runq_tickle(const struct scheduler *ops, struct rt_vcpu *new)
         goto out;
     }
 
+    /*
+     * new may be preempted due to out of budget
+     * new may replenish its budget before it is contexted switched out
+     * then new may preempt the to-be-scheduled task on its prev cpu
+     */
+    if ( curr_on_cpu(new->vcpu->processor) == new->vcpu &&
+         test_bit(__RTDS_delayed_runq_add, &new->flags) )
+    {
+        SCHED_STAT_CRANK(tickled_busy_cpu);
+        cpu_to_tickle = new->vcpu->processor;
+        goto out;
+    }
+
     /* didn't tickle any cpu */
     SCHED_STAT_CRANK(tickled_no_cpu);
     return;
@@ -1472,6 +1492,7 @@  static void repl_timer_handler(void *data){
     {
         svc = replq_elem(iter);
 
+        /* Another ready VCPU may preempt svc who updates its deadline */
         if ( curr_on_cpu(svc->vcpu->processor) == svc->vcpu &&
              !list_empty(runq) )
         {
@@ -1480,8 +1501,9 @@  static void repl_timer_handler(void *data){
             if ( svc->cur_deadline > next_on_runq->cur_deadline )
                 runq_tickle(ops, next_on_runq);
         }
-        else if ( vcpu_on_q(svc) &&
-                  __test_and_clear_bit(__RTDS_depleted, &svc->flags) )
+        /* svc may preempt another VCPU because it has budget again */
+        if ( __test_and_clear_bit(__RTDS_depleted, &svc->flags) &&
+             vcpu_runnable(svc->vcpu) )
             runq_tickle(ops, svc);
 
         list_del(&svc->replq_elem);