Message ID | 1476890041-4248-1-git-send-email-mengxu@cis.upenn.edu (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, Oct 19, 2016 at 11:13 AM, Meng Xu <mengxu@cis.upenn.edu> wrote: > The bug is introduced in Xen 4.7 when we converted RTDS scheduler > from quantum-driven model to event-driven model. > We assumed rt_schedule() is always called for a VCPU > before the VCPUs budget replenished handler. > This assumption does not hold, when system is overloaded, or > when the VCPU budget is almost equal its period. > > Buggy behavior: > 1) A VCPU may get less budget that assigned in a period. > 2) A full capacity VCPU, i.e., a VCPU whose period is equal to budget, > may not get any budget in some period. I'm speculating that this bug may make a VCPU starve from receiving budget. This may cause the dom0 to freeze for an arbitrary long time in some "random" cases, especially if the dom0 only has one VCPU with large period. In the feature document, we said: +# Known issues + +* OSSTest reports occasional failures on ARM. Hopefully, this known issue can be fixed as well. Meng
On Wed, Oct 19, 2016 at 11:13:54AM -0400, Meng Xu wrote: > The bug is introduced in Xen 4.7 when we converted RTDS scheduler > from quantum-driven model to event-driven model. > We assumed rt_schedule() is always called for a VCPU > before the VCPUs budget replenished handler. > This assumption does not hold, when system is overloaded, or > when the VCPU budget is almost equal its period. > > Buggy behavior: > 1) A VCPU may get less budget that assigned in a period. > 2) A full capacity VCPU, i.e., a VCPU whose period is equal to budget, > may not get any budget in some period. > > Bug analysis: > 1) A VCPU deadline can be fast-forwarded by more than one period. > However, the VCPU last_start time was not updated immediately. > If rt_schedule() is called after rt_update_deadline(), which happens > when VCPU budget is equal to period or when VCPU has deadline miss, > burn_budget() will burn the budget that was just replenished, > although the replenished budget should be used in the most recent period only. > > We should update VCPU last_start time to the start of the current period > when rt_update_deadline() updates a VCPU period. > > 2) When a full capacity VCPU depletes its budget and is context switching out, > but has not updated the cores current running VCPU, > the budget replenish timer may be triggerred. > The replenish handler failed to re-schedule the full capacity VCPU > because it thought the VCPU is running. > > When a VCPU budget is replenished, we try to tickle a CPU. > When we find a core for a VCPU to tickle and the VCPU is context switching out, > we will always tickle the core where the VCPU was running, > if the VCPU cannot find another core to tickle > > This bug was reported by Dagaen Golomb > > Signed-off-by: Meng Xu You missed you email address here. Wei.
[cc. Tianyang...] On Wed, Oct 19, 2016 at 11:45 AM, Wei Liu <wei.liu2@citrix.com> wrote: > On Wed, Oct 19, 2016 at 11:13:54AM -0400, Meng Xu wrote: >> The bug is introduced in Xen 4.7 when we converted RTDS scheduler >> from quantum-driven model to event-driven model. >> We assumed rt_schedule() is always called for a VCPU >> before the VCPUs budget replenished handler. >> This assumption does not hold, when system is overloaded, or >> when the VCPU budget is almost equal its period. >> >> Buggy behavior: >> 1) A VCPU may get less budget that assigned in a period. >> 2) A full capacity VCPU, i.e., a VCPU whose period is equal to budget, >> may not get any budget in some period. >> >> Bug analysis: >> 1) A VCPU deadline can be fast-forwarded by more than one period. >> However, the VCPU last_start time was not updated immediately. >> If rt_schedule() is called after rt_update_deadline(), which happens >> when VCPU budget is equal to period or when VCPU has deadline miss, >> burn_budget() will burn the budget that was just replenished, >> although the replenished budget should be used in the most recent period only. >> >> We should update VCPU last_start time to the start of the current period >> when rt_update_deadline() updates a VCPU period. >> >> 2) When a full capacity VCPU depletes its budget and is context switching out, >> but has not updated the cores current running VCPU, >> the budget replenish timer may be triggerred. >> The replenish handler failed to re-schedule the full capacity VCPU >> because it thought the VCPU is running. >> >> When a VCPU budget is replenished, we try to tickle a CPU. >> When we find a core for a VCPU to tickle and the VCPU is context switching out, >> we will always tickle the core where the VCPU was running, >> if the VCPU cannot find another core to tickle >> >> This bug was reported by Dagaen Golomb >> >> Signed-off-by: Meng Xu > > You missed you email address here. > Oh, sorry! :-( Thank you very much for pointing it out, Wei! I will add my email address <mengxu@cis.upenn.edu> in the next version. Let's see if Dario and George have some comments on the fix. I can incorporate them in the next version. Meng ----------- Meng Xu PhD Student in Computer and Information Science University of Pennsylvania http://www.cis.upenn.edu/~mengxu/
On Wed, 2016-10-19 at 11:13 -0400, Meng Xu wrote: > The bug is introduced in Xen 4.7 when we converted RTDS scheduler > from quantum-driven model to event-driven model. > We assumed rt_schedule() is always called for a VCPU > before the VCPUs budget replenished handler. > No, we didn't. Or at least, I've never done so, and tried as hard as I could to tell you guys to not make any assumptions on who run first. So, yes, I agree that, if there is code that only works under such assumption, it's a bug. > This assumption does not hold, when system is overloaded, or > when the VCPU budget is almost equal its period. > > Buggy behavior: > 1) A VCPU may get less budget that assigned in a period. > 2) A full capacity VCPU, i.e., a VCPU whose period is equal to > budget, > may not get any budget in some period. > So, there are two bugs. And things are very subtle, as far as I can judge from both the bugs description and the code. It would be, therefore, a lot more clear if you could send _one_ patch per bug. > Bug analysis: > 1) A VCPU deadline can be fast-forwarded by more than one period. > However, the VCPU last_start time was not updated immediately. > If rt_schedule() is called after rt_update_deadline(), which > happens > when VCPU budget is equal to period or when VCPU has deadline > miss, > burn_budget() will burn the budget that was just replenished, > although the replenished budget should be used in the most recent > period only. > -EPARSE. I've looked at the code and try to match current behavior, your proposed change, and this description, but failed. Can you be a little more precise and specific about what happens when? I'll keep looking and thinking, but any help in making all this a bit more clear would be very welcome. > 2) When a full capacity VCPU depletes its budget and is context > switching out, > but has not updated the cores current running VCPU, > "has not updated the cores current running VCPU," I've not idea what this sentence means. What is it that has not yet been updated? > the budget replenish timer may be triggerred. > The replenish handler failed to re-schedule the full capacity VCPU > because it thought the VCPU is running. > > When a VCPU budget is replenished, we try to tickle a CPU. > When we find a core for a VCPU to tickle and the VCPU is context > switching out, > we will always tickle the core where the VCPU was running, > if the VCPU cannot find another core to tickle > Can't understand much again... I guess this is the description of the solution to the bug? > This bug was reported by Dagaen Golomb > You can give credit by using the following tag: Reported by: Dagaen Golomb <xxx@yyy.zz> Thanks and Regards, Dario
On Fri, Oct 21, 2016 at 1:36 PM, Dario Faggioli <dario.faggioli@citrix.com> wrote: > > On Wed, 2016-10-19 at 11:13 -0400, Meng Xu wrote: > > The bug is introduced in Xen 4.7 when we converted RTDS scheduler > > from quantum-driven model to event-driven model. > > We assumed rt_schedule() is always called for a VCPU > > before the VCPUs budget replenished handler. > > > No, we didn't. > > Or at least, I've never done so, and tried as hard as I could to tell > you guys to not make any assumptions on who run first. OK... I never made such assumption either. To be precise, I guess I should rephrase the words as "The current event-driven RTDS scheduler does not correctly handle the corner cases when r t_schedule() is called for a VCPU after the VCPU's budget replenishment handler. " > > > So, yes, I agree that, if there is code that only works under such > assumption, it's a bug. > > > This assumption does not hold, when system is overloaded, or > > when the VCPU budget is almost equal its period. > > > > Buggy behavior: > > 1) A VCPU may get less budget that assigned in a period. > > 2) A full capacity VCPU, i.e., a VCPU whose period is equal to > > budget, > > may not get any budget in some period. > > > So, there are two bugs. And things are very subtle, as far as I can > judge from both the bugs description and the code. > > It would be, therefore, a lot more clear if you could send _one_ patch > per bug. OK. Will do that soon. > > > > Bug analysis: > > 1) A VCPU deadline can be fast-forwarded by more than one period. > > However, the VCPU last_start time was not updated immediately. > > If rt_schedule() is called after rt_update_deadline(), which > > happens > > when VCPU budget is equal to period or when VCPU has deadline > > miss, > > burn_budget() will burn the budget that was just replenished, > > although the replenished budget should be used in the most recent > > period only. > > > -EPARSE. > > I've looked at the code and try to match current behavior, your > proposed change, and this description, but failed. > > Can you be a little more precise and specific about what happens when? Sure! Let me give an example in the new patch. > > I'll keep looking and thinking, but any help in making all this a bit > more clear would be very welcome. I'm sending a new patch in one or two hours. > > > > 2) When a full capacity VCPU depletes its budget and is context > > switching out, > > but has not updated the cores current running VCPU, > > > "has not updated the cores current running VCPU," > > I've not idea what this sentence means. What is it that has not yet > been updated? The current running VCPU is per_cpu(schedule_data, c).curr . The key point here is that rt_schedule() decides to deschedule a VCPU whose budget is just depleted, and the repl_timer_handler is called before the VCPU finishes the context switch. Until when rt_contexted_saved() is called, we won't update the per_cpu(schedule_data, c).curr. So the repl_timer_handler() still thought the VCPU is running, while the VCPU is in the progress of being scheduled out. Then repl_timer_handler() fails to reschedule the VCPU since the VCPU's budget is replenished. > > > > the budget replenish timer may be triggerred. > > The replenish handler failed to re-schedule the full capacity VCPU > > because it thought the VCPU is running. > > > > When a VCPU budget is replenished, we try to tickle a CPU. > > When we find a core for a VCPU to tickle and the VCPU is context > > switching out, > > we will always tickle the core where the VCPU was running, > > if the VCPU cannot find another core to tickle > > > Can't understand much again... I guess this is the description of the > solution to the bug? Yes. This is the solution. Hmm, I will describe the steps of how the bugs happens in the next version and clearly mark which part is the solution idea. > > > > This bug was reported by Dagaen Golomb > > > You can give credit by using the following tag: > > Reported by: Dagaen Golomb <xxx@yyy.zz> Sure! Will do. Thanks and best regards, Meng > > > Thanks and Regards, > Dario > -- > <<This happens because I choose it to happen!>> (Raistlin Majere) > ----------------------------------------------------------------- > Dario Faggioli, Ph.D, http://about.me/dario.faggioli > Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c index d95f798..cdc5c06 100644 --- a/xen/common/sched_rt.c +++ b/xen/common/sched_rt.c @@ -407,6 +407,13 @@ rt_update_deadline(s_time_t now, struct rt_vcpu *svc) svc->cur_deadline += count * svc->period; } + /* + * rt_schedule may be scheduled after update deadline + * we should only deduct the budget consumed in current period + */ + if ( svc->last_start < (svc->cur_deadline - svc->period) ) + svc->last_start = svc->cur_deadline - svc->period; + svc->cur_budget = svc->budget; /* TRACE */ @@ -1195,6 +1202,19 @@ runq_tickle(const struct scheduler *ops, struct rt_vcpu *new) goto out; } + /* + * new may be preempted due to out of budget + * new may replenish its budget before it is contexted switched out + * then new may preempt the to-be-scheduled task on its prev cpu + */ + if ( curr_on_cpu(new->vcpu->processor) == new->vcpu && + test_bit(__RTDS_delayed_runq_add, &new->flags) ) + { + SCHED_STAT_CRANK(tickled_busy_cpu); + cpu_to_tickle = new->vcpu->processor; + goto out; + } + /* didn't tickle any cpu */ SCHED_STAT_CRANK(tickled_no_cpu); return; @@ -1472,6 +1492,7 @@ static void repl_timer_handler(void *data){ { svc = replq_elem(iter); + /* Another ready VCPU may preempt svc who updates its deadline */ if ( curr_on_cpu(svc->vcpu->processor) == svc->vcpu && !list_empty(runq) ) { @@ -1480,8 +1501,9 @@ static void repl_timer_handler(void *data){ if ( svc->cur_deadline > next_on_runq->cur_deadline ) runq_tickle(ops, next_on_runq); } - else if ( vcpu_on_q(svc) && - __test_and_clear_bit(__RTDS_depleted, &svc->flags) ) + /* svc may preempt another VCPU because it has budget again */ + if ( __test_and_clear_bit(__RTDS_depleted, &svc->flags) && + vcpu_runnable(svc->vcpu) ) runq_tickle(ops, svc); list_del(&svc->replq_elem);
The bug is introduced in Xen 4.7 when we converted RTDS scheduler from quantum-driven model to event-driven model. We assumed rt_schedule() is always called for a VCPU before the VCPUs budget replenished handler. This assumption does not hold, when system is overloaded, or when the VCPU budget is almost equal its period. Buggy behavior: 1) A VCPU may get less budget that assigned in a period. 2) A full capacity VCPU, i.e., a VCPU whose period is equal to budget, may not get any budget in some period. Bug analysis: 1) A VCPU deadline can be fast-forwarded by more than one period. However, the VCPU last_start time was not updated immediately. If rt_schedule() is called after rt_update_deadline(), which happens when VCPU budget is equal to period or when VCPU has deadline miss, burn_budget() will burn the budget that was just replenished, although the replenished budget should be used in the most recent period only. We should update VCPU last_start time to the start of the current period when rt_update_deadline() updates a VCPU period. 2) When a full capacity VCPU depletes its budget and is context switching out, but has not updated the cores current running VCPU, the budget replenish timer may be triggerred. The replenish handler failed to re-schedule the full capacity VCPU because it thought the VCPU is running. When a VCPU budget is replenished, we try to tickle a CPU. When we find a core for a VCPU to tickle and the VCPU is context switching out, we will always tickle the core where the VCPU was running, if the VCPU cannot find another core to tickle This bug was reported by Dagaen Golomb Signed-off-by: Meng Xu --- Cc: Dagaen Golomb <dgolomb@seas.upenn.edu> Cc: Dario Faggioli <dario.faggioli@citrix.com> Cc: George Dunlap <George.Dunlap@eu.citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Linh Thi Xuan Phan <linhphan@cis.upenn.edu> Cc: Haoran Li <lihaoran@wustl.edu> Cc: Meng Xu <xumengpanda@gmail.com> --- xen/common/sched_rt.c | 26 ++++++++++++++++++++++++-- 1 file changed, 24 insertions(+), 2 deletions(-)