[v2] xen:rtds:Fix bug in budget accounting

Message ID	1477102276-4116-1-git-send-email-mengxu@cis.upenn.edu (mailing list archive)
State	New, archived
Headers	show Return-Path: <xen-devel-bounces@lists.xen.org> From: Meng Xu <mengxu@cis.upenn.edu> To: xen-devel@lists.xenproject.org Date: Fri, 21 Oct 2016 22:11:16 -0400 Message-Id: <1477102276-4116-1-git-send-email-mengxu@cis.upenn.edu> Cc: Wei Liu <wei.liu2@citrix.com>, Dagaen Golomb <dgolomb@cis.upenn.edu>, George Dunlap <George.Dunlap@eu.citrix.com>, Haoran Li <lihaoran@wustl.edu>, Dario Faggioli <dario.faggioli@citrix.com>, Linh Thi Xuan Phan <linhphan@cis.upenn.edu>, Meng Xu <xumengpanda@gmail.com>, Meng Xu <mengxu@cis.upenn.edu>, Tianyang Chen <tiche@cis.upenn.edu> Subject: [Xen-devel] [PATCH v2] xen:rtds:Fix bug in budget accounting Precedence: list MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 Errors-To: xen-devel-bounces@lists.xen.org Sender: "Xen-devel" <xen-devel-bounces@lists.xen.org>

Meng Xu Oct. 22, 2016, 2:11 a.m. UTC

Bug scenario:
While a VCPU is running on a core, it may misses its deadline.
Then repl_timer_handler() will update the VCPU period and deadline.
The VCPU may have high priority with the new deadline and repl_timer_handler()
decides to keep the VCPU running on the core, but with new period and deadline.
However, the budget enforcement timer for the previous period is still armed.
When rt_schedule() is called in the new period to enforce the budget
for the previous period, current burn_budget() will deduct the time spent
in previous period from the budget in current period, which is incorrect.

Fix:
We keeps last_start always within the current period for a VCPU, so that
We only deduct the time spent in the current period from the VCPU budget.

Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
Reported-by: Dagaen Golomb <dgolomb@cis.upenn.edu>
---
Cc: Dario Faggioli <dario.faggioli@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Linh Thi Xuan Phan <linhphan@cis.upenn.edu>
Cc: Haoran Li <lihaoran@wustl.edu>
Cc: Meng Xu <xumengpanda@gmail.com>
Cc: Dagaen Golomb <dgolomb@cis.upenn.edu>
Cc: Tianyang Chen <tiche@cis.upenn.edu>
---
Changes from v1:
* Change commit message to make the bug scenario easire to understand;
* The two bug scenarios described in v1 can be actually fixed by this patch;
  so we do not need to change the runq_tickle
---
 xen/common/sched_rt.c | 8 ++++++++
 1 file changed, 8 insertions(+)

Meng Xu Oct. 22, 2016, 2:18 a.m. UTC | #1

On Fri, Oct 21, 2016 at 10:11 PM, Meng Xu <mengxu@cis.upenn.edu> wrote:
> Bug scenario:
> While a VCPU is running on a core, it may misses its deadline.
> Then repl_timer_handler() will update the VCPU period and deadline.
> The VCPU may have high priority with the new deadline and repl_timer_handler()
> decides to keep the VCPU running on the core, but with new period and deadline.
> However, the budget enforcement timer for the previous period is still armed.
> When rt_schedule() is called in the new period to enforce the budget
> for the previous period, current burn_budget() will deduct the time spent
> in previous period from the budget in current period, which is incorrect.

In order to quickly show the impact of the bug, we have a quick test
case to demonstrate it.
The test case is at https://github.com/PennPanda/dagaen
It is a simple Linux module, which keeps invoking
HYPERVISOR_xen_version for thousands of times and measure the latency.

The bug will cause the maximum latency equal to the VCPU's period.
With the patch, the maximum latency is within tens of microseconds.
(On my computer with 2.1GHz, the maximum latency is 20us.)

Thanks,

Meng
-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/

Dario Faggioli Oct. 26, 2016, 2:43 p.m. UTC | #2

On Fri, 2016-10-21 at 22:11 -0400, Meng Xu wrote:
> Bug scenario:
> While a VCPU is running on a core, it may misses its deadline.
>
May be useful to mention why (at least the most common reasons, like
overhead and/or system being overloaded... are there others?).

> Then repl_timer_handler() will update the VCPU period and deadline.
> The VCPU may have high priority with the new deadline and
> repl_timer_handler()
> decides to keep the VCPU running on the core, but with new period and
> deadline.
>
repl_timer_handler() does not decide what to run. It's probably better
to say something like "If the VCPU is still the highest priority one,
even with the new deadline, it will continue to run"

> However, the budget enforcement timer for the previous period is
> still armed.
> When rt_schedule() is called in the new period to enforce the budget
> for the previous period, current burn_budget() will deduct the time
> spent
> in previous period from the budget in current period, which is
> incorrect.
> 
Ok, so, at time t=10 the replenishment timer triggers. It finds the
vcpu in the following status:

 v->cur_deadline = 10
 v->cur_budget = 3
 v->last_start = 2

I.e., the vcpu still has some budget left, but, e.g., because the
system is overloaded, could not use it in its previous period.

Assuming v->period=10 and v->budget=5, rt_update_deadline(), called
from within repl_timer_handler(), will put the vcpu in the following
state:

 v->curr_deadline = 20
 v->curr_budget = 5
 v->last_start = 2

Then, at t=15 rt_schedule() is invoked, it in turns calls burn_budget()
which does:

 delta = NOW() - v->last_start = 15 - 2 = 13
 v->cur_budget -= 13

Which is too much. Is this what we are talking about?

> Fix:
> We keeps last_start always within the current period for a VCPU, so
> that
> We only deduct the time spent in the current period from the VCPU
> budget.
> 
Well, this is sort of ok, I guess.

In general, I think that cur_deadline, cur_budget and last_start are
tightly related between each others. Especially cur_budget and
last_start, considering that the former is updated in a way that
depends from the value of the latter.

The problem here seems to me to be that, indeed, we don't update
last_start all the time that we update cur_budget. If, for instance,
you look at Credit2, start_time is updated inside burn_credits(), while
in RTDS, last_start is not updated in burn_budget().

So (1), one thing that I would do is to set svc->last_start=now; in
burn_budget().

However, just doing that won't solve the problem, because
repl_timer_handler() does not call burn_budget(). I've wondered a bit
whether that is correct or not... I mean, especially in the situation
we are discussing --when repl_timer_handler() runs "before"
rt_schedule()-- how do we account for the budget spent from the last
invocation of burn_budget() and where in time we are now? Well, I'm not
really sure whether or not this is a problem.

Since we don't allow the budget to go negative, and since we are giving
both new deadline and new budget to the vcpu anyway, it may not be a
big deal, actually. Or maybe it could be an issue, but only if either
the overhead and/or the overload are so big (e.g., if the vcpu is
overrunning and doing that for more than one full period), that the
system is out of the window already.

So, let's not consider this, for the moment... And let's focus on the
fact that, in rt_update_deadline(), we are changing cur_deadline and
cur_budget, so we also need to update last_start (as we said above
they're related!). I guess we either call burn_budget from
rt_update_deadline(), or we open-code svc->last_start=now;

burn_budget() does things that we probably don't need and don't want to
do in rt_update_deadline(). In fact, if we don't allow the budget to go
negative, and ignore the (potential, and maybe non-existent) accounting
issue described above, there's no point in marking the vcpu as depleted
(in burn_budget()) and then (since we're calling it from
rt_update_deadline()) replenishing it right away! :-O

So (2), I guess the best thing to do is "just" to update last_start in
rt_update_deadline() to, which is sort of what the patch is doing.

Yet, there's something I don't understand, i.e.:

> Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
> Reported-by: Dagaen Golomb <dgolomb@cis.upenn.edu>
> 
> --- a/xen/common/sched_rt.c
> +++ b/xen/common/sched_rt.c
> @@ -407,6 +407,14 @@ rt_update_deadline(s_time_t now, struct rt_vcpu
> *svc)
>          svc->cur_deadline += count * svc->period;
>      }
>  
> +    /*
> +     * svc may be scheduled to run immediately after it misses
> deadline
> +     * Then rt_update_deadline is called before rt_schedule, which 
> +     * should only deduct the time spent in current period from the
> budget
> +     */
> +    if ( svc->last_start < (svc->cur_deadline - svc->period) )
> +        svc->last_start = svc->cur_deadline - svc->period;
> +
 - Do we need the if()? Isn't it safe to just always update last_start?
   If it is, as it looks to me to be, I'd prefer it done like that.
 - Why cur_deadline-period, and not NOW() (well, actually, now)?
   Well, in the scenario above, and in all the others I can think of,
   it would be almost the same. And yet, I'd prefer using now, for
   clarity and consistency.

In summary, what I think we need is a patch that does both (1) and (2),
i.e., it makes sure that we always update last_start to NOW() all the
times that we assign to a vcpu its new budget. That would make the code
more consistent and should also fix the buggy behavior you're seeing.

What do you think?

Regards,
Dario

Meng Xu Oct. 26, 2016, 4:08 p.m. UTC | #3

On Wed, Oct 26, 2016 at 10:43 AM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> On Fri, 2016-10-21 at 22:11 -0400, Meng Xu wrote:
>> Bug scenario:
>> While a VCPU is running on a core, it may misses its deadline.
>>
> May be useful to mention why (at least the most common reasons, like
> overhead and/or system being overloaded... are there others?).

How about adding this:
For example, when the sum of the utilizations  of all VCPUs that are
pinned onto a core is larger than 1.

As you know,  the VCPU utilization is not the only factor that
determines the schedulability of gEDF scheduling. It will be hard to
give the exact situation when deadline may happen.

>
>> Then repl_timer_handler() will update the VCPU period and deadline.
>> The VCPU may have high priority with the new deadline and
>> repl_timer_handler()
>> decides to keep the VCPU running on the core, but with new period and
>> deadline.
>>
> repl_timer_handler() does not decide what to run. It's probably better
> to say something like "If the VCPU is still the highest priority one,
> even with the new deadline, it will continue to run"

OK.

>
>> However, the budget enforcement timer for the previous period is
>> still armed.
>> When rt_schedule() is called in the new period to enforce the budget
>> for the previous period, current burn_budget() will deduct the time
>> spent
>> in previous period from the budget in current period, which is
>> incorrect.
>>
> Ok, so, at time t=10 the replenishment timer triggers. It finds the
> vcpu in the following status:
>
>  v->cur_deadline = 10
>  v->cur_budget = 3
>  v->last_start = 2

The idea you got is correct.
However, the values here won't happen. If the last_start is 2 and
cur_budget is 3, the budget enforcement should be no later than t = 5.

How about using the following parameters:
v->cur_deadline = 10
v->cur_budget = 4
v->last_start = 8

rt_schedule() will be invoked at t = 12 for budget enforcement.

>
> I.e., the vcpu still has some budget left, but, e.g., because the
> system is overloaded, could not use it in its previous period.
>
> Assuming v->period=10 and v->budget=5, rt_update_deadline(), called
> from within repl_timer_handler(), will put the vcpu in the following
> state:
>
>  v->curr_deadline = 20
>  v->curr_budget = 5
>  v->last_start = 2
>

With the parameters I suggested above, it will be
 v->curr_deadline = 20
 v->curr_budget = 5
 v->last_start = 8

> Then, at t=15 rt_schedule() is invoked,

Wait, why rt_schedule() is invoked at t=15? It should be invoked at t
= 12 as mentioned above.

> it in turns calls burn_budget()
> which does:
>
>  delta = NOW() - v->last_start = 15 - 2 = 13
>  v->cur_budget -= 13
with the new parameters I suggested,
burn_budget() does:
delta = NOW() - v->last_start = 15 - 8 = 7,
which is too much.

>
> Which is too much. Is this what we are talking about?

The high-level idea you got is correct. It is what we are talking about. :-)
But the parameters used in your example will not happen. :-)

>
>> Fix:
>> We keeps last_start always within the current period for a VCPU, so
>> that
>> We only deduct the time spent in the current period from the VCPU
>> budget.
>>
> Well, this is sort of ok, I guess.
>
> In general, I think that cur_deadline, cur_budget and last_start are
> tightly related between each others. Especially cur_budget and
> last_start, considering that the former is updated in a way that
> depends from the value of the latter.
>
> The problem here seems to me to be that, indeed, we don't update
> last_start all the time that we update cur_budget. If, for instance,
> you look at Credit2, start_time is updated inside burn_credits(), while
> in RTDS, last_start is not updated in burn_budget().
>

The burn_budget() is only invoked by rt_schedule(). So we won't update
last_start until rt_schedule() is called.
However, rt_schedule() can be called after repl_timer_handle() is
called for a VCPU, which makes the last_start no longer inside the
current period.

> So (1), one thing that I would do is to set svc->last_start=now; in
> burn_budget().
>
> However, just doing that won't solve the problem, because
> repl_timer_handler() does not call burn_budget(). I've wondered a bit
> whether that is correct or not... I mean, especially in the situation
> we are discussing --when repl_timer_handler() runs "before"
> rt_schedule()-- how do we account for the budget spent from the last
> invocation of burn_budget() and where in time we are now? Well, I'm not
> really sure whether or not this is a problem.

I see. You want to account for the budget spent in the previous period
when deadline happens.

How about in the repl_timer_handler(), we check if the VCPU is current
running. If yes, we will call burn_budget() for the VCPU and update
the svc->last_start = now.

I don't really like updating the last_start in burn_budget().

last_start is the recent starting time of the VCPU. It is to record
the starting time of a currently running VCPU. So it should be updated
only when the VCPU is scheduled.
When burn_budget() is called, the VCPU can be scheduled or
de-scheduled. When the VCPU is de-scheduled, we don't have to update
the last_start, since it will be updated in the next time when the
VCPU is scheduled.

If we update last_start in burn_budget(), it won't cause incorrect
functionality. It just wastes time on one assignment operation.

>
> Since we don't allow the budget to go negative, and since we are giving
> both new deadline and new budget to the vcpu anyway, it may not be a
> big deal, actually.

OK

> Or maybe it could be an issue, but only if either
> the overhead and/or the overload are so big (e.g., if the vcpu is
> overrunning and doing that for more than one full period), that the
> system is out of the window already.

I'm sorry that I didn't quite get this part "doing that for more than
one full period".

Another thing:
rt_schedule() can be called after repl_timer_handler() for a VCPU
whose budget = period.
When this happens, the VCPU will not get any budget for a period.
Since people usually want low-latency for such VCPU and configure the
period to be large value to avoid scheduling overhead, it will cause a
long delay (equal to period) to the VCPU.

>
> So, let's not consider this, for the moment... And let's focus on the
> fact that, in rt_update_deadline(), we are changing cur_deadline and
> cur_budget, so we also need to update last_start (as we said above
> they're related!). I guess we either call burn_budget from
> rt_update_deadline(), or we open-code svc->last_start=now;

Yep. If we call burn_budget in rt_update_deadline, we will need to
update the last_start in burn_budget.

>
> burn_budget() does things that we probably don't need and don't want to
> do in rt_update_deadline(). In fact, if we don't allow the budget to go
> negative, and ignore the (potential, and maybe non-existent) accounting
> issue described above, there's no point in marking the vcpu as depleted
> (in burn_budget()) and then (since we're calling it from
> rt_update_deadline()) replenishing it right away! :-O
>
> So (2), I guess the best thing to do is "just" to update last_start in
> rt_update_deadline() to, which is sort of what the patch is doing.

Yes. This is the simplest and most neat way I can think of so far.
As you suggested, we can also call burn_budget in rt_update_deadline()
and update last_start in burn_budget.
Both solutions are doing the same thing. However, calling burn_budget
in rt_update_deadline() is more expensive than just updating the
last_start in rt_update_deadline(). :-)

>
> Yet, there's something I don't understand, i.e.:
>
>> Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
>> Reported-by: Dagaen Golomb <dgolomb@cis.upenn.edu>
>>
>> --- a/xen/common/sched_rt.c
>> +++ b/xen/common/sched_rt.c
>> @@ -407,6 +407,14 @@ rt_update_deadline(s_time_t now, struct rt_vcpu
>> *svc)
>>          svc->cur_deadline += count * svc->period;
>>      }
>>
>> +    /*
>> +     * svc may be scheduled to run immediately after it misses
>> deadline
>> +     * Then rt_update_deadline is called before rt_schedule, which
>> +     * should only deduct the time spent in current period from the
>> budget
>> +     */
>> +    if ( svc->last_start < (svc->cur_deadline - svc->period) )
>> +        svc->last_start = svc->cur_deadline - svc->period;
>> +
>  - Do we need the if()? Isn't it safe to just always update last_start?
>    If it is, as it looks to me to be, I'd prefer it done like that.

Yes. I will ditch the if().

>  - Why cur_deadline-period, and not NOW() (well, actually, now)?
>    Well, in the scenario above, and in all the others I can think of,
>    it would be almost the same. And yet, I'd prefer using now, for
>    clarity and consistency.

Hmm, it depends on how we account for the budget.
Suppose the VCPU keeps running.
the start of next period is t = 10 and the next period ends at t = 20.
rt_update_deadline is invoked at t = 10.1.

If last_start = 10, which is cur_deadline - period, we will account
for the 0.1 time unit as the budget consumed by the VCPU.
If last_start = 10.1, which is now, we will not account for the 0.1 time unit.

Since we always account for the time spent in hypervisor into the
budget of currently running VCPU, I chose last_start = cur_deadline -
period.
I'm ok for last_start = now, considering the consistency. So your call. :-)

>
> In summary, what I think we need is a patch that does both (1) and (2),
> i.e., it makes sure that we always update last_start to NOW() all the
> times that we assign to a vcpu its new budget. That would make the code
> more consistent and should also fix the buggy behavior you're seeing.
>
> What do you think?

(1) I agree.
(2) I'm ok with either way.
So I will change the code and test it.
If it solves the problem, which it should, I will send out a path that
has both (1) and (2) today.

Thank you very much!

Meng
-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/

Dario Faggioli Oct. 26, 2016, 5:08 p.m. UTC | #4

On Wed, 2016-10-26 at 12:08 -0400, Meng Xu wrote:
> On Wed, Oct 26, 2016 at 10:43 AM, Dario Faggioli
> <dario.faggioli@citrix.com> wrote:
> > 
> > On Fri, 2016-10-21 at 22:11 -0400, Meng Xu wrote:
> > > 
> > > Bug scenario:
> > > While a VCPU is running on a core, it may misses its deadline.
> > > 
> > May be useful to mention why (at least the most common reasons,
> > like
> > overhead and/or system being overloaded... are there others?).
> 
> How about adding this:
> For example, when the sum of the utilizations  of all VCPUs that are
> pinned onto a core is larger than 1.
> 
Yep, that's what I mean with "overload" (Or "oversubscription", if you
wish, or whatever).

> As you know,  the VCPU utilization is not the only factor that
> determines the schedulability of gEDF scheduling. It will be hard to
> give the exact situation when deadline may happen.
> 
Well, but that's still overload or oversubscription. It just happens to
be that overload occurs below 100% utilization! :-)

Basically, I'm not asking you to write a summary of gEDF schedulability
theory in this changelog... I'm just suggesting to mention that the
cause and the reason why this budget overrun/deadline miss situation
can happen is overload, and not, e.g., a bug in the code, or anything
like that.


> > Which is too much. Is this what we are talking about?
> 
> The high-level idea you got is correct. It is what we are talking
> about. :-)
> But the parameters used in your example will not happen. :-)
> 
Right, sorry. Let's forget the numbers, and discuss how to fix this.

> > So (1), one thing that I would do is to set svc->last_start=now; in
> > burn_budget().
> > 
> > However, just doing that won't solve the problem, because
> > repl_timer_handler() does not call burn_budget(). I've wondered a
> > bit
> > whether that is correct or not... I mean, especially in the
> > situation
> > we are discussing --when repl_timer_handler() runs "before"
> > rt_schedule()-- how do we account for the budget spent from the
> > last
> > invocation of burn_budget() and where in time we are now? Well, I'm
> > not
> > really sure whether or not this is a problem.
> 
> I see. You want to account for the budget spent in the previous
> period
> when deadline happens.
> 
> How about in the repl_timer_handler(), we check if the VCPU is
> current
> running. If yes, we will call burn_budget() for the VCPU and update
> the svc->last_start = now.
> 
No, sorry, why would we need to know if it's currently running? IMO,
there's way more checks like that in sched_rt.c than how I'd like to,
so I'm hesitant to adding more of them.

> I don't really like updating the last_start in burn_budget().
> 
> last_start is the recent starting time of the VCPU. It is to record
> the starting time of a currently running VCPU. So it should be
> updated
> only when the VCPU is scheduled.
>
Yes, the name suggest it is what you write. But how is it used? It's
used to make sure that we burn the proper amount of budget.

And it looks to me that, to serve this purpose, it needs to be updated
to NOW() (or something equivalent to that), when we touch cur_budget,
or we risk inconsistencies like, well, like the one we're discussing
here! :-)

OTOH, is there a reason for which we need to track the time at which a
vcpu started to execute? If yes, I don't see that.

So, I agree that last_start may look a bit misleading, but that's what
it is... let's change its name if you think it's worth. Note, however,
that apart from this case when the budget is actually updated within
the timer handler, because of a deadline miss, the value contained in
that variable would actually _really_ be the time at which it started
executing, even if you update it in burn_budget() and in
rt_update_deadline().

Perhaps, add a comment about this whole thing, in rt_update_deadline().

> When burn_budget() is called, the VCPU can be scheduled or
> de-scheduled. When the VCPU is de-scheduled, we don't have to update
> the last_start, since it will be updated in the next time when the
> VCPU is scheduled.
> 
> If we update last_start in burn_budget(), it won't cause incorrect
> functionality. It just wastes time on one assignment operation.
> 
Well, let's not bother about one assignment, ok? :-P

When you deschedule a vcpu, whatever it is in last_start is inaccurate
and useless anyway.

That being said, until burn_budget() is called only from 1 place, it's
also true that we don't need to bother updating last_start inside it.
But as soon as we'll add a call to burn_budget() somewhere else, if we
forgot to update last_start, we'll have problems.

For now, I don't foresee us adding such calls, but doing things in the
way that leaves the least possibilities to cause bugs is in general a
good thing. :-)

> > Or maybe it could be an issue, but only if either
> > the overhead and/or the overload are so big (e.g., if the vcpu is
> > overrunning and doing that for more than one full period), that the
> > system is out of the window already.
> 
> I'm sorry that I didn't quite get this part "doing that for more than
> one full period".
> 
Yeah, forget about it. The core of the question is that I really think
the budget should be allowed to go negative, and I actually don't
recall why we didn't do it that way.

But, really, let's not deal with this here.

> > So, let's not consider this, for the moment... And let's focus on
> > the
> > fact that, in rt_update_deadline(), we are changing cur_deadline
> > and
> > cur_budget, so we also need to update last_start (as we said above
> > they're related!). I guess we either call burn_budget from
> > rt_update_deadline(), or we open-code svc->last_start=now;
> 
> Yep. If we call burn_budget in rt_update_deadline, we will need to
> update the last_start in burn_budget.
> 
Yeah, but, as I say below, I don't think that's necessary. What I think
is necessary, is that we update cur_budget and last_start together.

> > burn_budget() does things that we probably don't need and don't
> > want to
> > do in rt_update_deadline(). In fact, if we don't allow the budget
> > to go
> > negative, and ignore the (potential, and maybe non-existent)
> > accounting
> > issue described above, there's no point in marking the vcpu as
> > depleted
> > (in burn_budget()) and then (since we're calling it from
> > rt_update_deadline()) replenishing it right away! :-O
> > 
> > So (2), I guess the best thing to do is "just" to update last_start
> > in
> > rt_update_deadline() to, which is sort of what the patch is doing.
> 
> Yes. This is the simplest and most neat way I can think of so far.
> As you suggested, we can also call burn_budget in
> rt_update_deadline()
> and update last_start in burn_budget.
> Both solutions are doing the same thing. However, calling burn_budget
> in rt_update_deadline() is more expensive than just updating the
> last_start in rt_update_deadline(). :-)
> 
It's not only more expensive, it will also require more code
adjustment, I think. In fact, there's the risk that burn budget will
set __RTDS_depleted, and we'd need to make sure that, in the case at
hand, it would get reset (either in rt_update_deadline() or in
repl_timer_handler()).

So, really, I'd say let's not do this. I mentioned it because
_conceptually_ it's what we want, but practically, we'd better achieve
the same in another way.

> > > --- a/xen/common/sched_rt.c
> > > +++ b/xen/common/sched_rt.c
> > > @@ -407,6 +407,14 @@ rt_update_deadline(s_time_t now, struct
> > > rt_vcpu
> > > *svc)
> > >          svc->cur_deadline += count * svc->period;
> > >      }
> > > 
> > > +    /*
> > > +     * svc may be scheduled to run immediately after it misses
> > > deadline
> > > +     * Then rt_update_deadline is called before rt_schedule,
> > > which
> > > +     * should only deduct the time spent in current period from
> > > the
> > > budget
> > > +     */
> > > +    if ( svc->last_start < (svc->cur_deadline - svc->period) )
> > > +        svc->last_start = svc->cur_deadline - svc->period;
> > > +
> >  - Do we need the if()? Isn't it safe to just always update
> > last_start?
> >    If it is, as it looks to me to be, I'd prefer it done like that.
> 
> Yes. I will ditch the if().
> 
Ok.

> >  - Why cur_deadline-period, and not NOW() (well, actually, now)?
> >    Well, in the scenario above, and in all the others I can think
> > of,
> >    it would be almost the same. And yet, I'd prefer using now, for
> >    clarity and consistency.
> 
> Hmm, it depends on how we account for the budget.
> Suppose the VCPU keeps running.
> the start of next period is t = 10 and the next period ends at t =
> 20.
> rt_update_deadline is invoked at t = 10.1.
> 
Yes, I know...

> If last_start = 10, which is cur_deadline - period, we will account
> for the 0.1 time unit as the budget consumed by the VCPU.
> If last_start = 10.1, which is now, we will not account for the 0.1
> time unit.
> 
...but by using 10, you're saying that you are sure that the vcpu has
been running since 10? Because, if you do that, that's what you are
charging the vcpu for.

Note that this is a genuine question that I'm _deliberately_ dumping on
you! :-P :-P

> Since we always account for the time spent in hypervisor into the
> budget of currently running VCPU, I chose last_start = cur_deadline -
> period.
> I'm ok for last_start = now, considering the consistency. So your
> call. :-)
> 
If the answer to the above question is 'yes', I guess I'm fine with
that too.

Actually, you can use cur_deadline, if you put the assignment _before_
updating the deadline (together with a comment about why we use that
instead of NOW()).

> > In summary, what I think we need is a patch that does both (1) and
> > (2),
> > i.e., it makes sure that we always update last_start to NOW() all
> > the
> > times that we assign to a vcpu its new budget. That would make the
> > code
> > more consistent and should also fix the buggy behavior you're
> > seeing.
> > 
> > What do you think?
> 
> (1) I agree.
> (2) I'm ok with either way.
> So I will change the code and test it.
> If it solves the problem, which it should, I will send out a path
> that
> has both (1) and (2) today.
> 
Ok.

Thanks,
Dario

Meng Xu Oct. 26, 2016, 5:57 p.m. UTC | #5

>> I don't really like updating the last_start in burn_budget().
>>
>> last_start is the recent starting time of the VCPU. It is to record
>> the starting time of a currently running VCPU. So it should be
>> updated
>> only when the VCPU is scheduled.
>>
> Yes, the name suggest it is what you write. But how is it used? It's
> used to make sure that we burn the proper amount of budget.
>
> And it looks to me that, to serve this purpose, it needs to be updated
> to NOW() (or something equivalent to that), when we touch cur_budget,
> or we risk inconsistencies like, well, like the one we're discussing
> here! :-)
>
> OTOH, is there a reason for which we need to track the time at which a
> vcpu started to execute? If yes, I don't see that.
>
> So, I agree that last_start may look a bit misleading, but that's what
> it is... let's change its name if you think it's worth. Note, however,
> that apart from this case when the budget is actually updated within
> the timer handler, because of a deadline miss, the value contained in
> that variable would actually _really_ be the time at which it started
> executing, even if you update it in burn_budget() and in
> rt_update_deadline().
>
> Perhaps, add a comment about this whole thing, in rt_update_deadline().
>
>> When burn_budget() is called, the VCPU can be scheduled or
>> de-scheduled. When the VCPU is de-scheduled, we don't have to update
>> the last_start, since it will be updated in the next time when the
>> VCPU is scheduled.
>>
>> If we update last_start in burn_budget(), it won't cause incorrect
>> functionality. It just wastes time on one assignment operation.
>>
> Well, let's not bother about one assignment, ok? :-P
>
> When you deschedule a vcpu, whatever it is in last_start is inaccurate
> and useless anyway.
>
> That being said, until burn_budget() is called only from 1 place, it's
> also true that we don't need to bother updating last_start inside it.
> But as soon as we'll add a call to burn_budget() somewhere else, if we
> forgot to update last_start, we'll have problems.
>
> For now, I don't foresee us adding such calls, but doing things in the
> way that leaves the least possibilities to cause bugs is in general a
> good thing. :-)
>

Sure. Then I will send a separate patch to update last_start when we
update cur_budget. It is just to avoid subtle bug in the future and
won't fix the bug we discussed here.
So I think it may be better to be a separate patch.

>
>> >  - Why cur_deadline-period, and not NOW() (well, actually, now)?
>> >    Well, in the scenario above, and in all the others I can think
>> > of,
>> >    it would be almost the same. And yet, I'd prefer using now, for
>> >    clarity and consistency.
>>
>> Hmm, it depends on how we account for the budget.
>> Suppose the VCPU keeps running.
>> the start of next period is t = 10 and the next period ends at t =
>> 20.
>> rt_update_deadline is invoked at t = 10.1.
>>
> Yes, I know...
>
>> If last_start = 10, which is cur_deadline - period, we will account
>> for the 0.1 time unit as the budget consumed by the VCPU.
>> If last_start = 10.1, which is now, we will not account for the 0.1
>> time unit.
>>
> ...but by using 10, you're saying that you are sure that the vcpu has
> been running since 10? Because, if you do that, that's what you are
> charging the vcpu for.
>
> Note that this is a genuine question that I'm _deliberately_ dumping on
> you! :-P :-P

Ah-ha. The rt_update_deadline is called at the current deadline, which
is also the start of next period. So "now" is actually a value close
to 10. Either last_start = now, or last_start = cur_deadline - period
does not make much difference. now - (cur_deadline - period) is the
delay of the repl_timer_handler(), which should not be much.

 If the vcpu is not running at 10, the last_start = now still has no
real meaning. However, last_start will be updated later when it is
scheduled. So the VCPU's parameters will still be correctly updated.

I'm using last_start = now then to make code more consistent.

>
>> Since we always account for the time spent in hypervisor into the
>> budget of currently running VCPU, I chose last_start = cur_deadline -
>> period.
>> I'm ok for last_start = now, considering the consistency. So your
>> call. :-)
>>
> If the answer to the above question is 'yes', I guess I'm fine with
> that too.
>
> Actually, you can use cur_deadline, if you put the assignment _before_
> updating the deadline (together with a comment about why we use that
> instead of NOW()).

Ah-ha. Yes. That is better than cur_deadline - period. However, I
chose to use last_start = now for the next version. ;-)

Meng

[v2] xen:rtds:Fix bug in budget accounting

Commit Message

Comments

Patch