diff mbox series

[RFC,v1,3/6] sched, credit2: improve scheduler fairness

Message ID 20200612002205.174295-4-volodymyr_babchuk@epam.com (mailing list archive)
State New, archived
Headers show
Series Fair scheduling | expand

Commit Message

Volodymyr Babchuk June 12, 2020, 12:22 a.m. UTC
Now we can make corrections for scheduling unit run time, based on
data gathered in previous patches. This includes time spent in IRQ
handlers and time spent for hypervisor housekeeping tasks. Those time
spans needs to be deduced from a total run time.

This patch adds sched_get_time_correction() function which returns
time correction value. This value should be subtracted by a scheduler
implementation from a total vCPU/shced_unit run time.

TODO: Make the corresponding changes to all other schedulers.

Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
---
 xen/common/sched/core.c    | 23 +++++++++++++++++++++++
 xen/common/sched/credit2.c |  2 +-
 xen/common/sched/private.h | 10 ++++++++++
 3 files changed, 34 insertions(+), 1 deletion(-)

Comments

Jürgen Groß June 12, 2020, 4:51 a.m. UTC | #1
On 12.06.20 02:22, Volodymyr Babchuk wrote:
> Now we can make corrections for scheduling unit run time, based on
> data gathered in previous patches. This includes time spent in IRQ
> handlers and time spent for hypervisor housekeeping tasks. Those time
> spans needs to be deduced from a total run time.
> 
> This patch adds sched_get_time_correction() function which returns
> time correction value. This value should be subtracted by a scheduler
> implementation from a total vCPU/shced_unit run time.
> 
> TODO: Make the corresponding changes to all other schedulers.
> 
> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
> ---
>   xen/common/sched/core.c    | 23 +++++++++++++++++++++++
>   xen/common/sched/credit2.c |  2 +-
>   xen/common/sched/private.h | 10 ++++++++++
>   3 files changed, 34 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
> index d597811fef..a7294ff5c3 100644
> --- a/xen/common/sched/core.c
> +++ b/xen/common/sched/core.c
> @@ -974,6 +974,29 @@ void vcpu_end_hyp_task(struct vcpu *v)
>   #ifndef NDEBUG
>       v->in_hyp_task = false;
>   #endif
> +
> +s_time_t sched_get_time_correction(struct sched_unit *u)
> +{
> +    unsigned long flags;
> +    int irq, hyp;

Using "irq" for a time value is misleading IMO.

> +
> +    while ( true )
> +    {
> +        irq = atomic_read(&u->irq_time);
> +        if ( likely( irq == atomic_cmpxchg(&u->irq_time, irq, 0)) )
> +            break;
> +    }

Just use atomic_xchg()?

> +
> +    while ( true )
> +    {
> +        hyp = atomic_read(&u->hyp_time);
> +        if ( likely( hyp == atomic_cmpxchg(&u->hyp_time, hyp, 0)) )
> +            break;
> +    }
> +
> +    return irq + hyp;

Ah, I didn't look into this patch until now.

You can replace my comments about overflow of an int for patches 1 and 2
with:

   Please modify the comment about not overflowing hinting to the value
   being reset when making scheduling decisions.

And this (of course) needs to be handled in all other schedulers, too.


Juergen
Volodymyr Babchuk June 12, 2020, 11:38 a.m. UTC | #2
On Fri, 2020-06-12 at 06:51 +0200, Jürgen Groß wrote:
> On 12.06.20 02:22, Volodymyr Babchuk wrote:
> > Now we can make corrections for scheduling unit run time, based on
> > data gathered in previous patches. This includes time spent in IRQ
> > handlers and time spent for hypervisor housekeeping tasks. Those time
> > spans needs to be deduced from a total run time.
> > 
> > This patch adds sched_get_time_correction() function which returns
> > time correction value. This value should be subtracted by a scheduler
> > implementation from a total vCPU/shced_unit run time.
> > 
> > TODO: Make the corresponding changes to all other schedulers.
> > 
> > Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
> > ---
> >   xen/common/sched/core.c    | 23 +++++++++++++++++++++++
> >   xen/common/sched/credit2.c |  2 +-
> >   xen/common/sched/private.h | 10 ++++++++++
> >   3 files changed, 34 insertions(+), 1 deletion(-)
> > 
> > diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
> > index d597811fef..a7294ff5c3 100644
> > --- a/xen/common/sched/core.c
> > +++ b/xen/common/sched/core.c
> > @@ -974,6 +974,29 @@ void vcpu_end_hyp_task(struct vcpu *v)
> >   #ifndef NDEBUG
> >       v->in_hyp_task = false;
> >   #endif
> > +
> > +s_time_t sched_get_time_correction(struct sched_unit *u)
> > +{
> > +    unsigned long flags;
> > +    int irq, hyp;
> 
> Using "irq" for a time value is misleading IMO.

Yes, you are right. I'll rename this variables to irq_time and
hyp_time. 

> > +
> > +    while ( true )
> > +    {
> > +        irq = atomic_read(&u->irq_time);
> > +        if ( likely( irq == atomic_cmpxchg(&u->irq_time, irq, 0)) )
> > +            break;
> > +    }
> 
> Just use atomic_xchg()?

Thanks. I somehow missed this macro.

> > +
> > +    while ( true )
> > +    {
> > +        hyp = atomic_read(&u->hyp_time);
> > +        if ( likely( hyp == atomic_cmpxchg(&u->hyp_time, hyp, 0)) )
> > +            break;
> > +    }
> > +
> > +    return irq + hyp;
> 
> Ah, I didn't look into this patch until now.
> 
> You can replace my comments about overflow of an int for patches 1 and 2
> with:
> 
>    Please modify the comment about not overflowing hinting to the value
>    being reset when making scheduling decisions.

Will do.

> And this (of course) needs to be handled in all other schedulers, too.
> 

Yes, the plan is to call this function in all schedulers. I skipped
this in RFC, because I wanted to discuss the general approch. I'll add
support for all other schedulers in the next version.
diff mbox series

Patch

diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
index d597811fef..a7294ff5c3 100644
--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -974,6 +974,29 @@  void vcpu_end_hyp_task(struct vcpu *v)
 #ifndef NDEBUG
     v->in_hyp_task = false;
 #endif
+
+s_time_t sched_get_time_correction(struct sched_unit *u)
+{
+    unsigned long flags;
+    int irq, hyp;
+
+    while ( true )
+    {
+        irq = atomic_read(&u->irq_time);
+        if ( likely( irq == atomic_cmpxchg(&u->irq_time, irq, 0)) )
+            break;
+    }
+
+    while ( true )
+    {
+        hyp = atomic_read(&u->hyp_time);
+        if ( likely( hyp == atomic_cmpxchg(&u->hyp_time, hyp, 0)) )
+            break;
+    }
+
+    return irq + hyp;
+}
+
 }
 
 /*
diff --git a/xen/common/sched/credit2.c b/xen/common/sched/credit2.c
index 34f05c3e2a..7a0aca078b 100644
--- a/xen/common/sched/credit2.c
+++ b/xen/common/sched/credit2.c
@@ -1722,7 +1722,7 @@  void burn_credits(struct csched2_runqueue_data *rqd,
         return;
     }
 
-    delta = now - svc->start_time;
+    delta = now - svc->start_time - sched_get_time_correction(svc->unit);
 
     if ( unlikely(delta <= 0) )
     {
diff --git a/xen/common/sched/private.h b/xen/common/sched/private.h
index b9a5b4c01c..3f4859ce23 100644
--- a/xen/common/sched/private.h
+++ b/xen/common/sched/private.h
@@ -604,4 +604,14 @@  void cpupool_put(struct cpupool *pool);
 int cpupool_add_domain(struct domain *d, int poolid);
 void cpupool_rm_domain(struct domain *d);
 
+/*
+ * Get amount of time spent doing non-guest related work on
+ * current scheduling unit. This includes time spent in soft IRQs
+ * and in hardware interrupt handlers.
+ *
+ * Call to this function resets the counters, so it is supposed to
+ * be called when scheduler calculates time used by the scheduling
+ * unit.
+ */
+s_time_t sched_get_time_correction(struct sched_unit *u);
 #endif /* __XEN_SCHED_IF_H__ */