
[RFC,v1,1/6] sched: track time spent in IRQ handler

Message ID 20200612002205.174295-2-volodymyr_babchuk@epam.com (mailing list archive)
State New, archived
Series Fair scheduling

Commit Message

Volodymyr Babchuk June 12, 2020, 12:22 a.m. UTC
Add code that saves the time spent in IRQ handlers, so that later we
can adjust schedule unit run time accordingly.

This and the following changes are intended to provide fair
scheduling. The problem is that any running vCPU can be interrupted
to handle an IRQ which is bound to some other vCPU. Thus, the current
vCPU can be charged for time it did not actually use.

TODO: move vcpu_{begin|end}_irq_handler() calls to entry.S for even
fairer time tracking.

Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
---
 xen/arch/arm/irq.c      |  2 ++
 xen/arch/x86/irq.c      |  2 ++
 xen/common/sched/core.c | 29 +++++++++++++++++++++++++++++
 xen/include/xen/sched.h | 13 +++++++++++++
 4 files changed, 46 insertions(+)

Comments

Jürgen Groß June 12, 2020, 4:36 a.m. UTC | #1
On 12.06.20 02:22, Volodymyr Babchuk wrote:
> Add code that saves the time spent in IRQ handlers, so that later we
> can adjust schedule unit run time accordingly.
> 
> This and the following changes are intended to provide fair
> scheduling. The problem is that any running vCPU can be interrupted
> to handle an IRQ which is bound to some other vCPU. Thus, the current
> vCPU can be charged for time it did not actually use.
> 
> TODO: move vcpu_{begin|end}_irq_handler() calls to entry.S for even
> fairer time tracking.
> 
> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
> ---
>   xen/arch/arm/irq.c      |  2 ++
>   xen/arch/x86/irq.c      |  2 ++
>   xen/common/sched/core.c | 29 +++++++++++++++++++++++++++++
>   xen/include/xen/sched.h | 13 +++++++++++++
>   4 files changed, 46 insertions(+)
> 
> diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
> index 3877657a52..51b517c0cd 100644
> --- a/xen/arch/arm/irq.c
> +++ b/xen/arch/arm/irq.c
> @@ -201,6 +201,7 @@ void do_IRQ(struct cpu_user_regs *regs, unsigned int irq, int is_fiq)
>       struct irq_desc *desc = irq_to_desc(irq);
>       struct irqaction *action;
>   
> +    vcpu_begin_irq_handler();
>       perfc_incr(irqs);
>   
>       ASSERT(irq >= 16); /* SGIs do not come down this path */
> @@ -267,6 +268,7 @@ out:
>   out_no_end:
>       spin_unlock(&desc->lock);
>       irq_exit();
> +    vcpu_end_irq_handler();
>   }
>   
>   void release_irq(unsigned int irq, const void *dev_id)
> diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
> index a69937c840..3ef4221b64 100644
> --- a/xen/arch/x86/irq.c
> +++ b/xen/arch/x86/irq.c
> @@ -1895,6 +1895,7 @@ void do_IRQ(struct cpu_user_regs *regs)
>       int               irq = this_cpu(vector_irq)[vector];
>       struct cpu_user_regs *old_regs = set_irq_regs(regs);
>   
> +    vcpu_begin_irq_handler();
>       perfc_incr(irqs);
>       this_cpu(irq_count)++;
>       irq_enter();
> @@ -2024,6 +2025,7 @@ void do_IRQ(struct cpu_user_regs *regs)
>    out_no_unlock:
>       irq_exit();
>       set_irq_regs(old_regs);
> +    vcpu_end_irq_handler();
>   }
>   
>   static inline bool is_free_pirq(const struct domain *d,
> diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
> index cb49a8bc02..8f642ada05 100644
> --- a/xen/common/sched/core.c
> +++ b/xen/common/sched/core.c
> @@ -916,6 +916,35 @@ void vcpu_unblock(struct vcpu *v)
>       vcpu_wake(v);
>   }
>   
> +void vcpu_begin_irq_handler(void)
> +{
> +    if (is_idle_vcpu(current))
> +        return;
> +
> +    /* XXX: Looks like ASSERT_INTERRUPTS_DISABLED() is available only for x86 */
> +    if ( current->irq_nesting++ )
> +        return;
> +
> +    current->irq_entry_time = NOW();
> +}
> +
> +void vcpu_end_irq_handler(void)
> +{
> +    int delta;
> +
> +    if (is_idle_vcpu(current))
> +        return;
> +
> +    ASSERT(current->irq_nesting);
> +
> +    if ( --current->irq_nesting )
> +        return;
> +
> +    /* We assume that irq handling time will not overflow int */

This assumption might not hold for long running VMs.


Juergen
Volodymyr Babchuk June 12, 2020, 11:26 a.m. UTC | #2
Hi Jurgen,

thanks for the review

On Fri, 2020-06-12 at 06:36 +0200, Jürgen Groß wrote:

> On 12.06.20 02:22, Volodymyr Babchuk wrote:

[...]

> > +void vcpu_end_irq_handler(void)
> > +{
> > +    int delta;
> > +
> > +    if (is_idle_vcpu(current))
> > +        return;
> > +
> > +    ASSERT(current->irq_nesting);
> > +
> > +    if ( --current->irq_nesting )
> > +        return;
> > +
> > +    /* We assume that irq handling time will not overflow int */
> 
> This assumption might not hold for long running VMs.

Basically, this value holds the time span between calls to schedule().
The variable gets zeroed out every time the scheduler requests the time
adjustment value, so it should not depend on the total VM run time.
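
To illustrate (a sketch only - the actual adjustment code comes in a
later patch of this series, and the helper name below is made up):

    /*
     * Hypothetical consumer side: atomically take and reset the
     * accumulated IRQ time when the scheduler charges a unit.
     * atomic_read()/atomic_cmpxchg() are the existing Xen primitives;
     * the cmpxchg loop avoids losing an update from a racing IRQ.
     */
    static int sched_unit_take_irq_time(struct sched_unit *unit)
    {
        int old;

        do {
            old = atomic_read(&unit->irq_time);
        } while ( atomic_cmpxchg(&unit->irq_time, old, 0) != old );

        return old;
    }
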
Julien Grall June 12, 2020, 11:29 a.m. UTC | #3
On 12/06/2020 12:26, Volodymyr Babchuk wrote:
> Hi Jurgen,
> 
> thanks for the review
> 
> On Fri, 2020-06-12 at 06:36 +0200, Jürgen Groß wrote:
> 
>> On 12.06.20 02:22, Volodymyr Babchuk wrote:
> 
> [...]
> 
>>> +void vcpu_end_irq_handler(void)
>>> +{
>>> +    int delta;
>>> +
>>> +    if (is_idle_vcpu(current))
>>> +        return;
>>> +
>>> +    ASSERT(current->irq_nesting);
>>> +
>>> +    if ( --current->irq_nesting )
>>> +        return;
>>> +
>>> +    /* We assume that irq handling time will not overflow int */
>>
>> This assumption might not hold for long running VMs.
> 
> Basically, this value holds the time span between calls to schedule().
> The variable gets zeroed out every time the scheduler requests the time
> adjustment value, so it should not depend on the total VM run time.
This is assuming that the scheduler will be called. With the NULL 
scheduler in place, there is a fair chance this may never be called.

So I think using a 64-bit value is likely safer.

Cheers,
Volodymyr Babchuk June 12, 2020, 11:33 a.m. UTC | #4
On Fri, 2020-06-12 at 12:29 +0100, Julien Grall wrote:
> 
> On 12/06/2020 12:26, Volodymyr Babchuk wrote:
> > Hi Jurgen,
> > 
> > thanks for the review
> > 
> > On Fri, 2020-06-12 at 06:36 +0200, Jürgen Groß wrote:
> > 
> > > On 12.06.20 02:22, Volodymyr Babchuk wrote:
> > 
> > [...]
> > 
> > > > +void vcpu_end_irq_handler(void)
> > > > +{
> > > > +    int delta;
> > > > +
> > > > +    if (is_idle_vcpu(current))
> > > > +        return;
> > > > +
> > > > +    ASSERT(current->irq_nesting);
> > > > +
> > > > +    if ( --current->irq_nesting )
> > > > +        return;
> > > > +
> > > > +    /* We assume that irq handling time will not overflow int */
> > > 
> > > This assumption might not hold for long running VMs.
> > 
> > Basically, this value holds the time span between calls to schedule().
> > The variable gets zeroed out every time the scheduler requests the time
> > adjustment value, so it should not depend on the total VM run time.
> This is assuming that the scheduler will be called. With the NULL 
> scheduler in place, there is a fair chance this may never be called.
> 
> So I think using a 64-bit value is likely safer.

Well, I wanted to use a 64-bit value in the first place. But I got the
impression that atomic_t supports only 32-bit values. At least, this is
what I'm seeing in atomic.h
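
For reference, the 32-bit definition I mean is (roughly, the same in
asm-x86/atomic.h and the Arm equivalent):

    typedef struct { int counter; } atomic_t;

so a 64-bit counter would need a new type.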

Am I wrong?

> Cheers,
>
Julien Grall June 12, 2020, 12:21 p.m. UTC | #5
On 12/06/2020 12:33, Volodymyr Babchuk wrote:
> 
> On Fri, 2020-06-12 at 12:29 +0100, Julien Grall wrote:
>>
>> On 12/06/2020 12:26, Volodymyr Babchuk wrote:
>>> Hi Jurgen,
>>>
>>> thanks for the review
>>>
>>> On Fri, 2020-06-12 at 06:36 +0200, Jürgen Groß wrote:
>>>
>>>> On 12.06.20 02:22, Volodymyr Babchuk wrote:
>>>
>>> [...]
>>>
>>>>> +void vcpu_end_irq_handler(void)
>>>>> +{
>>>>> +    int delta;
>>>>> +
>>>>> +    if (is_idle_vcpu(current))
>>>>> +        return;
>>>>> +
>>>>> +    ASSERT(current->irq_nesting);
>>>>> +
>>>>> +    if ( --current->irq_nesting )
>>>>> +        return;
>>>>> +
>>>>> +    /* We assume that irq handling time will not overflow int */
>>>>
>>>> This assumption might not hold for long running VMs.
>>>
>>> Basically, this value holds the time span between calls to schedule().
>>> The variable gets zeroed out every time the scheduler requests the time
>>> adjustment value, so it should not depend on the total VM run time.
>> This is assuming that the scheduler will be called. With the NULL
>> scheduler in place, there is a fair chance this may never be called.
>>
>> So I think using a 64-bit value is likely safer.
> 
> Well, I wanted to use a 64-bit value in the first place. But I got the
> impression that atomic_t supports only 32-bit values. At least, this is
> what I'm seeing in atomic.h
> 
> Am I wrong?

There is no atomic64_t support in Xen yet. It shouldn't be very 
difficult to add support for it if you require it.
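
A minimal sketch could look like the below (untested, modelled on the
existing 32-bit atomic_t; all of the names are made up, since there is
no such header yet):

    typedef struct { int64_t counter; } atomic64_t;

    static inline int64_t atomic64_read(const atomic64_t *v)
    {
        /* Aligned 64-bit loads are atomic on 64-bit architectures. */
        return ACCESS_ONCE(v->counter);
    }

    static inline void atomic64_set(atomic64_t *v, int64_t i)
    {
        ACCESS_ONCE(v->counter) = i;
    }

    static inline void atomic64_add(int64_t i, atomic64_t *v)
    {
        /* x86-64 flavour; Arm64 would use LL/SC or LSE atomics. */
        asm volatile ( "lock addq %1,%0"
                       : "+m" (v->counter)
                       : "er" (i) );
    }
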

Cheers,
Dario Faggioli June 12, 2020, 8:08 p.m. UTC | #6
On Fri, 2020-06-12 at 13:21 +0100, Julien Grall wrote:
> On 12/06/2020 12:33, Volodymyr Babchuk wrote:
> > On Fri, 2020-06-12 at 12:29 +0100, Julien Grall wrote:
> > > > Basically, this value holds the time span between calls to
> > > > schedule(). The variable gets zeroed out every time the scheduler
> > > > requests the time adjustment value, so it should not depend on the
> > > > total VM run time.
> > > This is assuming that the scheduler will be called. With the NULL
> > > scheduler in place, there is a fair chance this may never be
> > > called.
> > > 
>
Yeah, this is a good point. I mean, I wouldn't be sure about "never",
as even there we'd probably have softirqs, tasklets, etc... And I
still have to look at these patches in more detail to figure out
properly whether they'd help with this.

But I'd say that, in general, we should depend on the frequency of
scheduling events as little as possible. Therefore, using 64 bits from
the start would be preferable IMO.

> > > So I think using a 64-bit value is likely safer.
> > 
Yep.

> > Well, I wanted to use a 64-bit value in the first place. But I got
> > the impression that atomic_t supports only 32-bit values. At least,
> > this is what I'm seeing in atomic.h
> > 
> > Am I wrong?
> 
> There is no atomic64_t support in Xen yet. It shouldn't be very 
> difficult to add support for it if you require it.
> 
Cool! That would be much appreciated. :-D

Regards
Volodymyr Babchuk June 12, 2020, 10:25 p.m. UTC | #7
Hi Dario,

On Fri, 2020-06-12 at 22:08 +0200, Dario Faggioli wrote:
> On Fri, 2020-06-12 at 13:21 +0100, Julien Grall wrote:
> > On 12/06/2020 12:33, Volodymyr Babchuk wrote:
> > > On Fri, 2020-06-12 at 12:29 +0100, Julien Grall wrote:
> > > > > Basically, this value holds the time span between calls to
> > > > > schedule(). The variable gets zeroed out every time the scheduler
> > > > > requests the time adjustment value, so it should not depend on the
> > > > > total VM run time.
> > > > This is assuming that the scheduler will be called. With the NULL
> > > > scheduler in place, there is a fair chance this may never be
> > > > called.
> > > > 
> Yeah, this is a good point. I mean, I wouldn't be sure about "never",
> as even there we'd probably have softirqs, tasklets, etc... And I
> still have to look at these patches in more detail to figure out
> properly whether they'd help with this.

Well, I think it is possible to reset the counters when switching to a
different scheduler, just for cases like that.

> But I'd say that, in general, we should depend on the frequency of
> scheduling events as little as possible. Therefore, using 64 bits from
> the start would be preferable IMO.

I should have done that calculation earlier... So, it appears that a
32-bit counter can count up to a mere 4 seconds. That should be enough
for the normal flow, but I agree with you - 64 bits looks much safer.
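
(Assuming the counter holds nanoseconds, as NOW() returns:

    2^31 ns = 2147483648 ns ~= 2.1 s     - signed 32-bit, as in this patch
    2^32 ns = 4294967296 ns ~= 4.3 s     - unsigned 32-bit
    2^63 ns                 ~= 292 years - signed 64-bit

so the 4 seconds above is the unsigned bound; with a signed int it is
roughly half that.)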

> 
> > > > So I think using a 64-bit value is likely safer.
> Yep.
> 
> > > Well, I wanted to use a 64-bit value in the first place. But I got
> > > the impression that atomic_t supports only 32-bit values. At least,
> > > this is what I'm seeing in atomic.h
> > > 
> > > Am I wrong?
> > 
> > There is no atomic64_t support in Xen yet. It shouldn't be very 
> > difficult to add support for it if you require it.
> > 
> Cool! That would be much appreciated. :-D
> 

Certainly! :)

I believe there will be other users for atomic64_t as well.
Julien Grall June 12, 2020, 10:54 p.m. UTC | #8
On Fri, 12 Jun 2020 at 21:08, Dario Faggioli <dfaggioli@suse.com> wrote:
>
> On Fri, 2020-06-12 at 13:21 +0100, Julien Grall wrote:
> > On 12/06/2020 12:33, Volodymyr Babchuk wrote:
> > > On Fri, 2020-06-12 at 12:29 +0100, Julien Grall wrote:
> > > > > Basically, this value holds the time span between calls to
> > > > > schedule(). The variable gets zeroed out every time the scheduler
> > > > > requests the time adjustment value, so it should not depend on the
> > > > > total VM run time.
> > > > This is assuming that the scheduler will be called. With the NULL
> > > > scheduler in place, there is a fair chance this may never be
> > > > called.
> > > >
> >
> Yeah, this is a good point. I mean, I wouldn't be sure about "never",
> as even there we'd probably have softirqs, tasklets, etc... And I
> still have to look at these patches in more detail to figure out
> properly whether they'd help with this.

Unlike x86, Xen on Arm doesn't prod another pCPU consistently. :) This
was already discussed in multiple threads in the past (see [1], which is
not resolved yet).

So yes, I am pretty confident I can recreate a case where the
scheduling function may never be called on Arm :).

Cheers,

[1] 315740e1-3591-0e11-923a-718e06c36445@arm.com
Jan Beulich June 16, 2020, 10:06 a.m. UTC | #9
On 12.06.2020 02:22, Volodymyr Babchuk wrote:
> --- a/xen/arch/x86/irq.c
> +++ b/xen/arch/x86/irq.c
> @@ -1895,6 +1895,7 @@ void do_IRQ(struct cpu_user_regs *regs)
>      int               irq = this_cpu(vector_irq)[vector];
>      struct cpu_user_regs *old_regs = set_irq_regs(regs);
>  
> +    vcpu_begin_irq_handler();
>      perfc_incr(irqs);
>      this_cpu(irq_count)++;
>      irq_enter();
> @@ -2024,6 +2025,7 @@ void do_IRQ(struct cpu_user_regs *regs)
>   out_no_unlock:
>      irq_exit();
>      set_irq_regs(old_regs);
> +    vcpu_end_irq_handler();
>  }

This looks like a fight for who's going to be first/last here. I
think you want to add your calls after irq_enter() and before
irq_exit().

Jan

Patch

diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
index 3877657a52..51b517c0cd 100644
--- a/xen/arch/arm/irq.c
+++ b/xen/arch/arm/irq.c
@@ -201,6 +201,7 @@  void do_IRQ(struct cpu_user_regs *regs, unsigned int irq, int is_fiq)
     struct irq_desc *desc = irq_to_desc(irq);
     struct irqaction *action;
 
+    vcpu_begin_irq_handler();
     perfc_incr(irqs);
 
     ASSERT(irq >= 16); /* SGIs do not come down this path */
@@ -267,6 +268,7 @@  out:
 out_no_end:
     spin_unlock(&desc->lock);
     irq_exit();
+    vcpu_end_irq_handler();
 }
 
 void release_irq(unsigned int irq, const void *dev_id)
diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index a69937c840..3ef4221b64 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -1895,6 +1895,7 @@  void do_IRQ(struct cpu_user_regs *regs)
     int               irq = this_cpu(vector_irq)[vector];
     struct cpu_user_regs *old_regs = set_irq_regs(regs);
 
+    vcpu_begin_irq_handler();
     perfc_incr(irqs);
     this_cpu(irq_count)++;
     irq_enter();
@@ -2024,6 +2025,7 @@  void do_IRQ(struct cpu_user_regs *regs)
  out_no_unlock:
     irq_exit();
     set_irq_regs(old_regs);
+    vcpu_end_irq_handler();
 }
 
 static inline bool is_free_pirq(const struct domain *d,
diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
index cb49a8bc02..8f642ada05 100644
--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -916,6 +916,35 @@  void vcpu_unblock(struct vcpu *v)
     vcpu_wake(v);
 }
 
+void vcpu_begin_irq_handler(void)
+{
+    if (is_idle_vcpu(current))
+        return;
+
+    /* XXX: Looks like ASSERT_INTERRUPTS_DISABLED() is available only for x86 */
+    if ( current->irq_nesting++ )
+        return;
+
+    current->irq_entry_time = NOW();
+}
+
+void vcpu_end_irq_handler(void)
+{
+    int delta;
+
+    if (is_idle_vcpu(current))
+        return;
+
+    ASSERT(current->irq_nesting);
+
+    if ( --current->irq_nesting )
+        return;
+
+    /* We assume that irq handling time will not overflow int */
+    delta = NOW() - current->irq_entry_time;
+    atomic_add(delta, &current->sched_unit->irq_time);
+}
+
 /*
  * Do the actual movement of an unit from old to new CPU. Locks for *both*
  * CPUs needs to have been taken already when calling this!
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index ac53519d7f..ceed53364b 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -237,6 +237,9 @@  struct vcpu
     evtchn_port_t    virq_to_evtchn[NR_VIRQS];
     spinlock_t       virq_lock;
 
+    /* Fair scheduling state */
+    uint64_t         irq_entry_time;
+    unsigned int     irq_nesting;
     /* Tasklet for continue_hypercall_on_cpu(). */
     struct tasklet   continue_hypercall_tasklet;
 
@@ -276,6 +279,9 @@  struct sched_unit {
     /* Vcpu state summary. */
     unsigned int           runstate_cnt[4];
 
+    /* Fair scheduling correction value */
+    atomic_t               irq_time;
+
     /* Bitmask of CPUs on which this VCPU may run. */
     cpumask_var_t          cpu_hard_affinity;
     /* Used to save affinity during temporary pinning. */
@@ -690,6 +696,13 @@  long vcpu_yield(void);
 void vcpu_sleep_nosync(struct vcpu *v);
 void vcpu_sleep_sync(struct vcpu *v);
 
+/*
+ * Report IRQ handling time to the scheduler. As IRQs can be nested,
+ * the next two functions are reentrant.
+ */
+void vcpu_begin_irq_handler(void);
+void vcpu_end_irq_handler(void);
+
 /*
  * Force synchronisation of given VCPU's state. If it is currently descheduled,
  * this call will ensure that all its state is committed to memory and that