[RFC,2/6] schedule: account true system idle time

Message ID	1564137460-25629-3-git-send-email-andrii.anisov@gmail.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <xen-devel-bounces@lists.xenproject.org> From: Andrii Anisov <andrii.anisov@gmail.com> To: xen-devel@lists.xenproject.org Date: Fri, 26 Jul 2019 13:37:35 +0300 Message-Id: <1564137460-25629-3-git-send-email-andrii.anisov@gmail.com> In-Reply-To: <1564137460-25629-1-git-send-email-andrii.anisov@gmail.com> References: <1564137460-25629-1-git-send-email-andrii.anisov@gmail.com> Subject: [Xen-devel] [RFC 2/6] schedule: account true system idle time Precedence: list Cc: Stefano Stabellini <sstabellini@kernel.org>, Andrii Anisov <andrii_anisov@epam.com>, George Dunlap <george.dunlap@eu.citrix.com>, Dario Faggioli <dfaggioli@suse.com>, Julien Grall <julien.grall@arm.com>, Volodymyr Babchuk <Volodymyr_Babchuk@epam.com> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" <xen-devel-bounces@lists.xenproject.org>
Series	XEN scheduling hardening \| expand [RFC,0/6] XEN scheduling hardening [RFC,1/6] xen/arm: Re-enable interrupt later in the trap path [RFC,2/6] schedule: account true system idle time [RFC,3/6] sysctl: extend XEN_SYSCTL_getcpuinfo interface [RFC,4/6] xentop: show CPU load information [RFC,5/6] arm64: сall enter_hypervisor_head only when it is needed [RFC,6/6] schedule: account all the hypervisor time to the idle vcpu

Message ID

1564137460-25629-3-git-send-email-andrii.anisov@gmail.com (mailing list archive)

State

New, archived

Headers

From: Andrii Anisov <andrii.anisov@gmail.com>
To: xen-devel@lists.xenproject.org
Date: Fri, 26 Jul 2019 13:37:35 +0300
Message-Id: <1564137460-25629-3-git-send-email-andrii.anisov@gmail.com>
In-Reply-To: <1564137460-25629-1-git-send-email-andrii.anisov@gmail.com>
References: <1564137460-25629-1-git-send-email-andrii.anisov@gmail.com>
Subject: [Xen-devel] [RFC 2/6] schedule: account true system idle time
Precedence: list
Cc: Stefano Stabellini <sstabellini@kernel.org>,
 Andrii Anisov <andrii_anisov@epam.com>,
 George Dunlap <george.dunlap@eu.citrix.com>,
 Dario Faggioli <dfaggioli@suse.com>, Julien Grall <julien.grall@arm.com>,
 Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
Errors-To: xen-devel-bounces@lists.xenproject.org
Sender: "Xen-devel" <xen-devel-bounces@lists.xenproject.org>

Series

XEN scheduling hardening | expand

Commit Message

Andrii Anisov July 26, 2019, 10:37 a.m. UTC

From: Andrii Anisov <andrii_anisov@epam.com>

Currently the idle time is being accounted as a idle vcpu runtime.
This is not entirely correct, because the entity named idle vcpu is
in fact a hypervisor tasks worker. E.g. some softirqs are processed
by the idle vcpu.
So lets change idle vcpu time accounting and specify system idle time
as a idle vcpu blocked time. For this we should appropriately change
idle vcpu runstates around the real processor idle entry.

Signed-off-by: Andrii Anisov <andrii_anisov@epam.com>
---
 xen/arch/arm/domain.c | 24 ++++++++++++++++++++++++
 xen/common/schedule.c |  4 +++-
 2 files changed, 27 insertions(+), 1 deletion(-)

Comments

Dario Faggioli July 26, 2019, noon UTC | #1

On Fri, 2019-07-26 at 13:37 +0300, Andrii Anisov wrote:
> From: Andrii Anisov <andrii_anisov@epam.com>
> 
> Currently the idle time is being accounted as a idle vcpu runtime.
> This is not entirely correct, because the entity named idle vcpu is
> in fact a hypervisor tasks worker. E.g. some softirqs are processed
> by the idle vcpu.
>
That's all very true, and, as discussed both via mail and in person,
I'm all for it.

About the implementation.

> So lets change idle vcpu time accounting and specify system idle time
> as a idle vcpu blocked time. 
>
This, for one, doesn't really look right to me. You're trying to make
things more clear and more precise... and that's by hiding real idle
time in the idle_vcpu blocked time metric? :-D :-P

Jokes apart, I see how it is rather easy to do something like this, so
I understand it being done like this in an RFC patch, but I don't think
it's correct.

And, on an even more general perspective, the fact that the hypervisor,
when scheduling the idle vcpu, runs softirq, tasklets, etc, it's a
generic concept, not an arch specific one. So, we really should find a
way to implement this in common code, not in arch code.

Maybe, but I'm just thinking out loud, and I need to think more about
this, we can do things the other way round. I.e., we measure the time
that it takes to run softirq and tasklets, and we subtract it from
idle_vcpu runtime?

Regards

Andrii Anisov July 26, 2019, 12:42 p.m. UTC | #2

Hello Dario,

On 26.07.19 15:00, Dario Faggioli wrote:
> On Fri, 2019-07-26 at 13:37 +0300, Andrii Anisov wrote:
>> From: Andrii Anisov <andrii_anisov@epam.com>
>>
>> Currently the idle time is being accounted as a idle vcpu runtime.
>> This is not entirely correct, because the entity named idle vcpu is
>> in fact a hypervisor tasks worker. E.g. some softirqs are processed
>> by the idle vcpu.
>>
> That's all very true, and, as discussed both via mail and in person,
> I'm all for it.

Thank you for you interest. And I hope to have some productive discussion here. :)

> and that's by hiding real idle
> time in the idle_vcpu blocked time metric? :-D :-P

Yes, I do. You should be noticed I told about idle_vcpu renaming in the cover letter.
So if you treat current idle_vcpu as a hypervisor_vcpu, you will see that getting it blocked on wait for event strictly match the idle concept.

> Jokes apart,

So let it be :)

> I see how it is rather easy to do something like this, so
> I understand it being done like this in an RFC patch, but I don't think
> it's correct.

This is the VERY RFC with the minimal changes to the existing code and adopting existing approaches.
This topic is really complex and requires wide discussion, so this series is rather an invitation to the discussion.

> And, on an even more general perspective, the fact that the hypervisor,
> when scheduling the idle vcpu, runs softirq, tasklets, etc, it's a
> generic concept, not an arch specific one. So, we really should find a
> way to implement this in common code, not in arch code.

Yes, in terms of this patch, idle_vcpu_runstate_change() better be moved to common/schedule.c.

> Maybe, but I'm just thinking out loud, and I need to think more about
> this, we can do things the other way round. I.e., we measure the time
> that it takes to run softirq and tasklets, and we subtract it from
> idle_vcpu runtime?

In the patch "schedule: account all the hypervisor time to the idle vcpu" I extend what I think should be accounted for the hypervisor run time. And subtraction approach will result in more complex code over there.

Dario Faggioli July 29, 2019, 11:40 a.m. UTC | #3

On Fri, 2019-07-26 at 15:42 +0300, Andrii Anisov wrote:
> Hello Dario,
> 
Hi,

> On 26.07.19 15:00, Dario Faggioli wrote:
> > On Fri, 2019-07-26 at 13:37 +0300, Andrii Anisov wrote:
> > I see how it is rather easy to do something like this, so
> > I understand it being done like this in an RFC patch, but I don't
> > think
> > it's correct.
> 
> This is the VERY RFC with the minimal changes to the existing code
> and adopting existing approaches.
>
Sure, I know, and this is fine.

> This topic is really complex and requires wide discussion, so this
> series is rather an invitation to the discussion.
> 
Absolutely.

> > And, on an even more general perspective, the fact that the
> > hypervisor,
> > when scheduling the idle vcpu, runs softirq, tasklets, etc, it's a
> > generic concept, not an arch specific one. So, we really should
> > find a
> > way to implement this in common code, not in arch code.
> 
> Yes, in terms of this patch, idle_vcpu_runstate_change() better be
> moved to common/schedule.c.
> 
I think we should, first of all, think, if using runstates and
runstates manipulation functions is really the best way forward here.

And, if that reveals to be the case, I feel like runstates would need
to be extended to be able to deal with want we want to achieve.

I'll think more about this, and try to form an idea...

> > Maybe, but I'm just thinking out loud, and I need to think more
> > about
> > this, we can do things the other way round. I.e., we measure the
> > time
> > that it takes to run softirq and tasklets, and we subtract it from
> > idle_vcpu runtime?
> 
> In the patch "schedule: account all the hypervisor time to the idle
> vcpu" I extend what I think should be accounted for the hypervisor
> run time. And subtraction approach will result in more complex code
> over there.
> 
Yep, I quickly looked at it, but I need to review it carefully, to
properly understand it, come up with comments, think about
alternatives, etc.

Will do that soon, hopefully.

For now, just consider that, IMO, the big value of this series (or, if
you want, the discussion this series is aimed at starting) is getting
Xen toward having a better, more accurate and more fine grained time
accounting and reporting.

If we time. e.g., interrupts, softirqs and tasklets, we can store such
metrics too, and, if we want, report a breakout of the time spent in
hypervisor... something like (in xentop, as you're doing already, or
somewhere else):

 hyp=12%(irq=4%+softirq=1%+tasklet=5%+other=2%) 

Anyway, thanks again and Regards

Andrii Anisov Aug. 1, 2019, 8:23 a.m. UTC | #4

Hello Dario,

On 29.07.19 14:40, Dario Faggioli wrote:
>> Yes, in terms of this patch, idle_vcpu_runstate_change() better be
>> moved to common/schedule.c.
>>
> I think we should, first of all, think, if using runstates and
> runstates manipulation functions is really the best way forward here.
> 
> And, if that reveals to be the case, I feel like runstates would need
> to be extended to be able to deal with want we want to achieve.
> 
> I'll think more about this, and try to form an idea...

I think runstate does not suite. Its format is linked to the existing interface and is not flexible enough for possible changes.
 From other hand, we would need reordering in runstate time calculation as well.

> If we time. e.g., interrupts, softirqs and tasklets, we can store such
> metrics too, and, if we want, report a breakout of the time spent in
> hypervisor... something like (in xentop, as you're doing already, or
> somewhere else):
> 
>   hyp=12%(irq=4%+softirq=1%+tasklet=5%+other=2%)

One more downside of runstate - it will not hold all that stuff.

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 941bbff..a4e0fd7 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -19,6 +19,7 @@ 
 #include <xen/sched.h>
 #include <xen/softirq.h>
 #include <xen/wait.h>
+#include <xen/sched-if.h>
 
 #include <asm/alternative.h>
 #include <asm/cpuerrata.h>
@@ -42,6 +43,27 @@ 
 
 DEFINE_PER_CPU(struct vcpu *, curr_vcpu);
 
+static inline void idle_vcpu_runstate_change(
+    unsigned int cpu, int new_state, s_time_t new_entry_time)
+{
+    s_time_t delta;
+    struct vcpu *v = idle_vcpu[cpu];
+    spinlock_t *lock = vcpu_schedule_lock(v);
+
+    ASSERT(v == current);
+    ASSERT(v->runstate.state != new_state);
+
+    delta = new_entry_time - v->runstate.state_entry_time;
+    if ( delta > 0 )
+    {
+        v->runstate.time[v->runstate.state] += delta;
+        v->runstate.state_entry_time = new_entry_time;
+    }
+
+    v->runstate.state = new_state;
+    vcpu_schedule_unlock(lock, v);
+}
+
 static void do_idle(void)
 {
     unsigned int cpu = smp_processor_id();
@@ -51,11 +73,13 @@  static void do_idle(void)
     process_pending_softirqs();
 
     local_irq_disable();
+    idle_vcpu_runstate_change(cpu, RUNSTATE_blocked, NOW());
     if ( cpu_is_haltable(cpu) )
     {
         dsb(sy);
         wfi();
     }
+    idle_vcpu_runstate_change(cpu, RUNSTATE_running, NOW());
     local_irq_enable();
 
     sched_tick_resume();
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 349f962..0a38d4a 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -214,7 +214,7 @@  uint64_t get_cpu_idle_time(unsigned int cpu)
     if ( cpu_online(cpu) && v )
         vcpu_runstate_get(v, &state);
 
-    return state.time[RUNSTATE_running];
+    return state.time[RUNSTATE_blocked];
 }
 
 /*
@@ -922,6 +922,8 @@  void vcpu_block(void)
 {
     struct vcpu *v = current;
 
+    ASSERT(!is_idle_vcpu(v));
+
     set_bit(_VPF_blocked, &v->pause_flags);
 
     arch_vcpu_block(v);

[RFC,2/6] schedule: account true system idle time

Commit Message

Comments

Patch