[v3] sched/cputime: add steal time support to full dynticks CPU time accounting

Message ID 1463574454-3587-1-git-send-email-wanpeng.li@hotmail.com (mailing list archive)
State New, archived
Commit Message

Wanpeng Li May 18, 2016, 12:27 p.m. UTC
From: Wanpeng Li <wanpeng.li@hotmail.com>

This patch adds steal guest time support to full dynticks CPU 
time accounting. After 'commit ff9a9b4c4334 ("sched, time: Switch 
VIRT_CPU_ACCOUNTING_GEN to jiffy granularity")', time is jiffy 
based sampling even if it's still listened to ring boundaries, so 
steal_account_process_tick() is reused to account how much 'ticks' 
are steal time after the last accumulation. 

Suggested-by: Rik van Riel <riel@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
v2 -> v3:
 * convert steal time jiffies to cputime
v1 -> v2:
 * fix divide-by-zero bug, thanks Rik

 kernel/sched/cputime.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

Comments

Rik van Riel May 24, 2016, 7:22 p.m. UTC | #1
On Wed, 2016-05-18 at 20:27 +0800, Wanpeng Li wrote:
> From: Wanpeng Li <wanpeng.li@hotmail.com>
> 
> This patch adds steal guest time support to full dynticks CPU 
> time accounting. After 'commit ff9a9b4c4334 ("sched, time: Switch 
> VIRT_CPU_ACCOUNTING_GEN to jiffy granularity")', time is jiffy 
> based sampling even if it's still listened to ring boundaries, so 
> steal_account_process_tick() is reused to account how much 'ticks' 
> are steal time after the last accumulation. 
> 
> Suggested-by: Rik van Riel <riel@redhat.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim <rkrcmar@redhat.com>
> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>

This also nicely fixes up f9c904b7613b ("sched/cputime: 
Fix steal_account_process_tick() to always return jiffies"),
which relies on a bool function returning a certain number
of jiffies :)
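
The type mismatch mentioned here can be illustrated with a small user-space
sketch. It assumes only the standard C99 bool conversion rules; the function
names are hypothetical stand-ins, not the kernel helpers. A function declared
bool that returns a jiffy count collapses every non-zero count to 1, whereas
an unsigned long return type preserves the count:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the pre-patch helper: declared bool,
 * yet the value handed to 'return' is a jiffy count.
 */
static bool steal_ticks_as_bool(unsigned long steal_jiffies)
{
	return steal_jiffies;	/* any non-zero count converts to true (1) */
}

/* Hypothetical stand-in for the patched helper: the count survives. */
static unsigned long steal_ticks_as_count(unsigned long steal_jiffies)
{
	return steal_jiffies;
}

/* Usage example: 5 stolen jiffies become 1 through the bool return type. */
static void demo(void)
{
	assert(steal_ticks_as_bool(5) == 1);
	assert(steal_ticks_as_count(5) == 5);
}
```

This is only a model of the return-type issue; it does not reproduce the
paravirt steal clock logic inside steal_account_process_tick().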

Reviewed-by: Rik van Riel <riel@redhat.com>
Wanpeng Li May 25, 2016, 2:16 a.m. UTC | #2
Ping Paolo or Peterz.
2016-05-25 3:22 GMT+08:00 Rik van Riel <riel@redhat.com>:
> On Wed, 2016-05-18 at 20:27 +0800, Wanpeng Li wrote:
>> From: Wanpeng Li <wanpeng.li@hotmail.com>
>>
>> This patch adds steal guest time support to full dynticks CPU
>> time accounting. After 'commit ff9a9b4c4334 ("sched, time: Switch
>> VIRT_CPU_ACCOUNTING_GEN to jiffy granularity")', time is jiffy
>> based sampling even if it's still listened to ring boundaries, so
>> steal_account_process_tick() is reused to account how much 'ticks'
>> are steal time after the last accumulation.
>>
>> Suggested-by: Rik van Riel <riel@redhat.com>
>> Cc: Ingo Molnar <mingo@kernel.org>
>> Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
>> Cc: Rik van Riel <riel@redhat.com>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Frederic Weisbecker <fweisbec@gmail.com>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Radim <rkrcmar@redhat.com>
>> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
>
> This also nicely fixes up f9c904b7613b ("sched/cputime:
> Fix steal_account_process_tick() to always return jiffies"),
> which relies on a bool function returning a certain number
> of jiffies :)
>
> Reviewed-by: Rik van Riel <riel@redhat.com>
>
> --
> All rights reversed
Paolo Bonzini May 25, 2016, 10:35 a.m. UTC | #3
On 25/05/2016 04:16, Wanpeng Li wrote:
> Ping Paolo or Peterz.

No need to ping, since Rik reviewed it 7 hours ago so the thread has
gotten a bump in our mailboxes.

And anyway this is the merge window, which is the most annoying time to
get pings and one-patch changes.  I don't mind at all getting large
series during the merge window, but the small ones definitely can wait a
week or two.

Thanks,

Paolo

> 2016-05-25 3:22 GMT+08:00 Rik van Riel <riel@redhat.com>:
>> On Wed, 2016-05-18 at 20:27 +0800, Wanpeng Li wrote:
>>> From: Wanpeng Li <wanpeng.li@hotmail.com>
>>>
>>> This patch adds steal guest time support to full dynticks CPU
>>> time accounting. After 'commit ff9a9b4c4334 ("sched, time: Switch
>>> VIRT_CPU_ACCOUNTING_GEN to jiffy granularity")', time is jiffy
>>> based sampling even if it's still listened to ring boundaries, so
>>> steal_account_process_tick() is reused to account how much 'ticks'
>>> are steal time after the last accumulation.
>>>
>>> Suggested-by: Rik van Riel <riel@redhat.com>
>>> Cc: Ingo Molnar <mingo@kernel.org>
>>> Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
>>> Cc: Rik van Riel <riel@redhat.com>
>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>> Cc: Frederic Weisbecker <fweisbec@gmail.com>
>>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>>> Cc: Radim <rkrcmar@redhat.com>
>>> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
>>
>> This also nicely fixes up f9c904b7613b ("sched/cputime:
>> Fix steal_account_process_tick() to always return jiffies"),
>> which relies on a bool function returning a certain number
>> of jiffies :)
>>
>> Reviewed-by: Rik van Riel <riel@redhat.com>
>>
>> --
>> All rights reversed
> 
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ingo Molnar June 3, 2016, 7:16 a.m. UTC | #4
* Wanpeng Li <kernellwp@gmail.com> wrote:

> From: Wanpeng Li <wanpeng.li@hotmail.com>
> 
> This patch adds steal guest time support to full dynticks CPU 
> time accounting. After 'commit ff9a9b4c4334 ("sched, time: Switch 
> VIRT_CPU_ACCOUNTING_GEN to jiffy granularity")', time is jiffy 
> based sampling even if it's still listened to ring boundaries, so 
> steal_account_process_tick() is reused to account how much 'ticks' 
> are steal time after the last accumulation. 

WTF? This changelog has 4 grammar errors and it sails through review just like 
that?

 1) What does 'time is jiffy based sampling' mean?
 2) what does 'even if it's still listened to ring boundaries' mean?
 3) "how much 'ticks'"?
 4) "are steal time"?

So I fixed this to be at least parseable:

  This patch adds guest steal-time support to full dynticks CPU
  time accounting. After the following commit:

    ff9a9b4c4334 ("sched, time: Switch VIRT_CPU_ACCOUNTING_GEN to jiffy granularity")

  ... time sampling became jiffy based, even if it's still listened
  to ring boundaries, so steal_account_process_tick() is reused
  to account how many 'ticks' are stolen-time, after the last accumulation.

Although I'm still wondering what this key phrase means:

  even if it's still listened to ring boundaries,

Could someone please explain what this means? (Beyond the 5th grammar error this 
portion has, which I'll fix once it actually makes sense to me...)

Furthermore, the real problem that made me go back and tear the changelog apart is 
that the code flow itself is incredibly ugly and fragile as hell:

>  	write_seqcount_begin(&tsk->vtime_seqcount);
>  	tsk->vtime_snap_whence = VTIME_SYS;
>  	if (vtime_delta(tsk)) {
> +		cputime_t steal_time;
> +		unsigned long delta_st = steal_account_process_tick();
>  		delta_cpu = get_vtime_delta(tsk);
> +		steal_time = jiffies_to_cputime(delta_st);
> +
> +		if (steal_time >= delta_cpu) {
> +			write_seqcount_end(&tsk->vtime_seqcount);
> +			return;
> +		}
> +		delta_cpu -= steal_time;
>  		account_user_time(tsk, delta_cpu, cputime_to_scaled(delta_cpu));
>  	}
>  	write_seqcount_end(&tsk->vtime_seqcount);
> }

Yeah, a return in the middle of a locking critical section, really??

Also, how about basic style details like leaving an extra newline after local 
variable definition sections, like every other scheduler function does?

Also, what's this thing about calling a time unit variable 'delta_cpu'? When I 
reviewed this one of my first reactions was: "Why are we comparing time to CPU 
ID??".

Plus as an added bonus a 'delta_st' variable name to count ticks, which variable 
is not just badly named but single-use. WTF?

Something like this looks much better and shorter:

void vtime_account_user(struct task_struct *tsk)
{
        cputime_t delta_time, steal_time;

        write_seqcount_begin(&tsk->vtime_seqcount);
        tsk->vtime_snap_whence = VTIME_SYS;
        if (vtime_delta(tsk)) {
                delta_time = get_vtime_delta(tsk);
                steal_time = jiffies_to_cputime(steal_account_process_tick());

                if (steal_time < delta_time) {
                        delta_time -= steal_time;
                        account_user_time(tsk, delta_time, cputime_to_scaled(delta_time));
                }
        }
        write_seqcount_end(&tsk->vtime_seqcount);
}

See the consistent, obvious naming of the variables and the clear code flow?

(Totally untested, etc.)
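
The single-exit flow suggested above can be modelled in user space as a rough
sketch. Everything below is an assumption made for illustration: struct
vtime_model, its seq counter (standing in for vtime_seqcount) and
model_account_user() are hypothetical names, and cputime_t is reduced to a
plain integer. The point is only that the write section is opened once and
closed once, with no return in between:

```c
#include <assert.h>

typedef unsigned long long cputime_t;	/* simplified stand-in type */

struct vtime_model {
	unsigned int seq;	/* stand-in for vtime_seqcount: odd while writing */
	cputime_t user_time;	/* accumulated user time */
};

/* Account delta_time minus steal_time as user time, single exit path. */
static void model_account_user(struct vtime_model *vt,
			       cputime_t delta_time, cputime_t steal_time)
{
	vt->seq++;			/* models write_seqcount_begin() */
	if (delta_time) {
		/* Nothing is accounted when steal time covers the whole delta. */
		if (steal_time < delta_time) {
			delta_time -= steal_time;
			vt->user_time += delta_time;
		}
	}
	vt->seq++;			/* models write_seqcount_end() */

	/* The counter is even again, so the write section was closed. */
	assert((vt->seq & 1) == 0);
}
```

For example, a delta of 10 with 3 units of steal time adds 7 to user_time,
while a delta of 4 with 9 units of steal time adds nothing; in both cases
the counter ends up even.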

But I'm very annoyed that this was in v3 and still has so many trivial problems: 
incredibly bad and confusing spelling, totally bad, fragile, apparently write-only 
code - and I counted like three acks from 3 other people who should really know 
better ...

This is core scheduler code!

Thanks,

	Ingo
Paolo Bonzini June 6, 2016, 1:42 p.m. UTC | #5
On 03/06/2016 09:16, Ingo Molnar wrote:
> 
> But I'm very annoyed that this was in v3 and still has so many trivial problems: 
> incredibly bad and confusing spelling, totally bad, fragile, apparently write-only 
> code - and I counted like three acks from 3 other people who should really know 
> better ...
> 
> This is core scheduler code!

I actually want to fix this in arch/x86/kernel/kvm.c.  Please do not
apply this patch.

Thanks,

Paolo
Wanpeng Li June 6, 2016, 10:32 p.m. UTC | #6
2016-06-06 21:42 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
>
>
> On 03/06/2016 09:16, Ingo Molnar wrote:
>>
>> But I'm very annoyed that this was in v3 and still has so many trivial problems:
>> incredibly bad and confusing spelling, totally bad, fragile, apparently write-only
>> code - and I counted like three acks from 3 other people who should really know
>> better ...
>>
>> This is core scheduler code!
>
> I actually want to fix this in arch/x86/kernel/kvm.c.  Please do not
> apply this patch.

Your proposal will be greatly appreciated. :)

Regards,
Wanpeng Li
Patch

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 75f98c5..f51c98c 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -257,7 +257,7 @@  void account_idle_time(cputime_t cputime)
 		cpustat[CPUTIME_IDLE] += (__force u64) cputime;
 }
 
-static __always_inline bool steal_account_process_tick(void)
+static __always_inline unsigned long steal_account_process_tick(void)
 {
 #ifdef CONFIG_PARAVIRT
 	if (static_key_false(&paravirt_steal_enabled)) {
@@ -279,7 +279,7 @@  static __always_inline bool steal_account_process_tick(void)
 		return steal_jiffies;
 	}
 #endif
-	return false;
+	return 0;
 }
 
 /*
@@ -691,8 +691,14 @@  static cputime_t get_vtime_delta(struct task_struct *tsk)
 
 static void __vtime_account_system(struct task_struct *tsk)
 {
+	cputime_t steal_time;
 	cputime_t delta_cpu = get_vtime_delta(tsk);
+	unsigned long delta_st = steal_account_process_tick();
+	steal_time = jiffies_to_cputime(delta_st);
 
+	if (steal_time >= delta_cpu)
+		return;
+	delta_cpu -= steal_time;
 	account_system_time(tsk, irq_count(), delta_cpu, cputime_to_scaled(delta_cpu));
 }
 
@@ -723,7 +729,16 @@  void vtime_account_user(struct task_struct *tsk)
 	write_seqcount_begin(&tsk->vtime_seqcount);
 	tsk->vtime_snap_whence = VTIME_SYS;
 	if (vtime_delta(tsk)) {
+		cputime_t steal_time;
+		unsigned long delta_st = steal_account_process_tick();
 		delta_cpu = get_vtime_delta(tsk);
+		steal_time = jiffies_to_cputime(delta_st);
+
+		if (steal_time >= delta_cpu) {
+			write_seqcount_end(&tsk->vtime_seqcount);
+			return;
+		}
+		delta_cpu -= steal_time;
 		account_user_time(tsk, delta_cpu, cputime_to_scaled(delta_cpu));
 	}
 	write_seqcount_end(&tsk->vtime_seqcount);