sched/cputime: add steal time support to full dynticks CPU time accounting

Message ID 1462858484-3267-1-git-send-email-wanpeng.li@hotmail.com (mailing list archive)
State New, archived

Commit Message

Wanpeng Li May 10, 2016, 5:34 a.m. UTC
From: Wanpeng Li <wanpeng.li@hotmail.com>

This patch adds guest steal time support to full dynticks CPU time
accounting. After commit ff9a9b4c ("sched, time: Switch VIRT_CPU_ACCOUNTING_GEN
to jiffy granularity"), time is sampled at jiffy granularity even though
sampling is still driven by ring boundaries, so steal_account_process_tick()
is reused to account how many 'ticks' of steal time have accumulated since
the last accounting.

Suggested-by: Rik van Riel <riel@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
 kernel/sched/cputime.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)
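
Illustrative note (not part of the patch): the commit message relies on
steal_account_process_tick() reporting steal time in whole ticks since the
last accumulation. The sketch below shows one way such a conversion can carry
sub-jiffy remainders forward between calls. The names steal_ticks_since_last,
prev_steal_ns and NSEC_PER_JIFFY, and the 4 ms tick length, are assumptions
made for this example and are not taken from the kernel source.

```c
#include <stdint.h>

/* Hypothetical tick length: 4 ms (HZ=250). The real value is config-dependent. */
#define NSEC_PER_JIFFY 4000000ULL

/* Steal time (ns) already converted to whole jiffies by earlier calls. */
static uint64_t prev_steal_ns;

/*
 * Given a monotonically increasing steal clock in ns, return the number of
 * whole jiffies stolen since the last call. Any sub-jiffy remainder is left
 * pending, so it is counted by a later call once it adds up to a full jiffy.
 */
static unsigned long steal_ticks_since_last(uint64_t steal_clock_ns)
{
	uint64_t pending_ns = steal_clock_ns - prev_steal_ns;
	unsigned long ticks = (unsigned long)(pending_ns / NSEC_PER_JIFFY);

	/* Advance only by the whole jiffies consumed; keep the remainder. */
	prev_steal_ns += (uint64_t)ticks * NSEC_PER_JIFFY;
	return ticks;
}
```

Because the remainder is carried over, a call can return more ticks than the
steal time that arrived since the previous call would suggest on its own. For
example, 3 ms of steal yields 0 ticks, and 2 ms more then yields 1 tick.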

Comments

Rik van Riel May 18, 2016, 4:04 a.m. UTC | #1
On Tue, 2016-05-10 at 13:34 +0800, Wanpeng Li wrote:
> From: Wanpeng Li <wanpeng.li@hotmail.com>
> 
> This patch adds steal guest time support to full dynticks CPU time 
> accounting. After commit ff9a9b4c(sched, time: Switch
> VIRT_CPU_ACCOUNTING_GEN 
> to jiffy granularity), time is jiffy based sampling even if it's 
> still listened to ring boundaries, so steal_account_process_tick() 
> is reused to account how much 'ticks' are steal time after the 
> last accumulation. 
> 
> Suggested-by: Rik van Riel <riel@redhat.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim <rkrcmar@redhat.com>
> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
> 

Acked-by: Rik van Riel <riel@redhat.com>
Rik van Riel May 18, 2016, 7:10 a.m. UTC | #2
On Tue, 2016-05-10 at 13:34 +0800, Wanpeng Li wrote:

> +++ b/kernel/sched/cputime.c

> @@ -691,8 +691,11 @@ static cputime_t get_vtime_delta(struct task_struct *tsk)
>  
>  static void __vtime_account_system(struct task_struct *tsk)
>  {
> +	unsigned long steal_time = steal_account_process_tick();
>  	cputime_t delta_cpu = get_vtime_delta(tsk);
>  
> +	delta_cpu = steal_time ? (delta_cpu -
> +		jiffies_to_cputime(steal_time)) : delta_cpu;
>  	account_system_time(tsk, irq_count(), delta_cpu, cputime_to_scaled(delta_cpu));
>  }
>  

Sorry to have to go back on my previous email, but
this is now a NAK

The above code can end up passing a negative number
to account_system_time(), which in turn can cause a
divide by zero in scale_stime()

The code needs to ensure that if all the time that
passed was accounted as steal time (which could be
more jiffies than expected, due to remaining partial
jiffies in steal_account_process_tick), the function
does not call account_system_time().
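
Illustrative note (not from the thread): a minimal sketch of the kind of guard
described above, assuming cputime_t behaves like an unsigned 64-bit count of
jiffies. Under that assumption, subtracting a larger steal value wraps to a
very large number instead of going negative. The helpers delta_minus_steal(),
account_system_guarded() and account_system_stub() are hypothetical stand-ins
written for this example. This is not the fix that was later submitted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's cputime_t; one unit == one jiffy. */
typedef uint64_t cputime_t;

/* Total time handed to the stubbed system-time accounting so far. */
static cputime_t accounted_system;

/* Stub for account_system_time(): only records the delta it was given. */
static void account_system_stub(cputime_t delta)
{
	/* The guard in the caller must never let an empty delta through. */
	assert(delta != 0);
	accounted_system += delta;
}

/*
 * Subtract steal time from the elapsed delta without wrapping.
 * Returns 0 when steal time covers (or exceeds) the whole delta.
 */
static cputime_t delta_minus_steal(cputime_t delta, cputime_t steal)
{
	return steal >= delta ? 0 : delta - steal;
}

/* Guarded accounting step: skip the call when nothing is left to account. */
static void account_system_guarded(cputime_t delta, cputime_t steal)
{
	cputime_t remaining = delta_minus_steal(delta, steal);

	if (remaining)
		account_system_stub(remaining);
}
```

With this guard, account_system_guarded(10, 3) accounts 7 units, while
account_system_guarded(2, 5) accounts nothing and never reaches the stub.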
Wanpeng Li May 18, 2016, 7:45 a.m. UTC | #3
2016-05-18 15:10 GMT+08:00 Rik van Riel <riel@redhat.com>:
> On Tue, 2016-05-10 at 13:34 +0800, Wanpeng Li wrote:
>>
>> +++ b/kernel/sched/cputime.c
>>
>> @@ -691,8 +691,11 @@ static cputime_t get_vtime_delta(struct task_struct *tsk)
>>
>>  static void __vtime_account_system(struct task_struct *tsk)
>>  {
>> +     unsigned long steal_time = steal_account_process_tick();
>>       cputime_t delta_cpu = get_vtime_delta(tsk);
>>
>> +     delta_cpu = steal_time ? (delta_cpu -
>> +             jiffies_to_cputime(steal_time)) : delta_cpu;
>>       account_system_time(tsk, irq_count(), delta_cpu, cputime_to_scaled(delta_cpu));
>>  }
>>
>
> Sorry to have to go back on my previous email, but
> this is now a NAK
>
> The above code can end up passing a negative number
> to account_system_time(), which in turn can cause a
> divide by zero in scale_stime()
>
> The code needs to ensure that if all the time that
> passed was accounted as steal time (which could be
> more jiffies than expected, due to remaining partial
> jiffies in steal_account_process_tick), the function
> does not call account_system_time().

Yeah, I will fix it in the next version. Thanks very much, Rik, for debugging
this with me and figuring out the root cause of the divide by zero.

Regards,
Wanpeng Li
Patch

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 75f98c5..b96bd8f 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -257,7 +257,7 @@  void account_idle_time(cputime_t cputime)
 		cpustat[CPUTIME_IDLE] += (__force u64) cputime;
 }
 
-static __always_inline bool steal_account_process_tick(void)
+static __always_inline unsigned long steal_account_process_tick(void)
 {
 #ifdef CONFIG_PARAVIRT
 	if (static_key_false(&paravirt_steal_enabled)) {
@@ -279,7 +279,7 @@  static __always_inline bool steal_account_process_tick(void)
 		return steal_jiffies;
 	}
 #endif
-	return false;
+	return 0;
 }
 
 /*
@@ -691,8 +691,11 @@  static cputime_t get_vtime_delta(struct task_struct *tsk)
 
 static void __vtime_account_system(struct task_struct *tsk)
 {
+	unsigned long steal_time = steal_account_process_tick();
 	cputime_t delta_cpu = get_vtime_delta(tsk);
 
+	delta_cpu = steal_time ? (delta_cpu -
+		jiffies_to_cputime(steal_time)) : delta_cpu;
 	account_system_time(tsk, irq_count(), delta_cpu, cputime_to_scaled(delta_cpu));
 }
 
@@ -723,7 +726,11 @@  void vtime_account_user(struct task_struct *tsk)
 	write_seqcount_begin(&tsk->vtime_seqcount);
 	tsk->vtime_snap_whence = VTIME_SYS;
 	if (vtime_delta(tsk)) {
+		unsigned long steal_time = steal_account_process_tick();
 		delta_cpu = get_vtime_delta(tsk);
+
+		delta_cpu = steal_time ? (delta_cpu -
+			jiffies_to_cputime(steal_time)) : delta_cpu;
 		account_user_time(tsk, delta_cpu, cputime_to_scaled(delta_cpu));
 	}
 	write_seqcount_end(&tsk->vtime_seqcount);