diff mbox series

[1/6] delayacct: Use sched_clock()

Message ID 20210505111525.001031466@infradead.org (mailing list archive)
State New, archived
Headers show
Series sched,delayacct: Some cleanups | expand

Commit Message

Peter Zijlstra May 5, 2021, 10:59 a.m. UTC
Like all scheduler statistics, use sched_clock() based time.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/delayacct.c |   22 ++++++++++------------
 1 file changed, 10 insertions(+), 12 deletions(-)

Comments

Rik van Riel May 5, 2021, 2:40 p.m. UTC | #1
On Wed, 2021-05-05 at 12:59 +0200, Peter Zijlstra wrote:
> Like all scheduler statistics, use sched_clock() based time.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Looks like this works even on messed up systems because
the delayacct code is called from the same CPU at sleep
time and wakeup time, before a task is migrated.

Reviewed-by: Rik van Riel <riel@surriel.com>
Johannes Weiner May 6, 2021, 1:59 p.m. UTC | #2
On Wed, May 05, 2021 at 12:59:41PM +0200, Peter Zijlstra wrote:
> @@ -42,10 +42,9 @@ void __delayacct_tsk_init(struct task_st
>   * Finish delay accounting for a statistic using its timestamps (@start),
>   * accumalator (@total) and @count
>   */
> -static void delayacct_end(raw_spinlock_t *lock, u64 *start, u64 *total,
> -			  u32 *count)
> +static void delayacct_end(raw_spinlock_t *lock, u64 *start, u64 *total, u32 *count)
>  {
> -	s64 ns = ktime_get_ns() - *start;
> +	s64 ns = local_clock() - *start;

I don't think this is safe. These time sections that have preemption
and migration enabled and so might span multiple CPUs. local_clock()
could end up behind *start, AFAICS.
Peter Zijlstra May 6, 2021, 2:17 p.m. UTC | #3
On Thu, May 06, 2021 at 09:59:11AM -0400, Johannes Weiner wrote:
> On Wed, May 05, 2021 at 12:59:41PM +0200, Peter Zijlstra wrote:
> > @@ -42,10 +42,9 @@ void __delayacct_tsk_init(struct task_st
> >   * Finish delay accounting for a statistic using its timestamps (@start),
> >   * accumalator (@total) and @count
> >   */
> > -static void delayacct_end(raw_spinlock_t *lock, u64 *start, u64 *total,
> > -			  u32 *count)
> > +static void delayacct_end(raw_spinlock_t *lock, u64 *start, u64 *total, u32 *count)
> >  {
> > -	s64 ns = ktime_get_ns() - *start;
> > +	s64 ns = local_clock() - *start;
> 
> I don't think this is safe. These time sections that have preemption
> and migration enabled and so might span multiple CPUs. local_clock()
> could end up behind *start, AFAICS.

Only if you have really crummy hardware, and in that case the drift is
bounded by around 1 tick. Also, this function actually checks: ns > 0.

And if you do have crummy hardware like that, ktime_get_ns() is the very
last thing you want to call at any frequency because it'll be the HPET.
Johannes Weiner May 6, 2021, 3:17 p.m. UTC | #4
On Thu, May 06, 2021 at 04:17:33PM +0200, Peter Zijlstra wrote:
> On Thu, May 06, 2021 at 09:59:11AM -0400, Johannes Weiner wrote:
> > On Wed, May 05, 2021 at 12:59:41PM +0200, Peter Zijlstra wrote:
> > > @@ -42,10 +42,9 @@ void __delayacct_tsk_init(struct task_st
> > >   * Finish delay accounting for a statistic using its timestamps (@start),
> > >   * accumalator (@total) and @count
> > >   */
> > > -static void delayacct_end(raw_spinlock_t *lock, u64 *start, u64 *total,
> > > -			  u32 *count)
> > > +static void delayacct_end(raw_spinlock_t *lock, u64 *start, u64 *total, u32 *count)
> > >  {
> > > -	s64 ns = ktime_get_ns() - *start;
> > > +	s64 ns = local_clock() - *start;
> > 
> > I don't think this is safe. These time sections that have preemption
> > and migration enabled and so might span multiple CPUs. local_clock()
> > could end up behind *start, AFAICS.
> 
> Only if you have really crummy hardware, and in that case the drift is
> bounded by around 1 tick. Also, this function actually checks: ns > 0.

Oh, I didn't realize it was that close. I just went off the dramatic
warnings on cpu_clock() :-) But yeah, that seems plenty accurate for
this purpose.

Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Education Directorate May 7, 2021, 12:40 p.m. UTC | #5
On Wed, May 05, 2021 at 12:59:41PM +0200, Peter Zijlstra wrote:
> Like all scheduler statistics, use sched_clock() based time.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---

Goind by your comment about preemption safety not being a concern
the patch looks good.

Acked-by: Balbir Singh <bsingharora@gmail.com>
Mel Gorman May 12, 2021, 10:43 a.m. UTC | #6
On Wed, May 05, 2021 at 12:59:41PM +0200, Peter Zijlstra wrote:
> Like all scheduler statistics, use sched_clock() based time.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Acked-by: Mel Gorman <mgorman@suse.de>
diff mbox series

Patch

--- a/kernel/delayacct.c
+++ b/kernel/delayacct.c
@@ -7,9 +7,9 @@ 
 #include <linux/sched.h>
 #include <linux/sched/task.h>
 #include <linux/sched/cputime.h>
+#include <linux/sched/clock.h>
 #include <linux/slab.h>
 #include <linux/taskstats.h>
-#include <linux/time.h>
 #include <linux/sysctl.h>
 #include <linux/delayacct.h>
 #include <linux/module.h>
@@ -42,10 +42,9 @@  void __delayacct_tsk_init(struct task_st
  * Finish delay accounting for a statistic using its timestamps (@start),
  * accumalator (@total) and @count
  */
-static void delayacct_end(raw_spinlock_t *lock, u64 *start, u64 *total,
-			  u32 *count)
+static void delayacct_end(raw_spinlock_t *lock, u64 *start, u64 *total, u32 *count)
 {
-	s64 ns = ktime_get_ns() - *start;
+	s64 ns = local_clock() - *start;
 	unsigned long flags;
 
 	if (ns > 0) {
@@ -58,7 +57,7 @@  static void delayacct_end(raw_spinlock_t
 
 void __delayacct_blkio_start(void)
 {
-	current->delays->blkio_start = ktime_get_ns();
+	current->delays->blkio_start = local_clock();
 }
 
 /*
@@ -151,21 +150,20 @@  __u64 __delayacct_blkio_ticks(struct tas
 
 void __delayacct_freepages_start(void)
 {
-	current->delays->freepages_start = ktime_get_ns();
+	current->delays->freepages_start = local_clock();
 }
 
 void __delayacct_freepages_end(void)
 {
-	delayacct_end(
-		&current->delays->lock,
-		&current->delays->freepages_start,
-		&current->delays->freepages_delay,
-		&current->delays->freepages_count);
+	delayacct_end(&current->delays->lock,
+		      &current->delays->freepages_start,
+		      &current->delays->freepages_delay,
+		      &current->delays->freepages_count);
 }
 
 void __delayacct_thrashing_start(void)
 {
-	current->delays->thrashing_start = ktime_get_ns();
+	current->delays->thrashing_start = local_clock();
 }
 
 void __delayacct_thrashing_end(void)