Revert "cpuidle: Replace ktime_get() with local_clock()"

Message ID	20170420130813.h7dycr5cptbrvdkz@hirez.programming.kicks-ass.net (mailing list archive)
State	Not Applicable, archived

Headers

Date: Thu, 20 Apr 2017 15:08:13 +0200
From: Peter Zijlstra <peterz@infradead.org>
To: ville.syrjala@linux.intel.com
Cc: linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org,
	stable@vger.kernel.org, Daniel Lezcano <daniel.lezcano@linaro.org>,
	"Rafael J . Wysocki" <rafael.j.wysocki@intel.com>
Subject: Re: [PATCH] Revert "cpuidle: Replace ktime_get() with local_clock()"
Message-ID: <20170420130813.h7dycr5cptbrvdkz@hirez.programming.kicks-ass.net>
References: <20170420124447.13716-1-ville.syrjala@linux.intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <20170420124447.13716-1-ville.syrjala@linux.intel.com>
User-Agent: NeoMutt/20170113 (1.7.2)
Sender: linux-pm-owner@vger.kernel.org
Precedence: bulk

Commit Message

Peter Zijlstra April 20, 2017, 1:08 p.m. UTC

On Thu, Apr 20, 2017 at 03:44:47PM +0300, ville.syrjala@linux.intel.com wrote:
> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> 
> This reverts commit e93e59ce5b85e6c2b444f09fd1f707274ec066dc.
> 
> The TSC stops in deeper C states, 

On some old hardware (Core2 era and before) only. You've forgotten to
mention what hardware you've observed problems with.

> so using local_clock() in cpuidle

But on said hardware, local_clock() isn't an immediate TSC user.

> to track the C state residency seems like a bad idea. With local_clock()
> powertop is reporting mostly 0% residency for C states here. Presumably
> the core is still spending most of its time in some deep C-state since
> the totals typically add up to only 5% or so, so perhaps the governor
> isn't getting totally confused by these bogus numbers. But let's go
> back to using ktime_get() as that at least works correctly across the
> board.

Does this cure it?

---
 drivers/cpuidle/cpuidle.c | 2 ++
 kernel/sched/clock.c      | 7 +++----
 2 files changed, 5 insertions(+), 4 deletions(-)

Comments

Ville Syrjälä April 20, 2017, 1:43 p.m. UTC | #1

On Thu, Apr 20, 2017 at 03:08:13PM +0200, Peter Zijlstra wrote:
> On Thu, Apr 20, 2017 at 03:44:47PM +0300, ville.syrjala@linux.intel.com wrote:
> > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > 
> > This reverts commit e93e59ce5b85e6c2b444f09fd1f707274ec066dc.
> > 
> > The TSC stops in deeper C states, 
> 
> On some old hardware (Core2 era and before) only. You've forgotten to
> mention what hardware you've observed problems with.

Yeah, Core2 is what I used when I finally decided to bisect this. I've
been plagued by the bogus powertop numbers on many machines, most
likely all of them were of some older vintage.

> 
> > so using local_clock() in cpuidle
> 
> But on said hardware, local_clock() isn't an immediate TSC user.
> 
> > to track the C state residency seems like a bad idea. With local_clock()
> > powertop is reporting mostly 0% residency for C states here. Presumably
> > the core is still spending most of its time in some deep C-state since
> > the totals typically add up to only 5% or so, so perhaps the governor
> > isn't getting totally confused by these bogus numbers. But let's go
> > back to using ktime_get() as that at least works correctly across the
> > board.
> 
> Does this cure it?

It does indeed.

Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

> 
> ---
>  drivers/cpuidle/cpuidle.c | 2 ++
>  kernel/sched/clock.c      | 7 +++----
>  2 files changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> index 548b90be7685..e0d4ad108887 100644
> --- a/drivers/cpuidle/cpuidle.c
> +++ b/drivers/cpuidle/cpuidle.c
> @@ -219,6 +219,8 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
>  	entered_state = target_state->enter(dev, drv, index);
>  	start_critical_timings();
>  
> +	sched_clock_idle_wakeup_event(0);
> +
>  	time_end = ns_to_ktime(local_clock());
>  	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, dev->cpu);
>  
> diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
> index 00a45c45beca..15e848706be4 100644
> --- a/kernel/sched/clock.c
> +++ b/kernel/sched/clock.c
> @@ -347,6 +347,9 @@ void sched_clock_tick(void)
>  {
>  	struct sched_clock_data *scd;
>  
> +	if (timekeeping_suspended)
> +		return;
> +
>  	WARN_ON_ONCE(!irqs_disabled());
>  
>  	/*
> @@ -378,11 +381,7 @@ EXPORT_SYMBOL_GPL(sched_clock_idle_sleep_event);
>   */
>  void sched_clock_idle_wakeup_event(u64 delta_ns)
>  {
> -	if (timekeeping_suspended)
> -		return;
> -
>  	sched_clock_tick();
> -	touch_softlockup_watchdog_sched();
>  }
>  EXPORT_SYMBOL_GPL(sched_clock_idle_wakeup_event);
>

Peter Zijlstra April 20, 2017, 1:49 p.m. UTC | #2

On Thu, Apr 20, 2017 at 04:43:45PM +0300, Ville Syrjälä wrote:
> On Thu, Apr 20, 2017 at 03:08:13PM +0200, Peter Zijlstra wrote:
> > On Thu, Apr 20, 2017 at 03:44:47PM +0300, ville.syrjala@linux.intel.com wrote:
> > > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > 
> > > This reverts commit e93e59ce5b85e6c2b444f09fd1f707274ec066dc.
> > > 
> > > The TSC stops in deeper C states, 
> > 
> > On some old hardware (Core2 era and before) only. You've forgotten to
> > mention what hardware you've observed problems with.
> 
> Yeah, Core2 is what I used when I finally decided to bisect this. I've
> been plagued by the bogus powertop numbers on many machines, most
> likely all of them were of some older vintage.

> Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

OK, thanks. I'm currently chasing some other Core2 issue that is
somewhat related. See:

  http://lkml.kernel.org/r/20170413132349.thxkwptdymsfsyxb@hirez.programming.kicks-ass.net

Once I have that sorted I'll post both patches.

Daniel Lezcano April 20, 2017, 2:37 p.m. UTC | #3

On Thu, Apr 20, 2017 at 03:08:13PM +0200, Peter Zijlstra wrote:
> On Thu, Apr 20, 2017 at 03:44:47PM +0300, ville.syrjala@linux.intel.com wrote:
> > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > 
> > This reverts commit e93e59ce5b85e6c2b444f09fd1f707274ec066dc.
> > 
> > The TSC stops in deeper C states, 
> 
> On some old hardware (Core2 era and before) only. You've forgotten to
> mention what hardware you've observed problems with.
> 
> > so using local_clock() in cpuidle
> 
> But on said hardware, local_clock() isn't an immediate TSC user.
> 
> > to track the C state residency seems like a bad idea. With local_clock()
> > powertop is reporting mostly 0% residency for C states here. Presumably
> > the core is still spending most of its time in some deep C-state since
> > the totals typically add up to only 5% or so, so perhaps the governor
> > isn't getting totally confused by these bogus numbers. But let's go
> > back to using ktime_get() as that at least works correctly across the
> > board.
> 
> Does this cure it?
> 
> ---
>  drivers/cpuidle/cpuidle.c | 2 ++
>  kernel/sched/clock.c      | 7 +++----
>  2 files changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> index 548b90be7685..e0d4ad108887 100644
> --- a/drivers/cpuidle/cpuidle.c
> +++ b/drivers/cpuidle/cpuidle.c
> @@ -219,6 +219,8 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
>  	entered_state = target_state->enter(dev, drv, index);
>  	start_critical_timings();
>  
> +	sched_clock_idle_wakeup_event(0);
> +

Is it planned to skip this if the tsc is reliable?

>  	time_end = ns_to_ktime(local_clock());
>  	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, dev->cpu);
>  
> diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
> index 00a45c45beca..15e848706be4 100644
> --- a/kernel/sched/clock.c
> +++ b/kernel/sched/clock.c
> @@ -347,6 +347,9 @@ void sched_clock_tick(void)
>  {
>  	struct sched_clock_data *scd;
>  
> +	if (timekeeping_suspended)
> +		return;
> +
>  	WARN_ON_ONCE(!irqs_disabled());
>  
>  	/*
> @@ -378,11 +381,7 @@ EXPORT_SYMBOL_GPL(sched_clock_idle_sleep_event);
>   */
>  void sched_clock_idle_wakeup_event(u64 delta_ns)
>  {
> -	if (timekeeping_suspended)
> -		return;
> -
>  	sched_clock_tick();
> -	touch_softlockup_watchdog_sched();
>  }
>  EXPORT_SYMBOL_GPL(sched_clock_idle_wakeup_event);
>

Peter Zijlstra April 20, 2017, 2:41 p.m. UTC | #4

On Thu, Apr 20, 2017 at 04:37:51PM +0200, Daniel Lezcano wrote:

> >  
> > +	sched_clock_idle_wakeup_event(0);
> > +
> 
> Is it planned to skip this if the tsc is reliable?

Yes. Current code doesn't quite do that, but if you follow that link I
sent earlier, I'm about to fix that (again).

Now, if only this Core2 piece of crap would actually boot a recent
kernel, I could go figure out wth is wrong.. :/

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index 548b90be7685..e0d4ad108887 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -219,6 +219,8 @@  int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
 	entered_state = target_state->enter(dev, drv, index);
 	start_critical_timings();
 
+	sched_clock_idle_wakeup_event(0);
+
 	time_end = ns_to_ktime(local_clock());
 	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, dev->cpu);
 
diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
index 00a45c45beca..15e848706be4 100644
--- a/kernel/sched/clock.c
+++ b/kernel/sched/clock.c
@@ -347,6 +347,9 @@  void sched_clock_tick(void)
 {
 	struct sched_clock_data *scd;
 
+	if (timekeeping_suspended)
+		return;
+
 	WARN_ON_ONCE(!irqs_disabled());
 
 	/*
@@ -378,11 +381,7 @@  EXPORT_SYMBOL_GPL(sched_clock_idle_sleep_event);
  */
 void sched_clock_idle_wakeup_event(u64 delta_ns)
 {
-	if (timekeeping_suspended)
-		return;
-
 	sched_clock_tick();
-	touch_softlockup_watchdog_sched();
 }
 EXPORT_SYMBOL_GPL(sched_clock_idle_wakeup_event);
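
For readers following the thread, here is a rough userspace sketch of why resyncing the clock on wakeup could cure the 0% residency numbers. It is a deliberately simplified model, not kernel code: every name in it (struct sim_cpu, sim_local_clock(), sim_resync(), sim_measure_idle()) is hypothetical, and the real logic in kernel/sched/clock.c is more involved. The assumption is that the local clock is a raw counter plus a correction taken from an always-running reference, and that the raw counter may stop during deep idle.

```c
#include <stdint.h>

/* Hypothetical model for illustration only; none of these names are kernel APIs. */
struct sim_cpu {
	uint64_t wall_ns;   /* always-running reference, stands in for ktime_get() */
	uint64_t tsc_ns;    /* raw counter that may stop in deep idle */
	int64_t  offset_ns; /* correction applied to tsc_ns, refreshed on resync */
};

/* Local clock: raw counter plus the last known correction. */
static uint64_t sim_local_clock(const struct sim_cpu *cpu)
{
	return cpu->tsc_ns + (uint64_t)cpu->offset_ns;
}

/* Resync against the reference; loosely the role sched_clock_tick() plays in the patch. */
static void sim_resync(struct sim_cpu *cpu)
{
	cpu->offset_ns = (int64_t)(cpu->wall_ns - cpu->tsc_ns);
}

/* Busy period: both clocks advance. */
static void sim_run(struct sim_cpu *cpu, uint64_t ns)
{
	cpu->wall_ns += ns;
	cpu->tsc_ns += ns;
}

/* Idle period: the reference always advances, the raw counter only if it keeps running. */
static void sim_idle(struct sim_cpu *cpu, uint64_t ns, int tsc_stops)
{
	cpu->wall_ns += ns;
	if (!tsc_stops)
		cpu->tsc_ns += ns;
}

/*
 * Bracket one idle period with local clock reads, similar in shape to how
 * cpuidle_enter_state() brackets the ->enter() call, and return the
 * measured residency in nanoseconds.
 */
static uint64_t sim_measure_idle(struct sim_cpu *cpu, uint64_t idle_ns,
				 int tsc_stops, int resync_on_wakeup)
{
	uint64_t start = sim_local_clock(cpu);

	sim_idle(cpu, idle_ns, tsc_stops);
	if (resync_on_wakeup)
		sim_resync(cpu);

	return sim_local_clock(cpu) - start;
}
```

In this model, a counter that stops during idle yields a measured residency of zero unless the clock is resynced before the second read, which matches the symptom reported with powertop. With the resync in place the full idle time is measured, and a counter that keeps running needs no resync at all, which is the case Daniel asks about skipping.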
