diff mbox

Revert "cpuidle: Replace ktime_get() with local_clock()"

Message ID 20170420130813.h7dycr5cptbrvdkz@hirez.programming.kicks-ass.net (mailing list archive)
State Not Applicable, archived
Headers show

Commit Message

Peter Zijlstra April 20, 2017, 1:08 p.m. UTC
On Thu, Apr 20, 2017 at 03:44:47PM +0300, ville.syrjala@linux.intel.com wrote:
> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> 
> This reverts commit e93e59ce5b85e6c2b444f09fd1f707274ec066dc.
> 
> The TSC stops in deeper C states, 

On some old hardware (Core2 era and before) only. You've forgotten to
mention what hardware you've observed problems with.

> so using local_clock() in cpuidle

But on said hardware, local_clock() isn't an immediate TSC user.

> to track the C state residency seems like a bad idea. With local_clock()
> powertop is reporting mostly 0% residency for C states here. Presumably
> the core is still spending most of its time in some deep C-state since
> the totals typically add up to only 5% or so, so perhaps the governor
> isn't getting totally confused by these bogus numbers. But let's go
> back to using ktime_get() as that at least works correctly across the
> board.

Does this cure it?

---
 drivers/cpuidle/cpuidle.c | 2 ++
 kernel/sched/clock.c      | 7 +++----
 2 files changed, 5 insertions(+), 4 deletions(-)

Comments

Ville Syrjälä April 20, 2017, 1:43 p.m. UTC | #1
On Thu, Apr 20, 2017 at 03:08:13PM +0200, Peter Zijlstra wrote:
> On Thu, Apr 20, 2017 at 03:44:47PM +0300, ville.syrjala@linux.intel.com wrote:
> > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > 
> > This reverts commit e93e59ce5b85e6c2b444f09fd1f707274ec066dc.
> > 
> > The TSC stops in deeper C states, 
> 
> On some old hardware (Core2 era and before) only. You've forgotten to
> mention what hardware you've observed problems with.

Yeah, Core2 is what I used when I finally decided to bisect this. I've
been plagued by the bogus powertop numbers on many machines, most
likely all of them were of some older vintage.

> 
> > so using local_clock() in cpuidle
> 
> But on said hardware, local_clock() isn't an immediate TSC user.
> 
> > to track the C state residency seems like a bad idea. With local_clock()
> > powertop is reporting mostly 0% residency for C states here. Presumably
> > the core is still spending most of its time in some deep C-state since
> > the totals typically add up to only 5% or so, so perhaps the governor
> > isn't getting totally confused by these bogus numbers. But let's go
> > back to using ktime_get() as that at least works correctly across the
> > board.
> 
> Does this cure it?

It does indeed.

Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

> 
> ---
>  drivers/cpuidle/cpuidle.c | 2 ++
>  kernel/sched/clock.c      | 7 +++----
>  2 files changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> index 548b90be7685..e0d4ad108887 100644
> --- a/drivers/cpuidle/cpuidle.c
> +++ b/drivers/cpuidle/cpuidle.c
> @@ -219,6 +219,8 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
>  	entered_state = target_state->enter(dev, drv, index);
>  	start_critical_timings();
>  
> +	sched_clock_idle_wakeup_event(0);
> +
>  	time_end = ns_to_ktime(local_clock());
>  	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, dev->cpu);
>  
> diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
> index 00a45c45beca..15e848706be4 100644
> --- a/kernel/sched/clock.c
> +++ b/kernel/sched/clock.c
> @@ -347,6 +347,9 @@ void sched_clock_tick(void)
>  {
>  	struct sched_clock_data *scd;
>  
> +	if (timekeeping_suspended)
> +		return;
> +
>  	WARN_ON_ONCE(!irqs_disabled());
>  
>  	/*
> @@ -378,11 +381,7 @@ EXPORT_SYMBOL_GPL(sched_clock_idle_sleep_event);
>   */
>  void sched_clock_idle_wakeup_event(u64 delta_ns)
>  {
> -	if (timekeeping_suspended)
> -		return;
> -
>  	sched_clock_tick();
> -	touch_softlockup_watchdog_sched();
>  }
>  EXPORT_SYMBOL_GPL(sched_clock_idle_wakeup_event);
>
Peter Zijlstra April 20, 2017, 1:49 p.m. UTC | #2
On Thu, Apr 20, 2017 at 04:43:45PM +0300, Ville Syrjälä wrote:
> On Thu, Apr 20, 2017 at 03:08:13PM +0200, Peter Zijlstra wrote:
> > On Thu, Apr 20, 2017 at 03:44:47PM +0300, ville.syrjala@linux.intel.com wrote:
> > > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > 
> > > This reverts commit e93e59ce5b85e6c2b444f09fd1f707274ec066dc.
> > > 
> > > The TSC stops in deeper C states, 
> > 
> > On some old hardware (Core2 era and before) only. You've forgotten to
> > mention what hardware you've observed problems with.
> 
> Yeah, Core2 is what I used when I finally decided to bisect this. I've
> been plagued by the bogus powertop numbers on many machines, most
> likely all of them were of some older vintage.

> Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

OK, thanks. I'm currently chasing some other Core2 issue that is
somewhat related. See:

  http://lkml.kernel.org/r/20170413132349.thxkwptdymsfsyxb@hirez.programming.kicks-ass.net

Once I have that sorted I'll post both patches.
Daniel Lezcano April 20, 2017, 2:37 p.m. UTC | #3
On Thu, Apr 20, 2017 at 03:08:13PM +0200, Peter Zijlstra wrote:
> On Thu, Apr 20, 2017 at 03:44:47PM +0300, ville.syrjala@linux.intel.com wrote:
> > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > 
> > This reverts commit e93e59ce5b85e6c2b444f09fd1f707274ec066dc.
> > 
> > The TSC stops in deeper C states, 
> 
> On some old hardware (Core2 era and before) only. You've forgotten to
> mention what hardware you've observed problems with.
> 
> > so using local_clock() in cpuidle
> 
> But on said hardware, local_clock() isn't an immediate TSC user.
> 
> > to track the C state residency seems like a bad idea. With local_clock()
> > powertop is reporting mostly 0% residency for C states here. Presumably
> > the core is still spending most of its time in some deep C-state since
> > the totals typically add up to only 5% or so, so perhaps the governor
> > isn't getting totally confused by these bogus numbers. But let's go
> > back to using ktime_get() as that at least works correctly across the
> > board.
> 
> Does this cure it?
> 
> ---
>  drivers/cpuidle/cpuidle.c | 2 ++
>  kernel/sched/clock.c      | 7 +++----
>  2 files changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> index 548b90be7685..e0d4ad108887 100644
> --- a/drivers/cpuidle/cpuidle.c
> +++ b/drivers/cpuidle/cpuidle.c
> @@ -219,6 +219,8 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
>  	entered_state = target_state->enter(dev, drv, index);
>  	start_critical_timings();
>  
> +	sched_clock_idle_wakeup_event(0);
> +

Is it planned to skip this if the tsc is reliable?

>  	time_end = ns_to_ktime(local_clock());
>  	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, dev->cpu);
>  
> diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
> index 00a45c45beca..15e848706be4 100644
> --- a/kernel/sched/clock.c
> +++ b/kernel/sched/clock.c
> @@ -347,6 +347,9 @@ void sched_clock_tick(void)
>  {
>  	struct sched_clock_data *scd;
>  
> +	if (timekeeping_suspended)
> +		return;
> +
>  	WARN_ON_ONCE(!irqs_disabled());
>  
>  	/*
> @@ -378,11 +381,7 @@ EXPORT_SYMBOL_GPL(sched_clock_idle_sleep_event);
>   */
>  void sched_clock_idle_wakeup_event(u64 delta_ns)
>  {
> -	if (timekeeping_suspended)
> -		return;
> -
>  	sched_clock_tick();
> -	touch_softlockup_watchdog_sched();
>  }
>  EXPORT_SYMBOL_GPL(sched_clock_idle_wakeup_event);
>
Peter Zijlstra April 20, 2017, 2:41 p.m. UTC | #4
On Thu, Apr 20, 2017 at 04:37:51PM +0200, Daniel Lezcano wrote:

> >  
> > +	sched_clock_idle_wakeup_event(0);
> > +
> 
> Is it planned to skip this if the tsc is reliable?

Yes. Current code doesn't quite do that, but if you follow that link I
send earlier, I'm about to fix that (again).

Now, if only this Core2 piece of crap would actually boot a recent
kernel, I could go figure out wth is wrong.. :/
diff mbox

Patch

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index 548b90be7685..e0d4ad108887 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -219,6 +219,8 @@  int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
 	entered_state = target_state->enter(dev, drv, index);
 	start_critical_timings();
 
+	sched_clock_idle_wakeup_event(0);
+
 	time_end = ns_to_ktime(local_clock());
 	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, dev->cpu);
 
diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
index 00a45c45beca..15e848706be4 100644
--- a/kernel/sched/clock.c
+++ b/kernel/sched/clock.c
@@ -347,6 +347,9 @@  void sched_clock_tick(void)
 {
 	struct sched_clock_data *scd;
 
+	if (timekeeping_suspended)
+		return;
+
 	WARN_ON_ONCE(!irqs_disabled());
 
 	/*
@@ -378,11 +381,7 @@  EXPORT_SYMBOL_GPL(sched_clock_idle_sleep_event);
  */
 void sched_clock_idle_wakeup_event(u64 delta_ns)
 {
-	if (timekeeping_suspended)
-		return;
-
 	sched_clock_tick();
-	touch_softlockup_watchdog_sched();
 }
 EXPORT_SYMBOL_GPL(sched_clock_idle_wakeup_event);