SKL BOOT FAILURE unless idle=nomwait (was Re: PROBLEM: Cpufreq constantly keeps frequency at maximum on 4.5-rc4)

Message ID 20160311090306.1bfe380b@annuminas.surriel.com (mailing list archive)
State Not Applicable, archived

Commit Message

Rik van Riel March 11, 2016, 2:03 p.m. UTC
On Thu, 10 Mar 2016 00:59:01 +0100
"Rafael J. Wysocki" <rafael@kernel.org> wrote:

> OK, thanks.
> 
> Rik, that seems to go against the changelog of
> a9ceb78bc75ca47972096372ff3d48648b16317a:
> 
> "This is not a big deal on most x86 CPUs, which have very low C1
> latencies, and the patch should not have any effect on those CPUs."
> 
> The effect is actually measurable and quite substantial to my eyes.

Indeed, my mistake was testing not just against the predicted
latency, but against the predicted latency multiplied by the
load correction factor, which can be as much as 10x the load...

The patch below should fix that.

It didn't for Arto, due to the other issues on his system, but
it might resolve the issue for Doug, where cstate/pstate is
otherwise working fine.

Doug, does the patch below solve your issue?

If it does not, we should figure out why the idle state selection
loop is not selecting the right mode.

Is the latency_req "load correction" too aggressive?

Or is it only too aggressive for IDLE->HLT selection, and fine to
drive choices between deeper C states?

After all, if it causes the IDLE->HLT selection to go wrong, maybe
it is also causing us to pick shallower C states when we should be
picking deeper ones?

        /*
         * Find the idle state with the lowest power while satisfying
         * our constraints.
         */
        for (i = data->last_state_idx + 1; i < drv->state_count; i++) {
                struct cpuidle_state *s = &drv->states[i];
                struct cpuidle_state_usage *su = &dev->states_usage[i];

                if (s->disabled || su->disable)
                        continue;
                if (s->target_residency > data->predicted_us)
                        continue;
                if (s->exit_latency > latency_req)
                        continue;

                data->last_state_idx = i;
        }
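
To make the question above concrete, here is a minimal user-space sketch of that selection loop. It is not the kernel code: `struct idle_state`, `pick_state()` and the `example_states` table are hypothetical stand-ins, and the latency/residency numbers are invented purely for illustration. It only shows how shrinking latency_req by a load factor can make the same loop settle on a shallower state.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct cpuidle_state: only the fields the loop reads. */
struct idle_state {
	unsigned int exit_latency;     /* microseconds */
	unsigned int target_residency; /* microseconds */
	bool disabled;
};

/*
 * Same shape as the loop above: starting after start_idx, remember the
 * deepest state whose target residency fits the predicted idle time and
 * whose exit latency fits the latency requirement.
 */
static int pick_state(const struct idle_state *states, int count,
		      int start_idx, unsigned int predicted_us,
		      unsigned int latency_req)
{
	int chosen = start_idx;

	for (int i = start_idx + 1; i < count; i++) {
		const struct idle_state *s = &states[i];

		if (s->disabled)
			continue;
		if (s->target_residency > predicted_us)
			continue;
		if (s->exit_latency > latency_req)
			continue;

		chosen = i;
	}
	return chosen;
}

/* Invented example table (index 0 = polling); not taken from any real driver. */
static const struct idle_state example_states[] = {
	{ .exit_latency = 0,   .target_residency = 0,   .disabled = false },
	{ .exit_latency = 2,   .target_residency = 2,   .disabled = false },
	{ .exit_latency = 10,  .target_residency = 20,  .disabled = false },
	{ .exit_latency = 70,  .target_residency = 100, .disabled = false },
	{ .exit_latency = 200, .target_residency = 800, .disabled = false },
};
```

With this made-up table and a predicted idle time of 1000 us, a latency_req of 1000 selects index 4, while a latency_req of 100 (the same prediction divided by an assumed factor of 10) stops at index 3.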

---8<---

Subject: cpuidle: use predicted_us not interactivity_req to consider polling

The interactivity_req variable is the expected sleep time, divided
by the CPU load. This can be too aggressive a factor in deciding
whether or not to consider polling in the cpuidle state selection.

Use the (not corrected for load) predicted_us instead.

Signed-off-by: Rik van Riel <riel@redhat.com>
---
 drivers/cpuidle/governors/menu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Doug Smythies March 11, 2016, 6:22 p.m. UTC | #1
On 2016.03.11 06:03 Rik van Riel wrote:
> On Thu, 10 Mar 2016 00:59:01 +0100 "Rafael J. Wysocki" <rafael@kernel.org> wrote:
>
>> Rik, that seems to go against the changelog of
>> a9ceb78bc75ca47972096372ff3d48648b16317a:
>> 
>> "This is not a big deal on most x86 CPUs, which have very low C1
>> latencies, and the patch should not have any effect on those CPUs."
>> 
>> The effect is actually measurable and quite substantial to my eyes.
>
> Indeed, my mistake was testing not just against the predicted
> latency, but against the predicted latency multiplied by the
> load correction factor, which can be as much as 10x the load...
>
> The patch below should fix that.
>
> It didn't for Arto, due to the other issues on his system, but
> it might resolve the issue for Doug, where cstate/pstate is
> otherwise working fine.
>
> Doug, does the patch below solve your issue?

No.

Old data restated with new data added below:
Aggregate times in each idle state for the 2000 second test:

k45rc7 (mins)   reverted (mins)   rvr patch (mins)   State
20.1771917      2.638311483       19.11342298        0
13.02770225     21.81474838       13.55643397        1
3.428136783     3.951405          3.698494867        2
1.4540243       1.552488167       1.528558717        3
134.9057413     143.5533          138.5279812        4

172.9927963     173.5102531       176.4248918        total

>> Energy:
>>
>> Reverted: 56178 Joules
>> Kernel 4.5-rc7: 63269 Joules (revert saves 12.6% energy)
Kernel 4.5-rc7 + rvr patch: 62914 Joules

> If it does not, we should figure out why the idle state selection
> loop is not selecting the right mode.

For my part of it, I am struggling to understand this area of the code.
It would take me a while, quite a while, to be able to provide useful
input.

... Doug



Patch

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 0742b3296673..97022ae01d2e 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -330,7 +330,7 @@  static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
 		 * We want to default to C1 (hlt), not to busy polling
 		 * unless the timer is happening really really soon.
 		 */
-		if (interactivity_req > 20 &&
+		if (data->predicted_us > 20 &&
 		    !drv->states[CPUIDLE_DRIVER_STATE_START].disabled &&
 			dev->states_usage[CPUIDLE_DRIVER_STATE_START].disable == 0)
 			data->last_state_idx = CPUIDLE_DRIVER_STATE_START;
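
The effect of the one-line change can be sketched outside the kernel as follows. This is an illustration under stated assumptions, not the governor's code: the helper names and the `multiplier` parameter are hypothetical, the load correction is modelled as a plain integer division as the changelog describes, and the driver/device "disabled" checks from the hunk are left out. Only the threshold of 20 comes from the patch itself.

```c
#include <stdbool.h>

/* Threshold used in the hunk above, in microseconds. */
#define POLL_THRESHOLD_US 20u

/*
 * Old condition: the predicted idle time is first divided by a load
 * correction factor (hypothetical parameter, must be >= 1), and the
 * result is compared against the threshold.
 */
static bool old_check_prefers_c1(unsigned int predicted_us,
				 unsigned int multiplier)
{
	unsigned int interactivity_req = predicted_us / multiplier;

	return interactivity_req > POLL_THRESHOLD_US;
}

/* New condition: the uncorrected prediction is compared directly. */
static bool new_check_prefers_c1(unsigned int predicted_us)
{
	return predicted_us > POLL_THRESHOLD_US;
}
```

For a predicted idle time of 150 us and an assumed factor of 10, the old check sees 15 and stays with polling, while the new check sees 150 and defaults to C1. With a factor of 1 the two checks agree.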