diff mbox

[RFC] cpuidle: mvebu: indicate failure to enter deeper sleep states

Message ID E1ZiI7T-0002zo-Ez@rmk-PC.arm.linux.org.uk (mailing list archive)
State RFC, archived
Headers show

Commit Message

Russell King Oct. 3, 2015, 8:24 a.m. UTC
The cpuidle ->enter method expects the return value to be the sleep
state we entered.  Returning negative numbers or other codes is not
permissible since coupled CPU idle was merged.

At least some of the mvebu_v7_cpu_suspend() implementations return the
value from cpu_suspend(), which returns zero if the CPU vectors back
into the kernel via cpu_resume() (the success case), or the non-zero
return value of the suspend actor, or one (failure cases).

We do not want to be returning the failure case value back to CPU idle
as that indicates that we successfully entered one of the deeper idle
states.  Always return zero instead, indicating that we slept for the
shortest amount of time.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
---
Please think about this - having only an Armada 388 platform, I can't
test this due to CPU idle being disabled there.  However, the code paths
indicate that if we fail to sleep in armada_370_xp_pmsu_idle_enter() or
armada_38x_do_cpu_suspend() fail to sleep, we end up reporting that we
successfully slept in CPU idle state 1 in both cases.

Rafael - the code in cpuidle.c prior to the addition of coupled CPU idle
seems to allow the return of negative error numbers from the ->enter()
method:

        if (entered_state >= 0) {
                /* Update cpuidle counters */
                /* This can be moved to within driver enter routine
                 * but that results in multiple copies of same code.
                 */
                dev->states_usage[entered_state].time += dev->last_residency;
                dev->states_usage[entered_state].usage++;
        } else {
                dev->last_residency = 0;
        }

Since coupled CPU idle, negative return codes will give a negative table
index inside cpuidle_state_is_coupled() - which is obviously buggy.  Does
this need fixing?  If so, this patch below is wrong, and we should be
returning a negative errno rather than zero on failure to enter a low
power state.

 drivers/cpuidle/cpuidle-mvebu-v7.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Comments

Daniel Lezcano Oct. 5, 2015, 11:45 a.m. UTC | #1
On 10/03/2015 10:24 AM, Russell King wrote:
> The cpuidle ->enter method expects the return value to be the sleep
> state we entered.  Returning negative numbers or other codes is not
> permissible since coupled CPU idle was merged.



> At least some of the mvebu_v7_cpu_suspend() implementations return the
> value from cpu_suspend(), which returns zero if the CPU vectors back
> into the kernel via cpu_resume() (the success case), or the non-zero
> return value of the suspend actor, or one (failure cases).
>
> We do not want to be returning the failure case value back to CPU idle
> as that indicates that we successfully entered one of the deeper idle
> states.  Always return zero instead, indicating that we slept for the
> shortest amount of time.
>
> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
> ---
> Please think about this - having only an Armada 388 platform, I can't
> test this due to CPU idle being disabled there.  However, the code paths
> indicate that if we fail to sleep in armada_370_xp_pmsu_idle_enter() or
> armada_38x_do_cpu_suspend() fail to sleep, we end up reporting that we
> successfully slept in CPU idle state 1 in both cases.
>
> Rafael - the code in cpuidle.c prior to the addition of coupled CPU idle
> seems to allow the return of negative error numbers from the ->enter()
> method:
>
>          if (entered_state >= 0) {
>                  /* Update cpuidle counters */
>                  /* This can be moved to within driver enter routine
>                   * but that results in multiple copies of same code.
>                   */
>                  dev->states_usage[entered_state].time += dev->last_residency;
>                  dev->states_usage[entered_state].usage++;
>          } else {
>                  dev->last_residency = 0;
>          }
>
> Since coupled CPU idle, negative return codes will give a negative table
> index inside cpuidle_state_is_coupled() - which is obviously buggy.  Does
> this need fixing?  If so, this patch below is wrong, and we should be
> returning a negative errno rather than zero on failure to enter a low
> power state.

It is not possible since the commit 0b89e9aa2.

This change was to prevent to re-enable the local interrupt when we are 
in the process of coupling an idle state and let the coupled code to 
handle that.

If the backend driver changed the state in our back (for example to a 
shallower state) to a non coupled one, the local interrupt will be 
re-enabled. If at the contrary, the backend driver choose a deeper idle 
state which is coupled, we won't re-enable the interrupt. I don't know 
any driver doing this but the code allows it (not sure it is a good 
thing - anyway).

So the test:

if (!cpuidle_state_is_coupled(dev, drv, entered_state))

should be:

if (!cpuidle_state_is_coupled(dev, drv, *index*))

... in order to have the decisions made when entering this function 
symmetric to the ones which will be taken when exiting the function 
because they will be based on the same index.

It is the same behavior than the broadcast timer enter/exit in the same 
function.

To be clean, the above should be fixed.

But in any case, armada_370_xp_pmsu_idle_enter and 
armada_38x_do_cpu_suspend can return a negative value because they don't 
depend on the ARCH_NEEDS_CPU_IDLE_COUPLED, and cpuidle_state_is_coupled 
is an inline for 'return false'.

  -- Daniel
Russell King (Oracle) May 17, 2016, 9:20 a.m. UTC | #2
On Mon, Oct 05, 2015 at 01:45:21PM +0200, Daniel Lezcano wrote:
> On 10/03/2015 10:24 AM, Russell King wrote:
> >The cpuidle ->enter method expects the return value to be the sleep
> >state we entered.  Returning negative numbers or other codes is not
> >permissible since coupled CPU idle was merged.
> 
> 
> 
> >At least some of the mvebu_v7_cpu_suspend() implementations return the
> >value from cpu_suspend(), which returns zero if the CPU vectors back
> >into the kernel via cpu_resume() (the success case), or the non-zero
> >return value of the suspend actor, or one (failure cases).
> >
> >We do not want to be returning the failure case value back to CPU idle
> >as that indicates that we successfully entered one of the deeper idle
> >states.  Always return zero instead, indicating that we slept for the
> >shortest amount of time.
> >
> >Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
> >---
> >Please think about this - having only an Armada 388 platform, I can't
> >test this due to CPU idle being disabled there.  However, the code paths
> >indicate that if we fail to sleep in armada_370_xp_pmsu_idle_enter() or
> >armada_38x_do_cpu_suspend() fail to sleep, we end up reporting that we
> >successfully slept in CPU idle state 1 in both cases.
> >
> >Rafael - the code in cpuidle.c prior to the addition of coupled CPU idle
> >seems to allow the return of negative error numbers from the ->enter()
> >method:
> >
> >         if (entered_state >= 0) {
> >                 /* Update cpuidle counters */
> >                 /* This can be moved to within driver enter routine
> >                  * but that results in multiple copies of same code.
> >                  */
> >                 dev->states_usage[entered_state].time += dev->last_residency;
> >                 dev->states_usage[entered_state].usage++;
> >         } else {
> >                 dev->last_residency = 0;
> >         }
> >
> >Since coupled CPU idle, negative return codes will give a negative table
> >index inside cpuidle_state_is_coupled() - which is obviously buggy.  Does
> >this need fixing?  If so, this patch below is wrong, and we should be
> >returning a negative errno rather than zero on failure to enter a low
> >power state.
> 
> It is not possible since the commit 0b89e9aa2.
> 
> This change was to prevent to re-enable the local interrupt when we are in
> the process of coupling an idle state and let the coupled code to handle
> that.
> 
> If the backend driver changed the state in our back (for example to a
> shallower state) to a non coupled one, the local interrupt will be
> re-enabled. If at the contrary, the backend driver choose a deeper idle
> state which is coupled, we won't re-enable the interrupt. I don't know any
> driver doing this but the code allows it (not sure it is a good thing -
> anyway).
> 
> So the test:
> 
> if (!cpuidle_state_is_coupled(dev, drv, entered_state))
> 
> should be:
> 
> if (!cpuidle_state_is_coupled(dev, drv, *index*))
> 
> ... in order to have the decisions made when entering this function
> symmetric to the ones which will be taken when exiting the function because
> they will be based on the same index.

Is someone going to fix this seven month bug?
diff mbox

Patch

diff --git a/drivers/cpuidle/cpuidle-mvebu-v7.c b/drivers/cpuidle/cpuidle-mvebu-v7.c
index 01a856971f05..18ded9e7cb34 100644
--- a/drivers/cpuidle/cpuidle-mvebu-v7.c
+++ b/drivers/cpuidle/cpuidle-mvebu-v7.c
@@ -39,8 +39,12 @@  static int mvebu_v7_enter_idle(struct cpuidle_device *dev,
 	ret = mvebu_v7_cpu_suspend(deepidle);
 	cpu_pm_exit();
 
+	/*
+	 * If we failed to enter the desired state, indicate that we
+	 * slept lightly.
+	 */
 	if (ret)
-		return ret;
+		return 0;
 
 	return index;
 }