diff mbox

cpuidle: Do not use CPUIDLE_DRIVER_STATE_START in cpuidle.c

Message ID 5494512.JcUbNX5Mxs@vostro.rjw.lan (mailing list archive)
State Superseded, archived
Headers show

Commit Message

Rafael J. Wysocki May 27, 2015, 1:36 a.m. UTC
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

The CPUIDLE_DRIVER_STATE_START symbol is defined as 1 only if
CONFIG_ARCH_HAS_CPU_RELAX is set, otherwise it is defined as 0.
However, if CONFIG_ARCH_HAS_CPU_RELAX is set, the first (index 0)
entry in the cpuidle driver's table of states is overwritten with
the default "poll" entry by the core.  The "state" defined by the
"poll" entry doesn't provide ->enter_dead and ->enter_freeze
callbacks and its exit_latency is 0.

For this reason, it is not necessary to use CPUIDLE_DRIVER_STATE_START
in cpuidle_play_dead() (->enter_dead is NULL, so the "poll state"
will be skipped by the loop) and in find_deepest_state() (since
exit_latency is 0, the "poll state" will become the default if the
"s->exit_latency <= latency_req" check is replaced with
"s->exit_latency < latency_req" which may only matter for drivers
providing different states with the same exit_latency).

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/cpuidle/cpuidle.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Daniel Lezcano May 27, 2015, 8:22 a.m. UTC | #1
On 05/27/2015 03:36 AM, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> The CPUIDLE_DRIVER_STATE_START symbol is defined as 1 only if
> CONFIG_ARCH_HAS_CPU_RELAX is set, otherwise it is defined as 0.
> However, if CONFIG_ARCH_HAS_CPU_RELAX is set, the first (index 0)
> entry in the cpuidle driver's table of states is overwritten with
> the default "poll" entry by the core.  The "state" defined by the
> "poll" entry doesn't provide ->enter_dead and ->enter_freeze
> callbacks and its exit_latency is 0.
>
> For this reason, it is not necessary to use CPUIDLE_DRIVER_STATE_START
> in cpuidle_play_dead() (->enter_dead is NULL, so the "poll state"
> will be skipped by the loop) and in find_deepest_state() (since
> exit_latency is 0, the "poll state" will become the default if the
> "s->exit_latency <= latency_req" check is replaced with
> "s->exit_latency < latency_req" which may only matter for drivers
> providing different states with the same exit_latency).
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

I was about to send *exactly* the same patch :)

Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
preeti May 27, 2015, 11:31 a.m. UTC | #2
On 05/27/2015 07:06 AM, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> The CPUIDLE_DRIVER_STATE_START symbol is defined as 1 only if
> CONFIG_ARCH_HAS_CPU_RELAX is set, otherwise it is defined as 0.
> However, if CONFIG_ARCH_HAS_CPU_RELAX is set, the first (index 0)
> entry in the cpuidle driver's table of states is overwritten with
> the default "poll" entry by the core.  The "state" defined by the
> "poll" entry doesn't provide ->enter_dead and ->enter_freeze
> callbacks and its exit_latency is 0.
> 
> For this reason, it is not necessary to use CPUIDLE_DRIVER_STATE_START
> in cpuidle_play_dead() (->enter_dead is NULL, so the "poll state"
> will be skipped by the loop) and in find_deepest_state() (since
> exit_latency is 0, the "poll state" will become the default if the
> "s->exit_latency <= latency_req" check is replaced with
> "s->exit_latency < latency_req" which may only matter for drivers
> providing different states with the same exit_latency).
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>  drivers/cpuidle/cpuidle.c |    8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
<snip>

> 
> @@ -79,13 +79,13 @@ static int find_deepest_state(struct cpu
>  			      bool freeze)
>  {
>  	unsigned int latency_req = 0;
> -	int i, ret = freeze ? -1 : CPUIDLE_DRIVER_STATE_START - 1;
> +	int i, ret = -ENXIO;
> 
> -	for (i = CPUIDLE_DRIVER_STATE_START; i < drv->state_count; i++) {
> +	for (i = 0; i < drv->state_count; i++) {
>  		struct cpuidle_state *s = &drv->states[i];
>  		struct cpuidle_state_usage *su = &dev->states_usage[i];
> 
> -		if (s->disabled || su->disable || s->exit_latency <= latency_req
> +		if (s->disabled || su->disable || s->exit_latency < latency_req

Prior to this patch,

For drivers on which CPUIDLE_DRIVER_STATE_START takes a value 0 and
whose first idle state has an exit_latency of 0, find_deepest_state()
would return -1 if it failed to find a deeper idle state.

But as an effect of this patch, find_deepest_state() returns 0 in the
above circumstance.

My concern is if these drivers do not intend to enter a polling state
during suspend, this will cause an issue, won't it? This also gets me
wondering if polling state is an acceptable idle state during suspend,
given that the drivers with ARCH_HAS_CPU_RELAX permit entry into it
during suspend today. I would expect the cpus to be in a hardware
defined idle state.

Regards
Preeti U Murthy

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Lezcano May 27, 2015, 12:25 p.m. UTC | #3
On 05/27/2015 01:31 PM, Preeti U Murthy wrote:
> On 05/27/2015 07:06 AM, Rafael J. Wysocki wrote:
>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>
>> The CPUIDLE_DRIVER_STATE_START symbol is defined as 1 only if
>> CONFIG_ARCH_HAS_CPU_RELAX is set, otherwise it is defined as 0.
>> However, if CONFIG_ARCH_HAS_CPU_RELAX is set, the first (index 0)
>> entry in the cpuidle driver's table of states is overwritten with
>> the default "poll" entry by the core.  The "state" defined by the
>> "poll" entry doesn't provide ->enter_dead and ->enter_freeze
>> callbacks and its exit_latency is 0.
>>
>> For this reason, it is not necessary to use CPUIDLE_DRIVER_STATE_START
>> in cpuidle_play_dead() (->enter_dead is NULL, so the "poll state"
>> will be skipped by the loop) and in find_deepest_state() (since
>> exit_latency is 0, the "poll state" will become the default if the
>> "s->exit_latency <= latency_req" check is replaced with
>> "s->exit_latency < latency_req" which may only matter for drivers
>> providing different states with the same exit_latency).
>>
>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> ---
>>   drivers/cpuidle/cpuidle.c |    8 ++++----
>>   1 file changed, 4 insertions(+), 4 deletions(-)
> <snip>
>
>>
>> @@ -79,13 +79,13 @@ static int find_deepest_state(struct cpu
>>   			      bool freeze)
>>   {
>>   	unsigned int latency_req = 0;
>> -	int i, ret = freeze ? -1 : CPUIDLE_DRIVER_STATE_START - 1;
>> +	int i, ret = -ENXIO;
>>
>> -	for (i = CPUIDLE_DRIVER_STATE_START; i < drv->state_count; i++) {
>> +	for (i = 0; i < drv->state_count; i++) {
>>   		struct cpuidle_state *s = &drv->states[i];
>>   		struct cpuidle_state_usage *su = &dev->states_usage[i];
>>
>> -		if (s->disabled || su->disable || s->exit_latency <= latency_req
>> +		if (s->disabled || su->disable || s->exit_latency < latency_req
>
> Prior to this patch,
>
> For drivers on which CPUIDLE_DRIVER_STATE_START takes a value 0 and
> whose first idle state has an exit_latency of 0, find_deepest_state()
> would return -1 if it failed to find a deeper idle state.
> But as an effect of this patch, find_deepest_state() returns 0 in the
> above circumstance.

Except I am missing something, with an exit_latency = 0, the state will 
be never selected, because of the "s->exit_latency < latency_req" 
condition (strictly greater than).

> My concern is if these drivers do not intend to enter a polling state
> during suspend, this will cause an issue, won't it? This also gets me
> wondering if polling state is an acceptable idle state during suspend,
> given that the drivers with ARCH_HAS_CPU_RELAX permit entry into it
> during suspend today.

Definitively poll can cause thermal issues, especially when suspending. 
It is a dangerous state (let's imagine you close your laptop => 
suspend/poll and then put it in your bag for a travel).

I don't think with the code above we can reach this situation but I 
agree this is something we have to take care carefully.

Actually, I am in favour of removing poll at all from the cpuidle driver 
and poll only when a cpuidle state selection fails under certain condition.

So I fully agree with your statement below.

> I would expect the cpus to be in a hardware
> defined idle state.
Rafael J. Wysocki May 27, 2015, 1:57 p.m. UTC | #4
On Wed, May 27, 2015 at 2:25 PM, Daniel Lezcano
<daniel.lezcano@linaro.org> wrote:
> On 05/27/2015 01:31 PM, Preeti U Murthy wrote:
>>
>> On 05/27/2015 07:06 AM, Rafael J. Wysocki wrote:
>>>
>>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>>
>>> The CPUIDLE_DRIVER_STATE_START symbol is defined as 1 only if
>>> CONFIG_ARCH_HAS_CPU_RELAX is set, otherwise it is defined as 0.
>>> However, if CONFIG_ARCH_HAS_CPU_RELAX is set, the first (index 0)
>>> entry in the cpuidle driver's table of states is overwritten with
>>> the default "poll" entry by the core.  The "state" defined by the
>>> "poll" entry doesn't provide ->enter_dead and ->enter_freeze
>>> callbacks and its exit_latency is 0.
>>>
>>> For this reason, it is not necessary to use CPUIDLE_DRIVER_STATE_START
>>> in cpuidle_play_dead() (->enter_dead is NULL, so the "poll state"
>>> will be skipped by the loop) and in find_deepest_state() (since
>>> exit_latency is 0, the "poll state" will become the default if the
>>> "s->exit_latency <= latency_req" check is replaced with
>>> "s->exit_latency < latency_req" which may only matter for drivers
>>> providing different states with the same exit_latency).
>>>
>>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>> ---
>>>   drivers/cpuidle/cpuidle.c |    8 ++++----
>>>   1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> <snip>
>>
>>>
>>> @@ -79,13 +79,13 @@ static int find_deepest_state(struct cpu
>>>                               bool freeze)
>>>   {
>>>         unsigned int latency_req = 0;
>>> -       int i, ret = freeze ? -1 : CPUIDLE_DRIVER_STATE_START - 1;
>>> +       int i, ret = -ENXIO;
>>>
>>> -       for (i = CPUIDLE_DRIVER_STATE_START; i < drv->state_count; i++) {
>>> +       for (i = 0; i < drv->state_count; i++) {
>>>                 struct cpuidle_state *s = &drv->states[i];
>>>                 struct cpuidle_state_usage *su = &dev->states_usage[i];
>>>
>>> -               if (s->disabled || su->disable || s->exit_latency <=
>>> latency_req
>>> +               if (s->disabled || su->disable || s->exit_latency <
>>> latency_req
>>
>>
>> Prior to this patch,
>>
>> For drivers on which CPUIDLE_DRIVER_STATE_START takes a value 0 and
>> whose first idle state has an exit_latency of 0, find_deepest_state()
>> would return -1 if it failed to find a deeper idle state.
>> But as an effect of this patch, find_deepest_state() returns 0 in the
>> above circumstance.
>
>
> Except I am missing something, with an exit_latency = 0, the state will be
> never selected, because of the "s->exit_latency < latency_req" condition
> (strictly greater than).

No, this is the condition to skip the state, so previously it wouldn't
be selected, but after the patch it will.

So yes, the patch changes behavior for systems with
CONFIG_ARCH_HAS_CPU_RELAX unset.

>> My concern is if these drivers do not intend to enter a polling state
>> during suspend, this will cause an issue, won't it?

The change in behavior happens for architectures where
CONFIG_ARCH_HAS_CPU_RELAX is not set.  In those cases the 0-index
state is supposed to be provided by the driver.  Is there a reason to
expect that this may not be a genuine idle state?

>> This also gets me
>> wondering if polling state is an acceptable idle state during suspend,
>> given that the drivers with ARCH_HAS_CPU_RELAX permit entry into it
>> during suspend today.

That actually is a good question.

> Definitively poll can cause thermal issues, especially when suspending. It
> is a dangerous state (let's imagine you close your laptop => suspend/poll
> and then put it in your bag for a travel).

With ARCH_HAS_CPU_RELAX set the "poll state" is supposed to be thermally safe.

> I don't think with the code above we can reach this situation but I agree
> this is something we have to take care carefully.
>
> Actually, I am in favour of removing poll at all from the cpuidle driver and
> poll only when a cpuidle state selection fails under certain condition.
>
> So I fully agree with your statement below.
>
>> I would expect the cpus to be in a hardware
>> defined idle state.

Well, except for the degenerate case in which all of them are disabled
and for the "broadcast timer stopping aviodance" use case for
find_deepest_state().

So there are two questions in my view:
(1) Should find_deepest_state() ever return states with exit_latency equal to 0?
(2) If the answer to (1) is "yes", should the "poll state" be ever
returned by find_deepest_state()?

In any case, find_deepest_state() will only return a state with
exit_latency equal to 0 if there's no other choice and if it returns
nothing, we'll fall back to the architecture idle method.  So the
question really is whether or not falling back to arch idle is any
better than using any state we have in the table.

The patch is based on the assumption that any state from the table
will be better than arch idle, including the "polling state".  If that
does not hold, we'll need to rethink a couple of other things in my
view.

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
preeti May 27, 2015, 4:19 p.m. UTC | #5
On 05/27/2015 07:27 PM, Rafael J. Wysocki wrote:
> On Wed, May 27, 2015 at 2:25 PM, Daniel Lezcano
> <daniel.lezcano@linaro.org> wrote:
>> On 05/27/2015 01:31 PM, Preeti U Murthy wrote:
>>>
>>> On 05/27/2015 07:06 AM, Rafael J. Wysocki wrote:
>>>>
>>>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>>>
>>>> The CPUIDLE_DRIVER_STATE_START symbol is defined as 1 only if
>>>> CONFIG_ARCH_HAS_CPU_RELAX is set, otherwise it is defined as 0.
>>>> However, if CONFIG_ARCH_HAS_CPU_RELAX is set, the first (index 0)
>>>> entry in the cpuidle driver's table of states is overwritten with
>>>> the default "poll" entry by the core.  The "state" defined by the
>>>> "poll" entry doesn't provide ->enter_dead and ->enter_freeze
>>>> callbacks and its exit_latency is 0.
>>>>
>>>> For this reason, it is not necessary to use CPUIDLE_DRIVER_STATE_START
>>>> in cpuidle_play_dead() (->enter_dead is NULL, so the "poll state"
>>>> will be skipped by the loop) and in find_deepest_state() (since
>>>> exit_latency is 0, the "poll state" will become the default if the
>>>> "s->exit_latency <= latency_req" check is replaced with
>>>> "s->exit_latency < latency_req" which may only matter for drivers
>>>> providing different states with the same exit_latency).
>>>>
>>>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>>> ---
>>>>   drivers/cpuidle/cpuidle.c |    8 ++++----
>>>>   1 file changed, 4 insertions(+), 4 deletions(-)
>>>
>>> <snip>
>>>
>>>>
>>>> @@ -79,13 +79,13 @@ static int find_deepest_state(struct cpu
>>>>                               bool freeze)
>>>>   {
>>>>         unsigned int latency_req = 0;
>>>> -       int i, ret = freeze ? -1 : CPUIDLE_DRIVER_STATE_START - 1;
>>>> +       int i, ret = -ENXIO;
>>>>
>>>> -       for (i = CPUIDLE_DRIVER_STATE_START; i < drv->state_count; i++) {
>>>> +       for (i = 0; i < drv->state_count; i++) {
>>>>                 struct cpuidle_state *s = &drv->states[i];
>>>>                 struct cpuidle_state_usage *su = &dev->states_usage[i];
>>>>
>>>> -               if (s->disabled || su->disable || s->exit_latency <=
>>>> latency_req
>>>> +               if (s->disabled || su->disable || s->exit_latency <
>>>> latency_req
>>>
>>>
>>> Prior to this patch,
>>>
>>> For drivers on which CPUIDLE_DRIVER_STATE_START takes a value 0 and
>>> whose first idle state has an exit_latency of 0, find_deepest_state()
>>> would return -1 if it failed to find a deeper idle state.
>>> But as an effect of this patch, find_deepest_state() returns 0 in the
>>> above circumstance.
>>
>>
>> Except I am missing something, with an exit_latency = 0, the state will be
>> never selected, because of the "s->exit_latency < latency_req" condition
>> (strictly greater than).
> 
> No, this is the condition to skip the state, so previously it wouldn't
> be selected, but after the patch it will.
> 
> So yes, the patch changes behavior for systems with
> CONFIG_ARCH_HAS_CPU_RELAX unset.
> 
>>> My concern is if these drivers do not intend to enter a polling state
>>> during suspend, this will cause an issue, won't it?
> 
> The change in behavior happens for architectures where
> CONFIG_ARCH_HAS_CPU_RELAX is not set.  In those cases the 0-index
> state is supposed to be provided by the driver.  Is there a reason to
> expect that this may not be a genuine idle state?

On PowerPC, we have the 0-index idle state, whose exit_latency is 0 and
all that the CPU does in this state, is poll on need_resched(), except
at a lower priority from the hardware's standpoint. Nevertheless, the
CPU is busy polling. So, I would not consider it a genuine idle state.

On a side note, we do not yet support suspend on Power servers, but we
may in the future. Hence the concern.

>> Definitively poll can cause thermal issues, especially when suspending. It
>> is a dangerous state (let's imagine you close your laptop => suspend/poll
>> and then put it in your bag for a travel).
> 
> With ARCH_HAS_CPU_RELAX set the "poll state" is supposed to be thermally safe.
> 
>> I don't think with the code above we can reach this situation but I agree
>> this is something we have to take care carefully.
>>
>> Actually, I am in favour of removing poll at all from the cpuidle driver and
>> poll only when a cpuidle state selection fails under certain condition.
>>
>> So I fully agree with your statement below.
>>
>>> I would expect the cpus to be in a hardware
>>> defined idle state.
> 
> Well, except for the degenerate case in which all of them are disabled
> and for the "broadcast timer stopping aviodance" use case for
> find_deepest_state().

During suspend, the CPUs can very well enter states where the timer
stops since we stop timer interrupts anyway. So unless the idle states
are explicitly disabled by the user/hardware for some reason, deeper
idle states will still be available during suspend as far as I can see.

> 
> So there are two questions in my view:
> (1) Should find_deepest_state() ever return states with exit_latency equal to 0?

I would say no, since such an idle state would mostly be polling on a
wakeup event. Atleast, there is one such case in PowerPC.

> (2) If the answer to (1) is "yes", should the "poll state" be ever
> returned by find_deepest_state()?
> 
> In any case, find_deepest_state() will only return a state with
> exit_latency equal to 0 if there's no other choice and if it returns
> nothing, we'll fall back to the architecture idle method.  So the
> question really is whether or not falling back to arch idle is any
> better than using any state we have in the table.

My suggestion is to *not* fall back to arch idle code, since that is a
black box from the core cpuidle's perspective.

> 
> The patch is based on the assumption that any state from the table
> will be better than arch idle, including the "polling state".  If that
> does not hold, we'll need to rethink a couple of other things in my
> view.

We could fail suspend if a non-polling idle state is not available perhaps ?

Regards
Preeti U Murthy
> 
> Thanks,
> Rafael
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki May 28, 2015, 12:51 a.m. UTC | #6
On Wednesday, May 27, 2015 09:49:54 PM Preeti U Murthy wrote:
> On 05/27/2015 07:27 PM, Rafael J. Wysocki wrote:
> > On Wed, May 27, 2015 at 2:25 PM, Daniel Lezcano
> > <daniel.lezcano@linaro.org> wrote:
> >> On 05/27/2015 01:31 PM, Preeti U Murthy wrote:
> >>>
> >>> On 05/27/2015 07:06 AM, Rafael J. Wysocki wrote:
> >>>>
> >>>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >>>>
> >>>> The CPUIDLE_DRIVER_STATE_START symbol is defined as 1 only if
> >>>> CONFIG_ARCH_HAS_CPU_RELAX is set, otherwise it is defined as 0.
> >>>> However, if CONFIG_ARCH_HAS_CPU_RELAX is set, the first (index 0)
> >>>> entry in the cpuidle driver's table of states is overwritten with
> >>>> the default "poll" entry by the core.  The "state" defined by the
> >>>> "poll" entry doesn't provide ->enter_dead and ->enter_freeze
> >>>> callbacks and its exit_latency is 0.
> >>>>
> >>>> For this reason, it is not necessary to use CPUIDLE_DRIVER_STATE_START
> >>>> in cpuidle_play_dead() (->enter_dead is NULL, so the "poll state"
> >>>> will be skipped by the loop) and in find_deepest_state() (since
> >>>> exit_latency is 0, the "poll state" will become the default if the
> >>>> "s->exit_latency <= latency_req" check is replaced with
> >>>> "s->exit_latency < latency_req" which may only matter for drivers
> >>>> providing different states with the same exit_latency).
> >>>>
> >>>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >>>> ---
> >>>>   drivers/cpuidle/cpuidle.c |    8 ++++----
> >>>>   1 file changed, 4 insertions(+), 4 deletions(-)
> >>>
> >>> <snip>
> >>>
> >>>>
> >>>> @@ -79,13 +79,13 @@ static int find_deepest_state(struct cpu
> >>>>                               bool freeze)
> >>>>   {
> >>>>         unsigned int latency_req = 0;
> >>>> -       int i, ret = freeze ? -1 : CPUIDLE_DRIVER_STATE_START - 1;
> >>>> +       int i, ret = -ENXIO;
> >>>>
> >>>> -       for (i = CPUIDLE_DRIVER_STATE_START; i < drv->state_count; i++) {
> >>>> +       for (i = 0; i < drv->state_count; i++) {
> >>>>                 struct cpuidle_state *s = &drv->states[i];
> >>>>                 struct cpuidle_state_usage *su = &dev->states_usage[i];
> >>>>
> >>>> -               if (s->disabled || su->disable || s->exit_latency <=
> >>>> latency_req
> >>>> +               if (s->disabled || su->disable || s->exit_latency <
> >>>> latency_req
> >>>
> >>>
> >>> Prior to this patch,
> >>>
> >>> For drivers on which CPUIDLE_DRIVER_STATE_START takes a value 0 and
> >>> whose first idle state has an exit_latency of 0, find_deepest_state()
> >>> would return -1 if it failed to find a deeper idle state.
> >>> But as an effect of this patch, find_deepest_state() returns 0 in the
> >>> above circumstance.
> >>
> >>
> >> Except I am missing something, with an exit_latency = 0, the state will be
> >> never selected, because of the "s->exit_latency < latency_req" condition
> >> (strictly greater than).
> > 
> > No, this is the condition to skip the state, so previously it wouldn't
> > be selected, but after the patch it will.
> > 
> > So yes, the patch changes behavior for systems with
> > CONFIG_ARCH_HAS_CPU_RELAX unset.
> > 
> >>> My concern is if these drivers do not intend to enter a polling state
> >>> during suspend, this will cause an issue, won't it?
> > 
> > The change in behavior happens for architectures where
> > CONFIG_ARCH_HAS_CPU_RELAX is not set.  In those cases the 0-index
> > state is supposed to be provided by the driver.  Is there a reason to
> > expect that this may not be a genuine idle state?
> 
> On PowerPC, we have the 0-index idle state, whose exit_latency is 0 and
> all that the CPU does in this state, is poll on need_resched(), except
> at a lower priority from the hardware's standpoint. Nevertheless, the
> CPU is busy polling. So, I would not consider it a genuine idle state.

OK, that's good to know.

Arguably, then, returning states with exit_latency equal to 0 from
find_deepest_state() may not be safe in general.

Well, we can make that rule, so I'll send an updated patch with that taken
into account.

> On a side note, we do not yet support suspend on Power servers, but we
> may in the future. Hence the concern.
> 
> >> Definitively poll can cause thermal issues, especially when suspending. It
> >> is a dangerous state (let's imagine you close your laptop => suspend/poll
> >> and then put it in your bag for a travel).
> > 
> > With ARCH_HAS_CPU_RELAX set the "poll state" is supposed to be thermally safe.
> > 
> >> I don't think with the code above we can reach this situation but I agree
> >> this is something we have to take care carefully.
> >>
> >> Actually, I am in favour of removing poll at all from the cpuidle driver and
> >> poll only when a cpuidle state selection fails under certain condition.
> >>
> >> So I fully agree with your statement below.
> >>
> >>> I would expect the cpus to be in a hardware
> >>> defined idle state.
> > 
> > Well, except for the degenerate case in which all of them are disabled
> > and for the "broadcast timer stopping aviodance" use case for
> > find_deepest_state().
> 
> During suspend, the CPUs can very well enter states where the timer
> stops since we stop timer interrupts anyway.

That's if you provide ->enter_freeze callbacks, otherwise it works like
runtime idle.

> So unless the idle states
> are explicitly disabled by the user/hardware for some reason, deeper
> idle states will still be available during suspend as far as I can see.

The case at hand is when ->enter_freeze callbacks are not available or
all states having them are disabled.

> > So there are two questions in my view:
> > (1) Should find_deepest_state() ever return states with exit_latency equal to 0?
> 
> I would say no, since such an idle state would mostly be polling on a
> wakeup event. Atleast, there is one such case in PowerPC.

OK

> > (2) If the answer to (1) is "yes", should the "poll state" be ever
> > returned by find_deepest_state()?
> > 
> > In any case, find_deepest_state() will only return a state with
> > exit_latency equal to 0 if there's no other choice and if it returns
> > nothing, we'll fall back to the architecture idle method.  So the
> > question really is whether or not falling back to arch idle is any
> > better than using any state we have in the table.
> 
> My suggestion is to *not* fall back to arch idle code, since that is a
> black box from the core cpuidle's perspective.

Well, we do that today in some cases.

> > The patch is based on the assumption that any state from the table
> > will be better than arch idle, including the "polling state".  If that
> > does not hold, we'll need to rethink a couple of other things in my
> > view.
> 
> We could fail suspend if a non-polling idle state is not available perhaps ?

The only thing we can do is to wake up the system immediately if there's
no "genuine" state to enter.  I can send a (separate) patch for that too.
diff mbox

Patch

Index: linux-pm/drivers/cpuidle/cpuidle.c
===================================================================
--- linux-pm.orig/drivers/cpuidle/cpuidle.c
+++ linux-pm/drivers/cpuidle/cpuidle.c
@@ -65,7 +65,7 @@  int cpuidle_play_dead(void)
 		return -ENODEV;
 
 	/* Find lowest-power state that supports long-term idle */
-	for (i = drv->state_count - 1; i >= CPUIDLE_DRIVER_STATE_START; i--)
+	for (i = drv->state_count - 1; i >= 0; i--)
 		if (drv->states[i].enter_dead)
 			return drv->states[i].enter_dead(dev, i);
 
@@ -79,13 +79,13 @@  static int find_deepest_state(struct cpu
 			      bool freeze)
 {
 	unsigned int latency_req = 0;
-	int i, ret = freeze ? -1 : CPUIDLE_DRIVER_STATE_START - 1;
+	int i, ret = -ENXIO;
 
-	for (i = CPUIDLE_DRIVER_STATE_START; i < drv->state_count; i++) {
+	for (i = 0; i < drv->state_count; i++) {
 		struct cpuidle_state *s = &drv->states[i];
 		struct cpuidle_state_usage *su = &dev->states_usage[i];
 
-		if (s->disabled || su->disable || s->exit_latency <= latency_req
+		if (s->disabled || su->disable || s->exit_latency < latency_req
 		    || s->exit_latency > max_latency
 		    || (s->flags & forbidden_flags)
 		    || (freeze && !cpuidle_has_enter_freeze(s)))