diff mbox

[2/2] PM / Domains: Avoid infinite loops in attach/detach code

Message ID 1434958282-27376-1-git-send-email-geert+renesas@glider.be (mailing list archive)
State Superseded
Delegated to: Geert Uytterhoeven
Headers show

Commit Message

Geert Uytterhoeven June 22, 2015, 7:31 a.m. UTC
If pm_genpd_{add,remove}_device() keeps on failing with -EAGAIN, we end
up with an infinite loop in genpd_dev_pm_{at,de}tach().

This may happen due to a genpd.prepared_count imbalance.  This is a bug
elsewhere, but it will result in a system lock up, possibly during
reboot of an otherwise functioning system.

To avoid this, put a limit on the maximum number of loop iterations,
including a simple back-off mechanism.  If the limit is reached, the
operation will just fail.  An error message is already printed.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
---
 drivers/base/power/domain.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

Comments

Rafael J. Wysocki June 22, 2015, 11:41 p.m. UTC | #1
On Monday, June 22, 2015 09:31:22 AM Geert Uytterhoeven wrote:
> If pm_genpd_{add,remove}_device() keeps on failing with -EAGAIN, we end
> up with an infinite loop in genpd_dev_pm_{at,de}tach().
> 
> This may happen due to a genpd.prepared_count imbalance.  This is a bug
> elsewhere, but it will result in a system lock up, possibly during
> reboot of an otherwise functioning system.
> 
> To avoid this, put a limit on the maximum number of loop iterations,
> including a simple back-off mechanism.  If the limit is reached, the
> operation will just fail.  An error message is already printed.
> 
> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>

This was too late for my first 4.2 pull request, but I may push it in the
second half of the merge window.  Does it depend on anything?

Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
Geert Uytterhoeven June 23, 2015, 7:16 a.m. UTC | #2
Hi Rafael,

On Tue, Jun 23, 2015 at 1:41 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> On Monday, June 22, 2015 09:31:22 AM Geert Uytterhoeven wrote:
>> If pm_genpd_{add,remove}_device() keeps on failing with -EAGAIN, we end
>> up with an infinite loop in genpd_dev_pm_{at,de}tach().
>>
>> This may happen due to a genpd.prepared_count imbalance.  This is a bug
>> elsewhere, but it will result in a system lock up, possibly during
>> reboot of an otherwise functioning system.
>>
>> To avoid this, put a limit on the maximum number of loop iterations,
>> including a simple back-off mechanism.  If the limit is reached, the
>> operation will just fail.  An error message is already printed.
>>
>> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
>
> This was too late for my first 4.2 pull request, but I may push it in the
> second half of the merge window.  Does it depend on anything?

Not that I'm aware of.

It applies fine on v4.1 or next-20150622.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
Ulf Hansson June 23, 2015, 12:50 p.m. UTC | #3
On 22 June 2015 at 09:31, Geert Uytterhoeven <geert+renesas@glider.be> wrote:
> If pm_genpd_{add,remove}_device() keeps on failing with -EAGAIN, we end
> up with an infinite loop in genpd_dev_pm_{at,de}tach().
>
> This may happen due to a genpd.prepared_count imbalance.  This is a bug
> elsewhere, but it will result in a system lock up, possibly during
> reboot of an otherwise functioning system.
>
> To avoid this, put a limit on the maximum number of loop iterations,
> including a simple back-off mechanism.  If the limit is reached, the
> operation will just fail.  An error message is already printed.
>
> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
> ---
>  drivers/base/power/domain.c | 16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
> index cdd547bd67df8218..60e0309dd8dd0264 100644
> --- a/drivers/base/power/domain.c
> +++ b/drivers/base/power/domain.c
> @@ -6,6 +6,7 @@
>   * This file is released under the GPLv2.
>   */
>
> +#include <linux/delay.h>
>  #include <linux/kernel.h>
>  #include <linux/io.h>
>  #include <linux/platform_device.h>
> @@ -19,6 +20,9 @@
>  #include <linux/suspend.h>
>  #include <linux/export.h>
>
> +#define GENPD_RETRIES  20
> +#define GENPD_DELAY_US 10
> +
>  #define GENPD_DEV_CALLBACK(genpd, type, callback, dev)         \
>  ({                                                             \
>         type (*__routine)(struct device *__d);                  \
> @@ -2131,6 +2135,7 @@ EXPORT_SYMBOL_GPL(of_genpd_get_from_provider);
>  static void genpd_dev_pm_detach(struct device *dev, bool power_off)
>  {
>         struct generic_pm_domain *pd;
> +       unsigned int i;
>         int ret = 0;
>
>         pd = pm_genpd_lookup_dev(dev);
> @@ -2139,10 +2144,13 @@ static void genpd_dev_pm_detach(struct device *dev, bool power_off)
>
>         dev_dbg(dev, "removing from PM domain %s\n", pd->name);
>
> -       while (1) {
> +       for (i = 0; i < GENPD_RETRIES; i++) {
>                 ret = pm_genpd_remove_device(pd, dev);
>                 if (ret != -EAGAIN)
>                         break;
> +
> +               if (i > GENPD_RETRIES / 2)
> +                       udelay(GENPD_DELAY_US);
>                 cond_resched();
>         }
>
> @@ -2183,6 +2191,7 @@ int genpd_dev_pm_attach(struct device *dev)
>  {
>         struct of_phandle_args pd_args;
>         struct generic_pm_domain *pd;
> +       unsigned int i;
>         int ret;
>
>         if (!dev->of_node)
> @@ -2218,10 +2227,13 @@ int genpd_dev_pm_attach(struct device *dev)
>
>         dev_dbg(dev, "adding to PM domain %s\n", pd->name);
>
> -       while (1) {
> +       for (i = 0; i < GENPD_RETRIES; i++) {
>                 ret = pm_genpd_add_device(pd, dev);
>                 if (ret != -EAGAIN)
>                         break;
> +
> +               if (i > GENPD_RETRIES / 2)
> +                       udelay(GENPD_DELAY_US);

In this execution path, we retry when getting -EAGAIN while believing
the reason to the error are only *temporary* as we are soon waiting
for all devices in the genpd to be system PM resumed. At least that's
my understanding to why we want to deal with -EAGAIN here, but I might
be wrong.

In this regards, I wonder whether it could be better to re-try only a
few times but with a far longer interval time than a couple us. What
do you think?

However, what if the reason to why we get -EAGAIN isn't *temporary*,
because we are about to enter system PM suspend state. Then the caller
of this function which comes via some bus' ->probe(), will hang until
the a system PM resume is completed. Is that really going to work? So,
for this case your limited re-try approach will affect this scenario
as well, have you considered that?

>                 cond_resched();
>         }
>
> --
> 1.9.1
>

Kind regards
Uffe
--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Geert Uytterhoeven June 23, 2015, 1:20 p.m. UTC | #4
Hi Ulf,

On Tue, Jun 23, 2015 at 2:50 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> On 22 June 2015 at 09:31, Geert Uytterhoeven <geert+renesas@glider.be> wrote:
>> If pm_genpd_{add,remove}_device() keeps on failing with -EAGAIN, we end
>> up with an infinite loop in genpd_dev_pm_{at,de}tach().
>>
>> This may happen due to a genpd.prepared_count imbalance.  This is a bug
>> elsewhere, but it will result in a system lock up, possibly during
>> reboot of an otherwise functioning system.
>>
>> To avoid this, put a limit on the maximum number of loop iterations,
>> including a simple back-off mechanism.  If the limit is reached, the
>> operation will just fail.  An error message is already printed.
>>
>> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
>> ---
>>  drivers/base/power/domain.c | 16 ++++++++++++++--
>>  1 file changed, 14 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
>> index cdd547bd67df8218..60e0309dd8dd0264 100644
>> --- a/drivers/base/power/domain.c
>> +++ b/drivers/base/power/domain.c
>> @@ -6,6 +6,7 @@
>>   * This file is released under the GPLv2.
>>   */
>>
>> +#include <linux/delay.h>
>>  #include <linux/kernel.h>
>>  #include <linux/io.h>
>>  #include <linux/platform_device.h>
>> @@ -19,6 +20,9 @@
>>  #include <linux/suspend.h>
>>  #include <linux/export.h>
>>
>> +#define GENPD_RETRIES  20
>> +#define GENPD_DELAY_US 10
>> +
>>  #define GENPD_DEV_CALLBACK(genpd, type, callback, dev)         \
>>  ({                                                             \
>>         type (*__routine)(struct device *__d);                  \
>> @@ -2131,6 +2135,7 @@ EXPORT_SYMBOL_GPL(of_genpd_get_from_provider);
>>  static void genpd_dev_pm_detach(struct device *dev, bool power_off)
>>  {
>>         struct generic_pm_domain *pd;
>> +       unsigned int i;
>>         int ret = 0;
>>
>>         pd = pm_genpd_lookup_dev(dev);
>> @@ -2139,10 +2144,13 @@ static void genpd_dev_pm_detach(struct device *dev, bool power_off)
>>
>>         dev_dbg(dev, "removing from PM domain %s\n", pd->name);
>>
>> -       while (1) {
>> +       for (i = 0; i < GENPD_RETRIES; i++) {
>>                 ret = pm_genpd_remove_device(pd, dev);
>>                 if (ret != -EAGAIN)
>>                         break;
>> +
>> +               if (i > GENPD_RETRIES / 2)
>> +                       udelay(GENPD_DELAY_US);
>>                 cond_resched();
>>         }
>>
>> @@ -2183,6 +2191,7 @@ int genpd_dev_pm_attach(struct device *dev)
>>  {
>>         struct of_phandle_args pd_args;
>>         struct generic_pm_domain *pd;
>> +       unsigned int i;
>>         int ret;
>>
>>         if (!dev->of_node)
>> @@ -2218,10 +2227,13 @@ int genpd_dev_pm_attach(struct device *dev)
>>
>>         dev_dbg(dev, "adding to PM domain %s\n", pd->name);
>>
>> -       while (1) {
>> +       for (i = 0; i < GENPD_RETRIES; i++) {
>>                 ret = pm_genpd_add_device(pd, dev);
>>                 if (ret != -EAGAIN)
>>                         break;
>> +
>> +               if (i > GENPD_RETRIES / 2)
>> +                       udelay(GENPD_DELAY_US);
>
> In this execution path, we retry when getting -EAGAIN while believing
> the reason to the error are only *temporary* as we are soon waiting
> for all devices in the genpd to be system PM resumed. At least that's
> my understanding to why we want to deal with -EAGAIN here, but I might
> be wrong.
>
> In this regards, I wonder whether it could be better to re-try only a
> few times but with a far longer interval time than a couple us. What
> do you think?

That's indeed viable. I have no idea for how long this temporary state can
extend.

> However, what if the reason to why we get -EAGAIN isn't *temporary*,
> because we are about to enter system PM suspend state. Then the caller
> of this function which comes via some bus' ->probe(), will hang until
> the a system PM resume is completed. Is that really going to work? So,
> for this case your limited re-try approach will affect this scenario
> as well, have you considered that?

There's a limit on the number of retries, so it won't hang indefinitely.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki June 23, 2015, 1:38 p.m. UTC | #5
On Tue, Jun 23, 2015 at 3:20 PM, Geert Uytterhoeven
<geert@linux-m68k.org> wrote:
> Hi Ulf,
>
> On Tue, Jun 23, 2015 at 2:50 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
>> On 22 June 2015 at 09:31, Geert Uytterhoeven <geert+renesas@glider.be> wrote:
>>> If pm_genpd_{add,remove}_device() keeps on failing with -EAGAIN, we end
>>> up with an infinite loop in genpd_dev_pm_{at,de}tach().
>>>
>>> This may happen due to a genpd.prepared_count imbalance.  This is a bug
>>> elsewhere, but it will result in a system lock up, possibly during
>>> reboot of an otherwise functioning system.
>>>
>>> To avoid this, put a limit on the maximum number of loop iterations,
>>> including a simple back-off mechanism.  If the limit is reached, the
>>> operation will just fail.  An error message is already printed.
>>>
>>> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
>>> ---
>>>  drivers/base/power/domain.c | 16 ++++++++++++++--
>>>  1 file changed, 14 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
>>> index cdd547bd67df8218..60e0309dd8dd0264 100644
>>> --- a/drivers/base/power/domain.c
>>> +++ b/drivers/base/power/domain.c
>>> @@ -6,6 +6,7 @@
>>>   * This file is released under the GPLv2.
>>>   */
>>>
>>> +#include <linux/delay.h>
>>>  #include <linux/kernel.h>
>>>  #include <linux/io.h>
>>>  #include <linux/platform_device.h>
>>> @@ -19,6 +20,9 @@
>>>  #include <linux/suspend.h>
>>>  #include <linux/export.h>
>>>
>>> +#define GENPD_RETRIES  20
>>> +#define GENPD_DELAY_US 10
>>> +
>>>  #define GENPD_DEV_CALLBACK(genpd, type, callback, dev)         \
>>>  ({                                                             \
>>>         type (*__routine)(struct device *__d);                  \
>>> @@ -2131,6 +2135,7 @@ EXPORT_SYMBOL_GPL(of_genpd_get_from_provider);
>>>  static void genpd_dev_pm_detach(struct device *dev, bool power_off)
>>>  {
>>>         struct generic_pm_domain *pd;
>>> +       unsigned int i;
>>>         int ret = 0;
>>>
>>>         pd = pm_genpd_lookup_dev(dev);
>>> @@ -2139,10 +2144,13 @@ static void genpd_dev_pm_detach(struct device *dev, bool power_off)
>>>
>>>         dev_dbg(dev, "removing from PM domain %s\n", pd->name);
>>>
>>> -       while (1) {
>>> +       for (i = 0; i < GENPD_RETRIES; i++) {
>>>                 ret = pm_genpd_remove_device(pd, dev);
>>>                 if (ret != -EAGAIN)
>>>                         break;
>>> +
>>> +               if (i > GENPD_RETRIES / 2)
>>> +                       udelay(GENPD_DELAY_US);
>>>                 cond_resched();
>>>         }
>>>
>>> @@ -2183,6 +2191,7 @@ int genpd_dev_pm_attach(struct device *dev)
>>>  {
>>>         struct of_phandle_args pd_args;
>>>         struct generic_pm_domain *pd;
>>> +       unsigned int i;
>>>         int ret;
>>>
>>>         if (!dev->of_node)
>>> @@ -2218,10 +2227,13 @@ int genpd_dev_pm_attach(struct device *dev)
>>>
>>>         dev_dbg(dev, "adding to PM domain %s\n", pd->name);
>>>
>>> -       while (1) {
>>> +       for (i = 0; i < GENPD_RETRIES; i++) {
>>>                 ret = pm_genpd_add_device(pd, dev);
>>>                 if (ret != -EAGAIN)
>>>                         break;
>>> +
>>> +               if (i > GENPD_RETRIES / 2)
>>> +                       udelay(GENPD_DELAY_US);
>>
>> In this execution path, we retry when getting -EAGAIN while believing
>> the reason to the error are only *temporary* as we are soon waiting
>> for all devices in the genpd to be system PM resumed. At least that's
>> my understanding to why we want to deal with -EAGAIN here, but I might
>> be wrong.
>>
>> In this regards, I wonder whether it could be better to re-try only a
>> few times but with a far longer interval time than a couple us. What
>> do you think?
>
> That's indeed viable. I have no idea for how long this temporary state can
> extend.

A usual approach to this kind of thing is to use exponential fallback
where you increase the delay twice with respect to the previous one
every time.

Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Geert Uytterhoeven June 23, 2015, 1:45 p.m. UTC | #6
Hi Rafael,

On Tue, Jun 23, 2015 at 3:38 PM, Rafael J. Wysocki <rafael@kernel.org> wrote:
>>>> @@ -2218,10 +2227,13 @@ int genpd_dev_pm_attach(struct device *dev)
>>>>
>>>>         dev_dbg(dev, "adding to PM domain %s\n", pd->name);
>>>>
>>>> -       while (1) {
>>>> +       for (i = 0; i < GENPD_RETRIES; i++) {
>>>>                 ret = pm_genpd_add_device(pd, dev);
>>>>                 if (ret != -EAGAIN)
>>>>                         break;
>>>> +
>>>> +               if (i > GENPD_RETRIES / 2)
>>>> +                       udelay(GENPD_DELAY_US);
>>>
>>> In this execution path, we retry when getting -EAGAIN while believing
>>> the reason to the error are only *temporary* as we are soon waiting
>>> for all devices in the genpd to be system PM resumed. At least that's
>>> my understanding to why we want to deal with -EAGAIN here, but I might
>>> be wrong.
>>>
>>> In this regards, I wonder whether it could be better to re-try only a
>>> few times but with a far longer interval time than a couple us. What
>>> do you think?
>>
>> That's indeed viable. I have no idea for how long this temporary state can
>> extend.
>
> A usual approach to this kind of thing is to use exponential fallback
> where you increase the delay twice with respect to the previous one
> every time.

Right, but when do you give up?

Note that udelay() is a busy loop. Should it fall back to msleep() after
a while?

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ulf Hansson June 24, 2015, 8:33 a.m. UTC | #7
[...]

>>>
>>> @@ -2183,6 +2191,7 @@ int genpd_dev_pm_attach(struct device *dev)
>>>  {
>>>         struct of_phandle_args pd_args;
>>>         struct generic_pm_domain *pd;
>>> +       unsigned int i;
>>>         int ret;
>>>
>>>         if (!dev->of_node)
>>> @@ -2218,10 +2227,13 @@ int genpd_dev_pm_attach(struct device *dev)
>>>
>>>         dev_dbg(dev, "adding to PM domain %s\n", pd->name);
>>>
>>> -       while (1) {
>>> +       for (i = 0; i < GENPD_RETRIES; i++) {
>>>                 ret = pm_genpd_add_device(pd, dev);
>>>                 if (ret != -EAGAIN)
>>>                         break;
>>> +
>>> +               if (i > GENPD_RETRIES / 2)
>>> +                       udelay(GENPD_DELAY_US);
>>
>> In this execution path, we retry when getting -EAGAIN while believing
>> the reason to the error are only *temporary* as we are soon waiting
>> for all devices in the genpd to be system PM resumed. At least that's
>> my understanding to why we want to deal with -EAGAIN here, but I might
>> be wrong.
>>
>> In this regards, I wonder whether it could be better to re-try only a
>> few times but with a far longer interval time than a couple us. What
>> do you think?
>
> That's indeed viable. I have no idea for how long this temporary state can
> extend.

That will depend on the system PM resume time for the devices residing
in the genpd. So, I guess we need a guestimate then. How about a total
sleep time of a few seconds?

>
>> However, what if the reason to why we get -EAGAIN isn't *temporary*,
>> because we are about to enter system PM suspend state. Then the caller
>> of this function which comes via some bus' ->probe(), will hang until
>> the a system PM resume is completed. Is that really going to work? So,
>> for this case your limited re-try approach will affect this scenario
>> as well, have you considered that?
>
> There's a limit on the number of retries, so it won't hang indefinitely.

What happens with the timer functions (like msleep()) during the
system PM suspend transition?

Kind regards
Uffe
--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Geert Uytterhoeven June 24, 2015, 8:35 a.m. UTC | #8
On Wed, Jun 24, 2015 at 10:33 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> [...]
>
>>>>
>>>> @@ -2183,6 +2191,7 @@ int genpd_dev_pm_attach(struct device *dev)
>>>>  {
>>>>         struct of_phandle_args pd_args;
>>>>         struct generic_pm_domain *pd;
>>>> +       unsigned int i;
>>>>         int ret;
>>>>
>>>>         if (!dev->of_node)
>>>> @@ -2218,10 +2227,13 @@ int genpd_dev_pm_attach(struct device *dev)
>>>>
>>>>         dev_dbg(dev, "adding to PM domain %s\n", pd->name);
>>>>
>>>> -       while (1) {
>>>> +       for (i = 0; i < GENPD_RETRIES; i++) {
>>>>                 ret = pm_genpd_add_device(pd, dev);
>>>>                 if (ret != -EAGAIN)
>>>>                         break;
>>>> +
>>>> +               if (i > GENPD_RETRIES / 2)
>>>> +                       udelay(GENPD_DELAY_US);
>>>
>>> In this execution path, we retry when getting -EAGAIN while believing
>>> the reason to the error are only *temporary* as we are soon waiting
>>> for all devices in the genpd to be system PM resumed. At least that's
>>> my understanding to why we want to deal with -EAGAIN here, but I might
>>> be wrong.
>>>
>>> In this regards, I wonder whether it could be better to re-try only a
>>> few times but with a far longer interval time than a couple us. What
>>> do you think?
>>
>> That's indeed viable. I have no idea for how long this temporary state can
>> extend.
>
> That will depend on the system PM resume time for the devices residing
> in the genpd. So, I guess we need a guestimate then. How about a total
> sleep time of a few seconds?
>
>>
>>> However, what if the reason to why we get -EAGAIN isn't *temporary*,
>>> because we are about to enter system PM suspend state. Then the caller
>>> of this function which comes via some bus' ->probe(), will hang until
>>> the a system PM resume is completed. Is that really going to work? So,
>>> for this case your limited re-try approach will affect this scenario
>>> as well, have you considered that?
>>
>> There's a limit on the number of retries, so it won't hang indefinitely.
>
> What happens with the timer functions (like msleep()) during the
> system PM suspend transition?

I guess we can no longer call msleep() after syscore suspend?

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki June 24, 2015, 1:48 p.m. UTC | #9
On Wednesday, June 24, 2015 10:35:44 AM Geert Uytterhoeven wrote:
> On Wed, Jun 24, 2015 at 10:33 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > [...]
> >
> >>>>
> >>>> @@ -2183,6 +2191,7 @@ int genpd_dev_pm_attach(struct device *dev)
> >>>>  {
> >>>>         struct of_phandle_args pd_args;
> >>>>         struct generic_pm_domain *pd;
> >>>> +       unsigned int i;
> >>>>         int ret;
> >>>>
> >>>>         if (!dev->of_node)
> >>>> @@ -2218,10 +2227,13 @@ int genpd_dev_pm_attach(struct device *dev)
> >>>>
> >>>>         dev_dbg(dev, "adding to PM domain %s\n", pd->name);
> >>>>
> >>>> -       while (1) {
> >>>> +       for (i = 0; i < GENPD_RETRIES; i++) {
> >>>>                 ret = pm_genpd_add_device(pd, dev);
> >>>>                 if (ret != -EAGAIN)
> >>>>                         break;
> >>>> +
> >>>> +               if (i > GENPD_RETRIES / 2)
> >>>> +                       udelay(GENPD_DELAY_US);
> >>>
> >>> In this execution path, we retry when getting -EAGAIN while believing
> >>> the reason to the error are only *temporary* as we are soon waiting
> >>> for all devices in the genpd to be system PM resumed. At least that's
> >>> my understanding to why we want to deal with -EAGAIN here, but I might
> >>> be wrong.
> >>>
> >>> In this regards, I wonder whether it could be better to re-try only a
> >>> few times but with a far longer interval time than a couple us. What
> >>> do you think?
> >>
> >> That's indeed viable. I have no idea for how long this temporary state can
> >> extend.
> >
> > That will depend on the system PM resume time for the devices residing
> > in the genpd. So, I guess we need a guestimate then. How about a total
> > sleep time of a few seconds?
> >
> >>
> >>> However, what if the reason to why we get -EAGAIN isn't *temporary*,
> >>> because we are about to enter system PM suspend state. Then the caller
> >>> of this function which comes via some bus' ->probe(), will hang until
> >>> the a system PM resume is completed. Is that really going to work? So,
> >>> for this case your limited re-try approach will affect this scenario
> >>> as well, have you considered that?
> >>
> >> There's a limit on the number of retries, so it won't hang indefinitely.
> >
> > What happens with the timer functions (like msleep()) during the
> > system PM suspend transition?
> 
> I guess we can no longer call msleep() after syscore suspend?

That's correct.  Time is effectively frozen at that point.

Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki June 24, 2015, 2:10 p.m. UTC | #10
On Tuesday, June 23, 2015 03:45:43 PM Geert Uytterhoeven wrote:
> Hi Rafael,
> 
> On Tue, Jun 23, 2015 at 3:38 PM, Rafael J. Wysocki <rafael@kernel.org> wrote:
> >>>> @@ -2218,10 +2227,13 @@ int genpd_dev_pm_attach(struct device *dev)
> >>>>
> >>>>         dev_dbg(dev, "adding to PM domain %s\n", pd->name);
> >>>>
> >>>> -       while (1) {
> >>>> +       for (i = 0; i < GENPD_RETRIES; i++) {
> >>>>                 ret = pm_genpd_add_device(pd, dev);
> >>>>                 if (ret != -EAGAIN)
> >>>>                         break;
> >>>> +
> >>>> +               if (i > GENPD_RETRIES / 2)
> >>>> +                       udelay(GENPD_DELAY_US);
> >>>
> >>> In this execution path, we retry when getting -EAGAIN while believing
> >>> the reason to the error are only *temporary* as we are soon waiting
> >>> for all devices in the genpd to be system PM resumed. At least that's
> >>> my understanding to why we want to deal with -EAGAIN here, but I might
> >>> be wrong.
> >>>
> >>> In this regards, I wonder whether it could be better to re-try only a
> >>> few times but with a far longer interval time than a couple us. What
> >>> do you think?
> >>
> >> That's indeed viable. I have no idea for how long this temporary state can
> >> extend.
> >
> > A usual approach to this kind of thing is to use exponential fallback
> > where you increase the delay twice with respect to the previous one
> > every time.
> 
> Right, but when do you give up?

Well, I guess you know what a reasonable timeout should be?

> Note that udelay() is a busy loop. Should it fall back to msleep() after
> a while?

If we can't fall back to msleep() at one point, you may as well simply poll
periodically as you did originally.

Thanks,
Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
index cdd547bd67df8218..60e0309dd8dd0264 100644
--- a/drivers/base/power/domain.c
+++ b/drivers/base/power/domain.c
@@ -6,6 +6,7 @@ 
  * This file is released under the GPLv2.
  */
 
+#include <linux/delay.h>
 #include <linux/kernel.h>
 #include <linux/io.h>
 #include <linux/platform_device.h>
@@ -19,6 +20,9 @@ 
 #include <linux/suspend.h>
 #include <linux/export.h>
 
+#define GENPD_RETRIES	20
+#define GENPD_DELAY_US	10
+
 #define GENPD_DEV_CALLBACK(genpd, type, callback, dev)		\
 ({								\
 	type (*__routine)(struct device *__d); 			\
@@ -2131,6 +2135,7 @@  EXPORT_SYMBOL_GPL(of_genpd_get_from_provider);
 static void genpd_dev_pm_detach(struct device *dev, bool power_off)
 {
 	struct generic_pm_domain *pd;
+	unsigned int i;
 	int ret = 0;
 
 	pd = pm_genpd_lookup_dev(dev);
@@ -2139,10 +2144,13 @@  static void genpd_dev_pm_detach(struct device *dev, bool power_off)
 
 	dev_dbg(dev, "removing from PM domain %s\n", pd->name);
 
-	while (1) {
+	for (i = 0; i < GENPD_RETRIES; i++) {
 		ret = pm_genpd_remove_device(pd, dev);
 		if (ret != -EAGAIN)
 			break;
+
+		if (i > GENPD_RETRIES / 2)
+			udelay(GENPD_DELAY_US);
 		cond_resched();
 	}
 
@@ -2183,6 +2191,7 @@  int genpd_dev_pm_attach(struct device *dev)
 {
 	struct of_phandle_args pd_args;
 	struct generic_pm_domain *pd;
+	unsigned int i;
 	int ret;
 
 	if (!dev->of_node)
@@ -2218,10 +2227,13 @@  int genpd_dev_pm_attach(struct device *dev)
 
 	dev_dbg(dev, "adding to PM domain %s\n", pd->name);
 
-	while (1) {
+	for (i = 0; i < GENPD_RETRIES; i++) {
 		ret = pm_genpd_add_device(pd, dev);
 		if (ret != -EAGAIN)
 			break;
+
+		if (i > GENPD_RETRIES / 2)
+			udelay(GENPD_DELAY_US);
 		cond_resched();
 	}