Message ID | 1434958282-27376-1-git-send-email-geert+renesas@glider.be (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Geert Uytterhoeven |
Headers | show |
On Monday, June 22, 2015 09:31:22 AM Geert Uytterhoeven wrote: > If pm_genpd_{add,remove}_device() keeps on failing with -EAGAIN, we end > up with an infinite loop in genpd_dev_pm_{at,de}tach(). > > This may happen due to a genpd.prepared_count imbalance. This is a bug > elsewhere, but it will result in a system lock up, possibly during > reboot of an otherwise functioning system. > > To avoid this, put a limit on the maximum number of loop iterations, > including a simple back-off mechanism. If the limit is reached, the > operation will just fail. An error message is already printed. > > Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> This was too late for my first 4.2 pull request, but I may push it in the second half of the merge window. Does it depend on anything? Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-sh" in
Hi Rafael, On Tue, Jun 23, 2015 at 1:41 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: > On Monday, June 22, 2015 09:31:22 AM Geert Uytterhoeven wrote: >> If pm_genpd_{add,remove}_device() keeps on failing with -EAGAIN, we end >> up with an infinite loop in genpd_dev_pm_{at,de}tach(). >> >> This may happen due to a genpd.prepared_count imbalance. This is a bug >> elsewhere, but it will result in a system lock up, possibly during >> reboot of an otherwise functioning system. >> >> To avoid this, put a limit on the maximum number of loop iterations, >> including a simple back-off mechanism. If the limit is reached, the >> operation will just fail. An error message is already printed. >> >> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> > > This was too late for my first 4.2 pull request, but I may push it in the > second half of the merge window. Does it depend on anything? Not that I'm aware of. It applies fine on v4.1 or next-20150622. Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds -- To unsubscribe from this list: send the line "unsubscribe linux-sh" in
On 22 June 2015 at 09:31, Geert Uytterhoeven <geert+renesas@glider.be> wrote: > If pm_genpd_{add,remove}_device() keeps on failing with -EAGAIN, we end > up with an infinite loop in genpd_dev_pm_{at,de}tach(). > > This may happen due to a genpd.prepared_count imbalance. This is a bug > elsewhere, but it will result in a system lock up, possibly during > reboot of an otherwise functioning system. > > To avoid this, put a limit on the maximum number of loop iterations, > including a simple back-off mechanism. If the limit is reached, the > operation will just fail. An error message is already printed. > > Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> > --- > drivers/base/power/domain.c | 16 ++++++++++++++-- > 1 file changed, 14 insertions(+), 2 deletions(-) > > diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c > index cdd547bd67df8218..60e0309dd8dd0264 100644 > --- a/drivers/base/power/domain.c > +++ b/drivers/base/power/domain.c > @@ -6,6 +6,7 @@ > * This file is released under the GPLv2. > */ > > +#include <linux/delay.h> > #include <linux/kernel.h> > #include <linux/io.h> > #include <linux/platform_device.h> > @@ -19,6 +20,9 @@ > #include <linux/suspend.h> > #include <linux/export.h> > > +#define GENPD_RETRIES 20 > +#define GENPD_DELAY_US 10 > + > #define GENPD_DEV_CALLBACK(genpd, type, callback, dev) \ > ({ \ > type (*__routine)(struct device *__d); \ > @@ -2131,6 +2135,7 @@ EXPORT_SYMBOL_GPL(of_genpd_get_from_provider); > static void genpd_dev_pm_detach(struct device *dev, bool power_off) > { > struct generic_pm_domain *pd; > + unsigned int i; > int ret = 0; > > pd = pm_genpd_lookup_dev(dev); > @@ -2139,10 +2144,13 @@ static void genpd_dev_pm_detach(struct device *dev, bool power_off) > > dev_dbg(dev, "removing from PM domain %s\n", pd->name); > > - while (1) { > + for (i = 0; i < GENPD_RETRIES; i++) { > ret = pm_genpd_remove_device(pd, dev); > if (ret != -EAGAIN) > break; > + > + if (i > GENPD_RETRIES / 2) > + udelay(GENPD_DELAY_US); > cond_resched(); > } > > @@ -2183,6 +2191,7 @@ int genpd_dev_pm_attach(struct device *dev) > { > struct of_phandle_args pd_args; > struct generic_pm_domain *pd; > + unsigned int i; > int ret; > > if (!dev->of_node) > @@ -2218,10 +2227,13 @@ int genpd_dev_pm_attach(struct device *dev) > > dev_dbg(dev, "adding to PM domain %s\n", pd->name); > > - while (1) { > + for (i = 0; i < GENPD_RETRIES; i++) { > ret = pm_genpd_add_device(pd, dev); > if (ret != -EAGAIN) > break; > + > + if (i > GENPD_RETRIES / 2) > + udelay(GENPD_DELAY_US); In this execution path, we retry when getting -EAGAIN while believing the reason to the error are only *temporary* as we are soon waiting for all devices in the genpd to be system PM resumed. At least that's my understanding to why we want to deal with -EAGAIN here, but I might be wrong. In this regards, I wonder whether it could be better to re-try only a few times but with a far longer interval time than a couple us. What do you think? However, what if the reason to why we get -EAGAIN isn't *temporary*, because we are about to enter system PM suspend state. Then the caller of this function which comes via some bus' ->probe(), will hang until the a system PM resume is completed. Is that really going to work? So, for this case your limited re-try approach will affect this scenario as well, have you considered that? > cond_resched(); > } > > -- > 1.9.1 > Kind regards Uffe -- To unsubscribe from this list: send the line "unsubscribe linux-sh" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Ulf, On Tue, Jun 23, 2015 at 2:50 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote: > On 22 June 2015 at 09:31, Geert Uytterhoeven <geert+renesas@glider.be> wrote: >> If pm_genpd_{add,remove}_device() keeps on failing with -EAGAIN, we end >> up with an infinite loop in genpd_dev_pm_{at,de}tach(). >> >> This may happen due to a genpd.prepared_count imbalance. This is a bug >> elsewhere, but it will result in a system lock up, possibly during >> reboot of an otherwise functioning system. >> >> To avoid this, put a limit on the maximum number of loop iterations, >> including a simple back-off mechanism. If the limit is reached, the >> operation will just fail. An error message is already printed. >> >> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> >> --- >> drivers/base/power/domain.c | 16 ++++++++++++++-- >> 1 file changed, 14 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c >> index cdd547bd67df8218..60e0309dd8dd0264 100644 >> --- a/drivers/base/power/domain.c >> +++ b/drivers/base/power/domain.c >> @@ -6,6 +6,7 @@ >> * This file is released under the GPLv2. >> */ >> >> +#include <linux/delay.h> >> #include <linux/kernel.h> >> #include <linux/io.h> >> #include <linux/platform_device.h> >> @@ -19,6 +20,9 @@ >> #include <linux/suspend.h> >> #include <linux/export.h> >> >> +#define GENPD_RETRIES 20 >> +#define GENPD_DELAY_US 10 >> + >> #define GENPD_DEV_CALLBACK(genpd, type, callback, dev) \ >> ({ \ >> type (*__routine)(struct device *__d); \ >> @@ -2131,6 +2135,7 @@ EXPORT_SYMBOL_GPL(of_genpd_get_from_provider); >> static void genpd_dev_pm_detach(struct device *dev, bool power_off) >> { >> struct generic_pm_domain *pd; >> + unsigned int i; >> int ret = 0; >> >> pd = pm_genpd_lookup_dev(dev); >> @@ -2139,10 +2144,13 @@ static void genpd_dev_pm_detach(struct device *dev, bool power_off) >> >> dev_dbg(dev, "removing from PM domain %s\n", pd->name); >> >> - while (1) { >> + for (i = 0; i < GENPD_RETRIES; i++) { >> ret = pm_genpd_remove_device(pd, dev); >> if (ret != -EAGAIN) >> break; >> + >> + if (i > GENPD_RETRIES / 2) >> + udelay(GENPD_DELAY_US); >> cond_resched(); >> } >> >> @@ -2183,6 +2191,7 @@ int genpd_dev_pm_attach(struct device *dev) >> { >> struct of_phandle_args pd_args; >> struct generic_pm_domain *pd; >> + unsigned int i; >> int ret; >> >> if (!dev->of_node) >> @@ -2218,10 +2227,13 @@ int genpd_dev_pm_attach(struct device *dev) >> >> dev_dbg(dev, "adding to PM domain %s\n", pd->name); >> >> - while (1) { >> + for (i = 0; i < GENPD_RETRIES; i++) { >> ret = pm_genpd_add_device(pd, dev); >> if (ret != -EAGAIN) >> break; >> + >> + if (i > GENPD_RETRIES / 2) >> + udelay(GENPD_DELAY_US); > > In this execution path, we retry when getting -EAGAIN while believing > the reason to the error are only *temporary* as we are soon waiting > for all devices in the genpd to be system PM resumed. At least that's > my understanding to why we want to deal with -EAGAIN here, but I might > be wrong. > > In this regards, I wonder whether it could be better to re-try only a > few times but with a far longer interval time than a couple us. What > do you think? That's indeed viable. I have no idea for how long this temporary state can extend. > However, what if the reason to why we get -EAGAIN isn't *temporary*, > because we are about to enter system PM suspend state. Then the caller > of this function which comes via some bus' ->probe(), will hang until > the a system PM resume is completed. Is that really going to work? So, > for this case your limited re-try approach will affect this scenario > as well, have you considered that? There's a limit on the number of retries, so it won't hang indefinitely. Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds -- To unsubscribe from this list: send the line "unsubscribe linux-sh" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Jun 23, 2015 at 3:20 PM, Geert Uytterhoeven <geert@linux-m68k.org> wrote: > Hi Ulf, > > On Tue, Jun 23, 2015 at 2:50 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote: >> On 22 June 2015 at 09:31, Geert Uytterhoeven <geert+renesas@glider.be> wrote: >>> If pm_genpd_{add,remove}_device() keeps on failing with -EAGAIN, we end >>> up with an infinite loop in genpd_dev_pm_{at,de}tach(). >>> >>> This may happen due to a genpd.prepared_count imbalance. This is a bug >>> elsewhere, but it will result in a system lock up, possibly during >>> reboot of an otherwise functioning system. >>> >>> To avoid this, put a limit on the maximum number of loop iterations, >>> including a simple back-off mechanism. If the limit is reached, the >>> operation will just fail. An error message is already printed. >>> >>> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> >>> --- >>> drivers/base/power/domain.c | 16 ++++++++++++++-- >>> 1 file changed, 14 insertions(+), 2 deletions(-) >>> >>> diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c >>> index cdd547bd67df8218..60e0309dd8dd0264 100644 >>> --- a/drivers/base/power/domain.c >>> +++ b/drivers/base/power/domain.c >>> @@ -6,6 +6,7 @@ >>> * This file is released under the GPLv2. >>> */ >>> >>> +#include <linux/delay.h> >>> #include <linux/kernel.h> >>> #include <linux/io.h> >>> #include <linux/platform_device.h> >>> @@ -19,6 +20,9 @@ >>> #include <linux/suspend.h> >>> #include <linux/export.h> >>> >>> +#define GENPD_RETRIES 20 >>> +#define GENPD_DELAY_US 10 >>> + >>> #define GENPD_DEV_CALLBACK(genpd, type, callback, dev) \ >>> ({ \ >>> type (*__routine)(struct device *__d); \ >>> @@ -2131,6 +2135,7 @@ EXPORT_SYMBOL_GPL(of_genpd_get_from_provider); >>> static void genpd_dev_pm_detach(struct device *dev, bool power_off) >>> { >>> struct generic_pm_domain *pd; >>> + unsigned int i; >>> int ret = 0; >>> >>> pd = pm_genpd_lookup_dev(dev); >>> @@ -2139,10 +2144,13 @@ static void genpd_dev_pm_detach(struct device *dev, bool power_off) >>> >>> dev_dbg(dev, "removing from PM domain %s\n", pd->name); >>> >>> - while (1) { >>> + for (i = 0; i < GENPD_RETRIES; i++) { >>> ret = pm_genpd_remove_device(pd, dev); >>> if (ret != -EAGAIN) >>> break; >>> + >>> + if (i > GENPD_RETRIES / 2) >>> + udelay(GENPD_DELAY_US); >>> cond_resched(); >>> } >>> >>> @@ -2183,6 +2191,7 @@ int genpd_dev_pm_attach(struct device *dev) >>> { >>> struct of_phandle_args pd_args; >>> struct generic_pm_domain *pd; >>> + unsigned int i; >>> int ret; >>> >>> if (!dev->of_node) >>> @@ -2218,10 +2227,13 @@ int genpd_dev_pm_attach(struct device *dev) >>> >>> dev_dbg(dev, "adding to PM domain %s\n", pd->name); >>> >>> - while (1) { >>> + for (i = 0; i < GENPD_RETRIES; i++) { >>> ret = pm_genpd_add_device(pd, dev); >>> if (ret != -EAGAIN) >>> break; >>> + >>> + if (i > GENPD_RETRIES / 2) >>> + udelay(GENPD_DELAY_US); >> >> In this execution path, we retry when getting -EAGAIN while believing >> the reason to the error are only *temporary* as we are soon waiting >> for all devices in the genpd to be system PM resumed. At least that's >> my understanding to why we want to deal with -EAGAIN here, but I might >> be wrong. >> >> In this regards, I wonder whether it could be better to re-try only a >> few times but with a far longer interval time than a couple us. What >> do you think? > > That's indeed viable. I have no idea for how long this temporary state can > extend. A usual approach to this kind of thing is to use exponential fallback where you increase the delay twice with respect to the previous one every time. Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-sh" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Rafael, On Tue, Jun 23, 2015 at 3:38 PM, Rafael J. Wysocki <rafael@kernel.org> wrote: >>>> @@ -2218,10 +2227,13 @@ int genpd_dev_pm_attach(struct device *dev) >>>> >>>> dev_dbg(dev, "adding to PM domain %s\n", pd->name); >>>> >>>> - while (1) { >>>> + for (i = 0; i < GENPD_RETRIES; i++) { >>>> ret = pm_genpd_add_device(pd, dev); >>>> if (ret != -EAGAIN) >>>> break; >>>> + >>>> + if (i > GENPD_RETRIES / 2) >>>> + udelay(GENPD_DELAY_US); >>> >>> In this execution path, we retry when getting -EAGAIN while believing >>> the reason to the error are only *temporary* as we are soon waiting >>> for all devices in the genpd to be system PM resumed. At least that's >>> my understanding to why we want to deal with -EAGAIN here, but I might >>> be wrong. >>> >>> In this regards, I wonder whether it could be better to re-try only a >>> few times but with a far longer interval time than a couple us. What >>> do you think? >> >> That's indeed viable. I have no idea for how long this temporary state can >> extend. > > A usual approach to this kind of thing is to use exponential fallback > where you increase the delay twice with respect to the previous one > every time. Right, but when do you give up? Note that udelay() is a busy loop. Should it fall back to msleep() after a while? Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds -- To unsubscribe from this list: send the line "unsubscribe linux-sh" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[...] >>> >>> @@ -2183,6 +2191,7 @@ int genpd_dev_pm_attach(struct device *dev) >>> { >>> struct of_phandle_args pd_args; >>> struct generic_pm_domain *pd; >>> + unsigned int i; >>> int ret; >>> >>> if (!dev->of_node) >>> @@ -2218,10 +2227,13 @@ int genpd_dev_pm_attach(struct device *dev) >>> >>> dev_dbg(dev, "adding to PM domain %s\n", pd->name); >>> >>> - while (1) { >>> + for (i = 0; i < GENPD_RETRIES; i++) { >>> ret = pm_genpd_add_device(pd, dev); >>> if (ret != -EAGAIN) >>> break; >>> + >>> + if (i > GENPD_RETRIES / 2) >>> + udelay(GENPD_DELAY_US); >> >> In this execution path, we retry when getting -EAGAIN while believing >> the reason to the error are only *temporary* as we are soon waiting >> for all devices in the genpd to be system PM resumed. At least that's >> my understanding to why we want to deal with -EAGAIN here, but I might >> be wrong. >> >> In this regards, I wonder whether it could be better to re-try only a >> few times but with a far longer interval time than a couple us. What >> do you think? > > That's indeed viable. I have no idea for how long this temporary state can > extend. That will depend on the system PM resume time for the devices residing in the genpd. So, I guess we need a guestimate then. How about a total sleep time of a few seconds? > >> However, what if the reason to why we get -EAGAIN isn't *temporary*, >> because we are about to enter system PM suspend state. Then the caller >> of this function which comes via some bus' ->probe(), will hang until >> the a system PM resume is completed. Is that really going to work? So, >> for this case your limited re-try approach will affect this scenario >> as well, have you considered that? > > There's a limit on the number of retries, so it won't hang indefinitely. What happens with the timer functions (like msleep()) during the system PM suspend transition? Kind regards Uffe -- To unsubscribe from this list: send the line "unsubscribe linux-sh" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jun 24, 2015 at 10:33 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote: > [...] > >>>> >>>> @@ -2183,6 +2191,7 @@ int genpd_dev_pm_attach(struct device *dev) >>>> { >>>> struct of_phandle_args pd_args; >>>> struct generic_pm_domain *pd; >>>> + unsigned int i; >>>> int ret; >>>> >>>> if (!dev->of_node) >>>> @@ -2218,10 +2227,13 @@ int genpd_dev_pm_attach(struct device *dev) >>>> >>>> dev_dbg(dev, "adding to PM domain %s\n", pd->name); >>>> >>>> - while (1) { >>>> + for (i = 0; i < GENPD_RETRIES; i++) { >>>> ret = pm_genpd_add_device(pd, dev); >>>> if (ret != -EAGAIN) >>>> break; >>>> + >>>> + if (i > GENPD_RETRIES / 2) >>>> + udelay(GENPD_DELAY_US); >>> >>> In this execution path, we retry when getting -EAGAIN while believing >>> the reason to the error are only *temporary* as we are soon waiting >>> for all devices in the genpd to be system PM resumed. At least that's >>> my understanding to why we want to deal with -EAGAIN here, but I might >>> be wrong. >>> >>> In this regards, I wonder whether it could be better to re-try only a >>> few times but with a far longer interval time than a couple us. What >>> do you think? >> >> That's indeed viable. I have no idea for how long this temporary state can >> extend. > > That will depend on the system PM resume time for the devices residing > in the genpd. So, I guess we need a guestimate then. How about a total > sleep time of a few seconds? > >> >>> However, what if the reason to why we get -EAGAIN isn't *temporary*, >>> because we are about to enter system PM suspend state. Then the caller >>> of this function which comes via some bus' ->probe(), will hang until >>> the a system PM resume is completed. Is that really going to work? So, >>> for this case your limited re-try approach will affect this scenario >>> as well, have you considered that? >> >> There's a limit on the number of retries, so it won't hang indefinitely. > > What happens with the timer functions (like msleep()) during the > system PM suspend transition? I guess we can no longer call msleep() after syscore suspend? Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds -- To unsubscribe from this list: send the line "unsubscribe linux-sh" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wednesday, June 24, 2015 10:35:44 AM Geert Uytterhoeven wrote: > On Wed, Jun 24, 2015 at 10:33 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote: > > [...] > > > >>>> > >>>> @@ -2183,6 +2191,7 @@ int genpd_dev_pm_attach(struct device *dev) > >>>> { > >>>> struct of_phandle_args pd_args; > >>>> struct generic_pm_domain *pd; > >>>> + unsigned int i; > >>>> int ret; > >>>> > >>>> if (!dev->of_node) > >>>> @@ -2218,10 +2227,13 @@ int genpd_dev_pm_attach(struct device *dev) > >>>> > >>>> dev_dbg(dev, "adding to PM domain %s\n", pd->name); > >>>> > >>>> - while (1) { > >>>> + for (i = 0; i < GENPD_RETRIES; i++) { > >>>> ret = pm_genpd_add_device(pd, dev); > >>>> if (ret != -EAGAIN) > >>>> break; > >>>> + > >>>> + if (i > GENPD_RETRIES / 2) > >>>> + udelay(GENPD_DELAY_US); > >>> > >>> In this execution path, we retry when getting -EAGAIN while believing > >>> the reason to the error are only *temporary* as we are soon waiting > >>> for all devices in the genpd to be system PM resumed. At least that's > >>> my understanding to why we want to deal with -EAGAIN here, but I might > >>> be wrong. > >>> > >>> In this regards, I wonder whether it could be better to re-try only a > >>> few times but with a far longer interval time than a couple us. What > >>> do you think? > >> > >> That's indeed viable. I have no idea for how long this temporary state can > >> extend. > > > > That will depend on the system PM resume time for the devices residing > > in the genpd. So, I guess we need a guestimate then. How about a total > > sleep time of a few seconds? > > > >> > >>> However, what if the reason to why we get -EAGAIN isn't *temporary*, > >>> because we are about to enter system PM suspend state. Then the caller > >>> of this function which comes via some bus' ->probe(), will hang until > >>> the a system PM resume is completed. Is that really going to work? So, > >>> for this case your limited re-try approach will affect this scenario > >>> as well, have you considered that? > >> > >> There's a limit on the number of retries, so it won't hang indefinitely. > > > > What happens with the timer functions (like msleep()) during the > > system PM suspend transition? > > I guess we can no longer call msleep() after syscore suspend? That's correct. Time is effectively frozen at that point. Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-sh" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tuesday, June 23, 2015 03:45:43 PM Geert Uytterhoeven wrote: > Hi Rafael, > > On Tue, Jun 23, 2015 at 3:38 PM, Rafael J. Wysocki <rafael@kernel.org> wrote: > >>>> @@ -2218,10 +2227,13 @@ int genpd_dev_pm_attach(struct device *dev) > >>>> > >>>> dev_dbg(dev, "adding to PM domain %s\n", pd->name); > >>>> > >>>> - while (1) { > >>>> + for (i = 0; i < GENPD_RETRIES; i++) { > >>>> ret = pm_genpd_add_device(pd, dev); > >>>> if (ret != -EAGAIN) > >>>> break; > >>>> + > >>>> + if (i > GENPD_RETRIES / 2) > >>>> + udelay(GENPD_DELAY_US); > >>> > >>> In this execution path, we retry when getting -EAGAIN while believing > >>> the reason to the error are only *temporary* as we are soon waiting > >>> for all devices in the genpd to be system PM resumed. At least that's > >>> my understanding to why we want to deal with -EAGAIN here, but I might > >>> be wrong. > >>> > >>> In this regards, I wonder whether it could be better to re-try only a > >>> few times but with a far longer interval time than a couple us. What > >>> do you think? > >> > >> That's indeed viable. I have no idea for how long this temporary state can > >> extend. > > > > A usual approach to this kind of thing is to use exponential fallback > > where you increase the delay twice with respect to the previous one > > every time. > > Right, but when do you give up? Well, I guess you know what a reasonable timeout should be? > Note that udelay() is a busy loop. Should it fall back to msleep() after > a while? If we can't fall back to msleep() at one point, you may as well simply poll periodically as you did originally. Thanks, Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-sh" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c index cdd547bd67df8218..60e0309dd8dd0264 100644 --- a/drivers/base/power/domain.c +++ b/drivers/base/power/domain.c @@ -6,6 +6,7 @@ * This file is released under the GPLv2. */ +#include <linux/delay.h> #include <linux/kernel.h> #include <linux/io.h> #include <linux/platform_device.h> @@ -19,6 +20,9 @@ #include <linux/suspend.h> #include <linux/export.h> +#define GENPD_RETRIES 20 +#define GENPD_DELAY_US 10 + #define GENPD_DEV_CALLBACK(genpd, type, callback, dev) \ ({ \ type (*__routine)(struct device *__d); \ @@ -2131,6 +2135,7 @@ EXPORT_SYMBOL_GPL(of_genpd_get_from_provider); static void genpd_dev_pm_detach(struct device *dev, bool power_off) { struct generic_pm_domain *pd; + unsigned int i; int ret = 0; pd = pm_genpd_lookup_dev(dev); @@ -2139,10 +2144,13 @@ static void genpd_dev_pm_detach(struct device *dev, bool power_off) dev_dbg(dev, "removing from PM domain %s\n", pd->name); - while (1) { + for (i = 0; i < GENPD_RETRIES; i++) { ret = pm_genpd_remove_device(pd, dev); if (ret != -EAGAIN) break; + + if (i > GENPD_RETRIES / 2) + udelay(GENPD_DELAY_US); cond_resched(); } @@ -2183,6 +2191,7 @@ int genpd_dev_pm_attach(struct device *dev) { struct of_phandle_args pd_args; struct generic_pm_domain *pd; + unsigned int i; int ret; if (!dev->of_node) @@ -2218,10 +2227,13 @@ int genpd_dev_pm_attach(struct device *dev) dev_dbg(dev, "adding to PM domain %s\n", pd->name); - while (1) { + for (i = 0; i < GENPD_RETRIES; i++) { ret = pm_genpd_add_device(pd, dev); if (ret != -EAGAIN) break; + + if (i > GENPD_RETRIES / 2) + udelay(GENPD_DELAY_US); cond_resched(); }
If pm_genpd_{add,remove}_device() keeps on failing with -EAGAIN, we end up with an infinite loop in genpd_dev_pm_{at,de}tach(). This may happen due to a genpd.prepared_count imbalance. This is a bug elsewhere, but it will result in a system lock up, possibly during reboot of an otherwise functioning system. To avoid this, put a limit on the maximum number of loop iterations, including a simple back-off mechanism. If the limit is reached, the operation will just fail. An error message is already printed. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> --- drivers/base/power/domain.c | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-)