Message ID | 20210926224058.1252-14-digetx@gmail.com (mailing list archive) |
---|---|
State | Not Applicable, archived |
Series | NVIDIA Tegra power management patches for 5.16 |
On Mon, 27 Sept 2021 at 00:42, Dmitry Osipenko <digetx@gmail.com> wrote:
>
> Add runtime power management and support generic power domains.
>
> Tested-by: Peter Geis <pgwipeout@gmail.com> # Ouya T30
> Tested-by: Paul Fertser <fercerpav@gmail.com> # PAZ00 T20
> Tested-by: Nicolas Chauvet <kwizart@gmail.com> # PAZ00 T20 and TK1 T124
> Tested-by: Matt Merhar <mattmerhar@protonmail.com> # Ouya T30
> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
> ---
>  drivers/gpu/drm/tegra/gr2d.c | 155 +++++++++++++++++++++++++++++++++--

[...]

>  static int gr2d_remove(struct platform_device *pdev)
> @@ -259,15 +312,101 @@ static int gr2d_remove(struct platform_device *pdev)
>  		return err;
>  	}
>
> +	pm_runtime_dont_use_autosuspend(&pdev->dev);
> +	pm_runtime_disable(&pdev->dev);

There is no guarantee that the ->runtime_suspend() has been invoked
here, which means that clock may be left prepared/enabled beyond this
point.

I suggest you call pm_runtime_force_suspend(), instead of
pm_runtime_disable(), to make sure that gets done.

[...]

Kind regards
Uffe

01.10.2021 16:39, Ulf Hansson wrote:
> On Mon, 27 Sept 2021 at 00:42, Dmitry Osipenko <digetx@gmail.com> wrote:
>
> [...]
>
>>  static int gr2d_remove(struct platform_device *pdev)
>> @@ -259,15 +312,101 @@ static int gr2d_remove(struct platform_device *pdev)
>>  		return err;
>>  	}
>>
>> +	pm_runtime_dont_use_autosuspend(&pdev->dev);
>> +	pm_runtime_disable(&pdev->dev);
>
> There is no guarantee that the ->runtime_suspend() has been invoked
> here, which means that clock may be left prepared/enabled beyond this
> point.
>
> I suggest you call pm_runtime_force_suspend(), instead of
> pm_runtime_disable(), to make sure that gets done.

The pm_runtime_disable() performs the final synchronization, please see [1].

[1] https://elixir.bootlin.com/linux/v5.15-rc3/source/drivers/base/power/runtime.c#L1412

Calling pm_runtime_force_suspend() isn't correct because each 'enable'
must have the corresponding 'disable'. Hence there is no problem here.

On Fri, 1 Oct 2021 at 16:29, Dmitry Osipenko <digetx@gmail.com> wrote:
>
> 01.10.2021 16:39, Ulf Hansson wrote:
> > On Mon, 27 Sept 2021 at 00:42, Dmitry Osipenko <digetx@gmail.com> wrote:
> >
> > [...]
> >
> >> +	pm_runtime_dont_use_autosuspend(&pdev->dev);
> >> +	pm_runtime_disable(&pdev->dev);
> >
> > There is no guarantee that the ->runtime_suspend() has been invoked
> > here, which means that clock may be left prepared/enabled beyond this
> > point.
> >
> > I suggest you call pm_runtime_force_suspend(), instead of
> > pm_runtime_disable(), to make sure that gets done.
>
> The pm_runtime_disable() performs the final synchronization, please see [1].
>
> [1]
> https://elixir.bootlin.com/linux/v5.15-rc3/source/drivers/base/power/runtime.c#L1412

pm_runtime_disable() ends up calling __pm_runtime_barrier(), which calls
cancel_work_sync() if dev->power.request_pending has been set.

If the work that was punted to the pm_wq in rpm_idle() has not been
started yet, we end up just canceling it. In other words, there are no
guarantees it runs to completion.

Moreover, user space may have bumped the usage count via sysfs for the
device (pm_runtime_forbid()) to keep the device runtime resumed.

>
> Calling pm_runtime_force_suspend() isn't correct because each 'enable'
> must have the corresponding 'disable'. Hence there is no problem here.

pm_runtime_force_suspend() calls pm_runtime_disable(), so I think that
should be fine. No?

Kind regards
Uffe

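The cancellation problem described in this message can be illustrated with a small user-space model. This is only a sketch and not kernel code: `struct toy_dev` and every `toy_*` function are invented for illustration, and the model assumes nothing beyond what the message states, namely that a queued suspend request which has not started yet is cancelled rather than executed when runtime PM is disabled.

```c
/* Toy user-space model of the cancellation problem; this is NOT kernel code. */
#include <assert.h>
#include <stdbool.h>

struct toy_dev {
	bool clk_enabled;     /* stands in for the clock the resume callback enabled */
	bool request_pending; /* a suspend request is queued but has not started */
	bool rpm_enabled;
	int usage_count;
};

/* Stand-in for the driver's ->runtime_suspend() callback. */
void toy_runtime_suspend(struct toy_dev *d)
{
	d->clk_enabled = false;
}

/* Asynchronous put: drop a reference and queue the suspend instead of running it. */
void toy_put_async(struct toy_dev *d)
{
	assert(d->usage_count > 0);
	if (--d->usage_count == 0 && d->rpm_enabled)
		d->request_pending = true;
}

/* What the workqueue would do if it got the chance to run the request. */
void toy_worker_run(struct toy_dev *d)
{
	if (d->request_pending) {
		d->request_pending = false;
		toy_runtime_suspend(d);
	}
}

/* Disable as described above: pending work is cancelled, never executed. */
void toy_disable(struct toy_dev *d)
{
	d->request_pending = false;
	d->rpm_enabled = false;
}
```

In this model, calling `toy_put_async()` followed directly by `toy_disable()` leaves `clk_enabled` set, whereas letting `toy_worker_run()` execute in between clears it. That difference is the gap in the 'put+disable' pattern being discussed.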
01.10.2021 17:55, Ulf Hansson wrote:
> On Fri, 1 Oct 2021 at 16:29, Dmitry Osipenko <digetx@gmail.com> wrote:
...
>> The pm_runtime_disable() performs the final synchronization, please see [1].
>>
>> [1]
>> https://elixir.bootlin.com/linux/v5.15-rc3/source/drivers/base/power/runtime.c#L1412
>
> pm_runtime_disable() ends up calling __pm_runtime_barrier(), which calls
> cancel_work_sync() if dev->power.request_pending has been set.
>
> If the work that was punted to the pm_wq in rpm_idle() has not been
> started yet, we end up just canceling it. In other words, there are no
> guarantees it runs to completion.

You're right. Although, in the case of this particular patch, the syncing
is actually implicitly done by pm_runtime_dont_use_autosuspend().

But for drivers which don't use auto-suspend, there is no sync. This
looks like a disaster, it's a very common pattern for drivers to
'put+disable'.

> Moreover, user space may have bumped the usage count via sysfs for the
> device (pm_runtime_forbid()) to keep the device runtime resumed.

Right, this is also a disaster in the case of driver removal.

>> Calling pm_runtime_force_suspend() isn't correct because each 'enable'
>> must have the corresponding 'disable'. Hence there is no problem here.
>
> pm_runtime_force_suspend() calls pm_runtime_disable(), so I think that
> should be fine. No?

[adding Rafael]

Rafael, could you please explain how drivers are supposed to properly
suspend and disable RPM to cut off power and reset state that was
altered by the driver's resume callback? What are we missing? Is Ulf's
suggestion acceptable?

The RPM state of a device is getting reset on driver's removal, hence
all refcounts that were bumped by the rpm-resume callback of the device
driver will be screwed up if device is kept resumed after removal. I
just verified that it's true in practice.

On Fri, 1 Oct 2021 at 21:00, Dmitry Osipenko <digetx@gmail.com> wrote:
>
> 01.10.2021 17:55, Ulf Hansson wrote:

[...]

> [adding Rafael]
>
> Rafael, could you please explain how drivers are supposed to properly
> suspend and disable RPM to cut off power and reset state that was
> altered by the driver's resume callback? What are we missing? Is Ulf's
> suggestion acceptable?
>
> The RPM state of a device is getting reset on driver's removal, hence
> all refcounts that were bumped by the rpm-resume callback of the device
> driver will be screwed up if device is kept resumed after removal. I
> just verified that it's true in practice.

Note that, what makes the Tegra drivers a bit special is that they are
always built with CONFIG_PM being set (selected from the "SoC"
Kconfig).

Therefore, pm_runtime_force_suspend() can work for some of these
cases. Using it would potentially spare the driver from having to
runtime resume the device in ->remove(), as in the generic sequence
below, which is used in many drivers.

pm_runtime_get_sync()
clk_disable_unprepare() (+ additional things to turn off the device)
pm_runtime_disable()
pm_runtime_put_noidle()

Kind regards
Uffe

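A rough user-space model of why pm_runtime_force_suspend() is attractive in ->remove(). It is a sketch under a stated assumption: that force-suspend first disables runtime PM and then, if the device is still active, invokes the suspend callback directly regardless of the usage count. The struct and all `toy_*` names are hypothetical and exist only for this illustration.

```c
/* Toy user-space model of a force-suspend helper; this is NOT kernel code. */
#include <assert.h>
#include <stdbool.h>

struct toy_dev {
	bool clk_enabled; /* resource turned on by the resume callback */
	bool rpm_enabled;
	bool rpm_active;  /* runtime PM status: active vs. suspended */
	int usage_count;  /* may have been bumped by user space (forbid) */
};

/* Stand-in for the driver's ->runtime_suspend() callback. */
void toy_runtime_suspend(struct toy_dev *d)
{
	d->clk_enabled = false;
}

/*
 * Assumed behaviour: disable runtime PM, then run the suspend callback
 * directly if the device is still active. The usage count is neither
 * consulted nor modified.
 */
int toy_force_suspend(struct toy_dev *d)
{
	assert(d);
	d->rpm_enabled = false;
	if (!d->rpm_active)
		return 0; /* already suspended, nothing to do */

	toy_runtime_suspend(d);
	d->rpm_active = false;
	return 0;
}
```

In this model the clock is turned off even when `usage_count` is non-zero, and no preceding resume of the device is needed.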
04.10.2021 14:01, Ulf Hansson wrote:
...
> Note that, what makes the Tegra drivers a bit special is that they are
> always built with CONFIG_PM being set (selected from the "SoC"
> Kconfig).
>
> Therefore, pm_runtime_force_suspend() can work for some of these
> cases. Using it would potentially spare the driver from having to
> runtime resume the device in ->remove(), as in the generic sequence
> below, which is used in many drivers.
>
> pm_runtime_get_sync()
> clk_disable_unprepare() (+ additional things to turn off the device)
> pm_runtime_disable()
> pm_runtime_put_noidle()

It's not a problem to change this patchset. The problem is that if
you grep mainline for 'pm_runtime_disable', you will find that there
are a lot of drivers potentially in trouble.

I'm proposing that we should change pm_runtime_disable() to perform the
syncing with this oneliner:

diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
index ec94049442b9..5c9f28165824 100644
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -1380,6 +1380,8 @@ EXPORT_SYMBOL_GPL(pm_runtime_barrier);
  */
 void __pm_runtime_disable(struct device *dev, bool check_resume)
 {
+	flush_work(&dev->power.work);
+
 	spin_lock_irq(&dev->power.lock);
 
 	if (dev->power.disable_depth > 0) {

Objections?

The sysfs rpm-forbid is a separate problem and it's less troublesome
since it requires root privileges. It's also not something that
userspace touches casually. For now I don't know what could be done
about it.

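The difference between cancelling and flushing the pending work, which is what the proposed one-liner changes, can be sketched in user space. This is a toy model and not kernel code: `struct toy_dev`, the `toy_*` functions and the `flush_first` flag are made up for illustration, and the model ignores locking and latency entirely.

```c
/* Toy user-space model contrasting cancel with flush; this is NOT kernel code. */
#include <assert.h>
#include <stdbool.h>

struct toy_dev {
	bool clk_enabled;  /* resource the suspend callback would turn off */
	bool work_pending; /* queued suspend work that has not started yet */
	bool rpm_enabled;
};

/* The queued work item: it ends up running the suspend callback. */
void toy_suspend_work(struct toy_dev *d)
{
	d->clk_enabled = false;
	d->work_pending = false;
}

/* Flush: run the pending work to completion before returning. */
void toy_flush_work(struct toy_dev *d)
{
	if (d->work_pending)
		toy_suspend_work(d);
}

/* Cancel: drop the pending work without running it. */
void toy_cancel_work(struct toy_dev *d)
{
	d->work_pending = false;
}

/* Disable, optionally flushing first as the proposed change would. */
void toy_disable(struct toy_dev *d, bool flush_first)
{
	assert(d);
	if (flush_first)
		toy_flush_work(d);
	toy_cancel_work(d); /* existing barrier behaviour */
	d->rpm_enabled = false;
}
```

With `flush_first` set, the pending suspend runs before runtime PM is disabled, so the model ends with the clock off. Without it, the request is dropped and the clock stays on.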
On Mon, 4 Oct 2021 at 17:57, Dmitry Osipenko <digetx@gmail.com> wrote:
>
> 04.10.2021 14:01, Ulf Hansson wrote:

[...]

> It's not a problem to change this patchset. The problem is that if
> you grep mainline for 'pm_runtime_disable', you will find that there
> are a lot of drivers potentially in trouble.

Let's start by fixing this patchset, please - then we can consider
what to do with the other cases separately.

>
> I'm proposing that we should change pm_runtime_disable() to perform the
> syncing with this oneliner:
>
> diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
> index ec94049442b9..5c9f28165824 100644
> --- a/drivers/base/power/runtime.c
> +++ b/drivers/base/power/runtime.c
> @@ -1380,6 +1380,8 @@ EXPORT_SYMBOL_GPL(pm_runtime_barrier);
>   */
>  void __pm_runtime_disable(struct device *dev, bool check_resume)
>  {
> +	flush_work(&dev->power.work);
> +

What about the latency this may introduce? I am not sure that is
acceptable here!?

>  	spin_lock_irq(&dev->power.lock);
>
>  	if (dev->power.disable_depth > 0) {
>
> Objections?
>
> The sysfs rpm-forbid is a separate problem and it's less troublesome
> since it requires root privileges. It's also not something that
> userspace touches casually. For now I don't know what could be done
> about it.

As I said, the common method to address this problem is to run the
following sequence:

pm_runtime_get_sync()
"power off the device"
pm_runtime_disable()
pm_runtime_put_noidle()

This works even if user space, via sysfs, has triggered a call to
pm_runtime_forbid(). Or doesn't it?

If you don't like it, pm_runtime_force_suspend() should work too, at
least for your cases, I believe.

Kind regards
Uffe

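The four-step sequence above can be walked through with a small user-space model, including the case where user space has forbidden runtime suspend. This is a sketch and not kernel code: every `toy_*` name and the struct are hypothetical, and the model assumes that forbidding simply holds one extra usage-count reference and keeps the device resumed.

```c
/* Toy user-space model of the generic remove sequence; this is NOT kernel code. */
#include <assert.h>
#include <stdbool.h>

struct toy_dev {
	bool powered;     /* clocks and other resources are on */
	bool rpm_enabled;
	bool rpm_active;  /* runtime PM status */
	int usage_count;
};

/* Take a reference and make sure the device is resumed. */
void toy_get_sync(struct toy_dev *d)
{
	d->usage_count++;
	if (d->rpm_enabled && !d->rpm_active) {
		d->powered = true;
		d->rpm_active = true;
	}
}

/* User space forbids runtime suspend: modelled as one held reference. */
void toy_forbid(struct toy_dev *d)
{
	toy_get_sync(d);
}

/* "Power off the device": the driver turns its resources off by hand. */
void toy_power_off(struct toy_dev *d)
{
	d->powered = false;
}

/* The sequence from the message: get_sync, power off, disable, put_noidle. */
void toy_remove(struct toy_dev *d)
{
	assert(d);
	toy_get_sync(d);
	toy_power_off(d);
	d->rpm_enabled = false; /* disable runtime PM */
	d->usage_count--;       /* put_noidle: drop the reference, no idle check */
}
```

Because the driver powers the hardware off itself instead of waiting for the usage count to reach zero, the model ends with the device off whether or not `toy_forbid()` was called earlier, and the usage count returns to its value from before the removal.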
...
>> It's not a problem to change this patchset. The problem is that if
>> you'll grep mainline for 'pm_runtime_disable', you will find that there
>> are a lot of drivers in a potential trouble.
>
> Let's start by fixing this patchset, please - then we can consider
> what to do with the other cases separately.

Yeah, should be better to discuss it separately.

...
>>  void __pm_runtime_disable(struct device *dev, bool check_resume)
>>  {
>> +	flush_work(&dev->power.work);
>> +
>
> What about the latency this may introduce? I am not sure that is
> acceptable here!?

I'm not aware of any code which relies on the original 'cancelling'
behaviour, perhaps Rafael should have more insight.

...
>> The sysfs rpm-forbid is a separate problem and it's less troublesome
>> since it requires root privileges. It's also not something that
>> userspace touches casually. For now I don't know what could be done
>> about it.
>
> As I said, the common method to address this problem is to run the
> following sequence:
>
> pm_runtime_get_sync()
> "power off the device"
> pm_runtime_disable()
> pm_runtime_put_noidle()
>
> This works even if user space, via sysfs, has triggered a call to
> pm_runtime_forbid(). Or doesn't it?
>
> If you don't like it, pm_runtime_force_suspend() should work too, at
> least for your cases, I believe.

I'll update the patches, thank you.

diff --git a/drivers/gpu/drm/tegra/gr2d.c b/drivers/gpu/drm/tegra/gr2d.c
index de288cba3905..13df8f118f75 100644
--- a/drivers/gpu/drm/tegra/gr2d.c
+++ b/drivers/gpu/drm/tegra/gr2d.c
@@ -7,11 +7,21 @@
 #include <linux/iommu.h>
 #include <linux/module.h>
 #include <linux/of_device.h>
+#include <linux/pm_runtime.h>
+#include <linux/reset.h>
+
+#include <soc/tegra/common.h>
 
 #include "drm.h"
 #include "gem.h"
 #include "gr2d.h"
 
+enum {
+	RST_MC,
+	RST_GR2D,
+	RST_GR2D_MAX,
+};
+
 struct gr2d_soc {
 	unsigned int version;
 };
@@ -21,6 +31,9 @@ struct gr2d {
 	struct host1x_channel *channel;
 	struct clk *clk;
 
+	struct reset_control_bulk_data resets[RST_GR2D_MAX];
+	unsigned int nresets;
+
 	const struct gr2d_soc *soc;
 
 	DECLARE_BITMAP(addr_regs, GR2D_NUM_REGS);
@@ -101,16 +114,24 @@ static int gr2d_open_channel(struct tegra_drm_client *client,
 			     struct tegra_drm_context *context)
 {
 	struct gr2d *gr2d = to_gr2d(client);
+	int err;
 
 	context->channel = host1x_channel_get(gr2d->channel);
 	if (!context->channel)
 		return -ENOMEM;
 
+	err = pm_runtime_resume_and_get(client->base.dev);
+	if (err) {
+		host1x_channel_put(context->channel);
+		return err;
+	}
+
 	return 0;
 }
 
 static void gr2d_close_channel(struct tegra_drm_context *context)
 {
+	pm_runtime_put_sync(context->client->base.dev);
 	host1x_channel_put(context->channel);
 }
 
@@ -190,6 +211,27 @@ static const u32 gr2d_addr_regs[] = {
 	GR2D_VA_BASE_ADDR_SB,
 };
 
+static int gr2d_get_resets(struct device *dev, struct gr2d *gr2d)
+{
+	int err;
+
+	gr2d->resets[RST_MC].id = "mc";
+	gr2d->resets[RST_GR2D].id = "2d";
+	gr2d->nresets = RST_GR2D_MAX;
+
+	err = devm_reset_control_bulk_get_optional_exclusive_released(
+				dev, gr2d->nresets, gr2d->resets);
+	if (err) {
+		dev_err(dev, "failed to get reset: %d\n", err);
+		return err;
+	}
+
+	if (WARN_ON(!gr2d->resets[RST_GR2D].rstc))
+		return -ENOENT;
+
+	return 0;
+}
+
 static int gr2d_probe(struct platform_device *pdev)
 {
 	struct device *dev = &pdev->dev;
@@ -202,6 +244,8 @@ static int gr2d_probe(struct platform_device *pdev)
 	if (!gr2d)
 		return -ENOMEM;
 
+	platform_set_drvdata(pdev, gr2d);
+
 	gr2d->soc = of_device_get_match_data(dev);
 
 	syncpts = devm_kzalloc(dev, sizeof(*syncpts), GFP_KERNEL);
@@ -214,11 +258,9 @@ static int gr2d_probe(struct platform_device *pdev)
 		return PTR_ERR(gr2d->clk);
 	}
 
-	err = clk_prepare_enable(gr2d->clk);
-	if (err) {
-		dev_err(dev, "cannot turn on clock\n");
+	err = gr2d_get_resets(dev, gr2d);
+	if (err)
 		return err;
-	}
 
 	INIT_LIST_HEAD(&gr2d->client.base.list);
 	gr2d->client.base.ops = &gr2d_client_ops;
@@ -231,20 +273,31 @@ static int gr2d_probe(struct platform_device *pdev)
 	gr2d->client.version = gr2d->soc->version;
 	gr2d->client.ops = &gr2d_ops;
 
+	pm_runtime_enable(dev);
+	pm_runtime_use_autosuspend(dev);
+	pm_runtime_set_autosuspend_delay(dev, 200);
+
+	err = devm_tegra_core_dev_init_opp_table_common(dev);
+	if (err)
+		goto disable_rpm;
+
 	err = host1x_client_register(&gr2d->client.base);
 	if (err < 0) {
 		dev_err(dev, "failed to register host1x client: %d\n", err);
-		clk_disable_unprepare(gr2d->clk);
-		return err;
+		goto disable_rpm;
 	}
 
 	/* initialize address register map */
 	for (i = 0; i < ARRAY_SIZE(gr2d_addr_regs); i++)
 		set_bit(gr2d_addr_regs[i], gr2d->addr_regs);
 
-	platform_set_drvdata(pdev, gr2d);
-
 	return 0;
+
+disable_rpm:
+	pm_runtime_dont_use_autosuspend(dev);
+	pm_runtime_disable(dev);
+
+	return err;
 }
 
 static int gr2d_remove(struct platform_device *pdev)
@@ -259,15 +312,101 @@ static int gr2d_remove(struct platform_device *pdev)
 		return err;
 	}
 
+	pm_runtime_dont_use_autosuspend(&pdev->dev);
+	pm_runtime_disable(&pdev->dev);
+
+	return 0;
+}
+
+static int __maybe_unused gr2d_runtime_suspend(struct device *dev)
+{
+	struct gr2d *gr2d = dev_get_drvdata(dev);
+	int err;
+
+	host1x_channel_stop(gr2d->channel);
+	reset_control_bulk_release(gr2d->nresets, gr2d->resets);
+
+	/*
+	 * GR2D module shouldn't be reset while hardware is idling, otherwise
+	 * host1x's cmdproc will stuck on trying to access any G2 register
+	 * after reset. GR2D module could be either hot-reset or reset after
+	 * power-gating of the HEG partition. Hence we will put in reset only
+	 * the memory client part of the module, the HEG GENPD will take care
+	 * of resetting GR2D module across power-gating.
+	 *
+	 * On Tegra20 there is no HEG partition, but it's okay to have
+	 * undetermined h/w state since userspace is expected to reprogram
+	 * the state on each job submission anyways.
+	 */
+	err = reset_control_acquire(gr2d->resets[RST_MC].rstc);
+	if (err) {
+		dev_err(dev, "failed to acquire MC reset: %d\n", err);
+		goto acquire_reset;
+	}
+
+	err = reset_control_assert(gr2d->resets[RST_MC].rstc);
+	reset_control_release(gr2d->resets[RST_MC].rstc);
+	if (err) {
+		dev_err(dev, "failed to assert MC reset: %d\n", err);
+		goto acquire_reset;
+	}
+
 	clk_disable_unprepare(gr2d->clk);
 
 	return 0;
+
+acquire_reset:
+	reset_control_bulk_acquire(gr2d->nresets, gr2d->resets);
+	reset_control_bulk_deassert(gr2d->nresets, gr2d->resets);
+
+	return err;
 }
 
+static int __maybe_unused gr2d_runtime_resume(struct device *dev)
+{
+	struct gr2d *gr2d = dev_get_drvdata(dev);
+	int err;
+
+	err = reset_control_bulk_acquire(gr2d->nresets, gr2d->resets);
+	if (err) {
+		dev_err(dev, "failed to acquire reset: %d\n", err);
+		return err;
+	}
+
+	err = clk_prepare_enable(gr2d->clk);
+	if (err) {
+		dev_err(dev, "failed to enable clock: %d\n", err);
+		goto release_reset;
+	}
+
+	/* this is a reset array which deasserts both 2D MC and 2D itself */
+	err = reset_control_bulk_deassert(gr2d->nresets, gr2d->resets);
+	if (err) {
+		dev_err(dev, "failed to deassert reset: %d\n", err);
+		goto disable_clk;
+	}
+
+	return 0;
+
+disable_clk:
+	clk_disable_unprepare(gr2d->clk);
+release_reset:
+	reset_control_bulk_release(gr2d->nresets, gr2d->resets);
+
+	return err;
+}
+
+static const struct dev_pm_ops tegra_gr2d_pm = {
+	SET_RUNTIME_PM_OPS(gr2d_runtime_suspend, gr2d_runtime_resume, NULL)
+	SET_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend,
+				pm_runtime_force_resume)
+};
+
 struct platform_driver tegra_gr2d_driver = {
 	.driver = {
 		.name = "tegra-gr2d",
 		.of_match_table = gr2d_match,
+		.pm = &tegra_gr2d_pm,
 	},
 	.probe = gr2d_probe,
 	.remove = gr2d_remove,
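
The error unwinding used by gr2d_runtime_resume() in the patch above (acquire the resets, enable the clock, deassert the resets, undoing the earlier steps in reverse order on failure) can be exercised with a small user-space model. This is only a sketch: `struct toy_gr2d`, `enum toy_fail` and the failure-injection parameter are invented for illustration and do not exist in the driver.

```c
/* Toy user-space model of goto-based unwinding; this is NOT the driver code. */
#include <assert.h>
#include <stdbool.h>

/* Hypothetical failure-injection points, one per step of the resume path. */
enum toy_fail { TOY_FAIL_NONE, TOY_FAIL_ACQUIRE, TOY_FAIL_CLK, TOY_FAIL_DEASSERT };

struct toy_gr2d {
	bool reset_acquired;
	bool clk_enabled;
	bool reset_deasserted;
};

int toy_runtime_resume(struct toy_gr2d *g, enum toy_fail fail_at)
{
	assert(g);

	/* Step 1: acquire the resets; nothing to undo if this fails. */
	if (fail_at == TOY_FAIL_ACQUIRE)
		return -1;
	g->reset_acquired = true;

	/* Step 2: enable the clock; on failure release the resets. */
	if (fail_at == TOY_FAIL_CLK)
		goto release_reset;
	g->clk_enabled = true;

	/* Step 3: deassert the resets; on failure undo steps 2 and 1. */
	if (fail_at == TOY_FAIL_DEASSERT)
		goto disable_clk;
	g->reset_deasserted = true;

	return 0;

disable_clk:
	g->clk_enabled = false;
release_reset:
	g->reset_acquired = false;

	return -1;
}
```

Whichever step is made to fail, the model returns with every flag cleared, which is the property the label ordering in the patch is meant to provide.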