diff mbox series

[PATCHv2] drm/omap: Fix issue with clocks left on after resume

Message ID 20210428092500.23521-1-tony@atomide.com (mailing list archive)
State New, archived
Headers show
Series [PATCHv2] drm/omap: Fix issue with clocks left on after resume | expand

Commit Message

Tony Lindgren April 28, 2021, 9:25 a.m. UTC
On resume, dispc pm_runtime_force_resume() is not enabling the hardware
as we pass the pm_runtime_need_not_resume() test as the device is suspended
with no child devices.

As the resume continues, omap_atomic_comit_tail() calls dispc_runtime_get()
that calls rpm_resume() enabling the hardware, and increasing child_count
for it's parent device.

But at this point device_complete() has not yet been called for dispc. So
when omap_atomic_comit_tail() calls dispc_runtime_put(), it won't idle
the hardware as rpm_suspend() returns -EBUSY, and the clocks are left on
after resume. The parent child count is not decremented as the -EBUSY
cannot be easily handled until later on after device_complete().

This can be easily seen for example after suspending Beagleboard-X15 with
no displays connected, and by reading the CM_DSS_DSS_CLKCTRL register at
0x4a009120 after resume. After a suspend and resume cycle, it shows a
value of 0x00040102 instead of 0x00070000 like it should.

Let's fix the issue by calling dispc_runtime_suspend() and
dispc_runtime_resume() directly from dispc_suspend() and dispc_resume().
This leaves out the PM runtime related issues for system suspend.

We could handle the issue by adding more calls to dispc_runtime_get()
and dispc_runtime_put() from omap_drm_suspend() and omap_drm_resume()
as suggested by Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>.
But that would just add more inter-component calls and more dependencies
to PM runtime for system suspend and does not make things easier in the
long.

See also earlier commit 88d26136a256 ("PM: Prevent runtime suspend during
system resume") and commit ca8199f13498 ("drm/msm/dpu: ensure device
suspend happens during PM sleep") for more information.

Fixes: ecfdedd7da5d ("drm/omap: force runtime PM suspend on system suspend")
Signed-off-by: Tony Lindgren <tony@atomide.com>
---

Changes since v1:
- Updated the description for a typo noticed by Tomi
- Added more info about what all goes wrong

---
 drivers/gpu/drm/omapdrm/dss/dispc.c | 27 ++++++++++++++++++++++++++-
 1 file changed, 26 insertions(+), 1 deletion(-)

Comments

Laurent Pinchart April 28, 2021, 2:10 p.m. UTC | #1
Hi Tony,

Thank you for the patch.

On Wed, Apr 28, 2021 at 12:25:00PM +0300, Tony Lindgren wrote:
> On resume, dispc pm_runtime_force_resume() is not enabling the hardware
> as we pass the pm_runtime_need_not_resume() test as the device is suspended
> with no child devices.
> 
> As the resume continues, omap_atomic_comit_tail() calls dispc_runtime_get()
> that calls rpm_resume() enabling the hardware, and increasing child_count
> for it's parent device.
> 
> But at this point device_complete() has not yet been called for dispc. So
> when omap_atomic_comit_tail() calls dispc_runtime_put(), it won't idle
> the hardware as rpm_suspend() returns -EBUSY, and the clocks are left on
> after resume. The parent child count is not decremented as the -EBUSY
> cannot be easily handled until later on after device_complete().
> 
> This can be easily seen for example after suspending Beagleboard-X15 with
> no displays connected, and by reading the CM_DSS_DSS_CLKCTRL register at
> 0x4a009120 after resume. After a suspend and resume cycle, it shows a
> value of 0x00040102 instead of 0x00070000 like it should.
> 
> Let's fix the issue by calling dispc_runtime_suspend() and
> dispc_runtime_resume() directly from dispc_suspend() and dispc_resume().
> This leaves out the PM runtime related issues for system suspend.
> 
> We could handle the issue by adding more calls to dispc_runtime_get()
> and dispc_runtime_put() from omap_drm_suspend() and omap_drm_resume()
> as suggested by Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>.
> But that would just add more inter-component calls and more dependencies
> to PM runtime for system suspend and does not make things easier in the
> long.

Based on my experience on the camera and display side with devices that
are made of multiple components, suspend and resume are best handled in
a controlled way by the top-level driver. Otherwise you end up having
different components suspending and resuming in random orders, and
that's a recipe for failure.

Can we get the omapdrm suspend/resume to run first/last, and
stop/restart the whole device from there ?

> See also earlier commit 88d26136a256 ("PM: Prevent runtime suspend during
> system resume") and commit ca8199f13498 ("drm/msm/dpu: ensure device
> suspend happens during PM sleep") for more information.
> 
> Fixes: ecfdedd7da5d ("drm/omap: force runtime PM suspend on system suspend")
> Signed-off-by: Tony Lindgren <tony@atomide.com>
> ---
> 
> Changes since v1:
> - Updated the description for a typo noticed by Tomi
> - Added more info about what all goes wrong
> 
> ---
>  drivers/gpu/drm/omapdrm/dss/dispc.c | 27 ++++++++++++++++++++++++++-
>  1 file changed, 26 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/omapdrm/dss/dispc.c b/drivers/gpu/drm/omapdrm/dss/dispc.c
> --- a/drivers/gpu/drm/omapdrm/dss/dispc.c
> +++ b/drivers/gpu/drm/omapdrm/dss/dispc.c
> @@ -182,6 +182,7 @@ struct dispc_device {
>  	const struct dispc_features *feat;
>  
>  	bool is_enabled;
> +	bool needs_resume;
>  
>  	struct regmap *syscon_pol;
>  	u32 syscon_pol_offset;
> @@ -4887,10 +4888,34 @@ static int dispc_runtime_resume(struct device *dev)
>  	return 0;
>  }
>  
> +static int dispc_suspend(struct device *dev)
> +{
> +	struct dispc_device *dispc = dev_get_drvdata(dev);
> +
> +	if (!dispc->is_enabled)
> +		return 0;
> +
> +	dispc->needs_resume = true;
> +
> +	return dispc_runtime_suspend(dev);
> +}
> +
> +static int dispc_resume(struct device *dev)
> +{
> +	struct dispc_device *dispc = dev_get_drvdata(dev);
> +
> +	if (!dispc->needs_resume)
> +		return 0;
> +
> +	dispc->needs_resume = false;
> +
> +	return dispc_runtime_resume(dev);
> +}
> +
>  static const struct dev_pm_ops dispc_pm_ops = {
>  	.runtime_suspend = dispc_runtime_suspend,
>  	.runtime_resume = dispc_runtime_resume,
> -	SET_LATE_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend, pm_runtime_force_resume)
> +	SET_LATE_SYSTEM_SLEEP_PM_OPS(dispc_suspend, dispc_resume)
>  };
>  
>  struct platform_driver omap_dispchw_driver = {
Tony Lindgren April 29, 2021, 4:46 a.m. UTC | #2
Hi,

* Laurent Pinchart <laurent.pinchart@ideasonboard.com> [210428 14:10]:
> Based on my experience on the camera and display side with devices that
> are made of multiple components, suspend and resume are best handled in
> a controlled way by the top-level driver. Otherwise you end up having
> different components suspending and resuming in random orders, and
> that's a recipe for failure.

Manually suspending and resuming the components should be doable based
on the registered components. However, I'm not sure it buys much in
this case since we do have Linux driver module take care of things for
us for most part :)

The dss hardware is really a private interconnect with a control module,
and a collection of various mostly independent display output device
modules.

We also have the interconnect target module to deal with for each
module, and have the interconnect hierachy mapped in the devicetree.
So we already have Linux driver module take care of the device
hierarchy.

Because the child components are mostly independent, it should not
matter in which order they suspend and resume related to each other.

I think the remaining issue is how dispc should provide services to
the other components.

If dispc needs to be enabled to provide services to the other modules,
maybe there's some better Linux generic framework dispc could implement?
That is other than PM runtime calls for routing the signals to the
output modules? Then PM runtime can be handled private to the dispc
module.

Decoupling the system suspend and resume from PM runtime calls for
all the other dss components should still also be done IMO. But that
can be done as a separate clean-up patches after we have fixed the
$subject issue.

> Can we get the omapdrm suspend/resume to run first/last, and
> stop/restart the whole device from there ?

This is already the case since commit ecfdedd7da5d ("drm/omap: force
runtime PM suspend on system suspend"). We have omap_drv use
SIMPLE_DEV_PM_OPS, and the components use SET_LATE_SYSTEM_SLEEP_PM_OPS.
So omap_drv suspends first and resumes last. The order should not
matter for other components. Well that is as long as we can deal
with dispc providing resources.

I think we really should also change omap_drv use prepare/complete ops,
and have the components use standard SIMPLE_DEV_PM_OPS. That still
won't help with PM runtime related issues for system suspend and
resume though, but leaves out the need for late pm ops.

Regards,

Tony
Tomi Valkeinen May 3, 2021, 8:04 a.m. UTC | #3
On 29/04/2021 07:46, Tony Lindgren wrote:
> Hi,
> 
> * Laurent Pinchart <laurent.pinchart@ideasonboard.com> [210428 14:10]:
>> Based on my experience on the camera and display side with devices that
>> are made of multiple components, suspend and resume are best handled in
>> a controlled way by the top-level driver. Otherwise you end up having
>> different components suspending and resuming in random orders, and
>> that's a recipe for failure.
> 
> Manually suspending and resuming the components should be doable based
> on the registered components. However, I'm not sure it buys much in
> this case since we do have Linux driver module take care of things for
> us for most part :)
> 
> The dss hardware is really a private interconnect with a control module,
> and a collection of various mostly independent display output device
> modules.
> 
> We also have the interconnect target module to deal with for each
> module, and have the interconnect hierachy mapped in the devicetree.
> So we already have Linux driver module take care of the device
> hierarchy.
> 
> Because the child components are mostly independent, it should not
> matter in which order they suspend and resume related to each other.
> 
> I think the remaining issue is how dispc should provide services to
> the other components.
> 
> If dispc needs to be enabled to provide services to the other modules,
> maybe there's some better Linux generic framework dispc could implement?
> That is other than PM runtime calls for routing the signals to the
> output modules? Then PM runtime can be handled private to the dispc
> module.

What would be the difference? The dispc service would just call runtime 
get and put, like it does now, wouldn't it?

> Decoupling the system suspend and resume from PM runtime calls for
> all the other dss components should still also be done IMO. But that
> can be done as a separate clean-up patches after we have fixed the
> $subject issue.

I don't think I still really understand why all this is needed. I mean, 
obviously things don't work correctly at the moment, so maybe this patch 
can be applied to fix the system suspend. But it just feels like a big 
hack (the current pm_runtime_force_suspend/resume work-around feels like 
a big hack too).

Why doesn't the suspend just work? Afaics, the runtime PM code in 
omapdrm is fine: the dependencies work nicely, things get runtime 
suspended and resumes correctly. And at system suspend, omapdrm will 
disable the whole display pipeline (including bridges, panels) in a 
controlled manner, which results in the appropriate runtime PM calls. I 
think this should just work. But it doesn't, because... runtime PM and 
system suspend don't quite work well together? Or something.

So is it something that omapdrm is doing in a wrong way, or is the PM 
framework just messed up, and other drivers need to dance around the 
problems too?

>> Can we get the omapdrm suspend/resume to run first/last, and
>> stop/restart the whole device from there ?
> 
> This is already the case since commit ecfdedd7da5d ("drm/omap: force
> runtime PM suspend on system suspend"). We have omap_drv use
> SIMPLE_DEV_PM_OPS, and the components use SET_LATE_SYSTEM_SLEEP_PM_OPS.
> So omap_drv suspends first and resumes last. The order should not
> matter for other components. Well that is as long as we can deal
> with dispc providing resources.
> 
> I think we really should also change omap_drv use prepare/complete ops,
> and have the components use standard SIMPLE_DEV_PM_OPS. That still
> won't help with PM runtime related issues for system suspend and
> resume though, but leaves out the need for late pm ops.

Why do we need to do the above? What would omapdrm do in prepare & 
complete? Why would we use SIMPLE_DEV_PM_OPS for the dss subcomponents?

Slightly off topic, but I just noticed that we're using runtime_put_sync 
for some reason. Found 0eaf9f52e94f756147dbfe1faf1f77a02378dbf9. I've 
been fighting with system suspend for a long time =).

I wonder if using non-sync version would remove the EBUSY problem...

  Tomi
Tony Lindgren May 3, 2021, 10:45 a.m. UTC | #4
* Tomi Valkeinen <tomi.valkeinen@ideasonboard.com> [210503 08:04]:
> On 29/04/2021 07:46, Tony Lindgren wrote:
> > I think the remaining issue is how dispc should provide services to
> > the other components.
> > 
> > If dispc needs to be enabled to provide services to the other modules,
> > maybe there's some better Linux generic framework dispc could implement?
> > That is other than PM runtime calls for routing the signals to the
> > output modules? Then PM runtime can be handled private to the dispc
> > module.
> 
> What would be the difference? The dispc service would just call runtime get
> and put, like it does now, wouldn't it?

I was thinking that we could have dispc always enabled when services from
dispc have been requested. I'm not sure if we need to toggle dispc state,
having it set to SYSC_IDLE_SMART mode might be enough. And I think the
clocks are already on for dispc from the top level dss module if any of
the dss components are active. I could be wrong though as I don't know
enough about the dss hardware :)

> > Decoupling the system suspend and resume from PM runtime calls for
> > all the other dss components should still also be done IMO. But that
> > can be done as a separate clean-up patches after we have fixed the
> > $subject issue.
> 
> I don't think I still really understand why all this is needed. I mean,
> obviously things don't work correctly at the moment, so maybe this patch can
> be applied to fix the system suspend. But it just feels like a big hack (the
> current pm_runtime_force_suspend/resume work-around feels like a big hack
> too).

Well omapdrm is not handling the -EBUSY error during system resume.

> Why doesn't the suspend just work? Afaics, the runtime PM code in omapdrm is
> fine: the dependencies work nicely, things get runtime suspended and resumes
> correctly. And at system suspend, omapdrm will disable the whole display
> pipeline (including bridges, panels) in a controlled manner, which results
> in the appropriate runtime PM calls. I think this should just work. But it
> doesn't, because... runtime PM and system suspend don't quite work well
> together? Or something.

Right, PM runtime and system suspend should not be mixed together.

> So is it something that omapdrm is doing in a wrong way, or is the PM
> framework just messed up, and other drivers need to dance around the
> problems too?

I think we have omapdrm close to doing the right things. But trying to
use PM runtime for system suspend is like using a Rube Goldberg machine
to turn off the lights at night when you want to sleep :p

> > I think we really should also change omap_drv use prepare/complete ops,
> > and have the components use standard SIMPLE_DEV_PM_OPS. That still
> > won't help with PM runtime related issues for system suspend and
> > resume though, but leaves out the need for late pm ops.
> 
> Why do we need to do the above? What would omapdrm do in prepare & complete?
> Why would we use SIMPLE_DEV_PM_OPS for the dss subcomponents?

That would leave out the need to use late_pm_ops at all for the driver,
we should suspend a bit earlier, and resume a bit later.

> Slightly off topic, but I just noticed that we're using runtime_put_sync for
> some reason. Found 0eaf9f52e94f756147dbfe1faf1f77a02378dbf9. I've been
> fighting with system suspend for a long time =).
> 
> I wonder if using non-sync version would remove the EBUSY problem...

Worth trying, but it will only help if the -EBUSY error from
pm_runtime_put() is handled somewhere for a retry..

Regards,

Tony
Tony Lindgren May 3, 2021, 11:16 a.m. UTC | #5
Hi,

* Tony Lindgren <tony@atomide.com> [210503 10:45]:
> * Tomi Valkeinen <tomi.valkeinen@ideasonboard.com> [210503 08:04]:
> > On 29/04/2021 07:46, Tony Lindgren wrote:
> > > Decoupling the system suspend and resume from PM runtime calls for
> > > all the other dss components should still also be done IMO. But that
> > > can be done as a separate clean-up patches after we have fixed the
> > > $subject issue.
> > 
> > I don't think I still really understand why all this is needed. I mean,
> > obviously things don't work correctly at the moment, so maybe this patch can
> > be applied to fix the system suspend. But it just feels like a big hack (the
> > current pm_runtime_force_suspend/resume work-around feels like a big hack
> > too).
> 
> Well omapdrm is not handling the -EBUSY error during system resume.

Or rather something on the resume path is not handling and cannot handle
-EBUSY. And sounds like the reason is..

> > Slightly off topic, but I just noticed that we're using runtime_put_sync for
> > some reason. Found 0eaf9f52e94f756147dbfe1faf1f77a02378dbf9. I've been
> > fighting with system suspend for a long time =).
> > 
> > I wonder if using non-sync version would remove the EBUSY problem...
> 
> Worth trying, but it will only help if the -EBUSY error from
> pm_runtime_put() is handled somewhere for a retry..

..the use of pm_runtime_put_sync() like you suggested. I did a quick
test with the minimal change below and that works :) Seems like that's
probably the best minimal fix for the -rc cycle.

Regards,

Tony

8< ----------------
diff --git a/drivers/gpu/drm/omapdrm/dss/dispc.c b/drivers/gpu/drm/omapdrm/dss/dispc.c
--- a/drivers/gpu/drm/omapdrm/dss/dispc.c
+++ b/drivers/gpu/drm/omapdrm/dss/dispc.c
@@ -664,7 +664,7 @@ void dispc_runtime_put(struct dispc_device *dispc)
 
 	DSSDBG("dispc_runtime_put\n");
 
-	r = pm_runtime_put_sync(&dispc->pdev->dev);
+	r = pm_runtime_put(&dispc->pdev->dev);
 	WARN_ON(r < 0 && r != -ENOSYS);
 }
Tony Lindgren May 3, 2021, 12:17 p.m. UTC | #6
* Tony Lindgren <tony@atomide.com> [210503 11:16]:
> ..the use of pm_runtime_put_sync() like you suggested. I did a quick
> test with the minimal change below and that works :) Seems like that's
> probably the best minimal fix for the -rc cycle.

Sorry I was mistaken, the patch below won't help for the omapdrm
PM runtime state on suspend.

I had patch "bus: ti-sysc: Fix am335x resume hang for usb otg module"
applied, and that changes ti-sysc to get rid of the PM runtime calls
during system suspend. The side effect is ti-sysc now ignores the module
PM runtime state on suspend. This is pretty much what _od_suspend_noirq()
was also doing.

I think we still fix the dispc related issue too, otherwise the parent
child_count will just keep increasing on each suspend. I check that
again though.

Regards,


Tony


> 8< ----------------
> diff --git a/drivers/gpu/drm/omapdrm/dss/dispc.c b/drivers/gpu/drm/omapdrm/dss/dispc.c
> --- a/drivers/gpu/drm/omapdrm/dss/dispc.c
> +++ b/drivers/gpu/drm/omapdrm/dss/dispc.c
> @@ -664,7 +664,7 @@ void dispc_runtime_put(struct dispc_device *dispc)
>  
>  	DSSDBG("dispc_runtime_put\n");
>  
> -	r = pm_runtime_put_sync(&dispc->pdev->dev);
> +	r = pm_runtime_put(&dispc->pdev->dev);
>  	WARN_ON(r < 0 && r != -ENOSYS);
>  }
>
Tony Lindgren May 5, 2021, 11:12 a.m. UTC | #7
* Tony Lindgren <tony@atomide.com> [210503 12:18]:
> I think we still fix the dispc related issue too, otherwise the parent
> child_count will just keep increasing on each suspend. I check that
> again though.

After tons more debugging, I found the root cause for the parent child_count
increasing and posted a patch for it at [0] below.

This means the $subject patch here can be done later on as clean-up to
leave out lots of unnecessary PM runtime calls and simplify the system
suspend :)

Regards,

Tony


[0] https://lore.kernel.org/linux-pm/20210505110915.6861-1-tony@atomide.com/T/#u
Tomi Valkeinen May 7, 2021, 1:26 p.m. UTC | #8
On 05/05/2021 14:12, Tony Lindgren wrote:
> * Tony Lindgren <tony@atomide.com> [210503 12:18]:
>> I think we still fix the dispc related issue too, otherwise the parent
>> child_count will just keep increasing on each suspend. I check that
>> again though.
> 
> After tons more debugging, I found the root cause for the parent child_count
> increasing and posted a patch for it at [0] below.
> 
> This means the $subject patch here can be done later on as clean-up to
> leave out lots of unnecessary PM runtime calls and simplify the system
> suspend :)

Great work, thanks! It's always nice when someone else does the tons of 
debugging ;).

I tested the patch you sent, works fine for me.

While testing I noticed another problem, which happens also without your 
fix: go to suspend with HDMI connected, remove the cable, resume the 
board -> boom. I didn't debug that yet.

  Tomi
diff mbox series

Patch

diff --git a/drivers/gpu/drm/omapdrm/dss/dispc.c b/drivers/gpu/drm/omapdrm/dss/dispc.c
--- a/drivers/gpu/drm/omapdrm/dss/dispc.c
+++ b/drivers/gpu/drm/omapdrm/dss/dispc.c
@@ -182,6 +182,7 @@  struct dispc_device {
 	const struct dispc_features *feat;
 
 	bool is_enabled;
+	bool needs_resume;
 
 	struct regmap *syscon_pol;
 	u32 syscon_pol_offset;
@@ -4887,10 +4888,34 @@  static int dispc_runtime_resume(struct device *dev)
 	return 0;
 }
 
+static int dispc_suspend(struct device *dev)
+{
+	struct dispc_device *dispc = dev_get_drvdata(dev);
+
+	if (!dispc->is_enabled)
+		return 0;
+
+	dispc->needs_resume = true;
+
+	return dispc_runtime_suspend(dev);
+}
+
+static int dispc_resume(struct device *dev)
+{
+	struct dispc_device *dispc = dev_get_drvdata(dev);
+
+	if (!dispc->needs_resume)
+		return 0;
+
+	dispc->needs_resume = false;
+
+	return dispc_runtime_resume(dev);
+}
+
 static const struct dev_pm_ops dispc_pm_ops = {
 	.runtime_suspend = dispc_runtime_suspend,
 	.runtime_resume = dispc_runtime_resume,
-	SET_LATE_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend, pm_runtime_force_resume)
+	SET_LATE_SYSTEM_SLEEP_PM_OPS(dispc_suspend, dispc_resume)
 };
 
 struct platform_driver omap_dispchw_driver = {