
drm/panel: Avoid warnings w/ panel-simple/panel-edp at shutdown

Message ID 20240611074846.1.Ieb287c2c3ee3f6d3b0d5f49b29f746b93621749c@changeid (mailing list archive)
State New, archived
Series drm/panel: Avoid warnings w/ panel-simple/panel-edp at shutdown

Commit Message

Douglas Anderson June 11, 2024, 2:48 p.m. UTC
At shutdown if you've got a _properly_ coded DRM modeset driver then
you'll get these two warnings at shutdown time:

  Skipping disable of already disabled panel
  Skipping unprepare of already unprepared panel

These warnings are ugly and sound concerning, but they're actually a
sign of a properly working system. That's not great.

It's not easy to get rid of these warnings. Until we know that all DRM
modeset drivers used with panel-simple and panel-edp are properly
calling drm_atomic_helper_shutdown() or drm_helper_force_disable_all()
then the panel drivers _need_ to disable/unprepare themselves in order
to power off the panel cleanly. However, there are lots of DRM modeset
drivers used with panel-edp and panel-simple and it's hard to know
when we've got them all. Since the warning happens only on the drivers
that _are_ updated there's nothing to encourage broken DRM modeset
drivers to get fixed.
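
For reference, the "properly coded" path that triggers the warnings looks
roughly like the following. This is only a minimal sketch with made-up
foo_* names, not code from any specific driver:

#include <linux/platform_device.h>

#include <drm/drm_atomic_helper.h>
#include <drm/drm_drv.h>

static void foo_shutdown(struct platform_device *pdev)
{
        struct drm_device *drm = platform_get_drvdata(pdev);

        /*
         * Disables all CRTCs (and thus panels) on the way down; a
         * non-atomic driver would call drm_helper_force_disable_all()
         * instead.
         */
        drm_atomic_helper_shutdown(drm);
}

static struct platform_driver foo_driver = {
        /* .probe / .remove elided */
        .shutdown = foo_shutdown,
        .driver = {
                .name = "foo-drm",
        },
};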

In order to flip the warning to the proper place, we need to know
which modeset drivers are going to shut down properly. Though ugly, do
this by creating a list of everyone that shuts down properly. This
allows us to generate a warning for the correct case and also lets us
get rid of the warning for drivers that are shutting down properly.

Maintaining this list is ugly, but the idea is that it's only short
term. Once everyone is converted we can delete the list and call it
done. The list is ugly enough and adding to it is annoying enough that
people should push to make this happen.

Implement this all in a shared "header" file included by the two panel
drivers that need it. This avoids adding any new exports while still
allowing the panel drivers to be modules. The code waste should be
small and, as per above, the whole solution is temporary.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
---
I came up with this idea to help us move forward since otherwise I
couldn't see how we were ever going to fix panel-simple and panel-edp
since they're used by so many DRM Modeset drivers. It's a bit ugly but
I don't hate it. What do others think?

This is at the end of the series so even if folks hate it we could
still land the rest of the series.

This was a "bonus" extra patch I added at the end of v3 of the series
("drm/panel: Remove most store/double-check of prepared/enabled
state") [1]. There, I had the note: "I came up with this idea to help
us move forward since otherwise I couldn't see how we were ever going
to fix panel-simple and panel-edp since they're used by so many DRM
Modeset drivers. It's a bit ugly but I don't hate it. What do others
think?"

As requested by Neil, now that the rest of the series has landed I'm
sending this as a standalone patch so it can get more attention. I'm
starting it back at "v1". There is no change between v1 and the one
sent previously except for a typo fix in an error message: Can't't =>
Can't

[1] https://lore.kernel.org/r/20240605002401.2848541-1-dianders@chromium.org

 drivers/gpu/drm/drm_panel.c                   |  12 ++
 .../gpu/drm/panel/panel-drm-shutdown-check.h  | 151 ++++++++++++++++++
 drivers/gpu/drm/panel/panel-edp.c             |  19 +--
 drivers/gpu/drm/panel/panel-simple.c          |  19 +--
 4 files changed, 169 insertions(+), 32 deletions(-)
 create mode 100644 drivers/gpu/drm/panel/panel-drm-shutdown-check.h

Comments

Maxime Ripard June 12, 2024, 8:09 a.m. UTC | #1
Hi,

On Tue, Jun 11, 2024 at 07:48:51AM GMT, Douglas Anderson wrote:
> At shutdown if you've got a _properly_ coded DRM modeset driver then
> you'll get these two warnings at shutdown time:
> 
>   Skipping disable of already disabled panel
>   Skipping unprepare of already unprepared panel
> 
> These warnings are ugly and sound concerning, but they're actually a
> sign of a properly working system. That's not great.
> 
> It's not easy to get rid of these warnings. Until we know that all DRM
> modeset drivers used with panel-simple and panel-edp are properly
> calling drm_atomic_helper_shutdown() or drm_helper_force_disable_all()
> then the panel drivers _need_ to disable/unprepare themselves in order
> to power off the panel cleanly. However, there are lots of DRM modeset
> drivers used with panel-edp and panel-simple and it's hard to know
> when we've got them all. Since the warning happens only on the drivers
> that _are_ updated there's nothing to encourage broken DRM modeset
> drivers to get fixed.
> 
> In order to flip the warning to the proper place, we need to know
> which modeset drivers are going to shutdown properly. Though ugly, do
> this by creating a list of everyone that shuts down properly. This
> allows us to generate a warning for the correct case and also lets us
> get rid of the warning for drivers that are shutting down properly.
> 
> Maintaining this list is ugly, but the idea is that it's only short
> term. Once everyone is converted we can delete the list and call it
> done. The list is ugly enough and adding to it is annoying enough that
> people should push to make this happen.
> 
> Implement this all in a shared "header" file included by the two panel
> drivers that need it. This avoids us adding an new exports while still
> allowing the panel drivers to be modules. The code waste should be
> small and, as per above, the whole solution is temporary.
> 
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
> I came up with this idea to help us move forward since otherwise I
> couldn't see how we were ever going to fix panel-simple and panel-edp
> since they're used by so many DRM Modeset drivers. It's a bit ugly but
> I don't hate it. What do others think?

I don't think it's the right approach, even more so since we're so close
now to having it in every driver.

I ran the coccinelle script we started with, and here are the results:

./drivers/gpu/drm/vmwgfx/vmwgfx_drv.c:1640:25-39: ERROR: KMS driver vmw_pci_driver is missing shutdown implementation
./drivers/gpu/drm/kmb/kmb_drv.c:621:30-49: ERROR: KMS driver kmb_platform_driver is missing shutdown implementation
./drivers/gpu/drm/tiny/arcpgu.c:422:30-52: ERROR: KMS driver arcpgu_platform_driver is missing shutdown implementation

Looking at the drivers by hand, it seems consistent.

Moving forward, I think having a collection of coccinelle scripts that
we ask new driver authors to run, or that we put in CI somehow, would be
a better path. We have other similar candidates that can't really be dealt
with any other way, like not using drmm_ memory allocations, or not
using drm_dev_enter / drm_dev_exit.

Maxime
Daniel Vetter June 12, 2024, 8:58 a.m. UTC | #2
On Tue, Jun 11, 2024 at 07:48:51AM -0700, Douglas Anderson wrote:
> At shutdown if you've got a _properly_ coded DRM modeset driver then
> you'll get these two warnings at shutdown time:
> 
>   Skipping disable of already disabled panel
>   Skipping unprepare of already unprepared panel
> 
> These warnings are ugly and sound concerning, but they're actually a
> sign of a properly working system. That's not great.
> 
> It's not easy to get rid of these warnings. Until we know that all DRM
> modeset drivers used with panel-simple and panel-edp are properly
> calling drm_atomic_helper_shutdown() or drm_helper_force_disable_all()
> then the panel drivers _need_ to disable/unprepare themselves in order
> to power off the panel cleanly. However, there are lots of DRM modeset
> drivers used with panel-edp and panel-simple and it's hard to know
> when we've got them all. Since the warning happens only on the drivers
> that _are_ updated there's nothing to encourage broken DRM modeset
> drivers to get fixed.
> 
> In order to flip the warning to the proper place, we need to know
> which modeset drivers are going to shutdown properly. Though ugly, do
> this by creating a list of everyone that shuts down properly. This
> allows us to generate a warning for the correct case and also lets us
> get rid of the warning for drivers that are shutting down properly.
> 
> Maintaining this list is ugly, but the idea is that it's only short
> term. Once everyone is converted we can delete the list and call it
> done. The list is ugly enough and adding to it is annoying enough that
> people should push to make this happen.
> 
> Implement this all in a shared "header" file included by the two panel
> drivers that need it. This avoids us adding an new exports while still
> allowing the panel drivers to be modules. The code waste should be
> small and, as per above, the whole solution is temporary.
> 
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
> I came up with this idea to help us move forward since otherwise I
> couldn't see how we were ever going to fix panel-simple and panel-edp
> since they're used by so many DRM Modeset drivers. It's a bit ugly but
> I don't hate it. What do others think?

I think it's terrible :-)

Why does something like this not work?

static void drm_panel_shutdown_fixup(struct drm_panel *panel)
{
	/* if you get warnings here, fix your main drm driver to call
	 * drm_atomic_helper_shutdown()
	 */
	if (WARN_ON(panel->enabled))
		drm_panel_disable(panel);

	if (WARN_ON(panel->prepared))
		drm_panel_unprepare(panel);
}

And then call that little helper in the relevant panel drivers? Also feel
free to bikeshed the name and maybe put a more lengthly explainer into the
kerneldoc for that ...

Or am I completely missing the point here?
-Sima

> 
> This is at the end of the series so even if folks hate it we could
> still land the rest of the series.
> This was a "bonus" extra patch I added at the end of v3 of the series
> ("drm/panel: Remove most store/double-check of prepared/enabled
> state") [1]. There, I had the note: "I came up with this idea to help
> us move forward since otherwise I couldn't see how we were ever going
> to fix panel-simple and panel-edp since they're used by so many DRM
> Modeset drivers. It's a bit ugly but I don't hate it. What do others
> think?"
> 
> As requested by Neil, now that the rest of the series has landed I'm
> sending this as a standalone patch so it can get more attention. I'm
> starting it back at "v1". There is no change between v1 and the one
> sent previously except for a typo fix in an error message: Can't't =>
> Can't
> 
> [1] https://lore.kernel.org/r/20240605002401.2848541-1-dianders@chromium.org
> 
>  drivers/gpu/drm/drm_panel.c                   |  12 ++
>  .../gpu/drm/panel/panel-drm-shutdown-check.h  | 151 ++++++++++++++++++
>  drivers/gpu/drm/panel/panel-edp.c             |  19 +--
>  drivers/gpu/drm/panel/panel-simple.c          |  19 +--
>  4 files changed, 169 insertions(+), 32 deletions(-)
>  create mode 100644 drivers/gpu/drm/panel/panel-drm-shutdown-check.h
> 
> diff --git a/drivers/gpu/drm/drm_panel.c b/drivers/gpu/drm/drm_panel.c
> index cfbe020de54e..df3f15f4625e 100644
> --- a/drivers/gpu/drm/drm_panel.c
> +++ b/drivers/gpu/drm/drm_panel.c
> @@ -161,6 +161,12 @@ int drm_panel_unprepare(struct drm_panel *panel)
>  	if (!panel)
>  		return -EINVAL;
>  
> +	/*
> +	 * If you're seeing this warning, you either need to add your driver
> +	 * to "drm_drivers_that_shutdown" (if you're seeing it with panel-edp
> +	 * or panel-simple) or you need to remove the manual call to
> +	 * drm_panel_unprepare() in your panel driver.
> +	 */
>  	if (!panel->prepared) {
>  		dev_warn(panel->dev, "Skipping unprepare of already unprepared panel\n");
>  		return 0;
> @@ -245,6 +251,12 @@ int drm_panel_disable(struct drm_panel *panel)
>  	if (!panel)
>  		return -EINVAL;
>  
> +	/*
> +	 * If you're seeing this warning, you either need to add your driver
> +	 * to "drm_drivers_that_shutdown" (if you're seeing it with panel-edp
> +	 * or panel-simple) or you need to remove the manual call to
> +	 * drm_panel_disable() in your panel driver.
> +	 */
>  	if (!panel->enabled) {
>  		dev_warn(panel->dev, "Skipping disable of already disabled panel\n");
>  		return 0;
> diff --git a/drivers/gpu/drm/panel/panel-drm-shutdown-check.h b/drivers/gpu/drm/panel/panel-drm-shutdown-check.h
> new file mode 100644
> index 000000000000..dfa8197e09fb
> --- /dev/null
> +++ b/drivers/gpu/drm/panel/panel-drm-shutdown-check.h
> @@ -0,0 +1,151 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright 2024 Google Inc.
> + *
> + * This header is a temporary solution and is intended to be included
> + * directly by panel-edp.c and panel-simple.c.
> + *
> + * This header is needed because panel-edp and panel-simple are used by a
> + * wide variety of DRM drivers and it's hard to know for sure if all of the
> + * DRM drivers used by those panel drivers are properly calling
> + * drm_atomic_helper_shutdown() or drm_helper_force_disable_all() at
> + * shutdown/remove time.
> + *
> + * The plan for this header file:
> + * - Land it and hope that the warning print will encourage DRM drivers to
> + *   get fixed.
> + * - Eventually move to a WARN() splat for extra encouragement.
> + * - Assume that everyone has been fixed and remove this header file.
> + */
> +
> +#ifndef __PANEL_DRM_SHUTDOWN_CHECK_H__
> +#define __PANEL_DRM_SHUTDOWN_CHECK_H__
> +
> +#include <drm/drm_bridge.h>
> +#include <drm/drm_drv.h>
> +
> +/*
> + * This is a list of all DRM drivers that appear to properly call
> + * drm_atomic_helper_shutdown() or drm_helper_force_disable_all() at
> + * shutdown and remove time.
> + *
> + * We can't detect this dynamically and are stuck with a list because the panel
> + * driver's shutdown() call might be called _before_ the DRM driver's
> + * shutdown() call.
> + *
> + * NOTE: no verification has been done to confirm that the below drivers
> + * are actually _used_ with panel-simple or panel-edp, only that these drivers
> + * appear to be shutting down properly. It doesn't hurt to have extra drivers
> + * listed here as long as the list doesn't contain any drivers that are
> + * missing the shutdown calls.
> + */
> +static const char * const drm_drivers_that_shutdown[] = {
> +	"armada-drm",
> +	"aspeed-gfx-drm",
> +	"ast",
> +	"atmel-hlcdc",
> +	"bochs-drm",
> +	"cirrus",
> +	"exynos",
> +	"fsl-dcu-drm",
> +	"gm12u320",
> +	"gud",
> +	"hdlcd",
> +	"hibmc",
> +	"hx8357d",
> +	"hyperv_drm",
> +	"ili9163",
> +	"ili9225",
> +	"ili9341",
> +	"ili9486",
> +	"imx-dcss",
> +	"imx-drm",
> +	"imx-lcdc",
> +	"imx-lcdif",
> +	"ingenic-drm",
> +	"kirin",
> +	"komeda",
> +	"logicvc-drm",
> +	"loongson",
> +	"mali-dp",
> +	"mcde",
> +	"meson",
> +	"mgag200",
> +	"mi0283qt",
> +	"msm",
> +	"mxsfb-drm",
> +	"omapdrm",
> +	"panel-mipi-dbi",
> +	"pl111",
> +	"qxl",
> +	"rcar-du",
> +	"repaper",
> +	"rockchip",
> +	"rzg2l-du",
> +	"ssd130x",
> +	"st7586",
> +	"st7735r",
> +	"sti",
> +	"stm",
> +	"sun4i-drm",
> +	"tidss",
> +	"tilcdc",
> +	"tve200",
> +	"vboxvideo",
> +	"zynqmp-dpsub",
> +	""
> +};
> +
> +static void panel_shutdown_if_drm_driver_needs_fixing(struct drm_panel *panel)
> +{
> +	struct drm_bridge *bridge;
> +	const struct drm_driver *driver;
> +	const char * const *driver_name;
> +
> +	/*
> +	 * Look for a bridge that shares the DT node of this panel. That only
> +	 * works if we've been linked up with a panel_bridge.
> +	 */
> +	bridge = of_drm_find_bridge(panel->dev->of_node);
> +	if (bridge && bridge->dev && bridge->dev->driver) {
> +		/*
> +		 * If the DRM driver for the bridge is known to be fine then
> +		 * we're done.
> +		 */
> +		driver = bridge->dev->driver;
> +		for (driver_name = drm_drivers_that_shutdown; *driver_name; driver_name++) {
> +			if (strcmp(*driver_name, driver->name) == 0)
> +				return;
> +		}
> +
> +		/*
> +		 * If you see the message below then:
> +		 * 1. Make sure your DRM driver is properly calling
> +		 *    drm_atomic_helper_shutdown() or drm_helper_force_disable_all()
> +		 *    at shutdown time.
> +		 * 2. Add your driver to the list.
> +		 */
> +		dev_warn(panel->dev,
> +			 "DRM driver appears buggy; manually disable/unprepare\n");
> +	} else {
> +		/*
> +		 * If you see the message below then your setup needs to
> +		 * be moved to using a panel_bridge. This often happens
> +		 * by calling devm_drm_of_get_bridge(). Having a panel without
> +		 * an associated panel_bridge is deprecated.
> +		 */
> +		dev_warn(panel->dev,
> +			 "Can't find DRM driver; manually disable/unprepare\n");
> +	}
> +
> +	/*
> +	 * If we don't know if a DRM driver is properly shutting things down
> +	 * then we'll manually call the disable/unprepare. This is always a
> +	 * safe thing to do (in that it won't cause you to crash), but it
> +	 * does generate a warning.
> +	 */
> +	drm_panel_disable(panel);
> +	drm_panel_unprepare(panel);
> +}
> +
> +#endif
> diff --git a/drivers/gpu/drm/panel/panel-edp.c b/drivers/gpu/drm/panel/panel-edp.c
> index 67ab6915d6e4..26f89858df9d 100644
> --- a/drivers/gpu/drm/panel/panel-edp.c
> +++ b/drivers/gpu/drm/panel/panel-edp.c
> @@ -42,6 +42,8 @@
>  #include <drm/drm_edid.h>
>  #include <drm/drm_panel.h>
>  
> +#include "panel-drm-shutdown-check.h"
> +
>  /**
>   * struct panel_delay - Describes delays for a simple panel.
>   */
> @@ -948,22 +950,7 @@ static void panel_edp_shutdown(struct device *dev)
>  {
>  	struct panel_edp *panel = dev_get_drvdata(dev);
>  
> -	/*
> -	 * NOTE: the following two calls don't really belong here. It is the
> -	 * responsibility of a correctly written DRM modeset driver to call
> -	 * drm_atomic_helper_shutdown() at shutdown time and that should
> -	 * cause the panel to be disabled / unprepared if needed. For now,
> -	 * however, we'll keep these calls due to the sheer number of
> -	 * different DRM modeset drivers used with panel-edp. The fact that
> -	 * we're calling these and _also_ the drm_atomic_helper_shutdown()
> -	 * will try to disable/unprepare means that we can get a warning about
> -	 * trying to disable/unprepare an already disabled/unprepared panel,
> -	 * but that's something we'll have to live with until we've confirmed
> -	 * that all DRM modeset drivers are properly calling
> -	 * drm_atomic_helper_shutdown().
> -	 */
> -	drm_panel_disable(&panel->base);
> -	drm_panel_unprepare(&panel->base);
> +	panel_shutdown_if_drm_driver_needs_fixing(&panel->base);
>  }
>  
>  static void panel_edp_remove(struct device *dev)
> diff --git a/drivers/gpu/drm/panel/panel-simple.c b/drivers/gpu/drm/panel/panel-simple.c
> index 8345ed891f5a..f505bc27e5ae 100644
> --- a/drivers/gpu/drm/panel/panel-simple.c
> +++ b/drivers/gpu/drm/panel/panel-simple.c
> @@ -42,6 +42,8 @@
>  #include <drm/drm_panel.h>
>  #include <drm/drm_of.h>
>  
> +#include "panel-drm-shutdown-check.h"
> +
>  /**
>   * struct panel_desc - Describes a simple panel.
>   */
> @@ -720,22 +722,7 @@ static void panel_simple_shutdown(struct device *dev)
>  {
>  	struct panel_simple *panel = dev_get_drvdata(dev);
>  
> -	/*
> -	 * NOTE: the following two calls don't really belong here. It is the
> -	 * responsibility of a correctly written DRM modeset driver to call
> -	 * drm_atomic_helper_shutdown() at shutdown time and that should
> -	 * cause the panel to be disabled / unprepared if needed. For now,
> -	 * however, we'll keep these calls due to the sheer number of
> -	 * different DRM modeset drivers used with panel-simple. The fact that
> -	 * we're calling these and _also_ the drm_atomic_helper_shutdown()
> -	 * will try to disable/unprepare means that we can get a warning about
> -	 * trying to disable/unprepare an already disabled/unprepared panel,
> -	 * but that's something we'll have to live with until we've confirmed
> -	 * that all DRM modeset drivers are properly calling
> -	 * drm_atomic_helper_shutdown().
> -	 */
> -	drm_panel_disable(&panel->base);
> -	drm_panel_unprepare(&panel->base);
> +	panel_shutdown_if_drm_driver_needs_fixing(&panel->base);
>  }
>  
>  static void panel_simple_remove(struct device *dev)
> -- 
> 2.45.2.505.gda0bf45e8d-goog
>
Douglas Anderson June 12, 2024, 2:39 p.m. UTC | #3
Hi,

On Wed, Jun 12, 2024 at 1:09 AM Maxime Ripard <mripard@kernel.org> wrote:
>
> Hi,
>
> On Tue, Jun 11, 2024 at 07:48:51AM GMT, Douglas Anderson wrote:
> > At shutdown if you've got a _properly_ coded DRM modeset driver then
> > you'll get these two warnings at shutdown time:
> >
> >   Skipping disable of already disabled panel
> >   Skipping unprepare of already unprepared panel
> >
> > These warnings are ugly and sound concerning, but they're actually a
> > sign of a properly working system. That's not great.
> >
> > It's not easy to get rid of these warnings. Until we know that all DRM
> > modeset drivers used with panel-simple and panel-edp are properly
> > calling drm_atomic_helper_shutdown() or drm_helper_force_disable_all()
> > then the panel drivers _need_ to disable/unprepare themselves in order
> > to power off the panel cleanly. However, there are lots of DRM modeset
> > drivers used with panel-edp and panel-simple and it's hard to know
> > when we've got them all. Since the warning happens only on the drivers
> > that _are_ updated there's nothing to encourage broken DRM modeset
> > drivers to get fixed.
> >
> > In order to flip the warning to the proper place, we need to know
> > which modeset drivers are going to shutdown properly. Though ugly, do
> > this by creating a list of everyone that shuts down properly. This
> > allows us to generate a warning for the correct case and also lets us
> > get rid of the warning for drivers that are shutting down properly.
> >
> > Maintaining this list is ugly, but the idea is that it's only short
> > term. Once everyone is converted we can delete the list and call it
> > done. The list is ugly enough and adding to it is annoying enough that
> > people should push to make this happen.
> >
> > Implement this all in a shared "header" file included by the two panel
> > drivers that need it. This avoids us adding an new exports while still
> > allowing the panel drivers to be modules. The code waste should be
> > small and, as per above, the whole solution is temporary.
> >
> > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> > ---
> > I came up with this idea to help us move forward since otherwise I
> > couldn't see how we were ever going to fix panel-simple and panel-edp
> > since they're used by so many DRM Modeset drivers. It's a bit ugly but
> > I don't hate it. What do others think?
>
> I don't think it's the right approach, even more so since we're so close
> now to having it in every driver.
>
> I ran the coccinelle script we started with, and here are the results:
>
> ./drivers/gpu/drm/vmwgfx/vmwgfx_drv.c:1640:25-39: ERROR: KMS driver vmw_pci_driver is missing shutdown implementation
> ./drivers/gpu/drm/kmb/kmb_drv.c:621:30-49: ERROR: KMS driver kmb_platform_driver is missing shutdown implementation
> ./drivers/gpu/drm/tiny/arcpgu.c:422:30-52: ERROR: KMS driver arcpgu_platform_driver is missing shutdown implementation

Sure, although I think we agreed even back when we talked about this
last that your coccinelle script wasn't guaranteed to catch every
driver. ...so I guess the question is: are we willing to accept that
we'll stop disabling panels at shutdown for any drivers that might
have been missed. For instance, looking at it by hand (which also could
miss things), I previously thought that we also might need:

* nouveau
* tegra
* amdgpu
* sprd
* gma500
* radeon

I sent patches for those drivers but they don't go through drm-misc
and some of the drivers had a lot of abstraction layers and were hard
to reason about. I'm also not 100% confident that all of those drivers
really are affected--they'd have to be used with panel-simple or
panel-edp...

In any case, having some sort of warning that would give us a
definitive answer would be nice. My proposed patch would give us that
warning. I could even jump to a WARN_ON right from the start.

My proposed patch is self-admittedly super ugly and is also designed
to be temporary, so I don't think of this as giving up right before
crossing the finish line but instead accepting a tiny bit of temporary
ugliness to make sure that we don't accidentally regress anyone. I
would really hope it would be obvious to anyone writing / reviewing
drivers that the function I introduced isn't intended for anyone but
panel-simple and panel-edp.

-Doug
Douglas Anderson June 12, 2024, 2:49 p.m. UTC | #4
Hi,

On Wed, Jun 12, 2024 at 1:58 AM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Tue, Jun 11, 2024 at 07:48:51AM -0700, Douglas Anderson wrote:
> > At shutdown if you've got a _properly_ coded DRM modeset driver then
> > you'll get these two warnings at shutdown time:
> >
> >   Skipping disable of already disabled panel
> >   Skipping unprepare of already unprepared panel
> >
> > These warnings are ugly and sound concerning, but they're actually a
> > sign of a properly working system. That's not great.
> >
> > It's not easy to get rid of these warnings. Until we know that all DRM
> > modeset drivers used with panel-simple and panel-edp are properly
> > calling drm_atomic_helper_shutdown() or drm_helper_force_disable_all()
> > then the panel drivers _need_ to disable/unprepare themselves in order
> > to power off the panel cleanly. However, there are lots of DRM modeset
> > drivers used with panel-edp and panel-simple and it's hard to know
> > when we've got them all. Since the warning happens only on the drivers
> > that _are_ updated there's nothing to encourage broken DRM modeset
> > drivers to get fixed.
> >
> > In order to flip the warning to the proper place, we need to know
> > which modeset drivers are going to shutdown properly. Though ugly, do
> > this by creating a list of everyone that shuts down properly. This
> > allows us to generate a warning for the correct case and also lets us
> > get rid of the warning for drivers that are shutting down properly.
> >
> > Maintaining this list is ugly, but the idea is that it's only short
> > term. Once everyone is converted we can delete the list and call it
> > done. The list is ugly enough and adding to it is annoying enough that
> > people should push to make this happen.
> >
> > Implement this all in a shared "header" file included by the two panel
> > drivers that need it. This avoids us adding an new exports while still
> > allowing the panel drivers to be modules. The code waste should be
> > small and, as per above, the whole solution is temporary.
> >
> > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> > ---
> > I came up with this idea to help us move forward since otherwise I
> > couldn't see how we were ever going to fix panel-simple and panel-edp
> > since they're used by so many DRM Modeset drivers. It's a bit ugly but
> > I don't hate it. What do others think?
>
> I think it's terrible :-)

Well, we're in agreement. It is terrible. However, in my opinion it's
still a reasonable way to move forward.


> Why does something like this not work?
>
> drm_panel_shutdown_fixup(panel)
> {
>         /* if you get warnings here, fix your main drm driver to call
>          * drm_atomic_helper_shutdown()
>          */
>         if (WARN_ON(panel->enabled))
>                 drm_panel_disable(panel);
>
>         if (WARN_ON(panel->prepared))
>                 drm_panel_unprepare(panel);
> }
>
> And then call that little helper in the relevant panel drivers? Also feel
> free to bikeshed the name and maybe put a more lengthly explainer into the
> kerneldoc for that ...
>
> Or am I completely missing the point here?

The problem is that the ordering is wrong, I think. Even if the OS was
calling driver shutdown functions in the perfect order (which I'm not
convinced about since panels aren't always child "struct device"s of
the DRM device), the OS should be calling panel shutdown _before_
shutting down the DRM device. That means that with your suggestion:

1. Shutdown starts and panel is on.

2. OS calls panel shutdown call, which prints warnings because panel
is still on.

3. OS calls DRM driver shutdown call, which prints warnings because
someone else turned the panel off.

:-P

Certainly if I goofed and the above is wrong then let me know--I did
my experiments on this many months ago and didn't try repeating them
again now.

In any case, the only way I could figure out to get around this was some sort
of list. As mentioned in the commit message, it's super ugly and
intended to be temporary. Once we solve all the current in-tree
drivers, I'd imagine that long term for new DRM drivers this kind of
thing would be discovered during bringup of new boards. Usually during
bringup of new boards EEs measure timing signals and complain if
they're not right. If some EE cared and said we weren't disabling the
panel correctly at shutdown time then we'd know there was a problem.

-Doug
Maxime Ripard June 12, 2024, 3:03 p.m. UTC | #5
On Wed, Jun 12, 2024 at 07:49:31AM GMT, Doug Anderson wrote:
> Hi,
> 
> On Wed, Jun 12, 2024 at 1:58 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > On Tue, Jun 11, 2024 at 07:48:51AM -0700, Douglas Anderson wrote:
> > > At shutdown if you've got a _properly_ coded DRM modeset driver then
> > > you'll get these two warnings at shutdown time:
> > >
> > >   Skipping disable of already disabled panel
> > >   Skipping unprepare of already unprepared panel
> > >
> > > These warnings are ugly and sound concerning, but they're actually a
> > > sign of a properly working system. That's not great.
> > >
> > > It's not easy to get rid of these warnings. Until we know that all DRM
> > > modeset drivers used with panel-simple and panel-edp are properly
> > > calling drm_atomic_helper_shutdown() or drm_helper_force_disable_all()
> > > then the panel drivers _need_ to disable/unprepare themselves in order
> > > to power off the panel cleanly. However, there are lots of DRM modeset
> > > drivers used with panel-edp and panel-simple and it's hard to know
> > > when we've got them all. Since the warning happens only on the drivers
> > > that _are_ updated there's nothing to encourage broken DRM modeset
> > > drivers to get fixed.
> > >
> > > In order to flip the warning to the proper place, we need to know
> > > which modeset drivers are going to shutdown properly. Though ugly, do
> > > this by creating a list of everyone that shuts down properly. This
> > > allows us to generate a warning for the correct case and also lets us
> > > get rid of the warning for drivers that are shutting down properly.
> > >
> > > Maintaining this list is ugly, but the idea is that it's only short
> > > term. Once everyone is converted we can delete the list and call it
> > > done. The list is ugly enough and adding to it is annoying enough that
> > > people should push to make this happen.
> > >
> > > Implement this all in a shared "header" file included by the two panel
> > > drivers that need it. This avoids us adding an new exports while still
> > > allowing the panel drivers to be modules. The code waste should be
> > > small and, as per above, the whole solution is temporary.
> > >
> > > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> > > ---
> > > I came up with this idea to help us move forward since otherwise I
> > > couldn't see how we were ever going to fix panel-simple and panel-edp
> > > since they're used by so many DRM Modeset drivers. It's a bit ugly but
> > > I don't hate it. What do others think?
> >
> > I think it's terrible :-)
> 
> Well, we're in agreement. It is terrible. However, in my opinion it's
> still a reasonable way to move forward.
> 
> 
> > Why does something like this not work?
> >
> > drm_panel_shutdown_fixup(panel)
> > {
> >         /* if you get warnings here, fix your main drm driver to call
> >          * drm_atomic_helper_shutdown()
> >          */
> >         if (WARN_ON(panel->enabled))
> >                 drm_panel_disable(panel);
> >
> >         if (WARN_ON(panel->prepared))
> >                 drm_panel_unprepare(panel);
> > }
> >
> > And then call that little helper in the relevant panel drivers? Also feel
> > free to bikeshed the name and maybe put a more lengthly explainer into the
> > kerneldoc for that ...
> >
> > Or am I completely missing the point here?
> 
> The problem is that the ordering is wrong, I think. Even if the OS was
> calling driver shutdown functions in the perfect order (which I'm not
> convinced about since panels aren't always child "struct device"s of
> the DRM device), the OS should be calling panel shutdown _before_
> shutting down the DRM device. That means that with your suggestion:
> 
> 1. Shutdown starts and panel is on.
> 
> 2. OS calls panel shutdown call, which prints warnings because panel
> is still on.
> 
> 3. OS calls DRM driver shutdown call, which prints warnings because
> someone else turned the panel off.
> 
> :-P
> 
> Certainly if I goofed and the above is wrong then let me know--I did
> my experiments on this many months ago and didn't try repeating them
> again now.
> 
> In any case, the only way I could figure out around this was some sort
> of list. As mentioned in the commit message, it's super ugly and
> intended to be temporary. Once we solve all the current in-tree
> drivers, I'd imagine that long term for new DRM drivers this kind of
> thing would be discovered during bringup of new boards. Usually during
> bringup of new boards EEs measure timing signals and complain if
> they're not right. If some EE cared and said we weren't disabling the
> panel correctly at shutdown time then we'd know there was a problem.

Based on a discussion we had today With Sima on IRC, I think there's
another way forward.

We were actually discussing refcount'ing the panels to avoid lifetime
issues. It would require some API overhaul to have a function to
allocate the drm_panel structure and init'ing the refcount, plus some to
get / put the references.

Having this refcount would mean that we also get a release function now,
called when the panel is free'd.

Could we warn if the panel is still prepared/enabled and is about to be
freed?

It would require to switch panel-simple and panel-edp to that new API,
but it should be easy enough.
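
Very roughly, and just to make the idea concrete, the refcount side could
look something like the sketch below. Everything here is hypothetical (the
refcount member, drm_panel_get()/drm_panel_put() and the release hook don't
exist today); it only shows where the "still prepared/enabled" warning
would move to:

#include <linux/kref.h>
#include <linux/slab.h>

#include <drm/drm_panel.h>

/* Assumes a new "struct kref refcount" member in struct drm_panel. */

static void drm_panel_release(struct kref *kref)
{
        struct drm_panel *panel = container_of(kref, struct drm_panel, refcount);

        /*
         * Freeing a panel that is still prepared/enabled means no DRM
         * driver ever shut it down: warn here rather than in the panel
         * drivers' shutdown hooks.
         */
        WARN_ON(panel->prepared || panel->enabled);

        /* Assumes the panel came from the proposed allocation helper. */
        kfree(panel);
}

struct drm_panel *drm_panel_get(struct drm_panel *panel)
{
        if (panel)
                kref_get(&panel->refcount);

        return panel;
}

void drm_panel_put(struct drm_panel *panel)
{
        if (panel)
                kref_put(&panel->refcount, drm_panel_release);
}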

Maxime
Daniel Vetter June 12, 2024, 3:11 p.m. UTC | #6
On Wed, Jun 12, 2024 at 07:49:31AM -0700, Doug Anderson wrote:
> Hi,
> 
> On Wed, Jun 12, 2024 at 1:58 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > On Tue, Jun 11, 2024 at 07:48:51AM -0700, Douglas Anderson wrote:
> > > At shutdown if you've got a _properly_ coded DRM modeset driver then
> > > you'll get these two warnings at shutdown time:
> > >
> > >   Skipping disable of already disabled panel
> > >   Skipping unprepare of already unprepared panel
> > >
> > > These warnings are ugly and sound concerning, but they're actually a
> > > sign of a properly working system. That's not great.
> > >
> > > It's not easy to get rid of these warnings. Until we know that all DRM
> > > modeset drivers used with panel-simple and panel-edp are properly
> > > calling drm_atomic_helper_shutdown() or drm_helper_force_disable_all()
> > > then the panel drivers _need_ to disable/unprepare themselves in order
> > > to power off the panel cleanly. However, there are lots of DRM modeset
> > > drivers used with panel-edp and panel-simple and it's hard to know
> > > when we've got them all. Since the warning happens only on the drivers
> > > that _are_ updated there's nothing to encourage broken DRM modeset
> > > drivers to get fixed.
> > >
> > > In order to flip the warning to the proper place, we need to know
> > > which modeset drivers are going to shutdown properly. Though ugly, do
> > > this by creating a list of everyone that shuts down properly. This
> > > allows us to generate a warning for the correct case and also lets us
> > > get rid of the warning for drivers that are shutting down properly.
> > >
> > > Maintaining this list is ugly, but the idea is that it's only short
> > > term. Once everyone is converted we can delete the list and call it
> > > done. The list is ugly enough and adding to it is annoying enough that
> > > people should push to make this happen.
> > >
> > > Implement this all in a shared "header" file included by the two panel
> > > drivers that need it. This avoids us adding an new exports while still
> > > allowing the panel drivers to be modules. The code waste should be
> > > small and, as per above, the whole solution is temporary.
> > >
> > > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> > > ---
> > > I came up with this idea to help us move forward since otherwise I
> > > couldn't see how we were ever going to fix panel-simple and panel-edp
> > > since they're used by so many DRM Modeset drivers. It's a bit ugly but
> > > I don't hate it. What do others think?
> >
> > I think it's terrible :-)
> 
> Well, we're in agreement. It is terrible. However, in my opinion it's
> still a reasonable way to move forward.
> 
> 
> > Why does something like this not work?
> >
> > drm_panel_shutdown_fixup(panel)
> > {
> >         /* if you get warnings here, fix your main drm driver to call
> >          * drm_atomic_helper_shutdown()
> >          */
> >         if (WARN_ON(panel->enabled))
> >                 drm_panel_disable(panel);
> >
> >         if (WARN_ON(panel->prepared))
> >                 drm_panel_unprepare(panel);
> > }
> >
> > And then call that little helper in the relevant panel drivers? Also feel
> > free to bikeshed the name and maybe put a more lengthly explainer into the
> > kerneldoc for that ...
> >
> > Or am I completely missing the point here?
> 
> The problem is that the ordering is wrong, I think. Even if the OS was
> calling driver shutdown functions in the perfect order (which I'm not
> convinced about since panels aren't always child "struct device"s of
> the DRM device), the OS should be calling panel shutdown _before_
> shutting down the DRM device. That means that with your suggestion:
> 
> 1. Shutdown starts and panel is on.
> 
> 2. OS calls panel shutdown call, which prints warnings because panel
> is still on.
> 
> 3. OS calls DRM driver shutdown call, which prints warnings because
> someone else turned the panel off.

Uh, that's a _much_ more fundamental issue.

The fix for that is telling the driver core about this dependency with
device_link_add. Unfortunately, despite years of me trying to push for
this, drm_bridge and drm_panel still don't automatically add these,
because the situation is a really complex mess.

Probably need to read dri-devel archives for all the past attempts around
device_link_add.
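
To make that concrete, a device link for this dependency would be set up
roughly like below. This is only an illustrative sketch with a made-up
function name: the consumer/supplier direction (the DRM device depending on
the panel) and the DL_FLAG_STATELESS choice are assumptions, not a settled
design.

#include <linux/device.h>

/*
 * Sketch: make the DRM device a consumer of the panel device so the
 * driver core tears the consumer down before the supplier. That way the
 * DRM driver's shutdown (and drm_atomic_helper_shutdown()) runs while
 * the panel is still bound.
 */
static int foo_link_drm_to_panel(struct device *drm_dev,
				 struct device *panel_dev)
{
	struct device_link *link;

	link = device_link_add(drm_dev, panel_dev, DL_FLAG_STATELESS);
	if (!link)
		return -EINVAL;

	return 0;
}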

But the solution is definitely not to have a manually tracked list, which is
a very architecturally unsound way to tackle this problem.

> Certainly if I goofed and the above is wrong then let me know--I did
> my experiments on this many months ago and didn't try repeating them
> again now.

Oh, the issue is very real and has been known for years. It also wrecks
module unload and driver unbinding, since currently nothing makes sure your
drm_panel lives longer than your drm_device.

> In any case, the only way I could figure out around this was some sort
> of list. As mentioned in the commit message, it's super ugly and
> intended to be temporary. Once we solve all the current in-tree
> drivers, I'd imagine that long term for new DRM drivers this kind of
> thing would be discovered during bringup of new boards. Usually during
> bringup of new boards EEs measure timing signals and complain if
> they're not right. If some EE cared and said we weren't disabling the
> panel correctly at shutdown time then we'd know there was a problem.

You've stepped into an entire hornets' nest with this device dependency
issue unfortunately, I'm afraid :-/

Cheers, Sima
Daniel Vetter June 12, 2024, 3:13 p.m. UTC | #7
On Wed, Jun 12, 2024 at 07:39:01AM -0700, Doug Anderson wrote:
> Hi,
> 
> On Wed, Jun 12, 2024 at 1:09 AM Maxime Ripard <mripard@kernel.org> wrote:
> >
> > Hi,
> >
> > On Tue, Jun 11, 2024 at 07:48:51AM GMT, Douglas Anderson wrote:
> > > At shutdown if you've got a _properly_ coded DRM modeset driver then
> > > you'll get these two warnings at shutdown time:
> > >
> > >   Skipping disable of already disabled panel
> > >   Skipping unprepare of already unprepared panel
> > >
> > > These warnings are ugly and sound concerning, but they're actually a
> > > sign of a properly working system. That's not great.
> > >
> > > It's not easy to get rid of these warnings. Until we know that all DRM
> > > modeset drivers used with panel-simple and panel-edp are properly
> > > calling drm_atomic_helper_shutdown() or drm_helper_force_disable_all()
> > > then the panel drivers _need_ to disable/unprepare themselves in order
> > > to power off the panel cleanly. However, there are lots of DRM modeset
> > > drivers used with panel-edp and panel-simple and it's hard to know
> > > when we've got them all. Since the warning happens only on the drivers
> > > that _are_ updated there's nothing to encourage broken DRM modeset
> > > drivers to get fixed.
> > >
> > > In order to flip the warning to the proper place, we need to know
> > > which modeset drivers are going to shutdown properly. Though ugly, do
> > > this by creating a list of everyone that shuts down properly. This
> > > allows us to generate a warning for the correct case and also lets us
> > > get rid of the warning for drivers that are shutting down properly.
> > >
> > > Maintaining this list is ugly, but the idea is that it's only short
> > > term. Once everyone is converted we can delete the list and call it
> > > done. The list is ugly enough and adding to it is annoying enough that
> > > people should push to make this happen.
> > >
> > > Implement this all in a shared "header" file included by the two panel
> > > drivers that need it. This avoids us adding an new exports while still
> > > allowing the panel drivers to be modules. The code waste should be
> > > small and, as per above, the whole solution is temporary.
> > >
> > > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> > > ---
> > > I came up with this idea to help us move forward since otherwise I
> > > couldn't see how we were ever going to fix panel-simple and panel-edp
> > > since they're used by so many DRM Modeset drivers. It's a bit ugly but
> > > I don't hate it. What do others think?
> >
> > I don't think it's the right approach, even more so since we're so close
> > now to having it in every driver.
> >
> > I ran the coccinelle script we started with, and here are the results:
> >
> > ./drivers/gpu/drm/vmwgfx/vmwgfx_drv.c:1640:25-39: ERROR: KMS driver vmw_pci_driver is missing shutdown implementation
> > ./drivers/gpu/drm/kmb/kmb_drv.c:621:30-49: ERROR: KMS driver kmb_platform_driver is missing shutdown implementation
> > ./drivers/gpu/drm/tiny/arcpgu.c:422:30-52: ERROR: KMS driver arcpgu_platform_driver is missing shutdown implementation
> 
> Sure, although I think we agreed even back when we talked about this
> last that your coccinelle script wasn't guaranteed to catch every
> driver. ...so I guess the question is: are we willing to accept that
> we'll stop disabling panels at shutdown for any drivers that might
> were missed. For instance, looking at it by hand (which also could
> miss things), I previously thought that we also might need:
> 
> * nouveau
> * tegra
> * amdgpu
> * sprd
> * gma500
> * radeon
> 
> I sent patches for those drivers but they don't go through drm-misc
> and some of the drivers had a lot of abstraction layers and were hard
> to reason about. I'm also not 100% confident that all of those drivers
> really are affected--they'd have to be used with panel-simple or
> panel-edp...

Aside from amdgpu and radeon they're all in -misc now, and Alex is
generally fairly responsive.

> In any case, having some sort of warning that would give us a
> definitive answer would be nice. My proposed patch would give us that
> warning. I could even jump to a WARN_ON right from the start.

Yeah we defo want some warning to at least check this at runtime.

> My proposed patch is self-admittedly super ugly and is also designed
> to be temporary, so I don't think of this as giving up right before
> crossing the finish line but instead accepting a tiny bit of temporary
> ugliness to make sure that we don't accidentally regress anyone. I
> would really hope it would be obvious to anyone writing / reviewing
> drivers that the function I introduced isn't intended for anyone but
> panel-simple and panel-edp.

See my other reply for the proper design fix, and my apologies for what
you stepped into here :-/
-Sima
Douglas Anderson June 12, 2024, 4 p.m. UTC | #8
Hi,

On Wed, Jun 12, 2024 at 8:03 AM Maxime Ripard <mripard@kernel.org> wrote:
>
> > > Why does something like this not work?
> > >
> > > drm_panel_shutdown_fixup(panel)
> > > {
> > >         /* if you get warnings here, fix your main drm driver to call
> > >          * drm_atomic_helper_shutdown()
> > >          */
> > >         if (WARN_ON(panel->enabled))
> > >                 drm_panel_disable(panel);
> > >
> > >         if (WARN_ON(panel->prepared))
> > >                 drm_panel_unprepare(panel);
> > > }
> > >
> > > And then call that little helper in the relevant panel drivers? Also feel
> > > free to bikeshed the name and maybe put a more lengthly explainer into the
> > > kerneldoc for that ...
> > >
> > > Or am I completely missing the point here?
> >
> > The problem is that the ordering is wrong, I think. Even if the OS was
> > calling driver shutdown functions in the perfect order (which I'm not
> > convinced about since panels aren't always child "struct device"s of
> > the DRM device), the OS should be calling panel shutdown _before_
> > shutting down the DRM device. That means that with your suggestion:
> >
> > 1. Shutdown starts and panel is on.
> >
> > 2. OS calls panel shutdown call, which prints warnings because panel
> > is still on.
> >
> > 3. OS calls DRM driver shutdown call, which prints warnings because
> > someone else turned the panel off.
> >
> > :-P
> >
> > Certainly if I goofed and the above is wrong then let me know--I did
> > my experiments on this many months ago and didn't try repeating them
> > again now.
> >
> > In any case, the only way I could figure out around this was some sort
> > of list. As mentioned in the commit message, it's super ugly and
> > intended to be temporary. Once we solve all the current in-tree
> > drivers, I'd imagine that long term for new DRM drivers this kind of
> > thing would be discovered during bringup of new boards. Usually during
> > bringup of new boards EEs measure timing signals and complain if
> > they're not right. If some EE cared and said we weren't disabling the
> > panel correctly at shutdown time then we'd know there was a problem.
>
> Based on a discussion we had today With Sima on IRC, I think there's
> another way forward.
>
> We were actually discussing refcount'ing the panels to avoid lifetime
> issues. It would require some API overhaul to have a function to
> allocate the drm_panel structure and init'ing the refcount, plus some to
> get / put the references.
>
> Having this refcount would mean that we also get a release function now,
> called when the panel is free'd.
>
> Could we warn if the panel is still prepared/enabled and is about to be
> freed?
>
> It would require to switch panel-simple and panel-edp to that new API,
> but it should be easy enough.

I think there are two problems here:

1. The problem is at shutdown here. Memory isn't freed at shutdown
time. This isn't a lifetime issue. No release functions are involved
in shutdown and we don't free memory then.

2. As I tried to point out, even if we were guaranteed the correct
order it still doesn't help us. In other words: if all device links
were perfect and all ordering was proper then the panel should get
shutdown _before_ the DRM device. That means we can't put a check in
the panel code to see if the DRM device has been shutdown.


-Doug
Douglas Anderson June 12, 2024, 4 p.m. UTC | #9
Hi,

On Wed, Jun 12, 2024 at 8:11 AM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> > The problem is that the ordering is wrong, I think. Even if the OS was
> > calling driver shutdown functions in the perfect order (which I'm not
> > convinced about since panels aren't always child "struct device"s of
> > the DRM device), the OS should be calling panel shutdown _before_
> > shutting down the DRM device. That means that with your suggestion:
> >
> > 1. Shutdown starts and panel is on.
> >
> > 2. OS calls panel shutdown call, which prints warnings because panel
> > is still on.
> >
> > 3. OS calls DRM driver shutdown call, which prints warnings because
> > someone else turned the panel off.
>
> Uh, that's a _much_ more fundamental issue.
>
> The fix for that is telling the driver core about this dependency with
> device_link_add. Unfortuantely, despite years of me trying to push for
> this, drm_bridge and drm_panel still don't automatically add these,
> because the situation is a really complex mess.
>
> Probably need to read dri-devel archives for all the past attempts around
> device_link_add.
>
> But the solution is definitely not to have a manually tracked list, what's
> very architectural unsound way to tackle this problem.
>
> > Certainly if I goofed and the above is wrong then let me know--I did
> > my experiments on this many months ago and didn't try repeating them
> > again now.
>
> Oh the issue is very real and known since years. It also wreaks module
> unload and driver unbinding, since currently nothing makes sure your
> drm_panel lives longer than your drm_device.

In this case I'm mostly worried about the device "shutdown" call, so
it's not quite a lifetime issue but it is definitely related.

As per my reply to Maxime, though, I'd expect that if all ordering
issues were fixed and things were perfect then we'd still have a
problem. Specifically it would seem pretty wrong to me to say that the
panel is the "parent" of the DRM device, right? So if the panel is the
"child" of the DRM device that means it'll get shutdown first and that
means that the panel's shutdown call cannot be used to tell whether
the DRM device's shutdown call behaved properly.


> > In any case, the only way I could figure out around this was some sort
> > of list. As mentioned in the commit message, it's super ugly and
> > intended to be temporary. Once we solve all the current in-tree
> > drivers, I'd imagine that long term for new DRM drivers this kind of
> > thing would be discovered during bringup of new boards. Usually during
> > bringup of new boards EEs measure timing signals and complain if
> > they're not right. If some EE cared and said we weren't disabling the
> > panel correctly at shutdown time then we'd know there was a problem.
>
> You've stepped into an entire hornets nest with this device dependency
> issue unfortunately, I'm afraid :-/

As you've said, you've been working on this problem for years. Solving
the device link problem doesn't help me, but even if it did it's
really not fundamental to the problem here. The only need is to get a
warning printed out so we know for sure which DRM drivers need to be
updated before deleting the old crufty code. Blocking that on a
difficult / years-long struggle might not be the best.

That all being said, I'm also totally OK with any of the following:

1. Dropping my patch and just accepting that we will have warnings
printed out for all DRM drivers that do things correctly and have no
warnings for broken DRM drivers.

2. Someone else posting / landing a patch to remove the hacky "disable
/ unprepare" for panel-simple and panel-edp and asserting that they
don't care if they break any DRM drivers that are still broken. I
don't want to be involved in authoring or landing this patch, but I
won't scream loudly if others want to do it.

3. Someone else taking over trying to solve this problem.

...mostly this work is janitorial and I'm trying to help move the DRM
framework forward and get rid of cruft, so if it's going to cause too
much conflict I'm fine just stepping back.

-Doug
Douglas Anderson June 12, 2024, 4:03 p.m. UTC | #10
Hi,

On Wed, Jun 12, 2024 at 8:13 AM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> > > I ran the coccinelle script we started with, and here are the results:
> > >
> > > ./drivers/gpu/drm/vmwgfx/vmwgfx_drv.c:1640:25-39: ERROR: KMS driver vmw_pci_driver is missing shutdown implementation
> > > ./drivers/gpu/drm/kmb/kmb_drv.c:621:30-49: ERROR: KMS driver kmb_platform_driver is missing shutdown implementation
> > > ./drivers/gpu/drm/tiny/arcpgu.c:422:30-52: ERROR: KMS driver arcpgu_platform_driver is missing shutdown implementation
> >
> > Sure, although I think we agreed even back when we talked about this
> > last that your coccinelle script wasn't guaranteed to catch every
> > driver. ...so I guess the question is: are we willing to accept that
> > we'll stop disabling panels at shutdown for any drivers that might
> > were missed. For instance, looking at it by hand (which also could
> > miss things), I previously thought that we also might need:
> >
> > * nouveau
> > * tegra
> > * amdgpu
> > * sprd
> > * gma500
> > * radeon
> >
> > I sent patches for those drivers but they don't go through drm-misc
> > and some of the drivers had a lot of abstraction layers and were hard
> > to reason about. I'm also not 100% confident that all of those drivers
> > really are affected--they'd have to be used with panel-simple or
> > panel-edp...
>
> Aside from amdgpu and radeon they're all in -misc now, and Alex is
> generally fairly responsive.

Ah, nice! They weren't when I sent the patches ages ago. I guess I
should go ahead and repost things and maybe they'll get some traction.


> > In any case, having some sort of warning that would give us a
> > definitive answer would be nice. My proposed patch would give us that
> > warning. I could even jump to a WARN_ON right from the start.
>
> Yeah we defo want some warning to at least check this at runtime.

Yeah, my patch today currently just has a "dev_warn", but the question
is whether it would get more attention with a full on WARN_ON(). I
know WARN_ON() can be pretty controversial.

-Doug
Linus Walleij June 12, 2024, 4:47 p.m. UTC | #11
On Wed, Jun 12, 2024 at 5:11 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> On Wed, Jun 12, 2024 at 07:49:31AM -0700, Doug Anderson wrote:
(...)
> > The problem is that the ordering is wrong, I think. Even if the OS was
> > calling driver shutdown functions in the perfect order (which I'm not
> > convinced about since panels aren't always child "struct device"s of
> > the DRM device), the OS should be calling panel shutdown _before_
> > shutting down the DRM device. That means that with your suggestion:
> >
> > 1. Shutdown starts and panel is on.
> >
> > 2. OS calls panel shutdown call, which prints warnings because panel
> > is still on.
> >
> > 3. OS calls DRM driver shutdown call, which prints warnings because
> > someone else turned the panel off.
>
> Uh, that's a _much_ more fundamental issue.
>
> The fix for that is telling the driver core about this dependency with
> device_link_add. Unfortuantely, despite years of me trying to push for
> this, drm_bridge and drm_panel still don't automatically add these,
> because the situation is a really complex mess.
>
> Probably need to read dri-devel archives for all the past attempts around
> device_link_add.

I think involving Saravana Kannan in the discussions around this
is the right thing to do, because he knows how to get devicelinks
to do the right thing.

If we can describe what devicelink needs to do to get this ordering
right, I'm pretty sure Saravana can tell us how to do it.

Yours,
Linus Walleij
Douglas Anderson June 12, 2024, 4:52 p.m. UTC | #12
Sima,

On Wed, Jun 12, 2024 at 8:13 AM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> > > I ran the coccinelle script we started with, and here are the results:
> > >
> > > ./drivers/gpu/drm/vmwgfx/vmwgfx_drv.c:1640:25-39: ERROR: KMS driver vmw_pci_driver is missing shutdown implementation
> > > ./drivers/gpu/drm/kmb/kmb_drv.c:621:30-49: ERROR: KMS driver kmb_platform_driver is missing shutdown implementation
> > > ./drivers/gpu/drm/tiny/arcpgu.c:422:30-52: ERROR: KMS driver arcpgu_platform_driver is missing shutdown implementation
> >
> > Sure, although I think we agreed even back when we talked about this
> > last that your coccinelle script wasn't guaranteed to catch every
> > driver. ...so I guess the question is: are we willing to accept that
> > we'll stop disabling panels at shutdown for any drivers that might
> > were missed. For instance, looking at it by hand (which also could
> > miss things), I previously thought that we also might need:
> >
> > * nouveau
> > * tegra
> > * amdgpu
> > * sprd
> > * gma500
> > * radeon
> >
> > I sent patches for those drivers but they don't go through drm-misc
> > and some of the drivers had a lot of abstraction layers and were hard
> > to reason about. I'm also not 100% confident that all of those drivers
> > really are affected--they'd have to be used with panel-simple or
> > panel-edp...
>
> Aside from amdgpu and radeon they're all in -misc now, and Alex is
> generally fairly responsive.

Sorry for not keeping up with things, but can you point to where this
was documented or what patch changed things so that these drivers went
through drm-misc? From the MAINTAINERS file I see commit 5a44d50f0072
("MAINTAINERS: Update drm-misc entry to match all drivers") and that
shows several of these drivers as "X:". As far as I can tell that
means that they _aren't_ handled by drm-misc, right? Maybe the
decision was made elsewhere and MAINTAINERS was just not updated, or
I'm not looking in the right place? I checked drm-misc-next and
drm/next and, for instance, "tegra" and "kmb" still show as excluded.

-Doug
Douglas Anderson June 12, 2024, 5:22 p.m. UTC | #13
Hi,

On Wed, Jun 12, 2024 at 9:47 AM Linus Walleij <linus.walleij@linaro.org> wrote:
>
> On Wed, Jun 12, 2024 at 5:11 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> > On Wed, Jun 12, 2024 at 07:49:31AM -0700, Doug Anderson wrote:
> (...)
> > > The problem is that the ordering is wrong, I think. Even if the OS was
> > > calling driver shutdown functions in the perfect order (which I'm not
> > > convinced about since panels aren't always child "struct device"s of
> > > the DRM device), the OS should be calling panel shutdown _before_
> > > shutting down the DRM device. That means that with your suggestion:
> > >
> > > 1. Shutdown starts and panel is on.
> > >
> > > 2. OS calls panel shutdown call, which prints warnings because panel
> > > is still on.
> > >
> > > 3. OS calls DRM driver shutdown call, which prints warnings because
> > > someone else turned the panel off.
> >
> > Uh, that's a _much_ more fundamental issue.
> >
> > The fix for that is telling the driver core about this dependency with
> > device_link_add. Unfortunately, despite years of me trying to push for
> > this, drm_bridge and drm_panel still don't automatically add these,
> > because the situation is a really complex mess.
> >
> > Probably need to read dri-devel archives for all the past attempts around
> > device_link_add.
>
> I think involving Saravana Kannan in the discussions around this
> is the right thing to do, because he knows how to get devicelinks
> to do the right thing.
>
> If we can describe what devicelink needs to do to get this ordering
> right, I'm pretty sure Saravana can tell us how to do it.

I'm really not convinced that hacking with device links in order to
get the shutdown notification in the right order is correct, though.
The idea is that after we're confident that all DRM modeset drivers
are calling shutdown properly that we should _remove_ any code
handling shutdown from panel-edp and panel-simple. They should just
get disabled as part of DRM's shutdown. That means that if we messed
with devicelinks just to get a different shutdown order that it was
just for a short term thing.

That being said, one could argue that having device links between the
DRM device and the panel is the right thing long term anyway and that
may well be. I guess the issue is that it's not necessarily obvious
how the "parent/child" or "supplier/consumer" relationship works w/
DRM devices, especially panels. My instinct says that the panel
logically is a "child" or "consumer" of the DRM device and thus
inserting the correct long term device link would mean we'd get
shutdown notification in the wrong order. It would be hard to argue
that the panel is the "parent" of a DRM device, but I guess you could
call it a "supplier"? ...but it's also a "consumer" of some other
stuff, like the pixels being output and also (perhaps) the DP AUX bus.
All this complexity is why the DRM framework tends to use its own
logic for things like prepare/enable instead of using what Linux gives
you. I'm sure Saravana can also tell you about all the crazy device
link circular dependencies that DRM has thrown at him...

In any case, I guess I'll continue asserting that I'm not going to try
to solve this problem. If folks don't like my patch and there's no
suggestion other than solving years-old problems then I'm happy to
live with the way things are and hope that someone eventually comes
along and solves it.

-Doug
Daniel Vetter June 17, 2024, 2:09 p.m. UTC | #14
On Wed, Jun 12, 2024 at 09:52:40AM -0700, Doug Anderson wrote:
> Sima,
> 
> On Wed, Jun 12, 2024 at 8:13 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > > > I ran the coccinelle script we started with, and here are the results:
> > > >
> > > > ./drivers/gpu/drm/vmwgfx/vmwgfx_drv.c:1640:25-39: ERROR: KMS driver vmw_pci_driver is missing shutdown implementation
> > > > ./drivers/gpu/drm/kmb/kmb_drv.c:621:30-49: ERROR: KMS driver kmb_platform_driver is missing shutdown implementation
> > > > ./drivers/gpu/drm/tiny/arcpgu.c:422:30-52: ERROR: KMS driver arcpgu_platform_driver is missing shutdown implementation
> > >
> > > Sure, although I think we agreed even back when we talked about this
> > > last that your coccinelle script wasn't guaranteed to catch every
> > > driver. ...so I guess the question is: are we willing to accept that
> > > we'll stop disabling panels at shutdown for any drivers that might
> > > have been missed. For instance, looking at it by hand (which also could
> > > miss things), I previously thought that we also might need:
> > >
> > > * nouveau
> > > * tegra
> > > * amdgpu
> > > * sprd
> > > * gma500
> > > * radeon
> > >
> > > I sent patches for those drivers but they don't go through drm-misc
> > > and some of the drivers had a lot of abstraction layers and were hard
> > > to reason about. I'm also not 100% confident that all of those drivers
> > > really are affected--they'd have to be used with panel-simple or
> > > panel-edp...
> >
> > Aside from amdgpu and radeon they're all in -misc now, and Alex is
> > generally fairly responsive.
> 
> Sorry for not keeping up with things, but can you point to where this
> was documented or what patch changed things so that these drivers went
> through drm-misc? From the MAINTAINERS file I see commit 5a44d50f0072
> ("MAINTAINERS: Update drm-misc entry to match all drivers") and that
> shows several of these drivers as "X:". As far as I can tell that
> means that they _aren't_ handled by drm-misc, right? Maybe the
> decision was made elsewhere and MAINTAINERS was just not updated, or
> I'm not looking at the right place? I checked drm-misc-next and
> drm/next and, for instance, "tegra" and "kmb" still show as excluded.

Hm tegra moved meanwhile, but I guess it's not yet updated in MAINTAINERS
(or not everywhere), at least for bugfixes. I think nouveau also has
partially moved?

Anyway past a certain point of unresponsiveness, drm-misc serves as the
catch-all for everything and you can land with an ack from drm or drm-misc
maintainers.
-Sima
Daniel Vetter June 17, 2024, 2:17 p.m. UTC | #15
On Wed, Jun 12, 2024 at 09:00:29AM -0700, Doug Anderson wrote:
> Hi,
> 
> On Wed, Jun 12, 2024 at 8:11 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > > The problem is that the ordering is wrong, I think. Even if the OS was
> > > calling driver shutdown functions in the perfect order (which I'm not
> > > convinced about since panels aren't always child "struct device"s of
> > > the DRM device), the OS should be calling panel shutdown _before_
> > > shutting down the DRM device. That means that with your suggestion:
> > >
> > > 1. Shutdown starts and panel is on.
> > >
> > > 2. OS calls panel shutdown call, which prints warnings because panel
> > > is still on.
> > >
> > > 3. OS calls DRM driver shutdown call, which prints warnings because
> > > someone else turned the panel off.
> >
> > Uh, that's a _much_ more fundamental issue.
> >
> > The fix for that is telling the driver core about this dependency with
> > device_link_add. Unfortunately, despite years of me trying to push for
> > this, drm_bridge and drm_panel still don't automatically add these,
> > because the situation is a really complex mess.
> >
> > Probably need to read dri-devel archives for all the past attempts around
> > device_link_add.
> >
> > But the solution is definitely not to have a manually tracked list, which is
> > a very architecturally unsound way to tackle this problem.
> >
> > > Certainly if I goofed and the above is wrong then let me know--I did
> > > my experiments on this many months ago and didn't try repeating them
> > > again now.
> >
> > Oh the issue is very real and has been known for years. It also wrecks
> > module unload and driver unbinding, since currently nothing makes sure your
> > drm_panel lives longer than your drm_device.
> 
> In this case I'm mostly worried about the device "shutdown" call, so
> it's not quite a lifetime issue but it is definitely related.

There's the "this is for pm" type of device_link; they have a few
flavours. And if that doesn't yet cover ->shutdown, then I guess that
needs to be addressed.

> As per my reply to Maxime, though, I'd expect that if all ordering
> issues were fixed and things were perfect then we'd still have a
> problem. Specifically it would seem pretty wrong to me to say that the
> panel is the "parent" of the DRM device, right? So if the panel is the
> "child" of the DRM device that means it'll get shutdown first and that
> means that the panel's shutdown call cannot be used to tell whether
> the DRM device's shutdown call behaved properly.

The device_link (if you add it the correct way round) means a
provider-consumer relationship. Which means:

- driver core unbinds the consumer before the provider (e.g. on module
  unload)
- driver core disables the consumer before the provider for power
  management stuff
- driver core enables providers before consumers when enabling power again

So yeah this should work the right way round ...
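
As a rough sketch (the helper name is made up, and this glosses over
where exactly the link would be created), the direction would be
consumer = drm_device, supplier = panel:

#include <linux/device.h>
#include <drm/drm_device.h>
#include <drm/drm_panel.h>
#include <drm/drm_print.h>

static void foo_link_panel(struct drm_device *drm, struct drm_panel *panel)
{
	/*
	 * Tie the drm_device (consumer) to the panel's struct device
	 * (supplier) so the driver core handles the drm side before the
	 * panel on unbind and for pm ordering.
	 */
	struct device_link *link;

	link = device_link_add(drm->dev, panel->dev,
			       DL_FLAG_AUTOREMOVE_CONSUMER);
	if (!link)
		drm_warn(drm, "failed to link panel %s\n",
			 dev_name(panel->dev));
}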

> > > In any case, the only way I could figure out around this was some sort
> > > of list. As mentioned in the commit message, it's super ugly and
> > > intended to be temporary. Once we solve all the current in-tree
> > > drivers, I'd imagine that long term for new DRM drivers this kind of
> > > thing would be discovered during bringup of new boards. Usually during
> > > bringup of new boards EEs measure timing signals and complain if
> > > they're not right. If some EE cared and said we weren't disabling the
> > > panel correctly at shutdown time then we'd know there was a problem.
> >
> > You've stepped into an entire hornets nest with this device dependency
> > issue unfortunately, I'm afraid :-/
> 
> As you've said, you've been working on this problem for years. Solving
> the device link problem doesn't help me, but even if it did it's
> really not fundamental to the problem here. The only need is to get a
> warning printed out so we know for sure which DRM drivers need to be
> updated before deleting the old crufty code. Blocking that on a
> difficult / years-long struggle might not be the best.

The issue is that everyone just gives up, so it's not moving.

> That all being said, I'm also totally OK with any of the following:
> 
> 1. Dropping my patch and just accepting that we will have warnings
> printed out for all DRM drivers that do things correctly and have no
> warnings for broken DRM drivers.

Can't we just flip the warnings around? Like make the hacky cleanup
conditional on the panel not yet being disabled/cleaned up, and complain
in that case only (see the sketch after this list). With that:
- drivers which call shutdown shouldn't get a warning anymore, and also
  not a surplus call to drm_panel_disable/unprepare
- drivers which are broken still get the cleanup calls
- downside: we can't enforce this, because it's not yet enforced through
  device_link ordering
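
Concretely, a minimal sketch of that flipped check on the panel-simple
side (the warning wording is made up) could be:

static void panel_simple_shutdown(struct device *dev)
{
	struct panel_simple *panel = dev_get_drvdata(dev);

	/*
	 * Nothing to do if the DRM driver already shut the panel down
	 * via drm_atomic_helper_shutdown(); only broken drivers get the
	 * warning and the manual cleanup.
	 */
	if (!panel->base.enabled && !panel->base.prepared)
		return;

	dev_warn(dev, "DRM driver did not shut down the panel; doing it manually\n");

	if (panel->base.enabled)
		drm_panel_disable(&panel->base);
	if (panel->base.prepared)
		drm_panel_unprepare(&panel->base);
}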

> 2. Someone else posting / landing a patch to remove the hacky "disable
> / unprepare" for panel-simple and panel-edp and asserting that they
> don't care if they break any DRM drivers that are still broken. I
> don't want to be involved in authoring or landing this patch, but I
> won't scream loudly if others want to do it.
> 
> 3. Someone else taking over trying to solve this problem.
> 
> ...mostly this work is janitorial and I'm trying to help move the DRM
> framework forward and get rid of cruft, so if it's going to cause too
> much conflict I'm fine just stepping back.

My issue is that you're trading an ugly warning that hurts maintenance
with an explicit list of working drivers, which also hurts maintenance.
Doesn't seem like forward progress to me, just pushing the issue around.
-Sima
Daniel Vetter June 17, 2024, 2:22 p.m. UTC | #16
On Wed, Jun 12, 2024 at 10:22:49AM -0700, Doug Anderson wrote:
> Hi,
> 
> On Wed, Jun 12, 2024 at 9:47 AM Linus Walleij <linus.walleij@linaro.org> wrote:
> >
> > On Wed, Jun 12, 2024 at 5:11 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> > > On Wed, Jun 12, 2024 at 07:49:31AM -0700, Doug Anderson wrote:
> > (...)
> > > > The problem is that the ordering is wrong, I think. Even if the OS was
> > > > calling driver shutdown functions in the perfect order (which I'm not
> > > > convinced about since panels aren't always child "struct device"s of
> > > > the DRM device), the OS should be calling panel shutdown _before_
> > > > shutting down the DRM device. That means that with your suggestion:
> > > >
> > > > 1. Shutdown starts and panel is on.
> > > >
> > > > 2. OS calls panel shutdown call, which prints warnings because panel
> > > > is still on.
> > > >
> > > > 3. OS calls DRM driver shutdown call, which prints warnings because
> > > > someone else turned the panel off.
> > >
> > > Uh, that's a _much_ more fundamental issue.
> > >
> > > The fix for that is telling the driver core about this dependency with
> > > device_link_add. Unfortunately, despite years of me trying to push for
> > > this, drm_bridge and drm_panel still don't automatically add these,
> > > because the situation is a really complex mess.
> > >
> > > Probably need to read dri-devel archives for all the past attempts around
> > > device_link_add.
> >
> > I think involving Saravana Kannan in the discussions around this
> > is the right thing to do, because he knows how to get devicelinks
> > to do the right thing.
> >
> > If we can describe what devicelink needs to do to get this ordering
> > right, I'm pretty sure Saravana can tell us how to do it.
> 
> I'm really not convinced that hacking with device links in order to
> get the shutdown notification in the right order is correct, though.
> The idea is that after we're confident that all DRM modeset drivers
> are calling shutdown properly that we should _remove_ any code
> handling shutdown from panel-edp and panel-simple. They should just
> get disabled as part of DRM's shutdown. That means that if we messed
> with devicelinks just to get a different shutdown order that it was
> just for a short term thing.

The device links would allow us to add consistency checks to the panel
code to make sure that the shutdown has already happened.

And we do kinda need the device ordering still, because if they're shut
down in the wrong order the panel might lose its power already, before
the drm driver had a chance to have the last chat with it. Only relevant
for non-dumb panels like dsi, but there's cases.

> That being said, one could argue that having device links between the
> DRM device and the panel is the right thing long term anyway and that
> may well be. I guess the issue is that it's not necessarily obvious
> how the "parent/child" or "supplier/consumer" relationship works w/
> DRM devices, especially panels. My instinct says that the panel
> logically is a "child" or "consumer" of the DRM device and thus
> inserting the correct long term device link would mean we'd get
> shutdown notification in the wrong order. It would be hard to argue
> that the panel is the "parent" of a DRM device, but I guess you could
> call it a "supplier"? ...but it's also a "consumer" of some other
> stuff, like the pixels being output and also (perhaps) the DP AUX bus.
> All this complexity is why the DRM framework tends to use its own
> logic for things like prepare/enable instead of using what Linux gives
> you. I'm sure Saravana can also tell you about all the crazy device
> link circular dependencies that DRM has thrown at him...

The panel driver provides the panel, the drm device driver consumes it.
I'm not really clear why you want to structure this the other way round, I
can't come up with another way that makes sense from a device driver
model. And it's device driver model stuff here, not what's really going on
at the hardware level when everything is set up.

> In any case, I guess I'll continue asserting that I'm not going to try
> to solve this problem. If folks don't like my patch and there's no
> suggestion other than solving years-old problems then I'm happy to
> live with the way things are and hope that someone eventually comes
> along and solves it.

See my other reply, I do think there's an intermediate solution, without
the maintenance headache of a "these drivers work for this extremely
narrow special case" list. And without having to step into the device_link
complexity.

But also I think we've piled up years of hacks by now because people don't
want to solve the fundamental problem here, which device links are meant
to be the solution for. But I guess we can do another few years of
papering over the fundamental crack, *shrugs*

Cheers, Sima
Douglas Anderson June 18, 2024, 11:49 p.m. UTC | #17
Hi,

On Mon, Jun 17, 2024 at 7:17 AM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> > That all being said, I'm also totally OK with any of the following:
> >
> > 1. Dropping my patch and just accepting that we will have warnings
> > printed out for all DRM drivers that do things correctly and have no
> > warnings for broken DRM drivers.
>
> Can't we just flip the warnings around? Like make the hacky cleanup
> conditional on the panel not yet being disabled/cleaned up, and complain
> in that case only. With that:
> - drivers which call shutdown shouldn't get a warning anymore, and also
>   not a surplus call to drm_panel_disable/unprepare
> - drivers which are broken still get the cleanup calls
> - downside: we can't enforce this, because it's not yet enforced through
>   device_link ordering

I feel like something is getting lost in the discussion here. I'm just
not sure where to put the hacky cleanup without either having a list
like I have or fixing the device link problem so that the DRM device
shutdown gets called before the panel. Specifically, right now I think
it's possible for the panel's shutdown() callback to happen before the
DRM Device's shutdown(). Thus, we have:

1. Panel shutdown() checks and says "it's not shutdown yet so do my
hacky cleanup."
2. DRM device shutdown() gets called and tries to cleanup again.

...and thus in step #1 we can't detect a broken DRM device. What am I missing?


> > 2. Someone else posting / landing a patch to remove the hacky "disable
> > / unprepare" for panel-simple and panel-edp and asserting that they
> > don't care if they break any DRM drivers that are still broken. I
> > don't want to be involved in authoring or landing this patch, but I
> > won't scream loudly if others want to do it.
> >
> > 3. Someone else taking over trying to solve this problem.
> >
> > ...mostly this work is janitorial and I'm trying to help move the DRM
> > framework forward and get rid of cruft, so if it's going to cause too
> > much conflict I'm fine just stepping back.
>
> My issue is that you're trading an ugly warning that hurts maintenance
> with an explicit list of working drivers, which also hurts maintenance.
> > Doesn't seem like forward progress to me, just pushing the issue around.

IMO it at least moves things forward. If we make the warning obvious
enough then I think we could assert that, within a few kernel
versions, everyone who hit the warning would have addressed it. Once
that happens we can fully get rid of the ugly list and just make the
assumption that things are good. That feels like a clear path to me...

-Doug
Douglas Anderson June 18, 2024, 11:49 p.m. UTC | #18
Hi,

On Mon, Jun 17, 2024 at 7:22 AM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> > I'm really not convinced that hacking with device links in order to
> > get the shutdown notification in the right order is correct, though.
> > The idea is that after we're confident that all DRM modeset drivers
> > are calling shutdown properly that we should _remove_ any code
> > handling shutdown from panel-edp and panel-simple. They should just
> > get disabled as part of DRM's shutdown. That means that if we messed
> > with devicelinks just to get a different shutdown order that it was
> > just for a short term thing.
>
> The device links would allow us to add consistency checks to the panel
> code to make sure that the shutdown has already happened.
>
> And we do kinda need the device ordering still, because if they're shut
> > down in the wrong order the panel might lose its power already, before
> the drm driver had a chance to have the last chat with it. Only relevant
> for non-dumb panels like dsi, but there's cases.

My impression is that we shouldn't be relying on the driver-level
"shutdown" call at all but should instead be relying on DRM core to
call us at the right times. I get this impression based on the fact
that panel drivers are encouraged _not_ to implement a shutdown()
callback but instead to rely on the DRM driver to do an orderly power
off of things (like via drm_atomic_helper_shutdown()) at shutdown
time.
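
For reference, the pattern being asked of modeset drivers is roughly the
following (sketch only; "foo" is a made-up platform driver):

#include <linux/platform_device.h>
#include <drm/drm_atomic_helper.h>

static void foo_shutdown(struct platform_device *pdev)
{
	struct drm_device *drm = platform_get_drvdata(pdev);

	/* Nothing to do if probe never bound a drm device */
	if (!drm)
		return;

	/* Disables CRTCs, encoders, bridges and panels in the usual kms order */
	drm_atomic_helper_shutdown(drm);
}

...with foo_shutdown wired up as the platform_driver's .shutdown hook
(and usually also called from the remove path).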

I would also tend to say that for suspend/resume that things are more
complicated than the driver model can really account for, which again
is why DRM devices have prepare() and enable() phases with complicated
rules about the ordering that the bridge chain runs those functions
in.

Said another way, I believe I could re-phrase your paragraph above and
say the exact opposite and it would be just as true:

And we do kinda need the device ordering still, because if they're
shut down in the wrong order then the DRM device might lose its power
already, before the panel driver has a chance to use it to send a few
last commands to the panel.

...but that would imply the exact opposite ordering. The issue is that
there are established rules for the order things are powered on and
off and those rules are encoded in the orders that prepare() and
enable() are called. Trying to replicate these rules in the driver
model just seems like something destined to fail and probably causes
everyone who attempts this to give up.


> > That being said, one could argue that having device links between the
> > DRM device and the panel is the right thing long term anyway and that
> > may well be. I guess the issue is that it's not necessarily obvious
> > how the "parent/child" or "supplier/consumer" relationship works w/
> > DRM devices, especially panels. My instinct says that the panel
> > logically is a "child" or "consumer" of the DRM device and thus
> > inserting the correct long term device link would mean we'd get
> > shutdown notification in the wrong order. It would be hard to argue
> > that the panel is the "parent" of a DRM device, but I guess you could
> > call it a "supplier"? ...but it's also a "consumer" of some other
> > stuff, like the pixels being output and also (perhaps) the DP AUX bus.
> > All this complexity is why the DRM framework tends to use its own
> > logic for things like prepare/enable instead of using what Linux gives
> > you. I'm sure Saravana can also tell you about all the crazy device
> > link circular dependencies that DRM has thrown at him...
>
> The panel driver provides the panel, the drm device driver consumes it.
> I'm not really clear why you want to structure this the other way round, I
> can't come up with another way that makes sense from a device driver
> model. And it's device driver model stuff here, not what's really going on
> at the hardware level when everything is set up.

...but, at least on eDP, the DRM device driver provides the DP AUX bus
and the panel consumes it. This is the reverse order. Perhaps you
could say that you should have a separate "struct device" for the DP
AUX bus and the panel consumes the DP AUX bus device but then the DRM
device consumes the panel?

-Doug
Daniel Vetter June 21, 2024, 4:04 p.m. UTC | #19
On Tue, Jun 18, 2024 at 04:49:22PM -0700, Doug Anderson wrote:
> Hi,
> 
> On Mon, Jun 17, 2024 at 7:17 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > > That all being said, I'm also totally OK with any of the following:
> > >
> > > 1. Dropping my patch and just accepting that we will have warnings
> > > printed out for all DRM drivers that do things correctly and have no
> > > warnings for broken DRM drivers.
> >
> > Can't we just flip the warnings around? Like make the hacky cleanup
> > conditional on the panel not yet being disabled/cleaned up, and complain
> > in that case only. With that:
> > - drivers which call shutdown shouldn't get a warning anymore, and also
> >   not a surplus call to drm_panel_disable/unprepare
> > - drivers which are broken still get the cleanup calls
> > - downside: we can't enforce this, because it's not yet enforced through
> >   device_link ordering
> 
> I feel like something is getting lost in the discussion here. I'm just
> not sure where to put the hacky cleanup without either having a list
> like I have or fixing the device link problem so that the DRM device
> shutdown gets called before the panel. Specifically, right now I think
> it's possible for the panel's shutdown() callback to happen before the
> DRM Device's shutdown(). Thus, we have:
> 
> 1. Panel shutdown() checks and says "it's not shutdown yet so do my
> hacky cleanup."
> 2. DRM device shutdown() gets called and tries to cleanup again.
> 
> ...and thus in step #1 we can't detect a broken DRM device. What am I missing?

The above is a broken drm driver, because shutting down something that the
main drm driver needs _before_ it is shut down itself is broken. That's
why we need the device link.

So if this case goes a bit wrong that's imo ok. That was my point:
without device links we cannot have _any_ warning at all, but we can at
least make sure that correct drivers, meaning drivers that
- make sure the panel is around for longer than the drm device, and
- call drm_atomic_helper_shutdown() in the right places,

won't double-shutdown the panel or get the warning.

It's not great, but it's at least better than the current situation where
correct drivers get a warning, and some broken drivers don't. So still an
improvement.

That leaves us with the issue of warning for all broken drivers. We have
two proposals for that:

- Your explicit list, which is a pain imo, and I'm not seeing the benefit
  of this, because it'll encourage each driver to hack around the core
  code bug of not having the right device links.

- Fixing this with a device link and adding the warning for everyone.

This isn't a great situation, because it's likely going to be another few
years before the device link situation is sorted out. But that's been the
case already for years so *shrug*.

> > > 2. Someone else posting / landing a patch to remove the hacky "disable
> > > / unprepare" for panel-simple and panel-edp and asserting that they
> > > don't care if they break any DRM drivers that are still broken. I
> > > don't want to be involved in authoring or landing this patch, but I
> > > won't scream loudly if others want to do it.
> > >
> > > 3. Someone else taking over trying to solve this problem.
> > >
> > > ...mostly this work is janitorial and I'm trying to help move the DRM
> > > framework forward and get rid of cruft, so if it's going to cause too
> > > much conflict I'm fine just stepping back.
> >
> > My issue is that you're trading an ugly warning that hurts maintenance
> > with an explicit list of working drivers, which also hurts maintenance.
> > Doesn't seem like forward progress to me, just pushing the issue around.
> 
> IMO it at least moves things forward. If we make the warning obvious
> enough then I think we could assert that, within a few kernel
> versions, everyone who hit the warning would have addressed it. Once
> that happens we can fully get rid of the ugly list and just make the
> assumption that things are good. That feels like a clear path to me...

Yeah, but your warning I think just encourages more hacks in drivers that
shouldn't be there (for the ordering issue). So I'm not sure it's strictly
better.

And we have _tons_ of drm driver api misuse that we don't catch with
warnings and checks. Sometimes that's just not possible, because the
situation is too messy. If we add an explicit "I'm not broken" list for
each such case, we will have an unmaintainable mess. Sometimes an "I'm the
last broken driver" flag makes sense, but even there I'm cautious about
whether it's a bright idea.

Cheers, Sima
Daniel Vetter June 21, 2024, 4:14 p.m. UTC | #20
On Tue, Jun 18, 2024 at 04:49:32PM -0700, Doug Anderson wrote:
> Hi,
> 
> On Mon, Jun 17, 2024 at 7:22 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > > I'm really not convinced that hacking with device links in order to
> > > get the shutdown notification in the right order is correct, though.
> > > The idea is that after we're confident that all DRM modeset drivers
> > > are calling shutdown properly that we should _remove_ any code
> > > handling shutdown from panel-edp and panel-simple. They should just
> > > get disabled as part of DRM's shutdown. That means that if we messed
> > > with devicelinks just to get a different shutdown order that it was
> > > just for a short term thing.
> >
> > The device links would allow us to add consistency checks to the panel
> > code to make sure that the shutdown has already happened.
> >
> > And we do kinda need the device ordering still, because if they're shut
> > down in the wrong order the panel might lose its power already, before
> > the drm driver had a chance to have the last chat with it. Only relevant
> > for non-dumb panels like dsi, but there's cases.
> 
> My impression is that we shouldn't be relying on the driver-level
> "shutdown" call at all but should instead be relying on DRM core to
> call us at the right times. I get this impression based on the fact
> that panel drivers are encouraged _not_ to implement a shutdown()
> callback but instead to rely on the DRM driver to do an orderly power
> off of things (like via drm_atomic_helper_shutdown()) at shutdown
> time.
> 
> I would also tend to say that for suspend/resume that things are more
> complicated than the driver model can really account for, which again
> is why DRM devices have prepare() and enable() phases with complicated
> rules about the ordering that the bridge chain runs those functions
> in.
> 
> Said another way, I believe I could re-phrase your paragraph above and
> say the exact opposite and it would be just as true:
> 
> And we do kinda need the device ordering still, because if they're
> shut down in the wrong order then the DRM device might lose its power
> already, before the panel driver has a chance to use it to send a few
> last commands to the panel.
> 
> ...but that would imply the exact opposite ordering. The issue is that
> there are established rules for the order things are powered on and
> off and those rules are encoded in the orders that prepare() and
> enable() are called. Trying to replicate these rules in the driver
> model just seems like something destined to fail and probably causes
> everyone who attempts this to give up.

Both are true.

a) panels need a lot more ordering than what the pm core provides

b) we still need to make sure that all the _other_ things the pm core
manages (like power/clock domains) don't get shut down too early

If the panel is gone, there's no point in the drm driver calling unprepare
and disable in the right order; the thing is off already.

So yeah defacto this means:
- panel and bridge drivers should _not_ put anything that manages drm kms
  state into their pm callbacks. None of them. The only stuff they can do
  in there is essentially undo anything they do in ->probe. But then I
  have questions why they're wasting power even when not in use ...

- drm driver must be in charge of the entire sequence, and the
  panel/bridge drivers must do any pm management in their kms callbacks

- but we still need to make sure that the pm core doesn't shut down the
  panel/bridges too early, because there's parent resources that the pm
  core manages for us (and they might be entirely disjoint from the
  display device, e.g. if your panel is on an i2c bus, then that i2c must
  stay powered on until after the drm device has done the pm handling)

> > > That being said, one could argue that having device links between the
> > > DRM device and the panel is the right thing long term anyway and that
> > > may well be. I guess the issue is that it's not necessarily obvious
> > > how the "parent/child" or "supplier/consumer" relationship works w/
> > > DRM devices, especially panels. My instinct says that the panel
> > > logically is a "child" or "consumer" of the DRM device and thus
> > > inserting the correct long term device link would mean we'd get
> > > shutdown notification in the wrong order. It would be hard to argue
> > > that the panel is the "parent" of a DRM device, but I guess you could
> > > call it a "supplier"? ...but it's also a "consumer" of some other
> > > stuff, like the pixels being output and also (perhaps) the DP AUX bus.
> > > All this complexity is why the DRM framework tends to use its own
> > > logic for things like prepare/enable instead of using what Linux gives
> > > you. I'm sure Saravana can also tell you about all the crazy device
> > > link circular dependencies that DRM has thrown at him...
> >
> > The panel driver provides the panel, the drm device driver consumes it.
> > I'm not really clear why you want to structure this the other way round, I
> > can't come up with another way that makes sense from a device driver
> > model. And it's device driver model stuff here, not what's really going on
> > at the hardware level when everything is set up.
> 
> ...but, at least on eDP, the DRM device driver provides the DP AUX bus
> and the panel consumes it. This is the reverse order.

Nah, still the same order as with a separate panel driver:

- Whatever physical bus driver the panel sits on is the parent bus for the
  panel. Whether that's i2c, spi, dsi or dp aux kinda doesn't matter.

- The panel driver provides the panel for the drm_device. Note that
  drm_device does have a struct device pointer, but that's just for
  nesting/linking/userspace device enumeration. There's _no_ pm
  relationship between the two

- It's up to the drm driver to figure out how to make sure it shuts down
  the software drm_device _before_ any of the hardware pieces go away. For
  pci drivers this is usually pretty easy, for anything else you need a
  device link or it'll go wrong in corner cases.

> Perhaps you
> could say that you should have a separate "struct device" for the DP
> AUX bus and the panel consumes the DP AUX bus device but then the DRM
> device consumes the panel?

Yes. And I think for buses like the mipi dsi one we really should have
that. And indeed, we do:

struct mipi_dsi_device {
	struct mipi_dsi_host *host;
	struct device dev;

	...

So it all checks out.

The edp case is special: we don't need per-panel edp drivers because
it's all standardized. Therefore the full driver model with a
full-blown struct device is overkill. But conceptually, it should be
there.

Cheers, Sima
diff mbox series

Patch

diff --git a/drivers/gpu/drm/drm_panel.c b/drivers/gpu/drm/drm_panel.c
index cfbe020de54e..df3f15f4625e 100644
--- a/drivers/gpu/drm/drm_panel.c
+++ b/drivers/gpu/drm/drm_panel.c
@@ -161,6 +161,12 @@  int drm_panel_unprepare(struct drm_panel *panel)
 	if (!panel)
 		return -EINVAL;
 
+	/*
+	 * If you're seeing this warning, you either need to add your driver
+	 * to "drm_drivers_that_shutdown" (if you're seeing it with panel-edp
+	 * or panel-simple) or you need to remove the manual call to
+	 * drm_panel_unprepare() in your panel driver.
+	 */
 	if (!panel->prepared) {
 		dev_warn(panel->dev, "Skipping unprepare of already unprepared panel\n");
 		return 0;
@@ -245,6 +251,12 @@  int drm_panel_disable(struct drm_panel *panel)
 	if (!panel)
 		return -EINVAL;
 
+	/*
+	 * If you're seeing this warning, you either need to add your driver
+	 * to "drm_drivers_that_shutdown" (if you're seeing it with panel-edp
+	 * or panel-simple) or you need to remove the manual call to
+	 * drm_panel_disable() in your panel driver.
+	 */
 	if (!panel->enabled) {
 		dev_warn(panel->dev, "Skipping disable of already disabled panel\n");
 		return 0;
diff --git a/drivers/gpu/drm/panel/panel-drm-shutdown-check.h b/drivers/gpu/drm/panel/panel-drm-shutdown-check.h
new file mode 100644
index 000000000000..dfa8197e09fb
--- /dev/null
+++ b/drivers/gpu/drm/panel/panel-drm-shutdown-check.h
@@ -0,0 +1,151 @@ 
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright 2024 Google Inc.
+ *
+ * This header is a temporary solution and is intended to be included
+ * directly by panel-edp.c and panel-simple.c.
+ *
+ * This header is needed because panel-edp and panel-simple are used by a
+ * wide variety of DRM drivers and it's hard to know for sure if all of the
+ * DRM drivers used by those panel drivers are properly calling
+ * drm_atomic_helper_shutdown() or drm_helper_force_disable_all() at
+ * shutdown/remove time.
+ *
+ * The plan for this header file:
+ * - Land it and hope that the warning print will encourage DRM drivers to
+ *   get fixed.
+ * - Eventually move to a WARN() splat for extra encouragement.
+ * - Assume that everyone has been fixed and remove this header file.
+ */
+
+#ifndef __PANEL_DRM_SHUTDOWN_CHECK_H__
+#define __PANEL_DRM_SHUTDOWN_CHECK_H__
+
+#include <drm/drm_bridge.h>
+#include <drm/drm_drv.h>
+
+/*
+ * This is a list of all DRM drivers that appear to properly call
+ * drm_atomic_helper_shutdown() or drm_helper_force_disable_all() at
+ * shutdown and remove time.
+ *
+ * We can't detect this dynamically and are stuck with a list because the panel
+ * driver's shutdown() call might be called _before_ the DRM driver's
+ * shutdown() call.
+ *
+ * NOTE: no verification has been done to confirm that the below drivers
+ * are actually _used_ with panel-simple or panel-edp, only that these drivers
+ * appear to be shutting down properly. It doesn't hurt to have extra drivers
+ * listed here as long as the list doesn't contain any drivers that are
+ * missing the shutdown calls.
+ */
+static const char * const drm_drivers_that_shutdown[] = {
+	"armada-drm",
+	"aspeed-gfx-drm",
+	"ast",
+	"atmel-hlcdc",
+	"bochs-drm",
+	"cirrus",
+	"exynos",
+	"fsl-dcu-drm",
+	"gm12u320",
+	"gud",
+	"hdlcd",
+	"hibmc",
+	"hx8357d",
+	"hyperv_drm",
+	"ili9163",
+	"ili9225",
+	"ili9341",
+	"ili9486",
+	"imx-dcss",
+	"imx-drm",
+	"imx-lcdc",
+	"imx-lcdif",
+	"ingenic-drm",
+	"kirin",
+	"komeda",
+	"logicvc-drm",
+	"loongson",
+	"mali-dp",
+	"mcde",
+	"meson",
+	"mgag200",
+	"mi0283qt",
+	"msm",
+	"mxsfb-drm",
+	"omapdrm",
+	"panel-mipi-dbi",
+	"pl111",
+	"qxl",
+	"rcar-du",
+	"repaper",
+	"rockchip",
+	"rzg2l-du",
+	"ssd130x",
+	"st7586",
+	"st7735r",
+	"sti",
+	"stm",
+	"sun4i-drm",
+	"tidss",
+	"tilcdc",
+	"tve200",
+	"vboxvideo",
+	"zynqmp-dpsub",
+	""
+};
+
+static void panel_shutdown_if_drm_driver_needs_fixing(struct drm_panel *panel)
+{
+	struct drm_bridge *bridge;
+	const struct drm_driver *driver;
+	const char * const *driver_name;
+
+	/*
+	 * Look for a bridge that shares the DT node of this panel. That only
+	 * works if we've been linked up with a panel_bridge.
+	 */
+	bridge = of_drm_find_bridge(panel->dev->of_node);
+	if (bridge && bridge->dev && bridge->dev->driver) {
+		/*
+		 * If the DRM driver for the bridge is known to be fine then
+		 * we're done.
+		 */
+		driver = bridge->dev->driver;
+		for (driver_name = drm_drivers_that_shutdown; *driver_name; driver_name++) {
+			if (strcmp(*driver_name, driver->name) == 0)
+				return;
+		}
+
+		/*
+		 * If you see the message below then:
+		 * 1. Make sure your DRM driver is properly calling
+		 *    drm_atomic_helper_shutdown() or drm_helper_force_disable_all()
+		 *    at shutdown time.
+		 * 2. Add your driver to the list.
+		 */
+		dev_warn(panel->dev,
+			 "DRM driver appears buggy; manually disable/unprepare\n");
+	} else {
+		/*
+		 * If you see the message below then your setup needs to
+		 * be moved to using a panel_bridge. This often happens
+		 * by calling devm_drm_of_get_bridge(). Having a panel without
+		 * an associated panel_bridge is deprecated.
+		 */
+		dev_warn(panel->dev,
+			 "Can't find DRM driver; manually disable/unprepare\n");
+	}
+
+	/*
+	 * If we don't know if a DRM driver is properly shutting things down
+	 * then we'll manually call the disable/unprepare. This is always a
+	 * safe thing to do (in that it won't cause you to crash), but it
+	 * does generate a warning.
+	 */
+	drm_panel_disable(panel);
+	drm_panel_unprepare(panel);
+}
+
+#endif
diff --git a/drivers/gpu/drm/panel/panel-edp.c b/drivers/gpu/drm/panel/panel-edp.c
index 67ab6915d6e4..26f89858df9d 100644
--- a/drivers/gpu/drm/panel/panel-edp.c
+++ b/drivers/gpu/drm/panel/panel-edp.c
@@ -42,6 +42,8 @@ 
 #include <drm/drm_edid.h>
 #include <drm/drm_panel.h>
 
+#include "panel-drm-shutdown-check.h"
+
 /**
  * struct panel_delay - Describes delays for a simple panel.
  */
@@ -948,22 +950,7 @@  static void panel_edp_shutdown(struct device *dev)
 {
 	struct panel_edp *panel = dev_get_drvdata(dev);
 
-	/*
-	 * NOTE: the following two calls don't really belong here. It is the
-	 * responsibility of a correctly written DRM modeset driver to call
-	 * drm_atomic_helper_shutdown() at shutdown time and that should
-	 * cause the panel to be disabled / unprepared if needed. For now,
-	 * however, we'll keep these calls due to the sheer number of
-	 * different DRM modeset drivers used with panel-edp. The fact that
-	 * we're calling these and _also_ the drm_atomic_helper_shutdown()
-	 * will try to disable/unprepare means that we can get a warning about
-	 * trying to disable/unprepare an already disabled/unprepared panel,
-	 * but that's something we'll have to live with until we've confirmed
-	 * that all DRM modeset drivers are properly calling
-	 * drm_atomic_helper_shutdown().
-	 */
-	drm_panel_disable(&panel->base);
-	drm_panel_unprepare(&panel->base);
+	panel_shutdown_if_drm_driver_needs_fixing(&panel->base);
 }
 
 static void panel_edp_remove(struct device *dev)
diff --git a/drivers/gpu/drm/panel/panel-simple.c b/drivers/gpu/drm/panel/panel-simple.c
index 8345ed891f5a..f505bc27e5ae 100644
--- a/drivers/gpu/drm/panel/panel-simple.c
+++ b/drivers/gpu/drm/panel/panel-simple.c
@@ -42,6 +42,8 @@ 
 #include <drm/drm_panel.h>
 #include <drm/drm_of.h>
 
+#include "panel-drm-shutdown-check.h"
+
 /**
  * struct panel_desc - Describes a simple panel.
  */
@@ -720,22 +722,7 @@  static void panel_simple_shutdown(struct device *dev)
 {
 	struct panel_simple *panel = dev_get_drvdata(dev);
 
-	/*
-	 * NOTE: the following two calls don't really belong here. It is the
-	 * responsibility of a correctly written DRM modeset driver to call
-	 * drm_atomic_helper_shutdown() at shutdown time and that should
-	 * cause the panel to be disabled / unprepared if needed. For now,
-	 * however, we'll keep these calls due to the sheer number of
-	 * different DRM modeset drivers used with panel-simple. The fact that
-	 * we're calling these and _also_ the drm_atomic_helper_shutdown()
-	 * will try to disable/unprepare means that we can get a warning about
-	 * trying to disable/unprepare an already disabled/unprepared panel,
-	 * but that's something we'll have to live with until we've confirmed
-	 * that all DRM modeset drivers are properly calling
-	 * drm_atomic_helper_shutdown().
-	 */
-	drm_panel_disable(&panel->base);
-	drm_panel_unprepare(&panel->base);
+	panel_shutdown_if_drm_driver_needs_fixing(&panel->base);
 }
 
 static void panel_simple_remove(struct device *dev)