diff mbox series

[v5,04/10] drm/bridge: add documentation of refcounted bridges

Message ID 20241231-hotplug-drm-bridge-v5-4-173065a1ece1@bootlin.com (mailing list archive)
State New
Headers show
Series Add support for hot-pluggable DRM bridges | expand

Commit Message

Luca Ceresoli Dec. 31, 2024, 10:39 a.m. UTC
Document in detail the new refcounted bridges as well as the "legacy" way.

Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>

---

This patch was added in v5.
---
 Documentation/gpu/drm-kms-helpers.rst |   6 ++
 drivers/gpu/drm/drm_bridge.c          | 122 ++++++++++++++++++++++++++++++++++
 2 files changed, 128 insertions(+)

Comments

Randy Dunlap Dec. 31, 2024, 5:54 p.m. UTC | #1
Hi--

On 12/31/24 2:39 AM, Luca Ceresoli wrote:
> Document in detail the new refcounted bridges as well as the "legacy" way.
> 
> Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
> 
> ---
> 
> This patch was added in v5.
> ---
>  Documentation/gpu/drm-kms-helpers.rst |   6 ++
>  drivers/gpu/drm/drm_bridge.c          | 122 ++++++++++++++++++++++++++++++++++
>  2 files changed, 128 insertions(+)
> 
> diff --git a/Documentation/gpu/drm-kms-helpers.rst b/Documentation/gpu/drm-kms-helpers.rst
> index 8cf2f041af4704875910ce8228ae04615d0f21bd..ca2cfef2101988933e1464fe146997c1a661a117 100644
> --- a/Documentation/gpu/drm-kms-helpers.rst
> +++ b/Documentation/gpu/drm-kms-helpers.rst
> @@ -151,6 +151,12 @@ Overview
>  .. kernel-doc:: drivers/gpu/drm/drm_bridge.c
>     :doc: overview
>  
> +Bridge lifecycle
> +----------------
> +
> +.. kernel-doc:: drivers/gpu/drm/drm_bridge.c
> +   :doc: bridge lifecycle
> +
>  Display Driver Integration
>  --------------------------
>  
> diff --git a/drivers/gpu/drm/drm_bridge.c b/drivers/gpu/drm/drm_bridge.c
> index 6255ef59f73d8041a8cb7f2c6e23e5a67d1ae926..e9f138aa5b3270b4e3a1a56dc8d4b7e5f993c929 100644
> --- a/drivers/gpu/drm/drm_bridge.c
> +++ b/drivers/gpu/drm/drm_bridge.c
> @@ -60,6 +60,128 @@
>   * encoder chain.
>   */
>  
> +/**
> + * DOC: bridge lifecycle
> + *
> + * Allocation, initializion and teardown of a bridge can be implemented in

                  initialization

> + * one of two ways: *refcounted* mode and *legacy* mode.
> + *
> + * In **refcounted** mode:
> + *
> + * - each &struct drm_bridge is reference counted since its instantiation
> + * - any code taking a pointer to a bridge has get and put APIs to refcount
> + *   it and so ensure the bridge won't be deallocated while using it
> + * - deallocation is done when the last put happens and the refcount drops
> + *   to zero
> + * - the driver instantiating the bridge also holds a reference, but the
> + *   allocated struct can survive it
> + *
> + * A bridge using refcounted mode is called a *refcounted bridge*.
> + *
> + * In **legacy** mode the &struct drm_bridge lifetime is tied to the device
> + * instantiating it: it is allocated on probe and freed on removal. Any
> + * other kernel entities holding a pointer to the bridge could incur in
> + * use-after-free in case the bridge is deallocated at runtime.
> + *
> + * Legacy mode used to be the only one until refcounted bridges were
> + * introduced, hance the name. It is still fine in case the bridges are a

                  hence

> + * fixed part of the pipeline, i.e. if the bridges are removed only when
> + * tearing down the entire card. Refcounted bridges support both that case
> + * and the case of more dynamic hardware with bridges that can be removed
> + * at runtime without tearing down the entire card.
> + *
> + * Usage of refcounted bridges happens in two sides: the driver
> + * implementing the bridge and the code using the bridge.
> + *
> + * For *drivers implemeting the bridge*, in both refcounted and legacy

                   implementing

> + * modes the common and expected pattern is that the driver declares a
> + * driver-specific struct embedding a &struct drm_bridge. E.g.::
Luca Ceresoli Jan. 2, 2025, 12:02 p.m. UTC | #2
Hello Randy,

On Tue, 31 Dec 2024 09:54:41 -0800
Randy Dunlap <rdunlap@infradead.org> wrote:

> Hi--
> 
> On 12/31/24 2:39 AM, Luca Ceresoli wrote:
> > Document in detail the new refcounted bridges as well as the "legacy" way.
> > 
> > Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
> > 
> > ---

...

> > + * Allocation, initializion and teardown of a bridge can be implemented in  
> 
>                   initialization

Thanks for reviewing! I fixed locally the typos you spotted.

Luca
Maxime Ripard Jan. 6, 2025, 10:39 a.m. UTC | #3
Hi,

Most of these comments affect your earlier patches, but let's work on
the API-level view.

On Tue, Dec 31, 2024 at 11:39:58AM +0100, Luca Ceresoli wrote:
> + * When using refcounted mode, the driver should allocate ``struct
> + * my_bridge`` using regular allocation (as opposed to ``devm_`` or
> + * ``drmm_`` allocation), call drm_bridge_init() immediately afterwards to
> + * transfer lifecycle management to the DRM bridge core, and implement a
> + * ``.destroy`` function to deallocate the ``struct my_bridge``, as in this
> + * example::
> + *
> + *     static void my_bridge_destroy(struct drm_bridge *bridge)
> + *     {
> + *         kfree(container_of(bridge, struct my_bridge, bridge));
> + *     }
> + *
> + *     static const struct drm_bridge_funcs my_bridge_funcs = {
> + *         .destroy = my_bridge_destroy,
> + *         ...
> + *     };
> + *
> + *     static int my_bridge_probe(...)
> + *     {
> + *         struct my_bridge *mybr;
> + *         int err;
> + *
> + *         mybr = kzalloc(sizeof(*mybr), GFP_KERNEL);
> + *         if (!mybr)
> + *             return -ENOMEM;
> + *
> + *         err = drm_bridge_init(dev, &mybr->bridge, &my_bridge_funcs);
> + *         if (err)
> + *             return err;
> + *
> + *         ...
> + *         drm_bridge_add();
> + *         ...
> + *     }
> + *
> + *     static void my_bridge_remove()
> + *     {
> + *         struct my_bridge *mybr = ...;
> + *         drm_bridge_remove(&mybr->bridge);
> + *         // ... NO kfree here!
> + *     }

I'm a bit worried there, since that API is pretty difficult to get
right, and we don't have anything to catch bad patterns.

Let's take a step back. What we're trying to solve here is:

  1) We want to avoid any dangling pointers to a bridge if the bridge
     device is removed.

  2) To do so, we need to switch to reference counted allocations and
     pointers.

  3) Most bridges structures are allocated through devm_kzalloc, and they
     one that aren't are freed at remove time anyway, so the allocated
     structure will be gone when the device is removed.

  4) To properly track users, each user that will use a drm_bridge needs
     to take a reference.

AFAIU, the destroy introduction and the on-purpose omission of kfree in
remove is to solve 3.

Introducing a function that allocates the drm_bridge container struct
(like drmm_encoder_alloc for example), take a reference, register a devm
kfree action, and return the pointer to the driver structure would solve
that too pretty nicely.

So, something like:


struct driver_priv {
       struct drm_bridge bridge;

       ...
}

static int driver_probe(...)
{
	struct driver_priv *priv;
	struct drm_bridge *bridge;

        ....

	priv = devm_drm_bridge_alloc(dev, struct driver_priv, bridge);
	if (IS_ERR(priv))
	   return ERR_PTR(priv);
	bridge = &priv->bridge;

	...

	drm_bridge_add(bridge);
}

Would work just as well.

I also don't think we need explicit (at least for the common case)
drm_bridge_get and drm_bridge_put calls for bridge users.
drm_bridge_attach and drm_bridge_detach can get/put the reference
directly.

And we'll also need some flag in drm_bridge to indicate that the device
is gone, similar to what drm_dev_enter()/drm_dev_exit() provides,
because now your bridge driver sticks around for much longer than your
device so the expectation that your device managed resources (clocks,
registers, etc.) are always going to be around.

Maxime
Dmitry Baryshkov Jan. 6, 2025, 12:24 p.m. UTC | #4
On Mon, 6 Jan 2025 at 12:39, Maxime Ripard <mripard@kernel.org> wrote:
>
> Hi,
>
> Most of these comments affect your earlier patches, but let's work on
> the API-level view.
>
> On Tue, Dec 31, 2024 at 11:39:58AM +0100, Luca Ceresoli wrote:
> > + * When using refcounted mode, the driver should allocate ``struct
> > + * my_bridge`` using regular allocation (as opposed to ``devm_`` or
> > + * ``drmm_`` allocation), call drm_bridge_init() immediately afterwards to
> > + * transfer lifecycle management to the DRM bridge core, and implement a
> > + * ``.destroy`` function to deallocate the ``struct my_bridge``, as in this
> > + * example::
> > + *
> > + *     static void my_bridge_destroy(struct drm_bridge *bridge)
> > + *     {
> > + *         kfree(container_of(bridge, struct my_bridge, bridge));
> > + *     }
> > + *
> > + *     static const struct drm_bridge_funcs my_bridge_funcs = {
> > + *         .destroy = my_bridge_destroy,
> > + *         ...
> > + *     };
> > + *
> > + *     static int my_bridge_probe(...)
> > + *     {
> > + *         struct my_bridge *mybr;
> > + *         int err;
> > + *
> > + *         mybr = kzalloc(sizeof(*mybr), GFP_KERNEL);
> > + *         if (!mybr)
> > + *             return -ENOMEM;
> > + *
> > + *         err = drm_bridge_init(dev, &mybr->bridge, &my_bridge_funcs);
> > + *         if (err)
> > + *             return err;
> > + *
> > + *         ...
> > + *         drm_bridge_add();
> > + *         ...
> > + *     }
> > + *
> > + *     static void my_bridge_remove()
> > + *     {
> > + *         struct my_bridge *mybr = ...;
> > + *         drm_bridge_remove(&mybr->bridge);
> > + *         // ... NO kfree here!
> > + *     }
>
> I'm a bit worried there, since that API is pretty difficult to get
> right, and we don't have anything to catch bad patterns.
>
> Let's take a step back. What we're trying to solve here is:
>
>   1) We want to avoid any dangling pointers to a bridge if the bridge
>      device is removed.
>
>   2) To do so, we need to switch to reference counted allocations and
>      pointers.
>
>   3) Most bridges structures are allocated through devm_kzalloc, and they
>      one that aren't are freed at remove time anyway, so the allocated
>      structure will be gone when the device is removed.
>
>   4) To properly track users, each user that will use a drm_bridge needs
>      to take a reference.

5) Handle the disappearing next_bridge problem: probe() function gets
a pointer to the next bridge, but then for some reasons (e.g. because
of the other device being removed or because of some probe deferral)
the next_bridge driver gets unbdound and the next_bridge becomes
unusable before a call to drm_bridge_attach().

>
> AFAIU, the destroy introduction and the on-purpose omission of kfree in
> remove is to solve 3.
>
> Introducing a function that allocates the drm_bridge container struct
> (like drmm_encoder_alloc for example), take a reference, register a devm
> kfree action, and return the pointer to the driver structure would solve
> that too pretty nicely.
>
> So, something like:
>
>
> struct driver_priv {
>        struct drm_bridge bridge;
>
>        ...
> }
>
> static int driver_probe(...)
> {
>         struct driver_priv *priv;
>         struct drm_bridge *bridge;
>
>         ....
>
>         priv = devm_drm_bridge_alloc(dev, struct driver_priv, bridge);

Ah... And devm-cleanup will just drop a reference to that data,
freeing it when all refs are cleaned? Nice idea.

>         if (IS_ERR(priv))
>            return ERR_PTR(priv);
>         bridge = &priv->bridge;
>
>         ...
>
>         drm_bridge_add(bridge);
> }
>
> Would work just as well.
>
> I also don't think we need explicit (at least for the common case)
> drm_bridge_get and drm_bridge_put calls for bridge users.
> drm_bridge_attach and drm_bridge_detach can get/put the reference
> directly.

As I wrote previously, I think drm_bridge_attach() might be too late for that.
It sounds like drm_of_get_panel_or_bridge() and of_drm_find_bridge
should increment the refcount, possibly adding a devres action to put
the reference.

> And we'll also need some flag in drm_bridge to indicate that the device
> is gone, similar to what drm_dev_enter()/drm_dev_exit() provides,
> because now your bridge driver sticks around for much longer than your
> device so the expectation that your device managed resources (clocks,
> registers, etc.) are always going to be around.
Maxime Ripard Jan. 6, 2025, 2:49 p.m. UTC | #5
On Mon, Jan 06, 2025 at 02:24:00PM +0200, Dmitry Baryshkov wrote:
> On Mon, 6 Jan 2025 at 12:39, Maxime Ripard <mripard@kernel.org> wrote:
> >
> > Hi,
> >
> > Most of these comments affect your earlier patches, but let's work on
> > the API-level view.
> >
> > On Tue, Dec 31, 2024 at 11:39:58AM +0100, Luca Ceresoli wrote:
> > > + * When using refcounted mode, the driver should allocate ``struct
> > > + * my_bridge`` using regular allocation (as opposed to ``devm_`` or
> > > + * ``drmm_`` allocation), call drm_bridge_init() immediately afterwards to
> > > + * transfer lifecycle management to the DRM bridge core, and implement a
> > > + * ``.destroy`` function to deallocate the ``struct my_bridge``, as in this
> > > + * example::
> > > + *
> > > + *     static void my_bridge_destroy(struct drm_bridge *bridge)
> > > + *     {
> > > + *         kfree(container_of(bridge, struct my_bridge, bridge));
> > > + *     }
> > > + *
> > > + *     static const struct drm_bridge_funcs my_bridge_funcs = {
> > > + *         .destroy = my_bridge_destroy,
> > > + *         ...
> > > + *     };
> > > + *
> > > + *     static int my_bridge_probe(...)
> > > + *     {
> > > + *         struct my_bridge *mybr;
> > > + *         int err;
> > > + *
> > > + *         mybr = kzalloc(sizeof(*mybr), GFP_KERNEL);
> > > + *         if (!mybr)
> > > + *             return -ENOMEM;
> > > + *
> > > + *         err = drm_bridge_init(dev, &mybr->bridge, &my_bridge_funcs);
> > > + *         if (err)
> > > + *             return err;
> > > + *
> > > + *         ...
> > > + *         drm_bridge_add();
> > > + *         ...
> > > + *     }
> > > + *
> > > + *     static void my_bridge_remove()
> > > + *     {
> > > + *         struct my_bridge *mybr = ...;
> > > + *         drm_bridge_remove(&mybr->bridge);
> > > + *         // ... NO kfree here!
> > > + *     }
> >
> > I'm a bit worried there, since that API is pretty difficult to get
> > right, and we don't have anything to catch bad patterns.
> >
> > Let's take a step back. What we're trying to solve here is:
> >
> >   1) We want to avoid any dangling pointers to a bridge if the bridge
> >      device is removed.
> >
> >   2) To do so, we need to switch to reference counted allocations and
> >      pointers.
> >
> >   3) Most bridges structures are allocated through devm_kzalloc, and they
> >      one that aren't are freed at remove time anyway, so the allocated
> >      structure will be gone when the device is removed.
> >
> >   4) To properly track users, each user that will use a drm_bridge needs
> >      to take a reference.
> 
> 5) Handle the disappearing next_bridge problem: probe() function gets
> a pointer to the next bridge, but then for some reasons (e.g. because
> of the other device being removed or because of some probe deferral)
> the next_bridge driver gets unbdound and the next_bridge becomes
> unusable before a call to drm_bridge_attach().

Oh, right. We need to plumb it in drm_of_find_bridge somehow too.

> > AFAIU, the destroy introduction and the on-purpose omission of kfree in
> > remove is to solve 3.
> >
> > Introducing a function that allocates the drm_bridge container struct
> > (like drmm_encoder_alloc for example), take a reference, register a devm
> > kfree action, and return the pointer to the driver structure would solve
> > that too pretty nicely.
> >
> > So, something like:
> >
> >
> > struct driver_priv {
> >        struct drm_bridge bridge;
> >
> >        ...
> > }
> >
> > static int driver_probe(...)
> > {
> >         struct driver_priv *priv;
> >         struct drm_bridge *bridge;
> >
> >         ....
> >
> >         priv = devm_drm_bridge_alloc(dev, struct driver_priv, bridge);
> 
> Ah... And devm-cleanup will just drop a reference to that data,
> freeing it when all refs are cleaned? Nice idea.

Yup.

> >         if (IS_ERR(priv))
> >            return ERR_PTR(priv);
> >         bridge = &priv->bridge;
> >
> >         ...
> >
> >         drm_bridge_add(bridge);
> > }
> >
> > Would work just as well.
> >
> > I also don't think we need explicit (at least for the common case)
> > drm_bridge_get and drm_bridge_put calls for bridge users.
> > drm_bridge_attach and drm_bridge_detach can get/put the reference
> > directly.
> 
> As I wrote previously, I think drm_bridge_attach() might be too late for that.
> It sounds like drm_of_get_panel_or_bridge() and of_drm_find_bridge
> should increment the refcount, possibly adding a devres action to put
> the reference.

We probably need both. drm_bridge_attach adds the bridge pointer to the
encoder bridge_chain list, so if we had something like

bridge = drm_of_find_bridge();
drm_bridge_attach(encoder, bridge);
drm_bridge_put(bridge);

We could have a dangling pointer.

Maxime
diff mbox series

Patch

diff --git a/Documentation/gpu/drm-kms-helpers.rst b/Documentation/gpu/drm-kms-helpers.rst
index 8cf2f041af4704875910ce8228ae04615d0f21bd..ca2cfef2101988933e1464fe146997c1a661a117 100644
--- a/Documentation/gpu/drm-kms-helpers.rst
+++ b/Documentation/gpu/drm-kms-helpers.rst
@@ -151,6 +151,12 @@  Overview
 .. kernel-doc:: drivers/gpu/drm/drm_bridge.c
    :doc: overview
 
+Bridge lifecycle
+----------------
+
+.. kernel-doc:: drivers/gpu/drm/drm_bridge.c
+   :doc: bridge lifecycle
+
 Display Driver Integration
 --------------------------
 
diff --git a/drivers/gpu/drm/drm_bridge.c b/drivers/gpu/drm/drm_bridge.c
index 6255ef59f73d8041a8cb7f2c6e23e5a67d1ae926..e9f138aa5b3270b4e3a1a56dc8d4b7e5f993c929 100644
--- a/drivers/gpu/drm/drm_bridge.c
+++ b/drivers/gpu/drm/drm_bridge.c
@@ -60,6 +60,128 @@ 
  * encoder chain.
  */
 
+/**
+ * DOC: bridge lifecycle
+ *
+ * Allocation, initializion and teardown of a bridge can be implemented in
+ * one of two ways: *refcounted* mode and *legacy* mode.
+ *
+ * In **refcounted** mode:
+ *
+ * - each &struct drm_bridge is reference counted since its instantiation
+ * - any code taking a pointer to a bridge has get and put APIs to refcount
+ *   it and so ensure the bridge won't be deallocated while using it
+ * - deallocation is done when the last put happens and the refcount drops
+ *   to zero
+ * - the driver instantiating the bridge also holds a reference, but the
+ *   allocated struct can survive it
+ *
+ * A bridge using refcounted mode is called a *refcounted bridge*.
+ *
+ * In **legacy** mode the &struct drm_bridge lifetime is tied to the device
+ * instantiating it: it is allocated on probe and freed on removal. Any
+ * other kernel entities holding a pointer to the bridge could incur in
+ * use-after-free in case the bridge is deallocated at runtime.
+ *
+ * Legacy mode used to be the only one until refcounted bridges were
+ * introduced, hance the name. It is still fine in case the bridges are a
+ * fixed part of the pipeline, i.e. if the bridges are removed only when
+ * tearing down the entire card. Refcounted bridges support both that case
+ * and the case of more dynamic hardware with bridges that can be removed
+ * at runtime without tearing down the entire card.
+ *
+ * Usage of refcounted bridges happens in two sides: the driver
+ * implementing the bridge and the code using the bridge.
+ *
+ * For *drivers implemeting the bridge*, in both refcounted and legacy
+ * modes the common and expected pattern is that the driver declares a
+ * driver-specific struct embedding a &struct drm_bridge. E.g.::
+ *
+ *   struct my_bridge {
+ *       ...
+ *       struct drm_bridge bridge;
+ *       ...
+ *   };
+ *
+ * When using refcounted mode, the driver should allocate ``struct
+ * my_bridge`` using regular allocation (as opposed to ``devm_`` or
+ * ``drmm_`` allocation), call drm_bridge_init() immediately afterwards to
+ * transfer lifecycle management to the DRM bridge core, and implement a
+ * ``.destroy`` function to deallocate the ``struct my_bridge``, as in this
+ * example::
+ *
+ *     static void my_bridge_destroy(struct drm_bridge *bridge)
+ *     {
+ *         kfree(container_of(bridge, struct my_bridge, bridge));
+ *     }
+ *
+ *     static const struct drm_bridge_funcs my_bridge_funcs = {
+ *         .destroy = my_bridge_destroy,
+ *         ...
+ *     };
+ *
+ *     static int my_bridge_probe(...)
+ *     {
+ *         struct my_bridge *mybr;
+ *         int err;
+ *
+ *         mybr = kzalloc(sizeof(*mybr), GFP_KERNEL);
+ *         if (!mybr)
+ *             return -ENOMEM;
+ *
+ *         err = drm_bridge_init(dev, &mybr->bridge, &my_bridge_funcs);
+ *         if (err)
+ *             return err;
+ *
+ *         ...
+ *         drm_bridge_add();
+ *         ...
+ *     }
+ *
+ *     static void my_bridge_remove()
+ *     {
+ *         struct my_bridge *mybr = ...;
+ *         drm_bridge_remove(&mybr->bridge);
+ *         // ... NO kfree here!
+ *     }
+ *
+ * In legacy mode, the driver can either use ``devm_`` allocation or
+ * equivalently free ``struct my_bridge`` in their remove function::
+ *
+ *     static int my_bridge_probe(...)
+ *     {
+ *         struct my_bridge *mybr;
+ *
+ *         mybr = devm_kzalloc(dev, sizeof(*mybr), GFP_KERNEL);
+ *         if (!mybr)
+ *             return -ENOMEM;
+ *
+ *         ...
+ *         drm_bridge_add();
+ *         ...
+ *     }
+ *
+ *     static void my_bridge_remove()
+ *     {
+ *         struct my_bridge *mybr = ...;
+ *         drm_bridge_remove(&mybr->bridge);
+ *         // kfree(mybr) if not using devm_*() for allocation
+ *     }
+ *
+ * The *code using the bridge* is all the code taking a &struct drm_bridge
+ * pointer, including other bridges, encoders and the DRM core. As the
+ * bridge could be removed at any time, such code can incur in
+ * use-after-free. To void that, it has to call drm_bridge_get() when
+ * taking a pointer and drm_bridge_put() after it has done using it. This
+ * will extend the allocation lifetime of the bridge struct until the last
+ * reference has been put, potentially after the bridge device has been
+ * removed from the kernel.
+ *
+ * Calling drm_bridge_get() and drm_bridge_put() on a bridge that is not
+ * refcounted does nothing, so code using these two APIs will work both on
+ * refcounted bridges and non-refcounted ones.
+ */
+
 /**
  * DOC:	display driver integration
  *