Message ID | 20240510-hotplug-drm-bridge-v2-0-ec32f2c66d56@bootlin.com (mailing list archive) |
---|---|
Headers | show |
Series | Add support for GE SUNH hot-pluggable connector (was: "drm: add support for hot-pluggable bridges") | expand |
On Fri, May 10, 2024 at 09:10:36AM +0200, Luca Ceresoli wrote: > Hello, > > this series aims at supporting a Linux device with a connector to > physically add and remove an add-on to/from the main device to augment its > features at runtime, using device tree overlays. > > This is the v2 of "drm: add support for hot-pluggable bridges" [0] which > was however more limited in scope, covering only the DRM aspects. This new > series also takes a different approach to the DRM bridge instantiation. > > [0] https://lore.kernel.org/all/20240326-hotplug-drm-bridge-v1-0-4b51b5eb75d5@bootlin.com/ > > Use case > ======== > > This series targets a professional product (GE SUNH) that is composed of a > "main" part running on battery, with the main SoC and able to work > autonomously with limited features, and an optional "add-on" that enables > more features by adding more hardware peripherals, some of which are on > non-discoverable busses such as I2C and MIPI DSI. > > The add-on can be connected and disconnected at runtime at any moment by > the end user, and add-on features need to be enabled and disabled > automatically at runtime. > > The add-on has status pins that are connected to GPIOs on the main board, > allowing the CPU to detect add-on insertion and removal. It also has a > reset GPIO allowign to reset all peripherals on the add-on at once. > > The features provided by the add-on include a display and a battery charger > to recharge the battery of the main part. The display on the add-on has an > LVDS input but the connector between the base and the add-on has a MIPI DSI > bus, so a DSI-to-LVDS bridge is present on the add-on. > > Different add-on models can be connected to the main part, and for this a > model ID is stored in the add-on itself so the software running on the CPU > on the main part knows which non-discoverable hardware to probe. > > Overall approach > ================ > > Device tree overlays appear as the most natural solution to support the > addition and removal of devices from a running system. > > Several features are missing from the mainline Linux kernel in order to > support this use case: > > 1. runtime (un)loading of device tree overlays is not supported Not true. Device specific applying of overlays has been supported since we merged DT overlay support. What's not supported is a general purpose interface to userspace to change any part of the DT at any point in time. > 2. if enabled, overlay (un)loading exposes several bugs Hence why there is no general purpose interface. > 3. the DRM subsystem assumes video bridges are non-removable Rob
Hello Rob, On Fri, 10 May 2024 11:44:49 -0500 Rob Herring <robh@kernel.org> wrote: > On Fri, May 10, 2024 at 09:10:36AM +0200, Luca Ceresoli wrote: [...] > > Overall approach > > ================ > > > > Device tree overlays appear as the most natural solution to support the > > addition and removal of devices from a running system. > > > > Several features are missing from the mainline Linux kernel in order to > > support this use case: > > > > 1. runtime (un)loading of device tree overlays is not supported > > Not true. Device specific applying of overlays has been supported > since we merged DT overlay support. What's not supported is a general > purpose interface to userspace to change any part of the DT at any point > in time. I think I should replace "supported" with "exposed [to user space]". In other words, there is no user of of_overlay_fdt_apply() / of_overlay_remove*() in mainline Linux, except in unittest. Would it be a correct rewording? > > 2. if enabled, overlay (un)loading exposes several bugs > > Hence why there is no general purpose interface. Absolutely. Best regards, Luca
Apologies for missing v1 ... On Fri, May 10, 2024 at 09:10:36AM +0200, Luca Ceresoli wrote: > DRM hotplug bridge driver > ========================= > > DRM natively supports pipelines whose display can be removed, but all the > components preceding it (all the display controller and any bridges) are > assumed to be fixed and cannot be plugged, removed or modified at runtime. > > This series adds support for DRM pipelines having a removable part after > the encoder, thus also allowing bridges to be removed and reconnected at > runtime, possibly with different components. > > This picture summarizes the DRM structure implemented by this series: > > .------------------------. > | DISPLAY CONTROLLER | > | .---------. .------. | > | | ENCODER |<--| CRTC | | > | '---------' '------' | > '------|-----------------' > | > |DSI HOTPLUG > V CONNECTOR > .---------. .--. .-. .---------. .-------. > | 0 to N | | _| _| | | 1 to N | | | > | BRIDGES |--DSI-->||_ |_ |--DSI-->| BRIDGES |--LVDS-->| PANEL | > | | | | | | | | | | > '---------' '--' '-' '---------' '-------' > > [--- fixed components --] [----------- removable add-on -----------] > > Fixed components include: > > * all components up to the DRM encoder, usually part of the SoC > * optionally some bridges, in the SoC and/or as external chips > > Components on the removable add-on include: > > * one or more bridges > * a fixed connector (not one natively supporting hotplug such as HDMI) > * the panel So I think at a high level this design approach makes sense, but the implementation needs some serious thought. One big thing upfront though, we need to have a clear plan for the overlay hotunload issues, otherwise trying to make drm bridges hotpluggable makes no sense to me. Hotunload is very, very tricky, full of lifetime issues, and those need to be sorted out first or we're just trying to build a castle on quicksand. For bridges itself I don't think the current locking works. You're trying to really cleverly hide it all behind a normal-looking bridge driver, but there's many things beyond that which will blow up if bridges just disappear. Most importantly the bridge states part of an atomic update. Now in drm we have drm_connector as the only hotunpluggable thing, and it took years to sort out all the issues. I think we should either model the bridge hotunplug locking after that, or just outright reuse the connector locking and lifetime rules. I much prefer the latter personally. Anyway the big issues: - We need to refcount the hotpluggable bridges, because software (like atomic state updates) might hang onto pointers for longer than the bridge physically exists. Assuming that you can all tear it down synchronously will not work. If we reuse connector locking/lifetime then we could put the hotpluggable part of the bridge chain into the drm_connector, since that already has refcounting as needed. It would mean that finding the next bridge in the chain becomes a lot more tricky though. With that model we'd create a new connector every time the bridge is hotplugged, which I think is also the cleaner model (because you might plug in a hdmi connector after a panel, so things like the connector type change). - No notifiers please. The create a locking mess with inversions, and especially for hotunplug they create the illusion that you can synchronously keep up to date with hardware state. That's not possible. Fundamentally all bridge drivers which might be hotunplugged need to be able to cope with the hardware disappearing any momemnt. Most likely changes/fixes we need to make overlay hotunload work will impact how exactly this works all ... Also note that the entire dance around correctly stopping userspace from doing modesets on, see all the relevant changes in update_connector_routing(). Relying on hotplugging connectors will sort out a lot of these issues in a consistent way. - Related to this: You're not allowed to shut down hardware behind the user's back with drm_atomic_helper_shutdown. We've tried that approach with dp mst, it really pisses off userspace when a page_flip that it expected to work doesn't work. - There's also the design aspect that in atomic, only atomic_check is allowed to fail, atomic_commit must succeed, even when the hardware is gone. Using connectors and their refcounting should help with that. - Somewhat aside, but I noticed that the bridge->atomic_reset is in drm_bridge_attach, and that's kinda the wrong place. It should be in drm_mode_config_reset, like all the other ->atomic_reset hooks. That would make it a lot clearer that we need to figure out who/when ->atomic_reset should be called for hotplugged bridges, maybe as part of connector registration when the entire bridge and it's new connector is assembled? - Finally this very much means we need to rethink who/how the connector for a bridge is created. The new design is that the main driver creates this connector, once the entire bridge exists. But with hotplugging this gets a lot more complicated, so we might want to extract a pile of that encoder related code from drivers (same way dp mst helpers take care of connector creation too, it's just too much of a mess otherwise). The current bridge chaining infrastructure requires a lot of hand-rolled code in each bridge driver and the encoder, so that might be a good thing anyway. - Finally I think the entire bridge hotplug infrastructure should be irrespective of the underlying bus. Which means for the mipi dsi case we might also want to look into what's missing to make mipi dsi hotunpluggable, at least for the case where it's a proper driver. I think we should ignore the old bridge model where driver's stitched it all toghether using the component framework, in my opinion that approach should be deprecated. - Finally I think we should have a lot of safety checks, like only bridges which declare themselve to be hotunplug safe should be allowed as a part of the hotpluggable bridge chain part. All others must still be attached before the entire driver is registered with drm_dev_register. Or that we only allow bridges with the NO_CONNECTOR flag for drm_bridge_attach. There's probably a pile more fundamental issues I've missed, but this should get a good discussion started. -Sima
Hello Daniel, On Thu, 16 May 2024 15:22:01 +0200 Daniel Vetter <daniel@ffwll.ch> wrote: > Apologies for missing v1 ... > > On Fri, May 10, 2024 at 09:10:36AM +0200, Luca Ceresoli wrote: > > DRM hotplug bridge driver > > ========================= > > > > DRM natively supports pipelines whose display can be removed, but all the > > components preceding it (all the display controller and any bridges) are > > assumed to be fixed and cannot be plugged, removed or modified at runtime. > > > > This series adds support for DRM pipelines having a removable part after > > the encoder, thus also allowing bridges to be removed and reconnected at > > runtime, possibly with different components. > > > > This picture summarizes the DRM structure implemented by this series: > > > > .------------------------. > > | DISPLAY CONTROLLER | > > | .---------. .------. | > > | | ENCODER |<--| CRTC | | > > | '---------' '------' | > > '------|-----------------' > > | > > |DSI HOTPLUG > > V CONNECTOR > > .---------. .--. .-. .---------. .-------. > > | 0 to N | | _| _| | | 1 to N | | | > > | BRIDGES |--DSI-->||_ |_ |--DSI-->| BRIDGES |--LVDS-->| PANEL | > > | | | | | | | | | | > > '---------' '--' '-' '---------' '-------' > > > > [--- fixed components --] [----------- removable add-on -----------] > > > > Fixed components include: > > > > * all components up to the DRM encoder, usually part of the SoC > > * optionally some bridges, in the SoC and/or as external chips > > > > Components on the removable add-on include: > > > > * one or more bridges > > * a fixed connector (not one natively supporting hotplug such as HDMI) > > * the panel > > So I think at a high level this design approach makes sense, Good starting point :) > but the > implementation needs some serious thought. One big thing upfront though, > we need to have a clear plan for the overlay hotunload issues, otherwise > trying to make drm bridges hotpluggable makes no sense to me. Hotunload is > very, very tricky, full of lifetime issues, and those need to be sorted > out first or we're just trying to build a castle on quicksand. > > For bridges itself I don't think the current locking works. You're trying > to really cleverly hide it all behind a normal-looking bridge driver, but > there's many things beyond that which will blow up if bridges just > disappear. Most importantly the bridge states part of an atomic update. Surely possible as atomic updates are definitely not stimulated in my use case. Can you recommend any testing tools to be able to trigger any issues? The main setups I used for my testing so far are 'modetest -s' for my daily work and a simple weston setup to periodically test a complete user space stack. > Now in drm we have drm_connector as the only hotunpluggable thing, and it > took years to sort out all the issues. I think we should either model the > bridge hotunplug locking after that, or just outright reuse the connector > locking and lifetime rules. I much prefer the latter personally. > > Anyway the big issues: > > - We need to refcount the hotpluggable bridges, because software (like > atomic state updates) might hang onto pointers for longer than the > bridge physically exists. Assuming that you can all tear it down > synchronously will not work. > > If we reuse connector locking/lifetime then we could put the > hotpluggable part of the bridge chain into the drm_connector, since that > already has refcounting as needed. It would mean that finding the next > bridge in the chain becomes a lot more tricky though. With that model > we'd create a new connector every time the bridge is hotplugged, which I > think is also the cleaner model (because you might plug in a hdmi > connector after a panel, so things like the connector type change). I have been investigating the option of adding/removing a connector based on hot-plug/unplug events initially, see my reply to Maxime after v1 [0]: > The first approach is based on removing the drm_connector. My laptop > uses the i915 driver, and I have observed that attaching/removing a > USB-C dock with an HDMI connector connected to a monitor, a new > drm_connector appears/disappears for the card. User space gets notified > and the external monitor is enabled/disabled, just the way a desktop > user would expect, so this is possible. I had a look at the driver but > how this magic happens was not clear to me honestly. So if you could point to where/how this is done, I would certainly reconsider that. > - No notifiers please. The create a locking mess with inversions, and > especially for hotunplug they create the illusion that you can > synchronously keep up to date with hardware state. That's not possible. > Fundamentally all bridge drivers which might be hotunplugged need to be > able to cope with the hardware disappearing any momemnt. Can you clarify this point? I'm sorry I fail to understand the relationship between the use of notifiers and the ability of bridge drivers to cope with hardware disappearing. In this patch, the hotplug-bridge uses notifiers to be informed when the following bridge is disappearing: which other way would you suggest to make the hotplug-bridge aware of that? OTOH the hotplug-bridge is not meant to disappear, and it has no actual hardware that can go away, being just a mechanical connector. On the opposite side, the following bridges are physically removable and so their drivers will have to be fixed (if needed) to work when removal happens, but I don't see how that relates to the DRM core emitting a notification of such bridges being removed. > - Related to this: You're not allowed to shut down hardware behind the > user's back with drm_atomic_helper_shutdown. We've tried that approach > with dp mst, it really pisses off userspace when a page_flip that it > expected to work doesn't work. > > - There's also the design aspect that in atomic, only atomic_check is > allowed to fail, atomic_commit must succeed, even when the hardware is > gone. Using connectors and their refcounting should help with that. IIUC here you are suggesting again to remove the connector instead of marking it "disconnected". So, as above, pointers on how that is achieved would be helpful. > - Somewhat aside, but I noticed that the bridge->atomic_reset is in > drm_bridge_attach, and that's kinda the wrong place. It should be in > drm_mode_config_reset, like all the other ->atomic_reset hooks. That > would make it a lot clearer that we need to figure out who/when > ->atomic_reset should be called for hotplugged bridges, maybe as part of > connector registration when the entire bridge and it's new connector is > assembled? > > - Finally this very much means we need to rethink who/how the connector > for a bridge is created. The new design is that the main driver creates > this connector, once the entire bridge exists. But with hotplugging this > gets a lot more complicated, so we might want to extract a pile of that > encoder related code from drivers (same way dp mst helpers take care of > connector creation too, it's just too much of a mess otherwise). > > The current bridge chaining infrastructure requires a lot of hand-rolled > code in each bridge driver and the encoder, so that might be a good > thing anyway. > > - Finally I think the entire bridge hotplug infrastructure should be > irrespective of the underlying bus. Which means for the mipi dsi case we > might also want to look into what's missing to make mipi dsi > hotunpluggable, at least for the case where it's a proper driver. I > think we should ignore the old bridge model where driver's stitched it > all toghether using the component framework, in my opinion that approach > should be deprecated. The DSI side was one of my headaches on writing this driver, and I must say I dislike the code in hotplug_bridge_dsi_attach(), with the need for an initial "dummy" attach and detach the first time. At first sight I think we need a .update_format callback in struct mipi_dsi_host_ops so the DSI host is aware of a format change. Right now there are DSI host drivers which keep a copy of the struct mipi_dsi_device pointer and read the format from there when starting a stream (e.g. the tc358768.c driver [1]). That somewhat provides a way to react to format changes, but keeping a pointer is bad when the DSI device hot-unplugs, and the new format won't be seen until a disable/enable cycle. So a new callback looks more robust overall. Any opinion about this? > - Finally I think we should have a lot of safety checks, like only bridges > which declare themselve to be hotunplug safe should be allowed as a part > of the hotpluggable bridge chain part. All others must still be attached > before the entire driver is registered with drm_dev_register. I'm fine with the principle of a "HOTPLUG" flag. I wonder how that should be implemented, though. Based on my (surely simplistic) understanding, right now bridges can be both added and removed, but only in a specific sequence: 1. add all bridges 2. use the card 3. remove all bridges 4. EOF We'd need to change to allow: 1. add fixed bridges (including hotplug-bridge) 2. add bridges on removable add-on 3. use card 4. remove bridges on removable add-on 5. optionally goto 2 6. remove fixed bridges (including hotplug-bridge) 7. EOF One naïve idea is that the DRM core keeps a flag whenever any fixed bridge (no HOTLPUG flag) is removed and when that happens forbid adding bridges as we are now at line 5. Aside for tons of subtleties I'm surely missing, does this look a proper approach? I'd not be surprised if there is something better and more solid. > Or that we only allow bridges with the NO_CONNECTOR flag for > drm_bridge_attach. > > There's probably a pile more fundamental issues I've missed, but this > should get a good discussion started. Sure, I think it has. Bottom line, I'm clearly not seeing some issues that need to be considered, and that do not show up for my use case. Based on my limited DRM knowledge, this was expected, and I'm glad to work on those issues with some practical indications about the path forward. Luca [0] https://lore.kernel.org/all/20240327170849.0c14728d@booty/ [1] https://elixir.bootlin.com/linux/v6.9.1/source/drivers/gpu/drm/bridge/tc358768.c#L435
On Mon, May 20, 2024 at 02:01:48PM +0200, Luca Ceresoli wrote: > Hello Daniel, > > On Thu, 16 May 2024 15:22:01 +0200 > Daniel Vetter <daniel@ffwll.ch> wrote: > > > Apologies for missing v1 ... > > > > On Fri, May 10, 2024 at 09:10:36AM +0200, Luca Ceresoli wrote: > > > DRM hotplug bridge driver > > > ========================= > > > > > > DRM natively supports pipelines whose display can be removed, but all the > > > components preceding it (all the display controller and any bridges) are > > > assumed to be fixed and cannot be plugged, removed or modified at runtime. > > > > > > This series adds support for DRM pipelines having a removable part after > > > the encoder, thus also allowing bridges to be removed and reconnected at > > > runtime, possibly with different components. > > > > > > This picture summarizes the DRM structure implemented by this series: > > > > > > .------------------------. > > > | DISPLAY CONTROLLER | > > > | .---------. .------. | > > > | | ENCODER |<--| CRTC | | > > > | '---------' '------' | > > > '------|-----------------' > > > | > > > |DSI HOTPLUG > > > V CONNECTOR > > > .---------. .--. .-. .---------. .-------. > > > | 0 to N | | _| _| | | 1 to N | | | > > > | BRIDGES |--DSI-->||_ |_ |--DSI-->| BRIDGES |--LVDS-->| PANEL | > > > | | | | | | | | | | > > > '---------' '--' '-' '---------' '-------' > > > > > > [--- fixed components --] [----------- removable add-on -----------] > > > > > > Fixed components include: > > > > > > * all components up to the DRM encoder, usually part of the SoC > > > * optionally some bridges, in the SoC and/or as external chips > > > > > > Components on the removable add-on include: > > > > > > * one or more bridges > > > * a fixed connector (not one natively supporting hotplug such as HDMI) > > > * the panel > > > > So I think at a high level this design approach makes sense, > > Good starting point :) > > > but the > > implementation needs some serious thought. One big thing upfront though, > > we need to have a clear plan for the overlay hotunload issues, otherwise > > trying to make drm bridges hotpluggable makes no sense to me. Hotunload is > > very, very tricky, full of lifetime issues, and those need to be sorted > > out first or we're just trying to build a castle on quicksand. > > > > For bridges itself I don't think the current locking works. You're trying > > to really cleverly hide it all behind a normal-looking bridge driver, but > > there's many things beyond that which will blow up if bridges just > > disappear. Most importantly the bridge states part of an atomic update. > > Surely possible as atomic updates are definitely not stimulated in my > use case. Can you recommend any testing tools to be able to trigger any > issues? Uh really hard ... You'd need to create an atomic commit that's blocked on a sync_file in-fence (so that you can extend the race window). And then hotunplug the bridge chain _before_ you signal that fence. That's not going to cover all possible races, but at least a large chunk of the really big ones. > The main setups I used for my testing so far are 'modetest -s' for my > daily work and a simple weston setup to periodically test a complete > user space stack. > > > Now in drm we have drm_connector as the only hotunpluggable thing, and it > > took years to sort out all the issues. I think we should either model the > > bridge hotunplug locking after that, or just outright reuse the connector > > locking and lifetime rules. I much prefer the latter personally. > > > > Anyway the big issues: > > > > - We need to refcount the hotpluggable bridges, because software (like > > atomic state updates) might hang onto pointers for longer than the > > bridge physically exists. Assuming that you can all tear it down > > synchronously will not work. > > > > If we reuse connector locking/lifetime then we could put the > > hotpluggable part of the bridge chain into the drm_connector, since that > > already has refcounting as needed. It would mean that finding the next > > bridge in the chain becomes a lot more tricky though. With that model > > we'd create a new connector every time the bridge is hotplugged, which I > > think is also the cleaner model (because you might plug in a hdmi > > connector after a panel, so things like the connector type change). > > I have been investigating the option of adding/removing a connector > based on hot-plug/unplug events initially, see my reply to Maxime after > v1 [0]: > > > The first approach is based on removing the drm_connector. My laptop > > uses the i915 driver, and I have observed that attaching/removing a > > USB-C dock with an HDMI connector connected to a monitor, a new > > drm_connector appears/disappears for the card. User space gets notified > > and the external monitor is enabled/disabled, just the way a desktop > > user would expect, so this is possible. I had a look at the driver but > > how this magic happens was not clear to me honestly. > > So if you could point to where/how this is done, I would certainly > reconsider that. Right now only the dp mst code uses hotplug/unplug (like you've observed in your testing with i915, usb-c docks are generally all dp mst). For code reading it might be best to start with the i915 dp mst code and then go through the helpers. > > - No notifiers please. The create a locking mess with inversions, and > > especially for hotunplug they create the illusion that you can > > synchronously keep up to date with hardware state. That's not possible. > > Fundamentally all bridge drivers which might be hotunplugged need to be > > able to cope with the hardware disappearing any momemnt. > > Can you clarify this point? I'm sorry I fail to understand the > relationship between the use of notifiers and the ability of bridge > drivers to cope with hardware disappearing. > > In this patch, the hotplug-bridge uses notifiers to be informed when > the following bridge is disappearing: which other way would you suggest > to make the hotplug-bridge aware of that? OTOH the hotplug-bridge is > not meant to disappear, and it has no actual hardware that can go away, > being just a mechanical connector. Yeah you need code to handle that. My point is that using a notifier is the wrong design, because the notifier has it's own locking. Which tends to get in the way. Instead I think that code should be directly in core bridge code (I don't see the benefit of having this entirely in a separate driver), using drm locking to make sure there's no races. > On the opposite side, the following bridges are physically removable > and so their drivers will have to be fixed (if needed) to work when > removal happens, but I don't see how that relates to the DRM core > emitting a notification of such bridges being removed. Yeah it's not directly related, just my experience that people assume notifiers provide you more synchronization and race-preventation than they really do. So it's better to hand-roll to make it all really explicit. > > - Related to this: You're not allowed to shut down hardware behind the > > user's back with drm_atomic_helper_shutdown. We've tried that approach > > with dp mst, it really pisses off userspace when a page_flip that it > > expected to work doesn't work. > > > > - There's also the design aspect that in atomic, only atomic_check is > > allowed to fail, atomic_commit must succeed, even when the hardware is > > gone. Using connectors and their refcounting should help with that. > > IIUC here you are suggesting again to remove the connector instead of > marking it "disconnected". So, as above, pointers on how that is > achieved would be helpful. See dp mst code. It's complex unfortunately, so there's some reading involved :-/ > > > - Somewhat aside, but I noticed that the bridge->atomic_reset is in > > drm_bridge_attach, and that's kinda the wrong place. It should be in > > drm_mode_config_reset, like all the other ->atomic_reset hooks. That > > would make it a lot clearer that we need to figure out who/when > > ->atomic_reset should be called for hotplugged bridges, maybe as part of > > connector registration when the entire bridge and it's new connector is > > assembled? > > > > - Finally this very much means we need to rethink who/how the connector > > for a bridge is created. The new design is that the main driver creates > > this connector, once the entire bridge exists. But with hotplugging this > > gets a lot more complicated, so we might want to extract a pile of that > > encoder related code from drivers (same way dp mst helpers take care of > > connector creation too, it's just too much of a mess otherwise). > > > > The current bridge chaining infrastructure requires a lot of hand-rolled > > code in each bridge driver and the encoder, so that might be a good > > thing anyway. > > > > - Finally I think the entire bridge hotplug infrastructure should be > > irrespective of the underlying bus. Which means for the mipi dsi case we > > might also want to look into what's missing to make mipi dsi > > hotunpluggable, at least for the case where it's a proper driver. I > > think we should ignore the old bridge model where driver's stitched it > > all toghether using the component framework, in my opinion that approach > > should be deprecated. > > The DSI side was one of my headaches on writing this driver, and I > must say I dislike the code in hotplug_bridge_dsi_attach(), with the > need for an initial "dummy" attach and detach the first time. At first > sight I think we need a .update_format callback in struct > mipi_dsi_host_ops so the DSI host is aware of a format change. > > Right now there are DSI host drivers which keep a copy of the struct > mipi_dsi_device pointer and read the format from there when starting a > stream (e.g. the tc358768.c driver [1]). That somewhat provides a way > to react to format changes, but keeping a pointer is bad when the DSI > device hot-unplugs, and the new format won't be seen until a > disable/enable cycle. So a new callback looks more robust overall. > > Any opinion about this? Yeah I don't like the code either. What might help for prototyping is if you start with a hotpluggeable bridge where the bridge is a i2c device. That way I think we should be able to benefit from the driver bind/unbind code that exists already. Although there's going to be issues with i2c hotunplug too in i2c core code. Then lift whatever we learn there to our dsi code. But essentially I think we should model the driver core model a lot more, so I guess you could use any hotunplug capable bus as a template. Usb might be closest, since that's also a packet/message based interface, mostly at least? > > - Finally I think we should have a lot of safety checks, like only bridges > > which declare themselve to be hotunplug safe should be allowed as a part > > of the hotpluggable bridge chain part. All others must still be attached > > before the entire driver is registered with drm_dev_register. > > I'm fine with the principle of a "HOTPLUG" flag. > > I wonder how that should be implemented, though. Based on my (surely > simplistic) understanding, right now bridges can be both added and > removed, but only in a specific sequence: > > 1. add all bridges > 2. use the card > 3. remove all bridges > 4. EOF > > We'd need to change to allow: > > 1. add fixed bridges (including hotplug-bridge) > 2. add bridges on removable add-on > 3. use card > 4. remove bridges on removable add-on > 5. optionally goto 2 > 6. remove fixed bridges (including hotplug-bridge) > 7. EOF > > One naïve idea is that the DRM core keeps a flag whenever any fixed > bridge (no HOTLPUG flag) is removed and when that happens forbid adding > bridges as we are now at line 5. If a fixed bridge is removed while the driver is still in use (i.e. before drm_dev_unregister is called), that's a driver bug. Would be good to catch that, which is why I think a lot of all the bridge hotplug handling should be in core bridge code and not the separate hotplug driver, so that we can enforce all these constraints. Also conceptually 3 can happen before 2 (but also before), and that's how dp mst works too. It does add all kinds of complications though ... > Aside for tons of subtleties I'm surely missing, does this look a > proper approach? I'd not be surprised if there is something better and > more solid. Yeah roughly. If you look through dp mst code then that also adds all kinds of structures (since dp mst is a routed network really), not just the connectors. They also all come with refcounts (because the network is a tree), but like I said I think for your case we can avoid the per-bridge refcounts by relying on the connector refcount we have already. But I might be overlooking something, and we need refcounting for each bridge like dp mst also needs refcounting for all its internal structures that map the entire hotpluggable display chain. If you want to read some dp mst code, these internal structures are ports/branches with struct drm_dp_mst_branch/port. > > Or that we only allow bridges with the NO_CONNECTOR flag for > > drm_bridge_attach. > > > > There's probably a pile more fundamental issues I've missed, but this > > should get a good discussion started. > > Sure, I think it has. > > Bottom line, I'm clearly not seeing some issues that need to be > considered, and that do not show up for my use case. Based on my > limited DRM knowledge, this was expected, and I'm glad to work on those > issues with some practical indications about the path forward. Yeah for learning I think dp mst is best. It's a bit complex, but since you have a dock you should be able to play around and experiment with the code with some real hardware. That should help to get a bit a feel for the complexity, since lots of opportunities for you to ask why/how locking/refcounting is used and against which potential issue they protect. Cheers, Sima
Hello Sima, thanks again for your comments. Not yet enough for me to clearly focus what you are suggesting, but getting closer. See my questions below. On Tue, 21 May 2024 14:01:19 +0200 Daniel Vetter <daniel@ffwll.ch> wrote: > On Mon, May 20, 2024 at 02:01:48PM +0200, Luca Ceresoli wrote: > > Hello Daniel, > > > > On Thu, 16 May 2024 15:22:01 +0200 > > Daniel Vetter <daniel@ffwll.ch> wrote: > > > > > Apologies for missing v1 ... > > > > > > On Fri, May 10, 2024 at 09:10:36AM +0200, Luca Ceresoli wrote: > > > > DRM hotplug bridge driver > > > > ========================= > > > > > > > > DRM natively supports pipelines whose display can be removed, but all the > > > > components preceding it (all the display controller and any bridges) are > > > > assumed to be fixed and cannot be plugged, removed or modified at runtime. > > > > > > > > This series adds support for DRM pipelines having a removable part after > > > > the encoder, thus also allowing bridges to be removed and reconnected at > > > > runtime, possibly with different components. > > > > > > > > This picture summarizes the DRM structure implemented by this series: > > > > > > > > .------------------------. > > > > | DISPLAY CONTROLLER | > > > > | .---------. .------. | > > > > | | ENCODER |<--| CRTC | | > > > > | '---------' '------' | > > > > '------|-----------------' > > > > | > > > > |DSI HOTPLUG > > > > V CONNECTOR > > > > .---------. .--. .-. .---------. .-------. > > > > | 0 to N | | _| _| | | 1 to N | | | > > > > | BRIDGES |--DSI-->||_ |_ |--DSI-->| BRIDGES |--LVDS-->| PANEL | > > > > | | | | | | | | | | > > > > '---------' '--' '-' '---------' '-------' > > > > > > > > [--- fixed components --] [----------- removable add-on -----------] > > > > > > > > Fixed components include: > > > > > > > > * all components up to the DRM encoder, usually part of the SoC > > > > * optionally some bridges, in the SoC and/or as external chips > > > > > > > > Components on the removable add-on include: > > > > > > > > * one or more bridges > > > > * a fixed connector (not one natively supporting hotplug such as HDMI) > > > > * the panel > > > > > > So I think at a high level this design approach makes sense, > > > > Good starting point :) > > > > > but the > > > implementation needs some serious thought. One big thing upfront though, > > > we need to have a clear plan for the overlay hotunload issues, otherwise > > > trying to make drm bridges hotpluggable makes no sense to me. Hotunload is > > > very, very tricky, full of lifetime issues, and those need to be sorted > > > out first or we're just trying to build a castle on quicksand. > > > > > > For bridges itself I don't think the current locking works. You're trying > > > to really cleverly hide it all behind a normal-looking bridge driver, but > > > there's many things beyond that which will blow up if bridges just > > > disappear. Most importantly the bridge states part of an atomic update. > > > > Surely possible as atomic updates are definitely not stimulated in my > > use case. Can you recommend any testing tools to be able to trigger any > > issues? > > Uh really hard ... You'd need to create an atomic commit that's blocked on > a sync_file in-fence (so that you can extend the race window). And then > hotunplug the bridge chain _before_ you signal that fence. > > That's not going to cover all possible races, but at least a large chunk > of the really big ones. > > > The main setups I used for my testing so far are 'modetest -s' for my > > daily work and a simple weston setup to periodically test a complete > > user space stack. > > > > > Now in drm we have drm_connector as the only hotunpluggable thing, and it > > > took years to sort out all the issues. I think we should either model the > > > bridge hotunplug locking after that, or just outright reuse the connector > > > locking and lifetime rules. I much prefer the latter personally. > > > > > > Anyway the big issues: > > > > > > - We need to refcount the hotpluggable bridges, because software (like > > > atomic state updates) might hang onto pointers for longer than the > > > bridge physically exists. Assuming that you can all tear it down > > > synchronously will not work. > > > > > > If we reuse connector locking/lifetime then we could put the > > > hotpluggable part of the bridge chain into the drm_connector, since that > > > already has refcounting as needed. It would mean that finding the next > > > bridge in the chain becomes a lot more tricky though. With that model > > > we'd create a new connector every time the bridge is hotplugged, which I > > > think is also the cleaner model (because you might plug in a hdmi > > > connector after a panel, so things like the connector type change). > > > > I have been investigating the option of adding/removing a connector > > based on hot-plug/unplug events initially, see my reply to Maxime after > > v1 [0]: > > > > > The first approach is based on removing the drm_connector. My laptop > > > uses the i915 driver, and I have observed that attaching/removing a > > > USB-C dock with an HDMI connector connected to a monitor, a new > > > drm_connector appears/disappears for the card. User space gets notified > > > and the external monitor is enabled/disabled, just the way a desktop > > > user would expect, so this is possible. I had a look at the driver but > > > how this magic happens was not clear to me honestly. > > > > So if you could point to where/how this is done, I would certainly > > reconsider that. > > Right now only the dp mst code uses hotplug/unplug (like you've observed > in your testing with i915, usb-c docks are generally all dp mst). For code > reading it might be best to start with the i915 dp mst code and then go > through the helpers. > > > > - No notifiers please. The create a locking mess with inversions, and > > > especially for hotunplug they create the illusion that you can > > > synchronously keep up to date with hardware state. That's not possible. > > > Fundamentally all bridge drivers which might be hotunplugged need to be > > > able to cope with the hardware disappearing any momemnt. > > > > Can you clarify this point? I'm sorry I fail to understand the > > relationship between the use of notifiers and the ability of bridge > > drivers to cope with hardware disappearing. > > > > In this patch, the hotplug-bridge uses notifiers to be informed when > > the following bridge is disappearing: which other way would you suggest > > to make the hotplug-bridge aware of that? OTOH the hotplug-bridge is > > not meant to disappear, and it has no actual hardware that can go away, > > being just a mechanical connector. > > Yeah you need code to handle that. My point is that using a notifier is > the wrong design, because the notifier has it's own locking. Which tends > to get in the way. > > Instead I think that code should be directly in core bridge code (I don't > see the benefit of having this entirely in a separate driver), using drm > locking to make sure there's no races. > > > On the opposite side, the following bridges are physically removable > > and so their drivers will have to be fixed (if needed) to work when > > removal happens, but I don't see how that relates to the DRM core > > emitting a notification of such bridges being removed. > > Yeah it's not directly related, just my experience that people assume > notifiers provide you more synchronization and race-preventation than they > really do. So it's better to hand-roll to make it all really explicit. > > > > - Related to this: You're not allowed to shut down hardware behind the > > > user's back with drm_atomic_helper_shutdown. We've tried that approach > > > with dp mst, it really pisses off userspace when a page_flip that it > > > expected to work doesn't work. > > > > > > - There's also the design aspect that in atomic, only atomic_check is > > > allowed to fail, atomic_commit must succeed, even when the hardware is > > > gone. Using connectors and their refcounting should help with that. > > > > IIUC here you are suggesting again to remove the connector instead of > > marking it "disconnected". So, as above, pointers on how that is > > achieved would be helpful. > > See dp mst code. It's complex unfortunately, so there's some reading > involved :-/ > > > > > - Somewhat aside, but I noticed that the bridge->atomic_reset is in > > > drm_bridge_attach, and that's kinda the wrong place. It should be in > > > drm_mode_config_reset, like all the other ->atomic_reset hooks. That > > > would make it a lot clearer that we need to figure out who/when > > > ->atomic_reset should be called for hotplugged bridges, maybe as part of > > > connector registration when the entire bridge and it's new connector is > > > assembled? > > > > > > - Finally this very much means we need to rethink who/how the connector > > > for a bridge is created. The new design is that the main driver creates > > > this connector, once the entire bridge exists. But with hotplugging this > > > gets a lot more complicated, so we might want to extract a pile of that > > > encoder related code from drivers (same way dp mst helpers take care of > > > connector creation too, it's just too much of a mess otherwise). > > > > > > The current bridge chaining infrastructure requires a lot of hand-rolled > > > code in each bridge driver and the encoder, so that might be a good > > > thing anyway. > > > > > > - Finally I think the entire bridge hotplug infrastructure should be > > > irrespective of the underlying bus. Which means for the mipi dsi case we > > > might also want to look into what's missing to make mipi dsi > > > hotunpluggable, at least for the case where it's a proper driver. I > > > think we should ignore the old bridge model where driver's stitched it > > > all toghether using the component framework, in my opinion that approach > > > should be deprecated. > > > > The DSI side was one of my headaches on writing this driver, and I > > must say I dislike the code in hotplug_bridge_dsi_attach(), with the > > need for an initial "dummy" attach and detach the first time. At first > > sight I think we need a .update_format callback in struct > > mipi_dsi_host_ops so the DSI host is aware of a format change. > > > > Right now there are DSI host drivers which keep a copy of the struct > > mipi_dsi_device pointer and read the format from there when starting a > > stream (e.g. the tc358768.c driver [1]). That somewhat provides a way > > to react to format changes, but keeping a pointer is bad when the DSI > > device hot-unplugs, and the new format won't be seen until a > > disable/enable cycle. So a new callback looks more robust overall. > > > > Any opinion about this? > > Yeah I don't like the code either. > > What might help for prototyping is if you start with a hotpluggeable > bridge where the bridge is a i2c device. That way I think we should be > able to benefit from the driver bind/unbind code that exists already. > Although there's going to be issues with i2c hotunplug too in i2c core > code. Good news: the bridge here is an I2C device (ti-sn65dsi84, dsi-to-lvds). > Then lift whatever we learn there to our dsi code. But essentially I think > we should model the driver core model a lot more, so I guess you could use > any hotunplug capable bus as a template. Usb might be closest, since > that's also a packet/message based interface, mostly at least? > > > > - Finally I think we should have a lot of safety checks, like only bridges > > > which declare themselve to be hotunplug safe should be allowed as a part > > > of the hotpluggable bridge chain part. All others must still be attached > > > before the entire driver is registered with drm_dev_register. > > > > I'm fine with the principle of a "HOTPLUG" flag. > > > > I wonder how that should be implemented, though. Based on my (surely > > simplistic) understanding, right now bridges can be both added and > > removed, but only in a specific sequence: > > > > 1. add all bridges > > 2. use the card > > 3. remove all bridges > > 4. EOF > > > > We'd need to change to allow: > > > > 1. add fixed bridges (including hotplug-bridge) > > 2. add bridges on removable add-on > > 3. use card > > 4. remove bridges on removable add-on > > 5. optionally goto 2 > > 6. remove fixed bridges (including hotplug-bridge) > > 7. EOF > > > > One naïve idea is that the DRM core keeps a flag whenever any fixed > > bridge (no HOTLPUG flag) is removed and when that happens forbid adding > > bridges as we are now at line 5. > > If a fixed bridge is removed while the driver is still in use (i.e. before > drm_dev_unregister is called), that's a driver bug. Would be good to catch > that, which is why I think a lot of all the bridge hotplug handling should > be in core bridge code and not the separate hotplug driver, so that we can > enforce all these constraints. > > Also conceptually 3 can happen before 2 (but also before), and that's how > dp mst works too. It does add all kinds of complications though ... > > > Aside for tons of subtleties I'm surely missing, does this look a > > proper approach? I'd not be surprised if there is something better and > > more solid. > > Yeah roughly. If you look through dp mst code then that also adds all > kinds of structures (since dp mst is a routed network really), not just > the connectors. They also all come with refcounts (because the network is > a tree), but like I said I think for your case we can avoid the per-bridge > refcounts by relying on the connector refcount we have already. > > But I might be overlooking something, and we need refcounting for each > bridge like dp mst also needs refcounting for all its internal structures > that map the entire hotpluggable display chain. If you want to read some > dp mst code, these internal structures are ports/branches with struct > drm_dp_mst_branch/port. > > > > Or that we only allow bridges with the NO_CONNECTOR flag for > > > drm_bridge_attach. > > > > > > There's probably a pile more fundamental issues I've missed, but this > > > should get a good discussion started. > > > > Sure, I think it has. > > > > Bottom line, I'm clearly not seeing some issues that need to be > > considered, and that do not show up for my use case. Based on my > > limited DRM knowledge, this was expected, and I'm glad to work on those > > issues with some practical indications about the path forward. > > Yeah for learning I think dp mst is best. It's a bit complex, but since > you have a dock you should be able to play around and experiment with the > code with some real hardware. > > That should help to get a bit a feel for the complexity, since lots of > opportunities for you to ask why/how locking/refcounting is used and > against which potential issue they protect. I had an initial quick look at DP MSI and it looks like a huge amount code where it's going to take forever to find my way without guidance. Keep in mind there is no DP hardware involved in this whole work: as I see it, DP MST is just one user (possibly the only one currently) of runtime connector insertion and removal. This is my understanding so far: 1. DP MST creates/destroys connectors at runtime, based on MST protocol info the MST devices use to announce themselves 2. the DRM core has no idea of how these connectors relate to each other (e.g. DP-1 is a child of DP-0, just another unrelated connector); only the drivers know, and implement this leveraging the dp_mst helpers 3. DP MST does not involve DRM bridges: everything after the "first" connector (USB-C for docks) is just "DP MST stuff" Point 3 is problematic as my primary need is exactly to have a removable bridge. Besides, its creation is not under my driver control, being done via device tree, and there could even be more than 1 bridge, or even no bridges at all on the removable side. I'd appreciate your guidance especially about: * test atomic change to expose any lifetime bugs, as you suggested * which are the exact APIs to handle connector insertion/removal (as done by DP MST) * comments about the three points above, especially point 3 Pointers to code or other specific indications that lead me to the exact code would be very appreciated, and definitely save me days of digging into complex code without a precise knowledge about where to go. Luca
Hello Sima, these days I started looking in more detail into some of the topics you had mentioned in your v2 review. I have questions about those I have investigated, see below. On Tue, 21 May 2024 14:01:19 +0200 Daniel Vetter <daniel@ffwll.ch> wrote: > On Mon, May 20, 2024 at 02:01:48PM +0200, Luca Ceresoli wrote: > > Hello Daniel, > > > > On Thu, 16 May 2024 15:22:01 +0200 > > Daniel Vetter <daniel@ffwll.ch> wrote: > > > > > Apologies for missing v1 ... > > > > > > On Fri, May 10, 2024 at 09:10:36AM +0200, Luca Ceresoli wrote: > > > > DRM hotplug bridge driver > > > > ========================= > > > > > > > > DRM natively supports pipelines whose display can be removed, but all the > > > > components preceding it (all the display controller and any bridges) are > > > > assumed to be fixed and cannot be plugged, removed or modified at runtime. > > > > > > > > This series adds support for DRM pipelines having a removable part after > > > > the encoder, thus also allowing bridges to be removed and reconnected at > > > > runtime, possibly with different components. > > > > > > > > This picture summarizes the DRM structure implemented by this series: > > > > > > > > .------------------------. > > > > | DISPLAY CONTROLLER | > > > > | .---------. .------. | > > > > | | ENCODER |<--| CRTC | | > > > > | '---------' '------' | > > > > '------|-----------------' > > > > | > > > > |DSI HOTPLUG > > > > V CONNECTOR > > > > .---------. .--. .-. .---------. .-------. > > > > | 0 to N | | _| _| | | 1 to N | | | > > > > | BRIDGES |--DSI-->||_ |_ |--DSI-->| BRIDGES |--LVDS-->| PANEL | > > > > | | | | | | | | | | > > > > '---------' '--' '-' '---------' '-------' > > > > > > > > [--- fixed components --] [----------- removable add-on -----------] > > > > > > > > Fixed components include: > > > > > > > > * all components up to the DRM encoder, usually part of the SoC > > > > * optionally some bridges, in the SoC and/or as external chips > > > > > > > > Components on the removable add-on include: > > > > > > > > * one or more bridges > > > > * a fixed connector (not one natively supporting hotplug such as HDMI) > > > > * the panel > > > > > > So I think at a high level this design approach makes sense, > > > > Good starting point :) > > > > > but the > > > implementation needs some serious thought. One big thing upfront though, > > > we need to have a clear plan for the overlay hotunload issues, otherwise > > > trying to make drm bridges hotpluggable makes no sense to me. Hotunload is > > > very, very tricky, full of lifetime issues, and those need to be sorted > > > out first or we're just trying to build a castle on quicksand. > > > > > > For bridges itself I don't think the current locking works. You're trying > > > to really cleverly hide it all behind a normal-looking bridge driver, but > > > there's many things beyond that which will blow up if bridges just > > > disappear. Most importantly the bridge states part of an atomic update. > > > > Surely possible as atomic updates are definitely not stimulated in my > > use case. Can you recommend any testing tools to be able to trigger any > > issues? > > Uh really hard ... You'd need to create an atomic commit that's blocked on > a sync_file in-fence (so that you can extend the race window). And then > hotunplug the bridge chain _before_ you signal that fence. > > That's not going to cover all possible races, but at least a large chunk > of the really big ones. > > > The main setups I used for my testing so far are 'modetest -s' for my > > daily work and a simple weston setup to periodically test a complete > > user space stack. > > > > > Now in drm we have drm_connector as the only hotunpluggable thing, and it > > > took years to sort out all the issues. I think we should either model the > > > bridge hotunplug locking after that, or just outright reuse the connector > > > locking and lifetime rules. I much prefer the latter personally. > > > > > > Anyway the big issues: > > > > > > - We need to refcount the hotpluggable bridges, because software (like > > > atomic state updates) might hang onto pointers for longer than the > > > bridge physically exists. Assuming that you can all tear it down > > > synchronously will not work. > > > > > > If we reuse connector locking/lifetime then we could put the > > > hotpluggable part of the bridge chain into the drm_connector, since that > > > already has refcounting as needed. It would mean that finding the next > > > bridge in the chain becomes a lot more tricky though. With that model > > > we'd create a new connector every time the bridge is hotplugged, which I > > > think is also the cleaner model (because you might plug in a hdmi > > > connector after a panel, so things like the connector type change). So, based on your dp__mst hint I added connector creation/removal in the v3 I sent a few days ago. All other aspects in your list are unchanged from the v2 you have reviewed. Now I'm trying to tackle some other of the items you mentioned here, and locking/lifetime is the hardest one for me to understand at the moment. If I get you right, you suggest making all the removable bridges "owned" by the connector that gets hotplugged, the reason being that locking and lifetime issues for the hotplug connectors have already been sorted out, while that has not been done for bridges. Correct? Assuming the above is correct, I'm wondering whether this would be a correct design or rather a shortcut to leverage the connector locking instead of implementing bridge locking/lifetime. Note I'm not criticizing, I'm really asking myself this question and I'd like to know your position about that. Again about putting the removable bridges inside the hotplug connector, I'm trying to understand how that may happen in the device tree case. The hot-pluggable bridge is populated starting from the DT code (through i2c-core-of in my case being an I2C chip), and that code is not aware that it is being instantiating a DRM device. So there is nothing we can do before the bridge probe is called. The bridge probe then does know much about any connectors. It also does not know about hotplug: this is a design decision I made to make the regular bridge drivers reusable without edits for the hotplug case, but it can be waived if needed. So the only way I currently to move the bridge inside the connector is to catch it in the hotplug-bridge driver, when it gets notified about the new bridge having appeared (was a notifier, I'm changing that, see below). So the hotplug-bridge would need a function line drm_connector_add_bridge(conn, br) to make the association. Is this the direction of development you had in mind? > > I have been investigating the option of adding/removing a connector > > based on hot-plug/unplug events initially, see my reply to Maxime after > > v1 [0]: > > > > > The first approach is based on removing the drm_connector. My laptop > > > uses the i915 driver, and I have observed that attaching/removing a > > > USB-C dock with an HDMI connector connected to a monitor, a new > > > drm_connector appears/disappears for the card. User space gets notified > > > and the external monitor is enabled/disabled, just the way a desktop > > > user would expect, so this is possible. I had a look at the driver but > > > how this magic happens was not clear to me honestly. > > > > So if you could point to where/how this is done, I would certainly > > reconsider that. > > Right now only the dp mst code uses hotplug/unplug (like you've observed > in your testing with i915, usb-c docks are generally all dp mst). For code > reading it might be best to start with the i915 dp mst code and then go > through the helpers. > > > > - No notifiers please. The create a locking mess with inversions, and > > > especially for hotunplug they create the illusion that you can > > > synchronously keep up to date with hardware state. That's not possible. > > > Fundamentally all bridge drivers which might be hotunplugged need to be > > > able to cope with the hardware disappearing any momemnt. > > > > Can you clarify this point? I'm sorry I fail to understand the > > relationship between the use of notifiers and the ability of bridge > > drivers to cope with hardware disappearing. > > > > In this patch, the hotplug-bridge uses notifiers to be informed when > > the following bridge is disappearing: which other way would you suggest > > to make the hotplug-bridge aware of that? OTOH the hotplug-bridge is > > not meant to disappear, and it has no actual hardware that can go away, > > being just a mechanical connector. > > Yeah you need code to handle that. My point is that using a notifier is > the wrong design, because the notifier has it's own locking. Which tends > to get in the way. I went into this subject to see how drm_bridge_add() could inform the interested bridges using a DRM-specific mechanism instead of standard notifiers. However I think to inform *the interested* bridges, at least in the device tree case, there is a fundamental issue: DRM core has no idea about the topology. Definitely not at drm_bridge_add() time, way before drm_bridge_attach() where the 'previous' pointer introduces a minimum of topology awareness in the DRM core. Option 1 is to iterate over all the ports and endpoints and for every remote-endpoint pointing to a bridge, inform the remote bridge about that via a new optional callback in drm_bridge_funcs. That means likely informing more bridges than needed, so when they get informed the bridges will still have to check whether they are interested or not. Pseudocode: void drm_bridge_add(struct drm_bridge *new_bridge) { // existing code unchanged + if (bridge->of_node) { + for_each_remote_endpoint(bridge->of_node) { + br = of_drm_find_bridge(remote_endpoint); + if (br && br->funcs.bridge_event_notify) + br->funcs->bridge_event_notify(br, DRM_EVENT_BRIDGE_ADD, + new_bridge); + } + } } That means informing both upstream and downstream bridges, which could even be useful, but anyway there is no reliable way to pick only the sink or source ports. It also implies having lots of of_*() calls which iterate over the tree, which is not optimal, but it happens only at add/removal so it's fine I guess. Option 2 is be to just inform all the bridges (using the global bridge_list). Pros and cons: * less of_*() calls to crawl around the remote-endpoints * simpler code * more bridges than needed would be informed, could be be an issue if many implement the .bridge_event_notify() * notifies even in the non-OF case too, not sure it's useful What are your thoughts about these two options? > Instead I think that code should be directly in core bridge code (I don't > see the benefit of having this entirely in a separate driver), using drm > locking to make sure there's no races. Not sure I got what you mean here. Which one of the following? 1. The entire hotplug-bridge driver should not exist, and the DRM core should handle it all (seems not doable, this driver has lots of device-specific details) 2. The hotplug-driver should exist, but the code to attach/detach the pipeline on hotplug/unplug should be moved to the core, with the hotplug-driver providing callbacks for the device-specific aspects 3. Same as 2, but additionally the hotplug-bridge should become a connector driver (hotplug-connector.c?) -- not sure it can decouple the two sides without a bridge however 4. None of the above > > On the opposite side, the following bridges are physically removable > > and so their drivers will have to be fixed (if needed) to work when > > removal happens, but I don't see how that relates to the DRM core > > emitting a notification of such bridges being removed. > > Yeah it's not directly related, just my experience that people assume > notifiers provide you more synchronization and race-preventation than they > really do. So it's better to hand-roll to make it all really explicit. > > > > - Related to this: You're not allowed to shut down hardware behind the > > > user's back with drm_atomic_helper_shutdown. We've tried that approach > > > with dp mst, it really pisses off userspace when a page_flip that it > > > expected to work doesn't work. > > > > > > - There's also the design aspect that in atomic, only atomic_check is > > > allowed to fail, atomic_commit must succeed, even when the hardware is > > > gone. Using connectors and their refcounting should help with that. > > > > IIUC here you are suggesting again to remove the connector instead of > > marking it "disconnected". So, as above, pointers on how that is > > achieved would be helpful. > > See dp mst code. It's complex unfortunately, so there's some reading > involved :-/ > > > > > - Somewhat aside, but I noticed that the bridge->atomic_reset is in > > > drm_bridge_attach, and that's kinda the wrong place. It should be in > > > drm_mode_config_reset, like all the other ->atomic_reset hooks. That > > > would make it a lot clearer that we need to figure out who/when > > > ->atomic_reset should be called for hotplugged bridges, maybe as part of > > > connector registration when the entire bridge and it's new connector is > > > assembled? > > > > > > - Finally this very much means we need to rethink who/how the connector > > > for a bridge is created. The new design is that the main driver creates > > > this connector, once the entire bridge exists. But with hotplugging this > > > gets a lot more complicated, so we might want to extract a pile of that > > > encoder related code from drivers (same way dp mst helpers take care of > > > connector creation too, it's just too much of a mess otherwise). > > > > > > The current bridge chaining infrastructure requires a lot of hand-rolled > > > code in each bridge driver and the encoder, so that might be a good > > > thing anyway. > > > > > > - Finally I think the entire bridge hotplug infrastructure should be > > > irrespective of the underlying bus. Which means for the mipi dsi case we > > > might also want to look into what's missing to make mipi dsi > > > hotunpluggable, at least for the case where it's a proper driver. I > > > think we should ignore the old bridge model where driver's stitched it > > > all toghether using the component framework, in my opinion that approach > > > should be deprecated. > > > > The DSI side was one of my headaches on writing this driver, and I > > must say I dislike the code in hotplug_bridge_dsi_attach(), with the > > need for an initial "dummy" attach and detach the first time. At first > > sight I think we need a .update_format callback in struct > > mipi_dsi_host_ops so the DSI host is aware of a format change. > > > > Right now there are DSI host drivers which keep a copy of the struct > > mipi_dsi_device pointer and read the format from there when starting a > > stream (e.g. the tc358768.c driver [1]). That somewhat provides a way > > to react to format changes, but keeping a pointer is bad when the DSI > > device hot-unplugs, and the new format won't be seen until a > > disable/enable cycle. So a new callback looks more robust overall. > > > > Any opinion about this? > > Yeah I don't like the code either. > > What might help for prototyping is if you start with a hotpluggeable > bridge where the bridge is a i2c device. That way I think we should be > able to benefit from the driver bind/unbind code that exists already. > Although there's going to be issues with i2c hotunplug too in i2c core > code. > > Then lift whatever we learn there to our dsi code. But essentially I think > we should model the driver core model a lot more, so I guess you could use > any hotunplug capable bus as a template. Usb might be closest, since > that's also a packet/message based interface, mostly at least? > > > > - Finally I think we should have a lot of safety checks, like only bridges > > > which declare themselve to be hotunplug safe should be allowed as a part > > > of the hotpluggable bridge chain part. All others must still be attached > > > before the entire driver is registered with drm_dev_register. > > > > I'm fine with the principle of a "HOTPLUG" flag. > > > > I wonder how that should be implemented, though. Based on my (surely > > simplistic) understanding, right now bridges can be both added and > > removed, but only in a specific sequence: > > > > 1. add all bridges > > 2. use the card > > 3. remove all bridges > > 4. EOF > > > > We'd need to change to allow: > > > > 1. add fixed bridges (including hotplug-bridge) > > 2. add bridges on removable add-on > > 3. use card > > 4. remove bridges on removable add-on > > 5. optionally goto 2 > > 6. remove fixed bridges (including hotplug-bridge) > > 7. EOF > > > > One naïve idea is that the DRM core keeps a flag whenever any fixed > > bridge (no HOTLPUG flag) is removed and when that happens forbid adding > > bridges as we are now at line 5. > > If a fixed bridge is removed while the driver is still in use (i.e. before > drm_dev_unregister is called), that's a driver bug. Would be good to catch > that, which is why I think a lot of all the bridge hotplug handling should > be in core bridge code and not the separate hotplug driver, so that we can > enforce all these constraints. > > Also conceptually 3 can happen before 2 (but also before), and that's how > dp mst works too. It does add all kinds of complications though ... > > > Aside for tons of subtleties I'm surely missing, does this look a > > proper approach? I'd not be surprised if there is something better and > > more solid. > > Yeah roughly. If you look through dp mst code then that also adds all > kinds of structures (since dp mst is a routed network really), not just > the connectors. They also all come with refcounts (because the network is > a tree), but like I said I think for your case we can avoid the per-bridge > refcounts by relying on the connector refcount we have already. > > But I might be overlooking something, and we need refcounting for each > bridge like dp mst also needs refcounting for all its internal structures > that map the entire hotpluggable display chain. If you want to read some > dp mst code, these internal structures are ports/branches with struct > drm_dp_mst_branch/port. > > > > Or that we only allow bridges with the NO_CONNECTOR flag for > > > drm_bridge_attach. > > > > > > There's probably a pile more fundamental issues I've missed, but this > > > should get a good discussion started. > > > > Sure, I think it has. > > > > Bottom line, I'm clearly not seeing some issues that need to be > > considered, and that do not show up for my use case. Based on my > > limited DRM knowledge, this was expected, and I'm glad to work on those > > issues with some practical indications about the path forward. > > Yeah for learning I think dp mst is best. It's a bit complex, but since > you have a dock you should be able to play around and experiment with the > code with some real hardware. > > That should help to get a bit a feel for the complexity, since lots of > opportunities for you to ask why/how locking/refcounting is used and > against which potential issue they protect. > > Cheers, Sima Luca
On Fri, Aug 23, 2024 at 12:39:03PM +0200, Luca Ceresoli wrote: > Hello Sima, > > these days I started looking in more detail into some of the topics you > had mentioned in your v2 review. I have questions about those I have > investigated, see below. > > On Tue, 21 May 2024 14:01:19 +0200 > Daniel Vetter <daniel@ffwll.ch> wrote: > > > On Mon, May 20, 2024 at 02:01:48PM +0200, Luca Ceresoli wrote: > > > Hello Daniel, > > > > > > On Thu, 16 May 2024 15:22:01 +0200 > > > Daniel Vetter <daniel@ffwll.ch> wrote: > > > > > > > Apologies for missing v1 ... > > > > > > > > On Fri, May 10, 2024 at 09:10:36AM +0200, Luca Ceresoli wrote: > > > > > DRM hotplug bridge driver > > > > > ========================= > > > > > > > > > > DRM natively supports pipelines whose display can be removed, but all the > > > > > components preceding it (all the display controller and any bridges) are > > > > > assumed to be fixed and cannot be plugged, removed or modified at runtime. > > > > > > > > > > This series adds support for DRM pipelines having a removable part after > > > > > the encoder, thus also allowing bridges to be removed and reconnected at > > > > > runtime, possibly with different components. > > > > > > > > > > This picture summarizes the DRM structure implemented by this series: > > > > > > > > > > .------------------------. > > > > > | DISPLAY CONTROLLER | > > > > > | .---------. .------. | > > > > > | | ENCODER |<--| CRTC | | > > > > > | '---------' '------' | > > > > > '------|-----------------' > > > > > | > > > > > |DSI HOTPLUG > > > > > V CONNECTOR > > > > > .---------. .--. .-. .---------. .-------. > > > > > | 0 to N | | _| _| | | 1 to N | | | > > > > > | BRIDGES |--DSI-->||_ |_ |--DSI-->| BRIDGES |--LVDS-->| PANEL | > > > > > | | | | | | | | | | > > > > > '---------' '--' '-' '---------' '-------' > > > > > > > > > > [--- fixed components --] [----------- removable add-on -----------] > > > > > > > > > > Fixed components include: > > > > > > > > > > * all components up to the DRM encoder, usually part of the SoC > > > > > * optionally some bridges, in the SoC and/or as external chips > > > > > > > > > > Components on the removable add-on include: > > > > > > > > > > * one or more bridges > > > > > * a fixed connector (not one natively supporting hotplug such as HDMI) > > > > > * the panel > > > > > > > > So I think at a high level this design approach makes sense, > > > > > > Good starting point :) > > > > > > > but the > > > > implementation needs some serious thought. One big thing upfront though, > > > > we need to have a clear plan for the overlay hotunload issues, otherwise > > > > trying to make drm bridges hotpluggable makes no sense to me. Hotunload is > > > > very, very tricky, full of lifetime issues, and those need to be sorted > > > > out first or we're just trying to build a castle on quicksand. > > > > > > > > For bridges itself I don't think the current locking works. You're trying > > > > to really cleverly hide it all behind a normal-looking bridge driver, but > > > > there's many things beyond that which will blow up if bridges just > > > > disappear. Most importantly the bridge states part of an atomic update. > > > > > > Surely possible as atomic updates are definitely not stimulated in my > > > use case. Can you recommend any testing tools to be able to trigger any > > > issues? > > > > Uh really hard ... You'd need to create an atomic commit that's blocked on > > a sync_file in-fence (so that you can extend the race window). And then > > hotunplug the bridge chain _before_ you signal that fence. > > > > That's not going to cover all possible races, but at least a large chunk > > of the really big ones. > > > > > The main setups I used for my testing so far are 'modetest -s' for my > > > daily work and a simple weston setup to periodically test a complete > > > user space stack. > > > > > > > Now in drm we have drm_connector as the only hotunpluggable thing, and it > > > > took years to sort out all the issues. I think we should either model the > > > > bridge hotunplug locking after that, or just outright reuse the connector > > > > locking and lifetime rules. I much prefer the latter personally. > > > > > > > > Anyway the big issues: > > > > > > > > - We need to refcount the hotpluggable bridges, because software (like > > > > atomic state updates) might hang onto pointers for longer than the > > > > bridge physically exists. Assuming that you can all tear it down > > > > synchronously will not work. > > > > > > > > If we reuse connector locking/lifetime then we could put the > > > > hotpluggable part of the bridge chain into the drm_connector, since that > > > > already has refcounting as needed. It would mean that finding the next > > > > bridge in the chain becomes a lot more tricky though. With that model > > > > we'd create a new connector every time the bridge is hotplugged, which I > > > > think is also the cleaner model (because you might plug in a hdmi > > > > connector after a panel, so things like the connector type change). > > So, based on your dp__mst hint I added connector creation/removal in the > v3 I sent a few days ago. All other aspects in your list are unchanged > from the v2 you have reviewed. > > Now I'm trying to tackle some other of the items you mentioned here, > and locking/lifetime is the hardest one for me to understand at the > moment. > > If I get you right, you suggest making all the removable bridges > "owned" by the connector that gets hotplugged, the reason being that > locking and lifetime issues for the hotplug connectors have already > been sorted out, while that has not been done for bridges. > > Correct? Yes. > Assuming the above is correct, I'm wondering whether this would be a > correct design or rather a shortcut to leverage the connector locking > instead of implementing bridge locking/lifetime. Note I'm not > criticizing, I'm really asking myself this question and I'd like to > know your position about that. > > Again about putting the removable bridges inside the hotplug connector, > I'm trying to understand how that may happen in the device tree case. > The hot-pluggable bridge is populated starting from the DT code > (through i2c-core-of in my case being an I2C chip), and that code is > not aware that it is being instantiating a DRM device. So there is > nothing we can do before the bridge probe is called. The bridge probe > then does know much about any connectors. It also does not know about > hotplug: this is a design decision I made to make the regular bridge > drivers reusable without edits for the hotplug case, but it can be > waived if needed. > > So the only way I currently to move the bridge inside the connector is > to catch it in the hotplug-bridge driver, when it gets notified about > the new bridge having appeared (was a notifier, I'm changing that, see > below). So the hotplug-bridge would need a function line > drm_connector_add_bridge(conn, br) to make the association. > > Is this the direction of development you had in mind? Not quite. The issue is that while bridges are getting hotplugged, you don't yet have a connector. So there's nothing you can attach to. What I had in mind is more: - The last fixed bridges knows that all subsequent bridges are hotplugged. Which means instead of just asking for the next bridge, it needs to ask for the fully formed bridge chain, including the connector. - The hotplug bridge connector code checks every time a new bridge shows up whether the chain is now complete. Once that is the case, it creates the connector (with the new bridge design this is no longer the job of the last bridge in the chain, so we need to require that for hotpluggable bridges). Then it can attach all the bridges to that connector, and hand the entire fully formed chain to the last fixed bridge to hotplug insert and register. - Hotunplug is the same, but on inverse: The entire chain including the connector gets removed in it's entirety, before it gets dissassembled and freed (once the last refcount to the connector is dropped). There is a _lot_ of handwaving in this on the driver model side, but with this the drm side should be clear and clean at least. > > > I have been investigating the option of adding/removing a connector > > > based on hot-plug/unplug events initially, see my reply to Maxime after > > > v1 [0]: > > > > > > > The first approach is based on removing the drm_connector. My laptop > > > > uses the i915 driver, and I have observed that attaching/removing a > > > > USB-C dock with an HDMI connector connected to a monitor, a new > > > > drm_connector appears/disappears for the card. User space gets notified > > > > and the external monitor is enabled/disabled, just the way a desktop > > > > user would expect, so this is possible. I had a look at the driver but > > > > how this magic happens was not clear to me honestly. > > > > > > So if you could point to where/how this is done, I would certainly > > > reconsider that. > > > > Right now only the dp mst code uses hotplug/unplug (like you've observed > > in your testing with i915, usb-c docks are generally all dp mst). For code > > reading it might be best to start with the i915 dp mst code and then go > > through the helpers. > > > > > > - No notifiers please. The create a locking mess with inversions, and > > > > especially for hotunplug they create the illusion that you can > > > > synchronously keep up to date with hardware state. That's not possible. > > > > Fundamentally all bridge drivers which might be hotunplugged need to be > > > > able to cope with the hardware disappearing any momemnt. > > > > > > Can you clarify this point? I'm sorry I fail to understand the > > > relationship between the use of notifiers and the ability of bridge > > > drivers to cope with hardware disappearing. > > > > > > In this patch, the hotplug-bridge uses notifiers to be informed when > > > the following bridge is disappearing: which other way would you suggest > > > to make the hotplug-bridge aware of that? OTOH the hotplug-bridge is > > > not meant to disappear, and it has no actual hardware that can go away, > > > being just a mechanical connector. > > > > Yeah you need code to handle that. My point is that using a notifier is > > the wrong design, because the notifier has it's own locking. Which tends > > to get in the way. > > I went into this subject to see how drm_bridge_add() could inform the > interested bridges using a DRM-specific mechanism instead of standard > notifiers. > > However I think to inform *the interested* bridges, at least in the > device tree case, there is a fundamental issue: DRM core has no idea > about the topology. Definitely not at drm_bridge_add() time, way before > drm_bridge_attach() where the 'previous' pointer introduces a minimum > of topology awareness in the DRM core. Yeah we need special functions, which the last fixed bridge needs to call instead of the current set of functions. So instead of of_drm_find_bridge we need something like of_drm_register_hotplug_source_bridge or similar. > Option 1 is to iterate over all the ports and endpoints and for every > remote-endpoint pointing to a bridge, inform the remote bridge about > that via a new optional callback in drm_bridge_funcs. That means likely > informing more bridges than needed, so when they get informed the > bridges will still have to check whether they are interested or not. > > Pseudocode: > > void drm_bridge_add(struct drm_bridge *new_bridge) > { > // existing code unchanged > > + if (bridge->of_node) { > + for_each_remote_endpoint(bridge->of_node) { > + br = of_drm_find_bridge(remote_endpoint); > + if (br && br->funcs.bridge_event_notify) > + br->funcs->bridge_event_notify(br, DRM_EVENT_BRIDGE_ADD, > + new_bridge); > + } > + } > } > > That means informing both upstream and downstream bridges, which could > even be useful, but anyway there is no reliable way to pick only the > sink or source ports. It also implies having lots of of_*() calls which > iterate over the tree, which is not optimal, but it happens only at > add/removal so it's fine I guess. > > Option 2 is be to just inform all the bridges (using the global > bridge_list). Pros and cons: > > * less of_*() calls to crawl around the remote-endpoints > * simpler code > * more bridges than needed would be informed, could be be an issue > if many implement the .bridge_event_notify() > * notifies even in the non-OF case too, not sure it's useful > > What are your thoughts about these two options? I'm not sure why you need to inform bridges? Currently for fixed bridges we retry with EPROBE_DEFER if the subsequent bridge isn't there yet, which means we only add the bridge when the entire chain exists. I think hotplugged bridge chains should work the same, which means you only ever need to watch the first element shows up. Then you create the connector for the entire chain and through special callbacks inform the bridge that called of_drm_register_hotplug_source_bridge that it now has a connector. All the other bridges can be ignored I think, even when they're part of the hotplug chain. I just realized that means we need at least a refcount on hotpluggable bridges, because otherwise drm_bridge_remove will result in use-after-free issues. So should probably rename those from drm_bridge_add/remove to drm_bridge_register/unregister. For fixed bridges no one else should ever increment the refcount, so should work exactly the same. > > Instead I think that code should be directly in core bridge code (I don't > > see the benefit of having this entirely in a separate driver), using drm > > locking to make sure there's no races. > > Not sure I got what you mean here. Which one of the following? > > 1. The entire hotplug-bridge driver should not exist, and the DRM > core should handle it all (seems not doable, this driver has lots of > device-specific details) > 2. The hotplug-driver should exist, but the code to attach/detach the > pipeline on hotplug/unplug should be moved to the core, with the > hotplug-driver providing callbacks for the device-specific aspects > 3. Same as 2, but additionally the hotplug-bridge should become a > connector driver (hotplug-connector.c?) -- not sure it can decouple > the two sides without a bridge however > 4. None of the above 3, roughly. The connector creation must be in core bridge code, or things will go boom. If you also want to keep the current lifetime model for bridge (without a refcount, meaning drm_bridge_remove removes the bridge from all use right away) there's also additional locking and tricks you need in in that bridge connector code, including the entire bridge chain. Meaning those hotplugged bridges must not be visible at all to the wider driver. Or you refcount bridges and make them like driver model struct kobj (or kobj directly), but that's a pile of work too. That would be the other design approach, instead of only relying on drm_connector. We might need both, I haven't dug into the details enough to be sure here. Cheers, Sima > > > On the opposite side, the following bridges are physically removable > > > and so their drivers will have to be fixed (if needed) to work when > > > removal happens, but I don't see how that relates to the DRM core > > > emitting a notification of such bridges being removed. > > > > Yeah it's not directly related, just my experience that people assume > > notifiers provide you more synchronization and race-preventation than they > > really do. So it's better to hand-roll to make it all really explicit. > > > > > > - Related to this: You're not allowed to shut down hardware behind the > > > > user's back with drm_atomic_helper_shutdown. We've tried that approach > > > > with dp mst, it really pisses off userspace when a page_flip that it > > > > expected to work doesn't work. > > > > > > > > - There's also the design aspect that in atomic, only atomic_check is > > > > allowed to fail, atomic_commit must succeed, even when the hardware is > > > > gone. Using connectors and their refcounting should help with that. > > > > > > IIUC here you are suggesting again to remove the connector instead of > > > marking it "disconnected". So, as above, pointers on how that is > > > achieved would be helpful. > > > > See dp mst code. It's complex unfortunately, so there's some reading > > involved :-/ > > > > > > > - Somewhat aside, but I noticed that the bridge->atomic_reset is in > > > > drm_bridge_attach, and that's kinda the wrong place. It should be in > > > > drm_mode_config_reset, like all the other ->atomic_reset hooks. That > > > > would make it a lot clearer that we need to figure out who/when > > > > ->atomic_reset should be called for hotplugged bridges, maybe as part of > > > > connector registration when the entire bridge and it's new connector is > > > > assembled? > > > > > > > > - Finally this very much means we need to rethink who/how the connector > > > > for a bridge is created. The new design is that the main driver creates > > > > this connector, once the entire bridge exists. But with hotplugging this > > > > gets a lot more complicated, so we might want to extract a pile of that > > > > encoder related code from drivers (same way dp mst helpers take care of > > > > connector creation too, it's just too much of a mess otherwise). > > > > > > > > The current bridge chaining infrastructure requires a lot of hand-rolled > > > > code in each bridge driver and the encoder, so that might be a good > > > > thing anyway. > > > > > > > > - Finally I think the entire bridge hotplug infrastructure should be > > > > irrespective of the underlying bus. Which means for the mipi dsi case we > > > > might also want to look into what's missing to make mipi dsi > > > > hotunpluggable, at least for the case where it's a proper driver. I > > > > think we should ignore the old bridge model where driver's stitched it > > > > all toghether using the component framework, in my opinion that approach > > > > should be deprecated. > > > > > > The DSI side was one of my headaches on writing this driver, and I > > > must say I dislike the code in hotplug_bridge_dsi_attach(), with the > > > need for an initial "dummy" attach and detach the first time. At first > > > sight I think we need a .update_format callback in struct > > > mipi_dsi_host_ops so the DSI host is aware of a format change. > > > > > > Right now there are DSI host drivers which keep a copy of the struct > > > mipi_dsi_device pointer and read the format from there when starting a > > > stream (e.g. the tc358768.c driver [1]). That somewhat provides a way > > > to react to format changes, but keeping a pointer is bad when the DSI > > > device hot-unplugs, and the new format won't be seen until a > > > disable/enable cycle. So a new callback looks more robust overall. > > > > > > Any opinion about this? > > > > Yeah I don't like the code either. > > > > What might help for prototyping is if you start with a hotpluggeable > > bridge where the bridge is a i2c device. That way I think we should be > > able to benefit from the driver bind/unbind code that exists already. > > Although there's going to be issues with i2c hotunplug too in i2c core > > code. > > > > Then lift whatever we learn there to our dsi code. But essentially I think > > we should model the driver core model a lot more, so I guess you could use > > any hotunplug capable bus as a template. Usb might be closest, since > > that's also a packet/message based interface, mostly at least? > > > > > > - Finally I think we should have a lot of safety checks, like only bridges > > > > which declare themselve to be hotunplug safe should be allowed as a part > > > > of the hotpluggable bridge chain part. All others must still be attached > > > > before the entire driver is registered with drm_dev_register. > > > > > > I'm fine with the principle of a "HOTPLUG" flag. > > > > > > I wonder how that should be implemented, though. Based on my (surely > > > simplistic) understanding, right now bridges can be both added and > > > removed, but only in a specific sequence: > > > > > > 1. add all bridges > > > 2. use the card > > > 3. remove all bridges > > > 4. EOF > > > > > > We'd need to change to allow: > > > > > > 1. add fixed bridges (including hotplug-bridge) > > > 2. add bridges on removable add-on > > > 3. use card > > > 4. remove bridges on removable add-on > > > 5. optionally goto 2 > > > 6. remove fixed bridges (including hotplug-bridge) > > > 7. EOF > > > > > > One naïve idea is that the DRM core keeps a flag whenever any fixed > > > bridge (no HOTLPUG flag) is removed and when that happens forbid adding > > > bridges as we are now at line 5. > > > > If a fixed bridge is removed while the driver is still in use (i.e. before > > drm_dev_unregister is called), that's a driver bug. Would be good to catch > > that, which is why I think a lot of all the bridge hotplug handling should > > be in core bridge code and not the separate hotplug driver, so that we can > > enforce all these constraints. > > > > Also conceptually 3 can happen before 2 (but also before), and that's how > > dp mst works too. It does add all kinds of complications though ... > > > > > Aside for tons of subtleties I'm surely missing, does this look a > > > proper approach? I'd not be surprised if there is something better and > > > more solid. > > > > Yeah roughly. If you look through dp mst code then that also adds all > > kinds of structures (since dp mst is a routed network really), not just > > the connectors. They also all come with refcounts (because the network is > > a tree), but like I said I think for your case we can avoid the per-bridge > > refcounts by relying on the connector refcount we have already. > > > > But I might be overlooking something, and we need refcounting for each > > bridge like dp mst also needs refcounting for all its internal structures > > that map the entire hotpluggable display chain. If you want to read some > > dp mst code, these internal structures are ports/branches with struct > > drm_dp_mst_branch/port. > > > > > > Or that we only allow bridges with the NO_CONNECTOR flag for > > > > drm_bridge_attach. > > > > > > > > There's probably a pile more fundamental issues I've missed, but this > > > > should get a good discussion started. > > > > > > Sure, I think it has. > > > > > > Bottom line, I'm clearly not seeing some issues that need to be > > > considered, and that do not show up for my use case. Based on my > > > limited DRM knowledge, this was expected, and I'm glad to work on those > > > issues with some practical indications about the path forward. > > > > Yeah for learning I think dp mst is best. It's a bit complex, but since > > you have a dock you should be able to play around and experiment with the > > code with some real hardware. > > > > That should help to get a bit a feel for the complexity, since lots of > > opportunities for you to ask why/how locking/refcounting is used and > > against which potential issue they protect. > > > > Cheers, Sima > > Luca > > -- > Luca Ceresoli, Bootlin > Embedded Linux and Kernel engineering > https://bootlin.com
Hello Sima, again thanks for your feedback, and again I'm afraid I'm having a hard time in understanding large parts of your e-mail. Below I'm trying to focus on one fundamental aspect only, which I need to understand the overall approach you are suggesting. On Tue, 27 Aug 2024 18:37:12 +0200 Daniel Vetter <daniel.vetter@ffwll.ch> wrote: > On Fri, Aug 23, 2024 at 12:39:03PM +0200, Luca Ceresoli wrote: > > Hello Sima, > > > > these days I started looking in more detail into some of the topics you > > had mentioned in your v2 review. I have questions about those I have > > investigated, see below. > > > > On Tue, 21 May 2024 14:01:19 +0200 > > Daniel Vetter <daniel@ffwll.ch> wrote: > > > > > On Mon, May 20, 2024 at 02:01:48PM +0200, Luca Ceresoli wrote: > > > > Hello Daniel, > > > > > > > > On Thu, 16 May 2024 15:22:01 +0200 > > > > Daniel Vetter <daniel@ffwll.ch> wrote: > > > > > > > > > Apologies for missing v1 ... > > > > > > > > > > On Fri, May 10, 2024 at 09:10:36AM +0200, Luca Ceresoli wrote: > > > > > > DRM hotplug bridge driver > > > > > > ========================= > > > > > > > > > > > > DRM natively supports pipelines whose display can be removed, but all the > > > > > > components preceding it (all the display controller and any bridges) are > > > > > > assumed to be fixed and cannot be plugged, removed or modified at runtime. > > > > > > > > > > > > This series adds support for DRM pipelines having a removable part after > > > > > > the encoder, thus also allowing bridges to be removed and reconnected at > > > > > > runtime, possibly with different components. > > > > > > > > > > > > This picture summarizes the DRM structure implemented by this series: > > > > > > > > > > > > .------------------------. > > > > > > | DISPLAY CONTROLLER | > > > > > > | .---------. .------. | > > > > > > | | ENCODER |<--| CRTC | | > > > > > > | '---------' '------' | > > > > > > '------|-----------------' > > > > > > | > > > > > > |DSI HOTPLUG > > > > > > V CONNECTOR > > > > > > .---------. .--. .-. .---------. .-------. > > > > > > | 0 to N | | _| _| | | 1 to N | | | > > > > > > | BRIDGES |--DSI-->||_ |_ |--DSI-->| BRIDGES |--LVDS-->| PANEL | > > > > > > | | | | | | | | | | > > > > > > '---------' '--' '-' '---------' '-------' > > > > > > > > > > > > [--- fixed components --] [----------- removable add-on -----------] > > > > > > > > > > > > Fixed components include: > > > > > > > > > > > > * all components up to the DRM encoder, usually part of the SoC > > > > > > * optionally some bridges, in the SoC and/or as external chips > > > > > > > > > > > > Components on the removable add-on include: > > > > > > > > > > > > * one or more bridges > > > > > > * a fixed connector (not one natively supporting hotplug such as HDMI) > > > > > > * the panel > > > > > > > > > > So I think at a high level this design approach makes sense, > > > > > > > > Good starting point :) > > > > > > > > > but the > > > > > implementation needs some serious thought. One big thing upfront though, > > > > > we need to have a clear plan for the overlay hotunload issues, otherwise > > > > > trying to make drm bridges hotpluggable makes no sense to me. Hotunload is > > > > > very, very tricky, full of lifetime issues, and those need to be sorted > > > > > out first or we're just trying to build a castle on quicksand. > > > > > > > > > > For bridges itself I don't think the current locking works. You're trying > > > > > to really cleverly hide it all behind a normal-looking bridge driver, but > > > > > there's many things beyond that which will blow up if bridges just > > > > > disappear. Most importantly the bridge states part of an atomic update. > > > > > > > > Surely possible as atomic updates are definitely not stimulated in my > > > > use case. Can you recommend any testing tools to be able to trigger any > > > > issues? > > > > > > Uh really hard ... You'd need to create an atomic commit that's blocked on > > > a sync_file in-fence (so that you can extend the race window). And then > > > hotunplug the bridge chain _before_ you signal that fence. > > > > > > That's not going to cover all possible races, but at least a large chunk > > > of the really big ones. > > > > > > > The main setups I used for my testing so far are 'modetest -s' for my > > > > daily work and a simple weston setup to periodically test a complete > > > > user space stack. > > > > > > > > > Now in drm we have drm_connector as the only hotunpluggable thing, and it > > > > > took years to sort out all the issues. I think we should either model the > > > > > bridge hotunplug locking after that, or just outright reuse the connector > > > > > locking and lifetime rules. I much prefer the latter personally. > > > > > > > > > > Anyway the big issues: > > > > > > > > > > - We need to refcount the hotpluggable bridges, because software (like > > > > > atomic state updates) might hang onto pointers for longer than the > > > > > bridge physically exists. Assuming that you can all tear it down > > > > > synchronously will not work. > > > > > > > > > > If we reuse connector locking/lifetime then we could put the > > > > > hotpluggable part of the bridge chain into the drm_connector, since that > > > > > already has refcounting as needed. It would mean that finding the next > > > > > bridge in the chain becomes a lot more tricky though. With that model > > > > > we'd create a new connector every time the bridge is hotplugged, which I > > > > > think is also the cleaner model (because you might plug in a hdmi > > > > > connector after a panel, so things like the connector type change). > > > > So, based on your dp__mst hint I added connector creation/removal in the > > v3 I sent a few days ago. All other aspects in your list are unchanged > > from the v2 you have reviewed. > > > > Now I'm trying to tackle some other of the items you mentioned here, > > and locking/lifetime is the hardest one for me to understand at the > > moment. > > > > If I get you right, you suggest making all the removable bridges > > "owned" by the connector that gets hotplugged, the reason being that > > locking and lifetime issues for the hotplug connectors have already > > been sorted out, while that has not been done for bridges. > > > > Correct? > > Yes. > > > Assuming the above is correct, I'm wondering whether this would be a > > correct design or rather a shortcut to leverage the connector locking > > instead of implementing bridge locking/lifetime. Note I'm not > > criticizing, I'm really asking myself this question and I'd like to > > know your position about that. > > > > Again about putting the removable bridges inside the hotplug connector, > > I'm trying to understand how that may happen in the device tree case. > > The hot-pluggable bridge is populated starting from the DT code > > (through i2c-core-of in my case being an I2C chip), and that code is > > not aware that it is being instantiating a DRM device. So there is > > nothing we can do before the bridge probe is called. The bridge probe > > then does know much about any connectors. It also does not know about > > hotplug: this is a design decision I made to make the regular bridge > > drivers reusable without edits for the hotplug case, but it can be > > waived if needed. > > > > So the only way I currently to move the bridge inside the connector is > > to catch it in the hotplug-bridge driver, when it gets notified about > > the new bridge having appeared (was a notifier, I'm changing that, see > > below). So the hotplug-bridge would need a function line > > drm_connector_add_bridge(conn, br) to make the association. > > > > Is this the direction of development you had in mind? > > Not quite. The issue is that while bridges are getting hotplugged, you > don't yet have a connector. So there's nothing you can attach to. What I > had in mind is more: > There is a fundamental question where your position is not clear to me. Based on this: > - The last fixed bridges knows that all subsequent bridges are hotplugged. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > Which means instead of just asking for the next bridge, it needs to ask > for the fully formed bridge chain, including the connector. > and this: > - The hotplug bridge connector code checks every time a new bridge shows ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > up whether the chain is now complete. Once that is the case, it creates > the connector (with the new bridge design this is no longer the job of > the last bridge in the chain, so we need to require that for > hotpluggable bridges). Then it can attach all the bridges to that > connector, and hand the entire fully formed chain to the last fixed > bridge to hotplug insert and register. The question is: do you think the hotplug-bridge driver should be present, or not? To me the two above sentences appear to contradict each other. The reason we decided to implement a hotplug DRM bridge is it allows to decouple the fixed and the remote segments of the pipeline, in such a way that all the regular bridge drivers don't need any special handling to support hotplug. In my work the upstream bridge driver is samsung-dsim.c and the downstream one is ti-sn65dsi83.c and none of them needed a single line changed to support hotplug. I think this is useful: virtually any physical bridge with pins can be used in a hotplug setup, either in the fixed or in the removable section, so not needing to modify them is valuable. OTOH in various parts of this and other e-mails you seem to imply that there should be no hotplug-bridge (not as a struct drm_bridge, not as a driver, or both). Except for the fact that there is no chip implementing such a bridge (there is a physical connector though) I do not see many reasons. > - Hotunplug is the same, but on inverse: The entire chain including the > connector gets removed in it's entirety, before it gets dissassembled > and freed (once the last refcount to the connector is dropped). > > There is a _lot_ of handwaving in this on the driver model side, but with > this the drm side should be clear and clean at least. > > > > > I have been investigating the option of adding/removing a connector > > > > based on hot-plug/unplug events initially, see my reply to Maxime after > > > > v1 [0]: > > > > > > > > > The first approach is based on removing the drm_connector. My laptop > > > > > uses the i915 driver, and I have observed that attaching/removing a > > > > > USB-C dock with an HDMI connector connected to a monitor, a new > > > > > drm_connector appears/disappears for the card. User space gets notified > > > > > and the external monitor is enabled/disabled, just the way a desktop > > > > > user would expect, so this is possible. I had a look at the driver but > > > > > how this magic happens was not clear to me honestly. > > > > > > > > So if you could point to where/how this is done, I would certainly > > > > reconsider that. > > > > > > Right now only the dp mst code uses hotplug/unplug (like you've observed > > > in your testing with i915, usb-c docks are generally all dp mst). For code > > > reading it might be best to start with the i915 dp mst code and then go > > > through the helpers. > > > > > > > > - No notifiers please. The create a locking mess with inversions, and > > > > > especially for hotunplug they create the illusion that you can > > > > > synchronously keep up to date with hardware state. That's not possible. > > > > > Fundamentally all bridge drivers which might be hotunplugged need to be > > > > > able to cope with the hardware disappearing any momemnt. > > > > > > > > Can you clarify this point? I'm sorry I fail to understand the > > > > relationship between the use of notifiers and the ability of bridge > > > > drivers to cope with hardware disappearing. > > > > > > > > In this patch, the hotplug-bridge uses notifiers to be informed when > > > > the following bridge is disappearing: which other way would you suggest > > > > to make the hotplug-bridge aware of that? OTOH the hotplug-bridge is > > > > not meant to disappear, and it has no actual hardware that can go away, > > > > being just a mechanical connector. > > > > > > Yeah you need code to handle that. My point is that using a notifier is > > > the wrong design, because the notifier has it's own locking. Which tends > > > to get in the way. > > > > I went into this subject to see how drm_bridge_add() could inform the > > interested bridges using a DRM-specific mechanism instead of standard > > notifiers. > > > > However I think to inform *the interested* bridges, at least in the > > device tree case, there is a fundamental issue: DRM core has no idea > > about the topology. Definitely not at drm_bridge_add() time, way before > > drm_bridge_attach() where the 'previous' pointer introduces a minimum > > of topology awareness in the DRM core. > > Yeah we need special functions, which the last fixed bridge needs to call > instead of the current set of functions. So instead of of_drm_find_bridge > we need something like of_drm_register_hotplug_source_bridge or similar. Continuing on the above topic, are you suggesting that there should be no hotplug-bridge, and that every bridge that can be used as the "last fixed bridge" should handle the hotplug case individually? If that is the case, we'd need to modify any bridge driver that people want to use as "last fixed bridge" to: 1. know they are the "last fixed bridge" (via device tree?) 2. use of_drm_register_hotplug_source_bridge() instead of of_drm_find_bridge() accordingly > > Option 1 is to iterate over all the ports and endpoints and for every > > remote-endpoint pointing to a bridge, inform the remote bridge about > > that via a new optional callback in drm_bridge_funcs. That means likely > > informing more bridges than needed, so when they get informed the > > bridges will still have to check whether they are interested or not. > > > > Pseudocode: > > > > void drm_bridge_add(struct drm_bridge *new_bridge) > > { > > // existing code unchanged > > > > + if (bridge->of_node) { > > + for_each_remote_endpoint(bridge->of_node) { > > + br = of_drm_find_bridge(remote_endpoint); > > + if (br && br->funcs.bridge_event_notify) > > + br->funcs->bridge_event_notify(br, DRM_EVENT_BRIDGE_ADD, > > + new_bridge); > > + } > > + } > > } > > > > That means informing both upstream and downstream bridges, which could > > even be useful, but anyway there is no reliable way to pick only the > > sink or source ports. It also implies having lots of of_*() calls which > > iterate over the tree, which is not optimal, but it happens only at > > add/removal so it's fine I guess. > > > > Option 2 is be to just inform all the bridges (using the global > > bridge_list). Pros and cons: > > > > * less of_*() calls to crawl around the remote-endpoints > > * simpler code > > * more bridges than needed would be informed, could be be an issue > > if many implement the .bridge_event_notify() > > * notifies even in the non-OF case too, not sure it's useful > > > > What are your thoughts about these two options? > > I'm not sure why you need to inform bridges? Currently for fixed bridges > we retry with EPROBE_DEFER if the subsequent bridge isn't there yet, which > means we only add the bridge when the entire chain exists. I think > hotplugged bridge chains should work the same, which means you only ever > need to watch the first element shows up. Then you create the connector > for the entire chain and through special callbacks inform the bridge that > called of_drm_register_hotplug_source_bridge that it now has a connector. > > All the other bridges can be ignored I think, even when they're part of > the hotplug chain. > > I just realized that means we need at least a refcount on hotpluggable > bridges, because otherwise drm_bridge_remove will result in use-after-free > issues. So should probably rename those from drm_bridge_add/remove to > drm_bridge_register/unregister. > > For fixed bridges no one else should ever increment the refcount, so > should work exactly the same. > > > > Instead I think that code should be directly in core bridge code (I don't > > > see the benefit of having this entirely in a separate driver), using drm > > > locking to make sure there's no races. > > > > Not sure I got what you mean here. Which one of the following? > > > > 1. The entire hotplug-bridge driver should not exist, and the DRM > > core should handle it all (seems not doable, this driver has lots of > > device-specific details) > > 2. The hotplug-driver should exist, but the code to attach/detach the > > pipeline on hotplug/unplug should be moved to the core, with the > > hotplug-driver providing callbacks for the device-specific aspects > > 3. Same as 2, but additionally the hotplug-bridge should become a > > connector driver (hotplug-connector.c?) -- not sure it can decouple > > the two sides without a bridge however > > 4. None of the above > > 3, roughly. The connector creation must be in core bridge code, or things > will go boom. Based on this I think you mean: 1. the hotplug-*something* driver should exist 2. it should add the fixed connector (DSI in my case) on probe 3. it should add/remove the removable connector (LVDS) on hot(un)plug 4. it should add _no_ bridge (in the sense of struct drm_bridge) [not sure it can still decouple the fixed VS addon pipeline parts] Could you please clearly state your position about the above points? Best regards, Luca > If you also want to keep the current lifetime model for bridge (without a > refcount, meaning drm_bridge_remove removes the bridge from all use right > away) there's also additional locking and tricks you need in in that > bridge connector code, including the entire bridge chain. Meaning those > hotplugged bridges must not be visible at all to the wider driver. > > Or you refcount bridges and make them like driver model struct kobj (or > kobj directly), but that's a pile of work too. That would be the other > design approach, instead of only relying on drm_connector. We might need > both, I haven't dug into the details enough to be sure here. > > Cheers, Sima > > > > > > On the opposite side, the following bridges are physically removable > > > > and so their drivers will have to be fixed (if needed) to work when > > > > removal happens, but I don't see how that relates to the DRM core > > > > emitting a notification of such bridges being removed. > > > > > > Yeah it's not directly related, just my experience that people assume > > > notifiers provide you more synchronization and race-preventation than they > > > really do. So it's better to hand-roll to make it all really explicit. > > > > > > > > - Related to this: You're not allowed to shut down hardware behind the > > > > > user's back with drm_atomic_helper_shutdown. We've tried that approach > > > > > with dp mst, it really pisses off userspace when a page_flip that it > > > > > expected to work doesn't work. > > > > > > > > > > - There's also the design aspect that in atomic, only atomic_check is > > > > > allowed to fail, atomic_commit must succeed, even when the hardware is > > > > > gone. Using connectors and their refcounting should help with that. > > > > > > > > IIUC here you are suggesting again to remove the connector instead of > > > > marking it "disconnected". So, as above, pointers on how that is > > > > achieved would be helpful. > > > > > > See dp mst code. It's complex unfortunately, so there's some reading > > > involved :-/ > > > > > > > > > - Somewhat aside, but I noticed that the bridge->atomic_reset is in > > > > > drm_bridge_attach, and that's kinda the wrong place. It should be in > > > > > drm_mode_config_reset, like all the other ->atomic_reset hooks. That > > > > > would make it a lot clearer that we need to figure out who/when > > > > > ->atomic_reset should be called for hotplugged bridges, maybe as part of > > > > > connector registration when the entire bridge and it's new connector is > > > > > assembled? > > > > > > > > > > - Finally this very much means we need to rethink who/how the connector > > > > > for a bridge is created. The new design is that the main driver creates > > > > > this connector, once the entire bridge exists. But with hotplugging this > > > > > gets a lot more complicated, so we might want to extract a pile of that > > > > > encoder related code from drivers (same way dp mst helpers take care of > > > > > connector creation too, it's just too much of a mess otherwise). > > > > > > > > > > The current bridge chaining infrastructure requires a lot of hand-rolled > > > > > code in each bridge driver and the encoder, so that might be a good > > > > > thing anyway. > > > > > > > > > > - Finally I think the entire bridge hotplug infrastructure should be > > > > > irrespective of the underlying bus. Which means for the mipi dsi case we > > > > > might also want to look into what's missing to make mipi dsi > > > > > hotunpluggable, at least for the case where it's a proper driver. I > > > > > think we should ignore the old bridge model where driver's stitched it > > > > > all toghether using the component framework, in my opinion that approach > > > > > should be deprecated. > > > > > > > > The DSI side was one of my headaches on writing this driver, and I > > > > must say I dislike the code in hotplug_bridge_dsi_attach(), with the > > > > need for an initial "dummy" attach and detach the first time. At first > > > > sight I think we need a .update_format callback in struct > > > > mipi_dsi_host_ops so the DSI host is aware of a format change. > > > > > > > > Right now there are DSI host drivers which keep a copy of the struct > > > > mipi_dsi_device pointer and read the format from there when starting a > > > > stream (e.g. the tc358768.c driver [1]). That somewhat provides a way > > > > to react to format changes, but keeping a pointer is bad when the DSI > > > > device hot-unplugs, and the new format won't be seen until a > > > > disable/enable cycle. So a new callback looks more robust overall. > > > > > > > > Any opinion about this? > > > > > > Yeah I don't like the code either. > > > > > > What might help for prototyping is if you start with a hotpluggeable > > > bridge where the bridge is a i2c device. That way I think we should be > > > able to benefit from the driver bind/unbind code that exists already. > > > Although there's going to be issues with i2c hotunplug too in i2c core > > > code. > > > > > > Then lift whatever we learn there to our dsi code. But essentially I think > > > we should model the driver core model a lot more, so I guess you could use > > > any hotunplug capable bus as a template. Usb might be closest, since > > > that's also a packet/message based interface, mostly at least? > > > > > > > > - Finally I think we should have a lot of safety checks, like only bridges > > > > > which declare themselve to be hotunplug safe should be allowed as a part > > > > > of the hotpluggable bridge chain part. All others must still be attached > > > > > before the entire driver is registered with drm_dev_register. > > > > > > > > I'm fine with the principle of a "HOTPLUG" flag. > > > > > > > > I wonder how that should be implemented, though. Based on my (surely > > > > simplistic) understanding, right now bridges can be both added and > > > > removed, but only in a specific sequence: > > > > > > > > 1. add all bridges > > > > 2. use the card > > > > 3. remove all bridges > > > > 4. EOF > > > > > > > > We'd need to change to allow: > > > > > > > > 1. add fixed bridges (including hotplug-bridge) > > > > 2. add bridges on removable add-on > > > > 3. use card > > > > 4. remove bridges on removable add-on > > > > 5. optionally goto 2 > > > > 6. remove fixed bridges (including hotplug-bridge) > > > > 7. EOF > > > > > > > > One naïve idea is that the DRM core keeps a flag whenever any fixed > > > > bridge (no HOTLPUG flag) is removed and when that happens forbid adding > > > > bridges as we are now at line 5. > > > > > > If a fixed bridge is removed while the driver is still in use (i.e. before > > > drm_dev_unregister is called), that's a driver bug. Would be good to catch > > > that, which is why I think a lot of all the bridge hotplug handling should > > > be in core bridge code and not the separate hotplug driver, so that we can > > > enforce all these constraints. > > > > > > Also conceptually 3 can happen before 2 (but also before), and that's how > > > dp mst works too. It does add all kinds of complications though ... > > > > > > > Aside for tons of subtleties I'm surely missing, does this look a > > > > proper approach? I'd not be surprised if there is something better and > > > > more solid. > > > > > > Yeah roughly. If you look through dp mst code then that also adds all > > > kinds of structures (since dp mst is a routed network really), not just > > > the connectors. They also all come with refcounts (because the network is > > > a tree), but like I said I think for your case we can avoid the per-bridge > > > refcounts by relying on the connector refcount we have already. > > > > > > But I might be overlooking something, and we need refcounting for each > > > bridge like dp mst also needs refcounting for all its internal structures > > > that map the entire hotpluggable display chain. If you want to read some > > > dp mst code, these internal structures are ports/branches with struct > > > drm_dp_mst_branch/port. > > > > > > > > Or that we only allow bridges with the NO_CONNECTOR flag for > > > > > drm_bridge_attach. > > > > > > > > > > There's probably a pile more fundamental issues I've missed, but this > > > > > should get a good discussion started. > > > > > > > > Sure, I think it has. > > > > > > > > Bottom line, I'm clearly not seeing some issues that need to be > > > > considered, and that do not show up for my use case. Based on my > > > > limited DRM knowledge, this was expected, and I'm glad to work on those > > > > issues with some practical indications about the path forward. > > > > > > Yeah for learning I think dp mst is best. It's a bit complex, but since > > > you have a dock you should be able to play around and experiment with the > > > code with some real hardware. > > > > > > That should help to get a bit a feel for the complexity, since lots of > > > opportunities for you to ask why/how locking/refcounting is used and > > > against which potential issue they protect. > > > > > > Cheers, Sima > > > > Luca > > > > -- > > Luca Ceresoli, Bootlin > > Embedded Linux and Kernel engineering > > https://bootlin.com >
On Mon, Sep 09, 2024 at 03:26:04PM +0200, Luca Ceresoli wrote: > Hello Sima, > > again thanks for your feedback, and again I'm afraid I'm having a hard > time in understanding large parts of your e-mail. Below I'm trying to > focus on one fundamental aspect only, which I need to understand the > overall approach you are suggesting. Sorry for the late reply, was a bit distracted last weeks ... > On Tue, 27 Aug 2024 18:37:12 +0200 > Daniel Vetter <daniel.vetter@ffwll.ch> wrote: > > > On Fri, Aug 23, 2024 at 12:39:03PM +0200, Luca Ceresoli wrote: > > > Hello Sima, > > > > > > these days I started looking in more detail into some of the topics you > > > had mentioned in your v2 review. I have questions about those I have > > > investigated, see below. > > > > > > On Tue, 21 May 2024 14:01:19 +0200 > > > Daniel Vetter <daniel@ffwll.ch> wrote: > > > > > > > On Mon, May 20, 2024 at 02:01:48PM +0200, Luca Ceresoli wrote: > > > > > Hello Daniel, > > > > > > > > > > On Thu, 16 May 2024 15:22:01 +0200 > > > > > Daniel Vetter <daniel@ffwll.ch> wrote: > > > > > > > > > > > Apologies for missing v1 ... > > > > > > > > > > > > On Fri, May 10, 2024 at 09:10:36AM +0200, Luca Ceresoli wrote: > > > > > > > DRM hotplug bridge driver > > > > > > > ========================= > > > > > > > > > > > > > > DRM natively supports pipelines whose display can be removed, but all the > > > > > > > components preceding it (all the display controller and any bridges) are > > > > > > > assumed to be fixed and cannot be plugged, removed or modified at runtime. > > > > > > > > > > > > > > This series adds support for DRM pipelines having a removable part after > > > > > > > the encoder, thus also allowing bridges to be removed and reconnected at > > > > > > > runtime, possibly with different components. > > > > > > > > > > > > > > This picture summarizes the DRM structure implemented by this series: > > > > > > > > > > > > > > .------------------------. > > > > > > > | DISPLAY CONTROLLER | > > > > > > > | .---------. .------. | > > > > > > > | | ENCODER |<--| CRTC | | > > > > > > > | '---------' '------' | > > > > > > > '------|-----------------' > > > > > > > | > > > > > > > |DSI HOTPLUG > > > > > > > V CONNECTOR > > > > > > > .---------. .--. .-. .---------. .-------. > > > > > > > | 0 to N | | _| _| | | 1 to N | | | > > > > > > > | BRIDGES |--DSI-->||_ |_ |--DSI-->| BRIDGES |--LVDS-->| PANEL | > > > > > > > | | | | | | | | | | > > > > > > > '---------' '--' '-' '---------' '-------' > > > > > > > > > > > > > > [--- fixed components --] [----------- removable add-on -----------] > > > > > > > > > > > > > > Fixed components include: > > > > > > > > > > > > > > * all components up to the DRM encoder, usually part of the SoC > > > > > > > * optionally some bridges, in the SoC and/or as external chips > > > > > > > > > > > > > > Components on the removable add-on include: > > > > > > > > > > > > > > * one or more bridges > > > > > > > * a fixed connector (not one natively supporting hotplug such as HDMI) > > > > > > > * the panel > > > > > > > > > > > > So I think at a high level this design approach makes sense, > > > > > > > > > > Good starting point :) > > > > > > > > > > > but the > > > > > > implementation needs some serious thought. One big thing upfront though, > > > > > > we need to have a clear plan for the overlay hotunload issues, otherwise > > > > > > trying to make drm bridges hotpluggable makes no sense to me. Hotunload is > > > > > > very, very tricky, full of lifetime issues, and those need to be sorted > > > > > > out first or we're just trying to build a castle on quicksand. > > > > > > > > > > > > For bridges itself I don't think the current locking works. You're trying > > > > > > to really cleverly hide it all behind a normal-looking bridge driver, but > > > > > > there's many things beyond that which will blow up if bridges just > > > > > > disappear. Most importantly the bridge states part of an atomic update. > > > > > > > > > > Surely possible as atomic updates are definitely not stimulated in my > > > > > use case. Can you recommend any testing tools to be able to trigger any > > > > > issues? > > > > > > > > Uh really hard ... You'd need to create an atomic commit that's blocked on > > > > a sync_file in-fence (so that you can extend the race window). And then > > > > hotunplug the bridge chain _before_ you signal that fence. > > > > > > > > That's not going to cover all possible races, but at least a large chunk > > > > of the really big ones. > > > > > > > > > The main setups I used for my testing so far are 'modetest -s' for my > > > > > daily work and a simple weston setup to periodically test a complete > > > > > user space stack. > > > > > > > > > > > Now in drm we have drm_connector as the only hotunpluggable thing, and it > > > > > > took years to sort out all the issues. I think we should either model the > > > > > > bridge hotunplug locking after that, or just outright reuse the connector > > > > > > locking and lifetime rules. I much prefer the latter personally. > > > > > > > > > > > > Anyway the big issues: > > > > > > > > > > > > - We need to refcount the hotpluggable bridges, because software (like > > > > > > atomic state updates) might hang onto pointers for longer than the > > > > > > bridge physically exists. Assuming that you can all tear it down > > > > > > synchronously will not work. > > > > > > > > > > > > If we reuse connector locking/lifetime then we could put the > > > > > > hotpluggable part of the bridge chain into the drm_connector, since that > > > > > > already has refcounting as needed. It would mean that finding the next > > > > > > bridge in the chain becomes a lot more tricky though. With that model > > > > > > we'd create a new connector every time the bridge is hotplugged, which I > > > > > > think is also the cleaner model (because you might plug in a hdmi > > > > > > connector after a panel, so things like the connector type change). > > > > > > So, based on your dp__mst hint I added connector creation/removal in the > > > v3 I sent a few days ago. All other aspects in your list are unchanged > > > from the v2 you have reviewed. > > > > > > Now I'm trying to tackle some other of the items you mentioned here, > > > and locking/lifetime is the hardest one for me to understand at the > > > moment. > > > > > > If I get you right, you suggest making all the removable bridges > > > "owned" by the connector that gets hotplugged, the reason being that > > > locking and lifetime issues for the hotplug connectors have already > > > been sorted out, while that has not been done for bridges. > > > > > > Correct? > > > > Yes. > > > > > Assuming the above is correct, I'm wondering whether this would be a > > > correct design or rather a shortcut to leverage the connector locking > > > instead of implementing bridge locking/lifetime. Note I'm not > > > criticizing, I'm really asking myself this question and I'd like to > > > know your position about that. > > > > > > Again about putting the removable bridges inside the hotplug connector, > > > I'm trying to understand how that may happen in the device tree case. > > > The hot-pluggable bridge is populated starting from the DT code > > > (through i2c-core-of in my case being an I2C chip), and that code is > > > not aware that it is being instantiating a DRM device. So there is > > > nothing we can do before the bridge probe is called. The bridge probe > > > then does know much about any connectors. It also does not know about > > > hotplug: this is a design decision I made to make the regular bridge > > > drivers reusable without edits for the hotplug case, but it can be > > > waived if needed. > > > > > > So the only way I currently to move the bridge inside the connector is > > > to catch it in the hotplug-bridge driver, when it gets notified about > > > the new bridge having appeared (was a notifier, I'm changing that, see > > > below). So the hotplug-bridge would need a function line > > > drm_connector_add_bridge(conn, br) to make the association. > > > > > > Is this the direction of development you had in mind? > > > > Not quite. The issue is that while bridges are getting hotplugged, you > > don't yet have a connector. So there's nothing you can attach to. What I > > had in mind is more: > > > > There is a fundamental question where your position is not clear to me. > Based on this: > > > - The last fixed bridges knows that all subsequent bridges are hotplugged. > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > Which means instead of just asking for the next bridge, it needs to ask > > for the fully formed bridge chain, including the connector. > > > > and this: > > > - The hotplug bridge connector code checks every time a new bridge shows > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > up whether the chain is now complete. Once that is the case, it creates > > the connector (with the new bridge design this is no longer the job of > > the last bridge in the chain, so we need to require that for > > hotpluggable bridges). Then it can attach all the bridges to that > > connector, and hand the entire fully formed chain to the last fixed > > bridge to hotplug insert and register. > > The question is: do you think the hotplug-bridge driver should be > present, or not? To me the two above sentences appear to contradict each > other. > > The reason we decided to implement a hotplug DRM bridge is it allows to > decouple the fixed and the remote segments of the pipeline, in such a > way that all the regular bridge drivers don't need any special handling > to support hotplug. > > In my work the upstream bridge driver is samsung-dsim.c and the > downstream one is ti-sn65dsi83.c and none of them needed a single line > changed to support hotplug. I think this is useful: virtually any > physical bridge with pins can be used in a hotplug setup, either in the > fixed or in the removable section, so not needing to modify them is > valuable. > > OTOH in various parts of this and other e-mails you seem to imply that > there should be no hotplug-bridge (not as a struct drm_bridge, not as > a driver, or both). Except for the fact that there is no chip > implementing such a bridge (there is a physical connector though) I do > not see many reasons. Yeah you can have a dummy hotplug-bridge driver which just represents the hotplug connector, I don't see an issue with that. And sounds like a reasonable idea if it helps modelling with DT and all that. What I described above was just focused on the interaction between bridge drivers and the hotplug support code. I think you absolutely need the last bridge driver to be aware that the entire subsequent chain is hotplugged, otherwise it wont work. That last bridge driver can be a special hotplug driver, but it doesn't need to be the case. > > - Hotunplug is the same, but on inverse: The entire chain including the > > connector gets removed in it's entirety, before it gets dissassembled > > and freed (once the last refcount to the connector is dropped). > > > > There is a _lot_ of handwaving in this on the driver model side, but with > > this the drm side should be clear and clean at least. > > > > > > > I have been investigating the option of adding/removing a connector > > > > > based on hot-plug/unplug events initially, see my reply to Maxime after > > > > > v1 [0]: > > > > > > > > > > > The first approach is based on removing the drm_connector. My laptop > > > > > > uses the i915 driver, and I have observed that attaching/removing a > > > > > > USB-C dock with an HDMI connector connected to a monitor, a new > > > > > > drm_connector appears/disappears for the card. User space gets notified > > > > > > and the external monitor is enabled/disabled, just the way a desktop > > > > > > user would expect, so this is possible. I had a look at the driver but > > > > > > how this magic happens was not clear to me honestly. > > > > > > > > > > So if you could point to where/how this is done, I would certainly > > > > > reconsider that. > > > > > > > > Right now only the dp mst code uses hotplug/unplug (like you've observed > > > > in your testing with i915, usb-c docks are generally all dp mst). For code > > > > reading it might be best to start with the i915 dp mst code and then go > > > > through the helpers. > > > > > > > > > > - No notifiers please. The create a locking mess with inversions, and > > > > > > especially for hotunplug they create the illusion that you can > > > > > > synchronously keep up to date with hardware state. That's not possible. > > > > > > Fundamentally all bridge drivers which might be hotunplugged need to be > > > > > > able to cope with the hardware disappearing any momemnt. > > > > > > > > > > Can you clarify this point? I'm sorry I fail to understand the > > > > > relationship between the use of notifiers and the ability of bridge > > > > > drivers to cope with hardware disappearing. > > > > > > > > > > In this patch, the hotplug-bridge uses notifiers to be informed when > > > > > the following bridge is disappearing: which other way would you suggest > > > > > to make the hotplug-bridge aware of that? OTOH the hotplug-bridge is > > > > > not meant to disappear, and it has no actual hardware that can go away, > > > > > being just a mechanical connector. > > > > > > > > Yeah you need code to handle that. My point is that using a notifier is > > > > the wrong design, because the notifier has it's own locking. Which tends > > > > to get in the way. > > > > > > I went into this subject to see how drm_bridge_add() could inform the > > > interested bridges using a DRM-specific mechanism instead of standard > > > notifiers. > > > > > > However I think to inform *the interested* bridges, at least in the > > > device tree case, there is a fundamental issue: DRM core has no idea > > > about the topology. Definitely not at drm_bridge_add() time, way before > > > drm_bridge_attach() where the 'previous' pointer introduces a minimum > > > of topology awareness in the DRM core. > > > > Yeah we need special functions, which the last fixed bridge needs to call > > instead of the current set of functions. So instead of of_drm_find_bridge > > we need something like of_drm_register_hotplug_source_bridge or similar. > > Continuing on the above topic, are you suggesting that there should be > no hotplug-bridge, and that every bridge that can be used as the "last > fixed bridge" should handle the hotplug case individually? > > If that is the case, we'd need to modify any bridge driver that people > want to use as "last fixed bridge" to: > > 1. know they are the "last fixed bridge" (via device tree?) > 2. use of_drm_register_hotplug_source_bridge() > instead of of_drm_find_bridge() accordingly This depends upon the dt design. If the dt design has a separate distinct hotplug node, then we could bind a hotplug bridge against that, which is aware. But I think for some case (maybe dsi nodes) the dt design would be more an attribute somewhere plus a link to the first hotplugged element. And in that case the last physical bridge driver needs to be aware of that - we simply don't have any dt node we can bind the hotplug bridge driver against. I think the generic bridge hotplug design should not make an assumption about how the dt is designed here and allow both. Also if the dt design for the approach without a separate hotplug connector is standardized, we can have a of_drm_handle_next_bridge function which does the right thing for both cases automatically. I think that way the impact on existing bridge drivers is minimal. > > > Option 1 is to iterate over all the ports and endpoints and for every > > > remote-endpoint pointing to a bridge, inform the remote bridge about > > > that via a new optional callback in drm_bridge_funcs. That means likely > > > informing more bridges than needed, so when they get informed the > > > bridges will still have to check whether they are interested or not. > > > > > > Pseudocode: > > > > > > void drm_bridge_add(struct drm_bridge *new_bridge) > > > { > > > // existing code unchanged > > > > > > + if (bridge->of_node) { > > > + for_each_remote_endpoint(bridge->of_node) { > > > + br = of_drm_find_bridge(remote_endpoint); > > > + if (br && br->funcs.bridge_event_notify) > > > + br->funcs->bridge_event_notify(br, DRM_EVENT_BRIDGE_ADD, > > > + new_bridge); > > > + } > > > + } > > > } > > > > > > That means informing both upstream and downstream bridges, which could > > > even be useful, but anyway there is no reliable way to pick only the > > > sink or source ports. It also implies having lots of of_*() calls which > > > iterate over the tree, which is not optimal, but it happens only at > > > add/removal so it's fine I guess. > > > > > > Option 2 is be to just inform all the bridges (using the global > > > bridge_list). Pros and cons: > > > > > > * less of_*() calls to crawl around the remote-endpoints > > > * simpler code > > > * more bridges than needed would be informed, could be be an issue > > > if many implement the .bridge_event_notify() > > > * notifies even in the non-OF case too, not sure it's useful > > > > > > What are your thoughts about these two options? > > > > I'm not sure why you need to inform bridges? Currently for fixed bridges > > we retry with EPROBE_DEFER if the subsequent bridge isn't there yet, which > > means we only add the bridge when the entire chain exists. I think > > hotplugged bridge chains should work the same, which means you only ever > > need to watch the first element shows up. Then you create the connector > > for the entire chain and through special callbacks inform the bridge that > > called of_drm_register_hotplug_source_bridge that it now has a connector. > > > > All the other bridges can be ignored I think, even when they're part of > > the hotplug chain. > > > > I just realized that means we need at least a refcount on hotpluggable > > bridges, because otherwise drm_bridge_remove will result in use-after-free > > issues. So should probably rename those from drm_bridge_add/remove to > > drm_bridge_register/unregister. > > > > For fixed bridges no one else should ever increment the refcount, so > > should work exactly the same. > > > > > > Instead I think that code should be directly in core bridge code (I don't > > > > see the benefit of having this entirely in a separate driver), using drm > > > > locking to make sure there's no races. > > > > > > Not sure I got what you mean here. Which one of the following? > > > > > > 1. The entire hotplug-bridge driver should not exist, and the DRM > > > core should handle it all (seems not doable, this driver has lots of > > > device-specific details) > > > 2. The hotplug-driver should exist, but the code to attach/detach the > > > pipeline on hotplug/unplug should be moved to the core, with the > > > hotplug-driver providing callbacks for the device-specific aspects > > > 3. Same as 2, but additionally the hotplug-bridge should become a > > > connector driver (hotplug-connector.c?) -- not sure it can decouple > > > the two sides without a bridge however > > > 4. None of the above > > > > 3, roughly. The connector creation must be in core bridge code, or things > > will go boom. > > Based on this I think you mean: > > 1. the hotplug-*something* driver should exist This part I'm not sure is required, see my comments above. I think it depends upon how the dt design ultimately will look like, and I don't have an input on that. > 2. it should add the fixed connector (DSI in my case) on probe > 3. it should add/remove the removable connector (LVDS) on hot(un)plug The new bridge design is that bridges do _not_ create connectors themselves. Instead the driver does that, using the bridge code as helpers to make sure things work correctly. But aside from that I think this sounds good. I'm not sure you need the connector from step 2, but it shouldn't hurt either. With dp mst we create that connector because dp can also be driven directly without mst, so there we need that connector to be able to do modesets from userspace. > 4. it should add _no_ bridge (in the sense of struct drm_bridge) > [not sure it can still decouple the fixed VS addon pipeline parts] Yeah that's the tricky part, but definitely those hotplugged bridges should not be part of the currently existing bridge chain, because that cannot cope with hotplugs. Cheers, Sima > > Could you please clearly state your position about the above points? > > Best regards, > Luca > > > If you also want to keep the current lifetime model for bridge (without a > > refcount, meaning drm_bridge_remove removes the bridge from all use right > > away) there's also additional locking and tricks you need in in that > > bridge connector code, including the entire bridge chain. Meaning those > > hotplugged bridges must not be visible at all to the wider driver. > > > > Or you refcount bridges and make them like driver model struct kobj (or > > kobj directly), but that's a pile of work too. That would be the other > > design approach, instead of only relying on drm_connector. We might need > > both, I haven't dug into the details enough to be sure here. > > > > Cheers, Sima > > > > > > > > > On the opposite side, the following bridges are physically removable > > > > > and so their drivers will have to be fixed (if needed) to work when > > > > > removal happens, but I don't see how that relates to the DRM core > > > > > emitting a notification of such bridges being removed. > > > > > > > > Yeah it's not directly related, just my experience that people assume > > > > notifiers provide you more synchronization and race-preventation than they > > > > really do. So it's better to hand-roll to make it all really explicit. > > > > > > > > > > - Related to this: You're not allowed to shut down hardware behind the > > > > > > user's back with drm_atomic_helper_shutdown. We've tried that approach > > > > > > with dp mst, it really pisses off userspace when a page_flip that it > > > > > > expected to work doesn't work. > > > > > > > > > > > > - There's also the design aspect that in atomic, only atomic_check is > > > > > > allowed to fail, atomic_commit must succeed, even when the hardware is > > > > > > gone. Using connectors and their refcounting should help with that. > > > > > > > > > > IIUC here you are suggesting again to remove the connector instead of > > > > > marking it "disconnected". So, as above, pointers on how that is > > > > > achieved would be helpful. > > > > > > > > See dp mst code. It's complex unfortunately, so there's some reading > > > > involved :-/ > > > > > > > > > > > - Somewhat aside, but I noticed that the bridge->atomic_reset is in > > > > > > drm_bridge_attach, and that's kinda the wrong place. It should be in > > > > > > drm_mode_config_reset, like all the other ->atomic_reset hooks. That > > > > > > would make it a lot clearer that we need to figure out who/when > > > > > > ->atomic_reset should be called for hotplugged bridges, maybe as part of > > > > > > connector registration when the entire bridge and it's new connector is > > > > > > assembled? > > > > > > > > > > > > - Finally this very much means we need to rethink who/how the connector > > > > > > for a bridge is created. The new design is that the main driver creates > > > > > > this connector, once the entire bridge exists. But with hotplugging this > > > > > > gets a lot more complicated, so we might want to extract a pile of that > > > > > > encoder related code from drivers (same way dp mst helpers take care of > > > > > > connector creation too, it's just too much of a mess otherwise). > > > > > > > > > > > > The current bridge chaining infrastructure requires a lot of hand-rolled > > > > > > code in each bridge driver and the encoder, so that might be a good > > > > > > thing anyway. > > > > > > > > > > > > - Finally I think the entire bridge hotplug infrastructure should be > > > > > > irrespective of the underlying bus. Which means for the mipi dsi case we > > > > > > might also want to look into what's missing to make mipi dsi > > > > > > hotunpluggable, at least for the case where it's a proper driver. I > > > > > > think we should ignore the old bridge model where driver's stitched it > > > > > > all toghether using the component framework, in my opinion that approach > > > > > > should be deprecated. > > > > > > > > > > The DSI side was one of my headaches on writing this driver, and I > > > > > must say I dislike the code in hotplug_bridge_dsi_attach(), with the > > > > > need for an initial "dummy" attach and detach the first time. At first > > > > > sight I think we need a .update_format callback in struct > > > > > mipi_dsi_host_ops so the DSI host is aware of a format change. > > > > > > > > > > Right now there are DSI host drivers which keep a copy of the struct > > > > > mipi_dsi_device pointer and read the format from there when starting a > > > > > stream (e.g. the tc358768.c driver [1]). That somewhat provides a way > > > > > to react to format changes, but keeping a pointer is bad when the DSI > > > > > device hot-unplugs, and the new format won't be seen until a > > > > > disable/enable cycle. So a new callback looks more robust overall. > > > > > > > > > > Any opinion about this? > > > > > > > > Yeah I don't like the code either. > > > > > > > > What might help for prototyping is if you start with a hotpluggeable > > > > bridge where the bridge is a i2c device. That way I think we should be > > > > able to benefit from the driver bind/unbind code that exists already. > > > > Although there's going to be issues with i2c hotunplug too in i2c core > > > > code. > > > > > > > > Then lift whatever we learn there to our dsi code. But essentially I think > > > > we should model the driver core model a lot more, so I guess you could use > > > > any hotunplug capable bus as a template. Usb might be closest, since > > > > that's also a packet/message based interface, mostly at least? > > > > > > > > > > - Finally I think we should have a lot of safety checks, like only bridges > > > > > > which declare themselve to be hotunplug safe should be allowed as a part > > > > > > of the hotpluggable bridge chain part. All others must still be attached > > > > > > before the entire driver is registered with drm_dev_register. > > > > > > > > > > I'm fine with the principle of a "HOTPLUG" flag. > > > > > > > > > > I wonder how that should be implemented, though. Based on my (surely > > > > > simplistic) understanding, right now bridges can be both added and > > > > > removed, but only in a specific sequence: > > > > > > > > > > 1. add all bridges > > > > > 2. use the card > > > > > 3. remove all bridges > > > > > 4. EOF > > > > > > > > > > We'd need to change to allow: > > > > > > > > > > 1. add fixed bridges (including hotplug-bridge) > > > > > 2. add bridges on removable add-on > > > > > 3. use card > > > > > 4. remove bridges on removable add-on > > > > > 5. optionally goto 2 > > > > > 6. remove fixed bridges (including hotplug-bridge) > > > > > 7. EOF > > > > > > > > > > One naïve idea is that the DRM core keeps a flag whenever any fixed > > > > > bridge (no HOTLPUG flag) is removed and when that happens forbid adding > > > > > bridges as we are now at line 5. > > > > > > > > If a fixed bridge is removed while the driver is still in use (i.e. before > > > > drm_dev_unregister is called), that's a driver bug. Would be good to catch > > > > that, which is why I think a lot of all the bridge hotplug handling should > > > > be in core bridge code and not the separate hotplug driver, so that we can > > > > enforce all these constraints. > > > > > > > > Also conceptually 3 can happen before 2 (but also before), and that's how > > > > dp mst works too. It does add all kinds of complications though ... > > > > > > > > > Aside for tons of subtleties I'm surely missing, does this look a > > > > > proper approach? I'd not be surprised if there is something better and > > > > > more solid. > > > > > > > > Yeah roughly. If you look through dp mst code then that also adds all > > > > kinds of structures (since dp mst is a routed network really), not just > > > > the connectors. They also all come with refcounts (because the network is > > > > a tree), but like I said I think for your case we can avoid the per-bridge > > > > refcounts by relying on the connector refcount we have already. > > > > > > > > But I might be overlooking something, and we need refcounting for each > > > > bridge like dp mst also needs refcounting for all its internal structures > > > > that map the entire hotpluggable display chain. If you want to read some > > > > dp mst code, these internal structures are ports/branches with struct > > > > drm_dp_mst_branch/port. > > > > > > > > > > Or that we only allow bridges with the NO_CONNECTOR flag for > > > > > > drm_bridge_attach. > > > > > > > > > > > > There's probably a pile more fundamental issues I've missed, but this > > > > > > should get a good discussion started. > > > > > > > > > > Sure, I think it has. > > > > > > > > > > Bottom line, I'm clearly not seeing some issues that need to be > > > > > considered, and that do not show up for my use case. Based on my > > > > > limited DRM knowledge, this was expected, and I'm glad to work on those > > > > > issues with some practical indications about the path forward. > > > > > > > > Yeah for learning I think dp mst is best. It's a bit complex, but since > > > > you have a dock you should be able to play around and experiment with the > > > > code with some real hardware. > > > > > > > > That should help to get a bit a feel for the complexity, since lots of > > > > opportunities for you to ask why/how locking/refcounting is used and > > > > against which potential issue they protect. > > > > > > > > Cheers, Sima > > > > > > Luca > > > > > > -- > > > Luca Ceresoli, Bootlin > > > Embedded Linux and Kernel engineering > > > https://bootlin.com > > > > > > -- > Luca Ceresoli, Bootlin > Embedded Linux and Kernel engineering > https://bootlin.com
Hello Simona, [+Cc: Dmitry Baryshkov who took part to the LPC discussion] On Tue, 24 Sep 2024 10:26:37 +0200 Simona Vetter <simona.vetter@ffwll.ch> wrote: > On Mon, Sep 09, 2024 at 03:26:04PM +0200, Luca Ceresoli wrote: ... > > There is a fundamental question where your position is not clear to me. > > Based on this: > > > > > - The last fixed bridges knows that all subsequent bridges are hotplugged. > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > > Which means instead of just asking for the next bridge, it needs to ask > > > for the fully formed bridge chain, including the connector. > > > > > > > and this: > > > > > - The hotplug bridge connector code checks every time a new bridge shows > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > > up whether the chain is now complete. Once that is the case, it creates > > > the connector (with the new bridge design this is no longer the job of > > > the last bridge in the chain, so we need to require that for > > > hotpluggable bridges). Then it can attach all the bridges to that > > > connector, and hand the entire fully formed chain to the last fixed > > > bridge to hotplug insert and register. > > > > The question is: do you think the hotplug-bridge driver should be > > present, or not? To me the two above sentences appear to contradict each > > other. > > > > The reason we decided to implement a hotplug DRM bridge is it allows to > > decouple the fixed and the remote segments of the pipeline, in such a > > way that all the regular bridge drivers don't need any special handling > > to support hotplug. > > > > In my work the upstream bridge driver is samsung-dsim.c and the > > downstream one is ti-sn65dsi83.c and none of them needed a single line > > changed to support hotplug. I think this is useful: virtually any > > physical bridge with pins can be used in a hotplug setup, either in the > > fixed or in the removable section, so not needing to modify them is > > valuable. > > > > OTOH in various parts of this and other e-mails you seem to imply that > > there should be no hotplug-bridge (not as a struct drm_bridge, not as > > a driver, or both). Except for the fact that there is no chip > > implementing such a bridge (there is a physical connector though) I do > > not see many reasons. > > Yeah you can have a dummy hotplug-bridge driver which just represents the > hotplug connector, I don't see an issue with that. And sounds like a > reasonable idea if it helps modelling with DT and all that. > > What I described above was just focused on the interaction between bridge > drivers and the hotplug support code. I think you absolutely need the last > bridge driver to be aware that the entire subsequent chain is hotplugged, > otherwise it wont work. That last bridge driver can be a special > hotplug driver, but it doesn't need to be the case. I see, that clarifies your position, thanks. ... > > > Yeah we need special functions, which the last fixed bridge needs to call > > > instead of the current set of functions. So instead of of_drm_find_bridge > > > we need something like of_drm_register_hotplug_source_bridge or similar. > > > > Continuing on the above topic, are you suggesting that there should be > > no hotplug-bridge, and that every bridge that can be used as the "last > > fixed bridge" should handle the hotplug case individually? > > > > If that is the case, we'd need to modify any bridge driver that people > > want to use as "last fixed bridge" to: > > > > 1. know they are the "last fixed bridge" (via device tree?) > > 2. use of_drm_register_hotplug_source_bridge() > > instead of of_drm_find_bridge() accordingly > > This depends upon the dt design. If the dt design has a separate distinct > hotplug node, then we could bind a hotplug bridge against that, which is > aware. > > But I think for some case (maybe dsi nodes) the dt design would be more an > attribute somewhere plus a link to the first hotplugged element. And in > that case the last physical bridge driver needs to be aware of that - we > simply don't have any dt node we can bind the hotplug bridge driver > against. I think the generic bridge hotplug design should not make an > assumption about how the dt is designed here and allow both. > > Also if the dt design for the approach without a separate hotplug > connector is standardized, we can have a of_drm_handle_next_bridge > function which does the right thing for both cases automatically. I think > that way the impact on existing bridge drivers is minimal. Pushing this even more, instead of having bridges aware of being the last fixed ones, I've been pondering on a structure where every bridge assumes the next one could disappear, and works appropriately. So the current case (all bridges are fixed) would be just a special case where the next bridge is found initially and never disappears. This would probably be cleaner, no [if (hotplug) {this} else {that}] constructs, but I'm concerned about the transition path for old poorly-maintained drivers. ... > > > > > Instead I think that code should be directly in core bridge code (I don't > > > > > see the benefit of having this entirely in a separate driver), using drm > > > > > locking to make sure there's no races. > > > > > > > > Not sure I got what you mean here. Which one of the following? > > > > > > > > 1. The entire hotplug-bridge driver should not exist, and the DRM > > > > core should handle it all (seems not doable, this driver has lots of > > > > device-specific details) > > > > 2. The hotplug-driver should exist, but the code to attach/detach the > > > > pipeline on hotplug/unplug should be moved to the core, with the > > > > hotplug-driver providing callbacks for the device-specific aspects > > > > 3. Same as 2, but additionally the hotplug-bridge should become a > > > > connector driver (hotplug-connector.c?) -- not sure it can decouple > > > > the two sides without a bridge however > > > > 4. None of the above > > > > > > 3, roughly. The connector creation must be in core bridge code, or things > > > will go boom. > > > > Based on this I think you mean: > > > > 1. the hotplug-*something* driver should exist > > This part I'm not sure is required, see my comments above. I think it > depends upon how the dt design ultimately will look like, and I don't have > an input on that. > > > 2. it should add the fixed connector (DSI in my case) on probe > > 3. it should add/remove the removable connector (LVDS) on hot(un)plug > > The new bridge design is that bridges do _not_ create connectors > themselves. Instead the driver does that, using the bridge code as helpers > to make sure things work correctly. > > But aside from that I think this sounds good. I'm not sure you need the > connector from step 2, but it shouldn't hurt either. With dp mst we create > that connector because dp can also be driven directly without mst, so > there we need that connector to be able to do modesets from userspace. I had a sort of assumption that the fixed connector is needed to even populate the card, not sure I was correct. Surely from a high-level API it would make sense to remove it. I'll postpone this aspect to a later moment, and by that time questions about the hotplug-bridge will have been clarified. Right now I'm going to tackle the drm_bridge refcounting. > > 4. it should add _no_ bridge (in the sense of struct drm_bridge) > > [not sure it can still decouple the fixed VS addon pipeline parts] > > Yeah that's the tricky part, but definitely those hotplugged bridges > should not be part of the currently existing bridge chain, because that > cannot cope with hotplugs. Not sure what you mean here, and what you mean by "the currently existing bridge chain". Do you mean hot-plugged bridges, when present, should not be in the global bridge_list? That would make sense, sure. Best regards, Luca
Hello, this series aims at supporting a Linux device with a connector to physically add and remove an add-on to/from the main device to augment its features at runtime, using device tree overlays. This is the v2 of "drm: add support for hot-pluggable bridges" [0] which was however more limited in scope, covering only the DRM aspects. This new series also takes a different approach to the DRM bridge instantiation. [0] https://lore.kernel.org/all/20240326-hotplug-drm-bridge-v1-0-4b51b5eb75d5@bootlin.com/ Use case ======== This series targets a professional product (GE SUNH) that is composed of a "main" part running on battery, with the main SoC and able to work autonomously with limited features, and an optional "add-on" that enables more features by adding more hardware peripherals, some of which are on non-discoverable busses such as I2C and MIPI DSI. The add-on can be connected and disconnected at runtime at any moment by the end user, and add-on features need to be enabled and disabled automatically at runtime. The add-on has status pins that are connected to GPIOs on the main board, allowing the CPU to detect add-on insertion and removal. It also has a reset GPIO allowign to reset all peripherals on the add-on at once. The features provided by the add-on include a display and a battery charger to recharge the battery of the main part. The display on the add-on has an LVDS input but the connector between the base and the add-on has a MIPI DSI bus, so a DSI-to-LVDS bridge is present on the add-on. Different add-on models can be connected to the main part, and for this a model ID is stored in the add-on itself so the software running on the CPU on the main part knows which non-discoverable hardware to probe. Overall approach ================ Device tree overlays appear as the most natural solution to support the addition and removal of devices from a running system. Several features are missing from the mainline Linux kernel in order to support this use case: 1. runtime (un)loading of device tree overlays is not supported 2. if enabled, overlay (un)loading exposes several bugs 3. the DRM subsystem assumes video bridges are non-removable This series targets items 1 and 3. Item 2 is being handled separately (see "Device tree overlay issues" below). Device tree representation and connector driver =============================================== The device tree description we propose involves 3 main parts. 1: the main (fixed) device tree The main device tree describes the connector itself along with the status and reset GPIOs. It also provides a 'ports' node to describe how the two sides of the MIPI DSI bus connect to each other. So now, differently from v1, there is no standalone representation of the DRM bridge because it is not really a hardware component. Here is how the connector is represented in the fixed part of the device tree: / { #include <dt-bindings/gpio/gpio.h> addon_connector: addon-connector { compatible = "ge,sunh-addon-connector"; reset-gpios = <&gpio1 1 GPIO_ACTIVE_LOW>; plugged-gpios = <&gpio1 2 GPIO_ACTIVE_LOW>; powergood-gpios = <&gpio1 3 GPIO_ACTIVE_HIGH>; ports { #address-cells = <1>; #size-cells = <0>; port@0 { reg = <0>; hotplug_conn_dsi_in: endpoint { remote-endpoint = <&previous_bridge_out>; }; }; port@1 { reg = <1>; hotplug_conn_dsi_out: endpoint { // remote-endpoint to be added by overlay }; }; }; }; }; The connector has a specific compatible string, and this series adds a driver supporting it. This driver uses the deivce tree overlay loading and unloading facilities already implemented by the kernel but not currently exposed. The driver detects the connection status from the GPIOs and reacts to a connection event by loading a first overlay (the "base" overlay). 2: the "base" overlay The "base" overlay describes the common components that are required to read the model ID. These are identical for all add-on models, thus only one "base" overlay is needed: &i2c1 { #address-cells = <1>; #size-cells = <0>; eeprom@50 { compatible = "atmel,24c64"; reg = <0x50>; nvmem-layout { compatible = "fixed-layout"; #address-cells = <1>; #size-cells = <1>; addon_model_id: addon-model-id@400 { reg = <0x400 0x1>; }; }; }; }; &addon_connector { nvmem-cells = <&addon_model_id>; nvmem-cell-names = "id"; }; Here "i2c1" is an I2C bus whose adapter is on the main part (possibly with some clients) but whose lines extend through the connector to the add-on where the EEPROM is (possibly with more clients). The EEPROM holds the model ID of each add-on, using always the same I2C address and memory offset. The '&addon_connector' section provides the "glue" to describe how the NVMEM cell can be accessed via the connector. This allows the connector driver to read the model ID. 3: the "add-on-specific" overlay Based on the model ID, the connector driver loads the second overlay, which describes all the add-on hardware not yet described by the base overlay. This overlay is model-specific. &hotplug_conn_dsi_out { remote-endpoint = <&addon_bridge_in>; }; &i2c1 { #address-cells = <1>; #size-cells = <0>; dsi-lvds-bridge@42 { compatible = "some-video-bridge"; reg = <0x42>; ports { #address-cells = <1>; #size-cells = <0>; port@0 { reg = <0>; addon_bridge_in: endpoint { remote-endpoint = <&hotplug_conn_dsi_out>; data-lanes = <1 2 3 4>; }; }; port@1 { reg = <1>; addon_bridge_out: endpoint { remote-endpoint = <&panel_in>; }; }; }; }; }; &{/} { panel { ports { #address-cells = <1>; #size-cells = <0>; port@0{ reg = <0>; panel_in: endpoint { remote-endpoint = <&addon_bridge_out>; }; }; }; }; The '&hotplug_conn_dsi_out' section fills the port@1 section that was initially missing, in order to point to the continuation of the MIPI DSI line. The rest of this overlay just completes the video pipeline. In the '&i2c1' section, another I2C client is added to a bus, just as done for the EEPROM. After these steps, the add-on is fully described and working on the system. When the status GPIOs report a disconnection, the overlays are unloaded in reverse order. DRM hotplug bridge driver ========================= DRM natively supports pipelines whose display can be removed, but all the components preceding it (all the display controller and any bridges) are assumed to be fixed and cannot be plugged, removed or modified at runtime. This series adds support for DRM pipelines having a removable part after the encoder, thus also allowing bridges to be removed and reconnected at runtime, possibly with different components. This picture summarizes the DRM structure implemented by this series: .------------------------. | DISPLAY CONTROLLER | | .---------. .------. | | | ENCODER |<--| CRTC | | | '---------' '------' | '------|-----------------' | |DSI HOTPLUG V CONNECTOR .---------. .--. .-. .---------. .-------. | 0 to N | | _| _| | | 1 to N | | | | BRIDGES |--DSI-->||_ |_ |--DSI-->| BRIDGES |--LVDS-->| PANEL | | | | | | | | | | | '---------' '--' '-' '---------' '-------' [--- fixed components --] [----------- removable add-on -----------] Fixed components include: * all components up to the DRM encoder, usually part of the SoC * optionally some bridges, in the SoC and/or as external chips Components on the removable add-on include: * one or more bridges * a fixed connector (not one natively supporting hotplug such as HDMI) * the panel The video bus is MIPI DSI in the example and in the implementation provided by this series, but the implementation is meant to allow generalization to other video busses without native hotplug support, such as parallel video and LVDS. Note that the term "connector" in this context has nothing to do with the "DRM connector" abstraction already present in the DRM subsystem (struct drm_connector). In order to support the above use case while being reasonably generic and self-contained, the implementation is based on the introduction of the "hotplug-bridge". The hotplug-bridge implementation is as transparent as possible. In a nutshell, it decouples the preceding and the following components: the upstream components will believe that the pipeline is always there, whil the downstream components will undergo a normal probe/remove process on insertion/removal as if the entire DRM pipeline were being added/removed. This means all the other DRM components can be implemented normally, without having to even be aware of hot-plugging. The hotplug-bridge it is based on a few self-contained additions to drm_bridge and drm_encoder, which are provided as individual patches in this series, and does not require any modifications to other bridges. However the implementation has some tricky aspects that make it more complex than in an ideal design. See the patch adding the driver for the details. Outstanding bugs and issues =========================== Unsurprisingly, enabling device tree overlay loading/unloading at runtime is exposing a number of issues. While testing this work and another, unrelated work also using overlay insertion/removal [1] we ancountered several and we are working on solving them one by one. Here is a list of the issues for which we have sent some patches: 1. ERROR: remove_proc_entry: removing non-empty directory 'irq/233', leaking at least 'gpiomon' - https://lore.kernel.org/all/20240227113426.253232-1-herve.codina@bootlin.com/ 2. leds: gpio: Add devlink between the leds-gpio device and the gpio used - https://lore.kernel.org/all/20240220133950.138452-1-herve.codina@bootlin.com/ 3. kobject: 'gpiochip8' ((____ptrval____)): is not initialized, yet kobject_get() is being called. - https://lore.kernel.org/all/CAGETcx_YjRzA0joyESsgk=XJKBqqFD7YZeSwKu1a1deo-EyeKw@mail.gmail.com/ Overlay removal is also known for memory leaks of some properties (the "deadprops" list). This is being examined for a proper solution. An issue related to devlink appeared since commit 782bfd03c3ae ("of: property: Improve finding the supplier of a remote-endpoint property"), merged in v6.8-rc5, as reported in: https://lore.kernel.org/all/20240223171849.10f9901d@booty This is on my todo list as well. The current workaround is reverting 782bfd03c3ae. Finally, a known issue is related to NVMEM. The connector driver uses NVMEM notifiers to be notified of cell addition. This works reliably assuming the cell to be available when the notification fucntion is called, but this does not happen. Two alternative patches have been sent that would fix this: - https://lore.kernel.org/all/20240130160021.70ddef92@xps-13/ - https://lore.kernel.org/all/20231229-nvmem-cell-add-notification-v1-1-8d8b426be9f9@bootlin.com/ There was no activity recently about this. Continuing this work in on my todo list. [1] https://lore.kernel.org/all/20240430083730.134918-1-herve.codina@bootlin.com That's all ========== Thanks for you patience in reading this! Luca Changes in v2: - Added bindings and driver for ge,sunh-addon-connector - Removed bindings for the hotplug-video-connector, this is now represented in DT as part of the ge,sunh-addon-connector - Various monior improvements to the DRM hotplug-bridge driver - Link to v1: https://lore.kernel.org/r/20240326-hotplug-drm-bridge-v1-0-4b51b5eb75d5@bootlin.com Co-developed-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com> Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com> --- Luca Ceresoli (4): dt-bindings: connector: add GE SUNH hotplug addon connector drm/encoder: add drm_encoder_cleanup_from() drm/bridge: hotplug-bridge: add driver to support hot-pluggable DSI bridges misc: add ge-addon-connector driver Paul Kocialkowski (1): drm/bridge: add bridge notifier to be notified of bridge addition and removal .../connector/ge,sunh-addon-connector.yaml | 197 +++++++ MAINTAINERS | 11 + drivers/gpu/drm/bridge/Kconfig | 15 + drivers/gpu/drm/bridge/Makefile | 1 + drivers/gpu/drm/bridge/hotplug-bridge.c | 577 +++++++++++++++++++++ drivers/gpu/drm/drm_bridge.c | 35 ++ drivers/gpu/drm/drm_encoder.c | 21 + drivers/misc/Kconfig | 15 + drivers/misc/Makefile | 1 + drivers/misc/ge-sunh-connector.c | 464 +++++++++++++++++ include/drm/drm_bridge.h | 19 + include/drm/drm_encoder.h | 1 + 12 files changed, 1357 insertions(+) --- base-commit: 5e3810587f43a24d2c568b7e08fcff7ce05d71a9 change-id: 20240319-hotplug-drm-bridge-16b86e67fe92 Best regards,