Message ID | 8448e90a-4562-b564-c160-1b5c67e0f92f@axentia.se (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 2018-04-24 12:14, Peter Rosin wrote: > On 2018-04-24 10:08, Russell King - ARM Linux wrote: >> On Tue, Apr 24, 2018 at 08:58:42AM +0200, Peter Rosin wrote: >>> On 2018-04-23 18:08, Russell King - ARM Linux wrote: >>>> On Mon, Apr 23, 2018 at 09:23:00AM +0200, Peter Rosin wrote: >>>>> static int tda998x_remove(struct i2c_client *client) >>>>> { >>>>> - component_del(&client->dev, &tda998x_ops); >>>>> + struct device *dev = &client->dev; >>>>> + struct tda998x_bridge *bridge = dev_get_drvdata(dev); >>>>> + >>>>> + drm_bridge_remove(&bridge->bridge); >>>>> + component_del(dev, &tda998x_ops); >>>>> + >>>> >>>> I'd like to ask a rather fundamental question about DRM bridge support, >>>> because I suspect that there's a major fsckup here. >>>> >>>> The above is the function that deals with the TDA998x device being >>>> unbound from the driver. With the component API, this results in the >>>> DRM device correctly being torn down, because one of the hardware >>>> devices has gone. >>>> >>>> With DRM bridge, the bridge is merely removed from the list of >>>> bridges: >>>> >>>> void drm_bridge_remove(struct drm_bridge *bridge) >>>> { >>>> mutex_lock(&bridge_lock); >>>> list_del_init(&bridge->list); >>>> mutex_unlock(&bridge_lock); >>>> } >>>> EXPORT_SYMBOL(drm_bridge_remove); >>>> >>>> and the memory backing the "struct tda998x_bridge" (which contains >>>> the struct drm_bridge) will be freed by the devm subsystem. >>>> >>>> However, there is no notification into the rest of the DRM subsystem >>>> that the device has gone away. Worse, the memory that is still in >>>> use by DRM has now been freed, so further use of the DRM device >>>> results in a use-after-free bug. >>>> >>>> This is really not good, and to me looks like a fundamental problem >>>> with the DRM bridge code. I see nothing in the DRM bridge code that >>>> deals with the lifetime of a "DRM bridge" or indeed the lifetime of >>>> the actual device itself. >>>> >>>> So, from what I can see, there seems to be a fundamental lifetime >>>> issue with the design of the DRM bridge code. This needs to be >>>> fixed. >>> >>> Oh crap. A gigantic can of worms... >> >> Yes, it's especially annoying for me, having put the effort in to >> the component helper to cover all these cases. >> >>> Would a patch (completely untested btw) along this line of thinking make >>> any difference whatsoever? >> >> It looks interesting - from what I can see of the device links code, >> it would have the effect of unbinding the DRM device just before >> TDA998x is unbound, so that's an improvement. >> >> However, from what I can see, the link vanishes at that point (as >> DL_FLAG_AUTOREMOVE is set), and re-binding the TDA998x device results >> in nothing further happening - the link will be recreated, but there >> appears to be nothing that triggers the "consumer" to rebind at that >> point. Maybe I've missed something? > > Right, auto-remove is a no-go. So, improving on the previous... Heh, I didn't address the rebind triggering part at all, and while I'm by no means responsible or have any deep knowledge, I thought this was true: - driver .remove for the device owning the drm_bridge is what typically calls drm_bridge_remove() - driver .remove is called as part of the device being unbound from the driver by the bus (i2c in this case) - by registering a link to the consumer, this unbinding will trigger the removal of this main drm consumer device as part of the unbinding - so everything aboput the drm device will be torn down, and everything will thus have to be reprobed to get things back But I could easily have misunderstood just about everything in the above... And maybe it's really inconvenient to have to trigger a reprobe of the whole drm cluster? Maybe all drm driver parts should be components? I have no idea. Cheers, Peter PS. compile-tested the below and drm_bridge.c needs to #include <drm/drm_device.h> > (I think drm_panel might suffer from this issue too?) > > Cheers, > Peter > > diff --git a/drivers/gpu/drm/drm_bridge.c b/drivers/gpu/drm/drm_bridge.c > index 1638bfe9627c..b1365cfee445 100644 > --- a/drivers/gpu/drm/drm_bridge.c > +++ b/drivers/gpu/drm/drm_bridge.c > @@ -121,12 +121,17 @@ int drm_bridge_attach(struct drm_encoder *encoder, struct drm_bridge *bridge, > if (bridge->dev) > return -EBUSY; > > + bridge->link = device_link_add(encoder->dev->dev, bridge->owner, 0); > + if (!bridge->link) > + return -EINVAL; > + > bridge->dev = encoder->dev; > bridge->encoder = encoder; > > if (bridge->funcs->attach) { > ret = bridge->funcs->attach(bridge); > if (ret < 0) { > + device_link_del(bridge->link); > bridge->dev = NULL; > bridge->encoder = NULL; > return ret; > @@ -153,6 +158,8 @@ void drm_bridge_detach(struct drm_bridge *bridge) > if (bridge->funcs->detach) > bridge->funcs->detach(bridge); > > + device_link_del(bridge->link); > + > bridge->dev = NULL; > } > > diff --git a/drivers/gpu/drm/i2c/tda998x_drv.c b/drivers/gpu/drm/i2c/tda998x_drv.c > index b8cb6237a38b..29eba4e9a39d 100644 > --- a/drivers/gpu/drm/i2c/tda998x_drv.c > +++ b/drivers/gpu/drm/i2c/tda998x_drv.c > @@ -1857,6 +1857,7 @@ tda998x_probe(struct i2c_client *client, const struct i2c_device_id *id) > bridge->dev = dev; > dev_set_drvdata(dev, bridge); > > + bridge->bridge.owner = dev; > bridge->bridge.funcs = &tda998x_bridge_funcs; > #ifdef CONFIG_OF > bridge->bridge.of_node = dev->of_node; > diff --git a/include/drm/drm_bridge.h b/include/drm/drm_bridge.h > index 682d01ba920c..b8f33aba3216 100644 > --- a/include/drm/drm_bridge.h > +++ b/include/drm/drm_bridge.h > @@ -224,6 +224,8 @@ struct drm_bridge_funcs { > > /** > * struct drm_bridge - central DRM bridge control structure > + * @owner: device that owns the bridge > + * @link: drm consumer <-> bridge supplier > * @dev: DRM device this bridge belongs to > * @encoder: encoder to which this bridge is connected > * @next: the next bridge in the encoder chain > @@ -233,6 +235,8 @@ struct drm_bridge_funcs { > * @driver_private: pointer to the bridge driver's internal context > */ > struct drm_bridge { > + struct device *owner; > + struct device_link *link; > struct drm_device *dev; > struct drm_encoder *encoder; > struct drm_bridge *next; > >
On 24/04/18 13:14, Peter Rosin wrote: > On 2018-04-24 10:08, Russell King - ARM Linux wrote: >> On Tue, Apr 24, 2018 at 08:58:42AM +0200, Peter Rosin wrote: >>> On 2018-04-23 18:08, Russell King - ARM Linux wrote: >>>> On Mon, Apr 23, 2018 at 09:23:00AM +0200, Peter Rosin wrote: >>>>> static int tda998x_remove(struct i2c_client *client) >>>>> { >>>>> - component_del(&client->dev, &tda998x_ops); >>>>> + struct device *dev = &client->dev; >>>>> + struct tda998x_bridge *bridge = dev_get_drvdata(dev); >>>>> + >>>>> + drm_bridge_remove(&bridge->bridge); >>>>> + component_del(dev, &tda998x_ops); >>>>> + >>>> >>>> I'd like to ask a rather fundamental question about DRM bridge support, >>>> because I suspect that there's a major fsckup here. >>>> >>>> The above is the function that deals with the TDA998x device being >>>> unbound from the driver. With the component API, this results in the >>>> DRM device correctly being torn down, because one of the hardware >>>> devices has gone. >>>> >>>> With DRM bridge, the bridge is merely removed from the list of >>>> bridges: >>>> >>>> void drm_bridge_remove(struct drm_bridge *bridge) >>>> { >>>> mutex_lock(&bridge_lock); >>>> list_del_init(&bridge->list); >>>> mutex_unlock(&bridge_lock); >>>> } >>>> EXPORT_SYMBOL(drm_bridge_remove); >>>> >>>> and the memory backing the "struct tda998x_bridge" (which contains >>>> the struct drm_bridge) will be freed by the devm subsystem. >>>> >>>> However, there is no notification into the rest of the DRM subsystem >>>> that the device has gone away. Worse, the memory that is still in >>>> use by DRM has now been freed, so further use of the DRM device >>>> results in a use-after-free bug. >>>> >>>> This is really not good, and to me looks like a fundamental problem >>>> with the DRM bridge code. I see nothing in the DRM bridge code that >>>> deals with the lifetime of a "DRM bridge" or indeed the lifetime of >>>> the actual device itself. >>>> >>>> So, from what I can see, there seems to be a fundamental lifetime >>>> issue with the design of the DRM bridge code. This needs to be >>>> fixed. >>> >>> Oh crap. A gigantic can of worms... >> >> Yes, it's especially annoying for me, having put the effort in to >> the component helper to cover all these cases. >> >>> Would a patch (completely untested btw) along this line of thinking make >>> any difference whatsoever? >> >> It looks interesting - from what I can see of the device links code, >> it would have the effect of unbinding the DRM device just before >> TDA998x is unbound, so that's an improvement. >> >> However, from what I can see, the link vanishes at that point (as >> DL_FLAG_AUTOREMOVE is set), and re-binding the TDA998x device results >> in nothing further happening - the link will be recreated, but there >> appears to be nothing that triggers the "consumer" to rebind at that >> point. Maybe I've missed something? > > Right, auto-remove is a no-go. So, improving on the previous... > > (I think drm_panel might suffer from this issue too?) Yes it does and I took a shot at trying to fix it at the end of the previous merge window, but gave up as I run out of time. I re-spun the work now after reading this thread. I add you and Russell to cc. As far as I understand the re-binding problem can be solved with this patch: https://lists.freedesktop.org/archives/dri-devel/2018-February/166907.html The device_links are such a new concept that there is at least hope this change won't have too unwanted side effects. > > Cheers, > Peter > > diff --git a/drivers/gpu/drm/drm_bridge.c b/drivers/gpu/drm/drm_bridge.c > index 1638bfe9627c..b1365cfee445 100644 > --- a/drivers/gpu/drm/drm_bridge.c > +++ b/drivers/gpu/drm/drm_bridge.c > @@ -121,12 +121,17 @@ int drm_bridge_attach(struct drm_encoder *encoder, struct drm_bridge *bridge, > if (bridge->dev) > return -EBUSY; > > + bridge->link = device_link_add(encoder->dev->dev, bridge->owner, 0); > + if (!bridge->link) > + return -EINVAL; > + At least this piece code should prepare for the bridge owner and the master drm device being the same, I which case the link should not be needed. This happens at least when a panel is attached using drm_panel_bridge_add(). Best reagards, Jyri > bridge->dev = encoder->dev; > bridge->encoder = encoder; > > if (bridge->funcs->attach) { > ret = bridge->funcs->attach(bridge); > if (ret < 0) { > + device_link_del(bridge->link); > bridge->dev = NULL; > bridge->encoder = NULL; > return ret; > @@ -153,6 +158,8 @@ void drm_bridge_detach(struct drm_bridge *bridge) > if (bridge->funcs->detach) > bridge->funcs->detach(bridge); > > + device_link_del(bridge->link); > + > bridge->dev = NULL; > } > > diff --git a/drivers/gpu/drm/i2c/tda998x_drv.c b/drivers/gpu/drm/i2c/tda998x_drv.c > index b8cb6237a38b..29eba4e9a39d 100644 > --- a/drivers/gpu/drm/i2c/tda998x_drv.c > +++ b/drivers/gpu/drm/i2c/tda998x_drv.c > @@ -1857,6 +1857,7 @@ tda998x_probe(struct i2c_client *client, const struct i2c_device_id *id) > bridge->dev = dev; > dev_set_drvdata(dev, bridge); > > + bridge->bridge.owner = dev; > bridge->bridge.funcs = &tda998x_bridge_funcs; > #ifdef CONFIG_OF > bridge->bridge.of_node = dev->of_node; > diff --git a/include/drm/drm_bridge.h b/include/drm/drm_bridge.h > index 682d01ba920c..b8f33aba3216 100644 > --- a/include/drm/drm_bridge.h > +++ b/include/drm/drm_bridge.h > @@ -224,6 +224,8 @@ struct drm_bridge_funcs { > > /** > * struct drm_bridge - central DRM bridge control structure > + * @owner: device that owns the bridge > + * @link: drm consumer <-> bridge supplier > * @dev: DRM device this bridge belongs to > * @encoder: encoder to which this bridge is connected > * @next: the next bridge in the encoder chain > @@ -233,6 +235,8 @@ struct drm_bridge_funcs { > * @driver_private: pointer to the bridge driver's internal context > */ > struct drm_bridge { > + struct device *owner; > + struct device_link *link; > struct drm_device *dev; > struct drm_encoder *encoder; > struct drm_bridge *next; > >
On Tue, Apr 24, 2018 at 07:04:16PM +0300, Jyri Sarha wrote: > On 24/04/18 13:14, Peter Rosin wrote: > > On 2018-04-24 10:08, Russell King - ARM Linux wrote: > >> On Tue, Apr 24, 2018 at 08:58:42AM +0200, Peter Rosin wrote: > >>> On 2018-04-23 18:08, Russell King - ARM Linux wrote: > >>>> On Mon, Apr 23, 2018 at 09:23:00AM +0200, Peter Rosin wrote: > >>>>> static int tda998x_remove(struct i2c_client *client) > >>>>> { > >>>>> - component_del(&client->dev, &tda998x_ops); > >>>>> + struct device *dev = &client->dev; > >>>>> + struct tda998x_bridge *bridge = dev_get_drvdata(dev); > >>>>> + > >>>>> + drm_bridge_remove(&bridge->bridge); > >>>>> + component_del(dev, &tda998x_ops); > >>>>> + > >>>> > >>>> I'd like to ask a rather fundamental question about DRM bridge support, > >>>> because I suspect that there's a major fsckup here. > >>>> > >>>> The above is the function that deals with the TDA998x device being > >>>> unbound from the driver. With the component API, this results in the > >>>> DRM device correctly being torn down, because one of the hardware > >>>> devices has gone. > >>>> > >>>> With DRM bridge, the bridge is merely removed from the list of > >>>> bridges: > >>>> > >>>> void drm_bridge_remove(struct drm_bridge *bridge) > >>>> { > >>>> mutex_lock(&bridge_lock); > >>>> list_del_init(&bridge->list); > >>>> mutex_unlock(&bridge_lock); > >>>> } > >>>> EXPORT_SYMBOL(drm_bridge_remove); > >>>> > >>>> and the memory backing the "struct tda998x_bridge" (which contains > >>>> the struct drm_bridge) will be freed by the devm subsystem. > >>>> > >>>> However, there is no notification into the rest of the DRM subsystem > >>>> that the device has gone away. Worse, the memory that is still in > >>>> use by DRM has now been freed, so further use of the DRM device > >>>> results in a use-after-free bug. > >>>> > >>>> This is really not good, and to me looks like a fundamental problem > >>>> with the DRM bridge code. I see nothing in the DRM bridge code that > >>>> deals with the lifetime of a "DRM bridge" or indeed the lifetime of > >>>> the actual device itself. > >>>> > >>>> So, from what I can see, there seems to be a fundamental lifetime > >>>> issue with the design of the DRM bridge code. This needs to be > >>>> fixed. > >>> > >>> Oh crap. A gigantic can of worms... > >> > >> Yes, it's especially annoying for me, having put the effort in to > >> the component helper to cover all these cases. > >> > >>> Would a patch (completely untested btw) along this line of thinking make > >>> any difference whatsoever? > >> > >> It looks interesting - from what I can see of the device links code, > >> it would have the effect of unbinding the DRM device just before > >> TDA998x is unbound, so that's an improvement. > >> > >> However, from what I can see, the link vanishes at that point (as > >> DL_FLAG_AUTOREMOVE is set), and re-binding the TDA998x device results > >> in nothing further happening - the link will be recreated, but there > >> appears to be nothing that triggers the "consumer" to rebind at that > >> point. Maybe I've missed something? > > > > Right, auto-remove is a no-go. So, improving on the previous... > > > > (I think drm_panel might suffer from this issue too?) > > Yes it does and I took a shot at trying to fix it at the end of the > previous merge window, but gave up as I run out of time. I re-spun the > work now after reading this thread. I add you and Russell to cc. Right, and these exact problems are what the component helper is there to sort out, in a subsystem independent way. What is the problem with the component helper that people seem to be soo loathed to use it, instead preferring to come up with sub- standard and broken alternatives?
On 24/04/18 20:06, Russell King - ARM Linux wrote: > On Tue, Apr 24, 2018 at 07:04:16PM +0300, Jyri Sarha wrote: >> On 24/04/18 13:14, Peter Rosin wrote: >>> On 2018-04-24 10:08, Russell King - ARM Linux wrote: >>>> On Tue, Apr 24, 2018 at 08:58:42AM +0200, Peter Rosin wrote: >>>>> On 2018-04-23 18:08, Russell King - ARM Linux wrote: >>>>>> On Mon, Apr 23, 2018 at 09:23:00AM +0200, Peter Rosin wrote: >>>>>>> static int tda998x_remove(struct i2c_client *client) >>>>>>> { >>>>>>> - component_del(&client->dev, &tda998x_ops); >>>>>>> + struct device *dev = &client->dev; >>>>>>> + struct tda998x_bridge *bridge = dev_get_drvdata(dev); >>>>>>> + >>>>>>> + drm_bridge_remove(&bridge->bridge); >>>>>>> + component_del(dev, &tda998x_ops); >>>>>>> + >>>>>> >>>>>> I'd like to ask a rather fundamental question about DRM bridge support, >>>>>> because I suspect that there's a major fsckup here. >>>>>> >>>>>> The above is the function that deals with the TDA998x device being >>>>>> unbound from the driver. With the component API, this results in the >>>>>> DRM device correctly being torn down, because one of the hardware >>>>>> devices has gone. >>>>>> >>>>>> With DRM bridge, the bridge is merely removed from the list of >>>>>> bridges: >>>>>> >>>>>> void drm_bridge_remove(struct drm_bridge *bridge) >>>>>> { >>>>>> mutex_lock(&bridge_lock); >>>>>> list_del_init(&bridge->list); >>>>>> mutex_unlock(&bridge_lock); >>>>>> } >>>>>> EXPORT_SYMBOL(drm_bridge_remove); >>>>>> >>>>>> and the memory backing the "struct tda998x_bridge" (which contains >>>>>> the struct drm_bridge) will be freed by the devm subsystem. >>>>>> >>>>>> However, there is no notification into the rest of the DRM subsystem >>>>>> that the device has gone away. Worse, the memory that is still in >>>>>> use by DRM has now been freed, so further use of the DRM device >>>>>> results in a use-after-free bug. >>>>>> >>>>>> This is really not good, and to me looks like a fundamental problem >>>>>> with the DRM bridge code. I see nothing in the DRM bridge code that >>>>>> deals with the lifetime of a "DRM bridge" or indeed the lifetime of >>>>>> the actual device itself. >>>>>> >>>>>> So, from what I can see, there seems to be a fundamental lifetime >>>>>> issue with the design of the DRM bridge code. This needs to be >>>>>> fixed. >>>>> >>>>> Oh crap. A gigantic can of worms... >>>> >>>> Yes, it's especially annoying for me, having put the effort in to >>>> the component helper to cover all these cases. >>>> >>>>> Would a patch (completely untested btw) along this line of thinking make >>>>> any difference whatsoever? >>>> >>>> It looks interesting - from what I can see of the device links code, >>>> it would have the effect of unbinding the DRM device just before >>>> TDA998x is unbound, so that's an improvement. >>>> >>>> However, from what I can see, the link vanishes at that point (as >>>> DL_FLAG_AUTOREMOVE is set), and re-binding the TDA998x device results >>>> in nothing further happening - the link will be recreated, but there >>>> appears to be nothing that triggers the "consumer" to rebind at that >>>> point. Maybe I've missed something? >>> >>> Right, auto-remove is a no-go. So, improving on the previous... >>> >>> (I think drm_panel might suffer from this issue too?) >> >> Yes it does and I took a shot at trying to fix it at the end of the >> previous merge window, but gave up as I run out of time. I re-spun the >> work now after reading this thread. I add you and Russell to cc. > > Right, and these exact problems are what the component helper is > there to sort out, in a subsystem independent way. > > What is the problem with the component helper that people seem to > be soo loathed to use it, instead preferring to come up with sub- > standard and broken alternatives? > Nothing but time. Embedding component helpers seamlessly into drm framework does not sound like a couple of days job. Right now I simply do not have time to take on a challenge like that. If someone does it I am all for it. However, I would not call device links substandard. They are in the device core after all. Best regards, Jyri
On Tue, Apr 24, 2018 at 09:25:44PM +0300, Jyri Sarha wrote: > On 24/04/18 20:06, Russell King - ARM Linux wrote: > > On Tue, Apr 24, 2018 at 07:04:16PM +0300, Jyri Sarha wrote: > >> On 24/04/18 13:14, Peter Rosin wrote: > >>> On 2018-04-24 10:08, Russell King - ARM Linux wrote: > >>>> On Tue, Apr 24, 2018 at 08:58:42AM +0200, Peter Rosin wrote: > >>>>> On 2018-04-23 18:08, Russell King - ARM Linux wrote: > >>>>>> On Mon, Apr 23, 2018 at 09:23:00AM +0200, Peter Rosin wrote: > >>>>>>> static int tda998x_remove(struct i2c_client *client) > >>>>>>> { > >>>>>>> - component_del(&client->dev, &tda998x_ops); > >>>>>>> + struct device *dev = &client->dev; > >>>>>>> + struct tda998x_bridge *bridge = dev_get_drvdata(dev); > >>>>>>> + > >>>>>>> + drm_bridge_remove(&bridge->bridge); > >>>>>>> + component_del(dev, &tda998x_ops); > >>>>>>> + > >>>>>> > >>>>>> I'd like to ask a rather fundamental question about DRM bridge support, > >>>>>> because I suspect that there's a major fsckup here. > >>>>>> > >>>>>> The above is the function that deals with the TDA998x device being > >>>>>> unbound from the driver. With the component API, this results in the > >>>>>> DRM device correctly being torn down, because one of the hardware > >>>>>> devices has gone. > >>>>>> > >>>>>> With DRM bridge, the bridge is merely removed from the list of > >>>>>> bridges: > >>>>>> > >>>>>> void drm_bridge_remove(struct drm_bridge *bridge) > >>>>>> { > >>>>>> mutex_lock(&bridge_lock); > >>>>>> list_del_init(&bridge->list); > >>>>>> mutex_unlock(&bridge_lock); > >>>>>> } > >>>>>> EXPORT_SYMBOL(drm_bridge_remove); > >>>>>> > >>>>>> and the memory backing the "struct tda998x_bridge" (which contains > >>>>>> the struct drm_bridge) will be freed by the devm subsystem. > >>>>>> > >>>>>> However, there is no notification into the rest of the DRM subsystem > >>>>>> that the device has gone away. Worse, the memory that is still in > >>>>>> use by DRM has now been freed, so further use of the DRM device > >>>>>> results in a use-after-free bug. > >>>>>> > >>>>>> This is really not good, and to me looks like a fundamental problem > >>>>>> with the DRM bridge code. I see nothing in the DRM bridge code that > >>>>>> deals with the lifetime of a "DRM bridge" or indeed the lifetime of > >>>>>> the actual device itself. > >>>>>> > >>>>>> So, from what I can see, there seems to be a fundamental lifetime > >>>>>> issue with the design of the DRM bridge code. This needs to be > >>>>>> fixed. > >>>>> > >>>>> Oh crap. A gigantic can of worms... > >>>> > >>>> Yes, it's especially annoying for me, having put the effort in to > >>>> the component helper to cover all these cases. > >>>> > >>>>> Would a patch (completely untested btw) along this line of thinking make > >>>>> any difference whatsoever? > >>>> > >>>> It looks interesting - from what I can see of the device links code, > >>>> it would have the effect of unbinding the DRM device just before > >>>> TDA998x is unbound, so that's an improvement. > >>>> > >>>> However, from what I can see, the link vanishes at that point (as > >>>> DL_FLAG_AUTOREMOVE is set), and re-binding the TDA998x device results > >>>> in nothing further happening - the link will be recreated, but there > >>>> appears to be nothing that triggers the "consumer" to rebind at that > >>>> point. Maybe I've missed something? > >>> > >>> Right, auto-remove is a no-go. So, improving on the previous... > >>> > >>> (I think drm_panel might suffer from this issue too?) > >> > >> Yes it does and I took a shot at trying to fix it at the end of the > >> previous merge window, but gave up as I run out of time. I re-spun the > >> work now after reading this thread. I add you and Russell to cc. > > > > Right, and these exact problems are what the component helper is > > there to sort out, in a subsystem independent way. > > > > What is the problem with the component helper that people seem to > > be soo loathed to use it, instead preferring to come up with sub- > > standard and broken alternatives? > > > > Nothing but time. Embedding component helpers seamlessly into drm > framework does not sound like a couple of days job. Right now I simply > do not have time to take on a challenge like that. If someone does it I > am all for it. > > However, I would not call device links substandard. They are in the > device core after all. Umm, no, I was not talking about the device links, but the tendency to have subsystem or component specific solutions to this problem. We're now heading down the path of trying to retrofit the functionality that one expects and is provided by the component helpers into DRM bridge and DRM panel by using device links, which appears to only partially resolve the problem. On the point of device links, I've been wondering whether the component helpers could take advantage of the device links, but at the moment that would cause a regression if there's no facility to re-probe the "consumer" when a "supplier" returns. As you bring that up, I would say that they _are_ a substandard way to solve this problem _if_, as I suspect, they would cause a regression to the component helper if the component helper were to use them as a means to control the probing of the "master" device. This is not a matter of which part of the kernel they are "in", but the functionality that they can offer vs what would be expected.
On 25/04/18 02:25, Russell King - ARM Linux wrote: > On Tue, Apr 24, 2018 at 09:25:44PM +0300, Jyri Sarha wrote: >> On 24/04/18 20:06, Russell King - ARM Linux wrote: >>> On Tue, Apr 24, 2018 at 07:04:16PM +0300, Jyri Sarha wrote: >>>> On 24/04/18 13:14, Peter Rosin wrote: >>>>> On 2018-04-24 10:08, Russell King - ARM Linux wrote: >>>>>> On Tue, Apr 24, 2018 at 08:58:42AM +0200, Peter Rosin wrote: >>>>>>> On 2018-04-23 18:08, Russell King - ARM Linux wrote: >>>>>>>> On Mon, Apr 23, 2018 at 09:23:00AM +0200, Peter Rosin wrote: >>>>>>>>> static int tda998x_remove(struct i2c_client *client) >>>>>>>>> { >>>>>>>>> - component_del(&client->dev, &tda998x_ops); >>>>>>>>> + struct device *dev = &client->dev; >>>>>>>>> + struct tda998x_bridge *bridge = dev_get_drvdata(dev); >>>>>>>>> + >>>>>>>>> + drm_bridge_remove(&bridge->bridge); >>>>>>>>> + component_del(dev, &tda998x_ops); >>>>>>>>> + >>>>>>>> >>>>>>>> I'd like to ask a rather fundamental question about DRM bridge support, >>>>>>>> because I suspect that there's a major fsckup here. >>>>>>>> >>>>>>>> The above is the function that deals with the TDA998x device being >>>>>>>> unbound from the driver. With the component API, this results in the >>>>>>>> DRM device correctly being torn down, because one of the hardware >>>>>>>> devices has gone. >>>>>>>> >>>>>>>> With DRM bridge, the bridge is merely removed from the list of >>>>>>>> bridges: >>>>>>>> >>>>>>>> void drm_bridge_remove(struct drm_bridge *bridge) >>>>>>>> { >>>>>>>> mutex_lock(&bridge_lock); >>>>>>>> list_del_init(&bridge->list); >>>>>>>> mutex_unlock(&bridge_lock); >>>>>>>> } >>>>>>>> EXPORT_SYMBOL(drm_bridge_remove); >>>>>>>> >>>>>>>> and the memory backing the "struct tda998x_bridge" (which contains >>>>>>>> the struct drm_bridge) will be freed by the devm subsystem. >>>>>>>> >>>>>>>> However, there is no notification into the rest of the DRM subsystem >>>>>>>> that the device has gone away. Worse, the memory that is still in >>>>>>>> use by DRM has now been freed, so further use of the DRM device >>>>>>>> results in a use-after-free bug. >>>>>>>> >>>>>>>> This is really not good, and to me looks like a fundamental problem >>>>>>>> with the DRM bridge code. I see nothing in the DRM bridge code that >>>>>>>> deals with the lifetime of a "DRM bridge" or indeed the lifetime of >>>>>>>> the actual device itself. >>>>>>>> >>>>>>>> So, from what I can see, there seems to be a fundamental lifetime >>>>>>>> issue with the design of the DRM bridge code. This needs to be >>>>>>>> fixed. >>>>>>> >>>>>>> Oh crap. A gigantic can of worms... >>>>>> >>>>>> Yes, it's especially annoying for me, having put the effort in to >>>>>> the component helper to cover all these cases. >>>>>> >>>>>>> Would a patch (completely untested btw) along this line of thinking make >>>>>>> any difference whatsoever? >>>>>> >>>>>> It looks interesting - from what I can see of the device links code, >>>>>> it would have the effect of unbinding the DRM device just before >>>>>> TDA998x is unbound, so that's an improvement. >>>>>> >>>>>> However, from what I can see, the link vanishes at that point (as >>>>>> DL_FLAG_AUTOREMOVE is set), and re-binding the TDA998x device results >>>>>> in nothing further happening - the link will be recreated, but there >>>>>> appears to be nothing that triggers the "consumer" to rebind at that >>>>>> point. Maybe I've missed something? >>>>> >>>>> Right, auto-remove is a no-go. So, improving on the previous... >>>>> >>>>> (I think drm_panel might suffer from this issue too?) >>>> >>>> Yes it does and I took a shot at trying to fix it at the end of the >>>> previous merge window, but gave up as I run out of time. I re-spun the >>>> work now after reading this thread. I add you and Russell to cc. >>> >>> Right, and these exact problems are what the component helper is >>> there to sort out, in a subsystem independent way. >>> >>> What is the problem with the component helper that people seem to >>> be soo loathed to use it, instead preferring to come up with sub- >>> standard and broken alternatives? >>> >> >> Nothing but time. Embedding component helpers seamlessly into drm >> framework does not sound like a couple of days job. Right now I simply >> do not have time to take on a challenge like that. If someone does it I >> am all for it. >> >> However, I would not call device links substandard. They are in the >> device core after all. > > Umm, no, I was not talking about the device links, but the tendency to > have subsystem or component specific solutions to this problem. > Oh yes. But in this case the substandard solution is already there and it is already widely used, despite it being severely broken. I am merely trying to fix the existing substandard solution. I admit that a full integration with component helpers would probably be more elegant solution to the problem, but the amount of work is just too much. The change would impact the way all the master drm drivers pull them selves together. The drivers that already use the component helpers for some internal stuff will add their own challenge. Separate component matching implementations are needed for device-tree and ACPI (are ther more flavors?) etc. I just do not see this happening any time soon (am happy to be wrong about this). > We're now heading down the path of trying to retrofit the functionality > that one expects and is provided by the component helpers into DRM > bridge and DRM panel by using device links, which appears to only > partially resolve the problem. > > On the point of device links, I've been wondering whether the component > helpers could take advantage of the device links, but at the moment > that would cause a regression if there's no facility to re-probe the > "consumer" when a "supplier" returns. > I sent a patch[1] in February that solves the re-probing problem at least in my environment. > As you bring that up, I would say that they _are_ a substandard way to > solve this problem _if_, as I suspect, they would cause a regression > to the component helper if the component helper were to use them as a > means to control the probing of the "master" device. This is not a > matter of which part of the kernel they are "in", but the functionality > that they can offer vs what would be expected. > The component helpers certainly have many features in that are not (yet?) in device core's device links, but the amount of work needed to fix the most pressing problem of attached drm panels or bridges is couple of magnitudes less with device links when compared to full blown component helper usage. Best regards, Jyri [1] https://lists.freedesktop.org/archives/dri-devel/2018-February/166907.html
On Wed, Apr 25, 2018 at 11:01:15PM +0300, Jyri Sarha wrote: > On 25/04/18 02:25, Russell King - ARM Linux wrote: > > On Tue, Apr 24, 2018 at 09:25:44PM +0300, Jyri Sarha wrote: > >> On 24/04/18 20:06, Russell King - ARM Linux wrote: > >>> On Tue, Apr 24, 2018 at 07:04:16PM +0300, Jyri Sarha wrote: > >>>> On 24/04/18 13:14, Peter Rosin wrote: > >>>>> On 2018-04-24 10:08, Russell King - ARM Linux wrote: > >>>>>> On Tue, Apr 24, 2018 at 08:58:42AM +0200, Peter Rosin wrote: > >>>>>>> On 2018-04-23 18:08, Russell King - ARM Linux wrote: > >>>>>>>> On Mon, Apr 23, 2018 at 09:23:00AM +0200, Peter Rosin wrote: > >>>>>>>>> static int tda998x_remove(struct i2c_client *client) > >>>>>>>>> { > >>>>>>>>> - component_del(&client->dev, &tda998x_ops); > >>>>>>>>> + struct device *dev = &client->dev; > >>>>>>>>> + struct tda998x_bridge *bridge = dev_get_drvdata(dev); > >>>>>>>>> + > >>>>>>>>> + drm_bridge_remove(&bridge->bridge); > >>>>>>>>> + component_del(dev, &tda998x_ops); > >>>>>>>>> + > >>>>>>>> > >>>>>>>> I'd like to ask a rather fundamental question about DRM bridge support, > >>>>>>>> because I suspect that there's a major fsckup here. > >>>>>>>> > >>>>>>>> The above is the function that deals with the TDA998x device being > >>>>>>>> unbound from the driver. With the component API, this results in the > >>>>>>>> DRM device correctly being torn down, because one of the hardware > >>>>>>>> devices has gone. > >>>>>>>> > >>>>>>>> With DRM bridge, the bridge is merely removed from the list of > >>>>>>>> bridges: > >>>>>>>> > >>>>>>>> void drm_bridge_remove(struct drm_bridge *bridge) > >>>>>>>> { > >>>>>>>> mutex_lock(&bridge_lock); > >>>>>>>> list_del_init(&bridge->list); > >>>>>>>> mutex_unlock(&bridge_lock); > >>>>>>>> } > >>>>>>>> EXPORT_SYMBOL(drm_bridge_remove); > >>>>>>>> > >>>>>>>> and the memory backing the "struct tda998x_bridge" (which contains > >>>>>>>> the struct drm_bridge) will be freed by the devm subsystem. > >>>>>>>> > >>>>>>>> However, there is no notification into the rest of the DRM subsystem > >>>>>>>> that the device has gone away. Worse, the memory that is still in > >>>>>>>> use by DRM has now been freed, so further use of the DRM device > >>>>>>>> results in a use-after-free bug. > >>>>>>>> > >>>>>>>> This is really not good, and to me looks like a fundamental problem > >>>>>>>> with the DRM bridge code. I see nothing in the DRM bridge code that > >>>>>>>> deals with the lifetime of a "DRM bridge" or indeed the lifetime of > >>>>>>>> the actual device itself. > >>>>>>>> > >>>>>>>> So, from what I can see, there seems to be a fundamental lifetime > >>>>>>>> issue with the design of the DRM bridge code. This needs to be > >>>>>>>> fixed. > >>>>>>> > >>>>>>> Oh crap. A gigantic can of worms... > >>>>>> > >>>>>> Yes, it's especially annoying for me, having put the effort in to > >>>>>> the component helper to cover all these cases. > >>>>>> > >>>>>>> Would a patch (completely untested btw) along this line of thinking make > >>>>>>> any difference whatsoever? > >>>>>> > >>>>>> It looks interesting - from what I can see of the device links code, > >>>>>> it would have the effect of unbinding the DRM device just before > >>>>>> TDA998x is unbound, so that's an improvement. > >>>>>> > >>>>>> However, from what I can see, the link vanishes at that point (as > >>>>>> DL_FLAG_AUTOREMOVE is set), and re-binding the TDA998x device results > >>>>>> in nothing further happening - the link will be recreated, but there > >>>>>> appears to be nothing that triggers the "consumer" to rebind at that > >>>>>> point. Maybe I've missed something? > >>>>> > >>>>> Right, auto-remove is a no-go. So, improving on the previous... > >>>>> > >>>>> (I think drm_panel might suffer from this issue too?) > >>>> > >>>> Yes it does and I took a shot at trying to fix it at the end of the > >>>> previous merge window, but gave up as I run out of time. I re-spun the > >>>> work now after reading this thread. I add you and Russell to cc. > >>> > >>> Right, and these exact problems are what the component helper is > >>> there to sort out, in a subsystem independent way. > >>> > >>> What is the problem with the component helper that people seem to > >>> be soo loathed to use it, instead preferring to come up with sub- > >>> standard and broken alternatives? > >>> > >> > >> Nothing but time. Embedding component helpers seamlessly into drm > >> framework does not sound like a couple of days job. Right now I simply > >> do not have time to take on a challenge like that. If someone does it I > >> am all for it. > >> > >> However, I would not call device links substandard. They are in the > >> device core after all. > > > > Umm, no, I was not talking about the device links, but the tendency to > > have subsystem or component specific solutions to this problem. > > > > Oh yes. But in this case the substandard solution is already there and > it is already widely used, despite it being severely broken. I am merely > trying to fix the existing substandard solution. > > I admit that a full integration with component helpers would probably be > more elegant solution to the problem, but the amount of work is just too > much. The change would impact the way all the master drm drivers pull > them selves together. The drivers that already use the component helpers > for some internal stuff will add their own challenge. Separate component > matching implementations are needed for device-tree and ACPI (are ther > more flavors?) etc. I just do not see this happening any time soon (am > happy to be wrong about this). The issue is actually worse than that: - drivers that are already componentised can't use bridges - drivers that use bridges can't use componentised stuff because bridges don't register themselves with the component helper, and the helpers in drm_of.c assume that all graph nodes will be components. The whole thing about whether stuff is componentised or bridge based is really getting out of hand, and the push is towards bridge based stuff even though that is technically inferior when it comes to being able to develop and test (which involves being able to remove and re-insert modules.) Consequently more and more code is being written for bridges, and the component helper ignored, and the problems with bridges are being ignored. This is not healthy. The problem is only going to get worse. Someone needs to bite the bullet and fix bridges before the problem gets any more out of hand.
diff --git a/drivers/gpu/drm/drm_bridge.c b/drivers/gpu/drm/drm_bridge.c index 1638bfe9627c..b1365cfee445 100644 --- a/drivers/gpu/drm/drm_bridge.c +++ b/drivers/gpu/drm/drm_bridge.c @@ -121,12 +121,17 @@ int drm_bridge_attach(struct drm_encoder *encoder, struct drm_bridge *bridge, if (bridge->dev) return -EBUSY; + bridge->link = device_link_add(encoder->dev->dev, bridge->owner, 0); + if (!bridge->link) + return -EINVAL; + bridge->dev = encoder->dev; bridge->encoder = encoder; if (bridge->funcs->attach) { ret = bridge->funcs->attach(bridge); if (ret < 0) { + device_link_del(bridge->link); bridge->dev = NULL; bridge->encoder = NULL; return ret; @@ -153,6 +158,8 @@ void drm_bridge_detach(struct drm_bridge *bridge) if (bridge->funcs->detach) bridge->funcs->detach(bridge); + device_link_del(bridge->link); + bridge->dev = NULL; } diff --git a/drivers/gpu/drm/i2c/tda998x_drv.c b/drivers/gpu/drm/i2c/tda998x_drv.c index b8cb6237a38b..29eba4e9a39d 100644 --- a/drivers/gpu/drm/i2c/tda998x_drv.c +++ b/drivers/gpu/drm/i2c/tda998x_drv.c @@ -1857,6 +1857,7 @@ tda998x_probe(struct i2c_client *client, const struct i2c_device_id *id) bridge->dev = dev; dev_set_drvdata(dev, bridge); + bridge->bridge.owner = dev; bridge->bridge.funcs = &tda998x_bridge_funcs; #ifdef CONFIG_OF bridge->bridge.of_node = dev->of_node; diff --git a/include/drm/drm_bridge.h b/include/drm/drm_bridge.h index 682d01ba920c..b8f33aba3216 100644 --- a/include/drm/drm_bridge.h +++ b/include/drm/drm_bridge.h @@ -224,6 +224,8 @@ struct drm_bridge_funcs { /** * struct drm_bridge - central DRM bridge control structure + * @owner: device that owns the bridge + * @link: drm consumer <-> bridge supplier * @dev: DRM device this bridge belongs to * @encoder: encoder to which this bridge is connected * @next: the next bridge in the encoder chain @@ -233,6 +235,8 @@ struct drm_bridge_funcs { * @driver_private: pointer to the bridge driver's internal context */ struct drm_bridge { + struct device *owner; + struct device_link *link; struct drm_device *dev; struct drm_encoder *encoder; struct drm_bridge *next;