Message ID | 20170718190401.14797-12-sakari.ailus@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 18/07/17 21:03, Sakari Ailus wrote: > The async notifier supports three callbacks to the notifier: bound, unbound > and complete. The complete callback has been traditionally used for > creating the sub-device nodes. > > This approach has an inherent weakness: if registration of a single > sub-device fails for whatever reason, it renders the entire media device > unusable even if only that piece of hardware is not working. This is a > problem in particular in systems with multiple independent image pipelines > on a single device. We have had such devices (e.g. omap3isp) supported for > a number of years and the problem is growing more pressing as time passes > so there is an incentive to resolve this. I don't think this is a good reason. If one of the subdevices fail, then your hardware is messed up and there is no point in continuing. There may be other valid reasons why you would want this (reconfigurable FPGA, hotplugging of sensors), but the reason you give here doesn't hold water IMHO. Regards, Hans > The solution is to register device nodes at the time when the driver of > those devices is complete with initialising the piece of hardware it is > controlling. > > This leaves partial pipelines visible to the user. There are two things to > consider here: > > 1) Registering multiple device nodes was never an atomic operation so the > user space still had to be prepared for partial media graph being visible > and > > 2) The user space can figure out that a pipeline is not startable from the > fact that there are pads with MEDIA_PAD_FL_MUST_CONNECT flag set without > an (enabled) link. > > Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> > --- > drivers/media/v4l2-core/v4l2-async.c | 16 ++++++++-------- > 1 file changed, 8 insertions(+), 8 deletions(-) > > diff --git a/drivers/media/v4l2-core/v4l2-async.c b/drivers/media/v4l2-core/v4l2-async.c > index d2ce39ac402e..55fa7106345c 100644 > --- a/drivers/media/v4l2-core/v4l2-async.c > +++ b/drivers/media/v4l2-core/v4l2-async.c > @@ -127,19 +127,19 @@ static int v4l2_async_test_notify(struct v4l2_async_notifier *notifier, > { > int ret; > > - if (notifier->bound) { > - ret = notifier->bound(notifier, sd, asd); > - if (ret < 0) > - return ret; > - } > - > ret = v4l2_device_register_subdev(notifier->v4l2_dev, sd); > if (ret < 0) { > - if (notifier->unbind) > - notifier->unbind(notifier, sd, asd); > return ret; > } > > + if (notifier->bound) { > + ret = notifier->bound(notifier, sd, asd); > + if (ret < 0) { > + v4l2_device_unregister_subdev(sd); > + return ret; > + } > + } > + > /* Remove from the waiting list */ > list_del(&asd->list); > sd->asd = asd; >
Hi Hans, On Wed, Jul 19, 2017 at 01:24:54PM +0200, Hans Verkuil wrote: > On 18/07/17 21:03, Sakari Ailus wrote: > > The async notifier supports three callbacks to the notifier: bound, unbound > > and complete. The complete callback has been traditionally used for > > creating the sub-device nodes. > > > > This approach has an inherent weakness: if registration of a single > > sub-device fails for whatever reason, it renders the entire media device > > unusable even if only that piece of hardware is not working. This is a > > problem in particular in systems with multiple independent image pipelines > > on a single device. We have had such devices (e.g. omap3isp) supported for > > a number of years and the problem is growing more pressing as time passes > > so there is an incentive to resolve this. > > I don't think this is a good reason. If one of the subdevices fail, then your > hardware is messed up and there is no point in continuing. That's entirely untrue in general case. If you have e.g. a mobile phone with a single camera, yes, you're right. But most mobile phones have two cameras these days. Embedded systems may have many, think of automotive use cases: you could have five or ten cameras there. It is not feasible to prevent the entire system from working if a single component is at fault --- this is really any component such as a lens controller.
On 20/07/17 18:09, Sakari Ailus wrote: > Hi Hans, > > On Wed, Jul 19, 2017 at 01:24:54PM +0200, Hans Verkuil wrote: >> On 18/07/17 21:03, Sakari Ailus wrote: >>> The async notifier supports three callbacks to the notifier: bound, unbound >>> and complete. The complete callback has been traditionally used for >>> creating the sub-device nodes. >>> >>> This approach has an inherent weakness: if registration of a single >>> sub-device fails for whatever reason, it renders the entire media device >>> unusable even if only that piece of hardware is not working. This is a >>> problem in particular in systems with multiple independent image pipelines >>> on a single device. We have had such devices (e.g. omap3isp) supported for >>> a number of years and the problem is growing more pressing as time passes >>> so there is an incentive to resolve this. >> >> I don't think this is a good reason. If one of the subdevices fail, then your >> hardware is messed up and there is no point in continuing. > > That's entirely untrue in general case. > > If you have e.g. a mobile phone with a single camera, yes, you're right. > But most mobile phones have two cameras these days. Embedded systems may > have many, think of automotive use cases: you could have five or ten > cameras there. These are all very recent developments. Today userspace can safely assume that either everything would be up and running, or nothing at all. > It is not feasible to prevent the entire system from working if a single > component is at fault --- this is really any component such as a lens > controller. All I am saying is that there should be a way to indicate that you accept that parts are faulty, and that you (i.e. userspace) are able to detect and handle that. You can't just change the current behavior and expect existing applications to work. E.g. says a sensor failed. Today the application might detect that the video node didn't come up, so something is seriously wrong with the hardware and it shows a message on the display. If this would change and the video node *would* come up, even though there is no sensor the behavior of the application would almost certainly change unexpectedly. How to select which behavior you want isn't easy. The only thing I can come up with is a module option. Not very elegant, unfortunately. But it doesn't belong in the DT, and when userspace gets involved it is already too late. Regards, Hans
Hi Hans, On Thu, Jul 20, 2017 at 06:23:38PM +0200, Hans Verkuil wrote: > On 20/07/17 18:09, Sakari Ailus wrote: > > Hi Hans, > > > > On Wed, Jul 19, 2017 at 01:24:54PM +0200, Hans Verkuil wrote: > >> On 18/07/17 21:03, Sakari Ailus wrote: > >>> The async notifier supports three callbacks to the notifier: bound, unbound > >>> and complete. The complete callback has been traditionally used for > >>> creating the sub-device nodes. > >>> > >>> This approach has an inherent weakness: if registration of a single > >>> sub-device fails for whatever reason, it renders the entire media device > >>> unusable even if only that piece of hardware is not working. This is a > >>> problem in particular in systems with multiple independent image pipelines > >>> on a single device. We have had such devices (e.g. omap3isp) supported for > >>> a number of years and the problem is growing more pressing as time passes > >>> so there is an incentive to resolve this. > >> > >> I don't think this is a good reason. If one of the subdevices fail, then your > >> hardware is messed up and there is no point in continuing. > > > > That's entirely untrue in general case. > > > > If you have e.g. a mobile phone with a single camera, yes, you're right. > > But most mobile phones have two cameras these days. Embedded systems may > > have many, think of automotive use cases: you could have five or ten > > cameras there. > > These are all very recent developments. Today userspace can safely assume > that either everything would be up and running, or nothing at all. > > > It is not feasible to prevent the entire system from working if a single > > component is at fault --- this is really any component such as a lens > > controller. > > All I am saying is that there should be a way to indicate that you accept > that parts are faulty, and that you (i.e. userspace) are able to detect > and handle that. > > You can't just change the current behavior and expect existing applications > to work. E.g. says a sensor failed. Today the application might detect that > the video node didn't come up, so something is seriously wrong with the hardware > and it shows a message on the display. If this would change and the video node > *would* come up, even though there is no sensor the behavior of the application > would almost certainly change unexpectedly. > > How to select which behavior you want isn't easy. The only thing I can come up > with is a module option. Not very elegant, unfortunately. But it doesn't > belong in the DT, and when userspace gets involved it is already too late. Module options don't scale if you want to change kernel interface behaviour. Adding a Kconfig option would. We can neither make this application specific since the application isn't known by the time the nodes are created. Kconfig option (that defaults to no) with the events and media device status info amended with documentation change would achieve the goal, although it'd take a lot of time to adjust all the applications before the Kconfig option can be safely removed. This approach does have the benefit of being able to provide the feature to those systems that really depend on it.
Hi Hans, On 20/07/17 17:23, Hans Verkuil wrote: > On 20/07/17 18:09, Sakari Ailus wrote: >> Hi Hans, >> >> On Wed, Jul 19, 2017 at 01:24:54PM +0200, Hans Verkuil wrote: >>> On 18/07/17 21:03, Sakari Ailus wrote: >>>> The async notifier supports three callbacks to the notifier: bound, unbound >>>> and complete. The complete callback has been traditionally used for >>>> creating the sub-device nodes. >>>> >>>> This approach has an inherent weakness: if registration of a single >>>> sub-device fails for whatever reason, it renders the entire media device >>>> unusable even if only that piece of hardware is not working. This is a >>>> problem in particular in systems with multiple independent image pipelines >>>> on a single device. We have had such devices (e.g. omap3isp) supported for >>>> a number of years and the problem is growing more pressing as time passes >>>> so there is an incentive to resolve this. >>> >>> I don't think this is a good reason. If one of the subdevices fail, then your >>> hardware is messed up and there is no point in continuing. >> >> That's entirely untrue in general case. Adding my 2 cents ... because I'm hitting this ... right now. >> >> If you have e.g. a mobile phone with a single camera, yes, you're right. >> But most mobile phones have two cameras these days. Embedded systems may >> have many, think of automotive use cases: you could have five or ten >> cameras there. I have a MAX9286 which has 4 camera subdevices connected. Right now, if *ONE* fails - they all fail. - This is very much undesirable behaviour. In this instance, when one fails (perhaps I have not connected one camera) then the remaining camera subdevices all probe successfully, but the complete callback is never called. Therefore the rest of my pipeline is dead, - But that could now mean my reversing camera is not working because my wing mirror camera was 'knocked' off... :-( > These are all very recent developments. Today userspace can safely assume > that either everything would be up and running, or nothing at all. This is a strong point, and I'm struggling to decide if I agree or not :D There are so many use cases, it's hard to make one statement fit all. For example - currently - an analogue input source might not be connected - but userspace may not know that, and would instead capture a black screen. Maybe that doesn't even match ... I'm tired ;) >> It is not feasible to prevent the entire system from working if a single >> component is at fault --- this is really any component such as a lens >> controller. > > All I am saying is that there should be a way to indicate that you accept > that parts are faulty, and that you (i.e. userspace) are able to detect > and handle that. > > You can't just change the current behavior and expect existing applications > to work. E.g. says a sensor failed. Today the application might detect that > the video node didn't come up, so something is seriously wrong with the hardware > and it shows a message on the display. If this would change and the video node > *would* come up, even though there is no sensor the behavior of the application > would almost certainly change unexpectedly. > > How to select which behavior you want isn't easy. The only thing I can come up > with is a module option. Not very elegant, unfortunately. But it doesn't > belong in the DT, and when userspace gets involved it is already too late. Yes, this sounds nasty - but 3 out of 4 working cameras being disabled / not coming up because one failed sounds worse to me currently (I would say that ... my cameras didn't come up) ... :-( <thinking cap on ... going to bed> -- Kieran > Regards, > > Hans >
On 20/07/17 21:23, Sakari Ailus wrote: > Hi Hans, > > On Thu, Jul 20, 2017 at 06:23:38PM +0200, Hans Verkuil wrote: >> On 20/07/17 18:09, Sakari Ailus wrote: >>> Hi Hans, >>> >>> On Wed, Jul 19, 2017 at 01:24:54PM +0200, Hans Verkuil wrote: >>>> On 18/07/17 21:03, Sakari Ailus wrote: >>>>> The async notifier supports three callbacks to the notifier: bound, unbound >>>>> and complete. The complete callback has been traditionally used for >>>>> creating the sub-device nodes. >>>>> >>>>> This approach has an inherent weakness: if registration of a single >>>>> sub-device fails for whatever reason, it renders the entire media device >>>>> unusable even if only that piece of hardware is not working. This is a >>>>> problem in particular in systems with multiple independent image pipelines >>>>> on a single device. We have had such devices (e.g. omap3isp) supported for >>>>> a number of years and the problem is growing more pressing as time passes >>>>> so there is an incentive to resolve this. >>>> >>>> I don't think this is a good reason. If one of the subdevices fail, then your >>>> hardware is messed up and there is no point in continuing. >>> >>> That's entirely untrue in general case. >>> >>> If you have e.g. a mobile phone with a single camera, yes, you're right. >>> But most mobile phones have two cameras these days. Embedded systems may >>> have many, think of automotive use cases: you could have five or ten >>> cameras there. >> >> These are all very recent developments. Today userspace can safely assume >> that either everything would be up and running, or nothing at all. >> >>> It is not feasible to prevent the entire system from working if a single >>> component is at fault --- this is really any component such as a lens >>> controller. >> >> All I am saying is that there should be a way to indicate that you accept >> that parts are faulty, and that you (i.e. userspace) are able to detect >> and handle that. >> >> You can't just change the current behavior and expect existing applications >> to work. E.g. says a sensor failed. Today the application might detect that >> the video node didn't come up, so something is seriously wrong with the hardware >> and it shows a message on the display. If this would change and the video node >> *would* come up, even though there is no sensor the behavior of the application >> would almost certainly change unexpectedly. >> >> How to select which behavior you want isn't easy. The only thing I can come up >> with is a module option. Not very elegant, unfortunately. But it doesn't >> belong in the DT, and when userspace gets involved it is already too late. > > Module options don't scale if you want to change kernel interface > behaviour. Adding a Kconfig option would. We can neither make this > application specific since the application isn't known by the time the > nodes are created. > > Kconfig option (that defaults to no) with the events and media device > status info amended with documentation change would achieve the goal, > although it'd take a lot of time to adjust all the applications before the > Kconfig option can be safely removed. This approach does have the benefit > of being able to provide the feature to those systems that really depend on > it. That sounds like a good idea. I don't think you ever want to remove this option in the future. Depending on the product I think there is often a very good case to be made to just fail if one component failed. You need specialized applications that can handle partially configured systems, and having to explicitly enable this is a good way to force users to carefully think about this. Some notes: I recommend to first finish this async rework. Adding support for this new feature should be discussed a bit more before we start work on that. Some discussion points: 1) What about adding time-out support? Today we wait forever until all components are found, but wouldn't it make sense to optionally time-out? And if so, then we can keep most of the code the same and decide in the complete() callback whether or not we accept missing components. And decide how badly 'impaired' the system is. We can also still bring up all the devices in the complete rather than one-by-one as you proposed (and which I am not sure I like). 2) This can be hard to test, so perhaps we should support some form of error injection to easily test what happens if something doesn't come up. Regards, Hans
diff --git a/drivers/media/v4l2-core/v4l2-async.c b/drivers/media/v4l2-core/v4l2-async.c index d2ce39ac402e..55fa7106345c 100644 --- a/drivers/media/v4l2-core/v4l2-async.c +++ b/drivers/media/v4l2-core/v4l2-async.c @@ -127,19 +127,19 @@ static int v4l2_async_test_notify(struct v4l2_async_notifier *notifier, { int ret; - if (notifier->bound) { - ret = notifier->bound(notifier, sd, asd); - if (ret < 0) - return ret; - } - ret = v4l2_device_register_subdev(notifier->v4l2_dev, sd); if (ret < 0) { - if (notifier->unbind) - notifier->unbind(notifier, sd, asd); return ret; } + if (notifier->bound) { + ret = notifier->bound(notifier, sd, asd); + if (ret < 0) { + v4l2_device_unregister_subdev(sd); + return ret; + } + } + /* Remove from the waiting list */ list_del(&asd->list); sd->asd = asd;
The async notifier supports three callbacks to the notifier: bound, unbound and complete. The complete callback has been traditionally used for creating the sub-device nodes. This approach has an inherent weakness: if registration of a single sub-device fails for whatever reason, it renders the entire media device unusable even if only that piece of hardware is not working. This is a problem in particular in systems with multiple independent image pipelines on a single device. We have had such devices (e.g. omap3isp) supported for a number of years and the problem is growing more pressing as time passes so there is an incentive to resolve this. The solution is to register device nodes at the time when the driver of those devices is complete with initialising the piece of hardware it is controlling. This leaves partial pipelines visible to the user. There are two things to consider here: 1) Registering multiple device nodes was never an atomic operation so the user space still had to be prepared for partial media graph being visible and 2) The user space can figure out that a pipeline is not startable from the fact that there are pads with MEDIA_PAD_FL_MUST_CONNECT flag set without an (enabled) link. Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> --- drivers/media/v4l2-core/v4l2-async.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-)