diff mbox

[RFC,11/19] v4l2-async: Register sub-devices before calling bound callback

Message ID 20170718190401.14797-12-sakari.ailus@linux.intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Sakari Ailus July 18, 2017, 7:03 p.m. UTC
The async notifier supports three callbacks to the notifier: bound, unbound
and complete. The complete callback has been traditionally used for
creating the sub-device nodes.

This approach has an inherent weakness: if registration of a single
sub-device fails for whatever reason, it renders the entire media device
unusable even if only that piece of hardware is not working. This is a
problem in particular in systems with multiple independent image pipelines
on a single device. We have had such devices (e.g. omap3isp) supported for
a number of years and the problem is growing more pressing as time passes
so there is an incentive to resolve this.

The solution is to register device nodes at the time when the driver of
those devices is complete with initialising the piece of hardware it is
controlling.

This leaves partial pipelines visible to the user. There are two things to
consider here:

1) Registering multiple device nodes was never an atomic operation so the
user space still had to be prepared for partial media graph being visible
and

2) The user space can figure out that a pipeline is not startable from the
fact that there are pads with MEDIA_PAD_FL_MUST_CONNECT flag set without
an (enabled) link.

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
---
 drivers/media/v4l2-core/v4l2-async.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

Comments

Hans Verkuil July 19, 2017, 11:24 a.m. UTC | #1
On 18/07/17 21:03, Sakari Ailus wrote:
> The async notifier supports three callbacks to the notifier: bound, unbound
> and complete. The complete callback has been traditionally used for
> creating the sub-device nodes.
> 
> This approach has an inherent weakness: if registration of a single
> sub-device fails for whatever reason, it renders the entire media device
> unusable even if only that piece of hardware is not working. This is a
> problem in particular in systems with multiple independent image pipelines
> on a single device. We have had such devices (e.g. omap3isp) supported for
> a number of years and the problem is growing more pressing as time passes
> so there is an incentive to resolve this.

I don't think this is a good reason. If one of the subdevices fail, then your
hardware is messed up and there is no point in continuing.

There may be other valid reasons why you would want this (reconfigurable
FPGA, hotplugging of sensors), but the reason you give here doesn't hold
water IMHO.

Regards,

	Hans

> The solution is to register device nodes at the time when the driver of
> those devices is complete with initialising the piece of hardware it is
> controlling.
> 
> This leaves partial pipelines visible to the user. There are two things to
> consider here:
> 
> 1) Registering multiple device nodes was never an atomic operation so the
> user space still had to be prepared for partial media graph being visible
> and
> 
> 2) The user space can figure out that a pipeline is not startable from the
> fact that there are pads with MEDIA_PAD_FL_MUST_CONNECT flag set without
> an (enabled) link.
> 
> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
> ---
>  drivers/media/v4l2-core/v4l2-async.c | 16 ++++++++--------
>  1 file changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/media/v4l2-core/v4l2-async.c b/drivers/media/v4l2-core/v4l2-async.c
> index d2ce39ac402e..55fa7106345c 100644
> --- a/drivers/media/v4l2-core/v4l2-async.c
> +++ b/drivers/media/v4l2-core/v4l2-async.c
> @@ -127,19 +127,19 @@ static int v4l2_async_test_notify(struct v4l2_async_notifier *notifier,
>  {
>  	int ret;
>  
> -	if (notifier->bound) {
> -		ret = notifier->bound(notifier, sd, asd);
> -		if (ret < 0)
> -			return ret;
> -	}
> -
>  	ret = v4l2_device_register_subdev(notifier->v4l2_dev, sd);
>  	if (ret < 0) {
> -		if (notifier->unbind)
> -			notifier->unbind(notifier, sd, asd);
>  		return ret;
>  	}
>  
> +	if (notifier->bound) {
> +		ret = notifier->bound(notifier, sd, asd);
> +		if (ret < 0) {
> +			v4l2_device_unregister_subdev(sd);
> +			return ret;
> +		}
> +	}
> +
>  	/* Remove from the waiting list */
>  	list_del(&asd->list);
>  	sd->asd = asd;
>
Sakari Ailus July 20, 2017, 4:09 p.m. UTC | #2
Hi Hans,

On Wed, Jul 19, 2017 at 01:24:54PM +0200, Hans Verkuil wrote:
> On 18/07/17 21:03, Sakari Ailus wrote:
> > The async notifier supports three callbacks to the notifier: bound, unbound
> > and complete. The complete callback has been traditionally used for
> > creating the sub-device nodes.
> > 
> > This approach has an inherent weakness: if registration of a single
> > sub-device fails for whatever reason, it renders the entire media device
> > unusable even if only that piece of hardware is not working. This is a
> > problem in particular in systems with multiple independent image pipelines
> > on a single device. We have had such devices (e.g. omap3isp) supported for
> > a number of years and the problem is growing more pressing as time passes
> > so there is an incentive to resolve this.
> 
> I don't think this is a good reason. If one of the subdevices fail, then your
> hardware is messed up and there is no point in continuing.

That's entirely untrue in general case.

If you have e.g. a mobile phone with a single camera, yes, you're right.
But most mobile phones have two cameras these days. Embedded systems may
have many, think of automotive use cases: you could have five or ten
cameras there.

It is not feasible to prevent the entire system from working if a single
component is at fault --- this is really any component such as a lens
controller.
Hans Verkuil July 20, 2017, 4:23 p.m. UTC | #3
On 20/07/17 18:09, Sakari Ailus wrote:
> Hi Hans,
> 
> On Wed, Jul 19, 2017 at 01:24:54PM +0200, Hans Verkuil wrote:
>> On 18/07/17 21:03, Sakari Ailus wrote:
>>> The async notifier supports three callbacks to the notifier: bound, unbound
>>> and complete. The complete callback has been traditionally used for
>>> creating the sub-device nodes.
>>>
>>> This approach has an inherent weakness: if registration of a single
>>> sub-device fails for whatever reason, it renders the entire media device
>>> unusable even if only that piece of hardware is not working. This is a
>>> problem in particular in systems with multiple independent image pipelines
>>> on a single device. We have had such devices (e.g. omap3isp) supported for
>>> a number of years and the problem is growing more pressing as time passes
>>> so there is an incentive to resolve this.
>>
>> I don't think this is a good reason. If one of the subdevices fail, then your
>> hardware is messed up and there is no point in continuing.
> 
> That's entirely untrue in general case.
> 
> If you have e.g. a mobile phone with a single camera, yes, you're right.
> But most mobile phones have two cameras these days. Embedded systems may
> have many, think of automotive use cases: you could have five or ten
> cameras there.

These are all very recent developments. Today userspace can safely assume
that either everything would be up and running, or nothing at all.

> It is not feasible to prevent the entire system from working if a single
> component is at fault --- this is really any component such as a lens
> controller.

All I am saying is that there should be a way to indicate that you accept
that parts are faulty, and that you (i.e. userspace) are able to detect
and handle that.

You can't just change the current behavior and expect existing applications
to work. E.g. says a sensor failed. Today the application might detect that
the video node didn't come up, so something is seriously wrong with the hardware
and it shows a message on the display. If this would change and the video node
*would* come up, even though there is no sensor the behavior of the application
would almost certainly change unexpectedly.

How to select which behavior you want isn't easy. The only thing I can come up
with is a module option. Not very elegant, unfortunately. But it doesn't
belong in the DT, and when userspace gets involved it is already too late.

Regards,

	Hans
Sakari Ailus July 20, 2017, 7:23 p.m. UTC | #4
Hi Hans,

On Thu, Jul 20, 2017 at 06:23:38PM +0200, Hans Verkuil wrote:
> On 20/07/17 18:09, Sakari Ailus wrote:
> > Hi Hans,
> > 
> > On Wed, Jul 19, 2017 at 01:24:54PM +0200, Hans Verkuil wrote:
> >> On 18/07/17 21:03, Sakari Ailus wrote:
> >>> The async notifier supports three callbacks to the notifier: bound, unbound
> >>> and complete. The complete callback has been traditionally used for
> >>> creating the sub-device nodes.
> >>>
> >>> This approach has an inherent weakness: if registration of a single
> >>> sub-device fails for whatever reason, it renders the entire media device
> >>> unusable even if only that piece of hardware is not working. This is a
> >>> problem in particular in systems with multiple independent image pipelines
> >>> on a single device. We have had such devices (e.g. omap3isp) supported for
> >>> a number of years and the problem is growing more pressing as time passes
> >>> so there is an incentive to resolve this.
> >>
> >> I don't think this is a good reason. If one of the subdevices fail, then your
> >> hardware is messed up and there is no point in continuing.
> > 
> > That's entirely untrue in general case.
> > 
> > If you have e.g. a mobile phone with a single camera, yes, you're right.
> > But most mobile phones have two cameras these days. Embedded systems may
> > have many, think of automotive use cases: you could have five or ten
> > cameras there.
> 
> These are all very recent developments. Today userspace can safely assume
> that either everything would be up and running, or nothing at all.
> 
> > It is not feasible to prevent the entire system from working if a single
> > component is at fault --- this is really any component such as a lens
> > controller.
> 
> All I am saying is that there should be a way to indicate that you accept
> that parts are faulty, and that you (i.e. userspace) are able to detect
> and handle that.
> 
> You can't just change the current behavior and expect existing applications
> to work. E.g. says a sensor failed. Today the application might detect that
> the video node didn't come up, so something is seriously wrong with the hardware
> and it shows a message on the display. If this would change and the video node
> *would* come up, even though there is no sensor the behavior of the application
> would almost certainly change unexpectedly.
> 
> How to select which behavior you want isn't easy. The only thing I can come up
> with is a module option. Not very elegant, unfortunately. But it doesn't
> belong in the DT, and when userspace gets involved it is already too late.

Module options don't scale if you want to change kernel interface
behaviour. Adding a Kconfig option would. We can neither make this
application specific since the application isn't known by the time the
nodes are created.

Kconfig option (that defaults to no) with the events and media device
status info amended with documentation change would achieve the goal,
although it'd take a lot of time to adjust all the applications before the
Kconfig option can be safely removed. This approach does have the benefit
of being able to provide the feature to those systems that really depend on
it.
Kieran Bingham July 20, 2017, 11:53 p.m. UTC | #5
Hi Hans,

On 20/07/17 17:23, Hans Verkuil wrote:
> On 20/07/17 18:09, Sakari Ailus wrote:
>> Hi Hans,
>>
>> On Wed, Jul 19, 2017 at 01:24:54PM +0200, Hans Verkuil wrote:
>>> On 18/07/17 21:03, Sakari Ailus wrote:
>>>> The async notifier supports three callbacks to the notifier: bound, unbound
>>>> and complete. The complete callback has been traditionally used for
>>>> creating the sub-device nodes.
>>>>
>>>> This approach has an inherent weakness: if registration of a single
>>>> sub-device fails for whatever reason, it renders the entire media device
>>>> unusable even if only that piece of hardware is not working. This is a
>>>> problem in particular in systems with multiple independent image pipelines
>>>> on a single device. We have had such devices (e.g. omap3isp) supported for
>>>> a number of years and the problem is growing more pressing as time passes
>>>> so there is an incentive to resolve this.
>>>
>>> I don't think this is a good reason. If one of the subdevices fail, then your
>>> hardware is messed up and there is no point in continuing.
>>
>> That's entirely untrue in general case.

Adding my 2 cents ... because I'm hitting this ... right now.

>>
>> If you have e.g. a mobile phone with a single camera, yes, you're right.
>> But most mobile phones have two cameras these days. Embedded systems may
>> have many, think of automotive use cases: you could have five or ten
>> cameras there.
I have a MAX9286 which has 4 camera subdevices connected.

Right now, if *ONE* fails - they all fail. - This is very much undesirable
behaviour.

In this instance, when one fails (perhaps I have not connected one camera) then
the remaining camera subdevices all probe successfully, but the complete
callback is never called. Therefore the rest of my pipeline is dead, - But that
could now mean my reversing camera is not working because my wing mirror camera
was 'knocked' off... :-(


> These are all very recent developments. Today userspace can safely assume
> that either everything would be up and running, or nothing at all.

This is a strong point, and I'm struggling to decide if I agree or not :D

There are so many use cases, it's hard to make one statement fit all.

For example - currently - an analogue input source might not be connected - but
userspace may not know that, and would instead capture a black screen.

Maybe that doesn't even match ... I'm tired ;)


>> It is not feasible to prevent the entire system from working if a single
>> component is at fault --- this is really any component such as a lens
>> controller.
> 
> All I am saying is that there should be a way to indicate that you accept
> that parts are faulty, and that you (i.e. userspace) are able to detect
> and handle that.
>
> You can't just change the current behavior and expect existing applications
> to work. E.g. says a sensor failed. Today the application might detect that
> the video node didn't come up, so something is seriously wrong with the hardware
> and it shows a message on the display. If this would change and the video node
> *would* come up, even though there is no sensor the behavior of the application
> would almost certainly change unexpectedly.
> 
> How to select which behavior you want isn't easy. The only thing I can come up
> with is a module option. Not very elegant, unfortunately. But it doesn't
> belong in the DT, and when userspace gets involved it is already too late.

Yes, this sounds nasty - but 3 out of 4 working cameras being disabled / not
coming up because one failed sounds worse to me currently (I would say that ...
my cameras didn't come up) ... :-(

<thinking cap on ... going to bed>

--
Kieran

> Regards,
> 
> 	Hans
>
Hans Verkuil July 21, 2017, 7:14 a.m. UTC | #6
On 20/07/17 21:23, Sakari Ailus wrote:
> Hi Hans,
> 
> On Thu, Jul 20, 2017 at 06:23:38PM +0200, Hans Verkuil wrote:
>> On 20/07/17 18:09, Sakari Ailus wrote:
>>> Hi Hans,
>>>
>>> On Wed, Jul 19, 2017 at 01:24:54PM +0200, Hans Verkuil wrote:
>>>> On 18/07/17 21:03, Sakari Ailus wrote:
>>>>> The async notifier supports three callbacks to the notifier: bound, unbound
>>>>> and complete. The complete callback has been traditionally used for
>>>>> creating the sub-device nodes.
>>>>>
>>>>> This approach has an inherent weakness: if registration of a single
>>>>> sub-device fails for whatever reason, it renders the entire media device
>>>>> unusable even if only that piece of hardware is not working. This is a
>>>>> problem in particular in systems with multiple independent image pipelines
>>>>> on a single device. We have had such devices (e.g. omap3isp) supported for
>>>>> a number of years and the problem is growing more pressing as time passes
>>>>> so there is an incentive to resolve this.
>>>>
>>>> I don't think this is a good reason. If one of the subdevices fail, then your
>>>> hardware is messed up and there is no point in continuing.
>>>
>>> That's entirely untrue in general case.
>>>
>>> If you have e.g. a mobile phone with a single camera, yes, you're right.
>>> But most mobile phones have two cameras these days. Embedded systems may
>>> have many, think of automotive use cases: you could have five or ten
>>> cameras there.
>>
>> These are all very recent developments. Today userspace can safely assume
>> that either everything would be up and running, or nothing at all.
>>
>>> It is not feasible to prevent the entire system from working if a single
>>> component is at fault --- this is really any component such as a lens
>>> controller.
>>
>> All I am saying is that there should be a way to indicate that you accept
>> that parts are faulty, and that you (i.e. userspace) are able to detect
>> and handle that.
>>
>> You can't just change the current behavior and expect existing applications
>> to work. E.g. says a sensor failed. Today the application might detect that
>> the video node didn't come up, so something is seriously wrong with the hardware
>> and it shows a message on the display. If this would change and the video node
>> *would* come up, even though there is no sensor the behavior of the application
>> would almost certainly change unexpectedly.
>>
>> How to select which behavior you want isn't easy. The only thing I can come up
>> with is a module option. Not very elegant, unfortunately. But it doesn't
>> belong in the DT, and when userspace gets involved it is already too late.
> 
> Module options don't scale if you want to change kernel interface
> behaviour. Adding a Kconfig option would. We can neither make this
> application specific since the application isn't known by the time the
> nodes are created.
> 
> Kconfig option (that defaults to no) with the events and media device
> status info amended with documentation change would achieve the goal,
> although it'd take a lot of time to adjust all the applications before the
> Kconfig option can be safely removed. This approach does have the benefit
> of being able to provide the feature to those systems that really depend on
> it.

That sounds like a good idea. I don't think you ever want to remove this
option in the future. Depending on the product I think there is often a very
good case to be made to just fail if one component failed.

You need specialized applications that can handle partially configured systems,
and having to explicitly enable this is a good way to force users to carefully
think about this.

Some notes:

I recommend to first finish this async rework. Adding support for this new
feature should be discussed a bit more before we start work on that.

Some discussion points:

1) What about adding time-out support? Today we wait forever until all components
   are found, but wouldn't it make sense to optionally time-out? And if so, then
   we can keep most of the code the same and decide in the complete() callback
   whether or not we accept missing components. And decide how badly 'impaired'
   the system is. We can also still bring up all the devices in the complete rather
   than one-by-one as you proposed (and which I am not sure I like).

2) This can be hard to test, so perhaps we should support some form of error
   injection to easily test what happens if something doesn't come up.

Regards,

	Hans
diff mbox

Patch

diff --git a/drivers/media/v4l2-core/v4l2-async.c b/drivers/media/v4l2-core/v4l2-async.c
index d2ce39ac402e..55fa7106345c 100644
--- a/drivers/media/v4l2-core/v4l2-async.c
+++ b/drivers/media/v4l2-core/v4l2-async.c
@@ -127,19 +127,19 @@  static int v4l2_async_test_notify(struct v4l2_async_notifier *notifier,
 {
 	int ret;
 
-	if (notifier->bound) {
-		ret = notifier->bound(notifier, sd, asd);
-		if (ret < 0)
-			return ret;
-	}
-
 	ret = v4l2_device_register_subdev(notifier->v4l2_dev, sd);
 	if (ret < 0) {
-		if (notifier->unbind)
-			notifier->unbind(notifier, sd, asd);
 		return ret;
 	}
 
+	if (notifier->bound) {
+		ret = notifier->bound(notifier, sd, asd);
+		if (ret < 0) {
+			v4l2_device_unregister_subdev(sd);
+			return ret;
+		}
+	}
+
 	/* Remove from the waiting list */
 	list_del(&asd->list);
 	sd->asd = asd;