[v5,15/39,media] v4l2: add a frame interval error event

Message ID	1489121599-23206-16-git-send-email-steve_longerbeam@mentor.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org> From: Steve Longerbeam <slongerbeam@gmail.com> To: robh+dt@kernel.org, mark.rutland@arm.com, shawnguo@kernel.org, kernel@pengutronix.de, fabio.estevam@nxp.com, linux@armlinux.org.uk, mchehab@kernel.org, hverkuil@xs4all.nl, nick@shmanahar.org, markus.heiser@darmarIT.de, p.zabel@pengutronix.de, laurent.pinchart+renesas@ideasonboard.com, bparrot@ti.com, geert@linux-m68k.org, arnd@arndb.de, sudipm.mukherjee@gmail.com, minghsiu.tsai@mediatek.com, tiffany.lin@mediatek.com, jean-christophe.trotin@st.com, horms+renesas@verge.net.au, niklas.soderlund+renesas@ragnatech.se, robert.jarzmik@free.fr, songjun.wu@microchip.com, andrew-ct.chen@mediatek.com, gregkh@linuxfoundation.org, shuah@kernel.org, sakari.ailus@linux.intel.com, pavel@ucw.cz Subject: [PATCH v5 15/39] [media] v4l2: add a frame interval error event Date: Thu, 9 Mar 2017 20:52:55 -0800 Message-Id: <1489121599-23206-16-git-send-email-steve_longerbeam@mentor.com> In-Reply-To: <1489121599-23206-1-git-send-email-steve_longerbeam@mentor.com> References: <1489121599-23206-1-git-send-email-steve_longerbeam@mentor.com> Precedence: list Cc: devel@driverdev.osuosl.org, devicetree@vger.kernel.org, Steve Longerbeam <steve_longerbeam@mentor.com>, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-media@vger.kernel.org MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-arm-kernel" <linux-arm-kernel-bounces@lists.infradead.org> Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org

Steve Longerbeam March 10, 2017, 4:52 a.m. UTC

Add a new FRAME_INTERVAL_ERROR event to signal that a video capture or
output device has measured an interval between the reception or transmit
completion of two consecutive frames of video that is outside the nominal
frame interval by some tolerance value.

Signed-off-by: Steve Longerbeam <steve_longerbeam@mentor.com>
---
 Documentation/media/uapi/v4l/vidioc-dqevent.rst | 6 ++++++
 Documentation/media/videodev2.h.rst.exceptions  | 1 +
 include/uapi/linux/videodev2.h                  | 1 +
 3 files changed, 8 insertions(+)

Hans Verkuil March 10, 2017, 12:03 p.m. UTC | #1

On 10/03/17 05:52, Steve Longerbeam wrote:
> Add a new FRAME_INTERVAL_ERROR event to signal that a video capture or
> output device has measured an interval between the reception or transmit
> completion of two consecutive frames of video that is outside the nominal
> frame interval by some tolerance value.

Reading back what was said on this I agree with Sakari that this doesn't
belong here.

Userspace can detect this just as easily (if not easier) with a timeout.

Regards,

	Hans

> 
> Signed-off-by: Steve Longerbeam <steve_longerbeam@mentor.com>
> ---
>  Documentation/media/uapi/v4l/vidioc-dqevent.rst | 6 ++++++
>  Documentation/media/videodev2.h.rst.exceptions  | 1 +
>  include/uapi/linux/videodev2.h                  | 1 +
>  3 files changed, 8 insertions(+)
> 
> diff --git a/Documentation/media/uapi/v4l/vidioc-dqevent.rst b/Documentation/media/uapi/v4l/vidioc-dqevent.rst
> index 8d663a7..dc77363 100644
> --- a/Documentation/media/uapi/v4l/vidioc-dqevent.rst
> +++ b/Documentation/media/uapi/v4l/vidioc-dqevent.rst
> @@ -197,6 +197,12 @@ call.
>  	the regions changes. This event has a struct
>  	:c:type:`v4l2_event_motion_det`
>  	associated with it.
> +    * - ``V4L2_EVENT_FRAME_INTERVAL_ERROR``
> +      - 7
> +      - This event is triggered when the video capture or output device
> +	has measured an interval between the reception or transmit
> +	completion of two consecutive frames of video that is outside
> +	the nominal frame interval by some tolerance value.
>      * - ``V4L2_EVENT_PRIVATE_START``
>        - 0x08000000
>        - Base event number for driver-private events.
> diff --git a/Documentation/media/videodev2.h.rst.exceptions b/Documentation/media/videodev2.h.rst.exceptions
> index e11a0d0..c7d8fad 100644
> --- a/Documentation/media/videodev2.h.rst.exceptions
> +++ b/Documentation/media/videodev2.h.rst.exceptions
> @@ -459,6 +459,7 @@ replace define V4L2_EVENT_CTRL event-type
>  replace define V4L2_EVENT_FRAME_SYNC event-type
>  replace define V4L2_EVENT_SOURCE_CHANGE event-type
>  replace define V4L2_EVENT_MOTION_DET event-type
> +replace define V4L2_EVENT_FRAME_INTERVAL_ERROR event-type
>  replace define V4L2_EVENT_PRIVATE_START event-type
>  
>  replace define V4L2_EVENT_CTRL_CH_VALUE ctrl-changes-flags
> diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
> index 45184a2..cf5a0d0 100644
> --- a/include/uapi/linux/videodev2.h
> +++ b/include/uapi/linux/videodev2.h
> @@ -2131,6 +2131,7 @@ struct v4l2_streamparm {
>  #define V4L2_EVENT_FRAME_SYNC			4
>  #define V4L2_EVENT_SOURCE_CHANGE		5
>  #define V4L2_EVENT_MOTION_DET			6
> +#define V4L2_EVENT_FRAME_INTERVAL_ERROR		7
>  #define V4L2_EVENT_PRIVATE_START		0x08000000
>  
>  /* Payload for V4L2_EVENT_VSYNC */
>

Steve Longerbeam March 10, 2017, 6:37 p.m. UTC | #2

Hi Hans,

On 03/10/2017 04:03 AM, Hans Verkuil wrote:
> On 10/03/17 05:52, Steve Longerbeam wrote:
>> Add a new FRAME_INTERVAL_ERROR event to signal that a video capture or
>> output device has measured an interval between the reception or transmit
>> completion of two consecutive frames of video that is outside the nominal
>> frame interval by some tolerance value.
>
> Reading back what was said on this I agree with Sakari that this doesn't
> belong here.
>
> Userspace can detect this just as easily (if not easier) with a timeout.
>

Unfortunately measuring frame intervals from userland is not accurate
enough for i.MX6.

The issue here is that the IPUv3, specifically the CSI unit, can
permanently lose vertical sync if there are truncated frames sent
on the bt.656 bus. We have seen a single missing line of video cause
loss of vertical sync. The only way to correct this is to shutdown
the IPU capture hardware and restart, which can be accomplished
simply by restarting streaming from userland.

There are no other indicators from the sensor about these short
frame events (believe me, we've exhausted all avenues with the ADV718x).
And the IPUv3 DMA engine has no status indicators for short frames
either. So the only way to detect them is by measuring frame intervals.

The intervals have to be able to resolve a single line of missing video.
With a PAL video source that requires better than 58 usec accuracy.

There is too much uncertainty to resolve this at user level. The
driver is able to resolve this by measuring intervals between hardware
interrupts as long as interrupt latency is reasonably low, and we
have another method using the i.MX6 hardware input capture support
that can measure these intervals very accurately with no errors
introduced by interrupt latency.

I made this event a private event to imx-media driver in a previous
iteration, so I can return it to a private event, but this can't be
done at user level.

Steve

>
>>
>> Signed-off-by: Steve Longerbeam <steve_longerbeam@mentor.com>
>> ---
>>  Documentation/media/uapi/v4l/vidioc-dqevent.rst | 6 ++++++
>>  Documentation/media/videodev2.h.rst.exceptions  | 1 +
>>  include/uapi/linux/videodev2.h                  | 1 +
>>  3 files changed, 8 insertions(+)
>>
>> diff --git a/Documentation/media/uapi/v4l/vidioc-dqevent.rst b/Documentation/media/uapi/v4l/vidioc-dqevent.rst
>> index 8d663a7..dc77363 100644
>> --- a/Documentation/media/uapi/v4l/vidioc-dqevent.rst
>> +++ b/Documentation/media/uapi/v4l/vidioc-dqevent.rst
>> @@ -197,6 +197,12 @@ call.
>>  	the regions changes. This event has a struct
>>  	:c:type:`v4l2_event_motion_det`
>>  	associated with it.
>> +    * - ``V4L2_EVENT_FRAME_INTERVAL_ERROR``
>> +      - 7
>> +      - This event is triggered when the video capture or output device
>> +	has measured an interval between the reception or transmit
>> +	completion of two consecutive frames of video that is outside
>> +	the nominal frame interval by some tolerance value.
>>      * - ``V4L2_EVENT_PRIVATE_START``
>>        - 0x08000000
>>        - Base event number for driver-private events.
>> diff --git a/Documentation/media/videodev2.h.rst.exceptions b/Documentation/media/videodev2.h.rst.exceptions
>> index e11a0d0..c7d8fad 100644
>> --- a/Documentation/media/videodev2.h.rst.exceptions
>> +++ b/Documentation/media/videodev2.h.rst.exceptions
>> @@ -459,6 +459,7 @@ replace define V4L2_EVENT_CTRL event-type
>>  replace define V4L2_EVENT_FRAME_SYNC event-type
>>  replace define V4L2_EVENT_SOURCE_CHANGE event-type
>>  replace define V4L2_EVENT_MOTION_DET event-type
>> +replace define V4L2_EVENT_FRAME_INTERVAL_ERROR event-type
>>  replace define V4L2_EVENT_PRIVATE_START event-type
>>
>>  replace define V4L2_EVENT_CTRL_CH_VALUE ctrl-changes-flags
>> diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
>> index 45184a2..cf5a0d0 100644
>> --- a/include/uapi/linux/videodev2.h
>> +++ b/include/uapi/linux/videodev2.h
>> @@ -2131,6 +2131,7 @@ struct v4l2_streamparm {
>>  #define V4L2_EVENT_FRAME_SYNC			4
>>  #define V4L2_EVENT_SOURCE_CHANGE		5
>>  #define V4L2_EVENT_MOTION_DET			6
>> +#define V4L2_EVENT_FRAME_INTERVAL_ERROR		7
>>  #define V4L2_EVENT_PRIVATE_START		0x08000000
>>
>>  /* Payload for V4L2_EVENT_VSYNC */
>>
>

Pavel Machek March 10, 2017, 11:30 p.m. UTC | #3

On Fri 2017-03-10 10:37:21, Steve Longerbeam wrote:
> Hi Hans,
> 
> On 03/10/2017 04:03 AM, Hans Verkuil wrote:
> >On 10/03/17 05:52, Steve Longerbeam wrote:
> >>Add a new FRAME_INTERVAL_ERROR event to signal that a video capture or
> >>output device has measured an interval between the reception or transmit
> >>completion of two consecutive frames of video that is outside the nominal
> >>frame interval by some tolerance value.
> >
> >Reading back what was said on this I agree with Sakari that this doesn't
> >belong here.
> >
> >Userspace can detect this just as easily (if not easier) with a timeout.
> >
> 
> 
> Unfortunately measuring frame intervals from userland is not accurate
> enough for i.MX6.
> 
> The issue here is that the IPUv3, specifically the CSI unit, can
> permanently lose vertical sync if there are truncated frames sent
> on the bt.656 bus. We have seen a single missing line of video cause
> loss of vertical sync. The only way to correct this is to shutdown
> the IPU capture hardware and restart, which can be accomplished
> simply by restarting streaming from userland.
> 
> There are no other indicators from the sensor about these short
> frame events (believe me, we've exhausted all avenues with the ADV718x).
> And the IPUv3 DMA engine has no status indicators for short frames
> either. So the only way to detect them is by measuring frame intervals.
> 
> The intervals have to be able to resolve a single line of missing video.
> With a PAL video source that requires better than 58 usec accuracy.
> 
> There is too much uncertainty to resolve this at user level. The
> driver is able to resolve this by measuring intervals between hardware
> interrupts as long as interrupt latency is reasonably low, and we
> have another method using the i.MX6 hardware input capture support
> that can measure these intervals very accurately with no errors
> introduced by interrupt latency.

Requiring < 58 usec interrupt latency for correct operation is a
little too optimistic, no?
									Pavel

Steve Longerbeam March 10, 2017, 11:42 p.m. UTC | #4

On 03/10/2017 03:30 PM, Pavel Machek wrote:
> On Fri 2017-03-10 10:37:21, Steve Longerbeam wrote:
>> Hi Hans,
>>
>> On 03/10/2017 04:03 AM, Hans Verkuil wrote:
>>> On 10/03/17 05:52, Steve Longerbeam wrote:
>>>> Add a new FRAME_INTERVAL_ERROR event to signal that a video capture or
>>>> output device has measured an interval between the reception or transmit
>>>> completion of two consecutive frames of video that is outside the nominal
>>>> frame interval by some tolerance value.
>>>
>>> Reading back what was said on this I agree with Sakari that this doesn't
>>> belong here.
>>>
>>> Userspace can detect this just as easily (if not easier) with a timeout.
>>>
>>
>>
>> Unfortunately measuring frame intervals from userland is not accurate
>> enough for i.MX6.
>>
>> The issue here is that the IPUv3, specifically the CSI unit, can
>> permanently lose vertical sync if there are truncated frames sent
>> on the bt.656 bus. We have seen a single missing line of video cause
>> loss of vertical sync. The only way to correct this is to shutdown
>> the IPU capture hardware and restart, which can be accomplished
>> simply by restarting streaming from userland.
>>
>> There are no other indicators from the sensor about these short
>> frame events (believe me, we've exhausted all avenues with the ADV718x).
>> And the IPUv3 DMA engine has no status indicators for short frames
>> either. So the only way to detect them is by measuring frame intervals.
>>
>> The intervals have to be able to resolve a single line of missing video.
>> With a PAL video source that requires better than 58 usec accuracy.
>>
>> There is too much uncertainty to resolve this at user level. The
>> driver is able to resolve this by measuring intervals between hardware
>> interrupts as long as interrupt latency is reasonably low, and we
>> have another method using the i.MX6 hardware input capture support
>> that can measure these intervals very accurately with no errors
>> introduced by interrupt latency.
>
> Requiring < 58 usec interrupt latency for correct operation is a
> little too optimistic, no?


No it's not too optimistic, from experience the imx6 kernel has irq
latency less than 10 usec under normal system load. False events can be
generated if the latency gets bad, it's true, and that's why there is
the imx6 timer input capture approach.

Steve

Hans Verkuil March 11, 2017, 11:39 a.m. UTC | #5

On 10/03/17 19:37, Steve Longerbeam wrote:
> Hi Hans,
> 
> On 03/10/2017 04:03 AM, Hans Verkuil wrote:
>> On 10/03/17 05:52, Steve Longerbeam wrote:
>>> Add a new FRAME_INTERVAL_ERROR event to signal that a video capture or
>>> output device has measured an interval between the reception or transmit
>>> completion of two consecutive frames of video that is outside the nominal
>>> frame interval by some tolerance value.
>>
>> Reading back what was said on this I agree with Sakari that this doesn't
>> belong here.
>>
>> Userspace can detect this just as easily (if not easier) with a timeout.
>>
> 
> 
> Unfortunately measuring frame intervals from userland is not accurate
> enough for i.MX6.
> 
> The issue here is that the IPUv3, specifically the CSI unit, can
> permanently lose vertical sync if there are truncated frames sent
> on the bt.656 bus. We have seen a single missing line of video cause
> loss of vertical sync. The only way to correct this is to shutdown
> the IPU capture hardware and restart, which can be accomplished
> simply by restarting streaming from userland.
> 
> There are no other indicators from the sensor about these short
> frame events (believe me, we've exhausted all avenues with the ADV718x).
> And the IPUv3 DMA engine has no status indicators for short frames
> either. So the only way to detect them is by measuring frame intervals.
> 
> The intervals have to be able to resolve a single line of missing video.
> With a PAL video source that requires better than 58 usec accuracy.
> 
> There is too much uncertainty to resolve this at user level. The
> driver is able to resolve this by measuring intervals between hardware
> interrupts as long as interrupt latency is reasonably low, and we
> have another method using the i.MX6 hardware input capture support
> that can measure these intervals very accurately with no errors
> introduced by interrupt latency.
> 
> I made this event a private event to imx-media driver in a previous
> iteration, so I can return it to a private event, but this can't be
> done at user level.

It's fine to use an internal event as long as the end-user doesn't
see it. But if you lose vsyncs, then you never capture another frame,
right? So userspace can detect that (i.e. no new frames arrive) and
it can timeout on that. Or you detect it in the driver and restart there,
or call vb2_queue_error().

Anything really as long as this event isn't user-visible :-)

Regards,

	Hans

> 
> Steve
> 
> 
>>
>>>
>>> Signed-off-by: Steve Longerbeam <steve_longerbeam@mentor.com>
>>> ---
>>>  Documentation/media/uapi/v4l/vidioc-dqevent.rst | 6 ++++++
>>>  Documentation/media/videodev2.h.rst.exceptions  | 1 +
>>>  include/uapi/linux/videodev2.h                  | 1 +
>>>  3 files changed, 8 insertions(+)
>>>
>>> diff --git a/Documentation/media/uapi/v4l/vidioc-dqevent.rst b/Documentation/media/uapi/v4l/vidioc-dqevent.rst
>>> index 8d663a7..dc77363 100644
>>> --- a/Documentation/media/uapi/v4l/vidioc-dqevent.rst
>>> +++ b/Documentation/media/uapi/v4l/vidioc-dqevent.rst
>>> @@ -197,6 +197,12 @@ call.
>>>      the regions changes. This event has a struct
>>>      :c:type:`v4l2_event_motion_det`
>>>      associated with it.
>>> +    * - ``V4L2_EVENT_FRAME_INTERVAL_ERROR``
>>> +      - 7
>>> +      - This event is triggered when the video capture or output device
>>> +    has measured an interval between the reception or transmit
>>> +    completion of two consecutive frames of video that is outside
>>> +    the nominal frame interval by some tolerance value.
>>>      * - ``V4L2_EVENT_PRIVATE_START``
>>>        - 0x08000000
>>>        - Base event number for driver-private events.
>>> diff --git a/Documentation/media/videodev2.h.rst.exceptions b/Documentation/media/videodev2.h.rst.exceptions
>>> index e11a0d0..c7d8fad 100644
>>> --- a/Documentation/media/videodev2.h.rst.exceptions
>>> +++ b/Documentation/media/videodev2.h.rst.exceptions
>>> @@ -459,6 +459,7 @@ replace define V4L2_EVENT_CTRL event-type
>>>  replace define V4L2_EVENT_FRAME_SYNC event-type
>>>  replace define V4L2_EVENT_SOURCE_CHANGE event-type
>>>  replace define V4L2_EVENT_MOTION_DET event-type
>>> +replace define V4L2_EVENT_FRAME_INTERVAL_ERROR event-type
>>>  replace define V4L2_EVENT_PRIVATE_START event-type
>>>
>>>  replace define V4L2_EVENT_CTRL_CH_VALUE ctrl-changes-flags
>>> diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
>>> index 45184a2..cf5a0d0 100644
>>> --- a/include/uapi/linux/videodev2.h
>>> +++ b/include/uapi/linux/videodev2.h
>>> @@ -2131,6 +2131,7 @@ struct v4l2_streamparm {
>>>  #define V4L2_EVENT_FRAME_SYNC            4
>>>  #define V4L2_EVENT_SOURCE_CHANGE        5
>>>  #define V4L2_EVENT_MOTION_DET            6
>>> +#define V4L2_EVENT_FRAME_INTERVAL_ERROR        7
>>>  #define V4L2_EVENT_PRIVATE_START        0x08000000
>>>
>>>  /* Payload for V4L2_EVENT_VSYNC */
>>>
>>
>

Steve Longerbeam March 11, 2017, 6:14 p.m. UTC | #6

On 03/11/2017 03:39 AM, Hans Verkuil wrote:
> On 10/03/17 19:37, Steve Longerbeam wrote:
>> Hi Hans,
>>
>> On 03/10/2017 04:03 AM, Hans Verkuil wrote:
>>> On 10/03/17 05:52, Steve Longerbeam wrote:
>>>> Add a new FRAME_INTERVAL_ERROR event to signal that a video capture or
>>>> output device has measured an interval between the reception or transmit
>>>> completion of two consecutive frames of video that is outside the nominal
>>>> frame interval by some tolerance value.
>>>
>>> Reading back what was said on this I agree with Sakari that this doesn't
>>> belong here.
>>>
>>> Userspace can detect this just as easily (if not easier) with a timeout.
>>>
>>
>>
>> Unfortunately measuring frame intervals from userland is not accurate
>> enough for i.MX6.
>>
>> The issue here is that the IPUv3, specifically the CSI unit, can
>> permanently lose vertical sync if there are truncated frames sent
>> on the bt.656 bus. We have seen a single missing line of video cause
>> loss of vertical sync. The only way to correct this is to shutdown
>> the IPU capture hardware and restart, which can be accomplished
>> simply by restarting streaming from userland.
>>
>> There are no other indicators from the sensor about these short
>> frame events (believe me, we've exhausted all avenues with the ADV718x).
>> And the IPUv3 DMA engine has no status indicators for short frames
>> either. So the only way to detect them is by measuring frame intervals.
>>
>> The intervals have to be able to resolve a single line of missing video.
>> With a PAL video source that requires better than 58 usec accuracy.
>>
>> There is too much uncertainty to resolve this at user level. The
>> driver is able to resolve this by measuring intervals between hardware
>> interrupts as long as interrupt latency is reasonably low, and we
>> have another method using the i.MX6 hardware input capture support
>> that can measure these intervals very accurately with no errors
>> introduced by interrupt latency.
>>
>> I made this event a private event to imx-media driver in a previous
>> iteration, so I can return it to a private event, but this can't be
>> done at user level.
>
> It's fine to use an internal event as long as the end-user doesn't
> see it. But if you lose vsyncs, then you never capture another frame,
> right?

No, that's not correct. By loss of vertical sync, I mean the IPU
captures portions of two different frames, resulting in a permanent
"split image", with one frame containing portions of two consecutive
images. Or, the video rolls continuously, if you remember the old CRT
television sets of yore, it's the same rolling effect.


> So userspace can detect that (i.e. no new frames arrive) and
> it can timeout on that. Or you detect it in the driver and restart there,
> or call vb2_queue_error().
>

There is no timeout, the frames keep coming, but they are split images
or rolling.

> Anything really as long as this event isn't user-visible :-)

The event must be user visible, otherwise the user has no indication
the error, and can't correct it by stream restart.

Steve

Russell King (Oracle) March 11, 2017, 6:51 p.m. UTC | #7

On Sat, Mar 11, 2017 at 10:14:49AM -0800, Steve Longerbeam wrote:
> On 03/11/2017 03:39 AM, Hans Verkuil wrote:
> >It's fine to use an internal event as long as the end-user doesn't
> >see it. But if you lose vsyncs, then you never capture another frame,
> >right?
> 
> No, that's not correct. By loss of vertical sync, I mean the IPU
> captures portions of two different frames, resulting in a permanent
> "split image", with one frame containing portions of two consecutive
> images. Or, the video rolls continuously, if you remember the old CRT
> television sets of yore, it's the same rolling effect.

I have seen that rolling effect, but the iMX6 regains correct sync
within one complete "roll" just fine here with IMX219.  However, it
has always recovered.

So, I don't think there's a problem with the iMX6 part of the
processing, and so I don't think we should cripple the iMX6 capture
drivers for this problem.

It seems to me that the problem is with the source.

Steve Longerbeam March 11, 2017, 6:58 p.m. UTC | #8

On 03/11/2017 10:51 AM, Russell King - ARM Linux wrote:
> On Sat, Mar 11, 2017 at 10:14:49AM -0800, Steve Longerbeam wrote:
>> On 03/11/2017 03:39 AM, Hans Verkuil wrote:
>>> It's fine to use an internal event as long as the end-user doesn't
>>> see it. But if you lose vsyncs, then you never capture another frame,
>>> right?
>>
>> No, that's not correct. By loss of vertical sync, I mean the IPU
>> captures portions of two different frames, resulting in a permanent
>> "split image", with one frame containing portions of two consecutive
>> images. Or, the video rolls continuously, if you remember the old CRT
>> television sets of yore, it's the same rolling effect.
>
> I have seen that rolling effect, but the iMX6 regains correct sync
> within one complete "roll" just fine here with IMX219.  However, it
> has always recovered.

I've seen permanent split images, and rolling that continues for
minutes.


>
> So, I don't think there's a problem with the iMX6 part of the
> processing, and so I don't think we should cripple the iMX6 capture
> drivers for this problem.
>
> It seems to me that the problem is with the source.

The the problem is almost surely in the CSI. It's handling of the
bt.656 sync codes is broken in some way.

Steve

Steve Longerbeam March 11, 2017, 7 p.m. UTC | #9

On 03/11/2017 10:51 AM, Russell King - ARM Linux wrote:
> On Sat, Mar 11, 2017 at 10:14:49AM -0800, Steve Longerbeam wrote:
>> On 03/11/2017 03:39 AM, Hans Verkuil wrote:
>>> It's fine to use an internal event as long as the end-user doesn't
>>> see it. But if you lose vsyncs, then you never capture another frame,
>>> right?
>>
>> No, that's not correct. By loss of vertical sync, I mean the IPU
>> captures portions of two different frames, resulting in a permanent
>> "split image", with one frame containing portions of two consecutive
>> images. Or, the video rolls continuously, if you remember the old CRT
>> television sets of yore, it's the same rolling effect.
>
> I have seen that rolling effect, but the iMX6 regains correct sync
> within one complete "roll" just fine here with IMX219.  However, it
> has always recovered.
>
> So, I don't think there's a problem with the iMX6 part of the
> processing, and so I don't think we should cripple the iMX6 capture
> drivers for this problem.

And there is no crippling going on. Measuring the frame intervals
adds very little overhead.

Steve

Hans Verkuil March 13, 2017, 10:02 a.m. UTC | #10

On 03/11/2017 07:14 PM, Steve Longerbeam wrote:
> 
> 
> On 03/11/2017 03:39 AM, Hans Verkuil wrote:
>> On 10/03/17 19:37, Steve Longerbeam wrote:
>>> Hi Hans,
>>>
>>> On 03/10/2017 04:03 AM, Hans Verkuil wrote:
>>>> On 10/03/17 05:52, Steve Longerbeam wrote:
>>>>> Add a new FRAME_INTERVAL_ERROR event to signal that a video capture or
>>>>> output device has measured an interval between the reception or transmit
>>>>> completion of two consecutive frames of video that is outside the nominal
>>>>> frame interval by some tolerance value.
>>>>
>>>> Reading back what was said on this I agree with Sakari that this doesn't
>>>> belong here.
>>>>
>>>> Userspace can detect this just as easily (if not easier) with a timeout.
>>>>
>>>
>>>
>>> Unfortunately measuring frame intervals from userland is not accurate
>>> enough for i.MX6.
>>>
>>> The issue here is that the IPUv3, specifically the CSI unit, can
>>> permanently lose vertical sync if there are truncated frames sent
>>> on the bt.656 bus. We have seen a single missing line of video cause
>>> loss of vertical sync. The only way to correct this is to shutdown
>>> the IPU capture hardware and restart, which can be accomplished
>>> simply by restarting streaming from userland.
>>>
>>> There are no other indicators from the sensor about these short
>>> frame events (believe me, we've exhausted all avenues with the ADV718x).
>>> And the IPUv3 DMA engine has no status indicators for short frames
>>> either. So the only way to detect them is by measuring frame intervals.
>>>
>>> The intervals have to be able to resolve a single line of missing video.
>>> With a PAL video source that requires better than 58 usec accuracy.
>>>
>>> There is too much uncertainty to resolve this at user level. The
>>> driver is able to resolve this by measuring intervals between hardware
>>> interrupts as long as interrupt latency is reasonably low, and we
>>> have another method using the i.MX6 hardware input capture support
>>> that can measure these intervals very accurately with no errors
>>> introduced by interrupt latency.
>>>
>>> I made this event a private event to imx-media driver in a previous
>>> iteration, so I can return it to a private event, but this can't be
>>> done at user level.
>>
>> It's fine to use an internal event as long as the end-user doesn't
>> see it. But if you lose vsyncs, then you never capture another frame,
>> right?
> 
> No, that's not correct. By loss of vertical sync, I mean the IPU
> captures portions of two different frames, resulting in a permanent
> "split image", with one frame containing portions of two consecutive
> images. Or, the video rolls continuously, if you remember the old CRT
> television sets of yore, it's the same rolling effect.
> 
> 
>> So userspace can detect that (i.e. no new frames arrive) and
>> it can timeout on that. Or you detect it in the driver and restart there,
>> or call vb2_queue_error().
>>
> 
> There is no timeout, the frames keep coming, but they are split images
> or rolling.

Ah, OK. That wasn't clear to me from the description.

> 
>> Anything really as long as this event isn't user-visible :-)
> 
> The event must be user visible, otherwise the user has no indication
> the error, and can't correct it by stream restart.

In that case the driver can detect this and call vb2_queue_error. It's
what it is there for.

The event doesn't help you since only this driver has this issue. So nobody
will watch this event, unless it is sw specifically written for this SoC.

Much better to call vb2_queue_error to signal a fatal error (which this
apparently is) since there are more drivers that do this, and vivid supports
triggering this condition as well.

Regards,

	Hans

Russell King (Oracle) March 13, 2017, 10:45 a.m. UTC | #11

On Mon, Mar 13, 2017 at 11:02:34AM +0100, Hans Verkuil wrote:
> On 03/11/2017 07:14 PM, Steve Longerbeam wrote:
> > The event must be user visible, otherwise the user has no indication
> > the error, and can't correct it by stream restart.
> 
> In that case the driver can detect this and call vb2_queue_error. It's
> what it is there for.
> 
> The event doesn't help you since only this driver has this issue. So nobody
> will watch this event, unless it is sw specifically written for this SoC.
> 
> Much better to call vb2_queue_error to signal a fatal error (which this
> apparently is) since there are more drivers that do this, and vivid supports
> triggering this condition as well.

So today, I can fiddle around with the IMX219 registers to help gain
an understanding of how this sensor works.  Several of the registers
(such as the PLL setup [*]) require me to disable streaming on the
sensor while changing them.

This is something I've done many times while testing various ideas,
and is my primary way of figuring out and testing such things.

Whenever I resume streaming (provided I've let the sensor stop
streaming at a frame boundary) it resumes as if nothing happened.  If I
stop the sensor mid-frame, then I get the rolling issue that Steve
reports, but once the top of the frame becomes aligned with the top of
the capture, everything then becomes stable again as if nothing happened.

The side effect of what you're proposing is that when I disable streaming
at the sensor by poking at its registers, rather than the capture just
stopping, an error is going to be delivered to gstreamer, and gstreamer
is going to exit, taking the entire capture process down.

This severely restricts the ability to be able to develop and test
sensor drivers.

So, I strongly disagree with you.

Loss of capture frames is not necessarily a fatal error - as I have been
saying repeatedly.  In Steve's case, there's some unknown interaction
between the source and iMX6 hardware that is causing the instability,
but that is simply not true of other sources, and I oppose any idea that
we should cripple the iMX6 side of the capture based upon just one
hardware combination where this is a problem.

Steve suggested that the problem could be in the iMX6 CSI block - and I
note comparing Steve's code with the code in FSL's repository that there
are some changes that are missing in Steve's code to do with the CCIR656
sync code setup, particularly for >8 bit.  The progressive CCIR656 8-bit
setup looks pretty similar though - but I think what needs to be asked
is whether the same problem is visible using the FSL/NXP vendor kernel.

* - the PLL setup is something that requires research at the moment.
Sony's official position (even to their customers) is that they do not
supply the necessary information, instead they expect customers to tell
them the capture settings they want, and Sony will throw the values into
a spreadsheet, and they'll supply the register settings back to the
customer.  Hence, the only way to proceed with a generic driver for
this sensor is to experiment, and experimenting requires the ability to
pause the stream at the sensor while making changes.  Take this away,
and we're stuck with the tables-of-register-settings-for-set-of-fixed-
capture-settings approach.  I've made a lot of progress away from this
which is all down to the flexibility afforded by _not_ killing the
capture process.

Hans Verkuil March 13, 2017, 10:53 a.m. UTC | #12

On 03/13/2017 11:45 AM, Russell King - ARM Linux wrote:
> On Mon, Mar 13, 2017 at 11:02:34AM +0100, Hans Verkuil wrote:
>> On 03/11/2017 07:14 PM, Steve Longerbeam wrote:
>>> The event must be user visible, otherwise the user has no indication
>>> the error, and can't correct it by stream restart.
>>
>> In that case the driver can detect this and call vb2_queue_error. It's
>> what it is there for.
>>
>> The event doesn't help you since only this driver has this issue. So nobody
>> will watch this event, unless it is sw specifically written for this SoC.
>>
>> Much better to call vb2_queue_error to signal a fatal error (which this
>> apparently is) since there are more drivers that do this, and vivid supports
>> triggering this condition as well.
> 
> So today, I can fiddle around with the IMX219 registers to help gain
> an understanding of how this sensor works.  Several of the registers
> (such as the PLL setup [*]) require me to disable streaming on the
> sensor while changing them.
> 
> This is something I've done many times while testing various ideas,
> and is my primary way of figuring out and testing such things.
> 
> Whenever I resume streaming (provided I've let the sensor stop
> streaming at a frame boundary) it resumes as if nothing happened.  If I
> stop the sensor mid-frame, then I get the rolling issue that Steve
> reports, but once the top of the frame becomes aligned with the top of
> the capture, everything then becomes stable again as if nothing happened.
> 
> The side effect of what you're proposing is that when I disable streaming
> at the sensor by poking at its registers, rather than the capture just
> stopping, an error is going to be delivered to gstreamer, and gstreamer
> is going to exit, taking the entire capture process down.
> 
> This severely restricts the ability to be able to develop and test
> sensor drivers.
> 
> So, I strongly disagree with you.
> 
> Loss of capture frames is not necessarily a fatal error - as I have been
> saying repeatedly.  In Steve's case, there's some unknown interaction
> between the source and iMX6 hardware that is causing the instability,
> but that is simply not true of other sources, and I oppose any idea that
> we should cripple the iMX6 side of the capture based upon just one
> hardware combination where this is a problem.
> 
> Steve suggested that the problem could be in the iMX6 CSI block - and I
> note comparing Steve's code with the code in FSL's repository that there
> are some changes that are missing in Steve's code to do with the CCIR656
> sync code setup, particularly for >8 bit.  The progressive CCIR656 8-bit
> setup looks pretty similar though - but I think what needs to be asked
> is whether the same problem is visible using the FSL/NXP vendor kernel.
> 
> 
> * - the PLL setup is something that requires research at the moment.
> Sony's official position (even to their customers) is that they do not
> supply the necessary information, instead they expect customers to tell
> them the capture settings they want, and Sony will throw the values into
> a spreadsheet, and they'll supply the register settings back to the
> customer.  Hence, the only way to proceed with a generic driver for
> this sensor is to experiment, and experimenting requires the ability to
> pause the stream at the sensor while making changes.  Take this away,
> and we're stuck with the tables-of-register-settings-for-set-of-fixed-
> capture-settings approach.  I've made a lot of progress away from this
> which is all down to the flexibility afforded by _not_ killing the
> capture process.
> 

In other words: Steve should either find a proper fix for this, or only
call vb2_queue_error in this specific case. Sending an event that nobody
will know how to handle or what to do with is pretty pointless IMHO.

Let's just give him time to try and figure out the real issue here.

Regards,

	Hans

Steve Longerbeam March 13, 2017, 5:06 p.m. UTC | #13

On 03/13/2017 03:53 AM, Hans Verkuil wrote:
> On 03/13/2017 11:45 AM, Russell King - ARM Linux wrote:
>> On Mon, Mar 13, 2017 at 11:02:34AM +0100, Hans Verkuil wrote:
>>> On 03/11/2017 07:14 PM, Steve Longerbeam wrote:
>>>> The event must be user visible, otherwise the user has no indication
>>>> the error, and can't correct it by stream restart.
>>>
>>> In that case the driver can detect this and call vb2_queue_error. It's
>>> what it is there for.
>>>
>>> The event doesn't help you since only this driver has this issue. So nobody
>>> will watch this event, unless it is sw specifically written for this SoC.
>>>
>>> Much better to call vb2_queue_error to signal a fatal error (which this
>>> apparently is) since there are more drivers that do this, and vivid supports
>>> triggering this condition as well.
>>
>> So today, I can fiddle around with the IMX219 registers to help gain
>> an understanding of how this sensor works.  Several of the registers
>> (such as the PLL setup [*]) require me to disable streaming on the
>> sensor while changing them.
>>
>> This is something I've done many times while testing various ideas,
>> and is my primary way of figuring out and testing such things.
>>
>> Whenever I resume streaming (provided I've let the sensor stop
>> streaming at a frame boundary) it resumes as if nothing happened.  If I
>> stop the sensor mid-frame, then I get the rolling issue that Steve
>> reports, but once the top of the frame becomes aligned with the top of
>> the capture, everything then becomes stable again as if nothing happened.
>>
>> The side effect of what you're proposing is that when I disable streaming
>> at the sensor by poking at its registers, rather than the capture just
>> stopping, an error is going to be delivered to gstreamer, and gstreamer
>> is going to exit, taking the entire capture process down.
>>
>> This severely restricts the ability to be able to develop and test
>> sensor drivers.
>>
>> So, I strongly disagree with you.
>>
>> Loss of capture frames is not necessarily a fatal error - as I have been
>> saying repeatedly.  In Steve's case, there's some unknown interaction
>> between the source and iMX6 hardware that is causing the instability,
>> but that is simply not true of other sources, and I oppose any idea that
>> we should cripple the iMX6 side of the capture based upon just one
>> hardware combination where this is a problem.
>>
>> Steve suggested that the problem could be in the iMX6 CSI block - and I
>> note comparing Steve's code with the code in FSL's repository that there
>> are some changes that are missing in Steve's code to do with the CCIR656
>> sync code setup, particularly for >8 bit.  The progressive CCIR656 8-bit
>> setup looks pretty similar though - but I think what needs to be asked
>> is whether the same problem is visible using the FSL/NXP vendor kernel.
>>
>>
>> * - the PLL setup is something that requires research at the moment.
>> Sony's official position (even to their customers) is that they do not
>> supply the necessary information, instead they expect customers to tell
>> them the capture settings they want, and Sony will throw the values into
>> a spreadsheet, and they'll supply the register settings back to the
>> customer.  Hence, the only way to proceed with a generic driver for
>> this sensor is to experiment, and experimenting requires the ability to
>> pause the stream at the sensor while making changes.  Take this away,
>> and we're stuck with the tables-of-register-settings-for-set-of-fixed-
>> capture-settings approach.  I've made a lot of progress away from this
>> which is all down to the flexibility afforded by _not_ killing the
>> capture process.
>>
>
> In other words: Steve should either find a proper fix for this, or only
> call vb2_queue_error in this specific case. Sending an event that nobody
> will know how to handle or what to do with is pretty pointless IMHO.
>
> Let's just give him time to try and figure out the real issue here.

This is a long-standing issue, I've traveled to Hildesheim working with
our customer to try and get to the bottom of it. I can go into a lot of
details from those trips, we probed the bt.656 bus with a logic analyzer
and I can share those results with anyone who asks. But the results of
those investigations indicate the CSI is not handling the SAV/EAV sync
codes correctly - if there is a shift in the line position at which
those codes occur, the CSI/IPU does not abort the frame capture DMA
and start from the new sync code position, it just continues to capture
lines until the programmed number of lines are transferred, hence you
get these split images. Freescale also informed us of a mechanism in the
IPU that will add lines if it detects these short frames, until the
programmed number of lines are reached. Apparently that is what creates 
the rolling effect, but this rolling can last for up to a full minute,
which is completely unacceptable, it must be corrected as soon as
possible.

So the only thing we could come up with was to monitor frame intervals,
this is purely empirical, but we observed that frame intervals drop
by approx. one line time (~60 usec) when these short frames are
received. I don't really have any explanation for that but we take
advantage of that observation by sending this event to userspace so
the problem can be corrected immediately with a stream restart.

As I've said, the ADV718x does not provide _any_ indication via status
when the shift in the sync code position occurs. And the IPU is also 
severely lacking in DMA completion status as well (no short packet
status like in USB for example). So the only way to detect this event
is by monitoring the frame intervals.

I will review differences in the CCIR code register setup from FSL's
repo that Russell pointed out, but I'm fairly sure those code registers
are setup correctly, there's not much room for variability in those
values. They only define the values of the sync codes, so the CSI can
detect them, those values are defined by the bt.656 spec.

Anyway, perhaps for now I can remove the event, but keep the FI
monitoring, and for now just report a kernel error message on
a detected bad FI.

Steve

Hans Verkuil March 13, 2017, 5:10 p.m. UTC | #14

On 03/13/2017 06:06 PM, Steve Longerbeam wrote:
> 
> 
> On 03/13/2017 03:53 AM, Hans Verkuil wrote:
>> On 03/13/2017 11:45 AM, Russell King - ARM Linux wrote:
>>> On Mon, Mar 13, 2017 at 11:02:34AM +0100, Hans Verkuil wrote:
>>>> On 03/11/2017 07:14 PM, Steve Longerbeam wrote:
>>>>> The event must be user visible, otherwise the user has no indication
>>>>> the error, and can't correct it by stream restart.
>>>>
>>>> In that case the driver can detect this and call vb2_queue_error. It's
>>>> what it is there for.
>>>>
>>>> The event doesn't help you since only this driver has this issue. So nobody
>>>> will watch this event, unless it is sw specifically written for this SoC.
>>>>
>>>> Much better to call vb2_queue_error to signal a fatal error (which this
>>>> apparently is) since there are more drivers that do this, and vivid supports
>>>> triggering this condition as well.
>>>
>>> So today, I can fiddle around with the IMX219 registers to help gain
>>> an understanding of how this sensor works.  Several of the registers
>>> (such as the PLL setup [*]) require me to disable streaming on the
>>> sensor while changing them.
>>>
>>> This is something I've done many times while testing various ideas,
>>> and is my primary way of figuring out and testing such things.
>>>
>>> Whenever I resume streaming (provided I've let the sensor stop
>>> streaming at a frame boundary) it resumes as if nothing happened.  If I
>>> stop the sensor mid-frame, then I get the rolling issue that Steve
>>> reports, but once the top of the frame becomes aligned with the top of
>>> the capture, everything then becomes stable again as if nothing happened.
>>>
>>> The side effect of what you're proposing is that when I disable streaming
>>> at the sensor by poking at its registers, rather than the capture just
>>> stopping, an error is going to be delivered to gstreamer, and gstreamer
>>> is going to exit, taking the entire capture process down.
>>>
>>> This severely restricts the ability to be able to develop and test
>>> sensor drivers.
>>>
>>> So, I strongly disagree with you.
>>>
>>> Loss of capture frames is not necessarily a fatal error - as I have been
>>> saying repeatedly.  In Steve's case, there's some unknown interaction
>>> between the source and iMX6 hardware that is causing the instability,
>>> but that is simply not true of other sources, and I oppose any idea that
>>> we should cripple the iMX6 side of the capture based upon just one
>>> hardware combination where this is a problem.
>>>
>>> Steve suggested that the problem could be in the iMX6 CSI block - and I
>>> note comparing Steve's code with the code in FSL's repository that there
>>> are some changes that are missing in Steve's code to do with the CCIR656
>>> sync code setup, particularly for >8 bit.  The progressive CCIR656 8-bit
>>> setup looks pretty similar though - but I think what needs to be asked
>>> is whether the same problem is visible using the FSL/NXP vendor kernel.
>>>
>>>
>>> * - the PLL setup is something that requires research at the moment.
>>> Sony's official position (even to their customers) is that they do not
>>> supply the necessary information, instead they expect customers to tell
>>> them the capture settings they want, and Sony will throw the values into
>>> a spreadsheet, and they'll supply the register settings back to the
>>> customer.  Hence, the only way to proceed with a generic driver for
>>> this sensor is to experiment, and experimenting requires the ability to
>>> pause the stream at the sensor while making changes.  Take this away,
>>> and we're stuck with the tables-of-register-settings-for-set-of-fixed-
>>> capture-settings approach.  I've made a lot of progress away from this
>>> which is all down to the flexibility afforded by _not_ killing the
>>> capture process.
>>>
>>
>> In other words: Steve should either find a proper fix for this, or only
>> call vb2_queue_error in this specific case. Sending an event that nobody
>> will know how to handle or what to do with is pretty pointless IMHO.
>>
>> Let's just give him time to try and figure out the real issue here.
> 
> 
> This is a long-standing issue, I've traveled to Hildesheim working with
> our customer to try and get to the bottom of it. I can go into a lot of
> details from those trips, we probed the bt.656 bus with a logic analyzer
> and I can share those results with anyone who asks. But the results of
> those investigations indicate the CSI is not handling the SAV/EAV sync
> codes correctly - if there is a shift in the line position at which
> those codes occur, the CSI/IPU does not abort the frame capture DMA
> and start from the new sync code position, it just continues to capture
> lines until the programmed number of lines are transferred, hence you
> get these split images. Freescale also informed us of a mechanism in the
> IPU that will add lines if it detects these short frames, until the
> programmed number of lines are reached. Apparently that is what creates 
> the rolling effect, but this rolling can last for up to a full minute,
> which is completely unacceptable, it must be corrected as soon as
> possible.
> 
> So the only thing we could come up with was to monitor frame intervals,
> this is purely empirical, but we observed that frame intervals drop
> by approx. one line time (~60 usec) when these short frames are
> received. I don't really have any explanation for that but we take
> advantage of that observation by sending this event to userspace so
> the problem can be corrected immediately with a stream restart.
> 
> As I've said, the ADV718x does not provide _any_ indication via status
> when the shift in the sync code position occurs. And the IPU is also 
> severely lacking in DMA completion status as well (no short packet
> status like in USB for example). So the only way to detect this event
> is by monitoring the frame intervals.
> 
> I will review differences in the CCIR code register setup from FSL's
> repo that Russell pointed out, but I'm fairly sure those code registers
> are setup correctly, there's not much room for variability in those
> values. They only define the values of the sync codes, so the CSI can
> detect them, those values are defined by the bt.656 spec.
> 
> Anyway, perhaps for now I can remove the event, but keep the FI
> monitoring, and for now just report a kernel error message on
> a detected bad FI.

Is it possible to detect this specific situation? Apparently this issue
is not present (or at least resolves itself very quickly) on the imx219.

If it is not possible, then a private event would be acceptable, but
needs to be carefully documented. After all, it is a workaround for a bug,
since otherwise there would be no need for this event.

Regards,

	Hans

Steve Longerbeam March 13, 2017, 9:47 p.m. UTC | #15

On 03/13/2017 10:10 AM, Hans Verkuil wrote:
> On 03/13/2017 06:06 PM, Steve Longerbeam wrote:
>>
>>
>> On 03/13/2017 03:53 AM, Hans Verkuil wrote:
>>> On 03/13/2017 11:45 AM, Russell King - ARM Linux wrote:
>>>> On Mon, Mar 13, 2017 at 11:02:34AM +0100, Hans Verkuil wrote:
>>>>> On 03/11/2017 07:14 PM, Steve Longerbeam wrote:
>>>>>> The event must be user visible, otherwise the user has no indication
>>>>>> the error, and can't correct it by stream restart.
>>>>>
>>>>> In that case the driver can detect this and call vb2_queue_error. It's
>>>>> what it is there for.
>>>>>
>>>>> The event doesn't help you since only this driver has this issue. So nobody
>>>>> will watch this event, unless it is sw specifically written for this SoC.
>>>>>
>>>>> Much better to call vb2_queue_error to signal a fatal error (which this
>>>>> apparently is) since there are more drivers that do this, and vivid supports
>>>>> triggering this condition as well.
>>>>
>>>> So today, I can fiddle around with the IMX219 registers to help gain
>>>> an understanding of how this sensor works.  Several of the registers
>>>> (such as the PLL setup [*]) require me to disable streaming on the
>>>> sensor while changing them.
>>>>
>>>> This is something I've done many times while testing various ideas,
>>>> and is my primary way of figuring out and testing such things.
>>>>
>>>> Whenever I resume streaming (provided I've let the sensor stop
>>>> streaming at a frame boundary) it resumes as if nothing happened.  If I
>>>> stop the sensor mid-frame, then I get the rolling issue that Steve
>>>> reports, but once the top of the frame becomes aligned with the top of
>>>> the capture, everything then becomes stable again as if nothing happened.
>>>>
>>>> The side effect of what you're proposing is that when I disable streaming
>>>> at the sensor by poking at its registers, rather than the capture just
>>>> stopping, an error is going to be delivered to gstreamer, and gstreamer
>>>> is going to exit, taking the entire capture process down.
>>>>
>>>> This severely restricts the ability to be able to develop and test
>>>> sensor drivers.
>>>>
>>>> So, I strongly disagree with you.
>>>>
>>>> Loss of capture frames is not necessarily a fatal error - as I have been
>>>> saying repeatedly.  In Steve's case, there's some unknown interaction
>>>> between the source and iMX6 hardware that is causing the instability,
>>>> but that is simply not true of other sources, and I oppose any idea that
>>>> we should cripple the iMX6 side of the capture based upon just one
>>>> hardware combination where this is a problem.
>>>>
>>>> Steve suggested that the problem could be in the iMX6 CSI block - and I
>>>> note comparing Steve's code with the code in FSL's repository that there
>>>> are some changes that are missing in Steve's code to do with the CCIR656
>>>> sync code setup, particularly for >8 bit.  The progressive CCIR656 8-bit
>>>> setup looks pretty similar though - but I think what needs to be asked
>>>> is whether the same problem is visible using the FSL/NXP vendor kernel.
>>>>
>>>>
>>>> * - the PLL setup is something that requires research at the moment.
>>>> Sony's official position (even to their customers) is that they do not
>>>> supply the necessary information, instead they expect customers to tell
>>>> them the capture settings they want, and Sony will throw the values into
>>>> a spreadsheet, and they'll supply the register settings back to the
>>>> customer.  Hence, the only way to proceed with a generic driver for
>>>> this sensor is to experiment, and experimenting requires the ability to
>>>> pause the stream at the sensor while making changes.  Take this away,
>>>> and we're stuck with the tables-of-register-settings-for-set-of-fixed-
>>>> capture-settings approach.  I've made a lot of progress away from this
>>>> which is all down to the flexibility afforded by _not_ killing the
>>>> capture process.
>>>>
>>>
>>> In other words: Steve should either find a proper fix for this, or only
>>> call vb2_queue_error in this specific case. Sending an event that nobody
>>> will know how to handle or what to do with is pretty pointless IMHO.
>>>
>>> Let's just give him time to try and figure out the real issue here.
>>
>>
>> This is a long-standing issue, I've traveled to Hildesheim working with
>> our customer to try and get to the bottom of it. I can go into a lot of
>> details from those trips, we probed the bt.656 bus with a logic analyzer
>> and I can share those results with anyone who asks. But the results of
>> those investigations indicate the CSI is not handling the SAV/EAV sync
>> codes correctly - if there is a shift in the line position at which
>> those codes occur, the CSI/IPU does not abort the frame capture DMA
>> and start from the new sync code position, it just continues to capture
>> lines until the programmed number of lines are transferred, hence you
>> get these split images. Freescale also informed us of a mechanism in the
>> IPU that will add lines if it detects these short frames, until the
>> programmed number of lines are reached. Apparently that is what creates
>> the rolling effect, but this rolling can last for up to a full minute,
>> which is completely unacceptable, it must be corrected as soon as
>> possible.
>>
>> So the only thing we could come up with was to monitor frame intervals,
>> this is purely empirical, but we observed that frame intervals drop
>> by approx. one line time (~60 usec) when these short frames are
>> received. I don't really have any explanation for that but we take
>> advantage of that observation by sending this event to userspace so
>> the problem can be corrected immediately with a stream restart.
>>
>> As I've said, the ADV718x does not provide _any_ indication via status
>> when the shift in the sync code position occurs. And the IPU is also
>> severely lacking in DMA completion status as well (no short packet
>> status like in USB for example). So the only way to detect this event
>> is by monitoring the frame intervals.
>>
>> I will review differences in the CCIR code register setup from FSL's
>> repo that Russell pointed out, but I'm fairly sure those code registers
>> are setup correctly, there's not much room for variability in those
>> values. They only define the values of the sync codes, so the CSI can
>> detect them, those values are defined by the bt.656 spec.
>>
>> Anyway, perhaps for now I can remove the event, but keep the FI
>> monitoring, and for now just report a kernel error message on
>> a detected bad FI.
>
> Is it possible to detect this specific situation? Apparently this issue
> is not present (or at least resolves itself very quickly) on the imx219.

The imx219 is not a BT.656 interface, it is MIPI CSI-2, so this is
not an issue for imx219. It is an issue for any sensor with a
parallel BT.656 interface.

>
> If it is not possible, then a private event would be acceptable, but
> needs to be carefully documented. After all, it is a workaround for a bug,
> since otherwise there would be no need for this event.

Ok, sounds like a plan, I'll keep the event as a private event and make
sure it is documented well. Yes it is a workaround, for a silicon bug,
although Freescale/NXP has not issued an errata for it yet AFAIK. THat
might be because they claim to handle it via this adding lines
mechanism, but that mechanism doesn't work well (long duration rolling),
or not at all (permanent split images).


Steve

Nicolas Dufresne March 14, 2017, 4:21 p.m. UTC | #16

Le lundi 13 mars 2017 à 10:45 +0000, Russell King - ARM Linux a écrit :
> On Mon, Mar 13, 2017 at 11:02:34AM +0100, Hans Verkuil wrote:
> > On 03/11/2017 07:14 PM, Steve Longerbeam wrote:
> > > The event must be user visible, otherwise the user has no indication
> > > the error, and can't correct it by stream restart.
> > 
> > In that case the driver can detect this and call vb2_queue_error. It's
> > what it is there for.
> > 
> > The event doesn't help you since only this driver has this issue. So nobody
> > will watch this event, unless it is sw specifically written for this SoC.
> > 
> > Much better to call vb2_queue_error to signal a fatal error (which this
> > apparently is) since there are more drivers that do this, and vivid supports
> > triggering this condition as well.
> 
> So today, I can fiddle around with the IMX219 registers to help gain
> an understanding of how this sensor works.  Several of the registers
> (such as the PLL setup [*]) require me to disable streaming on the
> sensor while changing them.
> 
> This is something I've done many times while testing various ideas,
> and is my primary way of figuring out and testing such things.
> 
> Whenever I resume streaming (provided I've let the sensor stop
> streaming at a frame boundary) it resumes as if nothing happened.  If I
> stop the sensor mid-frame, then I get the rolling issue that Steve
> reports, but once the top of the frame becomes aligned with the top of
> the capture, everything then becomes stable again as if nothing happened.
> 
> The side effect of what you're proposing is that when I disable streaming
> at the sensor by poking at its registers, rather than the capture just
> stopping, an error is going to be delivered to gstreamer, and gstreamer
> is going to exit, taking the entire capture process down.

Indeed, there is no recovery attempt in GStreamer code, and it's hard
for an higher level programs to handle this. Nothing prevents from
adding something of course, but the errors are really un-specific, so
it would be something pretty blind. For what it has been tested, this
case was never met, usually the error is triggered by a USB camera
being un-plugged, a driver failure or even a firmware crash. Most of
the time, this is not recoverable.

My main concern here based on what I'm reading, is that this driver is
not even able to notice immediately that a produced frame was corrupted
(because it's out of sync). From usability perspective, this is really
bad. Can't the driver derive a clock from some irq and calculate for
each frame if the timing was correct ? And if not mark the buffer with
V4L2_BUF_FLAG_ERROR ?

> 
> This severely restricts the ability to be able to develop and test
> sensor drivers.
> 
> So, I strongly disagree with you.
> 
> Loss of capture frames is not necessarily a fatal error - as I have been
> saying repeatedly.  In Steve's case, there's some unknown interaction
> between the source and iMX6 hardware that is causing the instability,
> but that is simply not true of other sources, and I oppose any idea that
> we should cripple the iMX6 side of the capture based upon just one
> hardware combination where this is a problem.

Indeed, it happens all the time with slow USB port and UVC devices.
Though, the driver is well aware, and mark the buffers with
V4L2_BUF_FLAG_ERROR.

> 
> Steve suggested that the problem could be in the iMX6 CSI block - and I
> note comparing Steve's code with the code in FSL's repository that there
> are some changes that are missing in Steve's code to do with the CCIR656
> sync code setup, particularly for >8 bit.  The progressive CCIR656 8-bit
> setup looks pretty similar though - but I think what needs to be asked
> is whether the same problem is visible using the FSL/NXP vendor kernel.
> 
> 
> * - the PLL setup is something that requires research at the moment.
> Sony's official position (even to their customers) is that they do not
> supply the necessary information, instead they expect customers to tell
> them the capture settings they want, and Sony will throw the values into
> a spreadsheet, and they'll supply the register settings back to the
> customer.  Hence, the only way to proceed with a generic driver for
> this sensor is to experiment, and experimenting requires the ability to
> pause the stream at the sensor while making changes.  Take this away,
> and we're stuck with the tables-of-register-settings-for-set-of-fixed-
> capture-settings approach.  I've made a lot of progress away from this
> which is all down to the flexibility afforded by _not_ killing the
> capture process.
>

Steve Longerbeam March 14, 2017, 4:43 p.m. UTC | #17

On 03/14/2017 09:21 AM, Nicolas Dufresne wrote:
> Le lundi 13 mars 2017 à 10:45 +0000, Russell King - ARM Linux a écrit :
>> On Mon, Mar 13, 2017 at 11:02:34AM +0100, Hans Verkuil wrote:
>>> On 03/11/2017 07:14 PM, Steve Longerbeam wrote:
>>>> The event must be user visible, otherwise the user has no indication
>>>> the error, and can't correct it by stream restart.
>>> In that case the driver can detect this and call vb2_queue_error. It's
>>> what it is there for.
>>>
>>> The event doesn't help you since only this driver has this issue. So nobody
>>> will watch this event, unless it is sw specifically written for this SoC.
>>>
>>> Much better to call vb2_queue_error to signal a fatal error (which this
>>> apparently is) since there are more drivers that do this, and vivid supports
>>> triggering this condition as well.
>> So today, I can fiddle around with the IMX219 registers to help gain
>> an understanding of how this sensor works.  Several of the registers
>> (such as the PLL setup [*]) require me to disable streaming on the
>> sensor while changing them.
>>
>> This is something I've done many times while testing various ideas,
>> and is my primary way of figuring out and testing such things.
>>
>> Whenever I resume streaming (provided I've let the sensor stop
>> streaming at a frame boundary) it resumes as if nothing happened.  If I
>> stop the sensor mid-frame, then I get the rolling issue that Steve
>> reports, but once the top of the frame becomes aligned with the top of
>> the capture, everything then becomes stable again as if nothing happened.
>>
>> The side effect of what you're proposing is that when I disable streaming
>> at the sensor by poking at its registers, rather than the capture just
>> stopping, an error is going to be delivered to gstreamer, and gstreamer
>> is going to exit, taking the entire capture process down.
> Indeed, there is no recovery attempt in GStreamer code, and it's hard
> for an higher level programs to handle this. Nothing prevents from
> adding something of course, but the errors are really un-specific, so
> it would be something pretty blind. For what it has been tested, this
> case was never met, usually the error is triggered by a USB camera
> being un-plugged, a driver failure or even a firmware crash. Most of
> the time, this is not recoverable.
>
> My main concern here based on what I'm reading, is that this driver is
> not even able to notice immediately that a produced frame was corrupted
> (because it's out of sync). From usability perspective, this is really
> bad.

First, this is an isolated problem, specific to bt.656 and it only
occurs when disrupting the analog video source signal in some
way (by unplugging the RCA cable from the ADV718x connector
for example).

Second, there is no DMA status support in i.MX6 to catch these
shifted bt.656 codes, and the ADV718x does not provide any
status indicators of this event either. So monitoring frame intervals
is the only solution available, until FSL/NXP issues a new silicon rev.


>   Can't the driver derive a clock from some irq and calculate for
> each frame if the timing was correct ?

That's what is being done, essentially.

>   And if not mark the buffer with
> V4L2_BUF_FLAG_ERROR ?

I prefer to keep the private event, V4L2_BUF_FLAG_ERROR is too
unspecific.

Steve


>
>> This severely restricts the ability to be able to develop and test
>> sensor drivers.
>>
>> So, I strongly disagree with you.
>>
>> Loss of capture frames is not necessarily a fatal error - as I have been
>> saying repeatedly.  In Steve's case, there's some unknown interaction
>> between the source and iMX6 hardware that is causing the instability,
>> but that is simply not true of other sources, and I oppose any idea that
>> we should cripple the iMX6 side of the capture based upon just one
>> hardware combination where this is a problem.
> Indeed, it happens all the time with slow USB port and UVC devices.
> Though, the driver is well aware, and mark the buffers with
> V4L2_BUF_FLAG_ERROR.
>
>> Steve suggested that the problem could be in the iMX6 CSI block - and I
>> note comparing Steve's code with the code in FSL's repository that there
>> are some changes that are missing in Steve's code to do with the CCIR656
>> sync code setup, particularly for >8 bit.  The progressive CCIR656 8-bit
>> setup looks pretty similar though - but I think what needs to be asked
>> is whether the same problem is visible using the FSL/NXP vendor kernel.
>>
>>
>> * - the PLL setup is something that requires research at the moment.
>> Sony's official position (even to their customers) is that they do not
>> supply the necessary information, instead they expect customers to tell
>> them the capture settings they want, and Sony will throw the values into
>> a spreadsheet, and they'll supply the register settings back to the
>> customer.  Hence, the only way to proceed with a generic driver for
>> this sensor is to experiment, and experimenting requires the ability to
>> pause the stream at the sensor while making changes.  Take this away,
>> and we're stuck with the tables-of-register-settings-for-set-of-fixed-
>> capture-settings approach.  I've made a lot of progress away from this
>> which is all down to the flexibility afforded by _not_ killing the
>> capture process.

Russell King (Oracle) March 14, 2017, 4:47 p.m. UTC | #18

On Tue, Mar 14, 2017 at 12:21:31PM -0400, Nicolas Dufresne wrote:
> My main concern here based on what I'm reading, is that this driver is
> not even able to notice immediately that a produced frame was corrupted
> (because it's out of sync). From usability perspective, this is really
> bad. Can't the driver derive a clock from some irq and calculate for
> each frame if the timing was correct ? And if not mark the buffer with
> V4L2_BUF_FLAG_ERROR ?

One of the issues of measuring timing with IRQs is the fact that the
IRQ subsystem only allows one IRQ to run at a time.  If an IRQ takes
a relatively long time to process, then it throws the timing of other
IRQs out.

If you're going to decide that a buffer should be marked in error on
the basis of an interrupt arriving late, this can trigger spuriously.

It wasn't that long ago that USB HID was regularly eating something
like 20ms of interrupt time... that's been solved, but that doesn't
mean all cases are solved - there are still interrupt handlers in the
kernel that are on the order of milliseconds to complete.

Given the quality I observe of some USB serial devices (eg, running at
115200 baud, but feeling like they deliver characters to userspace at
9600 baud) I wouldn't be surprised if some USB serial drivers eat a lot
of IRQ time... and if so, all it'll take is to plug such a device in
to disrupt capture.

That sounds way too fragile to me.

Steve Longerbeam March 14, 2017, 4:50 p.m. UTC | #19

On 03/14/2017 09:47 AM, Russell King - ARM Linux wrote:
> On Tue, Mar 14, 2017 at 12:21:31PM -0400, Nicolas Dufresne wrote:
>> My main concern here based on what I'm reading, is that this driver is
>> not even able to notice immediately that a produced frame was corrupted
>> (because it's out of sync). From usability perspective, this is really
>> bad. Can't the driver derive a clock from some irq and calculate for
>> each frame if the timing was correct ? And if not mark the buffer with
>> V4L2_BUF_FLAG_ERROR ?
> One of the issues of measuring timing with IRQs is the fact that the
> IRQ subsystem only allows one IRQ to run at a time.  If an IRQ takes
> a relatively long time to process, then it throws the timing of other
> IRQs out.
>
> If you're going to decide that a buffer should be marked in error on
> the basis of an interrupt arriving late, this can trigger spuriously.
>
> It wasn't that long ago that USB HID was regularly eating something
> like 20ms of interrupt time... that's been solved, but that doesn't
> mean all cases are solved - there are still interrupt handlers in the
> kernel that are on the order of milliseconds to complete.
>
> Given the quality I observe of some USB serial devices (eg, running at
> 115200 baud, but feeling like they deliver characters to userspace at
> 9600 baud) I wouldn't be surprised if some USB serial drivers eat a lot
> of IRQ time... and if so, all it'll take is to plug such a device in
> to disrupt capture.
>
> That sounds way too fragile to me.

exactly, hence the imx6 timer input capture support.

Steve

Pavel Machek March 14, 2017, 6:26 p.m. UTC | #20

On Mon 2017-03-13 10:45:38, Russell King - ARM Linux wrote:
> On Mon, Mar 13, 2017 at 11:02:34AM +0100, Hans Verkuil wrote:
> > On 03/11/2017 07:14 PM, Steve Longerbeam wrote:
> > > The event must be user visible, otherwise the user has no indication
> > > the error, and can't correct it by stream restart.
> > 
> > In that case the driver can detect this and call vb2_queue_error. It's
> > what it is there for.
> > 
> > The event doesn't help you since only this driver has this issue. So nobody
> > will watch this event, unless it is sw specifically written for this SoC.
> > 
> > Much better to call vb2_queue_error to signal a fatal error (which this
> > apparently is) since there are more drivers that do this, and vivid supports
> > triggering this condition as well.
> 
> So today, I can fiddle around with the IMX219 registers to help gain
> an understanding of how this sensor works.  Several of the registers
> (such as the PLL setup [*]) require me to disable streaming on the
> sensor while changing them.
> 
> This is something I've done many times while testing various ideas,
> and is my primary way of figuring out and testing such things.
> 
> Whenever I resume streaming (provided I've let the sensor stop
> streaming at a frame boundary) it resumes as if nothing happened.  If I
> stop the sensor mid-frame, then I get the rolling issue that Steve
> reports, but once the top of the frame becomes aligned with the top of
> the capture, everything then becomes stable again as if nothing happened.
> 
> The side effect of what you're proposing is that when I disable streaming
> at the sensor by poking at its registers, rather than the capture just
> stopping, an error is going to be delivered to gstreamer, and gstreamer
> is going to exit, taking the entire capture process down.
> 
> This severely restricts the ability to be able to develop and test
> sensor drivers.

Well, but kernel should do what is best for production, not what is
best for driver debugging.

And yes, I guess you can have #ifdef or module parameter or something
switching for behaviour you prefer when you are debugging. But for
production, vb2_queue_error() seems to be the right solution.

								Pavel

Sakari Ailus March 16, 2017, 10:15 p.m. UTC | #21

Hi Steve,

On Tue, Mar 14, 2017 at 09:43:09AM -0700, Steve Longerbeam wrote:
> 
> 
> On 03/14/2017 09:21 AM, Nicolas Dufresne wrote:
> >Le lundi 13 mars 2017 à 10:45 +0000, Russell King - ARM Linux a écrit :
> >>On Mon, Mar 13, 2017 at 11:02:34AM +0100, Hans Verkuil wrote:
> >>>On 03/11/2017 07:14 PM, Steve Longerbeam wrote:
> >>>>The event must be user visible, otherwise the user has no indication
> >>>>the error, and can't correct it by stream restart.
> >>>In that case the driver can detect this and call vb2_queue_error. It's
> >>>what it is there for.
> >>>
> >>>The event doesn't help you since only this driver has this issue. So nobody
> >>>will watch this event, unless it is sw specifically written for this SoC.
> >>>
> >>>Much better to call vb2_queue_error to signal a fatal error (which this
> >>>apparently is) since there are more drivers that do this, and vivid supports
> >>>triggering this condition as well.
> >>So today, I can fiddle around with the IMX219 registers to help gain
> >>an understanding of how this sensor works.  Several of the registers
> >>(such as the PLL setup [*]) require me to disable streaming on the
> >>sensor while changing them.
> >>
> >>This is something I've done many times while testing various ideas,
> >>and is my primary way of figuring out and testing such things.
> >>
> >>Whenever I resume streaming (provided I've let the sensor stop
> >>streaming at a frame boundary) it resumes as if nothing happened.  If I
> >>stop the sensor mid-frame, then I get the rolling issue that Steve
> >>reports, but once the top of the frame becomes aligned with the top of
> >>the capture, everything then becomes stable again as if nothing happened.
> >>
> >>The side effect of what you're proposing is that when I disable streaming
> >>at the sensor by poking at its registers, rather than the capture just
> >>stopping, an error is going to be delivered to gstreamer, and gstreamer
> >>is going to exit, taking the entire capture process down.
> >Indeed, there is no recovery attempt in GStreamer code, and it's hard
> >for an higher level programs to handle this. Nothing prevents from
> >adding something of course, but the errors are really un-specific, so
> >it would be something pretty blind. For what it has been tested, this
> >case was never met, usually the error is triggered by a USB camera
> >being un-plugged, a driver failure or even a firmware crash. Most of
> >the time, this is not recoverable.
> >
> >My main concern here based on what I'm reading, is that this driver is
> >not even able to notice immediately that a produced frame was corrupted
> >(because it's out of sync). From usability perspective, this is really
> >bad.
> 
> First, this is an isolated problem, specific to bt.656 and it only
> occurs when disrupting the analog video source signal in some
> way (by unplugging the RCA cable from the ADV718x connector
> for example).
> 
> Second, there is no DMA status support in i.MX6 to catch these
> shifted bt.656 codes, and the ADV718x does not provide any
> status indicators of this event either. So monitoring frame intervals
> is the only solution available, until FSL/NXP issues a new silicon rev.
> 
> 
> >  Can't the driver derive a clock from some irq and calculate for
> >each frame if the timing was correct ?
> 
> That's what is being done, essentially.
> 
> >  And if not mark the buffer with
> >V4L2_BUF_FLAG_ERROR ?
> 
> I prefer to keep the private event, V4L2_BUF_FLAG_ERROR is too
> unspecific.

Is the reason you prefer an event that you have multiple drivers involved,
or that the error flag is, well, only telling there was an error with a
particular frame?

Returning -EIO (by calling vb2_queue_error()) would be a better choice as it
is documented behaviour.

[v5,15/39,media] v4l2: add a frame interval error event

Commit Message

Comments

Patch