diff mbox series

[v8,04/15] media:Add v4l2 event codec_error and skip

Message ID 647f84c1e7c2a48d6492d38fa4f06586235500b8.1631002447.git.ming.qian@nxp.com (mailing list archive)
State New, archived
Headers show
Series amphion video decoder/encoder driver | expand

Commit Message

Ming Qian Sept. 7, 2021, 9:49 a.m. UTC
The codec_error event can tell client that
there are some error occurs in the decoder engine.

The skip event can tell the client that
there are a frame has been decoded,
but it won't be outputed.

Signed-off-by: Ming Qian <ming.qian@nxp.com>
Signed-off-by: Shijie Qin <shijie.qin@nxp.com>
Signed-off-by: Zhou Peng <eagle.zhou@nxp.com>
---
 .../userspace-api/media/v4l/vidioc-dqevent.rst       | 12 ++++++++++++
 include/uapi/linux/videodev2.h                       |  2 ++
 2 files changed, 14 insertions(+)

Comments

Nicolas Dufresne Sept. 8, 2021, 1:32 p.m. UTC | #1
Hi Ming,

more API only review.

Le mardi 07 septembre 2021 à 17:49 +0800, Ming Qian a écrit :
> The codec_error event can tell client that
> there are some error occurs in the decoder engine.
> 
> The skip event can tell the client that
> there are a frame has been decoded,
> but it won't be outputed.
> 
> Signed-off-by: Ming Qian <ming.qian@nxp.com>
> Signed-off-by: Shijie Qin <shijie.qin@nxp.com>
> Signed-off-by: Zhou Peng <eagle.zhou@nxp.com>
> ---
>  .../userspace-api/media/v4l/vidioc-dqevent.rst       | 12 ++++++++++++
>  include/uapi/linux/videodev2.h                       |  2 ++
>  2 files changed, 14 insertions(+)
> 
> diff --git a/Documentation/userspace-api/media/v4l/vidioc-dqevent.rst b/Documentation/userspace-api/media/v4l/vidioc-dqevent.rst
> index 6eb40073c906..87d40ad25604 100644
> --- a/Documentation/userspace-api/media/v4l/vidioc-dqevent.rst
> +++ b/Documentation/userspace-api/media/v4l/vidioc-dqevent.rst
> @@ -182,6 +182,18 @@ call.
>  	the regions changes. This event has a struct
>  	:c:type:`v4l2_event_motion_det`
>  	associated with it.
> +    * - ``V4L2_EVENT_CODEC_ERROR``
> +      - 7
> +      - This event is triggered when some error occurs inside the codec engine,
> +	usually it can be replaced by a POLLERR event, but in some cases, the POLLERR
> +	may cause the application to exit, but this event can allow the application to
> +	handle the codec error without exiting.

Events are sent to userspace in a separate queue from the VB2 queue. Which means
it's impossible for userspace to know where this error actually took place.
Userspace may endup discarding valid frames from the VB queue, as it does not
know which one are good, and which one are bad.

There is likely a bit of spec work to be done here for non-fatal decode errors,
but I think the right approach is to use V4L2_BUF_FLAG_ERROR. What we expect
from decoders is that for each frame, a CAPTURE buffer is assigned. If decoding
that frame was not possible but the error is recoverable (corrupted bitstream,
missing reference, etc.), then the failing frame get marked with FLAG_ERROR and
decoding continues as usual.

What isn't documented is that you can set bytesused to 0, meaning there is
nothing useful in that frame, or a valid bytesused when you know only some
blocks are broken (e.g. missing 1 ref). Though, GStreamer might be the only
implementation of that, and byteused 0 may confuse some existing userspace.

> +    * - ``V4L2_EVENT_SKIP``
> +      - 8
> +      - This event is triggered when one frame is decoded, but it won't be outputed
> +	to the display. So the application can't get this frame, and the input frame count
> +	is dismatch with the output frame count. And this evevt is telling the client to
> +	handle this case.

Similar to my previous comment, this event is flawed, since userspace cannot
know were the skip is located in the queued buffers. Currently, all decoders are
mandated to support V4L2_BUF_FLAG_TIMESTAMP_COPY. The timestamp must NOT be
interpreted by the driver and must be reproduce as-is in the associated CAPTURE
buffer. It is possible to "garbage" collect skipped frames with this method,
though tedious.

An alternative, and I think it would be much nicer then this, would be to use
the v4l2_buffer.sequence counter, and just make it skip 1 on skips. Though, the
down side is that userspace must also know how to reorder frames (a driver job
for stateless codecs) in order to identify which frame was skipped. So this is
perhaps not that useful, other then knowing something was skipped in the past.

A third option would be to introduce V4L2_BUF_FLAG_SKIPPED. This way the driver
could return an empty payload (bytesused = 0) buffer with this flag set, and the
proper timestamp properly copied. This would let the driver communicate skipped
frames in real-time. Note that this could break with existing userspace, so it
would need to be opted-in somehow (a control or some flags).

>      * - ``V4L2_EVENT_PRIVATE_START``
>        - 0x08000000
>        - Base event number for driver-private events.
> diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
> index 5bb0682b4a23..c56640d42dc5 100644
> --- a/include/uapi/linux/videodev2.h
> +++ b/include/uapi/linux/videodev2.h
> @@ -2369,6 +2369,8 @@ struct v4l2_streamparm {
>  #define V4L2_EVENT_FRAME_SYNC			4
>  #define V4L2_EVENT_SOURCE_CHANGE		5
>  #define V4L2_EVENT_MOTION_DET			6
> +#define V4L2_EVENT_CODEC_ERROR			7
> +#define V4L2_EVENT_SKIP				8
>  #define V4L2_EVENT_PRIVATE_START		0x08000000
>  
>  /* Payload for V4L2_EVENT_VSYNC */
Ming Qian Sept. 9, 2021, 3:13 a.m. UTC | #2
> -----Original Message-----
> From: Nicolas Dufresne [mailto:nicolas@ndufresne.ca]
> Sent: Wednesday, September 8, 2021 9:33 PM
> To: Ming Qian <ming.qian@nxp.com>; mchehab@kernel.org;
> shawnguo@kernel.org; robh+dt@kernel.org; s.hauer@pengutronix.de
> Cc: hverkuil-cisco@xs4all.nl; kernel@pengutronix.de; festevam@gmail.com;
> dl-linux-imx <linux-imx@nxp.com>; Aisheng Dong <aisheng.dong@nxp.com>;
> linux-media@vger.kernel.org; linux-kernel@vger.kernel.org;
> devicetree@vger.kernel.org; linux-arm-kernel@lists.infradead.org
> Subject: [EXT] Re: [PATCH v8 04/15] media:Add v4l2 event codec_error and
> skip
> 
> Caution: EXT Email
> 
> Hi Ming,
> 
> more API only review.
> 
> Le mardi 07 septembre 2021 à 17:49 +0800, Ming Qian a écrit :
> > The codec_error event can tell client that there are some error occurs
> > in the decoder engine.
> >
> > The skip event can tell the client that there are a frame has been
> > decoded, but it won't be outputed.
> >
> > Signed-off-by: Ming Qian <ming.qian@nxp.com>
> > Signed-off-by: Shijie Qin <shijie.qin@nxp.com>
> > Signed-off-by: Zhou Peng <eagle.zhou@nxp.com>
> > ---
> >  .../userspace-api/media/v4l/vidioc-dqevent.rst       | 12 ++++++++++++
> >  include/uapi/linux/videodev2.h                       |  2 ++
> >  2 files changed, 14 insertions(+)
> >
> > diff --git a/Documentation/userspace-api/media/v4l/vidioc-dqevent.rst
> > b/Documentation/userspace-api/media/v4l/vidioc-dqevent.rst
> > index 6eb40073c906..87d40ad25604 100644
> > --- a/Documentation/userspace-api/media/v4l/vidioc-dqevent.rst
> > +++ b/Documentation/userspace-api/media/v4l/vidioc-dqevent.rst
> > @@ -182,6 +182,18 @@ call.
> >       the regions changes. This event has a struct
> >       :c:type:`v4l2_event_motion_det`
> >       associated with it.
> > +    * - ``V4L2_EVENT_CODEC_ERROR``
> > +      - 7
> > +      - This event is triggered when some error occurs inside the codec
> engine,
> > +     usually it can be replaced by a POLLERR event, but in some cases, the
> POLLERR
> > +     may cause the application to exit, but this event can allow the
> application to
> > +     handle the codec error without exiting.
> 
> Events are sent to userspace in a separate queue from the VB2 queue. Which
> means it's impossible for userspace to know where this error actually took
> place.
> Userspace may endup discarding valid frames from the VB queue, as it does
> not know which one are good, and which one are bad.
> 
> There is likely a bit of spec work to be done here for non-fatal decode errors,
> but I think the right approach is to use V4L2_BUF_FLAG_ERROR. What we
> expect from decoders is that for each frame, a CAPTURE buffer is assigned. If
> decoding that frame was not possible but the error is recoverable (corrupted
> bitstream, missing reference, etc.), then the failing frame get marked with
> FLAG_ERROR and decoding continues as usual.
> 
> What isn't documented is that you can set bytesused to 0, meaning there is
> nothing useful in that frame, or a valid bytesused when you know only some
> blocks are broken (e.g. missing 1 ref). Though, GStreamer might be the only
> implementation of that, and byteused 0 may confuse some existing userspace.
> 

Hi Nicolas,
    We don't use this event to tell userspace which frame is broken. Actually it tries to tell userspace that 
the decoder is abnormal and there will be no more frames output. The usersapce shouldn't wait, it can reset the decoder instance if it wants to continue decoding more frames, or it can exit directly.
    Usually there will no capture buffer can be dequeued, so we can't set a V4L2_BUF_FLAG_ERROR flag to a capture buffer.
    In my opinion, setting bytesused to 0 means eos, and as you say, it may confuse some existing userspace.
    I think it can be replaced by POLLERR in most of case, but we meet some applications who prefer to use this event instead of pollerr


> > +    * - ``V4L2_EVENT_SKIP``
> > +      - 8
> > +      - This event is triggered when one frame is decoded, but it won't be
> outputed
> > +     to the display. So the application can't get this frame, and the input
> frame count
> > +     is dismatch with the output frame count. And this evevt is telling the
> client to
> > +     handle this case.
> 
> Similar to my previous comment, this event is flawed, since userspace cannot
> know were the skip is located in the queued buffers. Currently, all decoders are
> mandated to support V4L2_BUF_FLAG_TIMESTAMP_COPY. The timestamp
> must NOT be interpreted by the driver and must be reproduce as-is in the
> associated CAPTURE buffer. It is possible to "garbage" collect skipped frames
> with this method, though tedious.
> 
> An alternative, and I think it would be much nicer then this, would be to use
> the v4l2_buffer.sequence counter, and just make it skip 1 on skips. Though, the
> down side is that userspace must also know how to reorder frames (a driver
> job for stateless codecs) in order to identify which frame was skipped. So this
> is perhaps not that useful, other then knowing something was skipped in the
> past.
> 
> A third option would be to introduce V4L2_BUF_FLAG_SKIPPED. This way the
> driver could return an empty payload (bytesused = 0) buffer with this flag set,
> and the proper timestamp properly copied. This would let the driver
> communicate skipped frames in real-time. Note that this could break with
> existing userspace, so it would need to be opted-in somehow (a control or
> some flags).

Hi Nicolas,
   The problem we meet is that userspace doesn't care which frame is skipped, it just need to know that there are a frame is skipped, the driver should promise the input frame count is equals to the output frame count.
    Your first method is possible in theory, but we find the timestamp may be unreliable, we meet many timestamp issues that userspace may enqueue invalid timestamp or repeated timestamp and so on, so we can't accept this solution.
    I think your second option is better. And there are only 1 question, we find some application prefer to use the V4L2_EVENT_EOS to check the eos, not checking the empty buffer, if we use this method to check skipped frame, the application should check empty buffer instead of V4L2_EVENT_EOS, otherwise if the last frame is skipped, the application will miss it. Of course this is not a problem, it just increases the complexity of the userspace implementation
    I don't think your third method is feasible, the reasons are as below
		1. usually the empty payload means eos, and as you say, it may introduce confusion.
    	2. The driver may not have the opportunity to return an empty payload during decoding, in our driver, driver will pass the capture buffer to firmware, and when some frame is skipped, the firmware won't return the buffer, driver may not find an available capture buffer to return to userspace.

   The requirement is that userspace need to match the input frame count and output frame count. It doesn't care which frame is skipped, so the V4L2_EVENT_SKIP is the easiest way for driver and userspace.
   If you think this event is really inappropriate, I prefer to adopt your second option

> 
> >      * - ``V4L2_EVENT_PRIVATE_START``
> >        - 0x08000000
> >        - Base event number for driver-private events.
> > diff --git a/include/uapi/linux/videodev2.h
> > b/include/uapi/linux/videodev2.h index 5bb0682b4a23..c56640d42dc5
> > 100644
> > --- a/include/uapi/linux/videodev2.h
> > +++ b/include/uapi/linux/videodev2.h
> > @@ -2369,6 +2369,8 @@ struct v4l2_streamparm {
> >  #define V4L2_EVENT_FRAME_SYNC                        4
> >  #define V4L2_EVENT_SOURCE_CHANGE             5
> >  #define V4L2_EVENT_MOTION_DET                        6
> > +#define V4L2_EVENT_CODEC_ERROR                       7
> > +#define V4L2_EVENT_SKIP                              8
> >  #define V4L2_EVENT_PRIVATE_START             0x08000000
> >
> >  /* Payload for V4L2_EVENT_VSYNC */
>
Nicolas Dufresne Sept. 9, 2021, 7:54 p.m. UTC | #3
Le jeudi 09 septembre 2021 à 03:13 +0000, Ming Qian a écrit :
> > -----Original Message-----
> > From: Nicolas Dufresne [mailto:nicolas@ndufresne.ca]
> > Sent: Wednesday, September 8, 2021 9:33 PM
> > To: Ming Qian <ming.qian@nxp.com>; mchehab@kernel.org;
> > shawnguo@kernel.org; robh+dt@kernel.org; s.hauer@pengutronix.de
> > Cc: hverkuil-cisco@xs4all.nl; kernel@pengutronix.de; festevam@gmail.com;
> > dl-linux-imx <linux-imx@nxp.com>; Aisheng Dong <aisheng.dong@nxp.com>;
> > linux-media@vger.kernel.org; linux-kernel@vger.kernel.org;
> > devicetree@vger.kernel.org; linux-arm-kernel@lists.infradead.org
> > Subject: [EXT] Re: [PATCH v8 04/15] media:Add v4l2 event codec_error and
> > skip
> > 
> > Caution: EXT Email
> > 
> > Hi Ming,
> > 
> > more API only review.
> > 
> > Le mardi 07 septembre 2021 à 17:49 +0800, Ming Qian a écrit :
> > > The codec_error event can tell client that there are some error occurs
> > > in the decoder engine.
> > > 
> > > The skip event can tell the client that there are a frame has been
> > > decoded, but it won't be outputed.
> > > 
> > > Signed-off-by: Ming Qian <ming.qian@nxp.com>
> > > Signed-off-by: Shijie Qin <shijie.qin@nxp.com>
> > > Signed-off-by: Zhou Peng <eagle.zhou@nxp.com>
> > > ---
> > >  .../userspace-api/media/v4l/vidioc-dqevent.rst       | 12 ++++++++++++
> > >  include/uapi/linux/videodev2.h                       |  2 ++
> > >  2 files changed, 14 insertions(+)
> > > 
> > > diff --git a/Documentation/userspace-api/media/v4l/vidioc-dqevent.rst
> > > b/Documentation/userspace-api/media/v4l/vidioc-dqevent.rst
> > > index 6eb40073c906..87d40ad25604 100644
> > > --- a/Documentation/userspace-api/media/v4l/vidioc-dqevent.rst
> > > +++ b/Documentation/userspace-api/media/v4l/vidioc-dqevent.rst
> > > @@ -182,6 +182,18 @@ call.
> > >       the regions changes. This event has a struct
> > >       :c:type:`v4l2_event_motion_det`
> > >       associated with it.
> > > +    * - ``V4L2_EVENT_CODEC_ERROR``
> > > +      - 7
> > > +      - This event is triggered when some error occurs inside the codec
> > engine,
> > > +     usually it can be replaced by a POLLERR event, but in some cases,
> > > the
> > POLLERR
> > > +     may cause the application to exit, but this event can allow the
> > application to
> > > +     handle the codec error without exiting.
> > 
> > Events are sent to userspace in a separate queue from the VB2 queue. Which
> > means it's impossible for userspace to know where this error actually took
> > place.
> > Userspace may endup discarding valid frames from the VB queue, as it does
> > not know which one are good, and which one are bad.
> > 
> > There is likely a bit of spec work to be done here for non-fatal decode
> > errors,
> > but I think the right approach is to use V4L2_BUF_FLAG_ERROR. What we
> > expect from decoders is that for each frame, a CAPTURE buffer is assigned.
> > If
> > decoding that frame was not possible but the error is recoverable (corrupted
> > bitstream, missing reference, etc.), then the failing frame get marked with
> > FLAG_ERROR and decoding continues as usual.
> > 
> > What isn't documented is that you can set bytesused to 0, meaning there is
> > nothing useful in that frame, or a valid bytesused when you know only some
> > blocks are broken (e.g. missing 1 ref). Though, GStreamer might be the only
> > implementation of that, and byteused 0 may confuse some existing userspace.
> > 
> 
> Hi Nicolas,
>     We don't use this event to tell userspace which frame is broken. Actually
> it tries to tell userspace that 
> the decoder is abnormal and there will be no more frames output. The usersapce
> shouldn't wait, it can reset the decoder instance if it wants to continue
> decoding more frames, or it can exit directly.
>     Usually there will no capture buffer can be dequeued, so we can't set a
> V4L2_BUF_FLAG_ERROR flag to a capture buffer.

That is not logical, if userspace asked to decode a buffer, but didn't queue
back any CAPTURE buffer, you are expected to just sit there and wait.

>     In my opinion, setting bytesused to 0 means eos, and as you say, it may
> confuse some existing userspace.

Byteused 0 only mean EOS for one specific driver, MFC. That behaviour was kept
to avoid breaking existing userspace. In fact, you have to opt in, the framework
will prevent you from using it for that purpose.

>     I think it can be replaced by POLLERR in most of case, but we meet some
> applications who prefer to use this event instead of pollerr

In general, recoverable errors should be handled without the need for userspace
to reset. This looks more like you have a bug in your error handling and deffer
it to userspace. Most userspace will just abort and report to users, I doubt
this is really what you expect.

What matters for recoverable errors is that you keep consuming OUTPUT buffers.
And userspace should be happy with never getting anything from the CAPTURE till
the propblem was recovered by the driver. Of course, userspace should probably
garbage collect the metadata it might be holding, chromium does that with a
leaky queue of 16 metadata buffer notably

My recommandation would be to drop this for now, and just try to not stall on
errors (or make it a hard failure for now, pollerr, or ioctl errors).

> 
> 
> > > +    * - ``V4L2_EVENT_SKIP``
> > > +      - 8
> > > +      - This event is triggered when one frame is decoded, but it won't
> > > be
> > outputed
> > > +     to the display. So the application can't get this frame, and the
> > > input
> > frame count
> > > +     is dismatch with the output frame count. And this evevt is telling
> > > the
> > client to
> > > +     handle this case.
> > 
> > Similar to my previous comment, this event is flawed, since userspace cannot
> > know were the skip is located in the queued buffers. Currently, all decoders
> > are
> > mandated to support V4L2_BUF_FLAG_TIMESTAMP_COPY. The timestamp
> > must NOT be interpreted by the driver and must be reproduce as-is in the
> > associated CAPTURE buffer. It is possible to "garbage" collect skipped
> > frames
> > with this method, though tedious.
> > 
> > An alternative, and I think it would be much nicer then this, would be to
> > use
> > the v4l2_buffer.sequence counter, and just make it skip 1 on skips. Though,
> > the
> > down side is that userspace must also know how to reorder frames (a driver
> > job for stateless codecs) in order to identify which frame was skipped. So
> > this
> > is perhaps not that useful, other then knowing something was skipped in the
> > past.
> > 
> > A third option would be to introduce V4L2_BUF_FLAG_SKIPPED. This way the
> > driver could return an empty payload (bytesused = 0) buffer with this flag
> > set,
> > and the proper timestamp properly copied. This would let the driver
> > communicate skipped frames in real-time. Note that this could break with
> > existing userspace, so it would need to be opted-in somehow (a control or
> > some flags).
> 
> Hi Nicolas,
>    The problem we meet is that userspace doesn't care which frame is skipped,
> it just need to know that there are a frame is skipped, the driver should
> promise the input frame count is equals to the output frame count.
>     Your first method is possible in theory, but we find the timestamp may be
> unreliable, we meet many timestamp issues that userspace may enqueue invalid
> timestamp or repeated timestamp and so on, so we can't accept this solution.

The driver should not interpret the provided timestamp, so it should not be able
to say if the timestamp is valid or not, this is not the driver's task.

The driver task is to match the timestamp to the CAPTURE buffer (if that buffer
was produced), and reproduce it exactly.

>     I think your second option is better. And there are only 1 question, we
> find some application prefer to use the V4L2_EVENT_EOS to check the eos, not
> checking the empty buffer, if we use this method to check skipped frame, the

Checking the empty buffer is a legacy method, only available in Samsung MFC
driver. The spec says that the last buffer should be flagged with _LAST, and any
further attempt to poll should unblock and DQBUF return EPIPE.

> application should check empty buffer instead of V4L2_EVENT_EOS, otherwise if
> the last frame is skipped, the application will miss it. Of course this is not
> a problem, it just increases the complexity of the userspace implementation

The EPIPE mechanism covers this issue, which we initially had with the LAST
flag.

>     I don't think your third method is feasible, the reasons are as below
> 		1. usually the empty payload means eos, and as you say, it
> may introduce confusion.
>     	2. The driver may not have the opportunity to return an empty payload
> during decoding, in our driver, driver will pass the capture buffer to
> firmware, and when some frame is skipped, the firmware won't return the
> buffer, driver may not find an available capture buffer to return to
> userspace.
> 
>    The requirement is that userspace need to match the input frame count and
> output frame count. It doesn't care which frame is skipped, so the
> V4L2_EVENT_SKIP is the easiest way for driver and userspace.
>    If you think this event is really inappropriate, I prefer to adopt your
> second option

Please, drop SKIP from you driver and this patchset and fix your draining
process handling to follow the spec. The Samsung OMX component is irrelevant to
mainline submission, the OMX code should be updated to follow the spec.

> 
> > 
> > >      * - ``V4L2_EVENT_PRIVATE_START``
> > >        - 0x08000000
> > >        - Base event number for driver-private events.
> > > diff --git a/include/uapi/linux/videodev2.h
> > > b/include/uapi/linux/videodev2.h index 5bb0682b4a23..c56640d42dc5
> > > 100644
> > > --- a/include/uapi/linux/videodev2.h
> > > +++ b/include/uapi/linux/videodev2.h
> > > @@ -2369,6 +2369,8 @@ struct v4l2_streamparm {
> > >  #define V4L2_EVENT_FRAME_SYNC                        4
> > >  #define V4L2_EVENT_SOURCE_CHANGE             5
> > >  #define V4L2_EVENT_MOTION_DET                        6
> > > +#define V4L2_EVENT_CODEC_ERROR                       7
> > > +#define V4L2_EVENT_SKIP                              8
> > >  #define V4L2_EVENT_PRIVATE_START             0x08000000
> > > 
> > >  /* Payload for V4L2_EVENT_VSYNC */
> > 
>
Ming Qian Sept. 10, 2021, 1:50 a.m. UTC | #4
> -----Original Message-----
> From: Nicolas Dufresne [mailto:nicolas@ndufresne.ca]
> Sent: Friday, September 10, 2021 3:54 AM
> To: Ming Qian <ming.qian@nxp.com>; mchehab@kernel.org;
> shawnguo@kernel.org; robh+dt@kernel.org; s.hauer@pengutronix.de
> Cc: hverkuil-cisco@xs4all.nl; kernel@pengutronix.de; festevam@gmail.com;
> dl-linux-imx <linux-imx@nxp.com>; Aisheng Dong <aisheng.dong@nxp.com>;
> linux-media@vger.kernel.org; linux-kernel@vger.kernel.org;
> devicetree@vger.kernel.org; linux-arm-kernel@lists.infradead.org
> Subject: Re: [EXT] Re: [PATCH v8 04/15] media:Add v4l2 event codec_error and
> skip
> 
> Caution: EXT Email
> 
> Le jeudi 09 septembre 2021 à 03:13 +0000, Ming Qian a écrit :
> > > -----Original Message-----
> > > From: Nicolas Dufresne [mailto:nicolas@ndufresne.ca]
> > > Sent: Wednesday, September 8, 2021 9:33 PM
> > > To: Ming Qian <ming.qian@nxp.com>; mchehab@kernel.org;
> > > shawnguo@kernel.org; robh+dt@kernel.org; s.hauer@pengutronix.de
> > > Cc: hverkuil-cisco@xs4all.nl; kernel@pengutronix.de;
> > > festevam@gmail.com; dl-linux-imx <linux-imx@nxp.com>; Aisheng Dong
> > > <aisheng.dong@nxp.com>; linux-media@vger.kernel.org;
> > > linux-kernel@vger.kernel.org; devicetree@vger.kernel.org;
> > > linux-arm-kernel@lists.infradead.org
> > > Subject: [EXT] Re: [PATCH v8 04/15] media:Add v4l2 event codec_error
> > > and skip
> > >
> > > Caution: EXT Email
> > >
> > > Hi Ming,
> > >
> > > more API only review.
> > >
> > > Le mardi 07 septembre 2021 à 17:49 +0800, Ming Qian a écrit :
> > > > The codec_error event can tell client that there are some error
> > > > occurs in the decoder engine.
> > > >
> > > > The skip event can tell the client that there are a frame has been
> > > > decoded, but it won't be outputed.
> > > >
> > > > Signed-off-by: Ming Qian <ming.qian@nxp.com>
> > > > Signed-off-by: Shijie Qin <shijie.qin@nxp.com>
> > > > Signed-off-by: Zhou Peng <eagle.zhou@nxp.com>
> > > > ---
> > > >  .../userspace-api/media/v4l/vidioc-dqevent.rst       | 12
> ++++++++++++
> > > >  include/uapi/linux/videodev2.h                       |  2 ++
> > > >  2 files changed, 14 insertions(+)
> > > >
> > > > diff --git
> > > > a/Documentation/userspace-api/media/v4l/vidioc-dqevent.rst
> > > > b/Documentation/userspace-api/media/v4l/vidioc-dqevent.rst
> > > > index 6eb40073c906..87d40ad25604 100644
> > > > --- a/Documentation/userspace-api/media/v4l/vidioc-dqevent.rst
> > > > +++ b/Documentation/userspace-api/media/v4l/vidioc-dqevent.rst
> > > > @@ -182,6 +182,18 @@ call.
> > > >       the regions changes. This event has a struct
> > > >       :c:type:`v4l2_event_motion_det`
> > > >       associated with it.
> > > > +    * - ``V4L2_EVENT_CODEC_ERROR``
> > > > +      - 7
> > > > +      - This event is triggered when some error occurs inside the
> > > > + codec
> > > engine,
> > > > +     usually it can be replaced by a POLLERR event, but in some
> > > > + cases,
> > > > the
> > > POLLERR
> > > > +     may cause the application to exit, but this event can allow
> > > > + the
> > > application to
> > > > +     handle the codec error without exiting.
> > >
> > > Events are sent to userspace in a separate queue from the VB2 queue.
> > > Which means it's impossible for userspace to know where this error
> > > actually took place.
> > > Userspace may endup discarding valid frames from the VB queue, as it
> > > does not know which one are good, and which one are bad.
> > >
> > > There is likely a bit of spec work to be done here for non-fatal
> > > decode errors, but I think the right approach is to use
> > > V4L2_BUF_FLAG_ERROR. What we expect from decoders is that for each
> > > frame, a CAPTURE buffer is assigned.
> > > If
> > > decoding that frame was not possible but the error is recoverable
> > > (corrupted bitstream, missing reference, etc.), then the failing
> > > frame get marked with FLAG_ERROR and decoding continues as usual.
> > >
> > > What isn't documented is that you can set bytesused to 0, meaning
> > > there is nothing useful in that frame, or a valid bytesused when you
> > > know only some blocks are broken (e.g. missing 1 ref). Though,
> > > GStreamer might be the only implementation of that, and byteused 0 may
> confuse some existing userspace.
> > >
> >
> > Hi Nicolas,
> >     We don't use this event to tell userspace which frame is broken.
> > Actually it tries to tell userspace that the decoder is abnormal and
> > there will be no more frames output. The usersapce shouldn't wait, it
> > can reset the decoder instance if it wants to continue decoding more
> > frames, or it can exit directly.
> >     Usually there will no capture buffer can be dequeued, so we can't
> > set a V4L2_BUF_FLAG_ERROR flag to a capture buffer.
> 
> That is not logical, if userspace asked to decode a buffer, but didn't queue back
> any CAPTURE buffer, you are expected to just sit there and wait.
> 
> >     In my opinion, setting bytesused to 0 means eos, and as you say,
> > it may confuse some existing userspace.
> 
> Byteused 0 only mean EOS for one specific driver, MFC. That behaviour was
> kept to avoid breaking existing userspace. In fact, you have to opt in, the
> framework will prevent you from using it for that purpose.
> 
> >     I think it can be replaced by POLLERR in most of case, but we meet
> > some applications who prefer to use this event instead of pollerr
> 
> In general, recoverable errors should be handled without the need for
> userspace to reset. This looks more like you have a bug in your error handling
> and deffer it to userspace. Most userspace will just abort and report to users, I
> doubt this is really what you expect.
> 
> What matters for recoverable errors is that you keep consuming OUTPUT
> buffers.
> And userspace should be happy with never getting anything from the CAPTURE
> till the propblem was recovered by the driver. Of course, userspace should
> probably garbage collect the metadata it might be holding, chromium does
> that with a leaky queue of 16 metadata buffer notably
> 
> My recommandation would be to drop this for now, and just try to not stall on
> errors (or make it a hard failure for now, pollerr, or ioctl errors).
> 

OK, I agree with you, I will drop it

> >
> >
> > > > +    * - ``V4L2_EVENT_SKIP``
> > > > +      - 8
> > > > +      - This event is triggered when one frame is decoded, but it
> > > > + won't
> > > > be
> > > outputed
> > > > +     to the display. So the application can't get this frame, and
> > > > + the
> > > > input
> > > frame count
> > > > +     is dismatch with the output frame count. And this evevt is
> > > > + telling
> > > > the
> > > client to
> > > > +     handle this case.
> > >
> > > Similar to my previous comment, this event is flawed, since
> > > userspace cannot know were the skip is located in the queued
> > > buffers. Currently, all decoders are mandated to support
> > > V4L2_BUF_FLAG_TIMESTAMP_COPY. The timestamp must NOT be
> interpreted
> > > by the driver and must be reproduce as-is in the associated CAPTURE
> > > buffer. It is possible to "garbage" collect skipped frames with this
> > > method, though tedious.
> > >
> > > An alternative, and I think it would be much nicer then this, would
> > > be to use the v4l2_buffer.sequence counter, and just make it skip 1
> > > on skips. Though, the down side is that userspace must also know how
> > > to reorder frames (a driver job for stateless codecs) in order to
> > > identify which frame was skipped. So this is perhaps not that
> > > useful, other then knowing something was skipped in the past.
> > >
> > > A third option would be to introduce V4L2_BUF_FLAG_SKIPPED. This way
> > > the driver could return an empty payload (bytesused = 0) buffer with
> > > this flag set, and the proper timestamp properly copied. This would
> > > let the driver communicate skipped frames in real-time. Note that
> > > this could break with existing userspace, so it would need to be
> > > opted-in somehow (a control or some flags).
> >
> > Hi Nicolas,
> >    The problem we meet is that userspace doesn't care which frame is
> > skipped, it just need to know that there are a frame is skipped, the
> > driver should promise the input frame count is equals to the output frame
> count.
> >     Your first method is possible in theory, but we find the timestamp
> > may be unreliable, we meet many timestamp issues that userspace may
> > enqueue invalid timestamp or repeated timestamp and so on, so we can't
> accept this solution.
> 
> The driver should not interpret the provided timestamp, so it should not be
> able to say if the timestamp is valid or not, this is not the driver's task.
> 
> The driver task is to match the timestamp to the CAPTURE buffer (if that buffer
> was produced), and reproduce it exactly.
> 
> >     I think your second option is better. And there are only 1
> > question, we find some application prefer to use the V4L2_EVENT_EOS to
> > check the eos, not checking the empty buffer, if we use this method to
> > check skipped frame, the
> 
> Checking the empty buffer is a legacy method, only available in Samsung MFC
> driver. The spec says that the last buffer should be flagged with _LAST, and any
> further attempt to poll should unblock and DQBUF return EPIPE.
> 
> > application should check empty buffer instead of V4L2_EVENT_EOS,
> > otherwise if the last frame is skipped, the application will miss it.
> > Of course this is not a problem, it just increases the complexity of
> > the userspace implementation
> 
> The EPIPE mechanism covers this issue, which we initially had with the LAST
> flag.
> 
> >     I don't think your third method is feasible, the reasons are as below
> >               1. usually the empty payload means eos, and as you say,
> > it may introduce confusion.
> >       2. The driver may not have the opportunity to return an empty
> > payload during decoding, in our driver, driver will pass the capture
> > buffer to firmware, and when some frame is skipped, the firmware won't
> > return the buffer, driver may not find an available capture buffer to
> > return to userspace.
> >
> >    The requirement is that userspace need to match the input frame
> > count and output frame count. It doesn't care which frame is skipped,
> > so the V4L2_EVENT_SKIP is the easiest way for driver and userspace.
> >    If you think this event is really inappropriate, I prefer to adopt
> > your second option
> 
> Please, drop SKIP from you driver and this patchset and fix your draining
> process handling to follow the spec. The Samsung OMX component is
> irrelevant to mainline submission, the OMX code should be updated to follow
> the spec.
> 

OK, I'll drop it, and follow your second option that use the v4l2_buffer.sequence counter.

My previous statement about empty buffer was not appropriate.
I know the last buffer should be flagged with _LAST, and in most of case,
driver will append a empty buffer with _LAST flag.
But in userspace, I find some application will check the bytesused, if it's 0, the application will think it's eos
I checked the gstreamer v4l2 decoder, I found the following code:
    /* Legacy M2M devices return empty buffer when drained */
    if (size == 0 && GST_V4L2_IS_M2M (obj->device_caps))
        goto eos;
and I found the ffmpeg v4l2 decoder does the similar thing.

So I don't want to use empty buffer except the last buffer in the driver.

> >
> > >
> > > >      * - ``V4L2_EVENT_PRIVATE_START``
> > > >        - 0x08000000
> > > >        - Base event number for driver-private events.
> > > > diff --git a/include/uapi/linux/videodev2.h
> > > > b/include/uapi/linux/videodev2.h index 5bb0682b4a23..c56640d42dc5
> > > > 100644
> > > > --- a/include/uapi/linux/videodev2.h
> > > > +++ b/include/uapi/linux/videodev2.h
> > > > @@ -2369,6 +2369,8 @@ struct v4l2_streamparm {
> > > >  #define V4L2_EVENT_FRAME_SYNC                        4
> > > >  #define V4L2_EVENT_SOURCE_CHANGE             5
> > > >  #define V4L2_EVENT_MOTION_DET                        6
> > > > +#define V4L2_EVENT_CODEC_ERROR                       7
> > > > +#define V4L2_EVENT_SKIP                              8
> > > >  #define V4L2_EVENT_PRIVATE_START             0x08000000
> > > >
> > > >  /* Payload for V4L2_EVENT_VSYNC */
> > >
> >
>
Ming Qian Jan. 13, 2022, 7:18 a.m. UTC | #5
Hi Nicolas,

   I have question about skip event or similar concepts.
If the client control the input frame count, and it won't queue any more frames unless some frame is decoded.
But after seek, There is no requirement to begin queuing coded data starting exactly from a resume point (e.g. SPS or a keyframe). Any queued OUTPUT buffers will be processed and returned to the client until a suitable resume point is found. While looking for a resume point, the decoder should not produce any decoded frames into CAPTURE buffers.

So client may have queued some frames but without any resume point, in this case the decoder won't produce any decoded frames into CAPTURE buffers, and the client won't queue frames into output buffers. This creates some kind of deadlock.

In our previous solution, we send skip event to client to tell it that some frame is skipped instead of decoded, then the client can continue to queue frames.
But the skip event is flawed, so we need some solution to resolve it.
1. decoder can produce an empty buffer with V4L2_BUF_FLAG_SKIPPED (or V4L2_BUF_FLAG_ERROR) as you advised, but this seems to conflict with the above description in specification.
2. Define a notification mechanism to notify the client

Can you give some advice?  This constraint of frame depth is common on android

Ming

> > > > +    * - ``V4L2_EVENT_SKIP``
> > > > +      - 8
> > > > +      - This event is triggered when one frame is decoded, but it
> > > > + won't
> > > > be
> > > outputed
> > > > +     to the display. So the application can't get this frame, and
> > > > + the
> > > > input
> > > frame count
> > > > +     is dismatch with the output frame count. And this evevt is
> > > > + telling
> > > > the
> > > client to
> > > > +     handle this case.
> > >
> > > Similar to my previous comment, this event is flawed, since
> > > userspace cannot know were the skip is located in the queued
> > > buffers. Currently, all decoders are mandated to support
> > > V4L2_BUF_FLAG_TIMESTAMP_COPY. The timestamp must NOT be
> interpreted
> > > by the driver and must be reproduce as-is in the associated CAPTURE
> > > buffer. It is possible to "garbage" collect skipped frames with this
> > > method, though tedious.
> > >
> > > An alternative, and I think it would be much nicer then this, would
> > > be to use the v4l2_buffer.sequence counter, and just make it skip 1
> > > on skips. Though, the down side is that userspace must also know how
> > > to reorder frames (a driver job for stateless codecs) in order to
> > > identify which frame was skipped. So this is perhaps not that
> > > useful, other then knowing something was skipped in the past.
> > >
> > > A third option would be to introduce V4L2_BUF_FLAG_SKIPPED. This way
> > > the driver could return an empty payload (bytesused = 0) buffer with
> > > this flag set, and the proper timestamp properly copied. This would
> > > let the driver communicate skipped frames in real-time. Note that
> > > this could break with existing userspace, so it would need to be
> > > opted-in somehow (a control or some flags).
> >
> > Hi Nicolas,
> >    The problem we meet is that userspace doesn't care which frame is
> > skipped, it just need to know that there are a frame is skipped, the
> > driver should promise the input frame count is equals to the output frame
> count.
> >     Your first method is possible in theory, but we find the timestamp
> > may be unreliable, we meet many timestamp issues that userspace may
> > enqueue invalid timestamp or repeated timestamp and so on, so we can't
> accept this solution.
> 
> The driver should not interpret the provided timestamp, so it should not be
> able to say if the timestamp is valid or not, this is not the driver's task.
> 
> The driver task is to match the timestamp to the CAPTURE buffer (if that buffer
> was produced), and reproduce it exactly.
> 
> >     I think your second option is better. And there are only 1
> > question, we find some application prefer to use the V4L2_EVENT_EOS to
> > check the eos, not checking the empty buffer, if we use this method to
> > check skipped frame, the
> 
> Checking the empty buffer is a legacy method, only available in Samsung MFC
> driver. The spec says that the last buffer should be flagged with _LAST, and any
> further attempt to poll should unblock and DQBUF return EPIPE.
> 
> > application should check empty buffer instead of V4L2_EVENT_EOS,
> > otherwise if the last frame is skipped, the application will miss it.
> > Of course this is not a problem, it just increases the complexity of
> > the userspace implementation
> 
> The EPIPE mechanism covers this issue, which we initially had with the LAST
> flag.
> 
> >     I don't think your third method is feasible, the reasons are as below
> >               1. usually the empty payload means eos, and as you say,
> > it may introduce confusion.
> >       2. The driver may not have the opportunity to return an empty
> > payload during decoding, in our driver, driver will pass the capture
> > buffer to firmware, and when some frame is skipped, the firmware won't
> > return the buffer, driver may not find an available capture buffer to
> > return to userspace.
> >
> >    The requirement is that userspace need to match the input frame
> > count and output frame count. It doesn't care which frame is skipped,
> > so the V4L2_EVENT_SKIP is the easiest way for driver and userspace.
> >    If you think this event is really inappropriate, I prefer to adopt
> > your second option
> 
> Please, drop SKIP from you driver and this patchset and fix your draining
> process handling to follow the spec. The Samsung OMX component is
> irrelevant to mainline submission, the OMX code should be updated to follow
> the spec.
> 
> >
Nicolas Dufresne Jan. 21, 2022, 10:25 p.m. UTC | #6
Le jeudi 13 janvier 2022 à 07:18 +0000, Ming Qian a écrit :
> Hi Nicolas,
> 
>    I have question about skip event or similar concepts.
> If the client control the input frame count, and it won't queue any more frames unless some frame is decoded.
> But after seek, There is no requirement to begin queuing coded data starting exactly from a resume point (e.g. SPS or a keyframe). Any queued OUTPUT buffers will be processed and returned to the client until a suitable resume point is found. While looking for a resume point, the decoder should not produce any decoded frames into CAPTURE buffers.
> 
> So client may have queued some frames but without any resume point, in this case the decoder won't produce any decoded frames into CAPTURE buffers, and the client won't queue frames into output buffers. This creates some kind of deadlock.
> 
> In our previous solution, we send skip event to client to tell it that some frame is skipped instead of decoded, then the client can continue to queue frames.
> But the skip event is flawed, so we need some solution to resolve it.
> 1. decoder can produce an empty buffer with V4L2_BUF_FLAG_SKIPPED (or V4L2_BUF_FLAG_ERROR) as you advised, but this seems to conflict with the above description in specification.
> 2. Define a notification mechanism to notify the client
> 
> Can you give some advice?  This constraint of frame depth is common on android

Without going against the spec, you can as of today pop a capture buffer and
mark it done with error. As it has nothing valid in it, I would also set the
payload size to 0.

So I'd say, for every unique input timestamp, that didn't yield a frame
(skipped), pop a capture buffer, copy the timestamp, set the payload size to 0
and set it as done with error.

I'm not sure though if we that we can specify this, as I'm not sure this is
possible with all the existing HW. I must admit, I don't myself had to deal with
that issue as I'm not using a dummy framework. In GStreamer, we take care of
locating the next sync point. So unless there was an error in the framework,
this case does not exist for us.

> 
> Ming
> 
> > > > > +    * - ``V4L2_EVENT_SKIP``
> > > > > +      - 8
> > > > > +      - This event is triggered when one frame is decoded, but it
> > > > > + won't
> > > > > be
> > > > outputed
> > > > > +     to the display. So the application can't get this frame, and
> > > > > + the
> > > > > input
> > > > frame count
> > > > > +     is dismatch with the output frame count. And this evevt is
> > > > > + telling
> > > > > the
> > > > client to
> > > > > +     handle this case.
> > > > 
> > > > Similar to my previous comment, this event is flawed, since
> > > > userspace cannot know were the skip is located in the queued
> > > > buffers. Currently, all decoders are mandated to support
> > > > V4L2_BUF_FLAG_TIMESTAMP_COPY. The timestamp must NOT be
> > interpreted
> > > > by the driver and must be reproduce as-is in the associated CAPTURE
> > > > buffer. It is possible to "garbage" collect skipped frames with this
> > > > method, though tedious.
> > > > 
> > > > An alternative, and I think it would be much nicer then this, would
> > > > be to use the v4l2_buffer.sequence counter, and just make it skip 1
> > > > on skips. Though, the down side is that userspace must also know how
> > > > to reorder frames (a driver job for stateless codecs) in order to
> > > > identify which frame was skipped. So this is perhaps not that
> > > > useful, other then knowing something was skipped in the past.
> > > > 
> > > > A third option would be to introduce V4L2_BUF_FLAG_SKIPPED. This way
> > > > the driver could return an empty payload (bytesused = 0) buffer with
> > > > this flag set, and the proper timestamp properly copied. This would
> > > > let the driver communicate skipped frames in real-time. Note that
> > > > this could break with existing userspace, so it would need to be
> > > > opted-in somehow (a control or some flags).
> > > 
> > > Hi Nicolas,
> > >    The problem we meet is that userspace doesn't care which frame is
> > > skipped, it just need to know that there are a frame is skipped, the
> > > driver should promise the input frame count is equals to the output frame
> > count.
> > >     Your first method is possible in theory, but we find the timestamp
> > > may be unreliable, we meet many timestamp issues that userspace may
> > > enqueue invalid timestamp or repeated timestamp and so on, so we can't
> > accept this solution.
> > 
> > The driver should not interpret the provided timestamp, so it should not be
> > able to say if the timestamp is valid or not, this is not the driver's task.
> > 
> > The driver task is to match the timestamp to the CAPTURE buffer (if that buffer
> > was produced), and reproduce it exactly.
> > 
> > >     I think your second option is better. And there are only 1
> > > question, we find some application prefer to use the V4L2_EVENT_EOS to
> > > check the eos, not checking the empty buffer, if we use this method to
> > > check skipped frame, the
> > 
> > Checking the empty buffer is a legacy method, only available in Samsung MFC
> > driver. The spec says that the last buffer should be flagged with _LAST, and any
> > further attempt to poll should unblock and DQBUF return EPIPE.
> > 
> > > application should check empty buffer instead of V4L2_EVENT_EOS,
> > > otherwise if the last frame is skipped, the application will miss it.
> > > Of course this is not a problem, it just increases the complexity of
> > > the userspace implementation
> > 
> > The EPIPE mechanism covers this issue, which we initially had with the LAST
> > flag.
> > 
> > >     I don't think your third method is feasible, the reasons are as below
> > >               1. usually the empty payload means eos, and as you say,
> > > it may introduce confusion.
> > >       2. The driver may not have the opportunity to return an empty
> > > payload during decoding, in our driver, driver will pass the capture
> > > buffer to firmware, and when some frame is skipped, the firmware won't
> > > return the buffer, driver may not find an available capture buffer to
> > > return to userspace.
> > > 
> > >    The requirement is that userspace need to match the input frame
> > > count and output frame count. It doesn't care which frame is skipped,
> > > so the V4L2_EVENT_SKIP is the easiest way for driver and userspace.
> > >    If you think this event is really inappropriate, I prefer to adopt
> > > your second option
> > 
> > Please, drop SKIP from you driver and this patchset and fix your draining
> > process handling to follow the spec. The Samsung OMX component is
> > irrelevant to mainline submission, the OMX code should be updated to follow
> > the spec.
> > 
> > >
Ming Qian Jan. 25, 2022, 1:54 a.m. UTC | #7
> -----Original Message-----
> From: Nicolas Dufresne [mailto:nicolas@ndufresne.ca]
> Sent: Saturday, January 22, 2022 6:25 AM
> To: Ming Qian <ming.qian@nxp.com>; mchehab@kernel.org;
> shawnguo@kernel.org; robh+dt@kernel.org; s.hauer@pengutronix.de
> Cc: hverkuil-cisco@xs4all.nl; kernel@pengutronix.de; festevam@gmail.com;
> dl-linux-imx <linux-imx@nxp.com>; linux-media@vger.kernel.org;
> linux-kernel@vger.kernel.org; devicetree@vger.kernel.org;
> linux-arm-kernel@lists.infradead.org
> Subject: Re: [EXT] Re: [PATCH v8 04/15] media:Add v4l2 event codec_error and
> skip
> 
> Caution: EXT Email
> 
> Le jeudi 13 janvier 2022 à 07:18 +0000, Ming Qian a écrit :
> > Hi Nicolas,
> >
> >    I have question about skip event or similar concepts.
> > If the client control the input frame count, and it won't queue any more
> frames unless some frame is decoded.
> > But after seek, There is no requirement to begin queuing coded data starting
> exactly from a resume point (e.g. SPS or a keyframe). Any queued OUTPUT
> buffers will be processed and returned to the client until a suitable resume
> point is found. While looking for a resume point, the decoder should not
> produce any decoded frames into CAPTURE buffers.
> >
> > So client may have queued some frames but without any resume point, in
> this case the decoder won't produce any decoded frames into CAPTURE buffers,
> and the client won't queue frames into output buffers. This creates some kind
> of deadlock.
> >
> > In our previous solution, we send skip event to client to tell it that some
> frame is skipped instead of decoded, then the client can continue to queue
> frames.
> > But the skip event is flawed, so we need some solution to resolve it.
> > 1. decoder can produce an empty buffer with V4L2_BUF_FLAG_SKIPPED (or
> V4L2_BUF_FLAG_ERROR) as you advised, but this seems to conflict with the
> above description in specification.
> > 2. Define a notification mechanism to notify the client
> >
> > Can you give some advice?  This constraint of frame depth is common on
> > android
> 
> Without going against the spec, you can as of today pop a capture buffer and
> mark it done with error. As it has nothing valid in it, I would also set the
> payload size to 0.
> 
> So I'd say, for every unique input timestamp, that didn't yield a frame (skipped),
> pop a capture buffer, copy the timestamp, set the payload size to 0 and set it
> as done with error.
> 
> I'm not sure though if we that we can specify this, as I'm not sure this is
> possible with all the existing HW. I must admit, I don't myself had to deal with
> that issue as I'm not using a dummy framework. In GStreamer, we take care of
> locating the next sync point. So unless there was an error in the framework,
> this case does not exist for us.
> 
Hi Nicolas,
    If the decoder can detect the output buffer that may trigger a error, is it better setting error flag on the output buffer, but without producing an empty capture buffer with error flag set?, or we should return both output and capture buffer with error flag set?
    As I can see the following description in spec:

if the decoder is able to precisely report the OUTPUT buffer that triggered the error, such buffer will be returned with the V4L2_BUF_FLAG_ERROR flag set.


> >
> > Ming
> >
> > > > > > +    * - ``V4L2_EVENT_SKIP``
> > > > > > +      - 8
> > > > > > +      - This event is triggered when one frame is decoded,
> > > > > > + but it won't
> > > > > > be
> > > > > outputed
> > > > > > +     to the display. So the application can't get this frame,
> > > > > > + and the
> > > > > > input
> > > > > frame count
> > > > > > +     is dismatch with the output frame count. And this evevt
> > > > > > + is telling
> > > > > > the
> > > > > client to
> > > > > > +     handle this case.
> > > > >
> > > > > Similar to my previous comment, this event is flawed, since
> > > > > userspace cannot know were the skip is located in the queued
> > > > > buffers. Currently, all decoders are mandated to support
> > > > > V4L2_BUF_FLAG_TIMESTAMP_COPY. The timestamp must NOT be
> > > interpreted
> > > > > by the driver and must be reproduce as-is in the associated
> > > > > CAPTURE buffer. It is possible to "garbage" collect skipped
> > > > > frames with this method, though tedious.
> > > > >
> > > > > An alternative, and I think it would be much nicer then this,
> > > > > would be to use the v4l2_buffer.sequence counter, and just make
> > > > > it skip 1 on skips. Though, the down side is that userspace must
> > > > > also know how to reorder frames (a driver job for stateless
> > > > > codecs) in order to identify which frame was skipped. So this is
> > > > > perhaps not that useful, other then knowing something was skipped in
> the past.
> > > > >
> > > > > A third option would be to introduce V4L2_BUF_FLAG_SKIPPED. This
> > > > > way the driver could return an empty payload (bytesused = 0)
> > > > > buffer with this flag set, and the proper timestamp properly
> > > > > copied. This would let the driver communicate skipped frames in
> > > > > real-time. Note that this could break with existing userspace,
> > > > > so it would need to be opted-in somehow (a control or some flags).
> > > >
> > > > Hi Nicolas,
> > > >    The problem we meet is that userspace doesn't care which frame
> > > > is skipped, it just need to know that there are a frame is
> > > > skipped, the driver should promise the input frame count is equals
> > > > to the output frame
> > > count.
> > > >     Your first method is possible in theory, but we find the
> > > > timestamp may be unreliable, we meet many timestamp issues that
> > > > userspace may enqueue invalid timestamp or repeated timestamp and
> > > > so on, so we can't
> > > accept this solution.
> > >
> > > The driver should not interpret the provided timestamp, so it should
> > > not be able to say if the timestamp is valid or not, this is not the driver's
> task.
> > >
> > > The driver task is to match the timestamp to the CAPTURE buffer (if
> > > that buffer was produced), and reproduce it exactly.
> > >
> > > >     I think your second option is better. And there are only 1
> > > > question, we find some application prefer to use the
> > > > V4L2_EVENT_EOS to check the eos, not checking the empty buffer, if
> > > > we use this method to check skipped frame, the
> > >
> > > Checking the empty buffer is a legacy method, only available in
> > > Samsung MFC driver. The spec says that the last buffer should be
> > > flagged with _LAST, and any further attempt to poll should unblock and
> DQBUF return EPIPE.
> > >
> > > > application should check empty buffer instead of V4L2_EVENT_EOS,
> > > > otherwise if the last frame is skipped, the application will miss it.
> > > > Of course this is not a problem, it just increases the complexity
> > > > of the userspace implementation
> > >
> > > The EPIPE mechanism covers this issue, which we initially had with
> > > the LAST flag.
> > >
> > > >     I don't think your third method is feasible, the reasons are as below
> > > >               1. usually the empty payload means eos, and as you
> > > > say, it may introduce confusion.
> > > >       2. The driver may not have the opportunity to return an
> > > > empty payload during decoding, in our driver, driver will pass the
> > > > capture buffer to firmware, and when some frame is skipped, the
> > > > firmware won't return the buffer, driver may not find an available
> > > > capture buffer to return to userspace.
> > > >
> > > >    The requirement is that userspace need to match the input frame
> > > > count and output frame count. It doesn't care which frame is
> > > > skipped, so the V4L2_EVENT_SKIP is the easiest way for driver and
> userspace.
> > > >    If you think this event is really inappropriate, I prefer to
> > > > adopt your second option
> > >
> > > Please, drop SKIP from you driver and this patchset and fix your
> > > draining process handling to follow the spec. The Samsung OMX
> > > component is irrelevant to mainline submission, the OMX code should
> > > be updated to follow the spec.
> > >
> > > >
Nicolas Dufresne Jan. 25, 2022, 8:23 p.m. UTC | #8
Le mardi 25 janvier 2022 à 01:54 +0000, Ming Qian a écrit :
> > -----Original Message-----
> > From: Nicolas Dufresne [mailto:nicolas@ndufresne.ca]
> > Sent: Saturday, January 22, 2022 6:25 AM
> > To: Ming Qian <ming.qian@nxp.com>; mchehab@kernel.org;
> > shawnguo@kernel.org; robh+dt@kernel.org; s.hauer@pengutronix.de
> > Cc: hverkuil-cisco@xs4all.nl; kernel@pengutronix.de; festevam@gmail.com;
> > dl-linux-imx <linux-imx@nxp.com>; linux-media@vger.kernel.org;
> > linux-kernel@vger.kernel.org; devicetree@vger.kernel.org;
> > linux-arm-kernel@lists.infradead.org
> > Subject: Re: [EXT] Re: [PATCH v8 04/15] media:Add v4l2 event codec_error and
> > skip
> > 
> > Caution: EXT Email
> > 
> > Le jeudi 13 janvier 2022 à 07:18 +0000, Ming Qian a écrit :
> > > Hi Nicolas,
> > > 
> > >    I have question about skip event or similar concepts.
> > > If the client control the input frame count, and it won't queue any more
> > frames unless some frame is decoded.
> > > But after seek, There is no requirement to begin queuing coded data
> > > starting
> > exactly from a resume point (e.g. SPS or a keyframe). Any queued OUTPUT
> > buffers will be processed and returned to the client until a suitable resume
> > point is found. While looking for a resume point, the decoder should not
> > produce any decoded frames into CAPTURE buffers.
> > > 
> > > So client may have queued some frames but without any resume point, in
> > this case the decoder won't produce any decoded frames into CAPTURE buffers,
> > and the client won't queue frames into output buffers. This creates some
> > kind
> > of deadlock.
> > > 
> > > In our previous solution, we send skip event to client to tell it that
> > > some
> > frame is skipped instead of decoded, then the client can continue to queue
> > frames.
> > > But the skip event is flawed, so we need some solution to resolve it.
> > > 1. decoder can produce an empty buffer with V4L2_BUF_FLAG_SKIPPED (or
> > V4L2_BUF_FLAG_ERROR) as you advised, but this seems to conflict with the
> > above description in specification.
> > > 2. Define a notification mechanism to notify the client
> > > 
> > > Can you give some advice?  This constraint of frame depth is common on
> > > android
> > 
> > Without going against the spec, you can as of today pop a capture buffer and
> > mark it done with error. As it has nothing valid in it, I would also set the
> > payload size to 0.
> > 
> > So I'd say, for every unique input timestamp, that didn't yield a frame
> > (skipped),
> > pop a capture buffer, copy the timestamp, set the payload size to 0 and set
> > it
> > as done with error.
> > 
> > I'm not sure though if we that we can specify this, as I'm not sure this is
> > possible with all the existing HW. I must admit, I don't myself had to deal
> > with
> > that issue as I'm not using a dummy framework. In GStreamer, we take care of
> > locating the next sync point. So unless there was an error in the framework,
> > this case does not exist for us.
> > 
> Hi Nicolas,
>     If the decoder can detect the output buffer that may trigger a error, is
> it better setting error flag on the output buffer, but without producing an
> empty capture buffer with error flag set?, or we should return both output and
> capture buffer with error flag set?
>     As I can see the following description in spec:
> 
> if the decoder is able to precisely report the OUTPUT buffer that triggered
> the error, such buffer will be returned with the V4L2_BUF_FLAG_ERROR flag set.

Interesting, I never noticed this one. I suppose this would mean some early
notification of decode error. I have always assumed that for m2m, the flags
after DQBUF had no meaning, considering we are just getting back an empty
buffer. I do see possible enhancement of error handling if that was to be
implemented.

For more context, if a reordered frame failed, we will only know after we had
notice failures / errors on dependent frame that are earlier in display order.
Such mechanism would tell us earlier. I wonder if it could not also be earlier
still when no reordering take place ? In general, the main goal with such
mechanism is to request a new keyframe (typically in WebRTC / RTP use cases).

To answer you question, the spec says "if the decoder is able", which looks like
a MAY in specification terminology. So if you don't also produce a capture
buffer for the error, I would be worried existing userland will not notice and
keep waiting for the lost frame. Adding Alexandre Courbot in CC, he may have
more context around this, and perhaps Chromium is using that.

> 
> 
> > > 
> > > Ming
> > > 
> > > > > > > +    * - ``V4L2_EVENT_SKIP``
> > > > > > > +      - 8
> > > > > > > +      - This event is triggered when one frame is decoded,
> > > > > > > + but it won't
> > > > > > > be
> > > > > > outputed
> > > > > > > +     to the display. So the application can't get this frame,
> > > > > > > + and the
> > > > > > > input
> > > > > > frame count
> > > > > > > +     is dismatch with the output frame count. And this evevt
> > > > > > > + is telling
> > > > > > > the
> > > > > > client to
> > > > > > > +     handle this case.
> > > > > > 
> > > > > > Similar to my previous comment, this event is flawed, since
> > > > > > userspace cannot know were the skip is located in the queued
> > > > > > buffers. Currently, all decoders are mandated to support
> > > > > > V4L2_BUF_FLAG_TIMESTAMP_COPY. The timestamp must NOT be
> > > > interpreted
> > > > > > by the driver and must be reproduce as-is in the associated
> > > > > > CAPTURE buffer. It is possible to "garbage" collect skipped
> > > > > > frames with this method, though tedious.
> > > > > > 
> > > > > > An alternative, and I think it would be much nicer then this,
> > > > > > would be to use the v4l2_buffer.sequence counter, and just make
> > > > > > it skip 1 on skips. Though, the down side is that userspace must
> > > > > > also know how to reorder frames (a driver job for stateless
> > > > > > codecs) in order to identify which frame was skipped. So this is
> > > > > > perhaps not that useful, other then knowing something was skipped in
> > the past.
> > > > > > 
> > > > > > A third option would be to introduce V4L2_BUF_FLAG_SKIPPED. This
> > > > > > way the driver could return an empty payload (bytesused = 0)
> > > > > > buffer with this flag set, and the proper timestamp properly
> > > > > > copied. This would let the driver communicate skipped frames in
> > > > > > real-time. Note that this could break with existing userspace,
> > > > > > so it would need to be opted-in somehow (a control or some flags).
> > > > > 
> > > > > Hi Nicolas,
> > > > >    The problem we meet is that userspace doesn't care which frame
> > > > > is skipped, it just need to know that there are a frame is
> > > > > skipped, the driver should promise the input frame count is equals
> > > > > to the output frame
> > > > count.
> > > > >     Your first method is possible in theory, but we find the
> > > > > timestamp may be unreliable, we meet many timestamp issues that
> > > > > userspace may enqueue invalid timestamp or repeated timestamp and
> > > > > so on, so we can't
> > > > accept this solution.
> > > > 
> > > > The driver should not interpret the provided timestamp, so it should
> > > > not be able to say if the timestamp is valid or not, this is not the
> > > > driver's
> > task.
> > > > 
> > > > The driver task is to match the timestamp to the CAPTURE buffer (if
> > > > that buffer was produced), and reproduce it exactly.
> > > > 
> > > > >     I think your second option is better. And there are only 1
> > > > > question, we find some application prefer to use the
> > > > > V4L2_EVENT_EOS to check the eos, not checking the empty buffer, if
> > > > > we use this method to check skipped frame, the
> > > > 
> > > > Checking the empty buffer is a legacy method, only available in
> > > > Samsung MFC driver. The spec says that the last buffer should be
> > > > flagged with _LAST, and any further attempt to poll should unblock and
> > DQBUF return EPIPE.
> > > > 
> > > > > application should check empty buffer instead of V4L2_EVENT_EOS,
> > > > > otherwise if the last frame is skipped, the application will miss it.
> > > > > Of course this is not a problem, it just increases the complexity
> > > > > of the userspace implementation
> > > > 
> > > > The EPIPE mechanism covers this issue, which we initially had with
> > > > the LAST flag.
> > > > 
> > > > >     I don't think your third method is feasible, the reasons are as
> > > > > below
> > > > >               1. usually the empty payload means eos, and as you
> > > > > say, it may introduce confusion.
> > > > >       2. The driver may not have the opportunity to return an
> > > > > empty payload during decoding, in our driver, driver will pass the
> > > > > capture buffer to firmware, and when some frame is skipped, the
> > > > > firmware won't return the buffer, driver may not find an available
> > > > > capture buffer to return to userspace.
> > > > > 
> > > > >    The requirement is that userspace need to match the input frame
> > > > > count and output frame count. It doesn't care which frame is
> > > > > skipped, so the V4L2_EVENT_SKIP is the easiest way for driver and
> > userspace.
> > > > >    If you think this event is really inappropriate, I prefer to
> > > > > adopt your second option
> > > > 
> > > > Please, drop SKIP from you driver and this patchset and fix your
> > > > draining process handling to follow the spec. The Samsung OMX
> > > > component is irrelevant to mainline submission, the OMX code should
> > > > be updated to follow the spec.
> > > > 
> > > > > 
>
diff mbox series

Patch

diff --git a/Documentation/userspace-api/media/v4l/vidioc-dqevent.rst b/Documentation/userspace-api/media/v4l/vidioc-dqevent.rst
index 6eb40073c906..87d40ad25604 100644
--- a/Documentation/userspace-api/media/v4l/vidioc-dqevent.rst
+++ b/Documentation/userspace-api/media/v4l/vidioc-dqevent.rst
@@ -182,6 +182,18 @@  call.
 	the regions changes. This event has a struct
 	:c:type:`v4l2_event_motion_det`
 	associated with it.
+    * - ``V4L2_EVENT_CODEC_ERROR``
+      - 7
+      - This event is triggered when some error occurs inside the codec engine,
+	usually it can be replaced by a POLLERR event, but in some cases, the POLLERR
+	may cause the application to exit, but this event can allow the application to
+	handle the codec error without exiting.
+    * - ``V4L2_EVENT_SKIP``
+      - 8
+      - This event is triggered when one frame is decoded, but it won't be outputed
+	to the display. So the application can't get this frame, and the input frame count
+	is dismatch with the output frame count. And this evevt is telling the client to
+	handle this case.
     * - ``V4L2_EVENT_PRIVATE_START``
       - 0x08000000
       - Base event number for driver-private events.
diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
index 5bb0682b4a23..c56640d42dc5 100644
--- a/include/uapi/linux/videodev2.h
+++ b/include/uapi/linux/videodev2.h
@@ -2369,6 +2369,8 @@  struct v4l2_streamparm {
 #define V4L2_EVENT_FRAME_SYNC			4
 #define V4L2_EVENT_SOURCE_CHANGE		5
 #define V4L2_EVENT_MOTION_DET			6
+#define V4L2_EVENT_CODEC_ERROR			7
+#define V4L2_EVENT_SKIP				8
 #define V4L2_EVENT_PRIVATE_START		0x08000000
 
 /* Payload for V4L2_EVENT_VSYNC */