media: docs-rst: Document m2m stateless video decoder interface

Message ID 20181205100121.181765-1-acourbot@chromium.org (mailing list archive)
State New, archived

Commit Message

Alexandre Courbot Dec. 5, 2018, 10:01 a.m. UTC
Documents the protocol that user-space should follow when
communicating with stateless video decoders.

The stateless video decoding API makes use of the new request and tags
APIs. While it has been implemented with the Cedrus driver so far, it
should probably still be considered staging for a short while.

Signed-off-by: Alexandre Courbot <acourbot@chromium.org>
---
Removing the RFC flag this time. Changes since RFCv3:

* Included Tomasz and Hans feedback,
* Expanded the decoding section to better describe the use of requests,
* Use the tags API.

 Documentation/media/uapi/v4l/dev-codec.rst    |   5 +
 .../media/uapi/v4l/dev-stateless-decoder.rst  | 399 ++++++++++++++++++
 2 files changed, 404 insertions(+)
 create mode 100644 Documentation/media/uapi/v4l/dev-stateless-decoder.rst

Comments

Paul Kocialkowski Dec. 7, 2018, 8:30 a.m. UTC | #1
Hi,

Thanks for this new version! I only have one comment left, see below.

On Wed, 2018-12-05 at 19:01 +0900, Alexandre Courbot wrote:
> Documents the protocol that user-space should follow when
> communicating with stateless video decoders.
> 
> The stateless video decoding API makes use of the new request and tags
> APIs. While it has been implemented with the Cedrus driver so far, it
> should probably still be considered staging for a short while.
> 
> Signed-off-by: Alexandre Courbot <acourbot@chromium.org>
> ---
> Removing the RFC flag this time. Changes since RFCv3:
> 
> * Included Tomasz and Hans feedback,
> * Expanded the decoding section to better describe the use of requests,
> * Use the tags API.
> 
>  Documentation/media/uapi/v4l/dev-codec.rst    |   5 +
>  .../media/uapi/v4l/dev-stateless-decoder.rst  | 399 ++++++++++++++++++
>  2 files changed, 404 insertions(+)
>  create mode 100644 Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> 
> diff --git a/Documentation/media/uapi/v4l/dev-codec.rst b/Documentation/media/uapi/v4l/dev-codec.rst
> index c61e938bd8dc..3e6a3e883f11 100644
> --- a/Documentation/media/uapi/v4l/dev-codec.rst
> +++ b/Documentation/media/uapi/v4l/dev-codec.rst
> @@ -6,6 +6,11 @@
>  Codec Interface
>  ***************
>  
> +.. toctree::
> +    :maxdepth: 1
> +
> +    dev-stateless-decoder
> +
>  A V4L2 codec can compress, decompress, transform, or otherwise convert
>  video data from one format into another format, in memory. Typically
>  such devices are memory-to-memory devices (i.e. devices with the
> diff --git a/Documentation/media/uapi/v4l/dev-stateless-decoder.rst b/Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> new file mode 100644
> index 000000000000..7a781c89bd59
> --- /dev/null
> +++ b/Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> @@ -0,0 +1,399 @@
> +.. -*- coding: utf-8; mode: rst -*-
> +
> +.. _stateless_decoder:
> +
> +**************************************************
> +Memory-to-memory Stateless Video Decoder Interface
> +**************************************************
> +
> +A stateless decoder is a decoder that works without retaining any kind of state
> +between processing frames. This means that each frame is decoded independently
> +of any previous and future frames, and that the client is responsible for
> +maintaining the decoding state and providing it to the decoder with each
> +decoding request. This is in contrast to the stateful video decoder interface,
> +where the hardware and driver maintain the decoding state and all the client
> +has to do is to provide the raw encoded stream.
> +
> +This section describes how user-space ("the client") is expected to communicate
> +with such decoders in order to successfully decode an encoded stream. Compared
> +to stateful codecs, the decoder/client sequence is simpler, but the cost of
> +this simplicity is extra complexity in the client which must maintain a
> +consistent decoding state.
> +
> +Stateless decoders make use of the request API and buffer tags. A stateless
> +decoder must thus expose the following capabilities on its queues when
> +:c:func:`VIDIOC_REQBUFS` or :c:func:`VIDIOC_CREATE_BUFS` are invoked:
> +
> +* The ``V4L2_BUF_CAP_SUPPORTS_REQUESTS`` capability must be set on the
> +  ``OUTPUT`` queue,
> +
> +* The ``V4L2_BUF_CAP_SUPPORTS_TAGS`` capability must be set on the ``OUTPUT``
> +  and ``CAPTURE`` queues,
> +

[...]

> +Decoding
> +========
> +
> +For each frame, the client is responsible for submitting a request to which the
> +following is attached:
> +
> +* Exactly one frame worth of encoded data in a buffer submitted to the
> +  ``OUTPUT`` queue,

Although this is still the case in the cedrus driver (but will be fixed
eventually), this requirement should be dropped because metadata is
per-slice and not per-picture in the formats we're currently aiming to
support.

I think it would be safer to mention something like filling the output
buffer with the minimum unit size for the selected output format, to
which the associated metadata applies.

> +* All the controls relevant to the format being decoded (see below for details).
> +
> +The contents of the source ``OUTPUT`` buffer, as well as the controls that must
> +be set on the request, depend on the active coded pixel format and might be
> +affected by codec-specific extended controls, as stated in documentation of each
> +format.
> +
> +A typical frame would thus be decoded using the following sequence:
> +
> +1. Queue an ``OUTPUT`` buffer containing one frame worth of encoded bitstream

Ditto here.

> +   data for the decoding request, using :c:func:`VIDIOC_QBUF`.
> +
> +    * **Required fields:**
> +
> +      ``index``
> +          index of the buffer being queued.
> +
> +      ``type``
> +          type of the buffer.
> +
> +      ``bytesused``
> +          number of bytes taken by the encoded data frame in the buffer.
> +
> +      ``flags``
> +          the ``V4L2_BUF_FLAG_REQUEST_FD`` flag must be set. In addition, if
> +          the decoded frame is to be used as a reference frame in the future,
> +          then the ``V4L2_BUF_FLAG_TAG`` flag must be set (it can also be set
> +          for non-reference frames if it helps the client).
> +
> +      ``request_fd``
> +          must be set to the file descriptor of the decoding request.
> +
> +      ``tag``
> +          if the ``V4L2_BUF_FLAG_TAG`` is set, then this must contain the tag
> +          for the frame that will be copied into the decoded frame buffer, and
> +          can be used to specify this frame as a reference frame for another
> +          one.
> +
> +   .. note::
> +
> +     The API currently requires one frame of encoded data per ``OUTPUT`` buffer,
> +     even though some encoded formats may present their data in smaller chunks
> +     (e.g. H.264's frames can be made of several slices that can be processed
> +     independently). It is currently the responsibility of the client to gather
> +     the different parts of a frame into a single ``OUTPUT`` buffer, while
> +     preserving the same layout as the original bitstream. This
> +     restriction may be lifted in the future.

And this part should probably be dropped too.

Cheers,

Paul

> +2. Set the codec-specific controls for the decoding request, using
> +   :c:func:`VIDIOC_S_EXT_CTRLS`.
> +
> +    * **Required fields:**
> +
> +      ``which``
> +          must be ``V4L2_CTRL_WHICH_REQUEST_VAL``.
> +
> +      ``request_fd``
> +          must be set to the file descriptor of the decoding request.
> +
> +      other fields
> +          other fields are set as usual when setting controls. The ``controls``
> +          array must contain all the codec-specific controls required to decode
> +          a frame.
> +
> +   .. note::
> +
> +      It is possible to specify the controls in different invocations of
> +      :c:func:`VIDIOC_S_EXT_CTRLS`, or to overwrite a previously set control, as
> +      long as ``request_fd`` and ``which`` are properly set. The controls state
> +      at the moment of request submission is the one that will be considered.
> +
> +   .. note::
> +
> +      The order in which steps 1 and 2 take place is interchangeable.
> +
> +3. Submit the request by invoking :c:func:`MEDIA_REQUEST_IOC_QUEUE` on the
> +   request FD.
> +
> +    If the request is submitted without an ``OUTPUT`` buffer, or if some of the
> +    required controls are missing from the request, then
> +    :c:func:`MEDIA_REQUEST_IOC_QUEUE` will return ``-ENOENT``. If more than one
> +    ``OUTPUT`` buffer is queued, then it will return ``-EINVAL``.
> +    :c:func:`MEDIA_REQUEST_IOC_QUEUE` returning non-zero means that no
> +    ``CAPTURE`` buffer will be produced for this request.
> +
> +``CAPTURE`` buffers must not be part of the request, and are queued
> +independently. They are returned in decode order (i.e. the same order as
> +``OUTPUT`` buffers were submitted).
> +
> +Runtime decoding errors are signaled by the dequeued ``CAPTURE`` buffers
> +carrying the ``V4L2_BUF_FLAG_ERROR`` flag. If a decoded reference frame has an
> +error, then all following decoded frames that refer to it also have the
> +``V4L2_BUF_FLAG_ERROR`` flag set, although the decoder will still try to
> +produce a (likely corrupted) frame.
> +
> +Buffer management while decoding
> +================================
> +Contrary to stateful decoders, a stateless decoder does not perform any kind of
> +buffer management: it only guarantees that dequeued ``CAPTURE`` buffers can be
> +used by the client for as long as they are not queued again. "Used" here
> +encompasses using the buffer for compositing, display, or as a reference frame
> +to decode a subsequent frame.
> +
> +Reference frames are specified by using the same tag that was set to the
> +``OUTPUT`` buffer of a frame into the relevant codec-specific structures that
> +are submitted as controls. This tag will be copied to the corresponding
> +``CAPTURE`` buffer, but can be used in any subsequent decoding request as soon
> +as the decoding request for that buffer is queued successfully. This means that
> +the client does not need to wait until a ``CAPTURE`` buffer with a given tag is
> +dequeued to start using that tag in reference frames. However, it must wait
> +until all frames referencing a given tag are dequeued before queuing the
> +referenced ``CAPTURE`` buffer again, since queueing a buffer effectively removes
> +its tag.
> +
> +When queuing a decoding request, the driver will increase the reference count of
> +all the resources associated with reference frames. This means that the client
> +can e.g. close the DMABUF file descriptors of the reference frame buffers if it
> +won't need them afterwards, as long as the V4L2 ``CAPTURE`` buffer of the
> +reference frame is not re-queued before all referencing frames are decoded.
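
To make the re-queue rule above concrete, here is a small client-side
bookkeeping sketch. It is only a model and talks to no driver: the table,
its size and the function names are all hypothetical. The idea is to count,
per tag, the queued frames that still reference it, and to allow the
referenced ``CAPTURE`` buffer back on the queue only once that count drops
to zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TRACKED_TAGS 16 /* hypothetical limit, pick what the codec needs */

struct tag_users {
    bool     used;  /* slot currently tracks a tag */
    uint32_t tag;   /* tag that was set on the reference frame */
    unsigned users; /* queued frames referencing the tag, not yet dequeued */
};

struct tag_table {
    struct tag_users slots[MAX_TRACKED_TAGS];
};

static struct tag_users *find_tag(struct tag_table *t, uint32_t tag)
{
    for (size_t i = 0; i < MAX_TRACKED_TAGS; i++)
        if (t->slots[i].used && t->slots[i].tag == tag)
            return &t->slots[i];
    return NULL;
}

/* Call once per reference listed in a request that was queued successfully. */
int tag_get(struct tag_table *t, uint32_t tag)
{
    struct tag_users *s = find_tag(t, tag);

    if (!s) {
        for (size_t i = 0; i < MAX_TRACKED_TAGS && !s; i++)
            if (!t->slots[i].used)
                s = &t->slots[i];
        if (!s)
            return -1; /* table full */
        s->used = true;
        s->tag = tag;
        s->users = 0;
    }
    s->users++;
    return 0;
}

/* Call when the CAPTURE buffer of a referencing frame has been dequeued. */
void tag_put(struct tag_table *t, uint32_t tag)
{
    struct tag_users *s = find_tag(t, tag);

    assert(s && s->users > 0);
    if (--s->users == 0)
        s->used = false;
}

/* True when the CAPTURE buffer carrying this tag may be queued again. */
bool tag_can_requeue(struct tag_table *t, uint32_t tag)
{
    return find_tag(t, tag) == NULL;
}
```

A client would check ``tag_can_requeue()`` right before calling
:c:func:`VIDIOC_QBUF` on a ``CAPTURE`` buffer it no longer needs for display.
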
> +
> +Seeking
> +=======
> +In order to seek, the client just needs to submit requests using input buffers
> +corresponding to the new stream position. It must however be aware that
> +resolution may have changed and follow the dynamic resolution change sequence in
> +that case. Also depending on the codec used, picture parameters (e.g. SPS/PPS
> +for H.264) may have changed and the client is responsible for making sure that a
> +valid state is sent to the decoder.
> +
> +The client is then free to ignore any returned ``CAPTURE`` buffer that comes
> +from the pre-seek position.
> +
> +Pause
> +=====
> +
> +In order to pause, the client can just cease queuing buffers onto the ``OUTPUT``
> +queue. Without source bitstream data, there is no data to process and the codec
> +will remain idle.
> +
> +Dynamic resolution change
> +=========================
> +
> +If the client detects a resolution change in the stream, it will need to perform
> +the initialization sequence again with the new resolution:
> +
> +1. Wait until all submitted requests have completed and dequeue the
> +   corresponding output buffers.
> +
> +2. Call :c:func:`VIDIOC_STREAMOFF` on both the ``OUTPUT`` and ``CAPTURE``
> +   queues.
> +
> +3. Free all ``CAPTURE`` buffers by calling :c:func:`VIDIOC_REQBUFS` on the
> +   ``CAPTURE`` queue with a buffer count of zero.
> +
> +4. Perform the initialization sequence again (minus the allocation of
> +   ``OUTPUT`` buffers), with the new resolution set on the ``OUTPUT`` queue.
> +   Note that due to resolution constraints, a different format may need to be
> +   picked on the ``CAPTURE`` queue.
> +
> +Drain
> +=====
> +
> +In order to drain the stream on a stateless decoder, the client just needs to
> +wait until all the submitted requests are completed. There is no need to send a
> +``V4L2_DEC_CMD_STOP`` command since requests are processed sequentially by the
> +decoder.
> +
> +End of stream
> +=============
> +
> +When the client detects that the end of stream is reached, it can simply stop
> +sending new frames to the decoder, drain the ``CAPTURE`` queue, and dispose of
> +the decoder as needed.
Tomasz Figa Jan. 22, 2019, 8:19 a.m. UTC | #2
Hi Paul,

On Fri, Dec 7, 2018 at 5:30 PM Paul Kocialkowski
<paul.kocialkowski@bootlin.com> wrote:
>
> Hi,
>
> Thanks for this new version! I only have one comment left, see below.
>
> On Wed, 2018-12-05 at 19:01 +0900, Alexandre Courbot wrote:
> > Documents the protocol that user-space should follow when
> > communicating with stateless video decoders.
> >
> > The stateless video decoding API makes use of the new request and tags
> > APIs. While it has been implemented with the Cedrus driver so far, it
> > should probably still be considered staging for a short while.
> >
> > Signed-off-by: Alexandre Courbot <acourbot@chromium.org>
> > ---
> > Removing the RFC flag this time. Changes since RFCv3:
> >
> > * Included Tomasz and Hans feedback,
> > * Expanded the decoding section to better describe the use of requests,
> > * Use the tags API.
> >
> >  Documentation/media/uapi/v4l/dev-codec.rst    |   5 +
> >  .../media/uapi/v4l/dev-stateless-decoder.rst  | 399 ++++++++++++++++++
> >  2 files changed, 404 insertions(+)
> >  create mode 100644 Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> >
> > diff --git a/Documentation/media/uapi/v4l/dev-codec.rst b/Documentation/media/uapi/v4l/dev-codec.rst
> > index c61e938bd8dc..3e6a3e883f11 100644
> > --- a/Documentation/media/uapi/v4l/dev-codec.rst
> > +++ b/Documentation/media/uapi/v4l/dev-codec.rst
> > @@ -6,6 +6,11 @@
> >  Codec Interface
> >  ***************
> >
> > +.. toctree::
> > +    :maxdepth: 1
> > +
> > +    dev-stateless-decoder
> > +
> >  A V4L2 codec can compress, decompress, transform, or otherwise convert
> >  video data from one format into another format, in memory. Typically
> >  such devices are memory-to-memory devices (i.e. devices with the
> > diff --git a/Documentation/media/uapi/v4l/dev-stateless-decoder.rst b/Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> > new file mode 100644
> > index 000000000000..7a781c89bd59
> > --- /dev/null
> > +++ b/Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> > @@ -0,0 +1,399 @@
> > +.. -*- coding: utf-8; mode: rst -*-
> > +
> > +.. _stateless_decoder:
> > +
> > +**************************************************
> > +Memory-to-memory Stateless Video Decoder Interface
> > +**************************************************
> > +
> > +A stateless decoder is a decoder that works without retaining any kind of state
> > +between processing frames. This means that each frame is decoded independently
> > +of any previous and future frames, and that the client is responsible for
> > +maintaining the decoding state and providing it to the decoder with each
> > +decoding request. This is in contrast to the stateful video decoder interface,
> > +where the hardware and driver maintain the decoding state and all the client
> > +has to do is to provide the raw encoded stream.
> > +
> > +This section describes how user-space ("the client") is expected to communicate
> > +with such decoders in order to successfully decode an encoded stream. Compared
> > +to stateful codecs, the decoder/client sequence is simpler, but the cost of
> > +this simplicity is extra complexity in the client which must maintain a
> > +consistent decoding state.
> > +
> > +Stateless decoders make use of the request API and buffer tags. A stateless
> > +decoder must thus expose the following capabilities on its queues when
> > +:c:func:`VIDIOC_REQBUFS` or :c:func:`VIDIOC_CREATE_BUFS` are invoked:
> > +
> > +* The ``V4L2_BUF_CAP_SUPPORTS_REQUESTS`` capability must be set on the
> > +  ``OUTPUT`` queue,
> > +
> > +* The ``V4L2_BUF_CAP_SUPPORTS_TAGS`` capability must be set on the ``OUTPUT``
> > +  and ``CAPTURE`` queues,
> > +
>
> [...]
>
> > +Decoding
> > +========
> > +
> > +For each frame, the client is responsible for submitting a request to which the
> > +following is attached:
> > +
> > +* Exactly one frame worth of encoded data in a buffer submitted to the
> > +  ``OUTPUT`` queue,
>
> Although this is still the case in the cedrus driver (but will be fixed
> eventually), this requirement should be dropped because metadata is
> per-slice and not per-picture in the formats we're currently aiming to
> support.
>
> I think it would be safer to mention something like filling the output
> buffer with the minimum unit size for the selected output format, to
> which the associated metadata applies.

I'm not sure it's a good idea. Some of the reasons why I think so:
 1) Some streams have as many as 32 slices per frame. With one slice per
buffer, you instantly run out of V4L2 buffers even for a single frame.
 2) The Rockchip hardware seems to just pick up all the slices one after
another, which was the reason to put the slice data in the buffer like
that in the first place.
 3) Not all the metadata is per-slice. Actually most of the metadata
is per frame and only what is located inside v4l2_h264_slice_param is
per-slice. The corresponding control is an array, which has an entry
for each slice in the buffer. Each entry includes an offset field,
which points to the place in the buffer where the slice is located.
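
To illustrate point 3, here is a rough sketch of packing all the slices of
one frame into a single buffer while recording each slice's offset. The
structs and names below are hypothetical stand-ins, not the actual
v4l2_h264_slice_param layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_SLICES 32 /* hypothetical upper bound for the example */

/* Hypothetical per-slice entry; the real control carries more fields. */
struct slice_entry {
    uint32_t offset; /* where the slice starts in the OUTPUT buffer */
    uint32_t size;   /* slice length in bytes */
};

struct frame_layout {
    uint32_t num_slices;
    uint32_t bytesused; /* total bytes written to the buffer so far */
    struct slice_entry slices[MAX_SLICES];
};

/* Append one slice and record its offset. Returns 0, or -1 if it won't fit. */
int append_slice(uint8_t *buf, size_t buf_size, struct frame_layout *fl,
                 const uint8_t *slice, uint32_t size)
{
    assert(buf && fl && slice);
    if (fl->num_slices >= MAX_SLICES || fl->bytesused > buf_size)
        return -1;
    if (size > buf_size - fl->bytesused)
        return -1;

    memcpy(buf + fl->bytesused, slice, size);
    fl->slices[fl->num_slices].offset = fl->bytesused;
    fl->slices[fl->num_slices].size = size;
    fl->num_slices++;
    fl->bytesused += size;
    return 0;
}
```

The per-frame metadata is then set once, and the slice array simply grows
with the number of slices found in the frame.
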

Best regards,
Tomasz

Paul Kocialkowski Jan. 22, 2019, 10:10 a.m. UTC | #3
Hi,

On Tue, 2019-01-22 at 17:19 +0900, Tomasz Figa wrote:
> Hi Paul,
> 
> On Fri, Dec 7, 2018 at 5:30 PM Paul Kocialkowski
> <paul.kocialkowski@bootlin.com> wrote:
> > Hi,
> > 
> > Thanks for this new version! I only have one comment left, see below.
> > 
> > On Wed, 2018-12-05 at 19:01 +0900, Alexandre Courbot wrote:
> > > Documents the protocol that user-space should follow when
> > > communicating with stateless video decoders.
> > > 
> > > The stateless video decoding API makes use of the new request and tags
> > > APIs. While it has been implemented with the Cedrus driver so far, it
> > > should probably still be considered staging for a short while.
> > > 
> > > Signed-off-by: Alexandre Courbot <acourbot@chromium.org>
> > > ---
> > > Removing the RFC flag this time. Changes since RFCv3:
> > > 
> > > * Included Tomasz and Hans feedback,
> > > * Expanded the decoding section to better describe the use of requests,
> > > * Use the tags API.
> > > 
> > >  Documentation/media/uapi/v4l/dev-codec.rst    |   5 +
> > >  .../media/uapi/v4l/dev-stateless-decoder.rst  | 399 ++++++++++++++++++
> > >  2 files changed, 404 insertions(+)
> > >  create mode 100644 Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> > > 
> > > diff --git a/Documentation/media/uapi/v4l/dev-codec.rst b/Documentation/media/uapi/v4l/dev-codec.rst
> > > index c61e938bd8dc..3e6a3e883f11 100644
> > > --- a/Documentation/media/uapi/v4l/dev-codec.rst
> > > +++ b/Documentation/media/uapi/v4l/dev-codec.rst
> > > @@ -6,6 +6,11 @@
> > >  Codec Interface
> > >  ***************
> > > 
> > > +.. toctree::
> > > +    :maxdepth: 1
> > > +
> > > +    dev-stateless-decoder
> > > +
> > >  A V4L2 codec can compress, decompress, transform, or otherwise convert
> > >  video data from one format into another format, in memory. Typically
> > >  such devices are memory-to-memory devices (i.e. devices with the
> > > diff --git a/Documentation/media/uapi/v4l/dev-stateless-decoder.rst b/Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> > > new file mode 100644
> > > index 000000000000..7a781c89bd59
> > > --- /dev/null
> > > +++ b/Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> > > @@ -0,0 +1,399 @@
> > > +.. -*- coding: utf-8; mode: rst -*-
> > > +
> > > +.. _stateless_decoder:
> > > +
> > > +**************************************************
> > > +Memory-to-memory Stateless Video Decoder Interface
> > > +**************************************************
> > > +
> > > +A stateless decoder is a decoder that works without retaining any kind of state
> > > +between processing frames. This means that each frame is decoded independently
> > > +of any previous and future frames, and that the client is responsible for
> > > +maintaining the decoding state and providing it to the decoder with each
> > > +decoding request. This is in contrast to the stateful video decoder interface,
> > > +where the hardware and driver maintain the decoding state and all the client
> > > +has to do is to provide the raw encoded stream.
> > > +
> > > +This section describes how user-space ("the client") is expected to communicate
> > > +with such decoders in order to successfully decode an encoded stream. Compared
> > > +to stateful codecs, the decoder/client sequence is simpler, but the cost of
> > > +this simplicity is extra complexity in the client which must maintain a
> > > +consistent decoding state.
> > > +
> > > +Stateless decoders make use of the request API and buffer tags. A stateless
> > > +decoder must thus expose the following capabilities on its queues when
> > > +:c:func:`VIDIOC_REQBUFS` or :c:func:`VIDIOC_CREATE_BUFS` are invoked:
> > > +
> > > +* The ``V4L2_BUF_CAP_SUPPORTS_REQUESTS`` capability must be set on the
> > > +  ``OUTPUT`` queue,
> > > +
> > > +* The ``V4L2_BUF_CAP_SUPPORTS_TAGS`` capability must be set on the ``OUTPUT``
> > > +  and ``CAPTURE`` queues,
> > > +
> > 
> > [...]
> > 
> > > +Decoding
> > > +========
> > > +
> > > +For each frame, the client is responsible for submitting a request to which the
> > > +following is attached:
> > > +
> > > +* Exactly one frame worth of encoded data in a buffer submitted to the
> > > +  ``OUTPUT`` queue,
> > 
> > Although this is still the case in the cedrus driver (but will be fixed
> > eventually), this requirement should be dropped because metadata is
> > per-slice and not per-picture in the formats we're currently aiming to
> > support.
> > 
> > I think it would be safer to mention something like filling the output
> > buffer with the minimum unit size for the selected output format, to
> > which the associated metadata applies.
> 
> I'm not sure it's a good idea. Some of the reasons why I think so:
>  1) There are streams that can have even 32 slices. With that, you
> instantly run out of V4L2 buffers even just for 1 frame.
>  2) The Rockchip hardware which seems to just pick all the slices one
> after another and which was the reason to actually put the slice data
> in the buffer like that.
>  3) Not all the metadata is per-slice. Actually most of the metadata
> is per frame and only what is located inside v4l2_h264_slice_param is
> per-slice. The corresponding control is an array, which has an entry
> for each slice in the buffer. Each entry includes an offset field,
> which points to the place in the buffer where the slice is located.
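
To make the layout from point 3 concrete, here is a minimal sketch of how a client might pack all the slices of one frame into a single buffer while recording a per-slice offset. The names `slice_entry`, `frame_builder` and `frame_add_slice` are hypothetical and only stand in for the real per-slice control structure; this is an illustration under those assumptions, not the actual uAPI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-slice entry; the real control has many more fields. */
struct slice_entry {
	uint32_t offset;	/* where the slice starts in the frame buffer */
	uint32_t size;		/* slice length in bytes */
};

/* Hypothetical bookkeeping for one frame worth of slices. */
struct frame_builder {
	uint8_t *buf;			/* backing store of the OUTPUT buffer */
	size_t capacity;		/* size of buf in bytes */
	size_t used;			/* bytes filled so far (future bytesused) */
	struct slice_entry *slices;	/* per-slice table, one entry per slice */
	size_t max_slices;
	size_t num_slices;
};

/*
 * Append one slice to the frame buffer and record where it landed.
 * Returns 0 on success, -1 if the buffer or the slice table is full.
 */
static int frame_add_slice(struct frame_builder *fb,
			   const uint8_t *data, size_t size)
{
	assert(fb != NULL && data != NULL);

	if (fb->num_slices >= fb->max_slices ||
	    size > fb->capacity - fb->used)
		return -1;

	memcpy(fb->buf + fb->used, data, size);
	fb->slices[fb->num_slices].offset = (uint32_t)fb->used;
	fb->slices[fb->num_slices].size = (uint32_t)size;
	fb->num_slices++;
	fb->used += size;
	return 0;
}
```

Once every slice of the frame has been added, `used` is what would go into `bytesused` and the slice table is what would be passed as the array control.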

Sorry, I realize that my email wasn't very clear. What I meant to say
is that the spec should specify that "at least the minimum unit size
for decoding should be passed in a buffer" (that's maybe not the
clearest wording), instead of "one frame worth of".

I certainly don't mean to say that each slice should be held in a
separate buffer and totally agree with all the points you're making :)

I just think we should still allow userspace to pass slices with a
finer granularity than "all the slices required for one frame".

However, it looks like supporting this might be a problem for the
Rockchip decoder. Note that our Allwinner VPU can also process
all slices one after the other, but can be configured for slice-level
granularity while decoding (at least it looks that way).

Side point: After some discussions with Thierry Reding, who's looking
into the Tegra VPU (also stateless), it seems that using the Annex B
format for H.264 would be best for everyone. So that means including
the start code, NAL header and "raw" slice data. I guess the same
should apply to other codecs too. But that should be in the associated
pixfmt spec, not in this general document.
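
For illustration, a minimal sketch of what locating NAL units in an Annex B byte stream looks like, so that a client can keep the start code and NAL header in front of each slice. The helper names `annexb_next_nal` and `h264_nal_type` are hypothetical; the only facts relied upon are that start codes end in the bytes 00 00 01 and that the H.264 `nal_unit_type` is the low five bits of the NAL header byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return the offset of the first byte following the next 00 00 01 start
 * code found at or after pos, or len if no start code remains. A 4-byte
 * start code (00 00 00 01) is matched through its last three bytes.
 */
static size_t annexb_next_nal(const uint8_t *buf, size_t len, size_t pos)
{
	assert(buf != NULL);

	for (size_t i = pos; i + 3 <= len; i++) {
		if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1)
			return i + 3;
	}
	return len;
}

/* H.264 nal_unit_type: low five bits of the NAL header byte. */
static unsigned int h264_nal_type(uint8_t nal_header)
{
	return nal_header & 0x1f;
}
```

A client would call `annexb_next_nal()` repeatedly, feeding the previous return value back in as `pos`, and inspect the byte at the returned offset to tell slices apart from parameter sets.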

What do you think?

Cheers,

Paul

> Best regards,
> Tomasz
> 
> > > +* All the controls relevant to the format being decoded (see below for details).
> > > +
> > > +The contents of the source ``OUTPUT`` buffer, as well as the controls that must
> > > +be set on the request, depend on the active coded pixel format and might be
> > > +affected by codec-specific extended controls, as stated in documentation of each
> > > +format.
> > > +
> > > +A typical frame would thus be decoded using the following sequence:
> > > +
> > > +1. Queue an ``OUTPUT`` buffer containing one frame worth of encoded bitstream
> > 
> > Ditto here.
> > 
> > > +   data for the decoding request, using :c:func:`VIDIOC_QBUF`.
> > > +
> > > +    * **Required fields:**
> > > +
> > > +      ``index``
> > > +          index of the buffer being queued.
> > > +
> > > +      ``type``
> > > +          type of the buffer.
> > > +
> > > +      ``bytesused``
> > > +          number of bytes taken by the encoded data frame in the buffer.
> > > +
> > > +      ``flags``
> > > +          the ``V4L2_BUF_FLAG_REQUEST_FD`` flag must be set. In addition, if
> > > +          the decoded frame is to be used as a reference frame in the future,
> > > +          then the ``V4L2_BUF_FLAG_TAG`` flag must be set (it can also be set
> > > +          for non-reference frames if it helps the client).
> > > +
> > > +      ``request_fd``
> > > +          must be set to the file descriptor of the decoding request.
> > > +
> > > +      ``tag``
> > > +          if the ``V4L2_BUF_FLAG_TAG`` is set, then this must contain the tag
> > > +          for the frame that will be copied into the decoded frame buffer, and
> > > +          can be used to specify this frame as a reference frame for another
> > > +          one.
> > > +
> > > +   .. note::
> > > +
> > > +     The API currently requires one frame of encoded data per ``OUTPUT`` buffer,
> > > +     even though some encoded formats may present their data in smaller chunks
> > > +     (e.g. H.264's frames can be made of several slices that can be processed
> > > +     independently). It is currently the responsibility of the client to gather
> > > +     the different parts of a frame into a single ``OUTPUT`` buffer, while
> > > +     preserving the same layout as the original bitstream. This
> > > +     restriction may be lifted in the future.
> > 
> > And this part should probably be dropped too.
> > 
> > Cheers,
> > 
> > Paul
> > 
> > > +2. Set the codec-specific controls for the decoding request, using
> > > +   :c:func:`VIDIOC_S_EXT_CTRLS`.
> > > +
> > > +    * **Required fields:**
> > > +
> > > +      ``which``
> > > +          must be ``V4L2_CTRL_WHICH_REQUEST_VAL``.
> > > +
> > > +      ``request_fd``
> > > +          must be set to the file descriptor of the decoding request.
> > > +
> > > +      other fields
> > > +          other fields are set as usual when setting controls. The ``controls``
> > > +          array must contain all the codec-specific controls required to decode
> > > +          a frame.
> > > +
> > > +   .. note::
> > > +
> > > +      It is possible to specify the controls in different invocations of
> > > +      :c:func:`VIDIOC_S_EXT_CTRLS`, or to overwrite a previously set control, as
> > > +      long as ``request_fd`` and ``which`` are properly set. The controls state
> > > +      at the moment of request submission is the one that will be considered.
> > > +
> > > +   .. note::
> > > +
> > > +      The order in which steps 1 and 2 take place is interchangeable.
> > > +
> > > +3. Submit the request by invoking :c:func:`MEDIA_REQUEST_IOC_QUEUE` on the
> > > +   request FD.
> > > +
> > > +    If the request is submitted without an ``OUTPUT`` buffer, or if some of the
> > > +    required controls are missing from the request, then
> > > +    :c:func:`MEDIA_REQUEST_IOC_QUEUE` will return ``-ENOENT``. If more than one
> > > +    ``OUTPUT`` buffer is queued, then it will return ``-EINVAL``.
> > > +    :c:func:`MEDIA_REQUEST_IOC_QUEUE` returning non-zero means that no
> > > +    ``CAPTURE`` buffer will be produced for this request.
> > > +
> > > +``CAPTURE`` buffers must not be part of the request, and are queued
> > > +independently. They are returned in decode order (i.e. the same order as
> > > +``OUTPUT`` buffers were submitted).
> > > +
> > > +Runtime decoding errors are signaled by the dequeued ``CAPTURE`` buffers
> > > +carrying the ``V4L2_BUF_FLAG_ERROR`` flag. If a decoded reference frame has an
> > > +error, then all following decoded frames that refer to it also have the
> > > +``V4L2_BUF_FLAG_ERROR`` flag set, although the decoder will still try to
> > > +produce a (likely corrupted) frame.
> > > +
> > > +Buffer management while decoding
> > > +================================
> > > +Contrary to stateful decoders, a stateless decoder does not perform any kind of
> > > +buffer management: it only guarantees that dequeued ``CAPTURE`` buffers can be
> > > +used by the client for as long as they are not queued again. "Used" here
> > > +encompasses using the buffer for compositing, display, or as a reference frame
> > > +to decode a subsequent frame.
> > > +
> > > +Reference frames are specified by using the same tag that was set to the
> > > +``OUTPUT`` buffer of a frame into the relevant codec-specific structures that
> > > +are submitted as controls. This tag will be copied to the corresponding
> > > +``CAPTURE`` buffer, but can be used in any subsequent decoding request as soon
> > > +as the decoding request for that buffer is queued successfully. This means that
> > > +the client does not need to wait until a ``CAPTURE`` buffer with a given tag is
> > > +dequeued to start using that tag in reference frames. However, it must wait
> > > +until all frames referencing a given tag are dequeued before queuing the
> > > +referenced ``CAPTURE`` buffer again, since queueing a buffer effectively removes
> > > +its tag.
> > > +
> > > +When queuing a decoding request, the driver will increase the reference count of
> > > +all the resources associated with reference frames. This means that the client
> > > +can e.g. close the DMABUF file descriptors of the reference frame buffers if it
> > > +won't need it afterwards, as long as the V4L2 ``CAPTURE`` buffer of the
> > > +reference frame is not re-queued before all referencing frames are decoded.
> > > +
> > > +Seeking
> > > +=======
> > > +In order to seek, the client just needs to submit requests using input buffers
> > > +corresponding to the new stream position. It must however be aware that
> > > +resolution may have changed and follow the dynamic resolution change sequence in
> > > +that case. Also depending on the codec used, picture parameters (e.g. SPS/PPS
> > > +for H.264) may have changed and the client is responsible for making sure that a
> > > +valid state is sent to the decoder.
> > > +
> > > +The client is then free to ignore any returned ``CAPTURE`` buffer that comes
> > > +from the pre-seek position.
> > > +
> > > +Pause
> > > +=====
> > > +
> > > +In order to pause, the client can just cease queuing buffers onto the ``OUTPUT``
> > > +queue. Without source bitstream data, there is no data to process and the codec
> > > +will remain idle.
> > > +
> > > +Dynamic resolution change
> > > +=========================
> > > +
> > > +If the client detects a resolution change in the stream, it will need to perform
> > > +the initialization sequence again with the new resolution:
> > > +
> > > +1. Wait until all submitted requests have completed and dequeue the
> > > +   corresponding output buffers.
> > > +
> > > +2. Call :c:func:`VIDIOC_STREAMOFF` on both the ``OUTPUT`` and ``CAPTURE``
> > > +   queues.
> > > +
> > > +3. Free all ``CAPTURE`` buffers by calling :c:func:`VIDIOC_REQBUFS` on the
> > > +   ``CAPTURE`` queue with a buffer count of zero.
> > > +
> > > +4. Perform the initialization sequence again (minus the allocation of
> > > +   ``OUTPUT`` buffers), with the new resolution set on the ``OUTPUT`` queue.
> > > +   Note that due to resolution constraints, a different format may need to be
> > > +   picked on the ``CAPTURE`` queue.
> > > +
> > > +Drain
> > > +=====
> > > +
> > > +In order to drain the stream on a stateless decoder, the client just needs to
> > > +wait until all the submitted requests are completed. There is no need to send a
> > > +``V4L2_DEC_CMD_STOP`` command since requests are processed sequentially by the
> > > +decoder.
> > > +
> > > +End of stream
> > > +=============
> > > +
> > > +When the client detects that the end of stream is reached, it can simply stop
> > > +sending new frames to the decoder, drain the ``CAPTURE`` queue, and dispose of
> > > +the decoder as needed.
> > --
> > Paul Kocialkowski, Bootlin (formerly Free Electrons)
> > Embedded Linux and kernel engineering
> > https://bootlin.com
> >
Alexandre Courbot Jan. 23, 2019, 9:43 a.m. UTC | #4
On Tue, Jan 22, 2019 at 7:10 PM Paul Kocialkowski
<paul.kocialkowski@bootlin.com> wrote:
>
> Hi,
>
> On Tue, 2019-01-22 at 17:19 +0900, Tomasz Figa wrote:
> > Hi Paul,
> >
> > On Fri, Dec 7, 2018 at 5:30 PM Paul Kocialkowski
> > <paul.kocialkowski@bootlin.com> wrote:
> > > Hi,
> > >
> > > Thanks for this new version! I only have one comment left, see below.
> > >
> > > On Wed, 2018-12-05 at 19:01 +0900, Alexandre Courbot wrote:
> > > > Documents the protocol that user-space should follow when
> > > > communicating with stateless video decoders.
> > > >
> > > > The stateless video decoding API makes use of the new request and tags
> > > > APIs. While it has been implemented with the Cedrus driver so far, it
> > > > should probably still be considered staging for a short while.
> > > >
> > > > Signed-off-by: Alexandre Courbot <acourbot@chromium.org>
> > > > ---
> > > > Removing the RFC flag this time. Changes since RFCv3:
> > > >
> > > > * Included Tomasz and Hans feedback,
> > > > * Expanded the decoding section to better describe the use of requests,
> > > > * Use the tags API.
> > > >
> > > >  Documentation/media/uapi/v4l/dev-codec.rst    |   5 +
> > > >  .../media/uapi/v4l/dev-stateless-decoder.rst  | 399 ++++++++++++++++++
> > > >  2 files changed, 404 insertions(+)
> > > >  create mode 100644 Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> > > >
> > > > diff --git a/Documentation/media/uapi/v4l/dev-codec.rst b/Documentation/media/uapi/v4l/dev-codec.rst
> > > > index c61e938bd8dc..3e6a3e883f11 100644
> > > > --- a/Documentation/media/uapi/v4l/dev-codec.rst
> > > > +++ b/Documentation/media/uapi/v4l/dev-codec.rst
> > > > @@ -6,6 +6,11 @@
> > > >  Codec Interface
> > > >  ***************
> > > >
> > > > +.. toctree::
> > > > +    :maxdepth: 1
> > > > +
> > > > +    dev-stateless-decoder
> > > > +
> > > >  A V4L2 codec can compress, decompress, transform, or otherwise convert
> > > >  video data from one format into another format, in memory. Typically
> > > >  such devices are memory-to-memory devices (i.e. devices with the
> > > > diff --git a/Documentation/media/uapi/v4l/dev-stateless-decoder.rst b/Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> > > > new file mode 100644
> > > > index 000000000000..7a781c89bd59
> > > > --- /dev/null
> > > > +++ b/Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> > > > @@ -0,0 +1,399 @@
> > > > +.. -*- coding: utf-8; mode: rst -*-
> > > > +
> > > > +.. _stateless_decoder:
> > > > +
> > > > +**************************************************
> > > > +Memory-to-memory Stateless Video Decoder Interface
> > > > +**************************************************
> > > > +
> > > > +A stateless decoder is a decoder that works without retaining any kind of state
> > > > +between processing frames. This means that each frame is decoded independently
> > > > +of any previous and future frames, and that the client is responsible for
> > > > +maintaining the decoding state and providing it to the decoder with each
> > > > +decoding request. This is in contrast to the stateful video decoder interface,
> > > > +where the hardware and driver maintain the decoding state and all the client
> > > > +has to do is to provide the raw encoded stream.
> > > > +
> > > > +This section describes how user-space ("the client") is expected to communicate
> > > > +with such decoders in order to successfully decode an encoded stream. Compared
> > > > +to stateful codecs, the decoder/client sequence is simpler, but the cost of
> > > > +this simplicity is extra complexity in the client which must maintain a
> > > > +consistent decoding state.
> > > > +
> > > > +Stateless decoders make use of the request API and buffer tags. A stateless
> > > > +decoder must thus expose the following capabilities on its queues when
> > > > +:c:func:`VIDIOC_REQBUFS` or :c:func:`VIDIOC_CREATE_BUFS` are invoked:
> > > > +
> > > > +* The ``V4L2_BUF_CAP_SUPPORTS_REQUESTS`` capability must be set on the
> > > > +  ``OUTPUT`` queue,
> > > > +
> > > > +* The ``V4L2_BUF_CAP_SUPPORTS_TAGS`` capability must be set on the ``OUTPUT``
> > > > +  and ``CAPTURE`` queues,
> > > > +
> > >
> > > [...]
> > >
> > > > +Decoding
> > > > +========
> > > > +
> > > > +For each frame, the client is responsible for submitting a request to which the
> > > > +following is attached:
> > > > +
> > > > +* Exactly one frame worth of encoded data in a buffer submitted to the
> > > > +  ``OUTPUT`` queue,
> > >
> > > Although this is still the case in the cedrus driver (but will be fixed
> > > eventually), this requirement should be dropped because metadata is
> > > per-slice and not per-picture in the formats we're currently aiming to
> > > support.
> > >
> > > I think it would be safer to mention something like filling the output
> > > buffer with the minimum unit size for the selected output format, to
> > > which the associated metadata applies.
> >
> > I'm not sure it's a good idea. Some of the reasons why I think so:
> >  1) There are streams that can have even 32 slices. With that, you
> > instantly run out of V4L2 buffers even just for 1 frame.
> >  2) The Rockchip hardware which seems to just pick all the slices one
> > after another and which was the reason to actually put the slice data
> > in the buffer like that.
> >  3) Not all the metadata is per-slice. Actually most of the metadata
> > is per frame and only what is located inside v4l2_h264_slice_param is
> > per-slice. The corresponding control is an array, which has an entry
> > for each slice in the buffer. Each entry includes an offset field,
> > which points to the place in the buffer where the slice is located.
>
> Sorry, I realize that my email wasn't very clear. What I meant to say
> is that the spec should specify that "at least the minimum unit size
> for decoding should be passed in a buffer" (that's maybe not the
> clearest wording), instead of "one frame worth of".
>
> I certainly don't mean to say that each slice should be held in a
> separate buffer and totally agree with all the points you're making :)

Thanks for clarifying. I will update the document and post v3 accordingly.

> I just think we should still allow userspace to pass slices with a
> finer granularity than "all the slices required for one frame".

I'm afraid that doing so could open the door to some ambiguities. If
you allow that, then are you also allowed to send more than one frame
if the decode parameters do not change? How do drivers that only
support full frames react when handed only parts of a frame?

>
> However, it looks like supporting this might be a problem for the
> rockchip decoder though. Note that our Allwinner VPU can also process
> all slices one after the other, but can be configured for slice-level
> granularity while decoding (at least it looks that way).
>
> Side point: After some discussions with Thierry Reding, who's looking
> into the Tegra VPU (also stateless), it seems that using the Annex B
> format for H.264 would be best for everyone. So that means including
> the start code, NAL header and "raw" slice data. I guess the same
> should apply to other codecs too. But that should be in the associated
> pixfmt spec, not in this general document.
>
> What do you think?
>
> Cheers,
>
> Paul
>
> > Best regards,
> > Tomasz
> >
> > > > +* All the controls relevant to the format being decoded (see below for details).
> > > > +
> > > > +The contents of the source ``OUTPUT`` buffer, as well as the controls that must
> > > > +be set on the request, depend on the active coded pixel format and might be
> > > > +affected by codec-specific extended controls, as stated in documentation of each
> > > > +format.
> > > > +
> > > > +A typical frame would thus be decoded using the following sequence:
> > > > +
> > > > +1. Queue an ``OUTPUT`` buffer containing one frame worth of encoded bitstream
> > >
> > > Ditto here.
> > >
> > > > +   data for the decoding request, using :c:func:`VIDIOC_QBUF`.
> > > > +
> > > > +    * **Required fields:**
> > > > +
> > > > +      ``index``
> > > > +          index of the buffer being queued.
> > > > +
> > > > +      ``type``
> > > > +          type of the buffer.
> > > > +
> > > > +      ``bytesused``
> > > > +          number of bytes taken by the encoded data frame in the buffer.
> > > > +
> > > > +      ``flags``
> > > > +          the ``V4L2_BUF_FLAG_REQUEST_FD`` flag must be set. In addition, if
> > > > +          the decoded frame is to be used as a reference frame in the future,
> > > > +          then the ``V4L2_BUF_FLAG_TAG`` flag must be set (it can also be set
> > > > +          for non-reference frames if it helps the client).
> > > > +
> > > > +      ``request_fd``
> > > > +          must be set to the file descriptor of the decoding request.
> > > > +
> > > > +      ``tag``
> > > > +          if the ``V4L2_BUF_FLAG_TAG`` is set, then this must contain the tag
> > > > +          for the frame that will be copied into the decoded frame buffer, and
> > > > +          can be used to specify this frame as a reference frame for another
> > > > +          one.
> > > > +
> > > > +   .. note::
> > > > +
> > > > +     The API currently requires one frame of encoded data per ``OUTPUT`` buffer,
> > > > +     even though some encoded formats may present their data in smaller chunks
> > > > +     (e.g. H.264's frames can be made of several slices that can be processed
> > > > +     independently). It is currently the responsibility of the client to gather
> > > > +     the different parts of a frame into a single ``OUTPUT`` buffer, while
> > > > +     preserving the same layout as the original bitstream. This
> > > > +     restriction may be lifted in the future.
> > >
> > > And this part should probably be dropped too.
> > >
> > > Cheers,
> > >
> > > Paul
> > >
> > > > +2. Set the codec-specific controls for the decoding request, using
> > > > +   :c:func:`VIDIOC_S_EXT_CTRLS`.
> > > > +
> > > > +    * **Required fields:**
> > > > +
> > > > +      ``which``
> > > > +          must be ``V4L2_CTRL_WHICH_REQUEST_VAL``.
> > > > +
> > > > +      ``request_fd``
> > > > +          must be set to the file descriptor of the decoding request.
> > > > +
> > > > +      other fields
> > > > +          other fields are set as usual when setting controls. The ``controls``
> > > > +          array must contain all the codec-specific controls required to decode
> > > > +          a frame.
> > > > +
> > > > +   .. note::
> > > > +
> > > > +      It is possible to specify the controls in different invocations of
> > > > +      :c:func:`VIDIOC_S_EXT_CTRLS`, or to overwrite a previously set control, as
> > > > +      long as ``request_fd`` and ``which`` are properly set. The controls state
> > > > +      at the moment of request submission is the one that will be considered.
> > > > +
> > > > +   .. note::
> > > > +
> > > > +      The order in which steps 1 and 2 take place is interchangeable.
> > > > +
> > > > +3. Submit the request by invoking :c:func:`MEDIA_REQUEST_IOC_QUEUE` on the
> > > > +   request FD.
> > > > +
> > > > +    If the request is submitted without an ``OUTPUT`` buffer, or if some of the
> > > > +    required controls are missing from the request, then
> > > > +    :c:func:`MEDIA_REQUEST_IOC_QUEUE` will return ``-ENOENT``. If more than one
> > > > +    ``OUTPUT`` buffer is queued, then it will return ``-EINVAL``.
> > > > +    :c:func:`MEDIA_REQUEST_IOC_QUEUE` returning non-zero means that no
> > > > +    ``CAPTURE`` buffer will be produced for this request.
> > > > +
> > > > +``CAPTURE`` buffers must not be part of the request, and are queued
> > > > +independently. They are returned in decode order (i.e. the same order as
> > > > +``OUTPUT`` buffers were submitted).
> > > > +
> > > > +Runtime decoding errors are signaled by the dequeued ``CAPTURE`` buffers
> > > > +carrying the ``V4L2_BUF_FLAG_ERROR`` flag. If a decoded reference frame has an
> > > > +error, then all following decoded frames that refer to it also have the
> > > > +``V4L2_BUF_FLAG_ERROR`` flag set, although the decoder will still try to
> > > > +produce a (likely corrupted) frame.
> > > > +
> > > > +Buffer management while decoding
> > > > +================================
> > > > +Contrary to stateful decoders, a stateless decoder does not perform any kind of
> > > > +buffer management: it only guarantees that dequeued ``CAPTURE`` buffers can be
> > > > +used by the client for as long as they are not queued again. "Used" here
> > > > +encompasses using the buffer for compositing, display, or as a reference frame
> > > > +to decode a subsequent frame.
> > > > +
> > > > +Reference frames are specified by using the same tag that was set to the
> > > > +``OUTPUT`` buffer of a frame into the relevant codec-specific structures that
> > > > +are submitted as controls. This tag will be copied to the corresponding
> > > > +``CAPTURE`` buffer, but can be used in any subsequent decoding request as soon
> > > > +as the decoding request for that buffer is queued successfully. This means that
> > > > +the client does not need to wait until a ``CAPTURE`` buffer with a given tag is
> > > > +dequeued to start using that tag in reference frames. However, it must wait
> > > > +until all frames referencing a given tag are dequeued before queuing the
> > > > +referenced ``CAPTURE`` buffer again, since queueing a buffer effectively removes
> > > > +its tag.
> > > > +
> > > > +When queuing a decoding request, the driver will increase the reference count of
> > > > +all the resources associated with reference frames. This means that the client
> > > > +can e.g. close the DMABUF file descriptors of the reference frame buffers if it
> > > > +won't need it afterwards, as long as the V4L2 ``CAPTURE`` buffer of the
> > > > +reference frame is not re-queued before all referencing frames are decoded.
> > > > +
> > > > +Seeking
> > > > +=======
> > > > +In order to seek, the client just needs to submit requests using input buffers
> > > > +corresponding to the new stream position. It must however be aware that
> > > > +resolution may have changed and follow the dynamic resolution change sequence in
> > > > +that case. Also depending on the codec used, picture parameters (e.g. SPS/PPS
> > > > +for H.264) may have changed and the client is responsible for making sure that a
> > > > +valid state is sent to the decoder.
> > > > +
> > > > +The client is then free to ignore any returned ``CAPTURE`` buffer that comes
> > > > +from the pre-seek position.
> > > > +
> > > > +Pause
> > > > +=====
> > > > +
> > > > +In order to pause, the client can just cease queuing buffers onto the ``OUTPUT``
> > > > +queue. Without source bitstream data, there is no data to process and the codec
> > > > +will remain idle.
> > > > +
> > > > +Dynamic resolution change
> > > > +=========================
> > > > +
> > > > +If the client detects a resolution change in the stream, it will need to perform
> > > > +the initialization sequence again with the new resolution:
> > > > +
> > > > +1. Wait until all submitted requests have completed and dequeue the
> > > > +   corresponding output buffers.
> > > > +
> > > > +2. Call :c:func:`VIDIOC_STREAMOFF` on both the ``OUTPUT`` and ``CAPTURE``
> > > > +   queues.
> > > > +
> > > > +3. Free all ``CAPTURE`` buffers by calling :c:func:`VIDIOC_REQBUFS` on the
> > > > +   ``CAPTURE`` queue with a buffer count of zero.
> > > > +
> > > > +4. Perform the initialization sequence again (minus the allocation of
> > > > +   ``OUTPUT`` buffers), with the new resolution set on the ``OUTPUT`` queue.
> > > > +   Note that due to resolution constraints, a different format may need to be
> > > > +   picked on the ``CAPTURE`` queue.
> > > > +
> > > > +Drain
> > > > +=====
> > > > +
> > > > +In order to drain the stream on a stateless decoder, the client just needs to
> > > > +wait until all the submitted requests are completed. There is no need to send a
> > > > +``V4L2_DEC_CMD_STOP`` command since requests are processed sequentially by the
> > > > +decoder.
> > > > +
> > > > +End of stream
> > > > +=============
> > > > +
> > > > +When the client detects that the end of stream is reached, it can simply stop
> > > > +sending new frames to the decoder, drain the ``CAPTURE`` queue, and dispose of
> > > > +the decoder as needed.
> > > --
> > > Paul Kocialkowski, Bootlin (formerly Free Electrons)
> > > Embedded Linux and kernel engineering
> > > https://bootlin.com
> > >
> --
> Paul Kocialkowski, Bootlin
> Embedded Linux and kernel engineering
> https://bootlin.com
>
Paul Kocialkowski Jan. 23, 2019, 10:41 a.m. UTC | #5
Hi Alex,

On Wed, 2019-01-23 at 18:43 +0900, Alexandre Courbot wrote:
> On Tue, Jan 22, 2019 at 7:10 PM Paul Kocialkowski
> <paul.kocialkowski@bootlin.com> wrote:
> > Hi,
> > 
> > On Tue, 2019-01-22 at 17:19 +0900, Tomasz Figa wrote:
> > > Hi Paul,
> > > 
> > > On Fri, Dec 7, 2018 at 5:30 PM Paul Kocialkowski
> > > <paul.kocialkowski@bootlin.com> wrote:
> > > > Hi,
> > > > 
> > > > Thanks for this new version! I only have one comment left, see below.
> > > > 
> > > > On Wed, 2018-12-05 at 19:01 +0900, Alexandre Courbot wrote:
> > > > > Documents the protocol that user-space should follow when
> > > > > communicating with stateless video decoders.
> > > > > 
> > > > > The stateless video decoding API makes use of the new request and tags
> > > > > APIs. While it has been implemented with the Cedrus driver so far, it
> > > > > should probably still be considered staging for a short while.
> > > > > 
> > > > > Signed-off-by: Alexandre Courbot <acourbot@chromium.org>
> > > > > ---
> > > > > Removing the RFC flag this time. Changes since RFCv3:
> > > > > 
> > > > > * Included Tomasz and Hans feedback,
> > > > > * Expanded the decoding section to better describe the use of requests,
> > > > > * Use the tags API.
> > > > > 
> > > > >  Documentation/media/uapi/v4l/dev-codec.rst    |   5 +
> > > > >  .../media/uapi/v4l/dev-stateless-decoder.rst  | 399 ++++++++++++++++++
> > > > >  2 files changed, 404 insertions(+)
> > > > >  create mode 100644 Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> > > > > 
> > > > > diff --git a/Documentation/media/uapi/v4l/dev-codec.rst b/Documentation/media/uapi/v4l/dev-codec.rst
> > > > > index c61e938bd8dc..3e6a3e883f11 100644
> > > > > --- a/Documentation/media/uapi/v4l/dev-codec.rst
> > > > > +++ b/Documentation/media/uapi/v4l/dev-codec.rst
> > > > > @@ -6,6 +6,11 @@
> > > > >  Codec Interface
> > > > >  ***************
> > > > > 
> > > > > +.. toctree::
> > > > > +    :maxdepth: 1
> > > > > +
> > > > > +    dev-stateless-decoder
> > > > > +
> > > > >  A V4L2 codec can compress, decompress, transform, or otherwise convert
> > > > >  video data from one format into another format, in memory. Typically
> > > > >  such devices are memory-to-memory devices (i.e. devices with the
> > > > > diff --git a/Documentation/media/uapi/v4l/dev-stateless-decoder.rst b/Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> > > > > new file mode 100644
> > > > > index 000000000000..7a781c89bd59
> > > > > --- /dev/null
> > > > > +++ b/Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> > > > > @@ -0,0 +1,399 @@
> > > > > +.. -*- coding: utf-8; mode: rst -*-
> > > > > +
> > > > > +.. _stateless_decoder:
> > > > > +
> > > > > +**************************************************
> > > > > +Memory-to-memory Stateless Video Decoder Interface
> > > > > +**************************************************
> > > > > +
> > > > > +A stateless decoder is a decoder that works without retaining any kind of state
> > > > > +between processing frames. This means that each frame is decoded independently
> > > > > +of any previous and future frames, and that the client is responsible for
> > > > > +maintaining the decoding state and providing it to the decoder with each
> > > > > +decoding request. This is in contrast to the stateful video decoder interface,
> > > > > +where the hardware and driver maintain the decoding state and all the client
> > > > > +has to do is to provide the raw encoded stream.
> > > > > +
> > > > > +This section describes how user-space ("the client") is expected to communicate
> > > > > +with such decoders in order to successfully decode an encoded stream. Compared
> > > > > +to stateful codecs, the decoder/client sequence is simpler, but the cost of
> > > > > +this simplicity is extra complexity in the client which must maintain a
> > > > > +consistent decoding state.
> > > > > +
> > > > > +Stateless decoders make use of the request API and buffer tags. A stateless
> > > > > +decoder must thus expose the following capabilities on its queues when
> > > > > +:c:func:`VIDIOC_REQBUFS` or :c:func:`VIDIOC_CREATE_BUFS` are invoked:
> > > > > +
> > > > > +* The ``V4L2_BUF_CAP_SUPPORTS_REQUESTS`` capability must be set on the
> > > > > +  ``OUTPUT`` queue,
> > > > > +
> > > > > +* The ``V4L2_BUF_CAP_SUPPORTS_TAGS`` capability must be set on the ``OUTPUT``
> > > > > +  and ``CAPTURE`` queues,
> > > > > +
> > > > 
> > > > [...]
> > > > 
> > > > > +Decoding
> > > > > +========
> > > > > +
> > > > > +For each frame, the client is responsible for submitting a request to which the
> > > > > +following is attached:
> > > > > +
> > > > > +* Exactly one frame worth of encoded data in a buffer submitted to the
> > > > > +  ``OUTPUT`` queue,
> > > > 
> > > > Although this is still the case in the cedrus driver (but will be fixed
> > > > eventually), this requirement should be dropped because metadata is
> > > > per-slice and not per-picture in the formats we're currently aiming to
> > > > support.
> > > > 
> > > > I think it would be safer to mention something like filling the output
> > > > buffer with the minimum unit size for the selected output format, to
> > > > which the associated metadata applies.
> > > 
> > > I'm not sure it's a good idea. Some of the reasons why I think so:
> > >  1) There are streams that can have even 32 slices. With that, you
> > > instantly run out of V4L2 buffers even just for 1 frame.
> > >  2) The Rockchip hardware which seems to just pick all the slices one
> > > after another and which was the reason to actually put the slice data
> > > in the buffer like that.
> > >  3) Not all the metadata is per-slice. Actually most of the metadata
> > > is per frame and only what is located inside v4l2_h264_slice_param is
> > > per-slice. The corresponding control is an array, which has an entry
> > > for each slice in the buffer. Each entry includes an offset field,
> > > which points to the place in the buffer where the slice is located.
> > 
> > Sorry, I realize that my email wasn't very clear. What I meant to say
> > is that the spec should specify that "at least the minimum unit size
> > for decoding should be passed in a buffer" (that's maybe not the
> > clearest wording), instead of "one frame worth of".
> > 
> > I certainly don't mean to say that each slice should be held in a
> > separate buffer and totally agree with all the points you're making :)
> 
> Thanks for clarifying. I will update the document and post v3 accordingly.
> 
> > I just think we should still allow userspace to pass slices with a
> > finer granularity than "all the slices required for one frame".
> 
> I'm afraid that doing so could open the door to some ambiguities. If
> you allow that, then are you also allowed to send more than one frame
> if the decode parameters do not change? How do drivers that only
> > support full frames react when handed only parts of a frame?

IIRC the ability to pass individual slices was brought up regarding a
potential latency benefit, but I doubt it would really be that
significant.

Thinking about it with the points you mentioned in mind, I guess the
downsides are much more significant than the potential gain.

So let's stick with requiring all the slices for a frame then!
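
For illustration, gathering all the slices of a frame into a single
OUTPUT buffer could look roughly like the sketch below. The
slice_entry struct and gather_slices() are hypothetical names, loosely
modelled on the per-slice offset field described above; they are not
part of the V4L2 uAPI, and the exact layout would be defined by the
codec-specific controls.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-slice record: where one slice sits in the OUTPUT buffer. */
struct slice_entry {
	uint32_t offset; /* byte offset of the slice from the buffer start */
	uint32_t size;   /* size of the slice in bytes */
};

/*
 * Copy all the slices of one frame back to back into buf, keeping the
 * bitstream order, and record where each slice landed.
 *
 * Returns the total number of bytes written (the value a client would
 * put in bytesused), or 0 if the slices do not fit in the buffer.
 */
static size_t gather_slices(uint8_t *buf, size_t buf_size,
			    const uint8_t *const *slices,
			    const size_t *sizes, size_t num_slices,
			    struct slice_entry *entries)
{
	size_t used = 0;

	for (size_t i = 0; i < num_slices; i++) {
		/* used never exceeds buf_size, so this cannot underflow. */
		if (sizes[i] > buf_size - used)
			return 0;

		memcpy(buf + used, slices[i], sizes[i]);
		entries[i].offset = (uint32_t)used;
		entries[i].size = (uint32_t)sizes[i];
		used += sizes[i];
	}

	return used;
}
```

With this shape, a stream with 32 slices per frame still consumes only
one V4L2 buffer per frame, and the per-slice entries tell the driver
where each slice starts.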

Cheers,

Paul

> > However, it looks like supporting this might be a problem for the
> > rockchip decoder though. Note that our Allwinner VPU can also process
> > all slices one after the other, but can be configured for slice-level
> > granularity while decoding (at least it looks that way).
> > 
> > > Side point: After some discussions with Thierry Reding, who's looking
> > > into the Tegra VPU (also stateless), it seems that using the Annex B
> > > format for H.264 would be best for everyone. So that means including
> > the start code, NAL header and "raw" slice data. I guess the same
> > should apply to other codecs too. But that should be in the associated
> > pixfmt spec, not in this general document.
> > 
> > > What do you think?
> > 
> > Cheers,
> > 
> > Paul
> > 
> > > Best regards,
> > > Tomasz
> > > 
> > > > > +* All the controls relevant to the format being decoded (see below for details).
> > > > > +
> > > > > +The contents of the source ``OUTPUT`` buffer, as well as the controls that must
> > > > > +be set on the request, depend on the active coded pixel format and might be
> > > > > +affected by codec-specific extended controls, as stated in documentation of each
> > > > > +format.
> > > > > +
> > > > > +A typical frame would thus be decoded using the following sequence:
> > > > > +
> > > > > +1. Queue an ``OUTPUT`` buffer containing one frame worth of encoded bitstream
> > > > 
> > > > Ditto here.
> > > > 
> > > > > +   data for the decoding request, using :c:func:`VIDIOC_QBUF`.
> > > > > +
> > > > > +    * **Required fields:**
> > > > > +
> > > > > +      ``index``
> > > > > +          index of the buffer being queued.
> > > > > +
> > > > > +      ``type``
> > > > > +          type of the buffer.
> > > > > +
> > > > > +      ``bytesused``
> > > > > +          number of bytes taken by the encoded data frame in the buffer.
> > > > > +
> > > > > +      ``flags``
> > > > > +          the ``V4L2_BUF_FLAG_REQUEST_FD`` flag must be set. In addition, if
> > > > > +          the decoded frame is to be used as a reference frame in the future,
> > > > > +          then the ``V4L2_BUF_FLAG_TAG`` flag must be set (it can also be set
> > > > > +          for non-reference frames if it helps the client).
> > > > > +
> > > > > +      ``request_fd``
> > > > > +          must be set to the file descriptor of the decoding request.
> > > > > +
> > > > > +      ``tag``
> > > > > +          if the ``V4L2_BUF_FLAG_TAG`` is set, then this must contain the tag
> > > > > +          for the frame that will be copied into the decoded frame buffer, and
> > > > > +          can be used to specify this frame as a reference frame for another
> > > > > +          one.
> > > > > +
> > > > > +   .. note::
> > > > > +
> > > > > +     The API currently requires one frame of encoded data per ``OUTPUT`` buffer,
> > > > > +     even though some encoded formats may present their data in smaller chunks
> > > > > +     (e.g. H.264's frames can be made of several slices that can be processed
> > > > > +     independently). It is currently the responsibility of the client to gather
> > > > > +     the different parts of a frame into a single ``OUTPUT`` buffer, while
> > > > > +     preserving the same layout as the original bitstream. This
> > > > > +     restriction may be lifted in the future.
> > > > 
> > > > And this part should probably be dropped too.
> > > > 
> > > > Cheers,
> > > > 
> > > > Paul
> > > > 
> > > > > +2. Set the codec-specific controls for the decoding request, using
> > > > > +   :c:func:`VIDIOC_S_EXT_CTRLS`.
> > > > > +
> > > > > +    * **Required fields:**
> > > > > +
> > > > > +      ``which``
> > > > > +          must be ``V4L2_CTRL_WHICH_REQUEST_VAL``.
> > > > > +
> > > > > +      ``request_fd``
> > > > > +          must be set to the file descriptor of the decoding request.
> > > > > +
> > > > > +      other fields
> > > > > +          other fields are set as usual when setting controls. The ``controls``
> > > > > +          array must contain all the codec-specific controls required to decode
> > > > > +          a frame.
> > > > > +
> > > > > +   .. note::
> > > > > +
> > > > > +      It is possible to specify the controls in different invocations of
> > > > > +      :c:func:`VIDIOC_S_EXT_CTRLS`, or to overwrite a previously set control, as
> > > > > +      long as ``request_fd`` and ``which`` are properly set. The controls state
> > > > > +      at the moment of request submission is the one that will be considered.
> > > > > +
> > > > > +   .. note::
> > > > > +
> > > > > +      The order in which steps 1 and 2 take place is interchangeable.
> > > > > +
> > > > > +3. Submit the request by invoking :c:func:`MEDIA_REQUEST_IOC_QUEUE` on the
> > > > > +   request FD.
> > > > > +
> > > > > +    If the request is submitted without an ``OUTPUT`` buffer, or if some of the
> > > > > +    required controls are missing from the request, then
> > > > > +    :c:func:`MEDIA_REQUEST_IOC_QUEUE` will return ``-ENOENT``. If more than one
> > > > > +    ``OUTPUT`` buffer is queued, then it will return ``-EINVAL``.
> > > > > +    :c:func:`MEDIA_REQUEST_IOC_QUEUE` returning non-zero means that no
> > > > > +    ``CAPTURE`` buffer will be produced for this request.
> > > > > +
> > > > > +``CAPTURE`` buffers must not be part of the request, and are queued
> > > > > +independently. They are returned in decode order (i.e. the same order as
> > > > > +``OUTPUT`` buffers were submitted).
> > > > > +
> > > > > +Runtime decoding errors are signaled by the dequeued ``CAPTURE`` buffers
> > > > > +carrying the ``V4L2_BUF_FLAG_ERROR`` flag. If a decoded reference frame has an
> > > > > +error, then all following decoded frames that refer to it also have the
> > > > > +``V4L2_BUF_FLAG_ERROR`` flag set, although the decoder will still try to
> > > > > +produce a (likely corrupted) frame.
> > > > > +
> > > > > +Buffer management while decoding
> > > > > +================================
> > > > > +Contrary to stateful decoders, a stateless decoder does not perform any kind of
> > > > > +buffer management: it only guarantees that dequeued ``CAPTURE`` buffers can be
> > > > > +used by the client for as long as they are not queued again. "Used" here
> > > > > +encompasses using the buffer for compositing, display, or as a reference frame
> > > > > +to decode a subsequent frame.
> > > > > +
> > > > > +Reference frames are specified by using the same tag that was set to the
> > > > > +``OUTPUT`` buffer of a frame into the relevant codec-specific structures that
> > > > > +are submitted as controls. This tag will be copied to the corresponding
> > > > > +``CAPTURE`` buffer, but can be used in any subsequent decoding request as soon
> > > > > +as the decoding request for that buffer is queued successfully. This means that
> > > > > +the client does not need to wait until a ``CAPTURE`` buffer with a given tag is
> > > > > +dequeued to start using that tag in reference frames. However, it must wait
> > > > > +until all frames referencing a given tag are dequeued before queuing the
> > > > > +referenced ``CAPTURE`` buffer again, since queueing a buffer effectively removes
> > > > > +its tag.
> > > > > +
> > > > > +When queuing a decoding request, the driver will increase the reference count of
> > > > > +all the resources associated with reference frames. This means that the client
> > > > > +can e.g. close the DMABUF file descriptors of the reference frame buffers if it
> > > > > +won't need them afterwards, as long as the V4L2 ``CAPTURE`` buffer of the
> > > > > +reference frame is not re-queued before all referencing frames are decoded.
> > > > > +
> > > > > +Seeking
> > > > > +=======
> > > > > +In order to seek, the client just needs to submit requests using input buffers
> > > > > +corresponding to the new stream position. It must however be aware that
> > > > > +resolution may have changed and follow the dynamic resolution change sequence in
> > > > > +that case. Also depending on the codec used, picture parameters (e.g. SPS/PPS
> > > > > +for H.264) may have changed and the client is responsible for making sure that a
> > > > > +valid state is sent to the decoder.
> > > > > +
> > > > > +The client is then free to ignore any returned ``CAPTURE`` buffer that comes
> > > > > +from the pre-seek position.
> > > > > +
> > > > > +Pause
> > > > > +=====
> > > > > +
> > > > > +In order to pause, the client can just cease queuing buffers onto the ``OUTPUT``
> > > > > +queue. Without source bitstream data, there is no data to process and the codec
> > > > > +will remain idle.
> > > > > +
> > > > > +Dynamic resolution change
> > > > > +=========================
> > > > > +
> > > > > +If the client detects a resolution change in the stream, it will need to perform
> > > > > +the initialization sequence again with the new resolution:
> > > > > +
> > > > > +1. Wait until all submitted requests have completed and dequeue the
> > > > > +   corresponding output buffers.
> > > > > +
> > > > > +2. Call :c:func:`VIDIOC_STREAMOFF` on both the ``OUTPUT`` and ``CAPTURE``
> > > > > +   queues.
> > > > > +
> > > > > +3. Free all ``CAPTURE`` buffers by calling :c:func:`VIDIOC_REQBUFS` on the
> > > > > +   ``CAPTURE`` queue with a buffer count of zero.
> > > > > +
> > > > > +4. Perform the initialization sequence again (minus the allocation of
> > > > > +   ``OUTPUT`` buffers), with the new resolution set on the ``OUTPUT`` queue.
> > > > > +   Note that due to resolution constraints, a different format may need to be
> > > > > +   picked on the ``CAPTURE`` queue.
> > > > > +
> > > > > +Drain
> > > > > +=====
> > > > > +
> > > > > +In order to drain the stream on a stateless decoder, the client just needs to
> > > > > +wait until all the submitted requests are completed. There is no need to send a
> > > > > +``V4L2_DEC_CMD_STOP`` command since requests are processed sequentially by the
> > > > > +decoder.
> > > > > +
> > > > > +End of stream
> > > > > +=============
> > > > > +
> > > > > +When the client detects that the end of stream is reached, it can simply stop
> > > > > +sending new frames to the decoder, drain the ``CAPTURE`` queue, and dispose of
> > > > > +the decoder as needed.
> > > > --
> > > > Paul Kocialkowski, Bootlin (formerly Free Electrons)
> > > > Embedded Linux and kernel engineering
> > > > https://bootlin.com
> > > > 
> > --
> > Paul Kocialkowski, Bootlin
> > Embedded Linux and kernel engineering
> > https://bootlin.com
> >
Tomasz Figa Jan. 24, 2019, 8:07 a.m. UTC | #6
On Wed, Jan 23, 2019 at 7:42 PM Paul Kocialkowski
<paul.kocialkowski@bootlin.com> wrote:
>
> Hi Alex,
>
> On Wed, 2019-01-23 at 18:43 +0900, Alexandre Courbot wrote:
> > On Tue, Jan 22, 2019 at 7:10 PM Paul Kocialkowski
> > <paul.kocialkowski@bootlin.com> wrote:
> > > Hi,
> > >
> > > On Tue, 2019-01-22 at 17:19 +0900, Tomasz Figa wrote:
> > > > Hi Paul,
> > > >
> > > > On Fri, Dec 7, 2018 at 5:30 PM Paul Kocialkowski
> > > > <paul.kocialkowski@bootlin.com> wrote:
> > > > > Hi,
> > > > >
> > > > > Thanks for this new version! I only have one comment left, see below.
> > > > >
> > > > > On Wed, 2018-12-05 at 19:01 +0900, Alexandre Courbot wrote:
> > > > > > Documents the protocol that user-space should follow when
> > > > > > communicating with stateless video decoders.
> > > > > >
> > > > > > The stateless video decoding API makes use of the new request and tags
> > > > > > APIs. While it has been implemented with the Cedrus driver so far, it
> > > > > > should probably still be considered staging for a short while.
> > > > > >
> > > > > > Signed-off-by: Alexandre Courbot <acourbot@chromium.org>
> > > > > > ---
> > > > > > Removing the RFC flag this time. Changes since RFCv3:
> > > > > >
> > > > > > * Included Tomasz and Hans feedback,
> > > > > > * Expanded the decoding section to better describe the use of requests,
> > > > > > * Use the tags API.
> > > > > >
> > > > > >  Documentation/media/uapi/v4l/dev-codec.rst    |   5 +
> > > > > >  .../media/uapi/v4l/dev-stateless-decoder.rst  | 399 ++++++++++++++++++
> > > > > >  2 files changed, 404 insertions(+)
> > > > > >  create mode 100644 Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> > > > > >
> > > > > > diff --git a/Documentation/media/uapi/v4l/dev-codec.rst b/Documentation/media/uapi/v4l/dev-codec.rst
> > > > > > index c61e938bd8dc..3e6a3e883f11 100644
> > > > > > --- a/Documentation/media/uapi/v4l/dev-codec.rst
> > > > > > +++ b/Documentation/media/uapi/v4l/dev-codec.rst
> > > > > > @@ -6,6 +6,11 @@
> > > > > >  Codec Interface
> > > > > >  ***************
> > > > > >
> > > > > > +.. toctree::
> > > > > > +    :maxdepth: 1
> > > > > > +
> > > > > > +    dev-stateless-decoder
> > > > > > +
> > > > > >  A V4L2 codec can compress, decompress, transform, or otherwise convert
> > > > > >  video data from one format into another format, in memory. Typically
> > > > > >  such devices are memory-to-memory devices (i.e. devices with the
> > > > > > diff --git a/Documentation/media/uapi/v4l/dev-stateless-decoder.rst b/Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> > > > > > new file mode 100644
> > > > > > index 000000000000..7a781c89bd59
> > > > > > --- /dev/null
> > > > > > +++ b/Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> > > > > > @@ -0,0 +1,399 @@
> > > > > > +.. -*- coding: utf-8; mode: rst -*-
> > > > > > +
> > > > > > +.. _stateless_decoder:
> > > > > > +
> > > > > > +**************************************************
> > > > > > +Memory-to-memory Stateless Video Decoder Interface
> > > > > > +**************************************************
> > > > > > +
> > > > > > +A stateless decoder is a decoder that works without retaining any kind of state
> > > > > > +between processing frames. This means that each frame is decoded independently
> > > > > > +of any previous and future frames, and that the client is responsible for
> > > > > > +maintaining the decoding state and providing it to the decoder with each
> > > > > > +decoding request. This is in contrast to the stateful video decoder interface,
> > > > > > +where the hardware and driver maintain the decoding state and all the client
> > > > > > +has to do is to provide the raw encoded stream.
> > > > > > +
> > > > > > +This section describes how user-space ("the client") is expected to communicate
> > > > > > +with such decoders in order to successfully decode an encoded stream. Compared
> > > > > > +to stateful codecs, the decoder/client sequence is simpler, but the cost of
> > > > > > +this simplicity is extra complexity in the client which must maintain a
> > > > > > +consistent decoding state.
> > > > > > +
> > > > > > +Stateless decoders make use of the request API and buffer tags. A stateless
> > > > > > +decoder must thus expose the following capabilities on its queues when
> > > > > > +:c:func:`VIDIOC_REQBUFS` or :c:func:`VIDIOC_CREATE_BUFS` are invoked:
> > > > > > +
> > > > > > +* The ``V4L2_BUF_CAP_SUPPORTS_REQUESTS`` capability must be set on the
> > > > > > +  ``OUTPUT`` queue,
> > > > > > +
> > > > > > +* The ``V4L2_BUF_CAP_SUPPORTS_TAGS`` capability must be set on the ``OUTPUT``
> > > > > > +  and ``CAPTURE`` queues,
> > > > > > +
> > > > >
> > > > > [...]
> > > > >
> > > > > > +Decoding
> > > > > > +========
> > > > > > +
> > > > > > +For each frame, the client is responsible for submitting a request to which the
> > > > > > +following is attached:
> > > > > > +
> > > > > > +* Exactly one frame worth of encoded data in a buffer submitted to the
> > > > > > +  ``OUTPUT`` queue,
> > > > >
> > > > > Although this is still the case in the cedrus driver (but will be fixed
> > > > > eventually), this requirement should be dropped because metadata is
> > > > > per-slice and not per-picture in the formats we're currently aiming to
> > > > > support.
> > > > >
> > > > > I think it would be safer to mention something like filling the output
> > > > > buffer with the minimum unit size for the selected output format, to
> > > > > which the associated metadata applies.
> > > >
> > > > I'm not sure it's a good idea. Some of the reasons why I think so:
> > > >  1) There are streams that can have even 32 slices. With that, you
> > > > instantly run out of V4L2 buffers even just for 1 frame.
> > > >  2) The Rockchip hardware which seems to just pick all the slices one
> > > > after another and which was the reason to actually put the slice data
> > > > in the buffer like that.
> > > >  3) Not all the metadata is per-slice. Actually most of the metadata
> > > > is per frame and only what is located inside v4l2_h264_slice_param is
> > > > per-slice. The corresponding control is an array, which has an entry
> > > > for each slice in the buffer. Each entry includes an offset field,
> > > > which points to the place in the buffer where the slice is located.
> > >
> > > Sorry, I realize that my email wasn't very clear. What I meant to say
> > > is that the spec should specify that "at least the minimum unit size
> > > for decoding should be passed in a buffer" (that's maybe not the
> > > clearest wording), instead of "one frame worth of".
> > >
> > > I certainly don't mean to say that each slice should be held in a
> > > separate buffer and totally agree with all the points you're making :)
> >
> > Thanks for clarifying. I will update the document and post v3 accordingly.
> >
> > > I just think we should still allow userspace to pass slices with a
> > > finer granularity than "all the slices required for one frame".
> >
> > I'm afraid that doing so could open the door to some ambiguities. If
> > you allow that, then are you also allowed to send more than one frame
> > if the decode parameters do not change? How do drivers that only
> > > support full frames react when handed only parts of a frame?
>
> IIRC the ability to pass individual slices was brought up regarding a
> potential latency benefit, but I doubt it would really be that
> significant.
>
> Thinking about it with the points you mentioned in mind, I guess the
> downsides are much more significant than the potential gain.
>
> So let's stick with requiring all the slices for a frame then!

Ack.

My view is that we can still loosen this requirement in the future,
possibly behind some driver capability flag, but starting with a
simpler API, with less freedom for applications and fewer
constraints on hardware support, sounds like better practice in
general.

>
> Cheers,
>
> Paul
>
> > > However, it looks like supporting this might be a problem for the
> > > rockchip decoder though. Note that our Allwinner VPU can also process
> > > all slices one after the other, but can be configured for slice-level
> > > granularity while decoding (at least it looks that way).
> > >
> > > > Side point: After some discussions with Thierry Reding, who's looking
> > > > into the Tegra VPU (also stateless), it seems that using the Annex B
> > > > format for H.264 would be best for everyone. So that means including
> > > the start code, NAL header and "raw" slice data. I guess the same
> > > should apply to other codecs too. But that should be in the associated
> > > pixfmt spec, not in this general document.
> > >
> > > > What do you think?

Hmm, wouldn't that effectively make it the same as V4L2_PIX_FMT_H264?

By the way, I proposed it once, some time ago, but it was rejected
because VAAPI didn't get the full annex B stream and a V4L2 stateless
VAAPI backend would have to reconstruct the stream.

Best regards,
Tomasz
Paul Kocialkowski Jan. 24, 2019, 8:59 a.m. UTC | #7
Hi,

On Thu, 2019-01-24 at 17:07 +0900, Tomasz Figa wrote:
> On Wed, Jan 23, 2019 at 7:42 PM Paul Kocialkowski
> <paul.kocialkowski@bootlin.com> wrote:
> > Hi Alex,
> > 
> > On Wed, 2019-01-23 at 18:43 +0900, Alexandre Courbot wrote:
> > > On Tue, Jan 22, 2019 at 7:10 PM Paul Kocialkowski
> > > <paul.kocialkowski@bootlin.com> wrote:
> > > > Hi,
> > > > 
> > > > On Tue, 2019-01-22 at 17:19 +0900, Tomasz Figa wrote:
> > > > > Hi Paul,
> > > > > 
> > > > > On Fri, Dec 7, 2018 at 5:30 PM Paul Kocialkowski
> > > > > <paul.kocialkowski@bootlin.com> wrote:
> > > > > > Hi,
> > > > > > 
> > > > > > Thanks for this new version! I only have one comment left, see below.
> > > > > > 
> > > > > > On Wed, 2018-12-05 at 19:01 +0900, Alexandre Courbot wrote:
> > > > > > > Documents the protocol that user-space should follow when
> > > > > > > communicating with stateless video decoders.
> > > > > > > 
> > > > > > > The stateless video decoding API makes use of the new request and tags
> > > > > > > APIs. While it has been implemented with the Cedrus driver so far, it
> > > > > > > should probably still be considered staging for a short while.
> > > > > > > 
> > > > > > > Signed-off-by: Alexandre Courbot <acourbot@chromium.org>
> > > > > > > ---
> > > > > > > Removing the RFC flag this time. Changes since RFCv3:
> > > > > > > 
> > > > > > > * Included Tomasz and Hans feedback,
> > > > > > > * Expanded the decoding section to better describe the use of requests,
> > > > > > > * Use the tags API.
> > > > > > > 
> > > > > > >  Documentation/media/uapi/v4l/dev-codec.rst    |   5 +
> > > > > > >  .../media/uapi/v4l/dev-stateless-decoder.rst  | 399 ++++++++++++++++++
> > > > > > >  2 files changed, 404 insertions(+)
> > > > > > >  create mode 100644 Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> > > > > > > 
> > > > > > > diff --git a/Documentation/media/uapi/v4l/dev-codec.rst b/Documentation/media/uapi/v4l/dev-codec.rst
> > > > > > > index c61e938bd8dc..3e6a3e883f11 100644
> > > > > > > --- a/Documentation/media/uapi/v4l/dev-codec.rst
> > > > > > > +++ b/Documentation/media/uapi/v4l/dev-codec.rst
> > > > > > > @@ -6,6 +6,11 @@
> > > > > > >  Codec Interface
> > > > > > >  ***************
> > > > > > > 
> > > > > > > +.. toctree::
> > > > > > > +    :maxdepth: 1
> > > > > > > +
> > > > > > > +    dev-stateless-decoder
> > > > > > > +
> > > > > > >  A V4L2 codec can compress, decompress, transform, or otherwise convert
> > > > > > >  video data from one format into another format, in memory. Typically
> > > > > > >  such devices are memory-to-memory devices (i.e. devices with the
> > > > > > > diff --git a/Documentation/media/uapi/v4l/dev-stateless-decoder.rst b/Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> > > > > > > new file mode 100644
> > > > > > > index 000000000000..7a781c89bd59
> > > > > > > --- /dev/null
> > > > > > > +++ b/Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> > > > > > > @@ -0,0 +1,399 @@
> > > > > > > +.. -*- coding: utf-8; mode: rst -*-
> > > > > > > +
> > > > > > > +.. _stateless_decoder:
> > > > > > > +
> > > > > > > +**************************************************
> > > > > > > +Memory-to-memory Stateless Video Decoder Interface
> > > > > > > +**************************************************
> > > > > > > +
> > > > > > > +A stateless decoder is a decoder that works without retaining any kind of state
> > > > > > > +between processing frames. This means that each frame is decoded independently
> > > > > > > +of any previous and future frames, and that the client is responsible for
> > > > > > > +maintaining the decoding state and providing it to the decoder with each
> > > > > > > +decoding request. This is in contrast to the stateful video decoder interface,
> > > > > > > +where the hardware and driver maintain the decoding state and all the client
> > > > > > > +has to do is to provide the raw encoded stream.
> > > > > > > +
> > > > > > > +This section describes how user-space ("the client") is expected to communicate
> > > > > > > +with such decoders in order to successfully decode an encoded stream. Compared
> > > > > > > +to stateful codecs, the decoder/client sequence is simpler, but the cost of
> > > > > > > +this simplicity is extra complexity in the client which must maintain a
> > > > > > > +consistent decoding state.
> > > > > > > +
> > > > > > > +Stateless decoders make use of the request API and buffer tags. A stateless
> > > > > > > +decoder must thus expose the following capabilities on its queues when
> > > > > > > +:c:func:`VIDIOC_REQBUFS` or :c:func:`VIDIOC_CREATE_BUFS` are invoked:
> > > > > > > +
> > > > > > > +* The ``V4L2_BUF_CAP_SUPPORTS_REQUESTS`` capability must be set on the
> > > > > > > +  ``OUTPUT`` queue,
> > > > > > > +
> > > > > > > +* The ``V4L2_BUF_CAP_SUPPORTS_TAGS`` capability must be set on the ``OUTPUT``
> > > > > > > +  and ``CAPTURE`` queues,
> > > > > > > +
> > > > > > 
> > > > > > [...]
> > > > > > 
> > > > > > > +Decoding
> > > > > > > +========
> > > > > > > +
> > > > > > > +For each frame, the client is responsible for submitting a request to which the
> > > > > > > +following is attached:
> > > > > > > +
> > > > > > > +* Exactly one frame worth of encoded data in a buffer submitted to the
> > > > > > > +  ``OUTPUT`` queue,
> > > > > > 
> > > > > > Although this is still the case in the cedrus driver (but will be fixed
> > > > > > eventually), this requirement should be dropped because metadata is
> > > > > > per-slice and not per-picture in the formats we're currently aiming to
> > > > > > support.
> > > > > > 
> > > > > > I think it would be safer to mention something like filling the output
> > > > > > buffer with the minimum unit size for the selected output format, to
> > > > > > which the associated metadata applies.
> > > > > 
> > > > > I'm not sure it's a good idea. Some of the reasons why I think so:
> > > > >  1) There are streams that can have even 32 slices. With that, you
> > > > > instantly run out of V4L2 buffers even just for 1 frame.
> > > > >  2) The Rockchip hardware which seems to just pick all the slices one
> > > > > after another and which was the reason to actually put the slice data
> > > > > in the buffer like that.
> > > > >  3) Not all the metadata is per-slice. Actually most of the metadata
> > > > > is per frame and only what is located inside v4l2_h264_slice_param is
> > > > > per-slice. The corresponding control is an array, which has an entry
> > > > > for each slice in the buffer. Each entry includes an offset field,
> > > > > which points to the place in the buffer where the slice is located.
> > > > 
> > > > Sorry, I realize that my email wasn't very clear. What I meant to say
> > > > is that the spec should specify that "at least the minimum unit size
> > > > for decoding should be passed in a buffer" (that's maybe not the
> > > > clearest wording), instead of "one frame worth of".
> > > > 
> > > > I certainly don't mean to say that each slice should be held in a
> > > > separate buffer and totally agree with all the points you're making :)
> > > 
> > > Thanks for clarifying. I will update the document and post v3 accordingly.
> > > 
> > > > I just think we should still allow userspace to pass slices with a
> > > > finer granularity than "all the slices required for one frame".
> > > 
> > > I'm afraid that doing so could open the door to some ambiguities. If
> > > you allow that, then are you also allowed to send more than one frame
> > > if the decode parameters do not change? How do drivers that only
> > > > support full frames react when handed only parts of a frame?
> > 
> > IIRC the ability to pass individual slices was brought up regarding a
> > potential latency benefit, but I doubt it would really be that
> > significant.
> > 
> > Thinking about it with the points you mentioned in mind, I guess the
> > downsides are much more significant than the potential gain.
> > 
> > So let's stick with requiring all the slices for a frame then!
> 
> Ack.
> 
> My view is that we can still loosen this requirement in the future,
> possibly behind some driver capability flag, but starting with a
> simpler API, with less freedom to the applications and less
> constraints on hardware support sounds like a better practice in
> general.

Sounds good, a capability flag would definitely make sense for that.

> > > > > Side point: After some discussions with Thierry Reding, who's looking
> > > > > into the Tegra VPU (also stateless), it seems that using the Annex B
> > > > > format for H.264 would be best for everyone. So that means including
> > > > the start code, NAL header and "raw" slice data. I guess the same
> > > > should apply to other codecs too. But that should be in the associated
> > > > pixfmt spec, not in this general document.
> > > > 
> > > > > What do you think?
> 
> Hmm, wouldn't that effectively make it the same as V4L2_PIX_FMT_H264?

Well, this would only concern the slice NAL unit. As far as I
understood, V4L2_PIX_FMT_H264 takes all sorts of NAL units.
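
To make the distinction concrete, here is a small sketch of how a
client could tell slice NAL units apart from the other kinds by
looking at the NAL header byte. The helper names are made up for
illustration; the bit layout (nal_unit_type in the low five bits) and
the type values come from the H.264 specification. Slice data
partitions (types 2 to 4) are deliberately left out to keep this
short.

```c
#include <stdbool.h>
#include <stdint.h>

/* A few nal_unit_type values from the H.264 specification. */
enum {
	H264_NAL_SLICE     = 1, /* coded slice of a non-IDR picture */
	H264_NAL_IDR_SLICE = 5, /* coded slice of an IDR picture */
	H264_NAL_SPS       = 7, /* sequence parameter set */
	H264_NAL_PPS       = 8, /* picture parameter set */
};

/* nal_unit_type is carried in the low five bits of the NAL header byte. */
static inline uint8_t h264_nal_unit_type(uint8_t nal_header)
{
	return nal_header & 0x1f;
}

/* True for the NAL units that carry slice data (non-partitioned only). */
static inline bool h264_nal_is_slice(uint8_t nal_header)
{
	uint8_t type = h264_nal_unit_type(nal_header);

	return type == H264_NAL_SLICE || type == H264_NAL_IDR_SLICE;
}
```

Under this proposal only the NAL units for which h264_nal_is_slice()
returns true would end up in the OUTPUT buffer; SPS and PPS would
still be passed in parsed form through controls.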

> By the way, I proposed it once, some time ago, but it was rejected
> because VAAPI didn't get the full annex B stream and a V4L2 stateless
> VAAPI backend would have to reconstruct the stream.

Oh, right, I remember. After a closer look, this is apparently not the
case, according to the VAAPI docs at:
http://intel.github.io/libva/structVASliceParameterBufferH264.html

Also looking at ffmpeg, VAAPI and VDPAU seem to pass the same data,
except that VDPAU adds a start code prefix (which IIRC is required by
the rockchip decoder):

- VAAPI: https://github.com/FFmpeg/FFmpeg/blob/master/libavcodec/vaapi_h264.c#L331
- VDPAU: https://github.com/FFmpeg/FFmpeg/blob/master/libavcodec/vdpau_h264.c#L182

So I was initially a bit reluctant to make it part of the spec that the
full slice NAL should be passed since that would imply getting the NAL
header info both in parsed form through the control and in raw form
along with the slice data. But it looks like it might be rather common
for decoders to require this if the tegra decoder also needs it.
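
As a rough sketch of what "start code, NAL header and raw slice data"
means for the client, the helper below appends one NAL unit to a
buffer in Annex B form. annexb_append_nal() is a hypothetical name,
and the three-byte 00 00 01 prefix is an assumption made for the
example: whether a given pixel format would expect the three-byte or
the four-byte form would have to be stated in the pixfmt spec.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Three-byte Annex B start code prefix. */
static const uint8_t annexb_start_code[] = { 0x00, 0x00, 0x01 };

/*
 * Append a start code followed by one NAL unit (header byte plus
 * payload) to dst, starting at byte offset used.
 *
 * Returns the new number of bytes used, or 0 if the data does not fit.
 */
static size_t annexb_append_nal(uint8_t *dst, size_t dst_size, size_t used,
				const uint8_t *nal, size_t nal_size)
{
	size_t need;

	/* Guard against overflow when adding the prefix length. */
	if (nal_size > SIZE_MAX - sizeof(annexb_start_code))
		return 0;
	need = sizeof(annexb_start_code) + nal_size;

	if (used > dst_size || need > dst_size - used)
		return 0;

	memcpy(dst + used, annexb_start_code, sizeof(annexb_start_code));
	memcpy(dst + used + sizeof(annexb_start_code), nal, nal_size);

	return used + need;
}
```

Calling this once per slice of a frame, and feeding the returned value
back in as used, yields a single buffer holding all the slices of the
frame in bitstream order.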

Cheers,

Paul
Tomasz Figa Jan. 24, 2019, 9:02 a.m. UTC | #8
On Thu, Jan 24, 2019 at 5:59 PM Paul Kocialkowski
<paul.kocialkowski@bootlin.com> wrote:
>
> Hi,
>
> On Thu, 2019-01-24 at 17:07 +0900, Tomasz Figa wrote:
> > On Wed, Jan 23, 2019 at 7:42 PM Paul Kocialkowski
> > <paul.kocialkowski@bootlin.com> wrote:
> > > Hi Alex,
> > >
> > > On Wed, 2019-01-23 at 18:43 +0900, Alexandre Courbot wrote:
> > > > On Tue, Jan 22, 2019 at 7:10 PM Paul Kocialkowski
> > > > <paul.kocialkowski@bootlin.com> wrote:
> > > > > Hi,
> > > > >
> > > > > On Tue, 2019-01-22 at 17:19 +0900, Tomasz Figa wrote:
> > > > > > Hi Paul,
> > > > > >
> > > > > > On Fri, Dec 7, 2018 at 5:30 PM Paul Kocialkowski
> > > > > > <paul.kocialkowski@bootlin.com> wrote:
> > > > > > > Hi,
> > > > > > >
> > > > > > > Thanks for this new version! I only have one comment left, see below.
> > > > > > >
> > > > > > > On Wed, 2018-12-05 at 19:01 +0900, Alexandre Courbot wrote:
> > > > > > > > Documents the protocol that user-space should follow when
> > > > > > > > communicating with stateless video decoders.
> > > > > > > >
> > > > > > > > The stateless video decoding API makes use of the new request and tags
> > > > > > > > APIs. While it has been implemented with the Cedrus driver so far, it
> > > > > > > > should probably still be considered staging for a short while.
> > > > > > > >
> > > > > > > > Signed-off-by: Alexandre Courbot <acourbot@chromium.org>
> > > > > > > > ---
> > > > > > > > Removing the RFC flag this time. Changes since RFCv3:
> > > > > > > >
> > > > > > > > * Included Tomasz and Hans feedback,
> > > > > > > > * Expanded the decoding section to better describe the use of requests,
> > > > > > > > * Use the tags API.
> > > > > > > >
> > > > > > > >  Documentation/media/uapi/v4l/dev-codec.rst    |   5 +
> > > > > > > >  .../media/uapi/v4l/dev-stateless-decoder.rst  | 399 ++++++++++++++++++
> > > > > > > >  2 files changed, 404 insertions(+)
> > > > > > > >  create mode 100644 Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> > > > > > > >
> > > > > > > > diff --git a/Documentation/media/uapi/v4l/dev-codec.rst b/Documentation/media/uapi/v4l/dev-codec.rst
> > > > > > > > index c61e938bd8dc..3e6a3e883f11 100644
> > > > > > > > --- a/Documentation/media/uapi/v4l/dev-codec.rst
> > > > > > > > +++ b/Documentation/media/uapi/v4l/dev-codec.rst
> > > > > > > > @@ -6,6 +6,11 @@
> > > > > > > >  Codec Interface
> > > > > > > >  ***************
> > > > > > > >
> > > > > > > > +.. toctree::
> > > > > > > > +    :maxdepth: 1
> > > > > > > > +
> > > > > > > > +    dev-stateless-decoder
> > > > > > > > +
> > > > > > > >  A V4L2 codec can compress, decompress, transform, or otherwise convert
> > > > > > > >  video data from one format into another format, in memory. Typically
> > > > > > > >  such devices are memory-to-memory devices (i.e. devices with the
> > > > > > > > diff --git a/Documentation/media/uapi/v4l/dev-stateless-decoder.rst b/Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> > > > > > > > new file mode 100644
> > > > > > > > index 000000000000..7a781c89bd59
> > > > > > > > --- /dev/null
> > > > > > > > +++ b/Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> > > > > > > > @@ -0,0 +1,399 @@
> > > > > > > > +.. -*- coding: utf-8; mode: rst -*-
> > > > > > > > +
> > > > > > > > +.. _stateless_decoder:
> > > > > > > > +
> > > > > > > > +**************************************************
> > > > > > > > +Memory-to-memory Stateless Video Decoder Interface
> > > > > > > > +**************************************************
> > > > > > > > +
> > > > > > > > +A stateless decoder is a decoder that works without retaining any kind of state
> > > > > > > > +between processing frames. This means that each frame is decoded independently
> > > > > > > > +of any previous and future frames, and that the client is responsible for
> > > > > > > > +maintaining the decoding state and providing it to the decoder with each
> > > > > > > > +decoding request. This is in contrast to the stateful video decoder interface,
> > > > > > > > +where the hardware and driver maintain the decoding state and all the client
> > > > > > > > +has to do is to provide the raw encoded stream.
> > > > > > > > +
> > > > > > > > +This section describes how user-space ("the client") is expected to communicate
> > > > > > > > +with such decoders in order to successfully decode an encoded stream. Compared
> > > > > > > > +to stateful codecs, the decoder/client sequence is simpler, but the cost of
> > > > > > > > +this simplicity is extra complexity in the client which must maintain a
> > > > > > > > +consistent decoding state.
> > > > > > > > +
> > > > > > > > +Stateless decoders make use of the request API and buffer tags. A stateless
> > > > > > > > +decoder must thus expose the following capabilities on its queues when
> > > > > > > > +:c:func:`VIDIOC_REQBUFS` or :c:func:`VIDIOC_CREATE_BUFS` are invoked:
> > > > > > > > +
> > > > > > > > +* The ``V4L2_BUF_CAP_SUPPORTS_REQUESTS`` capability must be set on the
> > > > > > > > +  ``OUTPUT`` queue,
> > > > > > > > +
> > > > > > > > +* The ``V4L2_BUF_CAP_SUPPORTS_TAGS`` capability must be set on the ``OUTPUT``
> > > > > > > > +  and ``CAPTURE`` queues,
> > > > > > > > +
> > > > > > >
> > > > > > > [...]
> > > > > > >
> > > > > > > > +Decoding
> > > > > > > > +========
> > > > > > > > +
> > > > > > > > +For each frame, the client is responsible for submitting a request to which the
> > > > > > > > +following is attached:
> > > > > > > > +
> > > > > > > > +* Exactly one frame worth of encoded data in a buffer submitted to the
> > > > > > > > +  ``OUTPUT`` queue,
> > > > > > >
> > > > > > > Although this is still the case in the cedrus driver (but will be fixed
> > > > > > > eventually), this requirement should be dropped because metadata is
> > > > > > > per-slice and not per-picture in the formats we're currently aiming to
> > > > > > > support.
> > > > > > >
> > > > > > > I think it would be safer to mention something like filling the output
> > > > > > > buffer with the minimum unit size for the selected output format, to
> > > > > > > which the associated metadata applies.
> > > > > >
> > > > > > I'm not sure it's a good idea. Some of the reasons why I think so:
> > > > > >  1) There are streams that can have even 32 slices. With that, you
> > > > > > instantly run out of V4L2 buffers even just for 1 frame.
> > > > > >  2) The Rockchip hardware which seems to just pick all the slices one
> > > > > > after another and which was the reason to actually put the slice data
> > > > > > in the buffer like that.
> > > > > >  3) Not all the metadata is per-slice. Actually most of the metadata
> > > > > > is per frame and only what is located inside v4l2_h264_slice_param is
> > > > > > per-slice. The corresponding control is an array, which has an entry
> > > > > > for each slice in the buffer. Each entry includes an offset field,
> > > > > > which points to the place in the buffer where the slice is located.
> > > > >
> > > > > Sorry, I realize that my email wasn't very clear. What I meant to say
> > > > > is that the spec should specify that "at least the minimum unit size
> > > > > for decoding should be passed in a buffer" (that's maybe not the
> > > > > clearest wording), instead of "one frame worth of".
> > > > >
> > > > > I certainly don't mean to say that each slice should be held in a
> > > > > separate buffer and totally agree with all the points you're making :)
> > > >
> > > > Thanks for clarifying. I will update the document and post v3 accordingly.
> > > >
> > > > > I just think we should still allow userspace to pass slices with a
> > > > > finer granularity than "all the slices required for one frame".
> > > >
> > > > I'm afraid that doing so could open the door to some ambiguities. If
> > > > you allow that, then are you also allowed to send more than one frame
> > > > if the decode parameters do not change? How do drivers that only
> > > > support full frames react when handed only parts of a frame?
> > >
> > > IIRC the ability to pass individual slices was brought up regarding a
> > > potential latency benefit, but I doubt it would really be that
> > > significant.
> > >
> > > Thinking about it with the points you mentioned in mind, I guess the
> > > downsides are much more significant than the potential gain.
> > >
> > > So let's stick with requiring all the slices for a frame then!
> >
> > Ack.
> >
> > My view is that we can still loosen this requirement in the future,
> > possibly behind some driver capability flag, but starting with a
> > simpler API, with less freedom to the applications and less
> > constraints on hardware support sounds like a better practice in
> > general.
>
> Sounds good, a capability flag would definitely make sense for that.
>
> > > > > Side point: After some discussions with Thierry Reding, who's looking
> > > > > into the Tegra VPU (also stateless), it seems that using the
> > > > > annex-b format for h.264 would be best for everyone. So that means including
> > > > > the start code, NAL header and "raw" slice data. I guess the same
> > > > > should apply to other codecs too. But that should be in the associated
> > > > > pixfmt spec, not in this general document.
> > > > >
> > > > > What do you think?
> >
> > Hmm, wouldn't that effectively make it the same as V4L2_PIX_FMT_H264?
>
> Well, this would only concern the slice NAL unit. As far as I
> understood, V4L2_PIX_FMT_H264 takes all sorts of NAL units.
>

Ah, passing only slice NAL units makes much more sense indeed.

> > By the way, I proposed it once, some time ago, but it was rejected
> > because VAAPI didn't get the full annex B stream and a V4L2 stateless
> > VAAPI backend would have to reconstruct the stream.
>
> Oh, right I remember. After a close look, this is apparently not the
> case, according to the VAAPI docs at:
> http://intel.github.io/libva/structVASliceParameterBufferH264.html
>
> Also looking at ffmpeg, VAAPI and VDPAU seem to pass the same data,
> except that VDPAU adds a start code prefix (which IIRC is required by
> the rockchip decoder):
>
> - VAAPI: https://github.com/FFmpeg/FFmpeg/blob/master/libavcodec/vaapi_h264.c#L331
> - VDPAU: https://github.com/FFmpeg/FFmpeg/blob/master/libavcodec/vdpau_h264.c#L182
>
> So I was initially a bit reluctant to make it part of the spec that the
> full slice NAL should be passed since that would imply getting the NAL
> header info both in parsed form through the control and in raw form
> along with the slice data. But it looks like it might be rather common
> for decoders to require this if the tegra decoder also needs it.

If so, I think it makes perfect sense indeed.

Best regards,
Tomasz
Alexandre Courbot Jan. 24, 2019, 9:04 a.m. UTC | #9
On Wed, Jan 23, 2019 at 7:42 PM Paul Kocialkowski
<paul.kocialkowski@bootlin.com> wrote:
>
> Hi Alex,
>
> On Wed, 2019-01-23 at 18:43 +0900, Alexandre Courbot wrote:
> > On Tue, Jan 22, 2019 at 7:10 PM Paul Kocialkowski
> > <paul.kocialkowski@bootlin.com> wrote:
> > > Hi,
> > >
> > > On Tue, 2019-01-22 at 17:19 +0900, Tomasz Figa wrote:
> > > > Hi Paul,
> > > >
> > > > On Fri, Dec 7, 2018 at 5:30 PM Paul Kocialkowski
> > > > <paul.kocialkowski@bootlin.com> wrote:
> > > > > Hi,
> > > > >
> > > > > Thanks for this new version! I only have one comment left, see below.
> > > > >
> > > > > On Wed, 2018-12-05 at 19:01 +0900, Alexandre Courbot wrote:
> > > > > > Documents the protocol that user-space should follow when
> > > > > > communicating with stateless video decoders.
> > > > > >
> > > > > > The stateless video decoding API makes use of the new request and tags
> > > > > > APIs. While it has been implemented with the Cedrus driver so far, it
> > > > > > should probably still be considered staging for a short while.
> > > > > >
> > > > > > Signed-off-by: Alexandre Courbot <acourbot@chromium.org>
> > > > > > ---
> > > > > > Removing the RFC flag this time. Changes since RFCv3:
> > > > > >
> > > > > > * Included Tomasz and Hans feedback,
> > > > > > * Expanded the decoding section to better describe the use of requests,
> > > > > > * Use the tags API.
> > > > > >
> > > > > >  Documentation/media/uapi/v4l/dev-codec.rst    |   5 +
> > > > > >  .../media/uapi/v4l/dev-stateless-decoder.rst  | 399 ++++++++++++++++++
> > > > > >  2 files changed, 404 insertions(+)
> > > > > >  create mode 100644 Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> > > > > >
> > > > > > diff --git a/Documentation/media/uapi/v4l/dev-codec.rst b/Documentation/media/uapi/v4l/dev-codec.rst
> > > > > > index c61e938bd8dc..3e6a3e883f11 100644
> > > > > > --- a/Documentation/media/uapi/v4l/dev-codec.rst
> > > > > > +++ b/Documentation/media/uapi/v4l/dev-codec.rst
> > > > > > @@ -6,6 +6,11 @@
> > > > > >  Codec Interface
> > > > > >  ***************
> > > > > >
> > > > > > +.. toctree::
> > > > > > +    :maxdepth: 1
> > > > > > +
> > > > > > +    dev-stateless-decoder
> > > > > > +
> > > > > >  A V4L2 codec can compress, decompress, transform, or otherwise convert
> > > > > >  video data from one format into another format, in memory. Typically
> > > > > >  such devices are memory-to-memory devices (i.e. devices with the
> > > > > > diff --git a/Documentation/media/uapi/v4l/dev-stateless-decoder.rst b/Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> > > > > > new file mode 100644
> > > > > > index 000000000000..7a781c89bd59
> > > > > > --- /dev/null
> > > > > > +++ b/Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> > > > > > @@ -0,0 +1,399 @@
> > > > > > +.. -*- coding: utf-8; mode: rst -*-
> > > > > > +
> > > > > > +.. _stateless_decoder:
> > > > > > +
> > > > > > +**************************************************
> > > > > > +Memory-to-memory Stateless Video Decoder Interface
> > > > > > +**************************************************
> > > > > > +
> > > > > > +A stateless decoder is a decoder that works without retaining any kind of state
> > > > > > +between processing frames. This means that each frame is decoded independently
> > > > > > +of any previous and future frames, and that the client is responsible for
> > > > > > +maintaining the decoding state and providing it to the decoder with each
> > > > > > +decoding request. This is in contrast to the stateful video decoder interface,
> > > > > > +where the hardware and driver maintain the decoding state and all the client
> > > > > > +has to do is to provide the raw encoded stream.
> > > > > > +
> > > > > > +This section describes how user-space ("the client") is expected to communicate
> > > > > > +with such decoders in order to successfully decode an encoded stream. Compared
> > > > > > +to stateful codecs, the decoder/client sequence is simpler, but the cost of
> > > > > > +this simplicity is extra complexity in the client which must maintain a
> > > > > > +consistent decoding state.
> > > > > > +
> > > > > > +Stateless decoders make use of the request API and buffer tags. A stateless
> > > > > > +decoder must thus expose the following capabilities on its queues when
> > > > > > +:c:func:`VIDIOC_REQBUFS` or :c:func:`VIDIOC_CREATE_BUFS` are invoked:
> > > > > > +
> > > > > > +* The ``V4L2_BUF_CAP_SUPPORTS_REQUESTS`` capability must be set on the
> > > > > > +  ``OUTPUT`` queue,
> > > > > > +
> > > > > > +* The ``V4L2_BUF_CAP_SUPPORTS_TAGS`` capability must be set on the ``OUTPUT``
> > > > > > +  and ``CAPTURE`` queues,
> > > > > > +
> > > > >
> > > > > [...]
> > > > >
> > > > > > +Decoding
> > > > > > +========
> > > > > > +
> > > > > > +For each frame, the client is responsible for submitting a request to which the
> > > > > > +following is attached:
> > > > > > +
> > > > > > +* Exactly one frame worth of encoded data in a buffer submitted to the
> > > > > > +  ``OUTPUT`` queue,
> > > > >
> > > > > Although this is still the case in the cedrus driver (but will be fixed
> > > > > eventually), this requirement should be dropped because metadata is
> > > > > per-slice and not per-picture in the formats we're currently aiming to
> > > > > support.
> > > > >
> > > > > I think it would be safer to mention something like filling the output
> > > > > buffer with the minimum unit size for the selected output format, to
> > > > > which the associated metadata applies.
> > > >
> > > > I'm not sure it's a good idea. Some of the reasons why I think so:
> > > >  1) There are streams that can have even 32 slices. With that, you
> > > > instantly run out of V4L2 buffers even just for 1 frame.
> > > >  2) The Rockchip hardware which seems to just pick all the slices one
> > > > after another and which was the reason to actually put the slice data
> > > > in the buffer like that.
> > > >  3) Not all the metadata is per-slice. Actually most of the metadata
> > > > is per frame and only what is located inside v4l2_h264_slice_param is
> > > > per-slice. The corresponding control is an array, which has an entry
> > > > for each slice in the buffer. Each entry includes an offset field,
> > > > which points to the place in the buffer where the slice is located.
> > >
> > > Sorry, I realize that my email wasn't very clear. What I meant to say
> > > is that the spec should specify that "at least the minimum unit size
> > > for decoding should be passed in a buffer" (that's maybe not the
> > > clearest wording), instead of "one frame worth of".
> > >
> > > I certainly don't mean to say that each slice should be held in a
> > > separate buffer and totally agree with all the points you're making :)
> >
> > Thanks for clarifying. I will update the document and post v3 accordingly.
> >
> > > I just think we should still allow userspace to pass slices with a
> > > finer granularity than "all the slices required for one frame".
> >
> > I'm afraid that doing so could open the door to some ambiguities. If
> > you allow that, then are you also allowed to send more than one frame
> > if the decode parameters do not change? How do drivers that only
> > support full frames react when handed only parts of a frame?
>
> IIRC the ability to pass individual slices was brought up regarding a
> potential latency benefit, but I doubt it would really be that
> significant.
>
> Thinking about it with the points you mentioned in mind, I guess the
> downsides are much more significant than the potential gain.
>
> So let's stick with requiring all the slices for a frame then!

Grateful for that ; thanks!

I will rework the document, so please don't waste your time on this v2.

How data should be sliced can probably be implemented using a
codec-specific control exposing what the driver supports and that
user-space would have to check. To make things future-proof it is
probably safer to introduce it now even if it only takes a single
value ; I will try to incorporate this in the v3 as well.
Patch

diff --git a/Documentation/media/uapi/v4l/dev-codec.rst b/Documentation/media/uapi/v4l/dev-codec.rst
index c61e938bd8dc..3e6a3e883f11 100644
--- a/Documentation/media/uapi/v4l/dev-codec.rst
+++ b/Documentation/media/uapi/v4l/dev-codec.rst
@@ -6,6 +6,11 @@ 
 Codec Interface
 ***************
 
+.. toctree::
+    :maxdepth: 1
+
+    dev-stateless-decoder
+
 A V4L2 codec can compress, decompress, transform, or otherwise convert
 video data from one format into another format, in memory. Typically
 such devices are memory-to-memory devices (i.e. devices with the
diff --git a/Documentation/media/uapi/v4l/dev-stateless-decoder.rst b/Documentation/media/uapi/v4l/dev-stateless-decoder.rst
new file mode 100644
index 000000000000..7a781c89bd59
--- /dev/null
+++ b/Documentation/media/uapi/v4l/dev-stateless-decoder.rst
@@ -0,0 +1,399 @@ 
+.. -*- coding: utf-8; mode: rst -*-
+
+.. _stateless_decoder:
+
+**************************************************
+Memory-to-memory Stateless Video Decoder Interface
+**************************************************
+
+A stateless decoder is a decoder that works without retaining any kind of state
+between processing frames. This means that each frame is decoded independently
+of any previous and future frames, and that the client is responsible for
+maintaining the decoding state and providing it to the decoder with each
+decoding request. This is in contrast to the stateful video decoder interface,
+where the hardware and driver maintain the decoding state and all the client
+has to do is to provide the raw encoded stream.
+
+This section describes how user-space ("the client") is expected to communicate
+with such decoders in order to successfully decode an encoded stream. Compared
+to stateful codecs, the decoder/client sequence is simpler, but the cost of
+this simplicity is extra complexity in the client which must maintain a
+consistent decoding state.
+
+Stateless decoders make use of the request API and buffer tags. A stateless
+decoder must thus expose the following capabilities on its queues when
+:c:func:`VIDIOC_REQBUFS` or :c:func:`VIDIOC_CREATE_BUFS` are invoked:
+
+* The ``V4L2_BUF_CAP_SUPPORTS_REQUESTS`` capability must be set on the
+  ``OUTPUT`` queue,
+
+* The ``V4L2_BUF_CAP_SUPPORTS_TAGS`` capability must be set on the ``OUTPUT``
+  and ``CAPTURE`` queues,
+
+Querying capabilities
+=====================
+
+1. To enumerate the set of coded formats supported by the decoder, the client
+   calls :c:func:`VIDIOC_ENUM_FMT` on the ``OUTPUT`` queue.
+
+   * The driver must always return the full set of supported ``OUTPUT`` formats,
+     irrespective of the format currently set on the ``CAPTURE`` queue.
+
+   * Simultaneously, the driver must restrain the set of values returned by
+     codec-specific capability controls (such as H.264 profiles) to the set
+     actually supported by the hardware.
+
+2. To enumerate the set of supported raw formats, the client calls
+   :c:func:`VIDIOC_ENUM_FMT` on the ``CAPTURE`` queue.
+
+   * The driver must return only the formats supported for the format currently
+     active on the ``OUTPUT`` queue.
+
+   * Depending on the currently set ``OUTPUT`` format, the set of supported raw
+     formats may depend on the value of some controls (e.g. parsed format
+     headers) which are codec-dependent. The client is responsible for making
+     sure that these controls are set before querying the ``CAPTURE`` queue.
+     Failure to do so will result in the default values for these controls being
+     used, and a returned set of formats that may not be usable for the media
+     the client is trying to decode.
+
+3. The client may use :c:func:`VIDIOC_ENUM_FRAMESIZES` to detect supported
+   resolutions for a given format, passing the desired pixel format in
+   :c:type:`v4l2_frmsizeenum`'s ``pixel_format``.
+
+4. Supported profiles and levels for the current ``OUTPUT`` format, if
+   applicable, may be queried using their respective controls via
+   :c:func:`VIDIOC_QUERYCTRL`.
+
+Initialization
+==============
+
+1. Set the coded format on the ``OUTPUT`` queue via :c:func:`VIDIOC_S_FMT`.
+
+   * **Required fields:**
+
+     ``type``
+         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``.
+
+     ``pixelformat``
+         a coded pixel format.
+
+     ``width``, ``height``
+         coded width and height parsed from the stream.
+
+     other fields
+         follow standard semantics.
+
+   .. note::
+
+      Changing the ``OUTPUT`` format may change the currently set ``CAPTURE``
+      format. The driver will derive a new ``CAPTURE`` format from the
+      ``OUTPUT`` format being set, including resolution, colorimetry
+      parameters, etc. If the client needs a specific ``CAPTURE`` format,
+      it must adjust it afterwards.
+
+2. Call :c:func:`VIDIOC_S_EXT_CTRLS` to set all the controls (parsed headers,
+   etc.) required by the ``OUTPUT`` format to enumerate the ``CAPTURE`` formats.
+
+3. Call :c:func:`VIDIOC_G_FMT` for the ``CAPTURE`` queue to get the format for
+   the destination buffers parsed/decoded from the bitstream.
+
+   * **Required fields:**
+
+     ``type``
+         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``.
+
+   * **Returned fields:**
+
+     ``width``, ``height``
+         frame buffer resolution for the decoded frames.
+
+     ``pixelformat``
+         pixel format for decoded frames.
+
+     ``num_planes`` (for _MPLANE ``type`` only)
+         number of planes for pixelformat.
+
+     ``sizeimage``, ``bytesperline``
+         as per standard semantics; matching frame buffer format.
+
+   .. note::
+
+      The value of ``pixelformat`` may be any pixel format supported for the
+      ``OUTPUT`` format, based on the hardware capabilities. It is suggested
+      that the driver choose the preferred/optimal format for the current
+      configuration. For example, a YUV format may be preferred over an RGB
+      format, if an additional conversion step would be required for RGB.
+
+4. *[optional]* Enumerate ``CAPTURE`` formats via :c:func:`VIDIOC_ENUM_FMT` on
+   the ``CAPTURE`` queue. The client may use this ioctl to discover which
+   alternative raw formats are supported for the current ``OUTPUT`` format and
+   select one of them via :c:func:`VIDIOC_S_FMT`.
+
+   .. note::
+
+      The driver will return only formats supported for the currently selected
+      ``OUTPUT`` format, even if more formats may be supported by the decoder in
+      general.
+
+      For example, a decoder may support YUV and RGB formats for
+      resolutions 1920x1088 and lower, but only YUV for higher resolutions (due
+      to hardware limitations). After setting a resolution of 1920x1088 or lower
+      as the ``OUTPUT`` format, :c:func:`VIDIOC_ENUM_FMT` may return a set of
+      YUV and RGB pixel formats, but after setting a resolution higher than
+      1920x1088, the driver will not return RGB pixel formats, since they are
+      unsupported for this resolution.
+
+5. *[optional]* Choose a different ``CAPTURE`` format than suggested via
+   :c:func:`VIDIOC_S_FMT` on ``CAPTURE`` queue. It is possible for the client to
+   choose a different format than selected/suggested by the driver in
+   :c:func:`VIDIOC_G_FMT`.
+
+    * **Required fields:**
+
+      ``type``
+          a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``.
+
+      ``pixelformat``
+          a raw pixel format.
+
+6. Allocate source (bitstream) buffers via :c:func:`VIDIOC_REQBUFS` on
+   ``OUTPUT`` queue.
+
+    * **Required fields:**
+
+      ``count``
+          requested number of buffers to allocate; greater than zero.
+
+      ``type``
+          a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``.
+
+      ``memory``
+          follows standard semantics.
+
+    * **Return fields:**
+
+      ``count``
+          actual number of buffers allocated.
+
+    * If required, the driver will adjust ``count`` to be equal to or greater
+      than the minimum of the required number of ``OUTPUT`` buffers for the
+      given format and the requested count. The client must check this value
+      after the ioctl returns to get the actual number of buffers allocated.
+
+7. Allocate destination (raw format) buffers via :c:func:`VIDIOC_REQBUFS` on the
+   ``CAPTURE`` queue.
+
+    * **Required fields:**
+
+      ``count``
+          requested number of buffers to allocate; greater than zero. The client
+          is responsible for deducing the minimum number of buffers required
+          for the stream to be properly decoded (taking e.g. reference frames
+          into account) and pass an equal or bigger number.
+
+      ``type``
+          a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``.
+
+      ``memory``
+          follows standard semantics. ``V4L2_MEMORY_USERPTR`` is not supported
+          for ``CAPTURE`` buffers.
+
+    * **Return fields:**
+
+      ``count``
+          adjusted to allocated number of buffers, in case the codec requires
+          more buffers than requested.
+
+    * The driver must adjust count to the minimum of required number of
+      ``CAPTURE`` buffers for the current format, stream configuration and
+      requested count. The client must check this value after the ioctl
+      returns to get the number of buffers allocated.
+
+8. Allocate requests (likely one per ``OUTPUT`` buffer) via
+   :c:func:`MEDIA_IOC_REQUEST_ALLOC` on the media device.
+
+9. Start streaming on both ``OUTPUT`` and ``CAPTURE`` queues via
+   :c:func:`VIDIOC_STREAMON`.
+
+Decoding
+========
+
+For each frame, the client is responsible for submitting a request to which the
+following is attached:
+
+* Exactly one frame worth of encoded data in a buffer submitted to the
+  ``OUTPUT`` queue,
+* All the controls relevant to the format being decoded (see below for details).
+
+The contents of the source ``OUTPUT`` buffer, as well as the controls that must
+be set on the request, depend on the active coded pixel format and might be
+affected by codec-specific extended controls, as stated in documentation of each
+format.
+
+A typical frame would thus be decoded using the following sequence:
+
+1. Queue an ``OUTPUT`` buffer containing one frame worth of encoded bitstream
+   data for the decoding request, using :c:func:`VIDIOC_QBUF`.
+
+    * **Required fields:**
+
+      ``index``
+          index of the buffer being queued.
+
+      ``type``
+          type of the buffer.
+
+      ``bytesused``
+          number of bytes taken by the encoded data frame in the buffer.
+
+      ``flags``
+          the ``V4L2_BUF_FLAG_REQUEST_FD`` flag must be set. In addition, if
+          the decoded frame is to be used as a reference frame in the future,
+          then the ``V4L2_BUF_FLAG_TAG`` flag must be set (it can also be set
+          for non-reference frames if it helps the client).
+
+      ``request_fd``
+          must be set to the file descriptor of the decoding request.
+
+      ``tag``
+          if the ``V4L2_BUF_FLAG_TAG`` is set, then this must contain the tag
+          for the frame that will be copied into the decoded frame buffer, and
+          can be used to specify this frame as a reference frame for another
+          one.
+
+   .. note::
+
+     The API currently requires one frame of encoded data per ``OUTPUT`` buffer,
+     even though some encoded formats may present their data in smaller chunks
+     (e.g. H.264's frames can be made of several slices that can be processed
+     independently). It is currently the responsibility of the client to gather
+     the different parts of a frame into a single ``OUTPUT`` buffer, while
+     preserving the same layout as the original bitstream. This
+     restriction may be lifted in the future.
+
+2. Set the codec-specific controls for the decoding request, using
+   :c:func:`VIDIOC_S_EXT_CTRLS`.
+
+    * **Required fields:**
+
+      ``which``
+          must be ``V4L2_CTRL_WHICH_REQUEST_VAL``.
+
+      ``request_fd``
+          must be set to the file descriptor of the decoding request.
+
+      other fields
+          other fields are set as usual when setting controls. The ``controls``
+          array must contain all the codec-specific controls required to decode
+          a frame.
+
+   .. note::
+
+      It is possible to specify the controls in different invocations of
+      :c:func:`VIDIOC_S_EXT_CTRLS`, or to overwrite a previously set control, as
+      long as ``request_fd`` and ``which`` are properly set. The controls state
+      at the moment of request submission is the one that will be considered.
+
+   .. note::
+
+      The order in which steps 1 and 2 take place is interchangeable.
+
+3. Submit the request by invoking :c:func:`MEDIA_REQUEST_IOC_QUEUE` on the
+   request FD.
+
+    If the request is submitted without an ``OUTPUT`` buffer, or if some of the
+    required controls are missing from the request, then
+    :c:func:`MEDIA_REQUEST_IOC_QUEUE` will return ``-ENOENT``. If more than one
+    ``OUTPUT`` buffer is queued, then it will return ``-EINVAL``.
+    :c:func:`MEDIA_REQUEST_IOC_QUEUE` returning non-zero means that no
+    ``CAPTURE`` buffer will be produced for this request.
+
+``CAPTURE`` buffers must not be part of the request, and are queued
+independently. They are returned in decode order (i.e. the same order as
+``OUTPUT`` buffers were submitted).
+
+Runtime decoding errors are signaled by the dequeued ``CAPTURE`` buffers
+carrying the ``V4L2_BUF_FLAG_ERROR`` flag. If a decoded reference frame has an
+error, then all following decoded frames that refer to it also have the
+``V4L2_BUF_FLAG_ERROR`` flag set, although the decoder will still try to
+produce a (likely corrupted) frame.
+
+Buffer management while decoding
+================================
+Contrary to stateful decoders, a stateless decoder does not perform any kind of
+buffer management: it only guarantees that dequeued ``CAPTURE`` buffers can be
+used by the client for as long as they are not queued again. "Used" here
+encompasses using the buffer for compositing, display, or as a reference frame
+to decode a subsequent frame.
+
+A reference frame is specified by writing the tag that was set on its
+``OUTPUT`` buffer into the relevant codec-specific structures that are
+submitted as controls. This tag will be copied to the corresponding
+``CAPTURE`` buffer, but can be used in any subsequent decoding request as soon
+as the decoding request for that buffer is queued successfully. This means that
+the client does not need to wait until a ``CAPTURE`` buffer with a given tag is
+dequeued to start using that tag in reference frames. However, it must wait
+until all frames referencing a given tag are dequeued before queuing the
+referenced ``CAPTURE`` buffer again, since queuing a buffer effectively removes
+its tag.
+
+When a decoding request is queued, the driver increases the reference count of
+all the resources associated with its reference frames. This means that the
+client can e.g. close the DMABUF file descriptors of the reference frame
+buffers if it won't need them afterwards, as long as the V4L2 ``CAPTURE``
+buffer of the reference frame is not re-queued before all referencing frames
+are decoded.
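Since the decoder does no buffer management, the client has to remember which ``CAPTURE`` buffers are still referenced. One possible bookkeeping scheme is sketched below. Everything in it (`struct ref_tracker`, the function names, the fixed `MAX_CAPTURE_BUFFERS` limit) is hypothetical and purely client-side; it only encodes the rule that a ``CAPTURE`` buffer may be queued again once the client owns it and every frame referencing its tag has been dequeued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_CAPTURE_BUFFERS 32 /* arbitrary limit chosen for this sketch */

/* Client-side state for one CAPTURE buffer. */
struct capture_slot {
    bool dequeued;             /* the client currently owns the buffer */
    unsigned int pending_refs; /* queued frames referencing its tag that
                                  have not been dequeued yet */
};

struct ref_tracker {
    struct capture_slot slots[MAX_CAPTURE_BUFFERS];
};

void ref_tracker_init(struct ref_tracker *t)
{
    memset(t, 0, sizeof(*t));
}

/* A request using buffer `index` as a reference was queued successfully. */
void ref_tracker_add_ref(struct ref_tracker *t, size_t index)
{
    assert(index < MAX_CAPTURE_BUFFERS);
    t->slots[index].pending_refs++;
}

/* A frame that referenced buffer `index` has been dequeued. */
void ref_tracker_drop_ref(struct ref_tracker *t, size_t index)
{
    assert(index < MAX_CAPTURE_BUFFERS);
    assert(t->slots[index].pending_refs > 0);
    t->slots[index].pending_refs--;
}

/* Buffer `index` was dequeued from the CAPTURE queue. */
void ref_tracker_mark_dequeued(struct ref_tracker *t, size_t index)
{
    assert(index < MAX_CAPTURE_BUFFERS);
    t->slots[index].dequeued = true;
}

/* Buffer `index` was queued again; its tag is no longer valid. */
void ref_tracker_mark_queued(struct ref_tracker *t, size_t index)
{
    assert(index < MAX_CAPTURE_BUFFERS);
    t->slots[index].dequeued = false;
}

/* Safe to queue again only if owned by the client and unreferenced. */
bool ref_tracker_can_requeue(const struct ref_tracker *t, size_t index)
{
    assert(index < MAX_CAPTURE_BUFFERS);
    return t->slots[index].dequeued && t->slots[index].pending_refs == 0;
}
```

A real client would also have to hold the buffer back while it is being displayed or composited; that condition is left out here to keep the sketch short.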
+
+Seeking
+=======
+In order to seek, the client just needs to submit requests using input buffers
+corresponding to the new stream position. It must however be aware that
+resolution may have changed and follow the dynamic resolution change sequence in
+that case. Also, depending on the codec used, picture parameters (e.g. SPS/PPS
+for H.264) may have changed and the client is responsible for making sure that a
+valid state is sent to the decoder.
+
+The client is then free to ignore any returned ``CAPTURE`` buffer that comes
+from the pre-seek position.
+
+Pause
+=====
+
+In order to pause, the client can just cease queuing buffers onto the ``OUTPUT``
+queue. Without source bitstream data, there is no data to process and the codec
+will remain idle.
+
+Dynamic resolution change
+=========================
+
+If the client detects a resolution change in the stream, it will need to perform
+the initialization sequence again with the new resolution:
+
+1. Wait until all submitted requests have completed and dequeue the
+   corresponding output buffers.
+
+2. Call :c:func:`VIDIOC_STREAMOFF` on both the ``OUTPUT`` and ``CAPTURE``
+   queues.
+
+3. Free all ``CAPTURE`` buffers by calling :c:func:`VIDIOC_REQBUFS` on the
+   ``CAPTURE`` queue with a buffer count of zero.
+
+4. Perform the initialization sequence again (minus the allocation of
+   ``OUTPUT`` buffers), with the new resolution set on the ``OUTPUT`` queue.
+   Note that due to resolution constraints, a different format may need to be
+   picked on the ``CAPTURE`` queue.
+
+Drain
+=====
+
+In order to drain the stream on a stateless decoder, the client just needs to
+wait until all the submitted requests are completed. There is no need to send a
+``V4L2_DEC_CMD_STOP`` command since requests are processed sequentially by the
+decoder.
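Waiting for a submitted request can be done by polling its file descriptor, since the media request API signals completion with `POLLPRI`. A minimal sketch follows; the helper name `wait_for_request` and its return-value convention are assumptions made for this example, and a draining client would call it for each request that is still in flight.

```c
#include <errno.h>
#include <poll.h>

/*
 * Wait until the request behind `request_fd` completes.
 * Returns 0 on completion, -ETIMEDOUT if `timeout_ms` elapsed first,
 * -EIO if poll() reported an error condition on the descriptor, or
 * another negative errno value if poll() itself failed.
 */
int wait_for_request(int request_fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = request_fd, .events = POLLPRI };
    int ret;

    /* Restart the wait if it is interrupted by a signal. */
    do {
        ret = poll(&pfd, 1, timeout_ms);
    } while (ret < 0 && errno == EINTR);

    if (ret < 0)
        return -errno;
    if (ret == 0)
        return -ETIMEDOUT;
    if (pfd.revents & POLLPRI)
        return 0;

    return -EIO; /* POLLERR, POLLHUP or POLLNVAL */
}
```

Passing a negative `timeout_ms` blocks indefinitely, which matches the "just wait" wording above. Restarting after `EINTR` with the full timeout is a simplification.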
+
+End of stream
+=============
+
+When the client detects that the end of stream is reached, it can simply stop
+sending new frames to the decoder, drain the ``CAPTURE`` queue, and dispose of
+the decoder as needed.