diff mbox series

[2/2] media: docs-rst: Document memory-to-memory video encoder interface

Message ID 20180724140621.59624-3-tfiga@chromium.org (mailing list archive)
State New, archived
Headers show
Series Document memory-to-memory video codec interfaces | expand

Commit Message

Tomasz Figa July 24, 2018, 2:06 p.m. UTC
Due to complexity of the video encoding process, the V4L2 drivers of
stateful encoder hardware require specific sequences of V4L2 API calls
to be followed. These include capability enumeration, initialization,
encoding, encode parameters change, drain and reset.

Specifics of the above have been discussed during Media Workshops at
LinuxCon Europe 2012 in Barcelona and then later Embedded Linux
Conference Europe 2014 in Düsseldorf. The de facto Codec API that
originated at those events was later implemented by the drivers we already
have merged in mainline, such as s5p-mfc or coda.

The only thing missing was the real specification included as a part of
Linux Media documentation. Fix it now and document the encoder part of
the Codec API.

Signed-off-by: Tomasz Figa <tfiga@chromium.org>
---
 Documentation/media/uapi/v4l/dev-encoder.rst | 550 +++++++++++++++++++
 Documentation/media/uapi/v4l/devices.rst     |   1 +
 Documentation/media/uapi/v4l/v4l2.rst        |   2 +
 3 files changed, 553 insertions(+)
 create mode 100644 Documentation/media/uapi/v4l/dev-encoder.rst

Comments

Philipp Zabel July 25, 2018, 1:41 p.m. UTC | #1
On Tue, 2018-07-24 at 23:06 +0900, Tomasz Figa wrote:
> Due to complexity of the video encoding process, the V4L2 drivers of
> stateful encoder hardware require specific sequences of V4L2 API calls
> to be followed. These include capability enumeration, initialization,
> encoding, encode parameters change, drain and reset.
> 
> Specifics of the above have been discussed during Media Workshops at
> LinuxCon Europe 2012 in Barcelona and then later Embedded Linux
> Conference Europe 2014 in Düsseldorf. The de facto Codec API that
> originated at those events was later implemented by the drivers we already
> have merged in mainline, such as s5p-mfc or coda.
> 
> The only thing missing was the real specification included as a part of
> Linux Media documentation. Fix it now and document the encoder part of
> the Codec API.
> 
> Signed-off-by: Tomasz Figa <tfiga@chromium.org>

Thanks a lot for the update, 
> ---
>  Documentation/media/uapi/v4l/dev-encoder.rst | 550 +++++++++++++++++++
>  Documentation/media/uapi/v4l/devices.rst     |   1 +
>  Documentation/media/uapi/v4l/v4l2.rst        |   2 +
>  3 files changed, 553 insertions(+)
>  create mode 100644 Documentation/media/uapi/v4l/dev-encoder.rst
> 
> diff --git a/Documentation/media/uapi/v4l/dev-encoder.rst b/Documentation/media/uapi/v4l/dev-encoder.rst
> new file mode 100644
> index 000000000000..28be1698e99c
> --- /dev/null
> +++ b/Documentation/media/uapi/v4l/dev-encoder.rst
> @@ -0,0 +1,550 @@
> +.. -*- coding: utf-8; mode: rst -*-
> +
> +.. _encoder:
> +
> +****************************************
> +Memory-to-memory Video Encoder Interface
> +****************************************
> +
> +Input data to a video encoder are raw video frames in display order
> +to be encoded into the output bitstream. Output data are complete chunks of
> +valid bitstream, including all metadata, headers, etc. The resulting stream
> +must not need any further post-processing by the client.
> +
> +Performing software stream processing, header generation etc. in the driver
> +in order to support this interface is strongly discouraged. In case such
> +operations are needed, use of Stateless Video Encoder Interface (in
> +development) is strongly advised.
> +
[...]
> +
> +Commit points
> +=============
> +
> +Setting formats and allocating buffers triggers changes in the behavior
> +of the driver.
> +
> +1. Setting format on ``CAPTURE`` queue may change the set of formats
> +   supported/advertised on the ``OUTPUT`` queue. In particular, it also
> +   means that ``OUTPUT`` format may be reset and the client must not
> +   rely on the previously set format being preserved.

Since the only property of the CAPTURE format that can be set directly
via S_FMT is the pixelformat, should this be made explicit?

1. Setting pixelformat on ``CAPTURE`` queue may change the set of
   formats supported/advertised on the ``OUTPUT`` queue. In particular,
   it also means that ``OUTPUT`` format may be reset and the client
   must not rely on the previously set format being preserved.

?

> +2. Enumerating formats on ``OUTPUT`` queue must only return formats
> +   supported for the ``CAPTURE`` format currently set.

Same here, as it usually is the codec selected by CAPTURE pixelformat
that determines the supported OUTPUT pixelformats and resolutions.

2. Enumerating formats on ``OUTPUT`` queue must only return formats
   supported for the ``CAPTURE`` pixelformat currently set.

This could prevent the possible misconception that the CAPTURE
width/height might in any form limit the OUTPUT format, when in fact it
is determined by it.

> +3. Setting/changing format on ``OUTPUT`` queue does not change formats
> +   available on ``CAPTURE`` queue.

3. Setting/changing format on the ``OUTPUT`` queue does not change
   pixelformats available on the ``CAPTURE`` queue.

?

Because setting OUTPUT width/height or CROP SELECTION very much limits
the possible values of CAPTURE width/height.

Maybe 'available' in this context should be specified somewhere to mean
'returned by ENUM_FMT and allowed by S_FMT/TRY_FMT'.

> +   An attempt to set ``OUTPUT`` format that
> +   is not supported for the currently selected ``CAPTURE`` format must
> +   result in the driver adjusting the requested format to an acceptable
> +   one.

   An attempt to set ``OUTPUT`` format that is not supported for the
  
currently selected ``CAPTURE`` pixelformat must result in the driver
  
adjusting the requested format to an acceptable one.

> +4. Enumerating formats on ``CAPTURE`` queue always returns the full set of
> +   supported coded formats, irrespective of the current ``OUTPUT``
> +   format.
> +
> +5. After allocating buffers on the ``CAPTURE`` queue, it is not possible to
> +   change format on it.
> +
> +To summarize, setting formats and allocation must always start with the
> +``CAPTURE`` queue and the ``CAPTURE`` queue is the master that governs the
> +set of supported formats for the ``OUTPUT`` queue.

To summarize, setting formats and allocation must always start with
setting the encoded pixelformat on the ``CAPTURE`` queue. The
``CAPTURE`` queue is the master that governs the set of supported
formats for the ``OUTPUT`` queue.

Or is this too verbose?

regards
Philipp
Hans Verkuil July 25, 2018, 1:57 p.m. UTC | #2
On 24/07/18 16:06, Tomasz Figa wrote:
> Due to complexity of the video encoding process, the V4L2 drivers of
> stateful encoder hardware require specific sequences of V4L2 API calls
> to be followed. These include capability enumeration, initialization,
> encoding, encode parameters change, drain and reset.
> 
> Specifics of the above have been discussed during Media Workshops at
> LinuxCon Europe 2012 in Barcelona and then later Embedded Linux
> Conference Europe 2014 in Düsseldorf. The de facto Codec API that
> originated at those events was later implemented by the drivers we already
> have merged in mainline, such as s5p-mfc or coda.
> 
> The only thing missing was the real specification included as a part of
> Linux Media documentation. Fix it now and document the encoder part of
> the Codec API.
> 
> Signed-off-by: Tomasz Figa <tfiga@chromium.org>
> ---
>  Documentation/media/uapi/v4l/dev-encoder.rst | 550 +++++++++++++++++++
>  Documentation/media/uapi/v4l/devices.rst     |   1 +
>  Documentation/media/uapi/v4l/v4l2.rst        |   2 +
>  3 files changed, 553 insertions(+)
>  create mode 100644 Documentation/media/uapi/v4l/dev-encoder.rst
> 
> diff --git a/Documentation/media/uapi/v4l/dev-encoder.rst b/Documentation/media/uapi/v4l/dev-encoder.rst
> new file mode 100644
> index 000000000000..28be1698e99c
> --- /dev/null
> +++ b/Documentation/media/uapi/v4l/dev-encoder.rst
> @@ -0,0 +1,550 @@
> +.. -*- coding: utf-8; mode: rst -*-
> +
> +.. _encoder:
> +
> +****************************************
> +Memory-to-memory Video Encoder Interface
> +****************************************
> +
> +Input data to a video encoder are raw video frames in display order
> +to be encoded into the output bitstream. Output data are complete chunks of
> +valid bitstream, including all metadata, headers, etc. The resulting stream
> +must not need any further post-processing by the client.

Due to the confusing use capture and output I wonder if it would be better to
rephrase this as follows:

"A video encoder takes raw video frames in display order and encodes them into
a bitstream. It generates complete chunks of the bitstream, including
all metadata, headers, etc. The resulting bitstream does not require any further
post-processing by the client."

Something similar should be done for the decoder documentation.

> +
> +Performing software stream processing, header generation etc. in the driver
> +in order to support this interface is strongly discouraged. In case such
> +operations are needed, use of Stateless Video Encoder Interface (in
> +development) is strongly advised.
> +
> +Conventions and notation used in this document
> +==============================================
> +
> +1. The general V4L2 API rules apply if not specified in this document
> +   otherwise.
> +
> +2. The meaning of words “must”, “may”, “should”, etc. is as per RFC
> +   2119.
> +
> +3. All steps not marked “optional” are required.
> +
> +4. :c:func:`VIDIOC_G_EXT_CTRLS`, :c:func:`VIDIOC_S_EXT_CTRLS` may be used
> +   interchangeably with :c:func:`VIDIOC_G_CTRL`, :c:func:`VIDIOC_S_CTRL`,
> +   unless specified otherwise.
> +
> +5. Single-plane API (see spec) and applicable structures may be used
> +   interchangeably with Multi-plane API, unless specified otherwise,
> +   depending on driver capabilities and following the general V4L2
> +   guidelines.
> +
> +6. i = [a..b]: sequence of integers from a to b, inclusive, i.e. i =
> +   [0..2]: i = 0, 1, 2.
> +
> +7. For ``OUTPUT`` buffer A, A’ represents a buffer on the ``CAPTURE`` queue
> +   containing data (encoded frame/stream) that resulted from processing
> +   buffer A.
> +
> +Glossary
> +========
> +
> +CAPTURE
> +   the destination buffer queue; the queue of buffers containing encoded
> +   bitstream; ``V4L2_BUF_TYPE_VIDEO_CAPTURE```` or
> +   ``V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE``; data are captured from the
> +   hardware into ``CAPTURE`` buffers
> +
> +client
> +   application client communicating with the driver implementing this API
> +
> +coded format
> +   encoded/compressed video bitstream format (e.g. H.264, VP8, etc.);
> +   see also: raw format
> +
> +coded height
> +   height for given coded resolution
> +
> +coded resolution
> +   stream resolution in pixels aligned to codec and hardware requirements;
> +   typically visible resolution rounded up to full macroblocks; see also:
> +   visible resolution
> +
> +coded width
> +   width for given coded resolution
> +
> +decode order
> +   the order in which frames are decoded; may differ from display order if
> +   coded format includes a feature of frame reordering; ``CAPTURE`` buffers
> +   must be returned by the driver in decode order
> +
> +display order
> +   the order in which frames must be displayed; ``OUTPUT`` buffers must be
> +   queued by the client in display order
> +
> +IDR
> +   a type of a keyframe in H.264-encoded stream, which clears the list of
> +   earlier reference frames (DPBs)

Same problem as with the previous patch: it doesn't say what IDR stands for.
It also refers to DPBs, but DPB is not part of this glossary.

Perhaps the glossary of the encoder/decoder should be combined.

> +
> +keyframe
> +   an encoded frame that does not reference frames decoded earlier, i.e.
> +   can be decoded fully on its own.
> +
> +macroblock
> +   a processing unit in image and video compression formats based on linear
> +   block transforms (e.g. H264, VP8, VP9); codec-specific, but for most of
> +   popular codecs the size is 16x16 samples (pixels)
> +
> +OUTPUT
> +   the source buffer queue; the queue of buffers containing raw frames;
> +   ``V4L2_BUF_TYPE_VIDEO_OUTPUT`` or
> +   ``V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE``; the hardware is fed with data
> +   from ``OUTPUT`` buffers
> +
> +PPS
> +   Picture Parameter Set; a type of metadata entity in H.264 bitstream
> +
> +raw format
> +   uncompressed format containing raw pixel data (e.g. YUV, RGB formats)
> +
> +resume point
> +   a point in the bitstream from which decoding may start/continue, without
> +   any previous state/data present, e.g.: a keyframe (VP8/VP9) or
> +   SPS/PPS/IDR sequence (H.264); a resume point is required to start decode
> +   of a new stream, or to resume decoding after a seek
> +
> +source
> +   data fed to the encoder; ``OUTPUT``
> +
> +source height
> +   height in pixels for given source resolution
> +
> +source resolution
> +   resolution in pixels of source frames being source to the encoder and
> +   subject to further cropping to the bounds of visible resolution
> +
> +source width
> +   width in pixels for given source resolution
> +
> +SPS
> +   Sequence Parameter Set; a type of metadata entity in H.264 bitstream
> +
> +stream metadata
> +   additional (non-visual) information contained inside encoded bitstream;
> +   for example: coded resolution, visible resolution, codec profile
> +
> +visible height
> +   height for given visible resolution; display height
> +
> +visible resolution
> +   stream resolution of the visible picture, in pixels, to be used for
> +   display purposes; must be smaller or equal to coded resolution;
> +   display resolution
> +
> +visible width
> +   width for given visible resolution; display width
> +
> +Querying capabilities
> +=====================
> +
> +1. To enumerate the set of coded formats supported by the driver, the
> +   client may call :c:func:`VIDIOC_ENUM_FMT` on ``CAPTURE``.
> +
> +   * The driver must always return the full set of supported formats,
> +     irrespective of the format set on the ``OUTPUT`` queue.
> +
> +2. To enumerate the set of supported raw formats, the client may call
> +   :c:func:`VIDIOC_ENUM_FMT` on ``OUTPUT``.
> +
> +   * The driver must return only the formats supported for the format
> +     currently active on ``CAPTURE``.
> +
> +   * In order to enumerate raw formats supported by a given coded format,
> +     the client must first set that coded format on ``CAPTURE`` and then
> +     enumerate the ``OUTPUT`` queue.
> +
> +3. The client may use :c:func:`VIDIOC_ENUM_FRAMESIZES` to detect supported
> +   resolutions for a given format, passing desired pixel format in
> +   :c:type:`v4l2_frmsizeenum` ``pixel_format``.
> +
> +   * Values returned by :c:func:`VIDIOC_ENUM_FRAMESIZES` on ``CAPTURE``
> +     must include all possible coded resolutions supported by the encoder
> +     for given coded pixel format.
> +
> +   * Values returned by :c:func:`VIDIOC_ENUM_FRAMESIZES` on ``OUTPUT``
> +     queue must include all possible frame buffer resolutions supported
> +     by the encoder for given raw pixel format and coded format currently
> +     set on ``CAPTURE``.
> +
> +4. Supported profiles and levels for given format, if applicable, may be
> +   queried using their respective controls via :c:func:`VIDIOC_QUERYCTRL`.
> +
> +5. Any additional encoder capabilities may be discovered by querying
> +   their respective controls.
> +
> +Initialization
> +==============
> +
> +1. *[optional]* Enumerate supported formats and resolutions. See
> +   capability enumeration.

capability enumeration. -> 'Querying capabilities' above.

> +
> +2. Set a coded format on the ``CAPTURE`` queue via :c:func:`VIDIOC_S_FMT`
> +
> +   * **Required fields:**
> +
> +     ``type``
> +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
> +
> +     ``pixelformat``
> +         set to a coded format to be produced
> +
> +   * **Return fields:**
> +
> +     ``width``, ``height``
> +         coded resolution (based on currently active ``OUTPUT`` format)
> +
> +   .. note::
> +
> +      Changing ``CAPTURE`` format may change currently set ``OUTPUT``
> +      format. The driver will derive a new ``OUTPUT`` format from
> +      ``CAPTURE`` format being set, including resolution, colorimetry
> +      parameters, etc. If the client needs a specific ``OUTPUT`` format,
> +      it must adjust it afterwards.
> +
> +3. *[optional]* Enumerate supported ``OUTPUT`` formats (raw formats for
> +   source) for the selected coded format via :c:func:`VIDIOC_ENUM_FMT`.
> +
> +   * **Required fields:**
> +
> +     ``type``
> +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> +
> +     ``index``
> +         follows standard semantics
> +
> +   * **Return fields:**
> +
> +     ``pixelformat``
> +         raw format supported for the coded format currently selected on
> +         the ``OUTPUT`` queue.
> +
> +4. The client may set the raw source format on the ``OUTPUT`` queue via
> +   :c:func:`VIDIOC_S_FMT`.
> +
> +   * **Required fields:**
> +
> +     ``type``
> +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> +
> +     ``pixelformat``
> +         raw format of the source
> +
> +     ``width``, ``height``
> +         source resolution
> +
> +     ``num_planes`` (for _MPLANE)
> +         set to number of planes for pixelformat
> +
> +     ``sizeimage``, ``bytesperline``
> +         follow standard semantics
> +
> +   * **Return fields:**
> +
> +     ``width``, ``height``
> +         may be adjusted by driver to match alignment requirements, as
> +         required by the currently selected formats
> +
> +     ``sizeimage``, ``bytesperline``
> +         follow standard semantics
> +
> +   * Setting the source resolution will reset visible resolution to the
> +     adjusted source resolution rounded up to the closest visible
> +     resolution supported by the driver. Similarly, coded resolution will

coded -> the coded

> +     be reset to source resolution rounded up to the closest coded

reset -> set
source -> the source

> +     resolution supported by the driver (typically a multiple of
> +     macroblock size).

The first sentence of this paragraph is very confusing. It needs a bit more work,
I think.

> +
> +   .. note::
> +
> +      This step is not strictly required, since ``OUTPUT`` is expected to
> +      have a valid default format. However, the client needs to ensure that
> +      ``OUTPUT`` format matches its expectations via either
> +      :c:func:`VIDIOC_S_FMT` or :c:func:`VIDIOC_G_FMT`, with the former
> +      being the typical scenario, since the default format is unlikely to
> +      be what the client needs.

Hmm. I'm not sure if this note should be included. It's good practice to always
set the output format. I think the note confuses more than that it helps. IMHO.

> +
> +5. *[optional]* Set visible resolution for the stream metadata via

Set -> Set the

> +   :c:func:`VIDIOC_S_SELECTION` on the ``OUTPUT`` queue.
> +
> +   * **Required fields:**
> +
> +     ``type``
> +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> +
> +     ``target``
> +         set to ``V4L2_SEL_TGT_CROP``
> +
> +     ``r.left``, ``r.top``, ``r.width``, ``r.height``
> +         visible rectangle; this must fit within the framebuffer resolution

Should that be "source resolution"? Or the resolution returned by "CROP_BOUNDS"?

> +         and might be subject to adjustment to match codec and hardware
> +         constraints
> +
> +   * **Return fields:**
> +
> +     ``r.left``, ``r.top``, ``r.width``, ``r.height``
> +         visible rectangle adjusted by the driver
> +
> +   * The driver must expose following selection targets on ``OUTPUT``:
> +
> +     ``V4L2_SEL_TGT_CROP_BOUNDS``
> +         maximum crop bounds within the source buffer supported by the
> +         encoder
> +
> +     ``V4L2_SEL_TGT_CROP_DEFAULT``
> +         suggested cropping rectangle that covers the whole source picture
> +
> +     ``V4L2_SEL_TGT_CROP``
> +         rectangle within the source buffer to be encoded into the
> +         ``CAPTURE`` stream; defaults to ``V4L2_SEL_TGT_CROP_DEFAULT``
> +
> +     ``V4L2_SEL_TGT_COMPOSE_BOUNDS``
> +         maximum rectangle within the coded resolution, which the cropped
> +         source frame can be output into; always equal to (0, 0)x(width of
> +         ``V4L2_SEL_TGT_CROP``, height of ``V4L2_SEL_TGT_CROP``), if the
> +         hardware does not support compose/scaling
> +
> +     ``V4L2_SEL_TGT_COMPOSE_DEFAULT``
> +         equal to ``V4L2_SEL_TGT_CROP``
> +
> +     ``V4L2_SEL_TGT_COMPOSE``
> +         rectangle within the coded frame, which the cropped source frame
> +         is to be output into; defaults to
> +         ``V4L2_SEL_TGT_COMPOSE_DEFAULT``; read-only on hardware without
> +         additional compose/scaling capabilities; resulting stream will
> +         have this rectangle encoded as the visible rectangle in its
> +         metadata
> +
> +     ``V4L2_SEL_TGT_COMPOSE_PADDED``
> +         always equal to coded resolution of the stream, as selected by the
> +         encoder based on source resolution and crop/compose rectangles

Are there codec drivers that support composition? I can't remember seeing any.

> +
> +   .. note::
> +
> +      The driver may adjust the crop/compose rectangles to the nearest
> +      supported ones to meet codec and hardware requirements.
> +
> +6. Allocate buffers for both ``OUTPUT`` and ``CAPTURE`` via
> +   :c:func:`VIDIOC_REQBUFS`. This may be performed in any order.
> +
> +   * **Required fields:**
> +
> +     ``count``
> +         requested number of buffers to allocate; greater than zero
> +
> +     ``type``
> +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT`` or
> +         ``CAPTURE``
> +
> +     ``memory``
> +         follows standard semantics
> +
> +   * **Return fields:**
> +
> +     ``count``
> +         adjusted to allocated number of buffers
> +
> +   * The driver must adjust count to minimum of required number of
> +     buffers for given format and count passed.

I'd rephrase this:

	The driver must adjust ``count`` to the maximum of ``count`` and
	the required number of buffers for the given format.

Note that this is set to the maximum, not minimum.

> The client must
> +     check this value after the ioctl returns to get the number of
> +     buffers actually allocated.
> +
> +   .. note::
> +
> +      To allocate more than minimum number of buffers (for pipeline

than -> than the

> +      depth), use G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``) or
> +      G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``), respectively,
> +      to get the minimum number of buffers required by the
> +      driver/format, and pass the obtained value plus the number of
> +      additional buffers needed in count field to :c:func:`VIDIOC_REQBUFS`.

count -> the ``count``

> +
> +7. Begin streaming on both ``OUTPUT`` and ``CAPTURE`` queues via
> +   :c:func:`VIDIOC_STREAMON`. This may be performed in any order. Actual

Actual -> The actual

> +   encoding process starts when both queues start streaming.
> +
> +.. note::
> +
> +   If the client stops ``CAPTURE`` during the encode process and then
> +   restarts it again, the encoder will be expected to generate a stream
> +   independent from the stream generated before the stop. Depending on the
> +   coded format, that may imply that:
> +
> +   * encoded frames produced after the restart must not reference any
> +     frames produced before the stop, e.g. no long term references for
> +     H264,
> +
> +   * any headers that must be included in a standalone stream must be
> +     produced again, e.g. SPS and PPS for H264.
> +
> +Encoding
> +========
> +
> +This state is reached after a successful initialization sequence. In
> +this state, client queues and dequeues buffers to both queues via
> +:c:func:`VIDIOC_QBUF` and :c:func:`VIDIOC_DQBUF`, following standard
> +semantics.
> +
> +Both queues operate independently, following standard behavior of V4L2
> +buffer queues and memory-to-memory devices. In addition, the order of
> +encoded frames dequeued from ``CAPTURE`` queue may differ from the order of
> +queuing raw frames to ``OUTPUT`` queue, due to properties of selected coded
> +format, e.g. frame reordering. The client must not assume any direct
> +relationship between ``CAPTURE`` and ``OUTPUT`` buffers, other than
> +reported by :c:type:`v4l2_buffer` ``timestamp``.

Same question as for the decoder: are you sure about that?

> +
> +Encoding parameter changes
> +==========================
> +
> +The client is allowed to use :c:func:`VIDIOC_S_CTRL` to change encoder
> +parameters at any time. The availability of parameters is driver-specific
> +and the client must query the driver to find the set of available controls.
> +
> +The ability to change each parameter during encoding of is driver-specific,

Remove spurious 'of'

> +as per standard semantics of the V4L2 control interface. The client may

per -> per the

> +attempt setting a control of its interest during encoding and if it the

Remove spurious 'it'

> +operation fails with the -EBUSY error code, ``CAPTURE`` queue needs to be

``CAPTURE`` -> the ``CAPTURE``

> +stopped for the configuration change to be allowed (following the drain
> +sequence will be  needed to avoid losing already queued/encoded frames).

losing -> losing the

> +
> +The timing of parameter update is driver-specific, as per standard

update -> updates
per -> per the

> +semantics of the V4L2 control interface. If the client needs to apply the
> +parameters exactly at specific frame and the encoder supports it, using

using -> using the

> +Request API should be considered.

This makes the assumption that the Request API will be merged at about the
same time as this document. Which is at the moment a reasonable assumption,
to be fair.

> +
> +Drain
> +=====
> +
> +To ensure that all queued ``OUTPUT`` buffers have been processed and
> +related ``CAPTURE`` buffers output to the client, the following drain

related -> the related

> +sequence may be followed. After the drain sequence is complete, the client
> +has received all encoded frames for all ``OUTPUT`` buffers queued before
> +the sequence was started.
> +
> +1. Begin drain by issuing :c:func:`VIDIOC_ENCODER_CMD`.
> +
> +   * **Required fields:**
> +
> +     ``cmd``
> +         set to ``V4L2_ENC_CMD_STOP``
> +
> +     ``flags``
> +         set to 0
> +
> +     ``pts``
> +         set to 0
> +
> +2. The driver must process and encode as normal all ``OUTPUT`` buffers
> +   queued by the client before the :c:func:`VIDIOC_ENCODER_CMD` was issued.
> +
> +3. Once all ``OUTPUT`` buffers queued before ``V4L2_ENC_CMD_STOP`` are
> +   processed:
> +
> +   * Once all decoded frames (if any) are ready to be dequeued on the
> +     ``CAPTURE`` queue the driver must send a ``V4L2_EVENT_EOS``. The
> +     driver must also set ``V4L2_BUF_FLAG_LAST`` in :c:type:`v4l2_buffer`
> +     ``flags`` field on the buffer on the ``CAPTURE`` queue containing the
> +     last frame (if any) produced as a result of processing the ``OUTPUT``
> +     buffers queued before
> +     ``V4L2_ENC_CMD_STOP``.

Hmm, this is somewhat awkward phrasing. Can you take another look at this?

> +
> +   * If no more frames are left to be returned at the point of handling
> +     ``V4L2_ENC_CMD_STOP``, the driver must return an empty buffer (with
> +     :c:type:`v4l2_buffer` ``bytesused`` = 0) as the last buffer with
> +     ``V4L2_BUF_FLAG_LAST`` set.
> +
> +   * Any attempts to dequeue more buffers beyond the buffer marked with
> +     ``V4L2_BUF_FLAG_LAST`` will result in a -EPIPE error code returned by
> +     :c:func:`VIDIOC_DQBUF`.
> +
> +4. At this point, encoding is paused and the driver will accept, but not
> +   process any newly queued ``OUTPUT`` buffers until the client issues

issues -> issues a

> +   ``V4L2_ENC_CMD_START`` or restarts streaming on any queue.
> +
> +* Once the drain sequence is initiated, the client needs to drive it to
> +  completion, as described by the above steps, unless it aborts the process
> +  by issuing :c:func:`VIDIOC_STREAMOFF` on ``CAPTURE`` queue.  The client
> +  is not allowed to issue ``V4L2_ENC_CMD_START`` or ``V4L2_ENC_CMD_STOP``
> +  again while the drain sequence is in progress and they will fail with
> +  -EBUSY error code if attempted.
> +
> +* Restarting streaming on ``CAPTURE`` queue will implicitly end the paused
> +  state and make the encoder continue encoding, as long as other encoding
> +  conditions are met. Restarting ``OUTPUT`` queue will not affect an
> +  in-progress drain sequence.
> +
> +* The drivers must also implement :c:func:`VIDIOC_TRY_ENCODER_CMD`, as a
> +  way to let the client query the availability of encoder commands.
> +
> +Reset
> +=====
> +
> +The client may want to request the encoder to reinitialize the encoding,
> +so that the stream produced becomes independent from the stream generated
> +before. Depending on the coded format, that may imply that:
> +
> +* encoded frames produced after the restart must not reference any frames
> +  produced before the stop, e.g. no long term references for H264,
> +
> +* any headers that must be included in a standalone stream must be produced
> +  again, e.g. SPS and PPS for H264.
> +
> +This can be achieved by performing the reset sequence.
> +
> +1. *[optional]* If the client is interested in encoded frames resulting
> +   from already queued source frames, it needs to perform the Drain
> +   sequence. Otherwise, the reset sequence would cause the already
> +   encoded and not dequeued encoded frames to be lost.
> +
> +2. Stop streaming on ``CAPTURE`` queue via :c:func:`VIDIOC_STREAMOFF`. This
> +   will return all currently queued ``CAPTURE`` buffers to the client,
> +   without valid frame data.
> +
> +3. *[optional]* Restart streaming on ``OUTPUT`` queue via
> +   :c:func:`VIDIOC_STREAMOFF` followed by :c:func:`VIDIOC_STREAMON` to
> +   drop any source frames enqueued to the encoder before the reset
> +   sequence. This is useful if the client requires the new stream to begin
> +   at specific source frame. Otherwise, the new stream might include
> +   frames encoded from source frames queued before the reset sequence.
> +
> +4. Restart streaming on ``CAPTURE`` queue via :c:func:`VIDIOC_STREAMON` and
> +   continue with regular encoding sequence. The encoded frames produced
> +   into ``CAPTURE`` buffers from now on will contain a standalone stream
> +   that can be decoded without the need for frames encoded before the reset
> +   sequence.
> +
> +Commit points
> +=============
> +
> +Setting formats and allocating buffers triggers changes in the behavior
> +of the driver.
> +
> +1. Setting format on ``CAPTURE`` queue may change the set of formats

format -> the format

> +   supported/advertised on the ``OUTPUT`` queue. In particular, it also
> +   means that ``OUTPUT`` format may be reset and the client must not

that -> that the

> +   rely on the previously set format being preserved.
> +
> +2. Enumerating formats on ``OUTPUT`` queue must only return formats

on -> on the

> +   supported for the ``CAPTURE`` format currently set.

'for the current ``CAPTURE`` format.'

> +
> +3. Setting/changing format on ``OUTPUT`` queue does not change formats

format -> the format
on -> on the

> +   available on ``CAPTURE`` queue. An attempt to set ``OUTPUT`` format that

on -> on the
set -> set the

> +   is not supported for the currently selected ``CAPTURE`` format must
> +   result in the driver adjusting the requested format to an acceptable
> +   one.
> +
> +4. Enumerating formats on ``CAPTURE`` queue always returns the full set of

on -> on the

> +   supported coded formats, irrespective of the current ``OUTPUT``
> +   format.
> +
> +5. After allocating buffers on the ``CAPTURE`` queue, it is not possible to
> +   change format on it.

format -> the format

> +
> +To summarize, setting formats and allocation must always start with the
> +``CAPTURE`` queue and the ``CAPTURE`` queue is the master that governs the
> +set of supported formats for the ``OUTPUT`` queue.
> diff --git a/Documentation/media/uapi/v4l/devices.rst b/Documentation/media/uapi/v4l/devices.rst
> index 12d43fe711cf..1822c66c2154 100644
> --- a/Documentation/media/uapi/v4l/devices.rst
> +++ b/Documentation/media/uapi/v4l/devices.rst
> @@ -16,6 +16,7 @@ Interfaces
>      dev-osd
>      dev-codec
>      dev-decoder
> +    dev-encoder
>      dev-effect
>      dev-raw-vbi
>      dev-sliced-vbi
> diff --git a/Documentation/media/uapi/v4l/v4l2.rst b/Documentation/media/uapi/v4l/v4l2.rst
> index 65dc096199ad..2ef6693b9499 100644
> --- a/Documentation/media/uapi/v4l/v4l2.rst
> +++ b/Documentation/media/uapi/v4l/v4l2.rst
> @@ -56,6 +56,7 @@ Authors, in alphabetical order:
>  - Figa, Tomasz <tfiga@chromium.org>
>  
>    - Documented the memory-to-memory decoder interface.
> +  - Documented the memory-to-memory encoder interface.
>  
>  - H Schimek, Michael <mschimek@gmx.at>
>  
> @@ -68,6 +69,7 @@ Authors, in alphabetical order:
>  - Osciak, Pawel <posciak@chromium.org>
>  
>    - Documented the memory-to-memory decoder interface.
> +  - Documented the memory-to-memory encoder interface.
>  
>  - Osciak, Pawel <pawel@osciak.com>
>  
> 

One general comment:

you often talk about 'the driver must', e.g.:

"The driver must process and encode as normal all ``OUTPUT`` buffers
queued by the client before the :c:func:`VIDIOC_ENCODER_CMD` was issued."

But this is not a driver specification, it is an API specification.

I think it would be better to phrase it like this:

"All ``OUTPUT`` buffers queued by the client before the :c:func:`VIDIOC_ENCODER_CMD`
was issued will be processed and encoded as normal."

(or perhaps even 'shall' if you want to be really formal)

End-users do not really care what drivers do, they want to know what the API does,
and that implies rules for drivers.

Regards,

	Hans
Tomasz Figa Aug. 7, 2018, 6:07 a.m. UTC | #3
Hi Philipp,

On Wed, Jul 25, 2018 at 10:41 PM Philipp Zabel <p.zabel@pengutronix.de> wrote:
>
> On Tue, 2018-07-24 at 23:06 +0900, Tomasz Figa wrote:
> > Due to complexity of the video encoding process, the V4L2 drivers of
> > stateful encoder hardware require specific sequences of V4L2 API calls
> > to be followed. These include capability enumeration, initialization,
> > encoding, encode parameters change, drain and reset.
> >
> > Specifics of the above have been discussed during Media Workshops at
> > LinuxCon Europe 2012 in Barcelona and then later Embedded Linux
> > Conference Europe 2014 in Düsseldorf. The de facto Codec API that
> > originated at those events was later implemented by the drivers we already
> > have merged in mainline, such as s5p-mfc or coda.
> >
> > The only thing missing was the real specification included as a part of
> > Linux Media documentation. Fix it now and document the encoder part of
> > the Codec API.
> >
> > Signed-off-by: Tomasz Figa <tfiga@chromium.org>
>
> Thanks a lot for the update,

Thanks for review!

> > ---
> >  Documentation/media/uapi/v4l/dev-encoder.rst | 550 +++++++++++++++++++
> >  Documentation/media/uapi/v4l/devices.rst     |   1 +
> >  Documentation/media/uapi/v4l/v4l2.rst        |   2 +
> >  3 files changed, 553 insertions(+)
> >  create mode 100644 Documentation/media/uapi/v4l/dev-encoder.rst
> >
> > diff --git a/Documentation/media/uapi/v4l/dev-encoder.rst b/Documentation/media/uapi/v4l/dev-encoder.rst
> > new file mode 100644
> > index 000000000000..28be1698e99c
> > --- /dev/null
> > +++ b/Documentation/media/uapi/v4l/dev-encoder.rst
> > @@ -0,0 +1,550 @@
> > +.. -*- coding: utf-8; mode: rst -*-
> > +
> > +.. _encoder:
> > +
> > +****************************************
> > +Memory-to-memory Video Encoder Interface
> > +****************************************
> > +
> > +Input data to a video encoder are raw video frames in display order
> > +to be encoded into the output bitstream. Output data are complete chunks of
> > +valid bitstream, including all metadata, headers, etc. The resulting stream
> > +must not need any further post-processing by the client.
> > +
> > +Performing software stream processing, header generation etc. in the driver
> > +in order to support this interface is strongly discouraged. In case such
> > +operations are needed, use of Stateless Video Encoder Interface (in
> > +development) is strongly advised.
> > +
> [...]
> > +
> > +Commit points
> > +=============
> > +
> > +Setting formats and allocating buffers triggers changes in the behavior
> > +of the driver.
> > +
> > +1. Setting format on ``CAPTURE`` queue may change the set of formats
> > +   supported/advertised on the ``OUTPUT`` queue. In particular, it also
> > +   means that ``OUTPUT`` format may be reset and the client must not
> > +   rely on the previously set format being preserved.
>
> Since the only property of the CAPTURE format that can be set directly
> via S_FMT is the pixelformat, should this be made explicit?
>
> 1. Setting pixelformat on ``CAPTURE`` queue may change the set of
>    formats supported/advertised on the ``OUTPUT`` queue. In particular,
>    it also means that ``OUTPUT`` format may be reset and the client
>    must not rely on the previously set format being preserved.
>
> ?
>
> > +2. Enumerating formats on ``OUTPUT`` queue must only return formats
> > +   supported for the ``CAPTURE`` format currently set.
>
> Same here, as it usually is the codec selected by CAPTURE pixelformat
> that determines the supported OUTPUT pixelformats and resolutions.
>
> 2. Enumerating formats on ``OUTPUT`` queue must only return formats
>    supported for the ``CAPTURE`` pixelformat currently set.
>
> This could prevent the possible misconception that the CAPTURE
> width/height might in any form limit the OUTPUT format, when in fact it
> is determined by it.
>
> > +3. Setting/changing format on ``OUTPUT`` queue does not change formats
> > +   available on ``CAPTURE`` queue.
>
> 3. Setting/changing format on the ``OUTPUT`` queue does not change
>    pixelformats available on the ``CAPTURE`` queue.
>
> ?
>
> Because setting OUTPUT width/height or CROP SELECTION very much limits
> the possible values of CAPTURE width/height.
>
> Maybe 'available' in this context should be specified somewhere to mean
> 'returned by ENUM_FMT and allowed by S_FMT/TRY_FMT'.
>
> > +   An attempt to set ``OUTPUT`` format that
> > +   is not supported for the currently selected ``CAPTURE`` format must
> > +   result in the driver adjusting the requested format to an acceptable
> > +   one.
>
>    An attempt to set ``OUTPUT`` format that is not supported for the
>
> currently selected ``CAPTURE`` pixelformat must result in the driver
>
> adjusting the requested format to an acceptable one.
>
> > +4. Enumerating formats on ``CAPTURE`` queue always returns the full set of
> > +   supported coded formats, irrespective of the current ``OUTPUT``
> > +   format.
> > +
> > +5. After allocating buffers on the ``CAPTURE`` queue, it is not possible to
> > +   change format on it.
> > +
> > +To summarize, setting formats and allocation must always start with the
> > +``CAPTURE`` queue and the ``CAPTURE`` queue is the master that governs the
> > +set of supported formats for the ``OUTPUT`` queue.
>
> To summarize, setting formats and allocation must always start with
> setting the encoded pixelformat on the ``CAPTURE`` queue. The
> ``CAPTURE`` queue is the master that governs the set of supported
> formats for the ``OUTPUT`` queue.
>
> Or is this too verbose?

I'm personally okay with making this "pixel format" specifically, but
I thought we wanted to extend this later to other things, such as
colorimetry. Would it introduce any problems if we keep it more
general?

To avoid any ambiguities, we could add a table that lists all the
state accessible by user space, which would clearly mark CAPTURE
width/height as fixed by the driver.

Best regards,
Tomasz
Tomasz Figa Aug. 7, 2018, 6:54 a.m. UTC | #4
Hi Hans,

On Wed, Jul 25, 2018 at 10:57 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
>
> On 24/07/18 16:06, Tomasz Figa wrote:
> > Due to complexity of the video encoding process, the V4L2 drivers of
> > stateful encoder hardware require specific sequences of V4L2 API calls
> > to be followed. These include capability enumeration, initialization,
> > encoding, encode parameters change, drain and reset.
> >
> > Specifics of the above have been discussed during Media Workshops at
> > LinuxCon Europe 2012 in Barcelona and then later Embedded Linux
> > Conference Europe 2014 in Düsseldorf. The de facto Codec API that
> > originated at those events was later implemented by the drivers we already
> > have merged in mainline, such as s5p-mfc or coda.
> >
> > The only thing missing was the real specification included as a part of
> > Linux Media documentation. Fix it now and document the encoder part of
> > the Codec API.
> >
> > Signed-off-by: Tomasz Figa <tfiga@chromium.org>
> > ---
> >  Documentation/media/uapi/v4l/dev-encoder.rst | 550 +++++++++++++++++++
> >  Documentation/media/uapi/v4l/devices.rst     |   1 +
> >  Documentation/media/uapi/v4l/v4l2.rst        |   2 +
> >  3 files changed, 553 insertions(+)
> >  create mode 100644 Documentation/media/uapi/v4l/dev-encoder.rst
> >
> > diff --git a/Documentation/media/uapi/v4l/dev-encoder.rst b/Documentation/media/uapi/v4l/dev-encoder.rst
> > new file mode 100644
> > index 000000000000..28be1698e99c
> > --- /dev/null
> > +++ b/Documentation/media/uapi/v4l/dev-encoder.rst
> > @@ -0,0 +1,550 @@
> > +.. -*- coding: utf-8; mode: rst -*-
> > +
> > +.. _encoder:
> > +
> > +****************************************
> > +Memory-to-memory Video Encoder Interface
> > +****************************************
> > +
> > +Input data to a video encoder are raw video frames in display order
> > +to be encoded into the output bitstream. Output data are complete chunks of
> > +valid bitstream, including all metadata, headers, etc. The resulting stream
> > +must not need any further post-processing by the client.
>
> Due to the confusing use capture and output I wonder if it would be better to
> rephrase this as follows:
>
> "A video encoder takes raw video frames in display order and encodes them into
> a bitstream. It generates complete chunks of the bitstream, including
> all metadata, headers, etc. The resulting bitstream does not require any further
> post-processing by the client."
>
> Something similar should be done for the decoder documentation.
>

First, thanks a lot for review!

Sounds good to me, it indeed feels much easier to read, thanks.

[snip]
> > +
> > +IDR
> > +   a type of a keyframe in H.264-encoded stream, which clears the list of
> > +   earlier reference frames (DPBs)
>
> Same problem as with the previous patch: it doesn't say what IDR stands for.
> It also refers to DPBs, but DPB is not part of this glossary.

Ack.

>
> Perhaps the glossary of the encoder/decoder should be combined.
>

There are some terms that have slightly different nuance between
encoder and decoder, so while it would be possible to just include
both meanings (as it was in RFC), I wonder if it wouldn't make it more
difficult to read, also given that it would move it to a separate
page. No strong opinion, though.

[snip]
> > +
> > +Initialization
> > +==============
> > +
> > +1. *[optional]* Enumerate supported formats and resolutions. See
> > +   capability enumeration.
>
> capability enumeration. -> 'Querying capabilities' above.
>

Ack.

[snip]
> > +4. The client may set the raw source format on the ``OUTPUT`` queue via
> > +   :c:func:`VIDIOC_S_FMT`.
> > +
> > +   * **Required fields:**
> > +
> > +     ``type``
> > +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> > +
> > +     ``pixelformat``
> > +         raw format of the source
> > +
> > +     ``width``, ``height``
> > +         source resolution
> > +
> > +     ``num_planes`` (for _MPLANE)
> > +         set to number of planes for pixelformat
> > +
> > +     ``sizeimage``, ``bytesperline``
> > +         follow standard semantics
> > +
> > +   * **Return fields:**
> > +
> > +     ``width``, ``height``
> > +         may be adjusted by driver to match alignment requirements, as
> > +         required by the currently selected formats
> > +
> > +     ``sizeimage``, ``bytesperline``
> > +         follow standard semantics
> > +
> > +   * Setting the source resolution will reset visible resolution to the
> > +     adjusted source resolution rounded up to the closest visible
> > +     resolution supported by the driver. Similarly, coded resolution will
>
> coded -> the coded

Ack.

>
> > +     be reset to source resolution rounded up to the closest coded
>
> reset -> set
> source -> the source

Ack.

>
> > +     resolution supported by the driver (typically a multiple of
> > +     macroblock size).
>
> The first sentence of this paragraph is very confusing. It needs a bit more work,
> I think.

Actually, this applies to all crop rectangles, not just visible
resolution. How about the following?

    Setting the source resolution will reset the crop rectangles to
default values
    corresponding to the new resolution, as described further in this document.
    Similarly, the coded resolution will be reset to match source
resolution rounded up
    to the closest coded resolution supported by the driver (typically
a multiple of
    macroblock size).

>
> > +
> > +   .. note::
> > +
> > +      This step is not strictly required, since ``OUTPUT`` is expected to
> > +      have a valid default format. However, the client needs to ensure that
> > +      ``OUTPUT`` format matches its expectations via either
> > +      :c:func:`VIDIOC_S_FMT` or :c:func:`VIDIOC_G_FMT`, with the former
> > +      being the typical scenario, since the default format is unlikely to
> > +      be what the client needs.
>
> Hmm. I'm not sure if this note should be included. It's good practice to always
> set the output format. I think the note confuses more than that it helps. IMHO.
>

I agree with you on that. In RFC, Philipp noticed that technically
S_FMT is not mandatory and that there might be some use case where the
already set format matches client's expectation. I've added this note
to cover that. Philipp, do you still think we need it?

> > +
> > +5. *[optional]* Set visible resolution for the stream metadata via
>
> Set -> Set the
>

Ack.

> > +   :c:func:`VIDIOC_S_SELECTION` on the ``OUTPUT`` queue.
> > +
> > +   * **Required fields:**
> > +
> > +     ``type``
> > +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> > +
> > +     ``target``
> > +         set to ``V4L2_SEL_TGT_CROP``
> > +
> > +     ``r.left``, ``r.top``, ``r.width``, ``r.height``
> > +         visible rectangle; this must fit within the framebuffer resolution
>
> Should that be "source resolution"? Or the resolution returned by "CROP_BOUNDS"?
>

Referring to V4L2_SEL_TGT_CROP_BOUNDS rather than some arbitrary
resolution is better indeed, will change.

> > +         and might be subject to adjustment to match codec and hardware
> > +         constraints
> > +
> > +   * **Return fields:**
> > +
> > +     ``r.left``, ``r.top``, ``r.width``, ``r.height``
> > +         visible rectangle adjusted by the driver
> > +
> > +   * The driver must expose following selection targets on ``OUTPUT``:
> > +
> > +     ``V4L2_SEL_TGT_CROP_BOUNDS``
> > +         maximum crop bounds within the source buffer supported by the
> > +         encoder
> > +
> > +     ``V4L2_SEL_TGT_CROP_DEFAULT``
> > +         suggested cropping rectangle that covers the whole source picture
> > +
> > +     ``V4L2_SEL_TGT_CROP``
> > +         rectangle within the source buffer to be encoded into the
> > +         ``CAPTURE`` stream; defaults to ``V4L2_SEL_TGT_CROP_DEFAULT``
> > +
> > +     ``V4L2_SEL_TGT_COMPOSE_BOUNDS``
> > +         maximum rectangle within the coded resolution, which the cropped
> > +         source frame can be output into; always equal to (0, 0)x(width of
> > +         ``V4L2_SEL_TGT_CROP``, height of ``V4L2_SEL_TGT_CROP``), if the
> > +         hardware does not support compose/scaling
> > +
> > +     ``V4L2_SEL_TGT_COMPOSE_DEFAULT``
> > +         equal to ``V4L2_SEL_TGT_CROP``
> > +
> > +     ``V4L2_SEL_TGT_COMPOSE``
> > +         rectangle within the coded frame, which the cropped source frame
> > +         is to be output into; defaults to
> > +         ``V4L2_SEL_TGT_COMPOSE_DEFAULT``; read-only on hardware without
> > +         additional compose/scaling capabilities; resulting stream will
> > +         have this rectangle encoded as the visible rectangle in its
> > +         metadata
> > +
> > +     ``V4L2_SEL_TGT_COMPOSE_PADDED``
> > +         always equal to coded resolution of the stream, as selected by the
> > +         encoder based on source resolution and crop/compose rectangles
>
> Are there codec drivers that support composition? I can't remember seeing any.
>

Hmm, I was convinced that MFC could scale and we just lacked support
in the driver, but I checked the documentation and it doesn't seem to
be able to do so. I guess we could drop the COMPOSE rectangles for
now, until we find any hardware that can do scaling or composing on
the fly.

> > +
> > +   .. note::
> > +
> > +      The driver may adjust the crop/compose rectangles to the nearest
> > +      supported ones to meet codec and hardware requirements.
> > +
> > +6. Allocate buffers for both ``OUTPUT`` and ``CAPTURE`` via
> > +   :c:func:`VIDIOC_REQBUFS`. This may be performed in any order.
> > +
> > +   * **Required fields:**
> > +
> > +     ``count``
> > +         requested number of buffers to allocate; greater than zero
> > +
> > +     ``type``
> > +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT`` or
> > +         ``CAPTURE``
> > +
> > +     ``memory``
> > +         follows standard semantics
> > +
> > +   * **Return fields:**
> > +
> > +     ``count``
> > +         adjusted to allocated number of buffers
> > +
> > +   * The driver must adjust count to minimum of required number of
> > +     buffers for given format and count passed.
>
> I'd rephrase this:
>
>         The driver must adjust ``count`` to the maximum of ``count`` and
>         the required number of buffers for the given format.
>
> Note that this is set to the maximum, not minimum.
>

Good catch. Will fix it.

> > The client must
> > +     check this value after the ioctl returns to get the number of
> > +     buffers actually allocated.
> > +
> > +   .. note::
> > +
> > +      To allocate more than minimum number of buffers (for pipeline
>
> than -> than the
>

Ack.

> > +      depth), use G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``) or
> > +      G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``), respectively,
> > +      to get the minimum number of buffers required by the
> > +      driver/format, and pass the obtained value plus the number of
> > +      additional buffers needed in count field to :c:func:`VIDIOC_REQBUFS`.
>
> count -> the ``count``
>

Ack.

> > +
> > +7. Begin streaming on both ``OUTPUT`` and ``CAPTURE`` queues via
> > +   :c:func:`VIDIOC_STREAMON`. This may be performed in any order. Actual
>
> Actual -> The actual
>

Ack.

> > +   encoding process starts when both queues start streaming.
> > +
> > +.. note::
> > +
> > +   If the client stops ``CAPTURE`` during the encode process and then
> > +   restarts it again, the encoder will be expected to generate a stream
> > +   independent from the stream generated before the stop. Depending on the
> > +   coded format, that may imply that:
> > +
> > +   * encoded frames produced after the restart must not reference any
> > +     frames produced before the stop, e.g. no long term references for
> > +     H264,
> > +
> > +   * any headers that must be included in a standalone stream must be
> > +     produced again, e.g. SPS and PPS for H264.
> > +
> > +Encoding
> > +========
> > +
> > +This state is reached after a successful initialization sequence. In
> > +this state, client queues and dequeues buffers to both queues via
> > +:c:func:`VIDIOC_QBUF` and :c:func:`VIDIOC_DQBUF`, following standard
> > +semantics.
> > +
> > +Both queues operate independently, following standard behavior of V4L2
> > +buffer queues and memory-to-memory devices. In addition, the order of
> > +encoded frames dequeued from ``CAPTURE`` queue may differ from the order of
> > +queuing raw frames to ``OUTPUT`` queue, due to properties of selected coded
> > +format, e.g. frame reordering. The client must not assume any direct
> > +relationship between ``CAPTURE`` and ``OUTPUT`` buffers, other than
> > +reported by :c:type:`v4l2_buffer` ``timestamp``.
>
> Same question as for the decoder: are you sure about that?
>

I think it's the same answer here. That's why we have the timestamp
copy mechanism, right?

> > +
> > +Encoding parameter changes
> > +==========================
> > +
> > +The client is allowed to use :c:func:`VIDIOC_S_CTRL` to change encoder
> > +parameters at any time. The availability of parameters is driver-specific
> > +and the client must query the driver to find the set of available controls.
> > +
> > +The ability to change each parameter during encoding of is driver-specific,
>
> Remove spurious 'of'
>
> > +as per standard semantics of the V4L2 control interface. The client may
>
> per -> per the
>
> > +attempt setting a control of its interest during encoding and if it the
>
> Remove spurious 'it'
>
> > +operation fails with the -EBUSY error code, ``CAPTURE`` queue needs to be
>
> ``CAPTURE`` -> the ``CAPTURE``
>
> > +stopped for the configuration change to be allowed (following the drain
> > +sequence will be  needed to avoid losing already queued/encoded frames).
>
> losing -> losing the
>
> > +
> > +The timing of parameter update is driver-specific, as per standard
>
> update -> updates
> per -> per the
>
> > +semantics of the V4L2 control interface. If the client needs to apply the
> > +parameters exactly at specific frame and the encoder supports it, using
>
> using -> using the

Ack +6

>
> > +Request API should be considered.
>
> This makes the assumption that the Request API will be merged at about the
> same time as this document. Which is at the moment a reasonable assumption,
> to be fair.
>

We can easily remove this, if it doesn't happen, but I'd prefer not to
need to. ;)

> > +
> > +Drain
> > +=====
> > +
> > +To ensure that all queued ``OUTPUT`` buffers have been processed and
> > +related ``CAPTURE`` buffers output to the client, the following drain
>
> related -> the related
>

Ack.

> > +sequence may be followed. After the drain sequence is complete, the client
> > +has received all encoded frames for all ``OUTPUT`` buffers queued before
> > +the sequence was started.
> > +
> > +1. Begin drain by issuing :c:func:`VIDIOC_ENCODER_CMD`.
> > +
> > +   * **Required fields:**
> > +
> > +     ``cmd``
> > +         set to ``V4L2_ENC_CMD_STOP``
> > +
> > +     ``flags``
> > +         set to 0
> > +
> > +     ``pts``
> > +         set to 0
> > +
> > +2. The driver must process and encode as normal all ``OUTPUT`` buffers
> > +   queued by the client before the :c:func:`VIDIOC_ENCODER_CMD` was issued.
> > +
> > +3. Once all ``OUTPUT`` buffers queued before ``V4L2_ENC_CMD_STOP`` are
> > +   processed:
> > +
> > +   * Once all decoded frames (if any) are ready to be dequeued on the
> > +     ``CAPTURE`` queue the driver must send a ``V4L2_EVENT_EOS``. The
> > +     driver must also set ``V4L2_BUF_FLAG_LAST`` in :c:type:`v4l2_buffer`
> > +     ``flags`` field on the buffer on the ``CAPTURE`` queue containing the
> > +     last frame (if any) produced as a result of processing the ``OUTPUT``
> > +     buffers queued before
> > +     ``V4L2_ENC_CMD_STOP``.
>
> Hmm, this is somewhat awkward phrasing. Can you take another look at this?
>

How about this?

3. Once all ``OUTPUT`` buffers queued before ``V4L2_ENC_CMD_STOP`` are
   processed:

   * The driver returns all ``CAPTURE`` buffers corresponding to processed
     ``OUTPUT`` buffers, if any. The last buffer must have
``V4L2_BUF_FLAG_LAST``
     set in its :c:type:`v4l2_buffer` ``flags`` field.

   * The driver sends a ``V4L2_EVENT_EOS`` event.

> > +
> > +   * If no more frames are left to be returned at the point of handling
> > +     ``V4L2_ENC_CMD_STOP``, the driver must return an empty buffer (with
> > +     :c:type:`v4l2_buffer` ``bytesused`` = 0) as the last buffer with
> > +     ``V4L2_BUF_FLAG_LAST`` set.
> > +
> > +   * Any attempts to dequeue more buffers beyond the buffer marked with
> > +     ``V4L2_BUF_FLAG_LAST`` will result in a -EPIPE error code returned by
> > +     :c:func:`VIDIOC_DQBUF`.
> > +
> > +4. At this point, encoding is paused and the driver will accept, but not
> > +   process any newly queued ``OUTPUT`` buffers until the client issues
>
> issues -> issues a
>

Ack.

[snip]
> > +Commit points
> > +=============
> > +
> > +Setting formats and allocating buffers triggers changes in the behavior
> > +of the driver.
> > +
> > +1. Setting format on ``CAPTURE`` queue may change the set of formats
>
> format -> the format
>
> > +   supported/advertised on the ``OUTPUT`` queue. In particular, it also
> > +   means that ``OUTPUT`` format may be reset and the client must not
>
> that -> that the
>
> > +   rely on the previously set format being preserved.
> > +
> > +2. Enumerating formats on ``OUTPUT`` queue must only return formats
>
> on -> on the
>
> > +   supported for the ``CAPTURE`` format currently set.
>
> 'for the current ``CAPTURE`` format.'
>
> > +
> > +3. Setting/changing format on ``OUTPUT`` queue does not change formats
>
> format -> the format
> on -> on the
>
> > +   available on ``CAPTURE`` queue. An attempt to set ``OUTPUT`` format that
>
> on -> on the
> set -> set the
>
> > +   is not supported for the currently selected ``CAPTURE`` format must
> > +   result in the driver adjusting the requested format to an acceptable
> > +   one.
> > +
> > +4. Enumerating formats on ``CAPTURE`` queue always returns the full set of
>
> on -> on the
>
> > +   supported coded formats, irrespective of the current ``OUTPUT``
> > +   format.
> > +
> > +5. After allocating buffers on the ``CAPTURE`` queue, it is not possible to
> > +   change format on it.
>
> format -> the format
>

Ack +7

> > +
> > +To summarize, setting formats and allocation must always start with the
> > +``CAPTURE`` queue and the ``CAPTURE`` queue is the master that governs the
> > +set of supported formats for the ``OUTPUT`` queue.
> > diff --git a/Documentation/media/uapi/v4l/devices.rst b/Documentation/media/uapi/v4l/devices.rst
> > index 12d43fe711cf..1822c66c2154 100644
> > --- a/Documentation/media/uapi/v4l/devices.rst
> > +++ b/Documentation/media/uapi/v4l/devices.rst
> > @@ -16,6 +16,7 @@ Interfaces
> >      dev-osd
> >      dev-codec
> >      dev-decoder
> > +    dev-encoder
> >      dev-effect
> >      dev-raw-vbi
> >      dev-sliced-vbi
> > diff --git a/Documentation/media/uapi/v4l/v4l2.rst b/Documentation/media/uapi/v4l/v4l2.rst
> > index 65dc096199ad..2ef6693b9499 100644
> > --- a/Documentation/media/uapi/v4l/v4l2.rst
> > +++ b/Documentation/media/uapi/v4l/v4l2.rst
> > @@ -56,6 +56,7 @@ Authors, in alphabetical order:
> >  - Figa, Tomasz <tfiga@chromium.org>
> >
> >    - Documented the memory-to-memory decoder interface.
> > +  - Documented the memory-to-memory encoder interface.
> >
> >  - H Schimek, Michael <mschimek@gmx.at>
> >
> > @@ -68,6 +69,7 @@ Authors, in alphabetical order:
> >  - Osciak, Pawel <posciak@chromium.org>
> >
> >    - Documented the memory-to-memory decoder interface.
> > +  - Documented the memory-to-memory encoder interface.
> >
> >  - Osciak, Pawel <pawel@osciak.com>
> >
> >
>
> One general comment:
>
> you often talk about 'the driver must', e.g.:
>
> "The driver must process and encode as normal all ``OUTPUT`` buffers
> queued by the client before the :c:func:`VIDIOC_ENCODER_CMD` was issued."
>
> But this is not a driver specification, it is an API specification.
>
> I think it would be better to phrase it like this:
>
> "All ``OUTPUT`` buffers queued by the client before the :c:func:`VIDIOC_ENCODER_CMD`
> was issued will be processed and encoded as normal."
>
> (or perhaps even 'shall' if you want to be really formal)
>
> End-users do not really care what drivers do, they want to know what the API does,
> and that implies rules for drivers.

While I see the point, I'm not fully convinced that it makes the
documentation easier to read. We defined "client" for the purpose of
not using the passive form too much, so possibly we could also define
"driver" in the glossary. Maybe it's just me, but I find that
referring directly to both sides of the API and using the active form
is much easier to read.

Possibly just replacing "driver" with "encoder" would ease your concern?

Best regards,
Tomasz
Hans Verkuil Aug. 7, 2018, 7:25 a.m. UTC | #5
On 08/07/2018 08:54 AM, Tomasz Figa wrote:
> Hi Hans,
> 
> On Wed, Jul 25, 2018 at 10:57 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
>>
>> On 24/07/18 16:06, Tomasz Figa wrote:
>>> Due to complexity of the video encoding process, the V4L2 drivers of
>>> stateful encoder hardware require specific sequences of V4L2 API calls
>>> to be followed. These include capability enumeration, initialization,
>>> encoding, encode parameters change, drain and reset.
>>>
>>> Specifics of the above have been discussed during Media Workshops at
>>> LinuxCon Europe 2012 in Barcelona and then later Embedded Linux
>>> Conference Europe 2014 in Düsseldorf. The de facto Codec API that
>>> originated at those events was later implemented by the drivers we already
>>> have merged in mainline, such as s5p-mfc or coda.
>>>
>>> The only thing missing was the real specification included as a part of
>>> Linux Media documentation. Fix it now and document the encoder part of
>>> the Codec API.
>>>
>>> Signed-off-by: Tomasz Figa <tfiga@chromium.org>
>>> ---
>>>  Documentation/media/uapi/v4l/dev-encoder.rst | 550 +++++++++++++++++++
>>>  Documentation/media/uapi/v4l/devices.rst     |   1 +
>>>  Documentation/media/uapi/v4l/v4l2.rst        |   2 +
>>>  3 files changed, 553 insertions(+)
>>>  create mode 100644 Documentation/media/uapi/v4l/dev-encoder.rst
>>>
>>> diff --git a/Documentation/media/uapi/v4l/dev-encoder.rst b/Documentation/media/uapi/v4l/dev-encoder.rst
>>> new file mode 100644
>>> index 000000000000..28be1698e99c
>>> --- /dev/null
>>> +++ b/Documentation/media/uapi/v4l/dev-encoder.rst
>>> @@ -0,0 +1,550 @@
>>> +.. -*- coding: utf-8; mode: rst -*-
>>> +
>>> +.. _encoder:
>>> +
>>> +****************************************
>>> +Memory-to-memory Video Encoder Interface
>>> +****************************************
>>> +
>>> +Input data to a video encoder are raw video frames in display order
>>> +to be encoded into the output bitstream. Output data are complete chunks of
>>> +valid bitstream, including all metadata, headers, etc. The resulting stream
>>> +must not need any further post-processing by the client.
>>
>> Due to the confusing use capture and output I wonder if it would be better to
>> rephrase this as follows:
>>
>> "A video encoder takes raw video frames in display order and encodes them into
>> a bitstream. It generates complete chunks of the bitstream, including
>> all metadata, headers, etc. The resulting bitstream does not require any further
>> post-processing by the client."
>>
>> Something similar should be done for the decoder documentation.
>>
> 
> First, thanks a lot for review!
> 
> Sounds good to me, it indeed feels much easier to read, thanks.
> 
> [snip]
>>> +
>>> +IDR
>>> +   a type of a keyframe in H.264-encoded stream, which clears the list of
>>> +   earlier reference frames (DPBs)
>>
>> Same problem as with the previous patch: it doesn't say what IDR stands for.
>> It also refers to DPBs, but DPB is not part of this glossary.
> 
> Ack.
> 
>>
>> Perhaps the glossary of the encoder/decoder should be combined.
>>
> 
> There are some terms that have slightly different nuance between
> encoder and decoder, so while it would be possible to just include
> both meanings (as it was in RFC), I wonder if it wouldn't make it more
> difficult to read, also given that it would move it to a separate
> page. No strong opinion, though.

I don't have a strong opinion either. Let's keep it as is, we can always
change it later.

>>> +   * Setting the source resolution will reset visible resolution to the
>>> +     adjusted source resolution rounded up to the closest visible
>>> +     resolution supported by the driver. Similarly, coded resolution will
>>
>> coded -> the coded
> 
> Ack.
> 
>>
>>> +     be reset to source resolution rounded up to the closest coded
>>
>> reset -> set
>> source -> the source
> 
> Ack.
> 
>>
>>> +     resolution supported by the driver (typically a multiple of
>>> +     macroblock size).
>>
>> The first sentence of this paragraph is very confusing. It needs a bit more work,
>> I think.
> 
> Actually, this applies to all crop rectangles, not just visible
> resolution. How about the following?
> 
>     Setting the source resolution will reset the crop rectangles to
> default values

default -> their default

>     corresponding to the new resolution, as described further in this document.

Does 'this document' refer to this encoder chapter, or the whole v4l2 spec? It
might be better to provide an explicit link here.

>     Similarly, the coded resolution will be reset to match source

source -> the source

> resolution rounded up
>     to the closest coded resolution supported by the driver (typically
> a multiple of

of -> of the

>     macroblock size).

Anyway, this is much better.

>>> +Both queues operate independently, following standard behavior of V4L2
>>> +buffer queues and memory-to-memory devices. In addition, the order of
>>> +encoded frames dequeued from ``CAPTURE`` queue may differ from the order of
>>> +queuing raw frames to ``OUTPUT`` queue, due to properties of selected coded
>>> +format, e.g. frame reordering. The client must not assume any direct
>>> +relationship between ``CAPTURE`` and ``OUTPUT`` buffers, other than
>>> +reported by :c:type:`v4l2_buffer` ``timestamp``.
>>
>> Same question as for the decoder: are you sure about that?
>>
> 
> I think it's the same answer here. That's why we have the timestamp
> copy mechanism, right?

See my reply from a few minutes ago. I'm not convinced copying timestamps
makes sense for codecs.

>>> +3. Once all ``OUTPUT`` buffers queued before ``V4L2_ENC_CMD_STOP`` are
>>> +   processed:
>>> +
>>> +   * Once all decoded frames (if any) are ready to be dequeued on the
>>> +     ``CAPTURE`` queue the driver must send a ``V4L2_EVENT_EOS``. The
>>> +     driver must also set ``V4L2_BUF_FLAG_LAST`` in :c:type:`v4l2_buffer`
>>> +     ``flags`` field on the buffer on the ``CAPTURE`` queue containing the
>>> +     last frame (if any) produced as a result of processing the ``OUTPUT``
>>> +     buffers queued before
>>> +     ``V4L2_ENC_CMD_STOP``.
>>
>> Hmm, this is somewhat awkward phrasing. Can you take another look at this?
>>
> 
> How about this?
> 
> 3. Once all ``OUTPUT`` buffers queued before ``V4L2_ENC_CMD_STOP`` are
>    processed:
> 
>    * The driver returns all ``CAPTURE`` buffers corresponding to processed
>      ``OUTPUT`` buffers, if any. The last buffer must have
> ``V4L2_BUF_FLAG_LAST``
>      set in its :c:type:`v4l2_buffer` ``flags`` field.
> 
>    * The driver sends a ``V4L2_EVENT_EOS`` event.

I'd rephrase that last sentence to:

* Once the last buffer is returned the driver sends a ``V4L2_EVENT_EOS`` event.

>> One general comment:
>>
>> you often talk about 'the driver must', e.g.:
>>
>> "The driver must process and encode as normal all ``OUTPUT`` buffers
>> queued by the client before the :c:func:`VIDIOC_ENCODER_CMD` was issued."
>>
>> But this is not a driver specification, it is an API specification.
>>
>> I think it would be better to phrase it like this:
>>
>> "All ``OUTPUT`` buffers queued by the client before the :c:func:`VIDIOC_ENCODER_CMD`
>> was issued will be processed and encoded as normal."
>>
>> (or perhaps even 'shall' if you want to be really formal)
>>
>> End-users do not really care what drivers do, they want to know what the API does,
>> and that implies rules for drivers.
> 
> While I see the point, I'm not fully convinced that it makes the
> documentation easier to read. We defined "client" for the purpose of
> not using the passive form too much, so possibly we could also define
> "driver" in the glossary. Maybe it's just me, but I find that
> referring directly to both sides of the API and using the active form
> is much easier to read.
> 
> Possibly just replacing "driver" with "encoder" would ease your concern?

Actually, yes. I think that would work quite well.

Also, the phrase "the driver must" can be replaced by "the encoder will"
which describes the behavior of the encoder, which in turn defines what
the underlying driver must do.

Regards,

	Hans
Ezequiel Garcia Sept. 7, 2018, 8:17 p.m. UTC | #6
On Tue, 2018-07-24 at 23:06 +0900, Tomasz Figa wrote:
> Due to complexity of the video encoding process, the V4L2 drivers of
> stateful encoder hardware require specific sequences of V4L2 API calls
> to be followed. These include capability enumeration, initialization,
> encoding, encode parameters change, drain and reset.
> 
> Specifics of the above have been discussed during Media Workshops at
> LinuxCon Europe 2012 in Barcelona and then later Embedded Linux
> Conference Europe 2014 in Düsseldorf. The de facto Codec API that
> originated at those events was later implemented by the drivers we already
> have merged in mainline, such as s5p-mfc or coda.
> 
> The only thing missing was the real specification included as a part of
> Linux Media documentation. Fix it now and document the encoder part of
> the Codec API.
> 
> Signed-off-by: Tomasz Figa <tfiga@chromium.org>
> ---
>  Documentation/media/uapi/v4l/dev-encoder.rst | 550 +++++++++++++++++++
>  Documentation/media/uapi/v4l/devices.rst     |   1 +
>  Documentation/media/uapi/v4l/v4l2.rst        |   2 +
>  3 files changed, 553 insertions(+)
>  create mode 100644 Documentation/media/uapi/v4l/dev-encoder.rst
> 
> diff --git a/Documentation/media/uapi/v4l/dev-encoder.rst b/Documentation/media/uapi/v4l/dev-encoder.rst
> new file mode 100644
> index 000000000000..28be1698e99c
> --- /dev/null
> +++ b/Documentation/media/uapi/v4l/dev-encoder.rst
> @@ -0,0 +1,550 @@
> +.. -*- coding: utf-8; mode: rst -*-
> +
> +.. _encoder:
> +
> +****************************************
> +Memory-to-memory Video Encoder Interface
> +****************************************
> +
> +Input data to a video encoder are raw video frames in display order
> +to be encoded into the output bitstream. Output data are complete chunks of
> +valid bitstream, including all metadata, headers, etc. The resulting stream
> +must not need any further post-processing by the client.
> +
> +Performing software stream processing, header generation etc. in the driver
> +in order to support this interface is strongly discouraged. In case such
> +operations are needed, use of Stateless Video Encoder Interface (in
> +development) is strongly advised.
> +
> +Conventions and notation used in this document
> +==============================================
> +
> +1. The general V4L2 API rules apply if not specified in this document
> +   otherwise.
> +
> +2. The meaning of words “must”, “may”, “should”, etc. is as per RFC
> +   2119.
> +
> +3. All steps not marked “optional” are required.
> +
> +4. :c:func:`VIDIOC_G_EXT_CTRLS`, :c:func:`VIDIOC_S_EXT_CTRLS` may be used
> +   interchangeably with :c:func:`VIDIOC_G_CTRL`, :c:func:`VIDIOC_S_CTRL`,
> +   unless specified otherwise.
> +
> +5. Single-plane API (see spec) and applicable structures may be used
> +   interchangeably with Multi-plane API, unless specified otherwise,
> +   depending on driver capabilities and following the general V4L2
> +   guidelines.
> +
> +6. i = [a..b]: sequence of integers from a to b, inclusive, i.e. i =
> +   [0..2]: i = 0, 1, 2.
> +
> +7. For ``OUTPUT`` buffer A, A’ represents a buffer on the ``CAPTURE`` queue
> +   containing data (encoded frame/stream) that resulted from processing
> +   buffer A.
> +
> +Glossary
> +========
> +
> +CAPTURE
> +   the destination buffer queue; the queue of buffers containing encoded
> +   bitstream; ``V4L2_BUF_TYPE_VIDEO_CAPTURE```` or
> +   ``V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE``; data are captured from the
> +   hardware into ``CAPTURE`` buffers
> +
> +client
> +   application client communicating with the driver implementing this API
> +
> +coded format
> +   encoded/compressed video bitstream format (e.g. H.264, VP8, etc.);
> +   see also: raw format
> +
> +coded height
> +   height for given coded resolution
> +
> +coded resolution
> +   stream resolution in pixels aligned to codec and hardware requirements;
> +   typically visible resolution rounded up to full macroblocks; see also:
> +   visible resolution
> +
> +coded width
> +   width for given coded resolution
> +
> +decode order
> +   the order in which frames are decoded; may differ from display order if
> +   coded format includes a feature of frame reordering; ``CAPTURE`` buffers
> +   must be returned by the driver in decode order
> +
> +display order
> +   the order in which frames must be displayed; ``OUTPUT`` buffers must be
> +   queued by the client in display order
> +
> +IDR
> +   a type of a keyframe in H.264-encoded stream, which clears the list of
> +   earlier reference frames (DPBs)
> +
> +keyframe
> +   an encoded frame that does not reference frames decoded earlier, i.e.
> +   can be decoded fully on its own.
> +
> +macroblock
> +   a processing unit in image and video compression formats based on linear
> +   block transforms (e.g. H264, VP8, VP9); codec-specific, but for most of
> +   popular codecs the size is 16x16 samples (pixels)
> +
> +OUTPUT
> +   the source buffer queue; the queue of buffers containing raw frames;
> +   ``V4L2_BUF_TYPE_VIDEO_OUTPUT`` or
> +   ``V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE``; the hardware is fed with data
> +   from ``OUTPUT`` buffers
> +
> +PPS
> +   Picture Parameter Set; a type of metadata entity in H.264 bitstream
> +
> +raw format
> +   uncompressed format containing raw pixel data (e.g. YUV, RGB formats)
> +
> +resume point
> +   a point in the bitstream from which decoding may start/continue, without
> +   any previous state/data present, e.g.: a keyframe (VP8/VP9) or
> +   SPS/PPS/IDR sequence (H.264); a resume point is required to start decode
> +   of a new stream, or to resume decoding after a seek
> +
> +source
> +   data fed to the encoder; ``OUTPUT``
> +
> +source height
> +   height in pixels for given source resolution
> +
> +source resolution
> +   resolution in pixels of source frames being source to the encoder and
> +   subject to further cropping to the bounds of visible resolution
> +
> +source width
> +   width in pixels for given source resolution
> +
> +SPS
> +   Sequence Parameter Set; a type of metadata entity in H.264 bitstream
> +
> +stream metadata
> +   additional (non-visual) information contained inside encoded bitstream;
> +   for example: coded resolution, visible resolution, codec profile
> +
> +visible height
> +   height for given visible resolution; display height
> +
> +visible resolution
> +   stream resolution of the visible picture, in pixels, to be used for
> +   display purposes; must be smaller or equal to coded resolution;
> +   display resolution
> +
> +visible width
> +   width for given visible resolution; display width
> +
> +Querying capabilities
> +=====================
> +
> +1. To enumerate the set of coded formats supported by the driver, the
> +   client may call :c:func:`VIDIOC_ENUM_FMT` on ``CAPTURE``.
> +
> +   * The driver must always return the full set of supported formats,
> +     irrespective of the format set on the ``OUTPUT`` queue.
> +
> +2. To enumerate the set of supported raw formats, the client may call
> +   :c:func:`VIDIOC_ENUM_FMT` on ``OUTPUT``.
> +
> +   * The driver must return only the formats supported for the format
> +     currently active on ``CAPTURE``.
> +

Paul and I where discussing about the default active format on CAPTURE
and OUTPUT queues. That is, the format that is active (if any) right
after driver probes.

Currently, the v4l2-compliance tool tests the default active format,
by requiring drivers to support:

    fmt = g_fmt()
    s_fmt(fmt)

Is this actually required? Should we also require this for stateful
and stateless codecs? If yes, should it be documented?

Regards,
Ezequiel
Tomasz Figa Sept. 10, 2018, 3:34 a.m. UTC | #7
On Sat, Sep 8, 2018 at 5:17 AM Ezequiel Garcia <ezequiel@collabora.com> wrote:
>
> On Tue, 2018-07-24 at 23:06 +0900, Tomasz Figa wrote:
[snip]
> > +Querying capabilities
> > +=====================
> > +
> > +1. To enumerate the set of coded formats supported by the driver, the
> > +   client may call :c:func:`VIDIOC_ENUM_FMT` on ``CAPTURE``.
> > +
> > +   * The driver must always return the full set of supported formats,
> > +     irrespective of the format set on the ``OUTPUT`` queue.
> > +
> > +2. To enumerate the set of supported raw formats, the client may call
> > +   :c:func:`VIDIOC_ENUM_FMT` on ``OUTPUT``.
> > +
> > +   * The driver must return only the formats supported for the format
> > +     currently active on ``CAPTURE``.
> > +
>
> Paul and I where discussing about the default active format on CAPTURE
> and OUTPUT queues. That is, the format that is active (if any) right
> after driver probes.
>
> Currently, the v4l2-compliance tool tests the default active format,
> by requiring drivers to support:
>
>     fmt = g_fmt()
>     s_fmt(fmt)
>
> Is this actually required? Should we also require this for stateful
> and stateless codecs? If yes, should it be documented?

The general V4L2 principle is that drivers must maintain some sane
default state right from when they are exposed to the userspace. I'd
try to stick to the common V4L2 semantics, unless there is a very good
reason not to do so.

Note that we actually diverged from it on CAPTURE state for stateful
decoders, because we return an error, if any format-related ioctl is
called on CAPTURE queue before OUTPUT queue is initialized with a
valid coded format, either explicitly by the client or implicitly via
bitstream parsing. The reason was backwards compatibility with clients
which don't handle source change events. If that wasn't the case, we
could have made the CAPTURE queue completely independent and have the
format there reset with source change event, whenever it becomes
invalid due to things like resolution change or speculative
initialization miss, which would make things much more symmetrical.

Best regards,
Tomasz
Tomasz Figa Oct. 16, 2018, 7:36 a.m. UTC | #8
On Tue, Aug 7, 2018 at 3:54 PM Tomasz Figa <tfiga@chromium.org> wrote:
>
> Hi Hans,
>
> On Wed, Jul 25, 2018 at 10:57 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
> >
> > On 24/07/18 16:06, Tomasz Figa wrote:
[snip]
> > > +4. The client may set the raw source format on the ``OUTPUT`` queue via
> > > +   :c:func:`VIDIOC_S_FMT`.
> > > +
> > > +   * **Required fields:**
> > > +
> > > +     ``type``
> > > +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> > > +
> > > +     ``pixelformat``
> > > +         raw format of the source
> > > +
> > > +     ``width``, ``height``
> > > +         source resolution
> > > +
> > > +     ``num_planes`` (for _MPLANE)
> > > +         set to number of planes for pixelformat
> > > +
> > > +     ``sizeimage``, ``bytesperline``
> > > +         follow standard semantics
> > > +
> > > +   * **Return fields:**
> > > +
> > > +     ``width``, ``height``
> > > +         may be adjusted by driver to match alignment requirements, as
> > > +         required by the currently selected formats
> > > +
> > > +     ``sizeimage``, ``bytesperline``
> > > +         follow standard semantics
> > > +
> > > +   * Setting the source resolution will reset visible resolution to the
> > > +     adjusted source resolution rounded up to the closest visible
> > > +     resolution supported by the driver. Similarly, coded resolution will
> >
> > coded -> the coded
>
> Ack.
>
> >
> > > +     be reset to source resolution rounded up to the closest coded
> >
> > reset -> set
> > source -> the source
>
> Ack.
>

Actually, I'd prefer to keep it at "reset", so that it signifies the
fact that the driver will actually override whatever was set by the
application before.

[snip]
> > > +   * The driver must expose following selection targets on ``OUTPUT``:
> > > +
> > > +     ``V4L2_SEL_TGT_CROP_BOUNDS``
> > > +         maximum crop bounds within the source buffer supported by the
> > > +         encoder
> > > +
> > > +     ``V4L2_SEL_TGT_CROP_DEFAULT``
> > > +         suggested cropping rectangle that covers the whole source picture
> > > +
> > > +     ``V4L2_SEL_TGT_CROP``
> > > +         rectangle within the source buffer to be encoded into the
> > > +         ``CAPTURE`` stream; defaults to ``V4L2_SEL_TGT_CROP_DEFAULT``
> > > +
> > > +     ``V4L2_SEL_TGT_COMPOSE_BOUNDS``
> > > +         maximum rectangle within the coded resolution, which the cropped
> > > +         source frame can be output into; always equal to (0, 0)x(width of
> > > +         ``V4L2_SEL_TGT_CROP``, height of ``V4L2_SEL_TGT_CROP``), if the
> > > +         hardware does not support compose/scaling
> > > +
> > > +     ``V4L2_SEL_TGT_COMPOSE_DEFAULT``
> > > +         equal to ``V4L2_SEL_TGT_CROP``
> > > +
> > > +     ``V4L2_SEL_TGT_COMPOSE``
> > > +         rectangle within the coded frame, which the cropped source frame
> > > +         is to be output into; defaults to
> > > +         ``V4L2_SEL_TGT_COMPOSE_DEFAULT``; read-only on hardware without
> > > +         additional compose/scaling capabilities; resulting stream will
> > > +         have this rectangle encoded as the visible rectangle in its
> > > +         metadata
> > > +
> > > +     ``V4L2_SEL_TGT_COMPOSE_PADDED``
> > > +         always equal to coded resolution of the stream, as selected by the
> > > +         encoder based on source resolution and crop/compose rectangles
> >
> > Are there codec drivers that support composition? I can't remember seeing any.
> >
>
> Hmm, I was convinced that MFC could scale and we just lacked support
> in the driver, but I checked the documentation and it doesn't seem to
> be able to do so. I guess we could drop the COMPOSE rectangles for
> now, until we find any hardware that can do scaling or composing on
> the fly.
>

On the other hand, having them defined already wouldn't complicate
existing drivers too much either, because they would just handle all
of them in the same switch case, i.e.

case V4L2_SEL_TGT_COMPOSE_BOUNDS:
case V4L2_SEL_TGT_COMPOSE_DEFAULT:
case V4L2_SEL_TGT_COMPOSE:
case V4L2_SEL_TGT_COMPOSE_PADDED:
     return visible_rectangle;

That would need one change, though. We would define
V4L2_SEL_TGT_COMPOSE_DEFAULT to be equal to (0, 0)x(width of
V4L2_SEL_TGT_CROP - 1, height of ``V4L2_SEL_TGT_CROP - 1), which
makes more sense than current definition, since it would bypass any
compose/scaling by default.

> > > +
> > > +   .. note::
> > > +
> > > +      The driver may adjust the crop/compose rectangles to the nearest
> > > +      supported ones to meet codec and hardware requirements.
> > > +
> > > +6. Allocate buffers for both ``OUTPUT`` and ``CAPTURE`` via
> > > +   :c:func:`VIDIOC_REQBUFS`. This may be performed in any order.
> > > +
> > > +   * **Required fields:**
> > > +
> > > +     ``count``
> > > +         requested number of buffers to allocate; greater than zero
> > > +
> > > +     ``type``
> > > +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT`` or
> > > +         ``CAPTURE``
> > > +
> > > +     ``memory``
> > > +         follows standard semantics
> > > +
> > > +   * **Return fields:**
> > > +
> > > +     ``count``
> > > +         adjusted to allocated number of buffers
> > > +
> > > +   * The driver must adjust count to minimum of required number of
> > > +     buffers for given format and count passed.
> >
> > I'd rephrase this:
> >
> >         The driver must adjust ``count`` to the maximum of ``count`` and
> >         the required number of buffers for the given format.
> >
> > Note that this is set to the maximum, not minimum.
> >
>
> Good catch. Will fix it.
>

Actually this is not always the maximum. Encoders may also have
constraints on the maximum number of buffers, so how about just making
it a bit less specific:

The count will be adjusted by the driver to match the hardware
requirements. The client must check the final value after the ioctl
returns to get the number of buffers allocated.

[snip]
> > One general comment:
> >
> > you often talk about 'the driver must', e.g.:
> >
> > "The driver must process and encode as normal all ``OUTPUT`` buffers
> > queued by the client before the :c:func:`VIDIOC_ENCODER_CMD` was issued."
> >
> > But this is not a driver specification, it is an API specification.
> >
> > I think it would be better to phrase it like this:
> >
> > "All ``OUTPUT`` buffers queued by the client before the :c:func:`VIDIOC_ENCODER_CMD`
> > was issued will be processed and encoded as normal."
> >
> > (or perhaps even 'shall' if you want to be really formal)
> >
> > End-users do not really care what drivers do, they want to know what the API does,
> > and that implies rules for drivers.
>
> While I see the point, I'm not fully convinced that it makes the
> documentation easier to read. We defined "client" for the purpose of
> not using the passive form too much, so possibly we could also define
> "driver" in the glossary. Maybe it's just me, but I find that
> referring directly to both sides of the API and using the active form
> is much easier to read.
>
> Possibly just replacing "driver" with "encoder" would ease your concern?

I actually went ahead and rephrased the text of both encoder and
decoder to be more userspace-centric. There are still mentions of a
driver, but only limited to the places
where it is necessary to signify the driver-specific bits, such as
alignments, capabilities, etc.

Best regards,
Tomasz
Hans Verkuil Oct. 16, 2018, 1:49 p.m. UTC | #9
On 10/16/18 09:36, Tomasz Figa wrote:
> On Tue, Aug 7, 2018 at 3:54 PM Tomasz Figa <tfiga@chromium.org> wrote:
>>>> +   * The driver must expose following selection targets on ``OUTPUT``:
>>>> +
>>>> +     ``V4L2_SEL_TGT_CROP_BOUNDS``
>>>> +         maximum crop bounds within the source buffer supported by the
>>>> +         encoder
>>>> +
>>>> +     ``V4L2_SEL_TGT_CROP_DEFAULT``
>>>> +         suggested cropping rectangle that covers the whole source picture
>>>> +
>>>> +     ``V4L2_SEL_TGT_CROP``
>>>> +         rectangle within the source buffer to be encoded into the
>>>> +         ``CAPTURE`` stream; defaults to ``V4L2_SEL_TGT_CROP_DEFAULT``
>>>> +
>>>> +     ``V4L2_SEL_TGT_COMPOSE_BOUNDS``
>>>> +         maximum rectangle within the coded resolution, which the cropped
>>>> +         source frame can be output into; always equal to (0, 0)x(width of
>>>> +         ``V4L2_SEL_TGT_CROP``, height of ``V4L2_SEL_TGT_CROP``), if the
>>>> +         hardware does not support compose/scaling

Re-reading this I would rewrite this a bit:

if the hardware does not support composition or scaling, then this is always
equal to (0, 0)x(width of ``V4L2_SEL_TGT_CROP``, height of ``V4L2_SEL_TGT_CROP``).

>>>> +
>>>> +     ``V4L2_SEL_TGT_COMPOSE_DEFAULT``
>>>> +         equal to ``V4L2_SEL_TGT_CROP``
>>>> +
>>>> +     ``V4L2_SEL_TGT_COMPOSE``
>>>> +         rectangle within the coded frame, which the cropped source frame
>>>> +         is to be output into; defaults to
>>>> +         ``V4L2_SEL_TGT_COMPOSE_DEFAULT``; read-only on hardware without
>>>> +         additional compose/scaling capabilities; resulting stream will
>>>> +         have this rectangle encoded as the visible rectangle in its
>>>> +         metadata
>>>> +
>>>> +     ``V4L2_SEL_TGT_COMPOSE_PADDED``
>>>> +         always equal to coded resolution of the stream, as selected by the
>>>> +         encoder based on source resolution and crop/compose rectangles
>>>
>>> Are there codec drivers that support composition? I can't remember seeing any.
>>>
>>
>> Hmm, I was convinced that MFC could scale and we just lacked support
>> in the driver, but I checked the documentation and it doesn't seem to
>> be able to do so. I guess we could drop the COMPOSE rectangles for
>> now, until we find any hardware that can do scaling or composing on
>> the fly.
>>
> 
> On the other hand, having them defined already wouldn't complicate
> existing drivers too much either, because they would just handle all
> of them in the same switch case, i.e.
> 
> case V4L2_SEL_TGT_COMPOSE_BOUNDS:
> case V4L2_SEL_TGT_COMPOSE_DEFAULT:
> case V4L2_SEL_TGT_COMPOSE:
> case V4L2_SEL_TGT_COMPOSE_PADDED:
>      return visible_rectangle;
> 
> That would need one change, though. We would define
> V4L2_SEL_TGT_COMPOSE_DEFAULT to be equal to (0, 0)x(width of
> V4L2_SEL_TGT_CROP - 1, height of ``V4L2_SEL_TGT_CROP - 1), which

" - 1"? Where does that come from?

Usually rectangles are specified as widthxheight@left,top.

> makes more sense than current definition, since it would bypass any
> compose/scaling by default.

I have no problem with drivers optionally implementing these rectangles,
even if they don't do scaling or composition. The question is, should it
be required for decoders? If there is a good reason, then I'm OK with it.

Regards,

	Hans
Laurent Pinchart Oct. 17, 2018, 3:19 p.m. UTC | #10
Hi Tomasz,

Thank you for the patch.

On Tuesday, 24 July 2018 17:06:21 EEST Tomasz Figa wrote:
> Due to complexity of the video encoding process, the V4L2 drivers of
> stateful encoder hardware require specific sequences of V4L2 API calls
> to be followed. These include capability enumeration, initialization,
> encoding, encode parameters change, drain and reset.
> 
> Specifics of the above have been discussed during Media Workshops at
> LinuxCon Europe 2012 in Barcelona and then later Embedded Linux
> Conference Europe 2014 in Düsseldorf. The de facto Codec API that
> originated at those events was later implemented by the drivers we already
> have merged in mainline, such as s5p-mfc or coda.
> 
> The only thing missing was the real specification included as a part of
> Linux Media documentation. Fix it now and document the encoder part of
> the Codec API.
> 
> Signed-off-by: Tomasz Figa <tfiga@chromium.org>
> ---
>  Documentation/media/uapi/v4l/dev-encoder.rst | 550 +++++++++++++++++++
>  Documentation/media/uapi/v4l/devices.rst     |   1 +
>  Documentation/media/uapi/v4l/v4l2.rst        |   2 +
>  3 files changed, 553 insertions(+)
>  create mode 100644 Documentation/media/uapi/v4l/dev-encoder.rst
> 
> diff --git a/Documentation/media/uapi/v4l/dev-encoder.rst
> b/Documentation/media/uapi/v4l/dev-encoder.rst new file mode 100644
> index 000000000000..28be1698e99c
> --- /dev/null
> +++ b/Documentation/media/uapi/v4l/dev-encoder.rst
> @@ -0,0 +1,550 @@
> +.. -*- coding: utf-8; mode: rst -*-
> +
> +.. _encoder:
> +
> +****************************************
> +Memory-to-memory Video Encoder Interface
> +****************************************
> +
> +Input data to a video encoder are raw video frames in display order
> +to be encoded into the output bitstream. Output data are complete chunks of
> +valid bitstream, including all metadata, headers, etc. The resulting
> stream
> +must not need any further post-processing by the client.
> +
> +Performing software stream processing, header generation etc. in the driver
> +in order to support this interface is strongly discouraged. In case such
> +operations are needed, use of Stateless Video Encoder Interface (in

s/use of/use of the/

(and in various places below, as pointed out in the review of patch 1/2)

> +development) is strongly advised.
> +
> +Conventions and notation used in this document
> +==============================================

[snip]

> +Glossary
> +========

[snip]

Let's try to share these two sections between the two documents.

[snip]

> +Initialization
> +==============
> +
> +1. *[optional]* Enumerate supported formats and resolutions. See
> +   capability enumeration.
> +
> +2. Set a coded format on the ``CAPTURE`` queue via :c:func:`VIDIOC_S_FMT`
> +
> +   * **Required fields:**
> +
> +     ``type``
> +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
> +
> +     ``pixelformat``
> +         set to a coded format to be produced
> +
> +   * **Return fields:**
> +
> +     ``width``, ``height``
> +         coded resolution (based on currently active ``OUTPUT`` format)

Shouldn't userspace then set the resolution on the CAPTURE queue first ?

> +   .. note::
> +
> +      Changing ``CAPTURE`` format may change currently set ``OUTPUT``
> +      format. The driver will derive a new ``OUTPUT`` format from
> +      ``CAPTURE`` format being set, including resolution, colorimetry
> +      parameters, etc. If the client needs a specific ``OUTPUT`` format,
> +      it must adjust it afterwards.

Doesn't this contradict the "based on currently active ``OUTPUT`` format" 
above ?

> +3. *[optional]* Enumerate supported ``OUTPUT`` formats (raw formats for
> +   source) for the selected coded format via :c:func:`VIDIOC_ENUM_FMT`.
> +
> +   * **Required fields:**
> +
> +     ``type``
> +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> +
> +     ``index``
> +         follows standard semantics
> +
> +   * **Return fields:**
> +
> +     ``pixelformat``
> +         raw format supported for the coded format currently selected on
> +         the ``OUTPUT`` queue.
> +
> +4. The client may set the raw source format on the ``OUTPUT`` queue via
> +   :c:func:`VIDIOC_S_FMT`.
> +
> +   * **Required fields:**
> +
> +     ``type``
> +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> +
> +     ``pixelformat``
> +         raw format of the source
> +
> +     ``width``, ``height``
> +         source resolution
> +
> +     ``num_planes`` (for _MPLANE)
> +         set to number of planes for pixelformat
> +
> +     ``sizeimage``, ``bytesperline``
> +         follow standard semantics
> +
> +   * **Return fields:**
> +
> +     ``width``, ``height``
> +         may be adjusted by driver to match alignment requirements, as
> +         required by the currently selected formats
> +
> +     ``sizeimage``, ``bytesperline``
> +         follow standard semantics
> +
> +   * Setting the source resolution will reset visible resolution to the
> +     adjusted source resolution rounded up to the closest visible
> +     resolution supported by the driver. Similarly, coded resolution will
> +     be reset to source resolution rounded up to the closest coded
> +     resolution supported by the driver (typically a multiple of
> +     macroblock size).
> +
> +   .. note::
> +
> +      This step is not strictly required, since ``OUTPUT`` is expected to
> +      have a valid default format. However, the client needs to ensure that

s/needs to/must/

> +      ``OUTPUT`` format matches its expectations via either
> +      :c:func:`VIDIOC_S_FMT` or :c:func:`VIDIOC_G_FMT`, with the former
> +      being the typical scenario, since the default format is unlikely to
> +      be what the client needs.
> +
> +5. *[optional]* Set visible resolution for the stream metadata via
> +   :c:func:`VIDIOC_S_SELECTION` on the ``OUTPUT`` queue.
> +
> +   * **Required fields:**
> +
> +     ``type``
> +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> +
> +     ``target``
> +         set to ``V4L2_SEL_TGT_CROP``
> +
> +     ``r.left``, ``r.top``, ``r.width``, ``r.height``
> +         visible rectangle; this must fit within the framebuffer resolution
> +         and might be subject to adjustment to match codec and hardware
> +         constraints

Just for my information, are there use cases for r.left != 0 or r.top != 0 ?

> +   * **Return fields:**
> +
> +     ``r.left``, ``r.top``, ``r.width``, ``r.height``
> +         visible rectangle adjusted by the driver
> +
> +   * The driver must expose following selection targets on ``OUTPUT``:
> +
> +     ``V4L2_SEL_TGT_CROP_BOUNDS``
> +         maximum crop bounds within the source buffer supported by the
> +         encoder

Will this always match the format on the OUTPUT queue, or can it differ ?

> +     ``V4L2_SEL_TGT_CROP_DEFAULT``
> +         suggested cropping rectangle that covers the whole source picture

How can the driver know what to report here, apart from the same value as 
V4L2_SET_TGT_CROP_BOUNDS ?

> +     ``V4L2_SEL_TGT_CROP``
> +         rectangle within the source buffer to be encoded into the
> +         ``CAPTURE`` stream; defaults to ``V4L2_SEL_TGT_CROP_DEFAULT``
> +
> +     ``V4L2_SEL_TGT_COMPOSE_BOUNDS``
> +         maximum rectangle within the coded resolution, which the cropped
> +         source frame can be output into; always equal to (0, 0)x(width of
> +         ``V4L2_SEL_TGT_CROP``, height of ``V4L2_SEL_TGT_CROP``), if the
> +         hardware does not support compose/scaling
> +
> +     ``V4L2_SEL_TGT_COMPOSE_DEFAULT``
> +         equal to ``V4L2_SEL_TGT_CROP``
> +
> +     ``V4L2_SEL_TGT_COMPOSE``
> +         rectangle within the coded frame, which the cropped source frame
> +         is to be output into; defaults to
> +         ``V4L2_SEL_TGT_COMPOSE_DEFAULT``; read-only on hardware without
> +         additional compose/scaling capabilities; resulting stream will
> +         have this rectangle encoded as the visible rectangle in its
> +         metadata
> +
> +     ``V4L2_SEL_TGT_COMPOSE_PADDED``
> +         always equal to coded resolution of the stream, as selected by the
> +         encoder based on source resolution and crop/compose rectangles
> +
> +   .. note::
> +
> +      The driver may adjust the crop/compose rectangles to the nearest
> +      supported ones to meet codec and hardware requirements.
> +
> +6. Allocate buffers for both ``OUTPUT`` and ``CAPTURE`` via
> +   :c:func:`VIDIOC_REQBUFS`. This may be performed in any order.
> +
> +   * **Required fields:**
> +
> +     ``count``
> +         requested number of buffers to allocate; greater than zero
> +
> +     ``type``
> +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT`` or
> +         ``CAPTURE``
> +
> +     ``memory``
> +         follows standard semantics
> +
> +   * **Return fields:**
> +
> +     ``count``
> +         adjusted to allocated number of buffers
> +
> +   * The driver must adjust count to minimum of required number of
> +     buffers for given format and count passed.

s/minimum/maximum/ ?

> The client must
> +     check this value after the ioctl returns to get the number of
> +     buffers actually allocated.
> +
> +   .. note::
> +
> +      To allocate more than minimum number of buffers (for pipeline
> +      depth), use G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``) or
> +      G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``), respectively,
> +      to get the minimum number of buffers required by the
> +      driver/format, and pass the obtained value plus the number of
> +      additional buffers needed in count field to :c:func:`VIDIOC_REQBUFS`.
> +
> +7. Begin streaming on both ``OUTPUT`` and ``CAPTURE`` queues via
> +   :c:func:`VIDIOC_STREAMON`. This may be performed in any order. Actual
> +   encoding process starts when both queues start streaming.
> +
> +.. note::
> +
> +   If the client stops ``CAPTURE`` during the encode process and then
> +   restarts it again, the encoder will be expected to generate a stream
> +   independent from the stream generated before the stop. Depending on the
> +   coded format, that may imply that:
> +
> +   * encoded frames produced after the restart must not reference any
> +     frames produced before the stop, e.g. no long term references for
> +     H264,
> +
> +   * any headers that must be included in a standalone stream must be
> +     produced again, e.g. SPS and PPS for H264.

s/H264/H.264/

(and in other places too)

> +Encoding
> +========
> +
> +This state is reached after a successful initialization sequence. In
> +this state, client queues and dequeues buffers to both queues via
> +:c:func:`VIDIOC_QBUF` and :c:func:`VIDIOC_DQBUF`, following standard
> +semantics.
> +
> +Both queues operate independently, following standard behavior of V4L2
> +buffer queues and memory-to-memory devices. In addition, the order of
> +encoded frames dequeued from ``CAPTURE`` queue may differ from the order of
> +queuing raw frames to ``OUTPUT`` queue, due to properties of selected
> coded
> +format, e.g. frame reordering. The client must not assume any direct
> +relationship between ``CAPTURE`` and ``OUTPUT`` buffers, other than
> +reported by :c:type:`v4l2_buffer` ``timestamp``.
> +
> +Encoding parameter changes
> +==========================
> +
> +The client is allowed to use :c:func:`VIDIOC_S_CTRL` to change encoder
> +parameters at any time. The availability of parameters is driver-specific
> +and the client must query the driver to find the set of available controls.
> +
> +The ability to change each parameter during encoding of is driver-specific,
> +as per standard semantics of the V4L2 control interface. The client may
> +attempt setting a control of its interest during encoding and if it the
> +operation fails with the -EBUSY error code, ``CAPTURE`` queue needs to be
> +stopped for the configuration change to be allowed (following the drain
> +sequence will be  needed to avoid losing already queued/encoded frames).
> +
> +The timing of parameter update is driver-specific, as per standard
> +semantics of the V4L2 control interface. If the client needs to apply the
> +parameters exactly at specific frame and the encoder supports it, using
> +Request API should be considered.
> +
> +Drain
> +=====
> +
> +To ensure that all queued ``OUTPUT`` buffers have been processed and
> +related ``CAPTURE`` buffers output to the client, the following drain
> +sequence may be followed. After the drain sequence is complete, the client
> +has received all encoded frames for all ``OUTPUT`` buffers queued before
> +the sequence was started.
> +
> +1. Begin drain by issuing :c:func:`VIDIOC_ENCODER_CMD`.
> +
> +   * **Required fields:**
> +
> +     ``cmd``
> +         set to ``V4L2_ENC_CMD_STOP``
> +
> +     ``flags``
> +         set to 0
> +
> +     ``pts``
> +         set to 0
> +
> +2. The driver must process and encode as normal all ``OUTPUT`` buffers
> +   queued by the client before the :c:func:`VIDIOC_ENCODER_CMD` was issued.
> +
> +3. Once all ``OUTPUT`` buffers queued before ``V4L2_ENC_CMD_STOP`` are
> +   processed:
> +
> +   * Once all decoded frames (if any) are ready to be dequeued on the
> +     ``CAPTURE`` queue

I understand this condition to be equivalent to the main step 3 condition. I 
would thus write it as

"At this point all decoded frames (if any) are ready to be dequeued on the 
``CAPTURE`` queue. The driver must send a ``V4L2_EVENT_EOS``."

> the driver must send a ``V4L2_EVENT_EOS``. The
> +     driver must also set ``V4L2_BUF_FLAG_LAST`` in :c:type:`v4l2_buffer`
> +     ``flags`` field on the buffer on the ``CAPTURE`` queue containing the
> +     last frame (if any) produced as a result of processing the ``OUTPUT``
> +     buffers queued before

Unneeded line break ?

> +     ``V4L2_ENC_CMD_STOP``.
> +
> +   * If no more frames are left to be returned at the point of handling
> +     ``V4L2_ENC_CMD_STOP``, the driver must return an empty buffer (with
> +     :c:type:`v4l2_buffer` ``bytesused`` = 0) as the last buffer with
> +     ``V4L2_BUF_FLAG_LAST`` set.
> +
> +   * Any attempts to dequeue more buffers beyond the buffer marked with
> +     ``V4L2_BUF_FLAG_LAST`` will result in a -EPIPE error code returned by
> +     :c:func:`VIDIOC_DQBUF`.
> +
> +4. At this point, encoding is paused and the driver will accept, but not
> +   process any newly queued ``OUTPUT`` buffers until the client issues
> +   ``V4L2_ENC_CMD_START`` or restarts streaming on any queue.
> +
> +* Once the drain sequence is initiated, the client needs to drive it to
> +  completion, as described by the above steps, unless it aborts the process
> +  by issuing :c:func:`VIDIOC_STREAMOFF` on ``CAPTURE`` queue.  The client
> +  is not allowed to issue ``V4L2_ENC_CMD_START`` or ``V4L2_ENC_CMD_STOP``
> +  again while the drain sequence is in progress and they will fail with
> +  -EBUSY error code if attempted.
> +
> +* Restarting streaming on ``CAPTURE`` queue will implicitly end the paused
> +  state and make the encoder continue encoding, as long as other encoding
> +  conditions are met. Restarting ``OUTPUT`` queue will not affect an
> +  in-progress drain sequence.

The last sentence seems to contradict the "on any queue" part of step 4. What 
happens if the client restarts streaming on the OUTPUT queue while a drain 
sequence is in progress ?

> +* The drivers must also implement :c:func:`VIDIOC_TRY_ENCODER_CMD`, as a
> +  way to let the client query the availability of encoder commands.
> +
> +Reset
> +=====
> +
> +The client may want to request the encoder to reinitialize the encoding,
> +so that the stream produced becomes independent from the stream generated
> +before. Depending on the coded format, that may imply that:
> +
> +* encoded frames produced after the restart must not reference any frames
> +  produced before the stop, e.g. no long term references for H264,
> +
> +* any headers that must be included in a standalone stream must be produced
> +  again, e.g. SPS and PPS for H264.
> +
> +This can be achieved by performing the reset sequence.
> +
> +1. *[optional]* If the client is interested in encoded frames resulting
> +   from already queued source frames, it needs to perform the Drain
> +   sequence. Otherwise, the reset sequence would cause the already
> +   encoded and not dequeued encoded frames to be lost.
> +
> +2. Stop streaming on ``CAPTURE`` queue via :c:func:`VIDIOC_STREAMOFF`. This
> +   will return all currently queued ``CAPTURE`` buffers to the client,
> +   without valid frame data.
> +
> +3. *[optional]* Restart streaming on ``OUTPUT`` queue via
> +   :c:func:`VIDIOC_STREAMOFF` followed by :c:func:`VIDIOC_STREAMON` to
> +   drop any source frames enqueued to the encoder before the reset
> +   sequence. This is useful if the client requires the new stream to begin
> +   at specific source frame. Otherwise, the new stream might include
> +   frames encoded from source frames queued before the reset sequence.
> +
> +4. Restart streaming on ``CAPTURE`` queue via :c:func:`VIDIOC_STREAMON` and
> +   continue with regular encoding sequence. The encoded frames produced
> +   into ``CAPTURE`` buffers from now on will contain a standalone stream
> +   that can be decoded without the need for frames encoded before the reset
> +   sequence.
> +
> +Commit points
> +=============
> +
> +Setting formats and allocating buffers triggers changes in the behavior
> +of the driver.
> +
> +1. Setting format on ``CAPTURE`` queue may change the set of formats
> +   supported/advertised on the ``OUTPUT`` queue. In particular, it also
> +   means that ``OUTPUT`` format may be reset and the client must not
> +   rely on the previously set format being preserved.
> +
> +2. Enumerating formats on ``OUTPUT`` queue must only return formats
> +   supported for the ``CAPTURE`` format currently set.
> +
> +3. Setting/changing format on ``OUTPUT`` queue does not change formats

Just "Setting" ?

> +   available on ``CAPTURE`` queue. An attempt to set ``OUTPUT`` format that
> +   is not supported for the currently selected ``CAPTURE`` format must
> +   result in the driver adjusting the requested format to an acceptable
> +   one.
> +
> +4. Enumerating formats on ``CAPTURE`` queue always returns the full set of
> +   supported coded formats, irrespective of the current ``OUTPUT``
> +   format.
> +
> +5. After allocating buffers on the ``CAPTURE`` queue, it is not possible to
> +   change format on it.
> +
> +To summarize, setting formats and allocation must always start with the
> +``CAPTURE`` queue and the ``CAPTURE`` queue is the master that governs the
> +set of supported formats for the ``OUTPUT`` queue.

[snip]
Tomasz Figa Oct. 22, 2018, 4:50 a.m. UTC | #11
On Tue, Oct 16, 2018 at 10:50 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
>
> On 10/16/18 09:36, Tomasz Figa wrote:
> > On Tue, Aug 7, 2018 at 3:54 PM Tomasz Figa <tfiga@chromium.org> wrote:
> >>>> +   * The driver must expose following selection targets on ``OUTPUT``:
> >>>> +
> >>>> +     ``V4L2_SEL_TGT_CROP_BOUNDS``
> >>>> +         maximum crop bounds within the source buffer supported by the
> >>>> +         encoder
> >>>> +
> >>>> +     ``V4L2_SEL_TGT_CROP_DEFAULT``
> >>>> +         suggested cropping rectangle that covers the whole source picture
> >>>> +
> >>>> +     ``V4L2_SEL_TGT_CROP``
> >>>> +         rectangle within the source buffer to be encoded into the
> >>>> +         ``CAPTURE`` stream; defaults to ``V4L2_SEL_TGT_CROP_DEFAULT``
> >>>> +
> >>>> +     ``V4L2_SEL_TGT_COMPOSE_BOUNDS``
> >>>> +         maximum rectangle within the coded resolution, which the cropped
> >>>> +         source frame can be output into; always equal to (0, 0)x(width of
> >>>> +         ``V4L2_SEL_TGT_CROP``, height of ``V4L2_SEL_TGT_CROP``), if the
> >>>> +         hardware does not support compose/scaling
>
> Re-reading this I would rewrite this a bit:
>
> if the hardware does not support composition or scaling, then this is always
> equal to (0, 0)x(width of ``V4L2_SEL_TGT_CROP``, height of ``V4L2_SEL_TGT_CROP``).
>

Ack.

> >>>> +
> >>>> +     ``V4L2_SEL_TGT_COMPOSE_DEFAULT``
> >>>> +         equal to ``V4L2_SEL_TGT_CROP``
> >>>> +
> >>>> +     ``V4L2_SEL_TGT_COMPOSE``
> >>>> +         rectangle within the coded frame, which the cropped source frame
> >>>> +         is to be output into; defaults to
> >>>> +         ``V4L2_SEL_TGT_COMPOSE_DEFAULT``; read-only on hardware without
> >>>> +         additional compose/scaling capabilities; resulting stream will
> >>>> +         have this rectangle encoded as the visible rectangle in its
> >>>> +         metadata
> >>>> +
> >>>> +     ``V4L2_SEL_TGT_COMPOSE_PADDED``
> >>>> +         always equal to coded resolution of the stream, as selected by the
> >>>> +         encoder based on source resolution and crop/compose rectangles
> >>>
> >>> Are there codec drivers that support composition? I can't remember seeing any.
> >>>
> >>
> >> Hmm, I was convinced that MFC could scale and we just lacked support
> >> in the driver, but I checked the documentation and it doesn't seem to
> >> be able to do so. I guess we could drop the COMPOSE rectangles for
> >> now, until we find any hardware that can do scaling or composing on
> >> the fly.
> >>
> >
> > On the other hand, having them defined already wouldn't complicate
> > existing drivers too much either, because they would just handle all
> > of them in the same switch case, i.e.
> >
> > case V4L2_SEL_TGT_COMPOSE_BOUNDS:
> > case V4L2_SEL_TGT_COMPOSE_DEFAULT:
> > case V4L2_SEL_TGT_COMPOSE:
> > case V4L2_SEL_TGT_COMPOSE_PADDED:
> >      return visible_rectangle;
> >
> > That would need one change, though. We would define
> > V4L2_SEL_TGT_COMPOSE_DEFAULT to be equal to (0, 0)x(width of
> > V4L2_SEL_TGT_CROP - 1, height of ``V4L2_SEL_TGT_CROP - 1), which
>
> " - 1"? Where does that come from?
>
> Usually rectangles are specified as widthxheight@left,top.
>

Yeah, the notation I used was quite unfortunate. How about just making
it fully verbose?

         if the hardware does not support
         composition or scaling, then this is always equal to the rectangle of
         width and height matching ``V4L2_SEL_TGT_CROP`` and located at (0, 0)

> > makes more sense than current definition, since it would bypass any
> > compose/scaling by default.
>
> I have no problem with drivers optionally implementing these rectangles,
> even if they don't do scaling or composition. The question is, should it
> be required for decoders? If there is a good reason, then I'm OK with it.

There is no particular reason to do it for existing drivers. I'm
personally not a big fan of making things optional, since you never
know when something becomes required and then you run into problems
with user space compatibility. In this case the cost of having those
rectangles defined is really low and they will be useful to handle
encoders and decoders with scaling/compose ability in the future.

I have no strong opinion though.

Best regards,
Tomasz
Tomasz Figa Oct. 22, 2018, 6:12 a.m. UTC | #12
On Thu, Oct 18, 2018 at 12:19 AM Laurent Pinchart
<laurent.pinchart@ideasonboard.com> wrote:
>
> Hi Tomasz,
>
> Thank you for the patch.
>

Thanks for the review. I'll snip out the comments that I've already addressed.

> On Tuesday, 24 July 2018 17:06:21 EEST Tomasz Figa wrote:
[snip]
> > +Glossary
> > +========
>
> [snip]
>
> Let's try to share these two sections between the two documents.
>

Do you have any idea on how to include common contents in multiple
documentation pages (here encoder and decoder)?

> [snip]
>
> > +Initialization
> > +==============
> > +
> > +1. *[optional]* Enumerate supported formats and resolutions. See
> > +   capability enumeration.
> > +
> > +2. Set a coded format on the ``CAPTURE`` queue via :c:func:`VIDIOC_S_FMT`
> > +
> > +   * **Required fields:**
> > +
> > +     ``type``
> > +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
> > +
> > +     ``pixelformat``
> > +         set to a coded format to be produced
> > +
> > +   * **Return fields:**
> > +
> > +     ``width``, ``height``
> > +         coded resolution (based on currently active ``OUTPUT`` format)
>
> Shouldn't userspace then set the resolution on the CAPTURE queue first ?
>

I don't think so. The resolution on the CAPTURE queue is the internal
coded size of the stream. It depends on the resolution of the OUTPUT
queue, selection rectangles and codec and hardware details.

Actually, I'm thinking whether we actually need to report it to the
userspace. I kept it this way to be consistent with decoders, but I
can't find any use case for it and the CAPTURE format could just
always have width and height set to 0, since it's just a compressed
bitstream.

> > +   .. note::
> > +
> > +      Changing ``CAPTURE`` format may change currently set ``OUTPUT``
> > +      format. The driver will derive a new ``OUTPUT`` format from
> > +      ``CAPTURE`` format being set, including resolution, colorimetry
> > +      parameters, etc. If the client needs a specific ``OUTPUT`` format,
> > +      it must adjust it afterwards.
>
> Doesn't this contradict the "based on currently active ``OUTPUT`` format"
> above ?
>

It might be worded a bit unfortunately indeed, but generally the
userspace doesn't set the width and height, so these values don't
affect the OUTPUT format.

> > +3. *[optional]* Enumerate supported ``OUTPUT`` formats (raw formats for
> > +   source) for the selected coded format via :c:func:`VIDIOC_ENUM_FMT`.
> > +
> > +   * **Required fields:**
> > +
> > +     ``type``
> > +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> > +
> > +     ``index``
> > +         follows standard semantics
> > +
> > +   * **Return fields:**
> > +
> > +     ``pixelformat``
> > +         raw format supported for the coded format currently selected on
> > +         the ``OUTPUT`` queue.
> > +
> > +4. The client may set the raw source format on the ``OUTPUT`` queue via
> > +   :c:func:`VIDIOC_S_FMT`.
> > +
> > +   * **Required fields:**
> > +
> > +     ``type``
> > +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> > +
> > +     ``pixelformat``
> > +         raw format of the source
> > +
> > +     ``width``, ``height``
> > +         source resolution
> > +
> > +     ``num_planes`` (for _MPLANE)
> > +         set to number of planes for pixelformat
> > +
> > +     ``sizeimage``, ``bytesperline``
> > +         follow standard semantics
> > +
> > +   * **Return fields:**
> > +
> > +     ``width``, ``height``
> > +         may be adjusted by driver to match alignment requirements, as
> > +         required by the currently selected formats
> > +
> > +     ``sizeimage``, ``bytesperline``
> > +         follow standard semantics
> > +
> > +   * Setting the source resolution will reset visible resolution to the
> > +     adjusted source resolution rounded up to the closest visible
> > +     resolution supported by the driver. Similarly, coded resolution will
> > +     be reset to source resolution rounded up to the closest coded
> > +     resolution supported by the driver (typically a multiple of
> > +     macroblock size).
> > +
> > +   .. note::
> > +
> > +      This step is not strictly required, since ``OUTPUT`` is expected to
> > +      have a valid default format. However, the client needs to ensure that
>
> s/needs to/must/

I've removed this note.

>
> > +      ``OUTPUT`` format matches its expectations via either
> > +      :c:func:`VIDIOC_S_FMT` or :c:func:`VIDIOC_G_FMT`, with the former
> > +      being the typical scenario, since the default format is unlikely to
> > +      be what the client needs.
> > +
> > +5. *[optional]* Set visible resolution for the stream metadata via
> > +   :c:func:`VIDIOC_S_SELECTION` on the ``OUTPUT`` queue.
> > +
> > +   * **Required fields:**
> > +
> > +     ``type``
> > +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> > +
> > +     ``target``
> > +         set to ``V4L2_SEL_TGT_CROP``
> > +
> > +     ``r.left``, ``r.top``, ``r.width``, ``r.height``
> > +         visible rectangle; this must fit within the framebuffer resolution
> > +         and might be subject to adjustment to match codec and hardware
> > +         constraints
>
> Just for my information, are there use cases for r.left != 0 or r.top != 0 ?
>

How about screen capture, where you select the part of the screen to
encode, or arbitrary cropping of camera frames?

> > +   * **Return fields:**
> > +
> > +     ``r.left``, ``r.top``, ``r.width``, ``r.height``
> > +         visible rectangle adjusted by the driver
> > +
> > +   * The driver must expose following selection targets on ``OUTPUT``:
> > +
> > +     ``V4L2_SEL_TGT_CROP_BOUNDS``
> > +         maximum crop bounds within the source buffer supported by the
> > +         encoder
>
> Will this always match the format on the OUTPUT queue, or can it differ ?

Yes. I've reworded it as follows:

     ``V4L2_SEL_TGT_CROP_BOUNDS``
         equal to the full source frame, matching the active ``OUTPUT``
         format

>
> > +     ``V4L2_SEL_TGT_CROP_DEFAULT``
> > +         suggested cropping rectangle that covers the whole source picture
>
> How can the driver know what to report here, apart from the same value as
> V4L2_SET_TGT_CROP_BOUNDS ?
>

I've made them equal in v2 indeed.

[snip]
> > +Drain
> > +=====
> > +
> > +To ensure that all queued ``OUTPUT`` buffers have been processed and
> > +related ``CAPTURE`` buffers output to the client, the following drain
> > +sequence may be followed. After the drain sequence is complete, the client
> > +has received all encoded frames for all ``OUTPUT`` buffers queued before
> > +the sequence was started.
> > +
> > +1. Begin drain by issuing :c:func:`VIDIOC_ENCODER_CMD`.
> > +
> > +   * **Required fields:**
> > +
> > +     ``cmd``
> > +         set to ``V4L2_ENC_CMD_STOP``
> > +
> > +     ``flags``
> > +         set to 0
> > +
> > +     ``pts``
> > +         set to 0
> > +
> > +2. The driver must process and encode as normal all ``OUTPUT`` buffers
> > +   queued by the client before the :c:func:`VIDIOC_ENCODER_CMD` was issued.
> > +
> > +3. Once all ``OUTPUT`` buffers queued before ``V4L2_ENC_CMD_STOP`` are
> > +   processed:
> > +
> > +   * Once all decoded frames (if any) are ready to be dequeued on the
> > +     ``CAPTURE`` queue
>
> I understand this condition to be equivalent to the main step 3 condition. I
> would thus write it as
>
> "At this point all decoded frames (if any) are ready to be dequeued on the
> ``CAPTURE`` queue. The driver must send a ``V4L2_EVENT_EOS``."
>
> > the driver must send a ``V4L2_EVENT_EOS``. The
> > +     driver must also set ``V4L2_BUF_FLAG_LAST`` in :c:type:`v4l2_buffer`
> > +     ``flags`` field on the buffer on the ``CAPTURE`` queue containing the
> > +     last frame (if any) produced as a result of processing the ``OUTPUT``
> > +     buffers queued before
>
> Unneeded line break ?
>

The whole sequence has been completely rewritten in v2, hopefully
addressing your comments. Please take a look when I post the new
revision.

> > +     ``V4L2_ENC_CMD_STOP``.
> > +
> > +   * If no more frames are left to be returned at the point of handling
> > +     ``V4L2_ENC_CMD_STOP``, the driver must return an empty buffer (with
> > +     :c:type:`v4l2_buffer` ``bytesused`` = 0) as the last buffer with
> > +     ``V4L2_BUF_FLAG_LAST`` set.
> > +
> > +   * Any attempts to dequeue more buffers beyond the buffer marked with
> > +     ``V4L2_BUF_FLAG_LAST`` will result in a -EPIPE error code returned by
> > +     :c:func:`VIDIOC_DQBUF`.
> > +
> > +4. At this point, encoding is paused and the driver will accept, but not
> > +   process any newly queued ``OUTPUT`` buffers until the client issues
> > +   ``V4L2_ENC_CMD_START`` or restarts streaming on any queue.
> > +
> > +* Once the drain sequence is initiated, the client needs to drive it to
> > +  completion, as described by the above steps, unless it aborts the process
> > +  by issuing :c:func:`VIDIOC_STREAMOFF` on ``CAPTURE`` queue.  The client
> > +  is not allowed to issue ``V4L2_ENC_CMD_START`` or ``V4L2_ENC_CMD_STOP``
> > +  again while the drain sequence is in progress and they will fail with
> > +  -EBUSY error code if attempted.
> > +
> > +* Restarting streaming on ``CAPTURE`` queue will implicitly end the paused
> > +  state and make the encoder continue encoding, as long as other encoding
> > +  conditions are met. Restarting ``OUTPUT`` queue will not affect an
> > +  in-progress drain sequence.
>
> The last sentence seems to contradict the "on any queue" part of step 4. What
> happens if the client restarts streaming on the OUTPUT queue while a drain
> sequence is in progress ?
>

v2 states that the sequence is aborted in case of any of the queues is stopped.

Best regards,
Tomasz
diff mbox series

Patch

diff --git a/Documentation/media/uapi/v4l/dev-encoder.rst b/Documentation/media/uapi/v4l/dev-encoder.rst
new file mode 100644
index 000000000000..28be1698e99c
--- /dev/null
+++ b/Documentation/media/uapi/v4l/dev-encoder.rst
@@ -0,0 +1,550 @@ 
+.. -*- coding: utf-8; mode: rst -*-
+
+.. _encoder:
+
+****************************************
+Memory-to-memory Video Encoder Interface
+****************************************
+
+Input data to a video encoder are raw video frames in display order
+to be encoded into the output bitstream. Output data are complete chunks of
+valid bitstream, including all metadata, headers, etc. The resulting stream
+must not need any further post-processing by the client.
+
+Performing software stream processing, header generation etc. in the driver
+in order to support this interface is strongly discouraged. In case such
+operations are needed, use of Stateless Video Encoder Interface (in
+development) is strongly advised.
+
+Conventions and notation used in this document
+==============================================
+
+1. The general V4L2 API rules apply if not specified in this document
+   otherwise.
+
+2. The meaning of words “must”, “may”, “should”, etc. is as per RFC
+   2119.
+
+3. All steps not marked “optional” are required.
+
+4. :c:func:`VIDIOC_G_EXT_CTRLS`, :c:func:`VIDIOC_S_EXT_CTRLS` may be used
+   interchangeably with :c:func:`VIDIOC_G_CTRL`, :c:func:`VIDIOC_S_CTRL`,
+   unless specified otherwise.
+
+5. Single-plane API (see spec) and applicable structures may be used
+   interchangeably with Multi-plane API, unless specified otherwise,
+   depending on driver capabilities and following the general V4L2
+   guidelines.
+
+6. i = [a..b]: sequence of integers from a to b, inclusive, i.e. i =
+   [0..2]: i = 0, 1, 2.
+
+7. For ``OUTPUT`` buffer A, A’ represents a buffer on the ``CAPTURE`` queue
+   containing data (encoded frame/stream) that resulted from processing
+   buffer A.
+
+Glossary
+========
+
+CAPTURE
+   the destination buffer queue; the queue of buffers containing encoded
+   bitstream; ``V4L2_BUF_TYPE_VIDEO_CAPTURE```` or
+   ``V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE``; data are captured from the
+   hardware into ``CAPTURE`` buffers
+
+client
+   application client communicating with the driver implementing this API
+
+coded format
+   encoded/compressed video bitstream format (e.g. H.264, VP8, etc.);
+   see also: raw format
+
+coded height
+   height for given coded resolution
+
+coded resolution
+   stream resolution in pixels aligned to codec and hardware requirements;
+   typically visible resolution rounded up to full macroblocks; see also:
+   visible resolution
+
+coded width
+   width for given coded resolution
+
+decode order
+   the order in which frames are decoded; may differ from display order if
+   coded format includes a feature of frame reordering; ``CAPTURE`` buffers
+   must be returned by the driver in decode order
+
+display order
+   the order in which frames must be displayed; ``OUTPUT`` buffers must be
+   queued by the client in display order
+
+IDR
+   a type of a keyframe in H.264-encoded stream, which clears the list of
+   earlier reference frames (DPBs)
+
+keyframe
+   an encoded frame that does not reference frames decoded earlier, i.e.
+   can be decoded fully on its own.
+
+macroblock
+   a processing unit in image and video compression formats based on linear
+   block transforms (e.g. H264, VP8, VP9); codec-specific, but for most of
+   popular codecs the size is 16x16 samples (pixels)
+
+OUTPUT
+   the source buffer queue; the queue of buffers containing raw frames;
+   ``V4L2_BUF_TYPE_VIDEO_OUTPUT`` or
+   ``V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE``; the hardware is fed with data
+   from ``OUTPUT`` buffers
+
+PPS
+   Picture Parameter Set; a type of metadata entity in H.264 bitstream
+
+raw format
+   uncompressed format containing raw pixel data (e.g. YUV, RGB formats)
+
+resume point
+   a point in the bitstream from which decoding may start/continue, without
+   any previous state/data present, e.g.: a keyframe (VP8/VP9) or
+   SPS/PPS/IDR sequence (H.264); a resume point is required to start decode
+   of a new stream, or to resume decoding after a seek
+
+source
+   data fed to the encoder; ``OUTPUT``
+
+source height
+   height in pixels for given source resolution
+
+source resolution
+   resolution in pixels of source frames being source to the encoder and
+   subject to further cropping to the bounds of visible resolution
+
+source width
+   width in pixels for given source resolution
+
+SPS
+   Sequence Parameter Set; a type of metadata entity in H.264 bitstream
+
+stream metadata
+   additional (non-visual) information contained inside encoded bitstream;
+   for example: coded resolution, visible resolution, codec profile
+
+visible height
+   height for given visible resolution; display height
+
+visible resolution
+   stream resolution of the visible picture, in pixels, to be used for
+   display purposes; must be smaller or equal to coded resolution;
+   display resolution
+
+visible width
+   width for given visible resolution; display width
+
+Querying capabilities
+=====================
+
+1. To enumerate the set of coded formats supported by the driver, the
+   client may call :c:func:`VIDIOC_ENUM_FMT` on ``CAPTURE``.
+
+   * The driver must always return the full set of supported formats,
+     irrespective of the format set on the ``OUTPUT`` queue.
+
+2. To enumerate the set of supported raw formats, the client may call
+   :c:func:`VIDIOC_ENUM_FMT` on ``OUTPUT``.
+
+   * The driver must return only the formats supported for the format
+     currently active on ``CAPTURE``.
+
+   * In order to enumerate raw formats supported by a given coded format,
+     the client must first set that coded format on ``CAPTURE`` and then
+     enumerate the ``OUTPUT`` queue.
+
+3. The client may use :c:func:`VIDIOC_ENUM_FRAMESIZES` to detect supported
+   resolutions for a given format, passing desired pixel format in
+   :c:type:`v4l2_frmsizeenum` ``pixel_format``.
+
+   * Values returned by :c:func:`VIDIOC_ENUM_FRAMESIZES` on ``CAPTURE``
+     must include all possible coded resolutions supported by the encoder
+     for given coded pixel format.
+
+   * Values returned by :c:func:`VIDIOC_ENUM_FRAMESIZES` on ``OUTPUT``
+     queue must include all possible frame buffer resolutions supported
+     by the encoder for given raw pixel format and coded format currently
+     set on ``CAPTURE``.
+
+4. Supported profiles and levels for given format, if applicable, may be
+   queried using their respective controls via :c:func:`VIDIOC_QUERYCTRL`.
+
+5. Any additional encoder capabilities may be discovered by querying
+   their respective controls.
+
+Initialization
+==============
+
+1. *[optional]* Enumerate supported formats and resolutions. See
+   capability enumeration.
+
+2. Set a coded format on the ``CAPTURE`` queue via :c:func:`VIDIOC_S_FMT`
+
+   * **Required fields:**
+
+     ``type``
+         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
+
+     ``pixelformat``
+         set to a coded format to be produced
+
+   * **Return fields:**
+
+     ``width``, ``height``
+         coded resolution (based on currently active ``OUTPUT`` format)
+
+   .. note::
+
+      Changing ``CAPTURE`` format may change currently set ``OUTPUT``
+      format. The driver will derive a new ``OUTPUT`` format from
+      ``CAPTURE`` format being set, including resolution, colorimetry
+      parameters, etc. If the client needs a specific ``OUTPUT`` format,
+      it must adjust it afterwards.
+
+3. *[optional]* Enumerate supported ``OUTPUT`` formats (raw formats for
+   source) for the selected coded format via :c:func:`VIDIOC_ENUM_FMT`.
+
+   * **Required fields:**
+
+     ``type``
+         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
+
+     ``index``
+         follows standard semantics
+
+   * **Return fields:**
+
+     ``pixelformat``
+         raw format supported for the coded format currently selected on
+         the ``OUTPUT`` queue.
+
+4. The client may set the raw source format on the ``OUTPUT`` queue via
+   :c:func:`VIDIOC_S_FMT`.
+
+   * **Required fields:**
+
+     ``type``
+         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
+
+     ``pixelformat``
+         raw format of the source
+
+     ``width``, ``height``
+         source resolution
+
+     ``num_planes`` (for _MPLANE)
+         set to number of planes for pixelformat
+
+     ``sizeimage``, ``bytesperline``
+         follow standard semantics
+
+   * **Return fields:**
+
+     ``width``, ``height``
+         may be adjusted by driver to match alignment requirements, as
+         required by the currently selected formats
+
+     ``sizeimage``, ``bytesperline``
+         follow standard semantics
+
+   * Setting the source resolution will reset visible resolution to the
+     adjusted source resolution rounded up to the closest visible
+     resolution supported by the driver. Similarly, coded resolution will
+     be reset to source resolution rounded up to the closest coded
+     resolution supported by the driver (typically a multiple of
+     macroblock size).
+
+   .. note::
+
+      This step is not strictly required, since ``OUTPUT`` is expected to
+      have a valid default format. However, the client needs to ensure that
+      ``OUTPUT`` format matches its expectations via either
+      :c:func:`VIDIOC_S_FMT` or :c:func:`VIDIOC_G_FMT`, with the former
+      being the typical scenario, since the default format is unlikely to
+      be what the client needs.
+
+5. *[optional]* Set visible resolution for the stream metadata via
+   :c:func:`VIDIOC_S_SELECTION` on the ``OUTPUT`` queue.
+
+   * **Required fields:**
+
+     ``type``
+         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
+
+     ``target``
+         set to ``V4L2_SEL_TGT_CROP``
+
+     ``r.left``, ``r.top``, ``r.width``, ``r.height``
+         visible rectangle; this must fit within the framebuffer resolution
+         and might be subject to adjustment to match codec and hardware
+         constraints
+
+   * **Return fields:**
+
+     ``r.left``, ``r.top``, ``r.width``, ``r.height``
+         visible rectangle adjusted by the driver
+
+   * The driver must expose following selection targets on ``OUTPUT``:
+
+     ``V4L2_SEL_TGT_CROP_BOUNDS``
+         maximum crop bounds within the source buffer supported by the
+         encoder
+
+     ``V4L2_SEL_TGT_CROP_DEFAULT``
+         suggested cropping rectangle that covers the whole source picture
+
+     ``V4L2_SEL_TGT_CROP``
+         rectangle within the source buffer to be encoded into the
+         ``CAPTURE`` stream; defaults to ``V4L2_SEL_TGT_CROP_DEFAULT``
+
+     ``V4L2_SEL_TGT_COMPOSE_BOUNDS``
+         maximum rectangle within the coded resolution, which the cropped
+         source frame can be output into; always equal to (0, 0)x(width of
+         ``V4L2_SEL_TGT_CROP``, height of ``V4L2_SEL_TGT_CROP``), if the
+         hardware does not support compose/scaling
+
+     ``V4L2_SEL_TGT_COMPOSE_DEFAULT``
+         equal to ``V4L2_SEL_TGT_CROP``
+
+     ``V4L2_SEL_TGT_COMPOSE``
+         rectangle within the coded frame, which the cropped source frame
+         is to be output into; defaults to
+         ``V4L2_SEL_TGT_COMPOSE_DEFAULT``; read-only on hardware without
+         additional compose/scaling capabilities; resulting stream will
+         have this rectangle encoded as the visible rectangle in its
+         metadata
+
+     ``V4L2_SEL_TGT_COMPOSE_PADDED``
+         always equal to coded resolution of the stream, as selected by the
+         encoder based on source resolution and crop/compose rectangles
+
+   .. note::
+
+      The driver may adjust the crop/compose rectangles to the nearest
+      supported ones to meet codec and hardware requirements.
+
+6. Allocate buffers for both ``OUTPUT`` and ``CAPTURE`` via
+   :c:func:`VIDIOC_REQBUFS`. This may be performed in any order.
+
+   * **Required fields:**
+
+     ``count``
+         requested number of buffers to allocate; greater than zero
+
+     ``type``
+         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT`` or
+         ``CAPTURE``
+
+     ``memory``
+         follows standard semantics
+
+   * **Return fields:**
+
+     ``count``
+         adjusted to allocated number of buffers
+
+   * The driver must adjust count to minimum of required number of
+     buffers for given format and count passed. The client must
+     check this value after the ioctl returns to get the number of
+     buffers actually allocated.
+
+   .. note::
+
+      To allocate more than minimum number of buffers (for pipeline
+      depth), use G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``) or
+      G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``), respectively,
+      to get the minimum number of buffers required by the
+      driver/format, and pass the obtained value plus the number of
+      additional buffers needed in count field to :c:func:`VIDIOC_REQBUFS`.
+
+7. Begin streaming on both ``OUTPUT`` and ``CAPTURE`` queues via
+   :c:func:`VIDIOC_STREAMON`. This may be performed in any order. Actual
+   encoding process starts when both queues start streaming.
+
+.. note::
+
+   If the client stops ``CAPTURE`` during the encode process and then
+   restarts it again, the encoder will be expected to generate a stream
+   independent from the stream generated before the stop. Depending on the
+   coded format, that may imply that:
+
+   * encoded frames produced after the restart must not reference any
+     frames produced before the stop, e.g. no long term references for
+     H264,
+
+   * any headers that must be included in a standalone stream must be
+     produced again, e.g. SPS and PPS for H264.
+
+Encoding
+========
+
+This state is reached after a successful initialization sequence. In
+this state, client queues and dequeues buffers to both queues via
+:c:func:`VIDIOC_QBUF` and :c:func:`VIDIOC_DQBUF`, following standard
+semantics.
+
+Both queues operate independently, following standard behavior of V4L2
+buffer queues and memory-to-memory devices. In addition, the order of
+encoded frames dequeued from ``CAPTURE`` queue may differ from the order of
+queuing raw frames to ``OUTPUT`` queue, due to properties of selected coded
+format, e.g. frame reordering. The client must not assume any direct
+relationship between ``CAPTURE`` and ``OUTPUT`` buffers, other than
+reported by :c:type:`v4l2_buffer` ``timestamp``.
+
+Encoding parameter changes
+==========================
+
+The client is allowed to use :c:func:`VIDIOC_S_CTRL` to change encoder
+parameters at any time. The availability of parameters is driver-specific
+and the client must query the driver to find the set of available controls.
+
+The ability to change each parameter during encoding of is driver-specific,
+as per standard semantics of the V4L2 control interface. The client may
+attempt setting a control of its interest during encoding and if it the
+operation fails with the -EBUSY error code, ``CAPTURE`` queue needs to be
+stopped for the configuration change to be allowed (following the drain
+sequence will be  needed to avoid losing already queued/encoded frames).
+
+The timing of parameter update is driver-specific, as per standard
+semantics of the V4L2 control interface. If the client needs to apply the
+parameters exactly at specific frame and the encoder supports it, using
+Request API should be considered.
+
+Drain
+=====
+
+To ensure that all queued ``OUTPUT`` buffers have been processed and
+related ``CAPTURE`` buffers output to the client, the following drain
+sequence may be followed. After the drain sequence is complete, the client
+has received all encoded frames for all ``OUTPUT`` buffers queued before
+the sequence was started.
+
+1. Begin drain by issuing :c:func:`VIDIOC_ENCODER_CMD`.
+
+   * **Required fields:**
+
+     ``cmd``
+         set to ``V4L2_ENC_CMD_STOP``
+
+     ``flags``
+         set to 0
+
+     ``pts``
+         set to 0
+
+2. The driver must process and encode as normal all ``OUTPUT`` buffers
+   queued by the client before the :c:func:`VIDIOC_ENCODER_CMD` was issued.
+
+3. Once all ``OUTPUT`` buffers queued before ``V4L2_ENC_CMD_STOP`` are
+   processed:
+
+   * Once all decoded frames (if any) are ready to be dequeued on the
+     ``CAPTURE`` queue the driver must send a ``V4L2_EVENT_EOS``. The
+     driver must also set ``V4L2_BUF_FLAG_LAST`` in :c:type:`v4l2_buffer`
+     ``flags`` field on the buffer on the ``CAPTURE`` queue containing the
+     last frame (if any) produced as a result of processing the ``OUTPUT``
+     buffers queued before
+     ``V4L2_ENC_CMD_STOP``.
+
+   * If no more frames are left to be returned at the point of handling
+     ``V4L2_ENC_CMD_STOP``, the driver must return an empty buffer (with
+     :c:type:`v4l2_buffer` ``bytesused`` = 0) as the last buffer with
+     ``V4L2_BUF_FLAG_LAST`` set.
+
+   * Any attempts to dequeue more buffers beyond the buffer marked with
+     ``V4L2_BUF_FLAG_LAST`` will result in a -EPIPE error code returned by
+     :c:func:`VIDIOC_DQBUF`.
+
+4. At this point, encoding is paused and the driver will accept, but not
+   process any newly queued ``OUTPUT`` buffers until the client issues
+   ``V4L2_ENC_CMD_START`` or restarts streaming on any queue.
+
+* Once the drain sequence is initiated, the client needs to drive it to
+  completion, as described by the above steps, unless it aborts the process
+  by issuing :c:func:`VIDIOC_STREAMOFF` on ``CAPTURE`` queue.  The client
+  is not allowed to issue ``V4L2_ENC_CMD_START`` or ``V4L2_ENC_CMD_STOP``
+  again while the drain sequence is in progress and they will fail with
+  -EBUSY error code if attempted.
+
+* Restarting streaming on ``CAPTURE`` queue will implicitly end the paused
+  state and make the encoder continue encoding, as long as other encoding
+  conditions are met. Restarting ``OUTPUT`` queue will not affect an
+  in-progress drain sequence.
+
+* The drivers must also implement :c:func:`VIDIOC_TRY_ENCODER_CMD`, as a
+  way to let the client query the availability of encoder commands.
+
+Reset
+=====
+
+The client may want to request the encoder to reinitialize the encoding,
+so that the stream produced becomes independent from the stream generated
+before. Depending on the coded format, that may imply that:
+
+* encoded frames produced after the restart must not reference any frames
+  produced before the stop, e.g. no long term references for H264,
+
+* any headers that must be included in a standalone stream must be produced
+  again, e.g. SPS and PPS for H264.
+
+This can be achieved by performing the reset sequence.
+
+1. *[optional]* If the client is interested in encoded frames resulting
+   from already queued source frames, it needs to perform the Drain
+   sequence. Otherwise, the reset sequence would cause the already
+   encoded and not dequeued encoded frames to be lost.
+
+2. Stop streaming on ``CAPTURE`` queue via :c:func:`VIDIOC_STREAMOFF`. This
+   will return all currently queued ``CAPTURE`` buffers to the client,
+   without valid frame data.
+
+3. *[optional]* Restart streaming on ``OUTPUT`` queue via
+   :c:func:`VIDIOC_STREAMOFF` followed by :c:func:`VIDIOC_STREAMON` to
+   drop any source frames enqueued to the encoder before the reset
+   sequence. This is useful if the client requires the new stream to begin
+   at specific source frame. Otherwise, the new stream might include
+   frames encoded from source frames queued before the reset sequence.
+
+4. Restart streaming on ``CAPTURE`` queue via :c:func:`VIDIOC_STREAMON` and
+   continue with regular encoding sequence. The encoded frames produced
+   into ``CAPTURE`` buffers from now on will contain a standalone stream
+   that can be decoded without the need for frames encoded before the reset
+   sequence.
+
+Commit points
+=============
+
+Setting formats and allocating buffers triggers changes in the behavior
+of the driver.
+
+1. Setting format on ``CAPTURE`` queue may change the set of formats
+   supported/advertised on the ``OUTPUT`` queue. In particular, it also
+   means that ``OUTPUT`` format may be reset and the client must not
+   rely on the previously set format being preserved.
+
+2. Enumerating formats on ``OUTPUT`` queue must only return formats
+   supported for the ``CAPTURE`` format currently set.
+
+3. Setting/changing format on ``OUTPUT`` queue does not change formats
+   available on ``CAPTURE`` queue. An attempt to set ``OUTPUT`` format that
+   is not supported for the currently selected ``CAPTURE`` format must
+   result in the driver adjusting the requested format to an acceptable
+   one.
+
+4. Enumerating formats on ``CAPTURE`` queue always returns the full set of
+   supported coded formats, irrespective of the current ``OUTPUT``
+   format.
+
+5. After allocating buffers on the ``CAPTURE`` queue, it is not possible to
+   change format on it.
+
+To summarize, setting formats and allocation must always start with the
+``CAPTURE`` queue and the ``CAPTURE`` queue is the master that governs the
+set of supported formats for the ``OUTPUT`` queue.
diff --git a/Documentation/media/uapi/v4l/devices.rst b/Documentation/media/uapi/v4l/devices.rst
index 12d43fe711cf..1822c66c2154 100644
--- a/Documentation/media/uapi/v4l/devices.rst
+++ b/Documentation/media/uapi/v4l/devices.rst
@@ -16,6 +16,7 @@  Interfaces
     dev-osd
     dev-codec
     dev-decoder
+    dev-encoder
     dev-effect
     dev-raw-vbi
     dev-sliced-vbi
diff --git a/Documentation/media/uapi/v4l/v4l2.rst b/Documentation/media/uapi/v4l/v4l2.rst
index 65dc096199ad..2ef6693b9499 100644
--- a/Documentation/media/uapi/v4l/v4l2.rst
+++ b/Documentation/media/uapi/v4l/v4l2.rst
@@ -56,6 +56,7 @@  Authors, in alphabetical order:
 - Figa, Tomasz <tfiga@chromium.org>
 
   - Documented the memory-to-memory decoder interface.
+  - Documented the memory-to-memory encoder interface.
 
 - H Schimek, Michael <mschimek@gmx.at>
 
@@ -68,6 +69,7 @@  Authors, in alphabetical order:
 - Osciak, Pawel <posciak@chromium.org>
 
   - Documented the memory-to-memory decoder interface.
+  - Documented the memory-to-memory encoder interface.
 
 - Osciak, Pawel <pawel@osciak.com>