
[RFC,07/12] media: uapi: h264: Add DPB entry field reference flags

Message ID HE1PR06MB4011559BF2447047C66285D2ACBF0@HE1PR06MB4011.eurprd06.prod.outlook.com (mailing list archive)
State New, archived
Series media: hantro: H264 fixes and improvements

Commit Message

Jonas Karlman Sept. 1, 2019, 12:45 p.m. UTC
Add DPB entry flags to help indicate when a reference frame is a field picture
and how the DPB entry is referenced, top or bottom field or full frame.

Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
---
 Documentation/media/uapi/v4l/ext-ctrls-codec.rst | 12 ++++++++++++
 include/media/h264-ctrls.h                       |  4 ++++
 2 files changed, 16 insertions(+)

Comments

Ezequiel Garcia July 10, 2020, 4:21 a.m. UTC | #1
Hello Jonas,

In the context of the uAPI cleanup,
I'm revisiting this patch.

On Sun, 2019-09-01 at 12:45 +0000, Jonas Karlman wrote:
> Add DPB entry flags to help indicate when a reference frame is a field picture
> and how the DPB entry is referenced, top or bottom field or full frame.
> 
> Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
> ---
>  Documentation/media/uapi/v4l/ext-ctrls-codec.rst | 12 ++++++++++++
>  include/media/h264-ctrls.h                       |  4 ++++
>  2 files changed, 16 insertions(+)
> 
> diff --git a/Documentation/media/uapi/v4l/ext-ctrls-codec.rst b/Documentation/media/uapi/v4l/ext-ctrls-codec.rst
> index bc5dd8e76567..eb6c32668ad7 100644
> --- a/Documentation/media/uapi/v4l/ext-ctrls-codec.rst
> +++ b/Documentation/media/uapi/v4l/ext-ctrls-codec.rst
> @@ -2022,6 +2022,18 @@ enum v4l2_mpeg_video_h264_hierarchical_coding_type -
>      * - ``V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM``
>        - 0x00000004
>        - The DPB entry is a long term reference frame
> +    * - ``V4L2_H264_DPB_ENTRY_FLAG_FIELD_PICTURE``
> +      - 0x00000008
> +      - The DPB entry is a field picture
> +    * - ``V4L2_H264_DPB_ENTRY_FLAG_REF_TOP``
> +      - 0x00000010
> +      - The DPB entry is a top field reference
> +    * - ``V4L2_H264_DPB_ENTRY_FLAG_REF_BOTTOM``
> +      - 0x00000020
> +      - The DPB entry is a bottom field reference
> +    * - ``V4L2_H264_DPB_ENTRY_FLAG_REF_FRAME``
> +      - 0x00000030
> +      - The DPB entry is a reference frame
>  
>  ``V4L2_CID_MPEG_VIDEO_H264_DECODE_MODE (enum)``
>      Specifies the decoding mode to use. Currently exposes slice-based and
> diff --git a/include/media/h264-ctrls.h b/include/media/h264-ctrls.h
> index e877bf1d537c..76020ebd1e6c 100644
> --- a/include/media/h264-ctrls.h
> +++ b/include/media/h264-ctrls.h
> @@ -185,6 +185,10 @@ struct v4l2_ctrl_h264_slice_params {
>  #define V4L2_H264_DPB_ENTRY_FLAG_VALID		0x01
>  #define V4L2_H264_DPB_ENTRY_FLAG_ACTIVE		0x02
>  #define V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM	0x04
> +#define V4L2_H264_DPB_ENTRY_FLAG_FIELD_PICTURE	0x08
> +#define V4L2_H264_DPB_ENTRY_FLAG_REF_TOP	0x10
> +#define V4L2_H264_DPB_ENTRY_FLAG_REF_BOTTOM	0x20
> +#define V4L2_H264_DPB_ENTRY_FLAG_REF_FRAME	0x30
>  

I've been going through the H264 spec and I'm unsure:
are all these flags semantically needed?

For instance, if one of REF_BOTTOM or REF_TOP (or both)
is set, doesn't that indicate it's a field picture?

Or conversely, if neither REF_BOTTOM nor REF_TOP is set,
then it's a frame picture?
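
To make the question concrete, here is a minimal sketch of the inference
being asked about, assuming the flag values from the patch above. The
helper name is made up for illustration and is not part of the uAPI;
whether this inference actually holds is the open question.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag values as proposed in this RFC patch. */
#define V4L2_H264_DPB_ENTRY_FLAG_FIELD_PICTURE	0x08
#define V4L2_H264_DPB_ENTRY_FLAG_REF_TOP	0x10
#define V4L2_H264_DPB_ENTRY_FLAG_REF_BOTTOM	0x20
#define V4L2_H264_DPB_ENTRY_FLAG_REF_FRAME	0x30

/*
 * Hypothetical helper: derive "field picture" purely from the REF_* bits,
 * i.e. treat any REF_TOP/REF_BOTTOM bit as implying a field picture and
 * the absence of both as implying a frame picture.
 */
static bool dpb_entry_inferred_field_pic(uint32_t flags)
{
	return (flags & V4L2_H264_DPB_ENTRY_FLAG_REF_FRAME) != 0;
}
```

If this inference were valid, FIELD_PICTURE (0x08) would carry no extra
information. Note also that REF_FRAME is just REF_TOP | REF_BOTTOM.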

Thanks,
Ezequiel
Boris Brezillon July 10, 2020, 8:13 a.m. UTC | #2
On Fri, 10 Jul 2020 01:21:07 -0300
Ezequiel Garcia <ezequiel@collabora.com> wrote:

> Hello Jonas,
> 
> In the context of the uAPI cleanup,
> I'm revisiting this patch.
> 
> On Sun, 2019-09-01 at 12:45 +0000, Jonas Karlman wrote:
> > Add DPB entry flags to help indicate when a reference frame is a field picture
> > and how the DPB entry is referenced, top or bottom field or full frame.
> > 
> > Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
> > ---
> >  Documentation/media/uapi/v4l/ext-ctrls-codec.rst | 12 ++++++++++++
> >  include/media/h264-ctrls.h                       |  4 ++++
> >  2 files changed, 16 insertions(+)
> > 
> > diff --git a/Documentation/media/uapi/v4l/ext-ctrls-codec.rst b/Documentation/media/uapi/v4l/ext-ctrls-codec.rst
> > index bc5dd8e76567..eb6c32668ad7 100644
> > --- a/Documentation/media/uapi/v4l/ext-ctrls-codec.rst
> > +++ b/Documentation/media/uapi/v4l/ext-ctrls-codec.rst
> > @@ -2022,6 +2022,18 @@ enum v4l2_mpeg_video_h264_hierarchical_coding_type -
> >      * - ``V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM``
> >        - 0x00000004
> >        - The DPB entry is a long term reference frame
> > +    * - ``V4L2_H264_DPB_ENTRY_FLAG_FIELD_PICTURE``
> > +      - 0x00000008
> > +      - The DPB entry is a field picture
> > +    * - ``V4L2_H264_DPB_ENTRY_FLAG_REF_TOP``
> > +      - 0x00000010
> > +      - The DPB entry is a top field reference
> > +    * - ``V4L2_H264_DPB_ENTRY_FLAG_REF_BOTTOM``
> > +      - 0x00000020
> > +      - The DPB entry is a bottom field reference
> > +    * - ``V4L2_H264_DPB_ENTRY_FLAG_REF_FRAME``
> > +      - 0x00000030
> > +      - The DPB entry is a reference frame
> >  
> >  ``V4L2_CID_MPEG_VIDEO_H264_DECODE_MODE (enum)``
> >      Specifies the decoding mode to use. Currently exposes slice-based and
> > diff --git a/include/media/h264-ctrls.h b/include/media/h264-ctrls.h
> > index e877bf1d537c..76020ebd1e6c 100644
> > --- a/include/media/h264-ctrls.h
> > +++ b/include/media/h264-ctrls.h
> > @@ -185,6 +185,10 @@ struct v4l2_ctrl_h264_slice_params {
> >  #define V4L2_H264_DPB_ENTRY_FLAG_VALID		0x01
> >  #define V4L2_H264_DPB_ENTRY_FLAG_ACTIVE		0x02
> >  #define V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM	0x04
> > +#define V4L2_H264_DPB_ENTRY_FLAG_FIELD_PICTURE	0x08
> > +#define V4L2_H264_DPB_ENTRY_FLAG_REF_TOP	0x10
> > +#define V4L2_H264_DPB_ENTRY_FLAG_REF_BOTTOM	0x20
> > +#define V4L2_H264_DPB_ENTRY_FLAG_REF_FRAME	0x30
> >    
> 
> I've been going thru the H264 spec and I'm unsure,
> are all these flags semantically needed?
> 
> For instance, if one of REF_BOTTOM or REF_TOP (or both)
> are set, doesn't that indicate it's a field picture?
> 
> Or conversely, if neither REF_BOTTOM or REF_TOP are set,
> then it's a frame picture?

I think that's what I was trying to do here [1]

[1] https://patchwork.kernel.org/patch/11392095/
Jonas Karlman July 10, 2020, 8:48 a.m. UTC | #3
On 2020-07-10 10:13, Boris Brezillon wrote:
> On Fri, 10 Jul 2020 01:21:07 -0300
> Ezequiel Garcia <ezequiel@collabora.com> wrote:
> 
>> Hello Jonas,
>>
>> In the context of the uAPI cleanup,
>> I'm revisiting this patch.
>>
>> On Sun, 2019-09-01 at 12:45 +0000, Jonas Karlman wrote:
>>> Add DPB entry flags to help indicate when a reference frame is a field picture
>>> and how the DPB entry is referenced, top or bottom field or full frame.
>>>
>>> Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
>>> ---
>>>  Documentation/media/uapi/v4l/ext-ctrls-codec.rst | 12 ++++++++++++
>>>  include/media/h264-ctrls.h                       |  4 ++++
>>>  2 files changed, 16 insertions(+)
>>>
>>> diff --git a/Documentation/media/uapi/v4l/ext-ctrls-codec.rst b/Documentation/media/uapi/v4l/ext-ctrls-codec.rst
>>> index bc5dd8e76567..eb6c32668ad7 100644
>>> --- a/Documentation/media/uapi/v4l/ext-ctrls-codec.rst
>>> +++ b/Documentation/media/uapi/v4l/ext-ctrls-codec.rst
>>> @@ -2022,6 +2022,18 @@ enum v4l2_mpeg_video_h264_hierarchical_coding_type -
>>>      * - ``V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM``
>>>        - 0x00000004
>>>        - The DPB entry is a long term reference frame
>>> +    * - ``V4L2_H264_DPB_ENTRY_FLAG_FIELD_PICTURE``
>>> +      - 0x00000008
>>> +      - The DPB entry is a field picture
>>> +    * - ``V4L2_H264_DPB_ENTRY_FLAG_REF_TOP``
>>> +      - 0x00000010
>>> +      - The DPB entry is a top field reference
>>> +    * - ``V4L2_H264_DPB_ENTRY_FLAG_REF_BOTTOM``
>>> +      - 0x00000020
>>> +      - The DPB entry is a bottom field reference
>>> +    * - ``V4L2_H264_DPB_ENTRY_FLAG_REF_FRAME``
>>> +      - 0x00000030
>>> +      - The DPB entry is a reference frame
>>>  
>>>  ``V4L2_CID_MPEG_VIDEO_H264_DECODE_MODE (enum)``
>>>      Specifies the decoding mode to use. Currently exposes slice-based and
>>> diff --git a/include/media/h264-ctrls.h b/include/media/h264-ctrls.h
>>> index e877bf1d537c..76020ebd1e6c 100644
>>> --- a/include/media/h264-ctrls.h
>>> +++ b/include/media/h264-ctrls.h
>>> @@ -185,6 +185,10 @@ struct v4l2_ctrl_h264_slice_params {
>>>  #define V4L2_H264_DPB_ENTRY_FLAG_VALID		0x01
>>>  #define V4L2_H264_DPB_ENTRY_FLAG_ACTIVE		0x02
>>>  #define V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM	0x04
>>> +#define V4L2_H264_DPB_ENTRY_FLAG_FIELD_PICTURE	0x08
>>> +#define V4L2_H264_DPB_ENTRY_FLAG_REF_TOP	0x10
>>> +#define V4L2_H264_DPB_ENTRY_FLAG_REF_BOTTOM	0x20
>>> +#define V4L2_H264_DPB_ENTRY_FLAG_REF_FRAME	0x30
>>>    
>>
>> I've been going thru the H264 spec and I'm unsure,
>> are all these flags semantically needed?
>>
>> For instance, if one of REF_BOTTOM or REF_TOP (or both)
>> are set, doesn't that indicate it's a field picture?

These flags would only indicate how the frame / field pair / field is
referenced and not if the DPB entry was decoded as a frame or field pair.

Both hantro and rkvdec need to know how the referenced frame / field pair
was decoded (not how it is referenced); my best guess is that motion vectors
(MV) are stored differently for a frame (linear) and a field pair (buffer
split in two).

I think we should be able to track how the buffer was decoded, similar to
how VP9 keeps track of buffer width/height.
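
A rough sketch of what such driver-side tracking could look like. All names
and the table size are hypothetical and only illustrate the idea; this is
not code from hantro or rkvdec.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_DPB_BUFS 16	/* hypothetical table size */

/* Hypothetical per-buffer state kept by the driver, not passed by userspace. */
struct decoded_buf_state {
	bool valid;	/* buffer has been decoded at least once */
	bool field_pic;	/* decoded as a field pair rather than as a frame */
};

static struct decoded_buf_state buf_state[MAX_DPB_BUFS];

/* Record, at decode time, how buffer @idx was decoded. */
static void buf_state_record(size_t idx, bool field_pic)
{
	if (idx >= MAX_DPB_BUFS)
		return;
	buf_state[idx].valid = true;
	buf_state[idx].field_pic = field_pic;
}

/* Look up how a reference buffer was decoded; false if unknown. */
static bool buf_state_is_field_pic(size_t idx)
{
	return idx < MAX_DPB_BUFS && buf_state[idx].valid &&
	       buf_state[idx].field_pic;
}
```

The point of the design is that "how it was decoded" becomes a property the
driver remembers per buffer, so the DPB flags only have to describe how the
entry is referenced.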

When I played with interlaced decoding on rkvdec a few weeks ago I
reverted the flags to something similar to my initial RFC patch, see [1].
I guess it should be possible to keep the current flags and track field_pic
in the driver; some macro to simplify the check for top/bottom ref could be
useful if the flags are kept as-is.

I am hoping to find some time next week to revisit hantro interlaced
and refine rkvdec interlaced support.

[1] https://github.com/Kwiboo/linux-rockchip/compare/da52ca6f8d2284aebea2d0b99d254b64922faa2d...c9f04cd9bc65eda0da713f4ce1c77eeb1960bd70

Regards,
Jonas

>>
>> Or conversely, if neither REF_BOTTOM or REF_TOP are set,
>> then it's a frame picture?
> 
> I think that's what I was trying to do here [1]
> 
> [1]https://patchwork.kernel.org/patch/11392095/
>
Ezequiel Garcia July 10, 2020, 11:50 a.m. UTC | #4
On Fri, 2020-07-10 at 10:13 +0200, Boris Brezillon wrote:
> On Fri, 10 Jul 2020 01:21:07 -0300
> Ezequiel Garcia <ezequiel@collabora.com> wrote:
> 
> > Hello Jonas,
> > 
> > In the context of the uAPI cleanup,
> > I'm revisiting this patch.
> > 
> > On Sun, 2019-09-01 at 12:45 +0000, Jonas Karlman wrote:
> > > Add DPB entry flags to help indicate when a reference frame is a field picture
> > > and how the DPB entry is referenced, top or bottom field or full frame.
> > > 
> > > Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
> > > ---
> > >  Documentation/media/uapi/v4l/ext-ctrls-codec.rst | 12 ++++++++++++
> > >  include/media/h264-ctrls.h                       |  4 ++++
> > >  2 files changed, 16 insertions(+)
> > > 
> > > diff --git a/Documentation/media/uapi/v4l/ext-ctrls-codec.rst b/Documentation/media/uapi/v4l/ext-ctrls-codec.rst
> > > index bc5dd8e76567..eb6c32668ad7 100644
> > > --- a/Documentation/media/uapi/v4l/ext-ctrls-codec.rst
> > > +++ b/Documentation/media/uapi/v4l/ext-ctrls-codec.rst
> > > @@ -2022,6 +2022,18 @@ enum v4l2_mpeg_video_h264_hierarchical_coding_type -
> > >      * - ``V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM``
> > >        - 0x00000004
> > >        - The DPB entry is a long term reference frame
> > > +    * - ``V4L2_H264_DPB_ENTRY_FLAG_FIELD_PICTURE``
> > > +      - 0x00000008
> > > +      - The DPB entry is a field picture
> > > +    * - ``V4L2_H264_DPB_ENTRY_FLAG_REF_TOP``
> > > +      - 0x00000010
> > > +      - The DPB entry is a top field reference
> > > +    * - ``V4L2_H264_DPB_ENTRY_FLAG_REF_BOTTOM``
> > > +      - 0x00000020
> > > +      - The DPB entry is a bottom field reference
> > > +    * - ``V4L2_H264_DPB_ENTRY_FLAG_REF_FRAME``
> > > +      - 0x00000030
> > > +      - The DPB entry is a reference frame
> > >  
> > >  ``V4L2_CID_MPEG_VIDEO_H264_DECODE_MODE (enum)``
> > >      Specifies the decoding mode to use. Currently exposes slice-based and
> > > diff --git a/include/media/h264-ctrls.h b/include/media/h264-ctrls.h
> > > index e877bf1d537c..76020ebd1e6c 100644
> > > --- a/include/media/h264-ctrls.h
> > > +++ b/include/media/h264-ctrls.h
> > > @@ -185,6 +185,10 @@ struct v4l2_ctrl_h264_slice_params {
> > >  #define V4L2_H264_DPB_ENTRY_FLAG_VALID		0x01
> > >  #define V4L2_H264_DPB_ENTRY_FLAG_ACTIVE		0x02
> > >  #define V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM	0x04
> > > +#define V4L2_H264_DPB_ENTRY_FLAG_FIELD_PICTURE	0x08
> > > +#define V4L2_H264_DPB_ENTRY_FLAG_REF_TOP	0x10
> > > +#define V4L2_H264_DPB_ENTRY_FLAG_REF_BOTTOM	0x20
> > > +#define V4L2_H264_DPB_ENTRY_FLAG_REF_FRAME	0x30
> > >    
> > 
> > I've been going thru the H264 spec and I'm unsure,
> > are all these flags semantically needed?
> > 
> > For instance, if one of REF_BOTTOM or REF_TOP (or both)
> > are set, doesn't that indicate it's a field picture?
> > 
> > Or conversely, if neither REF_BOTTOM or REF_TOP are set,
> > then it's a frame picture?
> 
> I think that's what I was trying to do here [1]
> 
> [1]https://patchwork.kernel.org/patch/11392095/

Right. Aren't we missing a DPB_ENTRY_FLAG_TOP_FIELD?

If I understand correctly, the DPB can contain:

* frames (FLAG_FIELD not set)
* a field pair, with a single field (FLAG_FIELD and either TOP or BOTTOM).
* a field pair, with both fields (FLAG_FIELD and both TOP and BOTTOM).

Ezequiel
Boris Brezillon July 10, 2020, 12:05 p.m. UTC | #5
On Fri, 10 Jul 2020 08:50:28 -0300
Ezequiel Garcia <ezequiel@collabora.com> wrote:

> On Fri, 2020-07-10 at 10:13 +0200, Boris Brezillon wrote:
> > On Fri, 10 Jul 2020 01:21:07 -0300
> > Ezequiel Garcia <ezequiel@collabora.com> wrote:
> >   
> > > Hello Jonas,
> > > 
> > > In the context of the uAPI cleanup,
> > > I'm revisiting this patch.
> > > 
> > > On Sun, 2019-09-01 at 12:45 +0000, Jonas Karlman wrote:  
> > > > Add DPB entry flags to help indicate when a reference frame is a field picture
> > > > and how the DPB entry is referenced, top or bottom field or full frame.
> > > > 
> > > > Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
> > > > ---
> > > >  Documentation/media/uapi/v4l/ext-ctrls-codec.rst | 12 ++++++++++++
> > > >  include/media/h264-ctrls.h                       |  4 ++++
> > > >  2 files changed, 16 insertions(+)
> > > > 
> > > > diff --git a/Documentation/media/uapi/v4l/ext-ctrls-codec.rst b/Documentation/media/uapi/v4l/ext-ctrls-codec.rst
> > > > index bc5dd8e76567..eb6c32668ad7 100644
> > > > --- a/Documentation/media/uapi/v4l/ext-ctrls-codec.rst
> > > > +++ b/Documentation/media/uapi/v4l/ext-ctrls-codec.rst
> > > > @@ -2022,6 +2022,18 @@ enum v4l2_mpeg_video_h264_hierarchical_coding_type -
> > > >      * - ``V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM``
> > > >        - 0x00000004
> > > >        - The DPB entry is a long term reference frame
> > > > +    * - ``V4L2_H264_DPB_ENTRY_FLAG_FIELD_PICTURE``
> > > > +      - 0x00000008
> > > > +      - The DPB entry is a field picture
> > > > +    * - ``V4L2_H264_DPB_ENTRY_FLAG_REF_TOP``
> > > > +      - 0x00000010
> > > > +      - The DPB entry is a top field reference
> > > > +    * - ``V4L2_H264_DPB_ENTRY_FLAG_REF_BOTTOM``
> > > > +      - 0x00000020
> > > > +      - The DPB entry is a bottom field reference
> > > > +    * - ``V4L2_H264_DPB_ENTRY_FLAG_REF_FRAME``
> > > > +      - 0x00000030
> > > > +      - The DPB entry is a reference frame
> > > >  
> > > >  ``V4L2_CID_MPEG_VIDEO_H264_DECODE_MODE (enum)``
> > > >      Specifies the decoding mode to use. Currently exposes slice-based and
> > > > diff --git a/include/media/h264-ctrls.h b/include/media/h264-ctrls.h
> > > > index e877bf1d537c..76020ebd1e6c 100644
> > > > --- a/include/media/h264-ctrls.h
> > > > +++ b/include/media/h264-ctrls.h
> > > > @@ -185,6 +185,10 @@ struct v4l2_ctrl_h264_slice_params {
> > > >  #define V4L2_H264_DPB_ENTRY_FLAG_VALID		0x01
> > > >  #define V4L2_H264_DPB_ENTRY_FLAG_ACTIVE		0x02
> > > >  #define V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM	0x04
> > > > +#define V4L2_H264_DPB_ENTRY_FLAG_FIELD_PICTURE	0x08
> > > > +#define V4L2_H264_DPB_ENTRY_FLAG_REF_TOP	0x10
> > > > +#define V4L2_H264_DPB_ENTRY_FLAG_REF_BOTTOM	0x20
> > > > +#define V4L2_H264_DPB_ENTRY_FLAG_REF_FRAME	0x30
> > > >      
> > > 
> > > I've been going thru the H264 spec and I'm unsure,
> > > are all these flags semantically needed?
> > > 
> > > For instance, if one of REF_BOTTOM or REF_TOP (or both)
> > > are set, doesn't that indicate it's a field picture?
> > > 
> > > Or conversely, if neither REF_BOTTOM or REF_TOP are set,
> > > then it's a frame picture?  
> > 
> > I think that's what I was trying to do here [1]
> > 
> > [1]https://patchwork.kernel.org/patch/11392095/  
> 
> Right. Aren't we missing a DPB_ENTRY_FLAG_TOP_FIELD?
> 
> If I understand correctly, the DPB can contain:
> 
> * frames (FLAG_FIELD not set)
> * a field pair, with a single field (FLAG_FIELD and either TOP or BOTTOM).
> * a field pair, with boths fields (FLAG_FIELD and both TOP or BOTTOM).

Well, my understanding is that, if the buffer contains both a TOP and
BOTTOM field, it actually becomes a full frame, so you actually have
these cases:

* FLAG_FIELD not set: this is a frame (note that a TOP/BOTTOM field
  decoded buffer can become a frame if it's complemented with the
  missing field later during the decoding)
* FLAG_FIELD set + BOTTOM_FIELD not set: this is a TOP field
* FLAG_FIELD set + BOTTOM_FIELD set: this is a BOTTOM field
* FLAG_FIELD not set + BOTTOM_FIELD set: invalid combination
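
The four cases above can be sketched as a small classifier. This is only an
illustration: the bit values given to FLAG_FIELD and FLAG_BOTTOM_FIELD, as
well as the enum and function names, are assumptions made for the example.

```c
#include <stdint.h>

/* Assumed bit values, for illustration only. */
#define FLAG_FIELD		0x08
#define FLAG_BOTTOM_FIELD	0x10

/* Hypothetical result type covering the four cases listed above. */
enum dpb_pic_kind {
	DPB_PIC_FRAME,
	DPB_PIC_TOP_FIELD,
	DPB_PIC_BOTTOM_FIELD,
	DPB_PIC_INVALID,
};

/* Map a flags word onto one of the four cases. */
static enum dpb_pic_kind dpb_classify(uint32_t flags)
{
	int field = !!(flags & FLAG_FIELD);
	int bottom = !!(flags & FLAG_BOTTOM_FIELD);

	if (!field)
		return bottom ? DPB_PIC_INVALID : DPB_PIC_FRAME;
	return bottom ? DPB_PIC_BOTTOM_FIELD : DPB_PIC_TOP_FIELD;
}
```

Two bits give four encodings but only three meaningful states, which is why
one combination ends up invalid.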

but I might be wrong.
Ezequiel Garcia July 10, 2020, 12:18 p.m. UTC | #6
On Fri, 2020-07-10 at 08:48 +0000, Jonas Karlman wrote:
> On 2020-07-10 10:13, Boris Brezillon wrote:
> > On Fri, 10 Jul 2020 01:21:07 -0300
> > Ezequiel Garcia <ezequiel@collabora.com> wrote:
> > 
> > > Hello Jonas,
> > > 
> > > In the context of the uAPI cleanup,
> > > I'm revisiting this patch.
> > > 
> > > On Sun, 2019-09-01 at 12:45 +0000, Jonas Karlman wrote:
> > > > Add DPB entry flags to help indicate when a reference frame is a field picture
> > > > and how the DPB entry is referenced, top or bottom field or full frame.
> > > > 
> > > > Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
> > > > ---
> > > >  Documentation/media/uapi/v4l/ext-ctrls-codec.rst | 12 ++++++++++++
> > > >  include/media/h264-ctrls.h                       |  4 ++++
> > > >  2 files changed, 16 insertions(+)
> > > > 
> > > > diff --git a/Documentation/media/uapi/v4l/ext-ctrls-codec.rst b/Documentation/media/uapi/v4l/ext-ctrls-codec.rst
> > > > index bc5dd8e76567..eb6c32668ad7 100644
> > > > --- a/Documentation/media/uapi/v4l/ext-ctrls-codec.rst
> > > > +++ b/Documentation/media/uapi/v4l/ext-ctrls-codec.rst
> > > > @@ -2022,6 +2022,18 @@ enum v4l2_mpeg_video_h264_hierarchical_coding_type -
> > > >      * - ``V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM``
> > > >        - 0x00000004
> > > >        - The DPB entry is a long term reference frame
> > > > +    * - ``V4L2_H264_DPB_ENTRY_FLAG_FIELD_PICTURE``
> > > > +      - 0x00000008
> > > > +      - The DPB entry is a field picture
> > > > +    * - ``V4L2_H264_DPB_ENTRY_FLAG_REF_TOP``
> > > > +      - 0x00000010
> > > > +      - The DPB entry is a top field reference
> > > > +    * - ``V4L2_H264_DPB_ENTRY_FLAG_REF_BOTTOM``
> > > > +      - 0x00000020
> > > > +      - The DPB entry is a bottom field reference
> > > > +    * - ``V4L2_H264_DPB_ENTRY_FLAG_REF_FRAME``
> > > > +      - 0x00000030
> > > > +      - The DPB entry is a reference frame
> > > >  
> > > >  ``V4L2_CID_MPEG_VIDEO_H264_DECODE_MODE (enum)``
> > > >      Specifies the decoding mode to use. Currently exposes slice-based and
> > > > diff --git a/include/media/h264-ctrls.h b/include/media/h264-ctrls.h
> > > > index e877bf1d537c..76020ebd1e6c 100644
> > > > --- a/include/media/h264-ctrls.h
> > > > +++ b/include/media/h264-ctrls.h
> > > > @@ -185,6 +185,10 @@ struct v4l2_ctrl_h264_slice_params {
> > > >  #define V4L2_H264_DPB_ENTRY_FLAG_VALID		0x01
> > > >  #define V4L2_H264_DPB_ENTRY_FLAG_ACTIVE		0x02
> > > >  #define V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM	0x04
> > > > +#define V4L2_H264_DPB_ENTRY_FLAG_FIELD_PICTURE	0x08
> > > > +#define V4L2_H264_DPB_ENTRY_FLAG_REF_TOP	0x10
> > > > +#define V4L2_H264_DPB_ENTRY_FLAG_REF_BOTTOM	0x20
> > > > +#define V4L2_H264_DPB_ENTRY_FLAG_REF_FRAME	0x30
> > > >    
> > > 
> > > I've been going thru the H264 spec and I'm unsure,
> > > are all these flags semantically needed?
> > > 
> > > For instance, if one of REF_BOTTOM or REF_TOP (or both)
> > > are set, doesn't that indicate it's a field picture?
> 
> These flags would only indicate how the frame / field pair / field is
> referenced and not if the DPB entry was decoded as a frame or field pair.
> 

I believe _how_ the picture is referenced shouldn't (or can't?) be signaled
in the DPB representation. It seems Jernej's patch [1], which properly adds a
flag for each entry in ref_pic_list0, is the right way.

[1] https://patchwork.linuxtv.org/patch/64289/

> Both hantro and rkvdec needs to know how the referenced frame / field pair
> was decoded (not how it is referenced), my best guess is that MV is stored
> differently for a frame (linear) and field pair (buffer split in two).
> 
> I think we should be able to track how the buffer was decoded similar to
> how VP9 keep track of buffer width/height.
> 
> When I played with interlaced decoding of rkvdec a few weeks ago I
> reverted flags to something similar as my initial rfc patch, see [1].
> I guess it should be possible to keep current flags and track field_pic
> in driver, some macro to simplify check for top/bottom ref could be
> useful if flags is kept as-is.
> 
> I am hoping to find some time next week to revisit hantro interlaced
> and refine rkvdec interlaced support.
> 
> [1] https://github.com/Kwiboo/linux-rockchip/compare/da52ca6f8d2284aebea2d0b99d254b64922faa2d...c9f04cd9bc65eda0da713f4ce1c77eeb1960bd70
> 

Yup, I noticed this and it's why I started looking at the uAPI side
of the DPB.

It seems to me all we are missing is further clarification
of the meaning of each DPB_ENTRY_FLAG (possibly adding/removing
flags).

From this snippet:

		if (dpb[i].flags & V4L2_H264_DPB_ENTRY_FLAG_FIELD_PIC)
			refer_addr |= RKVDEC_FIELD_REF;
		if (dpb[i].flags & V4L2_H264_DPB_ENTRY_FLAG_TOP_REF)
			refer_addr |= RKVDEC_TOPFIELD_USED_REF;
		if (dpb[i].flags & V4L2_H264_DPB_ENTRY_FLAG_BOTTOM_REF)
			refer_addr |= RKVDEC_BOTFIELD_USED_REF;
		if (dpb[i].flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE)
			refer_addr |= RKVDEC_COLMV_USED_FLAG_REF;

Which of these flags are needed, i.e. which are required to fully
describe a picture stored in the DPB?

Also, since we are here, I wonder what exactly RKVDEC COLMV is
and what the condition for RKVDEC_COLMV_USED_FLAG_REF is.

Thanks a lot!
Ezequiel

> Regards,
> Jonas
> 
> > > Or conversely, if neither REF_BOTTOM or REF_TOP are set,
> > > then it's a frame picture?
> > 
> > I think that's what I was trying to do here [1]
> > 
> > [1]https://patchwork.kernel.org/patch/11392095/
> >
Ezequiel Garcia July 10, 2020, 12:25 p.m. UTC | #7
+Nicolas

On Fri, 2020-07-10 at 14:05 +0200, Boris Brezillon wrote:
> On Fri, 10 Jul 2020 08:50:28 -0300
> Ezequiel Garcia <ezequiel@collabora.com> wrote:
> 
> > On Fri, 2020-07-10 at 10:13 +0200, Boris Brezillon wrote:
> > > On Fri, 10 Jul 2020 01:21:07 -0300
> > > Ezequiel Garcia <ezequiel@collabora.com> wrote:
> > >   
> > > > Hello Jonas,
> > > > 
> > > > In the context of the uAPI cleanup,
> > > > I'm revisiting this patch.
> > > > 
> > > > On Sun, 2019-09-01 at 12:45 +0000, Jonas Karlman wrote:  
> > > > > Add DPB entry flags to help indicate when a reference frame is a field picture
> > > > > and how the DPB entry is referenced, top or bottom field or full frame.
> > > > > 
> > > > > Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
> > > > > ---
> > > > >  Documentation/media/uapi/v4l/ext-ctrls-codec.rst | 12 ++++++++++++
> > > > >  include/media/h264-ctrls.h                       |  4 ++++
> > > > >  2 files changed, 16 insertions(+)
> > > > > 
> > > > > diff --git a/Documentation/media/uapi/v4l/ext-ctrls-codec.rst b/Documentation/media/uapi/v4l/ext-ctrls-codec.rst
> > > > > index bc5dd8e76567..eb6c32668ad7 100644
> > > > > --- a/Documentation/media/uapi/v4l/ext-ctrls-codec.rst
> > > > > +++ b/Documentation/media/uapi/v4l/ext-ctrls-codec.rst
> > > > > @@ -2022,6 +2022,18 @@ enum v4l2_mpeg_video_h264_hierarchical_coding_type -
> > > > >      * - ``V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM``
> > > > >        - 0x00000004
> > > > >        - The DPB entry is a long term reference frame
> > > > > +    * - ``V4L2_H264_DPB_ENTRY_FLAG_FIELD_PICTURE``
> > > > > +      - 0x00000008
> > > > > +      - The DPB entry is a field picture
> > > > > +    * - ``V4L2_H264_DPB_ENTRY_FLAG_REF_TOP``
> > > > > +      - 0x00000010
> > > > > +      - The DPB entry is a top field reference
> > > > > +    * - ``V4L2_H264_DPB_ENTRY_FLAG_REF_BOTTOM``
> > > > > +      - 0x00000020
> > > > > +      - The DPB entry is a bottom field reference
> > > > > +    * - ``V4L2_H264_DPB_ENTRY_FLAG_REF_FRAME``
> > > > > +      - 0x00000030
> > > > > +      - The DPB entry is a reference frame
> > > > >  
> > > > >  ``V4L2_CID_MPEG_VIDEO_H264_DECODE_MODE (enum)``
> > > > >      Specifies the decoding mode to use. Currently exposes slice-based and
> > > > > diff --git a/include/media/h264-ctrls.h b/include/media/h264-ctrls.h
> > > > > index e877bf1d537c..76020ebd1e6c 100644
> > > > > --- a/include/media/h264-ctrls.h
> > > > > +++ b/include/media/h264-ctrls.h
> > > > > @@ -185,6 +185,10 @@ struct v4l2_ctrl_h264_slice_params {
> > > > >  #define V4L2_H264_DPB_ENTRY_FLAG_VALID		0x01
> > > > >  #define V4L2_H264_DPB_ENTRY_FLAG_ACTIVE		0x02
> > > > >  #define V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM	0x04
> > > > > +#define V4L2_H264_DPB_ENTRY_FLAG_FIELD_PICTURE	0x08
> > > > > +#define V4L2_H264_DPB_ENTRY_FLAG_REF_TOP	0x10
> > > > > +#define V4L2_H264_DPB_ENTRY_FLAG_REF_BOTTOM	0x20
> > > > > +#define V4L2_H264_DPB_ENTRY_FLAG_REF_FRAME	0x30
> > > > >      
> > > > 
> > > > I've been going thru the H264 spec and I'm unsure,
> > > > are all these flags semantically needed?
> > > > 
> > > > For instance, if one of REF_BOTTOM or REF_TOP (or both)
> > > > are set, doesn't that indicate it's a field picture?
> > > > 
> > > > Or conversely, if neither REF_BOTTOM or REF_TOP are set,
> > > > then it's a frame picture?  
> > > 
> > > I think that's what I was trying to do here [1]
> > > 
> > > [1]https://patchwork.kernel.org/patch/11392095/  
> > 
> > Right. Aren't we missing a DPB_ENTRY_FLAG_TOP_FIELD?
> > 
> > If I understand correctly, the DPB can contain:
> > 
> > * frames (FLAG_FIELD not set)
> > * a field pair, with a single field (FLAG_FIELD and either TOP or BOTTOM).
> > * a field pair, with boths fields (FLAG_FIELD and both TOP or BOTTOM).
> 
> Well, my understand is that, if the buffer contains both a TOP and
> BOTTOM field, it actually becomes a full frame, so you actually have
> those cases:
> 
> * FLAG_FIELD not set: this a frame (note that a TOP/BOTTOM field
>   decoded buffer can become of frame if it's complemented with the
>   missing field later during the decoding)
> * FLAG_FIELD set + BOTTOM_FIELD not set: this is a TOP field
> * FLAG_FIELD set + BOTTOM_FIELD set: this is a BOTTOM field
> * FLAG_FIELD not set + BOTTOM_FIELD set: invalid combination
> 
> but I might be wrong.

Yes, perhaps that's correct. I was trying to think strictly
in terms of the H264 semantics, to define a clean interface.

From the mpp code, it looks like the above is enough for rkvdec
(although I haven't done any tests).

Ezequiel
Nicolas Dufresne July 10, 2020, 9:49 p.m. UTC | #8
Le vendredi 10 juillet 2020 à 09:25 -0300, Ezequiel Garcia a écrit :
> +Nicolas
> 
> On Fri, 2020-07-10 at 14:05 +0200, Boris Brezillon wrote:
> > On Fri, 10 Jul 2020 08:50:28 -0300
> > Ezequiel Garcia <ezequiel@collabora.com> wrote:
> > 
> > > On Fri, 2020-07-10 at 10:13 +0200, Boris Brezillon wrote:
> > > > On Fri, 10 Jul 2020 01:21:07 -0300
> > > > Ezequiel Garcia <ezequiel@collabora.com> wrote:
> > > >   
> > > > > Hello Jonas,
> > > > > 
> > > > > In the context of the uAPI cleanup,
> > > > > I'm revisiting this patch.
> > > > > 
> > > > > On Sun, 2019-09-01 at 12:45 +0000, Jonas Karlman wrote:  
> > > > > > Add DPB entry flags to help indicate when a reference frame is a
> > > > > > field picture
> > > > > > and how the DPB entry is referenced, top or bottom field or full
> > > > > > frame.
> > > > > > 
> > > > > > Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
> > > > > > ---
> > > > > >  Documentation/media/uapi/v4l/ext-ctrls-codec.rst | 12 ++++++++++++
> > > > > >  include/media/h264-ctrls.h                       |  4 ++++
> > > > > >  2 files changed, 16 insertions(+)
> > > > > > 
> > > > > > diff --git a/Documentation/media/uapi/v4l/ext-ctrls-codec.rst
> > > > > > b/Documentation/media/uapi/v4l/ext-ctrls-codec.rst
> > > > > > index bc5dd8e76567..eb6c32668ad7 100644
> > > > > > --- a/Documentation/media/uapi/v4l/ext-ctrls-codec.rst
> > > > > > +++ b/Documentation/media/uapi/v4l/ext-ctrls-codec.rst
> > > > > > @@ -2022,6 +2022,18 @@ enum
> > > > > > v4l2_mpeg_video_h264_hierarchical_coding_type -
> > > > > >      * - ``V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM``
> > > > > >        - 0x00000004
> > > > > >        - The DPB entry is a long term reference frame
> > > > > > +    * - ``V4L2_H264_DPB_ENTRY_FLAG_FIELD_PICTURE``
> > > > > > +      - 0x00000008
> > > > > > +      - The DPB entry is a field picture
> > > > > > +    * - ``V4L2_H264_DPB_ENTRY_FLAG_REF_TOP``
> > > > > > +      - 0x00000010
> > > > > > +      - The DPB entry is a top field reference
> > > > > > +    * - ``V4L2_H264_DPB_ENTRY_FLAG_REF_BOTTOM``
> > > > > > +      - 0x00000020
> > > > > > +      - The DPB entry is a bottom field reference
> > > > > > +    * - ``V4L2_H264_DPB_ENTRY_FLAG_REF_FRAME``
> > > > > > +      - 0x00000030
> > > > > > +      - The DPB entry is a reference frame
> > > > > >  
> > > > > >  ``V4L2_CID_MPEG_VIDEO_H264_DECODE_MODE (enum)``
> > > > > >      Specifies the decoding mode to use. Currently exposes slice-
> > > > > > based and
> > > > > > diff --git a/include/media/h264-ctrls.h b/include/media/h264-ctrls.h
> > > > > > index e877bf1d537c..76020ebd1e6c 100644
> > > > > > --- a/include/media/h264-ctrls.h
> > > > > > +++ b/include/media/h264-ctrls.h
> > > > > > @@ -185,6 +185,10 @@ struct v4l2_ctrl_h264_slice_params {
> > > > > >  #define V4L2_H264_DPB_ENTRY_FLAG_VALID		0x01
> > > > > >  #define V4L2_H264_DPB_ENTRY_FLAG_ACTIVE		0x02
> > > > > >  #define V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM	0x04
> > > > > > +#define V4L2_H264_DPB_ENTRY_FLAG_FIELD_PICTURE	0x08
> > > > > > +#define V4L2_H264_DPB_ENTRY_FLAG_REF_TOP	0x10
> > > > > > +#define V4L2_H264_DPB_ENTRY_FLAG_REF_BOTTOM	0x20
> > > > > > +#define V4L2_H264_DPB_ENTRY_FLAG_REF_FRAME	0x30
> > > > > >      
> > > > > 
> > > > > I've been going thru the H264 spec and I'm unsure,
> > > > > are all these flags semantically needed?
> > > > > 
> > > > > For instance, if one of REF_BOTTOM or REF_TOP (or both)
> > > > > are set, doesn't that indicate it's a field picture?
> > > > > 
> > > > > Or conversely, if neither REF_BOTTOM or REF_TOP are set,
> > > > > then it's a frame picture?  
> > > > 
> > > > I think that's what I was trying to do here [1]
> > > > 
> > > > [1]https://patchwork.kernel.org/patch/11392095/  
> > > 
> > > Right. Aren't we missing a DPB_ENTRY_FLAG_TOP_FIELD?
> > > 
> > > If I understand correctly, the DPB can contain:
> > > 
> > > * frames (FLAG_FIELD not set)
> > > * a field pair, with a single field (FLAG_FIELD and either TOP or BOTTOM).
> > > * a field pair, with boths fields (FLAG_FIELD and both TOP or BOTTOM).
> > 
> > Well, my understand is that, if the buffer contains both a TOP and
> > BOTTOM field, it actually becomes a full frame, so you actually have
> > those cases:
> > 
> > * FLAG_FIELD not set: this a frame (note that a TOP/BOTTOM field
> >   decoded buffer can become of frame if it's complemented with the
> >   missing field later during the decoding)
> > * FLAG_FIELD set + BOTTOM_FIELD not set: this is a TOP field
> > * FLAG_FIELD set + BOTTOM_FIELD set: this is a BOTTOM field
> > * FLAG_FIELD not set + BOTTOM_FIELD set: invalid combination

Let's admit that, while this works, it's odd. Can we just move to this instead?

  FLAG_TOP_FIELD
  FLAG_BOTTOM_FIELD
  FLAG_FRAME = (FLAG_TOP_FIELD | FLAG_BOTTOM_FIELD)

So it can be used as a flag, but also is a proper enum and there is no longer an
invalid combination.
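
For illustration only, a minimal C sketch of that idea. The names are the
shorthand used above and the values are assumptions, not proposed uAPI; the
helper functions are hypothetical:

```c
#include <stdbool.h>

/* Hypothetical names and values, for illustration only. */
enum dpb_picture_structure {
	FLAG_TOP_FIELD    = 0x1,
	FLAG_BOTTOM_FIELD = 0x2,
	/* A frame is simply both fields present. */
	FLAG_FRAME        = FLAG_TOP_FIELD | FLAG_BOTTOM_FIELD,
};

/* True if the entry holds a top field, alone or as part of a frame. */
static inline bool dpb_has_top(unsigned int s)
{
	return (s & FLAG_TOP_FIELD) != 0;
}

/* True if the entry holds a bottom field, alone or as part of a frame. */
static inline bool dpb_has_bottom(unsigned int s)
{
	return (s & FLAG_BOTTOM_FIELD) != 0;
}

/* True only when both field bits are set. */
static inline bool dpb_is_frame(unsigned int s)
{
	return (s & FLAG_FRAME) == FLAG_FRAME;
}
```

The same value can be tested bit by bit or compared as a whole against one of
the three enumerators.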
  
> > 
> > but I might be wrong.
> 
> Yes, perhaps that's correct. I was trying to think strictly
> in terms of the H264 semantics, to define a clean interface.
> 
> From the mpp code, looks like the above is enough for rkvdec
> (although I haven't done any tests).
> 
> Ezequiel
> 
> 
>
Jonas Karlman July 11, 2020, 10:21 a.m. UTC | #9
On 2020-07-10 23:49, Nicolas Dufresne wrote:
> Le vendredi 10 juillet 2020 à 09:25 -0300, Ezequiel Garcia a écrit :
>> +Nicolas
>>
>> On Fri, 2020-07-10 at 14:05 +0200, Boris Brezillon wrote:
>>> On Fri, 10 Jul 2020 08:50:28 -0300
>>> Ezequiel Garcia <ezequiel@collabora.com> wrote:
>>>
>>>> On Fri, 2020-07-10 at 10:13 +0200, Boris Brezillon wrote:
>>>>> On Fri, 10 Jul 2020 01:21:07 -0300
>>>>> Ezequiel Garcia <ezequiel@collabora.com> wrote:
>>>>>   
>>>>>> Hello Jonas,
>>>>>>
>>>>>> In the context of the uAPI cleanup,
>>>>>> I'm revisiting this patch.
>>>>>>
>>>>>> On Sun, 2019-09-01 at 12:45 +0000, Jonas Karlman wrote:  
>>>>>>> Add DPB entry flags to help indicate when a reference frame is a
>>>>>>> field picture
>>>>>>> and how the DPB entry is referenced, top or bottom field or full
>>>>>>> frame.
>>>>>>>
>>>>>>> Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
>>>>>>> ---
>>>>>>>  Documentation/media/uapi/v4l/ext-ctrls-codec.rst | 12 ++++++++++++
>>>>>>>  include/media/h264-ctrls.h                       |  4 ++++
>>>>>>>  2 files changed, 16 insertions(+)
>>>>>>>
>>>>>>> diff --git a/Documentation/media/uapi/v4l/ext-ctrls-codec.rst
>>>>>>> b/Documentation/media/uapi/v4l/ext-ctrls-codec.rst
>>>>>>> index bc5dd8e76567..eb6c32668ad7 100644
>>>>>>> --- a/Documentation/media/uapi/v4l/ext-ctrls-codec.rst
>>>>>>> +++ b/Documentation/media/uapi/v4l/ext-ctrls-codec.rst
>>>>>>> @@ -2022,6 +2022,18 @@ enum
>>>>>>> v4l2_mpeg_video_h264_hierarchical_coding_type -
>>>>>>>      * - ``V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM``
>>>>>>>        - 0x00000004
>>>>>>>        - The DPB entry is a long term reference frame
>>>>>>> +    * - ``V4L2_H264_DPB_ENTRY_FLAG_FIELD_PICTURE``
>>>>>>> +      - 0x00000008
>>>>>>> +      - The DPB entry is a field picture
>>>>>>> +    * - ``V4L2_H264_DPB_ENTRY_FLAG_REF_TOP``
>>>>>>> +      - 0x00000010
>>>>>>> +      - The DPB entry is a top field reference
>>>>>>> +    * - ``V4L2_H264_DPB_ENTRY_FLAG_REF_BOTTOM``
>>>>>>> +      - 0x00000020
>>>>>>> +      - The DPB entry is a bottom field reference
>>>>>>> +    * - ``V4L2_H264_DPB_ENTRY_FLAG_REF_FRAME``
>>>>>>> +      - 0x00000030
>>>>>>> +      - The DPB entry is a reference frame
>>>>>>>  
>>>>>>>  ``V4L2_CID_MPEG_VIDEO_H264_DECODE_MODE (enum)``
>>>>>>>      Specifies the decoding mode to use. Currently exposes slice-
>>>>>>> based and
>>>>>>> diff --git a/include/media/h264-ctrls.h b/include/media/h264-ctrls.h
>>>>>>> index e877bf1d537c..76020ebd1e6c 100644
>>>>>>> --- a/include/media/h264-ctrls.h
>>>>>>> +++ b/include/media/h264-ctrls.h
>>>>>>> @@ -185,6 +185,10 @@ struct v4l2_ctrl_h264_slice_params {
>>>>>>>  #define V4L2_H264_DPB_ENTRY_FLAG_VALID		0x01
>>>>>>>  #define V4L2_H264_DPB_ENTRY_FLAG_ACTIVE		0x02
>>>>>>>  #define V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM	0x04
>>>>>>> +#define V4L2_H264_DPB_ENTRY_FLAG_FIELD_PICTURE	0x08
>>>>>>> +#define V4L2_H264_DPB_ENTRY_FLAG_REF_TOP	0x10
>>>>>>> +#define V4L2_H264_DPB_ENTRY_FLAG_REF_BOTTOM	0x20
>>>>>>> +#define V4L2_H264_DPB_ENTRY_FLAG_REF_FRAME	0x30
>>>>>>>      
>>>>>>
>>>>>> I've been going thru the H264 spec and I'm unsure,
>>>>>> are all these flags semantically needed?
>>>>>>
>>>>>> For instance, if one of REF_BOTTOM or REF_TOP (or both)
>>>>>> are set, doesn't that indicate it's a field picture?
>>>>>>
>>>>>> Or conversely, if neither REF_BOTTOM or REF_TOP are set,
>>>>>> then it's a frame picture?  
>>>>>
>>>>> I think that's what I was trying to do here [1]
>>>>>
>>>>> [1]https://patchwork.kernel.org/patch/11392095/  
>>>>
>>>> Right. Aren't we missing a DPB_ENTRY_FLAG_TOP_FIELD?
>>>>
>>>> If I understand correctly, the DPB can contain:
>>>>
>>>> * frames (FLAG_FIELD not set)
>>>> * a field pair, with a single field (FLAG_FIELD and either TOP or BOTTOM).
>>>> * a field pair, with boths fields (FLAG_FIELD and both TOP or BOTTOM).
>>>
>>> Well, my understand is that, if the buffer contains both a TOP and
>>> BOTTOM field, it actually becomes a full frame, so you actually have
>>> those cases:
>>>
>>> * FLAG_FIELD not set: this a frame (note that a TOP/BOTTOM field
>>>   decoded buffer can become of frame if it's complemented with the
>>>   missing field later during the decoding)
>>> * FLAG_FIELD set + BOTTOM_FIELD not set: this is a TOP field
>>> * FLAG_FIELD set + BOTTOM_FIELD set: this is a BOTTOM field
>>> * FLAG_FIELD not set + BOTTOM_FIELD set: invalid combination
> 
> Let's admit, while this work, it's odd. Can we just move to that instewad ?
> 
>   FLAG_TOP_FIELD
>   FLAG_BOTTOM_FIELD
>   FLAG_FRAME = (FLAG_TOP_FIELD | FLAG_BOTTOM_FIELD)
> 
> So it can be used as a flag, but also is a proper enum and there is no longer an
> invalid combination.
>   
>>>
>>> but I might be wrong.

There seems to be some misunderstanding here. The top/bottom flagging should
not be used to describe whether the picture is a field, field pair or frame;
it should be used to flag whether a frame, or the top and/or bottom field (in
the case of a field pair), is "used for short-term reference".

FLAG_TOP_REF
FLAG_BOTTOM_REF
FLAG_FRAME_REF = (FLAG_TOP_REF | FLAG_BOTTOM_REF)

Would be a more appropriate naming.

The FIELD_PIC flag would then be used to describe if the picture is a
reference frame or a complementary reference field pair.
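
To make the split concrete, here is a rough C sketch, not a tested
implementation. The bit values are borrowed from the patch; the short names
and the helper functions are hypothetical:

```c
#include <stdbool.h>

/* Bit values borrowed from the patch; short names are hypothetical. */
#define FLAG_FIELD_PIC	0x08
#define FLAG_TOP_REF	0x10
#define FLAG_BOTTOM_REF	0x20
#define FLAG_FRAME_REF	(FLAG_TOP_REF | FLAG_BOTTOM_REF)

/*
 * Picture structure: set for a complementary reference field pair,
 * clear for a reference frame.
 */
static inline bool entry_is_field_pair(unsigned int flags)
{
	return (flags & FLAG_FIELD_PIC) != 0;
}

/* Reference marking of the top field, independent of picture structure. */
static inline bool entry_top_is_ref(unsigned int flags)
{
	return (flags & FLAG_TOP_REF) != 0;
}

/* Reference marking of the bottom field, independent of picture structure. */
static inline bool entry_bottom_is_ref(unsigned int flags)
{
	return (flags & FLAG_BOTTOM_REF) != 0;
}
```

With this split, FIELD_PIC answers "what is stored" and the REF bits answer
"which part is referenced", so a field pair with only its top field marked
for reference stays representable.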

As described in the hantro h264 driver [1], the MV buffer is split in two
for field-encoded frames. I guess the rkvdec block does something
similar, so the HW blocks probably need to know whether the reference
picture is a reference frame or a complementary reference field pair.
It should be possible to keep such state in the driver, but since this
information was easily available in ffmpeg and the driver is "stateless",
using a flag seemed like a good choice at the time.

Please note that I have not done any tests without the "field pic" flagging,
but both mpp and the imx/hantro reference code configure this bit.

[1] https://git.linuxtv.org/media_tree.git/tree/drivers/staging/media/hantro/hantro_g1_h264_dec.c#n265

Regards,
Jonas

>>
>> Yes, perhaps that's correct. I was trying to think strictly
>> in terms of the H264 semantics, to define a clean interface.
>>
>> From the mpp code, looks like the above is enough for rkvdec
>> (although I haven't done any tests).
>>
>> Ezequiel
>>
>>
>>
>
Nicolas Dufresne July 11, 2020, 6:36 p.m. UTC | #10
Le samedi 11 juillet 2020 à 10:21 +0000, Jonas Karlman a écrit :
> On 2020-07-10 23:49, Nicolas Dufresne wrote:
> > Le vendredi 10 juillet 2020 à 09:25 -0300, Ezequiel Garcia a écrit :
> > > +Nicolas
> > > 
> > > On Fri, 2020-07-10 at 14:05 +0200, Boris Brezillon wrote:
> > > > On Fri, 10 Jul 2020 08:50:28 -0300
> > > > Ezequiel Garcia <ezequiel@collabora.com> wrote:
> > > > 
> > > > > On Fri, 2020-07-10 at 10:13 +0200, Boris Brezillon wrote:
> > > > > > On Fri, 10 Jul 2020 01:21:07 -0300
> > > > > > Ezequiel Garcia <ezequiel@collabora.com> wrote:
> > > > > >   
> > > > > > > Hello Jonas,
> > > > > > > 
> > > > > > > In the context of the uAPI cleanup,
> > > > > > > I'm revisiting this patch.
> > > > > > > 
> > > > > > > On Sun, 2019-09-01 at 12:45 +0000, Jonas Karlman wrote:  
> > > > > > > > Add DPB entry flags to help indicate when a reference frame is a
> > > > > > > > field picture
> > > > > > > > and how the DPB entry is referenced, top or bottom field or full
> > > > > > > > frame.
> > > > > > > > 
> > > > > > > > Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
> > > > > > > > ---
> > > > > > > >  Documentation/media/uapi/v4l/ext-ctrls-codec.rst | 12 ++++++++++++
> > > > > > > >  include/media/h264-ctrls.h                       |  4 ++++
> > > > > > > >  2 files changed, 16 insertions(+)
> > > > > > > > 
> > > > > > > > diff --git a/Documentation/media/uapi/v4l/ext-ctrls-codec.rst
> > > > > > > > b/Documentation/media/uapi/v4l/ext-ctrls-codec.rst
> > > > > > > > index bc5dd8e76567..eb6c32668ad7 100644
> > > > > > > > --- a/Documentation/media/uapi/v4l/ext-ctrls-codec.rst
> > > > > > > > +++ b/Documentation/media/uapi/v4l/ext-ctrls-codec.rst
> > > > > > > > @@ -2022,6 +2022,18 @@ enum
> > > > > > > > v4l2_mpeg_video_h264_hierarchical_coding_type -
> > > > > > > >      * - ``V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM``
> > > > > > > >        - 0x00000004
> > > > > > > >        - The DPB entry is a long term reference frame
> > > > > > > > +    * - ``V4L2_H264_DPB_ENTRY_FLAG_FIELD_PICTURE``
> > > > > > > > +      - 0x00000008
> > > > > > > > +      - The DPB entry is a field picture
> > > > > > > > +    * - ``V4L2_H264_DPB_ENTRY_FLAG_REF_TOP``
> > > > > > > > +      - 0x00000010
> > > > > > > > +      - The DPB entry is a top field reference
> > > > > > > > +    * - ``V4L2_H264_DPB_ENTRY_FLAG_REF_BOTTOM``
> > > > > > > > +      - 0x00000020
> > > > > > > > +      - The DPB entry is a bottom field reference
> > > > > > > > +    * - ``V4L2_H264_DPB_ENTRY_FLAG_REF_FRAME``
> > > > > > > > +      - 0x00000030
> > > > > > > > +      - The DPB entry is a reference frame
> > > > > > > >  
> > > > > > > >  ``V4L2_CID_MPEG_VIDEO_H264_DECODE_MODE (enum)``
> > > > > > > >      Specifies the decoding mode to use. Currently exposes slice-
> > > > > > > > based and
> > > > > > > > diff --git a/include/media/h264-ctrls.h b/include/media/h264-ctrls.h
> > > > > > > > index e877bf1d537c..76020ebd1e6c 100644
> > > > > > > > --- a/include/media/h264-ctrls.h
> > > > > > > > +++ b/include/media/h264-ctrls.h
> > > > > > > > @@ -185,6 +185,10 @@ struct v4l2_ctrl_h264_slice_params {
> > > > > > > >  #define V4L2_H264_DPB_ENTRY_FLAG_VALID		0x01
> > > > > > > >  #define V4L2_H264_DPB_ENTRY_FLAG_ACTIVE		0x02
> > > > > > > >  #define V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM	0x04
> > > > > > > > +#define V4L2_H264_DPB_ENTRY_FLAG_FIELD_PICTURE	0x08
> > > > > > > > +#define V4L2_H264_DPB_ENTRY_FLAG_REF_TOP	0x10
> > > > > > > > +#define V4L2_H264_DPB_ENTRY_FLAG_REF_BOTTOM	0x20
> > > > > > > > +#define V4L2_H264_DPB_ENTRY_FLAG_REF_FRAME	0x30
> > > > > > > >      
> > > > > > > 
> > > > > > > I've been going thru the H264 spec and I'm unsure,
> > > > > > > are all these flags semantically needed?
> > > > > > > 
> > > > > > > For instance, if one of REF_BOTTOM or REF_TOP (or both)
> > > > > > > are set, doesn't that indicate it's a field picture?
> > > > > > > 
> > > > > > > Or conversely, if neither REF_BOTTOM or REF_TOP are set,
> > > > > > > then it's a frame picture?  
> > > > > > 
> > > > > > I think that's what I was trying to do here [1]
> > > > > > 
> > > > > > [1]https://patchwork.kernel.org/patch/11392095/  
> > > > > 
> > > > > Right. Aren't we missing a DPB_ENTRY_FLAG_TOP_FIELD?
> > > > > 
> > > > > If I understand correctly, the DPB can contain:
> > > > > 
> > > > > * frames (FLAG_FIELD not set)
> > > > > * a field pair, with a single field (FLAG_FIELD and either TOP or BOTTOM).
> > > > > * a field pair, with boths fields (FLAG_FIELD and both TOP or BOTTOM).
> > > > 
> > > > Well, my understand is that, if the buffer contains both a TOP and
> > > > BOTTOM field, it actually becomes a full frame, so you actually have
> > > > those cases:
> > > > 
> > > > * FLAG_FIELD not set: this a frame (note that a TOP/BOTTOM field
> > > >   decoded buffer can become of frame if it's complemented with the
> > > >   missing field later during the decoding)
> > > > * FLAG_FIELD set + BOTTOM_FIELD not set: this is a TOP field
> > > > * FLAG_FIELD set + BOTTOM_FIELD set: this is a BOTTOM field
> > > > * FLAG_FIELD not set + BOTTOM_FIELD set: invalid combination
> > 
> > Let's admit, while this work, it's odd. Can we just move to that instewad ?
> > 
> >   FLAG_TOP_FIELD
> >   FLAG_BOTTOM_FIELD
> >   FLAG_FRAME = (FLAG_TOP_FIELD | FLAG_BOTTOM_FIELD)
> > 
> > So it can be used as a flag, but also is a proper enum and there is no longer an
> > invalid combination.
> >   
> > > > but I might be wrong.
> 
> There seems to be some misunderstanding here, the top/bottom flagging should
> not be used to describe if the picture is a field, field pair or frame, it
> should be used to flag if a frame or the top and/or bottom field (in case of
> a field pair) is "used for short-term reference".
> 
> FLAG_TOP_REF
> FLAG_BOTTOM_REF
> FLAG_FRAME_REF = (FLAG_TOP_REF | FLAG_BOTTOM_REF)
> 
> Would be a more appropriate naming.

It's a subtle nuance, but could work.

The reason I referred to it like this is that in gstreamer-vaapi,
this information is deduced from the picture->structure flags (I believe
it's inspired by the JM reference decoder). This structure is updated
when a specific field has been decoded, so it effectively represents
which fields of that picture are valid/decoded. The combination of
this picture being a reference and that flag is the only state used to
communicate that information. The real use for this is the case where we
have lost a field: a missing reference picture can then be detected.

So in gstreamer-vaapi, the case where both the top and bottom fields of
a reference have been decoded, but only one of the fields is marked for
reference in the DPB, does not exist. I don't know if that case really
exists in H.264.

> 
> The FIELD_PIC flag would then be used to describe if the picture is a
> reference frame or a complementary reference field pair.
> 
> As described in hantro h264 driver [1] the MV buffer is split in two
> for field encoded frames, and I guess the rkvdec block does something
> similar and therefore the HW blocks probably needs to know if the reference
> picture is a reference frame or a complementary reference field pair.
> It should be possible to keep such state in driver but since such information
> was easily available in ffmpeg and the driver being "stateless" using a flag
> seamed like a good choice at the time.
> 
> Please note that I have not done any test without the "field pic" flagging
> but both mpp and the imx/hantro reference code are configuring this bit.
> 
> [1] https://git.linuxtv.org/media_tree.git/tree/drivers/staging/media/hantro/hantro_g1_h264_dec.c#n265
> 
> Regards,
> Jonas
> 
> > > Yes, perhaps that's correct. I was trying to think strictly
> > > in terms of the H264 semantics, to define a clean interface.
> > > 
> > > From the mpp code, looks like the above is enough for rkvdec
> > > (although I haven't done any tests).
> > > 
> > > Ezequiel
> > > 
> > > 
> > >
Ezequiel Garcia July 12, 2020, 10:59 p.m. UTC | #11
On Sat, 2020-07-11 at 10:21 +0000, Jonas Karlman wrote:
> On 2020-07-10 23:49, Nicolas Dufresne wrote:
> > Le vendredi 10 juillet 2020 à 09:25 -0300, Ezequiel Garcia a écrit :
> > > +Nicolas
> > > 
> > > On Fri, 2020-07-10 at 14:05 +0200, Boris Brezillon wrote:
> > > > On Fri, 10 Jul 2020 08:50:28 -0300
> > > > Ezequiel Garcia <ezequiel@collabora.com> wrote:
> > > > 
> > > > > On Fri, 2020-07-10 at 10:13 +0200, Boris Brezillon wrote:
> > > > > > On Fri, 10 Jul 2020 01:21:07 -0300
> > > > > > Ezequiel Garcia <ezequiel@collabora.com> wrote:
> > > > > >   
> > > > > > > Hello Jonas,
> > > > > > > 
> > > > > > > In the context of the uAPI cleanup,
> > > > > > > I'm revisiting this patch.
> > > > > > > 
> > > > > > > On Sun, 2019-09-01 at 12:45 +0000, Jonas Karlman wrote:  
> > > > > > > > Add DPB entry flags to help indicate when a reference frame is a
> > > > > > > > field picture
> > > > > > > > and how the DPB entry is referenced, top or bottom field or full
> > > > > > > > frame.
> > > > > > > > 
> > > > > > > > Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
> > > > > > > > ---
> > > > > > > >  Documentation/media/uapi/v4l/ext-ctrls-codec.rst | 12 ++++++++++++
> > > > > > > >  include/media/h264-ctrls.h                       |  4 ++++
> > > > > > > >  2 files changed, 16 insertions(+)
> > > > > > > > 
> > > > > > > > diff --git a/Documentation/media/uapi/v4l/ext-ctrls-codec.rst
> > > > > > > > b/Documentation/media/uapi/v4l/ext-ctrls-codec.rst
> > > > > > > > index bc5dd8e76567..eb6c32668ad7 100644
> > > > > > > > --- a/Documentation/media/uapi/v4l/ext-ctrls-codec.rst
> > > > > > > > +++ b/Documentation/media/uapi/v4l/ext-ctrls-codec.rst
> > > > > > > > @@ -2022,6 +2022,18 @@ enum
> > > > > > > > v4l2_mpeg_video_h264_hierarchical_coding_type -
> > > > > > > >      * - ``V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM``
> > > > > > > >        - 0x00000004
> > > > > > > >        - The DPB entry is a long term reference frame
> > > > > > > > +    * - ``V4L2_H264_DPB_ENTRY_FLAG_FIELD_PICTURE``
> > > > > > > > +      - 0x00000008
> > > > > > > > +      - The DPB entry is a field picture
> > > > > > > > +    * - ``V4L2_H264_DPB_ENTRY_FLAG_REF_TOP``
> > > > > > > > +      - 0x00000010
> > > > > > > > +      - The DPB entry is a top field reference
> > > > > > > > +    * - ``V4L2_H264_DPB_ENTRY_FLAG_REF_BOTTOM``
> > > > > > > > +      - 0x00000020
> > > > > > > > +      - The DPB entry is a bottom field reference
> > > > > > > > +    * - ``V4L2_H264_DPB_ENTRY_FLAG_REF_FRAME``
> > > > > > > > +      - 0x00000030
> > > > > > > > +      - The DPB entry is a reference frame
> > > > > > > >  
> > > > > > > >  ``V4L2_CID_MPEG_VIDEO_H264_DECODE_MODE (enum)``
> > > > > > > >      Specifies the decoding mode to use. Currently exposes slice-
> > > > > > > > based and
> > > > > > > > diff --git a/include/media/h264-ctrls.h b/include/media/h264-ctrls.h
> > > > > > > > index e877bf1d537c..76020ebd1e6c 100644
> > > > > > > > --- a/include/media/h264-ctrls.h
> > > > > > > > +++ b/include/media/h264-ctrls.h
> > > > > > > > @@ -185,6 +185,10 @@ struct v4l2_ctrl_h264_slice_params {
> > > > > > > >  #define V4L2_H264_DPB_ENTRY_FLAG_VALID		0x01
> > > > > > > >  #define V4L2_H264_DPB_ENTRY_FLAG_ACTIVE		0x02
> > > > > > > >  #define V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM	0x04
> > > > > > > > +#define V4L2_H264_DPB_ENTRY_FLAG_FIELD_PICTURE	0x08
> > > > > > > > +#define V4L2_H264_DPB_ENTRY_FLAG_REF_TOP	0x10
> > > > > > > > +#define V4L2_H264_DPB_ENTRY_FLAG_REF_BOTTOM	0x20
> > > > > > > > +#define V4L2_H264_DPB_ENTRY_FLAG_REF_FRAME	0x30
> > > > > > > >      
> > > > > > > 
> > > > > > > I've been going thru the H264 spec and I'm unsure,
> > > > > > > are all these flags semantically needed?
> > > > > > > 
> > > > > > > For instance, if one of REF_BOTTOM or REF_TOP (or both)
> > > > > > > are set, doesn't that indicate it's a field picture?
> > > > > > > 
> > > > > > > Or conversely, if neither REF_BOTTOM or REF_TOP are set,
> > > > > > > then it's a frame picture?  
> > > > > > 
> > > > > > I think that's what I was trying to do here [1]
> > > > > > 
> > > > > > [1]https://patchwork.kernel.org/patch/11392095/  
> > > > > 
> > > > > Right. Aren't we missing a DPB_ENTRY_FLAG_TOP_FIELD?
> > > > > 
> > > > > If I understand correctly, the DPB can contain:
> > > > > 
> > > > > * frames (FLAG_FIELD not set)
> > > > > * a field pair, with a single field (FLAG_FIELD and either TOP or BOTTOM).
> > > > > * a field pair, with boths fields (FLAG_FIELD and both TOP or BOTTOM).
> > > > 
> > > > Well, my understand is that, if the buffer contains both a TOP and
> > > > BOTTOM field, it actually becomes a full frame, so you actually have
> > > > those cases:
> > > > 
> > > > * FLAG_FIELD not set: this a frame (note that a TOP/BOTTOM field
> > > >   decoded buffer can become of frame if it's complemented with the
> > > >   missing field later during the decoding)
> > > > * FLAG_FIELD set + BOTTOM_FIELD not set: this is a TOP field
> > > > * FLAG_FIELD set + BOTTOM_FIELD set: this is a BOTTOM field
> > > > * FLAG_FIELD not set + BOTTOM_FIELD set: invalid combination
> > 
> > Let's admit, while this work, it's odd. Can we just move to that instewad ?
> > 
> >   FLAG_TOP_FIELD
> >   FLAG_BOTTOM_FIELD
> >   FLAG_FRAME = (FLAG_TOP_FIELD | FLAG_BOTTOM_FIELD)
> > 
> > So it can be used as a flag, but also is a proper enum and there is no longer an
> > invalid combination.
> >   
> > > > but I might be wrong.
> 
> There seems to be some misunderstanding here, the top/bottom flagging should
> not be used to describe if the picture is a field, field pair or frame, it
> should be used to flag if a frame or the top and/or bottom field (in case of
> a field pair) is "used for short-term reference".
> 

I'm not sure why "used for short-term reference" instead
of "used for reference".

> FLAG_TOP_REF
> FLAG_BOTTOM_REF
> FLAG_FRAME_REF = (FLAG_TOP_REF | FLAG_BOTTOM_REF)
> 
> Would be a more appropriate naming.
> 
> The FIELD_PIC flag would then be used to describe if the picture is a
> reference frame or a complementary reference field pair.
> 
> As described in hantro h264 driver [1] the MV buffer is split in two
> for field encoded frames, and I guess the rkvdec block does something
> similar and therefore the HW blocks probably needs to know if the reference
> picture is a reference frame or a complementary reference field pair.
> It should be possible to keep such state in driver but since such information
> was easily available in ffmpeg and the driver being "stateless" using a flag
> seamed like a good choice at the time.
> 
> Please note that I have not done any test without the "field pic" flagging
> but both mpp and the imx/hantro reference code are configuring this bit.
> 
> [1] https://git.linuxtv.org/media_tree.git/tree/drivers/staging/media/hantro/hantro_g1_h264_dec.c#n265
> 

How about this:

#define V4L2_H264_DPB_ENTRY_FLAG_VALID          0x01
#define V4L2_H264_DPB_ENTRY_FLAG_ACTIVE         0x02
#define V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM      0x04
#define V4L2_H264_DPB_ENTRY_FLAG_FIELD          0x08

enum v4l2_h264_dpb_reference {
        V4L2_H264_DPB_TOP_REF = 0x1,
        V4L2_H264_DPB_BOTTOM_REF = 0x2,
        V4L2_H264_DPB_FRAME_REF = 0x3,
};

With the following semantics (which should be
specified in the documentation):

* VALID: non-empty DPB entry.
* ACTIVE: picture is marked as "used for reference" (short-term or long-term).
* LONG_TERM: picture is marked as "used for long-term reference".
* FIELD: picture is a single field, or a complementary field pair. 

The v4l2_h264_dpb_reference enum would flag which
of the fields are used for reference.

This enum seems less ambiguous and easier to use for both
drivers and applications.

I am not exactly sure why a driver would ever need to
configure an "unused for reference" decoded picture
(i.e. VALID=1, ACTIVE=0), but I guess it's just clearer
to include this in the interface.
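
As a rough sketch of how a driver might consume this (untested; the helper
name and its parameters are hypothetical, and how the enum value would be
carried in the DPB entry is left open here):

```c
#include <stdbool.h>
#include <stdint.h>

#define V4L2_H264_DPB_ENTRY_FLAG_VALID		0x01
#define V4L2_H264_DPB_ENTRY_FLAG_ACTIVE		0x02
#define V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM	0x04
#define V4L2_H264_DPB_ENTRY_FLAG_FIELD		0x08

enum v4l2_h264_dpb_reference {
	V4L2_H264_DPB_TOP_REF = 0x1,
	V4L2_H264_DPB_BOTTOM_REF = 0x2,
	V4L2_H264_DPB_FRAME_REF = 0x3,
};

/*
 * Hypothetical helper: returns true if the part(s) of the picture named by
 * "which" may be used for reference. "flags" holds the DPB entry flags and
 * "reference" an enum v4l2_h264_dpb_reference value for the same entry.
 */
static inline bool dpb_entry_is_ref(uint32_t flags, unsigned int reference,
				    enum v4l2_h264_dpb_reference which)
{
	/* Empty entries are never referenced. */
	if (!(flags & V4L2_H264_DPB_ENTRY_FLAG_VALID))
		return false;
	/* Decoded but "unused for reference" (VALID=1, ACTIVE=0). */
	if (!(flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE))
		return false;
	/* Every requested field must be marked in the enum value. */
	return (reference & which) == (unsigned int)which;
}
```

For instance, an entry with VALID | ACTIVE | FIELD and TOP_REF only would
answer true for the top field and false for the bottom field or the frame.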

Thanks,
Ezequiel

> Regards,
> Jonas
> 
> > > Yes, perhaps that's correct. I was trying to think strictly
> > > in terms of the H264 semantics, to define a clean interface.
> > > 
> > > From the mpp code, looks like the above is enough for rkvdec
> > > (although I haven't done any tests).
> > > 
> > > Ezequiel
> > > 
> > > 
> > >
Nicolas Dufresne July 14, 2020, 4:04 p.m. UTC | #12
Le dimanche 12 juillet 2020 à 19:59 -0300, Ezequiel Garcia a écrit :
> On Sat, 2020-07-11 at 10:21 +0000, Jonas Karlman wrote:
> > On 2020-07-10 23:49, Nicolas Dufresne wrote:
> > > Le vendredi 10 juillet 2020 à 09:25 -0300, Ezequiel Garcia a écrit :
> > > > +Nicolas
> > > > 
> > > > On Fri, 2020-07-10 at 14:05 +0200, Boris Brezillon wrote:
> > > > > On Fri, 10 Jul 2020 08:50:28 -0300
> > > > > Ezequiel Garcia <ezequiel@collabora.com> wrote:
> > > > > 
> > > > > > On Fri, 2020-07-10 at 10:13 +0200, Boris Brezillon wrote:
> > > > > > > On Fri, 10 Jul 2020 01:21:07 -0300
> > > > > > > Ezequiel Garcia <ezequiel@collabora.com> wrote:
> > > > > > >   
> > > > > > > > Hello Jonas,
> > > > > > > > 
> > > > > > > > In the context of the uAPI cleanup,
> > > > > > > > I'm revisiting this patch.
> > > > > > > > 
> > > > > > > > On Sun, 2019-09-01 at 12:45 +0000, Jonas Karlman wrote:  
> > > > > > > > > Add DPB entry flags to help indicate when a reference frame is a
> > > > > > > > > field picture
> > > > > > > > > and how the DPB entry is referenced, top or bottom field or full
> > > > > > > > > frame.
> > > > > > > > > 
> > > > > > > > > Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
> > > > > > > > > ---
> > > > > > > > >  Documentation/media/uapi/v4l/ext-ctrls-codec.rst | 12 ++++++++++++
> > > > > > > > >  include/media/h264-ctrls.h                       |  4 ++++
> > > > > > > > >  2 files changed, 16 insertions(+)
> > > > > > > > > 
> > > > > > > > > diff --git a/Documentation/media/uapi/v4l/ext-ctrls-codec.rst
> > > > > > > > > b/Documentation/media/uapi/v4l/ext-ctrls-codec.rst
> > > > > > > > > index bc5dd8e76567..eb6c32668ad7 100644
> > > > > > > > > --- a/Documentation/media/uapi/v4l/ext-ctrls-codec.rst
> > > > > > > > > +++ b/Documentation/media/uapi/v4l/ext-ctrls-codec.rst
> > > > > > > > > @@ -2022,6 +2022,18 @@ enum
> > > > > > > > > v4l2_mpeg_video_h264_hierarchical_coding_type -
> > > > > > > > >      * - ``V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM``
> > > > > > > > >        - 0x00000004
> > > > > > > > >        - The DPB entry is a long term reference frame
> > > > > > > > > +    * - ``V4L2_H264_DPB_ENTRY_FLAG_FIELD_PICTURE``
> > > > > > > > > +      - 0x00000008
> > > > > > > > > +      - The DPB entry is a field picture
> > > > > > > > > +    * - ``V4L2_H264_DPB_ENTRY_FLAG_REF_TOP``
> > > > > > > > > +      - 0x00000010
> > > > > > > > > +      - The DPB entry is a top field reference
> > > > > > > > > +    * - ``V4L2_H264_DPB_ENTRY_FLAG_REF_BOTTOM``
> > > > > > > > > +      - 0x00000020
> > > > > > > > > +      - The DPB entry is a bottom field reference
> > > > > > > > > +    * - ``V4L2_H264_DPB_ENTRY_FLAG_REF_FRAME``
> > > > > > > > > +      - 0x00000030
> > > > > > > > > +      - The DPB entry is a reference frame
> > > > > > > > >  
> > > > > > > > >  ``V4L2_CID_MPEG_VIDEO_H264_DECODE_MODE (enum)``
> > > > > > > > >      Specifies the decoding mode to use. Currently exposes slice-
> > > > > > > > > based and
> > > > > > > > > diff --git a/include/media/h264-ctrls.h b/include/media/h264-ctrls.h
> > > > > > > > > index e877bf1d537c..76020ebd1e6c 100644
> > > > > > > > > --- a/include/media/h264-ctrls.h
> > > > > > > > > +++ b/include/media/h264-ctrls.h
> > > > > > > > > @@ -185,6 +185,10 @@ struct v4l2_ctrl_h264_slice_params {
> > > > > > > > >  #define V4L2_H264_DPB_ENTRY_FLAG_VALID		0x01
> > > > > > > > >  #define V4L2_H264_DPB_ENTRY_FLAG_ACTIVE		0x02
> > > > > > > > >  #define V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM	0x04
> > > > > > > > > +#define V4L2_H264_DPB_ENTRY_FLAG_FIELD_PICTURE	0x08
> > > > > > > > > +#define V4L2_H264_DPB_ENTRY_FLAG_REF_TOP	0x10
> > > > > > > > > +#define V4L2_H264_DPB_ENTRY_FLAG_REF_BOTTOM	0x20
> > > > > > > > > +#define V4L2_H264_DPB_ENTRY_FLAG_REF_FRAME	0x30
> > > > > > > > >      
> > > > > > > > 
> > > > > > > > I've been going thru the H264 spec and I'm unsure,
> > > > > > > > are all these flags semantically needed?
> > > > > > > > 
> > > > > > > > For instance, if one of REF_BOTTOM or REF_TOP (or both)
> > > > > > > > are set, doesn't that indicate it's a field picture?
> > > > > > > > 
> > > > > > > > Or conversely, if neither REF_BOTTOM or REF_TOP are set,
> > > > > > > > then it's a frame picture?  
> > > > > > > 
> > > > > > > I think that's what I was trying to do here [1]
> > > > > > > 
> > > > > > > [1]https://patchwork.kernel.org/patch/11392095/  
> > > > > > 
> > > > > > Right. Aren't we missing a DPB_ENTRY_FLAG_TOP_FIELD?
> > > > > > 
> > > > > > If I understand correctly, the DPB can contain:
> > > > > > 
> > > > > > * frames (FLAG_FIELD not set)
> > > > > > * a field pair, with a single field (FLAG_FIELD and either TOP or BOTTOM).
> > > > > > * a field pair, with boths fields (FLAG_FIELD and both TOP or BOTTOM).
> > > > > 
> > > > > Well, my understand is that, if the buffer contains both a TOP and
> > > > > BOTTOM field, it actually becomes a full frame, so you actually have
> > > > > those cases:
> > > > > 
> > > > > * FLAG_FIELD not set: this a frame (note that a TOP/BOTTOM field
> > > > >   decoded buffer can become of frame if it's complemented with the
> > > > >   missing field later during the decoding)
> > > > > * FLAG_FIELD set + BOTTOM_FIELD not set: this is a TOP field
> > > > > * FLAG_FIELD set + BOTTOM_FIELD set: this is a BOTTOM field
> > > > > * FLAG_FIELD not set + BOTTOM_FIELD set: invalid combination
> > > 
> > > Let's admit, while this work, it's odd. Can we just move to that instewad ?
> > > 
> > >   FLAG_TOP_FIELD
> > >   FLAG_BOTTOM_FIELD
> > >   FLAG_FRAME = (FLAG_TOP_FIELD | FLAG_BOTTOM_FIELD)
> > > 
> > > So it can be used as a flag, but also is a proper enum and there is no longer an
> > > invalid combination.
> > >   
> > > > > but I might be wrong.
> > 
> > There seems to be some misunderstanding here, the top/bottom flagging should
> > not be used to describe if the picture is a field, field pair or frame, it
> > should be used to flag if a frame or the top and/or bottom field (in case of
> > a field pair) is "used for short-term reference".
> > 
> 
> I'm not sure why "used for short-term reference" instead
> of "used for reference".
> 
> > FLAG_TOP_REF
> > FLAG_BOTTOM_REF
> > FLAG_FRAME_REF = (FLAG_TOP_REF | FLAG_BOTTOM_REF)
> > 
> > Would be a more appropriate naming.
> > 
> > The FIELD_PIC flag would then be used to describe if the picture is a
> > reference frame or a complementary reference field pair.
> > 
> > As described in hantro h264 driver [1] the MV buffer is split in two
> > for field encoded frames, and I guess the rkvdec block does something
> > similar and therefore the HW blocks probably needs to know if the reference
> > picture is a reference frame or a complementary reference field pair.
> > It should be possible to keep such state in driver but since such information
> > was easily available in ffmpeg and the driver being "stateless" using a flag
> > seamed like a good choice at the time.
> > 
> > Please note that I have not done any test without the "field pic" flagging
> > but both mpp and the imx/hantro reference code are configuring this bit.
> > 
> > [1] https://git.linuxtv.org/media_tree.git/tree/drivers/staging/media/hantro/hantro_g1_h264_dec.c#n265
> > 
> 
> How about this:
> 
> #define V4L2_H264_DPB_ENTRY_FLAG_VALID          0x01
> #define V4L2_H264_DPB_ENTRY_FLAG_ACTIVE         0x02
> #define V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM      0x04
> #define V4L2_H264_DPB_ENTRY_FLAG_FIELD          0x08
> 
> enum v4l2_h264_dpb_reference {
>         V4L2_H264_DPB_TOP_REF = 0x1,
>         V4L2_H264_DPB_BOTTOM_REF = 0x2,
>         V4L2_H264_DPB_FRAME_REF = 0x3,
> };
> 
> With the following semantics (which should be
> specified in the documentation):
> 
> * VALID: non-empty DPB entry.
> * ACTIVE: picture is marked as "used for reference" (short-term or long-term).
> * LONG_TERM: picture is marked as "used for long-term".
> * FIELD: picture is a single field, or a complementary field pair. 
> 
> The v4l2_h264_dpb_reference enum would flag which
> of the fields as used for reference.
> 
> This enum seems less ambiguous and easier to use for both
> drivers and applications.
> 
> I am not exactly sure why a driver would ever need to
> configure an "unused for reference" decoded picture
> (i.e. VALID=1, ACTIVE=0), but I guess it's just clearer
> to include this in the interface.

Indeed, that might have leaked from what we do in userspace, where we
need to track this. I haven't seen anything that would do concealment
or anything like that anyway.

I don't have a definitive opinion on the above, but I think it's heading
in the right direction.

> 
> Thanks,
> Ezequiel
> 
> > Regards,
> > Jonas
> > 
> > > > Yes, perhaps that's correct. I was trying to think strictly
> > > > in terms of the H264 semantics, to define a clean interface.
> > > > 
> > > > From the mpp code, looks like the above is enough for rkvdec
> > > > (although I haven't done any tests).
> > > > 
> > > > Ezequiel
> > > > 
> > > > 
> > > > 
> 
>

Patch

diff --git a/Documentation/media/uapi/v4l/ext-ctrls-codec.rst b/Documentation/media/uapi/v4l/ext-ctrls-codec.rst
index bc5dd8e76567..eb6c32668ad7 100644
--- a/Documentation/media/uapi/v4l/ext-ctrls-codec.rst
+++ b/Documentation/media/uapi/v4l/ext-ctrls-codec.rst
@@ -2022,6 +2022,18 @@  enum v4l2_mpeg_video_h264_hierarchical_coding_type -
     * - ``V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM``
       - 0x00000004
       - The DPB entry is a long term reference frame
+    * - ``V4L2_H264_DPB_ENTRY_FLAG_FIELD_PICTURE``
+      - 0x00000008
+      - The DPB entry is a field picture
+    * - ``V4L2_H264_DPB_ENTRY_FLAG_REF_TOP``
+      - 0x00000010
+      - The DPB entry is a top field reference
+    * - ``V4L2_H264_DPB_ENTRY_FLAG_REF_BOTTOM``
+      - 0x00000020
+      - The DPB entry is a bottom field reference
+    * - ``V4L2_H264_DPB_ENTRY_FLAG_REF_FRAME``
+      - 0x00000030
+      - The DPB entry is a reference frame
 
 ``V4L2_CID_MPEG_VIDEO_H264_DECODE_MODE (enum)``
     Specifies the decoding mode to use. Currently exposes slice-based and
diff --git a/include/media/h264-ctrls.h b/include/media/h264-ctrls.h
index e877bf1d537c..76020ebd1e6c 100644
--- a/include/media/h264-ctrls.h
+++ b/include/media/h264-ctrls.h
@@ -185,6 +185,10 @@  struct v4l2_ctrl_h264_slice_params {
 #define V4L2_H264_DPB_ENTRY_FLAG_VALID		0x01
 #define V4L2_H264_DPB_ENTRY_FLAG_ACTIVE		0x02
 #define V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM	0x04
+#define V4L2_H264_DPB_ENTRY_FLAG_FIELD_PICTURE	0x08
+#define V4L2_H264_DPB_ENTRY_FLAG_REF_TOP	0x10
+#define V4L2_H264_DPB_ENTRY_FLAG_REF_BOTTOM	0x20
+#define V4L2_H264_DPB_ENTRY_FLAG_REF_FRAME	0x30
 
 struct v4l2_h264_dpb_entry {
 	__u64 reference_ts;