mbox series

[v3,0/2] media: cedrus: Add H264 decoding support

Message ID cover.d3bb4d93da91ed5668025354ee1fca656e7d5b8b.1549895062.git-series.maxime.ripard@bootlin.com (mailing list archive)
Headers show
Series media: cedrus: Add H264 decoding support | expand

Message

Maxime Ripard Feb. 11, 2019, 2:39 p.m. UTC
Hi,

Here is a new version of the H264 decoding support in the cedrus
driver.

As you might already know, the cedrus driver relies on the Request
API, and is a reverse engineered driver for the video decoding engine
found on the Allwinner SoCs.

This work has been possible thanks to the work done by the people
behind libvdpau-sunxi found here:
https://github.com/linux-sunxi/libvdpau-sunxi/

I've tested the various ABI using this gdb script:
http://code.bulix.org/jl4se4-505620?raw

And this test script:
http://code.bulix.org/8zle4s-505623?raw

The application compiled is quite trivial:
http://code.bulix.org/e34zp8-505624?raw

The output is:
arm:	builds/arm-test-v4l2-h264-structures
	SHA1: 88cbf7485ba81831fc3b93772b215599b3b38318
x86:	builds/x86-test-v4l2-h264-structures
	SHA1: 88cbf7485ba81831fc3b93772b215599b3b38318
x64:	builds/x64-test-v4l2-h264-structures
	SHA1: 88cbf7485ba81831fc3b93772b215599b3b38318
arm64:	builds/arm64-test-v4l2-h264-structures
	SHA1: 88cbf7485ba81831fc3b93772b215599b3b38318

Let me know if there's any flaw using that test setup, or if you have
any comments on the patches.

Maxime

Changes from v2:
  - Simplified _cedrus_write_ref_list as suggested by Jernej
  - Set whether the frame is used as reference using nal_ref_idc
  - Respect chroma_format_idc
  - Fixes for the scaling list and prediction tables
  - Wrote the documentation for the flags
  - Added a bunch of defines to the driver bit fields
  - Reworded the controls and data format descriptions as suggested
    by Hans
  - Reworked the controls' structure field size to avoid padding
  - Removed the long term reference flag
  - Reintroduced the neighbor info buffer
  - Removed the ref_pic_list_p0/b0/b1 arrays that are redundant with the
    one in the DPB
  - used the timestamps instead of tags
  - Rebased on 5.0-rc1

Changes from v1:
  - Rebased on 4.20
  - Did the documentation for the userspace API
  - Used the tags instead of buffer IDs
  - Added a comment to explain why we still needed the swdec trigger
  - Reworked the MV col buffer in order to have one slot per frame
  - Removed the unused neighbor info buffer
  - Made sure to have the same structure offset and alignments across
    32 bits and 64 bits architecture

Maxime Ripard (1):
  media: cedrus: Add H264 decoding support

Pawel Osciak (1):
  media: uapi: Add H264 low-level decoder API compound controls.

 Documentation/media/uapi/v4l/biblio.rst            |   9 +-
 Documentation/media/uapi/v4l/extended-controls.rst | 530 +++++++++++++-
 Documentation/media/uapi/v4l/pixfmt-compressed.rst |  20 +-
 Documentation/media/uapi/v4l/vidioc-queryctrl.rst  |  30 +-
 Documentation/media/videodev2.h.rst.exceptions     |   5 +-
 drivers/media/v4l2-core/v4l2-ctrls.c               |  42 +-
 drivers/media/v4l2-core/v4l2-ioctl.c               |   1 +-
 drivers/staging/media/sunxi/cedrus/Makefile        |   3 +-
 drivers/staging/media/sunxi/cedrus/cedrus.c        |  31 +-
 drivers/staging/media/sunxi/cedrus/cedrus.h        |  38 +-
 drivers/staging/media/sunxi/cedrus/cedrus_dec.c    |  15 +-
 drivers/staging/media/sunxi/cedrus/cedrus_h264.c   | 589 ++++++++++++++-
 drivers/staging/media/sunxi/cedrus/cedrus_hw.c     |   4 +-
 drivers/staging/media/sunxi/cedrus/cedrus_regs.h   |  91 ++-
 drivers/staging/media/sunxi/cedrus/cedrus_video.c  |   9 +-
 include/media/h264-ctrls.h                         | 179 ++++-
 include/media/v4l2-ctrls.h                         |  13 +-
 include/uapi/linux/videodev2.h                     |   1 +-
 18 files changed, 1607 insertions(+), 3 deletions(-)
 create mode 100644 drivers/staging/media/sunxi/cedrus/cedrus_h264.c
 create mode 100644 include/media/h264-ctrls.h

base-commit: e7552cc33320523b660dd1891bceb616ced7b47c

Comments

Tomasz Figa Feb. 12, 2019, 12:50 p.m. UTC | #1
Hi Maxime,

On Mon, Feb 11, 2019 at 11:39 PM Maxime Ripard
<maxime.ripard@bootlin.com> wrote:
>
> Hi,
>
> Here is a new version of the H264 decoding support in the cedrus
> driver.

Thanks for working on this. Please see my comments below.

>
> As you might already know, the cedrus driver relies on the Request
> API, and is a reverse engineered driver for the video decoding engine
> found on the Allwinner SoCs.
>
> This work has been possible thanks to the work done by the people
> behind libvdpau-sunxi found here:
> https://github.com/linux-sunxi/libvdpau-sunxi/
>
> I've tested the various ABI using this gdb script:
> http://code.bulix.org/jl4se4-505620?raw
>
> And this test script:
> http://code.bulix.org/8zle4s-505623?raw
>
> The application compiled is quite trivial:
> http://code.bulix.org/e34zp8-505624?raw
>
> The output is:
> arm:    builds/arm-test-v4l2-h264-structures
>         SHA1: 88cbf7485ba81831fc3b93772b215599b3b38318
> x86:    builds/x86-test-v4l2-h264-structures
>         SHA1: 88cbf7485ba81831fc3b93772b215599b3b38318
> x64:    builds/x64-test-v4l2-h264-structures
>         SHA1: 88cbf7485ba81831fc3b93772b215599b3b38318
> arm64:  builds/arm64-test-v4l2-h264-structures
>         SHA1: 88cbf7485ba81831fc3b93772b215599b3b38318
>
> Let me know if there's any flaw using that test setup, or if you have
> any comments on the patches.
>
> Maxime
>
> Changes from v2:
>   - Simplified _cedrus_write_ref_list as suggested by Jernej
>   - Set whether the frame is used as reference using nal_ref_idc
>   - Respect chroma_format_idc
>   - Fixes for the scaling list and prediction tables
>   - Wrote the documentation for the flags
>   - Added a bunch of defines to the driver bit fields
>   - Reworded the controls and data format descriptions as suggested
>     by Hans
>   - Reworked the controls' structure field size to avoid padding
>   - Removed the long term reference flag

This and...

>   - Reintroduced the neighbor info buffer
>   - Removed the ref_pic_list_p0/b0/b1 arrays that are redundant with the
>     one in the DPB

these are used in our Rockchip VDEC driver.

Could you elaborate on the reasons why they got removed?

Best regards,
Tomasz
Ezequiel Garcia Feb. 12, 2019, 9:22 p.m. UTC | #2
Hey Tomasz,

On Tue, 2019-02-12 at 21:50 +0900, Tomasz Figa wrote:
> Hi Maxime,
> 
> On Mon, Feb 11, 2019 at 11:39 PM Maxime Ripard
> <maxime.ripard@bootlin.com> wrote:
> > Hi,
> > 
> > Here is a new version of the H264 decoding support in the cedrus
> > driver.
> 
> Thanks for working on this. Please see my comments below.
> 
> > As you might already know, the cedrus driver relies on the Request
> > API, and is a reverse engineered driver for the video decoding engine
> > found on the Allwinner SoCs.
> > 
> > This work has been possible thanks to the work done by the people
> > behind libvdpau-sunxi found here:
> > https://github.com/linux-sunxi/libvdpau-sunxi/
> > 
> > I've tested the various ABI using this gdb script:
> > http://code.bulix.org/jl4se4-505620?raw
> > 
> > And this test script:
> > http://code.bulix.org/8zle4s-505623?raw
> > 
> > The application compiled is quite trivial:
> > http://code.bulix.org/e34zp8-505624?raw
> > 
> > The output is:
> > arm:    builds/arm-test-v4l2-h264-structures
> >         SHA1: 88cbf7485ba81831fc3b93772b215599b3b38318
> > x86:    builds/x86-test-v4l2-h264-structures
> >         SHA1: 88cbf7485ba81831fc3b93772b215599b3b38318
> > x64:    builds/x64-test-v4l2-h264-structures
> >         SHA1: 88cbf7485ba81831fc3b93772b215599b3b38318
> > arm64:  builds/arm64-test-v4l2-h264-structures
> >         SHA1: 88cbf7485ba81831fc3b93772b215599b3b38318
> > 
> > Let me know if there's any flaw using that test setup, or if you have
> > any comments on the patches.
> > 
> > Maxime
> > 
> > Changes from v2:
> >   - Simplified _cedrus_write_ref_list as suggested by Jernej
> >   - Set whether the frame is used as reference using nal_ref_idc
> >   - Respect chroma_format_idc
> >   - Fixes for the scaling list and prediction tables
> >   - Wrote the documentation for the flags
> >   - Added a bunch of defines to the driver bit fields
> >   - Reworded the controls and data format descriptions as suggested
> >     by Hans
> >   - Reworked the controls' structure field size to avoid padding
> >   - Removed the long term reference flag
> 
> This and...
> 

Maxime has dropped this because of Ayaka's mail about long term references
not making much sense in stateless decoders.

I noticed that RK3399 TRM has a field to specify long term refs and
so was wondering about this item as well.

> >   - Reintroduced the neighbor info buffer
> >   - Removed the ref_pic_list_p0/b0/b1 arrays that are redundant with the
> >     one in the DPB
> 
> these are used in our Rockchip VDEC driver.
> 
> Could you elaborate on the reasons why they got removed?
> 

If I understood correctly, there are two reference picture lists.
P-frames will populate ref_pic_list0 and B-frames will populate both.

According to this, v4l2_ctrl_h264_slice_param.ref_pic_list0 and .ref_pic_list1
should be enough and ref_pic_list_p0/b0/b1 are not needed.

What do you think?

Regards,
Ezequiel
Tomasz Figa Feb. 13, 2019, 3:02 a.m. UTC | #3
On Wed, Feb 13, 2019 at 6:22 AM Ezequiel Garcia <ezequiel@collabora.com> wrote:
>
> Hey Tomasz,
>
> On Tue, 2019-02-12 at 21:50 +0900, Tomasz Figa wrote:
> > Hi Maxime,
> >
> > On Mon, Feb 11, 2019 at 11:39 PM Maxime Ripard
> > <maxime.ripard@bootlin.com> wrote:
> > > Hi,
> > >
> > > Here is a new version of the H264 decoding support in the cedrus
> > > driver.
> >
> > Thanks for working on this. Please see my comments below.
> >
> > > As you might already know, the cedrus driver relies on the Request
> > > API, and is a reverse engineered driver for the video decoding engine
> > > found on the Allwinner SoCs.
> > >
> > > This work has been possible thanks to the work done by the people
> > > behind libvdpau-sunxi found here:
> > > https://github.com/linux-sunxi/libvdpau-sunxi/
> > >
> > > I've tested the various ABI using this gdb script:
> > > http://code.bulix.org/jl4se4-505620?raw
> > >
> > > And this test script:
> > > http://code.bulix.org/8zle4s-505623?raw
> > >
> > > The application compiled is quite trivial:
> > > http://code.bulix.org/e34zp8-505624?raw
> > >
> > > The output is:
> > > arm:    builds/arm-test-v4l2-h264-structures
> > >         SHA1: 88cbf7485ba81831fc3b93772b215599b3b38318
> > > x86:    builds/x86-test-v4l2-h264-structures
> > >         SHA1: 88cbf7485ba81831fc3b93772b215599b3b38318
> > > x64:    builds/x64-test-v4l2-h264-structures
> > >         SHA1: 88cbf7485ba81831fc3b93772b215599b3b38318
> > > arm64:  builds/arm64-test-v4l2-h264-structures
> > >         SHA1: 88cbf7485ba81831fc3b93772b215599b3b38318
> > >
> > > Let me know if there's any flaw using that test setup, or if you have
> > > any comments on the patches.
> > >
> > > Maxime
> > >
> > > Changes from v2:
> > >   - Simplified _cedrus_write_ref_list as suggested by Jernej
> > >   - Set whether the frame is used as reference using nal_ref_idc
> > >   - Respect chroma_format_idc
> > >   - Fixes for the scaling list and prediction tables
> > >   - Wrote the documentation for the flags
> > >   - Added a bunch of defines to the driver bit fields
> > >   - Reworded the controls and data format descriptions as suggested
> > >     by Hans
> > >   - Reworked the controls' structure field size to avoid padding
> > >   - Removed the long term reference flag
> >
> > This and...
> >
>
> Maxime has dropped this because of Ayaka's mail about long term references
> not making much sense in stateless decoders.

I haven't seen any argument confirming that thesis, though. I should
have kicked in earlier, sorry.

>
> I noticed that RK3399 TRM has a field to specify long term refs and
> so was wondering about this item as well.
>
> > >   - Reintroduced the neighbor info buffer
> > >   - Removed the ref_pic_list_p0/b0/b1 arrays that are redundant with the
> > >     one in the DPB
> >
> > these are used in our Rockchip VDEC driver.
> >
> > Could you elaborate on the reasons why they got removed?
> >
>
> If I understood correctly, there are two reference picture lists.
> P-frames will populate ref_pic_list0 and B-frames will populate both.
>
> According to this, v4l2_ctrl_h264_slice_param.ref_pic_list0 and .ref_pic_list1
> should be enough and ref_pic_list_p0/b0/b1 are not needed.
>
> What do you think?

The lists in v4l2_ctrl_h264_slice_param are expected to be past the
per-slice modification stage (which is quite complicated and better
done in userspace), while the ones in v4l2_ctrl_h264_decode_param just
in the original order. Rockchip VPU expects them in the original order
and does the modification in the hardware.

Best regards,
Tomasz
Ezequiel Garcia Feb. 13, 2019, 4:28 p.m. UTC | #4
On Wed, 2019-02-13 at 12:02 +0900, Tomasz Figa wrote:
> On Wed, Feb 13, 2019 at 6:22 AM Ezequiel Garcia <ezequiel@collabora.com> wrote:
> > Hey Tomasz,
> > 
> > On Tue, 2019-02-12 at 21:50 +0900, Tomasz Figa wrote:
> > > Hi Maxime,
> > > 
> > > On Mon, Feb 11, 2019 at 11:39 PM Maxime Ripard
> > > <maxime.ripard@bootlin.com> wrote:
> > > > Hi,
> > > > 
> > > > Here is a new version of the H264 decoding support in the cedrus
> > > > driver.
> > > 
> > > Thanks for working on this. Please see my comments below.
> > > 
> > > > As you might already know, the cedrus driver relies on the Request
> > > > API, and is a reverse engineered driver for the video decoding engine
> > > > found on the Allwinner SoCs.
> > > > 
> > > > This work has been possible thanks to the work done by the people
> > > > behind libvdpau-sunxi found here:
> > > > https://github.com/linux-sunxi/libvdpau-sunxi/
> > > > 
> > > > I've tested the various ABI using this gdb script:
> > > > http://code.bulix.org/jl4se4-505620?raw
> > > > 
> > > > And this test script:
> > > > http://code.bulix.org/8zle4s-505623?raw
> > > > 
> > > > The application compiled is quite trivial:
> > > > http://code.bulix.org/e34zp8-505624?raw
> > > > 
> > > > The output is:
> > > > arm:    builds/arm-test-v4l2-h264-structures
> > > >         SHA1: 88cbf7485ba81831fc3b93772b215599b3b38318
> > > > x86:    builds/x86-test-v4l2-h264-structures
> > > >         SHA1: 88cbf7485ba81831fc3b93772b215599b3b38318
> > > > x64:    builds/x64-test-v4l2-h264-structures
> > > >         SHA1: 88cbf7485ba81831fc3b93772b215599b3b38318
> > > > arm64:  builds/arm64-test-v4l2-h264-structures
> > > >         SHA1: 88cbf7485ba81831fc3b93772b215599b3b38318
> > > > 
> > > > Let me know if there's any flaw using that test setup, or if you have
> > > > any comments on the patches.
> > > > 
> > > > Maxime
> > > > 
> > > > Changes from v2:
> > > >   - Simplified _cedrus_write_ref_list as suggested by Jernej
> > > >   - Set whether the frame is used as reference using nal_ref_idc
> > > >   - Respect chroma_format_idc
> > > >   - Fixes for the scaling list and prediction tables
> > > >   - Wrote the documentation for the flags
> > > >   - Added a bunch of defines to the driver bit fields
> > > >   - Reworded the controls and data format descriptions as suggested
> > > >     by Hans
> > > >   - Reworked the controls' structure field size to avoid padding
> > > >   - Removed the long term reference flag
> > > 
> > > This and...
> > > 
> > 
> > Maxime has dropped this because of Ayaka's mail about long term references
> > not making much sense in stateless decoders.
> 
> I haven't seen any argument confirming that thesis, though. I should
> have kicked in earlier, sorry.
> 

OK, in that case, we need to have this flag back.

> > I noticed that RK3399 TRM has a field to specify long term refs and
> > so was wondering about this item as well.
> > 
> > > >   - Reintroduced the neighbor info buffer
> > > >   - Removed the ref_pic_list_p0/b0/b1 arrays that are redundant with the
> > > >     one in the DPB
> > > 
> > > these are used in our Rockchip VDEC driver.
> > > 
> > > Could you elaborate on the reasons why they got removed?
> > > 
> > 
> > If I understood correctly, there are two reference picture lists.
> > P-frames will populate ref_pic_list0 and B-frames will populate both.
> > 
> > According to this, v4l2_ctrl_h264_slice_param.ref_pic_list0 and .ref_pic_list1
> > should be enough and ref_pic_list_p0/b0/b1 are not needed.
> > 
> > What do you think?
> 
> The lists in v4l2_ctrl_h264_slice_param are expected to be past the
> per-slice modification stage (which is quite complicated and better
> done in userspace),

The fact that these are RefPicList0 and RefPicList1, after
the reordering stage should be better documented.

> while the ones in v4l2_ctrl_h264_decode_param just
> in the original order. Rockchip VPU expects them in the original order
> and does the modification in the hardware.
> 

OK, I see.

So, we have RefPicList0 and RefPicList1, and there is an initialization
stage and a modification/reordering process.

One could argue that it's more generic to just pass the initial list,
but that would mean doing in the kernel something that is easier
done in userspace (and parsers doing this are already available).

The question would be what is the most generic way of passing
the RefPicList0 and RefPicList1 in its initial state.

1/ We create additional controls for these.

2/ We put them on some of the other controls. Putting them on
v4l2_ctrl_h264_decode_param didn't seem too wrong.

Any objections to put them back in there?

Regards,
Ezequiel
Maxime Ripard Feb. 14, 2019, 3:47 p.m. UTC | #5
On Wed, Feb 13, 2019 at 01:28:34PM -0300, Ezequiel Garcia wrote:
> On Wed, 2019-02-13 at 12:02 +0900, Tomasz Figa wrote:
> > On Wed, Feb 13, 2019 at 6:22 AM Ezequiel Garcia <ezequiel@collabora.com> wrote:
> > > Hey Tomasz,
> > > 
> > > On Tue, 2019-02-12 at 21:50 +0900, Tomasz Figa wrote:
> > > > Hi Maxime,
> > > > 
> > > > On Mon, Feb 11, 2019 at 11:39 PM Maxime Ripard
> > > > <maxime.ripard@bootlin.com> wrote:
> > > > > Hi,
> > > > > 
> > > > > Here is a new version of the H264 decoding support in the cedrus
> > > > > driver.
> > > > 
> > > > Thanks for working on this. Please see my comments below.
> > > > 
> > > > > As you might already know, the cedrus driver relies on the Request
> > > > > API, and is a reverse engineered driver for the video decoding engine
> > > > > found on the Allwinner SoCs.
> > > > > 
> > > > > This work has been possible thanks to the work done by the people
> > > > > behind libvdpau-sunxi found here:
> > > > > https://github.com/linux-sunxi/libvdpau-sunxi/
> > > > > 
> > > > > I've tested the various ABI using this gdb script:
> > > > > http://code.bulix.org/jl4se4-505620?raw
> > > > > 
> > > > > And this test script:
> > > > > http://code.bulix.org/8zle4s-505623?raw
> > > > > 
> > > > > The application compiled is quite trivial:
> > > > > http://code.bulix.org/e34zp8-505624?raw
> > > > > 
> > > > > The output is:
> > > > > arm:    builds/arm-test-v4l2-h264-structures
> > > > >         SHA1: 88cbf7485ba81831fc3b93772b215599b3b38318
> > > > > x86:    builds/x86-test-v4l2-h264-structures
> > > > >         SHA1: 88cbf7485ba81831fc3b93772b215599b3b38318
> > > > > x64:    builds/x64-test-v4l2-h264-structures
> > > > >         SHA1: 88cbf7485ba81831fc3b93772b215599b3b38318
> > > > > arm64:  builds/arm64-test-v4l2-h264-structures
> > > > >         SHA1: 88cbf7485ba81831fc3b93772b215599b3b38318
> > > > > 
> > > > > Let me know if there's any flaw using that test setup, or if you have
> > > > > any comments on the patches.
> > > > > 
> > > > > Maxime
> > > > > 
> > > > > Changes from v2:
> > > > >   - Simplified _cedrus_write_ref_list as suggested by Jernej
> > > > >   - Set whether the frame is used as reference using nal_ref_idc
> > > > >   - Respect chroma_format_idc
> > > > >   - Fixes for the scaling list and prediction tables
> > > > >   - Wrote the documentation for the flags
> > > > >   - Added a bunch of defines to the driver bit fields
> > > > >   - Reworded the controls and data format descriptions as suggested
> > > > >     by Hans
> > > > >   - Reworked the controls' structure field size to avoid padding
> > > > >   - Removed the long term reference flag
> > > > 
> > > > This and...
> > > > 
> > > 
> > > Maxime has dropped this because of Ayaka's mail about long term references
> > > not making much sense in stateless decoders.
> > 
> > I haven't seen any argument confirming that thesis, though. I should
> > have kicked in earlier, sorry.
> > 
> 
> OK, in that case, we need to have this flag back.
> 
> > > I noticed that RK3399 TRM has a field to specify long term refs and
> > > so was wondering about this item as well.
> > > 
> > > > >   - Reintroduced the neighbor info buffer
> > > > >   - Removed the ref_pic_list_p0/b0/b1 arrays that are redundant with the
> > > > >     one in the DPB
> > > > 
> > > > these are used in our Rockchip VDEC driver.
> > > > 
> > > > Could you elaborate on the reasons why they got removed?
> > > > 
> > > 
> > > If I understood correctly, there are two reference picture lists.
> > > P-frames will populate ref_pic_list0 and B-frames will populate both.
> > > 
> > > According to this, v4l2_ctrl_h264_slice_param.ref_pic_list0 and .ref_pic_list1
> > > should be enough and ref_pic_list_p0/b0/b1 are not needed.
> > > 
> > > What do you think?
> > 
> > The lists in v4l2_ctrl_h264_slice_param are expected to be past the
> > per-slice modification stage (which is quite complicated and better
> > done in userspace),
> 
> The fact that these are RefPicList0 and RefPicList1, after
> the reordering stage should be better documented.
> 
> > while the ones in v4l2_ctrl_h264_decode_param just
> > in the original order. Rockchip VPU expects them in the original order
> > and does the modification in the hardware.
> > 
> 
> OK, I see.
> 
> So, we have RefPicList0 and RefPicList1, and there is an initialization
> stage and a modification/reordering process.
> 
> One could argue that it's more generic to just pass the initial list,
> but that would mean doing in the kernel something that is easier
> done in userspace (and parsers doing this are already available).
> 
> The question would be what is the most generic way of passing
> the RefPicList0 and RefPicList1 in its initial state.
> 
> 1/ We create additional controls for these.
> 
> 2/ We put them on some of the other controls. Putting them on
> v4l2_ctrl_h264_decode_param didn't seem too wrong.
> 
> Any objections to put them back in there?

None. i'll put them back in.

Maxime