
[v2,0/4] Enable Hantro G1 post-processor

Message ID: 20191003190833.29046-1-ezequiel@collabora.com

Message

Ezequiel Garcia Oct. 3, 2019, 7:08 p.m. UTC
Hi all,

The Hantro G1 VPU post-processor block can be pipelined with
the decoder hardware, allowing it to perform operations such as
color conversion, scaling, rotation and cropping.

When the post-processor is enabled, the decoder hardware
gets its own set of NV12 buffers (the native decoder format),
and the post-processor is the owner of the CAPTURE buffers.

This allows the application to get processed
(scaled, converted, etc.) buffers completely transparently.

This feature is implemented by exposing the post-processed pixel
formats in ENUM_FMT. When the application sets a pixel format
other than NV12, and the post-processor is MC-linked,
the driver will transparently make use of it.

In this v2, the media controller API is now required
to "enable" (i.e. link, in topology jargon) the post-processor.
Therefore, applications now have to explicitly request this feature.

For instance, the post-processor is enabled using the media-ctl
tool:

media-ctl -l "'decoder':1 -> 'rockchip,rk3288-vpu-dec':0[0]"
media-ctl -l "'postproc':1 -> 'rockchip,rk3288-vpu-dec':0[1]"
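The same link can also be set up by an application through the media
controller ioctls directly. A rough sketch follows; the entity IDs are
illustrative placeholders and would first be discovered with
MEDIA_IOC_ENUM_ENTITIES and MEDIA_IOC_ENUM_LINKS (not shown here):

#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/media.h>

/*
 * Sketch only: enable the 'postproc' -> video-node link.  The entity
 * and pad numbers are hypothetical.  The decoder link is disabled the
 * same way, with flags set to 0.
 */
static int enable_postproc(int media_fd, __u32 postproc_entity, __u32 vdev_entity)
{
	struct media_link_desc link;

	memset(&link, 0, sizeof(link));
	link.source.entity = postproc_entity;
	link.source.index = 1;			/* 'postproc':1 */
	link.sink.entity = vdev_entity;
	link.sink.index = 0;			/* video node pad 0 */
	link.flags = MEDIA_LNK_FL_ENABLED;

	return ioctl(media_fd, MEDIA_IOC_SETUP_LINK, &link);
}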

v4l2-ctl -d 1 --list-formats
ioctl: VIDIOC_ENUM_FMT
	Type: Video Capture Multiplanar

	[0]: 'NV12' (Y/CbCr 4:2:0)
	[1]: 'YUYV' (YUYV 4:2:2)
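
For completeness, here is a minimal userspace sketch of the flow described
above: enumerate the CAPTURE formats and then select YUYV, which the driver
satisfies through the post-processor. The device path and the lack of error
handling are illustrative only:

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

int main(void)
{
	int fd = open("/dev/video1", O_RDWR);	/* decoder node, example path */
	struct v4l2_fmtdesc desc = { .type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE };
	struct v4l2_format fmt = { .type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE };

	/* List the CAPTURE formats: NV12 (native) plus, with the postproc
	 * link enabled, the post-processed YUYV. */
	for (desc.index = 0; !ioctl(fd, VIDIOC_ENUM_FMT, &desc); desc.index++)
		printf("[%u] %.4s\n", desc.index, (char *)&desc.pixelformat);

	/* Request YUYV; the driver transparently engages the post-processor. */
	ioctl(fd, VIDIOC_G_FMT, &fmt);
	fmt.fmt.pix_mp.pixelformat = V4L2_PIX_FMT_YUYV;
	ioctl(fd, VIDIOC_S_FMT, &fmt);

	return 0;
}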

Patches 1 and 2 are simply cleanups that make it easier to integrate
the post-processor. Patch 2 is a bit invasive, but it's a step
towards merging the RK3399 and RK3288 implementations.

Patch 3 reworks the media topology, making the decoder
a V4L2 subdevice instead of a bare media entity. This allows
introducing a more accurate model of the decoder + post-processor complex.
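
Roughly, such a topology can be registered with the media controller
framework along the lines of the sketch below. This is simplified and not
the exact driver code; the container struct and its field names are made
up for illustration:

#include <media/media-entity.h>
#include <media/v4l2-dev.h>
#include <media/v4l2-device.h>
#include <media/v4l2-subdev.h>

struct vpu_topology {			/* illustrative container */
	struct v4l2_device v4l2_dev;
	struct v4l2_subdev dec_sd;	/* decoder subdevice */
	struct media_entity postproc;	/* post-processor entity */
	struct media_pad dec_pads[2];
	struct video_device vdev;	/* the m2m video node */
};

static int register_topology(struct vpu_topology *vpu)
{
	int ret;

	vpu->dec_pads[0].flags = MEDIA_PAD_FL_SINK;	/* bitstream in */
	vpu->dec_pads[1].flags = MEDIA_PAD_FL_SOURCE;	/* raw NV12 out */
	ret = media_entity_pads_init(&vpu->dec_sd.entity, 2, vpu->dec_pads);
	if (ret)
		return ret;

	ret = v4l2_device_register_subdev(&vpu->v4l2_dev, &vpu->dec_sd);
	if (ret)
		return ret;

	/* Decoder source pad -> video node: enabled by default. */
	ret = media_create_pad_link(&vpu->dec_sd.entity, 1,
				    &vpu->vdev.entity, 0,
				    MEDIA_LNK_FL_ENABLED);
	if (ret)
		return ret;

	/* Post-processor source pad -> video node: disabled by default.
	 * Pad initialization for the postproc entity is not shown. */
	return media_create_pad_link(&vpu->postproc, 1,
				     &vpu->vdev.entity, 0, 0);
}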

Patch 4 introduces the post-processing support.

This has been tested on RK3288 platforms with MPEG-2, VP8 and
H.264 streams, decoding to YUY2 surfaces. For now, this series
only adds support for NV12-to-YUY2 conversion.

The design and implementation is quite different from v1:

* The decoder->post-processor topology is now exposed
  explicitly and applications need to configure the pipeline.
  By default, the decoder is enabled and the post-processor
  is disabled.

* RGB post-processing output has been dropped. We might
  add it in the future, but for now it seems it would
  make the code more complex without a use case in mind.
  RGB consumes much more memory and so is less attractive
  than YUV, and modern GPUs and display controllers support YUV.

* The post-processor implementation still supports only RK3288.
  However, a generic register infrastructure is introduced
  to make adding other variants, such as RK3399, straightforward;
  a sketch of the idea follows below.
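
To give an idea of what such an infrastructure could look like (a sketch
under assumed names only, not the actual driver structures), each
post-processor field can be described as a register offset plus shift and
mask, with one constant table per SoC variant and a small helper to write
a field:

#include <linux/io.h>
#include <linux/types.h>

struct pp_field {			/* hypothetical names */
	u32 base;	/* register offset */
	u32 shift;	/* LSB position of the field */
	u32 mask;	/* unshifted field mask */
};

struct pp_variant_regs {
	struct pp_field pipeline_en;
	struct pp_field out_format;
	struct pp_field out_luma_base;
	/* ... one entry per field the post-processor needs ... */
};

static void pp_write_field(void __iomem *regs, const struct pp_field *f, u32 val)
{
	u32 v = readl(regs + f->base);

	v &= ~(f->mask << f->shift);
	v |= (val & f->mask) << f->shift;
	writel(v, regs + f->base);
}

A per-variant const struct pp_variant_regs table (say, one for RK3288 and
one for RK3399) would then be the only thing a new variant has to provide.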

Ezequiel Garcia (4):
  media: hantro: Cleanup format negotiation helpers
  media: hantro: mpeg2_dec: Re-use common register macros
  media: hantro: Rework media topology
  media: hantro: Support color conversion via post-processing

 drivers/staging/media/hantro/Makefile         |   1 +
 drivers/staging/media/hantro/hantro.h         | 105 +++++-
 drivers/staging/media/hantro/hantro_drv.c     | 336 ++++++++++++++----
 .../staging/media/hantro/hantro_g1_h264_dec.c |   2 +-
 .../media/hantro/hantro_g1_mpeg2_dec.c        | 188 ++++------
 drivers/staging/media/hantro/hantro_g1_regs.h | 109 ++++--
 .../staging/media/hantro/hantro_g1_vp8_dec.c  |   2 +-
 drivers/staging/media/hantro/hantro_h264.c    |   6 +-
 drivers/staging/media/hantro/hantro_hw.h      |  13 +
 .../staging/media/hantro/hantro_postproc.c    | 141 ++++++++
 drivers/staging/media/hantro/hantro_v4l2.c    | 116 ++++--
 drivers/staging/media/hantro/rk3288_vpu_hw.c  |  10 +
 12 files changed, 754 insertions(+), 275 deletions(-)
 create mode 100644 drivers/staging/media/hantro/hantro_postproc.c

Comments

Tomasz Figa Oct. 4, 2019, 6:07 a.m. UTC | #1
Hi Ezequiel,

On Fri, Oct 4, 2019 at 4:09 AM Ezequiel Garcia <ezequiel@collabora.com> wrote:
>
> [...]
>
> The design and implementation is quite different from v1:
>
> * The decoder->post-processor topology is now exposed
>   explicitly and applications need to configure the pipeline.
>   By default, the decoder is enabled and the post-processor
>   is disabled.
>

First of all, thanks for working on this. While from the Chromium point
of view there isn't any practical use case for this feature, it could
possibly be valuable for some other platforms.

I still see a problem with the current design. Mem-to-mem decoders are
commonly used in a multi-instance fashion, but the media topology is
global. That means that when two applications on the system are each
decoding their own video, one might step on the other by changing the
topology.

I believe that if we want this to be really usable, we would need to
make the media topology per instance, but that's a significant change
to the media subsystem. Maybe we could instead pursue the other options
I suggested in the previous revision, like ordering the formats
returned by ENUM_FMT with the native ones first, and adding format flags
that would tell userspace that a format can carry a performance
and/or power penalty?

Best regards,
Tomasz
Hans Verkuil Nov. 9, 2019, 10:40 a.m. UTC | #2
On 10/4/19 8:07 AM, Tomasz Figa wrote:
> Hi Ezequiel,
> 
> On Fri, Oct 4, 2019 at 4:09 AM Ezequiel Garcia <ezequiel@collabora.com> wrote:
>>
>> [...]
>>
> 
> First of all, thanks for working on this. While from the Chromium point
> of view there isn't any practical use case for this feature, it could
> possibly be valuable for some other platforms.
> 
> I still see a problem with the current design. Mem-to-mem decoders are
> commonly used in a multi-instance fashion, but the media topology is
> global. That means that when two applications on the system are each
> decoding their own video, one might step on the other by changing the
> topology.
> 
> I believe that if we want this to be really usable, we would need to
> make the media topology per instance, but that's a significant change
> to the media subsystem. Maybe we could instead pursue the other options
> I suggested in the previous revision, like ordering the formats
> returned by ENUM_FMT with the native ones first, and adding format flags
> that would tell userspace that a format can carry a performance
> and/or power penalty?

I discussed this with Ezequiel in Lyon, and my preference is also to
not rely on the media controller, but instead to order the formats with
the native formats first, followed by the formats that need this additional
processing step. A format flag can be added, but I feel it is better
to hold off on that until it is clear that there is a real need for it.

It would be good to document in the ENUM_FMT ioctl that formats that
require additional processing are at the end of the format list.
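
For illustration only, a driver-side ENUM_FMT handler along those lines
could look roughly like this; it is a sketch, and the format table and
its 'postprocessed' flag are hypothetical rather than the existing
hantro code:

#include <linux/errno.h>
#include <linux/fs.h>
#include <linux/kernel.h>
#include <linux/videodev2.h>

struct fmt_info {
	u32 fourcc;
	bool postprocessed;	/* hypothetical flag */
};

static const struct fmt_info capture_fmts[] = {
	{ V4L2_PIX_FMT_NV12, false },	/* native decoder output */
	{ V4L2_PIX_FMT_YUYV, true },	/* needs the post-processor */
};

static int enum_fmt(struct file *file, void *priv, struct v4l2_fmtdesc *f)
{
	unsigned int i, n = 0;

	/* First pass lists the native formats, second pass the
	 * post-processed ones, so they end up at the tail of the list. */
	for (i = 0; i < ARRAY_SIZE(capture_fmts); i++)
		if (!capture_fmts[i].postprocessed && n++ == f->index)
			goto found;
	for (i = 0; i < ARRAY_SIZE(capture_fmts); i++)
		if (capture_fmts[i].postprocessed && n++ == f->index)
			goto found;
	return -EINVAL;
found:
	f->pixelformat = capture_fmts[i].fourcc;
	return 0;
}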

Ezequiel, I'm marking this series as "Changes Requested" in patchwork.

Regards,

	Hans

Ezequiel Garcia Nov. 9, 2019, 4:02 p.m. UTC | #3
On Sat, 2019-11-09 at 11:40 +0100, Hans Verkuil wrote:
> On 10/4/19 8:07 AM, Tomasz Figa wrote:
> > [...]
> 
> I discussed this with Ezequiel in Lyon, and my preference is also to
> not rely on the media controller, but instead to order the formats with
> the native formats first, followed by the formats that need this additional
> processing step. A format flag can be added, but I feel it is better
> to hold off on that until it is clear that there is a real need for it.
> 
> It would be good to document in the ENUM_FMT ioctl that formats that
> require additional processing are at the end of the format list.
> 
> Ezequiel, I'm marking this series as "Changes Requested" in patchwork.
> 

Thanks! I should revisit this and post a new series soon.

Ezequiel