mbox series

[v2,0/2] Enable JPEG encoding on rk3588

Message ID 20240327134115.424846-1-linkmauve@linkmauve.fr (mailing list archive)
Headers show
Series Enable JPEG encoding on rk3588 | expand

Message

Link Mauve March 27, 2024, 1:41 p.m. UTC
Only the JPEG encoder is available for now, although there are patches
for the undocumented VP8 encoder floating around[0].

This has been tested on a rock-5b, resulting in four /dev/video*
encoders.  The userspace program I’ve been using to test them is
Onix[1], using the jpeg-encoder example, it will pick one of these four
at random (but displays the one it picked):
% ffmpeg -i <input image> -pix_fmt yuvj420p temp.yuv
% jpeg-encoder temp.yuv <width> <height> NV12 <quality> output.jpeg

[0] https://patchwork.kernel.org/project/linux-rockchip/list/?series=789885
[1] https://crates.io/crates/onix

Changes since v1:
- Dropped patches 1 and 4.
- Use the proper compatible form, since this device should be fully
  compatible with the VEPU of rk356x.
- Describe where the VEPU121 name comes from, and list other encoders
  and decoders present in this SoC.
- Properly test the device tree changes, I previously couldn’t since I
  was using a too recent version of python-jsonschema…

Emmanuel Gil Peyrot (2):
  media: dt-binding: media: Document rk3588’s VEPU121
  arm64: dts: rockchip: Add VEPU121 to rk3588

 .../bindings/media/rockchip,rk3568-vepu.yaml  |  8 +-
 arch/arm64/boot/dts/rockchip/rk3588s.dtsi     | 80 +++++++++++++++++++
 2 files changed, 86 insertions(+), 2 deletions(-)

Comments

Nicolas Dufresne April 4, 2024, 5:41 p.m. UTC | #1
Hi,

Le mercredi 27 mars 2024 à 14:41 +0100, Emmanuel Gil Peyrot a écrit :
> Only the JPEG encoder is available for now, although there are patches
> for the undocumented VP8 encoder floating around[0].

[0] seems like a broken link. The VP8 encoder RFC is for RK3399 (and Hantro H1
posted by ST more recently). The TRM says "VEPU121(JPEG encoder only)", which
suggest that the H.264 and VP8 encoders usually found on the VEPU121 are
removed. As Rockchip have remove the synthesize register while modifying the H1
IP, it is difficult to verify. Confusingly the H.264 specific registers are
documented in the TRM around VEPU121.

> 
> This has been tested on a rock-5b, resulting in four /dev/video*
> encoders.  The userspace program I’ve been using to test them is
> Onix[1], using the jpeg-encoder example, it will pick one of these four
> at random (but displays the one it picked):
> % ffmpeg -i <input image> -pix_fmt yuvj420p temp.yuv
> % jpeg-encoder temp.yuv <width> <height> NV12 <quality> output.jpeg

I don't like that we exposing each identical cores a separate video nodes. I
think we should aim for 1 device, and then multi-plex and schedule de cores from
inside the Linux kernel.

Not doing this now means we'll never have an optimal hardware usage
distribution. Just consider two userspace software wanting to do jpeg encoding.
If they both take a guess, they may endup using a single core. Where with proper
scheduling in V4L2, the kernel will be able to properly distribute the load. I
insist on this, since if we merge you changes it becomes an ABI and we can't
change it anymore.

I understand that this impose a rework of the mem2mem framework so that we can
run multiple jobs, but this will be needed anyway on RK3588, since the rkvdec2,
which we don't have a driver yet is also multi-core, but you need to use 2 cores
when the resolution is close to 8K.

Nicolas

> 
> [0] https://patchwork.kernel.org/project/linux-rockchip/list/?series=789885
> [1] https://crates.io/crates/onix
> 
> Changes since v1:
> - Dropped patches 1 and 4.
> - Use the proper compatible form, since this device should be fully
>   compatible with the VEPU of rk356x.
> - Describe where the VEPU121 name comes from, and list other encoders
>   and decoders present in this SoC.
> - Properly test the device tree changes, I previously couldn’t since I
>   was using a too recent version of python-jsonschema…
> 
> Emmanuel Gil Peyrot (2):
>   media: dt-binding: media: Document rk3588’s VEPU121
>   arm64: dts: rockchip: Add VEPU121 to rk3588
> 
>  .../bindings/media/rockchip,rk3568-vepu.yaml  |  8 +-
>  arch/arm64/boot/dts/rockchip/rk3588s.dtsi     | 80 +++++++++++++++++++
>  2 files changed, 86 insertions(+), 2 deletions(-)
>
Link Mauve April 5, 2024, 2:21 p.m. UTC | #2
On Thu, Apr 04, 2024 at 01:41:15PM -0400, Nicolas Dufresne wrote:
> Hi,

Hi,

> 
> Le mercredi 27 mars 2024 à 14:41 +0100, Emmanuel Gil Peyrot a écrit :
> > Only the JPEG encoder is available for now, although there are patches
> > for the undocumented VP8 encoder floating around[0].
> 
> [0] seems like a broken link. The VP8 encoder RFC is for RK3399 (and Hantro H1
> posted by ST more recently). The TRM says "VEPU121(JPEG encoder only)", which
> suggest that the H.264 and VP8 encoders usually found on the VEPU121 are
> removed. As Rockchip have remove the synthesize register while modifying the H1
> IP, it is difficult to verify. Confusingly the H.264 specific registers are
> documented in the TRM around VEPU121.

Ah, the link became, and was indeed ST’s series:
https://patchwork.kernel.org/project/linux-rockchip/list/?series=789885&archive=both

But the TRM part 1 says the VEPU121 supports H.264 encoding (page 367),
and it’s likely they didn’t remove just VP8 support since the codec
features are pretty close to H.264’s.

> 
> > 
> > This has been tested on a rock-5b, resulting in four /dev/video*
> > encoders.  The userspace program I’ve been using to test them is
> > Onix[1], using the jpeg-encoder example, it will pick one of these four
> > at random (but displays the one it picked):
> > % ffmpeg -i <input image> -pix_fmt yuvj420p temp.yuv
> > % jpeg-encoder temp.yuv <width> <height> NV12 <quality> output.jpeg
> 
> I don't like that we exposing each identical cores a separate video nodes. I
> think we should aim for 1 device, and then multi-plex and schedule de cores from
> inside the Linux kernel.

I agree, but this should be handled in the driver not in the device
tree, and it can be done later.

> 
> Not doing this now means we'll never have an optimal hardware usage
> distribution. Just consider two userspace software wanting to do jpeg encoding.
> If they both take a guess, they may endup using a single core. Where with proper
> scheduling in V4L2, the kernel will be able to properly distribute the load. I
> insist on this, since if we merge you changes it becomes an ABI and we can't
> change it anymore.

Will it really become ABI just like that?  Userspace should always
discover the video nodes and their capabilities and not hardcode e.g. a
specific /dev/videoN file for a specific codec.  I would argue that this
series would let userspace do JPEG encoding right away, even if in a
less optimal way than if the driver would round-robin them through a
single video node, but that can always be added in a future version.

> 
> I understand that this impose a rework of the mem2mem framework so that we can
> run multiple jobs, but this will be needed anyway on RK3588, since the rkvdec2,
> which we don't have a driver yet is also multi-core, but you need to use 2 cores
> when the resolution is close to 8K.

I think the mediatek JPEG driver already supports that, would it be ok
to do it the same way?

> 
> Nicolas
> 
> > 
> > [0] https://patchwork.kernel.org/project/linux-rockchip/list/?series=789885
> > [1] https://crates.io/crates/onix
> > 
> > Changes since v1:
> > - Dropped patches 1 and 4.
> > - Use the proper compatible form, since this device should be fully
> >   compatible with the VEPU of rk356x.
> > - Describe where the VEPU121 name comes from, and list other encoders
> >   and decoders present in this SoC.
> > - Properly test the device tree changes, I previously couldn’t since I
> >   was using a too recent version of python-jsonschema…
> > 
> > Emmanuel Gil Peyrot (2):
> >   media: dt-binding: media: Document rk3588’s VEPU121
> >   arm64: dts: rockchip: Add VEPU121 to rk3588
> > 
> >  .../bindings/media/rockchip,rk3568-vepu.yaml  |  8 +-
> >  arch/arm64/boot/dts/rockchip/rk3588s.dtsi     | 80 +++++++++++++++++++
> >  2 files changed, 86 insertions(+), 2 deletions(-)
> > 
>
Nicolas Dufresne April 7, 2024, 8:08 a.m. UTC | #3
Le vendredi 05 avril 2024 à 16:21 +0200, Link Mauve a écrit :
> On Thu, Apr 04, 2024 at 01:41:15PM -0400, Nicolas Dufresne wrote:
> > Hi,
> 
> Hi,
> 
> > 
> > Le mercredi 27 mars 2024 à 14:41 +0100, Emmanuel Gil Peyrot a écrit :
> > > Only the JPEG encoder is available for now, although there are patches
> > > for the undocumented VP8 encoder floating around[0].
> > 
> > [0] seems like a broken link. The VP8 encoder RFC is for RK3399 (and Hantro H1
> > posted by ST more recently). The TRM says "VEPU121(JPEG encoder only)", which
> > suggest that the H.264 and VP8 encoders usually found on the VEPU121 are
> > removed. As Rockchip have remove the synthesize register while modifying the H1
> > IP, it is difficult to verify. Confusingly the H.264 specific registers are
> > documented in the TRM around VEPU121.
> 
> Ah, the link became, and was indeed ST’s series:
> https://patchwork.kernel.org/project/linux-rockchip/list/?series=789885&archive=both
> 
> But the TRM part 1 says the VEPU121 supports H.264 encoding (page 367),
> and it’s likely they didn’t remove just VP8 support since the codec
> features are pretty close to H.264’s.
> 
> > 
> > > 
> > > This has been tested on a rock-5b, resulting in four /dev/video*
> > > encoders.  The userspace program I’ve been using to test them is
> > > Onix[1], using the jpeg-encoder example, it will pick one of these four
> > > at random (but displays the one it picked):
> > > % ffmpeg -i <input image> -pix_fmt yuvj420p temp.yuv
> > > % jpeg-encoder temp.yuv <width> <height> NV12 <quality> output.jpeg
> > 
> > I don't like that we exposing each identical cores a separate video nodes. I
> > think we should aim for 1 device, and then multi-plex and schedule de cores from
> > inside the Linux kernel.
> 
> I agree, but this should be handled in the driver not in the device
> tree, and it can be done later.

As the behaviour we want is that these cores becomes a group and get schedule
together, its certainly a good time to slow down and evaluate if that part needs
to be improve in the DT too.

Hantro G1/H1 and VEPU/VDPU121 combos originally shared the same sram region. Its
not clear if any of these cores have this limitation and if this should be
expressed in the DT / driver.

> 
> > 
> > Not doing this now means we'll never have an optimal hardware usage
> > distribution. Just consider two userspace software wanting to do jpeg encoding.
> > If they both take a guess, they may endup using a single core. Where with proper
> > scheduling in V4L2, the kernel will be able to properly distribute the load. I
> > insist on this, since if we merge you changes it becomes an ABI and we can't
> > change it anymore.
> 
> Will it really become ABI just like that?  Userspace should always
> discover the video nodes and their capabilities and not hardcode e.g. a
> specific /dev/videoN file for a specific codec.  I would argue that this
> series would let userspace do JPEG encoding right away, even if in a
> less optimal way than if the driver would round-robin them through a
> single video node, but that can always be added in a future version.

Might be on the gray side, but there is good chances software written for your
specific board can stop working after te grouping is done.

> 
> > 
> > I understand that this impose a rework of the mem2mem framework so that we can
> > run multiple jobs, but this will be needed anyway on RK3588, since the rkvdec2,
> > which we don't have a driver yet is also multi-core, but you need to use 2 cores
> > when the resolution is close to 8K.
> 
> I think the mediatek JPEG driver already supports that, would it be ok
> to do it the same way?

I don't know for JPEG, the MTK vcoder do support cascading cores. This is
different from concurrent cores. In MTK architecture, for some of the codec,
there is LAT (entropy decoder) and CORE (the reconstruction block) that are
split.

Nicolas
Link Mauve April 12, 2024, 3:07 p.m. UTC | #4
On Sun, Apr 07, 2024 at 10:08:58AM +0200, Nicolas Dufresne wrote:
> Le vendredi 05 avril 2024 à 16:21 +0200, Link Mauve a écrit :
> > On Thu, Apr 04, 2024 at 01:41:15PM -0400, Nicolas Dufresne wrote:
> > > Hi,
> > 
> > Hi,
> > 
> > > 
> > > Le mercredi 27 mars 2024 à 14:41 +0100, Emmanuel Gil Peyrot a écrit :
> > > > Only the JPEG encoder is available for now, although there are patches
> > > > for the undocumented VP8 encoder floating around[0].
> > > 
> > > [0] seems like a broken link. The VP8 encoder RFC is for RK3399 (and Hantro H1
> > > posted by ST more recently). The TRM says "VEPU121(JPEG encoder only)", which
> > > suggest that the H.264 and VP8 encoders usually found on the VEPU121 are
> > > removed. As Rockchip have remove the synthesize register while modifying the H1
> > > IP, it is difficult to verify. Confusingly the H.264 specific registers are
> > > documented in the TRM around VEPU121.
> > 
> > Ah, the link became, and was indeed ST’s series:
> > https://patchwork.kernel.org/project/linux-rockchip/list/?series=789885&archive=both
> > 
> > But the TRM part 1 says the VEPU121 supports H.264 encoding (page 367),
> > and it’s likely they didn’t remove just VP8 support since the codec
> > features are pretty close to H.264’s.
> > 
> > > 
> > > > 
> > > > This has been tested on a rock-5b, resulting in four /dev/video*
> > > > encoders.  The userspace program I’ve been using to test them is
> > > > Onix[1], using the jpeg-encoder example, it will pick one of these four
> > > > at random (but displays the one it picked):
> > > > % ffmpeg -i <input image> -pix_fmt yuvj420p temp.yuv
> > > > % jpeg-encoder temp.yuv <width> <height> NV12 <quality> output.jpeg
> > > 
> > > I don't like that we exposing each identical cores a separate video nodes. I
> > > think we should aim for 1 device, and then multi-plex and schedule de cores from
> > > inside the Linux kernel.
> > 
> > I agree, but this should be handled in the driver not in the device
> > tree, and it can be done later.
> 
> As the behaviour we want is that these cores becomes a group and get schedule
> together, its certainly a good time to slow down and evaluate if that part needs
> to be improve in the DT too.
> 
> Hantro G1/H1 and VEPU/VDPU121 combos originally shared the same sram region. Its
> not clear if any of these cores have this limitation and if this should be
> expressed in the DT / driver.

The TRM on page 369 mentions that:
> Please note that VDPU121 and VDPU381 and VDPU720 and AV1 and VEPU121 and VEPU580
> is different IP cores, so these processing core can be work together.

I understand that as them not sharing any memory and being able to work
independently.

> 
> > 
> > > 
> > > Not doing this now means we'll never have an optimal hardware usage
> > > distribution. Just consider two userspace software wanting to do jpeg encoding.
> > > If they both take a guess, they may endup using a single core. Where with proper
> > > scheduling in V4L2, the kernel will be able to properly distribute the load. I
> > > insist on this, since if we merge you changes it becomes an ABI and we can't
> > > change it anymore.
> > 
> > Will it really become ABI just like that?  Userspace should always
> > discover the video nodes and their capabilities and not hardcode e.g. a
> > specific /dev/videoN file for a specific codec.  I would argue that this
> > series would let userspace do JPEG encoding right away, even if in a
> > less optimal way than if the driver would round-robin them through a
> > single video node, but that can always be added in a future version.
> 
> Might be on the gray side, but there is good chances software written for your
> specific board can stop working after te grouping is done.

I will send a new series shortly which enables only one of these four
cores, the functionality should be the same, it will just expose only
one video node which can get a four times throughput upgrade later once
we implement multi-core support in the driver.

> 
> > 
> > > 
> > > I understand that this impose a rework of the mem2mem framework so that we can
> > > run multiple jobs, but this will be needed anyway on RK3588, since the rkvdec2,
> > > which we don't have a driver yet is also multi-core, but you need to use 2 cores
> > > when the resolution is close to 8K.
> > 
> > I think the mediatek JPEG driver already supports that, would it be ok
> > to do it the same way?
> 
> I don't know for JPEG, the MTK vcoder do support cascading cores. This is
> different from concurrent cores. In MTK architecture, for some of the codec,
> there is LAT (entropy decoder) and CORE (the reconstruction block) that are
> split.

Ah, that’s different then, I only had a cursory look at them back when I
implemented the JPEG decoder for sunxi.

> 
> Nicolas