diff mbox series

[v3,2/5] media: hantro: Reduce H264 extra space for motion vectors

Message ID HE1PR06MB4011FF930111A869E4645C8CAC790@HE1PR06MB4011.eurprd06.prod.outlook.com (mailing list archive)
State New, archived
Headers show
Series media: hantro: H264 fixes and improvements | expand

Commit Message

Jonas Karlman Nov. 6, 2019, 10:34 p.m. UTC
A decoded 8-bit 4:2:0 frame need memory for up to 448 bytes per
macroblock with additional 32 bytes on multi-core variants.

Memory layout is as follow:

+---------------------------+
| Y-plane   256 bytes x MBs |
+---------------------------+
| UV-plane  128 bytes x MBs |
+---------------------------+
| MV buffer  64 bytes x MBs |
+---------------------------+
| MC sync          32 bytes |
+---------------------------+

Reduce the extra space allocated now that motion vector buffer offset no
longer is based on the extra space.

Only allocate extra space for 64 bytes x MBs of motion vector buffer
and 32 bytes for multi-core sync.

Fixes: a9471e25629b ("media: hantro: Add core bits to support H264 decoding")
Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
---
Changes in v3:
  - add memory layout to code comment (Boris)
Changes in v2:
  - updated commit message
---
 drivers/staging/media/hantro/hantro_v4l2.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

Comments

Tomasz Figa Nov. 20, 2019, 12:44 p.m. UTC | #1
Hi Jonas,

On Thu, Nov 7, 2019 at 7:34 AM Jonas Karlman <jonas@kwiboo.se> wrote:
>
> A decoded 8-bit 4:2:0 frame need memory for up to 448 bytes per
> macroblock with additional 32 bytes on multi-core variants.
>
> Memory layout is as follow:
>
> +---------------------------+
> | Y-plane   256 bytes x MBs |
> +---------------------------+
> | UV-plane  128 bytes x MBs |
> +---------------------------+
> | MV buffer  64 bytes x MBs |
> +---------------------------+
> | MC sync          32 bytes |
> +---------------------------+
>
> Reduce the extra space allocated now that motion vector buffer offset no
> longer is based on the extra space.
>
> Only allocate extra space for 64 bytes x MBs of motion vector buffer
> and 32 bytes for multi-core sync.
>
> Fixes: a9471e25629b ("media: hantro: Add core bits to support H264 decoding")
> Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
> ---
> Changes in v3:
>   - add memory layout to code comment (Boris)
> Changes in v2:
>   - updated commit message
> ---
>  drivers/staging/media/hantro/hantro_v4l2.c | 20 ++++++++++++++++++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
>

Thanks for the patch!

What platform did you test it on and how? Was it tested with IOMMU enabled?

Best regards,
Tomasz

> diff --git a/drivers/staging/media/hantro/hantro_v4l2.c b/drivers/staging/media/hantro/hantro_v4l2.c
> index 3dae52abb96c..c8c896a06f58 100644
> --- a/drivers/staging/media/hantro/hantro_v4l2.c
> +++ b/drivers/staging/media/hantro/hantro_v4l2.c
> @@ -240,14 +240,30 @@ static int vidioc_try_fmt(struct file *file, void *priv, struct v4l2_format *f,
>                 v4l2_fill_pixfmt_mp(pix_mp, fmt->fourcc, pix_mp->width,
>                                     pix_mp->height);
>                 /*
> +                * A decoded 8-bit 4:2:0 NV12 frame may need memory for up to
> +                * 448 bytes per macroblock with additional 32 bytes on
> +                * multi-core variants.
> +                *
>                  * The H264 decoder needs extra space on the output buffers
>                  * to store motion vectors. This is needed for reference
>                  * frames.
> +                *
> +                * Memory layout is as follow:
> +                *
> +                * +---------------------------+
> +                * | Y-plane   256 bytes x MBs |
> +                * +---------------------------+
> +                * | UV-plane  128 bytes x MBs |
> +                * +---------------------------+
> +                * | MV buffer  64 bytes x MBs |
> +                * +---------------------------+
> +                * | MC sync          32 bytes |
> +                * +---------------------------+
>                  */
>                 if (ctx->vpu_src_fmt->fourcc == V4L2_PIX_FMT_H264_SLICE)
>                         pix_mp->plane_fmt[0].sizeimage +=
> -                               128 * DIV_ROUND_UP(pix_mp->width, 16) *
> -                                     DIV_ROUND_UP(pix_mp->height, 16);
> +                               64 * MB_WIDTH(pix_mp->width) *
> +                                    MB_WIDTH(pix_mp->height) + 32;
>         } else if (!pix_mp->plane_fmt[0].sizeimage) {
>                 /*
>                  * For coded formats the application can specify
> --
> 2.17.1
>
Ezequiel Garcia Dec. 9, 2019, 6:11 p.m. UTC | #2
On Wed, 2019-11-20 at 21:44 +0900, Tomasz Figa wrote:
> Hi Jonas,
> 
> On Thu, Nov 7, 2019 at 7:34 AM Jonas Karlman <jonas@kwiboo.se> wrote:
> > A decoded 8-bit 4:2:0 frame need memory for up to 448 bytes per
> > macroblock with additional 32 bytes on multi-core variants.
> > 
> > Memory layout is as follow:
> > 
> > +---------------------------+
> > > Y-plane   256 bytes x MBs |
> > +---------------------------+
> > > UV-plane  128 bytes x MBs |
> > +---------------------------+
> > > MV buffer  64 bytes x MBs |
> > +---------------------------+
> > > MC sync          32 bytes |
> > +---------------------------+
> > 
> > Reduce the extra space allocated now that motion vector buffer offset no
> > longer is based on the extra space.
> > 
> > Only allocate extra space for 64 bytes x MBs of motion vector buffer
> > and 32 bytes for multi-core sync.
> > 
> > Fixes: a9471e25629b ("media: hantro: Add core bits to support H264 decoding")
> > Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
> > Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
> > ---
> > Changes in v3:
> >   - add memory layout to code comment (Boris)
> > Changes in v2:
> >   - updated commit message
> > ---
> >  drivers/staging/media/hantro/hantro_v4l2.c | 20 ++++++++++++++++++--
> >  1 file changed, 18 insertions(+), 2 deletions(-)
> > 
> 
> Thanks for the patch!
> 
> What platform did you test it on and how? Was it tested with IOMMU enabled?

Hello Tomasz,

Please note that this series has been picked-up and is merged
in v5.5-rc1.

IIRC, we tested these patches on RK3399 and RK3288 (that means
with an IOMMU). I've just ran some more extensive tests on RK3288,
on media/master; and I plan to test some more on RK3399 later this week.

Do you have any specific concern in mind?

Thanks,
Ezequiel
Tomasz Figa Jan. 8, 2020, 12:59 p.m. UTC | #3
On Tue, Dec 10, 2019 at 3:11 AM Ezequiel Garcia <ezequiel@collabora.com> wrote:
>
> On Wed, 2019-11-20 at 21:44 +0900, Tomasz Figa wrote:
> > Hi Jonas,
> >
> > On Thu, Nov 7, 2019 at 7:34 AM Jonas Karlman <jonas@kwiboo.se> wrote:
> > > A decoded 8-bit 4:2:0 frame need memory for up to 448 bytes per
> > > macroblock with additional 32 bytes on multi-core variants.
> > >
> > > Memory layout is as follow:
> > >
> > > +---------------------------+
> > > > Y-plane   256 bytes x MBs |
> > > +---------------------------+
> > > > UV-plane  128 bytes x MBs |
> > > +---------------------------+
> > > > MV buffer  64 bytes x MBs |
> > > +---------------------------+
> > > > MC sync          32 bytes |
> > > +---------------------------+
> > >
> > > Reduce the extra space allocated now that motion vector buffer offset no
> > > longer is based on the extra space.
> > >
> > > Only allocate extra space for 64 bytes x MBs of motion vector buffer
> > > and 32 bytes for multi-core sync.
> > >
> > > Fixes: a9471e25629b ("media: hantro: Add core bits to support H264 decoding")
> > > Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
> > > Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
> > > ---
> > > Changes in v3:
> > >   - add memory layout to code comment (Boris)
> > > Changes in v2:
> > >   - updated commit message
> > > ---
> > >  drivers/staging/media/hantro/hantro_v4l2.c | 20 ++++++++++++++++++--
> > >  1 file changed, 18 insertions(+), 2 deletions(-)
> > >
> >
> > Thanks for the patch!
> >
> > What platform did you test it on and how? Was it tested with IOMMU enabled?
>
> Hello Tomasz,
>
> Please note that this series has been picked-up and is merged
> in v5.5-rc1.
>
> IIRC, we tested these patches on RK3399 and RK3288 (that means
> with an IOMMU). I've just ran some more extensive tests on RK3288,
> on media/master; and I plan to test some more on RK3399 later this week.
>
> Do you have any specific concern in mind?

I specifically want to know whether we're 100% sure that those sizes
are correct. The IOMMU still works on page granularity so it's
possible that the allocation could be just big enough by luck.

Best regards,
Tomasz
Jonas Karlman Jan. 8, 2020, 3:10 p.m. UTC | #4
On 2020-01-08 13:59, Tomasz Figa wrote:
> On Tue, Dec 10, 2019 at 3:11 AM Ezequiel Garcia <ezequiel@collabora.com> wrote:
>>
>> On Wed, 2019-11-20 at 21:44 +0900, Tomasz Figa wrote:
>>> Hi Jonas,
>>>
>>> On Thu, Nov 7, 2019 at 7:34 AM Jonas Karlman <jonas@kwiboo.se> wrote:
>>>> A decoded 8-bit 4:2:0 frame need memory for up to 448 bytes per
>>>> macroblock with additional 32 bytes on multi-core variants.
>>>>
>>>> Memory layout is as follow:
>>>>
>>>> +---------------------------+
>>>>> Y-plane   256 bytes x MBs |
>>>> +---------------------------+
>>>>> UV-plane  128 bytes x MBs |
>>>> +---------------------------+
>>>>> MV buffer  64 bytes x MBs |
>>>> +---------------------------+
>>>>> MC sync          32 bytes |
>>>> +---------------------------+
>>>>
>>>> Reduce the extra space allocated now that motion vector buffer offset no
>>>> longer is based on the extra space.
>>>>
>>>> Only allocate extra space for 64 bytes x MBs of motion vector buffer
>>>> and 32 bytes for multi-core sync.
>>>>
>>>> Fixes: a9471e25629b ("media: hantro: Add core bits to support H264 decoding")
>>>> Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
>>>> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
>>>> ---
>>>> Changes in v3:
>>>>   - add memory layout to code comment (Boris)
>>>> Changes in v2:
>>>>   - updated commit message
>>>> ---
>>>>  drivers/staging/media/hantro/hantro_v4l2.c | 20 ++++++++++++++++++--
>>>>  1 file changed, 18 insertions(+), 2 deletions(-)
>>>>
>>>
>>> Thanks for the patch!
>>>
>>> What platform did you test it on and how? Was it tested with IOMMU enabled?
>>
>> Hello Tomasz,
>>
>> Please note that this series has been picked-up and is merged
>> in v5.5-rc1.
>>
>> IIRC, we tested these patches on RK3399 and RK3288 (that means
>> with an IOMMU). I've just ran some more extensive tests on RK3288,
>> on media/master; and I plan to test some more on RK3399 later this week.
>>
>> Do you have any specific concern in mind?
> 
> I specifically want to know whether we're 100% sure that those sizes
> are correct. The IOMMU still works on page granularity so it's
> possible that the allocation could be just big enough by luck.

One of my RK3288 TRM [1] contains the following:

Direct mode motion vector write:
  Its base addr is right after decode output picture data
  Its length is mbwidth*mbheight*64

Also both the mpp library and imx-vpu-hantro code both use mbwidth*mbheight*64.
So I feel confident that the buffer size is correct.

[1] Rockchip RK3288TRM V1.1 Part3-Graphic and multi-media.pdf

Regards,
Jonas

> 
> Best regards,
> Tomasz
>
Tomasz Figa Jan. 16, 2020, 3:56 a.m. UTC | #5
On Thu, Jan 9, 2020 at 12:10 AM Jonas Karlman <jonas@kwiboo.se> wrote:
>
> On 2020-01-08 13:59, Tomasz Figa wrote:
> > On Tue, Dec 10, 2019 at 3:11 AM Ezequiel Garcia <ezequiel@collabora.com> wrote:
> >>
> >> On Wed, 2019-11-20 at 21:44 +0900, Tomasz Figa wrote:
> >>> Hi Jonas,
> >>>
> >>> On Thu, Nov 7, 2019 at 7:34 AM Jonas Karlman <jonas@kwiboo.se> wrote:
> >>>> A decoded 8-bit 4:2:0 frame need memory for up to 448 bytes per
> >>>> macroblock with additional 32 bytes on multi-core variants.
> >>>>
> >>>> Memory layout is as follow:
> >>>>
> >>>> +---------------------------+
> >>>>> Y-plane   256 bytes x MBs |
> >>>> +---------------------------+
> >>>>> UV-plane  128 bytes x MBs |
> >>>> +---------------------------+
> >>>>> MV buffer  64 bytes x MBs |
> >>>> +---------------------------+
> >>>>> MC sync          32 bytes |
> >>>> +---------------------------+
> >>>>
> >>>> Reduce the extra space allocated now that motion vector buffer offset no
> >>>> longer is based on the extra space.
> >>>>
> >>>> Only allocate extra space for 64 bytes x MBs of motion vector buffer
> >>>> and 32 bytes for multi-core sync.
> >>>>
> >>>> Fixes: a9471e25629b ("media: hantro: Add core bits to support H264 decoding")
> >>>> Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
> >>>> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
> >>>> ---
> >>>> Changes in v3:
> >>>>   - add memory layout to code comment (Boris)
> >>>> Changes in v2:
> >>>>   - updated commit message
> >>>> ---
> >>>>  drivers/staging/media/hantro/hantro_v4l2.c | 20 ++++++++++++++++++--
> >>>>  1 file changed, 18 insertions(+), 2 deletions(-)
> >>>>
> >>>
> >>> Thanks for the patch!
> >>>
> >>> What platform did you test it on and how? Was it tested with IOMMU enabled?
> >>
> >> Hello Tomasz,
> >>
> >> Please note that this series has been picked-up and is merged
> >> in v5.5-rc1.
> >>
> >> IIRC, we tested these patches on RK3399 and RK3288 (that means
> >> with an IOMMU). I've just ran some more extensive tests on RK3288,
> >> on media/master; and I plan to test some more on RK3399 later this week.
> >>
> >> Do you have any specific concern in mind?
> >
> > I specifically want to know whether we're 100% sure that those sizes
> > are correct. The IOMMU still works on page granularity so it's
> > possible that the allocation could be just big enough by luck.
>
> One of my RK3288 TRM [1] contains the following:
>
> Direct mode motion vector write:
>   Its base addr is right after decode output picture data
>   Its length is mbwidth*mbheight*64
>
> Also both the mpp library and imx-vpu-hantro code both use mbwidth*mbheight*64.
> So I feel confident that the buffer size is correct.
>
> [1] Rockchip RK3288TRM V1.1 Part3-Graphic and multi-media.pdf

Fair enough. Thanks!

Best regards,
Tomasz
diff mbox series

Patch

diff --git a/drivers/staging/media/hantro/hantro_v4l2.c b/drivers/staging/media/hantro/hantro_v4l2.c
index 3dae52abb96c..c8c896a06f58 100644
--- a/drivers/staging/media/hantro/hantro_v4l2.c
+++ b/drivers/staging/media/hantro/hantro_v4l2.c
@@ -240,14 +240,30 @@  static int vidioc_try_fmt(struct file *file, void *priv, struct v4l2_format *f,
 		v4l2_fill_pixfmt_mp(pix_mp, fmt->fourcc, pix_mp->width,
 				    pix_mp->height);
 		/*
+		 * A decoded 8-bit 4:2:0 NV12 frame may need memory for up to
+		 * 448 bytes per macroblock with additional 32 bytes on
+		 * multi-core variants.
+		 *
 		 * The H264 decoder needs extra space on the output buffers
 		 * to store motion vectors. This is needed for reference
 		 * frames.
+		 *
+		 * Memory layout is as follow:
+		 *
+		 * +---------------------------+
+		 * | Y-plane   256 bytes x MBs |
+		 * +---------------------------+
+		 * | UV-plane  128 bytes x MBs |
+		 * +---------------------------+
+		 * | MV buffer  64 bytes x MBs |
+		 * +---------------------------+
+		 * | MC sync          32 bytes |
+		 * +---------------------------+
 		 */
 		if (ctx->vpu_src_fmt->fourcc == V4L2_PIX_FMT_H264_SLICE)
 			pix_mp->plane_fmt[0].sizeimage +=
-				128 * DIV_ROUND_UP(pix_mp->width, 16) *
-				      DIV_ROUND_UP(pix_mp->height, 16);
+				64 * MB_WIDTH(pix_mp->width) *
+				     MB_WIDTH(pix_mp->height) + 32;
 	} else if (!pix_mp->plane_fmt[0].sizeimage) {
 		/*
 		 * For coded formats the application can specify