media: verisilicon: Additional fix for the crash when opening the driver

Message ID 20230523162515.993862-1-benjamin.gaignard@collabora.com (mailing list archive)
State New, archived

Commit Message

Benjamin Gaignard May 23, 2023, 4:25 p.m. UTC
This fixes the following issue observed on Odroid-M1 board:

 Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
 Mem abort info:
 ...
 Modules linked in: crct10dif_ce hantro_vpu snd_soc_simple_card snd_soc_simple_card_utils v4l2_vp9 v4l2_h264 rockchip_saradc v4l2_mem2mem videobuf2_dma_contig videobuf2_memops rtc_rk808 videobuf2_v4l2 industrialio_triggered_buffer rockchip_thermal dwmac_rk stmmac_platform stmmac videodev kfifo_buf display_connector videobuf2_common pcs_xpcs mc rockchipdrm analogix_dp dw_mipi_dsi dw_hdmi drm_display_helper panfrost drm_shmem_helper gpu_sched ip_tables x_tables ipv6
 CPU: 3 PID: 176 Comm: v4l_id Not tainted 6.3.0-rc7-next-20230420 #13481
 Hardware name: Hardkernel ODROID-M1 (DT)
 pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
 pc : hantro_try_fmt+0xa0/0x278 [hantro_vpu]
 lr : hantro_try_fmt+0x94/0x278 [hantro_vpu]
 ...
 Call trace:
  hantro_try_fmt+0xa0/0x278 [hantro_vpu]
  hantro_set_fmt_out+0x3c/0x298 [hantro_vpu]
  hantro_reset_raw_fmt+0x98/0x128 [hantro_vpu]
  hantro_set_fmt_cap+0x240/0x254 [hantro_vpu]
  hantro_reset_encoded_fmt+0x94/0xcc [hantro_vpu]
  hantro_reset_fmts+0x18/0x38 [hantro_vpu]
  hantro_open+0xd4/0x20c [hantro_vpu]
  v4l2_open+0x80/0x120 [videodev]
  chrdev_open+0xc0/0x22c
  do_dentry_open+0x13c/0x48c
  vfs_open+0x2c/0x38
  path_openat+0x550/0x934
  do_filp_open+0x80/0x12c
  do_sys_openat2+0xb4/0x168
  __arm64_sys_openat+0x64/0xac
  invoke_syscall+0x48/0x114
  el0_svc_common+0x100/0x120
  do_el0_svc+0x3c/0xa8
  el0_svc+0x40/0xa8
  el0t_64_sync_handler+0xb8/0xbc
  el0t_64_sync+0x190/0x194
 Code: 97fc8a7f f940aa80 52864a61 72a686c1 (b9400800)
 ---[ end trace 0000000000000000 ]---

Fixes: db6f68b51e5c ("media: verisilicon: Do not set context src/dst formats in reset functions")

Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com>
---
 drivers/media/platform/verisilicon/hantro_v4l2.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Comments

Ezequiel Garcia May 23, 2023, 4:28 p.m. UTC | #1
Hi Benjamin,

Thanks for the patch.

On Tue, May 23, 2023 at 1:25 PM Benjamin Gaignard
<benjamin.gaignard@collabora.com> wrote:
>
> This fixes the following issue observed on Odroid-M1 board:
>
>  Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008

Which pointer is NULL? ctx->src_fmt?

Benjamin Gaignard May 23, 2023, 4:33 p.m. UTC | #2
On 23/05/2023 at 18:28, Ezequiel Garcia wrote:
> Hi Benjamin,
>
> Thanks for the patch.
>
> On Tue, May 23, 2023 at 1:25 PM Benjamin Gaignard
> <benjamin.gaignard@collabora.com> wrote:
>> This fixes the following issue observed on Odroid-M1 board:
>>
>>   Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
> What pointer is NULL? ctx->src_fmt ?

Yes, ctx->vpu_src_fmt was NULL when probing the encoder.
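
For readers following the trace, the fault address 0x8 and the bracketed word in the "Code:" line agree with each other. The sketch below is a minimal decoder written for illustration only; the names `ldr_w_uimm` and `decode_ldr_w_uimm` are hypothetical and not part of the kernel. It handles just the A64 "LDR Wt, [Xn, #imm]" unsigned-offset encoding. Applied to 0xb9400800 it yields a 32-bit load from x0 + 8, which is what a read of a field at offset 8 through a NULL pointer in x0 would look like.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded fields of an A64 "LDR Wt, [Xn, #imm]" (32-bit, unsigned offset). */
struct ldr_w_uimm {
	unsigned int rt;     /* destination register number (Wt) */
	unsigned int rn;     /* base register number (Xn) */
	unsigned int offset; /* byte offset added to the base register */
};

/*
 * Hypothetical helper: return 1 and fill *out if insn is a 32-bit LDR with
 * an unsigned immediate offset, 0 otherwise. No other encoding is handled.
 */
int decode_ldr_w_uimm(uint32_t insn, struct ldr_w_uimm *out)
{
	assert(out != NULL);

	/* Bits 31:22 are fixed to 1011100101 for this encoding. */
	if ((insn & 0xffc00000u) != 0xb9400000u)
		return 0;

	out->rt = insn & 0x1f;
	out->rn = (insn >> 5) & 0x1f;
	/* imm12 is scaled by the access size: 4 bytes for a W register. */
	out->offset = ((insn >> 10) & 0xfff) * 4;
	return 1;
}
```

Calling decode_ldr_w_uimm(0xb9400800, &d) fills d with rt = 0, rn = 0 and offset = 8. Other instruction words, such as the 0x97fc8a7f at the start of the same "Code:" line, are rejected by this sketch.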

Benjamin Gaignard May 23, 2023, 4:36 p.m. UTC | #3
On 23/05/2023 at 18:25, Benjamin Gaignard wrote:
> This fixes the following issue observed on Odroid-M1 board:
>
>   [...]
>
> Fixes: db6f68b51e5c ("media: verisilicon: Do not set context src/dst formats in reset functions")
>
> Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com>
> ---

Diederik, Marek, Michael,
I have tested this patch on my boards: I see no regressions on the
decoder side and no more crashes when probing the encoder.
Could you test it on your side to confirm that it is OK?

Thorsten, I am trying out the regzbot commands; please tell me if they are correct.

#regzbot ^introduced db6f68b51e5c
#regzbot title media: verisilicon: null pointer dereference in try_fmt
#regzbot ignore-activity


Michael Tretter May 23, 2023, 5:06 p.m. UTC | #4
On Tue, 23 May 2023 18:36:09 +0200, Benjamin Gaignard wrote:
> 
> On 23/05/2023 at 18:25, Benjamin Gaignard wrote:
> > This fixes the following issue observed on Odroid-M1 board:
> > 
> >   [...]
> > 
> > Fixes: db6f68b51e5c ("media: verisilicon: Do not set context src/dst formats in reset functions")

This patch partially reverts the previous commit. I wonder whether the reason
for resetting the context format only if the targeted queue is not busy still
stands.

> > 
> > Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com>

Tested-by: Michael Tretter <m.tretter@pengutronix.de>

Ezequiel Garcia May 23, 2023, 5:36 p.m. UTC | #5
Hi guys,

After reviewing the format logic (hantro_reset_encoded_fmt and
hantro_reset_raw_fmt), it seems to me that trying to support decoders,
encoders and so many different SoC variants is making the driver
increasingly fragile.
This driver is becoming a big fat monolith. Regressions like this will
become increasingly frequent.

The only codec that supports encoding right now is JPEG, so I think
it's a good idea to remove it from this driver for good
and split it out into its own driver.

Anyone volunteering? :-)

Thanks,
Ezequiel

Diederik de Haas May 23, 2023, 7:58 p.m. UTC | #6
On Tuesday, 23 May 2023 18:36:09 CEST Benjamin Gaignard wrote:
> Diederik, Marek, Michael,
> I have tested this patch on my boards and I see no regressions on
> decoder part and no more crash when probing the encoder.
> Could you test it on your side to confirm it is ok ?

With this patch I'm (also) not seeing the crash.

Tested-by: Diederik de Haas <didi.debian@cknow.org>
Nicolas Dufresne May 23, 2023, 11:22 p.m. UTC | #7
On Tuesday, 23 May 2023 at 14:36 -0300, Ezequiel Garcia wrote:
> Hi guys,
> 
> After reviewing the format logic (hantro_reset_encoded_fmt and
> hantro_reset_raw_fmt).
> It seems to me trying to support Decoders, Encoders and so many
> different SoC Variants, is getting increasingly fragile.
> This driver is becoming a big fat monolith. Regressions like this will
> be increasingly frequent.
> 
> The only codec that supports encoding right now is JPEG, so I think
> it's a good idea to remove it for good,
> and split it to its own driver.
> 
> Anyone volunteering? :-)

We won't have that luxury with VP8 and H.264, as the decoder and encoder share
the same cache memory. They must be time-sliced. Note that this driver is only
missing VP8/H.264 encoding before it becomes maintenance-only (there won't be
any interesting features left), so I would not start a big refactoring, as this
may cause more trouble than good. Anything newer, like VC8000 or VC9000, should be
a new driver with an encoder/decoder split.

regards,
Nicolas

p.s. This is my personal opinion. In general, we should improve the helpers if
there is too much boilerplate, rather than creating monolithic drivers, and on
that point I believe I agree. But the H1/G1 combo has hardware dependencies which
have been solved that way, and changing that now is a big amount of work for a
relatively quiet driver. Feel free to split G2 away from that driver; that would
make sense, as it's not sharing anything.

Marek Szyprowski May 24, 2023, 7:51 a.m. UTC | #8
On 23.05.2023 18:25, Benjamin Gaignard wrote:
> This fixes the following issue observed on Odroid-M1 board:
>
>   [...]
>
> Fixes: db6f68b51e5c ("media: verisilicon: Do not set context src/dst formats in reset functions")
>
> Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>

Best regards
Linux regression tracking (Thorsten Leemhuis) May 24, 2023, 9:06 a.m. UTC | #9
On 23.05.23 18:36, Benjamin Gaignard wrote:
> 
> On 23/05/2023 at 18:25, Benjamin Gaignard wrote:
>> This fixes the following issue observed on Odroid-M1 board:
> [...]
> Diederick, Marek, Michael,
> I have tested this patch on my boards and I see no regressions on
> decoder part and no more crash when probing the encoder.
> Could you test it on your side to confirm it is ok ?

They all did, so that is done. Thx for your help, everybody!

/me now hopes this patch will be quickly reviewed, accepted and sent to
Linus to prevent even more people running into this...

> Thorsten, I try/test regzbot commands, please tell me if it is correct.
> 
> #regzbot ^introduced db6f68b51e5c
> #regzbot title media: verisilicon: null pointer dereference in try_fmt
> #regzbot ignore-activity

Thx for this, but we now track this regression twice. No worries,
let me fix this and also tell regzbot about the fix:

#regzbot dup-of: https://lore.kernel.org/lkml/4995215.LvFx2qVVIh@bagend/
#regzbot fix: media: verisilicon: Additional fix for the crash when
opening the driver

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.
Hans Verkuil May 24, 2023, 9:15 a.m. UTC | #10
On 24/05/2023 11:06, Thorsten Leemhuis wrote:
> On 23.05.23 18:36, Benjamin Gaignard wrote:
>>
>> On 23/05/2023 at 18:25, Benjamin Gaignard wrote:
>>> This fixes the following issue observed on Odroid-M1 board:
>> [...]
>> Diederick, Marek, Michael,
>> I have tested this patch on my boards and I see no regressions on
>> decoder part and no more crash when probing the encoder.
>> Could you test it on your side to confirm it is ok ?
> 
> They all did, so that is done. Thx for your help, everybody!
> 
> /me now hopes this patch will be quickly reviewed, accepted and sent to
> Linus to prevent even more people running into this...

I plan to make a PR with 6.4 fixes today or tomorrow.

Regards,

	Hans

Linux regression tracking (Thorsten Leemhuis) May 24, 2023, 9:22 a.m. UTC | #11
On 24.05.23 11:15, Hans Verkuil wrote:
> On 24/05/2023 11:06, Thorsten Leemhuis wrote:
>> On 23.05.23 18:36, Benjamin Gaignard wrote:
>>>
>>> On 23/05/2023 at 18:25, Benjamin Gaignard wrote:
>>>> This fixes the following issue observed on Odroid-M1 board:
>>> [...]
>>> Diederick, Marek, Michael,
>>> I have tested this patch on my boards and I see no regressions on
>>> decoder part and no more crash when probing the encoder.
>>> Could you test it on your side to confirm it is ok ?
>>
>> They all did, so that is done. Thx for your help, everybody!
>>
>> /me now hopes this patch will be quickly reviewed, accepted and sent to
>> Linus to prevent even more people running into this...
> 
> I plan to make a PR with 6.4 fixes today or tomorrow.

Ahh, fabulous, many thx!

Ciao, Thorsten

Patch

diff --git a/drivers/media/platform/verisilicon/hantro_v4l2.c b/drivers/media/platform/verisilicon/hantro_v4l2.c
index 835518534e3b..61cfaaf4e927 100644
--- a/drivers/media/platform/verisilicon/hantro_v4l2.c
+++ b/drivers/media/platform/verisilicon/hantro_v4l2.c
@@ -397,10 +397,12 @@  hantro_reset_raw_fmt(struct hantro_ctx *ctx, int bit_depth)
 	if (!raw_vpu_fmt)
 		return -EINVAL;
 
-	if (ctx->is_encoder)
+	if (ctx->is_encoder) {
 		encoded_fmt = &ctx->dst_fmt;
-	else
+		ctx->vpu_src_fmt = raw_vpu_fmt;
+	} else {
 		encoded_fmt = &ctx->src_fmt;
+	}
 
 	hantro_reset_fmt(&raw_fmt, raw_vpu_fmt);
 	raw_fmt.width = encoded_fmt->width;
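
The effect of the added line can be modelled outside the kernel. What follows is a reduced userspace sketch, not driver code: `model_fmt`, `model_ctx`, `model_try_fmt`, `model_reset_raw_fmt` and the `apply_fix` switch are all hypothetical stand-ins. The explicit NULL check exists only so that the model reports the failure instead of faulting, as the encoder open path did in the trace above. It assumes, as stated in the thread, that the format check reads a field through ctx->vpu_src_fmt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily reduced stand-ins for the driver's structures. */
struct model_fmt {
	const char *name;
	uint32_t fourcc;
};

struct model_ctx {
	int is_encoder;
	const struct model_fmt *vpu_src_fmt; /* NULL until a format is set */
};

/* Stand-in for the try_fmt step: it reads vpu_src_fmt->fourcc. */
int model_try_fmt(const struct model_ctx *ctx, uint32_t *fourcc)
{
	assert(ctx != NULL && fourcc != NULL);

	/* Model-only guard: report the NULL pointer instead of crashing. */
	if (!ctx->vpu_src_fmt)
		return -1;

	*fourcc = ctx->vpu_src_fmt->fourcc;
	return 0;
}

/* Stand-in for the reset step; apply_fix selects pre- or post-patch logic. */
int model_reset_raw_fmt(struct model_ctx *ctx,
			const struct model_fmt *raw_vpu_fmt,
			int apply_fix, uint32_t *fourcc)
{
	if (!raw_vpu_fmt)
		return -1;

	/* Mirrors the line added by the patch for the encoder case. */
	if (ctx->is_encoder && apply_fix)
		ctx->vpu_src_fmt = raw_vpu_fmt;

	return model_try_fmt(ctx, fourcc);
}
```

With an encoder context whose vpu_src_fmt starts as NULL, the model returns -1 when apply_fix is 0 and succeeds when apply_fix is 1, because the pointer is assigned before the format check reads through it.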