Message ID | 20230413104756.356695-1-benjamin.gaignard@collabora.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2] media: verisilicon: Fix crash when probing encoder | expand |
Hi, cause I sent a Rb on old one after v2 was sent, the extra cleanup make sense to me. Le jeudi 13 avril 2023 à 12:47 +0200, Benjamin Gaignard a écrit : > ctx->vpu_dst_fmt is no more initialized before calling hantro_try_fmt() > so assigne it to vpu_fmt led to crash the kernel. > Like for decoder case use 'fmt' as format for encoder and clean up > the code. > > Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com> > Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> > Fixes: db6f68b51e5c ("media: verisilicon: Do not set context src/dst formats in reset functions") Reviewed-by: Nicolas Dufresne <nicolas.dufresne@collabora.com> > --- > version 2: > - Remove useless vpu_fmt. > > drivers/media/platform/verisilicon/hantro_v4l2.c | 10 +++------- > 1 file changed, 3 insertions(+), 7 deletions(-) > > diff --git a/drivers/media/platform/verisilicon/hantro_v4l2.c b/drivers/media/platform/verisilicon/hantro_v4l2.c > index 8f1414085f47..d71f79471396 100644 > --- a/drivers/media/platform/verisilicon/hantro_v4l2.c > +++ b/drivers/media/platform/verisilicon/hantro_v4l2.c > @@ -275,7 +275,7 @@ static int hantro_try_fmt(const struct hantro_ctx *ctx, > struct v4l2_pix_format_mplane *pix_mp, > enum v4l2_buf_type type) > { > - const struct hantro_fmt *fmt, *vpu_fmt; > + const struct hantro_fmt *fmt; > bool capture = V4L2_TYPE_IS_CAPTURE(type); > bool coded; > > @@ -295,11 +295,7 @@ static int hantro_try_fmt(const struct hantro_ctx *ctx, > > if (coded) { > pix_mp->num_planes = 1; > - vpu_fmt = fmt; > - } else if (ctx->is_encoder) { > - vpu_fmt = ctx->vpu_dst_fmt; > - } else { > - vpu_fmt = fmt; > + } else if (!ctx->is_encoder) { > /* > * Width/height on the CAPTURE end of a decoder are ignored and > * replaced by the OUTPUT ones. > @@ -311,7 +307,7 @@ static int hantro_try_fmt(const struct hantro_ctx *ctx, > pix_mp->field = V4L2_FIELD_NONE; > > v4l2_apply_frmsize_constraints(&pix_mp->width, &pix_mp->height, > - &vpu_fmt->frmsize); > + &fmt->frmsize); > > if (!coded) { > /* Fill remaining fields */ > -- > 2.34.1 > >
Hi, Le jeudi 13 avril 2023 à 10:10 -0300, Ezequiel Garcia a écrit : > Benjamin, > > Please include the crash stracktrace in the commit. > > Careful with HTML message, they don't always make it in these ML and tooling might not play well with the tooling. Perhaps it can be edited while pulling ? Here's the info from Marek's bug report: hantro-vpu fdea0000.video-codec: Adding to iommu group 0 hantro-vpu fdea0000.video-codec: registered rockchip,rk3568-vpu-dec as /dev/video0 hantro-vpu fdee0000.video-codec: Adding to iommu group 1 hantro-vpu fdee0000.video-codec: registered rockchip,rk3568-vepu-enc as /dev/video1 Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008 Mem abort info: ESR = 0x0000000096000004 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x04: level 0 translation fault Data abort info: ISV = 0, ISS = 0x00000004 CM = 0, WnR = 0 user pgtable: 4k pages, 48-bit VAs, pgdp=00000001f446f000 [0000000000000008] pgd=0000000000000000, p4d=0000000000000000 Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP Modules linked in: hantro_vpu v4l2_vp9 v4l2_h264 v4l2_mem2mem videobuf2_dma_contig snd_soc_simple_card display_connector snd_soc_simple_card_utils videobuf2_memops crct10dif_ce dwmac_rk rockchip_thermal videobuf2_v4l2 stmmac_platform rockchip_saradc industrialio_triggered_buffer kfifo_buf stmmac videodev pcs_xpcs rtc_rk808 videobuf2_common rockchipdrm panfrost mc drm_shmem_helper analogix_dp gpu_sched dw_mipi_dsi dw_hdmi drm_display_helper ip_tables x_tables ipv6 CPU: 3 PID: 171 Comm: v4l_id Not tainted 6.3.0-rc2+ #13478 Hardware name: Hardkernel ODROID-M1 (DT) pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : hantro_try_fmt+0xb4/0x280 [hantro_vpu] lr : hantro_try_fmt+0xa8/0x280 [hantro_vpu] ... Call trace: hantro_try_fmt+0xb4/0x280 [hantro_vpu] hantro_set_fmt_out+0x3c/0x278 [hantro_vpu] hantro_reset_raw_fmt+0x94/0xb4 [hantro_vpu] hantro_set_fmt_cap+0x23c/0x250 [hantro_vpu] hantro_reset_fmts+0x94/0xcc [hantro_vpu] hantro_open+0xd4/0x20c [hantro_vpu] v4l2_open+0x80/0x120 [videodev] chrdev_open+0xc0/0x22c do_dentry_open+0x13c/0x490 vfs_open+0x2c/0x38 path_openat+0x550/0x938 do_filp_open+0x80/0x12c do_sys_openat2+0xb4/0x16c __arm64_sys_openat+0x64/0xac invoke_syscall+0x48/0x114 el0_svc_common.constprop.0+0xfc/0x11c do_el0_svc+0x38/0xa4 el0_svc+0x48/0xb8 el0t_64_sync_handler+0xb8/0xbc el0t_64_sync+0x190/0x194 Code: 97fe726c f940aa80 52864a61 72a686c1 (b9400800) ---[ end trace 0000000000000000 ]--- > > Thanks, > Ezequiel > > > On Thu, Apr 13, 2023 at 7:48 AM Benjamin Gaignard > <benjamin.gaignard@collabora.com> wrote: > > ctx->vpu_dst_fmt is no more initialized before calling hantro_try_fmt() > > so assigne it to vpu_fmt led to crash the kernel. > > Like for decoder case use 'fmt' as format for encoder and clean up > > the code. > > > > Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com> > > Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> > > Fixes: db6f68b51e5c ("media: verisilicon: Do not set context src/dst formats > > in reset functions") > > --- > > version 2: > > - Remove useless vpu_fmt. > > > > drivers/media/platform/verisilicon/hantro_v4l2.c | 10 +++------- > > 1 file changed, 3 insertions(+), 7 deletions(-) > > > > diff --git a/drivers/media/platform/verisilicon/hantro_v4l2.c > > b/drivers/media/platform/verisilicon/hantro_v4l2.c > > index 8f1414085f47..d71f79471396 100644 > > --- a/drivers/media/platform/verisilicon/hantro_v4l2.c > > +++ b/drivers/media/platform/verisilicon/hantro_v4l2.c > > @@ -275,7 +275,7 @@ static int hantro_try_fmt(const struct hantro_ctx *ctx, > > struct v4l2_pix_format_mplane *pix_mp, > > enum v4l2_buf_type type) > > { > > - const struct hantro_fmt *fmt, *vpu_fmt; > > + const struct hantro_fmt *fmt; > > bool capture = V4L2_TYPE_IS_CAPTURE(type); > > bool coded; > > > > @@ -295,11 +295,7 @@ static int hantro_try_fmt(const struct hantro_ctx *ctx, > > > > if (coded) { > > pix_mp->num_planes = 1; > > - vpu_fmt = fmt; > > - } else if (ctx->is_encoder) { > > - vpu_fmt = ctx->vpu_dst_fmt; > > - } else { > > - vpu_fmt = fmt; > > + } else if (!ctx->is_encoder) { > > /* > > * Width/height on the CAPTURE end of a decoder are ignored > > and > > * replaced by the OUTPUT ones. > > @@ -311,7 +307,7 @@ static int hantro_try_fmt(const struct hantro_ctx *ctx, > > pix_mp->field = V4L2_FIELD_NONE; > > > > v4l2_apply_frmsize_constraints(&pix_mp->width, &pix_mp->height, > > - &vpu_fmt->frmsize); > > + &fmt->frmsize); > > > > if (!coded) { > > /* Fill remaining fields */
Il 13/04/23 12:47, Benjamin Gaignard ha scritto: > ctx->vpu_dst_fmt is no more initialized before calling hantro_try_fmt() > so assigne it to vpu_fmt led to crash the kernel. > Like for decoder case use 'fmt' as format for encoder and clean up > the code. > > Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com> > Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> > Fixes: db6f68b51e5c ("media: verisilicon: Do not set context src/dst formats in reset functions") Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Hi, On Thursday, 13 April 2023 21:52:50 CEST Nicolas Dufresne wrote: > Le jeudi 13 avril 2023 à 10:10 -0300, Ezequiel Garcia a écrit : > > Benjamin, > > > > Please include the crash stracktrace in the commit. > > Careful with HTML message, they don't always make it in these ML and tooling > might not play well with the tooling. Perhaps it can be edited while > pulling ? Here's the info from Marek's bug report: > > hantro-vpu fdea0000.video-codec: Adding to iommu group 0 > hantro-vpu fdea0000.video-codec: registered rockchip,rk3568-vpu-dec as > /dev/video0 > hantro-vpu fdee0000.video-codec: Adding to iommu group 1 > hantro-vpu fdee0000.video-codec: registered rockchip,rk3568-vepu-enc as > /dev/video1 > Unable to handle kernel NULL pointer dereference at virtual address > 0000000000000008 > Mem abort info: > ESR = 0x0000000096000004 > EC = 0x25: DABT (current EL), IL = 32 bits > SET = 0, FnV = 0 > EA = 0, S1PTW = 0 > FSC = 0x04: level 0 translation fault > Data abort info: > ISV = 0, ISS = 0x00000004 > CM = 0, WnR = 0 > user pgtable: 4k pages, 48-bit VAs, pgdp=00000001f446f000 > [0000000000000008] pgd=0000000000000000, p4d=0000000000000000 > Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP > Modules linked in: hantro_vpu v4l2_vp9 v4l2_h264 v4l2_mem2mem > videobuf2_dma_contig snd_soc_simple_card display_connector > snd_soc_simple_card_utils videobuf2_memops crct10dif_ce dwmac_rk > rockchip_thermal videobuf2_v4l2 stmmac_platform rockchip_saradc > industrialio_triggered_buffer kfifo_buf stmmac videodev pcs_xpcs > rtc_rk808 videobuf2_common rockchipdrm panfrost mc drm_shmem_helper > analogix_dp gpu_sched dw_mipi_dsi dw_hdmi drm_display_helper ip_tables > x_tables ipv6 > CPU: 3 PID: 171 Comm: v4l_id Not tainted 6.3.0-rc2+ #13478 > Hardware name: Hardkernel ODROID-M1 (DT) > pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) > pc : hantro_try_fmt+0xb4/0x280 [hantro_vpu] > lr : hantro_try_fmt+0xa8/0x280 [hantro_vpu] > ... > Call trace: > hantro_try_fmt+0xb4/0x280 [hantro_vpu] > hantro_set_fmt_out+0x3c/0x278 [hantro_vpu] > hantro_reset_raw_fmt+0x94/0xb4 [hantro_vpu] > hantro_set_fmt_cap+0x23c/0x250 [hantro_vpu] > hantro_reset_fmts+0x94/0xcc [hantro_vpu] > hantro_open+0xd4/0x20c [hantro_vpu] > v4l2_open+0x80/0x120 [videodev] > chrdev_open+0xc0/0x22c > do_dentry_open+0x13c/0x490 > vfs_open+0x2c/0x38 > path_openat+0x550/0x938 > do_filp_open+0x80/0x12c > do_sys_openat2+0xb4/0x16c > __arm64_sys_openat+0x64/0xac > invoke_syscall+0x48/0x114 > el0_svc_common.constprop.0+0xfc/0x11c > do_el0_svc+0x38/0xa4 > el0_svc+0x48/0xb8 > el0t_64_sync_handler+0xb8/0xbc > el0t_64_sync+0x190/0x194 > Code: 97fe726c f940aa80 52864a61 72a686c1 (b9400800) > ---[ end trace 0000000000000000 ]--- When I booted into my 6.4-rc1 (but also rc2) kernel on my Pine64 Quartz64 Model A, I noticed a crash which seems the same as above, but I didn't have such a crash with my 6.3 kernel. Searching for 'hantro' led me to this commit as the most likely culprit but when I build a new 6.4-rcX kernel with this commit reverted, I still had this crash. Do you have suggestions which commit would then be the likely culprit? Cheers, Diederik For completeness, this is the error I got with 6.4-rcX: [ 26.976766] panfrost fde60000.gpu: clock rate = 594000000 [ 26.977297] panfrost fde60000.gpu: bus_clock rate = 500000000 [ 26.996012] random: crng init done [ 27.072438] videodev: Linux video capture interface: v2.00 [ 27.119012] Registered IR keymap rc-cec [ 27.125161] rc rc0: dw_hdmi as /devices/platform/fe0a0000.hdmi/rc/rc0 [ 27.125427] panfrost fde60000.gpu: mali-g52 id 0x7402 major 0x1 minor 0x0 status 0x0 [ 27.125905] input: dw_hdmi as /devices/platform/fe0a0000.hdmi/rc/rc0/input1 [ 27.126427] panfrost fde60000.gpu: features: 00000000,00000cf7, issues: 00000000,00000400 [ 27.127785] panfrost fde60000.gpu: Features: L2:0x07110206 Shader:0x00000002 Tiler:0x00000209 Mem:0x1 MMU:0x00002823 AS:0xff JS:0x7 [ 27.128954] panfrost fde60000.gpu: shader_present=0x1 l2_present=0x1 [ 27.148920] gpio-fan gpio_fan: GPIO fan initialized [ 27.191131] [drm] Initialized panfrost 1.2.0 20180908 for fde60000.gpu on minor 1 [ 27.265920] hantro-vpu fdea0000.video-codec: Adding to iommu group 0 [ 27.267535] hantro-vpu fdea0000.video-codec: registered rockchip,rk3568-vpu-dec as /dev/video0 [ 27.270668] hantro-vpu fdee0000.video-codec: Adding to iommu group 1 [ 27.272590] hantro-vpu fdee0000.video-codec: registered rockchip,rk3568-vepu-enc as /dev/video1 [ 27.573417] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008 [ 27.574238] Mem abort info: [ 27.574499] ESR = 0x0000000096000004 [ 27.574836] EC = 0x25: DABT (current EL), IL = 32 bits [ 27.575310] SET = 0, FnV = 0 [ 27.575586] EA = 0, S1PTW = 0 [ 27.575868] FSC = 0x04: level 0 translation fault [ 27.576368] Data abort info: [ 27.576637] ISV = 0, ISS = 0x00000004 [ 27.576980] CM = 0, WnR = 0 [ 27.577247] user pgtable: 4k pages, 48-bit VAs, pgdp=000000010718b000 [ 27.577818] [0000000000000008] pgd=0000000000000000, p4d=0000000000000000 [ 27.578430] Internal error: Oops: 0000000096000004 [#1] SMP [ 27.578934] Modules linked in: polyval_generic(E+) ghash_ce(E) gf128mul(E) snd_soc_hdmi_codec(E+) sha2_ce(E) ecdh_generic(E+) sha256_arm64(E) sha1_ce(E) rfkill(E) snd_soc_spdif_tx(E) ecc(E) leds_gpio(E) snd_soc_simple_card(E) snd_soc_simple_card_utils(E) display_connector(E) gpio_fan(E) hantro_vpu(E) v4l2_vp9(E) snd_soc_rockchip_i2s_tdm(E) v4l2_h264(E) videobuf2_dma_contig(E) snd_soc_rockchip_spdif(E) snd_soc_rk817(E) v4l2_mem2mem(E) videobuf2_memops(E) governor_simpleondemand(E) rockchip_thermal(E) videobuf2_v4l2(E) dw_wdt(E) snd_soc_core(E) dw_hdmi_i2s_audio(E) dw_hdmi_cec(E) videodev(E) snd_pcm_dmaengine(E) panfrost(E) videobuf2_common(E) snd_pcm(E) rk805_pwrkey(E) snd_timer(E) gpu_sched(E) snd(E) soundcore(E) mc(E) drm_shmem_helper(E) cpufreq_dt(E) loop(E) fuse(E) efi_pstore(E) dm_mod(E) dax(E) configfs(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) crc32c_generic(E) xhci_plat_hcd(E) xhci_hcd(E) motorcomm(E) rk808_regulator(E) fan53555(E) gpio_rockchip(E) dwmac_rk(E) stmmac_platform(E) [ 27.579108] crct10dif_ce(E) stmmac(E) pcs_xpcs(E) crct10dif_common(E) phylink(E) fixed(E) of_mdio(E) pinctrl_rockchip(E) phy_rockchip_inno_usb2(E) fixed_phy(E) sdhci_of_dwcmshc(E) dw_mmc_rockchip(E) fwnode_mdio(E) sdhci_pltfm(E) phy_rockchip_naneng_combphy(E) dw_mmc_pltfm(E) sdhci(E) dw_mmc(E) pl330(E) libphy(E) rockchipdrm(E) ptp(E) drm_dma_helper(E) analogix_dp(E) dw_hdmi(E) cec(E) rc_core(E) drm_display_helper(E) dw_mipi_dsi(E) pps_core(E) drm_kms_helper(E) ohci_platform(E) ohci_hcd(E) ehci_platform(E) io_domain(E) ehci_hcd(E) dwc3(E) i2c_rk3x(E) udc_core(E) usbcore(E) roles(E) ulpi(E) drm(E) usb_common(E) [ 27.591758] CPU: 1 PID: 407 Comm: v4l_id Tainted: G E 6.4.0-0-pine64-arm64 #1 Debian 6.4~rc2-1~pine64 [ 27.592705] Hardware name: Pine64 RK3566 Quartz64-A Board (DT) [ 27.593223] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 27.593843] pc : hantro_try_fmt+0xb8/0x290 [hantro_vpu] [ 27.594336] lr : hantro_try_fmt+0xac/0x290 [hantro_vpu] [ 27.594811] sp : ffff80000aa9b670 [ 27.595107] x29: ffff80000aa9b670 x28: ffff80000aa9bb60 x27: 0000000000000000 [ 27.595746] x26: 0000000000000000 x25: ffff000108380008 x24: ffff8000014ebac0 [ 27.596383] x23: 0000000000000001 x22: ffff000108380000 x21: 0000000000000000 [ 27.597019] x20: ffff8000014f10f0 x19: ffff80000aa9b708 x18: 0000000000000010 [ 27.597657] x17: 0000000000000000 x16: 0000000000000000 x15: ffff80000aa9b6a0 [ 27.598292] x14: ffff0001043fe280 x13: 0000000000000001 x12: 0000000000000001 [ 27.598927] x11: 0000000000000002 x10: 0000000000000003 x9 : 0000000000000002 [ 27.599563] x8 : 000000000000001f x7 : 000000000000005f x6 : 0000000000000003 [ 27.600199] x5 : ffff8000013e2583 x4 : ffff8000013e2580 x3 : 00000000ffffffff [ 27.600834] x2 : 0000000000000010 x1 : 0000000000000000 x0 : 0000000034363253 [ 27.601470] Call trace: [ 27.601693] hantro_try_fmt+0xb8/0x290 [hantro_vpu] [ 27.602143] hantro_set_fmt_out+0x44/0x388 [hantro_vpu] [ 27.602617] hantro_reset_raw_fmt+0x7c/0xe0 [hantro_vpu] [ 27.603096] hantro_set_fmt_cap+0x29c/0x2b8 [hantro_vpu] [ 27.603575] hantro_reset_encoded_fmt+0x80/0xc0 [hantro_vpu] [ 27.604086] hantro_reset_fmts+0x20/0x48 [hantro_vpu] [ 27.604545] hantro_open+0xe0/0x208 [hantro_vpu] [ 27.604965] v4l2_open+0x84/0x130 [videodev] [ 27.605389] chrdev_open+0xd8/0x2d8 [ 27.605712] do_dentry_open+0x1bc/0x490 [ 27.606060] vfs_open+0x34/0x40 [ 27.606345] path_openat+0x9c8/0xf20 [ 27.606670] do_filp_open+0xa4/0x160 [ 27.606991] do_sys_openat2+0xc8/0x188 [ 27.607327] __arm64_sys_openat+0x6c/0xb8 [ 27.607687] invoke_syscall+0x78/0x108 [ 27.608029] el0_svc_common.constprop.0+0xd4/0x100 [ 27.608456] do_el0_svc+0x40/0xa8 [ 27.608755] el0_svc+0x34/0xd8 [ 27.609036] el0t_64_sync_handler+0xf4/0x120 [ 27.609420] el0t_64_sync+0x190/0x198 [ 27.609753] Code: 97fbc8e2 f94056c1 52864a60 72a686c0 (b9400821) [ 27.610297] ---[ end trace 0000000000000000 ]---
Le 20/05/2023 à 00:34, Diederik de Haas a écrit : > Hi, > > On Thursday, 13 April 2023 21:52:50 CEST Nicolas Dufresne wrote: >> Le jeudi 13 avril 2023 à 10:10 -0300, Ezequiel Garcia a écrit : >>> Benjamin, >>> >>> Please include the crash stracktrace in the commit. >> Careful with HTML message, they don't always make it in these ML and tooling >> might not play well with the tooling. Perhaps it can be edited while >> pulling ? Here's the info from Marek's bug report: >> >> hantro-vpu fdea0000.video-codec: Adding to iommu group 0 >> hantro-vpu fdea0000.video-codec: registered rockchip,rk3568-vpu-dec as >> /dev/video0 >> hantro-vpu fdee0000.video-codec: Adding to iommu group 1 >> hantro-vpu fdee0000.video-codec: registered rockchip,rk3568-vepu-enc as >> /dev/video1 >> Unable to handle kernel NULL pointer dereference at virtual address >> 0000000000000008 >> Mem abort info: >> ESR = 0x0000000096000004 >> EC = 0x25: DABT (current EL), IL = 32 bits >> SET = 0, FnV = 0 >> EA = 0, S1PTW = 0 >> FSC = 0x04: level 0 translation fault >> Data abort info: >> ISV = 0, ISS = 0x00000004 >> CM = 0, WnR = 0 >> user pgtable: 4k pages, 48-bit VAs, pgdp=00000001f446f000 >> [0000000000000008] pgd=0000000000000000, p4d=0000000000000000 >> Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP >> Modules linked in: hantro_vpu v4l2_vp9 v4l2_h264 v4l2_mem2mem >> videobuf2_dma_contig snd_soc_simple_card display_connector >> snd_soc_simple_card_utils videobuf2_memops crct10dif_ce dwmac_rk >> rockchip_thermal videobuf2_v4l2 stmmac_platform rockchip_saradc >> industrialio_triggered_buffer kfifo_buf stmmac videodev pcs_xpcs >> rtc_rk808 videobuf2_common rockchipdrm panfrost mc drm_shmem_helper >> analogix_dp gpu_sched dw_mipi_dsi dw_hdmi drm_display_helper ip_tables >> x_tables ipv6 >> CPU: 3 PID: 171 Comm: v4l_id Not tainted 6.3.0-rc2+ #13478 >> Hardware name: Hardkernel ODROID-M1 (DT) >> pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) >> pc : hantro_try_fmt+0xb4/0x280 [hantro_vpu] >> lr : hantro_try_fmt+0xa8/0x280 [hantro_vpu] >> ... >> Call trace: >> hantro_try_fmt+0xb4/0x280 [hantro_vpu] >> hantro_set_fmt_out+0x3c/0x278 [hantro_vpu] >> hantro_reset_raw_fmt+0x94/0xb4 [hantro_vpu] >> hantro_set_fmt_cap+0x23c/0x250 [hantro_vpu] >> hantro_reset_fmts+0x94/0xcc [hantro_vpu] >> hantro_open+0xd4/0x20c [hantro_vpu] >> v4l2_open+0x80/0x120 [videodev] >> chrdev_open+0xc0/0x22c >> do_dentry_open+0x13c/0x490 >> vfs_open+0x2c/0x38 >> path_openat+0x550/0x938 >> do_filp_open+0x80/0x12c >> do_sys_openat2+0xb4/0x16c >> __arm64_sys_openat+0x64/0xac >> invoke_syscall+0x48/0x114 >> el0_svc_common.constprop.0+0xfc/0x11c >> do_el0_svc+0x38/0xa4 >> el0_svc+0x48/0xb8 >> el0t_64_sync_handler+0xb8/0xbc >> el0t_64_sync+0x190/0x194 >> Code: 97fe726c f940aa80 52864a61 72a686c1 (b9400800) >> ---[ end trace 0000000000000000 ]--- > When I booted into my 6.4-rc1 (but also rc2) kernel on my > Pine64 Quartz64 Model A, I noticed a crash which seems the same as > above, but I didn't have such a crash with my 6.3 kernel. > Searching for 'hantro' led me to this commit as the most likely culprit > but when I build a new 6.4-rcX kernel with this commit reverted, > I still had this crash. > Do you have suggestions which commit would then be the likely culprit? > This patch fix the crash at boot time, revert it doesn't seem to be the solution. Maybe this proposal from Marek can help you ? https://patchwork.kernel.org/project/linux-media/patch/20230421104759.2236463-1-m.szyprowski@samsung.com/ Regards, Benjamin > Cheers, > Diederik > > For completeness, this is the error I got with 6.4-rcX: > > [ 26.976766] panfrost fde60000.gpu: clock rate = 594000000 > [ 26.977297] panfrost fde60000.gpu: bus_clock rate = 500000000 > [ 26.996012] random: crng init done > [ 27.072438] videodev: Linux video capture interface: v2.00 > [ 27.119012] Registered IR keymap rc-cec > [ 27.125161] rc rc0: dw_hdmi as /devices/platform/fe0a0000.hdmi/rc/rc0 > [ 27.125427] panfrost fde60000.gpu: mali-g52 id 0x7402 major 0x1 minor 0x0 status 0x0 > [ 27.125905] input: dw_hdmi as /devices/platform/fe0a0000.hdmi/rc/rc0/input1 > [ 27.126427] panfrost fde60000.gpu: features: 00000000,00000cf7, issues: 00000000,00000400 > [ 27.127785] panfrost fde60000.gpu: Features: L2:0x07110206 Shader:0x00000002 Tiler:0x00000209 Mem:0x1 MMU:0x00002823 AS:0xff JS:0x7 > [ 27.128954] panfrost fde60000.gpu: shader_present=0x1 l2_present=0x1 > [ 27.148920] gpio-fan gpio_fan: GPIO fan initialized > [ 27.191131] [drm] Initialized panfrost 1.2.0 20180908 for fde60000.gpu on minor 1 > [ 27.265920] hantro-vpu fdea0000.video-codec: Adding to iommu group 0 > [ 27.267535] hantro-vpu fdea0000.video-codec: registered rockchip,rk3568-vpu-dec as /dev/video0 > [ 27.270668] hantro-vpu fdee0000.video-codec: Adding to iommu group 1 > [ 27.272590] hantro-vpu fdee0000.video-codec: registered rockchip,rk3568-vepu-enc as /dev/video1 > [ 27.573417] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008 > [ 27.574238] Mem abort info: > [ 27.574499] ESR = 0x0000000096000004 > [ 27.574836] EC = 0x25: DABT (current EL), IL = 32 bits > [ 27.575310] SET = 0, FnV = 0 > [ 27.575586] EA = 0, S1PTW = 0 > [ 27.575868] FSC = 0x04: level 0 translation fault > [ 27.576368] Data abort info: > [ 27.576637] ISV = 0, ISS = 0x00000004 > [ 27.576980] CM = 0, WnR = 0 > [ 27.577247] user pgtable: 4k pages, 48-bit VAs, pgdp=000000010718b000 > [ 27.577818] [0000000000000008] pgd=0000000000000000, p4d=0000000000000000 > [ 27.578430] Internal error: Oops: 0000000096000004 [#1] SMP > [ 27.578934] Modules linked in: polyval_generic(E+) ghash_ce(E) gf128mul(E) snd_soc_hdmi_codec(E+) sha2_ce(E) ecdh_generic(E+) sha256_arm64(E) sha1_ce(E) rfkill(E) snd_soc_spdif_tx(E) ecc(E) leds_gpio(E) snd_soc_simple_card(E) snd_soc_simple_card_utils(E) display_connector(E) gpio_fan(E) hantro_vpu(E) v4l2_vp9(E) snd_soc_rockchip_i2s_tdm(E) v4l2_h264(E) videobuf2_dma_contig(E) snd_soc_rockchip_spdif(E) snd_soc_rk817(E) v4l2_mem2mem(E) videobuf2_memops(E) governor_simpleondemand(E) rockchip_thermal(E) videobuf2_v4l2(E) dw_wdt(E) snd_soc_core(E) dw_hdmi_i2s_audio(E) dw_hdmi_cec(E) videodev(E) snd_pcm_dmaengine(E) panfrost(E) videobuf2_common(E) snd_pcm(E) rk805_pwrkey(E) snd_timer(E) gpu_sched(E) snd(E) soundcore(E) mc(E) drm_shmem_helper(E) cpufreq_dt(E) loop(E) fuse(E) efi_pstore(E) dm_mod(E) dax(E) configfs(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) crc32c_generic(E) xhci_plat_hcd(E) xhci_hcd(E) motorcomm(E) rk808_regulator(E) fan53555(E) gpio_rockchip(E) dwmac_rk(E) stmmac_platform(E) > [ 27.579108] crct10dif_ce(E) stmmac(E) pcs_xpcs(E) crct10dif_common(E) phylink(E) fixed(E) of_mdio(E) pinctrl_rockchip(E) phy_rockchip_inno_usb2(E) fixed_phy(E) sdhci_of_dwcmshc(E) dw_mmc_rockchip(E) fwnode_mdio(E) sdhci_pltfm(E) phy_rockchip_naneng_combphy(E) dw_mmc_pltfm(E) sdhci(E) dw_mmc(E) pl330(E) libphy(E) rockchipdrm(E) ptp(E) drm_dma_helper(E) analogix_dp(E) dw_hdmi(E) cec(E) rc_core(E) drm_display_helper(E) dw_mipi_dsi(E) pps_core(E) drm_kms_helper(E) ohci_platform(E) ohci_hcd(E) ehci_platform(E) io_domain(E) ehci_hcd(E) dwc3(E) i2c_rk3x(E) udc_core(E) usbcore(E) roles(E) ulpi(E) drm(E) usb_common(E) > [ 27.591758] CPU: 1 PID: 407 Comm: v4l_id Tainted: G E 6.4.0-0-pine64-arm64 #1 Debian 6.4~rc2-1~pine64 > [ 27.592705] Hardware name: Pine64 RK3566 Quartz64-A Board (DT) > [ 27.593223] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) > [ 27.593843] pc : hantro_try_fmt+0xb8/0x290 [hantro_vpu] > [ 27.594336] lr : hantro_try_fmt+0xac/0x290 [hantro_vpu] > [ 27.594811] sp : ffff80000aa9b670 > [ 27.595107] x29: ffff80000aa9b670 x28: ffff80000aa9bb60 x27: 0000000000000000 > [ 27.595746] x26: 0000000000000000 x25: ffff000108380008 x24: ffff8000014ebac0 > [ 27.596383] x23: 0000000000000001 x22: ffff000108380000 x21: 0000000000000000 > [ 27.597019] x20: ffff8000014f10f0 x19: ffff80000aa9b708 x18: 0000000000000010 > [ 27.597657] x17: 0000000000000000 x16: 0000000000000000 x15: ffff80000aa9b6a0 > [ 27.598292] x14: ffff0001043fe280 x13: 0000000000000001 x12: 0000000000000001 > [ 27.598927] x11: 0000000000000002 x10: 0000000000000003 x9 : 0000000000000002 > [ 27.599563] x8 : 000000000000001f x7 : 000000000000005f x6 : 0000000000000003 > [ 27.600199] x5 : ffff8000013e2583 x4 : ffff8000013e2580 x3 : 00000000ffffffff > [ 27.600834] x2 : 0000000000000010 x1 : 0000000000000000 x0 : 0000000034363253 > [ 27.601470] Call trace: > [ 27.601693] hantro_try_fmt+0xb8/0x290 [hantro_vpu] > [ 27.602143] hantro_set_fmt_out+0x44/0x388 [hantro_vpu] > [ 27.602617] hantro_reset_raw_fmt+0x7c/0xe0 [hantro_vpu] > [ 27.603096] hantro_set_fmt_cap+0x29c/0x2b8 [hantro_vpu] > [ 27.603575] hantro_reset_encoded_fmt+0x80/0xc0 [hantro_vpu] > [ 27.604086] hantro_reset_fmts+0x20/0x48 [hantro_vpu] > [ 27.604545] hantro_open+0xe0/0x208 [hantro_vpu] > [ 27.604965] v4l2_open+0x84/0x130 [videodev] > [ 27.605389] chrdev_open+0xd8/0x2d8 > [ 27.605712] do_dentry_open+0x1bc/0x490 > [ 27.606060] vfs_open+0x34/0x40 > [ 27.606345] path_openat+0x9c8/0xf20 > [ 27.606670] do_filp_open+0xa4/0x160 > [ 27.606991] do_sys_openat2+0xc8/0x188 > [ 27.607327] __arm64_sys_openat+0x6c/0xb8 > [ 27.607687] invoke_syscall+0x78/0x108 > [ 27.608029] el0_svc_common.constprop.0+0xd4/0x100 > [ 27.608456] do_el0_svc+0x40/0xa8 > [ 27.608755] el0_svc+0x34/0xd8 > [ 27.609036] el0t_64_sync_handler+0xf4/0x120 > [ 27.609420] el0t_64_sync+0x190/0x198 > [ 27.609753] Code: 97fbc8e2 f94056c1 52864a60 72a686c0 (b9400821) > [ 27.610297] ---[ end trace 0000000000000000 ]---
On Monday, 22 May 2023 18:17:39 CEST Benjamin Gaignard wrote: > Le 20/05/2023 à 00:34, Diederik de Haas a écrit : > > On Thursday, 13 April 2023 21:52:50 CEST Nicolas Dufresne wrote: > >> Le jeudi 13 avril 2023 à 10:10 -0300, Ezequiel Garcia a écrit : > >>> Benjamin, > >>> > >>> Please include the crash stracktrace in the commit. > >> > >> Careful with HTML message, they don't always make it in these ML and > >> tooling might not play well with the tooling. Perhaps it can be edited > >> while pulling ? Here's the info from Marek's bug report: > >> > >> hantro-vpu fdea0000.video-codec: Adding to iommu group 0 > >> hantro-vpu fdea0000.video-codec: registered rockchip,rk3568-vpu-dec as > >> /dev/video0 > >> hantro-vpu fdee0000.video-codec: Adding to iommu group 1 > >> hantro-vpu fdee0000.video-codec: registered rockchip,rk3568-vepu-enc as > >> /dev/video1 > >> Unable to handle kernel NULL pointer dereference at virtual address > >> 0000000000000008 > >> Mem abort info: > >> ESR = 0x0000000096000004 > >> EC = 0x25: DABT (current EL), IL = 32 bits > >> SET = 0, FnV = 0 > >> EA = 0, S1PTW = 0 > >> FSC = 0x04: level 0 translation fault > >> Data abort info: > >> ISV = 0, ISS = 0x00000004 > >> CM = 0, WnR = 0 > >> user pgtable: 4k pages, 48-bit VAs, pgdp=00000001f446f000 > >> [0000000000000008] pgd=0000000000000000, p4d=0000000000000000 > >> Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP > >> Modules linked in: hantro_vpu v4l2_vp9 v4l2_h264 v4l2_mem2mem > >> videobuf2_dma_contig snd_soc_simple_card display_connector > >> snd_soc_simple_card_utils videobuf2_memops crct10dif_ce dwmac_rk > >> rockchip_thermal videobuf2_v4l2 stmmac_platform rockchip_saradc > >> industrialio_triggered_buffer kfifo_buf stmmac videodev pcs_xpcs > >> rtc_rk808 videobuf2_common rockchipdrm panfrost mc drm_shmem_helper > >> analogix_dp gpu_sched dw_mipi_dsi dw_hdmi drm_display_helper ip_tables > >> x_tables ipv6 > >> CPU: 3 PID: 171 Comm: v4l_id Not tainted 6.3.0-rc2+ #13478 > >> Hardware name: Hardkernel ODROID-M1 (DT) > >> pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) > >> pc : hantro_try_fmt+0xb4/0x280 [hantro_vpu] > >> lr : hantro_try_fmt+0xa8/0x280 [hantro_vpu] > >> ... > >> Call trace: > >> hantro_try_fmt+0xb4/0x280 [hantro_vpu] > >> hantro_set_fmt_out+0x3c/0x278 [hantro_vpu] > >> hantro_reset_raw_fmt+0x94/0xb4 [hantro_vpu] > >> hantro_set_fmt_cap+0x23c/0x250 [hantro_vpu] > >> hantro_reset_fmts+0x94/0xcc [hantro_vpu] > >> hantro_open+0xd4/0x20c [hantro_vpu] > >> v4l2_open+0x80/0x120 [videodev] > >> chrdev_open+0xc0/0x22c > >> do_dentry_open+0x13c/0x490 > >> vfs_open+0x2c/0x38 > >> path_openat+0x550/0x938 > >> do_filp_open+0x80/0x12c > >> do_sys_openat2+0xb4/0x16c > >> __arm64_sys_openat+0x64/0xac > >> invoke_syscall+0x48/0x114 > >> el0_svc_common.constprop.0+0xfc/0x11c > >> do_el0_svc+0x38/0xa4 > >> el0_svc+0x48/0xb8 > >> el0t_64_sync_handler+0xb8/0xbc > >> el0t_64_sync+0x190/0x194 > >> Code: 97fe726c f940aa80 52864a61 72a686c1 (b9400800) > >> ---[ end trace 0000000000000000 ]--- > > > > When I booted into my 6.4-rc1 (but also rc2) kernel on my > > Pine64 Quartz64 Model A, I noticed a crash which seems the same as > > above, but I didn't have such a crash with my 6.3 kernel. > > Searching for 'hantro' led me to this commit as the most likely culprit > > but when I build a new 6.4-rcX kernel with this commit reverted, > > I still had this crash. > > Do you have suggestions which commit would then be the likely culprit? > > This patch fix the crash at boot time, revert it doesn't seem to be the > solution. Maybe this proposal from Marek can help you ? > https://patchwork.kernel.org/project/linux-media/patch/20230421104759.223646 > 3-1-m.szyprowski@samsung.com/ That helped :) After applying that patch I no longer have the crash. Thanks! Regards, Diederik
[CCing the regression list, as it should be in the loop for regressions: https://docs.kernel.org/admin-guide/reporting-regressions.html] [TLDR: I'm adding this report to the list of tracked Linux kernel regressions; the text you find below is based on a few templates paragraphs you might have encountered already in similar form. See link in footer if these mails annoy you.] On 20.05.23 00:34, Diederik de Haas wrote: > > On Thursday, 13 April 2023 21:52:50 CEST Nicolas Dufresne wrote: >> Le jeudi 13 avril 2023 à 10:10 -0300, Ezequiel Garcia a écrit : > [....] > > When I booted into my 6.4-rc1 (but also rc2) kernel on my > Pine64 Quartz64 Model A, I noticed a crash which seems the same as > above, but I didn't have such a crash with my 6.3 kernel. > Searching for 'hantro' led me to this commit as the most likely culprit > but when I build a new 6.4-rcX kernel with this commit reverted, > I still had this crash. > Do you have suggestions which commit would then be the likely culprit? > [...] Thanks for the report. To be sure the issue doesn't fall through the cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression tracking bot: #regzbot ^introduced db6f68b51e5c #regzbot title media: verisilicon: null pointer dereference in try_fmt #regzbot ignore-activity This isn't a regression? This issue or a fix for it are already discussed somewhere else? It was fixed already? You want to clarify when the regression started to happen? Or point out I got the title or something else totally wrong? Then just reply and tell me -- ideally while also telling regzbot about it, as explained by the page listed in the footer of this mail. Developers: When fixing the issue, remember to add 'Link:' tags pointing to the report (the parent of this mail). See page linked in footer for details. Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) -- Everything you wanna know about Linux kernel regression tracking: https://linux-regtracking.leemhuis.info/about/#tldr That page also explains what to do if mails like this annoy you.
CCing the Regression list and a bunch of other people that were CCed in threads that look related: On 23.05.23 00:38, Diederik de Haas wrote: > On Monday, 22 May 2023 18:17:39 CEST Benjamin Gaignard wrote: >> Le 20/05/2023 à 00:34, Diederik de Haas a écrit : >>> On Thursday, 13 April 2023 21:52:50 CEST Nicolas Dufresne wrote: > [...] >>> When I booted into my 6.4-rc1 (but also rc2) kernel on my >>> Pine64 Quartz64 Model A, I noticed a crash which seems the same as >>> above, but I didn't have such a crash with my 6.3 kernel. >>> Searching for 'hantro' led me to this commit as the most likely culprit >>> but when I build a new 6.4-rcX kernel with this commit reverted, >>> I still had this crash. >>> Do you have suggestions which commit would then be the likely culprit? >> >> This patch fix the crash at boot time, revert it doesn't seem to be the >> solution. Maybe this proposal from Marek can help you ? >> >> https://patchwork.kernel.org/project/linux-media/patch/20230421104759.2236463-1-m.szyprowski@samsung.com/ > > That helped :) After applying that patch I no longer have the crash. > Thanks! That regression fix is now a month old, but not yet merged afaics -- guess due to Nicolas comment that wasn't addressed yet and likely requires a updated patch. Michael afaics a week ago posted a patch that to my *very limited understanding of things* (I hope I don't confuse matters here!) seems to address the same problem, but slightly differently: https://lore.kernel.org/all/20230516091209.3098262-1-m.tretter@pengutronix.de/ No reply yet. That's all a bit unfortunate, as it's not how regression fixes should be dealt with -- and caused multiple people headaches that could have been avoided. :-/ But well, things happen. But it leads to the question: How can we finally address the issue quickly now to ensure is doesn't cause headaches for even more people? Marek, Michael, could you work on a patch together that we then get somewhat fast-tracked to Linus to avoid him getting even more unhappy about the state of things[1]? Ciao, Thorsten [1] see https://lore.kernel.org/all/CAHk-=wgzU8_dGn0Yg+DyX7ammTkDUCyEJ4C=NvnHRhxKWC7Wpw@mail.gmail.com/
On Tue, 23 May 2023 12:50:42 +0200, Linux regression tracking (Thorsten Leemhuis) wrote: > CCing the Regression list and a bunch of other people that were CCed in > threads that look related: Thanks! > > On 23.05.23 00:38, Diederik de Haas wrote: > > On Monday, 22 May 2023 18:17:39 CEST Benjamin Gaignard wrote: > >> Le 20/05/2023 à 00:34, Diederik de Haas a écrit : > >>> On Thursday, 13 April 2023 21:52:50 CEST Nicolas Dufresne wrote: > > [...] > >>> When I booted into my 6.4-rc1 (but also rc2) kernel on my > >>> Pine64 Quartz64 Model A, I noticed a crash which seems the same as > >>> above, but I didn't have such a crash with my 6.3 kernel. > >>> Searching for 'hantro' led me to this commit as the most likely culprit > >>> but when I build a new 6.4-rcX kernel with this commit reverted, > >>> I still had this crash. > >>> Do you have suggestions which commit would then be the likely culprit? > >> > >> This patch fix the crash at boot time, revert it doesn't seem to be the > >> solution. Maybe this proposal from Marek can help you ? > >> > >> https://patchwork.kernel.org/project/linux-media/patch/20230421104759.2236463-1-m.szyprowski@samsung.com/ > > > > That helped :) After applying that patch I no longer have the crash. > > Thanks! > > That regression fix is now a month old, but not yet merged afaics -- > guess due to Nicolas comment that wasn't addressed yet and likely > requires a updated patch. I agree with Nicolas comment on that patch and it needs to be updated. > > Michael afaics a week ago posted a patch that to my *very limited > understanding of things* (I hope I don't confuse matters here!) seems to > address the same problem, but slightly differently: > https://lore.kernel.org/all/20230516091209.3098262-1-m.tretter@pengutronix.de/ Correct, my patch addresses the same problem. > > No reply yet. > > That's all a bit unfortunate, as it's not how regression fixes should be > dealt with -- and caused multiple people headaches that could have been > avoided. :-/ > > But well, things happen. But it leads to the question: > > How can we finally address the issue quickly now to ensure is doesn't > cause headaches for even more people? > > Marek, Michael, could you work on a patch together that we then get > somewhat fast-tracked to Linus to avoid him getting even more unhappy > about the state of things[1]? Marek, if you have an updated patch, I will happily test and review it. Otherwise, please take a look at my patch. Michael
On 23.05.2023 16:54, Michael Tretter wrote: > On Tue, 23 May 2023 12:50:42 +0200, Linux regression tracking (Thorsten Leemhuis) wrote: >> CCing the Regression list and a bunch of other people that were CCed in >> threads that look related: > Thanks! > >> On 23.05.23 00:38, Diederik de Haas wrote: >>> On Monday, 22 May 2023 18:17:39 CEST Benjamin Gaignard wrote: >>>> Le 20/05/2023 à 00:34, Diederik de Haas a écrit : >>>>> On Thursday, 13 April 2023 21:52:50 CEST Nicolas Dufresne wrote: >>> [...] >>>>> When I booted into my 6.4-rc1 (but also rc2) kernel on my >>>>> Pine64 Quartz64 Model A, I noticed a crash which seems the same as >>>>> above, but I didn't have such a crash with my 6.3 kernel. >>>>> Searching for 'hantro' led me to this commit as the most likely culprit >>>>> but when I build a new 6.4-rcX kernel with this commit reverted, >>>>> I still had this crash. >>>>> Do you have suggestions which commit would then be the likely culprit? >>>> This patch fix the crash at boot time, revert it doesn't seem to be the >>>> solution. Maybe this proposal from Marek can help you ? >>>> >>>> https://patchwork.kernel.org/project/linux-media/patch/20230421104759.2236463-1-m.szyprowski@samsung.com/ >>> That helped :) After applying that patch I no longer have the crash. >>> Thanks! >> That regression fix is now a month old, but not yet merged afaics -- >> guess due to Nicolas comment that wasn't addressed yet and likely >> requires a updated patch. > I agree with Nicolas comment on that patch and it needs to be updated. > >> Michael afaics a week ago posted a patch that to my *very limited >> understanding of things* (I hope I don't confuse matters here!) seems to >> address the same problem, but slightly differently: >> https://lore.kernel.org/all/20230516091209.3098262-1-m.tretter@pengutronix.de/ > Correct, my patch addresses the same problem. > >> No reply yet. >> >> That's all a bit unfortunate, as it's not how regression fixes should be >> dealt with -- and caused multiple people headaches that could have been >> avoided. :-/ >> >> But well, things happen. But it leads to the question: >> >> How can we finally address the issue quickly now to ensure is doesn't >> cause headaches for even more people? >> >> Marek, Michael, could you work on a patch together that we then get >> somewhat fast-tracked to Linus to avoid him getting even more unhappy >> about the state of things[1]? > Marek, if you have an updated patch, I will happily test and review it. > Otherwise, please take a look at my patch. Go ahead with your, my was just a blind try eliminating the oops during driver probe. Best regards
Hi Micheal, Le mardi 23 mai 2023 à 16:54 +0200, Michael Tretter a écrit : > On Tue, 23 May 2023 12:50:42 +0200, Linux regression tracking (Thorsten Leemhuis) wrote: > > CCing the Regression list and a bunch of other people that were CCed in > > threads that look related: > > Thanks! > > > > > On 23.05.23 00:38, Diederik de Haas wrote: > > > On Monday, 22 May 2023 18:17:39 CEST Benjamin Gaignard wrote: > > > > Le 20/05/2023 à 00:34, Diederik de Haas a écrit : > > > > > On Thursday, 13 April 2023 21:52:50 CEST Nicolas Dufresne wrote: > > > [...] > > > > > When I booted into my 6.4-rc1 (but also rc2) kernel on my > > > > > Pine64 Quartz64 Model A, I noticed a crash which seems the same as > > > > > above, but I didn't have such a crash with my 6.3 kernel. > > > > > Searching for 'hantro' led me to this commit as the most likely culprit > > > > > but when I build a new 6.4-rcX kernel with this commit reverted, > > > > > I still had this crash. > > > > > Do you have suggestions which commit would then be the likely culprit? > > > > > > > > This patch fix the crash at boot time, revert it doesn't seem to be the > > > > solution. Maybe this proposal from Marek can help you ? > > > > > > > > https://patchwork.kernel.org/project/linux-media/patch/20230421104759.2236463-1-m.szyprowski@samsung.com/ > > > > > > That helped :) After applying that patch I no longer have the crash. > > > Thanks! > > > > That regression fix is now a month old, but not yet merged afaics -- > > guess due to Nicolas comment that wasn't addressed yet and likely > > requires a updated patch. > > I agree with Nicolas comment on that patch and it needs to be updated. > > > > > Michael afaics a week ago posted a patch that to my *very limited > > understanding of things* (I hope I don't confuse matters here!) seems to > > address the same problem, but slightly differently: > > https://lore.kernel.org/all/20230516091209.3098262-1-m.tretter@pengutronix.de/ > > Correct, my patch addresses the same problem. Sorry, just got really busy and missed the second fix. From a hot fix stand point, your patch seems a lot safer. It does not go as far, and probably does not make the driver better, but considering we had such a slow response we need to do something about it. Ezequiel, will you be fine with the approach ? > > > > > No reply yet. > > > > That's all a bit unfortunate, as it's not how regression fixes should be > > dealt with -- and caused multiple people headaches that could have been > > avoided. :-/ > > > > But well, things happen. But it leads to the question: > > > > How can we finally address the issue quickly now to ensure is doesn't > > cause headaches for even more people? > > > > Marek, Michael, could you work on a patch together that we then get > > somewhat fast-tracked to Linus to avoid him getting even more unhappy > > about the state of things[1]? > > Marek, if you have an updated patch, I will happily test and review it. > Otherwise, please take a look at my patch. > > Michael >
Hi Benjamin, On Thu, 13 Apr 2023 12:47:56 +0200, Benjamin Gaignard wrote: > ctx->vpu_dst_fmt is no more initialized before calling hantro_try_fmt() > so assigne it to vpu_fmt led to crash the kernel. > Like for decoder case use 'fmt' as format for encoder and clean up > the code. > > Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com> > Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> > Fixes: db6f68b51e5c ("media: verisilicon: Do not set context src/dst formats in reset functions") > --- > version 2: > - Remove useless vpu_fmt. > > drivers/media/platform/verisilicon/hantro_v4l2.c | 10 +++------- > 1 file changed, 3 insertions(+), 7 deletions(-) > > diff --git a/drivers/media/platform/verisilicon/hantro_v4l2.c b/drivers/media/platform/verisilicon/hantro_v4l2.c > index 8f1414085f47..d71f79471396 100644 > --- a/drivers/media/platform/verisilicon/hantro_v4l2.c > +++ b/drivers/media/platform/verisilicon/hantro_v4l2.c > @@ -275,7 +275,7 @@ static int hantro_try_fmt(const struct hantro_ctx *ctx, > struct v4l2_pix_format_mplane *pix_mp, > enum v4l2_buf_type type) > { > - const struct hantro_fmt *fmt, *vpu_fmt; > + const struct hantro_fmt *fmt; > bool capture = V4L2_TYPE_IS_CAPTURE(type); > bool coded; > > @@ -295,11 +295,7 @@ static int hantro_try_fmt(const struct hantro_ctx *ctx, > > if (coded) { > pix_mp->num_planes = 1; > - vpu_fmt = fmt; > - } else if (ctx->is_encoder) { > - vpu_fmt = ctx->vpu_dst_fmt; > - } else { > - vpu_fmt = fmt; > + } else if (!ctx->is_encoder) { > /* > * Width/height on the CAPTURE end of a decoder are ignored and > * replaced by the OUTPUT ones. > @@ -311,7 +307,7 @@ static int hantro_try_fmt(const struct hantro_ctx *ctx, > pix_mp->field = V4L2_FIELD_NONE; > > v4l2_apply_frmsize_constraints(&pix_mp->width, &pix_mp->height, > - &vpu_fmt->frmsize); > + &fmt->frmsize); This causes a regression on the OUTPUT device of the encoder. fmt->frmsize is only valid for coded ("bitstream") formats, but fmt on the OUTPUT of an encoder will be a raw format. This results in width and height to be clamped to 0. I think the correct fix would be to apply the frmsize constraints of the currently configured coded format, but as ctx->vpu_dst_fmt is not initialized before calling this code, I don't know how to get the coded format. Michael > > if (!coded) { > /* Fill remaining fields */ > -- > 2.34.1 >
Le 01/06/2023 à 15:27, Michael Tretter a écrit : > Hi Benjamin, > > On Thu, 13 Apr 2023 12:47:56 +0200, Benjamin Gaignard wrote: >> ctx->vpu_dst_fmt is no more initialized before calling hantro_try_fmt() >> so assigne it to vpu_fmt led to crash the kernel. >> Like for decoder case use 'fmt' as format for encoder and clean up >> the code. >> >> Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com> >> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> >> Fixes: db6f68b51e5c ("media: verisilicon: Do not set context src/dst formats in reset functions") >> --- >> version 2: >> - Remove useless vpu_fmt. >> >> drivers/media/platform/verisilicon/hantro_v4l2.c | 10 +++------- >> 1 file changed, 3 insertions(+), 7 deletions(-) >> >> diff --git a/drivers/media/platform/verisilicon/hantro_v4l2.c b/drivers/media/platform/verisilicon/hantro_v4l2.c >> index 8f1414085f47..d71f79471396 100644 >> --- a/drivers/media/platform/verisilicon/hantro_v4l2.c >> +++ b/drivers/media/platform/verisilicon/hantro_v4l2.c >> @@ -275,7 +275,7 @@ static int hantro_try_fmt(const struct hantro_ctx *ctx, >> struct v4l2_pix_format_mplane *pix_mp, >> enum v4l2_buf_type type) >> { >> - const struct hantro_fmt *fmt, *vpu_fmt; >> + const struct hantro_fmt *fmt; >> bool capture = V4L2_TYPE_IS_CAPTURE(type); >> bool coded; >> >> @@ -295,11 +295,7 @@ static int hantro_try_fmt(const struct hantro_ctx *ctx, >> >> if (coded) { >> pix_mp->num_planes = 1; >> - vpu_fmt = fmt; >> - } else if (ctx->is_encoder) { >> - vpu_fmt = ctx->vpu_dst_fmt; >> - } else { >> - vpu_fmt = fmt; >> + } else if (!ctx->is_encoder) { >> /* >> * Width/height on the CAPTURE end of a decoder are ignored and >> * replaced by the OUTPUT ones. >> @@ -311,7 +307,7 @@ static int hantro_try_fmt(const struct hantro_ctx *ctx, >> pix_mp->field = V4L2_FIELD_NONE; >> >> v4l2_apply_frmsize_constraints(&pix_mp->width, &pix_mp->height, >> - &vpu_fmt->frmsize); >> + &fmt->frmsize); > This causes a regression on the OUTPUT device of the encoder. fmt->frmsize is > only valid for coded ("bitstream") formats, but fmt on the OUTPUT of an > encoder will be a raw format. This results in width and height to be clamped > to 0. > > I think the correct fix would be to apply the frmsize constraints of the > currently configured coded format, but as ctx->vpu_dst_fmt is not initialized > before calling this code, I don't know how to get the coded format. if ctx->dst_fmt is correctly set (and it should be) then doing: pix_mp->width = ctx->dst_fmt.width; pix_mp->height = ctx->dst_fmt.height; should solve the issue. Benjamin > > Michael > >> >> if (!coded) { >> /* Fill remaining fields */ >> -- >> 2.34.1 >>
On Thu, 01 Jun 2023 16:08:13 +0200, Benjamin Gaignard wrote: > Le 01/06/2023 à 15:27, Michael Tretter a écrit : > > On Thu, 13 Apr 2023 12:47:56 +0200, Benjamin Gaignard wrote: > > > ctx->vpu_dst_fmt is no more initialized before calling hantro_try_fmt() > > > so assigne it to vpu_fmt led to crash the kernel. > > > Like for decoder case use 'fmt' as format for encoder and clean up > > > the code. > > > > > > Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com> > > > Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> > > > Fixes: db6f68b51e5c ("media: verisilicon: Do not set context src/dst formats in reset functions") > > > --- > > > version 2: > > > - Remove useless vpu_fmt. > > > > > > drivers/media/platform/verisilicon/hantro_v4l2.c | 10 +++------- > > > 1 file changed, 3 insertions(+), 7 deletions(-) > > > > > > diff --git a/drivers/media/platform/verisilicon/hantro_v4l2.c b/drivers/media/platform/verisilicon/hantro_v4l2.c > > > index 8f1414085f47..d71f79471396 100644 > > > --- a/drivers/media/platform/verisilicon/hantro_v4l2.c > > > +++ b/drivers/media/platform/verisilicon/hantro_v4l2.c > > > @@ -275,7 +275,7 @@ static int hantro_try_fmt(const struct hantro_ctx *ctx, > > > struct v4l2_pix_format_mplane *pix_mp, > > > enum v4l2_buf_type type) > > > { > > > - const struct hantro_fmt *fmt, *vpu_fmt; > > > + const struct hantro_fmt *fmt; > > > bool capture = V4L2_TYPE_IS_CAPTURE(type); > > > bool coded; > > > @@ -295,11 +295,7 @@ static int hantro_try_fmt(const struct hantro_ctx *ctx, > > > if (coded) { > > > pix_mp->num_planes = 1; > > > - vpu_fmt = fmt; > > > - } else if (ctx->is_encoder) { > > > - vpu_fmt = ctx->vpu_dst_fmt; > > > - } else { > > > - vpu_fmt = fmt; > > > + } else if (!ctx->is_encoder) { > > > /* > > > * Width/height on the CAPTURE end of a decoder are ignored and > > > * replaced by the OUTPUT ones. > > > @@ -311,7 +307,7 @@ static int hantro_try_fmt(const struct hantro_ctx *ctx, > > > pix_mp->field = V4L2_FIELD_NONE; > > > v4l2_apply_frmsize_constraints(&pix_mp->width, &pix_mp->height, > > > - &vpu_fmt->frmsize); > > > + &fmt->frmsize); > > This causes a regression on the OUTPUT device of the encoder. fmt->frmsize is > > only valid for coded ("bitstream") formats, but fmt on the OUTPUT of an > > encoder will be a raw format. This results in width and height to be clamped > > to 0. > > > > I think the correct fix would be to apply the frmsize constraints of the > > currently configured coded format, but as ctx->vpu_dst_fmt is not initialized > > before calling this code, I don't know how to get the coded format. > > if ctx->dst_fmt is correctly set (and it should be) then doing: > > pix_mp->width = ctx->dst_fmt.width; > pix_mp->height = ctx->dst_fmt.height; > > should solve the issue. Using the width and height of dst_fmt for OUTPUT is not correct, since the v4l2 stateless encoder spec dictates that the size is set on the OUTPUT device and will be ignored on the CAPTURE device. I sent a patch [0] to apply the frame constraints using dst_fmt. Michael [0] https://lore.kernel.org/linux-media/20230706071510.1717684-1-m.tretter@pengutronix.de/ > > > if (!coded) { > > > /* Fill remaining fields */ > > > -- > > > 2.34.1 > > > >
diff --git a/drivers/media/platform/verisilicon/hantro_v4l2.c b/drivers/media/platform/verisilicon/hantro_v4l2.c index 8f1414085f47..d71f79471396 100644 --- a/drivers/media/platform/verisilicon/hantro_v4l2.c +++ b/drivers/media/platform/verisilicon/hantro_v4l2.c @@ -275,7 +275,7 @@ static int hantro_try_fmt(const struct hantro_ctx *ctx, struct v4l2_pix_format_mplane *pix_mp, enum v4l2_buf_type type) { - const struct hantro_fmt *fmt, *vpu_fmt; + const struct hantro_fmt *fmt; bool capture = V4L2_TYPE_IS_CAPTURE(type); bool coded; @@ -295,11 +295,7 @@ static int hantro_try_fmt(const struct hantro_ctx *ctx, if (coded) { pix_mp->num_planes = 1; - vpu_fmt = fmt; - } else if (ctx->is_encoder) { - vpu_fmt = ctx->vpu_dst_fmt; - } else { - vpu_fmt = fmt; + } else if (!ctx->is_encoder) { /* * Width/height on the CAPTURE end of a decoder are ignored and * replaced by the OUTPUT ones. @@ -311,7 +307,7 @@ static int hantro_try_fmt(const struct hantro_ctx *ctx, pix_mp->field = V4L2_FIELD_NONE; v4l2_apply_frmsize_constraints(&pix_mp->width, &pix_mp->height, - &vpu_fmt->frmsize); + &fmt->frmsize); if (!coded) { /* Fill remaining fields */