Message ID | 20230824013858.303105-1-marex@denx.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | media: hantro: Check whether reset op is defined before use | expand |
On Thu, Aug 24, 2023 at 9:39 AM Marek Vasut <marex@denx.de> wrote: > > The i.MX8MM/N/P does not define the .reset op since reset of the VPU is > done by genpd. Check whether the .reset op is defined before calling it > to avoid NULL pointer dereference. > > Note that the Fixes tag is set to the commit which removed the reset op > from i.MX8M Hantro G2 implementation, this is because before this commit > all the implementations did define the .reset op. > > Fixes: 6971efb70ac3 ("media: hantro: Allow i.MX8MQ G1 and G2 to run independently") > Signed-off-by: Marek Vasut <marex@denx.de> Had the same change in my local tree, so Reviewed-by: Chen-Yu Tsai <wenst@chromium.org> Tested-by: Chen-Yu Tsai <wenst@chromium.org> > --- > Cc: Adam Ford <aford173@gmail.com> > Cc: Benjamin Gaignard <benjamin.gaignard@collabora.com> > Cc: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar> > Cc: Mauro Carvalho Chehab <mchehab@kernel.org> > Cc: Philipp Zabel <p.zabel@pengutronix.de> > Cc: linux-media@vger.kernel.org > Cc: linux-rockchip@lists.infradead.org > --- > drivers/media/platform/verisilicon/hantro_drv.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/media/platform/verisilicon/hantro_drv.c b/drivers/media/platform/verisilicon/hantro_drv.c > index 423fc85d79ee3..50ec24c753e9e 100644 > --- a/drivers/media/platform/verisilicon/hantro_drv.c > +++ b/drivers/media/platform/verisilicon/hantro_drv.c > @@ -125,7 +125,8 @@ void hantro_watchdog(struct work_struct *work) > ctx = v4l2_m2m_get_curr_priv(vpu->m2m_dev); > if (ctx) { > vpu_err("frame processing timed out!\n"); > - ctx->codec_ops->reset(ctx); > + if (ctx->codec_ops->reset) > + ctx->codec_ops->reset(ctx); > hantro_job_finish(vpu, ctx, VB2_BUF_STATE_ERROR); > } > } > -- > 2.40.1 > > > _______________________________________________ > Linux-rockchip mailing list > Linux-rockchip@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-rockchip
On Wed, Aug 23, 2023 at 8:39 PM Marek Vasut <marex@denx.de> wrote: > > The i.MX8MM/N/P does not define the .reset op since reset of the VPU is > done by genpd. Check whether the .reset op is defined before calling it > to avoid NULL pointer dereference. > > Note that the Fixes tag is set to the commit which removed the reset op > from i.MX8M Hantro G2 implementation, this is because before this commit > all the implementations did define the .reset op. I am surprised I didn't have issues when I was testing the 8MQ and 8MM, but this makes sense. > > Fixes: 6971efb70ac3 ("media: hantro: Allow i.MX8MQ G1 and G2 to run independently") > Signed-off-by: Marek Vasut <marex@denx.de> Reviewed-by: Adam Ford <aford173@gmail.com> > --- > Cc: Adam Ford <aford173@gmail.com> > Cc: Benjamin Gaignard <benjamin.gaignard@collabora.com> > Cc: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar> > Cc: Mauro Carvalho Chehab <mchehab@kernel.org> > Cc: Philipp Zabel <p.zabel@pengutronix.de> > Cc: linux-media@vger.kernel.org > Cc: linux-rockchip@lists.infradead.org > --- > drivers/media/platform/verisilicon/hantro_drv.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/media/platform/verisilicon/hantro_drv.c b/drivers/media/platform/verisilicon/hantro_drv.c > index 423fc85d79ee3..50ec24c753e9e 100644 > --- a/drivers/media/platform/verisilicon/hantro_drv.c > +++ b/drivers/media/platform/verisilicon/hantro_drv.c > @@ -125,7 +125,8 @@ void hantro_watchdog(struct work_struct *work) > ctx = v4l2_m2m_get_curr_priv(vpu->m2m_dev); > if (ctx) { > vpu_err("frame processing timed out!\n"); > - ctx->codec_ops->reset(ctx); > + if (ctx->codec_ops->reset) > + ctx->codec_ops->reset(ctx); > hantro_job_finish(vpu, ctx, VB2_BUF_STATE_ERROR); > } > } > -- > 2.40.1 >
On 8/24/23 12:39, Adam Ford wrote: > On Wed, Aug 23, 2023 at 8:39 PM Marek Vasut <marex@denx.de> wrote: >> >> The i.MX8MM/N/P does not define the .reset op since reset of the VPU is >> done by genpd. Check whether the .reset op is defined before calling it >> to avoid NULL pointer dereference. >> >> Note that the Fixes tag is set to the commit which removed the reset op >> from i.MX8M Hantro G2 implementation, this is because before this commit >> all the implementations did define the .reset op. > > I am surprised I didn't have issues when I was testing the 8MQ and > 8MM, but this makes sense. You need to trigger the VPU watchdog to trigger the crash, that means you have to get the VPU into some weird state where it fails to decode frame. Then it triggers the reset and ... boom. See this patch, that contains a gstreamer invocation to generate such trigger condition input data: [PATCH] media: verisilicon: Do not enable G2 postproc downscale if source is narrower than destination " To generate input test data to trigger this bug, use e.g.: $ gst-launch-1.0 videotestsrc ! video/x-raw,width=272,height=256,format=I420 ! \ vp9enc ! matroskamux ! filesink location=/tmp/test.vp9 To trigger the bug upon decoding (note that the NV12 must be forced, as that assures the output data would pass the G2 postproc): $ gst-launch-1.0 filesrc location=/tmp/test.vp9 ! matroskademux ! vp9parse ! \ v4l2slvp9dec ! video/x-raw,format=NV12 ! videoconvert ! fbdevsink "
On Thu, Aug 24, 2023 at 9:08 PM Marek Vasut <marex@denx.de> wrote: > > On 8/24/23 12:39, Adam Ford wrote: > > On Wed, Aug 23, 2023 at 8:39 PM Marek Vasut <marex@denx.de> wrote: > >> > >> The i.MX8MM/N/P does not define the .reset op since reset of the VPU is > >> done by genpd. Check whether the .reset op is defined before calling it > >> to avoid NULL pointer dereference. > >> > >> Note that the Fixes tag is set to the commit which removed the reset op > >> from i.MX8M Hantro G2 implementation, this is because before this commit > >> all the implementations did define the .reset op. > > > > I am surprised I didn't have issues when I was testing the 8MQ and > > 8MM, but this makes sense. > > You need to trigger the VPU watchdog to trigger the crash, that means > you have to get the VPU into some weird state where it fails to decode > frame. Then it triggers the reset and ... boom. > > See this patch, that contains a gstreamer invocation to generate such > trigger condition input data: > > [PATCH] media: verisilicon: Do not enable G2 postproc downscale if > source is narrower than destination > > " > To generate input test data to trigger this bug, use e.g.: > $ gst-launch-1.0 videotestsrc ! > video/x-raw,width=272,height=256,format=I420 ! \ > vp9enc ! matroskamux ! filesink location=/tmp/test.vp9 > To trigger the bug upon decoding (note that the NV12 must be forced, as > that assures the output data would pass the G2 postproc): > $ gst-launch-1.0 filesrc location=/tmp/test.vp9 ! matroskademux ! > vp9parse ! \ > v4l2slvp9dec ! video/x-raw,format=NV12 ! videoconvert > ! fbdevsink > " Does it completely recover afterwards? In my previous trials the hardware ended up in some bizzare state: while decoding succeeds, the output's md5sum didn't match up. ChenYu
On 8/25/23 09:09, Chen-Yu Tsai wrote: > On Thu, Aug 24, 2023 at 9:08 PM Marek Vasut <marex@denx.de> wrote: >> >> On 8/24/23 12:39, Adam Ford wrote: >>> On Wed, Aug 23, 2023 at 8:39 PM Marek Vasut <marex@denx.de> wrote: >>>> >>>> The i.MX8MM/N/P does not define the .reset op since reset of the VPU is >>>> done by genpd. Check whether the .reset op is defined before calling it >>>> to avoid NULL pointer dereference. >>>> >>>> Note that the Fixes tag is set to the commit which removed the reset op >>>> from i.MX8M Hantro G2 implementation, this is because before this commit >>>> all the implementations did define the .reset op. >>> >>> I am surprised I didn't have issues when I was testing the 8MQ and >>> 8MM, but this makes sense. >> >> You need to trigger the VPU watchdog to trigger the crash, that means >> you have to get the VPU into some weird state where it fails to decode >> frame. Then it triggers the reset and ... boom. >> >> See this patch, that contains a gstreamer invocation to generate such >> trigger condition input data: >> >> [PATCH] media: verisilicon: Do not enable G2 postproc downscale if >> source is narrower than destination >> >> " >> To generate input test data to trigger this bug, use e.g.: >> $ gst-launch-1.0 videotestsrc ! >> video/x-raw,width=272,height=256,format=I420 ! \ >> vp9enc ! matroskamux ! filesink location=/tmp/test.vp9 >> To trigger the bug upon decoding (note that the NV12 must be forced, as >> that assures the output data would pass the G2 postproc): >> $ gst-launch-1.0 filesrc location=/tmp/test.vp9 ! matroskademux ! >> vp9parse ! \ >> v4l2slvp9dec ! video/x-raw,format=NV12 ! videoconvert >> ! fbdevsink >> " > > Does it completely recover afterwards? In my previous trials the hardware > ended up in some bizzare state: while decoding succeeds, the output's md5sum > didn't match up. Have you got a testcase that triggers this, one I can try ? I am not entirely sure whether this is happening here as well or not, but I can imagine that the power domain went down and back up between tests, so the VPU would be power cycled (and therefore reset) that way. So, I think it is worth testing that.
On Fri, Aug 25, 2023 at 4:33 PM Marek Vasut <marex@denx.de> wrote: > > On 8/25/23 09:09, Chen-Yu Tsai wrote: > > On Thu, Aug 24, 2023 at 9:08 PM Marek Vasut <marex@denx.de> wrote: > >> > >> On 8/24/23 12:39, Adam Ford wrote: > >>> On Wed, Aug 23, 2023 at 8:39 PM Marek Vasut <marex@denx.de> wrote: > >>>> > >>>> The i.MX8MM/N/P does not define the .reset op since reset of the VPU is > >>>> done by genpd. Check whether the .reset op is defined before calling it > >>>> to avoid NULL pointer dereference. > >>>> > >>>> Note that the Fixes tag is set to the commit which removed the reset op > >>>> from i.MX8M Hantro G2 implementation, this is because before this commit > >>>> all the implementations did define the .reset op. > >>> > >>> I am surprised I didn't have issues when I was testing the 8MQ and > >>> 8MM, but this makes sense. > >> > >> You need to trigger the VPU watchdog to trigger the crash, that means > >> you have to get the VPU into some weird state where it fails to decode > >> frame. Then it triggers the reset and ... boom. > >> > >> See this patch, that contains a gstreamer invocation to generate such > >> trigger condition input data: > >> > >> [PATCH] media: verisilicon: Do not enable G2 postproc downscale if > >> source is narrower than destination > >> > >> " > >> To generate input test data to trigger this bug, use e.g.: > >> $ gst-launch-1.0 videotestsrc ! > >> video/x-raw,width=272,height=256,format=I420 ! \ > >> vp9enc ! matroskamux ! filesink location=/tmp/test.vp9 > >> To trigger the bug upon decoding (note that the NV12 must be forced, as > >> that assures the output data would pass the G2 postproc): > >> $ gst-launch-1.0 filesrc location=/tmp/test.vp9 ! matroskademux ! > >> vp9parse ! \ > >> v4l2slvp9dec ! video/x-raw,format=NV12 ! videoconvert > >> ! fbdevsink > >> " > > > > Does it completely recover afterwards? In my previous trials the hardware > > ended up in some bizzare state: while decoding succeeds, the output's md5sum > > didn't match up. > > Have you got a testcase that triggers this, one I can try ? > > I am not entirely sure whether this is happening here as well or not, > but I can imagine that the power domain went down and back up between > tests, so the VPU would be power cycled (and therefore reset) that way. > So, I think it is worth testing that. This was last year while I was writing HEVC decoding code for Chromium. IIRC the SAODBLK_A_MainConcept_4 test vector from the official HEVC test suite does cause our stack to crash, but Gstreamer seemed to handle it OK. It could be that the Chromium decoder stack is passing bad values to the decoder. ChenYu
On 8/25/23 10:52, Chen-Yu Tsai wrote: > On Fri, Aug 25, 2023 at 4:33 PM Marek Vasut <marex@denx.de> wrote: >> >> On 8/25/23 09:09, Chen-Yu Tsai wrote: >>> On Thu, Aug 24, 2023 at 9:08 PM Marek Vasut <marex@denx.de> wrote: >>>> >>>> On 8/24/23 12:39, Adam Ford wrote: >>>>> On Wed, Aug 23, 2023 at 8:39 PM Marek Vasut <marex@denx.de> wrote: >>>>>> >>>>>> The i.MX8MM/N/P does not define the .reset op since reset of the VPU is >>>>>> done by genpd. Check whether the .reset op is defined before calling it >>>>>> to avoid NULL pointer dereference. >>>>>> >>>>>> Note that the Fixes tag is set to the commit which removed the reset op >>>>>> from i.MX8M Hantro G2 implementation, this is because before this commit >>>>>> all the implementations did define the .reset op. >>>>> >>>>> I am surprised I didn't have issues when I was testing the 8MQ and >>>>> 8MM, but this makes sense. >>>> >>>> You need to trigger the VPU watchdog to trigger the crash, that means >>>> you have to get the VPU into some weird state where it fails to decode >>>> frame. Then it triggers the reset and ... boom. >>>> >>>> See this patch, that contains a gstreamer invocation to generate such >>>> trigger condition input data: >>>> >>>> [PATCH] media: verisilicon: Do not enable G2 postproc downscale if >>>> source is narrower than destination >>>> >>>> " >>>> To generate input test data to trigger this bug, use e.g.: >>>> $ gst-launch-1.0 videotestsrc ! >>>> video/x-raw,width=272,height=256,format=I420 ! \ >>>> vp9enc ! matroskamux ! filesink location=/tmp/test.vp9 >>>> To trigger the bug upon decoding (note that the NV12 must be forced, as >>>> that assures the output data would pass the G2 postproc): >>>> $ gst-launch-1.0 filesrc location=/tmp/test.vp9 ! matroskademux ! >>>> vp9parse ! \ >>>> v4l2slvp9dec ! video/x-raw,format=NV12 ! videoconvert >>>> ! fbdevsink >>>> " >>> >>> Does it completely recover afterwards? In my previous trials the hardware >>> ended up in some bizzare state: while decoding succeeds, the output's md5sum >>> didn't match up. >> >> Have you got a testcase that triggers this, one I can try ? >> >> I am not entirely sure whether this is happening here as well or not, >> but I can imagine that the power domain went down and back up between >> tests, so the VPU would be power cycled (and therefore reset) that way. >> So, I think it is worth testing that. > > This was last year while I was writing HEVC decoding code for Chromium. > IIRC the SAODBLK_A_MainConcept_4 test vector from the official HEVC test > suite does cause our stack to crash, but Gstreamer seemed to handle it > OK. It could be that the Chromium decoder stack is passing bad values to > the decoder. That can be easily tested with ftrace enabled. I was just tracking down an issue with gstreamer and added the following patch to the hantro driver. Then: echo > /sys/kernel/debug/tracing/trace <run fail test> cat /sys/kernel/debug/tracing/trace > /tmp/fail.trace echo > /sys/kernel/debug/tracing/trace <run pass test> cat /sys/kernel/debug/tracing/trace > /tmp/pass.trace # remove time stamps etc. diff /tmp/{fail,pass}.trace You should see whether some register programming differs between gstreamer and chromium. diff --git a/drivers/media/platform/verisilicon/hantro.h b/drivers/media/platform/verisilicon/hantro.h index dea35a501ba30..529f1ab478ec8 100644 --- a/drivers/media/platform/verisilicon/hantro.h +++ b/drivers/media/platform/verisilicon/hantro.h @@ -353,8 +353,7 @@ extern int hantro_debug; #define vpu_debug(level, fmt, args...) \ do { \ - if (hantro_debug & BIT(level)) \ - pr_info("%s:%d: " fmt, \ + trace_printk("%s:%d: " fmt, \ __func__, __LINE__, ##args); \ } while (0)
On Sun, Aug 27, 2023 at 5:44 AM Marek Vasut <marex@denx.de> wrote: > > On 8/25/23 10:52, Chen-Yu Tsai wrote: > > On Fri, Aug 25, 2023 at 4:33 PM Marek Vasut <marex@denx.de> wrote: > >> > >> On 8/25/23 09:09, Chen-Yu Tsai wrote: > >>> On Thu, Aug 24, 2023 at 9:08 PM Marek Vasut <marex@denx.de> wrote: > >>>> > >>>> On 8/24/23 12:39, Adam Ford wrote: > >>>>> On Wed, Aug 23, 2023 at 8:39 PM Marek Vasut <marex@denx.de> wrote: > >>>>>> > >>>>>> The i.MX8MM/N/P does not define the .reset op since reset of the VPU is > >>>>>> done by genpd. Check whether the .reset op is defined before calling it > >>>>>> to avoid NULL pointer dereference. > >>>>>> > >>>>>> Note that the Fixes tag is set to the commit which removed the reset op > >>>>>> from i.MX8M Hantro G2 implementation, this is because before this commit > >>>>>> all the implementations did define the .reset op. > >>>>> > >>>>> I am surprised I didn't have issues when I was testing the 8MQ and > >>>>> 8MM, but this makes sense. > >>>> > >>>> You need to trigger the VPU watchdog to trigger the crash, that means > >>>> you have to get the VPU into some weird state where it fails to decode > >>>> frame. Then it triggers the reset and ... boom. > >>>> > >>>> See this patch, that contains a gstreamer invocation to generate such > >>>> trigger condition input data: > >>>> > >>>> [PATCH] media: verisilicon: Do not enable G2 postproc downscale if > >>>> source is narrower than destination > >>>> > >>>> " > >>>> To generate input test data to trigger this bug, use e.g.: > >>>> $ gst-launch-1.0 videotestsrc ! > >>>> video/x-raw,width=272,height=256,format=I420 ! \ > >>>> vp9enc ! matroskamux ! filesink location=/tmp/test.vp9 > >>>> To trigger the bug upon decoding (note that the NV12 must be forced, as > >>>> that assures the output data would pass the G2 postproc): > >>>> $ gst-launch-1.0 filesrc location=/tmp/test.vp9 ! matroskademux ! > >>>> vp9parse ! \ > >>>> v4l2slvp9dec ! video/x-raw,format=NV12 ! videoconvert > >>>> ! fbdevsink > >>>> " > >>> > >>> Does it completely recover afterwards? In my previous trials the hardware > >>> ended up in some bizzare state: while decoding succeeds, the output's md5sum > >>> didn't match up. > >> > >> Have you got a testcase that triggers this, one I can try ? > >> > >> I am not entirely sure whether this is happening here as well or not, > >> but I can imagine that the power domain went down and back up between > >> tests, so the VPU would be power cycled (and therefore reset) that way. > >> So, I think it is worth testing that. > > > > This was last year while I was writing HEVC decoding code for Chromium. > > IIRC the SAODBLK_A_MainConcept_4 test vector from the official HEVC test > > suite does cause our stack to crash, but Gstreamer seemed to handle it > > OK. It could be that the Chromium decoder stack is passing bad values to > > the decoder. > > That can be easily tested with ftrace enabled. I was just tracking down > an issue with gstreamer and added the following patch to the hantro > driver. Then: > > echo > /sys/kernel/debug/tracing/trace > <run fail test> > cat /sys/kernel/debug/tracing/trace > /tmp/fail.trace > echo > /sys/kernel/debug/tracing/trace > <run pass test> > cat /sys/kernel/debug/tracing/trace > /tmp/pass.trace > # remove time stamps etc. > diff /tmp/{fail,pass}.trace > > You should see whether some register programming differs between > gstreamer and chromium. I ended up using VISL to compare the controls set. I did find a bug. It still hard hangs after a couple frames, so I guess I'd need to use your method, but do printk instead. BTW, I wonder if we shouldn't add a reset op, if only just to stop the hardware? That is, do the same two register writes as in the Hantro G2 interrupt handler. ChenYu > diff --git a/drivers/media/platform/verisilicon/hantro.h > b/drivers/media/platform/verisilicon/hantro.h > index dea35a501ba30..529f1ab478ec8 100644 > --- a/drivers/media/platform/verisilicon/hantro.h > +++ b/drivers/media/platform/verisilicon/hantro.h > @@ -353,8 +353,7 @@ extern int hantro_debug; > > #define vpu_debug(level, fmt, args...) \ > do { \ > - if (hantro_debug & BIT(level)) \ > - pr_info("%s:%d: " fmt, \ > + trace_printk("%s:%d: " fmt, \ > __func__, __LINE__, ##args); \ > } while (0)
On 8/30/23 05:38, Chen-Yu Tsai wrote: > On Sun, Aug 27, 2023 at 5:44 AM Marek Vasut <marex@denx.de> wrote: >> >> On 8/25/23 10:52, Chen-Yu Tsai wrote: >>> On Fri, Aug 25, 2023 at 4:33 PM Marek Vasut <marex@denx.de> wrote: >>>> >>>> On 8/25/23 09:09, Chen-Yu Tsai wrote: >>>>> On Thu, Aug 24, 2023 at 9:08 PM Marek Vasut <marex@denx.de> wrote: >>>>>> >>>>>> On 8/24/23 12:39, Adam Ford wrote: >>>>>>> On Wed, Aug 23, 2023 at 8:39 PM Marek Vasut <marex@denx.de> wrote: >>>>>>>> >>>>>>>> The i.MX8MM/N/P does not define the .reset op since reset of the VPU is >>>>>>>> done by genpd. Check whether the .reset op is defined before calling it >>>>>>>> to avoid NULL pointer dereference. >>>>>>>> >>>>>>>> Note that the Fixes tag is set to the commit which removed the reset op >>>>>>>> from i.MX8M Hantro G2 implementation, this is because before this commit >>>>>>>> all the implementations did define the .reset op. >>>>>>> >>>>>>> I am surprised I didn't have issues when I was testing the 8MQ and >>>>>>> 8MM, but this makes sense. >>>>>> >>>>>> You need to trigger the VPU watchdog to trigger the crash, that means >>>>>> you have to get the VPU into some weird state where it fails to decode >>>>>> frame. Then it triggers the reset and ... boom. >>>>>> >>>>>> See this patch, that contains a gstreamer invocation to generate such >>>>>> trigger condition input data: >>>>>> >>>>>> [PATCH] media: verisilicon: Do not enable G2 postproc downscale if >>>>>> source is narrower than destination >>>>>> >>>>>> " >>>>>> To generate input test data to trigger this bug, use e.g.: >>>>>> $ gst-launch-1.0 videotestsrc ! >>>>>> video/x-raw,width=272,height=256,format=I420 ! \ >>>>>> vp9enc ! matroskamux ! filesink location=/tmp/test.vp9 >>>>>> To trigger the bug upon decoding (note that the NV12 must be forced, as >>>>>> that assures the output data would pass the G2 postproc): >>>>>> $ gst-launch-1.0 filesrc location=/tmp/test.vp9 ! matroskademux ! >>>>>> vp9parse ! \ >>>>>> v4l2slvp9dec ! video/x-raw,format=NV12 ! videoconvert >>>>>> ! fbdevsink >>>>>> " >>>>> >>>>> Does it completely recover afterwards? In my previous trials the hardware >>>>> ended up in some bizzare state: while decoding succeeds, the output's md5sum >>>>> didn't match up. >>>> >>>> Have you got a testcase that triggers this, one I can try ? >>>> >>>> I am not entirely sure whether this is happening here as well or not, >>>> but I can imagine that the power domain went down and back up between >>>> tests, so the VPU would be power cycled (and therefore reset) that way. >>>> So, I think it is worth testing that. >>> >>> This was last year while I was writing HEVC decoding code for Chromium. >>> IIRC the SAODBLK_A_MainConcept_4 test vector from the official HEVC test >>> suite does cause our stack to crash, but Gstreamer seemed to handle it >>> OK. It could be that the Chromium decoder stack is passing bad values to >>> the decoder. >> >> That can be easily tested with ftrace enabled. I was just tracking down >> an issue with gstreamer and added the following patch to the hantro >> driver. Then: >> >> echo > /sys/kernel/debug/tracing/trace >> <run fail test> >> cat /sys/kernel/debug/tracing/trace > /tmp/fail.trace >> echo > /sys/kernel/debug/tracing/trace >> <run pass test> >> cat /sys/kernel/debug/tracing/trace > /tmp/pass.trace >> # remove time stamps etc. >> diff /tmp/{fail,pass}.trace >> >> You should see whether some register programming differs between >> gstreamer and chromium. > > I ended up using VISL to compare the controls set. I did find a bug. > It still hard hangs after a couple frames, so I guess I'd need to use > your method, but do printk instead. > > BTW, I wonder if we shouldn't add a reset op, if only just to stop the > hardware? That is, do the same two register writes as in the Hantro G2 > interrupt handler. You mean these two ? 38 vdpu_write(vpu, 0, G2_REG_INTERRUPT); 39 vdpu_write(vpu, G2_REG_CONFIG_DEC_CLK_GATE_E, G2_REG_CONFIG); As far as I understand this, that only clears IRQ and gates the clock off, but it doesn't reset the IP state, does it ?
On Thu, Aug 31, 2023 at 3:13 AM Marek Vasut <marex@denx.de> wrote: > > On 8/30/23 05:38, Chen-Yu Tsai wrote: > > On Sun, Aug 27, 2023 at 5:44 AM Marek Vasut <marex@denx.de> wrote: > >> > >> On 8/25/23 10:52, Chen-Yu Tsai wrote: > >>> On Fri, Aug 25, 2023 at 4:33 PM Marek Vasut <marex@denx.de> wrote: > >>>> > >>>> On 8/25/23 09:09, Chen-Yu Tsai wrote: > >>>>> On Thu, Aug 24, 2023 at 9:08 PM Marek Vasut <marex@denx.de> wrote: > >>>>>> > >>>>>> On 8/24/23 12:39, Adam Ford wrote: > >>>>>>> On Wed, Aug 23, 2023 at 8:39 PM Marek Vasut <marex@denx.de> wrote: > >>>>>>>> > >>>>>>>> The i.MX8MM/N/P does not define the .reset op since reset of the VPU is > >>>>>>>> done by genpd. Check whether the .reset op is defined before calling it > >>>>>>>> to avoid NULL pointer dereference. > >>>>>>>> > >>>>>>>> Note that the Fixes tag is set to the commit which removed the reset op > >>>>>>>> from i.MX8M Hantro G2 implementation, this is because before this commit > >>>>>>>> all the implementations did define the .reset op. > >>>>>>> > >>>>>>> I am surprised I didn't have issues when I was testing the 8MQ and > >>>>>>> 8MM, but this makes sense. > >>>>>> > >>>>>> You need to trigger the VPU watchdog to trigger the crash, that means > >>>>>> you have to get the VPU into some weird state where it fails to decode > >>>>>> frame. Then it triggers the reset and ... boom. > >>>>>> > >>>>>> See this patch, that contains a gstreamer invocation to generate such > >>>>>> trigger condition input data: > >>>>>> > >>>>>> [PATCH] media: verisilicon: Do not enable G2 postproc downscale if > >>>>>> source is narrower than destination > >>>>>> > >>>>>> " > >>>>>> To generate input test data to trigger this bug, use e.g.: > >>>>>> $ gst-launch-1.0 videotestsrc ! > >>>>>> video/x-raw,width=272,height=256,format=I420 ! \ > >>>>>> vp9enc ! matroskamux ! filesink location=/tmp/test.vp9 > >>>>>> To trigger the bug upon decoding (note that the NV12 must be forced, as > >>>>>> that assures the output data would pass the G2 postproc): > >>>>>> $ gst-launch-1.0 filesrc location=/tmp/test.vp9 ! matroskademux ! > >>>>>> vp9parse ! \ > >>>>>> v4l2slvp9dec ! video/x-raw,format=NV12 ! videoconvert > >>>>>> ! fbdevsink > >>>>>> " > >>>>> > >>>>> Does it completely recover afterwards? In my previous trials the hardware > >>>>> ended up in some bizzare state: while decoding succeeds, the output's md5sum > >>>>> didn't match up. > >>>> > >>>> Have you got a testcase that triggers this, one I can try ? > >>>> > >>>> I am not entirely sure whether this is happening here as well or not, > >>>> but I can imagine that the power domain went down and back up between > >>>> tests, so the VPU would be power cycled (and therefore reset) that way. > >>>> So, I think it is worth testing that. > >>> > >>> This was last year while I was writing HEVC decoding code for Chromium. > >>> IIRC the SAODBLK_A_MainConcept_4 test vector from the official HEVC test > >>> suite does cause our stack to crash, but Gstreamer seemed to handle it > >>> OK. It could be that the Chromium decoder stack is passing bad values to > >>> the decoder. > >> > >> That can be easily tested with ftrace enabled. I was just tracking down > >> an issue with gstreamer and added the following patch to the hantro > >> driver. Then: > >> > >> echo > /sys/kernel/debug/tracing/trace > >> <run fail test> > >> cat /sys/kernel/debug/tracing/trace > /tmp/fail.trace > >> echo > /sys/kernel/debug/tracing/trace > >> <run pass test> > >> cat /sys/kernel/debug/tracing/trace > /tmp/pass.trace > >> # remove time stamps etc. > >> diff /tmp/{fail,pass}.trace > >> > >> You should see whether some register programming differs between > >> gstreamer and chromium. > > > > I ended up using VISL to compare the controls set. I did find a bug. > > It still hard hangs after a couple frames, so I guess I'd need to use > > your method, but do printk instead. > > > > BTW, I wonder if we shouldn't add a reset op, if only just to stop the > > hardware? That is, do the same two register writes as in the Hantro G2 > > interrupt handler. > > You mean these two ? > > 38 vdpu_write(vpu, 0, G2_REG_INTERRUPT); > 39 vdpu_write(vpu, G2_REG_CONFIG_DEC_CLK_GATE_E, G2_REG_CONFIG); Yes. > As far as I understand this, that only clears IRQ and gates the clock > off, but it doesn't reset the IP state, does it ? That's right, but it would stop the hardware from continuing to do whatever it is doing before it gets shut down through runtime PM. I'm not sure it would make that much of a difference, but it did seem like something that could be done. ChenYu
diff --git a/drivers/media/platform/verisilicon/hantro_drv.c b/drivers/media/platform/verisilicon/hantro_drv.c index 423fc85d79ee3..50ec24c753e9e 100644 --- a/drivers/media/platform/verisilicon/hantro_drv.c +++ b/drivers/media/platform/verisilicon/hantro_drv.c @@ -125,7 +125,8 @@ void hantro_watchdog(struct work_struct *work) ctx = v4l2_m2m_get_curr_priv(vpu->m2m_dev); if (ctx) { vpu_err("frame processing timed out!\n"); - ctx->codec_ops->reset(ctx); + if (ctx->codec_ops->reset) + ctx->codec_ops->reset(ctx); hantro_job_finish(vpu, ctx, VB2_BUF_STATE_ERROR); } }
The i.MX8MM/N/P does not define the .reset op since reset of the VPU is done by genpd. Check whether the .reset op is defined before calling it to avoid NULL pointer dereference. Note that the Fixes tag is set to the commit which removed the reset op from i.MX8M Hantro G2 implementation, this is because before this commit all the implementations did define the .reset op. Fixes: 6971efb70ac3 ("media: hantro: Allow i.MX8MQ G1 and G2 to run independently") Signed-off-by: Marek Vasut <marex@denx.de> --- Cc: Adam Ford <aford173@gmail.com> Cc: Benjamin Gaignard <benjamin.gaignard@collabora.com> Cc: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Philipp Zabel <p.zabel@pengutronix.de> Cc: linux-media@vger.kernel.org Cc: linux-rockchip@lists.infradead.org --- drivers/media/platform/verisilicon/hantro_drv.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)