Message ID | 20241219053734.588145-1-senozhatsky@chromium.org (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [PATCHv2] media: venc: destroy hfi session after m2m_ctx release | expand |
On Thu, Dec 19, 2024 at 02:37:08PM +0900, Sergey Senozhatsky wrote: > This partially reverts commit that made hfi_session_destroy() > the first step of vdec/venc close(). The reason being is a > regression report when, supposedly, encode/decoder is closed > with still active streaming (no ->stop_streaming() call before > close()) and pending pkts, so isr_thread cannot find instance > and fails to process those pending pkts. This was the idea > behind the original patch - make it impossible to use instance > under destruction, because this is racy, but apparently there > are uses cases that depend on that unsafe pattern. Return to > the old (unsafe) behaviour for the time being (until a better > fix is found). You are describing a regression. Could you please add corresponding Reported-by and Closes tags (for now you can post them in a reply to your patch, in future please include them in your commit message). > > Fixes: 45b1a1b348ec ("media: venus: sync with threaded IRQ during inst destruction") > Cc: stable@vger.kernel.org > Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> This is v2, but there is no changelog. Please provide one. Also please don't post your patches as replies to the previous versions. > --- > drivers/media/platform/qcom/venus/core.c | 8 ++------ > 1 file changed, 2 insertions(+), 6 deletions(-) > > diff --git a/drivers/media/platform/qcom/venus/core.c b/drivers/media/platform/qcom/venus/core.c > index 2d27c5167246..807487a1f536 100644 > --- a/drivers/media/platform/qcom/venus/core.c > +++ b/drivers/media/platform/qcom/venus/core.c > @@ -506,18 +506,14 @@ static __maybe_unused int venus_runtime_suspend(struct device *dev) > void venus_close_common(struct venus_inst *inst) > { > /* > - * First, remove the inst from the ->instances list, so that > - * to_instance() will return NULL. > - */ > - hfi_session_destroy(inst); > - /* > - * Second, make sure we don't have IRQ/IRQ-thread currently running > + * Make sure we don't have IRQ/IRQ-thread currently running > * or pending execution, which would race with the inst destruction. > */ > synchronize_irq(inst->core->irq); > > v4l2_m2m_ctx_release(inst->m2m_ctx); > v4l2_m2m_release(inst->m2m_dev); > + hfi_session_destroy(inst); > v4l2_fh_del(&inst->fh); > v4l2_fh_exit(&inst->fh); > v4l2_ctrl_handler_free(&inst->ctrl_handler); > -- > 2.47.1.613.gc27f4b7a9f-goog >
On 19/12/2024 05:37, Sergey Senozhatsky wrote: > This partially reverts commit that made hfi_session_destroy() > the first step of vdec/venc close(). The reason being is a > regression report when, supposedly, encode/decoder is closed > with still active streaming (no ->stop_streaming() call before > close()) and pending pkts, so isr_thread cannot find instance > and fails to process those pending pkts. This was the idea > behind the original patch - make it impossible to use instance > under destruction, because this is racy, but apparently there > are uses cases that depend on that unsafe pattern. Return to > the old (unsafe) behaviour for the time being (until a better > fix is found). > > Fixes: 45b1a1b348ec ("media: venus: sync with threaded IRQ during inst destruction") > Cc: stable@vger.kernel.org > Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> > --- > drivers/media/platform/qcom/venus/core.c | 8 ++------ > 1 file changed, 2 insertions(+), 6 deletions(-) > > diff --git a/drivers/media/platform/qcom/venus/core.c b/drivers/media/platform/qcom/venus/core.c > index 2d27c5167246..807487a1f536 100644 > --- a/drivers/media/platform/qcom/venus/core.c > +++ b/drivers/media/platform/qcom/venus/core.c > @@ -506,18 +506,14 @@ static __maybe_unused int venus_runtime_suspend(struct device *dev) > void venus_close_common(struct venus_inst *inst) > { > /* > - * First, remove the inst from the ->instances list, so that > - * to_instance() will return NULL. > - */ > - hfi_session_destroy(inst); > - /* > - * Second, make sure we don't have IRQ/IRQ-thread currently running > + * Make sure we don't have IRQ/IRQ-thread currently running > * or pending execution, which would race with the inst destruction. > */ > synchronize_irq(inst->core->irq); > > v4l2_m2m_ctx_release(inst->m2m_ctx); > v4l2_m2m_release(inst->m2m_dev); > + hfi_session_destroy(inst); > v4l2_fh_del(&inst->fh); > v4l2_fh_exit(&inst->fh); > v4l2_ctrl_handler_free(&inst->ctrl_handler); Two questions: 1: Will this revert re-instantiate the original bug @ commit 45b1a1b348ec178a599323f1ce7d7932aea8c6d4 Author: Sergey Senozhatsky <senozhatsky@chromium.org> Date: Sat Oct 26 01:56:42 2024 +0900 media: venus: sync with threaded IRQ during inst destruction 2: Why not balanced out the ordering of calls by moving hfi_session_create() in vdec_open_common() ? to match the ordering in venus_close_common(); --- bod
On (24/12/19 15:12), Dmitry Baryshkov wrote: > On Thu, Dec 19, 2024 at 02:37:08PM +0900, Sergey Senozhatsky wrote: > > This partially reverts commit that made hfi_session_destroy() > > the first step of vdec/venc close(). The reason being is a > > regression report when, supposedly, encode/decoder is closed > > with still active streaming (no ->stop_streaming() call before > > close()) and pending pkts, so isr_thread cannot find instance > > and fails to process those pending pkts. This was the idea > > behind the original patch - make it impossible to use instance > > under destruction, because this is racy, but apparently there > > are uses cases that depend on that unsafe pattern. Return to > > the old (unsafe) behaviour for the time being (until a better > > fix is found). > > You are describing a regression. Could you please add corresponding > Reported-by and Closes tags (for now you can post them in a reply to > your patch, in future please include them in your commit message). The regression report is internal, I don't have anything public. One of the teams found out that some of their tests started to fail. > > Fixes: 45b1a1b348ec ("media: venus: sync with threaded IRQ during inst destruction") > > Cc: stable@vger.kernel.org > > Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> > > This is v2, but there is no changelog. Please provide one. The code is obviously the same, I only added Cc: stable and removed one extra character in commit id "45b1a1b348ec".
On Fri, Dec 20, 2024 at 01:32:48PM +0900, Sergey Senozhatsky wrote: > On (24/12/19 15:12), Dmitry Baryshkov wrote: > > On Thu, Dec 19, 2024 at 02:37:08PM +0900, Sergey Senozhatsky wrote: > > > This partially reverts commit that made hfi_session_destroy() > > > the first step of vdec/venc close(). The reason being is a > > > regression report when, supposedly, encode/decoder is closed > > > with still active streaming (no ->stop_streaming() call before > > > close()) and pending pkts, so isr_thread cannot find instance > > > and fails to process those pending pkts. This was the idea > > > behind the original patch - make it impossible to use instance > > > under destruction, because this is racy, but apparently there > > > are uses cases that depend on that unsafe pattern. Return to > > > the old (unsafe) behaviour for the time being (until a better > > > fix is found). > > > > You are describing a regression. Could you please add corresponding > > Reported-by and Closes tags (for now you can post them in a reply to > > your patch, in future please include them in your commit message). > > The regression report is internal, I don't have anything public. > One of the teams found out that some of their tests started to > fail. Still you probably can have a Repored-by, unless it's anonymous report. > > > > Fixes: 45b1a1b348ec ("media: venus: sync with threaded IRQ during inst destruction") > > > Cc: stable@vger.kernel.org > > > Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> > > > > This is v2, but there is no changelog. Please provide one. > > The code is obviously the same, I only added Cc: stable and removed > one extra character in commit id "45b1a1b348ec". This is the changelog. It helps reviewers to understand that there were no code changes between versions.
On (24/12/19 14:03), Bryan O'Donoghue wrote: > On 19/12/2024 05:37, Sergey Senozhatsky wrote: > > This partially reverts commit that made hfi_session_destroy() > > the first step of vdec/venc close(). The reason being is a > > regression report when, supposedly, encode/decoder is closed > > with still active streaming (no ->stop_streaming() call before > > close()) and pending pkts, so isr_thread cannot find instance > > and fails to process those pending pkts. This was the idea > > behind the original patch - make it impossible to use instance > > under destruction, because this is racy, but apparently there > > are uses cases that depend on that unsafe pattern. Return to > > the old (unsafe) behaviour for the time being (until a better > > fix is found). [..] > Two questions: > > 1: Will this revert re-instantiate the original bug @ I'm afraid pretty much so, yes. isr_thread runs concurrently with close(), the instance is still in the streaming mode and there are pending pkts. As far as I understand it, stop_streaming() is called from close() v4l2_m2m_ctx_release() vb2_queue_release() // ->cap_q_ctx.q ->out_q_ctx.q vb2_core_queue_release() __vb2_cleanup_fileio() vb2_core_streamoff() At least this is how I understand the test that is failing. I don't have a fix (nor even an idea) at this point. > 2: Why not balanced out the ordering of calls by moving > hfi_session_create() in vdec_open_common() ? to match > the ordering in venus_close_common(); Something like this? --- diff --git a/drivers/media/platform/qcom/venus/vdec.c b/drivers/media/platform/qcom/venus/vdec.c index 98c22b9f9372..9f82882b77bc 100644 --- a/drivers/media/platform/qcom/venus/vdec.c +++ b/drivers/media/platform/qcom/venus/vdec.c @@ -1697,10 +1697,6 @@ static int vdec_open(struct file *file) if (ret) goto err_free; - ret = hfi_session_create(inst, &vdec_hfi_ops); - if (ret) - goto err_ctrl_deinit; - vdec_inst_init(inst); ida_init(&inst->dpb_ids); @@ -1712,15 +1708,19 @@ static int vdec_open(struct file *file) inst->m2m_dev = v4l2_m2m_init(&vdec_m2m_ops); if (IS_ERR(inst->m2m_dev)) { ret = PTR_ERR(inst->m2m_dev); - goto err_session_destroy; + goto err_ctrl_deinit; } inst->m2m_ctx = v4l2_m2m_ctx_init(inst->m2m_dev, inst, m2m_queue_init); if (IS_ERR(inst->m2m_ctx)) { ret = PTR_ERR(inst->m2m_ctx); - goto err_m2m_release; + goto err_m2m_dev_release; } + ret = hfi_session_create(inst, &vdec_hfi_ops); + if (ret) + goto err_m2m_ctx_release; + v4l2_fh_init(&inst->fh, core->vdev_dec); inst->fh.ctrl_handler = &inst->ctrl_handler; @@ -1730,10 +1730,10 @@ static int vdec_open(struct file *file) return 0; -err_m2m_release: +err_m2m_ctx_release: + v4l2_m2m_ctx_release(inst->m2m_ctx); +err_m2m_dev_release: v4l2_m2m_release(inst->m2m_dev); -err_session_destroy: - hfi_session_destroy(inst); err_ctrl_deinit: v4l2_ctrl_handler_free(&inst->ctrl_handler); err_free: diff --git a/drivers/media/platform/qcom/venus/venc.c b/drivers/media/platform/qcom/venus/venc.c index c1c543535aaf..c7f8e37dba9b 100644 --- a/drivers/media/platform/qcom/venus/venc.c +++ b/drivers/media/platform/qcom/venus/venc.c @@ -1492,10 +1492,6 @@ static int venc_open(struct file *file) if (ret) goto err_free; - ret = hfi_session_create(inst, &venc_hfi_ops); - if (ret) - goto err_ctrl_deinit; - venc_inst_init(inst); /* @@ -1505,15 +1501,19 @@ static int venc_open(struct file *file) inst->m2m_dev = v4l2_m2m_init(&venc_m2m_ops); if (IS_ERR(inst->m2m_dev)) { ret = PTR_ERR(inst->m2m_dev); - goto err_session_destroy; + goto err_ctrl_deinit; } inst->m2m_ctx = v4l2_m2m_ctx_init(inst->m2m_dev, inst, m2m_queue_init); if (IS_ERR(inst->m2m_ctx)) { ret = PTR_ERR(inst->m2m_ctx); - goto err_m2m_release; + goto err_m2m_dev_release; } + ret = hfi_session_create(inst, &venc_hfi_ops); + if (ret) + goto err_m2m_ctx_release; + v4l2_fh_init(&inst->fh, core->vdev_enc); inst->fh.ctrl_handler = &inst->ctrl_handler; @@ -1523,10 +1523,10 @@ static int venc_open(struct file *file) return 0; -err_m2m_release: +err_m2m_ctx_release: + v4l2_m2m_ctx_release(inst->m2m_ctx); +err_m2m_dev_release: v4l2_m2m_release(inst->m2m_dev); -err_session_destroy: - hfi_session_destroy(inst); err_ctrl_deinit: v4l2_ctrl_handler_free(&inst->ctrl_handler); err_free:
On (24/12/20 08:03), Dmitry Baryshkov wrote: > On Fri, Dec 20, 2024 at 01:32:48PM +0900, Sergey Senozhatsky wrote: > > On (24/12/19 15:12), Dmitry Baryshkov wrote: > > > On Thu, Dec 19, 2024 at 02:37:08PM +0900, Sergey Senozhatsky wrote: > > > > This partially reverts commit that made hfi_session_destroy() > > > > the first step of vdec/venc close(). The reason being is a > > > > regression report when, supposedly, encode/decoder is closed > > > > with still active streaming (no ->stop_streaming() call before > > > > close()) and pending pkts, so isr_thread cannot find instance > > > > and fails to process those pending pkts. This was the idea > > > > behind the original patch - make it impossible to use instance > > > > under destruction, because this is racy, but apparently there > > > > are uses cases that depend on that unsafe pattern. Return to > > > > the old (unsafe) behaviour for the time being (until a better > > > > fix is found). > > > > > > You are describing a regression. Could you please add corresponding > > > Reported-by and Closes tags (for now you can post them in a reply to > > > your patch, in future please include them in your commit message). > > > > The regression report is internal, I don't have anything public. > > One of the teams found out that some of their tests started to > > fail. > > Still you probably can have a Repored-by, unless it's anonymous report. Sure. > > > > Fixes: 45b1a1b348ec ("media: venus: sync with threaded IRQ during inst destruction") > > > > Cc: stable@vger.kernel.org > > > > Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> > > > > > > This is v2, but there is no changelog. Please provide one. > > > > The code is obviously the same, I only added Cc: stable and removed > > one extra character in commit id "45b1a1b348ec". > > This is the changelog. It helps reviewers to understand that there were > no code changes between versions. Sure.
diff --git a/drivers/media/platform/qcom/venus/core.c b/drivers/media/platform/qcom/venus/core.c index 2d27c5167246..807487a1f536 100644 --- a/drivers/media/platform/qcom/venus/core.c +++ b/drivers/media/platform/qcom/venus/core.c @@ -506,18 +506,14 @@ static __maybe_unused int venus_runtime_suspend(struct device *dev) void venus_close_common(struct venus_inst *inst) { /* - * First, remove the inst from the ->instances list, so that - * to_instance() will return NULL. - */ - hfi_session_destroy(inst); - /* - * Second, make sure we don't have IRQ/IRQ-thread currently running + * Make sure we don't have IRQ/IRQ-thread currently running * or pending execution, which would race with the inst destruction. */ synchronize_irq(inst->core->irq); v4l2_m2m_ctx_release(inst->m2m_ctx); v4l2_m2m_release(inst->m2m_dev); + hfi_session_destroy(inst); v4l2_fh_del(&inst->fh); v4l2_fh_exit(&inst->fh); v4l2_ctrl_handler_free(&inst->ctrl_handler);
This partially reverts commit that made hfi_session_destroy() the first step of vdec/venc close(). The reason being is a regression report when, supposedly, encode/decoder is closed with still active streaming (no ->stop_streaming() call before close()) and pending pkts, so isr_thread cannot find instance and fails to process those pending pkts. This was the idea behind the original patch - make it impossible to use instance under destruction, because this is racy, but apparently there are uses cases that depend on that unsafe pattern. Return to the old (unsafe) behaviour for the time being (until a better fix is found). Fixes: 45b1a1b348ec ("media: venus: sync with threaded IRQ during inst destruction") Cc: stable@vger.kernel.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> --- drivers/media/platform/qcom/venus/core.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-)