diff mbox series

[PATCHv2] media: venc: destroy hfi session after m2m_ctx release

Message ID 20241219053734.588145-1-senozhatsky@chromium.org (mailing list archive)
State New
Headers show
Series [PATCHv2] media: venc: destroy hfi session after m2m_ctx release | expand

Commit Message

Sergey Senozhatsky Dec. 19, 2024, 5:37 a.m. UTC
This partially reverts commit that made hfi_session_destroy()
the first step of vdec/venc close().  The reason being is a
regression report when, supposedly, encode/decoder is closed
with still active streaming (no ->stop_streaming() call before
close()) and pending pkts, so isr_thread cannot find instance
and fails to process those pending pkts.  This was the idea
behind the original patch - make it impossible to use instance
under destruction, because this is racy, but apparently there
are uses cases that depend on that unsafe pattern.  Return to
the old (unsafe) behaviour for the time being (until a better
fix is found).

Fixes: 45b1a1b348ec ("media: venus: sync with threaded IRQ during inst destruction")
Cc: stable@vger.kernel.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
 drivers/media/platform/qcom/venus/core.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

Comments

Dmitry Baryshkov Dec. 19, 2024, 1:12 p.m. UTC | #1
On Thu, Dec 19, 2024 at 02:37:08PM +0900, Sergey Senozhatsky wrote:
> This partially reverts commit that made hfi_session_destroy()
> the first step of vdec/venc close().  The reason being is a
> regression report when, supposedly, encode/decoder is closed
> with still active streaming (no ->stop_streaming() call before
> close()) and pending pkts, so isr_thread cannot find instance
> and fails to process those pending pkts.  This was the idea
> behind the original patch - make it impossible to use instance
> under destruction, because this is racy, but apparently there
> are uses cases that depend on that unsafe pattern.  Return to
> the old (unsafe) behaviour for the time being (until a better
> fix is found).

You are describing a regression. Could you please add corresponding
Reported-by and Closes tags (for now you can post them in a reply to
your patch, in future please include them in your commit message).

> 
> Fixes: 45b1a1b348ec ("media: venus: sync with threaded IRQ during inst destruction")
> Cc: stable@vger.kernel.org
> Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>

This is v2, but there is no changelog. Please provide one.

Also please don't post your patches as replies to the previous versions.

> ---
>  drivers/media/platform/qcom/venus/core.c | 8 ++------
>  1 file changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/media/platform/qcom/venus/core.c b/drivers/media/platform/qcom/venus/core.c
> index 2d27c5167246..807487a1f536 100644
> --- a/drivers/media/platform/qcom/venus/core.c
> +++ b/drivers/media/platform/qcom/venus/core.c
> @@ -506,18 +506,14 @@ static __maybe_unused int venus_runtime_suspend(struct device *dev)
>  void venus_close_common(struct venus_inst *inst)
>  {
>  	/*
> -	 * First, remove the inst from the ->instances list, so that
> -	 * to_instance() will return NULL.
> -	 */
> -	hfi_session_destroy(inst);
> -	/*
> -	 * Second, make sure we don't have IRQ/IRQ-thread currently running
> +	 * Make sure we don't have IRQ/IRQ-thread currently running
>  	 * or pending execution, which would race with the inst destruction.
>  	 */
>  	synchronize_irq(inst->core->irq);
>  
>  	v4l2_m2m_ctx_release(inst->m2m_ctx);
>  	v4l2_m2m_release(inst->m2m_dev);
> +	hfi_session_destroy(inst);
>  	v4l2_fh_del(&inst->fh);
>  	v4l2_fh_exit(&inst->fh);
>  	v4l2_ctrl_handler_free(&inst->ctrl_handler);
> -- 
> 2.47.1.613.gc27f4b7a9f-goog
>
Bryan O'Donoghue Dec. 19, 2024, 2:03 p.m. UTC | #2
On 19/12/2024 05:37, Sergey Senozhatsky wrote:
> This partially reverts commit that made hfi_session_destroy()
> the first step of vdec/venc close().  The reason being is a
> regression report when, supposedly, encode/decoder is closed
> with still active streaming (no ->stop_streaming() call before
> close()) and pending pkts, so isr_thread cannot find instance
> and fails to process those pending pkts.  This was the idea
> behind the original patch - make it impossible to use instance
> under destruction, because this is racy, but apparently there
> are uses cases that depend on that unsafe pattern.  Return to
> the old (unsafe) behaviour for the time being (until a better
> fix is found).
> 
> Fixes: 45b1a1b348ec ("media: venus: sync with threaded IRQ during inst destruction")
> Cc: stable@vger.kernel.org
> Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> ---
>   drivers/media/platform/qcom/venus/core.c | 8 ++------
>   1 file changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/media/platform/qcom/venus/core.c b/drivers/media/platform/qcom/venus/core.c
> index 2d27c5167246..807487a1f536 100644
> --- a/drivers/media/platform/qcom/venus/core.c
> +++ b/drivers/media/platform/qcom/venus/core.c
> @@ -506,18 +506,14 @@ static __maybe_unused int venus_runtime_suspend(struct device *dev)
>   void venus_close_common(struct venus_inst *inst)
>   {
>   	/*
> -	 * First, remove the inst from the ->instances list, so that
> -	 * to_instance() will return NULL.
> -	 */
> -	hfi_session_destroy(inst);
> -	/*
> -	 * Second, make sure we don't have IRQ/IRQ-thread currently running
> +	 * Make sure we don't have IRQ/IRQ-thread currently running
>   	 * or pending execution, which would race with the inst destruction.
>   	 */
>   	synchronize_irq(inst->core->irq);
>   
>   	v4l2_m2m_ctx_release(inst->m2m_ctx);
>   	v4l2_m2m_release(inst->m2m_dev);
> +	hfi_session_destroy(inst);
>   	v4l2_fh_del(&inst->fh);
>   	v4l2_fh_exit(&inst->fh);
>   	v4l2_ctrl_handler_free(&inst->ctrl_handler);

Two questions:

1: Will this revert re-instantiate the original bug @

commit 45b1a1b348ec178a599323f1ce7d7932aea8c6d4
Author: Sergey Senozhatsky <senozhatsky@chromium.org>
Date:   Sat Oct 26 01:56:42 2024 +0900

     media: venus: sync with threaded IRQ during inst destruction


2: Why not balanced out the ordering of calls by moving
    hfi_session_create() in vdec_open_common() ? to match
    the ordering in venus_close_common();

---
bod
Sergey Senozhatsky Dec. 20, 2024, 4:32 a.m. UTC | #3
On (24/12/19 15:12), Dmitry Baryshkov wrote:
> On Thu, Dec 19, 2024 at 02:37:08PM +0900, Sergey Senozhatsky wrote:
> > This partially reverts commit that made hfi_session_destroy()
> > the first step of vdec/venc close().  The reason being is a
> > regression report when, supposedly, encode/decoder is closed
> > with still active streaming (no ->stop_streaming() call before
> > close()) and pending pkts, so isr_thread cannot find instance
> > and fails to process those pending pkts.  This was the idea
> > behind the original patch - make it impossible to use instance
> > under destruction, because this is racy, but apparently there
> > are uses cases that depend on that unsafe pattern.  Return to
> > the old (unsafe) behaviour for the time being (until a better
> > fix is found).
> 
> You are describing a regression. Could you please add corresponding
> Reported-by and Closes tags (for now you can post them in a reply to
> your patch, in future please include them in your commit message).

The regression report is internal, I don't have anything public.
One of the teams found out that some of their tests started to
fail.

> > Fixes: 45b1a1b348ec ("media: venus: sync with threaded IRQ during inst destruction")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> 
> This is v2, but there is no changelog. Please provide one.

The code is obviously the same, I only added Cc: stable and removed
one extra character in commit id "45b1a1b348ec".
Dmitry Baryshkov Dec. 20, 2024, 6:03 a.m. UTC | #4
On Fri, Dec 20, 2024 at 01:32:48PM +0900, Sergey Senozhatsky wrote:
> On (24/12/19 15:12), Dmitry Baryshkov wrote:
> > On Thu, Dec 19, 2024 at 02:37:08PM +0900, Sergey Senozhatsky wrote:
> > > This partially reverts commit that made hfi_session_destroy()
> > > the first step of vdec/venc close().  The reason being is a
> > > regression report when, supposedly, encode/decoder is closed
> > > with still active streaming (no ->stop_streaming() call before
> > > close()) and pending pkts, so isr_thread cannot find instance
> > > and fails to process those pending pkts.  This was the idea
> > > behind the original patch - make it impossible to use instance
> > > under destruction, because this is racy, but apparently there
> > > are uses cases that depend on that unsafe pattern.  Return to
> > > the old (unsafe) behaviour for the time being (until a better
> > > fix is found).
> > 
> > You are describing a regression. Could you please add corresponding
> > Reported-by and Closes tags (for now you can post them in a reply to
> > your patch, in future please include them in your commit message).
> 
> The regression report is internal, I don't have anything public.
> One of the teams found out that some of their tests started to
> fail.

Still you probably can have a Repored-by, unless it's anonymous report.

> 
> > > Fixes: 45b1a1b348ec ("media: venus: sync with threaded IRQ during inst destruction")
> > > Cc: stable@vger.kernel.org
> > > Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> > 
> > This is v2, but there is no changelog. Please provide one.
> 
> The code is obviously the same, I only added Cc: stable and removed
> one extra character in commit id "45b1a1b348ec".

This is the changelog. It helps reviewers to understand that there were
no code changes between versions.
Sergey Senozhatsky Dec. 20, 2024, 6:21 a.m. UTC | #5
On (24/12/19 14:03), Bryan O'Donoghue wrote:
> On 19/12/2024 05:37, Sergey Senozhatsky wrote:
> > This partially reverts commit that made hfi_session_destroy()
> > the first step of vdec/venc close().  The reason being is a
> > regression report when, supposedly, encode/decoder is closed
> > with still active streaming (no ->stop_streaming() call before
> > close()) and pending pkts, so isr_thread cannot find instance
> > and fails to process those pending pkts.  This was the idea
> > behind the original patch - make it impossible to use instance
> > under destruction, because this is racy, but apparently there
> > are uses cases that depend on that unsafe pattern.  Return to
> > the old (unsafe) behaviour for the time being (until a better
> > fix is found).

[..]

> Two questions:
> 
> 1: Will this revert re-instantiate the original bug @

I'm afraid pretty much so, yes.  isr_thread runs concurrently
with close(), the instance is still in the streaming mode and
there are pending pkts.  As far as I understand it, stop_streaming()
is called from

close()
 v4l2_m2m_ctx_release()
  vb2_queue_release()        // ->cap_q_ctx.q  ->out_q_ctx.q
   vb2_core_queue_release()
    __vb2_cleanup_fileio()
	 vb2_core_streamoff()

At least this is how I understand the test that is failing.

I don't have a fix (nor even an idea) at this point.

> 2: Why not balanced out the ordering of calls by moving
>    hfi_session_create() in vdec_open_common() ? to match
>    the ordering in venus_close_common();


Something like this?

---

diff --git a/drivers/media/platform/qcom/venus/vdec.c b/drivers/media/platform/qcom/venus/vdec.c
index 98c22b9f9372..9f82882b77bc 100644
--- a/drivers/media/platform/qcom/venus/vdec.c
+++ b/drivers/media/platform/qcom/venus/vdec.c
@@ -1697,10 +1697,6 @@ static int vdec_open(struct file *file)
 	if (ret)
 		goto err_free;
 
-	ret = hfi_session_create(inst, &vdec_hfi_ops);
-	if (ret)
-		goto err_ctrl_deinit;
-
 	vdec_inst_init(inst);
 
 	ida_init(&inst->dpb_ids);
@@ -1712,15 +1708,19 @@ static int vdec_open(struct file *file)
 	inst->m2m_dev = v4l2_m2m_init(&vdec_m2m_ops);
 	if (IS_ERR(inst->m2m_dev)) {
 		ret = PTR_ERR(inst->m2m_dev);
-		goto err_session_destroy;
+		goto err_ctrl_deinit;
 	}
 
 	inst->m2m_ctx = v4l2_m2m_ctx_init(inst->m2m_dev, inst, m2m_queue_init);
 	if (IS_ERR(inst->m2m_ctx)) {
 		ret = PTR_ERR(inst->m2m_ctx);
-		goto err_m2m_release;
+		goto err_m2m_dev_release;
 	}
 
+	ret = hfi_session_create(inst, &vdec_hfi_ops);
+	if (ret)
+		goto err_m2m_ctx_release;
+
 	v4l2_fh_init(&inst->fh, core->vdev_dec);
 
 	inst->fh.ctrl_handler = &inst->ctrl_handler;
@@ -1730,10 +1730,10 @@ static int vdec_open(struct file *file)
 
 	return 0;
 
-err_m2m_release:
+err_m2m_ctx_release:
+	v4l2_m2m_ctx_release(inst->m2m_ctx);
+err_m2m_dev_release:
 	v4l2_m2m_release(inst->m2m_dev);
-err_session_destroy:
-	hfi_session_destroy(inst);
 err_ctrl_deinit:
 	v4l2_ctrl_handler_free(&inst->ctrl_handler);
 err_free:
diff --git a/drivers/media/platform/qcom/venus/venc.c b/drivers/media/platform/qcom/venus/venc.c
index c1c543535aaf..c7f8e37dba9b 100644
--- a/drivers/media/platform/qcom/venus/venc.c
+++ b/drivers/media/platform/qcom/venus/venc.c
@@ -1492,10 +1492,6 @@ static int venc_open(struct file *file)
 	if (ret)
 		goto err_free;
 
-	ret = hfi_session_create(inst, &venc_hfi_ops);
-	if (ret)
-		goto err_ctrl_deinit;
-
 	venc_inst_init(inst);
 
 	/*
@@ -1505,15 +1501,19 @@ static int venc_open(struct file *file)
 	inst->m2m_dev = v4l2_m2m_init(&venc_m2m_ops);
 	if (IS_ERR(inst->m2m_dev)) {
 		ret = PTR_ERR(inst->m2m_dev);
-		goto err_session_destroy;
+		goto err_ctrl_deinit;
 	}
 
 	inst->m2m_ctx = v4l2_m2m_ctx_init(inst->m2m_dev, inst, m2m_queue_init);
 	if (IS_ERR(inst->m2m_ctx)) {
 		ret = PTR_ERR(inst->m2m_ctx);
-		goto err_m2m_release;
+		goto err_m2m_dev_release;
 	}
 
+	ret = hfi_session_create(inst, &venc_hfi_ops);
+	if (ret)
+		goto err_m2m_ctx_release;
+
 	v4l2_fh_init(&inst->fh, core->vdev_enc);
 
 	inst->fh.ctrl_handler = &inst->ctrl_handler;
@@ -1523,10 +1523,10 @@ static int venc_open(struct file *file)
 
 	return 0;
 
-err_m2m_release:
+err_m2m_ctx_release:
+	v4l2_m2m_ctx_release(inst->m2m_ctx);
+err_m2m_dev_release:
 	v4l2_m2m_release(inst->m2m_dev);
-err_session_destroy:
-	hfi_session_destroy(inst);
 err_ctrl_deinit:
 	v4l2_ctrl_handler_free(&inst->ctrl_handler);
 err_free:
Sergey Senozhatsky Dec. 20, 2024, 6:22 a.m. UTC | #6
On (24/12/20 08:03), Dmitry Baryshkov wrote:
> On Fri, Dec 20, 2024 at 01:32:48PM +0900, Sergey Senozhatsky wrote:
> > On (24/12/19 15:12), Dmitry Baryshkov wrote:
> > > On Thu, Dec 19, 2024 at 02:37:08PM +0900, Sergey Senozhatsky wrote:
> > > > This partially reverts commit that made hfi_session_destroy()
> > > > the first step of vdec/venc close().  The reason being is a
> > > > regression report when, supposedly, encode/decoder is closed
> > > > with still active streaming (no ->stop_streaming() call before
> > > > close()) and pending pkts, so isr_thread cannot find instance
> > > > and fails to process those pending pkts.  This was the idea
> > > > behind the original patch - make it impossible to use instance
> > > > under destruction, because this is racy, but apparently there
> > > > are uses cases that depend on that unsafe pattern.  Return to
> > > > the old (unsafe) behaviour for the time being (until a better
> > > > fix is found).
> > > 
> > > You are describing a regression. Could you please add corresponding
> > > Reported-by and Closes tags (for now you can post them in a reply to
> > > your patch, in future please include them in your commit message).
> > 
> > The regression report is internal, I don't have anything public.
> > One of the teams found out that some of their tests started to
> > fail.
> 
> Still you probably can have a Repored-by, unless it's anonymous report.

Sure.

> > > > Fixes: 45b1a1b348ec ("media: venus: sync with threaded IRQ during inst destruction")
> > > > Cc: stable@vger.kernel.org
> > > > Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> > > 
> > > This is v2, but there is no changelog. Please provide one.
> > 
> > The code is obviously the same, I only added Cc: stable and removed
> > one extra character in commit id "45b1a1b348ec".
> 
> This is the changelog. It helps reviewers to understand that there were
> no code changes between versions.

Sure.
diff mbox series

Patch

diff --git a/drivers/media/platform/qcom/venus/core.c b/drivers/media/platform/qcom/venus/core.c
index 2d27c5167246..807487a1f536 100644
--- a/drivers/media/platform/qcom/venus/core.c
+++ b/drivers/media/platform/qcom/venus/core.c
@@ -506,18 +506,14 @@  static __maybe_unused int venus_runtime_suspend(struct device *dev)
 void venus_close_common(struct venus_inst *inst)
 {
 	/*
-	 * First, remove the inst from the ->instances list, so that
-	 * to_instance() will return NULL.
-	 */
-	hfi_session_destroy(inst);
-	/*
-	 * Second, make sure we don't have IRQ/IRQ-thread currently running
+	 * Make sure we don't have IRQ/IRQ-thread currently running
 	 * or pending execution, which would race with the inst destruction.
 	 */
 	synchronize_irq(inst->core->irq);
 
 	v4l2_m2m_ctx_release(inst->m2m_ctx);
 	v4l2_m2m_release(inst->m2m_dev);
+	hfi_session_destroy(inst);
 	v4l2_fh_del(&inst->fh);
 	v4l2_fh_exit(&inst->fh);
 	v4l2_ctrl_handler_free(&inst->ctrl_handler);