[2/2] media: venus: sync with threaded IRQ during inst destruction

Message ID 20241023052444.139356-3-senozhatsky@chromium.org (mailing list archive)
State New
Series media: venus: close() fixes

Commit Message

Sergey Senozhatsky Oct. 23, 2024, 5:24 a.m. UTC
When destroying an inst we should make sure that we don't race
against the threaded IRQ (or a pending IRQ), otherwise the IRQ thread
can still access the inst context and the inst itself while close()
concurrently kfree()s them.

BUG: KASAN: slab-use-after-free in vb2_queue_error+0x80/0x90
Call trace:
dump_backtrace+0x1c4/0x1f8
show_stack+0x38/0x60
dump_stack_lvl+0x168/0x1f0
print_report+0x170/0x4c8
kasan_report+0x94/0xd0
__asan_report_load2_noabort+0x20/0x30
vb2_queue_error+0x80/0x90
venus_helper_vb2_queue_error+0x54/0x78
venc_event_notify+0xec/0x158
hfi_event_notify+0x878/0xd20
hfi_process_msg_packet+0x27c/0x4e0
venus_isr_thread+0x258/0x6e8
hfi_isr_thread+0x70/0x90
venus_isr_thread+0x34/0x50
irq_thread_fn+0x88/0x130
irq_thread+0x160/0x2c0
kthread+0x294/0x328
ret_from_fork+0x10/0x20

Allocated by task 20291:
kasan_set_track+0x4c/0x80
kasan_save_alloc_info+0x28/0x38
__kasan_kmalloc+0x84/0xa0
kmalloc_trace+0x7c/0x98
v4l2_m2m_ctx_init+0x74/0x280
venc_open+0x444/0x6d0
v4l2_open+0x19c/0x2a0
chrdev_open+0x374/0x3f0
do_dentry_open+0x710/0x10a8
vfs_open+0x88/0xa8
path_openat+0x1e6c/0x2700
do_filp_open+0x1a4/0x2e0
do_sys_openat2+0xe8/0x508
do_sys_open+0x15c/0x1a0
__arm64_sys_openat+0xa8/0xc8
invoke_syscall+0xdc/0x270
el0_svc_common+0x1ec/0x250
do_el0_svc+0x54/0x70
el0_svc+0x50/0xe8
el0t_64_sync_handler+0x48/0x120
el0t_64_sync+0x1a8/0x1b0

Guard inst destruction (both dec and enc) with hard and threaded
IRQ synchronization.

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
 drivers/media/platform/qcom/venus/vdec.c | 13 +++++++++++++
 drivers/media/platform/qcom/venus/venc.c | 13 +++++++++++++
 2 files changed, 26 insertions(+)

Comments

Sergey Senozhatsky Oct. 24, 2024, 4:58 a.m. UTC | #1
On (24/10/23 14:24), Sergey Senozhatsky wrote:
> Guard inst destruction (both dec and enc) with hard and threaded
> IRQ synchronization.

Folks, please ignore this patch.   Stand by for v2.
Sergey Senozhatsky Oct. 24, 2024, 5:13 a.m. UTC | #2
On (24/10/24 13:58), Sergey Senozhatsky wrote:
> Date: Thu, 24 Oct 2024 13:58:36 +0900
> From: Sergey Senozhatsky <senozhatsky@chromium.org>
> To: Sergey Senozhatsky <senozhatsky@chromium.org>
> Cc: Stanimir Varbanov <stanimir.k.varbanov@gmail.com>, Vikash Garodia
>  <quic_vgarodia@quicinc.com>, Bryan O'Donoghue
>  <bryan.odonoghue@linaro.org>, linux-media@vger.kernel.org,
>  linux-kernel@vger.kernel.org
> Subject: Re: [PATCH 2/2] media: venus: sync with threaded IRQ during inst
>  destruction
> Message-ID: <20241024045836.GJ1279924@google.com>
> 
> On (24/10/23 14:24), Sergey Senozhatsky wrote:
> > Guard inst destruction (both dec and enc) with hard and threaded
> > IRQ synchronization.
> 
> Folks, please ignore this patch.   Stand by for v2.

I think it probably should be something like this (both for dec and
enc).

---

@@ -1538,9 +1538,25 @@ static int venc_close(struct file *file)
 
        venc_pm_get(inst);
 
+       /*
+        * First, remove the inst from the ->instances list, so that
+        * to_instance() will return NULL.
+        */
+       hfi_session_destroy(inst);
+       /*
+        * Second, make sure we don't have IRQ/IRQ-thread currently running or
+        * pending execution (disable_irq() calls synchronize_irq()), which
+        * can race with the inst destruction.
+        */
+       disable_irq(inst->core->irq);
+       /*
+        * Lastly, inst is gone from the core->instances list and we don't
+        * have running/pending IRQ/IRQ-thread, proceed with the destruction
+        */
+       enable_irq(inst->core->irq);
+
        v4l2_m2m_ctx_release(inst->m2m_ctx);
        v4l2_m2m_release(inst->m2m_dev);
-       hfi_session_destroy(inst);
        v4l2_fh_del(&inst->fh);
        v4l2_fh_exit(&inst->fh);
        venc_ctrl_deinit(inst);
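
The ordering proposed above can be illustrated outside the kernel. What follows is a minimal userspace sketch, not driver code: `model_core`, `model_inst` and every `model_*` function are hypothetical stand-ins, and the only behaviour borrowed from the thread is that an instance removed from the instances list can no longer be found by a lookup. It shows why unlinking first keeps later events from reaching the instance. It does not model a handler that already holds a pointer, which is the case the IRQ synchronization step is meant to cover.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver's core and instance structures. */
struct model_inst {
	unsigned int session_id;
	int errors_seen;          /* state touched by the event handler */
	struct model_inst *next;  /* link in the core's instances list */
};

struct model_core {
	struct model_inst *instances;
};

/* Models open(): allocate an instance and link it into the list. */
static struct model_inst *model_open(struct model_core *core, unsigned int id)
{
	struct model_inst *inst = calloc(1, sizeof(*inst));

	assert(inst != NULL);
	inst->session_id = id;
	inst->next = core->instances;
	core->instances = inst;
	return inst;
}

/* Lookup by session id; returns NULL once the instance is unlinked. */
static struct model_inst *model_to_instance(struct model_core *core,
					    unsigned int id)
{
	struct model_inst *inst;

	for (inst = core->instances; inst; inst = inst->next)
		if (inst->session_id == id)
			return inst;
	return NULL;
}

/* Step one of the close path: make the instance unreachable. */
static void model_unlink(struct model_core *core, struct model_inst *inst)
{
	struct model_inst **pp;

	for (pp = &core->instances; *pp; pp = &(*pp)->next) {
		if (*pp == inst) {
			*pp = inst->next;
			inst->next = NULL;
			return;
		}
	}
}

/* Event path: returns 1 if an instance was touched, 0 if lookup failed. */
static int model_event_notify(struct model_core *core, unsigned int id)
{
	struct model_inst *inst = model_to_instance(core, id);

	if (!inst)
		return 0;
	inst->errors_seen++;
	return 1;
}

/* Close path in the proposed order: unlink, (sync point), free. */
static void model_close(struct model_core *core, struct model_inst *inst)
{
	model_unlink(core, inst);
	/* A real driver would wait here for handlers already in flight. */
	free(inst);
}
```

With two instances open, closing the first makes `model_event_notify()` return 0 for its session id, while the second instance keeps receiving events.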
Tomasz Figa Oct. 24, 2024, 5:18 a.m. UTC | #3
Hi Sergey,

On Thu, Oct 24, 2024 at 2:13 PM Sergey Senozhatsky
<senozhatsky@chromium.org> wrote:
>
> On (24/10/24 13:58), Sergey Senozhatsky wrote:
> > Date: Thu, 24 Oct 2024 13:58:36 +0900
> > From: Sergey Senozhatsky <senozhatsky@chromium.org>
> > To: Sergey Senozhatsky <senozhatsky@chromium.org>
> > Cc: Stanimir Varbanov <stanimir.k.varbanov@gmail.com>, Vikash Garodia
> >  <quic_vgarodia@quicinc.com>, Bryan O'Donoghue
> >  <bryan.odonoghue@linaro.org>, linux-media@vger.kernel.org,
> >  linux-kernel@vger.kernel.org
> > Subject: Re: [PATCH 2/2] media: venus: sync with threaded IRQ during inst
> >  destruction
> > Message-ID: <20241024045836.GJ1279924@google.com>
> >
> > On (24/10/23 14:24), Sergey Senozhatsky wrote:
> > > Guard inst destruction (both dec and enc) with hard and threaded
> > > IRQ synchronization.
> >
> > Folks, please ignore this patch.   Stand by for v2.
>
> I think it probably should be something like this (both for dec and
> enc).
>
> ---
>
> @@ -1538,9 +1538,25 @@ static int venc_close(struct file *file)
>
>         venc_pm_get(inst);
>
> +       /*
> +        * First, remove the inst from the ->instances list, so that
> +        * to_instance() will return NULL.
> +        */
> +       hfi_session_destroy(inst);
> +       /*
> +        * Second, make sure we don't have IRQ/IRQ-thread currently running or
> +        * pending execution (disable_irq() calls synchronize_irq()), which
> +        * can race with the inst destruction.
> +        */
> +       disable_irq(inst->core->irq);
> +       /*
> +        * Lastly, inst is gone from the core->instances list and we don't
> +        * have running/pending IRQ/IRQ-thread, proceed with the destruction
> +        */
> +       enable_irq(inst->core->irq);
> +

Thanks a lot for looking into this. Wouldn't it be enough to just call
synchronize_irq() at this point, since the instance was removed from
the list already? I guess the question is if that's the only way the
interrupt handler can get hold of the instance.

Best,
Tomasz

>         v4l2_m2m_ctx_release(inst->m2m_ctx);
>         v4l2_m2m_release(inst->m2m_dev);
> -       hfi_session_destroy(inst);
>         v4l2_fh_del(&inst->fh);
>         v4l2_fh_exit(&inst->fh);
>         venc_ctrl_deinit(inst);
>
Sergey Senozhatsky Oct. 24, 2024, 5:46 a.m. UTC | #4
On (24/10/24 14:18), Tomasz Figa wrote:
> > @@ -1538,9 +1538,25 @@ static int venc_close(struct file *file)
> >
> >         venc_pm_get(inst);
> >
> > +       /*
> > +        * First, remove the inst from the ->instances list, so that
> > +        * to_instance() will return NULL.
> > +        */
> > +       hfi_session_destroy(inst);
> > +       /*
> > +        * Second, make sure we don't have IRQ/IRQ-thread currently running or
> > +        * pending execution (disable_irq() calls synchronize_irq()), which
> > +        * can race with the inst destruction.
> > +        */
> > +       disable_irq(inst->core->irq);
> > +       /*
> > +        * Lastly, inst is gone from the core->instances list and we don't
> > +        * have running/pending IRQ/IRQ-thread, proceed with the destruction
> > +        */
> > +       enable_irq(inst->core->irq);
> > +
> 
> Thanks a lot for looking into this. Wouldn't it be enough to just call
> synchronize_irq() at this point, since the instance was removed from
> the list already? I guess the question is if that's the only way the
> interrupt handler can get hold of the instance.

Good question.

synchronize_irq() waits for IRQ-threads, so if inst is accessed only from
IRQ-thread then we are fine.  If, however, inst is also accessed from hard
IRQ, then synchronize_irq() won't work, I guess, because it doesn't wait
for "in flight hard IRQs".  disable_irq() OTOH "waits for completion", so
we cover in-flight hard IRQs too.
Tomasz Figa Oct. 24, 2024, 6:05 a.m. UTC | #5
On Thu, Oct 24, 2024 at 2:46 PM Sergey Senozhatsky
<senozhatsky@chromium.org> wrote:
>
> On (24/10/24 14:18), Tomasz Figa wrote:
> > > @@ -1538,9 +1538,25 @@ static int venc_close(struct file *file)
> > >
> > >         venc_pm_get(inst);
> > >
> > > +       /*
> > > +        * First, remove the inst from the ->instances list, so that
> > > +        * to_instance() will return NULL.
> > > +        */
> > > +       hfi_session_destroy(inst);
> > > +       /*
> > > +        * Second, make sure we don't have IRQ/IRQ-thread currently running or
> > > +        * pending execution (disable_irq() calls synchronize_irq()), which
> > > +        * can race with the inst destruction.
> > > +        */
> > > +       disable_irq(inst->core->irq);
> > > +       /*
> > > +        * Lastly, inst is gone from the core->instances list and we don't
> > > +        * have running/pending IRQ/IRQ-thread, proceed with the destruction
> > > +        */
> > > +       enable_irq(inst->core->irq);
> > > +
> >
> > Thanks a lot for looking into this. Wouldn't it be enough to just call
> > synchronize_irq() at this point, since the instance was removed from
> > the list already? I guess the question is if that's the only way the
> > interrupt handler can get hold of the instance.
>
> Good question.
>
> synchronize_irq() waits for IRQ-threads, so if inst is accessed only from
> IRQ-thread then we are fine.  If, however, inst is also accessed from hard
> IRQ, then synchronize_irq() won't work, I guess, because it doesn't wait
> for "in flight hard IRQs".  disable_irq() OTOH "waits for completion", so
> we cover in-flight hard IRQs too.

Looking at the code, synchronize_irq() internally also calls
synchronize_hardirq() and that in turn waits for the
IRQD_IRQ_INPROGRESS flag to be cleared before returning [1]. The flag
is set by handle_irq_event() before most of the IRQ handling is run
and cleared at the end of the function [2], which makes me believe
that it would actually ensure all the hardirq and threaded IRQ
handlers would be waited for.

[1] https://elixir.bootlin.com/linux/v6.11.5/source/kernel/irq/manage.c#L38
[2] https://elixir.bootlin.com/linux/v6.11.5/source/kernel/irq/handle.c#L202

Although I guess it would be best if someone confirmed that,
because with all the IRQ handling complexities of SMP, nothing can be
certain today. :P

Best,
Tomasz
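
The wait-for-flag behaviour described in the message above can be modelled in userspace. This is a rough sketch under stated assumptions, not kernel code: `model_irq` and all `model_*` functions are hypothetical, a pthread stands in for the interrupt handler, and a single atomic flag plays the role of the in-progress marker. Folding hard and threaded handling into one flag is a simplification made for the sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical model of one interrupt line; not a kernel structure. */
struct model_irq {
	atomic_bool in_progress;  /* stands in for the in-progress flag */
	atomic_int handled;       /* number of completed handler runs */
};

static void model_sleep_us(long us)
{
	struct timespec ts = { .tv_sec = 0, .tv_nsec = us * 1000L };

	nanosleep(&ts, NULL);
}

/* Handler side: set the flag, do the work, then clear the flag. */
static void *model_handle_irq_event(void *arg)
{
	struct model_irq *irq = arg;

	atomic_store(&irq->in_progress, true);
	model_sleep_us(20000);              /* pretend to process a message */
	atomic_fetch_add(&irq->handled, 1);
	atomic_store(&irq->in_progress, false);
	return NULL;
}

/* Waiter side: return only once no handler is marked in-progress. */
static void model_synchronize_irq(struct model_irq *irq)
{
	while (atomic_load(&irq->in_progress))
		model_sleep_us(1000);
}

/* Fire one interrupt and wait until its handler has actually started. */
static void model_raise_irq(struct model_irq *irq, pthread_t *thread)
{
	int before = atomic_load(&irq->handled);
	int err = pthread_create(thread, NULL, model_handle_irq_event, irq);

	assert(err == 0);
	(void)err;
	while (!atomic_load(&irq->in_progress) &&
	       atomic_load(&irq->handled) == before)
		model_sleep_us(100);
}
```

A caller that raises an interrupt and then calls `model_synchronize_irq()` sees the handler's work completed by the time the call returns, which is the property a close path relies on before freeing the instance.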
Patch

diff --git a/drivers/media/platform/qcom/venus/vdec.c b/drivers/media/platform/qcom/venus/vdec.c
index 0013c4704f03..ff1823bc967c 100644
--- a/drivers/media/platform/qcom/venus/vdec.c
+++ b/drivers/media/platform/qcom/venus/vdec.c
@@ -1747,6 +1747,13 @@  static int vdec_close(struct file *file)
 {
 	struct venus_inst *inst = to_inst(file);
 
+	/*
+	 * Make sure we don't have pending IRQs and no threaded IRQ handler is
+	 * running nor pending (synchronize_irq()), which can race with inst
+	 * destruction.
+	 */
+	disable_irq(inst->core->irq);
+
 	vdec_pm_get(inst);
 
 	cancel_work_sync(&inst->delayed_process_work);
@@ -1763,6 +1770,12 @@  static int vdec_close(struct file *file)
 
 	vdec_pm_put(inst, false);
 
+	/*
+	 * inst is gone from the core->instances list, re-enable IRQ and
+	 * threaded IRQ
+	 */
+	enable_irq(inst->core->irq);
+
 	kfree(inst);
 	return 0;
 }
diff --git a/drivers/media/platform/qcom/venus/venc.c b/drivers/media/platform/qcom/venus/venc.c
index 6a26a6592424..6575e84312fe 100644
--- a/drivers/media/platform/qcom/venus/venc.c
+++ b/drivers/media/platform/qcom/venus/venc.c
@@ -1515,6 +1515,13 @@  static int venc_close(struct file *file)
 {
 	struct venus_inst *inst = to_inst(file);
 
+	/*
+	 * Make sure we don't have pending IRQs and no threaded IRQ handler is
+	 * running nor pending (synchronize_irq()), which can race with inst
+	 * destruction.
+	 */
+	disable_irq(inst->core->irq);
+
 	venc_pm_get(inst);
 
 	v4l2_m2m_ctx_release(inst->m2m_ctx);
@@ -1530,6 +1537,12 @@  static int venc_close(struct file *file)
 
 	venc_pm_put(inst, false);
 
+	/*
+	 * inst is gone from the core->instances list, re-enable IRQ and
+	 * threaded IRQ
+	 */
+	enable_irq(inst->core->irq);
+
 	kfree(inst);
 	return 0;
 }