Message ID | 20241023052444.139356-3-senozhatsky@chromium.org (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | media: venus: close() fixes | expand |
On (24/10/23 14:24), Sergey Senozhatsky wrote: > Guard inst destruction (both dec and enc) with hard and threaded > IRQ synchronization. Folks, please ignore this patch. Stand by for v2.
On (24/10/24 13:58), Sergey Senozhatsky wrote: > Date: Thu, 24 Oct 2024 13:58:36 +0900 > From: Sergey Senozhatsky <senozhatsky@chromium.org> > To: Sergey Senozhatsky <senozhatsky@chromium.org> > Cc: Stanimir Varbanov <stanimir.k.varbanov@gmail.com>, Vikash Garodia > <quic_vgarodia@quicinc.com>, Bryan O'Donoghue > <bryan.odonoghue@linaro.org>, linux-media@vger.kernel.org, > linux-kernel@vger.kernel.org > Subject: Re: [PATCH 2/2] media: venus: sync with threaded IRQ during inst > destruction > Message-ID: <20241024045836.GJ1279924@google.com> > > On (24/10/23 14:24), Sergey Senozhatsky wrote: > > Guard inst destruction (both dec and enc) with hard and threaded > > IRQ synchronization. > > Folks, please ignore this patch. Stand by for v2. I think it probably should be something like this (both for dec and enc). --- @@ -1538,9 +1538,25 @@ static int venc_close(struct file *file) venc_pm_get(inst); + /* + * First, remove the inst from the ->instances list, so that + * to_instance() will return NULL. + */ + hfi_session_destroy(inst); + /* + * Second, make sure we don't have IRQ/IRQ-thread currently running or + * pending execution (disable_irq() calls synchronize_irq()), which + * can race with the inst destruction. + */ + disable_irq(inst->core->irq); + /* + * Lastly, inst is gone from the core->instances list and we don't + * have running/pending IRQ/IRQ-thread, proceed with the destruction + */ + enable_irq(inst->core->irq); + v4l2_m2m_ctx_release(inst->m2m_ctx); v4l2_m2m_release(inst->m2m_dev); - hfi_session_destroy(inst); v4l2_fh_del(&inst->fh); v4l2_fh_exit(&inst->fh); venc_ctrl_deinit(inst);
Hi Sergey, On Thu, Oct 24, 2024 at 2:13 PM Sergey Senozhatsky <senozhatsky@chromium.org> wrote: > > On (24/10/24 13:58), Sergey Senozhatsky wrote: > > Date: Thu, 24 Oct 2024 13:58:36 +0900 > > From: Sergey Senozhatsky <senozhatsky@chromium.org> > > To: Sergey Senozhatsky <senozhatsky@chromium.org> > > Cc: Stanimir Varbanov <stanimir.k.varbanov@gmail.com>, Vikash Garodia > > <quic_vgarodia@quicinc.com>, Bryan O'Donoghue > > <bryan.odonoghue@linaro.org>, linux-media@vger.kernel.org, > > linux-kernel@vger.kernel.org > > Subject: Re: [PATCH 2/2] media: venus: sync with threaded IRQ during inst > > destruction > > Message-ID: <20241024045836.GJ1279924@google.com> > > > > On (24/10/23 14:24), Sergey Senozhatsky wrote: > > > Guard inst destruction (both dec and enc) with hard and threaded > > > IRQ synchronization. > > > > Folks, please ignore this patch. Stand by for v2. > > I think it probably should be something like this (both for dec and > enc). > > --- > > @@ -1538,9 +1538,25 @@ static int venc_close(struct file *file) > > venc_pm_get(inst); > > + /* > + * First, remove the inst from the ->instances list, so that > + * to_instance() will return NULL. > + */ > + hfi_session_destroy(inst); > + /* > + * Second, make sure we don't have IRQ/IRQ-thread currently running or > + * pending execution (disable_irq() calls synchronize_irq()), which > + * can race with the inst destruction. > + */ > + disable_irq(inst->core->irq); > + /* > + * Lastly, inst is gone from the core->instances list and we don't > + * have running/pending IRQ/IRQ-thread, proceed with the destruction > + */ > + enable_irq(inst->core->irq); > + Thanks a lot for looking into this. Wouldn't it be enough to just call synchronize_irq() at this point, since the instance was removed from the list already? I guess the question is if that's the only way the interrupt handler can get hold of the instance. Best, Tomasz > v4l2_m2m_ctx_release(inst->m2m_ctx); > v4l2_m2m_release(inst->m2m_dev); > - hfi_session_destroy(inst); > v4l2_fh_del(&inst->fh); > v4l2_fh_exit(&inst->fh); > venc_ctrl_deinit(inst); >
On (24/10/24 14:18), Tomasz Figa wrote: > > @@ -1538,9 +1538,25 @@ static int venc_close(struct file *file) > > > > venc_pm_get(inst); > > > > + /* > > + * First, remove the inst from the ->instances list, so that > > + * to_instance() will return NULL. > > + */ > > + hfi_session_destroy(inst); > > + /* > > + * Second, make sure we don't have IRQ/IRQ-thread currently running or > > + * pending execution (disable_irq() calls synchronize_irq()), which > > + * can race with the inst destruction. > > + */ > > + disable_irq(inst->core->irq); > > + /* > > + * Lastly, inst is gone from the core->instances list and we don't > > + * have running/pending IRQ/IRQ-thread, proceed with the destruction > > + */ > > + enable_irq(inst->core->irq); > > + > > Thanks a lot for looking into this. Wouldn't it be enough to just call > synchronize_irq() at this point, since the instance was removed from > the list already? I guess the question is if that's the only way the > interrupt handler can get hold of the instance. Good question. synchronize_irq() waits for IRQ-threads, so if inst is accessed only from IRQ-thread then we are fine. If, however, inst is also accessed from hard IRQ, then synchronize_irq() won't work, I guess, because it doesn't wait for "in flight hard IRQs". disable_irq() OTOH "waits for completion", so we cover in-flight hard IRQs too.
On Thu, Oct 24, 2024 at 2:46 PM Sergey Senozhatsky <senozhatsky@chromium.org> wrote: > > On (24/10/24 14:18), Tomasz Figa wrote: > > > @@ -1538,9 +1538,25 @@ static int venc_close(struct file *file) > > > > > > venc_pm_get(inst); > > > > > > + /* > > > + * First, remove the inst from the ->instances list, so that > > > + * to_instance() will return NULL. > > > + */ > > > + hfi_session_destroy(inst); > > > + /* > > > + * Second, make sure we don't have IRQ/IRQ-thread currently running or > > > + * pending execution (disable_irq() calls synchronize_irq()), which > > > + * can race with the inst destruction. > > > + */ > > > + disable_irq(inst->core->irq); > > > + /* > > > + * Lastly, inst is gone from the core->instances list and we don't > > > + * have running/pending IRQ/IRQ-thread, proceed with the destruction > > > + */ > > > + enable_irq(inst->core->irq); > > > + > > > > Thanks a lot for looking into this. Wouldn't it be enough to just call > > synchronize_irq() at this point, since the instance was removed from > > the list already? I guess the question is if that's the only way the > > interrupt handler can get hold of the instance. > > Good question. > > synchronize_irq() waits for IRQ-threads, so if inst is accessed only from > IRQ-thread then we are fine. If, however, inst is also accessed from hard > IRQ, then synchronize_irq() won't work, I guess, because it doesn't wait > for "in flight hard IRQs". disable_irq() OTOH "waits for completion", so > we cover in-flight hard IRQs too. Looking at the code, synchronize_irq() internally also calls synchronize_hardirq() and that in turn waits for the IRQD_IRQ_INPROGESS flag to be cleared before returning [1]. The flag is set by handle_irq_event() before most of the IRQ handling is run and cleared at the end of the function [2], which makes me believe that it would actually ensure all the hardirq and threaded IRQ handlers would be waited for. [1] https://elixir.bootlin.com/linux/v6.11.5/source/kernel/irq/manage.c#L38 [2] https://elixir.bootlin.com/linux/v6.11.5/source/kernel/irq/handle.c#L202 Although I guess it would be the best if someone confirmed that, because with all the IRQ handling complexities of SMP, nothing can be certain today. :P Best, Tomasz
diff --git a/drivers/media/platform/qcom/venus/vdec.c b/drivers/media/platform/qcom/venus/vdec.c index 0013c4704f03..ff1823bc967c 100644 --- a/drivers/media/platform/qcom/venus/vdec.c +++ b/drivers/media/platform/qcom/venus/vdec.c @@ -1747,6 +1747,13 @@ static int vdec_close(struct file *file) { struct venus_inst *inst = to_inst(file); + /* + * Make sure we don't have pending IRQs and no threaded IRQ handler is + * running nor pending (synchronize_irq)), which can race with inst + * destruction. + */ + disable_irq(inst->core->irq); + vdec_pm_get(inst); cancel_work_sync(&inst->delayed_process_work); @@ -1763,6 +1770,12 @@ static int vdec_close(struct file *file) vdec_pm_put(inst, false); + /* + * inst is gone from the core->instances list, re-enable IRQ and + * threaded IRQ + */ + enable_irq(inst->core->irq); + kfree(inst); return 0; } diff --git a/drivers/media/platform/qcom/venus/venc.c b/drivers/media/platform/qcom/venus/venc.c index 6a26a6592424..6575e84312fe 100644 --- a/drivers/media/platform/qcom/venus/venc.c +++ b/drivers/media/platform/qcom/venus/venc.c @@ -1515,6 +1515,13 @@ static int venc_close(struct file *file) { struct venus_inst *inst = to_inst(file); + /* + * Make sure we don't have pending IRQs and no threaded IRQ handler is + * running nor pending (synchronize_irq)), which can race with inst + * destruction. + */ + disable_irq(inst->core->irq); + venc_pm_get(inst); v4l2_m2m_ctx_release(inst->m2m_ctx); @@ -1530,6 +1537,12 @@ static int venc_close(struct file *file) venc_pm_put(inst, false); + /* + * inst is gone from the core->instances list, re-enable IRQ and + * threaded IRQ + */ + enable_irq(inst->core->irq); + kfree(inst); return 0; }
When destroy inst we should make sure that we don't race against threaded IRQ (or pending IRQ), otherwise we can concurrently kfree() inst context and inst itself. BUG: KASAN: slab-use-after-free in vb2_queue_error+0x80/0x90 Call trace: dump_backtrace+0x1c4/0x1f8 show_stack+0x38/0x60 dump_stack_lvl+0x168/0x1f0 print_report+0x170/0x4c8 kasan_report+0x94/0xd0 __asan_report_load2_noabort+0x20/0x30 vb2_queue_error+0x80/0x90 venus_helper_vb2_queue_error+0x54/0x78 venc_event_notify+0xec/0x158 hfi_event_notify+0x878/0xd20 hfi_process_msg_packet+0x27c/0x4e0 venus_isr_thread+0x258/0x6e8 hfi_isr_thread+0x70/0x90 venus_isr_thread+0x34/0x50 irq_thread_fn+0x88/0x130 irq_thread+0x160/0x2c0 kthread+0x294/0x328 ret_from_fork+0x10/0x20 Allocated by task 20291: kasan_set_track+0x4c/0x80 kasan_save_alloc_info+0x28/0x38 __kasan_kmalloc+0x84/0xa0 kmalloc_trace+0x7c/0x98 v4l2_m2m_ctx_init+0x74/0x280 venc_open+0x444/0x6d0 v4l2_open+0x19c/0x2a0 chrdev_open+0x374/0x3f0 do_dentry_open+0x710/0x10a8 vfs_open+0x88/0xa8 path_openat+0x1e6c/0x2700 do_filp_open+0x1a4/0x2e0 do_sys_openat2+0xe8/0x508 do_sys_open+0x15c/0x1a0 __arm64_sys_openat+0xa8/0xc8 invoke_syscall+0xdc/0x270 el0_svc_common+0x1ec/0x250 do_el0_svc+0x54/0x70 el0_svc+0x50/0xe8 el0t_64_sync_handler+0x48/0x120 el0t_64_sync+0x1a8/0x1b0 Guard inst destruction (both dec and enc) with hard and threaded IRQ synchronization. Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> --- drivers/media/platform/qcom/venus/vdec.c | 13 +++++++++++++ drivers/media/platform/qcom/venus/venc.c | 13 +++++++++++++ 2 files changed, 26 insertions(+)