
[RESEND,1/3] venus: core: handle race condition for core ops

Message ID 1596781478-12216-2-git-send-email-mansur@codeaurora.org
State Superseded
Series Venus - Handle race conditions in concurrency

Commit Message

Mansur Alisha Shaik Aug. 7, 2020, 6:24 a.m. UTC
For core ops we have only write-side protection and no
read-side protection. Because of this, with multithreading
and concurrency, one CPU core can read core->ops without
waiting, which causes a NULL pointer dereference crash.

One such scenario is shown below: on one CPU core,
core->ops becomes NULL while another CPU core is calling
core->ops->session_init().

CPU: core-7:
Call trace:
 hfi_session_init+0x180/0x1dc [venus_core]
 vdec_queue_setup+0x9c/0x364 [venus_dec]
 vb2_core_reqbufs+0x1e4/0x368 [videobuf2_common]
 vb2_reqbufs+0x4c/0x64 [videobuf2_v4l2]
 v4l2_m2m_reqbufs+0x50/0x84 [v4l2_mem2mem]
 v4l2_m2m_ioctl_reqbufs+0x2c/0x38 [v4l2_mem2mem]
 v4l_reqbufs+0x4c/0x5c
 __video_do_ioctl+0x2b0/0x39c

CPU: core-0:
Call trace:
 venus_shutdown+0x98/0xfc [venus_core]
 venus_sys_error_handler+0x64/0x148 [venus_core]
 process_one_work+0x210/0x3d0
 worker_thread+0x248/0x3f4
 kthread+0x11c/0x12c

Signed-off-by: Mansur Alisha Shaik <mansur@codeaurora.org>
---
 drivers/media/platform/qcom/venus/core.c | 2 +-
 drivers/media/platform/qcom/venus/hfi.c  | 5 ++++-
 2 files changed, 5 insertions(+), 2 deletions(-)
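The race described in the commit message can be illustrated with a small userspace model. This is a hypothetical pthreads analogue, not driver code: `model_core`, `model_ops`, `model_destroy()` and `model_session_init()` are invented stand-ins for `struct venus_core`, `struct hfi_ops`, `hfi_destroy()` and `hfi_session_init()`. As the commit message states, the model assumes that teardown leaves `core->ops` set to NULL. The lock by itself only orders the two sides. If the writer wins, the reader still finds `ops == NULL`, so the model also checks for NULL under the lock. The patch itself does not add that check.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins; NOT the venus driver's real definitions. */
struct model_ops {
	int (*session_init)(void);
};

struct model_core {
	pthread_mutex_t lock;          /* plays the role of core->lock */
	const struct model_ops *ops;   /* plays the role of core->ops */
};

static int fake_session_init(void)
{
	return 0;
}

static const struct model_ops fake_ops = {
	.session_init = fake_session_init,
};

static void model_core_setup(struct model_core *core)
{
	pthread_mutex_init(&core->lock, NULL);
	core->ops = &fake_ops;
}

/* Writer side: clear ops while holding the lock, mirroring the patched
 * error handler that calls hfi_destroy() after mutex_lock(). */
static void *model_destroy(void *arg)
{
	struct model_core *core = arg;

	pthread_mutex_lock(&core->lock);
	core->ops = NULL;
	pthread_mutex_unlock(&core->lock);
	return NULL;
}

/* Reader side: load ops and call through it inside one critical section.
 * The NULL check is an addition of this sketch, not part of the patch. */
static int model_session_init(struct model_core *core)
{
	const struct model_ops *ops;
	int ret;

	pthread_mutex_lock(&core->lock);
	ops = core->ops;
	ret = ops ? ops->session_init() : -EIO;
	pthread_mutex_unlock(&core->lock);
	return ret;
}

/* Run reader and writer concurrently once and return the reader's result. */
static int model_race_once(void)
{
	struct model_core core;
	pthread_t writer;
	int ret;

	model_core_setup(&core);
	if (pthread_create(&writer, NULL, model_destroy, &core) != 0)
		return -EAGAIN;
	ret = model_session_init(&core);
	pthread_join(writer, NULL);
	pthread_mutex_destroy(&core.lock);
	return ret;
}
```

Whichever thread takes the lock first, `model_session_init()` returns either 0 or -EIO and never dereferences a NULL pointer.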

Comments

Stanimir Varbanov Aug. 10, 2020, 9:50 a.m. UTC | #1
Hi Mansur,

Thanks for the patches!

On 8/7/20 9:24 AM, Mansur Alisha Shaik wrote:
> [...]
>
> diff --git a/drivers/media/platform/qcom/venus/core.c b/drivers/media/platform/qcom/venus/core.c
> index 203c653..fe99c83 100644
> --- a/drivers/media/platform/qcom/venus/core.c
> +++ b/drivers/media/platform/qcom/venus/core.c
> @@ -64,8 +64,8 @@ static void venus_sys_error_handler(struct work_struct *work)
>  	pm_runtime_get_sync(core->dev);
>  
>  	hfi_core_deinit(core, true);
> -	hfi_destroy(core);
>  	mutex_lock(&core->lock);
> +	hfi_destroy(core);

As my recovery fixes [1] also touch this part, could you please apply
them on top of yours and re-test?

Otherwise this patch looks good to me.

[1] https://www.spinics.net/lists/linux-arm-msm/msg70092.html

> [...]
Stanimir Varbanov Aug. 21, 2020, 10:59 a.m. UTC | #2
Hi Mansur,

On 8/10/20 12:50 PM, Stanimir Varbanov wrote:
> Hi Mansur,
> 
> Thanks for the patches!
> 
> On 8/7/20 9:24 AM, Mansur Alisha Shaik wrote:
>> [...]

See below comment, otherwise:

Acked-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>

>>
>> diff --git a/drivers/media/platform/qcom/venus/core.c b/drivers/media/platform/qcom/venus/core.c
>> index 203c653..fe99c83 100644
>> --- a/drivers/media/platform/qcom/venus/core.c
>> +++ b/drivers/media/platform/qcom/venus/core.c
>> @@ -64,8 +64,8 @@ static void venus_sys_error_handler(struct work_struct *work)
>>  	pm_runtime_get_sync(core->dev);
>>  
>>  	hfi_core_deinit(core, true);
>> -	hfi_destroy(core);
>>  	mutex_lock(&core->lock);
>> +	hfi_destroy(core);
> 
> As my recovery fixes [1] touches this part also, could you please apply
> them on top of yours and re-test?

I'll drop the above chunk from the patch, because it is already covered
by my recovery fixes series, and queue up the patch for v5.10.

> 
> Otherwise this patch looks good to me.
> 
> [1] https://www.spinics.net/lists/linux-arm-msm/msg70092.html
> 
>> [...]
Mansur Alisha Shaik Aug. 25, 2020, 1:43 a.m. UTC | #3
On 2020-08-21 16:29, Stanimir Varbanov wrote:
> Hi Mansur,
> 
> On 8/10/20 12:50 PM, Stanimir Varbanov wrote:
>> Hi Mansur,
>> 
>> Thanks for the patches!
>> 
>> On 8/7/20 9:24 AM, Mansur Alisha Shaik wrote:
>>> [...]
> 
> See below comment, otherwise:
> 
> Acked-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
> 
>>> 
>>> diff --git a/drivers/media/platform/qcom/venus/core.c b/drivers/media/platform/qcom/venus/core.c
>>> index 203c653..fe99c83 100644
>>> --- a/drivers/media/platform/qcom/venus/core.c
>>> +++ b/drivers/media/platform/qcom/venus/core.c
>>> @@ -64,8 +64,8 @@ static void venus_sys_error_handler(struct work_struct *work)
>>>  	pm_runtime_get_sync(core->dev);
>>> 
>>>  	hfi_core_deinit(core, true);
>>> -	hfi_destroy(core);
>>>  	mutex_lock(&core->lock);
>>> +	hfi_destroy(core);
>> 
>> As my recovery fixes [1] touches this part also, could you please
>> apply them on top of yours and re-test?
> 
> I'll drop above chunk from the patch because it is already taken into
> account in my recovery fixes series and queue up the patch for v5.10.
> 
Yes, you can drop it. I have validated these patches on top of your
recovery patch series. I will push v2 with a dependency on the
"venus - recovery from frimware crash" series
(https://lore.kernel.org/patchwork/project/lkml/list/?series=455962)

>> 
>> Otherwise this patch looks good to me.
>> 
>> [1] https://www.spinics.net/lists/linux-arm-msm/msg70092.html
>> 
>>> [...]
Stanimir Varbanov Sept. 10, 2020, 10:43 a.m. UTC | #4
Hi Mansur,

On 8/7/20 9:24 AM, Mansur Alisha Shaik wrote:
> [...]
>
> CPU: core-7:
> Call trace:
>  hfi_session_init+0x180/0x1dc [venus_core]

I thought more about this issue. I think we have to return an error from
hfi_session_init() when the driver is in the system error handler. In
fact, all userspace ioctls must fail with an error while we are in the
recovery state. What do you think?

> [...]
>
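The suggestion above is to reject userspace requests while recovery is in progress. A flag can do more than a NULL check here. In the handler shown in the patch, hfi_core_deinit() runs before core->ops is cleared, so a request that arrives in between would still see non-NULL ops. The following is a hypothetical pthreads sketch of the idea. The `in_recovery` flag, the `rec_*` helpers and the choice of -EIO are all assumptions and are not the driver's API.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model: 'in_recovery' is an assumed flag, not a field of
 * the venus driver, and -EIO is an assumed error code. */
struct rec_core {
	pthread_mutex_t lock;   /* stands in for core->lock */
	bool in_recovery;       /* true while the error handler runs */
	int open_sessions;      /* sessions successfully initialised */
};

static void rec_core_setup(struct rec_core *core)
{
	pthread_mutex_init(&core->lock, NULL);
	core->in_recovery = false;
	core->open_sessions = 0;
}

/* Called first in the error handler, before any teardown starts. */
static void rec_begin(struct rec_core *core)
{
	pthread_mutex_lock(&core->lock);
	core->in_recovery = true;
	pthread_mutex_unlock(&core->lock);
}

/* Called once the firmware has been brought back up. */
static void rec_end(struct rec_core *core)
{
	pthread_mutex_lock(&core->lock);
	core->in_recovery = false;
	pthread_mutex_unlock(&core->lock);
}

/* Userspace-triggered entry point (e.g. reached from REQBUFS). The flag
 * check and the work share one critical section, so recovery cannot
 * start between them. */
static int rec_session_init(struct rec_core *core)
{
	int ret = 0;

	pthread_mutex_lock(&core->lock);
	if (core->in_recovery)
		ret = -EIO;
	else
		core->open_sessions++;  /* stands in for ops->session_init() */
	pthread_mutex_unlock(&core->lock);
	return ret;
}
```

If the flag were checked, the lock dropped, and the firmware called afterwards, the same window would reopen. That is why the sketch keeps both steps under the lock.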

Patch

diff --git a/drivers/media/platform/qcom/venus/core.c b/drivers/media/platform/qcom/venus/core.c
index 203c653..fe99c83 100644
--- a/drivers/media/platform/qcom/venus/core.c
+++ b/drivers/media/platform/qcom/venus/core.c
@@ -64,8 +64,8 @@ static void venus_sys_error_handler(struct work_struct *work)
 	pm_runtime_get_sync(core->dev);
 
 	hfi_core_deinit(core, true);
-	hfi_destroy(core);
 	mutex_lock(&core->lock);
+	hfi_destroy(core);
 	venus_shutdown(core);
 
 	pm_runtime_put_sync(core->dev);
diff --git a/drivers/media/platform/qcom/venus/hfi.c b/drivers/media/platform/qcom/venus/hfi.c
index a211eb9..2eeb31f 100644
--- a/drivers/media/platform/qcom/venus/hfi.c
+++ b/drivers/media/platform/qcom/venus/hfi.c
@@ -195,7 +195,7 @@ EXPORT_SYMBOL_GPL(hfi_session_create);
 int hfi_session_init(struct venus_inst *inst, u32 pixfmt)
 {
 	struct venus_core *core = inst->core;
-	const struct hfi_ops *ops = core->ops;
+	const struct hfi_ops *ops;
 	int ret;
 
 	if (inst->state != INST_UNINIT)
@@ -204,10 +204,13 @@ int hfi_session_init(struct venus_inst *inst, u32 pixfmt)
 	inst->hfi_codec = to_codec_type(pixfmt);
 	reinit_completion(&inst->done);
 
+	mutex_lock(&core->lock);
+	ops = core->ops;
 	ret = ops->session_init(inst, inst->session_type, inst->hfi_codec);
 	if (ret)
 		return ret;
 
+	mutex_unlock(&core->lock);
 	ret = wait_session_msg(inst);
 	if (ret)
 		return ret;
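In the hfi.c hunk above, `if (ret) return ret;` sits between mutex_lock() and mutex_unlock(). A failing ops->session_init() therefore returns with core->lock still held, and the next caller that takes the lock would block. The following is a minimal pthreads sketch of one way to avoid this, assuming the same shape as hfi_session_init(): drop the lock right after the call and only then test `ret`. `ul_core`, `ul_ops` and `ul_wait()` are hypothetical stand-ins, with `ul_wait()` playing the role of wait_session_msg().

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-ins; not the venus driver's real types. */
struct ul_ops {
	int (*session_init)(void);
};

struct ul_core {
	pthread_mutex_t lock;        /* plays the role of core->lock */
	const struct ul_ops *ops;    /* plays the role of core->ops */
};

static int ul_init_ok(void) { return 0; }
static int ul_init_fails(void) { return -ETIMEDOUT; }

static const struct ul_ops ok_ops = { .session_init = ul_init_ok };
static const struct ul_ops failing_ops = { .session_init = ul_init_fails };

/* Stand-in for wait_session_msg(); runs without the lock held. */
static int ul_wait(void)
{
	return 0;
}

static int ul_session_init(struct ul_core *core)
{
	const struct ul_ops *ops;
	int ret;

	pthread_mutex_lock(&core->lock);
	ops = core->ops;
	ret = ops->session_init();
	/* Unlock before looking at ret, so the error path cannot
	 * return with the lock still held. */
	pthread_mutex_unlock(&core->lock);
	if (ret)
		return ret;

	return ul_wait();
}

/* Returns 1 if nobody holds the (default, non-recursive) mutex. */
static int ul_lock_is_free(struct ul_core *core)
{
	if (pthread_mutex_trylock(&core->lock) != 0)
		return 0;
	pthread_mutex_unlock(&core->lock);
	return 1;
}
```

A `goto unlock` label would work equally well. Moving the unlock above the `if (ret)` is simply the smallest change to the existing hunk.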