diff mbox series

[v1] scsi: ufs: fix tm cmd timeout/ISR racing issue

Message ID 20211111094939.14991-1-peter.wang@mediatek.com (mailing list archive)
State Rejected
Headers show
Series [v1] scsi: ufs: fix tm cmd timeout/ISR racing issue | expand

Commit Message

Peter Wang (王信友) Nov. 11, 2021, 9:49 a.m. UTC
From: Peter Wang <peter.wang@mediatek.com>

When tmc 100 ms timeout and recevied tmc complete ISR concurrently,
Bug happen because complete NULL poiner and KE.
Fix this racing issue by check NULL and use host_lock protect.

Signed-off-by: Peter Wang <peter.wang@mediatek.com>
---
 drivers/scsi/ufs/ufshcd.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Comments

Bart Van Assche Nov. 15, 2021, 7:49 p.m. UTC | #1
On 11/11/21 1:49 AM, peter.wang@mediatek.com wrote:
> From: Peter Wang <peter.wang@mediatek.com>
> 
> When tmc 100 ms timeout and recevied tmc complete ISR concurrently,
> Bug happen because complete NULL poiner and KE.
> Fix this racing issue by check NULL and use host_lock protect.
> 
> Signed-off-by: Peter Wang <peter.wang@mediatek.com>
> ---
>   drivers/scsi/ufs/ufshcd.c | 6 +++++-
>   1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> index 5c6a58a666d2..6821ceb6783e 100644
> --- a/drivers/scsi/ufs/ufshcd.c
> +++ b/drivers/scsi/ufs/ufshcd.c
> @@ -6442,7 +6442,8 @@ static irqreturn_t ufshcd_tmc_handler(struct ufs_hba *hba)
>   		struct request *req = hba->tmf_rqs[tag];
>   		struct completion *c = req->end_io_data;
>   
> -		complete(c);
> +		if (c)
> +			complete(c);
>   		ret = IRQ_HANDLED;
>   	}
>   	spin_unlock_irqrestore(hba->host->host_lock, flags);
> @@ -6597,7 +6598,10 @@ static int __ufshcd_issue_tm_cmd(struct ufs_hba *hba,
>   		 * Make sure that ufshcd_compl_tm() does not trigger a
>   		 * use-after-free.
>   		 */
> +		spin_lock_irqsave(hba->host->host_lock, flags);
>   		req->end_io_data = NULL;
> +		spin_unlock_irqrestore(hba->host->host_lock, flags);
> +
>   		ufshcd_add_tm_upiu_trace(hba, task_tag, UFS_TM_ERR);
>   		dev_err(hba->dev, "%s: task management cmd 0x%.2x timed-out\n",
>   				__func__, tm_function);

Isn't this already addressed by Adrian Hunter's patches? See also
https://lore.kernel.org/linux-scsi/20211108064815.569494-1-adrian.hunter@intel.com/

Thanks,

Bart.
Peter Wang (王信友) Nov. 16, 2021, 6:57 a.m. UTC | #2
On 11/16/21 3:49 AM, Bart Van Assche wrote:
> On 11/11/21 1:49 AM, peter.wang@mediatek.com wrote:
>> From: Peter Wang <peter.wang@mediatek.com>
>>
>> When tmc 100 ms timeout and recevied tmc complete ISR concurrently,
>> Bug happen because complete NULL poiner and KE.
>> Fix this racing issue by check NULL and use host_lock protect.
>>
>> Signed-off-by: Peter Wang <peter.wang@mediatek.com>
>> ---
>>   drivers/scsi/ufs/ufshcd.c | 6 +++++-
>>   1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
>> index 5c6a58a666d2..6821ceb6783e 100644
>> --- a/drivers/scsi/ufs/ufshcd.c
>> +++ b/drivers/scsi/ufs/ufshcd.c
>> @@ -6442,7 +6442,8 @@ static irqreturn_t ufshcd_tmc_handler(struct 
>> ufs_hba *hba)
>>           struct request *req = hba->tmf_rqs[tag];
>>           struct completion *c = req->end_io_data;
>>   -        complete(c);
>> +        if (c)
>> +            complete(c);
>>           ret = IRQ_HANDLED;
>>       }
>>       spin_unlock_irqrestore(hba->host->host_lock, flags);
>> @@ -6597,7 +6598,10 @@ static int __ufshcd_issue_tm_cmd(struct 
>> ufs_hba *hba,
>>            * Make sure that ufshcd_compl_tm() does not trigger a
>>            * use-after-free.
>>            */
>> +        spin_lock_irqsave(hba->host->host_lock, flags);
>>           req->end_io_data = NULL;
>> +        spin_unlock_irqrestore(hba->host->host_lock, flags);
>> +
>>           ufshcd_add_tm_upiu_trace(hba, task_tag, UFS_TM_ERR);
>>           dev_err(hba->dev, "%s: task management cmd 0x%.2x 
>> timed-out\n",
>>                   __func__, tm_function);
>
> Isn't this already addressed by Adrian Hunter's patches? See also
> https://urldefense.com/v3/__https://lore.kernel.org/linux-scsi/20211108064815.569494-1-adrian.hunter@intel.com/__;!!CTRNKA9wMg0ARbw!zttcrXBZgCk261BxtN67hFHTMRzOwcDr1IVH8znRw4I0POKCxUijARo7H3btU8SfRQ$ 
>
> Thanks,
>
> Bart.
>
Hi Bart,

Yes, I will drop this patch.

By the way, we observe that 100ms TMC timeout value may not enough for 
some device, maybe we need enlarge this value?


Thanks

Peter
Bart Van Assche Nov. 16, 2021, 5:28 p.m. UTC | #3
On 11/15/21 22:57, Peter Wang wrote:
> By the way, we observe that 100ms TMC timeout value may not enough for
> some device, maybe we need enlarge this value?

Is that the TM_CMD_TIMEOUT constant? It surprises me that 100 ms is not 
enough. Will increasing that constant have a negative impact on the 
error handler in case it hits a task management timeout?

Thanks,

Bart.
diff mbox series

Patch

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 5c6a58a666d2..6821ceb6783e 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -6442,7 +6442,8 @@  static irqreturn_t ufshcd_tmc_handler(struct ufs_hba *hba)
 		struct request *req = hba->tmf_rqs[tag];
 		struct completion *c = req->end_io_data;
 
-		complete(c);
+		if (c)
+			complete(c);
 		ret = IRQ_HANDLED;
 	}
 	spin_unlock_irqrestore(hba->host->host_lock, flags);
@@ -6597,7 +6598,10 @@  static int __ufshcd_issue_tm_cmd(struct ufs_hba *hba,
 		 * Make sure that ufshcd_compl_tm() does not trigger a
 		 * use-after-free.
 		 */
+		spin_lock_irqsave(hba->host->host_lock, flags);
 		req->end_io_data = NULL;
+		spin_unlock_irqrestore(hba->host->host_lock, flags);
+
 		ufshcd_add_tm_upiu_trace(hba, task_tag, UFS_TM_ERR);
 		dev_err(hba->dev, "%s: task management cmd 0x%.2x timed-out\n",
 				__func__, tm_function);