[v1] scsi: ufs: fix tm cmd timeout/ISR racing issue

Message ID	20211111094939.14991-1-peter.wang@mediatek.com (mailing list archive)
State	Rejected
Headers	show Return-Path: <linux-scsi-owner@kernel.org> From: <peter.wang@mediatek.com> To: <stanley.chu@mediatek.com>, <linux-scsi@vger.kernel.org>, <martin.petersen@oracle.com>, <avri.altman@wdc.com>, <alim.akhtar@samsung.com>, <jejb@linux.ibm.com> CC: <wsd_upstream@mediatek.com>, <linux-mediatek@lists.infradead.org>, <peter.wang@mediatek.com>, <chun-hung.wu@mediatek.com>, <alice.chao@mediatek.com>, <cc.chou@mediatek.com>, <chaotian.jing@mediatek.com>, <jiajie.hao@mediatek.com>, <powen.kao@mediatek.com>, <jonathan.hsu@mediatek.com>, <qilin.tan@mediatek.com>, <lin.gui@mediatek.com>, <mikebi@micron.com> Subject: [PATCH v1] scsi: ufs: fix tm cmd timeout/ISR racing issue Date: Thu, 11 Nov 2021 17:49:39 +0800 Message-ID: <20211111094939.14991-1-peter.wang@mediatek.com> MIME-Version: 1.0 Content-Type: text/plain Precedence: bulk
Series	[v1] scsi: ufs: fix tm cmd timeout/ISR racing issue \| expand [v1] scsi: ufs: fix tm cmd timeout/ISR racing issue

Message ID

20211111094939.14991-1-peter.wang@mediatek.com (mailing list archive)

State

Rejected

Headers

From: <peter.wang@mediatek.com>
To: <stanley.chu@mediatek.com>, <linux-scsi@vger.kernel.org>,
        <martin.petersen@oracle.com>, <avri.altman@wdc.com>,
        <alim.akhtar@samsung.com>, <jejb@linux.ibm.com>
CC: <wsd_upstream@mediatek.com>, <linux-mediatek@lists.infradead.org>,
        <peter.wang@mediatek.com>, <chun-hung.wu@mediatek.com>,
        <alice.chao@mediatek.com>, <cc.chou@mediatek.com>,
        <chaotian.jing@mediatek.com>, <jiajie.hao@mediatek.com>,
        <powen.kao@mediatek.com>, <jonathan.hsu@mediatek.com>,
        <qilin.tan@mediatek.com>, <lin.gui@mediatek.com>,
        <mikebi@micron.com>
Subject: [PATCH v1] scsi: ufs: fix tm cmd timeout/ISR racing issue
Date: Thu, 11 Nov 2021 17:49:39 +0800
Message-ID: <20211111094939.14991-1-peter.wang@mediatek.com>
MIME-Version: 1.0
Content-Type: text/plain
Precedence: bulk

Series

[v1] scsi: ufs: fix tm cmd timeout/ISR racing issue | expand

Commit Message

Peter Wang (王信友) Nov. 11, 2021, 9:49 a.m. UTC

From: Peter Wang <peter.wang@mediatek.com>

When tmc 100 ms timeout and recevied tmc complete ISR concurrently,
Bug happen because complete NULL poiner and KE.
Fix this racing issue by check NULL and use host_lock protect.

Signed-off-by: Peter Wang <peter.wang@mediatek.com>
---
 drivers/scsi/ufs/ufshcd.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Comments

Bart Van Assche Nov. 15, 2021, 7:49 p.m. UTC | #1

On 11/11/21 1:49 AM, peter.wang@mediatek.com wrote:
> From: Peter Wang <peter.wang@mediatek.com>
> 
> When tmc 100 ms timeout and recevied tmc complete ISR concurrently,
> Bug happen because complete NULL poiner and KE.
> Fix this racing issue by check NULL and use host_lock protect.
> 
> Signed-off-by: Peter Wang <peter.wang@mediatek.com>
> ---
>   drivers/scsi/ufs/ufshcd.c | 6 +++++-
>   1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> index 5c6a58a666d2..6821ceb6783e 100644
> --- a/drivers/scsi/ufs/ufshcd.c
> +++ b/drivers/scsi/ufs/ufshcd.c
> @@ -6442,7 +6442,8 @@ static irqreturn_t ufshcd_tmc_handler(struct ufs_hba *hba)
>   		struct request *req = hba->tmf_rqs[tag];
>   		struct completion *c = req->end_io_data;
>   
> -		complete(c);
> +		if (c)
> +			complete(c);
>   		ret = IRQ_HANDLED;
>   	}
>   	spin_unlock_irqrestore(hba->host->host_lock, flags);
> @@ -6597,7 +6598,10 @@ static int __ufshcd_issue_tm_cmd(struct ufs_hba *hba,
>   		 * Make sure that ufshcd_compl_tm() does not trigger a
>   		 * use-after-free.
>   		 */
> +		spin_lock_irqsave(hba->host->host_lock, flags);
>   		req->end_io_data = NULL;
> +		spin_unlock_irqrestore(hba->host->host_lock, flags);
> +
>   		ufshcd_add_tm_upiu_trace(hba, task_tag, UFS_TM_ERR);
>   		dev_err(hba->dev, "%s: task management cmd 0x%.2x timed-out\n",
>   				__func__, tm_function);

Isn't this already addressed by Adrian Hunter's patches? See also
https://lore.kernel.org/linux-scsi/20211108064815.569494-1-adrian.hunter@intel.com/

Thanks,

Bart.

Peter Wang (王信友) Nov. 16, 2021, 6:57 a.m. UTC | #2

On 11/16/21 3:49 AM, Bart Van Assche wrote:
> On 11/11/21 1:49 AM, peter.wang@mediatek.com wrote:
>> From: Peter Wang <peter.wang@mediatek.com>
>>
>> When tmc 100 ms timeout and recevied tmc complete ISR concurrently,
>> Bug happen because complete NULL poiner and KE.
>> Fix this racing issue by check NULL and use host_lock protect.
>>
>> Signed-off-by: Peter Wang <peter.wang@mediatek.com>
>> ---
>>   drivers/scsi/ufs/ufshcd.c | 6 +++++-
>>   1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
>> index 5c6a58a666d2..6821ceb6783e 100644
>> --- a/drivers/scsi/ufs/ufshcd.c
>> +++ b/drivers/scsi/ufs/ufshcd.c
>> @@ -6442,7 +6442,8 @@ static irqreturn_t ufshcd_tmc_handler(struct 
>> ufs_hba *hba)
>>           struct request *req = hba->tmf_rqs[tag];
>>           struct completion *c = req->end_io_data;
>>   -        complete(c);
>> +        if (c)
>> +            complete(c);
>>           ret = IRQ_HANDLED;
>>       }
>>       spin_unlock_irqrestore(hba->host->host_lock, flags);
>> @@ -6597,7 +6598,10 @@ static int __ufshcd_issue_tm_cmd(struct 
>> ufs_hba *hba,
>>            * Make sure that ufshcd_compl_tm() does not trigger a
>>            * use-after-free.
>>            */
>> +        spin_lock_irqsave(hba->host->host_lock, flags);
>>           req->end_io_data = NULL;
>> +        spin_unlock_irqrestore(hba->host->host_lock, flags);
>> +
>>           ufshcd_add_tm_upiu_trace(hba, task_tag, UFS_TM_ERR);
>>           dev_err(hba->dev, "%s: task management cmd 0x%.2x 
>> timed-out\n",
>>                   __func__, tm_function);
>
> Isn't this already addressed by Adrian Hunter's patches? See also
> https://urldefense.com/v3/__https://lore.kernel.org/linux-scsi/20211108064815.569494-1-adrian.hunter@intel.com/__;!!CTRNKA9wMg0ARbw!zttcrXBZgCk261BxtN67hFHTMRzOwcDr1IVH8znRw4I0POKCxUijARo7H3btU8SfRQ$ 
>
> Thanks,
>
> Bart.
>
Hi Bart,

Yes, I will drop this patch.

By the way, we observe that 100ms TMC timeout value may not enough for 
some device, maybe we need enlarge this value?


Thanks

Peter

Bart Van Assche Nov. 16, 2021, 5:28 p.m. UTC | #3

On 11/15/21 22:57, Peter Wang wrote:
> By the way, we observe that 100ms TMC timeout value may not enough for
> some device, maybe we need enlarge this value?

Is that the TM_CMD_TIMEOUT constant? It surprises me that 100 ms is not 
enough. Will increasing that constant have a negative impact on the 
error handler in case it hits a task management timeout?

Thanks,

Bart.

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 5c6a58a666d2..6821ceb6783e 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -6442,7 +6442,8 @@  static irqreturn_t ufshcd_tmc_handler(struct ufs_hba *hba)
 		struct request *req = hba->tmf_rqs[tag];
 		struct completion *c = req->end_io_data;
 
-		complete(c);
+		if (c)
+			complete(c);
 		ret = IRQ_HANDLED;
 	}
 	spin_unlock_irqrestore(hba->host->host_lock, flags);
@@ -6597,7 +6598,10 @@  static int __ufshcd_issue_tm_cmd(struct ufs_hba *hba,
 		 * Make sure that ufshcd_compl_tm() does not trigger a
 		 * use-after-free.
 		 */
+		spin_lock_irqsave(hba->host->host_lock, flags);
 		req->end_io_data = NULL;
+		spin_unlock_irqrestore(hba->host->host_lock, flags);
+
 		ufshcd_add_tm_upiu_trace(hba, task_tag, UFS_TM_ERR);
 		dev_err(hba->dev, "%s: task management cmd 0x%.2x timed-out\n",
 				__func__, tm_function);

[v1] scsi: ufs: fix tm cmd timeout/ISR racing issue

Commit Message

Comments

Patch