diff mbox series

[1/1] scsi: ufs: core: Fix task management completion timeout race

Message ID 20211013150116.31158-2-adrian.hunter@intel.com (mailing list archive)
State Superseded
Headers show
Series scsi: ufs: core: Fix task management completion timeout race | expand

Commit Message

Adrian Hunter Oct. 13, 2021, 3:01 p.m. UTC
__ufshcd_issue_tm_cmd() clears req->end_io_data after timing out,
which races with the completion function ufshcd_tmc_handler() which
expects req->end_io_data to have a value.

Note __ufshcd_issue_tm_cmd() and ufshcd_tmc_handler() are already
synchronized using hba->tmf_rqs and hba->outstanding_tasks under the
host_lock spinlock.

It is also not necessary (nor typical) to clear req->end_io_data because
the block layer does it before allocating out requests e.g. via
blk_get_request().

So fix by not clearing it.

Fixes: f5ef336fd2e4c3 ("scsi: ufs: core: Fix task management completion")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 drivers/scsi/ufs/ufshcd.c | 5 -----
 1 file changed, 5 deletions(-)

Comments

Bart Van Assche Oct. 14, 2021, 4:14 a.m. UTC | #1
On 10/13/21 08:01, Adrian Hunter wrote:
> __ufshcd_issue_tm_cmd() clears req->end_io_data after timing out,
> which races with the completion function ufshcd_tmc_handler() which
> expects req->end_io_data to have a value.
> 
> Note __ufshcd_issue_tm_cmd() and ufshcd_tmc_handler() are already
> synchronized using hba->tmf_rqs and hba->outstanding_tasks under the
> host_lock spinlock.
> 
> It is also not necessary (nor typical) to clear req->end_io_data because
> the block layer does it before allocating out requests e.g. via
> blk_get_request().
> 
> So fix by not clearing it.
> 
> Fixes: f5ef336fd2e4c3 ("scsi: ufs: core: Fix task management completion")
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>   drivers/scsi/ufs/ufshcd.c | 5 -----
>   1 file changed, 5 deletions(-)
> 
> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> index 95be7ecdfe10..f34b3994d1aa 100644
> --- a/drivers/scsi/ufs/ufshcd.c
> +++ b/drivers/scsi/ufs/ufshcd.c
> @@ -6550,11 +6550,6 @@ static int __ufshcd_issue_tm_cmd(struct ufs_hba *hba,
>   	err = wait_for_completion_io_timeout(&wait,
>   			msecs_to_jiffies(TM_CMD_TIMEOUT));
>   	if (!err) {
> -		/*
> -		 * Make sure that ufshcd_compl_tm() does not trigger a
> -		 * use-after-free.
> -		 */
> -		req->end_io_data = NULL;
>   		ufshcd_add_tm_upiu_trace(hba, task_tag, UFS_TM_ERR);
>   		dev_err(hba->dev, "%s: task management cmd 0x%.2x timed-out\n",
>   				__func__, tm_function);

With this patch applied ufshcd_tmc_handler() can trigger a 
use-after-free of the stack memory used for the 'wait' completion. 
Wouldn't it be better to keep the code that clears req->end_io_data and 
to change complete(c) into if(c) complete(c) in ufshcd_tmc_handler()?

Thanks,

Bart.
Adrian Hunter Oct. 14, 2021, 6:02 a.m. UTC | #2
On 14/10/2021 07:14, Bart Van Assche wrote:
> On 10/13/21 08:01, Adrian Hunter wrote:
>> __ufshcd_issue_tm_cmd() clears req->end_io_data after timing out,
>> which races with the completion function ufshcd_tmc_handler() which
>> expects req->end_io_data to have a value.
>>
>> Note __ufshcd_issue_tm_cmd() and ufshcd_tmc_handler() are already
>> synchronized using hba->tmf_rqs and hba->outstanding_tasks under the
>> host_lock spinlock.
>>
>> It is also not necessary (nor typical) to clear req->end_io_data because
>> the block layer does it before allocating out requests e.g. via
>> blk_get_request().
>>
>> So fix by not clearing it.
>>
>> Fixes: f5ef336fd2e4c3 ("scsi: ufs: core: Fix task management completion")
>> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
>> ---
>>   drivers/scsi/ufs/ufshcd.c | 5 -----
>>   1 file changed, 5 deletions(-)
>>
>> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
>> index 95be7ecdfe10..f34b3994d1aa 100644
>> --- a/drivers/scsi/ufs/ufshcd.c
>> +++ b/drivers/scsi/ufs/ufshcd.c
>> @@ -6550,11 +6550,6 @@ static int __ufshcd_issue_tm_cmd(struct ufs_hba *hba,
>>       err = wait_for_completion_io_timeout(&wait,
>>               msecs_to_jiffies(TM_CMD_TIMEOUT));
>>       if (!err) {
>> -        /*
>> -         * Make sure that ufshcd_compl_tm() does not trigger a
>> -         * use-after-free.
>> -         */
>> -        req->end_io_data = NULL;
>>           ufshcd_add_tm_upiu_trace(hba, task_tag, UFS_TM_ERR);
>>           dev_err(hba->dev, "%s: task management cmd 0x%.2x timed-out\n",
>>                   __func__, tm_function);
> 
> With this patch applied ufshcd_tmc_handler() can trigger a use-after-free of the stack memory used for the 'wait' completion.

AFAICT that would only happen because blk_mq_tagset_busy_iter() is not synchronized with respect to blk_put_request().
But ufshcd_tmc_handler() does not use blk_mq_tagset_busy_iter() anymore, so that can't happen.

Wouldn't it be better to keep the code that clears req->end_io_data and to change complete(c) into if(c) complete(c) in ufshcd_tmc_handler()?

If that were needed, it would imply the synchronization was broken i.e. why are we referencing a request that has already been through blk_put_request()?
Bart Van Assche Oct. 14, 2021, 4:47 p.m. UTC | #3
On 10/13/21 11:02 PM, Adrian Hunter wrote:
> On 14/10/2021 07:14, Bart Van Assche wrote:
>> Wouldn't it be better to keep the code that clears req->end_io_data
>> and to change complete(c) into if(c) complete(c) in
>> ufshcd_tmc_handler()?
> 
> If that were needed, it would imply the synchronization was broken
> i.e. why are we referencing a request that has already been through
> blk_put_request()?

The scenario I'm worried about is as follows:
* __ufshcd_issue_tm_cmd() issues a task management function.
* No completion is received before TM_CMD_TIMEOUT has expired (100 ms).
* ufshcd_clear_tm_cmd() fails.
* The TMF completes, ufshcd_tmc_handler() is called and that function 
calls complete(req->end_io_data).

Can this happen?

I agree that this scenario involves completion of a request that has 
already been through blk_put_request().

Thanks,

Bart.
Adrian Hunter Oct. 14, 2021, 6:50 p.m. UTC | #4
On 14/10/2021 19:47, Bart Van Assche wrote:
> On 10/13/21 11:02 PM, Adrian Hunter wrote:
>> On 14/10/2021 07:14, Bart Van Assche wrote:
>>> Wouldn't it be better to keep the code that clears req->end_io_data
>>> and to change complete(c) into if(c) complete(c) in
>>> ufshcd_tmc_handler()?
>>
>> If that were needed, it would imply the synchronization was broken
>> i.e. why are we referencing a request that has already been through
>> blk_put_request()?
> 
> The scenario I'm worried about is as follows:
> * __ufshcd_issue_tm_cmd() issues a task management function.
> * No completion is received before TM_CMD_TIMEOUT has expired (100 ms).
> * ufshcd_clear_tm_cmd() fails.
> * The TMF completes, ufshcd_tmc_handler() is called and that function calls complete(req->end_io_data).
> 
> Can this happen?

No because the tag's bit is cleared from outstanding_tasks before blk_put_request() and
access to outstanding_tasks is protected by host_lock in both __ufshcd_issue_tm_cmd()
and ufshcd_clear_tm_cmd().

> 
> I agree that this scenario involves completion of a request that has already been through blk_put_request().
> 
> Thanks,
> 
> Bart.
Bart Van Assche Oct. 14, 2021, 7:18 p.m. UTC | #5
On 10/13/21 8:01 AM, Adrian Hunter wrote:
> __ufshcd_issue_tm_cmd() clears req->end_io_data after timing out,
> which races with the completion function ufshcd_tmc_handler() which
> expects req->end_io_data to have a value.
> 
> Note __ufshcd_issue_tm_cmd() and ufshcd_tmc_handler() are already
> synchronized using hba->tmf_rqs and hba->outstanding_tasks under the
> host_lock spinlock.
> 
> It is also not necessary (nor typical) to clear req->end_io_data because
> the block layer does it before allocating out requests e.g. via
> blk_get_request().
> 
> So fix by not clearing it.
> 
> Fixes: f5ef336fd2e4c3 ("scsi: ufs: core: Fix task management completion")
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>   drivers/scsi/ufs/ufshcd.c | 5 -----
>   1 file changed, 5 deletions(-)
> 
> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> index 95be7ecdfe10..f34b3994d1aa 100644
> --- a/drivers/scsi/ufs/ufshcd.c
> +++ b/drivers/scsi/ufs/ufshcd.c
> @@ -6550,11 +6550,6 @@ static int __ufshcd_issue_tm_cmd(struct ufs_hba *hba,
>   	err = wait_for_completion_io_timeout(&wait,
>   			msecs_to_jiffies(TM_CMD_TIMEOUT));
>   	if (!err) {
> -		/*
> -		 * Make sure that ufshcd_compl_tm() does not trigger a
> -		 * use-after-free.
> -		 */
> -		req->end_io_data = NULL;
>   		ufshcd_add_tm_upiu_trace(hba, task_tag, UFS_TM_ERR);
>   		dev_err(hba->dev, "%s: task management cmd 0x%.2x timed-out\n",
>   				__func__, tm_function);

Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Adrian Hunter Oct. 15, 2021, 5:41 a.m. UTC | #6
On 14/10/2021 21:50, Adrian Hunter wrote:
> On 14/10/2021 19:47, Bart Van Assche wrote:
>> On 10/13/21 11:02 PM, Adrian Hunter wrote:
>>> On 14/10/2021 07:14, Bart Van Assche wrote:
>>>> Wouldn't it be better to keep the code that clears req->end_io_data
>>>> and to change complete(c) into if(c) complete(c) in
>>>> ufshcd_tmc_handler()?
>>>
>>> If that were needed, it would imply the synchronization was broken
>>> i.e. why are we referencing a request that has already been through
>>> blk_put_request()?
>>
>> The scenario I'm worried about is as follows:
>> * __ufshcd_issue_tm_cmd() issues a task management function.
>> * No completion is received before TM_CMD_TIMEOUT has expired (100 ms).
>> * ufshcd_clear_tm_cmd() fails.
>> * The TMF completes, ufshcd_tmc_handler() is called and that function calls complete(req->end_io_data).
>>
>> Can this happen?
> 
> No because the tag's bit is cleared from outstanding_tasks before blk_put_request() and
> access to outstanding_tasks is protected by host_lock in both __ufshcd_issue_tm_cmd()
> and ufshcd_clear_tm_cmd().

Although I just noticed a different issue.

In ufshcd_tmc_handler() the task doorbell register needs to be read
in conjunction with outstanding_tasks i.e. under the spinlock

I will send a V2.

> 
>>
>> I agree that this scenario involves completion of a request that has already been through blk_put_request().
>>
>> Thanks,
>>
>> Bart.
>
diff mbox series

Patch

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 95be7ecdfe10..f34b3994d1aa 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -6550,11 +6550,6 @@  static int __ufshcd_issue_tm_cmd(struct ufs_hba *hba,
 	err = wait_for_completion_io_timeout(&wait,
 			msecs_to_jiffies(TM_CMD_TIMEOUT));
 	if (!err) {
-		/*
-		 * Make sure that ufshcd_compl_tm() does not trigger a
-		 * use-after-free.
-		 */
-		req->end_io_data = NULL;
 		ufshcd_add_tm_upiu_trace(hba, task_tag, UFS_TM_ERR);
 		dev_err(hba->dev, "%s: task management cmd 0x%.2x timed-out\n",
 				__func__, tm_function);