diff mbox series

[v4,2/2] ufs: core: requeue aborted request

Message ID 20240910073035.25974-3-peter.wang@mediatek.com (mailing list archive)
State Superseded
Headers show
Series fix abort defect | expand

Commit Message

Peter Wang (王信友) Sept. 10, 2024, 7:30 a.m. UTC
From: Peter Wang <peter.wang@mediatek.com>

ufshcd_abort_all froce abort all on-going command and the host
will automatically fill in the OCS field of the corresponding
response with OCS_ABORTED based on different working modes.

MCQ mode: aborts a command using SQ cleanup, The host controller
will post a Completion Queue entry with OCS = ABORTED.

SDB mode: aborts a command using UTRLCLR. Task Management response
which means a Transfer Request was aborted.

For these two cases, set a flag to notify SCSI to requeue the
command after receiving response with OCS_ABORTED.

Fixes: ab248643d3d6 ("scsi: ufs: core: Add error handling for MCQ mode")
Cc: stable@vger.kernel.org
Signed-off-by: Peter Wang <peter.wang@mediatek.com>
---
 drivers/ufs/core/ufshcd.c | 34 +++++++++++++++++++---------------
 include/ufs/ufshcd.h      |  3 +++
 2 files changed, 22 insertions(+), 15 deletions(-)

Comments

Bart Van Assche Sept. 10, 2024, 5:59 p.m. UTC | #1
On 9/10/24 12:30 AM, peter.wang@mediatek.com wrote:
> ufshcd_abort_all froce abort all on-going command and the host
                    ^^^^^ ^^^^^              ^^^^^^^         ^^^^
                forcibly? aborts?            commands?   host controller?
> will automatically fill in the OCS field of the corresponding
> response with OCS_ABORTED based on different working modes.
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The host controller only sets the OCS field to OCS_ABORTED in MCQ mode
if the host controller successfully aborted the command. If the
abort TMF is submitted to the UFS device, the OCS field won't be changed
into OCS_ABORTED. In SDB mode, the host controller does not modify the 
OCS field either.

> SDB mode: aborts a command using UTRLCLR. Task Management response
> which means a Transfer Request was aborted.

Hmm ... my understanding is that clearing a bit from UTRLCLR is only
allowed *after* a command has been aborted and also that clearing a bit
from this register does not abort a command but only frees the resources
in the host controller associated with the command.

> For these two cases, set a flag to notify SCSI to requeue the
> command after receiving response with OCS_ABORTED.

I think there is only one case when the SCSI core needs to be requested
to requeue a command, namely when the UFS driver decided to initiate the
abort (ufshcd_abort_all()).

> @@ -7561,6 +7551,20 @@ int ufshcd_try_to_abort_task(struct ufs_hba *hba, int tag)
>   		goto out;
>   	}
>   
> +	/*
> +	 * When the host software receives a "FUNCTION COMPLETE", set flag
> +	 * to requeue command after receive response with OCS_ABORTED
> +	 * SDB mode: UTRLCLR Task Management response which means a Transfer
> +	 *           Request was aborted.
> +	 * MCQ mode: Host will post to CQ with OCS_ABORTED after SQ cleanup
> +	 * This flag is set because ufshcd_abort_all forcibly aborts all
> +	 * commands, and the host will automatically fill in the OCS field
> +	 * of the corresponding response with OCS_ABORTED.
> +	 * Therefore, upon receiving this response, it needs to be requeued.
> +	 */
> +	if (!err)
> +		lrbp->abort_initiated_by_err = true;
> +
>   	err = ufshcd_clear_cmd(hba, tag);
>   	if (err)
>   		dev_err(hba->dev, "%s: Failed clearing cmd at tag %d, err %d\n",

The above change is misplaced. ufshcd_try_to_abort_task() can be called
when the SCSI core decides to abort a command while
abort_initiated_by_err must not be set in that case. Please move the
above code block into ufshcd_abort_one().

Regarding the word "host" in the above comment block: the host is the 
Android device. I think that in the above comment "host" should be
changed into "host controller".

> diff --git a/include/ufs/ufshcd.h b/include/ufs/ufshcd.h
> index 0fd2aebac728..15b357672ca5 100644
> --- a/include/ufs/ufshcd.h
> +++ b/include/ufs/ufshcd.h
> @@ -173,6 +173,8 @@ struct ufs_pm_lvl_states {
>    * @crypto_key_slot: the key slot to use for inline crypto (-1 if none)
>    * @data_unit_num: the data unit number for the first block for inline crypto
>    * @req_abort_skip: skip request abort task flag
> + * @abort_initiated_by_err: The flag is specifically used to handle aborts
> + *                          caused by errors due to host/device communication

The "abort_initiated_by_err" name still seems confusing to me. Please
make it more clear that this flag is only set if the UFS error handler
decides to abort a command. How about "abort_initiated_by_eh"?

Please also make the description of this member variable more clear.

Thanks,

Bart.
Peter Wang (王信友) Sept. 11, 2024, 6:03 a.m. UTC | #2
On Tue, 2024-09-10 at 10:59 -0700, Bart Van Assche wrote:
>  	 
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
>  On 9/10/24 12:30 AM, peter.wang@mediatek.com wrote:
> > ufshcd_abort_all froce abort all on-going command and the host
>                     ^^^^^ ^^^^^              ^^^^^^^         ^^^^
>                 forcibly? aborts?            commands?   host
> controller?
> 

Hi Bart,

Sorry, will correct words next version.


> > will automatically fill in the OCS field of the corresponding
> > response with OCS_ABORTED based on different working modes.
>                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> The host controller only sets the OCS field to OCS_ABORTED in MCQ
> mode
> if the host controller successfully aborted the command. If the
> abort TMF is submitted to the UFS device, the OCS field won't be
> changed
> into OCS_ABORTED. In SDB mode, the host controller does not modify
> the 
> OCS field either.
> 

This statement is not quite accurate becasue in UFSHIC2.1, SDB mode 
specification already have OCS: ABORTED (0x6) define.
And it is used in below UTRLCLR description:
'which means a Transfer Request was "aborted"'
Therefore, the host controller should follow the 
specification and fill the OCS field with OCS: ABORTED. 
If not so, at what point does your host controller use the 
OCS: ABORTED status?


> > SDB mode: aborts a command using UTRLCLR. Task Management response
> > which means a Transfer Request was aborted.
> 
> Hmm ... my understanding is that clearing a bit from UTRLCLR is only
> allowed *after* a command has been aborted and also that clearing a
> bit
> from this register does not abort a command but only frees the
> resources
> in the host controller associated with the command.
> 

Although this specification description does not explicitly 
state the OCS behavior, to my understanding, the specification
for MCQ abort behavior is formulated with reference to the SDB mode.


> > For these two cases, set a flag to notify SCSI to requeue the
> > command after receiving response with OCS_ABORTED.
> 
> I think there is only one case when the SCSI core needs to be
> requested
> to requeue a command, namely when the UFS driver decided to initiate
> the
> abort (ufshcd_abort_all()).
> 
> > @@ -7561,6 +7551,20 @@ int ufshcd_try_to_abort_task(struct ufs_hba
> *hba, int tag)
> >   goto out;
> >   }
> >   
> > +/*
> > + * When the host software receives a "FUNCTION COMPLETE", set flag
> > + * to requeue command after receive response with OCS_ABORTED
> > + * SDB mode: UTRLCLR Task Management response which means a
> Transfer
> > + *           Request was aborted.
> > + * MCQ mode: Host will post to CQ with OCS_ABORTED after SQ
> cleanup
> > + * This flag is set because ufshcd_abort_all forcibly aborts all
> > + * commands, and the host will automatically fill in the OCS field
> > + * of the corresponding response with OCS_ABORTED.
> > + * Therefore, upon receiving this response, it needs to be
> requeued.
> > + */
> > +if (!err)
> > +lrbp->abort_initiated_by_err = true;
> > +
> >   err = ufshcd_clear_cmd(hba, tag);
> >   if (err)
> >   dev_err(hba->dev, "%s: Failed clearing cmd at tag %d, err %d\n",
> 
> The above change is misplaced. ufshcd_try_to_abort_task() can be
> called
> when the SCSI core decides to abort a command while
> abort_initiated_by_err must not be set in that case. Please move the
> above code block into ufshcd_abort_one().
> 

But move to ufshcd_abort_one may have race condition, beacause we
need set this flag before ufshcd_clear_cmd host controller fill
OCS_ABORTED to response. I will add check ufshcd_eh_in_progress.


> Regarding the word "host" in the above comment block: the host is
> the 
> Android device. I think that in the above comment "host" should be
> changed into "host controller".
> 

It will be changed to 'host controller' to make the comment clearer.


> > diff --git a/include/ufs/ufshcd.h b/include/ufs/ufshcd.h
> > index 0fd2aebac728..15b357672ca5 100644
> > --- a/include/ufs/ufshcd.h
> > +++ b/include/ufs/ufshcd.h
> > @@ -173,6 +173,8 @@ struct ufs_pm_lvl_states {
> >    * @crypto_key_slot: the key slot to use for inline crypto (-1 if
> none)
> >    * @data_unit_num: the data unit number for the first block for
> inline crypto
> >    * @req_abort_skip: skip request abort task flag
> > + * @abort_initiated_by_err: The flag is specifically used to
> handle aborts
> > + *                          caused by errors due to host/device
> communication
> 
> The "abort_initiated_by_err" name still seems confusing to me. Please
> make it more clear that this flag is only set if the UFS error
> handler
> decides to abort a command. How about "abort_initiated_by_eh"?
> 
> Please also make the description of this member variable more clear.
> 

Sure, will change this name and make description clearer.

Thanks.
Peter



> Thanks,
> 
> Bart.
Bart Van Assche Sept. 11, 2024, 7:11 p.m. UTC | #3
On 9/10/24 11:03 PM, Peter Wang (王信友) wrote:
> This statement is not quite accurate becasue in UFSHIC2.1, SDB mode
> specification already have OCS: ABORTED (0x6) define.
> And it is used in below UTRLCLR description:
> 'which means a Transfer Request was "aborted"'
> Therefore, the host controller should follow the
> specification and fill the OCS field with OCS: ABORTED.
> If not so, at what point does your host controller use the
> OCS: ABORTED status?

Hmm ... I have not been able to find any explanation in the UFSHCI 2.1
specification that says when the OCS status is set to aborted. Did I
perhaps overlook something?

This is what I found in the UTRLCLR description: "The host software 
shall use this field only when a UTP Transfer Request is expected to
not be completed, e.g., when the host software receives a “FUNCTION
COMPLETE” Task Management response which means a Transfer Request was
aborted." This does not mean that the host controller is expected to
set the OCS status to "ABORTED". I will send an email to the JC-64
mailing list to request clarification.

>>> +/*
>>> + * When the host software receives a "FUNCTION COMPLETE", set flag
>>> + * to requeue command after receive response with OCS_ABORTED
>>> + * SDB mode: UTRLCLR Task Management response which means a
>> Transfer
>>> + *           Request was aborted.
>>> + * MCQ mode: Host will post to CQ with OCS_ABORTED after SQ
>> cleanup
>>> + * This flag is set because ufshcd_abort_all forcibly aborts all
>>> + * commands, and the host will automatically fill in the OCS field
>>> + * of the corresponding response with OCS_ABORTED.
>>> + * Therefore, upon receiving this response, it needs to be
>> requeued.
>>> + */
>>> +if (!err)
>>> +lrbp->abort_initiated_by_err = true;
>>> +
>>>    err = ufshcd_clear_cmd(hba, tag);
>>>    if (err)
>>>    dev_err(hba->dev, "%s: Failed clearing cmd at tag %d, err %d\n",
>>
>> The above change is misplaced. ufshcd_try_to_abort_task() can be
>> called
>> when the SCSI core decides to abort a command while
>> abort_initiated_by_err must not be set in that case. Please move the
>> above code block into ufshcd_abort_one().
> 
> But move to ufshcd_abort_one may have race condition, beacause we
> need set this flag before ufshcd_clear_cmd host controller fill
> OCS_ABORTED to response. I will add check ufshcd_eh_in_progress.

Calling ufshcd_clear_cmd() does not affect the OCS status as far as I
know. Did I perhaps overlook something?

Thanks,

Bart.
Peter Wang (王信友) Sept. 12, 2024, 1:31 p.m. UTC | #4
On Wed, 2024-09-11 at 12:11 -0700, Bart Van Assche wrote:
>  	 
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
>  On 9/10/24 11:03 PM, Peter Wang (王信友) wrote:
> > This statement is not quite accurate becasue in UFSHIC2.1, SDB mode
> > specification already have OCS: ABORTED (0x6) define.
> > And it is used in below UTRLCLR description:
> > 'which means a Transfer Request was "aborted"'
> > Therefore, the host controller should follow the
> > specification and fill the OCS field with OCS: ABORTED.
> > If not so, at what point does your host controller use the
> > OCS: ABORTED status?
> 
> Hmm ... I have not been able to find any explanation in the UFSHCI
> 2.1
> specification that says when the OCS status is set to aborted. Did I
> perhaps overlook something?
> 
> This is what I found in the UTRLCLR description: "The host software 
> shall use this field only when a UTP Transfer Request is expected to
> not be completed, e.g., when the host software receives a “FUNCTION
> COMPLETE” Task Management response which means a Transfer Request was
> aborted." This does not mean that the host controller is expected to
> set the OCS status to "ABORTED". I will send an email to the JC-64
> mailing list to request clarification.
> 

Hi Bart,


Yes, you're right, just as I mentioned earlier, I also think
the spec does not explicitly state that UTRLC should have 
corresponding behavior for OCS. This might lead to inconsistencies
in how host controllers operate. 


> >>> +/*
> >>> + * When the host software receives a "FUNCTION COMPLETE", set
> flag
> >>> + * to requeue command after receive response with OCS_ABORTED
> >>> + * SDB mode: UTRLCLR Task Management response which means a
> >> Transfer
> >>> + *           Request was aborted.
> >>> + * MCQ mode: Host will post to CQ with OCS_ABORTED after SQ
> >> cleanup
> >>> + * This flag is set because ufshcd_abort_all forcibly aborts all
> >>> + * commands, and the host will automatically fill in the OCS
> field
> >>> + * of the corresponding response with OCS_ABORTED.
> >>> + * Therefore, upon receiving this response, it needs to be
> >> requeued.
> >>> + */
> >>> +if (!err)
> >>> +lrbp->abort_initiated_by_err = true;
> >>> +
> >>>    err = ufshcd_clear_cmd(hba, tag);
> >>>    if (err)
> >>>    dev_err(hba->dev, "%s: Failed clearing cmd at tag %d, err
> %d\n",
> >>
> >> The above change is misplaced. ufshcd_try_to_abort_task() can be
> >> called
> >> when the SCSI core decides to abort a command while
> >> abort_initiated_by_err must not be set in that case. Please move
> the
> >> above code block into ufshcd_abort_one().
> > 
> > But move to ufshcd_abort_one may have race condition, beacause we
> > need set this flag before ufshcd_clear_cmd host controller fill
> > OCS_ABORTED to response. I will add check ufshcd_eh_in_progress.
> 
> Calling ufshcd_clear_cmd() does not affect the OCS status as far as I
> know. Did I perhaps overlook something?
> 

Because after ufshcd_clear_cmd,

in MCQ mode: 
ufshcd_mcq_sq_cleanup, host controller will post CQ response with OCS
ABORTED.

in SDB mode: 
ufshcd_utrl_clear set UTRLC, Mediatek host controller 
(may not all host controller) will post response with OCS ABORTED.

In both cases, we have an interrupt sent to the host, and there 
may be a race condition before we set this flag for requeue.
So I need to set this flag before ufshcd_clear_cmd.


Thanks.
Peter


> Thanks,
> 
> Bart.
Bart Van Assche Sept. 12, 2024, 9:17 p.m. UTC | #5
On 9/12/24 6:31 AM, Peter Wang (王信友) wrote:
> in SDB mode:
> ufshcd_utrl_clear set UTRLC, Mediatek host controller
> (may not all host controller) will post response with OCS ABORTED.
> 
> In both cases, we have an interrupt sent to the host, and there
> may be a race condition before we set this flag for requeue.
> So I need to set this flag before ufshcd_clear_cmd.

If a completion interrupt is sent to the host if a command has been
cleared in SDB mode (I doubt this is what happens), I think that's a
severe controller bug. A UFSHCI controller is not allowed to send a
completion interrupt to the host if a command is cleared by writing into
the UTRLCLR register.

Thanks,

Bart.
Peter Wang (王信友) Sept. 13, 2024, 7:10 a.m. UTC | #6
On Thu, 2024-09-12 at 14:17 -0700, Bart Van Assche wrote:
>  	 
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
>  On 9/12/24 6:31 AM, Peter Wang (王信友) wrote:
> > in SDB mode:
> > ufshcd_utrl_clear set UTRLC, Mediatek host controller
> > (may not all host controller) will post response with OCS ABORTED.
> > 
> > In both cases, we have an interrupt sent to the host, and there
> > may be a race condition before we set this flag for requeue.
> > So I need to set this flag before ufshcd_clear_cmd.
> 
> If a completion interrupt is sent to the host if a command has been
> cleared in SDB mode (I doubt this is what happens), I think that's a
> severe controller bug. A UFSHCI controller is not allowed to send a
> completion interrupt to the host if a command is cleared by writing
> into
> the UTRLCLR register.
> 
> Thanks,
> 
> Bart.

Hi Bart,

Because the MediaTek UFS controller uses UTRLCLR to clear 
commands and fills OCS with ABORTED. 

Regarding the specification of UTRCS:
This bit is set to '1' by the host controller upon one of the
following:
	Overall command Status (OCS) of the completed command is not 
	equal to 'SUCCESS' even if its UTRD Interrupt bit set to '0'

So, MediaTek host controller will send interrupt in this case.

Thanks.
Peter
Bart Van Assche Sept. 13, 2024, 5:41 p.m. UTC | #7
On 9/13/24 12:10 AM, Peter Wang (王信友) wrote:
> Because the MediaTek UFS controller uses UTRLCLR to clear
> commands and fills OCS with ABORTED.
> 
> Regarding the specification of UTRCS:
> This bit is set to '1' by the host controller upon one of the
> following:
> 	Overall command Status (OCS) of the completed command is not
> 	equal to 'SUCCESS' even if its UTRD Interrupt bit set to '0'
> 
> So, MediaTek host controller will send interrupt in this case.

Hi Peter,

Thank you for having shared this information. Please consider
introducing a quirk for ignoring completions triggered by clearing
a command, e.g. as follows (there may be better approaches):
* In ufshcd_clear_cmd(), before a command is cleared, initialize
   the completion that will be used for waiting for the completion
   interrupt. After a command has been cleared, call
   wait_for_completion_timeout().
* In ufshcd_compl_one_cqe(), check whether the completion is the
   result of a command being cleared. If so, call complete() instead
   of executing the regular completion code.

Thanks,

Bart.
Bart Van Assche Sept. 14, 2024, 4:13 p.m. UTC | #8
On 9/10/24 11:03 PM, Peter Wang (王信友) wrote:
> This statement is not quite accurate becasue in UFSHIC2.1, SDB mode
> specification already have OCS: ABORTED (0x6) define.

I think I found why that status code is defined in the UFSHCI 2.1
standard. From the UFS 2.0 standard: "TASK ABORTED - This status shall
be returned when a command is aborted by a command or task management
function on another I_T nexus and the Control mode page TAS bit is set
to one. Since in UFS there is only one I_T nexus and TAS bit is zero
TASK ABORTED status codes will never occur."

Bart.
Peter Wang (王信友) Sept. 18, 2024, 1:29 p.m. UTC | #9
On Fri, 2024-09-13 at 10:41 -0700, Bart Van Assche wrote:
>  	 
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
>  On 9/13/24 12:10 AM, Peter Wang (王信友) wrote:
> > Because the MediaTek UFS controller uses UTRLCLR to clear
> > commands and fills OCS with ABORTED.
> > 
> > Regarding the specification of UTRCS:
> > This bit is set to '1' by the host controller upon one of the
> > following:
> > Overall command Status (OCS) of the completed command is not
> > equal to 'SUCCESS' even if its UTRD Interrupt bit set to '0'
> > 
> > So, MediaTek host controller will send interrupt in this case.
> 
> Hi Peter,
> 
> Thank you for having shared this information. Please consider
> introducing a quirk for ignoring completions triggered by clearing
> a command, e.g. as follows (there may be better approaches):
> * In ufshcd_clear_cmd(), before a command is cleared, initialize
>    the completion that will be used for waiting for the completion
>    interrupt. After a command has been cleared, call
>    wait_for_completion_timeout().
> * In ufshcd_compl_one_cqe(), check whether the completion is the
>    result of a command being cleared. If so, call complete() instead
>    of executing the regular completion code.
> 
> Thanks,
> 
> Bart.


Hi Bart,

Sorry for the lack of response over the past few days due to the 
Mid-Autumn Festival in Taiwan. I think this would make the abort 
flow more complicated and difficult to understand.

Basically, this patch currently only needs to handle requeueing 
for the error handler abort.
The approach for DBR mode and MCQ mode should be consistent.
If receive an interrupt response (OCS:ABORTED or INVALID_OCS_VALUE), 
then set DID_REQUEUE. If there is no interrupt, it will also set 
SCSI DID_REQUEUE in ufshcd_err_handler through
ufshcd_complete_requests 
with force_compl = true.

As for the SCSI abort (30s) situation, in DBR mode, if there is an 
interrupt with OCS:ABORTED or INVALID_OCS_VALUE, then set DID_REQUEUE; 
if not, ufshcd_abort will also return SUCCESS, allowing SCSI to decide 
whether to retry the command. This part is less problematic, 
and we can keep the original approach.

The more problematic part is with MCQ mode. To imitate the DBR 
approach, we just need to set DID_REQUEUE upon receiving an interrupt. 
Everything else remains the same. This would make things simpler. 

Moving forward, if we want to simplify things and we have also 
taken stock of the two or three scenarios where OCS: ABORTED occurs, 
do we even need a flag? Couldn't we just set DID_REQUEUE directly 
for OCS: ABORTED?
What do you think?

Thanks.
Peter
Peter Wang (王信友) Sept. 18, 2024, 1:30 p.m. UTC | #10
On Sat, 2024-09-14 at 09:13 -0700, Bart Van Assche wrote:
>  	 
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
>  On 9/10/24 11:03 PM, Peter Wang (王信友) wrote:
> > This statement is not quite accurate becasue in UFSHIC2.1, SDB mode
> > specification already have OCS: ABORTED (0x6) define.
> 
> I think I found why that status code is defined in the UFSHCI 2.1
> standard. From the UFS 2.0 standard: "TASK ABORTED - This status
> shall
> be returned when a command is aborted by a command or task management
> function on another I_T nexus and the Control mode page TAS bit is
> set
> to one. Since in UFS there is only one I_T nexus and TAS bit is zero
> TASK ABORTED status codes will never occur."
> 
> Bart.
> 


Hi Bart,

Okay, I think this is a very good reason that explains 
why UFSHCI2.1 has defined OCS: ABORTED. Thank you for sharing 
this information.

Thanks.
Peter
Bart Van Assche Sept. 18, 2024, 6:29 p.m. UTC | #11
On 9/18/24 6:29 AM, Peter Wang (王信友) wrote:
> Basically, this patch currently only needs to handle requeueing
> for the error handler abort.
> The approach for DBR mode and MCQ mode should be consistent.
> If receive an interrupt response (OCS:ABORTED or INVALID_OCS_VALUE),
> then set DID_REQUEUE. If there is no interrupt, it will also set
> SCSI DID_REQUEUE in ufshcd_err_handler through
> ufshcd_complete_requests
> with force_compl = true.

Reporting a completion for commands cleared by writing into the legacy
UTRLCLR register is not compliant with any version of the UFSHCI
standard. Reporting a completion for commands cleared by writing into
that register is problematic because it causes ufshcd_release_scsi_cmd()
to be called as follows:

ufshcd_sl_intr()
   ufshcd_transfer_req_compl()
     ufshcd_poll()
       __ufshcd_transfer_req_compl()
         ufshcd_compl_one_cqe()
           cmd->result = ...
           ufshcd_release_scsi_cmd()
           scsi_done()

Calling ufshcd_release_scsi_cmd() if a command has been cleared is
problematic because the SCSI core does not expect this. If 
ufshcd_try_to_abort_task() clears a SCSI command, 
ufshcd_release_scsi_cmd() must not be called until the SCSI core
decides to release the command. This is why I wrote in a previous mail
that I think that a quirk should be introduced to suppress the
completions generated by clearing a SCSI command.

> The more problematic part is with MCQ mode. To imitate the DBR
> approach, we just need to set DID_REQUEUE upon receiving an interrupt.
> Everything else remains the same. This would make things simpler.
> 
> Moving forward, if we want to simplify things and we have also
> taken stock of the two or three scenarios where OCS: ABORTED occurs,
> do we even need a flag? Couldn't we just set DID_REQUEUE directly
> for OCS: ABORTED?
> What do you think?

How about making ufshcd_compl_one_cqe() skip entries with status
OCS_ABORTED? That would make ufshcd_compl_one_cqe() behave as the
SCSI core expects, namely not freeing any command resources if a
SCSI command is aborted successfully.

This approach may require further changes to ufshcd_abort_all().
In that function there are separate code paths for legacy and MCQ
mode. This is less than ideal. Would it be possible to combine
these code paths by removing the ufshcd_complete_requests() call
from ufshcd_abort_all() and by handling completions from inside
ufshcd_abort_one()?

Thanks,

Bart.
diff mbox series

Patch

diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index a6f818cdef0e..615da47c1727 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -3006,6 +3006,7 @@  static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd)
 	ufshcd_prepare_lrbp_crypto(scsi_cmd_to_rq(cmd), lrbp);
 
 	lrbp->req_abort_skip = false;
+	lrbp->abort_initiated_by_err = false;
 
 	ufshcd_comp_scsi_upiu(hba, lrbp);
 
@@ -5404,7 +5405,10 @@  ufshcd_transfer_rsp_status(struct ufs_hba *hba, struct ufshcd_lrb *lrbp,
 		}
 		break;
 	case OCS_ABORTED:
-		result |= DID_ABORT << 16;
+		if (lrbp->abort_initiated_by_err)
+			result |= DID_REQUEUE << 16;
+		else
+			result |= DID_ABORT << 16;
 		break;
 	case OCS_INVALID_COMMAND_STATUS:
 		result |= DID_REQUEUE << 16;
@@ -6471,26 +6475,12 @@  static bool ufshcd_abort_one(struct request *rq, void *priv)
 	struct scsi_device *sdev = cmd->device;
 	struct Scsi_Host *shost = sdev->host;
 	struct ufs_hba *hba = shost_priv(shost);
-	struct ufshcd_lrb *lrbp = &hba->lrb[tag];
-	struct ufs_hw_queue *hwq;
-	unsigned long flags;
 
 	*ret = ufshcd_try_to_abort_task(hba, tag);
 	dev_err(hba->dev, "Aborting tag %d / CDB %#02x %s\n", tag,
 		hba->lrb[tag].cmd ? hba->lrb[tag].cmd->cmnd[0] : -1,
 		*ret ? "failed" : "succeeded");
 
-	/* Release cmd in MCQ mode if abort succeeds */
-	if (hba->mcq_enabled && (*ret == 0)) {
-		hwq = ufshcd_mcq_req_to_hwq(hba, scsi_cmd_to_rq(lrbp->cmd));
-		if (!hwq)
-			return 0;
-		spin_lock_irqsave(&hwq->cq_lock, flags);
-		if (ufshcd_cmd_inflight(lrbp->cmd))
-			ufshcd_release_scsi_cmd(hba, lrbp);
-		spin_unlock_irqrestore(&hwq->cq_lock, flags);
-	}
-
 	return *ret == 0;
 }
 
@@ -7561,6 +7551,20 @@  int ufshcd_try_to_abort_task(struct ufs_hba *hba, int tag)
 		goto out;
 	}
 
+	/*
+	 * When the host software receives a "FUNCTION COMPLETE", set flag
+	 * to requeue command after receive response with OCS_ABORTED
+	 * SDB mode: UTRLCLR Task Management response which means a Transfer
+	 *           Request was aborted.
+	 * MCQ mode: Host will post to CQ with OCS_ABORTED after SQ cleanup
+	 * This flag is set because ufshcd_abort_all forcibly aborts all
+	 * commands, and the host will automatically fill in the OCS field
+	 * of the corresponding response with OCS_ABORTED.
+	 * Therefore, upon receiving this response, it needs to be requeued.
+	 */
+	if (!err)
+		lrbp->abort_initiated_by_err = true;
+
 	err = ufshcd_clear_cmd(hba, tag);
 	if (err)
 		dev_err(hba->dev, "%s: Failed clearing cmd at tag %d, err %d\n",
diff --git a/include/ufs/ufshcd.h b/include/ufs/ufshcd.h
index 0fd2aebac728..15b357672ca5 100644
--- a/include/ufs/ufshcd.h
+++ b/include/ufs/ufshcd.h
@@ -173,6 +173,8 @@  struct ufs_pm_lvl_states {
  * @crypto_key_slot: the key slot to use for inline crypto (-1 if none)
  * @data_unit_num: the data unit number for the first block for inline crypto
  * @req_abort_skip: skip request abort task flag
+ * @abort_initiated_by_err: The flag is specifically used to handle aborts
+ *                          caused by errors due to host/device communication
  */
 struct ufshcd_lrb {
 	struct utp_transfer_req_desc *utr_descriptor_ptr;
@@ -202,6 +204,7 @@  struct ufshcd_lrb {
 #endif
 
 	bool req_abort_skip;
+	bool abort_initiated_by_err;
 };
 
 /**