
[v1,6/6] scsi: ufs: Update the fast abort path in ufshcd_abort() for PM requests

Message ID 1620885319-15151-8-git-send-email-cang@codeaurora.org (mailing list archive)
State Superseded
Series Complementary changes for error handling

Commit Message

Can Guo May 13, 2021, 5:55 a.m. UTC
If PM requests fail during runtime suspend/resume, the RPM framework saves
the error to dev->power.runtime_error. Until runtime_error is cleared,
runtime PM on this device does not work again, leaving the device
permanently in either the suspended or the active state.

When a task abort hits a PM request sent during runtime suspend/resume, the
RPM framework still saves the (TIMEOUT) error even if the request is aborted
successfully. We can do better: let error handling recover and clear the
runtime_error. So, let PM requests take the fast abort path in
ufshcd_abort().

Signed-off-by: Can Guo <cang@codeaurora.org>
---
 drivers/scsi/ufs/ufshcd.c | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)
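
As a reading aid for the runtime_error behaviour described in the commit
message, here is a minimal user-space sketch. It is not kernel code: struct
model_dev, model_rpm_transition() and model_clear_error() are hypothetical
names invented for illustration, and returning -EINVAL while an error is
latched is an assumption of this model. It only shows why one failed PM
request blocks every later transition until something clears the error.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of a device; NOT the kernel's struct device. */
struct model_dev {
	int runtime_error;	/* stands in for dev->power.runtime_error */
	int suspended;		/* 1 = suspended, 0 = active */
};

/*
 * Model one runtime PM transition. cb_result is the result of the
 * suspend/resume callback (0 on success, negative errno on failure).
 * While an error is latched, no transition is attempted at all.
 */
static int model_rpm_transition(struct model_dev *d, int to_suspended,
				int cb_result)
{
	assert(d != NULL);

	if (d->runtime_error)
		return -EINVAL;	/* assumption: latched error rejects the call */

	if (cb_result) {
		d->runtime_error = cb_result;	/* latch the failure */
		return cb_result;
	}

	d->suspended = to_suspended;
	return 0;
}

/* Stands in for whatever recovery step clears the latched error. */
static void model_clear_error(struct model_dev *d)
{
	assert(d != NULL);
	d->runtime_error = 0;
}
```

In this model, a suspend attempt whose callback fails with -ETIMEDOUT leaves
the device active and latches the error; a second attempt is rejected even
though its callback would have succeeded, and only after model_clear_error()
does a transition go through again.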

Comments

Bart Van Assche May 14, 2021, 4:05 a.m. UTC | #1
On 5/12/21 10:55 PM, Can Guo wrote:
> If PM requests fail during runtime suspend/resume, RPM framework saves the
> error to dev->power.runtime_error. Before the runtime_error gets cleared,
> runtime PM on this specific device won't work again, leaving the device
> in either suspended or active state permanently.
> 
> When task abort happens to a PM request sent during runtime suspend/resume,
> even if it can be successfully aborted, RPM framework anyways saves the
> (TIMEOUT) error. But we want more and we can do better - let error handling
> recover and clear the runtime_error. So, let PM requests take the fast
> abort path in ufshcd_abort().

The only RQF_PM requests I know of are START STOP UNIT and SYNCHRONIZE
CACHE. Are there devices for which these commands can time out or do
these commands perhaps only time out as the result of error injection?

> -	if (lrbp->lun == UFS_UPIU_UFS_DEVICE_WLUN) {
> +	if (lrbp->lun == UFS_UPIU_UFS_DEVICE_WLUN ||
> +	    (cmd->request->rq_flags & RQF_PM)) {

Which are the RQF_PM commands that are not sent to a WLUN? Are these
START STOP UNIT and SYNCHRONIZE CACHE only?

Thanks,

Bart.
Can Guo May 14, 2021, 4:17 a.m. UTC | #2
On 2021-05-14 12:05, Bart Van Assche wrote:
> On 5/12/21 10:55 PM, Can Guo wrote:
>> If PM requests fail during runtime suspend/resume, RPM framework saves the
>> error to dev->power.runtime_error. Before the runtime_error gets cleared,
>> runtime PM on this specific device won't work again, leaving the device
>> in either suspended or active state permanently.
>> 
>> When task abort happens to a PM request sent during runtime suspend/resume,
>> even if it can be successfully aborted, RPM framework anyways saves the
>> (TIMEOUT) error. But we want more and we can do better - let error handling
>> recover and clear the runtime_error. So, let PM requests take the fast
>> abort path in ufshcd_abort().
> 
> The only RQF_PM requests I know of are START STOP UNIT and SYNCHRONIZE
> CACHE. Are there devices for which these commands can time out or do
> these commands perhaps only time out as the result of error injection?

There are also REQUEST SENSE requests sent with the RQF_PM flag set from
PM ops. And they do time out (the device does not respond within 60s) in
real cases; I have seen quite a lot of related issues reported by
customers over the years.

> 
>> -	if (lrbp->lun == UFS_UPIU_UFS_DEVICE_WLUN) {
>> +	if (lrbp->lun == UFS_UPIU_UFS_DEVICE_WLUN ||
>> +	    (cmd->request->rq_flags & RQF_PM)) {
> 
> Which are the RQF_PM commands that are not sent to a WLUN? Are these
> START STOP UNIT and SYNCHRONIZE CACHE only?
> 

There are also REQUEST SENSE cmds sent to the RPMB W-LU, in
ufshcd_add_wlus(), ufshcd_err_handler() and ufshcd_rpmb_resume() and/or
ufshcd_wl_resume().

And SYNCHRONIZE CACHE cmd is only sent to general LUs, but not W-LUs.

Thanks,
Can Guo.
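
To make the effect of the widened condition in ufshcd_abort() concrete, the
following user-space sketch compares the old and new predicates for the
command types discussed above. It is an illustration only: struct model_cmd
and the MODEL_* constants are hypothetical stand-ins, and the bit chosen for
MODEL_RQF_PM is arbitrary rather than the kernel's value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values only; real definitions live in kernel headers. */
#define MODEL_RQF_PM		(1u << 0)	/* arbitrary bit for this model */
#define MODEL_DEVICE_WLUN	0xD0u		/* stand-in for the device W-LUN */
#define MODEL_RPMB_WLUN		0xC4u		/* stand-in for the RPMB W-LUN */

/* Hypothetical, reduced view of a command: target LUN and request flags. */
struct model_cmd {
	uint8_t lun;
	uint32_t rq_flags;
};

/* Condition before the patch: only the device W-LUN takes the fast path. */
static bool fast_abort_before(const struct model_cmd *c)
{
	return c->lun == MODEL_DEVICE_WLUN;
}

/* Condition after the patch: any request flagged as PM also qualifies. */
static bool fast_abort_after(const struct model_cmd *c)
{
	return c->lun == MODEL_DEVICE_WLUN ||
	       (c->rq_flags & MODEL_RQF_PM) != 0;
}
```

Under these assumptions, a PM-flagged REQUEST SENSE to the RPMB W-LU or a
PM-flagged SYNCHRONIZE CACHE to a general LU takes the fast path only with
the new condition, while ordinary I/O to a general LU is unaffected either
way.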

> Thanks,
> 
> Bart.

Patch

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index a6313cf40..2a814e2 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -2643,7 +2643,7 @@  static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd)
 
 	lrbp = &hba->lrb[tag];
 	if (unlikely(lrbp->in_use)) {
-		if (hba->wl_pm_op_in_progress)
+		if (cmd->request->rq_flags & RQF_PM)
 			set_host_byte(cmd, DID_BAD_TARGET);
 		else
 			err = SCSI_MLQUEUE_HOST_BUSY;
@@ -2690,7 +2690,7 @@  static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd)
 		 * err handler blocked for too long. So, just fail the scsi cmd
 		 * sent from PM ops, err handler can recover PM error anyways.
 		 */
-		if (hba->wl_pm_op_in_progress) {
+		if (cmd->request->rq_flags & RQF_PM) {
 			hba->force_reset = true;
 			set_host_byte(cmd, DID_BAD_TARGET);
 			goto out_compl_cmd;
@@ -6959,14 +6959,17 @@  static int ufshcd_abort(struct scsi_cmnd *cmd)
 	}
 
 	/*
-	 * Task abort to the device W-LUN is illegal. When this command
-	 * will fail, due to spec violation, scsi err handling next step
-	 * will be to send LU reset which, again, is a spec violation.
-	 * To avoid these unnecessary/illegal steps, first we clean up
-	 * the lrb taken by this cmd and mark the lrb as in_use, then
-	 * queue the eh_work and bail.
+	 * This fast path guarantees the cmd always gets aborted successfully,
+	 * meanwhile it invokes the error handler. It allows contexts, which
+	 * are blocked by this cmd, to fail fast. It serves multiple purposes:
+	 * #1 To avoid unnecessary/illegal abort attempts to the W-LU.
+	 * #2 To avoid live lock between eh_work and specific contexts, i.e.,
+	 *    suspend/resume and eh_work itself.
+	 * #3 To let eh_work recover runtime PM error in case abort happens
+	 *    to cmds sent from runtime suspend/resume ops.
 	 */
-	if (lrbp->lun == UFS_UPIU_UFS_DEVICE_WLUN) {
+	if (lrbp->lun == UFS_UPIU_UFS_DEVICE_WLUN ||
+	    (cmd->request->rq_flags & RQF_PM)) {
 		ufshcd_update_evt_hist(hba, UFS_EVT_ABORT, lrbp->lun);
 		spin_lock_irqsave(host->host_lock, flags);
 		if (lrbp->cmd) {