diff mbox series

[v8,2/3] ufs: core: fix error handler process for MCQ abort

Message ID 20240923080344.19084-3-peter.wang@mediatek.com (mailing list archive)
State New
Headers show
Series fix abort defect | expand

Commit Message

Peter Wang (王信友) Sept. 23, 2024, 8:03 a.m. UTC
From: Peter Wang <peter.wang@mediatek.com>

When the error handler successfully aborts a MCQ request,
it only releases the command and does not notify the SCSI layer.
This may cause another abort after 30 seconds timeout.
This patch notifies the SCSI layer to requeue the request.

Additionally, ignore the OCS: ABORTED CQ slot after MCQ mode
SQ cleanup. This makes the behavior of MCQ mode consistent with
that of legacy SDB mode.

Also, print logs for OCS: ABORTED and OCS_INVALID_COMMAND_STATUS
for debugging purposes.

Signed-off-by: Peter Wang <peter.wang@mediatek.com>
---
 drivers/ufs/core/ufshcd.c | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

Comments

Bart Van Assche Sept. 23, 2024, 6:19 p.m. UTC | #1
On 9/23/24 1:03 AM, peter.wang@mediatek.com wrote:
> When the error handler successfully aborts a MCQ request,
> it only releases the command and does not notify the SCSI layer.
> This may cause another abort after 30 seconds timeout.
> This patch notifies the SCSI layer to requeue the request.
> 
> Additionally, ignore the OCS: ABORTED CQ slot after MCQ mode
> SQ cleanup. This makes the behavior of MCQ mode consistent with
> that of legacy SDB mode.
> 
> Also, print logs for OCS: ABORTED and OCS_INVALID_COMMAND_STATUS
> for debugging purposes.

Although I like the approach of this patch, two comments below.

> diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
> index a6f818cdef0e..b5c7bc50a27e 100644
> --- a/drivers/ufs/core/ufshcd.c
> +++ b/drivers/ufs/core/ufshcd.c
> @@ -5405,9 +5405,15 @@ ufshcd_transfer_rsp_status(struct ufs_hba *hba, struct ufshcd_lrb *lrbp,
>   		break;
>   	case OCS_ABORTED:
>   		result |= DID_ABORT << 16;
> +		dev_warn(hba->dev,
> +				"OCS aborted from controller = %x for tag %d\n",
> +				ocs, lrbp->task_tag);
>   		break;

Including the OCS status in this message seems redundant to me.

>   	case OCS_INVALID_COMMAND_STATUS:
>   		result |= DID_REQUEUE << 16;
> +		dev_warn(hba->dev,
> +				"OCS invaild from controller = %x for tag %d\n",
> +				ocs, lrbp->task_tag);

Also here, including the OCS status in this message seems redundant to me.

Please change "invaild" into "invalid".

> @@ -5526,6 +5532,18 @@ void ufshcd_compl_one_cqe(struct ufs_hba *hba, int task_tag,
>   			ufshcd_update_monitor(hba, lrbp);
>   		ufshcd_add_command_trace(hba, task_tag, UFS_CMD_COMP);
>   		cmd->result = ufshcd_transfer_rsp_status(hba, lrbp, cqe);
> +
> +		/*
> +		 * Ignore MCQ OCS: ABORTED posted by the host controller.
> +		 * This makes the behavior of MCQ mode consistent with that
> +		 * of legacy SDB mode.
> +		 */
> +		if (hba->mcq_enabled) {
> +			ocs = ufshcd_get_tr_ocs(lrbp, cqe);
> +			if (ocs == OCS_ABORTED)
> +				return;
> +		}

Why only ignore the OCS_ABORTED status in MCQ mode? Is my understanding
correct that MediaTek controllers can also report this status in legacy 
mode?

Thanks,

Bart.
Peter Wang (王信友) Sept. 24, 2024, 8:48 a.m. UTC | #2
On Mon, 2024-09-23 at 11:19 -0700, Bart Van Assche wrote:
>  	 
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
>  On 9/23/24 1:03 AM, peter.wang@mediatek.com wrote:
> > When the error handler successfully aborts a MCQ request,
> > it only releases the command and does not notify the SCSI layer.
> > This may cause another abort after 30 seconds timeout.
> > This patch notifies the SCSI layer to requeue the request.
> > 
> > Additionally, ignore the OCS: ABORTED CQ slot after MCQ mode
> > SQ cleanup. This makes the behavior of MCQ mode consistent with
> > that of legacy SDB mode.
> > 
> > Also, print logs for OCS: ABORTED and OCS_INVALID_COMMAND_STATUS
> > for debugging purposes.
> 
> Although I like the approach of this patch, two comments below.
> 
> > diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
> > index a6f818cdef0e..b5c7bc50a27e 100644
> > --- a/drivers/ufs/core/ufshcd.c
> > +++ b/drivers/ufs/core/ufshcd.c
> > @@ -5405,9 +5405,15 @@ ufshcd_transfer_rsp_status(struct ufs_hba
> *hba, struct ufshcd_lrb *lrbp,
> >   break;
> >   case OCS_ABORTED:
> >   result |= DID_ABORT << 16;
> > +dev_warn(hba->dev,
> > +"OCS aborted from controller = %x for tag %d\n",
> > +ocs, lrbp->task_tag);
> >   break;
> 
> Including the OCS status in this message seems redundant to me.
> 
> >   case OCS_INVALID_COMMAND_STATUS:
> >   result |= DID_REQUEUE << 16;
> > +dev_warn(hba->dev,
> > +"OCS invaild from controller = %x for tag %d\n",
> > +ocs, lrbp->task_tag);
> 
> Also here, including the OCS status in this message seems redundant
> to me.
> 
> Please change "invaild" into "invalid".

Hi Bart,

Okay, I will revise the content printed here.


> 
> > @@ -5526,6 +5532,18 @@ void ufshcd_compl_one_cqe(struct ufs_hba
> *hba, int task_tag,
> >   ufshcd_update_monitor(hba, lrbp);
> >   ufshcd_add_command_trace(hba, task_tag, UFS_CMD_COMP);
> >   cmd->result = ufshcd_transfer_rsp_status(hba, lrbp, cqe);
> > +
> > +/*
> > + * Ignore MCQ OCS: ABORTED posted by the host controller.
> > + * This makes the behavior of MCQ mode consistent with that
> > + * of legacy SDB mode.
> > + */
> > +if (hba->mcq_enabled) {
> > +ocs = ufshcd_get_tr_ocs(lrbp, cqe);
> > +if (ocs == OCS_ABORTED)
> > +return;
> > +}
> 
> Why only ignore the OCS_ABORTED status in MCQ mode? Is my
> understanding
> correct that MediaTek controllers can also report this status in
> legacy 
> mode?
> 
> Thanks,

Because according to the specification, only MCQ will have 
OCS: ABORTED. MediaTek, even with a quirk handling SDB OCS ABORTED,
cannot simply ignore it here and not deal with it. 
Otherwise, it would need to release directly like MCQ ufshcd_abort_one.

Thanks
Peter


> 
> Bart.
diff mbox series

Patch

diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index a6f818cdef0e..b5c7bc50a27e 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -5405,9 +5405,15 @@  ufshcd_transfer_rsp_status(struct ufs_hba *hba, struct ufshcd_lrb *lrbp,
 		break;
 	case OCS_ABORTED:
 		result |= DID_ABORT << 16;
+		dev_warn(hba->dev,
+				"OCS aborted from controller = %x for tag %d\n",
+				ocs, lrbp->task_tag);
 		break;
 	case OCS_INVALID_COMMAND_STATUS:
 		result |= DID_REQUEUE << 16;
+		dev_warn(hba->dev,
+				"OCS invaild from controller = %x for tag %d\n",
+				ocs, lrbp->task_tag);
 		break;
 	case OCS_INVALID_CMD_TABLE_ATTR:
 	case OCS_INVALID_PRDT_ATTR:
@@ -5526,6 +5532,18 @@  void ufshcd_compl_one_cqe(struct ufs_hba *hba, int task_tag,
 			ufshcd_update_monitor(hba, lrbp);
 		ufshcd_add_command_trace(hba, task_tag, UFS_CMD_COMP);
 		cmd->result = ufshcd_transfer_rsp_status(hba, lrbp, cqe);
+
+		/*
+		 * Ignore MCQ OCS: ABORTED posted by the host controller.
+		 * This makes the behavior of MCQ mode consistent with that
+		 * of legacy SDB mode.
+		 */
+		if (hba->mcq_enabled) {
+			ocs = ufshcd_get_tr_ocs(lrbp, cqe);
+			if (ocs == OCS_ABORTED)
+				return;
+		}
+
 		ufshcd_release_scsi_cmd(hba, lrbp);
 		/* Do not touch lrbp after scsi done */
 		scsi_done(cmd);
@@ -6486,8 +6504,11 @@  static bool ufshcd_abort_one(struct request *rq, void *priv)
 		if (!hwq)
 			return 0;
 		spin_lock_irqsave(&hwq->cq_lock, flags);
-		if (ufshcd_cmd_inflight(lrbp->cmd))
+		if (ufshcd_cmd_inflight(lrbp->cmd)) {
+			set_host_byte(lrbp->cmd, DID_REQUEUE);
 			ufshcd_release_scsi_cmd(hba, lrbp);
+			scsi_done(lrbp->cmd);
+		}
 		spin_unlock_irqrestore(&hwq->cq_lock, flags);
 	}