Message ID | 20240925095546.19492-3-peter.wang@mediatek.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | fix abort defect | expand |
On 9/25/24 2:55 AM, peter.wang@mediatek.com wrote: > case OCS_INVALID_COMMAND_STATUS: > result |= DID_REQUEUE << 16; > + dev_warn(hba->dev, > + "OCS invaild from controller for tag %d\n", > + lrbp->task_tag); Please change "invaild" into "invalid". Once that change has been made, feel free to add: Reviewed-by: Bart Van Assche <bvanassche@acm.org>
On Wed, 2024-09-25 at 09:49 -0700, Bart Van Assche wrote: > > External email : Please do not click links or open attachments until > you have verified the sender or the content. > On 9/25/24 2:55 AM, peter.wang@mediatek.com wrote: > > case OCS_INVALID_COMMAND_STATUS: > > result |= DID_REQUEUE << 16; > > +dev_warn(hba->dev, > > +"OCS invaild from controller for tag %d\n", > > +lrbp->task_tag); > > Please change "invaild" into "invalid". Once that change has been > made, > feel free to add: > > Reviewed-by: Bart Van Assche <bvanassche@acm.org> Hi Bart, Sorry, this typo wasn't corrected. However, I still feel that this patch is not quite reasonable. The reason is that in the first patch, "ufs: core: fix the issue of ICU failure" we corrected the ICU problem, allowing the SQ to clean up correctly, while the CQ will have a corresponding response. But the second patch directly ignores the CQ's response and continues to use a workaround to release the command right after ufshcd_try_to_abort_task. The Legacy SDB mode's approach would not release the command after ufshcd_try_to_abort_task. Instead, it releases after the DBR is cleared. Therefore, as I previously suggested, it would probably be more reasonable to directly requeue the ABORTED commands as shown in the patch below. --------------------------------------------------------------------- --- diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c index 24a32e2fd75e..06aa4ed1a9e6 100644 --- a/drivers/ufs/core/ufshcd.c +++ b/drivers/ufs/core/ufshcd.c @@ -5417,10 +5417,12 @@ ufshcd_transfer_rsp_status(struct ufs_hba *hba, struct ufshcd_lrb *lrbp, } break; case OCS_ABORTED: - result |= DID_ABORT << 16; - break; case OCS_INVALID_COMMAND_STATUS: result |= DID_REQUEUE << 16; + dev_warn(hba->dev, + "OCS %s from controller for tag %d\n", + (ocs == OCS_ABORTED? "aborted" : "invalid"), + lrbp->task_tag); break; case OCS_INVALID_CMD_TABLE_ATTR: case OCS_INVALID_PRDT_ATTR: @@ -6466,26 +6468,12 @@ static bool ufshcd_abort_one(struct request *rq, void *priv) struct scsi_device *sdev = cmd->device; struct Scsi_Host *shost = sdev->host; struct ufs_hba *hba = shost_priv(shost); - struct ufshcd_lrb *lrbp = &hba->lrb[tag]; - struct ufs_hw_queue *hwq; - unsigned long flags; *ret = ufshcd_try_to_abort_task(hba, tag); dev_err(hba->dev, "Aborting tag %d / CDB %#02x %s\n", tag, hba->lrb[tag].cmd ? hba->lrb[tag].cmd->cmnd[0] : -1, *ret ? "failed" : "succeeded"); - /* Release cmd in MCQ mode if abort succeeds */ - if (hba->mcq_enabled && (*ret == 0)) { - hwq = ufshcd_mcq_req_to_hwq(hba, scsi_cmd_to_rq(lrbp- >cmd)); - if (!hwq) - return 0; - spin_lock_irqsave(&hwq->cq_lock, flags); - if (ufshcd_cmd_inflight(lrbp->cmd)) - ufshcd_release_scsi_cmd(hba, lrbp); - spin_unlock_irqrestore(&hwq->cq_lock, flags); - } - return *ret == 0; } --------------------------------------------------------------------- --- This patch has several advantages: 1. It makes the patch 'ufs: core: fix the issue of ICU failure' seem valuable. 2. The patch is more concise. 3. There is no need to fetch OCS to determine OCS: ABORTED on every CQ completion, which increases ISR time. 4. The err_handler flow for SDB and MCQ would be consistent. 5. There is no need for the MediaTek SDB quirk. What do you think?"
On 9/25/24 8:45 PM, Peter Wang (王信友) wrote: > diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c > index 24a32e2fd75e..06aa4ed1a9e6 100644 > --- a/drivers/ufs/core/ufshcd.c > +++ b/drivers/ufs/core/ufshcd.c > @@ -5417,10 +5417,12 @@ ufshcd_transfer_rsp_status(struct ufs_hba *hba, > struct ufshcd_lrb *lrbp, > } > break; > case OCS_ABORTED: > - result |= DID_ABORT << 16; > - break; > case OCS_INVALID_COMMAND_STATUS: > result |= DID_REQUEUE << 16; > + dev_warn(hba->dev, > + "OCS %s from controller for tag %d\n", > + (ocs == OCS_ABORTED? "aborted" : > "invalid"), > + lrbp->task_tag); > break; > case OCS_INVALID_CMD_TABLE_ATTR: > case OCS_INVALID_PRDT_ATTR: > @@ -6466,26 +6468,12 @@ static bool ufshcd_abort_one(struct request > *rq, void *priv) > struct scsi_device *sdev = cmd->device; > struct Scsi_Host *shost = sdev->host; > struct ufs_hba *hba = shost_priv(shost); > - struct ufshcd_lrb *lrbp = &hba->lrb[tag]; > - struct ufs_hw_queue *hwq; > - unsigned long flags; > > *ret = ufshcd_try_to_abort_task(hba, tag); > dev_err(hba->dev, "Aborting tag %d / CDB %#02x %s\n", tag, > hba->lrb[tag].cmd ? hba->lrb[tag].cmd->cmnd[0] : -1, > *ret ? "failed" : "succeeded"); > > - /* Release cmd in MCQ mode if abort succeeds */ > - if (hba->mcq_enabled && (*ret == 0)) { > - hwq = ufshcd_mcq_req_to_hwq(hba, scsi_cmd_to_rq(lrbp- >> cmd)); > - if (!hwq) > - return 0; > - spin_lock_irqsave(&hwq->cq_lock, flags); > - if (ufshcd_cmd_inflight(lrbp->cmd)) > - ufshcd_release_scsi_cmd(hba, lrbp); > - spin_unlock_irqrestore(&hwq->cq_lock, flags); > - } > - > return *ret == 0; > } > > --------------------------------------------------------------------- > > > This patch has several advantages: > > 1. It makes the patch 'ufs: core: fix the issue of ICU failure' > seem valuable. > 2. The patch is more concise. > 3. There is no need to fetch OCS to determine OCS: ABORTED > on every CQ completion, which increases ISR time. > 4. The err_handler flow for SDB and MCQ would be consistent. > 5. There is no need for the MediaTek SDB quirk. > > > What do you think?" Hi Peter, Is the above patch sufficient? In MCQ mode, aborting a command happens as follows (simplified): (1) Send the ABORT TASK TMF. If this TMF succeeds, no SQE will be generated. If this TMF succeeds it means that the SCSI command has reached the UFS device and hence is no longer present in any submission queue (SQ). (2) If the command is still in a submission queue, nullify the SQE. In this case a CQE will be generated with status ABORTED. It seems to me that the above patch handles (2) but not (1)? Thanks, Bart.
On Thu, 2024-09-26 at 11:26 -0700, Bart Van Assche wrote: > > External email : Please do not click links or open attachments until > you have verified the sender or the content. > On 9/25/24 8:45 PM, Peter Wang (王信友) wrote: > > diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c > > index 24a32e2fd75e..06aa4ed1a9e6 100644 > > --- a/drivers/ufs/core/ufshcd.c > > +++ b/drivers/ufs/core/ufshcd.c > > @@ -5417,10 +5417,12 @@ ufshcd_transfer_rsp_status(struct ufs_hba > *hba, > > struct ufshcd_lrb *lrbp, > > } > > break; > > case OCS_ABORTED: > > -result |= DID_ABORT << 16; > > -break; > > case OCS_INVALID_COMMAND_STATUS: > > result |= DID_REQUEUE << 16; > > +dev_warn(hba->dev, > > +"OCS %s from controller for tag %d\n", > > +(ocs == OCS_ABORTED? "aborted" : > > "invalid"), > > +lrbp->task_tag); > > break; > > case OCS_INVALID_CMD_TABLE_ATTR: > > case OCS_INVALID_PRDT_ATTR: > > @@ -6466,26 +6468,12 @@ static bool ufshcd_abort_one(struct request > > *rq, void *priv) > > struct scsi_device *sdev = cmd->device; > > struct Scsi_Host *shost = sdev->host; > > struct ufs_hba *hba = shost_priv(shost); > > -struct ufshcd_lrb *lrbp = &hba->lrb[tag]; > > -struct ufs_hw_queue *hwq; > > -unsigned long flags; > > > > *ret = ufshcd_try_to_abort_task(hba, tag); > > dev_err(hba->dev, "Aborting tag %d / CDB %#02x %s\n", tag, > > hba->lrb[tag].cmd ? hba->lrb[tag].cmd->cmnd[0] : -1, > > *ret ? "failed" : "succeeded"); > > > > -/* Release cmd in MCQ mode if abort succeeds */ > > -if (hba->mcq_enabled && (*ret == 0)) { > > -hwq = ufshcd_mcq_req_to_hwq(hba, scsi_cmd_to_rq(lrbp- > >> cmd)); > > -if (!hwq) > > -return 0; > > -spin_lock_irqsave(&hwq->cq_lock, flags); > > -if (ufshcd_cmd_inflight(lrbp->cmd)) > > -ufshcd_release_scsi_cmd(hba, lrbp); > > -spin_unlock_irqrestore(&hwq->cq_lock, flags); > > -} > > - > > return *ret == 0; > > } > > > > ----------------------------------------------------------------- > ---- > > > > > > This patch has several advantages: > > > > 1. It makes the patch 'ufs: core: fix the issue of ICU failure' > > seem valuable. > > 2. The patch is more concise. > > 3. There is no need to fetch OCS to determine OCS: ABORTED > > on every CQ completion, which increases ISR time. > > 4. The err_handler flow for SDB and MCQ would be consistent. > > 5. There is no need for the MediaTek SDB quirk. > > > > > > What do you think?" > > Hi Peter, > > Is the above patch sufficient? In MCQ mode, aborting a command > happens > as follows (simplified): > (1) Send the ABORT TASK TMF. If this TMF succeeds, no SQE will be > generated. If this TMF succeeds it means that the SCSI command > has > reached the UFS device and hence is no longer present in any > submission queue (SQ). > (2) If the command is still in a submission queue, nullify the SQE. > In > this case a CQE will be generated with status ABORTED. > > It seems to me that the above patch handles (2) but not (1)? > > Thanks, > > Bart. Hi Bart, Regarding your description of 'ABORT TMF success' and 'SQE' I think there might be some misunderstanding. In this section of the UFSHCI 4.0 specification. 4.4.6 (Informative) Processing Abort in MCQ mode: An Implementation Example There are three case for MCQ abort: 1. When the host controller has already sent out the SQE and the UFS device has already responded with the corresponding response, the CQ Entry will automatically increment by 1. This case is the simplest, the SQE will have a corresponding CQE for the host to cleanup resources. 2. When the host controller has not yet sent out this SQE (SQ is not empty), the software can fill in 'nullify' to notify the host controller that there is no need to send it, and directly fill the corresponding response into the CQ with OCS: ABORTED. This scenario is also straightforward, the UFS device won't be aware, and only the host controller needs to clean up the related resources. 3. When the host controller has already sent out the SQE and is waiting for the response from the UFS device (CQE), the software can initiate cleanup to notify the host controller that there is no need to wait, and directly fill the corresponding response into the CQ with OCS: ABORTED. Therefore, when a TMF is successfully executed, for example, tag 0 has been aborted, the software will know that the UFS device will no longer return a response for tag 0. Thus, the host controller needs to initiate cleanup, allowing the CQE corresponding to this SQE to return, while also cleaning the resources for tag 0. In both the second and third cases, no matter which one occurs, they will result in the CQE corresponding to the SQE being filled with OCS: ABORTED. So, as long as we directly requeue when OCS: ABORTED, it will satisfy both of these situations. Thanks Peter
diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c index a6f818cdef0e..4fff929b70d6 100644 --- a/drivers/ufs/core/ufshcd.c +++ b/drivers/ufs/core/ufshcd.c @@ -5405,9 +5405,15 @@ ufshcd_transfer_rsp_status(struct ufs_hba *hba, struct ufshcd_lrb *lrbp, break; case OCS_ABORTED: result |= DID_ABORT << 16; + dev_warn(hba->dev, + "OCS aborted from controller for tag %d\n", + lrbp->task_tag); break; case OCS_INVALID_COMMAND_STATUS: result |= DID_REQUEUE << 16; + dev_warn(hba->dev, + "OCS invaild from controller for tag %d\n", + lrbp->task_tag); break; case OCS_INVALID_CMD_TABLE_ATTR: case OCS_INVALID_PRDT_ATTR: @@ -5526,6 +5532,18 @@ void ufshcd_compl_one_cqe(struct ufs_hba *hba, int task_tag, ufshcd_update_monitor(hba, lrbp); ufshcd_add_command_trace(hba, task_tag, UFS_CMD_COMP); cmd->result = ufshcd_transfer_rsp_status(hba, lrbp, cqe); + + /* + * Ignore MCQ OCS: ABORTED posted by the host controller. + * This makes the behavior of MCQ mode consistent with that + * of legacy SDB mode. + */ + if (hba->mcq_enabled) { + ocs = ufshcd_get_tr_ocs(lrbp, cqe); + if (ocs == OCS_ABORTED) + return; + } + ufshcd_release_scsi_cmd(hba, lrbp); /* Do not touch lrbp after scsi done */ scsi_done(cmd); @@ -6486,8 +6504,11 @@ static bool ufshcd_abort_one(struct request *rq, void *priv) if (!hwq) return 0; spin_lock_irqsave(&hwq->cq_lock, flags); - if (ufshcd_cmd_inflight(lrbp->cmd)) + if (ufshcd_cmd_inflight(lrbp->cmd)) { + set_host_byte(lrbp->cmd, DID_REQUEUE); ufshcd_release_scsi_cmd(hba, lrbp); + scsi_done(lrbp->cmd); + } spin_unlock_irqrestore(&hwq->cq_lock, flags); }