Message ID | 20240923080344.19084-4-peter.wang@mediatek.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | fix abort defect | expand |
On 9/23/24 1:03 AM, peter.wang@mediatek.com wrote: > diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c > index b5c7bc50a27e..b42079c3d634 100644 > --- a/drivers/ufs/core/ufshcd.c > +++ b/drivers/ufs/core/ufshcd.c > @@ -5404,7 +5404,10 @@ ufshcd_transfer_rsp_status(struct ufs_hba *hba, struct ufshcd_lrb *lrbp, > } > break; > case OCS_ABORTED: > - result |= DID_ABORT << 16; > + if (hba->quirks & UFSHCD_QUIRK_OCS_ABORTED) > + result |= DID_REQUEUE << 16; > + else > + result |= DID_ABORT << 16; > dev_warn(hba->dev, > "OCS aborted from controller = %x for tag %d\n", > ocs, lrbp->task_tag); I think the approach of this patch is racy: the cmd->result assignment by ufshcd_transfer_rsp_status() races with the cmd->result assignment by ufshcd_abort_one(). How about addressing this race as follows? * In ufshcd_compl_one_cqe(), if the OCS_ABORTED status is encountered, set a completion. * In the code that aborts SCSI commands, for MediaTek controllers only, wait for that completion (based on a quirk). * Instead of introducing an if-statement in ufshcd_transfer_rsp_status(), rely on the cmd->result value assigned by ufshcd_abort_one(). Thanks, Bart.
On Mon, 2024-09-23 at 11:15 -0700, Bart Van Assche wrote: > > External email : Please do not click links or open attachments until > you have verified the sender or the content. > On 9/23/24 1:03 AM, peter.wang@mediatek.com wrote: > > diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c > > index b5c7bc50a27e..b42079c3d634 100644 > > --- a/drivers/ufs/core/ufshcd.c > > +++ b/drivers/ufs/core/ufshcd.c > > @@ -5404,7 +5404,10 @@ ufshcd_transfer_rsp_status(struct ufs_hba > *hba, struct ufshcd_lrb *lrbp, > > } > > break; > > case OCS_ABORTED: > > -result |= DID_ABORT << 16; > > +if (hba->quirks & UFSHCD_QUIRK_OCS_ABORTED) > > +result |= DID_REQUEUE << 16; > > +else > > +result |= DID_ABORT << 16; > > dev_warn(hba->dev, > > "OCS aborted from controller = %x for tag %d\n", > > ocs, lrbp->task_tag); > > I think the approach of this patch is racy: the cmd->result > assignment > by ufshcd_transfer_rsp_status() races with the cmd->result assignment > by > ufshcd_abort_one(). How about addressing this race as follows? > * In ufshcd_compl_one_cqe(), if the OCS_ABORTED status is > encountered, > set a completion. > * In the code that aborts SCSI commands, for MediaTek controllers > only, > wait for that completion (based on a quirk). > * Instead of introducing an if-statement in > ufshcd_transfer_rsp_status(), rely on the cmd->result value > assigned > by ufshcd_abort_one(). > > Thanks, > > Bart. Hi Bart, Sorry, I might not have understood the potential racing issue here. The SCSI abort command doesn't need to wait for completion because SCSI doesn't care about cmd->result, right? The error handler abort also doesn't need to wait for completion, because it should have a guaranteed order? Firstly, ufshcd_abort_one is only likely to be filled back with OCS: ABORTED by MediaTek UFS controller after ufshcd_try_to_abort_task, and the sequence is as follows. ufshcd_err_handler() ufshcd_abort_all() ufshcd_abort_one() ufshcd_try_to_abort_task() // trigger mediatek controller fill OCS: ABORTED ufshcd_complete_requests() ufshcd_transfer_req_compl() ufshcd_poll() get outstanding_lock clear outstanding_reqs tag release outstanding_lock __ufshcd_transfer_req_compl() ufshcd_compl_one_cqe() cmd->result = DID_REQUEUE // mediatek may need quirk change DID_ABORT to DID_REQUEUE In addition, the ISR will use the outstanding_lock with ufshcd_err_handler to provide protection, so there won't be any racing that causes the command to be released repeatedly. The only possible issue might be that after ufshcd_abort_one, the MediaTek UFS controller has not yet filled in OCS: ABORTED and has entered ufshcd_transfer_rsp_status for checking. But this doesn't matter, because it will just be treated as OCS_INVALID_COMMAND_STATUS. Is there any corner case that I might have overlooked? Thanks. Peter >
On 9/24/24 1:51 AM, Peter Wang (王信友) wrote:
> Is there any corner case that I might have overlooked?
Maybe I misunderstood how MediaTek controllers work. Anyway, if a
MediaTek controller reports the completion status OCS_ABORTED, is
there any chance that this status will be reported after
ufshcd_release_scsi_cmd() has been called? Is there any chance that
the controller writing OCS_ABORTED into the status field will race
with the host software overwriting that status field while submitting
a new command?
Thanks,
Bart.
On Tue, 2024-09-24 at 12:36 -0700, Bart Van Assche wrote: > > External email : Please do not click links or open attachments until > you have verified the sender or the content. > On 9/24/24 1:51 AM, Peter Wang (王信友) wrote: > > Is there any corner case that I might have overlooked? > > Maybe I misunderstood how MediaTek controllers work. Anyway, if a > MediaTek controller reports the completion status OCS_ABORTED, is > there any chance that this status will be reported after > ufshcd_release_scsi_cmd() has been called? Is there any chance that > the controller writing OCS_ABORTED into the status field will race > with the host software overwriting that status field while submitting > a new command? > > Thanks, > > Bart. > Hi Bart, Okay, I may not have clearly explained the behavior of the MediaTek UFS controller. In the UFSHCI specification, under the description of UTRLCLR. After writing to UTRLCLR, when UTRLDBR is cleared to 0, the MediaTek UFS controller will simultaneously change the OCS to ABORTED. So, OCS: ABORTED is filled at the moment when ufshcd_clear_cmd successfully waits for DBR to clear. That's why I wrote it like this: ufshcd_try_to_abort_task() // triggers MediaTek controller to fill OCS: ABORTED Therefore, the status should already be ABORTED before ufshcd_release_scsi_cmd, so there shouldn't be a racing issue. Thanks. Peter
diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c index b5c7bc50a27e..b42079c3d634 100644 --- a/drivers/ufs/core/ufshcd.c +++ b/drivers/ufs/core/ufshcd.c @@ -5404,7 +5404,10 @@ ufshcd_transfer_rsp_status(struct ufs_hba *hba, struct ufshcd_lrb *lrbp, } break; case OCS_ABORTED: - result |= DID_ABORT << 16; + if (hba->quirks & UFSHCD_QUIRK_OCS_ABORTED) + result |= DID_REQUEUE << 16; + else + result |= DID_ABORT << 16; dev_warn(hba->dev, "OCS aborted from controller = %x for tag %d\n", ocs, lrbp->task_tag); diff --git a/drivers/ufs/host/ufs-mediatek.c b/drivers/ufs/host/ufs-mediatek.c index 02c9064284e1..8a4c1b8f5a26 100644 --- a/drivers/ufs/host/ufs-mediatek.c +++ b/drivers/ufs/host/ufs-mediatek.c @@ -1021,6 +1021,7 @@ static int ufs_mtk_init(struct ufs_hba *hba) hba->quirks |= UFSHCI_QUIRK_SKIP_MANUAL_WB_FLUSH_CTRL; hba->quirks |= UFSHCD_QUIRK_MCQ_BROKEN_INTR; hba->quirks |= UFSHCD_QUIRK_MCQ_BROKEN_RTC; + hba->quirks |= UFSHCD_QUIRK_OCS_ABORTED; hba->vps->wb_flush_threshold = UFS_WB_BUF_REMAIN_PERCENT(80); if (host->caps & UFS_MTK_CAP_DISABLE_AH8) diff --git a/include/ufs/ufshcd.h b/include/ufs/ufshcd.h index 0fd2aebac728..8f156803d703 100644 --- a/include/ufs/ufshcd.h +++ b/include/ufs/ufshcd.h @@ -684,6 +684,12 @@ enum ufshcd_quirks { * single doorbell mode. */ UFSHCD_QUIRK_BROKEN_LSDBS_CAP = 1 << 25, + + /* + * Some host controllers set OCS_ABORTED after UTRLCLR (SDB mode), + * this quirk is set to treat OCS: ABORTED as INVALID_OCS_VALUE + */ + UFSHCD_QUIRK_OCS_ABORTED = 1 << 26, }; enum ufshcd_caps {