Message ID | 20211110004440.3389311-9-bvanassche@acm.org (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | UFS patches for kernel v5.17 | expand |
On 10/11/2021 02:44, Bart Van Assche wrote: > Make sure that aborted commands are completed once by clearing the > corresponding tag bit from hba->outstanding_reqs. This patch is a > follow-up for commit cd892096c940 ("scsi: ufs: core: Improve SCSI > abort handling"). > > Fixes: 7a3e97b0dc4b ("[SCSI] ufshcd: UFS Host controller driver") > Signed-off-by: Bart Van Assche <bvanassche@acm.org> > --- > drivers/scsi/ufs/ufshcd.c | 9 +++++++++ > 1 file changed, 9 insertions(+) > > diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c > index 8f5640647054..1e15ed1f639f 100644 > --- a/drivers/scsi/ufs/ufshcd.c > +++ b/drivers/scsi/ufs/ufshcd.c > @@ -7090,6 +7090,15 @@ static int ufshcd_abort(struct scsi_cmnd *cmd) > goto release; > } > > + /* > + * ufshcd_try_to_abort_task() cleared the 'tag' bit in the doorbell > + * register. Clear the corresponding bit from outstanding_reqs to > + * prevent early completion. > + */ > + spin_lock_irqsave(&hba->outstanding_lock, flags); > + __clear_bit(tag, &hba->outstanding_reqs); > + spin_unlock_irqrestore(&hba->outstanding_lock, flags); Seems like something ufshcd_clear_cmd() should be doing instead? > + > lrbp->cmd = NULL; > err = SUCCESS; > >
On 11/10/21 12:57 AM, Adrian Hunter wrote: > On 10/11/2021 02:44, Bart Van Assche wrote: >> Make sure that aborted commands are completed once by clearing the >> corresponding tag bit from hba->outstanding_reqs. This patch is a >> follow-up for commit cd892096c940 ("scsi: ufs: core: Improve SCSI >> abort handling"). >> >> Fixes: 7a3e97b0dc4b ("[SCSI] ufshcd: UFS Host controller driver") >> Signed-off-by: Bart Van Assche <bvanassche@acm.org> >> --- >> drivers/scsi/ufs/ufshcd.c | 9 +++++++++ >> 1 file changed, 9 insertions(+) >> >> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c >> index 8f5640647054..1e15ed1f639f 100644 >> --- a/drivers/scsi/ufs/ufshcd.c >> +++ b/drivers/scsi/ufs/ufshcd.c >> @@ -7090,6 +7090,15 @@ static int ufshcd_abort(struct scsi_cmnd *cmd) >> goto release; >> } >> >> + /* >> + * ufshcd_try_to_abort_task() cleared the 'tag' bit in the doorbell >> + * register. Clear the corresponding bit from outstanding_reqs to >> + * prevent early completion. >> + */ >> + spin_lock_irqsave(&hba->outstanding_lock, flags); >> + __clear_bit(tag, &hba->outstanding_reqs); >> + spin_unlock_irqrestore(&hba->outstanding_lock, flags); > > Seems like something ufshcd_clear_cmd() should be doing instead? Hi Adrian, I'm concerned that would break ufshcd_eh_device_reset_handler() since that reset handler retries SCSI commands by calling __ufshcd_transfer_req_compl() after having called ufshcd_clear_cmd(). Thanks, Bart.
Hi Bart, On 11/10/21 8:44 AM, Bart Van Assche wrote: > Make sure that aborted commands are completed once by clearing the > corresponding tag bit from hba->outstanding_reqs. This patch is a > follow-up for commit cd892096c940 ("scsi: ufs: core: Improve SCSI > abort handling"). > > Fixes: 7a3e97b0dc4b ("[SCSI] ufshcd: UFS Host controller driver") > Signed-off-by: Bart Van Assche <bvanassche@acm.org> > --- > drivers/scsi/ufs/ufshcd.c | 9 +++++++++ > 1 file changed, 9 insertions(+) > > diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c > index 8f5640647054..1e15ed1f639f 100644 > --- a/drivers/scsi/ufs/ufshcd.c > +++ b/drivers/scsi/ufs/ufshcd.c > @@ -7090,6 +7090,15 @@ static int ufshcd_abort(struct scsi_cmnd *cmd) > goto release; > } > > + /* > + * ufshcd_try_to_abort_task() cleared the 'tag' bit in the doorbell > + * register. Clear the corresponding bit from outstanding_reqs to > + * prevent early completion. > + */ > + spin_lock_irqsave(&hba->outstanding_lock, flags); > + __clear_bit(tag, &hba->outstanding_reqs); > + spin_unlock_irqrestore(&hba->outstanding_lock, flags); > + should we also call unmap? scsi_dma_unmap(cmd); > lrbp->cmd = NULL; > err = SUCCESS; >
On 10/11/2021 20:56, Bart Van Assche wrote: > On 11/10/21 12:57 AM, Adrian Hunter wrote: >> On 10/11/2021 02:44, Bart Van Assche wrote: >>> Make sure that aborted commands are completed once by clearing the >>> corresponding tag bit from hba->outstanding_reqs. This patch is a >>> follow-up for commit cd892096c940 ("scsi: ufs: core: Improve SCSI >>> abort handling"). >>> >>> Fixes: 7a3e97b0dc4b ("[SCSI] ufshcd: UFS Host controller driver") >>> Signed-off-by: Bart Van Assche <bvanassche@acm.org> >>> --- >>> drivers/scsi/ufs/ufshcd.c | 9 +++++++++ >>> 1 file changed, 9 insertions(+) >>> >>> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c >>> index 8f5640647054..1e15ed1f639f 100644 >>> --- a/drivers/scsi/ufs/ufshcd.c >>> +++ b/drivers/scsi/ufs/ufshcd.c >>> @@ -7090,6 +7090,15 @@ static int ufshcd_abort(struct scsi_cmnd *cmd) >>> goto release; >>> } >>> + /* >>> + * ufshcd_try_to_abort_task() cleared the 'tag' bit in the doorbell >>> + * register. Clear the corresponding bit from outstanding_reqs to >>> + * prevent early completion. >>> + */ >>> + spin_lock_irqsave(&hba->outstanding_lock, flags); >>> + __clear_bit(tag, &hba->outstanding_reqs); >>> + spin_unlock_irqrestore(&hba->outstanding_lock, flags); >> >> Seems like something ufshcd_clear_cmd() should be doing instead? > > Hi Adrian, > > I'm concerned that would break ufshcd_eh_device_reset_handler() since that reset handler retries SCSI commands by calling __ufshcd_transfer_req_compl() after having called ufshcd_clear_cmd(). Whenever an outstanding_reqs bit transitions 1 -> 0, then __ufshcd_transfer_req_compl() must be called. In all cases, the correct logic must have the effect of: spin_lock_irqsave(&hba->outstanding_lock, flags); bit = __test_and_clear_bit(tag, &hba->outstanding_reqs); spin_unlock_irqrestore(&hba->outstanding_lock, flags); if (bit) __ufshcd_transfer_req_compl(hba, 1UL << tag); To put it another way, how else does the driver know whether or not __ufshcd_transfer_req_compl() has been called already. As a separate issue, in ufshcd_abort() there is: /* If command is already aborted/completed, return FAILED. */ if (!(test_bit(tag, &hba->outstanding_reqs))) { dev_err(hba->dev, "%s: cmd at tag %d already completed, outstanding=0x%lx, doorbell=0x%x\n", __func__, tag, hba->outstanding_reqs, reg); goto release; } which seems wrong. FAILED should only be returned to escalate the error handling, so if the slot has already successfully been cleared, that is SUCCESS. scsi_times_out() has already blocked the scsi_done() path (by setting SCMD_STATE_COMPLETE), so any use after free must be being caused by SCSI not the ufs driver. path
On 11/12/21 2:56 AM, Adrian Hunter wrote: > On 10/11/2021 20:56, Bart Van Assche wrote: >> On 11/10/21 12:57 AM, Adrian Hunter wrote: >>> Seems like something ufshcd_clear_cmd() should be doing instead? >> >> I'm concerned that would break ufshcd_eh_device_reset_handler() >> since that reset handler retries SCSI commands by calling >> __ufshcd_transfer_req_compl() after having called ufshcd_clear_cmd(). > Whenever an outstanding_reqs bit transitions 1 -> 0, then > __ufshcd_transfer_req_compl() must be called. I will look further into this. > As a separate issue, in ufshcd_abort() there is: > > /* If command is already aborted/completed, return FAILED. */ > if (!(test_bit(tag, &hba->outstanding_reqs))) { > dev_err(hba->dev, > "%s: cmd at tag %d already completed, outstanding=0x%lx, doorbell=0x%x\n", > __func__, tag, hba->outstanding_reqs, reg); > goto release; > } > > which seems wrong. FAILED should only be returned to escalate the > error handling, so if the slot has already successfully been > cleared, that is SUCCESS. scsi_times_out() has already blocked > the scsi_done() path (by setting SCMD_STATE_COMPLETE), so any use > after free must be being caused by SCSI not the ufs driver. scmd_eh_abort_handler() would trigger a use-after-free if ufshcd_abort() would return SUCCESS for completed commands. Hence the choice for the return value FAILED for completed commands. BTW, can this code path ever be reached since scsi_done() sets the SCMD_STATE_COMPLETE bit before it calls blk_mq_complete_request() and since scsi_times_out() tests that bit before it schedules a call of ufshcd_abort()? Thanks, Bart.
On 16/11/2021 01:09, Bart Van Assche wrote: > On 11/12/21 2:56 AM, Adrian Hunter wrote: >> On 10/11/2021 20:56, Bart Van Assche wrote: >>> On 11/10/21 12:57 AM, Adrian Hunter wrote: >>>> Seems like something ufshcd_clear_cmd() should be doing instead? >>> >>> I'm concerned that would break ufshcd_eh_device_reset_handler() >>> since that reset handler retries SCSI commands by calling >>> __ufshcd_transfer_req_compl() after having called ufshcd_clear_cmd(). >> Whenever an outstanding_reqs bit transitions 1 -> 0, then >> __ufshcd_transfer_req_compl() must be called. > > I will look further into this. > >> As a separate issue, in ufshcd_abort() there is: >> >> /* If command is already aborted/completed, return FAILED. */ >> if (!(test_bit(tag, &hba->outstanding_reqs))) { >> dev_err(hba->dev, >> "%s: cmd at tag %d already completed, outstanding=0x%lx, doorbell=0x%x\n", >> __func__, tag, hba->outstanding_reqs, reg); >> goto release; >> } >> >> which seems wrong. FAILED should only be returned to escalate the >> error handling, so if the slot has already successfully been >> cleared, that is SUCCESS. scsi_times_out() has already blocked >> the scsi_done() path (by setting SCMD_STATE_COMPLETE), so any use >> after free must be being caused by SCSI not the ufs driver. > > scmd_eh_abort_handler() would trigger a use-after-free if ufshcd_abort() would return SUCCESS for completed commands. Hence the choice for the return value FAILED for completed commands. > > BTW, can this code path ever be reached since scsi_done() sets the SCMD_STATE_COMPLETE bit before it calls blk_mq_complete_request() and since scsi_times_out() tests that bit before it schedules a call of ufshcd_abort()? Racing with timing out, the UFS interrupt handler can clear hba->outstanding_reqs bit and call __ufshcd_transfer_req_compl(), but the request will not be freed if scsi_times_out() wins the race. So there should be no use-after-free unless SCSI error handling itself has a bug. One perhaps unrelated issue in scsi_times_out(): if (test_and_set_bit(SCMD_STATE_COMPLETE, &scmd->state)) return BLK_EH_RESET_TIMER; Shouldn't that return BLK_EH_DONE not BLK_EH_RESET_TIMER, since the request has been through blk_mq_complete_request() ?
On 11/11/21 5:17 PM, Peter Wang wrote: > Hi Bart, > > On 11/10/21 8:44 AM, Bart Van Assche wrote: >> Make sure that aborted commands are completed once by clearing the >> corresponding tag bit from hba->outstanding_reqs. This patch is a >> follow-up for commit cd892096c940 ("scsi: ufs: core: Improve SCSI >> abort handling"). >> >> Fixes: 7a3e97b0dc4b ("[SCSI] ufshcd: UFS Host controller driver") >> Signed-off-by: Bart Van Assche <bvanassche@acm.org> >> --- >> drivers/scsi/ufs/ufshcd.c | 9 +++++++++ >> 1 file changed, 9 insertions(+) >> >> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c >> index 8f5640647054..1e15ed1f639f 100644 >> --- a/drivers/scsi/ufs/ufshcd.c >> +++ b/drivers/scsi/ufs/ufshcd.c >> @@ -7090,6 +7090,15 @@ static int ufshcd_abort(struct scsi_cmnd *cmd) >> goto release; >> } >> + /* >> + * ufshcd_try_to_abort_task() cleared the 'tag' bit in the doorbell >> + * register. Clear the corresponding bit from outstanding_reqs to >> + * prevent early completion. >> + */ >> + spin_lock_irqsave(&hba->outstanding_lock, flags); >> + __clear_bit(tag, &hba->outstanding_reqs); >> + spin_unlock_irqrestore(&hba->outstanding_lock, flags); >> + > > should we also call unmap? > > scsi_dma_unmap(cmd); > Hi Bart, Should we add unmap? Thanks. Peter > >> lrbp->cmd = NULL; >> err = SUCCESS;
On 11/16/21 1:03 AM, Adrian Hunter wrote: > One perhaps unrelated issue in scsi_times_out(): > > if (test_and_set_bit(SCMD_STATE_COMPLETE, &scmd->state)) > return BLK_EH_RESET_TIMER; > > Shouldn't that return BLK_EH_DONE not BLK_EH_RESET_TIMER, since the request has been through blk_mq_complete_request() ? I think so. I will submit a patch. Bart.
On 11/16/21 1:07 AM, Peter Wang wrote:
> Should we add unmap?
Hi Peter,
I will add DMA unmapping code in the abort handler.
Thanks,
Bart.
On 16/11/2021 18:08, Bart Van Assche wrote: > On 11/16/21 1:07 AM, Peter Wang wrote: >> Should we add unmap? > > Hi Peter, > > I will add DMA unmapping code in the abort handler. I would note that __ufshcd_transfer_req_compl() does that, as well as providing a matching ufshcd_release() for the ufshcd_hold() in ufshcd_queuecommand(), so do consider __ufshcd_transfer_req_compl(). Using __ufshcd_transfer_req_compl() seems consistent with the error handler which uses __ufshcd_transfer_req_compl() via ufshcd_complete_requests(), which will pick up all the requests that the error handler has just aborted via ufshcd_try_to_abort_task(). Also ufshcd_host_reset_and_restore() uses __ufshcd_transfer_req_compl() via ufshcd_complete_requests(), which will pick up anything still in outstanding_reqs because the doorbell has become zero at that point.
On 11/16/21 12:16, Adrian Hunter wrote: > On 16/11/2021 18:08, Bart Van Assche wrote: >> On 11/16/21 1:07 AM, Peter Wang wrote: >>> Should we add unmap? >> >> Hi Peter, >> >> I will add DMA unmapping code in the abort handler. > > I would note that __ufshcd_transfer_req_compl() does that, as well as providing > a matching ufshcd_release() for the ufshcd_hold() in ufshcd_queuecommand(), so > do consider __ufshcd_transfer_req_compl(). > > Using __ufshcd_transfer_req_compl() seems consistent with the error handler which > uses __ufshcd_transfer_req_compl() via ufshcd_complete_requests(), which will pick > up all the requests that the error handler has just aborted via > ufshcd_try_to_abort_task(). Also ufshcd_host_reset_and_restore() uses > __ufshcd_transfer_req_compl() via ufshcd_complete_requests(), which will pick > up anything still in outstanding_reqs because the doorbell has become zero at > that point. Hi Adrian, Although I agree with minimizing code duplication, I'm not sure that __ufshcd_transfer_req_compl() should be called from inside ufshcd_abort(). __ufshcd_transfer_req_compl() also calls ufshcd_update_monitor() and ufshcd_add_command_trace(hba, index, UFS_CMD_COMP). Neither function should be called when aborting a command. Thanks, Bart.
On 16/11/2021 23:53, Bart Van Assche wrote: > On 11/16/21 12:16, Adrian Hunter wrote: >> On 16/11/2021 18:08, Bart Van Assche wrote: >>> On 11/16/21 1:07 AM, Peter Wang wrote: >>>> Should we add unmap? >>> >>> Hi Peter, >>> >>> I will add DMA unmapping code in the abort handler. >> >> I would note that __ufshcd_transfer_req_compl() does that, as well as providing >> a matching ufshcd_release() for the ufshcd_hold() in ufshcd_queuecommand(), so >> do consider __ufshcd_transfer_req_compl(). >> >> Using __ufshcd_transfer_req_compl() seems consistent with the error handler which >> uses __ufshcd_transfer_req_compl() via ufshcd_complete_requests(), which will pick >> up all the requests that the error handler has just aborted via >> ufshcd_try_to_abort_task(). Also ufshcd_host_reset_and_restore() uses >> __ufshcd_transfer_req_compl() via ufshcd_complete_requests(), which will pick >> up anything still in outstanding_reqs because the doorbell has become zero at >> that point. > > Hi Adrian, > > Although I agree with minimizing code duplication, I'm not sure that __ufshcd_transfer_req_compl() should be called from inside ufshcd_abort(). __ufshcd_transfer_req_compl() also calls ufshcd_update_monitor() and ufshcd_add_command_trace(hba, index, UFS_CMD_COMP). Neither function should be called when aborting a command. But we should trace the abort I would have thought - seems like a separate issue though.
diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c index 8f5640647054..1e15ed1f639f 100644 --- a/drivers/scsi/ufs/ufshcd.c +++ b/drivers/scsi/ufs/ufshcd.c @@ -7090,6 +7090,15 @@ static int ufshcd_abort(struct scsi_cmnd *cmd) goto release; } + /* + * ufshcd_try_to_abort_task() cleared the 'tag' bit in the doorbell + * register. Clear the corresponding bit from outstanding_reqs to + * prevent early completion. + */ + spin_lock_irqsave(&hba->outstanding_lock, flags); + __clear_bit(tag, &hba->outstanding_reqs); + spin_unlock_irqrestore(&hba->outstanding_lock, flags); + lrbp->cmd = NULL; err = SUCCESS;
Make sure that aborted commands are completed once by clearing the corresponding tag bit from hba->outstanding_reqs. This patch is a follow-up for commit cd892096c940 ("scsi: ufs: core: Improve SCSI abort handling"). Fixes: 7a3e97b0dc4b ("[SCSI] ufshcd: UFS Host controller driver") Signed-off-by: Bart Van Assche <bvanassche@acm.org> --- drivers/scsi/ufs/ufshcd.c | 9 +++++++++ 1 file changed, 9 insertions(+)