Message ID | 20200123042345.23886-3-bvanassche@acm.org (mailing list archive) |
---|---|
State | Changes Requested |
Headers | show |
Series | qla2xxx patches for kernel v5.6 | expand |
On Wed, Jan 22, 2020 at 08:23:41PM -0800, Bart Van Assche wrote: > Since the SCSI core does not reuse the tag of the SCSI command that is > being aborted by .eh_abort() before .eh_abort() has finished it is not > necessary to check from inside that callback whether or not the SCSI command > has already completed. Instead, rely on the firmware to return an error code > when attempting to abort a command that has already completed. Additionally, > rely on the firmware to return an error code when attempting to abort an > already aborted command. > > In qla2x00_abort_srb(), use blk_mq_request_started() instead of > sp->completed and sp->aborted. > > This patch eliminates several race conditions triggered by the removed member > variables. I can only guess here what the races are but I agree removing the logic here and relying on the SCSI layer to handle it correctly makes sense. > Acked-by: Himanshu Madhani <hmadhani@marvell.com> > Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com> > Cc: Quinn Tran <qutran@marvell.com> > Cc: Martin Wilck <mwilck@suse.com> > Cc: Daniel Wagner <dwagner@suse.de> > Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Daniel Wagner <dwagner@suse.de> > +/* > + * The caller must ensure that no completion interrupts will happen > + * while this function is in progress. > + */ So could we add something like WARN_ON(irqs_disabled())?
On 2020-01-23 02:17, Daniel Wagner wrote: > On Wed, Jan 22, 2020 at 08:23:41PM -0800, Bart Van Assche wrote: >> Since the SCSI core does not reuse the tag of the SCSI command that is >> being aborted by .eh_abort() before .eh_abort() has finished it is not >> necessary to check from inside that callback whether or not the SCSI command >> has already completed. Instead, rely on the firmware to return an error code >> when attempting to abort a command that has already completed. Additionally, >> rely on the firmware to return an error code when attempting to abort an >> already aborted command. >> >> In qla2x00_abort_srb(), use blk_mq_request_started() instead of >> sp->completed and sp->aborted. >> >> This patch eliminates several race conditions triggered by the removed member >> variables. > > I can only guess here what the races are but I agree removing the > logic here and relying on the SCSI layer to handle it correctly makes > sense. I will make the patch description more clear when I repost this patch. >> +/* >> + * The caller must ensure that no completion interrupts will happen >> + * while this function is in progress. >> + */ > > So could we add something like WARN_ON(irqs_disabled())? qla2x00_abort_all_cmds() is typically called after the kernel driver discovered that the firmware stopped processing commands. If the firmware has stopped processing commands it is safe to call this function without disabling interrupts. Even if the caller would disable interrupts, that would only disable interrupts on the local CPU but not on other CPUs. Additionally, disabling interrupts is not a proper solution because if a completion interrupt arrives after a command has been aborted that will cause trouble if the command handle has already been associated with another command. Bart.
Hi Bart, On Thu, Jan 23, 2020 at 07:12:04PM -0800, Bart Van Assche wrote: > >> +/* > >> + * The caller must ensure that no completion interrupts will happen > >> + * while this function is in progress. > >> + */ > > > > So could we add something like WARN_ON(irqs_disabled())? > > qla2x00_abort_all_cmds() is typically called after the kernel driver > discovered that the firmware stopped processing commands. If the > firmware has stopped processing commands it is safe to call this > function without disabling interrupts. Even if the caller would disable > interrupts, that would only disable interrupts on the local CPU but not > on other CPUs. Additionally, disabling interrupts is not a proper > solution because if a completion interrupt arrives after a command has > been aborted that will cause trouble if the command handle has already > been associated with another command. Thanks for the explenation. I understood the comment as 'interrupts' have to be disabled when calling this function. Thanks, Daniel
diff --git a/drivers/scsi/qla2xxx/qla_def.h b/drivers/scsi/qla2xxx/qla_def.h index ed32e9715794..c5a067f45005 100644 --- a/drivers/scsi/qla2xxx/qla_def.h +++ b/drivers/scsi/qla2xxx/qla_def.h @@ -597,9 +597,6 @@ typedef struct srb { struct fc_port *fcport; struct scsi_qla_host *vha; unsigned int start_timer:1; - unsigned int abort:1; - unsigned int aborted:1; - unsigned int completed:1; uint32_t handle; uint16_t flags; diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c index e7bad0bfffda..0c9bfe77ba8a 100644 --- a/drivers/scsi/qla2xxx/qla_isr.c +++ b/drivers/scsi/qla2xxx/qla_isr.c @@ -2504,11 +2504,6 @@ qla2x00_status_entry(scsi_qla_host_t *vha, struct rsp_que *rsp, void *pkt) return; } - if (sp->abort) - sp->aborted = 1; - else - sp->completed = 1; - if (sp->cmd_type != TYPE_SRB) { req->outstanding_cmds[handle] = NULL; ql_dbg(ql_dbg_io, vha, 0x3015, diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c index 79387ac8936f..a34f27b2d602 100644 --- a/drivers/scsi/qla2xxx/qla_os.c +++ b/drivers/scsi/qla2xxx/qla_os.c @@ -1253,17 +1253,6 @@ qla2xxx_eh_abort(struct scsi_cmnd *cmd) return SUCCESS; spin_lock_irqsave(qpair->qp_lock_ptr, flags); - if (sp->completed) { - spin_unlock_irqrestore(qpair->qp_lock_ptr, flags); - return SUCCESS; - } - - if (sp->abort || sp->aborted) { - spin_unlock_irqrestore(qpair->qp_lock_ptr, flags); - return FAILED; - } - - sp->abort = 1; sp->comp = ∁ spin_unlock_irqrestore(qpair->qp_lock_ptr, flags); @@ -1688,6 +1677,10 @@ qla2x00_loop_reset(scsi_qla_host_t *vha) return QLA_SUCCESS; } +/* + * The caller must ensure that no completion interrupts will happen + * while this function is in progress. + */ static void qla2x00_abort_srb(struct qla_qpair *qp, srb_t *sp, const int res, unsigned long *flags) __releases(qp->qp_lock_ptr) @@ -1696,6 +1689,7 @@ static void qla2x00_abort_srb(struct qla_qpair *qp, srb_t *sp, const int res, DECLARE_COMPLETION_ONSTACK(comp); scsi_qla_host_t *vha = qp->vha; struct qla_hw_data *ha = vha->hw; + struct scsi_cmnd *cmd = GET_CMD_SP(sp); int rval; bool ret_cmd; uint32_t ratov_j; @@ -1717,7 +1711,6 @@ static void qla2x00_abort_srb(struct qla_qpair *qp, srb_t *sp, const int res, } sp->comp = ∁ - sp->abort = 1; spin_unlock_irqrestore(qp->qp_lock_ptr, *flags); rval = ha->isp_ops->abort_command(sp); @@ -1741,13 +1734,17 @@ static void qla2x00_abort_srb(struct qla_qpair *qp, srb_t *sp, const int res, } spin_lock_irqsave(qp->qp_lock_ptr, *flags); - if (ret_cmd && (!sp->completed || !sp->aborted)) + if (ret_cmd && blk_mq_request_started(cmd->request)) sp->done(sp, res); } else { sp->done(sp, res); } } +/* + * The caller must ensure that no completion interrupts will happen + * while this function is in progress. + */ static void __qla2x00_abort_all_cmds(struct qla_qpair *qp, int res) { @@ -1794,6 +1791,10 @@ __qla2x00_abort_all_cmds(struct qla_qpair *qp, int res) spin_unlock_irqrestore(qp->qp_lock_ptr, flags); } +/* + * The caller must ensure that no completion interrupts will happen + * while this function is in progress. + */ void qla2x00_abort_all_cmds(scsi_qla_host_t *vha, int res) {