Message ID | 90b21916a384281682d520302b594834e010671b.1353903448.git.dillowda@ornl.gov (mailing list archive) |
---|---|
State | Accepted, archived |
Headers | show |
On 11/26/12 05:44, David Dillow wrote: > Once we know we have an issue with the QP, there is no point trying to > send anything else down the pipe. This also allows us to consolidate > code in the SCSI EH path. > [ ... ] > @@ -1683,7 +1681,7 @@ static int srp_send_tsk_mgmt(struct srp_target_port *target, > struct srp_iu *iu; > struct srp_tsk_mgmt *tsk_mgmt; > > - if (srp_is_removed(target)) > + if (target->state) > return -1; > Hi Dave, After I posted the patch on which the above patch has been based I realized that testing the connection state at the start of srp_send_tsk_mgmt() is not sufficient to avoid QPN use-after-free. If a DREQ is received by the initiator after the above test has been performed and before the task management function has been sent it is still possible to send a task management function over a closed QP. I'd like to address this in a different way - see also the thread called "SCSI LLDs, the SCSI error handler and host resource lifetime" on the linux-scsi mailing list (November 20, http://marc.info/?t=135342155500003&r=1). Sorry for the confusion I caused. Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 2012-11-26 at 10:17 +0100, Bart Van Assche wrote: > On 11/26/12 05:44, David Dillow wrote: > > Once we know we have an issue with the QP, there is no point trying to > > send anything else down the pipe. This also allows us to consolidate > > code in the SCSI EH path. > After I posted the patch on which the above patch has been based I > realized that testing the connection state at the start of > srp_send_tsk_mgmt() is not sufficient to avoid QPN use-after-free. If a > DREQ is received by the initiator after the above test has been > performed and before the task management function has been sent it is > still possible to send a task management function over a closed QP. AFIACT, DREQ does not actually close the QP -- it only tells us that the other side would like to. We don't actually close the connection until we try to send on it again, I think -- not sure if we see recv failures for the queued work items. Regardless, the issue of resource lifetime is an issue that needs solving. > I'd like to address this in a different way - see also the thread called > "SCSI LLDs, the SCSI error handler and host resource lifetime" on the > linux-scsi mailing list (November 20, > http://marc.info/?t=135342155500003&r=1). I like the direction you propose there. It seems that scsi_remove_host() at one point waited for the EH thread to exit -- or perhaps it was part of scsi_host_put() chain -- as there's the longstanding deferral to the work queue for the SRP target removal. Of course, that's been there for ~5 years now, and things have changed in the SCSI stack.
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index 1e8ce81..2951e1c 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -1328,14 +1328,12 @@ static int srp_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmnd) int len; if (unlikely(target->state)) { - if (srp_is_disconnected(target)) + if (!srp_is_removed(target)) goto err; - if (srp_is_removed(target)) { - scmnd->result = DID_BAD_TARGET << 16; - scmnd->scsi_done(scmnd); - return 0; - } + scmnd->result = DID_BAD_TARGET << 16; + scmnd->scsi_done(scmnd); + return 0; } spin_lock_irqsave(&target->lock, flags); @@ -1683,7 +1681,7 @@ static int srp_send_tsk_mgmt(struct srp_target_port *target, struct srp_iu *iu; struct srp_tsk_mgmt *tsk_mgmt; - if (srp_is_removed(target)) + if (target->state) return -1; init_completion(&target->tsk_mgmt_done); @@ -1727,12 +1725,11 @@ static int srp_abort(struct scsi_cmnd *scmnd) shost_printk(KERN_ERR, target->scsi_host, "SRP abort called\n"); - if (srp_in_error(target)) - return FAILED; if (!req || !srp_claim_req(target, req, scmnd)) return FAILED; - srp_send_tsk_mgmt(target, req->index, scmnd->device->lun, - SRP_TSK_ABORT_TASK); + if (srp_send_tsk_mgmt(target, req->index, scmnd->device->lun, + SRP_TSK_ABORT_TASK)) + return FAILED; srp_free_req(target, req, scmnd, 0); scmnd->result = DID_ABORT << 16; scmnd->scsi_done(scmnd); @@ -1747,8 +1744,6 @@ static int srp_reset_device(struct scsi_cmnd *scmnd) shost_printk(KERN_ERR, target->scsi_host, "SRP reset_device called\n"); - if (srp_in_error(target)) - return FAILED; if (srp_send_tsk_mgmt(target, SRP_TAG_NO_REQ, scmnd->device->lun, SRP_TSK_LUN_RESET)) return FAILED;
Once we know we have an issue with the QP, there is no point trying to send anything else down the pipe. This also allows us to consolidate code in the SCSI EH path. Needs-to-be-signed-off-by: Bart Van Assche <bvanassche@acm.org> [ adapted to new state tracking code ] Signed-off-by: David Dillow <dillowda@ornl.gov> --- drivers/infiniband/ulp/srp/ib_srp.c | 21 ++++++++------------- 1 files changed, 8 insertions(+), 13 deletions(-)