[03/11] IB/srp: don't send anything on a bad QP

Message ID	90b21916a384281682d520302b594834e010671b.1353903448.git.dillowda@ornl.gov (mailing list archive)
State	Accepted, archived
Headers	show Return-Path: <linux-rdma-owner@vger.kernel.org> From: David Dillow <dillowda@ornl.gov> To: linux-rdma@vger.kernel.org Cc: linux-scsi@vger.kernel.org, bvanassche@acm.org, roland@purestorage.com Subject: [PATCH 03/11] IB/srp: don't send anything on a bad QP Date: Sun, 25 Nov 2012 23:44:08 -0500 Message-Id: <90b21916a384281682d520302b594834e010671b.1353903448.git.dillowda@ornl.gov> In-Reply-To: <cover.1353903448.git.dillowda@ornl.gov> References: <cover.1353903448.git.dillowda@ornl.gov> In-Reply-To: <cover.1353903448.git.dillowda@ornl.gov> References: <cover.1353903448.git.dillowda@ornl.gov> Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk

Message ID

90b21916a384281682d520302b594834e010671b.1353903448.git.dillowda@ornl.gov (mailing list archive)

State

Accepted, archived

Headers

From: David Dillow <dillowda@ornl.gov>
To: linux-rdma@vger.kernel.org
Cc: linux-scsi@vger.kernel.org, bvanassche@acm.org,
	roland@purestorage.com
Subject: [PATCH 03/11] IB/srp: don't send anything on a bad QP
Date: Sun, 25 Nov 2012 23:44:08 -0500
Message-Id: <90b21916a384281682d520302b594834e010671b.1353903448.git.dillowda@ornl.gov>
In-Reply-To: <cover.1353903448.git.dillowda@ornl.gov>
References: <cover.1353903448.git.dillowda@ornl.gov>
In-Reply-To: <cover.1353903448.git.dillowda@ornl.gov>
References: <cover.1353903448.git.dillowda@ornl.gov>
Sender: linux-rdma-owner@vger.kernel.org
Precedence: bulk

Commit Message

David Dillow Nov. 26, 2012, 4:44 a.m. UTC

Once we know we have an issue with the QP, there is no point trying to
send anything else down the pipe. This also allows us to consolidate
code in the SCSI EH path.

Needs-to-be-signed-off-by: Bart Van Assche <bvanassche@acm.org>
[ adapted to new state tracking code ]
Signed-off-by: David Dillow <dillowda@ornl.gov>
---
 drivers/infiniband/ulp/srp/ib_srp.c |   21 ++++++++-------------
 1 files changed, 8 insertions(+), 13 deletions(-)

Comments

Bart Van Assche Nov. 26, 2012, 9:17 a.m. UTC | #1

On 11/26/12 05:44, David Dillow wrote:
> Once we know we have an issue with the QP, there is no point trying to
> send anything else down the pipe. This also allows us to consolidate
> code in the SCSI EH path.
>
[ ... ]
> @@ -1683,7 +1681,7 @@ static int srp_send_tsk_mgmt(struct srp_target_port *target,
>   	struct srp_iu *iu;
>   	struct srp_tsk_mgmt *tsk_mgmt;
>
> -	if (srp_is_removed(target))
> +	if (target->state)
>   		return -1;
>

Hi Dave,

After I posted the patch on which the above patch has been based I 
realized that testing the connection state at the start of 
srp_send_tsk_mgmt() is not sufficient to avoid QPN use-after-free. If a 
DREQ is received by the initiator after the above test has been 
performed and before the task management function has been sent it is 
still possible to send a task management function over a closed QP. I'd 
like to address this in a different way - see also the thread called 
"SCSI LLDs, the SCSI error handler and host resource lifetime" on the 
linux-scsi mailing list (November 20, 
http://marc.info/?t=135342155500003&r=1). Sorry for the confusion I caused.

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

David Dillow Nov. 27, 2012, 3:31 a.m. UTC | #2

On Mon, 2012-11-26 at 10:17 +0100, Bart Van Assche wrote:
> On 11/26/12 05:44, David Dillow wrote:
> > Once we know we have an issue with the QP, there is no point trying to
> > send anything else down the pipe. This also allows us to consolidate
> > code in the SCSI EH path.

> After I posted the patch on which the above patch has been based I 
> realized that testing the connection state at the start of 
> srp_send_tsk_mgmt() is not sufficient to avoid QPN use-after-free. If a 
> DREQ is received by the initiator after the above test has been 
> performed and before the task management function has been sent it is 
> still possible to send a task management function over a closed QP.

AFIACT, DREQ does not actually close the QP -- it only tells us that the
other side would like to. We don't actually close the connection until
we try to send on it again, I think -- not sure if we see recv failures
for the queued work items.

Regardless, the issue of resource lifetime is an issue that needs
solving.

>  I'd like to address this in a different way - see also the thread called 
> "SCSI LLDs, the SCSI error handler and host resource lifetime" on the 
> linux-scsi mailing list (November 20, 
> http://marc.info/?t=135342155500003&r=1).

I like the direction you propose there. It seems that scsi_remove_host()
at one point waited for the EH thread to exit -- or perhaps it was part
of scsi_host_put() chain -- as there's the longstanding deferral to the
work queue for the SRP target removal. Of course, that's been there for
~5 years now, and things have changed in the SCSI stack.

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 1e8ce81..2951e1c 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -1328,14 +1328,12 @@  static int srp_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmnd)
 	int len;
 
 	if (unlikely(target->state)) {
-		if (srp_is_disconnected(target))
+		if (!srp_is_removed(target))
 			goto err;
 
-		if (srp_is_removed(target)) {
-			scmnd->result = DID_BAD_TARGET << 16;
-			scmnd->scsi_done(scmnd);
-			return 0;
-		}
+		scmnd->result = DID_BAD_TARGET << 16;
+		scmnd->scsi_done(scmnd);
+		return 0;
 	}
 
 	spin_lock_irqsave(&target->lock, flags);
@@ -1683,7 +1681,7 @@  static int srp_send_tsk_mgmt(struct srp_target_port *target,
 	struct srp_iu *iu;
 	struct srp_tsk_mgmt *tsk_mgmt;
 
-	if (srp_is_removed(target))
+	if (target->state)
 		return -1;
 
 	init_completion(&target->tsk_mgmt_done);
@@ -1727,12 +1725,11 @@  static int srp_abort(struct scsi_cmnd *scmnd)
 
 	shost_printk(KERN_ERR, target->scsi_host, "SRP abort called\n");
 
-	if (srp_in_error(target))
-		return FAILED;
 	if (!req || !srp_claim_req(target, req, scmnd))
 		return FAILED;
-	srp_send_tsk_mgmt(target, req->index, scmnd->device->lun,
-			  SRP_TSK_ABORT_TASK);
+	if (srp_send_tsk_mgmt(target, req->index, scmnd->device->lun,
+			  SRP_TSK_ABORT_TASK))
+		return FAILED;
 	srp_free_req(target, req, scmnd, 0);
 	scmnd->result = DID_ABORT << 16;
 	scmnd->scsi_done(scmnd);
@@ -1747,8 +1744,6 @@  static int srp_reset_device(struct scsi_cmnd *scmnd)
 
 	shost_printk(KERN_ERR, target->scsi_host, "SRP reset_device called\n");
 
-	if (srp_in_error(target))
-		return FAILED;
 	if (srp_send_tsk_mgmt(target, SRP_TAG_NO_REQ, scmnd->device->lun,
 			      SRP_TSK_LUN_RESET))
 		return FAILED;

[03/11] IB/srp: don't send anything on a bad QP

Commit Message

Comments

Patch