[4/7] IB/srp: Fix a potential queue overflow in an error path

Message ID: 562FF484.6030400@sandisk.com (mailing list archive)
State: Superseded

Commit Message

Bart Van Assche Oct. 27, 2015, 10:02 p.m. UTC
Wait until memory registration has finished in the srp_queuecommand()
error path before invalidating memory regions to avoid a send queue
overflow.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
---
 drivers/infiniband/ulp/srp/ib_srp.c | 41 ++++++++++++++++++++++++++++++-------
 1 file changed, 34 insertions(+), 7 deletions(-)

Comments

Sagi Grimberg Nov. 3, 2015, 5:36 p.m. UTC | #1
On 28/10/2015 00:02, Bart Van Assche wrote:
> Wait until memory registration has finished in the srp_queuecommand()
> error path before invalidating memory regions to avoid a send queue
> overflow.

This looks backwards to me... Why do we even post anything on our
queue-pair to begin with if we got an unsupported sg list?

Can't we perform a simple sanity check on the sg list instead?
Bart Van Assche Nov. 3, 2015, 7:04 p.m. UTC | #2
On 11/03/2015 09:36 AM, Sagi Grimberg wrote:
> On 28/10/2015 00:02, Bart Van Assche wrote:
>> Wait until memory registration has finished in the srp_queuecommand()
>> error path before invalidating memory regions to avoid a send queue
>> overflow.
>
> This looks backwards to me... Why do we even post anything on our
> queue-pair to begin with if we got an unsupported sg list?
>
> Can't we perform a simple sanity check on the sg list instead?

Hello Sagi,

There is one memory descriptor pool per RDMA channel and RDMA channels 
are typically used by more than one CPU. This means that memory 
registration for more than one command can happen concurrently from a 
single memory descriptor pool. This is why checking how many memory 
descriptors are left before memory registration occurs wouldn't be 
sufficient.
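
To illustrate the race (a minimal sketch; srp_descs_available() and
srp_register_all() are hypothetical helpers, not driver functions):

static int srp_map_data_racy(struct srp_rdma_ch *ch, int descs_needed)
{
	/* Step 1: look at the pool that all CPUs share. */
	if (srp_descs_available(ch->fr_pool) < descs_needed)
		return -ENOMEM;

	/*
	 * Another CPU can take descriptors from the same pool between
	 * step 1 and step 2, so the check is already stale and step 2
	 * can still run out of descriptors halfway through, after some
	 * work requests have been posted to the send queue.
	 */
	return srp_register_all(ch, descs_needed);	/* step 2 */
}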

Bart.
Sagi Grimberg Nov. 3, 2015, 7:56 p.m. UTC | #3
On 03/11/2015 21:04, Bart Van Assche wrote:
> On 11/03/2015 09:36 AM, Sagi Grimberg wrote:
>> On 28/10/2015 00:02, Bart Van Assche wrote:
>>> Wait until memory registration has finished in the srp_queuecommand()
>>> error path before invalidating memory regions to avoid a send queue
>>> overflow.
>>
>> This looks backwards to me... Why do we even post anything on our
>> queue-pair to begin with if we got an unsupported sg list?
>>
>> Can't we perform a simple sanity check on the sg list instead?
>
> Hello Sagi,
>
> There is one memory descriptor pool per RDMA channel and RDMA channels
> are typically used by more than one CPU. This means that memory
> registration for more than one command can happen concurrently from a
> single memory descriptor pool.
> This is why checking how many memory
> descriptors are left before memory registration occurs wouldn't be
> sufficient.

How much harm would it do to make the entire SG list mapping
channel-wide atomic? I'm asking because it's really a non-trivial flow
you're introducing.
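
For concreteness, such serialization could look roughly like this (a
sketch only; ch->map_lock is a made-up field and srp_map_data() is the
driver's existing mapping routine):

static int srp_map_data_serialized(struct scsi_cmnd *scmnd,
				   struct srp_rdma_ch *ch,
				   struct srp_request *req)
{
	unsigned long flags;
	int ret;

	/*
	 * Every CPU that queues commands on this channel now contends
	 * on one lock for the duration of the whole mapping.
	 */
	spin_lock_irqsave(&ch->map_lock, flags);
	ret = srp_map_data(scmnd, ch, req);
	spin_unlock_irqrestore(&ch->map_lock, flags);

	return ret;
}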

Sagi.
Bart Van Assche Nov. 3, 2015, 8:01 p.m. UTC | #4
On 11/03/2015 11:56 AM, Sagi Grimberg wrote:
> On 03/11/2015 21:04, Bart Van Assche wrote:
>> On 11/03/2015 09:36 AM, Sagi Grimberg wrote:
>>> On 28/10/2015 00:02, Bart Van Assche wrote:
>>>> Wait until memory registration has finished in the srp_queuecommand()
>>>> error path before invalidating memory regions to avoid a send queue
>>>> overflow.
>>>
>>> This looks backwards to me... Why do we even post anything on our
>>> queue-pair to begin with if we got an unsupported sg list?
>>>
>>> Can't we perform a simple sanity check on the sg list instead?
>>
>> Hello Sagi,
>>
>> There is one memory descriptor pool per RDMA channel and RDMA channels
>> are typically used by more than one CPU. This means that memory
>> registration for more than one command can happen concurrently from a
>> single memory descriptor pool.
>> This is why checking how many memory
>> descriptors are left before memory registration occurs wouldn't be
>> sufficient.
>
> How much harm would it do to make the entire SG list mapping
> channel-wide atomic? I'm asking because it's really a non-trivial flow
> you're introducing.

Hello Sagi,

Sorry but I strongly prefer not to introduce new contention points in 
the SRP initiator driver.

Thanks,

Bart.
Sagi Grimberg Nov. 3, 2015, 8:13 p.m. UTC | #5
> Hello Sagi,
>
> Sorry but I strongly prefer not to introduce new contention points in
> the SRP initiator driver.

Yea... Me neither.

What if you just have a simple check on the SG list and reserve the
required descriptors up front? Would that work?
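
For illustration, an up-front reservation could look roughly like this
(a sketch only; the real srp_fr_pool keeps its descriptors on a
spinlock-protected free list, and the free_count field is made up):

static int srp_fr_pool_reserve(struct srp_fr_pool *pool, int n)
{
	unsigned long flags;
	int ret = 0;

	spin_lock_irqsave(&pool->lock, flags);
	if (pool->free_count < n)
		ret = -ENOMEM;		/* fail before posting anything */
	else
		pool->free_count -= n;	/* claim the descriptors */
	spin_unlock_irqrestore(&pool->lock, flags);

	return ret;
}

The catch, as the reply below points out, is computing n before the
mapping has actually been performed.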
Bart Van Assche Nov. 3, 2015, 8:50 p.m. UTC | #6
On 11/03/2015 12:13 PM, Sagi Grimberg wrote:
>> Sorry but I strongly prefer not to introduce new contention points in
>> the SRP initiator driver.
>
> Yea... Me neither.
>
> What if you just have a simple check on the SG list and reserve the
> required descriptors up front? Would that work?

Such a check wouldn't be that simple because the only way to perform 
such a check is either by doubling the number of ib_map_mr_sg() calls or 
by performing additional memory allocations.

Bart.


Christoph Hellwig Nov. 4, 2015, 4:03 a.m. UTC | #7
On Tue, Nov 03, 2015 at 12:50:59PM -0800, Bart Van Assche wrote:
> Such a check wouldn't be that simple because the only way to perform such a
> check is either by doubling the number of ib_map_mr_sg() calls or by
> performing additional memory allocations.

The other option would be to disallow gappy SG lists unless the
hardware can handle them with a single MR, similar to iSER.  While this
would leave the SRP protocol's support for multiple SG entries per
command unused, it would significantly simplify the driver.
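
As a rough sketch of such a check (illustrative only, not the actual
iSER or SRP code), an SG list is "gappy" when an element other than the
first starts off a page boundary, or one other than the last ends off
one:

static bool srp_sg_has_gaps(struct scatterlist *sgl, int nents)
{
	struct scatterlist *sg;
	int i;

	for_each_sg(sgl, sg, nents, i) {
		/* Every element but the first must start on a page
		 * boundary... */
		if (i > 0 && sg->offset)
			return true;
		/* ...and every element but the last must end on one. */
		if (i < nents - 1 &&
		    (sg->offset + sg->length) & ~PAGE_MASK)
			return true;
	}
	return false;
}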
Bart Van Assche Nov. 4, 2015, 10:19 p.m. UTC | #8
On 11/03/2015 08:03 PM, Christoph Hellwig wrote:
> On Tue, Nov 03, 2015 at 12:50:59PM -0800, Bart Van Assche wrote:
>> Such a check wouldn't be that simple because the only way to perform such a
>> check is either by doubling the number of ib_map_mr_sg() calls or by
>> performing additional memory allocations.
>
> The other option would be to disallow gappy SG lists unless the
> hardware can handle them with a single MR, similar to iSER.  While this
> would leave the SRP protocol's support for multiple SG entries per
> command unused, it would significantly simplify the driver.

Hello Christoph,

I will look into this after the kernel 4.4 merge window has closed.

Bart.

Patch

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 1aa9a4c..6d17fe2 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -1044,7 +1044,7 @@  static int srp_connect_ch(struct srp_rdma_ch *ch, bool multich)
 	}
 }
 
-static int srp_inv_rkey(struct srp_rdma_ch *ch, u32 rkey)
+static int srp_inv_rkey(struct srp_rdma_ch *ch, u32 rkey, u32 send_flags)
 {
 	struct ib_send_wr *bad_wr;
 	struct ib_send_wr wr = {
@@ -1052,16 +1052,32 @@  static int srp_inv_rkey(struct srp_rdma_ch *ch, u32 rkey)
 		.wr_id		    = LOCAL_INV_WR_ID,
 		.next		    = NULL,
 		.num_sge	    = 0,
-		.send_flags	    = 0,
+		.send_flags	    = send_flags,
 		.ex.invalidate_rkey = rkey,
 	};
 
 	return ib_post_send(ch->qp, &wr, &bad_wr);
 }
 
+static bool srp_wait_until_done(struct srp_rdma_ch *ch, int i, long timeout)
+{
+	WARN_ON_ONCE(timeout <= 0);
+
+	for ( ; i > 0; i--) {
+		spin_lock_irq(&ch->lock);
+		srp_send_completion(ch->send_cq, ch);
+		spin_unlock_irq(&ch->lock);
+
+		if (wait_for_completion_timeout(&ch->done, timeout) > 0)
+			return true;
+	}
+	return false;
+}
+
 static void srp_unmap_data(struct scsi_cmnd *scmnd,
 			   struct srp_rdma_ch *ch,
-			   struct srp_request *req)
+			   struct srp_request *req,
+			   bool wait_for_first_unmap)
 {
 	struct srp_target_port *target = ch->target;
 	struct srp_device *dev = target->srp_host->srp_dev;
@@ -1077,13 +1093,19 @@  static void srp_unmap_data(struct scsi_cmnd *scmnd,
 		struct srp_fr_desc **pfr;
 
 		for (i = req->nmdesc, pfr = req->fr_list; i > 0; i--, pfr++) {
-			res = srp_inv_rkey(ch, (*pfr)->mr->rkey);
+			res = srp_inv_rkey(ch, (*pfr)->mr->rkey,
+					   wait_for_first_unmap ?
+					   IB_SEND_SIGNALED : 0);
 			if (res < 0) {
 				shost_printk(KERN_ERR, target->scsi_host, PFX
 				  "Queueing INV WR for rkey %#x failed (%d)\n",
 				  (*pfr)->mr->rkey, res);
 				queue_work(system_long_wq,
 					   &target->tl_err_work);
+			} else if (wait_for_first_unmap) {
+				wait_for_first_unmap = false;
+				WARN_ON_ONCE(!srp_wait_until_done(ch, 10,
+							msecs_to_jiffies(100)));
 			}
 		}
 		if (req->nmdesc)
@@ -1144,7 +1166,7 @@  static void srp_free_req(struct srp_rdma_ch *ch, struct srp_request *req,
 {
 	unsigned long flags;
 
-	srp_unmap_data(scmnd, ch, req);
+	srp_unmap_data(scmnd, ch, req, false);
 
 	spin_lock_irqsave(&ch->lock, flags);
 	ch->req_lim += req_lim_delta;
@@ -1982,7 +2004,12 @@  static void srp_send_completion(struct ib_cq *cq, void *ch_ptr)
 	struct srp_iu *iu;
 
 	while (ib_poll_cq(cq, 1, &wc) > 0) {
-		if (likely(wc.status == IB_WC_SUCCESS)) {
+		if (unlikely(wc.wr_id == LOCAL_INV_WR_ID)) {
+			complete(&ch->done);
+			if (wc.status != IB_WC_SUCCESS)
+				srp_handle_qp_err(wc.wr_id, wc.status, true,
+						  ch);
+		} else if (likely(wc.status == IB_WC_SUCCESS)) {
 			iu = (struct srp_iu *) (uintptr_t) wc.wr_id;
 			list_add(&iu->list, &ch->free_tx);
 		} else {
@@ -2084,7 +2111,7 @@  unlock_rport:
 	return ret;
 
 err_unmap:
-	srp_unmap_data(scmnd, ch, req);
+	srp_unmap_data(scmnd, ch, req, true);
 
 err_iu:
 	srp_put_tx_iu(ch, iu, SRP_IU_CMD);