[V3,4/4] RDMA/isert: Support iWARP transport

Message ID 20150701163058.6501.39171.stgit@build.ogc.int (mailing list archive)
State Superseded

Commit Message

Steve Wise July 1, 2015, 4:30 p.m. UTC
Memory regions that are the target of an iWARP RDMA READ RESPONSE need
REMOTE_WRITE access rights.  So enable REMOTE_WRITE for iWARP devices.

Use the device's max_sge_rd capability to compute the target's read sge
depth.  Save both the read and write max_sge values in the isert_conn
struct, and use these when creating RDMA_READ work requests.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
---
 drivers/infiniband/ulp/isert/ib_isert.c |   54 ++++++++++++++++++++++++++-----
 drivers/infiniband/ulp/isert/ib_isert.h |    3 +-
 2 files changed, 47 insertions(+), 10 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
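
As a rough illustration of the SGE accounting described in the commit message, the following user-space C sketch mirrors how the patch derives separate write and read SGE limits, and how many work requests a transfer then needs. The names dev_caps, sge_limits, compute_limits() and wr_count() are hypothetical stand-ins, not kernel symbols, and the numbers in the note below are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Round-up integer division, in the style of the kernel's DIV_ROUND_UP(). */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical stand-in for the device attributes the patch consults. */
struct dev_caps {
	uint32_t max_sge;    /* SGEs per send / RDMA WRITE work request */
	uint32_t max_sge_rd; /* SGEs per RDMA READ work request */
};

/* Hypothetical stand-in for the two fields added to struct isert_conn. */
struct sge_limits {
	uint32_t max_write_sge;
	uint32_t max_read_sge;
};

static uint32_t min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/*
 * Mirrors the logic in isert_create_qp(): the write limit is
 * max(2, max_sge - 2), and the read limit is additionally capped
 * by the device's max_sge_rd.
 */
static struct sge_limits compute_limits(const struct dev_caps *caps)
{
	struct sge_limits lim;

	/* Written this way to avoid unsigned underflow when max_sge < 2. */
	lim.max_write_sge = caps->max_sge > 4 ? caps->max_sge - 2 : 2;
	lim.max_read_sge = min_u32(caps->max_sge_rd, lim.max_write_sge);
	return lim;
}

/* Number of work requests needed to cover nents scatterlist entries. */
static uint32_t wr_count(uint32_t nents, uint32_t max_sge)
{
	assert(max_sge > 0);
	return DIV_ROUND_UP(nents, max_sge);
}
```

For example, with max_sge = 6 and max_sge_rd = 1, compute_limits() yields a write limit of 4 and a read limit of 1, so a 32-entry scatterlist needs 8 RDMA WRITE work requests but 32 RDMA READ work requests.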

Comments

Sagi Grimberg July 1, 2015, 5:16 p.m. UTC | #1
On 7/1/2015 7:30 PM, Steve Wise wrote:
> Memory regions that are the target of an iWARP RDMA READ RESPONSE need
> REMOTE_WRITE access rights.  So enable REMOTE_WRITE for iWARP devices.
>
> Use the device's max_sge_rd capability to compute the target's read sge
> depth.  Save both the read and write max_sge values in the isert_conn
> struct, and use these when creating RDMA_READ work requests
>
> Signed-off-by: Steve Wise <swise@opengridcomputing.com>
> ---
>   drivers/infiniband/ulp/isert/ib_isert.c |   54 ++++++++++++++++++++++++++-----
>   drivers/infiniband/ulp/isert/ib_isert.h |    3 +-
>   2 files changed, 47 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
> index 9e7b492..8334dd0 100644
> --- a/drivers/infiniband/ulp/isert/ib_isert.c
> +++ b/drivers/infiniband/ulp/isert/ib_isert.c
> @@ -163,7 +163,9 @@ isert_create_qp(struct isert_conn *isert_conn,
>   	 * outgoing control PDU responses.
>   	 */
>   	attr.cap.max_send_sge = max(2, device->dev_attr.max_sge - 2);
> -	isert_conn->max_sge = attr.cap.max_send_sge;
> +	isert_conn->max_write_sge = attr.cap.max_send_sge;
> +	isert_conn->max_read_sge = min_t(u32, device->dev_attr.max_sge_rd,
> +					 attr.cap.max_send_sge);
>
>   	attr.cap.max_recv_sge = 1;
>   	attr.sq_sig_type = IB_SIGNAL_REQ_WR;
> @@ -348,6 +350,17 @@ out_cq:
>   	return ret;
>   }
>
> +static int any_port_is_iwarp(struct isert_device *device)
> +{
> +	int i;
> +
> +	for (i = rdma_start_port(device->ib_device);
> +	     i <= rdma_end_port(device->ib_device); i++)
> +		if (rdma_protocol_iwarp(device->ib_device, i))
> +			return 1;
> +	return 0;
> +}
> +

Let's get rid of that as soon as possible...

However,
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>


Nic,

I think it makes sense for this to go via Doug's tree.
Any objection?
Or Gerlitz July 1, 2015, 8:33 p.m. UTC | #2
On Wed, Jul 1, 2015 at 7:30 PM, Steve Wise <swise@opengridcomputing.com> wrote:
> Memory regions that are the target of an iWARP RDMA READ RESPONSE need
> REMOTE_WRITE access rights.  So enable REMOTE_WRITE for iWARP devices.

I don't see the point of cluttering the code with branches on the
iWARP-specific differences from IB/RoCE. Note that the MRs whose
access flags you change are LOCAL and not advertised to any external
entity (the initiator), so why not just OR into the access flags what
iWARP needs?!

> Use the device's max_sge_rd capability to compute the target's read sge
> depth.  Save both the read and write max_sge values in the isert_conn
> struct, and use these when creating RDMA_READ work requests

These are two related but distinct changes; split this into
two patches.

> +static int any_port_is_iwarp(struct isert_device *device)
> +{
> +       int i;
> +
> +       for (i = rdma_start_port(device->ib_device);
> +            i <= rdma_end_port(device->ib_device); i++)
> +               if (rdma_protocol_iwarp(device->ib_device, i))
> +                       return 1;
> +       return 0;
> +}
> +

If this is needed at all, put it in one of the IB core headers, not in a ULP source file.
Steve Wise July 1, 2015, 8:53 p.m. UTC | #3
> -----Original Message-----
> From: Or Gerlitz [mailto:gerlitz.or@gmail.com]
> Sent: Wednesday, July 01, 2015 3:33 PM
> To: Steve Wise
> Cc: Doug Ledford; Roi Dayan; linux-rdma@vger.kernel.org; Sagi Grimberg; Mike Marciniszyn; target-devel@vger.kernel.org; Eli Cohen; Or
> Gerlitz
> Subject: Re: [PATCH V3 4/4] RDMA/isert: Support iWARP transport
> 
> On Wed, Jul 1, 2015 at 7:30 PM, Steve Wise <swise@opengridcomputing.com> wrote:
> > Memory regions that are the target of an iWARP RDMA READ RESPONSE need
> > REMOTE_WRITE access rights.  So enable REMOTE_WRITE for iWARP devices.
> 
> I don't see the point of cluttering the code with branches on the
> iWARP-specific differences from IB/RoCE. Note that the MRs whose
> access flags you change are LOCAL and not advertised to any external
> entity (the initiator), so why not just OR into the access flags what
> iWARP needs?!

Yes, the MR is a local MR, but on iWARP, unlike IB, it is used for REMOTE access.  I think the reason is that in iWARP there is no distinction between local and remote keys.  For iWARP you get one key, called a Steering Tag or STAG, that is used both locally and advertised to the peer (if it is to be used for REMOTE IO).  Further, that STAG is sent to the peer in the RDMA READ REQUEST, and the peer iWARP stack uses it to generate READ RESPONSE messages with the advertised STAG as the READ DESTINATION.  Thus these STAGs require REMOTE_WRITE access flags.  In IB, I believe the "key" sent in the READ REQUEST is not the MR lkey or rkey at all, but a one-shot transaction key, valid for that READ operation only, and the local IB stack uses this key to map to the destination MR/lkey when processing the RDMA READ RESPONSE.  This difference in the protocols is what drives the access flag difference.

Regardless, I'm not sure what you propose I do differently?  The code in this patch does OR the needed access flag if the device is iWARP when creating the DMA_MR.

> 
> > Use the device's max_sge_rd capability to compute the target's read sge
> > depth.  Save both the read and write max_sge values in the isert_conn
> > struct, and use these when creating RDMA_READ work requests
> 
> These are two related but distinct changes; split this into
> two patches.
> 

Easy enough.

> > +static int any_port_is_iwarp(struct isert_device *device)
> > +{
> > +       int i;
> > +
> > +       for (i = rdma_start_port(device->ib_device);
> > +            i <= rdma_end_port(device->ib_device); i++)
> > +               if (rdma_protocol_iwarp(device->ib_device, i))
> > +                       return 1;
> > +       return 0;
> > +}
> > +
> 
> If this is needed at all, put it in one of the IB core headers, not in a ULP source file.

As per the consensus on V2 of the series, the core changes to remove any_port_is_iwarp() will be in a subsequent series so we can carefully review the new proposed API and see how it affects the apps that use it (iSER and NFSRDMA).  I'll be posting this 2nd series soon.

Thanks!

Steve.

Or Gerlitz July 1, 2015, 9:03 p.m. UTC | #4
On Wed, Jul 1, 2015 at 11:53 PM, Steve Wise <swise@opengridcomputing.com> wrote:
>> From: Or Gerlitz [mailto:gerlitz.or@gmail.com]

> Yes, the MR is a local MR, but it is used for REMOTE access for iWARP, but not IB.  I think the reason is that in iWARP there is no distinction between local and remote keys.  For iWARP you get 1 key called a Steering Tag or STAG that is used both locally and advertised to the peer (if to be used for REMOTE IO).  Further, that STAG is sent to the peer in the RDMA READ REQUEST and the peer iWARP stack uses it to generate READ RESPONSE messages with the advertised STAG as the READ DESTINATION.  And thus these STAGs require REMOTE_WRITE access flags.  In IB, I believe the "key" sent in the READ REQUEST is not the MR lkey or rkey at all, but a one-shot transaction key, valid for that READ operation only, and the local IB stack uses this key to map to the destination MR/lkey when processing the RDMA READ RESPONSE.  This difference in the protocols is what drives the access flag difference.


Since in IB/RoCE the key sent on the wire isn't actually something
that can be used as an rkey by the peer, we can safely always OR in the
extra access flags and not worry about which transport is used.


> Regardless, I'm not sure what you propose I do differently?  The code in this patch does OR the needed access flag if the device is iWARP when creating the DMA_MR.

So always OR and don't branch

Or.
Sagi Grimberg July 2, 2015, 6:28 a.m. UTC | #5
On 7/2/2015 12:03 AM, Or Gerlitz wrote:
> On Wed, Jul 1, 2015 at 11:53 PM, Steve Wise <swise@opengridcomputing.com> wrote:
>>> From: Or Gerlitz [mailto:gerlitz.or@gmail.com]
>
>> Yes, the MR is a local MR, but it is used for REMOTE access for iWARP, but not IB.  I think the reason is that in iWARP there is no distinction between local and remote keys.  For iWARP you get 1 key called a Steering Tag or STAG that is used both locally and advertised to the peer (if to be used for REMOTE IO).  Further, that STAG is sent to the peer in the RDMA READ REQUEST and the peer iWARP stack uses it to generate READ RESPONSE messages with the advertised STAG as the READ DESTINATION.  And thus these STAGs require REMOTE_WRITE access flags.  In IB, I believe the "key" sent in the READ REQUEST is not the MR lkey or rkey at all, but a one-shot transaction key, valid for that READ operation only, and the local IB stack uses this key to map to the destination MR/lkey when processing the RDMA READ RESPONSE.  This difference in the protocols is what drives the access flag difference.
>
>
> Since in IB/RoCE the key sent on the wire isn't actually something
> that can be used as an rkey by the peer, we can safely always OR in the
> extra access flags and not worry about which transport is used.
>
>
>> Regardless, I'm not sure what you propose I do differently?  The code in this patch does OR the needed access flag if the device is iWARP when creating the DMA_MR.
>
> So always OR and don't branch

Or has a good point.
The DMA mkey in target mode is discrete and not sent to any peer.

Sagi.
Steve Wise July 2, 2015, 1:17 p.m. UTC | #6
On 7/2/2015 1:28 AM, Sagi Grimberg wrote:
> On 7/2/2015 12:03 AM, Or Gerlitz wrote:
>> On Wed, Jul 1, 2015 at 11:53 PM, Steve Wise 
>> <swise@opengridcomputing.com> wrote:
>>>> From: Or Gerlitz [mailto:gerlitz.or@gmail.com]
>>
>>> Yes, the MR is a local MR, but it is used for REMOTE access for 
>>> iWARP, but not IB.  I think the reason is that in iWARP there is no
>>> distinction between local and remote keys.  For iWARP you get 1 key
>>> called a Steering Tag or STAG that is used both locally and 
>>> advertised to the peer (if to be used for REMOTE IO).  Further, that 
>>> STAG is sent to the peer in the RDMA READ REQUEST and the peer iWARP 
>>> stack uses it to generate READ RESPONSE messages with the advertised 
>>> STAG as the READ DESTINATION.  And thus these STAGs require 
>>> REMOTE_WRITE access flags.  In IB, I believe the "key" sent in the 
>>> READ REQUEST is not the MR lkey or rkey at all, but a one-shot 
>>> transaction key, valid for that READ operation only, and the local 
>>> IB stack uses this key to map to the destination MR/lkey when 
>>> processing the RDMA READ RESPONSE. This difference in the protocols 
>>> is what drives the access flag difference.
>>
>>
>> Since in IB/RoCE the key sent on the wire isn't actually something
>> that can be used as an rkey by the peer, we can safely always OR in the
>> extra access flags and not worry about which transport is used.
>>
>>
>>> Regardless, I'm not sure what you propose I do differently?  The 
>>> code in this patch does OR the needed access flag if the device is 
>>> iWARP when creating the DMA_MR.
>>
>> So always OR and don't branch
>
> Or has a good point.
> The DMA mkey in target mode is discrete and not sent to any peer.
>

Yup.  I agree.

Jason Gunthorpe July 2, 2015, 4:39 p.m. UTC | #7
On Thu, Jul 02, 2015 at 09:28:46AM +0300, Sagi Grimberg wrote:

> Or has a good point.
> The DMA mkey in target mode is discrete and not sent to any peer.

That doesn't mean the peer cannot guess it.

Using the right permission is clearly a stronger protection; we
shouldn't weaken IB just to accommodate iWARP's limitations.

Jason
Sagi Grimberg July 4, 2015, 10:51 a.m. UTC | #8
On 7/2/2015 7:39 PM, Jason Gunthorpe wrote:
> On Thu, Jul 02, 2015 at 09:28:46AM +0300, Sagi Grimberg wrote:
>
>> Or has a good point.
>> The DMA mkey in target mode is discrete and not sent to any peer.
>
> That doesn't mean the peer cannot guess it.
>
> Using the right permission is clearly a stronger protection; we
> shouldn't weaken IB just to accommodate iWARP's limitations.
>

Can't argue with that.

Sorry Steve for the hassle...
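
To make the trade-off discussed in this thread concrete, here is a minimal user-space C sketch of the per-transport choice the patch makes: REMOTE_WRITE is ORed in only when some port runs iWARP, so IB/RoCE-only devices keep the narrower LOCAL_WRITE permission. The enum values and the dma_mr_access() helper are hypothetical stand-ins for illustration and are not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical access flags; the values are illustrative only. */
enum access_flags {
	ACCESS_LOCAL_WRITE  = 1u << 0,
	ACCESS_REMOTE_WRITE = 1u << 1,
};

/* Hypothetical per-port transport protocol. */
enum port_protocol { PROTO_IB, PROTO_ROCE, PROTO_IWARP };

/* Same idea as the patch's any_port_is_iwarp(): scan every port. */
static bool any_port_is_iwarp(const enum port_protocol *ports, size_t nports)
{
	for (size_t i = 0; i < nports; i++)
		if (ports[i] == PROTO_IWARP)
			return true;
	return false;
}

/*
 * Access flags for a DMA MR shared by all ports of a device.
 * REMOTE_WRITE is added only if at least one port is iWARP, because
 * an iWARP RDMA READ RESPONSE targets the MR through its STag.
 */
static unsigned int dma_mr_access(const enum port_protocol *ports,
				  size_t nports)
{
	unsigned int flags = ACCESS_LOCAL_WRITE;

	assert(ports != NULL || nports == 0);
	if (any_port_is_iwarp(ports, nports))
		flags |= ACCESS_REMOTE_WRITE;
	return flags;
}
```

The "always OR" alternative would drop the branch and return both flags unconditionally, which is simpler but grants remote write permission on transports that do not need it.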

Patch

diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
index 9e7b492..8334dd0 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -163,7 +163,9 @@  isert_create_qp(struct isert_conn *isert_conn,
 	 * outgoing control PDU responses.
 	 */
 	attr.cap.max_send_sge = max(2, device->dev_attr.max_sge - 2);
-	isert_conn->max_sge = attr.cap.max_send_sge;
+	isert_conn->max_write_sge = attr.cap.max_send_sge;
+	isert_conn->max_read_sge = min_t(u32, device->dev_attr.max_sge_rd,
+					 attr.cap.max_send_sge);
 
 	attr.cap.max_recv_sge = 1;
 	attr.sq_sig_type = IB_SIGNAL_REQ_WR;
@@ -348,6 +350,17 @@  out_cq:
 	return ret;
 }
 
+static int any_port_is_iwarp(struct isert_device *device)
+{
+	int i;
+
+	for (i = rdma_start_port(device->ib_device);
+	     i <= rdma_end_port(device->ib_device); i++)
+		if (rdma_protocol_iwarp(device->ib_device, i))
+			return 1;
+	return 0;
+}
+
 static int
 isert_create_device_ib_res(struct isert_device *device)
 {
@@ -383,7 +396,17 @@  isert_create_device_ib_res(struct isert_device *device)
 		goto out_cq;
 	}
 
-	device->mr = ib_get_dma_mr(device->pd, IB_ACCESS_LOCAL_WRITE);
+	/*
+	 * IWARP transports need REMOTE_WRITE for MRs used as the target of
+	 * an RDMA_READ.  Since the DMA MR is used for all ports, then if
+	 * any port is running IWARP, add REMOTE_WRITE.
+	 */
+	if (any_port_is_iwarp(device))
+		device->mr = ib_get_dma_mr(device->pd, IB_ACCESS_LOCAL_WRITE |
+						       IB_ACCESS_REMOTE_WRITE);
+	else
+		device->mr = ib_get_dma_mr(device->pd, IB_ACCESS_LOCAL_WRITE);
+
 	if (IS_ERR(device->mr)) {
 		ret = PTR_ERR(device->mr);
 		isert_err("failed to create dma mr, device %p, ret=%d\n",
@@ -2375,7 +2398,7 @@  isert_put_text_rsp(struct iscsi_cmd *cmd, struct iscsi_conn *conn)
 static int
 isert_build_rdma_wr(struct isert_conn *isert_conn, struct isert_cmd *isert_cmd,
 		    struct ib_sge *ib_sge, struct ib_send_wr *send_wr,
-		    u32 data_left, u32 offset)
+		    u32 data_left, u32 offset, u32 max_sge)
 {
 	struct iscsi_cmd *cmd = isert_cmd->iscsi_cmd;
 	struct scatterlist *sg_start, *tmp_sg;
@@ -2386,7 +2409,7 @@  isert_build_rdma_wr(struct isert_conn *isert_conn, struct isert_cmd *isert_cmd,
 
 	sg_off = offset / PAGE_SIZE;
 	sg_start = &cmd->se_cmd.t_data_sg[sg_off];
-	sg_nents = min(cmd->se_cmd.t_data_nents - sg_off, isert_conn->max_sge);
+	sg_nents = min(cmd->se_cmd.t_data_nents - sg_off, max_sge);
 	page_off = offset % PAGE_SIZE;
 
 	send_wr->sg_list = ib_sge;
@@ -2430,8 +2453,9 @@  isert_map_rdma(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
 	struct isert_data_buf *data = &wr->data;
 	struct ib_send_wr *send_wr;
 	struct ib_sge *ib_sge;
-	u32 offset, data_len, data_left, rdma_write_max, va_offset = 0;
+	u32 offset, data_len, data_left, rdma_max_len, va_offset = 0;
 	int ret = 0, i, ib_sge_cnt;
+	u32 max_sge;
 
 	isert_cmd->tx_desc.isert_cmd = isert_cmd;
 
@@ -2453,7 +2477,12 @@  isert_map_rdma(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
 	}
 	wr->ib_sge = ib_sge;
 
-	wr->send_wr_num = DIV_ROUND_UP(data->nents, isert_conn->max_sge);
+	if (wr->iser_ib_op == ISER_IB_RDMA_WRITE)
+		max_sge = isert_conn->max_write_sge;
+	else
+		max_sge = isert_conn->max_read_sge;
+
+	wr->send_wr_num = DIV_ROUND_UP(data->nents, max_sge);
 	wr->send_wr = kzalloc(sizeof(struct ib_send_wr) * wr->send_wr_num,
 				GFP_KERNEL);
 	if (!wr->send_wr) {
@@ -2463,11 +2492,11 @@  isert_map_rdma(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
 	}
 
 	wr->isert_cmd = isert_cmd;
-	rdma_write_max = isert_conn->max_sge * PAGE_SIZE;
+	rdma_max_len = max_sge * PAGE_SIZE;
 
 	for (i = 0; i < wr->send_wr_num; i++) {
 		send_wr = &isert_cmd->rdma_wr.send_wr[i];
-		data_len = min(data_left, rdma_write_max);
+		data_len = min(data_left, rdma_max_len);
 
 		send_wr->send_flags = 0;
 		if (wr->iser_ib_op == ISER_IB_RDMA_WRITE) {
@@ -2489,7 +2518,7 @@  isert_map_rdma(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
 		}
 
 		ib_sge_cnt = isert_build_rdma_wr(isert_conn, isert_cmd, ib_sge,
-					send_wr, data_len, offset);
+					send_wr, data_len, offset, max_sge);
 		ib_sge += ib_sge_cnt;
 
 		offset += data_len;
@@ -2618,6 +2647,13 @@  isert_fast_reg_mr(struct isert_conn *isert_conn,
 	fr_wr.wr.fast_reg.rkey = mr->rkey;
 	fr_wr.wr.fast_reg.access_flags = IB_ACCESS_LOCAL_WRITE;
 
+	/*
+	 * IWARP transports need REMOTE_WRITE for MRs used as the target of
+	 * an RDMA_READ.
+	 */
+	if (rdma_protocol_iwarp(ib_dev, isert_conn->cm_id->port_num))
+		fr_wr.wr.fast_reg.access_flags |= IB_ACCESS_REMOTE_WRITE;
+
 	if (!wr)
 		wr = &fr_wr;
 	else
diff --git a/drivers/infiniband/ulp/isert/ib_isert.h b/drivers/infiniband/ulp/isert/ib_isert.h
index 9ec23a7..29fde27 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.h
+++ b/drivers/infiniband/ulp/isert/ib_isert.h
@@ -152,7 +152,8 @@  struct isert_conn {
 	u32			responder_resources;
 	u32			initiator_depth;
 	bool			pi_support;
-	u32			max_sge;
+	u32			max_write_sge;
+	u32			max_read_sge;
 	char			*login_buf;
 	char			*login_req_buf;
 	char			*login_rsp_buf;