[v4,2/5] IB/core, RDMA RW API: Do not exceed QP SGE send limit

Message ID c71ece34-36e2-86da-5032-2fc946ff0073@sandisk.com (mailing list archive)
State Accepted

Commit Message

Bart Van Assche July 21, 2016, 8:03 p.m. UTC
Compute the SGE limit for RDMA READ and WRITE requests in
ib_create_qp(). Use that limit in the RDMA RW API implementation.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Steve Wise <swise@opengridcomputing.com>
Cc: Parav Pandit <pandit.parav@gmail.com>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Cc: Laurence Oberman <loberman@redhat.com>
Cc: <stable@vger.kernel.org> #v4.7+
---
 drivers/infiniband/core/rw.c    | 10 ++--------
 drivers/infiniband/core/verbs.c |  9 +++++++++
 include/rdma/ib_verbs.h         |  6 ++++++
 3 files changed, 17 insertions(+), 8 deletions(-)
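
A minimal sketch, assuming only the patched <rdma/ib_verbs.h>, of the
lookup this change moves onto the QP; the helper name is illustrative,
and rdma_rw_init_map_wrs() in the patch below open-codes the same
ternary:

#include <linux/dma-direction.h>
#include <rdma/ib_verbs.h>

/* Illustrative helper: pick the per-QP SGE limit for one direction. */
static u32 example_rw_sge_limit(struct ib_qp *qp, enum dma_data_direction dir)
{
	/* WRITEs use the send SGE limit; READs are capped by max_sge_rd. */
	return dir == DMA_TO_DEVICE ? qp->max_write_sge : qp->max_read_sge;
}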

Comments

Leon Romanovsky July 27, 2016, 12:42 p.m. UTC | #1
On Thu, Jul 21, 2016 at 01:03:30PM -0700, Bart Van Assche wrote:
> Compute the SGE limit for RDMA READ and WRITE requests in
> ib_create_qp(). Use that limit in the RDMA RW API implementation.
> 
> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Sagi Grimberg <sagi@grimberg.me>
> Cc: Steve Wise <swise@opengridcomputing.com>
> Cc: Parav Pandit <pandit.parav@gmail.com>
> Cc: Nicholas Bellinger <nab@linux-iscsi.org>
> Cc: Laurence Oberman <loberman@redhat.com>
> Cc: <stable@vger.kernel.org> #v4.7+
> ---
>  drivers/infiniband/core/rw.c    | 10 ++--------
>  drivers/infiniband/core/verbs.c |  9 +++++++++
>  include/rdma/ib_verbs.h         |  6 ++++++
>  3 files changed, 17 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/infiniband/core/rw.c b/drivers/infiniband/core/rw.c
> index 1ad2baa..dbfd854 100644
> --- a/drivers/infiniband/core/rw.c
> +++ b/drivers/infiniband/core/rw.c
> @@ -58,13 +58,6 @@ static inline bool rdma_rw_io_needs_mr(struct ib_device *dev, u8 port_num,
>  	return false;
>  }
>  
> -static inline u32 rdma_rw_max_sge(struct ib_device *dev,
> -		enum dma_data_direction dir)
> -{
> -	return dir == DMA_TO_DEVICE ?
> -		dev->attrs.max_sge : dev->attrs.max_sge_rd;
> -}
> -
>  static inline u32 rdma_rw_fr_page_list_len(struct ib_device *dev)
>  {
>  	/* arbitrary limit to avoid allocating gigantic resources */
> @@ -186,7 +179,8 @@ static int rdma_rw_init_map_wrs(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
>  		u64 remote_addr, u32 rkey, enum dma_data_direction dir)
>  {
>  	struct ib_device *dev = qp->pd->device;
> -	u32 max_sge = rdma_rw_max_sge(dev, dir);
> +	u32 max_sge = dir == DMA_TO_DEVICE ? qp->max_write_sge :
> +		      qp->max_read_sge;

Bart,
I'm sure that I missed something.

Can "dir" be DMA_BIDIRECTIONAL?
If yes, max_sge should be min(max_write_sge, max_read_sge).
Bart Van Assche July 27, 2016, 3:24 p.m. UTC | #2
On 07/27/2016 05:42 AM, Leon Romanovsky wrote:
> On Thu, Jul 21, 2016 at 01:03:30PM -0700, Bart Van Assche wrote:
>>  static inline u32 rdma_rw_fr_page_list_len(struct ib_device *dev)
>>  {
>>  	/* arbitrary limit to avoid allocating gigantic resources */
>> @@ -186,7 +179,8 @@ static int rdma_rw_init_map_wrs(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
>>  		u64 remote_addr, u32 rkey, enum dma_data_direction dir)
>>  {
>>  	struct ib_device *dev = qp->pd->device;
>> -	u32 max_sge = rdma_rw_max_sge(dev, dir);
>> +	u32 max_sge = dir == DMA_TO_DEVICE ? qp->max_write_sge :
>> +		      qp->max_read_sge;
>
> Bart,
> I'm sure that I missed something.
>
> Can "dir" be DMA_BIDIRECTIONAL?
> If yes, max_sge should be min(max_write_sge, max_read_sge).

Hello Leon,

The RDMA R/W API is intended for target drivers. None of the upstream 
target drivers (ib_isert, ib_srpt, nvmet) that use the RDMA R/W API 
support bidi commands today. And even if such support were added, 
these target drivers should perform RDMA READs and WRITEs in two phases. 
So I do not expect that "dir" will ever be equal to DMA_BIDIRECTIONAL in 
the above code.

Bart.
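
For illustration, a hedged sketch of the defensive variant Leon
describes; no upstream caller needs the DMA_BIDIRECTIONAL branch today,
and the function name is invented for this example:

#include <linux/dma-direction.h>
#include <linux/kernel.h>
#include <rdma/ib_verbs.h>

static u32 example_max_sge(struct ib_qp *qp, enum dma_data_direction dir)
{
	switch (dir) {
	case DMA_TO_DEVICE:
		return qp->max_write_sge;
	case DMA_FROM_DEVICE:
		return qp->max_read_sge;
	default:
		/* DMA_BIDIRECTIONAL: be conservative and honor both limits. */
		return min(qp->max_write_sge, qp->max_read_sge);
	}
}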
Leon Romanovsky July 27, 2016, 4:23 p.m. UTC | #3
On Wed, Jul 27, 2016 at 08:24:32AM -0700, Bart Van Assche wrote:
> On 07/27/2016 05:42 AM, Leon Romanovsky wrote:
> >On Thu, Jul 21, 2016 at 01:03:30PM -0700, Bart Van Assche wrote:
> >> static inline u32 rdma_rw_fr_page_list_len(struct ib_device *dev)
> >> {
> >> 	/* arbitrary limit to avoid allocating gigantic resources */
> >>@@ -186,7 +179,8 @@ static int rdma_rw_init_map_wrs(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
> >> 		u64 remote_addr, u32 rkey, enum dma_data_direction dir)
> >> {
> >> 	struct ib_device *dev = qp->pd->device;
> >>-	u32 max_sge = rdma_rw_max_sge(dev, dir);
> >>+	u32 max_sge = dir == DMA_TO_DEVICE ? qp->max_write_sge :
> >>+		      qp->max_read_sge;
> >
> >Bart,
> >I'm sure that I missed something.
> >
> >Can "dir" be DMA_BIDIRECTIONAL?
> >If yes, max_sge should be min(max_write_sge, max_read_sge).
> 
> Hello Leon,
> 
> The RDMA R/W API is intended for target drivers. None of the upstream target
> drivers (ib_isert, ib_srpt, nvmet) that use the RDMA R/W API support bidi
> commands today. And even if such support were added, these target
> drivers should perform RDMA READs and WRITEs in two phases. So I do not
> expect that "dir" will ever be equal to DMA_BIDIRECTIONAL in the above code.
> 
> Bart.

Thanks Bart.

Christoph Hellwig Aug. 1, 2016, 10:56 a.m. UTC | #4
Looks fine, thanks a lot Bart!

Reviewed-by: Christoph Hellwig <hch@lst.de>

Patch

diff --git a/drivers/infiniband/core/rw.c b/drivers/infiniband/core/rw.c
index 1ad2baa..dbfd854 100644
--- a/drivers/infiniband/core/rw.c
+++ b/drivers/infiniband/core/rw.c
@@ -58,13 +58,6 @@ static inline bool rdma_rw_io_needs_mr(struct ib_device *dev, u8 port_num,
 	return false;
 }
 
-static inline u32 rdma_rw_max_sge(struct ib_device *dev,
-		enum dma_data_direction dir)
-{
-	return dir == DMA_TO_DEVICE ?
-		dev->attrs.max_sge : dev->attrs.max_sge_rd;
-}
-
 static inline u32 rdma_rw_fr_page_list_len(struct ib_device *dev)
 {
 	/* arbitrary limit to avoid allocating gigantic resources */
@@ -186,7 +179,8 @@ static int rdma_rw_init_map_wrs(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
 		u64 remote_addr, u32 rkey, enum dma_data_direction dir)
 {
 	struct ib_device *dev = qp->pd->device;
-	u32 max_sge = rdma_rw_max_sge(dev, dir);
+	u32 max_sge = dir == DMA_TO_DEVICE ? qp->max_write_sge :
+		      qp->max_read_sge;
 	struct ib_sge *sge;
 	u32 total_len = 0, i, j;
 
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 6298f54..e39a0b5 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -814,6 +814,15 @@ struct ib_qp *ib_create_qp(struct ib_pd *pd,
 		}
 	}
 
+	/*
+	 * Note: all hw drivers guarantee that max_send_sge is lower than
+	 * the device RDMA WRITE SGE limit but not all hw drivers ensure that
+	 * max_send_sge <= max_sge_rd.
+	 */
+	qp->max_write_sge = qp_init_attr->cap.max_send_sge;
+	qp->max_read_sge = min_t(u32, qp_init_attr->cap.max_send_sge,
+				 device->attrs.max_sge_rd);
+
 	return qp;
 }
 EXPORT_SYMBOL(ib_create_qp);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 7e440d4..e694f02 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1428,6 +1428,10 @@ struct ib_srq {
 	} ext;
 };
 
+/*
+ * @max_write_sge: Maximum SGE elements per RDMA WRITE request.
+ * @max_read_sge:  Maximum SGE elements per RDMA READ request.
+ */
 struct ib_qp {
 	struct ib_device       *device;
 	struct ib_pd	       *pd;
@@ -1449,6 +1453,8 @@ struct ib_qp {
 	void                  (*event_handler)(struct ib_event *, void *);
 	void		       *qp_context;
 	u32			qp_num;
+	u32			max_write_sge;
+	u32			max_read_sge;
 	enum ib_qp_type		qp_type;
 };
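
To see the effect of the ib_create_qp() hunk above, consider a device
that advertises attrs.max_sge = 32 but attrs.max_sge_rd = 16. A hedged
usage sketch under that assumption; the function name is invented and
only the verbs calls are real API:

#include <linux/err.h>
#include <linux/printk.h>
#include <rdma/ib_verbs.h>

static void example_qp_limits(struct ib_pd *pd, struct ib_qp_init_attr *attr)
{
	struct ib_qp *qp = ib_create_qp(pd, attr);

	if (IS_ERR(qp))
		return;

	/*
	 * With attr->cap.max_send_sge == 30 on the example device:
	 * qp->max_write_sge == 30 (WRITEs keep the send SGE limit) and
	 * qp->max_read_sge == min(30, 16) == 16 (READs are additionally
	 * capped by attrs.max_sge_rd).
	 */
	pr_info("write sge %u, read sge %u\n",
		qp->max_write_sge, qp->max_read_sge);

	ib_destroy_qp(qp);
}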