Message ID | ce96b1f4-e9b9-31e2-f9d2-77e71f8bbcd1@sandisk.com
---|---
State | Superseded
On Tue, Jun 28, 2016 at 01:26:03PM +0200, Bart Van Assche wrote:
> The SGE limit for a queue pair is typically lower than what is
> defined by the HCA limits. Hence make max_sge an argument.

I don't think this is the right way to approach any limit. I'd rather
have a current limit in the ib_qp structure than having to pass it
explicitly and growing driver-specific policy once again.
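[For readers following along: Christoph's alternative would cache the effective limits on the queue pair itself, so the rw API could look them up instead of taking a new parameter on every call. A minimal sketch of that idea; the max_write_sge/max_read_sge fields are hypothetical and not part of this patch:]

```c
/*
 * Sketch only: assumes struct ib_qp gains cached per-QP SGE limits,
 * e.g. hypothetical max_write_sge/max_read_sge fields filled in at
 * QP creation time.  Neither field exists in this patch.
 */
static inline u32 rdma_rw_max_sge(struct ib_qp *qp,
		enum dma_data_direction dir)
{
	/* RDMA WRITEs use the send limit, RDMA READs the read limit. */
	return dir == DMA_TO_DEVICE ? qp->max_write_sge : qp->max_read_sge;
}
```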
On 06/28/2016 01:52 PM, Christoph Hellwig wrote:
> On Tue, Jun 28, 2016 at 01:26:03PM +0200, Bart Van Assche wrote:
>> The SGE limit for a queue pair is typically lower than what is
>> defined by the HCA limits. Hence make max_sge an argument.
>
> I don't think this is the right way to approach any limit. I'd rather
> have a current limit in the ib_qp structure than having to pass it
> explicitly and growing driver-specific policy once again.

Hello Christoph,

The memory needed for QP buffers strongly depends on max_send_sge and
max_recv_sge. Using a large value for max_send_sge and/or max_recv_sge
for all drivers would increase the memory consumption for drivers that
only submit RDMA requests with a low number of SG elements. Using a
small value for max_send_sge and/or max_recv_sge would result in a
performance decrease for drivers that benefit from large max_send_sge
and/or max_recv_sge values. This is why I think that drivers should
have the freedom to choose the max_send_sge and max_recv_sge values.

Bart.
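[To make the trade-off Bart describes concrete, a driver-side sketch: each ULP picks the smallest SGE count that covers its typical I/O pattern and clamps it to what the device supports. ULP_PREFERRED_SEND_SGE is a made-up example value, not from this patch:]

```c
#include <linux/kernel.h>
#include <rdma/ib_verbs.h>

/* Made-up example: a ULP that rarely needs more than 16 SG elements. */
#define ULP_PREFERRED_SEND_SGE	16

static u32 ulp_pick_max_send_sge(struct ib_device *dev)
{
	/* Never request more SGEs than the HCA supports. */
	return min_t(u32, ULP_PREFERRED_SEND_SGE, dev->attrs.max_sge);
}
```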
On Tue, Jun 28, 2016 at 02:37:29PM +0200, Bart Van Assche wrote:
> The memory needed for QP buffers strongly depends on max_send_sge and
> max_recv_sge. Using a large value for max_send_sge and/or max_recv_sge
> for all drivers would increase the memory consumption for drivers that
> only submit RDMA requests with a low number of SG elements. Using a
> small value for max_send_sge and/or max_recv_sge would result in a
> performance decrease for drivers that benefit from large max_send_sge
> and/or max_recv_sge values. This is why I think that drivers should
> have the freedom to choose the max_send_sge and max_recv_sge values.

No problem with that - but please configure it on the QP as part of the
qp_init_attr structure instead of passing it to every RDMA READ/WRITE
call.
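[What Christoph is asking for maps onto the existing QP creation interface: the ULP states its SGE needs once, in the ib_qp_cap embedded in struct ib_qp_init_attr, rather than on every READ/WRITE call. A minimal sketch; the WR/SGE counts and the surrounding variables (pd, send_cq, recv_cq, dev) are illustrative:]

```c
struct ib_qp_init_attr init_attr = {
	.send_cq	= send_cq,
	.recv_cq	= recv_cq,
	.sq_sig_type	= IB_SIGNAL_REQ_WR,
	.qp_type	= IB_QPT_RC,
	.cap = {
		.max_send_wr	= 256,	/* example values */
		.max_recv_wr	= 256,
		/* SGE needs declared once, at QP creation time: */
		.max_send_sge	= min_t(u32, 16, dev->attrs.max_sge),
		.max_recv_sge	= 1,
	},
};
struct ib_qp *qp = ib_create_qp(pd, &init_attr);
```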
On 06/28/2016 05:42 PM, Christoph Hellwig wrote:
> On Tue, Jun 28, 2016 at 02:37:29PM +0200, Bart Van Assche wrote:
>> The memory needed for QP buffers strongly depends on max_send_sge and
>> max_recv_sge. Using a large value for max_send_sge and/or max_recv_sge
>> for all drivers would increase the memory consumption for drivers that
>> only submit RDMA requests with a low number of SG elements. Using a
>> small value for max_send_sge and/or max_recv_sge would result in a
>> performance decrease for drivers that benefit from large max_send_sge
>> and/or max_recv_sge values. This is why I think that drivers should
>> have the freedom to choose the max_send_sge and max_recv_sge values.
>
> No problem with that - but please configure it on the QP as part of the
> qp_init_attr structure instead of passing it to every RDMA READ/WRITE
> call.

That sounds like a good idea to me. I will make this change, retest and
repost this patch series.

Bart.
The SGE limit for a queue pair is typically lower than what is defined
by the HCA limits. Hence make max_sge an argument.

Fixes: b99f8e4d7bcd ("IB/srpt: convert to the generic RDMA READ/WRITE API")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: <stable@vger.kernel.org> #v4.7+
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Cc: Parav Pandit <pandit.parav@gmail.com>
Cc: Laurence Oberman <loberman@redhat.com>
---
 drivers/infiniband/core/rw.c            | 17 ++++++-----------
 drivers/infiniband/ulp/isert/ib_isert.c |  7 +++++--
 drivers/infiniband/ulp/srpt/ib_srpt.c   |  8 +++++++-
 include/rdma/rw.h                       |  3 ++-
 4 files changed, 20 insertions(+), 15 deletions(-)

```diff
diff --git a/drivers/infiniband/core/rw.c b/drivers/infiniband/core/rw.c
index 13d4067..3894f9e 100644
--- a/drivers/infiniband/core/rw.c
+++ b/drivers/infiniband/core/rw.c
@@ -58,13 +58,6 @@ static inline bool rdma_rw_io_needs_mr(struct ib_device *dev, u8 port_num,
 	return false;
 }
 
-static inline u32 rdma_rw_max_sge(struct ib_device *dev,
-		enum dma_data_direction dir)
-{
-	return dir == DMA_TO_DEVICE ?
-		dev->attrs.max_sge : dev->attrs.max_sge_rd;
-}
-
 static inline u32 rdma_rw_fr_page_list_len(struct ib_device *dev)
 {
 	/* arbitrary limit to avoid allocating gigantic resources */
@@ -183,10 +176,10 @@ out:
 
 static int rdma_rw_init_map_wrs(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
 		struct scatterlist *sg, u32 sg_cnt, u32 offset,
-		u64 remote_addr, u32 rkey, enum dma_data_direction dir)
+		u64 remote_addr, u32 rkey, enum dma_data_direction dir,
+		u32 max_sge)
 {
 	struct ib_device *dev = qp->pd->device;
-	u32 max_sge = rdma_rw_max_sge(dev, dir);
 	struct ib_sge *sge;
 	u32 total_len = 0, i, j;
 
@@ -275,13 +268,15 @@ static int rdma_rw_init_single_wr(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
  * @remote_addr:remote address to read/write (relative to @rkey)
  * @rkey:	remote key to operate on
  * @dir:	%DMA_TO_DEVICE for RDMA WRITE, %DMA_FROM_DEVICE for RDMA READ
+ * @max_sge:	maximum number of SG elements per work request
  *
  * Returns the number of WQEs that will be needed on the workqueue if
  * successful, or a negative error code.
  */
 int rdma_rw_ctx_init(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u8 port_num,
 		struct scatterlist *sg, u32 sg_cnt, u32 sg_offset,
-		u64 remote_addr, u32 rkey, enum dma_data_direction dir)
+		u64 remote_addr, u32 rkey, enum dma_data_direction dir,
+		u32 max_sge)
 {
 	struct ib_device *dev = qp->pd->device;
 	int ret;
@@ -314,7 +309,7 @@ int rdma_rw_ctx_init(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u8 port_num,
 			sg_offset, remote_addr, rkey, dir);
 	} else if (sg_cnt > 1) {
 		ret = rdma_rw_init_map_wrs(ctx, qp, sg, sg_cnt, sg_offset,
-				remote_addr, rkey, dir);
+				remote_addr, rkey, dir, max_sge);
 	} else {
 		ret = rdma_rw_init_single_wr(ctx, qp, sg, sg_offset,
 				remote_addr, rkey, dir);
diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
index a990c04..06a5671 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -2075,21 +2075,24 @@ static int
 isert_rdma_rw_ctx_post(struct isert_cmd *cmd, struct isert_conn *conn,
 		struct ib_cqe *cqe, struct ib_send_wr *chain_wr)
 {
+	struct ib_device *dev = conn->device->ib_device;
 	struct se_cmd *se_cmd = &cmd->iscsi_cmd->se_cmd;
 	enum dma_data_direction dir = target_reverse_dma_direction(se_cmd);
 	u8 port_num = conn->cm_id->port_num;
 	u64 addr;
-	u32 rkey, offset;
+	u32 rkey, offset, max_sge;
 	int ret;
 
 	if (dir == DMA_FROM_DEVICE) {
 		addr = cmd->write_va;
 		rkey = cmd->write_stag;
 		offset = cmd->iscsi_cmd->write_data_done;
+		max_sge = dev->attrs.max_sge_rd;
 	} else {
 		addr = cmd->read_va;
 		rkey = cmd->read_stag;
 		offset = 0;
+		max_sge = dev->attrs.max_sge;
 	}
 
 	if (isert_prot_cmd(conn, se_cmd)) {
@@ -2107,7 +2110,7 @@ isert_rdma_rw_ctx_post(struct isert_cmd *cmd, struct isert_conn *conn,
 	} else {
 		ret = rdma_rw_ctx_init(&cmd->rw, conn->qp, port_num,
 				se_cmd->t_data_sg, se_cmd->t_data_nents,
-				offset, addr, rkey, dir);
+				offset, addr, rkey, dir, max_sge);
 	}
 	if (ret < 0) {
 		isert_err("Cmd: %p failed to prepare RDMA res\n", cmd);
diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c
index 4a41556..2ed65f5 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.c
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
@@ -803,7 +803,9 @@ static int srpt_alloc_rw_ctxs(struct srpt_send_ioctx *ioctx,
 {
 	enum dma_data_direction dir = target_reverse_dma_direction(&ioctx->cmd);
 	struct srpt_rdma_ch *ch = ioctx->ch;
+	struct ib_device *dev = ch->qp->pd->device;
 	struct scatterlist *prev = NULL;
+	u32 max_sge;
 	unsigned prev_nents;
 	int ret, i;
 
@@ -816,6 +818,9 @@ static int srpt_alloc_rw_ctxs(struct srpt_send_ioctx *ioctx,
 			return -ENOMEM;
 	}
 
+	max_sge = dir == DMA_TO_DEVICE ? dev->attrs.max_sge :
+		dev->attrs.max_sge_rd;
+
 	for (i = ioctx->n_rw_ctx; i < nbufs; i++, db++) {
 		struct srpt_rw_ctx *ctx = &ioctx->rw_ctxs[i];
 		u64 remote_addr = be64_to_cpu(db->va);
@@ -828,7 +833,8 @@ static int srpt_alloc_rw_ctxs(struct srpt_send_ioctx *ioctx,
 			goto unwind;
 
 		ret = rdma_rw_ctx_init(&ctx->rw, ch->qp, ch->sport->port,
-				ctx->sg, ctx->nents, 0, remote_addr, rkey, dir);
+				ctx->sg, ctx->nents, 0, remote_addr, rkey, dir,
+				max_sge);
 		if (ret < 0) {
 			target_free_sgl(ctx->sg, ctx->nents);
 			goto unwind;
diff --git a/include/rdma/rw.h b/include/rdma/rw.h
index 377d865..590a8aec 100644
--- a/include/rdma/rw.h
+++ b/include/rdma/rw.h
@@ -61,7 +61,8 @@ struct rdma_rw_ctx {
 
 int rdma_rw_ctx_init(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u8 port_num,
 		struct scatterlist *sg, u32 sg_cnt, u32 sg_offset,
-		u64 remote_addr, u32 rkey, enum dma_data_direction dir);
+		u64 remote_addr, u32 rkey, enum dma_data_direction dir,
+		u32 max_sge);
 
 void rdma_rw_ctx_destroy(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u8 port_num,
 		struct scatterlist *sg, u32 sg_cnt, enum dma_data_direction dir);
```