Message ID: 20181210162945.4198.24714.stgit@manet.1015granger.net (mailing list archive)
State:      New, archived
Series:     NFS/RDMA client for next
On Mon, Dec 10, 2018 at 11:29:45AM -0500, Chuck Lever wrote:
> Some devices advertise a large max_fast_reg_page_list_len
> capability, but perform optimally when MRs are significantly smaller
> than that depth -- probably when the MR itself is no larger than a
> page.
>
> By default, the RDMA R/W core API uses max_sge_rd as the maximum
> page depth for MRs. For some devices, the value of max_sge_rd is
> 1, which is also not optimal. Thus, when max_sge_rd is larger than
> 1, use that value. Otherwise use the value of the
> max_fast_reg_page_list_len attribute.
>
> I've tested this with a couple of devices, and it reproducibly
> improves the throughput of large I/Os by several percent.

Can you list which devices for reference in the changelog?
> On Dec 11, 2018, at 9:02 AM, Christoph Hellwig <hch@infradead.org> wrote:
>
> On Mon, Dec 10, 2018 at 11:29:45AM -0500, Chuck Lever wrote:
>> Some devices advertise a large max_fast_reg_page_list_len
>> capability, but perform optimally when MRs are significantly smaller
>> than that depth -- probably when the MR itself is no larger than a
>> page.
>>
>> By default, the RDMA R/W core API uses max_sge_rd as the maximum
>> page depth for MRs. For some devices, the value of max_sge_rd is
>> 1, which is also not optimal. Thus, when max_sge_rd is larger than
>> 1, use that value. Otherwise use the value of the
>> max_fast_reg_page_list_len attribute.
>>
>> I've tested this with a couple of devices, and it reproducibly
>> improves the throughput of large I/Os by several percent.
>
> Can you list which devices for reference in the changelog?

I have only three devices here. I can't make an exhaustive list.

Besides, this is exactly how rdma_rw works. I thought this was common
knowledge.

--
Chuck Lever
On Tue, Dec 11, 2018 at 10:30:24AM -0500, Chuck Lever wrote:
>
>> On Dec 11, 2018, at 9:02 AM, Christoph Hellwig <hch@infradead.org> wrote:
>>
>> On Mon, Dec 10, 2018 at 11:29:45AM -0500, Chuck Lever wrote:
>>> Some devices advertise a large max_fast_reg_page_list_len
>>> capability, but perform optimally when MRs are significantly smaller
>>> than that depth -- probably when the MR itself is no larger than a
>>> page.
>>>
>>> By default, the RDMA R/W core API uses max_sge_rd as the maximum
>>> page depth for MRs. For some devices, the value of max_sge_rd is
>>> 1, which is also not optimal. Thus, when max_sge_rd is larger than
>>> 1, use that value. Otherwise use the value of the
>>> max_fast_reg_page_list_len attribute.
>>>
>>> I've tested this with a couple of devices, and it reproducibly
>>> improves the throughput of large I/Os by several percent.
>>
>> Can you list which devices for reference in the changelog?
>
> I have only three devices here. I can't make an exhaustive list.

Just list the ones you've tested.
diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index ae94de9..72c6d32 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -191,10 +191,17 @@
 	if (attrs->device_cap_flags & IB_DEVICE_SG_GAPS_REG)
 		ia->ri_mrtype = IB_MR_TYPE_SG_GAPS;
-	ia->ri_max_frwr_depth =
-			min_t(unsigned int, RPCRDMA_MAX_DATA_SEGS,
-			      attrs->max_fast_reg_page_list_len);
-	dprintk("RPC:       %s: device's max FR page list len = %u\n",
+	/* Quirk: Some devices advertise a large max_fast_reg_page_list_len
+	 * capability, but perform optimally when the MRs are not larger
+	 * than a page.
+	 */
+	if (attrs->max_sge_rd > 1)
+		ia->ri_max_frwr_depth = attrs->max_sge_rd;
+	else
+		ia->ri_max_frwr_depth = attrs->max_fast_reg_page_list_len;
+	if (ia->ri_max_frwr_depth > RPCRDMA_MAX_DATA_SEGS)
+		ia->ri_max_frwr_depth = RPCRDMA_MAX_DATA_SEGS;
+	dprintk("RPC:       %s: max FR page list depth = %u\n",
 		__func__, ia->ri_max_frwr_depth);

 	/* Add room for frwr register and invalidate WRs.
Some devices advertise a large max_fast_reg_page_list_len
capability, but perform optimally when MRs are significantly smaller
than that depth -- probably when the MR itself is no larger than a
page.

By default, the RDMA R/W core API uses max_sge_rd as the maximum
page depth for MRs. For some devices, the value of max_sge_rd is
1, which is also not optimal. Thus, when max_sge_rd is larger than
1, use that value. Otherwise use the value of the
max_fast_reg_page_list_len attribute.

I've tested this with a couple of devices, and it reproducibly
improves the throughput of large I/Os by several percent.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/sunrpc/xprtrdma/frwr_ops.c |   15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)