[v3,05/24] xprtrdma: Reduce max_frwr_depth

Message ID 20181210162945.4198.24714.stgit@manet.1015granger.net
State New, archived
Series NFS/RDMA client for next

Commit Message

Chuck Lever III Dec. 10, 2018, 4:29 p.m. UTC
Some devices advertise a large max_fast_reg_page_list_len
capability, but perform optimally when MRs are significantly smaller
than that depth -- probably when the MR itself is no larger than a
page.

By default, the RDMA R/W core API uses max_sge_rd as the maximum
page depth for MRs. For some devices, the value of max_sge_rd is
1, which is also not optimal. Thus, when max_sge_rd is larger than
1, use that value. Otherwise use the value of the
max_fast_reg_page_list_len attribute.

I've tested this with a couple of devices, and it reproducibly
improves the throughput of large I/Os by several percent.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/sunrpc/xprtrdma/frwr_ops.c |   15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)
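
For reference, the selection rule described in the commit message can be
sketched as a standalone C fragment. This is only an illustration, not the
kernel code: struct sketch_attrs, sketch_old_frwr_depth(),
sketch_new_frwr_depth(), and the cap parameter (standing in for
RPCRDMA_MAX_DATA_SEGS) are hypothetical names introduced here.

```c
/* Hypothetical stand-in for the two device attributes the patch reads. */
struct sketch_attrs {
	unsigned int max_sge_rd;
	unsigned int max_fast_reg_page_list_len;
};

/* Before the patch: use the advertised page list length, capped. */
static unsigned int sketch_old_frwr_depth(const struct sketch_attrs *a,
					  unsigned int cap)
{
	unsigned int depth = a->max_fast_reg_page_list_len;

	return depth > cap ? cap : depth;
}

/* After the patch: prefer max_sge_rd when it is larger than 1,
 * otherwise fall back to the advertised page list length.
 * The result is capped in both cases.
 */
static unsigned int sketch_new_frwr_depth(const struct sketch_attrs *a,
					  unsigned int cap)
{
	unsigned int depth;

	if (a->max_sge_rd > 1)
		depth = a->max_sge_rd;
	else
		depth = a->max_fast_reg_page_list_len;

	return depth > cap ? cap : depth;
}
```

With max_sge_rd = 30, an advertised page list length of 65536, and a cap of
257 (all made-up numbers), the old rule yields 257 and the new rule yields
30. With max_sge_rd = 1, the new rule falls back to the advertised length,
still subject to the cap.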

Comments

Christoph Hellwig Dec. 11, 2018, 2:02 p.m. UTC | #1
On Mon, Dec 10, 2018 at 11:29:45AM -0500, Chuck Lever wrote:
> Some devices advertise a large max_fast_reg_page_list_len
> capability, but perform optimally when MRs are significantly smaller
> than that depth -- probably when the MR itself is no larger than a
> page.
> 
> By default, the RDMA R/W core API uses max_sge_rd as the maximum
> page depth for MRs. For some devices, the value of max_sge_rd is
> 1, which is also not optimal. Thus, when max_sge_rd is larger than
> 1, use that value. Otherwise use the value of the
> max_fast_reg_page_list_len attribute.
> 
> I've tested this with a couple of devices, and it reproducibly
> improves the throughput of large I/Os by several percent.

Can you list which devices for reference in the changelog?

Chuck Lever III Dec. 11, 2018, 3:30 p.m. UTC | #2
> On Dec 11, 2018, at 9:02 AM, Christoph Hellwig <hch@infradead.org> wrote:
> 
> On Mon, Dec 10, 2018 at 11:29:45AM -0500, Chuck Lever wrote:
>> Some devices advertise a large max_fast_reg_page_list_len
>> capability, but perform optimally when MRs are significantly smaller
>> than that depth -- probably when the MR itself is no larger than a
>> page.
>> 
>> By default, the RDMA R/W core API uses max_sge_rd as the maximum
>> page depth for MRs. For some devices, the value of max_sge_rd is
>> 1, which is also not optimal. Thus, when max_sge_rd is larger than
>> 1, use that value. Otherwise use the value of the
>> max_fast_reg_page_list_len attribute.
>> 
>> I've tested this with a couple of devices, and it reproducibly
>> improves the throughput of large I/Os by several percent.
> 
> Can you list which devices for reference in the changelog?

I have only three devices here. I can't make an exhaustive list.

Besides, this is exactly how rdma_rw works. I thought this was
common knowledge.

--
Chuck Lever

Christoph Hellwig Dec. 12, 2018, 7:18 a.m. UTC | #3
On Tue, Dec 11, 2018 at 10:30:24AM -0500, Chuck Lever wrote:
> 
> 
> > On Dec 11, 2018, at 9:02 AM, Christoph Hellwig <hch@infradead.org> wrote:
> > 
> > On Mon, Dec 10, 2018 at 11:29:45AM -0500, Chuck Lever wrote:
> >> Some devices advertise a large max_fast_reg_page_list_len
> >> capability, but perform optimally when MRs are significantly smaller
> >> than that depth -- probably when the MR itself is no larger than a
> >> page.
> >> 
> >> By default, the RDMA R/W core API uses max_sge_rd as the maximum
> >> page depth for MRs. For some devices, the value of max_sge_rd is
> >> 1, which is also not optimal. Thus, when max_sge_rd is larger than
> >> 1, use that value. Otherwise use the value of the
> >> max_fast_reg_page_list_len attribute.
> >> 
> >> I've tested this with a couple of devices, and it reproducibly
> >> improves the throughput of large I/Os by several percent.
> > 
> > Can you list which devices for reference in the changelog?
> 
> I have only three devices here. I can't make an exhaustive list.

Just list the ones you've tested.

Patch

diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index ae94de9..72c6d32 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -191,10 +191,17 @@ 
 	if (attrs->device_cap_flags & IB_DEVICE_SG_GAPS_REG)
 		ia->ri_mrtype = IB_MR_TYPE_SG_GAPS;
 
-	ia->ri_max_frwr_depth =
-			min_t(unsigned int, RPCRDMA_MAX_DATA_SEGS,
-			      attrs->max_fast_reg_page_list_len);
-	dprintk("RPC:       %s: device's max FR page list len = %u\n",
+	/* Quirk: Some devices advertise a large max_fast_reg_page_list_len
+	 * capability, but perform optimally when the MRs are not larger
+	 * than a page.
+	 */
+	if (attrs->max_sge_rd > 1)
+		ia->ri_max_frwr_depth = attrs->max_sge_rd;
+	else
+		ia->ri_max_frwr_depth = attrs->max_fast_reg_page_list_len;
+	if (ia->ri_max_frwr_depth > RPCRDMA_MAX_DATA_SEGS)
+		ia->ri_max_frwr_depth = RPCRDMA_MAX_DATA_SEGS;
+	dprintk("RPC:       %s: max FR page list depth = %u\n",
 		__func__, ia->ri_max_frwr_depth);
 
 	/* Add room for frwr register and invalidate WRs.