RDMA: Increasing RPCRDMA_MAX_DATA_SEGS

Message ID 1311270542-2021-1-git-send-email-steved@redhat.com (mailing list archive)
State New, archived

Commit Message

Steve Dickson July 21, 2011, 5:49 p.m. UTC
Our performance team has noticed that increasing
RPCRDMA_MAX_DATA_SEGS from 8 to 64 significantly
increases throughput when using the RDMA transport.

Signed-off-by: Steve Dickson <steved@redhat.com>
---
 net/sunrpc/xprtrdma/xprt_rdma.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

Comments

bfields@fieldses.org July 21, 2011, 9:41 p.m. UTC | #1
On Thu, Jul 21, 2011 at 01:49:02PM -0400, Steve Dickson wrote:
> Our performance team has noticed that increasing
> RPCRDMA_MAX_DATA_SEGS from 8 to 64 significantly
> increases throughput when using the RDMA transport.

The main risk that I can see is that we have this on the stack in two
places:

	rpcrdma_register_fmr_external(struct rpcrdma_mr_seg *seg, ...
	{
		...
		u64 physaddrs[RPCRDMA_MAX_DATA_SEGS];

	rpcrdma_register_default_external(struct rpcrdma_mr_seg *seg, ...
	{
		...
		struct ib_phys_buf ipb[RPCRDMA_MAX_DATA_SEGS]; 

Where ib_phys_buf is 16 bytes.
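
For reference, spelling out the arithmetic with the proposed value of 64
(this assumes sizeof(u64) == 8, and sizeof(struct ib_phys_buf) == 16 as
above):

	u64 physaddrs[64];            /*  8 bytes * 64 =  512 bytes on the stack */
	struct ib_phys_buf ipb[64];   /* 16 bytes * 64 = 1024 bytes on the stack */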

So that's 512 bytes in the first case, 1024 in the second.  This is
called from rpciod--what are our rules about allocating memory from
rpciod?

--b.

> 
> Signed-off-by: Steve Dickson <steved@redhat.com>
> ---
>  net/sunrpc/xprtrdma/xprt_rdma.h |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
> index cae761a..5d1cfe5 100644
> --- a/net/sunrpc/xprtrdma/xprt_rdma.h
> +++ b/net/sunrpc/xprtrdma/xprt_rdma.h
> @@ -109,7 +109,7 @@ struct rpcrdma_ep {
>   */
>  
>  /* temporary static scatter/gather max */
> -#define RPCRDMA_MAX_DATA_SEGS	(8)	/* max scatter/gather */
> +#define RPCRDMA_MAX_DATA_SEGS	(64)	/* max scatter/gather */
>  #define RPCRDMA_MAX_SEGS 	(RPCRDMA_MAX_DATA_SEGS + 2) /* head+tail = 2 */
>  #define MAX_RPCRDMAHDR	(\
>  	/* max supported RPC/RDMA header */ \
> -- 
> 1.7.6
> 
Trond Myklebust July 22, 2011, 1:42 a.m. UTC | #2
On Thu, 2011-07-21 at 17:41 -0400, J. Bruce Fields wrote: 
> On Thu, Jul 21, 2011 at 01:49:02PM -0400, Steve Dickson wrote:
> > Our performance team has noticed that increasing
> > RPCRDMA_MAX_DATA_SEGS from 8 to 64 significantly
> > increases throughput when using the RDMA transport.
> 
> The main risk that I can see is that we have this on the stack in two
> places:
> 
> 	rpcrdma_register_fmr_external(struct rpcrdma_mr_seg *seg, ...
> 	{
> 		...
> 		u64 physaddrs[RPCRDMA_MAX_DATA_SEGS];
> 
> 	rpcrdma_register_default_external(struct rpcrdma_mr_seg *seg, ...
> 	{
> 		...
> 		struct ib_phys_buf ipb[RPCRDMA_MAX_DATA_SEGS]; 
> 
> Where ib_phys_buf is 16 bytes.
> 
> So that's 512 bytes in the first case, 1024 in the second.  This is
> called from rpciod--what are our rules about allocating memory from
> rpciod?

Is that allocated on the stack? We should always try to avoid 1024-byte
allocations on the stack, since that eats up a full 1/8th (or 1/4 in the
case of 4k stacks) of the total stack space.

If, OTOH, that memory is being allocated dynamically, then the rule is
"don't let rpciod sleep".

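In practice that would mean something along these lines (a rough sketch
only, not a tested change; GFP_NOWAIT keeps the allocation from sleeping
in rpciod context, at the price of having to handle failure in the
caller):

	u64 *physaddrs;

	physaddrs = kmalloc(RPCRDMA_MAX_DATA_SEGS * sizeof(u64), GFP_NOWAIT);
	if (physaddrs == NULL)
		return -ENOMEM;	/* caller has to cope with the failure */
	/* ... fill in and register the segments as before ... */
	kfree(physaddrs);
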
Cheers
  Trond
bfields@fieldses.org July 22, 2011, 1:55 a.m. UTC | #3
On Thu, Jul 21, 2011 at 09:42:04PM -0400, Trond Myklebust wrote:
> On Thu, 2011-07-21 at 17:41 -0400, J. Bruce Fields wrote: 
> > On Thu, Jul 21, 2011 at 01:49:02PM -0400, Steve Dickson wrote:
> > > Our performance team has noticed that increasing
> > > RPCRDMA_MAX_DATA_SEGS from 8 to 64 significantly
> > > increases throughput when using the RDMA transport.
> > 
> > The main risk that I can see is that we have this on the stack in two
> > places:
> > 
> > 	rpcrdma_register_fmr_external(struct rpcrdma_mr_seg *seg, ...
> > 	{
> > 		...
> > 		u64 physaddrs[RPCRDMA_MAX_DATA_SEGS];
> > 
> > 	rpcrdma_register_default_external(struct rpcrdma_mr_seg *seg, ...
> > 	{
> > 		...
> > 		struct ib_phys_buf ipb[RPCRDMA_MAX_DATA_SEGS]; 
> > 
> > Where ib_phys_buf is 16 bytes.
> > 
> > So that's 512 bytes in the first case, 1024 in the second.  This is
> > called from rpciod--what are our rules about allocating memory from
> > rpciod?
> 
> Is that allocated on the stack? We should always try to avoid 1024-byte
> allocations on the stack, since that eats up a full 1/8th (or 1/4 in the
> case of 4k stacks) of the total stack space.

Right, it's on the stack, so I was wondering what we should do
instead....

> If, OTOH, that memory is being allocated dynamically, then the rule is
> "don't let rpciod sleep".

OK, so, looking around, the buf_alloc methods might provide examples to
follow for dynamic allocation here?
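
Or, in the same spirit, the arrays could simply live in the per-request
structure that is already allocated up front, so the hot path needs
neither stack space nor a fresh allocation. A rough sketch of that idea
(rl_physaddrs is a made-up field name, purely for illustration):

	struct rpcrdma_req {
		/* ... existing fields ... */

		/* 512 bytes, set up once with the request, reused from rpciod */
		u64 rl_physaddrs[RPCRDMA_MAX_DATA_SEGS];
	};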

--b.
Max Matveev July 22, 2011, 8:19 a.m. UTC | #4
On Thu, 21 Jul 2011 13:49:02 -0400, Steve Dickson wrote:

 steved> Our performance team has noticed that increasing
 steved> RPCRDMA_MAX_DATA_SEGS from 8 to 64 significantly
 steved> increases throughput when using the RDMA transport.

Did they try a new client with an old server, and vice versa?
Both reads and writes?

max

Steve Dickson July 25, 2011, 3:18 p.m. UTC | #5
Sorry for the delayed response... I took a day off.

On 07/22/2011 04:19 AM, Max Matveev wrote:
> On Thu, 21 Jul 2011 13:49:02 -0400, Steve Dickson wrote:
> 
>  steved> Our performance team has noticed that increasing
>  steved> RPCRDMA_MAX_DATA_SEGS from 8 to 64 significantly
>  steved> increases throughput when using the RDMA transport.
> 
> Did they try a new client with an old server, and vice versa?
> Both reads and writes?
I believe it was done on the server side, but I've cc-ed the
person who did the testing.... 

steved.

Patch

diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index cae761a..5d1cfe5 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -109,7 +109,7 @@ struct rpcrdma_ep {
  */
 
 /* temporary static scatter/gather max */
-#define RPCRDMA_MAX_DATA_SEGS	(8)	/* max scatter/gather */
+#define RPCRDMA_MAX_DATA_SEGS	(64)	/* max scatter/gather */
 #define RPCRDMA_MAX_SEGS 	(RPCRDMA_MAX_DATA_SEGS + 2) /* head+tail = 2 */
 #define MAX_RPCRDMAHDR	(\
 	/* max supported RPC/RDMA header */ \
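
Assuming 4 KiB pages, and assuming the transport continues to derive its
maximum payload as RPCRDMA_MAX_DATA_SEGS * PAGE_SIZE, the change scales
the per-RPC scatter/gather limit roughly as follows (a back-of-the-envelope
calculation, not taken from the patch itself):

	/* before:  8 segs * 4096 bytes =  32768 bytes ( 32 KiB) per RPC */
	/* after:  64 segs * 4096 bytes = 262144 bytes (256 KiB) per RPC */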