
RDMA: Increasing RPCRDMA_MAX_DATA_SEGS

Message ID 1311270542-2021-1-git-send-email-steved@redhat.com (mailing list archive)
State New, archived

Commit Message

Steve Dickson July 21, 2011, 5:49 p.m. UTC
Our performance team has noticed that increasing
RPCRDMA_MAX_DATA_SEGS from 8 to 64 significantly
increases throughput when using the RDMA transport.

Signed-off-by: Steve Dickson <steved@redhat.com>
---
 net/sunrpc/xprtrdma/xprt_rdma.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

Comments

J. Bruce Fields July 21, 2011, 9:41 p.m. UTC | #1
On Thu, Jul 21, 2011 at 01:49:02PM -0400, Steve Dickson wrote:
> Our performance team has noticed that increasing
> RPCRDMA_MAX_DATA_SEGS from 8 to 64 significantly
> increases throughput when using the RDMA transport.

The main risk that I can see is that we have these on the stack in two
places:

	rpcrdma_register_fmr_external(struct rpcrdma_mr_seg *seg, ...
	{
		...
		u64 physaddrs[RPCRDMA_MAX_DATA_SEGS];

	rpcrdma_register_default_external(struct rpcrdma_mr_seg *seg, ...
	{
		...
		struct ib_phys_buf ipb[RPCRDMA_MAX_DATA_SEGS]; 

Where ib_phys_buf is 16 bytes.

So that's 512 bytes in the first case, 1024 in the second.  This is
called from rpciod--what are our rules about allocating memory from
rpciod?

--b.

> 
> Signed-off-by: Steve Dickson <steved@redhat.com>
> ---
>  net/sunrpc/xprtrdma/xprt_rdma.h |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
> index cae761a..5d1cfe5 100644
> --- a/net/sunrpc/xprtrdma/xprt_rdma.h
> +++ b/net/sunrpc/xprtrdma/xprt_rdma.h
> @@ -109,7 +109,7 @@ struct rpcrdma_ep {
>   */
>  
>  /* temporary static scatter/gather max */
> -#define RPCRDMA_MAX_DATA_SEGS	(8)	/* max scatter/gather */
> +#define RPCRDMA_MAX_DATA_SEGS	(64)	/* max scatter/gather */
>  #define RPCRDMA_MAX_SEGS 	(RPCRDMA_MAX_DATA_SEGS + 2) /* head+tail = 2 */
>  #define MAX_RPCRDMAHDR	(\
>  	/* max supported RPC/RDMA header */ \
> -- 
> 1.7.6
> 
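For reference, the 512- and 1024-byte figures follow from sizeof(u64) == 8
and struct ib_phys_buf being a 64-bit address plus a 64-bit length. A quick
user-space sketch of the arithmetic (the struct below is an illustrative
stand-in, not the actual IB header definition):

	#include <stdio.h>
	#include <stdint.h>

	#define RPCRDMA_MAX_DATA_SEGS 64	/* the proposed value */

	/* Illustrative stand-in for the kernel's struct ib_phys_buf:
	 * a 64-bit address plus a 64-bit length, 16 bytes per entry. */
	struct example_phys_buf {
		uint64_t addr;
		uint64_t size;
	};

	int main(void)
	{
		/* FMR path: one 64-bit physical address per segment. */
		printf("physaddrs[]: %zu bytes\n",
		       sizeof(uint64_t) * RPCRDMA_MAX_DATA_SEGS);
		/* "default" (physical MR) path: one phys_buf per segment. */
		printf("ipb[]:       %zu bytes\n",
		       sizeof(struct example_phys_buf) * RPCRDMA_MAX_DATA_SEGS);
		return 0;
	}
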
Trond Myklebust July 22, 2011, 1:42 a.m. UTC | #2
On Thu, 2011-07-21 at 17:41 -0400, J. Bruce Fields wrote: 
> On Thu, Jul 21, 2011 at 01:49:02PM -0400, Steve Dickson wrote:
> > Our performance team has noticed that increasing
> > RPCRDMA_MAX_DATA_SEGS from 8 to 64 significantly
> > increases throughput when using the RDMA transport.
> 
> The main risk that I can see is that we have these on the stack in two
> places:
> 
> 	rpcrdma_register_fmr_external(struct rpcrdma_mr_seg *seg, ...
> 	{
> 		...
> 		u64 physaddrs[RPCRDMA_MAX_DATA_SEGS];
> 
> 	rpcrdma_register_default_external(struct rpcrdma_mr_seg *seg, ...
> 	{
> 		...
> 		struct ib_phys_buf ipb[RPCRDMA_MAX_DATA_SEGS]; 
> 
> Where ib_phys_buf is 16 bytes.
> 
> So that's 512 bytes in the first case, 1024 in the second.  This is
> called from rpciod--what are our rules about allocating memory from
> rpciod?

Is that allocated on the stack? We should always try to avoid 1024-byte
allocations on the stack, since that eats up a full 1/8th (or 1/4 in the
case of 4k stacks) of the total stack space.

If, OTOH, that memory is being allocated dynamically, then the rule is
"don't let rpciod sleep".

Cheers
  Trond
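Purely as a sketch of the non-sleeping option (the function and names below
are hypothetical, not the actual xprtrdma registration path), one shape this
could take:

	#include <linux/slab.h>
	#include <linux/types.h>

	/*
	 * Hypothetical example only: take the segment array off the stack
	 * with an allocation that cannot sleep, since this path runs
	 * under rpciod.
	 */
	static int example_register_fmr(unsigned int nsegs)
	{
		u64 *physaddrs;

		physaddrs = kmalloc(nsegs * sizeof(*physaddrs), GFP_NOWAIT);
		if (!physaddrs)
			return -ENOMEM;	/* callers must cope with failure */

		/* ... fill in the page addresses and post the registration ... */

		kfree(physaddrs);
		return 0;
	}
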
J. Bruce Fields July 22, 2011, 1:55 a.m. UTC | #3
On Thu, Jul 21, 2011 at 09:42:04PM -0400, Trond Myklebust wrote:
> On Thu, 2011-07-21 at 17:41 -0400, J. Bruce Fields wrote: 
> > On Thu, Jul 21, 2011 at 01:49:02PM -0400, Steve Dickson wrote:
> > > Our performance team has noticed that increasing
> > > RPCRDMA_MAX_DATA_SEGS from 8 to 64 significantly
> > > increases throughput when using the RDMA transport.
> > 
> > The main risk that I can see is that we have these on the stack in two
> > places:
> > 
> > 	rpcrdma_register_fmr_external(struct rpcrdma_mr_seg *seg, ...
> > 	{
> > 		...
> > 		u64 physaddrs[RPCRDMA_MAX_DATA_SEGS];
> > 
> > 	rpcrdma_register_default_external(struct rpcrdma_mr_seg *seg, ...
> > 	{
> > 		...
> > 		struct ib_phys_buf ipb[RPCRDMA_MAX_DATA_SEGS]; 
> > 
> > Where ib_phys_buf is 16 bytes.
> > 
> > So that's 512 bytes in the first case, 1024 in the second.  This is
> > called from rpciod--what are our rules about allocating memory from
> > rpciod?
> 
> Is that allocated on the stack? We should always try to avoid 1024-byte
> allocations on the stack, since that eats up a full 1/8th (or 1/4 in the
> case of 4k stacks) of the total stack space.

Right, it's on the stack, so I was wondering what we should do
instead....

> If, OTOH, that memory is being allocated dynamically, then the rule is
> "don't let rpciod sleep".

OK, so, looking around, the buf_alloc methods might provide examples to
follow for dynamic allocation here?

--b.
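One way to read that suggestion, sketched with hypothetical names rather than
the existing buf_alloc code: fold the arrays into a per-request scratch
structure sized up front, so the registration path neither burns stack nor
allocates under rpciod.

	#include <linux/slab.h>
	#include <linux/types.h>

	#define RPCRDMA_MAX_DATA_SEGS	64

	/*
	 * Hypothetical per-request scratch area, allocated once at request
	 * setup time (the way buf_alloc preallocates send/receive buffers),
	 * so the registration path itself never allocates or sleeps.
	 */
	struct example_req_scratch {
		u64 physaddrs[RPCRDMA_MAX_DATA_SEGS];	/* 512 bytes */
	};

	static struct example_req_scratch *example_scratch_alloc(gfp_t gfp)
	{
		return kmalloc(sizeof(struct example_req_scratch), gfp);
	}
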
Max Matveev July 22, 2011, 8:19 a.m. UTC | #4
On Thu, 21 Jul 2011 13:49:02 -0400, Steve Dickson wrote:

 steved> Our performance team has noticed that increasing
 steved> RPCRDMA_MAX_DATA_SEGS from 8 to 64 significantly
 steved> increases throughput when using the RDMA transport.

Did they try new client with old server and vice versa?
Both read and write?

max

Steve Dickson July 25, 2011, 3:18 p.m. UTC | #5
Sorry for the delayed response... I took a day off.

On 07/22/2011 04:19 AM, Max Matveev wrote:
> On Thu, 21 Jul 2011 13:49:02 -0400, Steve Dickson wrote:
> 
>  steved> Our performance team has noticed that increasing
>  steved> RPCRDMA_MAX_DATA_SEGS from 8 to 64 significantly
>  steved> increases throughput when using the RDMA transport.
> 
> Did they try new client with old server and vice versa?
> Both read and write?
I believe it was done on the server side, but I've cc'd the
person who did the testing...

steved.

Patch

diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index cae761a..5d1cfe5 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -109,7 +109,7 @@  struct rpcrdma_ep {
  */
 
 /* temporary static scatter/gather max */
-#define RPCRDMA_MAX_DATA_SEGS	(8)	/* max scatter/gather */
+#define RPCRDMA_MAX_DATA_SEGS	(64)	/* max scatter/gather */
 #define RPCRDMA_MAX_SEGS 	(RPCRDMA_MAX_DATA_SEGS + 2) /* head+tail = 2 */
 #define MAX_RPCRDMAHDR	(\
 	/* max supported RPC/RDMA header */ \
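
For context on why this constant bounds throughput (back-of-the-envelope
only, assuming the common case of one 4 KiB page per data segment): 8
segments caps the RDMA-carried payload of a single RPC at roughly 32 KiB,
while 64 segments allows roughly 256 KiB.

	#include <stdio.h>

	int main(void)
	{
		const unsigned int page_size = 4096;	/* assumed 4 KiB pages */

		/* Rough per-RPC payload ceiling: one page per data segment. */
		printf("8 segs : %u KiB\n",  8 * page_size / 1024);
		printf("64 segs: %u KiB\n", 64 * page_size / 1024);
		return 0;
	}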