Message ID | 20141016194000.13414.83844.stgit@manet.1015granger.net (mailing list archive)
---|---
State | New, archived
On 10/16/14 15:40, Chuck Lever wrote:
> So far, TCP is the only transport that supports bi-directional RPC.
>
> When mounting with NFSv4.1 using a transport that does not support
> bi-directional RPC, establish a TCP sidecar connection to handle
> backchannel traffic for a session. The sidecar transport does not
> use its forward channel except for sending BIND_CONN_TO_SESSION
> operations.
>
> This commit adds logic to create and destroy the sidecar transport.
> Subsequent commits add logic to use the transport.

I thought NFS v4.0 also uses a separate connection for the backchannel?
Can any of that code be reused here, rather than creating new sidecar
structures?

Anna

>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  fs/nfs/client.c           |    1 +
>  fs/nfs/nfs4client.c       |   54 +++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/nfs_fs_sb.h |    2 ++
>  3 files changed, 57 insertions(+)
>
> diff --git a/fs/nfs/client.c b/fs/nfs/client.c
> index 6a4f366..19f49bf 100644
> --- a/fs/nfs/client.c
> +++ b/fs/nfs/client.c
> @@ -78,6 +78,7 @@ const struct rpc_program nfs_program = {
>  	.stats			= &nfs_rpcstat,
>  	.pipe_dir_name		= NFS_PIPE_DIRNAME,
>  };
> +EXPORT_SYMBOL_GPL(nfs_program);
>
>  struct rpc_stat nfs_rpcstat = {
>  	.program		= &nfs_program
> diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
> index 5f4b818..b1cc35e 100644
> --- a/fs/nfs/nfs4client.c
> +++ b/fs/nfs/nfs4client.c
> @@ -213,6 +213,8 @@ static void nfs4_destroy_callback(struct nfs_client *clp)
>  {
>  	if (__test_and_clear_bit(NFS_CS_CALLBACK, &clp->cl_res_state))
>  		nfs_callback_down(clp->cl_mvops->minor_version, clp->cl_net);
> +	if (clp->cl_bc_rpcclient)
> +		rpc_shutdown_client(clp->cl_bc_rpcclient);
>  }
>
>  static void nfs4_shutdown_client(struct nfs_client *clp)
> @@ -291,6 +293,53 @@ int nfs40_init_client(struct nfs_client *clp)
>
>  #if defined(CONFIG_NFS_V4_1)
>
> +/*
> + * Create a separate rpc_clnt using TCP that can provide a
> + * backchannel service.
> + */
> +static int nfs41_create_sidecar_rpc_client(struct nfs_client *clp)
> +{
> +	struct sockaddr_storage address;
> +	struct sockaddr *sap = (struct sockaddr *)&address;
> +	struct rpc_create_args args = {
> +		.net		= clp->cl_net,
> +		.protocol	= XPRT_TRANSPORT_TCP,
> +		.address	= sap,
> +		.addrsize	= clp->cl_addrlen,
> +		.servername	= clp->cl_hostname,
> +		.program	= &nfs_program,
> +		.version	= clp->rpc_ops->version,
> +		.flags		= (RPC_CLNT_CREATE_DISCRTRY |
> +				   RPC_CLNT_CREATE_NOPING),
> +	};
> +	struct rpc_clnt *clnt;
> +	struct rpc_cred *cred;
> +
> +	if (rpc_xprt_is_bidirectional(clp->cl_rpcclient))
> +		return 0;
> +
> +	if (test_bit(NFS_CS_NORESVPORT, &clp->cl_flags))
> +		args.flags |= RPC_CLNT_CREATE_NONPRIVPORT;
> +	memcpy(sap, &clp->cl_addr, clp->cl_addrlen);
> +	rpc_set_port(sap, NFS_PORT);
> +	cred = nfs4_get_clid_cred(clp);
> +	if (cred) {
> +		args.authflavor = cred->cr_auth->au_flavor;
> +		put_rpccred(cred);
> +	} else
> +		args.authflavor = RPC_AUTH_UNIX;
> +
> +	clnt = rpc_create(&args);
> +	if (IS_ERR(clnt)) {
> +		dprintk("%s: cannot create side-car RPC client. Error = %ld\n",
> +			__func__, PTR_ERR(clnt));
> +		return PTR_ERR(clnt);
> +	}
> +
> +	clp->cl_bc_rpcclient = clnt;
> +	return 0;
> +}
> +
>  /**
>   * nfs41_init_client - nfs_client initialization tasks for NFSv4.1+
>   * @clp - nfs_client to initialize
> @@ -300,6 +349,11 @@ int nfs40_init_client(struct nfs_client *clp)
>  int nfs41_init_client(struct nfs_client *clp)
>  {
>  	struct nfs4_session *session = NULL;
> +	int ret;
> +
> +	ret = nfs41_create_sidecar_rpc_client(clp);
> +	if (ret)
> +		return ret;
>
>  	/*
>  	 * Create the session and mark it expired.
> diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
> index 922be2e..159d703 100644
> --- a/include/linux/nfs_fs_sb.h
> +++ b/include/linux/nfs_fs_sb.h
> @@ -87,6 +87,8 @@ struct nfs_client {
>
>  	/* The sequence id to use for the next CREATE_SESSION */
>  	u32			cl_seqid;
> +	/* The optional sidecar backchannel transport */
> +	struct rpc_clnt		*cl_bc_rpcclient;
>  	/* The flags used for obtaining the clientid during EXCHANGE_ID */
>  	u32			cl_exchange_flags;
>  	struct nfs4_session	*cl_session;	/* shared session */
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
On Oct 20, 2014, at 1:33 PM, Anna Schumaker <Anna.Schumaker@netapp.com> wrote:

> On 10/16/14 15:40, Chuck Lever wrote:
>> So far, TCP is the only transport that supports bi-directional RPC.
>>
>> When mounting with NFSv4.1 using a transport that does not support
>> bi-directional RPC, establish a TCP sidecar connection to handle
>> backchannel traffic for a session. The sidecar transport does not
>> use its forward channel except for sending BIND_CONN_TO_SESSION
>> operations.
>>
>> This commit adds logic to create and destroy the sidecar transport.
>> Subsequent commits add logic to use the transport.
>
> I thought NFS v4.0 also uses a separate connection for the backchannel?

For NFSv4.0, the server opens a connection to the client. For
NFSv4.1, the client opens the side car connection to the
server, and then performs BIND_CONN_TO_SESSION, an operation
not in NFSv4.0. The processes are not terribly similar.

> Can any of that code be reused here, rather than creating new sidecar structures?

Since it’s the client opening a connection in this case,
I don’t see any obvious common code paths that I haven’t
already re-used. I’m open to suggestions.
> Anna

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
On Mon, Oct 20, 2014 at 9:09 PM, Chuck Lever <chuck.lever@oracle.com> wrote:
> On Oct 20, 2014, at 1:33 PM, Anna Schumaker <Anna.Schumaker@netapp.com> wrote:
>> Can any of that code be reused here, rather than creating new sidecar
>> structures?
>
> Since it’s the client opening a connection in this case,
> I don’t see any obvious common code paths that I haven’t
> already re-used. I’m open to suggestions.

Why aren't we doing the callbacks via RDMA as per the recommendation
in RFC5667 section 5.1?
Hi Trond-

On Oct 20, 2014, at 3:40 PM, Trond Myklebust <trond.myklebust@primarydata.com> wrote:

> Why aren't we doing the callbacks via RDMA as per the recommendation
> in RFC5667 section 5.1?

There’s no benefit to it. With a side car, the server requires
few or no changes. There are no CB operations that benefit
from using RDMA. It’s very quick to implement, re-using most of
the client backchannel implementation that already exists.

I’ve discussed this with an author of RFC 5667 [cc’d], and also
with the implementors of an existing NFSv4.1 server that supports
RDMA. They both agree that a side car is an acceptable, or even a
preferable, way to approach backchannel support.

Also, when I discussed this with you months ago, you also felt
that a side car was better than adding backchannel support to the
xprtrdma transport. I took this approach only because you OK’d it.

But I don’t see an explicit recommendation in section 5.1. Which
text are you referring to?

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
On Mon, Oct 20, 2014 at 11:11 PM, Chuck Lever <chuck.lever@oracle.com> wrote:
> But I don’t see an explicit recommendation in section 5.1. Which
> text are you referring to?

The very first paragraph argues that because callback messages don't
carry bulk data, there is no problem with using RPC/RDMA and, in
particular, with using RDMA_MSG provided that the buffer sizes are
negotiated correctly.

So the questions are:

1) Where is the discussion of the merits for and against adding
bi-directional support to the xprtrdma layer in Linux? What is the
showstopper preventing implementation of a design based around
RFC5667?

2) Why do we instead have to solve the whole backchannel problem in
the NFSv4.1 layer, and where is the discussion of the merits for and
against that particular solution? As far as I can tell, it imposes at
least 2 extra requirements:
 a) NFSv4.1 client+server must have support either for session
    trunking or for clientid trunking
 b) NFSv4.1 client must be able to set up a TCP connection to the
    server (that can be session/clientid trunked with the existing
    RDMA channel)

All I've found so far on googling these questions is a 5 1/2 year old
email exchange between Tom Tucker and Ricardo where the conclusion
appears to be that we can, in time, implement both designs. However
there is no explanation of why we would want to do so.

http://comments.gmane.org/gmane.linux.nfs/22927
On Oct 20, 2014, at 6:31 PM, Trond Myklebust <trond.myklebust@primarydata.com> wrote:

> The very first paragraph argues that because callback messages don't
> carry bulk data, there is no problem with using RPC/RDMA and, in
> particular, with using RDMA_MSG provided that the buffer sizes are
> negotiated correctly.

The opening paragraph is advice that applies to all forms
of NFSv4 callback, including NFSv4.0, which uses a separate
transport initiated from the NFS server. Specific advice about
NFSv4.1 bi-directional RPC is left to the next two paragraphs,
but they suggest there be dragons. I rather think this is a
warning not to “go there.”

> So the questions are:
>
> 1) Where is the discussion of the merits for and against adding
> bi-directional support to the xprtrdma layer in Linux? What is the
> showstopper preventing implementation of a design based around
> RFC5667?

There is no show-stopper (see Section 5.1, after all). It’s
simply a matter of development effort: a side-car is much
less work than implementing full RDMA backchannel support for
both a client and server, especially since TCP backchannel
already works and can be used immediately.

Also, no problem with eventually implementing RDMA backchannel
if the complexity, and any performance overhead it introduces in
the forward channel, can be justified. The client can use the
CREATE_SESSION flags to detect what a server supports.

> 2) Why do we instead have to solve the whole backchannel problem in
> the NFSv4.1 layer, and where is the discussion of the merits for and
> against that particular solution? As far as I can tell, it imposes at
> least 2 extra requirements:
>  a) NFSv4.1 client+server must have support either for session
>     trunking or for clientid trunking

Very minimal trunking support. The only operation allowed on
the TCP side-car's forward channel is BIND_CONN_TO_SESSION.

Bruce told me that associating multiple transports to a
clientid/session should not be an issue for his server (his
words were “if that doesn’t work, it’s a bug”).

Would this restrictive form of trunking present a problem?

>  b) NFSv4.1 client must be able to set up a TCP connection to the
>     server (that can be session/clientid trunked with the existing
>     RDMA channel)

Also very minimal changes. The changes are already done,
posted in v1 of this patch series.

> All I've found so far on googling these questions is a 5 1/2 year old
> email exchange between Tom Tucker and Ricardo where the conclusion
> appears to be that we can, in time, implement both designs.

You and I spoke about this on Feb 13, 2014 during pub night. At
the time you stated that a side-car was the only spec-compliant
way to approach this. I said I would go forward with the idea in
Linux, and you did not object.

> However
> there is no explanation of why we would want to do so.
> http://comments.gmane.org/gmane.linux.nfs/22927

I’ve implemented exactly what Ricardo proposed in this thread,
including dealing with connection loss:

>> The thinking is that NFSRDMA could initially use a TCP callback channel.
>> We'll implement BIND_CONN_TO_SESSION so that the backchannel does not
>> need to be tied to the forechannel connection. This should address the
>> case where you have NFSRDMA for the forechannel and TCP for the
>> backchannel. BIND_CONN_TO_SESSION is also required to reestablish
>> dropped connections effectively (to avoid losing the reply cache).

And here’s what you had to say in support of the idea:

> Given what they're hoping to achieve, I'm fine with
> doing a simple implementation of sessions first, then progressively
> refining it.

What’s the next step?

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
On Tue, Oct 21, 2014 at 4:06 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
> The opening paragraph is advice that applies to all forms
> of NFSv4 callback, including NFSv4.0, which uses a separate
> transport initiated from the NFS server. Specific advice about
> NFSv4.1 bi-directional RPC is left to the next two paragraphs,
> but they suggest there be dragons. I rather think this is a
> warning not to “go there.”

All I see is a warning not to rely on XIDs to distinguish between
backchannel and forward channel RPC requests. How is that particular
to RDMA?

> Also, no problem with eventually implementing RDMA backchannel
> if the complexity, and any performance overhead it introduces in
> the forward channel, can be justified. The client can use the
> CREATE_SESSION flags to detect what a server supports.

What complexity and performance overhead does it introduce in the
forward channel? I've never been presented with a complete discussion
of this alternative either from you or from anybody else.

> Also very minimal changes. The changes are already done,
> posted in v1 of this patch series.

I'm not asking for details on the size of the changesets, but for a
justification of the design itself. If it is possible to confine all
the changes to the RPC/RDMA layer, then why consider patches that
change the NFSv4.1 layer at all?
On Oct 21, 2014, at 3:45 AM, Trond Myklebust <trond.myklebust@primarydata.com> wrote:

> What complexity and performance overhead does it introduce in the
> forward channel?

The benefit of RDMA is that there are opportunities to
reduce host CPU interaction with incoming data.
Bi-direction requires that the transport look at the RPC
header to determine the direction of the message. That
could have an impact on the forward channel, but it’s
never been measured, to my knowledge.

The reason this is more of an issue for RPC/RDMA is that
a copy of the XID appears in the RPC/RDMA header to avoid
the need to look at the RPC header. That’s typically what
implementations use to steer RPC reply processing.

Often the RPC/RDMA header and RPC header land in
disparate buffers. The RPC/RDMA reply handler looks
strictly at the RPC/RDMA header, and runs in a tasklet
usually on a different CPU. Adding bi-direction would mean
the transport would have to peek into the upper layer
headers, possibly resulting in cache line bouncing.

The complexity would be the addition of over a hundred
new lines of code on the client, and possibly a similar
amount of new code on the server. Small, perhaps, but
not insignificant.

> I'm not asking for details on the size of the changesets, but for a
> justification of the design itself.

The size of the changeset _is_ the justification. It’s
a much less invasive change to add a TCP side-car than
it is to implement RDMA backchannel on both server and
client.

Most servers would require almost no change. Linux needs
only a bug fix or two. Effectively zero-impact for
servers that already support NFSv4.0 on RDMA to get
NFSv4.1 and pNFS on RDMA, with working callbacks.

That’s really all there is to it. It’s almost entirely a
practical consideration: we have the infrastructure and
can make it work in just a few lines of code.

> If it is possible to confine all
> the changes to the RPC/RDMA layer, then why consider patches that
> change the NFSv4.1 layer at all?

The fast new transport bring-up benefit is probably the
biggest win. A TCP side-car makes bringing up any new
transport implementation simpler.

And, RPC/RDMA offers zero performance benefit for
backchannel traffic, especially since CB traffic would
never move via RDMA READ/WRITE (as per RFC 5667 section
5.1).

The primary benefit to doing an RPC/RDMA-only solution
is that there is no upper layer impact. Is that a design
requirement?

There’s also been no discussion of issues with adding a
very restricted amount of transport trunking. Can you
elaborate on the problems this could introduce?

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
On Tue, Oct 21, 2014 at 8:11 PM, Chuck Lever <chuck.lever@oracle.com> wrote:
> Often the RPC/RDMA header and RPC header land in
> disparate buffers. The RPC/RDMA reply handler looks
> strictly at the RPC/RDMA header, and runs in a tasklet
> usually on a different CPU. Adding bi-direction would mean
> the transport would have to peek into the upper layer
> headers, possibly resulting in cache line bouncing.

Under what circumstances would you expect to receive a valid NFSv4.1
callback with an RDMA header that spans multiple cache lines?

> The complexity would be the addition of over a hundred
> new lines of code on the client, and possibly a similar
> amount of new code on the server. Small, perhaps, but
> not insignificant.

Until there are RDMA users, I care a lot less about code changes to
xprtrdma than to NFS.

> The size of the changeset _is_ the justification. It’s
> a much less invasive change to add a TCP side-car than
> it is to implement RDMA backchannel on both server and
> client.

Please define your use of the word "invasive" in the above context. To
me "invasive" means "will affect code that is in use by others".

> The fast new transport bring-up benefit is probably the
> biggest win. A TCP side-car makes bringing up any new
> transport implementation simpler.

That's an assertion that assumes:
- we actually want to implement more transports aside from RDMA.
- implementing bi-directional transports in the RPC layer is non-simple

Right now, the benefit is only to RDMA users. Nobody else is asking
for such a change.

> There’s also been no discussion of issues with adding a
> very restricted amount of transport trunking. Can you
> elaborate on the problems this could introduce?

I haven't looked into those problems and I'm not the one asking
anyone to implement trunking.
> On Oct 22, 2014, at 4:39 AM, Trond Myklebust <trond.myklebust@primarydata.com> wrote:
>
>> On Tue, Oct 21, 2014 at 8:11 PM, Chuck Lever <chuck.lever@oracle.com> wrote:
>>
>> [snip]
>>
>> Often the RPC/RDMA header and RPC header land in
>> disparate buffers. The RPC/RDMA reply handler looks
>> strictly at the RPC/RDMA header, and runs in a tasklet
>> usually on a different CPU. Adding bi-direction would mean
>> the transport would have to peek into the upper layer
>> headers, possibly resulting in cache line bouncing.
>
> Under what circumstances would you expect to receive a valid NFSv4.1
> callback with an RDMA header that spans multiple cache lines?

The RPC header and RPC/RDMA header are separate entities, but
together can span multiple cache lines if the server has returned a
chunk list containing multiple entries.

For example, RDMA_NOMSG would send the RPC/RDMA header
via RDMA SEND with a chunk list that represents the RPC and NFS
payload. That list could make the header larger than 32 bytes.

I expect that any callback that involves more than 1024 bytes of
RPC payload will need to use RDMA_NOMSG. A long device
info list might fit that category?

>> [snip]
>>
>>>> Also very minimal changes. The changes are already done,
>>>> posted in v1 of this patch series.
>>>
>>> I'm not asking for details on the size of the changesets, but for a
>>> justification of the design itself.
>>
>> The size of the changeset _is_ the justification. It’s
>> a much less invasive change to add a TCP side-car than
>> it is to implement RDMA backchannel on both server and
>> client.
>
> Please define your use of the word "invasive" in the above context. To
> me "invasive" means "will affect code that is in use by others".

The server side, then, is non-invasive. The client side makes minor
changes to state management.

>> [snip]
>>
>> The fast new transport bring-up benefit is probably the
>> biggest win. A TCP side-car makes bringing up any new
>> transport implementation simpler.
>
> That's an assertion that assumes:
> - we actually want to implement more transports aside from RDMA

So you no longer consider RPC/SCTP a possibility?

> - implementing bi-directional transports in the RPC layer is non-simple

I don't care to generalize about that. In the RPC/RDMA case, there
are some complications that make it non-simple, but not impossible.
So we have an example of a non-simple case, IMO.

> Right now, the benefit is only to RDMA users. Nobody else is asking
> for such a change.
>
>> And, RPC/RDMA offers zero performance benefit for
>> backchannel traffic, especially since CB traffic would
>> never move via RDMA READ/WRITE (as per RFC 5667 section
>> 5.1).
>>
>> The primary benefit to doing an RPC/RDMA-only solution
>> is that there is no upper layer impact. Is that a design
>> requirement?

Based on your objections, it appears that "no upper layer
impact" is a hard design requirement. I will take this as a
NACK for the side-car approach.

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Oct 22, 2014 at 8:20 PM, Chuck Lever <chuck.lever@oracle.com> wrote:
>
>> On Oct 22, 2014, at 4:39 AM, Trond Myklebust <trond.myklebust@primarydata.com> wrote:
>>
>> [snip]
>
> I expect that any callback that involves more than 1024 bytes of
> RPC payload will need to use RDMA_NOMSG. A long device
> info list might fit that category?

Right, but are there any callbacks that would do that? AFAICS, most of
them are CB_SEQUENCE+(PUT_FH+CB_do_some_recall_operation_on_this_file
| some single CB_operation)

The point is that we can set finite limits on the size of callbacks in
the CREATE_SESSION. As long as those limits are reasonable (and 1K
does seem more than reasonable for existing use cases) then why
shouldn't we be able to expect the server to use RDMA_MSG?

>>>> [snip]
>>>>
>>>> The fast new transport bring-up benefit is probably the
>>>> biggest win. A TCP side-car makes bringing up any new
>>>> transport implementation simpler.
>>
>> That's an assertion that assumes:
>> - we actually want to implement more transports aside from RDMA
>
> So you no longer consider RPC/SCTP a possibility?

I'd still like to consider it, but the whole point would be to _avoid_
doing trunking in the NFS layer. SCTP does trunking/multi-pathing at
the transport level, meaning that we don't have to deal with tracking
connections, state, replaying messages, etc.
Doing bi-directional RPC with SCTP is not an issue, since the
transport is fully symmetric.

>> - implementing bi-directional transports in the RPC layer is non-simple
>
> I don't care to generalize about that. In the RPC/RDMA case, there
> are some complications that make it non-simple, but not impossible.
> So we have an example of a non-simple case, IMO.
>
> [snip]
>
> Based on your objections, it appears that "no upper layer
> impact" is a hard design requirement. I will take this as a
> NACK for the side-car approach.

There is not a hard NACK yet, but I am asking for stronger
justification. I do _not_ want to find myself in a situation 2 or 3
years down the road where I have to argue against someone telling me
that we additionally have to implement callbacks over IB/RDMA because
the TCP sidecar is an incomplete solution. We should do either one or
the other, but not both...
On Oct 22, 2014, at 4:53 PM, Trond Myklebust <trond.myklebust@primarydata.com> wrote:

> On Wed, Oct 22, 2014 at 8:20 PM, Chuck Lever <chuck.lever@oracle.com> wrote:
>
> [snip]
>
>> I expect that any callback that involves more than 1024 bytes of
>> RPC payload will need to use RDMA_NOMSG. A long device
>> info list might fit that category?
>
> Right, but are there any callbacks that would do that? AFAICS, most of
> them are CB_SEQUENCE+(PUT_FH+CB_do_some_recall_operation_on_this_file
> | some single CB_operation)

That is a question only a pNFS layout developer can answer.
Allowing larger CB operations might be important. I thought I heard
Matt list a couple of examples that might move bulk data via CB,
but probably none that have implementations currently. I’m not
familiar with block or flex-file, so I can’t make any kind of guess
about those.

> The point is that we can set finite limits on the size of callbacks in
> the CREATE_SESSION. As long as those limits are reasonable (and 1K
> does seem more than reasonable for existing use cases) then why
> shouldn't we be able to expect the server to use RDMA_MSG?

The spec allows both RDMA_MSG and RDMA_NOMSG for CB RPC. That
provides some implementation flexibility, but it also means either
end can use either MSG type. An interoperable CB service would have
to support all scenarios.
RDMA_NOMSG can support small or large payloads. RDMA_MSG can
support only a payload that fits in the receiver’s pre-posted buffer
(and that includes the RPC and NFS headers) because NFS CB RPCs are
not allowed to use read or write chunks.

Since there are currently no implementations, I was hoping everyone
might agree to stick with using only RDMA_NOMSG for NFSv4.1 CB RPCs
on RPC/RDMA. That would mean there was a high limit on CB RPC payload
size, and RDMA READ would be used to move CB RPC calls and replies in
all cases.

NFS uses NOMSG so infrequently that it would be a good way to limit
churn in the transport’s hot paths. In particular, NFS READ and WRITE
are required to use RDMA_MSG with their payload encoded in chunks. If
the incoming message is MSG, then clearly it can’t be a reverse RPC,
and there’s no need to go looking for one (as long as we all agree on
the “CBs use only NOMSG” convention). The receiver can tell just by
looking at the RPC/RDMA header that extra processing won’t be needed
in the common case.

Clearly we need a prototype or two to understand these issues, and
probably some WG discussion is warranted.
>> [snip]
>
> There is not a hard NACK yet, but I am asking for stronger
> justification. I do _not_ want to find myself in a situation 2 or 3
> years down the road where I have to argue against someone telling me
> that we additionally have to implement callbacks over IB/RDMA because
> the TCP sidecar is an incomplete solution. We should do either one or
> the other, but not both…

It is impossible to predict the future.
However, I’m not sure there’s a problem building both eventually,
especially because there is no spec guidance I’m aware of about which
bi-directional RPC mechanisms MUST be supported for NFS/RDMA.

I might be wrong, but we have enough mechanism in CREATE_SESSION for
a client with bi-directional RPC/RDMA support to detect a server that
lacks it, and use the side-car only in that case.

If we need more discussion, I can drop the side-car patches for 3.19
and do some more research.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
On Tue, Oct 21, 2014 at 01:11:26PM -0400, Chuck Lever wrote:
> The size of the changeset _is_ the justification. It’s
> a much less invasive change to add a TCP side-car than
> it is to implement RDMA backchannel on both server and
> client.

Something I'm confused about: is bidirectional RPC/RDMA optional or
mandatory for servers to implement? Something somewhere has to be
mandatory if we want to guarantee a working backchannel between any
two implementations.

--b.
On Oct 23, 2014, at 9:32 AM, J. Bruce Fields <bfields@fieldses.org> wrote:

> On Tue, Oct 21, 2014 at 01:11:26PM -0400, Chuck Lever wrote:
>> The size of the changeset _is_ the justification. It’s
>> a much less invasive change to add a TCP side-car than
>> it is to implement RDMA backchannel on both server and
>> client.
>
> Something I'm confused about: is bidirectional RPC/RDMA optional or
> mandatory for servers to implement?

IMO bi-directional RPC/RDMA is not required anywhere in RFC 5666
(the RPC/RDMA spec) or RFC 5667 (the NFS on RPC/RDMA spec).

> Something somewhere has to be mandatory if we want to guarantee a
> working backchannel between any two implementations.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 6a4f366..19f49bf 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -78,6 +78,7 @@ const struct rpc_program nfs_program = {
 	.stats			= &nfs_rpcstat,
 	.pipe_dir_name		= NFS_PIPE_DIRNAME,
 };
+EXPORT_SYMBOL_GPL(nfs_program);
 
 struct rpc_stat nfs_rpcstat = {
 	.program		= &nfs_program
diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
index 5f4b818..b1cc35e 100644
--- a/fs/nfs/nfs4client.c
+++ b/fs/nfs/nfs4client.c
@@ -213,6 +213,8 @@ static void nfs4_destroy_callback(struct nfs_client *clp)
 {
 	if (__test_and_clear_bit(NFS_CS_CALLBACK, &clp->cl_res_state))
 		nfs_callback_down(clp->cl_mvops->minor_version, clp->cl_net);
+	if (clp->cl_bc_rpcclient)
+		rpc_shutdown_client(clp->cl_bc_rpcclient);
 }
 
 static void nfs4_shutdown_client(struct nfs_client *clp)
@@ -291,6 +293,53 @@ int nfs40_init_client(struct nfs_client *clp)
 
 #if defined(CONFIG_NFS_V4_1)
 
+/*
+ * Create a separate rpc_clnt using TCP that can provide a
+ * backchannel service.
+ */
+static int nfs41_create_sidecar_rpc_client(struct nfs_client *clp)
+{
+	struct sockaddr_storage address;
+	struct sockaddr *sap = (struct sockaddr *)&address;
+	struct rpc_create_args args = {
+		.net		= clp->cl_net,
+		.protocol	= XPRT_TRANSPORT_TCP,
+		.address	= sap,
+		.addrsize	= clp->cl_addrlen,
+		.servername	= clp->cl_hostname,
+		.program	= &nfs_program,
+		.version	= clp->rpc_ops->version,
+		.flags		= (RPC_CLNT_CREATE_DISCRTRY |
+				   RPC_CLNT_CREATE_NOPING),
+	};
+	struct rpc_clnt *clnt;
+	struct rpc_cred *cred;
+
+	if (rpc_xprt_is_bidirectional(clp->cl_rpcclient))
+		return 0;
+
+	if (test_bit(NFS_CS_NORESVPORT, &clp->cl_flags))
+		args.flags |= RPC_CLNT_CREATE_NONPRIVPORT;
+	memcpy(sap, &clp->cl_addr, clp->cl_addrlen);
+	rpc_set_port(sap, NFS_PORT);
+	cred = nfs4_get_clid_cred(clp);
+	if (cred) {
+		args.authflavor = cred->cr_auth->au_flavor;
+		put_rpccred(cred);
+	} else
+		args.authflavor = RPC_AUTH_UNIX;
+
+	clnt = rpc_create(&args);
+	if (IS_ERR(clnt)) {
+		dprintk("%s: cannot create side-car RPC client. Error = %ld\n",
+			__func__, PTR_ERR(clnt));
+		return PTR_ERR(clnt);
+	}
+
+	clp->cl_bc_rpcclient = clnt;
+	return 0;
+}
+
 /**
  * nfs41_init_client - nfs_client initialization tasks for NFSv4.1+
  * @clp - nfs_client to initialize
  */
@@ -300,6 +349,11 @@ int nfs40_init_client(struct nfs_client *clp)
 int nfs41_init_client(struct nfs_client *clp)
 {
 	struct nfs4_session *session = NULL;
+	int ret;
+
+	ret = nfs41_create_sidecar_rpc_client(clp);
+	if (ret)
+		return ret;
 
 	/*
 	 * Create the session and mark it expired.
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 922be2e..159d703 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -87,6 +87,8 @@ struct nfs_client {
 	/* The sequence id to use for the next CREATE_SESSION */
 	u32			cl_seqid;
+	/* The optional sidecar backchannel transport */
+	struct rpc_clnt		*cl_bc_rpcclient;
 	/* The flags used for obtaining the clientid during EXCHANGE_ID */
 	u32			cl_exchange_flags;
 	struct nfs4_session	*cl_session;	/* shared session */
So far, TCP is the only transport that supports bi-directional RPC.

When mounting with NFSv4.1 using a transport that does not support
bi-directional RPC, establish a TCP sidecar connection to handle
backchannel traffic for a session. The sidecar transport does not
use its forward channel except for sending BIND_CONN_TO_SESSION
operations.

This commit adds logic to create and destroy the sidecar transport.
Subsequent commits add logic to use the transport.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfs/client.c           |    1 +
 fs/nfs/nfs4client.c       |   54 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/nfs_fs_sb.h |    2 ++
 3 files changed, 57 insertions(+)