Message ID | 20170428172535.7945-3-trond.myklebust@primarydata.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Hi Trond- > On Apr 28, 2017, at 1:25 PM, Trond Myklebust <trond.myklebust@primarydata.com> wrote: > > Allow the user to specify that the client should use multiple connections > to the server. For the moment, this functionality will be limited to > TCP and to NFSv4.x (x>0). Some initial reactions: - 5/5 could be squashed into this patch (2/5). - 4/5 adds support for using NFSv3 with a DS. Why can't you add NFSv3 support for multiple connections in the non-pNFS case? If there is a good reason, it should be stated in 4/5's patch description or added as a comment somewhere in the source code. - Testing with a Linux server shows that the basic NFS/RDMA pieces work, but any OPEN operation gets NFS4ERR_GRACE, forever, when I use nconnect > 1. I'm looking into it. - Testing with a Solaris 12 server prototype that supports NFSv4.1 works fine with nconnect=[23]. Not seeing much performance improvement there because the server is using QDR and a single SATA SSD. Thus I don't see a strong need to keep the TCP-only limitation. However, if you do keep it, the logic that implements the second sentence in the patch description above is added in 3/5. Should this sentence be in that patch description instead? Or, instead: s/For the moment/In a subsequent patch > Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> > --- > fs/nfs/internal.h | 1 + > fs/nfs/super.c | 10 ++++++++++ > 2 files changed, 11 insertions(+) > > diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h > index 31b26cf1b476..31757a742e9b 100644 > --- a/fs/nfs/internal.h > +++ b/fs/nfs/internal.h > @@ -117,6 +117,7 @@ struct nfs_parsed_mount_data { > char *export_path; > int port; > unsigned short protocol; > + unsigned short nconnect; > } nfs_server; > > struct security_mnt_opts lsm_opts; > diff --git a/fs/nfs/super.c b/fs/nfs/super.c > index 54e0f9f2dd94..7eb48934dc79 100644 > --- a/fs/nfs/super.c > +++ b/fs/nfs/super.c > @@ -76,6 +76,8 @@ > #define NFS_DEFAULT_VERSION 2 > #endif > > +#define NFS_MAX_CONNECTIONS 16 > + > enum { > /* Mount options that take no arguments */ > Opt_soft, Opt_hard, > @@ -107,6 +109,7 @@ enum { > Opt_nfsvers, > Opt_sec, Opt_proto, Opt_mountproto, Opt_mounthost, > Opt_addr, Opt_mountaddr, Opt_clientaddr, > + Opt_nconnect, > Opt_lookupcache, > Opt_fscache_uniq, > Opt_local_lock, > @@ -179,6 +182,8 @@ static const match_table_t nfs_mount_option_tokens = { > { Opt_mounthost, "mounthost=%s" }, > { Opt_mountaddr, "mountaddr=%s" }, > > + { Opt_nconnect, "nconnect=%s" }, > + > { Opt_lookupcache, "lookupcache=%s" }, > { Opt_fscache_uniq, "fsc=%s" }, > { Opt_local_lock, "local_lock=%s" }, > @@ -1544,6 +1549,11 @@ static int nfs_parse_mount_options(char *raw, > if (mnt->mount_server.addrlen == 0) > goto out_invalid_address; > break; > + case Opt_nconnect: > + if (nfs_get_option_ul_bound(args, &option, 1, NFS_MAX_CONNECTIONS)) > + goto out_invalid_value; > + mnt->nfs_server.nconnect = option; > + break; > case Opt_lookupcache: > string = match_strdup(args); > if (string == NULL) > -- > 2.9.3 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Chuck Lever -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> On May 4, 2017, at 9:45 AM, Chuck Lever <chuck.lever@oracle.com> wrote: > > Hi Trond- > > >> On Apr 28, 2017, at 1:25 PM, Trond Myklebust <trond.myklebust@primarydata.com> wrote: >> >> Allow the user to specify that the client should use multiple connections >> to the server. For the moment, this functionality will be limited to >> TCP and to NFSv4.x (x>0). > > Some initial reactions: > > - 5/5 could be squashed into this patch (2/5). > > - 4/5 adds support for using NFSv3 with a DS. Why can't you add NFSv3 > support for multiple connections in the non-pNFS case? If there is a > good reason, it should be stated in 4/5's patch description or added > as a comment somewhere in the source code. > > - Testing with a Linux server shows that the basic NFS/RDMA pieces > work, but any OPEN operation gets NFS4ERR_GRACE, forever, when I use > nconnect > 1. I'm looking into it. > > - Testing with a Solaris 12 server prototype that supports NFSv4.1 > works fine with nconnect=[23]. Not seeing much performance improvement > there because the server is using QDR and a single SATA SSD. > > Thus I don't see a strong need to keep the TCP-only limitation. However, > if you do keep it, the logic that implements the second sentence in the > patch description above is added in 3/5. Should this sentence be in that > patch description instead? Or, instead: > > s/For the moment/In a subsequent patch Oops, I forgot to mention: mountstats data looks a little confused when nconnect > 1. For example: WRITE: 3075342 ops (131%) avg bytes sent per op: 26829 avg bytes received per op: 113 backlog wait: 162.375128 RTT: 1.481101 total execute time: 163.861735 (milliseconds) Haven't looked closely at that 131%, but it could be either the kernel or the script itself is assuming one connection per mount. >> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> >> --- >> fs/nfs/internal.h | 1 + >> fs/nfs/super.c | 10 ++++++++++ >> 2 files changed, 11 insertions(+) >> >> diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h >> index 31b26cf1b476..31757a742e9b 100644 >> --- a/fs/nfs/internal.h >> +++ b/fs/nfs/internal.h >> @@ -117,6 +117,7 @@ struct nfs_parsed_mount_data { >> char *export_path; >> int port; >> unsigned short protocol; >> + unsigned short nconnect; >> } nfs_server; >> >> struct security_mnt_opts lsm_opts; >> diff --git a/fs/nfs/super.c b/fs/nfs/super.c >> index 54e0f9f2dd94..7eb48934dc79 100644 >> --- a/fs/nfs/super.c >> +++ b/fs/nfs/super.c >> @@ -76,6 +76,8 @@ >> #define NFS_DEFAULT_VERSION 2 >> #endif >> >> +#define NFS_MAX_CONNECTIONS 16 >> + >> enum { >> /* Mount options that take no arguments */ >> Opt_soft, Opt_hard, >> @@ -107,6 +109,7 @@ enum { >> Opt_nfsvers, >> Opt_sec, Opt_proto, Opt_mountproto, Opt_mounthost, >> Opt_addr, Opt_mountaddr, Opt_clientaddr, >> + Opt_nconnect, >> Opt_lookupcache, >> Opt_fscache_uniq, >> Opt_local_lock, >> @@ -179,6 +182,8 @@ static const match_table_t nfs_mount_option_tokens = { >> { Opt_mounthost, "mounthost=%s" }, >> { Opt_mountaddr, "mountaddr=%s" }, >> >> + { Opt_nconnect, "nconnect=%s" }, >> + >> { Opt_lookupcache, "lookupcache=%s" }, >> { Opt_fscache_uniq, "fsc=%s" }, >> { Opt_local_lock, "local_lock=%s" }, >> @@ -1544,6 +1549,11 @@ static int nfs_parse_mount_options(char *raw, >> if (mnt->mount_server.addrlen == 0) >> goto out_invalid_address; >> break; >> + case Opt_nconnect: >> + if (nfs_get_option_ul_bound(args, &option, 1, NFS_MAX_CONNECTIONS)) >> + goto out_invalid_value; >> + mnt->nfs_server.nconnect = option; >> + break; >> case Opt_lookupcache: >> string = match_strdup(args); >> if (string == NULL) >> -- >> 2.9.3 >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- > Chuck Lever > > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Chuck Lever -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> On May 4, 2017, at 9:45 AM, Chuck Lever <chuck.lever@oracle.com> wrote: > > - Testing with a Linux server shows that the basic NFS/RDMA pieces > work, but any OPEN operation gets NFS4ERR_GRACE, forever, when I use > nconnect > 1. I'm looking into it. Reproduced with NFSv4.1, TCP, and nconnect=2. 363 /* 364 * RFC5661 18.51.3 365 * Before RECLAIM_COMPLETE done, server should deny new lock 366 */ 367 if (nfsd4_has_session(cstate) && 368 !test_bit(NFSD4_CLIENT_RECLAIM_COMPLETE, 369 &cstate->session->se_client->cl_flags) && 370 open->op_claim_type != NFS4_OPEN_CLAIM_PREVIOUS) 371 return nfserr_grace; Server-side instrumentation confirms: May 4 11:28:29 klimt kernel: nfsd4_open: has_session returns true May 4 11:28:29 klimt kernel: nfsd4_open: RECLAIM_COMPLETE is false May 4 11:28:29 klimt kernel: nfsd4_open: claim_type is 0 Network capture shows the RPCs are interleaved between the two connections as the client establishes its lease, and that appears to be confusing the server. C1: NULL -> NFS4_OK C1: EXCHANGE_ID -> NFS4_OK C2: CREATE_SESSION -> NFS4_OK C1: RECLAIM_COMPLETE -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION C1: PUTROOTFH | GETATTR -> NFS4ERR_SEQ_MISORDERED C2: SEQUENCE -> NFS4_OK C1: PUTROOTFH | GETATTR -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION C1: BIND_CONN_TO_SESSION -> NFS4_OK C2: BIND_CONN_TO_SESSION -> NFS4_OK C2: PUTROOTFH | GETATTR -> NFS4ERR_SEQ_MISORDERED .... mix of GETATTRs and other simple requests .... C1: OPEN -> NFS4ERR_GRACE C2: OPEN -> NFS4ERR_GRACE The RECLAIM_COMPLETE operation failed, and the client does not retry it. That leaves its lease stuck in GRACE. -- Chuck Lever -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, May 04, 2017 at 12:01:29PM -0400, Chuck Lever wrote: > > > On May 4, 2017, at 9:45 AM, Chuck Lever <chuck.lever@oracle.com> wrote: > > > > - Testing with a Linux server shows that the basic NFS/RDMA pieces > > work, but any OPEN operation gets NFS4ERR_GRACE, forever, when I use > > nconnect > 1. I'm looking into it. > > Reproduced with NFSv4.1, TCP, and nconnect=2. > > 363 /* > 364 * RFC5661 18.51.3 > 365 * Before RECLAIM_COMPLETE done, server should deny new lock > 366 */ > 367 if (nfsd4_has_session(cstate) && > 368 !test_bit(NFSD4_CLIENT_RECLAIM_COMPLETE, > 369 &cstate->session->se_client->cl_flags) && > 370 open->op_claim_type != NFS4_OPEN_CLAIM_PREVIOUS) > 371 return nfserr_grace; > > Server-side instrumentation confirms: > > May 4 11:28:29 klimt kernel: nfsd4_open: has_session returns true > May 4 11:28:29 klimt kernel: nfsd4_open: RECLAIM_COMPLETE is false > May 4 11:28:29 klimt kernel: nfsd4_open: claim_type is 0 > > Network capture shows the RPCs are interleaved between the two > connections as the client establishes its lease, and that appears > to be confusing the server. > > C1: NULL -> NFS4_OK > C1: EXCHANGE_ID -> NFS4_OK > C2: CREATE_SESSION -> NFS4_OK > C1: RECLAIM_COMPLETE -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION What security flavors are involved? I believe the correct behavior depends on whether gss is in use or not. --b. > C1: PUTROOTFH | GETATTR -> NFS4ERR_SEQ_MISORDERED > C2: SEQUENCE -> NFS4_OK > C1: PUTROOTFH | GETATTR -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION > C1: BIND_CONN_TO_SESSION -> NFS4_OK > C2: BIND_CONN_TO_SESSION -> NFS4_OK > C2: PUTROOTFH | GETATTR -> NFS4ERR_SEQ_MISORDERED > > .... mix of GETATTRs and other simple requests .... > > C1: OPEN -> NFS4ERR_GRACE > C2: OPEN -> NFS4ERR_GRACE > > The RECLAIM_COMPLETE operation failed, and the client does not > retry it. That leaves its lease stuck in GRACE. > > > -- > Chuck Lever > > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> On May 4, 2017, at 1:36 PM, bfields@fieldses.org wrote: > > On Thu, May 04, 2017 at 12:01:29PM -0400, Chuck Lever wrote: >> >>> On May 4, 2017, at 9:45 AM, Chuck Lever <chuck.lever@oracle.com> wrote: >>> >>> - Testing with a Linux server shows that the basic NFS/RDMA pieces >>> work, but any OPEN operation gets NFS4ERR_GRACE, forever, when I use >>> nconnect > 1. I'm looking into it. >> >> Reproduced with NFSv4.1, TCP, and nconnect=2. >> >> 363 /* >> 364 * RFC5661 18.51.3 >> 365 * Before RECLAIM_COMPLETE done, server should deny new lock >> 366 */ >> 367 if (nfsd4_has_session(cstate) && >> 368 !test_bit(NFSD4_CLIENT_RECLAIM_COMPLETE, >> 369 &cstate->session->se_client->cl_flags) && >> 370 open->op_claim_type != NFS4_OPEN_CLAIM_PREVIOUS) >> 371 return nfserr_grace; >> >> Server-side instrumentation confirms: >> >> May 4 11:28:29 klimt kernel: nfsd4_open: has_session returns true >> May 4 11:28:29 klimt kernel: nfsd4_open: RECLAIM_COMPLETE is false >> May 4 11:28:29 klimt kernel: nfsd4_open: claim_type is 0 >> >> Network capture shows the RPCs are interleaved between the two >> connections as the client establishes its lease, and that appears >> to be confusing the server. >> >> C1: NULL -> NFS4_OK >> C1: EXCHANGE_ID -> NFS4_OK >> C2: CREATE_SESSION -> NFS4_OK >> C1: RECLAIM_COMPLETE -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION > > What security flavors are involved? I believe the correct behavior > depends on whether gss is in use or not. The mount options are "sec=sys" but both sides have a keytab. So the lease management operations are done with krb5i. > --b. > >> C1: PUTROOTFH | GETATTR -> NFS4ERR_SEQ_MISORDERED >> C2: SEQUENCE -> NFS4_OK >> C1: PUTROOTFH | GETATTR -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION >> C1: BIND_CONN_TO_SESSION -> NFS4_OK >> C2: BIND_CONN_TO_SESSION -> NFS4_OK >> C2: PUTROOTFH | GETATTR -> NFS4ERR_SEQ_MISORDERED >> >> .... mix of GETATTRs and other simple requests .... >> >> C1: OPEN -> NFS4ERR_GRACE >> C2: OPEN -> NFS4ERR_GRACE >> >> The RECLAIM_COMPLETE operation failed, and the client does not >> retry it. That leaves its lease stuck in GRACE. >> >> >> -- >> Chuck Lever >> >> >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Chuck Lever -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, May 04, 2017 at 01:38:35PM -0400, Chuck Lever wrote: > > > On May 4, 2017, at 1:36 PM, bfields@fieldses.org wrote: > > > > On Thu, May 04, 2017 at 12:01:29PM -0400, Chuck Lever wrote: > >> > >>> On May 4, 2017, at 9:45 AM, Chuck Lever <chuck.lever@oracle.com> wrote: > >>> > >>> - Testing with a Linux server shows that the basic NFS/RDMA pieces > >>> work, but any OPEN operation gets NFS4ERR_GRACE, forever, when I use > >>> nconnect > 1. I'm looking into it. > >> > >> Reproduced with NFSv4.1, TCP, and nconnect=2. > >> > >> 363 /* > >> 364 * RFC5661 18.51.3 > >> 365 * Before RECLAIM_COMPLETE done, server should deny new lock > >> 366 */ > >> 367 if (nfsd4_has_session(cstate) && > >> 368 !test_bit(NFSD4_CLIENT_RECLAIM_COMPLETE, > >> 369 &cstate->session->se_client->cl_flags) && > >> 370 open->op_claim_type != NFS4_OPEN_CLAIM_PREVIOUS) > >> 371 return nfserr_grace; > >> > >> Server-side instrumentation confirms: > >> > >> May 4 11:28:29 klimt kernel: nfsd4_open: has_session returns true > >> May 4 11:28:29 klimt kernel: nfsd4_open: RECLAIM_COMPLETE is false > >> May 4 11:28:29 klimt kernel: nfsd4_open: claim_type is 0 > >> > >> Network capture shows the RPCs are interleaved between the two > >> connections as the client establishes its lease, and that appears > >> to be confusing the server. > >> > >> C1: NULL -> NFS4_OK > >> C1: EXCHANGE_ID -> NFS4_OK > >> C2: CREATE_SESSION -> NFS4_OK > >> C1: RECLAIM_COMPLETE -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION > > > > What security flavors are involved? I believe the correct behavior > > depends on whether gss is in use or not. > > The mount options are "sec=sys" but both sides have a keytab. > So the lease management operations are done with krb5i. OK. I'm pretty sure the client needs to send BIND_CONN_TO_SESSION before step C1. My memory is that over auth_sys you're allowed to treat any SEQUENCE over a new connection as implicitly binding that connection to the referenced session, but over krb5 the server's required to return that NOT_BOUND error if the server skips the BIND_CONN_TO_SESSION. I think CREATE_SESSION is allowed as long as the principals agree, and that's why the call at C2 succeeds. Seems a little weird, though. --b. > > > > --b. > > > >> C1: PUTROOTFH | GETATTR -> NFS4ERR_SEQ_MISORDERED > >> C2: SEQUENCE -> NFS4_OK > >> C1: PUTROOTFH | GETATTR -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION > >> C1: BIND_CONN_TO_SESSION -> NFS4_OK > >> C2: BIND_CONN_TO_SESSION -> NFS4_OK > >> C2: PUTROOTFH | GETATTR -> NFS4ERR_SEQ_MISORDERED > >> > >> .... mix of GETATTRs and other simple requests .... > >> > >> C1: OPEN -> NFS4ERR_GRACE > >> C2: OPEN -> NFS4ERR_GRACE > >> > >> The RECLAIM_COMPLETE operation failed, and the client does not > >> retry it. That leaves its lease stuck in GRACE. > >> > >> > >> -- > >> Chuck Lever > >> > >> > >> > >> -- > >> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > >> the body of a message to majordomo@vger.kernel.org > >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- > Chuck Lever > > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> On May 4, 2017, at 1:45 PM, J. Bruce Fields <bfields@fieldses.org> wrote: > > On Thu, May 04, 2017 at 01:38:35PM -0400, Chuck Lever wrote: >> >>> On May 4, 2017, at 1:36 PM, bfields@fieldses.org wrote: >>> >>> On Thu, May 04, 2017 at 12:01:29PM -0400, Chuck Lever wrote: >>>> >>>>> On May 4, 2017, at 9:45 AM, Chuck Lever <chuck.lever@oracle.com> wrote: >>>>> >>>>> - Testing with a Linux server shows that the basic NFS/RDMA pieces >>>>> work, but any OPEN operation gets NFS4ERR_GRACE, forever, when I use >>>>> nconnect > 1. I'm looking into it. >>>> >>>> Reproduced with NFSv4.1, TCP, and nconnect=2. >>>> >>>> 363 /* >>>> 364 * RFC5661 18.51.3 >>>> 365 * Before RECLAIM_COMPLETE done, server should deny new lock >>>> 366 */ >>>> 367 if (nfsd4_has_session(cstate) && >>>> 368 !test_bit(NFSD4_CLIENT_RECLAIM_COMPLETE, >>>> 369 &cstate->session->se_client->cl_flags) && >>>> 370 open->op_claim_type != NFS4_OPEN_CLAIM_PREVIOUS) >>>> 371 return nfserr_grace; >>>> >>>> Server-side instrumentation confirms: >>>> >>>> May 4 11:28:29 klimt kernel: nfsd4_open: has_session returns true >>>> May 4 11:28:29 klimt kernel: nfsd4_open: RECLAIM_COMPLETE is false >>>> May 4 11:28:29 klimt kernel: nfsd4_open: claim_type is 0 >>>> >>>> Network capture shows the RPCs are interleaved between the two >>>> connections as the client establishes its lease, and that appears >>>> to be confusing the server. >>>> >>>> C1: NULL -> NFS4_OK >>>> C1: EXCHANGE_ID -> NFS4_OK >>>> C2: CREATE_SESSION -> NFS4_OK >>>> C1: RECLAIM_COMPLETE -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION >>> >>> What security flavors are involved? I believe the correct behavior >>> depends on whether gss is in use or not. >> >> The mount options are "sec=sys" but both sides have a keytab. >> So the lease management operations are done with krb5i. > > OK. I'm pretty sure the client needs to send BIND_CONN_TO_SESSION > before step C1. > > My memory is that over auth_sys you're allowed to treat any SEQUENCE > over a new connection as implicitly binding that connection to the > referenced session, but over krb5 the server's required to return that > NOT_BOUND error if the server skips the BIND_CONN_TO_SESSION. Ah, that would explain why nconnect=[234] is working against my Solaris 12 server: no keytab on that server means lease management is done using plain-old AUTH_SYS. Multiple connections are now handled entirely by the RPC layer, and are opened and used at rpc_clnt creation time. The NFS client is not aware (except for allowing more than one connection to be used) and relies on its own recovery mechanisms to deal with exceptions that might arise. IOW it doesn't seem to know that an extra BC2S is needed, nor does it know where in the RPC stream to insert that operation. Seems to me a good approach would be to handle server trunking discovery and lease establishment using a single connection, and then open more connections. A conservative approach might actually hold off on opening additional connections until there are enough RPC transactions being initiated in parallel to warrant it. Or, if @nconnect > 1, use a single connection to perform lease management, and open @nconnect additional connections that handle only per- mount I/O activity. > I think CREATE_SESSION is allowed as long as the principals agree, and > that's why the call at C2 succeeds. Seems a little weird, though. Well, there's no SEQUENCE operation in that COMPOUND. No session or connection to use there, I think the principal and client ID are the only way to recognize the target of the operation? > --b. > >> >> >>> --b. >>> >>>> C1: PUTROOTFH | GETATTR -> NFS4ERR_SEQ_MISORDERED >>>> C2: SEQUENCE -> NFS4_OK >>>> C1: PUTROOTFH | GETATTR -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION >>>> C1: BIND_CONN_TO_SESSION -> NFS4_OK >>>> C2: BIND_CONN_TO_SESSION -> NFS4_OK >>>> C2: PUTROOTFH | GETATTR -> NFS4ERR_SEQ_MISORDERED >>>> >>>> .... mix of GETATTRs and other simple requests .... >>>> >>>> C1: OPEN -> NFS4ERR_GRACE >>>> C2: OPEN -> NFS4ERR_GRACE >>>> >>>> The RECLAIM_COMPLETE operation failed, and the client does not >>>> retry it. That leaves its lease stuck in GRACE. >>>> >>>> >>>> -- >>>> Chuck Lever >>>> >>>> >>>> >>>> -- >>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in >>>> the body of a message to majordomo@vger.kernel.org >>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >> >> -- >> Chuck Lever >> >> >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Chuck Lever -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, May 04, 2017 at 02:55:06PM -0400, Chuck Lever wrote: > > > On May 4, 2017, at 1:45 PM, J. Bruce Fields <bfields@fieldses.org> wrote: > > > > On Thu, May 04, 2017 at 01:38:35PM -0400, Chuck Lever wrote: > >> > >>> On May 4, 2017, at 1:36 PM, bfields@fieldses.org wrote: > >>> > >>> On Thu, May 04, 2017 at 12:01:29PM -0400, Chuck Lever wrote: > >>>> > >>>>> On May 4, 2017, at 9:45 AM, Chuck Lever <chuck.lever@oracle.com> wrote: > >>>>> > >>>>> - Testing with a Linux server shows that the basic NFS/RDMA pieces > >>>>> work, but any OPEN operation gets NFS4ERR_GRACE, forever, when I use > >>>>> nconnect > 1. I'm looking into it. > >>>> > >>>> Reproduced with NFSv4.1, TCP, and nconnect=2. > >>>> > >>>> 363 /* > >>>> 364 * RFC5661 18.51.3 > >>>> 365 * Before RECLAIM_COMPLETE done, server should deny new lock > >>>> 366 */ > >>>> 367 if (nfsd4_has_session(cstate) && > >>>> 368 !test_bit(NFSD4_CLIENT_RECLAIM_COMPLETE, > >>>> 369 &cstate->session->se_client->cl_flags) && > >>>> 370 open->op_claim_type != NFS4_OPEN_CLAIM_PREVIOUS) > >>>> 371 return nfserr_grace; > >>>> > >>>> Server-side instrumentation confirms: > >>>> > >>>> May 4 11:28:29 klimt kernel: nfsd4_open: has_session returns true > >>>> May 4 11:28:29 klimt kernel: nfsd4_open: RECLAIM_COMPLETE is false > >>>> May 4 11:28:29 klimt kernel: nfsd4_open: claim_type is 0 > >>>> > >>>> Network capture shows the RPCs are interleaved between the two > >>>> connections as the client establishes its lease, and that appears > >>>> to be confusing the server. > >>>> > >>>> C1: NULL -> NFS4_OK > >>>> C1: EXCHANGE_ID -> NFS4_OK > >>>> C2: CREATE_SESSION -> NFS4_OK > >>>> C1: RECLAIM_COMPLETE -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION > >>> > >>> What security flavors are involved? I believe the correct behavior > >>> depends on whether gss is in use or not. > >> > >> The mount options are "sec=sys" but both sides have a keytab. > >> So the lease management operations are done with krb5i. > > > > OK. I'm pretty sure the client needs to send BIND_CONN_TO_SESSION > > before step C1. > > > > My memory is that over auth_sys you're allowed to treat any SEQUENCE > > over a new connection as implicitly binding that connection to the > > referenced session, but over krb5 the server's required to return that > > NOT_BOUND error if the server skips the BIND_CONN_TO_SESSION. > > Ah, that would explain why nconnect=[234] is working against my > Solaris 12 server: no keytab on that server means lease management > is done using plain-old AUTH_SYS. > > Multiple connections are now handled entirely by the RPC layer, > and are opened and used at rpc_clnt creation time. The NFS client > is not aware (except for allowing more than one connection to be > used) and relies on its own recovery mechanisms to deal with > exceptions that might arise. IOW it doesn't seem to know that an > extra BC2S is needed, nor does it know where in the RPC stream > to insert that operation. > > Seems to me a good approach would be to handle server trunking > discovery and lease establishment using a single connection, and > then open more connections. A conservative approach might actually > hold off on opening additional connections until there are enough > RPC transactions being initiated in parallel to warrant it. Or, if > @nconnect > 1, use a single connection to perform lease management, > and open @nconnect additional connections that handle only per- > mount I/O activity. > > > > I think CREATE_SESSION is allowed as long as the principals agree, and > > that's why the call at C2 succeeds. Seems a little weird, though. > > Well, there's no SEQUENCE operation in that COMPOUND. No session > or connection to use there, I think the principal and client ID > are the only way to recognize the target of the operation? I'm just not clear why the explicit BIND_CONN_TO_SESSION is required in the gss case. Actually, it's not gss exactly, it's the state protection level: If, when the client ID was created, the client opted for SP4_NONE state protection, the client is not required to use BIND_CONN_TO_SESSION to associate the connection with the session, unless the client wishes to associate the connection with the backchannel. When SP4_NONE protection is used, simply sending a COMPOUND request with a SEQUENCE operation is sufficient to associate the connection with the session specified in SEQUENCE. Anyway. --b. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2017-05-04 at 13:45 -0400, J. Bruce Fields wrote: > On Thu, May 04, 2017 at 01:38:35PM -0400, Chuck Lever wrote: > > > > > On May 4, 2017, at 1:36 PM, bfields@fieldses.org wrote: > > > > > > On Thu, May 04, 2017 at 12:01:29PM -0400, Chuck Lever wrote: > > > > > > > > > On May 4, 2017, at 9:45 AM, Chuck Lever <chuck.lever@oracle.c > > > > > om> wrote: > > > > > > > > > > - Testing with a Linux server shows that the basic NFS/RDMA > > > > > pieces > > > > > work, but any OPEN operation gets NFS4ERR_GRACE, forever, > > > > > when I use > > > > > nconnect > 1. I'm looking into it. > > > > > > > > Reproduced with NFSv4.1, TCP, and nconnect=2. > > > > > > > > 363 /* > > > > 364 * RFC5661 18.51.3 > > > > 365 * Before RECLAIM_COMPLETE done, server should deny > > > > new lock > > > > 366 */ > > > > 367 if (nfsd4_has_session(cstate) && > > > > 368 !test_bit(NFSD4_CLIENT_RECLAIM_COMPLETE, > > > > 369 &cstate->session->se_client- > > > > >cl_flags) && > > > > 370 open->op_claim_type != > > > > NFS4_OPEN_CLAIM_PREVIOUS) > > > > 371 return nfserr_grace; > > > > > > > > Server-side instrumentation confirms: > > > > > > > > May 4 11:28:29 klimt kernel: nfsd4_open: has_session returns > > > > true > > > > May 4 11:28:29 klimt kernel: nfsd4_open: RECLAIM_COMPLETE is > > > > false > > > > May 4 11:28:29 klimt kernel: nfsd4_open: claim_type is 0 > > > > > > > > Network capture shows the RPCs are interleaved between the two > > > > connections as the client establishes its lease, and that > > > > appears > > > > to be confusing the server. > > > > > > > > C1: NULL -> NFS4_OK > > > > C1: EXCHANGE_ID -> NFS4_OK > > > > C2: CREATE_SESSION -> NFS4_OK > > > > C1: RECLAIM_COMPLETE -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION > > > > > > What security flavors are involved? I believe the correct > > > behavior > > > depends on whether gss is in use or not. > > > > The mount options are "sec=sys" but both sides have a keytab. > > So the lease management operations are done with krb5i. > > OK. I'm pretty sure the client needs to send BIND_CONN_TO_SESSION > before step C1. > > My memory is that over auth_sys you're allowed to treat any SEQUENCE > over a new connection as implicitly binding that connection to the > referenced session, but over krb5 the server's required to return > that > NOT_BOUND error if the server skips the BIND_CONN_TO_SESSION. > > I think CREATE_SESSION is allowed as long as the principals agree, > and > that's why the call at C2 succeeds. Seems a little weird, though. > See https://tools.ietf.org/html/rfc5661#section-2.10.3.1 So, we probably should send the BIND_CONN_TO_SESSION after creating the session, but since that involves figuring out whether or not state protection was successfully negotiated, and since we have to support handling NFS4ERR_CONN_NOT_BOUND_TO_SESSION anyway, I'm all for just waiting for the server to send the error. > --b. > > > > > > > > --b. > > > > > > > C1: PUTROOTFH | GETATTR -> NFS4ERR_SEQ_MISORDERED > > > > C2: SEQUENCE -> NFS4_OK > > > > C1: PUTROOTFH | GETATTR -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION > > > > C1: BIND_CONN_TO_SESSION -> NFS4_OK > > > > C2: BIND_CONN_TO_SESSION -> NFS4_OK > > > > C2: PUTROOTFH | GETATTR -> NFS4ERR_SEQ_MISORDERED > > > > > > > > .... mix of GETATTRs and other simple requests .... > > > > > > > > C1: OPEN -> NFS4ERR_GRACE > > > > C2: OPEN -> NFS4ERR_GRACE > > > > > > > > The RECLAIM_COMPLETE operation failed, and the client does not > > > > retry it. That leaves its lease stuck in GRACE. > > > > > > > > > > > > -- > > > > Chuck Lever > > > > > > > > > > > > > > > > -- > > > > To unsubscribe from this list: send the line "unsubscribe > > > > linux-nfs" in > > > > the body of a message to majordomo@vger.kernel.org > > > > More majordomo info at http://vger.kernel.org/majordomo-info.h > > > > tml > > > > > > -- > > > To unsubscribe from this list: send the line "unsubscribe linux- > > > nfs" in > > > the body of a message to majordomo@vger.kernel.org > > > More majordomo info at http://vger.kernel.org/majordomo-info.htm > > > l > > > > -- > > Chuck Lever > > > > > > > > -- > > To unsubscribe from this list: send the line "unsubscribe linux- > > nfs" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" > in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- Trond Myklebust Linux NFS client maintainer, PrimaryData trond.myklebust@primarydata.com
On Thu, May 04, 2017 at 08:40:07PM +0000, Trond Myklebust wrote: > See https://tools.ietf.org/html/rfc5661#section-2.10.3.1 > > So, we probably should send the BIND_CONN_TO_SESSION after creating the > session, but since that involves figuring out whether or not state > protection was successfully negotiated, and since we have to support > handling NFS4ERR_CONN_NOT_BOUND_TO_SESSION anyway, I'm all for just > waiting for the server to send the error. Makes sense to me. --b. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h index 31b26cf1b476..31757a742e9b 100644 --- a/fs/nfs/internal.h +++ b/fs/nfs/internal.h @@ -117,6 +117,7 @@ struct nfs_parsed_mount_data { char *export_path; int port; unsigned short protocol; + unsigned short nconnect; } nfs_server; struct security_mnt_opts lsm_opts; diff --git a/fs/nfs/super.c b/fs/nfs/super.c index 54e0f9f2dd94..7eb48934dc79 100644 --- a/fs/nfs/super.c +++ b/fs/nfs/super.c @@ -76,6 +76,8 @@ #define NFS_DEFAULT_VERSION 2 #endif +#define NFS_MAX_CONNECTIONS 16 + enum { /* Mount options that take no arguments */ Opt_soft, Opt_hard, @@ -107,6 +109,7 @@ enum { Opt_nfsvers, Opt_sec, Opt_proto, Opt_mountproto, Opt_mounthost, Opt_addr, Opt_mountaddr, Opt_clientaddr, + Opt_nconnect, Opt_lookupcache, Opt_fscache_uniq, Opt_local_lock, @@ -179,6 +182,8 @@ static const match_table_t nfs_mount_option_tokens = { { Opt_mounthost, "mounthost=%s" }, { Opt_mountaddr, "mountaddr=%s" }, + { Opt_nconnect, "nconnect=%s" }, + { Opt_lookupcache, "lookupcache=%s" }, { Opt_fscache_uniq, "fsc=%s" }, { Opt_local_lock, "local_lock=%s" }, @@ -1544,6 +1549,11 @@ static int nfs_parse_mount_options(char *raw, if (mnt->mount_server.addrlen == 0) goto out_invalid_address; break; + case Opt_nconnect: + if (nfs_get_option_ul_bound(args, &option, 1, NFS_MAX_CONNECTIONS)) + goto out_invalid_value; + mnt->nfs_server.nconnect = option; + break; case Opt_lookupcache: string = match_strdup(args); if (string == NULL)
Allow the user to specify that the client should use multiple connections to the server. For the moment, this functionality will be limited to TCP and to NFSv4.x (x>0). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> --- fs/nfs/internal.h | 1 + fs/nfs/super.c | 10 ++++++++++ 2 files changed, 11 insertions(+)