diff mbox

[RFC,2/5] NFS: Add a mount option to specify number of TCP connections to use

Message ID 20170428172535.7945-3-trond.myklebust@primarydata.com (mailing list archive)
State New, archived
Headers show

Commit Message

Trond Myklebust April 28, 2017, 5:25 p.m. UTC
Allow the user to specify that the client should use multiple connections
to the server. For the moment, this functionality will be limited to
TCP and to NFSv4.x (x>0).

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/internal.h |  1 +
 fs/nfs/super.c    | 10 ++++++++++
 2 files changed, 11 insertions(+)

Comments

Chuck Lever May 4, 2017, 1:45 p.m. UTC | #1
Hi Trond-


> On Apr 28, 2017, at 1:25 PM, Trond Myklebust <trond.myklebust@primarydata.com> wrote:
> 
> Allow the user to specify that the client should use multiple connections
> to the server. For the moment, this functionality will be limited to
> TCP and to NFSv4.x (x>0).

Some initial reactions:

- 5/5 could be squashed into this patch (2/5).

- 4/5 adds support for using NFSv3 with a DS. Why can't you add NFSv3
support for multiple connections in the non-pNFS case? If there is a
good reason, it should be stated in 4/5's patch description or added
as a comment somewhere in the source code.

- Testing with a Linux server shows that the basic NFS/RDMA pieces
work, but any OPEN operation gets NFS4ERR_GRACE, forever, when I use
nconnect > 1. I'm looking into it.

- Testing with a Solaris 12 server prototype that supports NFSv4.1
works fine with nconnect=[23]. Not seeing much performance improvement
there because the server is using QDR and a single SATA SSD.

Thus I don't see a strong need to keep the TCP-only limitation. However,
if you do keep it, the logic that implements the second sentence in the
patch description above is added in 3/5. Should this sentence be in that
patch description instead? Or, instead:

s/For the moment/In a subsequent patch


> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
> ---
> fs/nfs/internal.h |  1 +
> fs/nfs/super.c    | 10 ++++++++++
> 2 files changed, 11 insertions(+)
> 
> diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
> index 31b26cf1b476..31757a742e9b 100644
> --- a/fs/nfs/internal.h
> +++ b/fs/nfs/internal.h
> @@ -117,6 +117,7 @@ struct nfs_parsed_mount_data {
> 		char			*export_path;
> 		int			port;
> 		unsigned short		protocol;
> +		unsigned short		nconnect;
> 	} nfs_server;
> 
> 	struct security_mnt_opts lsm_opts;
> diff --git a/fs/nfs/super.c b/fs/nfs/super.c
> index 54e0f9f2dd94..7eb48934dc79 100644
> --- a/fs/nfs/super.c
> +++ b/fs/nfs/super.c
> @@ -76,6 +76,8 @@
> #define NFS_DEFAULT_VERSION 2
> #endif
> 
> +#define NFS_MAX_CONNECTIONS 16
> +
> enum {
> 	/* Mount options that take no arguments */
> 	Opt_soft, Opt_hard,
> @@ -107,6 +109,7 @@ enum {
> 	Opt_nfsvers,
> 	Opt_sec, Opt_proto, Opt_mountproto, Opt_mounthost,
> 	Opt_addr, Opt_mountaddr, Opt_clientaddr,
> +	Opt_nconnect,
> 	Opt_lookupcache,
> 	Opt_fscache_uniq,
> 	Opt_local_lock,
> @@ -179,6 +182,8 @@ static const match_table_t nfs_mount_option_tokens = {
> 	{ Opt_mounthost, "mounthost=%s" },
> 	{ Opt_mountaddr, "mountaddr=%s" },
> 
> +	{ Opt_nconnect, "nconnect=%s" },
> +
> 	{ Opt_lookupcache, "lookupcache=%s" },
> 	{ Opt_fscache_uniq, "fsc=%s" },
> 	{ Opt_local_lock, "local_lock=%s" },
> @@ -1544,6 +1549,11 @@ static int nfs_parse_mount_options(char *raw,
> 			if (mnt->mount_server.addrlen == 0)
> 				goto out_invalid_address;
> 			break;
> +		case Opt_nconnect:
> +			if (nfs_get_option_ul_bound(args, &option, 1, NFS_MAX_CONNECTIONS))
> +				goto out_invalid_value;
> +			mnt->nfs_server.nconnect = option;
> +			break;
> 		case Opt_lookupcache:
> 			string = match_strdup(args);
> 			if (string == NULL)
> -- 
> 2.9.3
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
Chuck Lever



--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Chuck Lever May 4, 2017, 1:53 p.m. UTC | #2
> On May 4, 2017, at 9:45 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
> 
> Hi Trond-
> 
> 
>> On Apr 28, 2017, at 1:25 PM, Trond Myklebust <trond.myklebust@primarydata.com> wrote:
>> 
>> Allow the user to specify that the client should use multiple connections
>> to the server. For the moment, this functionality will be limited to
>> TCP and to NFSv4.x (x>0).
> 
> Some initial reactions:
> 
> - 5/5 could be squashed into this patch (2/5).
> 
> - 4/5 adds support for using NFSv3 with a DS. Why can't you add NFSv3
> support for multiple connections in the non-pNFS case? If there is a
> good reason, it should be stated in 4/5's patch description or added
> as a comment somewhere in the source code.
> 
> - Testing with a Linux server shows that the basic NFS/RDMA pieces
> work, but any OPEN operation gets NFS4ERR_GRACE, forever, when I use
> nconnect > 1. I'm looking into it.
> 
> - Testing with a Solaris 12 server prototype that supports NFSv4.1
> works fine with nconnect=[23]. Not seeing much performance improvement
> there because the server is using QDR and a single SATA SSD.
> 
> Thus I don't see a strong need to keep the TCP-only limitation. However,
> if you do keep it, the logic that implements the second sentence in the
> patch description above is added in 3/5. Should this sentence be in that
> patch description instead? Or, instead:
> 
> s/For the moment/In a subsequent patch

Oops, I forgot to mention: mountstats data looks a little confused
when nconnect > 1. For example:

WRITE:
           3075342 ops (131%)
        avg bytes sent per op: 26829    avg bytes received per op: 113
        backlog wait: 162.375128        RTT: 1.481101   total execute time: 163.861735 (milliseconds)

Haven't looked closely at that 131%, but it could be either the kernel
or the script itself is assuming one connection per mount.


>> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
>> ---
>> fs/nfs/internal.h |  1 +
>> fs/nfs/super.c    | 10 ++++++++++
>> 2 files changed, 11 insertions(+)
>> 
>> diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
>> index 31b26cf1b476..31757a742e9b 100644
>> --- a/fs/nfs/internal.h
>> +++ b/fs/nfs/internal.h
>> @@ -117,6 +117,7 @@ struct nfs_parsed_mount_data {
>> 		char			*export_path;
>> 		int			port;
>> 		unsigned short		protocol;
>> +		unsigned short		nconnect;
>> 	} nfs_server;
>> 
>> 	struct security_mnt_opts lsm_opts;
>> diff --git a/fs/nfs/super.c b/fs/nfs/super.c
>> index 54e0f9f2dd94..7eb48934dc79 100644
>> --- a/fs/nfs/super.c
>> +++ b/fs/nfs/super.c
>> @@ -76,6 +76,8 @@
>> #define NFS_DEFAULT_VERSION 2
>> #endif
>> 
>> +#define NFS_MAX_CONNECTIONS 16
>> +
>> enum {
>> 	/* Mount options that take no arguments */
>> 	Opt_soft, Opt_hard,
>> @@ -107,6 +109,7 @@ enum {
>> 	Opt_nfsvers,
>> 	Opt_sec, Opt_proto, Opt_mountproto, Opt_mounthost,
>> 	Opt_addr, Opt_mountaddr, Opt_clientaddr,
>> +	Opt_nconnect,
>> 	Opt_lookupcache,
>> 	Opt_fscache_uniq,
>> 	Opt_local_lock,
>> @@ -179,6 +182,8 @@ static const match_table_t nfs_mount_option_tokens = {
>> 	{ Opt_mounthost, "mounthost=%s" },
>> 	{ Opt_mountaddr, "mountaddr=%s" },
>> 
>> +	{ Opt_nconnect, "nconnect=%s" },
>> +
>> 	{ Opt_lookupcache, "lookupcache=%s" },
>> 	{ Opt_fscache_uniq, "fsc=%s" },
>> 	{ Opt_local_lock, "local_lock=%s" },
>> @@ -1544,6 +1549,11 @@ static int nfs_parse_mount_options(char *raw,
>> 			if (mnt->mount_server.addrlen == 0)
>> 				goto out_invalid_address;
>> 			break;
>> +		case Opt_nconnect:
>> +			if (nfs_get_option_ul_bound(args, &option, 1, NFS_MAX_CONNECTIONS))
>> +				goto out_invalid_value;
>> +			mnt->nfs_server.nconnect = option;
>> +			break;
>> 		case Opt_lookupcache:
>> 			string = match_strdup(args);
>> 			if (string == NULL)
>> -- 
>> 2.9.3
>> 
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> --
> Chuck Lever
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
Chuck Lever



--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Chuck Lever May 4, 2017, 4:01 p.m. UTC | #3
> On May 4, 2017, at 9:45 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
> 
> - Testing with a Linux server shows that the basic NFS/RDMA pieces
> work, but any OPEN operation gets NFS4ERR_GRACE, forever, when I use
> nconnect > 1. I'm looking into it.

Reproduced with NFSv4.1, TCP, and nconnect=2.

363         /*
364          * RFC5661 18.51.3
365          * Before RECLAIM_COMPLETE done, server should deny new lock
366          */
367         if (nfsd4_has_session(cstate) &&
368             !test_bit(NFSD4_CLIENT_RECLAIM_COMPLETE,
369                       &cstate->session->se_client->cl_flags) &&
370             open->op_claim_type != NFS4_OPEN_CLAIM_PREVIOUS)
371                 return nfserr_grace;

Server-side instrumentation confirms:

May  4 11:28:29 klimt kernel: nfsd4_open: has_session returns true
May  4 11:28:29 klimt kernel: nfsd4_open: RECLAIM_COMPLETE is false
May  4 11:28:29 klimt kernel: nfsd4_open: claim_type is 0

Network capture shows the RPCs are interleaved between the two
connections as the client establishes its lease, and that appears
to be confusing the server.

C1: NULL -> NFS4_OK
C1: EXCHANGE_ID -> NFS4_OK
C2: CREATE_SESSION -> NFS4_OK
C1: RECLAIM_COMPLETE -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION
C1: PUTROOTFH | GETATTR -> NFS4ERR_SEQ_MISORDERED
C2: SEQUENCE -> NFS4_OK
C1: PUTROOTFH | GETATTR -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION
C1: BIND_CONN_TO_SESSION -> NFS4_OK
C2: BIND_CONN_TO_SESSION -> NFS4_OK
C2: PUTROOTFH | GETATTR -> NFS4ERR_SEQ_MISORDERED

.... mix of GETATTRs and other simple requests ....

C1: OPEN -> NFS4ERR_GRACE
C2: OPEN -> NFS4ERR_GRACE

The RECLAIM_COMPLETE operation failed, and the client does not
retry it. That leaves its lease stuck in GRACE.


--
Chuck Lever



--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
J. Bruce Fields May 4, 2017, 5:36 p.m. UTC | #4
On Thu, May 04, 2017 at 12:01:29PM -0400, Chuck Lever wrote:
> 
> > On May 4, 2017, at 9:45 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
> > 
> > - Testing with a Linux server shows that the basic NFS/RDMA pieces
> > work, but any OPEN operation gets NFS4ERR_GRACE, forever, when I use
> > nconnect > 1. I'm looking into it.
> 
> Reproduced with NFSv4.1, TCP, and nconnect=2.
> 
> 363         /*
> 364          * RFC5661 18.51.3
> 365          * Before RECLAIM_COMPLETE done, server should deny new lock
> 366          */
> 367         if (nfsd4_has_session(cstate) &&
> 368             !test_bit(NFSD4_CLIENT_RECLAIM_COMPLETE,
> 369                       &cstate->session->se_client->cl_flags) &&
> 370             open->op_claim_type != NFS4_OPEN_CLAIM_PREVIOUS)
> 371                 return nfserr_grace;
> 
> Server-side instrumentation confirms:
> 
> May  4 11:28:29 klimt kernel: nfsd4_open: has_session returns true
> May  4 11:28:29 klimt kernel: nfsd4_open: RECLAIM_COMPLETE is false
> May  4 11:28:29 klimt kernel: nfsd4_open: claim_type is 0
> 
> Network capture shows the RPCs are interleaved between the two
> connections as the client establishes its lease, and that appears
> to be confusing the server.
> 
> C1: NULL -> NFS4_OK
> C1: EXCHANGE_ID -> NFS4_OK
> C2: CREATE_SESSION -> NFS4_OK
> C1: RECLAIM_COMPLETE -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION

What security flavors are involved?  I believe the correct behavior
depends on whether gss is in use or not.

--b.

> C1: PUTROOTFH | GETATTR -> NFS4ERR_SEQ_MISORDERED
> C2: SEQUENCE -> NFS4_OK
> C1: PUTROOTFH | GETATTR -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION
> C1: BIND_CONN_TO_SESSION -> NFS4_OK
> C2: BIND_CONN_TO_SESSION -> NFS4_OK
> C2: PUTROOTFH | GETATTR -> NFS4ERR_SEQ_MISORDERED
> 
> .... mix of GETATTRs and other simple requests ....
> 
> C1: OPEN -> NFS4ERR_GRACE
> C2: OPEN -> NFS4ERR_GRACE
> 
> The RECLAIM_COMPLETE operation failed, and the client does not
> retry it. That leaves its lease stuck in GRACE.
> 
> 
> --
> Chuck Lever
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Chuck Lever May 4, 2017, 5:38 p.m. UTC | #5
> On May 4, 2017, at 1:36 PM, bfields@fieldses.org wrote:
> 
> On Thu, May 04, 2017 at 12:01:29PM -0400, Chuck Lever wrote:
>> 
>>> On May 4, 2017, at 9:45 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
>>> 
>>> - Testing with a Linux server shows that the basic NFS/RDMA pieces
>>> work, but any OPEN operation gets NFS4ERR_GRACE, forever, when I use
>>> nconnect > 1. I'm looking into it.
>> 
>> Reproduced with NFSv4.1, TCP, and nconnect=2.
>> 
>> 363         /*
>> 364          * RFC5661 18.51.3
>> 365          * Before RECLAIM_COMPLETE done, server should deny new lock
>> 366          */
>> 367         if (nfsd4_has_session(cstate) &&
>> 368             !test_bit(NFSD4_CLIENT_RECLAIM_COMPLETE,
>> 369                       &cstate->session->se_client->cl_flags) &&
>> 370             open->op_claim_type != NFS4_OPEN_CLAIM_PREVIOUS)
>> 371                 return nfserr_grace;
>> 
>> Server-side instrumentation confirms:
>> 
>> May  4 11:28:29 klimt kernel: nfsd4_open: has_session returns true
>> May  4 11:28:29 klimt kernel: nfsd4_open: RECLAIM_COMPLETE is false
>> May  4 11:28:29 klimt kernel: nfsd4_open: claim_type is 0
>> 
>> Network capture shows the RPCs are interleaved between the two
>> connections as the client establishes its lease, and that appears
>> to be confusing the server.
>> 
>> C1: NULL -> NFS4_OK
>> C1: EXCHANGE_ID -> NFS4_OK
>> C2: CREATE_SESSION -> NFS4_OK
>> C1: RECLAIM_COMPLETE -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION
> 
> What security flavors are involved?  I believe the correct behavior
> depends on whether gss is in use or not.

The mount options are "sec=sys" but both sides have a keytab.
So the lease management operations are done with krb5i.


> --b.
> 
>> C1: PUTROOTFH | GETATTR -> NFS4ERR_SEQ_MISORDERED
>> C2: SEQUENCE -> NFS4_OK
>> C1: PUTROOTFH | GETATTR -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION
>> C1: BIND_CONN_TO_SESSION -> NFS4_OK
>> C2: BIND_CONN_TO_SESSION -> NFS4_OK
>> C2: PUTROOTFH | GETATTR -> NFS4ERR_SEQ_MISORDERED
>> 
>> .... mix of GETATTRs and other simple requests ....
>> 
>> C1: OPEN -> NFS4ERR_GRACE
>> C2: OPEN -> NFS4ERR_GRACE
>> 
>> The RECLAIM_COMPLETE operation failed, and the client does not
>> retry it. That leaves its lease stuck in GRACE.
>> 
>> 
>> --
>> Chuck Lever
>> 
>> 
>> 
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
Chuck Lever



--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
J. Bruce Fields May 4, 2017, 5:45 p.m. UTC | #6
On Thu, May 04, 2017 at 01:38:35PM -0400, Chuck Lever wrote:
> 
> > On May 4, 2017, at 1:36 PM, bfields@fieldses.org wrote:
> > 
> > On Thu, May 04, 2017 at 12:01:29PM -0400, Chuck Lever wrote:
> >> 
> >>> On May 4, 2017, at 9:45 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
> >>> 
> >>> - Testing with a Linux server shows that the basic NFS/RDMA pieces
> >>> work, but any OPEN operation gets NFS4ERR_GRACE, forever, when I use
> >>> nconnect > 1. I'm looking into it.
> >> 
> >> Reproduced with NFSv4.1, TCP, and nconnect=2.
> >> 
> >> 363         /*
> >> 364          * RFC5661 18.51.3
> >> 365          * Before RECLAIM_COMPLETE done, server should deny new lock
> >> 366          */
> >> 367         if (nfsd4_has_session(cstate) &&
> >> 368             !test_bit(NFSD4_CLIENT_RECLAIM_COMPLETE,
> >> 369                       &cstate->session->se_client->cl_flags) &&
> >> 370             open->op_claim_type != NFS4_OPEN_CLAIM_PREVIOUS)
> >> 371                 return nfserr_grace;
> >> 
> >> Server-side instrumentation confirms:
> >> 
> >> May  4 11:28:29 klimt kernel: nfsd4_open: has_session returns true
> >> May  4 11:28:29 klimt kernel: nfsd4_open: RECLAIM_COMPLETE is false
> >> May  4 11:28:29 klimt kernel: nfsd4_open: claim_type is 0
> >> 
> >> Network capture shows the RPCs are interleaved between the two
> >> connections as the client establishes its lease, and that appears
> >> to be confusing the server.
> >> 
> >> C1: NULL -> NFS4_OK
> >> C1: EXCHANGE_ID -> NFS4_OK
> >> C2: CREATE_SESSION -> NFS4_OK
> >> C1: RECLAIM_COMPLETE -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION
> > 
> > What security flavors are involved?  I believe the correct behavior
> > depends on whether gss is in use or not.
> 
> The mount options are "sec=sys" but both sides have a keytab.
> So the lease management operations are done with krb5i.

OK.  I'm pretty sure the client needs to send BIND_CONN_TO_SESSION
before step C1.

My memory is that over auth_sys you're allowed to treat any SEQUENCE
over a new connection as implicitly binding that connection to the
referenced session, but over krb5 the server's required to return that
NOT_BOUND error if the server skips the BIND_CONN_TO_SESSION.

I think CREATE_SESSION is allowed as long as the principals agree, and
that's why the call at C2 succeeds.  Seems a little weird, though.

--b.

> 
> 
> > --b.
> > 
> >> C1: PUTROOTFH | GETATTR -> NFS4ERR_SEQ_MISORDERED
> >> C2: SEQUENCE -> NFS4_OK
> >> C1: PUTROOTFH | GETATTR -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION
> >> C1: BIND_CONN_TO_SESSION -> NFS4_OK
> >> C2: BIND_CONN_TO_SESSION -> NFS4_OK
> >> C2: PUTROOTFH | GETATTR -> NFS4ERR_SEQ_MISORDERED
> >> 
> >> .... mix of GETATTRs and other simple requests ....
> >> 
> >> C1: OPEN -> NFS4ERR_GRACE
> >> C2: OPEN -> NFS4ERR_GRACE
> >> 
> >> The RECLAIM_COMPLETE operation failed, and the client does not
> >> retry it. That leaves its lease stuck in GRACE.
> >> 
> >> 
> >> --
> >> Chuck Lever
> >> 
> >> 
> >> 
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> --
> Chuck Lever
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Chuck Lever May 4, 2017, 6:55 p.m. UTC | #7
> On May 4, 2017, at 1:45 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> 
> On Thu, May 04, 2017 at 01:38:35PM -0400, Chuck Lever wrote:
>> 
>>> On May 4, 2017, at 1:36 PM, bfields@fieldses.org wrote:
>>> 
>>> On Thu, May 04, 2017 at 12:01:29PM -0400, Chuck Lever wrote:
>>>> 
>>>>> On May 4, 2017, at 9:45 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
>>>>> 
>>>>> - Testing with a Linux server shows that the basic NFS/RDMA pieces
>>>>> work, but any OPEN operation gets NFS4ERR_GRACE, forever, when I use
>>>>> nconnect > 1. I'm looking into it.
>>>> 
>>>> Reproduced with NFSv4.1, TCP, and nconnect=2.
>>>> 
>>>> 363         /*
>>>> 364          * RFC5661 18.51.3
>>>> 365          * Before RECLAIM_COMPLETE done, server should deny new lock
>>>> 366          */
>>>> 367         if (nfsd4_has_session(cstate) &&
>>>> 368             !test_bit(NFSD4_CLIENT_RECLAIM_COMPLETE,
>>>> 369                       &cstate->session->se_client->cl_flags) &&
>>>> 370             open->op_claim_type != NFS4_OPEN_CLAIM_PREVIOUS)
>>>> 371                 return nfserr_grace;
>>>> 
>>>> Server-side instrumentation confirms:
>>>> 
>>>> May  4 11:28:29 klimt kernel: nfsd4_open: has_session returns true
>>>> May  4 11:28:29 klimt kernel: nfsd4_open: RECLAIM_COMPLETE is false
>>>> May  4 11:28:29 klimt kernel: nfsd4_open: claim_type is 0
>>>> 
>>>> Network capture shows the RPCs are interleaved between the two
>>>> connections as the client establishes its lease, and that appears
>>>> to be confusing the server.
>>>> 
>>>> C1: NULL -> NFS4_OK
>>>> C1: EXCHANGE_ID -> NFS4_OK
>>>> C2: CREATE_SESSION -> NFS4_OK
>>>> C1: RECLAIM_COMPLETE -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION
>>> 
>>> What security flavors are involved?  I believe the correct behavior
>>> depends on whether gss is in use or not.
>> 
>> The mount options are "sec=sys" but both sides have a keytab.
>> So the lease management operations are done with krb5i.
> 
> OK.  I'm pretty sure the client needs to send BIND_CONN_TO_SESSION
> before step C1.
> 
> My memory is that over auth_sys you're allowed to treat any SEQUENCE
> over a new connection as implicitly binding that connection to the
> referenced session, but over krb5 the server's required to return that
> NOT_BOUND error if the server skips the BIND_CONN_TO_SESSION.

Ah, that would explain why nconnect=[234] is working against my
Solaris 12 server: no keytab on that server means lease management
is done using plain-old AUTH_SYS.

Multiple connections are now handled entirely by the RPC layer,
and are opened and used at rpc_clnt creation time. The NFS client
is not aware (except for allowing more than one connection to be
used) and relies on its own recovery mechanisms to deal with
exceptions that might arise. IOW it doesn't seem to know that an
extra BC2S is needed, nor does it know where in the RPC stream
to insert that operation.

Seems to me a good approach would be to handle server trunking
discovery and lease establishment using a single connection, and
then open more connections. A conservative approach might actually
hold off on opening additional connections until there are enough
RPC transactions being initiated in parallel to warrant it. Or, if
@nconnect > 1, use a single connection to perform lease management,
and open @nconnect additional connections that handle only per-
mount I/O activity.


> I think CREATE_SESSION is allowed as long as the principals agree, and
> that's why the call at C2 succeeds.  Seems a little weird, though.

Well, there's no SEQUENCE operation in that COMPOUND. No session
or connection to use there, I think the principal and client ID
are the only way to recognize the target of the operation?


> --b.
> 
>> 
>> 
>>> --b.
>>> 
>>>> C1: PUTROOTFH | GETATTR -> NFS4ERR_SEQ_MISORDERED
>>>> C2: SEQUENCE -> NFS4_OK
>>>> C1: PUTROOTFH | GETATTR -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION
>>>> C1: BIND_CONN_TO_SESSION -> NFS4_OK
>>>> C2: BIND_CONN_TO_SESSION -> NFS4_OK
>>>> C2: PUTROOTFH | GETATTR -> NFS4ERR_SEQ_MISORDERED
>>>> 
>>>> .... mix of GETATTRs and other simple requests ....
>>>> 
>>>> C1: OPEN -> NFS4ERR_GRACE
>>>> C2: OPEN -> NFS4ERR_GRACE
>>>> 
>>>> The RECLAIM_COMPLETE operation failed, and the client does not
>>>> retry it. That leaves its lease stuck in GRACE.
>>>> 
>>>> 
>>>> --
>>>> Chuck Lever
>>>> 
>>>> 
>>>> 
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> 
>> --
>> Chuck Lever
>> 
>> 
>> 
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
Chuck Lever



--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
J. Bruce Fields May 4, 2017, 7:58 p.m. UTC | #8
On Thu, May 04, 2017 at 02:55:06PM -0400, Chuck Lever wrote:
> 
> > On May 4, 2017, at 1:45 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> > 
> > On Thu, May 04, 2017 at 01:38:35PM -0400, Chuck Lever wrote:
> >> 
> >>> On May 4, 2017, at 1:36 PM, bfields@fieldses.org wrote:
> >>> 
> >>> On Thu, May 04, 2017 at 12:01:29PM -0400, Chuck Lever wrote:
> >>>> 
> >>>>> On May 4, 2017, at 9:45 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
> >>>>> 
> >>>>> - Testing with a Linux server shows that the basic NFS/RDMA pieces
> >>>>> work, but any OPEN operation gets NFS4ERR_GRACE, forever, when I use
> >>>>> nconnect > 1. I'm looking into it.
> >>>> 
> >>>> Reproduced with NFSv4.1, TCP, and nconnect=2.
> >>>> 
> >>>> 363         /*
> >>>> 364          * RFC5661 18.51.3
> >>>> 365          * Before RECLAIM_COMPLETE done, server should deny new lock
> >>>> 366          */
> >>>> 367         if (nfsd4_has_session(cstate) &&
> >>>> 368             !test_bit(NFSD4_CLIENT_RECLAIM_COMPLETE,
> >>>> 369                       &cstate->session->se_client->cl_flags) &&
> >>>> 370             open->op_claim_type != NFS4_OPEN_CLAIM_PREVIOUS)
> >>>> 371                 return nfserr_grace;
> >>>> 
> >>>> Server-side instrumentation confirms:
> >>>> 
> >>>> May  4 11:28:29 klimt kernel: nfsd4_open: has_session returns true
> >>>> May  4 11:28:29 klimt kernel: nfsd4_open: RECLAIM_COMPLETE is false
> >>>> May  4 11:28:29 klimt kernel: nfsd4_open: claim_type is 0
> >>>> 
> >>>> Network capture shows the RPCs are interleaved between the two
> >>>> connections as the client establishes its lease, and that appears
> >>>> to be confusing the server.
> >>>> 
> >>>> C1: NULL -> NFS4_OK
> >>>> C1: EXCHANGE_ID -> NFS4_OK
> >>>> C2: CREATE_SESSION -> NFS4_OK
> >>>> C1: RECLAIM_COMPLETE -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION
> >>> 
> >>> What security flavors are involved?  I believe the correct behavior
> >>> depends on whether gss is in use or not.
> >> 
> >> The mount options are "sec=sys" but both sides have a keytab.
> >> So the lease management operations are done with krb5i.
> > 
> > OK.  I'm pretty sure the client needs to send BIND_CONN_TO_SESSION
> > before step C1.
> > 
> > My memory is that over auth_sys you're allowed to treat any SEQUENCE
> > over a new connection as implicitly binding that connection to the
> > referenced session, but over krb5 the server's required to return that
> > NOT_BOUND error if the server skips the BIND_CONN_TO_SESSION.
> 
> Ah, that would explain why nconnect=[234] is working against my
> Solaris 12 server: no keytab on that server means lease management
> is done using plain-old AUTH_SYS.
> 
> Multiple connections are now handled entirely by the RPC layer,
> and are opened and used at rpc_clnt creation time. The NFS client
> is not aware (except for allowing more than one connection to be
> used) and relies on its own recovery mechanisms to deal with
> exceptions that might arise. IOW it doesn't seem to know that an
> extra BC2S is needed, nor does it know where in the RPC stream
> to insert that operation.
> 
> Seems to me a good approach would be to handle server trunking
> discovery and lease establishment using a single connection, and
> then open more connections. A conservative approach might actually
> hold off on opening additional connections until there are enough
> RPC transactions being initiated in parallel to warrant it. Or, if
> @nconnect > 1, use a single connection to perform lease management,
> and open @nconnect additional connections that handle only per-
> mount I/O activity.
> 
> 
> > I think CREATE_SESSION is allowed as long as the principals agree, and
> > that's why the call at C2 succeeds.  Seems a little weird, though.
> 
> Well, there's no SEQUENCE operation in that COMPOUND. No session
> or connection to use there, I think the principal and client ID
> are the only way to recognize the target of the operation?

I'm just not clear why the explicit BIND_CONN_TO_SESSION is required in
the gss case.

Actually, it's not gss exactly, it's the state protection level:

	If, when the client ID was created, the client opted for
	SP4_NONE state protection, the client is not required to use
	BIND_CONN_TO_SESSION to associate the connection with the
	session, unless the client wishes to associate the connection
	with the backchannel.  When SP4_NONE protection is used, simply
	sending a COMPOUND request with a SEQUENCE operation is
	sufficient to associate the connection with the session
	specified in SEQUENCE.

Anyway.

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Trond Myklebust May 4, 2017, 8:40 p.m. UTC | #9
On Thu, 2017-05-04 at 13:45 -0400, J. Bruce Fields wrote:
> On Thu, May 04, 2017 at 01:38:35PM -0400, Chuck Lever wrote:

> > 

> > > On May 4, 2017, at 1:36 PM, bfields@fieldses.org wrote:

> > > 

> > > On Thu, May 04, 2017 at 12:01:29PM -0400, Chuck Lever wrote:

> > > > 

> > > > > On May 4, 2017, at 9:45 AM, Chuck Lever <chuck.lever@oracle.c

> > > > > om> wrote:

> > > > > 

> > > > > - Testing with a Linux server shows that the basic NFS/RDMA

> > > > > pieces

> > > > > work, but any OPEN operation gets NFS4ERR_GRACE, forever,

> > > > > when I use

> > > > > nconnect > 1. I'm looking into it.

> > > > 

> > > > Reproduced with NFSv4.1, TCP, and nconnect=2.

> > > > 

> > > > 363         /*

> > > > 364          * RFC5661 18.51.3

> > > > 365          * Before RECLAIM_COMPLETE done, server should deny

> > > > new lock

> > > > 366          */

> > > > 367         if (nfsd4_has_session(cstate) &&

> > > > 368             !test_bit(NFSD4_CLIENT_RECLAIM_COMPLETE,

> > > > 369                       &cstate->session->se_client-

> > > > >cl_flags) &&

> > > > 370             open->op_claim_type !=

> > > > NFS4_OPEN_CLAIM_PREVIOUS)

> > > > 371                 return nfserr_grace;

> > > > 

> > > > Server-side instrumentation confirms:

> > > > 

> > > > May  4 11:28:29 klimt kernel: nfsd4_open: has_session returns

> > > > true

> > > > May  4 11:28:29 klimt kernel: nfsd4_open: RECLAIM_COMPLETE is

> > > > false

> > > > May  4 11:28:29 klimt kernel: nfsd4_open: claim_type is 0

> > > > 

> > > > Network capture shows the RPCs are interleaved between the two

> > > > connections as the client establishes its lease, and that

> > > > appears

> > > > to be confusing the server.

> > > > 

> > > > C1: NULL -> NFS4_OK

> > > > C1: EXCHANGE_ID -> NFS4_OK

> > > > C2: CREATE_SESSION -> NFS4_OK

> > > > C1: RECLAIM_COMPLETE -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION

> > > 

> > > What security flavors are involved?  I believe the correct

> > > behavior

> > > depends on whether gss is in use or not.

> > 

> > The mount options are "sec=sys" but both sides have a keytab.

> > So the lease management operations are done with krb5i.

> 

> OK.  I'm pretty sure the client needs to send BIND_CONN_TO_SESSION

> before step C1.

> 

> My memory is that over auth_sys you're allowed to treat any SEQUENCE

> over a new connection as implicitly binding that connection to the

> referenced session, but over krb5 the server's required to return

> that

> NOT_BOUND error if the server skips the BIND_CONN_TO_SESSION.

> 

> I think CREATE_SESSION is allowed as long as the principals agree,

> and

> that's why the call at C2 succeeds.  Seems a little weird, though.

> 


See https://tools.ietf.org/html/rfc5661#section-2.10.3.1

So, we probably should send the BIND_CONN_TO_SESSION after creating the
session, but since that involves figuring out whether or not state
protection was successfully negotiated, and since we have to support
handling NFS4ERR_CONN_NOT_BOUND_TO_SESSION anyway, I'm all for just
waiting for the server to send the error.

> --b.

> 

> > 

> > 

> > > --b.

> > > 

> > > > C1: PUTROOTFH | GETATTR -> NFS4ERR_SEQ_MISORDERED

> > > > C2: SEQUENCE -> NFS4_OK

> > > > C1: PUTROOTFH | GETATTR -> NFS4ERR_CONN_NOT_BOUND_TO_SESSION

> > > > C1: BIND_CONN_TO_SESSION -> NFS4_OK

> > > > C2: BIND_CONN_TO_SESSION -> NFS4_OK

> > > > C2: PUTROOTFH | GETATTR -> NFS4ERR_SEQ_MISORDERED

> > > > 

> > > > .... mix of GETATTRs and other simple requests ....

> > > > 

> > > > C1: OPEN -> NFS4ERR_GRACE

> > > > C2: OPEN -> NFS4ERR_GRACE

> > > > 

> > > > The RECLAIM_COMPLETE operation failed, and the client does not

> > > > retry it. That leaves its lease stuck in GRACE.

> > > > 

> > > > 

> > > > --

> > > > Chuck Lever

> > > > 

> > > > 

> > > > 

> > > > --

> > > > To unsubscribe from this list: send the line "unsubscribe

> > > > linux-nfs" in

> > > > the body of a message to majordomo@vger.kernel.org

> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.h

> > > > tml

> > > 

> > > --

> > > To unsubscribe from this list: send the line "unsubscribe linux-

> > > nfs" in

> > > the body of a message to majordomo@vger.kernel.org

> > > More majordomo info at  http://vger.kernel.org/majordomo-info.htm

> > > l

> > 

> > --

> > Chuck Lever

> > 

> > 

> > 

> > --

> > To unsubscribe from this list: send the line "unsubscribe linux-

> > nfs" in

> > the body of a message to majordomo@vger.kernel.org

> > More majordomo info at  http://vger.kernel.org/majordomo-info.html

> 

> --

> To unsubscribe from this list: send the line "unsubscribe linux-nfs"

> in

> the body of a message to majordomo@vger.kernel.org

> More majordomo info at  http://vger.kernel.org/majordomo-info.html

> 

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com
J. Bruce Fields May 4, 2017, 8:42 p.m. UTC | #10
On Thu, May 04, 2017 at 08:40:07PM +0000, Trond Myklebust wrote:
> See https://tools.ietf.org/html/rfc5661#section-2.10.3.1
> 
> So, we probably should send the BIND_CONN_TO_SESSION after creating the
> session, but since that involves figuring out whether or not state
> protection was successfully negotiated, and since we have to support
> handling NFS4ERR_CONN_NOT_BOUND_TO_SESSION anyway, I'm all for just
> waiting for the server to send the error.

Makes sense to me.

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 31b26cf1b476..31757a742e9b 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -117,6 +117,7 @@  struct nfs_parsed_mount_data {
 		char			*export_path;
 		int			port;
 		unsigned short		protocol;
+		unsigned short		nconnect;
 	} nfs_server;
 
 	struct security_mnt_opts lsm_opts;
diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index 54e0f9f2dd94..7eb48934dc79 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -76,6 +76,8 @@ 
 #define NFS_DEFAULT_VERSION 2
 #endif
 
+#define NFS_MAX_CONNECTIONS 16
+
 enum {
 	/* Mount options that take no arguments */
 	Opt_soft, Opt_hard,
@@ -107,6 +109,7 @@  enum {
 	Opt_nfsvers,
 	Opt_sec, Opt_proto, Opt_mountproto, Opt_mounthost,
 	Opt_addr, Opt_mountaddr, Opt_clientaddr,
+	Opt_nconnect,
 	Opt_lookupcache,
 	Opt_fscache_uniq,
 	Opt_local_lock,
@@ -179,6 +182,8 @@  static const match_table_t nfs_mount_option_tokens = {
 	{ Opt_mounthost, "mounthost=%s" },
 	{ Opt_mountaddr, "mountaddr=%s" },
 
+	{ Opt_nconnect, "nconnect=%s" },
+
 	{ Opt_lookupcache, "lookupcache=%s" },
 	{ Opt_fscache_uniq, "fsc=%s" },
 	{ Opt_local_lock, "local_lock=%s" },
@@ -1544,6 +1549,11 @@  static int nfs_parse_mount_options(char *raw,
 			if (mnt->mount_server.addrlen == 0)
 				goto out_invalid_address;
 			break;
+		case Opt_nconnect:
+			if (nfs_get_option_ul_bound(args, &option, 1, NFS_MAX_CONNECTIONS))
+				goto out_invalid_value;
+			mnt->nfs_server.nconnect = option;
+			break;
 		case Opt_lookupcache:
 			string = match_strdup(args);
 			if (string == NULL)