diff mbox series

[1/1] NFSv4.0 allow nconnect for v4.0

Message ID 20200116190857.26026-1-olga.kornievskaia@gmail.com (mailing list archive)
State New, archived
Headers show
Series [1/1] NFSv4.0 allow nconnect for v4.0 | expand

Commit Message

Olga Kornievskaia Jan. 16, 2020, 7:08 p.m. UTC
From: Olga Kornievskaia <kolga@netapp.com>

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
---
 fs/nfs/nfs4client.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Schumaker, Anna Jan. 17, 2020, 9:09 p.m. UTC | #1
Hi Olga,

On Thu, 2020-01-16 at 14:08 -0500, Olga Kornievskaia wrote:
> From: Olga Kornievskaia <kolga@netapp.com>

Have you done any testing with nconnect and the v4.0 replay caches? I did some
digging on the mailing list and found this in one of the cover letters from
Trond: "The feature is only enabled for NFSv4.1 and NFSv4.2 for now; I don't
feel comfortable subjecting NFSv3/v4 replay caches to this treatment yet."

Thanks,
Anna

> 
> Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> ---
>  fs/nfs/nfs4client.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
> index 460d625..4df3fb0 100644
> --- a/fs/nfs/nfs4client.c
> +++ b/fs/nfs/nfs4client.c
> @@ -881,7 +881,7 @@ static int nfs4_set_client(struct nfs_server *server,
>  
>  	if (minorversion == 0)
>  		__set_bit(NFS_CS_REUSEPORT, &cl_init.init_flags);
> -	else if (proto == XPRT_TRANSPORT_TCP)
> +	if (proto == XPRT_TRANSPORT_TCP)
>  		cl_init.nconnect = nconnect;
>  
>  	if (server->flags & NFS_MOUNT_NORESVPORT)
Trond Myklebust Jan. 17, 2020, 9:14 p.m. UTC | #2
On Fri, 2020-01-17 at 21:09 +0000, Schumaker, Anna wrote:
> Hi Olga,
> 
> On Thu, 2020-01-16 at 14:08 -0500, Olga Kornievskaia wrote:
> > From: Olga Kornievskaia <kolga@netapp.com>
> 
> Have you done any testing with nconnect and the v4.0 replay caches? I
> did some
> digging on the mailing list and found this in one of the cover
> letters from
> Trond: "The feature is only enabled for NFSv4.1 and NFSv4.2 for now;
> I don't
> feel comfortable subjecting NFSv3/v4 replay caches to this treatment
> yet."
> 

That comment should be considered obsolete. The current code works hard
to ensure that we replay using the same connection (or at least the
same source/dest IP+ports) so that NFSv3/v4.0 DRCs work as expected.
For that reason we've had NFSv3 support since the feature was merged.
The NFSv4.0 support was just forgotten.

> Thanks,
> Anna
> 
> > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > ---
> >  fs/nfs/nfs4client.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
> > index 460d625..4df3fb0 100644
> > --- a/fs/nfs/nfs4client.c
> > +++ b/fs/nfs/nfs4client.c
> > @@ -881,7 +881,7 @@ static int nfs4_set_client(struct nfs_server
> > *server,
> >  
> >  	if (minorversion == 0)
> >  		__set_bit(NFS_CS_REUSEPORT, &cl_init.init_flags);
> > -	else if (proto == XPRT_TRANSPORT_TCP)
> > +	if (proto == XPRT_TRANSPORT_TCP)
> >  		cl_init.nconnect = nconnect;
> >  
> >  	if (server->flags & NFS_MOUNT_NORESVPORT)
Schumaker, Anna Jan. 17, 2020, 9:16 p.m. UTC | #3
On Fri, 2020-01-17 at 21:14 +0000, Trond Myklebust wrote:
> On Fri, 2020-01-17 at 21:09 +0000, Schumaker, Anna wrote:
> > Hi Olga,
> > 
> > On Thu, 2020-01-16 at 14:08 -0500, Olga Kornievskaia wrote:
> > > From: Olga Kornievskaia <kolga@netapp.com>
> > 
> > Have you done any testing with nconnect and the v4.0 replay caches? I
> > did some
> > digging on the mailing list and found this in one of the cover
> > letters from
> > Trond: "The feature is only enabled for NFSv4.1 and NFSv4.2 for now;
> > I don't
> > feel comfortable subjecting NFSv3/v4 replay caches to this treatment
> > yet."
> > 
> 
> That comment should be considered obsolete. The current code works hard
> to ensure that we replay using the same connection (or at least the
> same source/dest IP+ports) so that NFSv3/v4.0 DRCs work as expected.
> For that reason we've had NFSv3 support since the feature was merged.
> The NFSv4.0 support was just forgotten.

Thanks for the explanation! I'll add the patch.

Anna

> 
> > Thanks,
> > Anna
> > 
> > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > ---
> > >  fs/nfs/nfs4client.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
> > > index 460d625..4df3fb0 100644
> > > --- a/fs/nfs/nfs4client.c
> > > +++ b/fs/nfs/nfs4client.c
> > > @@ -881,7 +881,7 @@ static int nfs4_set_client(struct nfs_server
> > > *server,
> > >  
> > >  	if (minorversion == 0)
> > >  		__set_bit(NFS_CS_REUSEPORT, &cl_init.init_flags);
> > > -	else if (proto == XPRT_TRANSPORT_TCP)
> > > +	if (proto == XPRT_TRANSPORT_TCP)
> > >  		cl_init.nconnect = nconnect;
> > >  
> > >  	if (server->flags & NFS_MOUNT_NORESVPORT)
> -- 
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com
> 
>
Steve Dickson Jan. 20, 2020, 3:35 p.m. UTC | #4
Hello,

On 1/16/20 2:08 PM, Olga Kornievskaia wrote:
> From: Olga Kornievskaia <kolga@netapp.com>
> 
> Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> ---
>  fs/nfs/nfs4client.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
> index 460d625..4df3fb0 100644
> --- a/fs/nfs/nfs4client.c
> +++ b/fs/nfs/nfs4client.c
> @@ -881,7 +881,7 @@ static int nfs4_set_client(struct nfs_server *server,
>  
>  	if (minorversion == 0)
>  		__set_bit(NFS_CS_REUSEPORT, &cl_init.init_flags);
> -	else if (proto == XPRT_TRANSPORT_TCP)
> +	if (proto == XPRT_TRANSPORT_TCP)
>  		cl_init.nconnect = nconnect;
>  
>  	if (server->flags & NFS_MOUNT_NORESVPORT)
> 
Tested-by: Steve Dickson <steved@redhat.com>

With this patch v4.0 mounts act just like v4.1/v4.2 mounts
But is that a good thing. :-)  

Here is what I've found in my testing...

mount -onconnect=12 172.31.1.54:/home/tmp /mnt/tmp

Will create 12 TCP connections and maintain those 12 
connections until the umount happens. By maintain I mean 
if the connection times out, it is reconnected 
to maintain the 12 connections 

# mount -onconnect=12 172.31.1.54:/home/tmp /mnt/tmp
# netstat -an | grep 172.31.1.54 | wc -l
12
# netstat -an | grep 172.31.1.54        
tcp        0      0 172.31.1.24:901         172.31.1.54:2049        ESTABLISHED
tcp        0      0 172.31.1.24:667         172.31.1.54:2049        ESTABLISHED
tcp        0      0 172.31.1.24:746         172.31.1.54:2049        ESTABLISHED
tcp        0      0 172.31.1.24:672         172.31.1.54:2049        ESTABLISHED
tcp        0      0 172.31.1.24:832         172.31.1.54:2049        ESTABLISHED
tcp        0      0 172.31.1.24:895         172.31.1.54:2049        ESTABLISHED
tcp        0      0 172.31.1.24:673         172.31.1.54:2049        ESTABLISHED
tcp        0      0 172.31.1.24:732         172.31.1.54:2049        ESTABLISHED
tcp        0      0 172.31.1.24:795         172.31.1.54:2049        ESTABLISHED
tcp        0      0 172.31.1.24:918         172.31.1.54:2049        ESTABLISHED
tcp        0      0 172.31.1.24:674         172.31.1.54:2049        ESTABLISHED
tcp        0      0 172.31.1.24:953         172.31.1.54:2049        ESTABLISHED

# umount /mnt/tmp
# netstat -an | grep 172.31.1.54 | wc -l
12
# netstat -an | grep 172.31.1.54
tcp        0      0 172.31.1.24:901         172.31.1.54:2049        TIME_WAIT  
tcp        0      0 172.31.1.24:667         172.31.1.54:2049        TIME_WAIT  
tcp        0      0 172.31.1.24:746         172.31.1.54:2049        TIME_WAIT  
tcp        0      0 172.31.1.24:672         172.31.1.54:2049        TIME_WAIT  
tcp        0      0 172.31.1.24:832         172.31.1.54:2049        TIME_WAIT  
tcp        0      0 172.31.1.24:895         172.31.1.54:2049        TIME_WAIT  
tcp        0      0 172.31.1.24:673         172.31.1.54:2049        TIME_WAIT  
tcp        0      0 172.31.1.24:732         172.31.1.54:2049        TIME_WAIT  
tcp        0      0 172.31.1.24:795         172.31.1.54:2049        TIME_WAIT  
tcp        0      0 172.31.1.24:918         172.31.1.54:2049        TIME_WAIT  
tcp        0      0 172.31.1.24:674         172.31.1.54:2049        TIME_WAIT  
tcp        0      0 172.31.1.24:953         172.31.1.54:2049        TIME_WAIT 

Is this the expected behavior? 

If so I have a few concerns...

* The connections walk all over the /etc/services namespace. Meaning
using ports that are reserved for registered services, something
we've tried to avoid in userland by not binding to privilege ports and
use of backlist ports via /etc/bindresvport.blacklist

* When the unmount happens, all those connections go into TIME_WAIT on 
privilege ports and there are only so many of those. Not good during mount 
storms (when a server reboots and thousand of home dirs are remounted).

* No man page describing the new feature.

I realize there is not much we can do about some of these
(aka umount==>TIME_WAIT) but I think we need to document 
what we are doing to people's connection namespace when 
they use this feature. 

steved.
Trond Myklebust Jan. 23, 2020, 12:23 a.m. UTC | #5
On Mon, 2020-01-20 at 10:35 -0500, Steve Dickson wrote:
> Hello,
> 
> On 1/16/20 2:08 PM, Olga Kornievskaia wrote:
> > From: Olga Kornievskaia <kolga@netapp.com>
> > 
> > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > ---
> >  fs/nfs/nfs4client.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
> > index 460d625..4df3fb0 100644
> > --- a/fs/nfs/nfs4client.c
> > +++ b/fs/nfs/nfs4client.c
> > @@ -881,7 +881,7 @@ static int nfs4_set_client(struct nfs_server
> > *server,
> >  
> >  	if (minorversion == 0)
> >  		__set_bit(NFS_CS_REUSEPORT, &cl_init.init_flags);
> > -	else if (proto == XPRT_TRANSPORT_TCP)
> > +	if (proto == XPRT_TRANSPORT_TCP)
> >  		cl_init.nconnect = nconnect;
> >  
> >  	if (server->flags & NFS_MOUNT_NORESVPORT)
> > 
> Tested-by: Steve Dickson <steved@redhat.com>
> 
> With this patch v4.0 mounts act just like v4.1/v4.2 mounts
> But is that a good thing. :-)  
> 
> Here is what I've found in my testing...
> 
> mount -onconnect=12 172.31.1.54:/home/tmp /mnt/tmp
> 
> Will create 12 TCP connections and maintain those 12 
> connections until the umount happens. By maintain I mean 
> if the connection times out, it is reconnected 
> to maintain the 12 connections 
> 
> # mount -onconnect=12 172.31.1.54:/home/tmp /mnt/tmp
> # netstat -an | grep 172.31.1.54 | wc -l
> 12
> # netstat -an | grep 172.31.1.54        
> tcp        0      0
> 172.31.1.24:901         172.31.1.54:2049        ESTABLISHED
> tcp        0      0
> 172.31.1.24:667         172.31.1.54:2049        ESTABLISHED
> tcp        0      0
> 172.31.1.24:746         172.31.1.54:2049        ESTABLISHED
> tcp        0      0
> 172.31.1.24:672         172.31.1.54:2049        ESTABLISHED
> tcp        0      0
> 172.31.1.24:832         172.31.1.54:2049        ESTABLISHED
> tcp        0      0
> 172.31.1.24:895         172.31.1.54:2049        ESTABLISHED
> tcp        0      0
> 172.31.1.24:673         172.31.1.54:2049        ESTABLISHED
> tcp        0      0
> 172.31.1.24:732         172.31.1.54:2049        ESTABLISHED
> tcp        0      0
> 172.31.1.24:795         172.31.1.54:2049        ESTABLISHED
> tcp        0      0
> 172.31.1.24:918         172.31.1.54:2049        ESTABLISHED
> tcp        0      0
> 172.31.1.24:674         172.31.1.54:2049        ESTABLISHED
> tcp        0      0
> 172.31.1.24:953         172.31.1.54:2049        ESTABLISHED
> 
> # umount /mnt/tmp
> # netstat -an | grep 172.31.1.54 | wc -l
> 12
> # netstat -an | grep 172.31.1.54
> tcp        0      0
> 172.31.1.24:901         172.31.1.54:2049        TIME_WAIT  
> tcp        0      0
> 172.31.1.24:667         172.31.1.54:2049        TIME_WAIT  
> tcp        0      0
> 172.31.1.24:746         172.31.1.54:2049        TIME_WAIT  
> tcp        0      0
> 172.31.1.24:672         172.31.1.54:2049        TIME_WAIT  
> tcp        0      0
> 172.31.1.24:832         172.31.1.54:2049        TIME_WAIT  
> tcp        0      0
> 172.31.1.24:895         172.31.1.54:2049        TIME_WAIT  
> tcp        0      0
> 172.31.1.24:673         172.31.1.54:2049        TIME_WAIT  
> tcp        0      0
> 172.31.1.24:732         172.31.1.54:2049        TIME_WAIT  
> tcp        0      0
> 172.31.1.24:795         172.31.1.54:2049        TIME_WAIT  
> tcp        0      0
> 172.31.1.24:918         172.31.1.54:2049        TIME_WAIT  
> tcp        0      0
> 172.31.1.24:674         172.31.1.54:2049        TIME_WAIT  
> tcp        0      0
> 172.31.1.24:953         172.31.1.54:2049        TIME_WAIT 
> 
> Is this the expected behavior? 
> 
> If so I have a few concerns...
> 
> * The connections walk all over the /etc/services namespace. Meaning
> using ports that are reserved for registered services, something
> we've tried to avoid in userland by not binding to privilege ports
> and
> use of backlist ports via /etc/bindresvport.blacklist
> 
> * When the unmount happens, all those connections go into TIME_WAIT
> on 
> privilege ports and there are only so many of those. Not good during
> mount 
> storms (when a server reboots and thousand of home dirs are
> remounted).
> 
> * No man page describing the new feature.
> 
> I realize there is not much we can do about some of these
> (aka umount==>TIME_WAIT) but I think we need to document 
> what we are doing to people's connection namespace when 
> they use this feature. 

I'm not sure that I understand the concern. The connections are limited
to a specific window of ports by the min_resvport/max_resvport sunrpc
module parameters just as they were before we added 'nconnect'. Nothing
has changed in the way we choose ports...
Steve Dickson Jan. 27, 2020, 6:39 p.m. UTC | #6
On 1/22/20 7:23 PM, Trond Myklebust wrote:
> On Mon, 2020-01-20 at 10:35 -0500, Steve Dickson wrote:
>> Hello,
>>
>> On 1/16/20 2:08 PM, Olga Kornievskaia wrote:
>>> From: Olga Kornievskaia <kolga@netapp.com>
>>>
>>> Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
>>> ---
>>>  fs/nfs/nfs4client.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
>>> index 460d625..4df3fb0 100644
>>> --- a/fs/nfs/nfs4client.c
>>> +++ b/fs/nfs/nfs4client.c
>>> @@ -881,7 +881,7 @@ static int nfs4_set_client(struct nfs_server
>>> *server,
>>>  
>>>  	if (minorversion == 0)
>>>  		__set_bit(NFS_CS_REUSEPORT, &cl_init.init_flags);
>>> -	else if (proto == XPRT_TRANSPORT_TCP)
>>> +	if (proto == XPRT_TRANSPORT_TCP)
>>>  		cl_init.nconnect = nconnect;
>>>  
>>>  	if (server->flags & NFS_MOUNT_NORESVPORT)
>>>
>> Tested-by: Steve Dickson <steved@redhat.com>
>>
>> With this patch v4.0 mounts act just like v4.1/v4.2 mounts
>> But is that a good thing. :-)  
>>
>> Here is what I've found in my testing...
>>
>> mount -onconnect=12 172.31.1.54:/home/tmp /mnt/tmp
>>
>> Will create 12 TCP connections and maintain those 12 
>> connections until the umount happens. By maintain I mean 
>> if the connection times out, it is reconnected 
>> to maintain the 12 connections 
>>
>> # mount -onconnect=12 172.31.1.54:/home/tmp /mnt/tmp
>> # netstat -an | grep 172.31.1.54 | wc -l
>> 12
>> # netstat -an | grep 172.31.1.54        
>> tcp        0      0
>> 172.31.1.24:901         172.31.1.54:2049        ESTABLISHED
>> tcp        0      0
>> 172.31.1.24:667         172.31.1.54:2049        ESTABLISHED
>> tcp        0      0
>> 172.31.1.24:746         172.31.1.54:2049        ESTABLISHED
>> tcp        0      0
>> 172.31.1.24:672         172.31.1.54:2049        ESTABLISHED
>> tcp        0      0
>> 172.31.1.24:832         172.31.1.54:2049        ESTABLISHED
>> tcp        0      0
>> 172.31.1.24:895         172.31.1.54:2049        ESTABLISHED
>> tcp        0      0
>> 172.31.1.24:673         172.31.1.54:2049        ESTABLISHED
>> tcp        0      0
>> 172.31.1.24:732         172.31.1.54:2049        ESTABLISHED
>> tcp        0      0
>> 172.31.1.24:795         172.31.1.54:2049        ESTABLISHED
>> tcp        0      0
>> 172.31.1.24:918         172.31.1.54:2049        ESTABLISHED
>> tcp        0      0
>> 172.31.1.24:674         172.31.1.54:2049        ESTABLISHED
>> tcp        0      0
>> 172.31.1.24:953         172.31.1.54:2049        ESTABLISHED
>>
>> # umount /mnt/tmp
>> # netstat -an | grep 172.31.1.54 | wc -l
>> 12
>> # netstat -an | grep 172.31.1.54
>> tcp        0      0
>> 172.31.1.24:901         172.31.1.54:2049        TIME_WAIT  
>> tcp        0      0
>> 172.31.1.24:667         172.31.1.54:2049        TIME_WAIT  
>> tcp        0      0
>> 172.31.1.24:746         172.31.1.54:2049        TIME_WAIT  
>> tcp        0      0
>> 172.31.1.24:672         172.31.1.54:2049        TIME_WAIT  
>> tcp        0      0
>> 172.31.1.24:832         172.31.1.54:2049        TIME_WAIT  
>> tcp        0      0
>> 172.31.1.24:895         172.31.1.54:2049        TIME_WAIT  
>> tcp        0      0
>> 172.31.1.24:673         172.31.1.54:2049        TIME_WAIT  
>> tcp        0      0
>> 172.31.1.24:732         172.31.1.54:2049        TIME_WAIT  
>> tcp        0      0
>> 172.31.1.24:795         172.31.1.54:2049        TIME_WAIT  
>> tcp        0      0
>> 172.31.1.24:918         172.31.1.54:2049        TIME_WAIT  
>> tcp        0      0
>> 172.31.1.24:674         172.31.1.54:2049        TIME_WAIT  
>> tcp        0      0
>> 172.31.1.24:953         172.31.1.54:2049        TIME_WAIT 
>>
>> Is this the expected behavior? 
>>
>> If so I have a few concerns...
>>
>> * The connections walk all over the /etc/services namespace. Meaning
>> using ports that are reserved for registered services, something
>> we've tried to avoid in userland by not binding to privilege ports
>> and
>> use of backlist ports via /etc/bindresvport.blacklist
>>
>> * When the unmount happens, all those connections go into TIME_WAIT
>> on 
>> privilege ports and there are only so many of those. Not good during
>> mount 
>> storms (when a server reboots and thousand of home dirs are
>> remounted).
>>
>> * No man page describing the new feature.
>>
>> I realize there is not much we can do about some of these
>> (aka umount==>TIME_WAIT) but I think we need to document 
>> what we are doing to people's connection namespace when 
>> they use this feature. 
> 
> I'm not sure that I understand the concern. The connections are limited
> to a specific window of ports by the min_resvport/max_resvport sunrpc
> module parameters just as they were before we added 'nconnect'. Nothing
> has changed in the way we choose ports...
> 
Maybe this problem has existed for a while... 

Here are the mins/max ports
RPC_DEF_MIN_RESVPORT    (665U)
RPC_DEF_MAX_RESVPORT    (1023U)

From /etc/services there are the services in that range
acp(674), ha-cluster(694), kerberos-adm(749), kerberos-iv(750)
webster(765), phonebook(767), rsync(873), rquotad(875), 
telnets(992), imaps(993), pop3s(995)

Granted a lot of these are unused/legacy services, but some of
them, like imaps and rsync, are still used. 

My point is since the nconnect connections are persistent, for
the life of the mount, we could end up squatting on ports other
services will needed.

Maybe there is not much we can do about this... But we should explain
somewhere, like the man page, that nconnect will create up to 16
persistent connection on register privilege ports.

steved.
Trond Myklebust Jan. 27, 2020, 7:35 p.m. UTC | #7
On Mon, 2020-01-27 at 13:39 -0500, Steve Dickson wrote:
> 
> On 1/22/20 7:23 PM, Trond Myklebust wrote:
> > On Mon, 2020-01-20 at 10:35 -0500, Steve Dickson wrote:
> > > Hello,
> > > 
> > > On 1/16/20 2:08 PM, Olga Kornievskaia wrote:
> > > > From: Olga Kornievskaia <kolga@netapp.com>
> > > > 
> > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > > ---
> > > >  fs/nfs/nfs4client.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
> > > > index 460d625..4df3fb0 100644
> > > > --- a/fs/nfs/nfs4client.c
> > > > +++ b/fs/nfs/nfs4client.c
> > > > @@ -881,7 +881,7 @@ static int nfs4_set_client(struct
> > > > nfs_server
> > > > *server,
> > > >  
> > > >  	if (minorversion == 0)
> > > >  		__set_bit(NFS_CS_REUSEPORT,
> > > > &cl_init.init_flags);
> > > > -	else if (proto == XPRT_TRANSPORT_TCP)
> > > > +	if (proto == XPRT_TRANSPORT_TCP)
> > > >  		cl_init.nconnect = nconnect;
> > > >  
> > > >  	if (server->flags & NFS_MOUNT_NORESVPORT)
> > > > 
> > > Tested-by: Steve Dickson <steved@redhat.com>
> > > 
> > > With this patch v4.0 mounts act just like v4.1/v4.2 mounts
> > > But is that a good thing. :-)  
> > > 
> > > Here is what I've found in my testing...
> > > 
> > > mount -onconnect=12 172.31.1.54:/home/tmp /mnt/tmp
> > > 
> > > Will create 12 TCP connections and maintain those 12 
> > > connections until the umount happens. By maintain I mean 
> > > if the connection times out, it is reconnected 
> > > to maintain the 12 connections 
> > > 
> > > # mount -onconnect=12 172.31.1.54:/home/tmp /mnt/tmp
> > > # netstat -an | grep 172.31.1.54 | wc -l
> > > 12
> > > # netstat -an | grep 172.31.1.54        
> > > tcp        0      0
> > > 172.31.1.24:901         172.31.1.54:2049        ESTABLISHED
> > > tcp        0      0
> > > 172.31.1.24:667         172.31.1.54:2049        ESTABLISHED
> > > tcp        0      0
> > > 172.31.1.24:746         172.31.1.54:2049        ESTABLISHED
> > > tcp        0      0
> > > 172.31.1.24:672         172.31.1.54:2049        ESTABLISHED
> > > tcp        0      0
> > > 172.31.1.24:832         172.31.1.54:2049        ESTABLISHED
> > > tcp        0      0
> > > 172.31.1.24:895         172.31.1.54:2049        ESTABLISHED
> > > tcp        0      0
> > > 172.31.1.24:673         172.31.1.54:2049        ESTABLISHED
> > > tcp        0      0
> > > 172.31.1.24:732         172.31.1.54:2049        ESTABLISHED
> > > tcp        0      0
> > > 172.31.1.24:795         172.31.1.54:2049        ESTABLISHED
> > > tcp        0      0
> > > 172.31.1.24:918         172.31.1.54:2049        ESTABLISHED
> > > tcp        0      0
> > > 172.31.1.24:674         172.31.1.54:2049        ESTABLISHED
> > > tcp        0      0
> > > 172.31.1.24:953         172.31.1.54:2049        ESTABLISHED
> > > 
> > > # umount /mnt/tmp
> > > # netstat -an | grep 172.31.1.54 | wc -l
> > > 12
> > > # netstat -an | grep 172.31.1.54
> > > tcp        0      0
> > > 172.31.1.24:901         172.31.1.54:2049        TIME_WAIT  
> > > tcp        0      0
> > > 172.31.1.24:667         172.31.1.54:2049        TIME_WAIT  
> > > tcp        0      0
> > > 172.31.1.24:746         172.31.1.54:2049        TIME_WAIT  
> > > tcp        0      0
> > > 172.31.1.24:672         172.31.1.54:2049        TIME_WAIT  
> > > tcp        0      0
> > > 172.31.1.24:832         172.31.1.54:2049        TIME_WAIT  
> > > tcp        0      0
> > > 172.31.1.24:895         172.31.1.54:2049        TIME_WAIT  
> > > tcp        0      0
> > > 172.31.1.24:673         172.31.1.54:2049        TIME_WAIT  
> > > tcp        0      0
> > > 172.31.1.24:732         172.31.1.54:2049        TIME_WAIT  
> > > tcp        0      0
> > > 172.31.1.24:795         172.31.1.54:2049        TIME_WAIT  
> > > tcp        0      0
> > > 172.31.1.24:918         172.31.1.54:2049        TIME_WAIT  
> > > tcp        0      0
> > > 172.31.1.24:674         172.31.1.54:2049        TIME_WAIT  
> > > tcp        0      0
> > > 172.31.1.24:953         172.31.1.54:2049        TIME_WAIT 
> > > 
> > > Is this the expected behavior? 
> > > 
> > > If so I have a few concerns...
> > > 
> > > * The connections walk all over the /etc/services namespace.
> > > Meaning
> > > using ports that are reserved for registered services, something
> > > we've tried to avoid in userland by not binding to privilege
> > > ports
> > > and
> > > use of backlist ports via /etc/bindresvport.blacklist
> > > 
> > > * When the unmount happens, all those connections go into
> > > TIME_WAIT
> > > on 
> > > privilege ports and there are only so many of those. Not good
> > > during
> > > mount 
> > > storms (when a server reboots and thousand of home dirs are
> > > remounted).
> > > 
> > > * No man page describing the new feature.
> > > 
> > > I realize there is not much we can do about some of these
> > > (aka umount==>TIME_WAIT) but I think we need to document 
> > > what we are doing to people's connection namespace when 
> > > they use this feature. 
> > 
> > I'm not sure that I understand the concern. The connections are
> > limited
> > to a specific window of ports by the min_resvport/max_resvport
> > sunrpc
> > module parameters just as they were before we added 'nconnect'.
> > Nothing
> > has changed in the way we choose ports...
> > 
> Maybe this problem has existed for a while... 
> 
> Here are the mins/max ports
> RPC_DEF_MIN_RESVPORT    (665U)
> RPC_DEF_MAX_RESVPORT    (1023U)
> 
> From /etc/services there are the services in that range
> acp(674), ha-cluster(694), kerberos-adm(749), kerberos-iv(750)
> webster(765), phonebook(767), rsync(873), rquotad(875), 
> telnets(992), imaps(993), pop3s(995)
> 
> Granted a lot of these are unused/legacy services, but some of
> them, like imaps and rsync, are still used. 
> 
> My point is since the nconnect connections are persistent, for
> the life of the mount, we could end up squatting on ports other
> services will needed.
> 
> Maybe there is not much we can do about this... But we should explain
> somewhere, like the man page, that nconnect will create up to 16
> persistent connection on register privilege ports.


If users have a need to run servers on a port that might chosen by a
kernel nfs, lockd or rpcbind client, then they can guarantee no
collisions by redefining the 'privileged ports' available to sunrpc to
any arbitrary range <portnr1> - <portnr2>:

Either

 * Reserve the ports at module load time, by adding a line to a config
   file /etc/modprobe.d/foo.conf of the form
      options sunrpc min_resvport=<portnr1> max_resvport=<portnr2>

or

 * Change the port reservation after module load (but before mounting
   your first NFS filesystem) using
      echo '<portnr1>' > /sys/module/sunrpc/parameters/min_resvport
      echo '<portnr2>' > /sys/module/sunrpc/parameters/max_resvport

This is something users ought to be doing already if they need
guaranteed availability of specific ports in the range 665-1023 while
using the NFS client. That is entirely independent of whether or not
they are using nconnect.
Olga Kornievskaia Jan. 31, 2020, 7:56 p.m. UTC | #8
Trond, Anna,

If it's not too late then I'd like to re-tract this patch. I
understand if somebody else would really like to have this
functionality in then please speak up. If it's too late to pull I also
understand but I need to at least try.

Thank you.

On Thu, Jan 16, 2020 at 2:09 PM Olga Kornievskaia
<olga.kornievskaia@gmail.com> wrote:
>
> From: Olga Kornievskaia <kolga@netapp.com>
>
> Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> ---
>  fs/nfs/nfs4client.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
> index 460d625..4df3fb0 100644
> --- a/fs/nfs/nfs4client.c
> +++ b/fs/nfs/nfs4client.c
> @@ -881,7 +881,7 @@ static int nfs4_set_client(struct nfs_server *server,
>
>         if (minorversion == 0)
>                 __set_bit(NFS_CS_REUSEPORT, &cl_init.init_flags);
> -       else if (proto == XPRT_TRANSPORT_TCP)
> +       if (proto == XPRT_TRANSPORT_TCP)
>                 cl_init.nconnect = nconnect;
>
>         if (server->flags & NFS_MOUNT_NORESVPORT)
> --
> 1.8.3.1
>
J. Bruce Fields June 11, 2020, 8:09 p.m. UTC | #9
On Fri, Jan 17, 2020 at 09:16:54PM +0000, Schumaker, Anna wrote:
> On Fri, 2020-01-17 at 21:14 +0000, Trond Myklebust wrote:
> > On Fri, 2020-01-17 at 21:09 +0000, Schumaker, Anna wrote:
> > > Hi Olga,
> > > 
> > > On Thu, 2020-01-16 at 14:08 -0500, Olga Kornievskaia wrote:
> > > > From: Olga Kornievskaia <kolga@netapp.com>
> > > 
> > > Have you done any testing with nconnect and the v4.0 replay caches? I
> > > did some
> > > digging on the mailing list and found this in one of the cover
> > > letters from
> > > Trond: "The feature is only enabled for NFSv4.1 and NFSv4.2 for now;
> > > I don't
> > > feel comfortable subjecting NFSv3/v4 replay caches to this treatment
> > > yet."
> > > 
> > 
> > That comment should be considered obsolete. The current code works hard
> > to ensure that we replay using the same connection (or at least the
> > same source/dest IP+ports) so that NFSv3/v4.0 DRCs work as expected.
> > For that reason we've had NFSv3 support since the feature was merged.
> > The NFSv4.0 support was just forgotten.
> 
> Thanks for the explanation! I'll add the patch.

What happened to this patch?  As far as I can tell, the conclusion of
this thread was that it should be applied.

--b.

> 
> Anna
> 
> > 
> > > Thanks,
> > > Anna
> > > 
> > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > > ---
> > > >  fs/nfs/nfs4client.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
> > > > index 460d625..4df3fb0 100644
> > > > --- a/fs/nfs/nfs4client.c
> > > > +++ b/fs/nfs/nfs4client.c
> > > > @@ -881,7 +881,7 @@ static int nfs4_set_client(struct nfs_server
> > > > *server,
> > > >  
> > > >  	if (minorversion == 0)
> > > >  		__set_bit(NFS_CS_REUSEPORT, &cl_init.init_flags);
> > > > -	else if (proto == XPRT_TRANSPORT_TCP)
> > > > +	if (proto == XPRT_TRANSPORT_TCP)
> > > >  		cl_init.nconnect = nconnect;
> > > >  
> > > >  	if (server->flags & NFS_MOUNT_NORESVPORT)
> > -- 
> > Trond Myklebust
> > Linux NFS client maintainer, Hammerspace
> > trond.myklebust@hammerspace.com
> > 
> >
Rick Macklem June 11, 2020, 11 p.m. UTC | #10
J. Bruce Fields wrote:
>On Fri, Jan 17, 2020 at 09:16:54PM +0000, Schumaker, Anna wrote:
>> On Fri, 2020-01-17 at 21:14 +0000, Trond Myklebust wrote:
>> > On Fri, 2020-01-17 at 21:09 +0000, Schumaker, Anna wrote:
>> > > Hi Olga,
>> > >
>> > > On Thu, 2020-01-16 at 14:08 -0500, Olga Kornievskaia wrote:
>> > > > From: Olga Kornievskaia <kolga@netapp.com>
>> > >
>> > > Have you done any testing with nconnect and the v4.0 replay caches? I
>> > > did some
>> > > digging on the mailing list and found this in one of the cover
>> > > letters from
>> > > Trond: "The feature is only enabled for NFSv4.1 and NFSv4.2 for now;
>> > > I don't
>> > > feel comfortable subjecting NFSv3/v4 replay caches to this treatment
>> > > yet."
>> > >
>> >
>> > That comment should be considered obsolete. The current code works hard
>> > to ensure that we replay using the same connection (or at least the
>> > same source/dest IP+ports) so that NFSv3/v4.0 DRCs work as expected.
>> > For that reason we've had NFSv3 support since the feature was merged.
>> > The NFSv4.0 support was just forgotten.
>>
>> Thanks for the explanation! I'll add the patch.
Maybe I am misinterpreting this discussion and, if so, please just ignore these
comments.
Here are two snippets from RFC7530:
In Sec. 3.1.
   Where an NFSv4 implementation supports operation over the IP network
   protocol, the supported transport layer between NFS and IP MUST be an
   IETF standardized transport protocol that is specified to avoid
   network congestion; such transports include TCP and the Stream
   Control Transmission Protocol (SCTP).  To enhance the possibilities
   for interoperability, an NFSv4 implementation MUST support operation
   over the TCP transport protocol.

This clearly states that NFSv4.0 cannot operate over UDP. (See the MUST above.)

In Sec. 3.1.1.
   When processing an NFSv4 request received over a reliable transport
   such as TCP, the NFSv4 server MUST NOT silently drop the request,
   except if the established transport connection has been broken.
   Given such a contract between NFSv4 clients and servers, clients MUST
   NOT retry a request unless one or both of the following are true:

   o  The transport connection has been broken

   o  The procedure being retried is the NULL procedure
If the TCP connection is broken, making a new TCP connection on the
same port #s would be risky, unless a long delay is done. Normally a
different port# would be used and the implication of the above is that
any retry of an RPC (above the TCP layer retransmits of segments)
will normally be from a different port#.

Long ago I played around with a DRC for TCP (which ended up in FreeBSD)
and because retries happen after a long timeout and there can be many
other RPCs done in between the first attempt for the RPC and a subsequent
retry of the RPC, the design must be very different than a DRC for UDP.
--> LRU caching doesn't work well unless the cache size is very large.
--> For NFSv4.0 over TCP (not NFSv3), the DRC must assume (or at least
      handle) retries from a different client port#, since they will be done on
      a different TCP connection and that will almost inevitably imply a
      different port#.

rick

What happened to this patch?  As far as I can tell, the conclusion of
this thread was that it should be applied.

--b.

>
> Anna
>
> >
> > > Thanks,
> > > Anna
> > >
> > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > > ---
> > > >  fs/nfs/nfs4client.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
> > > > index 460d625..4df3fb0 100644
> > > > --- a/fs/nfs/nfs4client.c
> > > > +++ b/fs/nfs/nfs4client.c
> > > > @@ -881,7 +881,7 @@ static int nfs4_set_client(struct nfs_server
> > > > *server,
> > > >
> > > >         if (minorversion == 0)
> > > >                 __set_bit(NFS_CS_REUSEPORT, &cl_init.init_flags);
> > > > -       else if (proto == XPRT_TRANSPORT_TCP)
> > > > +       if (proto == XPRT_TRANSPORT_TCP)
> > > >                 cl_init.nconnect = nconnect;
> > > >
> > > >         if (server->flags & NFS_MOUNT_NORESVPORT)
> > --
> > Trond Myklebust
> > Linux NFS client maintainer, Hammerspace
> > trond.myklebust@hammerspace.com
> >
> >
Olga Kornievskaia June 12, 2020, 1:26 p.m. UTC | #11
On Thu, Jun 11, 2020 at 4:09 PM J. Bruce Fields <bfields@fieldses.org> wrote:
>
> On Fri, Jan 17, 2020 at 09:16:54PM +0000, Schumaker, Anna wrote:
> > On Fri, 2020-01-17 at 21:14 +0000, Trond Myklebust wrote:
> > > On Fri, 2020-01-17 at 21:09 +0000, Schumaker, Anna wrote:
> > > > Hi Olga,
> > > >
> > > > On Thu, 2020-01-16 at 14:08 -0500, Olga Kornievskaia wrote:
> > > > > From: Olga Kornievskaia <kolga@netapp.com>
> > > >
> > > > Have you done any testing with nconnect and the v4.0 replay caches? I
> > > > did some
> > > > digging on the mailing list and found this in one of the cover
> > > > letters from
> > > > Trond: "The feature is only enabled for NFSv4.1 and NFSv4.2 for now;
> > > > I don't
> > > > feel comfortable subjecting NFSv3/v4 replay caches to this treatment
> > > > yet."
> > > >
> > >
> > > That comment should be considered obsolete. The current code works hard
> > > to ensure that we replay using the same connection (or at least the
> > > same source/dest IP+ports) so that NFSv3/v4.0 DRCs work as expected.
> > > For that reason we've had NFSv3 support since the feature was merged.
> > > The NFSv4.0 support was just forgotten.
> >
> > Thanks for the explanation! I'll add the patch.
>
> What happened to this patch?  As far as I can tell, the conclusion of
> this thread was that it should be applied.

I decided not to submit this patch but anybody else is free to add
that patch to add support for 4.0 nconnect as there is no reason it
shouldn't be supported.

>
> --b.
>
> >
> > Anna
> >
> > >
> > > > Thanks,
> > > > Anna
> > > >
> > > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > > > ---
> > > > >  fs/nfs/nfs4client.c | 2 +-
> > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
> > > > > index 460d625..4df3fb0 100644
> > > > > --- a/fs/nfs/nfs4client.c
> > > > > +++ b/fs/nfs/nfs4client.c
> > > > > @@ -881,7 +881,7 @@ static int nfs4_set_client(struct nfs_server
> > > > > *server,
> > > > >
> > > > >         if (minorversion == 0)
> > > > >                 __set_bit(NFS_CS_REUSEPORT, &cl_init.init_flags);
> > > > > -       else if (proto == XPRT_TRANSPORT_TCP)
> > > > > +       if (proto == XPRT_TRANSPORT_TCP)
> > > > >                 cl_init.nconnect = nconnect;
> > > > >
> > > > >         if (server->flags & NFS_MOUNT_NORESVPORT)
> > > --
> > > Trond Myklebust
> > > Linux NFS client maintainer, Hammerspace
> > > trond.myklebust@hammerspace.com
> > >
> > >
diff mbox series

Patch

diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
index 460d625..4df3fb0 100644
--- a/fs/nfs/nfs4client.c
+++ b/fs/nfs/nfs4client.c
@@ -881,7 +881,7 @@  static int nfs4_set_client(struct nfs_server *server,
 
 	if (minorversion == 0)
 		__set_bit(NFS_CS_REUSEPORT, &cl_init.init_flags);
-	else if (proto == XPRT_TRANSPORT_TCP)
+	if (proto == XPRT_TRANSPORT_TCP)
 		cl_init.nconnect = nconnect;
 
 	if (server->flags & NFS_MOUNT_NORESVPORT)