Message ID: 20200116190857.26026-1-olga.kornievskaia@gmail.com (mailing list archive)
State:      New, archived
Series:     [1/1] NFSv4.0 allow nconnect for v4.0
Hi Olga,

On Thu, 2020-01-16 at 14:08 -0500, Olga Kornievskaia wrote:
> From: Olga Kornievskaia <kolga@netapp.com>

Have you done any testing with nconnect and the v4.0 replay caches? I
did some digging on the mailing list and found this in one of the cover
letters from Trond: "The feature is only enabled for NFSv4.1 and NFSv4.2
for now; I don't feel comfortable subjecting NFSv3/v4 replay caches to
this treatment yet."

Thanks,
Anna

> Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> ---
>  fs/nfs/nfs4client.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
> index 460d625..4df3fb0 100644
> --- a/fs/nfs/nfs4client.c
> +++ b/fs/nfs/nfs4client.c
> @@ -881,7 +881,7 @@ static int nfs4_set_client(struct nfs_server *server,
>
>  	if (minorversion == 0)
>  		__set_bit(NFS_CS_REUSEPORT, &cl_init.init_flags);
> -	else if (proto == XPRT_TRANSPORT_TCP)
> +	if (proto == XPRT_TRANSPORT_TCP)
>  		cl_init.nconnect = nconnect;
>
>  	if (server->flags & NFS_MOUNT_NORESVPORT)
On Fri, 2020-01-17 at 21:09 +0000, Schumaker, Anna wrote:
> Hi Olga,
>
> Have you done any testing with nconnect and the v4.0 replay caches? I
> did some digging on the mailing list and found this in one of the
> cover letters from Trond: "The feature is only enabled for NFSv4.1 and
> NFSv4.2 for now; I don't feel comfortable subjecting NFSv3/v4 replay
> caches to this treatment yet."

That comment should be considered obsolete. The current code works hard
to ensure that we replay using the same connection (or at least the same
source/dest IP+ports) so that NFSv3/v4.0 DRCs work as expected. For that
reason we've had NFSv3 support since the feature was merged. The NFSv4.0
support was just forgotten.
On Fri, 2020-01-17 at 21:14 +0000, Trond Myklebust wrote:
> On Fri, 2020-01-17 at 21:09 +0000, Schumaker, Anna wrote:
> > Have you done any testing with nconnect and the v4.0 replay caches?
>
> That comment should be considered obsolete. The current code works
> hard to ensure that we replay using the same connection (or at least
> the same source/dest IP+ports) so that NFSv3/v4.0 DRCs work as
> expected. For that reason we've had NFSv3 support since the feature
> was merged. The NFSv4.0 support was just forgotten.

Thanks for the explanation! I'll add the patch.

Anna

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com
Hello,

On 1/16/20 2:08 PM, Olga Kornievskaia wrote:
> From: Olga Kornievskaia <kolga@netapp.com>
>
> Signed-off-by: Olga Kornievskaia <kolga@netapp.com>

Tested-by: Steve Dickson <steved@redhat.com>

With this patch v4.0 mounts act just like v4.1/v4.2 mounts.
But is that a good thing? :-)

Here is what I've found in my testing...

    mount -onconnect=12 172.31.1.54:/home/tmp /mnt/tmp

will create 12 TCP connections and maintain those 12 connections until
the umount happens. By "maintain" I mean that if a connection times out,
it is reconnected to keep the 12 connections up.

    # mount -onconnect=12 172.31.1.54:/home/tmp /mnt/tmp
    # netstat -an | grep 172.31.1.54 | wc -l
    12
    # netstat -an | grep 172.31.1.54
    tcp   0   0  172.31.1.24:901   172.31.1.54:2049   ESTABLISHED
    tcp   0   0  172.31.1.24:667   172.31.1.54:2049   ESTABLISHED
    tcp   0   0  172.31.1.24:746   172.31.1.54:2049   ESTABLISHED
    tcp   0   0  172.31.1.24:672   172.31.1.54:2049   ESTABLISHED
    tcp   0   0  172.31.1.24:832   172.31.1.54:2049   ESTABLISHED
    tcp   0   0  172.31.1.24:895   172.31.1.54:2049   ESTABLISHED
    tcp   0   0  172.31.1.24:673   172.31.1.54:2049   ESTABLISHED
    tcp   0   0  172.31.1.24:732   172.31.1.54:2049   ESTABLISHED
    tcp   0   0  172.31.1.24:795   172.31.1.54:2049   ESTABLISHED
    tcp   0   0  172.31.1.24:918   172.31.1.54:2049   ESTABLISHED
    tcp   0   0  172.31.1.24:674   172.31.1.54:2049   ESTABLISHED
    tcp   0   0  172.31.1.24:953   172.31.1.54:2049   ESTABLISHED

    # umount /mnt/tmp
    # netstat -an | grep 172.31.1.54 | wc -l
    12
    # netstat -an | grep 172.31.1.54
    tcp   0   0  172.31.1.24:901   172.31.1.54:2049   TIME_WAIT
    tcp   0   0  172.31.1.24:667   172.31.1.54:2049   TIME_WAIT
    tcp   0   0  172.31.1.24:746   172.31.1.54:2049   TIME_WAIT
    tcp   0   0  172.31.1.24:672   172.31.1.54:2049   TIME_WAIT
    tcp   0   0  172.31.1.24:832   172.31.1.54:2049   TIME_WAIT
    tcp   0   0  172.31.1.24:895   172.31.1.54:2049   TIME_WAIT
    tcp   0   0  172.31.1.24:673   172.31.1.54:2049   TIME_WAIT
    tcp   0   0  172.31.1.24:732   172.31.1.54:2049   TIME_WAIT
    tcp   0   0  172.31.1.24:795   172.31.1.54:2049   TIME_WAIT
    tcp   0   0  172.31.1.24:918   172.31.1.54:2049   TIME_WAIT
    tcp   0   0  172.31.1.24:674   172.31.1.54:2049   TIME_WAIT
    tcp   0   0  172.31.1.24:953   172.31.1.54:2049   TIME_WAIT

Is this the expected behavior?

If so, I have a few concerns:

* The connections walk all over the /etc/services namespace, meaning
  they use ports that are reserved for registered services, something
  we've tried to avoid in userland by not binding to privileged ports
  and by blacklisting ports via /etc/bindresvport.blacklist.

* When the unmount happens, all those connections go into TIME_WAIT on
  privileged ports, and there are only so many of those. Not good
  during mount storms (when a server reboots and thousands of home dirs
  are remounted).

* There is no man page describing the new feature.

I realize there is not much we can do about some of these (aka
umount ==> TIME_WAIT), but I think we need to document what we are
doing to people's connection namespace when they use this feature.

steved.
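The mount-storm concern above can be made concrete with a little arithmetic. This is only a sketch: the window bounds are the RPC_DEF_MIN_RESVPORT/RPC_DEF_MAX_RESVPORT defaults quoted elsewhere in this thread, and nconnect=12 matches the test shown.

```shell
# Back-of-the-envelope sketch: with the default sunrpc reserved-port
# window and nconnect=12 per mount, how many mounts fit before the
# client runs out of privileged source ports?
min_resvport=665    # RPC_DEF_MIN_RESVPORT
max_resvport=1023   # RPC_DEF_MAX_RESVPORT
nconnect=12

window=$((max_resvport - min_resvport + 1))   # usable reserved ports
mounts=$((window / nconnect))                 # full nconnect=12 mounts

echo "reserved-port window: $window ports"
echo "full nconnect=$nconnect mounts before exhaustion: $mounts"
```

With the defaults this comes out to well under 30 simultaneous full-nconnect mounts, which is why a "thousands of home dirs remounting" storm is worth worrying about.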
On Mon, 2020-01-20 at 10:35 -0500, Steve Dickson wrote:
> With this patch v4.0 mounts act just like v4.1/v4.2 mounts.
> But is that a good thing. :-)
>
> * The connections walk all over the /etc/services namespace.
>
> * When the unmount happens, all those connections go into TIME_WAIT
>   on privilege ports and there are only so many of those. Not good
>   during mount storms (when a server reboots and thousand of home
>   dirs are remounted).
>
> * No man page describing the new feature.
>
> I realize there is not much we can do about some of these
> (aka umount==>TIME_WAIT) but I think we need to document what we are
> doing to people's connection namespace when they use this feature.

I'm not sure that I understand the concern. The connections are limited
to a specific window of ports by the min_resvport/max_resvport sunrpc
module parameters just as they were before we added 'nconnect'. Nothing
has changed in the way we choose ports...
On 1/22/20 7:23 PM, Trond Myklebust wrote:
> I'm not sure that I understand the concern. The connections are
> limited to a specific window of ports by the min_resvport/max_resvport
> sunrpc module parameters just as they were before we added 'nconnect'.
> Nothing has changed in the way we choose ports...

Maybe this problem has existed for a while...

Here are the min/max ports:

    RPC_DEF_MIN_RESVPORT (665U)
    RPC_DEF_MAX_RESVPORT (1023U)

From /etc/services, these are the services in that range:

    acp(674), ha-cluster(694), kerberos-adm(749), kerberos-iv(750),
    webster(765), phonebook(767), rsync(873), rquotad(875),
    telnets(992), imaps(993), pop3s(995)

Granted, a lot of these are unused/legacy services, but some of them,
like imaps and rsync, are still used.

My point is that since the nconnect connections are persistent for the
life of the mount, we could end up squatting on ports other services
will need.

Maybe there is not much we can do about this... But we should explain
somewhere, like the man page, that nconnect will create up to 16
persistent connections on registered privileged ports.

steved.
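The overlap described above can be checked mechanically. A rough sketch follows; it runs against a small inline sample so it is self-contained, but the same awk can be pointed at a real /etc/services.

```shell
# List TCP services registered inside the sunrpc reserved-port window
# (665-1023), i.e. registered ports an nconnect connection could bind.
# The here-document is a stand-in sample for /etc/services.
collisions=$(awk -v min=665 -v max=1023 '
    /^#/ { next }                  # skip comment lines
    {
        split($2, a, "/")          # $2 is "port/proto"
        port = a[1] + 0
        if (a[2] == "tcp" && port >= min && port <= max)
            printf "%s(%d)\n", $1, port
    }' <<'EOF'
http            80/tcp
kerberos-adm    749/tcp
rsync           873/tcp
imaps           993/tcp
ntp             123/udp
EOF
)
echo "$collisions"
```

On the sample this reports kerberos-adm, rsync, and imaps and skips the ports outside the window, matching the list of at-risk services above.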
On Mon, 2020-01-27 at 13:39 -0500, Steve Dickson wrote:
> My point is that since the nconnect connections are persistent for
> the life of the mount, we could end up squatting on ports other
> services will need.
>
> Maybe there is not much we can do about this... But we should explain
> somewhere, like the man page, that nconnect will create up to 16
> persistent connections on registered privileged ports.

If users have a need to run servers on a port that might be chosen by a
kernel NFS, lockd, or rpcbind client, then they can guarantee no
collisions by redefining the 'privileged ports' available to sunrpc to
any arbitrary range <portnr1>-<portnr2>. Either:

* Reserve the ports at module load time, by adding a line to a config
  file /etc/modprobe.d/foo.conf of the form

      options sunrpc min_resvport=<portnr1> max_resvport=<portnr2>

or

* Change the port reservation after module load (but before mounting
  your first NFS filesystem) using

      echo '<portnr1>' > /sys/module/sunrpc/parameters/min_resvport
      echo '<portnr2>' > /sys/module/sunrpc/parameters/max_resvport

This is something users ought to be doing already if they need
guaranteed availability of specific ports in the range 665-1023 while
using the NFS client. That is entirely independent of whether or not
they are using nconnect.
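Before writing a custom range to those sysfs files, it is worth sanity-checking the values. The sketch below is hypothetical (the range 800-900 is an example, not a recommendation); it only checks the ordering and the privileged-port ceiling, and the real writes still require root with the sunrpc module loaded.

```shell
# Hypothetical pre-flight check for a custom reserved-port range before
# writing it to /sys/module/sunrpc/parameters/{min,max}_resvport.
portnr1=800   # example lower bound
portnr2=900   # example upper bound

# A privileged-port range must be ordered and stay at or below 1023.
if [ "$portnr1" -ge 1 ] && [ "$portnr2" -le 1023 ] \
        && [ "$portnr1" -le "$portnr2" ]; then
    range_ok=yes
else
    range_ok=no
fi
echo "range $portnr1-$portnr2 valid for resvport: $range_ok"
```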
Trond, Anna,

If it's not too late then I'd like to retract this patch. I understand
if somebody else would really like to have this functionality in; then
please speak up. If it's too late to pull, I also understand, but I
need to at least try.

Thank you.

On Thu, Jan 16, 2020 at 2:09 PM Olga Kornievskaia
<olga.kornievskaia@gmail.com> wrote:
>
> From: Olga Kornievskaia <kolga@netapp.com>
>
> Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> ---
>  fs/nfs/nfs4client.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
On Fri, Jan 17, 2020 at 09:16:54PM +0000, Schumaker, Anna wrote:
> On Fri, 2020-01-17 at 21:14 +0000, Trond Myklebust wrote:
> > That comment should be considered obsolete. The current code works
> > hard to ensure that we replay using the same connection (or at
> > least the same source/dest IP+ports) so that NFSv3/v4.0 DRCs work
> > as expected. For that reason we've had NFSv3 support since the
> > feature was merged. The NFSv4.0 support was just forgotten.
>
> Thanks for the explanation! I'll add the patch.

What happened to this patch? As far as I can tell, the conclusion of
this thread was that it should be applied.

--b.
J. Bruce Fields wrote:
> On Fri, Jan 17, 2020 at 09:16:54PM +0000, Schumaker, Anna wrote:
> > Thanks for the explanation! I'll add the patch.

Maybe I am misinterpreting this discussion and, if so, please just
ignore these comments. Here are two snippets from RFC 7530.

From Sec. 3.1:

   Where an NFSv4 implementation supports operation over the IP network
   protocol, the supported transport layer between NFS and IP MUST be
   an IETF standardized transport protocol that is specified to avoid
   network congestion; such transports include TCP and the Stream
   Control Transmission Protocol (SCTP). To enhance the possibilities
   for interoperability, an NFSv4 implementation MUST support operation
   over the TCP transport protocol.

This clearly states that NFSv4.0 cannot operate over UDP. (See the MUST
above.)

From Sec. 3.1.1:

   When processing an NFSv4 request received over a reliable transport
   such as TCP, the NFSv4 server MUST NOT silently drop the request,
   except if the established transport connection has been broken.
   Given such a contract between NFSv4 clients and servers, clients
   MUST NOT retry a request unless one or both of the following are
   true:

   o  The transport connection has been broken

   o  The procedure being retried is the NULL procedure

If the TCP connection is broken, making a new TCP connection on the
same port #s would be risky unless a long delay is done. Normally a
different port# would be used, and the implication of the above is that
any retry of an RPC (above the TCP layer's retransmits of segments)
will normally be from a different port#.

Long ago I played around with a DRC for TCP (which ended up in
FreeBSD), and because retries happen after a long timeout and there can
be many other RPCs done in between the first attempt for the RPC and a
subsequent retry of the RPC, the design must be very different than a
DRC for UDP.

--> LRU caching doesn't work well unless the cache size is very large.

--> For NFSv4.0 over TCP (not NFSv3), the DRC must assume (or at least
    handle) retries from a different client port#, since they will be
    done on a different TCP connection and that will almost inevitably
    imply a different port#.

rick
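Rick's second point can be illustrated with a toy comparison. This is only a sketch with made-up values: a replay cache that keys entries on (client IP, source port, xid) necessarily misses when a retry arrives over a replacement TCP connection, because the new connection almost always carries a new source port.

```shell
# Toy illustration: a DRC entry keyed on client-IP:port:xid is not
# found when the retry comes from a different source port, which is
# the normal case after a broken TCP connection is replaced.
key_first="172.31.1.24:901:0x1234"   # original request's cache key
key_retry="172.31.1.24:902:0x1234"   # retry after reconnect: new port

if [ "$key_retry" = "$key_first" ]; then
    result="hit"
else
    result="miss"
fi
echo "retry from new port: DRC $result"
```

This is why Rick argues an NFSv4.0-over-TCP DRC must match retries by something other than the full source address, unlike a UDP DRC.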
On Thu, Jun 11, 2020 at 4:09 PM J. Bruce Fields <bfields@fieldses.org> wrote:
>
> What happened to this patch? As far as I can tell, the conclusion of
> this thread was that it should be applied.

I decided not to submit this patch, but anybody else is free to pick it
up to add support for nconnect with NFSv4.0, as there is no reason it
shouldn't be supported.
diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
index 460d625..4df3fb0 100644
--- a/fs/nfs/nfs4client.c
+++ b/fs/nfs/nfs4client.c
@@ -881,7 +881,7 @@ static int nfs4_set_client(struct nfs_server *server,
 
 	if (minorversion == 0)
 		__set_bit(NFS_CS_REUSEPORT, &cl_init.init_flags);
-	else if (proto == XPRT_TRANSPORT_TCP)
+	if (proto == XPRT_TRANSPORT_TCP)
 		cl_init.nconnect = nconnect;
 
 	if (server->flags & NFS_MOUNT_NORESVPORT)