Message ID | 056501d5b437$91f1c6c0$b5d55440$@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | NFSv4: nfs4_do_fsinfo() should not do implicit lease renewals | expand |
On Mon, 2019-12-16 at 17:38 +0000, Robert Milkowski wrote: > From: Robert Milkowski <rmilkowski@gmail.com> > > Currently, each time nfs4_do_fsinfo() is called it will do an > implicit > NFS4 lease renewal, which is not compliant with the NFS4 > specification. > This can result in a lease being expired by NFS server which will > then > return NFS4ERR_EXPIRED or NFS4ERR_STALE_CLIENTID. > > Signed-off-by: Robert Milkowski <rmilkowski@gmail.com> > --- > fs/nfs/nfs4proc.c | 7 +++++-- > 1 file changed, 5 insertions(+), 2 deletions(-) > > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c > index 76d3716..aad65dd 100644 > --- a/fs/nfs/nfs4proc.c > +++ b/fs/nfs/nfs4proc.c > @@ -5019,16 +5019,19 @@ static int nfs4_do_fsinfo(struct nfs_server > *server, > struct nfs_fh *fhandle, str > struct nfs4_exception exception = { > .interruptible = true, > }; > - unsigned long now = jiffies; > + unsigned long last_renewal = jiffies; > int err; > > do { > err = _nfs4_do_fsinfo(server, fhandle, fsinfo); > trace_nfs4_fsinfo(server, fhandle, fsinfo->fattr, err); > if (err == 0) { > + /* no implicit lease renewal allowed here */ > + if (server->nfs_client->cl_last_renewal != 0) > + last_renewal = > server->nfs_client->cl_last_renewal; > nfs4_set_lease_period(server->nfs_client, > fsinfo->lease_time * HZ, > - now); > + last_renewal); > break; > } > err = nfs4_handle_exception(server, err, &exception); NACK. The above argument only applies to legacy minor version 0 setups, and does not apply to NFSv4.1 or newer.
Hi, If a sub-filesystem (nfsv4 mirror mount) gets unmounted and then mounted again (by accessing it) the nfs4_do_fsinfo() function is called, which currently assumes implicit lease renewal. I believe this is no compliant with the RFC. I've managed to trigger the issue by two different methods: 1) in prod If there is an NFSv4 filesystem mounted with sub-mounts (similar setup as below), after nfs_mountpoint_expiry_timeout of inactivity, Each submount will be unmounted. If now it gets accessed again it will be automatically mounted again which will result currently in implicit lease renewal on the client side which in turn can result in a relatively small window where the client thinks its lease is still valid while an nfs server has already expired the lease. 2) manual unmount # cat /etc/exports / *(rw,sync) /var *(rw,sync) On a Linux NFS client: # mount -o vers=4 10.50.2.59:/ /mnt/3 $ head /mnt/3/var/log/vmware-vmsvc.log >/dev/null $ df -h | tail -2 10.50.2.59:/ 29G 25G 2.3G 92% /mnt/3 10.50.2.59:/var 20G 2.2G 16G 13% /mnt/3/var # while [ 1 ]; do date; umount /mnt/3/var; ls /mnt/3/var >/dev/null; sleep 10; done ... By constantly unmounting the sub-filesystem (/var) and then accessing it so it gets mounted again (which triggers the nfs4_do_fsinfo()), the cl_last_renewal is set to now on the client which prevents RENEW operations from being send, and eventually the NFS server will expire the lease (common defaults are 60s on Linux and 90s on Solaris servers). In testing I confirmed that both Linux and Solaris NFSv4 servers will not do an implicit lease renewal in this case (nfs4_do_fsinfo() results in GETATTR operations being send), in which case the lease might expire and both Linux and Solaris NFS servers will return NFS4ERR_EXPIRED. The error is not handled correctly either and will result in EIO propagated to an application issuing open(). See my other email with subject: [PATCH] NFSv4: open() should try lease recovery on NFS4ERR_EXPIRED which contains a fix for the NFS4ERR_EXPIRED handling. This patch however fixes the issue with implicit lease renewal by not setting the cl_last_renewal to now, unless there is no lease set yet. This was tested with 5.5.0-rc2 and the provided patch is applied on top of the 5.5.0-rc2 as well. btw: Recent ONTAP versions return NFS4ERR_STALE_CLIENTID which is handled correctly - Linux client will try to renew its lease and if successful it will retry open(). Best regards, Robert Milkowski
>btw: Recent ONTAP versions return NFS4ERR_STALE_CLIENTID which is handled
correctly - Linux client will try to renew its lease and if successful it
will retry open().
One more comment here, although the issue won't manifest to applications
with NetApp fileservers in this case, it is still wrong for the client to
end up in such a state.
Also, you will need to wait longer (keep unmounting/mounting) with NetApp
filers as they seem to have implemented courtesy locks/delayed lease
expiration as per the RFC.
This means to reproduce the issue against NetApp you need to wait at least
2x grace period + 1 minute).
Best regards,
Robert Milkowski
> From: Trond Myklebust <trondmy@hammerspace.com> ... >NACK. The above argument only applies to legacy minor version 0 setups, and does not apply to NFSv4.1 or newer. Correct. However many sites still use v4.0. Best regards, Robert Milkowski
-On Mon, 2019-12-16 at 18:43 +0000, Robert Milkowski wrote: > > From: Trond Myklebust <trondmy@hammerspace.com> > ... > > NACK. The above argument only applies to legacy minor version 0 > > setups, and does not apply to NFSv4.1 or newer. > > Correct. However many sites still use v4.0. > That's not a good reason to break code that works just fine for NFSv4.1. It would be better to move the initialisation of clp->cl_last_renewal into nfs4_init_clientid() and nfs41_init_clientid() (after the calls to nfs4_proc_setclientid_confirm() and nfs4_proc_create_session() respectively).
> On Dec 16, 2019, at 1:43 PM, Robert Milkowski <rmilkowski@gmail.com> wrote: > > > >> From: Trond Myklebust <trondmy@hammerspace.com> > ... >> NACK. The above argument only applies to legacy minor version 0 setups, and does not apply to NFSv4.1 or newer. > > Correct. However many sites still use v4.0. v4.1 and newer actually do renew the lease with the FSINFO COMPOUND, since the COMPOUND starts with a SEQUENCE operation. For v4.0, you can get the equivalent behavior by adding a RENEW to the end of the FSINFO COMPOUND. -- Chuck Lever
On Mon, 16 Dec 2019 at 18:58, Trond Myklebust <trondmy@hammerspace.com> wrote: > > -On Mon, 2019-12-16 at 18:43 +0000, Robert Milkowski wrote: > > > From: Trond Myklebust <trondmy@hammerspace.com> > > ... > > > NACK. The above argument only applies to legacy minor version 0 > > > setups, and does not apply to NFSv4.1 or newer. > > > > Correct. However many sites still use v4.0. > > > > That's not a good reason to break code that works just fine for > NFSv4.1. > Of course not, that's not what I meant and I misunderstood your nack too. > It would be better to move the initialisation of clp->cl_last_renewal > into nfs4_init_clientid() and nfs41_init_clientid() (after the calls to > nfs4_proc_setclientid_confirm() and nfs4_proc_create_session() > respectively). > This could be done but this is potentially a separate change, as in nfs4_do_fsinfo() we still need to make sure we do not implicitly renew lease for v4.0, so I think the patch needs to be modified as: ... + /* no implicit lease renewal allowed here for v4.0 */ + if (server->nfs_client->cl_minorversion == 0 && server->nfs_client->cl_last_renewal != 0) + last_renewal = server->nfs_client->cl_last_renewal; nfs4_set_lease_period(server->nfs_client, fsinfo->lease_time * HZ, - now); + last_renewal); ... This way it won't affect newer nfs versions (I don't think it affected them in any bad way, still it was unintended and not correct). Agree with the above change? btw: Chuck, thanks for the hint regarding the SEQUENCE op on v4.1+
On Tue, 17 Dec 2019 at 18:12, Robert Milkowski <rmilkowski@gmail.com> wrote: > > On Mon, 16 Dec 2019 at 18:58, Trond Myklebust <trondmy@hammerspace.com> wrote: > > > > It would be better to move the initialisation of clp->cl_last_renewal > > into nfs4_init_clientid() and nfs41_init_clientid() (after the calls to > > nfs4_proc_setclientid_confirm() and nfs4_proc_create_session() > > respectively). > > > > This could be done but this is potentially a separate change, as in > nfs4_do_fsinfo() we still need to > make sure we do not implicitly renew lease for v4.0, so I think the > patch needs to be modified as: I took a look at the client initialization and the nfs4_init_clientid() already calls nfs4_setup_state_renewal(), which in turn calls nfs4_proc_get_lease_time(), however that might return with an error in which case it won't set the cl_last_renewal, the code is: static int nfs4_setup_state_renewal(struct nfs_client *clp) ... status = nfs4_proc_get_lease_time(clp, &fsinfo); if (status == 0) { nfs4_set_lease_period(clp, fsinfo.lease_time * HZ, now); ... In my environment with nfsv4+krb the nfs4_proc_get_lease_time() returns with -10016 (NFS4ERR_WRONGSEC) during initial mount (which after a quick scan of the rfc seems that might be ok initially). Also for v4.1 do_renew_lease() is called by nfs41_sequence_process() multiple times during mount before we even reach nfs4_do_fsinfo() so for 4.1+ it never gets cl_last_renewal==0, at least not under this circumstances. There might be other cases where nfs4_do_fsinfo() will get clp->cl_last_renewal=0 for nfsv4.0, and since the nfs4_do_fsinfo() already initializes the cl_last_renewal record, plus as Chuck pointed out in case of v4.1+ it is expected to implicitly renew the lease anyway, I would rather focus on fixing only the reported issue here for now, which is the incorrect implicit renewal in fsinfo for v4.0, and leave improvements to client initialization for another day. I will send an updated patch shortly which doesn't impact v4.1+.
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index 76d3716..aad65dd 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -5019,16 +5019,19 @@ static int nfs4_do_fsinfo(struct nfs_server *server, struct nfs_fh *fhandle, str struct nfs4_exception exception = { .interruptible = true, }; - unsigned long now = jiffies; + unsigned long last_renewal = jiffies; int err; do { err = _nfs4_do_fsinfo(server, fhandle, fsinfo); trace_nfs4_fsinfo(server, fhandle, fsinfo->fattr, err); if (err == 0) { + /* no implicit lease renewal allowed here */ + if (server->nfs_client->cl_last_renewal != 0) + last_renewal = server->nfs_client->cl_last_renewal; nfs4_set_lease_period(server->nfs_client, fsinfo->lease_time * HZ, - now); + last_renewal); break;