Message ID | 20240405-rhel-31513-v2-1-b0f6c10be929@kernel.org (mailing list archive) |
---|---|
State | New |
Series | [v2] nfsd: hold a lighter-weight client reference over CB_RECALL_ANY |
On Fri, Apr 05, 2024 at 01:56:18PM -0400, Jeff Layton wrote:
> Currently the CB_RECALL_ANY job takes a cl_rpc_users reference to the
> client. While a callback job is technically an RPC, that counter is
> really more for client-driven RPCs, and this has the effect of
> preventing the client from being unhashed until the callback completes.
>
> If nfsd decides to send a CB_RECALL_ANY just as the client reboots, we
> can end up in a situation where the callback can't complete on the (now
> dead) callback channel, but the new client can't connect because the old
> client can't be unhashed. This usually manifests as an NFS4ERR_DELAY
> return on the CREATE_SESSION operation.
>
> The job is only holding a reference to the client so it can clear a flag
> after the RPC completes. Fix this by having CB_RECALL_ANY instead
> hold a reference to the cl_nfsdfs.cl_ref. Typically we only take that
> sort of reference when dealing with the nfsdfs info files, but it should
> work appropriately here to ensure that the nfs4_client doesn't
> disappear.
>
> Fixes: 44df6f439a17 ("NFSD: add delegation reaper to react to low memory condition")
> Reported-by: Vladimir Benes <vbenes@redhat.com>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>

Applied to nfsd-fixes while waiting for review and testing. Thanks!
On Fri, 2024-04-05 at 14:07 -0400, Chuck Lever wrote:
> On Fri, Apr 05, 2024 at 01:56:18PM -0400, Jeff Layton wrote:
> > Currently the CB_RECALL_ANY job takes a cl_rpc_users reference to the
> > client. While a callback job is technically an RPC, that counter is
> > really more for client-driven RPCs, and this has the effect of
> > preventing the client from being unhashed until the callback completes.
> >
> > If nfsd decides to send a CB_RECALL_ANY just as the client reboots, we
> > can end up in a situation where the callback can't complete on the (now
> > dead) callback channel, but the new client can't connect because the old
> > client can't be unhashed. This usually manifests as an NFS4ERR_DELAY
> > return on the CREATE_SESSION operation.
> >
> > The job is only holding a reference to the client so it can clear a flag
> > after the RPC completes. Fix this by having CB_RECALL_ANY instead
> > hold a reference to the cl_nfsdfs.cl_ref. Typically we only take that
> > sort of reference when dealing with the nfsdfs info files, but it should
> > work appropriately here to ensure that the nfs4_client doesn't
> > disappear.
> >
> > Fixes: 44df6f439a17 ("NFSD: add delegation reaper to react to low memory condition")
> > Reported-by: Vladimir Benes <vbenes@redhat.com>
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
>
> Applied to nfsd-fixes while waiting for review and testing. Thanks!
Hi,

I've just finished the testing of the new patch on the same HW
configuration and the dracut test suite is stable again.

Thank you for your patches!
Vladimir Benes

Tested-by: Vladimir Benes <vbenes@redhat.com>
On Fri, 5 Apr 2024 at 20:07, Chuck Lever <chuck.lever@oracle.com> wrote:
>
> On Fri, Apr 05, 2024 at 01:56:18PM -0400, Jeff Layton wrote:
> > Currently the CB_RECALL_ANY job takes a cl_rpc_users reference to the
> > client. While a callback job is technically an RPC, that counter is
> > really more for client-driven RPCs, and this has the effect of
> > preventing the client from being unhashed until the callback completes.
> >
> > If nfsd decides to send a CB_RECALL_ANY just as the client reboots, we
> > can end up in a situation where the callback can't complete on the (now
> > dead) callback channel, but the new client can't connect because the old
> > client can't be unhashed. This usually manifests as an NFS4ERR_DELAY
> > return on the CREATE_SESSION operation.
> >
> > The job is only holding a reference to the client so it can clear a flag
> > after the RPC completes. Fix this by having CB_RECALL_ANY instead
> > hold a reference to the cl_nfsdfs.cl_ref. Typically we only take that
> > sort of reference when dealing with the nfsdfs info files, but it should
> > work appropriately here to ensure that the nfs4_client doesn't
> > disappear.
> >
> > Fixes: 44df6f439a17 ("NFSD: add delegation reaper to react to low memory condition")
> > Reported-by: Vladimir Benes <vbenes@redhat.com>
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
>
> Applied to nfsd-fixes while waiting for review and testing. Thanks!

Please add this to the 6.6 LTS branch, too.

Ced
On Tue, Apr 09, 2024 at 07:24:17PM +0200, Rik Theys wrote:
> Hi,
>
> On 4/5/24 20:07, Chuck Lever wrote:
> > On Fri, Apr 05, 2024 at 01:56:18PM -0400, Jeff Layton wrote:
> > > Currently the CB_RECALL_ANY job takes a cl_rpc_users reference to the
> > > client. While a callback job is technically an RPC, that counter is
> > > really more for client-driven RPCs, and this has the effect of
> > > preventing the client from being unhashed until the callback completes.
> > >
> > > If nfsd decides to send a CB_RECALL_ANY just as the client reboots, we
> > > can end up in a situation where the callback can't complete on the (now
> > > dead) callback channel, but the new client can't connect because the old
> > > client can't be unhashed. This usually manifests as an NFS4ERR_DELAY
> > > return on the CREATE_SESSION operation.
> > >
> > > The job is only holding a reference to the client so it can clear a flag
> > > after the RPC completes. Fix this by having CB_RECALL_ANY instead
> > > hold a reference to the cl_nfsdfs.cl_ref. Typically we only take that
> > > sort of reference when dealing with the nfsdfs info files, but it should
> > > work appropriately here to ensure that the nfs4_client doesn't
> > > disappear.
> > >
> > > Fixes: 44df6f439a17 ("NFSD: add delegation reaper to react to low memory condition")
> > > Reported-by: Vladimir Benes <vbenes@redhat.com>
> > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > Applied to nfsd-fixes while waiting for review and testing. Thanks!
> >
> Can this fix also be included in the 6.1.x LTS kernel? Given that "NFSD: add
> delegation reaper to react to low memory condition" was added to 6.1.81, it
> would be nice to have this fix in the 6.1 series.
>
> This way it will also be picked up by Debian at some point (it seems they
> are upgrading to 6.1.82 for their next stable point release).

Thanks to the Fixes: tag in the commit's description, it will probably
appear in a later release of linux-6.1.y automatically.
If we don't see it in the next few weeks, I will ping the stable maintainers.
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 5fcd93f7cb8c..3cef81e196c6 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -3042,12 +3042,9 @@ static void
 nfsd4_cb_recall_any_release(struct nfsd4_callback *cb)
 {
 	struct nfs4_client *clp = cb->cb_clp;
-	struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id);
 
-	spin_lock(&nn->client_lock);
 	clear_bit(NFSD4_CLIENT_CB_RECALL_ANY, &clp->cl_flags);
-	put_client_renew_locked(clp);
-	spin_unlock(&nn->client_lock);
+	drop_client(clp);
 }
 
 static int
@@ -6616,7 +6613,7 @@ deleg_reaper(struct nfsd_net *nn)
 			list_add(&clp->cl_ra_cblist, &cblist);
 
 			/* release in nfsd4_cb_recall_any_release */
-			atomic_inc(&clp->cl_rpc_users);
+			kref_get(&clp->cl_nfsdfs.cl_ref);
 			set_bit(NFSD4_CLIENT_CB_RECALL_ANY, &clp->cl_flags);
 			clp->cl_ra_time = ktime_get_boottime_seconds();
 		}
Currently the CB_RECALL_ANY job takes a cl_rpc_users reference to the
client. While a callback job is technically an RPC, that counter is
really more for client-driven RPCs, and this has the effect of
preventing the client from being unhashed until the callback completes.

If nfsd decides to send a CB_RECALL_ANY just as the client reboots, we
can end up in a situation where the callback can't complete on the (now
dead) callback channel, but the new client can't connect because the old
client can't be unhashed. This usually manifests as an NFS4ERR_DELAY
return on the CREATE_SESSION operation.

The job is only holding a reference to the client so it can clear a flag
after the RPC completes. Fix this by having CB_RECALL_ANY instead
hold a reference to the cl_nfsdfs.cl_ref. Typically we only take that
sort of reference when dealing with the nfsdfs info files, but it should
work appropriately here to ensure that the nfs4_client doesn't
disappear.

Fixes: 44df6f439a17 ("NFSD: add delegation reaper to react to low memory condition")
Reported-by: Vladimir Benes <vbenes@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
Changes in v2:
- Clean up the changelog
- Add Fixes: tag
- Use kref_get instead of kref_get_unless_zero
---
 fs/nfsd/nfs4state.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

---
base-commit: 05258a0a69b3c5d2c003f818702c0a52b6fea861
change-id: 20240405-rhel-31513-028ab6f14252

Best regards,
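
The difference between the two kinds of reference described in the changelog can be sketched in plain userspace C. This is only an illustrative model under stated assumptions, not the nfsd code: `struct model_client` and every `model_*` function are hypothetical names invented here, the counters are plain ints rather than `atomic_t` or `struct kref`, and no locking is modelled. The point it tries to show is that a nonzero "RPC user" count blocks unhashing, while a lifetime reference only keeps the object allocated.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; NOT the kernel's struct nfs4_client. */
struct model_client {
	int rpc_users;   /* stands in for cl_rpc_users: nonzero blocks unhashing */
	int ref;         /* stands in for cl_nfsdfs.cl_ref: only keeps memory alive */
	bool hashed;     /* still in the way of a reconnecting client */
	bool recall_any; /* stands in for the NFSD4_CLIENT_CB_RECALL_ANY flag */
};

/* Unhashing succeeds only while no "RPC user" reference is held. */
static bool model_try_unhash(struct model_client *clp)
{
	if (clp->rpc_users > 0)
		return false;	/* the new client would be told to retry later */
	clp->hashed = false;
	return true;
}

/* Old scheme: the queued callback pins the client via the RPC-user counter. */
static void model_queue_recall_old(struct model_client *clp)
{
	clp->rpc_users++;
	clp->recall_any = true;
}

/* New scheme: the queued callback takes only a lifetime reference. */
static void model_queue_recall_new(struct model_client *clp)
{
	assert(clp->ref > 0);	/* an unconditional get assumes a live object */
	clp->ref++;
	clp->recall_any = true;
}

/* Release for the new scheme: clear the flag, then drop the lifetime ref. */
static void model_release_new(struct model_client *clp)
{
	clp->recall_any = false;
	clp->ref--;
}
```

In this model, calling `model_queue_recall_old()` and then `model_try_unhash()` fails for as long as the callback is outstanding, which mirrors the stuck CREATE_SESSION case in the changelog. With `model_queue_recall_new()` the unhash goes through immediately, and the structure stays valid until `model_release_new()` drops the extra reference.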