diff mbox series

[v2] nfsd: Don't fail OP_SETCLIENTID when there are too many clients.

Message ID 172972144286.81717.3023946721770566532@noble.neil.brown.name (mailing list archive)
State New
Headers show
Series [v2] nfsd: Don't fail OP_SETCLIENTID when there are too many clients. | expand

Commit Message

NeilBrown Oct. 23, 2024, 10:10 p.m. UTC
Failing OP_SETCLIENTID or OP_EXCHANGE_ID should only happen if there is
memory allocation failure.  Putting a hard limit on the number of
clients is really helpful as it will either happen too early and prevent
clients that the server can easily handle, or too late and allow clients
when the server is swamped.

The calculated limit is still useful for expiring courtesy clients where
there are "too many" clients, but it shouldn't prevent the creation of
active clients.

Testing of lots of clients against small-mem servers reports repeated
NFS4ERR_DELAY responses which doesn't seem helpful.  There may have been
reports of similar problems in production use.

Also remove an outdated comment - we do use a slab cache.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/nfsd/nfs4state.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

Comments

Chuck Lever Oct. 24, 2024, 1:31 p.m. UTC | #1
On Thu, Oct 24, 2024 at 09:10:42AM +1100, NeilBrown wrote:
> 
> Failing OP_SETCLIENTID or OP_EXCHANGE_ID should only happen if there is
> memory allocation failure.  Putting a hard limit on the number of
> clients is really helpful as it will either happen too early and prevent

Do you mean "is not really helpful" ?

I recall that Dai had suggested something similar in this area,
while he was working on the courteous server patches. Dai, do you
have any comments about this change? If not, can you send a R-b ?


> clients that the server can easily handle, or too late and allow clients
> when the server is swamped.
> 
> The calculated limit is still useful for expiring courtesy clients where
> there are "too many" clients, but it shouldn't prevent the creation of
> active clients.
> 
> Testing of lots of clients against small-mem servers reports repeated
> NFS4ERR_DELAY responses which doesn't seem helpful.  There may have been
> reports of similar problems in production use.
> 
> Also remove an outdated comment - we do use a slab cache.
> 
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
>  fs/nfsd/nfs4state.c | 11 +++--------
>  1 file changed, 3 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index d585c267731b..0791a43b19e6 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -2218,21 +2218,16 @@ STALE_CLIENTID(clientid_t *clid, struct nfsd_net *nn)
>  	return 1;
>  }
>  
> -/* 
> - * XXX Should we use a slab cache ?
> - * This type of memory management is somewhat inefficient, but we use it
> - * anyway since SETCLIENTID is not a common operation.
> - */
>  static struct nfs4_client *alloc_client(struct xdr_netobj name,
>  				struct nfsd_net *nn)
>  {
>  	struct nfs4_client *clp;
>  	int i;
>  
> -	if (atomic_read(&nn->nfs4_client_count) >= nn->nfs4_max_clients) {
> +	if (atomic_read(&nn->nfs4_client_count) >= nn->nfs4_max_clients &&
> +	    atomic_read(&nn->nfsd_courtesy_clients) > 0)
>  		mod_delayed_work(laundry_wq, &nn->laundromat_work, 0);
> -		return NULL;
> -	}
> +
>  	clp = kmem_cache_zalloc(client_slab, GFP_KERNEL);
>  	if (clp == NULL)
>  		return NULL;
> -- 
> 2.46.0
>
Jeff Layton Oct. 24, 2024, 3:34 p.m. UTC | #2
On Thu, 2024-10-24 at 09:10 +1100, NeilBrown wrote:
> Failing OP_SETCLIENTID or OP_EXCHANGE_ID should only happen if there is
> memory allocation failure.  Putting a hard limit on the number of
> clients is really helpful as it will either happen too early and prevent

"unhelpful" ?

> clients that the server can easily handle, or too late and allow clients
> when the server is swamped.
> 
> The calculated limit is still useful for expiring courtesy clients where
> there are "too many" clients, but it shouldn't prevent the creation of
> active clients.
> 
> Testing of lots of clients against small-mem servers reports repeated
> NFS4ERR_DELAY responses which doesn't seem helpful.  There may have been
> reports of similar problems in production use.
> 
> Also remove an outdated comment - we do use a slab cache.
> 
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
>  fs/nfsd/nfs4state.c | 11 +++--------
>  1 file changed, 3 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index d585c267731b..0791a43b19e6 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -2218,21 +2218,16 @@ STALE_CLIENTID(clientid_t *clid, struct nfsd_net *nn)
>  	return 1;
>  }
>  
> -/* 
> - * XXX Should we use a slab cache ?
> - * This type of memory management is somewhat inefficient, but we use it
> - * anyway since SETCLIENTID is not a common operation.
> - */
>  static struct nfs4_client *alloc_client(struct xdr_netobj name,
>  				struct nfsd_net *nn)
>  {
>  	struct nfs4_client *clp;
>  	int i;
>  
> -	if (atomic_read(&nn->nfs4_client_count) >= nn->nfs4_max_clients) {
> +	if (atomic_read(&nn->nfs4_client_count) >= nn->nfs4_max_clients &&
> +	    atomic_read(&nn->nfsd_courtesy_clients) > 0)
>  		mod_delayed_work(laundry_wq, &nn->laundromat_work, 0);
> -		return NULL;
> -	}
> +
>  	clp = kmem_cache_zalloc(client_slab, GFP_KERNEL);
>  	if (clp == NULL)
>  		return NULL;

Do we even need to check nn->nfs4_max_clients at all?

Maybe we should just kick the laundromat whenever 
nfsd_courtesy_clients > 0. I would suggest just removing
nfs4_max_clients altogether, but it looks like
nfs4_get_client_reaplist() uses it and it's not clear to me what would
better replace it.
diff mbox series

Patch

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index d585c267731b..0791a43b19e6 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -2218,21 +2218,16 @@  STALE_CLIENTID(clientid_t *clid, struct nfsd_net *nn)
 	return 1;
 }
 
-/* 
- * XXX Should we use a slab cache ?
- * This type of memory management is somewhat inefficient, but we use it
- * anyway since SETCLIENTID is not a common operation.
- */
 static struct nfs4_client *alloc_client(struct xdr_netobj name,
 				struct nfsd_net *nn)
 {
 	struct nfs4_client *clp;
 	int i;
 
-	if (atomic_read(&nn->nfs4_client_count) >= nn->nfs4_max_clients) {
+	if (atomic_read(&nn->nfs4_client_count) >= nn->nfs4_max_clients &&
+	    atomic_read(&nn->nfsd_courtesy_clients) > 0)
 		mod_delayed_work(laundry_wq, &nn->laundromat_work, 0);
-		return NULL;
-	}
+
 	clp = kmem_cache_zalloc(client_slab, GFP_KERNEL);
 	if (clp == NULL)
 		return NULL;