
[1/1] NFSD: send OP_CB_RECALL_ANY to clients when number of delegations reaches a threshold

Message ID 1708192859-25002-1-git-send-email-dai.ngo@oracle.com (mailing list archive)
State New
Series [1/1] NFSD: send OP_CB_RECALL_ANY to clients when number of delegations reaches a threshold

Commit Message

Dai Ngo Feb. 17, 2024, 6 p.m. UTC
When the number of granted delegations becomes large, undesirable
effects can occur on the NFS server. Besides the consumption of
system resources, the number of entries on the linked lists of the
file cache can grow significantly.

When this condition occurs, NFS performance grinds to a halt, as
reported here [1].

This patch attempts to alleviate the problem by sending
OP_CB_RECALL_ANY to all clients that hold delegations, asking them to
voluntarily return any unused delegations, when the number of
delegations reaches 3/4 of max_delegations.

[1] https://lore.kernel.org/all/PH0PR14MB5493F59229B802B871407F9CAA442@PH0PR14MB5493.namprd14.prod.outlook.com

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
---
 fs/nfsd/nfs4state.c | 9 +++++++++
 1 file changed, 9 insertions(+)

Comments

Chuck Lever Feb. 17, 2024, 9:09 p.m. UTC | #1
On Sat, Feb 17, 2024 at 10:00:59AM -0800, Dai Ngo wrote:
> When the number of granted delegations becomes large, undesirable
> effects can occur on the NFS server. Besides the consumption of
> system resources, the number of entries on the linked lists of the
> file cache can grow significantly.
> 
> When this condition occurs, NFS performance grinds to a halt, as
> reported here [1].

That was a v5.15.131 kernel. The LRU problems were addressed in
v6.2. This doesn't seem like a clean rationale for adding this
reaper behavior in, say, v6.9.


> This patch attempts to alleviate the problem by sending
> OP_CB_RECALL_ANY to all clients that hold delegations, asking them to
> voluntarily return any unused delegations, when the number of
> delegations reaches 3/4 of max_delegations.

I don't have a strong sense of how big max_delegations can get. Is
there evidence that CB_RECALL_ANY has enough impact, reliably, to
reduce the size of the filecache?

More below.


> [1] https://lore.kernel.org/all/PH0PR14MB5493F59229B802B871407F9CAA442@PH0PR14MB5493.namprd14.prod.outlook.com
> 
> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
> ---
>  fs/nfsd/nfs4state.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index fdc95bfbfbb6..5fb83853533f 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -130,6 +130,7 @@ static const struct nfsd4_callback_ops nfsd4_cb_notify_lock_ops;
>  static const struct nfsd4_callback_ops nfsd4_cb_getattr_ops;
>  
>  static struct workqueue_struct *laundry_wq;
> +static void deleg_reaper(struct nfsd_net *nn);
>  
>  int nfsd4_create_laundry_wq(void)
>  {
> @@ -696,6 +697,9 @@ static struct nfsd_file *find_any_file_locked(struct nfs4_file *f)
>  static atomic_long_t num_delegations;
>  unsigned long max_delegations;
>  
> +/* threshold to trigger deleg_reaper */
> +static unsigned long delegations_soft_limit;
> +
>  /*
>   * Open owner state (share locks)
>   */
> @@ -6466,6 +6470,7 @@ nfs4_laundromat(struct nfsd_net *nn)
>  	struct nfs4_cpntf_state *cps;
>  	copy_stateid_t *cps_t;
>  	int i;
> +	long n;
>  
>  	if (clients_still_reclaiming(nn)) {
>  		lt.new_timeo = 0;
> @@ -6550,6 +6555,9 @@ nfs4_laundromat(struct nfsd_net *nn)
>  	/* service the server-to-server copy delayed unmount list */
>  	nfsd4_ssc_expire_umount(nn);
>  #endif
> +	n = atomic_long_inc_return(&num_delegations);

I don't think you want to modify the number of delegations here.
atomic_long_read() instead?


> +	if (n > delegations_soft_limit)

This introduces a mixed-sign comparison: n is a long, but
delegations_soft_limit is an unsigned long. I'm always suspicious
about whether an atomic counter can underflow, and then we have
real problems when there are mixed-sign comparisons.

But I'm also wondering if, instead, this logic should look directly
at the length of the filecache LRU.


> +		deleg_reaper(nn);
>  out:
>  	return max_t(time64_t, lt.new_timeo, NFSD_LAUNDROMAT_MINTIMEOUT);
>  }
> @@ -8482,6 +8490,7 @@ set_max_delegations(void)
>  	 * giving a worst case usage of about 6% of memory.
>  	 */
>  	max_delegations = nr_free_buffer_pages() >> (20 - 2 - PAGE_SHIFT);
> +	delegations_soft_limit = (max_delegations / 4) * 3;

I don't see a strong reason to keep delegations_soft_limit as a
separate variable.


>  }
>  
>  static int nfs4_state_create_net(struct net *net)
> -- 
> 2.39.3
>
Dai Ngo Feb. 17, 2024, 9:27 p.m. UTC | #2
On 2/17/24 1:09 PM, Chuck Lever wrote:
> On Sat, Feb 17, 2024 at 10:00:59AM -0800, Dai Ngo wrote:
>> When the number of granted delegations becomes large, undesirable
>> effects can occur on the NFS server. Besides the consumption of
>> system resources, the number of entries on the linked lists of the
>> file cache can grow significantly.
>>
>> When this condition occurs, NFS performance grinds to a halt, as
>> reported here [1].
> That was a v5.15.131 kernel. The LRU problems were addressed in
> v6.2. This doesn't seem like a clean rationale for adding this
> reaper behavior in, say, v6.9.

Do you know what commits in v6.9 fix this problem?

Regardless, I think that when the number of delegations grows large,
it's good to ask the clients to release unused delegations. The
Linux client keeps unused delegations for about 90 seconds before
returning them.

>
>
>> This patch attempts to alleviate the problem by sending
>> OP_CB_RECALL_ANY to all clients that hold delegations, asking them to
>> voluntarily return any unused delegations, when the number of
>> delegations reaches 3/4 of max_delegations.
> I don't have a strong sense of how big max_delegations can get. Is
> there evidence that CB_RECALL_ANY has enough impact, reliably, to
> reduce the size of the filecache?

I don't have any numbers for this. But in my testing, the Linux client
returns unused delegations immediately upon receiving CB_RECALL_ANY
instead of keeping them for about 90 seconds.
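
For a rough sense of scale (back-of-envelope only), from
set_max_delegations() in this patch:

	max_delegations = nr_free_buffer_pages() >> (20 - 2 - PAGE_SHIFT);

With 4KB pages (PAGE_SHIFT == 12) the shift is 6, i.e. one delegation
allowed per 64 free pages (256KB). A server with ~16GB of free buffer
pages would allow on the order of 64k delegations, so the 3/4 soft
limit would trip around 49k.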

>
> More below.
>
>
>> [1] https://lore.kernel.org/all/PH0PR14MB5493F59229B802B871407F9CAA442@PH0PR14MB5493.namprd14.prod.outlook.com
>>
>> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
>> ---
>>   fs/nfsd/nfs4state.c | 9 +++++++++
>>   1 file changed, 9 insertions(+)
>>
>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
>> index fdc95bfbfbb6..5fb83853533f 100644
>> --- a/fs/nfsd/nfs4state.c
>> +++ b/fs/nfsd/nfs4state.c
>> @@ -130,6 +130,7 @@ static const struct nfsd4_callback_ops nfsd4_cb_notify_lock_ops;
>>   static const struct nfsd4_callback_ops nfsd4_cb_getattr_ops;
>>   
>>   static struct workqueue_struct *laundry_wq;
>> +static void deleg_reaper(struct nfsd_net *nn);
>>   
>>   int nfsd4_create_laundry_wq(void)
>>   {
>> @@ -696,6 +697,9 @@ static struct nfsd_file *find_any_file_locked(struct nfs4_file *f)
>>   static atomic_long_t num_delegations;
>>   unsigned long max_delegations;
>>   
>> +/* threshold to trigger deleg_reaper */
>> +static unsigned long delegations_soft_limit;
>> +
>>   /*
>>    * Open owner state (share locks)
>>    */
>> @@ -6466,6 +6470,7 @@ nfs4_laundromat(struct nfsd_net *nn)
>>   	struct nfs4_cpntf_state *cps;
>>   	copy_stateid_t *cps_t;
>>   	int i;
>> +	long n;
>>   
>>   	if (clients_still_reclaiming(nn)) {
>>   		lt.new_timeo = 0;
>> @@ -6550,6 +6555,9 @@ nfs4_laundromat(struct nfsd_net *nn)
>>   	/* service the server-to-server copy delayed unmount list */
>>   	nfsd4_ssc_expire_umount(nn);
>>   #endif
>> +	n = atomic_long_inc_return(&num_delegations);
> I don't think you want to modify the number of delegations here.
> atomic_long_read() instead?

Right, it's a typo, it should be a read.

>
>
>> +	if (n > delegations_soft_limit)
> This introduces a mixed-sign comparison: n is a long, but
> delegations_soft_limit is an unsigned long. I'm always suspicious
> about whether an atomic counter can underflow, and then we have
> real problems when there are mixed-sign comparisons.

Yes, n should be unsigned long.
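
Something like this for v2, with both fixes applied (untested sketch):

	unsigned long n;
	...
	/* read, don't bump, the delegation count */
	n = atomic_long_read(&num_delegations);
	if (n > delegations_soft_limit)
		deleg_reaper(nn);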

>
> But I'm also wondering if, instead, this logic should look directly
> at the length of the filecache LRU.

Is there a quick check, e.g. a counter? Otherwise, if we have to walk
the list, it just compounds the problem.

>
>
>> +		deleg_reaper(nn);
>>   out:
>>   	return max_t(time64_t, lt.new_timeo, NFSD_LAUNDROMAT_MINTIMEOUT);
>>   }
>> @@ -8482,6 +8490,7 @@ set_max_delegations(void)
>>   	 * giving a worst case usage of about 6% of memory.
>>   	 */
>>   	max_delegations = nr_free_buffer_pages() >> (20 - 2 - PAGE_SHIFT);
>> +	delegations_soft_limit = (max_delegations / 4) * 3;
> I don't see a strong reason to keep delegations_soft_limit as a
> separate variable.

Just to save some CPU cycles by not recomputing the soft limit every
time. However, since this is done in the laundromat, it's probably not
that critical. Also, recomputing the soft limit each time would work
correctly if max_delegations is changed dynamically.
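
If we drop the variable, the laundromat could compute the threshold
inline, e.g. (sketch):

	/* recall when delegations exceed 3/4 of max_delegations */
	if (atomic_long_read(&num_delegations) > max_delegations / 4 * 3)
		deleg_reaper(nn);

which would also pick up any runtime change to max_delegations.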

-Dai

>
>
>>   }
>>   
>>   static int nfs4_state_create_net(struct net *net)
>> -- 
>> 2.39.3
>>
Chuck Lever Feb. 17, 2024, 9:41 p.m. UTC | #3
On Sat, Feb 17, 2024 at 01:27:15PM -0800, dai.ngo@oracle.com wrote:
> 
> On 2/17/24 1:09 PM, Chuck Lever wrote:
> > On Sat, Feb 17, 2024 at 10:00:59AM -0800, Dai Ngo wrote:
> > > When the number of granted delegations becomes large, undesirable
> > > effects can occur on the NFS server. Besides the consumption of
> > > system resources, the number of entries on the linked lists of the
> > > file cache can grow significantly.
> > > 
> > > When this condition occurs, NFS performance grinds to a halt, as
> > > reported here [1].
> > That was a v5.15.131 kernel. The LRU problems were addressed in
> > v6.2. This doesn't seem like a clean rationale for adding this
> > reaper behavior in, say, v6.9.
> 
> Do you know what commits in v6.9 fix this problem?

The filecache LRU issues were addressed in v6.2 and before.


> Regardless, I think that when the number of delegations grows large,
> it's good to ask the clients to release unused delegations. The
> Linux client keeps unused delegations for about 90 seconds before
> returning them.

That seems reasonable. No need for the server to ask for them back
unless it is under some stress.


> > > This patch attempts to alleviate the problem by sending
> > > OP_CB_RECALL_ANY to all clients that hold delegations, asking them
> > > to voluntarily return any unused delegations, when the number of
> > > delegations reaches 3/4 of max_delegations.
> > I don't have a strong sense of how big max_delegations can get. Is
> > there evidence that CB_RECALL_ANY has enough impact, reliably, to
> > reduce the size of the filecache?
> 
> I don't have any numbers for this. But in my testing, the Linux client
> returns unused delegations immediately upon receiving CB_RECALL_ANY
> instead of keeping them for about 90 seconds.

The issue is whether returning delegations in bulk has any impact on
filecache behavior. I'd like to see some evidence of that, because
after v6.2 I think NFSD's filecache shouldn't be perturbed by lots of
open files.


> > More below.
> > 
> > 
> > > [1] https://lore.kernel.org/all/PH0PR14MB5493F59229B802B871407F9CAA442@PH0PR14MB5493.namprd14.prod.outlook.com
> > > 
> > > Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
> > > ---
> > >   fs/nfsd/nfs4state.c | 9 +++++++++
> > >   1 file changed, 9 insertions(+)
> > > 
> > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > > index fdc95bfbfbb6..5fb83853533f 100644
> > > --- a/fs/nfsd/nfs4state.c
> > > +++ b/fs/nfsd/nfs4state.c
> > > @@ -130,6 +130,7 @@ static const struct nfsd4_callback_ops nfsd4_cb_notify_lock_ops;
> > >   static const struct nfsd4_callback_ops nfsd4_cb_getattr_ops;
> > >   static struct workqueue_struct *laundry_wq;
> > > +static void deleg_reaper(struct nfsd_net *nn);
> > >   int nfsd4_create_laundry_wq(void)
> > >   {
> > > @@ -696,6 +697,9 @@ static struct nfsd_file *find_any_file_locked(struct nfs4_file *f)
> > >   static atomic_long_t num_delegations;
> > >   unsigned long max_delegations;
> > > +/* threshold to trigger deleg_reaper */
> > > +static unsigned long delegations_soft_limit;
> > > +
> > >   /*
> > >    * Open owner state (share locks)
> > >    */
> > > @@ -6466,6 +6470,7 @@ nfs4_laundromat(struct nfsd_net *nn)
> > >   	struct nfs4_cpntf_state *cps;
> > >   	copy_stateid_t *cps_t;
> > >   	int i;
> > > +	long n;
> > >   	if (clients_still_reclaiming(nn)) {
> > >   		lt.new_timeo = 0;
> > > @@ -6550,6 +6555,9 @@ nfs4_laundromat(struct nfsd_net *nn)
> > >   	/* service the server-to-server copy delayed unmount list */
> > >   	nfsd4_ssc_expire_umount(nn);
> > >   #endif
> > > +	n = atomic_long_inc_return(&num_delegations);
> > I don't think you want to modify the number of delegations here.
> > atomic_long_read() instead?
> 
> Right, it's a typo, it should be a read.
> 
> > 
> > 
> > > +	if (n > delegations_soft_limit)
> > This introduces a mixed-sign comparison: n is a long, but
> > delegations_soft_limit is an unsigned long. I'm always suspicious
> > about whether an atomic counter can underflow, and then we have
> > real problems when there are mixed-sign comparisons.
> 
> Yes, n should be unsigned long.
> 
> > 
> > But I'm also wondering if, instead, this logic should look directly
> > at the length of the filecache LRU.
> 
> Is there a quick check, e.g. a counter? Otherwise, if we have to walk
> the list, it just compounds the problem.

Yep:

	list_lru_count(&nfsd_file_lru)

That doesn't really tell us whether those LRU items are pinned by a
delegation, though, hm. Just thinking out loud.
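
If we went that way, the trigger might look something like this --
purely a sketch, since nfsd_file_lru is private to fs/nfsd/filecache.c
and a helper would need to be exported for it:

	/* hypothetical helper in filecache.c */
	unsigned long nfsd_file_lru_count(void)
	{
		return list_lru_count(&nfsd_file_lru);
	}

	/* in nfs4_laundromat(): recall when the LRU grows too long */
	if (nfsd_file_lru_count() > NFSD_FILE_LRU_THRESHOLD)
		deleg_reaper(nn);

where NFSD_FILE_LRU_THRESHOLD is a made-up tunable.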


> > > +		deleg_reaper(nn);
> > >   out:
> > >   	return max_t(time64_t, lt.new_timeo, NFSD_LAUNDROMAT_MINTIMEOUT);
> > >   }
> > > @@ -8482,6 +8490,7 @@ set_max_delegations(void)
> > >   	 * giving a worst case usage of about 6% of memory.
> > >   	 */
> > >   	max_delegations = nr_free_buffer_pages() >> (20 - 2 - PAGE_SHIFT);
> > > +	delegations_soft_limit = (max_delegations / 4) * 3;
> > I don't see a strong reason to keep delegations_soft_limit as a
> > separate variable.
> 
> Just to save some CPU cycles by not recomputing the soft limit every
> time. However, since this is done in the laundromat, it's probably not
> that critical. Also, recomputing the soft limit each time would work
> correctly if max_delegations is changed dynamically.
> 
> -Dai
> 
> > 
> > 
> > >   }
> > >   static int nfs4_state_create_net(struct net *net)
> > > -- 
> > > 2.39.3
> > >

Patch

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index fdc95bfbfbb6..5fb83853533f 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -130,6 +130,7 @@  static const struct nfsd4_callback_ops nfsd4_cb_notify_lock_ops;
 static const struct nfsd4_callback_ops nfsd4_cb_getattr_ops;
 
 static struct workqueue_struct *laundry_wq;
+static void deleg_reaper(struct nfsd_net *nn);
 
 int nfsd4_create_laundry_wq(void)
 {
@@ -696,6 +697,9 @@  static struct nfsd_file *find_any_file_locked(struct nfs4_file *f)
 static atomic_long_t num_delegations;
 unsigned long max_delegations;
 
+/* threshold to trigger deleg_reaper */
+static unsigned long delegations_soft_limit;
+
 /*
  * Open owner state (share locks)
  */
@@ -6466,6 +6470,7 @@  nfs4_laundromat(struct nfsd_net *nn)
 	struct nfs4_cpntf_state *cps;
 	copy_stateid_t *cps_t;
 	int i;
+	long n;
 
 	if (clients_still_reclaiming(nn)) {
 		lt.new_timeo = 0;
@@ -6550,6 +6555,9 @@  nfs4_laundromat(struct nfsd_net *nn)
 	/* service the server-to-server copy delayed unmount list */
 	nfsd4_ssc_expire_umount(nn);
 #endif
+	n = atomic_long_inc_return(&num_delegations);
+	if (n > delegations_soft_limit)
+		deleg_reaper(nn);
 out:
 	return max_t(time64_t, lt.new_timeo, NFSD_LAUNDROMAT_MINTIMEOUT);
 }
@@ -8482,6 +8490,7 @@  set_max_delegations(void)
 	 * giving a worst case usage of about 6% of memory.
 	 */
 	max_delegations = nr_free_buffer_pages() >> (20 - 2 - PAGE_SHIFT);
+	delegations_soft_limit = (max_delegations / 4) * 3;
 }
 
 static int nfs4_state_create_net(struct net *net)